Re: [PATCH] drm/tests: mm: Convert to drm_dbg_printer
On 2/9/24 15:08, Michał Winiarski wrote:
> Fix one of the tests in drm_mm that was not converted prior to the
> drm_debug_printer removal, causing a tests build failure.
>
> Fixes: e154c4fc7bf2d ("drm: remove drm_debug_printer in favor of drm_dbg_printer")
> Signed-off-by: Michał Winiarski
> ---
>  drivers/gpu/drm/tests/drm_mm_test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/tests/drm_mm_test.c b/drivers/gpu/drm/tests/drm_mm_test.c
> index 1eb0c304f9607..3488d930e3a38 100644
> --- a/drivers/gpu/drm/tests/drm_mm_test.c
> +++ b/drivers/gpu/drm/tests/drm_mm_test.c
> @@ -188,7 +188,7 @@ static void drm_test_mm_init(struct kunit *test)
>  static void drm_test_mm_debug(struct kunit *test)
>  {
> -	struct drm_printer p = drm_debug_printer(test->name);
> +	struct drm_printer p = drm_dbg_printer(NULL, DRM_UT_CORE, test->name);
>  	struct drm_mm mm;
>  	struct drm_mm_node nodes[2];

Reviewed-by: Thomas Hellström
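For context, a drm_printer is a small vtable: a print callback plus per-printer state, and drm_dbg_printer() additionally binds a struct drm_device and a debug category (DRM_UT_CORE in the hunk above). The following is a rough userspace analogue of that pattern only; all names are illustrative and none of this is the actual DRM API:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for struct drm_printer: backend callback + state. */
struct printer {
	void (*printfn)(struct printer *p, const char *fmt, va_list args);
	const char *prefix;    /* e.g. the KUnit test name */
	unsigned int category; /* stand-in for a DRM_UT_* debug category */
	char buf[256];         /* this demo's backend captures into a buffer */
};

/* Backend: append the formatted message to the capture buffer. */
static void buf_printfn(struct printer *p, const char *fmt, va_list args)
{
	size_t off = strlen(p->buf);

	vsnprintf(p->buf + off, sizeof(p->buf) - off, fmt, args);
}

/* Analogue of drm_dbg_printer(drm, category, prefix): bind backend + metadata. */
static struct printer dbg_printer(unsigned int category, const char *prefix)
{
	struct printer p = { .printfn = buf_printfn, .prefix = prefix,
			     .category = category, .buf = "" };
	return p;
}

/* Analogue of drm_printf(): emit the prefix, then delegate to the backend. */
static void printer_printf(struct printer *p, const char *fmt, ...)
{
	va_list args;

	if (p->prefix) {
		size_t off = strlen(p->buf);

		snprintf(p->buf + off, sizeof(p->buf) - off, "[%s] ", p->prefix);
	}
	va_start(args, fmt);
	p->printfn(p, fmt, args);
	va_end(args);
}
```

The point of the indirection is that callers such as the drm_mm dump code print through one interface while the owner of the printer decides where the output lands, which is why the test only had to swap the constructor.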
Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: fix check in dma_resv_add_fence
On 11/27/23 14:24, Christian König wrote:
> Ping? Can I get an rb or acked-by for that?
>
> Thanks,
> Christian.
>
> On 15.11.23 at 10:30, Christian König wrote:
>> It's valid to add the same fence multiple times to a dma-resv object
>> and we shouldn't need one extra slot for each.
>>
>> Signed-off-by: Christian König
>> Fixes: a3f7c10a269d5 ("dma-buf/dma-resv: check if the new fence is really later")
>> Cc: sta...@vger.kernel.org # v5.19+
>> Reviewed-by: Thomas Hellström
>> ---
>>  drivers/dma-buf/dma-resv.c |  2 +-
>>  include/linux/dma-fence.h  | 15 +++++++++++++++
>>  2 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
>> index 38b4110378de..eb8b733065b2 100644
>> --- a/drivers/dma-buf/dma-resv.c
>> +++ b/drivers/dma-buf/dma-resv.c
>> @@ -301,7 +301,7 @@ void dma_resv_add_fence(struct dma_resv *obj, struct dma_fence *fence,
>>  		dma_resv_list_entry(fobj, i, obj, &old, &old_usage);
>>
>>  		if ((old->context == fence->context && old_usage >= usage &&
>> -		     dma_fence_is_later(fence, old)) ||
>> +		     dma_fence_is_later_or_same(fence, old)) ||
>>  		    dma_fence_is_signaled(old)) {
>>  			dma_resv_list_set(fobj, i, fence, usage);
>>  			dma_fence_put(old);
>>
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index ebe78bd3d121..b3772edca2e6 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -498,6 +498,21 @@ static inline bool dma_fence_is_later(struct dma_fence *f1,
>>  	return __dma_fence_is_later(f1->seqno, f2->seqno, f1->ops);
>>  }
>>
>> +/**
>> + * dma_fence_is_later_or_same - return true if f1 is later or same as f2
>> + * @f1: the first fence from the same context
>> + * @f2: the second fence from the same context
>> + *
>> + * Returns true if f1 is chronologically later than f2 or the same fence. Both
>> + * fences must be from the same context, since a seqno is not re-used across
>> + * contexts.
>> + */
>> +static inline bool dma_fence_is_later_or_same(struct dma_fence *f1,
>> +					      struct dma_fence *f2)
>> +{
>> +	return f1 == f2 || dma_fence_is_later(f1, f2);
>> +}
>> +
>>  /**
>>   * dma_fence_later - return the chronologically later fence
>>   * @f1: the first fence from the same context
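The failure mode the patch fixes is easy to see with plain sequence numbers: re-adding the very same fence fails a strictly-later check, so the existing slot is not reused and an extra slot gets consumed. A minimal userspace model of the two predicates (plain structs with toy semantics, not the kernel's struct dma_fence):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a fence: a context id plus a sequence number. */
struct fence {
	uint64_t context;
	uint64_t seqno;
};

/* Model of dma_fence_is_later(): strictly later within one context. */
static bool fence_is_later(const struct fence *f1, const struct fence *f2)
{
	return f1->seqno > f2->seqno;
}

/*
 * Model of dma_fence_is_later_or_same(): also true when f1 and f2 are
 * literally the same fence, so re-adding an already-tracked fence can
 * take over its existing slot instead of claiming a new one.
 */
static bool fence_is_later_or_same(const struct fence *f1,
				   const struct fence *f2)
{
	return f1 == f2 || fence_is_later(f1, f2);
}
```

With the old check, `fence_is_later(&f, &f)` is false, so the replace-in-place branch of dma_resv_add_fence() was skipped for a fence that was already present.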
Re: [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
On 8/31/23 21:07, Danilo Krummrich wrote: On Thu, Aug 31, 2023 at 06:53:01PM +0200, Thomas Hellström (Intel) wrote: Hi, On 8/31/23 13:18, Danilo Krummrich wrote: On Thu, Aug 31, 2023 at 11:04:06AM +0200, Thomas Hellström (Intel) wrote: Hi! On 8/30/23 17:00, Danilo Krummrich wrote: On Wed, Aug 30, 2023 at 03:42:08PM +0200, Thomas Hellström (Intel) wrote: On 8/30/23 14:49, Danilo Krummrich wrote: Hi Thomas, thanks for having a look! On Wed, Aug 30, 2023 at 09:27:45AM +0200, Thomas Hellström (Intel) wrote: Hi, Danilo. Some quick comments since I'm doing some Xe work in this area. Will probably get back with more. On 8/20/23 23:53, Danilo Krummrich wrote: diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h index ed8d50200cc3..693e2da3f425 100644 --- a/include/drm/drm_gpuva_mgr.h +++ b/include/drm/drm_gpuva_mgr.h @@ -26,12 +26,16 @@ */ #include +#include +#include #include #include #include +#include struct drm_gpuva_manager; +struct drm_gpuva_gem; struct drm_gpuva_fn_ops; /** @@ -140,7 +144,7 @@ struct drm_gpuva { int drm_gpuva_insert(struct drm_gpuva_manager *mgr, struct drm_gpuva *va); void drm_gpuva_remove(struct drm_gpuva *va); -void drm_gpuva_link(struct drm_gpuva *va); +void drm_gpuva_link(struct drm_gpuva *va, struct drm_gpuva_gem *vm_bo); void drm_gpuva_unlink(struct drm_gpuva *va); struct drm_gpuva *drm_gpuva_find(struct drm_gpuva_manager *mgr, @@ -240,15 +244,137 @@ struct drm_gpuva_manager { * @ops: _gpuva_fn_ops providing the split/merge steps to drivers */ const struct drm_gpuva_fn_ops *ops; + + /** +* @d_obj: Dummy GEM object; used internally to pass the GPU VMs +* dma-resv to _exec. +*/ + struct drm_gem_object d_obj; + + /** +* @resv: the _resv for _gem_objects mapped in this GPU VA +* space +*/ + struct dma_resv *resv; + + /** +* @exec: the _exec helper to lock external _gem_objects +*/ + struct drm_exec exec; + + /** +* @mt_ext: _tree storing external _gem_objects +*/ + struct maple_tree mt_ext; Why are you using a maple tree here? 
Insertion and removal are O(log(n)) instead of O(1) for a list?

Having a list of drm_gem_objects directly wouldn't work, as multiple GPU-VMs could have mappings of the same extobj. I considered using the VM_BO abstraction (struct drm_gpuva_gem) as list entry instead, which also seems to be the obvious choice. However, there is a locking conflict. A drm_gem_object keeps a list of drm_gpuva_gems, while each drm_gpuva_gem keeps a list of drm_gpuvas. Both lists are either protected with the dma-resv lock of the corresponding drm_gem_object, or with an external lock provided by the driver (see drm_gem_gpuva_set_lock()). The latter is used by drivers performing changes on the GPUVA space directly from the fence signalling path. Now, similar to what drm_gpuva_link() and drm_gpuva_unlink() are doing already, we'd want to add a drm_gpuva_gem to the extobj list for the first mapping being linked and we'd want to remove it for the last one being unlinked. (Actually we'd want to add the drm_gpuva_gem object to the extobj list even before, because otherwise we'd not acquire the dma-resv lock of this GEM object through drm_gpuva_manager_lock(). But that's trivial, we could do that when we create the drm_gpuva_gem, which we need to do anyway.) Anyway, we'd probably want to keep removing the drm_gpuva_gem from the extobj list from drm_gpuva_unlink() when the last mapping of this BO is unlinked. In order to do so, we'd (as discussed above) either need to hold the outer GPU-VM lock or the GPU-VM's dma-resv lock. Both would be illegal in the case drm_gpuva_unlink() is called from within the fence signalling path. For drivers like XE or Nouveau, we'd at least need to make sure not to mess up the locking hierarchy of the GPU-VM lock and the dma-resv lock of the corresponding BO. Considering all that, I thought it's probably better to track extobjs separately from the drm_gpuva_gem, hence the maple tree choice.

Hm.
OK, in Xe we keep a list of the xe_vmas (drm_gpuvas) that point to external objects, or in the case of multiple mappings to the same gem object, only one of the drm_gpuvas is in the list. These are protected by the GPU-VM lock. I don't see a problem with removing those from the fence signalling path, though?

I intentionally tried to avoid keeping a list of drm_gpuvas to track extobjs, since this is generic code and I don't know how many mappings of an external object the corresponding driver potentially creates. This could become a pretty large list to iterate. Another reason was that I want to keep the drm_gpuva structure as small as possible, hence avoiding another list_head.

Yes, the list might be pretty large, but OTOH you never iterate to access a single list element. When you need to iterate
Re: [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
Hi, On 8/31/23 13:18, Danilo Krummrich wrote: On Thu, Aug 31, 2023 at 11:04:06AM +0200, Thomas Hellström (Intel) wrote: Hi! On 8/30/23 17:00, Danilo Krummrich wrote: On Wed, Aug 30, 2023 at 03:42:08PM +0200, Thomas Hellström (Intel) wrote: On 8/30/23 14:49, Danilo Krummrich wrote: Hi Thomas, thanks for having a look! On Wed, Aug 30, 2023 at 09:27:45AM +0200, Thomas Hellström (Intel) wrote: Hi, Danilo. Some quick comments since I'm doing some Xe work in this area. Will probably get back with more. On 8/20/23 23:53, Danilo Krummrich wrote: diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h index ed8d50200cc3..693e2da3f425 100644 --- a/include/drm/drm_gpuva_mgr.h +++ b/include/drm/drm_gpuva_mgr.h @@ -26,12 +26,16 @@ */ #include +#include +#include #include #include #include +#include struct drm_gpuva_manager; +struct drm_gpuva_gem; struct drm_gpuva_fn_ops; /** @@ -140,7 +144,7 @@ struct drm_gpuva { int drm_gpuva_insert(struct drm_gpuva_manager *mgr, struct drm_gpuva *va); void drm_gpuva_remove(struct drm_gpuva *va); -void drm_gpuva_link(struct drm_gpuva *va); +void drm_gpuva_link(struct drm_gpuva *va, struct drm_gpuva_gem *vm_bo); void drm_gpuva_unlink(struct drm_gpuva *va); struct drm_gpuva *drm_gpuva_find(struct drm_gpuva_manager *mgr, @@ -240,15 +244,137 @@ struct drm_gpuva_manager { * @ops: _gpuva_fn_ops providing the split/merge steps to drivers */ const struct drm_gpuva_fn_ops *ops; + + /** +* @d_obj: Dummy GEM object; used internally to pass the GPU VMs +* dma-resv to _exec. +*/ + struct drm_gem_object d_obj; + + /** +* @resv: the _resv for _gem_objects mapped in this GPU VA +* space +*/ + struct dma_resv *resv; + + /** +* @exec: the _exec helper to lock external _gem_objects +*/ + struct drm_exec exec; + + /** +* @mt_ext: _tree storing external _gem_objects +*/ + struct maple_tree mt_ext; Why are you using a maple tree here? Insertion and removal is O(log(n)) instead of O(1) for a list? 
Having a list of drm_gem_objects directly wouldn't work, as multiple GPU-VMs could have mappings of the same extobj. I considered using the VM_BO abstraction (struct drm_gpuva_gem) as list entry instead, which also seems to be the obvious choice. However, there is a locking conflict. A drm_gem_object keeps a list of drm_gpuva_gems, while each drm_gpuva_gem keeps a list of drm_gpuvas. Both lists are either protected with the dma-resv lock of the corresponding drm_gem_object, or with an external lock provided by the driver (see drm_gem_gpuva_set_lock()). The latter is used by drivers performing changes on the GPUVA space directly from the fence signalling path. Now, similar to what drm_gpuva_link() and drm_gpuva_unlink() are doing already, we'd want to add a drm_gpuva_gem to the extobj list for the first mapping being linked and we'd want to remove it for the last one being unlinked. (Actually we'd want to add the drm_gpuva_gem object to the extobj list even before, because otherwise we'd not acquire the dma-resv lock of this GEM object through drm_gpuva_manager_lock(). But that's trivial, we could do that when we create the drm_gpuva_gem, which we need to do anyway.) Anyway, we'd probably want to keep removing the drm_gpuva_gem from the extobj list from drm_gpuva_unlink() when the last mapping of this BO is unlinked. In order to do so, we'd (as discussed above) either need to hold the outer GPU-VM lock or the GPU-VM's dma-resv lock. Both would be illegal in the case drm_gpuva_unlink() is called from within the fence signalling path. For drivers like XE or Nouveau, we'd at least need to make sure not to mess up the locking hierarchy of the GPU-VM lock and the dma-resv lock of the corresponding BO. Considering all that, I thought it's probably better to track extobjs separately from the drm_gpuva_gem, hence the maple tree choice.

Hm.
OK, in Xe we keep a list of the xe_vmas (drm_gpuvas) that point to external objects, or in the case of multiple mappings to the same gem object, only one of the drm_gpuvas is in the list. These are protected by the GPU-VM lock. I don't see a problem with removing those from the fence signalling path, though?

I intentionally tried to avoid keeping a list of drm_gpuvas to track extobjs, since this is generic code and I don't know how many mappings of an external object the corresponding driver potentially creates. This could become a pretty large list to iterate. Another reason was that I want to keep the drm_gpuva structure as small as possible, hence avoiding another list_head.

Yes, the list might be pretty large, but OTOH you never iterate to access a single list element. When you need to iterate the whole list you need to do that regardless of the data structure used. As for the list head, it might perhaps be aliased (union
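The add-on-first-link / remove-on-last-unlink scheme debated above can be modelled with a per-BO mapping count on the vm_bo: the vm_bo joins the VM's extobj list when its first mapping is linked and leaves it when the last one is unlinked. A simplified userspace sketch (hypothetical names, and deliberately ignoring the dma-resv vs. outer-lock conflict that motivated the maple tree):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list, standing in for the kernel's list_head. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next; n->prev = h;
	h->next->prev = n; h->next = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next; n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Hypothetical vm_bo: one per (VM, BO) pair, counting mappings of the BO. */
struct vm_bo {
	struct list_head extobj_entry; /* entry in the VM's extobj list */
	unsigned int mappings;         /* live mappings through this VM */
};

struct vm { struct list_head extobj_list; };

/* Analogue of drm_gpuva_link(): the first mapping puts the BO on the list. */
static void va_link(struct vm *vm, struct vm_bo *vbo)
{
	if (vbo->mappings++ == 0)
		list_add(&vbo->extobj_entry, &vm->extobj_list);
}

/* Analogue of drm_gpuva_unlink(): the last mapping takes it off again. */
static void va_unlink(struct vm_bo *vbo)
{
	if (--vbo->mappings == 0)
		list_del(&vbo->extobj_entry);
}
```

This keeps extobj insertion and removal O(1), which is exactly the property the list-vs-maple-tree discussion is weighing against the locking constraints of calling va_unlink() from the fence signalling path.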
Re: [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
Hi! On 8/30/23 17:00, Danilo Krummrich wrote: On Wed, Aug 30, 2023 at 03:42:08PM +0200, Thomas Hellström (Intel) wrote: On 8/30/23 14:49, Danilo Krummrich wrote: Hi Thomas, thanks for having a look! On Wed, Aug 30, 2023 at 09:27:45AM +0200, Thomas Hellström (Intel) wrote: Hi, Danilo. Some quick comments since I'm doing some Xe work in this area. Will probably get back with more. On 8/20/23 23:53, Danilo Krummrich wrote: So far the DRM GPUVA manager offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVA manager represent a basic GPU-VM implementation. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convinience functions for common patterns. Rather than being designed as a "framework", the target is to make all features appear as a collection of optional helper functions, such that drivers are free to make use of the DRM GPUVA managers basic functionality and opt-in for other features without setting any feature flags, just by making use of the corresponding functions. 
Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuva_mgr.c | 688 +++- include/drm/drm_gem.h | 48 ++- include/drm/drm_gpuva_mgr.h | 302 +- 3 files changed, 1010 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c index f86bfad74ff8..69872b205961 100644 --- a/drivers/gpu/drm/drm_gpuva_mgr.c +++ b/drivers/gpu/drm/drm_gpuva_mgr.c @@ -655,6 +655,7 @@ drm_gpuva_range_valid(struct drm_gpuva_manager *mgr, /** * drm_gpuva_manager_init() - initialize a _gpuva_manager * @mgr: pointer to the _gpuva_manager to initialize + * @drm: the drivers _device * @name: the name of the GPU VA space * @start_offset: the start offset of the GPU VA space * @range: the size of the GPU VA space @@ -669,6 +670,7 @@ drm_gpuva_range_valid(struct drm_gpuva_manager *mgr, */ void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, + struct drm_device *drm, const char *name, u64 start_offset, u64 range, u64 reserve_offset, u64 reserve_range, @@ -677,6 +679,11 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, mgr->rb.tree = RB_ROOT_CACHED; INIT_LIST_HEAD(>rb.list); + mt_init(>mt_ext); + + INIT_LIST_HEAD(>evict.list); + spin_lock_init(>evict.lock); + drm_gpuva_check_overflow(start_offset, range); mgr->mm_start = start_offset; mgr->mm_range = range; @@ -694,6 +701,9 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, reserve_range))) __drm_gpuva_insert(mgr, >kernel_alloc_node); } + + drm_gem_private_object_init(drm, >d_obj, 0); + mgr->resv = mgr->d_obj.resv; } EXPORT_SYMBOL_GPL(drm_gpuva_manager_init); @@ -713,10 +723,575 @@ drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr) __drm_gpuva_remove(>kernel_alloc_node); WARN(!RB_EMPTY_ROOT(>rb.tree.rb_root), -"GPUVA tree is not empty, potentially leaking memory."); +"GPUVA tree is not empty, potentially leaking memory.\n"); + + mtree_destroy(>mt_ext); + WARN(!list_empty(>evict.list), "Evict list should be empty.\n"); + + drm_gem_private_object_fini(>d_obj); } 
EXPORT_SYMBOL_GPL(drm_gpuva_manager_destroy); +/** + * drm_gpuva_manager_prepare_objects() - prepare all assoiciated BOs + * @mgr: the _gpuva_manager + * @num_fences: the amount of _fences to reserve + * + * Calls drm_exec_prepare_obj() for all _gem_objects the given + * _gpuva_manager contains mappings of. + * + * Drivers can obtain the corresponding _exec instance through + * DRM_GPUVA_EXEC(). It is the drivers responsibility to call drm_exec_init() + * and drm_exec_fini() accordingly. + * + * Returns: 0 on success, negative error code on failure. + */ +int +drm_gpuva_manager_prepare_objects(struct drm_gpuva_manager *mgr, + unsigned int num_fences) +{ + struct d
Re: [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
On 8/30/23 14:49, Danilo Krummrich wrote: Hi Thomas, thanks for having a look! On Wed, Aug 30, 2023 at 09:27:45AM +0200, Thomas Hellström (Intel) wrote: Hi, Danilo. Some quick comments since I'm doing some Xe work in this area. Will probably get back with more. On 8/20/23 23:53, Danilo Krummrich wrote: So far the DRM GPUVA manager offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVA manager represent a basic GPU-VM implementation. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convinience functions for common patterns. Rather than being designed as a "framework", the target is to make all features appear as a collection of optional helper functions, such that drivers are free to make use of the DRM GPUVA managers basic functionality and opt-in for other features without setting any feature flags, just by making use of the corresponding functions. 
Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuva_mgr.c | 688 +++- include/drm/drm_gem.h | 48 ++- include/drm/drm_gpuva_mgr.h | 302 +- 3 files changed, 1010 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c index f86bfad74ff8..69872b205961 100644 --- a/drivers/gpu/drm/drm_gpuva_mgr.c +++ b/drivers/gpu/drm/drm_gpuva_mgr.c @@ -655,6 +655,7 @@ drm_gpuva_range_valid(struct drm_gpuva_manager *mgr, /** * drm_gpuva_manager_init() - initialize a _gpuva_manager * @mgr: pointer to the _gpuva_manager to initialize + * @drm: the drivers _device * @name: the name of the GPU VA space * @start_offset: the start offset of the GPU VA space * @range: the size of the GPU VA space @@ -669,6 +670,7 @@ drm_gpuva_range_valid(struct drm_gpuva_manager *mgr, */ void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, + struct drm_device *drm, const char *name, u64 start_offset, u64 range, u64 reserve_offset, u64 reserve_range, @@ -677,6 +679,11 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, mgr->rb.tree = RB_ROOT_CACHED; INIT_LIST_HEAD(>rb.list); + mt_init(>mt_ext); + + INIT_LIST_HEAD(>evict.list); + spin_lock_init(>evict.lock); + drm_gpuva_check_overflow(start_offset, range); mgr->mm_start = start_offset; mgr->mm_range = range; @@ -694,6 +701,9 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, reserve_range))) __drm_gpuva_insert(mgr, >kernel_alloc_node); } + + drm_gem_private_object_init(drm, >d_obj, 0); + mgr->resv = mgr->d_obj.resv; } EXPORT_SYMBOL_GPL(drm_gpuva_manager_init); @@ -713,10 +723,575 @@ drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr) __drm_gpuva_remove(>kernel_alloc_node); WARN(!RB_EMPTY_ROOT(>rb.tree.rb_root), -"GPUVA tree is not empty, potentially leaking memory."); +"GPUVA tree is not empty, potentially leaking memory.\n"); + + mtree_destroy(>mt_ext); + WARN(!list_empty(>evict.list), "Evict list should be empty.\n"); + + drm_gem_private_object_fini(>d_obj); } 
EXPORT_SYMBOL_GPL(drm_gpuva_manager_destroy); +/** + * drm_gpuva_manager_prepare_objects() - prepare all assoiciated BOs + * @mgr: the _gpuva_manager + * @num_fences: the amount of _fences to reserve + * + * Calls drm_exec_prepare_obj() for all _gem_objects the given + * _gpuva_manager contains mappings of. + * + * Drivers can obtain the corresponding _exec instance through + * DRM_GPUVA_EXEC(). It is the drivers responsibility to call drm_exec_init() + * and drm_exec_fini() accordingly. + * + * Returns: 0 on success, negative error code on failure. + */ +int +drm_gpuva_manager_prepare_objects(struct drm_gpuva_manager *mgr, + unsigned int num_fences) +{ + struct drm_exec *exec = DRM_GPUVA_EXEC(mgr); + MA_STATE(mas, >mt_ext, 0, 0); + union { + void *ptr; + uintptr_t cnt; +
Re: [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
Hi, Danilo. Some quick comments since I'm doing some Xe work in this area. Will probably get back with more. On 8/20/23 23:53, Danilo Krummrich wrote: So far the DRM GPUVA manager offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVA manager represent a basic GPU-VM implementation. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convinience functions for common patterns. Rather than being designed as a "framework", the target is to make all features appear as a collection of optional helper functions, such that drivers are free to make use of the DRM GPUVA managers basic functionality and opt-in for other features without setting any feature flags, just by making use of the corresponding functions. 
Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuva_mgr.c | 688 +++- include/drm/drm_gem.h | 48 ++- include/drm/drm_gpuva_mgr.h | 302 +- 3 files changed, 1010 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c index f86bfad74ff8..69872b205961 100644 --- a/drivers/gpu/drm/drm_gpuva_mgr.c +++ b/drivers/gpu/drm/drm_gpuva_mgr.c @@ -655,6 +655,7 @@ drm_gpuva_range_valid(struct drm_gpuva_manager *mgr, /** * drm_gpuva_manager_init() - initialize a _gpuva_manager * @mgr: pointer to the _gpuva_manager to initialize + * @drm: the drivers _device * @name: the name of the GPU VA space * @start_offset: the start offset of the GPU VA space * @range: the size of the GPU VA space @@ -669,6 +670,7 @@ drm_gpuva_range_valid(struct drm_gpuva_manager *mgr, */ void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, + struct drm_device *drm, const char *name, u64 start_offset, u64 range, u64 reserve_offset, u64 reserve_range, @@ -677,6 +679,11 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, mgr->rb.tree = RB_ROOT_CACHED; INIT_LIST_HEAD(>rb.list); + mt_init(>mt_ext); + + INIT_LIST_HEAD(>evict.list); + spin_lock_init(>evict.lock); + drm_gpuva_check_overflow(start_offset, range); mgr->mm_start = start_offset; mgr->mm_range = range; @@ -694,6 +701,9 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, reserve_range))) __drm_gpuva_insert(mgr, >kernel_alloc_node); } + + drm_gem_private_object_init(drm, >d_obj, 0); + mgr->resv = mgr->d_obj.resv; } EXPORT_SYMBOL_GPL(drm_gpuva_manager_init); @@ -713,10 +723,575 @@ drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr) __drm_gpuva_remove(>kernel_alloc_node); WARN(!RB_EMPTY_ROOT(>rb.tree.rb_root), -"GPUVA tree is not empty, potentially leaking memory."); +"GPUVA tree is not empty, potentially leaking memory.\n"); + + mtree_destroy(>mt_ext); + WARN(!list_empty(>evict.list), "Evict list should be empty.\n"); + + drm_gem_private_object_fini(>d_obj); } 
EXPORT_SYMBOL_GPL(drm_gpuva_manager_destroy); +/** + * drm_gpuva_manager_prepare_objects() - prepare all assoiciated BOs + * @mgr: the _gpuva_manager + * @num_fences: the amount of _fences to reserve + * + * Calls drm_exec_prepare_obj() for all _gem_objects the given + * _gpuva_manager contains mappings of. + * + * Drivers can obtain the corresponding _exec instance through + * DRM_GPUVA_EXEC(). It is the drivers responsibility to call drm_exec_init() + * and drm_exec_fini() accordingly. + * + * Returns: 0 on success, negative error code on failure. + */ +int +drm_gpuva_manager_prepare_objects(struct drm_gpuva_manager *mgr, + unsigned int num_fences) +{ + struct drm_exec *exec = DRM_GPUVA_EXEC(mgr); + MA_STATE(mas, >mt_ext, 0, 0); + union { + void *ptr; + uintptr_t cnt; + } ref; + int ret; + + ret = drm_exec_prepare_obj(exec, >d_obj, num_fences); + if (ret) + goto out; + + rcu_read_lock(); In xe we're protecting the external object list with an outer lock, (same as protecting the mgr itself).
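The `union { void *ptr; uintptr_t cnt; } ref;` in the quoted drm_gpuva_manager_prepare_objects() hints at how the maple tree is used: the value slot keyed by the GEM object holds not a pointer but a reference count, type-punned through uintptr_t, which avoids a separate allocation per external object just to hold a counter. A standalone sketch of that trick (illustrative only; the real code also has to respect entry encodings the tree treats specially, which this ignores):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Store a small counter in a slot typed as void *, as the quoted code
 * does with maple-tree entries: the count travels through a union of
 * void * and uintptr_t.
 */
union slot_ref {
	void *ptr;      /* what the tree API stores and hands back */
	uintptr_t cnt;  /* what the slot actually encodes */
};

/* Take a reference: bump the count hidden in the slot value. */
static void *slot_get(void *slot)
{
	union slot_ref ref = { .ptr = slot };

	ref.cnt++;
	return ref.ptr;
}

/* Drop a reference; the caller would remove the entry once it hits 0. */
static void *slot_put(void *slot)
{
	union slot_ref ref = { .ptr = slot };

	assert(ref.cnt > 0);
	ref.cnt--;
	return ref.ptr;
}

/* Read the count back out of the slot value. */
static uintptr_t slot_count(void *slot)
{
	union slot_ref ref = { .ptr = slot };

	return ref.cnt;
}
```

A NULL slot naturally encodes "count 0 / not tracked", so a lookup miss and a zero count coincide.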
Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings
On 7/6/23 17:48, Danilo Krummrich wrote: Hi Thomas, On 7/6/23 10:49, Thomas Hellström (Intel) wrote: Hi, Danilo Some review comments below: On 6/30/23 00:25, Danilo Krummrich wrote: Add infrastructure to keep track of GPU virtual address (VA) mappings with a decicated VA space manager implementation. New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers start implementing, allow userspace applications to request multiple and arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is intended to serve the following purposes in this context. 1) Provide infrastructure to track GPU VA allocations and mappings, making use of the maple_tree. It looks like we're not using the maple tree anymore, but rather an instantiation of an interval tree. (Perhaps as a follow-up it makes sense to provide a pre-instantiated common u64 version of the interval tree in addition to the unsigned long one since it appears to be used in multiple places in graphics drivers). 2) Generically connect GPU VA mappings to their backing buffers, in particular DRM GEM objects. 3) Provide a common implementation to perform more complex mapping operations on the GPU VA space. In particular splitting and merging of GPU VA mappings, e.g. for intersecting mapping requests or partial unmap requests. 
Tested-by: Donald Robson Reviewed-by: Boris Brezillon Suggested-by: Dave Airlie Signed-off-by: Danilo Krummrich --- Documentation/gpu/drm-mm.rst | 36 + drivers/gpu/drm/Makefile | 1 + drivers/gpu/drm/drm_gem.c | 3 + drivers/gpu/drm/drm_gpuva_mgr.c | 1743 +++ include/drm/drm_drv.h | 6 + include/drm/drm_gem.h | 52 + include/drm/drm_gpuva_mgr.h | 756 ++ 7 files changed, 2597 insertions(+) create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c create mode 100644 include/drm/drm_gpuva_mgr.h diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index a52e6f4117d6..3d5dc9dc1bfe 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -466,6 +466,42 @@ DRM MM Range Allocator Function References .. kernel-doc:: drivers/gpu/drm/drm_mm.c :export: +DRM GPU VA Manager +== + +Overview + + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Overview + +Split and Merge +--- + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Split and Merge + +Locking +--- + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Locking + +Examples + + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Examples + +DRM GPU VA Manager Function References +-- + +.. kernel-doc:: include/drm/drm_gpuva_mgr.h + :internal: + +.. 
kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :export: + DRM Buddy Allocator === diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index 414855e2a463..6d6c9dec66e8 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -45,6 +45,7 @@ drm-y := \ drm_vblank.o \ drm_vblank_work.o \ drm_vma_manager.o \ + drm_gpuva_mgr.o \ drm_writeback.o drm-$(CONFIG_DRM_LEGACY) += \ drm_agpsupport.o \ diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index 1a5a2cd0d4ec..cd878ebddbd0 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev, if (!obj->resv) obj->resv = >_resv; + if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA)) + drm_gem_gpuva_init(obj); + drm_vma_node_reset(>vma_node); INIT_LIST_HEAD(>lru_node); } diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c new file mode 100644 index ..4414990c05cc --- /dev/null +++ b/drivers/gpu/drm/drm_gpuva_mgr.c @@ -0,0 +1,1743 @@ +// SPDX-License-Identifier: GPL-2.0 +/* SPDX-License is GPL-2.0 but below looks like MIT. Can we use "GPL-2.0 OR MIT" or does something restrict it to GPL-only? + * Copyright (c) 2022 Red Hat. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANT
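The "Split and Merge" machinery the documentation sections above refer to is, at its core, interval arithmetic: a request that intersects an existing mapping leaves at most a head remainder and a tail remainder of the old mapping. A minimal sketch of just the split step over u64 ranges (this is not the drm_gpuva_sm_map() API, only the arithmetic behind it):

```c
#include <assert.h>
#include <stdint.h>

struct va_range { uint64_t addr, range; };

/*
 * Split an existing mapping against an incoming request that overlaps
 * it. At most a head piece (before the request) and a tail piece
 * (after it) survive; returns how many pieces were produced.
 */
static int va_split(const struct va_range *old, const struct va_range *req,
		    struct va_range out[2])
{
	uint64_t old_end = old->addr + old->range;
	uint64_t req_end = req->addr + req->range;
	int n = 0;

	if (req->addr > old->addr) {	/* head remainder */
		out[n].addr = old->addr;
		out[n].range = req->addr - old->addr;
		n++;
	}
	if (req_end < old_end) {	/* tail remainder */
		out[n].addr = req_end;
		out[n].range = old_end - req_end;
		n++;
	}
	return n;
}
```

A request punching a hole in the middle yields two pieces (the "split" case), a request covering the whole mapping yields none (plain unmap/replace); the manager's job is then to turn these pieces into remap operations it hands to the driver.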
Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings
Hi, Danilo Some review comments below: On 6/30/23 00:25, Danilo Krummrich wrote: Add infrastructure to keep track of GPU virtual address (VA) mappings with a decicated VA space manager implementation. New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers start implementing, allow userspace applications to request multiple and arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is intended to serve the following purposes in this context. 1) Provide infrastructure to track GPU VA allocations and mappings, making use of the maple_tree. It looks like we're not using the maple tree anymore, but rather an instantiation of an interval tree. (Perhaps as a follow-up it makes sense to provide a pre-instantiated common u64 version of the interval tree in addition to the unsigned long one since it appears to be used in multiple places in graphics drivers). 2) Generically connect GPU VA mappings to their backing buffers, in particular DRM GEM objects. 3) Provide a common implementation to perform more complex mapping operations on the GPU VA space. In particular splitting and merging of GPU VA mappings, e.g. for intersecting mapping requests or partial unmap requests. Tested-by: Donald Robson Reviewed-by: Boris Brezillon Suggested-by: Dave Airlie Signed-off-by: Danilo Krummrich --- Documentation/gpu/drm-mm.rst| 36 + drivers/gpu/drm/Makefile|1 + drivers/gpu/drm/drm_gem.c |3 + drivers/gpu/drm/drm_gpuva_mgr.c | 1743 +++ include/drm/drm_drv.h |6 + include/drm/drm_gem.h | 52 + include/drm/drm_gpuva_mgr.h | 756 ++ 7 files changed, 2597 insertions(+) create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c create mode 100644 include/drm/drm_gpuva_mgr.h diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index a52e6f4117d6..3d5dc9dc1bfe 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -466,6 +466,42 @@ DRM MM Range Allocator Function References .. 
kernel-doc:: drivers/gpu/drm/drm_mm.c :export: +DRM GPU VA Manager +== + +Overview + + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Overview + +Split and Merge +--- + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Split and Merge + +Locking +--- + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Locking + +Examples + + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Examples + +DRM GPU VA Manager Function References +-- + +.. kernel-doc:: include/drm/drm_gpuva_mgr.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :export: + DRM Buddy Allocator === diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index 414855e2a463..6d6c9dec66e8 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -45,6 +45,7 @@ drm-y := \ drm_vblank.o \ drm_vblank_work.o \ drm_vma_manager.o \ + drm_gpuva_mgr.o \ drm_writeback.o drm-$(CONFIG_DRM_LEGACY) += \ drm_agpsupport.o \ diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index 1a5a2cd0d4ec..cd878ebddbd0 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev, if (!obj->resv) obj->resv = &obj->_resv; + if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA)) + drm_gem_gpuva_init(obj); + drm_vma_node_reset(&obj->vma_node); INIT_LIST_HEAD(&obj->lru_node); } diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c new file mode 100644 index ..4414990c05cc --- /dev/null +++ b/drivers/gpu/drm/drm_gpuva_mgr.c @@ -0,0 +1,1743 @@ +// SPDX-License-Identifier: GPL-2.0 +/* SPDX-License is GPL-2.0 but below looks like MIT. Can we use "GPL-2.0 OR MIT" or does something restrict it to GPL-only? + * Copyright (c) 2022 Red Hat.
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE
Re: [PATCH v2] drm/i915: Replace kmap() with kmap_local_page()
On 6/21/23 18:35, Ira Weiny wrote: Thomas Hellström (Intel) wrote: I think one thing worth mentioning in the context of this patch is that IIRC kmap_local_page() will block offlining of the mapping CPU until kunmap_local(), so while I haven't seen any guidelines around the usage of this api for long-held mappings, I figure it's wise to keep the mapping duration short, or at least avoid sleeping with a kmap_local_page() map active. I figured, while page compression is probably to be considered "slow" it's probably not slow enough to motivate kmap() instead of kmap_local_page(), but if anyone feels differently, perhaps it should be considered. What you say is all true. But remember the mappings are only actually created on a HIGHMEM system. HIGHMEM systems are increasingly rare. Also they must suffer such performance issues because there is just no other way around supporting them. Also Sumitra, and our kmap conversion project in general, is focusing on not using kmap* if at all possible. Thus the reason V1 tried to use page_address(). Could we guarantee the i915 driver is excluded from all HIGHMEM systems? The i915 maintainers might want to chime in here, but I would say no, we can't, although we don't care much about optimizing for them. Same for the new xe driver. Thanks, /Thomas With that said, my Reviewed-by: still stands. Thanks! Ira
Re: [PATCH 1/2] drm: execution context for GEM buffers v5
On 6/21/23 15:36, Christian König wrote: This adds the infrastructure for an execution context for GEM buffers which is similar to the existing TTMs execbuf util and intended to replace it in the long term. The basic functionality is that we abstract the necessary loop to lock many different GEM buffers with automated deadlock and duplicate handling. v2: drop xarray and use dynamic resized array instead, the locking overhead is unnecessary and measurable. v3: drop duplicate tracking, radeon is really the only one needing that. v4: fixes issues pointed out by Danilo, some typos in comments and a helper for lock arrays of GEM objects. v5: some suggestions by Boris Brezillon, especially just use one retry macro, drop loop in prepare_array, use flags instead of bool Signed-off-by: Christian König --- Documentation/gpu/drm-mm.rst | 12 ++ drivers/gpu/drm/Kconfig | 6 + drivers/gpu/drm/Makefile | 2 + drivers/gpu/drm/drm_exec.c | 330 +++ include/drm/drm_exec.h | 120 + 5 files changed, 470 insertions(+) create mode 100644 drivers/gpu/drm/drm_exec.c create mode 100644 include/drm/drm_exec.h diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index a79fd3549ff8..a52e6f4117d6 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -493,6 +493,18 @@ DRM Sync Objects .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c :export: +DRM Execution context += + +.. kernel-doc:: drivers/gpu/drm/drm_exec.c + :doc: Overview + +.. kernel-doc:: include/drm/drm_exec.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/drm_exec.c + :export: + GPU Scheduler = diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index afb3b2f5f425..c2f3d234c89e 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -194,6 +194,12 @@ config DRM_TTM GPU memory types. Will be enabled automatically if a device driver uses it.
+config DRM_EXEC + tristate + depends on DRM + help + Execution context for command submissions + config DRM_BUDDY tristate depends on DRM diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index 7a09a89b493b..414855e2a463 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -78,6 +78,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o # # Memory-management helpers # +# +obj-$(CONFIG_DRM_EXEC) += drm_exec.o obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c new file mode 100644 index ..285bf80740b5 --- /dev/null +++ b/drivers/gpu/drm/drm_exec.c @@ -0,0 +1,330 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ + +#include <drm/drm_exec.h> +#include <drm/drm_gem.h> +#include <linux/dma-resv.h> + +/** + * DOC: Overview + * + * This component mainly abstracts the retry loop necessary for locking + * multiple GEM objects while preparing hardware operations (e.g. command + * submissions, page table updates etc..). + * + * If a contention is detected while locking a GEM object the cleanup procedure + * unlocks all previously locked GEM objects and locks the contended one first + * before locking any further objects. + * + * After an object is locked fence slots can optionally be reserved on the + * dma_resv object inside the GEM object. + * + * A typical usage pattern should look like this:: + * + * struct drm_gem_object *obj; + * struct drm_exec exec; + * unsigned long index; + * int ret; + * + * drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT); + * drm_exec_until_all_locked(&exec) { + * ret = drm_exec_prepare_obj(&exec, boA, 1); + * drm_exec_retry_on_contention(&exec); + * if (ret) + * goto error; + * + * ret = drm_exec_prepare_obj(&exec, boB, 1); + * drm_exec_retry_on_contention(&exec); + * if (ret) + * goto error; + * } + * + * drm_exec_for_each_locked_object(&exec, index, obj) { + * dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ); + * ... + * } + * drm_exec_fini(&exec); + * + * See struct drm_exec for more details.
+ */ + +/* Dummy value used to initially enter the retry loop */ +#define DRM_EXEC_DUMMY (void*)~0 + +/* Unlock all objects and drop references */ +static void drm_exec_unlock_all(struct drm_exec *exec) +{ + struct drm_gem_object *obj; + unsigned long index; + + drm_exec_for_each_locked_object(exec, index, obj) { + dma_resv_unlock(obj->resv); + drm_gem_object_put(obj); + } + + drm_gem_object_put(exec->prelocked); + exec->prelocked = NULL; +} + +/** + * drm_exec_init - initialize a drm_exec object + * @exec: the drm_exec object to initialize + * @flags: controls locking behavior, see DRM_EXEC_* defines + * + * Initialize the object and
Re: [PATCH v2] drm/i915: Replace kmap() with kmap_local_page()
On 6/20/23 20:07, Sumitra Sharma wrote: On Tue, Jun 20, 2023 at 06:23:38AM -0700, Ira Weiny wrote: Sumitra Sharma wrote: On Sun, Jun 18, 2023 at 11:11:08AM -0700, Ira Weiny wrote: Sumitra Sharma wrote: kmap() has been deprecated in favor of the kmap_local_page() due to high cost, restricted mapping space, the overhead of a global lock for synchronization, and making the process sleep in the absence of free slots. kmap_local_page() is faster than kmap() and offers thread-local and CPU-local mappings, take pagefaults in a local kmap region and preserves preemption by saving the mappings of outgoing tasks and restoring those of the incoming one during a context switch. The mapping is kept thread local in the function “i915_vma_coredump_create” in i915_gpu_error.c Therefore, replace kmap() with kmap_local_page(). Suggested-by: Ira Weiny NIT: No need for the line break between Suggested-by and your signed off line. Hi Ira, What does NIT stand for? Shorthand for 'nitpicking'. "giving too much attention to details that are not important, especially as a way of criticizing: " - https://dictionary.cambridge.org/dictionary/english/nitpicking Via email this is a way for authors of an email to indicate something is technically wrong but while nicely acknowledging that it is not very significant and could be seen as overly critical. For this particular comment I'm showing something to pay attention to next time but that was not a big deal this time around. Hi Ira, Thanks for your explanation on NIT. Thank you. I will take care of the line breaks. Signed-off-by: Sumitra Sharma --- Changes in v2: - Replace kmap() with kmap_local_page(). Generally it is customary to attribute a change like this to those who suggested it in a V1 review. For example: - Tvrtko/Thomas: Use kmap_local_page() instead of page_address() Also I don't see Thomas on the new email list. Since he took the time to review V1 he might want to check this version out. I've added him to the 'To:' list.
Also a link to V1 is nice. B4 formats it like this: - Link to v1: https://lore.kernel.org/all/20230614123556.ga381...@sumitra.com/ All that said the code looks good to me. So with the above changes. Reviewed-by: Ira Weiny I have noted down the points mentioned above. Thank you again. I am not supposed to create another version of this patch for adding the above mentions, as you and Thomas both gave this patch a reviewed-by tag. Right? Based on this response[*] from Tvrtko I think this version can move through without a v3. Okay! Thanks & regards Sumitra I think one thing worth mentioning in the context of this patch is that IIRC kmap_local_page() will block offlining of the mapping CPU until kunmap_local(), so while I haven't seen any guidelines around the usage of this api for long-held mappings, I figure it's wise to keep the mapping duration short, or at least avoid sleeping with a kmap_local_page() map active. I figured, while page compression is probably to be considered "slow" it's probably not slow enough to motivate kmap() instead of kmap_local_page(), but if anyone feels differently, perhaps it should be considered. With that said, my Reviewed-by: still stands. /Thomas Thanks! Ira [*] https://lore.kernel.org/all/bcb0a1d2-cd4d-a56f-1ee6-7ccfdd2f7...@linux.intel.com/ Thanks all! I'll just re-send the patch for our CI, since it didn't get picked up automatically (stuck in moderation perhaps), with all r-b tags added and extra line space removed and merge it if results will be green. Regards, Tvrtko Thanks & regards Sumitra PS: I am new to the open source vocabulary terms. - Change commit subject and message. 
drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index f020c0086fbc..bc41500eedf5 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1164,9 +1164,9 @@ i915_vma_coredump_create(const struct intel_gt *gt, drm_clflush_pages(&page, 1); - s = kmap(page); + s = kmap_local_page(page); ret = compress_page(compress, s, dst, false); - kunmap(page); + kunmap_local(s); drm_clflush_pages(&page, 1); -- 2.25.1
Re: [PATCH 01/13] drm: execution context for GEM buffers v4
On 6/19/23 11:48, Christian König wrote: Hi, Am 19.06.23 um 11:33 schrieb Thomas Hellström (Intel): [SNIP] Sometimes you want to just drop the contended lock after the above relaxation. (Eviction would be one example), and not add as prelocked, if the contended object goes out of scope. Eviction would in some situations be one such example, -EDEADLOCK leading to an error path where the object should otherwise be freed is another. Perhaps we could add an argument to prepare_obj() as to whether the object should be immediately put after relaxation. I was considering a try_prepare version as well, that should cover this use case. That sounds a bit different from this use-case. The use-case above would, on -EDEADLOCK actually unlock everything, then lock-slow the contending lock and then immediately unlock it and drop. Hui? What would that be good for? It's for the case where you have nested locking, the contended lock has gone out-of-scope and you're probably not going to need it on the next attempt. I think the refcounted "prelocked" object that is lacking in the i915 variant will resolve all correctness / uaf issues, but still the object might be needlessly carried around for yet another locking round. It sounds like try_prepare would just skip locking and continue with everything locked so far still locked? Correct. + ret = drm_exec_obj_locked(exec, obj); + if (unlikely(ret)) { + dma_resv_unlock(obj->resv); + goto error_dropref; + } + + swap(exec->prelocked, obj); + +error_dropref: + /* Always cleanup the contention so that error handling can kick in */ + drm_gem_object_put(obj); + exec->contended = NULL; + return ret; +} + +/** + * drm_exec_prepare_obj - prepare a GEM object for use + * @exec: the drm_exec object with the state + * @obj: the GEM object to prepare + * @num_fences: how many fences to reserve + * + * Prepare a GEM object for use by locking it and reserving fence slots. All + * successfully locked objects are put into the locked container. 
+ * + * Returns: -EDEADLK if a contention is detected, -EALREADY when object is + * already locked, -ENOMEM when memory allocation failed and zero for success. + */ +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj, + unsigned int num_fences) +{ + int ret; + + ret = drm_exec_lock_contended(exec); + if (unlikely(ret)) + return ret; + + if (exec->prelocked == obj) { + drm_gem_object_put(exec->prelocked); + exec->prelocked = NULL; + + return dma_resv_reserve_fences(obj->resv, num_fences); + } + + if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE) + ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket); + else + ret = dma_resv_lock(obj->resv, &exec->ticket); + + if (unlikely(ret == -EDEADLK)) { + drm_gem_object_get(obj); + exec->contended = obj; + return -EDEADLK; + } + + if (unlikely(ret == -EALREADY && + (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES))) + goto reserve_fences; + + if (unlikely(ret)) + return ret; + + ret = drm_exec_obj_locked(exec, obj); + if (ret) + goto error_unlock; + +reserve_fences: + /* Keep locked when reserving fences fails */ + return dma_resv_reserve_fences(obj->resv, num_fences); Ugh, what is the use-case for keeping things locked here? How would a caller tell the difference between an error where everything is locked and nothing is locked? IMO, we should unlock on error here. If there indeed is a use-case we should add a separate function for reserving fences for all locked objects, rather than returning sometimes locked on error sometimes not. We return the object locked here because it was too much churn to remove it again from the array and we are getting fully cleaned up at the end anyway. OK, so if we add an unlock functionality, we could just have a consistent locking state on error return? Yeah, that should work. Going to work on this. Great. Thanks, Thomas Regards, Christian. Thanks, Thomas Regards, Christian.
Thanks, Thomas + +error_unlock: + dma_resv_unlock(obj->resv); + return ret; +} +EXPORT_SYMBOL(drm_exec_prepare_obj); + +/** + * drm_exec_prepare_array - helper to prepare an array of objects + * @exec: the drm_exec object with the state + * @objects: array of GEM objects to prepare + * @num_objects: number of GEM objects in the array + * @num_fences: number of fences to reserve on each GEM object + * + * Prepares all GEM objects in an array, handles contention but aborts on first + * error otherwise. Reserves @num_fences on each GEM object after locking it. + * + * Returns: -EALREADY when object is already locked, -ENOMEM when memory + * allocation failed and zero for success. + */ +int drm_exec_prepare_array(struct drm_exec *exec, +
Re: [PATCH 01/13] drm: execution context for GEM buffers v4
Hi! On 6/19/23 11:20, Christian König wrote: Hi guys, Am 19.06.23 um 10:59 schrieb Thomas Hellström (Intel): [SNIP] I really need to find some time to work on that anyway. I've been playing with drm_exec for a couple weeks now, and I wanted to share something I hacked to try and make the API simpler and more robust against misuse (see the below diff, which is a slightly adjusted version of your work). It would be good if we could have someone taking charge of this series and address all review comments, I see some of my comments getting lost, we have multiple submitters and I can't find a dri-devel patchwork entry for this. Anyway some comments below. I can try to find some time for the series this week (As long as nobody comes along and has any burning roof). In this version, the user is no longer in control of the retry loop. Instead, it provides an expression (a call to a sub-function) to be re-evaluated each time a contention is detected. IMHO, this makes the 'prepare-objs' functions easier to apprehend, and avoids any mistake like calling drm_exec_continue_on_contention() in an inner loop, or breaking out of the drm_exec_while_all_locked() loop unintentionally. In i915 we've had a very similar helper to this, and while I agree this newer version would probably help make code cleaner, OTOH there also are some places where the short drm_exec_while_all_locked()-like blocks don't really motivate a separate function. Porting i915 to the current version will take some work. For the xe driver both versions would work fine. Yeah, this is actually what my first version of this looked like. But I abandoned that approach because we have a lot of cases where we just quickly want to lock a few GEM objects and don't want the extra overhead of putting all the state into some bag to forward it to a function.
Some additional review comments not related to the interface change below: It also makes the internal management a bit simpler, since we no longer call drm_exec_cleanup() on the first attempt, and can thus get rid of the DRM_EXEC_DUMMY trick. In the below diff, I also re-introduced native support for duplicates as an opt-in, so we don't have to do things like: ret = drm_exec_prepare_obj(exec, obj, num_fences); if (ret == -EALREADY) ret = dma_resv_reserve_fences(obj->resv, num_fences); if (ret) return ret; and can just do: ret = drm_exec_prepare_obj(exec, obj, num_fences); if (ret) return; Of course drivers can open-code a wrapper doing the same thing, but given at least pvr and radeon need this, it'd be nice if the core could support it natively. That's mostly it. Just wanted to share what I had in case you're interested. If not, that's fine too. Regards, Boris --- Documentation/gpu/drm-mm.rst | 12 ++ drivers/gpu/drm/Kconfig | 6 + drivers/gpu/drm/Makefile | 2 + drivers/gpu/drm/drm_exec.c | 274 +++ include/drm/drm_exec.h | 130 + 5 files changed, 424 insertions(+) create mode 100644 drivers/gpu/drm/drm_exec.c create mode 100644 include/drm/drm_exec.h diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index fe40ee686f6e..c9f120cfe730 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -524,6 +524,18 @@ DRM Sync Objects .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c :export: +DRM Execution context += + +.. kernel-doc:: drivers/gpu/drm/drm_exec.c + :doc: Overview + +.. kernel-doc:: include/drm/drm_exec.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/drm_exec.c + :export: + GPU Scheduler = diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 76991720637c..01a38fcdb1c4 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -194,6 +194,12 @@ config DRM_TTM GPU memory types. Will be enabled automatically if a device driver uses it. 
+config DRM_EXEC + tristate + depends on DRM + help + Execution context for command submissions + config DRM_BUDDY tristate depends on DRM diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index 1873f64db171..18a02eaf2d49 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -79,6 +79,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o # # Memory-management helpers # +# +obj-$(CONFIG_DRM_EXEC) += drm_exec.o obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c new file mode 100644 index ..e0ad1a3e1610 --- /dev/null +++ b/drivers/gpu/drm/drm_exec.c @@ -0,0 +1,274 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ + +#include <drm/drm_exec.h> +#include <drm/drm_gem.h> +#include <linux/dma-resv.h> + +/** + * DOC: Overview + * + * This component mainly abstracts the retry loop necessary for locking + * multiple GEM objects while prepar
Re: [PATCH 01/13] drm: execution context for GEM buffers v4
On 6/17/23 13:54, Boris Brezillon wrote: +Matthew who's been using drm_exec in Xe if I'm correct. Hello Christian, On Wed, 14 Jun 2023 15:02:52 +0200 Boris Brezillon wrote: On Wed, 14 Jun 2023 14:30:53 +0200 Christian König wrote: Am 14.06.23 um 14:23 schrieb Boris Brezillon: Hi Christian, On Thu, 4 May 2023 13:51:47 +0200 "Christian König" wrote: This adds the infrastructure for an execution context for GEM buffers which is similar to the existing TTMs execbuf util and intended to replace it in the long term. The basic functionality is that we abstract the necessary loop to lock many different GEM buffers with automated deadlock and duplicate handling. As many other drivers do already, we are considering using drm_exec() for our resv locking in the PowerVR driver, so we might have more questions/comments in the coming days/weeks, but I already have a couple right now (see below). v3: drop duplicate tracking, radeon is really the only one needing that I think we'd actually be interested in duplicate tracking. Is there any way we can make it an optional feature through some extra helpers/flags? Doesn't have to be done in this patch series, I'm just wondering if this is something we can share as well. You can still capture the -EALREADY error and act appropriately in your driver. For radeon it just means ignoring the error code and going ahead, but that behavior doesn't seem to be desired in most cases. Initially I thought we need to separately track how many and how often BOs are duplicated, but there is simply no use for this. [...] +/** + * DOC: Overview + * + * This component mainly abstracts the retry loop necessary for locking + * multiple GEM objects while preparing hardware operations (e.g. command + * submissions, page table updates etc..). + * + * If a contention is detected while locking a GEM object the cleanup procedure + * unlocks all previously locked GEM objects and locks the contended one first + * before locking any further objects.
+ * + * After an object is locked fence slots can optionally be reserved on the + * dma_resv object inside the GEM object. + * + * A typical usage pattern should look like this:: + * + * struct drm_gem_object *obj; + * struct drm_exec exec; + * unsigned long index; + * int ret; + * + * drm_exec_init(&exec, true); + * drm_exec_while_not_all_locked(&exec) { + * ret = drm_exec_prepare_obj(&exec, boA, 1); + * drm_exec_continue_on_contention(&exec); + * if (ret) + * goto error; + * Have you considered defining a drm_exec_try_prepare_obj_or_retry() combining drm_exec_prepare_obj() and drm_exec_continue_on_contention()? #define drm_exec_try_prepare_obj_or_retry(exec, obj, num_fences) \ ({ \ int __ret = drm_exec_prepare_obj(exec, obj, num_fences); \ if (unlikely(drm_exec_is_contended(exec))) \ continue; \ __ret; \ }) This way the following pattern ret = drm_exec_prepare_obj(&exec, boA, 1); drm_exec_continue_on_contention(&exec); if (ret) goto error; can be turned into something more conventional: ret = drm_exec_try_prepare_obj_or_retry(&exec, boA, 1); if (ret) goto error; Yeah, I was considering that as well. But then abandoned it as too complicated. I really need to find some time to work on that anyway. I've been playing with drm_exec for a couple weeks now, and I wanted to share something I hacked to try and make the API simpler and more robust against misuse (see the below diff, which is a slightly adjusted version of your work). It would be good if we could have someone taking charge of this series and address all review comments, I see some of my comments getting lost, we have multiple submitters and I can't find a dri-devel patchwork entry for this. Anyway some comments below. In this version, the user is no longer in control of the retry loop. Instead, it provides an expression (a call to a sub-function) to be re-evaluated each time a contention is detected.
IMHO, this makes the 'prepare-objs' functions easier to apprehend, and avoids any mistake like calling drm_exec_continue_on_contention() in an inner loop, or breaking out of the drm_exec_while_all_locked() loop unintentionally. In i915 we've had a very similar helper to this, and while I agree this newer version would probably help make code cleaner, OTOH there also are some places where the short drm_exec_while_all_locked()-like blocks don't really motivate a separate function. Porting i915 to the current version will take some work. For the xe driver both versions would work fine. Some additional review comments not related to the interface change below: It also makes the internal management a bit simpler, since we no longer call drm_exec_cleanup() on the first
Re: [PATCH v2] drm/i915: Replace kmap() with kmap_local_page()
On 6/18/23 20:11, Ira Weiny wrote: Sumitra Sharma wrote: kmap() has been deprecated in favor of the kmap_local_page() due to high cost, restricted mapping space, the overhead of a global lock for synchronization, and making the process sleep in the absence of free slots. kmap_local_page() is faster than kmap() and offers thread-local and CPU-local mappings, take pagefaults in a local kmap region and preserves preemption by saving the mappings of outgoing tasks and restoring those of the incoming one during a context switch. The mapping is kept thread local in the function “i915_vma_coredump_create” in i915_gpu_error.c Therefore, replace kmap() with kmap_local_page(). Suggested-by: Ira Weiny NIT: No need for the line break between Suggested-by and your signed off line. Signed-off-by: Sumitra Sharma --- Changes in v2: - Replace kmap() with kmap_local_page(). Generally it is customary to attribute a change like this to those who suggested it in a V1 review. For example: - Tvrtko/Thomas: Use kmap_local_page() instead of page_address() Also I don't see Thomas on the new email list. Since he took the time to review V1 he might want to check this version out. I've added him to the 'To:' list. Thanks. Also a link to V1 is nice. B4 formats it like this: - Link to v1: https://lore.kernel.org/all/20230614123556.ga381...@sumitra.com/ All that said the code looks good to me. So with the above changes. Reviewed-by: Ira Weiny LGTM. Reviewed-by: Thomas Hellström - Change commit subject and message. 
drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index f020c0086fbc..bc41500eedf5 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1164,9 +1164,9 @@ i915_vma_coredump_create(const struct intel_gt *gt, drm_clflush_pages(&page, 1); - s = kmap(page); + s = kmap_local_page(page); ret = compress_page(compress, s, dst, false); - kunmap(page); + kunmap_local(s); drm_clflush_pages(&page, 1); -- 2.25.1
Re: [Intel-gfx] [PATCH] drm/i915: Call page_address() on page acquired with GFP_KERNEL flag
On 6/14/23 15:22, Tvrtko Ursulin wrote: On 14/06/2023 13:35, Sumitra Sharma wrote: Pages allocated with GFP_KERNEL cannot come from Highmem. That is why there is no need to call kmap() on them. Are you sure it is GFP_KERNEL backed and not tmpfs? I am not sure myself so let me copy Matt and Thomas if they happen to know off hand. It looks to me these are shmem pages or TTM pages. Both could be highmem. So I think kmap is the correct choice here. /Thomas Regards, Tvrtko Therefore, don't call kmap() on the page coming from vma_res->bi.pages using for_each_sgt_page() in i915_vma_coredump_create(). Use a plain page_address() to get the kernel address instead. Signed-off-by: Sumitra Sharma --- drivers/gpu/drm/i915/i915_gpu_error.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index f020c0086fbc..6f51cb4fc55c 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1164,9 +1164,8 @@ i915_vma_coredump_create(const struct intel_gt *gt, drm_clflush_pages(&page, 1); - s = kmap(page); + s = page_address(page); ret = compress_page(compress, s, dst, false); - kunmap(page); drm_clflush_pages(&page, 1);
Re: [PATCH 01/13] drm: execution context for GEM buffers v4
On 5/4/23 13:51, Christian König wrote: This adds the infrastructure for an execution context for GEM buffers which is similar to the existing TTMs execbuf util and intended to replace it in the long term. The basic functionality is that we abstract the necessary loop to lock many different GEM buffers with automated deadlock and duplicate handling. v2: drop xarray and use dynamic resized array instead, the locking overhead is unnecessary and measurable. v3: drop duplicate tracking, radeon is really the only one needing that. v4: fixes issues pointed out by Danilo, some typos in comments and a helper for lock arrays of GEM objects. Signed-off-by: Christian König ... +/** + * struct drm_exec - Execution context + */ +struct drm_exec { + /** +* @interruptible: If locks should be taken interruptible +*/ + bool interruptible; + + /** +* @ticket: WW ticket used for acquiring locks +*/ + struct ww_acquire_ctx ticket; + + /** +* @num_objects: number of objects locked +*/ + unsigned int num_objects; + + /** +* @max_objects: maximum objects in array +*/ + unsigned int max_objects; + + /** +* @objects: array of the locked objects +*/ + struct drm_gem_object **objects; Hi, Christian. Did you consider using a list here with links embedded in gem objects, now that only locked objects are to be put on the list / array? That should work as only the process owning the lock may access the list link. Apart from getting rid of reallocing this is beneficial for the more general types of ww transactions that are used by i915 (and to a minor extent xe as well, I think). In those cases we would want to unlock a temporarily held object within the while_not_all_locked() loop and would then have to search the entire array for the correct pointer. Instead one could just remove it from the list of locked objects. Thanks, Thomas
Re: [PATCH v3 0/5] drm/i915: Allow user to set cache at BO creation
On 4/28/23 19:43, Yang, Fei wrote: >> On 4/28/23 17:19, Yang, Fei wrote: >>> On 4/28/23 07:47, fei.y...@intel.com wrote: From: Fei Yang The first three patches in this series are taken from https://patchwork.freedesktop.org/series/116868/ These patches are included here because the last patch has dependency on the pat_index refactor. This series is focusing on uAPI changes, 1. end support for set caching ioctl [PATCH 4/5] 2. add set_pat extension for gem_create [PATCH 5/5] v2: drop one patch that was merged separately 341ad0e8e254 drm/i915/mtl: Add PTE encode function v3: rebase on https://patchwork.freedesktop.org/series/117082/ >>> >>> Hi, Fei. >>> >>> Does this uAPI update also affect any discrete GPUs supported by i915, >> >> It does. >> >>> And in that case, does it allow setting non-snooping PAT indices on >>> those devices? >> >> It allows setting PAT indices specified in >> KMD does a sanity check so that it won't go over the max recommended >> by bspec. >> >>> If so, since the uAPI for discrete GPU devices doesn't allow incoherency >>> between GPU and CPU (apart from write-combining buffering), the correct >>> CPU caching mode matching the PAT index needs to be selected for the >>> buffer object in i915_ttm_select_tt_caching(). >> >> The patch doesn't affect the CPU caching mode setting logic though. >> And the caching settings for objects created by kernel should remain >> the same for both CPU and GPU, objects created by userspace are >> managed completely by userspace. >> >> One question though, what do you mean by non-snooping PAT indices? > > Yes, that was actually the bottom question: What do these PAT settings > allow you to do WRT the snooping on supported discrete devices (DG2) on > i915? > If they indeed don't allow disabling snooping, then that's not a problem. When dGPUs access SysMem, the PCIe default is for that access to snoop the host's caches. All of our current dGPUs do that -- independent of PAT setting.
> If they do, the ttm code there needs some modification. I'm not familiar with ttm, but if your concern is that a certain PAT index could disable snooping, that is not possible for current dGPUs. I think it is possible for Xe2/3 though, because there will be COH_MODE defined in the PAT registers going forward. OK. If that's the case, then it should be safe to disregard this concern. Thanks, Thomas -Fei
Re: [PATCH v3 0/5] drm/i915: Allow user to set cache at BO creation
On 4/28/23 17:19, Yang, Fei wrote: > On 4/28/23 07:47, fei.y...@intel.com wrote: >> From: Fei Yang >> >> The first three patches in this series are taken from >> https://patchwork.freedesktop.org/series/116868/ >> These patches are included here because the last patch >> has dependency on the pat_index refactor. >> >> This series is focusing on uAPI changes, >> 1. end support for set caching ioctl [PATCH 4/5] >> 2. add set_pat extension for gem_create [PATCH 5/5] >> >> v2: drop one patch that was merged separately >> 341ad0e8e254 drm/i915/mtl: Add PTE encode function >> v3: rebase on https://patchwork.freedesktop.org/series/117082/ > > Hi, Fei. > > Does this uAPI update also affect any discrete GPUs supported by i915, It does. > And in that case, does it allow setting non-snooping PAT indices on > those devices? It allows setting PAT indices specified in KMD does a sanity check so that it won't go over the max recommended by bspec. > If so, since the uAPI for discrete GPU devices doesn't allow incoherency > between GPU and CPU (apart from write-combining buffering), the correct > CPU caching mode matching the PAT index needs to be selected for the > buffer object in i915_ttm_select_tt_caching(). The patch doesn't affect the CPU caching mode setting logic though. And the caching settings for objects created by the kernel should remain the same for both CPU and GPU; objects created by userspace are managed completely by userspace. One question though, what do you mean by non-snooping PAT indices? The PAT index registers didn't really control coherency mode in the past; I believe MTL is the first one that has COH_MODE in the PAT registers. Aren't discrete GPUs snooping the CPU cache automatically? Yes, that was actually the bottom question: What do these PAT settings allow you to do WRT the snooping on supported discrete devices (DG2) on i915? If they indeed don't allow disabling snooping, then that's not a problem. If they do, the ttm code there needs some modification. 
Thanks, Thomas -Fei > Thanks, > Thomas > >> >> Fei Yang (5): >> drm/i915: preparation for using PAT index >> drm/i915: use pat_index instead of cache_level >> drm/i915: make sure correct pte encode is used >> drm/i915/mtl: end support for set caching ioctl >> drm/i915: Allow user to set cache at BO creation >> >> drivers/gpu/drm/i915/display/intel_dpt.c | 12 +-- >> drivers/gpu/drm/i915/gem/i915_gem_create.c | 36 + >> drivers/gpu/drm/i915/gem/i915_gem_domain.c | 46 ++- >> .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 10 ++- >> drivers/gpu/drm/i915/gem/i915_gem_mman.c | 3 +- >> drivers/gpu/drm/i915/gem/i915_gem_object.c | 67 +++- >> drivers/gpu/drm/i915/gem/i915_gem_object.h | 8 ++ >> .../gpu/drm/i915/gem/i915_gem_object_types.h | 26 +- >> drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 9 ++- >> drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 2 - >> drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 4 +- >> drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 16 ++-- >> .../gpu/drm/i915/gem/selftests/huge_pages.c | 2 +- >> .../drm/i915/gem/selftests/i915_gem_migrate.c | 2 +- >> .../drm/i915/gem/selftests/i915_gem_mman.c | 2 +- >> drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 10 ++- >> drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 73 + >> drivers/gpu/drm/i915/gt/gen8_ppgtt.h | 3 +- >> drivers/gpu/drm/i915/gt/intel_ggtt.c | 76 +- >> drivers/gpu/drm/i915/gt/intel_gtt.h | 20 +++-- >> drivers/gpu/drm/i915/gt/intel_migrate.c | 47 ++- >> drivers/gpu/drm/i915/gt/intel_migrate.h | 13 ++- >> drivers/gpu/drm/i915/gt/intel_ppgtt.c | 6 +- >> drivers/gpu/drm/i915/gt/selftest_migrate.c | 47 +-- >> drivers/gpu/drm/i915/gt/selftest_reset.c | 8 +- >> drivers/gpu/drm/i915/gt/selftest_timeline.c | 2 +- >> drivers/gpu/drm/i915/gt/selftest_tlb.c | 4 +- >> drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 10 ++- >> drivers/gpu/drm/i915/i915_debugfs.c | 55 ++--- >> drivers/gpu/drm/i915/i915_gem.c | 16 +++- >> drivers/gpu/drm/i915/i915_gpu_error.c | 8 +- >> drivers/gpu/drm/i915/i915_pci.c | 79 --- >> 
drivers/gpu/drm/i915/i915_vma.c | 16 ++-- >> drivers/gpu/drm/i915/i915_vma.h | 2 +- >> drivers/gpu/drm/i915/i915_vma_types.h | 2 - >> drivers/gpu/drm/i915/intel_device_info.h | 5 ++ >> drivers/gpu/drm/i915/selftests/i915_gem.c | 5 +- >> .../gpu/drm/i915/selftests/i915_gem_evict.c | 4 +- >> drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++-- >> .../drm/i915/selftests/intel_memory_region.c | 4 +- >> .../gpu/drm/i915/selftests/mock_gem_device.c | 9 +++ >> drivers/gpu/drm/i915/selftests/mock_gtt.c | 8 +- >> include/uapi/drm/i915_drm.h | 36 + >> tools/include/uapi/drm/i915_drm.h | 36
Re: [PATCH v3 0/5] drm/i915: Allow user to set cache at BO creation
On 4/28/23 07:47, fei.y...@intel.com wrote: From: Fei Yang The first three patches in this series are taken from https://patchwork.freedesktop.org/series/116868/ These patches are included here because the last patch has dependency on the pat_index refactor. This series is focusing on uAPI changes, 1. end support for set caching ioctl [PATCH 4/5] 2. add set_pat extension for gem_create [PATCH 5/5] v2: drop one patch that was merged separately 341ad0e8e254 drm/i915/mtl: Add PTE encode function v3: rebase on https://patchwork.freedesktop.org/series/117082/ Hi, Fei. Does this uAPI update also affect any discrete GPUs supported by i915, And in that case, does it allow setting non-snooping PAT indices on those devices? If so, since the uAPI for discrete GPU devices doesn't allow incoherency between GPU and CPU (apart from write-combining buffering), the correct CPU caching mode matching the PAT index needs to be selected for the buffer object in i915_ttm_select_tt_caching(). Thanks, Thomas Fei Yang (5): drm/i915: preparation for using PAT index drm/i915: use pat_index instead of cache_level drm/i915: make sure correct pte encode is used drm/i915/mtl: end support for set caching ioctl drm/i915: Allow user to set cache at BO creation drivers/gpu/drm/i915/display/intel_dpt.c | 12 +-- drivers/gpu/drm/i915/gem/i915_gem_create.c| 36 + drivers/gpu/drm/i915/gem/i915_gem_domain.c| 46 ++- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 10 ++- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 3 +- drivers/gpu/drm/i915/gem/i915_gem_object.c| 67 +++- drivers/gpu/drm/i915/gem/i915_gem_object.h| 8 ++ .../gpu/drm/i915/gem/i915_gem_object_types.h | 26 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 9 ++- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 2 - drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 4 +- drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 16 ++-- .../gpu/drm/i915/gem/selftests/huge_pages.c | 2 +- .../drm/i915/gem/selftests/i915_gem_migrate.c | 2 +- 
.../drm/i915/gem/selftests/i915_gem_mman.c| 2 +- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 10 ++- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 73 + drivers/gpu/drm/i915/gt/gen8_ppgtt.h | 3 +- drivers/gpu/drm/i915/gt/intel_ggtt.c | 76 +- drivers/gpu/drm/i915/gt/intel_gtt.h | 20 +++-- drivers/gpu/drm/i915/gt/intel_migrate.c | 47 ++- drivers/gpu/drm/i915/gt/intel_migrate.h | 13 ++- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 6 +- drivers/gpu/drm/i915/gt/selftest_migrate.c| 47 +-- drivers/gpu/drm/i915/gt/selftest_reset.c | 8 +- drivers/gpu/drm/i915/gt/selftest_timeline.c | 2 +- drivers/gpu/drm/i915/gt/selftest_tlb.c| 4 +- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 10 ++- drivers/gpu/drm/i915/i915_debugfs.c | 55 ++--- drivers/gpu/drm/i915/i915_gem.c | 16 +++- drivers/gpu/drm/i915/i915_gpu_error.c | 8 +- drivers/gpu/drm/i915/i915_pci.c | 79 --- drivers/gpu/drm/i915/i915_vma.c | 16 ++-- drivers/gpu/drm/i915/i915_vma.h | 2 +- drivers/gpu/drm/i915/i915_vma_types.h | 2 - drivers/gpu/drm/i915/intel_device_info.h | 5 ++ drivers/gpu/drm/i915/selftests/i915_gem.c | 5 +- .../gpu/drm/i915/selftests/i915_gem_evict.c | 4 +- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++-- .../drm/i915/selftests/intel_memory_region.c | 4 +- .../gpu/drm/i915/selftests/mock_gem_device.c | 9 +++ drivers/gpu/drm/i915/selftests/mock_gtt.c | 8 +- include/uapi/drm/i915_drm.h | 36 + tools/include/uapi/drm/i915_drm.h | 36 + 44 files changed, 621 insertions(+), 243 deletions(-)
Re: [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences
On 4/4/23 15:10, Christian König wrote: Am 04.04.23 um 14:54 schrieb Thomas Hellström: Hi, Christian, On 4/4/23 11:09, Christian König wrote: Am 04.04.23 um 02:22 schrieb Matthew Brost: From: Thomas Hellström For long-running workloads, drivers either need to open-code completion waits, invent their own synchronization primitives or internally use dma-fences that do not obey the cross-driver dma-fence protocol, but without any lockdep annotation all these approaches are error prone. So since for example the drm scheduler uses dma-fences it is desirable for a driver to be able to use it for throttling and error handling also with internal dma-fences that do not obey the cross-driver dma-fence protocol. Introduce long-running completion fences in the form of dma-fences, and add lockdep annotation for them. In particular: * Do not allow waiting under any memory management locks. * Do not allow attaching them to a dma-resv object. * Introduce a new interface for adding callbacks, making the helper adding a callback sign off on being aware that the dma-fence may not complete anytime soon. Typically this will be the scheduler chaining a new long-running fence on another one. Well that's pretty much what I tried before: https://lwn.net/Articles/893704/ And the reasons why it was rejected haven't changed. Regards, Christian. Yes, TBH this was mostly to get discussion going on how we'd best tackle this problem while being able to reuse the scheduler for long-running workloads. I couldn't see any clear decision on your series, though, but one main difference I see is that this is intended for driver-internal use only. (I'm counting using the drm_scheduler as a helper for driver-private use.) This is by no means a way to try to tackle the indefinite fence problem. Well this was just my latest try to tackle this, but essentially the problems are the same as with your approach: When we express such operations as dma_fence there is always the chance that we leak that somewhere. 
My approach of adding a flag noting that this operation is dangerous and can't be synced with something memory management depends on tried to contain this as much as possible, but Daniel still pretty clearly rejected it (for good reasons I think). We could ofc invent a completely different data-type that abstracts the synchronization the scheduler needs in the long-running case, or each driver could hack something up, like sleeping in the prepare_job() or run_job() callback for throttling, but those waits should still be annotated one way or another (and probably in a similar way across drivers) to make sure we don't do anything bad. So any suggestions as to what would be the better solution here would be appreciated. Mhm, do we really need the GPU scheduler for that? I mean in the 1 to 1 case you basically just need a component which collects the dependencies as dma_fence and, if all of them are fulfilled, schedules a work item. As long as the work item itself doesn't produce a dma_fence it can then still just wait for other non-dma_fence dependencies. Then the work function could submit the work and wait for the result. The work item would then pretty much represent what you want; you can wait for it to finish and pass it along as a long-running dependency. Maybe give it a funky name and wrap it up in a structure, but that's basically it. This very much sounds like an i915_sw_fence for the dependency tracking and dma_fence_work for the actual work, although its completion fence is a dma_fence. That goes against a condition for merging the xe driver, though, which was that we implement some sort of minimal scaffolding for long-running workloads in the drm scheduler; the thinking behind that is to avoid implementing intel-specific solutions like those... Thanks, Thomas Regards, Christian. Thanks, Thomas
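Christian's proposal — collect dma_fence dependencies, then schedule a work item that is free to wait on non-fence things — could be sketched roughly as below. All names are invented for illustration; this is close in spirit to i915's dma_fence_work, except the completion is deliberately a plain completion rather than a dma_fence, so it cannot leak into dma-resv objects:

```c
/*
 * Rough sketch of the "collect dependencies, then run a work item"
 * idea from the thread.  Names are invented; this is not an existing
 * kernel API.  The key property: the work item's completion is NOT a
 * dma_fence, so it cannot be attached to a dma-resv or leak into the
 * cross-driver dma-fence protocol.
 */
struct lr_work {
	struct work_struct	work;
	struct dma_fence	**deps;		/* collected dma_fence deps */
	unsigned int		num_deps;
	struct completion	done;		/* plain completion on purpose */
	void			(*func)(struct lr_work *w);
};

static void lr_work_fn(struct work_struct *work)
{
	struct lr_work *w = container_of(work, struct lr_work, work);
	unsigned int i;

	/* wait for all collected dma_fence dependencies first */
	for (i = 0; i < w->num_deps; i++)
		dma_fence_wait(w->deps[i], false);

	/* now free to wait on non-dma-fence dependencies, allocate, etc. */
	w->func(w);
	complete_all(&w->done);	/* waiters use wait_for_completion() */
}
```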
Re: [PATCH 1/9] drm: execution context for GEM buffers v3
Hi Christian, On 2/28/23 09:33, Christian König wrote: This adds the infrastructure for an execution context for GEM buffers which is similar to the existing TTM execbuf util and intended to replace it in the long term. The basic functionality is that we abstract the necessary loop to lock many different GEM buffers with automated deadlock and duplicate handling. v2: drop xarray and use dynamic resized array instead, the locking overhead is unnecessary and measurable. v3: drop duplicate tracking, radeon is really the only one needing that. Signed-off-by: Christian König Nice. This seems to have all we need for now for Xe as well, although not for i915 ATM. I see future extension of this for things like sleeping lock evictions (where locks may go out-of-scope during the locking loop), including other resv containers, passing through dma-buf move_notify() etc. What are your thoughts around that? Extend this, or build a different functionality and make this a specialization of that? What about the drm_gem_lock_reservations() interface? While a bit less capable, it's confusing having two APIs doing essentially the same thing; could we deprecate that? Finally a comment below: --- Documentation/gpu/drm-mm.rst | 12 ++ drivers/gpu/drm/Kconfig | 6 + drivers/gpu/drm/Makefile | 2 + drivers/gpu/drm/drm_exec.c | 249 +++ include/drm/drm_exec.h | 115 5 files changed, 384 insertions(+) create mode 100644 drivers/gpu/drm/drm_exec.c create mode 100644 include/drm/drm_exec.h diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index a79fd3549ff8..a52e6f4117d6 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -493,6 +493,18 @@ DRM Sync Objects .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c :export: +DRM Execution context += + +.. kernel-doc:: drivers/gpu/drm/drm_exec.c + :doc: Overview + +.. kernel-doc:: include/drm/drm_exec.h + :internal: + +.. 
kernel-doc:: drivers/gpu/drm/drm_exec.c + :export: + GPU Scheduler = diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 17d252dc25e2..84a5fc28c48d 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -200,6 +200,12 @@ config DRM_TTM GPU memory types. Will be enabled automatically if a device driver uses it. +config DRM_EXEC + tristate + depends on DRM + help + Execution context for command submissions + config DRM_BUDDY tristate depends on DRM diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index ab4460fcd63f..d40defbb0347 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -78,6 +78,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o # # Memory-management helpers # +# +obj-$(CONFIG_DRM_EXEC) += drm_exec.o obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c new file mode 100644 index ..df546cc5a227 --- /dev/null +++ b/drivers/gpu/drm/drm_exec.c @@ -0,0 +1,249 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ + +#include +#include +#include + +/** + * DOC: Overview + * + * This component mainly abstracts the retry loop necessary for locking + * multiple GEM objects while preparing hardware operations (e.g. command + * submissions, page table updates etc..). + * + * If a contention is detected while locking a GEM object the cleanup procedure + * unlocks all previously locked GEM objects and locks the contended one first + * before locking any further objects. + * + * After an object is locked fences slots can optionally be reserved on the + * dma_resv object inside the GEM object. 
+ * + * A typical usage pattern should look like this:: + * + * struct drm_gem_object *obj; + * struct drm_exec exec; + * unsigned long index; + * int ret; + * + * drm_exec_init(, true); + * drm_exec_while_not_all_locked() { + * ret = drm_exec_prepare_obj(, boA, 1); + * drm_exec_continue_on_contention(); + * if (ret) + * goto error; + * + * ret = drm_exec_lock(, boB, 1); Should be drm_exec_prepare_obj()? + * drm_exec_continue_on_contention(); + * if (ret) + * goto error; + * } + * + * drm_exec_for_each_locked_object(, index, obj) { + * dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ); + * ... + * } + * drm_exec_fini(); + * + * See struct dma_exec for more details. + */ + +/* Dummy value used to initially enter the retry loop */ +#define DRM_EXEC_DUMMY (void*)~0 + +/* Unlock all objects and drop references */ +static void drm_exec_unlock_all(struct drm_exec *exec) +{ + struct drm_gem_object *obj; + unsigned long index; + +
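With the typo Thomas points out fixed (the second call should be drm_exec_prepare_obj(), not drm_exec_lock()), the documented usage pattern would read as below. Note the `&exec` arguments are reconstructed here — the archive dropped them from the quoted DOC comment — and boA/boB and the error label are placeholders from the original example:

```c
/*
 * The usage pattern from the patch's DOC comment, with the second
 * call corrected to drm_exec_prepare_obj() as suggested in the
 * review.  The "&exec" arguments are reconstructed; the quoted text
 * lost them in extraction.
 */
drm_exec_init(&exec, true);
drm_exec_while_not_all_locked(&exec) {
	ret = drm_exec_prepare_obj(&exec, boA, 1);
	drm_exec_continue_on_contention(&exec);
	if (ret)
		goto error;

	ret = drm_exec_prepare_obj(&exec, boB, 1);	/* was drm_exec_lock() */
	drm_exec_continue_on_contention(&exec);
	if (ret)
		goto error;
}

drm_exec_for_each_locked_object(&exec, index, obj) {
	dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
}
drm_exec_fini(&exec);
```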
Re: [PATCH 9/9] drm: move ttm_execbuf_util into vmwgfx
Hi, On 3/9/23 06:14, Zack Rusin wrote: On Wed, 2023-03-08 at 10:10 +0100, Christian König wrote: Am 08.03.23 um 06:14 schrieb Zack Rusin: On Tue, 2023-02-28 at 09:34 +0100, Christian König wrote: VMWGFX is the only remaining user of this and should probably be moved over to drm_exec when it starts using GEM as well. Is this because vmwgfx piggybacks buffer-id relocations on top of ttm validations or did you just find it too hard to port it over? I'd prefer to avoid ttm moves to vmwgfx and at least have a clear idea of what we need to do to port. I've just found it too hard to port it over because vmwgfx does some strange things with the validation code here. If you want we can take a deeper look at this together, but I need to find some time. Alternatively just tell me how to do it and I will add that to the patch set :) I don't want to hold up the set (it looks good btw), because I had to look at something else today and tomorrow. We overload the validation lists to do quite a bit more than just reservations though. There are, I think, four separate things that need to be refactored there (Christian, feel free to skip this section, this is mainly for VMware folks on the team): 1) Relocations - userspace uses the ids of the bos in the command stream, but on the kernel side those ids are different (or in vmwgfx terminology gem id != mob id), so the buffer ids in the command stream need to be replaced. 2) Resource validation. vmwgfx splits the userspace objects into buffers and resources (shaders, surfaces, contexts). The resources are not buffers but are backed by them. A single buffer can back multiple different resources and sometimes the kernel has to actually allocate a buffer to back a resource and attach it to it (i.e. in common terminology the buffer is the memory and resources are placed in it). Now this shouldn't be in the kernel at all; the resources shouldn't have been kernel objects and instead we should have left them completely to userspace. 
The reason behind this is partly historical, since initially the resources (IIRC surfaces and shaders at that time) were per-device and not per context, and not backed by buffer objects. The main reason for not doing the transition to user-space for resources was the SVGA device "between-draw-call-validation". If you for example unbound a mob that was backing a resource that was bound to a context, you'd need to start a whole chain of resource invalidations to avoid the device complaining about invalid setups at awkward moments, for example when it felt like suspending. Not sure whether that is gone now though, or whether there are better ways to handle that. /Thomas 3) Coherency tracking. We use validation lists as a central place for tracking which bos/resources are used in a command buffer, and we use them to keep track of which buffers/resources will end up dirty to implement coherency. 4) Central place to allocate memory for relocation/validation nodes. Where we want to end up is with 2 completely gone from the kernel side and 1, 3 and 4 refactored and cleaned up. I think there's at least 4 separate patches to this port, so it's not a trivial thing. We will take a look at this on Friday in more detail to see what we can do. z
Re: [PATCH 9/9] drm: move ttm_execbuf_util into vmwgfx
On 3/8/23 10:10, Christian König wrote: Am 08.03.23 um 06:14 schrieb Zack Rusin: On Tue, 2023-02-28 at 09:34 +0100, Christian König wrote: VMWGFX is the only remaining user of this and should probably be moved over to drm_exec when it starts using GEM as well. Is this because vmwgfx piggybacks buffer-id relocations on top of ttm validations or did you just find it too hard to port it over? I'd prefer to avoid ttm moves to vmwgfx and at least have a clear idea of what we need to do to port. I've just found it too hard to port it over because vmwgfx does some strange things with the validation code here. If you want we can take a deeper look at this together, but I need to find some time. Alternatively just tell me how to do it and I will add that to the patch set :) Regards, Christian. Hmm. It turns out Xe was using these from the very start as well. But I will take a look at what it takes to deprecate that usage, so don't let that stop this removal. We need a more flexible WW transaction handling anyway. /Thomas z
Re: [PATCH 2/3] drm/amd: Convert amdgpu to use suballocation helper.
On 2/23/23 17:22, Christian König wrote: Am 23.02.23 um 15:29 schrieb Thomas Hellström: On 2/23/23 12:15, Christian König wrote: Am 23.02.23 um 11:57 schrieb Thomas Hellström: From: Maarten Lankhorst Now that we have a generic suballocation helper, use it in amdgpu. For lines that get moved or changed, also fix up pre-existing style issues. Signed-off-by: Maarten Lankhorst Co-developed-by: Thomas Hellström Signed-off-by: Thomas Hellström --- drivers/gpu/drm/Kconfig | 1 + drivers/gpu/drm/amd/amdgpu/Kconfig | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu.h | 26 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 23 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c | 324 ++--- 7 files changed, 46 insertions(+), 337 deletions(-) diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 8fbe57407c60..73ddfdf3a894 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -77,6 +77,7 @@ config DRM_KUNIT_TEST select DRM_DISPLAY_HELPER select DRM_LIB_RANDOM select DRM_KMS_HELPER + select DRM_SUBALLOC_HELPER select DRM_BUDDY select DRM_EXPORT_FOR_TESTS if m select DRM_KUNIT_TEST_HELPERS This looks like it's misplaced; apart from that the patch looks good to me. Looks like a TAB vs spaces issue. The resulting file looks correct. Also added the same select for Radeon in the following patch, which was forgotten. That wasn't what I meant. This here is the patch to change amdgpu, but you are adding the dependency to the KUNIT test. It looks like adding this line should be in patch #1, not patch #2. Ah yes, hmm, that change shouldn't be there at all. It probably strayed from the Kunit test that I haven't posted yet. Will do a v4. We need to get rid of those integer divisions as well. /Thomas Regards, Christian. Added your R-B to all patches, even if it wasn't explicit for this one. Please let me know if I misunderstood that one. Thanks, Thomas Regards, Christian. 
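On "We need to get rid of those integer divisions as well": the usual trick in allocators is to require power-of-two alignments and replace divide/multiply rounding with a mask, since 64-bit division is expensive (or needs a libgcc helper) on 32-bit kernels. A small self-contained illustration of the equivalence — not the actual drm_suballoc code:

```c
#include <assert.h>
#include <stdint.h>

/* Round up using divide + multiply, as an allocator might naively do. */
static uint64_t align_up_div(uint64_t x, uint64_t align)
{
	return ((x + align - 1) / align) * align;
}

/*
 * Equivalent rounding with a mask; valid only when align is a power
 * of two, which a suballocator can require.  No division involved.
 */
static uint64_t align_up_mask(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}
```

For example, align_up_mask(13, 8) and align_up_div(13, 8) both yield 16.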
diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig index 5341b6b242c3..0ed12171450b 100644 --- a/drivers/gpu/drm/amd/amdgpu/Kconfig +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig @@ -18,6 +18,7 @@ config DRM_AMDGPU select BACKLIGHT_CLASS_DEVICE select INTERVAL_TREE select DRM_BUDDY + select DRM_SUBALLOC_HELPER # amdgpu depends on ACPI_VIDEO when ACPI is enabled, for select to work # ACPI_VIDEO's dependencies must also be selected. select INPUT if ACPI diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 164141bc8b4a..dda88090f044 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -424,29 +424,11 @@ struct amdgpu_clock { * alignment). */ -#define AMDGPU_SA_NUM_FENCE_LISTS 32 - struct amdgpu_sa_manager { - wait_queue_head_t wq; - struct amdgpu_bo *bo; - struct list_head *hole; - struct list_head flist[AMDGPU_SA_NUM_FENCE_LISTS]; - struct list_head olist; - unsigned size; - uint64_t gpu_addr; - void *cpu_ptr; - uint32_t domain; - uint32_t align; -}; - -/* sub-allocation buffer */ -struct amdgpu_sa_bo { - struct list_head olist; - struct list_head flist; - struct amdgpu_sa_manager *manager; - unsigned soffset; - unsigned eoffset; - struct dma_fence *fence; + struct drm_suballoc_manager base; + struct amdgpu_bo *bo; + uint64_t gpu_addr; + void *cpu_ptr; }; int amdgpu_fence_slab_init(void); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c index b348dbe2..df7eb0b7c4b9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -69,7 +69,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm, if (size) { r = amdgpu_sa_bo_new(>ib_pools[pool_type], - >sa_bo, size, 256); + >sa_bo, size); if (r) { dev_err(adev->dev, "failed to get a new IB (%d)\n", r); return r; @@ -309,8 +309,7 @@ int amdgpu_ib_pool_init(struct amdgpu_device *adev) for (i = 0; i < AMDGPU_IB_POOL_MAX; 
i++) { r = amdgpu_sa_bo_manager_init(adev, >ib_pools[i], - AMDGPU_IB_POOL_SIZE, - AMDGPU_GPU_PAGE_SIZE, + AMDGPU_IB_POOL_SIZE, 256, AMDGPU_GEM_DOMAIN_GTT); if (r) goto error; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index 93207badf83f..5a85726ce853 100644 ---
Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI
On 1/19/23 05:58, Matthew Brost wrote: On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote: On 1/18/23 21:37, Thomas Hellström (Intel) wrote: On 1/18/23 07:12, Danilo Krummrich wrote: This commit provides the implementation for the new uapi motivated by the Vulkan API. It allows user mode drivers (UMDs) to: 1) Initialize a GPU virtual address (VA) space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA space managed by the kernel and userspace, respectively. 2) Allocate and free a VA space region as well as bind and unbind memory to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl. UMDs can request the named operations to be processed either synchronously or asynchronously. It supports DRM syncobjs (incl. timelines) as synchronization mechanism. The management of the GPU VA mappings is implemented with the DRM GPU VA manager. 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The execution happens asynchronously. It supports DRM syncobj (incl. timelines) as synchronization mechanism. DRM GEM object locking is handled with drm_exec. Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM GPU scheduler for the asynchronous paths. 
Signed-off-by: Danilo Krummrich --- Documentation/gpu/driver-uapi.rst | 3 + drivers/gpu/drm/nouveau/Kbuild | 2 + drivers/gpu/drm/nouveau/Kconfig | 2 + drivers/gpu/drm/nouveau/nouveau_abi16.c | 16 + drivers/gpu/drm/nouveau/nouveau_abi16.h | 1 + drivers/gpu/drm/nouveau/nouveau_drm.c | 23 +- drivers/gpu/drm/nouveau/nouveau_drv.h | 9 +- drivers/gpu/drm/nouveau/nouveau_exec.c | 310 ++ drivers/gpu/drm/nouveau/nouveau_exec.h | 55 ++ drivers/gpu/drm/nouveau/nouveau_sched.c | 780 drivers/gpu/drm/nouveau/nouveau_sched.h | 98 +++ 11 files changed, 1295 insertions(+), 4 deletions(-) create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h ... +static struct dma_fence * +nouveau_bind_job_run(struct nouveau_job *job) +{ + struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job); + struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli); + struct bind_job_op *op; + int ret = 0; + I was looking at how nouveau does the async binding compared to how xe does it. It looks to me that this function being a scheduler run_job callback is the main part of the VM_BIND dma-fence signalling critical section for the job's done_fence and if so, needs to be annotated as such? Yes, that's the case. For example nouveau_uvma_region_new allocates memory, which is not allowed if in a dma_fence signalling critical section and the locking also looks suspicious? Thanks for pointing this out, I missed that somehow. I will change it to pre-allocate new regions, mappings and page tables within the job's submit() function. 
Yeah, that's what we basically do in Xe: in the IOCTL step allocate all the backing store for new page tables and populate the new page tables (these are not yet visible in the page table structure), and in the last step, which is executed after all the dependencies are satisfied, program all the leaf entries making the new binding visible. We screwed this up by deferring most of the IOCTL to a worker, but will fix this one way or another soon - get rid of the worker, or introduce a type of sync that is signaled after the worker + publish the dma-fence in the worker. I'd like to close on this one soon. For the ops structures the drm_gpuva_manager allocates for reporting the split/merge steps back to the driver I have ideas to entirely avoid allocations, which also is a good thing in respect of Christian's feedback regarding the huge amount of mapping requests some applications seem to generate. It should be fine to have allocations to report the split/merge step as this step should be before a dma-fence is published, but yeah, if possible avoiding extra allocs is always better. Also BTW, great work on drm_gpuva_manager too. We will most likely pick this up in Xe rather than open coding all of this as we currently do. We should probably start the port to this soon so we can contribute to the implementation and get both of our drivers upstream sooner. Regarding the locking, anything specific that makes it look suspicious to you? I haven't looked into this too deeply, but almost certainly Thomas is suggesting that if you allocate memory anywhere under the nouveau_uvmm_lock then you can't use this lock in the run_job() callback, as this is in the dma-fencing path. Yes, that was what looked suspicious to me, although I haven't either looked at the code in detail to say for sure. But starting by annotating this with dma_fence_[be
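The annotation Thomas suggests (truncated in the quote) is presumably the dma_fence_begin_signalling()/dma_fence_end_signalling() lockdep pair. A hedged sketch of how a run_job() callback would use it once allocations are moved to submit() — the function shape follows the patch, but the annotation placement and the helper name are illustrative, not merged code:

```c
/*
 * Sketch: lockdep-annotating a scheduler run_job() callback as a
 * dma-fence signalling critical section.  Inside the section lockdep
 * flags memory allocations and lock acquisitions that could deadlock
 * against fence signalling.  do_the_preallocated_binding() is a
 * hypothetical helper that only publishes page tables allocated
 * earlier in submit().
 */
static struct dma_fence *nouveau_bind_job_run(struct nouveau_job *job)
{
	struct dma_fence *fence;
	bool fence_cookie;

	fence_cookie = dma_fence_begin_signalling();

	/* no allocations, no GFP_KERNEL-style locking allowed here */
	fence = do_the_preallocated_binding(job);

	dma_fence_end_signalling(fence_cookie);
	return fence;
}
```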
Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI
On 1/18/23 07:12, Danilo Krummrich wrote: This commit provides the implementation for the new uapi motivated by the Vulkan API. It allows user mode drivers (UMDs) to: 1) Initialize a GPU virtual address (VA) space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA space managed by the kernel and userspace, respectively. 2) Allocate and free a VA space region as well as bind and unbind memory to the GPU's VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl. UMDs can request the named operations to be processed either synchronously or asynchronously. It supports DRM syncobjs (incl. timelines) as a synchronization mechanism. The management of the GPU VA mappings is implemented with the DRM GPU VA manager. 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The execution happens asynchronously. It supports DRM syncobjs (incl. timelines) as a synchronization mechanism. DRM GEM object locking is handled with drm_exec. Both DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC use the DRM GPU scheduler for the asynchronous paths. Signed-off-by: Danilo Krummrich --- Documentation/gpu/driver-uapi.rst | 3 + drivers/gpu/drm/nouveau/Kbuild | 2 + drivers/gpu/drm/nouveau/Kconfig | 2 + drivers/gpu/drm/nouveau/nouveau_abi16.c | 16 + drivers/gpu/drm/nouveau/nouveau_abi16.h | 1 + drivers/gpu/drm/nouveau/nouveau_drm.c | 23 +- drivers/gpu/drm/nouveau/nouveau_drv.h | 9 +- drivers/gpu/drm/nouveau/nouveau_exec.c | 310 ++ drivers/gpu/drm/nouveau/nouveau_exec.h | 55 ++ drivers/gpu/drm/nouveau/nouveau_sched.c | 780 drivers/gpu/drm/nouveau/nouveau_sched.h | 98 +++ 11 files changed, 1295 insertions(+), 4 deletions(-) create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h ...
+static struct dma_fence * +nouveau_bind_job_run(struct nouveau_job *job) +{ + struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job); + struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli); + struct bind_job_op *op; + int ret = 0; + I was looking at how nouveau does the async binding compared to how xe does it. It looks to me that this function being a scheduler run_job callback is the main part of the VM_BIND dma-fence signalling critical section for the job's done_fence and if so, needs to be annotated as such? For example nouveau_uvma_region_new allocates memory, which is not allowed if in a dma_fence signalling critical section and the locking also looks suspicious? Thanks, Thomas + nouveau_uvmm_lock(uvmm); + list_for_each_op(op, &bind_job->ops) { + switch (op->op) { + case OP_ALLOC: { + bool sparse = op->flags & DRM_NOUVEAU_VM_BIND_SPARSE; + + ret = nouveau_uvma_region_new(uvmm, + op->va.addr, + op->va.range, + sparse); + if (ret) + goto out_unlock; + break; + } + case OP_FREE: + ret = nouveau_uvma_region_destroy(uvmm, + op->va.addr, + op->va.range); + if (ret) + goto out_unlock; + break; + case OP_MAP: + ret = nouveau_uvmm_sm_map(uvmm, + op->va.addr, op->va.range, + op->gem.obj, op->gem.offset, + op->flags && 0xff); + if (ret) + goto out_unlock; + break; + case OP_UNMAP: + ret = nouveau_uvmm_sm_unmap(uvmm, + op->va.addr, + op->va.range); + if (ret) + goto out_unlock; + break; + } + } + +out_unlock: + nouveau_uvmm_unlock(uvmm); + if (ret) + NV_PRINTK(err, job->cli, "bind job failed: %d\n", ret); + return ERR_PTR(ret); +} + +static void +nouveau_bind_job_free(struct nouveau_job *job) +{ + struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job); + struct bind_job_op *op, *next; + + list_for_each_op_safe(op, next, &bind_job->ops) { + struct drm_gem_object *obj =
Re: [PATCH 05/16] drm/vmwgfx: Refactor resource validation hashtable to use linux/hashtable implementation.
Hi, Zack, On 10/19/22 18:32, Zack Rusin wrote: On Wed, 2022-10-19 at 12:21 +0200, Thomas Hellström (Intel) wrote: On 10/17/22 21:54, Zack Rusin wrote: From: Maaz Mombasawala Vmwgfx's hashtab implementation needs to be replaced with linux/hashtable to reduce the maintenance burden. As part of this effort, refactor the res_ht hashtable used for resource validation during execbuf execution to use the linux/hashtable implementation. This also refactors vmw_validation_context to use vmw_sw_context as the container for the hashtable, whereas before it used a vmwgfx_open_hash directly. This makes vmw_validation_context less generic, but there is no functional change since res_ht is the only instance where a validation context used a hashtable in the vmwgfx driver. Why is a pointer to the vmw_sw_context added to the validation context, rather than a pointer to the new hash table type itself? Seems like an unnecessary indirection which adds a dependency on the sw context to the validation context? Hi, Thomas. Thanks for taking a look at this! That's because the entire public interface of hashtable depends on it being a struct on which sizeof works rather than a pointer, i.e. almost the entire interface of hashtable.h is defined as macros, and they all require that HASH_SIZE(hashtable), which is defined as #define HASH_SIZE(hashtable) (ARRAY_SIZE(hashtable)), works on the hashtable. So the interface of hashtable.h can't operate on pointers, but rather needs the full struct. So the only two options are either rewriting the functions from linux/hashtable.h to take an explicit HASH_SIZE(hashtable), or making sure that in the functions where you use hashtable.h you pass the parent struct so that sizeof on the hashtable works correctly. I've seen both approaches in the kernel, but in this case we thought the latter made more sense. Ah, Ok, yes, tricky one. Lots of options, none of them perfect. Reviewed-by: Thomas Hellström
Re: [PATCH 05/16] drm/vmwgfx: Refactor resource validation hashtable to use linux/hashtable implementation.
On 10/17/22 21:54, Zack Rusin wrote: From: Maaz Mombasawala Vmwgfx's hashtab implementation needs to be replaced with linux/hashtable to reduce the maintenance burden. As part of this effort, refactor the res_ht hashtable used for resource validation during execbuf execution to use the linux/hashtable implementation. This also refactors vmw_validation_context to use vmw_sw_context as the container for the hashtable, whereas before it used a vmwgfx_open_hash directly. This makes vmw_validation_context less generic, but there is no functional change since res_ht is the only instance where a validation context used a hashtable in the vmwgfx driver. Why is a pointer to the vmw_sw_context added to the validation context, rather than a pointer to the new hash table type itself? Seems like an unnecessary indirection which adds a dependency on the sw context to the validation context? Signed-off-by: Maaz Mombasawala Signed-off-by: Zack Rusin --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c| 24 -- drivers/gpu/drm/vmwgfx/vmwgfx_drv.h| 5 +- drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c| 14 ++ drivers/gpu/drm/vmwgfx/vmwgfx_validation.c | 55 +++--- drivers/gpu/drm/vmwgfx/vmwgfx_validation.h | 26 +++--- 5 files changed, 58 insertions(+), 66 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index 13b90273eb77..8d77e79bd904 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -830,6 +830,22 @@ static void vmw_write_driver_id(struct vmw_private *dev) } } +static void vmw_sw_context_init(struct vmw_private *dev_priv) +{ + struct vmw_sw_context *sw_context = &dev_priv->ctx; + + hash_init(sw_context->res_ht); +} + +static void vmw_sw_context_fini(struct vmw_private *dev_priv) +{ + struct vmw_sw_context *sw_context = &dev_priv->ctx; + + vfree(sw_context->cmd_bounce); + if (sw_context->staged_bindings) + vmw_binding_state_free(sw_context->staged_bindings); +} + static int vmw_driver_load(struct vmw_private *dev_priv, u32 pci_id) {
int ret; @@ -839,6 +855,8 @@ static int vmw_driver_load(struct vmw_private *dev_priv, u32 pci_id) dev_priv->drm.dev_private = dev_priv; + vmw_sw_context_init(dev_priv); + mutex_init(&dev_priv->cmdbuf_mutex); mutex_init(&dev_priv->binding_mutex); spin_lock_init(&dev_priv->resource_lock); @@ -1168,9 +1186,7 @@ static void vmw_driver_unload(struct drm_device *dev) unregister_pm_notifier(&dev_priv->pm_nb); - if (dev_priv->ctx.res_ht_initialized) - vmwgfx_ht_remove(&dev_priv->ctx.res_ht); - vfree(dev_priv->ctx.cmd_bounce); + vmw_sw_context_fini(dev_priv); if (dev_priv->enable_fb) { vmw_fb_off(dev_priv); vmw_fb_close(dev_priv); @@ -1198,8 +1214,6 @@ static void vmw_driver_unload(struct drm_device *dev) vmw_irq_uninstall(&dev_priv->drm); ttm_object_device_release(&dev_priv->tdev); - if (dev_priv->ctx.staged_bindings) - vmw_binding_state_free(dev_priv->ctx.staged_bindings); for (i = vmw_res_context; i < vmw_res_max; ++i) idr_destroy(&dev_priv->res_idr[i]); diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h index 09e2d738aa87..d87aeedb78d0 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h @@ -30,6 +30,7 @@ #include #include +#include #include #include @@ -93,6 +94,7 @@ #define VMW_RES_STREAM ttm_driver_type2 #define VMW_RES_FENCE ttm_driver_type3 #define VMW_RES_SHADER ttm_driver_type4 +#define VMW_RES_HT_ORDER 12 #define MKSSTAT_CAPACITY_LOG2 5U #define MKSSTAT_CAPACITY (1U << MKSSTAT_CAPACITY_LOG2) @@ -425,8 +427,7 @@ struct vmw_ctx_validation_info; * @ctx: The validation context */ struct vmw_sw_context{ - struct vmwgfx_open_hash res_ht; - bool res_ht_initialized; + DECLARE_HASHTABLE(res_ht, VMW_RES_HT_ORDER); bool kernel; struct vmw_fpriv *fp; struct drm_file *filp; diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c index f085dbd4736d..c943ab801ca7 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c @@ -1,7 +1,7 @@ //
SPDX-License-Identifier: GPL-2.0 OR MIT /** * - * Copyright 2009 - 2015 VMware, Inc., Palo Alto, CA., USA + * Copyright 2009 - 2022 VMware, Inc., Palo Alto, CA., USA * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the @@ -25,6 +25,7 @@ * **/ #include +#include #include "vmwgfx_drv.h" #include "vmwgfx_reg.h" @@ -34,7 +35,6 @@ #include
Re: Balancing ttm_mem_io_reserve() and ttm_mem_io_free()
Hi, Christian, On 8/19/22 10:52, Christian König wrote: Hi Thomas, IIRC we intentionally dropped that approach of balancing ttm_mem_io_reserve()/ttm_mem_io_free(). Instead the results from ttm_mem_io_reserve() are cached and that cached information is freed by ttm_mem_io_free(). In other words, every time we need to make sure we have the cache filled we call ttm_mem_io_reserve(), and every time we are about to free the resource or don't need the mapping any more we call ttm_mem_io_free(). The callbacks to io_mem_reserve() and io_mem_free() are then balanced. Hmm, yes, in the end at resource destroy, anything reserved would indeed have been freed, but consider the following: ttm_bo_vm_fault(); ttm_bo_vmap(); ttm_bo_vunmap(); ttm_bo_unmap_virtual(); Here, wouldn't we release the io_reservation on ttm_bo_vunmap() when it should really have been released on ttm_bo_unmap_virtual()? Fixing missing _free() calls in the error path is probably a good idea, but I wouldn't go beyond that. Why should any of that be racy? You need to hold the reservation lock to call any of those functions. The problem arises when a ttm_resource has been detached from a bo; combined with an ongoing async memcpy, we then no longer have a bo reservation to protect with. Now the async memcpy currently only exists in i915 and we might at some point be able to get rid of it, but it illustrates the problem. Thanks, Thomas Regards, Christian. Am 19.08.22 um 10:13 schrieb Thomas Hellström: Hi Christian, I'm looking for a way to take some sort of reference across possible VRAM accesses over the PCI bar, be it for runtime PM, workarounds or whatever. The ttm_mem_io_reserve/free pair seems like a good candidate, but is completely unbalanced and looks racy. In particular, error paths forget to call ttm_mem_io_free(). Would you have any objections if I took a look at attempting to balance calls to those functions, or do you have any other suggestions for a better method? Thanks, Thomas
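One possible shape of the balanced scheme Thomas is proposing would be a per-resource use count, so that each reserve must be paired with exactly one free. This is purely illustrative — the field and helper names below are hypothetical and do not exist in the TTM structures:

```c
/* Illustrative sketch only: pair every example_io_reserve() with one
 * example_io_free(), and fire the backend io_mem_reserve()/io_mem_free()
 * callbacks only on the 0 -> 1 and 1 -> 0 transitions. This would let
 * callers like vm_fault/vmap hold an io reference across PCI BAR access
 * independently of each other. 'io_use_count' is a hypothetical field. */
static int example_io_reserve(struct ttm_device *bdev,
			      struct ttm_resource *res)
{
	if (res->io_use_count++ == 0)
		return bdev->funcs->io_mem_reserve(bdev, res);
	return 0;
}

static void example_io_free(struct ttm_device *bdev,
			    struct ttm_resource *res)
{
	if (--res->io_use_count == 0)
		bdev->funcs->io_mem_free(bdev, res);
}
```

The open question in the thread is what protects the counter once the resource is detached from the bo (and thus from the reservation lock).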
Re: [PATCH v7 2/2] drm/gem: Don't map imported GEMs
On 6/30/22 22:22, Dmitry Osipenko wrote: Hello Thomas, On 6/30/22 23:15, Thomas Hellström (Intel) wrote: Hi, Dmitry, On 6/30/22 22:04, Dmitry Osipenko wrote: Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers don't handle imported dma-bufs properly, which results in mapping of something else than the imported dma-buf. On NVIDIA Tegra we get a hard lockup when userspace writes to the memory mapping of a dma-buf that was imported into Tegra's DRM GEM. The majority of DRM drivers prohibit mapping of imported GEM objects. Mapping of imported GEMs requires special care from userspace, since it should sync the dma-buf because the mapping coherency of the exporter device may not match the DRM device. Let's prohibit the mapping for all DRM drivers for consistency. Cc: sta...@vger.kernel.org Suggested-by: Thomas Hellström Signed-off-by: Dmitry Osipenko This might break drivers whose obj->funcs->mmap() callback already handles this case, and has userspace that does the right thing. The drm-shmem helper should be the only one that maps imported GEMs properly, but drivers that use drm-shmem already prohibit mapping imported GEMs. Okay, I'll try to re-check once again to be sure. OK. If you aren't 100.1% sure, then please drop the Cc: stable tag and let the patch ride out at least an -rc series, because breaking a stable kernel is something we wouldn't want to do. Thanks, Thomas I think the disabling must be checked on a per-driver basis to avoid that; in particular drivers that already call dma_buf_mmap() should be able to continue doing this. Also the Fixes: review comment remains, This should've been broken forever; I don't think that we have a fixes tag here. I looked through all my previous patches and added the Fixes wherever possible. If I really missed something, then please let me know.
Re: [PATCH v7 2/2] drm/gem: Don't map imported GEMs
Hi, Dmitry, On 6/30/22 22:04, Dmitry Osipenko wrote: Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers don't handle imported dma-bufs properly, which results in mapping of something else than the imported dma-buf. On NVIDIA Tegra we get a hard lockup when userspace writes to the memory mapping of a dma-buf that was imported into Tegra's DRM GEM. The majority of DRM drivers prohibit mapping of imported GEM objects. Mapping of imported GEMs requires special care from userspace, since it should sync the dma-buf because the mapping coherency of the exporter device may not match the DRM device. Let's prohibit the mapping for all DRM drivers for consistency. Cc: sta...@vger.kernel.org Suggested-by: Thomas Hellström Signed-off-by: Dmitry Osipenko This might break drivers whose obj->funcs->mmap() callback already handles this case, and has userspace that does the right thing. I think the disabling must be checked on a per-driver basis to avoid that; in particular drivers that already call dma_buf_mmap() should be able to continue doing this. Also the Fixes: review comment remains, thanks, Thomas --- drivers/gpu/drm/drm_gem.c | 4 drivers/gpu/drm/drm_gem_shmem_helper.c | 9 - 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index 86d670c71286..fc9ec42fa0ab 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -1034,6 +1034,10 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size, { int ret; + /* Don't allow imported objects to be mapped */ + if (obj->import_attach) + return -EINVAL; + /* Check for valid size.
*/ if (obj_size < vma->vm_end - vma->vm_start) return -EINVAL; diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index 8ad0e02991ca..6190f5018986 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -609,17 +609,8 @@ EXPORT_SYMBOL_GPL(drm_gem_shmem_vm_ops); */ int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct vm_area_struct *vma) { - struct drm_gem_object *obj = &shmem->base; int ret; - if (obj->import_attach) { - /* Drop the reference drm_gem_mmap_obj() acquired.*/ - drm_gem_object_put(obj); - vma->vm_private_data = NULL; - - return dma_buf_mmap(obj->dma_buf, vma, 0); - } - ret = drm_gem_shmem_get_pages(shmem); if (ret) { drm_gem_vm_close(vma);
Re: [PATCH v6 02/22] drm/gem: Move mapping of imported dma-bufs to drm_gem_mmap_obj()
On 6/29/22 10:22, Dmitry Osipenko wrote: On 6/29/22 09:40, Thomas Hellström (Intel) wrote: On 5/27/22 01:50, Dmitry Osipenko wrote: Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers don't handle imported dma-bufs properly, which results in mapping of something else than the imported dma-buf. For example, on NVIDIA Tegra we get a hard lockup when userspace writes to the memory mapping of a dma-buf that was imported into Tegra's DRM GEM. To fix this bug, move mapping of imported dma-bufs to drm_gem_mmap_obj(). Now mmapping of imported dma-bufs works properly for all DRM drivers. Same comment about Fixes: as in patch 1, Cc: sta...@vger.kernel.org Signed-off-by: Dmitry Osipenko --- drivers/gpu/drm/drm_gem.c | 3 +++ drivers/gpu/drm/drm_gem_shmem_helper.c | 9 - drivers/gpu/drm/tegra/gem.c | 4 3 files changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index 86d670c71286..7c0b025508e4 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -1038,6 +1038,9 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size, if (obj_size < vma->vm_end - vma->vm_start) return -EINVAL; + if (obj->import_attach) + return dma_buf_mmap(obj->dma_buf, vma, 0); If we start enabling mmapping of imported dma-bufs on a majority of drivers in this way, how do we ensure that user-space is not blindly using the object mmap without calling the needed DMA_BUF_IOCTL_SYNC, which is needed before and after cpu access of mmap'ed dma-bufs? I was under the impression (admittedly without looking) that the few drivers that actually called into dma_buf_mmap() had some private user-mode driver code in place that ensured this happened. Since it's userspace that does the mapping, it should be the responsibility of userspace to do all the necessary syncing. Sure, but nothing prohibits user-space from ignoring the syncing, thinking "It works anyway", testing those drivers where the syncing is a NOP.
And when a driver that finally needs syncing is tested it's too late to fix all broken user-space. I'm not sure whether anyone in userspace really needs to map imported dma-bufs in practice. Nevertheless, this use-case is broken and should be fixed by either allowing to do the mapping or prohibiting it. Then I'd vote for prohibiting it, at least for now. And for the future moving forward we could perhaps revisit the dma-buf need for syncing, requiring those drivers that actually need it to implement emulated coherent memory which can be done not too inefficiently (vmwgfx being one example). /Thomas
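For context, the bracketing userspace is expected to perform around CPU access of an mmap'ed dma-buf uses DMA_BUF_IOCTL_SYNC from the dma-buf uapi. A sketch (assumes `dmabuf_fd` is a valid exported dma-buf fd; error handling abbreviated):

```c
#include <linux/dma-buf.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Bracket CPU access to an mmap'ed dma-buf with DMA_BUF_IOCTL_SYNC,
 * as the dma-buf uapi requires. Skipping this only happens to work on
 * exporters whose mappings are CPU-coherent. */
static int write_pattern(int dmabuf_fd, size_t size)
{
	struct dma_buf_sync sync = { 0 };
	void *map;

	map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   dmabuf_fd, 0);
	if (map == MAP_FAILED)
		return -1;

	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync)) {
		munmap(map, size);
		return -1;
	}

	memset(map, 0xaa, size);	/* the actual CPU access */

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	munmap(map, size);
	return 0;
}
```

The point in the thread is exactly that nothing forces userspace to call this; it merely "happens to work" on coherent exporters when omitted.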
Re: [PATCH v6 08/22] drm/virtio: Unlock reservations on dma_resv_reserve_fences() error
On 5/27/22 01:50, Dmitry Osipenko wrote: Unlock reservations on dma_resv_reserve_fences() error to fix recursive locking of the reservations when this error happens. Fixes: Cc: sta...@vger.kernel.org With that fixed, Reviewed-by: Thomas Hellström Signed-off-by: Dmitry Osipenko --- drivers/gpu/drm/virtio/virtgpu_gem.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c index 580a78809836..7db48d17ee3a 100644 --- a/drivers/gpu/drm/virtio/virtgpu_gem.c +++ b/drivers/gpu/drm/virtio/virtgpu_gem.c @@ -228,8 +228,10 @@ int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs) for (i = 0; i < objs->nents; ++i) { ret = dma_resv_reserve_fences(objs->objs[i]->resv, 1); - if (ret) + if (ret) { + virtio_gpu_array_unlock_resv(objs); return ret; + } } return ret; }
Re: [PATCH v6 02/22] drm/gem: Move mapping of imported dma-bufs to drm_gem_mmap_obj()
On 5/27/22 01:50, Dmitry Osipenko wrote: Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers don't handle imported dma-bufs properly, which results in mapping of something else than the imported dma-buf. For example, on NVIDIA Tegra we get a hard lockup when userspace writes to the memory mapping of a dma-buf that was imported into Tegra's DRM GEM. To fix this bug, move mapping of imported dma-bufs to drm_gem_mmap_obj(). Now mmapping of imported dma-bufs works properly for all DRM drivers. Same comment about Fixes: as in patch 1, Cc: sta...@vger.kernel.org Signed-off-by: Dmitry Osipenko --- drivers/gpu/drm/drm_gem.c | 3 +++ drivers/gpu/drm/drm_gem_shmem_helper.c | 9 - drivers/gpu/drm/tegra/gem.c| 4 3 files changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index 86d670c71286..7c0b025508e4 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -1038,6 +1038,9 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size, if (obj_size < vma->vm_end - vma->vm_start) return -EINVAL; + if (obj->import_attach) + return dma_buf_mmap(obj->dma_buf, vma, 0); If we start enabling mmapping of imported dma-bufs on a majority of drivers in this way, how do we ensure that user-space is not blindly using the object mmap without calling the needed DMA_BUF_IOCTL_SYNC, which is needed before and after cpu access of mmap'ed dma-bufs? I was under the impression (admittedly without looking) that the few drivers that actually called into dma_buf_mmap() had some private user-mode driver code in place that ensured this happened. /Thomas + /* Take a ref for this mapping of the object, so that the fault * handler can dereference the mmap offset's pointer to the object.
* This reference is cleaned up by the corresponding vm_close diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index 8ad0e02991ca..6190f5018986 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -609,17 +609,8 @@ EXPORT_SYMBOL_GPL(drm_gem_shmem_vm_ops); */ int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct vm_area_struct *vma) { - struct drm_gem_object *obj = &shmem->base; int ret; - if (obj->import_attach) { - /* Drop the reference drm_gem_mmap_obj() acquired.*/ - drm_gem_object_put(obj); - vma->vm_private_data = NULL; - - return dma_buf_mmap(obj->dma_buf, vma, 0); - } - ret = drm_gem_shmem_get_pages(shmem); if (ret) { drm_gem_vm_close(vma); diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c index 7c7dd84e6db8..f92aa20d63bb 100644 --- a/drivers/gpu/drm/tegra/gem.c +++ b/drivers/gpu/drm/tegra/gem.c @@ -564,6 +564,10 @@ int __tegra_gem_mmap(struct drm_gem_object *gem, struct vm_area_struct *vma) { struct tegra_bo *bo = to_tegra_bo(gem); + /* imported dma-buf is mapped by drm_gem_mmap_obj() */ + if (gem->import_attach) + return 0; + if (!bo->pages) { unsigned long vm_pgoff = vma->vm_pgoff; int err;
Re: [Linaro-mm-sig] [PATCH] drm/i915: Remove __dma_fence_is_chain()
On 6/29/22 01:35, Rob Clark wrote: From: Rob Clark drive-by cleanup Signed-off-by: Rob Clark Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/gem/i915_gem_wait.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c index 319936f91ac5..667841780514 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c @@ -73,11 +73,6 @@ static void fence_set_priority(struct dma_fence *fence, rcu_read_unlock(); } -static inline bool __dma_fence_is_chain(const struct dma_fence *fence) -{ - return fence->ops == &dma_fence_chain_ops; -} - void i915_gem_fence_wait_priority(struct dma_fence *fence, const struct i915_sched_attr *attr) { @@ -93,7 +88,7 @@ void i915_gem_fence_wait_priority(struct dma_fence *fence, for (i = 0; i < array->num_fences; i++) fence_set_priority(array->fences[i], attr); - } else if (__dma_fence_is_chain(fence)) { + } else if (dma_fence_is_chain(fence)) { struct dma_fence *iter; /* The chain is ordered; if we boost the last, we boost all */
Re: [PATCH v6 14/22] dma-buf: Introduce new locking convention
On 5/30/22 15:57, Dmitry Osipenko wrote: On 5/30/22 16:41, Christian König wrote: Hi Dmitry, Am 30.05.22 um 15:26 schrieb Dmitry Osipenko: Hello Christian, On 5/30/22 09:50, Christian König wrote: Hi Dmitry, First of all please separate out this patch from the rest of the series, since this is a complex separate structural change. I assume all the patches will go via the DRM tree in the end since the rest of the DRM patches in this series depend on this dma-buf change. But I see that separation may ease reviewing of the dma-buf changes, so let's try it. That sounds like you are underestimating a bit how much trouble this will be. I have tried this before and failed because catching all the locks in the right code paths are very tricky. So expect some fallout from this and make sure the kernel test robot and CI systems are clean. Sure, I'll fix up all the reported things in the next iteration. BTW, have you ever posted yours version of the patch? Will be great if we could compare the changed code paths. No, I never even finished creating it after realizing how much work it would be. This patch introduces new locking convention for dma-buf users. From now on all dma-buf importers are responsible for holding dma-buf reservation lock around operations performed over dma-bufs. This patch implements the new dma-buf locking convention by: 1. Making dma-buf API functions to take the reservation lock. 2. Adding new locked variants of the dma-buf API functions for drivers that need to manage imported dma-bufs under the held lock. Instead of adding new locked variants please mark all variants which expect to be called without a lock with an _unlocked postfix. This should make it easier to remove those in a follow up patch set and then fully move the locking into the importer. Do we really want to move all the locks to the importers? Seems the majority of drivers should be happy with the dma-buf helpers handling the locking for them. Yes, I clearly think so. 3. 
Converting all drivers to the new locking scheme. I have strong doubts that you got all of them. At least radeon and nouveau should grab the reservation lock in their ->attach callbacks somehow. Radeon and Nouveau use gem_prime_import_sg_table() and they take the resv lock already, seems they should be okay (?) You are looking at the wrong side. You need to fix the export code path, not the import ones. See for example attach on radeon works like this drm_gem_map_attach->drm_gem_pin->radeon_gem_prime_pin->radeon_bo_reserve->ttm_bo_reserve->dma_resv_lock. Yeah, I was looking at both sides, but missed this one. Also i915 will run into trouble with attach. In particular since i915 starts a full ww transaction in its attach callback to be able to lock other objects if migration is needed. I think i915 CI would catch this in a selftest. Perhaps it's worthwhile to take a step back and figure out, if the importer is required to lock, which callbacks might need a ww acquire context? (And off-topic: since we do a lot of fancy stuff under dma-resv locks, including waiting for fences and other locks, IMO taking these locks uninterruptibly should ring a warning bell) /Thomas Same for nouveau and probably a few other exporters as well. That will certainly cause a deadlock if you don't fix it. I strongly suggest to do this step by step, first attach/detach and then the rest. Thank you very much for the suggestions. I'll implement them in the next version.
Re: [PATCH v6 01/22] drm/gem: Properly annotate WW context on drm_gem_lock_reservations() error
Hi, On 5/27/22 01:50, Dmitry Osipenko wrote: Use ww_acquire_fini() in the error code paths. Otherwise lockdep thinks that lock is held when lock's memory is freed after the drm_gem_lock_reservations() error. The WW needs to be annotated as "freed" s /WW/ww_acquire_context/ ? s /"freed"/"released"/ ? , which fixes the noisy "WARNING: held lock freed!" splat of VirtIO-GPU driver with CONFIG_DEBUG_MUTEXES=y and enabled lockdep. Cc: sta...@vger.kernel.org Can you dig up the commit in error and add a Fixes: Tag? Using that and "dim fixes" will also make the Cc: stable tag a bit more verbose. With that fixed, Reviewed-by: Thomas Hellström Signed-off-by: Dmitry Osipenko --- drivers/gpu/drm/drm_gem.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index eb0c2d041f13..86d670c71286 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -1226,7 +1226,7 @@ drm_gem_lock_reservations(struct drm_gem_object **objs, int count, ret = dma_resv_lock_slow_interruptible(obj->resv, acquire_ctx); if (ret) { - ww_acquire_done(acquire_ctx); + ww_acquire_fini(acquire_ctx); return ret; } } @@ -1251,7 +1251,7 @@ drm_gem_lock_reservations(struct drm_gem_object **objs, int count, goto retry; } - ww_acquire_done(acquire_ctx); + ww_acquire_fini(acquire_ctx); return ret; } }
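The distinction the fix relies on: ww_acquire_done() only marks the end of the acquire phase (the locks stay held), while ww_acquire_fini() releases the lockdep annotation and must be reached before the context's memory goes away. A simplified lifecycle sketch — EDEADLK backoff handling omitted for brevity, and this is not the actual drm_gem helper code:

```c
/* Simplified ww_acquire_ctx lifecycle: error paths must reach
 * ww_acquire_fini() before the on-stack context goes out of scope,
 * otherwise lockdep warns "WARNING: held lock freed!". */
static int example_lock_two(struct ww_mutex *a, struct ww_mutex *b)
{
	struct ww_acquire_ctx ctx;
	int ret;

	ww_acquire_init(&ctx, &reservation_ww_class);

	ret = ww_mutex_lock_interruptible(a, &ctx);
	if (ret)
		goto out_fini;			/* fini, not done */
	ret = ww_mutex_lock_interruptible(b, &ctx);
	if (ret) {
		ww_mutex_unlock(a);
		goto out_fini;
	}
	ww_acquire_done(&ctx);	/* acquire phase over; locks still held */

	/* ... use the locked objects ... */

	ww_mutex_unlock(b);
	ww_mutex_unlock(a);
out_fini:
	ww_acquire_fini(&ctx);	/* lockdep annotation released */
	return ret;
}
```

Calling ww_acquire_done() on an error path, as the buggy code did, leaves lockdep believing the acquire context is still live when its stack memory is reused.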
Re: [Intel-gfx] [PATCH 06/10] drm/i915/uapi: add NEEDS_CPU_ACCESS hint
On 5/25/22 20:43, Matthew Auld wrote: If set, force the allocation to be placed in the mappable portion of I915_MEMORY_CLASS_DEVICE. One big restriction here is that system memory (i.e I915_MEMORY_CLASS_SYSTEM) must be given as a potential placement for the object, that way we can always spill the object into system memory if we can't make space. Testcase: igt@gem-create@create-ext-cpu-access-sanity-check Testcase: igt@gem-create@create-ext-cpu-access-big Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 26 ++--- include/uapi/drm/i915_drm.h| 61 +++--- 2 files changed, 71 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index d094cae0ddf1..33673fe7ee0a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -241,6 +241,7 @@ struct create_ext { struct drm_i915_private *i915; struct intel_memory_region *placements[INTEL_REGION_UNKNOWN]; unsigned int n_placements; + unsigned int placement_mask; unsigned long flags; }; @@ -337,6 +338,7 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args, for (i = 0; i < args->num_regions; i++) ext_data->placements[i] = placements[i]; + ext_data->placement_mask = mask; return 0; out_dump: @@ -411,7 +413,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data, struct drm_i915_gem_object *obj; int ret; - if (args->flags) + if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) return -EINVAL; ret = i915_user_extensions(u64_to_user_ptr(args->extensions), @@ -427,13 +429,21 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data, ext_data.n_placements = 1; } - /* -* TODO: add a userspace hint to force CPU_ACCESS for the object, which -* can override this. 
-*/ - if (ext_data.n_placements > 1 || - ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM) - ext_data.flags |= I915_BO_ALLOC_GPU_ONLY; + if (args->flags & I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) { + if (ext_data.n_placements == 1) + return -EINVAL; + + /* +* We always need to be able to spill to system memory, if we +* can't place in the mappable part of LMEM. +*/ + if (!(ext_data.placement_mask & BIT(INTEL_REGION_SMEM))) + return -EINVAL; + } else { + if (ext_data.n_placements > 1 || + ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM) + ext_data.flags |= I915_BO_ALLOC_GPU_ONLY; + } obj = __i915_gem_object_create_user_ext(i915, args->size, ext_data.placements, diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e30f31a440b3..5b0a10e6a1b8 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -3366,11 +3366,11 @@ struct drm_i915_query_memory_regions { * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added * extension support using struct i915_user_extension. * - * Note that in the future we want to have our buffer flags here, at least for - * the stuff that is immutable. Previously we would have two ioctls, one to - * create the object with gem_create, and another to apply various parameters, - * however this creates some ambiguity for the params which are considered - * immutable. Also in general we're phasing out the various SET/GET ioctls. + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. */ struct drm_i915_gem_create_ext { /** @@ -3378,7 +3378,6 @@ struct drm_i915_gem_create_ext { * * The (page-aligned) allocated size for the object will be returned. 
* -* * DG2 64K min page size implications: * * On discrete platforms, starting from DG2, we have to contend with GTT @@ -3390,7 +3389,9 @@ struct drm_i915_gem_create_ext { * * Note that the returned size here will always reflect any required * rounding up done by the kernel, i.e 4K will now become 64K on devices -* such as DG2. +* such as DG2. The kernel will always select the
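The placement rules this hunk enforces are easy to capture as a pure function: unknown flags are rejected, and a NEEDS_CPU_ACCESS object must list more than one placement with system memory among them so it can always spill. A minimal sketch with local stand-in constants (the real flag and region bit values live in the uapi headers, not here):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins; the numeric values are arbitrary for this sketch,
 * not the kernel's actual I915_GEM_CREATE_EXT_FLAG_* / BIT(INTEL_REGION_*)
 * values. */
#define NEEDS_CPU_ACCESS (1u << 0)
#define BIT_SMEM         (1u << 0)

/* Returns 0 on success, -22 (-EINVAL) on rejection, mirroring the hunk. */
static int check_create_ext_flags(uint32_t flags, unsigned int n_placements,
                                  uint32_t placement_mask)
{
	if (flags & ~NEEDS_CPU_ACCESS)
		return -22;
	if (!(flags & NEEDS_CPU_ACCESS))
		return 0;
	/* A CPU-access object needs somewhere to spill: more than one
	 * placement, and system memory must be among them. */
	if (n_placements == 1)
		return -22;
	if (!(placement_mask & BIT_SMEM))
		return -22;
	return 0;
}
```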
Re: [Intel-gfx] [PATCH 07/10] drm/i915/error: skip non-mappable pages
On 5/25/22 20:43, Matthew Auld wrote: Skip capturing any lmem pages that can't be copied using the CPU. This is now only best effort on platforms that have small BAR. Testcase: igt@gem-exec-capture@capture-invisible Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Tvrtko Ursulin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin --- drivers/gpu/drm/i915/i915_gpu_error.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 0512c66fa4f3..77df899123c2 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1116,11 +1116,15 @@ i915_vma_coredump_create(const struct intel_gt *gt, dma_addr_t dma; for_each_sgt_daddr(dma, iter, vma_res->bi.pages) { + dma_addr_t offset = dma - mem->region.start; void __iomem *s; - s = io_mapping_map_wc(>iomap, - dma - mem->region.start, - PAGE_SIZE); + if (offset + PAGE_SIZE > mem->io_size) { + ret = -EINVAL; + break; + } + + s = io_mapping_map_wc(>iomap, offset, PAGE_SIZE); ret = compress_page(compress, (void __force *)s, dst, true); Reviewed-by: Thomas Hellström
Re: [Intel-gfx] [PATCH 08/10] drm/i915/uapi: disable capturing objects on recoverable contexts
On 5/25/22 20:43, Matthew Auld wrote: A non-recoverable context must be used if the user wants proper error capture on discrete platforms. In the future the kernel may want to blit the contents of some objects when later doing the capture stage. Testcase: igt@gem_exec_capture@capture-recoverable-discrete Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Tvrtko Ursulin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index b279588c0672..e27ccfa50dc3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1961,7 +1961,7 @@ eb_find_first_request_added(struct i915_execbuffer *eb) #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR) /* Stage with GFP_KERNEL allocations before we enter the signaling critical path */ -static void eb_capture_stage(struct i915_execbuffer *eb) +static int eb_capture_stage(struct i915_execbuffer *eb) { const unsigned int count = eb->buffer_count; unsigned int i = count, j; @@ -1974,6 +1974,10 @@ static void eb_capture_stage(struct i915_execbuffer *eb) if (!(flags & EXEC_OBJECT_CAPTURE)) continue; + if (i915_gem_context_is_recoverable(eb->gem_context) && + IS_DGFX(eb->i915)) + return -EINVAL; + We should also require this for future integrated, for capture buffer memory allocation purposes. Otherwise Reviewed-by: Thomas Hellström
Re: [Intel-gfx] [PATCH 01/10] drm/doc: add rfc section for small BAR uapi
On 5/25/22 20:43, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) v3: - Drop the vma query for now. - Add unallocated_cpu_visible_size as part of the region query. - Improve the docs some more, including documenting the expected behaviour on older kernels, since this came up in some offline discussion. v4: - Various improvements all over. (Tvrtko) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Tvrtko Ursulin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org Acked-by: Tvrtko Ursulin Acked-by: Akeem G Abodunrin --- Documentation/gpu/rfc/i915_small_bar.h | 189 +++ Documentation/gpu/rfc/i915_small_bar.rst | 47 ++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 240 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..752bb2ceb399 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,189 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at _i915_query_item.query_id. 
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** +* @probed_size: Memory probed by the driver (-1 = unknown) +* +* Note that it should not be possible to ever encounter a zero value +* here, also note that no current region type will ever return -1 here. +* Although for future region types, this might be a possibility. The +* same applies to the other size fields. +*/ + __u64 probed_size; + + /** +* @unallocated_size: Estimate of memory remaining (-1 = unknown) +* +* Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. +* Without this (or if this is an older kernel) the value here will +* always equal the @probed_size. Note this is only currently tracked +* for I915_MEMORY_CLASS_DEVICE regions (for other types the value here +* will always equal the @probed_size). +*/ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** +* @probed_cpu_visible_size: Memory probed by the driver +* that is CPU accessible. (-1 = unknown). +* +* This will be always be <= @probed_size, and the +* remainder (if there is any) will not be CPU +* accessible. +* +* On systems without small BAR, the @probed_size will +* always equal the @probed_cpu_visible_size, since all +* of it will be CPU accessible. +* +* Note this is only tracked for +* I915_MEMORY_CLASS_DEVICE regions (for other types the +* value here will always equal the @probed_size). +* +* Note that if the value returned here is zero, then +* this must be an old kernel which lacks the relevant +* small-bar uAPI support (including +* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on +* such systems we should never actually end up with a +* small BAR configuration, assuming we are able to load +* the kernel module. Hence it should be safe to treat +* this the same as when @probed_cpu_visible_size == +* @probed_size. 
+*/ + __u64 probed_cpu_visible_size; + + /** +* @unallocated_cpu_visible_size: Estimate of CPU +* visible memory remaining (-1 = unknown). +* +* Note this is only tracked for +* I915_MEMORY_CLASS_DEVICE
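How userspace might consume @probed_cpu_visible_size, including the documented old-kernel case where the field reads back as zero, can be sketched as a tiny helper. This is a hedged illustration of the doc's semantics, not actual query code from any library:

```c
#include <assert.h>
#include <stdint.h>

/* Per the uapi doc above: a zero @probed_cpu_visible_size means an old
 * kernel without the small-BAR uAPI, which is safe to treat the same as
 * a fully CPU-visible region (== @probed_size). Otherwise the field is
 * authoritative and is always <= @probed_size. */
static uint64_t effective_cpu_visible(uint64_t probed_size,
                                      uint64_t probed_cpu_visible_size)
{
	if (probed_cpu_visible_size == 0)
		return probed_size;
	return probed_cpu_visible_size;
}
```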
Re: [PATCH 1/2] drm/vmwgfx: Remove unused hugepage support
On 4/25/22 22:31, Zack Rusin wrote: From: Zack Rusin There's no point in explicitly trying to align virtual memory to facilitate huge page table entries or huge page memory in buffer objects given that they're not being used. Transparent hugepages support for vram allocations has been gradually retired over the last two years making alignment of unmapped areas unneeded and pointless. Signed-off-by: Zack Rusin For the series: Reviewed-by: Thomas Hellström --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 13 - 1 file changed, 13 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index 5dc02fd806db..45028e25d490 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -1423,18 +1423,6 @@ static void vmw_debugfs_resource_managers_init(struct vmw_private *vmw) root, "system_mob_ttm"); } -static unsigned long -vmw_get_unmapped_area(struct file *file, unsigned long uaddr, - unsigned long len, unsigned long pgoff, - unsigned long flags) -{ - struct drm_file *file_priv = file->private_data; - struct vmw_private *dev_priv = vmw_priv(file_priv->minor->dev); - - return drm_get_unmapped_area(file, uaddr, len, pgoff, flags, -dev_priv->drm.vma_offset_manager); -} - static int vmwgfx_pm_notifier(struct notifier_block *nb, unsigned long val, void *ptr) { @@ -1601,7 +1589,6 @@ static const struct file_operations vmwgfx_driver_fops = { .compat_ioctl = vmw_compat_ioctl, #endif .llseek = noop_llseek, - .get_unmapped_area = vmw_get_unmapped_area, }; static const struct drm_driver driver = {
Re: [Intel-gfx] [PATCH 2/4] drm/i915/selftests: Check for incomplete LRI from the context image
On 3/14/22 19:20, Ramalingam C wrote: From: Chris Wilson In order to keep the context image parser simple, we assume that all commands follow a similar format. A few, especially not MI commands on the render engines, have fixed lengths not encoded in a length field. This caused us to incorrectly skip over 3D state commands, and start interpretting context data as instructions. Eventually, as Daniele interpreting discovered, this would lead us to find addition LRI as part of the data and mistakenly add invalid LRI commands to the context probes. Stop parsing after we see the first !MI command, as we know we will have seen all the context registers by that point. (Mostly true for all gen so far, though the render context does have LRI after the first page that we have been ignoring so far. It would be useful to extract those as well so that we have the full list of user accesisble registers.) accessible Similarly, emit a warning if we do try to emit an invalid zero-length LRI. Reported-by: Daniele Ceraolo Spurio Signed-off-by: Chris Wilson Cc: Daniele Ceraolo Spurio Signed-off-by: Ramalingam C Acked-by: Thomas Hellström --- drivers/gpu/drm/i915/gt/selftest_lrc.c | 61 +++--- 1 file changed, 54 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 13f57c7c4224..0a8ed4246082 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -27,6 +27,9 @@ #define NUM_GPR 16 #define NUM_GPR_DW (NUM_GPR * 2) /* each GPR is 2 dwords */ +#define LRI_HEADER MI_INSTR(0x22, 0) +#define LRI_LENGTH_MASK GENMASK(7, 0) + static struct i915_vma *create_scratch(struct intel_gt *gt) { return __vm_create_scratch_for_read_pinned(>ggtt->vm, PAGE_SIZE); @@ -180,7 +183,7 @@ static int live_lrc_layout(void *arg) continue; } - if ((lri & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) { + if ((lri & GENMASK(31, 23)) != LRI_HEADER) { pr_err("%s: Expected LRI command at dword %d, found %08x\n", 
engine->name, dw, lri); err = -EINVAL; @@ -945,18 +948,40 @@ store_context(struct intel_context *ce, struct i915_vma *scratch) hw = defaults; hw += LRC_STATE_OFFSET / sizeof(*hw); do { - u32 len = hw[dw] & 0x7f; + u32 len = hw[dw] & LRI_LENGTH_MASK; + + /* +* Keep it simple, skip parsing complex commands +* +* At present, there are no more MI_LOAD_REGISTER_IMM +* commands after the first 3D state command. Rather +* than include a table (see i915_cmd_parser.c) of all +* the possible commands and their instruction lengths +* (or mask for variable length instructions), assume +* we have gathered the complete list of registers and +* bail out. +*/ + if ((hw[dw] >> INSTR_CLIENT_SHIFT) != INSTR_MI_CLIENT) + break; if (hw[dw] == 0) { dw++; continue; } - if ((hw[dw] & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) { + if ((hw[dw] & GENMASK(31, 23)) != LRI_HEADER) { + /* Assume all other MI commands match LRI length mask */ dw += len + 2; continue; } + if (!len) { + pr_err("%s: invalid LRI found in context image\n", + ce->engine->name); + igt_hexdump(defaults, PAGE_SIZE); + break; + } + dw++; len = (len + 1) / 2; while (len--) { @@ -1108,18 +1133,29 @@ static struct i915_vma *load_context(struct intel_context *ce, u32 poison) hw = defaults; hw += LRC_STATE_OFFSET / sizeof(*hw); do { - u32 len = hw[dw] & 0x7f; + u32 len = hw[dw] & LRI_LENGTH_MASK; + + /* For simplicity, break parsing at the first complex command */ + if ((hw[dw] >> INSTR_CLIENT_SHIFT) != INSTR_MI_CLIENT) + break; if (hw[dw] == 0) { dw++; continue; } - if ((hw[dw] & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) { + if ((hw[dw] & GENMASK(31, 23)) != LRI_HEADER) { dw += len + 2; continue; } + if (!len) { + pr_err("%s: invalid LRI found in context image\n", + ce->engine->name); + igt_hexdump(defaults, PAGE_SIZE); +
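The parsing rules the patch settles on (stop at the first non-MI command, recognize the LRI header through GENMASK(31, 23), and treat a zero-length LRI as invalid) can be sketched as a small classifier. The masks mirror the patch; the helper names and enum are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define GENMASK32(h, l)    (((~0u) >> (31 - (h))) & ((~0u) << (l)))
#define LRI_HEADER         (0x22u << 23)   /* MI_INSTR(0x22, 0) */
#define LRI_LENGTH_MASK    GENMASK32(7, 0)
#define INSTR_CLIENT_SHIFT 29
#define INSTR_MI_CLIENT    0u

enum ctx_cmd { CMD_NOOP, CMD_LRI, CMD_OTHER_MI, CMD_NON_MI };

/* Length field as the selftest reads it (only meaningful for MI commands). */
static uint32_t lri_len(uint32_t dw)
{
	return dw & LRI_LENGTH_MASK;
}

static enum ctx_cmd classify(uint32_t dw)
{
	if ((dw >> INSTR_CLIENT_SHIFT) != INSTR_MI_CLIENT)
		return CMD_NON_MI;  /* first non-MI command: stop parsing */
	if (dw == 0)
		return CMD_NOOP;
	if ((dw & GENMASK32(31, 23)) == LRI_HEADER)
		return CMD_LRI;
	return CMD_OTHER_MI;
}
```

A zero-length LRI (header with an all-zero length field) still classifies as CMD_LRI here; the patch's extra check is that `lri_len()` of such a dword is 0, which it reports as a corrupt context image.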
Re: [Intel-gfx] [PATCH v9 4/9] drm/i915/gt: Pass the -EINVAL when emit_pte doesn't update any PTE
On 4/5/22 17:08, Ramalingam C wrote: When emit_pte() doesn't update any PTE and returns 0, interpret it as -EINVAL. v2: Add missing goto [Thomas] Signed-off-by: Ramalingam C --- drivers/gpu/drm/i915/gt/intel_migrate.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index e0f1c727662e..6378d4450e1a 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -577,7 +577,11 @@ intel_context_migrate_copy(struct intel_context *ce, len = emit_pte(rq, _src, src_cache_level, src_is_lmem, src_offset, CHUNK_SZ); if (len <= 0) { err = len ? len : -EINVAL; goto out_rq; } ? Up to you. Reviewed-by: Thomas Hellström - if (len <= 0) { + if (!len) { + err = -EINVAL; + goto out_rq; + } + if (len < 0) { err = len; goto out_rq; }
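The two error-handling shapes under discussion (the reviewer's one-liner versus the patch's two branches) are equivalent for every len <= 0, which is the only range the surrounding `if` reaches. A quick sanity sketch, with invented names and -22 standing in for -EINVAL:

```c
#include <assert.h>

#define SKETCH_EINVAL (-22)

/* The reviewer's suggested one-liner. */
static int err_oneliner(int len)
{
	return len ? len : SKETCH_EINVAL;
}

/* The patch's two-branch version; len < 0 propagates unchanged. */
static int err_twobranch(int len)
{
	if (!len)
		return SKETCH_EINVAL;
	return len;
}
```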
Re: [Intel-gfx] [PATCH v7 5/9] drm/i915/selftest_migrate: Consider the possible roundup of size
On 3/28/22 21:07, Ramalingam C wrote: Consider the possible round-up of the object size due to alignment to min_page_size during object allocation. Signed-off-by: Ramalingam C Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/gt/selftest_migrate.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c index c9c4f391c5cc..b5da8b8cd039 100644 --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c @@ -152,6 +152,9 @@ static int clear(struct intel_migrate *migrate, if (IS_ERR(obj)) return 0; + /* Consider the rounded up memory too */ + sz = obj->base.size; + for_i915_gem_ww(, err, true) { err = i915_gem_object_lock(obj, ); if (err)
Re: [Intel-gfx] [PATCH v5 6/9] drm/i915/gt: offset handling for multiple copy engines
On 3/21/22 23:44, Ramalingam C wrote: Handle the src and dst chunk offsets for different instances of the copy engines. Signed-off-by: Ramalingam C --- drivers/gpu/drm/i915/gt/intel_migrate.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 39a5f8ae664d..5f6341f91622 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -614,6 +614,9 @@ static int emit_copy(struct i915_request *rq, u32 instance = rq->engine->instance; u32 *cs; + src_offset += (u64)rq->engine->instance << 32; + dst_offset += (u64)rq->engine->instance << 32; + Again, these are nops since the offsets are 32-bit. Also the instance selection is already handled in the function, so I think this patch can be dropped. cs = intel_ring_begin(rq, ver >= 8 ? 10 : 6); if (IS_ERR(cs)) return PTR_ERR(cs);
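The review's point, that adding `(u64)instance << 32` to a 32-bit offset is a no-op, is easy to demonstrate in isolation: the shifted bits vanish in the implicit conversion back to u32. A standalone sketch (illustrative helper, not driver code):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors "src_offset += (u64)rq->engine->instance << 32" where
 * src_offset is a u32: the addition happens in 64 bits, but the result
 * is truncated to 32 bits on assignment, so nothing changes. */
static uint32_t add_instance_bits(uint32_t offset, uint32_t instance)
{
	offset += (uint64_t)instance << 32;
	return offset;
}
```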
Re: [Intel-gfx] [PATCH v5 3/9] drm/i915/gt: Clear compress metadata for Flat-ccs objects
Hi, Ram On 3/21/22 23:44, Ramalingam C wrote: Xe-HP and latest devices support Flat CCS which reserved a portion of the device memory to store compression metadata, during the clearing of device memory buffer object we also need to clear the associated CCS buffer. XY_CTRL_SURF_COPY_BLT is a BLT cmd used for reading and writing the ccs surface of a lmem memory. So on Flat-CCS capable platform we use XY_CTRL_SURF_COPY_BLT to clear the CCS meta data. v2: Fixed issues with platform naming [Lucas] v3: Rebased [Ram] Used the round_up funcs [Bob] v4: Fixed ccs blk calculation [Ram] Added Kdoc on flat-ccs. v5: GENMASK is used [Matt] mocs fix [Matt] Comments Fix [Matt] Flush address programming [Ram] v6: FLUSH_DW is fixed Few coding style fix v7: Adopting the XY_FAST_COLOR_BLT (Thomas] v8: XY_CTRL_SURF_COPY_BLT for ccs clearing. v9: emit_copy_ccs is used. Signed-off-by: Ramalingam C Signed-off-by: Ayaz A Siddiqui --- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 15 ++ drivers/gpu/drm/i915/gt/intel_migrate.c | 164 ++- 2 files changed, 175 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h index 925e55b6a94f..6b4eb7927ec7 100644 --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h @@ -153,8 +153,10 @@ #define MI_FLUSH_DW_PROTECTED_MEM_EN(1 << 22) #define MI_FLUSH_DW_STORE_INDEX (1<<21) #define MI_INVALIDATE_TLB (1<<18) +#define MI_FLUSH_DW_CCS (1<<16) #define MI_FLUSH_DW_OP_STOREDW (1<<14) #define MI_FLUSH_DW_OP_MASK (3<<14) +#define MI_FLUSH_DW_LLC (1<<9) #define MI_FLUSH_DW_NOTIFY (1<<8) #define MI_INVALIDATE_BSD (1<<7) #define MI_FLUSH_DW_USE_GTT (1<<2) @@ -203,6 +205,19 @@ #define GFX_OP_DRAWRECT_INFO ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3)) #define GFX_OP_DRAWRECT_INFO_I965 ((0x7900<<16)|0x2) +#define XY_CTRL_SURF_INSTR_SIZE 5 +#define MI_FLUSH_DW_SIZE 3 +#define XY_CTRL_SURF_COPY_BLT ((2 << 29) | (0x48 << 22) | 3) +#define 
SRC_ACCESS_TYPE_SHIFT21 +#define DST_ACCESS_TYPE_SHIFT20 +#define CCS_SIZE_MASKGENMASK(17, 8) +#define XY_CTRL_SURF_MOCS_MASK GENMASK(31, 25) +#define NUM_CCS_BYTES_PER_BLOCK 256 +#define NUM_BYTES_PER_CCS_BYTE 256 +#define NUM_CCS_BLKS_PER_XFER1024 +#define INDIRECT_ACCESS 0 +#define DIRECT_ACCESS1 + #define COLOR_BLT_CMD (2 << 29 | 0x40 << 22 | (5 - 2)) #define XY_COLOR_BLT_CMD (2 << 29 | 0x50 << 22) #define XY_FAST_COLOR_BLT_CMD (2 << 29 | 0x44 << 22) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index b656685a486d..39a5f8ae664d 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -16,7 +16,8 @@ struct insert_pte_data { }; #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */ - +#define GET_CCS_BYTES(i915, size) (HAS_FLAT_CCS(i915) ? \ +DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0) static bool engine_supports_migration(struct intel_engine_cs *engine) { if (!engine) @@ -467,6 +468,145 @@ static bool wa_1209644611_applies(int ver, u32 size) return height % 4 == 3 && height <= 8; } +/** + * DOC: Flat-CCS - Memory compression for Local memory + * + * On Xe-HP and later devices, we use dedicated compression control state (CCS) + * stored in local memory for each surface, to support the 3D and media + * compression formats. + * + * The memory required for the CCS of the entire local memory is 1/256 of the + * local memory size. So before the kernel boot, the required memory is reserved + * for the CCS data and a secure register will be programmed with the CCS base + * address. + * + * Flat CCS data needs to be cleared when a lmem object is allocated. + * And CCS data can be copied in and out of CCS region through + * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly. 
+ * + * When we exhaust the lmem, if the object's placements support smem, then we can + * directly decompress the compressed lmem object into smem and start using it + * from smem itself. + * + * But when we need to swapout the compressed lmem object into a smem region + * though objects' placement doesn't support smem, then we copy the lmem content + * as it is into smem region along with ccs data (using XY_CTRL_SURF_COPY_BLT). + * When the object is referred, lmem content will be swaped in along with + * restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at corresponding + * location. + */ + +static inline u32 *i915_flush_dw(u32 *cmd, u32 flags) +{ + *cmd++ = MI_FLUSH_DW |
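The 1/256 sizing rule described in the Flat-CCS DOC comment corresponds to the patch's GET_CCS_BYTES() macro: one CCS byte covers NUM_BYTES_PER_CCS_BYTE (256) bytes of local memory, and platforms without flat CCS need none. A standalone sketch with DIV_ROUND_UP reimplemented locally:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BYTES_PER_CCS_BYTE 256
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Mirrors GET_CCS_BYTES(i915, size) from the patch, with the
 * HAS_FLAT_CCS() check reduced to a plain flag for illustration. */
static uint64_t get_ccs_bytes(int has_flat_ccs, uint64_t size)
{
	return has_flat_ccs ? DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0;
}
```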
Re: [Intel-gfx] [PATCH v5 2/9] drm/i915/gt: Optimize the migration and clear loop
On 3/21/22 23:44, Ramalingam C wrote: Move the static calculations out of the loops for copy and clear. Signed-off-by: Ramalingam C Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/gt/intel_migrate.c | 44 - 1 file changed, 21 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 73199ebf0671..b656685a486d 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -526,6 +526,7 @@ intel_context_migrate_copy(struct intel_context *ce, struct i915_request **out) { struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst); + u32 src_offset, dst_offset; struct i915_request *rq; int err; @@ -534,8 +535,20 @@ intel_context_migrate_copy(struct intel_context *ce, GEM_BUG_ON(ce->ring->size < SZ_64K); + src_offset = 0; + dst_offset = CHUNK_SZ; + if (HAS_64K_PAGES(ce->engine->i915)) { + GEM_BUG_ON(!src_is_lmem && !dst_is_lmem); + + src_offset = 0; + dst_offset = 0; + if (src_is_lmem) + src_offset = CHUNK_SZ; + if (dst_is_lmem) + dst_offset = 2 * CHUNK_SZ; + } + do { - u32 src_offset, dst_offset; int len; rq = i915_request_create(ce); @@ -563,19 +576,6 @@ intel_context_migrate_copy(struct intel_context *ce, if (err) goto out_rq; - src_offset = 0; - dst_offset = CHUNK_SZ; - if (HAS_64K_PAGES(ce->engine->i915)) { - GEM_BUG_ON(!src_is_lmem && !dst_is_lmem); - - src_offset = 0; - dst_offset = 0; - if (src_is_lmem) - src_offset = CHUNK_SZ; - if (dst_is_lmem) - dst_offset = 2 * CHUNK_SZ; - } - len = emit_pte(rq, _src, src_cache_level, src_is_lmem, src_offset, CHUNK_SZ); if (len <= 0) { @@ -585,12 +585,10 @@ intel_context_migrate_copy(struct intel_context *ce, err = emit_pte(rq, _dst, dst_cache_level, dst_is_lmem, dst_offset, len); - if (err < 0) - goto out_rq; - if (err < len) { + if (err < len) err = -EINVAL; + if (err < 0) goto out_rq; - } err = rq->engine->emit_flush(rq, EMIT_INVALIDATE); if (err) @@ -694,6 +692,7 @@ intel_context_migrate_clear(struct 
intel_context *ce, { struct sgt_dma it = sg_sgt(sg); struct i915_request *rq; + u32 offset; int err; GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm); @@ -701,8 +700,11 @@ intel_context_migrate_clear(struct intel_context *ce, GEM_BUG_ON(ce->ring->size < SZ_64K); + offset = 0; + if (HAS_64K_PAGES(ce->engine->i915) && is_lmem) + offset = CHUNK_SZ; + do { - u32 offset; int len; rq = i915_request_create(ce); @@ -730,10 +732,6 @@ intel_context_migrate_clear(struct intel_context *ce, if (err) goto out_rq; - offset = 0; - if (HAS_64K_PAGES(ce->engine->i915) && is_lmem) - offset = CHUNK_SZ; - len = emit_pte(rq, , cache_level, is_lmem, offset, CHUNK_SZ); if (len <= 0) { err = len;
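The hoisted offset selection is loop-invariant, which is the whole point of the patch: on 64K-page platforms the source window sits at CHUNK_SZ when the source is lmem and the destination window at 2 * CHUNK_SZ when the destination is lmem, otherwise at 0 / CHUNK_SZ. Sketched as pure helpers (illustrative names; CHUNK_SZ as in the driver):

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SZ (8u << 20) /* SZ_8M, as in intel_migrate.c */

static uint32_t copy_src_offset(int has_64k_pages, int src_is_lmem)
{
	if (!has_64k_pages)
		return 0;
	return src_is_lmem ? CHUNK_SZ : 0;
}

static uint32_t copy_dst_offset(int has_64k_pages, int dst_is_lmem)
{
	if (!has_64k_pages)
		return CHUNK_SZ;
	return dst_is_lmem ? 2 * CHUNK_SZ : 0;
}
```

Since neither result depends on anything computed inside the request loop, evaluating them once before `do { ... } while` is behavior-preserving.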
Re: [Intel-gfx] [PATCH v3 2/3] drm/i915/gem: Remove logic for wbinvd_on_all_cpus
Hi, Michael, others. This is the response from Linus last time we copied that already pre-existing wbinvd_on_all_cpus() macro to another place in the driver: https://lists.freedesktop.org/archives/dri-devel/2021-November/330928.html My first interpretation of this is that even if there currently are similar patterns in drm_cache.c, we shouldn't introduce more, encouraging other drivers to use incoherent IO. Other than that I think we should move whatever wbinvds we can over to the ranged versions, unless that is proven to be a performance drop. Finally for any wbinvds left in our driver, ensure that they are never executed for any gpu where we provide full coherency. That is all discrete gpus (and to be discussed integrated gpus moving forward). Might be that drm maintainers want to chime in here with other views. Thanks, Thomas On 3/15/22 17:59, Michael Cheng wrote: +Daniel for additional feedback! On 2022-03-14 4:06 p.m., Michael Cheng wrote: On 2022-03-08 10:58 a.m., Lucas De Marchi wrote: On Tue, Feb 22, 2022 at 08:24:31PM +0100, Thomas Hellström (Intel) wrote: Hi, Michael, On 2/22/22 18:26, Michael Cheng wrote: This patch removes logic for wbinvd_on_all_cpus and brings in drm_cache.h. This header has the logic that outputs a warning when wbinvd_on_all_cpus when its being used on a non-x86 platform. Signed-off-by: Michael Cheng Linus has been pretty clear that he won't accept patches that add macros that works on one arch and warns on others anymore in i915 and I figure even less so in drm code. So we shouldn't try to move this out to drm. Instead we should restrict the wbinvd() inside our driver to integrated and X86 only. For discrete on all architectures we should be coherent and hence not be needing wbinvd(). the warn is there to guarantee we don't forget a code path. However simply adding the warning is the real issue: we should rather guarantee we can't take that code path. 
I.e., as you said refactor the code to guarantee it works on discrete without that logic. $ git grep wbinvd_on_all_cpus -- drivers/gpu/drm/ drivers/gpu/drm/drm_cache.c: if (wbinvd_on_all_cpus()) drivers/gpu/drm/drm_cache.c: if (wbinvd_on_all_cpus()) drivers/gpu/drm/drm_cache.c: if (wbinvd_on_all_cpus()) drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c: * Currently we just do a heavy handed wbinvd_on_all_cpus() here since drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c: wbinvd_on_all_cpus(); It looks like we actually go through this on other discrete graphics. Is this missing an update like s/IS_DG1/IS_DGFX/? Or should we be doing something else? drivers/gpu/drm/i915/gem/i915_gem_pm.c:#define wbinvd_on_all_cpus() \ drivers/gpu/drm/i915/gem/i915_gem_pm.c: wbinvd_on_all_cpus(); Those are for suspend. Revert ac05a22cd07a ("drm/i915/gem: Almagamate clflushes on suspend") or extract that part to a helper function and implement it differently for arches != x86? drivers/gpu/drm/i915/gem/i915_gem_pm.c: wbinvd_on_all_cpus(); Probably take a similar approach to the suspend case? drivers/gpu/drm/i915/gt/intel_ggtt.c: wbinvd_on_all_cpus(); For a helper function, I have a #define for all non x86 architecture that gives a warn on [1] within drm_cache.h Or would it be better to implement a helper function instead? [1]. https://patchwork.freedesktop.org/patch/475750/?series=1=5 This one comes from 64b95df91f44 ("drm/i915: Assume exclusive access to objects inside resume") Shouldn't that be doing the invalidate if the write domain is I915_GEM_DOMAIN_CPU In the end I think the warning would be ok if it was the cherry on top, to guarantee we don't take those paths. We should probably have a warn_once() to avoid spamming the console. But we also have to rework the code to guarantee we are the only ones who may eventually get that warning, and not the end user. Could we first add the helper function/#define for now, and rework the code in a different patch series? Lucas De Marchi Thanks, /Thomas
Re: [Intel-gfx] [PATCH] drm/i915/gtt: reduce overzealous alignment constraints for GGTT
On 3/3/22 11:02, Matthew Auld wrote: Currently this will enforce both 2M alignment and padding for any LMEM pages inserted into the GGTT. However, this was only meant to be applied to the compact-pt layout with the ppGTT. For the GGTT we can reduce the alignment and padding to 64K. Bspec: 45015 Fixes: 87bd701ee268 ("drm/i915: enforce min GTT alignment for discrete cards") Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Robert Beckett Cc: Ramalingam C Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/gt/intel_gtt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 4bcdfcab3642..a5f5b2dda332 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -234,7 +234,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass) memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT, ARRAY_SIZE(vm->min_alignment)); - if (HAS_64K_PAGES(vm->i915) && NEEDS_COMPACT_PT(vm->i915)) { + if (HAS_64K_PAGES(vm->i915) && NEEDS_COMPACT_PT(vm->i915) && + subclass == VM_CLASS_PPGTT) { vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M; vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M; } else if (HAS_64K_PAGES(vm->i915)) {
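The alignment selection after this fix can be summarized as a small decision function. A sketch only: the 4K fallback is an assumption about I915_GTT_MIN_ALIGNMENT, not something stated in the patch:

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K  4096u
#define SZ_64K (64u << 10)
#define SZ_2M  (2u << 20)

/* After the fix, the 2M constraint applies only to the compact-pt ppGTT;
 * the GGTT on 64K-page platforms drops back to 64K. */
static uint32_t lmem_min_alignment(int has_64k_pages, int needs_compact_pt,
                                   int is_ppgtt)
{
	if (has_64k_pages && needs_compact_pt && is_ppgtt)
		return SZ_2M;
	if (has_64k_pages)
		return SZ_64K;
	return SZ_4K; /* assumed I915_GTT_MIN_ALIGNMENT */
}
```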
Re: [Intel-gfx] [PATCH v3 2/3] drm/i915/gem: Remove logic for wbinvd_on_all_cpus
Hi, Michael, On 2/22/22 18:26, Michael Cheng wrote: This patch removes logic for wbinvd_on_all_cpus and brings in drm_cache.h. This header has the logic that outputs a warning when wbinvd_on_all_cpus when its being used on a non-x86 platform. Signed-off-by: Michael Cheng Linus has been pretty clear that he won't accept patches that add macros that works on one arch and warns on others anymore in i915 and I figure even less so in drm code. So we shouldn't try to move this out to drm. Instead we should restrict the wbinvd() inside our driver to integrated and X86 only. For discrete on all architectures we should be coherent and hence not be needing wbinvd(). Thanks, /Thomas
Re: [Intel-gfx] [RFC 1/2] drm/i915/ttm: Add extra pages for handling ccs data
Hi, Ram, On 2/7/22 10:37, Ramalingam C wrote: While evicting the local memory data on flat-ccs capable platform we need to evict the ccs data associated to the data. For this, we are adding extra pages ((size / 256) >> PAGE_SIZE) into the ttm_tt. To achieve this we are adding a new param into the ttm_tt_init as ccs_pages_needed, which will be added into the ttm_tt->num_pages. Please use imperative form above, Instead of "We are adding..", use "Add" Signed-off-by: Ramalingam C Suggested-by: Thomas Hellstorm Hellstorm instead of Hellstrom might scare people off. :) --- drivers/gpu/drm/drm_gem_vram_helper.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 23 +- drivers/gpu/drm/qxl/qxl_ttm.c | 2 +- drivers/gpu/drm/ttm/ttm_agp_backend.c | 2 +- drivers/gpu/drm/ttm/ttm_tt.c | 12 ++- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 2 +- include/drm/ttm/ttm_tt.h | 4 +++- 7 files changed, 36 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c index 3f00192215d1..eef1f4dc7232 100644 --- a/drivers/gpu/drm/drm_gem_vram_helper.c +++ b/drivers/gpu/drm/drm_gem_vram_helper.c @@ -864,7 +864,7 @@ static struct ttm_tt *bo_driver_ttm_tt_create(struct ttm_buffer_object *bo, if (!tt) return NULL; - ret = ttm_tt_init(tt, bo, page_flags, ttm_cached); + ret = ttm_tt_init(tt, bo, page_flags, ttm_cached, 0); if (ret < 0) goto err_ttm_tt_init; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 84cae740b4a5..bb71aa6d66c0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -16,6 +16,7 @@ #include "gem/i915_gem_ttm.h" #include "gem/i915_gem_ttm_move.h" #include "gem/i915_gem_ttm_pm.h" +#include "gt/intel_gpu_commands.h" #define I915_TTM_PRIO_PURGE 0 #define I915_TTM_PRIO_NO_PAGES 1 @@ -242,12 +243,27 @@ static const struct i915_refct_sgt_ops tt_rsgt_ops = { .release = i915_ttm_tt_release }; +static inline bool 
+i915_gem_object_has_lmem_placement(struct drm_i915_gem_object *obj) +{ + int i; + + for (i = 0; i < obj->mm.n_placements; i++) + if (obj->mm.placements[i]->type == INTEL_MEMORY_LOCAL) + return true; + + return false; +} + static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, uint32_t page_flags) { + struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915), +bdev); struct ttm_resource_manager *man = ttm_manager_type(bo->bdev, bo->resource->mem_type); struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); + unsigned long ccs_pages_needed = 0; enum ttm_caching caching; struct i915_ttm_tt *i915_tt; int ret; @@ -270,7 +286,12 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, i915_tt->is_shmem = true; } - ret = ttm_tt_init(_tt->ttm, bo, page_flags, caching); + if (HAS_FLAT_CCS(i915) && i915_gem_object_has_lmem_placement(obj)) + ccs_pages_needed = DIV_ROUND_UP(DIV_ROUND_UP(bo->base.size, + NUM_CCS_BYTES_PER_BLOCK), PAGE_SIZE); + + ret = ttm_tt_init(_tt->ttm, bo, page_flags, + caching, ccs_pages_needed); I'd suggest a patch that first adds the functionality to TTM, where even i915 passes in 0 here, and a follow-up patch for the i915 functionality where we add the ccs requirement. 
if (ret) goto err_free; diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c index b2e33d5ba5d0..52156b54498f 100644 --- a/drivers/gpu/drm/qxl/qxl_ttm.c +++ b/drivers/gpu/drm/qxl/qxl_ttm.c @@ -113,7 +113,7 @@ static struct ttm_tt *qxl_ttm_tt_create(struct ttm_buffer_object *bo, ttm = kzalloc(sizeof(struct ttm_tt), GFP_KERNEL); if (ttm == NULL) return NULL; - if (ttm_tt_init(ttm, bo, page_flags, ttm_cached)) { + if (ttm_tt_init(ttm, bo, page_flags, ttm_cached, 0)) { kfree(ttm); return NULL; } diff --git a/drivers/gpu/drm/ttm/ttm_agp_backend.c b/drivers/gpu/drm/ttm/ttm_agp_backend.c index 6ddc16f0fe2b..d27691f2e451 100644 --- a/drivers/gpu/drm/ttm/ttm_agp_backend.c +++ b/drivers/gpu/drm/ttm/ttm_agp_backend.c @@ -134,7 +134,7 @@ struct ttm_tt *ttm_agp_tt_create(struct ttm_buffer_object *bo, agp_be->mem = NULL; agp_be->bridge = bridge; - if (ttm_tt_init(_be->ttm, bo, page_flags, ttm_write_combined)) { + if (ttm_tt_init(_be->ttm, bo, page_flags, ttm_write_combined, 0)) { kfree(agp_be); return
Re: [Intel-gfx] [PATCH] drm/i915: Allow dead vm to unbind vma's without lock.
On 1/28/22 09:57, Maarten Lankhorst wrote: i915_gem_vm_close may take the lock, and we currently have no better way of handling this. At least for now, allow a path in which holding vm->mutex is sufficient. This is the case, because the object destroy path will forcefully take vm->mutex now. Signed-off-by: Maarten Lankhorst Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/i915/i915_vma.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index b959e904c4d3..14a301c4069f 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -40,6 +40,17 @@ #include "i915_vma.h" #include "i915_vma_resource.h" +static inline void assert_vma_held_evict(const struct i915_vma *vma) +{ + /* +* We may be forced to unbind when the vm is dead, to clean it up. +* This is the only exception to the requirement of the object lock +* being held. +*/ + if (atomic_read(>vm->open)) + assert_object_held_shared(vma->obj); +} + static struct kmem_cache *slab_vmas; static struct i915_vma *i915_vma_alloc(void) @@ -1779,7 +1790,7 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async) struct dma_fence *unbind_fence; GEM_BUG_ON(i915_vma_is_pinned(vma)); - assert_object_held_shared(vma->obj); + assert_vma_held_evict(vma); if (i915_vma_is_map_and_fenceable(vma)) { /* Force a pagefault for domain tracking on next user access */ @@ -1846,7 +1857,7 @@ int __i915_vma_unbind(struct i915_vma *vma) int ret; lockdep_assert_held(>vm->mutex); - assert_object_held_shared(vma->obj); + assert_vma_held_evict(vma); if (!drm_mm_node_allocated(>node)) return 0;
Re: [Intel-gfx] [PATCH v5 1/5] drm/i915: add needs_compact_pt flag
On 1/26/22 18:11, Robert Beckett wrote: On 26/01/2022 13:49, Thomas Hellström (Intel) wrote: On 1/25/22 20:35, Robert Beckett wrote: From: Ramalingam C Add a new platform flag, needs_compact_pt, to mark the requirement of compact pt layout support for the ppGTT when using 64K GTT pages. With this flag has_64k_pages will only indicate requirement of 64K GTT page sizes or larger for device local memory access. Suggested-by: Matthew Auld Signed-off-by: Ramalingam C Signed-off-by: Robert Beckett --- drivers/gpu/drm/i915/i915_drv.h | 10 +++--- drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 44c1f98144b4..1258b7779705 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, /* * Set this flag, when platform requires 64K GTT page sizes or larger for - * device local memory access. Also this flag implies that we require or - * at least support the compact PT layout for the ppGTT when using the 64K - * GTT pages. Why do we remove these comment lines? Because HAS_64K_PAGES now means just 64K page, it no longer means also requires compact pt. This is to support other products that will have 64K but not have the PDE non-sharing restriction in future. Those lines moved to the next change NEEDS_COMPACT_PT, which is now separate. Yes, NEEDS_COMPACT_PT indicates that compact is *required* but does "HAS_64K_PAGES" still mean compact is supported? That information is lost. /Thomas
Re: [Intel-gfx] [PATCH v5 2/5] drm/i915: enforce min GTT alignment for discrete cards
On 1/25/22 20:35, Robert Beckett wrote: From: Matthew Auld For local-memory objects we need to align the GTT addresses to 64K, both for the ppgtt and ggtt. We need to support vm->min_alignment > 4K, depending on the vm itself and the type of object we are inserting. With this in mind update the GTT selftests to take this into account. For compact-pt we further align and pad lmem object GTT addresses to 2MB to ensure PDEs contain consistent page sizes as required by the HW. v3: * use needs_compact_pt flag to discriminate between 64K and 64K with compact-pt * add i915_vm_obj_min_alignment * use i915_vm_obj_min_alignment to round up vma reservation if compact-pt instead of hard coding v5: * fix i915_vm_obj_min_alignment for internal objects which have no memory region Signed-off-by: Matthew Auld Signed-off-by: Ramalingam C Signed-off-by: Robert Beckett Cc: Joonas Lahtinen Cc: Rodrigo Vivi --- .../i915/gem/selftests/i915_gem_client_blt.c | 23 +++-- drivers/gpu/drm/i915/gt/intel_gtt.c | 12 +++ drivers/gpu/drm/i915/gt/intel_gtt.h | 18 drivers/gpu/drm/i915/i915_vma.c | 9 ++ drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 --- 5 files changed, 117 insertions(+), 41 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c index c8ff8bf0986d..f0bfce53258f 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c @@ -39,6 +39,7 @@ struct tiled_blits { struct blit_buffer scratch; struct i915_vma *batch; u64 hole; + u64 align; u32 width; u32 height; }; @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng) goto err_free; } - hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4); + t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! 
*/ + t->align = max(t->align, + i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL)); + t->align = max(t->align, + i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM)); Don't we always end up with 2M here, regardless of the vm restrictions?
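The rounding the reviewer questions boils down to taking the max of the per-region minimum alignments against a hard-coded 2M worst case, then rounding offsets up to it. A minimal user-space sketch of that arithmetic (the 4K/64K/2M values are illustrative assumptions matching the patch description):

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K  0x1000ull
#define SZ_64K 0x10000ull
#define SZ_2M  0x200000ull

/* round_up() for power-of-two alignments, as in the kernel macro */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Pick the effective alignment the way tiled_blits_create() does:
 * start from the 2M worst case and take the max over the vm's minimum
 * alignments for each memory type the buffer may be placed in. */
static uint64_t effective_align(uint64_t lmem_min, uint64_t smem_min)
{
	uint64_t align = SZ_2M;	/* XXX worst case, as in the selftest */

	if (lmem_min > align)
		align = lmem_min;
	if (smem_min > align)
		align = smem_min;
	return align;
}
```

Note that with any realistic minimum alignment (4K–2M) the result is always 2M, which is exactly the reviewer's point.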
Re: [Intel-gfx] [PATCH v5 5/5] drm/i915/uapi: document behaviour for DG2 64K support
On 1/25/22 20:35, Robert Beckett wrote: From: Matthew Auld On discrete platforms like DG2, we need to support a minimum page size of 64K when dealing with device local-memory. This is quite tricky for various reasons, so try to document the new implicit uapi for this. v3: fix typos and less emphasis v2: Fixed suggestions on formatting [Daniel] Signed-off-by: Matthew Auld Signed-off-by: Ramalingam C Signed-off-by: Robert Beckett Acked-by: Jordan Justen Reviewed-by: Ramalingam C cc: Simon Ser cc: Pekka Paalanen Cc: Jordan Justen Cc: Kenneth Graunke Cc: mesa-...@lists.freedesktop.org Cc: Tony Ye Cc: Slawomir Milczarek --- Reviewed-by: Thomas Hellström
Re: [Intel-gfx] [PATCH v5 4/5] drm/i915: add gtt misalignment test
On 1/25/22 20:35, Robert Beckett wrote: add test to check handling of misaligned offsets and sizes v4: * remove spurious blank lines * explicitly cast intel_region_id to intel_memory_type in misaligned_pin Reported-by: kernel test robot Signed-off-by: Robert Beckett --- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 128 ++ 1 file changed, 128 insertions(+) diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index b80788a2b7f9..f082b5ff3b5e 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -22,10 +22,12 @@ * */ +#include "gt/intel_gtt.h" #include #include #include "gem/i915_gem_context.h" +#include "gem/i915_gem_region.h" #include "gem/selftests/mock_context.h" #include "gt/intel_context.h" #include "gt/intel_gpu_commands.h" @@ -1067,6 +1069,120 @@ static int shrink_boom(struct i915_address_space *vm, return err; } +static int misaligned_case(struct i915_address_space *vm, struct intel_memory_region *mr, + u64 addr, u64 size, unsigned long flags) +{ + struct drm_i915_gem_object *obj; + struct i915_vma *vma; + int err = 0; + u64 expected_vma_size, expected_node_size; + + obj = i915_gem_object_create_region(mr, size, 0, 0); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + vma = i915_vma_instance(obj, vm, NULL); + if (IS_ERR(vma)) { + err = PTR_ERR(vma); + goto err_put; + } + + err = i915_vma_pin(vma, 0, 0, addr | flags); + if (err) + goto err_put; + i915_vma_unpin(vma); + + if (!drm_mm_node_allocated(>node)) { + err = -EINVAL; + goto err_put; + } + + if (i915_vma_misplaced(vma, 0, 0, addr | flags)) { + err = -EINVAL; + goto err_put; + } + + expected_vma_size = round_up(size, 1 << (ffs(vma->resource->page_sizes_gtt) - 1)); + expected_node_size = expected_vma_size; + + if (IS_DG2(vm->i915) && i915_gem_object_is_lmem(obj)) { + /* dg2 should expand lmem node to 2MB */ Should this test be NEEDS_COMPACT_PT()? Otherwise LGTM. Reviewed-by: Thomas Hellström
Re: [Intel-gfx] [PATCH v5 1/5] drm/i915: add needs_compact_pt flag
On 1/25/22 20:35, Robert Beckett wrote: From: Ramalingam C Add a new platform flag, needs_compact_pt, to mark the requirement of compact pt layout support for the ppGTT when using 64K GTT pages. With this flag has_64k_pages will only indicate requirement of 64K GTT page sizes or larger for device local memory access. Suggested-by: Matthew Auld Signed-off-by: Ramalingam C Signed-off-by: Robert Beckett --- drivers/gpu/drm/i915/i915_drv.h | 10 +++--- drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 44c1f98144b4..1258b7779705 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, /* * Set this flag, when platform requires 64K GTT page sizes or larger for - * device local memory access. Also this flag implies that we require or - * at least support the compact PT layout for the ppGTT when using the 64K - * GTT pages. Why do we remove these comment lines? + * device local memory access. */ #define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages) +/* Set this flag when platform doesn't allow both 64k pages and 4k pages in First line of multi-line comments should be empty. + * the same PT. this flag means we need to support compact PT layout for the + * ppGTT when using the 64K GTT pages. 
+ */ +#define NEEDS_COMPACT_PT(dev_priv) (INTEL_INFO(dev_priv)->needs_compact_pt) + #define HAS_IPC(dev_priv) (INTEL_INFO(dev_priv)->display.has_ipc) #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i)) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 4081fd50ba9d..799b56569ef5 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1028,6 +1028,7 @@ static const struct intel_device_info xehpsdv_info = { PLATFORM(INTEL_XEHPSDV), .display = { }, .has_64k_pages = 1, + .needs_compact_pt = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | BIT(VECS2) | BIT(VECS3) | @@ -1045,6 +1046,7 @@ static const struct intel_device_info dg2_info = { .media.rel = 55, PLATFORM(INTEL_DG2), .has_64k_pages = 1, + .needs_compact_pt = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index 3699b1c539ea..c8aaf646430c 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ -130,6 +130,7 @@ enum intel_ppgtt_type { /* Keep has_* in alphabetical order */ \ func(has_64bit_reloc); \ func(has_64k_pages); \ + func(needs_compact_pt); \ func(gpu_reset_clobbers_display); \ func(has_reset_engine); \ func(has_global_mocs); \
Re: [Intel-gfx] [PATCH 06/11] dma-buf: warn about containers in dma_resv object
On 1/24/22 14:03, Christian König wrote: Drivers should not add containers as shared fences to the dma_resv object, instead each fence should be added individually. Signed-off-by: Christian König Reviewed-by: Daniel Vetter Reviewed-by: Thomas Hellström Is there any indication that this triggers on existing drivers? Thomas
Re: [Intel-gfx] [PATCH] drm/i915: Lock dpt_obj around set_cache_level.
On 1/24/22 14:02, Maarten Lankhorst wrote: set_cache_level may unbind the object, which will result in the below lockdep splat: <6> [184.578145] [IGT] kms_addfb_basic: starting subtest addfb25-framebuffer-vs-set-tiling <4> [184.578220] [ cut here ] <4> [184.578221] WARN_ON(debug_locks && !(lock_is_held(&(&((obj)->base.resv)->lock.base)->dep_map) != 0)) <4> [184.578237] WARNING: CPU: 6 PID: 5544 at drivers/gpu/drm/i915/i915_gem.c:123 i915_gem_object_unbind+0x4a9/0x510 [i915] <4> [184.578323] Modules linked in: vgem drm_shmem_helper snd_hda_codec_hdmi i915 mei_hdcp x86_pkg_temp_thermal snd_hda_intel coretemp crct10dif_pclmul snd_intel_dspcfg crc32_pclmul ttm snd_hda_codec ghash_clmulni_intel snd_hwdep drm_kms_helper snd_hda_core e1000e mei_me syscopyarea ptp snd_pcm sysfillrect mei pps_core sysimgblt fb_sys_fops prime_numbers intel_lpss_pci smsc75xx usbnet mii <4> [184.578349] CPU: 6 PID: 5544 Comm: kms_addfb_basic Not tainted 5.16.0-CI-Patchwork_22006+ #1 <4> [184.578351] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS ADLPFWI1.R00.2422.A00.2110131104 10/13/2021 <4> [184.578352] RIP: 0010:i915_gem_object_unbind+0x4a9/0x510 [i915] <4> [184.578424] Code: 00 be ff ff ff ff 48 8d 78 68 e8 a2 6e 2b e1 85 c0 0f 85 b1 fb ff ff 48 c7 c6 48 37 9e a0 48 c7 c7 d9 fc a1 a0 e8 a3 54 26 e1 <0f> 0b e9 97 fb ff ff 31 ed 48 8b 5c 24 58 65 48 33 1c 25 28 00 00 <4> [184.578426] RSP: 0018:c900013b3b68 EFLAGS: 00010286 <4> [184.578428] RAX: RBX: c900013b3bb0 RCX: 0001 <4> [184.578429] RDX: 8001 RSI: 8230b42d RDI: <4> [184.578430] RBP: 888120e1 R08: R09: c0007fff <4> [184.578431] R10: 0001 R11: c900013b3980 R12: 8881176ea740 <4> [184.578432] R13: 888120e1 R14: R15: 0001 <4> [184.578433] FS: 7f65074f5e40() GS:8f30() knlGS: <4> [184.578435] CS: 0010 DS: ES: CR0: 80050033 <4> [184.578436] CR2: 7fff4420ede8 CR3: 00010c2f2005 CR4: 00770ee0 <4> [184.578437] PKRU: 5554 <4> [184.578438] Call Trace: <4> [184.578439] <4> [184.578440] ? 
dma_resv_iter_first_unlocked+0x78/0xf0 <4> [184.578447] intel_dpt_create+0x88/0x220 [i915] <4> [184.578530] intel_framebuffer_init+0x5b8/0x620 [i915] <4> [184.578612] intel_framebuffer_create+0x3d/0x60 [i915] <4> [184.578691] intel_user_framebuffer_create+0x18f/0x2c0 [i915] <4> [184.578775] drm_internal_framebuffer_create+0x36d/0x4c0 <4> [184.578779] drm_mode_addfb2+0x2f/0xd0 <4> [184.578781] ? drm_mode_addfb_ioctl+0x10/0x10 <4> [184.578784] drm_ioctl_kernel+0xac/0x140 <4> [184.578787] drm_ioctl+0x201/0x3d0 <4> [184.578789] ? drm_mode_addfb_ioctl+0x10/0x10 <4> [184.578796] __x64_sys_ioctl+0x6a/0xa0 <4> [184.578800] do_syscall_64+0x37/0xb0 <4> [184.578803] entry_SYSCALL_64_after_hwframe+0x44/0xae <4> [184.578805] RIP: 0033:0x7f6506736317 <4> [184.578807] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48 <4> [184.578808] RSP: 002b:7fff44211a98 EFLAGS: 0246 ORIG_RAX: 0010 <4> [184.578810] RAX: ffda RBX: 0006 RCX: 7f6506736317 <4> [184.578811] RDX: 7fff44211b30 RSI: c06864b8 RDI: 0006 <4> [184.578812] RBP: 7fff44211b30 R08: 7fff44311170 R09: <4> [184.578813] R10: 0008 R11: 0246 R12: c06864b8 <4> [184.578813] R13: 0006 R14: R15: <4> [184.578819] <4> [184.578820] irq event stamp: 47931 <4> [184.578821] hardirqs last enabled at (47937): [] __up_console_sem+0x62/0x70 <4> [184.578824] hardirqs last disabled at (47942): [] __up_console_sem+0x47/0x70 <4> [184.578826] softirqs last enabled at (47340): [] __do_softirq+0x32d/0x493 <4> [184.578828] softirqs last disabled at (47335): [] irq_exit_rcu+0xa6/0xe0 <4> [184.578830] ---[ end trace f17ec219f892c7d4 ]--- Fixes: 0f341974cbc2 ("drm/i915: Add i915_vma_unbind_unlocked, and take obj lock for i915_vma_unbind, v2.") Signed-off-by: Maarten Lankhorst Testcase: kms_addfb_basic --- drivers/gpu/drm/i915/display/intel_dpt.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) LGTM. 
Reviewed-by: Thomas Hellström diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c index 63
Re: [Linaro-mm-sig] [PATCH 1/9] dma-buf: consolidate dma_fence subclass checking
On 1/20/22 14:27, Christian König wrote: Consolidate the wrapper functions to check for dma_fence subclasses in the dma_fence header. This makes it easier to document and also check the different requirements for fence containers in the subclasses. Signed-off-by: Christian König --- include/linux/dma-fence-array.h | 15 + include/linux/dma-fence-chain.h | 3 +-- include/linux/dma-fence.h | 38 + 3 files changed, 40 insertions(+), 16 deletions(-) diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h index 303dd712220f..fec374f69e12 100644 --- a/include/linux/dma-fence-array.h +++ b/include/linux/dma-fence-array.h @@ -45,19 +45,6 @@ struct dma_fence_array { struct irq_work work; }; -extern const struct dma_fence_ops dma_fence_array_ops; - -/** - * dma_fence_is_array - check if a fence is from the array subsclass - * @fence: fence to test - * - * Return true if it is a dma_fence_array and false otherwise. - */ -static inline bool dma_fence_is_array(struct dma_fence *fence) -{ - return fence->ops == _fence_array_ops; -} - /** * to_dma_fence_array - cast a fence to a dma_fence_array * @fence: fence to cast to a dma_fence_array @@ -68,7 +55,7 @@ static inline bool dma_fence_is_array(struct dma_fence *fence) static inline struct dma_fence_array * to_dma_fence_array(struct dma_fence *fence) { - if (fence->ops != _fence_array_ops) + if (!fence || !dma_fence_is_array(fence)) return NULL; return container_of(fence, struct dma_fence_array, base); diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h index 54fe3443fd2c..ee906b659694 100644 --- a/include/linux/dma-fence-chain.h +++ b/include/linux/dma-fence-chain.h @@ -49,7 +49,6 @@ struct dma_fence_chain { spinlock_t lock; }; -extern const struct dma_fence_ops dma_fence_chain_ops; /** * to_dma_fence_chain - cast a fence to a dma_fence_chain @@ -61,7 +60,7 @@ extern const struct dma_fence_ops dma_fence_chain_ops; static inline struct dma_fence_chain * to_dma_fence_chain(struct 
dma_fence *fence) { - if (!fence || fence->ops != &dma_fence_chain_ops) + if (!fence || !dma_fence_is_chain(fence)) return NULL; return container_of(fence, struct dma_fence_chain, base); diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 1ea691753bd3..775cdc0b4f24 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -587,4 +587,42 @@ struct dma_fence *dma_fence_get_stub(void); struct dma_fence *dma_fence_allocate_private_stub(void); u64 dma_fence_context_alloc(unsigned num); +extern const struct dma_fence_ops dma_fence_array_ops; +extern const struct dma_fence_ops dma_fence_chain_ops; + +/** + * dma_fence_is_array - check if a fence is from the array subclass + * @fence: the fence to test + * + * Return true if it is a dma_fence_array and false otherwise. + */ +static inline bool dma_fence_is_array(struct dma_fence *fence) +{ + return fence->ops == &dma_fence_array_ops; +} + +/** + * dma_fence_is_chain - check if a fence is from the chain subclass + * @fence: the fence to test + * + * Return true if it is a dma_fence_chain and false otherwise. + */ +static inline bool dma_fence_is_chain(struct dma_fence *fence) +{ + return fence->ops == &dma_fence_chain_ops; +} + +/** + * dma_fence_is_container - check if a fence is a container for other fences + * @fence: the fence to test + * + * Return true if this fence is a container for other fences, false otherwise. + * This is important since we can't build up large fence structure or otherwise + * we run into recursion during operation on those fences. + */ +static inline bool dma_fence_is_container(struct dma_fence *fence) +{ + return dma_fence_is_array(fence) || dma_fence_is_chain(fence); +} What's the strategy here moving forward if we add more dma_resv containers, or if a driver adds a container that similarly has risk of recursion?
Should we perhaps add an ops bool for this, or require that all dma_resv containers that have this limitation be part of the dma-buf subsystem rather than driver-specific? Thanks, /Thomas + #endif /* __LINUX_DMA_FENCE_H */
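The ops-pointer comparison these helpers rely on is a common C idiom for run-time subclass checks without any type tag in the object itself. A stripped-down, non-kernel sketch of the same pattern (the types and names here are illustrative stand-ins, not the real dma-buf API):

```c
#include <assert.h>

struct fence_ops { const char *name; };
struct fence { const struct fence_ops *ops; };

/* Stand-ins for the exported dma_fence_array_ops / dma_fence_chain_ops */
static const struct fence_ops array_ops = { "array" };
static const struct fence_ops chain_ops = { "chain" };
static const struct fence_ops plain_ops = { "plain" };

/* Subclass membership is decided purely by ops-pointer identity. */
static int fence_is_array(const struct fence *f) { return f->ops == &array_ops; }
static int fence_is_chain(const struct fence *f) { return f->ops == &chain_ops; }

/* A "container" is any subclass that aggregates other fences; walking
 * into containers recursively is what the checks aim to prevent. */
static int fence_is_container(const struct fence *f)
{
	return fence_is_array(f) || fence_is_chain(f);
}
```

The limitation the review raises is visible here: a new container subclass defined elsewhere would not be caught unless its ops pointer is added to the check.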
Re: [Intel-gfx] [PATCH v6 2/6] drm/i915: Add locking to i915_gem_evict_vm(), v2.
On 1/14/22 14:23, Maarten Lankhorst wrote: i915_gem_evict_vm will need to be able to evict objects that are locked by the current ctx. By testing if the current context already locked the object, we can do this correctly. This allows us to evict the entire vm even if we already hold some objects' locks. Previously, this was spread over several commits, but it makes more sense to commit the changes to i915_gem_evict_vm separately from the changes to i915_gem_evict_something() and i915_gem_evict_for_node(). Changes since v1: - Handle evicting dead objects better. Signed-off-by: Maarten Lankhorst --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 2 +- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 +- drivers/gpu/drm/i915/i915_gem_evict.c | 32 +-- drivers/gpu/drm/i915/i915_gem_evict.h | 4 ++- drivers/gpu/drm/i915/i915_vma.c | 7 +++- .../gpu/drm/i915/selftests/i915_gem_evict.c | 10 -- 6 files changed, 49 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index cf283b5f6ffe..4d832d6696b5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -767,7 +767,7 @@ static int eb_reserve(struct i915_execbuffer *eb) case 1: /* Too fragmented, unbind everything and retry */ mutex_lock(>context->vm->mutex); - err = i915_gem_evict_vm(eb->context->vm); + err = i915_gem_evict_vm(eb->context->vm, >ww); mutex_unlock(>context->vm->mutex); if (err) return err; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index fafd158e5313..a69787999d09 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -367,7 +367,7 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf) if (vma == ERR_PTR(-ENOSPC)) { ret = mutex_lock_interruptible(>vm.mutex); if (!ret) { - ret = i915_gem_evict_vm(>vm); + ret = i915_gem_evict_vm(>vm, ); mutex_unlock(>vm.mutex); } if (ret) diff 
--git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c index 24eee0c2055f..91f82ecb9ef4 100644 --- a/drivers/gpu/drm/i915/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/i915_gem_evict.c @@ -368,7 +368,7 @@ int i915_gem_evict_for_node(struct i915_address_space *vm, * To clarify: This is for freeing up virtual address space, not for freeing * memory in e.g. the shrinker. */ -int i915_gem_evict_vm(struct i915_address_space *vm) +int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww) { int ret = 0; @@ -389,24 +389,52 @@ int i915_gem_evict_vm(struct i915_address_space *vm) do { struct i915_vma *vma, *vn; LIST_HEAD(eviction_list); + LIST_HEAD(locked_eviction_list); list_for_each_entry(vma, >bound_list, vm_link) { if (i915_vma_is_pinned(vma)) continue; + /* +* If we already own the lock, trylock fails. In case +* the resv is shared among multiple objects, we still +* need the object ref. +*/ Should we handle dying vmas like for the other eviction utilities and also evict pinned dying vmas? + if (!kref_read(>obj->base.refcount) || + (ww && (dma_resv_locking_ctx(vma->obj->base.resv) == >ctx))) { + __i915_vma_pin(vma); + list_add(>evict_link, _eviction_list); + continue; + } + + if (!i915_gem_object_trylock(vma->obj, ww)) + continue; + __i915_vma_pin(vma); list_add(>evict_link, _list); } - if (list_empty(_list)) + if (list_empty(_list) && list_empty(_eviction_list)) break; ret = 0; + /* Unbind locked objects first, before unlocking the eviction_list */ + list_for_each_entry_safe(vma, vn, _eviction_list, evict_link) { + __i915_vma_unpin(vma); + + if (ret == 0) + ret = __i915_vma_unbind(vma); + if (ret != -EINTR) /* "Get me out of here!" */ + ret = 0; + } +
Re: [Intel-gfx] [PATCH v5 2/6] drm/i915: Add locking to i915_gem_evict_vm()
On 1/13/22 12:44, Maarten Lankhorst wrote: i915_gem_evict_vm will need to be able to evict objects that are locked by the current ctx. By testing if the current context already locked the object, we can do this correctly. This allows us to evict the entire vm even if we already hold some objects' locks. Previously, this was spread over several commits, but it makes more sense to commit the changes to i915_gem_evict_vm separately from the changes to i915_gem_evict_something() and i915_gem_evict_for_node(). Signed-off-by: Maarten Lankhorst --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 2 +- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 +- drivers/gpu/drm/i915/i915_gem_evict.c | 42 ++- drivers/gpu/drm/i915/i915_gem_evict.h | 4 +- drivers/gpu/drm/i915/i915_vma.c | 7 +++- .../gpu/drm/i915/selftests/i915_gem_evict.c | 10 - 6 files changed, 59 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index cf283b5f6ffe..4d832d6696b5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -767,7 +767,7 @@ static int eb_reserve(struct i915_execbuffer *eb) case 1: /* Too fragmented, unbind everything and retry */ mutex_lock(>context->vm->mutex); - err = i915_gem_evict_vm(eb->context->vm); + err = i915_gem_evict_vm(eb->context->vm, >ww); mutex_unlock(>context->vm->mutex); if (err) return err; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index fafd158e5313..a69787999d09 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -367,7 +367,7 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf) if (vma == ERR_PTR(-ENOSPC)) { ret = mutex_lock_interruptible(>vm.mutex); if (!ret) { - ret = i915_gem_evict_vm(>vm); + ret = i915_gem_evict_vm(>vm, ); mutex_unlock(>vm.mutex); } if (ret) diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c 
b/drivers/gpu/drm/i915/i915_gem_evict.c index 24eee0c2055f..370eb7238d1c 100644 --- a/drivers/gpu/drm/i915/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/i915_gem_evict.c @@ -74,6 +74,12 @@ static bool defer_evict(struct i915_vma *vma) return false; } +static int evict_dead(struct i915_vma *vma) +{ + atomic_and(~I915_VMA_PIN_MASK, >flags); + return __i915_vma_unbind(vma); +} + /** * i915_gem_evict_something - Evict vmas to make room for binding a new one * @vm: address space to evict from @@ -368,7 +374,7 @@ int i915_gem_evict_for_node(struct i915_address_space *vm, * To clarify: This is for freeing up virtual address space, not for freeing * memory in e.g. the shrinker. */ -int i915_gem_evict_vm(struct i915_address_space *vm) +int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww) { int ret = 0; @@ -389,24 +395,56 @@ int i915_gem_evict_vm(struct i915_address_space *vm) do { struct i915_vma *vma, *vn; LIST_HEAD(eviction_list); + LIST_HEAD(locked_eviction_list); list_for_each_entry(vma, >bound_list, vm_link) { + if (!kref_read(>obj->base.refcount)) { + ret = evict_dead(vma); + if (ret) + break; + } + Could the call to evict_dead corrupt the bound_list? if (i915_vma_is_pinned(vma)) continue; + /* +* If we already own the lock, trylock fails. In case the resv +* is shared among multiple objects, we still need the object ref. +*/ + if (ww && (dma_resv_locking_ctx(vma->obj->base.resv) == >ctx)) { + __i915_vma_pin(vma); + list_add(>evict_link, _eviction_list); + continue; + } + + if (!i915_gem_object_trylock(vma->obj, ww)) + continue; + __i915_vma_pin(vma); list_add(>evict_link, _list); } - if (list_empty(_list)) + if (list_empty(_list) && list_empty(_eviction_list)) break; ret = 0; + /* Unbind locked objects first, before unlocking the eviction_list
Re: [Intel-gfx] [PATCH 2/2] drm/i915: clean up shrinker_release_pages
Hi, Matthew On 12/15/21 12:07, Matthew Auld wrote: Add some proper flags for the different modes, and shorten the name to something more snappy. Suggested-by: Tvrtko Ursulin Signed-off-by: Matthew Auld LGTM. Reviewed-by: Thomas Hellström
Re: [Intel-gfx] [PATCH] drm/i915: Use trylock instead of blocking lock for __i915_gem_free_objects.
On 12/22/21 16:56, Maarten Lankhorst wrote: Convert free_work into delayed_work, similar to ttm to allow converting the blocking lock in __i915_gem_free_objects to a trylock. Unlike ttm, the object should already be idle, as it's kept alive by a reference through struct i915_vma->active, which is dropped after all vma's are idle. Because of this, we can use a no wait by default, or when the lock is contested, we use ttm's 10 ms. The trylock should only fail when the object is sharing it's resv with other objects, and typically objects are not kept locked for a long time, so we can safely retry on failure. Fixes: be7612fd6665 ("drm/i915: Require object lock when freeing pages during destruction") Testcase: igt/gem_exec_alignment/pi* Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 14 ++ drivers/gpu/drm/i915/i915_drv.h| 4 ++-- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 39cd563544a5..d87b508b59b1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -331,7 +331,13 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915, continue; } - i915_gem_object_lock(obj, NULL); + if (!i915_gem_object_trylock(obj, NULL)) { + /* busy, toss it back to the pile */ + if (llist_add(>freed, >mm.free_list)) + queue_delayed_work(i915->wq, >mm.free_work, msecs_to_jiffies(10)); i915->wq is ordered. From what I can tell, with queue_delayed_work(), the work doesn't get inserted into the queue order until the delay expires, right? So we don't unnecessarily hold up other objects getting freed? 
+ continue; + } + __i915_gem_object_pages_fini(obj); i915_gem_object_unlock(obj); __i915_gem_free_object(obj); @@ -353,7 +359,7 @@ void i915_gem_flush_free_objects(struct drm_i915_private *i915) static void __i915_gem_free_work(struct work_struct *work) { struct drm_i915_private *i915 = - container_of(work, struct drm_i915_private, mm.free_work); + container_of(work, struct drm_i915_private, mm.free_work.work); i915_gem_flush_free_objects(i915); } @@ -385,7 +391,7 @@ static void i915_gem_free_object(struct drm_gem_object *gem_obj) */ if (llist_add(>freed, >mm.free_list)) - queue_work(i915->wq, >mm.free_work); + queue_delayed_work(i915->wq, >mm.free_work, 0); } void __i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj, @@ -710,7 +716,7 @@ bool i915_gem_object_placement_possible(struct drm_i915_gem_object *obj, void i915_gem_init__objects(struct drm_i915_private *i915) { - INIT_WORK(>mm.free_work, __i915_gem_free_work); + INIT_DELAYED_WORK(>mm.free_work, __i915_gem_free_work); } void i915_objects_module_exit(void) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index c8fddb7e61c9..beeb42a14aae 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -465,7 +465,7 @@ struct i915_gem_mm { * List of objects which are pending destruction. */ struct llist_head free_list; - struct work_struct free_work; + struct delayed_work free_work; /** * Count of objects pending destructions. Used to skip needlessly * waiting on an RCU barrier if no objects are waiting to be freed. @@ -1625,7 +1625,7 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915) * armed the work again. */ while (atomic_read(>mm.free_count)) { - flush_work(>mm.free_work); + flush_delayed_work(>mm.free_work); flush_delayed_work(>bdev.wq); rcu_barrier(); } Otherwise LGTM. Reviewed-by: Thomas Hellström
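The retry scheme under discussion — trylock the object, and on contention push it back on the free list for a delayed re-attempt — can be sketched with a plain pthread mutex. This is an illustration of the control flow only; the 10 ms requeue delay and the work-queue machinery are elided:

```c
#include <pthread.h>
#include <stdbool.h>

struct object {
	pthread_mutex_t lock;
	bool freed;
};

/* Mimics the __i915_gem_free_objects() flow: free the object only if
 * its lock can be taken without blocking; otherwise report failure so
 * the caller can requeue it (the kernel uses a delayed work item). */
static bool try_free_object(struct object *obj)
{
	if (pthread_mutex_trylock(&obj->lock) != 0)
		return false;		/* busy: toss it back to the pile */

	obj->freed = true;		/* stands in for pages_fini + free */
	pthread_mutex_unlock(&obj->lock);
	return true;
}
```

The design point being reviewed is that the trylock should only fail when the object shares its reservation lock with other objects, so retries are expected to be rare and short-lived.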
Re: [Intel-gfx] [PATCH 1/4] drm/i915: Add has_64k_pages flag
On 12/8/21 13:59, Matthew Auld wrote: On Wed, 8 Dec 2021 at 12:43, Thomas Hellström (Intel) wrote: Hi, On 12/7/21 17:51, Ramalingam C wrote: From: Stuart Summers Add a new platform flag, has_64k_pages, for platforms supporting base page sizes of 64k. Signed-off-by: Stuart Summers Signed-off-by: Ramalingam C Reviewed-by: Lucas De Marchi --- drivers/gpu/drm/i915/i915_drv.h | 2 ++ drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 5 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 85bb8d3107f0..6132163e1cb3 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1528,6 +1528,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, #define HAS_MSLICES(dev_priv) \ (INTEL_INFO(dev_priv)->has_mslices) +#define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages) + Could we please have some documentation of the exact meaning of this flag. Like, smallest page size of LMEM is 64K. Hardware supports 64k pages etc? Something like: "Set if the device requires 64K GTT page sizes or larger for device local memory access. Also implies that we require or at least support the compact PT layout for the ppGTT when using 64K GTT pages." Sounds great. Thanks, Thomas
Re: [Intel-gfx] [PATCH 2/4] drm/i915/xehpsdv: set min page-size to 64K
On 12/7/21 17:51, Ramalingam C wrote: From: Matthew Auld LMEM should be allocated at 64K granularity, since 4K page support will eventually be dropped for LMEM when using the PPGTT. Signed-off-by: Matthew Auld Signed-off-by: Stuart Summers Signed-off-by: Ramalingam C Cc: Joonas Lahtinen Cc: Rodrigo Vivi Reviewed-by: Lucas De Marchi Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 6 +- drivers/gpu/drm/i915/gt/intel_region_lmem.c | 5 - 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index bce03d74a0b4..ba90ab47d838 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -780,6 +780,7 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type, struct intel_uncore *uncore = &i915->uncore; struct pci_dev *pdev = to_pci_dev(i915->drm.dev); struct intel_memory_region *mem; + resource_size_t min_page_size; resource_size_t io_start; resource_size_t lmem_size; u64 lmem_base; @@ -791,8 +792,11 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type, lmem_size = pci_resource_len(pdev, 2) - lmem_base; io_start = pci_resource_start(pdev, 2) + lmem_base; + min_page_size = HAS_64K_PAGES(i915) ? 
I915_GTT_PAGE_SIZE_64K : + I915_GTT_PAGE_SIZE_4K; + mem = intel_memory_region_create(i915, lmem_base, lmem_size, -I915_GTT_PAGE_SIZE_4K, io_start, +min_page_size, io_start, type, instance, &i915_region_stolen_lmem_ops); if (IS_ERR(mem)) diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c index 9ea49e0a27c0..fde2dcb59809 100644 --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c @@ -197,6 +197,7 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) struct intel_uncore *uncore = gt->uncore; struct pci_dev *pdev = to_pci_dev(i915->drm.dev); struct intel_memory_region *mem; + resource_size_t min_page_size; resource_size_t io_start; resource_size_t lmem_size; int err; @@ -211,10 +212,12 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2))) return ERR_PTR(-ENODEV); + min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K : + I915_GTT_PAGE_SIZE_4K; mem = intel_memory_region_create(i915, 0, lmem_size, -I915_GTT_PAGE_SIZE_4K, +min_page_size, io_start, INTEL_MEMORY_LOCAL, 0,
Re: [Intel-gfx] [PATCH 1/4] drm/i915: Add has_64k_pages flag
Hi, On 12/7/21 17:51, Ramalingam C wrote: From: Stuart Summers Add a new platform flag, has_64k_pages, for platforms supporting base page sizes of 64k. Signed-off-by: Stuart Summers Signed-off-by: Ramalingam C Reviewed-by: Lucas De Marchi --- drivers/gpu/drm/i915/i915_drv.h | 2 ++ drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 5 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 85bb8d3107f0..6132163e1cb3 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1528,6 +1528,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, #define HAS_MSLICES(dev_priv) \ (INTEL_INFO(dev_priv)->has_mslices) +#define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages) + Could we please have some documentation of the exact meaning of this flag. Like, smallest page size of LMEM is 64K. Hardware supports 64k pages etc? for future reference. /Thomas #define HAS_IPC(dev_priv) (INTEL_INFO(dev_priv)->display.has_ipc) #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i)) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 6aaa7c644c9b..634282edadb7 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1029,6 +1029,7 @@ static const struct intel_device_info xehpsdv_info = { DGFX_FEATURES, PLATFORM(INTEL_XEHPSDV), .display = { }, + .has_64k_pages = 1, .pipe_mask = 0, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | @@ -1047,6 +1048,7 @@ static const struct intel_device_info dg2_info = { .graphics.rel = 55, .media.rel = 55, PLATFORM(INTEL_DG2), + .has_64k_pages = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index 669f0d26c3c3..f38ac5bd837b 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ 
-123,6 +123,7 @@ enum intel_ppgtt_type { func(is_dgfx); \ /* Keep has_* in alphabetical order */ \ func(has_64bit_reloc); \ + func(has_64k_pages); \ func(gpu_reset_clobbers_display); \ func(has_reset_engine); \ func(has_global_mocs); \
Re: [Linaro-mm-sig] [RFC PATCH 1/2] dma-fence: Avoid establishing a locking order between fence classes
On 12/3/21 16:00, Christian König wrote: Am 03.12.21 um 15:50 schrieb Thomas Hellström: On 12/3/21 15:26, Christian König wrote: [Adding Daniel here as well] Am 03.12.21 um 15:18 schrieb Thomas Hellström: [SNIP] Well that's ok as well. My question is why does this single dma_fence then show up in the dma_fence_chain representing the whole migration? What we'd like to happen during eviction is that we 1) await any exclusive or moving fences, then schedule the migration blit. The blit manages its own GPU ptes. Results in a single fence. 2) Schedule unbind of any gpu vmas, resulting possibly in multiple fences. 3) Most but not all of the remaining resv shared fences will have been finished in 2). We can't easily tell which, so we have a couple of shared fences left. Stop, wait a second here. We are going a bit in circles. Before you migrate a buffer, you *MUST* wait for all shared fences to complete. This is documented mandatory DMA-buf behavior. Daniel and I have discussed that quite extensively in the last few months. So how is it that you do the blit before all shared fences are completed? Well we don't currently, but wanted to... (I haven't consulted Daniel in the matter, tbh). I was under the impression that all writes would add an exclusive fence to the dma_resv. Yes, that's correct. I'm working on having more than one write fence, but that is currently under review. If that's not the case or this is otherwise against the mandatory DMA-buf behavior, we can certainly keep that part as is and that would eliminate 3). Ah, now that somewhat starts to make sense. So your blit only waits for the writes to finish before starting the blit. Yes, that's legal as long as you don't change the original content with the blit. But don't you then need to wait for both reads and writes before you unmap the VMAs?
Yes, but that's planned to be done all async, and those unbind jobs are scheduled simultaneously with the blit, and the blit itself manages its own page-table entries, so no need to unbind any blit vmas. Anyway the good news is your problem totally goes away with the DMA-resv rework I've already sent out. Basically it is now possible to have more than one fence in the DMA-resv object for migrations and all existing fences are kept around until they are finished. Sounds good. Thanks, Thomas
Re: [Linaro-mm-sig] [RFC PATCH 1/2] dma-fence: Avoid establishing a locking order between fence classes
On 12/1/21 12:25, Christian König wrote: Am 01.12.21 um 12:04 schrieb Thomas Hellström (Intel): On 12/1/21 11:32, Christian König wrote: Am 01.12.21 um 11:15 schrieb Thomas Hellström (Intel): [SNIP] What we could do is to avoid all this by not calling the callback with the lock held in the first place. If that's possible that might be a good idea, pls also see below. The problem with that is dma_fence_signal_locked()/dma_fence_signal_timestamp_locked(). If we could avoid using that or at least allow it to drop the lock then we could call the callback without holding it. Somebody would need to audit the drivers and see if holding the lock is really necessary anywhere. /Thomas Oh, and a follow up question: If there was a way to break the recursion on final put() (using the same basic approach as patch 2 in this series uses to break recursion in enable_signaling()), so that none of these containers did require any special treatment, would it be worth pursuing? I guess it might be possible by having the callbacks drop the references rather than the loop in the final put. + a couple of changes in code iterating over the fence pointers. That won't really help, you just move the recursion from the final put into the callback. How do we recurse from the callback? The introduced fence_put() of individual fence pointers doesn't recurse anymore (at most 1 level), and any callback recursion is broken by the irq_work? Yeah, but then you would need to take another lock to avoid racing with dma_fence_array_signaled(). I figure the big amount of work would be to adjust code that iterates over the individual fence pointers to recognize that they are rcu protected. Could be that we could solve this with RCU, but that sounds like a lot of churn for no gain at all. In other words even with the problems solved I think it would be a really bad idea to allow chaining of dma_fence_array objects. Yes, that was really the question, Is it worth pursuing this? 
I'm not really suggesting we should allow this as an intentional feature. I'm worried, however, that if we allow these containers to start floating around cross-driver (or even internally) disguised as ordinary dma_fences, they would require a lot of driver special casing, or else completely unexpected WARN_ON()s and lockdep splats would start to turn up, scaring people off from using them. And that would be a breeding ground for hairy driver-private constructs. Well the question is why we would want to do it? If it's to avoid inter-driver lock dependencies by avoiding calling the callback with the spinlock held, then yes please. We had tons of problems with that, resulting in irq_work and work_item delegation all over the place. Yes, that sounds like something desirable, but in these containers, what's causing the lock dependencies is the enable_signaling() callback that is typically called locked. If it's to allow nesting of dma_fence_array instances, then it's most likely a really bad idea even if we fix all the locking order problems. Well I think my use-case where I hit a dead end may illustrate what worries me here: 1) We use a dma-fence-array to coalesce all dependencies for ttm object migration. 2) We use a dma-fence-chain to order the resulting dma_fence into a timeline because the TTM resource manager code requires that. Initially seemingly harmless to me. But after a sequence evict->alloc->clear, the dma-fence-chain feeds into the dma-fence-array for the clearing operation. Code still works fine, and no deep recursion, no warnings. But if I were to add another driver to the system that instead feeds a dma-fence-array into a dma-fence-chain, this would give me a lockdep splat. So then if somebody were to come up with the splendid idea of using a dma-fence-chain to initially coalesce fences, I'd hit the same problem or risk illegally joining two dma-fence-chains together.
To fix this, I would need to look at the incoming fences and iterate over any dma-fence-array or dma-fence-chain that is fed into the dma-fence-array to flatten out the input. In fact all dma-fence-array users would need to do that, and even dma-fence-chain users would need to watch out for joining chains together or accidentally adding an array that perhaps came as a disguised dma-fence from another driver. So the purpose to me would be to allow these containers as input to each other without a lot of in-driver special-casing, be it by breaking recursion or by built-in flattening, to avoid a) Hitting issues in the future or with existing interoperating drivers. b) Driver-private containers that also might break the interoperability. (For example the currently driver-private i915 dma_fence_work avoids all these problems, but we're attempting to address issues in common code rather than re-inventing stuff internally). I don't think that a dma_fence_array or dma_fence_chain is the right
Re: [Linaro-mm-sig] [RFC PATCH 1/2] dma-fence: Avoid establishing a locking order between fence classes
On 12/1/21 11:32, Christian König wrote: Am 01.12.21 um 11:15 schrieb Thomas Hellström (Intel): [SNIP] What we could do is to avoid all this by not calling the callback with the lock held in the first place. If that's possible that might be a good idea, pls also see below. The problem with that is dma_fence_signal_locked()/dma_fence_signal_timestamp_locked(). If we could avoid using that or at least allow it to drop the lock then we could call the callback without holding it. Somebody would need to audit the drivers and see if holding the lock is really necessary anywhere. /Thomas Oh, and a follow up question: If there was a way to break the recursion on final put() (using the same basic approach as patch 2 in this series uses to break recursion in enable_signaling()), so that none of these containers did require any special treatment, would it be worth pursuing? I guess it might be possible by having the callbacks drop the references rather than the loop in the final put. + a couple of changes in code iterating over the fence pointers. That won't really help, you just move the recursion from the final put into the callback. How do we recurse from the callback? The introduced fence_put() of individual fence pointers doesn't recurse anymore (at most 1 level), and any callback recursion is broken by the irq_work? Yeah, but then you would need to take another lock to avoid racing with dma_fence_array_signaled(). I figure the big amount of work would be to adjust code that iterates over the individual fence pointers to recognize that they are rcu protected. Could be that we could solve this with RCU, but that sounds like a lot of churn for no gain at all. In other words even with the problems solved I think it would be a really bad idea to allow chaining of dma_fence_array objects. Yes, that was really the question, Is it worth pursuing this? I'm not really suggesting we should allow this as an intentional feature. 
I'm worried, however, that if we allow these containers to start floating around cross-driver (or even internally) disguised as ordinary dma_fences, they would require a lot of driver special casing, or else completely unexpected WARN_ON()s and lockdep splats would start to turn up, scaring people off from using them. And that would be a breeding ground for hairy driver-private constructs. Well the question is why we would want to do it? If it's to avoid inter-driver lock dependencies by avoiding calling the callback with the spinlock held, then yes please. We had tons of problems with that, resulting in irq_work and work_item delegation all over the place. Yes, that sounds like something desirable, but in these containers, what's causing the lock dependencies is the enable_signaling() callback that is typically called locked. If it's to allow nesting of dma_fence_array instances, then it's most likely a really bad idea even if we fix all the locking order problems. Well I think my use-case where I hit a dead end may illustrate what worries me here: 1) We use a dma-fence-array to coalesce all dependencies for ttm object migration. 2) We use a dma-fence-chain to order the resulting dma_fence into a timeline because the TTM resource manager code requires that. Initially seemingly harmless to me. But after a sequence evict->alloc->clear, the dma-fence-chain feeds into the dma-fence-array for the clearing operation. Code still works fine, and no deep recursion, no warnings. But if I were to add another driver to the system that instead feeds a dma-fence-array into a dma-fence-chain, this would give me a lockdep splat. So then if somebody were to come up with the splendid idea of using a dma-fence-chain to initially coalesce fences, I'd hit the same problem or risk illegally joining two dma-fence-chains together.
To fix this, I would need to look at the incoming fences and iterate over any dma-fence-array or dma-fence-chain that is fed into the dma-fence-array to flatten out the input. In fact all dma-fence-array users would need to do that, and even dma-fence-chain users would need to watch out for joining chains together or accidentally adding an array that perhaps came as a disguised dma-fence from another driver. So the purpose to me would be to allow these containers as input to each other without a lot of in-driver special-casing, be it by breaking recursion or by built-in flattening, to avoid a) Hitting issues in the future or with existing interoperating drivers. b) Driver-private containers that also might break the interoperability. (For example the currently driver-private i915 dma_fence_work avoids all these problems, but we're attempting to address issues in common code rather than re-inventing stuff internally). /Thomas Christian. /Thomas Christian. Thanks, /Thomas
Re: [Linaro-mm-sig] [RFC PATCH 1/2] dma-fence: Avoid establishing a locking order between fence classes
On 12/1/21 09:36, Christian König wrote: Am 01.12.21 um 09:23 schrieb Thomas Hellström (Intel): [SNIP] Jason and I came up with a deep dive iterator for his use case, but I think we don't want to use that any more after my dma_resv rework. In other words when you need to create a new dma_fence_array you flatten out the existing construct, which is at worst case dma_fence_chain->dma_fence_array->dma_fence. Ok. Is there any cross-driver contract here, like every driver using a dma_fence_array needs to check for dma_fence_chain and flatten like above? So far we only discussed that on the mailing list but haven't made any documentation for that. OK, one other cross-driver pitfall I see is if someone accidentally joins two fence chains together by creating a fence chain unknowingly using another fence chain as the @fence argument? That would indeed be illegal and we should probably add a WARN_ON() for that. The third cross-driver pitfall IMHO is the locking dependency these containers add. Other drivers (read at least i915) may have defined slightly different locking orders and that should also be addressed if needed, but that requires a cross-driver agreement on what the locking orders really are. Patch 1 actually addresses this, while keeping the container lockdep warnings for deep recursions, so at least I think that could serve as a discussion starter. No, drivers should never make any assumptions on that. Yes, that i915 assumption of taking the lock of the last signaled fence first goes back a while in time. We should look at fixing that up, and document any (possibly forbidden) assumptions about fence lock ordering to avoid it happening again, if there is no common cross-driver locking order that can be agreed on. E.g. when you need to take a lock from a callback you must guarantee that you never have that lock taken when you call any of the dma_fence functions. Your patch breaks the lockdep annotation for that.
I'm pretty sure that could be fixed in a satisfactory way if needed. What we could do is to avoid all this by not calling the callback with the lock held in the first place. If that's possible that might be a good idea, pls also see below. /Thomas Oh, and a follow-up question: If there was a way to break the recursion on final put() (using the same basic approach as patch 2 in this series uses to break recursion in enable_signaling()), so that none of these containers did require any special treatment, would it be worth pursuing? I guess it might be possible by having the callbacks drop the references rather than the loop in the final put, plus a couple of changes in code iterating over the fence pointers. That won't really help, you just move the recursion from the final put into the callback. How do we recurse from the callback? The introduced fence_put() of individual fence pointers doesn't recurse anymore (at most 1 level), and any callback recursion is broken by the irq_work? Yeah, but then you would need to take another lock to avoid racing with dma_fence_array_signaled(). I figure the big amount of work would be to adjust code that iterates over the individual fence pointers to recognize that they are RCU-protected. Could be that we could solve this with RCU, but that sounds like a lot of churn for no gain at all. In other words even with the problems solved I think it would be a really bad idea to allow chaining of dma_fence_array objects. Yes, that was really the question: is it worth pursuing this? I'm not really suggesting we should allow this as an intentional feature. I'm worried, however, that if we allow these containers to start floating around cross-driver (or even internally) disguised as ordinary dma_fences, they would require a lot of driver special casing, or else completely unexpected WARN_ON()s and lockdep splats would start to turn up, scaring people off from using them. And that would be a breeding ground for hairy driver-private constructs.
/Thomas Christian. Thanks, /Thomas
Re: [Linaro-mm-sig] [RFC PATCH 1/2] dma-fence: Avoid establishing a locking order between fence classes
On 12/1/21 08:05, Christian König wrote: Am 30.11.21 um 20:27 schrieb Thomas Hellström: On 11/30/21 19:12, Thomas Hellström wrote: On Tue, 2021-11-30 at 16:02 +0100, Christian König wrote: Am 30.11.21 um 15:35 schrieb Thomas Hellström: On Tue, 2021-11-30 at 14:26 +0100, Christian König wrote: Am 30.11.21 um 13:56 schrieb Thomas Hellström: On 11/30/21 13:42, Christian König wrote: Am 30.11.21 um 13:31 schrieb Thomas Hellström: [SNIP] Other than that, I didn't investigate the nesting fails enough to say I can accurately review this. :) Basically the problem is that within enable_signaling(), which is called with the dma_fence lock held, we take the dma_fence lock of another fence. If that other fence is a dma_fence_array, or a dma_fence_chain which in turn tries to lock a dma_fence_array, we hit a splat. Yeah, I already thought that you constructed something like that. You get the splat because what you do here is illegal; you can't mix dma_fence_array and dma_fence_chain like this or you can end up in a stack corruption. Hmm. Ok, so what is the stack corruption, is it that the enable_signaling() will end up with endless recursion? If so, wouldn't it be more usable if we break that recursion chain and allow a more general use? The problem is that this is not easily possible for dma_fence_array containers. Just imagine that you drop the last reference to the contained fences during dma_fence_array destruction: if any of the contained fences is another container, you can easily run into recursion and with that stack corruption. Indeed, that would require some deeper surgery. That's one of the major reasons I came up with the dma_fence_chain container. With this one you can chain any number of elements together without running into any recursion. Also what are the mixing rules between these? Never use a dma-fence-chain as one of the array fences and never use a dma-fence-array as a dma-fence-chain fence?
You can't add any other container to a dma_fence_array, neither other dma_fence_array instances nor dma_fence_chain instances. IIRC at least technically a dma_fence_chain can contain a dma_fence_array if you absolutely need that, but Daniel, Jason and I already had the same discussion a while back and came to the conclusion to avoid that as well if possible. Yes, this is actually the use-case. But what I can't easily guarantee is that that dma_fence_chain isn't fed into a dma_fence_array somewhere else. How do you typically avoid that? Meanwhile I guess I need to take a different approach in the driver to avoid this altogether. Jason and I came up with a deep dive iterator for his use case, but I think we don't want to use that any more after my dma_resv rework. In other words when you need to create a new dma_fence_array you flatten out the existing construct, which is at worst case dma_fence_chain->dma_fence_array->dma_fence. Ok. Is there any cross-driver contract here, like every driver using a dma_fence_array needs to check for dma_fence_chain and flatten like above? So far we only discussed that on the mailing list but haven't made any documentation for that. OK, one other cross-driver pitfall I see is if someone accidentally joins two fence chains together by creating a fence chain unknowingly using another fence chain as the @fence argument? The third cross-driver pitfall IMHO is the locking dependency these containers add. Other drivers (read at least i915) may have defined slightly different locking orders and that should also be addressed if needed, but that requires a cross-driver agreement on what the locking orders really are. Patch 1 actually addresses this, while keeping the container lockdep warnings for deep recursions, so at least I think that could serve as a discussion starter.
/Thomas Oh, and a follow up question: If there was a way to break the recursion on final put() (using the same basic approach as patch 2 in this series uses to break recursion in enable_signaling()), so that none of these containers did require any special treatment, would it be worth pursuing? I guess it might be possible by having the callbacks drop the references rather than the loop in the final put. + a couple of changes in code iterating over the fence pointers. That won't really help, you just move the recursion from the final put into the callback. How do we recurse from the callback? The introduced fence_put() of individual fence pointers doesn't recurse anymore (at most 1 level), and any callback recursion is broken by the irq_work? I figure the big amount of work would be to adjust code that iterates over the individual fence pointers to recognize that they are rcu protected. Thanks, /Thomas
Re: [Intel-gfx] [PATCH] drm/i915: Add ww context to intel_dpt_pin, v2.
On 9/29/21 10:59, Maarten Lankhorst wrote: Ensure i915_vma_pin_iomap and vma_unpin are done with dpt->obj lock held. I don't think there's much of a point in merging intel_dpt_pin() with intel_pin_fb_obj_dpt(), they touch different objects. Changes since v1: - Fix using the wrong pointer to retrieve error code (Julia) Signed-off-by: Maarten Lankhorst Reported-by: kernel test robot Reported-by: Julia Lawall LGTM. Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/display/intel_dpt.c | 40 +++- 1 file changed, 25 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c index de62bd77b15e..8f7b1f7534a4 100644 --- a/drivers/gpu/drm/i915/display/intel_dpt.c +++ b/drivers/gpu/drm/i915/display/intel_dpt.c @@ -121,32 +121,42 @@ struct i915_vma *intel_dpt_pin(struct i915_address_space *vm) intel_wakeref_t wakeref; struct i915_vma *vma; void __iomem *iomem; + struct i915_gem_ww_ctx ww; + int err; wakeref = intel_runtime_pm_get(&i915->runtime_pm); atomic_inc(&i915->gpu_error.pending_fb_pin); - vma = i915_gem_object_ggtt_pin(dpt->obj, NULL, 0, 4096, - HAS_LMEM(i915) ? 0 : PIN_MAPPABLE); - if (IS_ERR(vma)) - goto err; + for_i915_gem_ww(&ww, err, true) { + err = i915_gem_object_lock(dpt->obj, &ww); + if (err) + continue; - iomem = i915_vma_pin_iomap(vma); - i915_vma_unpin(vma); - if (IS_ERR(iomem)) { - vma = ERR_CAST(iomem); - goto err; - } + vma = i915_gem_object_ggtt_pin_ww(dpt->obj, &ww, NULL, 0, 4096, + HAS_LMEM(i915) ? 0 : PIN_MAPPABLE); + if (IS_ERR(vma)) { + err = PTR_ERR(vma); + continue; + } + + iomem = i915_vma_pin_iomap(vma); + i915_vma_unpin(vma); - dpt->vma = vma; - dpt->iomem = iomem; + if (IS_ERR(iomem)) { + err = PTR_ERR(iomem); + continue; + } - i915_vma_get(vma); + dpt->vma = vma; + dpt->iomem = iomem; + + i915_vma_get(vma); + } -err: atomic_dec(&i915->gpu_error.pending_fb_pin); intel_runtime_pm_put(&i915->runtime_pm, wakeref); - return vma; + return err ? 
ERR_PTR(err) : vma; } void intel_dpt_unpin(struct i915_address_space *vm)
[PATCH] drm/bochs: add Bochs PCI ID for Simics model
Current (and older) Simics models for the Bochs VGA used the wrong PCI vendor ID (0x4321 instead of 0x1234). Although this can hopefully be fixed in the future, it is a problem for users of the current version, not the least because to update the device ID the BIOS has to be rebuilt in order to see BIOS output. Add support for the 4321: device number in addition to the 1234: one. Signed-off-by: H. Peter Anvin (Intel) --- drivers/gpu/drm/tiny/bochs.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/tiny/bochs.c b/drivers/gpu/drm/tiny/bochs.c index 73415fa9ae0f..2ce3bd903b70 100644 --- a/drivers/gpu/drm/tiny/bochs.c +++ b/drivers/gpu/drm/tiny/bochs.c @@ -63,6 +63,7 @@ MODULE_PARM_DESC(defy, "default y resolution"); enum bochs_types { BOCHS_QEMU_STDVGA, + BOCHS_SIMICS, BOCHS_UNKNOWN, }; @@ -695,6 +696,13 @@ static const struct pci_device_id bochs_pci_tbl[] = { .subdevice = PCI_ANY_ID, .driver_data = BOCHS_UNKNOWN, }, + { + .vendor = 0x4321, + .device = 0x, + .subvendor = PCI_ANY_ID, + .subdevice = PCI_ANY_ID, + .driver_data = BOCHS_SIMICS, + }, { /* end of list */ } }; -- 2.31.1
Re: [Intel-gfx] [PATCH 7/8] drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)
Hi, Jason, A quick question below: On 7/23/21 7:21 PM, Jason Ekstrand wrote: From: Thomas Hellström If our exported dma-bufs are imported by another instance of our driver, that instance will typically have the imported dma-bufs locked during dma_buf_map_attachment(). But the exporter also locks the same reservation object in the map_dma_buf() callback, which leads to recursive locking. So taking the lock inside _pin_pages_unlocked() is incorrect. Additionally, the current pinning code path is contrary to the defined way that pinning should occur. Remove the explicit pin/unpin from the map/umap functions and move them to the attach/detach allowing correct locking to occur, and to match the static dma-buf drm_prime pattern. Add a live selftest to exercise both dynamic and non-dynamic exports. v2: - Extend the selftest with a fake dynamic importer. - Provide real pin and unpin callbacks to not abuse the interface. v3: (ruhl) - Remove the dynamic export support and move the pinning into the attach/detach path. v4: (ruhl) - Put pages does not need to assert on the dma-resv v5: (jason) - Lock around dma_buf_unmap_attachment() when emulating a dynamic importer in the subtests. - Use pin_pages_unlocked v6: (jason) - Use dma_buf_attach instead of dma_buf_attach_dynamic in the selftests Why did we drop the dynamic importer from the selftests? Shouldn't we try to ensure compatibility with dynamic importers? /Thomas
Re: [Intel-gfx] [PATCH 0/4] Enable GuC submission by default on DG1
On 8/6/21 1:34 PM, Thomas Hellström (Intel) wrote: Hi, On 8/3/21 7:26 PM, Matthew Brost wrote: On Tue, Aug 03, 2021 at 02:15:13PM +0200, Daniel Vetter wrote: On Tue, Aug 3, 2021 at 6:53 AM Matthew Brost wrote: Minimum set of patches to enable GuC submission on DG1 and enable it by default. A little difficult to test as IGTs do not work with DG1 due to a bunch of uAPI features being disabled (e.g. relocations, caching memory options, etc...). Matt Auld has an igt series which fixes a lot of this stuff, would be good to do at least a Test-With run with that. It looks like Maarten now merged Matt's series to IGT. There is a series on IGT trybot with pending work to have some igt tests support relocations, https://patchwork.freedesktop.org/series/92043/ One of the tests that have WIP fixes is gem_exec_whisper, and that particular test has historically shown occasional hangs with GuC submission on DG1 so it would be very desirable if we could make that test in particular work (I haven't verified that that's the case) reliably. Also the following series: https://patchwork.freedesktop.org/series/93455/ tries a bit harder to get some more tests running, squashing the above series on top of latest IGT. Thanks, /Thomas Just verified that gem-exec-whisper is running now on DG1 on latest igt and the above series. Haven't tried with GuC submission, though. /Thomas
Re: [Intel-gfx] [PATCH 0/4] Enable GuC submission by default on DG1
On 8/6/21 1:34 PM, Thomas Hellström (Intel) wrote: Hi, On 8/3/21 7:26 PM, Matthew Brost wrote: On Tue, Aug 03, 2021 at 02:15:13PM +0200, Daniel Vetter wrote: On Tue, Aug 3, 2021 at 6:53 AM Matthew Brost wrote: Minimum set of patches to enable GuC submission on DG1 and enable it by default. A little difficult to test as IGTs do not work with DG1 due to a bunch of uAPI features being disabled (e.g. relocations, caching memory options, etc...). Matt Auld has an igt series which fixes a lot of this stuff, would be good to do at least a Test-With run with that. It looks like Maarten now merged Matt's series to IGT. There is a series on IGT trybot with pending work to have some igt tests support relocations, https://patchwork.freedesktop.org/series/92043/ One of the tests that have WIP fixes is gem_exec_whisper, and that particular test has historically shown occasional hangs with GuC submission on DG1 so it would be very desirable if we could make that test in particular work (I haven't verified that that's the case) reliably. Also the following series: https://patchwork.freedesktop.org/series/93455/ tries a bit harder to get some more tests running, squashing the above series on top of latest IGT. Thanks, /Thomas And also while we're working on getting igt adapted to uapi changes and to get more LMEM coverage in there, an IMO relevant test case to run manually is "piglit quick" on top of DG1-enabled OpenGL, checking for regressions and significant changes in execution time. /Thomas
Re: [Intel-gfx] [PATCH 0/4] Enable GuC submission by default on DG1
Hi, On 8/3/21 7:26 PM, Matthew Brost wrote: On Tue, Aug 03, 2021 at 02:15:13PM +0200, Daniel Vetter wrote: On Tue, Aug 3, 2021 at 6:53 AM Matthew Brost wrote: Minimum set of patches to enable GuC submission on DG1 and enable it by default. A little difficult to test as IGTs do not work with DG1 due to a bunch of uAPI features being disabled (e.g. relocations, caching memory options, etc...). Matt Auld has an igt series which fixes a lot of this stuff, would be good to do at least a Test-With run with that. It looks like Maarten now merged Matt's series to IGT. There is a series on IGT trybot with pending work to have some igt tests support relocations, https://patchwork.freedesktop.org/series/92043/ One of the tests that have WIP fixes is gem_exec_whisper, and that particular test has historically shown occasional hangs with GuC submission on DG1 so it would be very desirable if we could make that test in particular work (I haven't verified that that's the case) reliably. Also the following series: https://patchwork.freedesktop.org/series/93455/ tries a bit harder to get some more tests running, squashing the above series on top of latest IGT. Thanks, /Thomas
Re: [PATCH 1/4] drm/ttm: add a pointer to the allocating BO into ttm_resource
Hi, Christian, I know you have a lot on your plate, and that the drm community is a bit lax about following the kernel patch submission guidelines, but now that we're also spinning up a number of Intel developers on TTM could we please make a better effort with cover letters and commit messages so that they understand what the purpose and end goal of the series is. A reviewer shouldn't have to look at the last patch to try to get an understanding of what the series is doing and why. On 6/10/21 1:05 PM, Christian König wrote: We are going to need this for the next patch and it allows us to clean up amdgpu as well. The amdgpu changes are not reflected in the commit title. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 47 - drivers/gpu/drm/ttm/ttm_resource.c | 1 + include/drm/ttm/ttm_resource.h | 1 + 3 files changed, 19 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c index 194f9eecf89c..8e3f5da44e4f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c @@ -26,23 +26,12 @@ #include "amdgpu.h" -struct amdgpu_gtt_node { - struct ttm_buffer_object *tbo; - struct ttm_range_mgr_node base; -}; - static inline struct amdgpu_gtt_mgr * to_gtt_mgr(struct ttm_resource_manager *man) { return container_of(man, struct amdgpu_gtt_mgr, manager); } -static inline struct amdgpu_gtt_node * -to_amdgpu_gtt_node(struct ttm_resource *res) -{ - return container_of(res, struct amdgpu_gtt_node, base.base); -} - /** * DOC: mem_info_gtt_total * @@ -107,9 +96,9 @@ const struct attribute_group amdgpu_gtt_mgr_attr_group = { */ bool amdgpu_gtt_mgr_has_gart_addr(struct ttm_resource *res) { - struct amdgpu_gtt_node *node = to_amdgpu_gtt_node(res); + struct ttm_range_mgr_node *node = to_ttm_range_mgr_node(res); - return drm_mm_node_allocated(&node->base.mm_nodes[0]); + return drm_mm_node_allocated(&node->mm_nodes[0]); } /** @@ -129,7 +118,7 @@
static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man, { struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); uint32_t num_pages = PFN_UP(tbo->base.size); - struct amdgpu_gtt_node *node; + struct ttm_range_mgr_node *node; int r; spin_lock(&mgr->lock); @@ -141,19 +130,17 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man, atomic64_sub(num_pages, &mgr->available); spin_unlock(&mgr->lock); - node = kzalloc(struct_size(node, base.mm_nodes, 1), GFP_KERNEL); + node = kzalloc(struct_size(node, mm_nodes, 1), GFP_KERNEL); if (!node) { r = -ENOMEM; goto err_out; } - node->tbo = tbo; - ttm_resource_init(tbo, place, &node->base.base); - + ttm_resource_init(tbo, place, &node->base); if (place->lpfn) { spin_lock(&mgr->lock); r = drm_mm_insert_node_in_range(&mgr->mm, - &node->base.mm_nodes[0], + &node->mm_nodes[0], num_pages, tbo->page_alignment, 0, place->fpfn, place->lpfn, DRM_MM_INSERT_BEST); @@ -161,14 +148,14 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man, if (unlikely(r)) goto err_free; - node->base.base.start = node->base.mm_nodes[0].start; + node->base.start = node->mm_nodes[0].start; } else { - node->base.mm_nodes[0].start = 0; - node->base.mm_nodes[0].size = node->base.base.num_pages; - node->base.base.start = AMDGPU_BO_INVALID_OFFSET; + node->mm_nodes[0].start = 0; + node->mm_nodes[0].size = node->base.num_pages; + node->base.start = AMDGPU_BO_INVALID_OFFSET; } - *res = &node->base.base; + *res = &node->base; return 0; err_free: @@ -191,12 +178,12 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man, static void amdgpu_gtt_mgr_del(struct ttm_resource_manager *man, struct ttm_resource *res) { - struct amdgpu_gtt_node *node = to_amdgpu_gtt_node(res); + struct ttm_range_mgr_node *node = to_ttm_range_mgr_node(res); struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); spin_lock(&mgr->lock); - if (drm_mm_node_allocated(&node->base.mm_nodes[0])) - drm_mm_remove_node(&node->base.mm_nodes[0]); + if (drm_mm_node_allocated(&node->mm_nodes[0])) + drm_mm_remove_node(&node->mm_nodes[0]); spin_unlock(&mgr->lock); atomic64_add(res
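The node conversion above leans on the usual kernel subclassing idiom: the generic resource is embedded as a member of the driver-specific node, and container_of() downcasts from the base pointer back to the derived type (as to_ttm_range_mgr_node() does). A self-contained userspace sketch with trimmed-down stand-in structs, not the real TTM definitions:

```c
#include <stddef.h>

/* Minimal userspace re-implementation of the kernel's container_of();
 * struct names mirror the patch but the fields are simplified. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct ttm_resource {
	unsigned long start;
	unsigned long num_pages;
};

/* The driver node embeds the generic resource as its base class. */
struct ttm_range_mgr_node {
	struct ttm_resource base;
	unsigned long mm_node_start;	/* stand-in for mm_nodes[] */
};

/* Downcast from the base pointer handed around by generic code. */
static struct ttm_range_mgr_node *
to_ttm_range_mgr_node(struct ttm_resource *res)
{
	return container_of(res, struct ttm_range_mgr_node, base);
}
```

Generic TTM code only ever sees the `struct ttm_resource *`; the backend recovers its own node by subtracting the member offset, which is why the series can drop the separate driver allocation.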
Re: [PATCH 3/9] drm/vmwgfx: Fix subresource updates with new contexts
Hi, On 6/9/21 7:23 PM, Zack Rusin wrote: The has_dx variable was only set during the initialization which meant that UPDATE_SUBRESOURCE was never used. We were emulating it with UPDATE_GB_IMAGE but that's always been a stop-gap. Instead of has_dx which has been deprecated a long time ago we need to check for whether shader model 4.0 or newer is available to the device. Stupid question perhaps, but isn't UPDATE_SUBRESOURCE available with SVGA_CAP_DX regardless of the SM capabilities of the underlying device? Thanks, Thomas
Re: [PATCH 01/10] drm/ttm: allocate resource object instead of embedding it v2
Hi, On 6/8/21 9:14 AM, Christian König wrote: Am 08.06.21 um 07:29 schrieb Thomas Hellström (Intel): Hi, On 6/7/21 7:59 PM, Christian König wrote: Am 07.06.21 um 19:58 schrieb Thomas Hellström (Intel): On 6/7/21 7:54 PM, Christian König wrote: Am 07.06.21 um 19:06 schrieb Thomas Hellström (Intel): On 6/7/21 6:40 PM, Thomas Hellström (Intel) wrote: On 6/2/21 12:09 PM, Christian König wrote: ... @@ -728,14 +728,15 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, */ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, const struct ttm_place *place, - struct ttm_resource *mem, + struct ttm_resource **mem, struct ttm_operation_ctx *ctx) { struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man = ttm_manager_type(bdev, mem->mem_type); + struct ttm_resource_manager *man; struct ww_acquire_ctx *ticket; int ret; + man = ttm_manager_type(bdev, (*mem)->mem_type); Isn't (*mem) uninitialized here? Should be place->mem_type? Eviction is immediately sent to the bushes. Got at least one additional NULL pointer deref to track down in the eviction code, but could be a merge error of mine as well. Actually this last one was probably due to a bad temporary fix of the above one. I've found one more warning during my testing now. But that is just a false positive. Apart from that I haven't seen any other fallout, but fingers crossed. vmwgfx doesn't seem too happy. It works AFAICT, but warns in vmw_move() about ttm_bo_assign_mem() replacing an existing resource. Yeah, that's the one I've just fixed. Patch is underway. If that's the move_to_ghost patch, I don't think that would fix the vmwgfx issue, as IIRC vmwgfx never uses ghost objects. Mhm, could be that vmwgfx is hitting the same warning with a different backtrace. Do you have the log to double check? Unfortunately not, but IIRC it was directly from vmw_move(). /Thomas
Re: [PATCH 01/10] drm/ttm: allocate resource object instead of embedding it v2
Hi, On 6/7/21 7:59 PM, Christian König wrote: Am 07.06.21 um 19:58 schrieb Thomas Hellström (Intel): On 6/7/21 7:54 PM, Christian König wrote: Am 07.06.21 um 19:06 schrieb Thomas Hellström (Intel): On 6/7/21 6:40 PM, Thomas Hellström (Intel) wrote: On 6/2/21 12:09 PM, Christian König wrote: ... @@ -728,14 +728,15 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, */ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, const struct ttm_place *place, - struct ttm_resource *mem, + struct ttm_resource **mem, struct ttm_operation_ctx *ctx) { struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man = ttm_manager_type(bdev, mem->mem_type); + struct ttm_resource_manager *man; struct ww_acquire_ctx *ticket; int ret; + man = ttm_manager_type(bdev, (*mem)->mem_type); Isn't (*mem) uninitialized here? Should be place->mem_type? Eviction is immediately sent to the bushes. Got at least one additional NULL pointer deref to track down in the eviction code, but could be a merge error of mine as well. Actually this last one was probably due to a bad temporary fix of the above one. I've found one more warning during my testing now. But that is just a false positive. Apart from that I haven't seen any other fallout, but fingers crossed. vmwgfx doesn't seem too happy. It works AFAICT, but warns in vmw_move() about ttm_bo_assign_mem() replacing an existing resource. Yeah, that's the one I've just fixed. Patch is underway. If that's the move_to_ghost patch, I don't think that would fix the vmwgfx issue, as IIRC vmwgfx never uses ghost objects. /Thomas
Re: [PATCH] drm/ttm: fix warning after moving resource to ghost obj
On 6/7/21 7:57 PM, Christian König wrote: After we moved the resource to the ghost the bo->resource pointer needs to be resetted since the owner of the resource is now the ghost. s/resetted/reset/ Signed-off-by: Christian König Reviewed-by: Thomas Hellström
Re: [PATCH 01/10] drm/ttm: allocate resource object instead of embedding it v2
On 6/7/21 7:54 PM, Christian König wrote: Am 07.06.21 um 19:06 schrieb Thomas Hellström (Intel): On 6/7/21 6:40 PM, Thomas Hellström (Intel) wrote: On 6/2/21 12:09 PM, Christian König wrote: ... @@ -728,14 +728,15 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, */ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, const struct ttm_place *place, - struct ttm_resource *mem, + struct ttm_resource **mem, struct ttm_operation_ctx *ctx) { struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man = ttm_manager_type(bdev, mem->mem_type); + struct ttm_resource_manager *man; struct ww_acquire_ctx *ticket; int ret; + man = ttm_manager_type(bdev, (*mem)->mem_type); Isn't (*mem) uninitialized here? Should be place->mem_type? Eviction is immediately sent to the bushes. Got at least one additional NULL pointer deref to track down in the eviction code, but could be a merge error of mine as well. Actually this last one was probably due to a bad temporary fix of the above one. I've found one more warning during my testing now. But that is just a false positive. Apart from that I haven't seen any other fallout, but fingers crossed. vmwgfx doesn't seem too happy. It works AFAICT, but warns in vmw_move() about ttm_bo_assign_mem() replacing an existing resource. /Thomas Christian. /Thomas
Re: [PATCH] drm/ttm: fix access to uninitialized variable.
On 6/7/21 7:11 PM, Christian König wrote: The resource is not allocated yet, so no chance that this will work. Use the placement instead. Signed-off-by: Christian König Reviewed-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_bo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 7e7284da5630..0c74f4cb2a3b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -736,7 +736,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, struct ww_acquire_ctx *ticket; int ret; - man = ttm_manager_type(bdev, (*mem)->mem_type); + man = ttm_manager_type(bdev, place->mem_type); ticket = dma_resv_locking_ctx(bo->base.resv); do { ret = ttm_resource_alloc(bo, place, mem);
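The bug class fixed here is dereferencing an output parameter before anything has been stored through it: `(*mem)` only becomes valid once the allocation succeeds, so the requested placement is the only valid source for the memory type. A simplified, hypothetical sketch of the corrected pattern (made-up stand-in types, not the real TTM structures):

```c
#include <stdlib.h>

/* Stand-in types; only the out-parameter discipline matters here. */
struct place { unsigned int mem_type; };
struct resource { unsigned int mem_type; };

static int force_space(const struct place *place, struct resource **mem)
{
	/* WRONG would be: (*mem)->mem_type — *mem is still unassigned
	 * at this point. The requested placement is the source of truth. */
	unsigned int type = place->mem_type;

	*mem = malloc(sizeof(**mem));	/* stand-in for ttm_resource_alloc() */
	if (!*mem)
		return -1;
	(*mem)->mem_type = type;	/* only now is (*mem) valid */
	return 0;
}
```

Callers pass the address of a NULL-initialized pointer and may only dereference it after a successful return, which is exactly the contract the `**mem` conversion in the series introduces.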
Re: [PATCH 01/10] drm/ttm: allocate resource object instead of embedding it v2
On 6/7/21 6:40 PM, Thomas Hellström (Intel) wrote: On 6/2/21 12:09 PM, Christian König wrote: ... @@ -728,14 +728,15 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, */ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, const struct ttm_place *place, - struct ttm_resource *mem, + struct ttm_resource **mem, struct ttm_operation_ctx *ctx) { struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man = ttm_manager_type(bdev, mem->mem_type); + struct ttm_resource_manager *man; struct ww_acquire_ctx *ticket; int ret; + man = ttm_manager_type(bdev, (*mem)->mem_type); Isn't (*mem) uninitialized here? Should be place->mem_type? Eviction is immediately sent to the bushes. Got at least one additional NULL pointer deref to track down in the eviction code, but could be a merge error of mine as well. Actually this last one was probably due to a bad temporary fix of the above one. /Thomas
Re: [PATCH 01/10] drm/ttm: allocate resource object instead of embedding it v2
On 6/2/21 12:09 PM, Christian König wrote: ... @@ -728,14 +728,15 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, */ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, const struct ttm_place *place, - struct ttm_resource *mem, + struct ttm_resource **mem, struct ttm_operation_ctx *ctx) { struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man = ttm_manager_type(bdev, mem->mem_type); + struct ttm_resource_manager *man; struct ww_acquire_ctx *ticket; int ret; + man = ttm_manager_type(bdev, (*mem)->mem_type); Isn't (*mem) uninitialized here? Should be place->mem_type? Eviction is immediately sent to the bushes. Got at least one additional NULL pointer deref to track down in the eviction code, but could be a merge error of mine as well. /Thomas
Re: [PATCH] drm/ttm: nuke VM_MIXEDMAP on BO mappings v2
On 6/7/21 3:58 PM, Christian König wrote: We discussed if that is really the right approach for quite a while now, but digging deeper into a bug report on arm showed that this is actually horribly broken right now. The reason for this is that vmf_insert_mixed_prot() always tries to grab a reference to the underlying page on architectures without ARCH_HAS_PTE_SPECIAL and as far as I can see also enabled GUP. So nuke using VM_MIXEDMAP here and use VM_PFNMAP instead. Also make sure to reject mappings without VM_SHARED. v2: reject COW mappings, merge function with only caller Signed-off-by: Christian König Bugs: https://gitlab.freedesktop.org/drm/amd/-/issues/1606#note_936174 --- drivers/gpu/drm/ttm/ttm_bo_vm.c | 44 +++-- 1 file changed, 14 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 61828488ae2b..c9edb75626d9 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -359,12 +359,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf, * at arbitrary times while the data is mmap'ed. * See vmf_insert_mixed_prot() for a discussion.
*/ - if (vma->vm_flags & VM_MIXEDMAP) - ret = vmf_insert_mixed_prot(vma, address, - __pfn_to_pfn_t(pfn, PFN_DEV), - prot); - else - ret = vmf_insert_pfn_prot(vma, address, pfn, prot); + ret = vmf_insert_pfn_prot(vma, address, pfn, prot); /* Never error on prefaulted PTEs */ if (unlikely((ret & VM_FAULT_ERROR))) { @@ -411,15 +406,9 @@ vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot) pfn = page_to_pfn(page); /* Prefault the entire VMA range right away to avoid further faults */ - for (address = vma->vm_start; address < vma->vm_end; address += PAGE_SIZE) { - - if (vma->vm_flags & VM_MIXEDMAP) - ret = vmf_insert_mixed_prot(vma, address, - __pfn_to_pfn_t(pfn, PFN_DEV), - prot); - else - ret = vmf_insert_pfn_prot(vma, address, pfn, prot); - } + for (address = vma->vm_start; address < vma->vm_end; +address += PAGE_SIZE) + ret = vmf_insert_pfn_prot(vma, address, pfn, prot); return ret; } @@ -560,8 +549,16 @@ static const struct vm_operations_struct ttm_bo_vm_ops = { .access = ttm_bo_vm_access, }; -static void ttm_bo_mmap_vma_setup(struct ttm_buffer_object *bo, struct vm_area_struct *vma) +int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct ttm_buffer_object *bo) { + /* Enforce VM_SHARED here since without it we would have really strange +* behavior on COW. +*/ Nit: Perhaps "Enforce no COW.." since mappings are allowed with VM_SHARED iff VM_MAYWRITE is not set. Also style consistency with comments: First /* followed by line-break or are you adapting the above style for ttm? With that fixed, Reviewed-by: Thomas Hellström + if (is_cow_mapping(vma->vm_flags)) + return -EINVAL; + + ttm_bo_get(bo); + /* * Drivers may want to override the vm_ops field. Otherwise we * use TTM's default callbacks. 
@@ -576,21 +573,8 @@ static void ttm_bo_mmap_vma_setup(struct ttm_buffer_object *bo, struct vm_area_s vma->vm_private_data = bo; - /* -* We'd like to use VM_PFNMAP on shared mappings, where -* (vma->vm_flags & VM_SHARED) != 0, for performance reasons, -* but for some reason VM_PFNMAP + x86 PAT + write-combine is very -* bad for performance. Until that has been sorted out, use -* VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719 -*/ - vma->vm_flags |= VM_MIXEDMAP; + vma->vm_flags |= VM_PFNMAP; vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP; -} - -int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct ttm_buffer_object *bo) -{ - ttm_bo_get(bo); - ttm_bo_mmap_vma_setup(bo, vma); return 0; } EXPORT_SYMBOL(ttm_bo_mmap_obj);
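The is_cow_mapping() check being added in this patch tests whether a mapping is private yet potentially writable. The userspace sketch below mirrors the helper from include/linux/mm.h; the flag values are copied by hand and should be treated as illustrative rather than authoritative:

```c
#include <stdbool.h>

/* Illustrative copies of the vm_flags bits; the real definitions live
 * in include/linux/mm.h. */
#define VM_SHARED	0x00000008UL
#define VM_MAYWRITE	0x00000020UL

/* A mapping is copy-on-write when it is private (no VM_SHARED) but
 * could become writable (VM_MAYWRITE set) — e.g. a MAP_PRIVATE mapping
 * opened with PROT_WRITE or mprotect-able to it. */
static bool is_cow_mapping(unsigned long flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}
```

Note that a read-only MAP_PRIVATE mapping of an O_RDONLY file clears VM_MAYWRITE, so it passes this check even without VM_SHARED — which is the subtlety behind the "Enforce no COW" comment wording discussed above.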
Re: [PATCH 1/4] drm/ttm: add TTM_PL_FLAG_TEMPORARY flag v3
Sure. LGTM, Reviewed-by: Thomas Hellström On 6/7/21 2:36 PM, Christian König wrote: Thomas, any comments on this? Is the purpose of this now clear enough? Thanks, Christian. Am 01.06.21 um 14:25 schrieb Christian König: From: Lang Yu Sometimes drivers need to use bounce buffers to evict BOs. While those reside in some domain they are not necessarily suitable for CS. Add a flag so that drivers can note that a bounce buffer needs to be reallocated during validation. v2: add detailed comments v3 (chk): merge commits and rework commit message Suggested-by: Christian König Signed-off-by: Lang Yu Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 3 +++ include/drm/ttm/ttm_placement.h | 7 +-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 51a94fd63bd7..6b393502198e 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -929,6 +929,9 @@ static bool ttm_bo_places_compat(const struct ttm_place *places, { unsigned i; + if (mem->placement & TTM_PL_FLAG_TEMPORARY) + return false; + for (i = 0; i < num_placement; i++) { const struct ttm_place *heap = &places[i]; diff --git a/include/drm/ttm/ttm_placement.h b/include/drm/ttm/ttm_placement.h index aa6ba4d0cf78..8995c9e4ec1b 100644 --- a/include/drm/ttm/ttm_placement.h +++ b/include/drm/ttm/ttm_placement.h @@ -47,8 +47,11 @@ * top of the memory area, instead of the bottom. */ -#define TTM_PL_FLAG_CONTIGUOUS (1 << 19) -#define TTM_PL_FLAG_TOPDOWN (1 << 22) +#define TTM_PL_FLAG_CONTIGUOUS (1 << 0) +#define TTM_PL_FLAG_TOPDOWN (1 << 1) + +/* For multihop handling */ +#define TTM_PL_FLAG_TEMPORARY (1 << 2) /** * struct ttm_place
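The flag's effect can be sketched as follows: a resource tagged TTM_PL_FLAG_TEMPORARY is declared incompatible with every requested placement, so the next validation is forced to reallocate the BO. Simplified stand-in types; only the flag values come from the patch above:

```c
#include <stdbool.h>

/* Flag values as defined in the patch. */
#define TTM_PL_FLAG_CONTIGUOUS	(1 << 0)
#define TTM_PL_FLAG_TOPDOWN	(1 << 1)
#define TTM_PL_FLAG_TEMPORARY	(1 << 2)

/* Trimmed stand-ins for ttm_resource / ttm_place. */
struct mem { unsigned int mem_type; unsigned int placement; };
struct plc { unsigned int mem_type; };

/* Sketch of ttm_bo_places_compat(): a TEMPORARY resource never matches,
 * even if the memory type would otherwise be acceptable. */
static bool places_compat(const struct plc *places, unsigned int n,
			  const struct mem *mem)
{
	if (mem->placement & TTM_PL_FLAG_TEMPORARY)
		return false;
	for (unsigned int i = 0; i < n; i++)
		if (places[i].mem_type == mem->mem_type)
			return true;
	return false;
}
```

This is how a bounce-buffer resource, although it "resides in some domain", is kept out of command submission: validation sees it as incompatible and reallocates.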
Re: [PATCH] drm/ttm: fix missing res assignment in ttm_range_man_alloc
On 6/7/21 12:40 PM, Christian König wrote: That somehow went missing. Signed-off-by: Christian König Fixes: cb1c81467af3 ("drm/ttm: flip the switch for driver allocated resources v2") Reported-by: Thomas Hellström Reviewed-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_range_manager.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c b/drivers/gpu/drm/ttm/ttm_range_manager.c index c32e1aee2481..03395386e8a7 100644 --- a/drivers/gpu/drm/ttm/ttm_range_manager.c +++ b/drivers/gpu/drm/ttm/ttm_range_manager.c @@ -88,12 +88,14 @@ static int ttm_range_man_alloc(struct ttm_resource_manager *man, place->fpfn, lpfn, mode); spin_unlock(&rman->lock); - if (unlikely(ret)) + if (unlikely(ret)) { kfree(node); - else - node->base.start = node->mm_nodes[0].start; + return ret; + } - return ret; + node->base.start = node->mm_nodes[0].start; + *res = &node->base; + return 0; } static void ttm_range_man_free(struct ttm_resource_manager *man,
Re: [PATCH 10/10] drm/ttm: flip the switch for driver allocated resources v2
On 6/7/21 12:37 PM, Christian König wrote: Am 07.06.21 um 12:15 schrieb Thomas Hellström (Intel): On 6/2/21 12:09 PM, Christian König wrote: Instead of both driver and TTM allocating memory finalize embedding the ttm_resource object as base into the driver backends. v2: fix typo in vmwgfx grid mgr and double init in amdgpu_vram_mgr.c Signed-off-by: Christian König Reviewed-by: Matthew Auld --- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 44 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 60 +-- drivers/gpu/drm/drm_gem_vram_helper.c | 3 +- drivers/gpu/drm/nouveau/nouveau_bo.c | 8 +-- drivers/gpu/drm/nouveau/nouveau_mem.c | 11 ++-- drivers/gpu/drm/nouveau/nouveau_mem.h | 14 ++--- drivers/gpu/drm/nouveau/nouveau_ttm.c | 32 +- drivers/gpu/drm/ttm/ttm_range_manager.c | 23 +++ drivers/gpu/drm/ttm/ttm_resource.c | 18 +- drivers/gpu/drm/ttm/ttm_sys_manager.c | 12 ++-- drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c | 24 drivers/gpu/drm/vmwgfx/vmwgfx_thp.c | 27 - include/drm/ttm/ttm_range_manager.h | 3 +- include/drm/ttm/ttm_resource.h | 43 ++--- 16 files changed, 140 insertions(+), 189 deletions(-) ... 
diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c b/drivers/gpu/drm/ttm/ttm_range_manager.c index ce5d07ca384c..c32e1aee2481 100644 --- a/drivers/gpu/drm/ttm/ttm_range_manager.c +++ b/drivers/gpu/drm/ttm/ttm_range_manager.c @@ -58,7 +58,7 @@ to_range_manager(struct ttm_resource_manager *man) static int ttm_range_man_alloc(struct ttm_resource_manager *man, struct ttm_buffer_object *bo, const struct ttm_place *place, - struct ttm_resource *mem) + struct ttm_resource **res) { struct ttm_range_manager *rman = to_range_manager(man); struct ttm_range_mgr_node *node; @@ -83,37 +83,30 @@ static int ttm_range_man_alloc(struct ttm_resource_manager *man, spin_lock(&rman->lock); ret = drm_mm_insert_node_in_range(mm, &node->mm_nodes[0], - mem->num_pages, bo->page_alignment, 0, + node->base.num_pages, + bo->page_alignment, 0, place->fpfn, lpfn, mode); spin_unlock(&rman->lock); - if (unlikely(ret)) { + if (unlikely(ret)) kfree(node); - } else { - mem->mm_node = &node->mm_nodes[0]; - mem->start = node->mm_nodes[0].start; - } + else + node->base.start = node->mm_nodes[0].start; return ret; } Looks like this patch forgets to assign *@res. Null pointer derefs when testing i915. I should really CC the Intel list for TTM patches as well. The CI system should have spotted that. Unfortunately, the dg1 system is not participating in CI yet AFAICT, but moving forward I think it's a good idea. /Thomas
Re: [PATCH 10/10] drm/ttm: flip the switch for driver allocated resources v2
On 6/2/21 12:09 PM, Christian König wrote: Instead of both driver and TTM allocating memory finalize embedding the ttm_resource object as base into the driver backends. v2: fix typo in vmwgfx grid mgr and double init in amdgpu_vram_mgr.c Signed-off-by: Christian König Reviewed-by: Matthew Auld --- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 44 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.c| 2 +- .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h| 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 60 +-- drivers/gpu/drm/drm_gem_vram_helper.c | 3 +- drivers/gpu/drm/nouveau/nouveau_bo.c | 8 +-- drivers/gpu/drm/nouveau/nouveau_mem.c | 11 ++-- drivers/gpu/drm/nouveau/nouveau_mem.h | 14 ++--- drivers/gpu/drm/nouveau/nouveau_ttm.c | 32 +- drivers/gpu/drm/ttm/ttm_range_manager.c | 23 +++ drivers/gpu/drm/ttm/ttm_resource.c| 18 +- drivers/gpu/drm/ttm/ttm_sys_manager.c | 12 ++-- drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c | 24 drivers/gpu/drm/vmwgfx/vmwgfx_thp.c | 27 - include/drm/ttm/ttm_range_manager.h | 3 +- include/drm/ttm/ttm_resource.h| 43 ++--- 16 files changed, 140 insertions(+), 189 deletions(-) ... 
diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c b/drivers/gpu/drm/ttm/ttm_range_manager.c index ce5d07ca384c..c32e1aee2481 100644 --- a/drivers/gpu/drm/ttm/ttm_range_manager.c +++ b/drivers/gpu/drm/ttm/ttm_range_manager.c @@ -58,7 +58,7 @@ to_range_manager(struct ttm_resource_manager *man) static int ttm_range_man_alloc(struct ttm_resource_manager *man, struct ttm_buffer_object *bo, const struct ttm_place *place, - struct ttm_resource *mem) + struct ttm_resource **res) { struct ttm_range_manager *rman = to_range_manager(man); struct ttm_range_mgr_node *node; @@ -83,37 +83,30 @@ static int ttm_range_man_alloc(struct ttm_resource_manager *man, spin_lock(&rman->lock); ret = drm_mm_insert_node_in_range(mm, &node->mm_nodes[0], - mem->num_pages, bo->page_alignment, 0, + node->base.num_pages, + bo->page_alignment, 0, place->fpfn, lpfn, mode); spin_unlock(&rman->lock); - if (unlikely(ret)) { + if (unlikely(ret)) kfree(node); - } else { - mem->mm_node = &node->mm_nodes[0]; - mem->start = node->mm_nodes[0].start; - } + else + node->base.start = node->mm_nodes[0].start; return ret; } Looks like this patch forgets to assign *@res. Null pointer derefs when testing i915. BTW shouldn't we return the struct ttm_resource ptr here rather than passing it as an argument? /Thomas
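The alternative Thomas raises — returning the resource pointer directly with errors encoded ERR_PTR-style, instead of an int return plus a **res out-parameter — would make a forgotten *res assignment impossible by construction, since the only way to report success is to return the pointer. A hypothetical userspace sketch with minimal re-implementations of the kernel's ERR_PTR helpers (the real ones live in include/linux/err.h; TTM kept the int/out-parameter form):

```c
#include <errno.h>
#include <stdlib.h>

/* Minimal userspace versions of the kernel's ERR_PTR/IS_ERR helpers:
 * errno values are smuggled in the top 4095 addresses. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-4095;
}

struct resource { unsigned long start; };

/* Hypothetical allocator in the pointer-returning style; 'fail'
 * simulates the drm_mm range insertion failing. */
static struct resource *range_man_alloc(unsigned long start, int fail)
{
	struct resource *res;

	if (fail)
		return ERR_PTR(-ENOSPC);
	res = malloc(sizeof(*res));
	if (!res)
		return ERR_PTR(-ENOMEM);
	res->start = start;
	return res;
}
```

The trade-off is that every caller must remember IS_ERR() instead of a plain NULL check, which is one reason kernel APIs are split between the two styles.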
Re: [PATCH 01/10] drm/ttm: allocate resource object instead of embedding it v2
Hi, Christian, It looks like all patches in the series have been reviewed or acked by Matthew, and while still a little worried about the final outcome of embedding a struct ttm_mem_resource, FWIW, Acked-by: Thomas Hellström /Thomas On 6/2/21 12:09 PM, Christian König wrote: To improve the handling we want the establish the resource object as base class for the backend allocations. v2: add missing error handling Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 54 +++--- drivers/gpu/drm/nouveau/nouveau_bo.c | 2 +- drivers/gpu/drm/radeon/radeon_ttm.c| 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 83 -- drivers/gpu/drm/ttm/ttm_bo_util.c | 43 ++- drivers/gpu/drm/ttm/ttm_resource.c | 31 +--- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 2 +- include/drm/ttm/ttm_bo_api.h | 1 - include/drm/ttm/ttm_bo_driver.h| 10 ++- include/drm/ttm/ttm_resource.h | 4 +- 11 files changed, 110 insertions(+), 126 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 03c6b63d1d54..59723c3d5826 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -362,14 +362,14 @@ int amdgpu_bo_create_kernel_at(struct amdgpu_device *adev, if (cpu_addr) amdgpu_bo_kunmap(*bo_ptr); - ttm_resource_free(&(*bo_ptr)->tbo, (*bo_ptr)->tbo.resource); + ttm_resource_free(&(*bo_ptr)->tbo, &(*bo_ptr)->tbo.resource); for (i = 0; i < (*bo_ptr)->placement.num_placement; ++i) { (*bo_ptr)->placements[i].fpfn = offset >> PAGE_SHIFT; (*bo_ptr)->placements[i].lpfn = (offset + size) >> PAGE_SHIFT; } r = ttm_bo_mem_space(&(*bo_ptr)->tbo, &(*bo_ptr)->placement, -(*bo_ptr)->tbo.resource, ); +&(*bo_ptr)->tbo.resource, ); if (r) goto error; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 663aa7d2e2ea..69db89261650 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -491,7 +491,7 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict, return r; amdgpu_ttm_backend_unbind(bo->bdev, bo->ttm); - ttm_resource_free(bo, bo->resource); + ttm_resource_free(bo, >resource); ttm_bo_assign_mem(bo, new_mem); goto out; } @@ -950,9 +950,9 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo) struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev); struct ttm_operation_ctx ctx = { false, false }; struct amdgpu_ttm_tt *gtt = (void *)bo->ttm; - struct ttm_resource tmp; struct ttm_placement placement; struct ttm_place placements; + struct ttm_resource *tmp; uint64_t addr, flags; int r; @@ -962,37 +962,37 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo) addr = amdgpu_gmc_agp_addr(bo); if (addr != AMDGPU_BO_INVALID_OFFSET) { bo->resource->start = addr >> PAGE_SHIFT; - } else { + return 0; + } - /* allocate GART space */ - placement.num_placement = 1; - placement.placement = - placement.num_busy_placement = 1; - placement.busy_placement = - placements.fpfn = 0; - placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT; - placements.mem_type = TTM_PL_TT; - placements.flags = bo->resource->placement; - - r = ttm_bo_mem_space(bo, , , ); - if (unlikely(r)) - return r; + /* allocate GART space */ + placement.num_placement = 1; + placement.placement = + placement.num_busy_placement = 1; + placement.busy_placement = + placements.fpfn = 0; + placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT; + placements.mem_type = TTM_PL_TT; + placements.flags = bo->resource->placement; - /* compute PTE flags for this buffer object */ - flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, ); + r = ttm_bo_mem_space(bo, , , ); + if (unlikely(r)) + return r; - /* Bind pages */ - gtt->offset = (u64)tmp.start << PAGE_SHIFT; - r = amdgpu_ttm_gart_bind(adev, bo, flags); - if (unlikely(r)) { - ttm_resource_free(bo, ); - return r; - } + /* compute PTE flags for this buffer object */ + flags = 
amdgpu_ttm_tt_pte_flags(adev, bo->ttm, tmp); - ttm_resource_free(bo, bo->resource); -
Re: [PATCH] drm/ttm: nuke VM_MIXEDMAP on BO mappings
On 6/2/21 8:36 PM, Daniel Vetter wrote: On Wed, Jun 02, 2021 at 02:21:17PM +0200, Thomas Hellström wrote: On 6/2/21 2:04 PM, Christian König wrote: Am 02.06.21 um 13:24 schrieb Thomas Hellström (Intel): [SNIP] @@ -576,14 +565,10 @@ static void ttm_bo_mmap_vma_setup(struct ttm_buffer_object *bo, struct vm_area_s vma->vm_private_data = bo; - /* - * We'd like to use VM_PFNMAP on shared mappings, where - * (vma->vm_flags & VM_SHARED) != 0, for performance reasons, - * but for some reason VM_PFNMAP + x86 PAT + write-combine is very - * bad for performance. Until that has been sorted out, use - * VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719 + /* Enforce VM_SHARED here since no driver backend actually supports COW + * on TTM buffer object mappings. I think by default all TTM drivers support COW mappings in the sense that written data never makes it to the bo but stays in anonymous pages, although I can't find a single usecase. So the comment should be changed to state that they are useless for us and that we can't support COW mappings with VM_PFNMAP. Well the problem I see with that is that it only works as long as the BO is in system memory. When it then suddenly migrates to VRAM everybody sees the same content again and the COW pages are dropped. That is really inconsistent and I can't see why we would want to do that. Hmm, yes, that's actually a bug in drm_vma_manager(). Hui? How is that related to drm_vma_manager() ? Last argument of "unmap_mapping_range()" is "even_cows". Additionally to that, when you allow COW mappings you need to make sure your COWed pages have the right caching attribute and that the reference count is initialized and taken into account properly. No driver actually gets that right at the moment. I was under the impression that COW'ed pages were handled transparently by the vm, you'd always get cached, properly refcounted COW'ed pages, but anyway since we're going to ditch support for them, it doesn't really matter.
Yeah, but I would have expected that the new COWed page should have the same caching attributes as the old one, and that is not really the case. */ - vma->vm_flags |= VM_MIXEDMAP; + vma->vm_flags |= VM_PFNMAP | VM_SHARED; Hmm, shouldn't we refuse COW mappings instead, like my old patch on this subject did? In theory someone could be setting up what she thinks is a private mapping to a shared buffer object, and write sensitive data to it, which will immediately leak. It's a simple check, could open-code if necessary. Yeah, thought about that as well. Rejecting things would mean we potentially break userspace which just happened to work by coincidence previously. Not totally evil, but not nice either. How about we do a WARN_ON_ONCE(!(vma->vm_flags & VM_SHARED)); instead? Umm, yes, but that wouldn't notify the user, and would be triggerable from user-space. But you can also set up legal non-COW mappings without the VM_SHARED flag, IIRC, see is_cow_mapping(). I think when this was up for discussion last time we arrived at a vma_is_cow_mapping() utility... Well userspace could trigger that only once, so no spamming of the log can be expected here. And extra warnings in the logs are usually reported by people rather quickly. OK, I'm mostly worried about adding a security flaw that we know about from the start. VM_SHARED is already cleared in vma_set_page_prot() due to the VM_PFNMAP check in vma_wants_writenotify. Yes, but that's only on a local variable to get a write-protected prot. vmwgfx does the same for its dirty-tracking. Here we're debating setting VM_SHARED on a private mapping. I'm honestly not sure whether userspace can then even notice this or anything, so might be worth a quick testcase. The net result is that in the very unlikely case the user requested a private GPU mapping to write secret data into, that secret data is no longer secret.
And, for example in the case of AMD's SEV encryption, that data would have been encrypted in an anonymous page with COW mappings, but not so if we add VM_SHARED: then it will be unencrypted in GPU pages. Then I think it's better to refuse COW mappings in mmap: if (is_cow_mapping(vma->vm_flags)) return -EINVAL; This will still allow private read-only mappings, which is OK. And if someone was actually relying on private COW'd GPU mappings, we'd only break the code slightly more... /Thomas Even if I'm wrong here we shouldn't allow cow mappings of gem_bo, that just seems too nasty with all the side-effects. Completely agree. -Daniel /Thomas Christian. /Thomas
Re: [PATCH 02/10] drm/ttm: flip over the range manager to self allocated nodes
On 6/2/21 8:41 PM, Christian König wrote: Am 02.06.21 um 17:28 schrieb Thomas Hellström (Intel): Hi! On 6/2/21 4:17 PM, Christian König wrote: Am 02.06.21 um 16:13 schrieb Thomas Hellström (Intel): On 6/2/21 3:07 PM, Christian König wrote: Am 02.06.21 um 14:33 schrieb Thomas Hellström (Intel): On 6/2/21 2:11 PM, Christian König wrote: Am 02.06.21 um 13:44 schrieb Thomas Hellström (Intel): On 6/2/21 12:09 PM, Christian König wrote: Start with the range manager to make the resource object the base class for the allocated nodes. While at it cleanup a lot of the code around that. Signed-off-by: Christian König Reviewed-by: Matthew Auld --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 1 + drivers/gpu/drm/drm_gem_vram_helper.c | 2 + drivers/gpu/drm/nouveau/nouveau_ttm.c | 2 + drivers/gpu/drm/qxl/qxl_ttm.c | 1 + drivers/gpu/drm/radeon/radeon_ttm.c | 1 + drivers/gpu/drm/ttm/ttm_range_manager.c | 56 ++--- drivers/gpu/drm/ttm/ttm_resource.c | 26 include/drm/ttm/ttm_bo_driver.h | 26 include/drm/ttm/ttm_range_manager.h | 43 +++ include/drm/ttm/ttm_resource.h | 3 ++ 10 files changed, 111 insertions(+), 50 deletions(-) create mode 100644 include/drm/ttm/ttm_range_manager.h diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 69db89261650..df1f185faae9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -45,6 +45,7 @@ #include #include #include +#include #include diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c index 83e7258c7f90..17a4c5d47b6a 100644 --- a/drivers/gpu/drm/drm_gem_vram_helper.c +++ b/drivers/gpu/drm/drm_gem_vram_helper.c @@ -17,6 +17,8 @@ #include #include +#include + static const struct drm_gem_object_funcs drm_gem_vram_object_funcs; /** diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c index 65430912ff72..b08b8efeefba 100644 --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c +++ 
b/drivers/gpu/drm/nouveau/nouveau_ttm.c @@ -26,6 +26,8 @@ #include #include +#include + #include "nouveau_drv.h" #include "nouveau_gem.h" #include "nouveau_mem.h" diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c index 8aa87b8edb9c..19fd39d9a00c 100644 --- a/drivers/gpu/drm/qxl/qxl_ttm.c +++ b/drivers/gpu/drm/qxl/qxl_ttm.c @@ -32,6 +32,7 @@ #include #include #include +#include #include "qxl_drv.h" #include "qxl_object.h" diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index cdffa9b65108..ad2a5a791bba 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -45,6 +45,7 @@ #include #include #include +#include #include "radeon_reg.h" #include "radeon.h" diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c b/drivers/gpu/drm/ttm/ttm_range_manager.c index b9d5da6e6a81..ce5d07ca384c 100644 --- a/drivers/gpu/drm/ttm/ttm_range_manager.c +++ b/drivers/gpu/drm/ttm/ttm_range_manager.c @@ -29,12 +29,13 @@ * Authors: Thomas Hellstrom */ -#include +#include #include +#include +#include #include #include #include -#include /* * Currently we use a spinlock for the lock, but a mutex *may* be @@ -60,8 +61,8 @@ static int ttm_range_man_alloc(struct ttm_resource_manager *man, struct ttm_resource *mem) { struct ttm_range_manager *rman = to_range_manager(man); + struct ttm_range_mgr_node *node; struct drm_mm *mm = &rman->mm; - struct drm_mm_node *node; enum drm_mm_insert_mode mode; unsigned long lpfn; int ret; @@ -70,7 +71,7 @@ static int ttm_range_man_alloc(struct ttm_resource_manager *man, if (!lpfn) lpfn = man->size; - node = kzalloc(sizeof(*node), GFP_KERNEL); + node = kzalloc(struct_size(node, mm_nodes, 1), GFP_KERNEL); I'm still a bit confused about the situation where a driver wants to attach private data to a struct ttm_resource without having to re-implement its own range manager? Could be cached sg-tables, list of GPU bindings etc.
Wouldn't work with the above unless we have a void *driver_private member on the struct ttm_resource. Is that the plan going forward here? Or that the driver actually does the re-implementation? I don't really understand your concern here. The basic idea is that drivers use ttm_resource as a base class for their own implementation. See for example how nouveau does that: struct nouveau_mem { struct ttm_resource base; struct nouveau_cli *cli; u8 kind; u8 comp; struct nvif_mem mem; struct nvif_vma vma[2]; }; The range manager is helping driver specific resource managers which want to implement something drm_mm_nodes based.
Re: [PATCH 02/10] drm/ttm: flip over the range manager to self allocated nodes
Hi! On 6/2/21 4:17 PM, Christian König wrote: Am 02.06.21 um 16:13 schrieb Thomas Hellström (Intel): On 6/2/21 3:07 PM, Christian König wrote: Am 02.06.21 um 14:33 schrieb Thomas Hellström (Intel): On 6/2/21 2:11 PM, Christian König wrote: Am 02.06.21 um 13:44 schrieb Thomas Hellström (Intel): On 6/2/21 12:09 PM, Christian König wrote: [SNIP] + node = kzalloc(struct_size(node, mm_nodes, 1), GFP_KERNEL); I'm still a bit confused about the situation where a driver wants to attach private data to a struct ttm_resource without having to re-implement its own range manager? Could be cached sg-tables, list of GPU bindings etc. Wouldn't work with the above unless we have a void *driver_private member on the struct ttm_resource. Is that the plan going forward here?
Or that the driver actually does the re-implementation? I don't really understand your concern here. The basic idea is that drivers use ttm_resource as a base class for their own implementation. See for example how nouveau does that: struct nouveau_mem { struct ttm_resource base; struct nouveau_cli *cli; u8 kind; u8 comp; struct nvif_mem mem; struct nvif_vma vma[2]; }; The range manager is helping driver specific resource managers which want to implement something drm_mm_nodes based. E.g. amdgpu_gtt_mgr and amdgpu_vram_mgr, but it can also be used stand alone.
Re: [PATCH 02/10] drm/ttm: flip over the range manager to self allocated nodes
On 6/2/21 3:07 PM, Christian König wrote: Am 02.06.21 um 14:33 schrieb Thomas Hellström (Intel): On 6/2/21 2:11 PM, Christian König wrote: Am 02.06.21 um 13:44 schrieb Thomas Hellström (Intel): On 6/2/21 12:09 PM, Christian König wrote: [SNIP] + node = kzalloc(struct_size(node, mm_nodes, 1), GFP_KERNEL); I'm still a bit confused about the situation where a driver wants to attach private data to a struct ttm_resource without having to re-implement its own range manager? Could be cached sg-tables, list of GPU bindings etc. Wouldn't work with the above unless we have a void *driver_private member on the struct ttm_resource. Is that the plan going forward here? Or that the driver actually does the re-implementation? I don't really understand your concern here.
The basic idea is that drivers use ttm_resource as a base class for their own implementation. See for example how nouveau does that: struct nouveau_mem { struct ttm_resource base; struct nouveau_cli *cli; u8 kind; u8 comp; struct nvif_mem mem; struct nvif_vma vma[2]; }; The range manager is helping driver specific resource managers which want to implement something drm_mm_nodes based. E.g. amdgpu_gtt_mgr and amdgpu_vram_mgr, but it can also be used stand alone. The ttm_range_mgr_node can then be used as ba