Re: [PATCH] drm/tests: mm: Convert to drm_dbg_printer
On 2/9/24 15:08, Michał Winiarski wrote:
> Fix one of the tests in drm_mm that was not converted prior to the
> drm_debug_printer removal, causing a tests build failure.
>
> Fixes: e154c4fc7bf2d ("drm: remove drm_debug_printer in favor of drm_dbg_printer")
> Signed-off-by: Michał Winiarski
> ---
>  drivers/gpu/drm/tests/drm_mm_test.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/tests/drm_mm_test.c b/drivers/gpu/drm/tests/drm_mm_test.c
> index 1eb0c304f9607..3488d930e3a38 100644
> --- a/drivers/gpu/drm/tests/drm_mm_test.c
> +++ b/drivers/gpu/drm/tests/drm_mm_test.c
> @@ -188,7 +188,7 @@ static void drm_test_mm_init(struct kunit *test)
>  static void drm_test_mm_debug(struct kunit *test)
>  {
> -	struct drm_printer p = drm_debug_printer(test->name);
> +	struct drm_printer p = drm_dbg_printer(NULL, DRM_UT_CORE, test->name);
>  	struct drm_mm mm;
>  	struct drm_mm_node nodes[2];

Reviewed-by: Thomas Hellström
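For context, a drm_printer is a small vtable: a print callback plus per-printer state, and drm_dbg_printer() additionally binds a struct drm_device and a debug category (DRM_UT_CORE in the hunk above). The following is a rough userspace analogue of that pattern only; all names are illustrative and none of this is the actual DRM API:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for struct drm_printer: backend callback + state. */
struct printer {
	void (*printfn)(struct printer *p, const char *fmt, va_list args);
	const char *prefix;    /* e.g. the KUnit test name */
	unsigned int category; /* stand-in for a DRM_UT_* debug category */
	char buf[256];         /* this demo's backend captures into a buffer */
};

/* Backend: append the formatted message to the capture buffer. */
static void buf_printfn(struct printer *p, const char *fmt, va_list args)
{
	size_t off = strlen(p->buf);

	vsnprintf(p->buf + off, sizeof(p->buf) - off, fmt, args);
}

/* Analogue of drm_dbg_printer(drm, category, prefix): bind backend + metadata. */
static struct printer dbg_printer(unsigned int category, const char *prefix)
{
	struct printer p = { .printfn = buf_printfn, .prefix = prefix,
			     .category = category, .buf = "" };
	return p;
}

/* Analogue of drm_printf(): emit the prefix, then delegate to the backend. */
static void printer_printf(struct printer *p, const char *fmt, ...)
{
	va_list args;

	if (p->prefix) {
		size_t off = strlen(p->buf);

		snprintf(p->buf + off, sizeof(p->buf) - off, "[%s] ", p->prefix);
	}
	va_start(args, fmt);
	p->printfn(p, fmt, args);
	va_end(args);
}
```

The point of the indirection is that callers such as the drm_mm dump code print through one interface while the owner of the printer decides where the output lands, which is why the test only had to swap the constructor.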
Re: [Linaro-mm-sig] Re: [PATCH] dma-buf: fix check in dma_resv_add_fence
On 11/27/23 14:24, Christian König wrote:
> Ping? Can I get an rb or acked-by for that?
>
> Thanks,
> Christian.
>
> On 15.11.23 at 10:30, Christian König wrote:
>> It's valid to add the same fence multiple times to a dma-resv object
>> and we shouldn't need one extra slot for each.
>>
>> Signed-off-by: Christian König
>> Fixes: a3f7c10a269d5 ("dma-buf/dma-resv: check if the new fence is really later")
>> Cc: sta...@vger.kernel.org # v5.19+
>> Reviewed-by: Thomas Hellström
>> ---
>>  drivers/dma-buf/dma-resv.c |  2 +-
>>  include/linux/dma-fence.h  | 15 +++++++++++++++
>>  2 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
>> index 38b4110378de..eb8b733065b2 100644
>> --- a/drivers/dma-buf/dma-resv.c
>> +++ b/drivers/dma-buf/dma-resv.c
>> @@ -301,7 +301,7 @@ void dma_resv_add_fence(struct dma_resv *obj, struct dma_fence *fence,
>>  		dma_resv_list_entry(fobj, i, obj, &old, &old_usage);
>>
>>  		if ((old->context == fence->context && old_usage >= usage &&
>> -		     dma_fence_is_later(fence, old)) ||
>> +		     dma_fence_is_later_or_same(fence, old)) ||
>>  		    dma_fence_is_signaled(old)) {
>>  			dma_resv_list_set(fobj, i, fence, usage);
>>  			dma_fence_put(old);
>>
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index ebe78bd3d121..b3772edca2e6 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -498,6 +498,21 @@ static inline bool dma_fence_is_later(struct dma_fence *f1,
>>  	return __dma_fence_is_later(f1->seqno, f2->seqno, f1->ops);
>>  }
>>
>> +/**
>> + * dma_fence_is_later_or_same - return true if f1 is later or same as f2
>> + * @f1: the first fence from the same context
>> + * @f2: the second fence from the same context
>> + *
>> + * Returns true if f1 is chronologically later than f2 or the same fence. Both
>> + * fences must be from the same context, since a seqno is not re-used across
>> + * contexts.
>> + */
>> +static inline bool dma_fence_is_later_or_same(struct dma_fence *f1,
>> +					      struct dma_fence *f2)
>> +{
>> +	return f1 == f2 || dma_fence_is_later(f1, f2);
>> +}
>> +
>>  /**
>>   * dma_fence_later - return the chronologically later fence
>>   * @f1: the first fence from the same context
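The failure mode the patch fixes is easy to see with plain sequence numbers: re-adding the very same fence fails a strictly-later check, so the existing slot is not reused and an extra slot gets consumed. A minimal userspace model of the two predicates (plain structs with toy semantics, not the kernel's struct dma_fence):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a fence: a context id plus a sequence number. */
struct fence {
	uint64_t context;
	uint64_t seqno;
};

/* Model of dma_fence_is_later(): strictly later within one context. */
static bool fence_is_later(const struct fence *f1, const struct fence *f2)
{
	return f1->seqno > f2->seqno;
}

/*
 * Model of dma_fence_is_later_or_same(): also true when f1 and f2 are
 * literally the same fence, so re-adding an already-tracked fence can
 * take over its existing slot instead of claiming a new one.
 */
static bool fence_is_later_or_same(const struct fence *f1,
				   const struct fence *f2)
{
	return f1 == f2 || fence_is_later(f1, f2);
}
```

With the old check, `fence_is_later(&f, &f)` is false, so the replace-in-place branch of dma_resv_add_fence() was skipped for a fence that was already present.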
Re: [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
On 8/31/23 21:07, Danilo Krummrich wrote: On Thu, Aug 31, 2023 at 06:53:01PM +0200, Thomas Hellström (Intel) wrote: Hi, On 8/31/23 13:18, Danilo Krummrich wrote: On Thu, Aug 31, 2023 at 11:04:06AM +0200, Thomas Hellström (Intel) wrote: Hi! On 8/30/23 17:00, Danilo Krummrich wrote: On Wed, Aug 30, 2023 at 03:42:08PM +0200, Thomas Hellström (Intel) wrote: On 8/30/23 14:49, Danilo Krummrich wrote: Hi Thomas, thanks for having a look! On Wed, Aug 30, 2023 at 09:27:45AM +0200, Thomas Hellström (Intel) wrote: Hi, Danilo. Some quick comments since I'm doing some Xe work in this area. Will probably get back with more. On 8/20/23 23:53, Danilo Krummrich wrote: diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h index ed8d50200cc3..693e2da3f425 100644 --- a/include/drm/drm_gpuva_mgr.h +++ b/include/drm/drm_gpuva_mgr.h @@ -26,12 +26,16 @@ */ #include +#include +#include #include #include #include +#include struct drm_gpuva_manager; +struct drm_gpuva_gem; struct drm_gpuva_fn_ops; /** @@ -140,7 +144,7 @@ struct drm_gpuva { int drm_gpuva_insert(struct drm_gpuva_manager *mgr, struct drm_gpuva *va); void drm_gpuva_remove(struct drm_gpuva *va); -void drm_gpuva_link(struct drm_gpuva *va); +void drm_gpuva_link(struct drm_gpuva *va, struct drm_gpuva_gem *vm_bo); void drm_gpuva_unlink(struct drm_gpuva *va); struct drm_gpuva *drm_gpuva_find(struct drm_gpuva_manager *mgr, @@ -240,15 +244,137 @@ struct drm_gpuva_manager { * @ops: _gpuva_fn_ops providing the split/merge steps to drivers */ const struct drm_gpuva_fn_ops *ops; + + /** +* @d_obj: Dummy GEM object; used internally to pass the GPU VMs +* dma-resv to _exec. +*/ + struct drm_gem_object d_obj; + + /** +* @resv: the _resv for _gem_objects mapped in this GPU VA +* space +*/ + struct dma_resv *resv; + + /** +* @exec: the _exec helper to lock external _gem_objects +*/ + struct drm_exec exec; + + /** +* @mt_ext: _tree storing external _gem_objects +*/ + struct maple_tree mt_ext; Why are you using a maple tree here? 
Insertion and removal are O(log(n)) instead of O(1) for a list?

Having a list of drm_gem_objects directly wouldn't work, as multiple GPU-VMs could have mappings of the same extobj. I considered using the VM_BO abstraction (struct drm_gpuva_gem) as list entry instead, which also seems to be the obvious choice. However, there is a locking conflict. A drm_gem_object keeps a list of drm_gpuva_gems, while each drm_gpuva_gem keeps a list of drm_gpuvas. Both lists are either protected with the dma-resv lock of the corresponding drm_gem_object, or with an external lock provided by the driver (see drm_gem_gpuva_set_lock()). The latter is used by drivers performing changes on the GPUVA space directly from the fence signalling path. Now, similar to what drm_gpuva_link() and drm_gpuva_unlink() are doing already, we'd want to add a drm_gpuva_gem to the extobj list for the first mapping being linked and we'd want to remove it for the last one being unlinked. (Actually we'd want to add the drm_gpuva_gem object to the extobj list even before, because otherwise we'd not acquire the dma-resv lock of this GEM object through drm_gpuva_manager_lock(). But that's trivial, we could do that when we create the drm_gpuva_gem, which we need to do anyway.) Anyway, we'd probably want to keep removing the drm_gpuva_gem from the extobj list from drm_gpuva_unlink() when the last mapping of this BO is unlinked. In order to do so, we'd (as discussed above) either need to hold the outer GPU-VM lock or the GPU-VM's dma-resv lock. Both would be illegal in the case drm_gpuva_unlink() is called from within the fence signalling path. For drivers like XE or Nouveau, we'd at least need to make sure not to mess up the locking hierarchy of the GPU-VM lock and the dma-resv lock of the corresponding BO. Considering all that, I thought it's probably better to track extobjs separately from the drm_gpuva_gem, hence the maple tree choice.

Hm.
OK, in Xe we keep a list of the xe_vmas (drm_gpuvas) that point to external objects, or in the case of multiple mappings to the same gem object, only one of the drm_gpuvas is in the list. These are protected by the GPU-VM lock. I don't see a problem with removing those from the fence signalling path, though?

I intentionally tried to avoid keeping a list of drm_gpuvas to track extobjs, since this is generic code and I don't know how many mappings of an external object the corresponding driver potentially creates. This could become a pretty large list to iterate. Another reason was that I want to keep the drm_gpuva structure as small as possible, hence avoiding another list_head.

Yes, the list might be pretty large, but OTOH you never iterate to access a single list element. When you need to iterate
Re: [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
Hi, On 8/31/23 13:18, Danilo Krummrich wrote: On Thu, Aug 31, 2023 at 11:04:06AM +0200, Thomas Hellström (Intel) wrote: Hi! On 8/30/23 17:00, Danilo Krummrich wrote: On Wed, Aug 30, 2023 at 03:42:08PM +0200, Thomas Hellström (Intel) wrote: On 8/30/23 14:49, Danilo Krummrich wrote: Hi Thomas, thanks for having a look! On Wed, Aug 30, 2023 at 09:27:45AM +0200, Thomas Hellström (Intel) wrote: Hi, Danilo. Some quick comments since I'm doing some Xe work in this area. Will probably get back with more. On 8/20/23 23:53, Danilo Krummrich wrote: diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h index ed8d50200cc3..693e2da3f425 100644 --- a/include/drm/drm_gpuva_mgr.h +++ b/include/drm/drm_gpuva_mgr.h @@ -26,12 +26,16 @@ */ #include +#include +#include #include #include #include +#include struct drm_gpuva_manager; +struct drm_gpuva_gem; struct drm_gpuva_fn_ops; /** @@ -140,7 +144,7 @@ struct drm_gpuva { int drm_gpuva_insert(struct drm_gpuva_manager *mgr, struct drm_gpuva *va); void drm_gpuva_remove(struct drm_gpuva *va); -void drm_gpuva_link(struct drm_gpuva *va); +void drm_gpuva_link(struct drm_gpuva *va, struct drm_gpuva_gem *vm_bo); void drm_gpuva_unlink(struct drm_gpuva *va); struct drm_gpuva *drm_gpuva_find(struct drm_gpuva_manager *mgr, @@ -240,15 +244,137 @@ struct drm_gpuva_manager { * @ops: _gpuva_fn_ops providing the split/merge steps to drivers */ const struct drm_gpuva_fn_ops *ops; + + /** +* @d_obj: Dummy GEM object; used internally to pass the GPU VMs +* dma-resv to _exec. +*/ + struct drm_gem_object d_obj; + + /** +* @resv: the _resv for _gem_objects mapped in this GPU VA +* space +*/ + struct dma_resv *resv; + + /** +* @exec: the _exec helper to lock external _gem_objects +*/ + struct drm_exec exec; + + /** +* @mt_ext: _tree storing external _gem_objects +*/ + struct maple_tree mt_ext; Why are you using a maple tree here? Insertion and removal is O(log(n)) instead of O(1) for a list? 
Having a list of drm_gem_objects directly wouldn't work, as multiple GPU-VMs could have mappings of the same extobj. I considered using the VM_BO abstraction (struct drm_gpuva_gem) as list entry instead, which also seems to be the obvious choice. However, there is a locking conflict. A drm_gem_object keeps a list of drm_gpuva_gems, while each drm_gpuva_gem keeps a list of drm_gpuvas. Both lists are either protected with the dma-resv lock of the corresponding drm_gem_object, or with an external lock provided by the driver (see drm_gem_gpuva_set_lock()). The latter is used by drivers performing changes on the GPUVA space directly from the fence signalling path. Now, similar to what drm_gpuva_link() and drm_gpuva_unlink() are doing already, we'd want to add a drm_gpuva_gem to the extobj list for the first mapping being linked and we'd want to remove it for the last one being unlinked. (Actually we'd want to add the drm_gpuva_gem object to the extobj list even before, because otherwise we'd not acquire the dma-resv lock of this GEM object through drm_gpuva_manager_lock(). But that's trivial, we could do that when we create the drm_gpuva_gem, which we need to do anyway.) Anyway, we'd probably want to keep removing the drm_gpuva_gem from the extobj list from drm_gpuva_unlink() when the last mapping of this BO is unlinked. In order to do so, we'd (as discussed above) either need to hold the outer GPU-VM lock or the GPU-VM's dma-resv lock. Both would be illegal in the case drm_gpuva_unlink() is called from within the fence signalling path. For drivers like XE or Nouveau, we'd at least need to make sure not to mess up the locking hierarchy of the GPU-VM lock and the dma-resv lock of the corresponding BO. Considering all that, I thought it's probably better to track extobjs separately from the drm_gpuva_gem, hence the maple tree choice.

Hm.
OK, in Xe we keep a list of the xe_vmas (drm_gpuvas) that point to external objects, or in the case of multiple mappings to the same gem object, only one of the drm_gpuvas is in the list. These are protected by the GPU-VM lock. I don't see a problem with removing those from the fence signalling path, though?

I intentionally tried to avoid keeping a list of drm_gpuvas to track extobjs, since this is generic code and I don't know how many mappings of an external object the corresponding driver potentially creates. This could become a pretty large list to iterate. Another reason was that I want to keep the drm_gpuva structure as small as possible, hence avoiding another list_head.

Yes, the list might be pretty large, but OTOH you never iterate to access a single list element. When you need to iterate the whole list you need to do that regardless of the data structure used. As for the list head, it might perhaps be aliased (union
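The add-on-first-link / remove-on-last-unlink scheme debated above can be modelled with a per-BO mapping count on the vm_bo: the vm_bo joins the VM's extobj list when its first mapping is linked and leaves it when the last one is unlinked. A simplified userspace sketch (hypothetical names, and deliberately ignoring the dma-resv vs. outer-lock conflict that motivated the maple tree):

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list, standing in for the kernel's list_head. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next; n->prev = h;
	h->next->prev = n; h->next = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next; n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Hypothetical vm_bo: one per (VM, BO) pair, counting mappings of the BO. */
struct vm_bo {
	struct list_head extobj_entry; /* entry in the VM's extobj list */
	unsigned int mappings;         /* live mappings through this VM */
};

struct vm { struct list_head extobj_list; };

/* Analogue of drm_gpuva_link(): the first mapping puts the BO on the list. */
static void va_link(struct vm *vm, struct vm_bo *vbo)
{
	if (vbo->mappings++ == 0)
		list_add(&vbo->extobj_entry, &vm->extobj_list);
}

/* Analogue of drm_gpuva_unlink(): the last mapping takes it off again. */
static void va_unlink(struct vm_bo *vbo)
{
	if (--vbo->mappings == 0)
		list_del(&vbo->extobj_entry);
}
```

This keeps extobj insertion and removal O(1), which is exactly the property the list-vs-maple-tree discussion is weighing against the locking constraints of calling va_unlink() from the fence signalling path.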
Re: [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
Hi! On 8/30/23 17:00, Danilo Krummrich wrote: On Wed, Aug 30, 2023 at 03:42:08PM +0200, Thomas Hellström (Intel) wrote: On 8/30/23 14:49, Danilo Krummrich wrote: Hi Thomas, thanks for having a look! On Wed, Aug 30, 2023 at 09:27:45AM +0200, Thomas Hellström (Intel) wrote: Hi, Danilo. Some quick comments since I'm doing some Xe work in this area. Will probably get back with more. On 8/20/23 23:53, Danilo Krummrich wrote: So far the DRM GPUVA manager offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVA manager represent a basic GPU-VM implementation. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convinience functions for common patterns. Rather than being designed as a "framework", the target is to make all features appear as a collection of optional helper functions, such that drivers are free to make use of the DRM GPUVA managers basic functionality and opt-in for other features without setting any feature flags, just by making use of the corresponding functions. 
Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuva_mgr.c | 688 +++- include/drm/drm_gem.h | 48 ++- include/drm/drm_gpuva_mgr.h | 302 +- 3 files changed, 1010 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c index f86bfad74ff8..69872b205961 100644 --- a/drivers/gpu/drm/drm_gpuva_mgr.c +++ b/drivers/gpu/drm/drm_gpuva_mgr.c @@ -655,6 +655,7 @@ drm_gpuva_range_valid(struct drm_gpuva_manager *mgr, /** * drm_gpuva_manager_init() - initialize a _gpuva_manager * @mgr: pointer to the _gpuva_manager to initialize + * @drm: the drivers _device * @name: the name of the GPU VA space * @start_offset: the start offset of the GPU VA space * @range: the size of the GPU VA space @@ -669,6 +670,7 @@ drm_gpuva_range_valid(struct drm_gpuva_manager *mgr, */ void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, + struct drm_device *drm, const char *name, u64 start_offset, u64 range, u64 reserve_offset, u64 reserve_range, @@ -677,6 +679,11 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, mgr->rb.tree = RB_ROOT_CACHED; INIT_LIST_HEAD(>rb.list); + mt_init(>mt_ext); + + INIT_LIST_HEAD(>evict.list); + spin_lock_init(>evict.lock); + drm_gpuva_check_overflow(start_offset, range); mgr->mm_start = start_offset; mgr->mm_range = range; @@ -694,6 +701,9 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, reserve_range))) __drm_gpuva_insert(mgr, >kernel_alloc_node); } + + drm_gem_private_object_init(drm, >d_obj, 0); + mgr->resv = mgr->d_obj.resv; } EXPORT_SYMBOL_GPL(drm_gpuva_manager_init); @@ -713,10 +723,575 @@ drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr) __drm_gpuva_remove(>kernel_alloc_node); WARN(!RB_EMPTY_ROOT(>rb.tree.rb_root), -"GPUVA tree is not empty, potentially leaking memory."); +"GPUVA tree is not empty, potentially leaking memory.\n"); + + mtree_destroy(>mt_ext); + WARN(!list_empty(>evict.list), "Evict list should be empty.\n"); + + drm_gem_private_object_fini(>d_obj); } 
EXPORT_SYMBOL_GPL(drm_gpuva_manager_destroy); +/** + * drm_gpuva_manager_prepare_objects() - prepare all assoiciated BOs + * @mgr: the _gpuva_manager + * @num_fences: the amount of _fences to reserve + * + * Calls drm_exec_prepare_obj() for all _gem_objects the given + * _gpuva_manager contains mappings of. + * + * Drivers can obtain the corresponding _exec instance through + * DRM_GPUVA_EXEC(). It is the drivers responsibility to call drm_exec_init() + * and drm_exec_fini() accordingly. + * + * Returns: 0 on success, negative error code on failure. + */ +int +drm_gpuva_manager_prepare_objects(struct drm_gpuva_manager *mgr, + unsigned int num_fences) +{ + struct d
Re: [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
On 8/30/23 14:49, Danilo Krummrich wrote: Hi Thomas, thanks for having a look! On Wed, Aug 30, 2023 at 09:27:45AM +0200, Thomas Hellström (Intel) wrote: Hi, Danilo. Some quick comments since I'm doing some Xe work in this area. Will probably get back with more. On 8/20/23 23:53, Danilo Krummrich wrote: So far the DRM GPUVA manager offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVA manager represent a basic GPU-VM implementation. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convinience functions for common patterns. Rather than being designed as a "framework", the target is to make all features appear as a collection of optional helper functions, such that drivers are free to make use of the DRM GPUVA managers basic functionality and opt-in for other features without setting any feature flags, just by making use of the corresponding functions. 
Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuva_mgr.c | 688 +++- include/drm/drm_gem.h | 48 ++- include/drm/drm_gpuva_mgr.h | 302 +- 3 files changed, 1010 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c index f86bfad74ff8..69872b205961 100644 --- a/drivers/gpu/drm/drm_gpuva_mgr.c +++ b/drivers/gpu/drm/drm_gpuva_mgr.c @@ -655,6 +655,7 @@ drm_gpuva_range_valid(struct drm_gpuva_manager *mgr, /** * drm_gpuva_manager_init() - initialize a _gpuva_manager * @mgr: pointer to the _gpuva_manager to initialize + * @drm: the drivers _device * @name: the name of the GPU VA space * @start_offset: the start offset of the GPU VA space * @range: the size of the GPU VA space @@ -669,6 +670,7 @@ drm_gpuva_range_valid(struct drm_gpuva_manager *mgr, */ void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, + struct drm_device *drm, const char *name, u64 start_offset, u64 range, u64 reserve_offset, u64 reserve_range, @@ -677,6 +679,11 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, mgr->rb.tree = RB_ROOT_CACHED; INIT_LIST_HEAD(>rb.list); + mt_init(>mt_ext); + + INIT_LIST_HEAD(>evict.list); + spin_lock_init(>evict.lock); + drm_gpuva_check_overflow(start_offset, range); mgr->mm_start = start_offset; mgr->mm_range = range; @@ -694,6 +701,9 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, reserve_range))) __drm_gpuva_insert(mgr, >kernel_alloc_node); } + + drm_gem_private_object_init(drm, >d_obj, 0); + mgr->resv = mgr->d_obj.resv; } EXPORT_SYMBOL_GPL(drm_gpuva_manager_init); @@ -713,10 +723,575 @@ drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr) __drm_gpuva_remove(>kernel_alloc_node); WARN(!RB_EMPTY_ROOT(>rb.tree.rb_root), -"GPUVA tree is not empty, potentially leaking memory."); +"GPUVA tree is not empty, potentially leaking memory.\n"); + + mtree_destroy(>mt_ext); + WARN(!list_empty(>evict.list), "Evict list should be empty.\n"); + + drm_gem_private_object_fini(>d_obj); } 
EXPORT_SYMBOL_GPL(drm_gpuva_manager_destroy); +/** + * drm_gpuva_manager_prepare_objects() - prepare all assoiciated BOs + * @mgr: the _gpuva_manager + * @num_fences: the amount of _fences to reserve + * + * Calls drm_exec_prepare_obj() for all _gem_objects the given + * _gpuva_manager contains mappings of. + * + * Drivers can obtain the corresponding _exec instance through + * DRM_GPUVA_EXEC(). It is the drivers responsibility to call drm_exec_init() + * and drm_exec_fini() accordingly. + * + * Returns: 0 on success, negative error code on failure. + */ +int +drm_gpuva_manager_prepare_objects(struct drm_gpuva_manager *mgr, + unsigned int num_fences) +{ + struct drm_exec *exec = DRM_GPUVA_EXEC(mgr); + MA_STATE(mas, >mt_ext, 0, 0); + union { + void *ptr; + uintptr_t cnt; +
Re: [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
Hi, Danilo. Some quick comments since I'm doing some Xe work in this area. Will probably get back with more. On 8/20/23 23:53, Danilo Krummrich wrote: So far the DRM GPUVA manager offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVA manager represent a basic GPU-VM implementation. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convinience functions for common patterns. Rather than being designed as a "framework", the target is to make all features appear as a collection of optional helper functions, such that drivers are free to make use of the DRM GPUVA managers basic functionality and opt-in for other features without setting any feature flags, just by making use of the corresponding functions. 
Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuva_mgr.c | 688 +++- include/drm/drm_gem.h | 48 ++- include/drm/drm_gpuva_mgr.h | 302 +- 3 files changed, 1010 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c index f86bfad74ff8..69872b205961 100644 --- a/drivers/gpu/drm/drm_gpuva_mgr.c +++ b/drivers/gpu/drm/drm_gpuva_mgr.c @@ -655,6 +655,7 @@ drm_gpuva_range_valid(struct drm_gpuva_manager *mgr, /** * drm_gpuva_manager_init() - initialize a _gpuva_manager * @mgr: pointer to the _gpuva_manager to initialize + * @drm: the drivers _device * @name: the name of the GPU VA space * @start_offset: the start offset of the GPU VA space * @range: the size of the GPU VA space @@ -669,6 +670,7 @@ drm_gpuva_range_valid(struct drm_gpuva_manager *mgr, */ void drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, + struct drm_device *drm, const char *name, u64 start_offset, u64 range, u64 reserve_offset, u64 reserve_range, @@ -677,6 +679,11 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, mgr->rb.tree = RB_ROOT_CACHED; INIT_LIST_HEAD(>rb.list); + mt_init(>mt_ext); + + INIT_LIST_HEAD(>evict.list); + spin_lock_init(>evict.lock); + drm_gpuva_check_overflow(start_offset, range); mgr->mm_start = start_offset; mgr->mm_range = range; @@ -694,6 +701,9 @@ drm_gpuva_manager_init(struct drm_gpuva_manager *mgr, reserve_range))) __drm_gpuva_insert(mgr, >kernel_alloc_node); } + + drm_gem_private_object_init(drm, >d_obj, 0); + mgr->resv = mgr->d_obj.resv; } EXPORT_SYMBOL_GPL(drm_gpuva_manager_init); @@ -713,10 +723,575 @@ drm_gpuva_manager_destroy(struct drm_gpuva_manager *mgr) __drm_gpuva_remove(>kernel_alloc_node); WARN(!RB_EMPTY_ROOT(>rb.tree.rb_root), -"GPUVA tree is not empty, potentially leaking memory."); +"GPUVA tree is not empty, potentially leaking memory.\n"); + + mtree_destroy(>mt_ext); + WARN(!list_empty(>evict.list), "Evict list should be empty.\n"); + + drm_gem_private_object_fini(>d_obj); } 
EXPORT_SYMBOL_GPL(drm_gpuva_manager_destroy); +/** + * drm_gpuva_manager_prepare_objects() - prepare all assoiciated BOs + * @mgr: the _gpuva_manager + * @num_fences: the amount of _fences to reserve + * + * Calls drm_exec_prepare_obj() for all _gem_objects the given + * _gpuva_manager contains mappings of. + * + * Drivers can obtain the corresponding _exec instance through + * DRM_GPUVA_EXEC(). It is the drivers responsibility to call drm_exec_init() + * and drm_exec_fini() accordingly. + * + * Returns: 0 on success, negative error code on failure. + */ +int +drm_gpuva_manager_prepare_objects(struct drm_gpuva_manager *mgr, + unsigned int num_fences) +{ + struct drm_exec *exec = DRM_GPUVA_EXEC(mgr); + MA_STATE(mas, >mt_ext, 0, 0); + union { + void *ptr; + uintptr_t cnt; + } ref; + int ret; + + ret = drm_exec_prepare_obj(exec, >d_obj, num_fences); + if (ret) + goto out; + + rcu_read_lock(); In xe we're protecting the external object list with an outer lock, (same as protecting the mgr itself).
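The `union { void *ptr; uintptr_t cnt; } ref;` in the quoted drm_gpuva_manager_prepare_objects() hints at how the maple tree is used: the value slot keyed by the GEM object holds not a pointer but a reference count, type-punned through uintptr_t, which avoids a separate allocation per external object just to hold a counter. A standalone sketch of that trick (illustrative only; the real code also has to respect entry encodings the tree treats specially, which this ignores):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Store a small counter in a slot typed as void *, as the quoted code
 * does with maple-tree entries: the count travels through a union of
 * void * and uintptr_t.
 */
union slot_ref {
	void *ptr;      /* what the tree API stores and hands back */
	uintptr_t cnt;  /* what the slot actually encodes */
};

/* Take a reference: bump the count hidden in the slot value. */
static void *slot_get(void *slot)
{
	union slot_ref ref = { .ptr = slot };

	ref.cnt++;
	return ref.ptr;
}

/* Drop a reference; the caller would remove the entry once it hits 0. */
static void *slot_put(void *slot)
{
	union slot_ref ref = { .ptr = slot };

	assert(ref.cnt > 0);
	ref.cnt--;
	return ref.ptr;
}

/* Read the count back out of the slot value. */
static uintptr_t slot_count(void *slot)
{
	union slot_ref ref = { .ptr = slot };

	return ref.cnt;
}
```

A NULL slot naturally encodes "count 0 / not tracked", so a lookup miss and a zero count coincide.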
Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings
On 7/6/23 17:48, Danilo Krummrich wrote: Hi Thomas, On 7/6/23 10:49, Thomas Hellström (Intel) wrote: Hi, Danilo Some review comments below: On 6/30/23 00:25, Danilo Krummrich wrote: Add infrastructure to keep track of GPU virtual address (VA) mappings with a decicated VA space manager implementation. New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers start implementing, allow userspace applications to request multiple and arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is intended to serve the following purposes in this context. 1) Provide infrastructure to track GPU VA allocations and mappings, making use of the maple_tree. It looks like we're not using the maple tree anymore, but rather an instantiation of an interval tree. (Perhaps as a follow-up it makes sense to provide a pre-instantiated common u64 version of the interval tree in addition to the unsigned long one since it appears to be used in multiple places in graphics drivers). 2) Generically connect GPU VA mappings to their backing buffers, in particular DRM GEM objects. 3) Provide a common implementation to perform more complex mapping operations on the GPU VA space. In particular splitting and merging of GPU VA mappings, e.g. for intersecting mapping requests or partial unmap requests. 
Tested-by: Donald Robson Reviewed-by: Boris Brezillon Suggested-by: Dave Airlie Signed-off-by: Danilo Krummrich --- Documentation/gpu/drm-mm.rst | 36 + drivers/gpu/drm/Makefile | 1 + drivers/gpu/drm/drm_gem.c | 3 + drivers/gpu/drm/drm_gpuva_mgr.c | 1743 +++ include/drm/drm_drv.h | 6 + include/drm/drm_gem.h | 52 + include/drm/drm_gpuva_mgr.h | 756 ++ 7 files changed, 2597 insertions(+) create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c create mode 100644 include/drm/drm_gpuva_mgr.h diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index a52e6f4117d6..3d5dc9dc1bfe 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -466,6 +466,42 @@ DRM MM Range Allocator Function References .. kernel-doc:: drivers/gpu/drm/drm_mm.c :export: +DRM GPU VA Manager +== + +Overview + + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Overview + +Split and Merge +--- + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Split and Merge + +Locking +--- + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Locking + +Examples + + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Examples + +DRM GPU VA Manager Function References +-- + +.. kernel-doc:: include/drm/drm_gpuva_mgr.h + :internal: + +.. 
kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :export: + DRM Buddy Allocator === diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index 414855e2a463..6d6c9dec66e8 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -45,6 +45,7 @@ drm-y := \ drm_vblank.o \ drm_vblank_work.o \ drm_vma_manager.o \ + drm_gpuva_mgr.o \ drm_writeback.o drm-$(CONFIG_DRM_LEGACY) += \ drm_agpsupport.o \ diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index 1a5a2cd0d4ec..cd878ebddbd0 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev, if (!obj->resv) obj->resv = >_resv; + if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA)) + drm_gem_gpuva_init(obj); + drm_vma_node_reset(>vma_node); INIT_LIST_HEAD(>lru_node); } diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c new file mode 100644 index ..4414990c05cc --- /dev/null +++ b/drivers/gpu/drm/drm_gpuva_mgr.c @@ -0,0 +1,1743 @@ +// SPDX-License-Identifier: GPL-2.0 +/* SPDX-License is GPL-2.0 but below looks like MIT. Can we use "GPL-2.0 OR MIT" or does something restrict it to GPL-only? + * Copyright (c) 2022 Red Hat. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANT
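The "Split and Merge" machinery the documentation sections above refer to is, at its core, interval arithmetic: a request that intersects an existing mapping leaves at most a head remainder and a tail remainder of the old mapping. A minimal sketch of just the split step over u64 ranges (this is not the drm_gpuva_sm_map() API, only the arithmetic behind it):

```c
#include <assert.h>
#include <stdint.h>

struct va_range { uint64_t addr, range; };

/*
 * Split an existing mapping against an incoming request that overlaps
 * it. At most a head piece (before the request) and a tail piece
 * (after it) survive; returns how many pieces were produced.
 */
static int va_split(const struct va_range *old, const struct va_range *req,
		    struct va_range out[2])
{
	uint64_t old_end = old->addr + old->range;
	uint64_t req_end = req->addr + req->range;
	int n = 0;

	if (req->addr > old->addr) {	/* head remainder */
		out[n].addr = old->addr;
		out[n].range = req->addr - old->addr;
		n++;
	}
	if (req_end < old_end) {	/* tail remainder */
		out[n].addr = req_end;
		out[n].range = old_end - req_end;
		n++;
	}
	return n;
}
```

A request punching a hole in the middle yields two pieces (the "split" case), a request covering the whole mapping yields none (plain unmap/replace); the manager's job is then to turn these pieces into remap operations it hands to the driver.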
Re: [PATCH drm-next v6 02/13] drm: manager to keep track of GPUs VA mappings
Hi, Danilo Some review comments below: On 6/30/23 00:25, Danilo Krummrich wrote: Add infrastructure to keep track of GPU virtual address (VA) mappings with a decicated VA space manager implementation. New UAPIs, motivated by Vulkan sparse memory bindings graphics drivers start implementing, allow userspace applications to request multiple and arbitrary GPU VA mappings of buffer objects. The DRM GPU VA manager is intended to serve the following purposes in this context. 1) Provide infrastructure to track GPU VA allocations and mappings, making use of the maple_tree. It looks like we're not using the maple tree anymore, but rather an instantiation of an interval tree. (Perhaps as a follow-up it makes sense to provide a pre-instantiated common u64 version of the interval tree in addition to the unsigned long one since it appears to be used in multiple places in graphics drivers). 2) Generically connect GPU VA mappings to their backing buffers, in particular DRM GEM objects. 3) Provide a common implementation to perform more complex mapping operations on the GPU VA space. In particular splitting and merging of GPU VA mappings, e.g. for intersecting mapping requests or partial unmap requests. Tested-by: Donald Robson Reviewed-by: Boris Brezillon Suggested-by: Dave Airlie Signed-off-by: Danilo Krummrich --- Documentation/gpu/drm-mm.rst| 36 + drivers/gpu/drm/Makefile|1 + drivers/gpu/drm/drm_gem.c |3 + drivers/gpu/drm/drm_gpuva_mgr.c | 1743 +++ include/drm/drm_drv.h |6 + include/drm/drm_gem.h | 52 + include/drm/drm_gpuva_mgr.h | 756 ++ 7 files changed, 2597 insertions(+) create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c create mode 100644 include/drm/drm_gpuva_mgr.h diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index a52e6f4117d6..3d5dc9dc1bfe 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -466,6 +466,42 @@ DRM MM Range Allocator Function References .. 
kernel-doc:: drivers/gpu/drm/drm_mm.c :export: +DRM GPU VA Manager +== + +Overview + + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Overview + +Split and Merge +--- + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Split and Merge + +Locking +--- + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Locking + +Examples + + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :doc: Examples + +DRM GPU VA Manager Function References +-- + +.. kernel-doc:: include/drm/drm_gpuva_mgr.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/drm_gpuva_mgr.c + :export: + DRM Buddy Allocator === diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index 414855e2a463..6d6c9dec66e8 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -45,6 +45,7 @@ drm-y := \ drm_vblank.o \ drm_vblank_work.o \ drm_vma_manager.o \ + drm_gpuva_mgr.o \ drm_writeback.o drm-$(CONFIG_DRM_LEGACY) += \ drm_agpsupport.o \ diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index 1a5a2cd0d4ec..cd878ebddbd0 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -164,6 +164,9 @@ void drm_gem_private_object_init(struct drm_device *dev, if (!obj->resv) obj->resv = &obj->_resv; + if (drm_core_check_feature(dev, DRIVER_GEM_GPUVA)) + drm_gem_gpuva_init(obj); + drm_vma_node_reset(&obj->vma_node); INIT_LIST_HEAD(&obj->lru_node); } diff --git a/drivers/gpu/drm/drm_gpuva_mgr.c b/drivers/gpu/drm/drm_gpuva_mgr.c new file mode 100644 index ..4414990c05cc --- /dev/null +++ b/drivers/gpu/drm/drm_gpuva_mgr.c @@ -0,0 +1,1743 @@ +// SPDX-License-Identifier: GPL-2.0 +/* SPDX-License is GPL-2.0 but below looks like MIT. Can we use "GPL-2.0 OR MIT" or does something restrict it to GPL-only? + * Copyright (c) 2022 Red Hat.
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE
Re: [PATCH v2] drm/i915: Replace kmap() with kmap_local_page()
On 6/21/23 18:35, Ira Weiny wrote: Thomas Hellström (Intel) wrote: I think one thing worth mentioning in the context of this patch is that IIRC kmap_local_page() will block offlining of the mapping CPU until kunmap_local(), so while I haven't seen any guidelines around the usage of this api for long-held mappings, I figure it's wise to keep the mapping duration short, or at least avoid sleeping with a kmap_local_page() map active. I figured, while page compression is probably to be considered "slow" it's probably not slow enough to motivate kmap() instead of kmap_local_page(), but if anyone feels differently, perhaps it should be considered. What you say is all true. But remember the mappings are only actually created on a HIGHMEM system. HIGHMEM systems are increasingly rare. Also they must suffer such performance issues because there is just no other way around supporting them. Also Sumitra, and our kmap conversion project in general, is focusing on not using kmap* if at all possible. Thus the reason V1 tried to use page_address(). Could we guarantee the i915 driver is excluded from all HIGHMEM systems? The i915 maintainers might want to chime in here, but I would say no, we can't, although we don't care much about optimizing for them. Same for the new xe driver. Thanks, /Thomas With that said, my Reviewed-by: still stands. Thanks! Ira
Re: [PATCH 1/2] drm: execution context for GEM buffers v5
On 6/21/23 15:36, Christian König wrote: This adds the infrastructure for an execution context for GEM buffers which is similar to the existing TTMs execbuf util and intended to replace it in the long term. The basic functionality is that we abstract the necessary loop to lock many different GEM buffers with automated deadlock and duplicate handling. v2: drop xarray and use dynamic resized array instead, the locking overhead is unnecessary and measurable. v3: drop duplicate tracking, radeon is really the only one needing that. v4: fixes issues pointed out by Danilo, some typos in comments and a helper for lock arrays of GEM objects. v5: some suggestions by Boris Brezillon, especially just use one retry macro, drop loop in prepare_array, use flags instead of bool Signed-off-by: Christian König --- Documentation/gpu/drm-mm.rst | 12 ++ drivers/gpu/drm/Kconfig | 6 + drivers/gpu/drm/Makefile | 2 + drivers/gpu/drm/drm_exec.c | 330 +++ include/drm/drm_exec.h | 120 + 5 files changed, 470 insertions(+) create mode 100644 drivers/gpu/drm/drm_exec.c create mode 100644 include/drm/drm_exec.h diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index a79fd3549ff8..a52e6f4117d6 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -493,6 +493,18 @@ DRM Sync Objects .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c :export: +DRM Execution context += + +.. kernel-doc:: drivers/gpu/drm/drm_exec.c + :doc: Overview + +.. kernel-doc:: include/drm/drm_exec.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/drm_exec.c + :export: + GPU Scheduler = diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index afb3b2f5f425..c2f3d234c89e 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -194,6 +194,12 @@ config DRM_TTM GPU memory types. Will be enabled automatically if a device driver uses it.
+config DRM_EXEC + tristate + depends on DRM + help + Execution context for command submissions + config DRM_BUDDY tristate depends on DRM diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index 7a09a89b493b..414855e2a463 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -78,6 +78,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o # # Memory-management helpers # +# +obj-$(CONFIG_DRM_EXEC) += drm_exec.o obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c new file mode 100644 index ..285bf80740b5 --- /dev/null +++ b/drivers/gpu/drm/drm_exec.c @@ -0,0 +1,330 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ + +#include <drm/drm_exec.h> +#include <drm/drm_gem.h> +#include <linux/dma-resv.h> + +/** + * DOC: Overview + * + * This component mainly abstracts the retry loop necessary for locking + * multiple GEM objects while preparing hardware operations (e.g. command + * submissions, page table updates etc..). + * + * If a contention is detected while locking a GEM object the cleanup procedure + * unlocks all previously locked GEM objects and locks the contended one first + * before locking any further objects. + * + * After an object is locked fence slots can optionally be reserved on the + * dma_resv object inside the GEM object. + * + * A typical usage pattern should look like this:: + * + * struct drm_gem_object *obj; + * struct drm_exec exec; + * unsigned long index; + * int ret; + * + * drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT); + * drm_exec_until_all_locked(&exec) { + * ret = drm_exec_prepare_obj(&exec, boA, 1); + * drm_exec_retry_on_contention(&exec); + * if (ret) + * goto error; + * + * ret = drm_exec_prepare_obj(&exec, boB, 1); + * drm_exec_retry_on_contention(&exec); + * if (ret) + * goto error; + * } + * + * drm_exec_for_each_locked_object(&exec, index, obj) { + * dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ); + * ... + * } + * drm_exec_fini(&exec); + * + * See struct drm_exec for more details.
+ */ + +/* Dummy value used to initially enter the retry loop */ +#define DRM_EXEC_DUMMY (void*)~0 + +/* Unlock all objects and drop references */ +static void drm_exec_unlock_all(struct drm_exec *exec) +{ + struct drm_gem_object *obj; + unsigned long index; + + drm_exec_for_each_locked_object(exec, index, obj) { + dma_resv_unlock(obj->resv); + drm_gem_object_put(obj); + } + + drm_gem_object_put(exec->prelocked); + exec->prelocked = NULL; +} + +/** + * drm_exec_init - initialize a drm_exec object + * @exec: the drm_exec object to initialize + * @flags: controls locking behavior, see DRM_EXEC_* defines + * + * Initialize the object and
Re: [PATCH v2] drm/i915: Replace kmap() with kmap_local_page()
On 6/20/23 20:07, Sumitra Sharma wrote: On Tue, Jun 20, 2023 at 06:23:38AM -0700, Ira Weiny wrote: Sumitra Sharma wrote: On Sun, Jun 18, 2023 at 11:11:08AM -0700, Ira Weiny wrote: Sumitra Sharma wrote: kmap() has been deprecated in favor of the kmap_local_page() due to high cost, restricted mapping space, the overhead of a global lock for synchronization, and making the process sleep in the absence of free slots. kmap_local_page() is faster than kmap() and offers thread-local and CPU-local mappings, take pagefaults in a local kmap region and preserves preemption by saving the mappings of outgoing tasks and restoring those of the incoming one during a context switch. The mapping is kept thread local in the function “i915_vma_coredump_create” in i915_gpu_error.c Therefore, replace kmap() with kmap_local_page(). Suggested-by: Ira Weiny NIT: No need for the line break between Suggested-by and your signed off line. Hi Ira, What does NIT stand for? Shorthand for 'nitpicking'. "giving too much attention to details that are not important, especially as a way of criticizing: " - https://dictionary.cambridge.org/dictionary/english/nitpicking Via email this is a way for authors of an email to indicate something is technically wrong but while nicely acknowledging that it is not very significant and could be seen as overly critical. For this particular comment I'm showing something to pay attention to next time but that was not a big deal this time around. Hi Ira, Thanks for your explanation on NIT. Thank you. I will take care of the line breaks. Signed-off-by: Sumitra Sharma --- Changes in v2: - Replace kmap() with kmap_local_page(). Generally it is customary to attribute a change like this to those who suggested it in a V1 review. For example: - Tvrtko/Thomas: Use kmap_local_page() instead of page_address() Also I don't see Thomas on the new email list. Since he took the time to review V1 he might want to check this version out. I've added him to the 'To:' list.
Also a link to V1 is nice. B4 formats it like this: - Link to v1: https://lore.kernel.org/all/20230614123556.ga381...@sumitra.com/ All that said the code looks good to me. So with the above changes. Reviewed-by: Ira Weiny I have noted down the points mentioned above. Thank you again. I am not supposed to create another version of this patch for adding the above mentions, as you and Thomas both gave this patch a reviewed-by tag. Right? Based on this response[*] from Tvrtko I think this version can move through without a v3. Okay! Thanks & regards Sumitra I think one thing worth mentioning in the context of this patch is that IIRC kmap_local_page() will block offlining of the mapping CPU until kunmap_local(), so while I haven't seen any guidelines around the usage of this api for long-held mappings, I figure it's wise to keep the mapping duration short, or at least avoid sleeping with a kmap_local_page() map active. I figured, while page compression is probably to be considered "slow" it's probably not slow enough to motivate kmap() instead of kmap_local_page(), but if anyone feels differently, perhaps it should be considered. With that said, my Reviewed-by: still stands. /Thomas Thanks! Ira [*] https://lore.kernel.org/all/bcb0a1d2-cd4d-a56f-1ee6-7ccfdd2f7...@linux.intel.com/ Thanks all! I'll just re-send the patch for our CI, since it didn't get picked up automatically (stuck in moderation perhaps), with all r-b tags added and extra line space removed and merge it if results will be green. Regards, Tvrtko Thanks & regards Sumitra PS: I am new to the open source vocabulary terms. - Change commit subject and message. 
drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index f020c0086fbc..bc41500eedf5 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1164,9 +1164,9 @@ i915_vma_coredump_create(const struct intel_gt *gt, drm_clflush_pages(&page, 1); - s = kmap(page); + s = kmap_local_page(page); ret = compress_page(compress, s, dst, false); - kunmap(page); + kunmap_local(s); drm_clflush_pages(&page, 1); -- 2.25.1
Re: [PATCH 01/13] drm: execution context for GEM buffers v4
On 6/19/23 11:48, Christian König wrote: Hi, Am 19.06.23 um 11:33 schrieb Thomas Hellström (Intel): [SNIP] Sometimes you want to just drop the contended lock after the above relaxation. (Eviction would be one example), and not add as prelocked, if the contended object goes out of scope. Eviction would in some situations be one such example, -EDEADLOCK leading to an error path where the object should otherwise be freed is another. Perhaps we could add an argument to prepare_obj() as to whether the object should be immediately put after relaxation. I was considering a try_prepare version as well, that should cover this use case. That sounds a bit different from this use-case. The use-case above would, on -EDEADLOCK actually unlock everything, then lock-slow the contending lock and then immediately unlock it and drop. Hui? What would that be good for? It's for the case where you have nested locking, the contended lock has gone out-of-scope and you're probably not going to need it on the next attempt. I think the refcounted "prelocked" object that is lacking in the i915 variant will resolve all correctness / uaf issues, but still the object might be needlessly carried around for yet another locking round. It sounds like try_prepare would just skip locking and continue with everything locked so far still locked? Correct. + ret = drm_exec_obj_locked(exec, obj); + if (unlikely(ret)) { + dma_resv_unlock(obj->resv); + goto error_dropref; + } + + swap(exec->prelocked, obj); + +error_dropref: + /* Always cleanup the contention so that error handling can kick in */ + drm_gem_object_put(obj); + exec->contended = NULL; + return ret; +} + +/** + * drm_exec_prepare_obj - prepare a GEM object for use + * @exec: the drm_exec object with the state + * @obj: the GEM object to prepare + * @num_fences: how many fences to reserve + * + * Prepare a GEM object for use by locking it and reserving fence slots. All + * successfully locked objects are put into the locked container. 
+ * + * Returns: -EDEADLK if a contention is detected, -EALREADY when object is + * already locked, -ENOMEM when memory allocation failed and zero for success. + */ +int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj, + unsigned int num_fences) +{ + int ret; + + ret = drm_exec_lock_contended(exec); + if (unlikely(ret)) + return ret; + + if (exec->prelocked == obj) { + drm_gem_object_put(exec->prelocked); + exec->prelocked = NULL; + + return dma_resv_reserve_fences(obj->resv, num_fences); + } + + if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE) + ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket); + else + ret = dma_resv_lock(obj->resv, &exec->ticket); + + if (unlikely(ret == -EDEADLK)) { + drm_gem_object_get(obj); + exec->contended = obj; + return -EDEADLK; + } + + if (unlikely(ret == -EALREADY && + (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES))) + goto reserve_fences; + + if (unlikely(ret)) + return ret; + + ret = drm_exec_obj_locked(exec, obj); + if (ret) + goto error_unlock; + +reserve_fences: + /* Keep locked when reserving fences fails */ + return dma_resv_reserve_fences(obj->resv, num_fences); Ugh, what is the use-case for keeping things locked here? How would a caller tell the difference between an error where everything is locked and nothing is locked? IMO, we should unlock on error here. If there indeed is a use-case we should add a separate function for reserving fences for all locked objects, rather than returning sometimes locked on error sometimes not. We return the object locked here because it was too much churn to remove it again from the array and we are getting fully cleaned up at the end anyway. OK, so if we add an unlock functionality, we could just have a consistent locking state on error return? Yeah, that should work. Going to work on this. Great. Thanks, Thomas Regards, Christian. Thanks, Thomas Regards, Christian.
Thanks, Thomas + +error_unlock: + dma_resv_unlock(obj->resv); + return ret; +} +EXPORT_SYMBOL(drm_exec_prepare_obj); + +/** + * drm_exec_prepare_array - helper to prepare an array of objects + * @exec: the drm_exec object with the state + * @objects: array of GEM objects to prepare + * @num_objects: number of GEM objects in the array + * @num_fences: number of fences to reserve on each GEM object + * + * Prepares all GEM objects in an array, handles contention but aborts on first + * error otherwise. Reserves @num_fences on each GEM object after locking it. + * + * Returns: -EALREADY when object is already locked, -ENOMEM when memory + * allocation failed and zero for success. + */ +int drm_exec_prepare_array(struct drm_exec *exec, +
Re: [PATCH 01/13] drm: execution context for GEM buffers v4
Hi! On 6/19/23 11:20, Christian König wrote: Hi guys, Am 19.06.23 um 10:59 schrieb Thomas Hellström (Intel): [SNIP] I really need to find some time to work on that anyway. I've been playing with drm_exec for a couple weeks now, and I wanted to share something I hacked to try and make the API simpler and more robust against misuse (see the below diff, which is a slightly adjusted version of your work). It would be good if we could have someone taking charge of this series and address all review comments, I see some of my comments getting lost, we have multiple submitters and I can't find a dri-devel patchwork entry for this. Anyway some comments below. I can try to find some time for the series this week (As long as nobody comes along and has any burning roof). In this version, the user is no longer in control of the retry loop. Instead, it provides an expression (a call to a sub-function) to be re-evaluated each time a contention is detected. IMHO, this makes the 'prepare-objs' functions easier to apprehend, and avoids any mistake like calling drm_exec_continue_on_contention() in an inner loop, or breaking out of the drm_exec_while_all_locked() loop unintentionally. In i915 we've had a very similar helper to this, and while I agree this newer version would probably help make code cleaner, OTOH there also are some places where the short drm_exec_while_all_locked()-like blocks don't really motivate a separate function. Porting i915 to the current version will take some work. For the xe driver both versions would work fine. Yeah, this is actually what my first version of this looked like. But I abandoned that approach because we have a lot of cases where we just quickly want to lock a few GEM objects and don't want the extra overhead of putting all the state into some bag to forward it to a function.
Some additional review comments not related to the interface change below: It also makes the internal management a bit simpler, since we no longer call drm_exec_cleanup() on the first attempt, and can thus get rid of the DRM_EXEC_DUMMY trick. In the below diff, I also re-introduced native support for duplicates as an opt-in, so we don't have to do things like: ret = drm_exec_prepare_obj(exec, obj, num_fences); if (ret == -EALREADY) ret = dma_resv_reserve_fences(obj->resv, num_fences); if (ret) return ret; and can just do: ret = drm_exec_prepare_obj(exec, obj, num_fences); if (ret) return; Of course drivers can open-code a wrapper doing the same thing, but given at least pvr and radeon need this, it'd be nice if the core could support it natively. That's mostly it. Just wanted to share what I had in case you're interested. If not, that's fine too. Regards, Boris --- Documentation/gpu/drm-mm.rst | 12 ++ drivers/gpu/drm/Kconfig | 6 + drivers/gpu/drm/Makefile | 2 + drivers/gpu/drm/drm_exec.c | 274 +++ include/drm/drm_exec.h | 130 + 5 files changed, 424 insertions(+) create mode 100644 drivers/gpu/drm/drm_exec.c create mode 100644 include/drm/drm_exec.h diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index fe40ee686f6e..c9f120cfe730 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -524,6 +524,18 @@ DRM Sync Objects .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c :export: +DRM Execution context += + +.. kernel-doc:: drivers/gpu/drm/drm_exec.c + :doc: Overview + +.. kernel-doc:: include/drm/drm_exec.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/drm_exec.c + :export: + GPU Scheduler = diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 76991720637c..01a38fcdb1c4 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -194,6 +194,12 @@ config DRM_TTM GPU memory types. Will be enabled automatically if a device driver uses it. 
+config DRM_EXEC + tristate + depends on DRM + help + Execution context for command submissions + config DRM_BUDDY tristate depends on DRM diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index 1873f64db171..18a02eaf2d49 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -79,6 +79,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o # # Memory-management helpers # +# +obj-$(CONFIG_DRM_EXEC) += drm_exec.o obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c new file mode 100644 index ..e0ad1a3e1610 --- /dev/null +++ b/drivers/gpu/drm/drm_exec.c @@ -0,0 +1,274 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ + +#include <drm/drm_exec.h> +#include <drm/drm_gem.h> +#include <linux/dma-resv.h> + +/** + * DOC: Overview + * + * This component mainly abstracts the retry loop necessary for locking + * multiple GEM objects while prepar
Re: [PATCH 01/13] drm: execution context for GEM buffers v4
On 6/17/23 13:54, Boris Brezillon wrote: +Matthew who's been using drm_exec in Xe if I'm correct. Hello Christian, On Wed, 14 Jun 2023 15:02:52 +0200 Boris Brezillon wrote: On Wed, 14 Jun 2023 14:30:53 +0200 Christian König wrote: Am 14.06.23 um 14:23 schrieb Boris Brezillon: Hi Christian, On Thu, 4 May 2023 13:51:47 +0200 "Christian König" wrote: This adds the infrastructure for an execution context for GEM buffers which is similar to the existing TTMs execbuf util and intended to replace it in the long term. The basic functionality is that we abstract the necessary loop to lock many different GEM buffers with automated deadlock and duplicate handling. As many other drivers do already, we are considering using drm_exec() for our resv locking in the PowerVR driver, so we might have more questions/comments in the coming days/weeks, but I already have a couple right now (see below). v3: drop duplicate tracking, radeon is really the only one needing that I think we'd actually be interested in duplicate tracking. Is there any way we can make it an optional feature through some extra helpers/flags? Doesn't have to be done in this patch series, I'm just wondering if this is something we can share as well. You can still capture the -EALREADY error and act appropriately in your driver. For radeon it just means ignoring the error code and going ahead, but that behavior doesn't seem to be desired in most cases. Initially I thought we need to separately track how many and how often BOs are duplicated, but there is simply no use for this. [...] +/** + * DOC: Overview + * + * This component mainly abstracts the retry loop necessary for locking + * multiple GEM objects while preparing hardware operations (e.g. command + * submissions, page table updates etc..). + * + * If a contention is detected while locking a GEM object the cleanup procedure + * unlocks all previously locked GEM objects and locks the contended one first + * before locking any further objects.
+ * + * After an object is locked fence slots can optionally be reserved on the + * dma_resv object inside the GEM object. + * + * A typical usage pattern should look like this:: + * + * struct drm_gem_object *obj; + * struct drm_exec exec; + * unsigned long index; + * int ret; + * + * drm_exec_init(&exec, true); + * drm_exec_while_not_all_locked(&exec) { + * ret = drm_exec_prepare_obj(&exec, boA, 1); + * drm_exec_continue_on_contention(&exec); + * if (ret) + * goto error; + * Have you considered defining a drm_exec_try_prepare_obj_or_retry() combining drm_exec_prepare_obj() and drm_exec_continue_on_contention()? #define drm_exec_try_prepare_obj_or_retry(exec, obj, num_fences) \ ({ \ int __ret = drm_exec_prepare_obj(exec, obj, num_fences); \ if (unlikely(drm_exec_is_contended(exec))) \ continue; \ __ret; \ }) This way the following pattern ret = drm_exec_prepare_obj(&exec, boA, 1); drm_exec_continue_on_contention(&exec); if (ret) goto error; can be turned into something more conventional: ret = drm_exec_try_prepare_obj_or_retry(&exec, boA, 1); if (ret) goto error; Yeah, I was considering that as well. But then abandoned it as too complicated. I really need to find some time to work on that anyway. I've been playing with drm_exec for a couple weeks now, and I wanted to share something I hacked to try and make the API simpler and more robust against misuse (see the below diff, which is a slightly adjusted version of your work). It would be good if we could have someone taking charge of this series and address all review comments, I see some of my comments getting lost, we have multiple submitters and I can't find a dri-devel patchwork entry for this. Anyway some comments below. In this version, the user is no longer in control of the retry loop. Instead, it provides an expression (a call to a sub-function) to be re-evaluated each time a contention is detected.
IMHO, this makes the 'prepare-objs' functions easier to apprehend, and avoids any mistake like calling drm_exec_continue_on_contention() in an inner loop, or breaking out of the drm_exec_while_all_locked() loop unintentionally. In i915 we've had a very similar helper to this, and while I agree this newer version would probably help make code cleaner, OTOH there also are some places where the short drm_exec_while_all_locked()-like blocks don't really motivate a separate function. Porting i915 to the current version will take some work. For the xe driver both versions would work fine. Some additional review comments not related to the interface change below: It also makes the internal management a bit simpler, since we no longer call drm_exec_cleanup() on the first
Re: [PATCH v2] drm/i915: Replace kmap() with kmap_local_page()
On 6/18/23 20:11, Ira Weiny wrote: Sumitra Sharma wrote: kmap() has been deprecated in favor of the kmap_local_page() due to high cost, restricted mapping space, the overhead of a global lock for synchronization, and making the process sleep in the absence of free slots. kmap_local_page() is faster than kmap() and offers thread-local and CPU-local mappings, take pagefaults in a local kmap region and preserves preemption by saving the mappings of outgoing tasks and restoring those of the incoming one during a context switch. The mapping is kept thread local in the function “i915_vma_coredump_create” in i915_gpu_error.c Therefore, replace kmap() with kmap_local_page(). Suggested-by: Ira Weiny NIT: No need for the line break between Suggested-by and your signed off line. Signed-off-by: Sumitra Sharma --- Changes in v2: - Replace kmap() with kmap_local_page(). Generally it is customary to attribute a change like this to those who suggested it in a V1 review. For example: - Tvrtko/Thomas: Use kmap_local_page() instead of page_address() Also I don't see Thomas on the new email list. Since he took the time to review V1 he might want to check this version out. I've added him to the 'To:' list. Thanks. Also a link to V1 is nice. B4 formats it like this: - Link to v1: https://lore.kernel.org/all/20230614123556.ga381...@sumitra.com/ All that said the code looks good to me. So with the above changes. Reviewed-by: Ira Weiny LGTM. Reviewed-by: Thomas Hellström - Change commit subject and message. 
drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index f020c0086fbc..bc41500eedf5 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1164,9 +1164,9 @@ i915_vma_coredump_create(const struct intel_gt *gt, drm_clflush_pages(&page, 1); - s = kmap(page); + s = kmap_local_page(page); ret = compress_page(compress, s, dst, false); - kunmap(page); + kunmap_local(s); drm_clflush_pages(&page, 1); -- 2.25.1
Re: [Intel-gfx] [PATCH] drm/i915: Call page_address() on page acquired with GFP_KERNEL flag
On 6/14/23 15:22, Tvrtko Ursulin wrote: On 14/06/2023 13:35, Sumitra Sharma wrote: Pages allocated with GFP_KERNEL cannot come from Highmem. That is why there is no need to call kmap() on them. Are you sure it is GFP_KERNEL backed and not tmpfs? I am not sure myself so let me copy Matt and Thomas if they happen to know off hand. It looks to me these are shmem pages or TTM pages. Both could be highmem. So I think kmap is the correct choice here. /Thomas Regards, Tvrtko Therefore, don't call kmap() on the page coming from vma_res->bi.pages using for_each_sgt_page() in i915_vma_coredump_create(). Use a plain page_address() to get the kernel address instead. Signed-off-by: Sumitra Sharma --- drivers/gpu/drm/i915/i915_gpu_error.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index f020c0086fbc..6f51cb4fc55c 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1164,9 +1164,8 @@ i915_vma_coredump_create(const struct intel_gt *gt, drm_clflush_pages(&page, 1); - s = kmap(page); + s = page_address(page); ret = compress_page(compress, s, dst, false); - kunmap(page); drm_clflush_pages(&page, 1);
Re: [PATCH 01/13] drm: execution context for GEM buffers v4
On 5/4/23 13:51, Christian König wrote: This adds the infrastructure for an execution context for GEM buffers which is similar to the existing TTMs execbuf util and intended to replace it in the long term. The basic functionality is that we abstract the necessary loop to lock many different GEM buffers with automated deadlock and duplicate handling. v2: drop xarray and use dynamic resized array instead, the locking overhead is unnecessary and measurable. v3: drop duplicate tracking, radeon is really the only one needing that. v4: fixes issues pointed out by Danilo, some typos in comments and a helper for lock arrays of GEM objects. Signed-off-by: Christian König ... +/** + * struct drm_exec - Execution context + */ +struct drm_exec { + /** +* @interruptible: If locks should be taken interruptible +*/ + bool interruptible; + + /** +* @ticket: WW ticket used for acquiring locks +*/ + struct ww_acquire_ctx ticket; + + /** +* @num_objects: number of objects locked +*/ + unsigned int num_objects; + + /** +* @max_objects: maximum objects in array +*/ + unsigned int max_objects; + + /** +* @objects: array of the locked objects +*/ + struct drm_gem_object **objects; Hi, Christian. Did you consider using a list here with links embedded in gem objects, now that only locked objects are to be put on the list / array? That should work as only the process owning the lock may access the list link. Apart from getting rid of reallocing this is beneficial for the more general types of ww transactions that are used by i915 (and to a minor extent xe as well, I think). In those cases we would want to unlock a temporarily held object within the while_not_all_locked() loop and would then have to search the entire array for the correct pointer. Instead one could just remove it from the list of locked objects. Thanks, Thomas
Re: [PATCH v3 0/5] drm/i915: Allow user to set cache at BO creation
On 4/28/23 19:43, Yang, Fei wrote: >> On 4/28/23 17:19, Yang, Fei wrote: >>> On 4/28/23 07:47, fei.y...@intel.com wrote: From: Fei Yang The first three patches in this series are taken from https://patchwork.freedesktop.org/series/116868/ These patches are included here because the last patch has dependency on the pat_index refactor. This series is focusing on uAPI changes, 1. end support for set caching ioctl [PATCH 4/5] 2. add set_pat extension for gem_create [PATCH 5/5] v2: drop one patch that was merged separately 341ad0e8e254 drm/i915/mtl: Add PTE encode function v3: rebase on https://patchwork.freedesktop.org/series/117082/ >>> >>> Hi, Fei. >>> >>> Does this uAPI update also affect any discrete GPUs supported by i915, >> >> It does. >> >>> And in that case, does it allow setting non-snooping PAT indices on >>> those devices? >> >> It allows setting PAT indices specified in >> KMD does a sanity check so that it won't go over the max recommended >> by bspec. >> >>> If so, since the uAPI for discrete GPU devices doesn't allow incoherency >>> between GPU and CPU (apart from write-combining buffering), the correct >>> CPU caching mode matching the PAT index needs to be selected for the >>> buffer object in i915_ttm_select_tt_caching(). >> >> The patch doesn't affect the CPU caching mode setting logic though. >> And the caching settings for objects created by kernel should remain >> the same for both CPU and GPU, objects created by userspace are >> managed completely by userspace. >> >> One question though, what do you mean by non-snooping PAT indices? > > Yes, that was actually the bottom question: What do these PAT settings > allow you to do WRT the snooping on supported discrete devices (DG2) on > i915? > If they indeed don't allow disabling snooping, then that's not a problem. When dGPUs access SysMem, the PCIe default is for that access to snoop the host's caches. All of our current dGPUs do that -- independent of PAT setting.
> If they do, the ttm code there needs some modification. I'm not familiar with ttm, but if your concern is that a certain PAT index could disable snooping, that is not possible for current dGPUs. I think it is possible for Xe2/3 though, because there will be COH_MODE defined in the PAT registers going forward. OK. If that's the case, then it should be safe to disregard this concern. Thanks, Thomas -Fei
Re: [PATCH v3 0/5] drm/i915: Allow user to set cache at BO creation
On 4/28/23 17:19, Yang, Fei wrote: > On 4/28/23 07:47, fei.y...@intel.com wrote: >> From: Fei Yang >> >> The first three patches in this series are taken from >> https://patchwork.freedesktop.org/series/116868/ >> These patches are included here because the last patch >> has dependency on the pat_index refactor. >> >> This series is focusing on uAPI changes, >> 1. end support for set caching ioctl [PATCH 4/5] >> 2. add set_pat extension for gem_create [PATCH 5/5] >> >> v2: drop one patch that was merged separately >> 341ad0e8e254 drm/i915/mtl: Add PTE encode function >> v3: rebase on https://patchwork.freedesktop.org/series/117082/ > > Hi, Fei. > > Does this uAPI update also affect any discrete GPUs supported by i915, It does. > And in that case, does it allow setting non-snooping PAT indices on > those devices? It allows setting PAT indices specified in KMD does a sanity check so that it won't go over the max recommended by bspec. > If so, since the uAPI for discrete GPU devices doesn't allow incoherency > between GPU and CPU (apart from write-combining buffering), the correct > CPU caching mode matching the PAT index needs to be selected for the > buffer object in i915_ttm_select_tt_caching(). The patch doesn't affect the CPU caching mode setting logic though. And the caching settings for objects created by the kernel should remain the same for both CPU and GPU; objects created by userspace are managed completely by userspace. One question though, what do you mean by non-snooping PAT indices? The PAT index registers didn't really control coherency mode in the past; I believe MTL is the first one that has COH_MODE in the PAT registers. Aren't discrete GPUs snooping the CPU cache automatically? Yes, that was actually the bottom question: What do these PAT settings allow you to do WRT the snooping on supported discrete devices (DG2) on i915? If they indeed don't allow disabling snooping, then that's not a problem. If they do, the ttm code there needs some modification. 
Thanks, Thomas -Fei > Thanks, > Thomas > >> >> Fei Yang (5): >> drm/i915: preparation for using PAT index >> drm/i915: use pat_index instead of cache_level >> drm/i915: make sure correct pte encode is used >> drm/i915/mtl: end support for set caching ioctl >> drm/i915: Allow user to set cache at BO creation >> >> drivers/gpu/drm/i915/display/intel_dpt.c | 12 +-- >> drivers/gpu/drm/i915/gem/i915_gem_create.c | 36 + >> drivers/gpu/drm/i915/gem/i915_gem_domain.c | 46 ++- >> .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 10 ++- >> drivers/gpu/drm/i915/gem/i915_gem_mman.c | 3 +- >> drivers/gpu/drm/i915/gem/i915_gem_object.c | 67 +++- >> drivers/gpu/drm/i915/gem/i915_gem_object.h | 8 ++ >> .../gpu/drm/i915/gem/i915_gem_object_types.h | 26 +- >> drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 9 ++- >> drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 2 - >> drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 4 +- >> drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 16 ++-- >> .../gpu/drm/i915/gem/selftests/huge_pages.c | 2 +- >> .../drm/i915/gem/selftests/i915_gem_migrate.c | 2 +- >> .../drm/i915/gem/selftests/i915_gem_mman.c | 2 +- >> drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 10 ++- >> drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 73 + >> drivers/gpu/drm/i915/gt/gen8_ppgtt.h | 3 +- >> drivers/gpu/drm/i915/gt/intel_ggtt.c | 76 +- >> drivers/gpu/drm/i915/gt/intel_gtt.h | 20 +++-- >> drivers/gpu/drm/i915/gt/intel_migrate.c | 47 ++- >> drivers/gpu/drm/i915/gt/intel_migrate.h | 13 ++- >> drivers/gpu/drm/i915/gt/intel_ppgtt.c | 6 +- >> drivers/gpu/drm/i915/gt/selftest_migrate.c | 47 +-- >> drivers/gpu/drm/i915/gt/selftest_reset.c | 8 +- >> drivers/gpu/drm/i915/gt/selftest_timeline.c | 2 +- >> drivers/gpu/drm/i915/gt/selftest_tlb.c | 4 +- >> drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 10 ++- >> drivers/gpu/drm/i915/i915_debugfs.c | 55 ++--- >> drivers/gpu/drm/i915/i915_gem.c | 16 +++- >> drivers/gpu/drm/i915/i915_gpu_error.c | 8 +- >> drivers/gpu/drm/i915/i915_pci.c | 79 --- >> 
drivers/gpu/drm/i915/i915_vma.c | 16 ++-- >> drivers/gpu/drm/i915/i915_vma.h | 2 +- >> drivers/gpu/drm/i915/i915_vma_types.h | 2 - >> drivers/gpu/drm/i915/intel_device_info.h | 5 ++ >> drivers/gpu/drm/i915/selftests/i915_gem.c | 5 +- >> .../gpu/drm/i915/selftests/i915_gem_evict.c | 4 +- >> drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++-- >> .../drm/i915/selftests/intel_memory_region.c | 4 +- >> .../gpu/drm/i915/selftests/mock_gem_device.c | 9 +++ >> drivers/gpu/drm/i915/selftests/mock_gtt.c | 8 +- >> include/uapi/drm/i915_drm.h | 36 + >> tools/include/uapi/drm/i915_drm.h | 36
Re: [PATCH v3 0/5] drm/i915: Allow user to set cache at BO creation
On 4/28/23 07:47, fei.y...@intel.com wrote: From: Fei Yang The first three patches in this series are taken from https://patchwork.freedesktop.org/series/116868/ These patches are included here because the last patch has dependency on the pat_index refactor. This series is focusing on uAPI changes, 1. end support for set caching ioctl [PATCH 4/5] 2. add set_pat extension for gem_create [PATCH 5/5] v2: drop one patch that was merged separately 341ad0e8e254 drm/i915/mtl: Add PTE encode function v3: rebase on https://patchwork.freedesktop.org/series/117082/ Hi, Fei. Does this uAPI update also affect any discrete GPUs supported by i915, And in that case, does it allow setting non-snooping PAT indices on those devices? If so, since the uAPI for discrete GPU devices doesn't allow incoherency between GPU and CPU (apart from write-combining buffering), the correct CPU caching mode matching the PAT index needs to be selected for the buffer object in i915_ttm_select_tt_caching(). Thanks, Thomas Fei Yang (5): drm/i915: preparation for using PAT index drm/i915: use pat_index instead of cache_level drm/i915: make sure correct pte encode is used drm/i915/mtl: end support for set caching ioctl drm/i915: Allow user to set cache at BO creation drivers/gpu/drm/i915/display/intel_dpt.c | 12 +-- drivers/gpu/drm/i915/gem/i915_gem_create.c| 36 + drivers/gpu/drm/i915/gem/i915_gem_domain.c| 46 ++- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 10 ++- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 3 +- drivers/gpu/drm/i915/gem/i915_gem_object.c| 67 +++- drivers/gpu/drm/i915/gem/i915_gem_object.h| 8 ++ .../gpu/drm/i915/gem/i915_gem_object_types.h | 26 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 9 ++- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 2 - drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 4 +- drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 16 ++-- .../gpu/drm/i915/gem/selftests/huge_pages.c | 2 +- .../drm/i915/gem/selftests/i915_gem_migrate.c | 2 +- 
.../drm/i915/gem/selftests/i915_gem_mman.c| 2 +- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 10 ++- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 73 + drivers/gpu/drm/i915/gt/gen8_ppgtt.h | 3 +- drivers/gpu/drm/i915/gt/intel_ggtt.c | 76 +- drivers/gpu/drm/i915/gt/intel_gtt.h | 20 +++-- drivers/gpu/drm/i915/gt/intel_migrate.c | 47 ++- drivers/gpu/drm/i915/gt/intel_migrate.h | 13 ++- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 6 +- drivers/gpu/drm/i915/gt/selftest_migrate.c| 47 +-- drivers/gpu/drm/i915/gt/selftest_reset.c | 8 +- drivers/gpu/drm/i915/gt/selftest_timeline.c | 2 +- drivers/gpu/drm/i915/gt/selftest_tlb.c| 4 +- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 10 ++- drivers/gpu/drm/i915/i915_debugfs.c | 55 ++--- drivers/gpu/drm/i915/i915_gem.c | 16 +++- drivers/gpu/drm/i915/i915_gpu_error.c | 8 +- drivers/gpu/drm/i915/i915_pci.c | 79 --- drivers/gpu/drm/i915/i915_vma.c | 16 ++-- drivers/gpu/drm/i915/i915_vma.h | 2 +- drivers/gpu/drm/i915/i915_vma_types.h | 2 - drivers/gpu/drm/i915/intel_device_info.h | 5 ++ drivers/gpu/drm/i915/selftests/i915_gem.c | 5 +- .../gpu/drm/i915/selftests/i915_gem_evict.c | 4 +- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++-- .../drm/i915/selftests/intel_memory_region.c | 4 +- .../gpu/drm/i915/selftests/mock_gem_device.c | 9 +++ drivers/gpu/drm/i915/selftests/mock_gtt.c | 8 +- include/uapi/drm/i915_drm.h | 36 + tools/include/uapi/drm/i915_drm.h | 36 + 44 files changed, 621 insertions(+), 243 deletions(-)
Re: [RFC PATCH 08/10] dma-buf/dma-fence: Introduce long-running completion fences
On 4/4/23 15:10, Christian König wrote: Am 04.04.23 um 14:54 schrieb Thomas Hellström: Hi, Christian, On 4/4/23 11:09, Christian König wrote: Am 04.04.23 um 02:22 schrieb Matthew Brost: From: Thomas Hellström For long-running workloads, drivers either need to open-code completion waits, invent their own synchronization primitives or internally use dma-fences that do not obey the cross-driver dma-fence protocol, but without any lockdep annotation all these approaches are error prone. So since for example the drm scheduler uses dma-fences it is desirable for a driver to be able to use it for throttling and error handling also with internal dma-fences that do not obey the cross-driver dma-fence protocol. Introduce long-running completion fences in the form of dma-fences, and add lockdep annotation for them. In particular: * Do not allow waiting under any memory management locks. * Do not allow attaching them to a dma-resv object. * Introduce a new interface for adding callbacks, making the helper adding a callback sign off on being aware that the dma-fence may not complete anytime soon. Typically this will be the scheduler chaining a new long-running fence on another one. Well that's pretty much what I tried before: https://lwn.net/Articles/893704/ And the reasons why it was rejected haven't changed. Regards, Christian. Yes, TBH this was mostly to get discussion going on how we'd best tackle this problem while being able to reuse the scheduler for long-running workloads. I couldn't see any clear decision on your series, though, but one main difference I see is that this is intended for driver-internal use only. (I'm counting using the drm_scheduler as a helper for driver-private use.) This is by no means a way to try to tackle the indefinite fence problem. Well this was just my latest try to tackle this, but essentially the problems are the same as with your approach: When we express such operations as dma_fence there is always the chance that we leak that somewhere. 
My approach of adding a flag noting that this operation is dangerous and can't be synced with something memory management depends on tried to contain this as much as possible, but Daniel still pretty clearly rejected it (for good reasons I think). We could ofc invent a completely different data-type that abstracts the synchronization the scheduler needs in the long-running case, or each driver could hack something up, like sleeping in the prepare_job() or run_job() callback for throttling, but those waits should still be annotated one way or another (and probably in a similar way across drivers) to make sure we don't do anything bad. So any suggestions as to what would be the better solution here would be appreciated. Mhm, do we really need the GPU scheduler for that? I mean in the 1 to 1 case you basically just need a component which collects the dependencies as dma_fence and, if all of them are fulfilled, schedules a work item. As long as the work item itself doesn't produce a dma_fence it can then still just wait for other non-dma_fence dependencies. Then the work function could submit the work and wait for the result. The work item would then pretty much represent what you want; you can wait for it to finish and pass it along as a long-running dependency. Maybe give it a funky name and wrap it up in a structure, but that's basically it. This very much sounds like an i915_sw_fence for the dependency tracking and dma_fence_work for the actual work, although its completion fence is a dma_fence. That goes against a condition for merging the xe driver, though, which was that we implement some sort of minimal scaffolding for long-running workloads in the drm scheduler; the thinking behind that is to avoid implementing intel-specific solutions like those... Thanks, Thomas Regards, Christian. Thanks, Thomas
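Christian's proposal — collect dma_fence dependencies, then schedule a work item that is free to wait on non-fence things — could be sketched roughly as below. All names are invented for illustration; this is close in spirit to i915's dma_fence_work, except the completion is deliberately a plain completion rather than a dma_fence, so it cannot leak into dma-resv objects:

```c
/*
 * Rough sketch of the "collect dependencies, then run a work item"
 * idea from the thread.  Names are invented; this is not an existing
 * kernel API.  The key property: the work item's completion is NOT a
 * dma_fence, so it cannot be attached to a dma-resv or leak into the
 * cross-driver dma-fence protocol.
 */
struct lr_work {
	struct work_struct	work;
	struct dma_fence	**deps;		/* collected dma_fence deps */
	unsigned int		num_deps;
	struct completion	done;		/* plain completion on purpose */
	void			(*func)(struct lr_work *w);
};

static void lr_work_fn(struct work_struct *work)
{
	struct lr_work *w = container_of(work, struct lr_work, work);
	unsigned int i;

	/* wait for all collected dma_fence dependencies first */
	for (i = 0; i < w->num_deps; i++)
		dma_fence_wait(w->deps[i], false);

	/* now free to wait on non-dma-fence dependencies, allocate, etc. */
	w->func(w);
	complete_all(&w->done);	/* waiters use wait_for_completion() */
}
```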
Re: [PATCH 1/9] drm: execution context for GEM buffers v3
Hi Christian, On 2/28/23 09:33, Christian König wrote: This adds the infrastructure for an execution context for GEM buffers which is similar to the existing TTM execbuf util and intended to replace it in the long term. The basic functionality is that we abstract the necessary loop to lock many different GEM buffers with automated deadlock and duplicate handling. v2: drop xarray and use dynamic resized array instead, the locking overhead is unnecessary and measurable. v3: drop duplicate tracking, radeon is really the only one needing that. Signed-off-by: Christian König Nice. This seems to have all we need for now for Xe as well, although not for i915 ATM. I see future extension of this for things like sleeping lock evictions (where locks may go out-of-scope during the locking loop), including other resv containers, passing through dma-buf move_notify() etc. What are your thoughts around that? Extend this, or build a different functionality and make this a specialization of that? What about the drm_gem_lock_reservations() interface? While a bit less capable, it's confusing having two APIs doing essentially the same thing; could we deprecate that? Finally a comment below: --- Documentation/gpu/drm-mm.rst | 12 ++ drivers/gpu/drm/Kconfig | 6 + drivers/gpu/drm/Makefile | 2 + drivers/gpu/drm/drm_exec.c | 249 +++ include/drm/drm_exec.h | 115 5 files changed, 384 insertions(+) create mode 100644 drivers/gpu/drm/drm_exec.c create mode 100644 include/drm/drm_exec.h diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index a79fd3549ff8..a52e6f4117d6 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -493,6 +493,18 @@ DRM Sync Objects .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c :export: +DRM Execution context += + +.. kernel-doc:: drivers/gpu/drm/drm_exec.c + :doc: Overview + +.. kernel-doc:: include/drm/drm_exec.h + :internal: + +.. 
kernel-doc:: drivers/gpu/drm/drm_exec.c + :export: + GPU Scheduler = diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 17d252dc25e2..84a5fc28c48d 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -200,6 +200,12 @@ config DRM_TTM GPU memory types. Will be enabled automatically if a device driver uses it. +config DRM_EXEC + tristate + depends on DRM + help + Execution context for command submissions + config DRM_BUDDY tristate depends on DRM diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index ab4460fcd63f..d40defbb0347 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -78,6 +78,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o # # Memory-management helpers # +# +obj-$(CONFIG_DRM_EXEC) += drm_exec.o obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c new file mode 100644 index ..df546cc5a227 --- /dev/null +++ b/drivers/gpu/drm/drm_exec.c @@ -0,0 +1,249 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ + +#include +#include +#include + +/** + * DOC: Overview + * + * This component mainly abstracts the retry loop necessary for locking + * multiple GEM objects while preparing hardware operations (e.g. command + * submissions, page table updates etc..). + * + * If a contention is detected while locking a GEM object the cleanup procedure + * unlocks all previously locked GEM objects and locks the contended one first + * before locking any further objects. + * + * After an object is locked fences slots can optionally be reserved on the + * dma_resv object inside the GEM object. 
+ * + * A typical usage pattern should look like this:: + * + * struct drm_gem_object *obj; + * struct drm_exec exec; + * unsigned long index; + * int ret; + * + * drm_exec_init(, true); + * drm_exec_while_not_all_locked() { + * ret = drm_exec_prepare_obj(, boA, 1); + * drm_exec_continue_on_contention(); + * if (ret) + * goto error; + * + * ret = drm_exec_lock(, boB, 1); Should be drm_exec_prepare_obj()? + * drm_exec_continue_on_contention(); + * if (ret) + * goto error; + * } + * + * drm_exec_for_each_locked_object(, index, obj) { + * dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ); + * ... + * } + * drm_exec_fini(); + * + * See struct dma_exec for more details. + */ + +/* Dummy value used to initially enter the retry loop */ +#define DRM_EXEC_DUMMY (void*)~0 + +/* Unlock all objects and drop references */ +static void drm_exec_unlock_all(struct drm_exec *exec) +{ + struct drm_gem_object *obj; + unsigned long index; + +
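With the typo Thomas points out fixed (the second call should be drm_exec_prepare_obj(), not drm_exec_lock()), the documented usage pattern would read as below. Note the `&exec` arguments are reconstructed here — the archive dropped them from the quoted DOC comment — and boA/boB and the error label are placeholders from the original example:

```c
/*
 * The usage pattern from the patch's DOC comment, with the second
 * call corrected to drm_exec_prepare_obj() as suggested in the
 * review.  The "&exec" arguments are reconstructed; the quoted text
 * lost them in extraction.
 */
drm_exec_init(&exec, true);
drm_exec_while_not_all_locked(&exec) {
	ret = drm_exec_prepare_obj(&exec, boA, 1);
	drm_exec_continue_on_contention(&exec);
	if (ret)
		goto error;

	ret = drm_exec_prepare_obj(&exec, boB, 1);	/* was drm_exec_lock() */
	drm_exec_continue_on_contention(&exec);
	if (ret)
		goto error;
}

drm_exec_for_each_locked_object(&exec, index, obj) {
	dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
}
drm_exec_fini(&exec);
```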
Re: [PATCH 9/9] drm: move ttm_execbuf_util into vmwgfx
Hi, On 3/9/23 06:14, Zack Rusin wrote: On Wed, 2023-03-08 at 10:10 +0100, Christian König wrote: Am 08.03.23 um 06:14 schrieb Zack Rusin: On Tue, 2023-02-28 at 09:34 +0100, Christian König wrote: VMWGFX is the only remaining user of this and should probably be moved over to drm_exec when it starts using GEM as well. Is this because vmwgfx piggybacks buffer-id relocations on top of ttm validations or did you just find it too hard to port it over? I'd prefer to avoid ttm moves to vmwgfx and at least have a clear idea of what we need to do to port. I've just found it too hard to port it over because vmwgfx does some strange things with the validation code here. If you want we can take a deeper look at this together, but I need to find some time. Alternatively just tell me how to do it and I will add that to the patch set :) I don't want to hold up the set (it looks good btw), because I had to look at something else today and tomorrow. We overload the validation lists to do quite a bit more than just reservations though. There are, I think, four separate things that need to be refactored there (Christian, feel free to skip this section, this is mainly for VMware folks on the team): 1) Relocations - userspace uses the ids of the bos in the command stream, but on the kernel side those ids are different (or in vmwgfx terminology gem id != mob id), so the buffer ids in the command stream need to be replaced. 2) Resource validation. vmwgfx splits the userspace objects into buffers and resources (shaders, surfaces, contexts). The resources are not buffers but are backed by them. A single buffer can back multiple different resources and sometimes the kernel has to actually allocate a buffer to back a resource and attach it to it (i.e. in common terminology the buffer is the memory and resources are placed in it). Now this shouldn't be in the kernel at all; the resources shouldn't have been kernel objects and instead we should have left them completely to userspace. 
The reason behind this is partly historical, since initially the resources (IIRC surfaces and shaders at that time) were per-device and not per context, and not backed by buffer objects. The main reason for not doing the transition to user-space for resources was the SVGA device "between-draw-call-validation". If you for example unbound a mob that was backing a resource that was bound to a context, you'd need to start a whole chain of resource invalidations to avoid the device complaining about invalid setups at awkward moments, for example when it felt like suspending. Not sure whether that is gone now though, or whether there are better ways to handle that. /Thomas 3) Coherency tracking. We use validation lists as a central place for tracking which bos/resources are used in a command buffer, and we use them to keep track of which buffers/resources will end up dirty to implement coherency. 4) Central place to allocate memory for relocation/validation nodes. Where we want to end up is with 2 completely gone from the kernel side and 1, 3 and 4 refactored and cleaned up. I think there's at least 4 separate patches to this port, so it's not a trivial thing. We will take a look at this on Friday in more detail to see what we can do. z
Re: [PATCH 9/9] drm: move ttm_execbuf_util into vmwgfx
On 3/8/23 10:10, Christian König wrote: Am 08.03.23 um 06:14 schrieb Zack Rusin: On Tue, 2023-02-28 at 09:34 +0100, Christian König wrote: VMWGFX is the only remaining user of this and should probably be moved over to drm_exec when it starts using GEM as well. Is this because vmwgfx piggybacks buffer-id relocations on top of ttm validations or did you just find it too hard to port it over? I'd prefer to avoid ttm moves to vmwgfx and at least have a clear idea of what we need to do to port. I've just found it too hard to port it over because vmwgfx does some strange things with the validation code here. If you want we can take a deeper look at this together, but I need to find some time. Alternatively just tell me how to do it and I will add that to the patch set :) Regards, Christian. Hmm. It turns out Xe was using these from the very start as well. But I will take a look at what it takes to deprecate that usage, so don't let that stop this removal. We need a more flexible WW transaction handling anyway. /Thomas z
Re: [PATCH 2/3] drm/amd: Convert amdgpu to use suballocation helper.
On 2/23/23 17:22, Christian König wrote: Am 23.02.23 um 15:29 schrieb Thomas Hellström: On 2/23/23 12:15, Christian König wrote: Am 23.02.23 um 11:57 schrieb Thomas Hellström: From: Maarten Lankhorst Now that we have a generic suballocation helper, use it in amdgpu. For lines that get moved or changed, also fix up pre-existing style issues. Signed-off-by: Maarten Lankhorst Co-developed-by: Thomas Hellström Signed-off-by: Thomas Hellström --- drivers/gpu/drm/Kconfig | 1 + drivers/gpu/drm/amd/amdgpu/Kconfig | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu.h | 26 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 23 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c | 324 ++--- 7 files changed, 46 insertions(+), 337 deletions(-) diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 8fbe57407c60..73ddfdf3a894 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -77,6 +77,7 @@ config DRM_KUNIT_TEST select DRM_DISPLAY_HELPER select DRM_LIB_RANDOM select DRM_KMS_HELPER + select DRM_SUBALLOC_HELPER select DRM_BUDDY select DRM_EXPORT_FOR_TESTS if m select DRM_KUNIT_TEST_HELPERS This looks like it's misplaced; apart from that the patch looks good to me. Looks like a TAB vs spaces issue. The resulting file looks correct. Also added the same select for Radeon in the following patch, which was forgotten. That wasn't what I meant. This here is the patch to change amdgpu, but you are adding the dependency to the KUNIT test. It looks like adding this line should be in patch #1, not patch #2. Ah yes, hmm, that change shouldn't be there at all. It probably strayed from the Kunit test that I haven't posted yet. Will do a v4. We need to get rid of those integer divisions as well. /Thomas Regards, Christian. Added your R-B to all patches, even if it wasn't explicit for this one. Please let me know if I misunderstood that one. Thanks, Thomas Regards, Christian. 
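On "We need to get rid of those integer divisions as well": the usual trick in allocators is to require power-of-two alignments and replace divide/multiply rounding with a mask, since 64-bit division is expensive (or needs a libgcc helper) on 32-bit kernels. A small self-contained illustration of the equivalence — not the actual drm_suballoc code:

```c
#include <assert.h>
#include <stdint.h>

/* Round up using divide + multiply, as an allocator might naively do. */
static uint64_t align_up_div(uint64_t x, uint64_t align)
{
	return ((x + align - 1) / align) * align;
}

/*
 * Equivalent rounding with a mask; valid only when align is a power
 * of two, which a suballocator can require.  No division involved.
 */
static uint64_t align_up_mask(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}
```

For example, align_up_mask(13, 8) and align_up_div(13, 8) both yield 16.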
diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig b/drivers/gpu/drm/amd/amdgpu/Kconfig index 5341b6b242c3..0ed12171450b 100644 --- a/drivers/gpu/drm/amd/amdgpu/Kconfig +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig @@ -18,6 +18,7 @@ config DRM_AMDGPU select BACKLIGHT_CLASS_DEVICE select INTERVAL_TREE select DRM_BUDDY + select DRM_SUBALLOC_HELPER # amdgpu depends on ACPI_VIDEO when ACPI is enabled, for select to work # ACPI_VIDEO's dependencies must also be selected. select INPUT if ACPI diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 164141bc8b4a..dda88090f044 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -424,29 +424,11 @@ struct amdgpu_clock { * alignment). */ -#define AMDGPU_SA_NUM_FENCE_LISTS 32 - struct amdgpu_sa_manager { - wait_queue_head_t wq; - struct amdgpu_bo *bo; - struct list_head *hole; - struct list_head flist[AMDGPU_SA_NUM_FENCE_LISTS]; - struct list_head olist; - unsigned size; - uint64_t gpu_addr; - void *cpu_ptr; - uint32_t domain; - uint32_t align; -}; - -/* sub-allocation buffer */ -struct amdgpu_sa_bo { - struct list_head olist; - struct list_head flist; - struct amdgpu_sa_manager *manager; - unsigned soffset; - unsigned eoffset; - struct dma_fence *fence; + struct drm_suballoc_manager base; + struct amdgpu_bo *bo; + uint64_t gpu_addr; + void *cpu_ptr; }; int amdgpu_fence_slab_init(void); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c index b348dbe2..df7eb0b7c4b9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -69,7 +69,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm, if (size) { r = amdgpu_sa_bo_new(>ib_pools[pool_type], - >sa_bo, size, 256); + >sa_bo, size); if (r) { dev_err(adev->dev, "failed to get a new IB (%d)\n", r); return r; @@ -309,8 +309,7 @@ int amdgpu_ib_pool_init(struct amdgpu_device *adev) for (i = 0; i < AMDGPU_IB_POOL_MAX; 
i++) { r = amdgpu_sa_bo_manager_init(adev, >ib_pools[i], - AMDGPU_IB_POOL_SIZE, - AMDGPU_GPU_PAGE_SIZE, + AMDGPU_IB_POOL_SIZE, 256, AMDGPU_GEM_DOMAIN_GTT); if (r) goto error; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index 93207badf83f..5a85726ce853 100644 ---
Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI
On 1/19/23 05:58, Matthew Brost wrote: On Thu, Jan 19, 2023 at 04:44:23AM +0100, Danilo Krummrich wrote: On 1/18/23 21:37, Thomas Hellström (Intel) wrote: On 1/18/23 07:12, Danilo Krummrich wrote: This commit provides the implementation for the new uapi motivated by the Vulkan API. It allows user mode drivers (UMDs) to: 1) Initialize a GPU virtual address (VA) space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA space managed by the kernel and userspace, respectively. 2) Allocate and free a VA space region as well as bind and unbind memory to the GPUs VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl. UMDs can request the named operations to be processed either synchronously or asynchronously. It supports DRM syncobjs (incl. timelines) as synchronization mechanism. The management of the GPU VA mappings is implemented with the DRM GPU VA manager. 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The execution happens asynchronously. It supports DRM syncobj (incl. timelines) as synchronization mechanism. DRM GEM object locking is handled with drm_exec. Both, DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC, use the DRM GPU scheduler for the asynchronous paths. 
Signed-off-by: Danilo Krummrich --- Documentation/gpu/driver-uapi.rst | 3 + drivers/gpu/drm/nouveau/Kbuild | 2 + drivers/gpu/drm/nouveau/Kconfig | 2 + drivers/gpu/drm/nouveau/nouveau_abi16.c | 16 + drivers/gpu/drm/nouveau/nouveau_abi16.h | 1 + drivers/gpu/drm/nouveau/nouveau_drm.c | 23 +- drivers/gpu/drm/nouveau/nouveau_drv.h | 9 +- drivers/gpu/drm/nouveau/nouveau_exec.c | 310 ++ drivers/gpu/drm/nouveau/nouveau_exec.h | 55 ++ drivers/gpu/drm/nouveau/nouveau_sched.c | 780 drivers/gpu/drm/nouveau/nouveau_sched.h | 98 +++ 11 files changed, 1295 insertions(+), 4 deletions(-) create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h ... +static struct dma_fence * +nouveau_bind_job_run(struct nouveau_job *job) +{ + struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job); + struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli); + struct bind_job_op *op; + int ret = 0; + I was looking at how nouveau does the async binding compared to how xe does it. It looks to me that this function being a scheduler run_job callback is the main part of the VM_BIND dma-fence signalling critical section for the job's done_fence and if so, needs to be annotated as such? Yes, that's the case. For example nouveau_uvma_region_new allocates memory, which is not allowed if in a dma_fence signalling critical section and the locking also looks suspicious? Thanks for pointing this out, I missed that somehow. I will change it to pre-allocate new regions, mappings and page tables within the job's submit() function. 
Yeah, that's what we basically do in Xe: in the IOCTL step allocate all the backing store for new page tables and populate the new page tables (these are not yet visible in the page table structure), and in the last step, which is executed after all the dependencies are satisfied, program all the leaf entries making the new binding visible. We screwed this up by deferring most of the IOCTL to a worker, but will fix this one way or another soon - get rid of the worker, or introduce a type of sync that is signaled after the worker + publish the dma-fence in the worker. I'd like to close on this one soon. For the ops structures the drm_gpuva_manager allocates for reporting the split/merge steps back to the driver I have ideas to entirely avoid allocations, which also is a good thing in respect of Christian's feedback regarding the huge amount of mapping requests some applications seem to generate. It should be fine to have allocations to report the split/merge step as this step should be before a dma-fence is published, but yeah, if possible avoiding extra allocs is always better. Also BTW, great work on drm_gpuva_manager too. We will most likely pick this up in Xe rather than open coding all of this as we currently do. We should probably start the port to this soon so we can contribute to the implementation and get both of our drivers upstream sooner. Regarding the locking, anything specific that makes it look suspicious to you? I haven't looked into this too deeply, but almost certainly Thomas is suggesting that if you allocate memory anywhere under the nouveau_uvmm_lock then you can't use this lock in the run_job() callback, as this is in the dma-fencing path. Yes, that was what looked suspicious to me, although I haven't either looked at the code in detail to say for sure. But starting by annotating this with dma_fence_[be
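The annotation Thomas suggests (truncated in the quote) is presumably the dma_fence_begin_signalling()/dma_fence_end_signalling() lockdep pair. A hedged sketch of how a run_job() callback would use it once allocations are moved to submit() — the function shape follows the patch, but the annotation placement and the helper name are illustrative, not merged code:

```c
/*
 * Sketch: lockdep-annotating a scheduler run_job() callback as a
 * dma-fence signalling critical section.  Inside the section lockdep
 * flags memory allocations and lock acquisitions that could deadlock
 * against fence signalling.  do_the_preallocated_binding() is a
 * hypothetical helper that only publishes page tables allocated
 * earlier in submit().
 */
static struct dma_fence *nouveau_bind_job_run(struct nouveau_job *job)
{
	struct dma_fence *fence;
	bool fence_cookie;

	fence_cookie = dma_fence_begin_signalling();

	/* no allocations, no GFP_KERNEL-style locking allowed here */
	fence = do_the_preallocated_binding(job);

	dma_fence_end_signalling(fence_cookie);
	return fence;
}
```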
Re: [PATCH drm-next 13/14] drm/nouveau: implement new VM_BIND UAPI
On 1/18/23 07:12, Danilo Krummrich wrote: This commit provides the implementation for the new uapi motivated by the Vulkan API. It allows user mode drivers (UMDs) to: 1) Initialize a GPU virtual address (VA) space via the new DRM_IOCTL_NOUVEAU_VM_INIT ioctl for UMDs to specify the portion of VA space managed by the kernel and userspace, respectively. 2) Allocate and free a VA space region as well as bind and unbind memory to the GPU's VA space via the new DRM_IOCTL_NOUVEAU_VM_BIND ioctl. UMDs can request the named operations to be processed either synchronously or asynchronously. It supports DRM syncobjs (incl. timelines) as a synchronization mechanism. The management of the GPU VA mappings is implemented with the DRM GPU VA manager. 3) Execute push buffers with the new DRM_IOCTL_NOUVEAU_EXEC ioctl. The execution happens asynchronously. It supports DRM syncobjs (incl. timelines) as a synchronization mechanism. DRM GEM object locking is handled with drm_exec. Both DRM_IOCTL_NOUVEAU_VM_BIND and DRM_IOCTL_NOUVEAU_EXEC use the DRM GPU scheduler for the asynchronous paths. Signed-off-by: Danilo Krummrich --- Documentation/gpu/driver-uapi.rst | 3 + drivers/gpu/drm/nouveau/Kbuild | 2 + drivers/gpu/drm/nouveau/Kconfig | 2 + drivers/gpu/drm/nouveau/nouveau_abi16.c | 16 + drivers/gpu/drm/nouveau/nouveau_abi16.h | 1 + drivers/gpu/drm/nouveau/nouveau_drm.c | 23 +- drivers/gpu/drm/nouveau/nouveau_drv.h | 9 +- drivers/gpu/drm/nouveau/nouveau_exec.c | 310 ++ drivers/gpu/drm/nouveau/nouveau_exec.h | 55 ++ drivers/gpu/drm/nouveau/nouveau_sched.c | 780 drivers/gpu/drm/nouveau/nouveau_sched.h | 98 +++ 11 files changed, 1295 insertions(+), 4 deletions(-) create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.c create mode 100644 drivers/gpu/drm/nouveau/nouveau_exec.h create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.c create mode 100644 drivers/gpu/drm/nouveau/nouveau_sched.h ...
+static struct dma_fence * +nouveau_bind_job_run(struct nouveau_job *job) +{ + struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job); + struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(job->cli); + struct bind_job_op *op; + int ret = 0; + I was looking at how nouveau does the async binding compared to how xe does it. It looks to me that this function being a scheduler run_job callback is the main part of the VM_BIND dma-fence signalling critical section for the job's done_fence and if so, needs to be annotated as such? For example nouveau_uvma_region_new allocates memory, which is not allowed if in a dma_fence signalling critical section and the locking also looks suspicious? Thanks, Thomas + nouveau_uvmm_lock(uvmm); + list_for_each_op(op, &bind_job->ops) { + switch (op->op) { + case OP_ALLOC: { + bool sparse = op->flags & DRM_NOUVEAU_VM_BIND_SPARSE; + + ret = nouveau_uvma_region_new(uvmm, + op->va.addr, + op->va.range, + sparse); + if (ret) + goto out_unlock; + break; + } + case OP_FREE: + ret = nouveau_uvma_region_destroy(uvmm, + op->va.addr, + op->va.range); + if (ret) + goto out_unlock; + break; + case OP_MAP: + ret = nouveau_uvmm_sm_map(uvmm, + op->va.addr, op->va.range, + op->gem.obj, op->gem.offset, + op->flags && 0xff); + if (ret) + goto out_unlock; + break; + case OP_UNMAP: + ret = nouveau_uvmm_sm_unmap(uvmm, + op->va.addr, + op->va.range); + if (ret) + goto out_unlock; + break; + } + } + +out_unlock: + nouveau_uvmm_unlock(uvmm); + if (ret) + NV_PRINTK(err, job->cli, "bind job failed: %d\n", ret); + return ERR_PTR(ret); +} + +static void +nouveau_bind_job_free(struct nouveau_job *job) +{ + struct nouveau_bind_job *bind_job = to_nouveau_bind_job(job); + struct bind_job_op *op, *next; + + list_for_each_op_safe(op, next, &bind_job->ops) { + struct drm_gem_object *obj =
Re: [PATCH 05/16] drm/vmwgfx: Refactor resource validation hashtable to use linux/hashtable implementation.
Hi, Zack, On 10/19/22 18:32, Zack Rusin wrote: On Wed, 2022-10-19 at 12:21 +0200, Thomas Hellström (Intel) wrote: On 10/17/22 21:54, Zack Rusin wrote: From: Maaz Mombasawala Vmwgfx's hashtab implementation needs to be replaced with linux/hashtable to reduce the maintenance burden. As part of this effort, refactor the res_ht hashtable used for resource validation during execbuf execution to use the linux/hashtable implementation. This also refactors vmw_validation_context to use vmw_sw_context as the container for the hashtable, whereas before it used a vmwgfx_open_hash directly. This makes vmw_validation_context less generic, but there is no functional change since res_ht is the only instance where a validation context used a hashtable in the vmwgfx driver. Why is a pointer to the vmw_sw_context added to the validation context, rather than a pointer to the new hash table type itself? Seems like an unnecessary indirection which adds a dependency on the sw context to the validation context? Hi, Thomas. Thanks for taking a look at this! That's because the entire public interface of hashtable depends on it being a struct on which sizeof works rather than a pointer, i.e. almost the entire interface of hashtable.h is defined as macros, and they all require that HASH_SIZE(hashtable), which is defined as #define HASH_SIZE(hashtable) (ARRAY_SIZE(hashtable)), works on the hashtable. So the interface of hashtable.h can't operate on pointers, but rather needs the full struct. So the only two options are either rewriting the functions from linux/hashtable.h to take an explicit HASH_SIZE(hashtable), or making sure that in the functions where you use hashtable.h you pass the parent struct so that sizeof on the hashtable works correctly. I've seen both approaches in the kernel, but in this case we thought the latter made more sense. Ah, Ok, yes, tricky one. Lots of options, none of them perfect. Reviewed-by: Thomas Hellström
Re: [PATCH 05/16] drm/vmwgfx: Refactor resource validation hashtable to use linux/hashtable implementation.
On 10/17/22 21:54, Zack Rusin wrote: From: Maaz Mombasawala Vmwgfx's hashtab implementation needs to be replaced with linux/hashtable to reduce the maintenance burden. As part of this effort, refactor the res_ht hashtable used for resource validation during execbuf execution to use the linux/hashtable implementation. This also refactors vmw_validation_context to use vmw_sw_context as the container for the hashtable, whereas before it used a vmwgfx_open_hash directly. This makes vmw_validation_context less generic, but there is no functional change since res_ht is the only instance where a validation context used a hashtable in the vmwgfx driver. Why is a pointer to the vmw_sw_context added to the validation context, rather than a pointer to the new hash table type itself? Seems like an unnecessary indirection which adds a dependency on the sw context to the validation context? Signed-off-by: Maaz Mombasawala Signed-off-by: Zack Rusin --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c| 24 -- drivers/gpu/drm/vmwgfx/vmwgfx_drv.h| 5 +- drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c| 14 ++ drivers/gpu/drm/vmwgfx/vmwgfx_validation.c | 55 +++--- drivers/gpu/drm/vmwgfx/vmwgfx_validation.h | 26 +++--- 5 files changed, 58 insertions(+), 66 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index 13b90273eb77..8d77e79bd904 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -830,6 +830,22 @@ static void vmw_write_driver_id(struct vmw_private *dev) } } +static void vmw_sw_context_init(struct vmw_private *dev_priv) +{ + struct vmw_sw_context *sw_context = &dev_priv->ctx; + + hash_init(sw_context->res_ht); +} + +static void vmw_sw_context_fini(struct vmw_private *dev_priv) +{ + struct vmw_sw_context *sw_context = &dev_priv->ctx; + + vfree(sw_context->cmd_bounce); + if (sw_context->staged_bindings) + vmw_binding_state_free(sw_context->staged_bindings); +} + static int vmw_driver_load(struct vmw_private *dev_priv, u32 pci_id) {
int ret; @@ -839,6 +855,8 @@ static int vmw_driver_load(struct vmw_private *dev_priv, u32 pci_id) dev_priv->drm.dev_private = dev_priv; + vmw_sw_context_init(dev_priv); + mutex_init(&dev_priv->cmdbuf_mutex); mutex_init(&dev_priv->binding_mutex); spin_lock_init(&dev_priv->resource_lock); @@ -1168,9 +1186,7 @@ static void vmw_driver_unload(struct drm_device *dev) unregister_pm_notifier(&dev_priv->pm_nb); - if (dev_priv->ctx.res_ht_initialized) - vmwgfx_ht_remove(&dev_priv->ctx.res_ht); - vfree(dev_priv->ctx.cmd_bounce); + vmw_sw_context_fini(dev_priv); if (dev_priv->enable_fb) { vmw_fb_off(dev_priv); vmw_fb_close(dev_priv); @@ -1198,8 +1214,6 @@ static void vmw_driver_unload(struct drm_device *dev) vmw_irq_uninstall(&dev_priv->drm); ttm_object_device_release(&dev_priv->tdev); - if (dev_priv->ctx.staged_bindings) - vmw_binding_state_free(dev_priv->ctx.staged_bindings); for (i = vmw_res_context; i < vmw_res_max; ++i) idr_destroy(&dev_priv->res_idr[i]); diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h index 09e2d738aa87..d87aeedb78d0 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h @@ -30,6 +30,7 @@ #include #include +#include #include #include @@ -93,6 +94,7 @@ #define VMW_RES_STREAM ttm_driver_type2 #define VMW_RES_FENCE ttm_driver_type3 #define VMW_RES_SHADER ttm_driver_type4 +#define VMW_RES_HT_ORDER 12 #define MKSSTAT_CAPACITY_LOG2 5U #define MKSSTAT_CAPACITY (1U << MKSSTAT_CAPACITY_LOG2) @@ -425,8 +427,7 @@ struct vmw_ctx_validation_info; * @ctx: The validation context */ struct vmw_sw_context{ - struct vmwgfx_open_hash res_ht; - bool res_ht_initialized; + DECLARE_HASHTABLE(res_ht, VMW_RES_HT_ORDER); bool kernel; struct vmw_fpriv *fp; struct drm_file *filp; diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c index f085dbd4736d..c943ab801ca7 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c @@ -1,7 +1,7 @@ //
SPDX-License-Identifier: GPL-2.0 OR MIT /** * - * Copyright 2009 - 2015 VMware, Inc., Palo Alto, CA., USA + * Copyright 2009 - 2022 VMware, Inc., Palo Alto, CA., USA * * Permission is hereby granted, free of charge, to any person obtaining a * copy of this software and associated documentation files (the @@ -25,6 +25,7 @@ * **/ #include +#include #include "vmwgfx_drv.h" #include "vmwgfx_reg.h" @@ -34,7 +35,6 @@ #include
Re: Balancing ttm_mem_io_reserve() and ttm_mem_io_free()
Hi, Christian, On 8/19/22 10:52, Christian König wrote: Hi Thomas, IIRC we intentionally dropped that approach of balancing ttm_mem_io_reserve()/ttm_mem_io_free(). Instead the results from ttm_mem_io_reserve() are cached and that cached information is freed by ttm_mem_io_free(). In other words, every time we need to make sure we have the cache filled we call ttm_mem_io_reserve(), and every time we are about to free the resource or don't need the mapping any more we call ttm_mem_io_free(). The callbacks to io_mem_reserve() and io_mem_free() are then balanced. Hmm, yes, in the end at resource destroy, anything reserved would indeed have been freed, but consider the following: ttm_bo_vm_fault(); ttm_bo_vmap(); ttm_bo_vunmap(); ttm_bo_unmap_virtual(); Here, wouldn't we release the io_reservation on ttm_bo_vunmap() when it should really have been released on ttm_bo_unmap_virtual()? Fixing missing _free() calls in the error path is probably a good idea, but I wouldn't go beyond that. Why should any of that be racy? You need to hold the reservation lock to call any of those functions. The problem arises when a ttm_resource has been detached from a bo; combined with an ongoing async memcpy, we then no longer have a bo reservation to protect with. Now the async memcpy currently only exists in i915 and we might at some point be able to get rid of it, but it illustrates the problem. Thanks, Thomas Regards, Christian. Am 19.08.22 um 10:13 schrieb Thomas Hellström: Hi Christian, I'm looking for a way to take some sort of reference across possible VRAM accesses over the PCI bar, be it for runtime PM, workarounds or whatever. The ttm_mem_io_reserve/free pair seems like a good candidate, but is completely unbalanced and looks racy. In particular, error paths forget to call ttm_mem_io_free(). Would you have any objections if I took a look at attempting to balance calls to those functions, or do you have any other suggestions for a better method? Thanks, Thomas
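One possible shape of the balanced scheme Thomas is proposing would be a per-resource use count, so that each reserve must be paired with exactly one free. This is purely illustrative — the field and helper names below are hypothetical and do not exist in the TTM structures:

```c
/* Illustrative sketch only: pair every example_io_reserve() with one
 * example_io_free(), and fire the backend io_mem_reserve()/io_mem_free()
 * callbacks only on the 0 -> 1 and 1 -> 0 transitions. This would let
 * callers like vm_fault/vmap hold an io reference across PCI BAR access
 * independently of each other. 'io_use_count' is a hypothetical field. */
static int example_io_reserve(struct ttm_device *bdev,
			      struct ttm_resource *res)
{
	if (res->io_use_count++ == 0)
		return bdev->funcs->io_mem_reserve(bdev, res);
	return 0;
}

static void example_io_free(struct ttm_device *bdev,
			    struct ttm_resource *res)
{
	if (--res->io_use_count == 0)
		bdev->funcs->io_mem_free(bdev, res);
}
```

The open question in the thread is what protects the counter once the resource is detached from the bo (and thus from the reservation lock).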
Re: [PATCH v7 2/2] drm/gem: Don't map imported GEMs
On 6/30/22 22:22, Dmitry Osipenko wrote: Hello Thomas, On 6/30/22 23:15, Thomas Hellström (Intel) wrote: Hi, Dmitry, On 6/30/22 22:04, Dmitry Osipenko wrote: Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers don't handle imported dma-bufs properly, which results in mapping of something else than the imported dma-buf. On NVIDIA Tegra we get a hard lockup when userspace writes to the memory mapping of a dma-buf that was imported into Tegra's DRM GEM. The majority of DRM drivers prohibit mapping of imported GEM objects. Mapping of imported GEMs requires special care from userspace, since it should sync the dma-buf because the mapping coherency of the exporter device may not match the DRM device. Let's prohibit the mapping for all DRM drivers for consistency. Cc: sta...@vger.kernel.org Suggested-by: Thomas Hellström Signed-off-by: Dmitry Osipenko This might break drivers whose obj->funcs->mmap() callback already handles this case, and has userspace that does the right thing. The drm-shmem helper should be the only one that maps imported GEMs properly, but drivers that use drm-shmem already prohibit mapping imported GEMs. Okay, I'll try to re-check once again to be sure. OK. If you aren't 100.1% sure, then please drop the Cc: stable tag and let the patch ride out at least an -rc series, because breaking a stable kernel is something we wouldn't want to do. Thanks, Thomas I think the disabling must be checked on a per-driver basis to avoid that; in particular drivers that already call dma_buf_mmap() should be able to continue doing this. Also the Fixes: review comment remains, This should've been broken forever; I don't think that we have a fixes tag here. I looked through all my previous patches and added the Fixes wherever possible. If I really missed something, then please let me know.
Re: [PATCH v7 2/2] drm/gem: Don't map imported GEMs
Hi, Dmitry, On 6/30/22 22:04, Dmitry Osipenko wrote: Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers don't handle imported dma-bufs properly, which results in mapping of something else than the imported dma-buf. On NVIDIA Tegra we get a hard lockup when userspace writes to the memory mapping of a dma-buf that was imported into Tegra's DRM GEM. The majority of DRM drivers prohibit mapping of imported GEM objects. Mapping of imported GEMs requires special care from userspace, since it should sync the dma-buf because the mapping coherency of the exporter device may not match the DRM device. Let's prohibit the mapping for all DRM drivers for consistency. Cc: sta...@vger.kernel.org Suggested-by: Thomas Hellström Signed-off-by: Dmitry Osipenko This might break drivers whose obj->funcs->mmap() callback already handles this case, and has userspace that does the right thing. I think the disabling must be checked on a per-driver basis to avoid that; in particular drivers that already call dma_buf_mmap() should be able to continue doing this. Also the Fixes: review comment remains, thanks, Thomas --- drivers/gpu/drm/drm_gem.c | 4 drivers/gpu/drm/drm_gem_shmem_helper.c | 9 - 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index 86d670c71286..fc9ec42fa0ab 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -1034,6 +1034,10 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size, { int ret; + /* Don't allow imported objects to be mapped */ + if (obj->import_attach) + return -EINVAL; + /* Check for valid size.
*/ if (obj_size < vma->vm_end - vma->vm_start) return -EINVAL; diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index 8ad0e02991ca..6190f5018986 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -609,17 +609,8 @@ EXPORT_SYMBOL_GPL(drm_gem_shmem_vm_ops); */ int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct vm_area_struct *vma) { - struct drm_gem_object *obj = &shmem->base; int ret; - if (obj->import_attach) { - /* Drop the reference drm_gem_mmap_obj() acquired.*/ - drm_gem_object_put(obj); - vma->vm_private_data = NULL; - - return dma_buf_mmap(obj->dma_buf, vma, 0); - } - ret = drm_gem_shmem_get_pages(shmem); if (ret) { drm_gem_vm_close(vma);
Re: [PATCH v6 02/22] drm/gem: Move mapping of imported dma-bufs to drm_gem_mmap_obj()
On 6/29/22 10:22, Dmitry Osipenko wrote: On 6/29/22 09:40, Thomas Hellström (Intel) wrote: On 5/27/22 01:50, Dmitry Osipenko wrote: Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers don't handle imported dma-bufs properly, which results in mapping of something else than the imported dma-buf. For example, on NVIDIA Tegra we get a hard lockup when userspace writes to the memory mapping of a dma-buf that was imported into Tegra's DRM GEM. To fix this bug, move mapping of imported dma-bufs to drm_gem_mmap_obj(). Now mmapping of imported dma-bufs works properly for all DRM drivers. Same comment about Fixes: as in patch 1, Cc: sta...@vger.kernel.org Signed-off-by: Dmitry Osipenko --- drivers/gpu/drm/drm_gem.c | 3 +++ drivers/gpu/drm/drm_gem_shmem_helper.c | 9 - drivers/gpu/drm/tegra/gem.c | 4 3 files changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index 86d670c71286..7c0b025508e4 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -1038,6 +1038,9 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size, if (obj_size < vma->vm_end - vma->vm_start) return -EINVAL; + if (obj->import_attach) + return dma_buf_mmap(obj->dma_buf, vma, 0); If we start enabling mmapping of imported dma-bufs on a majority of drivers in this way, how do we ensure that user-space is not blindly using the object mmap without calling the needed DMA_BUF_IOCTL_SYNC, which is needed before and after cpu access of mmap'ed dma-bufs? I was under the impression (admittedly without looking) that the few drivers that actually called into dma_buf_mmap() had some private user-mode driver code in place that ensured this happened. Since it's userspace that does the mapping, it should be the responsibility of userspace to do all the necessary syncing. Sure, but nothing prohibits user-space from ignoring the syncing, thinking "It works anyway", testing those drivers where the syncing is a NOP.
And when a driver that finally needs syncing is tested it's too late to fix all broken user-space. I'm not sure whether anyone in userspace really needs to map imported dma-bufs in practice. Nevertheless, this use-case is broken and should be fixed by either allowing to do the mapping or prohibiting it. Then I'd vote for prohibiting it, at least for now. And for the future moving forward we could perhaps revisit the dma-buf need for syncing, requiring those drivers that actually need it to implement emulated coherent memory which can be done not too inefficiently (vmwgfx being one example). /Thomas
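For context, the bracketing userspace is expected to perform around CPU access of an mmap'ed dma-buf uses DMA_BUF_IOCTL_SYNC from the dma-buf uapi. A sketch (assumes `dmabuf_fd` is a valid exported dma-buf fd; error handling abbreviated):

```c
#include <linux/dma-buf.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Bracket CPU access to an mmap'ed dma-buf with DMA_BUF_IOCTL_SYNC,
 * as the dma-buf uapi requires. Skipping this only happens to work on
 * exporters whose mappings are CPU-coherent. */
static int write_pattern(int dmabuf_fd, size_t size)
{
	struct dma_buf_sync sync = { 0 };
	void *map;

	map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		   dmabuf_fd, 0);
	if (map == MAP_FAILED)
		return -1;

	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync)) {
		munmap(map, size);
		return -1;
	}

	memset(map, 0xaa, size);	/* the actual CPU access */

	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	munmap(map, size);
	return 0;
}
```

The point in the thread is exactly that nothing forces userspace to call this; it merely "happens to work" on coherent exporters when omitted.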
Re: [PATCH v6 08/22] drm/virtio: Unlock reservations on dma_resv_reserve_fences() error
On 5/27/22 01:50, Dmitry Osipenko wrote: Unlock reservations on dma_resv_reserve_fences() error to fix recursive locking of the reservations when this error happens. Fixes: Cc: sta...@vger.kernel.org With that fixed, Reviewed-by: Thomas Hellström Signed-off-by: Dmitry Osipenko --- drivers/gpu/drm/virtio/virtgpu_gem.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c index 580a78809836..7db48d17ee3a 100644 --- a/drivers/gpu/drm/virtio/virtgpu_gem.c +++ b/drivers/gpu/drm/virtio/virtgpu_gem.c @@ -228,8 +228,10 @@ int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs) for (i = 0; i < objs->nents; ++i) { ret = dma_resv_reserve_fences(objs->objs[i]->resv, 1); - if (ret) + if (ret) { + virtio_gpu_array_unlock_resv(objs); return ret; + } } return ret; }
Re: [PATCH v6 02/22] drm/gem: Move mapping of imported dma-bufs to drm_gem_mmap_obj()
On 5/27/22 01:50, Dmitry Osipenko wrote: Drivers that use drm_gem_mmap() and drm_gem_mmap_obj() helpers don't handle imported dma-bufs properly, which results in mapping of something else than the imported dma-buf. For example, on NVIDIA Tegra we get a hard lockup when userspace writes to the memory mapping of a dma-buf that was imported into Tegra's DRM GEM. To fix this bug, move mapping of imported dma-bufs to drm_gem_mmap_obj(). Now mmapping of imported dma-bufs works properly for all DRM drivers. Same comment about Fixes: as in patch 1, Cc: sta...@vger.kernel.org Signed-off-by: Dmitry Osipenko --- drivers/gpu/drm/drm_gem.c | 3 +++ drivers/gpu/drm/drm_gem_shmem_helper.c | 9 - drivers/gpu/drm/tegra/gem.c| 4 3 files changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index 86d670c71286..7c0b025508e4 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -1038,6 +1038,9 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size, if (obj_size < vma->vm_end - vma->vm_start) return -EINVAL; + if (obj->import_attach) + return dma_buf_mmap(obj->dma_buf, vma, 0); If we start enabling mmapping of imported dma-bufs on a majority of drivers in this way, how do we ensure that user-space is not blindly using the object mmap without calling the needed DMA_BUF_IOCTL_SYNC, which is needed before and after cpu access of mmap'ed dma-bufs? I was under the impression (admittedly without looking) that the few drivers that actually called into dma_buf_mmap() had some private user-mode driver code in place that ensured this happened. /Thomas + /* Take a ref for this mapping of the object, so that the fault * handler can dereference the mmap offset's pointer to the object.
* This reference is cleaned up by the corresponding vm_close diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c index 8ad0e02991ca..6190f5018986 100644 --- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -609,17 +609,8 @@ EXPORT_SYMBOL_GPL(drm_gem_shmem_vm_ops); */ int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct vm_area_struct *vma) { - struct drm_gem_object *obj = &shmem->base; int ret; - if (obj->import_attach) { - /* Drop the reference drm_gem_mmap_obj() acquired.*/ - drm_gem_object_put(obj); - vma->vm_private_data = NULL; - - return dma_buf_mmap(obj->dma_buf, vma, 0); - } - ret = drm_gem_shmem_get_pages(shmem); if (ret) { drm_gem_vm_close(vma); diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c index 7c7dd84e6db8..f92aa20d63bb 100644 --- a/drivers/gpu/drm/tegra/gem.c +++ b/drivers/gpu/drm/tegra/gem.c @@ -564,6 +564,10 @@ int __tegra_gem_mmap(struct drm_gem_object *gem, struct vm_area_struct *vma) { struct tegra_bo *bo = to_tegra_bo(gem); + /* imported dma-buf is mapped by drm_gem_mmap_obj() */ + if (gem->import_attach) + return 0; + if (!bo->pages) { unsigned long vm_pgoff = vma->vm_pgoff; int err;
Re: [Linaro-mm-sig] [PATCH] drm/i915: Remove __dma_fence_is_chain()
On 6/29/22 01:35, Rob Clark wrote: From: Rob Clark drive-by cleanup Signed-off-by: Rob Clark Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/gem/i915_gem_wait.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c index 319936f91ac5..667841780514 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c @@ -73,11 +73,6 @@ static void fence_set_priority(struct dma_fence *fence, rcu_read_unlock(); } -static inline bool __dma_fence_is_chain(const struct dma_fence *fence) -{ - return fence->ops == &dma_fence_chain_ops; -} - void i915_gem_fence_wait_priority(struct dma_fence *fence, const struct i915_sched_attr *attr) { @@ -93,7 +88,7 @@ void i915_gem_fence_wait_priority(struct dma_fence *fence, for (i = 0; i < array->num_fences; i++) fence_set_priority(array->fences[i], attr); - } else if (__dma_fence_is_chain(fence)) { + } else if (dma_fence_is_chain(fence)) { struct dma_fence *iter; /* The chain is ordered; if we boost the last, we boost all */
Re: [PATCH v6 14/22] dma-buf: Introduce new locking convention
On 5/30/22 15:57, Dmitry Osipenko wrote: On 5/30/22 16:41, Christian König wrote: Hi Dmitry, Am 30.05.22 um 15:26 schrieb Dmitry Osipenko: Hello Christian, On 5/30/22 09:50, Christian König wrote: Hi Dmitry, First of all please separate out this patch from the rest of the series, since this is a complex separate structural change. I assume all the patches will go via the DRM tree in the end since the rest of the DRM patches in this series depend on this dma-buf change. But I see that separation may ease reviewing of the dma-buf changes, so let's try it. That sounds like you are underestimating a bit how much trouble this will be. I have tried this before and failed because catching all the locks in the right code paths are very tricky. So expect some fallout from this and make sure the kernel test robot and CI systems are clean. Sure, I'll fix up all the reported things in the next iteration. BTW, have you ever posted yours version of the patch? Will be great if we could compare the changed code paths. No, I never even finished creating it after realizing how much work it would be. This patch introduces new locking convention for dma-buf users. From now on all dma-buf importers are responsible for holding dma-buf reservation lock around operations performed over dma-bufs. This patch implements the new dma-buf locking convention by: 1. Making dma-buf API functions to take the reservation lock. 2. Adding new locked variants of the dma-buf API functions for drivers that need to manage imported dma-bufs under the held lock. Instead of adding new locked variants please mark all variants which expect to be called without a lock with an _unlocked postfix. This should make it easier to remove those in a follow up patch set and then fully move the locking into the importer. Do we really want to move all the locks to the importers? Seems the majority of drivers should be happy with the dma-buf helpers handling the locking for them. Yes, I clearly think so. 3. 
Converting all drivers to the new locking scheme. I have strong doubts that you got all of them. At least radeon and nouveau should grab the reservation lock in their ->attach callbacks somehow. Radeon and Nouveau use gem_prime_import_sg_table() and they take the resv lock already, seems they should be okay (?) You are looking at the wrong side. You need to fix the export code path, not the import ones. See for example attach on radeon works like this drm_gem_map_attach->drm_gem_pin->radeon_gem_prime_pin->radeon_bo_reserve->ttm_bo_reserve->dma_resv_lock. Yeah, I was looking at both sides, but missed this one. Also i915 will run into trouble with attach. In particular since i915 starts a full ww transaction in its attach callback to be able to lock other objects if migration is needed. I think i915 CI would catch this in a selftest. Perhaps it's worthwhile to take a step back and figure out, if the importer is required to lock, which callbacks might need a ww acquire context? (And off-topic: since we do a lot of fancy stuff under dma-resv locks, including waiting for fences and other locks, IMO taking these locks uninterruptibly should ring a warning bell) /Thomas Same for nouveau and probably a few other exporters as well. That will certainly cause a deadlock if you don't fix it. I strongly suggest to do this step by step, first attach/detach and then the rest. Thank you very much for the suggestions. I'll implement them in the next version.
Re: [PATCH v6 01/22] drm/gem: Properly annotate WW context on drm_gem_lock_reservations() error
Hi, On 5/27/22 01:50, Dmitry Osipenko wrote: Use ww_acquire_fini() in the error code paths. Otherwise lockdep thinks that lock is held when lock's memory is freed after the drm_gem_lock_reservations() error. The WW needs to be annotated as "freed" s /WW/ww_acquire_context/ ? s /"freed"/"released"/ ? , which fixes the noisy "WARNING: held lock freed!" splat of VirtIO-GPU driver with CONFIG_DEBUG_MUTEXES=y and enabled lockdep. Cc: sta...@vger.kernel.org Can you dig up the commit in error and add a Fixes: Tag? Using that and "dim fixes" will also make the Cc: stable tag a bit more verbose. With that fixed, Reviewed-by: Thomas Hellström Signed-off-by: Dmitry Osipenko --- drivers/gpu/drm/drm_gem.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index eb0c2d041f13..86d670c71286 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -1226,7 +1226,7 @@ drm_gem_lock_reservations(struct drm_gem_object **objs, int count, ret = dma_resv_lock_slow_interruptible(obj->resv, acquire_ctx); if (ret) { - ww_acquire_done(acquire_ctx); + ww_acquire_fini(acquire_ctx); return ret; } } @@ -1251,7 +1251,7 @@ drm_gem_lock_reservations(struct drm_gem_object **objs, int count, goto retry; } - ww_acquire_done(acquire_ctx); + ww_acquire_fini(acquire_ctx); return ret; } }
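The distinction the fix relies on: ww_acquire_done() only marks the end of the acquire phase (the locks stay held), while ww_acquire_fini() releases the lockdep annotation and must be reached before the context's memory goes away. A simplified lifecycle sketch — EDEADLK backoff handling omitted for brevity, and this is not the actual drm_gem helper code:

```c
/* Simplified ww_acquire_ctx lifecycle: error paths must reach
 * ww_acquire_fini() before the on-stack context goes out of scope,
 * otherwise lockdep warns "WARNING: held lock freed!". */
static int example_lock_two(struct ww_mutex *a, struct ww_mutex *b)
{
	struct ww_acquire_ctx ctx;
	int ret;

	ww_acquire_init(&ctx, &reservation_ww_class);

	ret = ww_mutex_lock_interruptible(a, &ctx);
	if (ret)
		goto out_fini;			/* fini, not done */
	ret = ww_mutex_lock_interruptible(b, &ctx);
	if (ret) {
		ww_mutex_unlock(a);
		goto out_fini;
	}
	ww_acquire_done(&ctx);	/* acquire phase over; locks still held */

	/* ... use the locked objects ... */

	ww_mutex_unlock(b);
	ww_mutex_unlock(a);
out_fini:
	ww_acquire_fini(&ctx);	/* lockdep annotation released */
	return ret;
}
```

Calling ww_acquire_done() on an error path, as the buggy code did, leaves lockdep believing the acquire context is still live when its stack memory is reused.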
Re: [Intel-gfx] [PATCH 06/10] drm/i915/uapi: add NEEDS_CPU_ACCESS hint
On 5/25/22 20:43, Matthew Auld wrote: If set, force the allocation to be placed in the mappable portion of I915_MEMORY_CLASS_DEVICE. One big restriction here is that system memory (i.e I915_MEMORY_CLASS_SYSTEM) must be given as a potential placement for the object, that way we can always spill the object into system memory if we can't make space. Testcase: igt@gem-create@create-ext-cpu-access-sanity-check Testcase: igt@gem-create@create-ext-cpu-access-big Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin --- drivers/gpu/drm/i915/gem/i915_gem_create.c | 26 ++--- include/uapi/drm/i915_drm.h| 61 +++--- 2 files changed, 71 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c b/drivers/gpu/drm/i915/gem/i915_gem_create.c index d094cae0ddf1..33673fe7ee0a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c @@ -241,6 +241,7 @@ struct create_ext { struct drm_i915_private *i915; struct intel_memory_region *placements[INTEL_REGION_UNKNOWN]; unsigned int n_placements; + unsigned int placement_mask; unsigned long flags; }; @@ -337,6 +338,7 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args, for (i = 0; i < args->num_regions; i++) ext_data->placements[i] = placements[i]; + ext_data->placement_mask = mask; return 0; out_dump: @@ -411,7 +413,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data, struct drm_i915_gem_object *obj; int ret; - if (args->flags) + if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) return -EINVAL; ret = i915_user_extensions(u64_to_user_ptr(args->extensions), @@ -427,13 +429,21 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void *data, ext_data.n_placements = 1; } - /* -* TODO: add a userspace hint to force CPU_ACCESS for the object, which -* can override this. 
-*/ - if (ext_data.n_placements > 1 || - ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM) - ext_data.flags |= I915_BO_ALLOC_GPU_ONLY; + if (args->flags & I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) { + if (ext_data.n_placements == 1) + return -EINVAL; + + /* +* We always need to be able to spill to system memory, if we +* can't place in the mappable part of LMEM. +*/ + if (!(ext_data.placement_mask & BIT(INTEL_REGION_SMEM))) + return -EINVAL; + } else { + if (ext_data.n_placements > 1 || + ext_data.placements[0]->type != INTEL_MEMORY_SYSTEM) + ext_data.flags |= I915_BO_ALLOC_GPU_ONLY; + } obj = __i915_gem_object_create_user_ext(i915, args->size, ext_data.placements, diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e30f31a440b3..5b0a10e6a1b8 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -3366,11 +3366,11 @@ struct drm_i915_query_memory_regions { * struct drm_i915_gem_create_ext - Existing gem_create behaviour, with added * extension support using struct i915_user_extension. * - * Note that in the future we want to have our buffer flags here, at least for - * the stuff that is immutable. Previously we would have two ioctls, one to - * create the object with gem_create, and another to apply various parameters, - * however this creates some ambiguity for the params which are considered - * immutable. Also in general we're phasing out the various SET/GET ioctls. + * Note that new buffer flags should be added here, at least for the stuff that + * is immutable. Previously we would have two ioctls, one to create the object + * with gem_create, and another to apply various parameters, however this + * creates some ambiguity for the params which are considered immutable. Also in + * general we're phasing out the various SET/GET ioctls. */ struct drm_i915_gem_create_ext { /** @@ -3378,7 +3378,6 @@ struct drm_i915_gem_create_ext { * * The (page-aligned) allocated size for the object will be returned. 
* -* * DG2 64K min page size implications: * * On discrete platforms, starting from DG2, we have to contend with GTT @@ -3390,7 +3389,9 @@ struct drm_i915_gem_create_ext { * * Note that the returned size here will always reflect any required * rounding up done by the kernel, i.e 4K will now become 64K on devices -* such as DG2. +* such as DG2. The kernel will always select the
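The placement rules this hunk enforces are easy to capture as a pure function: unknown flags are rejected, and a NEEDS_CPU_ACCESS object must list more than one placement with system memory among them so it can always spill. A minimal sketch with local stand-in constants (the real flag and region bit values live in the uapi headers, not here):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins; the numeric values are arbitrary for this sketch,
 * not the kernel's actual I915_GEM_CREATE_EXT_FLAG_* / BIT(INTEL_REGION_*)
 * values. */
#define NEEDS_CPU_ACCESS (1u << 0)
#define BIT_SMEM         (1u << 0)

/* Returns 0 on success, -22 (-EINVAL) on rejection, mirroring the hunk. */
static int check_create_ext_flags(uint32_t flags, unsigned int n_placements,
                                  uint32_t placement_mask)
{
	if (flags & ~NEEDS_CPU_ACCESS)
		return -22;
	if (!(flags & NEEDS_CPU_ACCESS))
		return 0;
	/* A CPU-access object needs somewhere to spill: more than one
	 * placement, and system memory must be among them. */
	if (n_placements == 1)
		return -22;
	if (!(placement_mask & BIT_SMEM))
		return -22;
	return 0;
}
```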
Re: [Intel-gfx] [PATCH 07/10] drm/i915/error: skip non-mappable pages
On 5/25/22 20:43, Matthew Auld wrote: Skip capturing any lmem pages that can't be copied using the CPU. This is now only best effort on platforms that have small BAR. Testcase: igt@gem-exec-capture@capture-invisible Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Tvrtko Ursulin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin --- drivers/gpu/drm/i915/i915_gpu_error.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 0512c66fa4f3..77df899123c2 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1116,11 +1116,15 @@ i915_vma_coredump_create(const struct intel_gt *gt, dma_addr_t dma; for_each_sgt_daddr(dma, iter, vma_res->bi.pages) { + dma_addr_t offset = dma - mem->region.start; void __iomem *s; - s = io_mapping_map_wc(>iomap, - dma - mem->region.start, - PAGE_SIZE); + if (offset + PAGE_SIZE > mem->io_size) { + ret = -EINVAL; + break; + } + + s = io_mapping_map_wc(>iomap, offset, PAGE_SIZE); ret = compress_page(compress, (void __force *)s, dst, true); Reviewed-by: Thomas Hellström
Re: [Intel-gfx] [PATCH 08/10] drm/i915/uapi: disable capturing objects on recoverable contexts
On 5/25/22 20:43, Matthew Auld wrote: A non-recoverable context must be used if the user wants proper error capture on discrete platforms. In the future the kernel may want to blit the contents of some objects when later doing the capture stage. Testcase: igt@gem_exec_capture@capture-recoverable-discrete Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Tvrtko Ursulin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index b279588c0672..e27ccfa50dc3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1961,7 +1961,7 @@ eb_find_first_request_added(struct i915_execbuffer *eb) #if IS_ENABLED(CONFIG_DRM_I915_CAPTURE_ERROR) /* Stage with GFP_KERNEL allocations before we enter the signaling critical path */ -static void eb_capture_stage(struct i915_execbuffer *eb) +static int eb_capture_stage(struct i915_execbuffer *eb) { const unsigned int count = eb->buffer_count; unsigned int i = count, j; @@ -1974,6 +1974,10 @@ static void eb_capture_stage(struct i915_execbuffer *eb) if (!(flags & EXEC_OBJECT_CAPTURE)) continue; + if (i915_gem_context_is_recoverable(eb->gem_context) && + IS_DGFX(eb->i915)) + return -EINVAL; + We should also require this for future integrated, for capture buffer memory allocation purposes. Otherwise Reviewed-by: Thomas Hellström
Re: [Intel-gfx] [PATCH 01/10] drm/doc: add rfc section for small BAR uapi
On 5/25/22 20:43, Matthew Auld wrote: Add an entry for the new uapi needed for small BAR on DG2+. v2: - Some spelling fixes and other small tweaks. (Akeem & Thomas) - Rework error capture interactions, including no longer needing NEEDS_CPU_ACCESS for objects marked for capture. (Thomas) - Add probed_cpu_visible_size. (Lionel) v3: - Drop the vma query for now. - Add unallocated_cpu_visible_size as part of the region query. - Improve the docs some more, including documenting the expected behaviour on older kernels, since this came up in some offline discussion. v4: - Various improvements all over. (Tvrtko) Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Lionel Landwerlin Cc: Tvrtko Ursulin Cc: Jon Bloomfield Cc: Daniel Vetter Cc: Jordan Justen Cc: Kenneth Graunke Cc: Akeem G Abodunrin Cc: mesa-...@lists.freedesktop.org Acked-by: Tvrtko Ursulin Acked-by: Akeem G Abodunrin --- Documentation/gpu/rfc/i915_small_bar.h | 189 +++ Documentation/gpu/rfc/i915_small_bar.rst | 47 ++ Documentation/gpu/rfc/index.rst | 4 + 3 files changed, 240 insertions(+) create mode 100644 Documentation/gpu/rfc/i915_small_bar.h create mode 100644 Documentation/gpu/rfc/i915_small_bar.rst diff --git a/Documentation/gpu/rfc/i915_small_bar.h b/Documentation/gpu/rfc/i915_small_bar.h new file mode 100644 index ..752bb2ceb399 --- /dev/null +++ b/Documentation/gpu/rfc/i915_small_bar.h @@ -0,0 +1,189 @@ +/** + * struct __drm_i915_memory_region_info - Describes one region as known to the + * driver. + * + * Note this is using both struct drm_i915_query_item and struct drm_i915_query. + * For this new query we are adding the new query id DRM_I915_QUERY_MEMORY_REGIONS + * at _i915_query_item.query_id. 
+ */ +struct __drm_i915_memory_region_info { + /** @region: The class:instance pair encoding */ + struct drm_i915_gem_memory_class_instance region; + + /** @rsvd0: MBZ */ + __u32 rsvd0; + + /** +* @probed_size: Memory probed by the driver (-1 = unknown) +* +* Note that it should not be possible to ever encounter a zero value +* here, also note that no current region type will ever return -1 here. +* Although for future region types, this might be a possibility. The +* same applies to the other size fields. +*/ + __u64 probed_size; + + /** +* @unallocated_size: Estimate of memory remaining (-1 = unknown) +* +* Requires CAP_PERFMON or CAP_SYS_ADMIN to get reliable accounting. +* Without this (or if this is an older kernel) the value here will +* always equal the @probed_size. Note this is only currently tracked +* for I915_MEMORY_CLASS_DEVICE regions (for other types the value here +* will always equal the @probed_size). +*/ + __u64 unallocated_size; + + union { + /** @rsvd1: MBZ */ + __u64 rsvd1[8]; + struct { + /** +* @probed_cpu_visible_size: Memory probed by the driver +* that is CPU accessible. (-1 = unknown). +* +* This will be always be <= @probed_size, and the +* remainder (if there is any) will not be CPU +* accessible. +* +* On systems without small BAR, the @probed_size will +* always equal the @probed_cpu_visible_size, since all +* of it will be CPU accessible. +* +* Note this is only tracked for +* I915_MEMORY_CLASS_DEVICE regions (for other types the +* value here will always equal the @probed_size). +* +* Note that if the value returned here is zero, then +* this must be an old kernel which lacks the relevant +* small-bar uAPI support (including +* I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS), but on +* such systems we should never actually end up with a +* small BAR configuration, assuming we are able to load +* the kernel module. Hence it should be safe to treat +* this the same as when @probed_cpu_visible_size == +* @probed_size. 
+*/ + __u64 probed_cpu_visible_size; + + /** +* @unallocated_cpu_visible_size: Estimate of CPU +* visible memory remaining (-1 = unknown). +* +* Note this is only tracked for +* I915_MEMORY_CLASS_DEVICE
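How userspace might consume @probed_cpu_visible_size, including the documented old-kernel case where the field reads back as zero, can be sketched as a tiny helper. This is a hedged illustration of the doc's semantics, not actual query code from any library:

```c
#include <assert.h>
#include <stdint.h>

/* Per the uapi doc above: a zero @probed_cpu_visible_size means an old
 * kernel without the small-BAR uAPI, which is safe to treat the same as
 * a fully CPU-visible region (== @probed_size). Otherwise the field is
 * authoritative and is always <= @probed_size. */
static uint64_t effective_cpu_visible(uint64_t probed_size,
                                      uint64_t probed_cpu_visible_size)
{
	if (probed_cpu_visible_size == 0)
		return probed_size;
	return probed_cpu_visible_size;
}
```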
Re: [PATCH 1/2] drm/vmwgfx: Remove unused hugepage support
On 4/25/22 22:31, Zack Rusin wrote: From: Zack Rusin There's no point in explicitly trying to align virtual memory to facilitate huge page table entries or huge page memory in buffer objects given that they're not being used. Transparent hugepages support for vram allocations has been gradually retired over the last two years making alignment of unmapped areas unneeded and pointless. Signed-off-by: Zack Rusin For the series: Reviewed-by: Thomas Hellström --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 13 - 1 file changed, 13 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index 5dc02fd806db..45028e25d490 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -1423,18 +1423,6 @@ static void vmw_debugfs_resource_managers_init(struct vmw_private *vmw) root, "system_mob_ttm"); } -static unsigned long -vmw_get_unmapped_area(struct file *file, unsigned long uaddr, - unsigned long len, unsigned long pgoff, - unsigned long flags) -{ - struct drm_file *file_priv = file->private_data; - struct vmw_private *dev_priv = vmw_priv(file_priv->minor->dev); - - return drm_get_unmapped_area(file, uaddr, len, pgoff, flags, -dev_priv->drm.vma_offset_manager); -} - static int vmwgfx_pm_notifier(struct notifier_block *nb, unsigned long val, void *ptr) { @@ -1601,7 +1589,6 @@ static const struct file_operations vmwgfx_driver_fops = { .compat_ioctl = vmw_compat_ioctl, #endif .llseek = noop_llseek, - .get_unmapped_area = vmw_get_unmapped_area, }; static const struct drm_driver driver = {
Re: [Intel-gfx] [PATCH 2/4] drm/i915/selftests: Check for incomplete LRI from the context image
On 3/14/22 19:20, Ramalingam C wrote: From: Chris Wilson In order to keep the context image parser simple, we assume that all commands follow a similar format. A few, especially not MI commands on the render engines, have fixed lengths not encoded in a length field. This caused us to incorrectly skip over 3D state commands, and start interpretting context data as instructions. Eventually, as Daniele interpreting discovered, this would lead us to find addition LRI as part of the data and mistakenly add invalid LRI commands to the context probes. Stop parsing after we see the first !MI command, as we know we will have seen all the context registers by that point. (Mostly true for all gen so far, though the render context does have LRI after the first page that we have been ignoring so far. It would be useful to extract those as well so that we have the full list of user accesisble registers.) accessible Similarly, emit a warning if we do try to emit an invalid zero-length LRI. Reported-by: Daniele Ceraolo Spurio Signed-off-by: Chris Wilson Cc: Daniele Ceraolo Spurio Signed-off-by: Ramalingam C Acked-by: Thomas Hellström --- drivers/gpu/drm/i915/gt/selftest_lrc.c | 61 +++--- 1 file changed, 54 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 13f57c7c4224..0a8ed4246082 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -27,6 +27,9 @@ #define NUM_GPR 16 #define NUM_GPR_DW (NUM_GPR * 2) /* each GPR is 2 dwords */ +#define LRI_HEADER MI_INSTR(0x22, 0) +#define LRI_LENGTH_MASK GENMASK(7, 0) + static struct i915_vma *create_scratch(struct intel_gt *gt) { return __vm_create_scratch_for_read_pinned(>ggtt->vm, PAGE_SIZE); @@ -180,7 +183,7 @@ static int live_lrc_layout(void *arg) continue; } - if ((lri & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) { + if ((lri & GENMASK(31, 23)) != LRI_HEADER) { pr_err("%s: Expected LRI command at dword %d, found %08x\n", 
engine->name, dw, lri); err = -EINVAL; @@ -945,18 +948,40 @@ store_context(struct intel_context *ce, struct i915_vma *scratch) hw = defaults; hw += LRC_STATE_OFFSET / sizeof(*hw); do { - u32 len = hw[dw] & 0x7f; + u32 len = hw[dw] & LRI_LENGTH_MASK; + + /* +* Keep it simple, skip parsing complex commands +* +* At present, there are no more MI_LOAD_REGISTER_IMM +* commands after the first 3D state command. Rather +* than include a table (see i915_cmd_parser.c) of all +* the possible commands and their instruction lengths +* (or mask for variable length instructions), assume +* we have gathered the complete list of registers and +* bail out. +*/ + if ((hw[dw] >> INSTR_CLIENT_SHIFT) != INSTR_MI_CLIENT) + break; if (hw[dw] == 0) { dw++; continue; } - if ((hw[dw] & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) { + if ((hw[dw] & GENMASK(31, 23)) != LRI_HEADER) { + /* Assume all other MI commands match LRI length mask */ dw += len + 2; continue; } + if (!len) { + pr_err("%s: invalid LRI found in context image\n", + ce->engine->name); + igt_hexdump(defaults, PAGE_SIZE); + break; + } + dw++; len = (len + 1) / 2; while (len--) { @@ -1108,18 +1133,29 @@ static struct i915_vma *load_context(struct intel_context *ce, u32 poison) hw = defaults; hw += LRC_STATE_OFFSET / sizeof(*hw); do { - u32 len = hw[dw] & 0x7f; + u32 len = hw[dw] & LRI_LENGTH_MASK; + + /* For simplicity, break parsing at the first complex command */ + if ((hw[dw] >> INSTR_CLIENT_SHIFT) != INSTR_MI_CLIENT) + break; if (hw[dw] == 0) { dw++; continue; } - if ((hw[dw] & GENMASK(31, 23)) != MI_INSTR(0x22, 0)) { + if ((hw[dw] & GENMASK(31, 23)) != LRI_HEADER) { dw += len + 2; continue; } + if (!len) { + pr_err("%s: invalid LRI found in context image\n", + ce->engine->name); + igt_hexdump(defaults, PAGE_SIZE); +
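The parsing rules the patch settles on (stop at the first non-MI command, recognize the LRI header through GENMASK(31, 23), and treat a zero-length LRI as invalid) can be sketched as a small classifier. The masks mirror the patch; the helper names and enum are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define GENMASK32(h, l)    (((~0u) >> (31 - (h))) & ((~0u) << (l)))
#define LRI_HEADER         (0x22u << 23)   /* MI_INSTR(0x22, 0) */
#define LRI_LENGTH_MASK    GENMASK32(7, 0)
#define INSTR_CLIENT_SHIFT 29
#define INSTR_MI_CLIENT    0u

enum ctx_cmd { CMD_NOOP, CMD_LRI, CMD_OTHER_MI, CMD_NON_MI };

/* Length field as the selftest reads it (only meaningful for MI commands). */
static uint32_t lri_len(uint32_t dw)
{
	return dw & LRI_LENGTH_MASK;
}

static enum ctx_cmd classify(uint32_t dw)
{
	if ((dw >> INSTR_CLIENT_SHIFT) != INSTR_MI_CLIENT)
		return CMD_NON_MI;  /* first non-MI command: stop parsing */
	if (dw == 0)
		return CMD_NOOP;
	if ((dw & GENMASK32(31, 23)) == LRI_HEADER)
		return CMD_LRI;
	return CMD_OTHER_MI;
}
```

A zero-length LRI (header with an all-zero length field) still classifies as CMD_LRI here; the patch's extra check is that `lri_len()` of such a dword is 0, which it reports as a corrupt context image.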
Re: [Intel-gfx] [PATCH v9 4/9] drm/i915/gt: Pass the -EINVAL when emit_pte doesn't update any PTE
On 4/5/22 17:08, Ramalingam C wrote: When emit_pte() doesn't update any PTE and returns 0, interpret it as -EINVAL. v2: Add missing goto [Thomas] Signed-off-by: Ramalingam C --- drivers/gpu/drm/i915/gt/intel_migrate.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index e0f1c727662e..6378d4450e1a 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -577,7 +577,11 @@ intel_context_migrate_copy(struct intel_context *ce, len = emit_pte(rq, _src, src_cache_level, src_is_lmem, src_offset, CHUNK_SZ); if (len <= 0) { err = len ? len : -EINVAL; goto out_rq; } ? Up to you. Reviewed-by: Thomas Hellström - if (len <= 0) { + if (!len) { + err = -EINVAL; + goto out_rq; + } + if (len < 0) { err = len; goto out_rq; }
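The two error-handling shapes under discussion (the reviewer's one-liner versus the patch's two branches) are equivalent for every len <= 0, which is the only range the surrounding `if` reaches. A quick sanity sketch, with invented names and -22 standing in for -EINVAL:

```c
#include <assert.h>

#define SKETCH_EINVAL (-22)

/* The reviewer's suggested one-liner. */
static int err_oneliner(int len)
{
	return len ? len : SKETCH_EINVAL;
}

/* The patch's two-branch version; len < 0 propagates unchanged. */
static int err_twobranch(int len)
{
	if (!len)
		return SKETCH_EINVAL;
	return len;
}
```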
Re: [Intel-gfx] [PATCH v7 5/9] drm/i915/selftest_migrate: Consider the possible roundup of size
On 3/28/22 21:07, Ramalingam C wrote: Consider the possible round-up of the object size due to alignment to min_page_size during object allocation. Signed-off-by: Ramalingam C Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/gt/selftest_migrate.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/selftest_migrate.c b/drivers/gpu/drm/i915/gt/selftest_migrate.c index c9c4f391c5cc..b5da8b8cd039 100644 --- a/drivers/gpu/drm/i915/gt/selftest_migrate.c +++ b/drivers/gpu/drm/i915/gt/selftest_migrate.c @@ -152,6 +152,9 @@ static int clear(struct intel_migrate *migrate, if (IS_ERR(obj)) return 0; + /* Consider the rounded up memory too */ + sz = obj->base.size; + for_i915_gem_ww(, err, true) { err = i915_gem_object_lock(obj, ); if (err)
Re: [Intel-gfx] [PATCH v5 6/9] drm/i915/gt: offset handling for multiple copy engines
On 3/21/22 23:44, Ramalingam C wrote: Handle the src and dst chunk offsets for different instances of the copy engines. Signed-off-by: Ramalingam C --- drivers/gpu/drm/i915/gt/intel_migrate.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 39a5f8ae664d..5f6341f91622 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -614,6 +614,9 @@ static int emit_copy(struct i915_request *rq, u32 instance = rq->engine->instance; u32 *cs; + src_offset += (u64)rq->engine->instance << 32; + dst_offset += (u64)rq->engine->instance << 32; + Again, these are nops since the offsets are 32-bit. Also the instance selection is already handled in the function, so I think this patch can be dropped. cs = intel_ring_begin(rq, ver >= 8 ? 10 : 6); if (IS_ERR(cs)) return PTR_ERR(cs);
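The review's point, that adding `(u64)instance << 32` to a 32-bit offset is a no-op, is easy to demonstrate in isolation: the shifted bits vanish in the implicit conversion back to u32. A standalone sketch (illustrative helper, not driver code):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors "src_offset += (u64)rq->engine->instance << 32" where
 * src_offset is a u32: the addition happens in 64 bits, but the result
 * is truncated to 32 bits on assignment, so nothing changes. */
static uint32_t add_instance_bits(uint32_t offset, uint32_t instance)
{
	offset += (uint64_t)instance << 32;
	return offset;
}
```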
Re: [Intel-gfx] [PATCH v5 3/9] drm/i915/gt: Clear compress metadata for Flat-ccs objects
Hi, Ram On 3/21/22 23:44, Ramalingam C wrote: Xe-HP and latest devices support Flat CCS which reserved a portion of the device memory to store compression metadata, during the clearing of device memory buffer object we also need to clear the associated CCS buffer. XY_CTRL_SURF_COPY_BLT is a BLT cmd used for reading and writing the ccs surface of a lmem memory. So on Flat-CCS capable platform we use XY_CTRL_SURF_COPY_BLT to clear the CCS meta data. v2: Fixed issues with platform naming [Lucas] v3: Rebased [Ram] Used the round_up funcs [Bob] v4: Fixed ccs blk calculation [Ram] Added Kdoc on flat-ccs. v5: GENMASK is used [Matt] mocs fix [Matt] Comments Fix [Matt] Flush address programming [Ram] v6: FLUSH_DW is fixed Few coding style fix v7: Adopting the XY_FAST_COLOR_BLT (Thomas] v8: XY_CTRL_SURF_COPY_BLT for ccs clearing. v9: emit_copy_ccs is used. Signed-off-by: Ramalingam C Signed-off-by: Ayaz A Siddiqui --- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 15 ++ drivers/gpu/drm/i915/gt/intel_migrate.c | 164 ++- 2 files changed, 175 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h index 925e55b6a94f..6b4eb7927ec7 100644 --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h @@ -153,8 +153,10 @@ #define MI_FLUSH_DW_PROTECTED_MEM_EN(1 << 22) #define MI_FLUSH_DW_STORE_INDEX (1<<21) #define MI_INVALIDATE_TLB (1<<18) +#define MI_FLUSH_DW_CCS (1<<16) #define MI_FLUSH_DW_OP_STOREDW (1<<14) #define MI_FLUSH_DW_OP_MASK (3<<14) +#define MI_FLUSH_DW_LLC (1<<9) #define MI_FLUSH_DW_NOTIFY (1<<8) #define MI_INVALIDATE_BSD (1<<7) #define MI_FLUSH_DW_USE_GTT (1<<2) @@ -203,6 +205,19 @@ #define GFX_OP_DRAWRECT_INFO ((0x3<<29)|(0x1d<<24)|(0x80<<16)|(0x3)) #define GFX_OP_DRAWRECT_INFO_I965 ((0x7900<<16)|0x2) +#define XY_CTRL_SURF_INSTR_SIZE 5 +#define MI_FLUSH_DW_SIZE 3 +#define XY_CTRL_SURF_COPY_BLT ((2 << 29) | (0x48 << 22) | 3) +#define 
SRC_ACCESS_TYPE_SHIFT21 +#define DST_ACCESS_TYPE_SHIFT20 +#define CCS_SIZE_MASKGENMASK(17, 8) +#define XY_CTRL_SURF_MOCS_MASK GENMASK(31, 25) +#define NUM_CCS_BYTES_PER_BLOCK 256 +#define NUM_BYTES_PER_CCS_BYTE 256 +#define NUM_CCS_BLKS_PER_XFER1024 +#define INDIRECT_ACCESS 0 +#define DIRECT_ACCESS1 + #define COLOR_BLT_CMD (2 << 29 | 0x40 << 22 | (5 - 2)) #define XY_COLOR_BLT_CMD (2 << 29 | 0x50 << 22) #define XY_FAST_COLOR_BLT_CMD (2 << 29 | 0x44 << 22) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index b656685a486d..39a5f8ae664d 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -16,7 +16,8 @@ struct insert_pte_data { }; #define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */ - +#define GET_CCS_BYTES(i915, size) (HAS_FLAT_CCS(i915) ? \ +DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0) static bool engine_supports_migration(struct intel_engine_cs *engine) { if (!engine) @@ -467,6 +468,145 @@ static bool wa_1209644611_applies(int ver, u32 size) return height % 4 == 3 && height <= 8; } +/** + * DOC: Flat-CCS - Memory compression for Local memory + * + * On Xe-HP and later devices, we use dedicated compression control state (CCS) + * stored in local memory for each surface, to support the 3D and media + * compression formats. + * + * The memory required for the CCS of the entire local memory is 1/256 of the + * local memory size. So before the kernel boot, the required memory is reserved + * for the CCS data and a secure register will be programmed with the CCS base + * address. + * + * Flat CCS data needs to be cleared when a lmem object is allocated. + * And CCS data can be copied in and out of CCS region through + * XY_CTRL_SURF_COPY_BLT. CPU can't access the CCS data directly. 
+ * + * When we exhaust the lmem, if the object's placements support smem, then we can + * directly decompress the compressed lmem object into smem and start using it + * from smem itself. + * + * But when we need to swapout the compressed lmem object into a smem region + * though objects' placement doesn't support smem, then we copy the lmem content + * as it is into smem region along with ccs data (using XY_CTRL_SURF_COPY_BLT). + * When the object is referred, lmem content will be swaped in along with + * restoration of the CCS data (using XY_CTRL_SURF_COPY_BLT) at corresponding + * location. + */ + +static inline u32 *i915_flush_dw(u32 *cmd, u32 flags) +{ + *cmd++ = MI_FLUSH_DW |
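The 1/256 sizing rule described in the Flat-CCS DOC comment corresponds to the patch's GET_CCS_BYTES() macro: one CCS byte covers NUM_BYTES_PER_CCS_BYTE (256) bytes of local memory, and platforms without flat CCS need none. A standalone sketch with DIV_ROUND_UP reimplemented locally:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BYTES_PER_CCS_BYTE 256
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Mirrors GET_CCS_BYTES(i915, size) from the patch, with the
 * HAS_FLAT_CCS() check reduced to a plain flag for illustration. */
static uint64_t get_ccs_bytes(int has_flat_ccs, uint64_t size)
{
	return has_flat_ccs ? DIV_ROUND_UP(size, NUM_BYTES_PER_CCS_BYTE) : 0;
}
```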
Re: [Intel-gfx] [PATCH v5 2/9] drm/i915/gt: Optimize the migration and clear loop
On 3/21/22 23:44, Ramalingam C wrote: Move the static calculations out of the loops for copy and clear. Signed-off-by: Ramalingam C Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/gt/intel_migrate.c | 44 - 1 file changed, 21 insertions(+), 23 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 73199ebf0671..b656685a486d 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -526,6 +526,7 @@ intel_context_migrate_copy(struct intel_context *ce, struct i915_request **out) { struct sgt_dma it_src = sg_sgt(src), it_dst = sg_sgt(dst); + u32 src_offset, dst_offset; struct i915_request *rq; int err; @@ -534,8 +535,20 @@ intel_context_migrate_copy(struct intel_context *ce, GEM_BUG_ON(ce->ring->size < SZ_64K); + src_offset = 0; + dst_offset = CHUNK_SZ; + if (HAS_64K_PAGES(ce->engine->i915)) { + GEM_BUG_ON(!src_is_lmem && !dst_is_lmem); + + src_offset = 0; + dst_offset = 0; + if (src_is_lmem) + src_offset = CHUNK_SZ; + if (dst_is_lmem) + dst_offset = 2 * CHUNK_SZ; + } + do { - u32 src_offset, dst_offset; int len; rq = i915_request_create(ce); @@ -563,19 +576,6 @@ intel_context_migrate_copy(struct intel_context *ce, if (err) goto out_rq; - src_offset = 0; - dst_offset = CHUNK_SZ; - if (HAS_64K_PAGES(ce->engine->i915)) { - GEM_BUG_ON(!src_is_lmem && !dst_is_lmem); - - src_offset = 0; - dst_offset = 0; - if (src_is_lmem) - src_offset = CHUNK_SZ; - if (dst_is_lmem) - dst_offset = 2 * CHUNK_SZ; - } - len = emit_pte(rq, _src, src_cache_level, src_is_lmem, src_offset, CHUNK_SZ); if (len <= 0) { @@ -585,12 +585,10 @@ intel_context_migrate_copy(struct intel_context *ce, err = emit_pte(rq, _dst, dst_cache_level, dst_is_lmem, dst_offset, len); - if (err < 0) - goto out_rq; - if (err < len) { + if (err < len) err = -EINVAL; + if (err < 0) goto out_rq; - } err = rq->engine->emit_flush(rq, EMIT_INVALIDATE); if (err) @@ -694,6 +692,7 @@ intel_context_migrate_clear(struct 
intel_context *ce, { struct sgt_dma it = sg_sgt(sg); struct i915_request *rq; + u32 offset; int err; GEM_BUG_ON(ce->vm != ce->engine->gt->migrate.context->vm); @@ -701,8 +700,11 @@ intel_context_migrate_clear(struct intel_context *ce, GEM_BUG_ON(ce->ring->size < SZ_64K); + offset = 0; + if (HAS_64K_PAGES(ce->engine->i915) && is_lmem) + offset = CHUNK_SZ; + do { - u32 offset; int len; rq = i915_request_create(ce); @@ -730,10 +732,6 @@ intel_context_migrate_clear(struct intel_context *ce, if (err) goto out_rq; - offset = 0; - if (HAS_64K_PAGES(ce->engine->i915) && is_lmem) - offset = CHUNK_SZ; - len = emit_pte(rq, , cache_level, is_lmem, offset, CHUNK_SZ); if (len <= 0) { err = len;
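The hoisted offset selection is loop-invariant, which is the whole point of the patch: on 64K-page platforms the source window sits at CHUNK_SZ when the source is lmem and the destination window at 2 * CHUNK_SZ when the destination is lmem, otherwise at 0 / CHUNK_SZ. Sketched as pure helpers (illustrative names; CHUNK_SZ as in the driver):

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SZ (8u << 20) /* SZ_8M, as in intel_migrate.c */

static uint32_t copy_src_offset(int has_64k_pages, int src_is_lmem)
{
	if (!has_64k_pages)
		return 0;
	return src_is_lmem ? CHUNK_SZ : 0;
}

static uint32_t copy_dst_offset(int has_64k_pages, int dst_is_lmem)
{
	if (!has_64k_pages)
		return CHUNK_SZ;
	return dst_is_lmem ? 2 * CHUNK_SZ : 0;
}
```

Since neither result depends on anything computed inside the request loop, evaluating them once before `do { ... } while` is behavior-preserving.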
Re: [Intel-gfx] [PATCH v3 2/3] drm/i915/gem: Remove logic for wbinvd_on_all_cpus
Hi, Michael, others. This is the response from Linus last time we copied that already pre-existing wbinvd_on_all_cpus() macro to another place in the driver: https://lists.freedesktop.org/archives/dri-devel/2021-November/330928.html My first interpretation of this is that even if there currently are similar patterns in drm_cache.c, we shouldn't introduce more, encouraging other drivers to use incoherent IO. Other than that I think we should move whatever wbinvds we can over to the ranged versions, unless that is proven to be a performance drop. Finally for any wbinvds left in our driver, ensure that they are never executed for any gpu where we provide full coherency. That is all discrete gpus (and to be discussed integrated gpus moving forward). Might be that drm maintainers want to chime in here with other views. Thanks, Thomas On 3/15/22 17:59, Michael Cheng wrote: +Daniel for additional feedback! On 2022-03-14 4:06 p.m., Michael Cheng wrote: On 2022-03-08 10:58 a.m., Lucas De Marchi wrote: On Tue, Feb 22, 2022 at 08:24:31PM +0100, Thomas Hellström (Intel) wrote: Hi, Michael, On 2/22/22 18:26, Michael Cheng wrote: This patch removes logic for wbinvd_on_all_cpus and brings in drm_cache.h. This header has the logic that outputs a warning when wbinvd_on_all_cpus when its being used on a non-x86 platform. Signed-off-by: Michael Cheng Linus has been pretty clear that he won't accept patches that add macros that works on one arch and warns on others anymore in i915 and I figure even less so in drm code. So we shouldn't try to move this out to drm. Instead we should restrict the wbinvd() inside our driver to integrated and X86 only. For discrete on all architectures we should be coherent and hence not be needing wbinvd(). the warn is there to guarantee we don't forget a code path. However simply adding the warning is the real issue: we should rather guarantee we can't take that code path. 
I.e., as you said refactor the code to guarantee it works on discrete without that logic. $ git grep wbinvd_on_all_cpus -- drivers/gpu/drm/ drivers/gpu/drm/drm_cache.c: if (wbinvd_on_all_cpus()) drivers/gpu/drm/drm_cache.c: if (wbinvd_on_all_cpus()) drivers/gpu/drm/drm_cache.c: if (wbinvd_on_all_cpus()) drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c: * Currently we just do a heavy handed wbinvd_on_all_cpus() here since drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c: wbinvd_on_all_cpus(); It looks like we actually go through this on other discrete graphics. Is this missing an update like s/IS_DG1/IS_DGFX/? Or should we be doing something else? drivers/gpu/drm/i915/gem/i915_gem_pm.c:#define wbinvd_on_all_cpus() \ drivers/gpu/drm/i915/gem/i915_gem_pm.c: wbinvd_on_all_cpus(); Those are for suspend. Revert ac05a22cd07a ("drm/i915/gem: Almagamate clflushes on suspend") or extract that part to a helper function and implement it differently for arches != x86? drivers/gpu/drm/i915/gem/i915_gem_pm.c: wbinvd_on_all_cpus(); Probably take a similar approach to the suspend case? drivers/gpu/drm/i915/gt/intel_ggtt.c: wbinvd_on_all_cpus(); For a helper function, I have a #define for all non x86 architecture that gives a warn on [1] within drm_cache.h Or would it be better to implement a helper function instead? [1]. https://patchwork.freedesktop.org/patch/475750/?series=1=5 This one comes from 64b95df91f44 ("drm/i915: Assume exclusive access to objects inside resume") Shouldn't that be doing the invalidate if the write domain is I915_GEM_DOMAIN_CPU In the end I think the warning would be ok if it was the cherry on top, to guarantee we don't take those paths. We should probably have a warn_once() to avoid spamming the console. But we also have to rework the code to guarantee we are the only ones who may eventually get that warning, and not the end user. Could we first add the helper function/#define for now, and rework the code in a different patch series? Lucas De Marchi Thanks, /Thomas
Re: [Intel-gfx] [PATCH] drm/i915/gtt: reduce overzealous alignment constraints for GGTT
On 3/3/22 11:02, Matthew Auld wrote: Currently this will enforce both 2M alignment and padding for any LMEM pages inserted into the GGTT. However, this was only meant to be applied to the compact-pt layout with the ppGTT. For the GGTT we can reduce the alignment and padding to 64K. Bspec: 45015 Fixes: 87bd701ee268 ("drm/i915: enforce min GTT alignment for discrete cards") Signed-off-by: Matthew Auld Cc: Thomas Hellström Cc: Robert Beckett Cc: Ramalingam C Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/gt/intel_gtt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 4bcdfcab3642..a5f5b2dda332 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -234,7 +234,8 @@ void i915_address_space_init(struct i915_address_space *vm, int subclass) memset64(vm->min_alignment, I915_GTT_MIN_ALIGNMENT, ARRAY_SIZE(vm->min_alignment)); - if (HAS_64K_PAGES(vm->i915) && NEEDS_COMPACT_PT(vm->i915)) { + if (HAS_64K_PAGES(vm->i915) && NEEDS_COMPACT_PT(vm->i915) && + subclass == VM_CLASS_PPGTT) { vm->min_alignment[INTEL_MEMORY_LOCAL] = I915_GTT_PAGE_SIZE_2M; vm->min_alignment[INTEL_MEMORY_STOLEN_LOCAL] = I915_GTT_PAGE_SIZE_2M; } else if (HAS_64K_PAGES(vm->i915)) {
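The alignment selection after this fix can be summarized as a small decision function. A sketch only: the 4K fallback is an assumption about I915_GTT_MIN_ALIGNMENT, not something stated in the patch:

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K  4096u
#define SZ_64K (64u << 10)
#define SZ_2M  (2u << 20)

/* After the fix, the 2M constraint applies only to the compact-pt ppGTT;
 * the GGTT on 64K-page platforms drops back to 64K. */
static uint32_t lmem_min_alignment(int has_64k_pages, int needs_compact_pt,
                                   int is_ppgtt)
{
	if (has_64k_pages && needs_compact_pt && is_ppgtt)
		return SZ_2M;
	if (has_64k_pages)
		return SZ_64K;
	return SZ_4K; /* assumed I915_GTT_MIN_ALIGNMENT */
}
```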
Re: [Intel-gfx] [PATCH v3 2/3] drm/i915/gem: Remove logic for wbinvd_on_all_cpus
Hi, Michael, On 2/22/22 18:26, Michael Cheng wrote: This patch removes logic for wbinvd_on_all_cpus and brings in drm_cache.h. This header has the logic that outputs a warning when wbinvd_on_all_cpus when its being used on a non-x86 platform. Signed-off-by: Michael Cheng Linus has been pretty clear that he won't accept patches that add macros that works on one arch and warns on others anymore in i915 and I figure even less so in drm code. So we shouldn't try to move this out to drm. Instead we should restrict the wbinvd() inside our driver to integrated and X86 only. For discrete on all architectures we should be coherent and hence not be needing wbinvd(). Thanks, /Thomas
Re: [Intel-gfx] [RFC 1/2] drm/i915/ttm: Add extra pages for handling ccs data
Hi, Ram, On 2/7/22 10:37, Ramalingam C wrote: While evicting the local memory data on flat-ccs capable platform we need to evict the ccs data associated to the data. For this, we are adding extra pages ((size / 256) >> PAGE_SIZE) into the ttm_tt. To achieve this we are adding a new param into the ttm_tt_init as ccs_pages_needed, which will be added into the ttm_tt->num_pages. Please use imperative form above, Instead of "We are adding..", use "Add" Signed-off-by: Ramalingam C Suggested-by: Thomas Hellstorm Hellstorm instead of Hellstrom might scare people off. :) --- drivers/gpu/drm/drm_gem_vram_helper.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 23 +- drivers/gpu/drm/qxl/qxl_ttm.c | 2 +- drivers/gpu/drm/ttm/ttm_agp_backend.c | 2 +- drivers/gpu/drm/ttm/ttm_tt.c | 12 ++- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 2 +- include/drm/ttm/ttm_tt.h | 4 +++- 7 files changed, 36 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c index 3f00192215d1..eef1f4dc7232 100644 --- a/drivers/gpu/drm/drm_gem_vram_helper.c +++ b/drivers/gpu/drm/drm_gem_vram_helper.c @@ -864,7 +864,7 @@ static struct ttm_tt *bo_driver_ttm_tt_create(struct ttm_buffer_object *bo, if (!tt) return NULL; - ret = ttm_tt_init(tt, bo, page_flags, ttm_cached); + ret = ttm_tt_init(tt, bo, page_flags, ttm_cached, 0); if (ret < 0) goto err_ttm_tt_init; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 84cae740b4a5..bb71aa6d66c0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -16,6 +16,7 @@ #include "gem/i915_gem_ttm.h" #include "gem/i915_gem_ttm_move.h" #include "gem/i915_gem_ttm_pm.h" +#include "gt/intel_gpu_commands.h" #define I915_TTM_PRIO_PURGE 0 #define I915_TTM_PRIO_NO_PAGES 1 @@ -242,12 +243,27 @@ static const struct i915_refct_sgt_ops tt_rsgt_ops = { .release = i915_ttm_tt_release }; +static inline bool 
+i915_gem_object_has_lmem_placement(struct drm_i915_gem_object *obj) +{ + int i; + + for (i = 0; i < obj->mm.n_placements; i++) + if (obj->mm.placements[i]->type == INTEL_MEMORY_LOCAL) + return true; + + return false; +} + static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, uint32_t page_flags) { + struct drm_i915_private *i915 = container_of(bo->bdev, typeof(*i915), +bdev); struct ttm_resource_manager *man = ttm_manager_type(bo->bdev, bo->resource->mem_type); struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo); + unsigned long ccs_pages_needed = 0; enum ttm_caching caching; struct i915_ttm_tt *i915_tt; int ret; @@ -270,7 +286,12 @@ static struct ttm_tt *i915_ttm_tt_create(struct ttm_buffer_object *bo, i915_tt->is_shmem = true; } - ret = ttm_tt_init(_tt->ttm, bo, page_flags, caching); + if (HAS_FLAT_CCS(i915) && i915_gem_object_has_lmem_placement(obj)) + ccs_pages_needed = DIV_ROUND_UP(DIV_ROUND_UP(bo->base.size, + NUM_CCS_BYTES_PER_BLOCK), PAGE_SIZE); + + ret = ttm_tt_init(_tt->ttm, bo, page_flags, + caching, ccs_pages_needed); I'd suggest a patch that first adds the functionality to TTM, where even i915 passes in 0 here, and a follow-up patch for the i915 functionality where we add the ccs requirement. 
if (ret) goto err_free; diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c index b2e33d5ba5d0..52156b54498f 100644 --- a/drivers/gpu/drm/qxl/qxl_ttm.c +++ b/drivers/gpu/drm/qxl/qxl_ttm.c @@ -113,7 +113,7 @@ static struct ttm_tt *qxl_ttm_tt_create(struct ttm_buffer_object *bo, ttm = kzalloc(sizeof(struct ttm_tt), GFP_KERNEL); if (ttm == NULL) return NULL; - if (ttm_tt_init(ttm, bo, page_flags, ttm_cached)) { + if (ttm_tt_init(ttm, bo, page_flags, ttm_cached, 0)) { kfree(ttm); return NULL; } diff --git a/drivers/gpu/drm/ttm/ttm_agp_backend.c b/drivers/gpu/drm/ttm/ttm_agp_backend.c index 6ddc16f0fe2b..d27691f2e451 100644 --- a/drivers/gpu/drm/ttm/ttm_agp_backend.c +++ b/drivers/gpu/drm/ttm/ttm_agp_backend.c @@ -134,7 +134,7 @@ struct ttm_tt *ttm_agp_tt_create(struct ttm_buffer_object *bo, agp_be->mem = NULL; agp_be->bridge = bridge; - if (ttm_tt_init(_be->ttm, bo, page_flags, ttm_write_combined)) { + if (ttm_tt_init(_be->ttm, bo, page_flags, ttm_write_combined, 0)) { kfree(agp_be); return
Re: [Intel-gfx] [PATCH] drm/i915: Allow dead vm to unbind vma's without lock.
On 1/28/22 09:57, Maarten Lankhorst wrote: i915_gem_vm_close may take the lock, and we currently have no better way of handling this. At least for now, allow a path in which holding vm->mutex is sufficient. This is the case, because the object destroy path will forcefully take vm->mutex now. Signed-off-by: Maarten Lankhorst Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/i915/i915_vma.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index b959e904c4d3..14a301c4069f 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -40,6 +40,17 @@ #include "i915_vma.h" #include "i915_vma_resource.h" +static inline void assert_vma_held_evict(const struct i915_vma *vma) +{ + /* +* We may be forced to unbind when the vm is dead, to clean it up. +* This is the only exception to the requirement of the object lock +* being held. +*/ + if (atomic_read(>vm->open)) + assert_object_held_shared(vma->obj); +} + static struct kmem_cache *slab_vmas; static struct i915_vma *i915_vma_alloc(void) @@ -1779,7 +1790,7 @@ struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async) struct dma_fence *unbind_fence; GEM_BUG_ON(i915_vma_is_pinned(vma)); - assert_object_held_shared(vma->obj); + assert_vma_held_evict(vma); if (i915_vma_is_map_and_fenceable(vma)) { /* Force a pagefault for domain tracking on next user access */ @@ -1846,7 +1857,7 @@ int __i915_vma_unbind(struct i915_vma *vma) int ret; lockdep_assert_held(>vm->mutex); - assert_object_held_shared(vma->obj); + assert_vma_held_evict(vma); if (!drm_mm_node_allocated(>node)) return 0;
Re: [Intel-gfx] [PATCH v5 1/5] drm/i915: add needs_compact_pt flag
On 1/26/22 18:11, Robert Beckett wrote: On 26/01/2022 13:49, Thomas Hellström (Intel) wrote: On 1/25/22 20:35, Robert Beckett wrote: From: Ramalingam C Add a new platform flag, needs_compact_pt, to mark the requirement of compact pt layout support for the ppGTT when using 64K GTT pages. With this flag has_64k_pages will only indicate requirement of 64K GTT page sizes or larger for device local memory access. Suggested-by: Matthew Auld Signed-off-by: Ramalingam C Signed-off-by: Robert Beckett --- drivers/gpu/drm/i915/i915_drv.h | 10 +++--- drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 44c1f98144b4..1258b7779705 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, /* * Set this flag, when platform requires 64K GTT page sizes or larger for - * device local memory access. Also this flag implies that we require or - * at least support the compact PT layout for the ppGTT when using the 64K - * GTT pages. Why do we remove these comment lines? Because HAS_64K_PAGES now means just 64K page, it no longer means also requires compact pt. This is to support other products that will have 64K but not have the PDE non-sharing restriction in future. Those lines moved to the next change NEEDS_COMPACT_PT, which is now separate. Yes, NEEDS_COMPACT_PT indicates that compact is *required* but does "HAS_64K_PAGES" still mean compact is supported? That information is lost. /Thomas
Re: [Intel-gfx] [PATCH v5 2/5] drm/i915: enforce min GTT alignment for discrete cards
On 1/25/22 20:35, Robert Beckett wrote: From: Matthew Auld For local-memory objects we need to align the GTT addresses to 64K, both for the ppgtt and ggtt. We need to support vm->min_alignment > 4K, depending on the vm itself and the type of object we are inserting. With this in mind update the GTT selftests to take this into account. For compact-pt we further align and pad lmem object GTT addresses to 2MB to ensure PDEs contain consistent page sizes as required by the HW. v3: * use needs_compact_pt flag to discriminate between 64K and 64K with compact-pt * add i915_vm_obj_min_alignment * use i915_vm_obj_min_alignment to round up vma reservation if compact-pt instead of hard coding v5: * fix i915_vm_obj_min_alignment for internal objects which have no memory region Signed-off-by: Matthew Auld Signed-off-by: Ramalingam C Signed-off-by: Robert Beckett Cc: Joonas Lahtinen Cc: Rodrigo Vivi --- .../i915/gem/selftests/i915_gem_client_blt.c | 23 +++-- drivers/gpu/drm/i915/gt/intel_gtt.c | 12 +++ drivers/gpu/drm/i915/gt/intel_gtt.h | 18 drivers/gpu/drm/i915/i915_vma.c | 9 ++ drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 96 --- 5 files changed, 117 insertions(+), 41 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c index c8ff8bf0986d..f0bfce53258f 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_client_blt.c @@ -39,6 +39,7 @@ struct tiled_blits { struct blit_buffer scratch; struct i915_vma *batch; u64 hole; + u64 align; u32 width; u32 height; }; @@ -410,14 +411,21 @@ tiled_blits_create(struct intel_engine_cs *engine, struct rnd_state *prng) goto err_free; } - hole_size = 2 * PAGE_ALIGN(WIDTH * HEIGHT * 4); + t->align = I915_GTT_PAGE_SIZE_2M; /* XXX worst case, derive from vm! 
*/ + t->align = max(t->align, + i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_LOCAL)); + t->align = max(t->align, + i915_vm_min_alignment(t->ce->vm, INTEL_MEMORY_SYSTEM)); Don't we always end up with 2M here, regardless of the vm restrictions?
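The rounding the reviewer questions boils down to taking the max of the per-region minimum alignments against a hard-coded 2M worst case, then rounding offsets up to it. A minimal user-space sketch of that arithmetic (the 4K/64K/2M values are illustrative assumptions matching the patch description):

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K  0x1000ull
#define SZ_64K 0x10000ull
#define SZ_2M  0x200000ull

/* round_up() for power-of-two alignments, as in the kernel macro */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
	return (x + align - 1) & ~(align - 1);
}

/* Pick the effective alignment the way tiled_blits_create() does:
 * start from the 2M worst case and take the max over the vm's minimum
 * alignments for each memory type the buffer may be placed in. */
static uint64_t effective_align(uint64_t lmem_min, uint64_t smem_min)
{
	uint64_t align = SZ_2M;	/* XXX worst case, as in the selftest */

	if (lmem_min > align)
		align = lmem_min;
	if (smem_min > align)
		align = smem_min;
	return align;
}
```

Note that with any realistic minimum alignment (4K–2M) the result is always 2M, which is exactly the reviewer's point.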
Re: [Intel-gfx] [PATCH v5 5/5] drm/i915/uapi: document behaviour for DG2 64K support
On 1/25/22 20:35, Robert Beckett wrote: From: Matthew Auld On discrete platforms like DG2, we need to support a minimum page size of 64K when dealing with device local-memory. This is quite tricky for various reasons, so try to document the new implicit uapi for this. v3: fix typos and less emphasis v2: Fixed suggestions on formatting [Daniel] Signed-off-by: Matthew Auld Signed-off-by: Ramalingam C Signed-off-by: Robert Beckett Acked-by: Jordan Justen Reviewed-by: Ramalingam C cc: Simon Ser cc: Pekka Paalanen Cc: Jordan Justen Cc: Kenneth Graunke Cc: mesa-...@lists.freedesktop.org Cc: Tony Ye Cc: Slawomir Milczarek --- Reviewed-by: Thomas Hellström
Re: [Intel-gfx] [PATCH v5 4/5] drm/i915: add gtt misalignment test
On 1/25/22 20:35, Robert Beckett wrote: add test to check handling of misaligned offsets and sizes v4: * remove spurious blank lines * explicitly cast intel_region_id to intel_memory_type in misaligned_pin Reported-by: kernel test robot Signed-off-by: Robert Beckett --- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 128 ++ 1 file changed, 128 insertions(+) diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index b80788a2b7f9..f082b5ff3b5e 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -22,10 +22,12 @@ * */ +#include "gt/intel_gtt.h" #include #include #include "gem/i915_gem_context.h" +#include "gem/i915_gem_region.h" #include "gem/selftests/mock_context.h" #include "gt/intel_context.h" #include "gt/intel_gpu_commands.h" @@ -1067,6 +1069,120 @@ static int shrink_boom(struct i915_address_space *vm, return err; } +static int misaligned_case(struct i915_address_space *vm, struct intel_memory_region *mr, + u64 addr, u64 size, unsigned long flags) +{ + struct drm_i915_gem_object *obj; + struct i915_vma *vma; + int err = 0; + u64 expected_vma_size, expected_node_size; + + obj = i915_gem_object_create_region(mr, size, 0, 0); + if (IS_ERR(obj)) + return PTR_ERR(obj); + + vma = i915_vma_instance(obj, vm, NULL); + if (IS_ERR(vma)) { + err = PTR_ERR(vma); + goto err_put; + } + + err = i915_vma_pin(vma, 0, 0, addr | flags); + if (err) + goto err_put; + i915_vma_unpin(vma); + + if (!drm_mm_node_allocated(>node)) { + err = -EINVAL; + goto err_put; + } + + if (i915_vma_misplaced(vma, 0, 0, addr | flags)) { + err = -EINVAL; + goto err_put; + } + + expected_vma_size = round_up(size, 1 << (ffs(vma->resource->page_sizes_gtt) - 1)); + expected_node_size = expected_vma_size; + + if (IS_DG2(vm->i915) && i915_gem_object_is_lmem(obj)) { + /* dg2 should expand lmem node to 2MB */ Should this test be NEEDS_COMPACT_PT()? Otherwise LGTM. Reviewed-by: Thomas Hellström
Re: [Intel-gfx] [PATCH v5 1/5] drm/i915: add needs_compact_pt flag
On 1/25/22 20:35, Robert Beckett wrote: From: Ramalingam C Add a new platform flag, needs_compact_pt, to mark the requirement of compact pt layout support for the ppGTT when using 64K GTT pages. With this flag has_64k_pages will only indicate requirement of 64K GTT page sizes or larger for device local memory access. Suggested-by: Matthew Auld Signed-off-by: Ramalingam C Signed-off-by: Robert Beckett --- drivers/gpu/drm/i915/i915_drv.h | 10 +++--- drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 44c1f98144b4..1258b7779705 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1512,12 +1512,16 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, /* * Set this flag, when platform requires 64K GTT page sizes or larger for - * device local memory access. Also this flag implies that we require or - * at least support the compact PT layout for the ppGTT when using the 64K - * GTT pages. Why do we remove these comment lines? + * device local memory access. */ #define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages) +/* Set this flag when platform doesn't allow both 64k pages and 4k pages in First line of multi-line comments should be empty. + * the same PT. this flag means we need to support compact PT layout for the + * ppGTT when using the 64K GTT pages. 
+ */ +#define NEEDS_COMPACT_PT(dev_priv) (INTEL_INFO(dev_priv)->needs_compact_pt) + #define HAS_IPC(dev_priv) (INTEL_INFO(dev_priv)->display.has_ipc) #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i)) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 4081fd50ba9d..799b56569ef5 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1028,6 +1028,7 @@ static const struct intel_device_info xehpsdv_info = { PLATFORM(INTEL_XEHPSDV), .display = { }, .has_64k_pages = 1, + .needs_compact_pt = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | BIT(VECS2) | BIT(VECS3) | @@ -1045,6 +1046,7 @@ static const struct intel_device_info dg2_info = { .media.rel = 55, PLATFORM(INTEL_DG2), .has_64k_pages = 1, + .needs_compact_pt = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index 3699b1c539ea..c8aaf646430c 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ -130,6 +130,7 @@ enum intel_ppgtt_type { /* Keep has_* in alphabetical order */ \ func(has_64bit_reloc); \ func(has_64k_pages); \ + func(needs_compact_pt); \ func(gpu_reset_clobbers_display); \ func(has_reset_engine); \ func(has_global_mocs); \
Re: [Intel-gfx] [PATCH 06/11] dma-buf: warn about containers in dma_resv object
On 1/24/22 14:03, Christian König wrote: Drivers should not add containers as shared fences to the dma_resv object, instead each fence should be added individually. Signed-off-by: Christian König Reviewed-by: Daniel Vetter Reviewed-by: Thomas Hellström Is there any indication that this triggers on existing drivers? Thomas
Re: [Intel-gfx] [PATCH] drm/i915: Lock dpt_obj around set_cache_level.
On 1/24/22 14:02, Maarten Lankhorst wrote: set_cache_level may unbind the object, which will result in the below lockdep splat: <6> [184.578145] [IGT] kms_addfb_basic: starting subtest addfb25-framebuffer-vs-set-tiling <4> [184.578220] [ cut here ] <4> [184.578221] WARN_ON(debug_locks && !(lock_is_held(&(&((obj)->base.resv)->lock.base)->dep_map) != 0)) <4> [184.578237] WARNING: CPU: 6 PID: 5544 at drivers/gpu/drm/i915/i915_gem.c:123 i915_gem_object_unbind+0x4a9/0x510 [i915] <4> [184.578323] Modules linked in: vgem drm_shmem_helper snd_hda_codec_hdmi i915 mei_hdcp x86_pkg_temp_thermal snd_hda_intel coretemp crct10dif_pclmul snd_intel_dspcfg crc32_pclmul ttm snd_hda_codec ghash_clmulni_intel snd_hwdep drm_kms_helper snd_hda_core e1000e mei_me syscopyarea ptp snd_pcm sysfillrect mei pps_core sysimgblt fb_sys_fops prime_numbers intel_lpss_pci smsc75xx usbnet mii <4> [184.578349] CPU: 6 PID: 5544 Comm: kms_addfb_basic Not tainted 5.16.0-CI-Patchwork_22006+ #1 <4> [184.578351] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS ADLPFWI1.R00.2422.A00.2110131104 10/13/2021 <4> [184.578352] RIP: 0010:i915_gem_object_unbind+0x4a9/0x510 [i915] <4> [184.578424] Code: 00 be ff ff ff ff 48 8d 78 68 e8 a2 6e 2b e1 85 c0 0f 85 b1 fb ff ff 48 c7 c6 48 37 9e a0 48 c7 c7 d9 fc a1 a0 e8 a3 54 26 e1 <0f> 0b e9 97 fb ff ff 31 ed 48 8b 5c 24 58 65 48 33 1c 25 28 00 00 <4> [184.578426] RSP: 0018:c900013b3b68 EFLAGS: 00010286 <4> [184.578428] RAX: RBX: c900013b3bb0 RCX: 0001 <4> [184.578429] RDX: 8001 RSI: 8230b42d RDI: <4> [184.578430] RBP: 888120e1 R08: R09: c0007fff <4> [184.578431] R10: 0001 R11: c900013b3980 R12: 8881176ea740 <4> [184.578432] R13: 888120e1 R14: R15: 0001 <4> [184.578433] FS: 7f65074f5e40() GS:8f30() knlGS: <4> [184.578435] CS: 0010 DS: ES: CR0: 80050033 <4> [184.578436] CR2: 7fff4420ede8 CR3: 00010c2f2005 CR4: 00770ee0 <4> [184.578437] PKRU: 5554 <4> [184.578438] Call Trace: <4> [184.578439] <4> [184.578440] ? 
dma_resv_iter_first_unlocked+0x78/0xf0 <4> [184.578447] intel_dpt_create+0x88/0x220 [i915] <4> [184.578530] intel_framebuffer_init+0x5b8/0x620 [i915] <4> [184.578612] intel_framebuffer_create+0x3d/0x60 [i915] <4> [184.578691] intel_user_framebuffer_create+0x18f/0x2c0 [i915] <4> [184.578775] drm_internal_framebuffer_create+0x36d/0x4c0 <4> [184.578779] drm_mode_addfb2+0x2f/0xd0 <4> [184.578781] ? drm_mode_addfb_ioctl+0x10/0x10 <4> [184.578784] drm_ioctl_kernel+0xac/0x140 <4> [184.578787] drm_ioctl+0x201/0x3d0 <4> [184.578789] ? drm_mode_addfb_ioctl+0x10/0x10 <4> [184.578796] __x64_sys_ioctl+0x6a/0xa0 <4> [184.578800] do_syscall_64+0x37/0xb0 <4> [184.578803] entry_SYSCALL_64_after_hwframe+0x44/0xae <4> [184.578805] RIP: 0033:0x7f6506736317 <4> [184.578807] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48 <4> [184.578808] RSP: 002b:7fff44211a98 EFLAGS: 0246 ORIG_RAX: 0010 <4> [184.578810] RAX: ffda RBX: 0006 RCX: 7f6506736317 <4> [184.578811] RDX: 7fff44211b30 RSI: c06864b8 RDI: 0006 <4> [184.578812] RBP: 7fff44211b30 R08: 7fff44311170 R09: <4> [184.578813] R10: 0008 R11: 0246 R12: c06864b8 <4> [184.578813] R13: 0006 R14: R15: <4> [184.578819] <4> [184.578820] irq event stamp: 47931 <4> [184.578821] hardirqs last enabled at (47937): [] __up_console_sem+0x62/0x70 <4> [184.578824] hardirqs last disabled at (47942): [] __up_console_sem+0x47/0x70 <4> [184.578826] softirqs last enabled at (47340): [] __do_softirq+0x32d/0x493 <4> [184.578828] softirqs last disabled at (47335): [] irq_exit_rcu+0xa6/0xe0 <4> [184.578830] ---[ end trace f17ec219f892c7d4 ]--- Fixes: 0f341974cbc2 ("drm/i915: Add i915_vma_unbind_unlocked, and take obj lock for i915_vma_unbind, v2.") Signed-off-by: Maarten Lankhorst Testcase: kms_addfb_basic --- drivers/gpu/drm/i915/display/intel_dpt.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) LGTM. 
Reviewed-by: Thomas Hellström diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c index 63
Re: [Linaro-mm-sig] [PATCH 1/9] dma-buf: consolidate dma_fence subclass checking
On 1/20/22 14:27, Christian König wrote: Consolidate the wrapper functions to check for dma_fence subclasses in the dma_fence header. This makes it easier to document and also check the different requirements for fence containers in the subclasses. Signed-off-by: Christian König --- include/linux/dma-fence-array.h | 15 + include/linux/dma-fence-chain.h | 3 +-- include/linux/dma-fence.h | 38 + 3 files changed, 40 insertions(+), 16 deletions(-) diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h index 303dd712220f..fec374f69e12 100644 --- a/include/linux/dma-fence-array.h +++ b/include/linux/dma-fence-array.h @@ -45,19 +45,6 @@ struct dma_fence_array { struct irq_work work; }; -extern const struct dma_fence_ops dma_fence_array_ops; - -/** - * dma_fence_is_array - check if a fence is from the array subsclass - * @fence: fence to test - * - * Return true if it is a dma_fence_array and false otherwise. - */ -static inline bool dma_fence_is_array(struct dma_fence *fence) -{ - return fence->ops == _fence_array_ops; -} - /** * to_dma_fence_array - cast a fence to a dma_fence_array * @fence: fence to cast to a dma_fence_array @@ -68,7 +55,7 @@ static inline bool dma_fence_is_array(struct dma_fence *fence) static inline struct dma_fence_array * to_dma_fence_array(struct dma_fence *fence) { - if (fence->ops != _fence_array_ops) + if (!fence || !dma_fence_is_array(fence)) return NULL; return container_of(fence, struct dma_fence_array, base); diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h index 54fe3443fd2c..ee906b659694 100644 --- a/include/linux/dma-fence-chain.h +++ b/include/linux/dma-fence-chain.h @@ -49,7 +49,6 @@ struct dma_fence_chain { spinlock_t lock; }; -extern const struct dma_fence_ops dma_fence_chain_ops; /** * to_dma_fence_chain - cast a fence to a dma_fence_chain @@ -61,7 +60,7 @@ extern const struct dma_fence_ops dma_fence_chain_ops; static inline struct dma_fence_chain * to_dma_fence_chain(struct 
dma_fence *fence) { - if (!fence || fence->ops != &dma_fence_chain_ops) + if (!fence || !dma_fence_is_chain(fence)) return NULL; return container_of(fence, struct dma_fence_chain, base); diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 1ea691753bd3..775cdc0b4f24 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -587,4 +587,42 @@ struct dma_fence *dma_fence_get_stub(void); struct dma_fence *dma_fence_allocate_private_stub(void); u64 dma_fence_context_alloc(unsigned num); +extern const struct dma_fence_ops dma_fence_array_ops; +extern const struct dma_fence_ops dma_fence_chain_ops; + +/** + * dma_fence_is_array - check if a fence is from the array subclass + * @fence: the fence to test + * + * Return true if it is a dma_fence_array and false otherwise. + */ +static inline bool dma_fence_is_array(struct dma_fence *fence) +{ + return fence->ops == &dma_fence_array_ops; +} + +/** + * dma_fence_is_chain - check if a fence is from the chain subclass + * @fence: the fence to test + * + * Return true if it is a dma_fence_chain and false otherwise. + */ +static inline bool dma_fence_is_chain(struct dma_fence *fence) +{ + return fence->ops == &dma_fence_chain_ops; +} + +/** + * dma_fence_is_container - check if a fence is a container for other fences + * @fence: the fence to test + * + * Return true if this fence is a container for other fences, false otherwise. + * This is important since we can't build up large fence structure or otherwise + * we run into recursion during operation on those fences. + */ +static inline bool dma_fence_is_container(struct dma_fence *fence) +{ + return dma_fence_is_array(fence) || dma_fence_is_chain(fence); +} What's the strategy here moving forward if we add more dma_resv containers, or if a driver adds a container that similarly has risk of recursion?
Should we perhaps add an ops bool for this, or require that all dma_resv containers that have this limitation be part of the dma-buf subsystem rather than driver-specific? Thanks, /Thomas + #endif /* __LINUX_DMA_FENCE_H */
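The ops-pointer comparison these helpers rely on is a common C idiom for run-time subclass checks without any type tag in the object itself. A stripped-down, non-kernel sketch of the same pattern (the types and names here are illustrative stand-ins, not the real dma-buf API):

```c
#include <assert.h>

struct fence_ops { const char *name; };
struct fence { const struct fence_ops *ops; };

/* Stand-ins for the exported dma_fence_array_ops / dma_fence_chain_ops */
static const struct fence_ops array_ops = { "array" };
static const struct fence_ops chain_ops = { "chain" };
static const struct fence_ops plain_ops = { "plain" };

/* Subclass membership is decided purely by ops-pointer identity. */
static int fence_is_array(const struct fence *f) { return f->ops == &array_ops; }
static int fence_is_chain(const struct fence *f) { return f->ops == &chain_ops; }

/* A "container" is any subclass that aggregates other fences; walking
 * into containers recursively is what the checks aim to prevent. */
static int fence_is_container(const struct fence *f)
{
	return fence_is_array(f) || fence_is_chain(f);
}
```

The limitation the review raises is visible here: a new container subclass defined elsewhere would not be caught unless its ops pointer is added to the check.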
Re: [Intel-gfx] [PATCH v6 2/6] drm/i915: Add locking to i915_gem_evict_vm(), v2.
On 1/14/22 14:23, Maarten Lankhorst wrote: i915_gem_evict_vm will need to be able to evict objects that are locked by the current ctx. By testing if the current context already locked the object, we can do this correctly. This allows us to evict the entire vm even if we already hold some objects' locks. Previously, this was spread over several commits, but it makes more sense to commit the changes to i915_gem_evict_vm separately from the changes to i915_gem_evict_something() and i915_gem_evict_for_node(). Changes since v1: - Handle evicting dead objects better. Signed-off-by: Maarten Lankhorst --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 2 +- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 +- drivers/gpu/drm/i915/i915_gem_evict.c | 32 +-- drivers/gpu/drm/i915/i915_gem_evict.h | 4 ++- drivers/gpu/drm/i915/i915_vma.c | 7 +++- .../gpu/drm/i915/selftests/i915_gem_evict.c | 10 -- 6 files changed, 49 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index cf283b5f6ffe..4d832d6696b5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -767,7 +767,7 @@ static int eb_reserve(struct i915_execbuffer *eb) case 1: /* Too fragmented, unbind everything and retry */ mutex_lock(>context->vm->mutex); - err = i915_gem_evict_vm(eb->context->vm); + err = i915_gem_evict_vm(eb->context->vm, >ww); mutex_unlock(>context->vm->mutex); if (err) return err; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index fafd158e5313..a69787999d09 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -367,7 +367,7 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf) if (vma == ERR_PTR(-ENOSPC)) { ret = mutex_lock_interruptible(>vm.mutex); if (!ret) { - ret = i915_gem_evict_vm(>vm); + ret = i915_gem_evict_vm(>vm, ); mutex_unlock(>vm.mutex); } if (ret) diff 
--git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c index 24eee0c2055f..91f82ecb9ef4 100644 --- a/drivers/gpu/drm/i915/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/i915_gem_evict.c @@ -368,7 +368,7 @@ int i915_gem_evict_for_node(struct i915_address_space *vm, * To clarify: This is for freeing up virtual address space, not for freeing * memory in e.g. the shrinker. */ -int i915_gem_evict_vm(struct i915_address_space *vm) +int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww) { int ret = 0; @@ -389,24 +389,52 @@ int i915_gem_evict_vm(struct i915_address_space *vm) do { struct i915_vma *vma, *vn; LIST_HEAD(eviction_list); + LIST_HEAD(locked_eviction_list); list_for_each_entry(vma, >bound_list, vm_link) { if (i915_vma_is_pinned(vma)) continue; + /* +* If we already own the lock, trylock fails. In case +* the resv is shared among multiple objects, we still +* need the object ref. +*/ Should we handle dying vmas like for the other eviction utilities and also evict pinned dying vmas? + if (!kref_read(>obj->base.refcount) || + (ww && (dma_resv_locking_ctx(vma->obj->base.resv) == >ctx))) { + __i915_vma_pin(vma); + list_add(>evict_link, _eviction_list); + continue; + } + + if (!i915_gem_object_trylock(vma->obj, ww)) + continue; + __i915_vma_pin(vma); list_add(>evict_link, _list); } - if (list_empty(_list)) + if (list_empty(_list) && list_empty(_eviction_list)) break; ret = 0; + /* Unbind locked objects first, before unlocking the eviction_list */ + list_for_each_entry_safe(vma, vn, _eviction_list, evict_link) { + __i915_vma_unpin(vma); + + if (ret == 0) + ret = __i915_vma_unbind(vma); + if (ret != -EINTR) /* "Get me out of here!" */ + ret = 0; + } +
Re: [Intel-gfx] [PATCH v5 2/6] drm/i915: Add locking to i915_gem_evict_vm()
On 1/13/22 12:44, Maarten Lankhorst wrote: i915_gem_evict_vm will need to be able to evict objects that are locked by the current ctx. By testing if the current context already locked the object, we can do this correctly. This allows us to evict the entire vm even if we already hold some objects' locks. Previously, this was spread over several commits, but it makes more sense to commit the changes to i915_gem_evict_vm separately from the changes to i915_gem_evict_something() and i915_gem_evict_for_node(). Signed-off-by: Maarten Lankhorst --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 2 +- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 +- drivers/gpu/drm/i915/i915_gem_evict.c | 42 ++- drivers/gpu/drm/i915/i915_gem_evict.h | 4 +- drivers/gpu/drm/i915/i915_vma.c | 7 +++- .../gpu/drm/i915/selftests/i915_gem_evict.c | 10 - 6 files changed, 59 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index cf283b5f6ffe..4d832d6696b5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -767,7 +767,7 @@ static int eb_reserve(struct i915_execbuffer *eb) case 1: /* Too fragmented, unbind everything and retry */ mutex_lock(>context->vm->mutex); - err = i915_gem_evict_vm(eb->context->vm); + err = i915_gem_evict_vm(eb->context->vm, >ww); mutex_unlock(>context->vm->mutex); if (err) return err; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index fafd158e5313..a69787999d09 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -367,7 +367,7 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf) if (vma == ERR_PTR(-ENOSPC)) { ret = mutex_lock_interruptible(>vm.mutex); if (!ret) { - ret = i915_gem_evict_vm(>vm); + ret = i915_gem_evict_vm(>vm, ); mutex_unlock(>vm.mutex); } if (ret) diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c 
b/drivers/gpu/drm/i915/i915_gem_evict.c index 24eee0c2055f..370eb7238d1c 100644 --- a/drivers/gpu/drm/i915/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/i915_gem_evict.c @@ -74,6 +74,12 @@ static bool defer_evict(struct i915_vma *vma) return false; } +static int evict_dead(struct i915_vma *vma) +{ + atomic_and(~I915_VMA_PIN_MASK, >flags); + return __i915_vma_unbind(vma); +} + /** * i915_gem_evict_something - Evict vmas to make room for binding a new one * @vm: address space to evict from @@ -368,7 +374,7 @@ int i915_gem_evict_for_node(struct i915_address_space *vm, * To clarify: This is for freeing up virtual address space, not for freeing * memory in e.g. the shrinker. */ -int i915_gem_evict_vm(struct i915_address_space *vm) +int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww) { int ret = 0; @@ -389,24 +395,56 @@ int i915_gem_evict_vm(struct i915_address_space *vm) do { struct i915_vma *vma, *vn; LIST_HEAD(eviction_list); + LIST_HEAD(locked_eviction_list); list_for_each_entry(vma, >bound_list, vm_link) { + if (!kref_read(>obj->base.refcount)) { + ret = evict_dead(vma); + if (ret) + break; + } + Could the call to evict_dead corrupt the bound_list? if (i915_vma_is_pinned(vma)) continue; + /* +* If we already own the lock, trylock fails. In case the resv +* is shared among multiple objects, we still need the object ref. +*/ + if (ww && (dma_resv_locking_ctx(vma->obj->base.resv) == >ctx)) { + __i915_vma_pin(vma); + list_add(>evict_link, _eviction_list); + continue; + } + + if (!i915_gem_object_trylock(vma->obj, ww)) + continue; + __i915_vma_pin(vma); list_add(>evict_link, _list); } - if (list_empty(_list)) + if (list_empty(_list) && list_empty(_eviction_list)) break; ret = 0; + /* Unbind locked objects first, before unlocking the eviction_list
Re: [Intel-gfx] [PATCH 2/2] drm/i915: clean up shrinker_release_pages
Hi, Matthew On 12/15/21 12:07, Matthew Auld wrote: Add some proper flags for the different modes, and shorten the name to something more snappy. Suggested-by: Tvrtko Ursulin Signed-off-by: Matthew Auld LGTM. Reviewed-by: Thomas Hellström
Re: [Intel-gfx] [PATCH] drm/i915: Use trylock instead of blocking lock for __i915_gem_free_objects.
On 12/22/21 16:56, Maarten Lankhorst wrote: Convert free_work into delayed_work, similar to ttm to allow converting the blocking lock in __i915_gem_free_objects to a trylock. Unlike ttm, the object should already be idle, as it's kept alive by a reference through struct i915_vma->active, which is dropped after all vma's are idle. Because of this, we can use a no wait by default, or when the lock is contested, we use ttm's 10 ms. The trylock should only fail when the object is sharing it's resv with other objects, and typically objects are not kept locked for a long time, so we can safely retry on failure. Fixes: be7612fd6665 ("drm/i915: Require object lock when freeing pages during destruction") Testcase: igt/gem_exec_alignment/pi* Signed-off-by: Maarten Lankhorst --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 14 ++ drivers/gpu/drm/i915/i915_drv.h| 4 ++-- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 39cd563544a5..d87b508b59b1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -331,7 +331,13 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915, continue; } - i915_gem_object_lock(obj, NULL); + if (!i915_gem_object_trylock(obj, NULL)) { + /* busy, toss it back to the pile */ + if (llist_add(>freed, >mm.free_list)) + queue_delayed_work(i915->wq, >mm.free_work, msecs_to_jiffies(10)); i915->wq is ordered. From what I can tell, with queue_delayed_work(), the work doesn't get inserted into the queue order until the delay expires, right? So we don't unnecessarily hold up other objects getting freed? 
+ continue; + } + __i915_gem_object_pages_fini(obj); i915_gem_object_unlock(obj); __i915_gem_free_object(obj); @@ -353,7 +359,7 @@ void i915_gem_flush_free_objects(struct drm_i915_private *i915) static void __i915_gem_free_work(struct work_struct *work) { struct drm_i915_private *i915 = - container_of(work, struct drm_i915_private, mm.free_work); + container_of(work, struct drm_i915_private, mm.free_work.work); i915_gem_flush_free_objects(i915); } @@ -385,7 +391,7 @@ static void i915_gem_free_object(struct drm_gem_object *gem_obj) */ if (llist_add(>freed, >mm.free_list)) - queue_work(i915->wq, >mm.free_work); + queue_delayed_work(i915->wq, >mm.free_work, 0); } void __i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj, @@ -710,7 +716,7 @@ bool i915_gem_object_placement_possible(struct drm_i915_gem_object *obj, void i915_gem_init__objects(struct drm_i915_private *i915) { - INIT_WORK(>mm.free_work, __i915_gem_free_work); + INIT_DELAYED_WORK(>mm.free_work, __i915_gem_free_work); } void i915_objects_module_exit(void) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index c8fddb7e61c9..beeb42a14aae 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -465,7 +465,7 @@ struct i915_gem_mm { * List of objects which are pending destruction. */ struct llist_head free_list; - struct work_struct free_work; + struct delayed_work free_work; /** * Count of objects pending destructions. Used to skip needlessly * waiting on an RCU barrier if no objects are waiting to be freed. @@ -1625,7 +1625,7 @@ static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915) * armed the work again. */ while (atomic_read(>mm.free_count)) { - flush_work(>mm.free_work); + flush_delayed_work(>mm.free_work); flush_delayed_work(>bdev.wq); rcu_barrier(); } Otherwise LGTM. Reviewed-by: Thomas Hellström
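The retry scheme under discussion — trylock the object, and on contention push it back on the free list for a delayed re-attempt — can be sketched with a plain pthread mutex. This is an illustration of the control flow only; the 10 ms requeue delay and the work-queue machinery are elided:

```c
#include <pthread.h>
#include <stdbool.h>

struct object {
	pthread_mutex_t lock;
	bool freed;
};

/* Mimics the __i915_gem_free_objects() flow: free the object only if
 * its lock can be taken without blocking; otherwise report failure so
 * the caller can requeue it (the kernel uses a delayed work item). */
static bool try_free_object(struct object *obj)
{
	if (pthread_mutex_trylock(&obj->lock) != 0)
		return false;		/* busy: toss it back to the pile */

	obj->freed = true;		/* stands in for pages_fini + free */
	pthread_mutex_unlock(&obj->lock);
	return true;
}
```

The design point being reviewed is that the trylock should only fail when the object shares its reservation lock with other objects, so retries are expected to be rare and short-lived.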
Re: [Intel-gfx] [PATCH 1/4] drm/i915: Add has_64k_pages flag
On 12/8/21 13:59, Matthew Auld wrote: On Wed, 8 Dec 2021 at 12:43, Thomas Hellström (Intel) wrote: Hi, On 12/7/21 17:51, Ramalingam C wrote: From: Stuart Summers Add a new platform flag, has_64k_pages, for platforms supporting base page sizes of 64k. Signed-off-by: Stuart Summers Signed-off-by: Ramalingam C Reviewed-by: Lucas De Marchi --- drivers/gpu/drm/i915/i915_drv.h | 2 ++ drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 5 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 85bb8d3107f0..6132163e1cb3 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1528,6 +1528,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, #define HAS_MSLICES(dev_priv) \ (INTEL_INFO(dev_priv)->has_mslices) +#define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages) + Could we please have some documentation of the exact meaning of this flag. Like, smallest page size of LMEM is 64K. Hardware supports 64k pages etc? Something like: "Set if the device requires 64K GTT page sizes or larger for device local memory access. Also implies that we require or at least support the compact PT layout for the ppGTT when using 64K GTT pages." Sounds great. Thanks, Thomas
Re: [Intel-gfx] [PATCH 2/4] drm/i915/xehpsdv: set min page-size to 64K
On 12/7/21 17:51, Ramalingam C wrote: From: Matthew Auld LMEM should be allocated at 64K granularity, since 4K page support will eventually be dropped for LMEM when using the PPGTT. Signed-off-by: Matthew Auld Signed-off-by: Stuart Summers Signed-off-by: Ramalingam C Cc: Joonas Lahtinen Cc: Rodrigo Vivi Reviewed-by: Lucas De Marchi Reviewed-by: Thomas Hellstrom --- drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 6 +- drivers/gpu/drm/i915/gt/intel_region_lmem.c | 5 - 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index bce03d74a0b4..ba90ab47d838 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -780,6 +780,7 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type, struct intel_uncore *uncore = &i915->uncore; struct pci_dev *pdev = to_pci_dev(i915->drm.dev); struct intel_memory_region *mem; + resource_size_t min_page_size; resource_size_t io_start; resource_size_t lmem_size; u64 lmem_base; @@ -791,8 +792,11 @@ i915_gem_stolen_lmem_setup(struct drm_i915_private *i915, u16 type, lmem_size = pci_resource_len(pdev, 2) - lmem_base; io_start = pci_resource_start(pdev, 2) + lmem_base; + min_page_size = HAS_64K_PAGES(i915) ? 
I915_GTT_PAGE_SIZE_64K : + I915_GTT_PAGE_SIZE_4K; + mem = intel_memory_region_create(i915, lmem_base, lmem_size, -I915_GTT_PAGE_SIZE_4K, io_start, +min_page_size, io_start, type, instance, &i915_region_stolen_lmem_ops); if (IS_ERR(mem)) diff --git a/drivers/gpu/drm/i915/gt/intel_region_lmem.c b/drivers/gpu/drm/i915/gt/intel_region_lmem.c index 9ea49e0a27c0..fde2dcb59809 100644 --- a/drivers/gpu/drm/i915/gt/intel_region_lmem.c +++ b/drivers/gpu/drm/i915/gt/intel_region_lmem.c @@ -197,6 +197,7 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) struct intel_uncore *uncore = gt->uncore; struct pci_dev *pdev = to_pci_dev(i915->drm.dev); struct intel_memory_region *mem; + resource_size_t min_page_size; resource_size_t io_start; resource_size_t lmem_size; int err; @@ -211,10 +212,12 @@ static struct intel_memory_region *setup_lmem(struct intel_gt *gt) if (GEM_WARN_ON(lmem_size > pci_resource_len(pdev, 2))) return ERR_PTR(-ENODEV); + min_page_size = HAS_64K_PAGES(i915) ? I915_GTT_PAGE_SIZE_64K : + I915_GTT_PAGE_SIZE_4K; mem = intel_memory_region_create(i915, 0, lmem_size, -I915_GTT_PAGE_SIZE_4K, +min_page_size, io_start, INTEL_MEMORY_LOCAL, 0,
Re: [Intel-gfx] [PATCH 1/4] drm/i915: Add has_64k_pages flag
Hi, On 12/7/21 17:51, Ramalingam C wrote: From: Stuart Summers Add a new platform flag, has_64k_pages, for platforms supporting base page sizes of 64k. Signed-off-by: Stuart Summers Signed-off-by: Ramalingam C Reviewed-by: Lucas De Marchi --- drivers/gpu/drm/i915/i915_drv.h | 2 ++ drivers/gpu/drm/i915/i915_pci.c | 2 ++ drivers/gpu/drm/i915/intel_device_info.h | 1 + 3 files changed, 5 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 85bb8d3107f0..6132163e1cb3 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1528,6 +1528,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, #define HAS_MSLICES(dev_priv) \ (INTEL_INFO(dev_priv)->has_mslices) +#define HAS_64K_PAGES(dev_priv) (INTEL_INFO(dev_priv)->has_64k_pages) + Could we please have some documentation of the exact meaning of this flag. Like, smallest page size of LMEM is 64K. Hardware supports 64k pages etc? for future reference. /Thomas #define HAS_IPC(dev_priv) (INTEL_INFO(dev_priv)->display.has_ipc) #define HAS_REGION(i915, i) (INTEL_INFO(i915)->memory_regions & (i)) diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index 6aaa7c644c9b..634282edadb7 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -1029,6 +1029,7 @@ static const struct intel_device_info xehpsdv_info = { DGFX_FEATURES, PLATFORM(INTEL_XEHPSDV), .display = { }, + .has_64k_pages = 1, .pipe_mask = 0, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | @@ -1047,6 +1048,7 @@ static const struct intel_device_info dg2_info = { .graphics.rel = 55, .media.rel = 55, PLATFORM(INTEL_DG2), + .has_64k_pages = 1, .platform_engine_mask = BIT(RCS0) | BIT(BCS0) | BIT(VECS0) | BIT(VECS1) | diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h index 669f0d26c3c3..f38ac5bd837b 100644 --- a/drivers/gpu/drm/i915/intel_device_info.h +++ b/drivers/gpu/drm/i915/intel_device_info.h @@ 
-123,6 +123,7 @@ enum intel_ppgtt_type { func(is_dgfx); \ /* Keep has_* in alphabetical order */ \ func(has_64bit_reloc); \ + func(has_64k_pages); \ func(gpu_reset_clobbers_display); \ func(has_reset_engine); \ func(has_global_mocs); \
Re: [Linaro-mm-sig] [RFC PATCH 1/2] dma-fence: Avoid establishing a locking order between fence classes
On 12/3/21 16:00, Christian König wrote: Am 03.12.21 um 15:50 schrieb Thomas Hellström: On 12/3/21 15:26, Christian König wrote: [Adding Daniel here as well] Am 03.12.21 um 15:18 schrieb Thomas Hellström: [SNIP] Well that's ok as well. My question is why does this single dma_fence then show up in the dma_fence_chain representing the whole migration? What we'd like to happen during eviction is that we 1) await any exclusive or moving fences, then schedule the migration blit. The blit manages its own GPU ptes. Results in a single fence. 2) Schedule unbind of any gpu vmas, resulting possibly in multiple fences. 3) Most but not all of the remaining resv shared fences will have been finished in 2). We can't easily tell which, so we have a couple of shared fences left. Stop, wait a second here. We are going a bit in circles. Before you migrate a buffer, you *MUST* wait for all shared fences to complete. This is documented mandatory DMA-buf behavior. Daniel and I have discussed that quite extensively in the last few months. So how is it that you do the blit before all shared fences are completed? Well we don't currently, but wanted to... (I haven't consulted Daniel in the matter, tbh). I was under the impression that all writes would add an exclusive fence to the dma_resv. Yes, that's correct. I'm working on having more than one write fence, but that is currently under review. If that's not the case or this is otherwise against the mandatory DMA-buf behavior, we can certainly keep that part as is and that would eliminate 3). Ah, now that somewhat starts to make sense. So your blit only waits for the writes to finish before starting the blit. Yes, that's legal as long as you don't change the original content with the blit. But don't you then need to wait for both reads and writes before you unmap the VMAs?
Yes, but that's planned to be done all async, and those unbind jobs are scheduled simultaneously with the blit, and the blit itself manages its own page-table entries, so no need to unbind any blit vmas. Anyway the good news is your problem totally goes away with the DMA-resv rework I've already sent out. Basically it is now possible to have more than one fence in the DMA-resv object for migrations and all existing fences are kept around until they are finished. Sounds good. Thanks, Thomas
Re: [Linaro-mm-sig] [RFC PATCH 1/2] dma-fence: Avoid establishing a locking order between fence classes
On 12/1/21 12:25, Christian König wrote: Am 01.12.21 um 12:04 schrieb Thomas Hellström (Intel): On 12/1/21 11:32, Christian König wrote: Am 01.12.21 um 11:15 schrieb Thomas Hellström (Intel): [SNIP] What we could do is to avoid all this by not calling the callback with the lock held in the first place. If that's possible that might be a good idea, pls also see below. The problem with that is dma_fence_signal_locked()/dma_fence_signal_timestamp_locked(). If we could avoid using that or at least allow it to drop the lock then we could call the callback without holding it. Somebody would need to audit the drivers and see if holding the lock is really necessary anywhere. /Thomas Oh, and a follow up question: If there was a way to break the recursion on final put() (using the same basic approach as patch 2 in this series uses to break recursion in enable_signaling()), so that none of these containers did require any special treatment, would it be worth pursuing? I guess it might be possible by having the callbacks drop the references rather than the loop in the final put. + a couple of changes in code iterating over the fence pointers. That won't really help, you just move the recursion from the final put into the callback. How do we recurse from the callback? The introduced fence_put() of individual fence pointers doesn't recurse anymore (at most 1 level), and any callback recursion is broken by the irq_work? Yeah, but then you would need to take another lock to avoid racing with dma_fence_array_signaled(). I figure the big amount of work would be to adjust code that iterates over the individual fence pointers to recognize that they are rcu protected. Could be that we could solve this with RCU, but that sounds like a lot of churn for no gain at all. In other words even with the problems solved I think it would be a really bad idea to allow chaining of dma_fence_array objects. Yes, that was really the question, Is it worth pursuing this? 
I'm not really suggesting we should allow this as an intentional feature. I'm worried, however, that if we allow these containers to start floating around cross-driver (or even internally) disguised as ordinary dma_fences, they would require a lot of driver special casing, or else completely unexpected WARN_ON()s and lockdep splats would start to turn up, scaring people off from using them. And that would be a breeding ground for hairy driver-private constructs. Well the question is why we would want to do it? If it's to avoid inter-driver lock dependencies by avoiding calling the callback with the spinlock held, then yes please. We had tons of problems with that, resulting in irq_work and work_item delegation all over the place. Yes, that sounds like something desirable, but in these containers, what's causing the lock dependencies is the enable_signaling() callback that is typically called locked. If it's to allow nesting of dma_fence_array instances, then it's most likely a really bad idea even if we fix all the locking order problems. Well I think my use-case where I hit a dead end may illustrate what worries me here: 1) We use a dma-fence-array to coalesce all dependencies for ttm object migration. 2) We use a dma-fence-chain to order the resulting dma_fence into a timeline because the TTM resource manager code requires that. Initially seemingly harmless to me. But after a sequence evict->alloc->clear, the dma-fence-chain feeds into the dma-fence-array for the clearing operation. Code still works fine, and no deep recursion, no warnings. But if I were to add another driver to the system that instead feeds a dma-fence-array into a dma-fence-chain, this would give me a lockdep splat. So then if somebody were to come up with the splendid idea of using a dma-fence-chain to initially coalesce fences, I'd hit the same problem or risk illegally joining two dma-fence-chains together.
To fix this, I would need to look at the incoming fences and iterate over any dma-fence-array or dma-fence-chain that is fed into the dma-fence-array to flatten out the input. In fact all dma-fence-array users would need to do that, and even dma-fence-chain users would need to watch out for joining chains together or accidentally adding an array that perhaps came as a disguised dma-fence from another driver. So the purpose to me would be to allow these containers as input to each other without a lot of in-driver special-casing, be it by breaking recursion or by built-in flattening, to avoid a) Hitting issues in the future or with existing interoperating drivers. b) Driver-private containers that also might break the interoperability. (For example the currently driver-private i915 dma_fence_work avoids all these problems, but we're attempting to address issues in common code rather than re-inventing stuff internally). I don't think that a dma_fence_array or dma_fence_chain is the right
Re: [Linaro-mm-sig] [RFC PATCH 1/2] dma-fence: Avoid establishing a locking order between fence classes
On 12/1/21 11:32, Christian König wrote: Am 01.12.21 um 11:15 schrieb Thomas Hellström (Intel): [SNIP] What we could do is to avoid all this by not calling the callback with the lock held in the first place. If that's possible that might be a good idea, pls also see below. The problem with that is dma_fence_signal_locked()/dma_fence_signal_timestamp_locked(). If we could avoid using that or at least allow it to drop the lock then we could call the callback without holding it. Somebody would need to audit the drivers and see if holding the lock is really necessary anywhere. /Thomas Oh, and a follow up question: If there was a way to break the recursion on final put() (using the same basic approach as patch 2 in this series uses to break recursion in enable_signaling()), so that none of these containers did require any special treatment, would it be worth pursuing? I guess it might be possible by having the callbacks drop the references rather than the loop in the final put. + a couple of changes in code iterating over the fence pointers. That won't really help, you just move the recursion from the final put into the callback. How do we recurse from the callback? The introduced fence_put() of individual fence pointers doesn't recurse anymore (at most 1 level), and any callback recursion is broken by the irq_work? Yeah, but then you would need to take another lock to avoid racing with dma_fence_array_signaled(). I figure the big amount of work would be to adjust code that iterates over the individual fence pointers to recognize that they are rcu protected. Could be that we could solve this with RCU, but that sounds like a lot of churn for no gain at all. In other words even with the problems solved I think it would be a really bad idea to allow chaining of dma_fence_array objects. Yes, that was really the question, Is it worth pursuing this? I'm not really suggesting we should allow this as an intentional feature. 
I'm worried, however, that if we allow these containers to start floating around cross-driver (or even internally) disguised as ordinary dma_fences, they would require a lot of driver special casing, or else completely unexpected WARN_ON()s and lockdep splats would start to turn up, scaring people off from using them. And that would be a breeding ground for hairy driver-private constructs. Well the question is why we would want to do it? If it's to avoid inter-driver lock dependencies by avoiding calling the callback with the spinlock held, then yes please. We had tons of problems with that, resulting in irq_work and work_item delegation all over the place. Yes, that sounds like something desirable, but in these containers, what's causing the lock dependencies is the enable_signaling() callback that is typically called locked. If it's to allow nesting of dma_fence_array instances, then it's most likely a really bad idea even if we fix all the locking order problems. Well I think my use-case where I hit a dead end may illustrate what worries me here: 1) We use a dma-fence-array to coalesce all dependencies for ttm object migration. 2) We use a dma-fence-chain to order the resulting dma_fence into a timeline because the TTM resource manager code requires that. Initially seemingly harmless to me. But after a sequence evict->alloc->clear, the dma-fence-chain feeds into the dma-fence-array for the clearing operation. Code still works fine, and no deep recursion, no warnings. But if I were to add another driver to the system that instead feeds a dma-fence-array into a dma-fence-chain, this would give me a lockdep splat. So then if somebody were to come up with the splendid idea of using a dma-fence-chain to initially coalesce fences, I'd hit the same problem or risk illegally joining two dma-fence-chains together.
To fix this, I would need to look at the incoming fences and iterate over any dma-fence-array or dma-fence-chain that is fed into the dma-fence-array to flatten out the input. In fact all dma-fence-array users would need to do that, and even dma-fence-chain users would need to watch out for joining chains together or accidentally adding an array that perhaps came as a disguised dma-fence from another driver. So the purpose to me would be to allow these containers as input to each other without a lot of in-driver special-casing, be it by breaking recursion or by built-in flattening, to avoid a) Hitting issues in the future or with existing interoperating drivers. b) Driver-private containers that also might break the interoperability. (For example the currently driver-private i915 dma_fence_work avoids all these problems, but we're attempting to address issues in common code rather than re-inventing stuff internally). /Thomas Christian. /Thomas Christian. Thanks, /Thomas
Re: [Linaro-mm-sig] [RFC PATCH 1/2] dma-fence: Avoid establishing a locking order between fence classes
On 12/1/21 09:36, Christian König wrote: Am 01.12.21 um 09:23 schrieb Thomas Hellström (Intel): [SNIP] Jason and I came up with a deep dive iterator for his use case, but I think we don't want to use that any more after my dma_resv rework. In other words when you need to create a new dma_fence_array you flatten out the existing construct, which is at worst case dma_fence_chain->dma_fence_array->dma_fence. Ok. Is there any cross-driver contract here, like every driver using a dma_fence_array needs to check for dma_fence_chain and flatten like above? So far we only discussed that on the mailing list but haven't made any documentation for that. OK, one other cross-driver pitfall I see is if someone accidentally joins two fence chains together by creating a fence chain unknowingly using another fence chain as the @fence argument? That would indeed be illegal and we should probably add a WARN_ON() for that. The third cross-driver pitfall IMHO is the locking dependency these containers add. Other drivers (read at least i915) may have defined slightly different locking orders and that should also be addressed if needed, but that requires a cross-driver agreement on what the locking orders really are. Patch 1 actually addresses this, while keeping the container lockdep warnings for deep recursions, so at least I think that could serve as a discussion starter. No, drivers should never make any assumptions on that. Yes, that i915 assumption of taking the lock of the last signaled fence first goes back a while in time. We should look at fixing that up, and document any (possibly forbidden) assumptions about fence lock ordering to avoid it happening again, if there is no common cross-driver locking order that can be agreed on. E.g. when you need to take a lock from a callback you must guarantee that you never have that lock taken when you call any of the dma_fence functions. Your patch breaks the lockdep annotation for that.
I'm pretty sure that could be fixed in a satisfactory way if needed. What we could do is to avoid all this by not calling the callback with the lock held in the first place. If that's possible that might be a good idea, pls also see below. /Thomas Oh, and a follow-up question: If there was a way to break the recursion on final put() (using the same basic approach as patch 2 in this series uses to break recursion in enable_signaling()), so that none of these containers did require any special treatment, would it be worth pursuing? I guess it might be possible by having the callbacks drop the references rather than the loop in the final put, plus a couple of changes in code iterating over the fence pointers. That won't really help, you just move the recursion from the final put into the callback. How do we recurse from the callback? The introduced fence_put() of individual fence pointers doesn't recurse anymore (at most 1 level), and any callback recursion is broken by the irq_work? Yeah, but then you would need to take another lock to avoid racing with dma_fence_array_signaled(). I figure the big amount of work would be to adjust code that iterates over the individual fence pointers to recognize that they are RCU-protected. Could be that we could solve this with RCU, but that sounds like a lot of churn for no gain at all. In other words even with the problems solved I think it would be a really bad idea to allow chaining of dma_fence_array objects. Yes, that was really the question: is it worth pursuing this? I'm not really suggesting we should allow this as an intentional feature. I'm worried, however, that if we allow these containers to start floating around cross-driver (or even internally) disguised as ordinary dma_fences, they would require a lot of driver special casing, or else completely unexpected WARN_ON()s and lockdep splats would start to turn up, scaring people off from using them. And that would be a breeding ground for hairy driver-private constructs.
/Thomas Christian. Thanks, /Thomas
Re: [Linaro-mm-sig] [RFC PATCH 1/2] dma-fence: Avoid establishing a locking order between fence classes
On 12/1/21 08:05, Christian König wrote: Am 30.11.21 um 20:27 schrieb Thomas Hellström: On 11/30/21 19:12, Thomas Hellström wrote: On Tue, 2021-11-30 at 16:02 +0100, Christian König wrote: Am 30.11.21 um 15:35 schrieb Thomas Hellström: On Tue, 2021-11-30 at 14:26 +0100, Christian König wrote: Am 30.11.21 um 13:56 schrieb Thomas Hellström: On 11/30/21 13:42, Christian König wrote: Am 30.11.21 um 13:31 schrieb Thomas Hellström: [SNIP] Other than that, I didn't investigate the nesting fails enough to say I can accurately review this. :) Basically the problem is that within enable_signaling(), which is called with the dma_fence lock held, we take the dma_fence lock of another fence. If that other fence is a dma_fence_array, or a dma_fence_chain which in turn tries to lock a dma_fence_array, we hit a splat. Yeah, I already thought that you constructed something like that. You get the splat because what you do here is illegal; you can't mix dma_fence_array and dma_fence_chain like this or you can end up in a stack corruption. Hmm. Ok, so what is the stack corruption, is it that the enable_signaling() will end up with endless recursion? If so, wouldn't it be more usable if we break that recursion chain and allow a more general use? The problem is that this is not easily possible for dma_fence_array containers. Just imagine that you drop the last reference to the contained fences during dma_fence_array destruction: if any of the contained fences is another container, you can easily run into recursion and with that stack corruption. Indeed, that would require some deeper surgery. That's one of the major reasons I came up with the dma_fence_chain container. With this one you can chain any number of elements together without running into any recursion. Also what are the mixing rules between these? Never use a dma-fence-chain as one of the array fences and never use a dma-fence-array as a dma-fence-chain fence?
You can't add any other container to a dma_fence_array, neither other dma_fence_array instances nor dma_fence_chain instances. IIRC at least technically a dma_fence_chain can contain a dma_fence_array if you absolutely need that, but Daniel, Jason and I already had the same discussion a while back and came to the conclusion to avoid that as well if possible. Yes, this is actually the use-case. But what I can't easily guarantee is that that dma_fence_chain isn't fed into a dma_fence_array somewhere else. How do you typically avoid that? Meanwhile I guess I need to take a different approach in the driver to avoid this altogether. Jason and I came up with a deep dive iterator for his use case, but I think we don't want to use that any more after my dma_resv rework. In other words when you need to create a new dma_fence_array you flatten out the existing construct, which is at worst case dma_fence_chain->dma_fence_array->dma_fence. Ok. Is there any cross-driver contract here, like every driver using a dma_fence_array needs to check for dma_fence_chain and flatten like above? So far we only discussed that on the mailing list but haven't made any documentation for that. OK, one other cross-driver pitfall I see is if someone accidentally joins two fence chains together by creating a fence chain unknowingly using another fence chain as the @fence argument? The third cross-driver pitfall IMHO is the locking dependency these containers add. Other drivers (read at least i915) may have defined slightly different locking orders and that should also be addressed if needed, but that requires a cross-driver agreement on what the locking orders really are. Patch 1 actually addresses this, while keeping the container lockdep warnings for deep recursions, so at least I think that could serve as a discussion starter.
/Thomas Oh, and a follow up question: If there was a way to break the recursion on final put() (using the same basic approach as patch 2 in this series uses to break recursion in enable_signaling()), so that none of these containers did require any special treatment, would it be worth pursuing? I guess it might be possible by having the callbacks drop the references rather than the loop in the final put. + a couple of changes in code iterating over the fence pointers. That won't really help, you just move the recursion from the final put into the callback. How do we recurse from the callback? The introduced fence_put() of individual fence pointers doesn't recurse anymore (at most 1 level), and any callback recursion is broken by the irq_work? I figure the big amount of work would be to adjust code that iterates over the individual fence pointers to recognize that they are rcu protected. Thanks, /Thomas
Re: [Intel-gfx] [PATCH] drm/i915: Add ww context to intel_dpt_pin, v2.
On 9/29/21 10:59, Maarten Lankhorst wrote: Ensure i915_vma_pin_iomap and vma_unpin are done with dpt->obj lock held. I don't think there's much of a point in merging intel_dpt_pin() with intel_pin_fb_obj_dpt(), they touch different objects. Changes since v1: - Fix using the wrong pointer to retrieve error code (Julia) Signed-off-by: Maarten Lankhorst Reported-by: kernel test robot Reported-by: Julia Lawall LGTM. Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/display/intel_dpt.c | 40 +++- 1 file changed, 25 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c index de62bd77b15e..8f7b1f7534a4 100644 --- a/drivers/gpu/drm/i915/display/intel_dpt.c +++ b/drivers/gpu/drm/i915/display/intel_dpt.c @@ -121,32 +121,42 @@ struct i915_vma *intel_dpt_pin(struct i915_address_space *vm) intel_wakeref_t wakeref; struct i915_vma *vma; void __iomem *iomem; + struct i915_gem_ww_ctx ww; + int err; wakeref = intel_runtime_pm_get(&i915->runtime_pm); atomic_inc(&i915->gpu_error.pending_fb_pin); - vma = i915_gem_object_ggtt_pin(dpt->obj, NULL, 0, 4096, - HAS_LMEM(i915) ? 0 : PIN_MAPPABLE); - if (IS_ERR(vma)) - goto err; + for_i915_gem_ww(&ww, err, true) { + err = i915_gem_object_lock(dpt->obj, &ww); + if (err) + continue; - iomem = i915_vma_pin_iomap(vma); - i915_vma_unpin(vma); - if (IS_ERR(iomem)) { - vma = ERR_CAST(iomem); - goto err; - } + vma = i915_gem_object_ggtt_pin_ww(dpt->obj, &ww, NULL, 0, 4096, + HAS_LMEM(i915) ? 0 : PIN_MAPPABLE); + if (IS_ERR(vma)) { + err = PTR_ERR(vma); + continue; + } + + iomem = i915_vma_pin_iomap(vma); + i915_vma_unpin(vma); - dpt->vma = vma; - dpt->iomem = iomem; + if (IS_ERR(iomem)) { + err = PTR_ERR(iomem); + continue; + } - i915_vma_get(vma); + dpt->vma = vma; + dpt->iomem = iomem; + + i915_vma_get(vma); + } -err: atomic_dec(&i915->gpu_error.pending_fb_pin); intel_runtime_pm_put(&i915->runtime_pm, wakeref); - return vma; + return err ? 
ERR_PTR(err) : vma; } void intel_dpt_unpin(struct i915_address_space *vm)
[PATCH] drm/bochs: add Bochs PCI ID for Simics model
Current (and older) Simics models for the Bochs VGA used the wrong PCI vendor ID (0x4321 instead of 0x1234). Although this can hopefully be fixed in the future, it is a problem for users of the current version, not the least because to update the device ID the BIOS has to be rebuilt in order to see BIOS output. Add support for the 4321: device number in addition to the 1234: one. Signed-off-by: H. Peter Anvin (Intel) --- drivers/gpu/drm/tiny/bochs.c | 8 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/tiny/bochs.c b/drivers/gpu/drm/tiny/bochs.c index 73415fa9ae0f..2ce3bd903b70 100644 --- a/drivers/gpu/drm/tiny/bochs.c +++ b/drivers/gpu/drm/tiny/bochs.c @@ -63,6 +63,7 @@ MODULE_PARM_DESC(defy, "default y resolution"); enum bochs_types { BOCHS_QEMU_STDVGA, + BOCHS_SIMICS, BOCHS_UNKNOWN, }; @@ -695,6 +696,13 @@ static const struct pci_device_id bochs_pci_tbl[] = { .subdevice = PCI_ANY_ID, .driver_data = BOCHS_UNKNOWN, }, + { + .vendor = 0x4321, + .device = 0x, + .subvendor = PCI_ANY_ID, + .subdevice = PCI_ANY_ID, + .driver_data = BOCHS_SIMICS, + }, { /* end of list */ } }; -- 2.31.1
Re: [Intel-gfx] [PATCH 7/8] drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)
Hi, Jason, A quick question below: On 7/23/21 7:21 PM, Jason Ekstrand wrote: From: Thomas Hellström If our exported dma-bufs are imported by another instance of our driver, that instance will typically have the imported dma-bufs locked during dma_buf_map_attachment(). But the exporter also locks the same reservation object in the map_dma_buf() callback, which leads to recursive locking. So taking the lock inside _pin_pages_unlocked() is incorrect. Additionally, the current pinning code path is contrary to the defined way that pinning should occur. Remove the explicit pin/unpin from the map/umap functions and move them to the attach/detach allowing correct locking to occur, and to match the static dma-buf drm_prime pattern. Add a live selftest to exercise both dynamic and non-dynamic exports. v2: - Extend the selftest with a fake dynamic importer. - Provide real pin and unpin callbacks to not abuse the interface. v3: (ruhl) - Remove the dynamic export support and move the pinning into the attach/detach path. v4: (ruhl) - Put pages does not need to assert on the dma-resv v5: (jason) - Lock around dma_buf_unmap_attachment() when emulating a dynamic importer in the subtests. - Use pin_pages_unlocked v6: (jason) - Use dma_buf_attach instead of dma_buf_attach_dynamic in the selftests Why did we drop the dynamic importer from the selftests? Shouldn't we try to ensure compatibility with dynamic importers? /Thomas
Re: [Intel-gfx] [PATCH 0/4] Enable GuC submission by default on DG1
On 8/6/21 1:34 PM, Thomas Hellström (Intel) wrote: Hi, On 8/3/21 7:26 PM, Matthew Brost wrote: On Tue, Aug 03, 2021 at 02:15:13PM +0200, Daniel Vetter wrote: On Tue, Aug 3, 2021 at 6:53 AM Matthew Brost wrote: Minimum set of patches to enable GuC submission on DG1 and enable it by default. A little difficult to test as IGTs do not work with DG1 due to a bunch of uAPI features being disabled (e.g. relocations, caching memory options, etc...). Matt Auld has an igt series which fixes a lot of this stuff, would be good to do at least a Test-With run with that. It looks like Maarten now merged Matt's series to IGT. There is a series on IGT trybot with pending work to have some igt tests support relocations, https://patchwork.freedesktop.org/series/92043/ One of the tests that have WIP fixes is gem_exec_whisper, and that particular test has historically shown occasional hangs with GuC submission on DG1 so it would be very desirable if we could make that test in particular work (I haven't verified that that's the case) reliably. Also the following series: https://patchwork.freedesktop.org/series/93455/ tries a bit harder to get some more tests running, squashing the above series on top of latest IGT. Thanks, /Thomas Just verified that gem-exec-whisper is running now on DG1 on latest igt and the above series. Haven't tried with GuC submission, though. /Thomas
Re: [Intel-gfx] [PATCH 0/4] Enable GuC submission by default on DG1
On 8/6/21 1:34 PM, Thomas Hellström (Intel) wrote: Hi, On 8/3/21 7:26 PM, Matthew Brost wrote: On Tue, Aug 03, 2021 at 02:15:13PM +0200, Daniel Vetter wrote: On Tue, Aug 3, 2021 at 6:53 AM Matthew Brost wrote: Minimum set of patches to enable GuC submission on DG1 and enable it by default. A little difficult to test as IGTs do not work with DG1 due to a bunch of uAPI features being disabled (e.g. relocations, caching memory options, etc...). Matt Auld has an igt series which fixes a lot of this stuff, would be good to do at least a Test-With run with that. It looks like Maarten now merged Matt's series to IGT. There is a series on IGT trybot with pending work to have some igt tests support relocations, https://patchwork.freedesktop.org/series/92043/ One of the tests that have WIP fixes is gem_exec_whisper, and that particular test has historically shown occasional hangs with GuC submission on DG1 so it would be very desirable if we could make that test in particular work (I haven't verified that that's the case) reliably. Also the following series: https://patchwork.freedesktop.org/series/93455/ tries a bit harder to get some more tests running, squashing the above series on top of latest IGT. Thanks, /Thomas And also while we're working on getting igt adapted to uapi changes and to get more LMEM coverage in there, an IMO relevant test case to run manually is "piglit quick" on top of DG1-enabled OpenGL, checking for regressions and significant changes in execution time. /Thomas
Re: [Intel-gfx] [PATCH 0/4] Enable GuC submission by default on DG1
Hi, On 8/3/21 7:26 PM, Matthew Brost wrote: On Tue, Aug 03, 2021 at 02:15:13PM +0200, Daniel Vetter wrote: On Tue, Aug 3, 2021 at 6:53 AM Matthew Brost wrote: Minimum set of patches to enable GuC submission on DG1 and enable it by default. A little difficult to test as IGTs do not work with DG1 due to a bunch of uAPI features being disabled (e.g. relocations, caching memory options, etc...). Matt Auld has an igt series which fixes a lot of this stuff, would be good to do at least a Test-With run with that. It looks like Maarten now merged Matt's series to IGT. There is a series on IGT trybot with pending work to have some igt tests support relocations, https://patchwork.freedesktop.org/series/92043/ One of the tests that have WIP fixes is gem_exec_whisper, and that particular test has historically shown occasional hangs with GuC submission on DG1 so it would be very desirable if we could make that test in particular work (I haven't verified that that's the case) reliably. Also the following series: https://patchwork.freedesktop.org/series/93455/ tries a bit harder to get some more tests running, squashing the above series on top of latest IGT. Thanks, /Thomas
Re: [PATCH 1/4] drm/ttm: add a pointer to the allocating BO into ttm_resource
Hi, Christian, I know you have a lot on your plate, and that the drm community is a bit lax about following the kernel patch submission guidelines, but now that we're also spinning up a number of Intel developers on TTM could we please make a better effort with cover letters and commit messages so that they understand what the purpose and end goal of the series is. A reviewer shouldn't have to look at the last patch to try to get an understanding of what the series is doing and why. On 6/10/21 1:05 PM, Christian König wrote: We are going to need this for the next patch and it allows us to clean up amdgpu as well. The amdgpu changes are not reflected in the commit title. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 47 - drivers/gpu/drm/ttm/ttm_resource.c | 1 + include/drm/ttm/ttm_resource.h | 1 + 3 files changed, 19 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c index 194f9eecf89c..8e3f5da44e4f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c @@ -26,23 +26,12 @@ #include "amdgpu.h" -struct amdgpu_gtt_node { - struct ttm_buffer_object *tbo; - struct ttm_range_mgr_node base; -}; - static inline struct amdgpu_gtt_mgr * to_gtt_mgr(struct ttm_resource_manager *man) { return container_of(man, struct amdgpu_gtt_mgr, manager); } -static inline struct amdgpu_gtt_node * -to_amdgpu_gtt_node(struct ttm_resource *res) -{ - return container_of(res, struct amdgpu_gtt_node, base.base); -} - /** * DOC: mem_info_gtt_total * @@ -107,9 +96,9 @@ const struct attribute_group amdgpu_gtt_mgr_attr_group = { */ bool amdgpu_gtt_mgr_has_gart_addr(struct ttm_resource *res) { - struct amdgpu_gtt_node *node = to_amdgpu_gtt_node(res); + struct ttm_range_mgr_node *node = to_ttm_range_mgr_node(res); - return drm_mm_node_allocated(&node->base.mm_nodes[0]); + return drm_mm_node_allocated(&node->mm_nodes[0]); } /** @@ -129,7 +118,7 @@
static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man, { struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); uint32_t num_pages = PFN_UP(tbo->base.size); - struct amdgpu_gtt_node *node; + struct ttm_range_mgr_node *node; int r; spin_lock(&mgr->lock); @@ -141,19 +130,17 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man, atomic64_sub(num_pages, &mgr->available); spin_unlock(&mgr->lock); - node = kzalloc(struct_size(node, base.mm_nodes, 1), GFP_KERNEL); + node = kzalloc(struct_size(node, mm_nodes, 1), GFP_KERNEL); if (!node) { r = -ENOMEM; goto err_out; } - node->tbo = tbo; - ttm_resource_init(tbo, place, &node->base.base); - + ttm_resource_init(tbo, place, &node->base); if (place->lpfn) { spin_lock(&mgr->lock); r = drm_mm_insert_node_in_range(&mgr->mm, - &node->base.mm_nodes[0], + &node->mm_nodes[0], num_pages, tbo->page_alignment, 0, place->fpfn, place->lpfn, DRM_MM_INSERT_BEST); @@ -161,14 +148,14 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man, if (unlikely(r)) goto err_free; - node->base.base.start = node->base.mm_nodes[0].start; + node->base.start = node->mm_nodes[0].start; } else { - node->base.mm_nodes[0].start = 0; - node->base.mm_nodes[0].size = node->base.base.num_pages; - node->base.base.start = AMDGPU_BO_INVALID_OFFSET; + node->mm_nodes[0].start = 0; + node->mm_nodes[0].size = node->base.num_pages; + node->base.start = AMDGPU_BO_INVALID_OFFSET; } - *res = &node->base.base; + *res = &node->base; return 0; err_free: @@ -191,12 +178,12 @@ static int amdgpu_gtt_mgr_new(struct ttm_resource_manager *man, static void amdgpu_gtt_mgr_del(struct ttm_resource_manager *man, struct ttm_resource *res) { - struct amdgpu_gtt_node *node = to_amdgpu_gtt_node(res); + struct ttm_range_mgr_node *node = to_ttm_range_mgr_node(res); struct amdgpu_gtt_mgr *mgr = to_gtt_mgr(man); spin_lock(&mgr->lock); - if (drm_mm_node_allocated(&node->base.mm_nodes[0])) - drm_mm_remove_node(&node->base.mm_nodes[0]); + if (drm_mm_node_allocated(&node->mm_nodes[0])) + drm_mm_remove_node(&node->mm_nodes[0]); spin_unlock(&mgr->lock); atomic64_add(res
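The node conversion above leans on the usual kernel subclassing idiom: the generic resource is embedded as a member of the driver-specific node, and container_of() downcasts from the base pointer back to the derived type (as to_ttm_range_mgr_node() does). A self-contained userspace sketch with trimmed-down stand-in structs, not the real TTM definitions:

```c
#include <stddef.h>

/* Minimal userspace re-implementation of the kernel's container_of();
 * struct names mirror the patch but the fields are simplified. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct ttm_resource {
	unsigned long start;
	unsigned long num_pages;
};

/* The driver node embeds the generic resource as its base class. */
struct ttm_range_mgr_node {
	struct ttm_resource base;
	unsigned long mm_node_start;	/* stand-in for mm_nodes[] */
};

/* Downcast from the base pointer handed around by generic code. */
static struct ttm_range_mgr_node *
to_ttm_range_mgr_node(struct ttm_resource *res)
{
	return container_of(res, struct ttm_range_mgr_node, base);
}
```

Generic TTM code only ever sees the `struct ttm_resource *`; the backend recovers its own node by subtracting the member offset, which is why the series can drop the separate driver allocation.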
Re: [PATCH 3/9] drm/vmwgfx: Fix subresource updates with new contexts
Hi, On 6/9/21 7:23 PM, Zack Rusin wrote: The has_dx variable was only set during the initialization which meant that UPDATE_SUBRESOURCE was never used. We were emulating it with UPDATE_GB_IMAGE but that's always been a stop-gap. Instead of has_dx which has been deprecated a long time ago we need to check for whether shader model 4.0 or newer is available to the device. Stupid question perhaps, but isn't UPDATE_SUBRESOURCE available with SVGA_CAP_DX regardless of the SM capabilities of the underlying device? Thanks, Thomas
Re: [PATCH 01/10] drm/ttm: allocate resource object instead of embedding it v2
Hi, On 6/8/21 9:14 AM, Christian König wrote: Am 08.06.21 um 07:29 schrieb Thomas Hellström (Intel): Hi, On 6/7/21 7:59 PM, Christian König wrote: Am 07.06.21 um 19:58 schrieb Thomas Hellström (Intel): On 6/7/21 7:54 PM, Christian König wrote: Am 07.06.21 um 19:06 schrieb Thomas Hellström (Intel): On 6/7/21 6:40 PM, Thomas Hellström (Intel) wrote: On 6/2/21 12:09 PM, Christian König wrote: ... @@ -728,14 +728,15 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, */ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, const struct ttm_place *place, - struct ttm_resource *mem, + struct ttm_resource **mem, struct ttm_operation_ctx *ctx) { struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man = ttm_manager_type(bdev, mem->mem_type); + struct ttm_resource_manager *man; struct ww_acquire_ctx *ticket; int ret; + man = ttm_manager_type(bdev, (*mem)->mem_type); Isn't (*mem) uninitialized here? Should be place->mem_type? Eviction is immediately sent to the bushes. Got at least one additional NULL pointer deref to track down in the eviction code, but could be a merge error of mine as well. Actually this last one was probably due to a bad temporary fix of the above one. I've found one more warning during my testing now. But that is just a false positive. Apart from that I haven't seen any other fallout, but fingers crossed. vmwgfx doesn't seem too happy. It works AFAICT, but warns in vmw_move() about ttm_bo_assign_mem() replacing an existing resource. Yeah, that's the one I've just fixed. Patch is underway. If that's the move_to_ghost patch, I don't think that would fix the vmwgfx issue, as IIRC vmwgfx never uses ghost objects. Mhm, could be that vmwgfx is hitting the same warning with a different backtrace. Do you have the log to double check? Unfortunately not, but IIRC it was directly from vmw_move(). /Thomas
Re: [PATCH 01/10] drm/ttm: allocate resource object instead of embedding it v2
Hi, On 6/7/21 7:59 PM, Christian König wrote: Am 07.06.21 um 19:58 schrieb Thomas Hellström (Intel): On 6/7/21 7:54 PM, Christian König wrote: Am 07.06.21 um 19:06 schrieb Thomas Hellström (Intel): On 6/7/21 6:40 PM, Thomas Hellström (Intel) wrote: On 6/2/21 12:09 PM, Christian König wrote: ... @@ -728,14 +728,15 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, */ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, const struct ttm_place *place, - struct ttm_resource *mem, + struct ttm_resource **mem, struct ttm_operation_ctx *ctx) { struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man = ttm_manager_type(bdev, mem->mem_type); + struct ttm_resource_manager *man; struct ww_acquire_ctx *ticket; int ret; + man = ttm_manager_type(bdev, (*mem)->mem_type); Isn't (*mem) uninitialized here? Should be place->mem_type? Eviction is immediately sent to the bushes. Got at least one additional NULL pointer deref to track down in the eviction code, but could be a merge error of mine as well. Actually this last one was probably due to a bad temporary fix of the above one. I've found one more warning during my testing now. But that is just a false positive. Apart from that I haven't seen any other fallout, but fingers crossed. vmwgfx doesn't seem too happy. It works AFAICT, but warns in vmw_move() about ttm_bo_assign_mem() replacing an existing resource. Yeah, that's the one I've just fixed. Patch is underway. If that's the move_to_ghost patch, I don't think that would fix the vmwgfx issue, as IIRC vmwgfx never uses ghost objects. /Thomas
Re: [PATCH] drm/ttm: fix warning after moving resource to ghost obj
On 6/7/21 7:57 PM, Christian König wrote: After we moved the resource to the ghost the bo->resource pointer needs to be resetted since the owner of the resource is now the ghost. s/resetted/reset/ Signed-off-by: Christian König Reviewed-by: Thomas Hellström
Re: [PATCH 01/10] drm/ttm: allocate resource object instead of embedding it v2
On 6/7/21 7:54 PM, Christian König wrote: Am 07.06.21 um 19:06 schrieb Thomas Hellström (Intel): On 6/7/21 6:40 PM, Thomas Hellström (Intel) wrote: On 6/2/21 12:09 PM, Christian König wrote: ... @@ -728,14 +728,15 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, */ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, const struct ttm_place *place, - struct ttm_resource *mem, + struct ttm_resource **mem, struct ttm_operation_ctx *ctx) { struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man = ttm_manager_type(bdev, mem->mem_type); + struct ttm_resource_manager *man; struct ww_acquire_ctx *ticket; int ret; + man = ttm_manager_type(bdev, (*mem)->mem_type); Isn't (*mem) uninitialized here? Should be place->mem_type? Eviction is immediately sent to the bushes. Got at least one additional NULL pointer deref to track down in the eviction code, but could be a merge error of mine as well. Actually this last one was probably due to a bad temporary fix of the above one. I've found one more warning during my testing now. But that is just a false positive. Apart from that I haven't seen any other fallout, but fingers crossed. vmwgfx doesn't seem too happy. It works AFAICT, but warns in vmw_move() about ttm_bo_assign_mem() replacing an existing resource. /Thomas Christian. /Thomas
Re: [PATCH] drm/ttm: fix access to uninitialized variable.
On 6/7/21 7:11 PM, Christian König wrote: The resource is not allocated yet, so no chance that this will work. Use the placement instead. Signed-off-by: Christian König Reviewed-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_bo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 7e7284da5630..0c74f4cb2a3b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -736,7 +736,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, struct ww_acquire_ctx *ticket; int ret; - man = ttm_manager_type(bdev, (*mem)->mem_type); + man = ttm_manager_type(bdev, place->mem_type); ticket = dma_resv_locking_ctx(bo->base.resv); do { ret = ttm_resource_alloc(bo, place, mem);
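The bug class fixed here is dereferencing an output parameter before anything has been stored through it: `(*mem)` only becomes valid once the allocation succeeds, so the requested placement is the only valid source for the memory type. A simplified, hypothetical sketch of the corrected pattern (made-up stand-in types, not the real TTM structures):

```c
#include <stdlib.h>

/* Stand-in types; only the out-parameter discipline matters here. */
struct place { unsigned int mem_type; };
struct resource { unsigned int mem_type; };

static int force_space(const struct place *place, struct resource **mem)
{
	/* WRONG would be: (*mem)->mem_type — *mem is still unassigned
	 * at this point. The requested placement is the source of truth. */
	unsigned int type = place->mem_type;

	*mem = malloc(sizeof(**mem));	/* stand-in for ttm_resource_alloc() */
	if (!*mem)
		return -1;
	(*mem)->mem_type = type;	/* only now is (*mem) valid */
	return 0;
}
```

Callers pass the address of a NULL-initialized pointer and may only dereference it after a successful return, which is exactly the contract the `**mem` conversion in the series introduces.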
Re: [PATCH 01/10] drm/ttm: allocate resource object instead of embedding it v2
On 6/7/21 6:40 PM, Thomas Hellström (Intel) wrote: On 6/2/21 12:09 PM, Christian König wrote: ... @@ -728,14 +728,15 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, */ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, const struct ttm_place *place, - struct ttm_resource *mem, + struct ttm_resource **mem, struct ttm_operation_ctx *ctx) { struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man = ttm_manager_type(bdev, mem->mem_type); + struct ttm_resource_manager *man; struct ww_acquire_ctx *ticket; int ret; + man = ttm_manager_type(bdev, (*mem)->mem_type); Isn't (*mem) uninitialized here? Should be place->mem_type? Eviction is immediately sent to the bushes. Got at least one additional NULL pointer deref to track down in the eviction code, but could be a merge error of mine as well. Actually this last one was probably due to a bad temporary fix of the above one. /Thomas
Re: [PATCH 01/10] drm/ttm: allocate resource object instead of embedding it v2
On 6/2/21 12:09 PM, Christian König wrote: ... @@ -728,14 +728,15 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, */ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, const struct ttm_place *place, - struct ttm_resource *mem, + struct ttm_resource **mem, struct ttm_operation_ctx *ctx) { struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man = ttm_manager_type(bdev, mem->mem_type); + struct ttm_resource_manager *man; struct ww_acquire_ctx *ticket; int ret; + man = ttm_manager_type(bdev, (*mem)->mem_type); Isn't (*mem) uninitialized here? Should be place->mem_type? Eviction is immediately sent to the bushes. Got at least one additional NULL pointer deref to track down in the eviction code, but could be a merge error of mine as well. /Thomas
Re: [PATCH] drm/ttm: nuke VM_MIXEDMAP on BO mappings v2
On 6/7/21 3:58 PM, Christian König wrote: We discussed if that is really the right approach for quite a while now, but digging deeper into a bug report on arm showed that this is actually horribly broken right now. The reason for this is that vmf_insert_mixed_prot() always tries to grab a reference to the underlying page on architectures without ARCH_HAS_PTE_SPECIAL and as far as I can see also enabled GUP. So nuke using VM_MIXEDMAP here and use VM_PFNMAP instead. Also make sure to reject mappings without VM_SHARED. v2: reject COW mappings, merge function with only caller Signed-off-by: Christian König Bugs: https://gitlab.freedesktop.org/drm/amd/-/issues/1606#note_936174 --- drivers/gpu/drm/ttm/ttm_bo_vm.c | 44 +++-- 1 file changed, 14 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 61828488ae2b..c9edb75626d9 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -359,12 +359,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf, * at arbitrary times while the data is mmap'ed. * See vmf_insert_mixed_prot() for a discussion.
*/ - if (vma->vm_flags & VM_MIXEDMAP) - ret = vmf_insert_mixed_prot(vma, address, - __pfn_to_pfn_t(pfn, PFN_DEV), - prot); - else - ret = vmf_insert_pfn_prot(vma, address, pfn, prot); + ret = vmf_insert_pfn_prot(vma, address, pfn, prot); /* Never error on prefaulted PTEs */ if (unlikely((ret & VM_FAULT_ERROR))) { @@ -411,15 +406,9 @@ vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot) pfn = page_to_pfn(page); /* Prefault the entire VMA range right away to avoid further faults */ - for (address = vma->vm_start; address < vma->vm_end; address += PAGE_SIZE) { - - if (vma->vm_flags & VM_MIXEDMAP) - ret = vmf_insert_mixed_prot(vma, address, - __pfn_to_pfn_t(pfn, PFN_DEV), - prot); - else - ret = vmf_insert_pfn_prot(vma, address, pfn, prot); - } + for (address = vma->vm_start; address < vma->vm_end; +address += PAGE_SIZE) + ret = vmf_insert_pfn_prot(vma, address, pfn, prot); return ret; } @@ -560,8 +549,16 @@ static const struct vm_operations_struct ttm_bo_vm_ops = { .access = ttm_bo_vm_access, }; -static void ttm_bo_mmap_vma_setup(struct ttm_buffer_object *bo, struct vm_area_struct *vma) +int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct ttm_buffer_object *bo) { + /* Enforce VM_SHARED here since without it we would have really strange +* behavior on COW. +*/ Nit: Perhaps "Enforce no COW.." since mappings are allowed with VM_SHARED iff VM_MAYWRITE is not set. Also style consistency with comments: First /* followed by line-break or are you adapting the above style for ttm? With that fixed, Reviewed-by: Thomas Hellström + if (is_cow_mapping(vma->vm_flags)) + return -EINVAL; + + ttm_bo_get(bo); + /* * Drivers may want to override the vm_ops field. Otherwise we * use TTM's default callbacks. 
@@ -576,21 +573,8 @@ static void ttm_bo_mmap_vma_setup(struct ttm_buffer_object *bo, struct vm_area_s vma->vm_private_data = bo; - /* -* We'd like to use VM_PFNMAP on shared mappings, where -* (vma->vm_flags & VM_SHARED) != 0, for performance reasons, -* but for some reason VM_PFNMAP + x86 PAT + write-combine is very -* bad for performance. Until that has been sorted out, use -* VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719 -*/ - vma->vm_flags |= VM_MIXEDMAP; + vma->vm_flags |= VM_PFNMAP; vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP; -} - -int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct ttm_buffer_object *bo) -{ - ttm_bo_get(bo); - ttm_bo_mmap_vma_setup(bo, vma); return 0; } EXPORT_SYMBOL(ttm_bo_mmap_obj);
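The is_cow_mapping() check being added in this patch tests whether a mapping is private yet potentially writable. The userspace sketch below mirrors the helper from include/linux/mm.h; the flag values are copied by hand and should be treated as illustrative rather than authoritative:

```c
#include <stdbool.h>

/* Illustrative copies of the vm_flags bits; the real definitions live
 * in include/linux/mm.h. */
#define VM_SHARED	0x00000008UL
#define VM_MAYWRITE	0x00000020UL

/* A mapping is copy-on-write when it is private (no VM_SHARED) but
 * could become writable (VM_MAYWRITE set) — e.g. a MAP_PRIVATE mapping
 * opened with PROT_WRITE or mprotect-able to it. */
static bool is_cow_mapping(unsigned long flags)
{
	return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}
```

Note that a read-only MAP_PRIVATE mapping of an O_RDONLY file clears VM_MAYWRITE, so it passes this check even without VM_SHARED — which is the subtlety behind the "Enforce no COW" comment wording discussed above.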
Re: [PATCH 1/4] drm/ttm: add TTM_PL_FLAG_TEMPORARY flag v3
Sure. LGTM, Reviewed-by: Thomas Hellström On 6/7/21 2:36 PM, Christian König wrote: Thomas, any comments on this? Is the purpose of this now clear enough? Thanks, Christian. Am 01.06.21 um 14:25 schrieb Christian König: From: Lang Yu Sometimes drivers need to use bounce buffers to evict BOs. While those reside in some domain they are not necessarily suitable for CS. Add a flag so that drivers can note that a bounce buffer needs to be reallocated during validation. v2: add detailed comments v3 (chk): merge commits and rework commit message Suggested-by: Christian König Signed-off-by: Lang Yu Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 3 +++ include/drm/ttm/ttm_placement.h | 7 +-- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 51a94fd63bd7..6b393502198e 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -929,6 +929,9 @@ static bool ttm_bo_places_compat(const struct ttm_place *places, { unsigned i; + if (mem->placement & TTM_PL_FLAG_TEMPORARY) + return false; + for (i = 0; i < num_placement; i++) { const struct ttm_place *heap = &places[i]; diff --git a/include/drm/ttm/ttm_placement.h b/include/drm/ttm/ttm_placement.h index aa6ba4d0cf78..8995c9e4ec1b 100644 --- a/include/drm/ttm/ttm_placement.h +++ b/include/drm/ttm/ttm_placement.h @@ -47,8 +47,11 @@ * top of the memory area, instead of the bottom. */ -#define TTM_PL_FLAG_CONTIGUOUS (1 << 19) -#define TTM_PL_FLAG_TOPDOWN (1 << 22) +#define TTM_PL_FLAG_CONTIGUOUS (1 << 0) +#define TTM_PL_FLAG_TOPDOWN (1 << 1) + +/* For multihop handling */ +#define TTM_PL_FLAG_TEMPORARY (1 << 2) /** * struct ttm_place
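The flag's effect can be sketched as follows: a resource tagged TTM_PL_FLAG_TEMPORARY is declared incompatible with every requested placement, so the next validation is forced to reallocate the BO. Simplified stand-in types; only the flag values come from the patch above:

```c
#include <stdbool.h>

/* Flag values as defined in the patch. */
#define TTM_PL_FLAG_CONTIGUOUS	(1 << 0)
#define TTM_PL_FLAG_TOPDOWN	(1 << 1)
#define TTM_PL_FLAG_TEMPORARY	(1 << 2)

/* Trimmed stand-ins for ttm_resource / ttm_place. */
struct mem { unsigned int mem_type; unsigned int placement; };
struct plc { unsigned int mem_type; };

/* Sketch of ttm_bo_places_compat(): a TEMPORARY resource never matches,
 * even if the memory type would otherwise be acceptable. */
static bool places_compat(const struct plc *places, unsigned int n,
			  const struct mem *mem)
{
	if (mem->placement & TTM_PL_FLAG_TEMPORARY)
		return false;
	for (unsigned int i = 0; i < n; i++)
		if (places[i].mem_type == mem->mem_type)
			return true;
	return false;
}
```

This is how a bounce-buffer resource, although it "resides in some domain", is kept out of command submission: validation sees it as incompatible and reallocates.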
Re: [PATCH] drm/ttm: fix missing res assignment in ttm_range_man_alloc
On 6/7/21 12:40 PM, Christian König wrote: That somehow went missing. Signed-off-by: Christian König Fixes: cb1c81467af3 ("drm/ttm: flip the switch for driver allocated resources v2") Reported-by: Thomas Hellström Reviewed-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_range_manager.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c b/drivers/gpu/drm/ttm/ttm_range_manager.c index c32e1aee2481..03395386e8a7 100644 --- a/drivers/gpu/drm/ttm/ttm_range_manager.c +++ b/drivers/gpu/drm/ttm/ttm_range_manager.c @@ -88,12 +88,14 @@ static int ttm_range_man_alloc(struct ttm_resource_manager *man, place->fpfn, lpfn, mode); spin_unlock(&rman->lock); - if (unlikely(ret)) + if (unlikely(ret)) { kfree(node); - else - node->base.start = node->mm_nodes[0].start; + return ret; + } - return ret; + node->base.start = node->mm_nodes[0].start; + *res = &node->base; + return 0; } static void ttm_range_man_free(struct ttm_resource_manager *man,
Re: [PATCH 10/10] drm/ttm: flip the switch for driver allocated resources v2
On 6/7/21 12:37 PM, Christian König wrote: Am 07.06.21 um 12:15 schrieb Thomas Hellström (Intel): On 6/2/21 12:09 PM, Christian König wrote: Instead of both driver and TTM allocating memory finalize embedding the ttm_resource object as base into the driver backends. v2: fix typo in vmwgfx grid mgr and double init in amdgpu_vram_mgr.c Signed-off-by: Christian König Reviewed-by: Matthew Auld --- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 44 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h | 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 60 +-- drivers/gpu/drm/drm_gem_vram_helper.c | 3 +- drivers/gpu/drm/nouveau/nouveau_bo.c | 8 +-- drivers/gpu/drm/nouveau/nouveau_mem.c | 11 ++-- drivers/gpu/drm/nouveau/nouveau_mem.h | 14 ++--- drivers/gpu/drm/nouveau/nouveau_ttm.c | 32 +- drivers/gpu/drm/ttm/ttm_range_manager.c | 23 +++ drivers/gpu/drm/ttm/ttm_resource.c | 18 +- drivers/gpu/drm/ttm/ttm_sys_manager.c | 12 ++-- drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c | 24 drivers/gpu/drm/vmwgfx/vmwgfx_thp.c | 27 - include/drm/ttm/ttm_range_manager.h | 3 +- include/drm/ttm/ttm_resource.h | 43 ++--- 16 files changed, 140 insertions(+), 189 deletions(-) ... 
diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c b/drivers/gpu/drm/ttm/ttm_range_manager.c index ce5d07ca384c..c32e1aee2481 100644 --- a/drivers/gpu/drm/ttm/ttm_range_manager.c +++ b/drivers/gpu/drm/ttm/ttm_range_manager.c @@ -58,7 +58,7 @@ to_range_manager(struct ttm_resource_manager *man) static int ttm_range_man_alloc(struct ttm_resource_manager *man, struct ttm_buffer_object *bo, const struct ttm_place *place, - struct ttm_resource *mem) + struct ttm_resource **res) { struct ttm_range_manager *rman = to_range_manager(man); struct ttm_range_mgr_node *node; @@ -83,37 +83,30 @@ static int ttm_range_man_alloc(struct ttm_resource_manager *man, spin_lock(&rman->lock); ret = drm_mm_insert_node_in_range(mm, &node->mm_nodes[0], - mem->num_pages, bo->page_alignment, 0, + node->base.num_pages, + bo->page_alignment, 0, place->fpfn, lpfn, mode); spin_unlock(&rman->lock); - if (unlikely(ret)) { + if (unlikely(ret)) kfree(node); - } else { - mem->mm_node = &node->mm_nodes[0]; - mem->start = node->mm_nodes[0].start; - } + else + node->base.start = node->mm_nodes[0].start; return ret; } Looks like this patch forgets to assign *@res. Null pointer derefs when testing i915. I should really CC the Intel list for TTM patches as well. The CI system should have spotted that. Unfortunately, the dg1 system is not participating in CI yet AFAICT, but moving forward I think it's a good idea. /Thomas
Re: [PATCH 10/10] drm/ttm: flip the switch for driver allocated resources v2
On 6/2/21 12:09 PM, Christian König wrote: Instead of both driver and TTM allocating memory finalize embedding the ttm_resource object as base into the driver backends. v2: fix typo in vmwgfx grid mgr and double init in amdgpu_vram_mgr.c Signed-off-by: Christian König Reviewed-by: Matthew Auld --- drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 44 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.c| 2 +- .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h| 5 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 60 +-- drivers/gpu/drm/drm_gem_vram_helper.c | 3 +- drivers/gpu/drm/nouveau/nouveau_bo.c | 8 +-- drivers/gpu/drm/nouveau/nouveau_mem.c | 11 ++-- drivers/gpu/drm/nouveau/nouveau_mem.h | 14 ++--- drivers/gpu/drm/nouveau/nouveau_ttm.c | 32 +- drivers/gpu/drm/ttm/ttm_range_manager.c | 23 +++ drivers/gpu/drm/ttm/ttm_resource.c| 18 +- drivers/gpu/drm/ttm/ttm_sys_manager.c | 12 ++-- drivers/gpu/drm/vmwgfx/vmwgfx_gmrid_manager.c | 24 drivers/gpu/drm/vmwgfx/vmwgfx_thp.c | 27 - include/drm/ttm/ttm_range_manager.h | 3 +- include/drm/ttm/ttm_resource.h| 43 ++--- 16 files changed, 140 insertions(+), 189 deletions(-) ... 
diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c b/drivers/gpu/drm/ttm/ttm_range_manager.c index ce5d07ca384c..c32e1aee2481 100644 --- a/drivers/gpu/drm/ttm/ttm_range_manager.c +++ b/drivers/gpu/drm/ttm/ttm_range_manager.c @@ -58,7 +58,7 @@ to_range_manager(struct ttm_resource_manager *man) static int ttm_range_man_alloc(struct ttm_resource_manager *man, struct ttm_buffer_object *bo, const struct ttm_place *place, - struct ttm_resource *mem) + struct ttm_resource **res) { struct ttm_range_manager *rman = to_range_manager(man); struct ttm_range_mgr_node *node; @@ -83,37 +83,30 @@ static int ttm_range_man_alloc(struct ttm_resource_manager *man, spin_lock(&rman->lock); ret = drm_mm_insert_node_in_range(mm, &node->mm_nodes[0], - mem->num_pages, bo->page_alignment, 0, + node->base.num_pages, + bo->page_alignment, 0, place->fpfn, lpfn, mode); spin_unlock(&rman->lock); - if (unlikely(ret)) { + if (unlikely(ret)) kfree(node); - } else { - mem->mm_node = &node->mm_nodes[0]; - mem->start = node->mm_nodes[0].start; - } + else + node->base.start = node->mm_nodes[0].start; return ret; } Looks like this patch forgets to assign *@res. Null pointer derefs when testing i915. BTW shouldn't we return the struct ttm_resource ptr here rather than passing it as an argument? /Thomas
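The alternative Thomas raises — returning the resource pointer directly with errors encoded ERR_PTR-style, instead of an int return plus a **res out-parameter — would make a forgotten *res assignment impossible by construction, since the only way to report success is to return the pointer. A hypothetical userspace sketch with minimal re-implementations of the kernel's ERR_PTR helpers (the real ones live in include/linux/err.h; TTM kept the int/out-parameter form):

```c
#include <errno.h>
#include <stdlib.h>

/* Minimal userspace versions of the kernel's ERR_PTR/IS_ERR helpers:
 * errno values are smuggled in the top 4095 addresses. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-4095;
}

struct resource { unsigned long start; };

/* Hypothetical allocator in the pointer-returning style; 'fail'
 * simulates the drm_mm range insertion failing. */
static struct resource *range_man_alloc(unsigned long start, int fail)
{
	struct resource *res;

	if (fail)
		return ERR_PTR(-ENOSPC);
	res = malloc(sizeof(*res));
	if (!res)
		return ERR_PTR(-ENOMEM);
	res->start = start;
	return res;
}
```

The trade-off is that every caller must remember IS_ERR() instead of a plain NULL check, which is one reason kernel APIs are split between the two styles.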
Re: [PATCH 01/10] drm/ttm: allocate resource object instead of embedding it v2
Hi, Christian, It looks like all patches in the series have been reviewed or acked by Matthew, and while still a little worried about the final outcome of embedding a struct ttm_mem_resource, FWIW, Acked-by: Thomas Hellström /Thomas On 6/2/21 12:09 PM, Christian König wrote: To improve the handling we want the establish the resource object as base class for the backend allocations. v2: add missing error handling Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 54 +++--- drivers/gpu/drm/nouveau/nouveau_bo.c | 2 +- drivers/gpu/drm/radeon/radeon_ttm.c| 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 83 -- drivers/gpu/drm/ttm/ttm_bo_util.c | 43 ++- drivers/gpu/drm/ttm/ttm_resource.c | 31 +--- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 2 +- include/drm/ttm/ttm_bo_api.h | 1 - include/drm/ttm/ttm_bo_driver.h| 10 ++- include/drm/ttm/ttm_resource.h | 4 +- 11 files changed, 110 insertions(+), 126 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 03c6b63d1d54..59723c3d5826 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -362,14 +362,14 @@ int amdgpu_bo_create_kernel_at(struct amdgpu_device *adev, if (cpu_addr) amdgpu_bo_kunmap(*bo_ptr); - ttm_resource_free(&(*bo_ptr)->tbo, (*bo_ptr)->tbo.resource); + ttm_resource_free(&(*bo_ptr)->tbo, &(*bo_ptr)->tbo.resource); for (i = 0; i < (*bo_ptr)->placement.num_placement; ++i) { (*bo_ptr)->placements[i].fpfn = offset >> PAGE_SHIFT; (*bo_ptr)->placements[i].lpfn = (offset + size) >> PAGE_SHIFT; } r = ttm_bo_mem_space(&(*bo_ptr)->tbo, &(*bo_ptr)->placement, -(*bo_ptr)->tbo.resource, ); +&(*bo_ptr)->tbo.resource, ); if (r) goto error; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 663aa7d2e2ea..69db89261650 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -491,7 +491,7 @@ static int amdgpu_bo_move(struct ttm_buffer_object *bo, bool evict, return r; amdgpu_ttm_backend_unbind(bo->bdev, bo->ttm); - ttm_resource_free(bo, bo->resource); + ttm_resource_free(bo, >resource); ttm_bo_assign_mem(bo, new_mem); goto out; } @@ -950,9 +950,9 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo) struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev); struct ttm_operation_ctx ctx = { false, false }; struct amdgpu_ttm_tt *gtt = (void *)bo->ttm; - struct ttm_resource tmp; struct ttm_placement placement; struct ttm_place placements; + struct ttm_resource *tmp; uint64_t addr, flags; int r; @@ -962,37 +962,37 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo) addr = amdgpu_gmc_agp_addr(bo); if (addr != AMDGPU_BO_INVALID_OFFSET) { bo->resource->start = addr >> PAGE_SHIFT; - } else { + return 0; + } - /* allocate GART space */ - placement.num_placement = 1; - placement.placement = - placement.num_busy_placement = 1; - placement.busy_placement = - placements.fpfn = 0; - placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT; - placements.mem_type = TTM_PL_TT; - placements.flags = bo->resource->placement; - - r = ttm_bo_mem_space(bo, , , ); - if (unlikely(r)) - return r; + /* allocate GART space */ + placement.num_placement = 1; + placement.placement = + placement.num_busy_placement = 1; + placement.busy_placement = + placements.fpfn = 0; + placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT; + placements.mem_type = TTM_PL_TT; + placements.flags = bo->resource->placement; - /* compute PTE flags for this buffer object */ - flags = amdgpu_ttm_tt_pte_flags(adev, bo->ttm, ); + r = ttm_bo_mem_space(bo, , , ); + if (unlikely(r)) + return r; - /* Bind pages */ - gtt->offset = (u64)tmp.start << PAGE_SHIFT; - r = amdgpu_ttm_gart_bind(adev, bo, flags); - if (unlikely(r)) { - ttm_resource_free(bo, ); - return r; - } + /* compute PTE flags for this buffer object */ + flags = 
amdgpu_ttm_tt_pte_flags(adev, bo->ttm, tmp); - ttm_resource_free(bo, bo->resource); -
Re: [PATCH] drm/ttm: nuke VM_MIXEDMAP on BO mappings
On 6/2/21 8:36 PM, Daniel Vetter wrote: On Wed, Jun 02, 2021 at 02:21:17PM +0200, Thomas Hellström wrote: On 6/2/21 2:04 PM, Christian König wrote: Am 02.06.21 um 13:24 schrieb Thomas Hellström (Intel): [SNIP] @@ -576,14 +565,10 @@ static void ttm_bo_mmap_vma_setup(struct ttm_buffer_object *bo, struct vm_area_s vma->vm_private_data = bo; - /* - * We'd like to use VM_PFNMAP on shared mappings, where - * (vma->vm_flags & VM_SHARED) != 0, for performance reasons, - * but for some reason VM_PFNMAP + x86 PAT + write-combine is very - * bad for performance. Until that has been sorted out, use - * VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719 + /* Enforce VM_SHARED here since no driver backend actually supports COW + * on TTM buffer object mappings. I think by default all TTM drivers support COW mappings in the sense that written data never makes it to the bo but stays in anonymous pages, although I can't find a single usecase. So the comment should be changed to state that they are useless for us and that we can't support COW mappings with VM_PFNMAP. Well the problem I see with that is that it only works as long as the BO is in system memory. When it then suddenly migrates to VRAM everybody sees the same content again and the COW pages are dropped. That is really inconsistent and I can't see why we would want to do that. Hmm, yes, that's actually a bug in drm_vma_manager(). Hui? How is that related to drm_vma_manager() ? Last argument of "unmap_mapping_range()" is "even_cows". Additionally to that, when you allow COW mappings you need to make sure your COWed pages have the right caching attribute and that the reference count is initialized and taken into account properly. No driver actually gets that right at the moment. I was under the impression that COW'ed pages were handled transparently by the vm, you'd always get cached, properly refcounted COW'ed pages, but anyway since we're going to ditch support for them, it doesn't really matter.
Yeah, but I would have expected that the new COWed page should have the same caching attributes as the old one, and that is not really the case. */ - vma->vm_flags |= VM_MIXEDMAP; + vma->vm_flags |= VM_PFNMAP | VM_SHARED; Hmm, shouldn't we refuse COW mappings instead, like my old patch on this subject did? In theory someone could be setting up what she thinks is a private mapping to a shared buffer object, and write sensitive data to it, which will immediately leak. It's a simple check, could open-code if necessary. Yeah, thought about that as well. Rejecting things would mean we potentially break userspace which just happened to work by coincidence previously. Not totally evil, but not nice either. How about we do a WARN_ON_ONCE(!(vma->vm_flags & VM_SHARED)); instead? Umm, yes, but that wouldn't notify the user, and would be triggerable from user-space. But you can also set up legal non-COW mappings without the VM_SHARED flag, IIRC, see is_cow_mapping(). I think when this was up for discussion last time we arrived at a vma_is_cow_mapping() utility... Well userspace could trigger that only once, so no spamming of the log can be expected here. And extra warnings in the logs are usually reported by people rather quickly. OK, I'm mostly worried about adding a security flaw that we know about from the start. VM_SHARED is already cleared in vma_set_page_prot() due to the VM_PFNMAP check in vma_wants_writenotify. Yes, but that's only on a local variable to get a write-protected prot. vmwgfx does the same for its dirty-tracking. Here we're debating setting VM_SHARED on a private mapping. I'm honestly not sure whether userspace can then even notice this or anything, so might be worth a quick testcase. The net result is that in the very unlikely case the user requested a private GPU mapping to write secret data into, that secret data is no longer secret.
And, for example in the case of AMD's SEV encryption, that data would have been encrypted in an anonymous page with COW mappings, but not so if we add VM_SHARED: then it will be unencrypted in GPU pages. Then I think it's better to refuse COW mappings in mmap: if (is_cow_mapping(vma->vm_flags)) return -EINVAL; This will still allow private read-only mappings, which is OK. And if someone was actually relying on private COW'd GPU mappings, we'd only break the code slightly more... /Thomas Even if I'm wrong here we shouldn't allow cow mappings of gem_bo, that just seems too nasty with all the side-effects. Completely agree. -Daniel /Thomas Christian. /Thomas
Re: [PATCH 02/10] drm/ttm: flip over the range manager to self allocated nodes
On 6/2/21 8:41 PM, Christian König wrote: Am 02.06.21 um 17:28 schrieb Thomas Hellström (Intel): Hi! On 6/2/21 4:17 PM, Christian König wrote: Am 02.06.21 um 16:13 schrieb Thomas Hellström (Intel): On 6/2/21 3:07 PM, Christian König wrote: Am 02.06.21 um 14:33 schrieb Thomas Hellström (Intel): On 6/2/21 2:11 PM, Christian König wrote: Am 02.06.21 um 13:44 schrieb Thomas Hellström (Intel): On 6/2/21 12:09 PM, Christian König wrote: Start with the range manager to make the resource object the base class for the allocated nodes. While at it cleanup a lot of the code around that. Signed-off-by: Christian König Reviewed-by: Matthew Auld --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 1 + drivers/gpu/drm/drm_gem_vram_helper.c | 2 + drivers/gpu/drm/nouveau/nouveau_ttm.c | 2 + drivers/gpu/drm/qxl/qxl_ttm.c | 1 + drivers/gpu/drm/radeon/radeon_ttm.c | 1 + drivers/gpu/drm/ttm/ttm_range_manager.c | 56 ++--- drivers/gpu/drm/ttm/ttm_resource.c | 26 include/drm/ttm/ttm_bo_driver.h | 26 include/drm/ttm/ttm_range_manager.h | 43 +++ include/drm/ttm/ttm_resource.h | 3 ++ 10 files changed, 111 insertions(+), 50 deletions(-) create mode 100644 include/drm/ttm/ttm_range_manager.h diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 69db89261650..df1f185faae9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -45,6 +45,7 @@ #include #include #include +#include #include diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c index 83e7258c7f90..17a4c5d47b6a 100644 --- a/drivers/gpu/drm/drm_gem_vram_helper.c +++ b/drivers/gpu/drm/drm_gem_vram_helper.c @@ -17,6 +17,8 @@ #include #include +#include + static const struct drm_gem_object_funcs drm_gem_vram_object_funcs; /** diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c index 65430912ff72..b08b8efeefba 100644 --- a/drivers/gpu/drm/nouveau/nouveau_ttm.c +++ 
b/drivers/gpu/drm/nouveau/nouveau_ttm.c @@ -26,6 +26,8 @@ #include #include +#include + #include "nouveau_drv.h" #include "nouveau_gem.h" #include "nouveau_mem.h" diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c index 8aa87b8edb9c..19fd39d9a00c 100644 --- a/drivers/gpu/drm/qxl/qxl_ttm.c +++ b/drivers/gpu/drm/qxl/qxl_ttm.c @@ -32,6 +32,7 @@ #include #include #include +#include #include "qxl_drv.h" #include "qxl_object.h" diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index cdffa9b65108..ad2a5a791bba 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -45,6 +45,7 @@ #include #include #include +#include #include "radeon_reg.h" #include "radeon.h" diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c b/drivers/gpu/drm/ttm/ttm_range_manager.c index b9d5da6e6a81..ce5d07ca384c 100644 --- a/drivers/gpu/drm/ttm/ttm_range_manager.c +++ b/drivers/gpu/drm/ttm/ttm_range_manager.c @@ -29,12 +29,13 @@ * Authors: Thomas Hellstrom */ -#include +#include #include +#include +#include #include #include #include -#include /* * Currently we use a spinlock for the lock, but a mutex *may* be @@ -60,8 +61,8 @@ static int ttm_range_man_alloc(struct ttm_resource_manager *man, struct ttm_resource *mem) { struct ttm_range_manager *rman = to_range_manager(man); + struct ttm_range_mgr_node *node; struct drm_mm *mm = &rman->mm; - struct drm_mm_node *node; enum drm_mm_insert_mode mode; unsigned long lpfn; int ret; @@ -70,7 +71,7 @@ static int ttm_range_man_alloc(struct ttm_resource_manager *man, if (!lpfn) lpfn = man->size; - node = kzalloc(sizeof(*node), GFP_KERNEL); + node = kzalloc(struct_size(node, mm_nodes, 1), GFP_KERNEL); I'm still a bit confused about the situation where a driver wants to attach private data to a struct ttm_resource without having to re-implement its own range manager? Could be cached sg-tables, list of GPU bindings etc.
Wouldn't work with the above unless we have a void *driver_private member on the struct ttm_resource. Is that the plan going forward here? Or that the driver actually does the re-implementation? I don't really understand your concern here. The basic idea is that drivers use ttm_resource as a base class for their own implementation. See for example how nouveau does that: struct nouveau_mem { struct ttm_resource base; struct nouveau_cli *cli; u8 kind; u8 comp; struct nvif_mem mem; struct nvif_vma vma[2]; }; The range manager is helping driver specific resource managers which want to implement something drm_mm_nodes based.
Re: [PATCH 02/10] drm/ttm: flip over the range manager to self allocated nodes
Hi! On 6/2/21 4:17 PM, Christian König wrote: Am 02.06.21 um 16:13 schrieb Thomas Hellström (Intel): On 6/2/21 3:07 PM, Christian König wrote: Am 02.06.21 um 14:33 schrieb Thomas Hellström (Intel): On 6/2/21 2:11 PM, Christian König wrote: Am 02.06.21 um 13:44 schrieb Thomas Hellström (Intel): On 6/2/21 12:09 PM, Christian König wrote: [SNIP] + node = kzalloc(struct_size(node, mm_nodes, 1), GFP_KERNEL); I'm still a bit confused about the situation where a driver wants to attach private data to a struct ttm_resource without having to re-implement its own range manager? Could be cached sg-tables, list of GPU bindings etc. Wouldn't work with the above unless we have a void *driver_private member on the struct ttm_resource. Is that the plan going forward here?
Or that the driver actually does the re-implementation? I don't really understand your concern here. The basic idea is that drivers use ttm_resource as a base class for their own implementation. See for example how nouveau does that: struct nouveau_mem { struct ttm_resource base; struct nouveau_cli *cli; u8 kind; u8 comp; struct nvif_mem mem; struct nvif_vma vma[2]; }; The range manager is helping driver specific resource managers which want to implement something drm_mm_nodes based. E.g. amdgpu_gtt_mgr and amdgpu_vram_mgr, but it can also be used stand alone.
Re: [PATCH 02/10] drm/ttm: flip over the range manager to self allocated nodes
On 6/2/21 3:07 PM, Christian König wrote: Am 02.06.21 um 14:33 schrieb Thomas Hellström (Intel): On 6/2/21 2:11 PM, Christian König wrote: Am 02.06.21 um 13:44 schrieb Thomas Hellström (Intel): On 6/2/21 12:09 PM, Christian König wrote: [SNIP] + node = kzalloc(struct_size(node, mm_nodes, 1), GFP_KERNEL); I'm still a bit confused about the situation where a driver wants to attach private data to a struct ttm_resource without having to re-implement its own range manager? Could be cached sg-tables, list of GPU bindings etc. Wouldn't work with the above unless we have a void *driver_private member on the struct ttm_resource. Is that the plan going forward here? Or that the driver actually does the re-implementation? I don't really understand your concern here.
The basic idea is that drivers use ttm_resource as a base class for their own implementation. See for example how nouveau does that: struct nouveau_mem { struct ttm_resource base; struct nouveau_cli *cli; u8 kind; u8 comp; struct nvif_mem mem; struct nvif_vma vma[2]; }; The range manager is helping driver specific resource managers which want to implement something drm_mm_nodes based. E.g. amdgpu_gtt_mgr and amdgpu_vram_mgr, but it can also be used stand alone. The ttm_range_mgr_node can then be used as ba