Re: [Intel-gfx] [PATCH v2] drm/i915: Replace kmap() with kmap_local_page()
On sabato 17 giugno 2023 20:04:20 CEST Sumitra Sharma wrote: > kmap() has been deprecated in favor of the kmap_local_page() > due to high cost, restricted mapping space, the overhead of a > global lock for synchronization, and making the process sleep > in the absence of free slots. > > kmap_local_page() is faster than kmap() and offers thread-local > and CPU-local mappings, take pagefaults in a local kmap region NIT: _can_ take pagefaults in a local kmap region > and preserves preemption by saving the mappings of outgoing tasks > and restoring those of the incoming one during a context switch. > > The mapping is kept thread local in the function > “i915_vma_coredump_create” in i915_gpu_error.c > > Therefore, replace kmap() with kmap_local_page(). > > Suggested-by: Ira Weiny > > Signed-off-by: Sumitra Sharma > --- > > Changes in v2: > - Replace kmap() with kmap_local_page(). > - Change commit subject and message. With the changes that Ira suggested and the minor fix I'm proposing to the commit message, it looks good to me too, so this patch is... Reviewed-by: Fabio M. De Francesco However, as far as I'm concerned, our nits don't necessarily require any newer version, especially because Tvrtko has already sent this patch for their CI. Thanks, Fabio P.S.: As Sumitra says both kmap() and kmap_local_page() allows preemption in non atomic context. Furthermore, Tvrtko confirmed that the pages can come from HIGHMEM, therefore kmap_local_page for local temporary mapping is unavoidable. Last thing... Thomas thinks he wants to make it run atomically (if I understood one of his messages correctly). As I already responded, nothing prevents someone does another patch just to disable preemption (or to enter atomic context by other means) around the code marked by kmap_local_page() / kunmap_local() because these functions work perfectly _also_ in atomic context (including interrupts). But this is not something that Sumitra should be worried about. > > drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c > b/drivers/gpu/drm/i915/i915_gpu_error.c index f020c0086fbc..bc41500eedf5 > 100644 > --- a/drivers/gpu/drm/i915/i915_gpu_error.c > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c > @@ -1164,9 +1164,9 @@ i915_vma_coredump_create(const struct intel_gt *gt, > > drm_clflush_pages(, 1); > > - s = kmap(page); > + s = kmap_local_page(page); > ret = compress_page(compress, s, dst, false); > - kunmap(page); > + kunmap_local(s); > > drm_clflush_pages(, 1); > > -- > 2.25.1
Re: [Intel-gfx] [PATCH v2] drm/i915: Replace kmap() with kmap_local_page()
On mercoledì 21 giugno 2023 11:06:51 CEST Thomas Hellström (Intel) wrote: > > I think one thing worth mentioning in the context of this patch is that > IIRC kmap_local_page() will block offlining of the mapping CPU until > kunmap_local(), Migration is disabled. > so while I haven't seen any guidelines around the usage > of this api for long-held mappings, It would be advisable to not use it for long term mappings, if possible. These "local" mappings should better be help for not too long duration. > I figure it's wise to keep the > mapping duration short, or at least avoid sleeping with a > kmap_local_page() map active. Nothing prevents a call of preempt_disable() around the section of code between kmap_local_page() / kunmap_local(). If it is needed, local mappings could also be acquired under spinlocks and/or with disabled interrupts. I don't know the code, however, everything cited above could be the subject of a subsequent patch. Regards, Fabio > I figured, while page compression is probably to be considered "slow" > it's probably not slow enough to motivate kmap() instead of > kmap_local_page(), but if anyone feels differently, perhaps it should be > considered. > > With that said, my Reviewed-by: still stands. > > /Thomas >
[Intel-gfx] [PATCH v2 3/3] drm/i915/gem: Replace kmap() with kmap_local_page()
kmap() s been deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and are still valid. Obviously, thread locality implies that the kernel virtual addresses are only valid in the context of the callers. The kmap_local_page() use in i915/gem doesn't break the above-mentioned constraint, so it should be preferred to kmap(). Therefore, replace kmap() with kmap_local_page() in i915/gem and use memcpy_to_page() where it is possible to avoid the open coding of mapping + memcpy() + un-mapping. Suggested-by: Ira Weiny Signed-off-by: Fabio M. De Francesco --- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 6 ++ drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 8 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 37d1efcd3ca6..8856a6409e83 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -657,16 +657,14 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *dev_priv, do { unsigned int len = min_t(typeof(size), size, PAGE_SIZE); struct page *page; - void *pgdata, *vaddr; + void *pgdata; err = aops->write_begin(file, file->f_mapping, offset, len, , ); if (err < 0) goto fail; - vaddr = kmap(page); - memcpy(vaddr, data, len); - kunmap(page); + memcpy_to_page(page, 0, data, len); err = aops->write_end(file, file->f_mapping, offset, len, len, page, pgdata); diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c index 56279908ed30..5fd9e1ee2340 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c @@ -155,7 +155,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj, intel_gt_flush_ggtt_writes(to_gt(i915)); p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT); - cpu = kmap(p) + offset_in_page(offset); + cpu = kmap_local_page(p) + offset_in_page(offset); drm_clflush_virt_range(cpu, sizeof(*cpu)); if (*cpu != (u32)page) { pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%lu + %u [0x%lx]) of 0x%x, found 0x%x\n", @@ -173,7 +173,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj, } *cpu = 0; drm_clflush_virt_range(cpu, sizeof(*cpu)); - kunmap(p); + kunmap_local(cpu); out: i915_gem_object_lock(obj, NULL); @@ -251,7 +251,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj, intel_gt_flush_ggtt_writes(to_gt(i915)); p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT); - cpu = kmap(p) + offset_in_page(offset); + cpu = kmap_local_page(p) + offset_in_page(offset); drm_clflush_virt_range(cpu, sizeof(*cpu)); if (*cpu != (u32)page) { pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%lu + %u [0x%lx]) of 0x%x, found 0x%x\n", @@ -269,7 +269,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj, } *cpu = 0; drm_clflush_virt_range(cpu, sizeof(*cpu)); - kunmap(p); + kunmap_local(cpu); if (err) return err; -- 2.40.0
[Intel-gfx] [PATCH v2 2/3] drm/i915/gt: Replace kmap() with kmap_local_page()
kmap() s been deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and are still valid. Obviously, thread locality implies that the kernel virtual addresses are only valid in the context of the callers. The use of kmap_local_page() in i915/gt doesn't break the above-mentioned constraint, so it should be preferred to kmap(). Therefore, replace kmap() with kmap_local_page() in i915/gt. Suggested-by: Ira Weiny Signed-off-by: Fabio M. De Francesco --- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 ++-- drivers/gpu/drm/i915/gt/shmem_utils.c| 11 --- 2 files changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c index 37d0b0fe791d..89295c6921d6 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c @@ -749,7 +749,7 @@ static void swizzle_page(struct page *page) char *vaddr; int i; - vaddr = kmap(page); + vaddr = kmap_local_page(page); for (i = 0; i < PAGE_SIZE; i += 128) { memcpy(temp, [i], 64); @@ -757,7 +757,7 @@ static void swizzle_page(struct page *page) memcpy([i + 64], temp, 64); } - kunmap(page); + kunmap_local(vaddr); } /** diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c index 449c9ed44382..be809839a241 100644 --- a/drivers/gpu/drm/i915/gt/shmem_utils.c +++ b/drivers/gpu/drm/i915/gt/shmem_utils.c @@ -101,22 +101,19 @@ static int __shmem_rw(struct file *file, loff_t off, unsigned int this = min_t(size_t, PAGE_SIZE - offset_in_page(off), len); struct page *page; - void *vaddr; page = shmem_read_mapping_page_gfp(file->f_mapping, pfn, GFP_KERNEL); if (IS_ERR(page)) return PTR_ERR(page); - vaddr = kmap(page); if (write) { - memcpy(vaddr + offset_in_page(off), ptr, this); + memcpy_to_page(page, offset_in_page(off), ptr, this); set_page_dirty(page); } else { - memcpy(ptr, vaddr + offset_in_page(off), this); + memcpy_from_page(ptr, page, offset_in_page(off), this); } mark_page_accessed(page); - kunmap(page); put_page(page); len -= this; @@ -143,11 +140,11 @@ int shmem_read_to_iosys_map(struct file *file, loff_t off, if (IS_ERR(page)) return PTR_ERR(page); - vaddr = kmap(page); + vaddr = kmap_local_page(page); iosys_map_memcpy_to(map, map_off, vaddr + offset_in_page(off), this); mark_page_accessed(page); - kunmap(page); + kunmap_local(vaddr); put_page(page); len -= this; -- 2.40.0
[Intel-gfx] [PATCH v2 1/3] drm/i915: Replace kmap() with kmap_local_page()
kmap() has been deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and are still valid. Obviously, thread locality implies that the kernel voirtual addresses are valid only in the context of the callers. kmap_local_page() use in i915_gem.c doesn't break the above-mentioned constraint, so it should be preferred to kmap(). Therefore, replace kmap() with kmap_local_page() in i915_gem.c Suggested-by: Ira Weiny Signed-off-by: Fabio M. De Francesco --- drivers/gpu/drm/i915/i915_gem.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 35950fa91406..c35248555e42 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -212,14 +212,14 @@ shmem_pread(struct page *page, int offset, int len, char __user *user_data, char *vaddr; int ret; - vaddr = kmap(page); + vaddr = kmap_local_page(page); if (needs_clflush) drm_clflush_virt_range(vaddr + offset, len); ret = __copy_to_user(user_data, vaddr + offset, len); - kunmap(page); + kunmap_local(vaddr); return ret ? -EFAULT : 0; } @@ -643,7 +643,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data, char *vaddr; int ret; - vaddr = kmap(page); + vaddr = kmap_local_page(page); if (needs_clflush_before) drm_clflush_virt_range(vaddr + offset, len); @@ -652,7 +652,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data, if (!ret && needs_clflush_after) drm_clflush_virt_range(vaddr + offset, len); - kunmap(page); + kunmap_local(vaddr); return ret ? -EFAULT : 0; } -- 2.40.0
[Intel-gfx] [PATCH v2 0/3] drm/i915: Replace kmap() with kmap_local_page()
kmap() has been deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). The tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and so they are still valid. Furthermore, kmap_local_page() is faster than kmap() in kernels with HIGHMEM enabled. Thread locality implies that the kernel virtual addresses returned by kmap_local_page() are only valid in the context of the callers. This constraint is never violated with the conversions in this series, because the pointers are never handed to other threads, so the local mappings are allowed and preferred. Therefore, replace kmap() with kmap_local_page() in drm/i915/, drm/i915/gem/, drm/i915/gt/. Suggested-by: Ira Weiny Signed-off-by: Fabio M. De Francesco v1->v2: Do some changes in the text of the cover letter and in the commit messages. There are no changes in the code of any of the three patches. Fabio M. De Francesco (3): drm/i915: Replace kmap() with kmap_local_page() drm/i915/gt: Replace kmap() with kmap_local_page() drm/i915/gem: Replace kmap() with kmap_local_page() drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 6 ++ drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 8 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 ++-- drivers/gpu/drm/i915/gt/shmem_utils.c | 11 --- drivers/gpu/drm/i915/i915_gem.c| 8 5 files changed, 16 insertions(+), 21 deletions(-) -- 2.40.0
Re: [Intel-gfx] [PATCH v2 9/9] drm/i915: Use kmap_local_page() in gem/i915_gem_execbuffer.c
On venerdì 31 marzo 2023 13:30:20 CEST Tvrtko Ursulin wrote: > On 31/03/2023 05:18, Ira Weiny wrote: > > Zhao Liu wrote: > >> From: Zhao Liu > >> > >> The use of kmap_atomic() is being deprecated in favor of > >> kmap_local_page()[1], and this patch converts the calls from > >> kmap_atomic() to kmap_local_page(). > >> > >> The main difference between atomic and local mappings is that local > >> mappings doesn't disable page faults or preemption (the preemption is > >> disabled for !PREEMPT_RT case, otherwise it only disables migration). > >> > >> With kmap_local_page(), we can avoid the often unwanted side effect of > >> unnecessary page faults and preemption disables. > >> > >> In i915_gem_execbuffer.c, eb->reloc_cache.vaddr is mapped by > >> kmap_atomic() in eb_relocate_entry(), and is unmapped by > >> kunmap_atomic() in reloc_cache_reset(). > > > > First off thanks for the series and sticking with this. That said this > > patch kind of threw me for a loop because tracing the map/unmap calls did > > not make sense to me. See below. > > > >> And this mapping/unmapping occurs in two places: one is in > >> eb_relocate_vma(), and another is in eb_relocate_vma_slow(). > >> > >> The function eb_relocate_vma() or eb_relocate_vma_slow() doesn't > >> need to disable pagefaults and preemption during the above mapping/ > >> unmapping. > >> > >> So it can simply use kmap_local_page() / kunmap_local() that can > >> instead do the mapping / unmapping regardless of the context. > >> > >> Convert the calls of kmap_atomic() / kunmap_atomic() to > >> kmap_local_page() / kunmap_local(). > >> > >> [1]: > >> https://lore.kernel.org/all/20220813220034.806698-1-ira.we...@intel.com > >> > >> v2: No code change since v1. Added description of the motivation of > >> > >> using kmap_local_page() and "Suggested-by" tag of Fabio. > >> > >> Suggested-by: Ira Weiny > >> Suggested-by: Fabio M. De Francesco > >> Signed-off-by: Zhao Liu > >> --- > >> > >> Suggested by credits: > >>Ira: Referred to his task document, review comments. > >>Fabio: Referred to his boiler plate commit message and his description > >> > >> about why kmap_local_page() should be preferred. > >> > >> --- > >> > >> drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 10 +- > >> 1 file changed, 5 insertions(+), 5 deletions(-) > >> [snip] > However I am unsure if disabling pagefaulting is needed or not. Thomas, > Matt, being the last to touch this area, perhaps you could have a look? > Because I notice we have a fallback iomap path which still uses > io_mapping_map_atomic_wc. So if kmap_atomic to kmap_local conversion is > safe, does the iomap side also needs converting to > io_mapping_map_local_wc? Or they have separate requirements? AFAIK, the requirements for io_mapping_map_local_wc() are the same as for kmap_local_page(): the kernel virtual address is _only_ valid in the caller context, and map/unmap nesting must be done in stack-based ordering (LIFO). I think a follow up patch could safely switch to io_mapping_map_local_wc() / io_mapping_unmap_local_wc since the address is local to context. However, not being an expert, reading your note now I suspect that I'm missing something. Can I ask why you think that page-faults disabling might be necessary? Thanks, Fabio > Regards, > > Tvrtko
Re: [Intel-gfx] [PATCH v2 0/9] drm/i915: Replace kmap_atomic() with kmap_local_page()
On mercoledì 29 marzo 2023 09:32:11 CEST Zhao Liu wrote: > From: Zhao Liu > > Hi list, > > Sorry for a long delay since v1 [1]. This patchset is based on 197b6b6 > (Linux 6.3-rc4). > > Welcome and thanks for your review and comments! > > > # Purpose of this patchset > > The purpose of this pacthset is to replace all uses of kmap_atomic() in > i915 with kmap_local_page() because the use of kmap_atomic() is being > deprecated in favor of kmap_local_page()[1]. And 92b64bd (mm/highmem: > add notes about conversions from kmap{,_atomic}()) has declared the > deprecation of kmap_atomic(). > > > # Motivation for deprecating kmap_atomic() and using kmap_local_page() > > The main difference between atomic and local mappings is that local > mappings doesn't disable page faults or preemption (the preemption is > disabled for !PREEMPT_RT case, otherwise it only disables migration). > > With kmap_local_page(), we can avoid the often unwanted side effect of > unnecessary page faults and preemption disables. > > > # Patch summary > > Patch 1, 4-6 and 8-9 replace kamp_atomic()/kunmap_atomic() with > kmap_local_page()/kunmap_local() directly. With thses local > mappings, the page faults and preemption are allowed. > > Patch 2 and 7 use memcpy_from_page() and memcpy_to_page() to replace > kamp_atomic()/kunmap_atomic(). These two variants of memcpy() > are based on the local mapping, so page faults and preemption > are also allowed in these two interfaces. > > Patch 3 replaces kamp_atomic()/kunmap_atomic() with kmap_local_page()/ > kunmap_local() and also diable page fault since the for special > handling (pls see the commit message). > > > # Changes since v1 > > * Dropped hot plug related description in commit message since it has > nothing to do with kmap_local_page(). > * Emphasized the motivation for using kmap_local_page() in commit > message. > * Rebased patch 1 on f47e630 (drm/i915/gem: Typecheck page lookups) to > keep the "idx" variable of type pgoff_t here. > * Used memcpy_from_page() and memcpy_to_page() to replace > kmap_local_page() + memcpy() in patch 2. > > > # Reference > > [1]: > https://lore.kernel.org/lkml/20221017093726.2070674-1-zhao1.liu@linux.intel.c > om/ [1]: > https://lore.kernel.org/all/20220813220034.806698-1-ira.we...@intel.com --- > Zhao Liu (9): > drm/i915: Use kmap_local_page() in gem/i915_gem_object.c > drm/i915: Use memcpy_[from/to]_page() in gem/i915_gem_pyhs.c > drm/i915: Use kmap_local_page() in gem/i915_gem_shmem.c > drm/i915: Use kmap_local_page() in gem/selftests/huge_pages.c > drm/i915: Use kmap_local_page() in gem/selftests/i915_gem_coherency.c > drm/i915: Use kmap_local_page() in gem/selftests/i915_gem_context.c > drm/i915: Use memcpy_from_page() in gt/uc/intel_uc_fw.c > drm/i915: Use kmap_local_page() in i915_cmd_parser.c > drm/i915: Use kmap_local_page() in gem/i915_gem_execbuffer.c > I _think_ that the "long delay" you mentioned in the first sentence has paid off in full. I don't see things to improve (except all those "kamp_atomic()" typo in the patches summary; however, typos are only in the cover so I'm sure they won't hurt anybody). Each of the nine patches listed above looks good to me, so they are all… Reviewed-by: Fabio M. De Francesco Thanks! Fabio PS: Obviously there was no need to reconfirm my tag for patch 3/9. A single tag that catches all patches is easier for a lazy person like me :-) > > drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 10 +- > drivers/gpu/drm/i915/gem/i915_gem_object.c | 8 +++- > drivers/gpu/drm/i915/gem/i915_gem_phys.c | 10 ++ > drivers/gpu/drm/i915/gem/i915_gem_shmem.c| 6 -- > drivers/gpu/drm/i915/gem/selftests/huge_pages.c | 6 +++--- > .../gpu/drm/i915/gem/selftests/i915_gem_coherency.c | 12 > .../gpu/drm/i915/gem/selftests/i915_gem_context.c| 8 > drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 5 + > drivers/gpu/drm/i915/i915_cmd_parser.c | 4 ++-- > 9 files changed, 28 insertions(+), 41 deletions(-) > > -- > 2.34.1
Re: [Intel-gfx] [PATCH 3/9] drm/i915: Use kmap_local_page() in gem/i915_gem_shmem.c
On lunedì 17 ottobre 2022 11:37:19 CET Zhao Liu wrote: > From: Zhao Liu > > The use of kmap_atomic() is being deprecated in favor of > kmap_local_page()[1]. > > The main difference between atomic and local mappings is that local > mappings doesn't disable page faults or preemption. > > In drm/i915/gem/i915_gem_shmem.c, the function shmem_pwrite() need to > disable pagefault to eliminate the potential recursion fault[2]. But > here __copy_from_user_inatomic() doesn't need to disable preemption and > local mapping is valid for sched in/out. > So it can use kmap_local_page() / kunmap_local() with > pagefault_disable() / pagefault_enable() to replace atomic mapping. > > [1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.we...@intel.com > [2]: https://patchwork.freedesktop.org/patch/295840/ > > Suggested-by: Ira Weiny > Signed-off-by: Zhao Liu > --- > Suggested by credits: > Ira: Referred to his suggestions about keeping pagefault_disable(). > --- > drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) The commit message explains clearly and the changes to the code look good. The necessity to reuse pagefault_disable() / pagefault_enable() from the main kmap_atomic() side effect is a nice catch. Reviewed-by: Fabio M. De Francesco Thanks! > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c > b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index f42ca1179f37..e279a3e30c02 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c > @@ -472,11 +472,13 @@ shmem_pwrite(struct drm_i915_gem_object *obj, > if (err < 0) > return err; > > - vaddr = kmap_atomic(page); > + vaddr = kmap_local_page(page); > + pagefault_disable(); > unwritten = __copy_from_user_inatomic(vaddr + pg, > user_data, > len); > - kunmap_atomic(vaddr); > + pagefault_enable(); > + kunmap_local(vaddr); > > err = aops->write_end(obj->base.filp, mapping, offset, len, > len - unwritten, page, data);
Re: [Intel-gfx] [PATCH 1/9] drm/i915: Use kmap_local_page() in gem/i915_gem_object.c
On giovedì 3 novembre 2022 17:51:23 CET Ira Weiny wrote: > On Sat, Oct 29, 2022 at 01:17:03PM +0200, Fabio M. De Francesco wrote: > > On lunedì 17 ottobre 2022 11:37:17 CEST Zhao Liu wrote: > > > From: Zhao Liu > > > > > > The use of kmap_atomic() is being deprecated in favor of > > > kmap_local_page()[1]. > > > > > > The main difference between atomic and local mappings is that local > > > mappings doesn't disable page faults or preemption. > > > > You are right about about page faults which are never disabled by > > kmap_local_page(). However kmap_atomic might not disable preemption. It > > depends on CONFIG_PREEMPT_RT. > > > > Please refer to how kmap_atomic_prot() works (this function is called by > > kmap_atomic() when kernels have HIGHMEM enabled). > > > > > There're 2 reasons why i915_gem_object_read_from_page_kmap() doesn't > > > need to disable pagefaults and preemption for mapping: > > > > > > 1. The flush operation is safe for CPU hotplug when preemption is not > > > disabled. > > > > I'm confused here. Why are you talking about CPU hotplug? > > I agree with Fabio here. I'm not making the connection between cpu hotplug and > this code path. > > Ira @Zhao, I'd like to add that I was about to put my reviewed-by tag. The other things I objected are minor nits. Please just clarify this connection. Your code is good and deserves to be applied. Fabio > > > In any case, developers should never rely on implicit calls of > > preempt_disable() for the reasons said above. Therefore, flush operations > > should be allowed regardless that kmap_atomic() potential side effect. > > > > > In drm/i915/gem/i915_gem_object.c, the function > > > i915_gem_object_read_from_page_kmap() calls drm_clflush_virt_range() > > > > If I recall correctly, drm_clflush_virt_range() can always be called with page > > faults and preemption enabled. If so, this is enough to say that the > > conversion is safe. > > > > Is this code explicitly related to flushing the cache lines before removing / > > adding CPUs? If I recall correctly, there are several other reasons behind the > > need to issue cache lines flushes. Am I wrong about this? > > > > Can you please say more about what I'm missing here? > > > > > to > > > use CLFLUSHOPT or WBINVD to flush. Since CLFLUSHOPT is global on x86 > > > and WBINVD is called on each cpu in drm_clflush_virt_range(), the flush > > > operation is global and any issue with cpu's being added or removed > > > can be handled safely. > > > > Again your main concern is about CPU hotplug. > > > > Even if I'm missing something, do we really need all these details about the > > inner workings of drm_clflush_virt_range()? > > > > I'm not an expert, so may be that I'm wrong about all I wrote above. > > > > Therefore, can you please elaborate a little more for readers with very little > > knowledge of these kinds of things (like me and perhaps others)? > > > > > 2. Any context switch caused by preemption or sleep (pagefault may > > > cause sleep) doesn't affect the validity of local mapping. > > > > I'd replace "preemption or sleep" with "preemption and page faults" since > > yourself then added that page faults lead to tasks being put to sleep. > > > > > Therefore, i915_gem_object_read_from_page_kmap() is a function where > > > the use of kmap_local_page() in place of kmap_atomic() is correctly > > > suited. > > > > > > Convert the calls of kmap_atomic() / kunmap_atomic() to > > > kmap_local_page() / kunmap_local(). > > > > > > And remove the redundant variable that stores the address of the mapped > > > page since kunmap_local() can accept any pointer within the page. > > > > > > [1]: > > > https://lore.kernel.org/all/20220813220034.806698-1-ira.we...@intel.com > > > > > > Suggested-by: Dave Hansen > > > Suggested-by: Ira Weiny > > > Suggested-by: Fabio M. De Francesco > > > Signed-off-by: Zhao Liu > > > --- > > > > > > Suggested by credits: > > > Dave: Referred to his explanation about cache flush. > > > Ira: Referred to his task document, review comments and explanation about > > > > > >cache flush. > > > > > > Fabio: Referred to his boiler plate commit message. > > > > > > --- > > > > > > dri
Re: [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: Replace kmap() with kmap_local_page() (rev2)
On lunedì 17 ottobre 2022 22:23:30 CET Patchwork wrote: > == Series Details == > > Series: drm/i915: Replace kmap() with kmap_local_page() (rev2) > URL : https://patchwork.freedesktop.org/series/107277/ > State : warning > > == Summary == > > Error: dim sparse failed > Sparse version: v0.6.2 > Fast mode used, each commit won't be checked separately. I received this and other messages with regards to the same series, however, after due inspection of my patches, I haven't yet been able to understand what these reports may mean or even if they are reporting bugs. Can anyone please provide any help? Thanks, Fabio P.S.: Similar requests for the other messages will follow ASAP.
Re: [Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915: Replace kmap() with kmap_local_page() (rev2)
On lunedì 17 ottobre 2022 22:23:30 CET Patchwork wrote: > == Series Details == > > Series: drm/i915: Replace kmap() with kmap_local_page() (rev2) > URL : https://patchwork.freedesktop.org/series/107277/ > State : warning > > == Summary == > > Error: dim sparse failed > Sparse version: v0.6.2 > Fast mode used, each commit won't be checked separately. I received this and other messages with regards to the same series, however, after due inspection of my patches, I haven't yet been able to understand what these reports may mean or even if they are reporting bugs. Can anyone please provide any help? Thanks, Fabio P.S.: Similar requests for the other messages will follow ASAP.
Re: [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: Replace kmap() with kmap_local_page() (rev2)
On lunedì 17 ottobre 2022 22:34:44 CET Patchwork wrote: > == Series Details == > > Series: drm/i915: Replace kmap() with kmap_local_page() (rev2) > URL : https://patchwork.freedesktop.org/series/107277/ > State : success > > == Summary == > > CI Bug Log - changes from CI_DRM_12252 -> Patchwork_107277v2 > > > Summary > --- > > **SUCCESS** > > No regressions found. > > External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107277v2/ index.html > > Participating hosts (46 -> 43) > -- > > Additional (1): fi-icl-u2 > Missing(4): fi-bxt-dsi fi-cfl-8700k bat-atsm-1 fi-bdw-samus > > Known issues > > > Here are the changes found in Patchwork_107277v2 that come from known issues: > > ### IGT changes ### > > Issues hit > > * igt@gem_exec_gttfill@basic: > - fi-pnv-d510:[PASS][1] -> [FAIL][2] ([i915#7229]) >[1]: > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12252/fi-pnv-d510/ igt@gem_exec_gttfill@ > basic.html [2]: > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107277v2/fi-pnv-d510/ igt@gem_exec_gt > tf...@basic.html > > * igt@gem_linear_blits@basic: > - fi-pnv-d510:[PASS][3] -> [SKIP][4] ([fdo#109271]) >[3]: > https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_12252/fi-pnv-d510/ igt@gem_linear_blits@ > basic.html [4]: > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107277v2/fi-pnv-d510/ igt@gem_linear_ > bl...@basic.html > > * igt@runner@aborted: > - fi-icl-u2: NOTRUN -> [FAIL][5] ([i915#7220]) >[5]: > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107277v2/fi-icl-u2/ igt@runner@aborte > d.html > > > [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271 > [i915#7220]: https://gitlab.freedesktop.org/drm/intel/issues/7220 > [i915#7229]: https://gitlab.freedesktop.org/drm/intel/issues/7229 > > > Build changes > - > > * Linux: CI_DRM_12252 -> Patchwork_107277v2 > > CI-20190529: 20190529 > CI_DRM_12252: 14867082ef288af10c90732e31b633af30e304c0 @ > git://anongit.freedesktop.org/gfx-ci/linux IGT_7017: > 193c8bdd7df32556129482c52011e1b7a233b699 @ > https://gitlab.freedesktop.org/drm/igt-gpu-tools.git Patchwork_107277v2: > 14867082ef288af10c90732e31b633af30e304c0 @ git://anongit.freedesktop.org/ gfx-ci/linux > > > ### Linux commits > > 83c34dd3b2a2 drm/i915/gem: Replace kmap() with kmap_local_page() > ade39e0c9963 drm/i915/gt: Replace kmap() with kmap_local_page() > cd89efceb13c drm/i915: Replace kmap() with kmap_local_page() > > == Logs == > > For more details see: > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_107277v2/index.html This is the second of the three messages with regards to the same series. My kind request is the same of the previous... Can anyone please provide any help? Thanks, Fabio
Re: [Intel-gfx] [PATCH 2/9] drm/i915: Use kmap_local_page() in gem/i915_gem_pyhs.c
On lunedì 17 ottobre 2022 11:37:18 CEST Zhao Liu wrote: > From: Zhao Liu > > The use of kmap_atomic() is being deprecated in favor of > kmap_local_page()[1]. > > The main difference between atomic and local mappings is that local > mappings doesn't disable page faults or preemption. > > In drm/i915/gem/i915_gem_phys.c, the functions > i915_gem_object_get_pages_phys() and i915_gem_object_put_pages_phys() > don't need to disable pagefaults and preemption for mapping because of > these 2 reasons: > > 1. The flush operation is safe for CPU hotplug when preemption is not > disabled. In drm/i915/gem/i915_gem_object.c, the functions > i915_gem_object_get_pages_phys() and i915_gem_object_put_pages_phys() > calls drm_clflush_virt_range() to use CLFLUSHOPT or WBINVD to flush. > Since CLFLUSHOPT is global on x86 and WBINVD is called on each cpu in > drm_clflush_virt_range(), the flush operation is global and any issue > with cpu's being added or removed can be handled safely. > > 2. Any context switch caused by preemption or sleep (pagefault may > cause sleep) doesn't affect the validity of local mapping. > > Therefore, i915_gem_object_get_pages_phys() and > i915_gem_object_put_pages_phys() are two functions where the use of > kmap_local_page() in place of kmap_atomic() is correctly suited. > > Convert the calls of kmap_atomic() / kunmap_atomic() to > kmap_local_page() / kunmap_local(). > I have here the same questions as in 1/9. > [1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.we...@intel.com > > Suggested-by: Dave Hansen > Suggested-by: Ira Weiny > Suggested-by: Fabio M. De Francesco > Signed-off-by: Zhao Liu > --- > Suggested by credits: > Dave: Referred to his explanation about cache flush. > Ira: Referred to his task document, review comments and explanation about >cache flush. > Fabio: Referred to his boiler plate commit message. > --- > drivers/gpu/drm/i915/gem/i915_gem_phys.c | 8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c > b/drivers/gpu/drm/i915/gem/i915_gem_phys.c index 0d0e46dae559..d602ba19ecb2 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c > @@ -66,10 +66,10 @@ static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object > *obj) if (IS_ERR(page)) > goto err_st; > > - src = kmap_atomic(page); > + src = kmap_local_page(page); > memcpy(dst, src, PAGE_SIZE); > drm_clflush_virt_range(dst, PAGE_SIZE); > - kunmap_atomic(src); > + kunmap_local(src); Please use memcpy_from_page() instead of open coding mapping + memcpy() + unmapping. > > put_page(page); > dst += PAGE_SIZE; > @@ -114,10 +114,10 @@ i915_gem_object_put_pages_phys(struct drm_i915_gem_object *obj, > if (IS_ERR(page)) > continue; > > - dst = kmap_atomic(page); > + dst = kmap_local_page(page); > drm_clflush_virt_range(src, PAGE_SIZE); > memcpy(dst, src, PAGE_SIZE); > - kunmap_atomic(dst); > + kunmap_local(dst); For the same reasons said above, memcpy_to_page() should be used here and avoid open coding of three functions. Using those helpers forces you to move drm_clflush_virt_range() out of the mapping / un-mapping region. I may be wrong, however I'm pretty sure that the relative positions of each of those call sites is something that cannot be randomly chosen. Thanks, Fabio > > set_page_dirty(page); > if (obj->mm.madv == I915_MADV_WILLNEED)
Re: [Intel-gfx] [PATCH 1/9] drm/i915: Use kmap_local_page() in gem/i915_gem_object.c
On lunedì 17 ottobre 2022 11:37:17 CEST Zhao Liu wrote: > From: Zhao Liu > > The use of kmap_atomic() is being deprecated in favor of > kmap_local_page()[1]. > > The main difference between atomic and local mappings is that local > mappings doesn't disable page faults or preemption. You are right about about page faults which are never disabled by kmap_local_page(). However kmap_atomic might not disable preemption. It depends on CONFIG_PREEMPT_RT. Please refer to how kmap_atomic_prot() works (this function is called by kmap_atomic() when kernels have HIGHMEM enabled). > > There're 2 reasons why i915_gem_object_read_from_page_kmap() doesn't > need to disable pagefaults and preemption for mapping: > > 1. The flush operation is safe for CPU hotplug when preemption is not > disabled. I'm confused here. Why are you talking about CPU hotplug? In any case, developers should never rely on implicit calls of preempt_disable() for the reasons said above. Therefore, flush operations should be allowed regardless that kmap_atomic() potential side effect. > In drm/i915/gem/i915_gem_object.c, the function > i915_gem_object_read_from_page_kmap() calls drm_clflush_virt_range() If I recall correctly, drm_clflush_virt_range() can always be called with page faults and preemption enabled. If so, this is enough to say that the conversion is safe. Is this code explicitly related to flushing the cache lines before removing / adding CPUs? If I recall correctly, there are several other reasons behind the need to issue cache lines flushes. Am I wrong about this? Can you please say more about what I'm missing here? > to > use CLFLUSHOPT or WBINVD to flush. Since CLFLUSHOPT is global on x86 > and WBINVD is called on each cpu in drm_clflush_virt_range(), the flush > operation is global and any issue with cpu's being added or removed > can be handled safely. Again your main concern is about CPU hotplug. Even if I'm missing something, do we really need all these details about the inner workings of drm_clflush_virt_range()? I'm not an expert, so may be that I'm wrong about all I wrote above. Therefore, can you please elaborate a little more for readers with very little knowledge of these kinds of things (like me and perhaps others)? > 2. Any context switch caused by preemption or sleep (pagefault may > cause sleep) doesn't affect the validity of local mapping. I'd replace "preemption or sleep" with "preemption and page faults" since yourself then added that page faults lead to tasks being put to sleep. > Therefore, i915_gem_object_read_from_page_kmap() is a function where > the use of kmap_local_page() in place of kmap_atomic() is correctly > suited. > > Convert the calls of kmap_atomic() / kunmap_atomic() to > kmap_local_page() / kunmap_local(). > > And remove the redundant variable that stores the address of the mapped > page since kunmap_local() can accept any pointer within the page. > > [1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.we...@intel.com > > Suggested-by: Dave Hansen > Suggested-by: Ira Weiny > Suggested-by: Fabio M. De Francesco > Signed-off-by: Zhao Liu > --- > Suggested by credits: > Dave: Referred to his explanation about cache flush. > Ira: Referred to his task document, review comments and explanation about >cache flush. > Fabio: Referred to his boiler plate commit message. > --- > drivers/gpu/drm/i915/gem/i915_gem_object.c | 8 +++- > 1 file changed, 3 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c > b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 369006c5317f..a0072abed75e 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c > @@ -413,17 +413,15 @@ void __i915_gem_object_invalidate_frontbuffer(struct > drm_i915_gem_object *obj, static void > i915_gem_object_read_from_page_kmap(struct drm_i915_gem_object *obj, u64 offset, void > *dst, int size) { > - void *src_map; > void *src_ptr; > > - src_map = kmap_atomic(i915_gem_object_get_page(obj, offset >> PAGE_SHIFT)); > - > - src_ptr = src_map + offset_in_page(offset); > + src_ptr = kmap_local_page(i915_gem_object_get_page(obj, offset >> PAGE_SHIFT)) > + + offset_in_page(offset); > if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_READ)) > drm_clflush_virt_range(src_ptr, size); > memcpy(dst, src_ptr, size); > > - kunmap_atomic(src_map); > + kunmap_local(src_ptr); > } > > static void The changes look good, but I'd like to better understand the commit message. Thanks, Fabio
Re: [Intel-gfx] [PATCH 0/9] drm/i915: Replace kmap_atomic() with kmap_local_page()
On lunedì 17 ottobre 2022 11:37:16 CEST Zhao Liu wrote: > From: Zhao Liu > > The use of kmap_atomic() is being deprecated in favor of > kmap_local_page()[1]. Some words to explain why kmap_atomic was deprecated won't hurt. Many maintainers and reviewers, and also casual readers might not yet be aware of the reasons behind that deprecation. > In the following patches, we can convert the calls of kmap_atomic() / > kunmap_atomic() to kmap_local_page() / kunmap_local(), which can > instead do the mapping / unmapping regardless of the context. Readers are probably much more interested in what you did in the following patches and why you did it, instead of being informed about what "we can" do. I would suggest something like "The following patches convert the calls to kmap_atomic() to kmap_local_page() [the rest looks OK]". This could also be the place to say something about why we prefer kmap_local_page() to kmap_atomic(). Are you sure that the reasons that motivates your conversions are merely summarized to kmap_local_page() being able to do mappings regardless of context? I think you are missing the real reasons why. What about avoiding the often unwanted side effect of unnecessary page faults disables? > > With kmap_local_page(), the mapping is per thread, CPU local and not > globally visible. No news here. kmap_atomic() is "per thread, CPU local and not glocally visible". I cannot see any difference here between kmap_atomic() and kmap_local_page(). > > [1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.we...@intel.com > --- > Zhao Liu (9): > drm/i915: Use kmap_local_page() in gem/i915_gem_object.c > drm/i915: Use kmap_local_page() in gem/i915_gem_pyhs.c > drm/i915: Use kmap_local_page() in gem/i915_gem_shmem.c > drm/i915: Use kmap_local_page() in gem/selftests/huge_pages.c > drm/i915: Use kmap_local_page() in gem/selftests/i915_gem_coherency.c > drm/i915: Use kmap_local_page() in gem/selftests/i915_gem_context.c > drm/i915: Use memcpy_from_page() in gt/uc/intel_uc_fw.c > drm/i915: Use kmap_local_page() in i915_cmd_parser.c > drm/i915: Use kmap_local_page() in gem/i915_gem_execbuffer.c > > drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 10 +- > drivers/gpu/drm/i915/gem/i915_gem_object.c | 8 +++- > drivers/gpu/drm/i915/gem/i915_gem_phys.c | 8 > drivers/gpu/drm/i915/gem/i915_gem_shmem.c| 6 -- > drivers/gpu/drm/i915/gem/selftests/huge_pages.c | 6 +++--- > .../gpu/drm/i915/gem/selftests/i915_gem_coherency.c | 12 > .../gpu/drm/i915/gem/selftests/i915_gem_context.c| 8 > drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 5 + > drivers/gpu/drm/i915/i915_cmd_parser.c | 4 ++-- > 9 files changed, 30 insertions(+), 37 deletions(-) Thanks for helping with kmap_atomic() conversions to kmap_local_page(). Fabio
[Intel-gfx] [RESEND PATCH 1/3] drm/i915: Replace kmap() with kmap_local_page()
kmap() is being deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and are still valid. Since its use in i915_gem.c is safe everywhere, it should be preferred. Therefore, replace kmap() with kmap_local_page() in i915_gem.c Cc: "Venkataramanan, Anirudh" Suggested-by: Ira Weiny Signed-off-by: Fabio M. De Francesco --- drivers/gpu/drm/i915/i915_gem.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 702e5b89be22..43effce60e1b 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -212,14 +212,14 @@ shmem_pread(struct page *page, int offset, int len, char __user *user_data, char *vaddr; int ret; - vaddr = kmap(page); + vaddr = kmap_local_page(page); if (needs_clflush) drm_clflush_virt_range(vaddr + offset, len); ret = __copy_to_user(user_data, vaddr + offset, len); - kunmap(page); + kunmap_local(vaddr); return ret ? -EFAULT : 0; } @@ -634,7 +634,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data, char *vaddr; int ret; - vaddr = kmap(page); + vaddr = kmap_local_page(page); if (needs_clflush_before) drm_clflush_virt_range(vaddr + offset, len); @@ -643,7 +643,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data, if (!ret && needs_clflush_after) drm_clflush_virt_range(vaddr + offset, len); - kunmap(page); + kunmap_local(vaddr); return ret ? -EFAULT : 0; } -- 2.37.1
[Intel-gfx] [RESEND PATCH 3/3] drm/i915/gem: Replace kmap() with kmap_local_page()
kmap() is being deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and are still valid. Since its use in i915/gem is safe everywhere, it should be preferred. Therefore, replace kmap() with kmap_local_page() in i915/gem. Instead of open-coding local map + memcpy + local unmap, use memcpy_to_page() in a suited call site. Cc: "Venkataramanan, Anirudh" Suggested-by: Ira Weiny Signed-off-by: Fabio M. De Francesco --- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 6 ++ drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 8 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 4eed3dd90ba8..2bc6ab9964ff 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -640,16 +640,14 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *dev_priv, do { unsigned int len = min_t(typeof(size), size, PAGE_SIZE); struct page *page; - void *pgdata, *vaddr; + void *pgdata; err = aops->write_begin(file, file->f_mapping, offset, len, , ); if (err < 0) goto fail; - vaddr = kmap(page); - memcpy(vaddr, data, len); - kunmap(page); + memcpy_to_page(page, 0, data, len); err = aops->write_end(file, file->f_mapping, offset, len, len, page, pgdata); diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c index 3ced9948a331..bb25b50b5688 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c @@ -153,7 +153,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj, intel_gt_flush_ggtt_writes(to_gt(i915)); p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT); - cpu = kmap(p) + offset_in_page(offset); + cpu = kmap_local_page(p) + offset_in_page(offset); drm_clflush_virt_range(cpu, sizeof(*cpu)); if (*cpu != (u32)page) { pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%llu + %u [0x%llx]) of 0x%x, found 0x%x\n", @@ -171,7 +171,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj, } *cpu = 0; drm_clflush_virt_range(cpu, sizeof(*cpu)); - kunmap(p); + kunmap_local(cpu); out: i915_gem_object_lock(obj, NULL); @@ -249,7 +249,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj, intel_gt_flush_ggtt_writes(to_gt(i915)); p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT); - cpu = kmap(p) + offset_in_page(offset); + cpu = kmap_local_page(p) + offset_in_page(offset); drm_clflush_virt_range(cpu, sizeof(*cpu)); if (*cpu != (u32)page) { pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%llu + %u [0x%llx]) of 0x%x, found 0x%x\n", @@ -267,7 +267,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj, } *cpu = 0; drm_clflush_virt_range(cpu, sizeof(*cpu)); - kunmap(p); + kunmap_local(cpu); if (err) return err; -- 2.37.1
[Intel-gfx] [RESEND PATCH 0/3] drm/i915: Replace kmap() with kmap_local_page()
kmap() is being deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and still valid. Since its use in drm/i915 is safe everywhere, it should be preferred. Therefore, replace kmap() with kmap_local_page() in drm/i915. These changes should be tested in an 32 bits system, booting a kernel with HIGHMEM enabled. Unfortunately I have no i915 based hardware, therefore any help with testing would be greatly appreciated. I'm resending this little series because I suspect that it has been lost, since it was submitted on Aug 11, 2022. In the meantime I'm adding one more recipient (Anirudh) who is helping, along with others, Ira and me with these conversions / removals of kmap() and kmap_atomic() Cc: "Venkataramanan, Anirudh" Suggested-by: Ira Weiny Signed-off-by: Fabio M. De Francesco Fabio M. De Francesco (3): drm/i915: Replace kmap() with kmap_local_page() drm/i915/gt: Replace kmap() with kmap_local_page() drm/i915/gem: Replace kmap() with kmap_local_page() drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 6 ++ drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 8 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 ++-- drivers/gpu/drm/i915/gt/shmem_utils.c | 11 --- drivers/gpu/drm/i915/i915_gem.c| 8 5 files changed, 16 insertions(+), 21 deletions(-) -- 2.37.1
[Intel-gfx] [RESEND PATCH 2/3] drm/i915/gt: Replace kmap() with kmap_local_page()
kmap() is being deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and are still valid. Since its use in i915/gt is safe everywhere, it should be preferred. Therefore, replace kmap() with kmap_local_page() in i915/gt. Instead of open-coding local mappings + memcpy() + local unmappings, use the memcpy_{from,to}_page() helpers where these are better suited. Cc: "Venkataramanan, Anirudh" Suggested-by: Ira Weiny Signed-off-by: Fabio M. De Francesco --- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 ++-- drivers/gpu/drm/i915/gt/shmem_utils.c| 11 --- 2 files changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c index 6ebda3d65086..21d8ce40b897 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c @@ -747,7 +747,7 @@ static void swizzle_page(struct page *page) char *vaddr; int i; - vaddr = kmap(page); + vaddr = kmap_local_page(page); for (i = 0; i < PAGE_SIZE; i += 128) { memcpy(temp, [i], 64); @@ -755,7 +755,7 @@ static void swizzle_page(struct page *page) memcpy([i + 64], temp, 64); } - kunmap(page); + kunmap_local(vaddr); } /** diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c index 402f085f3a02..48edbb8a33e5 100644 --- a/drivers/gpu/drm/i915/gt/shmem_utils.c +++ b/drivers/gpu/drm/i915/gt/shmem_utils.c @@ -98,22 +98,19 @@ static int __shmem_rw(struct file *file, loff_t off, unsigned int this = min_t(size_t, PAGE_SIZE - offset_in_page(off), len); struct page *page; - void *vaddr; page = shmem_read_mapping_page_gfp(file->f_mapping, pfn, GFP_KERNEL); if (IS_ERR(page)) return PTR_ERR(page); - vaddr = kmap(page); if (write) { - memcpy(vaddr + offset_in_page(off), ptr, this); + memcpy_to_page(page, offset_in_page(off), ptr, this); set_page_dirty(page); } else { - memcpy(ptr, vaddr + offset_in_page(off), this); + memcpy_from_page(ptr, page, offset_in_page(off), this); } mark_page_accessed(page); - kunmap(page); put_page(page); len -= this; @@ -140,11 +137,11 @@ int shmem_read_to_iosys_map(struct file *file, loff_t off, if (IS_ERR(page)) return PTR_ERR(page); - vaddr = kmap(page); + vaddr = kmap_local_page(page); iosys_map_memcpy_to(map, map_off, vaddr + offset_in_page(off), this); mark_page_accessed(page); - kunmap(page); + kunmap_local(vaddr); put_page(page); len -= this; -- 2.37.1
[Intel-gfx] [PATCH 2/3] drm/i915/gt: Replace kmap() with kmap_local_page()
kmap() is being deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and are still valid. Since its use in i915/gt is safe everywhere, it should be preferred. Therefore, replace kmap() with kmap_local_page() in i915/gt Suggested-by: Ira Weiny Signed-off-by: Fabio M. De Francesco --- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 ++-- drivers/gpu/drm/i915/gt/shmem_utils.c| 11 --- 2 files changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c index 6ebda3d65086..21d8ce40b897 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c @@ -747,7 +747,7 @@ static void swizzle_page(struct page *page) char *vaddr; int i; - vaddr = kmap(page); + vaddr = kmap_local_page(page); for (i = 0; i < PAGE_SIZE; i += 128) { memcpy(temp, [i], 64); @@ -755,7 +755,7 @@ static void swizzle_page(struct page *page) memcpy([i + 64], temp, 64); } - kunmap(page); + kunmap_local(vaddr); } /** diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c index 402f085f3a02..48edbb8a33e5 100644 --- a/drivers/gpu/drm/i915/gt/shmem_utils.c +++ b/drivers/gpu/drm/i915/gt/shmem_utils.c @@ -98,22 +98,19 @@ static int __shmem_rw(struct file *file, loff_t off, unsigned int this = min_t(size_t, PAGE_SIZE - offset_in_page(off), len); struct page *page; - void *vaddr; page = shmem_read_mapping_page_gfp(file->f_mapping, pfn, GFP_KERNEL); if (IS_ERR(page)) return PTR_ERR(page); - vaddr = kmap(page); if (write) { - memcpy(vaddr + offset_in_page(off), ptr, this); + memcpy_to_page(page, offset_in_page(off), ptr, this); set_page_dirty(page); } else { - memcpy(ptr, vaddr + offset_in_page(off), this); + memcpy_from_page(ptr, page, offset_in_page(off), this); } mark_page_accessed(page); - kunmap(page); put_page(page); len -= this; @@ -140,11 +137,11 @@ int shmem_read_to_iosys_map(struct file *file, loff_t off, if (IS_ERR(page)) return PTR_ERR(page); - vaddr = kmap(page); + vaddr = kmap_local_page(page); iosys_map_memcpy_to(map, map_off, vaddr + offset_in_page(off), this); mark_page_accessed(page); - kunmap(page); + kunmap_local(vaddr); put_page(page); len -= this; -- 2.37.1
[Intel-gfx] [PATCH 1/3] drm/i915: Replace kmap() with kmap_local_page()
kmap() is being deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and are still valid. Since its use in i915_gem.c is safe everywhere, it should be preferred. Therefore, replace kmap() with kmap_local_page() in i915_gem.c Suggested-by: Ira Weiny Signed-off-by: Fabio M. De Francesco --- drivers/gpu/drm/i915/i915_gem.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 702e5b89be22..43effce60e1b 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -212,14 +212,14 @@ shmem_pread(struct page *page, int offset, int len, char __user *user_data, char *vaddr; int ret; - vaddr = kmap(page); + vaddr = kmap_local_page(page); if (needs_clflush) drm_clflush_virt_range(vaddr + offset, len); ret = __copy_to_user(user_data, vaddr + offset, len); - kunmap(page); + kunmap_local(vaddr); return ret ? -EFAULT : 0; } @@ -634,7 +634,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data, char *vaddr; int ret; - vaddr = kmap(page); + vaddr = kmap_local_page(page); if (needs_clflush_before) drm_clflush_virt_range(vaddr + offset, len); @@ -643,7 +643,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data, if (!ret && needs_clflush_after) drm_clflush_virt_range(vaddr + offset, len); - kunmap(page); + kunmap_local(vaddr); return ret ? -EFAULT : 0; } -- 2.37.1
[Intel-gfx] [PATCH 3/3] drm/i915/gem: Replace kmap() with kmap_local_page()
kmap() is being deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and are still valid. Since its use in i915/gem is safe everywhere, it should be preferred. Therefore, replace kmap() with kmap_local_page() in i915/gem. Suggested-by: Ira Weiny Signed-off-by: Fabio M. De Francesco --- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 6 ++ drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 8 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 4eed3dd90ba8..2bc6ab9964ff 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -640,16 +640,14 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *dev_priv, do { unsigned int len = min_t(typeof(size), size, PAGE_SIZE); struct page *page; - void *pgdata, *vaddr; + void *pgdata; err = aops->write_begin(file, file->f_mapping, offset, len, , ); if (err < 0) goto fail; - vaddr = kmap(page); - memcpy(vaddr, data, len); - kunmap(page); + memcpy_to_page(page, 0, data, len); err = aops->write_end(file, file->f_mapping, offset, len, len, page, pgdata); diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c index 3ced9948a331..bb25b50b5688 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c @@ -153,7 +153,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj, intel_gt_flush_ggtt_writes(to_gt(i915)); p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT); - cpu = kmap(p) + offset_in_page(offset); + cpu = kmap_local_page(p) + offset_in_page(offset); drm_clflush_virt_range(cpu, sizeof(*cpu)); if (*cpu != (u32)page) { pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%llu + %u [0x%llx]) of 0x%x, found 0x%x\n", @@ -171,7 +171,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj, } *cpu = 0; drm_clflush_virt_range(cpu, sizeof(*cpu)); - kunmap(p); + kunmap_local(cpu); out: i915_gem_object_lock(obj, NULL); @@ -249,7 +249,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj, intel_gt_flush_ggtt_writes(to_gt(i915)); p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT); - cpu = kmap(p) + offset_in_page(offset); + cpu = kmap_local_page(p) + offset_in_page(offset); drm_clflush_virt_range(cpu, sizeof(*cpu)); if (*cpu != (u32)page) { pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%llu + %u [0x%llx]) of 0x%x, found 0x%x\n", @@ -267,7 +267,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj, } *cpu = 0; drm_clflush_virt_range(cpu, sizeof(*cpu)); - kunmap(p); + kunmap_local(cpu); if (err) return err; -- 2.37.1
[Intel-gfx] [PATCH 0/3] drm/i915: Replace kmap() with kmap_local_page()
kmap() is being deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and still valid. Since its use in drm/i915 is safe everywhere, it should be preferred. Therefore, replace kmap() with kmap_local_page() in drm/i915. These changes should be tested in an 32 bits system, booting a kernel with HIGHMEM enabled. Unfortunately I have no i915 based hardware, therefore any help with testing would be greatly appreciated. Suggested-by: Ira Weiny Signed-off-by: Fabio M. De Francesco Fabio M. De Francesco (3): drm/i915: Replace kmap() with kmap_local_page() drm/i915/gt: Replace kmap() with kmap_local_page() drm/i915/gem: Replace kmap() with kmap_local_page() drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 6 ++ drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 8 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 ++-- drivers/gpu/drm/i915/gt/shmem_utils.c | 11 --- drivers/gpu/drm/i915/i915_gem.c| 8 5 files changed, 16 insertions(+), 21 deletions(-) -- 2.37.1