Re: [Intel-gfx] [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass
On Wed, Jun 11, 2014 at 02:17:21PM -0700, Volkin, Bradley D wrote:
> On Tue, Jun 10, 2014 at 04:14:41AM -0700, Chris Wilson wrote:
> > On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
> >
> >   Upload rate for 2 linear surfaces:   8134MiB/s -> 8154MiB/s
> >   Upload rate for 2 tiled surfaces:    8625MiB/s -> 8632MiB/s
> >   Upload rate for 4 linear surfaces:   8127MiB/s -> 8134MiB/s
> >   Upload rate for 4 tiled surfaces:    8602MiB/s -> 8629MiB/s
> >   Upload rate for 8 linear surfaces:   8124MiB/s -> 8137MiB/s
> >   Upload rate for 8 tiled surfaces:    8603MiB/s -> 8624MiB/s
> >   Upload rate for 16 linear surfaces:  8123MiB/s -> 8128MiB/s
> >   Upload rate for 16 tiled surfaces:   8606MiB/s -> 8618MiB/s
> >   Upload rate for 32 linear surfaces:  8121MiB/s -> 8128MiB/s
> >   Upload rate for 32 tiled surfaces:   8605MiB/s -> 8614MiB/s
> >   Upload rate for 64 linear surfaces:  8121MiB/s -> 8127MiB/s
> >   Upload rate for 64 tiled surfaces:   3017MiB/s -> 5127MiB/s
> >
> > Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
>
> The translation from vm_insert_pfn to remap_pfn_range looks correct. I
> don't know these APIs particularly well though. I wonder if there's any
> reason it would be unsafe to call remap_pfn_range from .fault() since it
> seems to only be used in .mmap() handlers in other places. Reading their
> implementations, nothing jumped out, so I'll say

Most drivers just statically relocate their BAR to userspace; we manage
it dynamically. Iirc vm_insert_pfn was more-or-less added for us, too.
In any case I think we're safe.

> Reviewed-by: Brad Volkin bradley.d.vol...@intel.com
>
> with the note that you may want to take a look just in case.

Thanks for the review; also added the Testcase: tag now that Chris
pushed his igt patch.
-Daniel

> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 28 +++++++++++++---------------
> >  1 file changed, 13 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index e1f68f06c2ef..7128d389a6ff 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1709,21 +1709,19 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> >  	pfn >>= PAGE_SHIFT;
> >
> >  	if (!obj->fault_mappable) {
> > -		int i;
> > -
> > -		for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) {
> > -			ret = vm_insert_pfn(vma,
> > -					    (unsigned long)vma->vm_start + i * PAGE_SIZE,
> > -					    pfn + i);
> > -			if (ret)
> > -				break;
> > -		}
> > -
> > -		obj->fault_mappable = true;
> > -	} else
> > -		ret = vm_insert_pfn(vma,
> > -				    (unsigned long)vmf->virtual_address,
> > -				    pfn + page_offset);
> > +		ret = remap_pfn_range(vma,
> > +				      vma->vm_start,
> > +				      pfn,
> > +				      obj->base.size,
> > +				      vma->vm_page_prot);
> > +		obj->fault_mappable = ret == 0;
> > +	} else {
> > +		ret = remap_pfn_range(vma,
> > +				      (unsigned long)vmf->virtual_address,
> > +				      pfn + page_offset,
> > +				      PAGE_SIZE,
> > +				      vma->vm_page_prot);
> > +	}
> >  unpin:
> >  	i915_gem_object_ggtt_unpin(obj);
> >  unlock:
> > --
> > 2.0.0

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass
On Wed, Jun 11, 2014 at 02:17:21PM -0700, Volkin, Bradley D wrote:
> On Tue, Jun 10, 2014 at 04:14:41AM -0700, Chris Wilson wrote:
> > On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences, using
> > i-g-t/gem_fence_upload:
> >
> >   Upload rate for 2 linear surfaces:   8134MiB/s -> 8154MiB/s
> >   Upload rate for 2 tiled surfaces:    8625MiB/s -> 8632MiB/s
> >   Upload rate for 4 linear surfaces:   8127MiB/s -> 8134MiB/s
> >   Upload rate for 4 tiled surfaces:    8602MiB/s -> 8629MiB/s
> >   Upload rate for 8 linear surfaces:   8124MiB/s -> 8137MiB/s
> >   Upload rate for 8 tiled surfaces:    8603MiB/s -> 8624MiB/s
> >   Upload rate for 16 linear surfaces:  8123MiB/s -> 8128MiB/s
> >   Upload rate for 16 tiled surfaces:   8606MiB/s -> 8618MiB/s
> >   Upload rate for 32 linear surfaces:  8121MiB/s -> 8128MiB/s
> >   Upload rate for 32 tiled surfaces:   8605MiB/s -> 8614MiB/s
> >   Upload rate for 64 linear surfaces:  8121MiB/s -> 8127MiB/s
> >   Upload rate for 64 tiled surfaces:   3017MiB/s -> 5127MiB/s
> >
> > Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
> > Testcase: i-g-t/gem_fence_upload
>
> The translation from vm_insert_pfn to remap_pfn_range looks correct. I
> don't know these APIs particularly well though. I wonder if there's any
> reason it would be unsafe to call remap_pfn_range from .fault() since it
> seems to only be used in .mmap() handlers in other places. Reading their
> implementations, nothing jumped out, so I'll say

Right, using it within the fault handler is fine. The underlying
operations are equivalent to the vm_insert_pfn(), just unrolled for a
contiguous range. So the reason why other drivers do it in their mmap
handler rather than at fault time is that they have a radically simpler
driver model than we do.

An exercise for the reader would be to eradicate the walk_system_ram()
still required by remap_pfn_range(), which we avoided in our prototypical
vm_insert_pfn_from_io_mapping(). [We could simply do a
remap_pfn_from_io_mapping that shared the core with remap_pfn_range, I
guess.]
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
Re: [Intel-gfx] [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass
On Wed, Jun 11, 2014 at 02:17:21PM -0700, Volkin, Bradley D wrote:
> On Tue, Jun 10, 2014 at 04:14:41AM -0700, Chris Wilson wrote:
> > On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
> > [snip benchmark results]
> >
> > Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
>
> The translation from vm_insert_pfn to remap_pfn_range looks correct. I
> don't know these APIs particularly well though. I wonder if there's any
> reason it would be unsafe to call remap_pfn_range from .fault() since it
> seems to only be used in .mmap() handlers in other places. Reading their
> implementations, nothing jumped out, so I'll say

So apparently it wasn't quite the same semantics, and remap_pfn_range
doesn't handle concurrent operations too well. Can you please also
review Chris' fixup patch for that?

Thanks, Daniel

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 2/2] drm/i915: Use remap_pfn_range() to prefault all PTE in a single pass
On Tue, Jun 10, 2014 at 04:14:41AM -0700, Chris Wilson wrote:
> On an Ivybridge i7-3720qm with 1600MHz DDR3, with 32 fences,
>
>   Upload rate for 2 linear surfaces:   8134MiB/s -> 8154MiB/s
>   Upload rate for 2 tiled surfaces:    8625MiB/s -> 8632MiB/s
>   Upload rate for 4 linear surfaces:   8127MiB/s -> 8134MiB/s
>   Upload rate for 4 tiled surfaces:    8602MiB/s -> 8629MiB/s
>   Upload rate for 8 linear surfaces:   8124MiB/s -> 8137MiB/s
>   Upload rate for 8 tiled surfaces:    8603MiB/s -> 8624MiB/s
>   Upload rate for 16 linear surfaces:  8123MiB/s -> 8128MiB/s
>   Upload rate for 16 tiled surfaces:   8606MiB/s -> 8618MiB/s
>   Upload rate for 32 linear surfaces:  8121MiB/s -> 8128MiB/s
>   Upload rate for 32 tiled surfaces:   8605MiB/s -> 8614MiB/s
>   Upload rate for 64 linear surfaces:  8121MiB/s -> 8127MiB/s
>   Upload rate for 64 tiled surfaces:   3017MiB/s -> 5127MiB/s
>
> Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk

The translation from vm_insert_pfn to remap_pfn_range looks correct. I
don't know these APIs particularly well though. I wonder if there's any
reason it would be unsafe to call remap_pfn_range from .fault() since it
seems to only be used in .mmap() handlers in other places. Reading their
implementations, nothing jumped out, so I'll say

Reviewed-by: Brad Volkin bradley.d.vol...@intel.com

with the note that you may want to take a look just in case.
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 28 +++++++++++++---------------
>  1 file changed, 13 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index e1f68f06c2ef..7128d389a6ff 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1709,21 +1709,19 @@ int i915_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> 	pfn >>= PAGE_SHIFT;
>
> 	if (!obj->fault_mappable) {
> -		int i;
> -
> -		for (i = 0; i < obj->base.size >> PAGE_SHIFT; i++) {
> -			ret = vm_insert_pfn(vma,
> -					    (unsigned long)vma->vm_start + i * PAGE_SIZE,
> -					    pfn + i);
> -			if (ret)
> -				break;
> -		}
> -
> -		obj->fault_mappable = true;
> -	} else
> -		ret = vm_insert_pfn(vma,
> -				    (unsigned long)vmf->virtual_address,
> -				    pfn + page_offset);
> +		ret = remap_pfn_range(vma,
> +				      vma->vm_start,
> +				      pfn,
> +				      obj->base.size,
> +				      vma->vm_page_prot);
> +		obj->fault_mappable = ret == 0;
> +	} else {
> +		ret = remap_pfn_range(vma,
> +				      (unsigned long)vmf->virtual_address,
> +				      pfn + page_offset,
> +				      PAGE_SIZE,
> +				      vma->vm_page_prot);
> +	}
>  unpin:
> 	i915_gem_object_ggtt_unpin(obj);
>  unlock:
> --
> 2.0.0

___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx