Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-05-16 Thread Volkin, Bradley D
Reviewed-by: Brad Volkin bradley.d.vol...@intel.com 

On Fri, May 16, 2014 at 02:22:37PM +0100, Chris Wilson wrote:
 By exporting the ability to map user address and inserting PTEs
 representing their backing pages into the GTT, we can exploit UMA in order
 to utilize normal application data as a texture source or even as a
 render target (depending upon the capabilities of the chipset). This has
 a number of uses, with zero-copy downloads to the GPU and efficient
 readback making the intermixed streaming of CPU and GPU operations
 fairly efficient. This ability has many widespread implications from
 faster rendering of client-side software rasterisers (chromium),
 mitigation of stalls due to read back (firefox) and to faster pipelining
 of texture data (such as pixel buffer objects in GL or data blobs in CL).
 
 v2: Compile with CONFIG_MMU_NOTIFIER
 v3: We can sleep while performing invalidate-range, which we can utilise
 to drop our page references prior to the kernel manipulating the vma
 (for either discard or cloning) and so protect normal users.
 v4: Only run the invalidate notifier if the range intercepts the bo.
 v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
 v6: Recheck after reacquire mutex for lost mmu.
 v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
 v8: Fix rebasing error after forwarding porting the back port.
 v9: Limit the userptr to page aligned entries. We now expect userspace
 to handle all the offset-in-page adjustments itself.
 v10: Prevent vma from being copied across fork to avoid issues with cow.
 v11: Drop vma behaviour changes -- locking is nigh on impossible.
  Use a worker to load user pages to avoid lock inversions.
 v12: Use get_task_mm()/mmput() for correct refcounting of mm.
 v13: Use a worker to release the mmu_notifier to avoid lock inversion
 v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
  with its own locking and tree of objects for each mm/mmu_notifier.
 v15: Prevent overlapping userptr objects, and invalidate all objects
  within the mmu_notifier range
 v16: Fix a typo for iterating over multiple objects in the range and
  rearrange error path to destroy the mmu_notifier locklessly.
  Also close a race between invalidate_range and the get_pages_worker.
 v17: Close a race between get_pages_worker/invalidate_range and fresh
  allocations of the same userptr range - and notice that
  struct_mutex was presumed to be held when during creation it wasn't.
 v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
  for the struct sg_table and to clear it before reporting an error.
 v19: Always error out on read-only userptr requests as we don't have the
  hardware infrastructure to support them at the moment.
 v20: Refuse to implement read-only support until we have the required
  infrastructure - but reserve the bit in flags for future use.
 v21: use_mm() is not required for get_user_pages(). It is only meant to
  be used to fix up the kernel thread's current-mm for use with
  copy_user().
 v22: Use sg_alloc_table_from_pages for that chunky feeling
 v23: Export a function for sanity checking dma-buf rather than encode
  userptr details elsewhere, and clean up comments based on
  suggestions by Bradley.
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 Cc: Tvrtko Ursulin tvrtko.ursu...@linux.intel.com
 Cc: Gong, Zhipeng zhipeng.g...@intel.com
 Cc: Akash Goel akash.g...@intel.com
 Cc: Volkin, Bradley D bradley.d.vol...@intel.com
 Reviewed-by: Tvrtko Ursulin tvrtko.ursu...@linux.intel.com
 ---
  drivers/gpu/drm/i915/Kconfig|   1 +
  drivers/gpu/drm/i915/Makefile   |   1 +
  drivers/gpu/drm/i915/i915_dma.c |   1 +
  drivers/gpu/drm/i915/i915_drv.h |  25 +-
  drivers/gpu/drm/i915/i915_gem.c |   4 +
  drivers/gpu/drm/i915/i915_gem_dmabuf.c  |   8 +
  drivers/gpu/drm/i915/i915_gem_userptr.c | 711 
 
  drivers/gpu/drm/i915/i915_gpu_error.c   |   2 +
  include/uapi/drm/i915_drm.h |  16 +
  9 files changed, 768 insertions(+), 1 deletion(-)
  create mode 100644 drivers/gpu/drm/i915/i915_gem_userptr.c
 
 diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
 index e4e3c01b8cbc..437e1824d0bf 100644
 --- a/drivers/gpu/drm/i915/Kconfig
 +++ b/drivers/gpu/drm/i915/Kconfig
 @@ -5,6 +5,7 @@ config DRM_I915
   depends on (AGP || AGP=n)
   select INTEL_GTT
   select AGP_INTEL if AGP
 + select INTERVAL_TREE
   # we need shmfs for the swappable backing store, and in particular
   # the shmem_readpage() which depends upon tmpfs
   select SHMEM
 diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
 index b6ce5640d592..fa9e806259ba 100644
 --- a/drivers/gpu/drm/i915/Makefile
 +++ b/drivers/gpu/drm/i915/Makefile
 @@ -28,6 +28,7 @@ i915-y += i915_cmd_parser.o \
 i915_gem.o \
 

Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-05-16 Thread Daniel Vetter
On Fri, May 16, 2014 at 08:34:52AM -0700, Volkin, Bradley D wrote:
 Reviewed-by: Brad Volkin bradley.d.vol...@intel.com 
 
 On Fri, May 16, 2014 at 02:22:37PM +0100, Chris Wilson wrote:
  By exporting the ability to map user address and inserting PTEs
  representing their backing pages into the GTT, we can exploit UMA in order
  to utilize normal application data as a texture source or even as a
  render target (depending upon the capabilities of the chipset). This has
  a number of uses, with zero-copy downloads to the GPU and efficient
  readback making the intermixed streaming of CPU and GPU operations
  fairly efficient. This ability has many widespread implications from
  faster rendering of client-side software rasterisers (chromium),
  mitigation of stalls due to read back (firefox) and to faster pipelining
  of texture data (such as pixel buffer objects in GL or data blobs in CL).
  
  v2: Compile with CONFIG_MMU_NOTIFIER
  v3: We can sleep while performing invalidate-range, which we can utilise
  to drop our page references prior to the kernel manipulating the vma
  (for either discard or cloning) and so protect normal users.
  v4: Only run the invalidate notifier if the range intercepts the bo.
  v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
  v6: Recheck after reacquire mutex for lost mmu.
  v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
  v8: Fix rebasing error after forwarding porting the back port.
  v9: Limit the userptr to page aligned entries. We now expect userspace
  to handle all the offset-in-page adjustments itself.
  v10: Prevent vma from being copied across fork to avoid issues with cow.
  v11: Drop vma behaviour changes -- locking is nigh on impossible.
   Use a worker to load user pages to avoid lock inversions.
  v12: Use get_task_mm()/mmput() for correct refcounting of mm.
  v13: Use a worker to release the mmu_notifier to avoid lock inversion
  v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
   with its own locking and tree of objects for each mm/mmu_notifier.
  v15: Prevent overlapping userptr objects, and invalidate all objects
   within the mmu_notifier range
  v16: Fix a typo for iterating over multiple objects in the range and
   rearrange error path to destroy the mmu_notifier locklessly.
   Also close a race between invalidate_range and the get_pages_worker.
  v17: Close a race between get_pages_worker/invalidate_range and fresh
   allocations of the same userptr range - and notice that
   struct_mutex was presumed to be held when during creation it wasn't.
  v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
   for the struct sg_table and to clear it before reporting an error.
  v19: Always error out on read-only userptr requests as we don't have the
   hardware infrastructure to support them at the moment.
  v20: Refuse to implement read-only support until we have the required
   infrastructure - but reserve the bit in flags for future use.
  v21: use_mm() is not required for get_user_pages(). It is only meant to
   be used to fix up the kernel thread's current-mm for use with
   copy_user().
  v22: Use sg_alloc_table_from_pages for that chunky feeling
  v23: Export a function for sanity checking dma-buf rather than encode
   userptr details elsewhere, and clean up comments based on
   suggestions by Bradley.
  
  Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
  Cc: Tvrtko Ursulin tvrtko.ursu...@linux.intel.com
  Cc: Gong, Zhipeng zhipeng.g...@intel.com
  Cc: Akash Goel akash.g...@intel.com
  Cc: Volkin, Bradley D bradley.d.vol...@intel.com
  Reviewed-by: Tvrtko Ursulin tvrtko.ursu...@linux.intel.com

Bring on the champagne!

Queued for -next, thanks for the patch.
-Daniel

  ---
   drivers/gpu/drm/i915/Kconfig|   1 +
   drivers/gpu/drm/i915/Makefile   |   1 +
   drivers/gpu/drm/i915/i915_dma.c |   1 +
   drivers/gpu/drm/i915/i915_drv.h |  25 +-
   drivers/gpu/drm/i915/i915_gem.c |   4 +
   drivers/gpu/drm/i915/i915_gem_dmabuf.c  |   8 +
   drivers/gpu/drm/i915/i915_gem_userptr.c | 711 
  
   drivers/gpu/drm/i915/i915_gpu_error.c   |   2 +
   include/uapi/drm/i915_drm.h |  16 +
   9 files changed, 768 insertions(+), 1 deletion(-)
   create mode 100644 drivers/gpu/drm/i915/i915_gem_userptr.c
  
  diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
  index e4e3c01b8cbc..437e1824d0bf 100644
  --- a/drivers/gpu/drm/i915/Kconfig
  +++ b/drivers/gpu/drm/i915/Kconfig
  @@ -5,6 +5,7 @@ config DRM_I915
  depends on (AGP || AGP=n)
  select INTEL_GTT
  select AGP_INTEL if AGP
  +   select INTERVAL_TREE
  # we need shmfs for the swappable backing store, and in particular
  # the shmem_readpage() which depends upon tmpfs
  select SHMEM
  diff --git a/drivers/gpu/drm/i915/Makefile 

Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-02-04 Thread Daniel Vetter
On Mon, Feb 03, 2014 at 03:28:37PM +, Tvrtko Ursulin wrote:
 
 On 01/29/2014 08:34 PM, Daniel Vetter wrote:
 Actually I've found something else to complain about:
 
 On Tue, Jan 28, 2014 at 2:16 PM, Chris Wilson ch...@chris-wilson.co.uk 
 wrote:
 +#define I915_USERPTR_READ_ONLY 0x1
 
 This smells like an insta-root-exploit:
 1. mmap /lib/ld-linux.so as read-only
 2. userptr bind that mmap'ed area as READ_ONLY
 3. blit exploit code over it
 4. profit
 
 I also don't see a way we could fix this, at least without the
 hardware providing read-only modes in the ptes. Which also requires us
 to actually trust it to follow them, even when they exists ...
 
 Would disallowing mapping of shared pages help and be acceptable
 considering intended use cases?

The above exploit is the simplest one I could come up with, but I expect
the vm in general won't be too happy if we write to pages it never expects
are written to. We could do fun stuff like corrupt pagecache or swap
cache. Which in conjunction with stable kernel pages (which some I/O paths
needed) is rather likely to result in havoc.

Essentially I'm no vm expert, and this definitely needs a full vm audit
even before considering it at all. So I'd like to drop support for it in
the initial version ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-02-03 Thread Tvrtko Ursulin


On 01/29/2014 08:34 PM, Daniel Vetter wrote:

Actually I've found something else to complain about:

On Tue, Jan 28, 2014 at 2:16 PM, Chris Wilson ch...@chris-wilson.co.uk wrote:

+#define I915_USERPTR_READ_ONLY 0x1


This smells like an insta-root-exploit:
1. mmap /lib/ld-linux.so as read-only
2. userptr bind that mmap'ed area as READ_ONLY
3. blit exploit code over it
4. profit

I also don't see a way we could fix this, at least without the
hardware providing read-only modes in the ptes. Which also requires us
to actually trust it to follow them, even when they exists ...


Would disallowing mapping of shared pages help and be acceptable 
considering intended use cases?


Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-02-03 Thread Tvrtko Ursulin

On 01/30/2014 11:06 AM, Chris Wilson wrote:

On Wed, Jan 29, 2014 at 10:58:48PM +0100, Daniel Vetter wrote:

On Wed, Jan 29, 2014 at 10:53 PM, Chris Wilson ch...@chris-wilson.co.uk wrote:

On Wed, Jan 29, 2014 at 09:25:51PM +0100, Daniel Vetter wrote:

So originally I've thought we need this due to the massive overhead of the
mmu notifier. But now with the nice shared mmu notifiers I've thought that
overhead is gone I prefer to also ditch this option.

Same goes about the MMU_NOTIFIER conditional code, imo we simply should
select this - most distros will have it anyway and users will be really
suprised if they lose userspace driver features for seemingly irrelevant
reasons.


Seriously? You think the overhead is magically gone?


Well the once-per-process overhead is still there, and imo it's ok to
eat that. But the complaints I've heard concerned the per-object
overhead, so I wonder how much of that is still relevant.


I am still annoyed by the thought of having to enable an extra feature
in my kernels, and the extra code that is then run on every mm
operation. (Mixing mmu_notifiers + mm debuging was an especially
unpleasant experience that I don't wish to ever do again.)

Numbers talk though, if we can't demonstrate a significant difference
between the two, it can die. Keeping a debug mode to turn off
mmu_notifiers would still be good so that we can keep track of any
impact over time.


Writing a benchmark for this is next on my userptr to do list following 
completing of the i-g-t test case.


Btw, I did not notice you are discussing this sooner since I got dropped 
from Cc. Only when Rafael mentioned he saw some discussion about 
potential exploit I went looking.


Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-01-30 Thread Chris Wilson
On Wed, Jan 29, 2014 at 10:58:48PM +0100, Daniel Vetter wrote:
 On Wed, Jan 29, 2014 at 10:53 PM, Chris Wilson ch...@chris-wilson.co.uk 
 wrote:
  On Wed, Jan 29, 2014 at 09:25:51PM +0100, Daniel Vetter wrote:
  So originally I've thought we need this due to the massive overhead of the
  mmu notifier. But now with the nice shared mmu notifiers I've thought that
  overhead is gone I prefer to also ditch this option.
 
  Same goes about the MMU_NOTIFIER conditional code, imo we simply should
  select this - most distros will have it anyway and users will be really
  suprised if they lose userspace driver features for seemingly irrelevant
  reasons.
 
  Seriously? You think the overhead is magically gone?
 
 Well the once-per-process overhead is still there, and imo it's ok to
 eat that. But the complaints I've heard concerned the per-object
 overhead, so I wonder how much of that is still relevant.

I am still annoyed by the thought of having to enable an extra feature
in my kernels, and the extra code that is then run on every mm
operation. (Mixing mmu_notifiers + mm debuging was an especially
unpleasant experience that I don't wish to ever do again.)

Numbers talk though, if we can't demonstrate a significant difference
between the two, it can die. Keeping a debug mode to turn off
mmu_notifiers would still be good so that we can keep track of any
impact over time.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-01-29 Thread Daniel Vetter
On Tue, Jan 28, 2014 at 01:16:46PM +, Chris Wilson wrote:
 By exporting the ability to map user address and inserting PTEs
 representing their backing pages into the GTT, we can exploit UMA in order
 to utilize normal application data as a texture source or even as a
 render target (depending upon the capabilities of the chipset). This has
 a number of uses, with zero-copy downloads to the GPU and efficient
 readback making the intermixed streaming of CPU and GPU operations
 fairly efficient. This ability has many widespread implications from
 faster rendering of client-side software rasterisers (chromium),
 mitigation of stalls due to read back (firefox) and to faster pipelining
 of texture data (such as pixel buffer objects in GL or data blobs in CL).
 
 v2: Compile with CONFIG_MMU_NOTIFIER
 v3: We can sleep while performing invalidate-range, which we can utilise
 to drop our page references prior to the kernel manipulating the vma
 (for either discard or cloning) and so protect normal users.
 v4: Only run the invalidate notifier if the range intercepts the bo.
 v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
 v6: Recheck after reacquire mutex for lost mmu.
 v7: Fix implicit padding of ioctl struct by rounding to next 64bit boundary.
 v8: Fix rebasing error after forwarding porting the back port.
 v9: Limit the userptr to page aligned entries. We now expect userspace
 to handle all the offset-in-page adjustments itself.
 v10: Prevent vma from being copied across fork to avoid issues with cow.
 v11: Drop vma behaviour changes -- locking is nigh on impossible.
  Use a worker to load user pages to avoid lock inversions.
 v12: Use get_task_mm()/mmput() for correct refcounting of mm.
 v13: Use a worker to release the mmu_notifier to avoid lock inversion
 v14: Decouple mmu_notifier from struct_mutex using a custom mmu_notifer
  with its own locking and tree of objects for each mm/mmu_notifier.
 v15: Prevent overlapping userptr objects, and invalidate all objects
  within the mmu_notifier range
 v16: Fix a typo for iterating over multiple objects in the range and
  rearrange error path to destroy the mmu_notifier locklessly.
  Also close a race between invalidate_range and the get_pages_worker.
 v17: Close a race between get_pages_worker/invalidate_range and fresh
  allocations of the same userptr range - and notice that
  struct_mutex was presumed to be held when during creation it wasn't.
 v18: Sigh. Fix the refactor of st_set_pages() to allocate enough memory
  for the struct sg_table and to clear it before reporting an error.
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 Cc: Tvrtko Ursulin tvrtko.ursu...@linux.intel.com
 Cc: Gong, Zhipeng zhipeng.g...@intel.com
 Cc: Akash Goel akash.g...@intel.com
 Cc: Volkin, Bradley D bradley.d.vol...@intel.com

[snip]

 diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
 index 37c8073a8246..6c145a0be250 100644
 --- a/include/uapi/drm/i915_drm.h
 +++ b/include/uapi/drm/i915_drm.h
 @@ -224,6 +224,7 @@ typedef struct _drm_i915_sarea {
  #define DRM_I915_REG_READ0x31
  #define DRM_I915_GET_RESET_STATS 0x32
  #define DRM_I915_GEM_CREATE2 0x33
 +#define DRM_I915_GEM_USERPTR 0x34
  
  #define DRM_IOCTL_I915_INIT  DRM_IOW( DRM_COMMAND_BASE + 
 DRM_I915_INIT, drm_i915_init_t)
  #define DRM_IOCTL_I915_FLUSH DRM_IO ( DRM_COMMAND_BASE + 
 DRM_I915_FLUSH)
 @@ -275,6 +276,7 @@ typedef struct _drm_i915_sarea {
  #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY   DRM_IOW (DRM_COMMAND_BASE + 
 DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
  #define DRM_IOCTL_I915_REG_READ  DRM_IOWR 
 (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
  #define DRM_IOCTL_I915_GET_RESET_STATS   DRM_IOWR 
 (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
 +#define DRM_IOCTL_I915_GEM_USERPTR   DRM_IOWR 
 (DRM_COMMAND_BASE + DRM_I915_GEM_USERPTR, struct drm_i915_gem_userptr)
  
  /* Allow drivers to submit batchbuffers directly to hardware, relying
   * on the security mechanisms provided by hardware.
 @@ -1129,4 +1131,18 @@ struct drm_i915_reset_stats {
   __u32 pad;
  };
  
 +struct drm_i915_gem_userptr {
 + __u64 user_ptr;
 + __u64 user_size;
 + __u32 flags;
 +#define I915_USERPTR_READ_ONLY 0x1
 +#define I915_USERPTR_UNSYNCHRONIZED 0x8000

So originally I've thought we need this due to the massive overhead of the
mmu notifier. But now with the nice shared mmu notifiers I've thought that
overhead is gone I prefer to also ditch this option.

Same goes about the MMU_NOTIFIER conditional code, imo we simply should
select this - most distros will have it anyway and users will be really
suprised if they lose userspace driver features for seemingly irrelevant
reasons.

Beside this I think I've run out of stuff to complain about ;-)

Cheers, Daniel


Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-01-29 Thread Daniel Vetter
Actually I've found something else to complain about:

On Tue, Jan 28, 2014 at 2:16 PM, Chris Wilson ch...@chris-wilson.co.uk wrote:
 +#define I915_USERPTR_READ_ONLY 0x1

This smells like an insta-root-exploit:
1. mmap /lib/ld-linux.so as read-only
2. userptr bind that mmap'ed area as READ_ONLY
3. blit exploit code over it
4. profit

I also don't see a way we could fix this, at least without the
hardware providing read-only modes in the ptes. Which also requires us
to actually trust it to follow them, even when they exists ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-01-29 Thread Chris Wilson
On Wed, Jan 29, 2014 at 09:34:42PM +0100, Daniel Vetter wrote:
 Actually I've found something else to complain about:
 
 On Tue, Jan 28, 2014 at 2:16 PM, Chris Wilson ch...@chris-wilson.co.uk 
 wrote:
  +#define I915_USERPTR_READ_ONLY 0x1
 
 This smells like an insta-root-exploit:
 1. mmap /lib/ld-linux.so as read-only
 2. userptr bind that mmap'ed area as READ_ONLY
 3. blit exploit code over it
 4. profit
 
 I also don't see a way we could fix this, at least without the
 hardware providing read-only modes in the ptes. Which also requires us
 to actually trust it to follow them, even when they exists ...

Allow it for root only code then, unless we can expose it on supported
hw ;-)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-01-29 Thread Chris Wilson
On Wed, Jan 29, 2014 at 09:25:51PM +0100, Daniel Vetter wrote:
 So originally I've thought we need this due to the massive overhead of the
 mmu notifier. But now with the nice shared mmu notifiers I've thought that
 overhead is gone I prefer to also ditch this option.
 
 Same goes about the MMU_NOTIFIER conditional code, imo we simply should
 select this - most distros will have it anyway and users will be really
 suprised if they lose userspace driver features for seemingly irrelevant
 reasons.

Seriously? You think the overhead is magically gone?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-01-29 Thread Daniel Vetter
On Wed, Jan 29, 2014 at 10:53 PM, Chris Wilson ch...@chris-wilson.co.uk wrote:
 On Wed, Jan 29, 2014 at 09:25:51PM +0100, Daniel Vetter wrote:
 So originally I've thought we need this due to the massive overhead of the
 mmu notifier. But now with the nice shared mmu notifiers I've thought that
 overhead is gone I prefer to also ditch this option.

 Same goes about the MMU_NOTIFIER conditional code, imo we simply should
 select this - most distros will have it anyway and users will be really
 suprised if they lose userspace driver features for seemingly irrelevant
 reasons.

 Seriously? You think the overhead is magically gone?

Well the once-per-process overhead is still there, and imo it's ok to
eat that. But the complaints I've heard concerned the per-object
overhead, so I wonder how much of that is still relevant.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-01-27 Thread Volkin, Bradley D
Hi Chris,

A few questions/comments throughout. I may be off the mark on some. Please
bear with me as I try to get more familiar with the gem code.

Thanks,
Brad

[ snip ]

On Fri, Jan 24, 2014 at 01:00:19AM -0800, Chris Wilson wrote:
 +static void
 +__i915_mmu_notifier_destroy_worker(struct work_struct *work)
 +{
 + struct i915_mmu_notifier *mmu = container_of(work, typeof(*mmu), work);
 + mmu_notifier_unregister(mmu-mn, mmu-mm);
 + kfree(mmu);
 +}
 +
 +static void
 +__i915_mmu_notifier_destroy(struct i915_mmu_notifier *mmu)
 +{
 + hash_del(mmu-node);
 + INIT_WORK(mmu-work, __i915_mmu_notifier_destroy_worker);
 + schedule_work(mmu-work);

The commit message mentions a potential lock inversion as the reason for using 
a wq.
A comment with the details might be helpful.

 +}
 +
 +static void __i915_mmu_notifier_update_serial(struct i915_mmu_notifier *mmu)
 +{
 + if (++mmu-serial == 0)
 + mmu-serial = 1;
 +}
 +
 +static void
 +i915_mmu_notifier_del(struct i915_mmu_notifier *mmu,
 +   struct i915_mmu_object *mn)
 +{
 + bool destroy;
 +
 + spin_lock(mmu-lock);
 + interval_tree_remove(mn-it, mmu-objects);
 + destroy = --mmu-count == 0;
 + __i915_mmu_notifier_update_serial(mmu);
 + spin_unlock(mmu-lock);
 +
 + if (destroy) /* protected against _add() by struct_mutex */
 + __i915_mmu_notifier_destroy(mmu);

I see that we should hold struct_mutex when this function is called,
but I don't see that we try to get the mutex anywhere on the _add() path.
Have I missed something?

 +}
 +
 +static int
 +i915_mmu_notifier_add(struct i915_mmu_notifier *mmu,
 +   struct i915_mmu_object *mn)
 +{
 + int ret = -EINVAL;
 +
 + spin_lock(mmu-lock);
 + /* Disallow overlapping userptr objects */
 + if (!interval_tree_iter_first(mmu-objects,
 +   mn-it.start, mn-it.last)) {
 + interval_tree_insert(mn-it, mmu-objects);
 + mmu-count++;
 + __i915_mmu_notifier_update_serial(mmu);
 + ret = 0;
 + }
 + spin_unlock(mmu-lock);
 +
 + return ret;
 +}
 +
 +static void
 +i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj)
 +{
 + struct i915_mmu_object *mn;
 +
 + mn = obj-userptr.mn;
 + if (mn == NULL)
 + return;
 +
 + i915_mmu_notifier_del(mn-mmu, mn);
 + obj-userptr.mn = NULL;
 +}
 +
 +static int
 +i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj,
 + unsigned flags)
 +{
 + struct i915_mmu_notifier *mmu;
 + struct i915_mmu_object *mn;
 + int ret;
 +
 + if (flags  I915_USERPTR_UNSYNCHRONIZED)
 + return capable(CAP_SYS_ADMIN) ? 0 : -EPERM;
 +
 + mmu = i915_mmu_notifier_get(obj-base.dev, obj-userptr.mm);
 + if (IS_ERR(mmu))
 + return PTR_ERR(mmu);
 +
 + mn = kzalloc(sizeof(*mn), GFP_KERNEL);
 + if (mn == NULL) {
 + ret = -ENOMEM;
 + goto destroy_mmu;
 + }
 +
 + mn-mmu = mmu;
 + mn-it.start = obj-userptr.ptr;
 + mn-it.last = mn-it.start + obj-base.size - 1;
 + mn-obj = obj;
 +
 + ret = i915_mmu_notifier_add(mmu, mn);
 + if (ret)
 + goto free_mn;
 +
 + obj-userptr.mn = mn;
 + return 0;
 +
 +free_mn:
 + kfree(mn);
 +destroy_mmu:
 + if (mmu-count == 0)
 + __i915_mmu_notifier_destroy(mmu);

Other accesses to mmu-count are protected by mmu-lock. Again, I may have 
missed
something but don't immediately see why that's not required.

 + return ret;
 +}
 +
 +#else
 +
 +static void
 +i915_gem_userptr_release__mmu_notifier(struct drm_i915_gem_object *obj)
 +{
 +}
 +
 +static int
 +i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj,
 + unsigned flags)
 +{
 + if ((flags  I915_USERPTR_UNSYNCHRONIZED) == 0)
 + return -ENODEV;
 +
 + if (!capable(CAP_SYS_ADMIN))
 + return -EPERM;
 +
 + return 0;
 +}
 +#endif
 +
 +struct get_pages_work {
 + struct work_struct work;
 + struct drm_i915_gem_object *obj;
 + struct task_struct *task;
 +};
 +
 +static void
 +__i915_gem_userptr_get_pages_worker(struct work_struct *_work)
 +{
 + struct get_pages_work *work = container_of(_work, typeof(*work), work);
 + struct drm_i915_gem_object *obj = work-obj;
 + struct drm_device *dev = obj-base.dev;
 + const int num_pages = obj-base.size  PAGE_SHIFT;
 + struct page **pvec;
 + int pinned, ret;
 +
 + ret = -ENOMEM;
 + pinned = 0;
 +
 + pvec = kmalloc(num_pages*sizeof(struct page *),
 +GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
 + if (pvec == NULL)
 + pvec = drm_malloc_ab(num_pages, sizeof(struct page *));
 + if (pvec != NULL) {
 + struct mm_struct *mm = obj-userptr.mm;
 +
 + use_mm(mm);
 + down_read(mm-mmap_sem);
 

Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2014-01-27 Thread Chris Wilson
On Mon, Jan 27, 2014 at 09:56:12AM -0800, Volkin, Bradley D wrote:
  +static void
  +i915_mmu_notifier_del(struct i915_mmu_notifier *mmu,
  + struct i915_mmu_object *mn)
  +{
  +   bool destroy;
  +
  +   spin_lock(mmu-lock);
  +   interval_tree_remove(mn-it, mmu-objects);
  +   destroy = --mmu-count == 0;
  +   __i915_mmu_notifier_update_serial(mmu);
  +   spin_unlock(mmu-lock);
  +
  +   if (destroy) /* protected against _add() by struct_mutex */
  +   __i915_mmu_notifier_destroy(mmu);
 
 I see that we should hold struct_mutex when this function is called,
 but I don't see that we try to get the mutex anywhere on the _add() path.
 Have I missed something?

No, it's fixed in a later patch. I assumed I had the lock taken in the
outermost ioctl routine.

  +free_mn:
  +   kfree(mn);
  +destroy_mmu:
  +   if (mmu-count == 0)
  +   __i915_mmu_notifier_destroy(mmu);
 
 Other accesses to mmu-count are protected by mmu-lock. Again, I may have 
 missed
 something but don't immediately see why that's not required.

mmu-count is protected by struct_mutex. See above.

  +   if (pinned  num_pages) {
  +   if (pinned  0) {
  +   ret = pinned;
  +   pinned = 0;
  +   } else {
  +   /* Spawn a worker so that we can acquire the
  +* user pages without holding our mutex.
  +*/
  +   ret = -EAGAIN;
  +   if (obj-userptr.work == NULL) {
  +   struct get_pages_work *work;
  +
  +   work = kmalloc(sizeof(*work), GFP_KERNEL);
  +   if (work != NULL) {
  +   obj-userptr.work = work-work;
  +
  +   work-obj = obj;
  +   drm_gem_object_reference(obj-base);
  +
  +   work-task = current;
  +   get_task_struct(work-task);
  +
  +   INIT_WORK(work-work, 
  __i915_gem_userptr_get_pages_worker);
  +   schedule_work(work-work);
 
 Any reason to use the system wq instead of the driver wq here?
 It doesn't look like it's the usual takes modeset locks justification.

Performance. Using the driver workqueue would serialise the clients, but
using the system workqueue we can do the pagefaulting in parallel.
 
  +   } else
  +   ret = -ENOMEM;
  +   } else {
  +   if (IS_ERR(obj-userptr.work)) {
 
 } else if (...) { ?

No, I think it is clearer as is.
 
  +   ret = PTR_ERR(obj-userptr.work);
  +   obj-userptr.work = NULL;
  +   }
  +   }
  +   }
  +   } else {
  +   struct sg_table *st;
  +
  +   st = kmalloc(sizeof(*st), GFP_KERNEL);
  +   if (st == NULL || sg_alloc_table(st, num_pages, GFP_KERNEL)) {
  +   kfree(st);
  +   ret = -ENOMEM;
  +   } else {
  +   struct scatterlist *sg;
  +   int n;
  +
  +   for_each_sg(st-sgl, sg, num_pages, n)
  +   sg_set_page(sg, pvec[n], PAGE_SIZE, 0);
  +
  +   obj-pages = st;
  +   obj-userptr.work = NULL;
  +
  +   pinned = 0;
  +   ret = 0;
  +   }
 
 This block is almost identical to code in the worker. Would it be worth 
 extracting
 the common parts into a helper?

Almost, but subtly and importantly different. Only the loop was the same
at which point I didn't feel like the saving was significant. It is now
even less similar.
 
  +   }
  +
  +   release_pages(pvec, pinned, 0);
  +   drm_free_large(pvec);
  +   return ret;
  +}
  +
  +static void
  +i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj)
  +{
  +   struct scatterlist *sg;
  +   int i;
  +
  +   if (obj-madv != I915_MADV_WILLNEED)
  +   obj-dirty = 0;
 
 This is subtly different than similar code in the standard put_pages() in that
 it sets dirty=0 for both DONTNEED and PURGED instead of just DONTNEED (w/ 
 BUG_ON(PURGED)).
 I don't think we will ever actually truncate userptr objects, so is there any 
 reason for
 this to be different?

No, there's no reason for the difference. Semantically it is the same, of
course.

  +/**
  + * Creates a new mm object that wraps some normal memory from the process
  + * context - user memory.
  + *
  + * We impose several restrictions upon the memory being mapped
  + * into the GPU.
  + * 1. It must be page aligned (both start/end addresses, i.e ptr and size).
  + * 2. We only allow a bo as large as we could in theory map into the GTT,
  + *that is we limit the size to the total size of the GTT.
  + * 3. The bo is 

Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2013-06-24 Thread Jesse Barnes
On Mon, 8 Apr 2013 21:24:58 +0200
Daniel Vetter dan...@ffwll.ch wrote:

 On Mon, Apr 8, 2013 at 7:40 PM, Chris Wilson ch...@chris-wilson.co.uk wrote:
  On Mon, Apr 08, 2013 at 07:18:11PM +0200, Daniel Vetter wrote:
  On Tue, Feb 12, 2013 at 02:17:22PM +, Chris Wilson wrote:
   By exporting the ability to map user address and inserting PTEs
   representing their backing pages into the GTT, we can exploit UMA in 
   order
   to utilize normal application data as a texture source or even as a
   render target (depending upon the capabilities of the chipset). This has
   a number of uses, with zero-copy downloads to the GPU and efficient
   readback making the intermixed streaming of CPU and GPU operations
   fairly efficient. This ability has many widespread implications from
   faster rendering of client-side software rasterisers (chromium),
   mitigation of stalls due to read back (firefox) and to faster pipelining
   of texture data (such as pixel buffer objects in GL or data blobs in CL).
  
   v2: Compile with CONFIG_MMU_NOTIFIER
   v3: We can sleep while performing invalidate-range, which we can utilise
   to drop our page references prior to the kernel manipulating the vma
   (for either discard or cloning) and so protect normal users.
   v4: Only run the invalidate notifier if the range intercepts the bo.
   v5: Prevent userspace from attempting to GTT mmap non-page aligned 
   buffers
  
   Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 
  Quick bikeshed:
  - Still not really in favour of the in-page gtt offset handling ... I
still think that this is just a fancy mmap api, and it better reject
attempts to not map anything aligned to a full page outright.
 
  Strongly disagree.
 
 Ok, let's dig out the beaten arguments here ;-)
 - Imo the gtt_offset frobbery is a bit fragile, and getting this right
 in the face of ppgtt won't make it better. And yes the only reason we
 still have that field is that you've shot down any patch to remove it
 citing userptr here. So it's here already doesn't count ;-)
 - Userptr for i915 is an mmap interface, and that works on pages,
 lying to userspace isn't great.
 - I don't see why userspace can't do this themselves. I've seen that
 it makes things easier in SNA/X, but for a general purpose interface
 that argument doesn't cut too much.
 - I'm also a bit afraid that our code implicitly assumes that
 size/offset are always page-aligned and I kinda want to avoid that we
 have to audit for such issues from here on. We've blown up in the past
 assuming that size  0 already, I think we're set to blow up on this
 one here.
 
 In any case, if you really want to stick to this I want this to be
 explictly track in an obj-reloc_gtt_offset_adjustment or something
 which is very loudly yelling at people to make sure no one trips over
 it. Tracking the adjustment in a separate field, which would only ever
 be used in the reloc code would address all my concerns (safe for the
 api ugliness one).

Resurrecting this again.

I'm of two minds on the API here: on the one hand, it can be nicer for
the kernel to handle this stuff if it can be done easily, and save
userspace the trouble.  But OTOH, consistent with existing page based
interfaces makes things a little less jarring...

 
  - I915_USERPTR_UNSYNCHRONIZED ... eeek. That means that despite everyone
having mmu notifiers enabled in their distro config, you make sure sna
doesn't hit it. Imo not enough testing coverage ;-) Or this there
another reason behind this than mmu notifiers are too slow?
 
Generally I'm a bit sloppy with going root-only for legacy X stuff (like
scanline waits), but this here looks very much generally useful. So not
exemption-material imo.
 
  Strongly disagree. Most of my machines do not have mmu-notifiers and
  would still like to benefit from userptr and I see no reason why we need
  to force mmu-notifiers.
 
 Note that I didn't shout against the mmu_notifier-less support
 (although I'm honestly not too happy about it), what I don't like is
 the override bit disabling the mmu_notifiers even if we have them.
 Since that will mean that the code won't be tested through SNA, and so
 has a good chance of being buggy. Once mesa comes around and uses it,
 it'll nicely blow up. And one of the reason Jesse is breathing down my
 neck to merge this is other guys are interested in this at intel.

I think we'll need good test cases to cover things regardless.  And
yes, an mmu notifier version that doesn't require root would be more
generally useful than a root only interface (or are those items
unrelated at this point?).  Having a flag for root only operation for
clients that know what they're doing is fine though, IMO.

I think one of the nice use cases the Mesa guys have is to save an
extra copy in glReadPixels for certain types of objects, which means
non-root is a requirement.  And it seems like we could do a
glReadPixels extension that would end up being zero copy with 

Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2013-04-15 Thread Daniel Vetter
On Mon, Apr 8, 2013 at 11:48 PM, Chris Wilson ch...@chris-wilson.co.uk wrote:
 On Mon, Apr 08, 2013 at 09:24:58PM +0200, Daniel Vetter wrote:
 On Mon, Apr 8, 2013 at 7:40 PM, Chris Wilson ch...@chris-wilson.co.uk 
 wrote:
  On Mon, Apr 08, 2013 at 07:18:11PM +0200, Daniel Vetter wrote:
  On Tue, Feb 12, 2013 at 02:17:22PM +, Chris Wilson wrote:
   By exporting the ability to map user address and inserting PTEs
   representing their backing pages into the GTT, we can exploit UMA in 
   order
   to utilize normal application data as a texture source or even as a
   render target (depending upon the capabilities of the chipset). This has
   a number of uses, with zero-copy downloads to the GPU and efficient
   readback making the intermixed streaming of CPU and GPU operations
   fairly efficient. This ability has many widespread implications from
   faster rendering of client-side software rasterisers (chromium),
   mitigation of stalls due to read back (firefox) and to faster pipelining
   of texture data (such as pixel buffer objects in GL or data blobs in 
   CL).
  
   v2: Compile with CONFIG_MMU_NOTIFIER
   v3: We can sleep while performing invalidate-range, which we can utilise
   to drop our page references prior to the kernel manipulating the vma
   (for either discard or cloning) and so protect normal users.
   v4: Only run the invalidate notifier if the range intercepts the bo.
   v5: Prevent userspace from attempting to GTT mmap non-page aligned 
   buffers
  
   Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 
  Quick bikeshed:
  - Still not really in favour of the in-page gtt offset handling ... I
still think that this is just a fancy mmap api, and it better reject
attempts to not map anything aligned to a full page outright.
 
  Strongly disagree.

 Ok, let's dig out the beaten arguments here ;-)
 - Imo the gtt_offset frobbery is a bit fragile, and getting this right
 in the face of ppgtt won't make it better. And yes the only reason we
 still have that field is that you've shot down any patch to remove it
 citing userptr here. So it's here already doesn't count ;-)
 - Userptr for i915 is an mmap interface, and that works on pages,
 lying to userspace isn't great.

 No. Due to the nature of existing decades old userspace, I need to map
 byte ranges, so I do not view this as a simple equivalence to mmapping
 a bo.

See below, I need to roll this up from behind ...

 - I don't see why userspace can't do this themselves. I've seen that
 it makes things easier in SNA/X, but for a general purpose interface
 that argument doesn't cut too much.

 I have completely opposite viewpoint: A general purpose interface is not
 a page interface, and that this interface trades a small amount of
 kernel complexity (tracking the offset_in_page) so that userspace has a
 flexible interface that matches its requirements.

mmap is the general-purpose map something into cpu address space
thingy, and it works on pages, too. So still don't buy this, and Eric
seems to agree.

But anyway, if you're convinced I'm grumpily ok with those semantics.

 - I'm also a bit afraid that our code implicitly assumes that
 size/offset are always page-aligned and I kinda want to avoid that we
 have to audit for such issues from here on. We've blown up in the past
 assuming that size  0 already, I think we're set to blow up on this
 one here.

 Now that we can distinguish between size and num_pages, there is no
 longer a need for size to be page aligned (and is currently redundant).

 In any case, if you really want to stick to this I want this to be
 explictly track in an obj-reloc_gtt_offset_adjustment or something

 Sure, let's call it obj-gtt_offset:12;

As long as it's something with the high 20 bits zero I'm ok. Since
with ppgtt we'll soon have tons of different -gtt_offsets, so with
your current approach we either need to keep this offset at different
places (and I'd really really dislike to do that, see all the stuff
I've been fighting in modeset land). So I want this separate and
explicit.

 which is very loudly yelling at people to make sure no one trips over
 it. Tracking the adjustment in a separate field, which would only ever
 be used in the reloc code would address all my concerns (safe for the
 api ugliness one).

 And everywhere that deals in GTT addresses.

I've ignored the other cases since I don't see a use-case, but that's
a point to be address more below. So looking at other places we use
gtt address I see two major pieces
- pwrite/pread: Since we also need to deal in struct pages for cpu
access it's easiest to just add the static offset at the beginning.
Again, much clearer when this offset is explicit and stored someplace
separate.
- gtt mmaps: I have no idea how you plan to coax the mmap syscal into
cooperation here.

So addressing your point above that SNA has to deal with 25+ years of
userspace legacy: I really want this to be explicitly done in the
kernel in all the places we need it. At 

Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2013-04-08 Thread Daniel Vetter
On Tue, Feb 12, 2013 at 02:17:22PM +, Chris Wilson wrote:
 By exporting the ability to map user address and inserting PTEs
 representing their backing pages into the GTT, we can exploit UMA in order
 to utilize normal application data as a texture source or even as a
 render target (depending upon the capabilities of the chipset). This has
 a number of uses, with zero-copy downloads to the GPU and efficient
 readback making the intermixed streaming of CPU and GPU operations
 fairly efficient. This ability has many widespread implications from
 faster rendering of client-side software rasterisers (chromium),
 mitigation of stalls due to read back (firefox) and to faster pipelining
 of texture data (such as pixel buffer objects in GL or data blobs in CL).
 
 v2: Compile with CONFIG_MMU_NOTIFIER
 v3: We can sleep while performing invalidate-range, which we can utilise
 to drop our page references prior to the kernel manipulating the vma
 (for either discard or cloning) and so protect normal users.
 v4: Only run the invalidate notifier if the range intercepts the bo.
 v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
 
 Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk

Quick bikeshed:
- Still not really in favour of the in-page gtt offset handling ... I
  still think that this is just a fancy mmap api, and it better reject
  attempts to not map anything aligned to a full page outright.

- I915_USERPTR_UNSYNCHRONIZED ... eeek. That means that despite everyone
  having mmu notifiers enabled in their distro config, you make sure sna
  doesn't hit it. Imo not enough testing coverage ;-) Or this there
  another reason behind this than mmu notifiers are too slow?

  Generally I'm a bit sloppy with going root-only for legacy X stuff (like
  scanline waits), but this here looks very much generally useful. So not
  exemption-material imo.

- On a quick read I've seen some gtt mmap support remnants. This scares
  me, a roundabout njet! would appease. Though I think that should already
  happen with the checks we have to reject snoopable buffers?

- I think we should reject set_cacheing requests on usersptr objects.

- union drm_i915_gem_objects freaked me out shortly, until I've noticed
  that it's only used for our private slab. Maybe and explicit max in
  there? Also, this somewhat defeats that you've moved the userptr stuff
  out of the base class - this way we won't save any memory ...

- Basic igt to check the above api corner-cases return -EINVAL would be
  nice.

- I need to check for deadlocks around the mmu notifier handling. Iirc
  that thing takes all mm locks, and our own bo unref code can be called
  from all kinds of interesting places. Since each vma object also holds a
  ref onto a gem bo I suspect that we do have some fun all here ...

Cheers, Daniel

 ---
  drivers/gpu/drm/i915/Makefile  |1 +
  drivers/gpu/drm/i915/i915_dma.c|1 +
  drivers/gpu/drm/i915/i915_drv.h|   22 ++
  drivers/gpu/drm/i915/i915_gem.c|   31 ++-
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |7 +-
  drivers/gpu/drm/i915/i915_gem_userptr.c|  329 
 
  include/uapi/drm/i915_drm.h|   16 ++
  7 files changed, 393 insertions(+), 14 deletions(-)
  create mode 100644 drivers/gpu/drm/i915/i915_gem_userptr.c
 
 diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
 index 91f3ac6..42858f6 100644
 --- a/drivers/gpu/drm/i915/Makefile
 +++ b/drivers/gpu/drm/i915/Makefile
 @@ -14,6 +14,7 @@ i915-y := i915_drv.o i915_dma.o i915_irq.o \
 i915_gem_gtt.o \
 i915_gem_stolen.o \
 i915_gem_tiling.o \
 +   i915_gem_userptr.o \
 i915_sysfs.o \
 i915_trace_points.o \
 i915_ums.o \
 diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
 index 4fa6beb..9b1984c 100644
 --- a/drivers/gpu/drm/i915/i915_dma.c
 +++ b/drivers/gpu/drm/i915/i915_dma.c
 @@ -1883,6 +1883,7 @@ struct drm_ioctl_desc i915_ioctls[] = {
   DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, 
 i915_gem_context_create_ioctl, DRM_UNLOCKED),
   DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, 
 i915_gem_context_destroy_ioctl, DRM_UNLOCKED),
   DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_UNLOCKED),
 + DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, 
 DRM_UNLOCKED),
  };
  
  int i915_max_ioctl = DRM_ARRAY_SIZE(i915_ioctls);
 diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
 index 923dc0a..90070f4 100644
 --- a/drivers/gpu/drm/i915/i915_drv.h
 +++ b/drivers/gpu/drm/i915/i915_drv.h
 @@ -42,6 +42,7 @@
  #include linux/backlight.h
  #include linux/intel-iommu.h
  #include linux/kref.h
 +#include linux/mmu_notifier.h
  #include linux/pm_qos.h
  
  /* General customization:
 @@ -1076,6 +1077,7 @@ struct drm_i915_gem_object_ops {
*/
   int (*get_pages)(struct drm_i915_gem_object *);
   void 

Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2013-04-08 Thread Chris Wilson
On Mon, Apr 08, 2013 at 07:18:11PM +0200, Daniel Vetter wrote:
 On Tue, Feb 12, 2013 at 02:17:22PM +, Chris Wilson wrote:
  By exporting the ability to map user address and inserting PTEs
  representing their backing pages into the GTT, we can exploit UMA in order
  to utilize normal application data as a texture source or even as a
  render target (depending upon the capabilities of the chipset). This has
  a number of uses, with zero-copy downloads to the GPU and efficient
  readback making the intermixed streaming of CPU and GPU operations
  fairly efficient. This ability has many widespread implications from
  faster rendering of client-side software rasterisers (chromium),
  mitigation of stalls due to read back (firefox) and to faster pipelining
  of texture data (such as pixel buffer objects in GL or data blobs in CL).
  
  v2: Compile with CONFIG_MMU_NOTIFIER
  v3: We can sleep while performing invalidate-range, which we can utilise
  to drop our page references prior to the kernel manipulating the vma
  (for either discard or cloning) and so protect normal users.
  v4: Only run the invalidate notifier if the range intercepts the bo.
  v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
  
  Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 
 Quick bikeshed:
 - Still not really in favour of the in-page gtt offset handling ... I
   still think that this is just a fancy mmap api, and it better reject
   attempts to not map anything aligned to a full page outright.

Strongly disagree.
 
 - I915_USERPTR_UNSYNCHRONIZED ... eeek. That means that despite everyone
   having mmu notifiers enabled in their distro config, you make sure sna
   doesn't hit it. Imo not enough testing coverage ;-) Or this there
   another reason behind this than mmu notifiers are too slow?
 
   Generally I'm a bit sloppy with going root-only for legacy X stuff (like
   scanline waits), but this here looks very much generally useful. So not
   exemption-material imo.

Strongly disagree. Most of my machines do not have mmu-notifiers and
would still like to benefit from userptr and I see no reason why we need
to force mmu-notifiers.

 - On a quick read I've seen some gtt mmap support remnants. This scares
   me, a roundabout njet! would appease. Though I think that should already
   happen with the checks we have to reject snoopable buffers?

That's because there are platforms where it is theorectically possible
and whilst I have no use case for it, I wanted to make it work
nevertheless. I still think it is possible, but I could not see a way to
do so without completely replacing the drm vm code.
 
 - I think we should reject set_cacheing requests on usersptr objects.

I don't think that is strictly required, just like we should not limit
the user from using set_tiling. (Though the user is never actually going
to tell the kernel about such tiling...)
 
 - union drm_i915_gem_objects freaked me out shortly, until I've noticed
   that it's only used for our private slab. Maybe and explicit max in
   there? Also, this somewhat defeats that you've moved the userptr stuff
   out of the base class - this way we won't save any memory ...

The alternative is to use a union inside the object. Long ago, I had a
few more objects in there.
 
 - Basic igt to check the above api corner-cases return -EINVAL would be
   nice.

Been sitting around for ages, just waiting for the interface to be
agreed upon.
 
 - I need to check for deadlocks around the mmu notifier handling. Iirc
   that thing takes all mm locks, and our own bo unref code can be called
   from all kinds of interesting places. Since each vma object also holds a
   ref onto a gem bo I suspect that we do have some fun all here ...

The notifier itself takes no locks, so the locking is whatever state the
caller sets up. The current locking order is (mm, struct mutex) i.e. the
same ordering as used the notifier callbacks.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2013-04-08 Thread Daniel Vetter
On Mon, Apr 8, 2013 at 7:40 PM, Chris Wilson ch...@chris-wilson.co.uk wrote:
 On Mon, Apr 08, 2013 at 07:18:11PM +0200, Daniel Vetter wrote:
 On Tue, Feb 12, 2013 at 02:17:22PM +, Chris Wilson wrote:
  By exporting the ability to map user address and inserting PTEs
  representing their backing pages into the GTT, we can exploit UMA in order
  to utilize normal application data as a texture source or even as a
  render target (depending upon the capabilities of the chipset). This has
  a number of uses, with zero-copy downloads to the GPU and efficient
  readback making the intermixed streaming of CPU and GPU operations
  fairly efficient. This ability has many widespread implications from
  faster rendering of client-side software rasterisers (chromium),
  mitigation of stalls due to read back (firefox) and to faster pipelining
  of texture data (such as pixel buffer objects in GL or data blobs in CL).
 
  v2: Compile with CONFIG_MMU_NOTIFIER
  v3: We can sleep while performing invalidate-range, which we can utilise
  to drop our page references prior to the kernel manipulating the vma
  (for either discard or cloning) and so protect normal users.
  v4: Only run the invalidate notifier if the range intercepts the bo.
  v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
 
  Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk

 Quick bikeshed:
 - Still not really in favour of the in-page gtt offset handling ... I
   still think that this is just a fancy mmap api, and it better reject
   attempts to not map anything aligned to a full page outright.

 Strongly disagree.

Ok, let's dig out the beaten arguments here ;-)
- Imo the gtt_offset frobbery is a bit fragile, and getting this right
in the face of ppgtt won't make it better. And yes the only reason we
still have that field is that you've shot down any patch to remove it
citing userptr here. So it's here already doesn't count ;-)
- Userptr for i915 is an mmap interface, and that works on pages,
lying to userspace isn't great.
- I don't see why userspace can't do this themselves. I've seen that
it makes things easier in SNA/X, but for a general purpose interface
that argument doesn't cut too much.
- I'm also a bit afraid that our code implicitly assumes that
size/offset are always page-aligned and I kinda want to avoid that we
have to audit for such issues from here on. We've blown up in the past
assuming that size  0 already, I think we're set to blow up on this
one here.

In any case, if you really want to stick to this I want this to be
explictly track in an obj-reloc_gtt_offset_adjustment or something
which is very loudly yelling at people to make sure no one trips over
it. Tracking the adjustment in a separate field, which would only ever
be used in the reloc code would address all my concerns (safe for the
api ugliness one).

 - I915_USERPTR_UNSYNCHRONIZED ... eeek. That means that despite everyone
   having mmu notifiers enabled in their distro config, you make sure sna
   doesn't hit it. Imo not enough testing coverage ;-) Or this there
   another reason behind this than mmu notifiers are too slow?

   Generally I'm a bit sloppy with going root-only for legacy X stuff (like
   scanline waits), but this here looks very much generally useful. So not
   exemption-material imo.

 Strongly disagree. Most of my machines do not have mmu-notifiers and
 would still like to benefit from userptr and I see no reason why we need
 to force mmu-notifiers.

Note that I didn't shout against the mmu_notifier-less support
(although I'm honestly not too happy about it), what I don't like is
the override bit disabling the mmu_notifiers even if we have them.
Since that will mean that the code won't be tested through SNA, and so
has a good chance of being buggy. Once mesa comes around and uses it,
it'll nicely blow up. And one of the reason Jesse is breathing down my
neck to merge this is other guys are interested in this at intel.

 - On a quick read I've seen some gtt mmap support remnants. This scares
   me, a roundabout njet! would appease. Though I think that should already
   happen with the checks we have to reject snoopable buffers?

 That's because there are platforms where it is theorectically possible
 and whilst I have no use case for it, I wanted to make it work
 nevertheless. I still think it is possible, but I could not see a way to
 do so without completely replacing the drm vm code.

If I understand things correctly we should be able to block this
simply by refusing to create an mmap offset for a userptr backed
object.

 - I think we should reject set_cacheing requests on usersptr objects.

 I don't think that is strictly required, just like we should not limit
 the user from using set_tiling. (Though the user is never actually going
 to tell the kernel about such tiling...).

Yeah, I guess we could allow people to shot their foot off. Otoh it
adds another dimension to the userptr interface, which we need to make
sure it 

Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2013-04-08 Thread Chris Wilson
On Mon, Apr 08, 2013 at 09:24:58PM +0200, Daniel Vetter wrote:
 On Mon, Apr 8, 2013 at 7:40 PM, Chris Wilson ch...@chris-wilson.co.uk wrote:
  On Mon, Apr 08, 2013 at 07:18:11PM +0200, Daniel Vetter wrote:
  On Tue, Feb 12, 2013 at 02:17:22PM +, Chris Wilson wrote:
   By exporting the ability to map user address and inserting PTEs
   representing their backing pages into the GTT, we can exploit UMA in 
   order
   to utilize normal application data as a texture source or even as a
   render target (depending upon the capabilities of the chipset). This has
   a number of uses, with zero-copy downloads to the GPU and efficient
   readback making the intermixed streaming of CPU and GPU operations
   fairly efficient. This ability has many widespread implications from
   faster rendering of client-side software rasterisers (chromium),
   mitigation of stalls due to read back (firefox) and to faster pipelining
   of texture data (such as pixel buffer objects in GL or data blobs in CL).
  
   v2: Compile with CONFIG_MMU_NOTIFIER
   v3: We can sleep while performing invalidate-range, which we can utilise
   to drop our page references prior to the kernel manipulating the vma
   (for either discard or cloning) and so protect normal users.
   v4: Only run the invalidate notifier if the range intercepts the bo.
   v5: Prevent userspace from attempting to GTT mmap non-page aligned 
   buffers
  
   Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
 
  Quick bikeshed:
  - Still not really in favour of the in-page gtt offset handling ... I
still think that this is just a fancy mmap api, and it better reject
attempts to not map anything aligned to a full page outright.
 
  Strongly disagree.
 
 Ok, let's dig out the beaten arguments here ;-)
 - Imo the gtt_offset frobbery is a bit fragile, and getting this right
 in the face of ppgtt won't make it better. And yes the only reason we
 still have that field is that you've shot down any patch to remove it
 citing userptr here. So it's here already doesn't count ;-)
 - Userptr for i915 is an mmap interface, and that works on pages,
 lying to userspace isn't great.

No. Due to the nature of existing decades old userspace, I need to map
byte ranges, so I do not view this as a simple equivalence to mmapping
a bo.

 - I don't see why userspace can't do this themselves. I've seen that
 it makes things easier in SNA/X, but for a general purpose interface
 that argument doesn't cut too much.

I have completely opposite viewpoint: A general purpose interface is not
a page interface, and that this interface trades a small amount of
kernel complexity (tracking the offset_in_page) so that userspace has a
flexible interface that matches its requirements.

 - I'm also a bit afraid that our code implicitly assumes that
 size/offset are always page-aligned and I kinda want to avoid that we
 have to audit for such issues from here on. We've blown up in the past
 assuming that size  0 already, I think we're set to blow up on this
 one here.

Now that we can distinguish between size and num_pages, there is no
longer a need for size to be page aligned (and is currently redundant).
 
 In any case, if you really want to stick to this I want this to be
 explictly track in an obj-reloc_gtt_offset_adjustment or something

Sure, let's call it obj-gtt_offset:12;

 which is very loudly yelling at people to make sure no one trips over
 it. Tracking the adjustment in a separate field, which would only ever
 be used in the reloc code would address all my concerns (safe for the
 api ugliness one).

And everywhere that deals in GTT addresses.
 
  - I915_USERPTR_UNSYNCHRONIZED ... eeek. That means that despite everyone
having mmu notifiers enabled in their distro config, you make sure sna
doesn't hit it. Imo not enough testing coverage ;-) Or this there
another reason behind this than mmu notifiers are too slow?
 
Generally I'm a bit sloppy with going root-only for legacy X stuff (like
scanline waits), but this here looks very much generally useful. So not
exemption-material imo.
 
  Strongly disagree. Most of my machines do not have mmu-notifiers and
  would still like to benefit from userptr and I see no reason why we need
  to force mmu-notifiers.
 
 Note that I didn't shout against the mmu_notifier-less support
 (although I'm honestly not too happy about it), what I don't like is
 the override bit disabling the mmu_notifiers even if we have them.
 Since that will mean that the code won't be tested through SNA, and so
 has a good chance of being buggy. Once mesa comes around and uses it,
 it'll nicely blow up. And one of the reason Jesse is breathing down my
 neck to merge this is other guys are interested in this at intel.

I don't see how you can reconcile that viewpoint. In order for userspace
to be able to use userptr without mmu-notifiers it has to be very
careful about managing synchronisation of such pointers across vma
events. So userptr should only be 

Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2013-04-08 Thread Eric Anholt
Daniel Vetter dan...@ffwll.ch writes:

 On Mon, Apr 8, 2013 at 7:40 PM, Chris Wilson ch...@chris-wilson.co.uk wrote:
 On Mon, Apr 08, 2013 at 07:18:11PM +0200, Daniel Vetter wrote:
 On Tue, Feb 12, 2013 at 02:17:22PM +, Chris Wilson wrote:
  By exporting the ability to map user address and inserting PTEs
  representing their backing pages into the GTT, we can exploit UMA in order
  to utilize normal application data as a texture source or even as a
  render target (depending upon the capabilities of the chipset). This has
  a number of uses, with zero-copy downloads to the GPU and efficient
  readback making the intermixed streaming of CPU and GPU operations
  fairly efficient. This ability has many widespread implications from
  faster rendering of client-side software rasterisers (chromium),
  mitigation of stalls due to read back (firefox) and to faster pipelining
  of texture data (such as pixel buffer objects in GL or data blobs in CL).
 
  v2: Compile with CONFIG_MMU_NOTIFIER
  v3: We can sleep while performing invalidate-range, which we can utilise
  to drop our page references prior to the kernel manipulating the vma
  (for either discard or cloning) and so protect normal users.
  v4: Only run the invalidate notifier if the range intercepts the bo.
  v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers
 
  Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk

 Quick bikeshed:
 - Still not really in favour of the in-page gtt offset handling ... I
   still think that this is just a fancy mmap api, and it better reject
   attempts to not map anything aligned to a full page outright.

 Strongly disagree.

 Ok, let's dig out the beaten arguments here ;-)
 - Imo the gtt_offset frobbery is a bit fragile, and getting this right
 in the face of ppgtt won't make it better. And yes the only reason we
 still have that field is that you've shot down any patch to remove it
 citing userptr here. So it's here already doesn't count ;-)

Agreed -- given that I need to look at byte offsets for alignment issues
on basically all my usages of memory, having my data have part of its
intra-page offset hidden in the kernel at creation time would be bad for
Mesa.

Access to data is controlled at a page level, so I think this kernel
interface should act at a page level.

 - I915_USERPTR_UNSYNCHRONIZED ... eeek. That means that despite everyone
   having mmu notifiers enabled in their distro config, you make sure sna
   doesn't hit it. Imo not enough testing coverage ;-) Or this there
   another reason behind this than mmu notifiers are too slow?

   Generally I'm a bit sloppy with going root-only for legacy X stuff (like
   scanline waits), but this here looks very much generally useful. So not
   exemption-material imo.

 Strongly disagree. Most of my machines do not have mmu-notifiers and
 would still like to benefit from userptr and I see no reason why we need
 to force mmu-notifiers.

 Note that I didn't shout against the mmu_notifier-less support
 (although I'm honestly not too happy about it), what I don't like is
 the override bit disabling the mmu_notifiers even if we have them.
 Since that will mean that the code won't be tested through SNA, and so
 has a good chance of being buggy. Once mesa comes around and uses it,
 it'll nicely blow up. And one of the reason Jesse is breathing down my
 neck to merge this is other guys are interested in this at intel.

I hate root-only interfaces.  It helps lock in root-only X, and means
that X gets special treatment compared to the 3D driver.


pgpYT0aKVgn5K.pgp
Description: PGP signature
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2013-02-13 Thread Reese, Armin C
Thanks for the patch, Chris.

This is exactly what we were looking for to replace the VMAP feature you 
submitted a couple of years ago.  We need a method to quickly move data from 
user mode allocations into video memory (by mapping backing pages into the GTT).

The Interface appears simple enough, yet fulfills our needs.

I have one question ... what exactly does the I915_USERPTR_UNSYNCHRONIZED flag 
do?

Thanks,
Armin

-Original Message-
From: intel-gfx-bounces+armin.c.reese=intel@lists.freedesktop.org 
[mailto:intel-gfx-bounces+armin.c.reese=intel@lists.freedesktop.org] On 
Behalf Of Chris Wilson
Sent: Tuesday, February 12, 2013 6:17 AM
To: intel-gfx@lists.freedesktop.org
Subject: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into 
video memory (userptr) ioctl

By exporting the ability to map user address and inserting PTEs representing 
their backing pages into the GTT, we can exploit UMA in order to utilize normal 
application data as a texture source or even as a render target (depending upon 
the capabilities of the chipset). This has a number of uses, with zero-copy 
downloads to the GPU and efficient readback making the intermixed streaming of 
CPU and GPU operations fairly efficient. This ability has many widespread 
implications from faster rendering of client-side software rasterisers 
(chromium), mitigation of stalls due to read back (firefox) and to faster 
pipelining of texture data (such as pixel buffer objects in GL or data blobs in 
CL).

v2: Compile with CONFIG_MMU_NOTIFIER
v3: We can sleep while performing invalidate-range, which we can utilise to 
drop our page references prior to the kernel manipulating the vma (for either 
discard or cloning) and so protect normal users.
v4: Only run the invalidate notifier if the range intercepts the bo.
v5: Prevent userspace from attempting to GTT mmap non-page aligned buffers

Signed-off-by: Chris Wilson ch...@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/Makefile  |1 +
 drivers/gpu/drm/i915/i915_dma.c|1 +
 drivers/gpu/drm/i915/i915_drv.h|   22 ++
 drivers/gpu/drm/i915/i915_gem.c|   31 ++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |7 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c|  329 
 include/uapi/drm/i915_drm.h|   16 ++
 7 files changed, 393 insertions(+), 14 deletions(-)  create mode 100644 
drivers/gpu/drm/i915/i915_gem_userptr.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile 
index 91f3ac6..42858f6 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -14,6 +14,7 @@ i915-y := i915_drv.o i915_dma.o i915_irq.o \
  i915_gem_gtt.o \
  i915_gem_stolen.o \
  i915_gem_tiling.o \
+ i915_gem_userptr.o \
  i915_sysfs.o \
  i915_trace_points.o \
  i915_ums.o \
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c 
index 4fa6beb..9b1984c 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1883,6 +1883,7 @@ struct drm_ioctl_desc i915_ioctls[] = {
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, 
i915_gem_context_create_ioctl, DRM_UNLOCKED),
DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, 
i915_gem_context_destroy_ioctl, DRM_UNLOCKED),
DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_UNLOCKED),
+   DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, 
+DRM_UNLOCKED),
 };
 
 int i915_max_ioctl = DRM_ARRAY_SIZE(i915_ioctls); diff --git 
a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 
923dc0a..90070f4 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -42,6 +42,7 @@
 #include linux/backlight.h
 #include linux/intel-iommu.h
 #include linux/kref.h
+#include linux/mmu_notifier.h
 #include linux/pm_qos.h
 
 /* General customization:
@@ -1076,6 +1077,7 @@ struct drm_i915_gem_object_ops {
 */
int (*get_pages)(struct drm_i915_gem_object *);
void (*put_pages)(struct drm_i915_gem_object *);
+   void (*release)(struct drm_i915_gem_object *);
 };
 
 struct drm_i915_gem_object {
@@ -1222,6 +1224,23 @@ struct drm_i915_gem_object {  };  #define 
to_gem_object(obj) (((struct drm_i915_gem_object *)(obj))-base)
 
+struct i915_gem_userptr_object {
+   struct drm_i915_gem_object gem;
+   uintptr_t user_ptr;
+   size_t user_size;
+   int read_only;
+
+   struct mm_struct *mm;
+#if defined(CONFIG_MMU_NOTIFIER)
+   struct mmu_notifier mn;
+#endif
+};
+
+union drm_i915_gem_objects {
+   struct drm_i915_gem_object base;
+   struct i915_gem_userptr_object userptr; };
+
 #define to_intel_bo(x) container_of(x, struct drm_i915_gem_object, base)
 
 /**
@@ -1501,6 +1520,8 @@ int i915_gem_entervt_ioctl(struct drm_device *dev, void 
*data,
   struct drm_file *file_priv);
 int i915_gem_leavevt_ioctl(struct 

Re: [Intel-gfx] [PATCH] drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl

2013-02-13 Thread Chris Wilson
On Wed, Feb 13, 2013 at 10:24:59PM +, Reese, Armin C wrote:
 Thanks for the patch, Chris.
 
 This is exactly what we were looking for to replace the VMAP feature you 
 submitted a couple of years ago.  We need a method to quickly move data from 
 user mode allocations into video memory (by mapping backing pages into the 
 GTT).
 
 The Interface appears simple enough, yet fulfills our needs.
 
 I have one question ... what exactly does the I915_USERPTR_UNSYNCHRONIZED 
 flag do?

Not so loud, Daniel might notice. It is a flag for the user to ask for a
get_user_pages mapping that ignored the synchronization issues
associated with cloning vma. That is the caller fully understood that to
fork and share those pages would require userspace synchronization and
that the caller was also responsible for ensuring that the mapping
was destroyed first before the vma was release - or else the vma could be
reused by the process/shared-memory whilst it was stil active.

The intention was to workaround the limitations and lack-of mmu-notifier
in many cases. I made it a privileged flag because normal users aren't
really meant to be able to cause so much interprocess havoc. Though
really having the GPU also access that memory is not so much more
dangerous than sharing that memory with another process, so I am open to
the suggestion that maybe we should allow normal users to shoot
themselves in the foot as well.

Also note that the current interface prohibits GTT mmaping of userptr bo
on LLC machines if the pointer is not page-aligned. I still think I can
lift that restriction, but my initial attempts were futile, so I'm
ignoring that corner case unless there is a use-case.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx