Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-09-15 Thread Chris Wilson
On Fri, Jul 31, 2015 at 06:46:20PM +0530, Goel, Akash wrote:
> Since the suggestion is to reuse the gtt_pwrite_fast function only
> for non-shmem backed objects, can we also relax the Tiling
> constraint here, apart from removing the PIN_NONBLOCK flag. As
> anyways there is a put_fence already being done in gtt_pwrite_fast
> function.

The tiling constraint is due to a hardware limitation (the effect of
swizzling); you cannot simply drop it.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-09-15 Thread Chris Wilson
On Tue, Sep 15, 2015 at 02:03:27PM +0530, ankitprasad.r.sha...@intel.com wrote:
> @@ -1090,17 +1184,17 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void 
> *data,
>   goto out;
>   }
>  
> - /* prime objects have no backing filp to GEM pread/pwrite
> -  * pages from.
> -  */
> - if (!obj->base.filp) {
> - ret = -EINVAL;
> - goto out;
> - }
> -
>   trace_i915_gem_object_pwrite(obj, args->offset, args->size);
>  
>   ret = -EFAULT;
> +
> + /* pwrite for non shmem backed objects */
> + if (!obj->base.filp) {
> + ret = i915_gem_gtt_pread_pwrite(dev, obj, args->size,
> + args->offset, args->data_ptr,
> + true);
> + goto out;
> + }

There already exists a GTT write path, along with a more correct
description of its limitations.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-09-15 Thread Ankitprasad Sharma
On Tue, 2015-09-15 at 10:54 +0100, Chris Wilson wrote:
> On Tue, Sep 15, 2015 at 02:03:27PM +0530, ankitprasad.r.sha...@intel.com 
> wrote:
> > @@ -1090,17 +1184,17 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void 
> > *data,
> > goto out;
> > }
> >  
> > -   /* prime objects have no backing filp to GEM pread/pwrite
> > -* pages from.
> > -*/
> > -   if (!obj->base.filp) {
> > -   ret = -EINVAL;
> > -   goto out;
> > -   }
> > -
> > trace_i915_gem_object_pwrite(obj, args->offset, args->size);
> >  
> > ret = -EFAULT;
> > +
> > +   /* pwrite for non shmem backed objects */
> > +   if (!obj->base.filp) {
> > +   ret = i915_gem_gtt_pread_pwrite(dev, obj, args->size,
> > +   args->offset, args->data_ptr,
> > +   true);
> > +   goto out;
> > +   }
> 
> There already exists a GTT write path, along with a more correct
> description of its limitations.

Then it would look something like this, making i915_gem_gtt_pwrite_fast
handle page faults for non-shmem backed objects:

@@ -831,10 +925,16 @@ i915_gem_gtt_pwrite_fast(struct drm_device *dev,
 	 * retry in the slow path.
 	 */
-	if (fast_user_write(dev_priv->gtt.mappable, page_base,
+	if (obj->base.filp &&
+	    fast_user_write(dev_priv->gtt.mappable, page_base,
 			    page_offset, user_data, page_length)) {
 		ret = -EFAULT;
 		goto out_flush;
+	} else if (slow_user_access(dev_priv->gtt.mappable,
+				    page_base, page_offset,
+				    user_data, page_length, true)) {
+		ret = -EFAULT;
+		goto out_flush;
 	}
 
Thanks
-Ankit




Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-09-10 Thread Ankitprasad Sharma
On Fri, 2015-07-31 at 18:46 +0530, Goel, Akash wrote:
> 
> On 7/22/2015 8:09 PM, Chris Wilson wrote:
> > On Wed, Jul 22, 2015 at 07:21:49PM +0530, ankitprasad.r.sha...@intel.com 
> > wrote:
> >>   static int
> >>   i915_gem_shmem_pread(struct drm_device *dev,
> >> struct drm_i915_gem_object *obj,
> >> @@ -754,17 +850,20 @@ i915_gem_pread_ioctl(struct drm_device *dev, void 
> >> *data,
> >>goto out;
> >>}
> >>
> >> -  /* prime objects have no backing filp to GEM pread/pwrite
> >> -   * pages from.
> >> -   */
> >> -  if (!obj->base.filp) {
> >> -  ret = -EINVAL;
> >> -  goto out;
> >> -  }
> >> -
> >>trace_i915_gem_object_pread(obj, args->offset, args->size);
> >>
> >> -  ret = i915_gem_shmem_pread(dev, obj, args, file);
> >> +  /* pread for non shmem backed objects */
> >> +  if (!obj->base.filp) {
> >> +  if (obj->tiling_mode == I915_TILING_NONE)
> >
> > pread/pwrite is defined as a cpu linear, the restriction upon tiling is
> > a simplification of handling swizzling.
> >
> >> +  ret = i915_gem_gtt_pread_pwrite(dev, obj, args->size,
> >> +  args->offset,
> >> +  args->data_ptr,
> >> +  false);
> >> +  else
> >> +  ret = -EINVAL;
> >> +  } else {
> >> +  ret = i915_gem_shmem_pread(dev, obj, args, file);
> >> +  }
> >>
> >>   out:
> >>	drm_gem_object_unreference(&obj->base);
> >> @@ -1105,17 +1204,22 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void 
> >> *data,
> >>goto out;
> >>}
> >>
> >> -  /* prime objects have no backing filp to GEM pread/pwrite
> >> -   * pages from.
> >> -   */
> >> -  if (!obj->base.filp) {
> >> -  ret = -EINVAL;
> >> -  goto out;
> >> -  }
> >> -
> >>trace_i915_gem_object_pwrite(obj, args->offset, args->size);
> >>
> >>ret = -EFAULT;
> >> +
> >> +  /* pwrite for non shmem backed objects */
> >> +  if (!obj->base.filp) {
> >> +  if (obj->tiling_mode == I915_TILING_NONE)
> >> +  ret = i915_gem_gtt_pread_pwrite(dev, obj, args->size,
> >> +  args->offset,
> >> +  args->data_ptr,
> >> +  true);
> >> +  else
> >> +  ret = -EINVAL;
> >> +
> >> +  goto out;
> >
> > The fast pwrite code always works for obj->base.filp == NULL. To easily
> > make it generic and handle the cases where we cannot fall back to shmem,
> > undo the PIN_NONBLOCK.
> >
> > Then you just want something like
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c 
> > b/drivers/gpu/drm/i915/i915_gem.c
> > index d3016f37cd4d..f2284a27dd6d 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1114,8 +1114,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void 
> > *data,
> >   * perspective, requiring manual detiling by the client.
> >   */
> >  if (obj->tiling_mode == I915_TILING_NONE &&
> 
> Since the suggestion is to reuse the gtt_pwrite_fast function only for 
> non-shmem backed objects, can we also relax the Tiling constraint here, 
> apart from removing the PIN_NONBLOCK flag. As anyways there is a 
> put_fence already being done in gtt_pwrite_fast function.
> 
> Also currently the gtt_pwrite_fast function is non-tolerant to faults, 
> incurred while accessing the User space buffer, so can that also be 
> relaxed (at least for the non-shmem backed objects) ??

Even if we reuse the gtt_pwrite_fast we will have to handle page faults
for non-shmem backed objects, which would defeat the purpose of
gtt_pwrite_fast. The gtt_pread_pwrite function will still be around for
pread purposes.

Also, I think it should be OK to relax the tiling constraint as well, as
we put_fence every time we pread/pwrite from/to a non-shmem-backed
object here.

Thanks,
Ankit



Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-07-31 Thread Goel, Akash



On 7/22/2015 8:09 PM, Chris Wilson wrote:

On Wed, Jul 22, 2015 at 07:21:49PM +0530, ankitprasad.r.sha...@intel.com wrote:

  static int
  i915_gem_shmem_pread(struct drm_device *dev,
 struct drm_i915_gem_object *obj,
@@ -754,17 +850,20 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
goto out;
}

-   /* prime objects have no backing filp to GEM pread/pwrite
-* pages from.
-*/
-   if (!obj->base.filp) {
-   ret = -EINVAL;
-   goto out;
-   }
-
trace_i915_gem_object_pread(obj, args->offset, args->size);

-   ret = i915_gem_shmem_pread(dev, obj, args, file);
+   /* pread for non shmem backed objects */
+   if (!obj->base.filp) {
+   if (obj->tiling_mode == I915_TILING_NONE)


pread/pwrite is defined as a cpu linear, the restriction upon tiling is
a simplification of handling swizzling.


+   ret = i915_gem_gtt_pread_pwrite(dev, obj, args->size,
+   args->offset,
+   args->data_ptr,
+   false);
+   else
+   ret = -EINVAL;
+   } else {
+   ret = i915_gem_shmem_pread(dev, obj, args, file);
+   }

  out:
drm_gem_object_unreference(&obj->base);
@@ -1105,17 +1204,22 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void 
*data,
goto out;
}

-   /* prime objects have no backing filp to GEM pread/pwrite
-* pages from.
-*/
-   if (!obj->base.filp) {
-   ret = -EINVAL;
-   goto out;
-   }
-
trace_i915_gem_object_pwrite(obj, args->offset, args->size);

ret = -EFAULT;
+
+   /* pwrite for non shmem backed objects */
+   if (!obj->base.filp) {
+   if (obj->tiling_mode == I915_TILING_NONE)
+   ret = i915_gem_gtt_pread_pwrite(dev, obj, args->size,
+   args->offset,
+   args->data_ptr,
+   true);
+   else
+   ret = -EINVAL;
+
+   goto out;


The fast pwrite code always works for obj->base.filp == NULL. To easily
make it generic and handle the cases where we cannot fall back to shmem,
undo the PIN_NONBLOCK.

Then you just want something like
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d3016f37cd4d..f2284a27dd6d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1114,8 +1114,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
  * perspective, requiring manual detiling by the client.
  */
 if (obj->tiling_mode == I915_TILING_NONE &&


Since the suggestion is to reuse the gtt_pwrite_fast function only for 
non-shmem backed objects, can we also relax the Tiling constraint here, 
apart from removing the PIN_NONBLOCK flag. As anyways there is a 
put_fence already being done in gtt_pwrite_fast function.


Also currently the gtt_pwrite_fast function is non-tolerant to faults, 
incurred while accessing the User space buffer, so can that also be 
relaxed (at least for the non-shmem backed objects) ??


Best regards
Akash


-   obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
-   cpu_write_needs_clflush(obj)) {
+   (obj->base.filp == NULL ||
+    (obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
+     cpu_write_needs_clflush(obj)))) {
 ret = i915_gem_gtt_pwrite_fast(dev_priv, obj, args, file);
 /* Note that the gtt paths might fail with non-page-backed user
  * pointers (e.g. gtt mappings when moving data between
@@ -1125,7 +1126,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 if (ret == -EFAULT || ret == -ENOSPC) {
 if (obj->phys_handle)
 ret = i915_gem_phys_pwrite(obj, args, file);
-   else
+   else if (obj->base.filp)
 ret = i915_gem_shmem_pwrite(dev, obj, args, file);
 }


to enable pwrite access to stolen.
-Chris




Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-07-22 Thread Tvrtko Ursulin


Hi,

On 07/22/2015 02:51 PM, ankitprasad.r.sha...@intel.com wrote:

From: Ankitprasad Sharma ankitprasad.r.sha...@intel.com

This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface.
This will cover prime objects as well as stolen memory backed objects
but for userptr objects it is still forbidden.

v2: Drop locks around slow_user_access, prefault the pages before
access (Chris)

v3: Rebased to the latest drm-intel-nightly (Ankit)

v4: Moved page base & offset calculations outside the copy loop,
corrected data types for size and offset variables, corrected if-else
braces format (Tvrtko/kerneldocs)

Testcase: igt/gem_stolen

Signed-off-by: Ankitprasad Sharma ankitprasad.r.sha...@intel.com
---
  drivers/gpu/drm/i915/i915_gem.c | 138 +++-
  1 file changed, 121 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9e7e182..4321178 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -631,6 +631,102 @@ shmem_pread_slow(struct page *page, int 
shmem_page_offset, int page_length,
return ret ? - EFAULT : 0;
  }

+static inline uint64_t
+slow_user_access(struct io_mapping *mapping,
+uint64_t page_base, int page_offset,
+char __user *user_data,
+int length, bool pwrite)
+{
+   void __iomem *vaddr_inatomic;
+   void *vaddr;
+   uint64_t unwritten;
+
+   vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
+   /* We can use the cpu mem copy function because this is X86. */
+   vaddr = (void __force *)vaddr_inatomic + page_offset;
+   if (pwrite)
+   unwritten = __copy_from_user(vaddr, user_data, length);
+   else
+   unwritten = __copy_to_user(user_data, vaddr, length);
+
+   io_mapping_unmap(vaddr_inatomic);
+   return unwritten;
+}
+
+static int
+i915_gem_gtt_pread_pwrite(struct drm_device *dev,
+ struct drm_i915_gem_object *obj, uint64_t size,
+ uint64_t data_offset, uint64_t data_ptr, bool pwrite)
+{
+   struct drm_i915_private *dev_priv = dev->dev_private;
+   char __user *user_data;
+   uint64_t remain;
+   uint64_t offset, page_base;
+   int page_offset, page_length, ret = 0;
+
+   ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
+   if (ret)
+   goto out;
+
+   ret = i915_gem_object_set_to_gtt_domain(obj, pwrite);
+   if (ret)
+   goto out_unpin;
+
+   ret = i915_gem_object_put_fence(obj);
+   if (ret)
+   goto out_unpin;
+
+   user_data = to_user_ptr(data_ptr);
+   remain = size;
+   offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
+
+   if (pwrite)
+   intel_fb_obj_invalidate(obj, ORIGIN_GTT);
+
+   mutex_unlock(&dev->struct_mutex);
+   if (!pwrite && likely(!i915.prefault_disable))
+   ret = fault_in_multipages_writeable(user_data, remain);
+
+   /*
+* page_offset = offset within page
+* page_base = page offset within aperture
+*/
+   page_offset = offset_in_page(offset);
+   page_base = offset & PAGE_MASK;
+
+   while (remain > 0) {
+   /* page_length = bytes to copy for this page */
+   page_length = remain;
+   if ((page_offset + remain) > PAGE_SIZE)
+   page_length = PAGE_SIZE - page_offset;
+
+   /* This is a slow read/write as it tries to read from
+* and write to user memory which may result into page
+* faults
+*/
+   ret = slow_user_access(dev_priv->gtt.mappable, page_base,
+  page_offset, user_data,
+  page_length, pwrite);
+
+   if (ret) {
+   ret = -EFAULT;
+   break;
+   }
+
+   remain -= page_length;
+   user_data += page_length;
+   page_base += page_length;
+   page_offset = 0;
+   }
+
+   mutex_lock(&dev->struct_mutex);


My objection here was that we are re-acquiring the mutex in
non-interruptible mode, while the caller had it interruptible. I am not
100% sure what the conclusion back then was.


But also, why is it even necessary to drop the mutex here?

Regards,

Tvrtko


Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-07-22 Thread Daniel Vetter
On Wed, Jul 22, 2015 at 04:46:00PM +0100, Tvrtko Ursulin wrote:
 But also, why is it even necessary to drop the mutex here?

Locking inversion with our own pagefault handler. Happens since at least
mesa can return gtt mmaps to gl clients, which can then in turn try to
upload that with pwrite somewhere else. Also userspace could be just plain
nasty and try to deadlock the kernel ;-)

Dropping the struct_mutex is SOP for the copy_from/to_user slowpath.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-07-22 Thread Tvrtko Ursulin


On 07/22/2015 05:05 PM, Daniel Vetter wrote:

On Wed, Jul 22, 2015 at 04:46:00PM +0100, Tvrtko Ursulin wrote:

But also, why is it even necessary to drop the mutex here?


Locking inversion with our own pagefault handler. Happens since at least
mesa can return gtt mmaps to gl clients, which can then in turn try to
upload that with pwrite somewhere else. Also userspace could be just plain
nasty and try to deadlock the kernel ;-)


Yeah blame userspace. ;D


Dropping the struct_mutex is SOP for the copy_from/to_user slowpath.


I suspected something like that but did not feel like thinking too much. 
:) When encountering unusual things like dropping and re-acquiring locks 
I expect to see comments explaining it.


Regards,

Tvrtko


Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-07-22 Thread Chris Wilson
On Wed, Jul 22, 2015 at 07:21:49PM +0530, ankitprasad.r.sha...@intel.com wrote:
  static int
  i915_gem_shmem_pread(struct drm_device *dev,
struct drm_i915_gem_object *obj,
 @@ -754,17 +850,20 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
   goto out;
   }
  
 - /* prime objects have no backing filp to GEM pread/pwrite
 -  * pages from.
 -  */
 - if (!obj->base.filp) {
 - ret = -EINVAL;
 - goto out;
 - }
 -
   trace_i915_gem_object_pread(obj, args->offset, args->size);
  
 - ret = i915_gem_shmem_pread(dev, obj, args, file);
 + /* pread for non shmem backed objects */
 + if (!obj->base.filp) {
 + if (obj->tiling_mode == I915_TILING_NONE)

pread/pwrite is defined as a cpu linear, the restriction upon tiling is
a simplification of handling swizzling.

 + ret = i915_gem_gtt_pread_pwrite(dev, obj, args->size,
 + args->offset,
 + args->data_ptr,
 + false);
 + else
 + ret = -EINVAL;
 + } else {
 + ret = i915_gem_shmem_pread(dev, obj, args, file);
 + }
  
  out:
   drm_gem_object_unreference(&obj->base);
 @@ -1105,17 +1204,22 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void 
 *data,
   goto out;
   }
  
 - /* prime objects have no backing filp to GEM pread/pwrite
 -  * pages from.
 -  */
 - if (!obj->base.filp) {
 - ret = -EINVAL;
 - goto out;
 - }
 -
   trace_i915_gem_object_pwrite(obj, args->offset, args->size);
  
   ret = -EFAULT;
 +
 + /* pwrite for non shmem backed objects */
 + if (!obj->base.filp) {
 + if (obj->tiling_mode == I915_TILING_NONE)
 + ret = i915_gem_gtt_pread_pwrite(dev, obj, args->size,
 + args->offset,
 + args->data_ptr,
 + true);
 + else
 + ret = -EINVAL;
 +
 + goto out;

The fast pwrite code always works for obj->base.filp == NULL. To easily
make it generic and handle the cases where we cannot fall back to shmem,
undo the PIN_NONBLOCK.

Then you just want something like
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d3016f37cd4d..f2284a27dd6d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1114,8 +1114,9 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 * perspective, requiring manual detiling by the client.
 */
if (obj->tiling_mode == I915_TILING_NONE &&
-   obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
-   cpu_write_needs_clflush(obj)) {
+   (obj->base.filp == NULL ||
+    (obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
+     cpu_write_needs_clflush(obj)))) {
ret = i915_gem_gtt_pwrite_fast(dev_priv, obj, args, file);
/* Note that the gtt paths might fail with non-page-backed user
 * pointers (e.g. gtt mappings when moving data between
@@ -1125,7 +1126,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
if (ret == -EFAULT || ret == -ENOSPC) {
if (obj-phys_handle)
ret = i915_gem_phys_pwrite(obj, args, file);
-   else
+   else if (obj->base.filp)
ret = i915_gem_shmem_pwrite(dev, obj, args, file);
}
 

to enable pwrite access to stolen.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-07-02 Thread Chris Wilson
On Thu, Jul 02, 2015 at 11:42:44AM +0100, Tvrtko Ursulin wrote:
 I am not super familiar with low level mapping business. But it
 looks correct to me. Just one question would be if there are any
 downsides to WC mapping? If in the read case it would be any
 advantage not to ask for WC?

Read-side WC behaves as UC. It's bad, but since we have no other
coherent access we have no choice. We rely on userspace not to do this,
and our powers of persuasion that copying to a coherent buffer first
using the GPU and then using the CPU for readback is about 8x faster
(incl. the sync overhead) then having to use UC.
 
 +static int
 +i915_gem_gtt_pread_pwrite(struct drm_device *dev,
 +  struct drm_i915_gem_object *obj, uint64_t size,
 +  uint64_t data_offset, uint64_t data_ptr, bool write)
 +{
 +struct drm_i915_private *dev_priv = dev->dev_private;
 +char __user *user_data;
 +ssize_t remain;
 +loff_t offset, page_base;
 +int page_offset, page_length, ret = 0;
 +
 +ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
 +if (ret)
 +goto out;
 +
 +ret = i915_gem_object_set_to_gtt_domain(obj, write);
 +if (ret)
 +goto out_unpin;
 +
 +ret = i915_gem_object_put_fence(obj);
 +if (ret)
 +goto out_unpin;
 +
 +user_data = to_user_ptr(data_ptr);
 +remain = size;
 
 Strictly speaking uint64_t can overflow ssize_t, compiler does not
 care in this case?

compiler is too quiet at times. Killing ssize_t and even loff_t here
makes sense.

 +offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
 +
 +if (write)
 +intel_fb_obj_invalidate(obj, ORIGIN_GTT);

Smells wrong, better off combining the invalidate with the right flags
in the caller.

 +mutex_unlock(&dev->struct_mutex);
 +if (!write && likely(!i915.prefault_disable))
 +ret = fault_in_multipages_writeable(user_data, remain);
 
 A bit confusing read/write inversion. :) But correct. Just wondering
 if it would make sense to invert the boolean at least in
 slow_user_access. Or in fact just call it pwrite instead of write to
 reflect pwrite == true is _reading_ from user memory.
 
 +while (remain > 0) {
 +/* Operation in this page
 + *
 + * page_base = page offset within aperture
 + * page_offset = offset within page
 + * page_length = bytes to copy for this page
 + */
 +page_base = offset & PAGE_MASK;
 +page_offset = offset_in_page(offset);
 +page_length = remain;
 +if ((page_offset + remain) > PAGE_SIZE)
 +page_length = PAGE_SIZE - page_offset;
 
 It would save some arithmetics and branching to pull out the first
 potentially unaligned copy out of the loop and then page_offset
 would be zero for 2nd page onwards.

Honestly, keeping the loop simple is preferable :) What I do locally is:

page_offset = offset_in_page(offset);
offset &= PAGE_MASK;
while (remain > 0) {
len = min(remain, PAGE_SIZE - page_offset);

user_data += len;
remain -= len;
offset += PAGE_SIZE;
page_offset = 0;
}

 
 +/* This is a slow read/write as it tries to read from
 + * and write to user memory which may result into page
 + * faults
 + */
 +ret = slow_user_access(dev_priv->gtt.mappable, page_base,
 +   page_offset, user_data,
 +   page_length, write);
 +
 +if (ret) {
 +ret = -EINVAL;

Would be better to return the right errno from slow_user_access, or if it
is a bool, treat it as a bool!

 +break;
 +}
 +
 +remain -= page_length;
 +user_data += page_length;
 +offset += page_length;
 +}
 +
 +mutex_lock(&dev->struct_mutex);
 
 Caller had it interruptible.

On second access, it is preferable not to risk interruption -
especially when catching up with state to return to userspace.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-07-02 Thread Tvrtko Ursulin


On 07/01/2015 10:25 AM, ankitprasad.r.sha...@intel.com wrote:

From: Ankitprasad Sharma ankitprasad.r.sha...@intel.com

This patch adds support for extending the pread/pwrite functionality
for objects not backed by shmem. The access will be made through
gtt interface.
This will cover prime objects as well as stolen memory backed objects
but for userptr objects it is still forbidden.

v2: drop locks around slow_user_access, prefault the pages before
access (Chris)

v3: Rebased to the latest drm-intel-nightly (Ankit)

testcase: igt/gem_stolen



Signed-off-by: Ankitprasad Sharma ankitprasad.r.sha...@intel.com
---
  drivers/gpu/drm/i915/i915_gem.c | 137 +++-
  1 file changed, 120 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4acf331..4be6eb4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -629,6 +629,102 @@ shmem_pread_slow(struct page *page, int 
shmem_page_offset, int page_length,
return ret ? - EFAULT : 0;
  }

+static inline int
+slow_user_access(struct io_mapping *mapping,
+loff_t page_base, int page_offset,
+char __user *user_data,
+int length, bool write)
+{
+   void __iomem *vaddr_inatomic;
+   void *vaddr;
+   unsigned long unwritten;
+
+   vaddr_inatomic = io_mapping_map_wc(mapping, page_base);
+   /* We can use the cpu mem copy function because this is X86. */
+   vaddr = (void __force *)vaddr_inatomic + page_offset;
+   if (write)
+   unwritten = __copy_from_user(vaddr, user_data, length);
+   else
+   unwritten = __copy_to_user(user_data, vaddr, length);
+
+   io_mapping_unmap(vaddr_inatomic);
+   return unwritten;
+}


I am not super familiar with low level mapping business. But it looks 
correct to me. Just one question would be if there are any downsides to 
WC mapping? If in the read case it would be any advantage not to ask for WC?



+static int
+i915_gem_gtt_pread_pwrite(struct drm_device *dev,
+ struct drm_i915_gem_object *obj, uint64_t size,
+ uint64_t data_offset, uint64_t data_ptr, bool write)
+{
+   struct drm_i915_private *dev_priv = dev->dev_private;
+   char __user *user_data;
+   ssize_t remain;
+   loff_t offset, page_base;
+   int page_offset, page_length, ret = 0;
+
+   ret = i915_gem_obj_ggtt_pin(obj, 0, PIN_MAPPABLE);
+   if (ret)
+   goto out;
+
+   ret = i915_gem_object_set_to_gtt_domain(obj, write);
+   if (ret)
+   goto out_unpin;
+
+   ret = i915_gem_object_put_fence(obj);
+   if (ret)
+   goto out_unpin;
+
+   user_data = to_user_ptr(data_ptr);
+   remain = size;


Strictly speaking uint64_t can overflow ssize_t, compiler does not care 
in this case?



+
+   offset = i915_gem_obj_ggtt_offset(obj) + data_offset;
+
+   if (write)
+   intel_fb_obj_invalidate(obj, ORIGIN_GTT);
+
+   mutex_unlock(&dev->struct_mutex);
+   if (!write && likely(!i915.prefault_disable))
+   ret = fault_in_multipages_writeable(user_data, remain);


A bit confusing read/write inversion. :) But correct. Just wondering if 
it would make sense to invert the boolean at least in slow_user_access. 
Or in fact just call it pwrite instead of write to reflect pwrite == 
true is _reading_ from user memory.



+   while (remain > 0) {
+   /* Operation in this page
+*
+* page_base = page offset within aperture
+* page_offset = offset within page
+* page_length = bytes to copy for this page
+*/
+   page_base = offset & PAGE_MASK;
+   page_offset = offset_in_page(offset);
+   page_length = remain;
+   if ((page_offset + remain) > PAGE_SIZE)
+   page_length = PAGE_SIZE - page_offset;


It would save some arithmetics and branching to pull out the first 
potentially unaligned copy out of the loop and then page_offset would be 
zero for 2nd page onwards.



+   /* This is a slow read/write as it tries to read from
+* and write to user memory which may result into page
+* faults
+*/
+   ret = slow_user_access(dev_priv->gtt.mappable, page_base,
+  page_offset, user_data,
+  page_length, write);
+
+   if (ret) {
+   ret = -EINVAL;
+   break;
+   }
+
+   remain -= page_length;
+   user_data += page_length;
+   offset += page_length;
+   }
+
+   mutex_lock(&dev->struct_mutex);


Caller had it interruptible.


+
+out_unpin:
+   i915_gem_object_ggtt_unpin(obj);
+out:
+   return ret;

Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-07-02 Thread Tvrtko Ursulin


On 07/02/2015 12:00 PM, Chris Wilson wrote:

+   /* This is a slow read/write as it tries to read from
+* and write to user memory which may result into page
+* faults
+*/
+   ret = slow_user_access(dev_priv->gtt.mappable, page_base,
+  page_offset, user_data,
+  page_length, write);
+
+   if (ret) {
+   ret = -EINVAL;


Would be better to return the right errno from slow_user_access, or if it
is a bool, treat it as a bool!


It returns a number of unwritten bytes.

Actually, I just spotted that the return type of slow_user_access is also
wrong: it is an int, to which it implicitly casts an unsigned long.


And it is customary to return -EFAULT on faults rather than -EINVAL, so I
suggest changing that as well.



+   break;
+   }
+
+   remain -= page_length;
+   user_data += page_length;
+   offset += page_length;
+   }
+
+   mutex_lock(&dev->struct_mutex);


Caller had it interruptible.


On second access, it is preferrable not to risk interruption -
especially when catching up with state to return to userspace.



The second access is only in the pwrite case; moreover, it makes no sense
to make such decisions on behalf of the caller. If it is a real concern,
the caller should do it.


Yes I see that there is a problem where this helper doesn't know what 
type of mutex caller had, but, it is still bad imho and could be fixed 
in a different way.


Regards,

Tvrtko


Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-07-02 Thread Chris Wilson
On Thu, Jul 02, 2015 at 12:27:42PM +0100, Tvrtko Ursulin wrote:
 
 On 07/02/2015 12:00 PM, Chris Wilson wrote:
 +  /* This is a slow read/write as it tries to read from
 +   * and write to user memory which may result into page
 +   * faults
 +   */
  +  ret = slow_user_access(dev_priv->gtt.mappable, page_base,
 + page_offset, user_data,
 + page_length, write);
 +
 +  if (ret) {
 +  ret = -EINVAL;
 
  Would be better to return the right errno from slow_user_access, or if it
  is a bool, treat it as a bool!
 
 It returns a number of unwritten bytes.
 
 Actually just spotted the return type for slow_user_access is also
 wrong, it is an int, to which it implicitly casts an unsigned long.

Ugh, our convention for those has been to do the unwritten check in
the subroutine and convert it back to the errno there. Saves all the
guessing about the failure mode in the caller.
 
 And it is customary to return -EFAULT on rather than -EINVAL so I
 suggest changing that as well.

Yup.
 
 +  break;
 +  }
 +
 +  remain -= page_length;
 +  user_data += page_length;
 +  offset += page_length;
 +  }
 +
  +  mutex_lock(&dev->struct_mutex);
 
 Caller had it interruptible.
 
 On second access, it is preferrable not to risk interruption -
 especially when catching up with state to return to userspace.
 
 
 Second access is only in pwrite case and more so, it makes no sense
 to make such decisions on behalf of the caller. If it is a real
 concern, caller should do it.

That's odd. We must have taken it on both pwrite/pread paths to have
prepared the object for reading/writing. We can only drop it under
difficult circumstances (lots of dancing required, if we do, to make sure
that the object hasn't been modified in the meantime).
 
 Yes I see that there is a problem where this helper doesn't know
 what type of mutex caller had, but, it is still bad imho and could
 be fixed in a different way.

Smells fishy indeed.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-07-02 Thread shuang . he
Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: 
shuang...@intel.com)
Task id: 6697
-Summary-
Platform  Delta  drm-intel-nightly  Series Applied
ILK  302/302  302/302
SNB  312/316  312/316
IVB  343/343  343/343
BYT -1  287/287  286/287
HSW  380/380  380/380
-Detailed-
Platform  Testdrm-intel-nightly  Series 
Applied
*BYT  igt@gem_tiled_partial_pwrite_pread@reads  PASS(1)  FAIL(1)
Note: You need to pay more attention to line start with '*'


Re: [Intel-gfx] [PATCH 4/4] drm/i915: Support for pread/pwrite from/to non shmem backed objects

2015-07-01 Thread Chris Wilson
On Wed, Jul 01, 2015 at 02:55:47PM +0530, ankitprasad.r.sha...@intel.com wrote:
 - ret = i915_gem_shmem_pread(dev, obj, args, file);
 + /* pread for non shmem backed objects */
 + if (!obj->base.filp) {
 + if (obj->tiling_mode == I915_TILING_NONE)
 + ret = i915_gem_gtt_pread_pwrite(dev, obj, args->size,
 + args->offset,
 + args->data_ptr,
 + false);
 + else
 + ret = -EINVAL;
 + } else
 + ret = i915_gem_shmem_pread(dev, obj, args, file);

My preference for structuring this would be to always try the faster
method and move the can-gtt/can-shm tests into that method and then try the
slow path for suitable errno.

That way, with the refactoring here, we place all the knowledge of which
objects work with GTT access inside the GTT accessor. Similarly wrt the
fast shmem read path.

For extra fun, we can directly read from stolen physical addresses for
gen6.

Overall the patch lgtm, though I have some cleanups pending to the GTT
access path to handle huge objects - the ordering of those with this
doesn't matter.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre