Re: [Intel-gfx] [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object

2016-09-16 Thread Chris Wilson
On Wed, Sep 14, 2016 at 07:52:37AM +0100, Chris Wilson wrote:
> In preparation to support many distinct timelines, we need to expand the
> activity tracking on the GEM object to handle more than just a request
> per engine. We already use the struct reservation_object on the dma-buf
> to handle many fence contexts, so integrating that into the GEM object
> itself is the preferred solution. (For example, we can now share the same
> reservation_object between every consumer/producer using this buffer and
> skip the manual import/export via dma-buf.)
> 
> Caveats:
> 
>  * busy-ioctl: the modern interface is completely thrown away,
> regressing us back to before
> 
> commit e9808edd98679680804dfbc42c5ee8f1aa91f617 [v3.10]
> Author: Chris Wilson 
> Date:   Wed Jul 4 12:25:08 2012 +0100
> 
> drm/i915: Return a mask of the active rings in the high word of busy_ioctl

There is an alternative here, as we can walk through the obj->resv and
convert *i915* fences to their respective read/write engines.

However, this means that busy-ioctl and wait-ioctl(.timeout=0) are no
longer equivalent as that busy-ioctl may report an object as idle but
still has external rendering (and so wait-ioctl, set-to-domain-ioctl,
pread, pwrite may stall).

Walking the resv object is also more expensive than doing a simple test.
Though that can be mitigated later by opencoding the test (once we can
do the whole busy-ioctl under only RCU protection that makes more
sense). But it makes busy-ioctl slower than ideal, and busy-ioctl is
high frequency (so long as we keep avoiding struct_mutex, the difference
is likely only visible in the microbenchmarks).

Pro: we keep the finese of being able to report which engines are busy
Con: not equivalent to wait-ioctl, polling the dmabuf, etc.

That we have alternative mechanisms to check may be considered a good
thing, i.e. we can migrate to using wait-ioctl for the general am I
going to stall test, and using busy-ioctl for engine selection. I think
I am leaning towards keeping the ABI more or less intact for the time
being. Though we are getting pretty close to its limit on NUM_ENGINES.

int
i915_gem_busy_ioctl(struct drm_device *dev, void *data,
struct drm_file *file)
{
struct drm_i915_gem_busy *args = data;
struct drm_i915_gem_object *obj;
struct fence **shared, *excl;
unsigned int count, i;
int ret;

ret = -ENOENT;
obj = i915_gem_object_lookup(file, args->handle);
if (obj) {
ret = reservation_object_get_fences_rcu(obj->resv,
, , );
i915_gem_object_put_unlocked(obj);
}
if (unlikely(ret))
return ret;

 /* A discrepancy here is that we do not report the status of
  * non-i915 fences, i.e. even though we may report the object as idle,
  * a call to set-domain may still stall waiting for foreign rendering.
  * This also means that wait-ioctl may report an object as busy,
  * where busy-ioctl considers it idle.
  *
  * We trade the ability to warn of foreign fences to report on which
  * i915 engines are active for the object.
  *
  * Alternatively, we can trade that extra information on read/write
  * activity with
  * args->busy =
  * !reservation_object_test_signaled_rcu(obj->resv, true);
  * to report the overall busyness. This is what the wait-ioctl does.
  */
args->busy = 0;

/* Translate shared fences to READ set of engines */
for (i = 0; i < count; i++) {
args->busy |= busy_check_reader(shared[i]);
fence_put(shared[i]);
}
kfree(shared);

/* Translate the exclusive fence to the READ *and* WRITE engine */
if (excl) {
args->busy |= busy_check_writer(excl);
fence_put(excl);
}

return 0;
}

-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object

2016-09-15 Thread Jani Nikula
On Thu, 15 Sep 2016, Dave Gordon  wrote:
> On 14/09/16 18:35, Chris Wilson wrote:
>> On Wed, Sep 14, 2016 at 12:44:04PM +0300, Joonas Lahtinen wrote:
>>> On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
 -static inline bool
 -i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
 -int engine)
 -{
 -  return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT);
 +  return obj->active_count;
>>>
>>> our type is bool, so !!obj->active_count;
>>
>> That's the beauty of using bool, !! is implied on the (bool)integer
>> conversion.
>> -Chris
>
> It's a gcc extension, though, so does it work on clang?

It's in the standard.

"When any scalar value is converted to _Bool, the result is 0 if the
value compares equal to 0; otherwise, the result is 1."

BR,
Jani.


>
> .Dave.
> ___
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Jani Nikula, Intel Open Source Technology Center
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object

2016-09-15 Thread Dave Gordon

On 14/09/16 18:35, Chris Wilson wrote:

On Wed, Sep 14, 2016 at 12:44:04PM +0300, Joonas Lahtinen wrote:

On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:

-static inline bool
-i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
- int engine)
-{
-   return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT);
+   return obj->active_count;


our type is bool, so !!obj->active_count;


That's the beauty of using bool, !! is implied on the (bool)integer
conversion.
-Chris


It's a gcc extension, though, so does it work on clang?

.Dave.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object

2016-09-14 Thread Chris Wilson
On Wed, Sep 14, 2016 at 12:44:04PM +0300, Joonas Lahtinen wrote:
> On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> > -static inline bool
> > -i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
> > -     int engine)
> > -{
> > -   return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT);
> > +   return obj->active_count;
> 
> our type is bool, so !!obj->active_count;

That's the beauty of using bool, !! is implied on the (bool)integer
conversion.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object

2016-09-14 Thread Joonas Lahtinen
On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote:
> In preparation to support many distinct timelines, we need to expand the
> activity tracking on the GEM object to handle more than just a request
> per engine. We already use the struct reservation_object on the dma-buf
> to handle many fence contexts, so integrating that into the GEM object
> itself is the preferred solution. (For example, we can now share the same
> reservation_object between every consumer/producer using this buffer and
> skip the manual import/export via dma-buf.)
> 
> Caveats:

I'd make comments which patch in the series addresses each introduced
problem, which are fixable in future and which are taken as "a
permanent hit" for achieving multiple timelines. With a bit of
reasoning for each (now only a few points include some of this).

>  static inline struct drm_i915_gem_object *
> @@ -2347,35 +2341,10 @@ i915_gem_object_has_struct_page(const struct 
> drm_i915_gem_object *obj)
>   return obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE;
>  }
>  
> -static inline unsigned long
> -i915_gem_object_get_active(const struct drm_i915_gem_object *obj)
> -{
> - return (obj->flags >> I915_BO_ACTIVE_SHIFT) & I915_BO_ACTIVE_MASK;
> -}
> -
>  static inline bool
>  i915_gem_object_is_active(const struct drm_i915_gem_object *obj)
>  {
> - return i915_gem_object_get_active(obj);
> -}
> -
> -static inline void
> -i915_gem_object_set_active(struct drm_i915_gem_object *obj, int engine)
> -{
> - obj->flags |= BIT(engine + I915_BO_ACTIVE_SHIFT);
> -}
> -
> -static inline void
> -i915_gem_object_clear_active(struct drm_i915_gem_object *obj, int engine)
> -{
> - obj->flags &= ~BIT(engine + I915_BO_ACTIVE_SHIFT);
> -}
> -
> -static inline bool
> -i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj,
> -   int engine)
> -{
> - return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT);
> + return obj->active_count;

our type is bool, so !!obj->active_count;

>  }
> 



> 
>  static int
>  i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
>   struct list_head *vmas)
>  {
> - const unsigned int other_rings = eb_other_engines(req);
>   struct i915_vma *vma;
>   int ret;
>  
>   list_for_each_entry(vma, vmas, exec_list) {
>   struct drm_i915_gem_object *obj = vma->obj;
> - struct reservation_object *resv;
> -
> - if (obj->flags & other_rings) {
> - ret = i915_gem_request_await_object
> - (req, obj, obj->base.pending_write_domain);
> - if (ret)
> - return ret;
> - }
>  
> - resv = i915_gem_object_get_dmabuf_resv(obj);
> - if (resv) {
> - ret = i915_sw_fence_await_reservation
> - (>submit, resv, _fence_ops,
> -  obj->base.pending_write_domain, 10*HZ,
> -  GFP_KERNEL | __GFP_NOWARN);
> - if (ret < 0)
> - return ret;
> - }
> + ret = i915_gem_request_await_object
> + (req, obj, obj->base.pending_write_domain);

I know it was previously like this, but I'm not sure I agree on this
style at all.

> @@ -11935,17 +11932,8 @@ static bool use_mmio_flip(struct intel_engine_cs 
> *engine,
>  
>   if (i915.use_mmio_flip < 0)
>   return false;
> - else if (i915.use_mmio_flip > 0)
> - return true;
> - else if (i915.enable_execlists)
> - return true;
>  
> - resv = i915_gem_object_get_dmabuf_resv(obj);
> - if (resv && !reservation_object_test_signaled_rcu(resv, false))
> - return true;
> -
> - return engine != i915_gem_active_get_engine(>last_write,
> - 
> >base.dev->struct_mutex);
> + return true;

return i915_use_mmio_flip >= 0; // ?

> @@ -860,39 +860,6 @@ struct drm_i915_gem_busy {
>    * long as no new GPU commands are executed upon it). Due to the
>    * asynchronous nature of the hardware, an object reported
>    * as busy may become idle before the ioctl is completed.
> -  *
> -  * Furthermore, if the object is busy, which engine is busy is only
> -  * provided as a guide. There are race conditions which prevent the
> -  * report of which engines are busy from being always accurate.
> -  * However, the converse is not true. If the object is idle, the
> -  * result of the ioctl, that all engines are idle, is accurate.
> -  *
> -  * The returned dword is split into two fields to indicate both
> -  * the engines on which the object is being read, and the
> -  * engine on which it is currently being written (if any).
> -  *
> -  * The low word (bits 0:15) indicate if the object is being written

[Intel-gfx] [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object

2016-09-14 Thread Chris Wilson
In preparation to support many distinct timelines, we need to expand the
activity tracking on the GEM object to handle more than just a request
per engine. We already use the struct reservation_object on the dma-buf
to handle many fence contexts, so integrating that into the GEM object
itself is the preferred solution. (For example, we can now share the same
reservation_object between every consumer/producer using this buffer and
skip the manual import/export via dma-buf.)

Caveats:

 * busy-ioctl: the modern interface is completely thrown away,
regressing us back to before

commit e9808edd98679680804dfbc42c5ee8f1aa91f617 [v3.10]
Author: Chris Wilson 
Date:   Wed Jul 4 12:25:08 2012 +0100

drm/i915: Return a mask of the active rings in the high word of busy_ioctl

 * local engine selection and semaphore avoidance is lost for CS flips

 * non-blocking atomic modesets take a step backwards as the wait for
render completion blocks the ioctl. This is fixed in a subsequent
patch to use a fence instead for awaiting on the rendering.

 * dynamic array manipulation for shared-fences in reservation is slower
than the previous lockless static assignment (e.g. gem_exec_lut_handle
runtime on ivb goes from 42s to 72s)

 * loss of object-level retirement callbacks

 * minor loss of object-level last activity information from debugfs,
could be replaced with per-vma information if desired

Signed-off-by: Chris Wilson 
---
 drivers/gpu/drm/i915/i915_debugfs.c|  18 +--
 drivers/gpu/drm/i915/i915_drv.h|  43 +
 drivers/gpu/drm/i915/i915_gem.c| 252 +++--
 drivers/gpu/drm/i915/i915_gem_batch_pool.c |   3 +-
 drivers/gpu/drm/i915/i915_gem_dmabuf.c |  46 +-
 drivers/gpu/drm/i915/i915_gem_dmabuf.h |  45 --
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  71 ++--
 drivers/gpu/drm/i915/i915_gem_gtt.c|  29 
 drivers/gpu/drm/i915/i915_gem_gtt.h|   1 +
 drivers/gpu/drm/i915/i915_gem_request.c|  48 +++---
 drivers/gpu/drm/i915/i915_gem_request.h|  35 +---
 drivers/gpu/drm/i915/i915_gpu_error.c  |   6 +-
 drivers/gpu/drm/i915/intel_atomic_plane.c  |   2 -
 drivers/gpu/drm/i915/intel_display.c   | 122 +++---
 drivers/gpu/drm/i915/intel_drv.h   |   3 -
 include/uapi/drm/i915_drm.h|  33 
 16 files changed, 127 insertions(+), 630 deletions(-)
 delete mode 100644 drivers/gpu/drm/i915/i915_gem_dmabuf.h

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c 
b/drivers/gpu/drm/i915/i915_debugfs.c
index 64702cc68e3a..c4e7532c5b6a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -134,15 +134,13 @@ static void
 describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj)
 {
struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
-   struct intel_engine_cs *engine;
struct i915_vma *vma;
unsigned int frontbuffer_bits;
int pin_count = 0;
-   enum intel_engine_id id;
 
lockdep_assert_held(>base.dev->struct_mutex);
 
-   seq_printf(m, "%pK: %c%c%c%c%c %8zdKiB %02x %02x [ ",
+   seq_printf(m, "%pK: %c%c%c%c%c %8zdKiB %02x %02x %s%s%s",
   >base,
   get_active_flag(obj),
   get_pin_flag(obj),
@@ -151,14 +149,7 @@ describe_obj(struct seq_file *m, struct 
drm_i915_gem_object *obj)
   get_pin_mapped_flag(obj),
   obj->base.size / 1024,
   obj->base.read_domains,
-  obj->base.write_domain);
-   for_each_engine_id(engine, dev_priv, id)
-   seq_printf(m, "%x ",
-  i915_gem_active_get_seqno(>last_read[id],
-
>base.dev->struct_mutex));
-   seq_printf(m, "] %x %s%s%s",
-  i915_gem_active_get_seqno(>last_write,
->base.dev->struct_mutex),
+  obj->base.write_domain,
   i915_cache_level_str(dev_priv, obj->cache_level),
   obj->dirty ? " dirty" : "",
   obj->madv == I915_MADV_DONTNEED ? " purgeable" : "");
@@ -198,11 +189,6 @@ describe_obj(struct seq_file *m, struct 
drm_i915_gem_object *obj)
seq_printf(m, " (%s mappable)", s);
}
 
-   engine = i915_gem_active_get_engine(>last_write,
-   _priv->drm.struct_mutex);
-   if (engine)
-   seq_printf(m, " (%s)", engine->name);
-
frontbuffer_bits = atomic_read(>frontbuffer_bits);
if (frontbuffer_bits)
seq_printf(m, " (frontbuffer: 0x%03x)", frontbuffer_bits);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index af0212032b64..2bcab3087e8c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -41,6 +41,7 @@
 #include