Re: [Intel-gfx] [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object
On Wed, Sep 14, 2016 at 07:52:37AM +0100, Chris Wilson wrote: > In preparation to support many distinct timelines, we need to expand the > activity tracking on the GEM object to handle more than just a request > per engine. We already use the struct reservation_object on the dma-buf > to handle many fence contexts, so integrating that into the GEM object > itself is the preferred solution. (For example, we can now share the same > reservation_object between every consumer/producer using this buffer and > skip the manual import/export via dma-buf.) > > Caveats: > > * busy-ioctl: the modern interface is completely thrown away, > regressing us back to before > > commit e9808edd98679680804dfbc42c5ee8f1aa91f617 [v3.10] > Author: Chris Wilson > Date: Wed Jul 4 12:25:08 2012 +0100 > > drm/i915: Return a mask of the active rings in the high word of busy_ioctl There is an alternative here, as we can walk through the obj->resv and convert *i915* fences to their respective read/write engines. However, this means that busy-ioctl and wait-ioctl(.timeout=0) are no longer equivalent as that busy-ioctl may report an object as idle but still has external rendering (and so wait-ioctl, set-to-domain-ioctl, pread, pwrite may stall). Walking the resv object is also more expensive than doing a simple test. Though that can be mitigated later by opencoding the test (once we can do the whole busy-ioctl under only RCU protection that makes more sense). But it makes busy-ioctl slower than ideal, and busy-ioctl is high frequency (so long as we keep avoiding struct_mutex, the difference is likely only visible in the microbenchmarks). Pro: we keep the finese of being able to report which engines are busy Con: not equivalent to wait-ioctl, polling the dmabuf, etc. That we have alternative mechanisms to check may be considered a good thing, i.e. we can migrate to using wait-ioctl for the general am I going to stall test, and using busy-ioctl for engine selection. I think I am leaning towards keeping the ABI more or less intact for the time being. Though we are getting pretty close to its limit on NUM_ENGINES. int i915_gem_busy_ioctl(struct drm_device *dev, void *data, struct drm_file *file) { struct drm_i915_gem_busy *args = data; struct drm_i915_gem_object *obj; struct fence **shared, *excl; unsigned int count, i; int ret; ret = -ENOENT; obj = i915_gem_object_lookup(file, args->handle); if (obj) { ret = reservation_object_get_fences_rcu(obj->resv, &excl, &count, &shared); i915_gem_object_put_unlocked(obj); } if (unlikely(ret)) return ret; /* A discrepancy here is that we do not report the status of * non-i915 fences, i.e. even though we may report the object as idle, * a call to set-domain may still stall waiting for foreign rendering. * This also means that wait-ioctl may report an object as busy, * where busy-ioctl considers it idle. * * We trade the ability to warn of foreign fences to report on which * i915 engines are active for the object. * * Alternatively, we can trade that extra information on read/write * activity with * args->busy = * !reservation_object_test_signaled_rcu(obj->resv, true); * to report the overall busyness. This is what the wait-ioctl does. */ args->busy = 0; /* Translate shared fences to READ set of engines */ for (i = 0; i < count; i++) { args->busy |= busy_check_reader(shared[i]); fence_put(shared[i]); } kfree(shared); /* Translate the exclusive fence to the READ *and* WRITE engine */ if (excl) { args->busy |= busy_check_writer(excl); fence_put(excl); } return 0; } -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object
On Thu, 15 Sep 2016, Dave Gordon wrote: > On 14/09/16 18:35, Chris Wilson wrote: >> On Wed, Sep 14, 2016 at 12:44:04PM +0300, Joonas Lahtinen wrote: >>> On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote: -static inline bool -i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj, -int engine) -{ - return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT); + return obj->active_count; >>> >>> our type is bool, so !!obj->active_count; >> >> That's the beauty of using bool, !! is implied on the (bool)integer >> conversion. >> -Chris > > It's a gcc extension, though, so does it work on clang? It's in the standard. "When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1." BR, Jani. > > .Dave. > ___ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Jani Nikula, Intel Open Source Technology Center ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object
On 14/09/16 18:35, Chris Wilson wrote: On Wed, Sep 14, 2016 at 12:44:04PM +0300, Joonas Lahtinen wrote: On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote: -static inline bool -i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj, - int engine) -{ - return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT); + return obj->active_count; our type is bool, so !!obj->active_count; That's the beauty of using bool, !! is implied on the (bool)integer conversion. -Chris It's a gcc extension, though, so does it work on clang? .Dave. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object
On Wed, Sep 14, 2016 at 12:44:04PM +0300, Joonas Lahtinen wrote: > On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote: > > -static inline bool > > -i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj, > > - int engine) > > -{ > > - return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT); > > + return obj->active_count; > > our type is bool, so !!obj->active_count; That's the beauty of using bool, !! is implied on the (bool)integer conversion. -Chris -- Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object
On ke, 2016-09-14 at 07:52 +0100, Chris Wilson wrote: > In preparation to support many distinct timelines, we need to expand the > activity tracking on the GEM object to handle more than just a request > per engine. We already use the struct reservation_object on the dma-buf > to handle many fence contexts, so integrating that into the GEM object > itself is the preferred solution. (For example, we can now share the same > reservation_object between every consumer/producer using this buffer and > skip the manual import/export via dma-buf.) > > Caveats: I'd make comments which patch in the series addresses each introduced problem, which are fixable in future and which are taken as "a permanent hit" for achieving multiple timelines. With a bit of reasoning for each (now only a few points include some of this). > static inline struct drm_i915_gem_object * > @@ -2347,35 +2341,10 @@ i915_gem_object_has_struct_page(const struct > drm_i915_gem_object *obj) > return obj->ops->flags & I915_GEM_OBJECT_HAS_STRUCT_PAGE; > } > > -static inline unsigned long > -i915_gem_object_get_active(const struct drm_i915_gem_object *obj) > -{ > - return (obj->flags >> I915_BO_ACTIVE_SHIFT) & I915_BO_ACTIVE_MASK; > -} > - > static inline bool > i915_gem_object_is_active(const struct drm_i915_gem_object *obj) > { > - return i915_gem_object_get_active(obj); > -} > - > -static inline void > -i915_gem_object_set_active(struct drm_i915_gem_object *obj, int engine) > -{ > - obj->flags |= BIT(engine + I915_BO_ACTIVE_SHIFT); > -} > - > -static inline void > -i915_gem_object_clear_active(struct drm_i915_gem_object *obj, int engine) > -{ > - obj->flags &= ~BIT(engine + I915_BO_ACTIVE_SHIFT); > -} > - > -static inline bool > -i915_gem_object_has_active_engine(const struct drm_i915_gem_object *obj, > - int engine) > -{ > - return obj->flags & BIT(engine + I915_BO_ACTIVE_SHIFT); > + return obj->active_count; our type is bool, so !!obj->active_count; > } > > > static int > i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req, > struct list_head *vmas) > { > - const unsigned int other_rings = eb_other_engines(req); > struct i915_vma *vma; > int ret; > > list_for_each_entry(vma, vmas, exec_list) { > struct drm_i915_gem_object *obj = vma->obj; > - struct reservation_object *resv; > - > - if (obj->flags & other_rings) { > - ret = i915_gem_request_await_object > - (req, obj, obj->base.pending_write_domain); > - if (ret) > - return ret; > - } > > - resv = i915_gem_object_get_dmabuf_resv(obj); > - if (resv) { > - ret = i915_sw_fence_await_reservation > - (&req->submit, resv, &i915_fence_ops, > - obj->base.pending_write_domain, 10*HZ, > - GFP_KERNEL | __GFP_NOWARN); > - if (ret < 0) > - return ret; > - } > + ret = i915_gem_request_await_object > + (req, obj, obj->base.pending_write_domain); I know it was previously like this, but I'm not sure I agree on this style at all. > @@ -11935,17 +11932,8 @@ static bool use_mmio_flip(struct intel_engine_cs > *engine, > > if (i915.use_mmio_flip < 0) > return false; > - else if (i915.use_mmio_flip > 0) > - return true; > - else if (i915.enable_execlists) > - return true; > > - resv = i915_gem_object_get_dmabuf_resv(obj); > - if (resv && !reservation_object_test_signaled_rcu(resv, false)) > - return true; > - > - return engine != i915_gem_active_get_engine(&obj->last_write, > - > &obj->base.dev->struct_mutex); > + return true; return i915_use_mmio_flip >= 0; // ? > @@ -860,39 +860,6 @@ struct drm_i915_gem_busy { > * long as no new GPU commands are executed upon it). Due to the > * asynchronous nature of the hardware, an object reported > * as busy may become idle before the ioctl is completed. > - * > - * Furthermore, if the object is busy, which engine is busy is only > - * provided as a guide. There are race conditions which prevent the > - * report of which engines are busy from being always accurate. > - * However, the converse is not true. If the object is idle, the > - * result of the ioctl, that all engines are idle, is accurate. > - * > - * The returned dword is split into two fields to indicate both > - * the engines on which the object is being read, and the > - * engine on which it is currently being written (if any). > - * > - * The low word (bits 0:15) indicate if the objec
[Intel-gfx] [PATCH 05/18] drm/i915: Move GEM activity tracking into a common struct reservation_object
In preparation to support many distinct timelines, we need to expand the activity tracking on the GEM object to handle more than just a request per engine. We already use the struct reservation_object on the dma-buf to handle many fence contexts, so integrating that into the GEM object itself is the preferred solution. (For example, we can now share the same reservation_object between every consumer/producer using this buffer and skip the manual import/export via dma-buf.) Caveats: * busy-ioctl: the modern interface is completely thrown away, regressing us back to before commit e9808edd98679680804dfbc42c5ee8f1aa91f617 [v3.10] Author: Chris Wilson Date: Wed Jul 4 12:25:08 2012 +0100 drm/i915: Return a mask of the active rings in the high word of busy_ioctl * local engine selection and semaphore avoidance is lost for CS flips * non-blocking atomic modesets take a step backwards as the wait for render completion blocks the ioctl. This is fixed in a subsequent patch to use a fence instead for awaiting on the rendering. * dynamic array manipulation for shared-fences in reservation is slower than the previous lockless static assignment (e.g. gem_exec_lut_handle runtime on ivb goes from 42s to 72s) * loss of object-level retirement callbacks * minor loss of object-level last activity information from debugfs, could be replaced with per-vma information if desired Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c| 18 +-- drivers/gpu/drm/i915/i915_drv.h| 43 + drivers/gpu/drm/i915/i915_gem.c| 252 +++-- drivers/gpu/drm/i915/i915_gem_batch_pool.c | 3 +- drivers/gpu/drm/i915/i915_gem_dmabuf.c | 46 +- drivers/gpu/drm/i915/i915_gem_dmabuf.h | 45 -- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 71 ++-- drivers/gpu/drm/i915/i915_gem_gtt.c| 29 drivers/gpu/drm/i915/i915_gem_gtt.h| 1 + drivers/gpu/drm/i915/i915_gem_request.c| 48 +++--- drivers/gpu/drm/i915/i915_gem_request.h| 35 +--- drivers/gpu/drm/i915/i915_gpu_error.c | 6 +- drivers/gpu/drm/i915/intel_atomic_plane.c | 2 - drivers/gpu/drm/i915/intel_display.c | 122 +++--- drivers/gpu/drm/i915/intel_drv.h | 3 - include/uapi/drm/i915_drm.h| 33 16 files changed, 127 insertions(+), 630 deletions(-) delete mode 100644 drivers/gpu/drm/i915/i915_gem_dmabuf.h diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 64702cc68e3a..c4e7532c5b6a 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -134,15 +134,13 @@ static void describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) { struct drm_i915_private *dev_priv = to_i915(obj->base.dev); - struct intel_engine_cs *engine; struct i915_vma *vma; unsigned int frontbuffer_bits; int pin_count = 0; - enum intel_engine_id id; lockdep_assert_held(&obj->base.dev->struct_mutex); - seq_printf(m, "%pK: %c%c%c%c%c %8zdKiB %02x %02x [ ", + seq_printf(m, "%pK: %c%c%c%c%c %8zdKiB %02x %02x %s%s%s", &obj->base, get_active_flag(obj), get_pin_flag(obj), @@ -151,14 +149,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) get_pin_mapped_flag(obj), obj->base.size / 1024, obj->base.read_domains, - obj->base.write_domain); - for_each_engine_id(engine, dev_priv, id) - seq_printf(m, "%x ", - i915_gem_active_get_seqno(&obj->last_read[id], - &obj->base.dev->struct_mutex)); - seq_printf(m, "] %x %s%s%s", - i915_gem_active_get_seqno(&obj->last_write, -&obj->base.dev->struct_mutex), + obj->base.write_domain, i915_cache_level_str(dev_priv, obj->cache_level), obj->dirty ? " dirty" : "", obj->madv == I915_MADV_DONTNEED ? " purgeable" : ""); @@ -198,11 +189,6 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) seq_printf(m, " (%s mappable)", s); } - engine = i915_gem_active_get_engine(&obj->last_write, - &dev_priv->drm.struct_mutex); - if (engine) - seq_printf(m, " (%s)", engine->name); - frontbuffer_bits = atomic_read(&obj->frontbuffer_bits); if (frontbuffer_bits) seq_printf(m, " (frontbuffer: 0x%03x)", frontbuffer_bits); diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index af0212032b64..2bcab3087e8c 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -41,6 +41,7 @@ #include #include