Re: [Intel-gfx] [PATCH] drm/i915: Track number of pending freed objects

2018-02-19 Thread Chris Wilson
Quoting Chris Wilson (2018-02-19 16:19:21)
> During igt, we frequently call into the driver to reset both HW and
> driver state (idling the device, waiting for it to become idle and
> freeing off old objects) to ensure that we start each test/subtest/pass
> from known state. This process incurs an RCU barrier or two to ensure
> that any such pending frees are indeed flushed before we return.
> However, unconditionally waiting on the RCU barrier adds needless delay
> to many callers, which adds up to several seconds when repeated thousands
> of times. By tracking how many outstanding frees we have, we can skip the
> rcu_barrier() when we know there are none.
> 
> The same path is used during suspend, where we may be able to save the
> unconditional RCU barrier.

To put it into perspective with a completely meaningless
microbenchmark, igt/gem_sync/idle is improved from 50ms to 30us on bdw.

> Signed-off-by: Chris Wilson 
> Cc: Mika Kuoppala 
> Cc: Joonas Lahtinen 
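
For orientation, the mechanism the commit message describes boils down to gating an expensive barrier behind an atomic counter: account for each object before it is handed to the deferred-free machinery, and let the drain return early when nothing is in flight. A minimal, self-contained sketch of that pattern (hypothetical names; C11 atomics and a print standing in for the kernel's atomic_t and rcu_barrier()):

#include <stdatomic.h>
#include <stdio.h>

static atomic_int free_count;			/* objects queued but not yet freed */

static void queue_free(void)			/* i915_gem_free_object() analogue */
{
	atomic_fetch_add(&free_count, 1);	/* account before publishing the free */
	/* ... the driver would call_rcu() here ... */
}

static void complete_free(void)			/* __i915_gem_free_objects() analogue */
{
	atomic_fetch_sub(&free_count, 1);	/* drop the count once really freed */
}

static void drain_freed_objects(void)
{
	if (!atomic_load(&free_count))		/* fast path: nothing pending */
		return;
	puts("rcu_barrier(): waiting for deferred frees");
}

int main(void)
{
	drain_freed_objects();			/* empty: returns immediately */
	queue_free();
	drain_freed_objects();			/* one pending: pays the barrier */
	complete_free();
	drain_freed_objects();			/* drained: fast path again */
	return 0;
}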


Re: [Intel-gfx] [PATCH] drm/i915: Track number of pending freed objects

2018-02-19 Thread Chris Wilson
Quoting Tvrtko Ursulin (2018-02-19 18:13:13)
> 
> On 19/02/2018 16:19, Chris Wilson wrote:
> > During igt, we frequently call into the driver to reset both HW and
> > driver state (idling the device, waiting for it to become idle and
> > freeing off old objects) to ensure that we start each test/subtest/pass
> > from known state. This process incurs an RCU barrier or two to ensure
> > that any such pending frees are indeed flushed before we return.
> > However, unconditionally waiting on the RCU barrier adds needless delay
> > to many callers, which adds up to several seconds when repeated thousands
> > of times. By tracking how many outstanding frees we have, we can skip the
> > rcu_barrier() when we know there are none.
> 
> To be pedantic, it is not skipping the rcu_barrier, but skipping the
> drain altogether.
> 
> So theoretically there is a tiny difference in behaviour: today the
> drain would wait for all frees currently executing, whereas after the
> patch it will ignore those and only process the ones which have reached
> the end of the function.
> 
> Perhaps if atomic_inc was at the very top of i915_gem_free_object it
> would be closer to today's behaviour. But such suggestions feel extremely iffy.

That's a smallish window. And it exists even today, with a race after
the RCU grace period (if you let userspace race with itself). I think
it's fair to say that we are dependent upon single-threaded client
operation here (either igt or suspend) for defined behaviour.

> Nosing around the code base suggests the change is completely fine. The
> only potentially relevant site which might care about the subtle
> difference is i915_gem_freeze_late, which actually doesn't care since
> everything has been frozen at that point. So all frees have presumably
> exited and incremented the new counter.

That's the idea at least :)
-Chris
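
To make the window concrete, a sketch of the interleaving under discussion (thread labels hypothetical; the functions are those touched by the patch):

  Thread A (client freeing)               Thread B (drain)
  -------------------------               ----------------
  enter i915_gem_free_object(obj)
  ... teardown work ...
                                          atomic_read(&free_count) == 0
                                          -> skip the drain entirely
  atomic_inc(&free_count)
  call_rcu(&obj->rcu, ...)

Before the patch, B would have issued rcu_barrier() unconditionally, but since A has not yet reached call_rcu() the barrier would not wait for obj either, which is the sense in which an equivalent window already exists for a multithreaded client.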


Re: [Intel-gfx] [PATCH] drm/i915: Track number of pending freed objects

2018-02-19 Thread Tvrtko Ursulin


On 19/02/2018 16:19, Chris Wilson wrote:

During igt, we frequently call into the driver to reset both HW and
driver state (idling the device, waiting for it to become idle and
freeing off old objects) to ensure that we start each test/subtest/pass
from known state. This process incurs an RCU barrier or two to ensure
that any such pending frees are indeed flushed before we return.
However, unconditionally waiting on the RCU barrier adds needless delay
to many callers, which adds up to several seconds when repeated thousands
of times. By tracking how many outstanding frees we have, we can skip the
rcu_barrier() when we know there are none.


To be pedantic, it is not skipping the rcu_barrier, but skipping the 
drain altogether.


So theoretically there is a tiny difference in behaviour: today the 
drain would wait for all frees currently executing, whereas after the 
patch it will ignore those and only process the ones which have reached 
the end of the function.


Perhaps if atomic_inc was at the very top of i915_gem_free_object it 
would be closer to today's behaviour. But such suggestions feel extremely iffy.


Nosing around the code base suggests the change is completely fine. The 
only potentially relevant site which might care about the subtle 
difference is i915_gem_freeze_late, which actually doesn't care since 
everything has been frozen at that point. So all frees have presumably 
exited and incremented the new counter.


Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko



The same path is used during suspend, where we may be able to save the
unconditional RCU barrier.

Signed-off-by: Chris Wilson 
Cc: Mika Kuoppala 
Cc: Joonas Lahtinen 
---
  drivers/gpu/drm/i915/i915_drv.h | 8 ++++++++
  drivers/gpu/drm/i915/i915_gem.c | 4 ++++
  2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6b215c411338..76bfe909168c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1095,6 +1095,11 @@ struct i915_gem_mm {
struct llist_head free_list;
struct work_struct free_work;
spinlock_t free_lock;
+	/**
+	 * Count of objects pending destruction. Used to skip needlessly
+	 * waiting on an RCU barrier if no objects are waiting to be freed.
+	 */
+	atomic_t free_count;
  
	/**
	 * Small stash of WC pages
@@ -3134,6 +3139,9 @@ void i915_gem_free_object(struct drm_gem_object *obj);
  
  static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
  {
+   if (!atomic_read(&i915->mm.free_count))
+   return;
+
/* A single pass should suffice to release all the freed objects (along
 * most call paths) , but be a little more paranoid in that freeing
 * the objects does take a little amount of time, during which the rcu
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f530cd247724..f2dcf09f4ebc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4758,6 +4758,9 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915,
	kfree(obj->bit_17);
	i915_gem_object_free(obj);

+	GEM_BUG_ON(!atomic_read(&i915->mm.free_count));
+	atomic_dec(&i915->mm.free_count);
+
if (on)
cond_resched();
}
@@ -4846,6 +4849,7 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 * i915_gem_busy_ioctl(). For the corresponding synchronized
 * lookup see i915_gem_object_lookup_rcu().
 */
+   atomic_inc(&to_i915(obj->base.dev)->mm.free_count);
call_rcu(&obj->rcu, __i915_gem_free_object_rcu);
  }
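
Since the i915_drv.h hunk above stops mid-comment, a hedged reconstruction of what the complete helper looks like after the patch (the do/while body is inferred from the truncated comment about rcu callbacks re-arming the work, not copied verbatim from the tree):

static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
{
	if (!atomic_read(&i915->mm.free_count))
		return;

	/* A single pass should suffice to release all the freed objects (along
	 * most call paths), but be a little more paranoid in that freeing the
	 * objects does take a little amount of time, during which the rcu
	 * callbacks could have added new objects onto the freed list, and
	 * armed the free worker again.
	 */
	do {
		rcu_barrier();
	} while (flush_work(&i915->mm.free_work));
}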
  




[Intel-gfx] [PATCH] drm/i915: Track number of pending freed objects

2018-02-19 Thread Chris Wilson
During igt, we frequently call into the driver to reset both HW and
driver state (idling the device, waiting for it to become idle and
freeing off old objects) to ensure that we start each test/subtest/pass
from known state. This process incurs an RCU barrier or two to ensure
that any such pending frees are indeed flushed before we return.
However, unconditionally waiting on the RCU barrier adds needless delay
to many callers, which adds up to several seconds when repeated thousands
of times. By tracking how many outstanding frees we have, we can skip the
rcu_barrier() when we know there are none.

The same path is used during suspend, where we may be able to save the
unconditional RCU barrier.

Signed-off-by: Chris Wilson 
Cc: Mika Kuoppala 
Cc: Joonas Lahtinen 
---
 drivers/gpu/drm/i915/i915_drv.h | 8 ++++++++
 drivers/gpu/drm/i915/i915_gem.c | 4 ++++
 2 files changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 6b215c411338..76bfe909168c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1095,6 +1095,11 @@ struct i915_gem_mm {
struct llist_head free_list;
struct work_struct free_work;
spinlock_t free_lock;
+	/**
+	 * Count of objects pending destruction. Used to skip needlessly
+	 * waiting on an RCU barrier if no objects are waiting to be freed.
+	 */
+	atomic_t free_count;
 
/**
 * Small stash of WC pages
@@ -3134,6 +3139,9 @@ void i915_gem_free_object(struct drm_gem_object *obj);
 
 static inline void i915_gem_drain_freed_objects(struct drm_i915_private *i915)
 {
+   if (!atomic_read(&i915->mm.free_count))
+   return;
+
/* A single pass should suffice to release all the freed objects (along
 * most call paths) , but be a little more paranoid in that freeing
 * the objects does take a little amount of time, during which the rcu
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f530cd247724..f2dcf09f4ebc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4758,6 +4758,9 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915,
kfree(obj->bit_17);
i915_gem_object_free(obj);
 
+   GEM_BUG_ON(!atomic_read(&i915->mm.free_count));
+   atomic_dec(&i915->mm.free_count);
+
if (on)
cond_resched();
}
@@ -4846,6 +4849,7 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj)
 * i915_gem_busy_ioctl(). For the corresponding synchronized
 * lookup see i915_gem_object_lookup_rcu().
 */
+   atomic_inc(&to_i915(obj->base.dev)->mm.free_count);
call_rcu(&obj->rcu, __i915_gem_free_object_rcu);
 }
 
-- 
2.16.1
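
A closing note on the GEM_BUG_ON() in the i915_gem.c hunk (an editor's reading, not stated in the thread): it cannot fire because the increment in i915_gem_free_object() is ordered before call_rcu() publishes the object, and the decrement only runs from the deferred-free path after that publication, so every decrement is preceded by its matching increment. A self-contained userspace model of that discipline, with a pthread standing in for the RCU callback and assert() for GEM_BUG_ON():

#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int free_count;

static void *deferred_free(void *arg)	/* plays the RCU callback */
{
	(void)arg;
	/* pthread_create() orders the producer's increment before this
	 * thread starts, so the count can never be seen to underflow. */
	assert(atomic_load(&free_count) > 0);
	atomic_fetch_sub(&free_count, 1);
	return NULL;
}

int main(void)
{
	pthread_t cb;

	atomic_fetch_add(&free_count, 1);	/* account before "call_rcu()" */
	pthread_create(&cb, NULL, deferred_free, NULL);
	pthread_join(cb, NULL);

	printf("pending frees after drain: %d\n", atomic_load(&free_count));
	return 0;
}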
