[PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)

2016-09-21 Thread Chris Wilson
On Wed, Sep 21, 2016 at 10:26:25AM +0300, Gustavo Padovan wrote:
> Hi Rafael,
> 
> 2016-09-14 Rafael Antognolli :
> 
> > Hi Chris and Gustavo,
> > 
> > On Mon, Aug 29, 2016 at 07:16:13PM +0100, Chris Wilson wrote:
> > > If we being polled with a timeout of zero, a nonblocking busy query,
> > > we don't need to install any fence callbacks as we will not be waiting.
> > > As we only install the callback once, the overhead comes from the atomic
> > > bit test that also causes serialisation between threads.
> > > 
> > > Signed-off-by: Chris Wilson 
> > > Cc: Sumit Semwal 
> > > Cc: Gustavo Padovan 
> > > Cc: linux-media at vger.kernel.org
> > > Cc: dri-devel at lists.freedesktop.org
> > > Cc: linaro-mm-sig at lists.linaro.org
> > > ---
> > >  drivers/dma-buf/sync_file.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > > index 486d29c1a830..abb5fdab75fd 100644
> > > --- a/drivers/dma-buf/sync_file.c
> > > +++ b/drivers/dma-buf/sync_file.c
> > > @@ -306,7 +306,8 @@ static unsigned int sync_file_poll(struct file *file, 
> > > poll_table *wait)
> > >  
> > >   poll_wait(file, _file->wq, wait);
> > >  
> > > - if (!test_and_set_bit(POLL_ENABLED, _file->fence->flags)) {
> > > + if (!poll_does_not_wait(wait) &&
> > > + !test_and_set_bit(POLL_ENABLED, _file->fence->flags)) {
> > >   if (fence_add_callback(sync_file->fence, _file->cb,
> > >  fence_check_cb_func) < 0)
> > >   wake_up_all(_file->wq);
> > 
> > This commit is causing an error on one of the tests that Robert Foss
> > submitted for i-g-t. The one that does random merge of fences from
> > different timelines. A simple version of the test that still triggers
> > this is:
> > 
> > static void test_sync_simple_merge(void)
> > {
> > int fence1, fence2, fence_merge, timeline1, timeline2;
> > int ret;
> > 
> > timeline1 = sw_sync_timeline_create();
> > timeline2 = sw_sync_timeline_create();
> > fence1 = sw_sync_fence_create(timeline1, 1);
> > fence2 = sw_sync_fence_create(timeline2, 2);
> > fence_merge = sw_sync_merge(fence1, fence2);
> > sw_sync_timeline_inc(timeline1, 5);
> > sw_sync_timeline_inc(timeline2, 5);
> > 
> > ret = sw_sync_wait(fence_merge, 0);
> > igt_assert_f(ret > 0, "Failure triggering fence\n");
> > 
> > sw_sync_fence_destroy(fence_merge);
> > sw_sync_fence_destroy(fence1);
> > sw_sync_fence_destroy(fence2);
> > sw_sync_timeline_destroy(timeline1);
> > sw_sync_timeline_destroy(timeline2);
> > }
> > 
> > It looks like you cannot trust fence_is_signaled() without a
> > fence_add_callback(). I think the fence_array->num_pending won't get
> > updated. Although I couldn't figure out why it only happens if you merge
> > fences from different timelines.
> 
> Yes, num_pending is only updated when signaling is enabled. It only
> happens with different timelines because when you merge fences that are
> on the same timeline your final sync_file has only one fence and thus 
> a fence_array is not created.
> 
> If we want to keep the poll_does_not_wait optimization we need a way
> to count the pending fences during fence_is_signaled(). I'd propose
> something like this:
> 
> 
> Author: Gustavo Padovan 
> Date:   Tue Sep 20 16:43:06 2016 +0200
> 
> dma-buf/fence-array: get signaled state when signaling is disabled
> 
> If the fences in the fence_array signal on the fence_array does not have
> signalling enabled num_pending will not be updated accordingly.
> 
> So when signaling is disabled check the signal of every fence with
> fence_is_signaled() and then compare with num_pending to learn if the
> fence_array was signalled or not.
> 
> If we want to keep the poll_does_not_wait optimization I think we need
> something like this. It keeps the same behaviour if signalling is enabled
> but tries to calculated the state otherwise.
> 
> Signed-off-by: Gustavo Padovan 

We need this regardless, so yay for uncovering a bug!
> 
> diff --git a/drivers/dma-buf/fence-array.c b/drivers/dma-buf/fence-array.c
> index f1989fc..34c9209 100644
> --- a/drivers/dma-buf/fence-array.c
> +++ b/drivers/dma-buf/fence-array.c
> @@ -75,8 +75,18 @@ static bool fence_array_enable_signaling(struct fence 
> *fence)
>  static bool fence_array_signaled(struct fence *fence)
>  {
> struct fence_array *array = to_fence_array(fence);
> +   int i, num_pending;
>  
> -   return atomic_read(>num_pending) <= 0;
> +   num_pending = atomic_read(>num_pending);
> +
> +   if (!test_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, >flags)) {

Oh, very sneaky. I thought this was FENCE_FLAG_SIGNALED_BIT!

Throw in a comment like:

/* Before signaling is enabled, num_pending is static (set during array
 * construction as a count of *all* fences. To ensure 

[PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)

2016-09-21 Thread Gustavo Padovan
Hi Rafael,

2016-09-14 Rafael Antognolli :

> Hi Chris and Gustavo,
> 
> On Mon, Aug 29, 2016 at 07:16:13PM +0100, Chris Wilson wrote:
> > If we being polled with a timeout of zero, a nonblocking busy query,
> > we don't need to install any fence callbacks as we will not be waiting.
> > As we only install the callback once, the overhead comes from the atomic
> > bit test that also causes serialisation between threads.
> > 
> > Signed-off-by: Chris Wilson 
> > Cc: Sumit Semwal 
> > Cc: Gustavo Padovan 
> > Cc: linux-media at vger.kernel.org
> > Cc: dri-devel at lists.freedesktop.org
> > Cc: linaro-mm-sig at lists.linaro.org
> > ---
> >  drivers/dma-buf/sync_file.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > index 486d29c1a830..abb5fdab75fd 100644
> > --- a/drivers/dma-buf/sync_file.c
> > +++ b/drivers/dma-buf/sync_file.c
> > @@ -306,7 +306,8 @@ static unsigned int sync_file_poll(struct file *file, 
> > poll_table *wait)
> >  
> > poll_wait(file, _file->wq, wait);
> >  
> > -   if (!test_and_set_bit(POLL_ENABLED, _file->fence->flags)) {
> > +   if (!poll_does_not_wait(wait) &&
> > +   !test_and_set_bit(POLL_ENABLED, _file->fence->flags)) {
> > if (fence_add_callback(sync_file->fence, _file->cb,
> >fence_check_cb_func) < 0)
> > wake_up_all(_file->wq);
> 
> This commit is causing an error on one of the tests that Robert Foss
> submitted for i-g-t. The one that does random merge of fences from
> different timelines. A simple version of the test that still triggers
> this is:
> 
> static void test_sync_simple_merge(void)
> {
> int fence1, fence2, fence_merge, timeline1, timeline2;
> int ret;
> 
> timeline1 = sw_sync_timeline_create();
> timeline2 = sw_sync_timeline_create();
> fence1 = sw_sync_fence_create(timeline1, 1);
> fence2 = sw_sync_fence_create(timeline2, 2);
> fence_merge = sw_sync_merge(fence1, fence2);
> sw_sync_timeline_inc(timeline1, 5);
> sw_sync_timeline_inc(timeline2, 5);
> 
> ret = sw_sync_wait(fence_merge, 0);
> igt_assert_f(ret > 0, "Failure triggering fence\n");
> 
> sw_sync_fence_destroy(fence_merge);
> sw_sync_fence_destroy(fence1);
> sw_sync_fence_destroy(fence2);
> sw_sync_timeline_destroy(timeline1);
> sw_sync_timeline_destroy(timeline2);
> }
> 
> It looks like you cannot trust fence_is_signaled() without a
> fence_add_callback(). I think the fence_array->num_pending won't get
> updated. Although I couldn't figure out why it only happens if you merge
> fences from different timelines.

Yes, num_pending is only updated when signaling is enabled. It only
happens with different timelines because when you merge fences that are
on the same timeline your final sync_file has only one fence and thus 
a fence_array is not created.

If we want to keep the poll_does_not_wait optimization we need a way
to count the pending fences during fence_is_signaled(). I'd propose
something like this:


Author: Gustavo Padovan 
Date:   Tue Sep 20 16:43:06 2016 +0200

dma-buf/fence-array: get signaled state when signaling is disabled

If the fences in the fence_array signal on the fence_array does not have
signalling enabled num_pending will not be updated accordingly.

So when signaling is disabled check the signal of every fence with
fence_is_signaled() and then compare with num_pending to learn if the
fence_array was signalled or not.

If we want to keep the poll_does_not_wait optimization I think we need
something like this. It keeps the same behaviour if signalling is enabled
but tries to calculated the state otherwise.

Signed-off-by: Gustavo Padovan 

diff --git a/drivers/dma-buf/fence-array.c b/drivers/dma-buf/fence-array.c
index f1989fc..34c9209 100644
--- a/drivers/dma-buf/fence-array.c
+++ b/drivers/dma-buf/fence-array.c
@@ -75,8 +75,18 @@ static bool fence_array_enable_signaling(struct fence *fence)
 static bool fence_array_signaled(struct fence *fence)
 {
struct fence_array *array = to_fence_array(fence);
+   int i, num_pending;

-   return atomic_read(>num_pending) <= 0;
+   num_pending = atomic_read(>num_pending);
+
+   if (!test_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, >flags)) {
+   for (i = 0 ; i < array->num_fences; ++i) {
+   if (fence_is_signaled(array->fences[i]))
+   num_pending--;
+   }
+   }
+
+   return num_pending <= 0;
 }

 static void fence_array_release(struct fence *fence)


Gustavo


[PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)

2016-09-14 Thread Rafael Antognolli
Hi Chris and Gustavo,

On Mon, Aug 29, 2016 at 07:16:13PM +0100, Chris Wilson wrote:
> If we being polled with a timeout of zero, a nonblocking busy query,
> we don't need to install any fence callbacks as we will not be waiting.
> As we only install the callback once, the overhead comes from the atomic
> bit test that also causes serialisation between threads.
> 
> Signed-off-by: Chris Wilson 
> Cc: Sumit Semwal 
> Cc: Gustavo Padovan 
> Cc: linux-media at vger.kernel.org
> Cc: dri-devel at lists.freedesktop.org
> Cc: linaro-mm-sig at lists.linaro.org
> ---
>  drivers/dma-buf/sync_file.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> index 486d29c1a830..abb5fdab75fd 100644
> --- a/drivers/dma-buf/sync_file.c
> +++ b/drivers/dma-buf/sync_file.c
> @@ -306,7 +306,8 @@ static unsigned int sync_file_poll(struct file *file, 
> poll_table *wait)
>  
>   poll_wait(file, _file->wq, wait);
>  
> - if (!test_and_set_bit(POLL_ENABLED, _file->fence->flags)) {
> + if (!poll_does_not_wait(wait) &&
> + !test_and_set_bit(POLL_ENABLED, _file->fence->flags)) {
>   if (fence_add_callback(sync_file->fence, _file->cb,
>  fence_check_cb_func) < 0)
>   wake_up_all(_file->wq);

This commit is causing an error on one of the tests that Robert Foss
submitted for i-g-t. The one that does random merge of fences from
different timelines. A simple version of the test that still triggers
this is:

static void test_sync_simple_merge(void)
{
int fence1, fence2, fence_merge, timeline1, timeline2;
int ret;

timeline1 = sw_sync_timeline_create();
timeline2 = sw_sync_timeline_create();
fence1 = sw_sync_fence_create(timeline1, 1);
fence2 = sw_sync_fence_create(timeline2, 2);
fence_merge = sw_sync_merge(fence1, fence2);
sw_sync_timeline_inc(timeline1, 5);
sw_sync_timeline_inc(timeline2, 5);

ret = sw_sync_wait(fence_merge, 0);
igt_assert_f(ret > 0, "Failure triggering fence\n");

sw_sync_fence_destroy(fence_merge);
sw_sync_fence_destroy(fence1);
sw_sync_fence_destroy(fence2);
sw_sync_timeline_destroy(timeline1);
sw_sync_timeline_destroy(timeline2);
}

It looks like you cannot trust fence_is_signaled() without a
fence_add_callback(). I think the fence_array->num_pending won't get
updated. Although I couldn't figure out why it only happens if you merge
fences from different timelines.

Regards,
Rafael


[PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)

2016-09-13 Thread Sumit Semwal
Hi Chris,

On 29 August 2016 at 23:56, Gustavo Padovan  wrote:
> Hi Chris,
>
> 2016-08-29 Chris Wilson :
>
>> If we being polled with a timeout of zero, a nonblocking busy query,
>> we don't need to install any fence callbacks as we will not be waiting.
>> As we only install the callback once, the overhead comes from the atomic
>> bit test that also causes serialisation between threads.
>>
>> Signed-off-by: Chris Wilson 
>> Cc: Sumit Semwal 
>> Cc: Gustavo Padovan 
>> Cc: linux-media at vger.kernel.org
>> Cc: dri-devel at lists.freedesktop.org
>> Cc: linaro-mm-sig at lists.linaro.org
>> ---
>>  drivers/dma-buf/sync_file.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> Indeed, we can shortcut this.
>
> Reviewed-by: Gustavo Padovan 
>
> Gustavo

Thanks; pushed to drm-misc.

Best,
Sumit.


[PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)

2016-08-29 Thread Chris Wilson
If we being polled with a timeout of zero, a nonblocking busy query,
we don't need to install any fence callbacks as we will not be waiting.
As we only install the callback once, the overhead comes from the atomic
bit test that also causes serialisation between threads.

Signed-off-by: Chris Wilson 
Cc: Sumit Semwal 
Cc: Gustavo Padovan 
Cc: linux-media at vger.kernel.org
Cc: dri-devel at lists.freedesktop.org
Cc: linaro-mm-sig at lists.linaro.org
---
 drivers/dma-buf/sync_file.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index 486d29c1a830..abb5fdab75fd 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -306,7 +306,8 @@ static unsigned int sync_file_poll(struct file *file, 
poll_table *wait)

poll_wait(file, _file->wq, wait);

-   if (!test_and_set_bit(POLL_ENABLED, _file->fence->flags)) {
+   if (!poll_does_not_wait(wait) &&
+   !test_and_set_bit(POLL_ENABLED, _file->fence->flags)) {
if (fence_add_callback(sync_file->fence, _file->cb,
   fence_check_cb_func) < 0)
wake_up_all(_file->wq);
-- 
2.9.3



[PATCH] dma-buf/sync-file: Avoid enable fence signaling if poll(.timeout=0)

2016-08-29 Thread Gustavo Padovan
Hi Chris,

2016-08-29 Chris Wilson :

> If we being polled with a timeout of zero, a nonblocking busy query,
> we don't need to install any fence callbacks as we will not be waiting.
> As we only install the callback once, the overhead comes from the atomic
> bit test that also causes serialisation between threads.
> 
> Signed-off-by: Chris Wilson 
> Cc: Sumit Semwal 
> Cc: Gustavo Padovan 
> Cc: linux-media at vger.kernel.org
> Cc: dri-devel at lists.freedesktop.org
> Cc: linaro-mm-sig at lists.linaro.org
> ---
>  drivers/dma-buf/sync_file.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Indeed, we can shortcut this.

Reviewed-by: Gustavo Padovan 

Gustavo