Re: [PATCH 2/2] [RFC]drm: add syncobj timeline support

2018-08-22 Thread Daniel Vetter
On Wed, Aug 22, 2018 at 11:59 AM, zhoucm1  wrote:
>
>
> On 2018年08月22日 17:31, Daniel Vetter wrote:
>>
>> On Wed, Aug 22, 2018 at 05:28:17PM +0800, zhoucm1 wrote:
>>>
>>>
>>> On 2018年08月22日 17:24, Daniel Vetter wrote:

 On Wed, Aug 22, 2018 at 04:49:28PM +0800, Chunming Zhou wrote:
>
> VK_KHR_timeline_semaphore:
> This extension introduces a new type of semaphore that has an integer
> payload
> identifying a point in a timeline. Such timeline semaphores support the
> following operations:
> * Host query - A host operation that allows querying the payload of
> the
>   timeline semaphore.
> * Host wait - A host operation that allows a blocking wait for a
>   timeline semaphore to reach a specified value.
> * Device wait - A device operation that allows waiting for a
>   timeline semaphore to reach a specified value.
> * Device signal - A device operation that allows advancing the
>   timeline semaphore to a specified value.
>
> Since it's a timeline, that means the front time point(PT) always is
> signaled before the late PT.
> a. signal PT design:
> Signal PT fence N depends on PT[N-1] fence and signal opertion fence,
> when PT[N] fence is signaled,
> the timeline will increase to value of PT[N].
> b. wait PT design:
> Wait PT fence is signaled by reaching timeline point value, when
> timeline is increasing, will compare
> wait PTs value with new timeline value, if PT value is lower than
> timeline value, then wait PT will be
> signaled, otherwise keep in list. semaphore wait operation can wait on
> any point of timeline,
> so need a RB tree to order them. And wait PT could ahead of signal PT,
> we need a sumission fence to
> perform that.
>
> TODO:
> CPU query and wait on timeline semaphore.

 Another TODO: igt testcase for all the cornercases. We already have
 other syncobj tests in there.
>>>
>>> Yes, I'm also trying to find where test should be wrote, Could you give a
>>> directory?
>>
>> There's already tests/syncobj_basic.c and tests/syncobj_wait.c. Either
>> extend those, or probably better to start a new tests/syncobj_timeline.c
>> since I expect this will have a lot of corner-cases we need to check.
>
> I failed to find them in both kernel and libdrm, Could you point which test
> you said?

igt testcases. It's a separate thing with lots of drm tests:

https://cgit.freedesktop.org/drm/igt-gpu-tools

I know that amdgpu has their tests in libdrm (which imo is a bit
unfortunate split, but there's also reasons and stuff).

Cheers, Daniel

> Thanks,
> David Zhou
>
>> -Daniel
>>
>>> Thanks,
>>> David Zhou

 That would also help with understanding how this is supposed to be used,
 since I'm a bit too dense to immediately get your algorithm by just
 staring at the code.


> Change-Id: I9f09aae225e268442c30451badac40406f24262c
> Signed-off-by: Chunming Zhou 
> Cc: Christian Konig 
> Cc: Dave Airlie 
> Cc: Daniel Rakos 
> ---
>drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   7 +-
>drivers/gpu/drm/drm_syncobj.c  | 385
> -
>drivers/gpu/drm/v3d/v3d_gem.c  |   4 +-
>drivers/gpu/drm/vc4/vc4_gem.c  |   2 +-
>include/drm/drm_syncobj.h  |  45 +++-
>include/uapi/drm/drm.h |   3 +-
>6 files changed, 435 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index d42d1c8f78f6..463cc8960723 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1105,7 +1105,7 @@ static int
> amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
>{
> int r;
> struct dma_fence *fence;
> -   r = drm_syncobj_find_fence(p->filp, handle, );
> +   r = drm_syncobj_find_fence(p->filp, handle, , 0);
> if (r)
> return r;
> @@ -1193,8 +1193,9 @@ static void amdgpu_cs_post_dependencies(struct
> amdgpu_cs_parser *p)
>{
> int i;
> -   for (i = 0; i < p->num_post_dep_syncobjs; ++i)
> -   drm_syncobj_replace_fence(p->post_dep_syncobjs[i],
> p->fence);
> +   for (i = 0; i < p->num_post_dep_syncobjs; ++i) {
> +   drm_syncobj_signal_fence(p->post_dep_syncobjs[i],
> p->fence, 0);
> +   }
>}
>static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> diff --git a/drivers/gpu/drm/drm_syncobj.c
> b/drivers/gpu/drm/drm_syncobj.c
> index 70af32d0def1..3709f36c901e 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -187,6 +187,191 @@ void drm_syncobj_replace_fence(struct 

Re: [PATCH 2/2] [RFC]drm: add syncobj timeline support

2018-08-22 Thread zhoucm1



On 2018年08月22日 17:31, Daniel Vetter wrote:

On Wed, Aug 22, 2018 at 05:28:17PM +0800, zhoucm1 wrote:


On 2018年08月22日 17:24, Daniel Vetter wrote:

On Wed, Aug 22, 2018 at 04:49:28PM +0800, Chunming Zhou wrote:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer payload
identifying a point in a timeline. Such timeline semaphores support the
following operations:
* Host query - A host operation that allows querying the payload of the
  timeline semaphore.
* Host wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
* Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
* Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, that means the front time point(PT) always is signaled 
before the late PT.
a. signal PT design:
Signal PT fence N depends on PT[N-1] fence and signal opertion fence, when 
PT[N] fence is signaled,
the timeline will increase to value of PT[N].
b. wait PT design:
Wait PT fence is signaled by reaching timeline point value, when timeline is 
increasing, will compare
wait PTs value with new timeline value, if PT value is lower than timeline 
value, then wait PT will be
signaled, otherwise keep in list. semaphore wait operation can wait on any 
point of timeline,
so need a RB tree to order them. And wait PT could ahead of signal PT, we need 
a sumission fence to
perform that.

TODO:
CPU query and wait on timeline semaphore.

Another TODO: igt testcase for all the cornercases. We already have
other syncobj tests in there.

Yes, I'm also trying to find where test should be wrote, Could you give a
directory?

There's already tests/syncobj_basic.c and tests/syncobj_wait.c. Either
extend those, or probably better to start a new tests/syncobj_timeline.c
since I expect this will have a lot of corner-cases we need to check.
I failed to find them in both kernel and libdrm, Could you point which 
test you said?


Thanks,
David Zhou

-Daniel


Thanks,
David Zhou

That would also help with understanding how this is supposed to be used,
since I'm a bit too dense to immediately get your algorithm by just
staring at the code.



Change-Id: I9f09aae225e268442c30451badac40406f24262c
Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   7 +-
   drivers/gpu/drm/drm_syncobj.c  | 385 
-
   drivers/gpu/drm/v3d/v3d_gem.c  |   4 +-
   drivers/gpu/drm/vc4/vc4_gem.c  |   2 +-
   include/drm/drm_syncobj.h  |  45 +++-
   include/uapi/drm/drm.h |   3 +-
   6 files changed, 435 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d42d1c8f78f6..463cc8960723 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1105,7 +1105,7 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct 
amdgpu_cs_parser *p,
   {
int r;
struct dma_fence *fence;
-   r = drm_syncobj_find_fence(p->filp, handle, );
+   r = drm_syncobj_find_fence(p->filp, handle, , 0);
if (r)
return r;
@@ -1193,8 +1193,9 @@ static void amdgpu_cs_post_dependencies(struct 
amdgpu_cs_parser *p)
   {
int i;
-   for (i = 0; i < p->num_post_dep_syncobjs; ++i)
-   drm_syncobj_replace_fence(p->post_dep_syncobjs[i], p->fence);
+   for (i = 0; i < p->num_post_dep_syncobjs; ++i) {
+   drm_syncobj_signal_fence(p->post_dep_syncobjs[i], p->fence, 0);
+   }
   }
   static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 70af32d0def1..3709f36c901e 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -187,6 +187,191 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
*syncobj,
   }
   EXPORT_SYMBOL(drm_syncobj_replace_fence);
+static void drm_syncobj_timeline_signal_submission_fences(struct drm_syncobj 
*syncobj)
+{
+   struct rb_node *node = NULL;
+   struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+   spin_lock(>lock);
+   for(node = rb_first(>syncobj_timeline.wait_pt_tree);
+   node != NULL; node = rb_next(node)) {
+   wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+   if (wait_pt->value <= syncobj->syncobj_timeline.signal_point) {
+   if (wait_pt->submission_fence)
+   dma_fence_signal(_pt->submission_fence->base);
+   } else {
+   /* the loop is from left to right, the later entry value is
+* bigger, so don't need to check any more */
+   break;
+   }

This seems to reinvet 

Re: [PATCH 2/2] [RFC]drm: add syncobj timeline support

2018-08-22 Thread Daniel Vetter
On Wed, Aug 22, 2018 at 05:28:17PM +0800, zhoucm1 wrote:
> 
> 
> On 2018年08月22日 17:24, Daniel Vetter wrote:
> > On Wed, Aug 22, 2018 at 04:49:28PM +0800, Chunming Zhou wrote:
> > > VK_KHR_timeline_semaphore:
> > > This extension introduces a new type of semaphore that has an integer 
> > > payload
> > > identifying a point in a timeline. Such timeline semaphores support the
> > > following operations:
> > >* Host query - A host operation that allows querying the payload of the
> > >  timeline semaphore.
> > >* Host wait - A host operation that allows a blocking wait for a
> > >  timeline semaphore to reach a specified value.
> > >* Device wait - A device operation that allows waiting for a
> > >  timeline semaphore to reach a specified value.
> > >* Device signal - A device operation that allows advancing the
> > >  timeline semaphore to a specified value.
> > > 
> > > Since it's a timeline, that means the front time point(PT) always is 
> > > signaled before the late PT.
> > > a. signal PT design:
> > > Signal PT fence N depends on PT[N-1] fence and signal opertion fence, 
> > > when PT[N] fence is signaled,
> > > the timeline will increase to value of PT[N].
> > > b. wait PT design:
> > > Wait PT fence is signaled by reaching timeline point value, when timeline 
> > > is increasing, will compare
> > > wait PTs value with new timeline value, if PT value is lower than 
> > > timeline value, then wait PT will be
> > > signaled, otherwise keep in list. semaphore wait operation can wait on 
> > > any point of timeline,
> > > so need a RB tree to order them. And wait PT could ahead of signal PT, we 
> > > need a sumission fence to
> > > perform that.
> > > 
> > > TODO:
> > > CPU query and wait on timeline semaphore.
> > Another TODO: igt testcase for all the cornercases. We already have
> > other syncobj tests in there.
> Yes, I'm also trying to find where test should be wrote, Could you give a
> directory?

There's already tests/syncobj_basic.c and tests/syncobj_wait.c. Either
extend those, or probably better to start a new tests/syncobj_timeline.c
since I expect this will have a lot of corner-cases we need to check.
-Daniel

> 
> Thanks,
> David Zhou
> > 
> > That would also help with understanding how this is supposed to be used,
> > since I'm a bit too dense to immediately get your algorithm by just
> > staring at the code.
> > 
> > 
> > > Change-Id: I9f09aae225e268442c30451badac40406f24262c
> > > Signed-off-by: Chunming Zhou 
> > > Cc: Christian Konig 
> > > Cc: Dave Airlie 
> > > Cc: Daniel Rakos 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   7 +-
> > >   drivers/gpu/drm/drm_syncobj.c  | 385 
> > > -
> > >   drivers/gpu/drm/v3d/v3d_gem.c  |   4 +-
> > >   drivers/gpu/drm/vc4/vc4_gem.c  |   2 +-
> > >   include/drm/drm_syncobj.h  |  45 +++-
> > >   include/uapi/drm/drm.h |   3 +-
> > >   6 files changed, 435 insertions(+), 11 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > index d42d1c8f78f6..463cc8960723 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > > @@ -1105,7 +1105,7 @@ static int 
> > > amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
> > >   {
> > >   int r;
> > >   struct dma_fence *fence;
> > > - r = drm_syncobj_find_fence(p->filp, handle, );
> > > + r = drm_syncobj_find_fence(p->filp, handle, , 0);
> > >   if (r)
> > >   return r;
> > > @@ -1193,8 +1193,9 @@ static void amdgpu_cs_post_dependencies(struct 
> > > amdgpu_cs_parser *p)
> > >   {
> > >   int i;
> > > - for (i = 0; i < p->num_post_dep_syncobjs; ++i)
> > > - drm_syncobj_replace_fence(p->post_dep_syncobjs[i], p->fence);
> > > + for (i = 0; i < p->num_post_dep_syncobjs; ++i) {
> > > + drm_syncobj_signal_fence(p->post_dep_syncobjs[i], p->fence, 0);
> > > + }
> > >   }
> > >   static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> > > diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> > > index 70af32d0def1..3709f36c901e 100644
> > > --- a/drivers/gpu/drm/drm_syncobj.c
> > > +++ b/drivers/gpu/drm/drm_syncobj.c
> > > @@ -187,6 +187,191 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
> > > *syncobj,
> > >   }
> > >   EXPORT_SYMBOL(drm_syncobj_replace_fence);
> > > +static void drm_syncobj_timeline_signal_submission_fences(struct 
> > > drm_syncobj *syncobj)
> > > +{
> > > + struct rb_node *node = NULL;
> > > + struct drm_syncobj_wait_pt *wait_pt = NULL;
> > > +
> > > + spin_lock(>lock);
> > > + for(node = rb_first(>syncobj_timeline.wait_pt_tree);
> > > + node != NULL; node = rb_next(node)) {
> > > + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
> > > + if (wait_pt->value <= syncobj->syncobj_timeline.signal_point) {
> > > +

Re: [PATCH 2/2] [RFC]drm: add syncobj timeline support

2018-08-22 Thread zhoucm1



On 2018年08月22日 17:24, Daniel Vetter wrote:

On Wed, Aug 22, 2018 at 04:49:28PM +0800, Chunming Zhou wrote:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer payload
identifying a point in a timeline. Such timeline semaphores support the
following operations:
   * Host query - A host operation that allows querying the payload of the
 timeline semaphore.
   * Host wait - A host operation that allows a blocking wait for a
 timeline semaphore to reach a specified value.
   * Device wait - A device operation that allows waiting for a
 timeline semaphore to reach a specified value.
   * Device signal - A device operation that allows advancing the
 timeline semaphore to a specified value.

Since it's a timeline, that means the front time point(PT) always is signaled 
before the late PT.
a. signal PT design:
Signal PT fence N depends on PT[N-1] fence and signal opertion fence, when 
PT[N] fence is signaled,
the timeline will increase to value of PT[N].
b. wait PT design:
Wait PT fence is signaled by reaching timeline point value, when timeline is 
increasing, will compare
wait PTs value with new timeline value, if PT value is lower than timeline 
value, then wait PT will be
signaled, otherwise keep in list. semaphore wait operation can wait on any 
point of timeline,
so need a RB tree to order them. And wait PT could ahead of signal PT, we need 
a sumission fence to
perform that.

TODO:
CPU query and wait on timeline semaphore.

Another TODO: igt testcase for all the cornercases. We already have
other syncobj tests in there.
Yes, I'm also trying to find where test should be wrote, Could you give 
a directory?


Thanks,
David Zhou


That would also help with understanding how this is supposed to be used,
since I'm a bit too dense to immediately get your algorithm by just
staring at the code.



Change-Id: I9f09aae225e268442c30451badac40406f24262c
Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   7 +-
  drivers/gpu/drm/drm_syncobj.c  | 385 -
  drivers/gpu/drm/v3d/v3d_gem.c  |   4 +-
  drivers/gpu/drm/vc4/vc4_gem.c  |   2 +-
  include/drm/drm_syncobj.h  |  45 +++-
  include/uapi/drm/drm.h |   3 +-
  6 files changed, 435 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d42d1c8f78f6..463cc8960723 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1105,7 +1105,7 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct 
amdgpu_cs_parser *p,
  {
int r;
struct dma_fence *fence;
-   r = drm_syncobj_find_fence(p->filp, handle, );
+   r = drm_syncobj_find_fence(p->filp, handle, , 0);
if (r)
return r;
  
@@ -1193,8 +1193,9 @@ static void amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p)

  {
int i;
  
-	for (i = 0; i < p->num_post_dep_syncobjs; ++i)

-   drm_syncobj_replace_fence(p->post_dep_syncobjs[i], p->fence);
+   for (i = 0; i < p->num_post_dep_syncobjs; ++i) {
+   drm_syncobj_signal_fence(p->post_dep_syncobjs[i], p->fence, 0);
+   }
  }
  
  static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 70af32d0def1..3709f36c901e 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -187,6 +187,191 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
*syncobj,
  }
  EXPORT_SYMBOL(drm_syncobj_replace_fence);
  
+static void drm_syncobj_timeline_signal_submission_fences(struct drm_syncobj *syncobj)

+{
+   struct rb_node *node = NULL;
+   struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+   spin_lock(>lock);
+   for(node = rb_first(>syncobj_timeline.wait_pt_tree);
+   node != NULL; node = rb_next(node)) {
+   wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+   if (wait_pt->value <= syncobj->syncobj_timeline.signal_point) {
+   if (wait_pt->submission_fence)
+   dma_fence_signal(_pt->submission_fence->base);
+   } else {
+   /* the loop is from left to right, the later entry value is
+* bigger, so don't need to check any more */
+   break;
+   }

This seems to reinvet syncobj->cb_list. Since this is the exact same
"future fence that doesn't even exist yet" problem I think those two code
path should be unified. In general I think it'd be much better if we treat
the old syncobj as a timeline with a limit of 1 slot only.

Or there needs to be a really good reason why all new code.

Specifically for this stuff here having unified future fence semantics
will allow drivers to do clever stuff with them.


+   }
+ 

Re: [PATCH 2/2] [RFC]drm: add syncobj timeline support

2018-08-22 Thread Daniel Vetter
On Wed, Aug 22, 2018 at 04:49:28PM +0800, Chunming Zhou wrote:
> VK_KHR_timeline_semaphore:
> This extension introduces a new type of semaphore that has an integer payload
> identifying a point in a timeline. Such timeline semaphores support the
> following operations:
>   * Host query - A host operation that allows querying the payload of the
> timeline semaphore.
>   * Host wait - A host operation that allows a blocking wait for a
> timeline semaphore to reach a specified value.
>   * Device wait - A device operation that allows waiting for a
> timeline semaphore to reach a specified value.
>   * Device signal - A device operation that allows advancing the
> timeline semaphore to a specified value.
> 
> Since it's a timeline, that means the front time point(PT) always is signaled 
> before the late PT.
> a. signal PT design:
> Signal PT fence N depends on PT[N-1] fence and signal opertion fence, when 
> PT[N] fence is signaled,
> the timeline will increase to value of PT[N].
> b. wait PT design:
> Wait PT fence is signaled by reaching timeline point value, when timeline is 
> increasing, will compare
> wait PTs value with new timeline value, if PT value is lower than timeline 
> value, then wait PT will be
> signaled, otherwise keep in list. semaphore wait operation can wait on any 
> point of timeline,
> so need a RB tree to order them. And wait PT could ahead of signal PT, we 
> need a sumission fence to
> perform that.
> 
> TODO:
> CPU query and wait on timeline semaphore.

Another TODO: igt testcase for all the cornercases. We already have
other syncobj tests in there.

That would also help with understanding how this is supposed to be used,
since I'm a bit too dense to immediately get your algorithm by just
staring at the code.


> Change-Id: I9f09aae225e268442c30451badac40406f24262c
> Signed-off-by: Chunming Zhou 
> Cc: Christian Konig 
> Cc: Dave Airlie 
> Cc: Daniel Rakos 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   7 +-
>  drivers/gpu/drm/drm_syncobj.c  | 385 
> -
>  drivers/gpu/drm/v3d/v3d_gem.c  |   4 +-
>  drivers/gpu/drm/vc4/vc4_gem.c  |   2 +-
>  include/drm/drm_syncobj.h  |  45 +++-
>  include/uapi/drm/drm.h |   3 +-
>  6 files changed, 435 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index d42d1c8f78f6..463cc8960723 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -1105,7 +1105,7 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct 
> amdgpu_cs_parser *p,
>  {
>   int r;
>   struct dma_fence *fence;
> - r = drm_syncobj_find_fence(p->filp, handle, );
> + r = drm_syncobj_find_fence(p->filp, handle, , 0);
>   if (r)
>   return r;
>  
> @@ -1193,8 +1193,9 @@ static void amdgpu_cs_post_dependencies(struct 
> amdgpu_cs_parser *p)
>  {
>   int i;
>  
> - for (i = 0; i < p->num_post_dep_syncobjs; ++i)
> - drm_syncobj_replace_fence(p->post_dep_syncobjs[i], p->fence);
> + for (i = 0; i < p->num_post_dep_syncobjs; ++i) {
> + drm_syncobj_signal_fence(p->post_dep_syncobjs[i], p->fence, 0);
> + }
>  }
>  
>  static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index 70af32d0def1..3709f36c901e 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -187,6 +187,191 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
> *syncobj,
>  }
>  EXPORT_SYMBOL(drm_syncobj_replace_fence);
>  
> +static void drm_syncobj_timeline_signal_submission_fences(struct drm_syncobj 
> *syncobj)
> +{
> + struct rb_node *node = NULL;
> + struct drm_syncobj_wait_pt *wait_pt = NULL;
> +
> + spin_lock(>lock);
> + for(node = rb_first(>syncobj_timeline.wait_pt_tree);
> + node != NULL; node = rb_next(node)) {
> + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
> + if (wait_pt->value <= syncobj->syncobj_timeline.signal_point) {
> + if (wait_pt->submission_fence)
> + dma_fence_signal(_pt->submission_fence->base);
> + } else {
> + /* the loop is from left to right, the later entry value is
> +  * bigger, so don't need to check any more */
> + break;
> + }

This seems to reinvet syncobj->cb_list. Since this is the exact same
"future fence that doesn't even exist yet" problem I think those two code
path should be unified. In general I think it'd be much better if we treat
the old syncobj as a timeline with a limit of 1 slot only.

Or there needs to be a really good reason why all new code.

Specifically for this stuff here having unified future fence semantics
will allow drivers to do clever stuff with them.

> + }
> + spin_unlock(>lock);
> 

Re: [PATCH 2/2] [RFC]drm: add syncobj timeline support

2018-08-22 Thread Christian König

Am 22.08.2018 um 10:57 schrieb Christian König:

Am 22.08.2018 um 10:49 schrieb Chunming Zhou:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer 
payload

identifying a point in a timeline. Such timeline semaphores support the
following operations:
   * Host query - A host operation that allows querying the payload 
of the

 timeline semaphore.
   * Host wait - A host operation that allows a blocking wait for a
 timeline semaphore to reach a specified value.
   * Device wait - A device operation that allows waiting for a
 timeline semaphore to reach a specified value.
   * Device signal - A device operation that allows advancing the
 timeline semaphore to a specified value.

Since it's a timeline, that means the front time point(PT) always is 
signaled before the late PT.

a. signal PT design:
Signal PT fence N depends on PT[N-1] fence and signal opertion fence, 
when PT[N] fence is signaled,

the timeline will increase to value of PT[N].
b. wait PT design:
Wait PT fence is signaled by reaching timeline point value, when 
timeline is increasing, will compare
wait PTs value with new timeline value, if PT value is lower than 
timeline value, then wait PT will be
signaled, otherwise keep in list. semaphore wait operation can wait 
on any point of timeline,
so need a RB tree to order them. And wait PT could ahead of signal 
PT, we need a sumission fence to

perform that.

TODO:
CPU query and wait on timeline semaphore.


Please drop DRM_SYNCOBJ_CREATE_TYPE_NORMAL, that isn't used in the 
whole implementation.


Additional to that I would split up the change to 
drm_syncobj_find_fence() in a separate patch.


And please drop the submission_fence implementation and instead use 
wait_event() for that.


Implementing this as a fence has the possiblity that somebody is abusing 
it for wait before signal.


Christian.



Christian.



Change-Id: I9f09aae225e268442c30451badac40406f24262c
Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   7 +-
  drivers/gpu/drm/drm_syncobj.c  | 385 
-

  drivers/gpu/drm/v3d/v3d_gem.c  |   4 +-
  drivers/gpu/drm/vc4/vc4_gem.c  |   2 +-
  include/drm/drm_syncobj.h  |  45 +++-
  include/uapi/drm/drm.h |   3 +-
  6 files changed, 435 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

index d42d1c8f78f6..463cc8960723 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1105,7 +1105,7 @@ static int 
amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,

  {
  int r;
  struct dma_fence *fence;
-    r = drm_syncobj_find_fence(p->filp, handle, );
+    r = drm_syncobj_find_fence(p->filp, handle, , 0);
  if (r)
  return r;
  @@ -1193,8 +1193,9 @@ static void 
amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p)

  {
  int i;
  -    for (i = 0; i < p->num_post_dep_syncobjs; ++i)
-    drm_syncobj_replace_fence(p->post_dep_syncobjs[i], p->fence);
+    for (i = 0; i < p->num_post_dep_syncobjs; ++i) {
+    drm_syncobj_signal_fence(p->post_dep_syncobjs[i], p->fence, 0);
+    }
  }
    static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index 70af32d0def1..3709f36c901e 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -187,6 +187,191 @@ void drm_syncobj_replace_fence(struct 
drm_syncobj *syncobj,

  }
  EXPORT_SYMBOL(drm_syncobj_replace_fence);
  +static void drm_syncobj_timeline_signal_submission_fences(struct 
drm_syncobj *syncobj)

+{
+    struct rb_node *node = NULL;
+    struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+    spin_lock(>lock);
+    for(node = rb_first(>syncobj_timeline.wait_pt_tree);
+    node != NULL; node = rb_next(node)) {
+    wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+    if (wait_pt->value <= syncobj->syncobj_timeline.signal_point) {
+    if (wait_pt->submission_fence)
+ dma_fence_signal(_pt->submission_fence->base);
+    } else {
+    /* the loop is from left to right, the later entry value is
+ * bigger, so don't need to check any more */
+    break;
+    }
+    }
+    spin_unlock(>lock);
+}
+
+static void drm_syncobj_timeline_signal_wait_pts(struct drm_syncobj 
*syncobj)

+{
+    struct rb_node *node = NULL;
+    struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+    spin_lock(>lock);
+    for(node = rb_first(>syncobj_timeline.wait_pt_tree);
+    node != NULL; ) {
+    wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+    node = rb_next(node);
+    if (wait_pt->value <= syncobj->syncobj_timeline.timeline) {
+    dma_fence_signal(_pt->base.base);
+    

Re: [PATCH 2/2] [RFC]drm: add syncobj timeline support

2018-08-22 Thread Christian König

Am 22.08.2018 um 10:49 schrieb Chunming Zhou:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer payload
identifying a point in a timeline. Such timeline semaphores support the
following operations:
   * Host query - A host operation that allows querying the payload of the
 timeline semaphore.
   * Host wait - A host operation that allows a blocking wait for a
 timeline semaphore to reach a specified value.
   * Device wait - A device operation that allows waiting for a
 timeline semaphore to reach a specified value.
   * Device signal - A device operation that allows advancing the
 timeline semaphore to a specified value.

Since it's a timeline, that means the front time point(PT) always is signaled 
before the late PT.
a. signal PT design:
Signal PT fence N depends on PT[N-1] fence and signal opertion fence, when 
PT[N] fence is signaled,
the timeline will increase to value of PT[N].
b. wait PT design:
Wait PT fence is signaled by reaching timeline point value, when timeline is 
increasing, will compare
wait PTs value with new timeline value, if PT value is lower than timeline 
value, then wait PT will be
signaled, otherwise keep in list. semaphore wait operation can wait on any 
point of timeline,
so need a RB tree to order them. And wait PT could ahead of signal PT, we need 
a sumission fence to
perform that.

TODO:
CPU query and wait on timeline semaphore.


Please drop DRM_SYNCOBJ_CREATE_TYPE_NORMAL, that isn't used in the whole 
implementation.


Additional to that I would split up the change to 
drm_syncobj_find_fence() in a separate patch.


Christian.



Change-Id: I9f09aae225e268442c30451badac40406f24262c
Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   7 +-
  drivers/gpu/drm/drm_syncobj.c  | 385 -
  drivers/gpu/drm/v3d/v3d_gem.c  |   4 +-
  drivers/gpu/drm/vc4/vc4_gem.c  |   2 +-
  include/drm/drm_syncobj.h  |  45 +++-
  include/uapi/drm/drm.h |   3 +-
  6 files changed, 435 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d42d1c8f78f6..463cc8960723 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1105,7 +1105,7 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct 
amdgpu_cs_parser *p,
  {
int r;
struct dma_fence *fence;
-   r = drm_syncobj_find_fence(p->filp, handle, );
+   r = drm_syncobj_find_fence(p->filp, handle, , 0);
if (r)
return r;
  
@@ -1193,8 +1193,9 @@ static void amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p)

  {
int i;
  
-	for (i = 0; i < p->num_post_dep_syncobjs; ++i)

-   drm_syncobj_replace_fence(p->post_dep_syncobjs[i], p->fence);
+   for (i = 0; i < p->num_post_dep_syncobjs; ++i) {
+   drm_syncobj_signal_fence(p->post_dep_syncobjs[i], p->fence, 0);
+   }
  }
  
  static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 70af32d0def1..3709f36c901e 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -187,6 +187,191 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
*syncobj,
  }
  EXPORT_SYMBOL(drm_syncobj_replace_fence);
  
+static void drm_syncobj_timeline_signal_submission_fences(struct drm_syncobj *syncobj)

+{
+   struct rb_node *node = NULL;
+   struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+   spin_lock(>lock);
+   for(node = rb_first(>syncobj_timeline.wait_pt_tree);
+   node != NULL; node = rb_next(node)) {
+   wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+   if (wait_pt->value <= syncobj->syncobj_timeline.signal_point) {
+   if (wait_pt->submission_fence)
+   dma_fence_signal(_pt->submission_fence->base);
+   } else {
+   /* the loop is from left to right, the later entry value is
+* bigger, so don't need to check any more */
+   break;
+   }
+   }
+   spin_unlock(>lock);
+}
+
+static void drm_syncobj_timeline_signal_wait_pts(struct drm_syncobj *syncobj)
+{
+   struct rb_node *node = NULL;
+   struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+   spin_lock(>lock);
+   for(node = rb_first(>syncobj_timeline.wait_pt_tree);
+   node != NULL; ) {
+   wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+   node = rb_next(node);
+   if (wait_pt->value <= syncobj->syncobj_timeline.timeline) {
+   dma_fence_signal(_pt->base.base);
+   rb_erase(_pt->node,
+>syncobj_timeline.wait_pt_tree);
+   

[PATCH 2/2] [RFC]drm: add syncobj timeline support

2018-08-22 Thread Chunming Zhou
VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer payload
identifying a point in a timeline. Such timeline semaphores support the
following operations:
  * Host query - A host operation that allows querying the payload of the
timeline semaphore.
  * Host wait - A host operation that allows a blocking wait for a
timeline semaphore to reach a specified value.
  * Device wait - A device operation that allows waiting for a
timeline semaphore to reach a specified value.
  * Device signal - A device operation that allows advancing the
timeline semaphore to a specified value.

Since it's a timeline, that means the front time point(PT) always is signaled 
before the late PT.
a. signal PT design:
Signal PT fence N depends on PT[N-1] fence and signal opertion fence, when 
PT[N] fence is signaled,
the timeline will increase to value of PT[N].
b. wait PT design:
Wait PT fence is signaled by reaching timeline point value, when timeline is 
increasing, will compare
wait PTs value with new timeline value, if PT value is lower than timeline 
value, then wait PT will be
signaled, otherwise keep in list. semaphore wait operation can wait on any 
point of timeline,
so need a RB tree to order them. And wait PT could ahead of signal PT, we need 
a sumission fence to
perform that.

TODO:
CPU query and wait on timeline semaphore.

Change-Id: I9f09aae225e268442c30451badac40406f24262c
Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |   7 +-
 drivers/gpu/drm/drm_syncobj.c  | 385 -
 drivers/gpu/drm/v3d/v3d_gem.c  |   4 +-
 drivers/gpu/drm/vc4/vc4_gem.c  |   2 +-
 include/drm/drm_syncobj.h  |  45 +++-
 include/uapi/drm/drm.h |   3 +-
 6 files changed, 435 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d42d1c8f78f6..463cc8960723 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1105,7 +1105,7 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct 
amdgpu_cs_parser *p,
 {
int r;
struct dma_fence *fence;
-   r = drm_syncobj_find_fence(p->filp, handle, );
+   r = drm_syncobj_find_fence(p->filp, handle, , 0);
if (r)
return r;
 
@@ -1193,8 +1193,9 @@ static void amdgpu_cs_post_dependencies(struct 
amdgpu_cs_parser *p)
 {
int i;
 
-   for (i = 0; i < p->num_post_dep_syncobjs; ++i)
-   drm_syncobj_replace_fence(p->post_dep_syncobjs[i], p->fence);
+   for (i = 0; i < p->num_post_dep_syncobjs; ++i) {
+   drm_syncobj_signal_fence(p->post_dep_syncobjs[i], p->fence, 0);
+   }
 }
 
 static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 70af32d0def1..3709f36c901e 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -187,6 +187,191 @@ void drm_syncobj_replace_fence(struct drm_syncobj 
*syncobj,
 }
 EXPORT_SYMBOL(drm_syncobj_replace_fence);
 
+static void drm_syncobj_timeline_signal_submission_fences(struct drm_syncobj 
*syncobj)
+{
+   struct rb_node *node = NULL;
+   struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+   spin_lock(>lock);
+   for(node = rb_first(>syncobj_timeline.wait_pt_tree);
+   node != NULL; node = rb_next(node)) {
+   wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+   if (wait_pt->value <= syncobj->syncobj_timeline.signal_point) {
+   if (wait_pt->submission_fence)
+   dma_fence_signal(_pt->submission_fence->base);
+   } else {
+   /* the loop is from left to right, the later entry value is
+* bigger, so don't need to check any more */
+   break;
+   }
+   }
+   spin_unlock(>lock);
+}
+
+static void drm_syncobj_timeline_signal_wait_pts(struct drm_syncobj *syncobj)
+{
+   struct rb_node *node = NULL;
+   struct drm_syncobj_wait_pt *wait_pt = NULL;
+
+   spin_lock(>lock);
+   for(node = rb_first(>syncobj_timeline.wait_pt_tree);
+   node != NULL; ) {
+   wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+   node = rb_next(node);
+   if (wait_pt->value <= syncobj->syncobj_timeline.timeline) {
+   dma_fence_signal(_pt->base.base);
+   rb_erase(_pt->node,
+>syncobj_timeline.wait_pt_tree);
+   RB_CLEAR_NODE(_pt->node);
+   if (wait_pt->submission_fence)
+   dma_fence_put(_pt->submission_fence->base);
+   /* kfree(wait_pt) is excuted by fence put */
+