Re: [PATCH 5/5] amdgpu: use drm sync objects for shared semaphores (v5)

2017-06-16 Thread Dave Airlie
> team which seem to make a lot of sense to upstream as well:
>
> 1. An IOCTL to reset a sync object to it's initial state. E.g. reset the
> fence the sync objects wraps back to NULL.
>
> 2. The ability to merge multiple sync objects into one. Essentially the same
> thing we have for the sync file, but handle based instead of fd.
>
> I'm going to work on that in the next week or so. Shouldn't be to much of a
> problem to come up with some sane code, but probably finding use cases for
> this in the open source userspace might be challenging.

Do we have use cases for these in the closed source stack? I'm sure if they are
useful enough, radv could use them for similiar things.

Dave.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 5/5] amdgpu: use drm sync objects for shared semaphores (v5)

2017-06-16 Thread Christian König

Am 15.06.2017 um 05:59 schrieb Dave Airlie:

On 1 June 2017 at 11:06, Dave Airlie  wrote:

From: Dave Airlie 

This creates a new command submission chunk for amdgpu
to add in and out sync objects around the submission.

Sync objects are managed via the drm syncobj ioctls.

The command submission interface is enhanced with two new
chunks, one for syncobj pre submission dependencies,
and one for post submission sync obj signalling,
and just takes a list of handles for each.

This is based on work originally done by David Zhou at AMD,
with input from Christian Konig on what things should look like.

In theory VkFences could be backed with sync objects and
just get passed into the cs as syncobj handles as well.

NOTE: this interface addition needs a version bump to expose
it to userspace.

TODO: update to dep_sync when rebasing onto amdgpu master.
(with this - r-b from Christian)

Hey Christian,

did you have a chance to re-review this, I think this
has the last changes you asked for done properly.


Sorry for the delay, holidays and dentist visits seem to eat up all my 
time at the moment.


Patches #1-#3 in this series are Acked-by: Christian König 
.


Patch #4 already has my rb.

Patch #5 in this Series is Reviewed-by: Christian König 
.



If so I'd like to get Alex to get drm-next + pull these in on top at some
point, or if I already have the dep_sync changes in my tree I can just
fix it up and apply them there.


Any approach works for me as long as Alex is fine with it.

Additional to your set I synced up two more ideas with out internal 
Vulkan team which seem to make a lot of sense to upstream as well:


1. An IOCTL to reset a sync object to it's initial state. E.g. reset the 
fence the sync objects wraps back to NULL.


2. The ability to merge multiple sync objects into one. Essentially the 
same thing we have for the sync file, but handle based instead of fd.


I'm going to work on that in the next week or so. Shouldn't be to much 
of a problem to come up with some sane code, but probably finding use 
cases for this in the open source userspace might be challenging.


Regards,
Christian.



Dave.



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 5/5] amdgpu: use drm sync objects for shared semaphores (v5)

2017-06-14 Thread Dave Airlie
On 1 June 2017 at 11:06, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This creates a new command submission chunk for amdgpu
> to add in and out sync objects around the submission.
>
> Sync objects are managed via the drm syncobj ioctls.
>
> The command submission interface is enhanced with two new
> chunks, one for syncobj pre submission dependencies,
> and one for post submission sync obj signalling,
> and just takes a list of handles for each.
>
> This is based on work originally done by David Zhou at AMD,
> with input from Christian Konig on what things should look like.
>
> In theory VkFences could be backed with sync objects and
> just get passed into the cs as syncobj handles as well.
>
> NOTE: this interface addition needs a version bump to expose
> it to userspace.
>
> TODO: update to dep_sync when rebasing onto amdgpu master.
> (with this - r-b from Christian)

Hey Christian,

did you have a chance to re-review this, I think this
has the last changes you asked for done properly.

If so I'd like to get Alex to get drm-next + pull these in on top at some
point, or if I already have the dep_sync changes in my tree I can just
fix it up and apply them there.

Dave.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 5/5] amdgpu: use drm sync objects for shared semaphores (v5)

2017-05-29 Thread Christian König

Am 29.05.2017 um 09:30 schrieb Dave Airlie:

From: Dave Airlie 

This creates a new command submission chunk for amdgpu
to add in and out sync objects around the submission.

Sync objects are managed via the drm syncobj ioctls.

The command submission interface is enhanced with two new
chunks, one for syncobj pre submission dependencies,
and one for post submission sync obj signalling,
and just takes a list of handles for each.

This is based on work originally done by David Zhou at AMD,
with input from Christian Konig on what things should look like.

In theory VkFences could be backed with sync objects and
just get passed into the cs as syncobj handles as well.

NOTE: this interface addition needs a version bump to expose
it to userspace.

TODO: update to dep_sync when rebasing onto amdgpu master.
(with this - r-b from Christian)

v1.1: keep file reference on import.
v2: move to using syncobjs
v2.1: change some APIs to just use p pointer.
v3: make more robust against CS failures, we now add the
wait sems but only remove them once the CS job has been
submitted.
v4: rewrite names of API and base on new syncobj code.
v5: move post deps earlier, rename some apis

Signed-off-by: Dave Airlie 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 87 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 +-
  include/uapi/drm/amdgpu_drm.h   |  6 +++
  3 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 30225d7..3cbd547 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -27,6 +27,7 @@
  #include 
  #include 
  #include 
+#include 
  #include "amdgpu.h"
  #include "amdgpu_trace.h"
  
@@ -226,6 +227,8 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void *data)

break;
  
  		case AMDGPU_CHUNK_ID_DEPENDENCIES:

+   case AMDGPU_CHUNK_ID_SYNCOBJ_IN:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_OUT:
break;
  
  		default:

@@ -1040,6 +1043,40 @@ static int amdgpu_cs_process_fence_dep(struct 
amdgpu_cs_parser *p,
return 0;
  }
  
+static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,

+uint32_t handle)
+{
+   int r;
+   struct dma_fence *fence;
+   r = drm_syncobj_fence_get(p->filp, handle, );
+   if (r)
+   return r;
+
+   r = amdgpu_sync_fence(p->adev, >job->sync, fence);
+   dma_fence_put(fence);
+
+   return r;
+}
+
+static int amdgpu_cs_process_syncobj_in_dep(struct amdgpu_cs_parser *p,
+   struct amdgpu_cs_chunk *chunk)
+{
+   unsigned num_deps;
+   int i, r;
+   struct drm_amdgpu_cs_chunk_sem *deps;
+
+   deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_sem);
+
+   for (i = 0; i < num_deps; ++i) {
+   r = amdgpu_syncobj_lookup_and_add_to_sync(p, deps[i].handle);
+   if (r)
+   return r;
+   }
+   return 0;
+}
+
  static int amdgpu_cs_dependencies(struct amdgpu_device *adev,
  struct amdgpu_cs_parser *p)
  {
@@ -1054,12 +1091,54 @@ static int amdgpu_cs_dependencies(struct amdgpu_device 
*adev,
r = amdgpu_cs_process_fence_dep(p, chunk);
if (r)
return r;
+   } else if (chunk->chunk_id == AMDGPU_CHUNK_ID_SYNCOBJ_IN) {
+   r = amdgpu_cs_process_syncobj_in_dep(p, chunk);
+   if (r)
+   return r;
}
}
  
  	return 0;

  }
  
+static int amdgpu_cs_process_syncobj_out_dep(struct amdgpu_cs_parser *p,

+struct amdgpu_cs_chunk *chunk)
+{
+   unsigned num_deps;
+   int i, r;
+   struct drm_amdgpu_cs_chunk_sem *deps;
+
+   deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_sem);
+
+   for (i = 0; i < num_deps; ++i) {
+   r = drm_syncobj_replace_fence(p->filp, deps[i].handle,
+ p->fence);
+   if (r)
+   return r;


Thinking more about it, here is still a problem with the handling.

Imagine you have multiple semaphores to signal and only the last handle 
is invalid.


So if you have n handles then the sync objects 1..(n-1) will get a fence 
which is never signaled because n aborts the command submission.


I think the only valid approach to handle this is to resolve the handles 
upfront during the initial parsing of the chunks.


Christian.


+   }
+   return 0;

[PATCH 5/5] amdgpu: use drm sync objects for shared semaphores (v5)

2017-05-29 Thread Dave Airlie
From: Dave Airlie 

This creates a new command submission chunk for amdgpu
to add in and out sync objects around the submission.

Sync objects are managed via the drm syncobj ioctls.

The command submission interface is enhanced with two new
chunks, one for syncobj pre submission dependencies,
and one for post submission sync obj signalling,
and just takes a list of handles for each.

This is based on work originally done by David Zhou at AMD,
with input from Christian Konig on what things should look like.

In theory VkFences could be backed with sync objects and
just get passed into the cs as syncobj handles as well.

NOTE: this interface addition needs a version bump to expose
it to userspace.

TODO: update to dep_sync when rebasing onto amdgpu master.
(with this - r-b from Christian)

v1.1: keep file reference on import.
v2: move to using syncobjs
v2.1: change some APIs to just use p pointer.
v3: make more robust against CS failures, we now add the
wait sems but only remove them once the CS job has been
submitted.
v4: rewrite names of API and base on new syncobj code.
v5: move post deps earlier, rename some apis

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 87 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 +-
 include/uapi/drm/amdgpu_drm.h   |  6 +++
 3 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 30225d7..3cbd547 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 
@@ -226,6 +227,8 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void 
*data)
break;
 
case AMDGPU_CHUNK_ID_DEPENDENCIES:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_IN:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_OUT:
break;
 
default:
@@ -1040,6 +1043,40 @@ static int amdgpu_cs_process_fence_dep(struct 
amdgpu_cs_parser *p,
return 0;
 }
 
+static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
+uint32_t handle)
+{
+   int r;
+   struct dma_fence *fence;
+   r = drm_syncobj_fence_get(p->filp, handle, );
+   if (r)
+   return r;
+
+   r = amdgpu_sync_fence(p->adev, >job->sync, fence);
+   dma_fence_put(fence);
+
+   return r;
+}
+
+static int amdgpu_cs_process_syncobj_in_dep(struct amdgpu_cs_parser *p,
+   struct amdgpu_cs_chunk *chunk)
+{
+   unsigned num_deps;
+   int i, r;
+   struct drm_amdgpu_cs_chunk_sem *deps;
+
+   deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_sem);
+
+   for (i = 0; i < num_deps; ++i) {
+   r = amdgpu_syncobj_lookup_and_add_to_sync(p, deps[i].handle);
+   if (r)
+   return r;
+   }
+   return 0;
+}
+
 static int amdgpu_cs_dependencies(struct amdgpu_device *adev,
  struct amdgpu_cs_parser *p)
 {
@@ -1054,12 +1091,54 @@ static int amdgpu_cs_dependencies(struct amdgpu_device 
*adev,
r = amdgpu_cs_process_fence_dep(p, chunk);
if (r)
return r;
+   } else if (chunk->chunk_id == AMDGPU_CHUNK_ID_SYNCOBJ_IN) {
+   r = amdgpu_cs_process_syncobj_in_dep(p, chunk);
+   if (r)
+   return r;
}
}
 
return 0;
 }
 
+static int amdgpu_cs_process_syncobj_out_dep(struct amdgpu_cs_parser *p,
+struct amdgpu_cs_chunk *chunk)
+{
+   unsigned num_deps;
+   int i, r;
+   struct drm_amdgpu_cs_chunk_sem *deps;
+
+   deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_sem);
+
+   for (i = 0; i < num_deps; ++i) {
+   r = drm_syncobj_replace_fence(p->filp, deps[i].handle,
+ p->fence);
+   if (r)
+   return r;
+   }
+   return 0;
+}
+
+static int amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p)
+{
+   int i, r;
+
+   for (i = 0; i < p->nchunks; ++i) {
+   struct amdgpu_cs_chunk *chunk;
+
+   chunk = >chunks[i];
+
+   if (chunk->chunk_id == AMDGPU_CHUNK_ID_SYNCOBJ_OUT) {
+   r = amdgpu_cs_process_syncobj_out_dep(p, chunk);
+   if (r)
+   return r;
+   }
+   }
+   

Re: [PATCH 5/5] amdgpu: use drm sync objects for shared semaphores (v5)

2017-05-24 Thread Christian König

Am 24.05.2017 um 09:25 schrieb zhoucm1:



On 2017年05月24日 15:06, Dave Airlie wrote:

From: Dave Airlie 

This creates a new command submission chunk for amdgpu
to add in and out sync objects around the submission.

Sync objects are managed via the drm syncobj ioctls.

The command submission interface is enhanced with two new
chunks, one for syncobj pre submission dependencies,
and one for post submission sync obj signalling,
and just takes a list of handles for each.

This is based on work originally done by David Zhou at AMD,
with input from Christian Konig on what things should look like.

In theory VkFences could be backed with sync objects and
just get passed into the cs as syncobj handles as well.

NOTE: this interface addition needs a version bump to expose
it to userspace.

v1.1: keep file reference on import.
v2: move to using syncobjs
v2.1: change some APIs to just use p pointer.
v3: make more robust against CS failures, we now add the
wait sems but only remove them once the CS job has been
submitted.
v4: rewrite names of API and base on new syncobj code.
v5: move post deps earlier, rename some apis

Signed-off-by: Dave Airlie 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 87 
-

  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 +-
  include/uapi/drm/amdgpu_drm.h   |  6 +++
  3 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c

index 30225d7..3cbd547 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -27,6 +27,7 @@
  #include 
  #include 
  #include 
+#include 
  #include "amdgpu.h"
  #include "amdgpu_trace.h"
  @@ -226,6 +227,8 @@ int amdgpu_cs_parser_init(struct 
amdgpu_cs_parser *p, void *data)

  break;
case AMDGPU_CHUNK_ID_DEPENDENCIES:
+case AMDGPU_CHUNK_ID_SYNCOBJ_IN:
+case AMDGPU_CHUNK_ID_SYNCOBJ_OUT:
  break;
default:
@@ -1040,6 +1043,40 @@ static int amdgpu_cs_process_fence_dep(struct 
amdgpu_cs_parser *p,

  return 0;
  }
  +static int amdgpu_syncobj_lookup_and_add_to_sync(struct 
amdgpu_cs_parser *p,

+ uint32_t handle)
+{
+int r;
+struct dma_fence *fence;
+r = drm_syncobj_fence_get(p->filp, handle, );
+if (r)
+return r;
+
+r = amdgpu_sync_fence(p->adev, >job->sync, fence);
We just fixed a sched fence issue by adding new job->dep_sync member, 
so here should use >job->dep_sync instead of >job->sync.


To be precise job->sync is now for the implicit fences from the BOs 
(e.g. buffers moves and such) and job->dep_sync is for all explicit 
fences from dependencies/semaphores.


The handling is now slightly different to fix a bug with two consecutive 
jobs scheduled to the same ring.


But apart from that the patch now looks good to me, so with that fixed 
it is Reviewed-by: Christian König .


Regards,
Christian.



Regards,
David Zhou

+dma_fence_put(fence);
+
+return r;
+}
+
+static int amdgpu_cs_process_syncobj_in_dep(struct amdgpu_cs_parser *p,
+struct amdgpu_cs_chunk *chunk)
+{
+unsigned num_deps;
+int i, r;
+struct drm_amdgpu_cs_chunk_sem *deps;
+
+deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+num_deps = chunk->length_dw * 4 /
+sizeof(struct drm_amdgpu_cs_chunk_sem);
+
+for (i = 0; i < num_deps; ++i) {
+r = amdgpu_syncobj_lookup_and_add_to_sync(p, deps[i].handle);
+if (r)
+return r;
+}
+return 0;
+}
+
  static int amdgpu_cs_dependencies(struct amdgpu_device *adev,
struct amdgpu_cs_parser *p)
  {
@@ -1054,12 +1091,54 @@ static int amdgpu_cs_dependencies(struct 
amdgpu_device *adev,

  r = amdgpu_cs_process_fence_dep(p, chunk);
  if (r)
  return r;
+} else if (chunk->chunk_id == AMDGPU_CHUNK_ID_SYNCOBJ_IN) {
+r = amdgpu_cs_process_syncobj_in_dep(p, chunk);
+if (r)
+return r;
  }
  }
return 0;
  }
  +static int amdgpu_cs_process_syncobj_out_dep(struct 
amdgpu_cs_parser *p,

+ struct amdgpu_cs_chunk *chunk)
+{
+unsigned num_deps;
+int i, r;
+struct drm_amdgpu_cs_chunk_sem *deps;
+
+deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+num_deps = chunk->length_dw * 4 /
+sizeof(struct drm_amdgpu_cs_chunk_sem);
+
+for (i = 0; i < num_deps; ++i) {
+r = drm_syncobj_replace_fence(p->filp, deps[i].handle,
+  p->fence);
+if (r)
+return r;
+}
+return 0;
+}
+
+static int amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p)
+{
+int i, r;
+
+for (i = 0; i < p->nchunks; ++i) {
+struct amdgpu_cs_chunk *chunk;
+
+chunk = >chunks[i];
+
+if (chunk->chunk_id 

Re: [PATCH 5/5] amdgpu: use drm sync objects for shared semaphores (v5)

2017-05-24 Thread zhoucm1



On 2017年05月24日 15:06, Dave Airlie wrote:

From: Dave Airlie 

This creates a new command submission chunk for amdgpu
to add in and out sync objects around the submission.

Sync objects are managed via the drm syncobj ioctls.

The command submission interface is enhanced with two new
chunks, one for syncobj pre submission dependencies,
and one for post submission sync obj signalling,
and just takes a list of handles for each.

This is based on work originally done by David Zhou at AMD,
with input from Christian Konig on what things should look like.

In theory VkFences could be backed with sync objects and
just get passed into the cs as syncobj handles as well.

NOTE: this interface addition needs a version bump to expose
it to userspace.

v1.1: keep file reference on import.
v2: move to using syncobjs
v2.1: change some APIs to just use p pointer.
v3: make more robust against CS failures, we now add the
wait sems but only remove them once the CS job has been
submitted.
v4: rewrite names of API and base on new syncobj code.
v5: move post deps earlier, rename some apis

Signed-off-by: Dave Airlie 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 87 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 +-
  include/uapi/drm/amdgpu_drm.h   |  6 +++
  3 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 30225d7..3cbd547 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -27,6 +27,7 @@
  #include 
  #include 
  #include 
+#include 
  #include "amdgpu.h"
  #include "amdgpu_trace.h"
  
@@ -226,6 +227,8 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void *data)

break;
  
  		case AMDGPU_CHUNK_ID_DEPENDENCIES:

+   case AMDGPU_CHUNK_ID_SYNCOBJ_IN:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_OUT:
break;
  
  		default:

@@ -1040,6 +1043,40 @@ static int amdgpu_cs_process_fence_dep(struct 
amdgpu_cs_parser *p,
return 0;
  }
  
+static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,

+uint32_t handle)
+{
+   int r;
+   struct dma_fence *fence;
+   r = drm_syncobj_fence_get(p->filp, handle, );
+   if (r)
+   return r;
+
+   r = amdgpu_sync_fence(p->adev, >job->sync, fence);
We just fixed a sched fence issue by adding new job->dep_sync member, so 
here should use >job->dep_sync instead of >job->sync.


Regards,
David Zhou

+   dma_fence_put(fence);
+
+   return r;
+}
+
+static int amdgpu_cs_process_syncobj_in_dep(struct amdgpu_cs_parser *p,
+   struct amdgpu_cs_chunk *chunk)
+{
+   unsigned num_deps;
+   int i, r;
+   struct drm_amdgpu_cs_chunk_sem *deps;
+
+   deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_sem);
+
+   for (i = 0; i < num_deps; ++i) {
+   r = amdgpu_syncobj_lookup_and_add_to_sync(p, deps[i].handle);
+   if (r)
+   return r;
+   }
+   return 0;
+}
+
  static int amdgpu_cs_dependencies(struct amdgpu_device *adev,
  struct amdgpu_cs_parser *p)
  {
@@ -1054,12 +1091,54 @@ static int amdgpu_cs_dependencies(struct amdgpu_device 
*adev,
r = amdgpu_cs_process_fence_dep(p, chunk);
if (r)
return r;
+   } else if (chunk->chunk_id == AMDGPU_CHUNK_ID_SYNCOBJ_IN) {
+   r = amdgpu_cs_process_syncobj_in_dep(p, chunk);
+   if (r)
+   return r;
}
}
  
  	return 0;

  }
  
+static int amdgpu_cs_process_syncobj_out_dep(struct amdgpu_cs_parser *p,

+struct amdgpu_cs_chunk *chunk)
+{
+   unsigned num_deps;
+   int i, r;
+   struct drm_amdgpu_cs_chunk_sem *deps;
+
+   deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_sem);
+
+   for (i = 0; i < num_deps; ++i) {
+   r = drm_syncobj_replace_fence(p->filp, deps[i].handle,
+ p->fence);
+   if (r)
+   return r;
+   }
+   return 0;
+}
+
+static int amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p)
+{
+   int i, r;
+
+   for (i = 0; i < p->nchunks; ++i) {
+   struct amdgpu_cs_chunk *chunk;
+
+   chunk = >chunks[i];
+
+   if (chunk->chunk_id == AMDGPU_CHUNK_ID_SYNCOBJ_OUT) {
+   r = amdgpu_cs_process_syncobj_out_dep(p, chunk);
+   

[PATCH 5/5] amdgpu: use drm sync objects for shared semaphores (v5)

2017-05-24 Thread Dave Airlie
From: Dave Airlie 

This creates a new command submission chunk for amdgpu
to add in and out sync objects around the submission.

Sync objects are managed via the drm syncobj ioctls.

The command submission interface is enhanced with two new
chunks, one for syncobj pre submission dependencies,
and one for post submission sync obj signalling,
and just takes a list of handles for each.

This is based on work originally done by David Zhou at AMD,
with input from Christian Konig on what things should look like.

In theory VkFences could be backed with sync objects and
just get passed into the cs as syncobj handles as well.

NOTE: this interface addition needs a version bump to expose
it to userspace.

v1.1: keep file reference on import.
v2: move to using syncobjs
v2.1: change some APIs to just use p pointer.
v3: make more robust against CS failures, we now add the
wait sems but only remove them once the CS job has been
submitted.
v4: rewrite names of API and base on new syncobj code.
v5: move post deps earlier, rename some apis

Signed-off-by: Dave Airlie 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 87 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 +-
 include/uapi/drm/amdgpu_drm.h   |  6 +++
 3 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 30225d7..3cbd547 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 
@@ -226,6 +227,8 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void 
*data)
break;
 
case AMDGPU_CHUNK_ID_DEPENDENCIES:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_IN:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_OUT:
break;
 
default:
@@ -1040,6 +1043,40 @@ static int amdgpu_cs_process_fence_dep(struct 
amdgpu_cs_parser *p,
return 0;
 }
 
+static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
+uint32_t handle)
+{
+   int r;
+   struct dma_fence *fence;
+   r = drm_syncobj_fence_get(p->filp, handle, );
+   if (r)
+   return r;
+
+   r = amdgpu_sync_fence(p->adev, >job->sync, fence);
+   dma_fence_put(fence);
+
+   return r;
+}
+
+static int amdgpu_cs_process_syncobj_in_dep(struct amdgpu_cs_parser *p,
+   struct amdgpu_cs_chunk *chunk)
+{
+   unsigned num_deps;
+   int i, r;
+   struct drm_amdgpu_cs_chunk_sem *deps;
+
+   deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_sem);
+
+   for (i = 0; i < num_deps; ++i) {
+   r = amdgpu_syncobj_lookup_and_add_to_sync(p, deps[i].handle);
+   if (r)
+   return r;
+   }
+   return 0;
+}
+
 static int amdgpu_cs_dependencies(struct amdgpu_device *adev,
  struct amdgpu_cs_parser *p)
 {
@@ -1054,12 +1091,54 @@ static int amdgpu_cs_dependencies(struct amdgpu_device 
*adev,
r = amdgpu_cs_process_fence_dep(p, chunk);
if (r)
return r;
+   } else if (chunk->chunk_id == AMDGPU_CHUNK_ID_SYNCOBJ_IN) {
+   r = amdgpu_cs_process_syncobj_in_dep(p, chunk);
+   if (r)
+   return r;
}
}
 
return 0;
 }
 
+static int amdgpu_cs_process_syncobj_out_dep(struct amdgpu_cs_parser *p,
+struct amdgpu_cs_chunk *chunk)
+{
+   unsigned num_deps;
+   int i, r;
+   struct drm_amdgpu_cs_chunk_sem *deps;
+
+   deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_sem);
+
+   for (i = 0; i < num_deps; ++i) {
+   r = drm_syncobj_replace_fence(p->filp, deps[i].handle,
+ p->fence);
+   if (r)
+   return r;
+   }
+   return 0;
+}
+
+static int amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p)
+{
+   int i, r;
+
+   for (i = 0; i < p->nchunks; ++i) {
+   struct amdgpu_cs_chunk *chunk;
+
+   chunk = >chunks[i];
+
+   if (chunk->chunk_id == AMDGPU_CHUNK_ID_SYNCOBJ_OUT) {
+   r = amdgpu_cs_process_syncobj_out_dep(p, chunk);
+   if (r)
+   return r;
+   }
+   }
+   return 0;
+}
+
 static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,