Re: [PATCH libdrm] drm/amdgpu: add new low overhead command submission API. (v2)

2017-08-11 Thread Christian König

On 10.08.2017 at 21:43, Dave Airlie wrote:

On 10 August 2017 at 23:56, Emil Velikov  wrote:

Hi Dave,

On 18 July 2017 at 04:52, Dave Airlie  wrote:


+int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
+amdgpu_context_handle context,
+amdgpu_bo_list_handle bo_list_handle,
+int num_chunks,
+struct drm_amdgpu_cs_chunk *chunks,
+uint64_t *seq_no)
+{
+   union drm_amdgpu_cs cs = {0};
+   uint64_t *chunk_array;
+   int i, r;
+   if (num_chunks == 0)
+   return -EINVAL;
+
+   chunk_array = alloca(sizeof(uint64_t) * num_chunks);

Out of curiosity:
Does malloc/free produce noticeable overhead that led you to alloca?

The preexisting code for these ioctls used alloca so I just followed
the existing pattern.

I doubt we'd notice malloc/free, but we also shouldn't be sending more
than 5-10 chunks even in the future, so stack allocation should be fine.


num_chunks is signed - should we bail on negative values, or can we make
it unsigned?

Possibly. I don't see random users of this API appearing, though, but yeah,
we could change the check to if (num_chunks <= 0).


I would rather make the variable unsigned, but yeah it's not so much of 
an issue.


Christian.



Dave.
___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx



___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH libdrm] drm/amdgpu: add new low overhead command submission API. (v2)

2017-08-10 Thread Dave Airlie
On 10 August 2017 at 23:56, Emil Velikov  wrote:
> Hi Dave,
>
> On 18 July 2017 at 04:52, Dave Airlie  wrote:
>
>> +int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
>> +amdgpu_context_handle context,
>> +amdgpu_bo_list_handle bo_list_handle,
>> +int num_chunks,
>> +struct drm_amdgpu_cs_chunk *chunks,
>> +uint64_t *seq_no)
>> +{
>> +   union drm_amdgpu_cs cs = {0};
>> +   uint64_t *chunk_array;
>> +   int i, r;
>> +   if (num_chunks == 0)
>> +   return -EINVAL;
>> +
>> +   chunk_array = alloca(sizeof(uint64_t) * num_chunks);
> Out of curiosity:
> Does malloc/free produce noticeable overhead that led you to alloca?

The preexisting code for these ioctls used alloca so I just followed
the existing pattern.

I doubt we'd notice malloc/free, but we also shouldn't be sending more
than 5-10 chunks even in the future, so stack allocation should be fine.

> num_chunks is signed - should we bail on negative values, or can we make
> it unsigned?

Possibly. I don't see random users of this API appearing, though, but yeah,
we could change the check to if (num_chunks <= 0).
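
As a standalone sketch of the change being discussed (toy_chunk and
toy_submit are illustrative names, not the real libdrm entry point),
the tightened guard plus a heap-based fallback to alloca() could look
like this:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for struct drm_amdgpu_cs_chunk, just so the sketch
 * compiles without the drm headers. */
struct toy_chunk {
	uint32_t chunk_id;
	uint32_t length_dw;
	uint64_t chunk_data;
};

/* Hypothetical tightening of the guard discussed above: reject zero and
 * negative counts in one comparison, and use malloc/free instead of
 * alloca() so an unexpectedly large count cannot blow the stack. */
int toy_submit(int num_chunks, struct toy_chunk *chunks)
{
	uint64_t *chunk_array;
	int i;

	if (num_chunks <= 0)
		return -EINVAL;

	chunk_array = malloc(sizeof(uint64_t) * num_chunks);
	if (!chunk_array)
		return -ENOMEM;

	for (i = 0; i < num_chunks; i++)
		chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];

	/* The real code would hand chunk_array to the CS ioctl here. */
	free(chunk_array);
	return 0;
}
```

Making num_chunks unsigned, as also floated in this thread, would make
the negative case unrepresentable instead of checked.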

Dave.


Re: [PATCH libdrm] drm/amdgpu: add new low overhead command submission API. (v2)

2017-08-10 Thread Emil Velikov
Hi Dave,

On 18 July 2017 at 04:52, Dave Airlie  wrote:

> +int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
> +amdgpu_context_handle context,
> +amdgpu_bo_list_handle bo_list_handle,
> +int num_chunks,
> +struct drm_amdgpu_cs_chunk *chunks,
> +uint64_t *seq_no)
> +{
> +   union drm_amdgpu_cs cs = {0};
> +   uint64_t *chunk_array;
> +   int i, r;
> +   if (num_chunks == 0)
> +   return -EINVAL;
> +
> +   chunk_array = alloca(sizeof(uint64_t) * num_chunks);
Out of curiosity:
Does malloc/free produce noticeable overhead that led you to alloca?

num_chunks is signed - should we bail on negative values, or can we make
it unsigned?

Thanks
Emil


Re: [PATCH libdrm] drm/amdgpu: add new low overhead command submission API. (v2)

2017-07-18 Thread Christian König

On 18.07.2017 at 05:52, Dave Airlie wrote:

From: Dave Airlie 

This just sends chunks to the kernel API for a single command
stream.

This should provide a more future proof and extensible API
for command submission.

v2: use amdgpu_bo_list_handle, add two helper functions to
access bo and context internals.

Signed-off-by: Dave Airlie 


Reviewed-by: Christian König 


---
  amdgpu/amdgpu.h| 30 ++
  amdgpu/amdgpu_cs.c | 47 +++
  2 files changed, 77 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 183f974..238b1aa 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1382,6 +1382,36 @@ int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
 int shared_fd,
 uint32_t *syncobj);
  
+/**
+ *  Submit raw command submission to kernel
+ *
+ * \param   dev   - \c [in] device handle
+ * \param   context - \c [in] context handle for context id
+ * \param   bo_list_handle - \c [in] request bo list handle (0 for none)
+ * \param   num_chunks - \c [in] number of CS chunks to submit
+ * \param   chunks - \c [in] array of CS chunks
+ * \param   seq_no - \c [out] output sequence number for submission.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+struct drm_amdgpu_cs_chunk;
+struct drm_amdgpu_cs_chunk_dep;
+struct drm_amdgpu_cs_chunk_data;
+
+int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
+amdgpu_context_handle context,
+amdgpu_bo_list_handle bo_list_handle,
+int num_chunks,
+struct drm_amdgpu_cs_chunk *chunks,
+uint64_t *seq_no);
+
+void amdgpu_cs_chunk_fence_to_dep(struct amdgpu_cs_fence *fence,
+ struct drm_amdgpu_cs_chunk_dep *dep);
+void amdgpu_cs_chunk_fence_info_to_data(struct amdgpu_cs_fence_info *fence_info,
+					struct drm_amdgpu_cs_chunk_data *data);
+
  #ifdef __cplusplus
  }
  #endif
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 722fd75..dfba875 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -634,3 +634,50 @@ int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
  
   	return drmSyncobjFDToHandle(dev->fd, shared_fd, handle);
   }
+
+int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
+amdgpu_context_handle context,
+amdgpu_bo_list_handle bo_list_handle,
+int num_chunks,
+struct drm_amdgpu_cs_chunk *chunks,
+uint64_t *seq_no)
+{
+   union drm_amdgpu_cs cs = {0};
+   uint64_t *chunk_array;
+   int i, r;
+   if (num_chunks == 0)
+   return -EINVAL;
+
+   chunk_array = alloca(sizeof(uint64_t) * num_chunks);
+   for (i = 0; i < num_chunks; i++)
+   chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
+   cs.in.chunks = (uint64_t)(uintptr_t)chunk_array;
+   cs.in.ctx_id = context->id;
+   cs.in.bo_list_handle = bo_list_handle ? bo_list_handle->handle : 0;
+   cs.in.num_chunks = num_chunks;
+   r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_CS,
+   &cs, sizeof(cs));
+   if (r)
+   return r;
+
+   if (seq_no)
+   *seq_no = cs.out.handle;
+   return 0;
+}
+
+void amdgpu_cs_chunk_fence_info_to_data(struct amdgpu_cs_fence_info *fence_info,
+   struct drm_amdgpu_cs_chunk_data *data)
+{
+   data->fence_data.handle = fence_info->handle->handle;
+   data->fence_data.offset = fence_info->offset * sizeof(uint64_t);
+}
+
+void amdgpu_cs_chunk_fence_to_dep(struct amdgpu_cs_fence *fence,
+ struct drm_amdgpu_cs_chunk_dep *dep)
+{
+   dep->ip_type = fence->ip_type;
+   dep->ip_instance = fence->ip_instance;
+   dep->ring = fence->ring;
+   dep->ctx_id = fence->context->id;
+   dep->handle = fence->fence;
+}
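
The pointer marshalling in amdgpu_cs_submit_raw() above can be illustrated
without a GPU: each chunk pointer is cast through uintptr_t into a uint64_t
for the ioctl, and the consumer casts it back. A minimal sketch with toy
types (toy_chunk, pack_chunks, and read_chunk_id are illustrative names,
not part of the libdrm API):

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for struct drm_amdgpu_cs_chunk (illustrative only). */
struct toy_chunk {
	uint32_t chunk_id;
	uint32_t length_dw;
	uint64_t chunk_data;
};

/* Pack chunk addresses the way amdgpu_cs_submit_raw() does: each pointer
 * goes through uintptr_t into a uint64_t, so the kernel receives a 64-bit
 * address regardless of userspace pointer width. */
void pack_chunks(struct toy_chunk *chunks, uint64_t *chunk_array, int n)
{
	int i;

	for (i = 0; i < n; i++)
		chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
}

/* Simulate the consumer side: cast the 64-bit value back to a pointer
 * and read a field from the recovered chunk. */
uint32_t read_chunk_id(uint64_t packed)
{
	return ((struct toy_chunk *)(uintptr_t)packed)->chunk_id;
}
```

The double cast avoids compiler warnings about converting a pointer to an
integer of different size on 32-bit builds, which is why the patch uses it
for both chunk_array entries and cs.in.chunks.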


