Re: [PATCH libdrm] drm/amdgpu: add new low overhead command submission API. (v2)
On 10.08.2017 at 21:43, Dave Airlie wrote:
> On 10 August 2017 at 23:56, Emil Velikov wrote:
>> Hi Dave,
>>
>> On 18 July 2017 at 04:52, Dave Airlie wrote:
>>> +int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
>>> +			 amdgpu_context_handle context,
>>> +			 amdgpu_bo_list_handle bo_list_handle,
>>> +			 int num_chunks,
>>> +			 struct drm_amdgpu_cs_chunk *chunks,
>>> +			 uint64_t *seq_no)
>>> +{
>>> +	union drm_amdgpu_cs cs = {0};
>>> +	uint64_t *chunk_array;
>>> +	int i, r;
>>> +	if (num_chunks == 0)
>>> +		return -EINVAL;
>>> +
>>> +	chunk_array = alloca(sizeof(uint64_t) * num_chunks);
>> Out of curiosity:
>> Does malloc/free produce noticeable overhead that led you to alloca?
> The preexisting code for these ioctls used alloca, so I just followed
> the existing pattern. I doubt we'd notice malloc/free, but we also
> shouldn't be sending more than 5-10 chunks even in the future, so
> stack allocation should be fine.
>
>> num_chunks is signed - should we bail on negative values, can we make
>> it unsigned?
> Possibly. I don't see random users of this API appearing, though, but
> yeah, we could change the check to <= 0.

I would rather make the variable unsigned, but yeah, it's not so much
of an issue.

Christian.

> Dave.
___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH libdrm] drm/amdgpu: add new low overhead command submission API. (v2)
On 10 August 2017 at 23:56, Emil Velikov wrote:
> Hi Dave,
>
> On 18 July 2017 at 04:52, Dave Airlie wrote:
>
>> +int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
>> +			 amdgpu_context_handle context,
>> +			 amdgpu_bo_list_handle bo_list_handle,
>> +			 int num_chunks,
>> +			 struct drm_amdgpu_cs_chunk *chunks,
>> +			 uint64_t *seq_no)
>> +{
>> +	union drm_amdgpu_cs cs = {0};
>> +	uint64_t *chunk_array;
>> +	int i, r;
>> +	if (num_chunks == 0)
>> +		return -EINVAL;
>> +
>> +	chunk_array = alloca(sizeof(uint64_t) * num_chunks);
>
> Out of curiosity:
> Does malloc/free produce noticeable overhead that led you to alloca?

The preexisting code for these ioctls used alloca, so I just followed
the existing pattern. I doubt we'd notice malloc/free, but we also
shouldn't be sending more than 5-10 chunks even in the future, so stack
allocation should be fine.

> num_chunks is signed - should we bail on negative values, can we make
> it unsigned?

Possibly. I don't see random users of this API appearing, though, but
yeah, we could change the check to <= 0.

Dave.
Re: [PATCH libdrm] drm/amdgpu: add new low overhead command submission API. (v2)
Hi Dave,

On 18 July 2017 at 04:52, Dave Airlie wrote:
> +int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
> +			 amdgpu_context_handle context,
> +			 amdgpu_bo_list_handle bo_list_handle,
> +			 int num_chunks,
> +			 struct drm_amdgpu_cs_chunk *chunks,
> +			 uint64_t *seq_no)
> +{
> +	union drm_amdgpu_cs cs = {0};
> +	uint64_t *chunk_array;
> +	int i, r;
> +	if (num_chunks == 0)
> +		return -EINVAL;
> +
> +	chunk_array = alloca(sizeof(uint64_t) * num_chunks);

Out of curiosity:
Does malloc/free produce noticeable overhead that led you to alloca?

num_chunks is signed - should we bail on negative values, can we make
it unsigned?

Thanks
Emil
Re: [PATCH libdrm] drm/amdgpu: add new low overhead command submission API. (v2)
On 18.07.2017 at 05:52, Dave Airlie wrote:

From: Dave Airlie

This just sends chunks to the kernel API for a single command stream.

This should provide a more future proof and extensible API for command
submission.

v2: use amdgpu_bo_list_handle, add two helper functions to access bo
and context internals.

Signed-off-by: Dave Airlie
Reviewed-by: Christian König
---
 amdgpu/amdgpu.h    | 30 ++++++++++++++++++++++++++++++
 amdgpu/amdgpu_cs.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 183f974..238b1aa 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1382,6 +1382,36 @@ int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
 			     int shared_fd,
 			     uint32_t *syncobj);
 
+/**
+ * Submit raw command submission to kernel
+ *
+ * \param   dev            - \c [in] device handle
+ * \param   context        - \c [in] context handle for context id
+ * \param   bo_list_handle - \c [in] request bo list handle (0 for none)
+ * \param   num_chunks     - \c [in] number of CS chunks to submit
+ * \param   chunks         - \c [in] array of CS chunks
+ * \param   seq_no         - \c [out] output sequence number for submission.
+ *
+ * \return   0 on success\n
+ *          <0 - Negative POSIX Error code
+ *
+ */
+struct drm_amdgpu_cs_chunk;
+struct drm_amdgpu_cs_chunk_dep;
+struct drm_amdgpu_cs_chunk_data;
+
+int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
+			 amdgpu_context_handle context,
+			 amdgpu_bo_list_handle bo_list_handle,
+			 int num_chunks,
+			 struct drm_amdgpu_cs_chunk *chunks,
+			 uint64_t *seq_no);
+
+void amdgpu_cs_chunk_fence_to_dep(struct amdgpu_cs_fence *fence,
+				  struct drm_amdgpu_cs_chunk_dep *dep);
+void amdgpu_cs_chunk_fence_info_to_data(struct amdgpu_cs_fence_info *fence_info,
+					struct drm_amdgpu_cs_chunk_data *data);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 722fd75..dfba875 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -634,3 +634,50 @@ int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
 
 	return drmSyncobjFDToHandle(dev->fd, shared_fd, handle);
 }
+
+int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
+			 amdgpu_context_handle context,
+			 amdgpu_bo_list_handle bo_list_handle,
+			 int num_chunks,
+			 struct drm_amdgpu_cs_chunk *chunks,
+			 uint64_t *seq_no)
+{
+	union drm_amdgpu_cs cs = {0};
+	uint64_t *chunk_array;
+	int i, r;
+	if (num_chunks == 0)
+		return -EINVAL;
+
+	chunk_array = alloca(sizeof(uint64_t) * num_chunks);
+	for (i = 0; i < num_chunks; i++)
+		chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
+	cs.in.chunks = (uint64_t)(uintptr_t)chunk_array;
+	cs.in.ctx_id = context->id;
+	cs.in.bo_list_handle = bo_list_handle ? bo_list_handle->handle : 0;
+	cs.in.num_chunks = num_chunks;
+	r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_CS,
+				&cs, sizeof(cs));
+	if (r)
+		return r;
+
+	if (seq_no)
+		*seq_no = cs.out.handle;
+	return 0;
+}
+
+void amdgpu_cs_chunk_fence_info_to_data(struct amdgpu_cs_fence_info *fence_info,
+					struct drm_amdgpu_cs_chunk_data *data)
+{
+	data->fence_data.handle = fence_info->handle->handle;
+	data->fence_data.offset = fence_info->offset * sizeof(uint64_t);
+}
+
+void amdgpu_cs_chunk_fence_to_dep(struct amdgpu_cs_fence *fence,
+				  struct drm_amdgpu_cs_chunk_dep *dep)
+{
+	dep->ip_type = fence->ip_type;
+	dep->ip_instance = fence->ip_instance;
+	dep->ring = fence->ring;
+	dep->ctx_id = fence->context->id;
+	dep->handle = fence->fence;
+}