Re: [PATCH] drm/ttm: use the parent resv for ghost objects v2

2019-10-24 Thread Zhou, David(ChunMing)

On 2019/10/24 6:25 PM, Christian König wrote:
> Ping?
>
> On 18.10.19 at 13:58, Christian König wrote:
>> This way the TTM is destroyed with the correct dma_resv object
>> locked and we can even pipeline imported BO evictions.
>>
>> v2: Limit this to only cases when the parent object uses a separate
>>  reservation object as well. This fixes another OOM problem.
>>
>> Signed-off-by: Christian König 
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo_util.c | 16 +++++++++-------
>>   1 file changed, 9 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
>> b/drivers/gpu/drm/ttm/ttm_bo_util.c
>> index e030c27f53cf..45e440f80b7b 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo_util.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
>> @@ -512,7 +512,9 @@ static int ttm_buffer_object_transfer(struct 
>> ttm_buffer_object *bo,
>>   kref_init(&fbo->base.kref);
>>   fbo->base.destroy = &ttm_transfered_destroy;
>>   fbo->base.acc_size = 0;
>> -    fbo->base.base.resv = &fbo->base.base._resv;
>> +    if (bo->base.resv == &bo->base._resv)
>> +    fbo->base.base.resv = &fbo->base.base._resv;
>> +
>>   dma_resv_init(fbo->base.base.resv);

Doesn't this lead to an issue if you force the parent resv to be initialized?
And how do we deal with the case where parent->resv is currently locked?
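To illustrate my concern as comments on the hunk above (just my reading of
the code, not a tested change):

    /* ttm_buffer_object_transfer() starts with fbo->base = *bo, so when the
     * new condition is false, fbo->base.base.resv still points at the
     * parent's reservation object.  dma_resv_init() below would then
     * re-initialize the parent's resv, which the caller may be holding
     * locked at this point. */
    if (bo->base.resv == &bo->base._resv)
        fbo->base.base.resv = &fbo->base.base._resv;

    dma_resv_init(fbo->base.base.resv);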


>>   ret = dma_resv_trylock(fbo->base.base.resv);
>>   WARN_ON(!ret);
>> @@ -711,7 +713,7 @@ int ttm_bo_move_accel_cleanup(struct 
>> ttm_buffer_object *bo,
>>   if (ret)
>>   return ret;
>>   -    dma_resv_add_excl_fence(ghost_obj->base.resv, fence);
>> +    dma_resv_add_excl_fence(&ghost_obj->base._resv, fence);
>>     /**
>>    * If we're not moving to fixed memory, the TTM object
>> @@ -724,7 +726,7 @@ int ttm_bo_move_accel_cleanup(struct 
>> ttm_buffer_object *bo,
>>   else
>>   bo->ttm = NULL;
>>   -    ttm_bo_unreserve(ghost_obj);
>> +    dma_resv_unlock(&ghost_obj->base._resv);

fbo->base.base.resv?

-David

>>   ttm_bo_put(ghost_obj);
>>   }
>>   @@ -767,7 +769,7 @@ int ttm_bo_pipeline_move(struct 
>> ttm_buffer_object *bo,
>>   if (ret)
>>   return ret;
>>   -    dma_resv_add_excl_fence(ghost_obj->base.resv, fence);
>> +    dma_resv_add_excl_fence(&ghost_obj->base._resv, fence);
>>     /**
>>    * If we're not moving to fixed memory, the TTM object
>> @@ -780,7 +782,7 @@ int ttm_bo_pipeline_move(struct ttm_buffer_object 
>> *bo,
>>   else
>>   bo->ttm = NULL;
>>   -    ttm_bo_unreserve(ghost_obj);
>> +    dma_resv_unlock(&ghost_obj->base._resv);
>>   ttm_bo_put(ghost_obj);
>>     } else if (from->flags & TTM_MEMTYPE_FLAG_FIXED) {
>> @@ -836,7 +838,7 @@ int ttm_bo_pipeline_gutting(struct 
>> ttm_buffer_object *bo)
>>   if (ret)
>>   return ret;
>>   -    ret = dma_resv_copy_fences(ghost->base.resv, bo->base.resv);
>> +    ret = dma_resv_copy_fences(&ghost->base._resv, bo->base.resv);
>>   /* Last resort, wait for the BO to be idle when we are OOM */
>>   if (ret)
>>   ttm_bo_wait(bo, false, false);
>> @@ -845,7 +847,7 @@ int ttm_bo_pipeline_gutting(struct 
>> ttm_buffer_object *bo)
>>   bo->mem.mem_type = TTM_PL_SYSTEM;
>>   bo->ttm = NULL;
>>   -    ttm_bo_unreserve(ghost);
>> +    dma_resv_unlock(&ghost->base._resv);
>>   ttm_bo_put(ghost);
>>     return 0;
>

Re: [PATCH 1/1] drm/syncobj: add sideband payload

2019-09-17 Thread Zhou, David(ChunMing)
Hi Lionel,
The update looks good to me.
I tried your signal-order test; it doesn't seem ready to run yet, so I'm not
sure whether I can reproduce this issue.

-David

From: Lionel Landwerlin 
Sent: Tuesday, September 17, 2019 7:03 PM
To: dri-devel@lists.freedesktop.org 
Cc: Lionel Landwerlin ; Zhou, David(ChunMing) 
; Koenig, Christian ; Jason 
Ekstrand 
Subject: [PATCH 1/1] drm/syncobj: add sideband payload

The Vulkan timeline semaphores allow signaling to happen on a point
of the timeline without all of its dependencies having been created.

The current 2 implementations (AMD/Intel) of the Vulkan spec on top of
the Linux kernel are using a thread to wait on the dependencies of a
given point to materialize and delay actual submission to the kernel
driver until the wait completes.

If a binary semaphore is submitted for signaling alongside a timeline
semaphore wait, the drm syncobj associated with that binary semaphore
will not have a DMA fence associated with it by the time
vkQueueSubmit() returns. This, and the fact that a binary semaphore
can be signaled and unsignaled before its DMA fences materialize,
means that we cannot rely only on the fence within the syncobj; we
also need a sideband payload verifying that the fence in the syncobj
matches the last submission from the Vulkan API point of view.

This change adds a sideband payload that is incremented for each
signaled syncobj when vkQueueSubmit() is called. The next
vkQueueSubmit() waiting on the syncobj will read the sideband payload
and wait for a fence chain element with a seqno greater than or equal
to the sideband payload value to be added into the fence chain, and
use that fence to trigger the submission on the kernel driver.

v2: Use a separate ioctl to get/set the sideband value (Christian)

v3: Use 2 ioctls for get/set (Christian)

v4: Use a single new ioctl

v5: a bunch of blatant mistakes
Store payload atomically (Chris)

v6: Only touch atomic value once (Jason)

v7: Updated atomic value when importing sync file

Signed-off-by: Lionel Landwerlin 
Reviewed-by: David Zhou  (v6)
Cc: Christian Koenig 
Cc: Jason Ekstrand 
Cc: David(ChunMing) Zhou 
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c|  3 ++
 drivers/gpu/drm/drm_syncobj.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 include/drm/drm_syncobj.h  |  9 +
 include/uapi/drm/drm.h | 17 +
 5 files changed, 93 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 51a2055c8f18..e297dfd85019 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -208,6 +208,8 @@ int drm_syncobj_timeline_signal_ioctl(struct drm_device 
*dev, void *data,
   struct drm_file *file_private);
 int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data,
+struct drm_file *file_private);

 /* drm_framebuffer.c */
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index f675a3bb2c88..644d0bc800a4 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -703,6 +703,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
   DRM_RENDER_ALLOW),
 DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
   DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY, drm_syncobj_binary_ioctl,
+ DRM_RENDER_ALLOW),
+
 DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, 
drm_crtc_get_sequence_ioctl, 0),
 DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, 
drm_crtc_queue_sequence_ioctl, 0),
 DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, 
drm_mode_create_lease_ioctl, DRM_MASTER),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 4b5c7b0ed714..2de8f1380890 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -418,8 +418,10 @@ int drm_syncobj_create(struct drm_syncobj **out_syncobj, 
uint32_t flags,
 if (flags & DRM_SYNCOBJ_CREATE_SIGNALED)
 drm_syncobj_assign_null_handle(syncobj);

-   if (fence)
+   if (fence) {
 drm_syncobj_replace_fence(syncobj, fence);
+   atomic64_set(&syncobj->binary_payload, fence->seqno);
+   }

 *out_syncobj = syncobj;
 return 0;
@@ -604,6 +606,7 @@ static int drm_syncobj_import_sync_file_fence(struct 
drm_file *file_private,
 }

 drm_syncobj_replace_fence(syncobj, fence);
+   atomic64_set(&syncobj->binary_payload, fence->seqno);
 dma_fence_put(fence);
 drm_syncobj_put(syncobj);
 return 0;
@@ -1224,8 +1227,10 @@ drm_syncobj_reset_ioctl(s
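The rest of the patch is truncated in the archive. For orientation, a rough
sketch of the shape of the new ioctl, pieced together from the visible hunks;
the struct layout and field names here are assumptions, not the exact uapi
from this patch:

    /* sketch only -- field names are assumptions */
    struct drm_syncobj_binary_array {
            __u64 handles;        /* pointer to u32 syncobj handles */
            __u64 values;         /* pointer to u64 sideband payloads */
            __u32 count_handles;
            __u32 access_flags;   /* whether to read or write the payloads */
    };

    /* drm_syncobj_binary_ioctl() would then, for each handle, either copy
     * atomic64_read(&syncobj->binary_payload) out to userspace, or take the
     * value from userspace and atomic64_set(&syncobj->binary_payload, v). */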

Re: [PATCH v3 1/1] drm/syncobj: add sideband payload

2019-08-08 Thread Zhou, David(ChunMing)
Thank you, I get what you mean now.
When you have a sideband payload, you go into the timeline path. Clever!

-David

 Original Message 
Subject: Re: [PATCH v3 1/1] drm/syncobj: add sideband payload
From: Lionel Landwerlin
To: "Zhou, David(ChunMing)" ,dri-de...@freedesktop.org
CC: "Koenig, Christian" ,Jason Ekstrand ,"Zhou, David(ChunMing)"

On 08/08/2019 17:48, Chunming Zhou wrote:
> On 2019/8/8 22:34, Lionel Landwerlin wrote:
>> On 08/08/2019 17:16, Chunming Zhou wrote:
>>> On 2019/8/8 22:01, Lionel Landwerlin wrote:
>>>> On 08/08/2019 16:42, Chunming Zhou wrote:
>>>>> No, I just saw your comment "The next vkQueueSubmit()
>>>>> waiting on the syncobj will read the sideband payload and wait for a
>>>>> fence chain element with a seqno greater than or equal to the sideband
>>>>> payload value to be added into the fence chain and use that fence to
>>>>> trigger the submission on the kernel driver.".
>>>> That was meant to say wait on the chain fence to reach the sideband
>>>> payload value.
>>>>
>>>> It's a bit confusing, but I can't see any other way to word it.
>>>>
>>>>
>>>>> In that, you mentioned waiting on the sideband.
>>>>> So I want to know how we use that; maybe I missed something in the
>>>>> previous discussion thread.
>>>> In QueueSubmit(), we start by reading the sideband payloads :
>>>> https://gitlab.freedesktop.org/llandwerlin/mesa/blob/review/anv-timeline_semaphore_prep/src/intel/vulkan/anv_queue.c#L655
>>>>
>>>>
>>>> Then build everything for the submission and hand it over to the
>>>> submission thread.
>>>>
>>>> Instead of just waiting on the timeline semaphore values, we also
>>>> wait on the binary semaphore sideband payload values.
>>> Waiting on timeline values happens when finding the fence in the kernel.
>>
>> Hmm aren't you waiting in a thread in userspace?
> Yes. For the timeline case, we can use waitForSyncobj()... At the beginning
> of the QueueThread, I let it wait in cs_ioctl when drm_syncobj_find_fence()
> runs.
>
> But I still didn't get how to wait on the sideband for a binary syncobj.
>
> Ah, I see: you will compare it in your QueueThread; if the sideband value is
> >= expected, then do the submission, otherwise loop the QueueThread, right?


Just treat binary semaphores as timelines and waitForSyncobj on the
sideband payload value.

It shouldn't make the submission thread any busier than it currently is.
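For example, the submission thread's wait could reuse the existing timeline
wait with the WAIT_AVAILABLE flag (a sketch using the libdrm wrapper; error
handling omitted):

    uint64_t point = sideband_payload;  /* read earlier in vkQueueSubmit() */
    drmSyncobjTimelineWait(fd, &handle, &point, 1, INT64_MAX,
                           DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE, NULL);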


-Lionel


>
> That sounds like the QueueThread will always be busy.
>
> -David
>
>
>>
>>> But I don't see how to wait/block in the kernel when finding the fence for
>>> binary sideband payload values.
>>
>> Essentially our driver now handles timelines & binary semaphores using
>> dma-fence-chain in both cases.
>
>
>> Only with timelines do we take the values submitted by the Vulkan
>> application.
>
>> The binary semaphore auto-increments on vkQueueSubmit(), and that is
>> managed by the userspace driver.
>>
>>
>> -Lionel
>>
>>
>>>
>>> -David
>>>
>>>> Finally, before exiting the QueueSubmit() call, we bump the sideband
>>>> payloads of all the binary semaphores that have been signaled :
>>>> https://gitlab.freedesktop.org/llandwerlin/mesa/blob/review/anv-timeline_semaphore_prep/src/intel/vulkan/anv_queue.c#L806
>>>>
>>>>
>>>>
>>>> Whoever calls QueueSubmit() after that will pickup the new sideband
>>>> payload values to wait on.
>>>>
>>>>
>>>> -Lionel
>>>>
>>>>
>>>>
>>>>> -David
>>>>>
>>>>>
>>>>> On 2019/8/8 21:38, Lionel Landwerlin wrote:
>>>>>> Interesting question :)
>>>>>>
>>>>>> I didn't see any use case for that.
>>>>>> This sideband payload value is used for a wait, so waiting on the wait
>>>>>> value for another wait is a bit meta :)
>>>>>>
>>>>>> Do you have a use case for that?
>>>>>>
>>>>>> -Lionel
>>>>>>
>>>>>> On 08/08/2019 16:23, Chunming Zhou wrote:
>>>>>>> The proposal is fine with me.
>>>>>>>
>>>>>>> one question:
>>>>>>>
>>>>>>> How do we wait on the sideband payload? Will a following patch show that?
>>>>>>>
>>>>>>> -David
>>>

RE: Threaded submission & semaphore sharing

2019-08-01 Thread Zhou, David(ChunMing)
Hi Lionel,

The Queue thread is a heavy thread that stays resident in the driver for as
long as the application runs, and our guys don't like that. So we switched to
a Semaphore Thread: only when a waitBeforeSignal on a timeline happens do we
spawn a thread to handle that wait. So we don't have this issue.
By the way, I already pass all your CTS cases now. I suggest you switch to a
Semaphore Thread instead of a Queue Thread as well. It works very well.
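In other words, roughly (pseudo-C, all names illustrative):

    /* in vkQueueSubmit(), sketch of the Semaphore Thread approach */
    if (submit_has_wait_before_signal(submit)) {
            /* rare case: spawn a short-lived thread just for this wait */
            spawn_semaphore_thread(submit);
    } else {
            /* common case: submit directly, no resident queue thread */
            submit_to_kernel(submit);
    }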

-David

-Original Message-
From: Lionel Landwerlin  
Sent: Friday, August 2, 2019 4:52 AM
To: dri-devel ; Koenig, Christian 
; Zhou, David(ChunMing) ; Jason 
Ekstrand 
Subject: Threaded submission & semaphore sharing

Hi Christian, David,

Sorry to report this so late in the process, but I think we found an issue not 
directly related to syncobj timelines themselves but with a side effect of the 
threaded submissions.

Essentially we're failing a test in crucible : 
func.sync.semaphore-fd.opaque-fd
This test creates a single binary semaphore and shares it between 2
VkDevice/VkQueues.
Then in a loop it proceeds to submit workloads, alternating between the 2
VkQueues, with one submit depending on the other.
It does so by waiting on the VkSemaphore signaled in the previous iteration and 
resignaling it.

The problem for us is that once things are dispatched to the submission
thread, the ordering of the submissions is lost, because we have 2 devices and
each has its own submission thread.

Jason suggested that we reestablish the ordering by having semaphores/syncobjs 
carry an additional uint64_t payload.
This 64-bit integer would be an identifier that the submission threads
will WAIT_FOR_AVAILABLE on.

The scenario would look like this :
     - vkQueueSubmit(queueA, signal on semA);
         - in the caller thread, this would increment the syncobj additional 
u64 payload and return it to userspace.
         - at some point the submission thread of queueA submits the workload 
and signal the syncobj of semA with value returned in the caller thread of 
vkQueueSubmit().
     - vkQueueSubmit(queueB, wait on semA);
         - in the caller thread, this would read the syncobj additional
u64 payload
         - at some point the submission thread of queueB will try to submit the 
work, but first it will WAIT_FOR_AVAILABLE the u64 value returned in the step 
above

Because we want the binary semaphores to be shared across processes and would 
like this to remain a single FD, the simplest location to store this additional 
u64 payload would be the DRM syncobj.
It would need an additional ioctl to read & increment the value.
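To sketch the proposal end-to-end (pseudo-C; the syncobj_binary_*() helpers
are hypothetical wrappers around the new ioctl, not existing API):

    /* vkQueueSubmit(queueA, signal semA), caller thread: */
    uint64_t signal_value = syncobj_binary_increment(semA);   /* hypothetical */

    /* queueA's submission thread, later: */
    run_workload_a();
    signal_syncobj_point(semA, signal_value);

    /* vkQueueSubmit(queueB, wait on semA), caller thread: */
    uint64_t wait_value = syncobj_binary_read(semA);          /* hypothetical */

    /* queueB's submission thread: */
    wait_for_available(semA, wait_value);   /* WAIT_FOR_AVAILABLE semantics */
    run_workload_b();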

What do you think?

-Lionel

Recall: [PATCH libdrm] libdrm: wrap new flexible syncobj query interface v2

2019-07-25 Thread Zhou, David(ChunMing)
Zhou, David(ChunMing) would like to recall the message, "[PATCH libdrm] libdrm: 
wrap new flexible syncobj query interface v2".

RE: [PATCH 1/2] update drm.h

2019-06-06 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Michel Dänzer 
> Sent: Thursday, June 06, 2019 10:09 PM
> To: Zhou, David(ChunMing) ; Koenig, Christian
> ; Zhou, David(ChunMing)
> 
> Cc: dri-devel@lists.freedesktop.org
> Subject: Re: [PATCH 1/2] update drm.h
> 
> On 2019-06-06 12:31 p.m., Michel Dänzer wrote:
> > On 2019-06-06 12:26 p.m., zhoucm1 wrote:
> >> https://gitlab.freedesktop.org/mesa/drm, where is the merge request
> >> button?
> >
> > If you push to a branch in your personal repository, the output of git
> > push contains a URL for creating a merge request.
> 
> Daniel Vetter pointed out on IRC that currently merge requests aren't
> enabled yet for libdrm, which probably explains why you couldn't find a way
> to create one. Sorry, I didn't realize this.
> 
> Meanwhile, can you just push to a branch in your personal repo, and make
> sure the CI pipeline passes?

Sorry, which personal repo? My problem connecting to gitlab.freedesktop hasn't
been solved by our IT yet; I keep pushing them, but no response so far.

-David

> 
> 
> --
> Earthling Michel Dänzer   |  https://www.amd.com
> Libre software enthusiast | Mesa and X developer

RE: [PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3

2019-05-29 Thread Zhou, David(ChunMing)
Patch #1,#5,#6,#8,#9,#10 are Reviewed-by: Chunming Zhou 
Patch #2,#3,#4 are Acked-by: Chunming Zhou 

-David

> -Original Message-
> From: dri-devel  On Behalf Of
> Christian König
> Sent: Wednesday, May 29, 2019 8:27 PM
> To: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org
> Subject: [PATCH 10/10] drm/amdgpu: stop removing BOs from the LRU v3
> 
> This avoids OOM situations when we have lots of threads submitting at the
> same time.
> 
> v3: apply this to the whole driver, not just CS
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c| 2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 4 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 2 +-
>  4 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 20f2955d2a55..3e2da24cd17a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct
> amdgpu_cs_parser *p,
>   }
> 
>   r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
> -&duplicates, true);
> +&duplicates, false);
>   if (unlikely(r != 0)) {
>   if (r != -ERESTARTSYS)
>   DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> index 06f83cac0d3a..f660628e6af9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
> @@ -79,7 +79,7 @@ int amdgpu_map_static_csa(struct amdgpu_device
> *adev, struct amdgpu_vm *vm,
>   list_add(&csa_tv.head, &list);
>   amdgpu_vm_get_pd_bo(vm, &list, &pd);
> 
> - r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL, true);
> + r = ttm_eu_reserve_buffers(&ticket, &list, true, NULL, false);
>   if (r) {
>   DRM_ERROR("failed to reserve CSA,PD BOs: err=%d\n", r);
>   return r;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index d513a5ad03dd..ed25a4e14404 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -171,7 +171,7 @@ void amdgpu_gem_object_close(struct
> drm_gem_object *obj,
> 
>   amdgpu_vm_get_pd_bo(vm, &list, &vm_pd);
> 
> - r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates, true);
> + r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates, false);
>   if (r) {
>   dev_err(adev->dev, "leaking bo va because "
>   "we fail to reserve bo (%d)\n", r);
> @@ -608,7 +608,7 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev,
> void *data,
> 
>   amdgpu_vm_get_pd_bo(&fpriv->vm, &list, &vm_pd);
> 
> - r = ttm_eu_reserve_buffers(&ticket, &list, true, &duplicates, true);
> + r = ttm_eu_reserve_buffers(&ticket, &list, true, &duplicates, false);
>   if (r)
>   goto error_unref;
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> index c430e8259038..d60593cc436e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> @@ -155,7 +155,7 @@ static inline int amdgpu_bo_reserve(struct
> amdgpu_bo *bo, bool no_intr)
>   struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
>   int r;
> 
> - r = ttm_bo_reserve(&bo->tbo, !no_intr, false, NULL);
> + r = __ttm_bo_reserve(&bo->tbo, !no_intr, false, NULL);
>   if (unlikely(r != 0)) {
>   if (r != -ERESTARTSYS)
>   dev_err(adev->dev, "%p reserve failed\n", bo);
> --
> 2.17.1
> 

Re:[PATCH 1/2] drm/syncobj: add an output syncobj parameter to find_fence

2019-05-23 Thread Zhou, David(ChunMing)
Can you make the parameter optional? Otherwise this looks good to me.
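For instance, the lookup could hand out the syncobj reference only when asked
for it (a sketch of the suggestion, not the final code):

    /* inside drm_syncobj_find_fence(), sketch */
    struct drm_syncobj *obj = drm_syncobj_find(file_private, handle);

    if (!obj)
            return -ENOENT;
    ...
    if (syncobj)
            *syncobj = obj;        /* caller now owns the reference */
    else
            drm_syncobj_put(obj);  /* keep the old single-lookup behavior */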

-David

 Original Message 
Subject: [PATCH 1/2] drm/syncobj: add an output syncobj parameter to find_fence
From: Lionel Landwerlin
To: intel-...@lists.freedesktop.org
CC: Lionel Landwerlin ,"Koenig, Christian" ,"Zhou, David(ChunMing)" ,Eric 
Anholt ,DRI-Devel


We would like to get both the fence & the syncobj in i915 rather than
doing 2 calls to drm_syncobj_find() & drm_syncobj_find_fence().

Signed-off-by: Lionel Landwerlin 
Cc: Christian Koenig 
Cc: David(ChunMing) Zhou 
Cc: Eric Anholt 
CC: DRI-Devel 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  4 ++-
 drivers/gpu/drm/drm_syncobj.c  | 45 +-
 drivers/gpu/drm/v3d/v3d_gem.c  |  5 ++-
 include/drm/drm_syncobj.h  |  1 +
 4 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 2f6239b6be6f..09fde3c73a2c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1124,10 +1124,11 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct 
amdgpu_cs_parser *p,
 uint32_t handle, u64 point,
 u64 flags)
 {
+   struct drm_syncobj *syncobj;
struct dma_fence *fence;
int r;

-   r = drm_syncobj_find_fence(p->filp, handle, point, flags, &fence);
+   r = drm_syncobj_find_fence(p->filp, handle, point, flags, &syncobj,
+  &fence);
if (r) {
DRM_ERROR("syncobj %u failed to find fence @ %llu (%d)!\n",
  handle, point, r);
@@ -1136,6 +1137,7 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct 
amdgpu_cs_parser *p,

r = amdgpu_sync_fence(p->adev, &p->job->sync, fence, true);
dma_fence_put(fence);
+   drm_syncobj_put(syncobj);

return r;
 }
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 3d400905100b..f2fd0c1fb1d3 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -222,29 +222,32 @@ static void drm_syncobj_assign_null_handle(struct 
drm_syncobj *syncobj)
  * @handle: sync object handle to lookup.
  * @point: timeline point
  * @flags: DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT or not
+ * @syncobj: out parameter for the syncobj
  * @fence: out parameter for the fence
  *
  * This is just a convenience function that combines drm_syncobj_find() and
  * drm_syncobj_fence_get().
  *
- * Returns 0 on success or a negative error value on failure. On success @fence
- * contains a reference to the fence, which must be released by calling
- * dma_fence_put().
+ * Returns 0 on success or a negative error value on failure. On
+ * success @syncobj and @fence contains a reference respectively to
+ * the syncobj and to the fence, which must be released by calling
+ * respectively drm_syncobj_put() and dma_fence_put().
  */
 int drm_syncobj_find_fence(struct drm_file *file_private,
   u32 handle, u64 point, u64 flags,
+  struct drm_syncobj **syncobj,
   struct dma_fence **fence)
 {
-   struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
struct syncobj_wait_entry wait;
u64 timeout = nsecs_to_jiffies64(DRM_SYNCOBJ_WAIT_FOR_SUBMIT_TIMEOUT);
int ret;

-   if (!syncobj)
+   *syncobj = drm_syncobj_find(file_private, handle);
+
+   if (!(*syncobj))
return -ENOENT;

-   *fence = drm_syncobj_fence_get(syncobj);
-   drm_syncobj_put(syncobj);
+   *fence = drm_syncobj_fence_get(*syncobj);

if (*fence) {
ret = dma_fence_chain_find_seqno(fence, point);
@@ -255,13 +258,15 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
ret = -EINVAL;
}

-   if (!(flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
+   if (!(flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT)) {
+   drm_syncobj_put(*syncobj);
return ret;
+   }

memset(&wait, 0, sizeof(wait));
wait.task = current;
wait.point = point;
-   drm_syncobj_fence_add_wait(syncobj, &wait);
+   drm_syncobj_fence_add_wait(*syncobj, &wait);

do {
set_current_state(TASK_INTERRUPTIBLE);
@@ -286,7 +291,10 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
*fence = wait.fence;

if (wait.node.next)
-   drm_syncobj_remove_wait(syncobj, &wait);
+   drm_syncobj_remove_wait(*syncobj, &wait);
+
+   if (ret)
+   drm_syncobj_put(*syncobj);

return ret;
 }
@@ -531,6 +539,7 @@ static int drm_syncobj_export_sync_file(struct drm_file 
*file_private,
int handle, int *p_fd)
 {
int ret;
+  

RE: [PATCH 01/11] drm/ttm: Make LRU removal optional.

2019-05-17 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Christian König 
> Sent: Tuesday, May 14, 2019 8:31 PM
> To: Olsak, Marek ; Zhou, David(ChunMing)
> ; Liang, Prike ; dri-
> de...@lists.freedesktop.org; amd-...@lists.freedesktop.org
> Subject: [PATCH 01/11] drm/ttm: Make LRU removal optional.
> 
> 
> We are already doing this for DMA-buf imports and also for amdgpu VM BOs
> for quite a while now.
> 
> If this doesn't run into any problems we are probably going to stop removing
> BOs from the LRU altogether.
> 
> Signed-off-by: Christian König 
> ---
[snip]
> diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> index 0075eb9a0b52..957ec375a4ba 100644
> --- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> +++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c
> @@ -69,7 +69,8 @@ void ttm_eu_backoff_reservation(struct
> ww_acquire_ctx *ticket,
> list_for_each_entry(entry, list, head) {
> struct ttm_buffer_object *bo = entry->bo;
> 
> -   ttm_bo_add_to_lru(bo);
> +   if (list_empty(&bo->lru))
> +   ttm_bo_add_to_lru(bo);
> reservation_object_unlock(bo->resv);
> }
> spin_unlock(>lru_lock);
> @@ -93,7 +94,7 @@ EXPORT_SYMBOL(ttm_eu_backoff_reservation);
> 
>  int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
>struct list_head *list, bool intr,
> -  struct list_head *dups)
> +  struct list_head *dups, bool del_lru)
>  {
> struct ttm_bo_global *glob;
> struct ttm_validate_buffer *entry; @@ -172,11 +173,11 @@ int
> ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
> list_add(>head, list);
> }
> 
> -   if (ticket)
> -   ww_acquire_done(ticket);
> -   spin_lock(&glob->lru_lock);
> -   ttm_eu_del_from_lru_locked(list);
> -   spin_unlock(&glob->lru_lock);
> +   if (del_lru) {
> +   spin_lock(&glob->lru_lock);
> +   ttm_eu_del_from_lru_locked(list);
> +   spin_unlock(&glob->lru_lock);
> +   }

Can you move the BO to the LRU tail here when del_lru is false?

The busy iteration in evict_first would then try other processes' BOs first,
which could save loop time.
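Something like this in ttm_eu_reserve_buffers() is what I have in mind (an
untested sketch):

    if (del_lru) {
            spin_lock(&glob->lru_lock);
            ttm_eu_del_from_lru_locked(list);
            spin_unlock(&glob->lru_lock);
    } else {
            /* keep reserved BOs at the LRU tail so that eviction tries
             * other processes' BOs first */
            spin_lock(&glob->lru_lock);
            list_for_each_entry(entry, list, head)
                    ttm_bo_move_to_lru_tail(entry->bo, NULL);
            spin_unlock(&glob->lru_lock);
    }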

> return 0;
>  }
>  EXPORT_SYMBOL(ttm_eu_reserve_buffers);
> @@ -203,7 +204,10 @@ void ttm_eu_fence_buffer_objects(struct
> ww_acquire_ctx *ticket,
> reservation_object_add_shared_fence(bo->resv, fence);
> else
> reservation_object_add_excl_fence(bo->resv, fence);
> -   ttm_bo_add_to_lru(bo);
> +   if (list_empty(&bo->lru))
> +   ttm_bo_add_to_lru(bo);
> +   else
> +   ttm_bo_move_to_lru_tail(bo, NULL);

If the move to the LRU tail is already done above, then we don't need this here.

-David
> reservation_object_unlock(bo->resv);
> }
> spin_unlock(>lru_lock);
> diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> index 161b80fee492..5cffaa24259f 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
> @@ -63,7 +63,7 @@ static int virtio_gpu_object_list_validate(struct
> ww_acquire_ctx *ticket,
> struct virtio_gpu_object *qobj;
> int ret;
> 
> -   ret = ttm_eu_reserve_buffers(ticket, head, true, NULL);
> +   ret = ttm_eu_reserve_buffers(ticket, head, true, NULL, true);
> if (ret != 0)
> return ret;
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> index a7c30e567f09..d28cbedba0b5 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
> @@ -465,7 +465,8 @@ vmw_resource_check_buffer(struct ww_acquire_ctx
> *ticket,
> val_buf->bo = &res->backup->base;
> val_buf->num_shared = 0;
> list_add_tail(&val_buf->head, &val_list);
> -   ret = ttm_eu_reserve_buffers(ticket, &val_list, interruptible, NULL);
> +   ret = ttm_eu_reserve_buffers(ticket, &val_list, interruptible, NULL,
> +true);
> if (unlikely(ret != 0))
> goto out_no_reserve;
> 
> diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> index 3b396fea40d7..ac435b51f4eb 100644
> --- a/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_validation.h
> @@ -165,7 +165,7 @@ vmw_validation_bo_reserve

Re:[PATCH libdrm] enable syncobj test depending on capability

2019-05-17 Thread Zhou, David(ChunMing)
Can you guys do that? Otherwise, if the kernel driver doesn't set that cap,
the test could fail.

Thanks,
-David

 Original Message 
Subject: Re: [PATCH libdrm] enable syncobj test depending on capability
From: "Koenig, Christian"
To: Michel Dänzer ,"Zhou, David(ChunMing)" ,"Zhou, David(ChunMing)"
CC: dri-devel@lists.freedesktop.org

On 17.05.19 at 11:55, Michel Dänzer wrote:
>
> On 2019-05-17 11:47 a.m., zhoucm1 wrote:
>> ping, Could you help check in patch to gitlab? My connection to gitlab
>> still has problem.
> Please follow the process documented in include/drm/README for
> include/drm/drm.h .

Yeah, the header should be updated separately to what is currently in
drm-next (or drm-misc-next).

And then we can update the fix on top of that,
Christian.

>
>
> --
> Earthling Michel Dänzer   |  https://www.amd.com
> Libre software enthusiast | Mesa and X developer


Re:[PATCH libdrm 7/7] add syncobj timeline tests v3

2019-05-16 Thread Zhou, David(ChunMing)
It told me I cannot push to gitlab directly. After that, I added my SSH public
key on the gitlab web interface and also added the gitlab URL to my git
remotes. Pushing again then reports "connection timeout".

-David

 Original Message 
Subject: Re: [PATCH libdrm 7/7] add syncobj timeline tests v3
From: Christian König
To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Zhou, David(ChunMing)" 
,dri-devel@lists.freedesktop.org
CC:


On 16.05.19 at 12:19, zhoucm1 wrote:
>
>
> On 2019/05/16 18:09, Christian König wrote:
>>
>> On 16.05.19 at 10:16, zhoucm1 wrote:
>>> I was able to push changes to libdrm before, but now that libdrm has
>>> migrated to gitlab, it seems I can't anymore. What step do I need to take
>>> to get my permission back? I can already log into gitlab with my old
>>> freedesktop account.
>>>
>>> @Christian, Can you help submit this patch set to libdrm first?
>>
>> Done. And I think you can now request write permission to a repository
>> through the web-interface and all the "owners" of the project can grant
>> that to you.
> Any guide for that? I failed to find where to request permission.

Not off hand. What does the system say when you try to push?

Christian.

>
> -David
>>
>> Christian.
>>
>>>
>>>
>>> Thanks,
>>>
>>> -David
>>>
>>>
>>> On 2019/05/16 16:07, Chunming Zhou wrote:
>>>> v2: drop DRM_SYNCOBJ_CREATE_TYPE_TIMELINE, fix timeout calculation,
>>>>  fix some warnings
>>>> v3: add export/import and cpu signal testing cases
>>>>
>>>> Signed-off-by: Chunming Zhou 
>>>> Acked-by: Christian König 
>>>> Acked-by: Lionel Landwerlin 
>>>> ---
>>>>   tests/amdgpu/Makefile.am |   3 +-
>>>>   tests/amdgpu/amdgpu_test.c   |  11 ++
>>>>   tests/amdgpu/amdgpu_test.h   |  21 +++
>>>>   tests/amdgpu/meson.build |   2 +-
>>>>   tests/amdgpu/syncobj_tests.c | 290
>>>> +++
>>>>   5 files changed, 325 insertions(+), 2 deletions(-)
>>>>   create mode 100644 tests/amdgpu/syncobj_tests.c
>>>>
>>>> diff --git a/tests/amdgpu/Makefile.am b/tests/amdgpu/Makefile.am
>>>> index 48278848..920882d0 100644
>>>> --- a/tests/amdgpu/Makefile.am
>>>> +++ b/tests/amdgpu/Makefile.am
>>>> @@ -34,4 +34,5 @@ amdgpu_test_SOURCES = \
>>>>   uve_ib.h \
>>>>   deadlock_tests.c \
>>>>   vm_tests.c\
>>>> -ras_tests.c
>>>> +ras_tests.c \
>>>> +syncobj_tests.c
>>>> diff --git a/tests/amdgpu/amdgpu_test.c b/tests/amdgpu/amdgpu_test.c
>>>> index 35c8bf6c..73403fb4 100644
>>>> --- a/tests/amdgpu/amdgpu_test.c
>>>> +++ b/tests/amdgpu/amdgpu_test.c
>>>> @@ -57,6 +57,7 @@
>>>>   #define DEADLOCK_TESTS_STR "Deadlock Tests"
>>>>   #define VM_TESTS_STR "VM Tests"
>>>>   #define RAS_TESTS_STR "RAS Tests"
>>>> +#define SYNCOBJ_TIMELINE_TESTS_STR "SYNCOBJ TIMELINE Tests"
>>>> /**
>>>>*  Open handles for amdgpu devices
>>>> @@ -123,6 +124,12 @@ static CU_SuiteInfo suites[] = {
>>>>   .pCleanupFunc = suite_ras_tests_clean,
>>>>   .pTests = ras_tests,
>>>>   },
>>>> +{
>>>> +.pName = SYNCOBJ_TIMELINE_TESTS_STR,
>>>> +.pInitFunc = suite_syncobj_timeline_tests_init,
>>>> +.pCleanupFunc = suite_syncobj_timeline_tests_clean,
>>>> +.pTests = syncobj_timeline_tests,
>>>> +},
>>>> CU_SUITE_INFO_NULL,
>>>>   };
>>>> @@ -176,6 +183,10 @@ static Suites_Active_Status suites_active_stat[]
>>>> = {
>>>>   .pName = RAS_TESTS_STR,
>>>>   .pActive = suite_ras_tests_enable,
>>>>   },
>>>> +{
>>>> +.pName = SYNCOBJ_TIMELINE_TESTS_STR,
>>>> +.pActive = suite_syncobj_timeline_tests_enable,
>>>> +},
>>>>   };
>>>> diff --git a/tests/amdgpu/amdgpu_test.h
>>>> b/tests/amdgpu/amdgpu_test.h
>>>> index bcd0bc7e..36675ea3 100644
>>>> --- a/tests/amdgpu/amdgpu_test.h
>>>> +++ b/tests/amdgpu/amdgpu_test.h
>>>> @@ -216,6 +216,

Re:[PATCH libdrm] enable syncobj test depending on capability

2019-05-16 Thread Zhou, David(ChunMing)
Could you help push this patch as well?

Thanks,
-David

 Original Message 
Subject: Re: [PATCH libdrm] enable syncobj test depending on capability
From: "Koenig, Christian"
To: "Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org
CC:

On 16.05.19 at 12:46, Chunming Zhou wrote:
> Feature is controlled by DRM_CAP_SYNCOBJ_TIMELINE drm capability.
>
> Signed-off-by: Chunming Zhou 

Reviewed-by: Christian König 

> ---
>   include/drm/drm.h| 1 +
  tests/amdgpu/syncobj_tests.c | 8 ++++++++
>   2 files changed, 9 insertions(+)
>
> diff --git a/include/drm/drm.h b/include/drm/drm.h
> index c893f3b4..532787bf 100644
> --- a/include/drm/drm.h
> +++ b/include/drm/drm.h
> @@ -643,6 +643,7 @@ struct drm_gem_open {
>   #define DRM_CAP_PAGE_FLIP_TARGET 0x11
>   #define DRM_CAP_CRTC_IN_VBLANK_EVENT 0x12
>   #define DRM_CAP_SYNCOBJ 0x13
> +#define DRM_CAP_SYNCOBJ_TIMELINE 0x14
>
>   /** DRM_IOCTL_GET_CAP ioctl argument type */
>   struct drm_get_cap {
> diff --git a/tests/amdgpu/syncobj_tests.c b/tests/amdgpu/syncobj_tests.c
> index a0c627d7..869ed88e 100644
> --- a/tests/amdgpu/syncobj_tests.c
> +++ b/tests/amdgpu/syncobj_tests.c
> @@ -22,6 +22,7 @@
>   */
>
>   #include "CUnit/Basic.h"
> +#include "xf86drm.h"
>
>   #include "amdgpu_test.h"
>   #include "amdgpu_drm.h"
> @@ -36,6 +37,13 @@ static void amdgpu_syncobj_timeline_test(void);
>
>   CU_BOOL suite_syncobj_timeline_tests_enable(void)
>   {
> + int r;
> + uint64_t cap = 0;
> +
> + r = drmGetCap(drm_amdgpu[0], DRM_CAP_SYNCOBJ_TIMELINE, &cap);
> + if (r || cap == 0)
> + return CU_FALSE;
> +
>return CU_TRUE;
>   }
>


Re:[PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

2019-05-15 Thread Zhou, David(ChunMing)
Ah, sorry, I missed the "+ ttm_bo_move_to_lru_tail(bo, NULL);" line.

Right, moving them to the end before releasing fixes my concern.

Sorry for the noise.
-David


 Original Message 
Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS
From: "Koenig, Christian"
To: "Zhou, David(ChunMing)" ,"Olsak, Marek" ,"Liang, Prike" 
,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org
CC:

BO list? No, we stop removing them from the LRU.

But we still move them to the end of the LRU before releasing them.

Christian.

On 15.05.19 at 16:21, Zhou, David(ChunMing) wrote:
Isn't this patch trying to stop the removal of all BOs from the BO list?

-David

 Original Message 
Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS
From: Christian König
To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Olsak, Marek" ,"Liang, 
Prike" 
,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org
CC:

That is a good point, but actually not a problem in practice.

See the change to ttm_eu_fence_buffer_objects:
-   ttm_bo_add_to_lru(bo);
+   if (list_empty(&bo->lru))
+   ttm_bo_add_to_lru(bo);
+   else
+   ttm_bo_move_to_lru_tail(bo, NULL);

We still move the BOs to the end of the LRU in the same order we have before, 
we just don't remove them when they are reserved.

Regards,
Christian.

On 14.05.19 at 16:31, Zhou, David(ChunMing) wrote:
How do we refresh the LRU to keep its order aligned with the BO list passed
from user space?

You can verify it with some games; performance can differ a lot between
multiple runs.

-David

 Original Message 
Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS
From: Christian König
To: "Zhou, David(ChunMing)" ,"Olsak, Marek" ,"Liang, Prike" 
,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org
CC:

Hui? What do you mean with that?

Christian.

On 14.05.19 at 15:12, Zhou, David(ChunMing) wrote:
My only concern is how to refresh the LRU when the BO comes from a BO list.

-David

 Original Message ----
Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS
From: Christian König
To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" 
,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org
CC:


This avoids OOM situations when we have lots of threads
submitting at the same time.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index fff558cf385b..f9240a94217b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
}

r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
-  &duplicates, true);
+  &duplicates, false);
if (unlikely(r != 0)) {
if (r != -ERESTARTSYS)
DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
--
2.17.1






Re:[PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

2019-05-15 Thread Zhou, David(ChunMing)
Isn't this patch trying to stop the removal of all BOs from the BO list?

-David

 Original Message 
Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS
From: Christian König
To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Olsak, Marek" ,"Liang, 
Prike" ,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org
CC:

That is a good point, but actually not a problem in practice.

See the change to ttm_eu_fence_buffer_objects:
-   ttm_bo_add_to_lru(bo);
+   if (list_empty(&bo->lru))
+   ttm_bo_add_to_lru(bo);
+   else
+   ttm_bo_move_to_lru_tail(bo, NULL);

We still move the BOs to the end of the LRU in the same order we have before, 
we just don't remove them when they are reserved.

Regards,
Christian.

On 14.05.19 at 16:31, Zhou, David(ChunMing) wrote:
How do we refresh the LRU to keep its order aligned with the BO list passed
from user space?

You can verify it with some games; performance can differ a lot between
multiple runs.

-David

 Original Message 
Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS
From: Christian König
To: "Zhou, David(ChunMing)" ,"Olsak, Marek" ,"Liang, Prike" 
,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org<mailto:dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org>
CC:

Hui? What do you mean with that?

Christian.

On 14.05.19 at 15:12, Zhou, David(ChunMing) wrote:
My only concern is how to refresh the LRU when the BO comes from a BO list.

-David

 Original Message 
Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS
From: Christian König
To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" 
,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org
CC:


This avoids OOM situations when we have lots of threads
submitting at the same time.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index fff558cf385b..f9240a94217b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
}

r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
-  &duplicates, true);
+  &duplicates, false);
if (unlikely(r != 0)) {
if (r != -ERESTARTSYS)
DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
--
2.17.1






Re:[PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

2019-05-14 Thread Zhou, David(ChunMing)
How do we refresh the LRU to keep its order aligned with the BO list passed
from user space?

You can verify it with some games; performance can differ a lot between
multiple runs.

-David

 Original Message 
Subject: Re: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS
From: Christian König
To: "Zhou, David(ChunMing)" ,"Olsak, Marek" ,"Liang, Prike" 
,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org
CC:

Hui? What do you mean with that?

Christian.

On 14.05.19 at 15:12, Zhou, David(ChunMing) wrote:
My only concern is how to refresh the LRU when the BO comes from a BO list.

-David

 Original Message 
Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS
From: Christian König
To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" 
,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org
CC:


This avoids OOM situations when we have lots of threads
submitting at the same time.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index fff558cf385b..f9240a94217b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
}

r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
-  &duplicates, true);
+  &duplicates, false);
if (unlikely(r != 0)) {
if (r != -ERESTARTSYS)
DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
--
2.17.1



Re:[PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS

2019-05-14 Thread Zhou, David(ChunMing)
My only concern is how to refresh the LRU when the BO comes from a BO list.

-David

 Original Message 
Subject: [PATCH 11/11] drm/amdgpu: stop removing BOs from the LRU during CS
From: Christian König
To: "Olsak, Marek" ,"Zhou, David(ChunMing)" ,"Liang, Prike" 
,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org
CC:


This avoids OOM situations when we have lots of threads
submitting at the same time.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index fff558cf385b..f9240a94217b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -648,7 +648,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
}

r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
-  &duplicates, true);
+  &duplicates, false);
if (unlikely(r != 0)) {
if (r != -ERESTARTSYS)
DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
--
2.17.1


Re:[PATCH 1/2] drm/ttm: fix busy memory to fail other user v7

2019-05-09 Thread Zhou, David(ChunMing)
I know; before, it would issue the warning only when the debug option was
enabled. Removing that is OK with me.
I only helped Prike draft your idea, and Prike is trying this patch on his
side. The latest feedback he gave me is that first_bo is always NULL and the
code doesn't run into the busy path, which confuses me a lot; he said he is
debugging that.

-David


 Original Message 
Subject: Re: [PATCH 1/2] drm/ttm: fix busy memory to fail other user v7
From: "Koenig, Christian"
To: "Zhou, David(ChunMing)" ,"Liang, Prike" ,dri-devel@lists.freedesktop.org
CC:

I've found one more problem with this.

With lockdep enabled I get a warning because ttm_eu_reserve_buffers()
has called ww_acquire_done() on the ticket (which essentially means we
are done, no more locking with that ticket).

The simplest solution is probably to just remove the call to
ww_acquire_done() from ttm_eu_reserve_buffers().
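For reference, the pattern lockdep complains about (a minimal illustration,
not code from the patch):

    ww_acquire_init(&ticket, &reservation_ww_class);
    ww_mutex_lock(&a->resv->lock, &ticket);
    ww_acquire_done(&ticket);                /* "no more locks with this ticket" */
    ww_mutex_lock(&b->resv->lock, &ticket);  /* -> lockdep warning */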

Christian.

On 07.05.19 at 13:45, Chunming Zhou wrote:
> A heavy gpu job could occupy memory for a long time, which leads other
> users to fail to get memory.
>
> This basically picks up Christian's idea:
>
> 1. Reserve the BO in DC using a ww_mutex ticket (trivial).
> 2. If we then run into this EBUSY condition in TTM check if the BO we need 
> memory for (or rather the ww_mutex of its reservation object) has a ticket 
> assigned.
> 3. If we have a ticket we grab a reference to the first BO on the LRU, drop 
> the LRU lock and try to grab the reservation lock with the ticket.
> 4. If getting the reservation lock with the ticket succeeded we check if the 
> BO is still the first one on the LRU in question (the BO could have moved).
> 5. If the BO is still the first one on the LRU in question we try to evict it 
> as we would evict any other BO.
> 6. If any of the "If's" above fail we just back off and return -EBUSY.
>
> v2: fix some minor check
> v3: address Christian v2 comments.
> v4: fix some missing
> v5: handle first_bo unlock and bo_get/put
> v6: abstract unified iterate function, and handle all possible usecase not 
> only pinned bo.
> v7: pass request bo->resv to ttm_bo_evict_first
>
> Change-Id: I21423fb922f885465f13833c41df1e134364a8e7
> Signed-off-by: Chunming Zhou 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 111 +--
>   1 file changed, 94 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 8502b3ed2d88..f5e6328e4a57 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -766,11 +766,13 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
>* b. Otherwise, trylock it.
>*/
>   static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
> - struct ttm_operation_ctx *ctx, bool *locked)
> + struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
>   {
>bool ret = false;
>
>*locked = false;
> + if (busy)
> + *busy = false;
>if (bo->resv == ctx->resv) {
>reservation_object_assert_held(bo->resv);
>if (ctx->flags & TTM_OPT_FLAG_ALLOW_RES_EVICT
> @@ -779,35 +781,46 @@ static bool ttm_bo_evict_swapout_allowable(struct 
> ttm_buffer_object *bo,
>} else {
>*locked = reservation_object_trylock(bo->resv);
>ret = *locked;
> + if (!ret && busy)
> + *busy = true;
>}
>
>return ret;
>   }
>
> -static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
> -uint32_t mem_type,
> -const struct ttm_place *place,
> -struct ttm_operation_ctx *ctx)
> +static struct ttm_buffer_object*
> +ttm_mem_find_evitable_bo(struct ttm_bo_device *bdev,
> +  struct ttm_mem_type_manager *man,
> +  const struct ttm_place *place,
> +  struct ttm_operation_ctx *ctx,
> +  struct ttm_buffer_object **first_bo,
> +  bool *locked)
>   {
> - struct ttm_bo_global *glob = bdev->glob;
> - struct ttm_mem_type_manager *man = &bdev->man[mem_type];
>struct ttm_buffer_object *bo = NULL;
> - bool locked = false;
> - unsigned i;
> - int ret;
> + int i;
>
> - spin_lock(&glob->lru_lock);
> + if (first_bo)
> + *first_bo = NULL;
>for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
>list_for_each_entry(bo, &man->lru[i], lru) {
> - if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
> +
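The diff is truncated in the archive above. The heart of the busy handling,
per steps 3-6 of the commit message, is roughly the following; names are
approximate, not the exact v7 hunks:

    /* trylock failed on a busy BO and the caller holds a ww_mutex ticket */
    ttm_bo_get(first_bo);                    /* step 3: grab a reference */
    spin_unlock(&glob->lru_lock);            /* ...and drop the LRU lock */
    /* blocking, deadlock-avoiding lock using the caller's ticket */
    ret = ww_mutex_lock_interruptible(&first_bo->resv->lock, ticket);
    if (!ret) {
            /* steps 4-5: recheck that it is still the first BO on this LRU;
             * if so, evict it like any other BO */
    }
    /* step 6: on any failure, back off and return -EBUSY */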

RE: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-27 Thread Zhou, David(ChunMing)
Sorry, I can only put my Acked-by: Chunming Zhou on patch #3.

I cannot fully judge patch #4, #5, #6.

-David

From: amd-gfx  On Behalf Of Grodzovsky, 
Andrey
Sent: Friday, April 26, 2019 10:09 PM
To: Koenig, Christian ; Zhou, David(ChunMing) 
; dri-devel@lists.freedesktop.org; 
amd-...@lists.freedesktop.org; e...@anholt.net; etna...@lists.freedesktop.org
Cc: Kazlauskas, Nicholas ; Liu, Monk 

Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already 
signaled.


Ping (mostly David and Monk).

Andrey
On 4/24/19 3:09 AM, Christian König wrote:
On 24.04.19 at 05:02, Zhou, David(ChunMing) wrote:
>> -drm_sched_stop(&ring->sched, &job->base);
>> -
>>   /* after all hw jobs are reset, hw fence is meaningless, so 
>> force_completion */
>>   amdgpu_fence_driver_force_completion(ring);
>>   }

HW fences are already force-completed, so I think we can just disable irq
fence processing and ignore hw fence signals while we are trying to do a GPU
reset. Otherwise this will make the logic much more complex.
If this situation happens because of long execution times, we can increase
the timeout of the reset detection.

You are not thinking widely enough; forcing the hw fence to complete can
trigger others to start other activity in the system.

We first need to stop everything and make sure that we don't do any processing 
any more and then start with our reset procedure including forcing all hw 
fences to complete.

Christian.



-David

From: amd-gfx  On Behalf Of Grodzovsky, Andrey
Sent: Wednesday, April 24, 2019 12:00 AM
To: Zhou, David(ChunMing) ; dri-devel@lists.freedesktop.org;
amd-...@lists.freedesktop.org; e...@anholt.net; etna...@lists.freedesktop.org;
ckoenig.leichtzumer...@gmail.com
Cc: Kazlauskas, Nicholas ; Liu, Monk
Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already
signaled.


No, I mean the actual HW fence, which signals when the job finishes execution
on the HW.

Andrey
On 4/23/19 11:19 AM, Zhou, David(ChunMing) wrote:
Do you mean the fence timer? Why not stop it as well when stopping the
scheduler for the hw reset?

 Original Message 
Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already 
signaled.
From: "Grodzovsky, Andrey"
To: "Zhou, David(ChunMing)" 
,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,e...@anholt.net,etna...@lists.freedesktop.org,ckoenig.leichtzumer...@gmail.com
CC: "Kazlauskas, Nicholas" ,"Liu, Monk"

On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote:
> +Monk.
>
> GPU reset is used widely in SRIOV, so we need a virtualization guy to take
> a look.
>
> But out of curiosity, why can the guilty job still signal if the job is
> already set to guilty? Was it set wrongly?
>
>
> -David


It's possible that the job completes at a later time than when its
timeout handler started processing, so in this patch we try to protect
against this by rechecking the HW fence after stopping all SW
schedulers. We do it BEFORE marking the job's sched_entity as guilty,
so at the point we check, the guilty flag is not set yet.
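In pseudo-code, the ordering the patch aims for (illustrative only, not the
exact function names):

    /* amdgpu_device_gpu_recover(), sketch */
    stop_all_schedulers(hive);              /* across the whole XGMI hive */
    job_signaled = job && dma_fence_is_signaled(job_hw_fence(job));
                                            /* job_hw_fence() is hypothetical */
    if (!job_signaled) {
            mark_guilty(job);               /* only now touch the guilty flag */
            do_hw_reset(hive);
    }                                       /* else: skip the HW reset */
    restart_all_schedulers(hive);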

Andrey


>
> On 2019/4/18 23:00, Andrey Grodzovsky wrote:
>> Also reject TDRs if another one already running.
>>
>> v2:
>> Stop all schedulers across device and entire XGMI hive before
>> force signaling HW fences.
>> Avoid passing job_signaled to helper fnctions to keep all the decision
>> making about skipping HW reset in one place.
>>
>> v3:
>> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced
>> against it's decrement in drm_sched_stop in non HW reset case.
>> v4: rebase
>> v5: Revert v3 as we do it now in sceduler code.
>>
>> Signed-off-by: Andrey Grodzovsky 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 
>> +++--
>>1 file changed, 95 insertions(+), 48 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index a0e165c..85f8792 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_

Re:[PATCH] gpu/docs: Clarify what userspace means for gl

2019-04-24 Thread Zhou, David(ChunMing)
Will Linux be only a Mesa Linux? I thought Linux was open.
This will impact our OpenGL/AMDVLK (MIT open source) drivers, and I'm not sure
about ROCm:
1. How do we deal with a uapi that OpenGL/AMDVLK needs but Mesa doesn't?
Reject it?
2. If OpenGL/AMDVLK developers work on a hw feature that no Mesa developers
work on, can it not be upstreamed either?

I think the two cases above could easily happen, because many employees work
on company projects with all kinds of customer requirements, which Mesa does
not.

-David

 Original Message 
Subject: [PATCH] gpu/docs: Clarify what userspace means for gl
From: Daniel Vetter
To: DRI Development ,Mesa Dev
CC: Jérôme Glisse ,Daniel Vetter ,Karol Herbst ,Kenneth Graunke ,Ben Skeggs 
,Daniel Vetter ,Sean Paul

Clear rules avoid arguing.

Note that this just aims to document current expectations. If that
shifts (e.g. because gl isn't the main api anymore, replaced by vk),
then we need to update this text.

I think it'd be good to have an equally solid list on the kms side.
But kms is much more meant to be a standard, and the list of userspace
projects we've accepted in the past is constantly shifting and
adjusting. So I figured I'll leave that as an exercise for later on.

v2: Try to clarify that we don't want a mesa driver just for mesa's
sake, and more clearly exclude anything that just doesn't make sense
technically.  Example would be a compute driver that makes sense to be
merged into drm (for kernel side code-sharing), but where the intended
use is some single-source CUDA-style compute without ever bothering
about any of the 3D/rendering side baggage that comes with gl/vk.

v3: Drop vulkan for now, the situation there isn't as obviously
clear-cut as on the gl side, and I don't want to tank this idea on a
hot discussion about vk and mesa. Plus I think once we have 1-2 more
vk drivers in mesa the situation on the vk side is clear-cut too, and
we can do a follow-up patch to add vk to the list where we expect the
userspace to be in upstream mesa. That's would give nice precedence to
make it clear that this isn't cast in stone, but meant to reflect
reality and should be adjusted as needed.

v4: Fix typo.

v5: Add a note to the commit message that this text needs to be
updated when the situation changes.

v6: Add a sentence why mesa will give the most meaningful review on gl
stuff - it's a very active project with lots of developers.

Acked-by: Dave Airlie  (v4)
Acked-by: Eric Anholt  (v4)
Acked-by: Alex Deucher  (v5)
Acked-by: Sean Paul  (v5)
Acked-by: Kenneth Graunke  (v5)
Acked-by: Karol Herbst  (v5)
Acked-by: Rob Clark 
Acked-by: Jérôme Glisse 
Acked-by: Bas Nieuwenhuizen 
Acked-by: Ben Skeggs 
Cc: Dave Airlie 
Cc: Eric Anholt 
Cc: Alex Deucher 
Cc: Sean Paul 
Cc: Kenneth Graunke 
Cc: Karol Herbst 
Cc: Rob Clark 
Cc: Jérôme Glisse 
Cc: Bas Nieuwenhuizen 
Cc: Ben Skeggs 
Signed-off-by: Daniel Vetter 
---
I chatted with a pile of people in private, and there's clearly some
solid support for this. But there's also some big concerns brought up
by other people. The main one summed up is "what if everyone just
ships vk, with a generic gl-on-vk like ANGLE?", but there's other
concerns too.

So all together I think this doesn't clear the bar of (almost)
unanimous support which we need to make documentation actually help
with clarifying what's expected. And if/when someone comes up with a
more creative userspace approach for gl/vk we'll need to figure this
all out with the time honored tradition of having a few massive
threads on dri-devel :-)

Hence this is more fyi as a guidance I guess, not a strict rule.
And I don't plan on merging this.

Cheers, Daniel
---
 Documentation/gpu/drm-uapi.rst | 25 +
 1 file changed, 25 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index c9fd23efd957..0f767cfd5db6 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -105,6 +105,31 @@ is already rather painful for the DRM subsystem, with 
multiple different uAPIs
 for the same thing co-existing. If we add a few more complete mistakes into the
 mix every year it would be entirely unmanageable.

+Below some clarifications what this means for specific areas in DRM.
+
+Compute Userspace
+---
+
+Userspace API for enabling compute and rendering blocks which are capable of at
+least supporting one of the OpenGL or OpenGL ES standards from Khronos need to
+be enabled in the upstream `Mesa3D project`.
+
+Mesa3D is the canonical upstream for these areas because it is a fully
+compliant, performant and cross-vendor implementation that supports all kernel
+drivers in DRM. It is also an active project with plenty of developers who
+can perform meaningful review. It is therefore the best platform to validate
+userspace API and especially make sure that cross-vendor interoperation is
+assured.
+
+Other userspace is only admissible if exposing a given feature 

RE: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-23 Thread Zhou, David(ChunMing)
>> -drm_sched_stop(&ring->sched, &job->base);
>> -
>>   /* after all hw jobs are reset, hw fence is meaningless, so 
>> force_completion */
>>   amdgpu_fence_driver_force_completion(ring);
>>   }

The HW fences are already force-completed, so I think we can just disable IRQ 
fence processing and ignore HW fence signals while we are trying to do the GPU 
reset. Otherwise the logic becomes much more complex.
If this situation happens because of long execution times, we can increase the 
timeout of the reset detection.

-David

From: amd-gfx  On Behalf Of Grodzovsky, 
Andrey
Sent: Wednesday, April 24, 2019 12:00 AM
To: Zhou, David(ChunMing) ; 
dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org; 
e...@anholt.net; etna...@lists.freedesktop.org; ckoenig.leichtzumer...@gmail.com
Cc: Kazlauskas, Nicholas ; Liu, Monk 

Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already 
signaled.


No, i mean the actual HW fence which signals when the job finished execution on 
the HW.

Andrey
On 4/23/19 11:19 AM, Zhou, David(ChunMing) wrote:
Do you mean the fence timer? Why not stop it as well when stopping the 
scheduler for the HW reset?

 Original Message 
Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already 
signaled.
From: "Grodzovsky, Andrey"
To: "Zhou, David(ChunMing)" 
,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,e...@anholt.net,etna...@lists.freedesktop.org,ckoenig.leichtzumer...@gmail.com<mailto:dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,e...@anholt.net,etna...@lists.freedesktop.org,ckoenig.leichtzumer...@gmail.com>
CC: "Kazlauskas, Nicholas" ,"Liu, Monk"

On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote:
> +Monk.
>
> GPU reset is used widely in SR-IOV, so a virtualization guy needs to take a look.
>
> But out of curiosity, why can the guilty job still signal if it is already
> set to guilty? Was it set wrongly?
>
>
> -David


It's possible that the job completes at a later time than when its
timeout handler started processing, so in this patch we try to protect
against this by rechecking the HW fence after stopping all SW
schedulers. We do it BEFORE marking guilty on the job's sched_entity, so
at the point we check, the guilty flag is not set yet.

Andrey
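
(A minimal sketch of that recheck, using names from the patch below; this is
a simplification, not the exact code:)

	/* after drm_sched_stop() parked all schedulers, recheck the HW
	 * fence of the job that triggered the TDR */
	job_signaled = job ? dma_fence_is_signaled(job->fence) : false;
	if (job_signaled) {
		/* the job really did complete; skip the HW reset and
		 * only restart the schedulers */
		dev_info(adev->dev, "Guilty job already signaled, skipping HW reset");
	}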


>
> On 2019/4/18 23:00, Andrey Grodzovsky wrote:
>> Also reject TDRs if another one already running.
>>
>> v2:
>> Stop all schedulers across device and entire XGMI hive before
>> force signaling HW fences.
>> Avoid passing job_signaled to helper functions to keep all the decision
>> making about skipping HW reset in one place.
>>
>> v3:
>> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced
>> against its decrement in drm_sched_stop in non HW reset case.
>> v4: rebase
>> v5: Revert v3 as we do it now in scheduler code.
>>
>> Signed-off-by: Andrey Grodzovsky 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 
>> +++--
>>1 file changed, 95 insertions(+), 48 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index a0e165c..85f8792 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct 
>> amdgpu_device *adev,
>>   if (!ring || !ring->sched.thread)
>>   continue;
>>
>> -drm_sched_stop(&ring->sched, &job->base);
>> -
>>   /* after all hw jobs are reset, hw fence is meaningless, so 
>> force_completion */
>>   amdgpu_fence_driver_force_completion(ring);
>>   }
>> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct 
>> amdgpu_device *adev,
>>   if(job)
>>   drm_sched_increase_karma(&job->base);
>>
>> +/* Don't suspend on bare metal if we are not going to HW reset the ASIC 
>> */
>>   if (!amdgpu_sriov_vf(adev)) {
>>
>>   if (!need_full_reset)
>> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct 
>> amdgpu_hive_info *hive,
>>   return r;
>>}
>>
>> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev)
>> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool 
>> trylock)
>>{
>> -int i;
>> -
>> -for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>> -struct amdgpu_ring *ring = adev->rings[i];

Re:[PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-23 Thread Zhou, David(ChunMing)
Do you mean the fence timer? Why not stop it as well when stopping the 
scheduler for the HW reset?

 Original Message 
Subject: Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already 
signaled.
From: "Grodzovsky, Andrey"
To: "Zhou, David(ChunMing)" 
,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,e...@anholt.net,etna...@lists.freedesktop.org,ckoenig.leichtzumer...@gmail.com
CC: "Kazlauskas, Nicholas" ,"Liu, Monk"


On 4/22/19 9:09 AM, Zhou, David(ChunMing) wrote:
> +Monk.
>
> GPU reset is used widely in SR-IOV, so a virtualization guy needs to take a look.
>
> But out of curiosity, why can the guilty job still signal if it is already
> set to guilty? Was it set wrongly?
>
>
> -David


It's possible that the job completes at a later time than when its
timeout handler started processing, so in this patch we try to protect
against this by rechecking the HW fence after stopping all SW
schedulers. We do it BEFORE marking guilty on the job's sched_entity, so
at the point we check, the guilty flag is not set yet.

Andrey


>
> On 2019/4/18 23:00, Andrey Grodzovsky wrote:
>> Also reject TDRs if another one already running.
>>
>> v2:
>> Stop all schedulers across device and entire XGMI hive before
>> force signaling HW fences.
>> Avoid passing job_signaled to helper functions to keep all the decision
>> making about skipping HW reset in one place.
>>
>> v3:
>> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced
>> against its decrement in drm_sched_stop in non HW reset case.
>> v4: rebase
>> v5: Revert v3 as we do it now in scheduler code.
>>
>> Signed-off-by: Andrey Grodzovsky 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 
>> +++--
>>1 file changed, 95 insertions(+), 48 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index a0e165c..85f8792 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct 
>> amdgpu_device *adev,
>>   if (!ring || !ring->sched.thread)
>>   continue;
>>
>> -drm_sched_stop(&ring->sched, &job->base);
>> -
>>   /* after all hw jobs are reset, hw fence is meaningless, so 
>> force_completion */
>>   amdgpu_fence_driver_force_completion(ring);
>>   }
>> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct 
>> amdgpu_device *adev,
>>   if(job)
>>   drm_sched_increase_karma(&job->base);
>>
>> +/* Don't suspend on bare metal if we are not going to HW reset the ASIC 
>> */
>>   if (!amdgpu_sriov_vf(adev)) {
>>
>>   if (!need_full_reset)
>> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct 
>> amdgpu_hive_info *hive,
>>   return r;
>>}
>>
>> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev)
>> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool 
>> trylock)
>>{
>> -int i;
>> -
>> -for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>> -struct amdgpu_ring *ring = adev->rings[i];
>> -
>> -if (!ring || !ring->sched.thread)
>> -continue;
>> -
>> -if (!adev->asic_reset_res)
>> -drm_sched_resubmit_jobs(&ring->sched);
>> +if (trylock) {
>> +if (!mutex_trylock(&adev->lock_reset))
>> +return false;
>> +} else
>> +mutex_lock(&adev->lock_reset);
>>
>> -drm_sched_start(&ring->sched, !adev->asic_reset_res);
>> -}
>> -
>> -if (!amdgpu_device_has_dc_support(adev)) {
>> -drm_helper_resume_force_mode(adev->ddev);
>> -}
>> -
>> -adev->asic_reset_res = 0;
>> -}
>> -
>> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev)
>> -{
>> -mutex_lock(&adev->lock_reset);
>>   atomic_inc(&adev->gpu_reset_counter);
>>   adev->in_gpu_reset = 1;
>>   /* Block kfd: SRIOV would do it separately */
>>   if (!amdgpu_sriov_vf(adev))
>>amdgpu_amdkfd_pre_reset(adev);
>> +
>> +return true;
>>}
>>
>>static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
>> @@ -3538,40 +3521,42 @@ s

Re:[PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-23 Thread Zhou, David(ChunMing)
>3. If we have a ticket we grab a reference to the first BO on the LRU, drop 
>the LRU lock and try to grab the reservation lock with the ticket.

The BO on the LRU is already locked by the CS user; can it be dropped here by 
the DC user? And after the DC user grabs its lock with a ticket, how does CS 
grab it again?

If you think waiting in TTM carries this risk, how about just adding a wrapper 
for the pin function, as below?

amdgpu_get_pin_bo_timeout()
{
	do {
		amdgpu_bo_reserve();
		r = amdgpu_bo_pin();

		if (!r)
			break;
		amdgpu_bo_unreserve();
		timeout--;

	} while (timeout > 0);
}

 Original Message 
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Zhou, David(ChunMing)" ,"Koenig, Christian" ,"Liang, Prike" 
,dri-devel@lists.freedesktop.org
CC:

Well, that's not so easy offhand.

The basic problem here is that when you busy wait at this place you can easily 
run into situations where application A busy waits for B while B busy waits for 
A -> deadlock.

So what we need here is the deadlock detection logic of the ww_mutex. To use 
this we at least need to do the following steps:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).

2. If we then run into this EBUSY condition in TTM check if the BO we need 
memory for (or rather the ww_mutex of its reservation object) has a ticket 
assigned.

3. If we have a ticket we grab a reference to the first BO on the LRU, drop the 
LRU lock and try to grab the reservation lock with the ticket.

4. If getting the reservation lock with the ticket succeeded we check if the BO 
is still the first one on the LRU in question (the BO could have moved).

5. If the BO is still the first one on the LRU in question we try to evict it 
as we would evict any other BO.

6. If any of the "If's" above fail we just back off and return -EBUSY.

Steps 2-5 are certainly not trivial, but doable as far as I can see.
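
A rough sketch of steps 2-5, with hypothetical helper and field names (the
real TTM structures differ in detail):

	/* hypothetical: called when ttm_mem_evict_first() returned -EBUSY
	 * and the caller's reservation carries a ww_acquire_ctx ticket */
	static int ttm_evict_first_with_ticket(struct ttm_bo_device *bdev,
					       struct ttm_mem_type_manager *man,
					       struct ww_acquire_ctx *ticket)
	{
		struct ttm_buffer_object *first;
		int ret;

		/* step 3: reference the first BO on the LRU, drop the LRU lock */
		spin_lock(&bdev->glob->lru_lock);
		first = list_first_entry_or_null(&man->lru[0],
						 struct ttm_buffer_object, lru);
		if (!first) {
			spin_unlock(&bdev->glob->lru_lock);
			return -EBUSY;
		}
		ttm_bo_get(first);
		spin_unlock(&bdev->glob->lru_lock);

		/* step 3: lock with the ticket so ww_mutex can detect and
		 * resolve A-waits-for-B-waits-for-A cycles */
		ret = dma_resv_lock(first->base.resv, ticket);
		if (ret)
			goto out_put;	/* step 6: back off, return -EBUSY */

		/* step 4: recheck that the BO is still first on the LRU ... */
		/* step 5: ... and evict it like any other BO */

		dma_resv_unlock(first->base.resv);
	out_put:
		ttm_bo_put(first);
		return ret ? -EBUSY : 0;
	}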

Have fun :)
Christian.

On 23.04.19 at 15:19, Zhou, David(ChunMing) wrote:
How about adding one more condition on ctx->resv inline to address your 
concern? Also, as long as we don't busy-wait on the same user's reservation, 
it shouldn't lead to deadlock.

Otherwise, any other idea?

 Original Message 
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Liang, Prike" ,"Zhou, David(ChunMing)" 
,dri-devel@lists.freedesktop.org
CC:

Well that is certainly a NAK because it can lead to deadlock in the
memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.

On 23.04.19 at 03:45, Liang, Prike wrote:
> Acked-by: Prike Liang
>
> Thanks,
> Prike
> -Original Message-
> From: Chunming Zhou
> Sent: Monday, April 22, 2019 6:39 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Liang, Prike; Zhou, David(ChunMing)
> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>
> heavy gpu job could occupy memory long time, which could lead to other user 
> fail to get memory.
>
> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> Signed-off-by: Chunming Zhou
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 --
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c 
> index 7c484729f9b2..6c596cc24bec 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct 
> ttm_buffer_object *bo,
>if (mem->mm_node)
>break;
>ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
> - if (unlikely(ret != 0))
> - return ret;
> + if (unlikely(ret != 0)) {
> + if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
> + return ret;
> + }
>} while (1);
>mem->mem_type = mem_type;
>return ttm_bo_add_move_fence(bo, man, mem);
> --
> 2.17.1
>

Re:[PATCH v5 3/6] drm/scheduler: rework job destruction

2019-04-23 Thread Zhou, David(ChunMing)
This patch is to fix the deadlock between fence->lock and 
sched->job_list_lock, right?
So I suggest just moving list_del_init(&s_job->node) from 
drm_sched_process_job to the work thread. That will avoid the deadlock 
described in the link.
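
A sketch of what that would look like, based on the functions in the patch
below (simplified):

	static void drm_sched_job_finish(struct work_struct *work)
	{
		struct drm_sched_job *s_job = container_of(work, struct drm_sched_job,
							   finish_work);
		struct drm_gpu_scheduler *sched = s_job->sched;
		unsigned long flags;

		cancel_delayed_work_sync(&sched->work_tdr);

		spin_lock_irqsave(&sched->job_list_lock, flags);
		/* moved here from drm_sched_process_job(), which runs in
		 * fence-callback (interrupt) context */
		list_del_init(&s_job->node);
		/* queue TDR for next job */
		drm_sched_start_timeout(sched);
		spin_unlock_irqrestore(&sched->job_list_lock, flags);

		sched->ops->free_job(s_job);
	}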


 Original Message 
Subject: Re: [PATCH v5 3/6] drm/scheduler: rework job destruction
From: "Grodzovsky, Andrey"
To: "Zhou, David(ChunMing)" 
,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,e...@anholt.net,etna...@lists.freedesktop.org,ckoenig.leichtzumer...@gmail.com
CC: "Kazlauskas, Nicholas" ,"Koenig, Christian"


On 4/22/19 8:48 AM, Chunming Zhou wrote:
> Hi Andrey,
>
> static void drm_sched_process_job(struct dma_fence *f, struct
> dma_fence_cb *cb)
> {
> ...
>   spin_lock_irqsave(&sched->job_list_lock, flags);
>   /* remove job from ring_mirror_list */
>   list_del_init(&s_job->node);
>   spin_unlock_irqrestore(&sched->job_list_lock, flags);
> [David] How about just moving the above from IRQ processing to the worker? Any
> problem? Maybe I missed your previous discussion, but I think removing the
> lock for the list is a risk for future maintenance, even though you make sure
> it is thread-safe currently.
>
> -David


We remove the lock exactly because insertion into and removal from the
list will be done from exactly one thread at any time now. So I am not
sure I understand what you mean.

Andrey


>
> ...
>
>   schedule_work(&s_job->finish_work);
> }
>
> On 2019/4/18 23:00, Andrey Grodzovsky wrote:
>> From: Christian König 
>>
>> We now destroy finished jobs from the worker thread to make sure that
>> we never destroy a job currently in timeout processing.
>> By this we avoid holding lock around ring mirror list in drm_sched_stop
>> which should solve a deadlock reported by a user.
>>
>> v2: Remove unused variable.
>> v4: Move guilty job free into sched code.
>> v5:
>> Move sched->hw_rq_count to drm_sched_start to account for counter
>> decrement in drm_sched_stop even when we don't call resubmit jobs
>> if guilty job did signal.
>>
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692
>>
>> Signed-off-by: Christian König 
>> Signed-off-by: Andrey Grodzovsky 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   9 +-
>>drivers/gpu/drm/etnaviv/etnaviv_dump.c |   4 -
>>drivers/gpu/drm/etnaviv/etnaviv_sched.c|   2 +-
>>drivers/gpu/drm/lima/lima_sched.c  |   2 +-
>>drivers/gpu/drm/panfrost/panfrost_job.c|   2 +-
>>drivers/gpu/drm/scheduler/sched_main.c | 159 
>> +
>>drivers/gpu/drm/v3d/v3d_sched.c|   2 +-
>>include/drm/gpu_scheduler.h|   6 +-
>>8 files changed, 102 insertions(+), 84 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 7cee269..a0e165c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct 
>> amdgpu_device *adev,
>>if (!ring || !ring->sched.thread)
>>continue;
>>
>> - drm_sched_stop(&ring->sched);
>> + drm_sched_stop(&ring->sched, &job->base);
>>
>>/* after all hw jobs are reset, hw fence is meaningless, so 
>> force_completion */
>>amdgpu_fence_driver_force_completion(ring);
>> @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct 
>> amdgpu_device *adev,
>>if(job)
>>drm_sched_increase_karma(&job->base);
>>
>> -
>> -
>>if (!amdgpu_sriov_vf(adev)) {
>>
>>if (!need_full_reset)
>> @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct 
>> amdgpu_hive_info *hive,
>>return r;
>>}
>>
>> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev,
>> -   struct amdgpu_job *job)
>> +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev)
>>{
>>int i;
>>
>> @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device 
>> *adev,
>>
>>/* Post ASIC reset for all devs .*/
>>list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
>> - amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job : NULL);
>> + amdgpu_device_post_asic_reset(tmp_adev);
>>
>>if (r) {
>>/* bad news, how to tell it to userspace ? */
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c 
>> b/drivers/gpu/drm/e

Re:[PATCH] ttm: wait mem space if user allow while gpu busy

2019-04-23 Thread Zhou, David(ChunMing)
How about adding one more condition on ctx->resv inline to address your 
concern? Also, as long as we don't busy-wait on the same user's reservation, 
it shouldn't lead to deadlock.

Otherwise, any other idea?
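
Something like this in ttm_bo_mem_force_space (field names from that era;
just a sketch of the idea, not tested):

	ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
	if (unlikely(ret != 0)) {
		/* only retry when the contended BO does not belong to the
		 * caller's own reservation, i.e. not the same user */
		if (!ctx || ctx->no_wait_gpu || ret != -EBUSY ||
		    bo->resv == ctx->resv)
			return ret;
	}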

 Original Message 
Subject: Re: [PATCH] ttm: wait mem space if user allow while gpu busy
From: Christian König
To: "Liang, Prike" ,"Zhou, David(ChunMing)" ,dri-devel@lists.freedesktop.org
CC:

Well that is certainly a NAK because it can lead to deadlock in the
memory management.

You can't just busy wait with all those locks held.

Regards,
Christian.

On 23.04.19 at 03:45, Liang, Prike wrote:
> Acked-by: Prike Liang 
>
> Thanks,
> Prike
> -Original Message-
> From: Chunming Zhou 
> Sent: Monday, April 22, 2019 6:39 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Liang, Prike ; Zhou, David(ChunMing) 
> 
> Subject: [PATCH] ttm: wait mem space if user allow while gpu busy
>
> heavy gpu job could occupy memory long time, which could lead to other user 
> fail to get memory.
>
> Change-Id: I0b322d98cd76e5ac32b00462bbae8008d76c5e11
> Signed-off-by: Chunming Zhou 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 6 --
>   1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c 
> index 7c484729f9b2..6c596cc24bec 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -830,8 +830,10 @@ static int ttm_bo_mem_force_space(struct 
> ttm_buffer_object *bo,
>if (mem->mm_node)
>break;
>ret = ttm_mem_evict_first(bdev, mem_type, place, ctx);
> - if (unlikely(ret != 0))
> - return ret;
> + if (unlikely(ret != 0)) {
> + if (!ctx || ctx->no_wait_gpu || ret != -EBUSY)
> + return ret;
> + }
>} while (1);
>mem->mem_type = mem_type;
>return ttm_bo_add_move_fence(bo, man, mem);
> --
> 2.17.1
>

RE: DMA-buf P2P

2019-04-19 Thread Zhou, David(ChunMing)
Which test are you using? Can you share it?

-David

> -Original Message-
> From: dri-devel  On Behalf Of
> Christian K?nig
> Sent: Thursday, April 18, 2019 8:09 PM
> To: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org
> Subject: DMA-buf P2P
> 
> Hi guys,
> 
> as promised this is the patch set which enables P2P buffer sharing with DMA-
> buf.
> 
> Basic idea is that importers can set a flag noting that they can deal with an
> sgt which doesn't contain pages.
> 
> This in turn is the signal to the exporter that we don't need to move a buffer
> to system memory any more when a remote device wants to access it.
> 
> Please review and/or comment,
> Christian.
> 
> 

RE: [PATCH libdrm] amdgpu: Add context priority override function.

2019-04-17 Thread Zhou, David(ChunMing)
Reviewed-by: Chunming Zhou 

> -Original Message-
> From: dri-devel  On Behalf Of Bas
> Nieuwenhuizen
> Sent: Thursday, April 18, 2019 2:34 AM
> To: dri-devel@lists.freedesktop.org
> Subject: [PATCH libdrm] amdgpu: Add context priority override function.
> 
> This way we can override the priority of a single context using a master fd.
> 
> Since we cannot usefully create an amdgpu device of a master fd without the
> fd deduplication kicking in this takes a plain fd.
> 
> This can be used by e.g. radv to get high priority contexts using a master fd
> from the primary node or a lease.
> ---
>  amdgpu/amdgpu-symbol-check |  1 +
>  amdgpu/amdgpu.h| 15 +++
>  amdgpu/amdgpu_cs.c | 25 +
>  3 files changed, 41 insertions(+)
> 
> diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-
> check index 96a44b40..4d806922 100755
> --- a/amdgpu/amdgpu-symbol-check
> +++ b/amdgpu/amdgpu-symbol-check
> @@ -38,6 +38,7 @@ amdgpu_cs_create_syncobj2  amdgpu_cs_ctx_create
>  amdgpu_cs_ctx_create2
>  amdgpu_cs_ctx_free
> +amdgpu_cs_ctx_override_priority
>  amdgpu_cs_destroy_semaphore
>  amdgpu_cs_destroy_syncobj
>  amdgpu_cs_export_syncobj
> diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
> index d6de3b8d..3d838e08 100644
> --- a/amdgpu/amdgpu.h
> +++ b/amdgpu/amdgpu.h
> @@ -911,6 +911,21 @@ int amdgpu_cs_ctx_create(amdgpu_device_handle dev,
>  */
>  int amdgpu_cs_ctx_free(amdgpu_context_handle context);
> 
> +/**
> + * Override the submission priority for the given context using a master fd.
> + *
> + * \param   dev - \c [in] device handle
> + * \param   context- \c [in] context handle for context id
> + * \param   master_fd  - \c [in] The master fd to authorize the override.
> + * \param   priority   - \c [in] The priority to assign to the context.
> + *
> + * \return 0 on success or a negative Posix error code on failure.
> + */
> +int amdgpu_cs_ctx_override_priority(amdgpu_device_handle dev,
> +amdgpu_context_handle context,
> +int master_fd,
> +unsigned priority);
> +
>  /**
>   * Query reset state for the specific GPU Context
>   *
> diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
> index 5bedf748..7ee844fb 100644
> --- a/amdgpu/amdgpu_cs.c
> +++ b/amdgpu/amdgpu_cs.c
> @@ -142,6 +142,31 @@ drm_public int
> amdgpu_cs_ctx_free(amdgpu_context_handle context)
>   return r;
>  }
> 
> +drm_public int amdgpu_cs_ctx_override_priority(amdgpu_device_handle
> dev,
> +   amdgpu_context_handle context,
> +   int master_fd,
> +   unsigned priority) {
> + int r;
> +
> + if (!dev || !context || master_fd < 0)
> + return -EINVAL;
> +
> + union drm_amdgpu_sched args;
> + memset(&args, 0, sizeof(args));
> +
> + args.in.op = AMDGPU_SCHED_OP_CONTEXT_PRIORITY_OVERRIDE;
> + args.in.fd = dev->fd;
> + args.in.priority = priority;
> + args.in.ctx_id = context->id;
> +
> + r = drmCommandWrite(master_fd, DRM_AMDGPU_SCHED, &args,
> sizeof(args));
> + if (r)
> + return r;
> +
> + return 0;
> +}
> +
>  drm_public int amdgpu_cs_query_reset_state(amdgpu_context_handle
> context,
>  uint32_t *state, uint32_t *hangs)  {
> --
> 2.21.0
> 
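
A usage sketch (AMDGPU_CTX_PRIORITY_HIGH is the existing enum from
amdgpu_drm.h):

	/* bump an existing context to high priority via an authorized master fd */
	int r = amdgpu_cs_ctx_override_priority(dev, context, master_fd,
						AMDGPU_CTX_PRIORITY_HIGH);
	if (r)
		fprintf(stderr, "priority override failed: %d\n", r);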

Re:[PATCH v2] drm: introduce a capability flag for syncobj timeline support

2019-04-16 Thread Zhou, David(ChunMing)
Reviewed-by: Chunming Zhou  for series.

 Original Message 
Subject: [PATCH v2] drm: introduce a capability flag for syncobj timeline 
support
From: Lionel Landwerlin
To: dri-devel@lists.freedesktop.org
CC: Lionel Landwerlin ,"Koenig, Christian" ,Dave Airlie ,Daniel Vetter ,"Zhou, 
David(ChunMing)"

Unfortunately userspace users of this API cannot be publicly disclosed
yet.

This commit effectively disables timeline syncobj ioctls for all
drivers. Each driver wishing to support this feature will need to
expose DRIVER_SYNCOBJ_TIMELINE.

v2: Add uAPI capability check (Christian)

Signed-off-by: Lionel Landwerlin 
Reviewed-by: Christian König  (v1)
Cc: Dave Airlie 
Cc: Daniel Vetter 
Cc: Christian König 
Cc: Chunming Zhou 
---
 drivers/gpu/drm/drm_ioctl.c   |  3 +++
 drivers/gpu/drm/drm_syncobj.c | 10 +-
 include/drm/drm_drv.h |  7 +++
 include/uapi/drm/drm.h|  1 +
 4 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index d337f161909c..15ca94338d55 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -245,6 +245,9 @@ static int drm_getcap(struct drm_device *dev, void *data, 
struct drm_file *file_
 case DRM_CAP_SYNCOBJ:
 req->value = drm_core_check_feature(dev, DRIVER_SYNCOBJ);
 return 0;
+   case DRM_CAP_SYNCOBJ_TIMELINE:
+   req->value = drm_core_check_feature(dev, 
DRIVER_SYNCOBJ_TIMELINE);
+   return 0;
 }

 /* Other caps only work with KMS drivers */
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index fb65f13d25cf..72a38ff6e3e4 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -755,7 +755,7 @@ drm_syncobj_transfer_ioctl(struct drm_device *dev, void 
*data,
 struct drm_syncobj_transfer *args = data;
 int ret;

-   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
 return -EOPNOTSUPP;

 if (args->pad)
@@ -1106,7 +1106,7 @@ drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, 
void *data,
 struct drm_syncobj **syncobjs;
 int ret = 0;

-   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
 return -EOPNOTSUPP;

 if (args->flags & ~(DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL |
@@ -1210,7 +1210,7 @@ drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, 
void *data,
 uint32_t i, j;
 int ret;

-   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
 return -EOPNOTSUPP;

 if (args->pad != 0)
@@ -1281,8 +1281,8 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void 
*data,
 uint32_t i;
 int ret;

-   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
-   return -ENODEV;
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE))
+   return -EOPNOTSUPP;

 if (args->pad != 0)
 return -EINVAL;
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index 5cc7f728ec73..68ca736c548d 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -91,6 +91,13 @@ enum drm_driver_feature {
  * submission.
  */
 DRIVER_SYNCOBJ  = BIT(5),
+   /**
+* @DRIVER_SYNCOBJ_TIMELINE:
+*
+* Driver supports the timeline flavor of _syncobj for explicit
+* synchronization of command submission.
+*/
+   DRIVER_SYNCOBJ_TIMELINE = BIT(6),

 /* IMPORTANT: Below are all the legacy flags, add new ones above. */

diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 236b01a1fabf..661d73f9a919 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -649,6 +649,7 @@ struct drm_gem_open {
 #define DRM_CAP_PAGE_FLIP_TARGET0x11
 #define DRM_CAP_CRTC_IN_VBLANK_EVENT0x12
 #define DRM_CAP_SYNCOBJ 0x13
+#define DRM_CAP_SYNCOBJ_TIMELINE   0x14

 /** DRM_IOCTL_GET_CAP ioctl argument type */
 struct drm_get_cap {
--
2.21.0.392.gf8f6787159e
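
Userspace would then gate timeline usage on the new cap, e.g. (sketch):

	uint64_t cap = 0;

	if (drmGetCap(fd, DRM_CAP_SYNCOBJ_TIMELINE, &cap) == 0 && cap == 1) {
		/* safe to use the timeline wait/signal/query ioctls */
	}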


RE: [PATCH 2/9] drm/syncobj: add new drm_syncobj_add_point interface v4

2019-03-31 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Lionel Landwerlin 
> Sent: Saturday, March 30, 2019 10:09 PM
> To: Koenig, Christian ; Zhou, David(ChunMing)
> ; dri-devel@lists.freedesktop.org; amd-
> g...@lists.freedesktop.org; ja...@jlekstrand.net; Hector, Tobias
> 
> Subject: Re: [PATCH 2/9] drm/syncobj: add new drm_syncobj_add_point
> interface v4
> 
> On 28/03/2019 15:18, Christian König wrote:
> > On 28.03.19 at 14:50, Lionel Landwerlin wrote:
> >> On 25/03/2019 08:32, Chunming Zhou wrote:
> >>> From: Christian König 
> >>>
> >>> Use the dma_fence_chain object to create a timeline of fence objects
> >>> instead of just replacing the existing fence.
> >>>
> >>> v2: rebase and cleanup
> >>> v3: fix garbage collection parameters
> >>> v4: add unorder point check, print a warn calltrace
> >>>
> >>> Signed-off-by: Christian König 
> >>> Cc: Lionel Landwerlin 
> >>> ---
> >>>   drivers/gpu/drm/drm_syncobj.c | 39
> >>> +++
> >>>   include/drm/drm_syncobj.h |  5 +
> >>>   2 files changed, 44 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/drm_syncobj.c
> >>> b/drivers/gpu/drm/drm_syncobj.c index 5329e66598c6..19a9ce638119
> >>> 100644
> >>> --- a/drivers/gpu/drm/drm_syncobj.c
> >>> +++ b/drivers/gpu/drm/drm_syncobj.c
> >>> @@ -122,6 +122,45 @@ static void drm_syncobj_remove_wait(struct
> >>> drm_syncobj *syncobj,
> >>>   spin_unlock(>lock);
> >>>   }
> >>>   +/**
> >>> + * drm_syncobj_add_point - add new timeline point to the syncobj
> >>> + * @syncobj: sync object to add timeline point do
> >>> + * @chain: chain node to use to add the point
> >>> + * @fence: fence to encapsulate in the chain node
> >>> + * @point: sequence number to use for the point
> >>> + *
> >>> + * Add the chain node as new timeline point to the syncobj.
> >>> + */
> >>> +void drm_syncobj_add_point(struct drm_syncobj *syncobj,
> >>> +   struct dma_fence_chain *chain,
> >>> +   struct dma_fence *fence,
> >>> +   uint64_t point)
> >>> +{
> >>> +    struct syncobj_wait_entry *cur, *tmp;
> >>> +    struct dma_fence *prev;
> >>> +
> >>> +    dma_fence_get(fence);
> >>> +
> >>> +    spin_lock(&syncobj->lock);
> >>> +
> >>> +    prev = drm_syncobj_fence_get(syncobj);
> >>> +    /* You are adding an unorder point to timeline, which could
> >>> cause payload returned from query_ioctl is 0! */
> >>> +    WARN_ON_ONCE(prev && prev->seqno >= point);
> >>
> >>
> >> I think the WARN/BUG macros should only fire when there is an issue
> >> with programming from within the kernel.
> >>
> >> But this particular warning can be triggered by an application.
> >>
> >>
> >> Probably best to just remove it?
> >
> > Yeah, that was also my argument against it.
> >
> > Key point here is that we still want to note somehow that userspace
> > did something wrong and returning an error is not an option.
> >
> > Maybe just use DRM_ERROR with a static variable to print the message
> > only once.
> >
> > Christian.
> 
> I don't really see any point in printing an error only once. If you run your
> application twice, you end up thinking there was an issue just on the first
> run, but it's actually always wrong.
> 

Apart from this nitpick, is there any other concern about pushing the whole 
patch set? Is it time to push it?

-David

> 
> Unless we're willing to take the syncobj lock for longer periods of time when
> adding points, I guess we'll have to defer validation to validation layers.
> 
> 
> -Lionel
> 
> >
> >>
> >>
> >> -Lionel
> >>
> >>
> >>> +    dma_fence_chain_init(chain, prev, fence, point);
> >>> +    rcu_assign_pointer(syncobj->fence, &chain->base);
> >>> +
> >>> +    list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
> >>> +    list_del_init(&cur->node);
> >>> +    syncobj_wait_syncobj_func(syncobj, cur);
> >>> +    }
> >>> +    spin_unlock(&syncobj->lock);
> >>> +
> >>> +    /* Walk the chain once to trigger garbage collection */
> >>> +    dma_fence_c

Re:[PATCH 1/9] dma-buf: add new dma_fence_chain container v6

2019-03-21 Thread Zhou, David(ChunMing)
Can the cmpxchg be replaced by some simple C statement?
Otherwise we have to remove the __rcu annotation from chain->prev.

-David

 Original Message 
Subject: Re: [PATCH 1/9] dma-buf: add new dma_fence_chain container v6
From: Christian König
To: "Zhou, David(ChunMing)" ,kbuild test robot ,"Zhou, David(ChunMing)"
CC: 
kbuild-...@01.org,dri-devel@lists.freedesktop.org,amd-...@lists.freedesktop.org,lionel.g.landwer...@intel.com,ja...@jlekstrand.net,"Koenig,
 Christian" ,"Hector, Tobias"

Hi David,

For the cmpxchg() case, offhand I don't know either. It looks like so far
nobody has used cmpxchg() with RCU-protected structures.

The other cases should be replaced by RCU_INIT_POINTER() or
rcu_dereference_protected(.., true);
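
For example (a sketch, using the fields from the chain patch):

	/* at init time, where no concurrent readers exist yet */
	RCU_INIT_POINTER(chain->prev, prev);

	/* at a site where we know we hold exclusive access */
	prev = rcu_dereference_protected(chain->prev, true);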

Regards,
Christian.

On 21.03.19 at 07:34, zhoucm1 wrote:
> Hi Lionel and Christian,
>
> Below is the robot report for chain->prev, which was annotated __rcu as you
> suggested.
>
> How do we fix this line: "tmp = cmpxchg(&chain->prev, prev, replacement);"?
> I checked the kernel header files; there seems to be no cmpxchg variant for
> RCU pointers.
>
> Any suggestion on how to fix this robot report?
>
> Thanks,
> -David
>
> On 2019-03-21 08:24, kbuild test robot wrote:
>> Hi Chunming,
>>
>> I love your patch! Perhaps something to improve:
>>
>> [auto build test WARNING on linus/master]
>> [also build test WARNING on v5.1-rc1 next-20190320]
>> [if your patch is applied to the wrong git tree, please drop us a
>> note to help improve the system]
>>
>> url:
>> https://github.com/0day-ci/linux/commits/Chunming-Zhou/dma-buf-add-new-dma_fence_chain-container-v6/20190320-223607
>> reproduce:
>>  # apt-get install sparse
>>  make ARCH=x86_64 allmodconfig
>>  make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'
>>
>>
>> sparse warnings: (new ones prefixed by >>)
>>
>>>> drivers/dma-buf/dma-fence-chain.c:73:23: sparse: incorrect type in
>>>> initializer (different address spaces) @@expected struct
>>>> dma_fence [noderef] *__old @@got  dma_fence [noderef]
>>>> *__old @@
>> drivers/dma-buf/dma-fence-chain.c:73:23:expected struct
>> dma_fence [noderef] *__old
>> drivers/dma-buf/dma-fence-chain.c:73:23:got struct dma_fence
>> *[assigned] prev
>>>> drivers/dma-buf/dma-fence-chain.c:73:23: sparse: incorrect type in
>>>> initializer (different address spaces) @@expected struct
>>>> dma_fence [noderef] *__new @@got  dma_fence [noderef]
>>>> *__new @@
>> drivers/dma-buf/dma-fence-chain.c:73:23:expected struct
>> dma_fence [noderef] *__new
>> drivers/dma-buf/dma-fence-chain.c:73:23:got struct dma_fence
>> *[assigned] replacement
>>>> drivers/dma-buf/dma-fence-chain.c:73:21: sparse: incorrect type in
>>>> assignment (different address spaces) @@expected struct
>>>> dma_fence *tmp @@got struct dma_fence [noderef] >>> dma_fence *tmp @@
>> drivers/dma-buf/dma-fence-chain.c:73:21:expected struct
>> dma_fence *tmp
>> drivers/dma-buf/dma-fence-chain.c:73:21:got struct dma_fence
>> [noderef] *[assigned] __ret
>>>> drivers/dma-buf/dma-fence-chain.c:190:28: sparse: incorrect type in
>>>> argument 1 (different address spaces) @@expected struct
>>>> dma_fence *fence @@got struct dma_fence struct dma_fence *fence @@
>> drivers/dma-buf/dma-fence-chain.c:190:28:expected struct
>> dma_fence *fence
>> drivers/dma-buf/dma-fence-chain.c:190:28:got struct dma_fence
>> [noderef] *prev
>>>> drivers/dma-buf/dma-fence-chain.c:222:21: sparse: incorrect type in
>>>> assignment (different address spaces) @@expected struct
>>>> dma_fence [noderef] *prev @@got [noderef] *prev @@
>> drivers/dma-buf/dma-fence-chain.c:222:21:expected struct
>> dma_fence [noderef] *prev
>> drivers/dma-buf/dma-fence-chain.c:222:21:got struct dma_fence
>> *prev
>> drivers/dma-buf/dma-fence-chain.c:235:33: sparse: expression
>> using sizeof(void)
>> drivers/dma-buf/dma-fence-chain.c:235:33: sparse: expression
>> using sizeof(void)
>>
>> vim +73 drivers/dma-buf/dma-fence-chain.c
>>
>>  38
>>  39/**
>>  40 * dma_fence_chain_walk - chain walking function
>>  41 * @fence: current chain node
>>  42 *
>>  43 * Walk the chain to the next node. Returns the next fence
>> or NULL if we are at
>>  44 * the end of the chain. Garbage collects chain nodes
>> which are already
>> 

RE: [PATCH] drm/amdgpu: Error handling issues about CHECKED_RETURN

2019-02-13 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Bo YU 
> Sent: Thursday, February 14, 2019 12:46 PM
> To: Deucher, Alexander ; Koenig, Christian
> ; Zhou, David(ChunMing)
> ; airl...@linux.ie; dan...@ffwll.ch; Zhu, Rex
> ; Grodzovsky, Andrey
> ; dri-devel@lists.freedesktop.org; linux-
> ker...@vger.kernel.org
> Cc: Bo Yu ; amd-...@lists.freedesktop.org
> Subject: [PATCH] drm/amdgpu: Error handling issues about
> CHECKED_RETURN
> 
> From: Bo Yu 
> 
> Calling "amdgpu_ring_test_helper" without checking return value

We may need to continue testing the remaining rings even if one ring test fails.

-David
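
Something like this would keep testing all rings and just remember the first
failure (a sketch, not tested):

	for (i = 0; i < adev->gfx.num_compute_rings; i++) {
		ring = &adev->gfx.compute_ring[i];
		r = amdgpu_ring_test_helper(ring);
		if (r && !ret)
			ret = r;	/* record the first error, keep going */
	}

	return ret;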

> 
> Signed-off-by: Bo Yu 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index 57cb3a51bda7..48465a61516b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -4728,7 +4728,9 @@ static int gfx_v8_0_cp_test_all_rings(struct
> amdgpu_device *adev)
> 
>   for (i = 0; i < adev->gfx.num_compute_rings; i++) {
>   ring = >gfx.compute_ring[i];
> - amdgpu_ring_test_helper(ring);
> + r = amdgpu_ring_test_helper(ring);
> + if (r)
> + return r;
>   }
> 
>   return 0;
> --
> 2.11.0


RE: [Intel-gfx] [PATCH 03/10] drm/syncobj: add new drm_syncobj_add_point interface v2

2018-12-12 Thread Zhou, David(ChunMing)
+ Daniel Rakos and Jason Ekstrand.

Below is the background from Daniel R., which should explain why:
" ISVs, especially those coming from D3D12, are unsatisfied with the behavior 
of the Vulkan semaphores as they are unhappy with the fact that for every 
single dependency they need to use separate semaphores due to their binary 
nature.
Compared to that a synchronization primitive like D3D12 monitored fences enable 
one of those to be used to track a sequence of operations by simply associating 
timeline values to the completion of individual operations. This allows them to 
track the lifetime and usage of resources and the ordered completion of 
sequences.
Besides that, they also want to use a single synchronization primitive to be 
able to handle GPU-to-GPU and GPU-to-CPU dependencies, compared to using 
semaphores for the former and fences for the latter.
In addition, compared to legacy semaphores, timeline semaphores are proposed to 
support wait-before-signal, i.e. allow enqueueing a semaphore wait operation 
with a wait value that is larger than any of the already enqueued signal 
values. This seems to be a hard requirement for ISVs without UMD-side queue 
batching, and even UMD-side queue batching doesn't help the situation when such 
a semaphore is externally shared with another API. Thus in order to properly 
support wait-before-signal the KMD implementation has to also be able to 
support such dependencies.
"

Btw, we have already added a test case to IGT, and the series has been tested 
with many existing tests: the libdrm unit tests, the IGT tests, the Vulkan CTS, 
and Steam games.

-David
> -Original Message-
> From: Daniel Vetter 
> Sent: Wednesday, December 12, 2018 7:15 PM
> To: Koenig, Christian 
> Cc: Zhou, David(ChunMing) ; dri-devel  de...@lists.freedesktop.org>; amd-gfx list ;
> intel-gfx ; Christian König
> 
> Subject: Re: [Intel-gfx] [PATCH 03/10] drm/syncobj: add new
> drm_syncobj_add_point interface v2
> 
> On Wed, Dec 12, 2018 at 12:08 PM Koenig, Christian
>  wrote:
> >
> > Am 12.12.18 um 11:49 schrieb Daniel Vetter:
> > > On Fri, Dec 07, 2018 at 11:54:15PM +0800, Chunming Zhou wrote:
> > >> From: Christian König 
> > >>
> > >> Use the dma_fence_chain object to create a timeline of fence
> > >> objects instead of just replacing the existing fence.
> > >>
> > >> v2: rebase and cleanup
> > >>
> > >> Signed-off-by: Christian König 
> > > Somewhat jumping back into this. Not sure we discussed this already
> > > or not. I'm a bit unclear on why we have to chain the fences in the
> timeline:
> > >
> > > - The timeline stuff is modelled after the WDDM2 monitored fences.
> Which
> > >really are just u64 counters in memory somewhere (I think could be
> > >system ram or vram). Because WDDM2 has the memory management
> entirely
> > >separated from rendering synchronization it totally allows userspace to
> > >create loops and deadlocks and everything else nasty using this - the
> > >memory manager won't deadlock because these monitored fences
> never leak
> > >into the buffer manager. And if CS deadlock, gpu reset takes care of 
> > > the
> > >mess.
> > >
> > > - This has a few consequences, as in they seem to indeed work like a
> > >memory location: Userspace incrementing out-of-order (because they
> run
> > >batches updating the same fence on different engines) is totally fine,
> > >as is doing anything else "stupid".
> > >
> > > - Now on linux we can't allow anything, because we need to make sure
> that
> > >deadlocks don't leak into the memory manager. But as long as we block
> > >until the underlying dma_fence has materialized, nothing userspace can
> > >do will lead to such a deadlock. Even if userspace ends up submitting
> > >jobs without enough built-in synchronization, leading to out-of-order
> > >signalling of fences on that "timeline". And I don't think that would
> > >pose a problem for us.
> > >
> > > Essentially I think we can look at timeline syncobj as a dma_fence
> > > container indexed through an integer, and there's no need to enforce
> > > that the timline works like a real dma_fence timeline, with all it's
> > > guarantees. It's just a pile of (possibly, if userspace is stupid)
> > > unrelated dma_fences. You could implement the entire thing in
> > > userspace after all, except for the "we want to share these timeline
> > > objects between processes" problem.
> > >
> > > tldr; I think

RE: [PATCH 01/10] dma-buf: add new dma_fence_chain container v4

2018-12-11 Thread Zhou, David(ChunMing)
Hi Daniel and Chris,

Could you take a look at all the patches? Can we get your R-b or A-b on all 
patches, including the IGT patch, before we submit to drm-misc?

We have already fixed all known issues and also added a test case to IGT as 
you required.

Btw, the patch set is tested by below tests:
a. vulkan cts  " ./deqp-vk -n dEQP-VK. *semaphore*" 
b. internal vulkan timeline test
c. libdrm test "sudo ./amdgpu_test -s 9"
d. IGT test, "sudo ./syncobj_basic"
e. IGT test, "sudo ./syncobj_wait"
f. IGT test, "sudo ./syncobj_timeline"

Any other suggestion or requirement is welcome.

-David

> -Original Message-
> From: dri-devel  On Behalf Of
> Chunming Zhou
> Sent: Tuesday, December 11, 2018 6:35 PM
> To: Koenig, Christian ; dri-
> de...@lists.freedesktop.org; amd-...@lists.freedesktop.org; intel-
> g...@lists.freedesktop.org
> Cc: Christian König ; Koenig, Christian
> 
> Subject: [PATCH 01/10] dma-buf: add new dma_fence_chain container v4
> 
> From: Christian König 
> 
> Lockless container implementation similar to a dma_fence_array, but with
> only two elements per node and automatic garbage collection.
> 
> v2: properly document dma_fence_chain_for_each, add
> dma_fence_chain_find_seqno,
> drop prev reference during garbage collection if it's not a chain fence.
> v3: use head and iterator for dma_fence_chain_for_each
> v4: fix reference count in dma_fence_chain_enable_signaling
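
(A usage sketch of the iterator from v3, based on the macro documented in
this patch:)

	struct dma_fence *iter;

	/* scan the chain for the first node at or past `point` */
	dma_fence_chain_for_each(iter, head) {
		if (iter->seqno >= point)
			break;		/* iter holds a reference here */
	}
	dma_fence_put(iter);		/* put is a no-op if iter is NULL */
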
> 
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/Makefile  |   3 +-
>  drivers/dma-buf/dma-fence-chain.c | 241
> ++
>  include/linux/dma-fence-chain.h   |  81 ++
>  3 files changed, 324 insertions(+), 1 deletion(-)  create mode 100644
> drivers/dma-buf/dma-fence-chain.c  create mode 100644 include/linux/dma-
> fence-chain.h
> 
> diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile index
> 0913a6ccab5a..1f006e083eb9 100644
> --- a/drivers/dma-buf/Makefile
> +++ b/drivers/dma-buf/Makefile
> @@ -1,4 +1,5 @@
> -obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-
> fence.o
> +obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
> +  reservation.o seqno-fence.o
>  obj-$(CONFIG_SYNC_FILE)  += sync_file.o
>  obj-$(CONFIG_SW_SYNC)+= sw_sync.o sync_debug.o
>  obj-$(CONFIG_UDMABUF)+= udmabuf.o
> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-
> fence-chain.c
> new file mode 100644
> index ..0c5e3c902fa0
> --- /dev/null
> +++ b/drivers/dma-buf/dma-fence-chain.c
> @@ -0,0 +1,241 @@
> +/*
> + * fence-chain: chain fences together in a timeline
> + *
> + * Copyright (C) 2018 Advanced Micro Devices, Inc.
> + * Authors:
> + *   Christian König 
> + *
> + * This program is free software; you can redistribute it and/or modify
> +it
> + * under the terms of the GNU General Public License version 2 as
> +published by
> + * the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> +WITHOUT
> + * ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY
> +or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +License for
> + * more details.
> + */
> +
> +#include 
> +
> +static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
> +
> +/**
> + * dma_fence_chain_get_prev - use RCU to get a reference to the
> +previous fence
> + * @chain: chain node to get the previous node from
> + *
> + * Use dma_fence_get_rcu_safe to get a reference to the previous fence
> +of the
> + * chain node.
> + */
> +static struct dma_fence *dma_fence_chain_get_prev(struct
> +dma_fence_chain *chain) {
> + struct dma_fence *prev;
> +
> + rcu_read_lock();
> + prev = dma_fence_get_rcu_safe(&chain->prev);
> + rcu_read_unlock();
> + return prev;
> +}
> +
> +/**
> + * dma_fence_chain_walk - chain walking function
> + * @fence: current chain node
> + *
> + * Walk the chain to the next node. Returns the next fence or NULL if
> +we are at
> + * the end of the chain. Garbage collects chain nodes which are already
> + * signaled.
> + */
> +struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence) {
> + struct dma_fence_chain *chain, *prev_chain;
> + struct dma_fence *prev, *replacement, *tmp;
> +
> + chain = to_dma_fence_chain(fence);
> + if (!chain) {
> + dma_fence_put(fence);
> + return NULL;
> + }
> +
> + while ((prev = dma_fence_chain_get_prev(chain))) {
> +
> + prev_chain = to_dma_fence_chain(prev);
> + if (prev_chain) {
> + if (!dma_fence_is_signaled(prev_chain->fence))
> + break;
> +
> + replacement =
> dma_fence_chain_get_prev(prev_chain);
> + } else {
> + if (!dma_fence_is_signaled(prev))
> + break;
> +
> + replacement = NULL;
> + }
> +
> + tmp 

RE: [PATCH v3 2/2] drm/sched: Rework HW fence processing.

2018-12-10 Thread Zhou, David(ChunMing)
I don't think adding the cb to the sched job will work, since its lifetime
differs from the fence's.
Unless you make the sched job reference-counted, we will get into trouble
sooner or later.

-David
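
A sketch of that refcounting (hypothetical structure, not the actual
scheduler code):

	struct my_job {
		struct kref refcount;
		struct dma_fence_cb cb;
		/* ... */
	};

	static void my_job_release(struct kref *kref)
	{
		kfree(container_of(kref, struct my_job, refcount));
	}

	static void my_job_fence_cb(struct dma_fence *f, struct dma_fence_cb *cb)
	{
		struct my_job *job = container_of(cb, struct my_job, cb);

		/* ... handle the HW fence signal ... */
		kref_put(&job->refcount, my_job_release);	/* drop cb's ref */
	}

	/* the submitter takes the extra reference before arming the callback:
	 * kref_get(&job->refcount);
	 * dma_fence_add_callback(fence, &job->cb, my_job_fence_cb);
	 */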

> -Original Message-
> From: amd-gfx  On Behalf Of
> Andrey Grodzovsky
> Sent: Tuesday, December 11, 2018 5:44 AM
> To: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org;
> ckoenig.leichtzumer...@gmail.com; e...@anholt.net;
> etna...@lists.freedesktop.org
> Cc: Zhou, David(ChunMing) ; Liu, Monk
> ; Grodzovsky, Andrey
> 
> Subject: [PATCH v3 2/2] drm/sched: Rework HW fence processing.
> 
> Expedite job deletion from ring mirror list to the HW fence signal callback
> instead from finish_work, together with waiting for all such fences to signal 
> in
> drm_sched_stop we garantee that already signaled job will not be processed
> twice.
> Remove the sched finish fence callback and just submit finish_work directly
> from the HW fence callback.
> 
> v2: Fix comments.
> 
> v3: Attach  hw fence cb to sched_job
> 
> Suggested-by: Christian Koenig 
> Signed-off-by: Andrey Grodzovsky 
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 58 --
> 
>  include/drm/gpu_scheduler.h|  6 ++--
>  2 files changed, 30 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index cdf95e2..f0c1f32 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -284,8 +284,6 @@ static void drm_sched_job_finish(struct work_struct
> *work)
>   cancel_delayed_work_sync(&sched->work_tdr);
> 
>   spin_lock_irqsave(&sched->job_list_lock, flags);
> - /* remove job from ring_mirror_list */
> - list_del_init(&s_job->node);
>   /* queue TDR for next job */
>   drm_sched_start_timeout(sched);
>   spin_unlock_irqrestore(&sched->job_list_lock, flags);
> @@ -293,22 +291,11 @@ static void drm_sched_job_finish(struct work_struct *work)
>   sched->ops->free_job(s_job);
>  }
> 
> -static void drm_sched_job_finish_cb(struct dma_fence *f,
> - struct dma_fence_cb *cb)
> -{
> - struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
> -  finish_cb);
> - schedule_work(&job->finish_work);
> -}
> -
>  static void drm_sched_job_begin(struct drm_sched_job *s_job)  {
>   struct drm_gpu_scheduler *sched = s_job->sched;
>   unsigned long flags;
> 
> - dma_fence_add_callback(&s_job->s_fence->finished, &s_job->finish_cb,
> -drm_sched_job_finish_cb);
> -
>   spin_lock_irqsave(&sched->job_list_lock, flags);
>   list_add_tail(&s_job->node, &sched->ring_mirror_list);
>   drm_sched_start_timeout(sched);
> @@ -359,12 +346,11 @@ void drm_sched_stop(struct drm_gpu_scheduler
> *sched, struct drm_sched_job *bad,
>   list_for_each_entry_reverse(s_job, &sched->ring_mirror_list, node)
> {
>   if (s_job->s_fence->parent &&
>   dma_fence_remove_callback(s_job->s_fence->parent,
> -   &s_job->s_fence->cb)) {
> +   &s_job->cb)) {
>   dma_fence_put(s_job->s_fence->parent);
>   s_job->s_fence->parent = NULL;
>   atomic_dec(&sched->hw_rq_count);
> - }
> - else {
> + } else {
>   /* TODO Is it get/put necessary here ? */
>   dma_fence_get(&s_job->s_fence->finished);
>   list_add(&s_job->finish_node, &wait_list);
> @@ -417,31 +403,34 @@ EXPORT_SYMBOL(drm_sched_stop);
>  void drm_sched_start(struct drm_gpu_scheduler *sched, bool unpark_only)
>  {
>   struct drm_sched_job *s_job, *tmp;
> - unsigned long flags;
>   int r;
> 
>   if (unpark_only)
>   goto unpark;
> 
> - spin_lock_irqsave(&sched->job_list_lock, flags);
> + /*
> +  * Locking the list is not required here as the sched thread is parked
> +  * so no new jobs are being pushed in to HW and in drm_sched_stop
> we
> +  * flushed all the jobs who were still in mirror list but who already
> +  * signaled and removed them self from the list. Also concurrent
> +  * GPU recovers can't run in parallel.
> +  */
>   list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list,
> node) {
> - struct drm_sched_fence *s_fence = s_job->s_fence;
>   struct dma_fence *fence = s_job->s_fence->parent;

RE: [PATCH -next] drm/amdgpu: remove set but not used variable 'grbm_soft_reset'

2018-12-09 Thread Zhou, David(ChunMing)


> -Original Message-
> From: YueHaibing 
> Sent: Saturday, December 08, 2018 11:01 PM
> To: Deucher, Alexander ; Koenig, Christian
> ; Zhou, David(ChunMing)
> ; airl...@linux.ie; Liu, Leo ;
> Gao, Likun ; Panariti, David
> ; S, Shirish ; Zhu, Rex
> ; Grodzovsky, Andrey 
> Cc: YueHaibing ; amd-...@lists.freedesktop.org;
> dri-devel@lists.freedesktop.org; linux-ker...@vger.kernel.org; kernel-
> janit...@vger.kernel.org
> Subject: [PATCH -next] drm/amdgpu: remove set but not used variable
> 'grbm_soft_reset'
> 
> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c: In function
> 'gfx_v8_0_pre_soft_reset':
> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c:4950:27: warning:
>  variable 'srbm_soft_reset' set but not used [-Wunused-but-set-variable]
> 
> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c: In function
> 'gfx_v8_0_post_soft_reset':
> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c:5054:27: warning:
>  variable 'srbm_soft_reset' set but not used [-Wunused-but-set-variable]
> 
> It never used since introduction in commit d31a501ead7f ("drm/amdgpu: add
> pre_soft_reset ip func") and e4ae0fc33631 ("drm/amdgpu: implement
> gfx8 post_soft_reset")
> 
> Signed-off-by: YueHaibing 

Reviewed-by: Chunming Zhou 

> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> index 1454fc3..8c1ba79 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
> @@ -4947,14 +4947,13 @@ static bool gfx_v8_0_check_soft_reset(void
> *handle)  static int gfx_v8_0_pre_soft_reset(void *handle)  {
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> - u32 grbm_soft_reset = 0, srbm_soft_reset = 0;
> + u32 grbm_soft_reset = 0;
> 
>   if ((!adev->gfx.grbm_soft_reset) &&
>   (!adev->gfx.srbm_soft_reset))
>   return 0;
> 
>   grbm_soft_reset = adev->gfx.grbm_soft_reset;
> - srbm_soft_reset = adev->gfx.srbm_soft_reset;
> 
>   /* stop the rlc */
>   adev->gfx.rlc.funcs->stop(adev);
> @@ -5051,14 +5050,13 @@ static int gfx_v8_0_soft_reset(void *handle)
> static int gfx_v8_0_post_soft_reset(void *handle)  {
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> - u32 grbm_soft_reset = 0, srbm_soft_reset = 0;
> + u32 grbm_soft_reset = 0;
> 
>   if ((!adev->gfx.grbm_soft_reset) &&
>   (!adev->gfx.srbm_soft_reset))
>   return 0;
> 
>   grbm_soft_reset = adev->gfx.grbm_soft_reset;
> - srbm_soft_reset = adev->gfx.srbm_soft_reset;
> 
>   if (REG_GET_FIELD(grbm_soft_reset, GRBM_SOFT_RESET,
> SOFT_RESET_CP) ||
>   REG_GET_FIELD(grbm_soft_reset, GRBM_SOFT_RESET,
> SOFT_RESET_CPF) ||
> 
> 



RE: [PATCH 2/2] drm/sched: Rework HW fence processing.

2018-12-06 Thread Zhou, David(ChunMing)


> -Original Message-
> From: dri-devel  On Behalf Of
> Andrey Grodzovsky
> Sent: Friday, December 07, 2018 1:41 AM
> To: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org;
> ckoenig.leichtzumer...@gmail.com; e...@anholt.net;
> etna...@lists.freedesktop.org
> Cc: Liu, Monk 
> Subject: [PATCH 2/2] drm/sched: Rework HW fence processing.
> 
> Expedite job deletion from ring mirror list to the HW fence signal callback
> instead from finish_work, together with waiting for all such fences to signal 
> in
> drm_sched_stop we garantee that already signaled job will not be processed
> twice.
> Remove the sched finish fence callback and just submit finish_work directly
> from the HW fence callback.
> 
> Suggested-by: Christian Koenig 
> Signed-off-by: Andrey Grodzovsky 
> ---
>  drivers/gpu/drm/scheduler/sched_fence.c |  4 +++-
> drivers/gpu/drm/scheduler/sched_main.c  | 39 --
> ---
>  include/drm/gpu_scheduler.h | 10 +++--
>  3 files changed, 30 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c
> b/drivers/gpu/drm/scheduler/sched_fence.c
> index d8d2dff..e62c239 100644
> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> @@ -151,7 +151,8 @@ struct drm_sched_fence
> *to_drm_sched_fence(struct dma_fence *f)
> EXPORT_SYMBOL(to_drm_sched_fence);
> 
>  struct drm_sched_fence *drm_sched_fence_create(struct
> drm_sched_entity *entity,
> -void *owner)
> +void *owner,
> +struct drm_sched_job *s_job)
>  {
>   struct drm_sched_fence *fence = NULL;
>   unsigned seq;
> @@ -163,6 +164,7 @@ struct drm_sched_fence
> *drm_sched_fence_create(struct drm_sched_entity *entity,
>   fence->owner = owner;
>   fence->sched = entity->rq->sched;
>   spin_lock_init(&fence->lock);
> + fence->s_job = s_job;
> 
>   seq = atomic_inc_return(&entity->fence_seq);
>   dma_fence_init(&fence->scheduled,
> &drm_sched_fence_ops_scheduled,
> diff --git
> a/drivers/gpu/drm/scheduler/sched_main.c
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 8fb7f86..2860037 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -284,31 +284,17 @@ static void drm_sched_job_finish(struct
> work_struct *work)
>   cancel_delayed_work_sync(&sched->work_tdr);
> 
>   spin_lock_irqsave(&sched->job_list_lock, flags);
> - /* remove job from ring_mirror_list */
> - list_del_init(&s_job->node);
> - /* queue TDR for next job */
>   drm_sched_start_timeout(sched);
>   spin_unlock_irqrestore(&sched->job_list_lock, flags);
> 
>   sched->ops->free_job(s_job);
>  }
> 
> -static void drm_sched_job_finish_cb(struct dma_fence *f,
> - struct dma_fence_cb *cb)
> -{
> - struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
> -  finish_cb);
> - schedule_work(&job->finish_work);
> -}
> -
>  static void drm_sched_job_begin(struct drm_sched_job *s_job)
>  {
>   struct drm_gpu_scheduler *sched = s_job->sched;
>   unsigned long flags;
> 
> - dma_fence_add_callback(&s_job->s_fence->finished, &s_job->finish_cb,
> -drm_sched_job_finish_cb);
> -
>   spin_lock_irqsave(&sched->job_list_lock, flags);
>   list_add_tail(&s_job->node, &sched->ring_mirror_list);
>   drm_sched_start_timeout(sched);
> @@ -418,13 +404,17 @@ void drm_sched_start(struct drm_gpu_scheduler
> *sched, bool unpark_only)
> {
>   struct drm_sched_job *s_job, *tmp;
>   bool found_guilty = false;
> - unsigned long flags;
>   int r;
> 
>   if (unpark_only)
>   goto unpark;
> 
> - spin_lock_irqsave(&sched->job_list_lock, flags);
> + /*
> +  * Locking the list is not required here as the sched thread is parked
> +  * so no new jobs are being pushed in to HW and in drm_sched_stop
> we
> +  * flushed any in flight jobs who didn't signal yet. Also concurrent
> +  * GPU recovers can't run in parallel.
> +  */
>   list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list,
> node) {
>   struct drm_sched_fence *s_fence = s_job->s_fence;
>   struct dma_fence *fence;
> @@ -453,7 +443,6 @@ void drm_sched_start(struct drm_gpu_scheduler
> *sched, bool unpark_only)
>   }
> 
>   drm_sched_start_timeout(sched);
> - spin_unlock_irqrestore(&sched->job_list_lock, flags);
> 
>  unpark:
>   kthread_unpark(sched->thread);
> @@ -505,7 +494,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>   job->sched = sched;
>   job->entity = entity;
>   job->s_priority = entity->rq - sched->sched_rq;
> - job->s_fence = drm_sched_fence_create(entity, owner);
> + job->s_fence = drm_sched_fence_create(entity, owner, job);
>   if (!job->s_fence)
>   return -ENOMEM;
>   job->id = atomic64_inc_return(&sched->job_id_count);
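
For context, the counterpart of this change is the HW fence callback doing the
removal itself, which is not visible in the excerpt above. A simplified sketch
of what drm_sched_process_job ends up doing under this rework (details differ
in the actual patch; shown only to illustrate the commit message):

static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
{
	struct drm_sched_fence *s_fence =
		container_of(cb, struct drm_sched_fence, cb);
	struct drm_sched_job *s_job = s_fence->s_job;
	struct drm_gpu_scheduler *sched = s_fence->sched;
	unsigned long flags;

	/* Remove the job from the ring mirror list as soon as its HW fence
	 * signals, instead of deferring the removal to finish_work. */
	spin_lock_irqsave(&sched->job_list_lock, flags);
	list_del_init(&s_job->node);
	spin_unlock_irqrestore(&sched->job_list_lock, flags);

	drm_sched_fence_finished(s_fence);

	/* Submit finish_work directly, no intermediate fence callback. */
	schedule_work(&s_job->finish_work);
}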

RE: [PATCH 02/11] dma-buf: add new dma_fence_chain container v2

2018-12-03 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Christian König 
> Sent: Monday, December 03, 2018 9:56 PM
> To: Zhou, David(ChunMing) ; Koenig, Christian
> ; dri-devel@lists.freedesktop.org; amd-
> g...@lists.freedesktop.org
> Subject: Re: [PATCH 02/11] dma-buf: add new dma_fence_chain container
> v2
> 
> On 03.12.18 at 14:44, Chunming Zhou wrote:
> >
> > On 2018/12/3 21:28, Christian König wrote:
> >> On 03.12.18 at 14:18, Chunming Zhou wrote:
> >>> On 2018/12/3 19:00, Christian König wrote:
> >>>> On 03.12.18 at 06:25, zhoucm1 wrote:
> >>>>> On 2018-11-28 22:50, Christian König wrote:
> >>>>>> Lockless container implementation similar to a dma_fence_array,
> >>>>>> but with only two elements per node and automatic garbage
> >>>>>> collection.
> >>>>>>
> >>>>>> v2: properly document dma_fence_chain_for_each, add
> >>>>>> dma_fence_chain_find_seqno,
> >>>>>>    drop prev reference during garbage collection if it's not
> >>>>>> a chain fence.
> >>>>>>
> >>>>>> Signed-off-by: Christian König 
> >>>>>> ---
 [snip]
> >>>>>> +
> >>>>>> +/**
> >>>>>> + * dma_fence_chain_init - initialize a fence chain
> >>>>>> + * @chain: the chain node to initialize
> >>>>>> + * @prev: the previous fence
> >>>>>> + * @fence: the current fence
> >>>>>> + *
> >>>>>> + * Initialize a new chain node and either start a new chain or
> >>>>>> +add
> >>>>>> the node to
> >>>>>> + * the existing chain of the previous fence.
> >>>>>> + */
> >>>>>> +void dma_fence_chain_init(struct dma_fence_chain *chain,
> >>>>>> +  struct dma_fence *prev,
> >>>>>> +  struct dma_fence *fence,
> >>>>>> +  uint64_t seqno)
> >>>>>> +{
> >>>>>> +    struct dma_fence_chain *prev_chain =
> >>>>>> +to_dma_fence_chain(prev);
> >>>>>> +    uint64_t context;
> >>>>>> +
> >>>>>> +    spin_lock_init(&chain->lock);
> >>>>>> +    chain->prev = prev;
> >>>>>> +    chain->fence = fence;
> >>>>>> +    chain->prev_seqno = 0;
> >>>>>> +    init_irq_work(&chain->work, dma_fence_chain_irq_work);
> >>>>>> +
> >>>>>> +    /* Try to reuse the context of the previous chain node. */
> >>>>>> +    if (prev_chain && seqno > prev->seqno &&
> >>>>>> +    __dma_fence_is_later(seqno, prev->seqno)) {
> >>>>> As your patch#1 makes __dma_fence_is_later only be valid for
> >>>>> 32bit, we cannot use it for 64bit here, we should remove it from
> >>>>> here, just compare seqno directly.
> >>>> That is intentional. We must make sure that the number both
> >>>> increments as 64bit number as well as not wraps around as 32bit
> number.
> >>>>
> >>>> In other words the largest difference between two sequence numbers
> >>>> userspace is allowed to submit is 1<<31.
> >>> Why? No one can make sure of that; application users will just treat
> >>> it as a uint64 sequence number, and they can signal any advanced
> >>> point. I already see UMD guys writing timeline tests that use max_uint64-1
> >>> as a final signal.
> >>> We shouldn't add this limitation here.
> >> We need to be backward compatible to hardware which can only do 32bit
> >> signaling with the dma-fence implementation.
> > I see that, you already explained it before.
> > But can't we just take the low 32 bits of the seqno only when 32-bit
> > hardware tries to use it?
> >
> > Then we can make dma_fence_later use a 64-bit comparison.
> 
> The problem is that we don't know at all times when to use a 32bit compare
> and when to use a 64bit compare.
> 
> What we could do is test if any of the upper 32bits of a sequence number is
> not 0 and if that is the case do a 64bit compare. This way max_uint64_t would
> still be handled correctly.
Sounds like we can give that a try, and in the meanwhile we need to mask the
upper 32 bits for the 32-bit hardware case, right?

-David
> 
> 
> Christian.
> 
> >
> >> Otherwise dma_fence_later() could return an inconsistent result and
> >> break at other places.
> >>
> >> So if userspace wants to use more than 1<<31 difference between
> >> sequence numbers we need to push back on this.
> > It's a rare case, but I don't think we can convince them to accept this
> > limitation. So we cannot add this limitation here.
> >
> > -David
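
For illustration, the mixed-width comparison Christian describes above could
look like the following sketch (not part of the posted series;
upper_32_bits()/lower_32_bits() are the standard kernel helpers):

static inline bool __dma_fence_is_later(u64 f1, u64 f2)
{
	/* Sequence numbers with any of the upper 32 bits set cannot come
	 * from 32-bit-only hardware, so compare the full 64 bits. This
	 * keeps userspace points near max_uint64 working. */
	if (upper_32_bits(f1) || upper_32_bits(f2))
		return f1 > f2;

	/* Otherwise assume potentially wrapping 32-bit hardware seqnos
	 * and compare with wraparound handling. */
	return (int)(lower_32_bits(f1) - lower_32_bits(f2)) > 0;
}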

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH libdrm 4/5] wrap syncobj timeline query/wait APIs for amdgpu v3

2018-11-30 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Christian König 
> Sent: Friday, November 30, 2018 5:15 PM
> To: Zhou, David(ChunMing) ; dri-
> de...@lists.freedesktop.org; amd-...@lists.freedesktop.org
> Subject: Re: [PATCH libdrm 4/5] wrap syncobj timeline query/wait APIs for
> amdgpu v3
> 
[snip]
> >> +drm_public int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
> >> +   uint32_t *handles, uint64_t *points,
> > This interface is public to the UMD; I think they would like "uint64_t
> > **points" for batch query. I've verified it before; it works well and
> > is more convenient.
> > If num_handles is removed, meaning there is only one syncobj to query, I
> > agree with "uint64_t *point".
> 
> "handles" as well as "points" are an array of objects. If the UMD wants to
> write the points to separate locations it can do so manually after calling the
> function.

Ok, it doesn't matter.

-David
> 
> It doesn't make any sense that libdrm or the kernel does the extra
> indirection, the transferred pointers are 64bit as well (even on a 32bit
> system) so the overhead is identical.
> 
> Adding another indirection just makes the implementation unnecessary
> complex.


> 
> Christian.
> 
> >
> > -David
> >> +   unsigned num_handles) {
> >> +    if (NULL == dev)
> >> +    return -EINVAL;
> >> +
> >> +    return drmSyncobjQuery(dev->fd, handles, points, num_handles); }
> >> +
> >>   drm_public int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
> >>   uint32_t handle,
> >>   int *shared_fd)
> >
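
For reference, the flat points array keeps a UMD-side batch query to a single
call; a minimal usage sketch (dev, handle_a and handle_b are assumed to come
from the surrounding code):

	uint32_t handles[2] = { handle_a, handle_b };
	uint64_t points[2] = { 0, 0 };
	int r;

	/* points[i] receives the current timeline payload of handles[i]. */
	r = amdgpu_cs_syncobj_query(dev, handles, points, 2);
	if (r == 0)
		printf("timeline payloads: %llu %llu\n",
		       (unsigned long long)points[0],
		       (unsigned long long)points[1]);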

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH 6/7] drm/syncobj: add new drm_syncobj_add_point interface

2018-11-15 Thread Zhou, David(ChunMing)
I don't know how this works yet; I haven't gone through it completely.

> -Original Message-
> From: Christian König 
> Sent: Thursday, November 15, 2018 7:13 PM
> To: dri-devel@lists.freedesktop.org
> Cc: ch...@chris-wilson.co.uk; daniel.vet...@ffwll.ch; e...@anholt.net; Zhou,
> David(ChunMing) 
> Subject: [PATCH 6/7] drm/syncobj: add new drm_syncobj_add_point
> interface
> 
> Use the dma_fence_chain object to create a timeline of fence objects
> instead of just replacing the existing fence.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/drm_syncobj.c | 40
> 
>  include/drm/drm_syncobj.h |  5 +
>  2 files changed, 45 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_syncobj.c
> b/drivers/gpu/drm/drm_syncobj.c index 4a2e6ef16979..589d884ccd58
> 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -193,6 +193,46 @@ void drm_syncobj_remove_callback(struct
> drm_syncobj *syncobj,
>   spin_unlock(&syncobj->lock);
>  }
> 
> +/**
> + * drm_syncobj_add_point - add new timeline point to the syncobj
> + * @syncobj: sync object to add timeline point do
> + * @chain: chain node to use to add the point
> + * @fence: fence to encapsulate in the chain node
> + * @point: sequence number to use for the point
> + *
> + * Add the chain node as new timeline point to the syncobj.
> + */
> +void drm_syncobj_add_point(struct drm_syncobj *syncobj,
> +struct dma_fence_chain *chain,
> +struct dma_fence *fence,
> +uint64_t point)
> +{
> + struct drm_syncobj_cb *cur, *tmp;
> + struct dma_fence *prev;
> +
> + dma_fence_get(fence);
> + dma_fence_get(fence);
> +
> + spin_lock(&syncobj->lock);
> +
> + prev = rcu_dereference_protected(syncobj->fence,
> +  lockdep_is_held(&syncobj->lock));
> + dma_fence_chain_init(chain, prev, fence, point);
> + rcu_assign_pointer(syncobj->fence, &chain->base);
> +
> + list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
> + list_del_init(&cur->node);
> + cur->func(syncobj, cur);
> + }
> + spin_unlock(&syncobj->lock);
> +
> + /* Walk the chain once to trigger garbage collection */
> + prev = fence;
> + dma_fence_chain_for_each(prev);
> +
> + dma_fence_put(fence);
> +}
> +
>  /**
>   * drm_syncobj_replace_fence - replace fence in a sync object.
>   * @syncobj: Sync object to replace fence in diff --git
> a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index
> c79f5ada7cdb..35a917241e30 100644
> --- a/include/drm/drm_syncobj.h
> +++ b/include/drm/drm_syncobj.h
> @@ -27,6 +27,7 @@
>  #define __DRM_SYNCOBJ_H__
> 
>  #include "linux/dma-fence.h"
> +#include "linux/dma-fence-chain.h"
> 
>  /**
>   * struct drm_syncobj - sync object.
> @@ -110,6 +111,10 @@ drm_syncobj_fence_get(struct drm_syncobj
> *syncobj)
> 
>  struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
>u32 handle);
> +void drm_syncobj_add_point(struct drm_syncobj *syncobj,
> +struct dma_fence_chain *chain,
> +struct dma_fence *fence,
> +uint64_t point);
>  void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
>  struct dma_fence *fence);
>  int drm_syncobj_find_fence(struct drm_file *file_private,
> --
> 2.14.1
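
For illustration, a typical caller pre-allocates the chain node so that adding
the point cannot fail in the signaling path; a minimal sketch (syncobj, fence
and point are assumed to come from the surrounding driver code):

	struct dma_fence_chain *chain;

	chain = kzalloc(sizeof(*chain), GFP_KERNEL);
	if (!chain)
		return -ENOMEM;

	/* Consumes the chain node; the syncobj now points at the new tip. */
	drm_syncobj_add_point(syncobj, chain, fence, point);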

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH 5/7] drm/syncobj: move drm_syncobj_cb into drm_syncobj.c

2018-11-15 Thread Zhou, David(ChunMing)
Reviewed-by: Chunming Zhou 

> -Original Message-
> From: Christian König 
> Sent: Thursday, November 15, 2018 7:13 PM
> To: dri-devel@lists.freedesktop.org
> Cc: ch...@chris-wilson.co.uk; daniel.vet...@ffwll.ch; e...@anholt.net; Zhou,
> David(ChunMing) 
> Subject: [PATCH 5/7] drm/syncobj: move drm_syncobj_cb into
> drm_syncobj.c
> 
> Not used outside the file.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/drm_syncobj.c | 21 +
>  include/drm/drm_syncobj.h | 21 -
>  2 files changed, 21 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_syncobj.c
> b/drivers/gpu/drm/drm_syncobj.c index 4c45acb326b9..4a2e6ef16979
> 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -56,6 +56,27 @@
>  #include "drm_internal.h"
>  #include <drm/drm_syncobj.h>
> 
> +struct drm_syncobj_cb;
> +
> +typedef void (*drm_syncobj_func_t)(struct drm_syncobj *syncobj,
> +struct drm_syncobj_cb *cb);
> +
> +/**
> + * struct drm_syncobj_cb - callback for drm_syncobj_add_callback
> + * @node: used by drm_syncob_add_callback to append this struct to
> + * &drm_syncobj.cb_list
> + * @func: drm_syncobj_func_t to call
> + *
> + * This struct will be initialized by drm_syncobj_add_callback,
> +additional
> + * data can be passed along by embedding drm_syncobj_cb in another
> struct.
> + * The callback will get called the next time drm_syncobj_replace_fence
> +is
> + * called.
> + */
> +struct drm_syncobj_cb {
> + struct list_head node;
> + drm_syncobj_func_t func;
> +};
> +
>  static DEFINE_SPINLOCK(stub_fence_lock);  static struct dma_fence
> stub_fence;
> 
> diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index
> ab9055f943c7..c79f5ada7cdb 100644
> --- a/include/drm/drm_syncobj.h
> +++ b/include/drm/drm_syncobj.h
> @@ -28,8 +28,6 @@
> 
>  #include "linux/dma-fence.h"
> 
> -struct drm_syncobj_cb;
> -
>  /**
>   * struct drm_syncobj - sync object.
>   *
> @@ -62,25 +60,6 @@ struct drm_syncobj {
>   struct file *file;
>  };
> 
> -typedef void (*drm_syncobj_func_t)(struct drm_syncobj *syncobj,
> -struct drm_syncobj_cb *cb);
> -
> -/**
> - * struct drm_syncobj_cb - callback for drm_syncobj_add_callback
> - * @node: used by drm_syncob_add_callback to append this struct to
> - * &drm_syncobj.cb_list
> - * @func: drm_syncobj_func_t to call
> - *
> - * This struct will be initialized by drm_syncobj_add_callback, additional
> - * data can be passed along by embedding drm_syncobj_cb in another
> struct.
> - * The callback will get called the next time drm_syncobj_replace_fence is
> - * called.
> - */
> -struct drm_syncobj_cb {
> - struct list_head node;
> - drm_syncobj_func_t func;
> -};
> -
>  void drm_syncobj_free(struct kref *kref);
> 
>  /**
> --
> 2.14.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH 4/7] drm/syncobj: use only a single stub fence

2018-11-15 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Christian König 
> Sent: Thursday, November 15, 2018 7:13 PM
> To: dri-devel@lists.freedesktop.org
> Cc: ch...@chris-wilson.co.uk; daniel.vet...@ffwll.ch; e...@anholt.net; Zhou,
> David(ChunMing) 
> Subject: [PATCH 4/7] drm/syncobj: use only a single stub fence
> 
> Extract of useful code from the timeline work. Let's use just a single stub
> fence instance instead of allocating a new one all the time.
> 
> Signed-off-by: Chunming Zhou 
> Signed-off-by: Christian König 

It was a good conclusion from the previous review; my Signed-off-by is already
there, so I cannot give an RB on it. Other people need to take action.

-David
> ---
>  drivers/gpu/drm/drm_syncobj.c | 67 ++
> -
>  1 file changed, 35 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_syncobj.c
> b/drivers/gpu/drm/drm_syncobj.c index f190414511ae..4c45acb326b9
> 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -56,10 +56,8 @@
>  #include "drm_internal.h"
>  #include <drm/drm_syncobj.h>
> 
> -struct drm_syncobj_stub_fence {
> - struct dma_fence base;
> - spinlock_t lock;
> -};
> +static DEFINE_SPINLOCK(stub_fence_lock);
> +static struct dma_fence stub_fence;
> 
>  static const char *drm_syncobj_stub_fence_get_name(struct dma_fence
> *fence)  { @@ -71,6 +69,25 @@ static const struct dma_fence_ops
> drm_syncobj_stub_fence_ops = {
>   .get_timeline_name = drm_syncobj_stub_fence_get_name,  };
> 
> +/**
> + * drm_syncobj_get_stub_fence - return a signaled fence
> + *
> + * Return a stub fence which is already signaled.
> + */
> +static struct dma_fence *drm_syncobj_get_stub_fence(void)
> +{
> + spin_lock(&stub_fence_lock);
> + if (!stub_fence.ops) {
> + dma_fence_init(&stub_fence,
> +&drm_syncobj_stub_fence_ops,
> +&stub_fence_lock,
> +0, 0);
> + dma_fence_signal_locked(&stub_fence);
> + }
> + spin_unlock(&stub_fence_lock);
> +
> + return dma_fence_get(&stub_fence);
> +}
> 
>  /**
>   * drm_syncobj_find - lookup and reference a sync object.
> @@ -190,23 +207,18 @@ void drm_syncobj_replace_fence(struct
> drm_syncobj *syncobj,  }  EXPORT_SYMBOL(drm_syncobj_replace_fence);
> 
> -static int drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
> +/**
> + * drm_syncobj_assign_null_handle - assign a stub fence to the sync
> +object
> + * @syncobj: sync object to assign the fence on
> + *
> + * Assign a already signaled stub fence to the sync object.
> + */
> +static void drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
>  {
> - struct drm_syncobj_stub_fence *fence;
> - fence = kzalloc(sizeof(*fence), GFP_KERNEL);
> - if (fence == NULL)
> - return -ENOMEM;
> + struct dma_fence *fence = drm_syncobj_get_stub_fence();
> 
> - spin_lock_init(&fence->lock);
> - dma_fence_init(&fence->base, &drm_syncobj_stub_fence_ops,
> -&fence->lock, 0, 0);
> - dma_fence_signal(&fence->base);
> -
> - drm_syncobj_replace_fence(syncobj, &fence->base);
> -
> - dma_fence_put(&fence->base);
> -
> - return 0;
> + drm_syncobj_replace_fence(syncobj, fence);
> + dma_fence_put(fence);
>  }
> 
>  /**
> @@ -273,7 +285,6 @@ EXPORT_SYMBOL(drm_syncobj_free);  int
> drm_syncobj_create(struct drm_syncobj **out_syncobj, uint32_t flags,
>  struct dma_fence *fence)
>  {
> - int ret;
>   struct drm_syncobj *syncobj;
> 
>   syncobj = kzalloc(sizeof(struct drm_syncobj), GFP_KERNEL); @@ -
> 284,13 +295,8 @@ int drm_syncobj_create(struct drm_syncobj
> **out_syncobj, uint32_t flags,
>   INIT_LIST_HEAD(&syncobj->cb_list);
>   spin_lock_init(&syncobj->lock);
> 
> - if (flags & DRM_SYNCOBJ_CREATE_SIGNALED) {
> - ret = drm_syncobj_assign_null_handle(syncobj);
> - if (ret < 0) {
> - drm_syncobj_put(syncobj);
> - return ret;
> - }
> - }
> + if (flags & DRM_SYNCOBJ_CREATE_SIGNALED)
> + drm_syncobj_assign_null_handle(syncobj);
> 
>   if (fence)
>   drm_syncobj_replace_fence(syncobj, fence); @@ -984,11
> +990,8 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
>   if (ret < 0)
>   return ret;
> 
> - for (i = 0; i < args->count_handles; i++) {
> - ret = drm_syncobj_assign_null_handle(syncobjs[i]);
> - if (ret < 0)
> - break;
> - }
> + for (i = 0; i < args->count_handles; i++)
> + drm_syncobj_assign_null_handle(syncobjs[i]);
> 
>   drm_syncobj_array_free(syncobjs, args->count_handles);
> 
> --
> 2.14.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH 2/7] dma-buf: add new dma_fence_chain container

2018-11-15 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Christian König 
> Sent: Thursday, November 15, 2018 7:13 PM
> To: dri-devel@lists.freedesktop.org
> Cc: ch...@chris-wilson.co.uk; daniel.vet...@ffwll.ch; e...@anholt.net; Zhou,
> David(ChunMing) 
> Subject: [PATCH 2/7] dma-buf: add new dma_fence_chain container
> 
> Lockless container implementation similar to a dma_fence_array, but with
> only two elements per node and automatic garbage collection.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/Makefile  |   3 +-
>  drivers/dma-buf/dma-fence-chain.c | 186
> ++
>  include/linux/dma-fence-chain.h   |  69 ++
>  3 files changed, 257 insertions(+), 1 deletion(-)  create mode 100644
> drivers/dma-buf/dma-fence-chain.c  create mode 100644 include/linux/dma-
> fence-chain.h
> 
> diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile index
> 0913a6ccab5a..1f006e083eb9 100644
> --- a/drivers/dma-buf/Makefile
> +++ b/drivers/dma-buf/Makefile
> @@ -1,4 +1,5 @@
> -obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-
> fence.o
> +obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
> +  reservation.o seqno-fence.o
>  obj-$(CONFIG_SYNC_FILE)  += sync_file.o
>  obj-$(CONFIG_SW_SYNC)+= sw_sync.o sync_debug.o
>  obj-$(CONFIG_UDMABUF)+= udmabuf.o
> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-
> fence-chain.c
> new file mode 100644
> index ..ac830b886589
> --- /dev/null
> +++ b/drivers/dma-buf/dma-fence-chain.c
> @@ -0,0 +1,186 @@
> +/*
> + * fence-chain: chain fences together in a timeline
> + *
> + * Copyright (C) 2018 Advanced Micro Devices, Inc.
> + * Authors:
> + *   Christian König 
> + *
> + * This program is free software; you can redistribute it and/or modify
> +it
> + * under the terms of the GNU General Public License version 2 as
> +published by
> + * the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful, but
> +WITHOUT
> + * ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY
> +or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +License for
> + * more details.
> + */
> +
> +#include <linux/dma-fence-chain.h>
> +
> +static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
> +
> +/**
> + * dma_fence_chain_get_prev - use RCU to get a reference to the
> +previous fence
> + * @chain: chain node to get the previous node from
> + *
> + * Use dma_fence_get_rcu_safe to get a reference to the previous fence
> +of the
> + * chain node.
> + */
> +static struct dma_fence *dma_fence_chain_get_prev(struct
> +dma_fence_chain *chain)
> +{
> + struct dma_fence *prev;
> +
> + rcu_read_lock();
> + prev = dma_fence_get_rcu_safe(&chain->prev);
> + rcu_read_unlock();
> + return prev;
> +}
> +
> +/**
> + * dma_fence_chain_walk - chain walking function
> + * @fence: current chain node
> + *
> + * Walk the chain to the next node. Returns the next fence or NULL if
> +we are at
> + * the end of the chain. Garbage collects chain nodes which are already
> + * signaled.
> + */
> +struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
> +{
> + struct dma_fence_chain *chain, *prev_chain;
> + struct dma_fence *prev, *prev_prev, *tmp;
> +
> + chain = to_dma_fence_chain(fence);
> + if (!chain) {
> + dma_fence_put(fence);
> + return NULL;
> + }
> +
> + while ((prev = dma_fence_chain_get_prev(chain))) {
> +
> + prev_chain = to_dma_fence_chain(prev);
> + if (!prev_chain || !dma_fence_is_signaled(prev_chain->fence))
> + break;
> +
> + prev_prev = dma_fence_chain_get_prev(prev_chain);
> + tmp = cmpxchg(&chain->prev, prev, prev_prev);
> + if (tmp == prev)
> + dma_fence_put(tmp);
> + else
> + dma_fence_put(prev_prev);
> + dma_fence_put(prev);
> + }
> +
> + dma_fence_put(fence);
> + return prev;
> +}
> +EXPORT_SYMBOL(dma_fence_chain_walk);
> +
> +static const char *dma_fence_chain_get_driver_name(struct dma_fence
> +*fence)
> +{
> +return "dma_fence_chain";
> +}
> +
> +static const char *dma_fence_chain_get_timeline_name(struct dma_fence
> +*fence)
> +{
> +return "unbound";
> +}
> +
> +static void dma_fence_chain_irq_work(struct irq_work *work)
> +{
> + struct dma_fence_chain *chain;
> +
> + chain = container_of(work, typeof(*chain), work);

RE: [PATCH 1/7] dma-buf: make fence sequence numbers 64 bit

2018-11-15 Thread Zhou, David(ChunMing)
Acked-by: Chunming Zhou , it would be better if other people
from outside could take a look as well.

> -Original Message-
> From: Christian König 
> Sent: Thursday, November 15, 2018 7:13 PM
> To: dri-devel@lists.freedesktop.org
> Cc: ch...@chris-wilson.co.uk; daniel.vet...@ffwll.ch; e...@anholt.net; Zhou,
> David(ChunMing) 
> Subject: [PATCH 1/7] dma-buf: make fence sequence numbers 64 bit
> 
> For a lot of use cases we need 64bit sequence numbers. Currently drivers
> overload the dma_fence structure to store the additional bits.
> 
> Stop doing that and make the sequence number in the dma_fence always
> 64bit.
> 
> For compatibility with hardware which can do only 32bit sequences the
> comparisons in __dma_fence_is_later still only takes the lower 32bits as
> significant.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/dma-fence.c|  2 +-
>  drivers/dma-buf/sw_sync.c  |  2 +-
>  drivers/dma-buf/sync_file.c|  4 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c |  2 +-
>  drivers/gpu/drm/i915/i915_sw_fence.c   |  2 +-
>  drivers/gpu/drm/i915/intel_engine_cs.c |  2 +-
>  drivers/gpu/drm/vgem/vgem_fence.c  |  4 ++--
>  include/linux/dma-fence.h  | 14 +++---
>  8 files changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index 1551ca7df394..37e24b69e94b 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -615,7 +615,7 @@ EXPORT_SYMBOL(dma_fence_wait_any_timeout);
>   */
>  void
>  dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops
> *ops,
> -spinlock_t *lock, u64 context, unsigned seqno)
> +spinlock_t *lock, u64 context, u64 seqno)
>  {
>   BUG_ON(!lock);
>   BUG_ON(!ops || !ops->get_driver_name || !ops-
> >get_timeline_name); diff --git a/drivers/dma-buf/sw_sync.c
> b/drivers/dma-buf/sw_sync.c index 53c1d6d36a64..32dcf7b4c935 100644
> --- a/drivers/dma-buf/sw_sync.c
> +++ b/drivers/dma-buf/sw_sync.c
> @@ -172,7 +172,7 @@ static bool timeline_fence_enable_signaling(struct
> dma_fence *fence)  static void timeline_fence_value_str(struct dma_fence
> *fence,
>   char *str, int size)
>  {
> - snprintf(str, size, "%d", fence->seqno);
> + snprintf(str, size, "%lld", fence->seqno);
>  }
> 
>  static void timeline_fence_timeline_value_str(struct dma_fence *fence, diff
> --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c index
> 35dd06479867..4f6305ca52c8 100644
> --- a/drivers/dma-buf/sync_file.c
> +++ b/drivers/dma-buf/sync_file.c
> @@ -144,7 +144,7 @@ char *sync_file_get_name(struct sync_file *sync_file,
> char *buf, int len)
>   } else {
>   struct dma_fence *fence = sync_file->fence;
> 
> - snprintf(buf, len, "%s-%s%llu-%d",
> + snprintf(buf, len, "%s-%s%llu-%lld",
>fence->ops->get_driver_name(fence),
>fence->ops->get_timeline_name(fence),
>fence->context,
> @@ -258,7 +258,7 @@ static struct sync_file *sync_file_merge(const char
> *name, struct sync_file *a,
> 
>   i_b++;
>   } else {
> - if (pt_a->seqno - pt_b->seqno <= INT_MAX)
> + if (__dma_fence_is_later(pt_a->seqno, pt_b->seqno))
>   add_fence(fences, , pt_a);
>   else
>   add_fence(fences, , pt_b);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
> index 12f2bf97611f..bfaf5c6323be 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sa.c
> @@ -388,7 +388,7 @@ void amdgpu_sa_bo_dump_debug_info(struct
> amdgpu_sa_manager *sa_manager,
>  soffset, eoffset, eoffset - soffset);
> 
>   if (i->fence)
> - seq_printf(m, " protected by 0x%08x on
> context %llu",
> + seq_printf(m, " protected by 0x%016llx on
> context %llu",
>  i->fence->seqno, i->fence->context);
> 
>   seq_printf(m, "\n");
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c
> b/drivers/gpu/drm/i915/i915_sw_fence.c
> index 6dbeed079ae5..11bcdabd5177 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> @@ -393,7 +393,7 @@ static void timer_i915_sw_fence_wake(struct
> timer_li

RE: [PATCH] drm/syncobj: Fix oops on drm_syncobj_find_fence(file_priv, 0, ...).

2018-11-05 Thread Zhou, David(ChunMing)
Reviewed-by: Chunming Zhou 

> -Original Message-
> From: Eric Anholt 
> Sent: Tuesday, November 06, 2018 7:01 AM
> To: dri-devel@lists.freedesktop.org
> Cc: linux-ker...@vger.kernel.org; Eric Anholt ; Zhou,
> David(ChunMing) ; Koenig, Christian
> 
> Subject: [PATCH] drm/syncobj: Fix oops on
> drm_syncobj_find_fence(file_priv, 0, ...).
> 
> This broke rendering on V3D, where we almost always have a 0 in-syncobj.
> 
> Signed-off-by: Eric Anholt 
> Fixes: 48197bc564c7 ("drm: add syncobj timeline support v9")
> Cc: Chunming Zhou 
> Cc: Christian König 
> ---
>  drivers/gpu/drm/drm_syncobj.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_syncobj.c
> b/drivers/gpu/drm/drm_syncobj.c index 4dca5f7e8c4b..da8175d9c6ff 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -443,7 +443,8 @@ int drm_syncobj_find_fence(struct drm_file
> *file_private,
>   int ret;
> 
>   ret = drm_syncobj_search_fence(syncobj, point, flags, fence);
> - drm_syncobj_put(syncobj);
> + if (syncobj)
> + drm_syncobj_put(syncobj);
>   return ret;
>  }
>  EXPORT_SYMBOL(drm_syncobj_find_fence);
> --
> 2.19.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH libdrm 5/5] [libdrm] add syncobj timeline tests

2018-11-05 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Daniel Vetter  On Behalf Of Daniel Vetter
> Sent: Monday, November 05, 2018 5:39 PM
> To: Zhou, David(ChunMing) 
> Cc: dri-devel@lists.freedesktop.org; amd-...@lists.freedesktop.org
> Subject: Re: [PATCH libdrm 5/5] [libdrm] add syncobj timeline tests
> 
> On Fri, Nov 02, 2018 at 04:26:49PM +0800, Chunming Zhou wrote:
> > Signed-off-by: Chunming Zhou 
> > ---
> >  tests/amdgpu/Makefile.am |   3 +-
> >  tests/amdgpu/amdgpu_test.c   |  12 ++
> >  tests/amdgpu/amdgpu_test.h   |  21 +++
> >  tests/amdgpu/meson.build |   2 +-
> >  tests/amdgpu/syncobj_tests.c | 263
> > +++
> >  5 files changed, 299 insertions(+), 2 deletions(-)  create mode
> > 100644 tests/amdgpu/syncobj_tests.c
> 
> This testcase seems very much a happy sunday scenario, no tests at all for
> corner cases, invalid input, and generally trying to pull the kernel over the
> table. I think we need a lot more, and preferrably in igt, where we already
> have a good baseline of drm_syncobj tests.
Hi Daniel,

OK, if you insist on that, I will switch to implementing a timeline test in IGT.
Btw, the timeline syncobj test needs to be based on command submission; can I
write it against the amdgpu driver in IGT?
And after that, where should I send the IGT patch for review?

Lastly, if you are free, could you also take a look at the u/k interface of the
timeline syncobj?


Thanks,
David Zhou
> -Daniel
> 
> >
> > diff --git a/tests/amdgpu/Makefile.am b/tests/amdgpu/Makefile.am index
> > 447ff217..d3fbe2bb 100644
> > --- a/tests/amdgpu/Makefile.am
> > +++ b/tests/amdgpu/Makefile.am
> > @@ -33,4 +33,5 @@ amdgpu_test_SOURCES = \
> > vcn_tests.c \
> > uve_ib.h \
> > deadlock_tests.c \
> > -   vm_tests.c
> > +   vm_tests.c \
> > +   syncobj_tests.c
> > diff --git a/tests/amdgpu/amdgpu_test.c b/tests/amdgpu/amdgpu_test.c
> > index 96fcd687..cdcb93a5 100644
> > --- a/tests/amdgpu/amdgpu_test.c
> > +++ b/tests/amdgpu/amdgpu_test.c
> > @@ -56,6 +56,7 @@
> >  #define UVD_ENC_TESTS_STR "UVD ENC Tests"
> >  #define DEADLOCK_TESTS_STR "Deadlock Tests"
> >  #define VM_TESTS_STR "VM Tests"
> > +#define SYNCOBJ_TIMELINE_TESTS_STR "SYNCOBJ TIMELINE Tests"
> >
> >  /**
> >   *  Open handles for amdgpu devices
> > @@ -116,6 +117,12 @@ static CU_SuiteInfo suites[] = {
> > .pCleanupFunc = suite_vm_tests_clean,
> > .pTests = vm_tests,
> > },
> > +   {
> > +   .pName = SYNCOBJ_TIMELINE_TESTS_STR,
> > +   .pInitFunc = suite_syncobj_timeline_tests_init,
> > +   .pCleanupFunc = suite_syncobj_timeline_tests_clean,
> > +   .pTests = syncobj_timeline_tests,
> > +   },
> >
> > CU_SUITE_INFO_NULL,
> >  };
> > @@ -165,6 +172,11 @@ static Suites_Active_Status suites_active_stat[] = {
> > .pName = VM_TESTS_STR,
> > .pActive = suite_vm_tests_enable,
> > },
> > +   {
> > +   .pName = SYNCOBJ_TIMELINE_TESTS_STR,
> > +   .pActive = suite_syncobj_timeline_tests_enable,
> > +   },
> > +
> >  };
> >
> >
> > diff --git a/tests/amdgpu/amdgpu_test.h b/tests/amdgpu/amdgpu_test.h
> > index 0609a74b..946e91c2 100644
> > --- a/tests/amdgpu/amdgpu_test.h
> > +++ b/tests/amdgpu/amdgpu_test.h
> > @@ -194,6 +194,27 @@ CU_BOOL suite_vm_tests_enable(void);
> >   */
> >  extern CU_TestInfo vm_tests[];
> >
> > +/**
> > + * Initialize syncobj timeline test suite  */ int
> > +suite_syncobj_timeline_tests_init();
> > +
> > +/**
> > + * Deinitialize syncobj timeline test suite  */ int
> > +suite_syncobj_timeline_tests_clean();
> > +
> > +/**
> > + * Decide if the suite is enabled by default or not.
> > + */
> > +CU_BOOL suite_syncobj_timeline_tests_enable(void);
> > +
> > +/**
> > + * Tests in syncobj timeline test suite  */ extern CU_TestInfo
> > +syncobj_timeline_tests[];
> > +
> > +
> >  /**
> >   * Helper functions
> >   */
> > diff --git a/tests/amdgpu/meson.build b/tests/amdgpu/meson.build index
> > 4c1237c6..3ceec715 100644
> > --- a/tests/amdgpu/meson.build
> > +++ b/tests/amdgpu/meson.build
> > @@ -24,7 +24,7 @@ if dep_cunit.found()
> >  files(
> >'amdgpu_test.c', 'basic_tests.c', 'bo_tests.c', 'cs_tests.c',
> >'vce_tests.c', 'uvd_enc_tests.c', 'vcn_tests.c', 'deadlock_tests.c',
> > -  'vm_tests.c'

RE: [igt-dev] [PATCH] RFC: Make igts for cross-driver stuff mandatory?

2018-10-25 Thread Zhou, David(ChunMing)
To make IGT cross-driver, I think you should rename it first; it is no longer
Intel-specific. No company wants its employees working on another company's stuff.
You could rename it to DGT (DRM graphics test) and publish it alongside libdrm,
or directly merge it into libdrm; then everyone can use it and develop it on the
same page. This is only my personal opinion.

Regards,
David

> -Original Message-
> From: dri-devel  On Behalf Of Eric
> Anholt
> Sent: Friday, October 26, 2018 12:36 AM
> To: Sean Paul ; Daniel Vetter 
> Cc: IGT development ; Intel Graphics
> Development ; DRI Development  de...@lists.freedesktop.org>; amd-...@lists.freedesktop.org
> Subject: Re: [igt-dev] [PATCH] RFC: Make igts for cross-driver stuff
> mandatory?
> 
> Sean Paul  writes:
> 
> > On Fri, Oct 19, 2018 at 10:50:49AM +0200, Daniel Vetter wrote:
> >> Hi all,
> >>
> >> This is just to collect feedback on this idea, and see whether the
> >> overall dri-devel community stands on all this. I think the past few
> >> cross-vendor uapi extensions all came with igts attached, and
> >> personally I think there's lots of value in having them: A
> >> cross-vendor interface isn't useful if every driver implements it
> >> slightly differently.
> >>
> >> I think there's 2 questions here:
> >>
> >> - Do we want to make such testcases mandatory?
> >>
> >
> > Yes, more testing == better code.
> >
> >
> >> - If yes, are we there yet, or is there something crucially missing
> >>   still?
> >
> > In my experience, no. Last week while trying to replicate an intel-gfx
> > CI failure, I tried compiling igt for one of my (intel) chromebooks.
> > It seems like cross-compilation (or, in my case, just specifying
> > prefix/ld_library_path/sbin_path) is broken on igt. If we want to
> > impose restrictions across the entire subsystem, we need to make sure
> > that everyone can build and deploy igt easily.
> >
> > I managed to hack around everything and get it working, but I still
> > haven't tried switching out the toolchain. Once we have some GitLab CI
> > to validate cross-compilation, then we can consider making IGT mandatory.
> >
> > It's possible that I'm just a meson n00b and didn't use the right
> > incantation, so maybe it already works, but then we need better
> documentation.
> >
> > I've pasted my horrible hacks below, I also didn't have libunwind, so
> > removed its usage.
> 
> I've also had to cut out libunwind for cross-compiling on many occasions.
> Worst library.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH] drm: fix call_kern.cocci warnings (fwd)

2018-10-24 Thread Zhou, David(ChunMing)
Reviewed-by: Chunming Zhou 

> -Original Message-
> From: Julia Lawall 
> Sent: Thursday, October 25, 2018 2:57 AM
> To: Zhou, David(ChunMing) 
> Cc: kbuild-...@01.org; intel-...@lists.freedesktop.org; dri-
> de...@lists.freedesktop.org; Christian König
> ; Gustavo Padovan
> ; Maarten Lankhorst
> ; Sean Paul ; David
> Airlie ; dri-devel@lists.freedesktop.org; linux-
> ker...@vger.kernel.org
> Subject: [PATCH] drm: fix call_kern.cocci warnings (fwd)
> 
> The containing function is called with a spin_lock held, so GFP_KERNEL can't
> be used.
> 
> julia
> 
> -- Forwarded message --
> Date: Tue, 23 Oct 2018 17:14:25 +0800
> From: kbuild test robot 
> To: kbu...@01.org
> Cc: Julia Lawall 
> Subject: [PATCH] drm: fix call_kern.cocci warnings
> 
> CC: kbuild-...@01.org
> CC: intel-...@lists.freedesktop.org
> CC: dri-devel@lists.freedesktop.org
> TO: Chunming Zhou 
> CC: "Christian König" 
> CC: Gustavo Padovan 
> CC: Maarten Lankhorst 
> CC: Sean Paul 
> CC: David Airlie 
> CC: dri-devel@lists.freedesktop.org
> CC: linux-ker...@vger.kernel.org
> 
> From: kbuild test robot 
> 
> drivers/gpu/drm/drm_syncobj.c:202:4-14: ERROR: function
> drm_syncobj_find_signal_pt_for_point called on line 390 inside lock on line
> 389 but uses GFP_KERNEL
> 
>  Find functions that refer to GFP_KERNEL but are called with locks held.
> 
> Semantic patch information:
>  The proposed change of converting the GFP_KERNEL is not necessarily the
> correct one.  It may be desired to unlock the lock, or to not call the  
> function
> under the lock in the first place.
> 
> Generated by: scripts/coccinelle/locks/call_kern.cocci
> 
> Fixes: 48197bc564c7 ("drm: add syncobj timeline support v9")
> CC: Chunming Zhou 
> Signed-off-by: kbuild test robot 
> ---
> 
> tree:   git://anongit.freedesktop.org/drm/drm-tip drm-tip
> head:   8d7ffd2298c607c3e1a16f94d51450d7940fd6a7
> commit: 48197bc564c7a1888c86024a1ba4f956e0ec2300 [1968/2033] drm: add
> syncobj timeline support v9
> :: branch date: 4 hours ago
> :: commit date: 5 days ago
> 
> Please take the patch only if it's a positive warning. Thanks!
> 
>  drm_syncobj.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -199,7 +199,7 @@ static struct dma_fence
>   (point <= syncobj->timeline)) {
>   struct drm_syncobj_stub_fence *fence =
>   kzalloc(sizeof(struct drm_syncobj_stub_fence),
> - GFP_KERNEL);
> + GFP_ATOMIC);
> 
>   if (!fence)
>   return NULL;
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH] drm: fix deadlock of syncobj v2

2018-10-22 Thread Zhou, David(ChunMing)
Ping...
Btw:
The patch has been tested with syncobj_basic and syncobj_wait from IGT.

> -Original Message-
> From: Chunming Zhou 
> Sent: Sunday, October 21, 2018 7:14 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Zhou, David(ChunMing) ; Daniel Vetter
> ; Chris Wilson ; Koenig, Christian
> 
> Subject: [PATCH] drm: fix deadlock of syncobj v2
> 
> v2:
> add a mutex between sync_cb execution and free.
> 
> Signed-off-by: Chunming Zhou 
> Cc: Daniel Vetter 
> Cc: Chris Wilson 
> Cc: Christian König 
> ---
>  drivers/gpu/drm/drm_syncobj.c | 11 +--
>  include/drm/drm_syncobj.h |  4 
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_syncobj.c
> b/drivers/gpu/drm/drm_syncobj.c index 57bf6006394d..c025a0b93565
> 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -158,9 +158,11 @@ void drm_syncobj_add_callback(struct drm_syncobj
> *syncobj,  void drm_syncobj_remove_callback(struct drm_syncobj *syncobj,
>struct drm_syncobj_cb *cb)
>  {
> + mutex_lock(&syncobj->mutex);
>   spin_lock(&syncobj->lock);
>   list_del_init(&cb->node);
>   spin_unlock(&syncobj->lock);
> + mutex_unlock(&syncobj->mutex);
>  }
> 
>  static void drm_syncobj_init(struct drm_syncobj *syncobj) @@ -344,13
> +346,17 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
>   drm_syncobj_create_signal_pt(syncobj, fence, pt_value);
>   if (fence) {
>   struct drm_syncobj_cb *cur, *tmp;
> + LIST_HEAD(cb_list);
> 
> + mutex_lock(&syncobj->mutex);
>   spin_lock(&syncobj->lock);
> - list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
> {
> + list_splice_init(&syncobj->cb_list, &cb_list);
> + spin_unlock(&syncobj->lock);
> + list_for_each_entry_safe(cur, tmp, &cb_list, node) {
>   list_del_init(&cur->node);
>   cur->func(syncobj, cur);
>   }
> - spin_unlock(&syncobj->lock);
> + mutex_unlock(&syncobj->mutex);
>   }
>  }
>  EXPORT_SYMBOL(drm_syncobj_replace_fence);
> @@ -501,6 +507,7 @@ int drm_syncobj_create(struct drm_syncobj
> **out_syncobj, uint32_t flags,
>   kref_init(&syncobj->refcount);
>   INIT_LIST_HEAD(&syncobj->cb_list);
>   spin_lock_init(&syncobj->lock);
> + mutex_init(&syncobj->mutex);
>   if (flags & DRM_SYNCOBJ_CREATE_TYPE_TIMELINE)
>   syncobj->type = DRM_SYNCOBJ_TYPE_TIMELINE;
>   else
> diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index
> 5e8c5c027e09..3d3c8c181bd2 100644
> --- a/include/drm/drm_syncobj.h
> +++ b/include/drm/drm_syncobj.h
> @@ -78,6 +78,10 @@ struct drm_syncobj {
>* @lock: Protects syncobj list and write-locks 
>*/
>   spinlock_t lock;
> + /**
> +  * @mutex: mutex between syncobj cb execution and free.
> +  */
> + struct mutex mutex;
>   /**
>* @file: A file backing for this syncobj.
>*/
> --
> 2.17.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH 7/7] drm/amdgpu: update version for timeline syncobj support in amdgpu

2018-10-15 Thread Zhou, David(ChunMing)
Ping...
Christian, could I get your RB on the series? And could you help me push it to drm-misc?
After that I can rebase the libdrm header file on top of drm-next.

Thanks,
David Zhou

> -Original Message-
> From: amd-gfx  On Behalf Of
> Chunming Zhou
> Sent: Monday, October 15, 2018 4:56 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Zhou, David(ChunMing) ; amd-
> g...@lists.freedesktop.org
> Subject: [PATCH 7/7] drm/amdgpu: update version for timeline syncobj
> support in amdgpu
> 
> Signed-off-by: Chunming Zhou 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 6870909da926..58cba492ba55 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -70,9 +70,10 @@
>   * - 3.25.0 - Add support for sensor query info (stable pstate sclk/mclk).
>   * - 3.26.0 - GFX9: Process AMDGPU_IB_FLAG_TC_WB_NOT_INVALIDATE.
>   * - 3.27.0 - Add new chunk to to AMDGPU_CS to enable BO_LIST creation.
> + * - 3.28.0 - Add syncobj timeline support to AMDGPU_CS.
>   */
>  #define KMS_DRIVER_MAJOR 3
> -#define KMS_DRIVER_MINOR 27
> +#define KMS_DRIVER_MINOR 28
>  #define KMS_DRIVER_PATCHLEVEL0
> 
>  int amdgpu_vram_limit = 0;
> --
> 2.17.1
> 
> ___
> amd-gfx mailing list
> amd-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH 3/6] drm: add support of syncobj timeline point wait v2

2018-10-07 Thread Zhou, David(ChunMing)
>> Another general comment (no good place to put it) is that I think we want 
>> two kinds of waits:  Wait for time point to be completed and wait for time 
>> point to become available.  The first is the usual CPU wait for completion 
>> while the second is for use by userspace drivers to wait until the first 
>> moment where they can submit work which depends on a given time point.

Hi Jason,

How about adding two new wait flags?
DRM_SYNCOBJ_WAIT_FLAGS_WAIT_COMPLETED
DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE

Thanks,
David
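
A hypothetical userspace sketch of the two wait modes, assuming the flag names
proposed above and a libdrm wrapper along the lines of drmSyncobjTimelineWait()
(none of these names are final here; waiting for completion is shown as the
default behaviour):

	/* Wait until the point's fence is merely available, i.e. the
	 * signaling submission exists, so dependent work can be queued. */
	ret = drmSyncobjTimelineWait(fd, &handle, &point, 1, timeout_nsec,
				     DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE,
				     NULL);

	/* Wait until the point has actually completed (signaled). */
	ret = drmSyncobjTimelineWait(fd, &handle, &point, 1, timeout_nsec,
				     0, NULL);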

From: Christian König 
Sent: Tuesday, September 25, 2018 5:50 PM
To: Jason Ekstrand ; Zhou, David(ChunMing) 

Cc: amd-gfx mailing list ; Maling list - DRI 
developers 
Subject: Re: [PATCH 3/6] drm: add support of syncobj timeline point wait v2

On 25.09.2018 at 11:22, Jason Ekstrand wrote:
On Thu, Sep 20, 2018 at 6:04 AM Chunming Zhou wrote:
points array is one-to-one match with syncobjs array.
v2:
add a separate ioctl for timeline point wait, otherwise we break the uapi.

I think ioctl structs can be extended as long as fields aren't re-ordered.  I'm 
not sure on the details of this though as I'm not a particularly experienced 
kernel developer.

Yeah, that is correct. The problem in this particular case is that we don't 
change the direct IOCTL parameter, but rather the array it points to.

We could do something like keep the existing handles array and add a separate 
optional one for the timeline points. That would also drop the need for the 
padding of the structure.


Another general comment (no good place to put it) is that I think we want two 
kinds of waits:  Wait for time point to be completed and wait for time point to 
become available.  The first is the usual CPU wait for completion while the 
second is for use by userspace drivers to wait until the first moment where 
they can submit work which depends on a given time point.

Oh, yeah, that is a really good point as well.

Christian.



Signed-off-by: Chunming Zhou
---
 drivers/gpu/drm/drm_internal.h |  2 +
 drivers/gpu/drm/drm_ioctl.c|  2 +
 drivers/gpu/drm/drm_syncobj.c  | 99 +-
 include/uapi/drm/drm.h | 14 +
 4 files changed, 103 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 0c4eb4a9ab31..566d44e3c782 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -183,6 +183,8 @@ int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, 
void *data,
   struct drm_file *file_private);
 int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private);
 int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 6b4a633b4240..c0891614f516 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -669,6 +669,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, 
drm_syncobj_timeline_wait_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_RESET, drm_syncobj_reset_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 67472bd77c83..a43de0e4616c 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -126,13 +126,14 @@ static void drm_syncobj_add_callback_locked(struct 
drm_syncobj *syncobj,
 }

 static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,
+u64 point,
 struct dma_fence **fence,
 struct drm_syncobj_cb *cb,
 drm_syncobj_func_t func)
 {
int ret;

-   ret = drm_syncobj_search_fence(syncobj, 0, 0, fence);
+   ret = drm_syncobj_search_fence(syncobj, point, 0, fence);
if (!ret)
return 1;

@@ -143,7 +144,7 @@ static int drm_syncobj_fence_get_or_add_callback(struct 
drm_syncobj *syncobj,
 */
if (!list_empty(&syncobj->signal_pt_list)) {
spin_unlock(&syncobj->lock);
-   drm_syncobj_search_fence(syncobj, 0, 0, 

RE: [PATCH 5/6] drm/amdgpu: add timeline support in amdgpu CS

2018-10-07 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Nicolai Hähnle 
> Sent: Wednesday, September 26, 2018 4:44 PM
> To: Zhou, David(ChunMing) ; dri-
> de...@lists.freedesktop.org
> Cc: amd-...@lists.freedesktop.org
> Subject: Re: [PATCH 5/6] drm/amdgpu: add timeline support in amdgpu CS
> 
> >   static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, diff --git
> > a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index
> > 1ceec56de015..412359b446f1 100644
> > --- a/include/uapi/drm/amdgpu_drm.h
> > +++ b/include/uapi/drm/amdgpu_drm.h
> > @@ -517,6 +517,8 @@ struct drm_amdgpu_gem_va {
> >   #define AMDGPU_CHUNK_ID_SYNCOBJ_IN  0x04
> >   #define AMDGPU_CHUNK_ID_SYNCOBJ_OUT 0x05
> >   #define AMDGPU_CHUNK_ID_BO_HANDLES  0x06
> > +#define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT 0x07
> > +#define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL  0x08
> >
> >   struct drm_amdgpu_cs_chunk {
> > __u32   chunk_id;
> > @@ -592,6 +594,14 @@ struct drm_amdgpu_cs_chunk_sem {
> > __u32 handle;
> >   };
> >
> > +struct drm_amdgpu_cs_chunk_syncobj {
> > +   __u32 handle;
> > +   __u32 pad;
> > +   __u64 point;
> > +   __u64 flags;
> > +};
> 
> Sure it's nice to be forward-looking, but can't we just put the flags into the
> padding?

Will change.

Thanks,
David
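
For reference, folding the flags into the padding would give something like:

	struct drm_amdgpu_cs_chunk_syncobj {
		__u32 handle;
		__u32 flags;
		__u64 point;
	};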
> 
> Cheers,
> Nicolai
> 
> 
> > +
> > +
> >   #define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ 0
> >   #define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ_FD 1
> >   #define AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD   2
> >
> 
> 
> --
> Learn how the world really is,
> But never forget how it should be.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH 5/6] drm/amdgpu: add timeline support in amdgpu CS

2018-10-07 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Nicolai Hähnle 
> Sent: Wednesday, September 26, 2018 5:06 PM
> To: Zhou, David(ChunMing) ; dri-
> de...@lists.freedesktop.org
> Cc: amd-...@lists.freedesktop.org
> Subject: Re: [PATCH 5/6] drm/amdgpu: add timeline support in amdgpu CS
> 
> Hey Chunming,
> 
> On 20.09.2018 13:03, Chunming Zhou wrote:
> > @@ -1113,48 +1117,91 @@ static int
> amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
> >   }
> >
> >   static int amdgpu_cs_process_syncobj_in_dep(struct amdgpu_cs_parser
> *p,
> > -   struct amdgpu_cs_chunk *chunk)
> > +   struct amdgpu_cs_chunk *chunk,
> > +   bool timeline)
> >   {
> > unsigned num_deps;
> > int i, r;
> > -   struct drm_amdgpu_cs_chunk_sem *deps;
> >
> > -   deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
> > -   num_deps = chunk->length_dw * 4 /
> > -   sizeof(struct drm_amdgpu_cs_chunk_sem);
> > +   if (!timeline) {
> > +   struct drm_amdgpu_cs_chunk_sem *deps;
> >
> > -   for (i = 0; i < num_deps; ++i) {
> > -   r = amdgpu_syncobj_lookup_and_add_to_sync(p,
> deps[i].handle);
> > +   deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
> > +   num_deps = chunk->length_dw * 4 /
> > +   sizeof(struct drm_amdgpu_cs_chunk_sem);
> > +   for (i = 0; i < num_deps; ++i) {
> > +   r = amdgpu_syncobj_lookup_and_add_to_sync(p,
> deps[i].handle,
> > + 0, 0);
> > if (r)
> > return r;
> 
> The indentation looks wrong.
> 
> 
> > +   }
> > +   } else {
> > +   struct drm_amdgpu_cs_chunk_syncobj *syncobj_deps;
> > +
> > +   syncobj_deps = (struct drm_amdgpu_cs_chunk_syncobj
> *)chunk->kdata;
> > +   num_deps = chunk->length_dw * 4 /
> > +   sizeof(struct drm_amdgpu_cs_chunk_syncobj);
> > +   for (i = 0; i < num_deps; ++i) {
> > +   r = amdgpu_syncobj_lookup_and_add_to_sync(p,
> syncobj_deps[i].handle,
> > +
> syncobj_deps[i].point,
> > +
> syncobj_deps[i].flags);
> > +   if (r)
> > +   return r;
> 
> Here as well.
> 
> So I'm wondering a bit about this uapi. Specifically, what happens if you try 
> to
> use timeline syncobjs here as dependencies _without_
> DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT?
> 
> My understanding is, it'll just return -EINVAL without any indication as to
> which syncobj actually failed. What's the caller supposed to do then?

How about adding a print to indicate which syncobj failed?

Thanks,
David Zhou
> 
> Cheers,
> Nicolai
> --
> Lerne, wie die Welt wirklich ist,
> Aber vergiss niemals, wie sie sein sollte.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH 3/6] drm: add support of syncobj timeline point wait v2

2018-09-21 Thread Zhou, David(ChunMing)


> -Original Message-
> From: amd-gfx  On Behalf Of
> Christian König
> Sent: Thursday, September 20, 2018 7:11 PM
> To: Zhou, David(ChunMing) ; dri-
> de...@lists.freedesktop.org
> Cc: amd-...@lists.freedesktop.org
> Subject: Re: [PATCH 3/6] drm: add support of syncobj timeline point wait v2
> 
> On 20.09.2018 at 13:03, Chunming Zhou wrote:
> > points array is one-to-one match with syncobjs array.
> > v2:
> > add a separate ioctl for timeline point wait, otherwise we break the uapi.
> >
> > Signed-off-by: Chunming Zhou 
> > ---
> >   drivers/gpu/drm/drm_internal.h |  2 +
> >   drivers/gpu/drm/drm_ioctl.c|  2 +
> >   drivers/gpu/drm/drm_syncobj.c  | 99
> +-
> >   include/uapi/drm/drm.h | 14 +
> >   4 files changed, 103 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_internal.h
> > b/drivers/gpu/drm/drm_internal.h index 0c4eb4a9ab31..566d44e3c782
> > 100644
> > --- a/drivers/gpu/drm/drm_internal.h
> > +++ b/drivers/gpu/drm/drm_internal.h
> > @@ -183,6 +183,8 @@ int drm_syncobj_fd_to_handle_ioctl(struct
> drm_device *dev, void *data,
> >struct drm_file *file_private);
> >   int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
> >struct drm_file *file_private);
> > +int drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
> > +   struct drm_file *file_private);
> >   int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
> > struct drm_file *file_private);
> >   int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
> > diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> > index 6b4a633b4240..c0891614f516 100644
> > --- a/drivers/gpu/drm/drm_ioctl.c
> > +++ b/drivers/gpu/drm/drm_ioctl.c
> > @@ -669,6 +669,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
> >   DRM_UNLOCKED|DRM_RENDER_ALLOW),
> > DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT,
> drm_syncobj_wait_ioctl,
> >   DRM_UNLOCKED|DRM_RENDER_ALLOW),
> > +   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT,
> drm_syncobj_timeline_wait_ioctl,
> > + DRM_UNLOCKED|DRM_RENDER_ALLOW),
> > DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_RESET,
> drm_syncobj_reset_ioctl,
> >   DRM_UNLOCKED|DRM_RENDER_ALLOW),
> > DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL,
> drm_syncobj_signal_ioctl,
> > diff --git a/drivers/gpu/drm/drm_syncobj.c
> > b/drivers/gpu/drm/drm_syncobj.c index 67472bd77c83..a43de0e4616c
> > 100644
> > --- a/drivers/gpu/drm/drm_syncobj.c
> > +++ b/drivers/gpu/drm/drm_syncobj.c
> > @@ -126,13 +126,14 @@ static void
> drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,
> >   }
> >
> >   static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj
> > *syncobj,
> > +u64 point,
> >  struct dma_fence **fence,
> >  struct drm_syncobj_cb *cb,
> >  drm_syncobj_func_t func)
> >   {
> > int ret;
> >
> > -   ret = drm_syncobj_search_fence(syncobj, 0, 0, fence);
> > +   ret = drm_syncobj_search_fence(syncobj, point, 0, fence);
> > if (!ret)
> > return 1;
> >
> > @@ -143,7 +144,7 @@ static int
> drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,
> >  */
> > if (!list_empty(&syncobj->signal_pt_list)) {
> > spin_unlock(&syncobj->lock);
> > -   drm_syncobj_search_fence(syncobj, 0, 0, fence);
> > +   drm_syncobj_search_fence(syncobj, point, 0, fence);
> > if (*fence)
> > return 1;
> > spin_lock(&syncobj->lock);
> > @@ -358,7 +359,9 @@ void drm_syncobj_replace_fence(struct
> drm_syncobj *syncobj,
> > spin_lock(&syncobj->lock);
> > list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
> {
> > list_del_init(&cur->node);
> > +   spin_unlock(&syncobj->lock);
> > cur->func(syncobj, cur);
> > +   spin_lock(&syncobj->lock);
> 
> That looks fishy to me. Why do we need to unlock 

The cb func will call _search_fence, which will need to grab the lock;
otherwise we deadlock.


>and who guarantees that
> tmp is still valid when we grab the lock again?

Sorry for that, quickly  fix deadlock and forget to

RE: [PATCH 2/6] [RFC]drm: add syncobj timeline support v7

2018-09-20 Thread Zhou, David(ChunMing)


> -Original Message-
> From: amd-gfx  On Behalf Of
> Christian König
> Sent: Thursday, September 20, 2018 5:35 PM
> To: Zhou, David(ChunMing) ; dri-
> de...@lists.freedesktop.org
> Cc: Dave Airlie ; Rakos, Daniel
> ; Daniel Vetter ; amd-
> g...@lists.freedesktop.org
> Subject: Re: [PATCH 2/6] [RFC]drm: add syncobj timeline support v7
> 
> The only thing I can still see is that you use wait_event_timeout() instead of
> wait_event_interruptible().
> 
> Any particular reason for that?

I tried again after what you said in the last thread; CTS always fails, and the 
syncobj unit test fails as well.


> 
> Apart from that it now looks good to me.

Thanks. Can I get your RB on it?

Btw, I realize the Vulkan spec names the semaphore types binary and timeline, so 
how about changing _TYPE_INDIVIDUAL to _TYPE_BINARY?

Regards,
David Zhou
> 
> Christian.
> 
> On 20.09.2018 at 11:29, Zhou, David(ChunMing) wrote:
> > Ping...
> >
> >> -Original Message-
> >> From: amd-gfx  On Behalf Of
> >> Chunming Zhou
> >> Sent: Wednesday, September 19, 2018 5:18 PM
> >> To: dri-devel@lists.freedesktop.org
> >> Cc: Zhou, David(ChunMing) ; amd-
> >> g...@lists.freedesktop.org; Rakos, Daniel ;
> >> Daniel Vetter ; Dave Airlie ;
> >> Koenig, Christian 
> >> Subject: [PATCH 2/6] [RFC]drm: add syncobj timeline support v7
> >>
> >> This patch is for VK_KHR_timeline_semaphore extension, semaphore is
> >> called syncobj in kernel side:
> >> This extension introduces a new type of syncobj that has an integer
> >> payload identifying a point in a timeline. Such timeline syncobjs
> >> support the following
> >> operations:
> >> * CPU query - A host operation that allows querying the payload of the
> >>   timeline syncobj.
> >> * CPU wait - A host operation that allows a blocking wait for a
> >>   timeline syncobj to reach a specified value.
> >> * Device wait - A device operation that allows waiting for a
> >>   timeline syncobj to reach a specified value.
> >> * Device signal - A device operation that allows advancing the
> >>   timeline syncobj to a specified value.
> >>
> >> v1:
> >> Since it's a timeline, that means the earlier time point (PT) is
> >> always signaled before a later PT.
> >> a. signal PT design:
> >> Signal PT fence N depends on the PT[N-1] fence and the signal operation
> >> fence; when the PT[N] fence is signaled, the timeline increases to the
> >> value of PT[N].
> >> b. wait PT design:
> >> A wait PT fence is signaled by the timeline reaching its point value;
> >> while the timeline is increasing, the wait PTs' values are compared
> >> with the new timeline value, and if a PT value is lower than the
> >> timeline value the wait PT is signaled, otherwise it stays in the list.
> >> The syncobj wait operation can wait on any point of the timeline, so an
> >> RB tree is needed to order them. And a wait PT can be ahead of the
> >> signal PT, so we need a submission fence to handle that.
> >>
> >> v2:
> >> 1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian) 2.
> >> move unexposed definitions to .c file. (Daniel Vetter) 3. split up the
> >> change to
> >> drm_syncobj_find_fence() in a separate patch. (Christian) 4. split up
> >> the change to drm_syncobj_replace_fence() in a separate patch.
> >> 5. drop the submission_fence implementation and instead use
> >> wait_event() for that. (Christian) 6. WARN_ON(point != 0) for NORMAL
> type syncobj case.
> >> (Daniel Vetter)
> >>
> >> v3:
> >> 1. replace normal syncobj with timeline implementation. (Vetter and
> Christian)
> >>  a. normal syncobj signal op will create a signal PT to tail of signal 
> >> pt list.
> >>  b. normal syncobj wait op will create a wait pt with last signal
> >> point, and this wait PT is only signaled by related signal point PT.
> >> 2. many bug fix and clean up
> >> 3. stub fence moving is moved to other patch.
> >>
> >> v4:
> >> 1. fix RB tree loop with while(node=rb_first(...)). (Christian) 2.
> >> fix syncobj lifecycle. (Christian) 3. only enable_signaling when
> >> there is wait_pt. (Christian) 4. fix timeline path issues.
> >> 5. write a timeline test in libdrm
> >>
> >> v5: (Christian)
> >> 1. semaphore is called syncobj in kernel side.
> >> 2. don't need 'timeline' characters in some function name.
> >> 3. keep syncobj cb.
> >>
> >> v6: (Christian)

RE: [PATCH 2/6] [RFC]drm: add syncobj timeline support v7

2018-09-20 Thread Zhou, David(ChunMing)
Ping...

> -Original Message-
> From: amd-gfx  On Behalf Of
> Chunming Zhou
> Sent: Wednesday, September 19, 2018 5:18 PM
> To: dri-devel@lists.freedesktop.org
> Cc: Zhou, David(ChunMing) ; amd-
> g...@lists.freedesktop.org; Rakos, Daniel ; Daniel
> Vetter ; Dave Airlie ; Koenig,
> Christian 
> Subject: [PATCH 2/6] [RFC]drm: add syncobj timeline support v7
> 
> This patch is for VK_KHR_timeline_semaphore extension, semaphore is
> called syncobj in kernel side:
> This extension introduces a new type of syncobj that has an integer payload
> identifying a point in a timeline. Such timeline syncobjs support the 
> following
> operations:
>* CPU query - A host operation that allows querying the payload of the
>  timeline syncobj.
>* CPU wait - A host operation that allows a blocking wait for a
>  timeline syncobj to reach a specified value.
>* Device wait - A device operation that allows waiting for a
>  timeline syncobj to reach a specified value.
>* Device signal - A device operation that allows advancing the
>  timeline syncobj to a specified value.
> 
> v1:
> Since it's a timeline, that means the earlier time point (PT) is always
> signaled before a later PT.
> a. signal PT design:
> Signal PT fence N depends on the PT[N-1] fence and the signal operation
> fence; when the PT[N] fence is signaled, the timeline increases to the
> value of PT[N].
> b. wait PT design:
> A wait PT fence is signaled by the timeline reaching its point value;
> while the timeline is increasing, the wait PTs' values are compared with
> the new timeline value, and if a PT value is lower than the timeline
> value the wait PT is signaled, otherwise it stays in the list.
> The syncobj wait operation can wait on any point of the timeline, so an
> RB tree is needed to order them. And a wait PT can be ahead of the
> signal PT, so we need a submission fence to handle that.
> 
> v2:
> 1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian) 2.
> move unexposed definitions to .c file. (Daniel Vetter) 3. split up the change to
> drm_syncobj_find_fence() in a separate patch. (Christian) 4. split up the
> change to drm_syncobj_replace_fence() in a separate patch.
> 5. drop the submission_fence implementation and instead use wait_event()
> for that. (Christian) 6. WARN_ON(point != 0) for NORMAL type syncobj case.
> (Daniel Vetter)
> 
> v3:
> 1. replace normal syncobj with timeline implementation. (Vetter and Christian)
> a. normal syncobj signal op will create a signal PT to tail of signal pt 
> list.
> b. normal syncobj wait op will create a wait pt with last signal point, 
> and this
> wait PT is only signaled by related signal point PT.
> 2. many bug fix and clean up
> 3. stub fence moving is moved to other patch.
> 
> v4:
> 1. fix RB tree loop with while(node=rb_first(...)). (Christian) 2. fix syncobj
> lifecycle. (Christian) 3. only enable_signaling when there is wait_pt. 
> (Christian)
> 4. fix timeline path issues.
> 5. write a timeline test in libdrm
> 
> v5: (Christian)
> 1. semaphore is called syncobj in kernel side.
> 2. don't need 'timeline' characters in some function name.
> 3. keep syncobj cb.
> 
> v6: (Christian)
> 1. merge syncobj_timeline to syncobj structure.
> 2. simplify some check sentences.
> 3. some misc change.
> 4. fix CTS failed issue.
> 
> v7: (Christian)
> 1. error handling when creating signal pt.
> 2. remove timeline naming in func.
> 3. export flags in find_fence.
> 4. allow reset timeline.
> 
> individual syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore* timeline
> syncobj is tested by ./amdgpu_test -s 9
> 
> Signed-off-by: Chunming Zhou 
> Cc: Christian Konig 
> Cc: Dave Airlie 
> Cc: Daniel Rakos 
> Cc: Daniel Vetter 
> ---
>  drivers/gpu/drm/drm_syncobj.c  | 293 ++---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   2 +-
>  include/drm/drm_syncobj.h  |  65 ++---
>  include/uapi/drm/drm.h |   1 +
>  4 files changed, 287 insertions(+), 74 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_syncobj.c
> b/drivers/gpu/drm/drm_syncobj.c index f796c9fc3858..95b60ac045c6 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -56,6 +56,9 @@
>  #include "drm_internal.h"
>  #include 
> 
> +/* merge normal syncobj to timeline syncobj, the point interval is 1 */
> +#define DRM_SYNCOBJ_INDIVIDUAL_POINT 1
> +
>  struct drm_syncobj_stub_fence {
>   struct dma_fence base;
>   spinlock_t lock;
> @@ -82,6 +85,11 @@ static const struct dma_fence_ops
> drm_syncobj_stub_fence_ops = {
>   .release = drm_syncobj_stub_fence_release,  };
> 
> +struct drm_syncobj_signal_pt {
> +

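[To make the changelog's wait/signal PT semantics concrete: a point on a
timeline counts as signaled once the timeline payload has advanced to or
past it. A minimal illustrative check in C - not part of the posted patch:

/* Point N is signaled once the timeline payload has reached at least N,
 * so earlier points always signal before later ones. */
static bool drm_syncobj_point_signaled(u64 timeline_value, u64 point)
{
	return timeline_value >= point;
}

Everything else in the series - the signal PT chain and the RB tree of
wait PTs - exists to make this comparison happen at the right time.]
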
RE: [PATCH 1/4] [RFC]drm: add syncobj timeline support v6

2018-09-19 Thread Zhou, David(ChunMing)


> -Original Message-
> From: amd-gfx  On Behalf Of
> Christian König
> Sent: Wednesday, September 19, 2018 3:45 PM
> To: Zhou, David(ChunMing) ; Zhou,
> David(ChunMing) ; dri-
> de...@lists.freedesktop.org
> Cc: Dave Airlie ; Rakos, Daniel
> ; Daniel Vetter ; amd-
> g...@lists.freedesktop.org
> Subject: Re: [PATCH 1/4] [RFC]drm: add syncobj timeline support v6
> 
> On 19.09.2018 at 09:32, zhoucm1 wrote:
> >
> >
> > On 2018-09-19 15:18, Christian König wrote:
> >> On 19.09.2018 at 06:26, Chunming Zhou wrote:
> > [snip]
> >>>   *fence = NULL;
> >>>   drm_syncobj_add_callback_locked(syncobj, cb, func); @@
> >>> -164,6 +177,153 @@ void drm_syncobj_remove_callback(struct
> >>> drm_syncobj *syncobj,
> >>>   spin_unlock(&syncobj->lock);
> >>>   }
> >>>   +static void drm_syncobj_timeline_init(struct drm_syncobj
> >>> *syncobj)
> >>
> >> We still have _timeline_ in the name here.
> > the func is relevant to the timeline members; otherwise, which name would be proper?
> 
> Yeah, but we now use the timeline implementation for the individual syncobj
> as well.
> 
> Not a big issue, but I would just name it
> drm_syncobj_init()/drm_syncobj_fini.

There are already drm_syncobj_init/fini in drm_syncobj.c; can any other name be 
suggested?

> 
> >
> >>
> >>> +{
> >>> +    spin_lock(&syncobj->lock);
> >>> +    syncobj->timeline_context = dma_fence_context_alloc(1);
> > [snip]
> >>> +}
> >>> +
> >>> +int drm_syncobj_lookup_fence(struct drm_syncobj *syncobj, u64
> >>> +point,
> >>> +   struct dma_fence **fence) {
> >>> +
> >>> +    return drm_syncobj_search_fence(syncobj, point,
> >>> +    DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT,
> >>
> >> I still have a bad feeling about setting that flag as default because it
> >> might change the behavior of the UAPI.
> >>
> >> Maybe export drm_syncobj_search_fence directly? E.g. with the flags
> >> parameter.
> > previous v5 indeed do this, you let me wrap it, need change back?
> 
> No, the problem is that drm_syncobj_find_fence() is still using
> drm_syncobj_lookup_fence() which sets the flag instead of
> drm_syncobj_search_fence() without the flag.
> 
> That changes the UAPI behavior because previously we would have returned
> an error code and now we block for a fence to appear.
> 
> So I think the right solution would be to add the flags parameter to
> drm_syncobj_find_fence() and let the driver decide if we need to block or
> get -ENOENT.

Got your meaning.
Exporting the flag in the func is easy,
 but the driver doesn't pass a flag, so which flag is proper by default? We still 
need to give a default flag in the patch, don't we?

Thanks,
David Zhou

> 
> Regards,
> Christian.
> 
> >
> > Regards,
> > David Zhou
> >>
> >> Regards,
> >> Christian.
> >>
> >>> +    fence);
> >>> +}
> >>> +EXPORT_SYMBOL(drm_syncobj_lookup_fence);
> >>> +
> >>>   /**
> >>>    * drm_syncobj_find_fence - lookup and reference the fence in a
> >>> sync object
> >>>    * @file_private: drm file private pointer @@ -228,7 +443,7 @@
> >>> static int drm_syncobj_assign_null_handle(struct
> >>> drm_syncobj *syncobj)
> >>>    * @fence: out parameter for the fence
> >>>    *
> >>>    * This is just a convenience function that combines
> >>> drm_syncobj_find() and
> >>> - * drm_syncobj_fence_get().
> >>> + * drm_syncobj_lookup_fence().
> >>>    *
> >>>    * Returns 0 on success or a negative error value on failure. On
> >>> success @fence
> >>>    * contains a reference to the fence, which must be released by
> >>> calling @@ -236,18 +451,11 @@ static int
> >>> drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
> >>>    */
> >>>   int drm_syncobj_find_fence(struct drm_file *file_private,
> >>>  u32 handle, u64 point,
> >>> -   struct dma_fence **fence) -{
> >>> +   struct dma_fence **fence) {
> >>>   struct drm_syncobj *syncobj = drm_syncobj_find(file_private,
> >>> handle);
> >>> -    int ret = 0;
> >>> -
> >>> -    if (!syncobj)
> >>> -    return -ENOENT;
> >>> +    int ret;
> >>>   -    *fe

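[The interface change Christian asks for above - a flags parameter on
drm_syncobj_find_fence() so each caller decides between blocking and
-ENOENT - could look roughly like the sketch below. The helper names follow
this thread (drm_syncobj_search_fence() is from the series under review);
the exact merged signature may differ:

int drm_syncobj_find_fence(struct drm_file *file_private,
			   u32 handle, u64 point, u64 flags,
			   struct dma_fence **fence)
{
	struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
	int ret;

	if (!syncobj)
		return -ENOENT;

	/* With DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT the search blocks
	 * until a fence shows up; without it a missing fence yields an
	 * error, preserving the old UAPI behavior for callers that must
	 * not block. */
	ret = drm_syncobj_search_fence(syncobj, point, flags, fence);

	drm_syncobj_put(syncobj);
	return ret;
}]
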
RE: [PATCH] [RFC]drm: add syncobj timeline support v5

2018-09-16 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Daniel Vetter  On Behalf Of Daniel Vetter
> Sent: Saturday, September 15, 2018 12:11 AM
> To: Koenig, Christian 
> Cc: Zhou, David(ChunMing) ; dri-
> de...@lists.freedesktop.org; amd-...@lists.freedesktop.org; Dave Airlie
> ; Rakos, Daniel ; Daniel
> Vetter 
> Subject: Re: [PATCH] [RFC]drm: add syncobj timeline support v5
> 
> On Fri, Sep 14, 2018 at 12:49:45PM +0200, Christian König wrote:
> > On 14.09.2018 at 12:37, Chunming Zhou wrote:
> > > This patch is for VK_KHR_timeline_semaphore extension, semaphore is
> called syncobj in kernel side:
> > > This extension introduces a new type of syncobj that has an integer
> > > payload identifying a point in a timeline. Such timeline syncobjs
> > > support the following operations:
> > > * CPU query - A host operation that allows querying the payload of the
> > >   timeline syncobj.
> > > * CPU wait - A host operation that allows a blocking wait for a
> > >   timeline syncobj to reach a specified value.
> > > * Device wait - A device operation that allows waiting for a
> > >   timeline syncobj to reach a specified value.
> > > * Device signal - A device operation that allows advancing the
> > >   timeline syncobj to a specified value.
> > >
> > > Since it's a timeline, that means the earlier time point (PT) is
> > > always signaled before a later PT.
> > > a. signal PT design:
> > > Signal PT fence N depends on the PT[N-1] fence and the signal operation
> > > fence; when the PT[N] fence is signaled, the timeline increases to the
> > > value of PT[N].
> > > b. wait PT design:
> > > A wait PT fence is signaled by the timeline reaching its point value;
> > > while the timeline is increasing, the wait PTs' values are compared
> > > with the new timeline value, and if a PT value is lower than the
> > > timeline value the wait PT is signaled, otherwise it stays in the list.
> > > The syncobj wait operation can wait on any point of the timeline, so an
> > > RB tree is needed to order them. And a wait PT can be ahead of the
> > > signal PT, so we need a submission fence to handle that.
> > >
> > > v2:
> > > 1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian) 2.
> move
> > > unexposed definitions to .c file. (Daniel Vetter) 3. split up the
> > > change to drm_syncobj_find_fence() in a separate patch. (Christian)
> > > 4. split up the change to drm_syncobj_replace_fence() in a separate
> patch.
> > > 5. drop the submission_fence implementation and instead use
> > > wait_event() for that. (Christian) 6. WARN_ON(point != 0) for NORMAL
> > > type syncobj case. (Daniel Vetter)
> > >
> > > v3:
> > > 1. replace normal syncobj with timeline implementation. (Vetter and
> Christian)
> > >  a. normal syncobj signal op will create a signal PT to tail of 
> > > signal pt list.
> > >  b. normal syncobj wait op will create a wait pt with last signal 
> > > point, and
> this wait PT is only signaled by related signal point PT.
> > > 2. many bug fix and clean up
> > > 3. stub fence moving is moved to other patch.
> > >
> > > v4:
> > > 1. fix RB tree loop with while(node=rb_first(...)). (Christian) 2.
> > > fix syncobj lifecycle. (Christian) 3. only enable_signaling when
> > > there is wait_pt. (Christian) 4. fix timeline path issues.
> > > 5. write a timeline test in libdrm
> > >
> > > v5: (Christian)
> > > 1. semaphore is called syncobj in kernel side.
> > > 2. don't need 'timeline' characters in some function name.
> > > 3. keep syncobj cb
> > >
> > > normal syncobj is tested by ./deqp-vk -n dEQP-VK*semaphore* timeline
> > > syncobj is tested by ./amdgpu_test -s 9
> > >
> > > Signed-off-by: Chunming Zhou 
> > > Cc: Christian Konig 
> > > Cc: Dave Airlie 
> > > Cc: Daniel Rakos 
> > > Cc: Daniel Vetter 
> >
> > At least on first glance that looks like it should work, going to do a
> > detailed review on Monday.
> 
> Just for my understanding, it's all condensed down to 1 patch now?

Yes, Christian suggested that.

 >I kinda
> didn't follow the detailed discussion last few days at all :-/
> 
> Also, is there a testcase, igt highly preferred (because then we'll run it in 
> our
> intel-gfx CI, and a bunch of people outside of intel have already discovered
> that and are using it).


I already wrote the test as a libdrm unit test, since I'm not familiar with the 
IGT stuff.

Thanks,
David Zhou
> 
> Thanks, Daniel
> 
> >
>

RE: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-14 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Koenig, Christian
> Sent: Friday, September 14, 2018 3:27 PM
> To: Zhou, David(ChunMing) ; Zhou,
> David(ChunMing) ; dri-
> de...@lists.freedesktop.org
> Cc: Dave Airlie ; Rakos, Daniel
> ; amd-...@lists.freedesktop.org; Daniel Vetter
> 
> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> 
> On 14.09.2018 at 05:59, zhoucm1 wrote:
> >
> >
> > On 2018-09-14 11:14, zhoucm1 wrote:
> >>
> >>
> >> On 2018-09-13 18:22, Christian König wrote:
> >>> On 13.09.2018 at 11:35, Zhou, David(ChunMing) wrote:
> >>>>
> >>>>> -----Original Message-
> >>>>> From: Koenig, Christian
> >>>>> Sent: Thursday, September 13, 2018 5:20 PM
> >>>>> To: Zhou, David(ChunMing) ; dri-
> >>>>> de...@lists.freedesktop.org
> >>>>> Cc: Dave Airlie ; Rakos, Daniel
> >>>>> ; amd-...@lists.freedesktop.org
> >>>>> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> >>>>>
> >>>>> Am 13.09.2018 um 11:11 schrieb Zhou, David(ChunMing):
> >>>>>>> -Original Message-
> >>>>>>> From: Christian König 
> >>>>>>> Sent: Thursday, September 13, 2018 4:50 PM
> >>>>>>> To: Zhou, David(ChunMing) ; Koenig,
> >>>>>>> Christian ;
> >>>>>>> dri-devel@lists.freedesktop.org
> >>>>>>> Cc: Dave Airlie ; Rakos, Daniel
> >>>>>>> ; amd-...@lists.freedesktop.org
> >>>>>>> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support
> >>>>>>> v4
> >>>>>>>
> >>>>>>> Am 13.09.2018 um 09:43 schrieb Zhou, David(ChunMing):
> >>>>>>>>> -Original Message-
> >>>>>>>>> From: Koenig, Christian
> >>>>>>>>> Sent: Thursday, September 13, 2018 2:56 PM
> >>>>>>>>> To: Zhou, David(ChunMing) ; Zhou,
> >>>>>>>>> David(ChunMing) ; dri-
> >>>>>>>>> de...@lists.freedesktop.org
> >>>>>>>>> Cc: Dave Airlie ; Rakos, Daniel
> >>>>>>>>> ; amd-...@lists.freedesktop.org
> >>>>>>>>> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline
> >>>>>>>>> support v4
> >>>>>>>>>
> >>>>>>>>> Am 13.09.2018 um 04:15 schrieb zhoucm1:
> >>>>>>>>>> On 2018年09月12日 19:05, Christian König wrote:
> >>>>>>>>>>>>>>> [SNIP]
> >>>>>>>>>>>>>>> +static void
> >>>>>>>>>>>>>>> +drm_syncobj_find_signal_pt_for_wait_pt(struct
> >>>>>>>>>>>>>>> drm_syncobj *syncobj,
> >>>>>>>>>>>>>>> +   struct drm_syncobj_wait_pt
> >>>>>>>>>>>>>>> +*wait_pt) {
> >>>>>>>>>>>>>> That whole approach still looks horrible complicated to me.
> >>>>>>>>>>>> It's already very close to what you said before.
> >>>>>>>>>>>>
> >>>>>>>>>>>>>> Especially the separation of signal and wait pt is
> >>>>>>>>>>>>>> completely unnecessary as far as I can see.
> >>>>>>>>>>>>>> When a wait pt is requested we just need to search for
> >>>>>>>>>>>>>> the signal point which it will trigger.
> >>>>>>>>>>>> Yeah, I tried this, but when I implement cpu wait ioctl on
> >>>>>>>>>>>> specific point, we need a advanced wait pt fence,
> >>>>>>>>>>>> otherwise, we could still need old syncobj cb.
> >>>>>>>>>>> Why? I mean you just need to call drm_syncobj_find_fence()
> >>>>>>>>>>> and
> >>>>>>> when
> >>>>>>>>>>> that one returns NULL you use wait_event_*() to wait for a
> >>>>>>>>>>> signal point >= your wait point to appear and tr

RE: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-13 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Koenig, Christian
> Sent: Thursday, September 13, 2018 5:20 PM
> To: Zhou, David(ChunMing) ; dri-
> de...@lists.freedesktop.org
> Cc: Dave Airlie ; Rakos, Daniel
> ; amd-...@lists.freedesktop.org
> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> 
> On 13.09.2018 at 11:11, Zhou, David(ChunMing) wrote:
> >
> >> -Original Message-
> >> From: Christian König 
> >> Sent: Thursday, September 13, 2018 4:50 PM
> >> To: Zhou, David(ChunMing) ; Koenig, Christian
> >> ; dri-devel@lists.freedesktop.org
> >> Cc: Dave Airlie ; Rakos, Daniel
> >> ; amd-...@lists.freedesktop.org
> >> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> >>
> >> On 13.09.2018 at 09:43, Zhou, David(ChunMing) wrote:
> >>>> -----Original Message-
> >>>> From: Koenig, Christian
> >>>> Sent: Thursday, September 13, 2018 2:56 PM
> >>>> To: Zhou, David(ChunMing) ; Zhou,
> >>>> David(ChunMing) ; dri-
> >>>> de...@lists.freedesktop.org
> >>>> Cc: Dave Airlie ; Rakos, Daniel
> >>>> ; amd-...@lists.freedesktop.org
> >>>> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> >>>>
> >>>> On 13.09.2018 at 04:15, zhoucm1 wrote:
> >>>>> On 2018-09-12 19:05, Christian König wrote:
> >>>>>>>>>> [SNIP]
> >>>>>>>>>> +static void drm_syncobj_find_signal_pt_for_wait_pt(struct
> >>>>>>>>>> drm_syncobj *syncobj,
> >>>>>>>>>> +   struct drm_syncobj_wait_pt
> >>>>>>>>>> +*wait_pt) {
> >>>>>>>>> That whole approach still looks horrible complicated to me.
> >>>>>>> It's already very close to what you said before.
> >>>>>>>
> >>>>>>>>> Especially the separation of signal and wait pt is completely
> >>>>>>>>> unnecessary as far as I can see.
> >>>>>>>>> When a wait pt is requested we just need to search for the
> >>>>>>>>> signal point which it will trigger.
> >>>>>>> Yeah, I tried this, but when I implement cpu wait ioctl on
> >>>>>>> specific point, we need a advanced wait pt fence, otherwise, we
> >>>>>>> could still need old syncobj cb.
> >>>>>> Why? I mean you just need to call drm_syncobj_find_fence() and
> >> when
> >>>>>> that one returns NULL you use wait_event_*() to wait for a signal
> >>>>>> point >= your wait point to appear and try again.
> >>>>> e.g. when there are 3 syncobjs(A,B,C) to wait, all syncobjABC have
> >>>>> no fence yet, as you said, during drm_syncobj_find_fence(A) is
> >>>>> working on wait_event, syncobjB and syncobjC could already be
> >>>>> signaled, then we don't know which one is first signaled, which is
> >>>>> need when wait ioctl returns.
> >>>> I don't really see a problem with that. When you wait for the first
> >>>> one you need to wait for A,B,C at the same time anyway.
> >>>>
> >>>> So what you do is to register a fence callback on the fences you
> >>>> already have and for the syncobj which doesn't yet have a fence you
> >>>> make sure that they wake up your thread when they get one.
> >>>>
> >>>> So essentially exactly what drm_syncobj_fence_get_or_add_callback()
> >>>> already does today.
> > So do you mean we still need to use the old syncobj CB for that?
> >> Yes, as far as I can see it should work.
> >>
> >>>Advanced wait pt is bad?
> >> Well it isn't bad, I just don't see any advantage in it.
> >
> > The advantage is to replace old syncobj cb.
> >
> >> The existing mechanism
> >> should already be able to handle that.
> > I thought about it a bit more; we don't need that mechanism at all. If we use
> > an advanced wait pt, we can easily use a fence array to achieve it for the
> > wait ioctl. We should use existing kernel features as much as possible, not
> > invent another one, shouldn't we? I remember you said that before.
> 
> Yeah, but the syncobj cb is an existing feature.

This is obviously a workaround when used for the wait ioctl. Do you see it used in 
any other place?

> And I absolutely don't

RE: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-13 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Christian König 
> Sent: Thursday, September 13, 2018 4:50 PM
> To: Zhou, David(ChunMing) ; Koenig, Christian
> ; dri-devel@lists.freedesktop.org
> Cc: Dave Airlie ; Rakos, Daniel
> ; amd-...@lists.freedesktop.org
> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> 
> On 13.09.2018 at 09:43, Zhou, David(ChunMing) wrote:
> >
> >> -Original Message-
> >> From: Koenig, Christian
> >> Sent: Thursday, September 13, 2018 2:56 PM
> >> To: Zhou, David(ChunMing) ; Zhou,
> >> David(ChunMing) ; dri-
> >> de...@lists.freedesktop.org
> >> Cc: Dave Airlie ; Rakos, Daniel
> >> ; amd-...@lists.freedesktop.org
> >> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> >>
> >> On 13.09.2018 at 04:15, zhoucm1 wrote:
> >>> On 2018-09-12 19:05, Christian König wrote:
> >>>>>>>> [SNIP]
> >>>>>>>> +static void drm_syncobj_find_signal_pt_for_wait_pt(struct
> >>>>>>>> drm_syncobj *syncobj,
> >>>>>>>> +   struct drm_syncobj_wait_pt
> >>>>>>>> +*wait_pt) {
> >>>>>>> That whole approach still looks horrible complicated to me.
> >>>>> It's already very close to what you said before.
> >>>>>
> >>>>>>> Especially the separation of signal and wait pt is completely
> >>>>>>> unnecessary as far as I can see.
> >>>>>>> When a wait pt is requested we just need to search for the
> >>>>>>> signal point which it will trigger.
> >>>>> Yeah, I tried this, but when I implement cpu wait ioctl on
> >>>>> specific point, we need a advanced wait pt fence, otherwise, we
> >>>>> could still need old syncobj cb.
> >>>> Why? I mean you just need to call drm_syncobj_find_fence() and
> when
> >>>> that one returns NULL you use wait_event_*() to wait for a signal
> >>>> point >= your wait point to appear and try again.
> >>> e.g. when there are 3 syncobjs(A,B,C) to wait, all syncobjABC have
> >>> no fence yet, as you said, during drm_syncobj_find_fence(A) is
> >>> working on wait_event, syncobjB and syncobjC could already be
> >>> signaled, then we don't know which one is first signaled, which is
> >>> need when wait ioctl returns.
> >> I don't really see a problem with that. When you wait for the first
> >> one you need to wait for A,B,C at the same time anyway.
> >>
> >> So what you do is to register a fence callback on the fences you
> >> already have and for the syncobj which doesn't yet have a fence you
> >> make sure that they wake up your thread when they get one.
> >>
> >> So essentially exactly what drm_syncobj_fence_get_or_add_callback()
> >> already does today.
> > So do you mean we still need to use the old syncobj CB for that?
> 
> Yes, as far as I can see it should work.
> 
> >   Advanced wait pt is bad?
> 
> Well it isn't bad, I just don't see any advantage in it.


The advantage is to replace old syncobj cb.

> The existing mechanism
> should already be able to handle that.

I thought about it a bit more; we don't need that mechanism at all. If we use an 
advanced wait pt, we can easily use a fence array to achieve it for the wait 
ioctl. We should use existing kernel features as much as possible, not invent 
another one, shouldn't we? I remember you said that before.

Thanks,
David Zhou
> 
> Christian.
> 
> >
> > Thanks,
> > David Zhou
> >> Regards,
> >> Christian.
> >>
> >>> Back to my implementation: it already fixes all your previous
> >>> concerns, and it can easily be used in wait_ioctl. When you feel
> >>> it is complicated, I guess that is because we merged all the logic
> >>> into it plus a lot of cleanup in one patch. In fact, it is already
> >>> very simple: timeline_init/fini, create signal/wait_pt, find
> >>> signal_pt for wait_pt, garbage collection, just those.
> >>>
> >>> Thanks,
> >>> David Zhou
> >>>> Regards,
> >>>> Christian.
> > ___
> > amd-gfx mailing list
> > amd-...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4

2018-09-13 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Koenig, Christian
> Sent: Thursday, September 13, 2018 2:56 PM
> To: Zhou, David(ChunMing) ; Zhou,
> David(ChunMing) ; dri-
> de...@lists.freedesktop.org
> Cc: Dave Airlie ; Rakos, Daniel
> ; amd-...@lists.freedesktop.org
> Subject: Re: [PATCH 1/3] [RFC]drm: add syncobj timeline support v4
> 
> On 13.09.2018 at 04:15, zhoucm1 wrote:
> > On 2018-09-12 19:05, Christian König wrote:
> >>>>>
> >>>>>> [SNIP]
> >>>>>> +static void drm_syncobj_find_signal_pt_for_wait_pt(struct
> >>>>>> drm_syncobj *syncobj,
> >>>>>> +   struct drm_syncobj_wait_pt *wait_pt)
> >>>>>> +{
> >>>>>
> >>>>> That whole approach still looks horrible complicated to me.
> >>> It's already very close to what you said before.
> >>>
> >>>>>
> >>>>> Especially the separation of signal and wait pt is completely
> >>>>> unnecessary as far as I can see.
> >>>>> When a wait pt is requested we just need to search for the signal
> >>>>> point which it will trigger.
> >>> Yeah, I tried this, but when I implement cpu wait ioctl on specific
> >>> point, we need a advanced wait pt fence, otherwise, we could still
> >>> need old syncobj cb.
> >>
> >> Why? I mean you just need to call drm_syncobj_find_fence() and when
> >> that one returns NULL you use wait_event_*() to wait for a signal
> >> point >= your wait point to appear and try again.
> > e.g. when there are 3 syncobjs (A, B, C) to wait on and none of them
> > has a fence yet, as you said, while drm_syncobj_find_fence(A) is working
> > on wait_event, syncobjB and syncobjC could already be signaled; then we
> > don't know which one was signaled first, which is needed when the wait
> > ioctl returns.
> 
> I don't really see a problem with that. When you wait for the first one you
> need to wait for A,B,C at the same time anyway.
> 
> So what you do is to register a fence callback on the fences you already have
> and for the syncobj which doesn't yet have a fence you make sure that they
> wake up your thread when they get one.
> 
> So essentially exactly what drm_syncobj_fence_get_or_add_callback()
> already does today.

So do you mean we still need to use the old syncobj CB for that? Is the advanced 
wait pt bad?

Thanks,
David Zhou
> 
> Regards,
> Christian.
> 
> >
> > Back to my implementation: it already fixes all your previous concerns,
> > and it can easily be used in wait_ioctl. When you feel it is
> > complicated, I guess that is because we merged all the logic into it
> > plus a lot of cleanup in one patch. In fact, it is already very simple:
> > timeline_init/fini, create signal/wait_pt, find signal_pt for wait_pt,
> > garbage collection, just those.
> >
> > Thanks,
> > David Zhou
> >>
> >> Regards,
> >> Christian.

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


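[The "wait for the first of N" scheme Christian describes can be sketched
with a dma_fence callback that records the winner; a syncobj that does not
yet carry a fence additionally needs a syncobj cb (what
drm_syncobj_fence_get_or_add_callback() provides) to hook the fence once it
appears. Illustrative C only, not the merged code:

struct first_wait_entry {
	struct task_struct *task;
	atomic_t *first_signaled;	/* holds -1 until a fence signals */
	u32 index;
	struct dma_fence_cb fence_cb;
};

static void first_wait_fence_cb(struct dma_fence *fence,
				struct dma_fence_cb *cb)
{
	struct first_wait_entry *entry =
		container_of(cb, struct first_wait_entry, fence_cb);

	/* Only the first signaled fence wins the race and records its
	 * index; that index is what the wait ioctl reports back. */
	atomic_cmpxchg(entry->first_signaled, -1, entry->index);
	wake_up_process(entry->task);
}]
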
RE: [PATCH] drm/scheduler: Add stopped flag to drm_sched_entity

2018-08-20 Thread Zhou, David(ChunMing)


-Original Message-
From: dri-devel  On Behalf Of Andrey 
Grodzovsky
Sent: Friday, August 17, 2018 11:16 PM
To: dri-devel@lists.freedesktop.org
Cc: Koenig, Christian ; amd-...@lists.freedesktop.org
Subject: [PATCH] drm/scheduler: Add stopped flag to drm_sched_entity

The flag will prevent another thread from the same process from reinserting the 
entity queue into the scheduler's rq after it was already removed from there by 
another thread during drm_sched_entity_flush.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 10 +-
 include/drm/gpu_scheduler.h  |  2 ++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 1416edb..07cfe63 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -177,8 +177,12 @@ long drm_sched_entity_flush(struct drm_sched_entity 
*entity, long timeout)
/* For killed process disable any more IBs enqueue right now */
last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
if ((!last_user || last_user == current->group_leader) &&
-   (current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
+   (current->flags & PF_EXITING) && (current->exit_code == SIGKILL)) {
+   spin_lock(&entity->rq_lock);
+   entity->stopped = true;
drm_sched_rq_remove_entity(entity->rq, entity);
+   spin_unlock(&entity->rq_lock);
+   }
 
return ret;
 }
@@ -504,6 +508,10 @@ void drm_sched_entity_push_job(struct drm_sched_job 
*sched_job,
if (first) {
/* Add the entity to the run queue */
spin_lock(&entity->rq_lock);
+   if (entity->stopped) {
+   spin_unlock(&entity->rq_lock);
+   return;
+   }
[DZ] the code has been changing so frequently recently that it got this 
regression; my code synced last Friday still has the check below:
spin_lock(&entity->rq_lock);
if (!entity->rq) {
DRM_ERROR("Trying to push to a killed entity\n");
spin_unlock(&entity->rq_lock);
return;
}
So you should add DRM_ERROR as well when hitting it.

With that fix, patch is Reviewed-by: Chunming Zhou 

Regards,
David Zhou
drm_sched_rq_add_entity(entity->rq, entity);
spin_unlock(&entity->rq_lock);
drm_sched_wakeup(entity->rq->sched);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h index 
919ae57..daec50f 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -70,6 +70,7 @@ enum drm_sched_priority {
  * @fini_status: contains the exit status in case the process was signalled.
  * @last_scheduled: points to the finished fence of the last scheduled job.
  * @last_user: last group leader pushing a job into the entity.
+ * @stopped: Marks the entity as removed from rq and destined for termination.
  *
  * Entities will emit jobs in order to their corresponding hardware
  * ring, and the scheduler will alternate between entities based on @@ -92,6 
+93,7 @@ struct drm_sched_entity {
atomic_t*guilty;
struct dma_fence*last_scheduled;
struct task_struct  *last_user;
+   boolstopped;
 };
 
 /**
--
2.7.4

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH 3/4] drm/scheduler: add new function to get least loaded sched v2

2018-08-01 Thread Zhou, David(ChunMing)
Another big question:
I agree the general idea is good for balancing scheduler load within the same ring 
family.
But when jobs from the same entity run on different schedulers, that means a later 
job could be completed ahead of an earlier one, right?
That will break the fence design; a later fence must be signaled after the earlier 
fence within the same fence context.

Anything I missed?

Regards,
David Zhou

From: dri-devel  On Behalf Of Nayan 
Deshmukh
Sent: Thursday, August 02, 2018 12:07 AM
To: Grodzovsky, Andrey 
Cc: amd-...@lists.freedesktop.org; Maling list - DRI developers 
; Koenig, Christian 
Subject: Re: [PATCH 3/4] drm/scheduler: add new function to get least loaded 
sched v2

Yes, that is correct.

Nayan

On Wed, Aug 1, 2018, 9:05 PM Andrey Grodzovsky
<andrey.grodzov...@amd.com> wrote:
Clarification question - if the run queues belong to different
schedulers they effectively point to different rings,

it means we allow moving (rescheduling) a drm_sched_entity from one ring
to another - I assume that was the idea in the first place, that

you have a set of HW rings and you can utilize any of them for your jobs
(like compute rings). Correct?

Andrey


On 08/01/2018 04:20 AM, Nayan Deshmukh wrote:
> The function selects the run queue from the rq_list with the
> least load. The load is decided by the number of jobs in a
> scheduler.
>
> v2: avoid using atomic read twice consecutively, instead store
>  it locally
>
> Signed-off-by: Nayan Deshmukh <nayan26deshm...@gmail.com>
> ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 25 +
>   1 file changed, 25 insertions(+)
>
> diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
> b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> index 375f6f7f6a93..fb4e542660b0 100644
> --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
> @@ -255,6 +255,31 @@ static bool drm_sched_entity_is_ready(struct 
> drm_sched_entity *entity)
>   return true;
>   }
>
> +/**
> + * drm_sched_entity_get_free_sched - Get the rq from rq_list with least load
> + *
> + * @entity: scheduler entity
> + *
> + * Return the pointer to the rq with least load.
> + */
> +static struct drm_sched_rq *
> +drm_sched_entity_get_free_sched(struct drm_sched_entity *entity)
> +{
> + struct drm_sched_rq *rq = NULL;
> + unsigned int min_jobs = UINT_MAX, num_jobs;
> + int i;
> +
> + for (i = 0; i < entity->num_rq_list; ++i) {
> + num_jobs = atomic_read(&entity->rq_list[i]->sched->num_jobs);
> + if (num_jobs < min_jobs) {
> + min_jobs = num_jobs;
> + rq = entity->rq_list[i];
> + }
> + }
> +
> + return rq;
> +}
> +
>   static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
>   struct dma_fence_cb *cb)
>   {
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

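[David's ordering concern maps to the dma_fence contract: within one fence
context, seqnos must signal in order, which is what dma_fence_is_later()
encodes. An illustrative statement of the invariant in C; as far as I can
tell, the eventually merged scheduler code sidesteps the problem by only
picking a new run queue while the entity has no jobs in flight:

/* Within a single fence context, a fence with a higher seqno must not
 * signal before one with a lower seqno.  Spreading one entity's queued
 * jobs over several rings at the same time could let hardware violate
 * this. */
static bool fence_order_ok(struct dma_fence *earlier, struct dma_fence *later)
{
	if (earlier->context != later->context)
		return true;	/* no ordering requirement across contexts */

	return dma_fence_is_signaled(earlier) || !dma_fence_is_signaled(later);
}]
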

RE: [PATCH v2 1/2] drm/scheduler: Rename cleanup functions v2.

2018-06-21 Thread Zhou, David(ChunMing)
Acked-by: Chunming Zhou 

-Original Message-
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
Andrey Grodzovsky
Sent: Thursday, June 21, 2018 11:33 PM
To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Cc: e...@anholt.net; Koenig, Christian ; Grodzovsky, 
Andrey ; l.st...@pengutronix.de
Subject: [PATCH v2 1/2] drm/scheduler: Rename cleanup functions v2.

Everything in the flush code path (i.e. waiting for SW queue to become empty) 
names with *_flush() and everything in the release code path names *_fini()

This patch also effect the amdgpu and etnaviv drivers which use those functions.

v2:
Also apply the change to vd3.

Signed-off-by: Andrey Grodzovsky 
Suggested-by: Christian König 
Acked-by: Lucas Stach 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |  8 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_drv.c |  4 ++--
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 18 +-
 drivers/gpu/drm/v3d/v3d_drv.c |  2 +-
 include/drm/gpu_scheduler.h   |  6 +++---
 11 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 64b3a1e..c0f06c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -104,7 +104,7 @@ static int amdgpu_ctx_init(struct amdgpu_device *adev,
 
 failed:
for (j = 0; j < i; j++)
-   drm_sched_entity_fini(&adev->rings[j]->sched,
+   drm_sched_entity_destroy(&adev->rings[j]->sched,
  &ctx->rings[j].entity);
kfree(ctx->fences);
ctx->fences = NULL;
@@ -178,7 +178,7 @@ static void amdgpu_ctx_do_release(struct kref *ref)
if (ctx->adev->rings[i] == &ctx->adev->gfx.kiq.ring)
continue;
 
-   drm_sched_entity_fini(&ctx->adev->rings[i]->sched,
+   drm_sched_entity_destroy(&ctx->adev->rings[i]->sched,
&ctx->rings[i].entity);
}
 
@@ -466,7 +466,7 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr *mgr)
if (ctx->adev->rings[i] == &ctx->adev->gfx.kiq.ring)
continue;
 
-   max_wait = drm_sched_entity_do_release(&ctx->adev->rings[i]->sched,
+   max_wait = drm_sched_entity_flush(&ctx->adev->rings[i]->sched,
  &ctx->rings[i].entity, max_wait);
}
}
@@ -492,7 +492,7 @@ void amdgpu_ctx_mgr_entity_cleanup(struct amdgpu_ctx_mgr 
*mgr)
continue;
 
if (kref_read(&ctx->refcount) == 1)
-   drm_sched_entity_cleanup(&ctx->adev->rings[i]->sched,
+   drm_sched_entity_fini(&ctx->adev->rings[i]->sched,
&ctx->rings[i].entity);
else
DRM_ERROR("ctx %p is still alive\n", ctx); diff 
--git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 0c084d3..0246cb8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -162,7 +162,7 @@ static int amdgpu_ttm_global_init(struct amdgpu_device 
*adev)  static void amdgpu_ttm_global_fini(struct amdgpu_device *adev)  {
if (adev->mman.mem_global_referenced) {
-   drm_sched_entity_fini(adev->mman.entity.sched,
+   drm_sched_entity_destroy(adev->mman.entity.sched,
  &adev->mman.entity);
mutex_destroy(&adev->mman.gtt_window_lock);
drm_global_item_unref(&adev->mman.bo_global_ref.ref);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index cc15d32..0b46ea1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -309,7 +309,7 @@ int amdgpu_uvd_sw_fini(struct amdgpu_device *adev)
for (j = 0; j < adev->uvd.num_uvd_inst; ++j) {
kfree(adev->uvd.inst[j].saved_bo);
 
-   drm_sched_entity_fini(&adev->uvd.inst[j].ring.sched, 
&adev->uvd.inst[j].entity);
+   drm_sched_entity_destroy(&adev->uvd.inst[j].ring.sched, 
+&adev->uvd.inst[j].entity);
 
amdgpu_bo_free_kernel(&adev->uvd.inst[j].vcpu_bo,
  &adev->uvd.inst[j].gpu_addr,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
index 23d960e..b0dcdfd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c
@@ -222,7 +222,7 @@ int amdgpu_vce_sw_fini(struct 

RE: [PATCH 2/2] drm/amdgpu: set ttm bo priority before initialization

2018-05-10 Thread Zhou, David(ChunMing)
The series is OK to me, Reviewed-by: Chunming Zhou <david1.z...@amd.com>
It is better to wait for Christian to have a look before pushing the patch.

Regards,
David Zhou
-Original Message-
From: Junwei Zhang [mailto:jerry.zh...@amd.com] 
Sent: Friday, May 11, 2018 12:58 PM
To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Cc: Koenig, Christian <christian.koe...@amd.com>; Zhou, David(ChunMing) 
<david1.z...@amd.com>; Zhang, Jerry <jerry.zh...@amd.com>
Subject: [PATCH 2/2] drm/amdgpu: set ttm bo priority before initialization

Signed-off-by: Junwei Zhang <jerry.zh...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index e62153a..6a9e46a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -419,6 +419,8 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 
bo->tbo.bdev = &adev->mman.bdev;
amdgpu_ttm_placement_from_domain(bo, bp->domain);
+   if (bp->type == ttm_bo_type_kernel)
+   bo->tbo.priority = 1;
 
r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, bp->type,
 &bo->placement, page_align, &ctx, acc_size,
@@ -434,9 +436,6 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
else
amdgpu_cs_report_moved_bytes(adev, ctx.bytes_moved, 0);
 
-   if (bp->type == ttm_bo_type_kernel)
-   bo->tbo.priority = 1;
-
if (bp->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED &&
bo->tbo.mem.placement & TTM_PL_FLAG_VRAM) {
struct dma_fence *fence;
--
1.9.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

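[Why the assignment has to move: ttm_bo_init_reserved() puts the BO on an
LRU list, and the list bucket is selected by bo->priority, roughly as in
the paraphrase below of the TTM code of that time (illustrative only). A
priority written only after init would leave the BO on the wrong list:

static void ttm_bo_add_to_lru_sketch(struct ttm_buffer_object *bo)
{
	struct ttm_mem_type_manager *man =
		&bo->bdev->man[bo->mem.mem_type];

	/* The LRU bucket is chosen by bo->priority, so the priority must
	 * already be correct when the BO is first added. */
	list_add_tail(&bo->lru, &man->lru[bo->priority]);
}]
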

RE: [PATCH] dma-buf/reservation: should keep later one in add fence(v3)

2018-03-05 Thread Zhou, David(ChunMing)
Patch looks OK to me, Reviewed-by: Chunming Zhou , and 
it's better to get others' RBs as well.

Regards,
David Zhou
-Original Message-
From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of 
Monk Liu
Sent: Tuesday, March 06, 2018 3:01 PM
To: dri-de...@freedesktop.org
Cc: Liu, Monk 
Subject: [PATCH] dma-buf/reservation: should keep later one in add fence(v3)

v2:
still check context first to avoid a warning from dma_fence_is_later; apply this 
fix in add_shared_replace as well

v3:
use a bool flag to indicate whether the fence needs to be inserted into a new 
slot, and ignore it if it is an older fence than the one with the same context in 
old->shared

Change-Id: If6a979ba9fd6c923b82212f35f07a9ff31c86767
Signed-off-by: Monk Liu 
---
 drivers/dma-buf/reservation.c | 28 ++--
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c 
index 314eb10..a7d0598 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -118,7 +118,8 @@ reservation_object_add_shared_inplace(struct 
reservation_object *obj,
old_fence = rcu_dereference_protected(fobj->shared[i],
reservation_object_held(obj));
 
-   if (old_fence->context == fence->context) {
+   if (old_fence->context == fence->context &&
+   dma_fence_is_later(fence, old_fence)) {
/* memory barrier is added by write_seqcount_begin */
RCU_INIT_POINTER(fobj->shared[i], fence);
write_seqcount_end(&obj->seq);
@@ -158,6 +159,7 @@ reservation_object_add_shared_replace(struct 
reservation_object *obj,
  struct dma_fence *fence)
 {
unsigned i, j, k;
+   bool wrong_fence = false;
 
dma_fence_get(fence);
 
@@ -179,15 +181,29 @@ reservation_object_add_shared_replace(struct 
reservation_object *obj,
check = rcu_dereference_protected(old->shared[i],
reservation_object_held(obj));
 
-   if (check->context == fence->context ||
-   dma_fence_is_signaled(check))
+   if (dma_fence_is_signaled(check)) {
+   /* put check to tail of fobj if signaled */
RCU_INIT_POINTER(fobj->shared[--k], check);
-   else
+   } else if (check->context == fence->context) {
+   if (dma_fence_is_later(fence, check)) {
+   /* put check to tail of fobj if it is 
deprecated */
+   RCU_INIT_POINTER(fobj->shared[--k], check);
+   } else {
+   /* this is a wrong operation that adds an old 
fence */
+   wrong_fence = true;
+   RCU_INIT_POINTER(fobj->shared[j++], check);
+   }
+   } else {
+   /* add fence to new slot */
RCU_INIT_POINTER(fobj->shared[j++], check);
+   }
}
+
fobj->shared_count = j;
-   RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
-   fobj->shared_count++;
+   if (!wrong_fence) {
+   RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
+   fobj->shared_count++;
+   }
 
 done:
preempt_disable();
--
2.7.4
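
[A hedged usage sketch of what the fix guards against; the function name
matches the then-current reservation.c, the scenario itself is constructed.
old_f and new_f share one fence context, new_f has the higher seqno, and
the caller holds the reservation lock:

static void example_stale_add(struct reservation_object *obj,
			      struct dma_fence *old_f,
			      struct dma_fence *new_f)
{
	reservation_object_add_shared_fence(obj, new_f);
	/* Before the fix, this replaced new_f in its shared slot with the
	 * stale old_f; with the dma_fence_is_later() check the newer
	 * fence is kept. */
	reservation_object_add_shared_fence(obj, old_f);
}]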

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 1/2] [WIP]drm/ttm: add waiter list to prevent allocation not in order

2018-01-26 Thread Zhou, David(ChunMing)
I don't want to prevent all of them; my new approach is to prevent a later 
allocation from getting ahead of an earlier one and taking the memory space that 
the earlier one freed up through eviction.



Sent from my Smartisan Pro

Christian König <ckoenig.leichtzumer...@gmail.com> wrote on 2018-01-26 at 9:24 PM:

Yes, exactly that's the problem.

See when you want to prevent a process B from allocating the memory process A 
has evicted, you need to prevent all concurrent allocation.

And we don't do that because it causes a major performance drop.

Regards,
Christian.

On 26.01.2018 at 14:21, Zhou, David(ChunMing) wrote:
Your patch will prevent concurrent allocation, and will result in a big drop in 
allocation performance.


Sent from my Smartisan Pro

Christian König <ckoenig.leichtzumer...@gmail.com> wrote on 2018-01-26 at 9:04 PM:

Attached is what you actually want to do cleanly implemented. But as I said 
this is a NO-GO.

Regards,
Christian.

On 26.01.2018 at 13:43, Christian König wrote:
After my investigation, this issue looks like a defect of the TTM design itself, 
which breaks scheduling balance.
Yeah, but again: this is intended design we can't change easily.

Regards,
Christian.

On 26.01.2018 at 13:36, Zhou, David(ChunMing) wrote:
I am off work, so I am replying by phone; the format may not be plain text.

Back to the topic itself:
the problem does indeed happen on the amdgpu driver; someone reported to me that 
when an application runs with two instances, the performance differs between them.
I also reproduced the issue with a unit test (bo_eviction_test). They always think 
our scheduler isn't working as expected.

After my investigation, this issue looks like a defect of the TTM design itself, 
which breaks scheduling balance.

Further, if we run containers on our GPU, container A could get a high score 
while container B gets a low score with the same benchmark.

So this is a bug that we need to fix.

Regards,
David Zhou


Sent from my Smartisan Pro

Christian König <ckoenig.leichtzumer...@gmail.com> wrote on 2018-01-26 at 6:31 PM:

On 26.01.2018 at 11:22, Chunming Zhou wrote:
> there is a scheduling balance issue around get node, like:
> a. process A allocates all of memory and uses it for submission.
> b. process B tries to allocate memory and will wait for process A's BO to go 
> idle during eviction.
> c. process A completes the job; process B's eviction will put process A's BO node,
> but in the meantime process C comes to allocate a BO, directly gets a 
> node successfully, and does its submission;
> process B will again wait for process C's BO to go idle.
> d. repeating the above steps, process B could be delayed much more.
>
> a later allocation must not get ahead of an earlier one in the same place.

Again NAK to the whole approach.

At least with amdgpu the problem you described above never occurs
because evictions are pipelined operations. We could only block for
deleted regions to become free.

But independent of that incoming memory requests while we make room for
eviction are intended to be served first.

Changing that is certainly a no-go because that would favor memory-hungry
applications over small clients.

Regards,
Christian.

>
> Change-Id: I3daa892e50f82226c552cc008a29e55894a98f18
> Signed-off-by: Chunming Zhou <david1.z...@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c| 69 
> +++--
>   include/drm/ttm/ttm_bo_api.h|  7 +
>   include/drm/ttm/ttm_bo_driver.h |  7 +
>   3 files changed, 80 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index d33a6bb742a1..558ec2cf465d 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -841,6 +841,58 @@ static int ttm_bo_add_move_fence(struct 
> ttm_buffer_object *bo,
>return 0;
>   }
>
> +static void ttm_man_init_waiter(struct ttm_bo_waiter *waiter,
> + struct ttm_buffer_object *bo,
> + const struct ttm_place *place)
> +{
> + waiter->tbo = bo;
> + memcpy((void *)&waiter->place, (void *)place, sizeof(*place));
> + INIT_LIST_HEAD(&waiter->list);
> +}
> +
> +static void ttm_man_add_waiter(struct ttm_mem_type_manager *man,
> +struct ttm_bo_waiter *waiter)
> +{
> + if (!waiter)
> + return;
> + spin_lock(&man->wait_lock);
> + list_add_tail(&waiter->list, &man->waiter_list);
> + spin_unlock(&man->wait_lock);
> +}
> +
> +static void ttm_man_del_waiter(struct ttm_mem_type_manager *man,
> +struct ttm_bo_waiter *waiter)
> +{
> + if (!waiter)
> + return;
> + spin_lock(&man->wait_lock);
> + if (!list_empty(&waiter->list))
> + list_del(&waiter->list);
> + spin_unlock(&man->wait_lock);
> + kfree(waiter);
> +}
> +
> +int ttm_man_check_bo

Re: [PATCH 1/2] [WIP]drm/ttm: add waiter list to prevent allocation not in order

2018-01-26 Thread Zhou, David(ChunMing)
Your patch will prevent concurrent allocation, and will result in a big drop in 
allocation performance.


Sent from my Smartisan Pro

Christian König <ckoenig.leichtzumer...@gmail.com> wrote on 2018-01-26 at 9:04 PM:

Attached is what you actually want to do cleanly implemented. But as I said 
this is a NO-GO.

Regards,
Christian.

On 26.01.2018 at 13:43, Christian König wrote:
After my investigation, this issue looks like a defect of the TTM design itself, 
which breaks scheduling balance.
Yeah, but again: this is intended design we can't change easily.

Regards,
Christian.

On 26.01.2018 at 13:36, Zhou, David(ChunMing) wrote:
I am off work, so I am replying by phone; the format may not be plain text.

Back to the topic itself:
the problem does indeed happen on the amdgpu driver; someone reported to me that 
when an application runs with two instances, the performance differs between them.
I also reproduced the issue with a unit test (bo_eviction_test). They always think 
our scheduler isn't working as expected.

After my investigation, this issue looks like a defect of the TTM design itself, 
which breaks scheduling balance.

Further, if we run containers on our GPU, container A could get a high score 
while container B gets a low score with the same benchmark.

So this is a bug that we need to fix.

Regards,
David Zhou


Sent from my Smartisan Pro

Christian König <ckoenig.leichtzumer...@gmail.com> wrote on 2018-01-26 at 6:31 PM:

On 26.01.2018 at 11:22, Chunming Zhou wrote:
> there is a scheduling balance issue around get node, like:
> a. process A allocates all of memory and uses it for submission.
> b. process B tries to allocate memory and will wait for process A's BO to go 
> idle during eviction.
> c. process A completes the job; process B's eviction will put process A's BO node,
> but in the meantime process C comes to allocate a BO, directly gets a 
> node successfully, and does its submission;
> process B will again wait for process C's BO to go idle.
> d. repeating the above steps, process B could be delayed much more.
>
> a later allocation must not get ahead of an earlier one in the same place.

Again NAK to the whole approach.

At least with amdgpu the problem you described above never occurs
because evictions are pipelined operations. We could only block for
deleted regions to become free.

But independent of that incoming memory requests while we make room for
eviction are intended to be served first.

Changing that is certainly a no-go because that would favor memory-hungry
applications over small clients.

Regards,
Christian.

>
> Change-Id: I3daa892e50f82226c552cc008a29e55894a98f18
> Signed-off-by: Chunming Zhou <david1.z...@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c| 69 
> +++--
>   include/drm/ttm/ttm_bo_api.h|  7 +
>   include/drm/ttm/ttm_bo_driver.h |  7 +
>   3 files changed, 80 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index d33a6bb742a1..558ec2cf465d 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -841,6 +841,58 @@ static int ttm_bo_add_move_fence(struct 
> ttm_buffer_object *bo,
>return 0;
>   }
>
> +static void ttm_man_init_waiter(struct ttm_bo_waiter *waiter,
> + struct ttm_buffer_object *bo,
> + const struct ttm_place *place)
> +{
> + waiter->tbo = bo;
> + memcpy((void *)&waiter->place, (void *)place, sizeof(*place));
> + INIT_LIST_HEAD(&waiter->list);
> +}
> +
> +static void ttm_man_add_waiter(struct ttm_mem_type_manager *man,
> +struct ttm_bo_waiter *waiter)
> +{
> + if (!waiter)
> + return;
> + spin_lock(&man->wait_lock);
> + list_add_tail(&waiter->list, &man->waiter_list);
> + spin_unlock(&man->wait_lock);
> +}
> +
> +static void ttm_man_del_waiter(struct ttm_mem_type_manager *man,
> +struct ttm_bo_waiter *waiter)
> +{
> + if (!waiter)
> + return;
> + spin_lock(&man->wait_lock);
> + if (!list_empty(&waiter->list))
> + list_del(&waiter->list);
> + spin_unlock(&man->wait_lock);
> + kfree(waiter);
> +}
> +
> +int ttm_man_check_bo(struct ttm_mem_type_manager *man,
> +  struct ttm_buffer_object *bo,
> +  const struct ttm_place *place)
> +{
> + struct ttm_bo_waiter *waiter, *tmp;
> +
> + spin_lock(&man->wait_lock);
> + list_for_each_entry_safe(waiter, tmp, &man->waiter_list, list) {
> + if ((bo != waiter->tbo) &&
> + ((place->fpfn >= waiter->place.fpfn &&
> +   place->fpfn <= waiter->place.lpfn) ||
> +  (place->lpfn <= waiter->place.lpfn && place->lpfn >

Re: [PATCH 1/2] [WIP]drm/ttm: add waiter list to prevent allocation not in order

2018-01-26 Thread Zhou, David(ChunMing)
I am off work, so I am replying by phone; the format may not be plain text.

Back to the topic itself:
the problem does indeed happen on the amdgpu driver; someone reported to me that 
when an application runs with two instances, the performance differs between them.
I also reproduced the issue with a unit test (bo_eviction_test). They always think 
our scheduler isn't working as expected.

After my investigation, this issue looks like a defect of the TTM design itself, 
which breaks scheduling balance.

Further, if we run containers on our GPU, container A could get a high score 
while container B gets a low score with the same benchmark.

So this is a bug that we need to fix.

Regards,
David Zhou


Sent from my Smartisan Pro

Christian König wrote on 2018-01-26 at 6:31 PM:

On 26.01.2018 at 11:22, Chunming Zhou wrote:
> there is a scheduling balance issue around get node, like:
> a. process A allocates all of memory and uses it for submission.
> b. process B tries to allocate memory and will wait for process A's BO to go 
> idle during eviction.
> c. process A completes the job; process B's eviction will put process A's BO node,
> but in the meantime process C comes to allocate a BO, directly gets a 
> node successfully, and does its submission;
> process B will again wait for process C's BO to go idle.
> d. repeating the above steps, process B could be delayed much more.
>
> a later allocation must not get ahead of an earlier one in the same place.

Again NAK to the whole approach.

At least with amdgpu the problem you described above never occurs
because evictions are pipelined operations. We could only block for
deleted regions to become free.

But independent of that incoming memory requests while we make room for
eviction are intended to be served first.

Changing that is certainly a no-go because that would favor memory-hungry 
applications over small clients.

Regards,
Christian.

>
> Change-Id: I3daa892e50f82226c552cc008a29e55894a98f18
> Signed-off-by: Chunming Zhou 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c| 69 
> +++--
>   include/drm/ttm/ttm_bo_api.h|  7 +
>   include/drm/ttm/ttm_bo_driver.h |  7 +
>   3 files changed, 80 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index d33a6bb742a1..558ec2cf465d 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -841,6 +841,58 @@ static int ttm_bo_add_move_fence(struct 
> ttm_buffer_object *bo,
>return 0;
>   }
>
> +static void ttm_man_init_waiter(struct ttm_bo_waiter *waiter,
> + struct ttm_buffer_object *bo,
> + const struct ttm_place *place)
> +{
> + waiter->tbo = bo;
> + memcpy((void *)&waiter->place, (void *)place, sizeof(*place));
> + INIT_LIST_HEAD(&waiter->list);
> +}
> +
> +static void ttm_man_add_waiter(struct ttm_mem_type_manager *man,
> +struct ttm_bo_waiter *waiter)
> +{
> + if (!waiter)
> + return;
> + spin_lock(&man->wait_lock);
> + list_add_tail(&waiter->list, &man->waiter_list);
> + spin_unlock(&man->wait_lock);
> +}
> +
> +static void ttm_man_del_waiter(struct ttm_mem_type_manager *man,
> +struct ttm_bo_waiter *waiter)
> +{
> + if (!waiter)
> + return;
> + spin_lock(&man->wait_lock);
> + if (!list_empty(&waiter->list))
> + list_del(&waiter->list);
> + spin_unlock(&man->wait_lock);
> + kfree(waiter);
> +}
> +
> +int ttm_man_check_bo(struct ttm_mem_type_manager *man,
> +  struct ttm_buffer_object *bo,
> +  const struct ttm_place *place)
> +{
> + struct ttm_bo_waiter *waiter, *tmp;
> +
> + spin_lock(&man->wait_lock);
> + list_for_each_entry_safe(waiter, tmp, &man->waiter_list, list) {
> + if ((bo != waiter->tbo) &&
> + ((place->fpfn >= waiter->place.fpfn &&
> +   place->fpfn <= waiter->place.lpfn) ||
> +  (place->lpfn <= waiter->place.lpfn && place->lpfn >=
> +   waiter->place.fpfn)))
> + goto later_bo;
> + }
> + spin_unlock(&man->wait_lock);
> + return true;
> +later_bo:
> + spin_unlock(&man->wait_lock);
> + return false;
> +}
>   /**
>* Repeatedly evict memory from the LRU for @mem_type until we create enough
>* space, or we've evicted everything and there isn't enough space.
> @@ -853,17 +905,26 @@ static int ttm_bo_mem_force_space(struct 
> ttm_buffer_object *bo,
>   {
>struct ttm_bo_device *bdev = bo->bdev;
>struct ttm_mem_type_manager *man = &bdev->man[mem_type];
> + struct ttm_bo_waiter waiter;
>int ret;
>
> + ttm_man_init_waiter(&waiter, bo, place);
> + ttm_man_add_waiter(man, &waiter);
>do {
>ret = (*man->func->get_node)(man, bo, place, mem);
> - if (unlikely(ret != 0))
> + if (unlikely(ret != 0)) {
> + ttm_man_del_waiter(man, &waiter);
>return ret;
> - if (mem->mm_node)
> + 
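
For illustration, here is the fairness idea above reduced to a minimal
user-space sketch; it is not part of the patch, and fifo_allocator and
fifo_alloc are hypothetical names. Requests take a ticket under the lock and
may only attempt the allocation once every earlier ticket has been served, so
a late process C cannot overtake a process B that is already waiting:

#include <pthread.h>
#include <stdint.h>

struct fifo_allocator {
	pthread_mutex_t lock;
	pthread_cond_t turn;
	uint64_t next_ticket;	/* ticket handed to the next requester */
	uint64_t serving;	/* oldest ticket allowed to allocate */
};

static void *fifo_alloc(struct fifo_allocator *fa,
			void *(*try_alloc)(void *arg), void *arg)
{
	void *node;
	uint64_t me;

	pthread_mutex_lock(&fa->lock);
	me = fa->next_ticket++;
	while (fa->serving != me)	/* wait for all earlier requests */
		pthread_cond_wait(&fa->turn, &fa->lock);
	node = try_alloc(arg);		/* may sleep for eviction etc. */
	fa->serving++;			/* hand the turn to the next waiter */
	pthread_cond_broadcast(&fa->turn);
	pthread_mutex_unlock(&fa->lock);
	return node;
}

The sketch makes the trade-off explicit: strict arrival order serializes
allocators, which is exactly what the NAK above warns about -- a small
request now has to queue behind a memory-hungry one that is still making room.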

Re: [PATCH libdrm] drm: fix return value

2018-01-16 Thread Zhou, David(ChunMing)
Can you guys help me push it and the last vamgr patches upstream?
My new account request for libdrm is still pending.

Thanks,
David Zhou


Sent from my Smartisan Pro

Christian König wrote on 2018-01-16 at 4:56 PM:

Apart from that a good catch and Reviewed-by: Christian König.

Regards,
Christian.

On 16.01.2018 at 09:49, Michel Dänzer wrote:
> Moving from the amd-gfx list to dri-devel, since this isn't amdgpu specific.
>
>
> On 2018-01-16 03:54 AM, Chunming Zhou wrote:
>> otherwise -ETIME is missed.
>>
>> Change-Id: Ic5580a74d8027cc468c6135f8cf2f81817993423
>> Signed-off-by: Chunming Zhou 
>> ---
>>   xf86drm.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/xf86drm.c b/xf86drm.c
>> index 8a327170..3881bd9f 100644
>> --- a/xf86drm.c
>> +++ b/xf86drm.c
>> @@ -4241,7 +4241,7 @@ int drmSyncobjWait(int fd, uint32_t *handles, unsigned 
>> num_handles,
>>
>>   ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_WAIT, &args);
>>   if (ret < 0)
>> -return ret;
>> +return -errno;
>>
>>   if (first_signaled)
>>   *first_signaled = args.first_signaled;
>>
>
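
To illustrate why the fix matters, consider a hypothetical caller (not part
of the patch): drmIoctl() returns -1 on failure with the real error code in
errno, so returning its value verbatim hands every caller -1 instead of a
distinguishable code such as -ETIME from a timed-out wait.

#include <errno.h>
#include <stdint.h>
#include <xf86drm.h>

static int wait_signaled(int fd, uint32_t *handles, unsigned num_handles,
			 int64_t timeout_nsec)
{
	int ret = drmSyncobjWait(fd, handles, num_handles, timeout_nsec,
				 0, NULL);

	if (ret == -ETIME)	/* only distinguishable once -errno is returned */
		return 0;	/* a timeout is an expected outcome here */
	return ret;		/* other negative errno values are real errors */
}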

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 1/3] drm/syncobj: extract two helpers from drm_syncobj_create

2017-09-30 Thread Zhou, David(ChunMing)
Could you test and review it? I have no environment at hand.

Regards,
David Zhou


Sent from my Smartisan Pro

Marek Olšák wrote on 2017-09-30 at 11:56 PM:

The idea sounds good.

Marek

On Sat, Sep 30, 2017 at 3:55 AM, Chunming Zhou  wrote:
> What I mean is like the attached; I reverted part of yours.
>
> Regards,
>
> David zhou
>
>
>
>> On 2017-09-29 22:15, Marek Olšák wrote:
>>
>> On Fri, Sep 29, 2017 at 4:13 PM, Marek Olšák  wrote:
>>>
>>> On Fri, Sep 29, 2017 at 4:44 AM, Chunming Zhou  wrote:


 On 2017-09-13 04:42, Marek Olšák wrote:
>
> From: Marek Olšák 
>
> For amdgpu.
>
> drm_syncobj_create is renamed to drm_syncobj_create_as_handle, and new
> helpers drm_syncobj_create and drm_syncobj_get_handle are added.
>
> Signed-off-by: Marek Olšák 
> ---
>drivers/gpu/drm/drm_syncobj.c | 49
> +++
>include/drm/drm_syncobj.h |  4 
>2 files changed, 49 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_syncobj.c
> b/drivers/gpu/drm/drm_syncobj.c
> index 0422b8c..0bb1741 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -262,8 +262,14 @@ void drm_syncobj_free(struct kref *kref)
>}
>EXPORT_SYMBOL(drm_syncobj_free);
>-static int drm_syncobj_create(struct drm_file *file_private,
> - u32 *handle, uint32_t flags)

 You can add a new parameter for passing a dma fence; then in patch 3 you can
 directly use it for AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ.

 otherwise the set looks good to me.
>>>
>>> Sorry I just pushed this.
>>
>> Actually, you commented on a deleted line. The function already has
>> dma_fence among the parameters.
>>
>> Marek
>
>
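
For illustration, a driver-side path could combine the two new helpers as in
the sketch below; fence_to_syncobj_handle is a hypothetical name, error
handling is trimmed, and patch 3 of the series is the real user:

static int fence_to_syncobj_handle(struct drm_file *file_private,
				   struct dma_fence *fence, u32 *handle)
{
	struct drm_syncobj *syncobj;
	int ret;

	ret = drm_syncobj_create(&syncobj, 0, fence);	/* kref'ed object */
	if (ret)
		return ret;
	ret = drm_syncobj_get_handle(file_private, syncobj, handle);
	drm_syncobj_put(syncobj);	/* the handle keeps its own reference */
	return ret;
}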
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 3/3] drm/amdgpu: add FENCE_TO_HANDLE ioctl that returns syncobj or sync_file

2017-09-13 Thread Zhou, David(ChunMing)
For Android using a Mesa instance, an EGL draw will dequeue an Android buffer; after 
the EGL draw, the buffer goes back to the Android buffer queue, but a syncfile fd 
needs to be appended. If getting a syncfile fd for every EGL draw always needs several 
syncobj ioctls, the I/O overhead isn't small. But if we directly return a syncfile 
when the EGL draw does CS, isn't that better?


Sent from my Smartisan Pro

Christian König <deathsim...@vodafone.de> wrote on 2017-09-13 at 9:04 PM:

> syncfile indeed is a good way to pass fences to user space, which is already
> proven in Android and is upstreamed.
Not really. syncfile needs a file descriptor for each handle it generates, 
which is quite a show stopper if you want to use it in general.

Android syncfiles are meant to be used for inter-process sharing, but as command 
submission sequence numbers they are not such a good fit.

Marek's approach looks really good to me and we should follow that direction 
further.

Regards,
Christian.

On 13.09.2017 at 14:25, Zhou, David(ChunMing) wrote:
Yes, for compatibility, I kept both the seq_no and the syncfile fd in the patch set; 
you can take a look. It really is the simpler and more effective way.

syncfile indeed is a good way to pass fences to user space, which is already 
proven in Android and is upstreamed.

Regards,
David Zhou


Sent from my Smartisan Pro

Marek Olšák <mar...@gmail.com> wrote on 2017-09-13 at 8:06 PM:

On Wed, Sep 13, 2017 at 1:32 PM, Zhou, David(ChunMing)
<david1.z...@amd.com> wrote:
> Could you describe how difficult it is to directly use the CS syncfile fd in Mesa
> compared with converting the CS seq to a syncfile fd via several syncobj ioctls?

It just simplifies things. Mesa primarily uses seq_no-based fences and
will continue to use them. We can't remove the seq_no fence code
because we have to keep Mesa compatible with older kernels.

The only possibilities are:
- Mesa gets both seq_no and sync_file from CS.
- Mesa only gets seq_no from CS.

I decided to take the simpler option. I don't know if there is a perf
difference between CS returning a sync_file and using a separate
ioctl, but it's probably insignificant since we already call 3 ioctls
per IB submission (BO list create+destroy, submit).

Marek

>
> Regards,
> David Zhou
>
> Sent from my Smartisan Pro
>
> Marek Olšák <mar...@gmail.com> wrote on 2017-09-13 at 6:11 PM:
>
> On Wed, Sep 13, 2017 at 5:03 AM, zhoucm1 
> <david1.z...@amd.com> wrote:
>> Hi Marek,
>>
>> You're doing same things with me, see my "introduce syncfile as fence
>> reuturn" patch set, which makes things more simple, we just need to
>> directly
>> return syncfile fd to UMD when CS, then the fence UMD get will be always
>> syncfile fd, UMD don't need to construct ip_type/ip_instance/ctx_id/ring
>> any
>> more, which also can pass to dependency and syncobj as well.
>
> For simpler Mesa code, Mesa won't get a sync file from the CS ioctl.
>
> Marek
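
For illustration, the single-ioctl path Marek describes looks roughly like
this from user space; a sketch assuming the libdrm wrapper
amdgpu_cs_fence_to_handle(), with error handling trimmed:

static int cs_fence_to_syncfile(amdgpu_device_handle dev,
				struct amdgpu_cs_fence *fence, int *fd_out)
{
	uint32_t fd;
	int r;

	r = amdgpu_cs_fence_to_handle(dev, fence,
				      AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD,
				      &fd);
	if (r == 0)
		*fd_out = (int)fd;	/* caller owns the fd, close() it when done */
	return r;
}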



___
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 3/3] drm/amdgpu: add FENCE_TO_HANDLE ioctl that returns syncobj or sync_file

2017-09-13 Thread Zhou, David(ChunMing)
Yes, for compatibility, I kept both the seq_no and the syncfile fd in the patch set; 
you can take a look. It really is the simpler and more effective way.

syncfile indeed is a good way to pass fences to user space, which is already 
proven in Android and is upstreamed.

Regards,
David Zhou


Sent from my Smartisan Pro

Marek Olšák <mar...@gmail.com> wrote on 2017-09-13 at 8:06 PM:

On Wed, Sep 13, 2017 at 1:32 PM, Zhou, David(ChunMing)
<david1.z...@amd.com> wrote:
> Could you describe how difficult it is to directly use the CS syncfile fd in Mesa
> compared with converting the CS seq to a syncfile fd via several syncobj ioctls?

It just simplifies things. Mesa primarily uses seq_no-based fences and
will continue to use them. We can't remove the seq_no fence code
because we have to keep Mesa compatible with older kernels.

The only possibilities are:
- Mesa gets both seq_no and sync_file from CS.
- Mesa only gets seq_no from CS.

I decided to take the simpler option. I don't know if there is a perf
difference between CS returning a sync_file and using a separate
ioctl, but it's probably insignificant since we already call 3 ioctls
per IB submission (BO list create+destroy, submit).

Marek

>
> Regards,
> David Zhou
>
> Sent from my Smartisan Pro
>
> Marek Olšák <mar...@gmail.com> wrote on 2017-09-13 at 6:11 PM:
>
> On Wed, Sep 13, 2017 at 5:03 AM, zhoucm1 <david1.z...@amd.com> wrote:
>> Hi Marek,
>>
>> You're doing same things with me, see my "introduce syncfile as fence
>> reuturn" patch set, which makes things more simple, we just need to
>> directly
>> return syncfile fd to UMD when CS, then the fence UMD get will be always
>> syncfile fd, UMD don't need to construct ip_type/ip_instance/ctx_id/ring
>> any
>> more, which also can pass to dependency and syncobj as well.
>
> For simpler Mesa code, Mesa won't get a sync file from the CS ioctl.
>
> Marek
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 3/3] drm/amdgpu: add FENCE_TO_HANDLE ioctl that returns syncobj or sync_file

2017-09-13 Thread Zhou, David(ChunMing)
Could you describe how difficult it is to directly use the CS syncfile fd in Mesa 
compared with converting the CS seq to a syncfile fd via several syncobj ioctls?

Regards,
David Zhou


Sent from my Smartisan Pro

Marek Olšák wrote on 2017-09-13 at 6:11 PM:

On Wed, Sep 13, 2017 at 5:03 AM, zhoucm1  wrote:
> Hi Marek,
>
> You're doing same things with me, see my "introduce syncfile as fence
> reuturn" patch set, which makes things more simple, we just need to directly
> return syncfile fd to UMD when CS, then the fence UMD get will be always
> syncfile fd, UMD don't need to construct ip_type/ip_instance/ctx_id/ring any
> more, which also can pass to dependency and syncobj as well.

For simpler Mesa code, Mesa won't get a sync file from the CS ioctl.

Marek
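
For illustration, the "several syncobj ioctls" overhead David refers to can
be seen in a sketch like the one below, which assumes the CS fence has
already been attached to 'syncobj' and uses the existing libdrm wrapper:

#include <xf86drm.h>

static int syncobj_to_syncfile(int fd, uint32_t syncobj, int *sync_file_fd)
{
	/* DRM_IOCTL_SYNCOBJ_HANDLE_TO_FD with EXPORT_SYNC_FILE underneath;
	 * add a syncobj create/import/destroy per frame if the syncobj is
	 * not reused, which is the per-draw cost being argued about. */
	return drmSyncobjExportSyncFile(fd, syncobj, sync_file_fd);
}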
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: [PATCH 4/4] drm/amdgpu: resize VRAM BAR for CPU access

2017-03-15 Thread Zhou, David(ChunMing)
Does that mean we don't need invisible VRAM later?

David

-Original Message-
From: dri-devel [mailto:dri-devel-boun...@lists.freedesktop.org] On Behalf Of 
Christian K?nig
Sent: Wednesday, March 15, 2017 3:38 PM
To: Ayyappa Ch 
Cc: linux-...@vger.kernel.org; linux-ker...@vger.kernel.org; 
amd-...@lists.freedesktop.org; platform-driver-...@vger.kernel.org; 
helg...@kernel.org; dri-devel@lists.freedesktop.org
Subject: Re: [PATCH 4/4] drm/amdgpu: resize VRAM BAR for CPU access

Carrizo is an APU and resizing BARs isn't needed nor supported there. 
The CPU can access the full stolen VRAM directly on that hardware.

As far as I know ASICs with support for this are Tonga, Fiji and all Polaris 
variants.

Christian.

On 15.03.2017 at 08:23, Ayyappa Ch wrote:
> Is it possible on Carrizo asics? Or is it only supported on newer asics?
>
> On Mon, Mar 13, 2017 at 6:11 PM, Christian König 
>  wrote:
>> From: Christian König 
>>
>> Try to resize BAR0 to let CPU access all of VRAM.
>>
>> Signed-off-by: Christian König 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 
>> +
>>   drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  |  8 +---
>>   drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  |  8 +---
>>   4 files changed, 40 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 3b81ded..905ded9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1719,6 +1719,7 @@ uint64_t amdgpu_ttm_tt_pte_flags(struct amdgpu_device 
>> *adev, struct ttm_tt *ttm,
>>   struct ttm_mem_reg *mem);
>>   void amdgpu_vram_location(struct amdgpu_device *adev, struct amdgpu_mc 
>> *mc, u64 base);
>>   void amdgpu_gtt_location(struct amdgpu_device *adev, struct 
>> amdgpu_mc *mc);
>> +void amdgpu_resize_bar0(struct amdgpu_device *adev);
>>   void amdgpu_ttm_set_active_vram_size(struct amdgpu_device *adev, u64 size);
>>   int amdgpu_ttm_init(struct amdgpu_device *adev);
>>   void amdgpu_ttm_fini(struct amdgpu_device *adev); diff --git 
>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 118f4e6..92955fe 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -692,6 +692,35 @@ void amdgpu_gtt_location(struct amdgpu_device *adev, 
>> struct amdgpu_mc *mc)
>>  mc->gtt_size >> 20, mc->gtt_start, mc->gtt_end);
>>   }
>>
>> +/**
>> + * amdgpu_resize_bar0 - try to resize BAR0
>> + *
>> + * @adev: amdgpu_device pointer
>> + *
>> + * Try to resize BAR0 to make all VRAM CPU accessible.
>> + */
>> +void amdgpu_resize_bar0(struct amdgpu_device *adev) {
>> +   u32 size = max(ilog2(adev->mc.real_vram_size - 1) + 1, 20) - 20;
>> +   int r;
>> +
>> +   r = pci_resize_resource(adev->pdev, 0, size);
>> +
>> +   if (r == -ENOTSUPP) {
>> +   /* The hardware don't support the extension. */
>> +   return;
>> +
>> +   } else if (r == -ENOSPC) {
>> +   DRM_INFO("Not enough PCI address space for a large BAR.");
>> +   } else if (r) {
>> +   DRM_ERROR("Problem resizing BAR0 (%d).", r);
>> +   }
>> +
>> +   /* Reinit the doorbell mapping, it is most likely moved as well */
>> +   amdgpu_doorbell_fini(adev);
>> +   BUG_ON(amdgpu_doorbell_init(adev));
>> +}
>> +
>>   /*
>>* GPU helpers function.
>>*/
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
>> index dc9b6d6..36a7aa5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
>> @@ -367,13 +367,15 @@ static int gmc_v7_0_mc_init(struct amdgpu_device *adev)
>>  break;
>>  }
>>  adev->mc.vram_width = numchan * chansize;
>> -   /* Could aper size report 0 ? */
>> -   adev->mc.aper_base = pci_resource_start(adev->pdev, 0);
>> -   adev->mc.aper_size = pci_resource_len(adev->pdev, 0);
>>  /* size in MB on si */
>>  adev->mc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 
>> 1024ULL;
>>  adev->mc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL 
>> * 1024ULL;
>>
>> +   if (!(adev->flags & AMD_IS_APU))
>> +   amdgpu_resize_bar0(adev);
>> +   adev->mc.aper_base = pci_resource_start(adev->pdev, 0);
>> +   adev->mc.aper_size = pci_resource_len(adev->pdev, 0);
>> +
>>   #ifdef CONFIG_X86_64
>>  if (adev->flags & AMD_IS_APU) {
>>  adev->mc.aper_base = 
>> ((u64)RREG32(mmMC_VM_FB_OFFSET)) << 22; diff --git 
>> a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>> index c087b00..7761ad3 100644
>> --- 
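
As a worked example of the size computation in amdgpu_resize_bar0() above
(illustrative only): for a 4 GiB VRAM part, ilog2(4G - 1) + 1 == 32, so
size == 32 - 20 == 12, and pci_resize_resource() is asked for a BAR of
2^(12 + 20) bytes == 4 GiB. The max(..., 20) clamp keeps the request at no
less than 2^20 bytes == 1 MiB, the smallest resizable BAR size.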

[PATCH] drm/amd/amdgpu: fix spelling mistake: "comleted" -> "completed"

2016-12-30 Thread Zhou, David(ChunMing)
+amd-gfx; the patch is Reviewed-by: Chunming Zhou 

-Original Message-
From: Colin King [mailto:colin.k...@canonical.com] 
Sent: Thursday, December 29, 2016 11:47 PM
To: Deucher, Alexander ; Koenig, Christian 
; David Airlie ; Zhou, 
David(ChunMing) ; StDenis, Tom ; Liu, Monk ; dri-devel at lists.freedesktop.org
Cc: linux-kernel at vger.kernel.org
Subject: [PATCH] drm/amd/amdgpu: fix spelling mistake: "comleted" -> "completed"

From: Colin Ian King <colin.k...@canonical.com>

trivial fix to spelling mistake in WARN message

Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 60bd4af..9ca3167 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2335,7 +2335,7 @@ int amdgpu_gpu_reset(struct amdgpu_device *adev)
 			if (fence) {
 				r = dma_fence_wait(fence, false);
 				if (r) {
-					WARN(r, "recovery from shadow isn't comleted\n");
+					WARN(r, "recovery from shadow isn't completed\n");
 					break;
 				}
 			}
@@ -2347,7 +2347,7 @@ int amdgpu_gpu_reset(struct amdgpu_device *adev)
 			if (fence) {
 				r = dma_fence_wait(fence, false);
 				if (r)
-					WARN(r, "recovery from shadow isn't comleted\n");
+					WARN(r, "recovery from shadow isn't completed\n");
 			}
 			dma_fence_put(fence);
 		}
-- 
2.10.2



Fw: add find_bo_from_cpu_mapping interface

2015-12-03 Thread Zhou, David(ChunMing)



UMD can find the BO by the CPU address of the BO. Any comments?


Regards,

David Zhou
(Scrubbed by the archive: an HTML copy of the message and three attachments:)
0001-drm-amdgpu-return-bo-itself-if-userptr-is-cpu-addr-o.patch (text/x-patch, 5767 bytes)
0001-amdgpu-add-bo-handle-to-hash-table-when-cpu-mapping.patch (text/x-patch, 1018 bytes)
0002-amdgpu-add-amdgpu_find_bo_by_cpu_mapping-interface.patch (text/x-patch, 4828 bytes)
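
For illustration, usage of the proposed interface might look like the sketch 
below; the function name is taken from the attachment titles, and the exact 
signature in the attached patches may differ:

	amdgpu_bo_handle bo;
	uint64_t offset_in_bo;
	int r = amdgpu_find_bo_by_cpu_mapping(dev, cpu_ptr, size,
					      &bo, &offset_in_bo);
	if (r == 0) {
		/* bo is the buffer whose CPU mapping contains cpu_ptr and
		 * offset_in_bo is the byte offset of cpu_ptr inside it;
		 * drop the extra reference with amdgpu_bo_free(bo). */
	}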



[PATCH] drm/amdgpu: don't oops on failure to load

2015-11-03 Thread Zhou, David(ChunMing)
Thanks for catching this. Reviewed-by: Chunming Zhou 

Regards,
David Zhou

> -Original Message-
> From: dri-devel [mailto:dri-devel-bounces at lists.freedesktop.org] On Behalf
> Of Dave Airlie
> Sent: Tuesday, November 03, 2015 7:35 AM
> To: dri-devel at lists.freedesktop.org
> Subject: [PATCH] drm/amdgpu: don't oops on failure to load
> 
> From: Dave Airlie 
> 
> In two places amdgpu tries to tear down something it hasn't initialised when
> failing. This is what happens when you enable experimental support on topaz
> which then fails in ring init.
> 
> This patch allows it to fail cleanly.
> 
> Signed-off-by: Dave Airlie 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 3 +++
>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.c | 3 ++-
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index e0b80cc..fec65f0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -69,6 +69,9 @@ void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
>   struct amdgpu_device *adev = ctx->adev;
>   unsigned i, j;
> 
> + if (!adev)
> + return;
> +
>   for (i = 0; i < AMDGPU_MAX_RINGS; ++i)
>   for (j = 0; j < AMDGPU_CTX_MAX_CS_PENDING; ++j)
>   fence_put(ctx->rings[i].fences[j]);
> diff --git a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> index 7fa1d7a..d3b9eb7 100644
> --- a/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> +++ b/drivers/gpu/drm/amd/scheduler/gpu_scheduler.c
> @@ -462,5 +462,6 @@ int amd_sched_init(struct amd_gpu_scheduler
> *sched,
>   */
>  void amd_sched_fini(struct amd_gpu_scheduler *sched)  {
> - kthread_stop(sched->thread);
> + if (sched->thread)
> + kthread_stop(sched->thread);
>  }
> --
> 2.5.0
> 
> ___
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel


drm/amdgpu: implement cgs gpu memory callbacks

2015-08-25 Thread Zhou, David(ChunMing)


> -Original Message-
> From: Dan Carpenter [mailto:dan.carpenter at oracle.com]
> Sent: Tuesday, August 25, 2015 1:51 PM
> To: Zhou, David(ChunMing)
> Cc: dri-devel at lists.freedesktop.org
> Subject: Re: drm/amdgpu: implement cgs gpu memory callbacks
> 
> On Tue, Aug 25, 2015 at 02:07:21AM +0000, Zhou, David(ChunMing) wrote:
> > >and can the shift actually
> > > wrap?
> > [DZ] of course, adding shift wrap is better. fpfn/lpfn is page number, so
> By "Shift wrap" I meant how you shift beyond the end of a 32bit number and it
> truncates, or if it's signed then it can wrap around (it's undefined 
> actually, I
> probably should use a different word?).
> 
> #include <stdio.h>
> 
> int main(void)
> {
>   unsigned long long a = 0xf0000000U << 12;   /* 32-bit shift: truncates */
>   unsigned long long b = 0xf0000000ULL << 12; /* 64-bit shift: keeps all bits */
> 
>   printf("%llx %llx\n", a, b);
> 
>   return 0;
> }
[DZ] Oh, I understand what you mean; with the previous PAGE_SHIFT change, I think it 
should be like 'b' in your example, without truncation.

Thanks,
 David Zhou
> 
> regards,
> dan carpenter


drm/amdgpu: implement cgs gpu memory callbacks

2015-08-25 Thread Zhou, David(ChunMing)
Inline...[DZ]

> -Original Message-
> From: Dan Carpenter [mailto:dan.carpenter at oracle.com]
> Sent: Tuesday, August 25, 2015 3:51 AM
> To: Zhou, David(ChunMing)
> Cc: dri-devel at lists.freedesktop.org
> Subject: Re: drm/amdgpu: implement cgs gpu memory callbacks
> 
> On Mon, Aug 24, 2015 at 07:09:15AM +0000, Zhou, David(ChunMing) wrote:
> > Hi Dan,
> > Thanks for figuring out that.
> > >274  min_offset = obj->placements[0].fpfn << PAGE_SHIFT;
> > >275  max_offset = obj->placements[0].lpfn << PAGE_SHIFT;
> > Maybe should be:
> > min_offset = obj->placements[0].fpfn;
> > min_offset <<= PAGE_SHIFT;
> > max_offset = obj->placements[0].lpfn;
> > max_offset <<= PAGE_SHIFT;
> 
> 
> It's probably just simpler to be:
> 
>   min_offset = (u64)obj->placements[0].fpfn << PAGE_SHIFT;
>   max_offset = (u64)obj->placements[0].lpfn << PAGE_SHIFT;
> 
> But the larger questions are why min_offset is a u64,
[DZ] max/min_offset is a memory size.

>and can the shift actually
> wrap? 
[DZ] of course, adding shift wrap is better. fpfn/lpfn is page number, so 
> I'm just looking at static checker warnings, and I'm not very familiar
> with this code so I don't know the answers.
> 
> regards,
> dan carpenter



drm/amdgpu: implement cgs gpu memory callbacks

2015-08-24 Thread Zhou, David(ChunMing)
Hi Dan,
Thanks for pointing that out. 
>274  min_offset = obj->placements[0].fpfn << PAGE_SHIFT;
>275  max_offset = obj->placements[0].lpfn << PAGE_SHIFT;
Maybe should be:
min_offset = obj->placements[0].fpfn;
min_offset <<= PAGE_SHIFT;
max_offset = obj->placements[0].lpfn;
max_offset <<= PAGE_SHIFT;

Regards,
  David Zhou

> -Original Message-
> From: Dan Carpenter [mailto:dan.carpenter at oracle.com]
> Sent: Saturday, August 22, 2015 12:24 AM
> To: Zhou, David(ChunMing)
> Cc: dri-devel at lists.freedesktop.org
> Subject: re: drm/amdgpu: implement cgs gpu memory callbacks
> 
> Hello Chunming Zhou,
> 
> The patch 57ff96cf471a: "drm/amdgpu: implement cgs gpu memory callbacks"
> from Apr 24, 2015, leads to the following static checker
> warning:
> 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c:274
> amdgpu_cgs_gmap_gpu_mem()
>   warn: should 'obj->placements[0]->fpfn << 12' be a 64 bit type?
> 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c:275
> amdgpu_cgs_gmap_gpu_mem()
>   warn: should 'obj->placements[0]->lpfn << 12' be a 64 bit type?
> 
> 
> drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c
>265  static int amdgpu_cgs_gmap_gpu_mem(void *cgs_device, cgs_handle_t
> handle,
>266 uint64_t *mcaddr)
>267  {
>268  int r;
>269  u64 min_offset, max_offset;
>270  struct amdgpu_bo *obj = (struct amdgpu_bo *)handle;
>271
>272  WARN_ON_ONCE(obj->placement.num_placement > 1);
>273
>274  min_offset = obj->placements[0].fpfn << PAGE_SHIFT;
>275  max_offset = obj->placements[0].lpfn << PAGE_SHIFT;
> 
> Both of these.
> 
>276
>277  r = amdgpu_bo_reserve(obj, false);
>278  if (unlikely(r != 0))
>279  return r;
>280  r = amdgpu_bo_pin_restricted(obj, AMDGPU_GEM_DOMAIN_GTT,
>281   min_offset, max_offset, mcaddr);
>282  amdgpu_bo_unreserve(obj);
>283  return r;
>284  }
> 
> There are actually a few of these warnings which were less clear whether the
> warning was correct or not so I didn't send them.
> 
> drivers/gpu/drm/amd/amdgpu/cz_smc.c:463
> cz_smu_populate_single_firmware_entry() warn: should '((header->jt_offset))
> << 2' be a 64 bit type?
> drivers/gpu/drm/amd/amdgpu/fiji_smc.c:404
> fiji_smu_populate_single_firmware_entry() warn: should '((header->jt_offset))
> << 2' be a 64 bit type?
> drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c:724
> amdgpu_cgs_get_firmware_info() warn: should '((header->jt_offset)) << 2' be a
> 64 bit type?
> drivers/gpu/drm/amd/amdgpu/tonga_smc.c:406
> tonga_smu_populate_single_firmware_entry() warn: should '((header-
> >jt_offset)) << 2' be a 64 bit type?
> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c:615 amdgpu_gem_op_ioctl()
> warn: should 'robj->tbo.mem.page_alignment << 12' be a 64 bit type?
> 
> regards,
> dan carpenter
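
For reference, the cast form discussed in the follow-up messages above avoids 
the 32-bit shift (a sketch of the eventual fix, not necessarily the committed 
code):

	min_offset = (u64)obj->placements[0].fpfn << PAGE_SHIFT;
	max_offset = (u64)obj->placements[0].lpfn << PAGE_SHIFT;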


test

2015-07-16 Thread Zhou, David(ChunMing)
test
URL: