Re: [PATCH] drm/amdgpu: remove distinction between explicit and implicit sync (v2)

2020-06-11 Thread Chunming Zhou
I didn't check the patch details. If it is for the existing implicit sync of 
shared buffers, feel free to go ahead.


But if you add some description of its usage, it will be clearer 
to others.


-David

On 2020/6/11 15:19, Marek Olšák wrote:

Hi David,

Explicit sync has nothing to do with this. This is for implicit sync, 
which is required by DRI3. This fix allows removing existing 
inefficiencies from drivers, so it's a good thing.


Marek

On Wed., Jun. 10, 2020, 03:56 Chunming Zhou <zhou...@amd.com> wrote:



On 2020/6/10 15:41, Christian König wrote:

That's true, but for now we are stuck with the implicit sync for
quite a number of use cases.

My problem is rather that we already tried this and it backfired
immediately.

I do remember that it was your patch that introduced the pipeline
sync flag handling and I warned that this could be problematic.
You then came back with a QA result saying that this is indeed
causing a huge performance drop in one test case and we need to
do something else. Together we then came up with the different
handling between implicit and explicit sync.


Isn't the pipeline sync flag there to fix some issue caused by parallel
execution between jobs in one pipeline? I really don't remember
why that's related to this. Or do you mean the extra sync hides
many other potential issues?

Anyway, when I went through the Vulkan WSI code, the synchronization
with the OS window system wasn't so smooth. And when I saw Jason
driving explicit sync through the whole Linux ecosystem, like the
Android window system does, I felt that's really a good direction.

-David



But I can't find that stupid mail thread any more. I knew that it
was a couple of years ago when we started with the explicit sync
for Vulkan.

Christian.

On 10.06.20 at 08:29, Zhou, David(ChunMing) wrote:


[AMD Official Use Only - Internal Distribution Only]

Not sure if this is the right direction. I think usermode wants all
synchronization to be explicit. Implicit sync often confuses
people who don't know its history. I remember Jason from Intel
is driving explicit synchronization through the Linux
ecosystem, which even removes implicit sync of shared buffers.

-David

*From:* amd-gfx <amd-gfx-boun...@lists.freedesktop.org> *On Behalf Of* Marek Olšák
*Sent:* Tuesday, June 9, 2020 6:58 PM
*To:* amd-gfx mailing list <amd-gfx@lists.freedesktop.org>
*Subject:* [PATCH] drm/amdgpu: remove distinction between
explicit and implicit sync (v2)

Hi,

This enables a full pipeline sync for implicit sync. It's
Christian's patch with the driver version bumped. With this,
user mode drivers don't have to wait for idle at the end of gfx IBs.

Any concerns?

Thanks,

Marek


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: remove distinction between explicit and implicit sync (v2)

2020-06-10 Thread Chunming Zhou


On 2020/6/10 15:41, Christian König wrote:
That's true, but for now we are stuck with the implicit sync for quite 
a number of use cases.


My problem is rather that we already tried this and it backfired 
immediately.


I do remember that it was your patch that introduced the pipeline sync 
flag handling and I warned that this could be problematic. You then 
came back with a QA result saying that this is indeed causing a huge 
performance drop in one test case and we need to do something else. 
Together we then came up with the different handling between implicit 
and explicit sync.


Isn't the pipeline sync flag there to fix some issue caused by parallel 
execution between jobs in one pipeline? I really don't remember why 
that's related to this. Or do you mean the extra sync hides many other 
potential issues?


Anyway, when I went through the Vulkan WSI code, the synchronization with 
the OS window system wasn't so smooth. And when I saw Jason driving 
explicit sync through the whole Linux ecosystem, like the Android window 
system does, I felt that's really a good direction.


-David



But I can't find that stupid mail thread any more. I knew that it was 
a couple of years ago when we started with the explicit sync for Vulkan.


Christian.

On 10.06.20 at 08:29, Zhou, David(ChunMing) wrote:


[AMD Official Use Only - Internal Distribution Only]

Not sure if this is the right direction. I think usermode wants all 
synchronization to be explicit. Implicit sync often confuses people 
who don't know its history. I remember Jason from Intel is driving 
explicit synchronization through the Linux ecosystem, which even 
removes implicit sync of shared buffers.


-David

*From:* amd-gfx *On Behalf Of* Marek Olšák
*Sent:* Tuesday, June 9, 2020 6:58 PM
*To:* amd-gfx mailing list
*Subject:* [PATCH] drm/amdgpu: remove distinction between explicit 
and implicit sync (v2)


Hi,

This enables a full pipeline sync for implicit sync. It's Christian's 
patch with the driver version bumped. With this, user mode drivers 
don't have to wait for idle at the end of gfx IBs.


Any concerns?

Thanks,

Marek




[PATCH] MAINTAINERS: Remove me from amdgpu maintainers

2020-05-06 Thread Chunming Zhou
I was glad to spend time on the kernel driver in past years.
I've moved to a new focus in UMD and can't commit
enough time to discussions.

Signed-off-by: Chunming Zhou 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 938316092634..4ca508bd4c9e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14066,7 +14066,6 @@ F:  drivers/net/wireless/quantenna
 RADEON and AMDGPU DRM DRIVERS
 M: Alex Deucher 
 M: Christian König 
-M: David (ChunMing) Zhou 
 L: amd-gfx@lists.freedesktop.org
 S: Supported
 T: git git://people.freedesktop.org/~agd5f/linux
-- 
2.17.1



Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to compute IBs too

2020-04-27 Thread Chunming Zhou

Yes, same question.

In fact, the PAL command stream has its own Release/Acquire packets. We 
use the flag per your request.


-David

On 2020/4/27 22:53, Christian König wrote:

Yeah, but is Mesa going to use it?

Christian.

On 27.04.20 at 15:54, Marek Olšák wrote:
PAL requested it and they are going to use it. (it looks like they 
have to use it for correctness)


Marek

On Mon, Apr 27, 2020 at 9:02 AM Deucher, Alexander 
<alexander.deuc...@amd.com> wrote:


[AMD Official Use Only - Internal Distribution Only]


Do we have open source UMD code which uses this?

Alex

*From:* Christian König <ckoenig.leichtzumer...@gmail.com>
*Sent:* Sunday, April 26, 2020 4:55 AM
*To:* Marek Olšák <mar...@gmail.com>;
Koenig, Christian <christian.koe...@amd.com>
*Cc:* Deucher, Alexander <alexander.deuc...@amd.com>; amd-gfx mailing list
<amd-gfx@lists.freedesktop.org>
*Subject:* Re: drm/amdgpu: apply AMDGPU_IB_FLAG_EMIT_MEM_SYNC to
compute IBs too

Thanks for that explanation. I suspected that there was a good
reason to have that in the kernel, but couldn't find one.

In this case the patch is Reviewed-by: Christian König
 

We should probably add this explanation as comment to the flag as
well.

Thanks,
Christian.

On 26.04.20 at 02:43, Marek Olšák wrote:

It was merged into amd-staging-drm-next.

I'm not absolutely sure, but I think we need to invalidate
before IBs if an IB is cached in L2 and the CPU has updated it.
It can only be cached in L2 if something other than CP has read
it or written to it without invalidation. CP reads don't cache
it but they can hit the cache if it's already cached.

For CE, we need to invalidate before the IB in the kernel,
because CE IBs can't do cache invalidations IIRC. This is the
number one reason for merging the already pushed commits.

Marek

On Sat., Apr. 25, 2020, 11:03 Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:

Was that patch set actually merged upstream? My last status
is that we couldn't find a reason why we need to do this in
the kernel.

Christian.

On 25.04.20 at 10:52, Marek Olšák wrote:

This was missed.

Marek



Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed delete worker

2020-04-09 Thread Chunming Zhou

I think we can have both of your approaches.

Even if we switch to spin_trylock, I think it is OK to keep the 
cond_resched() Xinhui added in this patch. That gives urgent tasks more 
chance to use the CPU.



-David

On 2020/4/9 22:59, Christian König wrote:

Why do we break out of the loops when there are pending BOs to be released?


We do this anyway if we can't acquire the necessary locks. Freeing 
already deleted BOs is just a very lazy background work.



So it did not break anything with this patch I think.


Oh, the patch will certainly work. I'm just not sure if it's the ideal 
behavior.



https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026

This is another example of the usage of cond_resched.


Yes, and that is also a good example of what I mean here:

	if (!mutex_trylock(&slab_mutex))
		/* Give up. Setup the next iteration. */
		goto out;


If the function can't acquire the lock immediately it gives up and 
waits for the next iteration.


I think it would be better if we do this in TTM as well if we spend too 
much time cleaning up old BOs.


On the other hand you are right that cond_resched() has the advantage 
that we could spend more time on cleaning up old BOs if there is 
nothing else for the CPU to do.


Regards,
Christian.
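
The "give up and wait for the next iteration" pattern discussed above can be sketched in user space as follows. This is only an illustration under assumptions: struct reaper, reaper_run_once() and the budget parameter are hypothetical names, and a pthread mutex stands in for the kernel lock. It is not the TTM or slab code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a delayed-delete list: only a count of pending items. */
struct reaper {
	pthread_mutex_t lock;
	size_t pending;
};

/*
 * One pass of lazy background cleanup. Returns the number of items freed.
 * If the lock is contended we return immediately and rely on the next
 * periodic invocation, instead of sleeping or yielding inside the worker.
 * The budget bounds how long a single pass can run.
 */
static size_t reaper_run_once(struct reaper *r, size_t budget)
{
	size_t freed = 0;

	assert(r != NULL);

	if (pthread_mutex_trylock(&r->lock) != 0)
		return 0; /* Give up. The next iteration will retry. */

	while (r->pending > 0 && freed < budget) {
		r->pending--;
		freed++;
	}

	pthread_mutex_unlock(&r->lock);
	return freed;
}
```

The trade-off is the one raised in the thread: a bounded pass never monopolizes the CPU, while a cond_resched()-style loop could keep cleaning for as long as nothing else needs to run.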

On 09.04.20 at 16:24, Pan, Xinhui wrote:

https://elixir.bootlin.com/linux/latest/source/mm/slab.c#L4026

This is another example of the usage of cond_resched.

*From:* Pan, Xinhui 
*Sent:* Thursday, April 9, 2020 10:11:08 PM
*To:* Lucas Stach ; 
amd-gfx@lists.freedesktop.org ; 
Koenig, Christian 

*Cc:* dri-de...@lists.freedesktop.org 
*Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed 
delete worker
I think it doesn't matter if the work item schedules out. Even if we did 
not schedule out, the workqueue itself will schedule out later.

So I think this patch does not break anything.

*From:* Pan, Xinhui 
*Sent:* Thursday, April 9, 2020 10:07:09 PM
*To:* Lucas Stach ; 
amd-gfx@lists.freedesktop.org ; 
Koenig, Christian 

*Cc:* dri-de...@lists.freedesktop.org 
*Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed 
delete worker

Why do we break out of the loops when there are pending BOs to be released?

And I just checked the process_one_work. Right after the work item 
callback is called,  the workqueue itself will call cond_resched. So 
I think


*From:* Koenig, Christian 
*Sent:* Thursday, April 9, 2020 9:38:24 PM
*To:* Lucas Stach ; Pan, Xinhui 
; amd-gfx@lists.freedesktop.org 


*Cc:* dri-de...@lists.freedesktop.org 
*Subject:* Re: [PATCH] drm/ttm: Schedule out if possibe in bo delayed 
delete worker

On 09.04.20 at 15:25, Lucas Stach wrote:
> On Thursday, 09.04.2020, at 14:35 +0200, Christian König wrote:
>> On 09.04.20 at 03:31, xinhui pan wrote:
>>> The delayed delete list is per device which might be very huge. And in
>>> a heavy workload test, the list might always not be empty. That will
>>> trigger any RCU stall warnings or softlockups in non-preemptible kernels
>>> Lets do schedule out if possible in that case.
>> Mhm, I'm not sure if that is actually allowed. This is called from a
>> work item and those are not really supposed to be scheduled away.
> Huh? Workitems can schedule out just fine, otherwise they would be
> horribly broken when it comes to sleeping locks.

Let me refine the sentence: Work items are not really supposed to be
scheduled purposely. E.g. you shouldn't call schedule() or
cond_resched() like in the case here.

Getting scheduled away because we wait for a lock is of course perfectly
fine.

>   The workqueue code
> even has measures to keep the workqueues at the expected concurrency
> level by starting other workitems when one of them goes to sleep.

Yeah, and exactly that's what I would say we should avoid here :)

In other words work items can be scheduled away, but they should not if
not really necessary (e.g. waiting for a lock).

Otherwise as you said new threads for work item processing are started
up and I don't think we want that.

Just returning from the work item and waiting for the next cycle is most
likely the better option.

Regards,
Christian.

>
> 

Re: [PATCH] drm/amdgpu: resvert "disable bulk moves for now"

2019-09-12 Thread Chunming Zhou
RB on it to go ahead.

-David

On 2019/9/12 18:15, Christian König wrote:
> This reverts commit a213c2c7e235cfc0e0a161a558f7fdf2fb3a624a.
>
> The changes to fix this should have landed in 5.1.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 --
>   1 file changed, 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 48349e4f0701..fd3fbaa73fa3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -603,14 +603,12 @@ void amdgpu_vm_move_to_lru_tail(struct amdgpu_device 
> *adev,
>   struct ttm_bo_global *glob = adev->mman.bdev.glob;
>   struct amdgpu_vm_bo_base *bo_base;
>   
> -#if 0
>   if (vm->bulk_moveable) {
>   spin_lock(&glob->lru_lock);
>   ttm_bo_bulk_move_lru_tail(&vm->lru_bulk_move);
>   spin_unlock(&glob->lru_lock);
>   return;
>   }
> -#endif
>   
>   memset(&vm->lru_bulk_move, 0, sizeof(vm->lru_bulk_move));
>   

Re: [PATCH] drm/amdgpu: grab the id mgr lock while accessing passid_mapping

2019-09-10 Thread Chunming Zhou
Reviewed-by: Chunming Zhou 

On 2019/9/10 16:56, Christian König wrote:
> Ping!
>
> On 09.09.19 at 13:59, Christian König wrote:
>> Need to make sure that we are actually dropping the right fence.
>> Could be done with RCU as well, but too complicated for a fix.
>>
>> Signed-off-by: Christian König 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 +---
>>   1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index b285ab25146d..e11764164cbf 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -1036,10 +1036,8 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, 
>> struct amdgpu_job *job, bool need_
>>   id->oa_base != job->oa_base ||
>>   id->oa_size != job->oa_size);
>>   bool vm_flush_needed = job->vm_needs_flush;
>> -    bool pasid_mapping_needed = id->pasid != job->pasid ||
>> -    !id->pasid_mapping ||
>> -    !dma_fence_is_signaled(id->pasid_mapping);
>>   struct dma_fence *fence = NULL;
>> +    bool pasid_mapping_needed;
>>   unsigned patch_offset = 0;
>>   int r;
>>   @@ -1049,6 +1047,12 @@ int amdgpu_vm_flush(struct amdgpu_ring 
>> *ring, struct amdgpu_job *job, bool need_
>>   pasid_mapping_needed = true;
>>   }
>>   +    mutex_lock(&id_mgr->lock);
>> +    if (id->pasid != job->pasid || !id->pasid_mapping ||
>> +    !dma_fence_is_signaled(id->pasid_mapping))
>> +    pasid_mapping_needed = true;
>> +    mutex_unlock(&id_mgr->lock);
>> +
>>   gds_switch_needed &= !!ring->funcs->emit_gds_switch;
>>   vm_flush_needed &= !!ring->funcs->emit_vm_flush &&
>>   job->vm_pd_addr != AMDGPU_BO_INVALID_OFFSET;
>> @@ -1088,9 +1092,11 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, 
>> struct amdgpu_job *job, bool need_
>>   }
>>     if (pasid_mapping_needed) {
>> +    mutex_lock(&id_mgr->lock);
>>   id->pasid = job->pasid;
>>   dma_fence_put(id->pasid_mapping);
>>   id->pasid_mapping = dma_fence_get(fence);
>> +    mutex_unlock(&id_mgr->lock);
>>   }
>>   dma_fence_put(fence);
>

Re: [PATCH 3/3] drm/amdgpu: remove amdgpu_cs_try_evict

2019-09-03 Thread Chunming Zhou
Reviewed-by: Chunming Zhou  for series.

-David

On 2019/9/3 17:09, Christian König wrote:
> Trying to evict things from the current working set doesn't work that
> well anymore because of per VM BOs.
>
> Rely on reserving VRAM for page tables to avoid contention.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 71 +-
>   2 files changed, 1 insertion(+), 71 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index a236213f8e8e..d1995156733e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -478,7 +478,6 @@ struct amdgpu_cs_parser {
>   uint64_t bytes_moved_vis_threshold;
>   uint64_t bytes_moved;
>   uint64_t bytes_moved_vis;
> - struct amdgpu_bo_list_entry *evictable;
>   
>   /* user fence */
>   struct amdgpu_bo_list_entry uf_entry;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index fd95b586b590..03182d968d3d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -447,75 +447,12 @@ static int amdgpu_cs_bo_validate(struct 
> amdgpu_cs_parser *p,
>   return r;
>   }
>   
> -/* Last resort, try to evict something from the current working set */
> -static bool amdgpu_cs_try_evict(struct amdgpu_cs_parser *p,
> - struct amdgpu_bo *validated)
> -{
> - uint32_t domain = validated->allowed_domains;
> - struct ttm_operation_ctx ctx = { true, false };
> - int r;
> -
> - if (!p->evictable)
> - return false;
> -
> - for (;&p->evictable->tv.head != &p->validated;
> -  p->evictable = list_prev_entry(p->evictable, tv.head)) {
> -
> - struct amdgpu_bo_list_entry *candidate = p->evictable;
> - struct amdgpu_bo *bo = ttm_to_amdgpu_bo(candidate->tv.bo);
> - struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> - bool update_bytes_moved_vis;
> - uint32_t other;
> -
> - /* If we reached our current BO we can forget it */
> - if (bo == validated)
> - break;
> -
> - /* We can't move pinned BOs here */
> - if (bo->pin_count)
> - continue;
> -
> - other = amdgpu_mem_type_to_domain(bo->tbo.mem.mem_type);
> -
> - /* Check if this BO is in one of the domains we need space for */
> - if (!(other & domain))
> - continue;
> -
> - /* Check if we can move this BO somewhere else */
> - other = bo->allowed_domains & ~domain;
> - if (!other)
> - continue;
> -
> - /* Good we can try to move this BO somewhere else */
> - update_bytes_moved_vis =
> - !amdgpu_gmc_vram_full_visible(&adev->gmc) &&
> - amdgpu_bo_in_cpu_visible_vram(bo);
> - amdgpu_bo_placement_from_domain(bo, other);
> - r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
> - p->bytes_moved += ctx.bytes_moved;
> - if (update_bytes_moved_vis)
> - p->bytes_moved_vis += ctx.bytes_moved;
> -
> - if (unlikely(r))
> - break;
> -
> - p->evictable = list_prev_entry(p->evictable, tv.head);
> - list_move(&candidate->tv.head, &p->validated);
> -
> - return true;
> - }
> -
> - return false;
> -}
> -
>   static int amdgpu_cs_validate(void *param, struct amdgpu_bo *bo)
>   {
>   struct amdgpu_cs_parser *p = param;
>   int r;
>   
> - do {
> - r = amdgpu_cs_bo_validate(p, bo);
> - } while (r == -ENOMEM && amdgpu_cs_try_evict(p, bo));
> + r = amdgpu_cs_bo_validate(p, bo);
>   if (r)
>   return r;
>   
> @@ -554,9 +491,6 @@ static int amdgpu_cs_list_validate(struct 
> amdgpu_cs_parser *p,
>   binding_userptr = true;
>   }
>   
> - if (p->evictable == lobj)
> - p->evictable = NULL;
> -
>   r = amdgpu_cs_validate(p, bo);
>   if (r)
>   return r;
> @@ -659,9 +593,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser 
> *p,
>   

Re: [PATCH] drm/amdgpu: fix dma_fence_wait without reference

2019-08-16 Thread Chunming Zhou
Reviewed-by: Chunming Zhou 

On 2019/8/16 21:21, Christian König wrote:
> We need to grab a reference to the fence we wait for.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 27 ++---
>   1 file changed, 15 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index f539a2a92774..7398b4850649 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -534,21 +534,24 @@ int amdgpu_ctx_wait_prev_fence(struct amdgpu_ctx *ctx,
>  struct drm_sched_entity *entity)
>   {
>   struct amdgpu_ctx_entity *centity = to_amdgpu_ctx_entity(entity);
> - unsigned idx = centity->sequence & (amdgpu_sched_jobs - 1);
> - struct dma_fence *other = centity->fences[idx];
> + struct dma_fence *other;
> + unsigned idx;
> + long r;
>   
> - if (other) {
> - signed long r;
> - r = dma_fence_wait(other, true);
> - if (r < 0) {
> - if (r != -ERESTARTSYS)
> - DRM_ERROR("Error (%ld) waiting for fence!\n", 
> r);
> + spin_lock(&ctx->ring_lock);
> + idx = centity->sequence & (amdgpu_sched_jobs - 1);
> + other = dma_fence_get(centity->fences[idx]);
> + spin_unlock(&ctx->ring_lock);
>   
> - return r;
> - }
> - }
> + if (!other)
> + return 0;
>   
> - return 0;
> + r = dma_fence_wait(other, true);
> + if (r < 0 && r != -ERESTARTSYS)
> + DRM_ERROR("Error (%ld) waiting for fence!\n", r);
> +
> + dma_fence_put(other);
> + return r;
>   }
>   
>   void amdgpu_ctx_mgr_init(struct amdgpu_ctx_mgr *mgr)
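
The idea of the fix above (take a reference while holding the lock, wait only after dropping the lock, then release the reference) can be sketched in user space like this. All names here (struct fence, struct slot_table, table_get_prev) are hypothetical stand-ins and not the dma_fence API; an atomic counter plays the role of the kernel refcount.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a fence. */
struct fence {
	atomic_int refcount;
	int signaled;
};

/* Take an extra reference; tolerates NULL, returns its argument. */
static struct fence *fence_get(struct fence *f)
{
	if (f)
		atomic_fetch_add(&f->refcount, 1);
	return f;
}

/* Drop a reference; a real implementation would free the object at zero. */
static void fence_put(struct fence *f)
{
	if (f)
		atomic_fetch_sub(&f->refcount, 1);
}

/* Hypothetical table of fences indexed by a sequence number. */
struct slot_table {
	pthread_mutex_t lock;
	struct fence *slots[4];
	unsigned sequence;
};

/*
 * Look up the fence for the current sequence and return it with an extra
 * reference held, so the caller can wait on it without holding the lock
 * and without the slot's owner freeing it underneath the waiter.
 */
static struct fence *table_get_prev(struct slot_table *t)
{
	struct fence *f;

	assert(t != NULL);

	pthread_mutex_lock(&t->lock);
	f = fence_get(t->slots[t->sequence & 3]);
	pthread_mutex_unlock(&t->lock);
	return f;
}
```

The caller is expected to pair every successful table_get_prev() with one fence_put() once it has finished waiting.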

Re: [PATCH] drm/amdgpu: fix a potential information leaking bug

2019-07-27 Thread Chunming Zhou

On 2019/7/27 17:30, Wang Xiayang wrote:
> Coccinelle reports a path that the array "data" is never initialized.
> The path skips the checks in the conditional branches when either
> of callback functions, read_wave_vgprs and read_wave_sgprs, is not
> registered. Later, the uninitialized "data" array is read
> in the while-loop below and passed to put_user().
>
> Fix the path by allocating the array with kcalloc().
>
> The patch is simpler than adding a fall-back branch that explicitly
> calls memset(data, 0, ...). Also it does not need the multiplication
> 1024*sizeof(*data) as the size parameter for memset() though there is
> no risk of integer overflow.
>
> Signed-off-by: Wang Xiayang 

Reviewed-by: Chunming Zhou 

-David

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index 6d54decef7f8..5652cc72ed3a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -707,7 +707,7 @@ static ssize_t amdgpu_debugfs_gpr_read(struct file *f, 
> char __user *buf,
>   thread = (*pos & GENMASK_ULL(59, 52)) >> 52;
>   bank = (*pos & GENMASK_ULL(61, 60)) >> 60;
>   
> - data = kmalloc_array(1024, sizeof(*data), GFP_KERNEL);
> + data = kcalloc(1024, sizeof(*data), GFP_KERNEL);
>   if (!data)
>   return -ENOMEM;
>   
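
A user-space analogue of this fix, as a minimal sketch: calloc() plays the role of kcalloc(), and read_regs_fn and dump_regs() are hypothetical names. The point is that when the optional callback is missing, the caller sees zeros instead of stale heap contents.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical optional callback that fills a register dump. */
typedef void (*read_regs_fn)(uint32_t *dst, size_t n);

/*
 * Copy n register values into out. The scratch buffer is zero-initialized
 * by calloc(), so if no reader is registered the caller receives zeros
 * rather than uninitialized memory. Returns n on success, 0 on failure.
 */
static size_t dump_regs(read_regs_fn reader, uint32_t *out, size_t n)
{
	uint32_t *data;

	assert(out != NULL);

	data = calloc(n, sizeof(*data)); /* also checks n * size for overflow */
	if (!data)
		return 0;

	if (reader)
		reader(data, n);

	memcpy(out, data, n * sizeof(*data));
	free(data);
	return n;
}
```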

Re: Intermittent errors when using amdgpu_job_submit_direct

2019-07-10 Thread Chunming Zhou

On 2019/7/10 3:26, Kuehling, Felix wrote:
> On 2019-07-09 8:58 a.m., Zhou, David(ChunMing) wrote:
>> I raised this when Christian wrote the page fault patch; in that patch,
>> amdgpu_job_submit_direct uses an exclusive page fault ring for that.
>>
>> But if you use amdgpu_job_submit_direct for general rings occupied by
>> the scheduler, I guess various bugs will happen.
> The problem is, even the paging ring is used by the scheduler. There are
> several places where buffer operations are submitted to the paging ring
> through the scheduler. That makes any use of the paging ring through
> direct submission problematic.
>
> Even ignoring the scheduler, if it's possible that multiple threads
> submit to the paging ring, we'll need locking to ensure that the
> contents of the ring remain consistent. IIRC, the rings used to have
> locking before we had a GPU scheduler. For comparison, see
> radeon_ring.c, which still has locking. With the GPU scheduler, the
> rings became single-producer queues that no longer needed locking. But
> with direct submission that is no longer true. I think a good place to
> do that locking now would be in amdgpu_ib_schedule.

Yes, that is exactly the reason why we removed the ring lock at that time.

You can add it back when submit_direct co-exists with the scheduler.

-David

>
> Regards,
>     Felix
>
>
>> -David
>>
>> On 2019/7/9 12:53, Kuehling, Felix wrote:
>>> I'm seeing some weird intermittent bugs (vm faults, hangs, etc) when
>>> trying to use amdgpu_job_submit_direct. I'm wondering if there is a
>>> possibility of a race condition, when a submit_direct and a GPU
>>> scheduler thread try to submit to the same ring at the same time. I
>>> didn't see any locking to allow multiple threads safely submitting to
>>> the same ring.
>>>
>>> Am I missing something?
>>>
>>> Thanks,
>>>   Felix
>>>
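
The locking Felix describes for a ring with more than one producer can be sketched in user space as below. This is an assumption-laden illustration, not amdgpu code: struct ring, RING_SIZE and ring_submit() are hypothetical, and the consumer side (read pointer, free-space check) is left out on purpose.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical ring size in words */

/* Hypothetical command ring shared by several submitting threads. */
struct ring {
	pthread_mutex_t lock;
	uint32_t buf[RING_SIZE];
	unsigned wptr;
};

/*
 * Write one submission of n words. The lock is held for the whole
 * submission, so words from two producers are never interleaved and
 * the write pointer is only advanced once the submission is complete.
 * Returns 0 on success, -1 if the submission cannot fit in the ring.
 */
static int ring_submit(struct ring *r, const uint32_t *words, unsigned n)
{
	unsigned i;

	assert(r != NULL && words != NULL);

	if (n > RING_SIZE)
		return -1;

	pthread_mutex_lock(&r->lock);
	for (i = 0; i < n; i++)
		r->buf[(r->wptr + i) % RING_SIZE] = words[i];
	r->wptr = (r->wptr + n) % RING_SIZE;
	pthread_mutex_unlock(&r->lock);
	return 0;
}
```

With a single producer, as with the GPU scheduler alone, the lock is unnecessary, which matches the history described in the thread.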

Re: [PATCH] drm/ttm: Fix the memory delay free issue

2019-07-10 Thread Chunming Zhou
It doesn't make sense that a BO being freed still uses the per-VM resv.

As I remember, when a BO is on the release list, its resv is a copy of 
the per-VM resv. Could you check it?

-David

On 2019/7/10 17:29, Emily Deng wrote:
> For Vulkan CTS allocation test cases, they will create a series of BOs and
> then free them. As there are lots of allocation test cases with the same VM,
> and the per-VM BO feature is enabled, all of those BOs' resv are the same.
> But the BO free is quite slow: as they use the same resv object, every time
> a BO is freed, it checks whether the resv has signaled, and frees the BO
> only if it has. But as the test cases continue to create BOs, the resv
> fence keeps increasing. So freeing is slower than creating, which causes
> memory exhaustion.
>
> Method:
> When the resv signals, release all the BOs which use the same
> resv object.
>
> Signed-off-by: Emily Deng 
> ---
>   drivers/gpu/drm/ttm/ttm_bo.c | 29 -
>   1 file changed, 24 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index f9a3d4c..57ec59b 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -543,6 +543,7 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object 
> *bo,
>   {
>   struct ttm_bo_global *glob = bo->bdev->glob;
>   struct reservation_object *resv;
> + struct ttm_buffer_object *resv_bo, *resv_bo_next;
>   int ret;
>   
>   if (unlikely(list_empty(&bo->ddestroy)))
> @@ -566,10 +567,14 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object 
> *bo,
>  interruptible,
>  30 * HZ);
>   
> - if (lret < 0)
> + if (lret < 0) {
> + kref_put(&bo->list_kref, ttm_bo_release_list);
>   return lret;
> - else if (lret == 0)
> + }
> + else if (lret == 0) {
> + kref_put(&bo->list_kref, ttm_bo_release_list);
>   return -EBUSY;
> + }
>   
>   spin_lock(&glob->lru_lock);
>   if (unlock_resv && !kcl_reservation_object_trylock(bo->resv)) {
> @@ -582,6 +587,7 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object 
> *bo,
>* here.
>*/
>   spin_unlock(&glob->lru_lock);
> + kref_put(&bo->list_kref, ttm_bo_release_list);
>   return 0;
>   }
>   ret = 0;
> @@ -591,15 +597,29 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object 
> *bo,
>   if (unlock_resv)
>   kcl_reservation_object_unlock(bo->resv);
>   spin_unlock(&glob->lru_lock);
> + kref_put(&bo->list_kref, ttm_bo_release_list);
>   return ret;
>   }
>   
>   ttm_bo_del_from_lru(bo);
>   list_del_init(&bo->ddestroy);
>   kref_put(&bo->list_kref, ttm_bo_ref_bug);
> -
>   spin_unlock(&glob->lru_lock);
>   ttm_bo_cleanup_memtype_use(bo);
> + kref_put(&bo->list_kref, ttm_bo_release_list);
> +
> + spin_lock(&glob->lru_lock);
> + list_for_each_entry_safe(resv_bo, resv_bo_next, &bo->bdev->ddestroy, 
> ddestroy) {
> + if (resv_bo->resv == bo->resv) {
> + ttm_bo_del_from_lru(resv_bo);
> + list_del_init(&resv_bo->ddestroy);
> + spin_unlock(&glob->lru_lock);
> + ttm_bo_cleanup_memtype_use(resv_bo);
> + kref_put(&resv_bo->list_kref, ttm_bo_release_list);
> + spin_lock(&glob->lru_lock);
> + }
> + }
> + spin_unlock(&glob->lru_lock);
>   
>   if (unlock_resv)
>   kcl_reservation_object_unlock(bo->resv);
> @@ -639,9 +659,8 @@ static bool ttm_bo_delayed_delete(struct ttm_bo_device 
> *bdev, bool remove_all)
>   ttm_bo_cleanup_refs(bo, false, !remove_all, true);
>   } else {
>   spin_unlock(&glob->lru_lock);
> + kref_put(&bo->list_kref, ttm_bo_release_list);
>   }
> -
> - kref_put(&bo->list_kref, ttm_bo_release_list);
>   spin_lock(&glob->lru_lock);
>   }
>   list_splice_tail(&removed, &bdev->ddestroy);

Re: Intermittent errors when using amdgpu_job_submit_direct

2019-07-09 Thread Chunming Zhou
I raised this when Christian wrote the page fault patch; in that patch, 
amdgpu_job_submit_direct uses an exclusive page fault ring for that.

But if you use amdgpu_job_submit_direct for general rings occupied by 
the scheduler, I guess various bugs will happen.

-David

On 2019/7/9 12:53, Kuehling, Felix wrote:
> I'm seeing some weird intermittent bugs (vm faults, hangs, etc) when
> trying to use amdgpu_job_submit_direct. I'm wondering if there is a
> possibility of a race condition, when a submit_direct and a GPU
> scheduler thread try to submit to the same ring at the same time. I
> didn't see any locking to allow multiple threads safely submitting to
> the same ring.
>
> Am I missing something?
>
> Thanks,
>     Felix
>

Re: [PATCH 1/5] drm/amdgpu: allow direct submission in the VM backends

2019-06-28 Thread Chunming Zhou

On 2019/6/28 20:18, Christian König wrote:
> This allows us to update page tables directly while in a page fault.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h  |  5 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c  |  4 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 29 +
>   3 files changed, 27 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 489a162ca620..5941accea061 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -197,6 +197,11 @@ struct amdgpu_vm_update_params {
>*/
>   struct amdgpu_vm *vm;
>   
> + /**
> +  * @direct: if changes should be made directly
> +  */
> + bool direct;
> +
>   /**
>* @pages_addr:
>*
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> index 5222d165abfc..f94e4896079c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> @@ -49,6 +49,10 @@ static int amdgpu_vm_cpu_prepare(struct 
> amdgpu_vm_update_params *p, void *owner,
>   {
>   int r;
>   
> + /* Don't wait for anything during page fault */
> + if (p->direct)
> + return 0;
> +
>   /* Wait for PT BOs to be idle. PTs share the same resv. object
>* as the root PD BO
>*/
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> index ddd181f5ed37..891d597063cb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> @@ -68,17 +68,17 @@ static int amdgpu_vm_sdma_prepare(struct 
> amdgpu_vm_update_params *p,
>   if (r)
>   return r;
>   
> - r = amdgpu_sync_fence(p->adev, &p->job->sync, exclusive, false);
> - if (r)
> - return r;
> + p->num_dw_left = ndw;
> +
> + if (p->direct)
> + return 0;
>   
> - r = amdgpu_sync_resv(p->adev, &p->job->sync, root->tbo.resv,
> -  owner, false);
> + r = amdgpu_sync_fence(p->adev, &p->job->sync, exclusive, false);
>   if (r)
>   return r;
>   
> - p->num_dw_left = ndw;
> - return 0;
> + return amdgpu_sync_resv(p->adev, &p->job->sync, root->tbo.resv,
> + owner, false);
>   }
>   
>   /**
> @@ -99,13 +99,21 @@ static int amdgpu_vm_sdma_commit(struct 
> amdgpu_vm_update_params *p,
>   struct dma_fence *f;
>   int r;
>   
> - ring = container_of(p->vm->entity.rq->sched, struct amdgpu_ring, sched);
> + if (p->direct)
> + ring = p->adev->vm_manager.page_fault;
> + else
> + ring = container_of(p->vm->entity.rq->sched,
> + struct amdgpu_ring, sched);
>   
>   WARN_ON(ib->length_dw == 0);
>   amdgpu_ring_pad_ib(ring, ib);
>   WARN_ON(ib->length_dw > p->num_dw_left);
> - r = amdgpu_job_submit(p->job, &p->vm->entity,
> -   AMDGPU_FENCE_OWNER_VM, &f);
> +
> + if (p->direct)
> + r = amdgpu_job_submit_direct(p->job, ring, &f);

When we use direct submission after initialization, we need to take care 
of the ring race condition, don't we? Am I missing anything?


-David

> + else
> + r = amdgpu_job_submit(p->job, &p->vm->entity,
> +   AMDGPU_FENCE_OWNER_VM, &f);
>   if (r)
>   goto error;
>   
> @@ -120,7 +128,6 @@ static int amdgpu_vm_sdma_commit(struct 
> amdgpu_vm_update_params *p,
>   return r;
>   }
>   
> -
>   /**
>* amdgpu_vm_sdma_copy_ptes - copy the PTEs from mapping
>*

Re: [PATCH 1/2] drm/amdgpu: fix transform feedback GDS hang on gfx10

2019-06-20 Thread Chunming Zhou
Please take care of the .emit_ib_size member; otherwise it looks OK to me.

-David

On 2019/6/20 8:02, Marek Olšák wrote:
> From: Marek Olšák 
>
> Signed-off-by: Marek Olšák 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h |  3 ++-
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 12 ++--
>   2 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
> index dad2186f4ed5..df8a23554831 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gds.h
> @@ -24,21 +24,22 @@
>   #ifndef __AMDGPU_GDS_H__
>   #define __AMDGPU_GDS_H__
>   
>   struct amdgpu_ring;
>   struct amdgpu_bo;
>   
>   struct amdgpu_gds {
>   uint32_t gds_size;
>   uint32_t gws_size;
>   uint32_t oa_size;
> - uint32_t gds_compute_max_wave_id;
> + uint32_t gds_compute_max_wave_id;
> + uint32_t vgt_gs_max_wave_id;
>   };
>   
>   struct amdgpu_gds_reg_offset {
>   uint32_t mem_base;
>   uint32_t mem_size;
>   uint32_t gws;
>   uint32_t oa;
>   };
>   
>   #endif /* __AMDGPU_GDS_H__ */
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 0090cba2d24d..75a34779a57c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -4213,20 +4213,29 @@ static void gfx_v10_0_ring_emit_hdp_flush(struct 
> amdgpu_ring *ring)
>   }
>   
>   static void gfx_v10_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
>  struct amdgpu_job *job,
>  struct amdgpu_ib *ib,
>  uint32_t flags)
>   {
>   unsigned vmid = AMDGPU_JOB_GET_VMID(job);
>   u32 header, control = 0;
>   
> + /* Prevent a hw deadlock due to a wave ID mismatch between ME and GDS.
> +  * This resets the wave ID counters. (needed by transform feedback)
> +  * TODO: This might only be needed on a VMID switch when we change
> +  *   the GDS OA mapping, not sure.
> +  */
> + amdgpu_ring_write(ring, PACKET3(PACKET3_SET_CONFIG_REG, 1));
> + amdgpu_ring_write(ring, mmVGT_GS_MAX_WAVE_ID);
> + amdgpu_ring_write(ring, ring->adev->gds.vgt_gs_max_wave_id);
> +
>   if (ib->flags & AMDGPU_IB_FLAG_CE)
>   header = PACKET3(PACKET3_INDIRECT_BUFFER_CNST, 2);
>   else
>   header = PACKET3(PACKET3_INDIRECT_BUFFER, 2);
>   
>   control |= ib->length_dw | (vmid << 24);
>   
>   if (amdgpu_mcbp && (ib->flags & AMDGPU_IB_FLAG_PREEMPT)) {
>   control |= INDIRECT_BUFFER_PRE_ENB(1);
>   
> @@ -5094,24 +5103,23 @@ static void gfx_v10_0_set_rlc_funcs(struct 
> amdgpu_device *adev)
>   default:
>   break;
>   }
>   }
>   
>   static void gfx_v10_0_set_gds_init(struct amdgpu_device *adev)
>   {
>   /* init asic gds info */
>   switch (adev->asic_type) {
>   case CHIP_NAVI10:
> - adev->gds.gds_size = 0x1;
> - break;
>   default:
>   adev->gds.gds_size = 0x1;
> + adev->gds.vgt_gs_max_wave_id = 0x3ff;
>   break;
>   }
>   
>   adev->gds.gws_size = 64;
>   adev->gds.oa_size = 16;
>   }
>   
>   static void gfx_v10_0_set_user_wgp_inactive_bitmap_per_sh(struct 
> amdgpu_device *adev,
> u32 bitmap)
>   {
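
The .emit_ib_size remark above can be illustrated with a small user-space
sketch. It only counts dwords: the quoted patch adds three amdgpu_ring_write()
calls to the IB emit path, so the space reserved per IB has to grow by three.
Everything named toy_* is hypothetical, the header and register values are
placeholders rather than real PM4 encodings, and the assumption that the
existing IB packet takes four dwords (header plus three payload dwords) is
mine, since the quoted hunk is cut off before the end of the function.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_DW 16u   /* hypothetical ring size in dwords */

struct toy_ring {
    uint32_t buf[TOY_RING_DW];
    unsigned count;              /* dwords written so far */
};

static void toy_ring_write(struct toy_ring *ring, uint32_t v)
{
    assert(ring->count < TOY_RING_DW);
    ring->buf[ring->count++] = v;
}

/* Returns how many dwords one IB emission consumed on the ring. */
static unsigned toy_emit_ib(struct toy_ring *ring, uint32_t max_wave_id,
                            uint32_t ib_lo, uint32_t ib_hi, uint32_t control)
{
    unsigned start = ring->count;

    /* Added by the patch: packet header, register, value. */
    toy_ring_write(ring, 0x1u);          /* placeholder header */
    toy_ring_write(ring, 0x2u);          /* placeholder register offset */
    toy_ring_write(ring, max_wave_id);

    /* Assumed pre-existing packet: header plus three payload dwords. */
    toy_ring_write(ring, 0x3u);          /* placeholder header */
    toy_ring_write(ring, ib_lo);
    toy_ring_write(ring, ib_hi);
    toy_ring_write(ring, control);

    return ring->count - start;
}
```

Under these assumptions one emission grows from four to seven dwords, which is
the kind of change the reserved size has to follow.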

[PATCH] drm/amdgpu: add DRIVER_SYNCOBJ_TIMELINE to amdgpu

2019-05-27 Thread Chunming Zhou
Change-Id: I2b1af1478fbddbb5084b90b3ff85c2eb964bd217
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 78706dfa753a..1f38d6fc1fe3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1307,7 +1307,8 @@ static struct drm_driver kms_driver = {
.driver_features =
DRIVER_USE_AGP | DRIVER_ATOMIC |
DRIVER_GEM |
-   DRIVER_PRIME | DRIVER_RENDER | DRIVER_MODESET | DRIVER_SYNCOBJ,
+   DRIVER_PRIME | DRIVER_RENDER | DRIVER_MODESET | DRIVER_SYNCOBJ |
+   DRIVER_SYNCOBJ_TIMELINE,
.load = amdgpu_driver_load_kms,
.open = amdgpu_driver_open_kms,
.postclose = amdgpu_driver_postclose_kms,
-- 
2.17.1


Re: [PATCH 06/10] drm/ttm: fix busy memory to fail other user v10

2019-05-23 Thread Chunming Zhou

On 2019/5/23 19:03, Christian König wrote:
> On 23.05.19 at 12:24, zhoucm1 wrote:
>>
>>
>> On 2019-05-22 20:59, Christian König wrote:
>>> BOs on the LRU might be blocked during command submission
>>> and cause OOM situations.
>>>
>>> Avoid this by blocking for the first busy BO not locked by
>>> the same ticket as the BO we are searching space for.
>>>
>>> v10: completely start over with the patch since we didn't
>>>   handle a whole bunch of corner cases.
>>>
>>> Signed-off-by: Christian König 
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_bo.c | 77 
>>> ++--
>>>   1 file changed, 66 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c 
>>> b/drivers/gpu/drm/ttm/ttm_bo.c
>>> index 4c6389d849ed..861facac33d4 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>> @@ -771,32 +771,72 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
>>>    * b. Otherwise, trylock it.
>>>    */
>>>   static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object
>>> *bo,
>>> -   struct ttm_operation_ctx *ctx, bool *locked)
>>> +   struct ttm_operation_ctx *ctx, bool *locked,
>>> bool *busy)
>>>   {
>>>  bool ret = false;
>>>
>>> -   *locked = false;
>>>  if (bo->resv == ctx->resv) {
>>>  reservation_object_assert_held(bo->resv);
>>>  if (ctx->flags & TTM_OPT_FLAG_ALLOW_RES_EVICT
>>>  || !list_empty(&bo->ddestroy))
>>>  ret = true;
>>> +   *locked = false;
>>> +   if (busy)
>>> +   *busy = false;
>>>  } else {
>>> -   *locked = reservation_object_trylock(bo->resv);
>>> -   ret = *locked;
>>> +   ret = reservation_object_trylock(bo->resv);
>>> +   *locked = ret;
>>> +   if (busy)
>>> +   *busy = !ret;
>>>  }
>>>
>>>  return ret;
>>>   }
>>>
>>> +/**
>>> + * ttm_mem_evict_wait_busy - wait for a busy BO to become available
>>> + *
>>> + * @busy_bo: BO which couldn't be locked with trylock
>>> + * @ctx: operation context
>>> + * @ticket: acquire ticket
>>> + *
>>> + * Try to lock a busy buffer object to avoid failing eviction.
>>> + */
>>> +static int ttm_mem_evict_wait_busy(struct ttm_buffer_object *busy_bo,
>>> +  struct ttm_operation_ctx *ctx,
>>> +  struct ww_acquire_ctx *ticket)
>>> +{
>>> +   int r;
>>> +
>>> +   if (!busy_bo || !ticket)
>>> +   return -EBUSY;
>>> +
>>> +   if (ctx->interruptible)
>>> +   r = 
>>> reservation_object_lock_interruptible(busy_bo->resv,
>>> + ticket);
>>> +   else
>>> +   r = reservation_object_lock(busy_bo->resv, ticket);
>>> +
>>> +   /*
>>> +    * TODO: It would be better to keep the BO locked until
>>> allocation is at
>>> +    * least tried one more time, but that would mean a much
>>> larger rework
>>> +    * of TTM.
>>> +    */
>>> +   if (!r)
>>> +   reservation_object_unlock(busy_bo->resv);
>>> +
>>> +   return r == -EDEADLK ? -EAGAIN : r;
>>> +}
>>> +
>>>   static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
>>>     uint32_t mem_type,
>>>     const struct ttm_place *place,
>>> -  struct ttm_operation_ctx *ctx)
>>> +  struct ttm_operation_ctx *ctx,
>>> +  struct ww_acquire_ctx *ticket)
>>>   {
>>> +   struct ttm_buffer_object *bo = NULL, *busy_bo = NULL;
>>>  struct ttm_bo_global *glob = bdev->glob;
>>>  struct ttm_mem_type_manager *man = &bdev->man[mem_type];
>>> -   struct ttm_buffer_object *bo = NULL;
>>>  bool locked = false;
>>>  unsigned i;
>>>  int ret;
>>> @@ -804,8 +844,15 @@ static int ttm_mem_evict_first(struct
>>> ttm_bo_device *bdev,
>>>  spin_lock(&glob->lru_lock);
>>>  for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
>>>  list_for_each_entry(bo, &man->lru[i], lru) {
>>> -   if (!ttm_bo_evict_swapout_allowable(bo, ctx,
>>> &locked))
>>> +   bool busy;
>>> +
>>> +   if (!ttm_bo_evict_swapout_allowable(bo, ctx,
>>> &locked,
>>> + &busy)) {
>>> +   if (busy && !busy_bo &&
>>> +   bo->resv->lock.ctx != ticket)
>>> +   busy_bo = bo;
>>>  continue;
>>> +   }
>>>
>>>  if (place &&
>>> !bdev->driver->eviction_valuable(bo,
>>> place)) {
>>> @@ -824,8 +871,13 @@ static int ttm_mem_evict_first(struct
>>> ttm_bo_device *bdev,
>>>  }
>>>
>>>  if (!bo) {
>>> +   if 

[PATCH libdrm 3/7] wrap syncobj timeline query/wait APIs for amdgpu v3

2019-05-13 Thread Chunming Zhou
v2: symbols are stored in lexical order.
v3: drop export/import and extra query indirection

Signed-off-by: Chunming Zhou 
Acked-by: Christian König 
---
 amdgpu/amdgpu-symbol-check |  2 ++
 amdgpu/amdgpu.h| 39 ++
 amdgpu/amdgpu_cs.c | 23 ++
 3 files changed, 64 insertions(+)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index 4d806922..d3c5bb89 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -53,8 +53,10 @@ amdgpu_cs_submit_raw
 amdgpu_cs_submit_raw2
 amdgpu_cs_syncobj_export_sync_file
 amdgpu_cs_syncobj_import_sync_file
+amdgpu_cs_syncobj_query
 amdgpu_cs_syncobj_reset
 amdgpu_cs_syncobj_signal
+amdgpu_cs_syncobj_timeline_wait
 amdgpu_cs_syncobj_wait
 amdgpu_cs_wait_fences
 amdgpu_cs_wait_semaphore
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index c44a495a..5ebfe1e3 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1536,6 +1536,45 @@ int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,
   int64_t timeout_nsec, unsigned flags,
   uint32_t *first_signaled);
 
+/**
+ *  Wait for one or all sync objects on their points to signal.
+ *
+ * \param   dev- \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   points - \c [in] array of sync points to wait
+ * \param   num_handles - \c [in] self-explanatory
+ * \param   timeout_nsec - \c [in] self-explanatory
+ * \param   flags   - \c [in] a bitmask of DRM_SYNCOBJ_WAIT_FLAGS_*
+ * \param   first_signaled - \c [in] self-explanatory
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle dev,
+   uint32_t *handles, uint64_t *points,
+   unsigned num_handles,
+   int64_t timeout_nsec, unsigned flags,
+   uint32_t *first_signaled);
+/**
+ *  Query sync objects payloads.
+ *
+ * \param   dev- \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   points - \c [out] array of sync points returned, which represent
+ * the syncobj payloads.
+ * \param   num_handles - \c [in] self-explanatory
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
+   uint32_t *handles, uint64_t *points,
+   unsigned num_handles);
+
 /**
  *  Export kernel sync object to shareable fd.
  *
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 7c5b9d13..9fcaf2c4 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -686,6 +686,29 @@ drm_public int amdgpu_cs_syncobj_wait(amdgpu_device_handle 
dev,
  flags, first_signaled);
 }
 
+drm_public int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle dev,
+  uint32_t *handles, uint64_t 
*points,
+  unsigned num_handles,
+  int64_t timeout_nsec, unsigned 
flags,
+  uint32_t *first_signaled)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjTimelineWait(dev->fd, handles, points, num_handles,
+ timeout_nsec, flags, first_signaled);
+}
+
+drm_public int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
+  uint32_t *handles, uint64_t *points,
+  unsigned num_handles)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjQuery(dev->fd, handles, points, num_handles);
+}
+
 drm_public int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
uint32_t handle,
int *shared_fd)
-- 
2.17.1


[PATCH libdrm 2/7] add timeline wait/query ioctl v2

2019-05-13 Thread Chunming Zhou
v2: drop export/import

Signed-off-by: Chunming Zhou 
---
 xf86drm.c | 44 
 xf86drm.h |  6 ++
 2 files changed, 50 insertions(+)

diff --git a/xf86drm.c b/xf86drm.c
index 2c19376b..17e3d880 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -4256,3 +4256,47 @@ drm_public int drmSyncobjSignal(int fd, const uint32_t 
*handles,
 ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_SIGNAL, &args);
 return ret;
 }
+
+drm_public int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t 
*points,
+ unsigned num_handles,
+ int64_t timeout_nsec, unsigned flags,
+ uint32_t *first_signaled)
+{
+struct drm_syncobj_timeline_wait args;
+int ret;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uint64_t)(uintptr_t)points;
+args.timeout_nsec = timeout_nsec;
+args.count_handles = num_handles;
+args.flags = flags;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, &args);
+if (ret < 0)
+return -errno;
+
+if (first_signaled)
+*first_signaled = args.first_signaled;
+return ret;
+}
+
+
+drm_public int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
+  uint32_t handle_count)
+{
+struct drm_syncobj_timeline_array args;
+int ret;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uint64_t)(uintptr_t)points;
+args.count_handles = handle_count;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_QUERY, &args);
+if (ret)
+return ret;
+return 0;
+}
+
+
diff --git a/xf86drm.h b/xf86drm.h
index 887ecc76..60c7a84f 100644
--- a/xf86drm.h
+++ b/xf86drm.h
@@ -876,6 +876,12 @@ extern int drmSyncobjWait(int fd, uint32_t *handles, 
unsigned num_handles,
  uint32_t *first_signaled);
 extern int drmSyncobjReset(int fd, const uint32_t *handles, uint32_t 
handle_count);
 extern int drmSyncobjSignal(int fd, const uint32_t *handles, uint32_t 
handle_count);
+extern int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t *points,
+ unsigned num_handles,
+ int64_t timeout_nsec, unsigned flags,
+ uint32_t *first_signaled);
+extern int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
+  uint32_t handle_count);
 
 #if defined(__cplusplus)
 }
-- 
2.17.1
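
For readers new to timeline syncobjs, the wait/query pair above can be pictured
with a small user-space model. This is only a sketch under my own assumptions,
not libdrm or kernel code: each syncobj carries a 64-bit payload (what a query
would return), and a wait on point p is treated as satisfied once the payload
has reached p. toy_timeline_ready and TOY_WAIT_ALL are hypothetical names, the
latter modeled on a "wait for all" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_WAIT_ALL 1u   /* hypothetical flag: every point must be signaled */

/*
 * Check whether a timeline wait would already be satisfied.
 * payloads[i] models the current payload of handle i, points[i] the
 * point the caller waits for. If first_signaled is non-NULL it receives
 * the index of the first handle whose point has been reached.
 */
static bool toy_timeline_ready(const uint64_t *payloads,
                               const uint64_t *points, unsigned num,
                               unsigned flags, uint32_t *first_signaled)
{
    bool any = false, all = true;
    unsigned i;

    assert(payloads != NULL && points != NULL);

    for (i = 0; i < num; i++) {
        bool signaled = payloads[i] >= points[i];

        if (signaled && !any) {
            any = true;
            if (first_signaled)
                *first_signaled = i;
        }
        if (!signaled)
            all = false;
    }

    return (flags & TOY_WAIT_ALL) ? all : any;
}
```

A real caller would block in the wait ioctl instead of polling; the model only
shows how points, payloads and the flags relate to each other.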


[PATCH libdrm 7/7] add syncobj timeline tests v3

2019-05-13 Thread Chunming Zhou
v2: drop DRM_SYNCOBJ_CREATE_TYPE_TIMELINE, fix timeout calculation,
fix some warnings
v3: add export/import and cpu signal testing cases

Signed-off-by: Chunming Zhou 
Acked-by: Christian König 
---
 tests/amdgpu/Makefile.am |   3 +-
 tests/amdgpu/amdgpu_test.c   |  11 ++
 tests/amdgpu/amdgpu_test.h   |  21 +++
 tests/amdgpu/meson.build |   2 +-
 tests/amdgpu/syncobj_tests.c | 290 +++
 5 files changed, 325 insertions(+), 2 deletions(-)
 create mode 100644 tests/amdgpu/syncobj_tests.c

diff --git a/tests/amdgpu/Makefile.am b/tests/amdgpu/Makefile.am
index 48278848..920882d0 100644
--- a/tests/amdgpu/Makefile.am
+++ b/tests/amdgpu/Makefile.am
@@ -34,4 +34,5 @@ amdgpu_test_SOURCES = \
uve_ib.h \
deadlock_tests.c \
vm_tests.c  \
-   ras_tests.c
+   ras_tests.c \
+   syncobj_tests.c
diff --git a/tests/amdgpu/amdgpu_test.c b/tests/amdgpu/amdgpu_test.c
index 35c8bf6c..73403fb4 100644
--- a/tests/amdgpu/amdgpu_test.c
+++ b/tests/amdgpu/amdgpu_test.c
@@ -57,6 +57,7 @@
 #define DEADLOCK_TESTS_STR "Deadlock Tests"
 #define VM_TESTS_STR "VM Tests"
 #define RAS_TESTS_STR "RAS Tests"
+#define SYNCOBJ_TIMELINE_TESTS_STR "SYNCOBJ TIMELINE Tests"
 
 /**
  *  Open handles for amdgpu devices
@@ -123,6 +124,12 @@ static CU_SuiteInfo suites[] = {
.pCleanupFunc = suite_ras_tests_clean,
.pTests = ras_tests,
},
+   {
+   .pName = SYNCOBJ_TIMELINE_TESTS_STR,
+   .pInitFunc = suite_syncobj_timeline_tests_init,
+   .pCleanupFunc = suite_syncobj_timeline_tests_clean,
+   .pTests = syncobj_timeline_tests,
+   },
 
CU_SUITE_INFO_NULL,
 };
@@ -176,6 +183,10 @@ static Suites_Active_Status suites_active_stat[] = {
.pName = RAS_TESTS_STR,
.pActive = suite_ras_tests_enable,
},
+   {
+   .pName = SYNCOBJ_TIMELINE_TESTS_STR,
+   .pActive = suite_syncobj_timeline_tests_enable,
+   },
 };
 
 
diff --git a/tests/amdgpu/amdgpu_test.h b/tests/amdgpu/amdgpu_test.h
index bcd0bc7e..36675ea3 100644
--- a/tests/amdgpu/amdgpu_test.h
+++ b/tests/amdgpu/amdgpu_test.h
@@ -216,6 +216,27 @@ CU_BOOL suite_ras_tests_enable(void);
 extern CU_TestInfo ras_tests[];
 
 
+/**
+ * Initialize syncobj timeline test suite
+ */
+int suite_syncobj_timeline_tests_init();
+
+/**
+ * Deinitialize syncobj timeline test suite
+ */
+int suite_syncobj_timeline_tests_clean();
+
+/**
+ * Decide if the suite is enabled by default or not.
+ */
+CU_BOOL suite_syncobj_timeline_tests_enable(void);
+
+/**
+ * Tests in syncobj timeline test suite
+ */
+extern CU_TestInfo syncobj_timeline_tests[];
+
+
 /**
  * Helper functions
  */
diff --git a/tests/amdgpu/meson.build b/tests/amdgpu/meson.build
index 95ed9305..1726cb43 100644
--- a/tests/amdgpu/meson.build
+++ b/tests/amdgpu/meson.build
@@ -24,7 +24,7 @@ if dep_cunit.found()
 files(
   'amdgpu_test.c', 'basic_tests.c', 'bo_tests.c', 'cs_tests.c',
   'vce_tests.c', 'uvd_enc_tests.c', 'vcn_tests.c', 'deadlock_tests.c',
-  'vm_tests.c', 'ras_tests.c',
+  'vm_tests.c', 'ras_tests.c', 'syncobj_tests.c',
 ),
 dependencies : [dep_cunit, dep_threads],
 include_directories : [inc_root, inc_drm, 
include_directories('../../amdgpu')],
diff --git a/tests/amdgpu/syncobj_tests.c b/tests/amdgpu/syncobj_tests.c
new file mode 100644
index ..a0c627d7
--- /dev/null
+++ b/tests/amdgpu/syncobj_tests.c
@@ -0,0 +1,290 @@
+/*
+ * Copyright 2017 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+*/
+
+#include "CUnit/Basic.h"
+
+#include "amdgpu_test.h"
+#include "amdgpu_drm.h"
+#include "amdgpu_internal.h"
+#include 
+
+static  amdgpu_device_hand

[PATCH libdrm 6/7] wrap transfer interfaces

2019-05-13 Thread Chunming Zhou
Signed-off-by: Chunming Zhou 
Acked-by: Christian König 
---
 amdgpu/amdgpu.h| 22 ++
 amdgpu/amdgpu_cs.c | 16 
 2 files changed, 38 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index d2480dbe..9d9b0832 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1685,6 +1685,28 @@ int 
amdgpu_cs_syncobj_import_sync_file2(amdgpu_device_handle dev,
uint32_t syncobj,
uint64_t point,
int sync_file_fd);
+
+/**
+ *  Transfer between syncobjs.
+ *
+ * \param   dev- \c [in] device handle
+ * \param   dst_handle - \c [in] sync object handle
+ * \param   dst_point  - \c [in] timeline point, 0 means dst is binary
+ * \param   src_handle - \c [in] sync object handle
+ * \param   src_point  - \c [in] timeline point, 0 means src is binary
+ * \param   flags  - \c [in] flags
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_transfer(amdgpu_device_handle dev,
+  uint32_t dst_handle,
+  uint64_t dst_point,
+  uint32_t src_handle,
+  uint64_t src_point,
+  uint32_t flags);
+
 /**
  * Export an amdgpu fence as a handle (syncobj or fd).
  *
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index daca4421..977fa3cf 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -817,6 +817,22 @@ out:
return ret;
 }
 
+drm_public int amdgpu_cs_syncobj_transfer(amdgpu_device_handle dev,
+ uint32_t dst_handle,
+ uint64_t dst_point,
+ uint32_t src_handle,
+ uint64_t src_point,
+ uint32_t flags)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjTransfer(dev->fd,
+ dst_handle, dst_point,
+ src_handle, src_point,
+ flags);
+}
+
 drm_public int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
amdgpu_context_handle context,
amdgpu_bo_list_handle bo_list_handle,
-- 
2.17.1


[PATCH libdrm 4/7] add timeline signal/transfer ioctls v2

2019-05-13 Thread Chunming Zhou
v2: use one transfer ioctl

Signed-off-by: Chunming Zhou 
---
 xf86drm.c | 33 +
 xf86drm.h |  6 ++
 2 files changed, 39 insertions(+)

diff --git a/xf86drm.c b/xf86drm.c
index 17e3d880..acd16fab 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -4257,6 +4257,21 @@ drm_public int drmSyncobjSignal(int fd, const uint32_t 
*handles,
 return ret;
 }
 
+drm_public int drmSyncobjTimelineSignal(int fd, const uint32_t *handles,
+   uint64_t *points, uint32_t handle_count)
+{
+struct drm_syncobj_timeline_array args;
+int ret;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uint64_t)(uintptr_t)points;
+args.count_handles = handle_count;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, &args);
+return ret;
+}
+
 drm_public int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t 
*points,
  unsigned num_handles,
  int64_t timeout_nsec, unsigned flags,
@@ -4299,4 +4314,22 @@ drm_public int drmSyncobjQuery(int fd, uint32_t 
*handles, uint64_t *points,
 return 0;
 }
 
+drm_public int drmSyncobjTransfer(int fd,
+ uint32_t dst_handle, uint64_t dst_point,
+ uint32_t src_handle, uint64_t src_point,
+ uint32_t flags)
+{
+struct drm_syncobj_transfer args;
+int ret;
+
+memclear(args);
+args.src_handle = src_handle;
+args.dst_handle = dst_handle;
+args.src_point = src_point;
+args.dst_point = dst_point;
+args.flags = flags;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_TRANSFER, &args);
 
+return ret;
+}
diff --git a/xf86drm.h b/xf86drm.h
index 60c7a84f..3fb1d1ca 100644
--- a/xf86drm.h
+++ b/xf86drm.h
@@ -876,12 +876,18 @@ extern int drmSyncobjWait(int fd, uint32_t *handles, 
unsigned num_handles,
  uint32_t *first_signaled);
 extern int drmSyncobjReset(int fd, const uint32_t *handles, uint32_t 
handle_count);
 extern int drmSyncobjSignal(int fd, const uint32_t *handles, uint32_t 
handle_count);
+extern int drmSyncobjTimelineSignal(int fd, const uint32_t *handles,
+   uint64_t *points, uint32_t handle_count);
 extern int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t *points,
  unsigned num_handles,
  int64_t timeout_nsec, unsigned flags,
  uint32_t *first_signaled);
 extern int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
   uint32_t handle_count);
+extern int drmSyncobjTransfer(int fd,
+ uint32_t dst_handle, uint64_t dst_point,
+ uint32_t src_handle, uint64_t src_point,
+ uint32_t flags);
 
 #if defined(__cplusplus)
 }
-- 
2.17.1


[PATCH libdrm 5/7] expose timeline signal/export/import interfaces v2

2019-05-13 Thread Chunming Zhou
v2: adapt to new one transfer ioctl

Signed-off-by: Chunming Zhou 
Acked-by: Christian König 
---
 amdgpu/amdgpu-symbol-check |  3 ++
 amdgpu/amdgpu.h| 51 
 amdgpu/amdgpu_cs.c | 68 ++
 3 files changed, 122 insertions(+)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index d3c5bb89..274b4c6d 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -52,10 +52,13 @@ amdgpu_cs_submit
 amdgpu_cs_submit_raw
 amdgpu_cs_submit_raw2
 amdgpu_cs_syncobj_export_sync_file
+amdgpu_cs_syncobj_export_sync_file2
 amdgpu_cs_syncobj_import_sync_file
+amdgpu_cs_syncobj_import_sync_file2
 amdgpu_cs_syncobj_query
 amdgpu_cs_syncobj_reset
 amdgpu_cs_syncobj_signal
+amdgpu_cs_syncobj_timeline_signal
 amdgpu_cs_syncobj_timeline_wait
 amdgpu_cs_syncobj_wait
 amdgpu_cs_wait_fences
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 5ebfe1e3..d2480dbe 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1516,6 +1516,23 @@ int amdgpu_cs_syncobj_reset(amdgpu_device_handle dev,
 int amdgpu_cs_syncobj_signal(amdgpu_device_handle dev,
 const uint32_t *syncobjs, uint32_t syncobj_count);
 
+/**
+ * Signal kernel timeline sync objects.
+ *
+ * \param dev   - \c [in] device handle
+ * \param syncobjs  - \c [in] array of sync object handles
+ * \param points   - \c [in] array of timeline points
+ * \param syncobj_count - \c [in] number of handles in syncobjs
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_syncobj_timeline_signal(amdgpu_device_handle dev,
+ const uint32_t *syncobjs,
+ uint64_t *points,
+ uint32_t syncobj_count);
+
 /**
  *  Wait for one or all sync objects to signal.
  *
@@ -1633,7 +1650,41 @@ int 
amdgpu_cs_syncobj_export_sync_file(amdgpu_device_handle dev,
 int amdgpu_cs_syncobj_import_sync_file(amdgpu_device_handle dev,
   uint32_t syncobj,
   int sync_file_fd);
+/**
+ *  Export kernel timeline sync object to a sync_file.
+ *
+ * \param   dev- \c [in] device handle
+ * \param   syncobj- \c [in] sync object handle
+ * \param   point  - \c [in] timeline point
+ * \param   flags  - \c [in] flags
+ * \param   sync_file_fd - \c [out] sync_file file descriptor.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_export_sync_file2(amdgpu_device_handle dev,
+   uint32_t syncobj,
+   uint64_t point,
+   uint32_t flags,
+   int *sync_file_fd);
 
+/**
+ *  Import kernel timeline sync object from a sync_file.
+ *
+ * \param   dev- \c [in] device handle
+ * \param   syncobj- \c [in] sync object handle
+ * \param   point  - \c [in] timeline point
+ * \param   sync_file_fd - \c [in] sync_file file descriptor.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_import_sync_file2(amdgpu_device_handle dev,
+   uint32_t syncobj,
+   uint64_t point,
+   int sync_file_fd);
 /**
  * Export an amdgpu fence as a handle (syncobj or fd).
  *
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 9fcaf2c4..daca4421 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -674,6 +674,18 @@ drm_public int 
amdgpu_cs_syncobj_signal(amdgpu_device_handle dev,
return drmSyncobjSignal(dev->fd, syncobjs, syncobj_count);
 }
 
+drm_public int amdgpu_cs_syncobj_timeline_signal(amdgpu_device_handle dev,
+const uint32_t *syncobjs,
+uint64_t *points,
+uint32_t syncobj_count)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjTimelineSignal(dev->fd, syncobjs,
+   points, syncobj_count);
+}
+
 drm_public int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,
  uint32_t *handles, unsigned num_handles,
  int64_t timeout_nsec, unsigned flags,
@@ -749,6 +761,62 @@ drm_public int 
amdgpu_cs_syncobj_import_sync_file(amdgpu_device_handle dev,
return drmSyncobjImportSyncFile(dev->fd, syncobj, sync_file_fd);
 }
 
+drm_public int amdgpu_cs_syncobj_export_sync_file2(amdgpu_device_handle dev,
+  uint32_t syncobj,
+  

[PATCH libdrm 1/7] add cs chunk for syncobj timeline

2019-05-13 Thread Chunming Zhou
Signed-off-by: Chunming Zhou 
---
 include/drm/amdgpu_drm.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
index d0701ffc..3d0318e6 100644
--- a/include/drm/amdgpu_drm.h
+++ b/include/drm/amdgpu_drm.h
@@ -528,6 +528,8 @@ struct drm_amdgpu_gem_va {
 #define AMDGPU_CHUNK_ID_SYNCOBJ_OUT 0x05
 #define AMDGPU_CHUNK_ID_BO_HANDLES  0x06
 #define AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES 0x07
+#define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT0x08
+#define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL  0x09
 
 struct drm_amdgpu_cs_chunk {
__u32   chunk_id;
@@ -608,6 +610,13 @@ struct drm_amdgpu_cs_chunk_sem {
__u32 handle;
 };
 
+struct drm_amdgpu_cs_chunk_syncobj {
+   __u32 handle;
+   __u32 flags;
+   __u64 point;
+};
+
+
 #define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ 0
 #define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ_FD  1
 #define AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD2
-- 
2.17.1


[PATCH 1/2] drm/ttm: fix busy memory to fail other user v6

2019-05-07 Thread Chunming Zhou
A heavy GPU job can occupy memory for a long time, which causes other users to 
fail to get memory.

This basically picks up Christian's idea:

1. Reserve the BO in DC using a ww_mutex ticket (trivial).
2. If we then run into this EBUSY condition in TTM check if the BO we need 
memory for (or rather the ww_mutex of its reservation object) has a ticket 
assigned.
3. If we have a ticket we grab a reference to the first BO on the LRU, drop the 
LRU lock and try to grab the reservation lock with the ticket.
4. If getting the reservation lock with the ticket succeeded we check if the BO 
is still the first one on the LRU in question (the BO could have moved).
5. If the BO is still the first one on the LRU in question we try to evict it 
as we would evict any other BO.
6. If any of the "If's" above fail we just back off and return -EBUSY.
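
The steps above can be sketched as a single-threaded toy model in userspace C. It is not the TTM code: struct toy_bo, its fields and toy_evict_first are hypothetical names, a "busy" reservation is reduced to a boolean, and the blocking ticketed lock is modelled by a flag saying whether it would eventually succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy buffer object (hypothetical, for illustration only). */
struct toy_bo {
    int id;
    bool locked;     /* true: a trylock on the reservation would fail */
    bool releasable; /* true: a blocking, ticketed lock would succeed */
};

/* Returns the id of the BO picked for eviction, or -EBUSY on back-off. */
static int toy_evict_first(struct toy_bo *lru, size_t n, bool have_ticket)
{
    struct toy_bo *first_busy = NULL;

    assert(lru || n == 0);

    /* Normal path: take the first BO whose reservation can be trylocked. */
    for (size_t i = 0; i < n; i++) {
        if (!lru[i].locked)
            return lru[i].id;
        if (!first_busy)
            first_busy = &lru[i]; /* remember the first busy BO */
    }

    /* Step 2: without a ticket we can only report -EBUSY. */
    if (!have_ticket || !first_busy)
        return -EBUSY;

    /* Step 3: try to take the busy BO's reservation with the ticket. */
    if (!first_busy->releasable)
        return -EBUSY; /* step 6: back off */
    first_busy->locked = false;

    /* Step 4: recheck that the BO is still first on the LRU. Always true
     * in this single-threaded model; in the kernel the LRU lock was
     * dropped in between, so the BO could have moved. */
    if (first_busy != &lru[0])
        return -EBUSY;

    /* Step 5: evict it like any other BO. */
    return first_busy->id;
}
```

The point of the model is the ordering: the ticketed path is only tried after the cheap trylock pass found nothing, and every failed check ends in -EBUSY.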

v2: fix some minor checks
v3: address Christian's v2 comments.
v4: fix some missing
v5: handle first_bo unlock and bo_get/put
v6: abstract a unified iterate function, and handle all possible use cases, not 
only pinned BOs.

Change-Id: I21423fb922f885465f13833c41df1e134364a8e7
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 113 ++-
 1 file changed, 97 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 8502b3ed2d88..bbf1d14d00a7 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -766,11 +766,13 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
  * b. Otherwise, trylock it.
  */
 static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
-   struct ttm_operation_ctx *ctx, bool *locked)
+   struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
 {
bool ret = false;
 
*locked = false;
+   if (busy)
+   *busy = false;
if (bo->resv == ctx->resv) {
reservation_object_assert_held(bo->resv);
if (ctx->flags & TTM_OPT_FLAG_ALLOW_RES_EVICT
@@ -779,35 +781,45 @@ static bool ttm_bo_evict_swapout_allowable(struct 
ttm_buffer_object *bo,
} else {
*locked = reservation_object_trylock(bo->resv);
ret = *locked;
+   if (!ret && busy)
+   *busy = true;
}
 
return ret;
 }
 
-static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
-  uint32_t mem_type,
-  const struct ttm_place *place,
-  struct ttm_operation_ctx *ctx)
+static struct ttm_buffer_object*
+ttm_mem_find_evitable_bo(struct ttm_bo_device *bdev,
+struct ttm_mem_type_manager *man,
+const struct ttm_place *place,
+struct ttm_operation_ctx *ctx,
+struct ttm_buffer_object **first_bo,
+bool *locked)
 {
-   struct ttm_bo_global *glob = bdev->glob;
-   struct ttm_mem_type_manager *man = &bdev->man[mem_type];
struct ttm_buffer_object *bo = NULL;
-   bool locked = false;
-   unsigned i;
-   int ret;
+   int i;
 
-   spin_lock(&glob->lru_lock);
+   if (first_bo)
+   *first_bo = NULL;
for (i = 0; i < TTM_MAX_BO_PRIORITY; ++i) {
list_for_each_entry(bo, &man->lru[i], lru) {
-   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked))
+   bool busy = false;
+   if (!ttm_bo_evict_swapout_allowable(bo, ctx, locked,
+   &busy)) {
+   if (first_bo && !(*first_bo) && busy) {
+   ttm_bo_get(bo);
+   *first_bo = bo;
+   }
continue;
+   }
 
if (place && !bdev->driver->eviction_valuable(bo,
  place)) {
-   if (locked)
+   if (*locked)
reservation_object_unlock(bo->resv);
continue;
}
+
break;
}
 
@@ -818,9 +830,66 @@ static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
bo = NULL;
}
 
+   return bo;
+}
+
+static int ttm_mem_evict_first(struct ttm_bo_device *bdev,
+  uint32_t mem_type,
+  const struct ttm_place *place,
+  struct ttm_operation_ctx *ctx)
+{
+   struct ttm_bo_global *glob = bdev->glob;
+   struct ttm_mem_type_manager *man = &bdev->man[mem_type];
+   struct ttm_buffer_object *bo = NULL, *first_bo = NUL

[PATCH 2/2] drm/amd/display: use ttm_eu_reserve_buffers instead of amdgpu_bo_reserve

2019-05-07 Thread Chunming Zhou
Add a ticket for the display BO so that it can preempt a busy BO.

Change-Id: I9f031cdcc8267de00e819ae303baa0a52df8ebb9
Signed-off-by: Chunming Zhou 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 22 ++-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index ac22f7351a42..8633d52e3fbe 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4176,6 +4176,9 @@ static int dm_plane_helper_prepare_fb(struct drm_plane 
*plane,
struct amdgpu_device *adev;
struct amdgpu_bo *rbo;
struct dm_plane_state *dm_plane_state_new, *dm_plane_state_old;
+   struct list_head list, duplicates;
+   struct ttm_validate_buffer tv;
+   struct ww_acquire_ctx ticket;
uint64_t tiling_flags;
uint32_t domain;
int r;
@@ -4192,9 +4195,18 @@ static int dm_plane_helper_prepare_fb(struct drm_plane 
*plane,
obj = new_state->fb->obj[0];
rbo = gem_to_amdgpu_bo(obj);
adev = amdgpu_ttm_adev(rbo->tbo.bdev);
-   r = amdgpu_bo_reserve(rbo, false);
-   if (unlikely(r != 0))
+   INIT_LIST_HEAD(&list);
+   INIT_LIST_HEAD(&duplicates);
+
+   tv.bo = &rbo->tbo;
+   tv.num_shared = 1;
+   list_add(&tv.head, &list);
+
+   r = ttm_eu_reserve_buffers(&ticket, &list, false, &duplicates);
+   if (r) {
+   dev_err(adev->dev, "fail to reserve bo (%d)\n", r);
return r;
+   }
 
if (plane->type != DRM_PLANE_TYPE_CURSOR)
domain = amdgpu_display_supported_domains(adev);
@@ -4205,21 +4217,21 @@ static int dm_plane_helper_prepare_fb(struct drm_plane 
*plane,
if (unlikely(r != 0)) {
if (r != -ERESTARTSYS)
DRM_ERROR("Failed to pin framebuffer with error %d\n", 
r);
-   amdgpu_bo_unreserve(rbo);
+   ttm_eu_backoff_reservation(&ticket, &list);
return r;
}
 
r = amdgpu_ttm_alloc_gart(&rbo->tbo);
if (unlikely(r != 0)) {
amdgpu_bo_unpin(rbo);
-   amdgpu_bo_unreserve(rbo);
+   ttm_eu_backoff_reservation(&ticket, &list);
DRM_ERROR("%p bind failed\n", rbo);
return r;
}
 
amdgpu_bo_get_tiling_flags(rbo, &tiling_flags);
 
-   amdgpu_bo_unreserve(rbo);
+   ttm_eu_backoff_reservation(&ticket, &list);
 
afb->address = amdgpu_bo_gpu_offset(rbo);
 
-- 
2.17.1


Re: [PATCH v5 6/6] drm/amdgpu: Avoid HW reset if guilty job already signaled.

2019-04-22 Thread Chunming Zhou
+Monk.

GPU reset is used widely in SR-IOV, so someone from the virtualization side needs to take a look.

But out of curiosity, why can the guilty job still signal if the job is already 
set to guilty? Was it marked guilty wrongly?


-David
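
The quoted patch below rejects a new TDR while another one is running by using a trylock instead of a blocking lock. A minimal userspace sketch of that pattern follows. It is an assumption-laden model, not the amdgpu code: struct toy_dev and the two helpers are hypothetical names, and a C11 atomic_flag stands in for the kernel mutex.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy device: reset_in_progress plays the role of the reset lock. */
struct toy_dev {
    atomic_flag reset_in_progress;
    int resets_done;
};

/* Returns true if the caller may run the reset, false if one is
 * already running and this request should bail out. */
static bool toy_try_begin_reset(struct toy_dev *d)
{
    assert(d);
    /* test_and_set returns the previous value: true means already held. */
    return !atomic_flag_test_and_set(&d->reset_in_progress);
}

/* Marks the reset as finished so that later requests can proceed. */
static void toy_end_reset(struct toy_dev *d)
{
    assert(d);
    d->resets_done++;
    atomic_flag_clear(&d->reset_in_progress);
}
```

A second timeout handler that loses the race simply returns instead of queueing another reset behind the first one, which is the behaviour the patch description asks for.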

On 2019/4/18 23:00, Andrey Grodzovsky wrote:
> Also reject TDRs if another one already running.
>
> v2:
> Stop all schedulers across device and entire XGMI hive before
> force signaling HW fences.
> Avoid passing job_signaled to helper functions to keep all the decision
> making about skipping HW reset in one place.
>
> v3:
> Fix SW sched. hang after non HW reset. sched.hw_rq_count has to be balanced
> against its decrement in drm_sched_stop in non HW reset case.
> v4: rebase
> v5: Revert v3 as we do it now in scheduler code.
>
> Signed-off-by: Andrey Grodzovsky 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 143 
> +++--
>   1 file changed, 95 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index a0e165c..85f8792 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3334,8 +3334,6 @@ static int amdgpu_device_pre_asic_reset(struct 
> amdgpu_device *adev,
>   if (!ring || !ring->sched.thread)
>   continue;
>   
> - drm_sched_stop(>sched, >base);
> -
>   /* after all hw jobs are reset, hw fence is meaningless, so 
> force_completion */
>   amdgpu_fence_driver_force_completion(ring);
>   }
> @@ -3343,6 +3341,7 @@ static int amdgpu_device_pre_asic_reset(struct 
> amdgpu_device *adev,
>   if(job)
>   drm_sched_increase_karma(>base);
>   
> + /* Don't suspend on bare metal if we are not going to HW reset the ASIC 
> */
>   if (!amdgpu_sriov_vf(adev)) {
>   
>   if (!need_full_reset)
> @@ -3480,37 +3479,21 @@ static int amdgpu_do_asic_reset(struct 
> amdgpu_hive_info *hive,
>   return r;
>   }
>   
> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev)
> +static bool amdgpu_device_lock_adev(struct amdgpu_device *adev, bool trylock)
>   {
> - int i;
> -
> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> - struct amdgpu_ring *ring = adev->rings[i];
> -
> - if (!ring || !ring->sched.thread)
> - continue;
> -
> - if (!adev->asic_reset_res)
> - drm_sched_resubmit_jobs(>sched);
> + if (trylock) {
> + if (!mutex_trylock(>lock_reset))
> + return false;
> + } else
> + mutex_lock(>lock_reset);
>   
> - drm_sched_start(>sched, !adev->asic_reset_res);
> - }
> -
> - if (!amdgpu_device_has_dc_support(adev)) {
> - drm_helper_resume_force_mode(adev->ddev);
> - }
> -
> - adev->asic_reset_res = 0;
> -}
> -
> -static void amdgpu_device_lock_adev(struct amdgpu_device *adev)
> -{
> - mutex_lock(>lock_reset);
>   atomic_inc(>gpu_reset_counter);
>   adev->in_gpu_reset = 1;
>   /* Block kfd: SRIOV would do it separately */
>   if (!amdgpu_sriov_vf(adev))
>   amdgpu_amdkfd_pre_reset(adev);
> +
> + return true;
>   }
>   
>   static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
> @@ -3538,40 +3521,42 @@ static void amdgpu_device_unlock_adev(struct 
> amdgpu_device *adev)
>   int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
> struct amdgpu_job *job)
>   {
> - int r;
> + struct list_head device_list, *device_list_handle =  NULL;
> + bool need_full_reset, job_signaled;
>   struct amdgpu_hive_info *hive = NULL;
> - bool need_full_reset = false;
>   struct amdgpu_device *tmp_adev = NULL;
> - struct list_head device_list, *device_list_handle =  NULL;
> + int i, r = 0;
>   
> + need_full_reset = job_signaled = false;
>   INIT_LIST_HEAD(_list);
>   
>   dev_info(adev->dev, "GPU reset begin!\n");
>   
> + hive = amdgpu_get_xgmi_hive(adev, false);
> +
>   /*
> -  * In case of XGMI hive disallow concurrent resets to be triggered
> -  * by different nodes. No point also since the one node already 
> executing
> -  * reset will also reset all the other nodes in the hive.
> +  * Here we trylock to avoid chain of resets executing from
> +  * either trigger by jobs on different adevs in XGMI hive or jobs on
> +  * different schedulers for same device while this TO handler is 
> running.
> +  * We always reset all schedulers for device and all devices for XGMI
> +  * hive so that should take care of them too.
>*/
> - hive = amdgpu_get_xgmi_hive(adev, 0);
> - if (hive && adev->gmc.xgmi.num_physical_nodes > 1 &&
> - !mutex_trylock(>reset_lock))
> +
> + if (hive && !mutex_trylock(>reset_lock)) {
> + DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another 
> already in 

Re: [PATCH v5 4/6] drm/sched: Keep s_fence->parent pointer

2019-04-22 Thread Chunming Zhou
+Monk to respond to this patch.


On 2019/4/18 23:00, Andrey Grodzovsky wrote:
> For later driver's reference to see if the fence is signaled.
>
> v2: Move parent fence put to resubmit jobs.
>
> Signed-off-by: Andrey Grodzovsky 
> Reviewed-by: Christian König 
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 11 +--
>   1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 7816de7..03e6bd8 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -375,8 +375,6 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, 
> struct drm_sched_job *bad)
>   if (s_job->s_fence->parent &&
>   dma_fence_remove_callback(s_job->s_fence->parent,
> _job->cb)) {
> - dma_fence_put(s_job->s_fence->parent);
> - s_job->s_fence->parent = NULL;

I vaguely remember that Monk set parent to NULL to avoid a potential free 
problem after callback removal.


-David


>   atomic_dec(>hw_rq_count);
>   } else {
>   /*
> @@ -403,6 +401,14 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, 
> struct drm_sched_job *bad)
>   sched->ops->free_job(s_job);
>   }
>   }
> +
> + /*
> +  * Stop pending timer in flight as we rearm it in  drm_sched_start. This
> +  * avoids the pending timeout work in progress to fire right away after
> +  * this TDR finished and before the newly restarted jobs had a
> +  * chance to complete.
> +  */
> + cancel_delayed_work(>work_tdr);
>   }
>   
>   EXPORT_SYMBOL(drm_sched_stop);
> @@ -477,6 +483,7 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler 
> *sched)
>   if (found_guilty && s_job->s_fence->scheduled.context == 
> guilty_context)
>   dma_fence_set_error(_fence->finished, -ECANCELED);
>   
> + dma_fence_put(s_job->s_fence->parent);
>   s_job->s_fence->parent = sched->ops->run_job(s_job);
>   }
>   }

Re: [PATCH v5 3/6] drm/scheduler: rework job destruction

2019-04-22 Thread Chunming Zhou
Hi Andrey,

static void drm_sched_process_job(struct dma_fence *f, struct 
dma_fence_cb *cb)
{
...
     spin_lock_irqsave(&sched->job_list_lock, flags);
     /* remove job from ring_mirror_list */
     list_del_init(&s_job->node);
     spin_unlock_irqrestore(&sched->job_list_lock, flags);
[David] How about just moving the above from the IRQ handler to the worker? 
Would that cause any problem? Maybe I missed your previous discussion, but I 
think removing the lock around the list is a risk for future maintenance, even 
though you make sure it is thread safe for now.

-David

...

     schedule_work(&s_job->finish_work);
}

On 2019/4/18 23:00, Andrey Grodzovsky wrote:
> From: Christian König 
>
> We now destroy finished jobs from the worker thread to make sure that
> we never destroy a job currently in timeout processing.
> By this we avoid holding lock around ring mirror list in drm_sched_stop
> which should solve a deadlock reported by a user.
>
> v2: Remove unused variable.
> v4: Move guilty job free into sched code.
> v5:
> Move sched->hw_rq_count to drm_sched_start to account for counter
> decrement in drm_sched_stop even when we don't call resubmit jobs
> if guilty job did signal.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109692
>
> Signed-off-by: Christian König 
> Signed-off-by: Andrey Grodzovsky 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   9 +-
>   drivers/gpu/drm/etnaviv/etnaviv_dump.c |   4 -
>   drivers/gpu/drm/etnaviv/etnaviv_sched.c|   2 +-
>   drivers/gpu/drm/lima/lima_sched.c  |   2 +-
>   drivers/gpu/drm/panfrost/panfrost_job.c|   2 +-
>   drivers/gpu/drm/scheduler/sched_main.c | 159 
> +
>   drivers/gpu/drm/v3d/v3d_sched.c|   2 +-
>   include/drm/gpu_scheduler.h|   6 +-
>   8 files changed, 102 insertions(+), 84 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 7cee269..a0e165c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3334,7 +3334,7 @@ static int amdgpu_device_pre_asic_reset(struct 
> amdgpu_device *adev,
>   if (!ring || !ring->sched.thread)
>   continue;
>   
> - drm_sched_stop(>sched);
> + drm_sched_stop(>sched, >base);
>   
>   /* after all hw jobs are reset, hw fence is meaningless, so 
> force_completion */
>   amdgpu_fence_driver_force_completion(ring);
> @@ -3343,8 +3343,6 @@ static int amdgpu_device_pre_asic_reset(struct 
> amdgpu_device *adev,
>   if(job)
>   drm_sched_increase_karma(>base);
>   
> -
> -
>   if (!amdgpu_sriov_vf(adev)) {
>   
>   if (!need_full_reset)
> @@ -3482,8 +3480,7 @@ static int amdgpu_do_asic_reset(struct amdgpu_hive_info 
> *hive,
>   return r;
>   }
>   
> -static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev,
> -   struct amdgpu_job *job)
> +static void amdgpu_device_post_asic_reset(struct amdgpu_device *adev)
>   {
>   int i;
>   
> @@ -3623,7 +3620,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device 
> *adev,
>   
>   /* Post ASIC reset for all devs .*/
>   list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) {
> - amdgpu_device_post_asic_reset(tmp_adev, tmp_adev == adev ? job 
> : NULL);
> + amdgpu_device_post_asic_reset(tmp_adev);
>   
>   if (r) {
>   /* bad news, how to tell it to userspace ? */
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c 
> b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
> index 33854c9..5778d9c 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
> @@ -135,13 +135,11 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
>   mmu_size + gpu->buffer.size;
>   
>   /* Add in the active command buffers */
> - spin_lock_irqsave(>sched.job_list_lock, flags);
>   list_for_each_entry(s_job, >sched.ring_mirror_list, node) {
>   submit = to_etnaviv_submit(s_job);
>   file_size += submit->cmdbuf.size;
>   n_obj++;
>   }
> - spin_unlock_irqrestore(>sched.job_list_lock, flags);
>   
>   /* Add in the active buffer objects */
>   list_for_each_entry(vram, >mmu->mappings, mmu_node) {
> @@ -183,14 +181,12 @@ void etnaviv_core_dump(struct etnaviv_gpu *gpu)
> gpu->buffer.size,
> etnaviv_cmdbuf_get_va(>buffer));
>   
> - spin_lock_irqsave(>sched.job_list_lock, flags);
>   list_for_each_entry(s_job, >sched.ring_mirror_list, node) {
>   submit = to_etnaviv_submit(s_job);
>   etnaviv_core_dump_mem(, ETDUMP_BUF_CMD,
> submit->cmdbuf.vaddr, submit->cmdbuf.size,
> etnaviv_cmdbuf_get_va(>cmdbuf));
>   }
> - 

Re: dynamic DMA-buf sharing between devices

2019-04-17 Thread Chunming Zhou
I like that you do things step by step. You can ping me when they are ready.

-David

On 2019/4/17 21:59, Christian König wrote:
> On top of those I have 6 more patches in the pipeline to enable VRAM 
> P2P with DMA-buf.
>
> So that is not the end of the patch set :)
>
> Christian.
>
> On 17.04.19 at 15:52, Chunming Zhou wrote:
>> Thanks Christian, great job. I will verify it this week when I finish my
>> current work on hand.
>>
>> -David
>>
>> On 2019/4/17 2:38, Christian König wrote:
>>> Hi everybody,
>>>
>>> core idea in this patch set is that DMA-buf importers can now 
>>> provide an optional invalidate callback. Using this callback and the 
>>> reservation object exporters can now avoid pinning DMA-buf memory 
>>> for a long time while sharing it between devices.
>>>
>>> I already sent out an older version roughly a year ago, but 
>>> didn't have time to look further into cleaning this up.
>>>
>>> Last time, a major problem was that we would have had to fix up all 
>>> drivers implementing DMA-buf at once.
>>>
>>> Now I avoid this by allowing mappings to be cached in the DMA-buf 
>>> attachment, so drivers can optionally move over to the new 
>>> interface one by one.
>>>
>>> This is also a prerequisite to my patchset enabling sharing of 
>>> device memory with DMA-buf.
>>>
>>> Please review and/or comment,
>>> Christian.
>>>
>>>
>>> ___
>>> dri-devel mailing list
>>> dri-de...@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>

Re: dynamic DMA-buf sharing between devices

2019-04-17 Thread Chunming Zhou
Thanks Christian, great job. I will verify it this week when I finish my 
current work on hand.

-David

On 2019/4/17 2:38, Christian König wrote:
> Hi everybody,
>
> core idea in this patch set is that DMA-buf importers can now provide an 
> optional invalidate callback. Using this callback and the reservation object 
> exporters can now avoid pinning DMA-buf memory for a long time while sharing 
> it between devices.
>
> I already sent out an older version roughly a year ago, but didn't have 
> time to look further into cleaning this up.
>
> Last time, a major problem was that we would have had to fix up all drivers 
> implementing DMA-buf at once.
>
> Now I avoid this by allowing mappings to be cached in the DMA-buf attachment, 
> so drivers can optionally move over to the new interface one by one.
>
> This is also a prerequisite to my patchset enabling sharing of device memory 
> with DMA-buf.
>
> Please review and/or comment,
> Christian.
>
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

[PATCH libdrm 5/8] add timeline signal/transfer ioctls v2

2019-04-09 Thread Chunming Zhou
v2: use one transfer ioctl

Signed-off-by: Chunming Zhou 
---
 xf86drm.c | 33 +
 xf86drm.h |  6 ++
 2 files changed, 39 insertions(+)

diff --git a/xf86drm.c b/xf86drm.c
index 66e0c985..d57c4218 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -4280,6 +4280,21 @@ drm_public int drmSyncobjSignal(int fd, const uint32_t 
*handles,
 return ret;
 }
 
+drm_public int drmSyncobjTimelineSignal(int fd, const uint32_t *handles,
+   uint64_t *points, uint32_t handle_count)
+{
+struct drm_syncobj_timeline_array args;
+int ret;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uint64_t)(uintptr_t)points;
+args.count_handles = handle_count;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, &args);
+return ret;
+}
+
 drm_public int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t 
*points,
  unsigned num_handles,
  int64_t timeout_nsec, unsigned flags,
@@ -4322,4 +4337,22 @@ drm_public int drmSyncobjQuery(int fd, uint32_t 
*handles, uint64_t *points,
 return 0;
 }
 
+drm_public int drmSyncobjTransfer(int fd,
+ uint32_t dst_handle, uint64_t dst_point,
+ uint32_t src_handle, uint64_t src_point,
+ uint32_t flags)
+{
+struct drm_syncobj_transfer args;
+int ret;
+
+memclear(args);
+args.src_handle = src_handle;
+args.dst_handle = dst_handle;
+args.src_point = src_point;
+args.dst_point = dst_point;
+args.flags = flags;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_TRANSFER, &args);
 
+return ret;
+}
diff --git a/xf86drm.h b/xf86drm.h
index 60c7a84f..3fb1d1ca 100644
--- a/xf86drm.h
+++ b/xf86drm.h
@@ -876,12 +876,18 @@ extern int drmSyncobjWait(int fd, uint32_t *handles, 
unsigned num_handles,
  uint32_t *first_signaled);
 extern int drmSyncobjReset(int fd, const uint32_t *handles, uint32_t 
handle_count);
 extern int drmSyncobjSignal(int fd, const uint32_t *handles, uint32_t 
handle_count);
+extern int drmSyncobjTimelineSignal(int fd, const uint32_t *handles,
+   uint64_t *points, uint32_t handle_count);
 extern int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t *points,
  unsigned num_handles,
  int64_t timeout_nsec, unsigned flags,
  uint32_t *first_signaled);
 extern int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
   uint32_t handle_count);
+extern int drmSyncobjTransfer(int fd,
+ uint32_t dst_handle, uint64_t dst_point,
+ uint32_t src_handle, uint64_t src_point,
+ uint32_t flags);
 
 #if defined(__cplusplus)
 }
-- 
2.17.1
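
The wrappers in this patch pass user pointers to the kernel as 64-bit integers, using the cast `(uint64_t)(uintptr_t)points`. A small self-contained sketch of that idiom is shown here. The struct toy_timeline_args and the helper names are hypothetical stand-ins, not the real drm_syncobj_timeline_array definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an ioctl argument struct: pointers travel
 * as 64-bit integers so the layout is identical for 32-bit and 64-bit
 * userspace. */
struct toy_timeline_args {
    uint64_t handles;       /* user pointer to uint32_t[count_handles] */
    uint64_t points;        /* user pointer to uint64_t[count_handles] */
    uint32_t count_handles;
    uint32_t pad;
};

/* Going through uintptr_t first avoids sign extension of 32-bit pointers. */
static uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Inverse conversion, as the receiving side would do it. */
static void *u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}

/* Build the argument struct from the two parallel arrays. */
static struct toy_timeline_args pack_args(const uint32_t *handles,
                                          const uint64_t *points,
                                          uint32_t count)
{
    struct toy_timeline_args args = {0};

    assert(count == 0 || (handles && points));
    args.handles = ptr_to_u64(handles);
    args.points = ptr_to_u64(points);
    args.count_handles = count;
    return args;
}
```

Zero-initialising the struct before filling it plays the same role as memclear(args) in the patch: fields the caller does not set are passed as 0.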


[PATCH libdrm 7/8] wrap transfer interfaces

2019-04-09 Thread Chunming Zhou
Signed-off-by: Chunming Zhou 
---
 amdgpu/amdgpu.h| 22 ++
 amdgpu/amdgpu_cs.c | 16 
 2 files changed, 38 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index b5bd3ed9..2350835b 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1670,6 +1670,28 @@ int 
amdgpu_cs_syncobj_import_sync_file2(amdgpu_device_handle dev,
uint32_t syncobj,
uint64_t point,
int sync_file_fd);
+
+/**
+ *  Transfer between syncobjs.
+ *
+ * \param   dev - \c [in] device handle
+ * \param   dst_handle - \c [in] sync object handle
+ * \param   dst_point  - \c [in] timeline point, 0 means dst is binary
+ * \param   src_handle - \c [in] sync object handle
+ * \param   src_point  - \c [in] timeline point, 0 means src is binary
+ * \param   flags  - \c [in] flags
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_transfer(amdgpu_device_handle dev,
+  uint32_t dst_handle,
+  uint64_t dst_point,
+  uint32_t src_handle,
+  uint64_t src_point,
+  uint32_t flags);
+
 /**
  * Export an amdgpu fence as a handle (syncobj or fd).
  *
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 1c02d16f..a1c1af55 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -792,6 +792,22 @@ out:
return ret;
 }
 
+drm_public int amdgpu_cs_syncobj_transfer(amdgpu_device_handle dev,
+ uint32_t dst_handle,
+ uint64_t dst_point,
+ uint32_t src_handle,
+ uint64_t src_point,
+ uint32_t flags)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjTransfer(dev->fd,
+ dst_handle, dst_point,
+ src_handle, src_point,
+ flags);
+}
+
 drm_public int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
amdgpu_context_handle context,
amdgpu_bo_list_handle bo_list_handle,
-- 
2.17.1


[PATCH libdrm 3/8] add timeline wait/query ioctl v2

2019-04-09 Thread Chunming Zhou
v2: drop export/import

Signed-off-by: Chunming Zhou 
---
 xf86drm.c | 44 
 xf86drm.h |  6 ++
 2 files changed, 50 insertions(+)

diff --git a/xf86drm.c b/xf86drm.c
index 18ad7c58..66e0c985 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -4279,3 +4279,47 @@ drm_public int drmSyncobjSignal(int fd, const uint32_t 
*handles,
 ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_SIGNAL, &args);
 return ret;
 }
+
+drm_public int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t 
*points,
+ unsigned num_handles,
+ int64_t timeout_nsec, unsigned flags,
+ uint32_t *first_signaled)
+{
+struct drm_syncobj_timeline_wait args;
+int ret;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uint64_t)(uintptr_t)points;
+args.timeout_nsec = timeout_nsec;
+args.count_handles = num_handles;
+args.flags = flags;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, &args);
+if (ret < 0)
+return -errno;
+
+if (first_signaled)
+*first_signaled = args.first_signaled;
+return ret;
+}
+
+
+drm_public int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
+  uint32_t handle_count)
+{
+struct drm_syncobj_timeline_array args;
+int ret;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uint64_t)(uintptr_t)points;
+args.count_handles = handle_count;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_QUERY, &args);
+if (ret)
+return ret;
+return 0;
+}
+
+
diff --git a/xf86drm.h b/xf86drm.h
index 887ecc76..60c7a84f 100644
--- a/xf86drm.h
+++ b/xf86drm.h
@@ -876,6 +876,12 @@ extern int drmSyncobjWait(int fd, uint32_t *handles, 
unsigned num_handles,
  uint32_t *first_signaled);
 extern int drmSyncobjReset(int fd, const uint32_t *handles, uint32_t 
handle_count);
 extern int drmSyncobjSignal(int fd, const uint32_t *handles, uint32_t 
handle_count);
+extern int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t *points,
+ unsigned num_handles,
+ int64_t timeout_nsec, unsigned flags,
+ uint32_t *first_signaled);
+extern int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
+  uint32_t handle_count);
 
 #if defined(__cplusplus)
 }
-- 
2.17.1


[PATCH libdrm 2/8] addr cs chunk for syncobj timeline

2019-04-09 Thread Chunming Zhou
Signed-off-by: Chunming Zhou 
---
 include/drm/amdgpu_drm.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
index e3a97da4..ab53f2e0 100644
--- a/include/drm/amdgpu_drm.h
+++ b/include/drm/amdgpu_drm.h
@@ -528,6 +528,8 @@ struct drm_amdgpu_gem_va {
 #define AMDGPU_CHUNK_ID_SYNCOBJ_OUT 0x05
 #define AMDGPU_CHUNK_ID_BO_HANDLES  0x06
 #define AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES 0x07
+#define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT0x08
+#define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL  0x09
 
 struct drm_amdgpu_cs_chunk {
__u32   chunk_id;
@@ -608,6 +610,13 @@ struct drm_amdgpu_cs_chunk_sem {
__u32 handle;
 };
 
+struct drm_amdgpu_cs_chunk_syncobj {
+   __u32 handle;
+   __u32 flags;
+   __u64 point;
+};
+
+
 #define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ 0
 #define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ_FD  1
 #define AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD2
-- 
2.17.1


[PATCH libdrm 6/8] expose timeline signal/export/import interfaces v2

2019-04-09 Thread Chunming Zhou
v2: adapt to new one transfer ioctl

Signed-off-by: Chunming Zhou 
---
 amdgpu/amdgpu-symbol-check |  3 ++
 amdgpu/amdgpu.h| 51 
 amdgpu/amdgpu_cs.c | 68 ++
 3 files changed, 122 insertions(+)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index 67ba3039..0cc54e5e 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -51,10 +51,13 @@ amdgpu_cs_submit
 amdgpu_cs_submit_raw
 amdgpu_cs_submit_raw2
 amdgpu_cs_syncobj_export_sync_file
+amdgpu_cs_syncobj_export_sync_file2
 amdgpu_cs_syncobj_import_sync_file
+amdgpu_cs_syncobj_import_sync_file2
 amdgpu_cs_syncobj_query
 amdgpu_cs_syncobj_reset
 amdgpu_cs_syncobj_signal
+amdgpu_cs_syncobj_timeline_signal
 amdgpu_cs_syncobj_timeline_wait
 amdgpu_cs_syncobj_wait
 amdgpu_cs_wait_fences
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index dcf662e9..b5bd3ed9 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1501,6 +1501,23 @@ int amdgpu_cs_syncobj_reset(amdgpu_device_handle dev,
 int amdgpu_cs_syncobj_signal(amdgpu_device_handle dev,
 const uint32_t *syncobjs, uint32_t syncobj_count);
 
+/**
+ * Signal kernel timeline sync objects.
+ *
+ * \param dev   - \c [in] device handle
+ * \param syncobjs  - \c [in] array of sync object handles
+ * \param points   - \c [in] array of timeline points
+ * \param syncobj_count - \c [in] number of handles in syncobjs
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_syncobj_timeline_signal(amdgpu_device_handle dev,
+ const uint32_t *syncobjs,
+ uint64_t *points,
+ uint32_t syncobj_count);
+
 /**
  *  Wait for one or all sync objects to signal.
  *
@@ -1618,7 +1635,41 @@ int 
amdgpu_cs_syncobj_export_sync_file(amdgpu_device_handle dev,
 int amdgpu_cs_syncobj_import_sync_file(amdgpu_device_handle dev,
   uint32_t syncobj,
   int sync_file_fd);
+/**
+ *  Export kernel timeline sync object to a sync_file.
+ *
+ * \param   dev- \c [in] device handle
+ * \param   syncobj- \c [in] sync object handle
+ * \param   point  - \c [in] timeline point
+ * \param   flags  - \c [in] flags
+ * \param   sync_file_fd - \c [out] sync_file file descriptor.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_export_sync_file2(amdgpu_device_handle dev,
+   uint32_t syncobj,
+   uint64_t point,
+   uint32_t flags,
+   int *sync_file_fd);
 
+/**
+ *  Import kernel timeline sync object from a sync_file.
+ *
+ * \param   dev- \c [in] device handle
+ * \param   syncobj- \c [in] sync object handle
+ * \param   point  - \c [in] timeline point
+ * \param   sync_file_fd - \c [in] sync_file file descriptor.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_import_sync_file2(amdgpu_device_handle dev,
+   uint32_t syncobj,
+   uint64_t point,
+   int sync_file_fd);
 /**
  * Export an amdgpu fence as a handle (syncobj or fd).
  *
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index b8b0d566..1c02d16f 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -649,6 +649,18 @@ drm_public int 
amdgpu_cs_syncobj_signal(amdgpu_device_handle dev,
return drmSyncobjSignal(dev->fd, syncobjs, syncobj_count);
 }
 
+drm_public int amdgpu_cs_syncobj_timeline_signal(amdgpu_device_handle dev,
+const uint32_t *syncobjs,
+uint64_t *points,
+uint32_t syncobj_count)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjTimelineSignal(dev->fd, syncobjs,
+   points, syncobj_count);
+}
+
 drm_public int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,
  uint32_t *handles, unsigned num_handles,
  int64_t timeout_nsec, unsigned flags,
@@ -724,6 +736,62 @@ drm_public int 
amdgpu_cs_syncobj_import_sync_file(amdgpu_device_handle dev,
return drmSyncobjImportSyncFile(dev->fd, syncobj, sync_file_fd);
 }
 
+drm_public int amdgpu_cs_syncobj_export_sync_file2(amdgpu_device_handle dev,
+  uint32_t syncobj,
+  uint64_t point,
+  

[PATCH libdrm 4/8] wrap syncobj timeline query/wait APIs for amdgpu v3

2019-04-09 Thread Chunming Zhou
v2: symbols are stored in lexical order.
v3: drop export/import and extra query indirection

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
---
 amdgpu/amdgpu-symbol-check |  2 ++
 amdgpu/amdgpu.h| 39 ++
 amdgpu/amdgpu_cs.c | 23 ++
 3 files changed, 64 insertions(+)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index 96a44b40..67ba3039 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -52,8 +52,10 @@ amdgpu_cs_submit_raw
 amdgpu_cs_submit_raw2
 amdgpu_cs_syncobj_export_sync_file
 amdgpu_cs_syncobj_import_sync_file
+amdgpu_cs_syncobj_query
 amdgpu_cs_syncobj_reset
 amdgpu_cs_syncobj_signal
+amdgpu_cs_syncobj_timeline_wait
 amdgpu_cs_syncobj_wait
 amdgpu_cs_wait_fences
 amdgpu_cs_wait_semaphore
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index d6de3b8d..dcf662e9 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1521,6 +1521,45 @@ int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,
   int64_t timeout_nsec, unsigned flags,
   uint32_t *first_signaled);
 
+/**
+ *  Wait for one or all sync objects on their points to signal.
+ *
+ * \param   dev- \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   points - \c [in] array of sync points to wait
+ * \param   num_handles - \c [in] self-explanatory
+ * \param   timeout_nsec - \c [in] self-explanatory
+ * \param   flags   - \c [in] a bitmask of DRM_SYNCOBJ_WAIT_FLAGS_*
+ * \param   first_signaled - \c [in] self-explanatory
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle dev,
+   uint32_t *handles, uint64_t *points,
+   unsigned num_handles,
+   int64_t timeout_nsec, unsigned flags,
+   uint32_t *first_signaled);
+/**
+ *  Query sync objects payloads.
+ *
+ * \param   dev- \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   points - \c [out] array of sync points returned, which presents
+ * syncobj payload.
+ * \param   num_handles - \c [in] self-explanatory
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
+   uint32_t *handles, uint64_t *points,
+   unsigned num_handles);
+
 /**
  *  Export kernel sync object to shareable fd.
  *
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 5bedf748..b8b0d566 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -661,6 +661,29 @@ drm_public int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,
  flags, first_signaled);
 }
 
+drm_public int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle dev,
+  uint32_t *handles, uint64_t *points,
+  unsigned num_handles,
+  int64_t timeout_nsec, unsigned flags,
+  uint32_t *first_signaled)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjTimelineWait(dev->fd, handles, points, num_handles,
+ timeout_nsec, flags, first_signaled);
+}
+
+drm_public int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
+  uint32_t *handles, uint64_t *points,
+  unsigned num_handles)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjQuery(dev->fd, handles, points, num_handles);
+}
+
 drm_public int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
uint32_t handle,
int *shared_fd)
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
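
A note for readers of the wait/query wrappers in this patch: the sketch below is a small user-space model, not libdrm code. It assumes a timeline point counts as signaled once the queried payload has reached it, and it mirrors the "wait all" versus "wait any" behaviour and the -ETIME return documented above. The names model_point_signaled and model_timeline_check are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed rule: a timeline point counts as signaled once the payload
 * returned by a query has reached or passed it. */
static bool model_point_signaled(uint64_t payload, uint64_t point)
{
	return payload >= point;
}

/*
 * Hypothetical model of a timeline wait whose timeout has already
 * expired: returns 0 if the wait condition holds for the given
 * payloads, -ETIME otherwise. It never blocks and never calls into DRM.
 */
static int model_timeline_check(const uint64_t *payloads,
				const uint64_t *points,
				unsigned num_handles, bool wait_all,
				uint32_t *first_signaled)
{
	unsigned i;

	assert(payloads != NULL && points != NULL);

	if (wait_all) {
		/* Every handle must have reached its point. */
		for (i = 0; i < num_handles; i++)
			if (!model_point_signaled(payloads[i], points[i]))
				return -ETIME;
		return 0;
	}

	/* Wait-any: report the index of the first signaled handle. */
	for (i = 0; i < num_handles; i++) {
		if (model_point_signaled(payloads[i], points[i])) {
			if (first_signaled)
				*first_signaled = i;
			return 0;
		}
	}
	return -ETIME;
}
```

In this model first_signaled is only written in the wait-any case, which matches the "only valid when not waiting all" comment on the ioctl struct.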

[PATCH libdrm 8/8] add syncobj timeline tests v3

2019-04-09 Thread Chunming Zhou
v2: drop DRM_SYNCOBJ_CREATE_TYPE_TIMELINE, fix timeout calculation,
fix some warnings
v3: add export/import and cpu signal testing cases

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
---
 tests/amdgpu/Makefile.am |   3 +-
 tests/amdgpu/amdgpu_test.c   |  11 ++
 tests/amdgpu/amdgpu_test.h   |  21 +++
 tests/amdgpu/meson.build |   2 +-
 tests/amdgpu/syncobj_tests.c | 290 +++
 5 files changed, 325 insertions(+), 2 deletions(-)
 create mode 100644 tests/amdgpu/syncobj_tests.c

diff --git a/tests/amdgpu/Makefile.am b/tests/amdgpu/Makefile.am
index 48278848..920882d0 100644
--- a/tests/amdgpu/Makefile.am
+++ b/tests/amdgpu/Makefile.am
@@ -34,4 +34,5 @@ amdgpu_test_SOURCES = \
uve_ib.h \
deadlock_tests.c \
vm_tests.c  \
-   ras_tests.c
+   ras_tests.c \
+   syncobj_tests.c
diff --git a/tests/amdgpu/amdgpu_test.c b/tests/amdgpu/amdgpu_test.c
index 8fc7a0b9..214c7fce 100644
--- a/tests/amdgpu/amdgpu_test.c
+++ b/tests/amdgpu/amdgpu_test.c
@@ -57,6 +57,7 @@
 #define DEADLOCK_TESTS_STR "Deadlock Tests"
 #define VM_TESTS_STR "VM Tests"
 #define RAS_TESTS_STR "RAS Tests"
+#define SYNCOBJ_TIMELINE_TESTS_STR "SYNCOBJ TIMELINE Tests"
 
 /**
  *  Open handles for amdgpu devices
@@ -123,6 +124,12 @@ static CU_SuiteInfo suites[] = {
.pCleanupFunc = suite_ras_tests_clean,
.pTests = ras_tests,
},
+   {
+   .pName = SYNCOBJ_TIMELINE_TESTS_STR,
+   .pInitFunc = suite_syncobj_timeline_tests_init,
+   .pCleanupFunc = suite_syncobj_timeline_tests_clean,
+   .pTests = syncobj_timeline_tests,
+   },
 
CU_SUITE_INFO_NULL,
 };
@@ -176,6 +183,10 @@ static Suites_Active_Status suites_active_stat[] = {
.pName = RAS_TESTS_STR,
.pActive = suite_ras_tests_enable,
},
+   {
+   .pName = SYNCOBJ_TIMELINE_TESTS_STR,
+   .pActive = suite_syncobj_timeline_tests_enable,
+   },
 };
 
 
diff --git a/tests/amdgpu/amdgpu_test.h b/tests/amdgpu/amdgpu_test.h
index bcd0bc7e..36675ea3 100644
--- a/tests/amdgpu/amdgpu_test.h
+++ b/tests/amdgpu/amdgpu_test.h
@@ -216,6 +216,27 @@ CU_BOOL suite_ras_tests_enable(void);
 extern CU_TestInfo ras_tests[];
 
 
+/**
+ * Initialize syncobj timeline test suite
+ */
+int suite_syncobj_timeline_tests_init();
+
+/**
+ * Deinitialize syncobj timeline test suite
+ */
+int suite_syncobj_timeline_tests_clean();
+
+/**
+ * Decide if the suite is enabled by default or not.
+ */
+CU_BOOL suite_syncobj_timeline_tests_enable(void);
+
+/**
+ * Tests in syncobj timeline test suite
+ */
+extern CU_TestInfo syncobj_timeline_tests[];
+
+
 /**
  * Helper functions
  */
diff --git a/tests/amdgpu/meson.build b/tests/amdgpu/meson.build
index 95ed9305..1726cb43 100644
--- a/tests/amdgpu/meson.build
+++ b/tests/amdgpu/meson.build
@@ -24,7 +24,7 @@ if dep_cunit.found()
 files(
   'amdgpu_test.c', 'basic_tests.c', 'bo_tests.c', 'cs_tests.c',
   'vce_tests.c', 'uvd_enc_tests.c', 'vcn_tests.c', 'deadlock_tests.c',
-  'vm_tests.c', 'ras_tests.c',
+  'vm_tests.c', 'ras_tests.c', 'syncobj_tests.c',
 ),
 dependencies : [dep_cunit, dep_threads],
 include_directories : [inc_root, inc_drm, include_directories('../../amdgpu')],
diff --git a/tests/amdgpu/syncobj_tests.c b/tests/amdgpu/syncobj_tests.c
new file mode 100644
index ..a0c627d7
--- /dev/null
+++ b/tests/amdgpu/syncobj_tests.c
@@ -0,0 +1,290 @@
+/*
+ * Copyright 2017 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+*/
+
+#include "CUnit/Basic.h"
+
+#include "amdgpu_test.h"
+#include "amdgpu_drm.h"
+#include "amdgpu_internal.h"
+#include 
+
+static  amdgpu_device_hand

[PATCH libdrm 1/8] new syncobj extension v3

2019-04-09 Thread Chunming Zhou
v2: drop not implemented IOCTLs and flags
v3: add transfer/signal ioctls

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
---
 include/drm/drm.h | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/include/drm/drm.h b/include/drm/drm.h
index 85c685a2..26f51bca 100644
--- a/include/drm/drm.h
+++ b/include/drm/drm.h
@@ -729,8 +729,18 @@ struct drm_syncobj_handle {
__u32 pad;
 };
 
+struct drm_syncobj_transfer {
+__u32 src_handle;
+__u32 dst_handle;
+__u64 src_point;
+__u64 dst_point;
+__u32 flags;
+__u32 pad;
+};
+
 #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0)
 #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1)
+#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE (1 << 2)
 struct drm_syncobj_wait {
__u64 handles;
/* absolute timeout */
@@ -741,12 +751,31 @@ struct drm_syncobj_wait {
__u32 pad;
 };
 
+struct drm_syncobj_timeline_wait {
+__u64 handles;
+/* wait on a specific timeline point for every handle */
+__u64 points;
+/* absolute timeout */
+__s64 timeout_nsec;
+__u32 count_handles;
+__u32 flags;
+__u32 first_signaled; /* only valid when not waiting all */
+__u32 pad;
+};
+
 struct drm_syncobj_array {
__u64 handles;
__u32 count_handles;
__u32 pad;
 };
 
+struct drm_syncobj_timeline_array {
+__u64 handles;
+__u64 points;
+__u32 count_handles;
+__u32 pad;
+};
+
 /* Query current scanout sequence number */
 struct drm_crtc_get_sequence {
__u32 crtc_id;  /* requested crtc_id */
@@ -903,6 +932,12 @@ extern "C" {
 #define DRM_IOCTL_MODE_GET_LEASE   DRM_IOWR(0xC8, struct drm_mode_get_lease)
 #define DRM_IOCTL_MODE_REVOKE_LEASE   DRM_IOWR(0xC9, struct drm_mode_revoke_lease)
 
+#define DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT DRM_IOWR(0xCA, struct drm_syncobj_timeline_wait)
+#define DRM_IOCTL_SYNCOBJ_QUERY DRM_IOWR(0xCB, struct drm_syncobj_timeline_array)
+#define DRM_IOCTL_SYNCOBJ_TRANSFER DRM_IOWR(0xCC, struct drm_syncobj_transfer)
+#define DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL   DRM_IOWR(0xCD, struct drm_syncobj_timeline_array)
+
+
 /**
  * Device specific ioctls should only be in their respective headers
  * The device specific ioctl range is from 0x40 to 0x9f.
-- 
2.17.1
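
For anyone checking the ABI of the new argument structs: the sketch below mirrors drm_syncobj_timeline_array with <stdint.h> types so it builds without the DRM headers. The model_ names are hypothetical. It only illustrates two points, that the explicit pad keeps the struct at 24 bytes regardless of 64-bit alignment rules, and that user pointers are carried in 64-bit fields through a uintptr_t cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the struct from the patch, with uint32_t/uint64_t
 * standing in for the kernel's __u32/__u64. */
struct model_syncobj_timeline_array {
	uint64_t handles;	/* user pointer to an array of uint32_t handles */
	uint64_t points;	/* user pointer to an array of uint64_t points */
	uint32_t count_handles;
	uint32_t pad;		/* keeps the size a multiple of 8 bytes */
};

/* 8 + 8 + 4 + 4: no implicit padding is needed on 32-bit or 64-bit ABIs. */
_Static_assert(sizeof(struct model_syncobj_timeline_array) == 24,
	       "unexpected struct size");
_Static_assert(offsetof(struct model_syncobj_timeline_array, count_handles) == 16,
	       "unexpected field offset");

/* Hypothetical helper: zero the struct, then pack the pointers the same
 * way the libdrm wrappers in this series do. */
static struct model_syncobj_timeline_array
model_fill_timeline_array(const uint32_t *handles, const uint64_t *points,
			  uint32_t count)
{
	struct model_syncobj_timeline_array args;

	memset(&args, 0, sizeof(args));
	args.handles = (uint64_t)(uintptr_t)handles;
	args.points = (uint64_t)(uintptr_t)points;
	args.count_handles = count;
	return args;
}
```

Zeroing the whole struct first also clears pad, so no uninitialized stack bytes would be handed to an ioctl.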


Re: [PATCH] drm/amdgpu: fix old fence check in amdgpu_fence_emit

2019-04-01 Thread Chunming Zhou

On 2019/4/1 22:07, Koenig, Christian wrote:
> On 01.04.19 at 16:04, Zhou, David(ChunMing) wrote:
>> On 2019/4/1 21:05, Christian König wrote:
>>> On 01.04.19 at 04:54, Zhou, David(ChunMing) wrote:
>>>>> -Original Message-
>>>>> From: amd-gfx  On Behalf Of
>>>>> Christian König
>>>>> Sent: Saturday, March 30, 2019 2:33 AM
>>>>> To: amd-gfx@lists.freedesktop.org
>>>>> Subject: [PATCH] drm/amdgpu: fix old fence check in amdgpu_fence_emit
>>>>>
>>>>> We don't hold a reference to the old fence, so it can go away any
>>>>> time we are
>>>>> waiting for it to signal.
>>>>>
>>>>> Signed-off-by: Christian König 
>>>>> ---
>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 24 -
>>>>> --
>>>>>     1 file changed, 17 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> index ee47c11e92ce..4dee2326b29c 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> @@ -136,8 +136,9 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring,
>>>>> struct dma_fence **f,  {
>>>>>     struct amdgpu_device *adev = ring->adev;
>>>>>     struct amdgpu_fence *fence;
>>>>> -    struct dma_fence *old, **ptr;
>>>>> +    struct dma_fence __rcu **ptr;
>>>>>     uint32_t seq;
>>>>> +    int r;
>>>>>
>>>>>     fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
>>>>>     if (fence == NULL)
>>>>> @@ -153,15 +154,24 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring,
>>>>> struct dma_fence **f,
>>>>>    seq, flags | AMDGPU_FENCE_FLAG_INT);
>>>>>
>>>>>     ptr = &ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask];
>>>>> +    if (unlikely(rcu_dereference_protected(*ptr, 1))) {
>>>> Isn't this line redundant with dma_fence_get_rcu_safe? I think it's
>>>> unnecessary.
>>>> Otherwise looks ok to me.
>>> The key point is lock()+dma_fence_get_rcu_safe(ptr)+unlock() is rather
>>> expensive for something which is really unlikely.
>>>
>>> So we check here if we already see the variable as NULL and if that is
>>> true, then we can just skip the whole expensive dance.
>> But that is the most unlikely case, isn't it? ptr seems to be NULL only
>> before the first fence is emitted.
> No, the pointer is set to NULL when the fence is processed. See
> amdgpu_fence_process.

Yeah, I see that RCU__INIT is done again for every signaled fence.

Sorry for the noise, the patch is Reviewed-by: Chunming Zhou


-David

>
> Christian.
>
>>
>> -David
>>
>>> Christian.
>>>
>>>> -David
>>>>> +    struct dma_fence *old;
>>>>> +
>>>>> +    rcu_read_lock();
>>>>> +    old = dma_fence_get_rcu_safe(ptr);
>>>>> +    rcu_read_unlock();
>>>>> +
>>>>> +    if (old) {
>>>>> +    r = dma_fence_wait(old, false);
>>>>> +    dma_fence_put(old);
>>>>> +    if (r)
>>>>> +    return r;
>>>>> +    }
>>>>> +    }
>>>>> +
>>>>>     /* This function can't be called concurrently anyway, otherwise
>>>>>      * emitting the fence would mess up the hardware ring buffer.
>>>>>      */
>>>>> -    old = rcu_dereference_protected(*ptr, 1);
>>>>> -    if (old && !dma_fence_is_signaled(old)) {
>>>>> -    DRM_INFO("rcu slot is busy\n");
>>>>> -    dma_fence_wait(old, false);
>>>>> -    }
>>>>> -
>>>>>     rcu_assign_pointer(*ptr, dma_fence_get(&fence->base));
>>>>>
>>>>>     *f = &fence->base;
>>>>> -- 
>>>>> 2.17.1
>>>>>
>>>>> ___
>>>>> amd-gfx mailing list
>>>>> amd-gfx@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
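
To illustrate the point made in this thread about skipping the expensive path: below is a plain user-space model with no RCU and no kernel API. It assumes a power-of-two slot array indexed by seq & mask, where a slot is reset to NULL once its fence has been processed. All model_ names are hypothetical, and model_slow_lookup merely stands in for the lock, reference grab and unlock sequence.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_NUM_SLOTS 4u	/* power of two, so (NUM - 1) works as a mask */

struct model_ring {
	void *slots[MODEL_NUM_SLOTS];	/* NULL means the slot is free */
	unsigned slow_path_taken;	/* counts the expensive lookups */
};

/* Stand-in for the expensive lock + get-reference + unlock sequence. */
static void *model_slow_lookup(struct model_ring *ring, uint32_t idx)
{
	ring->slow_path_taken++;
	return ring->slots[idx];
}

/* Returns the old occupant of the slot that seq maps to, or NULL. */
static void *model_old_fence(struct model_ring *ring, uint32_t seq)
{
	uint32_t idx = seq & (MODEL_NUM_SLOTS - 1u);

	/* Cheap check first: if the slot was already cleared, skip
	 * the slow path entirely. */
	if (ring->slots[idx] == NULL)
		return NULL;

	return model_slow_lookup(ring, idx);
}
```

If slots are normally cleared before their sequence number wraps around, the slow path in this model is rarely taken, which is the trade-off being argued for above.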

Re: [PATCH] drm/amdgpu: fix old fence check in amdgpu_fence_emit

2019-04-01 Thread Chunming Zhou

On 2019/4/1 21:05, Christian König wrote:
> On 01.04.19 at 04:54, Zhou, David(ChunMing) wrote:
>>
>>> -Original Message-
>>> From: amd-gfx  On Behalf Of
>>> Christian König
>>> Sent: Saturday, March 30, 2019 2:33 AM
>>> To: amd-gfx@lists.freedesktop.org
>>> Subject: [PATCH] drm/amdgpu: fix old fence check in amdgpu_fence_emit
>>>
>>> We don't hold a reference to the old fence, so it can go away any 
>>> time we are
>>> waiting for it to signal.
>>>
>>> Signed-off-by: Christian König 
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 24 -
>>> --
>>>   1 file changed, 17 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> index ee47c11e92ce..4dee2326b29c 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> @@ -136,8 +136,9 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring,
>>> struct dma_fence **f,  {
>>>   struct amdgpu_device *adev = ring->adev;
>>>   struct amdgpu_fence *fence;
>>> -    struct dma_fence *old, **ptr;
>>> +    struct dma_fence __rcu **ptr;
>>>   uint32_t seq;
>>> +    int r;
>>>
>>>   fence = kmem_cache_alloc(amdgpu_fence_slab, GFP_KERNEL);
>>>   if (fence == NULL)
>>> @@ -153,15 +154,24 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring,
>>> struct dma_fence **f,
>>>  seq, flags | AMDGPU_FENCE_FLAG_INT);
>>>
>>>   ptr = &ring->fence_drv.fences[seq & ring->fence_drv.num_fences_mask];
>>> +    if (unlikely(rcu_dereference_protected(*ptr, 1))) {
>> Isn't this line redundant with dma_fence_get_rcu_safe? I think it's 
>> unnecessary.
>> Otherwise looks ok to me.
>
> The key point is lock()+dma_fence_get_rcu_safe(ptr)+unlock() is rather 
> expensive for something which is really unlikely.
>
> So we check here if we already see the variable as NULL and if that is 
> true, then we can just skip the whole expensive dance.

But that is the most unlikely case, isn't it? ptr seems to be NULL only
before the first fence is emitted.


-David

>
> Christian.
>
>>
>> -David
>>> +    struct dma_fence *old;
>>> +
>>> +    rcu_read_lock();
>>> +    old = dma_fence_get_rcu_safe(ptr);
>>> +    rcu_read_unlock();
>>> +
>>> +    if (old) {
>>> +    r = dma_fence_wait(old, false);
>>> +    dma_fence_put(old);
>>> +    if (r)
>>> +    return r;
>>> +    }
>>> +    }
>>> +
>>>   /* This function can't be called concurrently anyway, otherwise
>>>    * emitting the fence would mess up the hardware ring buffer.
>>>    */
>>> -    old = rcu_dereference_protected(*ptr, 1);
>>> -    if (old && !dma_fence_is_signaled(old)) {
>>> -    DRM_INFO("rcu slot is busy\n");
>>> -    dma_fence_wait(old, false);
>>> -    }
>>> -
>>>   rcu_assign_pointer(*ptr, dma_fence_get(&fence->base));
>>>
>>>   *f = &fence->base;
>>> -- 
>>> 2.17.1
>>>
>>> ___
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>

[PATCH libdrm 4/8] wrap syncobj timeline query/wait APIs for amdgpu v3

2019-04-01 Thread Chunming Zhou
v2: symbols are stored in lexical order.
v3: drop export/import and extra query indirection

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
---
 amdgpu/amdgpu-symbol-check |  2 ++
 amdgpu/amdgpu.h| 39 ++
 amdgpu/amdgpu_cs.c | 23 ++
 3 files changed, 64 insertions(+)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index 96a44b40..67ba3039 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -52,8 +52,10 @@ amdgpu_cs_submit_raw
 amdgpu_cs_submit_raw2
 amdgpu_cs_syncobj_export_sync_file
 amdgpu_cs_syncobj_import_sync_file
+amdgpu_cs_syncobj_query
 amdgpu_cs_syncobj_reset
 amdgpu_cs_syncobj_signal
+amdgpu_cs_syncobj_timeline_wait
 amdgpu_cs_syncobj_wait
 amdgpu_cs_wait_fences
 amdgpu_cs_wait_semaphore
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index d6de3b8d..dcf662e9 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1521,6 +1521,45 @@ int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,
   int64_t timeout_nsec, unsigned flags,
   uint32_t *first_signaled);
 
+/**
+ *  Wait for one or all sync objects on their points to signal.
+ *
+ * \param   dev- \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   points - \c [in] array of sync points to wait
+ * \param   num_handles - \c [in] self-explanatory
+ * \param   timeout_nsec - \c [in] self-explanatory
+ * \param   flags   - \c [in] a bitmask of DRM_SYNCOBJ_WAIT_FLAGS_*
+ * \param   first_signaled - \c [in] self-explanatory
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle dev,
+   uint32_t *handles, uint64_t *points,
+   unsigned num_handles,
+   int64_t timeout_nsec, unsigned flags,
+   uint32_t *first_signaled);
+/**
+ *  Query sync objects payloads.
+ *
+ * \param   dev- \c [in] self-explanatory
+ * \param   handles - \c [in] array of sync object handles
+ * \param   points - \c [out] array of sync points returned, which presents
+ * syncobj payload.
+ * \param   num_handles - \c [in] self-explanatory
+ *
+ * \return   0 on success\n
+ *  -ETIME - Timeout
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
+   uint32_t *handles, uint64_t *points,
+   unsigned num_handles);
+
 /**
  *  Export kernel sync object to shareable fd.
  *
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 5bedf748..b8b0d566 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -661,6 +661,29 @@ drm_public int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,
  flags, first_signaled);
 }
 
+drm_public int amdgpu_cs_syncobj_timeline_wait(amdgpu_device_handle dev,
+  uint32_t *handles, uint64_t *points,
+  unsigned num_handles,
+  int64_t timeout_nsec, unsigned flags,
+  uint32_t *first_signaled)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjTimelineWait(dev->fd, handles, points, num_handles,
+ timeout_nsec, flags, first_signaled);
+}
+
+drm_public int amdgpu_cs_syncobj_query(amdgpu_device_handle dev,
+  uint32_t *handles, uint64_t *points,
+  unsigned num_handles)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjQuery(dev->fd, handles, points, num_handles);
+}
+
 drm_public int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
uint32_t handle,
int *shared_fd)
-- 
2.17.1


[PATCH libdrm 8/8] add syncobj timeline tests v3

2019-04-01 Thread Chunming Zhou
v2: drop DRM_SYNCOBJ_CREATE_TYPE_TIMELINE, fix timeout calculation,
fix some warnings
v3: add export/import and cpu signal testing cases

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
---
 tests/amdgpu/Makefile.am |   3 +-
 tests/amdgpu/amdgpu_test.c   |  12 ++
 tests/amdgpu/amdgpu_test.h   |  21 +++
 tests/amdgpu/meson.build |   2 +-
 tests/amdgpu/syncobj_tests.c | 290 +++
 5 files changed, 326 insertions(+), 2 deletions(-)
 create mode 100644 tests/amdgpu/syncobj_tests.c

diff --git a/tests/amdgpu/Makefile.am b/tests/amdgpu/Makefile.am
index 447ff217..d3fbe2bb 100644
--- a/tests/amdgpu/Makefile.am
+++ b/tests/amdgpu/Makefile.am
@@ -33,4 +33,5 @@ amdgpu_test_SOURCES = \
vcn_tests.c \
uve_ib.h \
deadlock_tests.c \
-   vm_tests.c
+   vm_tests.c \
+   syncobj_tests.c
diff --git a/tests/amdgpu/amdgpu_test.c b/tests/amdgpu/amdgpu_test.c
index a793ca7d..4377988d 100644
--- a/tests/amdgpu/amdgpu_test.c
+++ b/tests/amdgpu/amdgpu_test.c
@@ -56,6 +56,7 @@
 #define UVD_ENC_TESTS_STR "UVD ENC Tests"
 #define DEADLOCK_TESTS_STR "Deadlock Tests"
 #define VM_TESTS_STR "VM Tests"
+#define SYNCOBJ_TIMELINE_TESTS_STR "SYNCOBJ TIMELINE Tests"
 
 /**
  *  Open handles for amdgpu devices
@@ -116,6 +117,12 @@ static CU_SuiteInfo suites[] = {
.pCleanupFunc = suite_vm_tests_clean,
.pTests = vm_tests,
},
+   {
+   .pName = SYNCOBJ_TIMELINE_TESTS_STR,
+   .pInitFunc = suite_syncobj_timeline_tests_init,
+   .pCleanupFunc = suite_syncobj_timeline_tests_clean,
+   .pTests = syncobj_timeline_tests,
+   },
 
CU_SUITE_INFO_NULL,
 };
@@ -165,6 +172,11 @@ static Suites_Active_Status suites_active_stat[] = {
.pName = VM_TESTS_STR,
.pActive = suite_vm_tests_enable,
},
+   {
+   .pName = SYNCOBJ_TIMELINE_TESTS_STR,
+   .pActive = suite_syncobj_timeline_tests_enable,
+   },
+
 };
 
 
diff --git a/tests/amdgpu/amdgpu_test.h b/tests/amdgpu/amdgpu_test.h
index af81eea8..24d64b64 100644
--- a/tests/amdgpu/amdgpu_test.h
+++ b/tests/amdgpu/amdgpu_test.h
@@ -194,6 +194,27 @@ CU_BOOL suite_vm_tests_enable(void);
  */
 extern CU_TestInfo vm_tests[];
 
+/**
+ * Initialize syncobj timeline test suite
+ */
+int suite_syncobj_timeline_tests_init();
+
+/**
+ * Deinitialize syncobj timeline test suite
+ */
+int suite_syncobj_timeline_tests_clean();
+
+/**
+ * Decide if the suite is enabled by default or not.
+ */
+CU_BOOL suite_syncobj_timeline_tests_enable(void);
+
+/**
+ * Tests in syncobj timeline test suite
+ */
+extern CU_TestInfo syncobj_timeline_tests[];
+
+
 /**
  * Helper functions
  */
diff --git a/tests/amdgpu/meson.build b/tests/amdgpu/meson.build
index 4c1237c6..3ceec715 100644
--- a/tests/amdgpu/meson.build
+++ b/tests/amdgpu/meson.build
@@ -24,7 +24,7 @@ if dep_cunit.found()
 files(
   'amdgpu_test.c', 'basic_tests.c', 'bo_tests.c', 'cs_tests.c',
   'vce_tests.c', 'uvd_enc_tests.c', 'vcn_tests.c', 'deadlock_tests.c',
-  'vm_tests.c',
+  'vm_tests.c', 'syncobj_tests.c',
 ),
 dependencies : [dep_cunit, dep_threads],
 include_directories : [inc_root, inc_drm, include_directories('../../amdgpu')],
diff --git a/tests/amdgpu/syncobj_tests.c b/tests/amdgpu/syncobj_tests.c
new file mode 100644
index ..a0c627d7
--- /dev/null
+++ b/tests/amdgpu/syncobj_tests.c
@@ -0,0 +1,290 @@
+/*
+ * Copyright 2017 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+*/
+
+#include "CUnit/Basic.h"
+
+#include "amdgpu_test.h"
+#include "amdgpu_drm.h"
+#include "amdgpu_internal.h"
+#include 
+
+static  amdgpu_device_handle device_handle;
+static  ui

[PATCH libdrm 5/8] add timeline signal/transfer ioctls v2

2019-04-01 Thread Chunming Zhou
v2: use one transfer ioctl

Signed-off-by: Chunming Zhou 
---
 xf86drm.c | 33 +
 xf86drm.h |  6 ++
 2 files changed, 39 insertions(+)

diff --git a/xf86drm.c b/xf86drm.c
index 66e0c985..d57c4218 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -4280,6 +4280,21 @@ drm_public int drmSyncobjSignal(int fd, const uint32_t *handles,
 return ret;
 }
 
+drm_public int drmSyncobjTimelineSignal(int fd, const uint32_t *handles,
+   uint64_t *points, uint32_t handle_count)
+{
+struct drm_syncobj_timeline_array args;
+int ret;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uint64_t)(uintptr_t)points;
+args.count_handles = handle_count;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, &args);
+return ret;
+}
+
 drm_public int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t *points,
  unsigned num_handles,
  int64_t timeout_nsec, unsigned flags,
@@ -4322,4 +4337,22 @@ drm_public int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
 return 0;
 }
 
+drm_public int drmSyncobjTransfer(int fd,
+ uint32_t dst_handle, uint64_t dst_point,
+ uint32_t src_handle, uint64_t src_point,
+ uint32_t flags)
+{
+struct drm_syncobj_transfer args;
+int ret;
+
+memclear(args);
+args.src_handle = src_handle;
+args.dst_handle = dst_handle;
+args.src_point = src_point;
+args.dst_point = dst_point;
+args.flags = flags;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_TRANSFER, &args);
 
+return ret;
+}
diff --git a/xf86drm.h b/xf86drm.h
index 60c7a84f..3fb1d1ca 100644
--- a/xf86drm.h
+++ b/xf86drm.h
@@ -876,12 +876,18 @@ extern int drmSyncobjWait(int fd, uint32_t *handles, unsigned num_handles,
  uint32_t *first_signaled);
 extern int drmSyncobjReset(int fd, const uint32_t *handles, uint32_t handle_count);
 extern int drmSyncobjSignal(int fd, const uint32_t *handles, uint32_t handle_count);
+extern int drmSyncobjTimelineSignal(int fd, const uint32_t *handles,
+   uint64_t *points, uint32_t handle_count);
 extern int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t *points,
  unsigned num_handles,
  int64_t timeout_nsec, unsigned flags,
  uint32_t *first_signaled);
 extern int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
   uint32_t handle_count);
+extern int drmSyncobjTransfer(int fd,
+ uint32_t dst_handle, uint64_t dst_point,
+ uint32_t src_handle, uint64_t src_point,
+ uint32_t flags);
 
 #if defined(__cplusplus)
 }
-- 
2.17.1
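
One detail of drmSyncobjTransfer worth spelling out: the function takes the destination pair first, while struct drm_syncobj_transfer lists the source fields first. The sketch below is a stand-alone model using <stdint.h> types rather than the DRM headers, and the model_ names are hypothetical. It only shows the field mapping and the zeroed pad, and does not issue any ioctl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct drm_syncobj_transfer from patch 1/8. */
struct model_syncobj_transfer {
	uint32_t src_handle;
	uint32_t dst_handle;
	uint64_t src_point;
	uint64_t dst_point;
	uint32_t flags;
	uint32_t pad;
};

/* 4 + 4 + 8 + 8 + 4 + 4 bytes, with both points naturally aligned. */
_Static_assert(sizeof(struct model_syncobj_transfer) == 32,
	       "unexpected struct size");
_Static_assert(offsetof(struct model_syncobj_transfer, src_point) == 8,
	       "unexpected field offset");

/* Hypothetical helper with the same argument order as the wrapper:
 * destination first, then source. */
static struct model_syncobj_transfer
model_fill_transfer(uint32_t dst_handle, uint64_t dst_point,
		    uint32_t src_handle, uint64_t src_point,
		    uint32_t flags)
{
	struct model_syncobj_transfer args;

	memset(&args, 0, sizeof(args));
	args.src_handle = src_handle;
	args.dst_handle = dst_handle;
	args.src_point = src_point;
	args.dst_point = dst_point;
	args.flags = flags;
	return args;
}
```

Assigning by field name, as the patch does, keeps the mapping correct even though the argument order and the struct order differ.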


[PATCH libdrm 7/8] wrap transfer interfaces

2019-04-01 Thread Chunming Zhou
Signed-off-by: Chunming Zhou 
---
 amdgpu/amdgpu.h| 22 ++
 amdgpu/amdgpu_cs.c | 16 
 2 files changed, 38 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index b5bd3ed9..2350835b 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1670,6 +1670,28 @@ int amdgpu_cs_syncobj_import_sync_file2(amdgpu_device_handle dev,
uint32_t syncobj,
uint64_t point,
int sync_file_fd);
+
+/**
+ *  Transfer between syncobjs.
+ *
+ * \param   dev- \c [in] device handle
+ * \param   dst_handle - \c [in] sync object handle
+ * \param   dst_point  - \c [in] timeline point, 0 means dst is binary
+ * \param   src_handle - \c [in] sync object handle
+ * \param   src_point  - \c [in] timeline point, 0 means src is binary
+ * \param   flags  - \c [in] flags
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_transfer(amdgpu_device_handle dev,
+  uint32_t dst_handle,
+  uint64_t dst_point,
+  uint32_t src_handle,
+  uint64_t src_point,
+  uint32_t flags);
+
 /**
  * Export an amdgpu fence as a handle (syncobj or fd).
  *
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 1c02d16f..a1c1af55 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -792,6 +792,22 @@ out:
return ret;
 }
 
+drm_public int amdgpu_cs_syncobj_transfer(amdgpu_device_handle dev,
+ uint32_t dst_handle,
+ uint64_t dst_point,
+ uint32_t src_handle,
+ uint64_t src_point,
+ uint32_t flags)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjTransfer(dev->fd,
+ dst_handle, dst_point,
+ src_handle, src_point,
+ flags);
+}
+
 drm_public int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
amdgpu_context_handle context,
amdgpu_bo_list_handle bo_list_handle,
-- 
2.17.1
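
The amdgpu wrappers in this series all follow the same shape: reject a NULL device with -EINVAL, otherwise forward dev->fd to the generic helper. The sketch below models that shape without libdrm. struct model_device, model_backend_transfer and model_cs_syncobj_transfer are hypothetical stand-ins, and the backend only records the fd it was given.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the amdgpu device: only the fd matters here. */
struct model_device {
	int fd;
};

/* Records the fd the stub backend was last called with, -1 if never. */
static int model_backend_last_fd = -1;

/* Stub playing the role of the generic transfer helper. */
static int model_backend_transfer(int fd,
				  uint32_t dst_handle, uint64_t dst_point,
				  uint32_t src_handle, uint64_t src_point,
				  uint32_t flags)
{
	(void)dst_handle; (void)dst_point;
	(void)src_handle; (void)src_point; (void)flags;
	model_backend_last_fd = fd;
	return 0;
}

/* Same shape as the wrapper in the patch: validate, then forward. */
static int model_cs_syncobj_transfer(struct model_device *dev,
				     uint32_t dst_handle, uint64_t dst_point,
				     uint32_t src_handle, uint64_t src_point,
				     uint32_t flags)
{
	if (dev == NULL)
		return -EINVAL;

	return model_backend_transfer(dev->fd,
				      dst_handle, dst_point,
				      src_handle, src_point,
				      flags);
}
```

The NULL check means a bad device handle fails before anything reaches the backend, which is all the wrapper adds on top of the generic call.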


[PATCH libdrm 6/8] expose timeline signal/export/import interfaces v2

2019-04-01 Thread Chunming Zhou
v2: adapt to new one transfer ioctl

Signed-off-by: Chunming Zhou 
---
 amdgpu/amdgpu-symbol-check |  3 ++
 amdgpu/amdgpu.h| 51 
 amdgpu/amdgpu_cs.c | 68 ++
 3 files changed, 122 insertions(+)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index 67ba3039..0cc54e5e 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -51,10 +51,13 @@ amdgpu_cs_submit
 amdgpu_cs_submit_raw
 amdgpu_cs_submit_raw2
 amdgpu_cs_syncobj_export_sync_file
+amdgpu_cs_syncobj_export_sync_file2
 amdgpu_cs_syncobj_import_sync_file
+amdgpu_cs_syncobj_import_sync_file2
 amdgpu_cs_syncobj_query
 amdgpu_cs_syncobj_reset
 amdgpu_cs_syncobj_signal
+amdgpu_cs_syncobj_timeline_signal
 amdgpu_cs_syncobj_timeline_wait
 amdgpu_cs_syncobj_wait
 amdgpu_cs_wait_fences
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index dcf662e9..b5bd3ed9 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1501,6 +1501,23 @@ int amdgpu_cs_syncobj_reset(amdgpu_device_handle dev,
 int amdgpu_cs_syncobj_signal(amdgpu_device_handle dev,
 const uint32_t *syncobjs, uint32_t syncobj_count);
 
+/**
+ * Signal kernel timeline sync objects.
+ *
+ * \param dev   - \c [in] device handle
+ * \param syncobjs  - \c [in] array of sync object handles
+ * \param points   - \c [in] array of timeline points
+ * \param syncobj_count - \c [in] number of handles in syncobjs
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_syncobj_timeline_signal(amdgpu_device_handle dev,
+ const uint32_t *syncobjs,
+ uint64_t *points,
+ uint32_t syncobj_count);
+
 /**
  *  Wait for one or all sync objects to signal.
  *
@@ -1618,7 +1635,41 @@ int amdgpu_cs_syncobj_export_sync_file(amdgpu_device_handle dev,
 int amdgpu_cs_syncobj_import_sync_file(amdgpu_device_handle dev,
   uint32_t syncobj,
   int sync_file_fd);
+/**
+ *  Export kernel timeline sync object to a sync_file.
+ *
+ * \param   dev- \c [in] device handle
+ * \param   syncobj- \c [in] sync object handle
+ * \param   point  - \c [in] timeline point
+ * \param   flags  - \c [in] flags
+ * \param   sync_file_fd - \c [out] sync_file file descriptor.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_export_sync_file2(amdgpu_device_handle dev,
+   uint32_t syncobj,
+   uint64_t point,
+   uint32_t flags,
+   int *sync_file_fd);
 
+/**
+ *  Import kernel timeline sync object from a sync_file.
+ *
+ * \param   dev- \c [in] device handle
+ * \param   syncobj- \c [in] sync object handle
+ * \param   point  - \c [in] timeline point
+ * \param   sync_file_fd - \c [in] sync_file file descriptor.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_syncobj_import_sync_file2(amdgpu_device_handle dev,
+   uint32_t syncobj,
+   uint64_t point,
+   int sync_file_fd);
 /**
  * Export an amdgpu fence as a handle (syncobj or fd).
  *
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index b8b0d566..1c02d16f 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -649,6 +649,18 @@ drm_public int amdgpu_cs_syncobj_signal(amdgpu_device_handle dev,
return drmSyncobjSignal(dev->fd, syncobjs, syncobj_count);
 }
 
+drm_public int amdgpu_cs_syncobj_timeline_signal(amdgpu_device_handle dev,
+const uint32_t *syncobjs,
+uint64_t *points,
+uint32_t syncobj_count)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjTimelineSignal(dev->fd, syncobjs,
+   points, syncobj_count);
+}
+
 drm_public int amdgpu_cs_syncobj_wait(amdgpu_device_handle dev,
  uint32_t *handles, unsigned num_handles,
  int64_t timeout_nsec, unsigned flags,
@@ -724,6 +736,62 @@ drm_public int amdgpu_cs_syncobj_import_sync_file(amdgpu_device_handle dev,
return drmSyncobjImportSyncFile(dev->fd, syncobj, sync_file_fd);
 }
 
+drm_public int amdgpu_cs_syncobj_export_sync_file2(amdgpu_device_handle dev,
+  uint32_t syncobj,
+  uint64_t point,
+  

[PATCH libdrm 3/8] add timeline wait/query ioctl v2

2019-04-01 Thread Chunming Zhou
v2: drop export/import

Signed-off-by: Chunming Zhou 
---
 xf86drm.c | 44 
 xf86drm.h |  6 ++
 2 files changed, 50 insertions(+)

diff --git a/xf86drm.c b/xf86drm.c
index 18ad7c58..66e0c985 100644
--- a/xf86drm.c
+++ b/xf86drm.c
@@ -4279,3 +4279,47 @@ drm_public int drmSyncobjSignal(int fd, const uint32_t *handles,
 ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_SIGNAL, &args);
 return ret;
 }
+
+drm_public int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t *points,
+ unsigned num_handles,
+ int64_t timeout_nsec, unsigned flags,
+ uint32_t *first_signaled)
+{
+struct drm_syncobj_timeline_wait args;
+int ret;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uint64_t)(uintptr_t)points;
+args.timeout_nsec = timeout_nsec;
+args.count_handles = num_handles;
+args.flags = flags;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, &args);
+if (ret < 0)
+return -errno;
+
+if (first_signaled)
+*first_signaled = args.first_signaled;
+return ret;
+}
+
+
+drm_public int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
+  uint32_t handle_count)
+{
+struct drm_syncobj_timeline_array args;
+int ret;
+
+memclear(args);
+args.handles = (uintptr_t)handles;
+args.points = (uint64_t)(uintptr_t)points;
+args.count_handles = handle_count;
+
+ret = drmIoctl(fd, DRM_IOCTL_SYNCOBJ_QUERY, &args);
+if (ret)
+return ret;
+return 0;
+}
+
+
diff --git a/xf86drm.h b/xf86drm.h
index 887ecc76..60c7a84f 100644
--- a/xf86drm.h
+++ b/xf86drm.h
@@ -876,6 +876,12 @@ extern int drmSyncobjWait(int fd, uint32_t *handles, unsigned num_handles,
   uint32_t *first_signaled);
 extern int drmSyncobjReset(int fd, const uint32_t *handles, uint32_t handle_count);
 extern int drmSyncobjSignal(int fd, const uint32_t *handles, uint32_t handle_count);
+extern int drmSyncobjTimelineWait(int fd, uint32_t *handles, uint64_t *points,
+ unsigned num_handles,
+ int64_t timeout_nsec, unsigned flags,
+ uint32_t *first_signaled);
+extern int drmSyncobjQuery(int fd, uint32_t *handles, uint64_t *points,
+  uint32_t handle_count);
 
 #if defined(__cplusplus)
 }
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
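
For readers following the wrappers in the patch above: a minimal sketch of how the handle and point arrays end up in the ioctl argument. It assumes a local mirror of struct drm_syncobj_timeline_wait as introduced in patch 1/8; the names timeline_wait_args and fill_timeline_wait_args are made up for illustration and are not part of libdrm. No ioctl is issued here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative mirror of struct drm_syncobj_timeline_wait (patch 1/8). */
struct timeline_wait_args {
    uint64_t handles;        /* user pointer to uint32_t handles[] */
    uint64_t points;         /* user pointer to uint64_t points[] */
    int64_t  timeout_nsec;   /* absolute timeout */
    uint32_t count_handles;
    uint32_t flags;
    uint32_t first_signaled; /* written back by the kernel */
    uint32_t pad;            /* left at zero */
};

/* Hypothetical helper: zero the struct, then store both arrays as 64-bit values. */
void fill_timeline_wait_args(struct timeline_wait_args *args,
                             const uint32_t *handles, const uint64_t *points,
                             uint32_t count, int64_t timeout_nsec,
                             uint32_t flags)
{
    assert(args != NULL);
    memset(args, 0, sizeof(*args));
    /* Casting through uintptr_t keeps the conversion well defined on 32-bit builds. */
    args->handles = (uint64_t)(uintptr_t)handles;
    args->points = (uint64_t)(uintptr_t)points;
    args->timeout_nsec = timeout_nsec;
    args->count_handles = count;
    args->flags = flags;
}
```

The points array is expected to have one entry per handle, so a caller would pass the same count for both.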

[PATCH libdrm 2/8] add cs chunk for syncobj timeline

2019-04-01 Thread Chunming Zhou
Signed-off-by: Chunming Zhou 
---
 include/drm/amdgpu_drm.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
index be84e43c..bfa04dd8 100644
--- a/include/drm/amdgpu_drm.h
+++ b/include/drm/amdgpu_drm.h
@@ -523,6 +523,8 @@ struct drm_amdgpu_gem_va {
 #define AMDGPU_CHUNK_ID_SYNCOBJ_IN  0x04
 #define AMDGPU_CHUNK_ID_SYNCOBJ_OUT 0x05
 #define AMDGPU_CHUNK_ID_BO_HANDLES  0x06
+#define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT    0x08
+#define AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL  0x09
 
 struct drm_amdgpu_cs_chunk {
__u32   chunk_id;
@@ -598,6 +600,13 @@ struct drm_amdgpu_cs_chunk_sem {
__u32 handle;
 };
 
+struct drm_amdgpu_cs_chunk_syncobj {
+   __u32 handle;
+   __u32 flags;
+   __u64 point;
+};
+
+
 #define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ 0
 #define AMDGPU_FENCE_TO_HANDLE_GET_SYNCOBJ_FD  1
 #define AMDGPU_FENCE_TO_HANDLE_GET_SYNC_FILE_FD 2
-- 
2.17.1
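
A small sketch of the sizing arithmetic for the new chunk type, assuming the kernel parser counts entries the same way it does for the existing syncobj chunks (length_dw * 4 / sizeof(entry)). The struct below is a local mirror of struct drm_amdgpu_cs_chunk_syncobj, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of struct drm_amdgpu_cs_chunk_syncobj from the diff above. */
struct cs_chunk_syncobj {
    uint32_t handle;
    uint32_t flags;
    uint64_t point;
};

/* Hypothetical helper: chunk length in dwords for n entries. */
uint32_t syncobj_chunk_length_dw(uint32_t n)
{
    /* Guard against overflow of the byte count. */
    assert(n <= UINT32_MAX / sizeof(struct cs_chunk_syncobj));
    return (uint32_t)(n * sizeof(struct cs_chunk_syncobj) / 4);
}

/* Hypothetical helper: number of entries for a given dword length. */
uint32_t syncobj_chunk_num_deps(uint32_t length_dw)
{
    return (uint32_t)((size_t)length_dw * 4 / sizeof(struct cs_chunk_syncobj));
}
```

With a 16-byte entry, each timeline wait or signal dependency costs four dwords in the chunk.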

[PATCH libdrm 1/8] new syncobj extension v3

2019-04-01 Thread Chunming Zhou
v2: drop not implemented IOCTLs and flags
v3: add transfer/signal ioctls

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
---
 include/drm/drm.h | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/include/drm/drm.h b/include/drm/drm.h
index 85c685a2..26f51bca 100644
--- a/include/drm/drm.h
+++ b/include/drm/drm.h
@@ -729,8 +729,18 @@ struct drm_syncobj_handle {
__u32 pad;
 };
 
+struct drm_syncobj_transfer {
+__u32 src_handle;
+__u32 dst_handle;
+__u64 src_point;
+__u64 dst_point;
+__u32 flags;
+__u32 pad;
+};
+
 #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_ALL (1 << 0)
 #define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT (1 << 1)
+#define DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE (1 << 2)
 struct drm_syncobj_wait {
__u64 handles;
/* absolute timeout */
@@ -741,12 +751,31 @@ struct drm_syncobj_wait {
__u32 pad;
 };
 
+struct drm_syncobj_timeline_wait {
+__u64 handles;
+/* wait on a specific timeline point for every handle */
+__u64 points;
+/* absolute timeout */
+__s64 timeout_nsec;
+__u32 count_handles;
+__u32 flags;
+__u32 first_signaled; /* only valid when not waiting all */
+__u32 pad;
+};
+
 struct drm_syncobj_array {
__u64 handles;
__u32 count_handles;
__u32 pad;
 };
 
+struct drm_syncobj_timeline_array {
+__u64 handles;
+__u64 points;
+__u32 count_handles;
+__u32 pad;
+};
+
 /* Query current scanout sequence number */
 struct drm_crtc_get_sequence {
__u32 crtc_id;  /* requested crtc_id */
@@ -903,6 +932,12 @@ extern "C" {
 #define DRM_IOCTL_MODE_GET_LEASE   DRM_IOWR(0xC8, struct drm_mode_get_lease)
 #define DRM_IOCTL_MODE_REVOKE_LEASE   DRM_IOWR(0xC9, struct drm_mode_revoke_lease)
 
+#define DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT DRM_IOWR(0xCA, struct drm_syncobj_timeline_wait)
+#define DRM_IOCTL_SYNCOBJ_QUERY DRM_IOWR(0xCB, struct drm_syncobj_timeline_array)
+#define DRM_IOCTL_SYNCOBJ_TRANSFER DRM_IOWR(0xCC, struct drm_syncobj_transfer)
+#define DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL   DRM_IOWR(0xCD, struct drm_syncobj_timeline_array)
+
+
 /**
  * Device specific ioctls should only be in their respective headers
  * The device specific ioctl range is from 0x40 to 0x9f.
-- 
2.17.1

[PATCH 6/9] drm/amdgpu: add timeline support in amdgpu CS v3

2019-04-01 Thread Chunming Zhou
Syncobj wait/signal operations are appended to the command submission.
v2: split into two kinds of in/out_deps functions
v3: fix checking for timeline syncobj

Signed-off-by: Chunming Zhou 
Cc: Tobias Hector 
Cc: Jason Ekstrand 
Cc: Dave Airlie 
Cc: Chris Wilson 
Cc: Lionel Landwerlin 
Reviewed-by: Lionel Landwerlin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 152 +
 include/uapi/drm/amdgpu_drm.h  |   8 ++
 3 files changed, 144 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8d0d7f3dd5fb..deec2c796253 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -433,6 +433,12 @@ struct amdgpu_cs_chunk {
void*kdata;
 };
 
+struct amdgpu_cs_post_dep {
+   struct drm_syncobj *syncobj;
+   struct dma_fence_chain *chain;
+   u64 point;
+};
+
 struct amdgpu_cs_parser {
struct amdgpu_device*adev;
struct drm_file *filp;
@@ -462,8 +468,8 @@ struct amdgpu_cs_parser {
/* user fence */
struct amdgpu_bo_list_entry uf_entry;
 
-   unsigned num_post_dep_syncobjs;
-   struct drm_syncobj **post_dep_syncobjs;
+   unsigned num_post_deps;
+   struct amdgpu_cs_post_dep   *post_deps;
 };
 
 static inline u32 amdgpu_get_ib_value(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 52a5e4fdc95b..2f6239b6be6f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -215,6 +215,8 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, union drm_amdgpu_cs
case AMDGPU_CHUNK_ID_SYNCOBJ_IN:
case AMDGPU_CHUNK_ID_SYNCOBJ_OUT:
case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL:
break;
 
default:
@@ -804,9 +806,11 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser, int error,
 ttm_eu_backoff_reservation(&parser->ticket,
   &parser->validated);
 
-   for (i = 0; i < parser->num_post_dep_syncobjs; i++)
-   drm_syncobj_put(parser->post_dep_syncobjs[i]);
-   kfree(parser->post_dep_syncobjs);
+   for (i = 0; i < parser->num_post_deps; i++) {
+   drm_syncobj_put(parser->post_deps[i].syncobj);
+   kfree(parser->post_deps[i].chain);
+   }
+   kfree(parser->post_deps);
 
dma_fence_put(parser->fence);
 
@@ -1117,13 +1121,18 @@ static int amdgpu_cs_process_fence_dep(struct amdgpu_cs_parser *p,
 }
 
 static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
-uint32_t handle)
+uint32_t handle, u64 point,
+u64 flags)
 {
-   int r;
struct dma_fence *fence;
-   r = drm_syncobj_find_fence(p->filp, handle, 0, 0, &fence);
-   if (r)
+   int r;
+
+   r = drm_syncobj_find_fence(p->filp, handle, point, flags, &fence);
+   if (r) {
+   DRM_ERROR("syncobj %u failed to find fence @ %llu (%d)!\n",
+ handle, point, r);
return r;
+   }
 
 r = amdgpu_sync_fence(p->adev, &p->job->sync, fence, true);
dma_fence_put(fence);
@@ -1134,46 +1143,118 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
 static int amdgpu_cs_process_syncobj_in_dep(struct amdgpu_cs_parser *p,
struct amdgpu_cs_chunk *chunk)
 {
+   struct drm_amdgpu_cs_chunk_sem *deps;
unsigned num_deps;
int i, r;
-   struct drm_amdgpu_cs_chunk_sem *deps;
 
deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
num_deps = chunk->length_dw * 4 /
sizeof(struct drm_amdgpu_cs_chunk_sem);
+   for (i = 0; i < num_deps; ++i) {
+   r = amdgpu_syncobj_lookup_and_add_to_sync(p, deps[i].handle,
+ 0, 0);
+   if (r)
+   return r;
+   }
+
+   return 0;
+}
+
 
+static int amdgpu_cs_process_syncobj_timeline_in_dep(struct amdgpu_cs_parser *p,
+struct amdgpu_cs_chunk *chunk)
+{
+   struct drm_amdgpu_cs_chunk_syncobj *syncobj_deps;
+   unsigned num_deps;
+   int i, r;
+
+   syncobj_deps = (struct drm_amdgpu_cs_chunk_syncobj *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_c

[PATCH 7/9] drm/syncobj: add transition ioctls between binary and timeline v2

2019-04-01 Thread Chunming Zhou
We need to import/export timeline points.

v2: unify to one transfer ioctl

Signed-off-by: Chunming Zhou 
Cc: Lionel Landwerlin 
Reviewed-by: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_internal.h |  2 +
 drivers/gpu/drm/drm_ioctl.c|  2 +
 drivers/gpu/drm/drm_syncobj.c  | 74 ++
 include/uapi/drm/drm.h | 10 +
 4 files changed, 88 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 695179bb88dc..dd11ae5f1eef 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -180,6 +180,8 @@ int drm_syncobj_handle_to_fd_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
 int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_transfer_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file_private);
 int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
 int drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 7a534c184e52..92b3b7b2fd81 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -686,6 +686,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
 DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, drm_syncobj_fd_to_handle_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TRANSFER, drm_syncobj_transfer_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
 DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, drm_syncobj_timeline_wait_ioctl,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 6c273e73d920..63d5d2bf35c2 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -680,6 +680,80 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
 &args->handle);
 }
 
+static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
+   struct drm_syncobj_transfer *args)
+{
+   struct drm_syncobj *timeline_syncobj = NULL;
+   struct dma_fence *fence;
+   struct dma_fence_chain *chain;
+   int ret;
+
+   timeline_syncobj = drm_syncobj_find(file_private, args->dst_handle);
+   if (!timeline_syncobj) {
+   return -ENOENT;
+   }
+   ret = drm_syncobj_find_fence(file_private, args->src_handle,
+args->src_point, args->flags,
+&fence);
+   if (ret)
+   goto err;
+   chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
+   if (!chain) {
+   ret = -ENOMEM;
+   goto err1;
+   }
+   drm_syncobj_add_point(timeline_syncobj, chain, fence, args->dst_point);
+err1:
+   dma_fence_put(fence);
+err:
+   drm_syncobj_put(timeline_syncobj);
+
+   return ret;
+}
+
+static int
+drm_syncobj_transfer_to_binary(struct drm_file *file_private,
+  struct drm_syncobj_transfer *args)
+{
+   struct drm_syncobj *binary_syncobj = NULL;
+   struct dma_fence *fence;
+   int ret;
+
+   binary_syncobj = drm_syncobj_find(file_private, args->dst_handle);
+   if (!binary_syncobj)
+   return -ENOENT;
+   ret = drm_syncobj_find_fence(file_private, args->src_handle,
+args->src_point, args->flags, &fence);
+   if (ret)
+   goto err;
+   drm_syncobj_replace_fence(binary_syncobj, fence);
+   dma_fence_put(fence);
+err:
+   drm_syncobj_put(binary_syncobj);
+
+   return ret;
+}
+int
+drm_syncobj_transfer_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file_private)
+{
+   struct drm_syncobj_transfer *args = data;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -ENODEV;
+
+   if (args->pad)
+   return -EINVAL;
+
+   if (args->dst_point)
+   ret = drm_syncobj_transfer_to_timeline(file_private, args);
+   else
+   ret = drm_syncobj_transfer_to_binary(file_private, args);
+
+   return ret;
+}
+
 static void syncobj_wait_fence_func(struct dma_fence *fence,
struct dma_fence_cb *cb)
 {
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index c62be0840ba5..e8d0d6b51875 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -735,6 +735,15 @@ struc

[PATCH 5/9] drm/syncobj: use the timeline point in drm_syncobj_find_fence v4

2019-04-01 Thread Chunming Zhou
From: Christian König 

Implement finding the right timeline point in drm_syncobj_find_fence.

v2: return -EINVAL when the point is not submitted yet.
v3: fix reference counting bug, add flags handling as well
v4: add timeout for find fence

Signed-off-by: Christian König 
Cc: Lionel Landwerlin 
Reviewed-by: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_syncobj.c | 50 ---
 1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 3a1be7f79de1..6c273e73d920 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -214,6 +214,8 @@ static void drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
dma_fence_put(fence);
 }
 
+/* 5s default for wait submission */
+#define DRM_SYNCOBJ_WAIT_FOR_SUBMIT_TIMEOUT 5000000000ULL
 /**
  * drm_syncobj_find_fence - lookup and reference the fence in a sync object
  * @file_private: drm file private pointer
@@ -234,16 +236,58 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
   struct dma_fence **fence)
 {
struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
-   int ret = 0;
+   struct syncobj_wait_entry wait;
+   u64 timeout = nsecs_to_jiffies64(DRM_SYNCOBJ_WAIT_FOR_SUBMIT_TIMEOUT);
+   int ret;
 
if (!syncobj)
return -ENOENT;
 
*fence = drm_syncobj_fence_get(syncobj);
-   if (!*fence) {
+   drm_syncobj_put(syncobj);
+
+   if (*fence) {
+   ret = dma_fence_chain_find_seqno(fence, point);
+   if (!ret)
+   return 0;
+   dma_fence_put(*fence);
+   } else {
ret = -EINVAL;
}
-   drm_syncobj_put(syncobj);
+
+   if (!(flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
+   return ret;
+
+   memset(&wait, 0, sizeof(wait));
+   wait.task = current;
+   wait.point = point;
+   drm_syncobj_fence_add_wait(syncobj, &wait);
+
+   do {
+   set_current_state(TASK_INTERRUPTIBLE);
+   if (wait.fence) {
+   ret = 0;
+   break;
+   }
+if (timeout == 0) {
+ret = -ETIME;
+break;
+}
+
+   if (signal_pending(current)) {
+   ret = -ERESTARTSYS;
+   break;
+   }
+
+timeout = schedule_timeout(timeout);
+   } while (1);
+
+   __set_current_state(TASK_RUNNING);
+   *fence = wait.fence;
+
+   if (wait.node.next)
+   drm_syncobj_remove_wait(syncobj, &wait);
+
return ret;
 }
 EXPORT_SYMBOL(drm_syncobj_find_fence);
-- 
2.17.1
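
A note on the units of the wait-for-submit timeout above: the constant is passed to nsecs_to_jiffies64(), so it is expressed in nanoseconds, and a 5 s budget is 5000000000 ns. The sketch below is a simplified user-space model of that conversion, assuming a tick rate HZ that divides one second evenly; the kernel helper is more general, and model_nsecs_to_jiffies is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Simplified model of nsecs_to_jiffies64() for a HZ that divides NSEC_PER_SEC. */
uint64_t model_nsecs_to_jiffies(uint64_t nsecs, unsigned int hz)
{
    assert(hz != 0 && NSEC_PER_SEC % hz == 0);
    /* One jiffy lasts NSEC_PER_SEC / hz nanoseconds; partial jiffies round down. */
    return nsecs / (NSEC_PER_SEC / hz);
}
```

Under this model, 5 s at HZ=250 is 1250 jiffies, while a value of only a few tens of nanoseconds rounds down to zero jiffies, in which case the wait loop would report -ETIME on its first pass unless the fence is already there.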

[PATCH 8/9] drm/syncobj: add timeline signal ioctl for syncobj v5

2019-04-01 Thread Chunming Zhou
v2: allocate each chain node individually, since chain nodes are freed independently.
v3: all existing points must already be signaled before the CPU performs a signal operation, so add a check for that.
v4: remove the v3 change and add a check to prevent out-of-order points
v5: unify binary and timeline

Signed-off-by: Chunming Zhou 
Cc: Tobias Hector 
Cc: Jason Ekstrand 
Cc: Dave Airlie 
Cc: Chris Wilson 
Cc: Lionel Landwerlin 
Reviewed-by: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_internal.h |  2 +
 drivers/gpu/drm/drm_ioctl.c|  2 +
 drivers/gpu/drm/drm_syncobj.c  | 73 ++
 include/uapi/drm/drm.h |  1 +
 4 files changed, 78 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index dd11ae5f1eef..d9a483a5fce0 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -190,6 +190,8 @@ int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_private);
 int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 92b3b7b2fd81..d337f161909c 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -696,6 +696,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, drm_syncobj_timeline_signal_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
 DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 63d5d2bf35c2..f3ceeb504e6c 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1184,6 +1184,79 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
return ret;
 }
 
+int
+drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_private)
+{
+   struct drm_syncobj_timeline_array *args = data;
+   struct drm_syncobj **syncobjs;
+   struct dma_fence_chain **chains;
+   uint64_t *points;
+   uint32_t i, j;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -EOPNOTSUPP;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   if (args->count_handles == 0)
+   return -EINVAL;
+
+   ret = drm_syncobj_array_find(file_private,
+u64_to_user_ptr(args->handles),
+args->count_handles,
+&syncobjs);
+   if (ret < 0)
+   return ret;
+
+   points = kmalloc_array(args->count_handles, sizeof(*points),
+  GFP_KERNEL);
+   if (!points) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   if (!u64_to_user_ptr(args->points)) {
+   memset(points, 0, args->count_handles * sizeof(uint64_t));
+   } else if (copy_from_user(points, u64_to_user_ptr(args->points),
+ sizeof(uint64_t) * args->count_handles)) {
+   ret = -EFAULT;
+   goto err_points;
+   }
+
+   chains = kmalloc_array(args->count_handles, sizeof(void *), GFP_KERNEL);
+   if (!chains) {
+   ret = -ENOMEM;
+   goto err_points;
+   }
+   for (i = 0; i < args->count_handles; i++) {
+   chains[i] = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
+   if (!chains[i]) {
+   for (j = 0; j < i; j++)
+   kfree(chains[j]);
+   ret = -ENOMEM;
+   goto err_chains;
+   }
+   }
+
+   for (i = 0; i < args->count_handles; i++) {
+   struct dma_fence *fence = dma_fence_get_stub();
+
+   drm_syncobj_add_point(syncobjs[i], chains[i],
+ fence, points[i]);
+   dma_fence_put(fence);
+   }
+err_chains:
+   kfree(chains);
+err_points:
+   kfree(points);
+out:
+   drm_syncobj_array_free(syncobjs, args->count_handles);
+
+   return ret;
+}
+
 int drm_syn

[PATCH 9/9] drm/amdgpu: update version for timeline syncobj support in amdgpu

2019-04-01 Thread Chunming Zhou
Signed-off-by: Chunming Zhou 
Reviewed-by: Lionel Landwerlin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 8a0732088640..4d8db87048d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -74,9 +74,10 @@
  * - 3.28.0 - Add AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES
  * - 3.29.0 - Add AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID
  * - 3.30.0 - Add AMDGPU_SCHED_OP_CONTEXT_PRIORITY_OVERRIDE.
+ * - 3.31.0 - Add syncobj timeline support to AMDGPU_CS.
  */
 #define KMS_DRIVER_MAJOR   3
-#define KMS_DRIVER_MINOR   30
+#define KMS_DRIVER_MINOR   31
 #define KMS_DRIVER_PATCHLEVEL  0
 
 int amdgpu_vram_limit = 0;
-- 
2.17.1
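
Userspace would typically gate its use of the timeline chunks on this version bump. A minimal sketch of such a check, assuming the major/minor pair has already been obtained (for example from drmGetVersion()); the function name is hypothetical, and treating any major version above 3 as supporting the feature is an assumption, not something this patch states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true if the amdgpu KMS version is at least 3.31. */
bool amdgpu_has_timeline_syncobj(int major, int minor)
{
    assert(major >= 0 && minor >= 0);
    if (major != 3)
        return major > 3; /* assumption: later majors keep the feature */
    return minor >= 31;
}
```

A driver that sees an older version would fall back to binary syncobjs rather than submit the new chunk IDs, which older kernels reject as unknown.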

[PATCH 4/9] drm/syncobj: add timeline payload query ioctl v6

2019-04-01 Thread Chunming Zhou
User mode can query the timeline payload.
v2: check return value of copy_to_user
v3: handle querying entry by entry
v4: rebase on new chain container, simplify interface
v5: query the last signaled timeline point, not the last point.
v6: add unordered point check

Signed-off-by: Chunming Zhou 
Cc: Tobias Hector 
Cc: Jason Ekstrand 
Cc: Dave Airlie 
Cc: Chris Wilson 
Cc: Lionel Landwerlin 
Reviewed-by: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c|  2 ++
 drivers/gpu/drm/drm_syncobj.c  | 62 ++
 include/uapi/drm/drm.h | 10 ++
 4 files changed, 76 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 331ac6225b58..695179bb88dc 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -188,6 +188,8 @@ int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private);
 
 /* drm_framebuffer.c */
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index c984654646fa..7a534c184e52 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -694,6 +694,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
 DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, DRM_UNLOCKED),
 DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, DRM_UNLOCKED),
 DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER|DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index b1262e92011c..3a1be7f79de1 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1065,3 +1065,65 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 
return ret;
 }
+
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private)
+{
+   struct drm_syncobj_timeline_array *args = data;
+   struct drm_syncobj **syncobjs;
+   uint64_t __user *points = u64_to_user_ptr(args->points);
+   uint32_t i;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -ENODEV;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   if (args->count_handles == 0)
+   return -EINVAL;
+
+   ret = drm_syncobj_array_find(file_private,
+u64_to_user_ptr(args->handles),
+args->count_handles,
+&syncobjs);
+   if (ret < 0)
+   return ret;
+
+   for (i = 0; i < args->count_handles; i++) {
+   struct dma_fence_chain *chain;
+   struct dma_fence *fence;
+   uint64_t point;
+
+   fence = drm_syncobj_fence_get(syncobjs[i]);
+   chain = to_dma_fence_chain(fence);
+   if (chain) {
+   struct dma_fence *iter, *last_signaled = NULL;
+
+   dma_fence_chain_for_each(iter, fence) {
+   if (!iter)
+   break;
+   dma_fence_put(last_signaled);
+   last_signaled = dma_fence_get(iter);
+   if (!to_dma_fence_chain(last_signaled)->prev_seqno)
+   /* It is most likely that the timeline has
+* unordered points. */
+   break;
+   }
+   point = dma_fence_is_signaled(last_signaled) ?
+   last_signaled->seqno :
+   to_dma_fence_chain(last_signaled)->prev_seqno;
+   dma_fence_put(last_signaled);
+   } else {
+   point = 0;
+   }
+   ret = copy_to_user(&points[i], &point, sizeof(uint64_t));
+   ret = ret ? -EFAULT : 0;
+   if (ret)
+   break;
+   }
+   drm_syncobj_array_free(syncobjs, args->count_handles);
+
+   return ret;
+}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 44ebcdd9bd1

[PATCH 3/9] drm/syncobj: add support for timeline point wait v8

2019-04-01 Thread Chunming Zhou
The points array matches the syncobjs array one-to-one.
v2:
add a separate ioctl for timeline point wait, otherwise the uapi would break.
v3:
userspace can specify two kinds of waits:
a. wait for a time point to be completed.
b. wait for a time point to become available.
v4:
rebase
v5:
add comment for xxx_WAIT_AVAILABLE
v6: rebase and rework on new container
v7: drop _WAIT_COMPLETED, it is the default anyway
v8: correctly handle garbage collected fences

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
Cc: Tobias Hector 
Cc: Jason Ekstrand 
Cc: Dave Airlie 
Cc: Chris Wilson 
Cc: Lionel Landwerlin 
Reviewed-by: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_internal.h |   2 +
 drivers/gpu/drm/drm_ioctl.c|   2 +
 drivers/gpu/drm/drm_syncobj.c  | 153 ++---
 include/uapi/drm/drm.h |  15 
 4 files changed, 143 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 251d67e04c2d..331ac6225b58 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -182,6 +182,8 @@ int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
 int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private);
 int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 687943df58e1..c984654646fa 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -688,6 +688,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, drm_syncobj_timeline_wait_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_RESET, drm_syncobj_reset_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index dbe4a1c75fbc..b1262e92011c 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -61,6 +61,7 @@ struct syncobj_wait_entry {
struct task_struct *task;
struct dma_fence *fence;
struct dma_fence_cb fence_cb;
+   u64 point;
 };
 
 static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
@@ -95,6 +96,8 @@ EXPORT_SYMBOL(drm_syncobj_find);
 static void drm_syncobj_fence_add_wait(struct drm_syncobj *syncobj,
   struct syncobj_wait_entry *wait)
 {
+   struct dma_fence *fence;
+
if (wait->fence)
return;
 
@@ -103,11 +106,15 @@ static void drm_syncobj_fence_add_wait(struct drm_syncobj *syncobj,
 * have the lock, try one more time just to be sure we don't add a
 * callback when a fence has already been set.
 */
-   if (syncobj->fence)
-   wait->fence = dma_fence_get(
-   rcu_dereference_protected(syncobj->fence, 1));
-   else
+   fence = dma_fence_get(rcu_dereference_protected(syncobj->fence, 1));
+   if (!fence || dma_fence_chain_find_seqno(&fence, wait->point)) {
+   dma_fence_put(fence);
 list_add_tail(&wait->node, &syncobj->cb_list);
+   } else if (!fence) {
+   wait->fence = dma_fence_get_stub();
+   } else {
+   wait->fence = fence;
+   }
 spin_unlock(&syncobj->lock);
 }
 
@@ -150,10 +157,8 @@ void drm_syncobj_add_point(struct drm_syncobj *syncobj,
dma_fence_chain_init(chain, prev, fence, point);
 rcu_assign_pointer(syncobj->fence, &chain->base);
 
-   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
-   list_del_init(&cur->node);
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
syncobj_wait_syncobj_func(syncobj, cur);
-   }
 spin_unlock(&syncobj->lock);
 
/* Walk the chain once to trigger garbage collection */
@@ -185,10 +190,8 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
rcu_assign_pointer(syncobj->fence, fence);
 
if (fence != old_fence) {
-   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
-   list_del_init(&cur->node);
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
syncobj_wait_syncobj_func(syncobj, cur);
-   }
}

[PATCH 2/9] drm/syncobj: add new drm_syncobj_add_point interface v4

2019-04-01 Thread Chunming Zhou
From: Christian König 

Use the dma_fence_chain object to create a timeline of fence objects
instead of just replacing the existing fence.

v2: rebase and cleanup
v3: fix garbage collection parameters
v4: add unordered point check, print a warning call trace

Signed-off-by: Christian König 
Cc: Lionel Landwerlin 
Reviewed-by: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_syncobj.c | 40 +++
 include/drm/drm_syncobj.h |  5 +
 2 files changed, 45 insertions(+)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 5329e66598c6..dbe4a1c75fbc 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -122,6 +122,46 @@ static void drm_syncobj_remove_wait(struct drm_syncobj 
*syncobj,
spin_unlock(&syncobj->lock);
 }
 
+/**
+ * drm_syncobj_add_point - add new timeline point to the syncobj
+ * @syncobj: sync object to add timeline point to
+ * @chain: chain node to use to add the point
+ * @fence: fence to encapsulate in the chain node
+ * @point: sequence number to use for the point
+ *
+ * Add the chain node as new timeline point to the syncobj.
+ */
+void drm_syncobj_add_point(struct drm_syncobj *syncobj,
+  struct dma_fence_chain *chain,
+  struct dma_fence *fence,
+  uint64_t point)
+{
+   struct syncobj_wait_entry *cur, *tmp;
+   struct dma_fence *prev;
+
+   dma_fence_get(fence);
+
+   spin_lock(&syncobj->lock);
+
+   prev = drm_syncobj_fence_get(syncobj);
+   /* You are adding an unorder point to timeline, which could cause 
payload returned from query_ioctl is 0! */
+   if (prev && prev->seqno >= point)
+   DRM_ERROR("You are adding an unorder point to timeline!\n");
+   dma_fence_chain_init(chain, prev, fence, point);
+   rcu_assign_pointer(syncobj->fence, &chain->base);
+
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
+   list_del_init(&cur->node);
+   syncobj_wait_syncobj_func(syncobj, cur);
+   }
+   spin_unlock(&syncobj->lock);
+
+   /* Walk the chain once to trigger garbage collection */
+   dma_fence_chain_for_each(fence, prev);
+   dma_fence_put(prev);
+}
+EXPORT_SYMBOL(drm_syncobj_add_point);
+
 /**
  * drm_syncobj_replace_fence - replace fence in a sync object.
  * @syncobj: Sync object to replace fence in
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 0311c9fdbd2f..6cf7243a1dc5 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -27,6 +27,7 @@
 #define __DRM_SYNCOBJ_H__
 
 #include <linux/dma-fence.h>
+#include <linux/dma-fence-chain.h>
 
 struct drm_file;
 
@@ -112,6 +113,10 @@ drm_syncobj_fence_get(struct drm_syncobj *syncobj)
 
 struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
 u32 handle);
+void drm_syncobj_add_point(struct drm_syncobj *syncobj,
+  struct dma_fence_chain *chain,
+  struct dma_fence *fence,
+  uint64_t point);
 void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
   struct dma_fence *fence);
 int drm_syncobj_find_fence(struct drm_file *file_private,
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
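
Editor's sketch, not part of the patch: the commit message above says the
chain object is used to build a timeline of fences instead of replacing the
existing fence. The userspace C model below illustrates only that idea and
the ordering check. The names timeline, timeline_node, timeline_add_point and
timeline_length are hypothetical, and the model leaves out everything the
kernel code needs in practice (reference counting, RCU, locking, waiters).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for one chain node on a timeline. */
struct timeline_node {
	uint64_t point;              /* sequence number of this point */
	struct timeline_node *prev;  /* older point, or NULL at the end */
};

/* Hypothetical stand-in for the syncobj: it only tracks the newest node. */
struct timeline {
	struct timeline_node *head;
};

/*
 * Link a caller-allocated node in front of the current head instead of
 * replacing the head. The node is linked even when the point is out of
 * order, mirroring the patch, which only reports the problem. Returns
 * false when the new point is not greater than the previous one.
 */
static bool timeline_add_point(struct timeline *tl, struct timeline_node *node,
			       uint64_t point)
{
	bool ordered = !(tl->head && tl->head->point >= point);

	if (!ordered)
		fprintf(stderr, "adding an unordered point %llu\n",
			(unsigned long long)point);

	node->point = point;
	node->prev = tl->head;
	tl->head = node;
	return ordered;
}

/* Count the nodes reachable from the newest point. */
static size_t timeline_length(const struct timeline *tl)
{
	size_t n = 0;

	for (const struct timeline_node *it = tl->head; it; it = it->prev)
		n++;
	return n;
}
```

In this model the caller owns the node storage, which loosely matches the
patch taking a pre-allocated chain node as a parameter.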

[PATCH 0/9] *** timeline syncobj support ***

2019-04-01 Thread Chunming Zhou
Timeline syncobj gives users more flexibility and convenience for
synchronization.
Lionel has written CTS tests and adapted ANV based on this patch set, and has
also reviewed the patch set.
Could someone from the community please help submit the patch set to drm-misc-next?

Christian König (3):
  dma-buf: add new dma_fence_chain container v7
  drm/syncobj: add new drm_syncobj_add_point interface v4
  drm/syncobj: use the timeline point in drm_syncobj_find_fence v4

Chunming Zhou (6):
  drm/syncobj: add support for timeline point wait v8
  drm/syncobj: add timeline payload query ioctl v6
  drm/amdgpu: add timeline support in amdgpu CS v3
  drm/syncobj: add transition ioctls between binary and timeline v2
  drm/syncobj: add timeline signal ioctl for syncobj v5
  drm/amdgpu: update version for timeline syncobj support in amdgpu

 drivers/dma-buf/Makefile|   3 +-
 drivers/dma-buf/dma-fence-chain.c   | 241 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 152 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |   3 +-
 drivers/gpu/drm/drm_internal.h  |   8 +
 drivers/gpu/drm/drm_ioctl.c |   8 +
 drivers/gpu/drm/drm_syncobj.c   | 446 ++--
 include/drm/drm_syncobj.h   |   5 +
 include/linux/dma-fence-chain.h |  81 +
 include/uapi/drm/amdgpu_drm.h   |   8 +
 include/uapi/drm/drm.h  |  36 ++
 12 files changed, 944 insertions(+), 57 deletions(-)
 create mode 100644 drivers/dma-buf/dma-fence-chain.c
 create mode 100644 include/linux/dma-fence-chain.h

-- 
2.17.1


[PATCH 1/9] dma-buf: add new dma_fence_chain container v7

2019-04-01 Thread Chunming Zhou
From: Christian König 

Lockless container implementation similar to a dma_fence_array, but with
only two elements per node and automatic garbage collection.

v2: properly document dma_fence_chain_for_each, add dma_fence_chain_find_seqno,
drop prev reference during garbage collection if it's not a chain fence.
v3: use head and iterator for dma_fence_chain_for_each
v4: fix reference count in dma_fence_chain_enable_signaling
v5: fix iteration when walking each chain node
v6: add __rcu for member 'prev' of struct chain node
v7: fix rcu warnings from kernel robot

Signed-off-by: Christian König 
Cc: Lionel Landwerlin 
Reviewed-by: Lionel Landwerlin 
---
 drivers/dma-buf/Makefile  |   3 +-
 drivers/dma-buf/dma-fence-chain.c | 241 ++
 include/linux/dma-fence-chain.h   |  81 ++
 3 files changed, 324 insertions(+), 1 deletion(-)
 create mode 100644 drivers/dma-buf/dma-fence-chain.c
 create mode 100644 include/linux/dma-fence-chain.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 0913a6ccab5a..1f006e083eb9 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,4 +1,5 @@
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-fence.o
+obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
+reservation.o seqno-fence.o
 obj-$(CONFIG_SYNC_FILE)+= sync_file.o
 obj-$(CONFIG_SW_SYNC)  += sw_sync.o sync_debug.o
 obj-$(CONFIG_UDMABUF)  += udmabuf.o
diff --git a/drivers/dma-buf/dma-fence-chain.c 
b/drivers/dma-buf/dma-fence-chain.c
new file mode 100644
index ..c729f98a7bd3
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -0,0 +1,241 @@
+/*
+ * fence-chain: chain fences together in a timeline
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ * Authors:
+ * Christian König 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/dma-fence-chain.h>
+
+static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
+
+/**
+ * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
+ * @chain: chain node to get the previous node from
+ *
+ * Use dma_fence_get_rcu_safe to get a reference to the previous fence of the
+ * chain node.
+ */
+static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain 
*chain)
+{
+   struct dma_fence *prev;
+
+   rcu_read_lock();
+   prev = dma_fence_get_rcu_safe(&chain->prev);
+   rcu_read_unlock();
+   return prev;
+}
+
+/**
+ * dma_fence_chain_walk - chain walking function
+ * @fence: current chain node
+ *
+ * Walk the chain to the next node. Returns the next fence or NULL if we are at
+ * the end of the chain. Garbage collects chain nodes which are already
+ * signaled.
+ */
+struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
+{
+   struct dma_fence_chain *chain, *prev_chain;
+   struct dma_fence *prev, *replacement, *tmp;
+
+   chain = to_dma_fence_chain(fence);
+   if (!chain) {
+   dma_fence_put(fence);
+   return NULL;
+   }
+
+   while ((prev = dma_fence_chain_get_prev(chain))) {
+
+   prev_chain = to_dma_fence_chain(prev);
+   if (prev_chain) {
+   if (!dma_fence_is_signaled(prev_chain->fence))
+   break;
+
+   replacement = dma_fence_chain_get_prev(prev_chain);
+   } else {
+   if (!dma_fence_is_signaled(prev))
+   break;
+
+   replacement = NULL;
+   }
+
+   tmp = cmpxchg((void **)&chain->prev, (void *)prev, (void *)replacement);
+   if (tmp == prev)
+   dma_fence_put(tmp);
+   else
+   dma_fence_put(replacement);
+   dma_fence_put(prev);
+   }
+
+   dma_fence_put(fence);
+   return prev;
+}
+EXPORT_SYMBOL(dma_fence_chain_walk);
+
+/**
+ * dma_fence_chain_find_seqno - find fence chain node by seqno
+ * @pfence: pointer to the chain node where to start
+ * @seqno: the sequence number to search for
+ *
+ * Advance the fence pointer to the chain node which will signal this sequence
+ * number. If no sequence number is provided then this is a no-op.
+ *
+ * Returns EINVAL if the fence is not a chain node or the sequence number has
+ * not yet advanced far enough.
+ */
+int dma_fence_chain_find_seqno(struct dma_fence **pfence, uint64_t seqno)
+{
+   struct dma_fence_chain *chain;
+
+   if (!seqno)
+   return 0;
+
+   
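
Editor's sketch, not part of the patch: dma_fence_chain_walk above is
documented as walking to the next node while garbage collecting nodes that
are already signaled, using a compare-and-exchange on the prev pointer. The
userspace model below shows that unlinking pattern with C11 atomics. The
struct node type and chain_walk function are hypothetical, and reference
counting is omitted entirely, so this illustrates the idea only and is not
the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical chain node: a signaled flag plus a link to the older node. */
struct node {
	bool signaled;
	_Atomic(struct node *) prev;
};

/*
 * Return the closest older node that is not yet signaled, or NULL when the
 * end of the chain is reached. Signaled nodes found on the way are unlinked
 * from cur->prev so later walks skip them.
 */
static struct node *chain_walk(struct node *cur)
{
	struct node *prev;

	while ((prev = atomic_load(&cur->prev)) != NULL) {
		struct node *replacement;

		if (!prev->signaled)
			break;

		replacement = atomic_load(&prev->prev);
		/*
		 * Swap in the replacement only if nobody changed cur->prev in
		 * the meantime; either way the loop reloads and re-checks.
		 */
		atomic_compare_exchange_strong(&cur->prev, &prev, replacement);
	}

	return prev;
}
```

The compare-and-exchange means two concurrent walkers cannot both unlink the
same node: one succeeds and the other simply retries with the updated link.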

Re: [PATCH 8/9] drm/syncobj: add timeline signal ioctl for syncobj v5

2019-03-28 Thread Chunming Zhou

On 2019/3/28 20:53, Lionel Landwerlin wrote:
> On 25/03/2019 08:32, Chunming Zhou wrote:
>> v2: individually allocate chain array, since chain node is free 
>> independently.
>> v3: all existing points must be already signaled before cpu perform 
>> signal operation,
>>  so add check condition for that.
>> v4: remove v3 change and add checking to prevent out-of-order
>> v5: unify binary and timeline
>>
>> Signed-off-by: Chunming Zhou 
>> Cc: Tobias Hector 
>> Cc: Jason Ekstrand 
>> Cc: Dave Airlie 
>> Cc: Chris Wilson 
>> Cc: Lionel Landwerlin 
>> ---
>>   drivers/gpu/drm/drm_internal.h |  2 +
>>   drivers/gpu/drm/drm_ioctl.c    |  2 +
>>   drivers/gpu/drm/drm_syncobj.c  | 73 ++
>>   include/uapi/drm/drm.h |  1 +
>>   4 files changed, 78 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_internal.h 
>> b/drivers/gpu/drm/drm_internal.h
>> index dd11ae5f1eef..d9a483a5fce0 100644
>> --- a/drivers/gpu/drm/drm_internal.h
>> +++ b/drivers/gpu/drm/drm_internal.h
>> @@ -190,6 +190,8 @@ int drm_syncobj_reset_ioctl(struct drm_device 
>> *dev, void *data,
>>   struct drm_file *file_private);
>>   int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
>>    struct drm_file *file_private);
>> +int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void 
>> *data,
>> +  struct drm_file *file_private);
>>   int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
>>   struct drm_file *file_private);
>>   diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
>> index 92b3b7b2fd81..d337f161909c 100644
>> --- a/drivers/gpu/drm/drm_ioctl.c
>> +++ b/drivers/gpu/drm/drm_ioctl.c
>> @@ -696,6 +696,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
>>     DRM_UNLOCKED|DRM_RENDER_ALLOW),
>>   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
>>     DRM_UNLOCKED|DRM_RENDER_ALLOW),
>> +    DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, 
>> drm_syncobj_timeline_signal_ioctl,
>> +  DRM_UNLOCKED|DRM_RENDER_ALLOW),
>>   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
>>     DRM_UNLOCKED|DRM_RENDER_ALLOW),
>>   DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, 
>> drm_crtc_get_sequence_ioctl, DRM_UNLOCKED),
>> diff --git a/drivers/gpu/drm/drm_syncobj.c 
>> b/drivers/gpu/drm/drm_syncobj.c
>> index ee2d66e047e7..099596190845 100644
>> --- a/drivers/gpu/drm/drm_syncobj.c
>> +++ b/drivers/gpu/drm/drm_syncobj.c
>> @@ -1183,6 +1183,79 @@ drm_syncobj_signal_ioctl(struct drm_device 
>> *dev, void *data,
>>   return ret;
>>   }
>>   +int
>> +drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
>> +  struct drm_file *file_private)
>> +{
>> +    struct drm_syncobj_timeline_array *args = data;
>> +    struct drm_syncobj **syncobjs;
>> +    struct dma_fence_chain **chains;
>> +    uint64_t *points;
>> +    uint32_t i, j;
>> +    int ret;
>> +
>> +    if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
>> +    return -EOPNOTSUPP;
>> +
>> +    if (args->pad != 0)
>> +    return -EINVAL;
>> +
>> +    if (args->count_handles == 0)
>> +    return -EINVAL;
>> +
>> +    ret = drm_syncobj_array_find(file_private,
>> + u64_to_user_ptr(args->handles),
>> + args->count_handles,
>> + );
>> +    if (ret < 0)
>> +    return ret;
>> +
>> +    points = kmalloc_array(args->count_handles, sizeof(*points),
>> +   GFP_KERNEL);
>> +    if (!points) {
>> +    ret = -ENOMEM;
>> +    goto out;
>> +    }
>> +    if (!u64_to_user_ptr(args->points)) {
>> +    memset(points, 0, args->count_handles * sizeof(uint64_t));
>> +    } else if (copy_from_user(points, u64_to_user_ptr(args->points),
>> +  sizeof(uint64_t) * args->count_handles)) {
>> +    ret = -EFAULT;
>> +    goto err_points;
>> +    }
>> +
>> +    chains = kmalloc_array(args->count_handles, sizeof(void *), 
>> GFP_KERNEL);
>> +    if (!chains) {
>> +    ret = -ENOMEM;
>> +    goto err_points;
>> +    }
>> +    for (i = 0; i < args->count_handles; i++) {
>> +    chains[i] = kzalloc(sizeof(struct dma_fence_chain), 
>

[PATCH 8/9] drm/syncobj: add timeline signal ioctl for syncobj v5

2019-03-25 Thread Chunming Zhou
v2: allocate each chain node individually, since chain nodes are freed
independently.
v3: all existing points must already be signaled before the CPU performs a
signal operation, so add a check for that.
v4: remove the v3 change and add a check to prevent out-of-order points
v5: unify binary and timeline

Signed-off-by: Chunming Zhou 
Cc: Tobias Hector 
Cc: Jason Ekstrand 
Cc: Dave Airlie 
Cc: Chris Wilson 
Cc: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_internal.h |  2 +
 drivers/gpu/drm/drm_ioctl.c|  2 +
 drivers/gpu/drm/drm_syncobj.c  | 73 ++
 include/uapi/drm/drm.h |  1 +
 4 files changed, 78 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index dd11ae5f1eef..d9a483a5fce0 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -190,6 +190,8 @@ int drm_syncobj_reset_ioctl(struct drm_device *dev, void 
*data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_private);
 int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 92b3b7b2fd81..d337f161909c 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -696,6 +696,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, 
drm_syncobj_timeline_signal_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 
DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index ee2d66e047e7..099596190845 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1183,6 +1183,79 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void 
*data,
return ret;
 }
 
+int
+drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_private)
+{
+   struct drm_syncobj_timeline_array *args = data;
+   struct drm_syncobj **syncobjs;
+   struct dma_fence_chain **chains;
+   uint64_t *points;
+   uint32_t i, j;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -EOPNOTSUPP;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   if (args->count_handles == 0)
+   return -EINVAL;
+
+   ret = drm_syncobj_array_find(file_private,
+u64_to_user_ptr(args->handles),
+args->count_handles,
+&syncobjs);
+   if (ret < 0)
+   return ret;
+
+   points = kmalloc_array(args->count_handles, sizeof(*points),
+  GFP_KERNEL);
+   if (!points) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   if (!u64_to_user_ptr(args->points)) {
+   memset(points, 0, args->count_handles * sizeof(uint64_t));
+   } else if (copy_from_user(points, u64_to_user_ptr(args->points),
+ sizeof(uint64_t) * args->count_handles)) {
+   ret = -EFAULT;
+   goto err_points;
+   }
+
+   chains = kmalloc_array(args->count_handles, sizeof(void *), GFP_KERNEL);
+   if (!chains) {
+   ret = -ENOMEM;
+   goto err_points;
+   }
+   for (i = 0; i < args->count_handles; i++) {
+   chains[i] = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
+   if (!chains[i]) {
+   for (j = 0; j < i; j++)
+   kfree(chains[j]);
+   ret = -ENOMEM;
+   goto err_chains;
+   }
+   }
+
+   for (i = 0; i < args->count_handles; i++) {
+   struct dma_fence *fence = dma_fence_get_stub();
+
+   drm_syncobj_add_point(syncobjs[i], chains[i],
+ fence, points[i]);
+   dma_fence_put(fence);
+   }
+err_chains:
+   kfree(chains);
+err_points:
+   kfree(points);
+out:
+   drm_syncobj_array_free(syncobjs, args->count_handles);
+
+   return ret;
+}
+
 int drm_syncobj_query_ioctl(struct drm_device *dev, void
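
Editor's sketch, not part of the patch: the ioctl above allocates one chain
node per handle and, if an allocation fails part-way through, frees the nodes
that were already allocated before unwinding. The userspace model below shows
the same rollback pattern with calloc/free. The names chain_node, alloc_nodes
and free_nodes are hypothetical. Unlike this model, the patch hands each node
over to the syncobj on success and then frees only the pointer array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a chain node. */
struct chain_node {
	unsigned long long point;
};

/*
 * Allocate an array of count individually allocated, zeroed nodes.
 * On any failure everything allocated so far is released and NULL is
 * returned, so the caller never sees a partially filled array.
 */
static struct chain_node **alloc_nodes(size_t count)
{
	struct chain_node **nodes;
	size_t i, j;

	if (count == 0)
		return NULL;

	nodes = calloc(count, sizeof(*nodes));
	if (!nodes)
		return NULL;

	for (i = 0; i < count; i++) {
		nodes[i] = calloc(1, sizeof(**nodes));
		if (!nodes[i]) {
			/* Roll back the nodes allocated before the failure. */
			for (j = 0; j < i; j++)
				free(nodes[j]);
			free(nodes);
			return NULL;
		}
	}

	return nodes;
}

/* Release every node and then the array itself. */
static void free_nodes(struct chain_node **nodes, size_t count)
{
	size_t i;

	if (!nodes)
		return;
	for (i = 0; i < count; i++)
		free(nodes[i]);
	free(nodes);
}
```

Allocating the nodes one by one, rather than as a single block, matters when
each node can later be freed independently of the others.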

[PATCH 9/9] drm/amdgpu: update version for timeline syncobj support in amdgpu

2019-03-25 Thread Chunming Zhou
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 8a0732088640..4d8db87048d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -74,9 +74,10 @@
  * - 3.28.0 - Add AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES
  * - 3.29.0 - Add AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID
  * - 3.30.0 - Add AMDGPU_SCHED_OP_CONTEXT_PRIORITY_OVERRIDE.
+ * - 3.31.0 - Add syncobj timeline support to AMDGPU_CS.
  */
 #define KMS_DRIVER_MAJOR   3
-#define KMS_DRIVER_MINOR   30
+#define KMS_DRIVER_MINOR   31
 #define KMS_DRIVER_PATCHLEVEL  0
 
 int amdgpu_vram_limit = 0;
-- 
2.17.1


[PATCH 2/9] drm/syncobj: add new drm_syncobj_add_point interface v4

2019-03-25 Thread Chunming Zhou
From: Christian König 

Use the dma_fence_chain object to create a timeline of fence objects
instead of just replacing the existing fence.

v2: rebase and cleanup
v3: fix garbage collection parameters
v4: add unorder point check, print a warn calltrace

Signed-off-by: Christian König 
Cc: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_syncobj.c | 39 +++
 include/drm/drm_syncobj.h |  5 +
 2 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 5329e66598c6..19a9ce638119 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -122,6 +122,45 @@ static void drm_syncobj_remove_wait(struct drm_syncobj 
*syncobj,
spin_unlock(&syncobj->lock);
 }
 
+/**
+ * drm_syncobj_add_point - add new timeline point to the syncobj
+ * @syncobj: sync object to add timeline point to
+ * @chain: chain node to use to add the point
+ * @fence: fence to encapsulate in the chain node
+ * @point: sequence number to use for the point
+ *
+ * Add the chain node as new timeline point to the syncobj.
+ */
+void drm_syncobj_add_point(struct drm_syncobj *syncobj,
+  struct dma_fence_chain *chain,
+  struct dma_fence *fence,
+  uint64_t point)
+{
+   struct syncobj_wait_entry *cur, *tmp;
+   struct dma_fence *prev;
+
+   dma_fence_get(fence);
+
+   spin_lock(&syncobj->lock);
+
+   prev = drm_syncobj_fence_get(syncobj);
+   /* You are adding an unorder point to timeline, which could cause 
payload returned from query_ioctl is 0! */
+   WARN_ON_ONCE(prev && prev->seqno >= point);
+   dma_fence_chain_init(chain, prev, fence, point);
+   rcu_assign_pointer(syncobj->fence, &chain->base);
+
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
+   list_del_init(&cur->node);
+   syncobj_wait_syncobj_func(syncobj, cur);
+   }
+   spin_unlock(&syncobj->lock);
+
+   /* Walk the chain once to trigger garbage collection */
+   dma_fence_chain_for_each(fence, prev);
+   dma_fence_put(prev);
+}
+EXPORT_SYMBOL(drm_syncobj_add_point);
+
 /**
  * drm_syncobj_replace_fence - replace fence in a sync object.
  * @syncobj: Sync object to replace fence in
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 0311c9fdbd2f..6cf7243a1dc5 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -27,6 +27,7 @@
 #define __DRM_SYNCOBJ_H__
 
 #include <linux/dma-fence.h>
+#include <linux/dma-fence-chain.h>
 
 struct drm_file;
 
@@ -112,6 +113,10 @@ drm_syncobj_fence_get(struct drm_syncobj *syncobj)
 
 struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
 u32 handle);
+void drm_syncobj_add_point(struct drm_syncobj *syncobj,
+  struct dma_fence_chain *chain,
+  struct dma_fence *fence,
+  uint64_t point);
 void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
   struct dma_fence *fence);
 int drm_syncobj_find_fence(struct drm_file *file_private,
-- 
2.17.1


[PATCH 7/9] drm/syncobj: add transition ioctls between binary and timeline v2

2019-03-25 Thread Chunming Zhou
We need to import/export timeline points.

v2: unify to one transfer ioctl

Signed-off-by: Chunming Zhou 
Cc: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_internal.h |  2 +
 drivers/gpu/drm/drm_ioctl.c|  2 +
 drivers/gpu/drm/drm_syncobj.c  | 74 ++
 include/uapi/drm/drm.h | 10 +
 4 files changed, 88 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 695179bb88dc..dd11ae5f1eef 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -180,6 +180,8 @@ int drm_syncobj_handle_to_fd_ioctl(struct drm_device *dev, 
void *data,
   struct drm_file *file_private);
 int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_transfer_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file_private);
 int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
 int drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 7a534c184e52..92b3b7b2fd81 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -686,6 +686,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, 
drm_syncobj_fd_to_handle_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TRANSFER, drm_syncobj_transfer_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, 
drm_syncobj_timeline_wait_ioctl,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 087fd4e7eaf3..ee2d66e047e7 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -679,6 +679,80 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, 
void *data,
&args->handle);
 }
 
+static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
+   struct drm_syncobj_transfer *args)
+{
+   struct drm_syncobj *timeline_syncobj = NULL;
+   struct dma_fence *fence;
+   struct dma_fence_chain *chain;
+   int ret;
+
+   timeline_syncobj = drm_syncobj_find(file_private, args->dst_handle);
+   if (!timeline_syncobj) {
+   return -ENOENT;
+   }
+   ret = drm_syncobj_find_fence(file_private, args->src_handle,
+args->src_point, args->flags,
+&fence);
+   if (ret)
+   goto err;
+   chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
+   if (!chain) {
+   ret = -ENOMEM;
+   goto err1;
+   }
+   drm_syncobj_add_point(timeline_syncobj, chain, fence, args->dst_point);
+err1:
+   dma_fence_put(fence);
+err:
+   drm_syncobj_put(timeline_syncobj);
+
+   return ret;
+}
+
+static int
+drm_syncobj_transfer_to_binary(struct drm_file *file_private,
+  struct drm_syncobj_transfer *args)
+{
+   struct drm_syncobj *binary_syncobj = NULL;
+   struct dma_fence *fence;
+   int ret;
+
+   binary_syncobj = drm_syncobj_find(file_private, args->dst_handle);
+   if (!binary_syncobj)
+   return -ENOENT;
+   ret = drm_syncobj_find_fence(file_private, args->src_handle,
+args->src_point, args->flags, &fence);
+   if (ret)
+   goto err;
+   drm_syncobj_replace_fence(binary_syncobj, fence);
+   dma_fence_put(fence);
+err:
+   drm_syncobj_put(binary_syncobj);
+
+   return ret;
+}
+int
+drm_syncobj_transfer_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file_private)
+{
+   struct drm_syncobj_transfer *args = data;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -ENODEV;
+
+   if (args->pad)
+   return -EINVAL;
+
+   if (args->dst_point)
+   ret = drm_syncobj_transfer_to_timeline(file_private, args);
+   else
+   ret = drm_syncobj_transfer_to_binary(file_private, args);
+
+   return ret;
+}
+
 static void syncobj_wait_fence_func(struct dma_fence *fence,
struct dma_fence_cb *cb)
 {
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index c62be0840ba5..e8d0d6b51875 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -735,6 +735,15 @@ struct drm_syncobj_handl

[PATCH 6/9] drm/amdgpu: add timeline support in amdgpu CS v3

2019-03-25 Thread Chunming Zhou
Syncobj wait/signal operations are appended to the command submission.
v2: separate to two kinds in/out_deps functions
v3: fix checking for timeline syncobj

Signed-off-by: Chunming Zhou 
Cc: Tobias Hector 
Cc: Jason Ekstrand 
Cc: Dave Airlie 
Cc: Chris Wilson 
Cc: Lionel Landwerlin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 152 +
 include/uapi/drm/amdgpu_drm.h  |   8 ++
 3 files changed, 144 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8d0d7f3dd5fb..deec2c796253 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -433,6 +433,12 @@ struct amdgpu_cs_chunk {
void*kdata;
 };
 
+struct amdgpu_cs_post_dep {
+   struct drm_syncobj *syncobj;
+   struct dma_fence_chain *chain;
+   u64 point;
+};
+
 struct amdgpu_cs_parser {
struct amdgpu_device*adev;
struct drm_file *filp;
@@ -462,8 +468,8 @@ struct amdgpu_cs_parser {
/* user fence */
struct amdgpu_bo_list_entry uf_entry;
 
-   unsigned num_post_dep_syncobjs;
-   struct drm_syncobj **post_dep_syncobjs;
+   unsignednum_post_deps;
+   struct amdgpu_cs_post_dep   *post_deps;
 };
 
 static inline u32 amdgpu_get_ib_value(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 52a5e4fdc95b..2f6239b6be6f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -215,6 +215,8 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser 
*p, union drm_amdgpu_cs
case AMDGPU_CHUNK_ID_SYNCOBJ_IN:
case AMDGPU_CHUNK_ID_SYNCOBJ_OUT:
case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL:
break;
 
default:
@@ -804,9 +806,11 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser 
*parser, int error,
ttm_eu_backoff_reservation(&parser->ticket,
   &parser->validated);
 
-   for (i = 0; i < parser->num_post_dep_syncobjs; i++)
-   drm_syncobj_put(parser->post_dep_syncobjs[i]);
-   kfree(parser->post_dep_syncobjs);
+   for (i = 0; i < parser->num_post_deps; i++) {
+   drm_syncobj_put(parser->post_deps[i].syncobj);
+   kfree(parser->post_deps[i].chain);
+   }
+   kfree(parser->post_deps);
 
dma_fence_put(parser->fence);
 
@@ -1117,13 +1121,18 @@ static int amdgpu_cs_process_fence_dep(struct 
amdgpu_cs_parser *p,
 }
 
 static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
-uint32_t handle)
+uint32_t handle, u64 point,
+u64 flags)
 {
-   int r;
struct dma_fence *fence;
-   r = drm_syncobj_find_fence(p->filp, handle, 0, 0, &fence);
-   if (r)
+   int r;
+
+   r = drm_syncobj_find_fence(p->filp, handle, point, flags, &fence);
+   if (r) {
+   DRM_ERROR("syncobj %u failed to find fence @ %llu (%d)!\n",
+ handle, point, r);
return r;
+   }
 
r = amdgpu_sync_fence(p->adev, &p->job->sync, fence, true);
dma_fence_put(fence);
@@ -1134,46 +1143,118 @@ static int 
amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
 static int amdgpu_cs_process_syncobj_in_dep(struct amdgpu_cs_parser *p,
struct amdgpu_cs_chunk *chunk)
 {
+   struct drm_amdgpu_cs_chunk_sem *deps;
unsigned num_deps;
int i, r;
-   struct drm_amdgpu_cs_chunk_sem *deps;
 
deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
num_deps = chunk->length_dw * 4 /
sizeof(struct drm_amdgpu_cs_chunk_sem);
+   for (i = 0; i < num_deps; ++i) {
+   r = amdgpu_syncobj_lookup_and_add_to_sync(p, deps[i].handle,
+ 0, 0);
+   if (r)
+   return r;
+   }
+
+   return 0;
+}
+
 
+static int amdgpu_cs_process_syncobj_timeline_in_dep(struct amdgpu_cs_parser 
*p,
+struct amdgpu_cs_chunk 
*chunk)
+{
+   struct drm_amdgpu_cs_chunk_syncobj *syncobj_deps;
+   unsigned num_deps;
+   int i, r;
+
+   syncobj_deps = (struct drm_amdgpu_cs_chunk_syncobj *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_syncobj);
for (i = 0; i < num_deps; ++i
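
Editor's sketch, not part of the patch: the functions above derive the number
of dependency entries from the chunk length, which is given in 32-bit dwords,
as length_dw * 4 / sizeof(entry). The userspace model below works through
that arithmetic. The struct layout (handle, flags, 64-bit point) is an
assumption made for illustration, and cs_chunk_syncobj and chunk_num_deps are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed entry layout for illustration: 4 + 4 + 8 = 16 bytes. */
struct cs_chunk_syncobj {
	uint32_t handle;
	uint32_t flags;
	uint64_t point;
};

/*
 * Convert a chunk length in dwords into a number of whole entries.
 * Any trailing bytes that do not fill a complete entry are ignored by
 * the integer division.
 */
static size_t chunk_num_deps(uint32_t length_dw, size_t entry_size)
{
	return (size_t)length_dw * 4 / entry_size;
}
```

With a 16-byte entry, a chunk of 8 dwords (32 bytes) holds two entries, and a
chunk of 9 dwords still holds only two because the last 4 bytes are a
remainder.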

[PATCH 5/9] drm/syncobj: use the timeline point in drm_syncobj_find_fence v4

2019-03-25 Thread Chunming Zhou
From: Christian König 

Implement finding the right timeline point in drm_syncobj_find_fence.

v2: return -EINVAL when the point is not submitted yet.
v3: fix reference counting bug, add flags handling as well
v4: add timeout for find fence

Signed-off-by: Christian König 
Cc: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_syncobj.c | 50 ---
 1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 0e62a793c8dd..087fd4e7eaf3 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -213,6 +213,8 @@ static void drm_syncobj_assign_null_handle(struct 
drm_syncobj *syncobj)
dma_fence_put(fence);
 }
 
+/* 5s default for wait submission */
+#define DRM_SYNCOBJ_WAIT_FOR_SUBMIT_TIMEOUT 5000000000ULL
 /**
  * drm_syncobj_find_fence - lookup and reference the fence in a sync object
  * @file_private: drm file private pointer
@@ -233,16 +235,58 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
   struct dma_fence **fence)
 {
struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
-   int ret = 0;
+   struct syncobj_wait_entry wait;
+   u64 timeout = nsecs_to_jiffies64(DRM_SYNCOBJ_WAIT_FOR_SUBMIT_TIMEOUT);
+   int ret;
 
if (!syncobj)
return -ENOENT;
 
*fence = drm_syncobj_fence_get(syncobj);
-   if (!*fence) {
+   drm_syncobj_put(syncobj);
+
+   if (*fence) {
+   ret = dma_fence_chain_find_seqno(fence, point);
+   if (!ret)
+   return 0;
+   dma_fence_put(*fence);
+   } else {
ret = -EINVAL;
}
-   drm_syncobj_put(syncobj);
+
+   if (!(flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
+   return ret;
+
+   memset(&wait, 0, sizeof(wait));
+   wait.task = current;
+   wait.point = point;
+   drm_syncobj_fence_add_wait(syncobj, &wait);
+
+   do {
+   set_current_state(TASK_INTERRUPTIBLE);
+   if (wait.fence) {
+   ret = 0;
+   break;
+   }
+if (timeout == 0) {
+ret = -ETIME;
+break;
+}
+
+   if (signal_pending(current)) {
+   ret = -ERESTARTSYS;
+   break;
+   }
+
+timeout = schedule_timeout(timeout);
+   } while (1);
+
+   __set_current_state(TASK_RUNNING);
+   *fence = wait.fence;
+
+   if (wait.node.next)
+   drm_syncobj_remove_wait(syncobj, &wait);
+
return ret;
 }
 EXPORT_SYMBOL(drm_syncobj_find_fence);
-- 
2.17.1


[PATCH 1/9] dma-buf: add new dma_fence_chain container v7

2019-03-25 Thread Chunming Zhou
From: Christian König 

Lockless container implementation similar to a dma_fence_array, but with
only two elements per node and automatic garbage collection.

v2: properly document dma_fence_chain_for_each, add dma_fence_chain_find_seqno,
drop prev reference during garbage collection if it's not a chain fence.
v3: use head and iterator for dma_fence_chain_for_each
v4: fix reference count in dma_fence_chain_enable_signaling
v5: fix iteration when walking each chain node
v6: add __rcu for member 'prev' of struct chain node
v7: fix rcu warnings from kernel robot

Signed-off-by: Christian König 
Cc: Lionel Landwerlin 
---
 drivers/dma-buf/Makefile  |   3 +-
 drivers/dma-buf/dma-fence-chain.c | 241 ++
 include/linux/dma-fence-chain.h   |  81 ++
 3 files changed, 324 insertions(+), 1 deletion(-)
 create mode 100644 drivers/dma-buf/dma-fence-chain.c
 create mode 100644 include/linux/dma-fence-chain.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 0913a6ccab5a..1f006e083eb9 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,4 +1,5 @@
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-fence.o
+obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
+reservation.o seqno-fence.o
 obj-$(CONFIG_SYNC_FILE)+= sync_file.o
 obj-$(CONFIG_SW_SYNC)  += sw_sync.o sync_debug.o
 obj-$(CONFIG_UDMABUF)  += udmabuf.o
diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
new file mode 100644
index 000000000000..c729f98a7bd3
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -0,0 +1,241 @@
+/*
+ * fence-chain: chain fences together in a timeline
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ * Authors:
+ * Christian König 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/dma-fence-chain.h>
+
+static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
+
+/**
+ * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
+ * @chain: chain node to get the previous node from
+ *
+ * Use dma_fence_get_rcu_safe to get a reference to the previous fence of the
+ * chain node.
+ */
+static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
+{
+   struct dma_fence *prev;
+
+   rcu_read_lock();
+   prev = dma_fence_get_rcu_safe(&chain->prev);
+   rcu_read_unlock();
+   return prev;
+}
+
+/**
+ * dma_fence_chain_walk - chain walking function
+ * @fence: current chain node
+ *
+ * Walk the chain to the next node. Returns the next fence or NULL if we are at
+ * the end of the chain. Garbage collects chain nodes which are already
+ * signaled.
+ */
+struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
+{
+   struct dma_fence_chain *chain, *prev_chain;
+   struct dma_fence *prev, *replacement, *tmp;
+
+   chain = to_dma_fence_chain(fence);
+   if (!chain) {
+   dma_fence_put(fence);
+   return NULL;
+   }
+
+   while ((prev = dma_fence_chain_get_prev(chain))) {
+
+   prev_chain = to_dma_fence_chain(prev);
+   if (prev_chain) {
+   if (!dma_fence_is_signaled(prev_chain->fence))
+   break;
+
+   replacement = dma_fence_chain_get_prev(prev_chain);
+   } else {
+   if (!dma_fence_is_signaled(prev))
+   break;
+
+   replacement = NULL;
+   }
+
+   tmp = cmpxchg((void **)&chain->prev, (void *)prev, (void *)replacement);
+   if (tmp == prev)
+   dma_fence_put(tmp);
+   else
+   dma_fence_put(replacement);
+   dma_fence_put(prev);
+   }
+
+   dma_fence_put(fence);
+   return prev;
+}
+EXPORT_SYMBOL(dma_fence_chain_walk);
+
+/**
+ * dma_fence_chain_find_seqno - find fence chain node by seqno
+ * @pfence: pointer to the chain node where to start
+ * @seqno: the sequence number to search for
+ *
+ * Advance the fence pointer to the chain node which will signal this sequence
+ * number. If no sequence number is provided then this is a no-op.
+ *
+ * Returns EINVAL if the fence is not a chain node or the sequence number has
+ * not yet advanced far enough.
+ */
+int dma_fence_chain_find_seqno(struct dma_fence **pfence, uint64_t seqno)
+{
+   struct dma_fence_chain *chain;
+
+   if (!seqno)
+   return 0;
+
+   chain = 
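
The archived patch is cut off here. As a rough illustration of what the kernel-doc comment for dma_fence_chain_find_seqno describes (advance to the chain node that will signal a given sequence number, no-op for seqno 0, EINVAL if the chain has not advanced far enough), here is a simplified single-threaded model. It is not the kernel implementation: struct sketch_chain_node and sketch_chain_find_seqno are hypothetical names, and reference counting, RCU and fence contexts are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a chain node: no refcounts, no RCU. */
struct sketch_chain_node {
    uint64_t seqno;                  /* timeline point this node signals */
    struct sketch_chain_node *prev;  /* older node, or NULL at the end */
};

/*
 * Move *pnode back to the oldest node that still covers seqno.
 * Returns 0 on success, -EINVAL if the chain has not reached seqno yet.
 */
static int sketch_chain_find_seqno(struct sketch_chain_node **pnode,
                                   uint64_t seqno)
{
    struct sketch_chain_node *node = *pnode;

    if (!seqno)
        return 0;               /* no point requested: no-op */
    if (!node || node->seqno < seqno)
        return -EINVAL;         /* point not yet on the timeline */

    /* Step to older nodes while they would still signal seqno. */
    while (node->prev && node->prev->seqno >= seqno)
        node = node->prev;

    *pnode = node;
    return 0;
}
```

In this model a request for point 2 on a timeline holding points 1, 3 and 5 lands on the node for point 3, since that is the first node whose completion implies point 2.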

[PATCH 4/9] drm/syncobj: add timeline payload query ioctl v6

2019-03-25 Thread Chunming Zhou
User mode can query the timeline payload.
v2: check return value of copy_to_user
v3: handle querying entry by entry
v4: rebase on new chain container, simplify interface
v5: query the last signaled timeline point, not the last point.
v6: add check for unordered points

Signed-off-by: Chunming Zhou 
Cc: Tobias Hector 
Cc: Jason Ekstrand 
Cc: Dave Airlie 
Cc: Chris Wilson 
Cc: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c|  2 ++
 drivers/gpu/drm/drm_syncobj.c  | 62 ++
 include/uapi/drm/drm.h | 10 ++
 4 files changed, 76 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 331ac6225b58..695179bb88dc 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -188,6 +188,8 @@ int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private);
 
 /* drm_framebuffer.c */
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index c984654646fa..7a534c184e52 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -694,6 +694,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER|DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index eae51978cda4..0e62a793c8dd 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1064,3 +1064,65 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 
return ret;
 }
+
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private)
+{
+   struct drm_syncobj_timeline_array *args = data;
+   struct drm_syncobj **syncobjs;
+   uint64_t __user *points = u64_to_user_ptr(args->points);
+   uint32_t i;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -ENODEV;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   if (args->count_handles == 0)
+   return -EINVAL;
+
+   ret = drm_syncobj_array_find(file_private,
+u64_to_user_ptr(args->handles),
+args->count_handles,
+&syncobjs);
+   if (ret < 0)
+   return ret;
+
+   for (i = 0; i < args->count_handles; i++) {
+   struct dma_fence_chain *chain;
+   struct dma_fence *fence;
+   uint64_t point;
+
+   fence = drm_syncobj_fence_get(syncobjs[i]);
+   chain = to_dma_fence_chain(fence);
+   if (chain) {
+   struct dma_fence *iter, *last_signaled = NULL;
+
+   dma_fence_chain_for_each(iter, fence) {
+   if (!iter)
+   break;
+   dma_fence_put(last_signaled);
+   last_signaled = dma_fence_get(iter);
+   if (!to_dma_fence_chain(last_signaled)->prev_seqno)
+   /* It is most likely that timeline has
+* unorder points. */
+   break;
+   }
+   point = dma_fence_is_signaled(last_signaled) ?
+   last_signaled->seqno :
+   to_dma_fence_chain(last_signaled)->prev_seqno;
+   dma_fence_put(last_signaled);
+   } else {
+   point = 0;
+   }
+   ret = copy_to_user(&points[i], &point, sizeof(uint64_t));
+   ret = ret ? -EFAULT : 0;
+   if (ret)
+   break;
+   }
+   drm_syncobj_array_free(syncobjs, args->count_handles);
+
+   return ret;
+}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 44ebcdd9bd1d..c62be0840ba5 100644
--- a/
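
The archived diff is truncated at this point. One reading of the query loop above is that it reports the last timeline point that is signaled with nothing older still pending. The sketch below models that reading on a plain array ordered oldest to newest; struct sketch_point and sketch_query_payload are hypothetical names, and the real code walks a reference-counted fence chain rather than an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One timeline point in the simplified model. */
struct sketch_point {
    uint64_t seqno;
    bool signaled;
};

/*
 * points[] is ordered oldest to newest with increasing seqno.
 * Returns the seqno just before the oldest unsignaled point,
 * or 0 if even the oldest point is still pending.
 */
static uint64_t sketch_query_payload(const struct sketch_point *points,
                                     size_t count)
{
    uint64_t payload = 0;

    for (size_t i = 0; i < count; i++) {
        if (!points[i].signaled)
            break;
        payload = points[i].seqno;
    }
    return payload;
}
```

Under this model a timeline with points 1 and 2 signaled and point 5 pending reports a payload of 2.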

[PATCH 3/9] drm/syncobj: add support for timeline point wait v8

2019-03-25 Thread Chunming Zhou
The points array is a one-to-one match with the syncobjs array.
v2:
add a separate ioctl for timeline point wait, otherwise we break uapi.
v3:
userspace can specify two kinds of waits:
a. wait for the time point to be completed.
b. wait for the time point to become available.
v4:
rebase
v5:
add comment for xxx_WAIT_AVAILABLE
v6: rebase and rework on new container
v7: drop _WAIT_COMPLETED, it is the default anyway
v8: correctly handle garbage collected fences

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
Cc: Tobias Hector 
Cc: Jason Ekstrand 
Cc: Dave Airlie 
Cc: Chris Wilson 
Cc: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_internal.h |   2 +
 drivers/gpu/drm/drm_ioctl.c|   2 +
 drivers/gpu/drm/drm_syncobj.c  | 153 ++---
 include/uapi/drm/drm.h |  15 
 4 files changed, 143 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 251d67e04c2d..331ac6225b58 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -182,6 +182,8 @@ int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
 int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private);
 int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 687943df58e1..c984654646fa 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -688,6 +688,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, drm_syncobj_timeline_wait_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_RESET, drm_syncobj_reset_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 19a9ce638119..eae51978cda4 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -61,6 +61,7 @@ struct syncobj_wait_entry {
struct task_struct *task;
struct dma_fence *fence;
struct dma_fence_cb fence_cb;
+   u64    point;
 };
 
 static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
@@ -95,6 +96,8 @@ EXPORT_SYMBOL(drm_syncobj_find);
 static void drm_syncobj_fence_add_wait(struct drm_syncobj *syncobj,
   struct syncobj_wait_entry *wait)
 {
+   struct dma_fence *fence;
+
if (wait->fence)
return;
 
@@ -103,11 +106,15 @@ static void drm_syncobj_fence_add_wait(struct drm_syncobj *syncobj,
 * have the lock, try one more time just to be sure we don't add a
 * callback when a fence has already been set.
 */
-   if (syncobj->fence)
-   wait->fence = dma_fence_get(
-   rcu_dereference_protected(syncobj->fence, 1));
-   else
+   fence = dma_fence_get(rcu_dereference_protected(syncobj->fence, 1));
+   if (!fence || dma_fence_chain_find_seqno(&fence, wait->point)) {
+   dma_fence_put(fence);
list_add_tail(&wait->node, &syncobj->cb_list);
+   } else if (!fence) {
+   wait->fence = dma_fence_get_stub();
+   } else {
+   wait->fence = fence;
+   }
spin_unlock(&syncobj->lock);
 }
 
@@ -149,10 +156,8 @@ void drm_syncobj_add_point(struct drm_syncobj *syncobj,
dma_fence_chain_init(chain, prev, fence, point);
rcu_assign_pointer(syncobj->fence, &chain->base);
 
-   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
-   list_del_init(&cur->node);
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
syncobj_wait_syncobj_func(syncobj, cur);
-   }
spin_unlock(&syncobj->lock);
 
/* Walk the chain once to trigger garbage collection */
@@ -184,10 +189,8 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
rcu_assign_pointer(syncobj->fence, fence);
 
if (fence != old_fence) {
-   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
-   list_del_init(&cur->node);
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
syncobj_wait_syncobj_func(syncobj, cur);
-   }
}
 
spin_unlock(&syncobj->lock);
@@ -644,

[PATCH 7/9] drm/syncobj: add transition ioctls between binary and timeline v2

2019-03-19 Thread Chunming Zhou
We need to import/export timeline points.

v2: unify to one transfer ioctl

Signed-off-by: Chunming Zhou 
Cc: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_internal.h |  2 +
 drivers/gpu/drm/drm_ioctl.c|  2 +
 drivers/gpu/drm/drm_syncobj.c  | 74 ++
 include/uapi/drm/drm.h | 10 +
 4 files changed, 88 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 695179bb88dc..dd11ae5f1eef 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -180,6 +180,8 @@ int drm_syncobj_handle_to_fd_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
 int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_transfer_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file_private);
 int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
 int drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 7a534c184e52..92b3b7b2fd81 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -686,6 +686,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, drm_syncobj_fd_to_handle_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TRANSFER, drm_syncobj_transfer_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, drm_syncobj_timeline_wait_ioctl,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 087fd4e7eaf3..ee2d66e047e7 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -679,6 +679,80 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
&args->handle);
 }
 
+static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
+   struct drm_syncobj_transfer *args)
+{
+   struct drm_syncobj *timeline_syncobj = NULL;
+   struct dma_fence *fence;
+   struct dma_fence_chain *chain;
+   int ret;
+
+   timeline_syncobj = drm_syncobj_find(file_private, args->dst_handle);
+   if (!timeline_syncobj) {
+   return -ENOENT;
+   }
+   ret = drm_syncobj_find_fence(file_private, args->src_handle,
+args->src_point, args->flags,
+&fence);
+   if (ret)
+   goto err;
+   chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
+   if (!chain) {
+   ret = -ENOMEM;
+   goto err1;
+   }
+   drm_syncobj_add_point(timeline_syncobj, chain, fence, args->dst_point);
+err1:
+   dma_fence_put(fence);
+err:
+   drm_syncobj_put(timeline_syncobj);
+
+   return ret;
+}
+
+static int
+drm_syncobj_transfer_to_binary(struct drm_file *file_private,
+  struct drm_syncobj_transfer *args)
+{
+   struct drm_syncobj *binary_syncobj = NULL;
+   struct dma_fence *fence;
+   int ret;
+
+   binary_syncobj = drm_syncobj_find(file_private, args->dst_handle);
+   if (!binary_syncobj)
+   return -ENOENT;
+   ret = drm_syncobj_find_fence(file_private, args->src_handle,
+args->src_point, args->flags, &fence);
+   if (ret)
+   goto err;
+   drm_syncobj_replace_fence(binary_syncobj, fence);
+   dma_fence_put(fence);
+err:
+   drm_syncobj_put(binary_syncobj);
+
+   return ret;
+}
+int
+drm_syncobj_transfer_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file_private)
+{
+   struct drm_syncobj_transfer *args = data;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -ENODEV;
+
+   if (args->pad)
+   return -EINVAL;
+
+   if (args->dst_point)
+   ret = drm_syncobj_transfer_to_timeline(file_private, args);
+   else
+   ret = drm_syncobj_transfer_to_binary(file_private, args);
+
+   return ret;
+}
+
 static void syncobj_wait_fence_func(struct dma_fence *fence,
struct dma_fence_cb *cb)
 {
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index c62be0840ba5..e8d0d6b51875 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -735,6 +735,15 @@ struct drm_syncobj_handl
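
The uapi part of this diff is truncated in the archive. The dispatch in drm_syncobj_transfer_ioctl above can still be summarized in a small model: a non-zero padding field is rejected, a non-zero destination point selects the timeline path, and a zero destination point selects the binary path. The struct layout and all sketch_* names below are assumptions for illustration; only the field names are taken from the code above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum sketch_transfer_kind {
    SKETCH_TO_BINARY,
    SKETCH_TO_TIMELINE,
};

/* Field names follow the patch; the layout is an assumption. */
struct sketch_transfer_args {
    uint32_t src_handle;
    uint32_t dst_handle;
    uint64_t src_point;
    uint64_t dst_point;
    uint32_t flags;
    uint32_t pad;
};

/* Decide which transfer path the ioctl would take. */
static int sketch_transfer_select(const struct sketch_transfer_args *args,
                                  enum sketch_transfer_kind *kind)
{
    if (args->pad)
        return -EINVAL;  /* padding must be zero */

    *kind = args->dst_point ? SKETCH_TO_TIMELINE : SKETCH_TO_BINARY;
    return 0;
}
```

One consequence of keying on dst_point is that point 0 cannot be used as a timeline destination in this scheme, because it is reserved to mean "binary syncobj".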

[PATCH 6/9] drm/amdgpu: add timeline support in amdgpu CS v3

2019-03-19 Thread Chunming Zhou
syncobj wait/signal operations are appended to the command submission.
v2: separate to two kinds in/out_deps functions
v3: fix checking for timeline syncobj

Signed-off-by: Chunming Zhou 
Cc: Tobias Hector 
Cc: Jason Ekstrand 
Cc: Dave Airlie 
Cc: Chris Wilson 
Cc: Lionel Landwerlin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 152 +
 include/uapi/drm/amdgpu_drm.h  |   8 ++
 3 files changed, 144 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8d0d7f3dd5fb..deec2c796253 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -433,6 +433,12 @@ struct amdgpu_cs_chunk {
void*kdata;
 };
 
+struct amdgpu_cs_post_dep {
+   struct drm_syncobj *syncobj;
+   struct dma_fence_chain *chain;
+   u64 point;
+};
+
 struct amdgpu_cs_parser {
struct amdgpu_device*adev;
struct drm_file *filp;
@@ -462,8 +468,8 @@ struct amdgpu_cs_parser {
/* user fence */
struct amdgpu_bo_list_entry uf_entry;
 
-   unsigned num_post_dep_syncobjs;
-   struct drm_syncobj **post_dep_syncobjs;
+   unsigned            num_post_deps;
+   struct amdgpu_cs_post_dep   *post_deps;
 };
 
 static inline u32 amdgpu_get_ib_value(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 52a5e4fdc95b..2f6239b6be6f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -215,6 +215,8 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, union drm_amdgpu_cs
case AMDGPU_CHUNK_ID_SYNCOBJ_IN:
case AMDGPU_CHUNK_ID_SYNCOBJ_OUT:
case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL:
break;
 
default:
@@ -804,9 +806,11 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser, int error,
ttm_eu_backoff_reservation(&parser->ticket,
   &parser->validated);
 
-   for (i = 0; i < parser->num_post_dep_syncobjs; i++)
-   drm_syncobj_put(parser->post_dep_syncobjs[i]);
-   kfree(parser->post_dep_syncobjs);
+   for (i = 0; i < parser->num_post_deps; i++) {
+   drm_syncobj_put(parser->post_deps[i].syncobj);
+   kfree(parser->post_deps[i].chain);
+   }
+   kfree(parser->post_deps);
 
dma_fence_put(parser->fence);
 
@@ -1117,13 +1121,18 @@ static int amdgpu_cs_process_fence_dep(struct amdgpu_cs_parser *p,
 }
 
 static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
-uint32_t handle)
+uint32_t handle, u64 point,
+u64 flags)
 {
-   int r;
struct dma_fence *fence;
-   r = drm_syncobj_find_fence(p->filp, handle, 0, 0, &fence);
-   if (r)
+   int r;
+
+   r = drm_syncobj_find_fence(p->filp, handle, point, flags, &fence);
+   if (r) {
+   DRM_ERROR("syncobj %u failed to find fence @ %llu (%d)!\n",
+ handle, point, r);
return r;
+   }
 
r = amdgpu_sync_fence(p->adev, &p->job->sync, fence, true);
dma_fence_put(fence);
@@ -1134,46 +1143,118 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
 static int amdgpu_cs_process_syncobj_in_dep(struct amdgpu_cs_parser *p,
struct amdgpu_cs_chunk *chunk)
 {
+   struct drm_amdgpu_cs_chunk_sem *deps;
unsigned num_deps;
int i, r;
-   struct drm_amdgpu_cs_chunk_sem *deps;
 
deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
num_deps = chunk->length_dw * 4 /
sizeof(struct drm_amdgpu_cs_chunk_sem);
+   for (i = 0; i < num_deps; ++i) {
+   r = amdgpu_syncobj_lookup_and_add_to_sync(p, deps[i].handle,
+ 0, 0);
+   if (r)
+   return r;
+   }
+
+   return 0;
+}
+
 
+static int amdgpu_cs_process_syncobj_timeline_in_dep(struct amdgpu_cs_parser *p,
+struct amdgpu_cs_chunk *chunk)
+{
+   struct drm_amdgpu_cs_chunk_syncobj *syncobj_deps;
+   unsigned num_deps;
+   int i, r;
+
+   syncobj_deps = (struct drm_amdgpu_cs_chunk_syncobj *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_syncobj);
for (i = 0; i < num_deps; ++i

[PATCH 9/9] drm/amdgpu: update version for timeline syncobj support in amdgpu

2019-03-19 Thread Chunming Zhou
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 8a0732088640..4d8db87048d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -74,9 +74,10 @@
  * - 3.28.0 - Add AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES
  * - 3.29.0 - Add AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID
  * - 3.30.0 - Add AMDGPU_SCHED_OP_CONTEXT_PRIORITY_OVERRIDE.
+ * - 3.31.0 - Add syncobj timeline support to AMDGPU_CS.
  */
 #define KMS_DRIVER_MAJOR   3
-#define KMS_DRIVER_MINOR   30
+#define KMS_DRIVER_MINOR   31
 #define KMS_DRIVER_PATCHLEVEL  0
 
 int amdgpu_vram_limit = 0;
-- 
2.17.1

[PATCH 3/9] drm/syncobj: add support for timeline point wait v8

2019-03-19 Thread Chunming Zhou
The points array is a one-to-one match with the syncobjs array.
v2:
add a separate ioctl for timeline point wait, otherwise we break uapi.
v3:
userspace can specify two kinds of waits:
a. wait for the time point to be completed.
b. wait for the time point to become available.
v4:
rebase
v5:
add comment for xxx_WAIT_AVAILABLE
v6: rebase and rework on new container
v7: drop _WAIT_COMPLETED, it is the default anyway
v8: correctly handle garbage collected fences

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
Cc: Tobias Hector 
Cc: Jason Ekstrand 
Cc: Dave Airlie 
Cc: Chris Wilson 
Cc: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_internal.h |   2 +
 drivers/gpu/drm/drm_ioctl.c|   2 +
 drivers/gpu/drm/drm_syncobj.c  | 153 ++---
 include/uapi/drm/drm.h |  15 
 4 files changed, 143 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 251d67e04c2d..331ac6225b58 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -182,6 +182,8 @@ int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
 int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private);
 int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 687943df58e1..c984654646fa 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -688,6 +688,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, drm_syncobj_timeline_wait_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_RESET, drm_syncobj_reset_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 19a9ce638119..eae51978cda4 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -61,6 +61,7 @@ struct syncobj_wait_entry {
struct task_struct *task;
struct dma_fence *fence;
struct dma_fence_cb fence_cb;
+   u64    point;
 };
 
 static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
@@ -95,6 +96,8 @@ EXPORT_SYMBOL(drm_syncobj_find);
 static void drm_syncobj_fence_add_wait(struct drm_syncobj *syncobj,
   struct syncobj_wait_entry *wait)
 {
+   struct dma_fence *fence;
+
if (wait->fence)
return;
 
@@ -103,11 +106,15 @@ static void drm_syncobj_fence_add_wait(struct drm_syncobj *syncobj,
 * have the lock, try one more time just to be sure we don't add a
 * callback when a fence has already been set.
 */
-   if (syncobj->fence)
-   wait->fence = dma_fence_get(
-   rcu_dereference_protected(syncobj->fence, 1));
-   else
+   fence = dma_fence_get(rcu_dereference_protected(syncobj->fence, 1));
+   if (!fence || dma_fence_chain_find_seqno(&fence, wait->point)) {
+   dma_fence_put(fence);
list_add_tail(&wait->node, &syncobj->cb_list);
+   } else if (!fence) {
+   wait->fence = dma_fence_get_stub();
+   } else {
+   wait->fence = fence;
+   }
spin_unlock(&syncobj->lock);
 }
 
@@ -149,10 +156,8 @@ void drm_syncobj_add_point(struct drm_syncobj *syncobj,
dma_fence_chain_init(chain, prev, fence, point);
rcu_assign_pointer(syncobj->fence, &chain->base);
 
-   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
-   list_del_init(&cur->node);
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
syncobj_wait_syncobj_func(syncobj, cur);
-   }
spin_unlock(&syncobj->lock);
 
/* Walk the chain once to trigger garbage collection */
@@ -184,10 +189,8 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
rcu_assign_pointer(syncobj->fence, fence);
 
if (fence != old_fence) {
-   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
-   list_del_init(&cur->node);
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
syncobj_wait_syncobj_func(syncobj, cur);
-   }
}
 
spin_unlock(&syncobj->lock);
@@ -644,

[PATCH 8/9] drm/syncobj: add timeline signal ioctl for syncobj v4

2019-03-19 Thread Chunming Zhou
v2: allocate each chain node individually, since chain nodes are freed independently.
v3: all existing points must already be signaled before the CPU performs a signal
operation, so add a check for that.
v4: remove the v3 change and add a check to prevent out-of-order points

Signed-off-by: Chunming Zhou 
Cc: Tobias Hector 
Cc: Jason Ekstrand 
Cc: Dave Airlie 
Cc: Chris Wilson 
Cc: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_internal.h |  2 +
 drivers/gpu/drm/drm_ioctl.c|  2 +
 drivers/gpu/drm/drm_syncobj.c  | 93 ++
 include/uapi/drm/drm.h |  1 +
 4 files changed, 98 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index dd11ae5f1eef..d9a483a5fce0 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -190,6 +190,8 @@ int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_private);
 int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 92b3b7b2fd81..d337f161909c 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -696,6 +696,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, drm_syncobj_timeline_signal_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index ee2d66e047e7..a3702c75fd1e 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1183,6 +1183,99 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
return ret;
 }
 
+int
+drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_private)
+{
+   struct drm_syncobj_timeline_array *args = data;
+   struct drm_syncobj **syncobjs;
+   struct dma_fence_chain **chains;
+   uint64_t *points;
+   uint32_t i, j, timeline_count = 0;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -EOPNOTSUPP;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   if (args->count_handles == 0)
+   return -EINVAL;
+
+   ret = drm_syncobj_array_find(file_private,
+u64_to_user_ptr(args->handles),
+args->count_handles,
+&syncobjs);
+   if (ret < 0)
+   return ret;
+
+   points = kmalloc_array(args->count_handles, sizeof(*points),
+  GFP_KERNEL);
+   if (!points) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   if (!u64_to_user_ptr(args->points)) {
+   memset(points, 0, args->count_handles * sizeof(uint64_t));
+   } else if (copy_from_user(points, u64_to_user_ptr(args->points),
+ sizeof(uint64_t) * args->count_handles)) {
+   ret = -EFAULT;
+   goto err_points;
+   }
+
+   for (i = 0; i < args->count_handles; i++) {
+   struct dma_fence_chain *chain;
+   struct dma_fence *fence;
+
+   fence = drm_syncobj_fence_get(syncobjs[i]);
+   chain = to_dma_fence_chain(fence);
+   if (chain) {
+   if (points[i] <= fence->seqno) {
+   DRM_ERROR("signal point cannot be out-of-order!\n");
+   ret = -EPERM;
+   goto err_points;
+   }
+   }
+   if (points[i])
+   timeline_count++;
+   }
+
+   chains = kmalloc_array(timeline_count, sizeof(void *), GFP_KERNEL);
+   if (!chains) {
+   ret = -ENOMEM;
+   goto err_points;
+   }
+   for (i = 0; i < timeline_count; i++) {
+   chains[i] = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
+   if (!chains[i]) {
+   for (j = 0; j < i; j++)
+
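
The archived patch is truncated here. The validation pass shown above, which rejects a signal point that does not advance an existing timeline and counts the non-zero points that need a chain node, can be modeled as follows. This is a sketch under assumptions: struct sketch_syncobj_state and sketch_validate_signal_points are hypothetical names, and has_chain stands in for the to_dma_fence_chain() check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-syncobj state in the simplified model. */
struct sketch_syncobj_state {
    bool has_chain;       /* syncobj already holds a timeline chain */
    uint64_t last_seqno;  /* newest point on that chain */
};

/*
 * Reject any point that does not move its timeline forward and
 * count how many entries carry a non-zero (timeline) point.
 */
static int sketch_validate_signal_points(const struct sketch_syncobj_state *objs,
                                         const uint64_t *points, size_t count,
                                         size_t *timeline_count)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++) {
        if (objs[i].has_chain && points[i] <= objs[i].last_seqno)
            return -EPERM;  /* out-of-order signal point */
        if (points[i])
            n++;
    }

    *timeline_count = n;
    return 0;
}
```

Running the check over all handles before allocating anything keeps the error path simple, since nothing has to be unwound when a single point is rejected.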

[PATCH 5/9] drm/syncobj: use the timeline point in drm_syncobj_find_fence v4

2019-03-19 Thread Chunming Zhou
From: Christian König 

Implement finding the right timeline point in drm_syncobj_find_fence.

v2: return -EINVAL when the point is not submitted yet.
v3: fix reference counting bug, add flags handling as well
v4: add timeout for find fence

Signed-off-by: Christian König 
Cc: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_syncobj.c | 50 ---
 1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 0e62a793c8dd..087fd4e7eaf3 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -213,6 +213,8 @@ static void drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
dma_fence_put(fence);
 }
 
+/* 5s default for wait submission */
+#define DRM_SYNCOBJ_WAIT_FOR_SUBMIT_TIMEOUT 5000000000ULL
 /**
  * drm_syncobj_find_fence - lookup and reference the fence in a sync object
  * @file_private: drm file private pointer
@@ -233,16 +235,58 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
   struct dma_fence **fence)
 {
struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
-   int ret = 0;
+   struct syncobj_wait_entry wait;
+   u64 timeout = nsecs_to_jiffies64(DRM_SYNCOBJ_WAIT_FOR_SUBMIT_TIMEOUT);
+   int ret;
 
if (!syncobj)
return -ENOENT;
 
*fence = drm_syncobj_fence_get(syncobj);
-   if (!*fence) {
+   drm_syncobj_put(syncobj);
+
+   if (*fence) {
+   ret = dma_fence_chain_find_seqno(fence, point);
+   if (!ret)
+   return 0;
+   dma_fence_put(*fence);
+   } else {
ret = -EINVAL;
}
-   drm_syncobj_put(syncobj);
+
+   if (!(flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
+   return ret;
+
+   memset(&wait, 0, sizeof(wait));
+   wait.task = current;
+   wait.point = point;
+   drm_syncobj_fence_add_wait(syncobj, &wait);
+
+   do {
+   set_current_state(TASK_INTERRUPTIBLE);
+   if (wait.fence) {
+   ret = 0;
+   break;
+   }
+   if (timeout == 0) {
+   ret = -ETIME;
+   break;
+   }
+
+   if (signal_pending(current)) {
+   ret = -ERESTARTSYS;
+   break;
+   }
+
+   timeout = schedule_timeout(timeout);
+   } while (1);
+
+   __set_current_state(TASK_RUNNING);
+   *fence = wait.fence;
+
+   if (wait.node.next)
+   drm_syncobj_remove_wait(syncobj, &wait);
+
return ret;
 }
 EXPORT_SYMBOL(drm_syncobj_find_fence);
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
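
The point lookup described in this patch can be sketched in user space. This is a minimal model under stated assumptions, not the kernel code: struct point_node and find_point() are hypothetical names, and reference counting, locking and the wait-for-submit path are left out. It only shows how a requested point maps to the node that will signal it, and when -EINVAL is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a chain node: one timeline point that links
 * back to the previously added (older) point. */
struct point_node {
	uint64_t seqno;          /* timeline point this node signals */
	struct point_node *prev; /* older node, NULL at the start */
};

/*
 * Starting from the newest node, move *pnode to the oldest node that still
 * covers `point`, i.e. the oldest node with seqno >= point.
 * point == 0 means no specific point was requested, so *pnode is kept.
 * Returns 0 on success or -EINVAL if the point has not been added yet.
 */
static int find_point(struct point_node **pnode, uint64_t point)
{
	struct point_node *node;

	assert(pnode != NULL);
	node = *pnode;

	if (!point)
		return 0;
	if (!node || node->seqno < point)
		return -EINVAL;

	/* Walk towards older points while they still cover the request. */
	while (node->prev && node->prev->seqno >= point)
		node = node->prev;

	*pnode = node;
	return 0;
}
```

With points 2, 5 and 9 on the timeline, a request for point 4 ends up at the node for 5, while a request for point 10 fails because nothing that large has been submitted.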

[PATCH 4/9] drm/syncobj: add timeline payload query ioctl v6

2019-03-19 Thread Chunming Zhou
User mode can query the timeline payload.
v2: check return value of copy_to_user
v3: handle querying entry by entry
v4: rebase on new chain container, simplify interface
v5: query the last signaled timeline point, not the last point.
v6: add unordered point check

Signed-off-by: Chunming Zhou 
Cc: Tobias Hector 
Cc: Jason Ekstrand 
Cc: Dave Airlie 
Cc: Chris Wilson 
Cc: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c|  2 ++
 drivers/gpu/drm/drm_syncobj.c  | 62 ++
 include/uapi/drm/drm.h | 10 ++
 4 files changed, 76 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 331ac6225b58..695179bb88dc 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -188,6 +188,8 @@ int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private);
 
 /* drm_framebuffer.c */
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index c984654646fa..7a534c184e52 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -694,6 +694,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER|DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index eae51978cda4..0e62a793c8dd 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1064,3 +1064,65 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 
return ret;
 }
+
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private)
+{
+   struct drm_syncobj_timeline_array *args = data;
+   struct drm_syncobj **syncobjs;
+   uint64_t __user *points = u64_to_user_ptr(args->points);
+   uint32_t i;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -ENODEV;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   if (args->count_handles == 0)
+   return -EINVAL;
+
+   ret = drm_syncobj_array_find(file_private,
+u64_to_user_ptr(args->handles),
+args->count_handles,
+&syncobjs);
+   if (ret < 0)
+   return ret;
+
+   for (i = 0; i < args->count_handles; i++) {
+   struct dma_fence_chain *chain;
+   struct dma_fence *fence;
+   uint64_t point;
+
+   fence = drm_syncobj_fence_get(syncobjs[i]);
+   chain = to_dma_fence_chain(fence);
+   if (chain) {
+   struct dma_fence *iter, *last_signaled = NULL;
+
+   dma_fence_chain_for_each(iter, fence) {
+   if (!iter)
+   break;
+   dma_fence_put(last_signaled);
+   last_signaled = dma_fence_get(iter);
+   if (!to_dma_fence_chain(last_signaled)->prev_seqno)
+   /* It is most likely that the timeline has
+* unordered points. */
+   break;
+   }
+   point = dma_fence_is_signaled(last_signaled) ?
+   last_signaled->seqno :
+   to_dma_fence_chain(last_signaled)->prev_seqno;
+   dma_fence_put(last_signaled);
+   } else {
+   point = 0;
+   }
+   ret = copy_to_user(&points[i], &point, sizeof(uint64_t));
+   ret = ret ? -EFAULT : 0;
+   if (ret)
+   break;
+   }
+   drm_syncobj_array_free(syncobjs, args->count_handles);
+
+   return ret;
+}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 44ebcdd9bd1d..c62be0840ba5 100644
--- a/
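
The v5 note above (report the last signaled point, not the last point added) can be illustrated with a small user-space model. This is a sketch under stated assumptions: struct timeline_point and query_payload() are hypothetical, the points are kept in a plain array ordered from oldest to newest, and points are assumed to signal in order. The kernel patch walks a dma_fence_chain instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one timeline point. */
struct timeline_point {
	uint64_t seqno; /* point value */
	bool signaled;  /* has the underlying fence signaled? */
};

/*
 * Return the payload of a timeline: the seqno of the newest point that has
 * signaled, or 0 if nothing has signaled yet. `pts` is ordered from oldest
 * to newest with strictly increasing seqno.
 */
static uint64_t query_payload(const struct timeline_point *pts, size_t n)
{
	uint64_t payload = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* The model only handles ordered timelines. */
		assert(i == 0 || pts[i].seqno > pts[i - 1].seqno);
		if (!pts[i].signaled)
			break;
		payload = pts[i].seqno;
	}
	return payload;
}
```

A timeline holding points 1 and 3 (signaled) and 7 (pending) therefore reports 3 rather than 7.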

[PATCH 1/9] dma-buf: add new dma_fence_chain container v6

2019-03-19 Thread Chunming Zhou
From: Christian König 

Lockless container implementation similar to a dma_fence_array, but with
only two elements per node and automatic garbage collection.

v2: properly document dma_fence_chain_for_each, add dma_fence_chain_find_seqno,
drop prev reference during garbage collection if it's not a chain fence.
v3: use head and iterator for dma_fence_chain_for_each
v4: fix reference count in dma_fence_chain_enable_signaling
v5: fix iteration when walking each chain node
v6: add __rcu for member 'prev' of struct chain node

Signed-off-by: Christian König 
Cc: Lionel Landwerlin 
---
 drivers/dma-buf/Makefile  |   3 +-
 drivers/dma-buf/dma-fence-chain.c | 241 ++
 include/linux/dma-fence-chain.h   |  81 ++
 3 files changed, 324 insertions(+), 1 deletion(-)
 create mode 100644 drivers/dma-buf/dma-fence-chain.c
 create mode 100644 include/linux/dma-fence-chain.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 0913a6ccab5a..1f006e083eb9 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,4 +1,5 @@
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-fence.o
+obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
+reservation.o seqno-fence.o
 obj-$(CONFIG_SYNC_FILE)+= sync_file.o
 obj-$(CONFIG_SW_SYNC)  += sw_sync.o sync_debug.o
 obj-$(CONFIG_UDMABUF)  += udmabuf.o
diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
new file mode 100644
index 000000000000..0c5e3c902fa0
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -0,0 +1,241 @@
+/*
+ * fence-chain: chain fences together in a timeline
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ * Authors:
+ * Christian König 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/dma-fence-chain.h>
+
+static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
+
+/**
+ * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
+ * @chain: chain node to get the previous node from
+ *
+ * Use dma_fence_get_rcu_safe to get a reference to the previous fence of the
+ * chain node.
+ */
+static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
+{
+   struct dma_fence *prev;
+
+   rcu_read_lock();
+   prev = dma_fence_get_rcu_safe(&chain->prev);
+   rcu_read_unlock();
+   return prev;
+}
+
+/**
+ * dma_fence_chain_walk - chain walking function
+ * @fence: current chain node
+ *
+ * Walk the chain to the next node. Returns the next fence or NULL if we are at
+ * the end of the chain. Garbage collects chain nodes which are already
+ * signaled.
+ */
+struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
+{
+   struct dma_fence_chain *chain, *prev_chain;
+   struct dma_fence *prev, *replacement, *tmp;
+
+   chain = to_dma_fence_chain(fence);
+   if (!chain) {
+   dma_fence_put(fence);
+   return NULL;
+   }
+
+   while ((prev = dma_fence_chain_get_prev(chain))) {
+
+   prev_chain = to_dma_fence_chain(prev);
+   if (prev_chain) {
+   if (!dma_fence_is_signaled(prev_chain->fence))
+   break;
+
+   replacement = dma_fence_chain_get_prev(prev_chain);
+   } else {
+   if (!dma_fence_is_signaled(prev))
+   break;
+
+   replacement = NULL;
+   }
+
+   tmp = cmpxchg(&chain->prev, prev, replacement);
+   if (tmp == prev)
+   dma_fence_put(tmp);
+   else
+   dma_fence_put(replacement);
+   dma_fence_put(prev);
+   }
+
+   dma_fence_put(fence);
+   return prev;
+}
+EXPORT_SYMBOL(dma_fence_chain_walk);
+
+/**
+ * dma_fence_chain_find_seqno - find fence chain node by seqno
+ * @pfence: pointer to the chain node where to start
+ * @seqno: the sequence number to search for
+ *
+ * Advance the fence pointer to the chain node which will signal this sequence
+ * number. If no sequence number is provided then this is a no-op.
+ *
+ * Returns EINVAL if the fence is not a chain node or the sequence number has
+ * not yet advanced far enough.
+ */
+int dma_fence_chain_find_seqno(struct dma_fence **pfence, uint64_t seqno)
+{
+   struct dma_fence_chain *chain;
+
+   if (!seqno)
+   return 0;
+
+   chain = to_dma_fence_chain(*pfence);
+   if (!chain || chain->base.seqno < seqno)
+  
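
The garbage collection done while walking the chain can be sketched in a single-threaded user-space model. This is only an illustration under stated assumptions: struct chain_node and chain_walk() are hypothetical, and the reference counting, RCU protection and cmpxchg() used by the patch are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified chain node: no refcount, no RCU. */
struct chain_node {
	struct chain_node *prev; /* next older node, NULL at the end */
	bool signaled;           /* has this node's fence signaled? */
};

/*
 * Step from `node` to the next older node that has not signaled yet.
 * Signaled nodes found on the way are unlinked from the chain, which is
 * the garbage collection part. Returns NULL at the end of the chain.
 */
static struct chain_node *chain_walk(struct chain_node *node)
{
	assert(node != NULL);

	while (node->prev && node->prev->signaled)
		node->prev = node->prev->prev; /* drop the signaled node */

	return node->prev;
}
```

In the kernel version the unlink has to be done with cmpxchg() because several walkers may collect the same node concurrently; the model ignores that.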

[PATCH 2/9] drm/syncobj: add new drm_syncobj_add_point interface v4

2019-03-19 Thread Chunming Zhou
From: Christian König 

Use the dma_fence_chain object to create a timeline of fence objects
instead of just replacing the existing fence.

v2: rebase and cleanup
v3: fix garbage collection parameters
v4: add unordered point check, print a warning call trace

Signed-off-by: Christian König 
Cc: Lionel Landwerlin 
---
 drivers/gpu/drm/drm_syncobj.c | 39 +++
 include/drm/drm_syncobj.h |  5 +
 2 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 5329e66598c6..19a9ce638119 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -122,6 +122,45 @@ static void drm_syncobj_remove_wait(struct drm_syncobj *syncobj,
spin_unlock(&syncobj->lock);
 }
 
+/**
+ * drm_syncobj_add_point - add new timeline point to the syncobj
+ * @syncobj: sync object to add timeline point to
+ * @chain: chain node to use to add the point
+ * @fence: fence to encapsulate in the chain node
+ * @point: sequence number to use for the point
+ *
+ * Add the chain node as new timeline point to the syncobj.
+ */
+void drm_syncobj_add_point(struct drm_syncobj *syncobj,
+  struct dma_fence_chain *chain,
+  struct dma_fence *fence,
+  uint64_t point)
+{
+   struct syncobj_wait_entry *cur, *tmp;
+   struct dma_fence *prev;
+
+   dma_fence_get(fence);
+
+   spin_lock(&syncobj->lock);
+
+   prev = drm_syncobj_fence_get(syncobj);
+   /* You are adding an unordered point to the timeline, which could cause
+    * the payload returned from query_ioctl to be 0! */
+   WARN_ON_ONCE(prev && prev->seqno >= point);
+   dma_fence_chain_init(chain, prev, fence, point);
+   rcu_assign_pointer(syncobj->fence, &chain->base);
+
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
+   list_del_init(&cur->node);
+   syncobj_wait_syncobj_func(syncobj, cur);
+   }
+   spin_unlock(&syncobj->lock);
+
+   /* Walk the chain once to trigger garbage collection */
+   dma_fence_chain_for_each(fence, prev);
+   dma_fence_put(prev);
+}
+EXPORT_SYMBOL(drm_syncobj_add_point);
+
 /**
  * drm_syncobj_replace_fence - replace fence in a sync object.
  * @syncobj: Sync object to replace fence in
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 0311c9fdbd2f..6cf7243a1dc5 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -27,6 +27,7 @@
 #define __DRM_SYNCOBJ_H__
 
 #include <linux/dma-fence.h>
+#include <linux/dma-fence-chain.h>
 
 struct drm_file;
 
@@ -112,6 +113,10 @@ drm_syncobj_fence_get(struct drm_syncobj *syncobj)
 
 struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
 u32 handle);
+void drm_syncobj_add_point(struct drm_syncobj *syncobj,
+  struct dma_fence_chain *chain,
+  struct dma_fence *fence,
+  uint64_t point);
 void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
   struct dma_fence *fence);
 int drm_syncobj_find_fence(struct drm_file *file_private,
-- 
2.17.1
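
The ordering check that this patch adds can be modelled in user space. This is a rough sketch under stated assumptions: struct tl_node, struct timeline and timeline_add_point() are hypothetical, there is no locking or waiter wake-up, and the function simply reports an unordered point through its return value where the patch uses WARN_ON_ONCE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types, not the kernel structures. */
struct tl_node {
	uint64_t seqno;
	struct tl_node *prev; /* previously added point */
};

struct timeline {
	struct tl_node *head; /* newest point, NULL when empty */
};

/*
 * Link `node` in as the newest point of the timeline. The point is always
 * added; the return value only says whether it was in order, i.e. strictly
 * greater than the previous newest point.
 */
static bool timeline_add_point(struct timeline *tl, struct tl_node *node,
			       uint64_t point)
{
	bool ordered;

	assert(tl != NULL && node != NULL);

	ordered = !(tl->head && tl->head->seqno >= point);
	node->seqno = point;
	node->prev = tl->head;
	tl->head = node;
	return ordered;
}
```

Adding 3, then 8, then 8 again links all three nodes, but the last call returns false because the timeline did not advance.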


[PATCH 8/9] drm/syncobj: add timeline signal ioctl for syncobj v3

2019-03-15 Thread Chunming Zhou
v2: allocate each chain node individually, since chain nodes are freed independently.
v3: all existing points must already be signaled before the CPU performs a signal
operation, so add a check for that.

Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/drm_internal.h |   2 +
 drivers/gpu/drm/drm_ioctl.c|   2 +
 drivers/gpu/drm/drm_syncobj.c  | 103 +
 include/uapi/drm/drm.h |   1 +
 4 files changed, 108 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index dd11ae5f1eef..d9a483a5fce0 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -190,6 +190,8 @@ int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_private);
 int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 92b3b7b2fd81..d337f161909c 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -696,6 +696,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, drm_syncobj_timeline_signal_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 306c7b7e2770..eaeb038f97d7 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1183,6 +1183,109 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
return ret;
 }
 
+int
+drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_private)
+{
+   struct drm_syncobj_timeline_array *args = data;
+   struct drm_syncobj **syncobjs;
+   struct dma_fence_chain **chains;
+   uint64_t *points;
+   uint32_t i, j, timeline_count = 0;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -EOPNOTSUPP;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   if (args->count_handles == 0)
+   return -EINVAL;
+
+   ret = drm_syncobj_array_find(file_private,
+u64_to_user_ptr(args->handles),
+args->count_handles,
+&syncobjs);
+   if (ret < 0)
+   return ret;
+
+   for (i = 0; i < args->count_handles; i++) {
+   struct dma_fence_chain *chain;
+   struct dma_fence *fence;
+
+   fence = drm_syncobj_fence_get(syncobjs[i]);
+   chain = to_dma_fence_chain(fence);
+   if (chain) {
+   struct dma_fence *iter;
+
+   dma_fence_chain_for_each(iter, fence) {
+   if (!iter)
+   break;
+   if (!dma_fence_is_signaled(iter)) {
+   dma_fence_put(iter);
+   DRM_ERROR("Client must guarantee all existing timeline points signaled before performing host signal operation!");
+   ret = -EPERM;
+   goto out;
+   }
+   }
+   }
+   }
+
+   points = kmalloc_array(args->count_handles, sizeof(*points),
+  GFP_KERNEL);
+   if (!points) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   if (!u64_to_user_ptr(args->points)) {
+   memset(points, 0, args->count_handles * sizeof(uint64_t));
+   } else if (copy_from_user(points, u64_to_user_ptr(args->points),
+ sizeof(uint64_t) * args->count_handles)) {
+   ret = -EFAULT;
+   goto err_points;
+   }
+
+
+   for (i = 0; i < args->count_handles; i++) {
+   if (points[i])
+   timeline_count++;
+   }
+   chains = kmalloc_array(timeline_count, sizeof(void *), GFP_KERNEL);
+   if (!chains) {
+ 

[PATCH 6/9] drm/amdgpu: add timeline support in amdgpu CS v3

2019-03-15 Thread Chunming Zhou
Syncobj wait/signal operations are appended to command submission.
v2: separate into two kinds of in/out_deps functions
v3: fix checking for timeline syncobj

Signed-off-by: Chunming Zhou 
Cc: Daniel Rakos 
Cc: Jason Ekstrand 
Cc: Bas Nieuwenhuizen 
Cc: Dave Airlie 
Cc: Christian König 
Cc: Chris Wilson 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 152 +
 include/uapi/drm/amdgpu_drm.h  |   8 ++
 3 files changed, 144 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8d0d7f3dd5fb..deec2c796253 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -433,6 +433,12 @@ struct amdgpu_cs_chunk {
void *kdata;
 };
 
+struct amdgpu_cs_post_dep {
+   struct drm_syncobj *syncobj;
+   struct dma_fence_chain *chain;
+   u64 point;
+};
+
 struct amdgpu_cs_parser {
struct amdgpu_device *adev;
struct drm_file *filp;
@@ -462,8 +468,8 @@ struct amdgpu_cs_parser {
/* user fence */
struct amdgpu_bo_list_entry uf_entry;
 
-   unsigned num_post_dep_syncobjs;
-   struct drm_syncobj **post_dep_syncobjs;
+   unsigned num_post_deps;
+   struct amdgpu_cs_post_dep   *post_deps;
 };
 
 static inline u32 amdgpu_get_ib_value(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 52a5e4fdc95b..2f6239b6be6f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -215,6 +215,8 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, union drm_amdgpu_cs
case AMDGPU_CHUNK_ID_SYNCOBJ_IN:
case AMDGPU_CHUNK_ID_SYNCOBJ_OUT:
case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL:
break;
 
default:
@@ -804,9 +806,11 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser, int error,
ttm_eu_backoff_reservation(&parser->ticket,
   &parser->validated);
 
-   for (i = 0; i < parser->num_post_dep_syncobjs; i++)
-   drm_syncobj_put(parser->post_dep_syncobjs[i]);
-   kfree(parser->post_dep_syncobjs);
+   for (i = 0; i < parser->num_post_deps; i++) {
+   drm_syncobj_put(parser->post_deps[i].syncobj);
+   kfree(parser->post_deps[i].chain);
+   }
+   kfree(parser->post_deps);
 
dma_fence_put(parser->fence);
 
@@ -1117,13 +1121,18 @@ static int amdgpu_cs_process_fence_dep(struct amdgpu_cs_parser *p,
 }
 
 static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
-uint32_t handle)
+uint32_t handle, u64 point,
+u64 flags)
 {
-   int r;
struct dma_fence *fence;
-   r = drm_syncobj_find_fence(p->filp, handle, 0, 0, &fence);
-   if (r)
+   int r;
+
+   r = drm_syncobj_find_fence(p->filp, handle, point, flags, &fence);
+   if (r) {
+   DRM_ERROR("syncobj %u failed to find fence @ %llu (%d)!\n",
+ handle, point, r);
return r;
+   }
 
r = amdgpu_sync_fence(p->adev, &p->job->sync, fence, true);
dma_fence_put(fence);
@@ -1134,46 +1143,118 @@ static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
 static int amdgpu_cs_process_syncobj_in_dep(struct amdgpu_cs_parser *p,
struct amdgpu_cs_chunk *chunk)
 {
+   struct drm_amdgpu_cs_chunk_sem *deps;
unsigned num_deps;
int i, r;
-   struct drm_amdgpu_cs_chunk_sem *deps;
 
deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
num_deps = chunk->length_dw * 4 /
sizeof(struct drm_amdgpu_cs_chunk_sem);
+   for (i = 0; i < num_deps; ++i) {
+   r = amdgpu_syncobj_lookup_and_add_to_sync(p, deps[i].handle,
+ 0, 0);
+   if (r)
+   return r;
+   }
+
+   return 0;
+}
+
 
+static int amdgpu_cs_process_syncobj_timeline_in_dep(struct amdgpu_cs_parser *p,
+struct amdgpu_cs_chunk *chunk)
+{
+   struct drm_amdgpu_cs_chunk_syncobj *syncobj_deps;
+   unsigned num_deps;
+   int i, r;
+
+   syncobj_deps = (struct drm_amdgpu_cs_chunk_syncobj *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_syncobj);
for (i = 0; i 
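
The num_deps computation used by both in-dependency functions above converts a chunk length given in 32-bit dwords into a number of fixed-size entries. A small user-space sketch of that arithmetic follows; chunk_num_deps() is a hypothetical helper, and the entry size is passed in rather than taken from the real uapi structs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of whole entries of `entry_size` bytes that fit into a chunk of
 * `length_dw` dwords (4 bytes each). A trailing partial entry is ignored,
 * matching the integer division in the patch.
 */
static unsigned int chunk_num_deps(uint32_t length_dw, size_t entry_size)
{
	assert(entry_size != 0);

	return (unsigned int)((size_t)length_dw * 4 / entry_size);
}
```

For example, a chunk of 8 dwords holds two 16-byte entries, and a chunk of 3 dwords holds only one 8-byte entry.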

[PATCH 5/9] drm/syncobj: use the timeline point in drm_syncobj_find_fence v3

2019-03-15 Thread Chunming Zhou
From: Christian König 

Implement finding the right timeline point in drm_syncobj_find_fence.

v2: return -EINVAL when the point is not submitted yet.
v3: fix reference counting bug, add flags handling as well
v4: add timeout for find fence

Signed-off-by: Christian König 
---
 drivers/gpu/drm/drm_syncobj.c | 50 ---
 1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 0e62a793c8dd..dd19c47d0b44 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -213,6 +213,8 @@ static void drm_syncobj_assign_null_handle(struct drm_syncobj *syncobj)
dma_fence_put(fence);
 }
 
+/* 5s, in nanoseconds */
+#define DRM_SYNCOBJ_WAIT_FOR_SUBMIT_TIMEOUT 5000000000
 /**
  * drm_syncobj_find_fence - lookup and reference the fence in a sync object
  * @file_private: drm file private pointer
@@ -233,16 +235,58 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
   struct dma_fence **fence)
 {
struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
-   int ret = 0;
+   struct syncobj_wait_entry wait;
+   u64 timeout = nsecs_to_jiffies64(DRM_SYNCOBJ_WAIT_FOR_SUBMIT_TIMEOUT);
+   int ret;
 
if (!syncobj)
return -ENOENT;
 
*fence = drm_syncobj_fence_get(syncobj);
-   if (!*fence) {
+   drm_syncobj_put(syncobj);
+
+   if (*fence) {
+   ret = dma_fence_chain_find_seqno(fence, point);
+   if (!ret)
+   return 0;
+   dma_fence_put(*fence);
+   } else {
ret = -EINVAL;
}
-   drm_syncobj_put(syncobj);
+
+   if (!(flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
+   return ret;
+
+   memset(&wait, 0, sizeof(wait));
+   wait.task = current;
+   wait.point = point;
+   drm_syncobj_fence_add_wait(syncobj, &wait);
+
+   do {
+   set_current_state(TASK_INTERRUPTIBLE);
+   if (wait.fence) {
+   ret = 0;
+   break;
+   }
+   if (timeout == 0) {
+   ret = -ETIME;
+   break;
+   }
+
+   if (signal_pending(current)) {
+   ret = -ERESTARTSYS;
+   break;
+   }
+
+   timeout = schedule_timeout(timeout);
+   } while (1);
+
+   __set_current_state(TASK_RUNNING);
+   *fence = wait.fence;
+
+   if (wait.node.next)
+   drm_syncobj_remove_wait(syncobj, &wait);
+
return ret;
 }
 EXPORT_SYMBOL(drm_syncobj_find_fence);
-- 
2.17.1


[PATCH 9/9] drm/amdgpu: update version for timeline syncobj support in amdgpu

2019-03-15 Thread Chunming Zhou
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 8a0732088640..4d8db87048d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -74,9 +74,10 @@
  * - 3.28.0 - Add AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES
  * - 3.29.0 - Add AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID
  * - 3.30.0 - Add AMDGPU_SCHED_OP_CONTEXT_PRIORITY_OVERRIDE.
+ * - 3.31.0 - Add syncobj timeline support to AMDGPU_CS.
  */
 #define KMS_DRIVER_MAJOR   3
-#define KMS_DRIVER_MINOR   30
+#define KMS_DRIVER_MINOR   31
 #define KMS_DRIVER_PATCHLEVEL  0
 
 int amdgpu_vram_limit = 0;
-- 
2.17.1


[PATCH 4/9] drm/syncobj: add timeline payload query ioctl v6

2019-03-15 Thread Chunming Zhou
User mode can query the timeline payload.
v2: check return value of copy_to_user
v3: handle querying entry by entry
v4: rebase on new chain container, simplify interface
v5: query the last signaled timeline point, not the last point.
v6: add unordered point check

Signed-off-by: Chunming Zhou 
Cc: Daniel Rakos 
Cc: Jason Ekstrand 
Cc: Bas Nieuwenhuizen 
Cc: Dave Airlie 
Cc: Christian König 
Cc: Chris Wilson 
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c|  2 ++
 drivers/gpu/drm/drm_syncobj.c  | 62 ++
 include/uapi/drm/drm.h | 10 ++
 4 files changed, 76 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 331ac6225b58..695179bb88dc 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -188,6 +188,8 @@ int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private);
 
 /* drm_framebuffer.c */
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index c984654646fa..7a534c184e52 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -694,6 +694,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER|DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index eae51978cda4..0e62a793c8dd 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1064,3 +1064,65 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 
return ret;
 }
+
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private)
+{
+   struct drm_syncobj_timeline_array *args = data;
+   struct drm_syncobj **syncobjs;
+   uint64_t __user *points = u64_to_user_ptr(args->points);
+   uint32_t i;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -ENODEV;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   if (args->count_handles == 0)
+   return -EINVAL;
+
+   ret = drm_syncobj_array_find(file_private,
+u64_to_user_ptr(args->handles),
+args->count_handles,
+&syncobjs);
+   if (ret < 0)
+   return ret;
+
+   for (i = 0; i < args->count_handles; i++) {
+   struct dma_fence_chain *chain;
+   struct dma_fence *fence;
+   uint64_t point;
+
+   fence = drm_syncobj_fence_get(syncobjs[i]);
+   chain = to_dma_fence_chain(fence);
+   if (chain) {
+   struct dma_fence *iter, *last_signaled = NULL;
+
+   dma_fence_chain_for_each(iter, fence) {
+   if (!iter)
+   break;
+   dma_fence_put(last_signaled);
+   last_signaled = dma_fence_get(iter);
+   if (!to_dma_fence_chain(last_signaled)->prev_seqno)
+   /* It is most likely that the timeline has
+* unordered points. */
+   break;
+   }
+   point = dma_fence_is_signaled(last_signaled) ?
+   last_signaled->seqno :
+   to_dma_fence_chain(last_signaled)->prev_seqno;
+   dma_fence_put(last_signaled);
+   } else {
+   point = 0;
+   }
+   ret = copy_to_user(&points[i], &point, sizeof(uint64_t));
+   ret = ret ? -EFAULT : 0;
+   if (ret)
+   break;
+   }
+   drm_syncobj_array_free(syncobjs, args->count_handles);
+
+   return ret;
+}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 0092111d002c..b2c36f

[PATCH 7/9] drm/syncobj: add transition ioctls between binary and timeline v2

2019-03-15 Thread Chunming Zhou
We need to import/export timeline points.

v2: unify to one transfer ioctl

Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/drm_internal.h |  2 +
 drivers/gpu/drm/drm_ioctl.c|  2 +
 drivers/gpu/drm/drm_syncobj.c  | 74 ++
 include/uapi/drm/drm.h | 10 +
 4 files changed, 88 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 695179bb88dc..dd11ae5f1eef 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -180,6 +180,8 @@ int drm_syncobj_handle_to_fd_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
 int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_transfer_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file_private);
 int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
 int drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 7a534c184e52..92b3b7b2fd81 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -686,6 +686,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_FD_TO_HANDLE, drm_syncobj_fd_to_handle_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TRANSFER, drm_syncobj_transfer_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, drm_syncobj_timeline_wait_ioctl,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index dd19c47d0b44..306c7b7e2770 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -679,6 +679,80 @@ drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, void *data,
&args->handle);
 }
 
+static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
+   struct drm_syncobj_transfer *args)
+{
+   struct drm_syncobj *timeline_syncobj = NULL;
+   struct dma_fence *fence;
+   struct dma_fence_chain *chain;
+   int ret;
+
+   timeline_syncobj = drm_syncobj_find(file_private, args->dst_handle);
+   if (!timeline_syncobj) {
+   return -ENOENT;
+   }
+   ret = drm_syncobj_find_fence(file_private, args->src_handle,
+args->src_point, args->flags,
+&fence);
+   if (ret)
+   goto err;
+   chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
+   if (!chain) {
+   ret = -ENOMEM;
+   goto err1;
+   }
+   drm_syncobj_add_point(timeline_syncobj, chain, fence, args->dst_point);
+err1:
+   dma_fence_put(fence);
+err:
+   drm_syncobj_put(timeline_syncobj);
+
+   return ret;
+}
+
+static int
+drm_syncobj_transfer_to_binary(struct drm_file *file_private,
+  struct drm_syncobj_transfer *args)
+{
+   struct drm_syncobj *binary_syncobj = NULL;
+   struct dma_fence *fence;
+   int ret;
+
+   binary_syncobj = drm_syncobj_find(file_private, args->dst_handle);
+   if (!binary_syncobj)
+   return -ENOENT;
+   ret = drm_syncobj_find_fence(file_private, args->src_handle,
+args->src_point, args->flags, &fence);
+   if (ret)
+   goto err;
+   drm_syncobj_replace_fence(binary_syncobj, fence);
+   dma_fence_put(fence);
+err:
+   drm_syncobj_put(binary_syncobj);
+
+   return ret;
+}
+int
+drm_syncobj_transfer_ioctl(struct drm_device *dev, void *data,
+  struct drm_file *file_private)
+{
+   struct drm_syncobj_transfer *args = data;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -ENODEV;
+
+   if (args->pad)
+   return -EINVAL;
+
+   if (args->dst_point)
+   ret = drm_syncobj_transfer_to_timeline(file_private, args);
+   else
+   ret = drm_syncobj_transfer_to_binary(file_private, args);
+
+   return ret;
+}
+
 static void syncobj_wait_fence_func(struct dma_fence *fence,
struct dma_fence_cb *cb)
 {
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index b2c36f2b2599..4c1e2e6579fa 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -735,6 +735,15 @@ struct drm_syncobj_handle {
__u32 pad;
 };
 
+struct 

[PATCH 3/9] drm/syncobj: add support for timeline point wait v8

2019-03-15 Thread Chunming Zhou
The points array matches the syncobjs array one-to-one.
v2:
add separate ioctl for timeline point wait, otherwise it breaks uapi.
v3:
userspace can specify two kinds of waits:
a. wait for a time point to be completed.
b. wait for a time point to become available.
v4:
rebase
v5:
add comment for xxx_WAIT_AVAILABLE
v6: rebase and rework on new container
v7: drop _WAIT_COMPLETED, it is the default anyway
v8: correctly handle garbage collected fences

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
Cc: Daniel Rakos 
Cc: Jason Ekstrand 
Cc: Bas Nieuwenhuizen 
Cc: Dave Airlie 
Cc: Chris Wilson 
---
 drivers/gpu/drm/drm_internal.h |   2 +
 drivers/gpu/drm/drm_ioctl.c|   2 +
 drivers/gpu/drm/drm_syncobj.c  | 153 ++---
 include/uapi/drm/drm.h |  15 
 4 files changed, 143 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 251d67e04c2d..331ac6225b58 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -182,6 +182,8 @@ int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, 
void *data,
   struct drm_file *file_private);
 int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private);
 int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 687943df58e1..c984654646fa 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -688,6 +688,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, 
drm_syncobj_timeline_wait_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_RESET, drm_syncobj_reset_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 19a9ce638119..eae51978cda4 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -61,6 +61,7 @@ struct syncobj_wait_entry {
struct task_struct *task;
struct dma_fence *fence;
struct dma_fence_cb fence_cb;
+   u64point;
 };
 
 static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
@@ -95,6 +96,8 @@ EXPORT_SYMBOL(drm_syncobj_find);
 static void drm_syncobj_fence_add_wait(struct drm_syncobj *syncobj,
   struct syncobj_wait_entry *wait)
 {
+   struct dma_fence *fence;
+
if (wait->fence)
return;
 
@@ -103,11 +106,15 @@ static void drm_syncobj_fence_add_wait(struct drm_syncobj 
*syncobj,
 * have the lock, try one more time just to be sure we don't add a
 * callback when a fence has already been set.
 */
-   if (syncobj->fence)
-   wait->fence = dma_fence_get(
-   rcu_dereference_protected(syncobj->fence, 1));
-   else
+   fence = dma_fence_get(rcu_dereference_protected(syncobj->fence, 1));
+   if (!fence || dma_fence_chain_find_seqno(&fence, wait->point)) {
+   dma_fence_put(fence);
list_add_tail(&wait->node, &syncobj->cb_list);
+   } else if (!fence) {
+   wait->fence = dma_fence_get_stub();
+   } else {
+   wait->fence = fence;
+   }
spin_unlock(&syncobj->lock);
 }
 
@@ -149,10 +156,8 @@ void drm_syncobj_add_point(struct drm_syncobj *syncobj,
dma_fence_chain_init(chain, prev, fence, point);
rcu_assign_pointer(syncobj->fence, &chain->base);
 
-   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
-   list_del_init(&cur->node);
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
syncobj_wait_syncobj_func(syncobj, cur);
-   }
spin_unlock(&syncobj->lock);
 
/* Walk the chain once to trigger garbage collection */
@@ -184,10 +189,8 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
rcu_assign_pointer(syncobj->fence, fence);
 
if (fence != old_fence) {
-   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
-   list_del_init(&cur->node);
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
syncobj_wait_syncobj_func(syncobj, cur);
-   }
}
 
spin_unlock(&syncobj->lock);
@@ -644,

[PATCH 2/9] drm/syncobj: add new drm_syncobj_add_point interface v4

2019-03-15 Thread Chunming Zhou
From: Christian König 

Use the dma_fence_chain object to create a timeline of fence objects
instead of just replacing the existing fence.

v2: rebase and cleanup
v3: fix garbage collection parameters
v4: add a check for unordered points, print a warning call trace
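
For readers following along, the idea of chaining points instead of replacing the fence, plus the v4 ordering check, can be sketched in plain userspace C. This is only a toy model under stated assumptions: `toy_point`, `toy_syncobj` and `toy_add_point` are hypothetical names, there is no locking or reference counting, and the kernel code only warns on an unordered point rather than returning a status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a chain node: a point and a link to the
 * point that was the head before it was added. */
struct toy_point {
    uint64_t seqno;
    struct toy_point *prev;
};

/* Hypothetical stand-in for the syncobj: it only tracks the newest point. */
struct toy_syncobj {
    struct toy_point *head;
};

/* Link node in as the new head instead of replacing the old head.
 * Returns false when the new point does not advance the timeline,
 * which is the case the v4 check warns about. The node is linked
 * in either way, as in the patch. */
static bool toy_add_point(struct toy_syncobj *obj, struct toy_point *node,
                          uint64_t point)
{
    bool ordered = !obj->head || obj->head->seqno < point;

    node->seqno = point;
    node->prev = obj->head;
    obj->head = node;
    return ordered;
}
```

Adding points 1, 3 and then 2 to an empty object reports the last one as unordered while still linking it, so a later query walking the chain sees 2 in front of 3.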

Signed-off-by: Christian König 
---
 drivers/gpu/drm/drm_syncobj.c | 39 +++
 include/drm/drm_syncobj.h |  5 +
 2 files changed, 44 insertions(+)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 5329e66598c6..19a9ce638119 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -122,6 +122,45 @@ static void drm_syncobj_remove_wait(struct drm_syncobj 
*syncobj,
spin_unlock(&syncobj->lock);
 }
 
+/**
+ * drm_syncobj_add_point - add new timeline point to the syncobj
+ * @syncobj: sync object to add timeline point to
+ * @chain: chain node to use to add the point
+ * @fence: fence to encapsulate in the chain node
+ * @point: sequence number to use for the point
+ *
+ * Add the chain node as new timeline point to the syncobj.
+ */
+void drm_syncobj_add_point(struct drm_syncobj *syncobj,
+  struct dma_fence_chain *chain,
+  struct dma_fence *fence,
+  uint64_t point)
+{
+   struct syncobj_wait_entry *cur, *tmp;
+   struct dma_fence *prev;
+
+   dma_fence_get(fence);
+
+   spin_lock(&syncobj->lock);
+
+   prev = drm_syncobj_fence_get(syncobj);
+   /* You are adding an unordered point to the timeline, which could cause the payload returned from query_ioctl to be 0! */
+   WARN_ON_ONCE(prev && prev->seqno >= point);
+   dma_fence_chain_init(chain, prev, fence, point);
+   rcu_assign_pointer(syncobj->fence, &chain->base);
+
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
+           list_del_init(&cur->node);
+           syncobj_wait_syncobj_func(syncobj, cur);
+   }
+   spin_unlock(&syncobj->lock);
+
+   /* Walk the chain once to trigger garbage collection */
+   dma_fence_chain_for_each(fence, prev);
+   dma_fence_put(prev);
+}
+EXPORT_SYMBOL(drm_syncobj_add_point);
+
 /**
  * drm_syncobj_replace_fence - replace fence in a sync object.
  * @syncobj: Sync object to replace fence in
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 0311c9fdbd2f..6cf7243a1dc5 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -27,6 +27,7 @@
 #define __DRM_SYNCOBJ_H__
 
 #include 
+#include 
 
 struct drm_file;
 
@@ -112,6 +113,10 @@ drm_syncobj_fence_get(struct drm_syncobj *syncobj)
 
 struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
 u32 handle);
+void drm_syncobj_add_point(struct drm_syncobj *syncobj,
+  struct dma_fence_chain *chain,
+  struct dma_fence *fence,
+  uint64_t point);
 void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
   struct dma_fence *fence);
 int drm_syncobj_find_fence(struct drm_file *file_private,
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 1/9] dma-buf: add new dma_fence_chain container v5

2019-03-15 Thread Chunming Zhou
From: Christian König 

Lockless container implementation similar to a dma_fence_array, but with
only two elements per node and automatic garbage collection.

v2: properly document dma_fence_chain_for_each, add dma_fence_chain_find_seqno,
drop prev reference during garbage collection if it's not a chain fence.
v3: use head and iterator for dma_fence_chain_for_each
v4: fix reference count in dma_fence_chain_enable_signaling
v5: fix iteration when walking each chain node
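
As a rough illustration of "two elements per node and automatic garbage collection", here is a single-threaded userspace sketch. Everything in it is an assumption made for illustration: `toy_chain_node`, `toy_chain_walk` and `toy_chain_walk_all` are hypothetical names, the encapsulated fence is reduced to a `signaled` flag, and the cmpxchg and reference counting of the real container are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chain node: one payload (reduced to a flag) plus a link
 * to the previous node, i.e. two elements per node. */
struct toy_chain_node {
    uint64_t seqno;               /* timeline point of this node */
    bool signaled;                /* state of the encapsulated fence */
    struct toy_chain_node *prev;  /* older node, or NULL at the end */
};

/* Step to the next older node that is still pending. Predecessors that
 * are already signaled are unlinked on the way, which is the garbage
 * collection part. */
static struct toy_chain_node *toy_chain_walk(struct toy_chain_node *node)
{
    while (node->prev && node->prev->signaled)
        node->prev = node->prev->prev;
    return node->prev;
}

/* Walk the whole chain once and count the nodes visited. */
static size_t toy_chain_walk_all(struct toy_chain_node *head)
{
    size_t n = 0;

    for (struct toy_chain_node *it = head; it; it = toy_chain_walk(it))
        n++;
    return n;
}
```

With a chain of four points where the two oldest are signaled, one walk from the head visits only the two pending nodes and leaves the older pending node with no predecessor.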

Signed-off-by: Christian König 
---
 drivers/dma-buf/Makefile  |   3 +-
 drivers/dma-buf/dma-fence-chain.c | 241 ++
 include/linux/dma-fence-chain.h   |  81 ++
 3 files changed, 324 insertions(+), 1 deletion(-)
 create mode 100644 drivers/dma-buf/dma-fence-chain.c
 create mode 100644 include/linux/dma-fence-chain.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 0913a6ccab5a..1f006e083eb9 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,4 +1,5 @@
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-fence.o
+obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
+reservation.o seqno-fence.o
 obj-$(CONFIG_SYNC_FILE)+= sync_file.o
 obj-$(CONFIG_SW_SYNC)  += sw_sync.o sync_debug.o
 obj-$(CONFIG_UDMABUF)  += udmabuf.o
diff --git a/drivers/dma-buf/dma-fence-chain.c 
b/drivers/dma-buf/dma-fence-chain.c
new file mode 100644
index ..0c5e3c902fa0
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -0,0 +1,241 @@
+/*
+ * fence-chain: chain fences together in a timeline
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ * Authors:
+ * Christian König 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include 
+
+static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
+
+/**
+ * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
+ * @chain: chain node to get the previous node from
+ *
+ * Use dma_fence_get_rcu_safe to get a reference to the previous fence of the
+ * chain node.
+ */
+static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain 
*chain)
+{
+   struct dma_fence *prev;
+
+   rcu_read_lock();
+   prev = dma_fence_get_rcu_safe(&chain->prev);
+   rcu_read_unlock();
+   return prev;
+}
+
+/**
+ * dma_fence_chain_walk - chain walking function
+ * @fence: current chain node
+ *
+ * Walk the chain to the next node. Returns the next fence or NULL if we are at
+ * the end of the chain. Garbage collects chain nodes which are already
+ * signaled.
+ */
+struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
+{
+   struct dma_fence_chain *chain, *prev_chain;
+   struct dma_fence *prev, *replacement, *tmp;
+
+   chain = to_dma_fence_chain(fence);
+   if (!chain) {
+   dma_fence_put(fence);
+   return NULL;
+   }
+
+   while ((prev = dma_fence_chain_get_prev(chain))) {
+
+   prev_chain = to_dma_fence_chain(prev);
+   if (prev_chain) {
+   if (!dma_fence_is_signaled(prev_chain->fence))
+   break;
+
+   replacement = dma_fence_chain_get_prev(prev_chain);
+   } else {
+   if (!dma_fence_is_signaled(prev))
+   break;
+
+   replacement = NULL;
+   }
+
+   tmp = cmpxchg(&chain->prev, prev, replacement);
+   if (tmp == prev)
+   dma_fence_put(tmp);
+   else
+   dma_fence_put(replacement);
+   dma_fence_put(prev);
+   }
+
+   dma_fence_put(fence);
+   return prev;
+}
+EXPORT_SYMBOL(dma_fence_chain_walk);
+
+/**
+ * dma_fence_chain_find_seqno - find fence chain node by seqno
+ * @pfence: pointer to the chain node where to start
+ * @seqno: the sequence number to search for
+ *
+ * Advance the fence pointer to the chain node which will signal this sequence
+ * number. If no sequence number is provided then this is a no-op.
+ *
+ * Returns EINVAL if the fence is not a chain node or the sequence number has
+ * not yet advanced far enough.
+ */
+int dma_fence_chain_find_seqno(struct dma_fence **pfence, uint64_t seqno)
+{
+   struct dma_fence_chain *chain;
+
+   if (!seqno)
+   return 0;
+
+   chain = to_dma_fence_chain(*pfence);
+   if (!chain || chain->base.seqno < seqno)
+   return -EINVAL;
+
+   dma_fence_chain_for_each(*pfence, >base) 

Re: [PATCH 1/3] drm/amdgpu: remove non-sense NULL ptr check

2019-03-12 Thread Chunming Zhou
The series is Reviewed-by: Chunming Zhou 

On 2019/3/8 22:31, Christian König wrote:
> It's a bug having a dead pointer in the IDR, silently returning
> is the worst we can do.
>
> Signed-off-by: Christian König 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 10 --
>   1 file changed, 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> index 736ed1d67ec2..b7289f709644 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
> @@ -570,12 +570,6 @@ void amdgpu_ctx_mgr_entity_flush(struct amdgpu_ctx_mgr 
> *mgr)
>   
>   mutex_lock(>lock);
>   idr_for_each_entry(idp, ctx, id) {
> -
> - if (!ctx->adev) {
> - mutex_unlock(>lock);
> - return;
> - }
> -
>   for (i = 0; i < num_entities; i++) {
>   struct drm_sched_entity *entity;
>   
> @@ -596,10 +590,6 @@ void amdgpu_ctx_mgr_entity_fini(struct amdgpu_ctx_mgr 
> *mgr)
>   idp = >ctx_handles;
>   
>   idr_for_each_entry(idp, ctx, id) {
> -
> - if (!ctx->adev)
> - return;
> -
>   if (kref_read(>refcount) != 1) {
>   DRM_ERROR("ctx %p is still alive\n", ctx);
>   continue;

[PATCH 9/9] drm/amdgpu: update version for timeline syncobj support in amdgpu

2019-03-11 Thread Chunming Zhou
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 8a0732088640..4d8db87048d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -74,9 +74,10 @@
  * - 3.28.0 - Add AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES
  * - 3.29.0 - Add AMDGPU_IB_FLAG_RESET_GDS_MAX_WAVE_ID
  * - 3.30.0 - Add AMDGPU_SCHED_OP_CONTEXT_PRIORITY_OVERRIDE.
+ * - 3.31.0 - Add syncobj timeline support to AMDGPU_CS.
  */
 #define KMS_DRIVER_MAJOR   3
-#define KMS_DRIVER_MINOR   30
+#define KMS_DRIVER_MINOR   31
 #define KMS_DRIVER_PATCHLEVEL  0
 
 int amdgpu_vram_limit = 0;
-- 
2.17.1


[PATCH 6/9] drm/amdgpu: add timeline support in amdgpu CS v3

2019-03-11 Thread Chunming Zhou
Syncobj wait/signal operations are appended to the command submission.
v2: separate into two kinds of in/out_deps functions
v3: fix checking for timeline syncobj

Signed-off-by: Chunming Zhou 
Cc: Daniel Rakos 
Cc: Jason Ekstrand 
Cc: Bas Nieuwenhuizen 
Cc: Dave Airlie 
Cc: Christian König 
Cc: Chris Wilson 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 152 +
 include/uapi/drm/amdgpu_drm.h  |   8 ++
 3 files changed, 144 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8d0d7f3dd5fb..deec2c796253 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -433,6 +433,12 @@ struct amdgpu_cs_chunk {
void*kdata;
 };
 
+struct amdgpu_cs_post_dep {
+   struct drm_syncobj *syncobj;
+   struct dma_fence_chain *chain;
+   u64 point;
+};
+
 struct amdgpu_cs_parser {
struct amdgpu_device*adev;
struct drm_file *filp;
@@ -462,8 +468,8 @@ struct amdgpu_cs_parser {
/* user fence */
struct amdgpu_bo_list_entry uf_entry;
 
-   unsigned num_post_dep_syncobjs;
-   struct drm_syncobj **post_dep_syncobjs;
+   unsignednum_post_deps;
+   struct amdgpu_cs_post_dep   *post_deps;
 };
 
 static inline u32 amdgpu_get_ib_value(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 52a5e4fdc95b..2f6239b6be6f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -215,6 +215,8 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser 
*p, union drm_amdgpu_cs
case AMDGPU_CHUNK_ID_SYNCOBJ_IN:
case AMDGPU_CHUNK_ID_SYNCOBJ_OUT:
case AMDGPU_CHUNK_ID_SCHEDULED_DEPENDENCIES:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_WAIT:
+   case AMDGPU_CHUNK_ID_SYNCOBJ_TIMELINE_SIGNAL:
break;
 
default:
@@ -804,9 +806,11 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser 
*parser, int error,
ttm_eu_backoff_reservation(&parser->ticket,
   &parser->validated);
 
-   for (i = 0; i < parser->num_post_dep_syncobjs; i++)
-   drm_syncobj_put(parser->post_dep_syncobjs[i]);
-   kfree(parser->post_dep_syncobjs);
+   for (i = 0; i < parser->num_post_deps; i++) {
+   drm_syncobj_put(parser->post_deps[i].syncobj);
+   kfree(parser->post_deps[i].chain);
+   }
+   kfree(parser->post_deps);
 
dma_fence_put(parser->fence);
 
@@ -1117,13 +1121,18 @@ static int amdgpu_cs_process_fence_dep(struct 
amdgpu_cs_parser *p,
 }
 
 static int amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
-uint32_t handle)
+uint32_t handle, u64 point,
+u64 flags)
 {
-   int r;
struct dma_fence *fence;
-   r = drm_syncobj_find_fence(p->filp, handle, 0, 0, &fence);
-   if (r)
+   int r;
+
+   r = drm_syncobj_find_fence(p->filp, handle, point, flags, &fence);
+   if (r) {
+   DRM_ERROR("syncobj %u failed to find fence @ %llu (%d)!\n",
+ handle, point, r);
return r;
+   }
 
r = amdgpu_sync_fence(p->adev, &p->job->sync, fence, true);
dma_fence_put(fence);
@@ -1134,46 +1143,118 @@ static int 
amdgpu_syncobj_lookup_and_add_to_sync(struct amdgpu_cs_parser *p,
 static int amdgpu_cs_process_syncobj_in_dep(struct amdgpu_cs_parser *p,
struct amdgpu_cs_chunk *chunk)
 {
+   struct drm_amdgpu_cs_chunk_sem *deps;
unsigned num_deps;
int i, r;
-   struct drm_amdgpu_cs_chunk_sem *deps;
 
deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
num_deps = chunk->length_dw * 4 /
sizeof(struct drm_amdgpu_cs_chunk_sem);
+   for (i = 0; i < num_deps; ++i) {
+   r = amdgpu_syncobj_lookup_and_add_to_sync(p, deps[i].handle,
+ 0, 0);
+   if (r)
+   return r;
+   }
+
+   return 0;
+}
+
 
+static int amdgpu_cs_process_syncobj_timeline_in_dep(struct amdgpu_cs_parser 
*p,
+struct amdgpu_cs_chunk 
*chunk)
+{
+   struct drm_amdgpu_cs_chunk_syncobj *syncobj_deps;
+   unsigned num_deps;
+   int i, r;
+
+   syncobj_deps = (struct drm_amdgpu_cs_chunk_syncobj *)chunk->kdata;
+   num_deps = chunk->length_dw * 4 /
+   sizeof(struct drm_amdgpu_cs_chunk_syncobj);
for (i = 0; i 

[PATCH 8/9] drm/syncobj: add timeline signal ioctl for syncobj v3

2019-03-11 Thread Chunming Zhou
v2: allocate each chain node individually, since chain nodes are freed independently.
v3: all existing points must already be signaled before the CPU performs the signal
operation, so add a check for that.

Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/drm_internal.h |   2 +
 drivers/gpu/drm/drm_ioctl.c|   2 +
 drivers/gpu/drm/drm_syncobj.c  | 103 +
 include/uapi/drm/drm.h |   1 +
 4 files changed, 108 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index dd11ae5f1eef..d9a483a5fce0 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -190,6 +190,8 @@ int drm_syncobj_reset_ioctl(struct drm_device *dev, void 
*data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_private);
 int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 92b3b7b2fd81..d337f161909c 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -696,6 +696,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL, 
drm_syncobj_timeline_signal_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 
DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index f1d18d10d1f2..78fc1c029339 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1174,6 +1174,109 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void 
*data,
return ret;
 }
 
+int
+drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_private)
+{
+   struct drm_syncobj_timeline_array *args = data;
+   struct drm_syncobj **syncobjs;
+   struct dma_fence_chain **chains;
+   uint64_t *points;
+   uint32_t i, j, timeline_count = 0;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -EOPNOTSUPP;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   if (args->count_handles == 0)
+   return -EINVAL;
+
+   ret = drm_syncobj_array_find(file_private,
+u64_to_user_ptr(args->handles),
+args->count_handles,
+                            &syncobjs);
+   if (ret < 0)
+   return ret;
+
+   for (i = 0; i < args->count_handles; i++) {
+   struct dma_fence_chain *chain;
+struct dma_fence *fence;
+
+fence = drm_syncobj_fence_get(syncobjs[i]);
+chain = to_dma_fence_chain(fence);
+if (chain) {
+struct dma_fence *iter;
+
+dma_fence_chain_for_each(iter, fence) {
+if (!iter)
+break;
+   if (!dma_fence_is_signaled(iter)) {
+   dma_fence_put(iter);
+   DRM_ERROR("Client must guarantee all 
existing timeline points signaled before performing host signal operation!");
+   ret = -EPERM;
+   goto out;
+   }
+}
+}
+}
+
+   points = kmalloc_array(args->count_handles, sizeof(*points),
+  GFP_KERNEL);
+   if (!points) {
+   ret = -ENOMEM;
+   goto out;
+   }
+   if (!u64_to_user_ptr(args->points)) {
+   memset(points, 0, args->count_handles * sizeof(uint64_t));
+   } else if (copy_from_user(points, u64_to_user_ptr(args->points),
+ sizeof(uint64_t) * args->count_handles)) {
+   ret = -EFAULT;
+   goto err_points;
+   }
+
+
+   for (i = 0; i < args->count_handles; i++) {
+   if (points[i])
+   timeline_count++;
+   }
+   chains = kmalloc_array(timeline_count, sizeof(void *), GFP_KERNEL);
+   if (!chains) {
+ 

[PATCH 4/9] drm/syncobj: add timeline payload query ioctl v5

2019-03-11 Thread Chunming Zhou
User mode can query the timeline payload.
v2: check return value of copy_to_user
v3: handle querying entry by entry
v4: rebase on new chain container, simplify interface
v5: query last signaled timeline point, not last point.
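
To make the v5 change concrete ("last signaled point, not last point"), here is a small userspace sketch of the intended result. It assumes points signal in timeline order and uses the hypothetical names `toy_point` and `toy_query_payload`. The ioctl in the patch reaches the answer differently, by walking the chain with garbage collection and looking at the last remaining node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timeline point; head is the newest, prev links go older. */
struct toy_point {
    uint64_t seqno;
    bool signaled;
    struct toy_point *prev;
};

/* Payload as seen by a query: the seqno of the newest signaled point,
 * or 0 when nothing has signaled yet (or the timeline is empty). */
static uint64_t toy_query_payload(const struct toy_point *head)
{
    for (const struct toy_point *it = head; it; it = it->prev)
        if (it->signaled)
            return it->seqno;
    return 0;
}
```

For a timeline with points 1 and 2 signaled and point 3 still pending, the query reports 2 rather than 3.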

Signed-off-by: Chunming Zhou 
Cc: Daniel Rakos 
Cc: Jason Ekstrand 
Cc: Bas Nieuwenhuizen 
Cc: Dave Airlie 
Cc: Christian König 
Cc: Chris Wilson 
---
 drivers/gpu/drm/drm_internal.h |  2 ++
 drivers/gpu/drm/drm_ioctl.c|  2 ++
 drivers/gpu/drm/drm_syncobj.c  | 58 ++
 include/uapi/drm/drm.h | 10 ++
 4 files changed, 72 insertions(+)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 331ac6225b58..695179bb88dc 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -188,6 +188,8 @@ int drm_syncobj_reset_ioctl(struct drm_device *dev, void 
*data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
 struct drm_file *file_private);
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private);
 
 /* drm_framebuffer.c */
 void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index c984654646fa..7a534c184e52 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -694,6 +694,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 
DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, 
drm_crtc_queue_sequence_ioctl, DRM_UNLOCKED),
DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, 
DRM_MASTER|DRM_UNLOCKED),
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 25eb2176d8c7..a5adc7c06caa 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -1062,3 +1062,61 @@ drm_syncobj_signal_ioctl(struct drm_device *dev, void 
*data,
 
return ret;
 }
+
+int drm_syncobj_query_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private)
+{
+   struct drm_syncobj_timeline_array *args = data;
+   struct drm_syncobj **syncobjs;
+   uint64_t __user *points = u64_to_user_ptr(args->points);
+   uint32_t i;
+   int ret;
+
+   if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ))
+   return -ENODEV;
+
+   if (args->pad != 0)
+   return -EINVAL;
+
+   if (args->count_handles == 0)
+   return -EINVAL;
+
+   ret = drm_syncobj_array_find(file_private,
+u64_to_user_ptr(args->handles),
+args->count_handles,
+                            &syncobjs);
+   if (ret < 0)
+   return ret;
+
+   for (i = 0; i < args->count_handles; i++) {
+   struct dma_fence_chain *chain;
+   struct dma_fence *fence;
+   uint64_t point;
+
+   fence = drm_syncobj_fence_get(syncobjs[i]);
+   chain = to_dma_fence_chain(fence);
+   if (chain) {
+   struct dma_fence *iter, *last_signaled = NULL;
+
+   dma_fence_chain_for_each(iter, fence) {
+   if (!iter)
+   break;
+   dma_fence_put(last_signaled);
+   last_signaled = dma_fence_get(iter);
+   }
+   point = dma_fence_is_signaled(last_signaled) ?
+   last_signaled->seqno :
+   to_dma_fence_chain(last_signaled)->prev_seqno;
+   dma_fence_put(last_signaled);
+   } else {
+   point = 0;
+   }
+   ret = copy_to_user(&points[i], &point, sizeof(uint64_t));
+   ret = ret ? -EFAULT : 0;
+   if (ret)
+   break;
+   }
+   drm_syncobj_array_free(syncobjs, args->count_handles);
+
+   return ret;
+}
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 0092111d002c..b2c36f2b2599 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -767,6 +767,14 @@ struct drm_syncobj_array {
__u32 pad;
 };
 
+struct drm_syncobj_timeline_array {
+   __u64 handles;
+   __u64 points;
+   __u32 count_handles;
+   __u32 pad;
+};
+
+
 /* Query current s

[PATCH 5/9] drm/syncobj: use the timeline point in drm_syncobj_find_fence v3

2019-03-11 Thread Chunming Zhou
From: Christian König 

Implement finding the right timeline point in drm_syncobj_find_fence.

v2: return -EINVAL when the point is not submitted yet.
v3: fix reference counting bug, add flags handling as well
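
The lookup rule behind v2 (-EINVAL when the point is not submitted yet) can be modelled in userspace roughly as follows. This is a sketch under assumptions, not the kernel function: `toy_node` and `toy_find_seqno` are hypothetical names, and the search loop is my reading of "advance to the chain node which will signal this sequence number" from the dma_fence_chain documentation in patch 1/9.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chain node; *pnode starts at the newest point. */
struct toy_node {
    uint64_t seqno;
    struct toy_node *prev;
};

/* Move *pnode to the oldest node whose seqno still covers the requested
 * point. A seqno of 0 is a no-op. Returns -EINVAL when the timeline has
 * not advanced far enough, i.e. the point was not submitted yet. */
static int toy_find_seqno(struct toy_node **pnode, uint64_t seqno)
{
    struct toy_node *node = *pnode;

    if (!seqno)
        return 0;
    if (!node || node->seqno < seqno)
        return -EINVAL;

    while (node->prev && node->prev->seqno >= seqno)
        node = node->prev;

    *pnode = node;
    return 0;
}
```

On a timeline with points 2, 5 and 9, asking for point 4 lands on the node for 5, while asking for point 10 fails with -EINVAL. That failure is the case where a caller passing DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT would go on to wait instead of returning.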

Signed-off-by: Christian König 
---
 drivers/gpu/drm/drm_syncobj.c | 43 ---
 1 file changed, 40 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index a5adc7c06caa..673b805ab2e8 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -231,16 +231,53 @@ int drm_syncobj_find_fence(struct drm_file *file_private,
   struct dma_fence **fence)
 {
struct drm_syncobj *syncobj = drm_syncobj_find(file_private, handle);
-   int ret = 0;
+   struct syncobj_wait_entry wait;
+   int ret;
 
if (!syncobj)
return -ENOENT;
 
*fence = drm_syncobj_fence_get(syncobj);
-   if (!*fence) {
+   drm_syncobj_put(syncobj);
+
+   if (*fence) {
+   ret = dma_fence_chain_find_seqno(fence, point);
+   if (!ret)
+   return 0;
+   dma_fence_put(*fence);
+   } else {
ret = -EINVAL;
}
-   drm_syncobj_put(syncobj);
+
+   if (!(flags & DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT))
+   return ret;
+
+   memset(&wait, 0, sizeof(wait));
+   wait.task = current;
+   wait.point = point;
+   drm_syncobj_fence_add_wait(syncobj, &wait);
+
+   do {
+   set_current_state(TASK_INTERRUPTIBLE);
+   if (wait.fence) {
+   ret = 0;
+   break;
+   }
+
+   if (signal_pending(current)) {
+   ret = -ERESTARTSYS;
+   break;
+   }
+
+   schedule();
+   } while (1);
+
+   __set_current_state(TASK_RUNNING);
+   *fence = wait.fence;
+
+   if (wait.node.next)
+   drm_syncobj_remove_wait(syncobj, &wait);
+
return ret;
 }
 EXPORT_SYMBOL(drm_syncobj_find_fence);
-- 
2.17.1


[PATCH 3/9] drm/syncobj: add support for timeline point wait v8

2019-03-11 Thread Chunming Zhou
The points array matches the syncobjs array one-to-one.
v2:
add separate ioctl for timeline point wait, otherwise it breaks uapi.
v3:
userspace can specify two kinds of waits:
a. wait for a time point to be completed.
b. wait for a time point to become available.
v4:
rebase
v5:
add comment for xxx_WAIT_AVAILABLE
v6: rebase and rework on new container
v7: drop _WAIT_COMPLETED, it is the default anyway
v8: correctly handle garbage collected fences

Signed-off-by: Chunming Zhou 
Signed-off-by: Christian König 
Cc: Daniel Rakos 
Cc: Jason Ekstrand 
Cc: Bas Nieuwenhuizen 
Cc: Dave Airlie 
Cc: Chris Wilson 
---
 drivers/gpu/drm/drm_internal.h |   2 +
 drivers/gpu/drm/drm_ioctl.c|   2 +
 drivers/gpu/drm/drm_syncobj.c  | 153 ++---
 include/uapi/drm/drm.h |  15 
 4 files changed, 143 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h
index 251d67e04c2d..331ac6225b58 100644
--- a/drivers/gpu/drm/drm_internal.h
+++ b/drivers/gpu/drm/drm_internal.h
@@ -182,6 +182,8 @@ int drm_syncobj_fd_to_handle_ioctl(struct drm_device *dev, 
void *data,
   struct drm_file *file_private);
 int drm_syncobj_wait_ioctl(struct drm_device *dev, void *data,
   struct drm_file *file_private);
+int drm_syncobj_timeline_wait_ioctl(struct drm_device *dev, void *data,
+   struct drm_file *file_private);
 int drm_syncobj_reset_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_private);
 int drm_syncobj_signal_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 687943df58e1..c984654646fa 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -688,6 +688,8 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_WAIT, drm_syncobj_wait_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
+   DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_TIMELINE_WAIT, 
drm_syncobj_timeline_wait_ioctl,
+ DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_RESET, drm_syncobj_reset_ioctl,
  DRM_UNLOCKED|DRM_RENDER_ALLOW),
DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_SIGNAL, drm_syncobj_signal_ioctl,
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 933225fde9a9..25eb2176d8c7 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -61,6 +61,7 @@ struct syncobj_wait_entry {
struct task_struct *task;
struct dma_fence *fence;
struct dma_fence_cb fence_cb;
+   u64point;
 };
 
 static void syncobj_wait_syncobj_func(struct drm_syncobj *syncobj,
@@ -95,6 +96,8 @@ EXPORT_SYMBOL(drm_syncobj_find);
 static void drm_syncobj_fence_add_wait(struct drm_syncobj *syncobj,
   struct syncobj_wait_entry *wait)
 {
+   struct dma_fence *fence;
+
if (wait->fence)
return;
 
@@ -103,11 +106,15 @@ static void drm_syncobj_fence_add_wait(struct drm_syncobj 
*syncobj,
 * have the lock, try one more time just to be sure we don't add a
 * callback when a fence has already been set.
 */
-   if (syncobj->fence)
-   wait->fence = dma_fence_get(
-   rcu_dereference_protected(syncobj->fence, 1));
-   else
+   fence = dma_fence_get(rcu_dereference_protected(syncobj->fence, 1));
+   if (!fence || dma_fence_chain_find_seqno(&fence, wait->point)) {
+   dma_fence_put(fence);
list_add_tail(&wait->node, &syncobj->cb_list);
+   } else if (!fence) {
+   wait->fence = dma_fence_get_stub();
+   } else {
+   wait->fence = fence;
+   }
spin_unlock(&syncobj->lock);
 }
 
@@ -147,10 +154,8 @@ void drm_syncobj_add_point(struct drm_syncobj *syncobj,
dma_fence_chain_init(chain, prev, fence, point);
rcu_assign_pointer(syncobj->fence, &chain->base);
 
-   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
-   list_del_init(&cur->node);
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
syncobj_wait_syncobj_func(syncobj, cur);
-   }
spin_unlock(&syncobj->lock);
 
/* Walk the chain once to trigger garbage collection */
@@ -182,10 +187,8 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
rcu_assign_pointer(syncobj->fence, fence);
 
if (fence != old_fence) {
-   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
-   list_del_init(&cur->node);
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node)
syncobj_wait_syncobj_func(syncobj, cur);
-   }
}
 
spin_unlock(&syncobj->lock);
@@ -642,

[PATCH 2/9] drm/syncobj: add new drm_syncobj_add_point interface v3

2019-03-11 Thread Chunming Zhou
From: Christian König 

Use the dma_fence_chain object to create a timeline of fence objects
instead of just replacing the existing fence.

v2: rebase and cleanup
v3: fix garbage collection parameters

Signed-off-by: Christian König 
---
 drivers/gpu/drm/drm_syncobj.c | 37 +++
 include/drm/drm_syncobj.h |  5 +
 2 files changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 5329e66598c6..933225fde9a9 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -122,6 +122,43 @@ static void drm_syncobj_remove_wait(struct drm_syncobj *syncobj,
spin_unlock(&syncobj->lock);
 }
 
+/**
+ * drm_syncobj_add_point - add new timeline point to the syncobj
+ * @syncobj: sync object to add timeline point do
+ * @chain: chain node to use to add the point
+ * @fence: fence to encapsulate in the chain node
+ * @point: sequence number to use for the point
+ *
+ * Add the chain node as new timeline point to the syncobj.
+ */
+void drm_syncobj_add_point(struct drm_syncobj *syncobj,
+  struct dma_fence_chain *chain,
+  struct dma_fence *fence,
+  uint64_t point)
+{
+   struct syncobj_wait_entry *cur, *tmp;
+   struct dma_fence *prev;
+
+   dma_fence_get(fence);
+
+   spin_lock(&syncobj->lock);
+
+   prev = drm_syncobj_fence_get(syncobj);
+   dma_fence_chain_init(chain, prev, fence, point);
+   rcu_assign_pointer(syncobj->fence, &chain->base);
+
+   list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
+   list_del_init(&cur->node);
+   syncobj_wait_syncobj_func(syncobj, cur);
+   }
+   spin_unlock(&syncobj->lock);
+
+   /* Walk the chain once to trigger garbage collection */
+   dma_fence_chain_for_each(fence, prev);
+   dma_fence_put(prev);
+}
+EXPORT_SYMBOL(drm_syncobj_add_point);
+
 /**
  * drm_syncobj_replace_fence - replace fence in a sync object.
  * @syncobj: Sync object to replace fence in
diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h
index 0311c9fdbd2f..6cf7243a1dc5 100644
--- a/include/drm/drm_syncobj.h
+++ b/include/drm/drm_syncobj.h
@@ -27,6 +27,7 @@
 #define __DRM_SYNCOBJ_H__
 
 #include <linux/dma-fence.h>
+#include <linux/dma-fence-chain.h>
 
 struct drm_file;
 
@@ -112,6 +113,10 @@ drm_syncobj_fence_get(struct drm_syncobj *syncobj)
 
 struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private,
 u32 handle);
+void drm_syncobj_add_point(struct drm_syncobj *syncobj,
+  struct dma_fence_chain *chain,
+  struct dma_fence *fence,
+  uint64_t point);
 void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
   struct dma_fence *fence);
 int drm_syncobj_find_fence(struct drm_file *file_private,
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 1/9] dma-buf: add new dma_fence_chain container v5

2019-03-11 Thread Chunming Zhou
From: Christian König 

Lockless container implementation similar to a dma_fence_array, but with
only two elements per node and automatic garbage collection.

v2: properly document dma_fence_chain_for_each, add dma_fence_chain_find_seqno,
drop prev reference during garbage collection if it's not a chain fence.
v3: use head and iterator for dma_fence_chain_for_each
v4: fix reference count in dma_fence_chain_enable_signaling
v5: fix iteration when walking each chain node

Signed-off-by: Christian König 
---
 drivers/dma-buf/Makefile  |   3 +-
 drivers/dma-buf/dma-fence-chain.c | 241 ++
 include/linux/dma-fence-chain.h   |  81 ++
 3 files changed, 324 insertions(+), 1 deletion(-)
 create mode 100644 drivers/dma-buf/dma-fence-chain.c
 create mode 100644 include/linux/dma-fence-chain.h
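
To make the description above more concrete, here is a minimal userspace sketch of a chain with garbage collection on walk and a lookup by sequence number. It is a simplified model under stated assumptions: the `toy_*` names are hypothetical, there is no RCU, no reference counting and no cmpxchg, and a node counts as collectable as soon as its own `signaled` flag is set.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a chain node; not the kernel structure. */
struct toy_chain_node {
	uint64_t seqno;               /* timeline point this node signals */
	bool signaled;                /* true once the wrapped fence completed */
	struct toy_chain_node *prev;  /* older node, or NULL at the end */
};

/*
 * Step to the previous node. Signaled predecessors are unlinked on the way,
 * which models the "automatic garbage collection" idea.
 */
static struct toy_chain_node *toy_chain_walk(struct toy_chain_node *node)
{
	assert(node != NULL);
	while (node->prev && node->prev->signaled)
		node->prev = node->prev->prev;
	return node->prev;
}

/*
 * Advance *pnode from the newest node to the oldest remaining node whose
 * seqno still covers the requested point. A seqno of 0 is a no-op; -EINVAL
 * means the chain has not advanced far enough yet.
 */
static int toy_chain_find_seqno(struct toy_chain_node **pnode, uint64_t seqno)
{
	struct toy_chain_node *cur, *prev;

	if (!seqno)
		return 0;

	cur = *pnode;
	if (!cur || cur->seqno < seqno)
		return -EINVAL;

	while ((prev = toy_chain_walk(cur)) && prev->seqno >= seqno)
		cur = prev;

	*pnode = cur;
	return 0;
}
```

With nodes for points 1, 3 and 5, a lookup for point 2 lands on the node for point 3. If that node has already signaled, the walk drops it from the chain and the lookup stays on the node for point 5.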

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 0913a6ccab5a..1f006e083eb9 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,4 +1,5 @@
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-fence.o
+obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
+reservation.o seqno-fence.o
 obj-$(CONFIG_SYNC_FILE) += sync_file.o
 obj-$(CONFIG_SW_SYNC)  += sw_sync.o sync_debug.o
 obj-$(CONFIG_UDMABUF)  += udmabuf.o
diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
new file mode 100644
index ..0c5e3c902fa0
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-chain.c
@@ -0,0 +1,241 @@
+/*
+ * fence-chain: chain fences together in a timeline
+ *
+ * Copyright (C) 2018 Advanced Micro Devices, Inc.
+ * Authors:
+ * Christian König 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include <linux/dma-fence-chain.h>
+
+static bool dma_fence_chain_enable_signaling(struct dma_fence *fence);
+
+/**
+ * dma_fence_chain_get_prev - use RCU to get a reference to the previous fence
+ * @chain: chain node to get the previous node from
+ *
+ * Use dma_fence_get_rcu_safe to get a reference to the previous fence of the
+ * chain node.
+ */
+static struct dma_fence *dma_fence_chain_get_prev(struct dma_fence_chain *chain)
+{
+   struct dma_fence *prev;
+
+   rcu_read_lock();
+   prev = dma_fence_get_rcu_safe(&chain->prev);
+   rcu_read_unlock();
+   return prev;
+}
+
+/**
+ * dma_fence_chain_walk - chain walking function
+ * @fence: current chain node
+ *
+ * Walk the chain to the next node. Returns the next fence or NULL if we are at
+ * the end of the chain. Garbage collects chain nodes which are already
+ * signaled.
+ */
+struct dma_fence *dma_fence_chain_walk(struct dma_fence *fence)
+{
+   struct dma_fence_chain *chain, *prev_chain;
+   struct dma_fence *prev, *replacement, *tmp;
+
+   chain = to_dma_fence_chain(fence);
+   if (!chain) {
+   dma_fence_put(fence);
+   return NULL;
+   }
+
+   while ((prev = dma_fence_chain_get_prev(chain))) {
+
+   prev_chain = to_dma_fence_chain(prev);
+   if (prev_chain) {
+   if (!dma_fence_is_signaled(prev_chain->fence))
+   break;
+
+   replacement = dma_fence_chain_get_prev(prev_chain);
+   } else {
+   if (!dma_fence_is_signaled(prev))
+   break;
+
+   replacement = NULL;
+   }
+
+   tmp = cmpxchg(&chain->prev, prev, replacement);
+   if (tmp == prev)
+   dma_fence_put(tmp);
+   else
+   dma_fence_put(replacement);
+   dma_fence_put(prev);
+   }
+
+   dma_fence_put(fence);
+   return prev;
+}
+EXPORT_SYMBOL(dma_fence_chain_walk);
+
+/**
+ * dma_fence_chain_find_seqno - find fence chain node by seqno
+ * @pfence: pointer to the chain node where to start
+ * @seqno: the sequence number to search for
+ *
+ * Advance the fence pointer to the chain node which will signal this sequence
+ * number. If no sequence number is provided then this is a no-op.
+ *
+ * Returns EINVAL if the fence is not a chain node or the sequence number has
+ * not yet advanced far enough.
+ */
+int dma_fence_chain_find_seqno(struct dma_fence **pfence, uint64_t seqno)
+{
+   struct dma_fence_chain *chain;
+
+   if (!seqno)
+   return 0;
+
+   chain = to_dma_fence_chain(*pfence);
+   if (!chain || chain->base.seqno < seqno)
+   return -EINVAL;
+
+   dma_fence_chain_for_each(*pfence, &chain->base) 

[PATCH] drm/amdgpu: enable bo priority setting from user space

2019-03-07 Thread Chunming Zhou
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c     |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 13 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h    |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  1 +
 include/drm/ttm/ttm_bo_driver.h            |  9 -
 include/uapi/drm/amdgpu_drm.h              |  3 +++
 7 files changed, 29 insertions(+), 3 deletions(-)
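
The priority check in the create ioctl below can be read as a small mapping from the user-supplied value to the value passed on to the BO creation. The following is only a sketch of that reading: `TOY_BO_PRIORITY_NORMAL` and `TOY_MAX_BO_PRIORITY` are hypothetical placeholder values, since the real TTM constants are not visible in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholder values; the real constants are not shown here. */
#define TOY_BO_PRIORITY_NORMAL 2u
#define TOY_MAX_BO_PRIORITY    4u

/*
 * Mirror the ioctl logic from the patch: 0 selects the default, values above
 * the maximum are clamped, and the result is passed on minus one.
 */
static uint32_t toy_bo_priority_from_user(uint32_t user_priority)
{
	if (user_priority == 0)
		user_priority = TOY_BO_PRIORITY_NORMAL;
	else if (user_priority > TOY_MAX_BO_PRIORITY)
		user_priority = TOY_MAX_BO_PRIORITY;

	/* At this point the value is at least 1, so the subtraction is safe. */
	assert(user_priority >= 1);
	return user_priority - 1;
}
```

If this reading is right, a user value of 0 ends up one step below the `TTM_BO_PRIORITY_NORMAL` that the in-kernel callers in this patch pass directly, which may be worth double-checking during review.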

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
index 5cbde74b97dd..70a6baf20c22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
@@ -144,6 +144,7 @@ static int amdgpufb_create_pinned_object(struct amdgpu_fbdev *rfbdev,
size = mode_cmd->pitches[0] * height;
aligned_size = ALIGN(size, PAGE_SIZE);
ret = amdgpu_gem_object_create(adev, aligned_size, 0, domain,
+  TTM_BO_PRIORITY_NORMAL,
   AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED |
   AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
   AMDGPU_GEM_CREATE_VRAM_CLEARED,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index d21dd2f369da..7c1c2362c67e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -44,6 +44,7 @@ void amdgpu_gem_object_free(struct drm_gem_object *gobj)
 
 int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
 int alignment, u32 initial_domain,
+enum ttm_bo_priority priority,
 u64 flags, enum ttm_bo_type type,
 struct reservation_object *resv,
 struct drm_gem_object **obj)
@@ -60,6 +61,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
bp.type = type;
bp.resv = resv;
bp.preferred_domain = initial_domain;
+   bp.priority = priority;
 retry:
bp.flags = flags;
bp.domain = initial_domain;
@@ -229,6 +231,14 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
if (args->in.domains & ~AMDGPU_GEM_DOMAIN_MASK)
return -EINVAL;
 
+   /* check priority */
+   if (args->in.priority == 0) {
+   /* default is normal */
+   args->in.priority = TTM_BO_PRIORITY_NORMAL;
+   } else if (args->in.priority > TTM_MAX_BO_PRIORITY) {
+   args->in.priority = TTM_MAX_BO_PRIORITY;
+   DRM_ERROR("priority specified from user space is over MAX priority\n");
+   }
/* create a gem object to contain this object in */
if (args->in.domains & (AMDGPU_GEM_DOMAIN_GDS |
AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA)) {
@@ -252,6 +262,7 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
 
r = amdgpu_gem_object_create(adev, size, args->in.alignment,
 (u32)(0xffffffff & args->in.domains),
+args->in.priority - 1,
 flags, ttm_bo_type_device, resv, &gobj);
if (flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID) {
if (!r) {
@@ -304,6 +315,7 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void *data,
 
/* create a gem object to contain this object in */
r = amdgpu_gem_object_create(adev, args->size, 0, AMDGPU_GEM_DOMAIN_CPU,
+TTM_BO_PRIORITY_NORMAL,
 0, ttm_bo_type_device, NULL, &gobj);
if (r)
return r;
@@ -755,6 +767,7 @@ int amdgpu_mode_dumb_create(struct drm_file *file_priv,
domain = amdgpu_bo_get_preferred_pin_domain(adev,
amdgpu_display_supported_domains(adev));
r = amdgpu_gem_object_create(adev, args->size, 0, domain,
+TTM_BO_PRIORITY_NORMAL,
 AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED,
 ttm_bo_type_device, NULL, &gobj);
if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h
index f1ddfc50bcc7..47b0a8190948 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.h
@@ -61,7 +61,7 @@ extern const struct dma_buf_ops amdgpu_dmabuf_ops;
  */
 void amdgpu_gem_force_release(struct amdgpu_device *adev);
 int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
-int alignment, u32 initial_domain,
+int alignment, u32 initial_domain, u32 priority,
 u64 flags, enum ttm_bo_type t

Re: [PATCH] drm/amdgpu: force to use CPU_ACCESS hint optimization

2019-03-06 Thread Chunming Zhou

On 2019/3/6 20:30, Christian König wrote:
> On 06.03.19 at 13:00, Zhou, David(ChunMing) wrote:
>>
>>> -Original Message-
>>> From: Christian König 
>>> Sent: Wednesday, March 06, 2019 7:55 PM
>>> To: Zhou, David(ChunMing) ; Koenig, Christian
>>> ; amd-gfx@lists.freedesktop.org
>>> Subject: Re: [PATCH] drm/amdgpu: force to use CPU_ACCESS hint
>>> optimization
>>>
>>> On 06.03.19 at 12:52, Chunming Zhou wrote:
>>>> As we know, a buffer in visible VRAM can be moved to invisible VRAM when there is no CPU access.
>>>>
>>>> Signed-off-by: Chunming Zhou 
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 8 +++-
>>>>    1 file changed, 3 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> index bc62bf41b7e9..823deb66f5da 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> @@ -592,8 +592,7 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
>>>>
>>>>    vram_gtt.vram_size = adev->gmc.real_vram_size -
>>>>    atomic64_read(&adev->vram_pin_size);
>>>> -    vram_gtt.vram_cpu_accessible_size = adev->gmc.visible_vram_size -
>>>> -    atomic64_read(&adev->visible_pin_size);
>>>> +    vram_gtt.vram_cpu_accessible_size = vram_gtt.vram_size;
>>> Well, NAK that would of course report the full VRAM as visible which 
>>> isn't
>>> correct.
>> The UMD gave the same reason: they would like to report explicit VRAM info to the 
>> application.
>
> Yeah, I mean that is a rather good argument. The application should 
> certainly know that.
>
>> I have no idea how to do that.
>
> Well if I understood that correctly Vulkan had the same problem with 
> cached and uncached system memory. How is it handled there?

Which problem with cached and uncached system memory?

-David

>
> Christian.
>
>>
>> -David
>>> Christian.
>>>
>>>>    vram_gtt.gtt_size = adev->mman.bdev.man[TTM_PL_TT].size;
>>>>    vram_gtt.gtt_size *= PAGE_SIZE;
>>>>    vram_gtt.gtt_size -= atomic64_read(&adev->gart_pin_size);
>>>> @@ -612,9 +611,8 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
>>>>    mem.vram.max_allocation = mem.vram.usable_heap_size * 3 / 4;
>>>>    mem.cpu_accessible_vram.total_heap_size =
>>>> -    adev->gmc.visible_vram_size;
>>>> -    mem.cpu_accessible_vram.usable_heap_size = adev->gmc.visible_vram_size -
>>>> -    atomic64_read(&adev->visible_pin_size);
>>>> +    mem.vram.total_heap_size;
>>>> +    mem.cpu_accessible_vram.usable_heap_size =
>>>> +mem.vram.usable_heap_size;
>>>>    mem.cpu_accessible_vram.heap_usage =
>>>>    amdgpu_vram_mgr_vis_usage(&adev->mman.bdev.man[TTM_PL_VRAM]);
>>>>    mem.cpu_accessible_vram.max_allocation =
>

[PATCH] drm/amdgpu: force to use CPU_ACCESS hint optimization

2019-03-06 Thread Chunming Zhou
As we know, a buffer in visible VRAM can be moved to invisible VRAM when there is no CPU access.

Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index bc62bf41b7e9..823deb66f5da 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -592,8 +592,7 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
 
vram_gtt.vram_size = adev->gmc.real_vram_size -
atomic64_read(&adev->vram_pin_size);
-   vram_gtt.vram_cpu_accessible_size = adev->gmc.visible_vram_size -
-   atomic64_read(&adev->visible_pin_size);
+   vram_gtt.vram_cpu_accessible_size = vram_gtt.vram_size;
vram_gtt.gtt_size = adev->mman.bdev.man[TTM_PL_TT].size;
vram_gtt.gtt_size *= PAGE_SIZE;
vram_gtt.gtt_size -= atomic64_read(&adev->gart_pin_size);
@@ -612,9 +611,8 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
mem.vram.max_allocation = mem.vram.usable_heap_size * 3 / 4;
 
mem.cpu_accessible_vram.total_heap_size =
-   adev->gmc.visible_vram_size;
-   mem.cpu_accessible_vram.usable_heap_size = adev->gmc.visible_vram_size -
-   atomic64_read(&adev->visible_pin_size);
+   mem.vram.total_heap_size;
+   mem.cpu_accessible_vram.usable_heap_size = mem.vram.usable_heap_size;
mem.cpu_accessible_vram.heap_usage =
amdgpu_vram_mgr_vis_usage(&adev->mman.bdev.man[TTM_PL_VRAM]);
mem.cpu_accessible_vram.max_allocation =
-- 
2.17.1


Re: [PATCH] drm/amdgpu: csa_vaddr should not larger than AMDGPU_GMC_HOLE_START

2019-01-18 Thread Chunming Zhou

On 2019/1/18 17:11, Christian König wrote:
Hi Monk,

You see that for UMD, it can use 0 to HOLE_START
Let me say it once more: neither the UMD nor anybody else CAN use 0 to HOLE_START; 
that region is reserved for the ATC hardware!

We unfortunately didn't know that initially and also didn't use the ATC, so we 
didn't run into a problem.

But ROCm now uses the ATC on Raven/Picasso and I have a branch where I enable 
it on Vega as well. So if we don't fix that, we will run into problems here.

The ATC isn't usable in combination with SRIOV and I don't think Windows uses 
it either, so they probably never ran into any issues.

Do you mean that even the UMD should not use virtual addresses that fall in the 
range from 0 to HOLE_START?
Yes, exactly! That, in combination with ATC use, can have quite a bunch of 
strange and hard-to-track-down effects, because two parts of the driver are 
using the same address space.

In that case, where should the UMD work? I assume our UMD is still using this 
range; this part puzzles me.
At least Mesa now uses the high address space from HOLE_END..0x   
.

That is why there is high_vamgr in libdrm. The extended VA will be masked in the 
kernel when doing the VM mapping, right?

If Mesa has already switched to using that, shouldn't the internal OpenGL and 
Vulkan drivers switch too?


-David
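
To illustrate the address ranges discussed in this thread, here is a small sketch of a low range, a hole and a sign-extended high range, plus the masking step asked about above. The three constants are assumptions for a 48-bit GPU virtual address space and are not taken from this thread (the values quoted here are truncated in the archive); the `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for a 48-bit GPU VA space; not taken from this thread. */
#define TOY_HOLE_START 0x0000800000000000ULL
#define TOY_HOLE_END   0xffff800000000000ULL
#define TOY_HOLE_MASK  0x0000ffffffffffffULL

/* Low range: the part the thread says is reserved for the ATC hardware. */
static bool toy_va_in_low_range(uint64_t va)
{
	return va < TOY_HOLE_START;
}

/* Addresses inside the hole are not usable at all. */
static bool toy_va_in_hole(uint64_t va)
{
	return va >= TOY_HOLE_START && va < TOY_HOLE_END;
}

/* High range: the part Mesa is said to use now. */
static bool toy_va_in_high_range(uint64_t va)
{
	return va >= TOY_HOLE_END;
}

/* Drop the sign-extension bits before programming a mapping. */
static uint64_t toy_va_strip_extension(uint64_t va)
{
	assert(!toy_va_in_hole(va));
	return va & TOY_HOLE_MASK;
}

/* Turn a masked address back into its sign-extended form. */
static uint64_t toy_va_sign_extend(uint64_t va)
{
	if (va >= TOY_HOLE_START)
		va |= TOY_HOLE_END;
	return va;
}
```

Under these assumptions a user-space address such as 0xffff800000001000 is in the high range, and masking it yields 0x0000800000001000 for the actual mapping.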

Regards,
Christian.

On 18.01.19 at 02:32, Liu, Monk wrote:
Thanks Christian,

Questions I have now:


  1.  You see that the UMD can use 0 to HOLE_START, so why can't the CSA use 
that range, even though the range is, as you said, reserved for the ATC h/w? Note that 
for the Windows KMD, the CSA is allocated by the UMD driver, so the CSA shares the same 
aperture/space range with other UMD BOs, which means the CSA on Windows is also located 
in the ATC range. If that's a problem, why does Windows still work well?
 *   Can you illustrate this limitation in more detail? We need to 
understand why the CSA can't be put in the ATC range.
  2.  According to your previous description: "Now on Vega/Raven/Picasso etc.. 
(everything with a GFX9) the lower range (0x0-0x8000  ) is reserved for 
SVA/ATC use. Since we unfortunately didn't know that initially we exposed those 
to older user space as usable and also put the CSA in there."
 *   Do you mean that even the UMD should not use virtual addresses that fall in the 
range from 0 to HOLE_START?

In that case, where should the UMD work? I assume our UMD is still using this 
range; this part puzzles me.
/Monk
From: amd-gfx On Behalf Of Koenig, Christian
Sent: Thursday, January 17, 2019 9:26 PM
To: Liu, Monk; Lou, Wentao; amd-gfx@lists.freedesktop.org; Zhu, Rex
Cc: Deng, Emily
Subject: Re: [PATCH] drm/amdgpu: csa_vaddr should not larger than AMDGPU_GMC_HOLE_START

Hi Monk,


Regarding the above sentence, do you mean this range (0->HOLE_START) shouldn’t 
be exposed to user space? I don’t get your point here …
Yes, exactly. As I said, the problem is that 0->HOLE_START is reserved for the 
ATC hardware; we should not touch it at all.


Putting the CSA in 0~HOLE_START is the legacy approach we have used for a long time, 
since a very early stage. How come you think it is a problem now?
That turned out never to have been a good idea in the first place.

What we could do is reduce the max_pfn for SRIOV because the ATC doesn't work 
in that configuration anyway. But I would only do this as a last resort.

Any idea why an address above the hole doesn't work with SRIOV? It seems to 
work fine in the bare metal case.

Regards,
Christian.

On 17.01.19 at 14:19, Liu, Monk wrote:
Hi Christian

Thanks for explaining the hole to us.

My understanding is that we could still put the CSA in 0~HOLE_START, because we can 
report to the UMD that the max space is HOLE_START-CSA_SIZE, so no collision will occur.

> Now on Vega/Raven/Picasso etc.. (everything with a GFX9) the lower range 
> (0x0-0x8000  ) is reserved for SVA/ATC use. Since we unfortunately 
> didn't know that initially we exposed those to older userspace as usable and 
> also put the CSA in there.


Regarding the above sentence, do you mean this range (0->HOLE_START) shouldn’t 
be exposed to user space? I don’t get your point here …

Putting the CSA in 0~HOLE_START is the legacy approach we have used for a long time, 
since a very early stage. How come you think it is a problem now?

/Monk
From: amd-gfx On Behalf Of Koenig, Christian
Sent: Thursday, January 17, 2019 4:30 PM
To: Liu, Monk; Lou, Wentao; amd-gfx@lists.freedesktop.org; Zhu, Rex
Cc: Deng, Emily
Subject: Re: [PATCH] drm/amdgpu: csa_vaddr should not larger than AMDGPU_GMC_HOLE_START

Hi Monk,

ok let me explain a bit more how the 
