Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-05 Thread Christian König

On 05.09.2018 at 05:36, zhoucm1 wrote:



On 2018-09-04 17:20, Christian König wrote:

On 04.09.2018 at 11:00, zhoucm1 wrote:



On 2018-09-04 16:42, Christian König wrote:

On 04.09.2018 at 10:27, zhoucm1 wrote:



On 2018-09-04 16:05, Christian König wrote:

On 04.09.2018 at 09:53, zhoucm1 wrote:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb 
node.
2. Each node keeps a reference to the last previously 
inserted node.

3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage 
collection and remove the first node from the tree as long as 
it is signaled.


5. When enable_signaling is requested for a node we cascade 
that to the left using rb_prev.
    This ensures that signaling is enabled for the current 
fence as well as all previous fences.


6. A wait just looks into the tree for the signal point lower 
or equal of the requested sequence number.
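
For illustration, a rough sketch of what such a signal point node could look like; every name below (sample_signal_node and friends) is made up for this sketch and is not taken from the patch:

#include <linux/dma-fence.h>
#include <linux/rbtree.h>
#include <linux/spinlock.h>

/* Hypothetical node for the scheme above: each signal point is a fence
 * (step 1) with an rb node keyed by its timeline value, it holds a
 * reference to the previously inserted node (step 2), and the sync
 * object holds a reference to it through the tree (step 3). */
struct sample_signal_node {
    struct dma_fence base;      /* the signal point is itself a fence */
    spinlock_t lock;            /* lock backing the embedded fence */
    struct rb_node node;        /* sorted by value in the syncobj tree */
    u64 value;                  /* timeline point this node represents */
    struct dma_fence *prev;     /* fence of the last previously inserted node */
};
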
After rethinking your idea, I don't think it works, since there is no timeline value acting as a line: signal point values don't have to be contiguous and can jump with each signal operation, e.g. 1, 4, 8, 15, 19. That gives five signal_pts, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19. If a wait point is 7, do you mean this wait only needs signal_pt1 and signal_pt4? That's certainly not right; we need to make sure the timeline value is bigger than the wait point value, which means signal_pt8 is needed for wait_pt7.


That can be defined as we like it, e.g. when a wait operation 
asks for 7 we can return 8 as well.
If we define it that way, the problem comes back: if 8 has been removed by garbage collection, will you return 15?


The garbage collection is only done for signaled nodes. So when 8 
is already garbage collected and 7 is asked we know that we don't 
need to return anything.
8 is a signaled node; if waitA or a signal operation does the garbage collection, how does waitB(7) know the garbage-collection history?


Well we of course keep what the last garbage collected number is, 
don't we?
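
A minimal sketch of what that garbage collection could look like, reusing the hypothetical sample_signal_node from the sketch above and an equally hypothetical last_collected counter; this is just one way to read the proposal, not the patch itself:

/* Hypothetical: pop signaled nodes off the front of the tree and remember
 * the highest value collected so far, so a later wait for a value at or
 * below last_collected knows it is already satisfied. */
static void sample_garbage_collect(struct rb_root *tree, u64 *last_collected)
{
    struct rb_node *rb;

    while ((rb = rb_first(tree))) {
        struct sample_signal_node *n =
            rb_entry(rb, struct sample_signal_node, node);

        if (!dma_fence_is_signaled(&n->base))
            break;

        rb_erase(&n->node, tree);
        if (n->value > *last_collected)
            *last_collected = n->value;
        dma_fence_put(&n->base);    /* drop the sync object's reference */
    }
}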


Since there is no timeline acting as a line, I don't think this is the right direction.


That is actually intended. There is no infinite timeline here, just a window of the last not yet signaled fences.
Nobody said it's an infinite timeline; the timeline just stops increasing when the syncobj is released.


Yeah, but the syncobj can live for a very very long time. Not having 
some form of shrinking it when fences are signaled is certainly not 
going to fly very far.

I will try to fix this problem.
Btw, when I tried your suggestion I found it difficult to implement drm_syncobj_array_wait_timeout with your idea, since it needs first_signaled. If there is an un-signaled syncobj we still have to register a callback for timeline value changes, which brings us back to needing enable_signaling.


I don't fully understand the problem here. When we need to wait for a fence, it is obvious that we need to enable signaling. So what exactly is the issue?


BTW: I'm going to commit patches #1-#4 to drm-misc-next now if nobody is 
going to object.


Christian.



Thanks,
David Zhou


Regards,
Christian.



Anyway, kref is a good way to solve the 'free' problem. I will try to use it to improve my patch and will of course refer to your idea. :)


Thanks,
David Zhou


Otherwise you will never be able to release nodes from the tree 
since you always need to keep them around just in case somebody 
asks for a lower number.


Regards,
Christian.





The key is that as soon as a signal point is added adding a 
previous point is no longer allowed.

That's the intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root 
(e.g. the sync object). It only destructs itself when the 
looked up references to the nodes are dropped.
And here, who will destroy the rb node, since nobody calls enable_signaling and there is no callback for the nodes to free themselves?


The node will be destroyed when the last reference drops, not 
when enable_signaling is called.


In other words, the sync_obj keeps the references to each tree object to provide the wait operation; as soon as the sync_obj is destroyed we don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way 
we can drop all unused signal points as soon as the sync_obj is 
destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.
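
A sketch of what that release path could look like, built only on the hypothetical names used in the earlier sketches: the tree is walked post-order, the sync object's references are dropped, and rb_erase() is never called, so nodes still held by waiters simply free themselves later.

/* Hypothetical release: drop the extra reference the sync object holds
 * on every node without touching the tree structure itself. */
static void sample_timeline_fini(struct rb_root *tree)
{
    struct sample_signal_node *n, *next;

    rbtree_postorder_for_each_entry_safe(n, next, tree, node)
        dma_fence_put(&n->base);

    /* the root (i.e. the sync object) goes away together with the syncobj */
    *tree = RB_ROOT;
}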


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should 
work fine.
Yeah, it could work; a simple timeline reference can also solve the 'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circle dependency problems, it needs less memory and each node has always the same size which means we can use a kmem_cache for it.
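
Since every node would have the same size, a dedicated slab cache is a natural fit; a small sketch of that point, with hypothetical names:

#include <linux/slab.h>

static struct kmem_cache *sample_node_cache;

static int __init sample_node_cache_init(void)
{
    sample_node_cache = kmem_cache_create("sample_signal_node",
                                          sizeof(struct sample_signal_node),
                                          0, 0, NULL);
    return sample_node_cache ? 0 : -ENOMEM;
}

static struct sample_signal_node *sample_node_alloc(void)
{
    return kmem_cache_zalloc(sample_node_cache, GFP_KERNEL);
}

static void sample_node_free(struct sample_signal_node *n)
{
    kmem_cache_free(sample_node_cache, n);
}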

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018-09-04 17:20, Christian König wrote:

On 04.09.2018 at 11:00, zhoucm1 wrote:



On 2018-09-04 16:42, Christian König wrote:

On 04.09.2018 at 10:27, zhoucm1 wrote:



On 2018-09-04 16:05, Christian König wrote:

On 04.09.2018 at 09:53, zhoucm1 wrote:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb 
node.
2. Each node keeps a reference to the last previously inserted 
node.

3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage 
collection and remove the first node from the tree as long as 
it is signaled.


5. When enable_signaling is requested for a node we cascade 
that to the left using rb_prev.
    This ensures that signaling is enabled for the current 
fence as well as all previous fences.


6. A wait just looks into the tree for the signal point lower 
or equal of the requested sequence number.
After rethinking your idea, I don't think it works, since there is no timeline value acting as a line: signal point values don't have to be contiguous and can jump with each signal operation, e.g. 1, 4, 8, 15, 19. That gives five signal_pts, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19. If a wait point is 7, do you mean this wait only needs signal_pt1 and signal_pt4? That's certainly not right; we need to make sure the timeline value is bigger than the wait point value, which means signal_pt8 is needed for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks 
for 7 we can return 8 as well.
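
A sketch of that lookup under the convention just described (a wait for 7 is satisfied by the first signal point at or above 7); the node type is the hypothetical one used in the earlier sketches:

/* Hypothetical: return the signal node with the smallest value >= point,
 * or NULL if every such node has already been garbage collected. */
static struct sample_signal_node *
sample_lookup_signal_node(struct rb_root *tree, u64 point)
{
    struct rb_node *rb = tree->rb_node;
    struct sample_signal_node *best = NULL;

    while (rb) {
        struct sample_signal_node *n =
            rb_entry(rb, struct sample_signal_node, node);

        if (n->value >= point) {
            best = n;              /* candidate, look for a smaller one */
            rb = rb->rb_left;
        } else {
            rb = rb->rb_right;
        }
    }
    return best;
}
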
If we define it that way, the problem comes back: if 8 has been removed by garbage collection, will you return 15?


The garbage collection is only done for signaled nodes. So when 8 is 
already garbage collected and 7 is asked we know that we don't need 
to return anything.
8 is a signaled node; if waitA or a signal operation does the garbage collection, how does waitB(7) know the garbage-collection history?


Well we of course keep what the last garbage collected number is, 
don't we?


Since there is no timeline acting as a line, I don't think this is the right direction.


That is actually intended. There is no infinite timeline here, just a window of the last not yet signaled fences.
Nobody said it's an infinite timeline; the timeline just stops increasing when the syncobj is released.


Yeah, but the syncobj can live for a very very long time. Not having 
some form of shrinking it when fences are signaled is certainly not 
going to fly very far.

I will try to fix this problem.
Btw, when I tried your suggestion I found it difficult to implement drm_syncobj_array_wait_timeout with your idea, since it needs first_signaled. If there is an un-signaled syncobj we still have to register a callback for timeline value changes, which brings us back to needing enable_signaling.


Thanks,
David Zhou


Regards,
Christian.



Anyway, kref is a good way to solve the 'free' problem. I will try to use it to improve my patch and will of course refer to your idea. :)


Thanks,
David Zhou


Otherwise you will never be able to release nodes from the tree 
since you always need to keep them around just in case somebody asks 
for a lower number.


Regards,
Christian.





The key is that as soon as a signal point is added adding a 
previous point is no longer allowed.

That's the intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root 
(e.g. the sync object). It only destructs itself when the 
looked up references to the nodes are dropped.
And here, who will destroy the rb node, since nobody calls enable_signaling and there is no callback for the nodes to free themselves?


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree 
object to provide the wait operation, as soon as the sync_obj is 
destroyed we don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way 
we can drop all unused signal points as soon as the sync_obj is 
destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should 
work fine.
Yeah, it could work; a simple timeline reference can also solve the 'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circle dependency problems, it needs 
less memory and each node has always the same size which means 
we can use a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou















Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread Christian König

On 04.09.2018 at 11:00, zhoucm1 wrote:



On 2018-09-04 16:42, Christian König wrote:

On 04.09.2018 at 10:27, zhoucm1 wrote:



On 2018-09-04 16:05, Christian König wrote:

On 04.09.2018 at 09:53, zhoucm1 wrote:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted 
node.

3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection 
and remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade 
that to the left using rb_prev.
    This ensures that signaling is enabled for the current 
fence as well as all previous fences.


6. A wait just looks into the tree for the signal point lower 
or equal of the requested sequence number.
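
To illustrate step 5 above, a sketch of the left-cascading enable_signaling; dma_fence_enable_sw_signaling() is the existing dma-fence helper, the node type is the hypothetical one from the other sketches:

/* Hypothetical: once signaling has been enabled for one node, walk to
 * the left with rb_prev() and enable signaling on all earlier signal
 * points as well, so the whole prefix of the timeline makes progress. */
static void sample_cascade_enable_signaling(struct sample_signal_node *node)
{
    struct rb_node *rb = rb_prev(&node->node);

    while (rb) {
        struct sample_signal_node *n =
            rb_entry(rb, struct sample_signal_node, node);

        dma_fence_enable_sw_signaling(&n->base);
        rb = rb_prev(rb);
    }
}
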
After rethinking your idea, I don't think it works, since there is no timeline value acting as a line: signal point values don't have to be contiguous and can jump with each signal operation, e.g. 1, 4, 8, 15, 19. That gives five signal_pts, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19. If a wait point is 7, do you mean this wait only needs signal_pt1 and signal_pt4? That's certainly not right; we need to make sure the timeline value is bigger than the wait point value, which means signal_pt8 is needed for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks 
for 7 we can return 8 as well.
If we define it that way, the problem comes back: if 8 has been removed by garbage collection, will you return 15?


The garbage collection is only done for signaled nodes. So when 8 is 
already garbage collected and 7 is asked we know that we don't need 
to return anything.
8 is a signaled node; if waitA or a signal operation does the garbage collection, how does waitB(7) know the garbage-collection history?


Well we of course keep what the last garbage collected number is, don't we?

Since there is no timeline acting as a line, I don't think this is the right direction.


That is actually intended. There is no infinite timeline here, just a window of the last not yet signaled fences.
Nobody said it's an infinite timeline; the timeline just stops increasing when the syncobj is released.


Yeah, but the syncobj can live for a very very long time. Not having 
some form of shrinking it when fences are signaled is certainly not 
going to fly very far.


Regards,
Christian.



Anyway, kref is a good way to solve the 'free' problem. I will try to use it to improve my patch and will of course refer to your idea. :)


Thanks,
David Zhou


Otherwise you will never be able to release nodes from the tree since 
you always need to keep them around just in case somebody asks for a 
lower number.


Regards,
Christian.





The key is that as soon as a signal point is added adding a 
previous point is no longer allowed.

That's the intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root 
(e.g. the sync object). It only destructs itself when the 
looked up references to the nodes are dropped.
And here, who will destroy the rb node, since nobody calls enable_signaling and there is no callback for the nodes to free themselves?


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree 
object to provide the wait operation, as soon as the sync_obj is 
destroyed we don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way we 
can drop all unused signal points as soon as the sync_obj is 
destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should 
work fine.
Yeah, it could work; a simple timeline reference can also solve the 'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circle dependency problems, it needs less 
memory and each node has always the same size which means we can 
use a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou
















Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018-09-04 16:42, Christian König wrote:

On 04.09.2018 at 10:27, zhoucm1 wrote:



On 2018-09-04 16:05, Christian König wrote:

On 04.09.2018 at 09:53, zhoucm1 wrote:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted 
node.

3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection 
and remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade that 
to the left using rb_prev.
    This ensures that signaling is enabled for the current fence 
as well as all previous fences.


6. A wait just looks into the tree for the signal point lower or 
equal of the requested sequence number.
After rethinking your idea, I don't think it works, since there is no timeline value acting as a line: signal point values don't have to be contiguous and can jump with each signal operation, e.g. 1, 4, 8, 15, 19. That gives five signal_pts, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19. If a wait point is 7, do you mean this wait only needs signal_pt1 and signal_pt4? That's certainly not right; we need to make sure the timeline value is bigger than the wait point value, which means signal_pt8 is needed for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks 
for 7 we can return 8 as well.
If we define it that way, the problem comes back: if 8 has been removed by garbage collection, will you return 15?


The garbage collection is only done for signaled nodes. So when 8 is 
already garbage collected and 7 is asked we know that we don't need to 
return anything.
8 is a signaled node; if waitA or a signal operation does the garbage collection, how does waitB(7) know the garbage-collection history?




Since there is no timeline acting as a line, I don't think this is the right direction.


That is actually intended. There is no infinite timeline here, just a window of the last not yet signaled fences.
Nobody said it's an infinite timeline; the timeline just stops increasing when the syncobj is released.


Anyway, kref is a good way to solve the 'free' problem. I will try to use it to improve my patch and will of course refer to your idea. :)


Thanks,
David Zhou


Otherwise you will never be able to release nodes from the tree since 
you always need to keep them around just in case somebody asks for a 
lower number.


Regards,
Christian.





The key is that as soon as a signal point is added adding a previous 
point is no longer allowed.

That's the intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root 
(e.g. the sync object). It only destructs itself when the looked 
up references to the nodes are dropped.
And here, who will destroy the rb node, since nobody calls enable_signaling and there is no callback for the nodes to free themselves?


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree object 
to provide the wait operation, as soon as the sync_obj is destroyed 
we don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way we 
can drop all unused signal points as soon as the sync_obj is destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should 
work fine.
Yeah, it could work; a simple timeline reference can also solve the 'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circle dependency problems, it needs less 
memory and each node has always the same size which means we can 
use a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou














Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread Christian König

On 04.09.2018 at 10:27, zhoucm1 wrote:



On 2018-09-04 16:05, Christian König wrote:

On 04.09.2018 at 09:53, zhoucm1 wrote:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection 
and remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade that 
to the left using rb_prev.
    This ensures that signaling is enabled for the current fence 
as well as all previous fences.


6. A wait just looks into the tree for the signal point lower or 
equal of the requested sequence number.
After rethinking your idea, I don't think it works, since there is no timeline value acting as a line: signal point values don't have to be contiguous and can jump with each signal operation, e.g. 1, 4, 8, 15, 19. That gives five signal_pts, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19. If a wait point is 7, do you mean this wait only needs signal_pt1 and signal_pt4? That's certainly not right; we need to make sure the timeline value is bigger than the wait point value, which means signal_pt8 is needed for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks 
for 7 we can return 8 as well.
If we define it that way, the problem comes back: if 8 has been removed by garbage collection, will you return 15?


The garbage collection is only done for signaled nodes. So when 8 is 
already garbage collected and 7 is asked we know that we don't need to 
return anything.


Since there is no timeline acting as a line, I don't think this is the right direction.


That is actually intended. There is no infinite timeline here, just a window of the last not yet signaled fences.


Otherwise you will never be able to release nodes from the tree since 
you always need to keep them around just in case somebody asks for a 
lower number.


Regards,
Christian.





The key is that as soon as a signal point is added adding a previous 
point is no longer allowed.

That's the intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root 
(e.g. the sync object). It only destructs itself when the looked 
up references to the nodes are dropped.
And here, who will destroy the rb node, since nobody calls enable_signaling and there is no callback for the nodes to free themselves?


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree object 
to provide the wait operation, as soon as the sync_obj is destroyed 
we don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way we 
can drop all unused signal points as soon as the sync_obj is destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should work 
fine.
Yeah, it could work; a simple timeline reference can also solve the 'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circle dependency problems, it needs less 
memory and each node has always the same size which means we can 
use a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou












Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018-09-04 16:05, Christian König wrote:

On 04.09.2018 at 09:53, zhoucm1 wrote:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection 
and remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade that 
to the left using rb_prev.
    This ensures that signaling is enabled for the current fence 
as well as all previous fences.


6. A wait just looks into the tree for the signal point lower or 
equal of the requested sequence number.
After rethinking your idea, I don't think it works, since there is no timeline value acting as a line: signal point values don't have to be contiguous and can jump with each signal operation, e.g. 1, 4, 8, 15, 19. That gives five signal_pts, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19. If a wait point is 7, do you mean this wait only needs signal_pt1 and signal_pt4? That's certainly not right; we need to make sure the timeline value is bigger than the wait point value, which means signal_pt8 is needed for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks for 
7 we can return 8 as well.
If we define it that way, the problem comes back: if 8 has been removed by garbage collection, will you return 15? Since there is no timeline acting as a line, I don't think this is the right direction.




The key is that as soon as a signal point is added adding a previous 
point is no longer allowed.

That's the intention.

Regards,
David Zhou




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra 
reference to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root (e.g. 
the sync object). It only destructs itself when the looked up 
references to the nodes are dropped.
And here, who will destroy the rb node, since nobody calls enable_signaling and there is no callback for the nodes to free themselves?


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree object 
to provide the wait operation, as soon as the sync_obj is destroyed we 
don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way we 
can drop all unused signal points as soon as the sync_obj is destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should work 
fine.
Yeah, it could work; a simple timeline reference can also solve the 'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circle dependency problems, it needs less 
memory and each node has always the same size which means we can use 
a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou










Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread Christian König

On 04.09.2018 at 09:53, zhoucm1 wrote:

[SNIP]


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection and 
remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade that to 
the left using rb_prev.
    This ensures that signaling is enabled for the current fence as 
well as all previous fences.


6. A wait just looks into the tree for the signal point lower or 
equal of the requested sequence number.
After rethinking your idea, I don't think it works, since there is no timeline value acting as a line: signal point values don't have to be contiguous and can jump with each signal operation, e.g. 1, 4, 8, 15, 19. That gives five signal_pts, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19. If a wait point is 7, do you mean this wait only needs signal_pt1 and signal_pt4? That's certainly not right; we need to make sure the timeline value is bigger than the wait point value, which means signal_pt8 is needed for wait_pt7.


That can be defined as we like it, e.g. when a wait operation asks for 7 
we can return 8 as well.


The key is that as soon as a signal point is added adding a previous 
point is no longer allowed.




7. When the sync object is released we use 
rbtree_postorder_for_each_entry_safe() and drop the extra reference 
to each node, but never call rb_erase!
    This way the rb_tree stays in memory, but without a root (e.g. 
the sync object). It only destructs itself when the looked up 
references to the nodes are dropped.
And here, who will destroy the rb node, since nobody calls enable_signaling and there is no callback for the nodes to free themselves?


The node will be destroyed when the last reference drops, not when 
enable_signaling is called.


In other words the sync_obj keeps the references to each tree object to 
provide the wait operation, as soon as the sync_obj is destroyed we 
don't need that functionality any more.


We don't even need to wait for anything to be signaled, this way we can 
drop all unused signal points as soon as the sync_obj is destroyed.


Only the used ones will stay alive and provide the necessary 
functionality to provide the signal for each wait operation.


Regards,
Christian.



Regards,
David Zhou


Well that is quite a bunch of logic, but I think that should work 
fine.
Yeah, it could work; a simple timeline reference can also solve the 'free' problem.


I think this approach is still quite a bit better, 


e.g. you don't run into circle dependency problems, it needs less 
memory and each node has always the same size which means we can use 
a kmem_cache for it.


Regards,
Christian.



Thanks,
David Zhou








Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread zhoucm1



On 2018-09-04 15:00, Christian König wrote:

On 04.09.2018 at 06:04, zhoucm1 wrote:



On 2018-09-03 19:19, Christian König wrote:

On 03.09.2018 at 12:07, Chunming Zhou wrote:



On 2018/9/3 16:50, Christian König wrote:

On 03.09.2018 at 06:13, Chunming Zhou wrote:



On 2018/8/30 19:32, Christian König wrote:

[SNIP]



+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for

They are used to record wait-op points and signal-op points.
For timeline mode, they are connected by the timeline value and work like this:
    a. A signal_pt raises the timeline value to the signal_pt's value when the signal_pt is signaled; the signal_pt depends on the previous point's fence and on its own signal fence from the signal op.

    b. The wait_pt tree checks whether the timeline value has passed the wait_pt's own value.

For normal mode, it works like this:
    a. A signal_pt increases syncobj_timeline->signal_point by 1 every time a signal op is performed.
    b. When a wait op comes in, the wait_pt fetches the last signal point value above as its wait point; the wait_pt is only signaled by the signal_pt with the matching point value.
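
As a reading aid, a loose sketch of the timeline-mode bookkeeping described in a. and b. above; only the quoted structs come from the patch, the wait-node type, field names and function below are invented and locking is omitted:

#include <linux/dma-fence.h>
#include <linux/rbtree.h>

struct sample_wait_node {            /* hypothetical wait point */
    struct dma_fence base;
    struct rb_node node;
    u64 value;
};

/* Hypothetical handler run once a signal point's fences have completed:
 * bump the timeline to that value (a.) and signal every wait point whose
 * value has now been reached (b.). */
static void sample_signal_point_done(struct rb_root *wait_tree,
                                     u64 *timeline, u64 value)
{
    struct rb_node *rb, *next;

    if (value > *timeline)
        *timeline = value;

    for (rb = rb_first(wait_tree); rb; rb = next) {
        struct sample_wait_node *w =
            rb_entry(rb, struct sample_wait_node, node);

        next = rb_next(rb);
        if (w->value > *timeline)
            break;                    /* not reached yet */

        rb_erase(&w->node, wait_tree);
        dma_fence_signal(&w->base);
        dma_fence_put(&w->base);      /* drop the tree's reference */
    }
}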




and why is the stub fence their base?
Good question. I tried to kzalloc them separately when debugging and ran into a problem: I would look up a wait_pt or signal_pt successfully, but when I then tried to use it, it had sometimes already been freed, resulting in a NULL pointer. And generally, when we look them up we usually need their stub fence as well, and their lifecycles are the same.

For those reasons I made the stub fence their base.


That sounds like you only did this because you messed up the 
lifecycle.


Additional to that I don't think you correctly considered the 
lifecycle of the waits and the sync object itself. E.g. 
blocking in drm_syncobj_timeline_fini() until all waits are 
done is not a good idea.


What you should do instead is to create a fence_array object 
with all the fence we need to wait for when a wait point is 
requested.
Yeah, that was the result of our initial discussion, but when I tried to do it I found that it cannot meet the advance-signal requirement:
    a. We need to consider that the wait and signal point values are not a one-to-one match, so it's difficult to find the dependent point; at least, there is some overhead.


As far as I can see that is independent of using a fence array 
here. See you can either use a ring buffer or an rb-tree, but 
when you want to wait for a specific point we need to condense 
the not yet signaled fences into an array.
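
A sketch of that condensing step with the existing dma_fence_array helper; the bound, the node type and the surrounding names are hypothetical, and the empty-array case is left out for brevity:

#include <linux/dma-fence-array.h>
#include <linux/slab.h>

#define SAMPLE_MAX_POINTS 64    /* arbitrary bound for this sketch */

/* Hypothetical: gather the not yet signaled fences up to (and including)
 * the first signal point at or above 'point' and wrap them in a
 * dma_fence_array the waiter can block on. */
static struct dma_fence *
sample_condense_fences(struct rb_root *tree, u64 point)
{
    struct dma_fence **fences;
    struct dma_fence_array *array;
    struct rb_node *rb;
    unsigned int count = 0;

    fences = kcalloc(SAMPLE_MAX_POINTS, sizeof(*fences), GFP_KERNEL);
    if (!fences)
        return NULL;

    for (rb = rb_first(tree); rb && count < SAMPLE_MAX_POINTS;
         rb = rb_next(rb)) {
        struct sample_signal_node *n =
            rb_entry(rb, struct sample_signal_node, node);

        if (!dma_fence_is_signaled(&n->base))
            fences[count++] = dma_fence_get(&n->base);
        if (n->value >= point)
            break;
    }

    array = dma_fence_array_create(count, fences,
                                   dma_fence_context_alloc(1), 1, false);
    if (!array) {
        while (count--)
            dma_fence_put(fences[count]);
        kfree(fences);
        return NULL;
    }
    return &array->base;    /* the array now owns the fences[] storage */
}
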
Again, we need to find the range the specific point falls in; we should stay close to timeline semantics. I also referred to the sw_sync.c timeline, where a wait_pt is usually signalled by a timeline point. And I agree we can implement it with several methods, but I don't think there are fundamental differences.


The problem is that with your current approach you need the 
sync_obj alive for the synchronization to work. That is most 
likely not a good idea.
Indeed, I will fix that. How about only creating a fence array for every wait_pt when the syncobj is released? At syncobj release time the wait_pts must already have waited for their signal operations, so we can easily condense the fences for every wait_pt. And in the meantime we can keep the advantage of timeline-based wait points.


That could work, but I don't see how you want to replace the already 
issued fence with a fence_array when the sync object is destroyed.


Additional to that I would rather prefer a consistent handling, e.g. 
without extra rarely used code paths.
Ah, I found an easy way: we just need to make the syncobj_timeline structure reference counted. This way the syncobj itself can be released first; the wait_pt/signal_pt don't need the syncobj at all.

Every wait_pt/signal_pt keeps a reference to the syncobj_timeline.
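
A sketch of that reference-counted split, with invented names; the point is only that the timeline state owns the trees and is kept alive by the points, not by the syncobj:

#include <linux/kref.h>
#include <linux/rbtree.h>
#include <linux/slab.h>

/* Hypothetical timeline state, split out of the syncobj and reference
 * counted so wait_pt/signal_pt can outlive the syncobj itself. */
struct sample_syncobj_timeline {
    struct kref refcount;
    struct rb_root wait_tree;
    struct list_head signal_list;
    u64 timeline;
};

static void sample_syncobj_timeline_free(struct kref *kref)
{
    struct sample_syncobj_timeline *tl =
        container_of(kref, struct sample_syncobj_timeline, refcount);

    kfree(tl);
}

static void sample_syncobj_timeline_put(struct sample_syncobj_timeline *tl)
{
    kref_put(&tl->refcount, sample_syncobj_timeline_free);
}

Each wait_pt/signal_pt would take a reference with kref_get() at creation time and call sample_syncobj_timeline_put() from its fence release path.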


I've thought about that as well, but came to the conclusion that you run into problems because of circular dependencies.


E.g. sync_obj references sync_point and sync_point references sync_obj.
The sync_obj can be freed first; only the sync point references the syncobj_timeline structure, and the syncobj_timeline doesn't reference the sync_pt, so there is no circular dependency.




Additional to that it is quite a bit larger memory footprint because 
you need to keep the sync_obj around as well.
All signaled sync_pts are freed immediately, except for the syncobj_timeline structure, so where does the extra memory footprint come from?












Additional to that you enable signaling without a need from the 
waiting side. That is rather bad for implementations which need 
that optimization.
Do you mean that increasing the timeline based on the signal fence is not better, and that we should only update the timeline value when requested by a wait_pt?


Yes, exactly.

This way we will not update the timeline value immediately and cannot free the signal_pt immediately, and we also need to consider how that affects CPU query and wait.

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-04 Thread Christian König

On 04.09.2018 at 06:04, zhoucm1 wrote:



On 2018-09-03 19:19, Christian König wrote:

On 03.09.2018 at 12:07, Chunming Zhou wrote:



On 2018/9/3 16:50, Christian König wrote:

On 03.09.2018 at 06:13, Chunming Zhou wrote:



On 2018/8/30 19:32, Christian König wrote:

[SNIP]



+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for

They are used to record wait-op points and signal-op points.
For timeline mode, they are connected by the timeline value and work like this:
    a. A signal_pt raises the timeline value to the signal_pt's value when the signal_pt is signaled; the signal_pt depends on the previous point's fence and on its own signal fence from the signal op.

    b. The wait_pt tree checks whether the timeline value has passed the wait_pt's own value.

For normal mode, it works like this:
    a. A signal_pt increases syncobj_timeline->signal_point by 1 every time a signal op is performed.
    b. When a wait op comes in, the wait_pt fetches the last signal point value above as its wait point; the wait_pt is only signaled by the signal_pt with the matching point value.




and why is the stub fence their base?
Good question. I tried to kzalloc them separately when debugging and ran into a problem: I would look up a wait_pt or signal_pt successfully, but when I then tried to use it, it had sometimes already been freed, resulting in a NULL pointer. And generally, when we look them up we usually need their stub fence as well, and their lifecycles are the same.

For those reasons I made the stub fence their base.


That sounds like you only did this because you messed up the 
lifecycle.


Additional to that I don't think you correctly considered the 
lifecycle of the waits and the sync object itself. E.g. 
blocking in drm_syncobj_timeline_fini() until all waits are 
done is not a good idea.


What you should do instead is to create a fence_array object 
with all the fence we need to wait for when a wait point is 
requested.
Yeah, that was the result of our initial discussion, but when I tried to do it I found that it cannot meet the advance-signal requirement:
    a. We need to consider that the wait and signal point values are not a one-to-one match, so it's difficult to find the dependent point; at least, there is some overhead.


As far as I can see that is independent of using a fence array 
here. See you can either use a ring buffer or an rb-tree, but 
when you want to wait for a specific point we need to condense 
the not yet signaled fences into an array.
Again, we need to find the range the specific point falls in; we should stay close to timeline semantics. I also referred to the sw_sync.c timeline, where a wait_pt is usually signalled by a timeline point. And I agree we can implement it with several methods, but I don't think there are fundamental differences.


The problem is that with your current approach you need the 
sync_obj alive for the synchronization to work. That is most likely 
not a good idea.
Indeed, I will fix that. How about only creating a fence array for every wait_pt when the syncobj is released? At syncobj release time the wait_pts must already have waited for their signal operations, so we can easily condense the fences for every wait_pt. And in the meantime we can keep the advantage of timeline-based wait points.


That could work, but I don't see how you want to replace the already 
issued fence with a fence_array when the sync object is destroyed.


Additional to that I would rather prefer a consistent handling, e.g. 
without extra rarely used code paths.
Ah, I found an easy way: we just need to make the syncobj_timeline structure reference counted. This way the syncobj itself can be released first; the wait_pt/signal_pt don't need the syncobj at all.

Every wait_pt/signal_pt keeps a reference to the syncobj_timeline.


I've thought about that as well, but came to the conclusion that you run into problems because of circular dependencies.


E.g. sync_obj references sync_point and sync_point references sync_obj.

Additional to that it is quite a bit larger memory footprint because you 
need to keep the sync_obj around as well.










Additional to that you enable signaling without a need from the 
waiting side. That is rather bad for implementations which need 
that optimization.
Do you mean that increasing the timeline based on the signal fence is not better, and that we should only update the timeline value when requested by a wait_pt?


Yes, exactly.

This way we will not update the timeline value immediately and cannot free the signal_pt immediately, and we also need to consider how that affects CPU query and wait.


That is actually the better coding style. We usually try to avoid 
doing things in interrupt handlers as much as possible.
OK, I see your concern. How about delaying the handling to a workqueue? This way we only increase the timeline value and wake up the workqueue in the fence callback. Is that acceptable?

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-03 Thread zhoucm1



On 2018-09-03 19:19, Christian König wrote:

On 03.09.2018 at 12:07, Chunming Zhou wrote:



On 2018/9/3 16:50, Christian König wrote:

On 03.09.2018 at 06:13, Chunming Zhou wrote:



On 2018/8/30 19:32, Christian König wrote:

[SNIP]



+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for

They are used to record wait-op points and signal-op points.
For timeline mode, they are connected by the timeline value and work like this:
    a. A signal_pt raises the timeline value to the signal_pt's value when the signal_pt is signaled; the signal_pt depends on the previous point's fence and on its own signal fence from the signal op.

    b. The wait_pt tree checks whether the timeline value has passed the wait_pt's own value.

For normal mode, it works like this:
    a. A signal_pt increases syncobj_timeline->signal_point by 1 every time a signal op is performed.
    b. When a wait op comes in, the wait_pt fetches the last signal point value above as its wait point; the wait_pt is only signaled by the signal_pt with the matching point value.




and why is the stub fence their base?
Good question. I tried to kzalloc them separately when debugging and ran into a problem: I would look up a wait_pt or signal_pt successfully, but when I then tried to use it, it had sometimes already been freed, resulting in a NULL pointer. And generally, when we look them up we usually need their stub fence as well, and their lifecycles are the same.

For those reasons I made the stub fence their base.


That sounds like you only did this because you messed up the 
lifecycle.


Additional to that I don't think you correctly considered the 
lifecycle of the waits and the sync object itself. E.g. blocking 
in drm_syncobj_timeline_fini() until all waits are done is not a 
good idea.


What you should do instead is to create a fence_array object 
with all the fence we need to wait for when a wait point is 
requested.
Yeah, that was the result of our initial discussion, but when I tried to do it I found that it cannot meet the advance-signal requirement:
    a. We need to consider that the wait and signal point values are not a one-to-one match, so it's difficult to find the dependent point; at least, there is some overhead.


As far as I can see that is independent of using a fence array 
here. See you can either use a ring buffer or an rb-tree, but when 
you want to wait for a specific point we need to condense the not 
yet signaled fences into an array.
Again, we need to find the range the specific point falls in; we should stay close to timeline semantics. I also referred to the sw_sync.c timeline, where a wait_pt is usually signalled by a timeline point. And I agree we can implement it with several methods, but I don't think there are fundamental differences.


The problem is that with your current approach you need the sync_obj 
alive for the synchronization to work. That is most likely not a 
good idea.
Indeed, I will fix that. How about only creating a fence array for every wait_pt when the syncobj is released? At syncobj release time the wait_pts must already have waited for their signal operations, so we can easily condense the fences for every wait_pt. And in the meantime we can keep the advantage of timeline-based wait points.


That could work, but I don't see how you want to replace the already 
issued fence with a fence_array when the sync object is destroyed.


Additional to that I would rather prefer a consistent handling, e.g. 
without extra rarely used code paths.
Ah, I found an easy way: we just need to make the syncobj_timeline structure reference counted. This way the syncobj itself can be released first; the wait_pt/signal_pt don't need the syncobj at all.

Every wait_pt/signal_pt keeps a reference to the syncobj_timeline.







Additional to that you enable signaling without a need from the 
waiting side. That is rather bad for implementations which need that 
optimization.
Do you mean that increasing the timeline based on the signal fence is not better, and that we should only update the timeline value when requested by a wait_pt?


Yes, exactly.

This way we will not update the timeline value immediately and cannot free the signal_pt immediately, and we also need to consider how that affects CPU query and wait.


That is actually the better coding style. We usually try to avoid 
doing things in interrupt handlers as much as possible.
OK, I see your concern. How about delaying the handling to a workqueue? This way we only increase the timeline value and wake up the workqueue in the fence callback. Is that acceptable?
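
A sketch of that split, with made-up names: the dma_fence callback only records the new value and schedules a work item, and everything heavier runs later in process context. Locking/atomicity is omitted for brevity.

#include <linux/dma-fence.h>
#include <linux/workqueue.h>

/* Hypothetical per-timeline context shared by the callback and worker. */
struct sample_timeline_ctx {
    struct work_struct work;
    u64 timeline;            /* updated from the fence callback */
};

struct sample_signal_cb {
    struct dma_fence_cb cb;
    struct sample_timeline_ctx *ctx;
    u64 value;               /* point the signal fence corresponds to */
};

/* Fence callback, possibly in interrupt context: bump the value and
 * kick the workqueue, nothing more. */
static void sample_signal_fence_cb(struct dma_fence *fence,
                                   struct dma_fence_cb *cb)
{
    struct sample_signal_cb *scb =
        container_of(cb, struct sample_signal_cb, cb);

    if (scb->value > scb->ctx->timeline)
        scb->ctx->timeline = scb->value;
    schedule_work(&scb->ctx->work);
}

/* Worker in process context: free signaled points, wake CPU waiters, etc. */
static void sample_timeline_worker(struct work_struct *work)
{
    struct sample_timeline_ctx *ctx =
        container_of(work, struct sample_timeline_ctx, work);

    (void)ctx;    /* garbage collection / wait point signaling would go here */
}

The context would be set up with INIT_WORK(&ctx->work, sample_timeline_worker) and the callback installed with dma_fence_add_callback(fence, &scb->cb, sample_signal_fence_cb).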





How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection and remove the first node from the tree as long as it is signaled.

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-03 Thread Christian König

On 03.09.2018 at 12:07, Chunming Zhou wrote:



On 2018/9/3 16:50, Christian König wrote:

On 03.09.2018 at 06:13, Chunming Zhou wrote:



On 2018/8/30 19:32, Christian König wrote:

[SNIP]



+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for

They are used to record wait-op points and signal-op points.
For timeline mode, they are connected by the timeline value and work like this:
    a. A signal_pt raises the timeline value to the signal_pt's value when the signal_pt is signaled; the signal_pt depends on the previous point's fence and on its own signal fence from the signal op.

    b. The wait_pt tree checks whether the timeline value has passed the wait_pt's own value.

For normal mode, it works like this:
    a. A signal_pt increases syncobj_timeline->signal_point by 1 every time a signal op is performed.
    b. When a wait op comes in, the wait_pt fetches the last signal point value above as its wait point; the wait_pt is only signaled by the signal_pt with the matching point value.




and why is the stub fence their base?
Good question. I tried to kzalloc them separately when debugging and ran into a problem: I would look up a wait_pt or signal_pt successfully, but when I then tried to use it, it had sometimes already been freed, resulting in a NULL pointer. And generally, when we look them up we usually need their stub fence as well, and their lifecycles are the same.

For those reasons I made the stub fence their base.


That sounds like you only did this because you messed up the 
lifecycle.


Additional to that I don't think you correctly considered the 
lifecycle of the waits and the sync object itself. E.g. blocking 
in drm_syncobj_timeline_fini() until all waits are done is not a 
good idea.


What you should do instead is to create a fence_array object with 
all the fence we need to wait for when a wait point is requested.
Yeah, that was the result of our initial discussion, but when I tried to do it I found that it cannot meet the advance-signal requirement:
    a. We need to consider that the wait and signal point values are not a one-to-one match, so it's difficult to find the dependent point; at least, there is some overhead.


As far as I can see that is independent of using a fence array 
here. See you can either use a ring buffer or an rb-tree, but when 
you want to wait for a specific point we need to condense the not 
yet signaled fences into an array.
Again, we need to find the range the specific point falls in; we should stay close to timeline semantics. I also referred to the sw_sync.c timeline, where a wait_pt is usually signalled by a timeline point. And I agree we can implement it with several methods, but I don't think there are fundamental differences.


The problem is that with your current approach you need the sync_obj 
alive for the synchronization to work. That is most likely not a good 
idea.
Indeed, I will fix that. How about only creating a fence array for every wait_pt when the syncobj is released? At syncobj release time the wait_pts must already have waited for their signal operations, so we can easily condense the fences for every wait_pt. And in the meantime we can keep the advantage of timeline-based wait points.


That could work, but I don't see how you want to replace the already 
issued fence with a fence_array when the sync object is destroyed.


Additional to that I would rather prefer a consistent handling, e.g. 
without extra rarely used code paths.






Additional to that you enable signaling without a need from the 
waiting side. That is rather bad for implementations which need that 
optimization.
Do you mean that increasing the timeline based on the signal fence is not better, and that we should only update the timeline value when requested by a wait_pt?


Yes, exactly.

This way we will not update the timeline value immediately and cannot free the signal_pt immediately, and we also need to consider how that affects CPU query and wait.


That is actually the better coding style. We usually try to avoid doing 
things in interrupt handlers as much as possible.


How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection and 
remove the first node from the tree as long as it is signaled.


5. When enable_signaling is requested for a node we cascade that to the 
left using rb_prev.
    This ensures that signaling is enabled for the current fence as 
well as all previous fences.


6. A wait just looks into the tree for the signal point lower or equal 
of the requested sequence number.


7. When the sync object is released we use rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase!

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-03 Thread Chunming Zhou



On 2018/9/3 16:50, Christian König wrote:

On 03.09.2018 at 06:13, Chunming Zhou wrote:



On 2018/8/30 19:32, Christian König wrote:

[SNIP]



+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for

They are used to record wait-op points and signal-op points.
For timeline mode, they are connected by the timeline value and work like this:
    a. A signal_pt raises the timeline value to the signal_pt's value when the signal_pt is signaled; the signal_pt depends on the previous point's fence and on its own signal fence from the signal op.

    b. The wait_pt tree checks whether the timeline value has passed the wait_pt's own value.

For normal mode, it works like this:
    a. A signal_pt increases syncobj_timeline->signal_point by 1 every time a signal op is performed.
    b. When a wait op comes in, the wait_pt fetches the last signal point value above as its wait point; the wait_pt is only signaled by the signal_pt with the matching point value.




and why is the stub fence their base?
Good question. I tried to kzalloc them separately when debugging and ran into a problem: I would look up a wait_pt or signal_pt successfully, but when I then tried to use it, it had sometimes already been freed, resulting in a NULL pointer. And generally, when we look them up we usually need their stub fence as well, and their lifecycles are the same.

For those reasons I made the stub fence their base.


That sounds like you only did this because you messed up the 
lifecycle.


Additional to that I don't think you correctly considered the 
lifecycle of the waits and the sync object itself. E.g. blocking 
in drm_syncobj_timeline_fini() until all waits are done is not a 
good idea.


What you should do instead is to create a fence_array object with 
all the fence we need to wait for when a wait point is requested.
Yeah, that was the result of our initial discussion, but when I tried to do it I found that it cannot meet the advance-signal requirement:
    a. We need to consider that the wait and signal point values are not a one-to-one match, so it's difficult to find the dependent point; at least, there is some overhead.


As far as I can see that is independent of using a fence array here. 
See you can either use a ring buffer or an rb-tree, but when you 
want to wait for a specific point we need to condense the not yet 
signaled fences into an array.
Again, we need to find the range the specific point falls in; we should stay close to timeline semantics. I also referred to the sw_sync.c timeline, where a wait_pt is usually signalled by a timeline point. And I agree we can implement it with several methods, but I don't think there are fundamental differences.


The problem is that with your current approach you need the sync_obj 
alive for the synchronization to work. That is most likely not a good 
idea.
Indeed, I will fix that. How about only creating a fence array for every wait_pt when the syncobj is released? At syncobj release time the wait_pts must already have waited for their signal operations, so we can easily condense the fences for every wait_pt. And in the meantime we can keep the advantage of timeline-based wait points.





Additional to that you enable signaling without a need from the 
waiting side. That is rather bad for implementations which need that 
optimization.
Do you mean that increasing the timeline based on the signal fence is not better, and that we should only update the timeline value when requested by a wait_pt? This way we will not update the timeline value immediately and cannot free the signal_pt immediately, and we also need to consider how that affects CPU query and wait.


Thanks,
David Zhou



I suggest to either condense all the fences you need to wait for in an 
array during the wait operation, or reference count the sync_obj and 
only enable the signaling on the fences when requested by a wait.






    b. Because we allow "wait-before-signal", when it happens there isn't a signal fence yet that can be used to create the fence array.


Well, again, we DON'T support wait-before-signal here. I will certainly NAK any implementation which tries to do this until we have figured out all the resource management constraints, and I still don't see how we can do this.
Yes, we don't support real wait-before-signal in the patch now, just a fake wait-before-signal which still waits at CS submission, via wait_event, until the signal operation comes through; that is the conclusion we discussed before.


Well in this case we should call it wait-for-signal and not 
wait-before-signal :)







So timeline value is good to resolve that.



Otherwise somebody can easily construct a situation where timeline 
sync obj A waits on B and B waits on A.
The same situation can also happen with a fence-array; we can only treat it as a programming bug from incorrect use.

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-03 Thread Christian König

On 03.09.2018 at 06:13, Chunming Zhou wrote:



On 2018/8/30 19:32, Christian König wrote:

[SNIP]



+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for

They are used to record wait-op points and signal-op points.
For timeline mode, they are connected by the timeline value and work like this:
    a. A signal_pt raises the timeline value to the signal_pt's value when the signal_pt is signaled; the signal_pt depends on the previous point's fence and on its own signal fence from the signal op.

    b. The wait_pt tree checks whether the timeline value has passed the wait_pt's own value.

For normal mode, it works like this:
    a. A signal_pt increases syncobj_timeline->signal_point by 1 every time a signal op is performed.
    b. When a wait op comes in, the wait_pt fetches the last signal point value above as its wait point; the wait_pt is only signaled by the signal_pt with the matching point value.




and why is the stub fence their base?
Good question. I tried to kzalloc them separately when debugging and ran into a problem: I would look up a wait_pt or signal_pt successfully, but when I then tried to use it, it had sometimes already been freed, resulting in a NULL pointer. And generally, when we look them up we usually need their stub fence as well, and their lifecycles are the same.

For those reasons I made the stub fence their base.


That sounds like you only did this because you messed up the 
lifecycle.


In addition to that, I don't think you correctly considered the 
lifecycle of the waits and the sync object itself. E.g. blocking in 
drm_syncobj_timeline_fini() until all waits are done is not a good 
idea.


What you should do instead is to create a fence_array object with 
all the fences we need to wait for when a wait point is requested.
Yeah, this was the result of our initial discussion, but when I tried 
to do that, I found it cannot meet the advance-signal requirement:
    a. We need to consider that the wait and signal pt values do not 
match one-to-one, so it's difficult to find the dependent point; at 
least there is some overhead.


As far as I can see that is independent of using a fence array here. 
You can either use a ring buffer or an rb-tree, but when you want 
to wait for a specific point we need to condense the not-yet-signaled 
fences into an array.
Again, we need to find the range that the specific point falls into; 
we should stay close to the timeline semantics. I also referred to the 
sw_sync.c timeline, where usually a wait_pt is signalled by a timeline 
point. And I agree we can implement it with several methods, but I 
don't think there are fundamental differences.


The problem is that with your current approach you need the sync_obj 
to stay alive for the synchronization to work. That is most likely not a good idea.


In addition to that, you enable signaling without a need from the waiting 
side. That is rather bad for implementations which need that optimization.


I suggest either condensing all the fences you need to wait for into an 
array during the wait operation, or reference counting the sync_obj and 
only enabling signaling on the fences when requested by a wait.






    b. Because we allow "wait-before-signal": if 
"wait-before-signal" happens, there is no signal fence which can be 
used to create the fence array.


Well, again we DON'T support wait-before-signal here. I will 
certainly NAK any implementation which tries to do this until we 
have figured out all the resource management constraints, and I 
still don't see how we can do this.
Yes, we don't support real wait-before-signal in the patch now, just a 
fake wait-before-signal which still blocks the CS submission in 
wait_event() until the signal operation comes through, which is the 
conclusion we discussed before.


Well in this case we should call it wait-for-signal and not 
wait-before-signal :)







So the timeline value is a good way to resolve that.



Otherwise somebody can easily construct a situation where timeline 
sync obj A waits on B and B waits on A.
The same situation can also happen with a fence-array; we can only 
treat that as a programming bug with incorrect use.


No, fence-array is initialized only once with a static list of 
fences. This way it is impossible to add the fence-array to itself 
for example.


E.g. you can't build circle dependencies with that.
We already wait for the signal operation event, so we never build 
circular dependencies with that. The theory is the same.


Yeah, ok that is certainly true.

Regards,
Christian.







Better use rbtree_postorder_for_each_entry_safe() for this.
From the comments it seems we cannot use it here, because we need to 
erase nodes here.

Comments:
 * Note, however, that it cannot handle other modifications that 
re-order the
 * rbtree it is iterating over. This includes calling rb_erase() on 
@pos, as
 * rb_erase() may rebalance the tree, causing us to miss some nodes.
 */

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-09-02 Thread Chunming Zhou



在 2018/8/30 19:32, Christian König 写道:

[SNIP]



+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for

They are used to record wait op points and signal op points.
For a timeline syncobj they are connected by the timeline value and work like this:
    a. A signal pt increases the timeline value to the signal_pt value 
when the signal_pt is signaled. The signal_pt depends on the previous 
pt fence and on its own signal fence from the signal op.

    b. The wait pt tree checks whether the timeline value has passed its own value.

For a normal syncobj it works like this:
    a. A signal pt increases syncobj_timeline->signal_point by 1 
every time a signal op is performed.
    b. When a wait op comes in, the wait pt fetches the last signal 
point value above as its wait point. That wait pt is only signaled by 
the signal_pt with the equal point value.




and why is the stub fence their base?
Good question. I tried to kzalloc them separately as well while 
debugging, and I ran into a problem:
I could look up / find a wait_pt or signal_pt successfully, but when I 
tried to use it, it had sometimes already been freed, resulting in a 
NULL pointer.
And generally, when we look them up we usually need their stub fence 
as well, and their lifecycles are the same.

For those reasons I made the stub fence their base.


That sounds like you only did this because you messed up the lifecycle.

In addition to that, I don't think you correctly considered the 
lifecycle of the waits and the sync object itself. E.g. blocking in 
drm_syncobj_timeline_fini() until all waits are done is not a good 
idea.


What you should do instead is to create a fence_array object with 
all the fences we need to wait for when a wait point is requested.
Yeah, this was the result of our initial discussion, but when I tried 
to do that, I found it cannot meet the advance-signal requirement:
    a. We need to consider that the wait and signal pt values do not 
match one-to-one, so it's difficult to find the dependent point; at 
least there is some overhead.


As far as I can see that is independent of using a fence array here. 
You can either use a ring buffer or an rb-tree, but when you want 
to wait for a specific point we need to condense the not-yet-signaled 
fences into an array.
Again, we need to find the range that the specific point falls into; we 
should stay close to the timeline semantics. I also referred to the 
sw_sync.c timeline, where usually a wait_pt is signalled by a timeline 
point. And I agree we can implement it with several methods, but I don't 
think there are fundamental differences.




    b. Because we allow "wait-before-signal": if 
"wait-before-signal" happens, there is no signal fence which can be 
used to create the fence array.


Well, again we DON'T support wait-before-signal here. I will certainly 
NAK any implementation which tries to do this until we have figured 
out all the resource management constraints, and I still don't see how 
we can do this.
Yes, we don't support real wait-before-signal in the patch now, just a fake 
wait-before-signal which still blocks the CS submission in wait_event() 
until the signal operation comes through, which is the conclusion we 
discussed before.





So the timeline value is a good way to resolve that.



Otherwise somebody can easily construct a situation where timeline 
sync obj A waits on B and B waits on A.
The same situation can also happen with a fence-array; we can only 
treat that as a programming bug with incorrect use.


No, fence-array is initialized only once with a static list of fences. 
This way it is impossible to add the fence-array to itself for example.


E.g. you can't build circle dependencies with that.
We already wait for the signal operation event, so we never build 
circular dependencies with that. The theory is the same.






Better use rbtree_postorder_for_each_entry_safe() for this.
From the comments it seems we cannot use it here, because we need to 
erase nodes here.

Comments:
 * Note, however, that it cannot handle other modifications that 
re-order the
 * rbtree it is iterating over. This includes calling rb_erase() on 
@pos, as

 * rb_erase() may rebalance the tree, causing us to miss some nodes.
 */


Yeah, but your current implementation has the same problem :)

You need to use something like "while (node = rb_first(...))" 
instead if you want to destruct the whole tree.

OK, got it, I will change it in v4.

Thanks,
David Zhou


Regards,
Christian.



Thanks,
David Zhou


Regards,
Christian.


+    wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+    node = rb_next(node);
I already get the next node before erasing the current one, so the "for 
(...)" loop is equivalent to "while (...)".


That still doesn't work. The problem is that, because of an erase, the 
next node might skip some nodes when the tree is rebalanced.

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-08-30 Thread Christian König

[SNIP]



+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for

They are used to record wait op points and signal op points.
For a timeline syncobj they are connected by the timeline value and work like this:
    a. A signal pt increases the timeline value to the signal_pt value 
when the signal_pt is signaled. The signal_pt depends on the previous 
pt fence and on its own signal fence from the signal op.

    b. The wait pt tree checks whether the timeline value has passed its own value.

For a normal syncobj it works like this:
    a. A signal pt increases syncobj_timeline->signal_point by 1 every 
time a signal op is performed.
    b. When a wait op comes in, the wait pt fetches the last signal 
point value above as its wait point. That wait pt is only signaled by 
the signal_pt with the equal point value.




and why is the stub fence their base?
Good question. I tried to kzalloc them separately as well while 
debugging, and I ran into a problem:
I could look up / find a wait_pt or signal_pt successfully, but when I 
tried to use it, it had sometimes already been freed, resulting in a 
NULL pointer.
And generally, when we look them up we usually need their stub fence 
as well, and their lifecycles are the same.

For those reasons I made the stub fence their base.


That sounds like you only did this because you messed up the lifecycle.

In addition to that, I don't think you correctly considered the 
lifecycle of the waits and the sync object itself. E.g. blocking in 
drm_syncobj_timeline_fini() until all waits are done is not a good idea.


What you should do instead is to create a fence_array object with all 
the fences we need to wait for when a wait point is requested.
Yeah, this was the result of our initial discussion, but when I tried 
to do that, I found it cannot meet the advance-signal requirement:
    a. We need to consider that the wait and signal pt values do not 
match one-to-one, so it's difficult to find the dependent point; at 
least there is some overhead.


As far as I can see that is independent of using a fence array here. You 
can either use a ring buffer or an rb-tree, but when you want to 
wait for a specific point we need to condense the not-yet-signaled 
fences into an array.


    b. Because we allow "wait-before-signal": if 
"wait-before-signal" happens, there is no signal fence which can be 
used to create the fence array.


Well, again we DON'T support wait-before-signal here. I will certainly 
NAK any implementation which tries to do this until we have figured 
out all the resource management constraints, and I still don't see how we 
can do this.



So the timeline value is a good way to resolve that.



Otherwise somebody can easily construct a situation where timeline 
sync obj A waits on B and B waits on A.
The same situation can also happen with a fence-array; we can only treat 
that as a programming bug with incorrect use.


No, fence-array is initialized only once with a static list of fences. 
This way it is impossible to add the fence-array to itself for example.


E.g. you can't build circle dependencies with that.


Better use rbtree_postorder_for_each_entry_safe() for this.
From the comments it seems we cannot use it here, because we need to 
erase nodes here.

Comments:
 * Note, however, that it cannot handle other modifications that 
re-order the
 * rbtree it is iterating over. This includes calling rb_erase() on 
@pos, as

 * rb_erase() may rebalance the tree, causing us to miss some nodes.
 */


Yeah, but your current implementation has the same problem :)

You need to use something like "while (node = rb_first(...))" instead 
if you want to destruct the whole tree.


Regards,
Christian.



Thanks,
David Zhou


Regards,
Christian.


+    wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+    node = rb_next(node);
I already get the next node before erasing the current one, so the "for 
(...)" loop is equivalent to "while (...)".


That still doesn't work. The problem is that, because of an erase, the 
next node might skip some nodes when the tree is rebalanced.


What you need to do is to either not erase the nodes at all (because we 
re-initialize the tree anyway) or always use rb_first().
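Concretely, the rb_first() based teardown Christian suggests would look something like this sketch (a hypothetical fini helper against the wait_pt structure above, not the actual v4 code):

#include <linux/dma-fence.h>
#include <linux/rbtree.h>

/* Always re-read the leftmost node, so a rebalance caused by
 * rb_erase() cannot make the iteration skip nodes. */
static void sketch_wait_pt_tree_fini(struct rb_root *wait_pt_tree)
{
	struct rb_node *node;

	while ((node = rb_first(wait_pt_tree))) {
		struct drm_syncobj_wait_pt *wait_pt =
			rb_entry(node, struct drm_syncobj_wait_pt, node);

		rb_erase(&wait_pt->node, wait_pt_tree);
		RB_CLEAR_NODE(&wait_pt->node);
		/* the final reference frees the wait point */
		dma_fence_put(&wait_pt->base.base);
	}
}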


Regards,
Christian.



Thanks,
David Zhou


+    rb_erase(&wait_pt->node,
+         &syncobj_timeline->wait_pt_tree);
+    RB_CLEAR_NODE(&wait_pt->node);
+    spin_unlock(&syncobj->lock);
+    dma_fence_wait(&wait_pt->base.base, true);
+    spin_lock(&syncobj->lock);
+    /* kfree(wait_pt) is executed by fence put */
+    dma_fence_put(&wait_pt->base.base);
+    }
+    list_for_each_entry_safe(signal_pt, tmp,
+                 &syncobj_timeline->signal_pt_list, list) {
+    list_del(&signal_pt->list);
+    if (signal_pt->signal_fence) {
+ 

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-08-30 Thread zhoucm1



On 2018年08月30日 15:25, Christian König wrote:

Am 30.08.2018 um 05:50 schrieb zhoucm1:



On 2018年08月29日 19:42, Christian König wrote:

Am 29.08.2018 um 12:44 schrieb Chunming Zhou:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an 
integer payload
identifying a point in a timeline. Such timeline semaphores support 
the

following operations:
    * CPU query - A host operation that allows querying the payload 
of the

  timeline semaphore.
    * CPU wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
    * Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
    * Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, the earlier time point (PT) is always signaled 
before the later PT.

a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation 
fence; when the PT[N] fence is signaled, the timeline increases to the 
value of PT[N].
b. wait PT design:
A wait PT fence is signaled when the timeline reaches its point value. 
While the timeline is increasing, the wait PT values are compared with 
the new timeline value; if a PT value is lower than the timeline value, 
that wait PT is signaled, otherwise it stays in the list. A semaphore 
wait operation can wait on any point of the timeline, so an RB tree is 
needed to order them. And a wait PT can be ahead of a signal PT, so we 
need a submission fence to handle that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate 
patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate 
patch.
5. drop the submission_fence implementation and instead use 
wait_event() for that. (Christian)

6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and 
Christian)
 a. normal syncobj signal op will create a signal PT to tail of 
signal pt list.
 b. normal syncobj wait op will create a wait pt with last 
signal point, and this wait PT is only signaled by related signal 
point PT.

2. some bug fix and clean up
3. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

TODO:
1. CPU query and wait on timeline semaphore.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c  | 593 
-

  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  78 +--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 505 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index ab43559398d0..f701d9ef1b81 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,50 @@
  #include "drm_internal.h"
  #include 
  +/* merge normal syncobj to timeline syncobj, the point interval 
is 1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
+struct drm_syncobj_stub_fence {
+    struct dma_fence base;
+    spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct 
dma_fence *fence)

+{
+    return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct 
dma_fence *fence)

+{
+    return !dma_fence_is_signaled(fence);
+}
+static void drm_syncobj_stub_fence_release(struct dma_fence *f)
+{
+    kfree(f);
+}
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+    .get_driver_name = drm_syncobj_stub_fence_get_name,
+    .get_timeline_name = drm_syncobj_stub_fence_get_name,
+    .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+    .release = drm_syncobj_stub_fence_release,
+};


Do we really need to move that around? Could reduce the size of the 
patch quite a bit if we don't.


The stub fence is used widely in both the normal and the timeline 
syncobj; if you think it increases the patch size, I can do a separate 
patch for that.





+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for

They are used to record wait op points and signal op points.
For a timeline syncobj they are connected by the timeline value and work like this:
    a. A signal pt increases the timeline value to the signal_pt value 
when the signal_pt is signaled. The signal_pt depends on the previous 
pt fence and on its own signal fence from the signal op.

    b. The wait pt tree checks whether the timeline value has passed its own value.

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-08-30 Thread Christian König

Am 30.08.2018 um 05:50 schrieb zhoucm1:



On 2018年08月29日 19:42, Christian König wrote:

Am 29.08.2018 um 12:44 schrieb Chunming Zhou:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an 
integer payload

identifying a point in a timeline. Such timeline semaphores support the
following operations:
    * CPU query - A host operation that allows querying the payload 
of the

  timeline semaphore.
    * CPU wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
    * Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
    * Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, the earlier time point (PT) is always signaled 
before the later PT.

a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation 
fence; when the PT[N] fence is signaled, the timeline increases to the 
value of PT[N].
b. wait PT design:
A wait PT fence is signaled when the timeline reaches its point value. 
While the timeline is increasing, the wait PT values are compared with 
the new timeline value; if a PT value is lower than the timeline value, 
that wait PT is signaled, otherwise it stays in the list. A semaphore 
wait operation can wait on any point of the timeline, so an RB tree is 
needed to order them. And a wait PT can be ahead of a signal PT, so we 
need a submission fence to handle that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate 
patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate 
patch.
5. drop the submission_fence implementation and instead use 
wait_event() for that. (Christian)

6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and 
Christian)
 a. normal syncobj signal op will create a signal PT to tail of 
signal pt list.
 b. normal syncobj wait op will create a wait pt with last 
signal point, and this wait PT is only signaled by related signal 
point PT.

2. some bug fix and clean up
3. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

TODO:
1. CPU query and wait on timeline semaphore.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c  | 593 
-

  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  78 +--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 505 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index ab43559398d0..f701d9ef1b81 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,50 @@
  #include "drm_internal.h"
  #include 
  +/* merge normal syncobj to timeline syncobj, the point interval 
is 1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
+struct drm_syncobj_stub_fence {
+    struct dma_fence base;
+    spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence 
*fence)

+{
+    return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct 
dma_fence *fence)

+{
+    return !dma_fence_is_signaled(fence);
+}
+static void drm_syncobj_stub_fence_release(struct dma_fence *f)
+{
+    kfree(f);
+}
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+    .get_driver_name = drm_syncobj_stub_fence_get_name,
+    .get_timeline_name = drm_syncobj_stub_fence_get_name,
+    .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+    .release = drm_syncobj_stub_fence_release,
+};


Do we really need to move that around? Could reduce the size of the 
patch quite a bit if we don't.


The stub fence is used widely in both the normal and the timeline 
syncobj; if you think it increases the patch size, I can do a separate 
patch for that.





+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for

They are used to record wait op points and signal op points.
For a timeline syncobj they are connected by the timeline value and work like this:
    a. A signal pt increases the timeline value to the signal_pt value 
when the signal_pt is signaled. The signal_pt depends on the previous 
pt fence and on its own signal fence from the signal op.

    b. The wait pt tree checks whether the timeline value has passed its own value.

For normal, works 

[PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-08-30 Thread Chunming Zhou
VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer payload
identifying a point in a timeline. Such timeline semaphores support the
following operations:
   * CPU query - A host operation that allows querying the payload of the
 timeline semaphore.
   * CPU wait - A host operation that allows a blocking wait for a
 timeline semaphore to reach a specified value.
   * Device wait - A device operation that allows waiting for a
 timeline semaphore to reach a specified value.
   * Device signal - A device operation that allows advancing the
 timeline semaphore to a specified value.
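For context, the four operations above map to the extension's host-side entry points roughly as follows (using the names under which the extension was eventually published; the entry points may need to be loaded through vkGetDeviceProcAddr, and device wait/signal go through the submit info instead, so they are not shown). This is only an illustration of the semantics, not part of the patch.

#include <vulkan/vulkan.h>

/* 'dev' is a VkDevice with VK_KHR_timeline_semaphore enabled and
 * 'sem' a semaphore created with VK_SEMAPHORE_TYPE_TIMELINE_KHR. */
static void timeline_host_ops(VkDevice dev, VkSemaphore sem)
{
	uint64_t value = 0;

	/* CPU query of the current payload */
	vkGetSemaphoreCounterValueKHR(dev, sem, &value);

	/* CPU wait until the payload reaches 42 */
	const uint64_t wait_value = 42;
	VkSemaphoreWaitInfoKHR wait_info = {
		.sType = VK_STRUCTURE_TYPE_SEMAPHORE_WAIT_INFO_KHR,
		.semaphoreCount = 1,
		.pSemaphores = &sem,
		.pValues = &wait_value,
	};
	vkWaitSemaphoresKHR(dev, &wait_info, UINT64_MAX);

	/* host-side signal, advancing the payload to 43 */
	VkSemaphoreSignalInfoKHR signal_info = {
		.sType = VK_STRUCTURE_TYPE_SEMAPHORE_SIGNAL_INFO_KHR,
		.semaphore = sem,
		.value = 43,
	};
	vkSignalSemaphoreKHR(dev, &signal_info);
}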

Since it's a timeline, the earlier time point (PT) is always signaled 
before the later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation 
fence; when the PT[N] fence is signaled, the timeline increases to the 
value of PT[N].
b. wait PT design:
A wait PT fence is signaled when the timeline reaches its point value. 
While the timeline is increasing, the wait PT values are compared with 
the new timeline value; if a PT value is lower than the timeline value, 
that wait PT is signaled, otherwise it stays in the list. A semaphore 
wait operation can wait on any point of the timeline, so an RB tree is 
needed to order them. And a wait PT can be ahead of a signal PT, so we 
need a submission fence to handle that.
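A rough sketch of that dependency: PT[N] is treated as done only after both the PT[N-1] fence and the signal operation fence have signaled. The structure and helper names below are invented for illustration (the patch uses the two dma_fence_cb members of drm_syncobj_signal_pt for the same purpose), and the embedded fence is assumed to have been initialized with dma_fence_init() elsewhere.

#include <linux/atomic.h>
#include <linux/dma-fence.h>
#include <linux/kernel.h>

struct sketch_signal_pt {
	struct dma_fence base;		/* signaled when both deps are done */
	struct dma_fence_cb prev_cb;	/* callback on the PT[N-1] fence */
	struct dma_fence_cb signal_cb;	/* callback on the signal op fence */
	atomic_t pending;		/* remaining dependencies */
};

static void sketch_dependency_done(struct sketch_signal_pt *pt)
{
	/* the real code would also bump the timeline to this PT's value */
	if (atomic_dec_and_test(&pt->pending))
		dma_fence_signal(&pt->base);
}

static void sketch_prev_cb(struct dma_fence *f, struct dma_fence_cb *cb)
{
	sketch_dependency_done(container_of(cb, struct sketch_signal_pt, prev_cb));
}

static void sketch_signal_cb(struct dma_fence *f, struct dma_fence_cb *cb)
{
	sketch_dependency_done(container_of(cb, struct sketch_signal_pt, signal_cb));
}

/* Arm PT[N]: it waits for the previous point and for the signal op. */
static void sketch_arm_signal_pt(struct sketch_signal_pt *pt,
				 struct dma_fence *prev,
				 struct dma_fence *signal_op)
{
	atomic_set(&pt->pending, 2);
	if (dma_fence_add_callback(prev, &pt->prev_cb, sketch_prev_cb))
		sketch_dependency_done(pt);	/* already signaled */
	if (dma_fence_add_callback(signal_op, &pt->signal_cb, sketch_signal_cb))
		sketch_dependency_done(pt);	/* already signaled */
}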

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. 
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for 
that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and Christian)
a. normal syncobj signal op will create a signal PT to tail of signal pt 
list.
b. normal syncobj wait op will create a wait pt with last signal point, and 
this wait PT is only signaled by related signal point PT.
2. many bug fix and clean up
3. stub fence moving is moved to other patch.
4. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

TODO:
1. CPU query and wait on timeline semaphore.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/drm_syncobj.c  | 570 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
 include/drm/drm_syncobj.h  |  78 +--
 include/uapi/drm/drm.h |   1 +
 4 files changed, 498 insertions(+), 155 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index e9ce623d049e..a6cfec48dbbe 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,9 @@
 #include "drm_internal.h"
 #include 
 
+/* merge normal syncobj to timeline syncobj, the point interval is 1 */
+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
 struct drm_syncobj_stub_fence {
struct dma_fence base;
spinlock_t lock;
@@ -82,6 +85,21 @@ static const struct dma_fence_ops drm_syncobj_stub_fence_ops 
= {
.release = drm_syncobj_stub_fence_release,
 };
 
+struct drm_syncobj_wait_pt {
+   struct drm_syncobj_stub_fence base;
+   u64 value;
+   struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+   struct drm_syncobj_stub_fence base;
+   struct dma_fence *signal_fence;
+   struct dma_fence *pre_pt_base;
+   struct dma_fence_cb signal_cb;
+   struct dma_fence_cb pre_pt_cb;
+   struct drm_syncobj *syncobj;
+   u64 value;
+   struct list_head list;
+};
 
 /**
  * drm_syncobj_find - lookup and reference a sync object.
@@ -109,59 +127,256 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file 
*file_private,
 }
 EXPORT_SYMBOL(drm_syncobj_find);
 
-static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,
-   struct drm_syncobj_cb *cb,
-   drm_syncobj_func_t func)
+static void drm_syncobj_timeline_init(struct drm_syncobj *syncobj,
+ struct drm_syncobj_timeline 
*syncobj_timeline)
 {
-   cb->func = func;
-   list_add_tail(&cb->node, &syncobj->cb_list);
+   unsigned long flags;
+
+   spin_lock_irqsave(&syncobj->lock, flags);
+   syncobj_timeline->timeline_context = dma_fence_context_alloc(1);
+   syncobj_timeline->timeline = 0;
+   syncobj_timeline->signal_point = 0;
+   init_waitqueue_head(&syncobj_timeline->wq);
+
+   syncobj_timeline->wait_pt_tree = RB_ROOT;
+   INIT_LIST_HEAD(&syncobj_timeline->signal_pt_list);
+   spin_unlock_irqrestore(&syncobj->lock, flags);
 }
 
-static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,
-struct dma_fence **fence,
-

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-08-29 Thread zhoucm1



On 2018年08月29日 19:42, Christian König wrote:

Am 29.08.2018 um 12:44 schrieb Chunming Zhou:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer 
payload

identifying a point in a timeline. Such timeline semaphores support the
following operations:
    * CPU query - A host operation that allows querying the payload 
of the

  timeline semaphore.
    * CPU wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
    * Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
    * Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, the earlier time point (PT) is always signaled 
before the later PT.

a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation 
fence; when the PT[N] fence is signaled, the timeline increases to the 
value of PT[N].
b. wait PT design:
A wait PT fence is signaled when the timeline reaches its point value. 
While the timeline is increasing, the wait PT values are compared with 
the new timeline value; if a PT value is lower than the timeline value, 
that wait PT is signaled, otherwise it stays in the list. A semaphore 
wait operation can wait on any point of the timeline, so an RB tree is 
needed to order them. And a wait PT can be ahead of a signal PT, so we 
need a submission fence to handle that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate 
patch. (Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate 
patch.
5. drop the submission_fence implementation and instead use 
wait_event() for that. (Christian)

6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and 
Christian)
 a. normal syncobj signal op will create a signal PT to tail of 
signal pt list.
 b. normal syncobj wait op will create a wait pt with last signal 
point, and this wait PT is only signaled by related signal point PT.

2. some bug fix and clean up
3. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

TODO:
1. CPU query and wait on timeline semaphore.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c  | 593 -
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  78 +--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 505 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c 
b/drivers/gpu/drm/drm_syncobj.c

index ab43559398d0..f701d9ef1b81 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,50 @@
  #include "drm_internal.h"
  #include 
  +/* merge normal syncobj to timeline syncobj, the point interval is 
1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
+struct drm_syncobj_stub_fence {
+    struct dma_fence base;
+    spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence 
*fence)

+{
+    return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence 
*fence)

+{
+    return !dma_fence_is_signaled(fence);
+}
+static void drm_syncobj_stub_fence_release(struct dma_fence *f)
+{
+    kfree(f);
+}
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+    .get_driver_name = drm_syncobj_stub_fence_get_name,
+    .get_timeline_name = drm_syncobj_stub_fence_get_name,
+    .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+    .release = drm_syncobj_stub_fence_release,
+};


Do we really need to move that around? Could reduce the size of the 
patch quite a bit if we don't.


The stub fence is used widely in both the normal and the timeline 
syncobj; if you think it increases the patch size, I can do a separate 
patch for that.





+
+struct drm_syncobj_wait_pt {
+    struct drm_syncobj_stub_fence base;
+    u64    value;
+    struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+    struct drm_syncobj_stub_fence base;
+    struct dma_fence *signal_fence;
+    struct dma_fence *pre_pt_base;
+    struct dma_fence_cb signal_cb;
+    struct dma_fence_cb pre_pt_cb;
+    struct drm_syncobj *syncobj;
+    u64    value;
+    struct list_head list;
+};


What are those two structures good for

They are used to record wait op points and signal op points.
For a timeline syncobj they are connected by the timeline value and work like this:
    a. A signal pt increases the timeline value to the signal_pt value 
when the signal_pt is signaled. The signal_pt depends on the previous 
pt fence and on its own signal fence from the signal op.

    b. The wait pt tree checks whether the timeline value has passed its own value.

For a normal syncobj it works like this:
    a. A signal pt increases syncobj_timeline->signal_point by 1 every 
time a signal op is performed.

Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-08-29 Thread Christian König

Am 29.08.2018 um 12:44 schrieb Chunming Zhou:

VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer payload
identifying a point in a timeline. Such timeline semaphores support the
following operations:
* CPU query - A host operation that allows querying the payload of the
  timeline semaphore.
* CPU wait - A host operation that allows a blocking wait for a
  timeline semaphore to reach a specified value.
* Device wait - A device operation that allows waiting for a
  timeline semaphore to reach a specified value.
* Device signal - A device operation that allows advancing the
  timeline semaphore to a specified value.

Since it's a timeline, the earlier time point (PT) is always signaled 
before the later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation 
fence; when the PT[N] fence is signaled, the timeline increases to the 
value of PT[N].
b. wait PT design:
A wait PT fence is signaled when the timeline reaches its point value. 
While the timeline is increasing, the wait PT values are compared with 
the new timeline value; if a PT value is lower than the timeline value, 
that wait PT is signaled, otherwise it stays in the list. A semaphore 
wait operation can wait on any point of the timeline, so an RB tree is 
needed to order them. And a wait PT can be ahead of a signal PT, so we 
need a submission fence to handle that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. 
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for 
that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and Christian)
 a. normal syncobj signal op will create a signal PT to tail of signal pt 
list.
 b. normal syncobj wait op will create a wait pt with last signal point, 
and this wait PT is only signaled by related signal point PT.
2. some bug fix and clean up
3. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

TODO:
1. CPU query and wait on timeline semaphore.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
  drivers/gpu/drm/drm_syncobj.c  | 593 -
  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
  include/drm/drm_syncobj.h  |  78 +--
  include/uapi/drm/drm.h |   1 +
  4 files changed, 505 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index ab43559398d0..f701d9ef1b81 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,50 @@
  #include "drm_internal.h"
  #include 
  
+/* merge normal syncobj to timeline syncobj, the point interval is 1 */

+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
+struct drm_syncobj_stub_fence {
+   struct dma_fence base;
+   spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)
+{
+return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence)
+{
+return !dma_fence_is_signaled(fence);
+}
+static void drm_syncobj_stub_fence_release(struct dma_fence *f)
+{
+   kfree(f);
+}
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+   .get_driver_name = drm_syncobj_stub_fence_get_name,
+   .get_timeline_name = drm_syncobj_stub_fence_get_name,
+   .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+   .release = drm_syncobj_stub_fence_release,
+};


Do we really need to move that around? Could reduce the size of the 
patch quite a bit if we don't.



+
+struct drm_syncobj_wait_pt {
+   struct drm_syncobj_stub_fence base;
+   u64 value;
+   struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+   struct drm_syncobj_stub_fence base;
+   struct dma_fence *signal_fence;
+   struct dma_fence *pre_pt_base;
+   struct dma_fence_cb signal_cb;
+   struct dma_fence_cb pre_pt_cb;
+   struct drm_syncobj *syncobj;
+   u64 value;
+   struct list_head list;
+};


What are those two structures good for and why is the stub fence their base?


+
  /**
   * drm_syncobj_find - lookup and reference a sync object.
   * @file_private: drm file private pointer
@@ -82,59 +126,247 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file 
*file_private,
  }
  EXPORT_SYMBOL(drm_syncobj_find);
  
-static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,

-   struct drm_syncobj_cb *cb,
-   drm_syncobj_func_t func)
+static void drm_syncobj_timeline_init(struct drm_syncobj_timeline
+  

[PATCH 5/5] [RFC]drm: add syncobj timeline support v3

2018-08-29 Thread Chunming Zhou
VK_KHR_timeline_semaphore:
This extension introduces a new type of semaphore that has an integer payload
identifying a point in a timeline. Such timeline semaphores support the
following operations:
   * CPU query - A host operation that allows querying the payload of the
 timeline semaphore.
   * CPU wait - A host operation that allows a blocking wait for a
 timeline semaphore to reach a specified value.
   * Device wait - A device operation that allows waiting for a
 timeline semaphore to reach a specified value.
   * Device signal - A device operation that allows advancing the
 timeline semaphore to a specified value.

Since it's a timeline, the earlier time point (PT) is always signaled 
before the later PT.
a. signal PT design:
Signal PT fence N depends on the PT[N-1] fence and the signal operation 
fence; when the PT[N] fence is signaled, the timeline increases to the 
value of PT[N].
b. wait PT design:
A wait PT fence is signaled when the timeline reaches its point value. 
While the timeline is increasing, the wait PT values are compared with 
the new timeline value; if a PT value is lower than the timeline value, 
that wait PT is signaled, otherwise it stays in the list. A semaphore 
wait operation can wait on any point of the timeline, so an RB tree is 
needed to order them. And a wait PT can be ahead of a signal PT, so we 
need a submission fence to handle that.

v2:
1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian)
2. move unexposed definitions to .c file. (Daniel Vetter)
3. split up the change to drm_syncobj_find_fence() in a separate patch. 
(Christian)
4. split up the change to drm_syncobj_replace_fence() in a separate patch.
5. drop the submission_fence implementation and instead use wait_event() for 
that. (Christian)
6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter)

v3:
1. replace normal syncobj with timeline implementation. (Vetter and Christian)
a. normal syncobj signal op will create a signal PT to tail of signal pt 
list.
b. normal syncobj wait op will create a wait pt with last signal point, and 
this wait PT is only signaled by related signal point PT.
2. some bug fix and clean up
3. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj

TODO:
1. CPU query and wait on timeline semaphore.

Signed-off-by: Chunming Zhou 
Cc: Christian Konig 
Cc: Dave Airlie 
Cc: Daniel Rakos 
Cc: Daniel Vetter 
---
 drivers/gpu/drm/drm_syncobj.c  | 593 -
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   4 +-
 include/drm/drm_syncobj.h  |  78 +--
 include/uapi/drm/drm.h |   1 +
 4 files changed, 505 insertions(+), 171 deletions(-)

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index ab43559398d0..f701d9ef1b81 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -56,6 +56,50 @@
 #include "drm_internal.h"
 #include 
 
+/* merge normal syncobj to timeline syncobj, the point interval is 1 */
+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
+struct drm_syncobj_stub_fence {
+   struct dma_fence base;
+   spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)
+{
+return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence)
+{
+return !dma_fence_is_signaled(fence);
+}
+static void drm_syncobj_stub_fence_release(struct dma_fence *f)
+{
+   kfree(f);
+}
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+   .get_driver_name = drm_syncobj_stub_fence_get_name,
+   .get_timeline_name = drm_syncobj_stub_fence_get_name,
+   .enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+   .release = drm_syncobj_stub_fence_release,
+};
+
+struct drm_syncobj_wait_pt {
+   struct drm_syncobj_stub_fence base;
+   u64 value;
+   struct rb_node   node;
+};
+struct drm_syncobj_signal_pt {
+   struct drm_syncobj_stub_fence base;
+   struct dma_fence *signal_fence;
+   struct dma_fence *pre_pt_base;
+   struct dma_fence_cb signal_cb;
+   struct dma_fence_cb pre_pt_cb;
+   struct drm_syncobj *syncobj;
+   u64 value;
+   struct list_head list;
+};
+
 /**
  * drm_syncobj_find - lookup and reference a sync object.
  * @file_private: drm file private pointer
@@ -82,59 +126,247 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file 
*file_private,
 }
 EXPORT_SYMBOL(drm_syncobj_find);
 
-static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,
-   struct drm_syncobj_cb *cb,
-   drm_syncobj_func_t func)
+static void drm_syncobj_timeline_init(struct drm_syncobj_timeline
+ *syncobj_timeline)
 {
-   cb->func = func;
-   list_add_tail(&cb->node, &syncobj->cb_list);
+   syncobj_timeline->timeline_context = dma_fence_context_alloc(1);
+   syncobj_timeline->timeline = 0;
+   syncobj_timeline->signal_point =