Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
On 05.09.2018 at 05:36, zhoucm1 wrote:
On 04.09.2018 at 17:20, Christian König wrote:
On 04.09.2018 at 11:00, zhoucm1 wrote:
On 04.09.2018 at 16:42, Christian König wrote:
On 04.09.2018 at 10:27, zhoucm1 wrote:
On 04.09.2018 at 16:05, Christian König wrote:
On 04.09.2018 at 09:53, zhoucm1 wrote:
[SNIP]

How about this idea:
1. Each signaling point is a fence implementation with an rb node.
2. Each node keeps a reference to the last previously inserted node.
3. Each node is referenced by the sync object itself.
4. Before each signal/wait operation we do a garbage collection and remove the first node from the tree as long as it is signaled.
5. When enable_signaling is requested for a node we cascade that to the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences.
6. A wait just looks into the tree for the signal point lower or equal to the requested sequence number.

After rethinking your idea, I think it doesn't work, since there is no timeline value as a line: signal point values don't have to be contiguous and can jump with each signal operation, e.g. 1, 4, 8, 15, 19. Say there are five signal points, signal_pt1->signal_pt4->signal_pt8->signal_pt15->signal_pt19. If a wait point is 7, do you mean this wait only needs signal_pt1 and signal_pt4? That's certainly not right; we need to make sure the timeline value is bigger than the wait point value, which means signal_pt8 is needed for wait_pt7.

That can be defined as we like it, e.g. when a wait operation asks for 7 we can return 8 as well.

If we define it that way, the problem comes back: if 8 has been removed by garbage collection, will you return 15?

The garbage collection is only done for signaled nodes. So when 8 is already garbage collected and 7 is asked, we know that we don't need to return anything.

8 is a signaled node. If waitA or a signal operation does garbage collection, how does waitB(7) know the garbage-collection history?

Well, we of course keep track of what the last garbage-collected number is, don't we?
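The wait-lookup and garbage-collection semantics discussed above (a wait for 7 resolving to signal point 8, and a wait below the last garbage-collected number being treated as already signaled) can be sketched as a minimal user-space model. This is not kernel code; a sorted array stands in for the rb-tree and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model of the proposed scheme: an ordered set of signal
 * points (a sorted array standing in for the rb-tree). Illustrative
 * names only, not the actual kernel implementation. */
struct timeline_model {
	uint64_t points[16];     /* pending (not yet collected) signal points */
	bool signaled[16];       /* whether each point's fence has signaled */
	size_t count;
	uint64_t last_collected; /* highest point removed by garbage collection */
};

/* Garbage collection: drop signaled points from the front of the tree
 * and remember the last one collected. */
static void gc(struct timeline_model *t)
{
	size_t i = 0;

	while (i < t->count && t->signaled[i]) {
		t->last_collected = t->points[i];
		i++;
	}
	for (size_t j = i; j < t->count; j++) {
		t->points[j - i] = t->points[j];
		t->signaled[j - i] = t->signaled[j];
	}
	t->count -= i;
}

/* Wait lookup: a wait for value v is already satisfied if some point
 * >= v was garbage collected; otherwise it attaches to the smallest
 * pending signal point >= v (e.g. a wait for 7 returns point 8).
 * Returns 0 when already signaled, UINT64_MAX when no such point. */
static uint64_t wait_lookup(struct timeline_model *t, uint64_t v)
{
	gc(t);
	if (t->last_collected >= v)
		return 0; /* already signaled, nothing to wait for */
	for (size_t i = 0; i < t->count; i++)
		if (t->points[i] >= v)
			return t->points[i];
	return UINT64_MAX; /* no covering point submitted yet */
}
```

With the points 1, 4, 8, 15, 19 from the example, a wait for 7 resolves to 8; once 1, 4 and 8 have signaled and been collected, a wait for 7 is satisfied immediately and a wait for 9 resolves to 15.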
Since there is no timeline as a line, I think this is not the right direction.

That is actually intended. There is no infinite timeline here, just a window of the last not yet signaled fences.

No one said it's an infinite timeline; the timeline stops increasing when the syncobj is released.

Yeah, but the syncobj can live for a very, very long time. Not having some form of shrinking it when fences are signaled is certainly not going to fly very far.

I will try to fix this problem. By the way, when I tried your suggestion I found it will be difficult to implement drm_syncobj_array_wait_timeout with your idea, since it needs first_signaled. If there is an un-signaled syncobj, we still register a callback on the timeline value change, and then we are back to needing enable_signaling.

I don't fully understand the problem here. When we need to wait for a fence it is obvious that we need to enable signaling. So what exactly is the issue?

BTW: I'm going to commit patches #1-#4 to drm-misc-next now if nobody objects.

Christian.

Thanks,
David Zhou

Regards,
Christian.

Anyway, kref is a good way to solve the 'free' problem. I will try to use it to improve my patch, and will of course refer to your idea. :)

Thanks,
David Zhou

Otherwise you will never be able to release nodes from the tree, since you always need to keep them around just in case somebody asks for a lower number.

Regards,
Christian.

The key is that as soon as a signal point is added, adding a previous point is no longer allowed.

That's the intention.

Regards,
David Zhou

7. When the sync object is released we use rbtree_postorder_for_each_entry_safe() and drop the extra reference to each node, but never call rb_erase! This way the rb_tree stays in memory, but without a root (i.e. the sync object). It only destructs itself when the looked-up references to the nodes are dropped.

And here, who will destroy the rb node, since no one does enable_signaling and there is no callback for them to free themselves?
The node will be destroyed when the last reference drops, not when enable_signaling is called. In other words, the sync_obj keeps the references to each tree object to provide the wait operation; as soon as the sync_obj is destroyed we don't need that functionality any more. We don't even need to wait for anything to be signaled: this way we can drop all unused signal points as soon as the sync_obj is destroyed. Only the used ones stay alive and provide the signal for each wait operation.

Regards,
Christian.

Regards,
David Zhou

Well, that is quite a bunch of logic, but I think that should work fine.

Yeah, it could work; a simple timeline reference can also solve the 'free' problem.

I think this approach is still quite a bit better: you don't run into circular dependency problems, it needs less memory, and each node always has the same size, which means we can use a kmem_cache for it.
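The teardown scheme in point 7 (drop the sync object's reference on every node without erasing it; a node frees itself only when its last reference goes away) can be modeled in user space with a plain refcount. This is a sketch only; the kernel side would use struct kref and rbtree_postorder_for_each_entry_safe(), and all names here are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* User-space model of point 7: on syncobj release we drop the
 * syncobj's reference on each node; a node is only destroyed when
 * its last reference drops, so nodes still held by issued fences
 * survive the syncobj itself. */
struct node_model {
	int refcount;    /* syncobj ref + any outstanding fence users */
	int *freed_flag; /* set to 1 when the node is destroyed */
};

static struct node_model *node_new(int extra_users, int *freed_flag)
{
	struct node_model *n = malloc(sizeof(*n));

	n->refcount = 1 + extra_users; /* 1 for the syncobj itself */
	n->freed_flag = freed_flag;
	*freed_flag = 0;
	return n;
}

static void node_put(struct node_model *n)
{
	if (--n->refcount == 0) {
		*n->freed_flag = 1; /* record destruction for the caller */
		free(n);
	}
}

/* "Release the syncobj": drop its reference on each node, never
 * erasing anything; unused nodes die immediately, used ones later. */
static void syncobj_release(struct node_model **nodes, int count)
{
	for (int i = 0; i < count; i++)
		node_put(nodes[i]);
}
```

An unused signal point is freed the moment the syncobj is released, while a point still referenced by a waiter lives on until that waiter drops its reference, which is exactly the behavior argued for above.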
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
On 04.09.2018 at 15:00, Christian König wrote:
On 04.09.2018 at 06:04, zhoucm1 wrote:
On 03.09.2018 at 19:19, Christian König wrote:
On 03.09.2018 at 12:07, Chunming Zhou wrote:
On 2018/9/3 16:50, Christian König wrote:
On 03.09.2018 at 06:13, Chunming Zhou wrote:
On 2018/8/30 19:32, Christian König wrote:
[SNIP]

+struct drm_syncobj_wait_pt {
+	struct drm_syncobj_stub_fence base;
+	u64 value;
+	struct rb_node node;
+};
+struct drm_syncobj_signal_pt {
+	struct drm_syncobj_stub_fence base;
+	struct dma_fence *signal_fence;
+	struct dma_fence *pre_pt_base;
+	struct dma_fence_cb signal_cb;
+	struct dma_fence_cb pre_pt_cb;
+	struct drm_syncobj *syncobj;
+	u64 value;
+	struct list_head list;
+};

What are those two structures good for?

They are used to record wait-op points and signal-op points.

For timeline mode, they are connected by the timeline value and work like this:
a. A signal point increases the timeline value to the signal_pt value when the signal_pt is signaled. A signal_pt depends on the previous point's fence and its own signal fence from the signal operation.
b. The wait point tree checks whether the timeline value is over its own value.

For normal mode, it works like this:
a. A signal point increases syncobj_timeline->signal_point by 1 every time a signal operation is performed.
b. When a wait operation comes in, the wait point fetches the last signal point value above as its wait point. A wait point will only be signaled by a signal_pt with an equal point value.

And why is the stub fence their base?

Good question. I tried to kzalloc them separately as well while debugging, but I ran into a problem: I could look up a wait_pt or signal_pt successfully, but when I tried to use them they were sometimes already freed, resulting in a NULL pointer dereference. Also, when we look them up we generally need their stub fence as well, and their lifecycles are the same. For these reasons I made the stub fence their base.

That sounds like you only did this because you messed up the lifecycle. Additionally, I don't think you correctly considered the lifecycle of the waits and the sync object itself. E.g.
blocking in drm_syncobj_timeline_fini() until all waits are done is not a good idea. What you should do instead is to create a fence_array object with all the fences we need to wait for when a wait point is requested.

Yeah, this was our initial discussion result, but when I tried to do that I found it cannot meet the advance-signal requirement:
a. We need to consider that the wait and signal point values are not a one-to-one match, so it's difficult to find the dependent point; at least there is some overhead.

As far as I can see, that is independent of using a fence array here. You can use either a ring buffer or an rb-tree, but when you want to wait for a specific point we need to condense the not yet signaled fences into an array.

Again, we need to find the range the specific point lies in. We should stay close to timeline semantics; I also referred to the sw_sync.c timeline, where a wait point is usually signaled by a timeline point. And I agree we can implement it with several methods, but I don't think there are fundamental differences.

The problem is that with your current approach you need the sync_obj alive for the synchronization to work. That is most likely not a good idea.

Indeed, I will fix that. How about only creating a fence array for every wait point when the syncobj is released? At release time every wait point must already have waited for the signal operation, so we can easily condense the fences for every wait point, and in the meantime we keep the advantage of timeline-based wait points.

That could work, but I don't see how you want to replace the already issued fence with a fence_array when the sync object is destroyed. Additionally, I would rather prefer consistent handling, i.e. without extra, rarely used code paths.

Ah, I found an easy way: we just need to make the syncobj_timeline structure reference-counted. This way the syncobj itself can be released first; wait_pt/signal_pt don't need the syncobj at all. Every wait_pt/signal_pt keeps a reference on the syncobj_timeline.
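The "condense the not yet signaled fences into an array when a wait point is requested" step above can be sketched as follows. This is a hedged user-space model: a wait for value v collects the unsignaled points up to and including the first point covering v; the kernel side would build a dma_fence_array from the corresponding dma_fences, and the function name here is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of condensing pending fences for a wait: walk the ordered
 * signal points, collect every unsignaled one, and stop at the first
 * point >= v (the point that actually satisfies the wait).
 * Returns the number of collected points written to out[]. */
static size_t condense_for_wait(const uint64_t *points,
				const bool *signaled, size_t n,
				uint64_t v, uint64_t *out)
{
	size_t m = 0;

	for (size_t i = 0; i < n; i++) {
		if (!signaled[i])
			out[m++] = points[i]; /* still needs to signal */
		if (points[i] >= v)
			break; /* first point covering the wait value */
	}
	return m;
}
```

For the running example with points 1, 4, 8, 15, 19 where only 1 has signaled, a wait for 7 condenses to the two pending points 4 and 8; points 15 and 19 are irrelevant to that wait, which is the memory-saving the fence-array approach aims for.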
I've thought about that as well, but came to the conclusion that you run into problems because of circle dependencies. E.g. sync_obj references sync_point and sync_point references sync_obj. sync_obj can be freed first, only sync point references syncobj_timeline structure, syncobj_timeline doesn't reference sync_pt, no circle dep. Additional to that it is quite a bit larger memory footprint because you need to keep the sync_obj around as well. all signaled sync_pt are freed immediately except syncobj_timeline sturcture, where does extra memory foootprint take? Additional to that you enable signaling without a need from the waiting side. That is rather bad for implementations which need that optimization. Do you mean increasing timeline based on signal fence is not better? only update timeline value when requested by a wait pt? Yes, exactly. This way, we will not update timeline value
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
On 04.09.2018 06:04, zhoucm1 wrote: [SNIP] This way, we will not update the timeline value immediately and cannot free the signal pt immediately, and we also need to consider CPU query and wait. That is actually the better coding style. We usually try to avoid doing things in interrupt handlers as much as possible. OK, I see your concern; how about delaying the handling to a workqueue? This way we only increase the timeline value and wake up the workqueue in the fence callback. Is that acceptable?
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
On 2018-09-03 19:19, Christian König wrote: [SNIP] How about this idea: 1. Each signaling point is a fence implementation with an rb node. 2. Each node keeps a reference to the last previously inserted node. 3. Each node is referenced by the sync object itself. 4. Before each signal/wait operation we do a garbage collection and remove the first node from the tree as long as it is signaled.
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
On 03.09.2018 12:07, Chunming Zhou wrote: [SNIP] 5. When enable_signaling is requested for a node we cascade that to the left using rb_prev. This ensures that signaling is enabled for the current fence as well as all previous fences. 6. A wait just looks into the tree for the signal point lower or equal to the requested sequence number. 7. When the sync object is released we use rbtree_postorder_for_each_entry_safe() and drop …
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
On 2018/9/3 16:50, Christian König wrote: [SNIP] Thanks, David Zhou. I suggest to either condense all the fences you need to wait for into an array during the wait operation, or reference count the sync_obj and only enable the signaling on the fences when requested by a wait. b. Because we allowed "wait-before-signal": if "wait-before-signal" happens, there isn't a signal fence which can be used to create the fence array. Well, again we DON'T support wait-before-signal here. I will certainly NAK any implementation which tries to do this until we have figured out all the resource management constraints, and I still don't see how we can do this. Yes, we don't support real wait-before-signal in the patch now, just a fake wait-before-signal which still waits on CS submission until the signal operation comes through wait_event, which is the conclusion we discussed before. Well, in this case we should call it wait-for-signal and not wait-before-signal :) So the timeline value is good for resolving that. Otherwise somebody can easily construct a situation where timeline sync obj A waits on B and B waits on A. The same situation can also happen with a fence-array; we can only see this as a programming bug with incorrect use.
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
On 03.09.2018 06:13, Chunming Zhou wrote: [SNIP] No, a fence-array is initialized only once with a static list of fences. This way it is impossible to add the fence-array to itself, for example; i.e. you can't build circular dependencies with that. We already wait for the signal operation event, so we never build circular dependencies with that; the theory is the same. Yeah, OK, that is certainly true. Regards, Christian. Better use rbtree_postorder_for_each_entry_safe() for this. From the comments, it seems we cannot use it here; we need to erase nodes here. Comments: * Note, however, that it cannot handle other modifications that re-order the * rbtree it is iterating over. This includes calling rb_erase() on @pos, as * rb_erase() may rebalance the tree, causing us to miss some nodes. */
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
On 2018/8/30 19:32, Christian König wrote: [SNIP] Yeah, but your current implementation has the same problem :) You need to use something like "while (node = rb_first(...))" instead if you want to destruct the whole tree. OK, got it, I will change it in v4. Thanks, David Zhou. Regards, Christian. + wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node); + node = rb_next(node); I already get the next node before erasing this node, so the "for (...)" statement is equivalent to "while (...)".
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
[SNIP] + +struct drm_syncobj_wait_pt { + struct drm_syncobj_stub_fence base; + u64 value; + struct rb_node node; +}; +struct drm_syncobj_signal_pt { + struct drm_syncobj_stub_fence base; + struct dma_fence *signal_fence; + struct dma_fence *pre_pt_base; + struct dma_fence_cb signal_cb; + struct dma_fence_cb pre_pt_cb; + struct drm_syncobj *syncobj; + u64 value; + struct list_head list; +}; What are those two structures good for They are used to record wait ops points and signal ops points. For timeline, they are connected by timeline value, works like: a. signal pt increase timeline value to signal_pt value when signal_pt is signaled. signal_pt is depending on previous pt fence and itself signal fence from signal ops. b. wait pt tree checks if timeline value over itself value. For normal, works like: a. signal pt increase 1 for syncobj_timeline->signal_point every time when signal ops is performed. b. when wait ops is comming, wait pt will fetch above last signal pt value as its wait point. wait pt will be only signaled by equal point value signal_pt. and why is the stub fence their base? Good question, I tried to kzalloc for them as well when I debug them, I encountered a problem: I lookup/find wait_pt or signal_pt successfully, but when I tried to use them, they are freed sometimes, and results in NULL point. and generally, when lookup them, we often need their stub fence as well, and in the meantime, their lifecycle are same. Above reason, I make stub fence as their base. That sounds like you only did this because you messed up the lifecycle. Additional to that I don't think you correctly considered the lifecycle of the waits and the sync object itself. E.g. blocking in drm_syncobj_timeline_fini() until all waits are done is not a good idea. What you should do instead is to create a fence_array object with all the fence we need to wait for when a wait point is requested. 
Yeah, this was our initial discussion result, but when I tried to do that I found it cannot meet the advance-signal requirement: a. We need to consider that the wait and signal pt values are not a one-to-one match, so it is difficult to find the dependent point; at least there is some overhead. As far as I can see that is independent of using a fence array here. See, you can either use a ring buffer or an rb-tree, but when you want to wait for a specific point we need to condense the not yet signaled fences into an array. b. Because we allow "wait-before-signal", if "wait-before-signal" happens there isn't a signal fence which can be used to create the fence array. Well, again, we DON'T support wait-before-signal here. I will certainly NAK any implementation which tries to do this until we have figured out all the resource management constraints, and I still don't see how we can do this. So the timeline value is good to resolve that. Otherwise somebody can easily construct a situation where timeline sync obj A waits on B and B waits on A. The same situation can also happen with a fence-array; we can only see this as a programming bug with incorrect use. No, a fence-array is initialized only once with a static list of fences. This way it is impossible to add the fence-array to itself, for example. E.g. you can't build circular dependencies with that. Regards, Christian.
+		wait_pt = rb_entry(node, struct drm_syncobj_wait_pt, node);
+		node = rb_next(node);
I already get the next node before erasing this node, so the "for (..." sentence is equal to "while (...)". That still doesn't work. The problem is that because of an erase the next node might skip some nodes when rebalancing. What you need to do is to either not erase the nodes at all (because we re-initialize the tree anyway) or always use rb_first(). Regards, Christian. Thanks, David Zhou
+		rb_erase(&wait_pt->node,
+			 &syncobj_timeline->wait_pt_tree);
+		RB_CLEAR_NODE(&wait_pt->node);
+		spin_unlock(&syncobj->lock);
+		dma_fence_wait(&wait_pt->base.base, true);
+		spin_lock(&syncobj->lock);
+		/* kfree(wait_pt) is executed by fence put */
+		dma_fence_put(&wait_pt->base.base);
+	}
+	list_for_each_entry_safe(signal_pt, tmp,
+				 &syncobj_timeline->signal_pt_list, list) {
+		list_del(&signal_pt->list);
+		if (signal_pt->signal_fence) {
+
Re: [PATCH 5/5] [RFC]drm: add syncobj timeline support v3
On 2018年08月30日 15:25, Christian König wrote: Am 30.08.2018 um 05:50 schrieb zhoucm1: On 2018年08月29日 19:42, Christian König wrote: Am 29.08.2018 um 12:44 schrieb Chunming Zhou: VK_KHR_timeline_semaphore: This extension introduces a new type of semaphore that has an integer payload identifying a point in a timeline. Such timeline semaphores support the following operations: * CPU query - A host operation that allows querying the payload of the timeline semaphore. * CPU wait - A host operation that allows a blocking wait for a timeline semaphore to reach a specified value. * Device wait - A device operation that allows waiting for a timeline semaphore to reach a specified value. * Device signal - A device operation that allows advancing the timeline semaphore to a specified value. Since it's a timeline, the front time point (PT) is always signaled before the later PT. a. signal PT design: Signal PT fence N depends on the PT[N-1] fence and the signal operation fence; when the PT[N] fence is signaled, the timeline will increase to the value of PT[N]. b. wait PT design: A wait PT fence is signaled by reaching the timeline point value; when the timeline increases, we compare the wait PT values with the new timeline value. If a PT value is lower than the timeline value, that wait PT is signaled; otherwise it stays in the list. A semaphore wait operation can wait on any point of the timeline, so we need an RB tree to order them. And a wait PT can be ahead of a signal PT, so we need a submission fence to handle that. v2: 1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian) 2. move unexposed definitions to .c file. (Daniel Vetter) 3. split up the change to drm_syncobj_find_fence() in a separate patch. (Christian) 4. split up the change to drm_syncobj_replace_fence() in a separate patch. 5. drop the submission_fence implementation and instead use wait_event() for that. (Christian) 6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter) v3: 1. replace normal syncobj with timeline implementation.
(Vetter and Christian) a. normal syncobj signal op will create a signal PT at the tail of the signal pt list. b. normal syncobj wait op will create a wait pt with the last signal point, and this wait PT is only signaled by the related signal point PT. 2. some bug fixes and cleanups 3. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj TODO: 1. CPU query and wait on timeline semaphore. Signed-off-by: Chunming Zhou Cc: Christian Konig Cc: Dave Airlie Cc: Daniel Rakos Cc: Daniel Vetter --- drivers/gpu/drm/drm_syncobj.c | 593 - drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 +- include/drm/drm_syncobj.h | 78 +-- include/uapi/drm/drm.h | 1 + 4 files changed, 505 insertions(+), 171 deletions(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index ab43559398d0..f701d9ef1b81 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -56,6 +56,50 @@ #include "drm_internal.h" #include
+/* merge normal syncobj to timeline syncobj, the point interval is 1 */
+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
+struct drm_syncobj_stub_fence {
+	struct dma_fence base;
+	spinlock_t lock;
+};
+
+static const char *drm_syncobj_stub_fence_get_name(struct dma_fence *fence)
+{
+	return "syncobjstub";
+}
+
+static bool drm_syncobj_stub_fence_enable_signaling(struct dma_fence *fence)
+{
+	return !dma_fence_is_signaled(fence);
+}
+static void drm_syncobj_stub_fence_release(struct dma_fence *f)
+{
+	kfree(f);
+}
+static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
+	.get_driver_name = drm_syncobj_stub_fence_get_name,
+	.get_timeline_name = drm_syncobj_stub_fence_get_name,
+	.enable_signaling = drm_syncobj_stub_fence_enable_signaling,
+	.release = drm_syncobj_stub_fence_release,
+};
Do we really need to move that around? Could reduce the size of the patch quite a bit if we don't. The stub fence is used widely in both normal and timeline syncobj; if you think it increases the patch size, I can do a separate patch for that.
[PATCH 5/5] [RFC]drm: add syncobj timeline support v3
VK_KHR_timeline_semaphore: This extension introduces a new type of semaphore that has an integer payload identifying a point in a timeline. Such timeline semaphores support the following operations: * CPU query - A host operation that allows querying the payload of the timeline semaphore. * CPU wait - A host operation that allows a blocking wait for a timeline semaphore to reach a specified value. * Device wait - A device operation that allows waiting for a timeline semaphore to reach a specified value. * Device signal - A device operation that allows advancing the timeline semaphore to a specified value. Since it's a timeline, the front time point (PT) is always signaled before the later PT. a. signal PT design: Signal PT fence N depends on the PT[N-1] fence and the signal operation fence; when the PT[N] fence is signaled, the timeline will increase to the value of PT[N]. b. wait PT design: A wait PT fence is signaled by reaching the timeline point value; when the timeline increases, we compare the wait PT values with the new timeline value. If a PT value is lower than the timeline value, that wait PT is signaled; otherwise it stays in the list. A semaphore wait operation can wait on any point of the timeline, so we need an RB tree to order them. And a wait PT can be ahead of a signal PT, so we need a submission fence to handle that. v2: 1. remove unused DRM_SYNCOBJ_CREATE_TYPE_NORMAL. (Christian) 2. move unexposed definitions to .c file. (Daniel Vetter) 3. split up the change to drm_syncobj_find_fence() in a separate patch. (Christian) 4. split up the change to drm_syncobj_replace_fence() in a separate patch. 5. drop the submission_fence implementation and instead use wait_event() for that. (Christian) 6. WARN_ON(point != 0) for NORMAL type syncobj case. (Daniel Vetter) v3: 1. replace normal syncobj with timeline implementation. (Vetter and Christian) a. normal syncobj signal op will create a signal PT at the tail of the signal pt list. b.
normal syncobj wait op will create a wait pt with the last signal point, and this wait PT is only signaled by the related signal point PT. 2. many bug fixes and cleanups 3. the stub fence move is split into a separate patch. 4. tested by ./deqp-vk -n dEQP-VK*semaphore* for normal syncobj TODO: 1. CPU query and wait on timeline semaphore. Signed-off-by: Chunming Zhou Cc: Christian Konig Cc: Dave Airlie Cc: Daniel Rakos Cc: Daniel Vetter --- drivers/gpu/drm/drm_syncobj.c | 570 + drivers/gpu/drm/i915/i915_gem_execbuffer.c | 4 +- include/drm/drm_syncobj.h | 78 +-- include/uapi/drm/drm.h | 1 + 4 files changed, 498 insertions(+), 155 deletions(-) diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index e9ce623d049e..a6cfec48dbbe 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -56,6 +56,9 @@ #include "drm_internal.h" #include
+/* merge normal syncobj to timeline syncobj, the point interval is 1 */
+#define DRM_SYNCOBJ_NORMAL_POINT 1
+
struct drm_syncobj_stub_fence {
	struct dma_fence base;
	spinlock_t lock;
@@ -82,6 +85,21 @@ static const struct dma_fence_ops drm_syncobj_stub_fence_ops = {
	.release = drm_syncobj_stub_fence_release,
};
+struct drm_syncobj_wait_pt {
+	struct drm_syncobj_stub_fence base;
+	u64 value;
+	struct rb_node node;
+};
+struct drm_syncobj_signal_pt {
+	struct drm_syncobj_stub_fence base;
+	struct dma_fence *signal_fence;
+	struct dma_fence *pre_pt_base;
+	struct dma_fence_cb signal_cb;
+	struct dma_fence_cb pre_pt_cb;
+	struct drm_syncobj *syncobj;
+	u64 value;
+	struct list_head list;
+};
/** * drm_syncobj_find - lookup and reference a sync object.
@@ -109,59 +127,256 @@ struct drm_syncobj *drm_syncobj_find(struct drm_file *file_private, } EXPORT_SYMBOL(drm_syncobj_find);
-static void drm_syncobj_add_callback_locked(struct drm_syncobj *syncobj,
-					    struct drm_syncobj_cb *cb,
-					    drm_syncobj_func_t func)
+static void drm_syncobj_timeline_init(struct drm_syncobj *syncobj,
+				      struct drm_syncobj_timeline *syncobj_timeline)
{
-	cb->func = func;
-	list_add_tail(&cb->node, &syncobj->cb_list);
+	unsigned long flags;
+
+	spin_lock_irqsave(&syncobj->lock, flags);
+	syncobj_timeline->timeline_context = dma_fence_context_alloc(1);
+	syncobj_timeline->timeline = 0;
+	syncobj_timeline->signal_point = 0;
+	init_waitqueue_head(&syncobj_timeline->wq);
+
+	syncobj_timeline->wait_pt_tree = RB_ROOT;
+	INIT_LIST_HEAD(&syncobj_timeline->signal_pt_list);
+	spin_unlock_irqrestore(&syncobj->lock, flags);
}
-static int drm_syncobj_fence_get_or_add_callback(struct drm_syncobj *syncobj,
-						 struct dma_fence **fence,
-