Re: [openstack-dev] [magnum] Magnum conductor async container operations
SURO wrote:
Josh,
Please find my reply inline.
Regards, SURO
irc//freenode: suro-patz

On 12/16/15 6:37 PM, Joshua Harlow wrote:

SURO wrote:
Hi all,
Please review and provide feedback on the following design proposal for implementing the blueprint[1] on async-container-operations -

1. Magnum-conductor would have a pool of threads for executing the container operations, viz. the executor_threadpool. The size of the executor_threadpool will be configurable. [Phase0]

2. Every time Magnum-conductor (Mcon) receives a container-operation request from Magnum-API (Mapi), it will do the initial validation and housekeeping and then pick a thread from the executor_threadpool to execute the rest of the operation. Thus Mcon will return from the RPC request context much faster, without blocking Mapi. If the executor_threadpool is exhausted, Mcon will execute the way it does today, i.e. synchronously - this will be the rate-limiting mechanism, relaying the feedback of exhaustion. [Phase0]
How often we hit this scenario may indicate to the operator that more workers should be created for Mcon.

3. Blocking class of operations - there will be a class of operations which cannot be made async, as they are supposed to return result/content inline, e.g. 'container-logs'. [Phase0]

4. Out-of-order considerations for the NonBlocking class of operations - there is a possible race condition around a create followed by a start/delete of a container, as things would happen in parallel. To solve this, we will maintain a map of container to executing thread for the current execution. If we find a request for an operation on a container-in-execution, we will block till the thread completes the execution. [Phase0]
This mechanism can be further refined to achieve more asynchronous behavior. [Phase2]
The approach above puts a prerequisite that operations for a given container on a given Bay would go to the same Magnum-conductor instance. [Phase0]

5. The hand-off between Mcon and a thread from the executor_threadpool can be reflected through new states on the 'container' object. These states can be helpful to recover/audit in case of an Mcon restart. [Phase1]

Other considerations -
1. Using eventlet.greenthread instead of real threads => This approach would require further refactoring of the execution code to embed yield logic; otherwise a single greenthread would block the others from progressing. Given that we will extend the mechanism to multiple COEs, and to keep the approach straightforward to begin with, we will use 'threading.Thread' instead of 'eventlet.greenthread'.

Also unsure about the above; not quite sure I see how greenthread usage requires more yield logic (I'm assuming you mean the yield statement here)?

Btw, if magnum is running with all things monkey patched (which it seems like https://github.com/openstack/magnum/blob/master/magnum/common/rpc_service.py#L33 does), then magnum's usage of 'threading.Thread' is an 'eventlet.greenthread' underneath the covers, just fyi.

SURO> Let's consider this -

    function A () {
        block B; // validation
        block C; // blocking op
    }

Now, if we run C in a greenthread as it is, would it not block the entire OS thread that runs through all the greenthreads? I assumed it would, and that's why we would have to incorporate finer-grained yields into C to leverage greenthreads. If the answer is no, then we can use greenthreads. I will validate which version of threading.Thread is getting used.

Unsure how to answer this one. If all things are monkey patched, then any time a blocking operation (i/o, lock acquisition...) is triggered, the internals of eventlet do a bunch of jumping around to switch to another green thread (http://eventlet.net/doc/hubs.html). Once u start partially using greenthreads and mixing in real threads, then you have to start trying to reason about yielding in certain places (and at that point you might as well go to py3.4+, since it has syntax made just for this kind of thinking).

Pointers for the thread monkey patching, btw:
https://github.com/eventlet/eventlet/blob/master/eventlet/patcher.py#L346
https://github.com/eventlet/eventlet/blob/master/eventlet/patcher.py#L212

Easy way to see this:

>>> import eventlet
>>> eventlet.monkey_patch()
>>> import thread
>>> thread.start_new_thread.__module__
'eventlet.green.thread'
>>> thread.allocate_lock.__module__
'eventlet.green.thread'

In that case, keeping the code on threading.Thread is portable, as it would work as desired even if we remove monkey patching, right?

Yes, use `threading.Thread` (if u can) so that maybe magnum could switch off monkey patching someday, although typically, unless u are already testing with it turned off in unit/functional tests, it wouldn't be an easy flip that will typically 'just work' (especially since afaik magnum is using some oslo libraries which only work under greenthreads/eventlet).

Refs:-
[1] - https://blueprints.launchpad.net/magnum/+spec/async-container-operations
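The yielding behavior discussed above is easy to demonstrate. A minimal, illustrative sketch (standalone, not Magnum code): under monkey patching, a blocking call such as sleep yields to other greenthreads, while a pure-CPU loop never hits a yield point and starves them until it finishes.

    import eventlet
    eventlet.monkey_patch()

    import threading  # now backed by eventlet.green.thread
    import time

    def io_bound(name):
        for _ in range(3):
            time.sleep(0.1)  # monkey-patched: yields to other greenthreads
            print('%s made progress' % name)

    def cpu_bound(name):
        total = 0
        for i in range(10 ** 7):  # no blocking call, so no yield point;
            total += i            # other greenthreads stall until this returns
        print('%s done: %d' % (name, total))

    threads = [threading.Thread(target=io_bound, args=('io-1',)),
               threading.Thread(target=cpu_bound, args=('cpu-1',)),
               threading.Thread(target=io_bound, args=('io-2',))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

The io-1/io-2 output interleaves, but nothing else prints while cpu-1 is running - which is the "single greenthread would block others" concern from the proposal.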
Re: [openstack-dev] [magnum] Magnum conductor async container operations
Suro,

FYI, we previously tried a distributed lock implementation for bay operations (here are the patches [1,2,3,4,5]). However, after several discussions online and offline, we decided to drop the blocking implementation for bay operations in favor of a non-blocking implementation (which is not implemented yet). You can find more discussion in [6,7].

For the async container operations, I would suggest considering a non-blocking approach first. If that is impossible and we need a blocking implementation, I suggest using the bay-operations patches below as a reference.

[1] https://review.openstack.org/#/c/171921/
[2] https://review.openstack.org/#/c/172603/
[3] https://review.openstack.org/#/c/172772/
[4] https://review.openstack.org/#/c/172773/
[5] https://review.openstack.org/#/c/172774/
[6] https://blueprints.launchpad.net/magnum/+spec/horizontal-scale
[7] https://etherpad.openstack.org/p/liberty-work-magnum-horizontal-scale

Best regards,
Hongbin

-----Original Message-----
From: Adrian Otto [mailto:adrian.o...@rackspace.com]
Sent: December-16-15 10:20 PM
To: OpenStack Development Mailing List (not for usage questions)
Cc: s...@yahoo-inc.com
Subject: Re: [openstack-dev] [magnum] Magnum conductor async container operations

> On Dec 16, 2015, at 6:24 PM, Joshua Harlow <harlo...@fastmail.com> wrote:
>
> SURO wrote:
>> Hi all,
>> Please review and provide feedback on the following design proposal for implementing the blueprint[1] on async-container-operations -
>>
>> 1. Magnum-conductor would have a pool of threads for executing the container operations, viz. the executor_threadpool. The size of the executor_threadpool will be configurable. [Phase0]
>> 2. Every time Magnum-conductor (Mcon) receives a container-operation request from Magnum-API (Mapi), it will do the initial validation and housekeeping and then pick a thread from the executor_threadpool to execute the rest of the operation. Thus Mcon will return from the RPC request context much faster, without blocking Mapi. If the executor_threadpool is exhausted, Mcon will execute the way it does today, i.e. synchronously - this will be the rate-limiting mechanism, relaying the feedback of exhaustion. [Phase0]
>> How often we hit this scenario may indicate to the operator that more workers should be created for Mcon.
>> 3. Blocking class of operations - there will be a class of operations which cannot be made async, as they are supposed to return result/content inline, e.g. 'container-logs'. [Phase0]
>> 4. Out-of-order considerations for the NonBlocking class of operations - there is a possible race condition around a create followed by a start/delete of a container, as things would happen in parallel. To solve this, we will maintain a map of container to executing thread for the current execution. If we find a request for an operation on a container-in-execution, we will block till the thread completes the execution. [Phase0]
>
> Does whatever performs these operations (mcon?) run in more than one process?

Yes, there may be multiple copies of magnum-conductor running on separate hosts.

> Can it be requested to create in one process and then delete in another? If so, is that map some distributed/cross-machine/cross-process map that will be inspected to see what else is manipulating a given container (so that the thread can block until that is not the case... basically the map is acting like an operation-lock?)

That's how I interpreted it as well. This is a race-prevention technique so that we don't attempt to act on a resource until it is ready. Another way to deal with this is to check the state of the resource and return a "not ready" error if it's not ready yet. If this happens in a part of the system that is unattended by a user, we can re-queue the call to retry after a minimum delay, so that it proceeds only when the ready state is reached in the resource, or is terminated after a maximum number of attempts, or if the resource enters an error state. This would allow other work to proceed while the retry waits in the queue.

> If it's just local to one process, then I have a library for u that can solve the problem of correctly ordering parallel operations ;)

What we are aiming for is a bit more distributed.

Adrian

>> This mechanism can be further refined to achieve more asynchronous behavior. [Phase2]
>> The approach above puts a prerequisite that operations for a given container on a given Bay would go to the same Magnum-conductor instance. [Phase0]
>> 5. The hand-off between Mcon and a thread from the executor_threadpool can be reflected through new states on the 'container' object. [Phase1]
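Adrian's re-queue alternative is straightforward to picture in code. A minimal sketch of the pattern, assuming a hypothetical operation object - resource_state(), requeue(), fail() and execute() are illustrative names, not Magnum APIs:

    MAX_ATTEMPTS = 5   # terminate after this many tries
    MIN_DELAY = 2.0    # seconds before a retry is re-queued

    def handle(op, attempt=1):
        state = op.resource_state()        # hypothetical state lookup
        if state == 'ERROR':
            op.fail('resource entered an error state')
        elif state != 'READY':
            if attempt >= MAX_ATTEMPTS:
                op.fail('gave up after %d attempts' % attempt)
            else:
                # Re-queue instead of blocking, so other work proceeds
                # while this retry waits in the queue.
                op.requeue(delay=MIN_DELAY, attempt=attempt + 1)
        else:
            op.execute()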
Re: [openstack-dev] [magnum] Magnum conductor async container operations
Hongbin,
Very useful pointers! Thanks for bringing up the relevant context!

Blocking for consecutive operations on the same container is the approach to start with. We can follow it with a wait-queue implementation - that way the cost is amortized over time. If you feel strongly, I am okay with implementing the wait queue on the first go itself. [I felt a step-by-step approach keeps the code size manageable and easier to review.]

By the way, I think the scope of the bay lock and the scope of a per-bay-per-container operation differ in terms of blocking, too.

I have one confusion about non-blocking bay operations for horizontal scale [1] - "Heat will be having concurrency support, so we can rely on heat for the concurrency issue for now and drop the baylock implementation." If a user issues two consecutive updates on a Bay, and the updates go through different magnum-conductors, they can arrive at Heat in a different order, resulting in a different final state of the bay. How Heat's concurrency support will prevent that is not clear to me. [Take the example of 'magnum bay-update k8sbay replace node_count=100' followed by 'magnum bay-update k8sbay replace node_count=10'.]

[1] - https://etherpad.openstack.org/p/liberty-work-magnum-horizontal-scale (Line 33)

Regards, SURO
irc//freenode: suro-patz

On 12/17/15 8:10 AM, Hongbin Lu wrote:

Suro,
FYI, we previously tried a distributed lock implementation for bay operations (here are the patches [1,2,3,4,5]). However, after several discussions online and offline, we decided to drop the blocking implementation for bay operations in favor of a non-blocking implementation (which is not implemented yet). You can find more discussion in [6,7].
For the async container operations, I would suggest considering a non-blocking approach first. If that is impossible and we need a blocking implementation, I suggest using the bay-operations patches below as a reference.

[1] https://review.openstack.org/#/c/171921/
[2] https://review.openstack.org/#/c/172603/
[3] https://review.openstack.org/#/c/172772/
[4] https://review.openstack.org/#/c/172773/
[5] https://review.openstack.org/#/c/172774/
[6] https://blueprints.launchpad.net/magnum/+spec/horizontal-scale
[7] https://etherpad.openstack.org/p/liberty-work-magnum-horizontal-scale

Best regards,
Hongbin

-----Original Message-----
From: Adrian Otto [mailto:adrian.o...@rackspace.com]
Sent: December-16-15 10:20 PM
To: OpenStack Development Mailing List (not for usage questions)
Cc: s...@yahoo-inc.com
Subject: Re: [openstack-dev] [magnum] Magnum conductor async container operations

On Dec 16, 2015, at 6:24 PM, Joshua Harlow <harlo...@fastmail.com> wrote:

SURO wrote:
Hi all,
Please review and provide feedback on the following design proposal for implementing the blueprint[1] on async-container-operations -

1. Magnum-conductor would have a pool of threads for executing the container operations, viz. the executor_threadpool. The size of the executor_threadpool will be configurable. [Phase0]

2. Every time Magnum-conductor (Mcon) receives a container-operation request from Magnum-API (Mapi), it will do the initial validation and housekeeping and then pick a thread from the executor_threadpool to execute the rest of the operation. Thus Mcon will return from the RPC request context much faster, without blocking Mapi. If the executor_threadpool is exhausted, Mcon will execute the way it does today, i.e. synchronously - this will be the rate-limiting mechanism, relaying the feedback of exhaustion. [Phase0]
How often we hit this scenario may indicate to the operator that more workers should be created for Mcon.

3. Blocking class of operations - there will be a class of operations which cannot be made async, as they are supposed to return result/content inline, e.g. 'container-logs'. [Phase0]

4. Out-of-order considerations for the NonBlocking class of operations - there is a possible race condition around a create followed by a start/delete of a container, as things would happen in parallel. To solve this, we will maintain a map of container to executing thread for the current execution. If we find a request for an operation on a container-in-execution, we will block till the thread completes the execution. [Phase0]

Does whatever performs these operations (mcon?) run in more than one process?

Yes, there may be multiple copies of magnum-conductor running on separate hosts.

Can it be requested to create in one process and then delete in another? If so, is that map some distributed/cross-machine/cross-process map that will be inspected to see what else is manipulating a given container (so that the thread can block until that is not the case... basically the map is acting like an operation-lock?)

That's how I interpreted it as well. This is a race-prevention technique so that we don't attempt to act on a resource until it is ready. Another way to deal with this is to check the state of the resource, and return a "not ready" error if it's not ready yet.
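The "map of container to executing thread" that scopes blocking to a single container (rather than a whole bay) can be sketched as a per-container lock table. A minimal, illustrative Python sketch - not Magnum code:

    import collections
    import threading

    _locks = collections.defaultdict(threading.Lock)  # container-id -> lock
    _locks_guard = threading.Lock()

    def run_serialized(container_id, operation):
        with _locks_guard:           # protect the map itself
            lock = _locks[container_id]
        # Blocks only if another operation on the *same* container is in
        # flight; operations on other containers proceed in parallel.
        with lock:
            operation()              # e.g. the COE client call

The wait-queue refinement SURO defers above would replace the bare lock with a per-container FIFO of pending operations, so callers enqueue instead of blocking a thread.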
Re: [openstack-dev] [magnum] Magnum conductor async container operations
Josh,
Thanks for bringing up this discussion!

Modulo hashing introduces the possibility of a 'window of inconsistency', and 'consistent hashing' addresses that dynamism better. BUT for the problem at hand I think modulo hashing is good enough, as the number of worker instances for a conductor in OpenStack is managed through config - a change in which requires a restart of the conductor. If the conductor is restarted, the 'window of inconsistency' does not occur for the situation we are discussing.

Regards, SURO
irc//freenode: suro-patz

On 12/16/15 11:39 PM, Joshua Harlow wrote:

SURO wrote:
Please find the reply inline.
Regards, SURO
irc//freenode: suro-patz

On 12/16/15 7:19 PM, Adrian Otto wrote:

On Dec 16, 2015, at 6:24 PM, Joshua Harlow wrote:

SURO wrote:
Hi all,
Please review and provide feedback on the following design proposal for implementing the blueprint[1] on async-container-operations -

1. Magnum-conductor would have a pool of threads for executing the container operations, viz. the executor_threadpool. The size of the executor_threadpool will be configurable. [Phase0]

2. Every time Magnum-conductor (Mcon) receives a container-operation request from Magnum-API (Mapi), it will do the initial validation and housekeeping and then pick a thread from the executor_threadpool to execute the rest of the operation. Thus Mcon will return from the RPC request context much faster, without blocking Mapi. If the executor_threadpool is exhausted, Mcon will execute the way it does today, i.e. synchronously - this will be the rate-limiting mechanism, relaying the feedback of exhaustion. [Phase0]
How often we hit this scenario may indicate to the operator that more workers should be created for Mcon.

3. Blocking class of operations - there will be a class of operations which cannot be made async, as they are supposed to return result/content inline, e.g. 'container-logs'. [Phase0]

4. Out-of-order considerations for the NonBlocking class of operations - there is a possible race condition around a create followed by a start/delete of a container, as things would happen in parallel. To solve this, we will maintain a map of container to executing thread for the current execution. If we find a request for an operation on a container-in-execution, we will block till the thread completes the execution. [Phase0]

Does whatever performs these operations (mcon?) run in more than one process?

Yes, there may be multiple copies of magnum-conductor running on separate hosts.

Can it be requested to create in one process and then delete in another? If so, is that map some distributed/cross-machine/cross-process map that will be inspected to see what else is manipulating a given container (so that the thread can block until that is not the case... basically the map is acting like an operation-lock?)

Suro> @Josh, just after this I had mentioned "The approach above puts a prerequisite that operations for a given container on a given Bay would go to the same Magnum-conductor instance," which implied multiple instances of magnum-conductor. Also, my idea for implementing this was as follows - magnum-conductors have an 'id' associated, which carries the notion of being the [0 - (N-1)]th instance of magnum-conductor. Given a request for a container operation, we would always have the bay-id and container-id. I was planning to use 'hash(bay-id, container-id) modulo N' as the logic to ensure that the right instance picks up the intended request. Let me know if I am missing any nuance of AMQP here.

Unsure about the nuances of AMQP (I guess that's an implementation detail of this); but what this sounds like is similar to the hash rings other projects have built (ironic uses one[1]; ceilometer is slightly different afaik, see http://www.slideshare.net/EoghanGlynn/hash-based-central-agent-workload-partitioning-37760440 and https://github.com/openstack/ceilometer/blob/master/ceilometer/coordination.py#L48). The typical issue with modulo hashing is changes in N (whether adding new conductors or deleting them), what that change in N does to ongoing requests, how u change N in an online manner (and so on); typically with modulo hashing a large number of keys get shuffled around[2]. So, just a thought, but a (consistent) hashing routine/ring... might be worthwhile to look into, and/or talk with those other projects to see what they have been up to.

My 2 cents,

[1] https://github.com/openstack/ironic/blob/master/ironic/common/hash_ring.py
[2] https://en.wikipedia.org/wiki/Consistent_hashing

That's how I interpreted it as well. This is a race-prevention technique so that we don't attempt to act on a resource until it is ready. Another way to deal with this is to check the state of the resource and return a "not ready" error if it's not ready yet. If this happens in a part of the system that is unattended by a user, we can re-queue the call to retry after a minimum delay, so that it proceeds only when the ready state is reached in the resource, or is terminated after a maximum number of attempts, or if the resource enters an error state.
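For reference, the consistent-hashing idea Josh points at (ironic's hash_ring.py is the in-tree example) fits in a few lines. A minimal sketch - the replica count and key format are arbitrary illustrative choices, not what ironic actually uses:

    import bisect
    import hashlib

    class HashRing(object):
        def __init__(self, hosts, replicas=64):
            # Place each host at many points on the ring so load spreads
            # evenly and removals only remap that host's segments.
            self._ring = []
            for host in hosts:
                for r in range(replicas):
                    h = self._hash('%s-%d' % (host, r))
                    self._ring.append((h, host))
            self._ring.sort()
            self._keys = [h for h, _ in self._ring]

        @staticmethod
        def _hash(key):
            return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)

        def get_host(self, key):
            # First ring point clockwise from the key's hash, wrapping.
            idx = bisect.bisect(self._keys, self._hash(key)) % len(self._ring)
            return self._ring[idx][1]

    ring = HashRing(['mcon-0', 'mcon-1', 'mcon-2'])
    print(ring.get_host('bay-42/container-7'))

Because each host owns many small segments, adding or removing a conductor remaps only the keys on that host's segments, instead of reshuffling nearly every (bay, container) key the way plain modulo hashing does [2].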
Re: [openstack-dev] [magnum] Magnum conductor async container operations
Josh,
You pointed out correctly! magnum-conductor runs monkey-patched code, so the underlying thread module is actually using greenthreads.

- I would use eventlet.greenthread explicitly, as that improves readability.
- A greenthread can fail to yield by itself if it makes no i/o or blocking call. But in the present scenario this is not much of a concern, as the container-operation execution is light on the client side and mostly blocks waiting for the response from the server after issuing the request.

I will update the proposal with this change.

Regards, SURO
irc//freenode: suro-patz

On 12/16/15 11:57 PM, Joshua Harlow wrote:

SURO wrote:
Josh,
Please find my reply inline.
Regards, SURO
irc//freenode: suro-patz

On 12/16/15 6:37 PM, Joshua Harlow wrote:

SURO wrote:
Hi all,
Please review and provide feedback on the following design proposal for implementing the blueprint[1] on async-container-operations -

1. Magnum-conductor would have a pool of threads for executing the container operations, viz. the executor_threadpool. The size of the executor_threadpool will be configurable. [Phase0]

2. Every time Magnum-conductor (Mcon) receives a container-operation request from Magnum-API (Mapi), it will do the initial validation and housekeeping and then pick a thread from the executor_threadpool to execute the rest of the operation. Thus Mcon will return from the RPC request context much faster, without blocking Mapi. If the executor_threadpool is exhausted, Mcon will execute the way it does today, i.e. synchronously - this will be the rate-limiting mechanism, relaying the feedback of exhaustion. [Phase0]
How often we hit this scenario may indicate to the operator that more workers should be created for Mcon.

3. Blocking class of operations - there will be a class of operations which cannot be made async, as they are supposed to return result/content inline, e.g. 'container-logs'. [Phase0]

4. Out-of-order considerations for the NonBlocking class of operations - there is a possible race condition around a create followed by a start/delete of a container, as things would happen in parallel. To solve this, we will maintain a map of container to executing thread for the current execution. If we find a request for an operation on a container-in-execution, we will block till the thread completes the execution. [Phase0]
This mechanism can be further refined to achieve more asynchronous behavior. [Phase2]
The approach above puts a prerequisite that operations for a given container on a given Bay would go to the same Magnum-conductor instance. [Phase0]

5. The hand-off between Mcon and a thread from the executor_threadpool can be reflected through new states on the 'container' object. These states can be helpful to recover/audit in case of an Mcon restart. [Phase1]

Other considerations -
1. Using eventlet.greenthread instead of real threads => This approach would require further refactoring of the execution code to embed yield logic; otherwise a single greenthread would block the others from progressing. Given that we will extend the mechanism to multiple COEs, and to keep the approach straightforward to begin with, we will use 'threading.Thread' instead of 'eventlet.greenthread'.

Also unsure about the above; not quite sure I see how greenthread usage requires more yield logic (I'm assuming you mean the yield statement here)?

Btw, if magnum is running with all things monkey patched (which it seems like https://github.com/openstack/magnum/blob/master/magnum/common/rpc_service.py#L33 does), then magnum's usage of 'threading.Thread' is an 'eventlet.greenthread' underneath the covers, just fyi.

SURO> Let's consider this -

    function A () {
        block B; // validation
        block C; // blocking op
    }

Now, if we run C in a greenthread as it is, would it not block the entire OS thread that runs through all the greenthreads? I assumed it would, and that's why we would have to incorporate finer-grained yields into C to leverage greenthreads. If the answer is no, then we can use greenthreads. I will validate which version of threading.Thread is getting used.

Unsure how to answer this one. If all things are monkey patched, then any time a blocking operation (i/o, lock acquisition...) is triggered, the internals of eventlet do a bunch of jumping around to switch to another green thread (http://eventlet.net/doc/hubs.html). Once u start partially using greenthreads and mixing in real threads, then you have to start trying to reason about yielding in certain places (and at that point you might as well go to py3.4+, since it has syntax made just for this kind of thinking).

Pointers for the thread monkey patching, btw:
https://github.com/eventlet/eventlet/blob/master/eventlet/patcher.py#L346
https://github.com/eventlet/eventlet/blob/master/eventlet/patcher.py#L212

Easy way to see this:

>>> import eventlet
>>> eventlet.monkey_patch()
>>> import thread
>>> thread.start_new_thread.__module__
'eventlet.green.thread'
>>> thread.allocate_lock.__module__
'eventlet.green.thread'
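With the switch to explicit greenthreads, the proposal's executor_threadpool maps naturally onto an eventlet.GreenPool. A hedged sketch under that assumption - the pool size, handler names, and the op object are illustrative, not Magnum code:

    import eventlet
    eventlet.monkey_patch()

    pool = eventlet.GreenPool(size=10)  # the configurable pool size

    def execute_container_op(op):
        # Issues the request to the COE and waits on the response; the
        # wait is i/o, so this greenthread yields while blocked.
        op.run()  # 'op' is an illustrative stand-in

    def handle_request(op):
        if pool.free() > 0:
            pool.spawn_n(execute_container_op, op)  # async hand-off
        else:
            # Pool exhausted: fall back to synchronous execution, which
            # rate-limits the caller as described in the proposal.
            execute_container_op(op)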
Re: [openstack-dev] [magnum] Magnum conductor async container operations
Please find the reply inline.
Regards, SURO
irc//freenode: suro-patz

On 12/16/15 7:19 PM, Adrian Otto wrote:

On Dec 16, 2015, at 6:24 PM, Joshua Harlow wrote:

SURO wrote:
Hi all,
Please review and provide feedback on the following design proposal for implementing the blueprint[1] on async-container-operations -

1. Magnum-conductor would have a pool of threads for executing the container operations, viz. the executor_threadpool. The size of the executor_threadpool will be configurable. [Phase0]

2. Every time Magnum-conductor (Mcon) receives a container-operation request from Magnum-API (Mapi), it will do the initial validation and housekeeping and then pick a thread from the executor_threadpool to execute the rest of the operation. Thus Mcon will return from the RPC request context much faster, without blocking Mapi. If the executor_threadpool is exhausted, Mcon will execute the way it does today, i.e. synchronously - this will be the rate-limiting mechanism, relaying the feedback of exhaustion. [Phase0]
How often we hit this scenario may indicate to the operator that more workers should be created for Mcon.

3. Blocking class of operations - there will be a class of operations which cannot be made async, as they are supposed to return result/content inline, e.g. 'container-logs'. [Phase0]

4. Out-of-order considerations for the NonBlocking class of operations - there is a possible race condition around a create followed by a start/delete of a container, as things would happen in parallel. To solve this, we will maintain a map of container to executing thread for the current execution. If we find a request for an operation on a container-in-execution, we will block till the thread completes the execution. [Phase0]

Does whatever performs these operations (mcon?) run in more than one process?

Yes, there may be multiple copies of magnum-conductor running on separate hosts.

Can it be requested to create in one process and then delete in another? If so, is that map some distributed/cross-machine/cross-process map that will be inspected to see what else is manipulating a given container (so that the thread can block until that is not the case... basically the map is acting like an operation-lock?)

Suro> @Josh, just after this I had mentioned "The approach above puts a prerequisite that operations for a given container on a given Bay would go to the same Magnum-conductor instance," which implied multiple instances of magnum-conductor. Also, my idea for implementing this was as follows - magnum-conductors have an 'id' associated, which carries the notion of being the [0 - (N-1)]th instance of magnum-conductor. Given a request for a container operation, we would always have the bay-id and container-id. I was planning to use 'hash(bay-id, container-id) modulo N' as the logic to ensure that the right instance picks up the intended request. Let me know if I am missing any nuance of AMQP here.

That's how I interpreted it as well. This is a race-prevention technique so that we don't attempt to act on a resource until it is ready. Another way to deal with this is to check the state of the resource and return a "not ready" error if it's not ready yet. If this happens in a part of the system that is unattended by a user, we can re-queue the call to retry after a minimum delay, so that it proceeds only when the ready state is reached in the resource, or is terminated after a maximum number of attempts, or if the resource enters an error state. This would allow other work to proceed while the retry waits in the queue.

Suro> @Adrian, I think the async model is to let the user issue a sequence of operations, which might be causally ordered. I suggest we should honor the causal ordering rather than implementing the implicit retry model. As per my proposal above, if we can arbitrate operations per given bay, per given container, we should be able to achieve this ordering; see the routing sketch after this message.

If it's just local to one process, then I have a library for u that can solve the problem of correctly ordering parallel operations ;)

What we are aiming for is a bit more distributed.

Suro> +1

Adrian

This mechanism can be further refined to achieve more asynchronous behavior. [Phase2]
The approach above puts a prerequisite that operations for a given container on a given Bay would go to the same Magnum-conductor instance. [Phase0]

5. The hand-off between Mcon and a thread from the executor_threadpool can be reflected through new states on the 'container' object. These states can be helpful to recover/audit in case of an Mcon restart. [Phase1]

Other considerations -
1. Using eventlet.greenthread instead of real threads => This approach would require further refactoring of the execution code to embed yield logic; otherwise a single greenthread would block the others from progressing. Given that we will extend the mechanism to multiple COEs, and to keep the approach straightforward to begin with, we will use 'threading.Thread' instead of 'eventlet.greenthread'.
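SURO's routing rule - every operation for a given (bay, container) lands on the same conductor instance - is a deterministic hash. A minimal sketch, assuming N conductors and an illustrative per-instance topic naming scheme (not actual Magnum queue names); hashlib is used instead of the builtin hash(), which is not stable across processes:

    import hashlib

    N = 4  # number of magnum-conductor instances, from config

    def conductor_index(bay_id, container_id):
        key = '%s/%s' % (bay_id, container_id)
        digest = hashlib.sha1(key.encode('utf-8')).hexdigest()
        return int(digest, 16) % N

    # Every create/start/delete for this container resolves to the same
    # conductor, so per-container causal ordering is preserved.
    topic = 'magnum-conductor.%d' % conductor_index('bay-42', 'container-7')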
[openstack-dev] [magnum] Magnum conductor async container operations
Hi all,
Please review and provide feedback on the following design proposal for implementing the blueprint[1] on async-container-operations -

1. Magnum-conductor would have a pool of threads for executing the container operations, viz. the executor_threadpool. The size of the executor_threadpool will be configurable. [Phase0]

2. Every time Magnum-conductor (Mcon) receives a container-operation request from Magnum-API (Mapi), it will do the initial validation and housekeeping and then pick a thread from the executor_threadpool to execute the rest of the operation. Thus Mcon will return from the RPC request context much faster, without blocking Mapi. If the executor_threadpool is exhausted, Mcon will execute the way it does today, i.e. synchronously - this will be the rate-limiting mechanism, relaying the feedback of exhaustion. [Phase0]
How often we hit this scenario may indicate to the operator that more workers should be created for Mcon.

3. Blocking class of operations - there will be a class of operations which cannot be made async, as they are supposed to return result/content inline, e.g. 'container-logs'. [Phase0]

4. Out-of-order considerations for the NonBlocking class of operations - there is a possible race condition around a create followed by a start/delete of a container, as things would happen in parallel. To solve this, we will maintain a map of container to executing thread for the current execution. If we find a request for an operation on a container-in-execution, we will block till the thread completes the execution. [Phase0]
This mechanism can be further refined to achieve more asynchronous behavior. [Phase2]
The approach above puts a prerequisite that operations for a given container on a given Bay would go to the same Magnum-conductor instance. [Phase0]

5. The hand-off between Mcon and a thread from the executor_threadpool can be reflected through new states on the 'container' object. These states can be helpful to recover/audit in case of an Mcon restart. [Phase1]

Other considerations -
1. Using eventlet.greenthread instead of real threads => This approach would require further refactoring of the execution code to embed yield logic; otherwise a single greenthread would block the others from progressing. Given that we will extend the mechanism to multiple COEs, and to keep the approach straightforward to begin with, we will use 'threading.Thread' instead of 'eventlet.greenthread'.

Refs:-
[1] - https://blueprints.launchpad.net/magnum/+spec/async-container-operations

--
Regards, SURO
irc//freenode: suro-patz
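A minimal sketch of items 1 and 2 above - a bounded pool plus the synchronous fallback that doubles as rate limiting. This assumes a futures-style thread pool; validate() and op.run() are illustrative stand-ins, not Magnum code:

    from concurrent import futures

    POOL_SIZE = 10  # the configurable executor_threadpool size

    _executor = futures.ThreadPoolExecutor(max_workers=POOL_SIZE)
    _in_flight = set()  # crude occupancy tracking for the fallback

    def submit_container_op(op):
        validate(op)                  # initial validation, housekeeping
        if len(_in_flight) >= POOL_SIZE:
            # Pool exhausted: run synchronously, which rate-limits the
            # caller (Mapi) and relays the feedback of exhaustion.
            op.run()
            return
        future = _executor.submit(op.run)   # hand off to a pool thread;
        _in_flight.add(future)              # the RPC context returns now
        future.add_done_callback(_in_flight.discard)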
Re: [openstack-dev] [magnum] Magnum conductor async container operations
SURO wrote:
Hi all,
Please review and provide feedback on the following design proposal for implementing the blueprint[1] on async-container-operations -

1. Magnum-conductor would have a pool of threads for executing the container operations, viz. the executor_threadpool. The size of the executor_threadpool will be configurable. [Phase0]

2. Every time Magnum-conductor (Mcon) receives a container-operation request from Magnum-API (Mapi), it will do the initial validation and housekeeping and then pick a thread from the executor_threadpool to execute the rest of the operation. Thus Mcon will return from the RPC request context much faster, without blocking Mapi. If the executor_threadpool is exhausted, Mcon will execute the way it does today, i.e. synchronously - this will be the rate-limiting mechanism, relaying the feedback of exhaustion. [Phase0]
How often we hit this scenario may indicate to the operator that more workers should be created for Mcon.

3. Blocking class of operations - there will be a class of operations which cannot be made async, as they are supposed to return result/content inline, e.g. 'container-logs'. [Phase0]

4. Out-of-order considerations for the NonBlocking class of operations - there is a possible race condition around a create followed by a start/delete of a container, as things would happen in parallel. To solve this, we will maintain a map of container to executing thread, for the current execution. If we find a request for an operation on a container-in-execution, we will block till the thread completes the execution. [Phase0]

Does whatever performs these operations (mcon?) run in more than one process? Can it be requested to create in one process and then delete in another? If so, is that map some distributed/cross-machine/cross-process map that will be inspected to see what else is manipulating a given container (so that the thread can block until that is not the case... basically the map is acting like an operation-lock?)

If it's just local to one process, then I have a library for u that can solve the problem of correctly ordering parallel operations ;)

This mechanism can be further refined to achieve more asynchronous behavior. [Phase2]
The approach above puts a prerequisite that operations for a given container on a given Bay would go to the same Magnum-conductor instance. [Phase0]

5. The hand-off between Mcon and a thread from the executor_threadpool can be reflected through new states on the 'container' object. These states can be helpful to recover/audit in case of an Mcon restart. [Phase1]

Other considerations -
1. Using eventlet.greenthread instead of real threads => This approach would require further refactoring of the execution code to embed yield logic; otherwise a single greenthread would block the others from progressing. Given that we will extend the mechanism to multiple COEs, and to keep the approach straightforward to begin with, we will use 'threading.Thread' instead of 'eventlet.greenthread'.

Refs:-
[1] - https://blueprints.launchpad.net/magnum/+spec/async-container-operations
Re: [openstack-dev] [magnum] Magnum conductor async container operations
> On Dec 16, 2015, at 6:24 PM, Joshua Harlow wrote:
>
> SURO wrote:
>> Hi all,
>> Please review and provide feedback on the following design proposal for implementing the blueprint[1] on async-container-operations -
>>
>> 1. Magnum-conductor would have a pool of threads for executing the container operations, viz. the executor_threadpool. The size of the executor_threadpool will be configurable. [Phase0]
>> 2. Every time Magnum-conductor (Mcon) receives a container-operation request from Magnum-API (Mapi), it will do the initial validation and housekeeping and then pick a thread from the executor_threadpool to execute the rest of the operation. Thus Mcon will return from the RPC request context much faster, without blocking Mapi. If the executor_threadpool is exhausted, Mcon will execute the way it does today, i.e. synchronously - this will be the rate-limiting mechanism, relaying the feedback of exhaustion. [Phase0]
>> How often we hit this scenario may indicate to the operator that more workers should be created for Mcon.
>> 3. Blocking class of operations - there will be a class of operations which cannot be made async, as they are supposed to return result/content inline, e.g. 'container-logs'. [Phase0]
>> 4. Out-of-order considerations for the NonBlocking class of operations - there is a possible race condition around a create followed by a start/delete of a container, as things would happen in parallel. To solve this, we will maintain a map of container to executing thread, for the current execution. If we find a request for an operation on a container-in-execution, we will block till the thread completes the execution. [Phase0]
>
> Does whatever performs these operations (mcon?) run in more than one process?

Yes, there may be multiple copies of magnum-conductor running on separate hosts.

> Can it be requested to create in one process and then delete in another? If so, is that map some distributed/cross-machine/cross-process map that will be inspected to see what else is manipulating a given container (so that the thread can block until that is not the case... basically the map is acting like an operation-lock?)

That's how I interpreted it as well. This is a race-prevention technique so that we don't attempt to act on a resource until it is ready. Another way to deal with this is to check the state of the resource and return a "not ready" error if it's not ready yet. If this happens in a part of the system that is unattended by a user, we can re-queue the call to retry after a minimum delay, so that it proceeds only when the ready state is reached in the resource, or is terminated after a maximum number of attempts, or if the resource enters an error state. This would allow other work to proceed while the retry waits in the queue.

> If it's just local to one process, then I have a library for u that can solve the problem of correctly ordering parallel operations ;)

What we are aiming for is a bit more distributed.

Adrian

>> This mechanism can be further refined to achieve more asynchronous behavior. [Phase2]
>> The approach above puts a prerequisite that operations for a given container on a given Bay would go to the same Magnum-conductor instance. [Phase0]
>> 5. The hand-off between Mcon and a thread from the executor_threadpool can be reflected through new states on the 'container' object. These states can be helpful to recover/audit, in case of an Mcon restart. [Phase1]
>>
>> Other considerations -
>> 1. Using eventlet.greenthread instead of real threads => This approach would require further refactoring of the execution code to embed yield logic; otherwise a single greenthread would block the others from progressing. Given that we will extend the mechanism to multiple COEs, and to keep the approach straightforward to begin with, we will use 'threading.Thread' instead of 'eventlet.greenthread'.
>>
>> Refs:-
>> [1] - https://blueprints.launchpad.net/magnum/+spec/async-container-operations
Re: [openstack-dev] [magnum] Magnum conductor async container operations
SURO wrote:
Hi all,
Please review and provide feedback on the following design proposal for implementing the blueprint[1] on async-container-operations -

1. Magnum-conductor would have a pool of threads for executing the container operations, viz. the executor_threadpool. The size of the executor_threadpool will be configurable. [Phase0]

2. Every time Magnum-conductor (Mcon) receives a container-operation request from Magnum-API (Mapi), it will do the initial validation and housekeeping and then pick a thread from the executor_threadpool to execute the rest of the operation. Thus Mcon will return from the RPC request context much faster, without blocking Mapi. If the executor_threadpool is exhausted, Mcon will execute the way it does today, i.e. synchronously - this will be the rate-limiting mechanism, relaying the feedback of exhaustion. [Phase0]
How often we hit this scenario may indicate to the operator that more workers should be created for Mcon.

3. Blocking class of operations - there will be a class of operations which cannot be made async, as they are supposed to return result/content inline, e.g. 'container-logs'. [Phase0]

4. Out-of-order considerations for the NonBlocking class of operations - there is a possible race condition around a create followed by a start/delete of a container, as things would happen in parallel. To solve this, we will maintain a map of container to executing thread for the current execution. If we find a request for an operation on a container-in-execution, we will block till the thread completes the execution. [Phase0]
This mechanism can be further refined to achieve more asynchronous behavior. [Phase2]
The approach above puts a prerequisite that operations for a given container on a given Bay would go to the same Magnum-conductor instance. [Phase0]

5. The hand-off between Mcon and a thread from the executor_threadpool can be reflected through new states on the 'container' object. These states can be helpful to recover/audit in case of an Mcon restart. [Phase1]

Other considerations -
1. Using eventlet.greenthread instead of real threads => This approach would require further refactoring of the execution code to embed yield logic; otherwise a single greenthread would block the others from progressing. Given that we will extend the mechanism to multiple COEs, and to keep the approach straightforward to begin with, we will use 'threading.Thread' instead of 'eventlet.greenthread'.

Also unsure about the above; not quite sure I see how greenthread usage requires more yield logic (I'm assuming you mean the yield statement here)?

Btw, if magnum is running with all things monkey patched (which it seems like https://github.com/openstack/magnum/blob/master/magnum/common/rpc_service.py#L33 does), then magnum's usage of 'threading.Thread' is an 'eventlet.greenthread' underneath the covers, just fyi.

Refs:-
[1] - https://blueprints.launchpad.net/magnum/+spec/async-container-operations
Re: [openstack-dev] [magnum] Magnum conductor async container operations
On 12/16/15 11:39 PM, Joshua Harlow wrote:

SURO wrote:
Please find the reply inline.
Regards, SURO
irc//freenode: suro-patz

On 12/16/15 7:19 PM, Adrian Otto wrote:

On Dec 16, 2015, at 6:24 PM, Joshua Harlow wrote:

SURO wrote:
Hi all,
Please review and provide feedback on the following design proposal for implementing the blueprint[1] on async-container-operations -

1. Magnum-conductor would have a pool of threads for executing the container operations, viz. the executor_threadpool. The size of the executor_threadpool will be configurable. [Phase0]

2. Every time Magnum-conductor (Mcon) receives a container-operation request from Magnum-API (Mapi), it will do the initial validation and housekeeping and then pick a thread from the executor_threadpool to execute the rest of the operation. Thus Mcon will return from the RPC request context much faster, without blocking Mapi. If the executor_threadpool is exhausted, Mcon will execute the way it does today, i.e. synchronously - this will be the rate-limiting mechanism, relaying the feedback of exhaustion. [Phase0]
How often we hit this scenario may indicate to the operator that more workers should be created for Mcon.

3. Blocking class of operations - there will be a class of operations which cannot be made async, as they are supposed to return result/content inline, e.g. 'container-logs'. [Phase0]

4. Out-of-order considerations for the NonBlocking class of operations - there is a possible race condition around a create followed by a start/delete of a container, as things would happen in parallel. To solve this, we will maintain a map of container to executing thread for the current execution. If we find a request for an operation on a container-in-execution, we will block till the thread completes the execution. [Phase0]

Does whatever performs these operations (mcon?) run in more than one process?

Yes, there may be multiple copies of magnum-conductor running on separate hosts.

Can it be requested to create in one process and then delete in another? If so, is that map some distributed/cross-machine/cross-process map that will be inspected to see what else is manipulating a given container (so that the thread can block until that is not the case... basically the map is acting like an operation-lock?)

Suro> @Josh, just after this I had mentioned "The approach above puts a prerequisite that operations for a given container on a given Bay would go to the same Magnum-conductor instance," which implied multiple instances of magnum-conductor. Also, my idea for implementing this was as follows - magnum-conductors have an 'id' associated, which carries the notion of being the [0 - (N-1)]th instance of magnum-conductor. Given a request for a container operation, we would always have the bay-id and container-id. I was planning to use 'hash(bay-id, container-id) modulo N' as the logic to ensure that the right instance picks up the intended request. Let me know if I am missing any nuance of AMQP here.

Unsure about the nuances of AMQP (I guess that's an implementation detail of this); but what this sounds like is similar to the hash rings other projects have built (ironic uses one[1]; ceilometer is slightly different afaik, see http://www.slideshare.net/EoghanGlynn/hash-based-central-agent-workload-partitioning-37760440 and https://github.com/openstack/ceilometer/blob/master/ceilometer/coordination.py#L48). The typical issue with modulo hashing is changes in N (whether adding new conductors or deleting them), what that change in N does to ongoing requests, how u change N in an online manner (and so on); typically with modulo hashing a large number of keys get shuffled around[2]. So, just a thought, but a (consistent) hashing routine/ring... might be worthwhile to look into, and/or talk with those other projects to see what they have been up to.

Suro> When a new worker instance is added, I guess it is done by restarting the magnum-conductor service. So the sequencing would get reset altogether, and the stickiness would resume afresh. I will go through your pointers and make sure I am not missing anything here.

My 2 cents,

[1] https://github.com/openstack/ironic/blob/master/ironic/common/hash_ring.py
[2] https://en.wikipedia.org/wiki/Consistent_hashing

That's how I interpreted it as well. This is a race-prevention technique so that we don't attempt to act on a resource until it is ready. Another way to deal with this is to check the state of the resource and return a "not ready" error if it's not ready yet. If this happens in a part of the system that is unattended by a user, we can re-queue the call to retry after a minimum delay, so that it proceeds only when the ready state is reached in the resource, or is terminated after a maximum number of attempts, or if the resource enters an error state. This would allow other work to proceed while the retry waits in the queue.

Suro> @Adrian, I think the async model is to let the user issue a sequence of operations, which might be causally ordered.
Re: [openstack-dev] [magnum] Magnum conductor async container operations
SURO wrote:
Please find the reply inline.
Regards, SURO
irc//freenode: suro-patz

On 12/16/15 7:19 PM, Adrian Otto wrote:

On Dec 16, 2015, at 6:24 PM, Joshua Harlow wrote:

SURO wrote:
Hi all,
Please review and provide feedback on the following design proposal for implementing the blueprint[1] on async-container-operations -

1. Magnum-conductor would have a pool of threads for executing the container operations, viz. the executor_threadpool. The size of the executor_threadpool will be configurable. [Phase0]

2. Every time Magnum-conductor (Mcon) receives a container-operation request from Magnum-API (Mapi), it will do the initial validation and housekeeping and then pick a thread from the executor_threadpool to execute the rest of the operation. Thus Mcon will return from the RPC request context much faster, without blocking Mapi. If the executor_threadpool is exhausted, Mcon will execute the way it does today, i.e. synchronously - this will be the rate-limiting mechanism, relaying the feedback of exhaustion. [Phase0]
How often we hit this scenario may indicate to the operator that more workers should be created for Mcon.

3. Blocking class of operations - there will be a class of operations which cannot be made async, as they are supposed to return result/content inline, e.g. 'container-logs'. [Phase0]

4. Out-of-order considerations for the NonBlocking class of operations - there is a possible race condition around a create followed by a start/delete of a container, as things would happen in parallel. To solve this, we will maintain a map of container to executing thread for the current execution. If we find a request for an operation on a container-in-execution, we will block till the thread completes the execution. [Phase0]

Does whatever performs these operations (mcon?) run in more than one process?

Yes, there may be multiple copies of magnum-conductor running on separate hosts.

Can it be requested to create in one process and then delete in another? If so, is that map some distributed/cross-machine/cross-process map that will be inspected to see what else is manipulating a given container (so that the thread can block until that is not the case... basically the map is acting like an operation-lock?)

Suro> @Josh, just after this I had mentioned "The approach above puts a prerequisite that operations for a given container on a given Bay would go to the same Magnum-conductor instance," which implied multiple instances of magnum-conductor. Also, my idea for implementing this was as follows - magnum-conductors have an 'id' associated, which carries the notion of being the [0 - (N-1)]th instance of magnum-conductor. Given a request for a container operation, we would always have the bay-id and container-id. I was planning to use 'hash(bay-id, container-id) modulo N' as the logic to ensure that the right instance picks up the intended request. Let me know if I am missing any nuance of AMQP here.

Unsure about the nuances of AMQP (I guess that's an implementation detail of this); but what this sounds like is similar to the hash rings other projects have built (ironic uses one[1]; ceilometer is slightly different afaik, see http://www.slideshare.net/EoghanGlynn/hash-based-central-agent-workload-partitioning-37760440 and https://github.com/openstack/ceilometer/blob/master/ceilometer/coordination.py#L48). The typical issue with modulo hashing is changes in N (whether adding new conductors or deleting them), what that change in N does to ongoing requests, how u change N in an online manner (and so on); typically with modulo hashing a large number of keys get shuffled around[2]. So, just a thought, but a (consistent) hashing routine/ring... might be worthwhile to look into, and/or talk with those other projects to see what they have been up to.

My 2 cents,

[1] https://github.com/openstack/ironic/blob/master/ironic/common/hash_ring.py
[2] https://en.wikipedia.org/wiki/Consistent_hashing

That's how I interpreted it as well. This is a race-prevention technique so that we don't attempt to act on a resource until it is ready. Another way to deal with this is to check the state of the resource and return a "not ready" error if it's not ready yet. If this happens in a part of the system that is unattended by a user, we can re-queue the call to retry after a minimum delay, so that it proceeds only when the ready state is reached in the resource, or is terminated after a maximum number of attempts, or if the resource enters an error state. This would allow other work to proceed while the retry waits in the queue.

Suro> @Adrian, I think the async model is to let the user issue a sequence of operations, which might be causally ordered. I suggest we should honor the causal ordering rather than implementing the implicit retry model. As per my proposal above, if we can arbitrate operations per given bay, per given container, we should be able to achieve this ordering.

If it's just local to one process, then I have a library for u that can solve the problem of correctly ordering parallel operations ;)
Re: [openstack-dev] [magnum] Magnum conductor async container operations
Josh,
Please find my reply inline.
Regards, SURO
irc//freenode: suro-patz

On 12/16/15 6:37 PM, Joshua Harlow wrote:

SURO wrote:
Hi all,
Please review and provide feedback on the following design proposal for implementing the blueprint[1] on async-container-operations -

1. Magnum-conductor would have a pool of threads for executing the container operations, viz. the executor_threadpool. The size of the executor_threadpool will be configurable. [Phase0]

2. Every time Magnum-conductor (Mcon) receives a container-operation request from Magnum-API (Mapi), it will do the initial validation and housekeeping and then pick a thread from the executor_threadpool to execute the rest of the operation. Thus Mcon will return from the RPC request context much faster, without blocking Mapi. If the executor_threadpool is exhausted, Mcon will execute the way it does today, i.e. synchronously - this will be the rate-limiting mechanism, relaying the feedback of exhaustion. [Phase0]
How often we hit this scenario may indicate to the operator that more workers should be created for Mcon.

3. Blocking class of operations - there will be a class of operations which cannot be made async, as they are supposed to return result/content inline, e.g. 'container-logs'. [Phase0]

4. Out-of-order considerations for the NonBlocking class of operations - there is a possible race condition around a create followed by a start/delete of a container, as things would happen in parallel. To solve this, we will maintain a map of container to executing thread for the current execution. If we find a request for an operation on a container-in-execution, we will block till the thread completes the execution. [Phase0]
This mechanism can be further refined to achieve more asynchronous behavior. [Phase2]
The approach above puts a prerequisite that operations for a given container on a given Bay would go to the same Magnum-conductor instance. [Phase0]

5. The hand-off between Mcon and a thread from the executor_threadpool can be reflected through new states on the 'container' object. These states can be helpful to recover/audit in case of an Mcon restart. [Phase1]

Other considerations -
1. Using eventlet.greenthread instead of real threads => This approach would require further refactoring of the execution code to embed yield logic; otherwise a single greenthread would block the others from progressing. Given that we will extend the mechanism to multiple COEs, and to keep the approach straightforward to begin with, we will use 'threading.Thread' instead of 'eventlet.greenthread'.

Also unsure about the above; not quite sure I see how greenthread usage requires more yield logic (I'm assuming you mean the yield statement here)?

Btw, if magnum is running with all things monkey patched (which it seems like https://github.com/openstack/magnum/blob/master/magnum/common/rpc_service.py#L33 does), then magnum's usage of 'threading.Thread' is an 'eventlet.greenthread' underneath the covers, just fyi.

SURO> Let's consider this -

    function A () {
        block B; // validation
        block C; // blocking op
    }

Now, if we run C in a greenthread as it is, would it not block the entire OS thread that runs through all the greenthreads? I assumed it would, and that's why we would have to incorporate finer-grained yields into C to leverage greenthreads. If the answer is no, then we can use greenthreads. I will validate which version of threading.Thread is getting used.

In that case, keeping the code on threading.Thread is portable, as it would work as desired even if we remove monkey patching, right?

Refs:-
[1] - https://blueprints.launchpad.net/magnum/+spec/async-container-operations