Re: Coordinating actions in a service
On 8 May 2015 at 20:35, Gustavo Niemeyer gust...@niemeyer.net wrote:
> On Fri, May 8, 2015 at 10:24 AM, John Weldon johnweld...@gmail.com wrote:
>> Hi Stuart;
>>
>> I think this is addressed in the proposed work for Actions 2.0.
>>
>> In the current model you'd have to manage all of this yourself. Actions can only be targeted to specific units in the current implementation, so you'd have to manage the distribution outside of actions (or else, as you suggest, some sort of generalised semaphore service, and a way to find all units of a service and queue up actions for all of them and have the actions individually manage themselves whether to run or not).
>>
>> The plan is to allow actions to be targeted at 1) specific units in a service, 2) leaders only, 3) all units in a service, or 4) a subset of units in a service. This would still be a little tricky for your use case, but you could at least manage all the logic in an action targeted at only the leader for example.
>
> None of these seem to fix the issue, though. The requirement is to execute on all units, but not at the same time. Also, the leader has no way to communicate with peer units without the hook terminating, right?
>
> There should be a way to postpone the result of an action to a moment past the end of the hook, but I actually think the proper way to fix this is to avoid the avalanche in the first place, by default: when dispatching an action to all units of a service, roll out in a sane way rather than doing all at once.

If I can run an action on the leader, and if the leader can run actions on its peers, then I have a mechanism to do all sorts of weird coordination (the leader kicks off actions on peers as it deems appropriate, collates the results and returns them).

I do agree that a good implementation of service level actions would handle most use cases without this level of complexity. Two options (one unit at a time, and all units simultaneously) meet most needs. N units at a time or N% of units at a time is more esoteric.

The leader being able to run actions on other units is a good thing (and I think planned?). I'd like my charm to schedule a weekly repair job on the cluster. At the moment, each unit has a cron job (spread out throughout the week) that runs the repair on the individual node. It would be much nicer to have a cron job that does 'if is-leader, then run-action repair-cluster'.

--
Stuart Bishop stuart.bis...@canonical.com

--
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju
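The fan-out modes mentioned in this message (one unit at a time, all units at once, N units, N% of units) can be sketched roughly as follows. This is only an illustration: the mode names, `batch_size`, `fan_out` and the `run_action(unit, action)` callable are hypothetical stand-ins, since the current model gives a leader no way to run actions on its peers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def batch_size(num_units: int, mode: str, n: int = 1) -> int:
    """Return how many units may run the action at the same time.

    Modes (illustrative names, not Juju options):
      "serial"  - one unit at a time
      "all"     - every unit simultaneously
      "count"   - n units at a time
      "percent" - n% of the units at a time, rounded down
    The result is always clamped to the range 1..num_units.
    """
    if num_units < 1:
        raise ValueError("num_units must be at least 1")
    if mode == "serial":
        size = 1
    elif mode == "all":
        size = num_units
    elif mode == "count":
        size = n
    elif mode == "percent":
        size = (num_units * n) // 100
    else:
        raise ValueError(f"unknown mode: {mode}")
    return max(1, min(size, num_units))


def fan_out(
    units: Sequence[str],
    action: str,
    run_action: Callable[[str, str], str],  # hypothetical: leader runs action on a peer
    mode: str = "serial",
    n: int = 1,
) -> Dict[str, str]:
    """Run the action on every unit, never more than batch_size at once,
    and collate the per-unit results for the leader to return."""
    if not units:
        return {}
    size = batch_size(len(units), mode, n)
    # The worker pool caps concurrency: at most `size` calls are in flight.
    with ThreadPoolExecutor(max_workers=size) as pool:
        outcomes = pool.map(lambda unit: run_action(unit, action), units)
        return dict(zip(units, outcomes))
```

For example, `fan_out(units, "repair-cluster", run_action, mode="count", n=2)` would keep at most two units busy at any moment. Threads are used here only to make the concurrency cap visible in a single process; a real leader would be coordinating remote units.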
Re: Coordinating actions in a service
On Fri, May 8, 2015 at 10:24 AM, John Weldon johnweld...@gmail.com wrote:
> Hi Stuart;
>
> I think this is addressed in the proposed work for Actions 2.0.
>
> In the current model you'd have to manage all of this yourself. Actions can only be targeted to specific units in the current implementation, so you'd have to manage the distribution outside of actions (or else, as you suggest, some sort of generalised semaphore service, and a way to find all units of a service and queue up actions for all of them and have the actions individually manage themselves whether to run or not).
>
> The plan is to allow actions to be targeted at 1) specific units in a service, 2) leaders only, 3) all units in a service, or 4) a subset of units in a service. This would still be a little tricky for your use case, but you could at least manage all the logic in an action targeted at only the leader for example.

None of these seem to fix the issue, though. The requirement is to execute on all units, but not at the same time. Also, the leader has no way to communicate with peer units without the hook terminating, right?

There should be a way to postpone the result of an action to a moment past the end of the hook, but I actually think the proper way to fix this is to avoid the avalanche in the first place, by default: when dispatching an action to all units of a service, roll out in a sane way rather than doing all at once.

gustavo @ http://niemeyer.net
Re: Coordinating actions in a service
Hi Stuart;

I think this is addressed in the proposed work for Actions 2.0.

In the current model you'd have to manage all of this yourself. Actions can only be targeted to specific units in the current implementation, so you'd have to manage the distribution outside of actions (or else, as you suggest, some sort of generalised semaphore service, and a way to find all units of a service and queue up actions for all of them and have the actions individually manage themselves whether to run or not).

The plan is to allow actions to be targeted at 1) specific units in a service, 2) leaders only, 3) all units in a service, or 4) a subset of units in a service. This would still be a little tricky for your use case, but you could at least manage all the logic in an action targeted at only the leader for example.

Cheers,
--
John Weldon

On Fri, May 8, 2015 at 5:17 AM, Stuart Bishop stuart.bis...@canonical.com wrote:
> Hi.
>
> I have several potentially long running and expensive database operations I'd like to wrap as actions, which will generally be run on just one unit or on all units of the service.
>
> The problem I have is running an action on all units of the service. For HA, I need to ensure that, if there is more than one unit, then it may only run on (num_units/2)-1 units at a time. ie. if I fire off an action on all units of a 5 unit cluster, then only two units at a time may run the action and the other units will block until they are done.
>
> Leadership is needed to do coordination like this, but I can't see how to use it with actions. The action has no way to request permission from the leader and no way to get a response.
>
> Can anyone tell me how to do this with the current model? I don't think it can be done, which I guess makes this a feature request for Actions 2.0. Or perhaps this is another use case for a general locking service providing semaphores (which would need some tricky semantics, since I'd need a semaphore that can be acquired by max(1,(num_units/2)-1) units at a time and num_units might change while waiting for the lock and before releasing it).
>
> Or is this just out of scope, and the operator needs to do this sort of coordination themselves? There is plenty of other coordination that can't be embedded in the charm as it is (eg. don't drop an old unit if it is still being used to bootstrap a new unit into the cluster).
>
> --
> Stuart Bishop stuart.bis...@canonical.com
Re: Coordinating actions in a service
Actions are not hooks, but they are run in a similar fashion to hooks. I don't know of any restriction keeping actions from communicating with peers before the action completes, but that would be something we'd need to address for Actions 2.0 certainly.

I like the concept of metering actions out to units of a service; I would expect that to happen when targeting an action to the service instead of specific units, which would require the leader to decide how to fan out the actions.

The implementation details for Actions 2.0 are still very conceptual; I'll certainly be calling on you for opinions this go around as we're implementing them :)

Cheers,
--
John Weldon

On Fri, May 8, 2015 at 6:35 AM, Gustavo Niemeyer gust...@niemeyer.net wrote:
> On Fri, May 8, 2015 at 10:24 AM, John Weldon johnweld...@gmail.com wrote:
>> Hi Stuart;
>>
>> I think this is addressed in the proposed work for Actions 2.0.
>>
>> In the current model you'd have to manage all of this yourself. Actions can only be targeted to specific units in the current implementation, so you'd have to manage the distribution outside of actions (or else, as you suggest, some sort of generalised semaphore service, and a way to find all units of a service and queue up actions for all of them and have the actions individually manage themselves whether to run or not).
>>
>> The plan is to allow actions to be targeted at 1) specific units in a service, 2) leaders only, 3) all units in a service, or 4) a subset of units in a service. This would still be a little tricky for your use case, but you could at least manage all the logic in an action targeted at only the leader for example.
>
> None of these seem to fix the issue, though. The requirement is to execute on all units, but not at the same time. Also, the leader has no way to communicate with peer units without the hook terminating, right?
>
> There should be a way to postpone the result of an action to a moment past the end of the hook, but I actually think the proper way to fix this is to avoid the avalanche in the first place, by default: when dispatching an action to all units of a service, roll out in a sane way rather than doing all at once.
>
> gustavo @ http://niemeyer.net
Re: Coordinating actions in a service
On 8 May 2015 at 14:28, Gustavo Niemeyer gust...@niemeyer.net wrote:
> - The requirement of running an action across all units but without executing in all of them at once also sounds very common. In fact, it's so common that it should probably be the default when somebody dispatches an action to all units of a service.

+1 to rolling actions (and hooks?) being the default!

I'm becoming more convinced that actions are the way to handle controlled rollouts of new code to units, my day-to-day most common orchestration (i.e. must be automated). Currently working on a way to orchestrate this externally, but that as the default would be great (coupled with a health check after each action completion).

Thanks

--
Simon
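A rolling run with a health check after each completion, as suggested above, might look something like the sketch below. The `run_action` and `is_healthy` callables are hypothetical placeholders for whatever an external orchestrator would use; nothing here is an existing Juju API.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rolling_run(
    units: Sequence[str],
    run_action: Callable[[str], None],  # hypothetical: run the action on one unit
    is_healthy: Callable[[str], bool],  # hypothetical: post-action health check
) -> Tuple[List[str], Optional[str], List[str]]:
    """Run the action on one unit at a time, checking health after each.

    Stops at the first unit that fails its health check, so a bad rollout
    does not spread. Returns (completed, failed_unit, not_attempted).
    """
    completed: List[str] = []
    for index, unit in enumerate(units):
        run_action(unit)
        if not is_healthy(unit):
            # Leave the remaining units untouched for the operator to decide.
            return completed, unit, list(units[index + 1:])
        completed.append(unit)
    return completed, None, []
```

Stopping at the first unhealthy unit is one possible policy; an orchestrator could equally retry or roll back, which this sketch does not attempt.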
Re: Coordinating actions in a service
Hi Stuart,

I see two ways to fix this issue, and both are probably worth implementing in their own right:

- We should be able to have an action that lives past the execution of the action hook, upon request. This would enable the leader unit to communicate with all peer units to do anything it wants. The support for dispatching an action to the leader is scheduled for the upcoming work already.

- The requirement of running an action across all units but without executing in all of them at once also sounds very common. In fact, it's so common that it should probably be the default when somebody dispatches an action to all units of a service.

On Fri, May 8, 2015 at 9:17 AM, Stuart Bishop stuart.bis...@canonical.com wrote:
> Hi.
>
> I have several potentially long running and expensive database operations I'd like to wrap as actions, which will generally be run on just one unit or on all units of the service.
>
> The problem I have is running an action on all units of the service. For HA, I need to ensure that, if there is more than one unit, then it may only run on (num_units/2)-1 units at a time. ie. if I fire off an action on all units of a 5 unit cluster, then only two units at a time may run the action and the other units will block until they are done.
>
> Leadership is needed to do coordination like this, but I can't see how to use it with actions. The action has no way to request permission from the leader and no way to get a response.
>
> Can anyone tell me how to do this with the current model? I don't think it can be done, which I guess makes this a feature request for Actions 2.0. Or perhaps this is another use case for a general locking service providing semaphores (which would need some tricky semantics, since I'd need a semaphore that can be acquired by max(1,(num_units/2)-1) units at a time and num_units might change while waiting for the lock and before releasing it).
>
> Or is this just out of scope, and the operator needs to do this sort of coordination themselves? There is plenty of other coordination that can't be embedded in the charm as it is (eg. don't drop an old unit if it is still being used to bootstrap a new unit into the cluster).
>
> --
> Stuart Bishop stuart.bis...@canonical.com

--
gustavo @ http://niemeyer.net
Coordinating actions in a service
Hi.

I have several potentially long running and expensive database operations I'd like to wrap as actions, which will generally be run on just one unit or on all units of the service.

The problem I have is running an action on all units of the service. For HA, I need to ensure that, if there is more than one unit, then it may only run on (num_units/2)-1 units at a time. ie. if I fire off an action on all units of a 5 unit cluster, then only two units at a time may run the action and the other units will block until they are done.

Leadership is needed to do coordination like this, but I can't see how to use it with actions. The action has no way to request permission from the leader and no way to get a response.

Can anyone tell me how to do this with the current model? I don't think it can be done, which I guess makes this a feature request for Actions 2.0. Or perhaps this is another use case for a general locking service providing semaphores (which would need some tricky semantics, since I'd need a semaphore that can be acquired by max(1,(num_units/2)-1) units at a time and num_units might change while waiting for the lock and before releasing it).

Or is this just out of scope, and the operator needs to do this sort of coordination themselves? There is plenty of other coordination that can't be embedded in the charm as it is (eg. don't drop an old unit if it is still being used to bootstrap a new unit into the cluster).

--
Stuart Bishop stuart.bis...@canonical.com
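The limit in this message can be worked through in a few lines. Read with integer division, max(1,(num_units/2)-1) allows one unit on a 5 unit cluster, while the worked example says two. A variant that leaves a strict majority of units untouched, max(1,(num_units-1)/2), does give two. Which one was intended is not stated, so the second function below is an assumption, and all names are hypothetical. The semaphore part is an in-process sketch only and does not handle num_units changing while a unit is waiting, which is the tricky case raised above.

```python
import threading
from typing import Callable, Sequence


def limit_as_written(num_units: int) -> int:
    """max(1, (num_units/2)-1), read with integer division."""
    return max(1, num_units // 2 - 1)


def limit_keeping_majority(num_units: int) -> int:
    """Assumed alternative: leave a strict majority of units untouched."""
    return max(1, (num_units - 1) // 2)


def run_with_limit(
    units: Sequence[str], limit: int, work: Callable[[str], None]
) -> int:
    """Run work(unit) for every unit with at most `limit` running at once.

    Returns the peak number of units observed running simultaneously.
    """
    gate = threading.BoundedSemaphore(limit)  # at most `limit` holders
    lock = threading.Lock()                   # protects the counters below
    running = 0
    peak = 0

    def guarded(unit: str) -> None:
        nonlocal running, peak
        with gate:  # blocks until one of the `limit` slots is free
            with lock:
                running += 1
                peak = max(peak, running)
            try:
                work(unit)
            finally:
                with lock:
                    running -= 1

    threads = [threading.Thread(target=guarded, args=(u,)) for u in units]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return peak
```

For a 5 unit cluster, `limit_as_written(5)` is 1 and `limit_keeping_majority(5)` is 2; passing the chosen value as `limit` to `run_with_limit` makes the remaining units block until a slot frees up.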