Re: Coordinating actions in a service

2015-05-12 Thread Stuart Bishop
On 8 May 2015 at 20:35, Gustavo Niemeyer gust...@niemeyer.net wrote:

 On Fri, May 8, 2015 at 10:24 AM, John Weldon johnweld...@gmail.com wrote:

 Hi Stuart;

 I think this is addressed in the proposed work for Actions 2.0

 In the current model you'd have to manage all of this yourself.  Actions
 can only be targeted to specific units in the current implementation, so
 you'd have to manage the distribution outside of actions (or else, as you
 suggest, some sort of generalised semaphore service, and a way to find all
 units of a service and queue up actions for all of them and have the actions
 individually manage themselves whether to run or not)

 The plan is to allow actions to be targeted at 1) specific units in a
 service, 2) leaders only, 3) all units in a service, or 4) a subset of units
 in a service.  This would still be a little tricky for your use case, but
 you could at least manage all the logic in an action targeted at only the
 leader for example.


 None of these seem to fix the issue, though. The requirement is to execute
 on all units, but not at the same time. Also, the leader has no way to
 communicate with peer units without the hook terminating, right? There
 should be a way to postpone the result of an action to a moment past the end
 of the hook, but I actually think the proper way to fix this is to avoid the
 avalanche in the first place, by default: when dispatching an action to all
 units of a service, roll out in a sane way rather than doing all at once.

If I can run an action on the leader, and if the leader can run
actions on its peers, then I have a mechanism to do all sorts of weird
coordination (the leader kicks off actions on peers as it deems
appropriate, collates the results and returns them). I do agree that a
good implementation of service level actions would handle most use
cases without this level of complexity. Two options (one unit at a
time, and all units simultaneously) meet most needs. N units at a
time or N% of units at a time is more esoteric.

The leader being able to run actions on other units is a good thing
(and I think planned?). I'd like my charm to schedule a weekly repair
job on the cluster. At the moment, each unit has a cron job (spread
out throughout the week) that runs the repair on the individual node.
It would be much nicer to have a cronjob that does 'if is-leader, then
run-action repair-cluster'.
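
A minimal sketch of the cron job described above, in Python. The
'run-action' step is the wished-for capability from this message, not an
existing command, so both the leadership check and the action dispatch are
passed in as hypothetical callables ('is_leader' and 'run_action' are
names made up for illustration). How the cron job would reach the unit's
hook context is left open here.

```python
from typing import Callable


def weekly_repair(is_leader: Callable[[], bool],
                  run_action: Callable[[str], None]) -> bool:
    """Start the cluster repair only when this unit is the leader.

    Both arguments are hypothetical stand-ins: is_leader wraps whatever
    leadership check the charm has available, and run_action wraps the
    not-yet-existing ability to dispatch an action from a unit.
    Returns True if the repair action was dispatched.
    """
    if not is_leader():
        # Non-leader units do nothing, so the same cron entry can be
        # installed on every unit of the service.
        return False
    run_action("repair-cluster")
    return True
```

With this shape every unit carries the same cron entry, and only the
current leader actually kicks off the repair.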

-- 
Stuart Bishop stuart.bis...@canonical.com

-- 
Juju mailing list
Juju@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju


Re: Coordinating actions in a service

2015-05-08 Thread Gustavo Niemeyer
On Fri, May 8, 2015 at 10:24 AM, John Weldon johnweld...@gmail.com wrote:

 Hi Stuart;

 I think this is addressed in the proposed work for Actions 2.0

 In the current model you'd have to manage all of this yourself.  Actions
 can only be targeted to specific units in the current implementation, so
 you'd have to manage the distribution outside of actions (or else, as you
 suggest, some sort of generalised semaphore service, and a way to find all
 units of a service and queue up actions for all of them and have the
 actions individually manage themselves whether to run or not)

 The plan is to allow actions to be targeted at 1) specific units in a
 service, 2) leaders only, 3) all units in a service, or 4) a subset of
 units in a service.  This would still be a little tricky for your use case,
 but you could at least manage all the logic in an action targeted at only
 the leader for example.


None of these seem to fix the issue, though. The requirement is to execute
on all units, but not at the same time. Also, the leader has no way to
communicate with peer units without the hook terminating, right? There
should be a way to postpone the result of an action to a moment past the
end of the hook, but I actually think the proper way to fix this is to
avoid the avalanche in the first place, by default: when dispatching an
action to all units of a service, roll out in a sane way rather than doing
all at once.


gustavo @ http://niemeyer.net


Re: Coordinating actions in a service

2015-05-08 Thread John Weldon
Hi Stuart;

I think this is addressed in the proposed work for Actions 2.0

In the current model you'd have to manage all of this yourself.  Actions
can only be targeted to specific units in the current implementation, so
you'd have to manage the distribution outside of actions (or else, as you
suggest, some sort of generalised semaphore service, and a way to find all
units of a service and queue up actions for all of them and have the
actions individually manage themselves whether to run or not)

The plan is to allow actions to be targeted at 1) specific units in a
service, 2) leaders only, 3) all units in a service, or 4) a subset of
units in a service.  This would still be a little tricky for your use case,
but you could at least manage all the logic in an action targeted at only
the leader for example.

Cheers,


--
John Weldon




Re: Coordinating actions in a service

2015-05-08 Thread John Weldon
Actions are not hooks, but they are run in a similar fashion to hooks.  I
don't know of any restriction keeping actions from communicating with peers
before the action completes, but that would be something we'd need to
address for Actions 2.0 certainly.

I like the concept of metering actions out to units of a service; I would
expect that to happen when targeting an action to the service instead of
specific units, which would require the leader to decide how to fan out the
actions.

The implementation details for Actions 2.0 are still very conceptual; I'll
certainly be calling on you for opinions this go around as we're
implementing them :)

Cheers,


--
John Weldon




Re: Coordinating actions in a service

2015-05-08 Thread Simon Davy
On 8 May 2015 at 14:28, Gustavo Niemeyer gust...@niemeyer.net wrote:

 - The requirement of running an action across all units but without
 executing in all of them at once also sounds very common. In fact, it's so
 common that it should probably be the default when somebody dispatches an
 action to all units of a service.

+1 to rolling actions (and hooks?) being the default!

I'm becoming more convinced that actions are the way to handle
controlled rollouts of new code to units, my most common day-to-day
orchestration task (i.e. it must be automated).

I'm currently working on a way to orchestrate this externally, but having
that as the default would be great (coupled with a health check after each
action completes).

Thanks

-- 
Simon



Re: Coordinating actions in a service

2015-05-08 Thread Gustavo Niemeyer
Hi Stuart,

I see two ways to fix this issue, and both are probably worth implementing
on their own merits:

- We should be able to have an action that lives past the execution of the
action hook, upon request. This would enable the leader unit to communicate
with all peer units to do anything it wants. The support for dispatching an
action to the leader is scheduled for the upcoming work already.

- The requirement of running an action across all units but without
executing in all of them at once also sounds very common. In fact, it's so
common that it should probably be the default when somebody dispatches an
action to all units of a service.
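
A rough sketch of what rolling out "in a sane way" could look like, in
Python. It is not how Juju dispatches actions; 'rolling_batches',
'roll_out' and the 'run_on_unit' callable are hypothetical names used only
for illustration. Stopping at the first failed unit is an added
assumption, and units within a batch are run one after another here for
simplicity, where a real dispatcher would run them concurrently.

```python
from typing import Callable, Iterator, List, Sequence


def rolling_batches(units: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield the units in consecutive groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(units), batch_size):
        yield list(units[start:start + batch_size])


def roll_out(units: Sequence[str], batch_size: int,
             run_on_unit: Callable[[str], bool]) -> List[str]:
    """Run the action batch by batch and return the units that completed.

    run_on_unit is a hypothetical callable that runs the action on one
    unit and returns True on success. The rollout stops at the first
    failure so the remaining units are left untouched.
    """
    done: List[str] = []
    for batch in rolling_batches(units, batch_size):
        for unit in batch:
            if not run_on_unit(unit):
                return done
            done.append(unit)
    return done
```

A batch size of 1 gives the "one unit at a time" behaviour, and a batch
size equal to the number of units gives "all at once".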








-- 

gustavo @ http://niemeyer.net


Coordinating actions in a service

2015-05-08 Thread Stuart Bishop
Hi.

I have several potentially long running and expensive database
operations I'd like to wrap as actions, which will generally be run on
just one unit or on all units of the service.

The problem I have is running an action on all units of the service.
For HA, I need to ensure that, if there is more than one unit, then it
may only run on (num_units/2)-1 units at a time, i.e. if I fire off an
action on all units of a 5 unit cluster, then only two units at a time
may run the action and the other units will block until they are done.

Leadership is needed to do coordination like this, but I can't see how
to use it with actions. The action has no way to request permission
from the leader and no way to get a response.

Can anyone tell me how to do this with the current model?

I don't think it can be done, which I guess makes this a feature
request for Actions 2.0. Or perhaps this is another use case for a
general locking service providing semaphores (which would need some
tricky semantics, since I'd need a semaphore that can be acquired by
max(1,(num_units/2)-1) units at a time and num_units might change
while waiting for the lock and before releasing it).
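
To make the semaphore semantics concrete, here is a small Python sketch.
No such locking service exists in the thread; 'UnitSemaphore' and
'max_concurrent' are hypothetical names. It assumes integer division in
the limit, under which max(1,(num_units/2)-1) gives 1 for five units,
while the example above speaks of two at a time; (num_units-1)/2 rounded
down would give 2, and the thread does not settle which is intended. The
capacity is recomputed on every acquire, so a change in num_units takes
effect for units that are still waiting.

```python
def max_concurrent(num_units: int) -> int:
    """The limit as written in the message, assuming integer division."""
    return max(1, num_units // 2 - 1)


class UnitSemaphore:
    """Counting semaphore whose capacity follows the current unit count.

    A purely in-memory illustration: a real service would need to keep
    this state somewhere shared, for example with the leader.
    """

    def __init__(self, num_units: int) -> None:
        self.num_units = num_units
        self.holders = set()

    def try_acquire(self, unit: str) -> bool:
        """Grant the lock if the unit holds it already or a slot is free."""
        if unit in self.holders:
            return True
        # Capacity is derived from the unit count at the time of the call.
        if len(self.holders) < max_concurrent(self.num_units):
            self.holders.add(unit)
            return True
        return False

    def release(self, unit: str) -> None:
        """Give the slot back; releasing a lock that is not held is a no-op."""
        self.holders.discard(unit)
```

If the service shrinks while locks are held, existing holders keep their
slots and new acquisitions are refused until enough holders release.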

Or is this just out of scope, and the operator needs to do this sort
of coordination themselves? There is plenty of other coordination that
can't be embedded in the charm as it is (e.g. don't drop an old unit if
it is still being used to bootstrap a new unit into the cluster).

-- 
Stuart Bishop stuart.bis...@canonical.com
