Re: Automatic retries of hooks

2016-01-21 Thread roger peppe
On 21 January 2016 at 09:51, James Page  wrote:
> On Wed, 20 Jan 2016 at 20:31 William Reade 
> wrote:
>>
>> On Wed, Jan 20, 2016 at 8:01 PM, Dean Henrichsmeyer 
>> wrote:
>>>
>>> You realize James was complaining and not celebrating the "success" ? The
>>> fact that we can have a discussion trying to determine whether something is
>>> a bug or a feature indicates a problem.
>>
>>
>> Sorry, I didn't intend to disparage his experience; I took it as
>> legitimate and reasonable surprise at a change we evidently didn't
>> communicate adequately. But I don't think it's a misfeature; I think it's a
>> necessary approach, in service of global reliability in challenging
>> environments.
>
>
> You didn't - don't worry!
>
>>
>> But: if there are times it's inconvenient and not just surprising, we
>> should surely be able to disable it. Gabriel/Bogdan, would you be able to
>> address this?
>
>
> I agree with David's +1 on this feature with the condition that it can be
> disabled so that charm authors actually understand the behaviour of the
> software they are deploying.
>
> Please let's also ensure the retry limit is sensible - otherwise we might end
> up with end-users waiting a long time to understand that something is
> not recoverable, which could be equally damaging.

It would perhaps be good if the default status showed that
the hook was being retried. On the other hand, if retries
become common, then it could be the basis of any number
of false-alarm support calls.

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Automatic retries of hooks

2016-01-21 Thread James Page
On Wed, 20 Jan 2016 at 20:31 William Reade 
wrote:

> On Wed, Jan 20, 2016 at 8:01 PM, Dean Henrichsmeyer 
> wrote:
>>
>> You realize James was complaining and not celebrating the "success" ? The
>> fact that we can have a discussion trying to determine whether something is
>> a bug or a feature indicates a problem.
>>
>
> Sorry, I didn't intend to disparage his experience; I took it as
> legitimate and reasonable surprise at a change we evidently didn't
> communicate adequately. But I don't think it's a misfeature; I think it's a
> necessary approach, in service of global reliability in challenging
> environments.
>

You didn't - don't worry!


> But: if there are times it's inconvenient and not just surprising, we
> should surely be able to disable it. Gabriel/Bogdan, would you be able to
> address this?
>

I agree with David's +1 on this feature with the condition that it can be
disabled so that charm authors actually understand the behaviour of the
software they are deploying.

Please let's also ensure the retry limit is sensible - otherwise we might
end up with end-users waiting a long time to understand that something
is not recoverable, which could be equally damaging.


Re: Automatic retries of hooks

2016-01-20 Thread Gabriel Samfira
On Wed, 2016-01-20 at 10:39 -0500, Aaron Bentley wrote:
> On 2016-01-20 10:30 AM, Gabriel Samfira wrote:
> > The auto-retry thing was created to overcome situations in which
> > the machine is rebooted, or crashes during a hook run
> > (independently of juju). In this case, the charm would not be able
> > to recover automatically from a transient situation.
> 
> If the intent was to handle reboots, couldn't it be written to
> restart
> any pending hooks after a reboot, rather than when the hooks fail?

The original intent was to re-run a hook in case of external
intervention outside of juju. This includes but is not limited to:

* automatic reboots
* OOM situation
* power outage
* killall -9 jujud (chaos monkey/gremlins/postal sysadmin)

This has grown to automatically retry on any failure. While retrying
once at agent startup is enough for *some* needs, it may not be enough
for other charms. I would not remove the current behavior. I would
simply make it configurable in case the current behavior does not suit
everyone. The auto retry on all errors is a safe bet for charms that do
not implement retry logic, and as William stated, retrying an operation
inside a hook will block all other hooks from running.

Just my 2 cents.

> Even re-running just at agent-startup would be a lot clearer.
> 
> Aaron


Re: Automatic retries of hooks

2016-01-20 Thread William Reade
On Wed, Jan 20, 2016 at 2:42 PM, Dean Henrichsmeyer 
wrote:

> Hi,
>
> It seems the original point James was making is getting missed. No one is
> arguing over the value of being able to retry and/or idempotent hooks.
> Yes, you should be able to retry them and yes nothing should break if you
> run them over and over.
>
> The point made is that Juju shouldn't be automatically retrying them. The
> argument of "no one knows what went wrong so Juju automatically retrying
> them is a better experience" doesn't work. The intelligence of the stack in
> question, regardless of what it is, goes in the charms. If you start
> conflating and mixing up where the intelligence goes then creating,
> running, and debugging those distributed systems will be a nightmare.
>

Hook errors *will* happen, and often for transient reasons. In handling
this, we can choose between "users retry without understanding the details"
and "juju retries without understanding the details" [0]. I'd be happy to
make the behaviour configurable, for the rare cases when the user *does*
understand the details and wants full and detailed control, but I don't
think that's the common case.

> The magic should only be in Juju's ability to effectively drive the models
> and intelligence encoded in the charms. It shouldn't make assumptions about
> what that intelligence is or what those models require.
>

Stopping on hook error can only *prevent* those charms from applying their
intelligence. No more hooks to be run => no more opportunity to react. If a
charm wants to be smart about errors, it needs to detect the errors it
*knows* about, and react to those by setting status; and to move on
*without* failing the hook, thereby giving subsequent hooks an opportunity
to be smart.

Ultimately, it comes down to the fact that there's *always* another error
case you haven't considered. If you depend on the charmer to implement
retries for specific errors, that's essentially a whitelist, and they're
stuck playing whack-a-mole forever [1]. But if the charmer can depend on
external retries, they only have to worry about maintaining a
definitely-fatal blacklist and reporting those conditions in status.
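
A minimal sketch of that pattern, assuming a Python charm. `MissingConfig`, `run_hook` and the status messages are hypothetical names chosen for illustration; only the `status-set` hook tool comes from Juju itself.

```python
import subprocess


class MissingConfig(Exception):
    """A condition the charm author knows no retry will fix (hypothetical)."""


def status_set(state, message, run=subprocess.check_call):
    # status-set is the Juju hook tool for reporting workload status.
    run(["status-set", state, message])


def run_hook(body, set_status=status_set):
    """Run a hook body and return the exit code the hook should use."""
    try:
        body()
    except MissingConfig as err:
        # Known, definitely-fatal condition: report it in status and finish
        # cleanly, so that subsequent hooks still get a chance to run.
        set_status("blocked", str(err))
        return 0
    # Reached only when body() succeeded. Any other exception propagates,
    # the hook fails, and the external retry gets to deal with it.
    set_status("active", "ready")
    return 0
```

The charm only enumerates the errors it knows are fatal; everything else is left to fail the hook and be retried from outside.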

Am I making any sense here?

Cheers
William


[0] or "the system stays broken forever", I suppose :).
[1] I imagine the rational approach there is to give up, and start
whitelisting by operation rather than by error; i.e. to accept that most
errors are unknown/transient and should be dumbly retried. And given that,
why should every charmer have to roll their own retries?


Re: Automatic retries of hooks

2016-01-20 Thread Aaron Bentley
On 2016-01-20 10:30 AM, Gabriel Samfira wrote:
> The auto-retry thing was created to overcome situations in which
> the machine is rebooted, or crashes during a hook run
> (independently of juju). In this case, the charm would not be able
> to recover automatically from a transient situation.

If the intent was to handle reboots, couldn't it be written to restart
any pending hooks after a reboot, rather than when the hooks fail?

Even re-running just at agent-startup would be a lot clearer.

Aaron


Re: Automatic retries of hooks

2016-01-20 Thread Gabriel Samfira
The auto-retry thing was created to overcome situations in which the machine is 
rebooted, or crashes during a hook run (independently of juju). In this case, 
the charm would not be able to recover automatically from a transient situation.

This scenario is more evident in Windows workloads, where some features like 
Hyper-V require a reboot. This is fine, because you can install the feature 
with a -NoReboot flag, and use the juju-reboot --now tool to safely reboot.

After the machine comes back up, Windows still needs to configure the new 
feature. While it is configuring the feature, the services start (including 
juju), and hook execution starts up (as it's supposed to). The problem is that 
as part of the feature configuration, the system needs to reboot one final time 
(Windows... right?). This is done automatically by the feature installer, 
outside of juju.

This causes the hook to error, but not because of an actual problem. For this 
scenario, it's enough to retry once when the unit agent comes up.

A solution might be to make this feature configurable. Something like retry 
profiles:

* periodic (current behavior)
* one-shot (once at agent startup)
* disabled
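
A rough sketch of how such profiles might be modelled, written in Python purely for illustration. `RetryProfile` and `should_retry` are hypothetical names; nothing here is an existing Juju setting or API.

```python
from enum import Enum


class RetryProfile(Enum):
    PERIODIC = "periodic"   # keep retrying failed hooks (current behavior)
    ONE_SHOT = "one-shot"   # retry once, at agent startup only
    DISABLED = "disabled"   # never retry automatically


def should_retry(profile, at_agent_startup, retries_so_far):
    """Decide whether a failed hook should be re-run automatically."""
    if profile is RetryProfile.DISABLED:
        return False
    if profile is RetryProfile.ONE_SHOT:
        # Only the first retry after the unit agent comes up is allowed.
        return at_agent_startup and retries_so_far == 0
    return True  # PERIODIC: always eligible for another retry
```

With this shape, the Windows reboot case above is covered by `ONE_SHOT`, while charm authors who want to see every failure would pick `DISABLED`.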

Gabriel

On Wed, 2016-01-20 at 09:39 -0500, Charles Butler wrote:
I'm pretty sure that we have amenities to reboot the host without completely 
skewing the hook execution

https://jujucharms.com/docs/1.25/reference-hook-tools#juju-reboot-[--now]

This should have rebooted the machine after safely closing out of any hook 
context the charm was in, and upon reboot it should have resumed from the next 
context in queue.  I'm not a huge fan of a charm doing auto hook retries, for 
the reasons outlined by Rick, unless it is well understood and documented 
behavior.  Just chiming in with my 2 cents


Charles Butler - Juju Charmer
Come see the future of datacenter orchestration: http://jujucharms.com

On Wed, Jan 20, 2016 at 9:22 AM, Rick Harding wrote:
+1 retries are great, with backoff, when you know you're doing it because you 
have experience that certain api requests to clouds, or to other known failure 
points.

Blindly just saying "if at first you don't succeed, go go go" isn't a better 
UX. It adds another layer of complexity in debugging, and doesn't really 
improve the product. Only the charm author knows enough about what it's trying 
to achieve to do intelligent retry.

In this case, if there's something about unexpected reboots of machines, 
perhaps there's some specific case that Juju can grow some intelligence and 
hint at the charm author what happened. The charm can then react to that 
information as it deems necessary.

On Wed, Jan 20, 2016 at 8:42 AM Dean Henrichsmeyer wrote:
Hi,

It seems the original point James was making is getting missed. No one is 
arguing over the value of being able to retry and/or idempotent hooks. Yes, you 
should be able to retry them and yes nothing should break if you run them over 
and over.

The point made is that Juju shouldn't be automatically retrying them. The 
argument of "no one knows what went wrong so Juju automatically retrying them 
is a better experience" doesn't work. The intelligence of the stack in 
question, regardless of what it is, goes in the charms. If you start conflating 
and mixing up where the intelligence goes then creating, running, and debugging 
those distributed systems will be a nightmare.

The magic should only be in Juju's ability to effectively drive the models and 
intelligence encoded in the charms. It shouldn't make assumptions about what 
that intelligence is or what those models require.

Thanks.


-Dean


Re: Automatic retries of hooks

2016-01-20 Thread Stuart Bishop
On 20 January 2016 at 17:46, William Reade  wrote:

> On Wed, Jan 20, 2016 at 8:46 AM, Stuart Bishop 
> wrote:

>> It happens naturally if you structure your charm to have a single hook
>> that does everything that needs to be done, rather than trying to
>> craft individual hooks to deal with specific events.
>
> Independent of everything else, *this* should be *excellent* advice for
> speeding up your deployments. Have you already been writing charms like
> this? I'd love to hear your experiences; and, in particular, if you've
> noticed any improvement in deployment speed. The theoretically achievable
> speedup is vast, but the hook runner wasn't written with this approach in
> mind; we might need to make a couple of small tweaks [0] to get the best out
> of the approach.

The PostgreSQL charm has now existed in three forms: traditional,
services framework, and now reactive framework. Using the services
framework, deployment was slower than with the traditional charm. You
ended up with one very long string of steps, many of which were
unnecessary. I felt it was easier to maintain and understand, but the
logs were noisier and it was slower. The reactive framework deploys
much faster than all the other versions, as you can easily have only
the necessary steps triggered for the current state. The execution
thread is harder to follow, since there isn't really one, but it still
seems very maintainable and understandable. There is less code than in
the other versions. It does drive you to create separate handlers for
each hook, but the advice is to keep hooks to the absolute bare minimum
needed to adjust the charm's state based on the event, and to put all
the actual logic in the state-driven handlers.
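
To illustrate the idea of state-driven handlers, here is a toy dispatcher. It is not the reactive framework's API; `when`, `dispatch`, and the state names are all invented for this sketch.

```python
states = set()
handlers = []


def when(*required):
    """Register a handler that is eligible while all required states are set."""
    def register(func):
        handlers.append((set(required), func))
        return func
    return register


def dispatch():
    """Run each eligible handler at most once, repeating passes until
    a pass runs nothing new. Returns the handler names in run order."""
    ran = []
    progress = True
    while progress:
        progress = False
        for required, func in handlers:
            if required <= states and func.__name__ not in ran:
                func()
                ran.append(func.__name__)
                progress = True
    return ran


@when("db.installed")
def configure():
    states.add("db.configured")


@when("db.installed", "db.configured")
def start():
    states.add("db.started")


@when("replication.requested")
def set_up_replication():
    states.add("replication.ready")
```

A hook then does nothing more than record what happened, for example `states.add("db.installed")`, and call `dispatch()`; only the handlers whose states are satisfied run, which is where the speedup comes from.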


-- 
Stuart Bishop 



Re: Automatic retries of hooks

2016-01-20 Thread Dean Henrichsmeyer
On Wed, Jan 20, 2016 at 11:41 AM, William Reade  wrote:

> On Wed, Jan 20, 2016 at 3:22 PM, Rick Harding 
> wrote:
>
>> +1 retries are great, with backoff, when you know you're doing it because
>> you have experience that certain api requests to clouds, or to other known
>> failure points.
>>
>
> If you're thinking about it in terms of "known failure points" you already
> understand that you need a wide net to catch all the retryable errors that
> could come out of a given operation. What makes hook execution different
> from any other code that we want to be reliable?
>
> Blindly just saying "if at first you don't succeed, go go go" isn't a
>> better UX. It adds another layer of complexity in debugging, and doesn't
>> really improve the product. Only the charm author knows enough about what
>> it's trying to achieve to do intelligent retry.
>>
>
> Empirically, it seems that the retries caused jamespage's charm to succeed
> where it would have failed; and we have happy results from Gabriel's
> windows charms as well. That seems to me to be evidence that the product is
> improved...
>

You realize James was complaining and not celebrating the "success" ? The
fact that we can have a discussion trying to determine whether something is
a bug or a feature indicates a problem.

-D


Re: Automatic retries of hooks

2016-01-20 Thread William Reade
On Wed, Jan 20, 2016 at 8:01 PM, Dean Henrichsmeyer 
wrote:
>
> You realize James was complaining and not celebrating the "success" ? The
> fact that we can have a discussion trying to determine whether something is
> a bug or a feature indicates a problem.
>

Sorry, I didn't intend to disparage his experience; I took it as legitimate
and reasonable surprise at a change we evidently didn't communicate
adequately. But I don't think it's a misfeature; I think it's a necessary
approach, in service of global reliability in challenging environments.

But: if there are times it's inconvenient and not just surprising, we
should surely be able to disable it. Gabriel/Bogdan, would you be able to
address this?

Cheers
William


Re: Automatic retries of hooks

2016-01-20 Thread Gabriel Samfira
On Wed, 2016-01-20 at 21:31 +0100, William Reade wrote:
> On Wed, Jan 20, 2016 at 8:01 PM, Dean Henrichsmeyer <
> d...@canonical.com> wrote:
> > You realize James was complaining and not celebrating the "success"
> > ? The fact that we can have a discussion trying to determine
> > whether something is a bug or a feature indicates a problem.
> > 
> Sorry, I didn't intend to disparage his experience; I took it as
> legitimate and reasonable surprise at a change we evidently didn't
> communicate adequately. But I don't think it's a misfeature; I think
> it's a necessary approach, in service of global reliability in
> challenging environments.
> 
> But: if there are times it's inconvenient and not just surprising, we
> should surely be able to disable it. Gabriel/Bogdan, would you be
> able to address this?

Prioritizing it ASAP. Should be a simple change.

> 
> Cheers
> William


Re: Automatic retries of hooks

2016-01-20 Thread Martin Packman
Another common error we see in CI is apt mirrors being unhappy, leading
to hook failures. Just retrying later does tend to be the right option
there, though it will often be an hour or two until the archive is in a
usable state again.

On 20/01/2016, William Reade  wrote:
>
> Are there any concerns that I've missed?

Automatic retries make debugging your charm harder, as James found. I
think we want an environment setting to disable this for both testing
and for charm authors.

Martin



Re: Automatic retries of hooks

2016-01-20 Thread roger peppe
On 20 January 2016 at 12:20, Martin Packman
 wrote:
> Another common error we see in CI is apt mirrors being unhappy, leading
> to hook failures. Just retrying later does tend to be the right option
> there, though it will often be an hour or two until the archive is in a
> usable state again.
>
> On 20/01/2016, William Reade  wrote:
>>
>> Are there any concerns that I've missed?
>
> Automatic retries make debugging your charm harder, as James found. I
> think we want an environment setting to disable this for both testing
> and for charm authors.

This seems like a good idea.
Also perhaps it wouldn't be so bad if you at least were able
to find some record of the hook failures without delving into the
logs.



Re: Automatic retries of hooks

2016-01-20 Thread Rick Harding
+1 retries are great, with backoff, when you know you're doing it because
you have experience that certain api requests to clouds, or to other known
failure points.
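
As a sketch of what charm-side retry with backoff and a sensible limit could look like: `TransientError` and `retry_with_backoff` are hypothetical names, and the delays are arbitrary.

```python
import time


class TransientError(Exception):
    """An error the charm author knows is worth retrying (hypothetical)."""


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       sleep=time.sleep):
    """Call operation, retrying TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up, so the failure surfaces instead of hanging
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Only errors the author has classified as transient are retried; anything else fails immediately, and the attempt cap keeps users from waiting indefinitely on something unrecoverable.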

Blindly just saying "if at first you don't succeed, go go go" isn't a
better UX. It adds another layer of complexity in debugging, and doesn't
really improve the product. Only the charm author knows enough about what
it's trying to achieve to do intelligent retry.

In this case, if there's something about unexpected reboots of machines,
perhaps there's some specific case that Juju can grow some intelligence and
hint at the charm author what happened. The charm can then react to that
information as it deems necessary.

On Wed, Jan 20, 2016 at 8:42 AM Dean Henrichsmeyer 
wrote:

> Hi,
>
> It seems the original point James was making is getting missed. No one is
> arguing over the value of being able to retry and/or idempotent hooks.
> Yes, you should be able to retry them and yes nothing should break if you
> run them over and over.
>
> The point made is that Juju shouldn't be automatically retrying them. The
> argument of "no one knows what went wrong so Juju automatically retrying
> them is a better experience" doesn't work. The intelligence of the stack in
> question, regardless of what it is, goes in the charms. If you start
> conflating and mixing up where the intelligence goes then creating,
> running, and debugging those distributed systems will be a nightmare.
>
> The magic should only be in Juju's ability to effectively drive the models
> and intelligence encoded in the charms. It shouldn't make assumptions about
> what that intelligence is or what those models require.
>
> Thanks.
>
>
> -Dean


Re: Automatic retries of hooks

2016-01-20 Thread Dean Henrichsmeyer
Hi,

It seems the original point James was making is getting missed. No one is
arguing over the value of being able to retry and/or idempotent hooks. Yes,
you should be able to retry them and yes nothing should break if you run
them over and over.

The point made is that Juju shouldn't be automatically retrying them. The
argument of "no one knows what went wrong so Juju automatically retrying
them is a better experience" doesn't work. The intelligence of the stack in
question, regardless of what it is, goes in the charms. If you start
conflating and mixing up where the intelligence goes then creating,
running, and debugging those distributed systems will be a nightmare.

The magic should only be in Juju's ability to effectively drive the models
and intelligence encoded in the charms. It shouldn't make assumptions about
what that intelligence is or what those models require.

Thanks.

-Dean


Re: Automatic retries of hooks

2016-01-19 Thread Stuart Bishop
On 20 January 2016 at 13:17, John Meinel  wrote:

> There are classes of failures that a charm hook itself cannot handle. The
> specific one Bogdan was working with is the fact that the machine itself is
> getting restarted while the charm is in the middle of processing a hook.
> There isn't any way the hook itself can handle that, unless you could raise
> a very specific error that indicates you should be retried (so as it notices
> it's about to die, it raises the try-me-again error).
>
> Hooks are supposed to be idempotent regardless, aren't they? So while we
> paper over transient bugs in them, doesn't it make the system more resilient
> overall?

The new update-status hook could be used to recover, as it is called
automatically at regular intervals. If the reboot really was random,
you would need to clear the error status first. But if it is triggered
by the charm, it is just a case of 'reboot(now+30s);
status_set('waiting', 'Waiting for reboot'); sys.exit(0)' and waiting
for the update-status hook to kick in.
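
Spelled out a little further, the charm-triggered case might look like the sketch below. It swaps the delayed reboot for the `juju-reboot` hook tool, which (without `--now`) reboots once the current hook has finished. `hook_tool`, `request_reboot`, `update_status` and `service_is_up` are hypothetical helpers, not part of any charm library.

```python
import subprocess


def hook_tool(*args, run=subprocess.check_call):
    # Hook tools such as status-set and juju-reboot are only available
    # inside a hook context.
    run(list(args))


def request_reboot(tool=hook_tool):
    """Ask Juju for a reboot, then let the hook end successfully."""
    tool("status-set", "waiting", "Waiting for reboot")
    # Without --now, juju-reboot reboots after the current hook completes.
    tool("juju-reboot")
    return 0  # hook exits 0, so there is no error state to clear later


def update_status(service_is_up, tool=hook_tool):
    """Body of the update-status hook, which Juju runs at regular intervals."""
    if service_is_up():
        tool("status-set", "active", "ready")
    else:
        tool("status-set", "waiting", "Waiting for reboot")
    return 0
```

Because the triggering hook exits cleanly, recovery does not depend on automatic retries at all; the periodic `update-status` run picks things up after the machine is back.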

It happens naturally if you structure your charm to have a single hook
that does everything that needs to be done, rather than trying to
craft individual hooks to deal with specific events.



-- 
Stuart Bishop 
