Re: I'm concerned

2015-06-18 Thread Aaron Bentley
On 2015-06-18 12:12 AM, Tim Penhey wrote:
> The certupdater worker was making the mistake of trusting a watcher. It was blindly getting the addresses and updating the certificate.
Is this relevant? https://bugs.launchpad.net/juju-core/+bug/1466514

Re: I'm concerned

2015-06-18 Thread Tim Penhey
On 19/06/15 01:47, Aaron Bentley wrote:
> On 2015-06-18 12:12 AM, Tim Penhey wrote:
>> The certupdater worker was making the mistake of trusting a watcher. It was blindly getting the addresses and updating the certificate.
> Is this relevant? https://bugs.launchpad.net/juju-core/+bug/1466514

Re: I'm concerned

2015-06-17 Thread Tim Penhey
OK, found it. And it has nothing to do with leases. I'm just proposing the fix now, but it has taken me most of the day to diagnose and fix. The certupdater worker was making the mistake of trusting a watcher. It was blindly getting the addresses and updating the certificate. The cases where
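
The message is cut off before it describes the fix, so the following is only a sketch of one common way to avoid acting blindly on a watcher event: compare the reported addresses with the last set that was applied, and skip the update when nothing changed. The `certUpdater` type, its fields, and `addressesChanged` are hypothetical names for illustration, not juju's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// addressesChanged reports whether two address lists differ, ignoring order.
func addressesChanged(last, current []string) bool {
	if len(last) != len(current) {
		return true
	}
	// Sort copies so the callers' slices are left untouched.
	a := append([]string(nil), last...)
	b := append([]string(nil), current...)
	sort.Strings(a)
	sort.Strings(b)
	for i := range a {
		if a[i] != b[i] {
			return true
		}
	}
	return false
}

// certUpdater is a hypothetical stand-in for a worker that reacts to
// address-change notifications from a watcher.
type certUpdater struct {
	lastAddrs []string // addresses used for the most recent update
	updates   int      // number of certificate updates performed
}

// handle is called for every watcher event. It returns true only when the
// addresses differ from the last applied set and an update was performed.
func (u *certUpdater) handle(addrs []string) bool {
	if !addressesChanged(u.lastAddrs, addrs) {
		return false // the watcher fired, but nothing relevant changed
	}
	u.lastAddrs = append([]string(nil), addrs...)
	u.updates++ // a real worker would regenerate the certificate here
	return true
}

func main() {
	u := &certUpdater{}
	fmt.Println(u.handle([]string{"10.0.0.1"}))
	fmt.Println(u.handle([]string{"10.0.0.1"}))
	fmt.Println(u.handle([]string{"10.0.0.1", "10.0.0.2"}))
}
```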

Re: I'm concerned

2015-06-17 Thread William Reade
I think the problem is in the implicit apiserver-leasemgr-state dependencies; if the lease manager is stopped at the wrong moment, the apiserver will never shut down because it's waiting on a blocked leasemgr call. I'll propose something today. On Wed, Jun 17, 2015 at 7:33 AM, David Cheney
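
The kind of hang described here, a caller blocked forever on a manager that has already stopped, is often avoided in Go by selecting on a stop channel alongside the blocking operation. The sketch below illustrates that pattern only; `claimLease`, `errStopped`, and the channel layout are hypothetical and do not reflect the real lease manager API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errStopped is returned when the manager shuts down before a request
// could be delivered.
var errStopped = errors.New("lease manager stopped")

// claimLease hands a request to the manager loop, but gives up as soon as
// the stop channel is closed instead of blocking forever.
func claimLease(requests chan<- string, stop <-chan struct{}, name string) error {
	select {
	case requests <- name:
		return nil
	case <-stop:
		return errStopped
	}
}

func main() {
	// Nothing receives from requests, which simulates a manager loop that
	// has already exited.
	requests := make(chan string)
	stop := make(chan struct{})
	go func() {
		time.Sleep(10 * time.Millisecond)
		close(stop)
	}()
	fmt.Println(claimLease(requests, stop, "leadership"))
}
```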

Re: I'm concerned

2015-06-17 Thread William Reade
...but I think that axw actually addressed that already. Not sure then; I don't really have the bandwidth to investigate deeply right now. Sorry for the noise. On Wed, Jun 17, 2015 at 10:52 AM, William Reade william.re...@canonical.com wrote: I think the problem is in the implicit

Re: I'm concerned

2015-06-16 Thread David Cheney
This should be achievable. go test sends SIGQUIT on timeout, so we can set up a SIGQUIT handler in the topmost suite (or import it as a side-effect package), do whatever cleanup is needed, and then either call os.Exit, unhandle the signal and send SIGQUIT to ourselves, or just panic. On Wed, Jun 17, 2015 at

I'm concerned

2015-06-16 Thread Tim Penhey
Hey team, I am getting more and more concerned about the length of time that master has been cursed. It seems that sometime recently we have introduced serious instability in cmd/jujud/agent, and it is often getting wedged and killed by the test timeout. I have spent some time looking, but I