Re: PROPOSAL: stop recording 'executing update-status hook'
On 22 May 2017 at 14:36, Tim Penhey <tim.pen...@canonical.com> wrote:
> On 20/05/17 19:48, Merlijn Sebrechts wrote:
>>
>> On May 20, 2017 09:05, "John Meinel" <j...@arbash-meinel.com> wrote:
>>
>> I would actually prefer if it shows up in 'juju status' but that we
>> suppress it from 'juju status-log' by default.
>>
>> This is still very strange behavior. Why should this be default? Just pipe
>> the output of juju status through grep and exclude update-status if that is
>> really what you want.
>>
>> However, I would even argue that this isn't what you want in most
>> use-cases. "update-status" isn't seen as a special hook in charms.reactive.
>> Anything can happen in that hook if the conditions are right. Ignoring
>> update-status will have unforeseen consequences...
>
> Hmm... there are (at least) two problems here.
>
> Firstly, update-status *should* be a special case hook, and it shouldn't
> take long.
>
> The purpose of the update-status hook was to provide a regular beat for the
> charm to report on the workload status. Really it shouldn't be doing other
> things.
>
> The fact that it is a periodic execution rather than being executed in
> response to model changes is the reason it isn't fitting so well into the
> regular status and status history updates.
>
> The changes to the workload status would still be shown in the history of
> the workload status, and the workload status is shown in the status output.
>
> One way to limit the execution of the update-status hook call would be to
> put a hard timeout on it enforced by the agent.
>
> Thoughts?

Unfortunately update-status got wired into charms.reactive like all the
other standard hooks, and just means 'do whatever still needs to be done'.
I think it's too late to add timeouts or restrictions. But I do think
special casing it in the status history is needed. Anything important will
still end up in there due to workload status changes.
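The hard timeout Tim proposes could work roughly like this agent-side sketch; the 30 second limit, the function name, and the error handling are illustrative assumptions, not actual Juju behaviour:

```python
import subprocess

# Illustrative only: a hypothetical agent-enforced deadline for the
# update-status hook. Overruns are killed and reported as errors.
UPDATE_STATUS_TIMEOUT = 30  # seconds; assumed value, not Juju's

def run_update_status(hook_cmd, timeout=UPDATE_STATUS_TIMEOUT):
    """Run the hook command, returning (ok, detail)."""
    try:
        result = subprocess.run(hook_cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising.
        return False, 'update-status exceeded %ds hard timeout' % timeout
    if result.returncode != 0:
        return False, 'update-status exited %d' % result.returncode
    return True, 'ok'
```

Under such a scheme a charm doing real work in update-status would surface as a hook error instead of silently clogging the status history.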
--
Stuart Bishop <stuart.bis...@canonical.com>
--
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
Re: Juju Leader Election and Application-Specific Leadership
On 6 April 2017 at 00:26, Dmitrii Shcherbakov <dmitrii.shcherba...@canonical.com> wrote:
> https://jujucharms.com/docs/2.1/reference-charm-hooks#leader-elected
> "leader-elected is run at least once to signify that Juju decided this
> unit is the leader. Authors can use this hook to take action if their
> protocols for leadership, consensus, raft, or quorum require one unit
> to assert leadership. If the election process is done internally to
> the service, other code should be used to signal the leader to Juju.
> For more information read the charm leadership document."
>
> This doc says
> "If the election process is done internally to the service, other code
> should be used to signal the leader to Juju."
>
> However, I don't see any hook tools to assert leadership to Juju from
> a charm based upon application-specific leadership information:
> http://paste.ubuntu.com/24319908/
>
> So, as far as I understand, there is no manual way to designate a
> leader and the doc is wrong.
>
> Does anyone know if it is supposed to be that way and if this has not
> been implemented for a reason?

I agree with your reading, and think the documentation is wrong. If the election process is done internally to the service, there is no way (and no need) to signal the internal 'leader' to Juju. I also put 'leader' in quotes because if your service maintains its own master, you should not call it 'leader', to avoid confusion with the Juju leader.

For example, the lead unit in a PostgreSQL service appoints one of the units as master. The master remains the master until the operator runs the 'switchover' action on the lead unit, or the master unit is destroyed, causing the lead unit to start the failover process. At no point does Juju care which unit is 'master'. It's communicated to the end user using the workload status. It's simple enough to do and works well.
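The PostgreSQL pattern above could be sketched like this; leader_set and status_set are hypothetical stand-ins for the leader-set and status-set hook tools, and the role-picking logic is illustrative, not the actual charm's:

```python
# Sketch of application-managed mastership under Juju leadership.
# Assumptions: leader_set persists key/value pairs in leadership
# settings; status_set reports workload status. Neither is real code
# from the PostgreSQL charm.

def elect_master(is_leader, current_master, units, leader_set):
    """Leader-only: appoint a master if none is recorded or it is gone."""
    if not is_leader:
        return current_master  # non-leaders never make this decision
    if current_master not in units:  # unset, or master unit destroyed
        current_master = sorted(units)[0]  # failover: pick a survivor
        leader_set(master=current_master)
    return current_master

def report_role(unit, master, status_set):
    """Each unit tells the operator its role via workload status."""
    role = 'Live master' if unit == master else 'Hot standby'
    status_set('active', role)
```

Juju never needs to know which unit is master; the operator learns it from the workload status line.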
Re: Juju 2.1.0, and Conjure-up, are here!
On 23 February 2017 at 23:20, Simon Davy <simon.d...@canonical.com> wrote:
> One thing that seems to have landed in 2.1, which is worth noting IMO, is
> the local juju lxd image aliases.
>
> tl;dr: juju 2.1 now looks for the lxd image alias juju/$series/$arch in the
> local lxd server, and uses that if it finds it.
>
> This is amazing. I can now build a local nightly image[1] that pre-installs
> and pre-downloads a whole set of packages[2], and my local lxd units don't
> have to install them when they spin up. Between layer-basic and Canonical
> IS' basenode, for us that's about 111 packages that I don't need to install
> on every machine in my 10 node bundle. Took my install hook times from 5min+
> each to <1min, and probably halves my initial deploy time, on average.

Ooh, thanks for highlighting this! I've needed this feature for a long time for exactly the same reasons.

> [2] my current nightly cron:
> https://gist.github.com/bloodearnest/3474741411c4fdd6c2bb64d08dc75040

/me starts stealing
Re: lxd and constraints
On 13 January 2017 at 02:20, Nate Finch <nate.fi...@canonical.com> wrote:
> I'm implementing constraints for lxd containers and provider... and
> stumbled on an impedance mismatch that I don't know how to handle.
>
> I'm not really sure how to resolve this problem. Maybe it's not a
> problem. Maybe constraints just have a different meaning for containers?
> You have to specify the machine number you're deploying to for any
> deployment past the first anyway, so you're already manually choosing the
> machine, at which point, constraints don't really make sense anyway.

I don't think Juju can handle this. Either constraints have different meanings with different cloud providers, or lxd needs to accept minimum constraints (along with any other cloud providers with this behavior).

If you decide constraints need to consistently mean minimum, then I'd argue it is best to not pass them to current-gen lxd at all. Enforcing that containers are restricted to the minimum viable resources declared in a bundle does not seem helpful, and Juju does not have enough information to choose suitable maximums (and if it did, it would not know whether they would remain suitable tomorrow).
Re: Opaque automatic hook retries from API
On 6 January 2017 at 01:39, Casey Marshall <casey.marsh...@canonical.com> wrote:
> On Thu, Jan 5, 2017 at 3:33 AM, Adam Collard <adam.coll...@canonical.com> wrote:
>
>> Hi,
>>
>> The automatic hook retries[0] that landed as part of 2.0 (are documented
>> as) run indefinitely[1] - this causes problems as an API user:
>>
>> Imagine you are driving Juju using the API, and when you perform an
>> operation (e.g. set the configuration of a service, or reboot the unit, or
>> add a relation..) - you want to show the status of that operation.
>>
>> Prior to the automatic retries, you simply perform your operation, and
>> watch the delta streams for the corresponding change to the unit - the
>> success or otherwise of the operation is reflected in the unit
>> agent-status/workload-status pair.
>>
>> Now, with retries, if you see a unit in the error state, you can't
>> accurately reflect the status of the operation, since the unit will
>> undoubtedly retry the hook again. Maybe it succeeds, maybe it fails again.
>> How can one say after receiving the first delta of a unit error if the
>> operation succeeded or failed?
>>
>> With no visibility up front on the retry strategy that Juju will perform
>> (e.g. something representing the exponential backoff and a fixed number of
>> retries before Juju admits defeat) it is impossible to say at any point in
>> the delta stream what the result of a failed-at-least-once operation is.
>
> I think the retry strategy is great -- it leverages the immutability we
> expect hooks to provide, to deliver a robust result over unreliable
> substrates -- and all substrates are unreliable where there's
> internetworking involved!
>
> However I see your point about the retry strategy muddling status. I've
> noticed this sometimes when watching openstack or k8s bundles "shake out"
> the errors as they come up. I don't think this is always a charm quality
> issue, it's maybe because we're trying to show two different things with
> status?
The errors being 'shaken out' are almost always unhandled race conditions. I find destroy-service/remove-application particularly problematic, because the doomed units don't know they are being destroyed, but rather are informed about departing one relation at a time. This is inherently racy: the units the doomed service is related to will process their relation-departed hooks almost immediately and stop talking to the doomed service, while the doomed service still thinks it can access their resources as it falls apart one piece at a time.

I'm becoming more and more a believer that we can't reasonably avoid these errors, and instead maybe we should assume that they will happen and that this is perfectly normal. We can stick to writing nice idempotent handlers, simpler because we can ignore failures and bubble them up. Simpler protocols (e.g. removing all the handshaking the PostgreSQL interface does to try to avoid races with authorization). And, going back to Adam's point, have hooks retried a few times with some sort of backoff before even being reported as a failure to the end user.

One of the reasons test suites are currently flaky is that there are race conditions we have no reasonable way of solving, such as a database restarting itself while a hook on another unit is attempting to use it. Even though I currently bootstrap test envs with the retry behaviour off, I'm thinking of changing that.

> What if Juju made a clearer distinction between result-state ("what I'm
> doing most recently or last attempted to do") vs. goal-state ("what I'm
> trying to get done") in the status? Would that help?

Isn't the goal state just the failed hook? I would certainly like to see the list of hooks queued to run on each unit though, if that is what you mean (not in the default tabular status, but in the json status dump).
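The retry-with-backoff idea could be sketched as follows; the attempt limit and delays are illustrative assumptions, since Juju's actual retry policy lives in the agent:

```python
import time

# Sketch only: retry a failed hook a few times with exponential backoff
# before surfacing the error to the operator. The max_attempts and
# base_delay values are assumed for illustration, not Juju's real policy.
def run_with_retries(hook, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call hook() until it returns True; return the attempt number."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if hook():
            return attempt  # succeeded on this attempt
        if attempt < max_attempts:
            sleep(delay)
            delay *= 2  # exponential backoff between attempts
    # Only now would the unit be shown in an error state.
    raise RuntimeError('hook failed after %d attempts' % max_attempts)
```

Transient races, like a peer database restarting mid-hook, would then be absorbed by the early attempts and never reach the status stream.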
>> Can retries be limited to a small number, with a backoff algorithm
>> explicitly documented and stuck to by Juju, with the retry attempt number
>> included in the delta stream?

This sounds like a good idea. The limit could even be dynamic, with a retry attempted every time a unit it is related to successfully runs a hook, until the environment is quiescent.
Re: A (Very) Minimal Charm
On 16 December 2016 at 22:33, Katherine Cox-Buday <katherine.cox-bu...@canonical.com> wrote:
> Tim Penhey <tim.pen...@canonical.com> writes:
>
>> Make sure you also run on LXD with a decent delay to the APT archive.
>
> Open question: is there any reason we shouldn't expect charm authors to
> take a hard-right towards charms with snaps embedded as resources? I know
> one of our long-standing conceptual problems is consistency across units,
> which snaps solves nicely.

https://github.com/stub42/layer-snap is how I'm expecting things to go. There is already one charm in the ~charmers review queue using it, and I'm aware of several more in various stages of development.

More work is needed though. In particular, Juju storage is inaccessible to snaps, because there is no way to reach it from inside the containment.

(But none of this is a reason not to optimize Juju unit provisioning times, since we will still need an environment setup capable of running the charms so they can install the snaps for some time yet.)
Re: Leadership Election Tools
On 14 December 2016 at 00:39, Matthew Williams <matthew.willi...@canonical.com> wrote:
> Hey Folks,
>
> Let's say I'm a charm author that wants to test leadership election in my
> charm. Are there any tools available that will let me force leadership
> election in juju so that I can test how my charm handles it? I was looking
> at the docs here: https://jujucharms.com/docs/stable/developer-leadership
> but couldn't see anything

I don't think there is any supported way of doing this. If you don't mind an unsupported hack though, use 'juju ssh' to shut down the unit's jujud, wait 30 seconds for the lease to expire, and you should have a new leader. 'juju ssh' again to restart the jujud, 'juju wait' for the hooks to clear, and failover is done. ('juju run' will hang if you use it to shut down jujud, so don't do that.)

    juju ssh ubuntu/0 'sudo systemctl stop jujud-unit-ubuntu-0.service'
    sleep 30
    juju ssh ubuntu/0 'sudo systemctl start jujud-unit-ubuntu-0.service'
    juju wait

Ideally, you may be able to structure things so that it doesn't matter which unit is leader. If all state relating to leadership decisions is stored in the leadership settings, and if you avoid using @hook, then it doesn't matter which unit makes the decisions. Worst case is that *no* unit is leader when hooks are run, and decisions get deferred until leader-elected runs.

(Interesting race condition for the day: it is possible for all units in a service to run their upgrade-charm hook and for none of them to be leader at the time, so @hook('upgrade-charm') code guarded by is-leader may never run. And reactive handlers have no concept of priority and might kick in rather late for upgrade steps, requiring more creative use of reactive states to guard 'new' code from running too soon. This isn't specific to upgrade-charm hooks either, so avoid using @hook and leadership together.)
Re: A (Very) Minimal Charm
On 1 December 2016 at 19:53, Marco Ceppi <marco.ce...@canonical.com> wrote:
> On Thu, Dec 1, 2016 at 5:00 AM Adam Collard <adam.coll...@canonical.com> wrote:
>
>> On Thu, 1 Dec 2016 at 04:02 Nate Finch <nate.fi...@canonical.com> wrote:
>>
>>> On IRC, someone was lamenting the fact that the Ubuntu charm takes longer
>>> to deploy now, because it has been updated to exercise more of Juju's
>>> features. My response was - just make a minimal charm, it's easy. And
>>> then of course, I had to figure out how minimal you can get. Here it is:
>>>
>>> It's just a directory with a metadata.yaml in it with these contents:
>>>
>>> name: min
>>> summary: nope
>>> description: nope
>>> series:
>>> - xenial
>>>
>>> (obviously you can set the series to whatever you want)
>>> No other files or directories are needed.
>>
>> This is neat, but doesn't detract from the bloat in the ubuntu charm.
>
> I'm happy to work through changes to the Ubuntu charm to decrease "bloat".
>
>> IMHO the bloat in the ubuntu charm isn't from support for Juju features,
>> but the switch to reactive plus conflicts in layer-base wanting to a)
>> support lots of toolchains to allow layers above it to be slimmer and b) be
>> a suitable base for "just deploy me" ubuntu.
>
> But it is to support the reactive framework, where we utilize newer Juju
> features, like status and application-version, to make the charm rich
> despite its minimal goal set. Honestly, a handful of cached wheelhouses
> and some apt packages don't strike me as bloat, but I do want to make sure
> the Ubuntu charm works for those using it. So,
>
> What's the real problem with the Ubuntu charm today?
> How does it not achieve its goal of providing a relatively blank Ubuntu
> machine? What are people using the Ubuntu charm for?
>
> Other than demos, hacks/workarounds, and testing, I'm not clear on what
> purpose an Ubuntu charm in a model serves.

The cs:ubuntu charm gets used in production to attach subordinates to.
For example, we install cs:ubuntu onto our controller nodes so we can install subordinates like cs:ntp, cs:nrpe, cs:~telegraf-charmers/telegraf and others. It's also used in test suites for these sorts of subordinates.

The 'problem' is, like all reactive charms, the first thing it does is pull down approximately 160MB of packages and install them (installing pip pulls in build-essential, or at least a big chunk of it). It's very noticeable when working locally, and maybe in CI environments. If I knew how to solve this for all reactive charms, I would have suggested it already. It could be fixed in cs:ubuntu by making it non-reactive, if people think it is worth it (it's not like it actually needs any reactive features; a minimal metadata.yaml and an install or start hook to set the status is all it needs).

Maybe reactive is entrenched enough as the new world order that we can get specific cloud images spun for it, where a pile of packages are preinstalled so we don't need to wait for cloud-init or the charm to install them. We might be able to lower deployment times from minutes to seconds, since often this step is the main time sink.
Re: List plugins installed?
On 30 September 2016 at 04:47, Nate Finch <nate.fi...@canonical.com> wrote:
> Seems like the easiest thing to do is have a designated plugin directory
> and have juju install copy the binary/script there. Then
> we're only running plugins the user has specifically asked to install.

This does not work if the plugin has dependencies, such as the Python standard library or external tools such as git or graphviz. Nothing running inside the snap containment can access stuff outside of the containment.

I think it will be a more complex solution that needs to be designed with the snappy team. As far as I can tell it's either going to need a small daemon running outside of containment and a way of passing messages to it (such as how a snap can open a web page in a browser running outside of containment), or having plugins distributed as snaps and somehow allowing the juju snap to call executables in these plugin snaps. (Which is going to take time, so I guess we need to keep the existing mechanism going a while longer and the snap in devmode.)
Re: Juju and snappy implementation spike - feedback please
On 9 August 2016 at 19:08, Ian Booth <ian.bo...@canonical.com> wrote:
> I personally like the idea that the snap could use a juju-home interface to
> allow access to the standard ~/.local/share/juju directory; thus allowing a snap
> and regular Juju to be used interchangeably (at least initially). This will
> allow the use case "hey, try my juju snap and you can use your existing
> settings". But isn't it verboten for snaps to access dot directories in user
> home in any way, regardless of what any interface says? We could provide an
> import tool to copy from ~/.local/share/juju to ~/snap/blah...
>
> But in the other case, using a personal snap and sharing settings with the
> official Juju snap - do we know what the official snappy story is around this
> scenario? I can't imagine this is the first time it's come up?

The big difference to me is that $SNAP_USER_DATA will roll back if the snap is rolled back. I'm not sure what happens if the snap is removed and reinstalled.

Given end users should no longer need to be messing around with the dotfiles, I think the rollback behaviour is what should drive your decision. Is it nice behaviour? Or will it mess things up, because rollback will cause things to get out of sync with the deployments?
Re: Quick win - juju check
On 24 May 2016 at 11:14, Tim Penhey <tim.pen...@canonical.com> wrote:
> We talked quite a bit in Vancouver about quick wins. Things we could get
> into Juju that are simple to write that add quick value.

For trivial, quick wins consider:

'juju do --wait', from https://bugs.launchpad.net/juju-core/+bug/1445066 (hey, you filed that bug).

Adding a common option for the *-set and other hook environment tools to get their data from stdin, rather than the command line, from https://bugs.launchpad.net/juju-core/+bug/1274460

My favourite is as always 'juju wait', but that might not turn out to be trivial.
Re: Planning for Juju 2.2 (16.10 timeframe)
On 9 March 2016 at 06:51, Mark Shuttleworth <m...@ubuntu.com> wrote:
> Hi folks
>
> We're starting to think about the next development cycle, and gathering
> priorities and requests from users of Juju. I'm writing to outline some
> current topics and also to invite requests or thoughts on relative
> priorities - feel free to reply on-list or to me privately.

Another item I'd like to see is distribution upgrades. We now have a lot of systems deployed with Trusty that will need to be upgraded to Xenial not too far in the future. For many services you would just bring up a new service with a new name and cut over, but this is impractical for other services such as database shards deployed on MAAS provisioned hardware.

Handling upgrades may be as simple as allowing operators (or a charm action) to perform the necessary dist-upgrade one unit at a time, and have the controller notice and cope when the unit's jujud is bounced. Not all units would be running the same distribution release at the same time, and I'm assuming the service is running a multi-series charm here that supports both releases (so we don't need to worry about how to handle upgrade-charm hooks, at least for now).
Re: New juju in ubuntu
On 7 April 2016 at 16:46, roger peppe <roger.pe...@canonical.com> wrote:
> On 7 April 2016 at 10:17, Stuart Bishop <stuart.bis...@canonical.com> wrote:
>> On 7 April 2016 at 16:03, roger peppe <roger.pe...@canonical.com> wrote:
>>> On 7 April 2016 at 09:38, Tim Penhey <tim.pen...@canonical.com> wrote:
>>>> We could probably set an environment variable for the plugin called
>>>> JUJU_BIN that is the juju that invoked it.
>>>>
>>>> Wouldn't be too hard.
>>>
>>> How does that stop old plugins failing because the new juju is trying
>>> to use them?
>>>
>>> An alternative possibility: name all new plugins with the prefix "juju2-"
>>> rather than "juju".
>>
>> I've opened https://bugs.launchpad.net/juju-core/+bug/1567296 to track this.
>>
>> Prepending the $PATH is not hard either - just override the
>> environment in the exec() call.
>>
>> The nicest approach may be to not use 'juju1', 'juju2' and 'juju' but
>> instead just 'juju'. It would be a thin wrapper that sets the $PATH
>> and invokes the correct binary based on some configuration such as an
>> environment variable. This would fix plugins, and lots of other stuff
>> that is about to break too, such as deployment scripts, test suites
>> etc.
>
> There are actually two problems here. One is the fact that plugins
> use the Juju binary. For that, setting the PATH might well be the right thing.
>
> But there's also a problem with other plugins that use the Juju API
> directly (they might be written in Go, for example) and therefore
> implicitly assume that they're talking to a juju 1 or juju 2 environment.
> Since local configuration files have changed and the API has changed, it's
> important that a plugin written for juju 1 won't be invoked by a juju 2
> binary.

If juju 2.x changed the plugin prefix from juju- to juju2-, that would also solve the issue of juju 2.x specific plugins showing up in juju 1.x's command line help, and vice versa.
Re: New juju in ubuntu
On 7 April 2016 at 16:03, roger peppe <roger.pe...@canonical.com> wrote:
> On 7 April 2016 at 09:38, Tim Penhey <tim.pen...@canonical.com> wrote:
>> We could probably set an environment variable for the plugin called
>> JUJU_BIN that is the juju that invoked it.
>>
>> Wouldn't be too hard.
>
> How does that stop old plugins failing because the new juju is trying
> to use them?
>
> An alternative possibility: name all new plugins with the prefix "juju2-"
> rather than "juju".

I've opened https://bugs.launchpad.net/juju-core/+bug/1567296 to track this.

Prepending the $PATH is not hard either - just override the environment in the exec() call.

The nicest approach may be to not use 'juju1', 'juju2' and 'juju' but instead just 'juju'. It would be a thin wrapper that sets the $PATH and invokes the correct binary based on some configuration such as an environment variable. This would fix plugins, and lots of other stuff that is about to break too, such as deployment scripts, test suites etc.
Re: New juju in ubuntu
On 7 April 2016 at 03:55, Marco Ceppi <marco.ce...@canonical.com> wrote:
> On Wed, Apr 6, 2016 at 10:07 AM Stuart Bishop <stuart.bis...@canonical.com> wrote:
>>
>> On 5 April 2016 at 23:35, Martin Packman <martin.pack...@canonical.com> wrote:
>>
>>> The challenge here is we want Juju 2.0 and all the new functionality
>>> to be the default on release, but not break our existing users who
>>> have working Juju 1.X environments and no deployment upgrade path yet.
>>> So, versions 1 and 2 have to be co-installable, and when upgrading to
>>> xenial users should get the new version without their existing working
>>> juju being removed.
>>>
>>> There are several ways to accomplish that, but based on feedback from
>>> the release team, we switched from using update-alternatives to having
>>> 'juju' on xenial always be 2.0, and exposing the 1.X client via a
>>> 'juju-1' binary wrapper. Existing scripts can either be changed to use
>>> the new name, or add the version-specific binaries directory
>>> '/var/lib/juju-1.25/bin' to the path.
>>
>> How do our plugins know what version of juju is in play? Can they
>> assume that the 'juju' binary found on the path is the juju that
>> invoked the plugin, or is there some other way to tell, using
>> environment variables or such? Or will all the juju plugins just fail
>> if they are invoked from the non-default juju version?
>
> You can invoke `juju version` from within the plugin and parse the output.
> That's what I've been doing when I need to distinguish functionality.

That seems fine if you are invoking the plugin from the default unnumbered 'juju'. But running 'juju2 wait' will mean that juju-wait is executing juju 1.x commands and will fail. And conversely, running 'juju1 wait' will invoke juju 2.x and probably fail.

I think the plugin API needs to be extended to support allowing multiple juju versions to coexist. An environment variable would do the trick, but would require every plugin to be fixed.
Altering $PATH so 'juju' runs the correct juju would allow existing plugins to run unmodified (the CLI is similar enough that the bulk of them will work with both juju1 and juju2).
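A minimal sketch of that $PATH fix, assuming a hypothetical version-specific bin directory (the JUJU_BIN variable matches Tim's suggestion earlier in the thread):

```python
import os

# Sketch only: before exec'ing a plugin, prepend the directory holding
# the matching 'juju' binary so the plugin's plain 'juju' calls hit the
# version that invoked it. The directory layout is an assumption.
def plugin_env(juju_bin_dir, environ=None):
    """Return the environment a plugin should be exec'd with."""
    env = dict(os.environ if environ is None else environ)
    path = env.get('PATH', '')
    env['PATH'] = juju_bin_dir + os.pathsep + path if path else juju_bin_dir
    # Also expose the exact binary, per the JUJU_BIN idea above.
    env['JUJU_BIN'] = os.path.join(juju_bin_dir, 'juju')
    return env

def run_plugin(plugin, args, juju_bin_dir):
    # os.execvpe replaces this process, as the juju client would when
    # dispatching to a plugin executable found on the PATH.
    os.execvpe(plugin, [plugin] + args, plugin_env(juju_bin_dir))
```

This only addresses plugins that shell out to 'juju'; plugins talking to the API directly, as roger notes, still need the prefix (or some other) scheme.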
Re: New juju in ubuntu
On 5 April 2016 at 23:35, Martin Packman <martin.pack...@canonical.com> wrote:
> The challenge here is we want Juju 2.0 and all the new functionality
> to be the default on release, but not break our existing users who
> have working Juju 1.X environments and no deployment upgrade path yet.
> So, versions 1 and 2 have to be co-installable, and when upgrading to
> xenial users should get the new version without their existing working
> juju being removed.
>
> There are several ways to accomplish that, but based on feedback from
> the release team, we switched from using update-alternatives to having
> 'juju' on xenial always be 2.0, and exposing the 1.X client via a
> 'juju-1' binary wrapper. Existing scripts can either be changed to use
> the new name, or add the version-specific binaries directory
> '/var/lib/juju-1.25/bin' to the path.

How do our plugins know what version of juju is in play? Can they assume that the 'juju' binary found on the path is the juju that invoked the plugin, or is there some other way to tell, using environment variables or such? Or will all the juju plugins just fail if they are invoked from the non-default juju version?
Re: Planning for Juju 2.2 (16.10 timeframe)
On 1 April 2016 at 20:50, Mark Shuttleworth <m...@ubuntu.com> wrote:
> On 19/03/16 01:02, Stuart Bishop wrote:
>> On 9 March 2016 at 10:51, Mark Shuttleworth <m...@ubuntu.com> wrote:
>>
>>> Operational concerns
>>
>> I still want 'juju-wait' as a supported, builtin command rather than
>> as a fragile plugin I maintain and as code embedded in Amulet that the
>> ecosystem team maintain. A thoughtless change to Juju's status
>> reporting would break all our CI systems.
>
> Hmm.. I would have thought that would be a lot more reasonable now we
> have status well in hand. However, the charms need to support status for
> it to be meaningful to the average operator, and we haven't yet made
> good status support a requirement for charm promulgation in the store.
>
> I'll put this on the list to discuss.

It is easier with Juju 1.24+. You check the status. If all units are idle, you wait about 15 seconds and check again. If all units are still idle and the timestamps haven't changed, the environment is probably idle. And for some (all?) versions of Juju, you also need to ssh into the units and ensure that one of the units in each service thinks it is the leader, as it can take some time for a new leader to be elected.

Which means 'juju wait' as a plugin takes quite a while to run and only gives a probable result, whereas if this information about the environment were exposed it could be instantaneous and correct.
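The polling heuristic just described could be sketched as follows (ignoring the leader check); get_status is a stand-in for parsing 'juju status', and the two-snapshot comparison is the 'probably idle' test:

```python
import time

# Sketch of the juju-wait plugin's heuristic described above. The model
# is considered quiescent once every unit agent reports 'idle' on two
# consecutive polls with unchanged status timestamps. get_status is an
# assumed callable returning {unit: (agent_state, since_timestamp)};
# the 15 second interval follows the description in the text.
def wait_until_idle(get_status, interval=15, sleep=time.sleep):
    previous = None
    while True:
        status = get_status()
        all_idle = all(state == 'idle' for state, _ in status.values())
        if all_idle and status == previous:
            return  # probably idle: two matching all-idle snapshots
        # Only keep an all-idle snapshot for comparison next round.
        previous = status if all_idle else None
        sleep(interval)
```

A builtin command could answer the same question from the controller's own state immediately, which is the point being made above.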
Re: Planning for Juju 2.2 (16.10 timeframe)
…disk space, but means you could migrate a 10 unit Cassandra cluster to a new 5 unit Cassandra cluster. (The charm doesn't actually do this yet; this is just speculation on how it could be done.) I imagine other services such as OpenStack Swift would be in the same boat.
Re: Units & resources: are units homogeneous?
On 17 February 2016 at 01:20, Katherine Cox-Buday <katherine.cox-bu...@canonical.com> wrote: > My understanding is that it's a goal to make the management of units more > consistent, and making the units more homogeneous would support this, but > I'm wondering from a workload perspective if this is also true? One example > I could think of to support the discussion is a unit being elected leader > and thus taking a different path through its workflow than the other units. > When it comes to resources, maybe this means it pulls a different sub-set of > the declared resources, or maybe doesn't pull resources at all (e.g. it's > coordinating the rest of the units or something).

While I have charms where units have distinct roles (one master, multiple standbys, and the juju leader making decisions), they can be treated as homogeneous since they need to be able to fail over from one role to another. The only use case I can think of where different resources might be pulled down on different units is deploying a new service with data restored from a backup. The master would be the only unit to pull down this resource (the backup) on deployment, and the standbys would replicate it from the master. And now I think of it, can I stream resources? I don't want to provision a machine with 8TB of storage just so I can restore a 4TB dump. Maybe this is just a terrible example, since I probably couldn't be bothered uploading the 4TB dump in the first place, and would instead set up tunnels and pipes to stream it into a 'juju run' command. An abuse of Juju resources better suited to Juju blob storage? -- Stuart Bishop <stuart.bis...@canonical.com>
Re: Automatic retries of hooks
On 20 January 2016 at 17:46, William Reade <william.re...@canonical.com> wrote: > On Wed, Jan 20, 2016 at 8:46 AM, Stuart Bishop <stuart.bis...@canonical.com> > wrote: >> It happens naturally if you structure your charm to have a single hook >> that does everything that needs to be done, rather than trying to >> craft individual hooks to deal with specific events. > > Independent of everything else, *this* should be *excellent* advice for > speeding up your deployments. Have you already been writing charms like > this? I'd love to hear your experiences; and, in particular, if you've > noticed any improvement in deployment speed. The theoretically achievable > speedup is vast, but the hook runner wasn't written with this approach in > mind; we might need to make a couple of small tweaks [0] to get the best out > of the approach.

The PostgreSQL charm has now existed in three forms: traditional, services framework, and now reactive framework. Using the services framework, deployment speed was slower than traditional. You ended up with one very long string of steps, many of which were unnecessary. I felt it was easier to maintain and understand, but the logs were noisier and it was slower. The reactive framework is much faster deployment-wise than all other versions, as you can easily have only the necessary steps triggered for the current state. The execution thread is harder to follow, since there isn't really one, but it still seems very maintainable and understandable. There is less code than the other versions. It does drive you to create separate handlers for each hook, but the advice is to keep hooks to the absolute bare minimum needed to adjust the charm's state based on the event, and to put all the actual logic in the state-driven handlers. -- Stuart Bishop <stuart.bis...@canonical.com>
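The state-driven dispatch idea described above can be illustrated with a toy dispatcher. To be clear, this is not the real charms.reactive implementation or API — just a self-contained sketch of the principle that handlers declare the states they need and only the applicable ones run:

```python
# Toy illustration of state-driven dispatch in the spirit of
# charms.reactive: handlers register the states they require, and the
# dispatcher runs only handlers whose preconditions currently hold.
# All names here are part of the toy, not the real library.
HANDLERS = []
STATES = set()

def when(*states):
    def register(fn):
        HANDLERS.append((set(states), fn))
        return fn
    return register

def set_state(state):
    STATES.add(state)

def dispatch():
    ran = []
    for needed, fn in HANDLERS:
        if needed <= STATES:  # all required states are set
            fn()
            ran.append(fn.__name__)
    return ran

@when('db.connected')
def configure_db():
    set_state('db.configured')

@when('db.configured', 'config.changed')
def restart():
    pass  # placeholder for restarting the workload
```

Because only the handlers whose states are satisfied fire, unnecessary steps are skipped entirely — which is where the deployment speedup discussed in the thread comes from.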
Re: Automatic retries of hooks
On 20 January 2016 at 13:17, John Meinel <j...@arbash-meinel.com> wrote: > There are classes of failures that a charm hook itself cannot handle. The > specific one Bogdan was working with is the fact that the machine itself is > getting restarted while the charm is in the middle of processing a hook. > There isn't any way the hook itself can handle that, unless you could raise > a very specific error that indicates you should be retried (so as it notices > its about to die, it raises the try-me-again error). > > Hooks are supposed to be idempotent regardless, aren't they? So while we > paper over transient bugs in them, doesn't it make the system more resilient > overall? The new update-status hook could be used to recover, as it is called automatically at regular intervals. If the reboot really was random, you would need to clear the error status first. But if it is triggered by the charm, it is just a case of 'reboot(now+30s); status_set('waiting', 'Waiting for reboot'); sys.exit(0)' and waiting for the update-status hook to kick in. It happens naturally if you structure your charm to have a single hook that does everything that needs to be done, rather than trying to craft individual hooks to deal with specific events. -- Stuart Bishop <stuart.bis...@canonical.com>
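The reboot-and-recover pseudocode above can be fleshed out as a small sketch. The `status_set` and `shutdown` callables are stand-ins (e.g. for charmhelpers.core.hookenv.status_set and a scheduled `shutdown -r`); this is an illustration of the pattern, not a drop-in charm hook:

```python
import sys

# Hedged sketch of the charm-triggered reboot pattern from the thread:
# schedule the reboot slightly in the future, record a 'waiting' workload
# status so operators know what is happening, then exit 0 so the hook is
# not marked as failed.  The periodic update-status hook then resumes
# work after the machine comes back.

def reboot_and_wait(status_set, shutdown, exit=sys.exit):
    shutdown('+30sec')  # reboot fires only after this hook has exited
    status_set('waiting', 'Waiting for reboot')
    exit(0)  # clean exit: no error state, so update-status runs later
```

Exiting 0 is the key detail — exiting non-zero would put the unit into an error state that blocks the update-status hook from running.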
Re: Making logging to MongoDB the default
On 22 October 2015 at 22:17, Nate Finch <nate.fi...@canonical.com> wrote: > IMO, all-machines.log is a bad idea anyway (it duplicates what's in the log > files already, and makes it very likely that the state machines will run out > of disk space, since they're potentially aggregating hundreds or thousands > of machines' logs, not to mention adding a lot of network overhead). I'd be > happy to see it go away. However, I am not convinced that dropping text > file logs in general is a good idea, so I'd love to hear what we're gaining > by putting logs in Mongo.

I'm looking forward to having access to them in a structured format so I can generate logs, reports and displays the way I like rather than dealing with the hard to parse strings in the text logs. 'juju debug-logs [--from ts] [--until ts] [-F] [--format=json]' would keep me quite happy and I can filter, format, interleave and colorize the output to my heart's content. I can even generate all-machines.log if I feel like a headache ;) -- Stuart Bishop <stuart.bis...@canonical.com>
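The kind of post-processing structured logs would enable can be sketched as a small JSON-lines filter. Everything here is an assumption: the `--format=json` flag is only proposed in the email above, and the record keys (`timestamp`, `entity`, `level`, `message`) are hypothetical field names, not juju's actual schema:

```python
import json

# Hedged sketch: filter hypothetical JSON-lines log records by severity
# and render them back into an all-machines.log style text line.  The
# field names are illustrative assumptions, not real juju output.

def filter_logs(lines, min_level='WARNING',
                levels=('DEBUG', 'INFO', 'WARNING', 'ERROR')):
    threshold = levels.index(min_level)
    out = []
    for line in lines:
        rec = json.loads(line)
        if levels.index(rec['level']) >= threshold:
            out.append('{timestamp} {entity} {level} {message}'.format(**rec))
    return out
```

With structured records, filtering by unit, interleaving streams, or colorizing becomes a few lines of code instead of regex archaeology on text logs.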
Re: Use case for: min-version
On 12 August 2015 at 05:02, Jeff Pihach jeff.pih...@canonical.com wrote: Version checking for features can be dangerous because a command's output or availability may change in the future and now your charm also needs a max-version, or version-range etc. A more robust solution could be something along the lines of a feature-supported query which would return whether that command is indeed supported in the active environment with the necessary syntax.

max-version should be very rare. To need it, you need to have both a backwards incompatible change in Juju and a charm supporting such a wide range of Juju versions that you need the multiple codepaths. Still, I imagine it will happen, and hookenv.juju_has_version is easily updated to support a range. If you want your feature-supported API, now that the version number is exposed in charm-helpers you can easily add such a matrix of feature flags. It just needs someone interested enough to maintain the list as it grows and grows over time. I personally think it is impractical, for exactly the problems you describe. A flag like 'leadership' isn't very useful. I'm interested in leadership as it behaves in 1.23, or leadership as implemented in 1.25 with the leader-deposed hook, or leadership as implemented in 1.24.4 with the HA stability fixes, or storage as of 1.25 when I can upgrade a service previously using the block storage broker, or status as of 1.28 when we added more failure states, or relation-set as of 1.23.3 when the --file argument was fixed to accept input from stdin. Or most practically, I'm interested in Juju 1.24 stable because I know I'm using features that did not exist in 1.23 stable and that is the version I'm running tests with.

And now I think of it, it also makes testing easier (and thus hopefully improves quality). If you are testing code guarded by both the leadership and status feature flags, you have 4 code paths to test. If you are testing code guarded by has_version_1.24, you only have 2 code paths to test. And you would save time and effort, since we all know that all versions of Juju implementing unit status also implement basic leadership. Juju is developed and releases features on a single trunk, whereas for cross-browser compatibility you are supporting a matrix of features enabled or not on dozens of different branches (one for each browser).

    pw = host.genpw()
    if feature('leadership'):
        leader_set(dict(password=pw))
    else:
        relation_set(password=pw, relid=hookenv.get_peer_relid())
    status_set('blocked',
               'Connect to {} using password {} to complete setup'.format(url, pw))
    if feature('status'):
        raise SystemExit(0)
    else:
        raise SystemExit(1)

I think it is best to add the feature flags you want in your own charm, using the version number exposed by charm-helpers, rather than coarse feature flags exposed by charm-helpers or juju that don't necessarily align with your charm's actual requirements.

As for graceful fallback, it's great when you can do it. Both Marco and I use the same example - status_set. Under 1.23 or earlier, it uses juju-log. Under 1.24 or higher, it uses status-set. However, if you look at my original sample code you see that it isn't enough, because you still need to decide what to do next based on the behaviour that was hidden from you. The graceful fallback practically requires you to sniff the version if you want to block your units properly.

    def block_and_exit(msg):
        hookenv.status_set('blocked', msg)
        if hookenv.has_juju_version('1.24'):
            raise SystemExit(0)  # blocked state for modern juju
        raise SystemExit(1)  # error state for older juju

-- Stuart Bishop stuart.bis...@canonical.com
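The charm-local feature flags argued for above could be sketched like this. This is a hedged illustration: `CHARM_FEATURES`, `has_feature` and `version_tuple` are hypothetical names, and the version thresholds are examples rather than verified release history:

```python
# Hedged sketch: a charm defines the feature flags *it* cares about,
# pinned to the Juju versions it was actually tested against, built on a
# plain version comparison rather than coarse externally-published flags.
# Names and thresholds here are illustrative assumptions.

def version_tuple(v):
    """'1.26.1-alpha6' -> (1, 26, 1); any alpha/beta tag is ignored."""
    return tuple(int(part) for part in v.split('-')[0].split('.'))

CHARM_FEATURES = {
    'leadership': '1.23.0',  # leadership hook tools (example threshold)
    'status': '1.24.0',      # workload status / blocked state (example)
}

def has_feature(name, juju_version):
    return version_tuple(juju_version) >= version_tuple(CHARM_FEATURES[name])
```

The point is that the flag names and cut-off versions live in the charm, aligned with what the charm actually tested, rather than in a shared matrix someone else has to maintain.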
Re: Use case for: min-version
On 12 August 2015 at 03:56, Tim Penhey tim.pen...@canonical.com wrote: It would be trivial for the Juju version to be exported in the hook context as environment variables. Perhaps something like this:

    JUJU_VERSION=1.24.4
    JUJU_VERSION_MAJOR=1
    JUJU_VERSION_MINOR=24
    JUJU_VERSION_PATCH=4
    # tag for 'alpha' 'beta'
    JUJU_VERSION_TAG=

Thoughts?

Whatever :-) An environment variable seems the obvious way to communicate the information. Give me a version string like 1.24.4 and I'm happy. The trick is documenting and sticking to the format, so I know if one day you might throw 1.26.1-alpha6 at me. So do it if it really is trivial, or if not, try not to break the workaround in charm-helpers (parsing the output of /var/lib/juju/tools/machine-*/jujud version) -- Stuart Bishop stuart.bis...@canonical.com
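A consumer of the proposed variable might look like this. Note the caveat: `JUJU_VERSION` is only a proposal in the email above, so this is a sketch under that assumption, and the `-alpha6` handling covers the tag format Stuart raises:

```python
import os

# Hedged sketch of parsing the *proposed* JUJU_VERSION environment
# variable.  Returns (major, minor, patch, tag); tag is '' for stable
# releases and e.g. 'alpha6' for '1.26.1-alpha6'.

def juju_version(environ=os.environ):
    raw = environ.get('JUJU_VERSION')
    if raw is None:
        # Fall back to the charm-helpers workaround mentioned above:
        # parsing `/var/lib/juju/tools/machine-*/jujud version` output.
        raise RuntimeError('JUJU_VERSION not set in the hook environment')
    number, _, tag = raw.partition('-')
    major, minor, patch = (int(part) for part in number.split('.'))
    return major, minor, patch, tag
```

This is exactly why documenting the format matters: the parser has to know in advance whether a `-alpha6` style suffix can appear.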
Re: Use case for: min-version
On 11 August 2015 at 03:32, Matt Bruzek matthew.bru...@canonical.com wrote: We wrote a charm that needed election logic, so we used the new Juju feature is_leader. A user was interested in using a bundle that contained this charm and it failed on them. It was hard to track down the cause of the problem. It appears they were using an earlier version of Juju that is available from universe and only the PPA had the more current version. Read more about the problem here: https://bugs.launchpad.net/charms/+source/etcd/+bug/1483380 I heard the min-version feature discussed at previous Cloud sprints but to my knowledge we do not have it implemented yet. The idea was a charm could specify in metadata.yaml what min-version of Juju they support. There are a lot of new features that juju-core are cranking out (and that is *awesome*)! We have already run into this problem with a real user, and will have the problem in the future. Can we reopen the discussion of min-version? Or some other method of preventing this kind of problem in the future?

charmhelpers already supports this with charmhelpers.core.hookenv.has_juju_version, thanks to Curtis who described a reliable way of accessing it. Adding it to the official hook environment is https://bugs.launchpad.net/juju-core/+bug/1455368 It is particularly useful for:

    hookenv.status_set('blocked', "I'm in a right pickle. Help!")
    if hookenv.has_juju_version('1.24'):
        raise SystemExit(0)  # Blocked state, 1.24+
    raise SystemExit(1)  # Error state, 1.23

I've also got version checks in charmhelpers.coordinator, which requires leadership. -- Stuart Bishop stuart.bis...@canonical.com
Re: Send Juju logs to different database?
On 6 May 2015 at 04:57, Menno Smits menno.sm...@canonical.com wrote: It is more likely that Juju will grow the ability to send logs to external log services using the syslog protocol (and perhaps others). You could use this to log to your own log aggregator or database. This feature has been discussed but hasn't been planned in any detail yet (pull requests would be most welcome!). syslog seems a bad fit, as the logs are now structured data and I'd like to keep it that way. I guess people want it as an option, but I'd consider it the legacy option here. My own use case would be to make a more readable debug-logs, rather than attempting to parse the debug-logs output ;) Hmm... I may be able to do this already via the Juju API. -- Stuart Bishop stuart.bis...@canonical.com
Re: Juju devel 1.23-beta2 is released
On 26 March 2015 at 23:58, Curtis Hovey-Canonical cur...@canonical.com wrote:

### Service Leader Elections

Services running in an environment bootstrapped with the leader-election feature flag now have access to three new hook tools:

    is-leader  - returns true only if the executing unit is guaranteed
                 service leadership for the next 30s
    leader-get - as relation-get; accessible only within the service
    leader-set - as relation-set; will fail if not executed on leader

...and two new hooks:

    leader-elected          - runs when the unit takes service leadership
    leader-settings-changed - runs when another unit runs leader-set

When a unit starts up, it will always run either leader-elected or leader-settings-changed as soon as possible, delaying doing so only to run the install hook; complete any queued or in-flight operation; or resolve a hook or upgrade error.

Looking forward to this. Looking at the specifics, I'm interested in how the leader unit performs long running operations with the 30s lease. Let's say unit 0 is the leader, and decides that it is an appropriate time for unit 0 to run a repair operation that might take a few hours. As the wording currently stands, no hooks are triggered on the leader when the leader calls leader-set. This gives the leader no alternative but to perform the long running operation in the same hook, and risk another leader being elected and making a conflicting decision. I think that the leader-settings-changed hook needs to be called whenever *any* unit runs leader-set (including the current unit if it is the leader), rather than only when a different unit runs leader-set. This way, the leader can make its decisions and exit whatever hook triggered it within its 30s lease, and all units can perform their long running tasks in the leader-settings-changed hook. Alternatively, it could kick off an asynchronous task, but those don't exist yet. Or would I need to run 'is-leader' in a thread every 30s to keep the lease renewed? -- Stuart Bishop stuart.bis...@canonical.com
Re: Juju devel 1.23-beta2 is released
On 30 March 2015 at 18:14, John Meinel j...@arbash-meinel.com wrote: I believe the Juju agent itself is running a renew the lease every 30s. It probably wouldn't hurt for the charm to check that it is still the master periodically if it is going to be running for an hour, since it might lose connection without otherwise realizing. Oh, that will work too I think. I'd be testing this myself, but I'm having lxc issues and don't want to compound it with juju beta issues ;) -- Stuart Bishop stuart.bis...@canonical.com
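John's suggestion — re-check leadership periodically during a long-running operation — can be sketched as follows. The shape of the code is an assumption: `is_leader` stands in for invoking the is-leader hook tool, the operation is assumed to be divisible into steps, and the 25-second interval (inside the 30s lease window) is an illustrative choice:

```python
import time

# Hedged sketch: run a long operation broken into steps, re-checking
# leadership between steps so we stop promptly if the 30s lease was lost
# (e.g. after a connection drop).  `is_leader` is a stand-in for calling
# the is-leader hook tool.

def run_while_leader(steps, is_leader, check_interval=25):
    if not is_leader():
        return False  # never held, or already lost, leadership
    last_check = time.monotonic()
    for step in steps:
        if time.monotonic() - last_check >= check_interval:
            if not is_leader():
                return False  # lease lost mid-operation; abort cleanly
            last_check = time.monotonic()
        step()
    return True
```

Aborting cleanly on a lost lease is the important property: it prevents two units from both believing they are the leader and making conflicting changes.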
Re: Feature Request: -about-to-depart hook
On 3 February 2015 at 21:23, Stuart Bishop stuart.bis...@canonical.com wrote: On 28 January 2015 at 21:03, Mario Splivalo I'm not sure if this is possible... Once the unit left relation juju is no longer aware of it so there is no way of knowing if -broken completed with success or not. Or am I wrong here? Hooks have no way of telling, but juju could in the same way that you can tell by running 'juju status'. If the unit is still running, it might still run the -broken hook. Once the unit is destroyed, we know it will never run the -broken hook. While typing up https://bugs.launchpad.net/juju-core/+bug/1417874 I realized that your proposed solution of a pre-departure hook is the only one that can work. Once -departed hooks start firing both the doomed unit and the leader have already lost the access needed to decommission the departing node. I'm going to need to tear out the decommissioning code from my charm (that started failing my tests once I tightened security), and document the manual decommissioning process. Unless someone can come up with a better way forward with current juju. -- Stuart Bishop stuart.bis...@canonical.com
Re: Feature Request: -about-to-depart hook
On 28 January 2015 at 21:03, Mario Splivalo mario.spliv...@canonical.com wrote: On 01/27/2015 09:52 AM, Stuart Bishop wrote: Ignoring the, most likely, wrong nomenclature of the proposed hook, what are your opinions on the matter? I've been working on similar issues. When the peer relation-departed hook is fired, the unit running it knows that $REMOTE_UNIT is leaving the cluster. $REMOTE_UNIT may not be alive - we may be removing a failed unit from the service. $REMOTE_UNIT may be alive but uncontactable - some form of network partition has occurred. $REMOTE_UNIT doesn't have to be the one leaving the cluster. If I have a 3-unit cluster (mongodb/0, mongodb/1, mongodb/2), and I 'juju remove-unit mongodb/1', the relation-departed hook will fire on all three units. Moreover, it will fire twice on mongodb/1. So, from mongodb/2's perspective, $REMOTE_UNIT is indeed pointing to mongodb/1, which is, in this case, leaving the relation. But if we observe the same scenario on mongodb/1, $REMOTE_UNIT there will point to mongodb/0. But that unit is NOT leaving the cluster. There is no way to know if the hook is running on the unit that's leaving or on a unit that's staying. I see, and have also struck the same problem with the Cassandra charm. It is impossible to have juju decommission a node. My relation-departed hook must reset the firewall rules, since the replication connection is unauthenticated and we cannot leave it open. This means I cannot decommission the departing unit in the relation-broken hook, as the remaining nodes refuse to talk to it and it has no way of redistributing its data. And I can't decommission the departing node in the relation-departed hook, because as you correctly say, it is impossible to know which unit is actually leaving the cluster and which are remaining.
But, if that takes place in relation-departed, there is no way of knowing if you need to do a stepdown, because you don't know if you're the unit being removed or the remote unit is. Therefore the logic for removing nodes had to go to relation-broken. But, as you explained, if the unit goes down catastrophically the relation-broken will never be executed and I have a cluster that needs manual intervention to clean up. Leadership might provide a workaround, as the service is guaranteed to have exactly one leader. If a unit is running the relation-departed hook and it is the leader, it knows it is not the one leaving the cluster (or it would no longer be leader) and it can perform the decommissioning. But that is a messy workaround. Given we have both struck nearly exactly the same problem, I'd surmise the same issue will occur in pretty much all similar systems (Swift, Redis, mysql, ...) and we need a better solution. I've also heard rumours of a goal state, which may provide units enough context to know what is happening. I don't know the details of this though. I'm not sure if this is possible... Once the unit left relation juju is no longer aware of it so there is no way of knowing if -broken completed with success or not. Or am I wrong here? Hooks have no way of telling, but juju could, in the same way that you can tell by running 'juju status'. If the unit is still running, it might still run the -broken hook. Once the unit is destroyed, we know it will never run the -broken hook. -- Stuart Bishop stuart.bis...@canonical.com
Re: Feature Request: -about-to-depart hook
On 26 January 2015 at 20:54, Mario Splivalo mario.spliv...@canonical.com wrote: Hello! Currently juju provides relation-departed hook, which will fire on all units that are part of relation, and relation-broken hook, which will fire on unit that just departed the relation. The problem arises when we have a multi-unit service peered. Consider MongoDB charm where we usually have replicaset formed with three or more units: When a unit is destroyed (with 'juju remove-unit') first the relation-departed hook will fire between the departing unit and all the 'staying' units. Then, on the departed unit, the relation-broken hook is fired. But, if we need to do some work on the departing unit before it leaves the relation, there is no way to do so. When 'relation-departed' hook is called there is no way of telling (if we make observation from within the hook) if we are running on unit that is departing, or on unit that is 'staying' within the relation. A '-before-departed' hook would, I think, solve this. First a '-before-departed' hook will be fired on the departing unit. Then '-departed' hook will fire against departing and staying units. And, lastly, as it is now, the -broken hook will fire. Ignoring the, most likely, wrong nomenclature of the proposed hook, what are your opinions on the matter? I've been working on similar issues. When the peer relation-departed hook is fired, the unit running it knows that $REMOTE_UNIT is leaving the cluster. $REMOTE_UNIT may not be alive - we may be removing a failed unit from the service. $REMOTE_UNIT may be alive but uncontactable - some form of network partition has occurred. When the peer relation-broken hook is fired, the unit running it knows that it is leaving the cluster and decommissions itself. However, this hook may never be run if the unit has failed. Or it may be impossible to complete successfully (eg. corrupted filesystem). I agree that this is not rich enough to remove units robustly.
The peer relation-departed hooks are not particularly useful to me, as they cannot know in advance if the relation-broken hook will complete successfully. It is the peer relation-broken hook that is responsible for properly decoupling the unit from the service, and this works fine if the unit is healthy. The problem is of course if the departing unit *has* failed, because no subsequent hooks are called to repair the damaged cluster. As a concrete example, to remove a cassandra node from a cluster: - First, run 'nodetool decommission' on the departing node. This streams its partitions to the remaining nodes. - Second, if 'nodetool decommission' failed or could not be run, run 'nodetool removenode' on one of the other nodes. This removes the failed node from the ring, and the remaining nodes will rebalance and rebuild using redundant copies of the data. Data may be lost if stored with a replication factor of 1 or if updates only waited for an acknowledgement from 1 node. An extra hook as you suggest would help me to solve this issue. But what would also solve my issue is juju leadership (currently in development). When the lead unit runs its peer relation-departed hook, it connects to the departing unit and runs the decommissioning process on its behalf. If it is unable to connect, it assumes the node is failed and cleans up. It can even notify the remaining non-leader units that the departing unit has been removed from the cluster, giving them a chance to update their configuration if necessary. You can't really do this without the leadership feature, as you can't coordinate which of the remaining units is responsible for decommissioning the departing unit (and they would trip over each other if they all attempted to decommission the departing node). The edge case in my approach is of course if the departing unit is live, but for some reason the leader cannot connect to it. Maybe your inter DC links have gone down. However, there are similar issues with the extra hook.
If your -before-departed hook fails to run, how long should juju wait until it gives up and triggers the -departed hooks? Perhaps what is needed here is instead an extra hook run on the remaining units if the -broken hook could not be run successfully? Let's call it relation-failed. It could be fired when we know the vm is gone and the -broken hook was not successfully run. -- Stuart Bishop stuart.bis...@canonical.com
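The leader-driven decommissioning workaround discussed in this thread can be sketched as a small decision function. This is an illustration only: `decommission` and `removenode` are stand-ins for ssh'ing to a node and running the corresponding nodetool command, and the function names are hypothetical:

```python
# Hedged sketch of the leader-driven removal flow from the thread.  Only
# the leader acts from its peer relation-departed hook (the leader cannot
# be the departing unit, or it would no longer be leader).  It first tries
# a clean `nodetool decommission` on the departing node's behalf; if that
# fails or the node is unreachable, it falls back to `nodetool removenode`
# from a surviving node.  The two callables are stand-ins for the real
# ssh/nodetool invocations.

def remove_node(departing_unit, am_leader, decommission, removenode):
    if not am_leader:
        return 'skipped'  # non-leaders cannot safely coordinate removal
    try:
        decommission(departing_unit)
        return 'decommissioned'
    except Exception:
        removenode(departing_unit)
        return 'removed'
```

Funnelling the decision through the single leader is what prevents the remaining units from tripping over each other by all attempting to decommission the departing node at once.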
Re: juju min version feature
On 16 December 2014 at 21:36, William Reade william.re...@canonical.com wrote: On Tue, Dec 16, 2014 at 6:36 AM, Stuart Bishop stuart.bis...@canonical.com wrote: I think we need the required juju version even if we also allow people to specify features. swift-storage could specify that it needs 'the version of juju that configures lxc to allow loopback mounts', which is a bug fix rather than a feature. Providing a feature flag for every bug fix that a charm may depend on is impractical. 1) If you're developing for 1.20, then I think the compatible-1.20 flag mentioned above should work as you desire, until juju changes to the point where some feature is actively incompatible. (As stated above, I'm expecting there will be some degree of tuning the charm environment to the declared flags regardless.) 2) Expand on the impracticality a bit please? I imagine that when we're talking about bugfixes of the sort you describe, the proportion of charms that care about a given one will be small; tracking them all may be somewhat *tedious* for the developers, but I don't see it being especially difficult or risky -- and AFAICS it need not impact any charm developers other than those who need that specific flag. ...not that I'm really keen to define a flag for every bugfix :-/. Do you have a rough idea of how often you've wanted min-version so far? Practically, as a charm developer I'll be developing and testing using juju-stable (1.20.14) and would tag my charms as minversion 1.20.14 (or tagged compatible-1.20 if you prefer). I will not give a moments thought to old versions of juju unless I have a special need for back porting my work. For example, I've recently implemented rolling restarts in a charm (and will push this to charmhelpers). For now, it uses the peer relation to coordinate things. IIRC, in older versions of juju you could not use the peer relationship unless there was at least one other peer and the relationship had been joined. 
That has since been fixed, and I can rely on using the peer relationship even if I only have a single unit reducing my code paths. I'm relying on this behaviour, yet have no idea if this changed in 1.16 or 1.18 or 1.20. Most developers will never even know the behaviour was different in the past, since they are developing in the present. Developers can't track what they are not aware of. -- Stuart Bishop stuart.bis...@canonical.com
Re: juju min version feature
On 16 December 2014 at 00:13, John Meinel j...@arbash-meinel.com wrote: Can't we just as easily provide tools to find out what version of Juju provides a particular feature? Certainly a CLI of: $ juju supported-features leader-election container-addressibility Or even possibly something that talks to something like the charm-store: $ juju known-features leader-election: juju = 2.2 container-addressibility: juju = 2.0 I'm personally on the side of having charm *authors* talk about the features they want. Because then in juju-world we can enable/disable specific features based on them being requested, which makes charm authors get the features they need right. (e.g., if the charm doesn't say it needs leader-election, then it doesn't get leader tools exposed.) min-version, otoh, leads to people just setting it to the thing they are using, and doesn't give Juju a way to smartly enable/disable functionality. It also suffers from when we want to drop a feature that didn't turn out quite like what we thought it would. On the flip side, I could state that I am developing my charm for juju 1.20 and not care what features I'm using. If someone deploys my charm with juju 2.1, then juju could do so by deploying the charm in a 1.20 compatible environment. Juju devs can forge ahead and make backwards incompatible changes to hook tools and the meanings of environment variables by providing a compatibility layer. I do think it is useful to encode the required juju version in the charm. We also need versioning on interfaces (charms need to make backwards incompatible changes to interfaces), and better support for multiple charm series (1.0, 1.1, 2.0 vs 'trusty', 'precise'), but that is all future spec work. I think we need the required juju version even if we also allow people to specify features. swift-storage could specify that it needs 'the version of juju that configures lxc to allow loopback mounts', which is a bug fix rather than a feature. 
Providing a feature flag for every bug fix that a charm may depend on is impractical. -- Stuart Bishop stuart.bis...@canonical.com
Re: Feature Request: show running relations in 'juju status'
On 19 November 2014 at 19:59, William Reade william.re...@canonical.com wrote: On Tue, Nov 18, 2014 at 9:37 AM, Stuart Bishop stuart.bis...@canonical.com wrote: Ok. If there is a goal state, and I am able to wait until the goal state is the actual state, then my needs (and amulet and juju-deployer needs) will be met. It does seem a rather lengthy and long winded way of getting there though. The question I have always needed juju to answer is 'are there any hooks running or are there any hooks queued to run?'. I've always assumed that juju must already know this (or it would be unable to function), but refuses to communicate this single bit of information in any way. Juju as a system actually doesn't know this. Unit idleness is known only by the unit agents themselves, and only implicitly at that -- if we're blocking in a particular select clause then we're (probably!) idle, and that's it. I agree that exposing idleness would be good, and I'm doing some of the preliminary work necessary right now, but it's not my current focus: it's just a happy side-effect of what needs to be done for leader election. Ok. I was thinking of a central system tracking the unit states and firing hooks, but it seems the units are much more independent, tracking their own state and making their own decisions. That would work too. If all units are in idle state, then the system has reached a steady state and my question answered. Sort of. It's steady for now, but will not necessarily still be steady by the time you've reacted to it -- even if you're the only administrator, imagine a cron job that uses juju-run and triggers a wave of relation traffic across the system. Your example is actually a steady state in my mind, in much the same way a biological system may be in a steady state despite having a heartbeat. But yes, you can construct some pathological cases where my heuristic is not good enough to detect when the system has reached an equilibrium.
I am perfectly fine with reporting that the system *was* in a steady state rather than *is* in a steady state. If your system is chaotic enough that the difference matters, I think you are better off fixing it rather than forging ahead attempting to reliably test and deploy a chaotic system. -- Stuart Bishop stuart.bis...@canonical.com -- Juju-dev mailing list Juju-dev@lists.ubuntu.com Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
Re: Feature Request: show running relations in 'juju status'
On 18 November 2014 12:23, Ian Booth ian.bo...@canonical.com wrote:
> On 17/11/14 15:47, Stuart Bishop wrote:
>> On 17 November 2014 07:13, Ian Booth ian.bo...@canonical.com wrote:
>>> The new Juju Status work planned for this cycle will hopefully address the main concern about knowing when a deployed charm is fully ready to do the work for which it was installed, ie the current situation whereby a unit is marked as Started but is not ready. Charms are able to mark themselves as Busy and also set a status message to indicate they are churning and not ready to run. Charms can also indicate that they are Blocked and require manual intervention (eg a service needs a database and no relation has been established yet to provide the database), or Waiting (the database on which the service relies is busy but will resolve automatically when the database is available again).
>> As long as the 'ready' state is managed by juju and not the unit, I'll stand happily corrected :-) The focus I'd seen had been on the unit declaring its own status, and there is no way for a unit to know that it is ready because it has no way of knowing that, for example, there are another 10 peer units being provisioned that will need to be related.
> You are correct that the initial scope of work is more about the unit, and less about the deployment as a whole. There are plans though to address the issue. We're throwing around the concept of a goal state, which is conceptually akin to looking forward in time to be able to inform units what relations they will expect to participate in and what units will be deployed. There'd likely be something like a relation-goals hook tool (to complement relation-list and relation-ids), as well as hook(s) for when the goal state changes. There's ongoing work in the uniter by William to get the architecture right so this work can be considered. There's still a lot of value in the current Juju Status work, but as you point out, it's not the full story.

Ok.
If there is a goal state, and I am able to wait until the goal state is the actual state, then my needs (and amulet and juju-deployer needs) will be met. It does seem a rather lengthy and long-winded way of getting there though. The question I have always needed juju to answer is 'are there any hooks running or are there any hooks queued to run?'. I've always assumed that juju must already know this (or it would be unable to function), but refuses to communicate this single bit of information in any way.

> So although there are not currently plans to show the number of running hooks in the first phase of this work, mechanisms are being provided to allow charm authors to better communicate the state of their charms to give much clearer and more accurate feedback as to 1) when a charm is fully ready to do work, and 2) if a charm is not ready to do work, why not.

A charm declaring itself ready is part of the picture. What is more important is when the system is ready. You don't want to start pumping requests through your 'ready' webserver, only to have it torn away as a new block device is mounted on your database when its storage-joined hook is invoked, and returned to the 'ready' state again once the storage-changed hook has completed successfully.

> Also being thrown around is the concept of a new agent-state called Idle, which would be used when there are no pending hooks to run.

That would work too. If all units are in idle state, then the system has reached a steady state and my question is answered.

> There are plans as well for the next phase of the Juju status work to allow collaborating services to notify when they are busy, and mark relationships as down. So if the database had its storage-attached hook invoked, it would mark itself as Busy, mark its relation to the webserver as Down, thus allowing the webserver to put itself into Waiting.
> Or, if we are talking about the initial install phase, the database would not initially mark itself as Running until its declared storage requirements were met, so the webserver would go from Installing to Waiting and then to Running once the database became Running.

I'm not entirely sure how useful this feature is, given the inherent race conditions with serialized hooks. Right now, you need to write charms that gracefully cope with dependent services that have gone down without notice. With this feature, you will need to write charms that gracefully cope with dependent services that have gone down and the notification hasn't reached you yet. Or if the outage is for non-juju reasons, like a network partition. The window of time waiting for hooks to bubble through could easily be minutes when you have a simple chain of services (eg. postgresql - pgbouncer - django - haproxy - apache seems common enough).

Your example with storage is particularly interesting, as I was just dealing with this yesterday in my rewrite of the Cassandra charm. The existing
Re: Feature Request: show running relations in 'juju status'
On 17 November 2014 07:13, Ian Booth ian.bo...@canonical.com wrote:
> The new Juju Status work planned for this cycle will hopefully address the main concern about knowing when a deployed charm is fully ready to do the work for which it was installed, ie the current situation whereby a unit is marked as Started but is not ready. Charms are able to mark themselves as Busy and also set a status message to indicate they are churning and not ready to run. Charms can also indicate that they are Blocked and require manual intervention (eg a service needs a database and no relation has been established yet to provide the database), or Waiting (the database on which the service relies is busy but will resolve automatically when the database is available again).

As long as the 'ready' state is managed by juju and not the unit, I'll stand happily corrected :-) The focus I'd seen had been on the unit declaring its own status, and there is no way for a unit to know that it is ready because it has no way of knowing that, for example, there are another 10 peer units being provisioned that will need to be related.

> So although there are not currently plans to show the number of running hooks in the first phase of this work, mechanisms are being provided to allow charm authors to better communicate the state of their charms to give much clearer and more accurate feedback as to 1) when a charm is fully ready to do work, and 2) if a charm is not ready to do work, why not.

A charm declaring itself ready is part of the picture. What is more important is when the system is ready. You don't want to start pumping requests through your 'ready' webserver, only to have it torn away as a new block device is mounted on your database when its storage-joined hook is invoked, and returned to the 'ready' state again once the storage-changed hook has completed successfully.
-- Stuart Bishop stuart.bis...@canonical.com
Re: Feature Request: show running relations in 'juju status'
On 14 November 2014 22:31, Mario Splivalo mario.spliv...@canonical.com wrote:
> Hello, good people! How hard would it be to implement 'showing running relations in juju status'? Currently there is no easy (if any) way of knowing the state of the deployment. When one does 'juju add-relation' the relation hooks are run, but there is no feedback on whether the hooks are still running or everything is done. Only in case there is a hook error would you see that one in 'juju status'. One can have logs tailed and assume that when there is no action for some amount of time - everything deployed as it should. Having juju status display the number of running hooks would greatly help in troubleshooting deployments.

This has been my most wanted feature for well over a year, and at the moment is covered by https://bugs.launchpad.net/juju-core/+bug/1254766. Unfortunately, I don't think the work has been scheduled and I don't think the latest round of updates to 'juju status' cover it.

-- Stuart Bishop stuart.bis...@canonical.com
Re: Using subdocument _id fields for multi-environment support
On 1 October 2014 11:25, Menno Smits menno.sm...@canonical.com wrote:
> MongoDB allows the _id field to be a subdocument so Tim asked me to experiment with this to see if it might be a cleaner way to approach the multi-environment conversion before we update any more collections. The code for these experiments can be found here: https://gist.github.com/mjs/2959bb3e90a8d4e7db50 (I've included the output as a comment on the gist). What I've found suggests that using a subdocument for the _id is a better way forward. This approach means that each field value is only stored once, so there's no chance of the document key being out of sync with other fields and there's no unnecessary redundancy in the amount of data being stored. The fields in the _id subdocument are easy to access individually and can be queried separately if required. It is also possible to create indexes on specific fields in the _id subdocument if necessary for performance reasons.

Using a subdocument for the _id is taught and recommended in the MongoDB courseware. In particular, the index is more useful to the query planner. If the fields are separate, then MongoDB will end up querying by unit name and then filtering the results by environment (but that won't matter much in this case).

-- Stuart Bishop stuart.bis...@canonical.com
Re: Using subdocument _id fields for multi-environment support
On 1 October 2014 19:31, Kapil Thangavelu kapil.thangav...@canonical.com wrote:
> Subdoc _ids seem like clear wins, although I'm curious what effect this data structure has on mongo resource requirements at scale vs the compound string, as mongo tries to keep _id sets in memory; when it doesn't fit in memory, performance becomes unpredictable (aka bad), as there are two IOs per doc fetch (id, and doc) and extra IO on insert to verify uniqueness.

I think it is the index that needs to be kept in RAM, rather than the actual _id, so it will be a win here. Instead of having 3 indexes to keep in RAM to stop performance sucking (_id, unit, environment), we now just have a single fatter one.

-- Stuart Bishop stuart.bis...@canonical.com
Re: logrotate configuration seems wrong
On 15 September 2014 12:38, John Meinel j...@arbash-meinel.com wrote:
> 7) copytruncate seems the wrong setting for interacting with rsyslog. I believe rsyslog is already aware that the file needs to be rotated, and thus

It is only aware if you send it a HUP signal.

> it shouldn't be trying to write to the same file handle (and thus we don't need to truncate in place). I'm not 100% sure on the interactions here, but copytruncate seems to have an inherent likelihood of dropping data (while you are copying, if any data gets written then you'll miss those last few bytes when you go to truncate, right?)

-- Stuart Bishop stuart.bis...@canonical.com
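For comparison, a non-copytruncate configuration renames the log and then asks rsyslog to reopen its file handle, avoiding the copy-then-truncate window in which writes can be lost. This fragment is only a sketch: the log path and the Debian-style `invoke-rc.d rsyslog rotate` hook are assumptions about the deployment, not juju's actual configuration.

```
/var/log/juju/*.log {
    weekly
    rotate 4
    compress
    missingok
    # Instead of copytruncate, rename the file and tell rsyslog to
    # reopen its handle (this delivers the HUP mentioned above).
    postrotate
        invoke-rc.d rsyslog rotate >/dev/null 2>&1 || true
    endscript
}
```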
Re: Juju Actions - Use Cases
On 10 September 2014 11:23, Tim Penhey tim.pen...@canonical.com wrote:
> On 10/09/14 06:59, John Weldon wrote:
>> We're looking for use cases for Juju Actions, mostly to make sure we expose the right API. I'm hoping for a few different use cases from the Juju Web UI folks, but I'd appreciate input from anyone wanting to use Juju Actions in their charms too. I've started a document with some example use cases to prime the pump: please contribute to this document and don't feel constrained to the style or layout I adopted for the examples. If you have any interest or investment in using or publishing Actions for Juju please review and contribute! Google Docs Link https://docs.google.com/document/d/1uYffkkGA1njQ1oego_h8BYBMrlGpmN_lwsnrOZFxE9Q/edit?usp=sharing

I'd love to see explicit backup/restore actions for the postgresql charm. For the PostgreSQL charm, off the top of my head:

backup-start
- May be a logical backup or filesystem-level backup, so perhaps 2 actions.
- May take hours or days.
- Scheduled in cron, in addition to on demand.
- Should it return immediately, or emit status while the backup progresses?
- Can backups be streamed back to the user? If not, the charm has to support many storage options.

backup-cancel
- Cancel a running backup.
- The charm might need to cancel a backup, eg. if failover has been triggered.

backup-status
- Status of running backups.

backup-recover
- Destroy the master database, rebuilding using the backup.
- May take hours or days; multi-terabyte databases are not uncommon.
- Can the backup be streamed from the user? If not, the charm has to support many storage options.
- If the backup is filesystem level, optionally recover to a specific point in time.
- Does not require the location of the backup, as the default would be the automatic backups.
- Only makes sense running on a unit in the master service if using cascading replica services.
- Recovery does not have to happen on the master unit in the master service.
If recovery is done on a hot standby unit in the master service, that hot standby will be promoted to master when it completes.
- Once recovery is complete, all hot standbys need to be rebuilt from the master.

failover
- Promote a specific unit to be the master.

rebuild
- Rebuild a hot standby unit from the master unit.
- This may rarely need to be done by an end user, eg. if a unit has desynchronized during an extended netsplit and the data required to catch up is no longer available.
- More likely, this action will be invoked by the backup-recover action.
- Most likely, this action will be invoked by the peer-relation-joined and slave-relation-joined hooks, allowing the rebuild to be done asynchronously rather than the current situation where the hooks may take hours or days to complete.

For the pgbouncer charm, matching the main pgbouncer actions:

stop
- Stop the pgbouncer daemon.
- Big hammer if the 'disable, kill, pause, resume, enable' dance is not your style.

start
- Start the daemon.

disable [db]
- Disable new client connections to a given database.

kill [db]
- Immediately drop all client and server connections on a given database.

enable [db]
- Re-enable a database after a 'disable'.

pause [db]
- Disconnect from a database, first waiting for queries to complete.

resume [db]
- Resume after a previous 'pause'.

The storage and storage-subordinate charms could have some interesting use cases, although these might end up being swallowed by juju-core rather than become actions. At the moment the storage-subordinate informs the charm when the requested filesystem mount is ready, and it is the host charm's responsibility to shut down daemons, move datafiles to the new mount, and restart. If there were standard actions to stop and start the system, then the subordinate could do everything, and the only burden placed on the host charm is advertising a path that contains all of its data files.
Perhaps these start/stop actions already exist in the form of the start/stop hooks. -- Stuart Bishop stuart.bis...@canonical.com
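If actions land, the backup actions listed above might be declared in something like an actions.yaml. This shape is purely hypothetical: the file name, schema, and parameter names here are assumptions for illustration, not a published format.

```
# Hypothetical actions.yaml fragment for the postgresql charm.
backup-start:
  description: Start a backup, returning immediately with a job id.
  params:
    kind:
      type: string
      enum: [logical, filesystem]
backup-cancel:
  description: Cancel a running backup.
  params:
    job-id:
      type: string
backup-status:
  description: Report the status of running backups.
```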
Re: Juju Actions - Use Cases
On 10 September 2014 19:49, Richard Harding rick.hard...@canonical.com wrote:
> I think most of the use cases presented so far line up with ours. One I want to call out as interesting and I hadn't thought about is killing a long running action in progress. The example of a database backup. I don't see anything along those lines in the current api doc. You can cancel something from the queue, but can you cancel something running?

I don't think this one impacts the design. The cancel action can kill the process being run by the backup action easily enough, and that still meets my use case.

Oh... I'll add one more to the list while I'm here:

reset-secrets
- Causes all generated passwords and secrets to be regenerated.
- Likely will cause a micro outage as clients will get disconnected, so it is on demand rather than done automatically every few hours.

-- Stuart Bishop stuart.bis...@canonical.com
Re: A beginner's adventure in Charm authoring
On 4 September 2014 14:26, John Meinel j...@arbash-meinel.com wrote:
>> Deploying a local charm is needlessly complex. Why do I need to create a special directory structure, move my code under there, set --repository and write local:foo and even then it has to go scanning through the directory, looking for a charm with the right name in the metadata.yaml. Why can't I just say deploy the charm in this directory? e.g. juju deploy --local=path Bam, done.
> At the very least we need to know what OS Series the charm is targeting. Which is currently only inferred from the path. I don't particularly like it, and I think the code that searches your whole repository and then picks the best one is bad, as it confuses people far more often than it is helpful. (If you have $REPO, and have $REPO/precise/charm and $REPO/precise/charm-backup but the 'revision' number in charm-backup is higher for whatever reason, juju deploy --repository=$REPO charm will actually deploy charm-backup.) I'm certainly for deploy the charm in this directory as long as we can sort out a good way to determine the series.

The only sane way I see is for the charm to declare what series it supports, probably in its metadata.yaml. In practice, we regularly deploy branches targeted to precise on trusty and vice versa, because one branch supports both series and the branch on the other series is just an unmaintained atavism. I think forcing a 1:1 mapping between a branch and a series is not useful to anyone, and the series component in the charm URL just causes confusion.

Well... it might have one use. Versioning. It gives you a way of breaking backwards compatibility with old versions of your charm. So for instance, the major rewrite of the Cassandra charm won't be able to upgrade-charm from the old version, so instead we hope to push it to trusty and leave the precise branch to rot in peace. Not ideal, but the only way of doing charm versioning at the moment.
In fact, now I think about it, the release in the URL *is* the major version (series) of the charm. It is just unfortunate that the possible charm versions have been hardwired to the Ubuntu releases, because the Ubuntu release is much less important than the release of the software I'm charming. I think we could decouple this, allowing arbitrary supported series in a charm and gaining a sane charm versioning concept, by redoing the Launchpad model and changing charms from being sourcepackages on a distribution called 'charms' to instead being products with product series. I could then switch product series whenever my charm changes the set of supported Ubuntu releases, or when upgrade-charm stops working without manual steps.

-- Stuart Bishop stuart.bis...@canonical.com
Re: A beginner's adventure in Charm authoring
On 4 September 2014 16:30, John Meinel j...@arbash-meinel.com wrote:
>> ... The only sane way I see is for the charm to declare what series it supports, probably in its metadata.yaml. In practice, we regularly deploy branches targeted to precise on trusty and vice versa, because one branch supports both series and the branch on the other series is just an unmaintained atavism. I think forcing a 1:1 mapping between a branch and a series is not useful to anyone, and the series component in the charm URL just causes confusion.
> So how do we decide what image to bring up to install your charm on? If it supports multiple OS series, then you still need a place/syntax/something to disambiguate what you actually want us to do. (I'm not saying that being directly in the URL is the ideal place, but we do need to consider how we interact with the system.)

I imagine the list in config.yaml would be in recommended order. That order would be used if the series was not explicitly specified in the constraints when deploying the service.

-- Stuart Bishop stuart.bis...@canonical.com
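A charm-declared series list might look something like this hypothetical metadata.yaml fragment. The `series` key and its semantics are assumptions here — no such field existed at the time this was written — with the list in recommended order as suggested above.

```
# Hypothetical metadata.yaml fragment: the charm declares the Ubuntu
# series it supports, most-recommended first.
name: cassandra
summary: Distributed database
series:
  - trusty
  - precise
```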
Re: First customer pain point pull request - default-hook
On 22 August 2014 10:43, Marco Ceppi marco.ce...@canonical.com wrote: So there is already a JUJU_HOOK_NAME environment variable. So that is easy enough. I'm not sure what the issue is with having a default-hook file that is executed when juju can't find that hook name. I don't want to make it an all or nothing solution where you either have one file or hooks per file; there doesn't seem to be any real advantage to that. For example my default-hook might be written in a language not on the cloud image, now I need an install hook which installs that interpreter. Looking at the charms I am writing now, I have install, start, stop and do-everything-else. peer relation-broken is possibly the other one that will need to be special, to ensure a unit being destroyed doesn't stomp on active resources being used by the remaining peers. I'm a plus one to a fall back of default-hook when hook isn't found and a +1 to the already existent environment variable.

I'm +0. The symlinks are a dead chicken that needs to be sacrificed, but it is all explicit. I can imagine problems with default-hook too, such as a typo causing your default-hook to be called instead of your desired hook.

-- Stuart Bishop stuart.bis...@canonical.com
Re: Intentionally introducing failures into Juju
On 14 August 2014 07:31, Menno Smits menno.sm...@canonical.com wrote:
> I like the idea of being able to trigger failures using the juju command line. I'm undecided about how the need to fail should be stored. An obvious location would be in a new collection managed by state, or even as a field on existing state objects and documents. The downside of this approach is that a connection to state will then need to be available from where-ever we would like failures to be triggered - this isn't always possible or convenient. Another approach would be to have juju inject-failure drop files in some location (along the lines of what I've already implemented) using SSH. This has the advantage of making the failure checks easy to perform from anywhere, with the disadvantage of making it more difficult to manage existing failures. There would also be some added complexity when creating failure files for about-to-be-created entities (e.g. the juju deploy --inject-failure case). Do you have any thoughts on this?

Further to just injecting failures, I'm interested in controlling when and in what order hooks run. A sort of manual mode, which could be driven by a test harness such as Amulet. Perhaps all hooks in the queue are initially held, and I can unhold them one at a time. This would let me test the odd edge cases, such as peers departing peer relations during handshaking, or what happens when a new client unit is added and its relation-changed hook manages to run before the relation-joined hooks at the server end. If you could do this, you could inject your failures by actually breaking your units using juju run or juju ssh. Deploy your units, run the install hooks, juju ssh in breaking one of the units (rm -rf /, whatever), run the peer relation hooks, confirm that the service is still usable despite the failed unit.
-- Stuart Bishop stuart.bis...@canonical.com
Re: Implement system reboot via juju hooks
On 11 August 2014 18:20, William Reade william.re...@canonical.com wrote:
> I'd like to explore your use cases a bit more to see if we can find a clean solution to your problems that doesn't go too far down the (2) road that I'm nervous about. (The try-again-later mechanism is much smaller and cleaner and I think we can accommodate that one pretty easily, fwiw -- but what are the other problems you want to solve?)

Memory-related settings in PostgreSQL will only take effect when the database is bounced. I need to avoid bouncing the primary database: 1) when backups are in progress, and 2) when a hot standby unit is being rebuilt from the primary. Being able to have a hook abort and be retried later would let me avoid blocking.

A locking service would be useful too, for units to signal certain operations (with locks automatically released when the hooks that took them exit). The in-progress update to the Cassandra charm has convoluted logic in its peer relation hooks to do rolling restarts of all the nodes, and I imagine MongoDB, Swift and many others have the same issue to solve.

-- Stuart Bishop stuart.bis...@canonical.com
Re: Implement system reboot via juju hooks
On 8 August 2014 19:58, Gabriel Samfira gsamf...@cloudbasesolutions.com wrote:
> Hello folks! I would like to start work on implementing reboots via juju hooks. I have outlined in a google docs document a few thoughts regarding why this is needed and some implementation details I would like to discuss before starting. You may find the doc here: http://goo.gl/tGoIuM Any thoughts/suggestions are welcome. Gabriel

I don't think this should be restricted to server reboots. The framework is generally useful. I have hooks that need to bounce the primary service so config changes can take effect. They can't do that if a long running operation is currently in progress, eg. a backup or a replica node being built. Currently, I need to block the hook until such time as I can proceed. I think this would be cleaner if I could instead return a particular error code from my hook, stating that it is partially complete and requesting it to be rescheduled. So it would be nice if requesting a reboot and requesting a hook to be rescheduled were independent things.

I had wondered if juju-run should allow arbitrary things to be run in a hook context later:

juju-run --after hook /sbin/reboot  # queue the reboot command to be run after this hook completes
juju-run --after hook config-changed  # queue the config-changed hook to be run after this hook completes, and after any previously queued commands
juju-run --after tomorrow report-status  # run the report-status command sometime after 24 hours

-- Stuart Bishop stuart.bis...@canonical.com
Re: Mongo experts - help need please
On 25 July 2014 12:05, Gustavo Niemeyer gustavo.nieme...@canonical.com wrote:
> On Fri, Jul 25, 2014 at 1:02 AM, Ian Booth ian.bo...@canonical.com wrote:
>> We've transitioned to using Session.Copy() to address the situation whereby Juju would create a mongo collection instance and then continue to make db calls against that collection without realising the underlying socket may have become disconnected. This resulted in Juju components failing, logging i/o timeout errors talking to mongo, even though mongo itself was still up and running.
> Sounds sane, as I indicated in previous discussions about the topic in these last two weeks and also about a year ago when we covered that. Serializing every single request to a concurrent server via a single database connection seems like a pretty bad idea for anything but simplistic servers.
>> As an aside - I'm wondering whether the mgo driver shouldn't transparently catch an i/o error associated with a dead socket and retry using a fresh connection rather than imposing that responsibility on the caller?
> The evidence so far indicates that this will likely not happen. The current design was purposefully put in place so that harsh connection errors are not swept under the rug, and this seems to be working well so far. I'd rather not have juju proceeding over a harsh problem such as a master re-election midway through the execution of an algorithm without any indication that the failure has happened, let alone silently retry operations that in most cases are not idempotent. That said, the goal is of course not to make the developer's life miserable. All the driver wants is an acknowledgement that the error was perceived and taken care of. This is done trivially by calling: session.Refresh() Done. The driver will happily drop the error notice, and proceed with further operations, blocking if waiting for a re-election to take place is necessary.
The bug Ian cites and is trying to work around has sessions failing with an i/o error after some time (I'm guessing resource starvation in MongoDB or TCP networking issues). session.Copy() is pulling things from a pool, so it might be handing out sessions doomed to fail with exactly the same issue. The connections in the pool could even be perfectly functional when they went in, with no way at the Go level of knowing they have failed without trying them. If this is the case, then Ian would need to handle the failure by ensuring the failed connection does not go back in the pool and grabbing a new one (the deferred Close() will return it, I think). And repeating until it works, or until the pool has been exhausted and we know Mongo is actually down rather than just having a polluted pool.

-- Stuart Bishop stuart.bis...@canonical.com