Weekly Development Summary

2017-07-27 Thread Menno Smits
Hi everyone,

It's time for another update on what the Juju team has been up to. There's
been lots of great progress. Here are the highlights...

*Cross Model Relations*
It is now possible to establish relations between applications running
under different controllers! Here's a basic example of how this looks on
the command line:

$ juju bootstrap aws foo
$ juju deploy mysql
$ juju offer mysql:db

$ juju bootstrap aws bar
$ juju deploy mediawiki
$ juju expose mediawiki
$ juju relate mediawiki:db foo:admin/default.mysql

This even works when the controllers are running in different clouds,
opening up a number of exciting possibilities.

*Upgrades from Juju 1.25 to 2.x*
There's been lots of progress this week with the tooling to allow 1.25
deployments to be upgraded to 2.x. More aspects of Juju's model are now
covered by the export tools and work is now underway to support upgrading
of agent binaries. Work will start soon towards converting LXC containers
to LXD.

*Local Resources in Bundles*
Just as bundles can reference local charms, they can now reference local
resources. As well as referencing a revision of a resource in the charm
store, you can now specify a path to a local file when creating a bundle
with charm resources included. Here's a (fake) example of how this looks:

services:
  ubuntu:
    charm: "/path/to/ubuntu/charm"
    num_units: 1
    resources:
      software: "/path/to/bundle/simple-bundle.zip"

Previously, the "software" field above could have only referred to a
resource revision number.

This feature will land in the "develop" branch in the next day or so and
will be released as part of Juju 2.3.

*Operating System Upgrade Support*
Work is underway on a new "update-series" command which allows the operator
to tell Juju that the operating system version for an application or
machine has changed. The idea is that the operator can perform an operating
system upgrade of one or more Juju managed machines and then tell Juju
about the change.
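
The syntax is still settling while the work is in progress, but the
intended workflow looks roughly like this (the machine id, application
name and series here are purely illustrative):

$ juju update-series 0 xenial        # machine 0's OS was upgraded by hand
$ juju update-series mysql xenial    # future mysql units should use xenial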

*Juju 2.2.3*
There will be another Juju 2.2 release out soon. It'll have the following
important changes:

   - Fixed a recently introduced upgrade issue affecting the Azure and
   vSphere providers.
   - Fixed the model watching API to correctly report instance status
   changes.
   - Added a "primary-network" configuration attribute for the vSphere
   provider.
   - Jitter added to metrics collection to spread out load on Juju
   controllers.
   - Fixed charm resource cleanup.
   - Fixed a potential race when completing model migrations.


*Quick Links*
  Work pending: https://github.com/juju/juju/pulls
  Recent commits: https://github.com/juju/juju/commits/develop

Have a great weekend.

Cheers,
Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Organising apiserver facades

2017-07-09 Thread Menno Smits
On 9 July 2017 at 18:13, roger peppe  wrote:

> Getting all of that facade-specific logic out of the apiserver package
> seems like a great idea to me. I don't think it would take *too* much work
> to factor the helper logic into its own package entirely, with no
> dependencies on the facades.
>
> The client is actually almost there already with its APICaller interface
> as common currency - we should really follow the intent of this old
> comment of William's. The
> manifold issue is the interesting one - work out a good solution to that
> and the rest'll come quite quickly, I feel.
>

FWIW, that list of client side facade factory methods on api.Connection
which shouldn't be there used to be a lot bigger. We've removed about two
thirds of them while converting the machine agent to use the dependency
engine.


>
> It would be marvellous to get parts moving more freely in Juju again -
> losing big fan-out dependency graphs is a good way to move in that
> direction, I'm pretty sure.
>

+1

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Organising apiserver facades

2017-07-06 Thread Menno Smits
+1

This helps us get towards the requested feature of having client and
agent facades served from different network spaces.

Having controller agent facades only served on localhost would also be nice
from a security perspective.

On 7 Jul 2017 11:17 am, "Andrew Wilkins" 
wrote:

> On Thu, Jul 6, 2017 at 7:09 PM John Meinel  wrote:
>
>> I'd really like to see us split apart the facades-by-purpose. So we'd
>> collect the facades for Agents separately from facades for Users (and
>> possibly also facades for Controller).
>> I'm not sure if moving things just into 'facades' just moves the problem
>> around and leaves us with just a *different* directory that is a bit
>> cluttered.  But I'm +1 on things that would help organize the layout.
>>
>
> Cool. I was considering controller vs. agent already, separating client
> off sounds good to me too. I'll send a PR soon.
>
>
>> John
>> =:->
>>
>> On Thu, Jul 6, 2017 at 1:55 PM, Andrew Wilkins <
>> andrew.wilk...@canonical.com> wrote:
>>
>>> The juju/apiserver package currently has a whole lot of facade packages
>>> within it, as well as some other packages related to authentication,
>>> logging, and other bits and bobs. I find it difficult to navigate and tell
>>> what's what a lot of the time.
>>>
>>> I'd like to move the apiserver facade packages into a common "facades"
>>> sub-directory:
>>>   apiserver/facades/application
>>>   apiserver/facades/client
>>>   apiserver/facades/controller
>>>   etc.
>>>
>>> Any objections? Or alternative suggestions?
>>>
>>> Cheers,
>>> Andrew
>>>
>>> --
>>> Juju-dev mailing list
>>> Juju-dev@lists.ubuntu.com
>>> Modify settings or unsubscribe at: https://lists.ubuntu.com/
>>> mailman/listinfo/juju-dev
>>>
>>>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at: https://lists.ubuntu.com/
> mailman/listinfo/juju-dev
>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: lost reviews

2017-07-05 Thread Menno Smits
I *believe* a database dump of the reviewboard site was kept but the site
was taken down as it wasn't being used.

Juju QA folks may be able to shed more light.

On 25 May 2017 at 01:44, roger peppe <roger.pe...@canonical.com> wrote:

> Despite my plea (quoted below) it seems that our reviewboard site is
> not available any more, so an important set of contextual information
> on the Juju code base has disappeared. Our commit comments are often
> not that useful. I have often found the review comments invaluable for
> determining the status of a feature and whether a given piece of code
> is deliberate or a bug, particularly when refactoring.
>
> It's somewhat ironic that the earlier codereview.appspot.com reviews
> are still available while the later ones have gone.
>
> For the record, the codereview period (still available) spans 1303
> reviews from 2012-04-19 to 2014-06-03 and the reviewboard period spans
> at least 3696 reviews from 2014-06-03 to 2017-02-08.
>
> Is it possible that this information could be retrieved and made
> available somewhere again? Or has it gone forever?
>
>   with crossed fingers,
> rog.
>
> On 24 October 2016 at 22:41, roger peppe <roger.pe...@canonical.com>
> wrote:
> > On 24 October 2016 at 22:22, Menno Smits <menno.sm...@canonical.com>
> wrote:
> >> On 25 October 2016 at 10:17, Horacio Duran <horacio.du...@canonical.com
> >
> >> wrote:
> >>>
> >>> Shouldn't we leave it for historic purposes?
> >>>
> >>
> >> Will it really get used? My bet is that the project's commit history
> will be
> >> enough.
> >
> > I think that review history is crucial for context on historic
> > code decisions - I often look into a review to see why a
> > particular piece of code is the way it is (including old
> > Juju codereview reviews).
> >
> > It would be unfortunate to lose them in my view.
> >
> >   cheers,
> > rog.
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at: https://lists.ubuntu.com/
> mailman/listinfo/juju-dev
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Consuming MongoDB from a Snap

2017-06-22 Thread Menno Smits
On 23 June 2017 at 12:25, Michael Hudson-Doyle  wrote:

>
>>> The answer appears to be "yes with some caveats". For xenial onwards
>>> there are snapd packages for all the architectures the Juju team cares
>>> about.
>>>
>>
>> Ah, I thought the question was rather whether or not the mongo snap
>> existed for all of those architectures. I don't think it does. IIANM, the
>> snap comes from https://github.com/niemeyer/snaps/blob/master/mongodb/
>> mongo32/snapcraft.yaml, which (if you look at the "mongodb" part,
>> appears to only exist for x86_64). So we would need to do some work on that
>> first.
>>
>
> Mongo 3.2 upstream doesn't support s390x. 3.4 does though, so I don't see
> any immediate reason why a snap of 3.4 couldn't support all arches we care
> about.
>
> I was asking about snaps in the context of moving to 3.4, really.
>

It makes sense to do the switch to a MongoDB snap as part of the move to
MongoDB 3.4.

>>>    https://packages.ubuntu.com/xenial/snapd
>>>
>>> For trusty only amd64, armhf and i386 appear to be supported.
>>>
>>>    https://packages.ubuntu.com/trusty-updates/snapd
>>>
>>> This is probably ok. I think it's probably fine to start saying that new
>>> Juju controllers, on some architectures at least, need to be based on
>>> xenial or later.
>>>
>>
>> Since the controller machine isn't designed for workloads, it seems fine
>> to me to restrict them to latest LTS.
>>
>
> Eh well, there is no juju-mongodb3.2 package for trusty at all is there?
>

True. There's a juju-mongodb for trusty but it has 2.4.9 so a lack of snapd
support for some arches on trusty is a moot point.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Consuming MongoDB from a Snap

2017-06-22 Thread Menno Smits
On 23 June 2017 at 12:09, Andrew Wilkins 
wrote:

>
>> *1. Does snapd work on all architectures that Juju supports?*
>>
>> The answer appears to be "yes with some caveats". For xenial onwards
>> there are snapd packages for all the architectures the Juju team cares
>> about.
>>
>
> Ah, I thought the question was rather whether or not the mongo snap
> existed for all of those architectures. I don't think it does. IIANM, the
> snap comes from https://github.com/niemeyer/snaps/blob/master/
> mongodb/mongo32/snapcraft.yaml, which (if you look at the "mongodb" part,
> appears to only exist for x86_64). So we would need to do some work on that
> first.
>

I imagine we would have a custom MongoDB snap for Juju rather than using
this one as is. We want direct control over the snap. The niemeyer snap
would probably be a good starting point though.



>
>>https://packages.ubuntu.com/trusty-updates/snapd
>>
>> This is probably ok. I think it's probably fine to start saying that new
>> Juju controllers, on some architectures at least, need to be based on
>> xenial or later.
>>
>
> Since the controller machine isn't designed for workloads, it seems fine
> to me to restrict them to latest LTS.
>
> One issue would be upgrades: we would either have to continue supporting
> both snaps and debs for mongodb, or we would have to disallow upgrading
> from a system that doesn't support snaps. That would OK as long as there
> are no workloads on the controller, as we could use migration.
>

This would certainly be a good case to use migrations.

*2. Does snapd work inside LXD containers?*
>>
>> Although it's rarely done, it's possible to set up a Juju HA cluster
>> where some nodes are running inside LXD containers so this is something
>> we'd need to consider.
>>
>
> It would suck if we couldn't test using the lxd provider, though.
>

/me slaps forehead for forgetting the more obvious use case.

At any rate, snapd in LXD containers does seem to work from xenial onwards.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Consuming MongoDB from a Snap

2017-06-22 Thread Menno Smits
We've had some discussion this week about whether Juju could use MongoDB
from snap instead of a deb. This would make it easier for Juju to stay up
to date with the latest MongoDB releases, avoiding the involved process of
getting each update into Ubuntu's update repository, as well as giving us
all the other advantages of snaps.

Two concerns were raised in this week's tech board meeting.

*1. Does snapd work on all architectures that Juju supports?*

The answer appears to be "yes with some caveats". For xenial onwards there
are snapd packages for all the architectures the Juju team cares about.

   https://packages.ubuntu.com/xenial/snapd

For trusty only amd64, armhf and i386 appear to be supported.

   https://packages.ubuntu.com/trusty-updates/snapd

This is probably ok. I think it's probably fine to start saying that new
Juju controllers, on some architectures at least, need to be based on
xenial or later.

*2. Does snapd work inside LXD containers?*

Although it's rarely done, it's possible to set up a Juju HA cluster where
some nodes are running inside LXD containers so this is something we'd need
to consider.

From xenial onwards, snapd does indeed work inside LXD containers. I
followed Stephane's instructions using a xenial container and successfully
installed a number of non-trivial, working snaps including Gustavo's
mongo32 snap.

  https://stgraber.org/2016/12/07/running-snaps-in-lxd-containers/
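
The basic steps boil down to something like the following (a sketch only;
see Stephane's article above for the details, and note that the exact
package names and any extra container configuration may vary by release):

$ lxc launch ubuntu:xenial snap-test
$ lxc exec snap-test -- apt update
$ lxc exec snap-test -- apt install -y snapd squashfuse
$ lxc exec snap-test -- snap install hello-world
$ lxc exec snap-test -- hello-world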


There is of course more testing to be done but it seems like having Juju's
MongoDB in a snap is certainly doable.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Debugging MongoDB performance issues

2017-06-11 Thread Menno Smits
Hi everyone,

I started writing an email to a few Juju developers with tips on diagnosing
MongoDB performance issues but then realised that this would be more useful
as a wiki page.

https://github.com/juju/juju/wiki/Diagnosing-MongoDB-Performance

Feel free to expand and add your own techniques.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Github Reviews vs Reviewboard

2016-10-24 Thread Menno Smits
On 25 October 2016 at 10:55, Katherine Cox-Buday <
katherine.cox-bu...@canonical.com> wrote:

> roger peppe  writes:
>
> > I think that review history is crucial for context on historic
> > code decisions
>
> I wonder if we could hack a script to save the reviews as git notes, e.g.
> https://github.com/google/git-appraise
>
> With git's ability to rewrite history, I bet this is doable...
>

+1. This is a great idea. We could also import the old reviews from
codereview.appspot.com.

For those who don't know what git notes are, they're a way of adding extra
information to commits without modifying the commit itself. Notes can be
viewed and manipulated using the "git notes" subcommand. "git show" will
also show any notes for a commit.
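
For example, attaching, viewing and sharing a note looks like this (the
commit sha and note text are placeholders):

$ git notes add -m "Review: approved with nits" abc1234
$ git show abc1234                      # note appears below the message
$ git push origin refs/notes/commits    # notes aren't pushed by default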

Github used to display notes but no longer does for some reason.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Github Reviews vs Reviewboard

2016-10-24 Thread Menno Smits
On 25 October 2016 at 10:17, Horacio Duran 
wrote:

> Shouldn't we leave it for historic purposes?
>
>
Will it really get used? My bet is that the project's commit history will
be enough.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Github Reviews vs Reviewboard

2016-10-24 Thread Menno Smits
The votes are in: Github 8, Reviewboard 5. It looks like we'll stick with
Github Reviews.

I'm going to email some people now about tearing down the Reviewboard
instance.

On 15 October 2016 at 06:57, Casey Marshall 
wrote:

> +1, as I work on many other Github projects besides Juju and it's
> familiar. It's not perfect by any means but I can work with it.
>
> I thought the ReviewBoard we had was pretty ugly and buggy, but it was
> reasonably easy to use. Gerrit is cleaner and clearer to me -- though I
> feel like Gerrit is also kind of rough on the uninitiated. Maybe if a newer
> version of RB was sufficiently improved and it was charmed up well, its
> operation would be more manageable, and it'd be OK?
>
> -Casey
>
> On Fri, Oct 14, 2016 at 12:34 PM, Andrew McDermott <
> andrew.mcderm...@canonical.com> wrote:
>
>>
>> On 14 October 2016 at 16:26, Mick Gregg  wrote:
>>
>>> I would probably chose gerrit over either, but that's not the question
>>> today.
>>>
>>
>> Oooh, yes to gerrit. +2
>>
>>
>>
>> --
>> Andrew McDermott 
>> Juju Core Sapphire team 
>>
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailm
>> an/listinfo/juju-dev
>>
>>
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at: https://lists.ubuntu.com/
> mailman/listinfo/juju-dev
>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Github Reviews vs Reviewboard

2016-10-13 Thread Menno Smits
-1

I was really excited by Github Reviews initially but after using it for a
while I've switched my position.

On 14 October 2016 at 11:44, Menno Smits <menno.sm...@canonical.com> wrote:

> We've been trialling Github Reviews for some time now and it's time to
> decide whether we stick with it or go back to Reviewboard.
>
> We're going to have a vote. If you have an opinion on the issue please
> reply to this email with a +1, 0 or -1, optionally followed by any further
> thoughts.
>
>- +1 means you prefer Github Reviews
>- -1 means you prefer Reviewboard
>- 0 means you don't mind.
>
> If you don't mind which review system we use there's no need to reply
> unless you want to voice some opinions.
>
> The voting period starts *now* and ends *my EOD next Friday (October 21)*.
>
> As a refresher, here are the concerns raised for each option.
>
> *Github Reviews*
>
>- Comments disrupt the flow of the code and can't be minimised,
>hindering readability.
>- Comments can't be marked as done making it hard to see what's still
>to be taken care of.
>- There's no way to distinguish between a problem and a comment.
>- There's no summary of issues raised. You need to scroll through the
>often busy discussion page.
>- There's no indication of which PRs have been reviewed from the pull
>request index page nor is it possible to see which PRs have been approved
>or otherwise.
>- It's hard to see when a review has been updated.
>
> *Reviewboard*
>
>- Another piece of infrastructure for us to maintain
>- Higher barrier to entry for newcomers and outside contributors
>- Occasionally misses Github pull requests (likely a problem with our
>integration so is fixable)
>- Poor handling of deleted and renamed files
>- Falls over with very large diffs
>- 1990's looks :)
>- May make future integration of tools which work with Github into our
>process more difficult (e.g. static analysis or automated review tools)
>
> There has been talk of evaluating other review tools such as Gerrit and
> that may still happen. For now, let's decide between the two options we
> have recent experience with.
>
> - Menno
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Github Reviews vs Reviewboard

2016-10-13 Thread Menno Smits
We've been trialling Github Reviews for some time now and it's time to
decide whether we stick with it or go back to Reviewboard.

We're going to have a vote. If you have an opinion on the issue please
reply to this email with a +1, 0 or -1, optionally followed by any further
thoughts.

   - +1 means you prefer Github Reviews
   - -1 means you prefer Reviewboard
   - 0 means you don't mind.

If you don't mind which review system we use there's no need to reply
unless you want to voice some opinions.

The voting period starts *now* and ends *my EOD next Friday (October 21)*.

As a refresher, here are the concerns raised for each option.

*Github Reviews*

   - Comments disrupt the flow of the code and can't be minimised,
   hindering readability.
   - Comments can't be marked as done making it hard to see what's still to
   be taken care of.
   - There's no way to distinguish between a problem and a comment.
   - There's no summary of issues raised. You need to scroll through the
   often busy discussion page.
   - There's no indication of which PRs have been reviewed from the pull
   request index page nor is it possible to see which PRs have been approved
   or otherwise.
   - It's hard to see when a review has been updated.

*Reviewboard*

   - Another piece of infrastructure for us to maintain
   - Higher barrier to entry for newcomers and outside contributors
   - Occasionally misses Github pull requests (likely a problem with our
   integration so is fixable)
   - Poor handling of deleted and renamed files
   - Falls over with very large diffs
   - 1990's looks :)
   - May make future integration of tools which work with Github into our
   process more difficult (e.g. static analysis or automated review tools)

There has been talk of evaluating other review tools such as Gerrit and
that may still happen. For now, let's decide between the two options we
have recent experience with.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Big memory usage improvements

2016-10-12 Thread Menno Smits
On 13 October 2016 at 10:36, Katherine Cox-Buday <
katherine.cox-bu...@canonical.com> wrote:

> Menno Smits <menno.sm...@canonical.com> writes:
>
> > Christian (babbageclunk) has been busy fixing various memory leaks in
> the Juju
> > controllers and has made some significant improvements.
>
> Awesome, good work, Christian!
>
> Not to detract from this fantastic news, but it's still worrisome that an
> idle Juju seems to have a memory which is growing linearly (before picture
> looked like the beginning of an exponential curve?). Do we feel this is
> memory which will at some point be GCed?
>

To be clear, the Juju controller is anything but idle in this test. Models
are being continually added and removed, with charms deployed, over the
course of 8 hours.

But yes, there does appear to still be some memory not being released
somewhere. The problem is much less severe now.

- Menno

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Big memory usage improvements

2016-10-12 Thread Menno Smits
Christian (babbageclunk) has been busy fixing various memory leaks in the
Juju controllers and has made some significant improvements. Chris
(veebers) has been tracking resource usage for a long running test which
adds and removes a bunch of models and he noticed the difference.

Take a look at the memory usage graphs here:

Before: http://people.canonical.com/~leecj2/perfscalemem/
After: http://people.canonical.com/~leecj2/perfscalemem2/

Interestingly the MongoDB memory usage profile is quite different as well.
I'm not sure if this is due to Christian's improvements or something else.

There's possibly still some more small leaks somewhere but this is
fantastic regardless. Thanks to Christian for tackling this and Chris for
tracking the numbers.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Reviews on Github

2016-09-20 Thread Menno Smits
(gah, hit send too early)

... If we decide to stay with RB, that will need to be fixed.

On 21 September 2016 at 09:53, Menno Smits <menno.sm...@canonical.com>
wrote:

> Some of us probably got a little excited (me included). There should be
> discussion and a clear announcement before we make a significant change to
> our process. The tech board meeting is today/tonight so we'll discuss it
> there as per Rick's email. Please contribute to this thread if you haven't
> already and have strong opinions either way on the topic.
>
> Interestingly our Github/RB integration seems to have broken a little
> since Github made these changes. The links to Reviewboard on pull requests
> aren't getting inserted any more. If we decide to stay with RB
>
> On 21 September 2016 at 05:54, Rick Harding <rick.hard...@canonical.com>
> wrote:
>
>> I spoke with Alexis today about this and it's on her list to check with
>> her folks on this. The tech board has been tasked with the decision, so
>> please feel free to shoot a copy of your opinions their way. As you say, on
>> the one hand it's a big impact on the team, but it's also a standard
>> developer practice that not everyone will agree with so I'm sure the tech
>> board is a good solution to limiting the amount of bike-shedding and to
>> have some multi-mind consensus.
>>
>> On Tue, Sep 20, 2016 at 1:52 PM Katherine Cox-Buday <
>> katherine.cox-bu...@canonical.com> wrote:
>>
>>> Seems like a good thing to do would be to ensure the tech board doesn't
>>> have any objections and then put it to a vote since it's more a property of
>>> the team and not the codebase.
>>>
>>> I just want some consistency until a decision is made. E.g. "we will be
>>> trying out GitHub reviews for the next two weeks; all reviews should be
>>> done on there".
>>>
>>> --
>>> Katherine
>>>
>>> Nate Finch <nate.fi...@canonical.com> writes:
>>>
>>> > Can we try reviews on github for a couple weeks? Seems like we'll
>>> > never know if it's sufficient if we don't try it. And there's no setup
>>> > cost, which is nice.
>>> >
>>> > On Tue, Sep 20, 2016 at 12:44 PM Katherine Cox-Buday
>>> > <katherine.cox-bu...@canonical.com> wrote:
>>> >
>>> > I see quite a few PRs that are being reviewed in GitHub and not
>>> > ReviewBoard. I really don't care where we do them, but can we
>>> > please pick a direction and move forward? And until then, can we
>>> > stick to our previous decision and use RB? With people using both
>>> > it's much more difficult to tell what's been reviewed and what
>>> > hasn't.
>>> >
>>> > --
>>> > Katherine
>>> >
>>> > Nate Finch <nate.fi...@canonical.com> writes:
>>> >
>>> > > In case you missed it, Github rolled out a new review process.
>>> > It
>>> > > basically works just like reviewboard does, where you start a
>>> > review,
>>> > > batch up comments, then post the review as a whole, so you don't
>>> > just
>>> > > write a bunch of disconnected comments (and get one email per
>>> > review,
>>> > > not per comment). The only features reviewboard has is the edge
>>> > case
>>> > > stuff that we rarely use: like using rbt to post a review from a
>>> > > random diff that is not connected directly to a github PR. I
>>> > think
>>> > > that is easy enough to give up in order to get the benefit of
>>> > not
>>> > > needing an entirely separate system to handle reviews.
>>> > >
>>> > > I made a little test review on one PR here, and the UX was
>>> > almost
>>> > > exactly like working in reviewboard:
>>> > > https://github.com/juju/juju/pull/6234
>>> > >
>>> > > There may be important edge cases I'm missing, but I think it's
>>> > worth
>>> > > looking into.
>>> > >
>>> > > -Nate
>>>
>>> --
>>> Juju-dev mailing list
>>> Juju-dev@lists.ubuntu.com
>>> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailm
>>> an/listinfo/juju-dev
>>>
>>
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailm
>> an/listinfo/juju-dev
>>
>>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Reviews on Github

2016-09-15 Thread Menno Smits
Although I share some of Ian's concerns, I think the reduced moving parts,
improved reliability, reduced maintenance, easier experience for outside
contributors and better handling of file moves are pretty big wins. The
rendering of diffs on Github is a whole lot nicer as well.

I'm +1 for trialling the new review system on Github for a couple of weeks
as per Andrew's suggestion.

On 16 September 2016 at 05:50, Nate Finch  wrote:

> Reviewboard goes down a couple times a month, usually from lack of disk
> space or some other BS.  According to a source knowledgeable with these
> matters, the charm was rushed out, and the agent for that machine is down
> anyway, so we're kinda just waiting for the other shoe to drop.
>
> As for the process things that Ian mentioned, most of those can be
> addressed with a sprinkling of convention.  Marking things as issues could
> just be adding :x: to the first line (github even pops up suggestions and
> auto-completes), thusly:
>
> :x: This will cause a race condition
>
> And if you want to indicate you're dropping a suggestion, you can use :-1:
>  which gives you a thumbs down:
>
> :-1: I ran the race detector and it's fine.
>
> It won't give you the cumulative "what's left to fix" at the top of the
> page, like reviewboard... but for me, I never directly read that, anyway,
> just used it to see if there were zero or non-zero comments left.
>
> As for the inline comments in the code - there's a checkbox to hide them
> all.  It's not quite as convenient as the gutter indicators per-comment,
> but it's sufficient, I think.
>
> On Wed, Sep 14, 2016 at 6:43 PM Ian Booth  wrote:
>
>>
>>
>> On 15/09/16 08:22, Rick Harding wrote:
>> > I think that the issue is that someone has to maintain the RB and the
>> > cost/time spent on that does not seem commensurate with the bonus
>> features
>> > in my experience.
>> >
>>
>> The maintenance is not that great. We have SSO using github credentials
>> so there's really no day to day work IIANM. As a team, we do many, many
>> reviews per day, and the features I outlined are significant and things
>> I (and I assume others) rely on. Don't underestimate the value in knowing
>> why a comment was rejected and being able to properly track that. Or
>> having review comments collapsed as a gutter indicator so you can browse
>> the code and still know that there's a comment there. With github, you
>> can hide the comments but there's no gutter indicator. All these things
>> add up to a lot.
>>
>>
>> > On Wed, Sep 14, 2016 at 6:13 PM Ian Booth 
>> wrote:
>> >
>> >> One thing review board does better is use gutter indicators so as not
>> to
>> >> interrupt the flow of reading the code with huge comment blocks. It
>> also
>> >> seems
>> >> much better at allowing previous commits with comments to be viewed in
>> >> their
>> >> entirety. And it allows the reviewer to differentiate between issues
>> and
>> >> comments (ie fix this vs take note of this), plus it allows the notion
>> of
>> >> marking stuff as fixed vs dropped, with a reason for dropping if
>> needed.
>> >> So the
>> >> github improvements are nice but there's still a large and significant
>> gap
>> >> that
>> >> is yet to be filled. I for one would miss all the features reviewboard
>> >> offers.
>> >> Unless there's a way of doing the same thing in github that I'm not
>> aware
>> >> of.
>> >>
>> >> On 15/09/16 07:22, Tim Penhey wrote:
>> >>> I'm +1 if we can remove the extra tools and we don't get email per
>> >> comment.
>> >>>
>> >>> On 15/09/16 08:03, Nate Finch wrote:
>>  In case you missed it, Github rolled out a new review process.  It
>>  basically works just like reviewboard does, where you start a review,
>>  batch up comments, then post the review as a whole, so you don't just
>>  write a bunch of disconnected comments (and get one email per review,
>>  not per comment).  The only features reviewboard has is the edge case
>>  stuff that we rarely use:  like using rbt to post a review from a
>> random
>>  diff that is not connected directly to a github PR. I think that is
>> easy
>>  enough to give up in order to get the benefit of not needing an
>> entirely
>>  separate system to handle reviews.
>> 
>>  I made a little test review on one PR here, and the UX was almost
>>  exactly like working in reviewboard:
>> >> https://github.com/juju/juju/pull/6234
>> 
>>  There may be important edge cases I'm missing, but I think it's worth
>>  looking into.
>> 
>>  -Nate
>> 
>> 
>> >>>
>> >>
>> >> --
>> >> Juju-dev mailing list
>> >> Juju-dev@lists.ubuntu.com
>> >> Modify settings or unsubscribe at:
>> >> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>> >>
>> >
>>
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at: https://lists.ubuntu.com/
>> mailman/listinfo/juju-dev

Re: Faster LXD bootstraps and provisioning

2016-08-15 Thread Menno Smits
Thanks Rafael. Would you mind adding this to the wiki page?

On 16 August 2016 at 02:31, Rafael Gonzalez <rafael.gonza...@canonical.com>
wrote:

> Hi Menno,
>
> Thanks for putting this together, great tips.  I recently ran into an
> issue which others could see as well.
>
> One may need to adjust the following for large bundle deployments on LXD.
> A bundle deployment fails with errors about "Too many files open."  This
> will increase number of max open files:
>
> echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf &&
> sudo sysctl -p
>
>
> Regards,
>
> Rafael O. Gonzalez
> Canonical, Solutions Architect
> rgo...@canonical.com
> 1-646-481-7232
>
>
>
> On Sun, Aug 14, 2016 at 8:07 PM, Menno Smits <menno.sm...@canonical.com>
> wrote:
>
>> I've put together a few tips on the wiki for speeding up bootstrap and
>> provisioning times when using the Juju lxd provider. I find these
>> techniques helpful when checking my work or investigating bugs - situations
>> where you end up bootstrapping and deploying many times.
>>
>> https://github.com/juju/juju/wiki/Faster-LXD
>>
>> If you have your own techniques, or improvements to what I'm doing,
>> please update the article.
>>
>> - Menno
>>
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailm
>> an/listinfo/juju-dev
>>
>>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Faster LXD bootstraps and provisioning

2016-08-15 Thread Menno Smits
Good catch Casey. I've just updated the config in the gist to allow access
to any mirror or PPA (in a cleaner way than in the blog article IMO). It
seems to work well (apt-get download is a nice way to test).


On 16 August 2016 at 09:27, Casey Marshall <casey.marsh...@canonical.com>
wrote:

> Menno,
> This is great and thanks for sharing!
>
> In case anyone else runs into this.. charms that install from PPAs will
> fail with this squid-deb-proxy setup. You'll need to allow archive mirrors
> for this to work. See https://1337.tips/ubuntu-cache-packages-using-squid-
> deb-proxy/ for an example.
>
> On Mon, Aug 15, 2016 at 9:31 AM, Rafael Gonzalez <
> rafael.gonza...@canonical.com> wrote:
>
>> Hi Menno,
>>
>> Thanks for putting this together, great tips.  I recently ran into an
>> issue which others could see as well.
>>
>> One may need to adjust the following for large bundle deployments on
>> LXD.  A bundle deployment fails with errors about "Too many files open."
>>  This will increase number of max open files:
>>
>> echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf
>> && sudo sysctl -p
>>
>>
>> Regards,
>>
>> Rafael O. Gonzalez
>> Canonical, Solutions Architect
>> rgo...@canonical.com
>> 1-646-481-7232
>>
>>
>>
>> On Sun, Aug 14, 2016 at 8:07 PM, Menno Smits <menno.sm...@canonical.com>
>> wrote:
>>
>>> I've put together a few tips on the wiki for speeding up bootstrap and
>>> provisioning times when using the Juju lxd provider. I find these
>>> techniques helpful when checking my work or investigating bugs - situations
>>> where you end up bootstrapping and deploying many times.
>>>
>>> https://github.com/juju/juju/wiki/Faster-LXD
>>>
>>> If you have your own techniques, or improvements to what I'm doing,
>>> please update the article.
>>>
>>> - Menno
>>>
>>> --
>>> Juju-dev mailing list
>>> Juju-dev@lists.ubuntu.com
>>> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailm
>>> an/listinfo/juju-dev
>>>
>>>
>>
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at: https://lists.ubuntu.com/mailm
>> an/listinfo/juju-dev
>>
>>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Better handling of MongoDB disconnects due to new replicaset members

2016-08-14 Thread Menno Smits
Just to round out this thread, this issue has now been dealt with (thanks
Tim!). The server now translates these errors into a more useful error that
indicates the client should retry the request. The Juju command line
client transparently intercepts these errors and retries.

Here's the relevant pull request: https://github.com/juju/juju/pull/5927


On 26 July 2016 at 14:42, Reed O'Brien <reed.obr...@canonical.com> wrote:

> On Mon, Jul 25, 2016 at 5:38 PM, Menno Smits <menno.sm...@canonical.com>
> wrote:
>
>> Regarding https://bugs.launchpad.net/juju-core/+bug/1597601 ...
>>
>> When "juju enable-ha" is used, new controller machines are started, each
>> running a mongod instance which is connected to Juju's replicaset. As each
>> new node joins the replicaset a MongoDB leader election is triggered which
>> causes all mongod instances in the replicaset to drop their connections
>> (this is by design). The workers in the Juju's machine agents handle this
>> correctly by aborting and restarting with fresh connections to MongoDB.
>>
>> The problem is that if an API request comes in at just the right time, it
>> will be actioned just as the MongoDB connection goes down, resulting in the
>> i/o timeout error being reported back to the client.
>>
>> This isn't a new problem but it's one that Juju's users regularly run
>> into. A workaround is to wait for the new controller machines to come up
>> after enable-ha is issued before doing anything else.
>>
>> IMHO it would be best if Juju could hide all this from the client as much
>> as possible but I'm really not sure if that's feasible or what the best
>> approach should be.
>>
>> The challenge is that unless we do some major rearchitecting, the API
>> server needs to be restarted when the MongoDB connections drop. There's no
>> way that the client's connection can stay up, making it difficult to
>> hide this detail from the client.
>>
>
> It seems that mgo could handle this as a failover. Or that we could see
> that the replica set is starting and wait until it reports being up, then
> refresh the mgo session. I don't understand why the API server itself has
> to restart, though I am sure there are good reasons.
>
>
>>
>> The most practical solution I can think of is that we introduce a new
>> error type over the API which means "please retry the request". Errors such
>> as an i/o timeout from the MongoDB layer could be converted into this
>> error. Clients would obviously have to handle this error specially.
>>
>
> Barring handling it via mgo session this seems obvious and practical.
>
>
> ~ro
>
> --
> Reed O'Brien
> ✉ reed.obr...@canonical.com
> ✆ 415-562-6797
>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Faster LXD bootstraps and provisioning

2016-08-14 Thread Menno Smits
I've put together a few tips on the wiki for speeding up bootstrap and
provisioning times when using the Juju lxd provider. I find these
techniques helpful when checking my work or investigating bugs - situations
where you end up bootstrapping and deploying many times.

https://github.com/juju/juju/wiki/Faster-LXD

If you have your own techniques, or improvements to what I'm doing, please
update the article.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Let's talk retries

2016-08-08 Thread Menno Smits
On 9 August 2016 at 12:22, Katherine Cox-Buday <
katherine.cox-bu...@canonical.com> wrote:

>
> To implement this, I do something like this:
>
> args := retry.CallArgs{
> Func: func() error {
> // Body of my loop
> },
> BackoffFunc: func(delay time.Duration, attempt int) time.Duration {
> if attempt == 1 {
> return delay
> }
> return delay * factor
> },
>
>
Note that for BackoffFunc there is already one canned function in the retry
package for the common case of doubling the retry delay (DoubleDelay). It
might be nice to add a few more standard backoff functions and/or a factory
for multiplicative delays like this to remove the need for a little bit of
boilerplate in every use of retry.Call.
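
For example, a factory along those lines might look something like this (a
sketch only, following the BackoffFunc signature shown above - nothing like
this exists in juju/retry yet):

// MultiplyDelay returns a BackoffFunc which scales the previous delay
// by factor on every attempt after the first.
func MultiplyDelay(factor int64) func(time.Duration, int) time.Duration {
    return func(delay time.Duration, attempt int) time.Duration {
        if attempt == 1 {
            return delay
        }
        return delay * time.Duration(factor)
    }
}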



> Functionally, what's in juju/retry is fine; and I can stuff anything I
> want into the function. It just feels a little odd to use in that I must
> put the body of my loop in a function, and I dislike that the CallArgs
> struct attempts to encapsulate a bunch of different scenarios.
>

BackoffTick has a certain elegance but I still prefer retry.Call because it
makes it harder to get things wrong. The CallArgs struct helps to remind
the developer of the things they should be thinking about whereas with
BackoffTick it's easy to forget about some aspect (e.g. the need for a
cancellation channel).

retry.Call also has the advantage of already being able to do the job and
already being in use.


> Also, is it on the roadmap to address the inconsitant retries throughout
> Juju? I almost used utils.BackoffTimer until I started looking deeper. I'm
> going to submit a PR to flag it as deprecated.
>

That would be good to do IMO.

Cheers,
Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: hub github helper

2016-08-02 Thread Menno Smits
+1

I've been using hub for a while now to make it easy to grab other people's
pull requests. It's great.

Like Nate, I also prefer to keep hub separate from git so I also ignore the
install suggestion from the hub team.


On 3 August 2016 at 07:56, Rick Harding  wrote:

> Thanks Nate, that's really useful info and Hub makes it easy to get at
> other folk's repos/forks of Juju to really collaborate, look at code that's
> WIP and such.
>
> I highly recommend folks take a peek and see how it can improve their
> collaboration and workflows. Especially when reviewing and QA'ing pull
> requests from folks.
>
> On Tue, Aug 2, 2016 at 12:08 PM Nate Finch 
> wrote:
>
>> I've mentioned this before, but with some of our new code review
>> guidelines, I figured it's good to reiterate.  Github has a CLI tool that
>> helps with doing git-related things with github.  It's called hub. It's
>> written in Go, so installing it is as easy as go get
>> github.com/github/hub
>>
>> Github recommends making an alias to have hub replace git, since it
>> forwards everything to git that it doesn't understand.  Honestly, I don't
>> really see any benefit to that.  I prefer to understand what git is doing
>> versus what hub is doing.
>>
>> It can do a whole bunch of stuff, but there are two things I use it for
>> the most - checking out PRs and making PRs.
>>
>> Since we're supposed to be doing manual testing on people's PRs when we
>> review them, we need a way to do that.  With hub it's one command:
>>
>> hub checkout 
>>
>> so, for example:
>>
>> hub checkout https://github.com/juju/juju/pull/5915
>>
>> Bam, your local branch is set to a copy of the PR (don't forget to run
>> godeps).
>>
>> To make a PR from the CLI using hub, make sure the repo you want to PR
>> against is the git remote called origin, then you can make a PR with your
>> current branch by just doing
>>
>> hub pull-request
>>
>> This will open an editor to write the PR message, or you can use -m just
>> like with git commit.
>>
>> -Nate
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at:
>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Small script to connect to Juju's mongo in LXD

2016-07-27 Thread Menno Smits
Small correction: http://paste.ubuntu.com/21232706/

Serves me right for cleaning it up a little before sending.

On 28 July 2016 at 15:11, Menno Smits <menno.sm...@canonical.com> wrote:

> Nice. I'd suggest 2 things:
>
>- Change the mongo command line to include `--authenticationDatabase
>admin` and then change the database being connected to in the URL to "juju"
>instead of "admin". That way you get dropped straight into the juju
>database which is usually what you want.
>- Quote $PASSWORD, just in case.
>
> I have a similar thing which works for controllers on any cloud type but
> it's pretty awful. FWIW here it is: http://paste.ubuntu.com/21232100/
>
> It currently only works for xenial machines and adds a PPA to get a
> MongoDB 3.2 client onto the host. The 2.6 client which ships with Xenial
> can't connect to a MongoDB 3.2 instance (I think something was being sorted
> out for this). It wouldn't be hard to have it install
> mongodb-clients for non-xenial machines.
>
> - Menno
>
>
>
> On 28 July 2016 at 14:46, Andrew Wilkins <andrew.wilk...@canonical.com>
> wrote:
>
>> On Thu, Jul 28, 2016 at 12:32 AM John Meinel <j...@arbash-meinel.com>
>> wrote:
>>
>>> Did you intend to attach the script to the email? It does sound like
>>> something useful. I know when we were investigating at some client sites we
>>> had a small snippet of a bash function to dig the content out of agent.conf
>>> and launch mongo with the right options. It would be nice to have that in a
>>> more official place so it doesn't get forgotten.
>>>
>>
>> Kapil wrote a plugin for inspecting Mongo:
>> https://github.com/kapilt/juju-dbinspect. It's almost certainly broken
>> in Juju 2.0. I've found it handy in the past, it'd be good to have that
>> brought up to date.
>>
>> Cheers,
>> Andrew
>>
>>
>>> John
>>> =:->
>>>
>>>
>>> On Wed, Jul 27, 2016 at 6:19 PM, Katherine Cox-Buday <
>>> katherine.cox-bu...@canonical.com> wrote:
>>>
>>>> I frequently need to connect to Juju's Mongo instance to poke around
>>>> and see if something I've done is having the desired effect. Back when we
>>>> were using LXC, I had a script that would pull the password from agent.conf
>>>> and open a shell. When we switched to LXD my script broke, and I never
>>>> updated it. I finally got frustrated enough to modify[1] it, and thought
>>>> others might find this useful for poking around Mongo.
>>>>
>>>> Let me know if you have any suggestions.
>>>>
>>>> --
>>>> Katherine
>>>>
>>>> [1] - http://pastebin.ubuntu.com/21155985/
>>>>
>>>> --
>>>> Juju-dev mailing list
>>>> Juju-dev@lists.ubuntu.com
>>>> Modify settings or unsubscribe at:
>>>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>>>
>>>
>>> --
>>> Juju-dev mailing list
>>> Juju-dev@lists.ubuntu.com
>>> Modify settings or unsubscribe at:
>>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>>
>>
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at:
>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>
>>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Small script to connect to Juju's mongo in LXD

2016-07-27 Thread Menno Smits
Nice. I'd suggest 2 things:

   - Change the mongo command line to include `--authenticationDatabase
   admin` and then change the database being connected to in the URL to "juju"
   instead of "admin". That way you get dropped straight into the juju
   database which is usually what you want.
   - Quote $PASSWORD, just in case.
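
Putting both suggestions together, the command line ends up looking roughly
like this (assuming a typical Juju 2.x controller, where mongod listens on
port 37017 and the user and password come from the machine agent's
agent.conf):

mongo --ssl --sslAllowInvalidCertificates \
    --authenticationDatabase admin \
    -u "$USER" -p "$PASSWORD" localhost:37017/juju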

I have a similar thing which works for controllers on any cloud type but
it's pretty awful. FWIW here it is: http://paste.ubuntu.com/21232100/

It currently only works for xenial machines and adds a PPA to get a MongoDB
3.2 client onto the host. The 2.6 client which ships with Xenial can't
connect to a MongoDB 3.2 instance (I think something was being sorted out
for this). It wouldn't be hard to have it install mongodb-clients
for non-xenial machines.

- Menno



On 28 July 2016 at 14:46, Andrew Wilkins 
wrote:

> On Thu, Jul 28, 2016 at 12:32 AM John Meinel 
> wrote:
>
>> Did you intend to attach the script to the email? It does sound like
>> something useful. I know when we were investigating at some client sites we
>> had a small snippet of a bash function to dig the content out of agent.conf
>> and launch mongo with the right options. It would be nice to have that in a
>> more official place so it doesn't get forgotten.
>>
>
> Kapil wrote a plugin for inspecting Mongo:
> https://github.com/kapilt/juju-dbinspect. It's almost certainly broken in
> Juju 2.0. I've found it handy in the past, it'd be good to have that
> brought up to date.
>
> Cheers,
> Andrew
>
>
>> John
>> =:->
>>
>>
>> On Wed, Jul 27, 2016 at 6:19 PM, Katherine Cox-Buday <
>> katherine.cox-bu...@canonical.com> wrote:
>>
>>> I frequently need to connect to Juju's Mongo instance to poke around and
>>> see if something I've done is having the desired effect. Back when we were
>>> using LXC, I had a script that would pull the password from agent.conf and
>>> open a shell. When we switched to LXD my script broke, and I never updated
>>> it. I finally got frustrated enough to modify[1] it, and thought others
>>> might find this useful for poking around Mongo.
>>>
>>> Let me know if you have any suggestions.
>>>
>>> --
>>> Katherine
>>>
>>> [1] - http://pastebin.ubuntu.com/21155985/
>>>
>>> --
>>> Juju-dev mailing list
>>> Juju-dev@lists.ubuntu.com
>>> Modify settings or unsubscribe at:
>>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>>
>>
>> --
>> Juju-dev mailing list
>> Juju-dev@lists.ubuntu.com
>> Modify settings or unsubscribe at:
>> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>>
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Proposed addition to code review guidelines

2016-07-25 Thread Menno Smits
+1. This seems like an obvious addition to the checklist.

On 20 July 2016 at 14:26, Tim Penhey  wrote:

> Hi folks,
>
> With model migration entering the world as a first class citizen, we need
> to ensure that all fields and documents added to state have appropriate
> migrations done for them.
>
> In particular when adding a field it is likely that the
> state/migrations_internal_test.go will have a failing test as it checks the
> public fields of the documents.
>
> We should add a code review check that ensures that no fields are added to
> that test without the appropriate additions to state/migration_export.go
> and state/migration_import.go otherwise it is just a lie, and the migration
> result will not reflect the new database state.
>
> Tim
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Better handling of MongoDB disconnects due to new replicaset members

2016-07-25 Thread Menno Smits
Regarding https://bugs.launchpad.net/juju-core/+bug/1597601 ...

When "juju enable-ha" is used, new controller machines are started, each
running a mongod instance which is connected to Juju's replicaset. As each
new node joins the replicaset a MongoDB leader election is triggered which
causes all mongod instances in the replicaset to drop their connections
(this is by design). The workers in the Juju's machine agents handle this
correctly by aborting and restarting with fresh connections to MongoDB.

The problem is that if an API request comes in at just the right time, it
will be actioned just as the MongoDB connection goes down, resulting in the
i/o timeout error being reported back to the client.

This isn't a new problem but it's one that Juju's users regularly run
into. A workaround is to wait for the new controller machines to come up
after enable-ha is issued before doing anything else.

IMHO it would be best if Juju could hide all this from the client as much
as possible but I'm really not sure if that's feasible or what the best
approach should be.

The challenge is that unless we do some major rearchitecting, the API
server needs to be restarted when the MongoDB connections drop. There's no
way that the client's connection can stay up, making it difficult to
hide this detail from the client.

The most practical solution I can think of is that we introduce a new error
type over the API which means "please retry the request". Errors such as an
i/o timeout from the MongoDB layer could be converted into this error.
Clients would obviously have to handle this error specially.
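
To sketch what that might look like on the client side (all of the names
here are invented for illustration - none of this exists yet):

// Retry API calls which fail with the hypothetical "please retry" error.
for attempt := 0; attempt < maxRetries; attempt++ {
    err = conn.APICall("Client", 1, "", "FullStatus", args, &result)
    if err == nil || !params.IsCodeRetry(err) {
        break
    }
    time.Sleep(retryDelay) // give the API server time to restart
}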

Does anyone have another idea?

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Nasty MongoDB upsert behaviour

2016-06-27 Thread Menno Smits
Christian and I have just learned about an unfortunate property of upserts
in MongoDB. It's documented, but unexpected, and I doubt many people know
about it. From the docs:

> With a unique index, if multiple applications issue the same update with
> upsert: true, exactly one update() would successfully insert a new document.
>
> The remaining operations would either:
> * update the newly inserted document, or
> * fail when they attempted to insert a duplicate.
>
> If the operation fails because of a duplicate index key error, *applications
> may retry the operation* which will succeed as an update operation.

(https://docs.mongodb.com/v3.2/reference/method/db.collection.update/#use-unique-indexes)

This means that an upsert may result in either a new document being
inserted, an existing document being updated, or a duplicate key error
being returned. If the duplicate key error happens it's up to the client to
retry. Yuck!
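
In practice that means wrapping upserts in a retry loop along these lines
(a sketch using mgo; the collection, id and fields are illustrative):

// Retry the upsert until it succeeds or fails with an error other
// than a duplicate key violation.
for {
    _, err := coll.UpsertId(id, bson.M{"$set": fields})
    if err == nil {
        return nil
    }
    if !mgo.IsDup(err) {
        return err
    }
    // Another client's upsert inserted the document first; retrying
    // turns this call into a plain update.
}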

These semantics exist in all versions of MongoDB that we use with Juju but
under 3.2 with WiredTiger duplicate key violations with upserts seem to be
happening a lot more often. We've been seeing them occasionally with
upserts into txns.stash that mgo/txn does.

Christian and I are about to propose a PR for mgo/txn which deals with this
issue but a check of the Juju state package shows a number of other uses of
upserts which also need to be changed. The most important of these is
probably the upserts for the sequence collection. Christian and I will look
at these next.

Please keep this in mind when interacting with MongoDB.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Cleansing Mongo data

2016-06-23 Thread Menno Smits
Thanks, this is really useful - especially when writing data into the
database that comes from sources that the code doing the writing doesn't
have control over.

Two little things:

1. The docstring for EscapeKeys still mentions statusDoc.
2. Are you sure this needs to be in its own package, especially one called
"utils"? Given we already have the widely used github.com/juju/utils - as
well as others with that name under juju/juju - this one is predestined to
be aliased everywhere it's imported. Couldn't these escaping functions just
live in their own file in github.com/juju/juju/mongo? Even when the import
isn't aliased, the intent of "mongo.EscapeKeys(...)" is clearer than
"utils.EscapeKeys(...)".

- Menno


On 24 June 2016 at 08:09, Katherine Cox-Buday <
katherine.cox-bu...@canonical.com> wrote:

> Hey all,
>
> William gave me a good review and it came up that I wasn't cleansing some
> of the data being placed in Mongo. I wasn't aware this had to be done, and
> after talking to a few other folks it became apparent that maybe not many
> people know we should be doing this.
>
> At any rate, William also pointed me to some existing code which did this.
> I've pulled it out into the mongo/utils package for general consumption.
> The comments do a pretty good job of elucidating why this is necessary.
>
> https://github.com/juju/juju/blob/master/mongo/utils/data_cleansing.go
>
> -
> Katherine
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Natural sorting helper

2016-06-22 Thread Menno Smits
Hi everyone,

Earlier this week I needed to be able to sort a slice containing unit names
and machine ids in a way that would make sense to a human. For example, a
conventional string sort would order a list of machine ids like this:

0
10
3
3/lxd/1
3/lxd/10
3/lxd/11
3/lxd/2
4

when what I really wanted was:

0
3
3/lxd/1
3/lxd/2
3/lxd/10
3/lxd/11
4
10

Tim pointed me at something that Anastasia had already done for formatting
the output from some Juju CLI commands. This was close so I extracted it to
github.com/juju/utils and generalised it. An in-place sort can be performed
like this:

utils.SortStringsNaturally(someSliceOfStrings)

The implementation is here:

https://github.com/juju/utils/blob/master/naturalsort.go
https://github.com/juju/utils/blob/master/naturalsort_test.go

Consider using it if your code needs to sort machine ids, unit names, tag
strings, IP addresses and any other slice of strings which contain sections
of digits.
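
For example, a minimal program using the machine ids from above:

    package main

    import (
        "fmt"

        "github.com/juju/utils"
    )

    func main() {
        ids := []string{"0", "10", "3", "3/lxd/1", "3/lxd/10",
            "3/lxd/11", "3/lxd/2", "4"}
        utils.SortStringsNaturally(ids)
        fmt.Println(ids) // [0 3 3/lxd/1 3/lxd/2 3/lxd/10 3/lxd/11 4 10]
    }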

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Automatic commit squashing

2016-06-15 Thread Menno Smits
Hi everyone,

Following on from the recent thread about commit squashing and commit
message quality, the idea of automatically squashing commits at merge time
has been raised. The idea is that the merge bot would automatically squash
commits for a pull request into a single commit, using the PR description
as the commit message.

With this in place, developers can commit locally using any approach they
prefer. The smaller commits they make as they work won't be part of the
history the team interacts with in master.

When using autosquashing, the quality of pull request descriptions should
get even more scrutiny during reviews. The quality of PR descriptions is
already important as they are used for merge commits but with autosquashing
in place they will be the *only* commit message.

Autosquashing can be achieved technically by either having the merge bot do
the squashing itself, or by taking advantage of Github's feature to do this
(currently in preview mode):

https://developer.github.com/changes/2016-04-01-squash-api-preview/

We need to ensure that the squashed commits are attributed to the correct
author (i.e. not jujubot). I'm not sure what we do with pull requests which
contain work from multiple authors. There doesn't seem to be an established
approach for this.

Thoughts?

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: A cautionary tale - mgo asserts

2016-06-08 Thread Menno Smits
On 8 June 2016 at 22:36, John Meinel  wrote:

> ...
>>
>
>
>>
>>   ops := []txn.Op{{
>>   C: "collection",
>>   Id: ...,
>>   Assert: bson.M{
>>   "some-field.A": "foo",
>>   "some-field.B": 99,
>>   },
>>   Update: ...
>>   }
>>
>> ...
>>
>>
> If loading into a bson.M is the problem, wouldn't using a bson.M to start
> with also be a problem?
>

No, this is fine. The assert above specifies that each field should match the
values given. Each field is checked separately - order doesn't matter.

This would be a problem though:

  ops := []txn.Op{{
      C: "collection",
      Id: ...,
      Assert: bson.M{"some-field": bson.M{
          "A": "foo",
          "B": 99,
      }},
      Update: ...,
  }}

In this case, mgo is being asked to assert that some-field is an embedded
document equal to a document defined by the bson.M{"A": "foo", "B": 99}
map. This is what happens now when you provide a struct value to compare
against a field, because the struct gets round-tripped through bson.M.
That bson.M eventually gets converted to actual bson and sent to MongoDB,
but you have no control over the field ordering that will ultimately be
used.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: A cautionary tale - mgo asserts

2016-06-08 Thread Menno Smits
On 9 June 2016 at 03:44, Gustavo Niemeyer 
wrote:

> Is it mgo/txn that is internally unmarshalling onto that?
>
> Let's get that fixed at its heart.
>

That would be ideal. The root of the problem is that the Assert, Insert and
Update fields of txn.Op are of type interface{} and the bson unmarshalling
uses bson.M for these. This means when a transaction is loaded from the
txns collection the contents of these fields are loaded into bson.M and
field ordering is lost.

It looks trivial to change the bson unmarshalling code to default to bson.D
but naively changing this will likely break existing users of the bson
package. That's probably not the right solution here. Perhaps transactions
which are written to/loaded from the database by mgo/txn should use a
private txn.Op analogue where Assert, Insert and Update are bson.D instead
of interface{}?
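
Something like this, perhaps (a sketch only - the field names and bson
tags are illustrative, not mgo/txn's actual schema):

    // opDoc would be used only for (un)marshalling transactions to and
    // from the txns collection. With bson.D instead of interface{},
    // field ordering in Assert/Insert/Update survives the round trip.
    type opDoc struct {
        C      string      `bson:"c"`
        Id     interface{} `bson:"d"`
        Assert bson.D      `bson:"a,omitempty"`
        Insert bson.D      `bson:"i,omitempty"`
        Update bson.D      `bson:"u,omitempty"`
    }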

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: A cautionary tale - mgo asserts

2016-06-08 Thread Menno Smits
On 8 June 2016 at 21:05, Tim Penhey  wrote:

> Hi folks,
>
> tl;dr: do not use structs in transaction asserts
>
> ...
>
> The solution is to not use field struct equality, even though it is easy
> to write, but to use the dotted field notation to check the embedded field
> values.
>


To give a more concrete example, asserting on an embedded document field
like this is problematic:

  ops := []txn.Op{{
      C: "collection",
      Id: ...,
      Assert: bson.D{{"some-field", Thing{A: "foo", B: 99}}},
      Update: ...,
  }}

Due to the way mgo works[1], the document the transaction operation is
asserting against may have been written with A and B in reverse order, or
the Thing struct in the Assert may have A and B swapped by the time it's
used. Either way, the assertion will fail randomly.

The correct approach is to express the assertion like this:

  ops := []txn.Op{{
      C: "collection",
      Id: ...,
      Assert: bson.D{
          {"some-field.A", "foo"},
          {"some-field.B", 99},
      },
      Update: ...,
  }}

or this:

  ops := []txn.Op{{
      C: "collection",
      Id: ...,
      Assert: bson.M{
          "some-field.A": "foo",
          "some-field.B": 99,
      },
      Update: ...,
  }}


> Yet another thing to add to the list of things to check when doing reviews.


I think we can go a bit further and error out on attempts to use structs for
comparison in txn.Op asserts in Juju's txn layers in state. Just as we
already do some munging and checking of database operations to ensure
correct multi-model behaviour, we should be able to do the same for this
issue and prevent it from happening again. A rough sketch of the idea is
below.
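
Here's a rough sketch of such a guard - the names are illustrative, not
Juju's actual state internals, and real code would need to whitelist
struct types that are safe to compare (e.g. time.Time):

    // checkAssert rejects asserts that compare whole struct values,
    // forcing dotted-field assertions instead.
    func checkAssert(assert interface{}) error {
        switch a := assert.(type) {
        case bson.D:
            for _, item := range a {
                if reflect.ValueOf(item.Value).Kind() == reflect.Struct {
                    return fmt.Errorf("struct comparison in assert on %q", item.Name)
                }
            }
        case bson.M:
            for name, value := range a {
                if reflect.ValueOf(value).Kind() == reflect.Struct {
                    return fmt.Errorf("struct comparison in assert on %q", name)
                }
            }
        }
        return nil
    }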

- Menno

[1] If transaction operations are loaded and used from the DB (more likely
under load when multiple runners are acting concurrently), the Insert,
Update and Assert fields are loaded as bson.M (this is what the bson
Unmarshaller does for interface{} typed fields). Once this happens field
ordering is lost.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Reminder: write tests fail first

2016-05-04 Thread Menno Smits
Good catch.

On 5 May 2016 at 14:24, Andrew Wilkins  wrote:

> See: https://bugs.launchpad.net/juju-core/+bug/1578456
>
> Cheers,
> Andrew
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: auto-upgrading models

2016-04-25 Thread Menno Smits
On 26 April 2016 at 02:58, Katherine Cox-Buday <
katherine.cox-bu...@canonical.com> wrote:

> I like #4. I don't think we'd ever want to auto-upgrade all models,
> because having models be isolated from one another is kind of the point.
>

I agree. Remember that in some situations models may have completely
different owners.

#4 would be good to have so that the administrator for each model at least
knows that an upgrade is available.

A per-model flag which indicates that the model should automatically
upgrade when the controller does might be nice too (this is what #7 means, I
think?). This would be convenient when all the models for a controller are
owned and managed by the same people - especially for less critical
deployments.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: nice things coming in Go 1.7: toolchain improvements

2016-04-18 Thread Menno Smits
That is good news! I've been flipping back to Go 1.4 sometimes when doing
rapid code-test cycles due to the slower build times.


On 18 April 2016 at 15:59, Michael Hudson-Doyle <
michael.hud...@canonical.com> wrote:

> Hi all,
>
> I know we're all heads down for release, but I thought I'd report
> something interesting: the go linker in go tip / 1.7-to-be is so much
> faster than the one in 1.6 that it nearly halves the time to run "go
> test github.com/juju/juju/...", taking the wall clock time on this EC2
> instance I had set up from "28m1.069s" to "15m19.542s"! So that's
> something to look forward to :-)
>
> (Actually go tip chokes on the juju tests currently, I wound back a
> few days to make these numbers -- upstream bug report being written
> now)
>
> Cheers,
> mwh
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Dependency engine in the machine agent

2016-03-02 Thread Menno Smits
On 3 March 2016 at 10:54, Tim Penhey  wrote:

> Thanks Menno. That was very helpful.
>
> Apart from state connections, or api connections, what sort of resources
> are shared between workers?
>

Right now we have:

   - the agent manifold which outputs an Agent interface. This is used by
   many manifolds to access the agent's configuration.
   - the leadership manifold which outputs a leadership tracker for workers
   which need to participate in leader elections.
   - the statetracker manifold which emits a boolean indicating whether a
   machine agent should be a state server (this is consumed by the state
   manifold).
   - a few manifolds which are used for synchronisation across workers
   (e.g. signalling when upgrades are complete)

There will be more resources added, including those to affect worker
behaviour for model migrations.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Dependency engine in the machine agent

2016-03-02 Thread Menno Smits
Hi everyone,

One of the pieces of the puzzle required for model migrations was to get
all the workers in the machine agent running under the new dependency
engine framework. It will allow us to cleanly shut down workers so that a
model being migrated can settle to a stable, "read-only" mode. This has been a
major focus for team Onyx and Will.

If you're not sure what the dependency engine is all about, a good place to
start is Will's excellent documentation in the source tree:

https://github.com/juju/juju/blob/master/worker/dependency/doc.go

The tl;dr is that the dependency engine provides a framework where workers
are wrapped in a "manifold" which describes the resources the worker
depends on, a function to start the worker and (optionally) a function
which makes resources available to other manifolds/workers. In the context
of the dependency engine framework, a "resource" is any value that is
exported by a manifold for use by other manifolds.
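
As a rough sketch (based on doc.go - the resource names and worker
constructor are illustrative, and the exact signatures are best checked
against the framework source), a minimal manifold looks something like:

    // Manifold declares the worker's dependencies and how to start it.
    func Manifold() dependency.Manifold {
        return dependency.Manifold{
            // Inputs name the manifolds this worker depends on.
            Inputs: []string{"agent", "api-caller"},
            // Start is called once all inputs are available.
            Start: func(getResource dependency.GetResourceFunc) (worker.Worker, error) {
                var apiCaller base.APICaller
                if err := getResource("api-caller", &apiCaller); err != nil {
                    return nil, err
                }
                return newMyWorker(apiCaller) // hypothetical constructor
            },
        }
    }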

All the manifolds for an agent are registered with a dependency engine
which then starts, stops and restarts workers based on the availability of
their configured inputs (i.e. dependencies) and worker behaviour (i.e. when
they exit and the errors they exit with).

One nice property of the dependency engine framework is that it makes it
very clear what a worker depends on and how it starts. The machine agent
has become a complex hierarchy of runners and workers with sometimes
non-obvious interactions between them. When the machine agent was younger
and simpler the "nested runners" architecture was elegant and appropriate,
but as more demands have been put on the machine agent it's become almost
unmanageable. Fixing this wasn't the main driver for moving the machine
agent to the dependency engine framework but it's been a side benefit.

The machine agent dependency engine work has also meant that code to set up
and start specific workers has been moved out of the machine agent itself
to worker-specific packages, meaning everything to do with a worker is
located together, and the machine agent's implementation is becoming
clearer and more concise. Testing of workers and the agent has also become
more straightforward and more robust.

So where are things now? Before Onyx started doing this work on the machine
agent, the unit agent was already completely converted to use the
dependency engine for its workers - a unit agent runs far fewer workers so
it was a good place to start. Much of the conversion work for the machine
agent has already landed in master and there's several feature branches at
varying levels of completion that cover most of the remaining work.
Specifically the following areas have been converted:

   - a manifold which manages the API connection
   - most of the (many!) workers which depend on the API connection
   - the base workers which manage a State instance (state servers only)
   - some of the workers which depend on State

What does this mean for Juju developers? If you're adding a new worker to
the machine (or unit) agent you'll need to create a manifold for it and
integrate it with the agent's dependency engine. Similarly if you're making
changes to an existing worker, it's likely to have already been converted
so you'll need to understand how manifolds and the dependency engine work.

If you're not sure about something dependency engine related, please feel
free to ask Will, Jesse or me. There's an initial learning curve but it's
great once you get basic concepts straight. Aside from the docs (linked
above), there's also lots of examples to study. Any worker package you find
in the source with a manifold.go file has been converted.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: LXD support (maybe)

2016-02-25 Thread Menno Smits
On 26 February 2016 at 16:35, John Meinel  wrote:

> A fair point. But I'll note that we are trying to get go >= 1.5 into
> trusty back ports and use that for Juju 2.0
> We need something better than 1.2 that is in Trusty to build the lxd
> bindings at all, and we've wanted to move forward to not have to fight with
> all the dependencies that have already moved forward.
>
I'll happily move to a newer Go version when necessary. I'm currently using
1.3 as my main Go, but have 1.2 through 1.5 installed alongside for
troubleshooting version specific issues.


> I personally encourage us to use heterogeneous versions of go as much as
> we can. Because we should be compatible as much as possible. But it does
> look like our dependencies are going to force our hand.
>

Agreed. I think it's healthy for Juju's devs to be using a range of Go
versions (within reason). It helps to ensure we're not relying on
version-specific behaviour.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: LXD support (maybe)

2016-02-25 Thread Menno Smits
On 26 February 2016 at 11:41, Ian Booth  wrote:

> FWIW, go 1.6 works just fine with Juju on my system
>

My reason for not using Go 1.6 on my machine is that it's not what the
official Juju releases are built with (yet) and it's not what CI uses.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: LXD support (maybe)

2016-02-25 Thread Menno Smits
On 26 February 2016 at 04:59, Horacio Duran 
wrote:

> be aware though, iirc that ppa replaces your go version with 1.6 (or used
> to) which can mess your env if you are using go from ubuntu.
>

With a bit of apt configuration you can use the lxd stable PPA without
pulling in its Go 1.6 packages.

Here's what I did:

$ cat /etc/apt/preferences.d/lxd-stable-pin
Package:  *
Pin: release o=LP-PPA-ubuntu-lxc-lxd-stable
Pin-Priority: 200

Package: lxd lxd-tools lxd-client lxcfs lxc-templates lxc cgmanager
libcgmanager0 libseccomp2
Pin: release o=LP-PPA-ubuntu-lxc-lxd-stable
Pin-Priority: 500

The main problem with this approach is that you have to explicitly specify
the package names you do want to use, which will be a problem if package
names change or extra packages are added. Maybe someone with more apt foo
than me knows a better way.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: "environment" vs "model" in the code

2016-01-17 Thread Menno Smits
+1 to what Roger said. New features always require changes to existing code
so inconsistency is unavoidable if we take a piecemeal approach.

Given that a big rename is planned at some point, and that renaming can be
largely automated, continuing to use "environment" internally until the big
rename happens may make more sense in terms of maintainability.

Thoughts?



On 15 January 2016 at 21:05, roger peppe <roger.pe...@canonical.com> wrote:

> On 15 January 2016 at 06:03, Ian Booth <ian.bo...@canonical.com> wrote:
> >
> >
> > On 15/01/16 10:16, Menno Smits wrote:
> >> Hi all,
> >>
> >> We've committed to renaming "environment" to "model" in Juju's CLI and
> API
> >> but what do we want to do in Juju's internals? I'm currently adding
> >> significant new model/environment related functionality to the state
> >> package which includes adding new database collections, structs and
> >> functions which could include either "env/environment" or "model" in
> their
> >> names.
> >>
> >> One approach could be that we only use the word "model" at the edges -
> the
> >> CLI, API and GUI - and continue to use "environment" internally. That
> way
> >> the naming of environment related things in most of Juju's code and
> >> database stays consistent.
> >>
> >> Another approach is to use "model" for new work[1] with a hope that it'll
> >> eventually become the dominant name for the concept. This will however
> >> result in a long period of widespread inconsistency, and it's unlikely
> >> that we'll ever completely get rid of all uses of "environment".
> >>
> >> I think we need arrive at some sort of consensus on the way to tackle
> this.
> >> FWIW, I prefer the former approach. Having good, consistent names for
> >> things is important[2].
> >>
> >
> > Using "model" for new work is the correct approach - new chunks of work
> will be
> > internally consistent with the use of their terminology. And we will be
> looking
> > to migrate existing internal code once we tackle the external facing
> stuff for
> > 2.0. We don't want to add to our tech debt and make our future selves
> sad by
> > introducing obsoleted terminology for new work.
>
> The other side of this coin is that, as Menno says, now the code base
> will be harder to read because it will be inconsistent throughout (and
> not consistently inconsistent either, because the new work is bound to
> cross domain boundaries).
>
> Given that it's not hard to make automated source code changes in Go
> (given gofmt, gorename, gofix etc), I wonder if doing it this way might
> just be making things harder for people maintaining the code without
> actually making things significantly easier in the long run.
>
>   cheers,
> rog.
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


"environment" vs "model" in the code

2016-01-14 Thread Menno Smits
Hi all,

We've committed to renaming "environment" to "model" in Juju's CLI and API
but what do we want to do in Juju's internals? I'm currently adding
significant new model/environment related functionality to the state
package which includes adding new database collections, structs and
functions which could include either "env/environment" or "model" in their
names.

One approach could be that we only use the word "model" at the edges - the
CLI, API and GUI - and continue to use "environment" internally. That way
the naming of environment related things in most of Juju's code and
database stays consistent.

Another approach is to use "model" for new work[1] with a hope that it'll
eventually become the dominant name for the concept. This will however
result in a long period of widespread inconsistency, and it's unlikely that
we'll ever completely get rid of all uses of "environment".

I think we need to arrive at some sort of consensus on the way to tackle this.
FWIW, I prefer the former approach. Having good, consistent names for
things is important[2].

Thoughts?

- Menno

[1] - but what defines "new" and what do we do when making significant
changes to existing code?
[2] - http://martinfowler.com/bliki/TwoHardThings.html
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: New feature branch MADE-workers

2016-01-13 Thread Menno Smits
Jesse: you'll need to close your existing PRs with Ship Its and retarget
them to MADE-workers and merge them. No need to get them re-reviewed
obviously.

On 14 January 2016 at 10:50, Tim Penhey  wrote:

> Hi all,
>
> Since we are holding back the machine-dep-engine feature branch from
> master until the first 2.0 alpha, we needed a place to merge in the
> worker changes without impacting the blessedness of the
> machine-dep-engine branch.
>
> This feature branch should just hold the related manifold worker changes
> over and above the machine-dep-engine changes.
>
> Tim
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Making logging to MongoDB the default

2015-10-22 Thread Menno Smits
I implemented the ability for Juju's logs to go to MongoDB some time ago.
This feature is available in 1.25 behind a feature flag and also gets
enabled automatically if the "jes" feature flag is enabled (logging via
rsyslog doesn't make much sense when using multiple environments in the one
system).

There's also a separate feature flag which turns off rsyslog-based
logging.

The "jes" feature flag will be removed soon meaning that database logging
will need to become default functionality. The question is: do we then also
remove the rsyslog logging functionality, or perhaps reverse the sense of
the rsyslog feature flag so that it's off by default but can be enabled if
people want it for some reason?

I'm confident that the logging to the database feature is solid. I spent a
lot of time confirming that performance wouldn't be an issue. The code is
well tested. Automatic log rotation is implemented.

The main issue I can see is that once rsyslog based logging is turned off
we lose the all-machines.log file which some people and systems no doubt
rely on. The logs for an environment can of course still be retrieved using
the "juju debug-log" command.

If we really want something like all-machines.log, it wouldn't be /too/
hard to implement a worker which generates an all-machines.log style file
in real-time from the database logs. All the pieces to implement that
already exist, but I really don't have the bandwidth this cycle to do the
work.

Thoughts?

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Making logging to MongoDB the default

2015-10-22 Thread Menno Smits
On 23 October 2015 at 00:50, Marco Ceppi  wrote:

> How do current customers, like dtag, who are very keen on having
> everything goto syslog/rsyslog, going to keep that functionality going
> forward? Is there an easy mechanism to get the logs out of mongodb?
>
> All too often when I'm rummaging around in debug-hooks, I'll tail
> /var/log/juju/unit-*.log to get an idea of what to expect or see what
> failed. Does that workflow change with this feature?
>
>
Sorry, I should have been clearer. This change doesn't affect the existing
unit-*.log and machine-*.log files. It's only about the streaming of logs
to the API servers. The workflow you describe will continue to work.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Making logging to MongoDB the default

2015-10-22 Thread Menno Smits
On 23 October 2015 at 00:50, Marco Ceppi  wrote:

> How do current customers, like dtag, who are very keen on having
> everything goto syslog/rsyslog, going to keep that functionality going
> forward? Is there an easy mechanism to get the logs out of mongodb?
>

AFAIK we've never supported sending Juju's logs to an external syslog so
nothing changes for existing users there. The logging-to-MongoDB change
doesn't affect any use of rsyslog by services deployed by Juju - this is
only about Juju's own logs.

The work done for the logging-to-MongoDB feature does pave the way for
sending Juju's logs to external logging systems but we haven't taken it
that far yet.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Making logging to MongoDB the default

2015-10-22 Thread Menno Smits
On 22 October 2015 at 23:04, Adam Collard 
wrote:

> On Thu, 22 Oct 2015 at 10:39 roger peppe 
> wrote:
>
>> I have one objection to the debug-log command - it doesn't
>> appear to be possible to get the log up until the current time
>> without blocking to wait for more messages when it gets
>> to the end. So it's not quite a substitute because I can't easily
>> grep the log without interrupting the grep command.
>>
>
> FWIW this is
> https://bugs.launchpad.net/juju-core/+bug/1390585
>

This should be quite easy to fix. Onyx will deal with this too.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Making logging to MongoDB the default

2015-10-22 Thread Menno Smits
On 23 October 2015 at 04:17, Nate Finch  wrote:

> IMO, all-machines.log is a bad idea anyway (it duplicates what's in the
> log files already, and makes it very likely that the state machines will
> run out of disk space, since they're potentially aggregating hundreds or
> thousands of machines' logs, not to mention adding a lot of network
> overhead). I'd be happy to see it go away.
>

Collecting the logs in a central place provides a single audit log and
makes it possible to get the big picture when investigating why something
happened. It also means we still have logs even when machines have been
decommissioned.


> However, I am not convinced that dropping text file logs in general is a
> good idea, so I'd love to hear what we're gaining by putting logs in Mongo.
>

The machine-*.log and unit-*.log files will remain as they are now. This
change only affects the way logs get to the Juju controllers and how
they're stored there.

The main driver for putting the logs into the database was so we can
cleanly separate logs for different environments running under a single
controller. This is difficult to achieve reliably with rsyslog. There are
other benefits too: more powerful filtering of logs, faster log queries
(via indexes), allowing for structured log data to be emitted.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Making logging to MongoDB the default

2015-10-22 Thread Menno Smits
On 23 October 2015 at 05:02, Stuart Bishop 
wrote:

>
> I'm looking forward to having access to them in a structured format so
> I can generate logs, reports and displays the way I like rather than
> dealing with the hard to parse strings in the text logs. 'juju
> debug-logs [--from ts] [--until ts] [-F] [--format=json]' would keep
> me quite happy and I can filter, format, interleave and colorize the
> output to my hearts content. I can even generate all-machines.log if I
> feel like a headache ;)
>

Output of structured log data is now very possible. Someone just needs to
make the API server and client changes to support it. I don't think it's
something that Onyx can commit to for this cycle but it might end up being
one of those things that happens over the course of a few Friday afternoons.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Renaming of environments

2015-10-22 Thread Menno Smits
 While working on the environment migrations spec I noticed that it is
currently not possible to change the name of a Juju environment once it has
been created. This creates an unfortunate corner case in the context of
environment migrations, where you might have an environment that can't be
migrated to another controller because you've already created another
(unrelated) environment with the same name on the target controller.

Rick pointed out that it would also be nice to be able to rename an
environment when its purpose has changed. For example, you might have
created an environment called "test" which you build up and end up using
for production purposes. At that point the environment name doesn't make
much sense.

We will fix this. The rename itself is fairly easy to implement but
environment names have also been used as part of things such as EC2 and
Openstack security group names so this will need to change too. It would be
better if the names of external environment-related resources used the
environment UUID instead. There is a card for this work in Onyx's backlog.

So just a heads up that this is a current weakness which will get addressed.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Running upgrade steps for units

2015-09-15 Thread Menno Smits
On 16 September 2015 at 08:41, Tim Penhey  wrote:

> On 15/09/15 19:38, William Reade wrote:
> > Having the machine agent run unit agent upgrade steps would be a Bad
> > Thing -- the unit agents are still actively running the old code at that
> > point. Stopping the unit agents and managing the upgrade purely from the
> > machine would be ok; but it feels like a lot of effort for very little
> > payoff, so I'm most inclined to WONTFIX it and spend the energy on agent
> > consolidation instead.
>
> This still leaves us with the problem of the two upgrade steps that were
> written to update the uniter state file, and how to handle this.
>

If the work that these upgrade steps did is fairly trivial we could have
the unit agents run a function which does the upgrade work as it comes up,
before workers are started. This might be an acceptable solution if we're
going to merge machine and unit agents soon[1] anyway.

I had thought it might be reasonably easy to get the upgrade machinery
working within the unit agent but now that I've looked at the code I can
see that it's a fairly major undertaking (to do it Right at least).

- Menno


[1] FSVO "soon"
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: workers using *state.State

2015-09-08 Thread Menno Smits
You missed another worker that needs updating: envWorkerManager. Its use of
*state.State is a little less obvious.

Ticket and card added: https://bugs.launchpad.net/juju-core/+bug/1493606

On 9 September 2015 at 12:43, Tim Penhey  wrote:

> On 09/09/15 12:36, Horacio Duran wrote:
> > There is lazy and there is also "I just based in that other worker"
> > which happens, I am the proud parent of statushistorypruner and a
> > rewrite is underway too, sorry.
>
> Don't get me wrong, lazy developers are generally good. We try to find
> the simplest thing that will work.
>
> Tim
>
>
> --
> Juju-dev mailing list
> Juju-dev@lists.ubuntu.com
> Modify settings or unsubscribe at:
> https://lists.ubuntu.com/mailman/listinfo/juju-dev
>
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: workers using *state.State

2015-09-08 Thread Menno Smits
On 9 September 2015 at 12:51, Tim Penhey <tim.pen...@canonical.com> wrote:

> On 09/09/15 12:47, Menno Smits wrote:
> > You missed another worker that needs updating: envWorkerManager. Its use
> > of *state.State is a little less obvious.
>
> I had left that one off because I thought it only had the state instance
> to pass on to other workers.
>
> But I guess it does need updating, so thank you.
>
> Tim
>
>
It also uses the State itself to call State.WatchEnvironments (which will
need to be exposed via the API as part of the clean up).
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: fork/exec ... unable to allocate memory

2015-06-03 Thread Menno Smits
This thread and the ticket linked by Michael got me curious about whether
we could write our own routine for spawning processes that doesn't invoke
the usual copy-on-write semantics.

The struct returned by exec.Command has a SysProcAttr field where you can
set (Linux-specific) flags to pass to the clone syscall. The CLONE_VM flag
looks promising but it seems to upset Go when you use it (fatal error:
runtime: stack growth during syscall). If CLONE_VM | CLONE_VFORK is used
the executable runs - I see the output from echo - but the call to Run
never returns. I'm not sure why given that with CLONE_VFORK the parent
process is supposed to unblock once the child calls execve (which it does).

I could poke at this further but I need to get back to other things.
Looking at https://github.com/golang/go/issues/5838 I'm not the only one
who's tried this and run into similar problems.

Another approach - and one that Go might do internally one day - is to tell
the kernel to not allow copy-on-write for all (or at least large) memory
blocks. Tweaking the allocations in Nate's example like this:

const GB = 1 << 30 // assuming GB as defined in the original example

bigs := make([][]byte, 6)
for i := range bigs {
    bigs[i] = make([]byte, GB)
    // Tell the kernel not to copy these pages into a forked child.
    syscall.Madvise(bigs[i], syscall.MADV_DONTFORK)
}

Allows the fork to work. It fails as before without the Madvise calls. This
isn't particularly practical for us but it's an interesting data point
anyway.

- Menno




On 4 June 2015 at 02:07, John Meinel j...@arbash-meinel.com wrote:

 Yeah, I'm pretty sure this machine is on 0 and we've just overcommitted
 enough that Linux is refusing to overcommit more. I'm pretty sure juju was
 at least at 2GB of pages, where 1G was in RAM and 1GB was in swap. And if
 we've already overcommitted to 9.7GB over 6.2GB linux probably decided that
 another 2GB was obvious overcommits that it would refuse.

 John
 =:-


 On Wed, Jun 3, 2015 at 5:32 PM, Gustavo Niemeyer gust...@niemeyer.net
 wrote:

 From https://www.kernel.org/doc/Documentation/vm/overcommit-accounting:

 The Linux kernel supports the following overcommit handling modes

 0-   Heuristic overcommit handling. Obvious overcommits of
  address space are refused. Used for a typical system. It
  ensures a seriously wild allocation fails while allowing
  overcommit to reduce swap usage.  root is allowed to
  allocate slightly more memory in this mode. This is the
  default.

 1-   Always overcommit. Appropriate for some scientific
  applications. Classic example is code using sparse arrays
  and just relying on the virtual memory consisting almost
  entirely of zero pages.

 2-   Don't overcommit. The total address space commit
  for the system is not permitted to exceed swap + a
  configurable amount (default is 50%) of physical RAM.
  Depending on the amount you use, in most situations
  this means a process will not be killed while accessing
  pages but will receive errors on memory allocation as
  appropriate.

  Useful for applications that want to guarantee their
  memory allocations will be available in the future
  without having to initialize every page.


 On Wed, Jun 3, 2015 at 7:40 AM, John Meinel j...@arbash-meinel.com
 wrote:

 So interestingly we are already fairly heavily overcommitted. We have
 4GB of RAM and 4GB of swap available. And cat /proc/meminfo is saying:
 CommitLimit: 6214344 kB
 Committed_AS:9764580 kB

 John
 =:-



 On Wed, Jun 3, 2015 at 9:28 AM, Gustavo Niemeyer gust...@niemeyer.net
 wrote:

 Ah, and you can also suggest increasing the swap. It would not actually
 be used, but the system would be able to commit to the amount of memory
 required, if it really had to.
  On Jun 3, 2015 1:24 AM, Gustavo Niemeyer gust...@niemeyer.net
 wrote:

 Hey John,

 It's probably an overcommit issue. Even if you don't have the memory
  in use, cloning it would mean the new process would have a chance to change
  that memory and thus require real memory pages, which the system obviously
 cannot give it. You can workaround that by explicitly enabling overcommit,
 which means the potential to crash late in strange places in the bad case,
 but would be totally okay for the exec situation.
 So we're running into this failure mode again at one of our sites.

 Specifically, the system is running with a reasonable number of nodes
 (~100) and has been running for a while. It appears that it wanted to
 restart itself (I don't think it restarted jujud, but I do think it at
 least restarted a lot of the workers.)
 Anyway, we have a fair number of things that we exec during startup
 (kvm-ok, restart rsyslog, etc).
 But when we get into this situation (whatever it actually is) then we
 can't exec anything and we start getting failures.

 Now, this *might* be a golang bug.

 When I was trying to 

Re: fork/exec ... unable to allocate memory

2015-06-03 Thread Menno Smits
On 4 June 2015 at 11:56, Menno Smits menno.sm...@canonical.com wrote:


 bigs := make([][]byte, 6)


Note: I was using 6GB because my machine is running a bunch of VMs and has
very little free memory at the moment. With the 14GB allocated in the
example, memory runs out before the program gets to run echo. Anyone trying
to run the example on their own machine will need to tweak this value to
suit the amount of available memory.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Pruning the txns collection

2015-05-13 Thread Menno Smits
On 14 May 2015 at 06:41, Gustavo Niemeyer gustavo.nieme...@canonical.com
wrote:


 You are right that it's not that simple, but it's not that complex either
 once you understand the background.

 Transactions are applied by the txn package by tagging each one of the
 documents that will participate in the transaction with the transaction id
 they are participating in. When mgo goes to apply a transaction in that
 same document, it will tag the document with the new transaction id, and
 then evaluate all the transactions it is part of. If you drop one of the
 transactions that a document claims to be participating in, then the txn
 package will rightfully complain since it cannot tell the state of a
 transaction that explicitly asked to be considered for the given document.

 That means the solution is to make sure removed transactions are 1) in a
 final state; and 2) not being referenced by any tagged documents.


Thanks. This explanation clarifies things a lot.



 The txn package itself collects garbage from old transactions as new
 transactions are applied, but it doesn't guarantee that right after a
 transaction reaches a final state it will be collected. This can lead to
 pretty old transactions being referenced, if these documents are never
 touched again.


I was confused by this part when I read it because I don't see anywhere in
the mgo/txn code where cleanup of the txn collection already occurs. To
summarise our later IRC conversation for anyone who might be interested:
mgo/txn doesn't currently prune the txns collection, but it *does* prune
references to applied transactions from the txn-queue fields on documents.



 So, you have two choices to collect these old documents:

 1. Clean up the transaction references from all documents

 or

 2. Just make sure the transaction being removed is not referenced anywhere

 I would personally go for 2, as it is a read-only operation everywhere but
 in the transactions collection itself, to drop the transaction document.


I agree that #2 is preferable and I have a fairly straightforward strategy
in mind to make this happen. I'll work on that today.
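
For the curious, the rough shape of that strategy is below. It's a sketch
only, and leans on the detail that txn-queue tokens embed the transaction
id as a hex prefix ("<txn id hex>_<nonce>"); error handling is elided and
collectionsWithTxns is a hypothetical list:

    // Collect the ids of transactions still referenced by any document's
    // txn-queue; completed txns outside this set are safe to remove.
    referenced := make(map[bson.ObjectId]bool)
    for _, name := range collectionsWithTxns {
        var doc struct {
            Queue []string `bson:"txn-queue"`
        }
        iter := db.C(name).Find(nil).Select(bson.M{"txn-queue": 1}).Iter()
        for iter.Next(&doc) {
            for _, token := range doc.Queue {
                referenced[bson.ObjectIdHex(token[:24])] = true
            }
        }
        iter.Close()
    }
    // ... then remove txns in a final state whose _id isn't in referenced.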


Note that the same rules here apply to the stash collection as well.


Noted. I know how this hangs together from my work with PurgeMissing.

Thanks,
Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Send Juju logs to different database?

2015-05-06 Thread Menno Smits
On 6 May 2015 at 19:53, Stuart Bishop stuart.bis...@canonical.com wrote:

 On 6 May 2015 at 04:57, Menno Smits menno.sm...@canonical.com wrote:

  It is more likely that Juju will grow the ability to send logs to
 external
  log services using the syslog protocol (and perhaps others). You could
 use
  this to log to your own log aggregator or database. This feature has been
  discussed but hasn't been planned in any detail yet (pull requests would
 be
  most welcome!).

 syslog seems a bad fit, as the logs are now structured data and I'd
 like to keep it that way. I guess people want it as an option, but I'd
 consider it the legacy option here.


You're right. Structured logs would be much more useful.


 My own use case would be to make a more readable debug-logs, rather
 than attempting to parse the debug-logs output ;) Hmm... I may be able
 to do this already via the Juju API.


Not yet. The API used by debug-log currently emits formatted log lines, not
structured log data.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Send Juju logs to different database?

2015-05-06 Thread Menno Smits
On 7 May 2015 at 00:58, Charles Butler charles.but...@canonical.com wrote:


 On Wed, May 6, 2015 at 3:53 AM, Stuart Bishop stuart.bis...@canonical.com
  wrote:

 My own use case would be to make a more readable debug-logs, rather
 than attempting to parse the debug-logs output ;) Hmm... I may be able
 to do this already via the Juju API.


 Personally - i feel like this is a *great* use case for an opengrok filter
 w/ logstash.  Translate that structured data into something meaningful in
 elasticsearch and search/aggregate/tail like a champ.


Agreed - that would be awesome.  When we get to this we should aim to have
Juju emit structured log data that could be used to feed things like
logstash. If we do it right then hopefully grok won't be needed because the
log data will already be structured.

No promises about when this might happen though. At least the logging work
currently being done provides a good basis to add logging to external
systems.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Send Juju logs to different database?

2015-05-05 Thread Menno Smits
On 6 May 2015 at 06:49, Lauri Ojansivu x...@xet7.org wrote:

 Hi,
 today at Ubuntu Developer Summit:
 http://summit.ubuntu.com/uos-1505/meeting/22437/juju-share-and-juju-sos/

 Speakers talked about changing all logging to go to MongoDB, so I asked
 question at IRC:
 xet7 Would it be possible to use some different database than MongoDB
 for logs, because of current problems in MongoDB ?
 https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads


It is more likely that Juju will grow the ability to send logs to external
log services using the syslog protocol (and perhaps others). You could use
this to log to your own log aggregator or database. This feature has been
discussed but hasn't been planned in any detail yet (pull requests would be
most welcome!).

The changes currently being made to Juju to support logging to MongoDB also
lay the groundwork for logging to external services. This should land in
the coming weeks.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Automatic multi-environment collection handling

2014-12-18 Thread Menno Smits
Just following up. This was fixed earlier on today and the various CI
upgrade jobs are now passing. I've marked the ticket as Fix Released so
that this issue no longer blocks merges.

On 19 December 2014 at 09:40, Menno Smits menno.sm...@canonical.com wrote:

 On 19 December 2014 at 06:02, Dimiter Naydenov 
 dimiter.nayde...@canonical.com wrote:


 All this is great work! Thanks for the write-up as well.

 I think I discovered an issue with it - take a look at this bug
 http://pad.lv/1403738. It seems machine.SetAgentVersion() should be
 handled specially by the multi-env transaction runner, as only after
 calling it the upgrade actually starts and the steps to add env-uuids
 to state collections are executed.


 Sorry - I neglected to do a manual upgrade test before pushing this
 change. Ensuring that code that runs before database migrations have
  occurred still works as it should has been a pain point for us while doing
 the multi-environment work.

 I will get this sorted. Thanks for saving me some time by doing the
 initial analysis.

 - Menno




-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Joyent networking issues

2014-12-14 Thread Menno Smits
On 13 December 2014 at 06:34, Curtis Hovey-Canonical cur...@canonical.com
wrote:

 Thank you Menno.

 On Fri, Dec 12, 2014 at 12:01 AM, Menno Smits menno.sm...@canonical.com
 wrote:
  For the last day and a half I've been looking at this bug:
  https://bugs.launchpad.net/juju-core/+bug/1401130
 
  There's a lot of detail attached to the ticket but the short story is
 that
  the Joyent cloud often allocates different internal networks to
 instances,
  meaning that they can't communicate. From what I can tell from relevant
 LP
  tickets, this has been a problem for a long time (perhaps always). It's
 very
  hit and miss - sometimes you get allocated 10 machines in a row that all
 end
  up with the same internal network, but more often than not it only takes
 2
  or 3 machine additions before running into one that can't talk to the
  others.

 Your analysis explains a lot about the intermittent failures we
 have observed in Juju CI for months.
 ...

  Given that this is looking like a problem/feature at Joyent's end that
 needs
  clarification from them, may I suggest that this issue is no longer
 allowed
  to block CI?

 Speaking for users, there is a regression.

 ...



 We do see intermittent failures using 1.20 in the joyent cloud health
 check. So we know statistically, the problem does exist for every
 juju, but we are seeing 100% failure for master tip. The success rates
 were better for master last week, and the rates for 1.20 and 1.21 are
 great for all weeks.


Based on what we're seeing in CI, I'm thinking there are 3 things at play
here:

1. The new networker wasn't playing well with the way the network
configuration files are set up in Joyent images. Dimiter has disabled the
networker on Joyent for now, increasing the chance of success for 1.21 and
master.

2. As discussed throughout this thread, instances can end up on different
internal networks. This is a Joyent issue which can affect any Juju
release. It's just up to chance whether the tests will pass on Joyent in CI
- if one of the instances that is assigned isn't on the same internal network
as the others the test run will fail. Adding a static route for 10.0.0.0/8
should fix this.

3. Some other issue, yet to be determined, is preventing the Joyent tests
from passing on master only. I will start investigating this once the
static route is being added automatically.





 --
 Curtis Hovey
 Canonical Cloud Development and Operations
 http://launchpad.net/~sinzui

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Joyent networking issues

2014-12-14 Thread Menno Smits


 Based on what we're seeing in CI, I'm thinking there are 3 things at play
 here:

 1. The new networker wasn't playing well with the way the network
 configuration files are set up in Joyent images. Dimiter has disabled the
 networker on Joyent for now, increasing the chance of success for 1.21 and
 master.

 2. As discussed throughout this thread, instances can end up on different
 internal networks. This is a Joyent issue which can affect any Juju
 release. It's just up to chance whether the tests will pass on Joyent in CI
 - if one of  instances that is assigned isn't on the same internal network
 as the others the test run will fail. Adding a static route for 10.0.0.0/8
 should fix this.


The merge for this just completed.


 3. Some other issue, yet to be determined, is preventing the Joyent tests
 from passing on master only. I will start investigating this once the
 static route is being added automatically.


Looking more closely at the joyent-deploy-trusty-amd64 job you can see that
there are actually two reasons for it failing. Sometimes bootstrap fails and
that is clearly because of the routing issue (#2 above, should now be
fixed).

Sometimes bootstrap succeeds, but the juju deploy ... local:dummy-source
command fails with a non-zero exit. Based on the log output it looks like
the API connection gets terminated unexpectedly. This might indicate a
panic while handling the API request. Unfortunately the logs that are saved
with the test failure stop well before bootstrap has completed so we have
little to go off.

I haven't been able to make the problem happen when manually copying what
the test does from my own machine. This might require use of the actual
test CI infrastructure to reproduce.

I've created another ticket to track this issue:
https://bugs.launchpad.net/juju-core/+bug/1402495

I'm guessing CI will remain blocked until this is fixed so someone
will need to continue with this since I'm about to EOD.

- Menno







 --
 Curtis Hovey
 Canonical Cloud Development and Operations
 http://launchpad.net/~sinzui


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Also found all-machines.log issue with 1.21

2014-11-26 Thread Menno Smits
The fix for this is merging into master and 1.21 now.

The problem was that for local provider based environments, unit and
machine agents weren't including the required namespace in their log
message tags. rsyslogd on the server was expecting to see the namespace so
it filtered out the unexpected messages. I think this regression was
probably introduced when we started using a Go syslog client on non-state
server machines instead of rsyslogd (so that we can have non-Linux
machines).



On 27 November 2014 at 14:23, Menno Smits menno.sm...@canonical.com wrote:



 On 27 November 2014 at 10:52, Tim Penhey tim.pen...@canonical.com wrote:

 https://bugs.launchpad.net/juju-core/+bug/1396796

 Would love other people to check.

 I don't know if it happens on other providers, just using local right now.

 I would appreciate it if someone could confirm working or not working on
 EC2.


 I've checked on EC2 and the problem doesn't appear to happen there, both
  when upgrading or starting with a fresh install. Seems like this is a local
 only issue.

 I've been digging in to this problem with the local provider but don't
 have an answer yet, just theories.

 - Menno

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Automatic environment filtering for DB queries

2014-11-18 Thread Menno Smits
Thanks John and Tim. I really like these ideas, especially because it means
the team doesn't need to learn a new way of working (and remember to keep
using the new way). In most cases, the thing returned by getCollection()
will be able to be used in the same way as before, even though it'll
actually be a different type of thing. I'll try this approach out in the
next day or so.

I have thought about what can be done about ensuring env UUIDs are
correctly added or updated during inserts and updates but I don't think
there's anything practical we can do there. I think we need to rely on
diligence and testing to ensure that writes to the DB correctly handle
environment UUIDs.



On 18 November 2014 17:03, Tim Penhey tim.pen...@canonical.com wrote:

 Wrapping the collection objects that are returned from the
 `getCollection` method shouldn't be too hard and it could wrap the Find
 and FindId calls.  It would have fixed this missed call below:

 // aliveUnitsCount returns the number of alive units for the service.
 func aliveUnitsCount(service *Service) (int, error) {
     units, closer := service.st.getCollection(unitsC)
     defer closer()

     query := bson.D{{"service", service.doc.Name}, {"life", Alive}}
     return units.Find(query).Count()
 }

 However it is not just finding that we need to care about, but setting
 and updating the collections.  Hopefully testing would cover the cases
 where we aren't setting what we think we are setting, but that is much
 harder to catch as the main execution flow is to just run these
 transaction operations.

 Tim


 On 18/11/14 16:45, John Meinel wrote:
  I've had this around to think about for a bit. I think it is ok, though
  sniffing an input parameter to change behavior seems brittle. Mostly
   because the object isn't really designed to work that way. Could we wrap
  the objects so that we just have Find/FindId do the right thing to start?
 
  I suppose that is a fair amount of more work. I certainly do like the
  idea of having a common pattern rather than requiring everyone to know
  exactly whether this object is different than the rest.
 
  John
  =:-
 
 
  On Thu, Nov 13, 2014 at 8:10 AM, Menno Smits menno.sm...@canonical.com
  wrote:
 
  Team Onyx has been busy preparing the MongoDB collections used by
  Juju to support data for multiple environments. For each collection
  that needs to store data for multiple environments we have been
   adding an env-uuid field to each document and prefixing the
  environment UUID to the document id, with the previous document id
  being moved to a new field. Apart from the document changes
  themselves, we've been adjusting the state package implementation to
  match the document changes.
 
  Part of this task is ensuring that all DB queries correctly filter
  by environment so that we don't end up unintentionally leaking data
  across environments. To avoid opportunities for us to forget to add
   environment filtering to the DB queries used by Juju, sabdfl would
  like us to consider ways to make this filtering happen
  automatically. To this end, I propose we add the following methods:
 
  func (s *State) Find(coll *mgo.Collection, sel bson.D) *mgo.Query
   func (s *State) FindId(coll *mgo.Collection, id string) *mgo.Query
 
 
  The idea is that almost all MongoDB queries performed by Juju would
  use these 2 methods. They would become the default way that we do
  queries, used even on collections that don't contain data for
  multiple environments.
 
  Both methods would check the collection passed in against a fixed
  set of collections that use environment UUIDs. If the collection
  doesn't use environment UUIDs then the lookup is passed through to
  mgo unmodified. If the collection /does/ use environment UUIDs then
  Find() would automatically add the appropriate env-uuid field to
  the query selector.  Similarly, FindId() would automatically call
  docID() on the supplied id. Pretty simple.
 
   If use of these methods becomes the default way the team does DB queries
  in Juju code, then we greatly reduce the risk of leaking data
   between environments. They also allow us to remove one concern from
  each Find/FindId call site - as environment filtering is taken care
   of by these methods, it does not have to be repeated all
  throughout the codebase. To get us started, I intend to perform a
  mass-refactoring of all existing Find and FindId calls to use these
  new methods.
 
  To make the proposal clearer, here's some examples:
 
   Find call:   err := units.Find(bson.D{{"env-uuid": ...}, {"service":
   service}}).All(&docs)
     becomes:   err := st.Find(units, bson.D{{"service":
   service}}).All(&docs)
 
  FindId call: err = units.FindId(w.st.docID(unitName)).One(doc)
becomes: err = w.st.FindId(units, unitName).One(doc)

Automatic environment filtering for DB queries

2014-11-12 Thread Menno Smits
Team Onyx has been busy preparing the MongoDB collections used by Juju to
support data for multiple environments. For each collection that needs to
store data for multiple environments we have been adding an env-uuid
field to each document and prefixing the environment UUID to the document
id, with the previous document id being moved to a new field. Apart from
the document changes themselves, we've been adjusting the state package
implementation to match the document changes.

Part of this task is ensuring that all DB queries correctly filter by
environment so that we don't end up unintentionally leaking data across
environments. To avoid opportunities for us to forget to add environment
filtering to the DB queries used by Juju, sabdfl would like us to consider
ways to make this filtering happen automatically. To this end, I propose we
add the following methods:

func (s *State) Find(coll *mgo.Collection, sel bson.D) *mgo.Query
func (s *State) FindId(coll *mgo.Collection, id string) *mgo.Query


The idea is that almost all MongoDB queries performed by Juju would use
these 2 methods. They would become the default way that we do queries, used
even on collections that don't contain data for multiple environments.

Both methods would check the collection passed in against a fixed set of
collections that use environment UUIDs. If the collection doesn't use
environment UUIDs then the lookup is passed through to mgo unmodified. If
the collection *does* use environment UUIDs then Find() would automatically
add the appropriate env-uuid field to the query selector.  Similarly,
FindId() would automatically call docID() on the supplied id. Pretty simple.
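
Concretely, a minimal sketch of how this could look (assuming mgo v2; the
State stub, the envUUIDCollections set and the ":"-joined id scheme below
are illustrative, not final names):

package state

import (
    "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
)

// State is a minimal stand-in for the real state.State.
type State struct {
    envUUID string
}

// docID prefixes the environment UUID, matching the document id scheme.
func (s *State) docID(id string) string { return s.envUUID + ":" + id }

// envUUIDCollections is the fixed set of collections that carry
// per-environment documents.
var envUUIDCollections = map[string]bool{"units": true, "services": true}

// Find adds the env-uuid condition for collections that need it.
func (s *State) Find(coll *mgo.Collection, sel bson.D) *mgo.Query {
    if envUUIDCollections[coll.Name] {
        sel = append(bson.D{{"env-uuid", s.envUUID}}, sel...)
    }
    return coll.Find(sel)
}

// FindId transforms the local id into its UUID-prefixed form.
func (s *State) FindId(coll *mgo.Collection, id string) *mgo.Query {
    if envUUIDCollections[coll.Name] {
        id = s.docID(id)
    }
    return coll.FindId(id)
}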

If use of these methods becomes the default way the team does DB queries in
Juju code, then we greatly reduce the risk of leaking data between
environments. They also allow us to remove one concern from each
Find/FindId call site - as environment filtering is taken care of by these
methods, it does not have to be repeated throughout the codebase. To
get us started, I intend to perform a mass-refactoring of all existing Find
and FindId calls to use these new methods.

To make the proposal clearer, here's some examples:

Find call:   err := units.Find(bson.D{{"env-uuid": ...}, {"service":
service}}).All(&docs)
  becomes:   err := st.Find(units, bson.D{{"service": service}}).All(&docs)

FindId call: err = units.FindId(w.st.docID(unitName)).One(doc)
becomes: err = w.st.FindId(units, unitName).One(doc)

Does this sound reasonable? Is there another approach I should be
considering?

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: review board not syncing?

2014-11-11 Thread Menno Smits
Since I'm on-call reviewing today and don't know how to debug the problem,
I will watch both Github and RB for updates. For branches that don't make
it to RB, I'll review on GH.

On 12 November 2014 08:28, Jesse Meek jesse.m...@canonical.com wrote:

 The latest three reviews on GitHub (#1103,#1102,#1101) I cannot see in
 Review Board. Do we have a loose wire?

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at: https://lists.ubuntu.com/
 mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Using subdocument _id fields for multi-environment support

2014-10-10 Thread Menno Smits
TL;DR: It has been surprisingly difficult. As per what has already been done
for the units and services collections, we will continue with the approach
of using "uuid:id" style string ids while also adding separate env-UUID
and collection-specific identifier fields.

Jesse and I have been making the changes to use subdocument ids for the
units and services collections this week at the sprint and we've come up
against some unexpected issues.

We found that one part of the mgo/txn package wasn't happy using struct ids
and have been working with Gustavo to fix that. This isn't a show-stopper
but has slowed us down.

We also found unexpected friction with the implementation of the watchers
and entity life. These areas deeply assume that our document ids are
strings and fixing them requires wide-ranging and often ugly changes which
will take significant time to get right. It's been brick wall after brick
wall. We discussed with Tim, Will, John and Ian yesterday and given that
it's important that multi-environment support lands soon and given that the
watchers are going to completely change in the not too distant future[1],
we have abandoned the approach of using subdocument ids for
multi-environment support. The benefits of using subdocument ids are
outweighed by the changes required to get there.



- Menno

[1] opening up the possibility of surrogate keys as document ids, where we
would need application domain fields to exist outside of the _id.


On 1 October 2014 22:11, Menno Smits menno.sm...@canonical.com wrote:



 On 2 October 2014 01:31, Kapil Thangavelu kapil.thangav...@canonical.com
 wrote:

 it feels a little strange to use a mutable object for an immutable field.
 that said it does seem functional. although the immutability speaks to the
 first disadvantage noted for the separate fields namely becoming out of
 sync, which afaics isn't something that's possible with the current model,
 ie. a change of name needs to generate a new doc. Names (previous _id) are
 unique in usage minus the extant bug that unit ids are reused. even without
 that the benefits to avoiding the duplicate doc data and manual parse on
 every _id seem like clear wins for subdoc _ids.


 Just to be really sure, I added a test that exercises the case of one of
 the _id fields changing. See TestAttemptedIdUpdate in the (just updated)
 gist. MongoDB stops us from doing anything stupid (as expected).


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Using subdocument _id fields for multi-environment support

2014-09-30 Thread Menno Smits
Team Onyx has been busy preparing for multi-environment state server
support. One piece of this is updating almost all of Juju's collections to
include the environment UUID in document identifiers so that data for
multiple environments can co-exist in the same collection even when they
otherwise have same identifier (machine id, service name, unit name etc).

Based on discussions on juju-dev a while back[1] we have started doing
this by prepending the environment UUID to the _id field and adding extra
fields which provide the environment UUID and old _id value separately for
easier querying and handling.

So far, services and units have been migrated. Where previously a service
document looked like this:

type serviceDoc struct {
 Name   string `bson:"_id"`
 Series string
 ...

it nows looks like this:

type serviceDoc struct {
 DocID   string `bson:"_id"`      // <env uuid>:wordpress/0
 Name    string `bson:"name"`     // wordpress/0
 EnvUUID string `bson:"env-uuid"` // <env uuid>
 Series  string
 ...

Unit documents have undergone a similar transformation.

This approach works but has a few downsides:

   - it's possible for the local id (Name in this case) and EnvUUID
   fields to become out of sync with the corresponding values that make up the
   _id. If that ever happens very bad things could occur.
   - it somewhat unnecessarily increases the document size, requiring that
   we effectively store some values twice
   - it requires slightly awkward transformations between UUID prefixed and
   unprefixed IDs throughout the code

MongoDB allows the _id field to be a subdocument so Tim asked me to
experiment with this to see if it might be a cleaner way to approach the
multi-environment conversion before we update any more collections. The
code for these experiments can be found here:
https://gist.github.com/mjs/2959bb3e90a8d4e7db50 (I've included the output
as a comment on the gist).

What I've found suggests that using a subdocument for the _id is a better
way forward. This approach means that each field value is only stored once
so there's no chance of the document key being out of sync with other
fields and there's no unnecessary redundancy in the amount of data being
stored. The fields in the _id subdocument are easy to access individually
and can be queried separately if required. It is also possible to create
indexes on specific fields in the _id subdocument if necessary for
performance reasons.

Using this approach, a service document would end up looking something like
this:

type serviceDoc struct {
 ID     serviceId `bson:"_id"`
 Series string
 ...
}

type serviceId struct {
  EnvUUID string `bson:"env-uuid"`
  Name    string
}
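
Querying on the individual _id fields then uses plain dot notation, e.g.
(a sketch assuming mgo v2, where services is a *mgo.Collection and
envUUID a string):

var doc serviceDoc
err := services.Find(bson.M{"_id.env-uuid": envUUID}).One(&doc)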

There was some concern in the original email thread about whether
subdocument style _id fields would work with sharding. My research and
experiments suggest that there is no issue here. There are a few types of
indexes that can't be used with sharding, primarily multikey indexes, but
I can't see us using these for _id values. A multikey index is used by
MongoDB when a field used as part of an index is an array - it's highly
unlikely that we're going to use arrays in _id fields.

Hashed indexes are a good basis for well-balanced shards according to the
MongoDB docs so I wanted to be sure that it's OK to create a hashed index
for subdocument style fields. It turns out there's no issue here (see
TestHashedIndex in the gist).

Using subdocuments for _id fields is not going to prevent us from using
MongoDB's sharding features in the future if we need to.

Apart from having to rework the changes already made to the services and
units collections[2], I don't see any downsides to this approach. Can
anyone think of something I might be overlooking?

- Menno


[1] - subject was RFC: mongo _id fields in the multi-environment juju
server world

[2] - this work will have to be done before 1.21 has a stable release
because the units and services changes have already landed.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Someone please look at these bugs ASAP

2014-09-17 Thread Menno Smits
On 18 September 2014 08:16, Nate Finch nate.fi...@canonical.com wrote:




 *Unable to connect to environment after local upgrade on precise*
 https://bugs.launchpad.net/juju-core/+bug/1370635


At first glance, this one seems like it could be related to recent
multi-env state server changes. I'll take a look soon (thumper is out
today).
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: ReviewBoard is now the official review tool for juju

2014-09-15 Thread Menno Smits
On 16 September 2014 08:39, Ian Booth ian.bo...@canonical.com wrote:



 On 16/09/14 00:50, Eric Snow wrote:


  Step (0) is also pretty easy and I'll argue that people should be
  doing it anyway.
 

 Disagree :-)
 I never (or very, very rarely) had to do this with Launchpad and bzr and
 things
 Just Worked. I don't do it now with github and pull requests. I'd like to
 think
 we'd be able to avoid the burden moving forward also.


Sorry, I didn't mean for this to turn into a rebase vs merge discussion. A
different choice of wording would have helped. The first step could have
been written like:

0.  Sync up your feature branch with upstream (by merging or rebasing)

Some people like rebasing and some like merging. It doesn't matter much to
the rest of the team which approach you use but I presume that everyone
syncs up their branch somehow soon before proposing (rerunning tests etc)
to ensure that other people's changes haven't impacted theirs.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: ReviewBoard is now the official review tool for juju

2014-09-14 Thread Menno Smits
Eric,

Thanks for setting this up.

Firstly, in your email you mention rbt pull several times. I think you
mean rbt post right? I don't see a pull command documented anywhere.

I've run in to one issue so far. When I tried to get my first review in to
Reviewboard today it took me a long time to figure out how to get it to
generate the correct diff. After much gnashing of teeth I figured out that
rbt post generates a diff by comparing origin/master against the
current branch. This means that if you haven't updated your local master
branch recently *and pushed your local master branch to your personal fork
on Github* (this is the part I missed) you'll end up with diffs that
include lots of changes that have already been merged and have nothing to
do with your branch.

As things stand the workflow actually needs to be:

1. Ensure your feature branch is rebased against upstream/master
2. Create a pull request like normal via github.
3. Switch to your local master branch.
4. git pull to update master
5. git push origin master to update your personal master on github.
6. Switch back to your feature branch (git checkout - helps here)
7. Run rbt post while at your branch to create a review request.
8. open the review request in your browser and publish it.
  - alternately use the rbt --open (-o) and/or --publish (-p) flags.
9. add a comment to the PR with a link to the review request.
10. address reviews until you get a Ship It! (like normal, with LGTM).
11. add a $$merge$$ comment to the PR (like normal).

This is a bit confusing and inconvenient. I can see us all forgetting to
keep our personal master branches on GH updated.

It looks like the TRACKING_BRANCH option in .reviewboardrc could be
helpful. It defaults to origin/master but if we changed it to
upstream/master I suspect Reviewboard will then generate diffs against
the shared master branch instead of what we might happen to have in master
in our personal forks. This of course relies on every developer having a
remote called upstream that points to the right place (which isn't
necessarily true).
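
For example, a one-line change along these lines in .reviewboardrc (a
sketch, untested):

TRACKING_BRANCH = "upstream/master"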

If TRACKING_BRANCH isn't going to work then whatever automation we come up
with to streamline RB/GH integration is probably going to have to sort this
out.

- Menno








On 14 September 2014 14:45, Eric Snow eric.s...@canonical.com wrote:

 Hi all,

 Starting now new code review requests should be made on
 http://reviews.vapour.ws (see more below on creating review requests).
 We will continue to use github for everything else (pull requests,
 merging, etc.).  I've already posted some of the below information
 elsewhere, but am repeating it here for the sake of reference.  I plan
 on updating CONTRIBUTING.md with this information in the near future.
 Please let me know if you have any feedback.  Happy reviewing!

 -eric

 Authentication
 --
 Use the Github OAuth button on the login page to log in.  If you don't
 have an account yet on ReviewBoard, your github account name will
 automatically be registered for you.  ReviewBoard uses session
 cookies, so once you have logged in you shouldn't need to log in again
 unless you log out first.

 For the reviewboard commandline client (rbt), use your reviewboard
 username and a password of oauth:username@github.  This should
 only be needed the first time.

 RBTools
 --

 ReviewBoard has a command-line tool that you can install on your local
 system called rbt.  rbt is the recommended tool for creating and
 updating review requests.  The documentation covers installation
 and usage.  It has satisfied my questions thus far.

 https://www.reviewboard.org/docs/rbtools/0.6/

 The key sub-command is post (see rbt post -h).

 To install you can follow the instructions in the rbtools docs.  You
 can also install using pip (which itself may need to be installed
 first):

 $ virtualenv ~/.venvs/reviewboard
 $ ~/.venvs/reviewboard/bin/pip install --allow-unverified rbtools --allow-external rbtools rbtools
 $ alias rbt='~/.venvs/reviewboard/bin/rbt'

 (you could just sudo pip install it, but the --allow-unverified flag
 makes it kind of sketchy.)

 Workflow
 ---

 1. Create a pull request like normal via github.
 2. Run rbt pull while at your branch to create a review request.
   - if the repo does not have a .reviewboardrc file yet, you'll need
 to run rbt setup-repo.
   - make sure your branch is based on an up-to-date master.
   - if the revision already has a review request you will need to
 update it (see below).
 3. open the review request in your browser and publish it.
   - alternately use the rbt --open (-o) and/or --publish (-p) flags.
 4. add a comment to the PR with a link to the review request.
 5. address reviews until you get a Ship It! (like normal, with LGTM).
 6. add a $$merge$$ comment to the PR (like normal).

 Keep in mind that the github-related steps aren't strictly necessary
 for the sake of getting a code review.  They are if you want to merge
 the patch though. :)  I 

Re: Fixed reviewer schedule

2014-09-11 Thread Menno Smits
A possibly dumb question: is there any way to see who is on for a given
day? I can only see my own days.

On 11 September 2014 20:14, Dimiter Naydenov dimiter.nayde...@canonical.com
 wrote:


 On 11.09.2014 06:12, Tim Penhey wrote:
  Hi folks,
 
  Those of you who are reviewers should now have invites to your
  bi-weekly review time.  This now occurs on the same day every two
  weeks. I have tried to have the mentors on the day after the
  mentees (or overlapping).  Also tried to spread out the different
  timezones.  It will never be perfect, but hopefully this is better
  now.
 
  Cheers, Tim
 

 Great job Tim! No more poking into spreadsheets :)

 --
 Dimiter Naydenov dimiter.nayde...@canonical.com
 juju-core team

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: ReviewBoard and our workflow

2014-09-10 Thread Menno Smits
Thanks Eric! I've used Reviewboard at a previous job and I'm fairly sure
that it aligns better with the way the Juju Core team likes to work than
Github's review features.

Two questions:

1. Is this what we're supposed to be doing from right now?

2. I'm pretty sure some configuration of the rbt tool is required so that
it knows how to talk to the Reviewboard server. Is there a config file
available?



On 11 September 2014 03:58, Eric Snow eric.s...@canonical.com wrote:

 Steps for a review of a PR:

 1. create pull request in github
 2. run rbt post while at your branch to create a review request [1][2]
 3. open the review request in your browser and publish it [3]
 4. add a comment to the PR with a link to the review request
 5. address reviews until you get a Ship It!
 6. add a $$merge$$ comment to the PR

 Both github and ReviewBoard support various triggers/hooks and both
 have robust HTTP APIs.  So we should be able to automate those steps
 (e.g. PR - review request, ship it - $$merge$$).  However, I don't
 see that automation as a prerequisite for switching over to
 ReviewBoard.

 Updating an existing review request:
 1. run rbt post -u (or the explicit rbt post -r #)
 2. open the review request in your browser and publish it [3]

 FYI, Reviewboard supports chaining review requests.  Run rbt post
 --parent <parent branch>.

 I'll be updating the contributing doc relative to ReviewBoard (i.e.
 with the above info) once we settle in with the new tool.

 -eric

 [1] Make sure your branch is based on upstream master.  Otherwise this
 will not work right.
 [2] Reviewboard links revision IDs to review requests.  So if you
 already have a review request for a particular revision (e.g. your
 branch), then rbt post will fail.  Use rbt post -u or rbt post -r
 # instead.
 [3] rbt post has some options you should consider using:
   - automatically publish the review request: rbt post --publish (or -p)
   - open a browser window with the new review request: rbt post --open (or
 -o)

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev

-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Important Blocking Bug

2014-09-09 Thread Menno Smits
Given that I merged a big upgrade related change yesterday, this could be
me (although I did test it extensively manually). I'll take a look.



On 10 September 2014 09:01, Nate Finch nate.fi...@canonical.com wrote:

 https://bugs.launchpad.net/juju-core/+bug/1367431

 Someone from the down under crew, please take a look at this, since the US
 is currently at EOD or nearly so.

 -Nate

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Important Blocking Bug

2014-09-09 Thread Menno Smits
I've looked into this a bit and the problem is due to one of the machines
being unable to download tools for the upgrade. This could be due to the
recent changes in tools storage.

Ian has now taken over.

On 10 September 2014 09:11, Menno Smits menno.sm...@canonical.com wrote:

 Given that I merged a big upgrade related change yesterday, this could be
 me (although I did test it extensively manually). I'll take a look.



 On 10 September 2014 09:01, Nate Finch nate.fi...@canonical.com wrote:

 https://bugs.launchpad.net/juju-core/+bug/1367431

 Someone from the down under crew, please take a look at this, since the
 US is currently at EOD or nearly so.

 -Nate

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Logger prefixes for api/apiserver are changing

2014-09-02 Thread Menno Smits
On 3 September 2014 15:07, Menno Smits menno.sm...@canonical.com wrote:



 Also, when we talk about package paths we really mean the source tree path
 right? Everything in cmd/juju is in the main package but uses a logger named
 juju.cmd.juju. We can really use the real package name to set the logger
 name.


Urgh. s/can/can't/
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Fwd: Storage server temporarily offline

2014-08-18 Thread Menno Smits
An update from Github about the current outage.

-- Forwarded message --
From: John Greet (GitHub Staff) supp...@github.com
Date: 19 August 2014 12:37
Subject: Re: Storage server temporarily offline
To: Menno Smits menno.sm...@canonical.com


Hi Menno,

Sorry about that, we're working on restoring access to the file server your
repository resides on. Status updates are being posted here:

https://status.github.com

If you're still unable to access the repository when our status is back to
green please let us know.

Thanks,
John
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Intentionally introducing failures into Juju

2014-08-13 Thread Menno Smits
I like the idea being able to trigger failures using the juju command line.

I'm undecided about how the need to fail should be stored. An obvious
location would be in a new collection managed by state, or even as a field
on existing state objects and documents. The downside of this approach is
that a connection to state will then need to be available from where-ever
we would like failures to be triggered - this isn't always possible or
convenient.

Another approach would be to have juju inject-failure drop files in some
location (along the lines of what I've already implemented) using SSH. This
has the advantage of making the failure checks easy to perform from
anywhere with the disadvantage of making it more difficult to manage
existing failures. There would also be some added complexity when creating
failure files for about-to-be-created entities (e.g. the juju deploy
--inject-failure case).

Do you have any thoughts on this?
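
For reference, here is a minimal sketch of the file-based check from my
original mail (quoted below); the directory constant and helper shape are
illustrative, not the final wrench API:

package wrench

import (
    "io/ioutil"
    "path/filepath"
    "strings"
)

// wrenchDir is where wrench files are dropped (typically
// /var/lib/juju/wrench).
var wrenchDir = "/var/lib/juju/wrench"

// IsActive reports whether the wrench file for name lists feature,
// i.e. whether the matching failure should be triggered.
func IsActive(name, feature string) bool {
    data, err := ioutil.ReadFile(filepath.Join(wrenchDir, name))
    if err != nil {
        return false // no wrench file means no failure injection
    }
    for _, line := range strings.Split(string(data), "\n") {
        if strings.TrimSpace(line) == feature {
            return true
        }
    }
    return false
}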




On 14 August 2014 02:25, Gustavo Niemeyer gustavo.nieme...@canonical.com
wrote:

 That's a nice direction, Menno.

 The main thing that comes to mind is that it sounds quite inconvenient
 to turn the feature on. It may sound otherwise because it's so easy to
 drop files at arbitrary places in our local machines, but when dealing
 with a distributed system that knows how to spawn its own resources
 up, suddenly the just write a file becomes surprisingly boring and
 race prone.

 What about:

  juju inject-failure [--unit=<unit>] [--service=<service>] <failure name>?
  juju deploy [--inject-failure=<name>] ...



 On Wed, Aug 13, 2014 at 7:17 AM, Menno Smits menno.sm...@canonical.com
 wrote:
  There's been some discussion recently about adding some feature to Juju
 to
  allow developers or CI tests to intentionally trigger otherwise hard to
  induce failures in specific parts of Juju. The idea is that sometimes we
  need some kind of failure to happen in a CI test or when manually testing
  but those failures can often be hard to make happen.
 
  For example, for changes Juju's upgrade mechanics that I'm working on at
 the
  moment I would like to ensure that an upgrade is cleanly aborted if one
 of
  the state servers in a HA environment refuses to start the upgrade. This
  logic is well unit tested but there's nothing like seeing it actually
 work
  in a real environment to build confidence - however, it isn't easy to
 make a
  state server misbehave in this way.
 
  To help with this kind of testing scenario, I've created a new top-level
  package called wrench which lets us drop a wrench in the works so to
  speak. It's very simple with one main API which can be called from
  judiciously chosen points in Juju's execution to decide whether some
 failure
  should be triggered.
 
  The module looks for files in $jujudatadir/wrench (typically
  /var/lib/juju/wrench) on the local machine. If I wanted to trigger the
  upgrade failure described above I could drop a file in that directory on
 one
  of the state servers named say machine-agent with the content:
 
  refuse-upgrade
 
  Then in some part of jujud's upgrade code there could be a check like:
 
  if wrench.IsActive("machine-agent", "refuse-upgrade") {
   // trigger the failure
  }
 
  The idea is this check would be left in the code to aid CI tests and
 future
  manual tests.
 
  You can see the incomplete wrench package here:
  https://github.com/juju/juju/pull/508
 
  There are a few issues to nut out.
 
  1. It needs to be difficult/impossible for someone to accidentally or
  maliciously activate this feature, especially in production
 environments. I
  have almost finished (but not pushed to Github) some changes to the
 wrench
  package which make it strict about the ownership and permissions on the
  wrench files. This should make it harder for the wrong person to drop
 files
  in to the wrench directory.
 
  The idea has also been floated to only enable this functionality in
  non-stable builds. This certainly gives a good level of protection but
 I'm
  slightly wary of this approach because it makes it impossible for CI to
 take
  advantage of the wrench feature when testing stable release builds. I'm
  happy to be convinced that the benefit is worth the cost.
 
  Other ideas on how to better handle this are very welcome.
 
  2. The wrench functionality needs to be disabled during unit test runs
  because we don't want any wrench files a developer may have lying around
 to
  affect Juju's behaviour during test runs. The wrench package has a global
  on/off switch so I plan on switching it off in BaseSuite's setup or
 similar.
 
  3. The name is a bikeshedding magnet :)  Other names that have been
 bandied
  about for this feature are chaos and spanner. I don't care too much
 so
  if there's a strong consensus for another name let's use that. I chose
  wrench over spanner because I believe that's the more common usage in
  the US and because Spanner is a DB from Google. Let's not get carried
  away...
 
 All comments, ideas and concerns welcome.

Re: Intentionally introducing failures into Juju

2014-08-13 Thread Menno Smits
I like the idea of being able to trigger failures stochastically. I'll
integrate this into whatever we settle on for Juju's failure injection.


On 14 August 2014 02:29, Gustavo Niemeyer gustavo.nieme...@canonical.com
wrote:

 Ah, and one more thing: when developing the chaos-injection mechanism
 in the mgo/txn package, I also added both a chance parameter for
 either killing or slowing down a given breakpoint. It sounds like it
 would be useful for juju's mechanism too. If you kill every time, it's
 hard to tell whether the system would know how to retry properly.
 Killing or slowing down just sometimes, or perhaps the first 2 times
 out of every 3, for example, would enable the system to recover
 itself, and an external agent to ensure it continues to work properly.

 On Wed, Aug 13, 2014 at 11:25 AM, Gustavo Niemeyer
 gustavo.nieme...@canonical.com wrote:
  That's a nice direction, Menno.
 
  The main thing that comes to mind is that it sounds quite inconvenient
  to turn the feature on. It may sound otherwise because it's so easy to
  drop files at arbitrary places in our local machines, but when dealing
  with a distributed system that knows how to spawn its own resources
  up, suddenly the just write a file becomes surprisingly boring and
  race prone.
 
  What about:
 
   juju inject-failure [--unit=<unit>] [--service=<service>] <failure name>?
   juju deploy [--inject-failure=<name>] ...
 
 
 
  On Wed, Aug 13, 2014 at 7:17 AM, Menno Smits menno.sm...@canonical.com
 wrote:
  There's been some discussion recently about adding some feature to Juju
 to
  allow developers or CI tests to intentionally trigger otherwise hard to
  induce failures in specific parts of Juju. The idea is that sometimes we
  need some kind of failure to happen in a CI test or when manually
 testing
  but those failures can often be hard to make happen.
 
  For example, for changes Juju's upgrade mechanics that I'm working on
 at the
  moment I would like to ensure that an upgrade is cleanly aborted if one
 of
  the state servers in a HA environment refuses to start the upgrade. This
  logic is well unit tested but there's nothing like seeing it actually
 work
  in a real environment to build confidence - however, it isn't easy to
 make a
  state server misbehave in this way.
 
  To help with this kind of testing scenario, I've created a new top-level
  package called wrench which lets us drop a wrench in the works so to
  speak. It's very simple with one main API which can be called from
  judiciously chosen points in Juju's execution to decide whether some
 failure
  should be triggered.
 
  The module looks for files in $jujudatadir/wrench (typically
  /var/lib/juju/wrench) on the local machine. If I wanted to trigger the
  upgrade failure described above I could drop a file in that directory
 on one
  of the state servers named say machine-agent with the content:
 
  refuse-upgrade
 
  Then in some part of jujud's upgrade code there could be a check like:
 
   if wrench.IsActive("machine-agent", "refuse-upgrade") {
   // trigger the failure
  }
 
  The idea is this check would be left in the code to aid CI tests and
 future
  manual tests.
 
  You can see the incomplete wrench package here:
  https://github.com/juju/juju/pull/508
 
  There are a few issues to nut out.
 
  1. It needs to be difficult/impossible for someone to accidentally or
  maliciously activate this feature, especially in production
 environments. I
  have almost finished (but not pushed to Github) some changes to the
 wrench
  package which make it strict about the ownership and permissions on the
  wrench files. This should make it harder for the wrong person to drop
 files
  in to the wrench directory.
 
  The idea has also been floated to only enable this functionality in
  non-stable builds. This certainly gives a good level of protection but
 I'm
  slightly wary of this approach because it makes it impossible for CI to
 take
  advantage of the wrench feature when testing stable release builds. I'm
  happy to be convinced that the benefit is worth the cost.
 
  Other ideas on how to better handle this are very welcome.
 
  2. The wrench functionality needs to be disabled during unit test runs
  because we don't want any wrench files a developer may have lying
 around to
  affect Juju's behaviour during test runs. The wrench package has a
 global
  on/off switch so I plan on switching it off in BaseSuite's setup or
 similar.
 
  3. The name is a bikeshedding magnet :)  Other names that have been
 bandied
  about for this feature are chaos and spanner. I don't care too much
 so
  if there's a strong consensus for another name let's use that. I chose
  wrench over spanner because I believe that's the more common usage
 in
  the US and because Spanner is a DB from Google. Let's not get carried
  away...
 
  All comments, ideas and concerns welcome.
 
  - Menno
 
 
 
  --
  Juju-dev mailing list
  Juju-dev@lists.ubuntu.com
  Modify settings or unsubscribe

Re: issuing a jujud restart

2014-08-10 Thread Menno Smits
How this happens is slightly complex but the short answer is that if any of
jujud's workers exit with a fatalError (as defined in cmd/jujud/agent.go),
then jujud will terminate (and then be restarted by upstart).

I'm not sure how you should propagate the need to exit from the restore API
call through to the worker but I'm sure that's doable.
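
The overall shape is roughly this (a sketch, not the actual cmd/jujud
code; errTerminateAgent and the restart delay are illustrative):

package agent

import (
    "errors"
    "log"
    "os"
    "time"
)

// errTerminateAgent stands in for the fatal error sentinel described above.
var errTerminateAgent = errors.New("agent should be terminated")

// runWorker restarts a worker whenever it exits, unless the error is
// fatal, in which case the whole process exits so upstart restarts it.
func runWorker(work func() error) {
    for {
        err := work()
        if err == errTerminateAgent {
            log.Printf("fatal error, terminating agent: %v", err)
            os.Exit(1)
        }
        log.Printf("worker exited, restarting: %v", err)
        time.Sleep(3 * time.Second)
    }
}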

HTH,
Menno



On 9 August 2014 11:32, Horacio Duran horacio.du...@canonical.com wrote:

 Hey, I am currently working on restore command for juju state servers. I
 find myself in the need, after putting all the back-up parts in place, of
 restarting jujud, from within jujud. Can anyone shed some light on how to
 do that?
 Thank you all.
 Cheers
 --
 Horacio Durán


 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: avoiding unnecessarily entering upgrade mode

2014-08-10 Thread Menno Smits
On 6 August 2014 19:09, Dimiter Naydenov dimiter.nayde...@canonical.com
wrote:


  I would like to change the machine agent so that upgrade mode is
  only entered if the version in agent.conf is different from the
  running software version. This would mean that upgrade mode is
  only entered if there is an actual upgrade to perform.
 If you are referring to Upgraded-To version field in agent config, I
 think this is set after the upgrade completes, so it might be
 unavailable before that.


I think this should be OK. The upgradedToVersion field in an agent.conf
will be set to the last version for which upgrade steps were successfully
performed. If this is the same as version.Current then we know there will
be no upgrade to perform. If they are different, or if upgradedToVersion is
not set, then we know there may be upgrade steps to perform.

Thanks,
Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: avoiding unnecessarily entering upgrade mode

2014-08-10 Thread Menno Smits
On 7 August 2014 03:13, William Reade william.re...@canonical.com wrote:

 SGTM too. It should always have worked like this -- rerunning all our
 upgrade steps every time is *crazy*.


Sorry, I got this detail wrong. The machine agent isn't rerunning upgrade
steps unnecessarily, but it is waiting for an unnecessarily long time
before the deciding on whether upgrades are required. This is what I'm
fixing (today).

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: determine if juju is upgrading

2014-08-05 Thread Menno Smits
In recent versions of Juju (i.e. post 1.20) the agent-state-info field
for each machine agent in the status output will say "upgrading to
<version>" while the upgrade is in progress. This could be used by tests
to know when the upgrade is finished.
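
For example, a test could poll with something along these lines (a sketch):

$ juju status --format=yaml | grep -q 'upgrading to' && echo upgrade still running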




On 6 August 2014 05:40, Horacio Duran horacio.du...@canonical.com wrote:

 Hey, I have been running several CI tests lately and find very often the
 following error:
 2014-08-04 22:27:42 ERROR juju.cmd supercommand.go:323 upgrade in progress
 -
 At least when my machine is not under heavy load and I am at decent
 network reach of amazon.
 I wonder, is there a way to poll juju to know if its upgrading?

 I think it is a bit drastic having CI tests fail for this (unless the test
 checks is related to upgrade in some form) I believe that being able to do
 the check and have the CI test check with a timeout before proceeding with
 the rest of the tests would yield much cleaner results when we need to
 review CI runs that failed.

 Cheers.
 --
 Horacio.

 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


avoiding unnecessarily entering upgrade mode

2014-08-05 Thread Menno Smits
Right now, a Juju machine agent is in upgrade mode from the moment it
starts until the upgrade-steps worker is finished. During this period API
logins are heavily restricted and most of the agent's workers don't start
until upgrade mode stops.

This happens even when there is no upgrade to perform. The upgrade-steps
worker always runs at machine agent startup and upgrade mode is in force
until it finishes.

Upgrade mode is typically short-lived (say 10 seconds) but if something is
wrong (e.g. mongo issues) the upgrade-steps worker may take longer or not
finish, resulting in the user seeing lots of "upgrade in progress" messages
from the client and in the logs.
This is particularly confusing when a user hasn't even requested an upgrade
themselves.

I would like to change the machine agent so that upgrade mode is only
entered if the version in agent.conf is different from the running software
version. This would mean that upgrade mode is only entered if there is an
actual upgrade to perform.

The version in agent.conf is only updated after a successful upgrade so it
is the right thing to use to determine if upgrade mode should be entered.
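
In other words, something like this (a sketch; agentConfig.UpgradedToVersion()
and version.Current are the names I have in mind, but treat them as
assumptions here):

// Only enter upgrade mode when an upgrade actually needs to run.
if agentConfig.UpgradedToVersion() != version.Current.Number {
    // enter upgrade mode and run the upgrade-steps worker
}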

The current behaviour means that the (idempotent) upgrade steps for the
current version are always run each time the machine agent starts. If the
change I'm proposing is implemented this will no longer happen. Does this
seem like a problem to anyone? For instance, do we rely on the upgrade
steps for the current version being run after bootstrap?

The ticket for this work is at: https://bugs.launchpad.net/bugs/1350111

Cheers,
Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Please use gopkg.in for importing mgo

2014-07-31 Thread Menno Smits
Trunk is currently broken when building with a clean GOPATH because
revision 03e56dcd was recently merged which imports mgo from labix.org
instead of gopkg.in. We no longer use mgo from labix.org and godeps no
longer installs it from that location.

The following import paths should be used instead:

gopkg.in/mgo.v2
gopkg.in/mgo.v2/bson
gopkg.in/mgo.v2/txn
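
i.e. import blocks in Go source files should look like:

import (
    "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
    "gopkg.in/mgo.v2/txn"
)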

This was perhaps not publicised well enough but the switch was made a
couple of weeks ago.

Right now juju will only build on machines that incidentally have a
labix.org mgo install. If the machine doesn't already have it, godeps won't
install it and builds fail.

I imagine the problem revision got past the landing bot because our test
hosts still have the labix.org mgo installed. If so, this should be cleaned
up.

I will fix the problem imports in the upgrades package now.

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Please use gopkg.in for importing mgo

2014-07-31 Thread Menno Smits
On 1 August 2014 12:09, Ian Booth ian.bo...@canonical.com wrote:

  The following import paths should be used instead:
 
  gopkg.in/mgo.v2
  gopkg.in/mgo.v2/bson
  gopkg.in/mgo.v2/txn
 
  This was perhaps not publicised well enough but the switch was made a
  couple of weeks ago.
 

 FWIW, the switch was only done yesterday in order to pick up the new mgo
 driver
 version needed to fix a mongo panic, and the changes were only done on
 Github
 and not Launchpad, hence the import path change.


Not quite. Trunk has been using mgo from gopkg.in since July 28. A similar
change was done on the 1.20 branch yesterday so that we could get the
recent mgo fix there.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: backup API in 1.20

2014-07-29 Thread Menno Smits


 But I'm a bit suspicious... would someone please confirm that we don't
 have *any* released clients that use the POST API? The above is predicated
 on that assumption.


The juju command line client hasn't yet seen changes to call the backup
POST API. The work was seemingly being done server first, then client
(which seem logical) and the client work didn't make the release (and isn't
done even today).

IMHO, Eric's proposal seems totally fine to me.
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


help please: mongo/mgo panic

2014-07-29 Thread Menno Smits
All,

Various people have been seeing the machine agents panic with the following
message:

   panic: rescanned document misses transaction in queue

The error message comes from mgo but the actual cause is unknown. There's
plenty of detail in the comments for the LP bug that's tracking this. If
you have any ideas about a possible cause or how to debug this further
please weigh in.

https://bugs.launchpad.net/juju-core/+bug/1318366

Thanks,
Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


log spam related to /etc/network/interfaces

2014-07-16 Thread Menno Smits
I've just noticed that Juju is currently repeating the following every 3
seconds on my dev machine when using the local provider:

machine-0: 2014-07-17 05:12:32 ERROR juju.worker runner.go:218 exited
networker: open /etc/network/interfaces: no such file or directory

/etc/network/interfaces does indeed not exist on my laptop (NetworkManager
takes care of networking).

Seems like something needs to be changed to account for this case. Any
ideas?

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Current handling of failed upgrades is screwy

2014-07-15 Thread Menno Smits
OK - points taken.

So taking your ideas and extending them a little, I'm thinking:

   - retry upgrade steps on failure, with an inter-attempt delay (see the
   sketch after this list)
   - indicate when there are upgrade problems by setting the machine agent
   status
   - if despite the retries the upgrade won't complete, report this in
   status and keep the agent running but with the restricted API in place and
   most workers not started (i.e. as if the upgrade is still running). This
   allows juju status and juju ssh to work unless there's a significant
   upgrade step that hasn't run that prevents them from working.
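
A sketch of the retry part (the attempt count, delay and helper shapes
are placeholders, not agreed values):

package agent

import (
    "log"
    "time"
)

// runUpgradeWithRetries retries the upgrade steps a few times and, if
// they still fail, reports the failure instead of crashing the agent.
func runUpgradeWithRetries(run func() error, setStatus func(string)) {
    var err error
    for attempt := 0; attempt < 3; attempt++ {
        if attempt > 0 {
            time.Sleep(2 * time.Minute)
        }
        if err = run(); err == nil {
            return
        }
        log.Printf("upgrade steps failed (will retry): %v", err)
    }
    // Leave the agent up but restricted, and surface the failure.
    setStatus("upgrade failed: " + err.Error())
}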

Does that sound reasonable?




On 15 July 2014 19:33, William Reade william.re...@canonical.com wrote:

 FWIW, we could set some error status on the affected agent (so users can
 see there's a problem) and make it return 0 (so that upstart doesn't keep
 hammering it); but as jam points out that's not helpful when it's a
 transient error. I'd suggest retrying a few times, with some delay between
 attempts, before we do so (although reporting the error, and making it
 clear that we'll retry automatically, is probably worthwhile).

 And, really, I'm not very keen on the prospect of continuing to run when
 we know upgrade steps have failed -- IMO this puts us in an essentially
 unknowable state, and I'd much rather fail hard and early than limp along
 pretending to work correctly. Manual recovery of a failed upgrade will
 surely be tedious whatever we do, but a failed upgrade won't affect the
 operation of properly-written charms -- it's a management failure, so you
 can't scale/relate/whatever, but the actual software deployed will keep
 running. However, I can easily imagine that continuing to run juju agents
 against truly broken state could lead to services actually being shut
 down/misconfigured, and I think that's much more harmful.

 Cheers
 William


 On Thu, Jul 10, 2014 at 9:57 AM, John Meinel j...@arbash-meinel.com
 wrote:

 I think it fundamentally comes down to: is the reason the upgrade failed
 transient or permanent? If we can try again later, do so, else log at
 Error level, and keep on with your life, because that is the only chance of
 recovery (from what you've said, at least).

 John
 =:-


 On Thu, Jul 10, 2014 at 11:18 AM, Menno Smits menno.sm...@canonical.com
 wrote:

 So I've noticed that the way we currently handle failed upgrades in the
 machine agent doesn't make a lot of sense.

 Looking at cmd/jujud/machine.go:821, an error is created if
 PerformUpgrade() fails but nothing is ever done with it. It's not returned
 and it's not logged. This means that if upgrade steps fail, the agent
 continues running with the new software version, probably with partially
 applied upgrade steps, and there is no way to know.

 I have a unit tested fix ready which causes the machine agent to exit
 (by returning the error as a fatalError) if PerformUpgrade fails but before
 proposing I realised that's not the right thing to do. The agent's upstart
 script will restart the agent and probably cause the upgrade to run and
 fail again so we end up with an endless restart loop.

 The error could also be returned as a non-fatal (to the runner) error
 but that will just cause the upgrade-steps worker to continuously restart,
 attempting the upgrade and failing.

 Another approach could be to set the global agent-version back to the
 previous software version before killing the machine agent but other agents
 may have already upgraded and we can't currently roll them back in any
 reliable way.

 Our upgrade story will be improving in the coming weeks (I'm working on
 that). In the mean time what should we do?

 Perhaps the safest thing to do is just log the error and keep the agent
 running the new version and hope for the best? There is a significant
 chance of problems but this is basically what we're doing now (except
 without logging that there's a problem).

 Does anyone have a better idea?

 - Menno





 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Current handling of failed upgrades is screwy

2014-07-13 Thread Menno Smits
On 10 July 2014 20:57, John Meinel j...@arbash-meinel.com wrote:

  I think it fundamentally comes down to: is the reason the upgrade failed
  transient or permanent? If we can try again later, do so, else log at
 Error level, and keep on with your life, because that is the only chance of
 recovery (from what you've said, at least).


This is a good approach but I don't see any way that the machine agent can
know if an error is transient or permanent with any certainty.

Tim has contributed some useful guidance. Given that we currently have no
reliable way of automatically rolling back upgrades, we should aim to just
stay on the new software version (this is what we silently do now anyway).
Instead of stopping on the first failed upgrade step, all upgrade steps
should be attempted with all upgrade step failures logged, the thinking
being that the environment is more likely to be operational the more
upgrade steps that have run.

This approach will also be used for the upcoming upgrade changes when
backups aren't available (i.e. when upgrading from a version that doesn't
support the backup API). If backups are available then upgrades will be
aborted after the first failure with the backup being used to roll back any
changes that may have been made.






 John
 =:-


 On Thu, Jul 10, 2014 at 11:18 AM, Menno Smits menno.sm...@canonical.com
 wrote:

 So I've noticed that the way we currently handle failed upgrades in the
 machine agent doesn't make a lot of sense.

 Looking at cmd/jujud/machine.go:821, an error is created if
 PerformUpgrade() fails but nothing is ever done with it. It's not returned
 and it's not logged. This means that if upgrade steps fail, the agent
 continues running with the new software version, probably with partially
 applied upgrade steps, and there is no way to know.

 I have a unit tested fix ready which causes the machine agent to exit (by
 returning the error as a fatalError) if PerformUpgrade fails but before
 proposing I realised that's not the right thing to do. The agent's upstart
 script will restart the agent and probably cause the upgrade to run and
 fail again so we end up with an endless restart loop.

 The error could also be returned as a non-fatal (to the runner) error
 but that will just cause the upgrade-steps worker to continuously restart,
 attempting the upgrade and failing.

 Another approach could be to set the global agent-version back to the
 previous software version before killing the machine agent but other agents
 may have already upgraded and we can't currently roll them back in any
 reliable way.

 Our upgrade story will be improving in the coming weeks (I'm working on
 that). In the mean time what should we do?

 Perhaps the safest thing to do is just log the error and keep the agent
 running the new version and hope for the best? There is a significant
 chance of problems but this is basically what we're doing now (except
 without logging that there's a problem).

 Does anyone have a better idea?

 - Menno





 --
 Juju-dev mailing list
 Juju-dev@lists.ubuntu.com
 Modify settings or unsubscribe at:
 https://lists.ubuntu.com/mailman/listinfo/juju-dev



-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Current handling of failed upgrades is screwy

2014-07-10 Thread Menno Smits
So I've noticed that the way we currently handle failed upgrades in the
machine agent doesn't make a lot of sense.

Looking at cmd/jujud/machine.go:821, an error is created if
PerformUpgrade() fails but nothing is ever done with it. It's not returned
and it's not logged. This means that if upgrade steps fail, the agent
continues running with the new software version, probably with partially
applied upgrade steps, and there is no way to know.

I have a unit tested fix ready which causes the machine agent to exit (by
returning the error as a fatalError) if PerformUpgrade fails but before
proposing I realised that's not the right thing to do. The agent's upstart
script will restart the agent and probably cause the upgrade to run and
fail again so we end up with an endless restart loop.

The error could also be returned as a non-fatal (to the runner) error but
that will just cause the upgrade-steps worker to continuously restart,
attempting the upgrade and failing.

Another approach could be to set the global agent-version back to the
previous software version before killing the machine agent but other agents
may have already upgraded and we can't currently roll them back in any
reliable way.

Our upgrade story will be improving in the coming weeks (I'm working on
that). In the mean time what should we do?

Perhaps the safest thing to do is just log the error and keep the agent
running the new version and hope for the best? There is a significant
chance of problems but this is basically what we're doing now (except
without logging that there's a problem).

Does anyone have a better idea?

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Dropbox has open sourced some of their Go libraries

2014-07-07 Thread Menno Smits
Hey look - another errors package! */me ducks*

https://tech.dropbox.com/2014/07/open-sourcing-our-go-libraries/
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Critical bugs for 1.20.

2014-06-29 Thread Menno Smits
thumper, axw and I spent quite a bit of time on this on Friday and think we
know what's happening (nothing to do with the API logins change). The
potential fix is here:

https://github.com/juju/juju/pull/188


On 27 June 2014 12:22, Menno Smits menno.sm...@canonical.com wrote:

 I'm currently taking a look at #1334273. It could be related to the
 restricted API logins during upgrades change I made.


 On 27 June 2014 06:00, Curtis Hovey-Canonical cur...@canonical.com
 wrote:

 We are tracking a list of critical bugs that must be fixed to release
 1.20.
 https://launchpad.net/juju-core/+milestone/1.20.0

 I intend to create a 1.20 branch in
 https://github.com/juju/juju
 that we can merge fixes into. Actually I have done that, but github
 doesn't want me to merge my version branch... I need to tinker with the
 issue to resolve it.

 I think the 1.20 branch needs to be version 1.20.0.
 I propose we set the master branch to 1.21-alpha1 now so that
 development can continue to add new features.

 --
 Curtis Hovey
 Canonical Cloud Development and Operations
 http://launchpad.net/~sinzui



-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: Critical bugs for 1.20.

2014-06-26 Thread Menno Smits
I'm currently taking a look at #1334273. It could be related to the
change I made to restrict API logins during upgrades.


On 27 June 2014 06:00, Curtis Hovey-Canonical cur...@canonical.com wrote:

 We are tracking a list of critical bugs that must be fixed to release 1.20.
 https://launchpad.net/juju-core/+milestone/1.20.0

 I intend to create a 1.20 branch in
 https://github.com/juju/juju
 that we can merge fixes into. Actually I have done that, but github
 doesn't want me to merge my version branch... I need to tinker with the
 issue to resolve it.

 I think the 1.20 branch needs to be version 1.20.0.
 I propose we set the master branch to 1.21-alpha1 now so that
 development can continue to add new features.

 --
 Curtis Hovey
 Canonical Cloud Development and Operations
 http://launchpad.net/~sinzui


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


panics in apiserver/root.go

2014-06-22 Thread Menno Smits
There have been a few panics during test runs in the API server code
recently that look like this:

http://juju-ci.vapour.ws:8080/job/github-merge-juju/221/consoleFull
http://juju-ci.vapour.ws:8080/job/github-merge-juju/223/consoleFull

I've also seen it happen during test runs on my personal machine today
(once out of tens of runs of the same test suite). Given where the panic is
occurring, could it be related to the recent API versioning changes?

- Menno
-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: panics in apiserver/root.go

2014-06-22 Thread Menno Smits
Talk about quick turn-around! I'm glad you've (probably) found it.



On 23 June 2014 17:32, John Meinel j...@arbash-meinel.com wrote:

 I think I figured it out, though I'm not 100% sure. I believe we're
 accessing a Go map without a mutex, so you can get concurrent access. I'll
 see if I can create a test case that triggers the race detector and fix it.

 John
 =:-
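
For reference, here's a minimal sketch of the kind of fix John is
describing: guarding a shared map with a mutex so concurrent API requests
can't race on it (the types and names are hypothetical, not the actual
apiserver code):

package main

import (
    "fmt"
    "sync"
)

// methodCache stands in for the kind of shared lookup map the apiserver
// keeps; all access goes through the mutex.
type methodCache struct {
    mu      sync.Mutex
    methods map[string]func() string
}

func (c *methodCache) put(name string, m func() string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    if c.methods == nil {
        c.methods = make(map[string]func() string)
    }
    c.methods[name] = m
}

func (c *methodCache) get(name string) (func() string, bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    m, ok := c.methods[name]
    return m, ok
}

func main() {
    cache := &methodCache{}
    var wg sync.WaitGroup
    // Without the mutex, these concurrent writes would be a data race
    // that can corrupt or panic the runtime; the race detector flags it.
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            cache.put(fmt.Sprintf("Facade%d", i), func() string { return "ok" })
        }(i)
    }
    wg.Wait()
    m, _ := cache.get("Facade0")
    fmt.Println("lookup after concurrent writes:", m())
}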


 On Mon, Jun 23, 2014 at 9:24 AM, John Meinel j...@arbash-meinel.com
 wrote:

 I have not seen them before, but it certainly is likely to be related;
 I'll dig into it today.

 John
 =:-


 On Mon, Jun 23, 2014 at 8:21 AM, Menno Smits menno.sm...@canonical.com
 wrote:

  There have been a few panics during test runs in the API server code
 recently that look like this:

 http://juju-ci.vapour.ws:8080/job/github-merge-juju/221/consoleFull
 http://juju-ci.vapour.ws:8080/job/github-merge-juju/223/consoleFull

 I've also seen it happen during test runs on my personal machine today
 (once out of tens of runs of the same test suite). Given where the panic is
 occurring, could it be related to the recent API versioning changes?

 - Menno





-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev


Re: How to show diff for a rev?

2014-06-18 Thread Menno Smits
Try the poorly documented -m option to git show, like this:

git show -m rev

For 7360086, this gives exactly the same output as git
diff 7360086^ 7360086.

For 348c104, the output is almost the same as git diff 348c104^ 348c104,
except there are some additional hunks at the bottom which I haven't spent
much time tracing back. The additional bits certainly look related to the
change.




On 18 June 2014 23:37, Martin Packman martin.pack...@canonical.com wrote:

 On 18/06/2014, John Meinel j...@arbash-meinel.com wrote:
 
  So the only syntax that reliably gives me what I want is:
    git diff 348c104^ 348c104
  I was hoping there would be a better shortcut for it. Does anyone have some
  more voodoo that I could use to avoid having to type the same thing twice?

 That's what I've always done. Often have shas (or sha heads) on my
 clipboard...

 Seems like you could do something like this though:

 $ git config --global alias.d '!sh -c "git diff $1^ $1" -'
 $ git d 348c104

 Martin


-- 
Juju-dev mailing list
Juju-dev@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/juju-dev

