how to correctly update a juju unit's config with local provider (lxc)?
Folks, I had to update a unit's config to change its LXC policy today. I lxc-stopped the unit, updated its config file and then lxc-started it again with -d. So far so good. Then I noticed juju ssh unit/0 stopped working: permission denied every time, probably because the unit's IP address changed or some other check failed. I knew the unit's IP address so I could ssh to it manually, but juju status keeps showing incorrect info and it seems juju ssh will never work again. I wonder what other side effects I may notice from now on with juju commands and this unit. What would be the correct way to change a unit's config with the local provider in this case, so Juju is aware of the change after I restarted the unit?

--
Caio Begotti [ˈka.jo | be.ˈɡɔ.t͡ʃi]
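In case it's useful, this is roughly the sequence I ran (the container name and config path below are only examples from memory, not the exact ones from my environment):

    # stop the unit's container (name is an example, check `sudo lxc-ls` for yours)
    sudo lxc-stop -n juju-machine-1-lxc-0

    # change the LXC policy in the container's config (path is the LXC default)
    sudo editor /var/lib/lxc/juju-machine-1-lxc-0/config

    # start it again, detached
    sudo lxc-start -n juju-machine-1-lxc-0 -d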
feedback about juju after using it for a few months
Folks, I just wanted to share my experience with Juju over the last few months of using it for real at work. I know it's pretty long but stay with me, as I wanted to see whether some of these points are bugs, design decisions, or things we could simply talk about :-)

General:

1. It seems that if you happen to have more than, say, 30 machines, Juju starts behaving weirdly until you remove unused machines. One of the weird things is that new deploys all stay stuck with a pending status. That happened at least 4 times, so now I always destroy-environment when testing things, just in case. Has anyone else seen this behaviour? Could this be because of LXC with Juju local? I do a lot of Juju testing, so it's not unusual for me to have a couple of hundred machines after a month, by the way.

2. It's not reliable to use Juju on laptops, which I can understand, of course, but just in case: if the system is suspended, Juju will not recover itself like the rest of the system services. It loses its connection to its API, apparently? Hooks fail too (resuming always seems to call hooks/config-changed)? Is this just me?

3. The docs recommend writing charms in Python rather than shell script. Compared to Python, shell charms are subpar enough that I'd recommend stating they are not officially supported. It's quite common to have race conditions in charms written in shell script. You have to keep polling the status of things, because if you just call deploys and set relations in a row they will fail; Juju won't queue the commands in a logical sequence, it'll just run them dumbly and developers are left in the wild to control it. I'm assuming a Python charm does not have this problem at all?

4. It's not very clear to me how many times hooks/config-changed runs, I'd just guess many :-) so you have to pay attention to it and write extra checks to avoid multiple harmful runs of this hook (see the sketches at the end of this message for the kind of guard I mean). I'd say the sequence and number of hooks called by a new deploy is not very clear from the documentation because of this. Hmm, perhaps I could debug-print it and count the hits...

5. Juju should queue multiple deployments in order not to hurt performance, for both disk and network IO. More than 3 deployments in parallel on my machine makes everything really slow. I just leave Juju for a while and go get some coffee because the system goes crazy. Or I have to break the deployments up manually, while Juju could have just queued them all and the CLI could simply display them as queued instead. I know it would need to analyse the machine's hardware to guess a number other than 3, but think about it if your deployments have about 10 different services... things that take 20 minutes can easily take over 1 hour.

6. There is no way to know if a relation exists and whether it's active or not, so you need to write dummy conditionals in your hooks to work around that (also sketched at the end of this message). IMHO it's hackish to check variables that are only non-empty during a relation, because they will vanish anyway. A command to list the currently set relations would be awesome to have, both inside the hooks and in the CLI. Perhaps charmhelpers.core.services.helpers.RelationContext could be used for this, but I'm not totally sure, as you only get the relation data and you need to know the relation name in advance anyway, right?

7. When a hook fails (usually while relations are being set) I have to manually run resolved unit/0 multiple times. It's not enough to call it once and wait for Juju to get it straight. I have to babysit the unit and keep running resolved unit/0, while I imagined this would be automatic, because I wanted it resolved for real anyway. If the failed hook was the first in a chain, you'll have to re-run this for every other hook in the sequence: once for a relation, another for config-changed, then perhaps another for the stop hook and another one for the start hook, depending on your setup.

8. Do we have to monitor and wait for a relation variable to be set? I've noticed that sometimes I want to get its value right away in the relation hook, but it's not assigned yet by the other service. So I find myself adding sleep commands when that happens, and that's quite hackish I think? IMHO the command to get a variable from a relation should block until a value is returned, so the charm doesn't have any timing issues (see the polling sketch at the end of this message for what I do today). I see that happening with rabbitmq-server's charm all the time, for instance.

9. If you want to cancel a deployment that just started, you need to keep running remove-service forever. Juju will simply ignore you if it's still running some special bits of the charm, or if you have previously asked it to cancel the deployment during its setup. No errors, no other messages are printed. You need to actually open its log to see that it's still stuck in a long apt-get installation, and you have to wait until the right moment to run remove-service again. And if your connection is slow, that takes time; you'll have to babysit Juju here because it doesn't really control its
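To make points 4, 6 and 8 above a bit more concrete, here are rough, untested sketches of the shell workarounds I mean in my hooks (relation names, keys and paths are only examples).

For point 4, a guard so config-changed only does its expensive work once:

    #!/bin/sh
    # hooks/config-changed -- skip work already done on a previous run
    STAMP=$CHARM_DIR/.configured    # example stamp file location
    if [ -f "$STAMP" ]; then
        juju-log "config-changed: already configured, nothing to do"
        exit 0
    fi
    # ... expensive one-time setup goes here ...
    touch "$STAMP"

For point 6, the closest thing I found to asking whether a relation is set at all:

    # relation-ids prints one id per established relation with that name
    if [ -n "$(relation-ids database)" ]; then
        juju-log "a database relation exists"
    fi

And for point 8, the polling I end up doing instead of a blocking relation-get:

    # inside the relation hook: wait until the other side has set the value
    host=$(relation-get hostname)
    while [ -z "$host" ]; do
        sleep 5
        host=$(relation-get hostname)
    done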
Re: feedback about juju after using it for a few months
On Wed, Dec 17, 2014 at 8:47 PM, Tim Penhey tim.pen...@canonical.com wrote:

>> 1. It seems that if you happen to have more than, say, 30 machines, Juju starts behaving weirdly until you remove unused machines. One of the weird things is that new deploys all stay stuck with a pending status. That happened at least 4 times, so now I always destroy-environment when testing things, just in case. Has anyone else seen this behaviour? Could this be because of LXC with Juju local? I do a lot of Juju testing, so it's not unusual for me to have a couple of hundred machines after a month, by the way.
>
> I'll answer this one now. This is due to not enough file handles. It seems that the LXC containers that get created inherit the handles of the parent process, which is the machine agent. After a certain number of machines, and it may be around 30, the new machines start failing to recognise the new upstart script because inotify isn't working properly. This means the agents don't start and don't tell the state server they are running, which means the machines stay pending even though lxc says "yep, you're all good". I'm not sure how big we can make the limit nofile in the agent upstart script without it causing problems elsewhere.

Hey, that makes a lot of sense. I wonder if you could detect that in advance and perhaps make Juju tell the sysadmin about the limit being reached (or nearly reached)?
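In case it helps anyone else hitting this in the meantime, I suppose one could check the agent's current limit and bump it by hand; something like the following, where the jujud match and the upstart job name are only guesses from my local setup, and I have no idea what a safe value would be:

    # see what the machine agent process is currently allowed to open
    grep 'open files' /proc/$(pgrep -o jujud)/limits

    # raise the nofile limit in the agent's upstart job and restart it
    # (the job name varies per environment, this one is only an example)
    echo 'limit nofile 16384 16384' | sudo tee -a /etc/init/juju-agent-caio-local.conf
    sudo service juju-agent-caio-local restart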
Re: feedback about juju after using it for a few months
On Wed, Dec 17, 2014 at 8:44 PM, Richard Harding rick.hard...@canonical.com wrote:

>> 11. Juju's GUI's bells and whistles are nice, but I think there's a bug with it, because its statuses are inaccurate. If you set a relation, Juju says the relation is green and active immediately, which is not true if you keep tailing the log file and you know things can still fail because scripts are still running. The relation is green, but if it errors after some time it should turn red with error info.
>
> If the relation goes into an error state and it does not then that's a bug we'd love to fix. If you could file the bug and let us know if there are two services with which this is easily replicated, that'd be awesome!

I understand that, but the UI is telling me "all is good, you're free to go on and play with the other Juju magic here in the GUI", then I do that and things start to fail, because that green bar actually should not have been turned green while scripts are still running, though I understand the principle with your explanation :-) Perhaps it could be transparent, or pulsing green, or any other temporary state in between? I know it may fail and will turn red, but that does not matter, because I would never set a second relation between my units if one of them isn't really ready (i.e. green). When I see it green, I think it's ready. Except it's not :-(

>> 13. Juju's GUI's panel with the charm store stays open all the time, wasting window space (so I have to zoom out virtually all my deployments because of the amount of wasted space, every time). There could be a way to hide that panel, because honestly it's useless locally since it never lists my local charms, even if I export JUJU_REPOSITORY correctly. I'd rather have my local charms listed there too, or just hide the panel instead.
>
> You can hide the panel. If you type '?' a series of keyboard shortcuts comes up. One of them toggles the sidebar: 'ctrl-shift-h' (hide). Please try that out and let us know if it helps or not. As we improve the sidebar and make the added-services bar more prevalent, we hope the sidebar being there is more and more useful.

Perhaps I didn't notice that because I always have the GUI on a separate monitor, so I just use it with the mouse, sorry. But if you can hide it with a shortcut, what stops us from having a clickable area in there to hide it with the cursor? Or is there one?

>> 13. Juju's GUI shows new relation info incorrectly. If I set up a DB relation to my service, the confirmation window simply says a db relation was added "between postgresql and postgresql". I've noticed sometimes this changes to "between myservice and myservice", so perhaps it has to do with the order of the relation, from one service to the other? Anyway, both cases seem to show it wrong?
>
> Thanks, we'll look into this. Are there two services with which you can replicate this every time, or is it something that happens less consistently?

I've seen that with the Postgres charm in the store more specifically. But I think with Apache's and RabbitMQ's too; then I started to wonder if it wasn't a problem with the GUI instead of with the charms.

>> 14. Juju's GUI always shows the service panel even if the service unit has been destroyed, just because I opened it once. Also, it says "1 dying units" (sic) forever, until I close it manually.
>
> By service panel, is this the details panel that slides out from the left sidebar? We can definitely look into making sure those go away when the unit or service are destroyed.

Yep! That one :-)

>> 16. Juju's GUI lists all my machines. Like, all of them, really. In the added-services part of the panel it lists even inactive machines, which does not make much sense, I'd say, because it makes it seem as though only deployed machines are listed. I think that count is wrong.
>
> The GUI lists the machines it knows about from Juju. I'm not sure about hiding them, because in machine view we use them as targets to deploy things to. Now, machines are only listed in machine view, but you mention seeing them in the 'added services' panel? Do you have a screenshot of what you mean we could take a look at?

Not now, but I can take one tomorrow. I don't see the machines themselves, it's just their count in the added-services panel, which is odd because I don't have that many machines active in the deployment. I'll make a note so I don't forget to take the screenshot :-)

>> That's it, thank you to those who made it to the end :-D
>
> And thank you for taking the time to write out the great feedback.

Cool beans!