[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
This bug was fixed in the package lxd - 2.0.10-0ubuntu1~16.04.2

---
lxd (2.0.10-0ubuntu1~16.04.2) xenial; urgency=medium

  * Fix regression in image update logic (LP: #1712455):
    - 0005-Fix-regression-in-image-auto-update-logic.patch
    - 0006-lxd-images-Carry-old-cached-value-on-refresh.patch
    - 0007-Attempt-to-restore-the-auto_update-property.patch
  * Ship a sysctl.d file that bumps inotify watches count. (LP: #1602192)
  * Update debian/watch to look only at LTS releases.

 -- Stéphane Graber  Tue, 22 Aug 2017 20:39:36 -0400

** Changed in: lxd (Ubuntu Xenial)
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1602192

Title:
  when starting many LXD containers, they start failing to boot with "Too many open files"

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/lxd/+bug/1602192/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
I confirmed that the sysctl.d file is now present and functional.

** Tags removed: verification-needed verification-needed-xenial
** Tags added: verification-done verification-done-xenial
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
Hello Christian, or anyone else affected,

Accepted lxd into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/lxd/2.0.10-0ubuntu1~16.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification. Thank you in advance!

** Changed in: lxd (Ubuntu Xenial)
       Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-xenial
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
** Also affects: lxd (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Description changed:

- Reported by Uros Jovanovic here: https://bugs.launchpad.net/juju-core/+bug/1593828/comments/18
+ == SRU
+ === Rationale
+ LXD containers using systemd will use a very large amount of inotify watches. This means that a system will typically run out of global watches with as few as 15 Ubuntu 16.04 containers.
+
+ An easy fix for the issue is to bump the number of user watches up to
+ 1024, making it possible to run around 100 containers before hitting the
+ limit again.
+
+ To do so, LXD is now shipping a sysctl.d file which bumps that
+ particular limit on systems that have LXD installed.
+
+ === Testcase
+ 1) Upgrade LXD
+ 2) Spawn about 50 Ubuntu 16.04 containers ("lxc launch ubuntu:16.04")
+ 3) Check that they all get an IP address ("lxc list"), that's a pretty good sign that they booted properly
+
+ === Regression potential
+ Not expecting anything here. Juju has shipped a similar configuration for a while now and so have the LXD feature releases.
+
+ We pretty much just forgot to include this particular change in our LTS
+ packaging branch.
+
+
+ == Original bug report
+ Reported by Uros Jovanovic here: https://bugs.launchpad.net/juju-core/+bug/1593828/comments/18

  "...
  However, if you bootstrap LXD and do:

  juju bootstrap localxd lxd --upload-tools
  for i in {1..30}; do juju deploy ubuntu ubuntu$i; sleep 90; done

  Somewhere between the 10th and 20th, a deploy fails with the machine in pending state (nothing useful in logs) and none of the new deploys after that first pending one succeeds. Might be a different bug, but it's easy to verify by running that for loop.

  So, this particular error was not in my logs, but the controller still ends up unable to provision at least 30 machines ..."

  I can reproduce this. Looking on the failed machine I can see that jujud isn't running, which is why juju considers the machine not up, and in fact nothing of juju seems to be installed. There's nothing about juju in /var/log.

  Comparing cloud-init-output.log between a stuck-pending machine and one which has started up fine, they both start with some key-generation messages, but the successful machine then has the line:

  Cloud-init v. 0.7.7 running 'init' at Tue, 12 Jul 2016 08:32:00 +. Up 4.0 seconds.

  ...and then a whole lot of juju-installation gubbins, while the failed machine log just stops.

** Changed in: lxd (Ubuntu Xenial)
       Status: New => Triaged

** Changed in: lxd (Ubuntu Xenial)
       Status: Triaged => In Progress

** Changed in: lxd (Ubuntu Xenial)
   Importance: Undecided => Medium
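Step 3 of the testcase above ("check that they all get an IP address") can be turned into a count rather than an eyeball check. The following is only a sketch: the helper name `count_with_ipv4` and the container names are made up for illustration, and it assumes an LXD release whose `lxc list` accepts `--format csv -c n4` (older releases may only offer table or JSON output).

```shell
#!/bin/sh
# Count containers that report an IPv4 address.
# Input: CSV on stdin, container name in column 1 and the IPv4 column after it.
# Assumption: this is what `lxc list --format csv -c n4` prints on the LXD
# version in use; adjust the parsing if your release formats it differently.
count_with_ipv4() {
    awk -F, 'NF >= 2 && $2 != "" { n++ } END { print n + 0 }'
}

# Usage, following the testcase (the names "sru-N" are hypothetical):
#   for i in $(seq 1 50); do lxc launch ubuntu:16.04 "sru-$i"; done
#   lxc list --format csv -c n4 | count_with_ipv4
```

If the printed count is lower than the number of containers launched, some of them most likely hung during boot.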
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
** No longer affects: juju
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
** Changed in: juju
    Milestone: 2.2-beta4 => 2.2-rc1
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
** Changed in: juju
    Milestone: 2.2-beta3 => 2.2-beta4
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
This bug was fixed in the package lxd - 2.12-0ubuntu2

---
lxd (2.12-0ubuntu2) zesty; urgency=medium

  * Increase default inotify limits to 1024 instances per uid. (LP: #1602192)

 -- Stéphane Graber  Mon, 03 Apr 2017 18:51:34 -0400

** Changed in: lxd (Ubuntu)
       Status: Confirmed => Fix Released
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
** Changed in: juju
    Milestone: 2.2-beta2 => 2.2-beta3
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
** Changed in: juju
    Milestone: 2.2-beta1 => 2.2-beta2
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
From internal discussion: for Juju, in clouds, there is no guarantee that the images can support particular kernel parameters. For us, it is best not to try and guess, and better to surface the error in the status.

** Changed in: juju
   Importance: Critical => High

** Changed in: juju
    Milestone: 2.1-rc1 => 2.2.0
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
** Changed in: juju
    Milestone: 2.1.0 => 2.1-rc1

** Changed in: juju
     Assignee: Richard Harding (rharding) => (unassigned)
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
** Changed in: juju
       Status: Confirmed => Triaged
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
billy-olsen: You're right. Because we don't install juju from the package on the deployed machines, the config to enable more lxd containers doesn't get applied there. We should apply it in that case.

** Changed in: juju
       Status: Fix Committed => Confirmed
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
This bug is fixed for the local LXD provider scenario for 2.0 on Xenial with this commit - http://bazaar.launchpad.net/~juju-qa/ubuntu/xenial/juju/2.0.0/revision/214 - in which the juju-2.0.conf file has the settings identified and dropped into /usr/lib/sysctl.d.

Unfortunately, this only addresses the case of the local LXD provider and not any other LXD usage. For example, if you do a high density deployment using MAAS + LXD, this code only applies to the juju-2.0 packages which are installed on the machine where the juju client is run. This means that the other possible uses of LXD containers within Juju are not receiving the same benefit from this tweak.

So to me it feels like there's a fix committed and even released, however it only partially solves tuning the system for higher density of LXD containers.
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
Can we see the commit related to the bug? Also, is there a plan to backport it to the 2.0 series, as I'm not sure when we will get 2.1 GA?

https://github.com/juju/juju/wiki/Juju-Release-Schedule
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
Tested with Juju 2.1 beta3 and verified that I can get over 20 containers and, while it's slow, it does not get stuck in pending. The changes to the profiles on the server that Michael has done appear to be working.

** Changed in: juju
       Status: In Progress => Fix Committed
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
Not really. The problem is that bumping those sysctls to the value in that document will work on most normal systems but will fail dramatically on very low memory systems like some cloud instances and ARM systems.

As for having LXD know when things will fail, it simply has no idea. All the errors happen through processes inside the container that LXD has no visibility on. You can start hundreds (possibly thousands) of alpine containers before any such problem occurs, but spawning 20 containers that use systemd will trip it. Similarly, you can probably start about 10x as many Ubuntu 14.04 containers before you run into the same problem as with Ubuntu 16.04 containers.

In most cases, what's receiving the allocation error back from the kernel is systemd in the container. And rather than moving on and just doing some polling when inotify doesn't work, it just plain hangs there without providing any kind of useful feedback to the user (since logging isn't even started at that point).

As I said before, there is kernel work being done now to fix the inotify part of this problem in a clean, sane way which will work for everyone. Until then, it's reasonable for Juju to bump those limits as they know exactly what kind of instance they're running on.

For the ulimits (limits.conf), we'll have to look into what's going on here because the LXD systemd unit does have those bumped, but they somehow seem to be ignored by systemd or reset at some point in time, causing some of this breakage. I suspect it'll boil down to being a systemd bug, but we need to take a close look at this.
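Because the failing allocations happen inside the containers, it can help to see how many inotify instances are actually open on the host before the limit is reached. A rough sketch, assuming GNU find (for `-lname`) and enough privileges to read other processes' fd directories; the function name `count_inotify_instances` is made up for illustration:

```shell
#!/bin/sh
# Count open inotify instances by looking for file descriptors whose symlink
# target is "anon_inode:inotify" under /proc/<pid>/fd.
# The optional argument overrides the /proc root (handy for testing).
count_inotify_instances() {
    proc_root="${1:-/proc}"
    find "$proc_root"/[0-9]*/fd -type l -lname 'anon_inode:inotify' 2>/dev/null \
        | wc -l | tr -d ' '
}

# Usage (run as root to see every process):
#   count_inotify_instances
#   sysctl fs.inotify.max_user_instances
```

The count is system-wide, while fs.inotify.max_user_instances is a per-uid limit, so treat it as an upper bound on what any single uid is using.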
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
Would it be reasonable to apply the settings that https://github.com/lxc/lxd/blob/master/doc/production-setup.md suggests when we add the first LXD to a machine? What about interrogating the kernel about available resources and refusing to add a container when it won't work?

I can try adding 20 containers to a MAAS node at the moment and get no useful message and they do start to fail.

Since Juju is running as root we could just do a sysctl key=value to set those live and apparently changes to /etc/security/limits.conf apply to new processes without a reboot. Preserving the sysctl over reboots isn't difficult either.
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
$ juju bootstrap foo lxd
Creating Juju controller "foo" on lxd/localhost
Looking for packaged Juju agent version 2.0.0 for amd64
No packaged binary found, preparing local Juju agent binary
To configure your system to better support LXD containers, please see: https://github.com/lxc/lxd/blob/master/doc/production-setup.md
Launching controller instance(s) on lxd/localhost...
 - juju-6e02cd-0
Fetching Juju GUI 2.2.0
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
I have filed bug #1631914 against lxd for better surfacing of the "too many files open" error.
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
As a note, increasing fs.file-max had no effect for me.

Using the following settings I was able to deploy about 23 containers with juju, which is higher than I could previously:

fs.inotify.max_user_watches = 524288
fs.inotify.max_user_instances = 256

These are the proposed settings to come with a juju install in bug #1631038.
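For anyone wanting to try the two values from this comment, one way to persist them is a sysctl.d drop-in. This is a sketch only: the function name and the file name `60-inotify-example.conf` are placeholders, not what the juju or lxd packages ship.

```shell
#!/bin/sh
# Print a sysctl.d fragment with the two inotify values reported above.
# Both values can be overridden by passing them as arguments.
render_inotify_conf() {
    watches="${1:-524288}"
    instances="${2:-256}"
    printf 'fs.inotify.max_user_watches = %s\n' "$watches"
    printf 'fs.inotify.max_user_instances = %s\n' "$instances"
}

# Usage (as root; the file name is hypothetical):
#   render_inotify_conf > /etc/sysctl.d/60-inotify-example.conf
#   sysctl --system    # reload all sysctl.d files without rebooting
```

Later comments in this thread report different ceilings with 1024 and 2048 instances, so the second argument is the one worth experimenting with.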
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
FYI a new version of the inotify patchset was sent today for review:
https://lists.linuxfoundation.org/pipermail/containers/2016-October/037480.html

This approach is the real fix for the inotify part of this problem.

For open files, we've had a few reports of systemd occasionally misbehaving and dropping our bumped ulimits on the floor; this may be what you've been running into...

With ulimits bumped and the inotify problem resolved in the kernel, the next likely limit we'd hit is pts_max, but assuming normal use of 2-3 devices per container, it'd take quite a few of them to reach the default limit of 1024.

Anyway, those kinds of kernel limits are something we're quite aware of. I am actually flying back from LinuxCon, where I gave a talk covering a bunch of those problems, and we'll be pushing at Linux Plumbers in a couple of weeks to try and get proper solutions into the upstream kernel.
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
** Changed in: juju
       Status: Triaged => In Progress
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
With fs.inotify.max_user_instances = 2048 it didn't fail until the 187th container. It did fail with a new message that I haven't seen:

[3831] x-187:error: Error calling 'lxd forkstart x-187 /var/lib/lxd/containers /var/log/lxd/x-187/lxc.conf': err='exit status 1'
  lxc 20161007100331.798 ERROR lxc_conf - conf.c:run_buffer:347 - Script exited with status 1
  lxc 20161007100331.798 ERROR lxc_start - start.c:lxc_init:465 - failed to run pre-start hooks for container 'x-187'.
  lxc 20161007100331.798 ERROR lxc_start - start.c:__lxc_start:1313 - failed to initialize the container

Try `lxc info --show-log local:x-187` for more info

I wasn't able to actually do '--show-log', because the cleanup code had already torn down that instance (go-lxc-run.sh only leaves the container around if it launches but fails to come up; if the launch itself fails it is still in the CLEAN list).
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
I added a "cat /proc/meminfo | grep Slab" to go-lxc-run.sh and found this:

$ sysctl fs.inotify
fs.inotify.max_queued_events = 65536
fs.inotify.max_user_instances = 1024
fs.inotify.max_user_watches = 524288

$ ulimit -a
...
open files (-n) 1048576
...

$ go-lxc-run.sh
[0] x-001:[13]..7Slab: 432176 kB
...
[1365] x-071:[15]...8Slab:9639484 kB
[1387] x-072:[15]...8Slab:9603648 kB
[1409] x-073:[14]...8Slab:9621936 kB
[1431] x-074:[14]...8Slab:9673000 kB
[1453] x-075:[13]9Slab:9680444 kB
[1475] x-076:[14]9Slab:8223312 kB
[1501] x-077:[16]..7Slab:5095880 kB
...
[1812] x-093:[12]..7Slab:6163512 kB
[1831] x-094:[12]..7Slab:6271272 kB
[1850] x-095:[13]..falseSlab:6371956 kB
x-095 failed to boot. keeping x-095.

So kernel memory seemed to peak at around 10GB, and then somehow dropped down to 5GB to allow 20 more containers to be created. (On a 16GB machine, that's a fair bit allotted to just the kernel.)

However, 95 seems to be the limit for fs.inotify.max_user_instances = 1024. But that still means setting it to 1M is silly. I'll do another run with max_user_instances=2048 and see what happens. Since I know that with very high max values I can get to 230 containers, it sounds like 2048 is sufficient.

I'll also play around next with putting Juju into the loop and see where I get.

I really wonder about max-open-files for Root vs User. LXD is running as Root, but if the containers are all running at User level, shouldn't that be the constraint?
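The Slab sampling described above can be wrapped in a small helper so the number is easier to log per container. This is a sketch, not the code in go-lxc-run.sh; the function name `slab_kb` is made up here.

```shell
#!/bin/sh
# Print the kernel slab usage in kB, as reported on the "Slab:" line of
# /proc/meminfo. An alternative file can be passed for testing.
# Matching the first field exactly skips SReclaimable: and SUnreclaim:.
slab_kb() {
    awk '$1 == "Slab:" { print $2 }' "${1:-/proc/meminfo}"
}

# Usage, e.g. after each container launch in a test loop:
#   echo "slab after launch: $(slab_kb) kB"
```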
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
After rebooting with the new values, I did manage to launch a lot more containers, failing only at the 232nd one. I wonder if the issue is not the User number of open files, but the Root number of open files, which requires a reboot to get updated.

I'll try to play around more to really nail down what things actually need to be changed, but 200+ containers is a huge difference from 19. (Also need to run this test with Juju in the mix.)
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
Interestingly, my baseline Kernel memory with no containers (and not much other software) was about the same (~400MB). I'm not entirely sure why it grew faster with the new settings, but didn't affect the baseline.
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
I did try setting all of the items that are mentioned in production-setup.md.

To start with, a few of them are not reasonable. max_user_instances defaults to 128, and we were able to see a difference at 256, but not at 1024. Setting it to 1M seems silly.

I'll also note that my Kernel memory consumption went up significantly with those settings. When I hit 12 containers I was already up over 2GB of kernel memory (whereas before I peaked around 1.4GB of kernel memory when I hit 19 containers). It seems to be a case of a huge number more "kmalloc-64" entries. I'm not sure where those are coming from, but there are enough "OBJS" that it overflows the standard column widths in the slabtop output.

With all of those set, I did get more containers after rebooting my machine. (Just logging out and back in again, I actually went down to 18 containers max.) At 22 containers I hit 3.8GB of Kernel memory. I'm letting it continue to run to see where it gets to.

I did also make sure to change the LXD backend pool to ZFS instead of being just the normal disk (using zfs-dkms for the Trusty kernel). Given that I was using a btrfs filesystem, and now LXD is using ZFS, that might also be a factor in how many containers I could run. Certainly in the initial reports "btrfs_inode_*" was near the top of the KMem output. And now it's all kmalloc and dentry. Maybe that's a side-effect of dkms?

I did end up hitting 30 containers at 4.6GB of Kernel memory before go-lxc-run.sh wanted to start clearing up old containers. So I'll patch that out and see how far I get.
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
Created bug #1631038 for the addition of an /etc/sysctl.d/10-juju.conf file that raises max_user_watches and max_user_instances. This will increase the number of containers we can run out of the box, but not by enough.
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
Note that there should be support for /etc/security/limits.d/10-juju.conf. I'm testing it now, but it may be that we can drop something in there as well. I'll test it a bit, but if we have some tasteful defaults, maybe we can make it work.

I think we can change from their default so that instead of "* nofile 1M" we do something like "@lxd nofile 1M", so only users that are also in the LXD group will get the expanded limits.

Again, these feel a bit more like LXD issues than Juju ones, but maybe we can get something done in the immediate term.
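To spell out what such a file might contain: limits.conf(5) entries take the form "<domain> <type> <item> <value>", so the "@lxd nofile 1M" shorthand needs a type and a numeric value. The Python sketch below renders the lines. The helper name `render_limits_conf`, the choice of both "soft" and "hard", and writing 1M as 1048576 are assumptions for illustration, not something settled in this bug.

```python
def render_limits_conf(domain, item, value, kinds=("soft", "hard")):
    """Render limits.conf lines in '<domain> <type> <item> <value>' form."""
    return "".join(f"{domain} {kind} {item} {value}\n" for kind in kinds)


if __name__ == "__main__":
    # "@lxd" restricts the raised limit to members of the lxd group.
    print(render_limits_conf("@lxd", "nofile", 1048576), end="")
```

The output would be saved as the limits.d file discussed above. pam_limits applies it at session start, so a new login is needed before it takes effect.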
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
Chatting with tych0 we were linked to: https://github.com/lxc/lxd/blob/master/doc/production-setup.md That notes a bunch of tweaks to run the system at scale. We need to evaluate what items make sense to enable ootb, and what things are solid for a prod lxd system but not a great idea for all users to just have.
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
Some settings to tweak from tych0 on IRC: https://github.com/lxc/lxd/blob/master/doc/production-setup.md
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
Note from jam after some experimentation:

So having played with it for a bit, I'm more comfortable with an /etc/sysctl.d/10-juju.conf that sets max_user_watches=512k and max_user_instances=256, but if we want to get to 50 instances we need to dig harder. I can just barely get to 10 instances of 'ubuntu' from juju, and only 19 raw containers with any of the inotify settings, and processes start dying at that point.
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
With Juju in the loop, I run into whatever limit a bit faster. I was successful at doing:

  juju bootstrap test-lxd lxd
  juju deploy ubuntu
  juju add-unit -n 5 ubuntu
  # wait for status to say everything is running
  juju add-unit -n 5 ubuntu
  # wait for status to be happy

but then after doing one more

  juju add-unit -n 5 ubuntu

Firefox crashed shortly thereafter, and so did the Juju controller; 'juju status' stopped responding. 'lxc list' shows that containers 0-10, 12 and 13 are running, but 11, 14 and 15 are stopped.

  lsof | wc -l

reports 84854 file handles open, but that's still lower than

  cat /proc/sys/fs/file-max
  1635148

Kernel memory is currently at 1409M.

I get some random errors like "unable to acquire lock, no space left on device" even though /dev/sda1 has 10GB free.

This is all under a Trusty HWE kernel (4.4.0-38), though the images that we are running are Xenial ones.

It appears that Mongo was killed when things started to go south, and systemd decided that it would stop trying to restart 'juju-db', with the error message "Start request repeated too quickly". It didn't try again for at least 10 minutes before I went in and started it manually.
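One possible reading of the two error strings seen so far, offered as a sketch rather than a diagnosis: per the inotify man pages, inotify_init() fails with EMFILE ("Too many open files") when the per-user instance limit is reached, and inotify_add_watch() fails with ENOSPC ("No space left on device") when the per-user watch limit is reached. Neither has to mean that file-max or the disk is actually exhausted. The table and the `explain` helper below are illustrative names, not part of any tool mentioned in this bug.

```python
import errno
import os

# errno -> inotify limit that can produce it, per inotify_init(2) and
# inotify_add_watch(2). Other causes of these errnos exist as well.
INOTIFY_LIMIT_ERRORS = {
    errno.EMFILE: "fs.inotify.max_user_instances (inotify_init)",
    errno.ENOSPC: "fs.inotify.max_user_watches (inotify_add_watch)",
}


def explain(err):
    """Describe which inotify limit, if any, can produce the given errno."""
    message = os.strerror(err)
    limit = INOTIFY_LIMIT_ERRORS.get(err)
    if limit is None:
        return f"{message}: not an inotify limit error"
    return f"{message}: possibly {limit}"


if __name__ == "__main__":
    for code in (errno.EMFILE, errno.ENOSPC):
        print(explain(code))
```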
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
5) Another data point, with

  $ sysctl fs.inotify
  fs.inotify.max_queued_events = 131072
  fs.inotify.max_user_instances = 1024
  fs.inotify.max_user_watches = 524288

(so max_queued_events 8x greater, and max_user_instances well above the previously established useful level), I still only get 19 containers, and it still fails with "Failed to change ownership of".

So if we want to get to 50 containers, it seems we need more than just setting the inotify limits.

I still plan on doing similar tests with Juju in the mix, as it is more likely to actually need user_watches than just launching a bare Xenial container.
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
Michael and I played around with some different settings, and here are my notes.

1) Package kde-runtime seems to install /etc/sysctl.d/30-baloo-inotify-limits.conf which sets max_user_watches to 512*1024.

   'slabtop' says that my baseline kernel memory is 380-420MB with no containers running.

2) With

     fs.inotify.max_queued_events = 16384
     fs.inotify.max_user_instances = 128
     fs.inotify.max_user_watches = 524288

   go-lxc-run.sh fails to start container number 13, with "x-013 failed to boot". Kernel memory has grown to 1076MB, up from 400MB, with

     141816K dentry
     564320K btrfs_inode

   as the largest items.

3) However, if I set

     fs.inotify.max_user_watches = 8192

   it fails again at exactly 13 containers. So while in a real-world scenario max_user_watches may come into play, a "standard" desktop value of 512K is plenty (at least to have machines provision). (I do believe I've seen max_user_watches come into play while using Juju in the past; it just isn't the specific problem in the go-lxc-run.sh script, which takes Juju out of the picture.)

4) If I then play around with 'max_user_instances' with:

     fs.inotify.max_queued_events = 16384
     fs.inotify.max_user_instances = 256
     fs.inotify.max_user_watches = 524288

   I then fail on the 19th container, with error: Failed to change ownership of:

   Kernel memory is up to 1363MB. Top entries are:

     178872K dentry
     723104K btrfs_inode

   And at this point, my machine is behaving poorly. Things like "lxc delete --force" actually fail to clean up instances, because of "error: unable to open database file". And Term crashed at least one time.

   I even tried at one point to set:

     fs.inotify.max_user_instances=2048

   But it still failed at 19 (that was when Term crashed).

   But I'm pretty confident that it means we're exhausting some other limit, rather than inotify max_user_instances or max_user_watches, since changing either of them further doesn't actually increase the number of containers.

I'm going to run some more tests with Juju in the loop.
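To check how close a host is to max_user_instances during tests like these, one option is to count the inotify file descriptors visible under /proc. This is a rough Python sketch under stated assumptions: `count_inotify_instances` is an illustrative name, results are grouped by the UID that owns each /proc/<pid> directory, and descriptors shared between processes are counted once per process, so the totals can overstate what the kernel accounts. Run it as root to see every process.

```python
import os
from collections import Counter


def count_inotify_instances(proc="/proc"):
    """Count inotify file descriptors per owning UID by scanning <proc>/<pid>/fd."""
    per_uid = Counter()
    try:
        entries = os.listdir(proc)
    except OSError:
        return per_uid  # no procfs available
    for entry in entries:
        if not entry.isdigit():
            continue  # not a process directory
        pid_dir = os.path.join(proc, entry)
        fd_dir = os.path.join(pid_dir, "fd")
        try:
            uid = os.stat(pid_dir).st_uid
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            # On Linux, inotify descriptors resolve to this anonymous inode name.
            if target == "anon_inode:inotify":
                per_uid[uid] += 1
    return per_uid


if __name__ == "__main__":
    for uid, count in count_inotify_instances().most_common():
        print(f"uid {uid}: {count} inotify instances")
```

Comparing the per-UID totals against fs.inotify.max_user_instances before and after launching each container should show whether that limit is the one being hit.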
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
We can include this in cloud-init for machines that will host containers. However, for the lxd provider this would need to be set on the machine running juju (the host machine). Should this be done at juju install time?
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
** Changed in: juju Milestone: 2.0.0 => 2.1.0
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
Per email thread, Juju needs to ship with expanded details:

> Stephane, I share your concerns around selecting the right knobs and
> testing accordingly but my main concern now is that I feel many people will
> hit this limitation when they deploy bigger charms in containers. Is there
> any way to expedite the work-around testing?

Well, Juju itself could also be bumping those limits, that way only hosts that do actually run a bunch of big charms will have their limits bumped rather than everyone who has LXD installed.

You could just ship a /etc/sysctl.d/10-juju.conf file with:

  fs.inotify.max_queued_events = 131072
  fs.inotify.max_user_instances = 1024
  fs.inotify.max_user_watches = 4194304

** Project changed: juju-core => juju
** Changed in: juju Status: Invalid => Triaged
** Changed in: juju Assignee: (unassigned) => Richard Harding (rharding)
** Changed in: juju Milestone: None => 2.0.0
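As a sketch of how a package hook or install script might generate that file, the Python below renders the three values quoted above in sysctl.d "key = value" form. The names `JUJU_INOTIFY_SETTINGS` and `render_sysctl_conf` are illustrative, and nothing here writes to /etc; that step is left to whatever ships the file.

```python
# Values as quoted in the message above; the names are illustrative.
JUJU_INOTIFY_SETTINGS = {
    "fs.inotify.max_queued_events": 131072,
    "fs.inotify.max_user_instances": 1024,
    "fs.inotify.max_user_watches": 4194304,
}


def render_sysctl_conf(settings):
    """Render a mapping as sysctl.d 'key = value' lines, one per setting."""
    return "".join(f"{key} = {value}\n" for key, value in settings.items())


if __name__ == "__main__":
    print(render_sysctl_conf(JUJU_INOTIFY_SETTINGS), end="")
```

Once the rendered text is installed as /etc/sysctl.d/10-juju.conf, running `sysctl --system` as root should load it without a reboot.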
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
** Changed in: juju-core Assignee: Christian Muirhead (2-xtian) => (unassigned)
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
@stgrabber, what's the status on this?
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
So there is a patchset that's been sent for review a couple of times upstream which would tie a bunch of the inotify limits to a user namespace. This would almost certainly fix this issue.

Until then, we can work around it by bumping the default values for those knobs. This bump could be done in the Ubuntu kernel directly, or through a sysctl file, either shipped by default with our other sysctl settings or as a hook in the lxd or juju packages (lxd would probably make more sense).
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
It sounds like the "fix" for this lies outside of LXD. Can we add the affected packages and involve the right upstreams for this? Are we going to pursue this purely upstream with systemd / kernel or are we going to attempt a distro-patch of some sort?
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
It does look like max_user_instances is global and not namespaced. So unless there's a resource leak somewhere that needs to be fixed, bumping that limit may be the only option.
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
I don't know that much about /proc/sys/fs/inotify/max_user_instances, but apparently this isn't properly namespaced, so it is a global limit for all containers on the host? As a workaround, lxd could ship a /usr/lib/sysctl.d/ snippet to bump it, but I'm not sure if that has other downsides (particularly as we install lxd by default).
[Bug 1602192] Re: when starting many LXD containers, they start failing to boot with "Too many open files"
Apparently the relevant limit is /proc/sys/fs/inotify/max_user_instances. This is "128" by default. When increasing it with

  sudo sysctl fs.inotify.max_user_instances=256

then the failed container reboots fine (and udev starts).
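Before changing anything it can help to record what the host currently has. A minimal Python sketch, assuming a Linux host with procfs mounted; the function name `read_inotify_limits` and its `base` parameter are illustrative:

```python
from pathlib import Path

INOTIFY_KEYS = ("max_queued_events", "max_user_instances", "max_user_watches")


def read_inotify_limits(base="/proc/sys/fs/inotify"):
    """Return {key: int} for each inotify limit file found under base."""
    limits = {}
    for key in INOTIFY_KEYS:
        path = Path(base) / key
        if path.is_file():
            limits[key] = int(path.read_text().strip())
    return limits


if __name__ == "__main__":
    for key, value in read_inotify_limits().items():
        print(f"fs.inotify.{key} = {value}")
```

This prints the same information as `sysctl fs.inotify`, but returns it as integers so a test script can compare values before and after a change.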