[Bug 1602192] Re: deploy 30 nodes on lxd, machines never leave pending
So while pid1 survives the EMFILE errors, udevd does not.

Working container:

1058  writev(2, [{"rules contain 49152 bytes tokens (4096 * 12 bytes), 13886 bytes strings", 71}, {"\n", 1}], 2) = 72
[...]
1058  inotify_init1(O_CLOEXEC) = 8

Failing container:

writev(2, [{"rules contain 49152 bytes tokens (4096 * 12 bytes), 13886 bytes strings", 71}, {"\n", 1}], 2rules contain 49152 bytes tokens (4096 * 12 bytes), 13886 bytes strings
) = 72
[...]
inotify_init1(O_CLOEXEC) = -1 EMFILE (Too many open files)
writev(2, [{"inotify_init failed: Too many open files", 40}, {"\n", 1}], 2inotify_init failed: Too many open files
) = 41
writev(2, [{"error initializing inotify", 26}, {"\n", 1}], 2error initializing inotify
) = 27
[...]
writev(2, [{"failed to allocate manager object: Cannot allocate memory", 57}, {"\n", 1}],

src/udev/udevd.c, manager_new():

        manager->fd_inotify = udev_watch_init(manager->udev);
        if (manager->fd_inotify < 0)
                return log_error_errno(ENOMEM, "error initializing inotify");

So "Cannot allocate memory" is the wrong error code (manager_new() hard-codes ENOMEM), but udevd does fail because inotify_init1() returns EMFILE, i.e. it is running out of file descriptors.

** Summary changed:

- deploy 30 nodes on lxd, machines never leave pending
+ when starting many LXD containers, they start failing to boot with "Too many open files"

** Changed in: lxd (Ubuntu)
       Status: New => Confirmed

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1602192

Title:
  when starting many LXD containers, they start failing to boot with
  "Too many open files"

To manage notifications about this bug go to:
https://bugs.launchpad.net/juju-core/+bug/1602192/+subscriptions

--
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
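The error-code mix-up described in this comment can be illustrated with a small sketch. It is only a model and does not reproduce the udev sources: watch_init() is a hypothetical stand-in that returns a negative errno on failure (a simplification), and the two manager_new_*() variants are illustrative names for "replace the error with ENOMEM" versus "pass the underlying error through".

```python
import errno


def watch_init(fail_with=None):
    """Hypothetical stand-in for the inotify setup step.

    Returns a file descriptor number on success, or a negative errno
    value on failure (a simplification chosen for this sketch).
    """
    if fail_with is not None:
        return -fail_with
    return 8  # arbitrary fd number, matching the working trace


def manager_new_masking(fail_with=None):
    """Models the quoted code path: every failure is reported as ENOMEM."""
    fd = watch_init(fail_with)
    if fd < 0:
        return -errno.ENOMEM  # the original cause is lost here
    return fd


def manager_new_propagating(fail_with=None):
    """Alternative for comparison: hand the underlying error to the caller."""
    fd = watch_init(fail_with)
    if fd < 0:
        return fd  # caller still sees the real cause, e.g. EMFILE
    return fd


def describe(rc):
    """Turn a return code into a symbolic errno name, or 'ok' on success."""
    return errno.errorcode[-rc] if rc < 0 else "ok"


if __name__ == "__main__":
    print("masking:    ", describe(manager_new_masking(errno.EMFILE)))
    print("propagating:", describe(manager_new_propagating(errno.EMFILE)))
```

With the masking variant the caller can only report ENOMEM, which is what the last log line in the failing trace shows. The propagating variant keeps EMFILE visible.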
[Bug 1602192] Re: deploy 30 nodes on lxd, machines never leave pending
This is how this can be debugged with lxd:

Change the reproducer script to not delete the "working" containers, but leave them running. Then booting the failed x-013 (or whichever it is) will fail reliably. "lxc stop -f x-013" it, then "lxc config edit x-013" and add the following keys:

config:
  environment.SYSTEMD_LOG_LEVEL: debug
devices:
  kmsg:
    major: "1"
    minor: "11"
    path: /dev/kmsg
    type: unix-char

This will enable verbose logging and provide a /dev/kmsg in the container so that the container can actually log stuff to dmesg (it fails too early for the journal to start). Then "dmesg -w" provides a nice log tail. There we see:

[12235.886344] systemd-tmpfiles[54]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
[12235.891623] systemd-remount-fs[55]: /bin/mount for / exited with exit status 1.
[12235.909095] systemd-udevd[51]: inotify_init failed: Too many open files
[12235.909101] systemd-udevd[51]: error initializing inotify
[12235.909151] systemd-udevd[51]: failed to allocate manager object: Cannot allocate memory

(repeats a few times).

Stracing lxd children also confirms that there are a lot of EMFILE errors. However, I stopped x-013 and rebooted x-012, which works again. The EMFILE *also* happens there, so this is unrelated and just a red herring.

The important difference is that udev works there, so systemd sees that the root device is already mounted/does not exist. Without udev it just tries to mount it as it has no further information about it.

As a workaround, remove or comment out the root fs line in /etc/fstab -- it's not needed at all in a container. Then the container boots, which makes it more convenient to debug udev in it.
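The fstab workaround mentioned in this comment could be scripted roughly as follows. This is a minimal sketch under assumptions: comment_out_root_entry() is a hypothetical helper, the sample fstab contents are made up for illustration, and the code only transforms text; applying the result to the container's /etc/fstab is left out.

```python
def comment_out_root_entry(fstab_text):
    """Return fstab_text with active entries for mount point "/" commented out.

    Blank lines, existing comments and entries for other mount points
    are passed through unchanged.
    """
    out = []
    for line in fstab_text.splitlines(keepends=True):
        stripped = line.strip()
        fields = stripped.split()
        is_entry = bool(stripped) and not stripped.startswith("#")
        # In fstab the second whitespace-separated field is the mount point.
        if is_entry and len(fields) >= 2 and fields[1] == "/":
            out.append("#" + line)
        else:
            out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Made-up sample contents, not taken from the bug report.
    sample = (
        "LABEL=cloudimg-rootfs\t/\text4\tdefaults\t0 0\n"
        "# a comment\n"
        "/dev/sdb1 /mnt ext4 defaults 0 2\n"
    )
    print(comment_out_root_entry(sample), end="")
```

Running the helper a second time on its own output changes nothing, since the root entry is already a comment by then.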
[Bug 1602192] Re: deploy 30 nodes on lxd, machines never leave pending
Marking as invalid for juju-core as it is reproducible outside of juju.

** Also affects: lxd (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: juju-core
       Status: In Progress => Invalid

** Changed in: juju-core
    Milestone: 2.0-beta12 => None