[Bug 1602192] Re: deploy 30 nodes on lxd, machines never leave pending

2016-07-15 Thread Martin Pitt
So while pid 1 survives the EMFILE errors, udevd does not. Working container:

1058  writev(2, [{"rules contain 49152 bytes tokens (4096 * 12 bytes), 13886 bytes strings", 71}, {"\n", 1}], 2) = 72
[...]
1058  inotify_init1(O_CLOEXEC)  = 8

Failing container:

writev(2, [{"rules contain 49152 bytes tokens (4096 * 12 bytes), 13886 bytes strings", 71}, {"\n", 1}], 2rules contain 49152 bytes tokens (4096 * 12 bytes), 13886 bytes strings
) = 72
[...]
inotify_init1(O_CLOEXEC) = -1 EMFILE (Too many open files)
writev(2, [{"inotify_init failed: Too many open files", 40}, {"\n", 1}], 2inotify_init failed: Too many open files
) = 41
writev(2, [{"error initializing inotify", 26}, {"\n", 1}], 2error initializing inotify
) = 27
[...]
writev(2, [{"failed to allocate manager object: Cannot allocate memory", 57}, {"\n", 1}], 


src/udev/udevd.c, manager_new():

manager->fd_inotify = udev_watch_init(manager->udev);
if (manager->fd_inotify < 0)
        return log_error_errno(ENOMEM, "error initializing inotify");

So "Cannot allocate memory" is the wrong error code (the snippet above
hard-codes ENOMEM), but udevd really does fail because it runs out of
file descriptors.
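
To make the mismatch concrete, here is a small Python model of the control
flow. It is only a sketch, not systemd code: watch_init_stub() is a
hypothetical stand-in for udev_watch_init() that always fails with EMFILE,
and describe() merely imitates the wording of the final log line in the
trace above.

```python
import errno
import os


def watch_init_stub():
    # Hypothetical stand-in for udev_watch_init(): always fails with EMFILE,
    # returned as a negative errno value.
    return -errno.EMFILE


def manager_new_hardcoded():
    # Mirrors the quoted snippet: whatever went wrong, report ENOMEM.
    fd = watch_init_stub()
    if fd < 0:
        return -errno.ENOMEM
    return fd


def manager_new_propagating():
    # Alternative: hand the callee's own error code to the caller unchanged.
    fd = watch_init_stub()
    if fd < 0:
        return fd
    return fd


def describe(result):
    # Format a failure similarly to the last log line seen in the trace.
    if result < 0:
        return "failed to allocate manager object: " + os.strerror(-result)
    return "ok"
```

With the hard-coded variant, describe(manager_new_hardcoded()) reports the
ENOMEM text; with the propagating variant, the caller sees the EMFILE text
instead, which points at the actual cause.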

** Summary changed:

- deploy 30 nodes on lxd, machines never leave pending
+ when starting many LXD containers, they start failing to boot with "Too many open files"

** Changed in: lxd (Ubuntu)
   Status: New => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1602192

Title:
  when starting many LXD containers, they start failing to boot with
  "Too many open files"

To manage notifications about this bug go to:
https://bugs.launchpad.net/juju-core/+bug/1602192/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs


[Bug 1602192] Re: deploy 30 nodes on lxd, machines never leave pending

2016-07-15 Thread Martin Pitt
This is how to debug this with lxd: Change the reproducer script so that
it does not delete the "working" containers but leaves them running.
Booting the failed x-013 (or whichever it is) will then fail reliably.
Stop it with "lxc stop -f x-013", then run "lxc config edit x-013" and
add the following keys:

config:
  environment.SYSTEMD_LOG_LEVEL: debug

devices:
  kmsg:
    major: "1"
    minor: "11"
    path: /dev/kmsg
    type: unix-char

This will enable verbose logging and provide a /dev/kmsg in the
container so that the container can actually log stuff to dmesg (it
fails too early for the journal to start). Then "dmesg -w" provides a
nice log tail. There we see:

[12235.886344] systemd-tmpfiles[54]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
[12235.891623] systemd-remount-fs[55]: /bin/mount for / exited with exit status 1.
[12235.909095] systemd-udevd[51]: inotify_init failed: Too many open files
[12235.909101] systemd-udevd[51]: error initializing inotify
[12235.909151] systemd-udevd[51]: failed to allocate manager object: Cannot allocate memory

(This repeats a few times.) Running strace on the lxd child processes
also confirms that there are a lot of EMFILE errors.
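
According to inotify_init(2), EMFILE can mean either that the per-process
file descriptor limit was hit or that the per-user limit on inotify
instances was reached. Which of the two applies here is not established
in this thread. The following Python sketch is one way to check the second
possibility on the host: it reads the limit from /proc and counts inotify
file descriptors per UID. The function names are made up for illustration,
and the count is approximate, since an inotify instance inherited by
several processes is counted once per descriptor.

```python
import os


def read_inotify_limit(path="/proc/sys/fs/inotify/max_user_instances"):
    """Return the per-user inotify instance limit, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def count_inotify_fds(proc_root="/proc"):
    """Count inotify file descriptors per UID under <proc_root>/<pid>/fd."""
    counts = {}
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return counts
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        proc_dir = os.path.join(proc_root, entry)
        fd_dir = os.path.join(proc_dir, "fd")
        try:
            uid = os.stat(proc_dir).st_uid
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited meanwhile, or access was denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            # The kernel labels inotify descriptors with this link target.
            if target == "anon_inode:inotify":
                counts[uid] = counts.get(uid, 0) + 1
    return counts
```

Run as root on the host, comparing count_inotify_fds() against
read_inotify_limit() shows whether any single UID is close to the limit.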

However, I stopped x-013 and rebooted x-012, which works again. The
EMFILE *also* happens there, so this is unrelated and just a red
herring. The important difference is that udev works there, so systemd
sees that the root device is already mounted/does not exist. Without
udev it just tries to mount it as it has no further information about
it.

As a workaround, remove or comment out the root fs line in /etc/fstab --
it's not needed at all in a container. Then the container boots, which
makes it more convenient to debug udev in it.
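
The fstab workaround can also be scripted. The following is a minimal
Python sketch, assuming the standard whitespace-separated fstab layout
with the mount point in the second field; the function name is made up for
illustration. It works on the file's text only, so reading and writing
/etc/fstab inside the container is left to the caller.

```python
def comment_out_root_mount(fstab_text):
    """Return fstab_text with any active entry for "/" commented out."""
    result = []
    for line in fstab_text.splitlines(keepends=True):
        fields = line.split()
        is_active_entry = len(fields) >= 2 and not fields[0].startswith("#")
        if is_active_entry and fields[1] == "/":
            # Prefix instead of deleting, so the change is easy to revert.
            result.append("# " + line)
        else:
            result.append(line)
    return "".join(result)
```

Applying the function a second time leaves the text unchanged, because
lines that are already commented out are skipped.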



[Bug 1602192] Re: deploy 30 nodes on lxd, machines never leave pending

2016-07-14 Thread Cheryl Jennings
Marking as invalid for juju-core, as it is reproducible outside of juju.

** Also affects: lxd (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: juju-core
   Status: In Progress => Invalid

** Changed in: juju-core
   Milestone: 2.0-beta12 => None
