[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven
This bug was fixed in the package lxc - 1.0.0~alpha1-0ubuntu2

---
lxc (1.0.0~alpha1-0ubuntu2) saucy; urgency=low

  * Add allow-stderr to autopkgtest restrictions, as the Ubuntu template
    uses policy-rc.d to disable some daemons, and that causes a message
    to be printed on stderr when the service tries to start.

 -- Stephane Graber <stgra...@ubuntu.com>  Thu, 12 Sep 2013 13:57:17 -0400

** Changed in: lxc (Ubuntu)
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1014916

Title:
  simultaneously started lucid containers pause while starting after
  the first seven

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1014916/+subscriptions

--
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
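For context, the allow-stderr restriction referred to in the changelog is declared in the package's debian/tests/control file. A minimal illustration of such a stanza follows; the test name and dependency line here are assumptions for illustration, not the actual contents of the lxc package's file:

```
Tests: exercise
Depends: lxc
Restrictions: allow-stderr
```

Without allow-stderr, any output on stderr (such as the policy-rc.d message mentioned above) would make the autopkgtest run fail even when the test itself passes.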
[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven
The go API bindings have been tested with a large number of simultaneous create/start/destroy operations, so I believe the upstream API thread-safety work must have fixed this, at least upstream.

** Changed in: lxc (Ubuntu)
       Status: Confirmed => Fix Committed
[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven
Note that the workaround helps significantly but is not always sufficient. If I do not apply it, I can reliably encounter the problem every single test run. Another team is working on moving us to run on Precise in production, which would let us use Precise containers. Precise containers do not appear to trigger the problem. Because of this, I will advise that we wait on pursuing the problem.
[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven
This was working very well for us, and the lxc-start-ephemeral TRIES=60 was working fine. Starting last week, we began seeing a recurrence of this problem, even with the workarounds applied and TRIES hacked to 180. I intend to circle around with Serge and see if he has any other ideas.
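For reference, TRIES is the counter in lxc-start-ephemeral's wait loop, which polls until the container is usable before giving up. Schematically, it looks like the sketch below; this is illustrative only, with a stubbed readiness check, not the script's actual code:

```shell
TRIES=60   # the script's default; we hacked this to 180

# Hypothetical stand-in for the real readiness check (e.g. an ssh or
# ping probe against the container): here it succeeds on the third poll.
container_ready() { [ "$TRIES" -le 58 ]; }

# Poll once per second until the container is ready or we run out of tries.
while [ "$TRIES" -gt 0 ]; do
    if container_ready; then
        break
    fi
    TRIES=$((TRIES - 1))
    sleep 1
done
```

With a one-second poll, TRIES=60 bounds the wait at roughly a minute, so the three-minute pause described in this bug blows straight through the default; even TRIES=180 is borderline.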
[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven
Thanks, Gary, I'll try to reproduce this.

** Changed in: lxc (Ubuntu)
   Importance: Undecided => Medium
[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven
@Gary, I wasn't able to reproduce this on an m1.xlarge. Regarding needing disk space, note that /dev/xvdb should have a large amount of disk. You can unmount /mnt, then:

  pvcreate /dev/xvdb
  vgcreate lxc /dev/xvdb

then use -B lvm as an additional lxc-create flag to create an lvm-backed container. Snapshotted lvm containers can be created with:

  lxc-clone -s -o p1 -n p2
[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven
I wasn't able to reproduce this on a cc2.8xlarge either.
[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven
** Changed in: lxc (Ubuntu)
       Status: New => Confirmed
[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven
Ok, I've been able to reproduce this. I've tried with lvm snapshot containers - those did NOT do this. I don't understand why, unless it's simply the timing.

I used the following script to find the mac address of a container which didn't get an address (grepping for the mac addr in /var/lib/lxc/*/config):

  $ cat notit.sh
  #!/bin/bash
  maclist=()
  i=0
  for c in /var/lib/lxc/*/config; do
      mac=`grep lxc.network.hwaddr $c | awk -F= '{ print $2 }'`
      maclist[$i]=$mac
      i=$((i+1))
  done
  for j in `seq 0 $((i-1))`; do
      echo got macaddr ${maclist[j]}
  done
  for j in `seq 0 $((i-1))`; do
      tail -100 /var/log/syslog | grep -q ${maclist[j]}
      if [ $? -ne 0 ]; then
          echo ${maclist[j]} was not in syslog
      fi
  done

I then attached to the console of such a container. ps -ef showed:

  UID        PID  PPID  C STIME TTY          TIME CMD
  root         1     0  0 16:38 ?        00:00:00 /sbin/init
  root        42     1  0 16:38 ?        00:00:00 upstart-udev-bridge --daemon
  root        44     1  0 16:38 ?        00:00:00 /usr/sbin/sshd -D
  syslog      47     1  0 16:38 ?        00:00:00 rsyslogd -c4
  root        52     1  0 16:38 ?        00:00:00 udevd --daemon
  root        54     1  0 16:38 ?        00:00:00 /sbin/udevadm monitor -e
  root        58     1  0 16:38 ?        00:00:00 udevadm settle
  root        81     1  0 16:38 ?        00:00:00 /sbin/getty -8 38400 tty4
  root        84     1  0 16:38 ?        00:00:00 /sbin/getty -8 38400 tty2
  root        85     1  0 16:38 ?        00:00:00 /sbin/getty -8 38400 tty3
  root        90     1  0 16:38 ?        00:00:00 cron
  root       117     1  0 16:38 ?        00:00:00 /bin/login --
  root       118     1  0 16:38 ?        00:00:00 /sbin/getty -8 38400 /dev/console
  ubuntu     133   117  0 16:39 tty1     00:00:00 -bash
  ubuntu     145   133  0 16:40 tty1     00:00:00 ps -ef

and ifconfig -a showed:

  eth0      Link encap:Ethernet  HWaddr 00:16:3e:2d:02:b9
            inet6 addr: fe80::216:3eff:fe2d:2b9/64 Scope:Link
            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
            RX packets:118 errors:0 dropped:0 overruns:0 frame:0
            TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:1000
            RX bytes:14800 (14.8 KB)  TX bytes:550 (550.0 B)

  lo        Link encap:Local Loopback
            inet addr:127.0.0.1  Mask:255.0.0.0
            inet6 addr: ::1/128 Scope:Host
            UP LOOPBACK RUNNING  MTU:16436  Metric:1
            RX packets:0 errors:0 dropped:0 overruns:0 frame:0
            TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
            collisions:0 txqueuelen:0
            RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

Note that udevadm monitor and settle are still running, and dhclient3 is not yet running. This is basically what I'd assumed, since precise containers don't do this: precise containers don't do udevadm trigger. A workaround therefore will be to remove /etc/init/udevtrigger.conf from the container. The question remains exactly what is causing this - is there a rate limit in the kernel or in user-space udev which makes this pause happen?
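Since a wedged container shows a lingering "udevadm settle" in its process list, one way to flag stuck containers is to grep for it in captured ps output. A minimal sketch over sample output (abridged from the listing above; how you capture the process list from each container is up to you):

```shell
# Abridged ps -ef output captured from a stuck container.
ps_output='root 54 1 0 16:38 ? 00:00:00 /sbin/udevadm monitor -e
root 58 1 0 16:38 ? 00:00:00 udevadm settle
root 90 1 0 16:38 ? 00:00:00 cron'

# Boot is still wedged if a bare "udevadm settle" process is present;
# the trailing $ avoids matching the "udevadm monitor -e" line.
if printf '%s\n' "$ps_output" | grep -q 'udevadm settle$'; then
    echo 'container still waiting on udevadm settle'
fi
```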
[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven
Actually, simply disabling udevtrigger.conf doesn't work, because in lucid eth0 won't then come up. You also need to change the "start on" condition in /etc/init/networking.conf to:

  start on (local-filesystems)
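Putting both steps together, the workaround can be applied to a container's rootfs along these lines. This is a sketch only: it stages a scratch rootfs under a temp directory, and the job file contents below are simplified stand-ins for lucid's real ones (on a real system the rootfs would be e.g. /var/lib/lxc/<name>/rootfs):

```shell
# Stand-in rootfs with the two upstart jobs the workaround touches.
ROOTFS=$(mktemp -d)
mkdir -p "$ROOTFS/etc/init"
printf 'start on startup\nexec udevadm trigger\n' \
    > "$ROOTFS/etc/init/udevtrigger.conf"
printf 'start on (local-filesystems and stopped udevtrigger)\nexec ifup -a\n' \
    > "$ROOTFS/etc/init/networking.conf"

# Step 1: remove the udevtrigger job so boot never blocks on udev settling.
rm -f "$ROOTFS/etc/init/udevtrigger.conf"

# Step 2: start networking on local-filesystems alone, since the
# udevtrigger job it used to wait for is now gone.
sed -i 's/^start on .*/start on (local-filesystems)/' \
    "$ROOTFS/etc/init/networking.conf"

grep '^start on' "$ROOTFS/etc/init/networking.conf"
# → start on (local-filesystems)
```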
[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven
(lowering priority as there is a workaround)

** Changed in: lxc (Ubuntu)
   Importance: Medium => Low
[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven
Serge, thank you! The workaround appears to work very well for us. The containers started quickly; it should not only give us more reliable starts, but it also seems to have taken at least three minutes off our average run time, as you might expect.
[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven
** Description changed:

  We are gathering more data, but it feels like we have enough to start a
  bug report.

  On Precise, on a 16 core/32 thread EC2 machine (instance type
  cc2.8xlarge), when starting approximately eight or more Lucid containers
- simultaneously, after the first seven or so connect to dnsmasq within
- just a few seconds, the next group takes just under three minutes to
- connect. We will provide short steps to duplicate this.
+ simultaneously: after the first seven or so connect to dnsmasq within
+ just a few seconds, the next group waits just under three minutes before
+ they start to connect.

- We also have past experience indicating that more grouping occurs,
- because simultaneously starting 32 containers occasionally (about once
- in 15 attempts) will result in one container that still has not
- connected to dnsmasq after ten minutes. We have not tried to duplicate
- this, because we hope that tackling the smaller issue will also address
- the larger one.
+ This affects our ability to ssh into the containers and do work.

- I have not yet had enough disk space on this machine to duplicate this
- without lxc-start-ephemeral. I intend to do so.
-
- I have tried with a precise container. *Precise containers do not
- exhibit this bug* in the experiment I performed, at least up to 16
- containers. I plan to duplicate this at least a couple more times to
- verify, but I'm comfortable saying initially that I believe this is
- specific to Lucid containers. I'll report back.
+ We will provide short steps to duplicate this. Please note also the
+ extra data we include at the end of this report, showing experiments we
+ have run to rule out certain causes and scenarios.
  To duplicate (again, I am doing this on a specific EC2 machine with a
  lot of cores, but I believe you can see something similar to this even
  on a 4 core/8 hyperthread machine):

  sudo apt-get install moreutils
  sudo lxc-create -t ubuntu -n lucid -- -r lucid -a i386 -b ubuntu
  parallel -j 16 bash -c "lxc-start-ephemeral -d -o lucid" -- 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

  Now look at /var/log/syslog at the most recent dnsmasq entries. You
- will see about seven containers get DHCPACK from dnsmasq within about 20
- seconds; then you will see a pause of about three minutes; then you will
- see more containers report to dnsmasq. Here's an excerpt (the
- approximate three minute jump happens between lines 5 and 6):
+ will see that about seven containers get DHCPACK from dnsmasq within
+ about 20 seconds; then you will see a pause of about three minutes; then
+ you will see more containers report to dnsmasq. Here's an excerpt (the
+ approximate three minute jump happens between lines 5 and 6, and then I
+ include all lines up to the next ACK):

  ...
  Jun 18 20:40:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPACK(lxcbr0) 10.0.3.161 00:16:3e:38:9f:e2 lptests-temp-5uKcEua
- Jun 18 20:40:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPREQUEST(lxcbr0) 10.0.3.37 00:16:3e:9c:ae:16
+ Jun 18 20:40:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPREQUEST(lxcbr0) 10.0.3.37 00:16:3e:9c:ae:16
  Jun 18 20:40:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPACK(lxcbr0) 10.0.3.37 00:16:3e:9c:ae:16 lptests-temp-1HxiwbU
- Jun 18 20:40:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPREQUEST(lxcbr0) 10.0.3.62 00:16:3e:69:83:2b
+ Jun 18 20:40:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPREQUEST(lxcbr0) 10.0.3.62 00:16:3e:69:83:2b
  Jun 18 20:40:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPACK(lxcbr0) 10.0.3.62 00:16:3e:69:83:2b lptests-temp-6o30rvH
- Jun 18 20:43:05 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPDISCOVER(lxcbr0) 10.0.3.123 00:16:3e:0d:7b:65
- Jun 18 20:43:05 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPOFFER(lxcbr0) 10.0.3.203 00:16:3e:0d:7b:65
- Jun 18 20:43:08 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPDISCOVER(lxcbr0) 10.0.3.123 00:16:3e:32:5b:99
- Jun 18 20:43:08 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPOFFER(lxcbr0) 10.0.3.68 00:16:3e:32:5b:99
- Jun 18 20:43:11 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPDISCOVER(lxcbr0) 10.0.3.123 00:16:3e:7c:21:38
- Jun 18 20:43:11 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPOFFER(lxcbr0) 10.0.3.35 00:16:3e:7c:21:38
- Jun 18 20:43:14 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPDISCOVER(lxcbr0) 10.0.3.123 00:16:3e:72:0c:64
- Jun 18 20:43:14 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPOFFER(lxcbr0) 10.0.3.208 00:16:3e:72:0c:64
- Jun 18 20:43:17 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPDISCOVER(lxcbr0) 10.0.3.123 00:16:3e:ed:b3:a0
- Jun 18 20:43:17 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPOFFER(lxcbr0) 10.0.3.82 00:16:3e:ed:b3:a0
- Jun 18 20:43:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPDISCOVER(lxcbr0) 10.0.3.123 00:16:3e:e3:8f:1a
- Jun 18 20:43:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPOFFER(lxcbr0) 10.0.3.247 00:16:3e:e3:8f:1a
- Jun 18 20:43:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPDISCOVER(lxcbr0) 10.0.3.123 00:16:3e:5c:99:20
- Jun 18 20:43:20 ip-10-58-49-120