[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven

2013-09-12 Thread Launchpad Bug Tracker
This bug was fixed in the package lxc - 1.0.0~alpha1-0ubuntu2

---
lxc (1.0.0~alpha1-0ubuntu2) saucy; urgency=low

  * Add allow-stderr to autopkgtest restrictions as the Ubuntu template
    uses policy-rc.d to disable some daemons and that causes a message to
    be printed on stderr when the service tries to start.
 -- Stephane Graber stgra...@ubuntu.com   Thu, 12 Sep 2013 13:57:17 -0400

** Changed in: lxc (Ubuntu)
   Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1014916

Title:
  simultaneously started lucid containers pause while starting after the
  first seven

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1014916/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs


[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven

2013-06-11 Thread Serge Hallyn
The Go API bindings have been tested with a large number of simultaneous
create/start/destroy operations, so I believe the upstream API thread-safety
work must have fixed this, at least upstream.

** Changed in: lxc (Ubuntu)
   Status: Confirmed => Fix Committed


[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven

2012-07-10 Thread Gary Poster
Note that the workaround helps significantly but is not always
sufficient.  If I do not apply it, I can reliably encounter the problem
on every single test run.

Another team is working on moving us to run on Precise in production,
which would let us use Precise containers.  Precise containers do not
appear to trigger the problem.  Because of this, I will advise that we
wait on pursuing the problem.


[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven

2012-07-09 Thread Gary Poster
This was working very well for us, and the lxc-start-ephemeral TRIES=60
was working fine.  Starting last week, we began seeing a recurrence of
this problem, even with the workarounds applied and TRIES hacked to 180.
I intend to circle around with Serge and see if he has any other ideas.


[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven

2012-06-19 Thread Serge Hallyn
Thanks, Gary, I'll try to reproduce this.

** Changed in: lxc (Ubuntu)
   Importance: Undecided => Medium


[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven

2012-06-19 Thread Serge Hallyn
@Gary,

I wasn't able to reproduce this on an m1.xlarge.

Regarding the need for disk space, note that /dev/xvdb should have a large
amount of disk.  You can unmount /mnt, run pvcreate /dev/xvdb and vgcreate lxc
/dev/xvdb, then pass -B lvm as an additional lxc-create flag to create
an LVM-backed container.  Snapshotted LVM containers can be created with
lxc-clone -s -o p1 -n p2.
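
For example, something along these lines should work (the device, volume
group, and container names below are only illustrative):

# Free the ephemeral disk and put an LVM volume group on it.
sudo umount /mnt                # /dev/xvdb is normally mounted here on EC2
sudo pvcreate /dev/xvdb
sudo vgcreate lxc /dev/xvdb     # -B lvm looks for a VG named "lxc" by default

# Create an LVM-backed lucid container, then a cheap snapshot clone of it.
sudo lxc-create -t ubuntu -n p1 -B lvm -- -r lucid
sudo lxc-clone -s -o p1 -n p2   # -s makes p2 an LVM snapshot of p1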


[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven

2012-06-19 Thread Serge Hallyn
I wasn't able to reproduce this on a cc2.8xlarge either.


[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven

2012-06-19 Thread Serge Hallyn
** Changed in: lxc (Ubuntu)
   Status: New => Confirmed


[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven

2012-06-19 Thread Serge Hallyn
OK, I've been able to reproduce this.  I've tried with LVM snapshot
containers - those did NOT do this.  I don't understand why, unless it's
simply the timing.

I used the following script:

cat notit.sh
#!/bin/bash
# Collect the hwaddr of every container, then report any that never
# showed up in the last 100 lines of syslog (i.e. never got a DHCP lease).

maclist=()
i=0
for c in /var/lib/lxc/*/config; do
  mac=`grep lxc.network.hwaddr $c | awk -F= '{ print $2 }'`
  maclist[$i]=$mac
  i=$((i+1))
done

for j in `seq 0 $((i-1))`; do
  echo "got macaddr ${maclist[j]}"
done

for j in `seq 0 $((i-1))`; do
  tail -100 /var/log/syslog | grep -q "${maclist[j]}"
  if [ $? -ne 0 ]; then
    echo "${maclist[j]} was not in syslog"
  fi
done

to find the MAC address of a container which didn't get an address
(grepping for that MAC address in /var/lib/lxc/*/config), and then
attached to that container's console.  ps -ef showed:

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 16:38 ?        00:00:00 /sbin/init
root        42     1  0 16:38 ?        00:00:00 upstart-udev-bridge --daemon
root        44     1  0 16:38 ?        00:00:00 /usr/sbin/sshd -D
syslog      47     1  0 16:38 ?        00:00:00 rsyslogd -c4
root        52     1  0 16:38 ?        00:00:00 udevd --daemon
root        54     1  0 16:38 ?        00:00:00 /sbin/udevadm monitor -e
root        58     1  0 16:38 ?        00:00:00 udevadm settle
root        81     1  0 16:38 ?        00:00:00 /sbin/getty -8 38400 tty4
root        84     1  0 16:38 ?        00:00:00 /sbin/getty -8 38400 tty2
root        85     1  0 16:38 ?        00:00:00 /sbin/getty -8 38400 tty3
root        90     1  0 16:38 ?        00:00:00 cron
root       117     1  0 16:38 ?        00:00:00 /bin/login --
root       118     1  0 16:38 ?        00:00:00 /sbin/getty -8 38400 /dev/console
ubuntu     133   117  0 16:39 tty1     00:00:00 -bash
ubuntu     145   133  0 16:40 tty1     00:00:00 ps -ef

and ifconfig -a showed:

eth0  Link encap:Ethernet  HWaddr 00:16:3e:2d:02:b9  
  inet6 addr: fe80::216:3eff:fe2d:2b9/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:118 errors:0 dropped:0 overruns:0 frame:0
  TX packets:7 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:14800 (14.8 KB)  TX bytes:550 (550.0 B)

loLink encap:Local Loopback  
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:0 errors:0 dropped:0 overruns:0 frame:0
  TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0 
  RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

Note that udevadm monitor and udevadm settle are still running, and
dhclient3 is not yet running.
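
To confirm this quickly from a console attached to a stuck container, a
check along these lines (just a sanity check, not part of any fix) shows
the same thing:

ps -ef | grep -E 'udevadm (monitor|settle)' | grep -v grep   # still running
ps -ef | grep dhclient | grep -v grep                        # no output while stuck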

This is basically what I'd assumed since precise containers don't do this:
precise containers don't do udevadm trigger.

A workaround, therefore, is to remove /etc/init/udevtrigger.conf from
the container.

The question remains exactly what is causing this - is there a rate limit
in the kernel or in user-space udev which makes this pause happen?


[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven

2012-06-19 Thread Serge Hallyn
Actually, simply disabling udevtrigger.conf doesn't work, because on
lucid eth0 won't then come up.  You also need to change the 'start on'
condition in /etc/init/networking.conf to:

start on (local-filesystems)
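
Putting both pieces together, something along these lines should apply
the full workaround to a container's rootfs (the path below assumes a
container named lucid, and the sed assumes the start on stanza sits on a
single line; otherwise edit the file by hand):

ROOTFS=/var/lib/lxc/lucid/rootfs    # adjust to your container

# Keep udevadm trigger from running at boot inside the container.
sudo mv $ROOTFS/etc/init/udevtrigger.conf $ROOTFS/etc/init/udevtrigger.conf.disabled

# Start networking on local-filesystems instead of waiting on udev events.
sudo sed -i 's/^start on .*/start on (local-filesystems)/' $ROOTFS/etc/init/networking.conf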


[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven

2012-06-19 Thread Serge Hallyn
(lowering priority as there is a workaround)

** Changed in: lxc (Ubuntu)
   Importance: Medium => Low


[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven

2012-06-19 Thread Gary Poster
Serge, thank you!  The workaround appears to work very well for us.  The
containers started quickly, and the change should not only give us more
reliable starts but, as you might expect, also seems to have taken at
least three minutes off our average run time.


[Bug 1014916] Re: simultaneously started lucid containers pause while starting after the first seven

2012-06-18 Thread Gary Poster
** Description changed:

  We are gathering more data, but it feels like we have enough to start a
  bug report.
  
  On Precise, on a 16 core/32 thread EC2 machine (instance type
  cc2.8xlarge), when starting approximately eight or more Lucid containers
- simultaneously, after the first seven or so connect to dnsmasq within
- just a few seconds, the next group takes just under three minutes to
- connect.  We will provide short steps to duplicate this.
+ simultaneously: after the first seven or so connect to dnsmasq within
+ just a few seconds, the next group waits just under three minutes before
+ they start to connect.
  
- We also have past experience indicating that more grouping occurs,
- because simultaneously starting 32 containers occasionally (about once
- in 15 attempts) will result in one container that still has not
- connected to dnsmasq after ten minutes.  We have not tried to duplicate
- this, because we hope that tackling the smaller issue will also address
- the larger one.
+ This affects our ability to ssh into the containers and do work.
  
- I have not yet had enough disk space on this machine to duplicate this
- without lxc-start-ephemeral.  I intend to do so.
- 
- I have tried with a precise container.  *Precise containers do not
- exhibit this bug* in the experiment I performed, at least up to 16
- containers.  I plan to duplicate this at least a couple more times to
- verify, but I'm comfortable saying initially that I believe this is
- specific to Lucid containers.  I'll report back.
+ We will provide short steps to duplicate this.  Please note also the
+ extra data we include at the end of this report, showing experiments we
+ have run to rule out certain causes and scenarios.
  
  To duplicate (again, I am doing this on a specific EC2 machine with a
  lot of cores, but I believe you can see something similar to this even
  on a 4 core/8 hyperthread machine):
  
  sudo apt-get install moreutils
  sudo lxc-create -t ubuntu -n lucid -- -r lucid -a i386 -b ubuntu
  parallel -j 16 bash -c "lxc-start-ephemeral -d -o lucid" -- 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  
  Now look at /var/log/syslog at the most recent dnsmasq entries.  You
- will see about seven containers get DHCPACK from dnsmasq within about 20
- seconds; then you will see a pause of about three minutes; then you will
- see more containers report to dnsmasq.  Here's an excerpt (the
- approximate three minute jump happens between lines 5 and 6):
+ will see that about seven containers get DHCPACK from dnsmasq within
+ about 20 seconds; then you will see a pause of about three minutes; then
+ you will see more containers report to dnsmasq.  Here's an excerpt (the
+ approximate three minute jump happens between lines 5 and 6, and then I
+ include all lines up to the next ACK):
  
  ...
  Jun 18 20:40:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPACK(lxcbr0) 10.0.3.161 00:16:3e:38:9f:e2 lptests-temp-5uKcEua
- Jun 18 20:40:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPREQUEST(lxcbr0) 10.0.3.37 00:16:3e:9c:ae:16 
+ Jun 18 20:40:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPREQUEST(lxcbr0) 10.0.3.37 00:16:3e:9c:ae:16
  Jun 18 20:40:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPACK(lxcbr0) 10.0.3.37 00:16:3e:9c:ae:16 lptests-temp-1HxiwbU
- Jun 18 20:40:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPREQUEST(lxcbr0) 10.0.3.62 00:16:3e:69:83:2b 
+ Jun 18 20:40:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPREQUEST(lxcbr0) 10.0.3.62 00:16:3e:69:83:2b
  Jun 18 20:40:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPACK(lxcbr0) 10.0.3.62 00:16:3e:69:83:2b lptests-temp-6o30rvH
- Jun 18 20:43:05 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPDISCOVER(lxcbr0) 10.0.3.123 00:16:3e:0d:7b:65 
- Jun 18 20:43:05 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPOFFER(lxcbr0) 10.0.3.203 00:16:3e:0d:7b:65 
- Jun 18 20:43:08 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPDISCOVER(lxcbr0) 10.0.3.123 00:16:3e:32:5b:99 
- Jun 18 20:43:08 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPOFFER(lxcbr0) 10.0.3.68 00:16:3e:32:5b:99 
- Jun 18 20:43:11 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPDISCOVER(lxcbr0) 10.0.3.123 00:16:3e:7c:21:38 
- Jun 18 20:43:11 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPOFFER(lxcbr0) 10.0.3.35 00:16:3e:7c:21:38 
- Jun 18 20:43:14 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPDISCOVER(lxcbr0) 10.0.3.123 00:16:3e:72:0c:64 
- Jun 18 20:43:14 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPOFFER(lxcbr0) 10.0.3.208 00:16:3e:72:0c:64 
- Jun 18 20:43:17 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPDISCOVER(lxcbr0) 10.0.3.123 00:16:3e:ed:b3:a0 
- Jun 18 20:43:17 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPOFFER(lxcbr0) 10.0.3.82 00:16:3e:ed:b3:a0 
- Jun 18 20:43:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPDISCOVER(lxcbr0) 10.0.3.123 00:16:3e:e3:8f:1a 
- Jun 18 20:43:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPOFFER(lxcbr0) 10.0.3.247 00:16:3e:e3:8f:1a 
- Jun 18 20:43:20 ip-10-58-49-120 dnsmasq-dhcp[7546]: DHCPDISCOVER(lxcbr0) 10.0.3.123 00:16:3e:5c:99:20 
- Jun 18 20:43:20 ip-10-58-49-120