Re: [Lxc-users] Bug discussion: implementing high virtual device MAC addresses

2011-10-24 Thread Derek Simkowiak
 Hello,
 Just following up re: this bug.  I think it's a pretty serious issue.

 I am looking to work on this, but I am seeking some feedback and 
direction from one of the core LXC devs.

- Do you agree with my analysis?
- Has anyone else worked on this already?
etc.


Thanks,
Derek

On 10/18/2011 04:31 PM, Derek Simkowiak wrote:
   There is a behavior in the Linux kernel which can cause a bridge
 device to change MAC address, thus causing a network blackout of several
 seconds (while everybody ARPs the new MAC address and flushes the old one).
 This happens when bridging an enslaved interface, like we do with LXC.

   The symptom is that the LXC host will black out for several seconds
 when starting or stopping an LXC container.  Your SSH terminal on the
 host will freeze and become unresponsive.  (It is a random symptom,
 because the blackout only happens if the randomly-assigned MAC address
 of the virtual device is lower than that of the physical eth0 device).
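
The selection rule described above can be sketched as a small model. This is only an illustration of the "lowest MAC wins" behaviour as stated in this post, not kernel code; the eth0 address below is a made-up example, and the function names are hypothetical.

```python
def parse_mac(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes for numeric comparison."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(p, 16) for p in parts)


def bridge_mac(port_macs):
    """Model: the bridge takes the numerically lowest MAC among its ports."""
    return min(port_macs, key=parse_mac)


def attach_changes_bridge_mac(existing_ports, new_port):
    """Return True if attaching new_port would change the bridge's MAC."""
    return bridge_mac(existing_ports + [new_port]) != bridge_mac(existing_ports)


eth0 = "52:54:00:12:34:56"  # hypothetical physical NIC address

# A veth address below eth0 takes over the bridge: blackout expected.
print(attach_changes_bridge_mac([eth0], "4e:34:7c:dc:92:e8"))
# A veth address with a high first octet leaves the bridge MAC alone.
print(attach_changes_bridge_mac([eth0], "fe:34:7c:dc:92:e8"))
```

Under this model the symptom is random simply because the veth address is random: only draws that land below eth0's address trigger the change.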

   This behavior was first observed by the libvirt folks when creating
 virtual machines.  You can read more details about it (and how they
 fixed it) here:

 https://www.redhat.com/archives/libvir-list/2010-July/msg00450.html
 https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/584048

   I have observed the symptom under LXC, and the workaround for it
 has been independently confirmed for LXC in this bug report (ID: 3411497):

 http://sourceforge.net/tracker/index.php?func=detail&aid=3411497&group_id=163076&atid=826303


   The workaround for the bug is to give the virtual device a high MAC
 address, thus discouraging the bridge device from adopting that MAC
 address as its own.

   I have mentioned this bug on the list before; however, I was
 confused about which MAC address was causing the problem.  It is NOT
 the MAC address specified in lxc.conf, like this:

 lxc.network.hwaddr = fe:16:3e:fd:5a:5b

   That MAC address has nothing to do with the bug; the host's bridge
 device (br0) will never assume a configured LXC MAC address as its own.
 Instead, the MAC address in question is the one of the virtual veth
 device, as shown with ifconfig on the host:

 veth0IEDlk Link encap:Ethernet  HWaddr 4e:34:7c:dc:92:e8
 [...snip...]

   That HWaddr should be given a high prefix to avoid the network
 blackouts, just like they've done for libvirt.  The address is not set in
 any config file anywhere; it must be fixed in the LXC source code.

   I looked in network.c for the LXC source code and I think the fix
 should go in lxc_bridge_attach() near line 991.  The fix would put a
 manually-generated MAC address -- one with a high prefix -- into
 ifr.ifr_hwaddr.sa_data and thus replace the random one assigned by the
 kernel.
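
As a rough illustration of what "a manually-generated MAC address with a high prefix" could look like, here is a minimal sketch. It is not a patch for network.c (the real fix would be C code filling ifr.ifr_hwaddr.sa_data); the choice of 0xfe as the first octet and the function name are assumptions made for this example.

```python
import secrets

# Assumption for illustration: 0xfe as the first octet. It is numerically
# high, has the locally-administered bit (0x02) set, and has the
# multicast bit (0x01) clear, so the result is a valid unicast address.
HIGH_PREFIX = 0xFE


def random_high_mac(prefix: int = HIGH_PREFIX) -> str:
    """Return a random MAC whose first octet is the given high prefix."""
    if not 0 <= prefix <= 0xFF:
        raise ValueError("prefix must fit in one octet")
    if prefix & 0x01:
        raise ValueError("prefix has the multicast bit set")
    octets = bytes([prefix]) + secrets.token_bytes(5)
    return ":".join(f"{b:02x}" for b in octets)


print(random_high_mac())
```

Any address produced this way sorts above a physical NIC whose first octet is below 0xfe, so the bridge would have no reason to prefer it.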

   However, I'm new to the LXC source and would like some input and
 analysis from a more seasoned contributor.  I would be happy to test and
 maybe even contribute a patch, but I'd like some feedback first.


 Thank You,
 Derek Simkowiak


 ___
 Lxc-users mailing list
 Lxc-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/lxc-users




Re: [Lxc-users] Bug discussion: implementing high virtual device MAC addresses

2011-10-24 Thread Francois-Xavier Bourlet
Hi,

Here we are using lxc intensively with bridges. Since we don't use STP, the
downtime for each MAC address change is unnoticeable. In fact, we
discovered it when reading this mailing list. After some tests I can confirm
that most of the time we spawn or destroy a container, the bridge's MAC
address changes, but there is no loss of connectivity, since ARP tables
are instantly refreshed.

So an easy workaround for the moment is to disable STP on the bridge (brctl
stp br0 off). If you are using a bridge in a controlled environment, you
really don't need STP anyway.
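
To check whether STP is actually off on a given bridge, one option is to read the bridge's stp_state attribute from sysfs. The sketch below assumes the usual /sys/class/net/<bridge>/bridge/stp_state layout and the kernel's 0/1/2 encoding; the helper names are made up for this example.

```python
from pathlib import Path

# Kernel encoding of the bridge STP state as exposed in sysfs.
STP_STATES = {0: "off", 1: "kernel STP", 2: "userspace STP"}


def describe_stp_state(raw: str) -> str:
    """Translate the raw contents of stp_state into a readable label."""
    value = int(raw.strip())
    return STP_STATES.get(value, f"unknown ({value})")


def stp_state(bridge: str, sysfs_root: str = "/sys/class/net") -> str:
    """Read and describe the STP state of the named bridge device."""
    path = Path(sysfs_root) / bridge / "bridge" / "stp_state"
    return describe_stp_state(path.read_text())
```

On a host with a bridge named br0, calling stp_state("br0") should report "off" once STP has been disabled.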

My 2 cents,





-- 
François-Xavier Bourlet

Re: [Lxc-users] Bug discussion: implementing high virtual device MAC addresses

2011-10-24 Thread Derek Simkowiak

Francois-Xavier,
Thank you for your feedback.

I have seen the issue on two systems where STP is turned off.  Here 
is the /etc/network/interfaces entry for the particular bridge where 
I've seen it the most; note the last line:


iface br0 inet dhcp
    bridge_ports eth0
    bridge_fd 9
    bridge_hello 2
    bridge_maxage 12
    bridge_stp off

(As you can see, on this server I was using DHCP for the bridge.  
That is uncommon, but not unheard of.  In this case we centrally manage all 
fixed IP addresses using an /etc/ethers file on the DHCP server.)


I submit that the symptom is not related to STP, but instead is 
related to the ARP cache (and network topology) of the equipment you are 
connecting through.  With my Linux laptop hooked up through two GigE 
switches (and no STP), I see the host's network freeze.  I've seen it on 
Ubuntu 10.04 and 11.04.


> If you are using a bridge in a controlled environment, you really 
> don't need STP anyway.


If using collocation or managed hardware from a data center 
provider, you may not have a choice re: STP.


It is worth noting that the KVM/libvirt folks found the issue 
serious enough to fix.



Thank You,
Derek Simkowiak
