Re: LAGG bug or misconfiguration???
Hi guys,

just for the record: I've fixed the issue simply by moving the cable of the backup interface to another switch, as suggested by the network guys at the DC. That is even preferable from a network redundancy perspective. It now works perfectly and the failover in both directions (NIC0 to NIC1 and NIC1 to NIC0) is immediate.

Many thanks for your time.

Cheers.
Re: LAGG bug or misconfiguration???
Sorry, top posting from phone.

Show your switch's port configurations.

We're using VLAN tagging over lagg failover interfaces at work and I have already tried the tests you described, with much better results. We're also running 8.2, so the only thing that seems to differ between us is likely the switch config.

On 15 Mar 2012, at 20:06, Snoop sn...@email.it wrote:

Hi there,

a while after setting up my new server (with 8 jails in it) I decided (after postponing several times) to properly check the functionality of the lagg, and the result was very disappointing.

The test I did is very simple. I started copying a file from one site to another of my VPN network (from the server under test to another node somewhere else) and in the meantime I physically disconnected the main network cable to check the responsiveness of the lagg configuration. Then I plugged the cable back in to check whether the traffic would switch back to the main NIC as it should.

The result was basically this (lagg0 members: bge0 primary, bge1 secondary):

- when bge0 is unplugged, the traffic switches almost instantaneously to bge1
- when bge0 is plugged back in, the network stops working completely, with the two NICs polling synchronously, until I manually unplug bge1. Then within 2-4 seconds traffic goes back to bge0 (I waited a little more than a minute at most, to keep the active connections on the server from timing out).

I repeated the same test about 10-15 times, waiting a random amount of time between unplugging and replugging. The result was always the same.

Below are the ifconfig outputs:

- before starting the test
- when bge0 gets unplugged
- when bge0 gets plugged back in

I couldn't see anything odd.

___
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
    ether 00:14:ee:00:8a:c0
    inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
    inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
    inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
    inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
    inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
    inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
    inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
    inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
    inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
    media: Ethernet autoselect
    status: active
    laggproto failover
    laggport: bge1 flags=0<>
    laggport: bge0 flags=5<MASTER,ACTIVE>
___
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
    ether 00:14:ee:00:8a:c0
    inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
    inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
    inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
    inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
    inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
    inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
    inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
    inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
    inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
    media: Ethernet autoselect
    status: active
    laggproto failover
    laggport: bge1 flags=4<ACTIVE>
    laggport: bge0 flags=1<MASTER>
___
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
    ether 00:14:ee:00:8a:c0
    inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
    inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
    inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
    inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
    inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
    inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
    inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
    inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
    inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
    media: Ethernet autoselect
    status: active
    laggproto failover
    laggport: bge1 flags=0<>
    laggport: bge0 flags=5<MASTER,ACTIVE>
___
Also nothing unusual
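The three outputs above differ only in their laggport lines, so when repeating the cable-pull test it may help to watch just that part. The following is a minimal sketch, not something posted in the thread: the function name `active_laggport` is made up for illustration, and it assumes the member lines look like the ones shown above.

```sh
#!/bin/sh
# Sketch: report which lagg member currently carries the ACTIVE flag.
# Reads `ifconfig lagg0` style output on stdin.
active_laggport() {
    # Member lines look like: "laggport: bge0 flags=5<MASTER,ACTIVE>"
    awk '$1 == "laggport:" && $3 ~ /ACTIVE/ { print $2 }'
}

# Hypothetical usage during a cable-pull test (interface name lagg0 assumed):
#   while :; do date; ifconfig lagg0 | active_laggport; sleep 1; done
```

Note that this only shows what the host believes. In the test described above the flags looked correct while traffic was still stalled, so a matching flag does not prove that the switch port is forwarding.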
Re: LAGG bug or misconfiguration???
Hi Dweimer and Damien,

thanks for replying. The server is connected to a switch in the datacentre. The configuration of this switch is unknown to me and I obviously have no access to it, but I truly believe that such an enterprise environment has management capabilities.

Anyway, in which way would the configuration affect the lagg functionality? Might this issue be related to what is stated in the FreeBSD lagg pages in the handbook?

http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/network-aggregation.html

Cisco® Fast EtherChannel®
Cisco Fast EtherChannel (FEC) is a static setup and does not negotiate aggregation with the peer or exchange frames to monitor the link. If the switch supports LACP then that should be used instead.
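For reference, a failover lagg like the one in this thread (bge0 primary, bge1 secondary) is usually declared in /etc/rc.conf roughly as below. This is only a sketch: the address is a documentation placeholder, not the poster's actual configuration, and the real setup in the thread carries several more addresses.

```sh
# /etc/rc.conf fragment (sketch). The address below is a documentation
# placeholder, not the poster's real one.
ifconfig_bge0="up"
ifconfig_bge1="up"
cloned_interfaces="lagg0"
# bge0 is added first, so it becomes the master (primary) port.
ifconfig_lagg0="laggproto failover laggport bge0 laggport bge1 192.0.2.10 netmask 255.255.255.0"
```

With `laggproto failover` the host does not negotiate anything with the switch, unlike `laggproto lacp`, so the FEC and LACP remarks in the handbook apply to the aggregation protocols rather than to this mode.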
Re: LAGG bug or misconfiguration???
You're not looking for FEC or EtherChannel or 802.3ad at all.

What you're looking for, in the case of a *failover* configuration, is the spanning-tree portfast feature, so that your port doesn't transition through the different spanning-tree states before forwarding traffic.

Kindly obtain the configuration from whoever has it and let us know.
Re: LAGG bug or misconfiguration???
I've requested the configuration. I'll post it as soon as I have it.

Thank you very much for your time.
Re: LAGG bug or misconfiguration???
That's the STP configuration on my two switch ports:

spanning-tree portfast
spanning-tree bpduguard enable
Re: LAGG bug or misconfiguration???
I confirm you should see fast transition for your VLANs to forwarding state.

Are your ports in access or trunk mode? If they're trunked, portfast alone won't do it; you need spanning-tree portfast trunk.

Additionally, are you using link aggregation on the Cisco switch? (channel-group)
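The access-versus-trunk rule above can be turned into a quick check on a port stanza once the DC hands over the switch configuration. This is a rough sketch under stated assumptions: `check_portfast` is a made-up helper, it treats any port without an explicit `switchport mode trunk` line as an access port, and the optional `edge` keyword is an assumption about how some newer IOS releases spell the command.

```sh
#!/bin/sh
# Sketch: flag a trunk port that only has plain "spanning-tree portfast".
# Reads one interface stanza of a Cisco-style config on stdin.
check_portfast() {
    cfg=$(cat)
    if printf '%s\n' "$cfg" | grep -q 'switchport mode trunk'; then
        # On a trunk, the trunk variant of portfast is needed.
        # The optional "edge" keyword is an assumption about newer IOS syntax.
        if printf '%s\n' "$cfg" | grep -Eq 'spanning-tree portfast( edge)? trunk'; then
            echo "ok: trunk port with portfast trunk"
        else
            echo "warn: trunk port without portfast trunk"
        fi
    elif printf '%s\n' "$cfg" | grep -q 'spanning-tree portfast'; then
        # No explicit trunk mode: treated as an access port (assumption).
        echo "ok: access port with portfast"
    else
        echo "warn: portfast not configured"
    fi
}
```

Fed the two lines posted earlier (`spanning-tree portfast` and `spanning-tree bpduguard enable`) it reports an access port with portfast, which is only correct if the ports really are in access mode.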
Re: LAGG bug or misconfiguration???
I actually don't know, Damien. I'll have to talk to the network guy in the DC, as I neither manage the switch nor have access to it. I'm also not really a Cisco guy, so I'll forward those questions to him.

Moreover, I'm getting a bit lost here. If the ports are in trunk mode, would that affect the FreeBSD lagg functionality? If so, how? Do I need spanning-tree portfast trunk to make it work properly?

I really appreciate your input, Damien.

On Fri, 2012-03-16 at 17:49 +0100, Damien Fleuriot wrote:

I confirm you should see fast transition for your VLANs to forwarding state. Are your ports in access or trunk mode? If they're trunked, portfast alone won't do it, you need spanning-tree portfast trunk. Additionally, are you using link aggregation on the Cisco switch? (channel-group)

On 3/16/12 5:31 PM, Snoop wrote:

That's the STP configuration on my two switch ports:

spanning-tree portfast
spanning-tree bpduguard enable

On Fri, 2012-03-16 at 12:10 +0100, Damien Fleuriot wrote:

You're not looking for FEC or EtherChannel or 802.3ad at all. What you're looking for, in the case of a *failover* configuration, is a spanning-tree portfast feature so that your port doesn't transition through the different spantree states before forwarding traffic. Kindly obtain the configuration from whoever has it and let us know.

On 3/16/12 11:18 AM, Snoop wrote:

Hi Dweimer and Damien, thanks for replying. The server is connected to a switch of the datacentre. The configuration of this switch is unknown to me and I obviously have no access to it, but I truly believe that such an enterprise environment has management capabilities. Anyway, in which way would the configuration affect the lagg functionality? Might this issue be related to what is stated in the FreeBSD LAGG pages in the handbook?

http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/network-aggregation.html

Cisco® Fast EtherChannel®
Cisco Fast EtherChannel (FEC) is a static setup and does not negotiate aggregation with the peer or exchange frames to monitor the link. If the switch supports LACP then that should be used instead.

On Fri, 2012-03-16 at 10:45 +0100, Damien Fleuriot wrote:

Sorry, top posting from phone. Show your switch's port configurations. We're using VLAN tagging over lagg failover interfaces at work and I have already tried the tests you described, to much better results. We're also running 8.2, so the only thing that seems to differ between us is likely the switch config.

On 15 Mar 2012, at 20:06, Snoop sn...@email.it wrote:

Hi there,

a while after setting up my new server (with 8 jails in it) I decided (after postponing several times) to properly check the functionality of the lagg, and the result was very disappointing.

The test I did is very simple. I started copying a file from one site of my VPN network to another (from the server under test to a node somewhere else), and in the meantime I physically disconnected the main network cable to check the responsiveness of the lagg configuration. Then I plugged the cable back in to check whether the traffic would switch back to the main NIC as it should.

The result was basically this (lagg0 members: bge0 primary, bge1 secondary):

- when bge0 is unplugged, the traffic switches almost instantaneously to bge1
- when bge0 is plugged back in, the network stops working completely, with the two NICs polling synchronously, until I manually unplug bge1. Then within 2-4 seconds traffic goes back to bge0 (I waited a little more than a minute at most, to keep the active connections on the server from timing out).

I repeated the same test about 10-15 times, waiting a random amount of time between unplugging and replugging. The result was always the same.

Below are the ifconfig outputs

- before starting the test
- when bge0 gets unplugged
- when bge0 gets plugged back in

I couldn't see anything odd.

___
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
        ether 00:14:ee:00:8a:c0
        inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
        inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
        inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
        inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
        inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
        inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
        inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
        inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
        inet xxx.xx.xx.226 netmask 0x
LAGG bug or misconfiguration???
Hi there,

a while after setting up my new server (with 8 jails in it) I decided (after postponing several times) to properly check the functionality of the lagg, and the result was very disappointing.

The test I did is very simple. I started copying a file from one site of my VPN network to another (from the server under test to a node somewhere else), and in the meantime I physically disconnected the main network cable to check the responsiveness of the lagg configuration. Then I plugged the cable back in to check whether the traffic would switch back to the main NIC as it should.

The result was basically this (lagg0 members: bge0 primary, bge1 secondary):

- when bge0 is unplugged, the traffic switches almost instantaneously to bge1
- when bge0 is plugged back in, the network stops working completely, with the two NICs polling synchronously, until I manually unplug bge1. Then within 2-4 seconds traffic goes back to bge0 (I waited a little more than a minute at most, to keep the active connections on the server from timing out).

I repeated the same test about 10-15 times, waiting a random amount of time between unplugging and replugging. The result was always the same.

Below are the ifconfig outputs

- before starting the test
- when bge0 gets unplugged
- when bge0 gets plugged back in

I couldn't see anything odd.
___
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
        ether 00:14:ee:00:8a:c0
        inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
        inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
        inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
        inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
        inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
        inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
        inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
        inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
        inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
        media: Ethernet autoselect
        status: active
        laggproto failover
        laggport: bge1 flags=0<>
        laggport: bge0 flags=5<MASTER,ACTIVE>
___
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
        ether 00:14:ee:00:8a:c0
        inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
        inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
        inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
        inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
        inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
        inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
        inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
        inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
        inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
        media: Ethernet autoselect
        status: active
        laggproto failover
        laggport: bge1 flags=4<ACTIVE>
        laggport: bge0 flags=1<MASTER>
___
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
        ether 00:14:ee:00:8a:c0
        inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
        inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
        inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
        inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
        inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
        inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
        inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
        inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
        inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
        media: Ethernet autoselect
        status: active
        laggproto failover
        laggport: bge1 flags=0<>
        laggport: bge0 flags=5<MASTER,ACTIVE>
__

Also nothing unusual on dmesg:

...
bge0: link state changed to DOWN
bge0: link state changed to UP
bge1: link state changed to DOWN
bge1: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge1: link state changed to DOWN
bge1: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge1: link state changed to DOWN
bge1: link state changed to UP
...

The following
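For anyone repeating this unplug/replug test, comparing the laggport lines by eye gets tedious. Here is a minimal Python sketch, not a tool from this thread, that pulls the laggport flags out of saved ifconfig output and reports which member lagg considers active. The function names and the two sample strings are made up for illustration. The samples only mirror the laggport lines posted above, and the pattern also accepts the bracket-less form that some list archives produce.

```python
import re

# Matches lines such as "laggport: bge0 flags=5<MASTER,ACTIVE>".
# The angle brackets are optional so bracket-stripped archive copies parse too.
LAGGPORT_RE = re.compile(r"laggport:\s+(\S+)\s+flags=([0-9a-f]+)<?([A-Z_,]*)>?")


def parse_laggports(ifconfig_text):
    """Return {port_name: set_of_flag_names} for every laggport entry."""
    ports = {}
    for name, _hex_value, flag_names in LAGGPORT_RE.findall(ifconfig_text):
        # An empty flag list ("flags=0<>") becomes an empty set.
        ports[name] = {flag for flag in flag_names.split(",") if flag}
    return ports


def port_with_flag(ports, flag):
    """Return the single port carrying `flag`, or None if zero or several do."""
    matches = [name for name, flags in ports.items() if flag in flags]
    return matches[0] if len(matches) == 1 else None


# Hypothetical samples mirroring the laggport lines posted in this thread.
BEFORE_TEST = """laggproto failover
laggport: bge1 flags=0<>
laggport: bge0 flags=5<MASTER,ACTIVE>"""

BGE0_UNPLUGGED = """laggproto failover
laggport: bge1 flags=4<ACTIVE>
laggport: bge0 flags=1<MASTER>"""

if __name__ == "__main__":
    for label, text in (("before", BEFORE_TEST), ("bge0 unplugged", BGE0_UNPLUGGED)):
        ports = parse_laggports(text)
        print(label,
              "master:", port_with_flag(ports, "MASTER"),
              "active:", port_with_flag(ports, "ACTIVE"))
```

Run against the third output above (bge0 plugged back in), this would report bge0 as both master and active. That matches the observation that nothing looks odd on the host side even while traffic is stalled, so a check like this only shows lagg's own view and cannot tell whether the switch port is forwarding.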
Re: LAGG bug or misconfiguration???
On 15.03.2012 14:06, Snoop wrote:

Hi there,

a while after setting up my new server (with 8 jails in it) I decided (after postponing several times) to properly check the functionality of the lagg, and the result was very disappointing.

The test I did is very simple. I started copying a file from one site of my VPN network to another (from the server under test to a node somewhere else), and in the meantime I physically disconnected the main network cable to check the responsiveness of the lagg configuration. Then I plugged the cable back in to check whether the traffic would switch back to the main NIC as it should.

The result was basically this (lagg0 members: bge0 primary, bge1 secondary):

- when bge0 is unplugged, the traffic switches almost instantaneously to bge1
- when bge0 is plugged back in, the network stops working completely, with the two NICs polling synchronously, until I manually unplug bge1. Then within 2-4 seconds traffic goes back to bge0 (I waited a little more than a minute at most, to keep the active connections on the server from timing out).

I repeated the same test about 10-15 times, waiting a random amount of time between unplugging and replugging. The result was always the same.

Below are the ifconfig outputs

- before starting the test
- when bge0 gets unplugged
- when bge0 gets plugged back in

I couldn't see anything odd.
___
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
        ether 00:14:ee:00:8a:c0
        inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
        inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
        inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
        inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
        inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
        inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
        inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
        inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
        inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
        media: Ethernet autoselect
        status: active
        laggproto failover
        laggport: bge1 flags=0<>
        laggport: bge0 flags=5<MASTER,ACTIVE>
___
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
        ether 00:14:ee:00:8a:c0
        inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
        inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
        inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
        inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
        inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
        inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
        inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
        inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
        inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
        media: Ethernet autoselect
        status: active
        laggproto failover
        laggport: bge1 flags=4<ACTIVE>
        laggport: bge0 flags=1<MASTER>
___
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
        ether 00:14:ee:00:8a:c0
        inet xxx.xx.xx.224 netmask 0xff00 broadcast xxx.xx.xx.255
        inet xxx.xx.xx.227 netmask 0x broadcast xxx.xx.xx.227
        inet xxx.xx.xx.225 netmask 0x broadcast xxx.xx.xx.225
        inet 172.16.3.2 netmask 0x broadcast 172.16.3.2
        inet 172.16.3.3 netmask 0x broadcast 172.16.3.3
        inet 172.16.3.4 netmask 0x broadcast 172.16.3.4
        inet 172.16.3.5 netmask 0x broadcast 172.16.3.5
        inet 172.16.3.6 netmask 0x broadcast 172.16.3.6
        inet xxx.xx.xx.226 netmask 0x broadcast xxx.xx.xx.226
        media: Ethernet autoselect
        status: active
        laggproto failover
        laggport: bge1 flags=0<>
        laggport: bge0 flags=5<MASTER,ACTIVE>
__

Also nothing unusual on dmesg:

...
bge0: link state changed to DOWN
bge0: link state changed to UP
bge1: link state changed to DOWN
bge1: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge1: link state changed to DOWN
bge1: link state changed to UP
bge0: link state changed to DOWN
bge0: link state changed to UP
bge1: link state changed to