Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?
Replying to an old thread... I'm seeing a very similar situation, caused not by ZFS but by a dual-switch model in which one switch never sees the frames that come in over the other, since their least-cost routing hop is on the same switch. We've tuned our CAM and ARP timeouts to prevent this normally, but spanning-tree events/TCNs put all of those CAM entries into a fast-aging queue, which results in traffic to each host flooding until the ARP entry times out. Clearing the ARP table manually is a fix, but not exactly without its own impact.

However, while researching the issue I found this paragraph in Cisco's docs:

  Note: In MSFC IOS, there is an optimization that will trigger VLAN interfaces to repopulate their ARP tables when there is a TCN in the respective VLAN. This limits flooding in case of TCNs, as there will be an ARP broadcast and the host MAC address will be relearned as the hosts reply to ARP.

http://www.cisco.com/en/US/products/hw/switches/ps700/products_tech_note09186a00801d0808.shtml#cause2

Given that the switches in question are Cat6Ks running SX code, is there any reason the above might either not be working, or not be helping us even if it is? Is there a command needed to enable this optimization?

Thanks,
-C

On Mar 23, 2010, at 4:12 PM, Gert Doering wrote:

> Hi,
>
> On Mon, Mar 22, 2010 at 07:03:36PM -0700, Ray Van Dolson wrote:
>> What's happening is, esx1/2 begin talking to zfs1. All is well for a while... but at some point, zfs1's MAC address expires from the CAM on the switch (I guess that is what is happening).
>
> If zfs is only receiving packets, yes, that's likely to happen.
>
> What we do is easy: install something like rwhod that broadcasts a single packet every minute. This makes sure all CAM tables are always up to date.
>
> gert
> --
> USENET is *not* the non-clickable part of WWW!
> //www.muc.de/~gert/
> Gert Doering - Munich, Germany  g...@greenie.muc.de
> fax: +49-89-35655025  g...@net.informatik.tu-muenchen.de

___
cisco-nsp mailing list  cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/
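The fast-aging behaviour Chris describes can be illustrated with a toy model. This is a minimal sketch, assuming classic 802.1D semantics: after a topology change, a bridge ages its MAC table with the forward-delay timer (15 s by default) instead of the normal aging timer (300 s is a common default) for roughly max age + forward delay (35 s by default). `CamTable` and its methods are hypothetical names, not any switch's API, and real platforms (PVST+ on a Cat6K included) may differ in detail.

```python
from dataclasses import dataclass, field

NORMAL_AGING_S = 300.0   # assumed normal MAC aging time (a common default)
FORWARD_DELAY_S = 15.0   # 802.1D default forward delay, reused as the short aging timer
TC_DURATION_S = 35.0     # 802.1D default max age (20 s) + forward delay (15 s)


@dataclass
class CamTable:
    """Hypothetical MAC table that switches to fast aging after a TCN."""
    entries: dict = field(default_factory=dict)  # mac -> (port, last_seen)
    fast_aging_until: float = float("-inf")

    def learn(self, mac: str, port: int, now: float) -> None:
        self.entries[mac] = (port, now)

    def topology_change(self, now: float) -> None:
        # On a TCN, use the short aging timer for a while.
        self.fast_aging_until = now + TC_DURATION_S

    def aging_time(self, now: float) -> float:
        return FORWARD_DELAY_S if now < self.fast_aging_until else NORMAL_AGING_S

    def lookup(self, mac: str, now: float):
        """Return the port for `mac`, or None (the frame would be flooded)."""
        entry = self.entries.get(mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen > self.aging_time(now):
            del self.entries[mac]  # aged out
            return None
        return port


cam = CamTable()
cam.learn("zfs1", port=3, now=0.0)
print(cam.lookup("zfs1", now=100.0))  # 3: still within the normal aging time
cam.topology_change(now=100.0)
print(cam.lookup("zfs1", now=120.0))  # None: aged out by the 15 s fast-aging timer
```

Once the entry is gone, frames to that host are flooded until the host transmits again, which is why having the router re-ARP on a TCN (the MSFC optimization in the Cisco note) would limit the flooding.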
Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?
Hi,

On Wed, Apr 21, 2010 at 10:05:29AM -0400, Chris Woodfield wrote:
> However, while researching the issue I found this paragraph in Cisco's docs:
> Note: In MSFC IOS, there is an optimization that will trigger VLAN interfaces to repopulate their ARP tables when there is a TCN in the respective VLAN. This limits flooding in case of TCNs, as there will be an ARP broadcast and the host MAC address will be relearned as the hosts reply to ARP.

... *if* there is a TCN.

TCN = Topology Change Notification, so unless a port is causing a spanning-tree event, there won't be any TCNs, and hence no ARP rebroadcasting.

You don't want gratuitous TCNs :-)

gert
Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?
You're right, we don't, but they're not *completely* avoidable... :)

-C

On Apr 21, 2010, at 10:38 AM, Gert Doering wrote:
> TCN = Topology Change Notification, so unless a port is causing a spanning-tree event, there won't be any TCNs, and hence no ARP rebroadcasting.
>
> You don't want gratuitous TCNs :-)
Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?
Hi,

On Mon, Mar 22, 2010 at 07:03:36PM -0700, Ray Van Dolson wrote:
> What's happening is, esx1/2 begin talking to zfs1. All is well for a while... but at some point, zfs1's MAC address expires from the CAM on the switch (I guess that is what is happening).

If zfs is only receiving packets, yes, that's likely to happen.

What we do is easy: install something like rwhod that broadcasts a single packet every minute. This makes sure all CAM tables are always up to date.

gert
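The rwhod trick works because every switch in the VLAN learns the source MAC of a broadcast frame. Below is a minimal Python sketch of the same idea, meant to run on the otherwise-silent host (zfs1 here). The UDP port, payload, and function names are hypothetical; nothing needs to listen for the packets, since only the source MAC matters.

```python
import socket
import time

KEEPALIVE_PORT = 50513       # hypothetical UDP port; nothing needs to listen on it
PAYLOAD = b"cam-keepalive"   # content is irrelevant; switches only care about the source MAC


def make_keepalive_socket() -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)  # allow broadcast sends
    return sock


def send_keepalive(sock: socket.socket, addr=("255.255.255.255", KEEPALIVE_PORT)) -> int:
    """Send one datagram; switches learn our source MAC from the frame carrying it."""
    return sock.sendto(PAYLOAD, addr)


def run(interval_s: float = 60.0, count=None,
        addr=("255.255.255.255", KEEPALIVE_PORT)) -> int:
    """Send a keepalive every `interval_s` seconds (forever if `count` is None)."""
    sent = 0
    sock = make_keepalive_socket()
    try:
        while count is None or sent < count:
            send_keepalive(sock, addr)
            sent += 1
            if count is None or sent < count:
                time.sleep(interval_s)
    finally:
        sock.close()
    return sent
```

Calling `run()` with the defaults sends one broadcast per minute until interrupted, matching the interval Gert suggests.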
[c-nsp] Unicast traffic being sent to every port? Aging issue?
We have two Dell PowerConnect M6220 switches (A1 and B1). They are not cross-connected, but both have uplinks to the same subnet:

   zfs1
      \
      +----+
      | A1 |-+
      +----+ |  +-------+
             +--| Cisco |--- linux1
      +----+ |  +-------+
      | B1 |-+
      +----+
       /  \
    esx1  esx2

There's a host hanging off of A1 (zfs1) and several ESX hosts hanging off of B1 (esx1, esx2, etc). There's a host linux1 hanging off the Cisco as well (actually many hosts, but for the sake of description

What's happening is, esx1/2 begin talking to zfs1. All is well for a while... but at some point, zfs1's MAC address expires from the CAM on the switch (I guess that is what is happening). At that point, the Cisco begins forwarding the unicast packets to all its ports. The result: linux1 and all other hosts see the packets. Occasionally, when we're dealing with a lot of traffic, this seriously impacts performance.

My question here is: what is the _right_ way to deal with this? This flooding can continue for many minutes at a time; it isn't until an ARP reply emanates from zfs1 that the CAM table is populated again and the flooding stops.

I wonder if zfs1 would send back an ARP response quicker were it not behind an additional switch (the PowerConnect)...

Thanks,
Ray
Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?
On 3/22/10 7:03 PM, Ray Van Dolson wrote:
> We have two Dell PowerConnect M6220 switches (A1 and B1). [...]
> At that point, the Cisco begins forwarding the unicast packets to all its ports. The result: linux1 and all other hosts see the packets. Occasionally, when we're dealing with a lot of traffic, this seriously impacts performance.

Is the Cisco a router or a layer 2 switch? Are all hosts in the same IP subnet? Do the subnet masks all match? Is nothing doing proxy-ARP?

> My question here is: what is the _right_ way to deal with this? This flooding can continue for many minutes at a time; it isn't until an ARP reply emanates from zfs1 that the CAM table is populated again and the flooding stops.

If these are layer 2 switches, ARP won't have anything to do with it. If zfs1's MAC expires from the MAC address table on the Cisco, the Cisco will flood the next packet for that MAC. A1 will forward it to zfs1, or flood it if A1 too has expired the MAC. When zfs1 replies, A1 forwards the reply to the Cisco. At that point, the Cisco should re-install the MAC into its address table and the flooding should cease. This should happen with a single packet.

Does this happen with any other hosts behind A1? Any interface errors on any of the devices?

> I wonder if zfs1 would send back an ARP response quicker were it not behind an additional switch (the PowerConnect)...

If they're layer 2 switches, ARP doesn't have anything to do with it.

--
Jay Hennigan - CCIE #7880 - Network Engineering - j...@impulse.net
Impulse Internet Service - http://www.impulse.net/
Your local telephone and internet company - 805 884-6323 - WB6RDV
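Jay's description of layer 2 behaviour can be made concrete with a toy learning bridge. This is a minimal sketch, not any vendor's implementation; the port names and the 300-second aging time are assumptions. It shows that flooding toward zfs1 stops as soon as a single frame *sourced* from zfs1 passes through, with no ARP involved.

```python
from dataclasses import dataclass, field


@dataclass
class LearningSwitch:
    """Toy transparent bridge: learn source MACs, flood unknown unicast."""
    ports: list
    aging_s: float = 300.0
    table: dict = field(default_factory=dict)  # mac -> (port, last_seen)

    def receive(self, src: str, dst: str, in_port: str, now: float) -> list:
        """Process one frame and return the ports it is sent out of."""
        self.table[src] = (in_port, now)  # every frame refreshes its source MAC
        entry = self.table.get(dst)
        if entry is not None and now - entry[1] <= self.aging_s:
            out_port = entry[0]
            return [] if out_port == in_port else [out_port]  # known: forward or filter
        self.table.pop(dst, None)  # unknown or aged out
        return [p for p in self.ports if p != in_port]  # flood


# The Cisco from the thread, with one port toward each neighbour.
cisco = LearningSwitch(ports=["A1", "B1", "linux1"])
cisco.receive("zfs1", "esx1", in_port="A1", now=0)           # learns zfs1 behind A1
print(cisco.receive("esx1", "zfs1", in_port="B1", now=1))    # ['A1']
print(cisco.receive("esx1", "zfs1", in_port="B1", now=400))  # ['A1', 'linux1'] (aged out, flooded)
print(cisco.receive("zfs1", "esx1", in_port="A1", now=401))  # ['B1'] (zfs1 relearned)
print(cisco.receive("esx1", "zfs1", in_port="B1", now=402))  # ['A1']
```

If zfs1 never sources a frame after its entry ages out, the flooding in the third call would simply continue, which is the receive-only situation described elsewhere in the thread.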
Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?
On Mon, Mar 22, 2010 at 08:04:10PM -0700, Jay Hennigan wrote:
> Is the Cisco a router or a layer 2 switch? All hosts in the same IP subnet? Subnet masks all match? Nothing doing proxy-arp?
> [...]
> Does this happen with any other hosts behind A1? Any interface errors on any of the devices?
> [...]
> If layer 2 switches, ARP doesn't have anything to do with it.

I'll have to find out how the Ciscos are configured. I wouldn't be surprised if they're doing some Layer 3, though, as I know some VLAN routing is going on... The Dell switches both seem to have Routing Mode enabled as well (but proxy ARP disabled).

There currently aren't any other hosts behind A1, but that would be a good test. No interface errors currently.

Firmware is old on A1, so at this point I'm a little suspicious it's to blame. Just wanted to try and wrap my head around this first.

Thanks,
Ray
Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?
Long ago I had this problem, but the "zfs1" in my case was a syslog server. What was happening was that all the hosts were sending traffic to the server, but since it was just receiving syslog over UDP, that host rarely ever sent any traffic back out. So once the switches' forwarding tables expired its MAC, the switches no longer knew where it was and flooded its traffic out all ports. We just set up a cron job to send a ping every 10 minutes (or something like that; it was 13 years ago) to the host connected to the farthest switch.

So I guess it kind of depends on what traffic is going to/coming from zfs1. If it's like syslog, it may be the same as what I went through.

On Mon, Mar 22, 2010 at 11:14 PM, Ray Van Dolson <rvandol...@esri.com> wrote:
> [...] What's happening is, esx1/2 begin talking to zfs1. All is well for a while... but at some point, zfs1's MAC address expires from the CAM on the switch (I guess that is what is happening). [...]
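The cron-plus-ping workaround relies on the *reply*: the silent host answers the echo request, and every switch between the two hosts relearns its MAC. A minimal sketch, assuming a Unix-like `ping` that accepts `-c 1` (send one probe); the function names and host list are hypothetical, and the command runner is injectable so the logic can be exercised without a network.

```python
import subprocess
from typing import Callable, Dict, Sequence


def ping_once(host: str, run: Callable = subprocess.run) -> bool:
    """Send one echo request; True if the host replied (ping exit status 0)."""
    result = run(["ping", "-c", "1", host],
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def refresh_silent_hosts(hosts: Sequence[str],
                         run: Callable = subprocess.run) -> Dict[str, bool]:
    """Ping each receive-only host so its reply refreshes the MAC tables en route."""
    return {host: ping_once(host, run) for host in hosts}
```

Whatever schedules it (cron, in the message above) should fire more often than the shortest MAC aging time on the path. Assuming the host is otherwise silent and the aging time is 300 seconds, a 10-minute interval would still leave roughly five minutes of flooding in every cycle.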
Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?
> What's happening is, esx1/2 begin talking to zfs1. All is well for a while... but at some point, zfs1's MAC address expires from the CAM on the switch (I guess that is what is happening).

Great, this is a good step; however, you need valid data to back up your theory! Have you logged into the switch to verify that the MAC is expiring?

> At that point, the Cisco begins forwarding the unicast packets to all its ports. The result: linux1 and all other hosts see the packets. Occasionally, when we're dealing with a lot of traffic, this seriously impacts performance.

Have you done any packet captures? (Wireshark is your friend.)

> My question here is: what is the _right_ way to deal with this? This flooding can continue for many minutes at a time; it isn't until an ARP reply emanates from zfs1 that the CAM table is populated again and the flooding stops.

When did this start? Is this a new environment? What was changed in the network? Was anything added? Have you released a new application, or an update to an existing one? There are many questions to be asked as a first step.

You state that performance is impacted; it's very possible you have a broadcast storm (check the broadcast counters on the interfaces; what is the CPU utilization like on the switches?), a bad NIC on a server, and so on; there are many possibilities here. What makes you think that flooding is occurring to a degree that is causing performance issues?

IMHO, your first step is to check the status of all switches during the issue, and also to start capturing packets with Wireshark on the hosts and/or possibly SPAN a port on the Cisco/Dells.

Good luck,
E.B
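One way to act on the packet-capture suggestion: a host that receives unicast frames addressed to *other* MACs is seeing flooded traffic, which distinguishes unicast flooding from a broadcast storm. A minimal sketch, assuming you already have raw Ethernet frames as `bytes` (for example from a promiscuous-mode capture taken on linux1); `flooded_unicast` and `mac_str` are hypothetical helpers, not part of Wireshark or any library.

```python
from collections import Counter
from typing import Iterable


def mac_str(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def flooded_unicast(frames: Iterable[bytes], my_mac: str) -> Counter:
    """Count captured unicast frames whose destination MAC is not this host's."""
    mine = bytes.fromhex(my_mac.replace(":", ""))
    counts: Counter = Counter()
    for frame in frames:
        if len(frame) < 14:      # shorter than an Ethernet header; skip
            continue
        dst = frame[0:6]         # destination MAC is the first 6 bytes
        if dst[0] & 0x01:        # I/G bit set: broadcast/multicast, expected
            continue
        if dst != mine:
            counts[mac_str(dst)] += 1
    return counts
```

A steadily growing count for zfs1's MAC in a capture on linux1 would point at unicast flooding rather than a broadcast storm.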
Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?
On 3/22/2010 11:14 PM, Ray Van Dolson wrote:
> [...]
> I'll have to find out how the Ciscos are configured. I wouldn't be surprised if they're doing some Layer 3, though, as I know some VLAN routing is going on... The Dell switches both seem to have Routing Mode enabled as well (but proxy ARP disabled). [...]

In other multivendor LAN setups, we've noticed similar behavior and enjoyed some success by syncing the ARP timers. That's worth a look if you haven't already followed that line of investigation.
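The timer mismatch is easy to reason about numerically. If a router's ARP entry for a host outlives the switches' MAC entry and the host is otherwise silent, the router keeps sending unicast frames to a MAC the switches have forgotten, and those frames are flooded until the next ARP refresh makes the host reply. A minimal sketch of that idealized model; the values in the example (a 4-hour ARP timeout as on Cisco IOS, a 300-second MAC aging time as on many Catalysts) are assumptions to check against your own configuration, and the function names are hypothetical.

```python
def timers_mismatched(arp_timeout_s: float, mac_aging_s: float) -> bool:
    """True if an ARP entry can outlive the MAC table entry for the same host."""
    return arp_timeout_s > mac_aging_s


def flood_window_s(arp_timeout_s: float, mac_aging_s: float) -> float:
    """Idealized per-cycle flooding time for a host that only answers ARP.

    The host's ARP reply refreshes both tables at t=0; the MAC entry expires at
    mac_aging_s, and nothing relearns it until the next ARP refresh at arp_timeout_s.
    """
    return max(0.0, float(arp_timeout_s - mac_aging_s))


ARP_TIMEOUT_S = 4 * 60 * 60   # assumed: Cisco IOS default ARP timeout (4 hours)
MAC_AGING_S = 300             # assumed: common default MAC aging time (5 minutes)

print(timers_mismatched(ARP_TIMEOUT_S, MAC_AGING_S))  # True
print(flood_window_s(ARP_TIMEOUT_S, MAC_AGING_S))     # 14100.0 seconds (3 h 55 min)
```

Under this model, "syncing" means making the ARP timeout no longer than the MAC aging time (or the aging time no shorter than the ARP timeout), so the window drops to zero.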
Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?
On Mon, Mar 22, 2010 at 07:03:36PM -0700, Ray Van Dolson wrote:
> We have two Dell PowerConnect M6220 switches (A1 and B1). [...]
> My question here is: what is the _right_ way to deal with this? [...]

Well, I think I've nailed down the cause of this. Probably if I'd described things more completely, some of you would have pointed it out right away, but I was trying to keep the model simple.

zfs1 is multi-homed: two interfaces on the same subnet, running Solaris 10 with no special source-based routing set up.

I probably don't need to go any further, but suffice it to say that packets destined for one interface on zfs1 were going in just fine, while the replies were going out the other interface, with a different MAC address. So the switches eventually lose track of the receiving interface's MAC address, and we get the symptoms I described.

This can probably be corrected with ipfilter in Solaris, or by changing our infrastructure somewhat to handle this better.

Thanks to all who replied; it was good to learn about unicast storms!

Ray
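Ray's diagnosis can be checked with a tiny simulation. Clients address zfs1 by the MAC of the interface they resolved (called `if0` here), but with no source-based routing the replies leave through the other interface (`if1`), so the switches only ever refresh `if1`. A minimal sketch under idealized assumptions: a single switch, one request and one reply per second, a 300-second aging time, and one ARP reply from `if0` at t=0. All names are hypothetical.

```python
AGING_S = 300.0


def count_flooded(reply_src_mac: str, duration_s: int = 1200) -> int:
    """Count client->zfs1 frames flooded because the switch forgot zfs1's if0 MAC."""
    last_seen = {"zfs1-if0": 0.0}  # if0 was learned once, from its ARP reply at t=0
    flooded = 0
    for t in range(duration_s):
        # Client -> zfs1, addressed to the MAC it resolved via ARP: if0.
        seen = last_seen.get("zfs1-if0")
        if seen is None or t - seen > AGING_S:
            flooded += 1
        # zfs1 -> client; the switch learns whichever source MAC the reply carries.
        last_seen[reply_src_mac] = float(t)
    return flooded


print(count_flooded("zfs1-if0"))  # 0: symmetric, each reply refreshes if0
print(count_flooded("zfs1-if1"))  # 899: asymmetric, if0 ages out after 300 s
```

Anything that makes zfs1 source a frame from `if0` again (such as an ARP reply) resets that clock, which matches the original symptom of the flooding stopping only when zfs1 answers ARP.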