Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?

2010-04-21 Thread Chris Woodfield
Replying to an old thread...

I'm seeing a very similar situation caused not by ZFS but by a dual-switch 
model resulting in one switch never seeing the frames that come in over the 
other since their least-cost routing hop is on the same switch. We've tuned our 
CAM and ARP timeouts to prevent this normally, but spanning-tree events/TCNs 
put all of those CAM entries into a fast-aging queue, which results in traffic 
to each host flooding until the ARP entry times out. Clearing the ARP table 
manually is a fix, but not exactly without its own impact.
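
The fast-aging effect can be illustrated with a small Python sketch. It assumes classic 802.1D behaviour, where a topology change shortens the MAC aging time to the forward-delay value. The 300 s and 15 s figures are common defaults used here as assumptions, not values read from these switches, and the class and method names are hypothetical.

```python
class MacTable:
    """Toy CAM table with a configurable aging time (names are illustrative)."""

    def __init__(self, aging=300.0, forward_delay=15.0):
        self.normal_aging = aging            # assumed normal aging time, seconds
        self.forward_delay = forward_delay   # assumed 802.1D forward delay, seconds
        self.aging = aging
        self.last_seen = {}                  # MAC -> time a frame was last seen from it

    def learn(self, mac, now):
        self.last_seen[mac] = now

    def topology_change(self):
        # Assumption: a TCN shortens the aging time to the forward delay.
        self.aging = self.forward_delay

    def topology_stable(self):
        self.aging = self.normal_aging

    def knows(self, mac, now):
        seen = self.last_seen.get(mac)
        return seen is not None and (now - seen) <= self.aging


table = MacTable()
table.learn("00:00:00:00:00:aa", now=0.0)

# The host has been silent for 60 s: still known with normal aging.
before_tcn = table.knows("00:00:00:00:00:aa", now=60.0)

# After a TCN the same entry counts as expired, so traffic to it floods.
table.topology_change()
after_tcn = table.knows("00:00:00:00:00:aa", now=60.0)

print(before_tcn, after_tcn)
```

In this model, any host that has not sent a frame within the shortened aging window is forgotten at once, which is consistent with the flooding lasting until something makes the host transmit again.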

However, while researching the issue I found this paragraph in Cisco's docs:

Note: In MSFC IOS, there is an optimization that will trigger VLAN interfaces 
to repopulate their ARP tables when there is a TCN in the respective VLAN. This 
limits flooding in case of TCNs, as there will be an ARP broadcast and the host 
MAC address will be relearned as the hosts reply to ARP.

http://www.cisco.com/en/US/products/hw/switches/ps700/products_tech_note09186a00801d0808.shtml#cause2

Given that the switches in question are Cat6Ks running SX code, any reason the 
above might either not be working or not helping us even if it is? Is there a 
command needed to enable this optimization?

Thanks,

-C

On Mar 23, 2010, at 4:12 PM, Gert Doering wrote:

 Hi,
 
 On Mon, Mar 22, 2010 at 07:03:36PM -0700, Ray Van Dolson wrote:
 What's happening is, esx1/2 begin talking to zfs1.  All is well for a
 while... but at some point, zfs1's MAC address expires from the CAM on
 the switch (I guess that is what is happening).
 
 If zfs is only receiving packets, yes, that's likely to happen.
 
 What we do is easy: install something like rwhod that broadcasts a 
 single packet every minute.  Make sure all CAM tables are always up
 to date.
 
 gert
 -- 
 USENET is *not* the non-clickable part of WWW!
   //www.muc.de/~gert/
 Gert Doering - Munich, Germany g...@greenie.muc.de
 fax: +49-89-35655025  g...@net.informatik.tu-muenchen.de
 ___
 cisco-nsp mailing list  cisco-nsp@puck.nether.net
 https://puck.nether.net/mailman/listinfo/cisco-nsp
 archive at http://puck.nether.net/pipermail/cisco-nsp/



Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?

2010-04-21 Thread Gert Doering
Hi,

On Wed, Apr 21, 2010 at 10:05:29AM -0400, Chris Woodfield wrote:
 However, while researching the issue I found this paragraph in Cisco's docs:
 
 Note: In MSFC IOS, there is an optimization that will trigger
 VLAN interfaces to repopulate their ARP tables when there is a TCN
 in the respective VLAN. This limits flooding in case of TCNs, as
 there will be an ARP broadcast and the host MAC address will be
 relearned as the hosts reply to ARP.

if there is a TCN.

TCN = Topology Change Notification, so unless a port is causing a
spanning-tree event, there won't be any TCNs - no rebroadcasting.

You don't want gratuitous TCNs :-)

gert
-- 
USENET is *not* the non-clickable part of WWW!
   //www.muc.de/~gert/
Gert Doering - Munich, Germany g...@greenie.muc.de
fax: +49-89-35655025  g...@net.informatik.tu-muenchen.de



Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?

2010-04-21 Thread Chris Woodfield
You're right, we don't, but they're not *completely* unavoidable... :)

-C





Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?

2010-03-23 Thread Gert Doering
Hi,

On Mon, Mar 22, 2010 at 07:03:36PM -0700, Ray Van Dolson wrote:
 What's happening is, esx1/2 begin talking to zfs1.  All is well for a
 while... but at some point, zfs1's MAC address expires from the CAM on
 the switch (I guess that is what is happening).

If zfs is only receiving packets, yes, that's likely to happen.

What we do is easy: install something like rwhod that broadcasts a 
single packet every minute.  Make sure all CAM tables are always up
to date.
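
A minimal Python sketch of the same idea (this is not rwhod itself): have the quiet host send one small broadcast datagram per minute so that every switch keeps seeing its source MAC. The destination port (9, discard), the payload, and the function names are assumptions chosen for illustration.

```python
import socket
import time


def make_broadcast_socket():
    """Create a UDP socket that is allowed to send to a broadcast address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock


def send_keepalive(sock, address, payload=b"cam-keepalive"):
    """Send one datagram; any frame sourced by this host refreshes its CAM entry."""
    return sock.sendto(payload, address)


def run(address=("255.255.255.255", 9), interval=60.0, count=None):
    """Send a keepalive every `interval` seconds (forever when count is None)."""
    sock = make_broadcast_socket()
    sent = 0
    try:
        while count is None or sent < count:
            send_keepalive(sock, address)
            sent += 1
            if count is None or sent < count:
                time.sleep(interval)
    finally:
        sock.close()
    return sent
```

Calling run() on the quiet host (from cron, an init script, or similar) would keep it sending; the interval only needs to be shorter than the shortest MAC aging time on the path.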

gert
-- 
USENET is *not* the non-clickable part of WWW!
   //www.muc.de/~gert/
Gert Doering - Munich, Germany g...@greenie.muc.de
fax: +49-89-35655025  g...@net.informatik.tu-muenchen.de



[c-nsp] Unicast traffic being sent to every port? Aging issue?

2010-03-22 Thread Ray Van Dolson
We have two Dell PowerConnect M6220 switches (A1 and B1).  They are not
cross-connected, but both have uplinks to the same subnet:

          zfs1
         /
     +----+
     | A1 |-----|
     +----+     +-------+
                | Cisco |--- linux1
     +----+     +-------+
     | B1 |-----|
     +----+
      / \
   esx1 esx2

There's a host hanging off of A1 (zfs1) and several ESX hosts hanging
off of B1 (esx1, esx2, etc).  There's a host linux1 hanging off the
Cisco as well (actually many hosts, but for the sake of description

What's happening is, esx1/2 begin talking to zfs1.  All is well for a
while... but at some point, zfs1's MAC address expires from the CAM on
the switch (I guess that is what is happening).

At that point, the Cisco begins forwarding the unicast packets to all
its ports.  The result -- linux1, and all other hosts see the packets.
Occasionally, when we're dealing with a lot of traffic, this seriously
impacts performance.

My question here is.. what is the _right_ way to deal with this?  This
flooding can continue for many minutes at a time.. it isn't until an
ARP reply emanates from zfs1 that the CAM table is populated again and
the broadcasting stops.

I wonder if zfs1 would send back an ARP response quicker were it not
behind an additional switch (the PowerConnect)... 

Thanks,
Ray


Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?

2010-03-22 Thread Jay Hennigan
On 3/22/10 7:03 PM, Ray Van Dolson wrote:
 We have two Dell PowerConnect M6220 switches (A1 and B1).  They are not
 cross-connected, but both have uplinks to the same subnet:
 
           zfs1
          /
      +----+
      | A1 |-----|
      +----+     +-------+
                 | Cisco |--- linux1
      +----+     +-------+
      | B1 |-----|
      +----+
       / \
    esx1 esx2
 
 There's a host hanging off of A1 (zfs1) and several ESX hosts hanging
 off of B1 (esx1, esx2, etc).  There's a host linux1 hanging off the
 Cisco as well (actually many hosts, but for the sake of description
 
 What's happening is, esx1/2 begin talking to zfs1.  All is well for a
 while... but at some point, zfs1's MAC address expires from the CAM on
 the switch (I guess that is what is happening).
 
 At that point, the Cisco begins forwarding the unicast packets to all
 its ports.  The result -- linux1, and all other hosts see the packets.
 Occasionally, when we're dealing with a lot of traffic, this seriously
 impacts performance.

Is the Cisco a router or a layer 2 switch?  All hosts in the same IP
subnet?  Subnet masks all match?  Nothing doing proxy-arp?

 My question here is.. what is the _right_ way to deal with this?  This
 flooding can continue for many minutes at a time.. it isn't until an
 ARP reply emanates from zfs1 that the CAM table is populated again and
 the broadcasting stops.

If these are layer 2 switches, ARP won't have anything to do with it.

If zfs1's MAC expires from the MAC address table on the cisco, it will
flood the next packet for that MAC.  A1 will forward it to zfs1 or flood
if it too has expired the MAC.

When zfs1 replies, A1 forwards the reply to the cisco.  At that point,
the cisco should re-install the MAC into its address table and the
flooding cease.

This should happen with a single packet.
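
The sequence above can be sketched as a toy learning switch in Python. Port names and MAC labels are made up for illustration; real switches also age entries on a timer, which is modelled here by an explicit expire() call.

```python
class LearningSwitch:
    """Toy layer 2 switch: learn source MACs, flood unknown destinations."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC -> port it was last seen on

    def receive(self, in_port, src, dst):
        """Process one frame and return the sorted list of egress ports."""
        self.mac_table[src] = in_port                # learn or refresh the source
        out_port = self.mac_table.get(dst)
        if out_port is None:
            return sorted(self.ports - {in_port})    # unknown unicast: flood
        if out_port == in_port:
            return []                                # destination is behind the ingress port
        return [out_port]

    def expire(self, mac):
        """Stand-in for the aging timer removing an entry."""
        self.mac_table.pop(mac, None)


cisco = LearningSwitch(["A1", "B1", "linux1"])
cisco.receive("A1", src="zfs1", dst="esx1")             # zfs1 learned behind A1
known = cisco.receive("B1", src="esx1", dst="zfs1")     # forwarded to A1 only

cisco.expire("zfs1")                                    # zfs1 ages out
flooded = cisco.receive("B1", src="esx1", dst="zfs1")   # flooded to A1 and linux1

reply = cisco.receive("A1", src="zfs1", dst="esx1")     # one reply relearns zfs1
after = cisco.receive("B1", src="esx1", dst="zfs1")     # flooding has stopped

print(known, flooded, reply, after)
```

If flooding persists for minutes rather than one packet, the model suggests the switch is not seeing any frame with the expected source MAC on the expected port.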

Does this happen with any other hosts behind A1?  Any interface errors
on any of the devices?

 I wonder if zfs1 would send back an ARP response quicker were it not
 behind an additional switch (the PowerConnect)... 

If layer 2 switches, ARP doesn't have anything to do with it.

--
Jay Hennigan - CCIE #7880 - Network Engineering - j...@impulse.net
Impulse Internet Service  -  http://www.impulse.net/
Your local telephone and internet company - 805 884-6323 - WB6RDV


Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?

2010-03-22 Thread Ray Van Dolson
I'll have to find out how the Ciscos are configured.  I wouldn't be
surprised if they're doing some Layer 3 though as I know some VLAN
routing is going on...

The Dell switches both seem to have Routing Mode enabled as well (but
proxy arp disabled).

There currently aren't any other hosts behind A1, but that would be a
good test.  No interface errors currently.

Firmware is old on A1, so at this point I'm a little suspicious it's to
blame.

Just wanted to try and wrap my head around this first.

Thanks,
Ray


Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?

2010-03-22 Thread Jay Nakamura
Long ago, I had this problem but the zfs1 in this case was a syslog
server.  What was happening was, all the hosts were sending traffic to
the server but since it was just receiving syslog/UDP, that host
rarely ever sent any traffic back out.  So switches didn't know where
it was once the forwarding table expired the MAC and flooded all
ports.  We just set up a cron job every 10 minutes (or something.  It
was 13 years ago.) to send out a ping to the host connected to the
farthest switch.  So, I guess it kind of depends on what traffic is
going/coming from zfs1.  If it's like syslog, it may be the same as
what I went through.





Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?

2010-03-22 Thread evil bit
 What's happening is, esx1/2 begin talking to zfs1.  All is well for a
 while... but at some point, zfs1's MAC address expires from the CAM on
 the switch (I guess that is what is happening).

Great, this is a good step; however, you need valid data to back up
your theory! Have you logged into the switch to verify that the MAC is
expiring?

 At that point, the Cisco begins forwarding the unicast packets to all
 its ports.  The result -- linux1, and all other hosts see the packets.
 Occasionally, when we're dealing with a lot of traffic, this seriously
 impacts performance.

Have you conducted any packet captures? (Wireshark is your friend.)

 My question here is.. what is the _right_ way to deal with this?  This
 flooding can continue for many minutes at a time.. it isn't until an
 ARP reply emanates from zfs1 that the CAM table is populated again and
 the broadcasting stops.

When did this start? Is this a new environment? What was changed in the
network? Was anything added? Have you released a new application or an
update to the application? There are many questions to be asked as a
first step. You state that performance is impacted; it is very possible
you have a broadcast storm (check the broadcast counters on the
interfaces; what is the CPU utilization like on the switches?), a bad
NIC on a server, or something else; there are many possibilities here.
What makes you think that flooding is occurring to a point that is
causing performance issues?

IMHO, your first step is to check the status of all switches during the
issue and also start capturing packets with Wireshark on the hosts
and/or possibly SPAN a port on the Cisco/Dells.
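
Once a capture exists, one rough way to test the flooding theory is to count unicast frames that arrive at a host but are addressed to someone else. The Python sketch below assumes the destination MACs have already been exported from the capture into a list of dicts; that input format and the function names are hypothetical. It relies only on the fact that a MAC address is unicast when the least significant bit of its first octet is 0.

```python
def is_unicast(mac):
    """True when the least significant bit of the first octet is 0."""
    first_octet = int(mac.split(":")[0], 16)
    return first_octet & 1 == 0


def flooded_unicast(frames, local_macs):
    """Return frames with a unicast destination that is not one of our own MACs."""
    local = {m.lower() for m in local_macs}
    return [
        f for f in frames
        if is_unicast(f["dst"]) and f["dst"].lower() not in local
    ]


# Hypothetical frames as seen on linux1 (only the destination MAC is used).
frames = [
    {"dst": "ff:ff:ff:ff:ff:ff"},   # broadcast: expected everywhere
    {"dst": "01:00:5e:00:00:01"},   # multicast: expected
    {"dst": "00:11:22:33:44:55"},   # our own MAC: expected
    {"dst": "00:aa:bb:cc:dd:ee"},   # someone else's unicast: flooded
]
suspects = flooded_unicast(frames, local_macs=["00:11:22:33:44:55"])
print(len(suspects), suspects[0]["dst"])
```

A steady stream of such frames on a switched port, all for the same destination MAC, would support the expired-CAM-entry theory; a handful of stray frames would not.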

Good Luck
E.B


Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?

2010-03-22 Thread Kevin Cullimore

In other multivendor LAN setups, we've noticed similar behavior and 
enjoyed some success by syncing the ARP timers. That's worth a look if 
you haven't already followed that line of investigation.
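
A rough way to reason about the timers, as a Python sketch: if the ARP timeout on the routing device is longer than the MAC aging time on the switches, there is a window in which the router still has an ARP entry for a silent host while the switches have forgotten its MAC. The values 14400 s (ARP) and 300 s (MAC aging) are commonly cited Cisco IOS defaults and are used here as assumptions; the function name is hypothetical.

```python
def flood_risk_window(arp_timeout, mac_aging):
    """Worst-case seconds during which traffic to a silent host may be flooded.

    Assumes the host sends nothing on its own and that the ARP refresh is
    the only event that makes it transmit again.
    """
    return max(0, arp_timeout - mac_aging)


# Assumed defaults: ARP 4 hours, MAC aging 5 minutes.
default_window = flood_risk_window(arp_timeout=14400, mac_aging=300)

# Timers brought in line: ARP refresh happens before the MAC entry ages out.
synced_window = flood_risk_window(arp_timeout=240, mac_aging=300)

print(default_window, synced_window)
```

This is only a worst-case bound for a host that never transmits; any frame the host sends in the meantime refreshes the MAC entry and closes the window early.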



Re: [c-nsp] Unicast traffic being sent to every port? Aging issue?

2010-03-22 Thread Ray Van Dolson
Well, I think I've nailed down the cause for this.

Probably if I'd more completely described things some of you woulda
pointed it out right away, but I was trying to keep the model
simplistic.

zfs1 is multi-homed.  Two interfaces on the same subnet.  Running
Solaris 10 with no special source-based routing set up.

I probably don't need to go any further, but, suffice it to say,
packets destined for one interface on zfs1 were going in just fine,
but the replies were going out the other interface -- with a different
MAC address.

So obviously the switches eventually lose track of the real MAC
address and we get the symptoms I described.
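
A minimal Python sketch of why this produces long-lasting flooding: the switch refreshes only the source MAC it actually sees, so if replies leave with a different MAC, the entry for the inbound MAC is never renewed. The MAC labels, the 300 s aging time, and the 60 s traffic interval are assumptions for illustration.

```python
IN_MAC = "zfs1-nic0"    # MAC the clients send to (hypothetical label)
OUT_MAC = "zfs1-nic1"   # MAC the replies are sourced from (hypothetical label)


def simulate(reply_mac, aging=300, duration=900, step=60):
    """Return the times (seconds) at which frames to IN_MAC get flooded."""
    last_seen = {IN_MAC: 0}   # IN_MAC was learned once at t=0
    flooded_at = []
    for now in range(step, duration + 1, step):
        # A client sends to IN_MAC: flood if the entry is missing or aged out.
        seen = last_seen.get(IN_MAC)
        if seen is None or now - seen > aging:
            flooded_at.append(now)
        # The host replies; the switch learns only the reply's source MAC.
        last_seen[reply_mac] = now
    return flooded_at


symmetric = simulate(reply_mac=IN_MAC)     # replies leave via the same NIC
asymmetric = simulate(reply_mac=OUT_MAC)   # replies leave via the other NIC

print(symmetric)
print(asymmetric)
```

With symmetric replies the entry is refreshed on every exchange and nothing floods. With asymmetric replies, flooding starts once the initial entry ages out and continues until something makes the host transmit from the inbound interface.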

Probably can be corrected with ipfilter in Solaris or changing our
infrastructure somewhat to handle this better.

Thanks all who replied -- it was good to learn about unicast storms!

Ray