On Sat, Jun 19, 2010 at 02:16:42PM -0700, cd34 wrote:
> I did a quick run of ab, several requests came in at 3.1 seconds. I
> don't know what sort of network they are running, but 3 seconds seems
> very similar to an arp table rebuild in certain switches. When enough
> machines are put on a network, it overflows the 8192 record tcam (or
> whatever the limit is on that switch), which causes it to need to
> rebuild the arp tree as it rediscovers. Until each machine is
> rediscovered, they are basically 'off-net'.

Such ARP table exhaustion would typically be steady-state in a
datacenter environment - a complete and total meltdown, never
recovering. 3 seconds wouldn't be nearly enough time for such a thing
to wring out. Also, TCAM isn't used for ARP entries - lots of things
are stored in TCAM, but ARP isn't one of them.

> I'd file a trouble ticket with them and let them know that you are
> seeing packet loss into their network and that you are seeing timeouts
> on requests going to your server and see what they say. I don't think
> your issue is software related - I agree that it looks like it is
> network related.

If you opt to contact their support department, provide a traceroute
from your problematic location to the destination that shows either
failing traceroutes or extended delays at the LAST HOP (the
intermediate hops don't matter). MTR is a nice program for generating
reports of this, but some NOCs want classic traceroute output.

I don't see any evidence of a network issue, unless you were able to
measure the 6% packet loss only at the time of HTTP impact. I'm not
seeing any packet loss to 74.213.166.67 from my location, but it
appears that the HTTP service is stopped now.

Are you hosting on a virtualized platform? I'd be suspicious of
platform performance issues.

Ross

-- 
Ross Vandegrift
[email protected]

"If the fight gets hot, the songs get hotter. If the going gets tough,
the songs get tougher."
	--Woody Guthrie
