Andrew,

Thank you for the excellent write-up. There was some really good Sherlock Holmes stuff going on here:

1. The problem hit multiple machines at once; this eliminates certain local issues.
2. All network links were good except the one we can't test easily; smells like that's the problem.
3. A hash of the 5-tuple is used for link aggregation; something "every network engineer knows" but that is non-obvious elsewhere... and, evidently, to the network people at times.
4. Weird error messages that hint at what it is, but not really.
5. Fixing a minor problem that shouldn't be the cause fixes the larger problem; 0.4% packet loss is really bad for TCP.
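For anyone on the list who is not a network person, here is a minimal sketch of the 5-tuple idea from point 3. It is only an illustration: real switches use vendor-specific hash functions (often XOR or CRC based), and the `pick_link` helper, the SHA-256 hash, the addresses, and the four-link bundle below are all assumptions of mine, not details from Andrew's write-up. The point it shows is that each flow is pinned to one member link, so a single bad link hurts only the flows that happen to hash onto it.

```python
import hashlib
from collections import Counter


def pick_link(src_ip, dst_ip, src_port, dst_port, proto, n_links):
    """Map a flow's 5-tuple to one member link of an aggregate.

    Hypothetical helper: SHA-256 stands in for whatever hash the
    switch actually uses.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    # Reduce the first 4 bytes of the digest to a link index.
    return int.from_bytes(digest[:4], "big") % n_links


# 1000 made-up flows between two hosts, differing only in source port.
flows = [("10.0.0.1", "10.0.0.2", 40000 + i, 443, "tcp") for i in range(1000)]

# Count how many flows land on each of 4 assumed member links.
usage = Counter(pick_link(*flow, n_links=4) for flow in flows)

# Suppose link 2 is the flaky one: only these flows see packet loss.
affected = [flow for flow in flows if pick_link(*flow, n_links=4) == 2]

print(dict(usage))
print(f"{len(affected)} of {len(flows)} flows cross the bad link")
```

Because the mapping is deterministic, retrying the same connection parameters lands on the same link every time, while a new connection with a different source port may or may not. That is one reason this kind of fault looks intermittent from the host side.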
Good story! Sorry it had to happen to you! Thanks for sharing it so we can all learn!

Tom
_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/