https://bugzilla.wikimedia.org/show_bug.cgi?id=30086
Maarten Dammers <maar...@mdammers.nl> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |FIXED

--- Comment #44 from Maarten Dammers <maar...@mdammers.nl> 2011-08-13 10:03:01 UTC ---
Over the last couple of days, several people went through the whole cluster trying to pinpoint the bottleneck. Squid was ruled out, as were ms7 and NFS. It ended up being a low-level problem:

[11:57] mark it was a nasty problem with TSO/GRO being broken with linux 802.1q tagged interfaces
[11:57] multichill So really low level problem?
[11:58] mark yeah
[11:58] mark so, the nic on lvs4 was reassembling tcp packets into jumbo packets before presenting them to the OS
[11:58] mark after which LVS would forward them
[11:58] mark and then they wouldn't be split back up again by the nic after sending out
[11:58] multichill And fragmentation?
[11:58] mark and dropped as jumbo packets
[11:58] mark so, tcp delays, icmp "frag needed" messages being sent
[11:58] mark really hard to see because on the wire, they were < 1500 byte packages as usual
[12:00] mark the fix was disabling GRO on all lvs servers
[12:00] mark no idea why it was on by default anyway, on most servers it isn't
[12:00] mark probably some nic drivers enable it, most don't
[12:01] mark i bet TSO wasn't happening because of the added 802.1q vlan tag

Thanks, everyone, for debugging this problem. I confirmed on Commons that uploads are fast again (a 17 MB file uploaded in less than 10 seconds). Closing this bug as resolved.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
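The failure mode mark describes in the IRC log can be illustrated with a toy Python model. This is a simplified simulation built on assumptions, not the Linux kernel's GRO/LVS code path. It assumes a 1500-byte MTU, 40 bytes of IPv4+TCP headers (no options), a GRO limit of about 64 KB, and that every packet has Don't Fragment set, as TCP path-MTU discovery normally does. The 802.1q VLAN tag is not modeled. `Packet`, `gro_coalesce`, and `forward` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Assumed parameters for this toy model (not taken from the bug report).
MTU = 1500                  # maximum IP packet size on the outgoing link
IP_TCP_HEADERS = 40         # IPv4 + TCP headers, no options
MSS = MTU - IP_TCP_HEADERS  # 1460 payload bytes per on-the-wire segment
GRO_MAX = 65535             # largest IPv4 total length a merged packet may have


@dataclass
class Packet:
    payload: int  # TCP payload bytes; Don't Fragment is assumed set

    @property
    def size(self):
        return self.payload + IP_TCP_HEADERS


def gro_coalesce(segments):
    """Simplified GRO: merge consecutive segments of one flow into large packets."""
    merged, current = [], 0
    for seg in segments:
        if current and current + seg.payload + IP_TCP_HEADERS > GRO_MAX:
            merged.append(Packet(current))
            current = 0
        current += seg.payload
    if current:
        merged.append(Packet(current))
    return merged


def forward(packets, resegment):
    """Send packets out an MTU-limited link.

    Returns (packets_sent, icmp_frag_needed_count).
    """
    sent, icmp_frag_needed = [], 0
    for p in packets:
        if p.size <= MTU:
            sent.append(p)
        elif resegment:
            # What TSO/GSO is supposed to do: split back into MSS-sized segments.
            remaining = p.payload
            while remaining:
                chunk = min(MSS, remaining)
                sent.append(Packet(chunk))
                remaining -= chunk
        else:
            # Oversized with DF set: dropped, and an ICMP "fragmentation needed"
            # (type 3, code 4) message goes back to the sender.
            icmp_frag_needed += 1
    return sent, icmp_frag_needed


if __name__ == "__main__":
    segments = [Packet(MSS) for _ in range(20)]  # 20 normal 1500-byte packets
    merged = gro_coalesce(segments)
    print([p.size for p in merged])                 # [29240]
    print(forward(merged, resegment=False)[1])      # 1 (dropped)
    print(len(forward(merged, resegment=True)[0]))  # 20
```

Every segment on the wire is at most 1500 bytes, which matches mark's remark that the problem was hard to see. Even so, the coalesced packet is dropped unless something splits it back up. Disabling GRO corresponds to skipping `gro_coalesce`, so the original segments pass through unchanged. On Linux, GRO is normally toggled per interface with `ethtool -K <interface> gro off`. The comment does not say exactly how the change was applied on the LVS servers.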