Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps
Thanks for the pointer to those guidelines, very helpful. I also found the information at http://ace-host.stuart.id.au/russell/files/tc/doc/cls_u32.txt very useful, although quite outdated. For reference, the below is what I ended up with:

ETH="eth0"
EST="est 1sec 4sec"
BUCKETS=64
RATE="100Mbit"

tc qd del dev $ETH root 2>/dev/null

tc qdisc add dev $ETH root handle 8000: $EST htb r2q 1000 default 0

tc filter add dev $ETH parent 8000: protocol ip u32
tc filter add dev $ETH parent 8000: handle 2: protocol ip u32 divisor ${BUCKETS}

for i in $( seq 1 $BUCKETS ); do
    BUCKET=$( printf %x $((i)) )
    BUCKETM1=$( printf %x $((i-1)) )
    tc class add dev $ETH parent 8000: classid 8000:${BUCKET} ${EST} htb rate ${RATE}
    tc qdisc add dev $ETH parent 8000:${BUCKET} handle ${BUCKET}: $EST fq_codel
    tc filter add dev $ETH protocol ip parent 8000: u32 ht 2:${BUCKETM1}: \
        match u32 0 0 flowid 8000:${BUCKET}
done

tc filter add dev $ETH protocol ip parent 8000: u32 ht 800:: \
    match ip protocol 6 0xff \
    match ip sport 8081 0xffff \
    hashkey mask 0x at 20 \
    link 2:

It seems to work pretty well in testing so far, but it certainly has some flaws. The biggest is that hashing is now based solely on the destination port (ephemeral in our case, so not too bad), and it assumes that the IP header has no options. Also, the filter lists inside the 2: hash table still have to have a match run against them to direct packets to a classid, which seems a waste, but I can't imagine the always-match rule of "u32 0 0" adds much overhead.

Thanks again,

Sam

On 10 July 2013 14:34, Eric Dumazet wrote:
> [snip - full text quoted below]
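For anyone adapting the script above: the loop links the 64 hash-table slots (hex 0-3f) to classids 8000:1 through 8000:40, and because the commands are plain tc invocations, the generated filters can be dry-run by echoing them instead of executing them. A sketch (eth0 is illustrative):

```shell
#!/bin/sh
ETH=eth0
BUCKETS=64

# Emit the per-slot always-match filters, as in the loop above.
gen_filters() {
    for i in $( seq 1 "$BUCKETS" ); do
        BUCKET=$( printf %x "$i" )
        BUCKETM1=$( printf %x $(( i - 1 )) )
        echo "filter add dev $ETH protocol ip parent 8000: u32 ht 2:$BUCKETM1: match u32 0 0 flowid 8000:$BUCKET"
    done
}

gen_filters | head -n 1   # slot 2:0: links to classid 8000:1
gen_filters | tail -n 1   # slot 2:3f: links to classid 8000:40
```

Piping the output through `tc -b` (as Eric does below) would then install all 64 filters in one batch.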
http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk___ E1000-devel mailing list E1000-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/e1000-devel To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired
Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps
On Wed, 2013-07-10 at 11:53 +0100, Sam Crawford wrote:
> Thanks Eric! I've adapted this to the following:
>
> [snip - script quoted in full below]
>
> tc filter add dev $ETH parent 8000: handle 7000 protocol ip flow hash
> keys src,dst,proto-src,proto-dst baseclass 8000:1 divisor ${BUCKETS}

If dst port is always 8081, you can omit proto-dst from the hash keys.

> This seems to deliver the behaviour I'm looking for - each flow is
> effectively rate limited to 100Mbit/s, and multiple flows between the
> same src and dst can achieve this rate (unless they're unlucky and
> fall into the same bucket, which is not too bad). Have I made any
> silly mistakes in there?
>
> The final thing I'm struggling to work out is how to limit this rule
> to a single service. Ideally the rate limiting should only apply to
> TCP/8081. It seems you cannot combine different filter types (i.e. the
> u32 match + the flow match). Any suggestions?

Please follow the guidelines in

http://www.tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.adv-filter.hashing.html

because you need a more specialized setup than a single 'flow hash'.

> Apologies for going OT on the list... I hope this topic is useful to
> others Googling for it in the future.

No problem ;)
Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps
Thanks Eric! I've adapted this to the following:

ETH="eth1"
EST="est 1sec 4sec"
BUCKETS=64
RATE="100Mbit"

tc qd del dev $ETH root 2>/dev/null

tc qdisc add dev $ETH root handle 8000: $EST htb r2q 1000 default 8000

for i in $( seq 1 $BUCKETS ); do
    BUCKET=$( printf %x $((i)) )
    tc class add dev $ETH parent 8000: classid 8000:${BUCKET} ${EST} htb rate ${RATE}
    tc qdisc add dev $ETH parent 8000:${BUCKET} handle ${BUCKET}: $EST fq_codel
done

tc filter add dev $ETH parent 8000: handle 7000 protocol ip flow hash \
    keys src,dst,proto-src,proto-dst baseclass 8000:1 divisor ${BUCKETS}

This seems to deliver the behaviour I'm looking for - each flow is effectively rate limited to 100Mbit/s, and multiple flows between the same src and dst can achieve this rate (unless they're unlucky and fall into the same bucket, which is not too bad). Have I made any silly mistakes in there?

The final thing I'm struggling to work out is how to limit this rule to a single service. Ideally the rate limiting should only apply to TCP/8081. It seems you cannot combine different filter types (i.e. the u32 match + the flow match). Any suggestions?

Apologies for going OT on the list... I hope this topic is useful to others Googling for it in the future.

Thanks,

Sam

On 9 July 2013 18:13, Eric Dumazet wrote:
> [snip - full text quoted below]
Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps
On Tue, 2013-07-09 at 16:58 +0100, Sam Crawford wrote:
> Thanks very much! One quick kernel upgrade later (to add fq_codel
> support) and that has definitely helped. I'll run a larger set of
> tests and report back.
>
> One final question: I understand that this applies a 100Mbit aggregate
> shaper to the specified destination(s). I'd like to instead apply this
> shaper on a per-destination or per-flow basis, but without specifying
> each individual destination (i.e. so that 10x 100M clients could still
> saturate the 1G link). Do you know if this is possible?

If you are interested by a qdisc setup, you could adapt the following to your needs. It uses hashing, so you could potentially have two flows sharing a single bucket.

#!/bin/bash

ETH=eth0

setup_htb() {
    FROM=$1
    TO=$2
    RATE=$3

    for i in $( seq $FROM $TO ); do
        slot=$( printf %x $((i)) )

        echo class add dev $ETH parent 8000: classid 8000:$slot htb rate ${RATE}
        echo qdisc add dev $ETH parent 8000:$slot handle $slot: codel
    done
}

tc qdisc del dev $ETH root 2>/dev/null

(
    echo qdisc add dev $ETH root handle 8000: est 1sec 4sec htb r2q 100 default 1
    setup_htb 1 1024 100Mbit
    echo filter add dev $ETH parent 8000: handle 2 pref 20 flow hash keys src,dst,proto-src,proto-dst baseclass 8000:1 divisor 1024
) | tc -b
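Since the script above builds its commands with echo and pipes them into `tc -b`, the batch can be inspected without root (or even without tc installed) by simply dropping the pipe. A sketch with a smaller bucket count for illustration:

```shell
#!/bin/sh
ETH=eth0

# Same shape as Eric's setup_htb: one class + one qdisc per bucket.
setup_htb() {
    FROM=$1; TO=$2; RATE=$3
    for i in $( seq "$FROM" "$TO" ); do
        slot=$( printf %x "$i" )
        echo "class add dev $ETH parent 8000: classid 8000:$slot htb rate $RATE"
        echo "qdisc add dev $ETH parent 8000:$slot handle $slot: codel"
    done
}

# Review the batch before feeding it to `tc -b`:
setup_htb 1 4 100Mbit   # two commands per bucket
```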
Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps
Thanks very much! One quick kernel upgrade later (to add fq_codel support) and that has definitely helped. I'll run a larger set of tests and report back.

One final question: I understand that this applies a 100Mbit aggregate shaper to the specified destination(s). I'd like to instead apply this shaper on a per-destination or per-flow basis, but without specifying each individual destination (i.e. so that 10x 100M clients could still saturate the 1G link). Do you know if this is possible?

Thanks,

Sam

On 9 July 2013 16:27, Eric Dumazet wrote:
> [snip - full text quoted below]
Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps
On Tue, 2013-07-09 at 15:53 +0100, Sam Crawford wrote:
> I've tried dropping the qlen down (even to zero), but to no effect. Is
> this still expected? I've got total control over both client and
> server components, so can change pretty much anything to improve
> matters.

As the bottleneck is not at the sender, you have no queueing on it, so qlen has absolutely no effect.

That's why you should create the bottleneck on the sender, to properly control it.

This script would be a start:

ETH=eth0
EST="est 1sec 4sec"
REMOTE_WAN_NET=192.168.9/24

tc qd del dev $ETH root 2>/dev/null

tc qdisc add dev $ETH root handle 1: $EST htb r2q 1000 default 2
tc class add dev $ETH parent 1: classid 1:1 $EST htb rate 100Mbit

tc qdisc add dev $ETH parent 1:1 handle 10: $EST fq_codel

tc filter add dev $ETH parent 1: protocol ip u32 \
    match ip dst $REMOTE_WAN_NET flowid 1:1
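One note on the script above: HTB derives each class's quantum from rate/r2q (in bytes per second terms) and logs a warning when the result looks too small or too large, which is presumably why a non-default r2q appears here. A quick sketch of the arithmetic (the exact warning thresholds are an assumption):

```shell
#!/bin/sh
RATE_BPS=100000000              # htb rate 100Mbit
RATE_BYTES=$(( RATE_BPS / 8 ))  # 12,500,000 bytes/s

# quantum = rate (bytes/s) / r2q
Q_DEFAULT=$(( RATE_BYTES / 10 ))     # default r2q=10 -> far larger than an MTU
Q_TUNED=$(( RATE_BYTES / 1000 ))     # r2q=1000 -> a few MTUs, a sane quantum
echo "quantum with r2q=10:   $Q_DEFAULT"
echo "quantum with r2q=1000: $Q_TUNED"
```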
Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps
> What you are seeing is pretty normal, as the standard pfifo_fast qdisc
> allows a queue of up to 1000 packets.

Good to know it's not unheard of!

> Using 100Mbit with such an amount of queueing 'allows' RTT to grow to
> insane levels. And since you are below the nominal WAN bandwidth, you
> get no packet losses and 'optimal' tcp throughput.
>
> As soon as you allow the tcp sender to send more packets than the real
> bandwidth, you experience packet losses, and if the RTT is big,
> performance sinks badly (depending on whether SACK and/or tcp
> timestamps are enabled).

SACK and TCP timestamps are enabled. I've tried dropping the qlen down (even to zero), but to no effect. Is this still expected? I've got total control over both client and server components, so can change pretty much anything to improve matters.

> I advise you use a rate limiter, using HTB + fq_codel.
>
> (Allow your LAN traffic to reach 1Gb, but shape the traffic meant for
> WAN to 100Mbits)

Thanks for the suggestion - I will give that a try.

> BTW, 100ms RTT doesn't need 8MB of TCP buffers to fill the pipe. You
> only add bufferbloat.
>
> You theoretically need 1.25 MB (10,000,000 bits)

Thanks, I'm aware. The true RTT to these servers is ~200ms for most clients, and their bandwidth can sometimes be in excess of 200Mbps.

> And the ramp up should be much faster than 60 sec!

That's what I'm really looking for. Right now, the losses occur so early in slow-start that it means linear growth for a very long duration.

Thanks again,

Sam
Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps
On Tue, 2013-07-09 at 14:57 +0100, Sam Crawford wrote:
> Hi all,
>
> [snip - full text quoted below]
>
> Any suggestions gratefully received!

What you are seeing is pretty normal, as the standard pfifo_fast qdisc allows a queue of up to 1000 packets.

Using 100Mbit with such an amount of queueing 'allows' RTT to grow to insane levels. And since you are below the nominal WAN bandwidth, you get no packet losses and 'optimal' tcp throughput.

As soon as you allow the tcp sender to send more packets than the real bandwidth, you experience packet losses, and if the RTT is big, performance sinks badly (depending on whether SACK and/or tcp timestamps are enabled).

I advise you use a rate limiter, using HTB + fq_codel.

(Allow your LAN traffic to reach 1Gb, but shape the traffic meant for WAN to 100Mbits)

BTW, 100ms RTT doesn't need 8MB of TCP buffers to fill the pipe. You only add bufferbloat.

You theoretically need 1.25 MB (10,000,000 bits).

And the ramp up should be much faster than 60 sec!
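The queueing point above can be put in numbers: a full 1000-packet queue of MTU-sized frames holds 12 Mbit of data, which takes 120 ms to drain at 100 Mbit/s - more than doubling a 100 ms path RTT. A quick sketch:

```shell
#!/bin/sh
QLEN=1000           # default pfifo_fast / txqueuelen
PKT_BYTES=1500      # MTU-sized packets
RATE_BPS=100000000  # 100 Mbit/s

# Worst-case delay (ms) seen by a packet at the back of a full queue.
DELAY_MS=$(( QLEN * PKT_BYTES * 8 * 1000 / RATE_BPS ))
echo "queue delay at 100Mbit: ${DELAY_MS} ms"    # 120 ms

# The same queue drains ten times faster at 1 Gbit/s.
DELAY_GBIT_MS=$(( QLEN * PKT_BYTES * 8 * 1000 / 1000000000 ))
echo "queue delay at 1Gbit: ${DELAY_GBIT_MS} ms" # 12 ms
```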
Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps
Hi all,

This issue persists unfortunately. Attached is a log from an instrumented TCP server (the sender), logging CWND values and the retransmits. This has been run on two identical servers on the same switch - one at 100Mbit and the other at 1Gbit. You can see that a small amount of loss occurs after 1-2 seconds with the 1Gbit setup, limiting the congestion window to ~200 MSSs. The 100Mbit server is able to hit a CWND of 1092 stably. These results are highly repeatable.

TSO/GSO/GRO are disabled on all hosts. Packet captures from both ends are available upon request.

Any suggestions gratefully received!

Sam

On 21 May 2013 20:49, Sam Crawford wrote:
> [snip - full text quoted below]
Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps
Thanks for your reply Jesse.

I've already tried disabling TSO, GSO and GRO - no joy I'm afraid.

The qdisc queueing idea was new to me. I tried dropping it down to 100 and removing it completely, but there was no discernible effect.

Thanks,

Sam

On 21 May 2013 20:22, Jesse Brandeburg wrote:
> [snip - full text quoted below]
Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps
On Tue, 21 May 2013 19:24:24 +0100 Sam Crawford wrote:
> To be clear, this doesn't just affect this one hosting provider - it
> seems to be common to all of our boxes. The issue only occurs when the
> sender is connected at 1Gbps, the RTT is reasonably high (> ~60ms),
> and we use TCP.
>
> By posting here I'm certainly not trying to suggest that the e1000e
> driver is at fault... I'm just running out of ideas and could really
> use some expert suggestions on where to look next!

I think you're overwhelming some intermediate buffers with send data before they can drain, due to the bursty send nature of TCP when combined with TSO. This is akin to bufferbloat.

Try turning off TSO using ethtool. This will restore the native feedback mechanisms of TCP. You may also want to reduce or eliminate the send-side qdisc queueing (the default is 1000, but you probably need a lot less), but I don't think it will help as much.

ethtool -K ethx tso off gso off

You may even want to turn GRO off at both ends, as GRO will be messing with your feedback as well.

ethtool -K ethx gro off

I'm a bit surprised that this issue isn't being handled natively by the linux stack. That said, GRO and TSO are really focused on LAN traffic, not WAN.

Jesse
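A hedged sketch of applying these suggestions; the interface name eth0, the txqueuelen value of 100, and the DRY_RUN wrapper are all illustrative (the wrapper just prints each command so the script can be reviewed before touching a live NIC):

```shell
#!/bin/sh
DRY_RUN=1    # flip to 0 on the real host
ETHX=eth0    # placeholder interface name

# Print rather than execute while DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

run ethtool -K "$ETHX" tso off gso off       # disable segmentation offloads
run ethtool -K "$ETHX" gro off               # and receive-side coalescing
run ip link set dev "$ETHX" txqueuelen 100   # shrink the send-side queue
```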
[E1000-devel] Higher throughput at 100Mbps than 1Gbps
Hello,

We've recently upgraded some hosted (physical) servers from 100Mbps links to 1Gbps links. For the sake of simplicity, I'll say there are two servers in Los Angeles and two in London.

Before the upgrade we could get ~96Mbps between all locations over a single TCP stream. We'd reach that speed pretty much straight away (slow start completed within a second or two). We're using TCP cubic with 8MB max send/recv windows. This was true even for the London <-> LA links.

After the upgrade we can get ~1Gbps between local servers with the same test case. However, over the WAN (~100ms RTT) we're now struggling to reach 30-40Mbps over TCP. Throughput will occasionally reach 90Mbps, but is unstable and soon drops down again. The ramp up to 90Mbps (when it does happen) takes around 60 seconds. There is no problem with UDP traffic - we can hit rates well over 100Mbps with almost no loss.

To be clear, I'm not expecting to hit 1Gbps between London and LA with 8MB TCP buffers - but I am expecting to hit at least ~96Mbps, like I could when the servers were connected at 100Mbps. As soon as we downgrade the sender's port speed to 100Mbps, we're back up to full speed (~96Mbps) immediately with TCP.

The network operator assures me there's no QoS policy or traffic policing on their kit, and if there were, then it should also affect traffic between adjacent nodes.

We're using Xeon E3 and X56xx servers with 82574L NICs. They're running CentOS 6.4 (64-bit).

I've tried the following (all unsuccessfully):

- Disabling/enabling TOE features
- Applying the EEPROM patch for losses when entering power saving mode
- Upgrading from the stock 2.1.4 driver to the 2.3.2 driver
- Upgrading the kernel to 3.9.3
- Many other things (txqueuelen, tx-rings, increasing/reducing TCP window maxes, etc)

Packet captures show that losses are clearly occurring, which is preventing TCP from ramping up properly. The graph at http://www.imagebam.com/image/1f8443255679356 shows the sender-side traffic profile - you can see the bursty nature of the losses and their effect on TCP. It _looks_ like buffer/QoS behaviour, but I'm not familiar enough with Cisco switching/routing kit to ask the hosting provider sensible questions about this.

To be clear, this doesn't just affect this one hosting provider - it seems to be common to all of our boxes. The issue only occurs when the sender is connected at 1Gbps, the RTT is reasonably high (> ~60ms), and we use TCP.

By posting here I'm certainly not trying to suggest that the e1000e driver is at fault... I'm just running out of ideas and could really use some expert suggestions on where to look next!

Thanks in advance,

Sam
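The buffer sizing here can be sanity-checked with bandwidth-delay-product arithmetic; a quick sketch using the figures from this message (integer shell math):

```shell
#!/bin/sh
# Bandwidth-delay product: bytes in flight needed to fill the pipe.
RATE_BPS=100000000   # 100 Mbit/s port speed
RTT_MS=100           # ~100 ms London <-> LA

# BDP in bytes = rate (bit/s) * RTT (s) / 8
BDP_BYTES=$(( RATE_BPS * RTT_MS / 1000 / 8 ))
echo "BDP: $BDP_BYTES bytes"   # 1250000 -> ~1.25 MB is enough window

# Conversely, the throughput an 8 MB window could sustain at 200 ms RTT:
WIN_BYTES=$(( 8 * 1024 * 1024 ))
RTT2_MS=200
MAX_BPS=$(( WIN_BYTES * 8 * 1000 / RTT2_MS ))
echo "8MB window @ 200ms supports: $MAX_BPS bit/s"   # ~335 Mbit/s
```

So the 8 MB windows are far larger than the path needs at 100 Mbit/s, which is consistent with the queueing/bufferbloat explanations later in the thread.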