Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps

2013-07-11 Thread Sam Crawford
Thanks for the pointer to those guidelines, very helpful. I also found the
information at http://ace-host.stuart.id.au/russell/files/tc/doc/cls_u32.txt to
be very useful, although quite outdated.

For reference, the below is what I ended up with:

ETH="eth0"
EST="est 1sec 4sec"
BUCKETS=64
RATE="100Mbit"

tc qd del dev $ETH root 2>/dev/null

tc qdisc add dev $ETH root handle 8000: $EST htb r2q 1000 default 0
tc filter add dev $ETH parent 8000: protocol ip u32
tc filter add dev $ETH parent 8000: handle 2: protocol ip u32 divisor ${BUCKETS}

for i in $( seq 1 $BUCKETS ); do

   BUCKET=$( printf %x $((i)) )
   BUCKETM1=$( printf %x $((i-1)) )

   tc class add dev $ETH parent 8000: classid 8000:${BUCKET} ${EST} htb rate ${RATE}
   tc qdisc add dev $ETH parent 8000:${BUCKET} handle ${BUCKET}: $EST fq_codel
   tc filter add dev $ETH protocol ip parent 8000: u32 ht 2:${BUCKETM1}: \
  match u32 0 0 flowid 8000:${BUCKET}

done

tc filter add dev $ETH protocol ip parent 8000: u32 ht 800:: \
   match ip protocol 6 0xff \
   match ip sport 8081 0xffff \
   hashkey mask 0x0000ffff at 20 \
   link 2:


It seems to work pretty well in testing so far, but it certainly has some
flaws. The biggest are that hashing is now based solely on the destination
port (ephemeral in our case, so not too bad), and that it assumes the IP
header carries no options. Also, the filters inside the 2: hash table still
need a match run against them to steer each packet to a classid, which seems
wasteful, but I can't imagine the always-match rule of "u32 0 0" adds much
overhead.
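
For anyone reproducing this, a quick way to sanity-check that flows really do
spread across the buckets (assuming the eth0 / 8000: handles used above) is
something like:

tc -s class show dev eth0              # per-bucket byte/packet counters and rate estimates
tc -s qdisc show dev eth0              # per-bucket fq_codel statistics
tc filter show dev eth0 parent 8000:   # the hash table and per-bucket rules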

Thanks again,

Sam



On 10 July 2013 14:34, Eric Dumazet  wrote:

> On Wed, 2013-07-10 at 11:53 +0100, Sam Crawford wrote:
> > Thanks Eric! I've adapted this to the following:
> >
> >
> > ETH="eth1"
> > EST="est 1sec 4sec"
> > BUCKETS=64
> > RATE="100Mbit"
> >
> >
> > tc qd del dev $ETH root 2>/dev/null
> >
> >
> > tc qdisc add dev $ETH root handle 8000: $EST htb r2q 1000 default 8000
> >
> >
> > for i in $( seq 1 $BUCKETS ); do
> >BUCKET=$( printf %x $((i)) )
> >tc class add dev $ETH parent 8000: classid 8000:${BUCKET} ${EST}
> > htb rate ${RATE}
> >tc qdisc add dev $ETH parent 8000:${BUCKET} handle ${BUCKET}: $EST
> > fq_codel
> > done
> >
> >
> > tc filter add dev $ETH parent 8000: handle 7000 protocol ip flow hash
> > keys src,dst,proto-src,proto-dst baseclass 8000:1 divisor ${BUCKETS}
> >
> >
> If dst port is always 8081, you can omit proto-dst from the hash keys
>
>
>
> >
> > This seems to deliver the behaviour I'm looking for - each flow is
> > effectively rate limited to 100Mbit/s, and multiple flows between the
> > same src and dst can achieve this rate (unless they're unlucky and
> > fall into the same bucket, which is not too bad). Have I made any
> > silly mistakes in there?
> >
>
> > The final thing I'm struggling to work out is how to limit this rule
> > to a single service. Ideally the rate limiting should only apply to
> > TCP/8081. It seems you cannot combine different filter types (i.e. the
> > u32 match + the flow match). Any suggestions?
> >
> Please follow guidelines in
>
> http://www.tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.adv-filter.hashing.html
> >
> Because you need a more specialized setup than a single 'flow hash'
>
> > Apologies for going OT on the list... I hope this topic is useful to
> > others Googling for it in the future.
> >
> >
> No problem ;)
>
>
>
>


Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps

2013-07-10 Thread Eric Dumazet
On Wed, 2013-07-10 at 11:53 +0100, Sam Crawford wrote:
> Thanks Eric! I've adapted this to the following:
> 
> 
> ETH="eth1"
> EST="est 1sec 4sec"
> BUCKETS=64
> RATE="100Mbit"
> 
> 
> tc qd del dev $ETH root 2>/dev/null
> 
> 
> tc qdisc add dev $ETH root handle 8000: $EST htb r2q 1000 default 8000
> 
> 
> for i in $( seq 1 $BUCKETS ); do
>BUCKET=$( printf %x $((i)) )
>tc class add dev $ETH parent 8000: classid 8000:${BUCKET} ${EST}
> htb rate ${RATE}
>tc qdisc add dev $ETH parent 8000:${BUCKET} handle ${BUCKET}: $EST
> fq_codel
> done
> 
> 
> tc filter add dev $ETH parent 8000: handle 7000 protocol ip flow hash
> keys src,dst,proto-src,proto-dst baseclass 8000:1 divisor ${BUCKETS}
> 
> 
If dst port is always 8081, you can omit proto-dst from the hash keys
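
i.e. the filter line above would become something like:

tc filter add dev $ETH parent 8000: handle 7000 protocol ip flow hash keys src,dst,proto-src baseclass 8000:1 divisor ${BUCKETS}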



> 
> This seems to deliver the behaviour I'm looking for - each flow is
> effectively rate limited to 100Mbit/s, and multiple flows between the
> same src and dst can achieve this rate (unless they're unlucky and
> fall into the same bucket, which is not too bad). Have I made any
> silly mistakes in there?
> 

> The final thing I'm struggling to work out is how to limit this rule
> to a single service. Ideally the rate limiting should only apply to
> TCP/8081. It seems you cannot combine different filter types (i.e. the
> u32 match + the flow match). Any suggestions?
> 
Please follow guidelines in

http://www.tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.adv-filter.hashing.html
> 
Because you need a more specialized setup than a single 'flow hash'

> Apologies for going OT on the list... I hope this topic is useful to
> others Googling for it in the future.
> 
> 
No problem ;)






Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps

2013-07-10 Thread Sam Crawford
Thanks Eric! I've adapted this to the following:

ETH="eth1"
EST="est 1sec 4sec"
BUCKETS=64
RATE="100Mbit"

tc qd del dev $ETH root 2>/dev/null

tc qdisc add dev $ETH root handle 8000: $EST htb r2q 1000 default 8000

for i in $( seq 1 $BUCKETS ); do
   BUCKET=$( printf %x $((i)) )
   tc class add dev $ETH parent 8000: classid 8000:${BUCKET} ${EST} htb rate ${RATE}
   tc qdisc add dev $ETH parent 8000:${BUCKET} handle ${BUCKET}: $EST fq_codel
done

tc filter add dev $ETH parent 8000: handle 7000 protocol ip flow hash keys src,dst,proto-src,proto-dst baseclass 8000:1 divisor ${BUCKETS}


This seems to deliver the behaviour I'm looking for - each flow is
effectively rate limited to 100Mbit/s, and multiple flows between the same
src and dst can achieve this rate (unless they're unlucky and fall into the
same bucket, which is not too bad). Have I made any silly mistakes in there?

The final thing I'm struggling to work out is how to limit this rule to a
single service. Ideally the rate limiting should only apply to TCP/8081. It
seems you cannot combine different filter types (i.e. the u32 match + the
flow match). Any suggestions?

Apologies for going OT on the list... I hope this topic is useful to others
Googling for it in the future.

Thanks,

Sam




On 9 July 2013 18:13, Eric Dumazet  wrote:

> On Tue, 2013-07-09 at 16:58 +0100, Sam Crawford wrote:
> > Thanks very much! One quick kernel upgrade later (to add fq_codel
> > support) and that has definitely helped. I'll run a larger set of
> > tests and report back.
> >
> >
> > One final question: I understand that this applies a 100Mbit aggregate
> > shaper to the specified destination(s). I'd like to instead apply this
> > shaper on a per-destination or per-flow basis, but without specifying
> > each individual destination (i.e. so that 10x 100M clients could still
> > saturate the 1G link). Do you know if this is possible?
>
> If you are interested in a qdisc setup you could adapt the following to
> your needs. It uses hashing so you could potentially have two flows
> sharing a single bucket.
>
> -
> #!/bin/bash
>
> ETH=eth0
>
> setup_htb() {
> FROM=$1
> TO=$2
> RATE=$3
>
> for i in $( seq $FROM $TO ); do
> slot=$( printf %x $((i)) )
>
> echo class add dev $ETH parent 8000: classid 8000:$slot htb rate
> ${RATE}
> echo qdisc add dev $ETH parent 8000:$slot handle $slot: codel
> done
> }
>
> tc qdisc del dev $ETH root 2>/dev/null
>
> (
>   echo qdisc add dev $ETH root handle 8000: est 1sec 4sec htb r2q 100
> default 1
>   setup_htb 1 1024 100Mbit
>   echo filter add dev $ETH parent 8000: handle 2 pref 20 flow hash keys
> src,dst,proto-src,proto-dst baseclass 8000:1 divisor 1024
> ) | tc -b
>
>
>
>
>


Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps

2013-07-09 Thread Eric Dumazet
On Tue, 2013-07-09 at 16:58 +0100, Sam Crawford wrote:
> Thanks very much! One quick kernel upgrade later (to add fq_codel
> support) and that has definitely helped. I'll run a larger set of
> tests and report back.
> 
> 
> One final question: I understand that this applies a 100Mbit aggregate
> shaper to the specified destination(s). I'd like to instead apply this
> shaper on a per-destination or per-flow basis, but without specifying
> each individual destination (i.e. so that 10x 100M clients could still
> saturate the 1G link). Do you know if this is possible?

If you are interested in a qdisc setup you could adapt the following to
your needs. It uses hashing so you could potentially have two flows
sharing a single bucket.

-
#!/bin/bash

ETH=eth0

setup_htb() {
FROM=$1
TO=$2
RATE=$3

for i in $( seq $FROM $TO ); do
slot=$( printf %x $((i)) )

echo class add dev $ETH parent 8000: classid 8000:$slot htb rate ${RATE}
echo qdisc add dev $ETH parent 8000:$slot handle $slot: codel 
done
}

tc qdisc del dev $ETH root 2>/dev/null

(
  echo qdisc add dev $ETH root handle 8000: est 1sec 4sec htb r2q 100 default 1
  setup_htb 1 1024 100Mbit
  echo filter add dev $ETH parent 8000: handle 2 pref 20 flow hash keys src,dst,proto-src,proto-dst baseclass 8000:1 divisor 1024
) | tc -b







Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps

2013-07-09 Thread Sam Crawford
Thanks very much! One quick kernel upgrade later (to add fq_codel support)
and that has definitely helped. I'll run a larger set of tests and report
back.

One final question: I understand that this applies a 100Mbit aggregate
shaper to the specified destination(s). I'd like to instead apply this
shaper on a per-destination or per-flow basis, but without specifying each
individual destination (i.e. so that 10x 100M clients could still saturate
the 1G link). Do you know if this is possible?

Thanks,

Sam



On 9 July 2013 16:27, Eric Dumazet  wrote:

> On Tue, 2013-07-09 at 15:53 +0100, Sam Crawford wrote:
>
> >
> > I've tried dropping the qlen down (even to zero), but to no effect. Is
> > this still expected? I've got total control over both client and
> > server components, so can change pretty much anything to improve
> > matters.
>
> As the bottleneck is not at the sender, you have no queueing on it, so
> qlen has absolutely no effect.
>
> That's why you should create the bottleneck on the sender, to properly
> control it.
>
> This script would be a start :
>
> ETH=eth0
> EST="est 1sec 4sec"
> REMOTE_WAN_NET=192.168.9/24
>
> tc qd del dev $ETH root 2>/dev/null
>
> tc qdisc add dev $ETH root handle 1: $EST htb r2q 1000 default 2
> tc class add dev $ETH parent 1: classid 1:1 $EST htb rate 100Mbit
>
> tc qdisc add dev $ETH parent 1:1 handle 10: $EST fq_codel
>
> tc filter add dev $ETH parent 1: protocol ip u32 \
> match ip dst $REMOTE_WAN_NET flowid 1:1
>
>
>
>


Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps

2013-07-09 Thread Eric Dumazet
On Tue, 2013-07-09 at 15:53 +0100, Sam Crawford wrote:

> 
> I've tried dropping the qlen down (even to zero), but to no effect. Is
> this still expected? I've got total control over both client and
> server components, so can change pretty much anything to improve
> matters.

As the bottleneck is not at the sender, you have no queueing on it, so
qlen has absolutely no effect.

That's why you should create the bottleneck on the sender, to properly
control it.

This script would be a start:

ETH=eth0
EST="est 1sec 4sec"
REMOTE_WAN_NET=192.168.9/24

tc qd del dev $ETH root 2>/dev/null

tc qdisc add dev $ETH root handle 1: $EST htb r2q 1000 default 2
tc class add dev $ETH parent 1: classid 1:1 $EST htb rate 100Mbit

tc qdisc add dev $ETH parent 1:1 handle 10: $EST fq_codel

tc filter add dev $ETH parent 1: protocol ip u32 \
match ip dst $REMOTE_WAN_NET flowid 1:1
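
A quick way to check that the shaper is doing its job (the 'est 1sec 4sec'
above makes tc keep rate estimates) would be something like:

tc -s class show dev $ETH    # the 1:1 class should report a rate near 100Mbit under load
tc -s qdisc show dev $ETH    # fq_codel drop statistics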






Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps

2013-07-09 Thread Sam Crawford
> What you are seeing is pretty normal, as the standard pfifo_fast qdisc
> allows a queue of up to 1000 packets.
>

Good to know it's not unheard of!


> Using 100Mbit with such an amount of queueing 'allows' RTT to grow to
> insane levels. And since you are below the nominal WAN bandwidth, you
> get no packet losses and 'optimal' TCP throughput.
>
> As soon as you let the TCP sender push more packets than the real
> bandwidth can carry, you experience packet losses, and if the RTT is big,
> performance sinks badly (depending on whether SACK and/or TCP timestamps
> are enabled)
>

SACK and TCP timestamps are enabled.

I've tried dropping the qlen down (even to zero), but to no effect. Is this
still expected? I've got total control over both client and server
components, so can change pretty much anything to improve matters.

> I advise you use a rate limiter, using HTB + fq_codel.
>
> (Allow your LAN traffic to reach 1Gb, but shape the traffic meant for
> WAN to 100Mbits)
>

Thanks for the suggestion - I will give that a try.


> BTW, 100ms RTT doesn't need 8MB of TCP buffers to fill the pipe. You
> only add bufferbloat.
>
> You theoretically need 1.25 MB (100 Mbit/s x 100 ms = 10,000,000 bits)
>

Thanks, I'm aware. The true RTT to these servers is ~200ms for most
clients, and their bandwidth can sometimes be in excess of 200Mbps (a BDP
of roughly 200 Mbit/s x 0.2 s = 40 Mbit, i.e. ~5 MB).


> And the ramp up should be much faster than 60 sec !
>

That's what I'm really looking for. Right now, the losses occur so early on
in slow-start that it means linear growth for a very long duration.

Thanks again,

Sam


Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps

2013-07-09 Thread Eric Dumazet
On Tue, 2013-07-09 at 14:57 +0100, Sam Crawford wrote:
> Hi all,
> 
> This issue persists unfortunately. Attached is a log from an instrumented
> TCP server (the sender), logging CWND values and the retransmits. This has
> been run on two identical servers on the same switch - one at 100Mbit and
> the other at 1Gbit. You can see that a small amount of losses occur after
> 1-2 seconds with the 1Gbit setup, limiting the congestion window to ~200
> MSSs. The 100Mbit server is able to hit a CWND of 1092 stably. These
> results are highly repeatable.
> 
> TSO/GSO/GRO are disabled on all hosts. Packet captures from both ends are
> available upon request.
> 
> Any suggestions gratefully received!

What you are seeing is pretty normal, as the standard pfifo_fast qdisc
allows a queue of up to 1000 packets.

Using 100Mbit with such an amount of queueing 'allows' RTT to grow to
insane levels. And since you are below the nominal WAN bandwidth, you
get no packet losses and 'optimal' TCP throughput.

As soon as you let the TCP sender push more packets than the real
bandwidth can carry, you experience packet losses, and if the RTT is big,
performance sinks badly (depending on whether SACK and/or TCP timestamps
are enabled)

I advise you use a rate limiter, using HTB + fq_codel.

(Allow your LAN traffic to reach 1Gb, but shape the traffic meant for
WAN to 100Mbits)

BTW, 100ms RTT doesn't need 8MB of TCP buffers to fill the pipe. You
only add bufferbloat.

You theoretically need 1.25 MB (100 Mbit/s x 100 ms = 10,000,000 bits)

And the ramp up should be much faster than 60 sec !





Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps

2013-07-09 Thread Sam Crawford
Hi all,

This issue persists unfortunately. Attached is a log from an instrumented
TCP server (the sender), logging CWND values and the retransmits. This has
been run on two identical servers on the same switch - one at 100Mbit and
the other at 1Gbit. You can see that a small amount of losses occur after
1-2 seconds with the 1Gbit setup, limiting the congestion window to ~200
MSSs. The 100Mbit server is able to hit a CWND of 1092 stably. These
results are highly repeatable.

TSO/GSO/GRO are disabled on all hosts. Packet captures from both ends are
available upon request.

Any suggestions gratefully received!

Sam



On 21 May 2013 20:49, Sam Crawford  wrote:

> Thanks for your reply Jesse.
>
> I've already tried disabling TSO, GSO and GRO - no joy I'm afraid.
>
> The qdisc queuing idea was new to me. I tried dropping it down to 100 and
> removing it completely, but there was no discernible effect.
>
> Thanks,
>
> Sam
>
>
> On 21 May 2013 20:22, Jesse Brandeburg  wrote:
>
>> On Tue, 21 May 2013 19:24:24 +0100
>> Sam Crawford  wrote:
>> > To be clear, this doesn't just affect this one hosting provider - it
>> seems
>> > to be common to all of our boxes. The issue only occurs when the sender
>> is
>> > connected at 1Gbps, the RTT is reasonably high (> ~60ms), and we use
>> TCP.
>> >
>> > By posting here I'm certainly not trying to suggest that the e1000e
>> driver
>> > is at fault... I'm just running out of ideas and could really use some
>> > expert suggestions on where to look next!
>>
>> I think you're overwhelming some intermediate buffers with send data
>> before they can drain, due to the burst send nature of TCP when
>> combined with TSO.  This is akin to bufferbloat.
>>
>> Try turning off TSO using ethtool.  This will restore the native
>> feedback mechanisms of TCP.  You may also want to reduce or eliminate
>> the send side qdisc queueing (the default is 1000, but you probably
>> need a lot less), but I don't think it will help as much.
>>
>> ethtool -K ethx tso off gso off
>>
>> You may even want to turn GRO off at both ends, as GRO will be messing
>> with your feedback as well.
>>
>> ethtool -K ethx gro off
>>
>> I'm a bit surprised that this issue isn't handled natively by
>> the Linux stack.  That said, GRO and TSO are really focused on LAN
>> traffic, not WAN.
>>
>> Jesse
>>
>
>


Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps

2013-05-21 Thread Sam Crawford
Thanks for your reply Jesse.

I've already tried disabling TSO, GSO and GRO - no joy I'm afraid.

The qdisc queuing idea was new to me. I tried dropping it down to 100 and
removing it completely, but there was no discernible effect.

Thanks,

Sam


On 21 May 2013 20:22, Jesse Brandeburg  wrote:

> On Tue, 21 May 2013 19:24:24 +0100
> Sam Crawford  wrote:
> > To be clear, this doesn't just affect this one hosting provider - it
> seems
> > to be common to all of our boxes. The issue only occurs when the sender
> is
> > connected at 1Gbps, the RTT is reasonably high (> ~60ms), and we use TCP.
> >
> > By posting here I'm certainly not trying to suggest that the e1000e
> driver
> > is at fault... I'm just running out of ideas and could really use some
> > expert suggestions on where to look next!
>
> I think you're overwhelming some intermediate buffers with send data
> before they can drain, due to the burst send nature of TCP when
> combined with TSO.  This is akin to bufferbloat.
>
> Try turning off TSO using ethtool.  This will restore the native
> feedback mechanisms of TCP.  You may also want to reduce or eliminate
> the send side qdisc queueing (the default is 1000, but you probably
> need a lot less), but I don't think it will help as much.
>
> ethtool -K ethx tso off gso off
>
> You may even want to turn GRO off at both ends, as GRO will be messing
> with your feedback as well.
>
> ethtool -K ethx gro off
>
> I'm a bit surprised that this issue isn't handled natively by
> the Linux stack.  That said, GRO and TSO are really focused on LAN
> traffic, not WAN.
>
> Jesse
>


Re: [E1000-devel] Higher throughput at 100Mbps than 1Gbps

2013-05-21 Thread Jesse Brandeburg
On Tue, 21 May 2013 19:24:24 +0100
Sam Crawford  wrote:
> To be clear, this doesn't just affect this one hosting provider - it seems
> to be common to all of our boxes. The issue only occurs when the sender is
> connected at 1Gbps, the RTT is reasonably high (> ~60ms), and we use TCP.
> 
> By posting here I'm certainly not trying to suggest that the e1000e driver
> is at fault... I'm just running out of ideas and could really use some
> expert suggestions on where to look next!

I think you're overwhelming some intermediate buffers with send data
before they can drain, due to the burst send nature of TCP when
combined with TSO.  This is akin to bufferbloat.

Try turning off TSO using ethtool.  This will restore the native
feedback mechanisms of TCP.  You may also want to reduce or eliminate
the send side qdisc queueing (the default is 1000, but you probably
need a lot less), but I don't think it will help as much.

ethtool -K ethx tso off gso off
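
To shrink the send side qdisc queue mentioned above, something along these
lines should do it (the exact value is only a starting point to tune):

ip link set dev ethx txqueuelen 128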

You may even want to turn GRO off at both ends, as GRO will be messing
with your feedback as well.

ethtool -K ethx gro off

I'm a bit surprised that this issue isn't handled natively by
the Linux stack.  That said, GRO and TSO are really focused on LAN
traffic, not WAN.

Jesse



[E1000-devel] Higher throughput at 100Mbps than 1Gbps

2013-05-21 Thread Sam Crawford
Hello,

We've recently upgraded some hosted (physical) servers from 100Mbps links
to 1Gbps links. For the sake of simplicity, I'll say there are two servers
in Los Angeles and two in London.

Before the upgrade we could get ~96Mbps between all locations over a single
TCP stream. We'd reach that speed pretty much straight away (slow start
completed within a second or two). We're using TCP cubic with 8MB max
send/recv windows. This was true even for the London <-> LA links.

After the upgrade we can get ~1Gbps between local servers with the same
test case. However, over the WAN (~100ms RTT) we're now struggling to
30-40Mbps over TCP. Throughput will occasionally reach 90Mbps, but is
unstable and soon drops down again. The ramp up to 90Mbps (when it does
happen) takes around 60 seconds. There is no problem with UDP traffic - we
can hit rates well over 100Mbps with almost no loss. To be clear, I'm not
expecting to hit 1Gbps between London and LA with 8MB TCP buffers - but I
am expecting to hit at least ~96Mbps like I could when the servers were
connected at 100Mbps.

As soon as we downgrade the sender's port speed to 100Mbps then we're back
up to full speed (~96Mbps) immediately with TCP. The network operator
assures me there's no QoS policy or traffic policing on their kit, and if
there were then it should also affect traffic between adjacent nodes.

We're using Xeon E3 and X56xx servers, with 82574L NICs. They're running
CentOS 6.4 (64-bit).

I've tried the following (all unsuccessfully):
- Disabling/enabling TOE features
- Applying the EEPROM patch for losses when entering power saving mode
- Upgrading from the stock 2.1.4 driver to the 2.3.2 driver
- Upgrading the Kernel to 3.9.3
- Many other things (txqueuelen, tx-rings, increasing/reducing TCP window
maxes, etc; the sort of commands meant here are sketched below)
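
For reference, the sort of commands meant in that last item (interface name
and exact values are only examples):

ip link set dev eth0 txqueuelen 100
ethtool -G eth0 tx 512
sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608'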

Packet captures show that losses are clearly occurring, which is preventing
TCP from ramping up properly. The graph at
http://www.imagebam.com/image/1f8443255679356 shows the sender side traffic
profile - you can see the bursty nature of the losses and its effect on
TCP. It _looks_ like buffer/QoS behaviour, but I'm not familiar enough with
Cisco switching/routing kit to ask the hosting provider sensible questions
about this.

To be clear, this doesn't just affect this one hosting provider - it seems
to be common to all of our boxes. The issue only occurs when the sender is
connected at 1Gbps, the RTT is reasonably high (> ~60ms), and we use TCP.

By posting here I'm certainly not trying to suggest that the e1000e driver
is at fault... I'm just running out of ideas and could really use some
expert suggestions on where to look next!

Thanks in advance,

Sam