Re: TSO trimming question

2007-12-21 Thread Bill Fink
On Thu, 20 Dec 2007, David Miller wrote:

 From: Ilpo_Järvinen [EMAIL PROTECTED]
 Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET)
 
  [PATCH] [TCP]: Fix TSO deferring
  
  I'd say that most of what tcp_tso_should_defer had in between
  there was dead code because of this.
  
  Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
 
 Yikes!
 
 John, we've been living a lie for more than a year. :-/
 
 On the bright side this explains a lot of small TSO frames I've been
 seeing in traces over the past year but never got a chance to
 investigate.
 
  diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
  index 8dafda9..693b9f6 100644
  --- a/net/ipv4/tcp_output.c
  +++ b/net/ipv4/tcp_output.c
  @@ -1217,7 +1217,8 @@ static int tcp_tso_should_defer(struct sock *sk, 
  struct sk_buff *skb)
  goto send_now;
   
  /* Defer for less than two clock ticks. */
  -   if (!tp-tso_deferred  ((jiffies1)1) - (tp-tso_deferred1)  1)
  +   if (tp-tso_deferred 
  +   ((jiffies  1)  1) - (tp-tso_deferred  1)  1)
  goto send_now;
   
  in_flight = tcp_packets_in_flight(tp);

I meant to ask about this a while back but then got distracted by
other things.  But now since the subject has come up, I had a couple
of more questions about this code.

What's with all the shifting back and forth?  Here with:

((jiffies1)1) - (tp-tso_deferred1)

and later with:

/* Ok, it looks like it is advisable to defer.  */
tp-tso_deferred = 1 | (jiffies1);

Is this just done to try and avoid the special case of jiffies==0 
when the jiffies wrap?  If so it seems like a lot of unnecessary
work just to avoid a 1 in 4 billion event, since it's my understanding
that the whole tcp_tso_should_defer function is just an optimization
and not a criticality to the proper functioning of TCP, especially
considering it hasn't even been executing at all up to now.

My second question is more basic and if I'm not mistaken actually
relates to a remaining bug in the (corrected) test:

/* Defer for less than two clock ticks. */
if (tp-tso_deferred 
((jiffies  1)  1) - (tp-tso_deferred  1)  1)

Since jiffies is an unsigned long, which is 64-bits on a 64-bit system,
whereas tp-tso_deferred is a u32, once jiffies exceeds 31-bits, which
will happen in about 25 days if HZ=1000, won't the second part of the
test always be true after that?  Or am I missing something obvious?

-Bill
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Ilpo Järvinen
On Fri, 21 Dec 2007, Bill Fink wrote:

 I meant to ask about this a while back but then got distracted by
 other things.  But now since the subject has come up, I had a couple
 of more questions about this code.
 
 What's with all the shifting back and forth?  Here with:
 
   ((jiffies1)1) - (tp-tso_deferred1)
 
 and later with:
 
   /* Ok, it looks like it is advisable to defer.  */
   tp-tso_deferred = 1 | (jiffies1);
 
 Is this just done to try and avoid the special case of jiffies==0 
 when the jiffies wrap? 

I thought that it must be the reason. I couldn't figure out any other 
explination while thinking the same thing but since I saw no problem in 
that rather weird approach, I didn't want to touch that in a patch which 
had net-2.6 (or stable) potential.

 If so it seems like a lot of unnecessary
 work just to avoid a 1 in 4 billion event, since it's my understanding
 that the whole tcp_tso_should_defer function is just an optimization
 and not a criticality to the proper functioning of TCP, especially
 considering it hasn't even been executing at all up to now.

It would still be good to not return 1 in that case we didn't flag the 
deferral, how about adding one additional tick (+comment) in the zero 
case making tso_deferred == 0 again unambiguously defined, e.g.:

tp-tso_deferred = min_t(u32, jiffies, 1);

...I'm relatively sure that nobody would ever notice that tick :-) and we 
kept return value consistent with tso_deferred state invariant.

 My second question is more basic and if I'm not mistaken actually
 relates to a remaining bug in the (corrected) test:
 
   /* Defer for less than two clock ticks. */
   if (tp-tso_deferred 
   ((jiffies  1)  1) - (tp-tso_deferred  1)  1)
 
 Since jiffies is an unsigned long, which is 64-bits on a 64-bit system,
 whereas tp-tso_deferred is a u32, once jiffies exceeds 31-bits, which
 will happen in about 25 days if HZ=1000, won't the second part of the
 test always be true after that?  Or am I missing something obvious?

I didn't notice that earlier but I think you're right though my knowledge 
about jiffies and such is quite limited.

...Feel free to submit a patch to correct these.


-- 
 i.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread David Miller
From: Bill Fink [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 03:06:48 -0500

 What's with all the shifting back and forth?  Here with:
 
   ((jiffies1)1) - (tp-tso_deferred1)
 
 and later with:
 
   /* Ok, it looks like it is advisable to defer.  */
   tp-tso_deferred = 1 | (jiffies1);
 
 Is this just done to try and avoid the special case of jiffies==0 
 when the jiffies wrap?  If so it seems like a lot of unnecessary
 work just to avoid a 1 in 4 billion event, since it's my understanding
 that the whole tcp_tso_should_defer function is just an optimization
 and not a criticality to the proper functioning of TCP, especially
 considering it hasn't even been executing at all up to now.

How else would you avoid the incorrect result when jiffies is
indeed zero?

It's two shifts, and this gets scheduled along with the other
instructions on many cpus so it's effectively free.

I don't see why this is even worth mentioning and discussing.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Ilpo Järvinen
On Fri, 21 Dec 2007, Ilpo Järvinen wrote:

 On Fri, 21 Dec 2007, Bill Fink wrote:
 
  If so it seems like a lot of unnecessary
  work just to avoid a 1 in 4 billion event, since it's my understanding
  that the whole tcp_tso_should_defer function is just an optimization
  and not a criticality to the proper functioning of TCP, especially
  considering it hasn't even been executing at all up to now.
 
 It would still be good to not return 1 in that case we didn't flag the 
 deferral, how about adding one additional tick (+comment) in the zero 
 case making tso_deferred == 0 again unambiguously defined, e.g.:
 
   tp-tso_deferred = min_t(u32, jiffies, 1);

Blah, max_t of course :-).


-- 
 i.

Re: TSO trimming question

2007-12-21 Thread Herbert Xu
On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote:

 It's two shifts, and this gets scheduled along with the other
 instructions on many cpus so it's effectively free.
 
 I don't see why this is even worth mentioning and discussing.

I totally agree.  Two shifts are way better than a branch.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Fri, 21 Dec 2007 17:29:27 +0800

 On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote:
 
  It's two shifts, and this gets scheduled along with the other
  instructions on many cpus so it's effectively free.
  
  I don't see why this is even worth mentioning and discussing.
 
 I totally agree.  Two shifts are way better than a branch.

We take probably a thousand+ 100+ cycle cache misses in the TCP stack
on big window openning ACKs.

Instead of discussing ways to solve that huge performance killer we're
wanking about two friggin' integer shifts.

It's hilarious isn't it? :-)

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Bill Fink
On Fri, 21 Dec 2007, David Miller wrote:

 From: Herbert Xu [EMAIL PROTECTED]
 Date: Fri, 21 Dec 2007 17:29:27 +0800
 
  On Fri, Dec 21, 2007 at 01:27:20AM -0800, David Miller wrote:
  
   It's two shifts, and this gets scheduled along with the other
   instructions on many cpus so it's effectively free.
   
   I don't see why this is even worth mentioning and discussing.
  
  I totally agree.  Two shifts are way better than a branch.
 
 We take probably a thousand+ 100+ cycle cache misses in the TCP stack
 on big window openning ACKs.
 
 Instead of discussing ways to solve that huge performance killer we're
 wanking about two friggin' integer shifts.
 
 It's hilarious isn't it? :-)

I don't think obfuscated code is hilarious.  Instead of the convoluted
and dense code:

/* Defer for less than two clock ticks. */
if (tp-tso_deferred 
((jiffies  1)  1) - (tp-tso_deferred  1)  1)

You can have the much simpler and more easily understandable:

/* Defer for less than two clock ticks. */
if (tp-tso_deferred  (jiffies - tp-tso_deferred)  1)

And instead of:

/* Ok, it looks like it is advisable to defer.  */
tp-tso_deferred = 1 | (jiffies1);

return 1;

You could do as Ilpo suggested:

/* Ok, it looks like it is advisable to defer.  */
tp-tso_deferred = max_t(u32, jiffies, 1);

return 1;

Or perhaps more efficiently:

/* Ok, it looks like it is advisable to defer.  */
tp-tso_deferred = jiffies;
if (unlikely(jiffies == 0))
tp-tso_deferred = 1;

return 1;

Or perhaps even:

/* Ok, it looks like it is advisable to defer.  */
tp-tso_deferred = jiffies;

/* need to return a non-zero value to defer, which means won't
 * defer if jiffies == 0 but it's only a 1 in 4 billion event
 * (and avoids a compare/branch by not checking jiffies)
 /
return jiffies;

Since it really only needs a non-zero return value to defer.

See, no branches needed and much clearer code.  That seems worthwhile
to me from a code maintenance standpoint, even if it isn't any speed
improvement.

And what about the 64-bit jiffies versus 32-bit tp-tso_deferred issue?
Should tso_deferred be made unsigned long to match jiffies?

-Bill
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Bill Fink
On Fri, 21 Dec 2007, Bill Fink wrote:

 Or perhaps even:
 
   /* Ok, it looks like it is advisable to defer.  */
   tp-tso_deferred = jiffies;
 
   /* need to return a non-zero value to defer, which means won't
* defer if jiffies == 0 but it's only a 1 in 4 billion event
* (and avoids a compare/branch by not checking jiffies)
/
 return jiffies;

Ack.  I introduced my own 64-bit to 32-bit issue (too late at night).
How about:

/* Ok, it looks like it is advisable to defer.  */
tp-tso_deferred = jiffies;

/* this won't defer if jiffies == 0 but it's only a 1 in
 * 4 billion event (and avoids a branch)
 */
return (jiffies != 0);

-Bill
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Ilpo Järvinen
On Fri, 21 Dec 2007, Bill Fink wrote:

 On Fri, 21 Dec 2007, Bill Fink wrote:
 
  Or perhaps even:
  
  /* Ok, it looks like it is advisable to defer.  */
  tp-tso_deferred = jiffies;
  
  /* need to return a non-zero value to defer, which means won't
   * defer if jiffies == 0 but it's only a 1 in 4 billion event
   * (and avoids a compare/branch by not checking jiffies)
   /
  return jiffies;
 
 Ack.  I introduced my own 64-bit to 32-bit issue (too late at night).
 How about:
 
   /* Ok, it looks like it is advisable to defer.  */
   tp-tso_deferred = jiffies;
 
   /* this won't defer if jiffies == 0 but it's only a 1 in
* 4 billion event (and avoids a branch)
*/
   return (jiffies != 0);

I'm not sure how the jiffies work but is this racy as well?

Simple return tp-tso_deferred; should work, shouldn't it? :-)


-- 
 i.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-21 Thread Bill Fink
On Fri, 21 Dec 2007, Ilpo Järvinen wrote:

 On Fri, 21 Dec 2007, Bill Fink wrote:
 
  On Fri, 21 Dec 2007, Bill Fink wrote:
  
   Or perhaps even:
   
 /* Ok, it looks like it is advisable to defer.  */
 tp-tso_deferred = jiffies;
   
 /* need to return a non-zero value to defer, which means won't
  * defer if jiffies == 0 but it's only a 1 in 4 billion event
  * (and avoids a compare/branch by not checking jiffies)
  /
   return jiffies;
  
  Ack.  I introduced my own 64-bit to 32-bit issue (too late at night).
  How about:
  
  /* Ok, it looks like it is advisable to defer.  */
  tp-tso_deferred = jiffies;
  
  /* this won't defer if jiffies == 0 but it's only a 1 in
   * 4 billion event (and avoids a branch)
   */
  return (jiffies != 0);
 
 I'm not sure how the jiffies work but is this racy as well?
 
 Simple return tp-tso_deferred; should work, shouldn't it? :-)

As long as tp-tso_deferred remains u32, pending the other issue.

-Bill
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-20 Thread Ilpo Järvinen
On Wed, 19 Dec 2007, David Miller wrote:

 From: Ilpo_Järvinen [EMAIL PROTECTED]
 Date: Wed, 19 Dec 2007 23:46:33 +0200 (EET)
 
  I'm not fully sure what's purpose of this code in tcp_write_xmit:
  
 if (skb-len  limit) {
 unsigned int trim = skb-len % mss_now;
  
 if (trim)
 limit = skb-len - trim;
  }
  
  Is it used to make sure we send only multiples of mss_now here and leave 
  the left-over into another skb?

Yeah, I now understand that this part is correct. I somehow got such 
impression while trying to figure this out that it ends up being dead code 
but that wasn't correct thought from my side. However, it caught my 
attention and after some thinking I'd say there's more to handle here 
(covered by the second question).

Also note that patch I sent earlier is not right either but needs some 
refining to do the right thing.

  Or does it try to make sure that
  tso_fragment result honors multiple of mss_now boundaries when snd_wnd
  is the limitting factor? For latter IMHO this would be necessary:
  
  if (skb-len  limit)
  limit -= limit % mss_now;
 
 The purpose of the test is to make sure we process tail sub-mss chunks
 correctly wrt. Nagle, which most closely matches the first purpose
 you've listed.
 
 So I think the calculation really does belong where it is.

 Because of the way that the sendmsg() super-skb formation logic
 works, we always will tack on more data and grow the tail
 SKB before creating a new one.  So any sub-mss chunk at the
 end of a TSO frame really is at the end of the write queue
 and really should get nagle processing.

Yes, I now agree this is fully correct for this task.

 Actually, there is an exception, which is when we run out of
 skb_frag_list slots.  In that case we'll potentially have breaks at
 odd boundaries in the middle of the queue.  But this can only happen
 in exceptional cases (user does tons of 1-byte sendfile()'s over
 random non-consequetive locations of a file) or outright bugs
 (MAX_SKB_FRAGS is defined incorrectly, for example) and thus this
 situation is not worth coding for.

That's not the only case, IMHO if there's odd boundary due to 
snd_una+snd_wnd - skb-seq limit (done in tcp_window_allows()), we don't 
consider it as odd but break the skb at arbitary point resulting
two small segments to the network, and what's worse, when the later skb 
resulting from the first split is matching skb-len  limit check as well 
causing an unnecessary small skb to be created for nagle purpose too, 
solving it fully requires some thought in case the mss_now != mss_cache 
even if non-odd boundaries are honored in the middle of skb.

Though whether we get there is depending on what tcp_tso_should_defer() 
decided. Hmm, there seems to be an unrelated bug in it as well :-/. A 
patch below. Please consider the fact that enabling TSO deferring may 
have some unpleasant effect to TCP dynamics, considering that I don't find 
stable mandatory for this to avoid breaking, besides things have been 
working quite well without it too... Only compile tested.

-- 
 i.


--
[PATCH] [TCP]: Fix TSO deferring

I'd say that most of what tcp_tso_should_defer had in between
there was dead code because of this.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_output.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8dafda9..693b9f6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1217,7 +1217,8 @@ static int tcp_tso_should_defer(struct sock *sk, struct 
sk_buff *skb)
goto send_now;
 
/* Defer for less than two clock ticks. */
-   if (!tp-tso_deferred  ((jiffies1)1) - (tp-tso_deferred1)  1)
+   if (tp-tso_deferred 
+   ((jiffies  1)  1) - (tp-tso_deferred  1)  1)
goto send_now;
 
in_flight = tcp_packets_in_flight(tp);
-- 
1.5.0.6


Re: TSO trimming question

2007-12-20 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET)

 [PATCH] [TCP]: Fix TSO deferring
 
 I'd say that most of what tcp_tso_should_defer had in between
 there was dead code because of this.
 
 Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Yikes!

John, we've been living a lie for more than a year. :-/

On the bright side this explains a lot of small TSO frames I've been
seeing in traces over the past year but never got a chance to
investigate.

 diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
 index 8dafda9..693b9f6 100644
 --- a/net/ipv4/tcp_output.c
 +++ b/net/ipv4/tcp_output.c
 @@ -1217,7 +1217,8 @@ static int tcp_tso_should_defer(struct sock *sk, struct 
 sk_buff *skb)
   goto send_now;
  
   /* Defer for less than two clock ticks. */
 - if (!tp-tso_deferred  ((jiffies1)1) - (tp-tso_deferred1)  1)
 + if (tp-tso_deferred 
 + ((jiffies  1)  1) - (tp-tso_deferred  1)  1)
   goto send_now;
  
   in_flight = tcp_packets_in_flight(tp);
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-20 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET)

 That's not the only case, IMHO if there's odd boundary due to 
 snd_una+snd_wnd - skb-seq limit (done in tcp_window_allows()), we don't 
 consider it as odd but break the skb at arbitary point resulting
 two small segments to the network, and what's worse, when the later skb 
 resulting from the first split is matching skb-len  limit check as well 
 causing an unnecessary small skb to be created for nagle purpose too, 
 solving it fully requires some thought in case the mss_now != mss_cache 
 even if non-odd boundaries are honored in the middle of skb.

In the most ideal sense, tcp_window_allows() should probably
be changed to only return MSS multiples.

Unfortunately this would add an expensive modulo operation,
however I think it would elimiate this problem case.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-20 Thread Ilpo Järvinen
On Thu, 20 Dec 2007, David Miller wrote:

 From: Ilpo_Järvinen [EMAIL PROTECTED]
 Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET)
 
  That's not the only case, IMHO if there's odd boundary due to 
  snd_una+snd_wnd - skb-seq limit (done in tcp_window_allows()), we don't 
  consider it as odd but break the skb at arbitary point resulting
  two small segments to the network, and what's worse, when the later skb 
  resulting from the first split is matching skb-len  limit check as well 
  causing an unnecessary small skb to be created for nagle purpose too, 
  solving it fully requires some thought in case the mss_now != mss_cache 
  even if non-odd boundaries are honored in the middle of skb.
 
 In the most ideal sense, tcp_window_allows() should probably
 be changed to only return MSS multiples.

That's what Herbert suggested already, I'll send a patch later
on... :-)

 Unfortunately this would add an expensive modulo operation,
 however I think it would elimiate this problem case.

Yes. Should we still call tcp_minshall_update() if split in the middle of 
wq results in smaller than MSS tail (occurs only if mss_now != mss_cache)?

-- 
 i.

Re: TSO trimming question

2007-12-20 Thread Herbert Xu
On Thu, Dec 20, 2007 at 04:00:37AM -0800, David Miller wrote:

 In the most ideal sense, tcp_window_allows() should probably
 be changed to only return MSS multiples.
 
 Unfortunately this would add an expensive modulo operation,
 however I think it would elimiate this problem case.

Well you only have to divide in the unlikely case of us being
limited by the receiver window.  In that case speed is probably
not of the essence anyway.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-20 Thread John Heffner

David Miller wrote:

From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET)


[PATCH] [TCP]: Fix TSO deferring

I'd say that most of what tcp_tso_should_defer had in between
there was dead code because of this.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]


Yikes!

John, we've been living a lie for more than a year. :-/

On the bright side this explains a lot of small TSO frames I've been
seeing in traces over the past year but never got a chance to
investigate.


Ouch.  This fix may improve some benchmarks.

Re-checking this function was on my list of things to do because I had 
also noticed some TSO frames that seemed a bit small.  This clearly 
explains it.


  -John
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-20 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 20 Dec 2007 22:00:12 +0800

 On Thu, Dec 20, 2007 at 04:00:37AM -0800, David Miller wrote:
 
  In the most ideal sense, tcp_window_allows() should probably
  be changed to only return MSS multiples.
  
  Unfortunately this would add an expensive modulo operation,
  however I think it would elimiate this problem case.
 
 Well you only have to divide in the unlikely case of us being
 limited by the receiver window.  In that case speed is probably
 not of the essence anyway.

Agreed, to some extent.

I say to some extent because it might be realistic, with
lots (millions) of sockets to hit this case a lot.

There are so many things that are a don't care performance
wise until you have a lot of stinky connections over crappy
links.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-20 Thread David Miller
From: John Heffner [EMAIL PROTECTED]
Date: Thu, 20 Dec 2007 11:02:21 -0500

 David Miller wrote:
  From: Ilpo_Järvinen [EMAIL PROTECTED]
  Date: Thu, 20 Dec 2007 13:40:51 +0200 (EET)
  
  [PATCH] [TCP]: Fix TSO deferring
 
  I'd say that most of what tcp_tso_should_defer had in between
  there was dead code because of this.
 
  Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
  
  Yikes!
  
  John, we've been living a lie for more than a year. :-/
  
  On the bright side this explains a lot of small TSO frames I've been
  seeing in traces over the past year but never got a chance to
  investigate.
 
 Ouch.  This fix may improve some benchmarks.
 
 Re-checking this function was on my list of things to do because I had 
 also noticed some TSO frames that seemed a bit small.  This clearly 
 explains it.

What I'll do for now is I'll put this into net-2.6.25 to let it
cook for a while.  It's an obvious fix but it's enabling code
that's effectively been disabled for more than a year so something
might turn up that we need to fix.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


TSO trimming question

2007-12-19 Thread Ilpo Järvinen
Hi all,

I'm not fully sure what's purpose of this code in tcp_write_xmit:

   if (skb-len  limit) {
   unsigned int trim = skb-len % mss_now;

   if (trim)
   limit = skb-len - trim;
}

Is it used to make sure we send only multiples of mss_now here and leave 
the left-over into another skb? Or does it try to make sure that
tso_fragment result honors multiple of mss_now boundaries when snd_wnd
is the limitting factor? For latter IMHO this would be necessary:

if (skb-len  limit)
limit -= limit % mss_now;


-- 
 i.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-19 Thread Ilpo Järvinen
On Wed, 19 Dec 2007, Ilpo Järvinen wrote:

 I'm not fully sure what's purpose of this code in tcp_write_xmit:
 
if (skb-len  limit) {
unsigned int trim = skb-len % mss_now;
 
if (trim)
limit = skb-len - trim;
   }
 
 Is it used to make sure we send only multiples of mss_now here and leave 
 the left-over into another skb? Or does it try to make sure that
 tso_fragment result honors multiple of mss_now boundaries when snd_wnd
 is the limitting factor? For latter IMHO this would be necessary:
 
   if (skb-len  limit)
   limit -= limit % mss_now;

...Anyway, here's an untested patch to just do it :-):

-- 
 i.


[PATCH] [TCP]: Force tso skb split to mss_now boundary (if snd_wnd limits)

If snd_wnd was limitting factor, the tso_fragment might not
create full-sized skbs but would include the window left-overs
as well.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_output.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 1c7ef17..8dafda9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1453,6 +1453,8 @@ static int tcp_write_xmit(struct sock *sk, unsigned int 
mss_now, int nonagle)
 
if (trim)
limit = skb-len - trim;
+   } else if (skb-len  limit) {
+   limit -= limit % mss_now;
}
}
 
@@ -1525,6 +1527,8 @@ void tcp_push_one(struct sock *sk, unsigned int mss_now)
 
if (trim)
limit = skb-len - trim;
+   } else if (skb-len  limit) {
+   limit -= limit % mss_now;
}
}
 
-- 
1.5.0.6


Re: TSO trimming question

2007-12-19 Thread Herbert Xu
Ilpo J??rvinen [EMAIL PROTECTED] wrote:

 is the limitting factor? For latter IMHO this would be necessary:
 
   if (skb-len  limit)
   limit -= limit % mss_now;

Good catch! But how about putting this logic into tcp_window_allows
since then you can avoid the divide for the case when we're not
bound by the receiver window.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-19 Thread David Miller
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Wed, 19 Dec 2007 23:46:33 +0200 (EET)

 I'm not fully sure what's purpose of this code in tcp_write_xmit:
 
if (skb-len  limit) {
unsigned int trim = skb-len % mss_now;
 
if (trim)
limit = skb-len - trim;
   }
 
 Is it used to make sure we send only multiples of mss_now here and leave 
 the left-over into another skb? Or does it try to make sure that
 tso_fragment result honors multiple of mss_now boundaries when snd_wnd
 is the limitting factor? For latter IMHO this would be necessary:
 
   if (skb-len  limit)
   limit -= limit % mss_now;

The purpose of the test is to make sure we process tail sub-mss chunks
correctly wrt. Nagle, which most closely matches the first purpose
you've listed.

So I think the calculation really does belong where it is.

Because of the way that the sendmsg() super-skb formation logic
works, we always will tack on more data and grow the tail
SKB before creating a new one.  So any sub-mss chunk at the
end of a TSO frame really is at the end of the write queue
and really should get nagle processing.

Actually, there is an exception, which is when we run out of
skb_frag_list slots.  In that case we'll potentially have breaks at
odd boundaries in the middle of the queue.  But this can only happen
in exceptional cases (user does tons of 1-byte sendfile()'s over
random non-consequetive locations of a file) or outright bugs
(MAX_SKB_FRAGS is defined incorrectly, for example) and thus this
situation is not worth coding for.

sendmsg()'s goal, when TSO is enabled, is to append to the write
queue tail SKB, and it does so until the MAX_SKB_FRAGS limit
is reached.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TSO trimming question

2007-12-19 Thread David Miller
From: Herbert Xu [EMAIL PROTECTED]
Date: Thu, 20 Dec 2007 11:26:18 +0800

 Ilpo J??rvinen [EMAIL PROTECTED] wrote:
 
  is the limitting factor? For latter IMHO this would be necessary:
  
if (skb-len  limit)
limit -= limit % mss_now;
 
 Good catch! But how about putting this logic into tcp_window_allows
 since then you can avoid the divide for the case when we're not
 bound by the receiver window.

Because the purpose of this calculation is different, as I
described in my other reply, I don't think this change is
correct, regardless of where it is placed :-)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html