Re: TCP stalls in current git, possibly splice related

2007-07-19 Thread Johannes Berg
On Wed, 2007-07-18 at 12:58 +0200, Johannes Berg wrote:

 Yeah, I only need to try to remember when I last *didn't* see the
 problem, and hope that my analysis about it being on the receiver side
 is correct :)

Hm. I went back to 2.6.20 on the receiver side and still saw it. Maybe I
need to go back further?

With strace (the synergyc process) and tcpdump (on the interface) I
continually see packets coming in but the read sometimes pauses for 8
seconds... Not sure why. Very weird, but right now I don't have enough
time for investigating.

johannes


signature.asc
Description: This is a digitally signed message part


Re: TCP stalls in current git, possibly splice related

2007-07-18 Thread Johannes Berg
On Mon, 2007-07-16 at 14:02 +0200, Jens Axboe wrote:

 Yep, it's a sender thing, so upgrading the receiver will not change
 anything.

Ok, I upgraded, but that didn't help. And in fact, I don't see how it
could have since synergy doesn't use splice or sendfile. I should've
thought of that right away, sorry.

It seems that packets are actually coming in during the time that my
mouse hangs though (ran wireshark in parallel and saw no pauses in the
timeline.) Hence, it actually seems to be on the receiver side, and
running the synergy client under strace reveals that during the time my
mouse hangs it's in poll() waiting for input on the tcp socket. sysrg-t
doesn't show anything useful, it's just scheduling waiting for data.
According to wireshark data is sent, but it never shows up at the
application layer.

johannes


signature.asc
Description: This is a digitally signed message part


Re: TCP stalls in current git, possibly splice related

2007-07-18 Thread Jens Axboe
On Wed, Jul 18 2007, Johannes Berg wrote:
 On Mon, 2007-07-16 at 14:02 +0200, Jens Axboe wrote:
 
  Yep, it's a sender thing, so upgrading the receiver will not change
  anything.
 
 Ok, I upgraded, but that didn't help. And in fact, I don't see how it
 could have since synergy doesn't use splice or sendfile. I should've
 thought of that right away, sorry.
 
 It seems that packets are actually coming in during the time that my
 mouse hangs though (ran wireshark in parallel and saw no pauses in the
 timeline.) Hence, it actually seems to be on the receiver side, and
 running the synergy client under strace reveals that during the time my
 mouse hangs it's in poll() waiting for input on the tcp socket. sysrg-t
 doesn't show anything useful, it's just scheduling waiting for data.
 According to wireshark data is sent, but it never shows up at the
 application layer.

OK, then we can put splice off the hook at least :-)

If it's easily reproducible (and it sounds like it), then a git bisect
might be the easiest way forward.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-18 Thread Johannes Berg
On Wed, 2007-07-18 at 12:49 +0200, Jens Axboe wrote:

 OK, then we can put splice off the hook at least :-)

:)

 If it's easily reproducible (and it sounds like it), then a git bisect
 might be the easiest way forward.

Yeah, I only need to try to remember when I last *didn't* see the
problem, and hope that my analysis about it being on the receiver side
is correct :)

johannes


signature.asc
Description: This is a digitally signed message part


Re: TCP stalls in current git, possibly splice related

2007-07-16 Thread Johannes Berg
On Fri, 2007-07-13 at 13:05 +0200, Jens Axboe wrote:
 On Fri, Jul 13 2007, Johannes Berg wrote:
  On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote:
   I'm seeing TCP connection stalls with current git, and a bisect found the 
   following as a possible cause:
  
  I'm also seeing stalls with synergy.
 
 Please try the below.

I usually see stalls sending data from machine A to machine B. I just
upgraded machine B to commit 8d9107e8c50e1c4ff43c91c8841805833f3ecfb9
(Fri Jul 13 16:53:18 2007 -0700) and the tree includes this patch. This
did not, however, fix the problem so I guess the problem is on the
sender side. Due to wireless-dev lagging behind I there still have
2.6.22-something where this patch doesn't apply. I hope John will merge
again soon and then I'll report back whether I still see the stalling
behaviour with a current tree that includes this patch.

johannes


signature.asc
Description: This is a digitally signed message part


Re: TCP stalls in current git, possibly splice related

2007-07-16 Thread Jens Axboe
On Mon, Jul 16 2007, Johannes Berg wrote:
 On Fri, 2007-07-13 at 13:05 +0200, Jens Axboe wrote:
  On Fri, Jul 13 2007, Johannes Berg wrote:
   On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote:
I'm seeing TCP connection stalls with current git, and a bisect found 
the 
following as a possible cause:
   
   I'm also seeing stalls with synergy.
  
  Please try the below.
 
 I usually see stalls sending data from machine A to machine B. I just
 upgraded machine B to commit 8d9107e8c50e1c4ff43c91c8841805833f3ecfb9
 (Fri Jul 13 16:53:18 2007 -0700) and the tree includes this patch. This
 did not, however, fix the problem so I guess the problem is on the
 sender side. Due to wireless-dev lagging behind I there still have
 2.6.22-something where this patch doesn't apply. I hope John will merge
 again soon and then I'll report back whether I still see the stalling
 behaviour with a current tree that includes this patch.

Yep, it's a sender thing, so upgrading the receiver will not change
anything.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-13 Thread Jens Axboe
On Fri, Jul 13 2007, Jens Axboe wrote:
 On Thu, Jul 12 2007, James Morris wrote:
  On Thu, 12 Jul 2007, David Miller wrote:
  
   From: James Morris [EMAIL PROTECTED]
   Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT)
   
I'm seeing TCP connection stalls with current git, and a bisect found 
the 
following as a possible cause:
   
   To add to this James is seeing this with distcc I believe.
  
  Correct.
 
 I'll try and reproduce.

You didn't happen to get a sysrq-t backtrace of that distcc being hung,
did you?

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-13 Thread James Morris
On Fri, 13 Jul 2007, Jens Axboe wrote:

 On Fri, Jul 13 2007, Johannes Berg wrote:
  On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote:
   I'm seeing TCP connection stalls with current git, and a bisect found the 
   following as a possible cause:
  
  I'm also seeing stalls with synergy.
 
 Please try the below.

Works for me.

-- 
James Morris
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-13 Thread Jens Axboe
On Fri, Jul 13 2007, Johannes Berg wrote:
 On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote:
  I'm seeing TCP connection stalls with current git, and a bisect found the 
  following as a possible cause:
 
 I'm also seeing stalls with synergy.

Please try the below.

diff --git a/fs/splice.c b/fs/splice.c
index ed2ce99..92646aa 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -491,7 +491,7 @@ ssize_t generic_file_splice_read(struct file *in, loff_t 
*ppos,
 
ret = 0;
spliced = 0;
-   while (len) {
+   while (len  !spliced) {
ret = __generic_file_splice_read(in, ppos, pipe, len, flags);
 
if (ret  0)
@@ -1051,15 +1051,10 @@ ssize_t splice_direct_to_actor(struct file *in, struct 
splice_desc *sd,
sd-flags = ~SPLICE_F_NONBLOCK;
 
while (len) {
-   size_t read_len, max_read_len;
-
-   /*
-* Do at most PIPE_BUFFERS pages worth of transfer:
-*/
-   max_read_len = min(len, (size_t)(PIPE_BUFFERS*PAGE_SIZE));
+   size_t read_len;
 
-   ret = do_splice_to(in, sd-pos, pipe, max_read_len, flags);
-   if (unlikely(ret  0))
+   ret = do_splice_to(in, sd-pos, pipe, len, flags);
+   if (unlikely(ret = 0))
goto out_release;
 
read_len = ret;
@@ -1071,26 +1066,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct 
splice_desc *sd,
 * could get stuck data in the internal pipe:
 */
ret = actor(pipe, sd);
-   if (unlikely(ret  0))
+   if (unlikely(ret = 0))
goto out_release;
 
bytes += ret;
len -= ret;
 
-   /*
-* In nonblocking mode, if we got back a short read then
-* that was due to either an IO error or due to the
-* pagecache entry not being there. In the IO error case
-* the _next_ splice attempt will produce a clean IO error
-* return value (not a short read), so in both cases it's
-* correct to break out of the loop here:
-*/
-   if ((flags  SPLICE_F_NONBLOCK)  (read_len  max_read_len))
-   break;
+   if (ret  read_len)
+   goto out_release;
}
 
pipe-nrbufs = pipe-curbuf = 0;
-
return bytes;
 
 out_release:
@@ -1152,10 +1138,12 @@ long do_splice_direct(struct file *in, loff_t *ppos, 
struct file *out,
.pos= *ppos,
.u.file = out,
};
-   size_t ret;
+   long ret;
 
ret = splice_direct_to_actor(in, sd, direct_splice_actor);
-   *ppos = sd.pos;
+   if (ret  0)
+   *ppos += ret;
+
return ret;
 }
 

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-13 Thread Jens Axboe
On Fri, Jul 13 2007, Jens Axboe wrote:
 On Fri, Jul 13 2007, Jens Axboe wrote:
  On Thu, Jul 12 2007, James Morris wrote:
   On Thu, 12 Jul 2007, David Miller wrote:
   
From: James Morris [EMAIL PROTECTED]
Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT)

 I'm seeing TCP connection stalls with current git, and a bisect found 
 the 
 following as a possible cause:

To add to this James is seeing this with distcc I believe.
   
   Correct.
  
  I'll try and reproduce.
 
 You didn't happen to get a sysrq-t backtrace of that distcc being hung,
 did you?

Does this work for you?

diff --git a/fs/splice.c b/fs/splice.c
index ed2ce99..92646aa 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -491,7 +491,7 @@ ssize_t generic_file_splice_read(struct file *in, loff_t 
*ppos,
 
ret = 0;
spliced = 0;
-   while (len) {
+   while (len  !spliced) {
ret = __generic_file_splice_read(in, ppos, pipe, len, flags);
 
if (ret  0)
@@ -1051,15 +1051,10 @@ ssize_t splice_direct_to_actor(struct file *in, struct 
splice_desc *sd,
sd-flags = ~SPLICE_F_NONBLOCK;
 
while (len) {
-   size_t read_len, max_read_len;
-
-   /*
-* Do at most PIPE_BUFFERS pages worth of transfer:
-*/
-   max_read_len = min(len, (size_t)(PIPE_BUFFERS*PAGE_SIZE));
+   size_t read_len;
 
-   ret = do_splice_to(in, sd-pos, pipe, max_read_len, flags);
-   if (unlikely(ret  0))
+   ret = do_splice_to(in, sd-pos, pipe, len, flags);
+   if (unlikely(ret = 0))
goto out_release;
 
read_len = ret;
@@ -1071,26 +1066,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct 
splice_desc *sd,
 * could get stuck data in the internal pipe:
 */
ret = actor(pipe, sd);
-   if (unlikely(ret  0))
+   if (unlikely(ret = 0))
goto out_release;
 
bytes += ret;
len -= ret;
 
-   /*
-* In nonblocking mode, if we got back a short read then
-* that was due to either an IO error or due to the
-* pagecache entry not being there. In the IO error case
-* the _next_ splice attempt will produce a clean IO error
-* return value (not a short read), so in both cases it's
-* correct to break out of the loop here:
-*/
-   if ((flags  SPLICE_F_NONBLOCK)  (read_len  max_read_len))
-   break;
+   if (ret  read_len)
+   goto out_release;
}
 
pipe-nrbufs = pipe-curbuf = 0;
-
return bytes;
 
 out_release:
@@ -1152,10 +1138,12 @@ long do_splice_direct(struct file *in, loff_t *ppos, 
struct file *out,
.pos= *ppos,
.u.file = out,
};
-   size_t ret;
+   long ret;
 
ret = splice_direct_to_actor(in, sd, direct_splice_actor);
-   *ppos = sd.pos;
+   if (ret  0)
+   *ppos += ret;
+
return ret;
 }
 

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-13 Thread Johannes Berg
On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote:
 I'm seeing TCP connection stalls with current git, and a bisect found the 
 following as a possible cause:

I'm also seeing stalls with synergy.

johannes


signature.asc
Description: This is a digitally signed message part


Re: TCP stalls in current git, possibly splice related

2007-07-13 Thread Johannes Berg
On Fri, 2007-07-13 at 13:05 +0200, Jens Axboe wrote:
 On Fri, Jul 13 2007, Johannes Berg wrote:
  On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote:
   I'm seeing TCP connection stalls with current git, and a bisect found the 
   following as a possible cause:
  
  I'm also seeing stalls with synergy.
 
 Please try the below.

Thanks. I already had your mail in the other part of the thread marked,
I'll give it a try on Monday or Tuesday; I'm away from the setup where I
always see the problem right now.

johannes


signature.asc
Description: This is a digitally signed message part


Re: TCP stalls in current git, possibly splice related

2007-07-12 Thread David Miller
From: James Morris [EMAIL PROTECTED]
Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT)

 I'm seeing TCP connection stalls with current git, and a bisect found the 
 following as a possible cause:

To add to this James is seeing this with distcc I believe.
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-12 Thread James Morris
On Thu, 12 Jul 2007, David Miller wrote:

 From: James Morris [EMAIL PROTECTED]
 Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT)
 
  I'm seeing TCP connection stalls with current git, and a bisect found the 
  following as a possible cause:
 
 To add to this James is seeing this with distcc I believe.

Correct.

-- 
James Morris
[EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-12 Thread Jens Axboe
On Thu, Jul 12 2007, James Morris wrote:
 On Thu, 12 Jul 2007, David Miller wrote:
 
  From: James Morris [EMAIL PROTECTED]
  Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT)
  
   I'm seeing TCP connection stalls with current git, and a bisect found the 
   following as a possible cause:
  
  To add to this James is seeing this with distcc I believe.
 
 Correct.

I'll try and reproduce.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html