Tcp developers: (Alan Cox: you probably could fix in a minute)

I've been told that this is THE PLACE to contact
linux kernel developers. Ok, I've got a repeatable
bug that I've reported elsewhere to no avail. Hope
this is the place:

Includes 2 perl scripts to reproduce it and info about what
I see not working with tcpdump traces.

(i'm not a subscriber - to reach me use my email address)

thnx
et



from my prior posts:


--------------------
Hi:

I have been trying to figure out
why linux tcp is failing to ack
properly in some situations.

I can easily reproduce this error
using 2 perl scripts. (see below) I
create a socket server in one
script, a client in another, start
sending, suspend the receiver and
wait 4 minutes. The socket will get
disconnected. It should not do
this, it should send an ack with a
window of 0, which it fails to do.
Both the client and the server can
be on the same system to easily see
the error.

To any developer who might be
listening - please help me find and
fix this problem.

Thanks
Eric


---- debugging info follows ------------------



If I run the client from a windows
system, linux behaves properly,
sending acks/window 0, and when I
unsuspend the receiver, all
re-starts up in a few seconds.

When going linux to linux, it fails
almost always. When it does fail, I
see the sender trying to send a
large block (1-2k bytes) and when
no ack comes back the sender goes
into it's retransmit loop waiting
1,2,4... seconds. I also see the
retransmit count going up with:
/proc/net/tcp

When it fails, the last ack sent
back shows a window size of more
than 15000. Here is a sample:

21:48:23.376528   lo > 127.0.0.1.5000 > 127.0.0.1.1052: . 1:1(0) ack 1255213 win 18158 
<nop,nop,timestamp 21258536 21258536> (DF)
21:48:23.379304   lo > 127.0.0.1.1052 > 127.0.0.1.5000: P 1255213:1255256(43) ack 1 
win 31072 <nop,nop,timestamp 21258536 21258536> (DF)
21:48:23.384049   lo > 127.0.0.1.1052 > 127.0.0.1.5000: P 1255256:1257234(1978) ack 1 
win 31072 <nop,nop,timestamp 21258537 21258536

No further acks here. This last
send occurs until the retransmit
count is exhausted. Then the socket
is closed.

Once in a while, it does not fail,
when it works, I see the sender
(tcpdump trace) sending only a 1
byte record. And I see all the acks
with a window getting smaller until
it reaches zero. At that point, the
sender is probing by sending 1 byte
records.

Here is a sample where it does
work:

22:13:34.161580   lo > 127.0.0.1.5000 > 127.0.0.1.1053: . 1:1(0) ack 5444248 win 11616 
<nop,nop,timestamp 21409615 21409615> (DF)
22:13:34.161979   lo > 127.0.0.1.5000 > 127.0.0.1.1053: . 1:1(0) ack 5451992 win 7744 
<nop,nop,timestamp 21409615 21409615> (DF)
22:13:34.162322   lo > 127.0.0.1.5000 > 127.0.0.1.1053: . 1:1(0) ack 5459736 win 3872 
<nop,nop,timestamp 21409615 21409615> (DF)
22:13:34.162553   lo > 127.0.0.1.5000 > 127.0.0.1.1053: . 1:1(0) ack 5463608 win 3872 
<nop,nop,timestamp 21409615 21409615> (DF)
22:13:34.179785   lo > 127.0.0.1.5000 > 127.0.0.1.1053: . 1:1(0) ack 5467480 win 0 
<nop,nop,timestamp 21409617 21409615> (DF)
22:13:34.379759   lo > 127.0.0.1.1053 > 127.0.0.1.5000: . 5467479:5467479(0) ack 1 win 
31072 <nop,nop,timestamp 21409637 21409617> (DF)
22:13:34.379856   lo > 127.0.0.1.5000 > 127.0.0.1.1053: . 1:1(0) ack 5467480 win 0 
<nop,nop,timestamp 21409637 21409615> (DF)
22:13:34.779762   lo > 127.0.0.1.1053 > 127.0.0.1.5000: . 5467479:5467479(0) ack 1 win 
31072 <nop,nop,timestamp 21409677 21409637> (DF)
22:13:34.779852   lo > 127.0.0.1.5000 > 127.0.0.1.1053: . 1:1(0) ack 5467480 win 0 
<nop,nop,timestamp 21409677 21409615> (DF)
22:13:35.579755   lo > 127.0.0.1.1053 > 127.0.0.1.5000: . 5467479:5467479(0) ack 1 win 
31072 <nop,nop,timestamp 21409757 21409677> (DF)
22:13:35.579842   lo > 127.0.0.1.5000 > 127.0.0.1.1053: . 1:1(0) ack 5467480 win 0 
<nop,nop,timestamp 21409757 21409615> (DF)

server resumed here.

22:18:33.572608   lo > 127.0.0.1.5000 > 127.0.0.1.1053: . 1:1(0) ack 5467480 win 3872 
<nop,nop,timestamp 21439556 21409615> (DF)
22:18:33.572714   lo > 127.0.0.1.1053 > 127.0.0.1.5000: P 5467480:5471352(3872) ack 1 
win 31072 <nop,nop,timestamp 21439556 21439556> (DF)
22:18:33.572910   lo > 127.0.0.1.5000 > 127.0.0.1.1053: . 1:1(0) ack 5471352 win 3872 
<nop,nop,timestamp 21439556 21439556> (DF)
22:18:33.573007   lo > 127.0.0.1.1053 > 127.0.0.1.5000: P 5471352:5475224(3872) ack 1 
win 31072 <nop,nop,timestamp 21439556 21439556

------------------------------------------------------



To reproduce: create the 2 perl
files, badclient.pl and
badserver.pl shown below.

>From two terminal windows:

>perl badserver.pl 5000
>perl badclient.pl

You will see for the server (hit control-z):

[1028]$ perl badserver.pl 5000
waiting for connection on port 5000
accept connection from 127.0.0.1, 1052
hello there /badclient.pl/   //   // 1
hello there /badclient.pl/   //   // 1001
hello there /badclient.pl/   //   // 2001
hello there /badclient.pl/   //   // 3001
---type control z ---
[1]+  Stopped                 perl badserver.pl 5000

You will see nothing for the client.

---badclient.pl----

use IO::Socket;

$client = IO::Socket::INET->new("localhost:5000") or die $@;
$a = shift;
$b = shift;
$i=1;
while(1) {
  print $client "hello there /$0/   /$a/   /$b/ $i\n";
  $i++;
}

close($client);


---badserver.pl---



use IO::Socket;
use Socket;
$port = shift or die "usage: perl server port_number\n";

$server = IO::Socket::INET->new(
                LocalPort =>$port,
                Type => SOCK_STREAM,
                Reuse    => 1,
                Listen    => 10,
                )
                or die "cannot create server : $@\n";
print "waiting for connection on port $port\n";                 
while ($client = $server->accept()) {
    $other = getpeername($client) or die "cannot get peer $!\n";
    ($port,$iaddr) = unpack_sockaddr_in($other);
    $ip_addr = inet_ntoa($iaddr);
        print "accept connection from $ip_addr, $port\n";
    $i=0;
    while( $_ = <$client> ) {
        if($i%1000 == 0){print $_;}
        $i++;
    }
    print "done with this connection\n";
}                       
                        
close($server);
                                                                

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/

Reply via email to