It's nice for a change to work with someone who provides actual
information, instead of that dork from last week who was just
venting his opinions and not giving people a chance to help.

In this case it looks like a Solaris TCP bug when sending data over
a connection that suffers from heavy packet loss.  Apparently,
Solaris 10 stops retransmitting, and therefore the other side times
out.

I'm not wrapping the packets this time, so you will have to
widen your window to read this message comfortably.

        Wietse

The three-way TCP handshake show no wscale options, so we don't
have to worry about borked firewalls mis-handling this:

09:16:40.786945 IP 213.207.90.2.59301 > 145.222.14.10.25: S 
2466228028:2466228028(0) win 49640 <mss 1460,nop,nop,sackOK>
09:16:40.789806 IP 145.222.14.10.25 > 213.207.90.2.59301: S 
3886146351:3886146351(0) ack 2466228029 win 5840 <mss 1380>
09:16:40.789829 IP 213.207.90.2.59301 > 145.222.14.10.25: . ack 1 win 49680

The handshake shows a round-trip time of 2.9ms. The receiver's MTU
is smaller than 1460, which suggests that his packets are encapsulated
in some other protocol. It does not matter for the problem at hand.

Next is a segment from the middle of transmission. The sender has
sent a full window up to byte 22233, but apparently there was major
packet loss after byte 5673.

09:16:51.869847 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560
09:16:51.869861 IP 213.207.90.2.59301 > 145.222.14.10.25: P 20853:22233(1380) 
ack 137 win 49680
09:16:51.869874 IP 213.207.90.2.59301 > 145.222.14.10.25: . 5673:7053(1380) ack 
137 win 49680
09:16:51.869976 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560
09:16:51.869977 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560
09:16:51.869978 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560
09:16:51.870110 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560
09:16:51.870111 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560
09:16:51.870376 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560
09:16:56.216513 IP 213.207.90.2.59301 > 145.222.14.10.25: . 5673:7053(1380) ack 
137 win 49680
09:16:56.260383 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 7053 win 19320
09:16:56.260414 IP 213.207.90.2.59301 > 145.222.14.10.25: . 7053:8433(1380) ack 
137 win 49680
09:16:56.260421 IP 213.207.90.2.59301 > 145.222.14.10.25: . 8433:9813(1380) ack 
137 win 49680
09:16:56.263577 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 8433 win 22080
09:16:56.263588 IP 213.207.90.2.59301 > 145.222.14.10.25: . 9813:11193(1380) 
ack 137 win 49680
09:16:56.263610 IP 213.207.90.2.59301 > 145.222.14.10.25: . 11193:12573(1380) 
ack 137 win 49680
09:16:56.263844 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 9813 win 24840
09:16:56.263855 IP 213.207.90.2.59301 > 145.222.14.10.25: . 12573:13953(1380) 
ack 137 win 49680
09:16:56.263865 IP 213.207.90.2.59301 > 145.222.14.10.25: . 13953:15333(1380) 
ack 137 win 49680
09:16:56.266641 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 9813 win 24840
09:16:56.266776 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 11193 win 28980
09:16:56.266777 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 11193 win 28980
09:16:56.266800 IP 213.207.90.2.59301 > 145.222.14.10.25: . 15333:16713(1380) 
ack 137 win 49680

Sequences like this repeat through the entire session. This network
connection is so bad that your machine can send only 107656 bytes
in 85 seconds or 1.27 kbyte/s.

I notice there are many sequences like this:

09:16:51.869844 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560
09:16:51.869845 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560
09:16:51.869846 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560
09:16:51.869847 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560

and this:

09:16:51.869976 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560
09:16:51.869977 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560
09:16:51.869978 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win 16560

and this:

09:17:20.597737 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173 win 32767
09:17:20.597738 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173 win 32767
09:17:20.597739 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173 win 32767

And many more.  This is consistent with network congestion. These
identical ACKs would have been sent several ms separated from each
other (remember, the round-trip time is about 3ms). Apparently the
ACKs were queued up behind other data that was clogging some pipe
somewere, and so they ended up traveling together.

Network congestion is also consistent with increased packet loss.

And now for the surprise: after 09:18:16.515592 the sender stops
retransmitting. That would be a sender TCP bug.

The first thing that happens next is 30s later, at 09:18:46.512567,
when the server reports a 421 timeout error and closes the connection.

09:18:16.512400 IP 213.207.90.2.59301 > 145.222.14.10.25: . 106413:107793(1380) 
ack 137 win 49680
09:18:16.512408 IP 213.207.90.2.59301 > 145.222.14.10.25: . 107793:109173(1380) 
ack 137 win 49680
09:18:16.512627 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 106413 win 32767
09:18:16.512644 IP 213.207.90.2.59301 > 145.222.14.10.25: . 109173:110553(1380) 
ack 137 win 49680
09:18:16.515557 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 107793 win 32767
09:18:16.515592 IP 213.207.90.2.59301 > 145.222.14.10.25: P 110553:111933(1380) 
ack 137 win 49680
09:18:16.515823 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 107793 win 32767
09:18:16.518354 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 107793 win 32767
[nothing for 30s]
09:18:46.512567 IP 145.222.14.10.25 > 213.207.90.2.59301: P 137:188(51) ack 
107793 win 32767
09:18:46.513098 IP 145.222.14.10.25 > 213.207.90.2.59301: F 188:188(0) ack 
107793 win 32767

The sender should not stop retransmitting like this. You may want
to provide the complete trace to the Solaris maintainers.

Reply via email to