On Thu, 21 Jan 2010 10:30:54 -0500 (EST), wie...@porcupine.org (Wietse
Venema) wrote:
> It's nice for a change to work with someone who provides actual
> information, instead of that dork from last week who was just
> venting his opinions and not giving people a chance to help.

I don't know how one can help me if I don't supply info ;)
> 
> In this case it looks like a Solaris TCP bug when sending data over
> a connection that suffers from heavy packet loss.  Apparently,
> Solaris 10 stops retransmitting, and therefore the other side times
> out.

Ok, this is useful help. I did reroute the message via a linux postfix
server and now it is successfully delivered. I try to find out if I can
solve the Solaris bug but at least our customer is happy now!
> 
> I'm not wrapping the packets this time, so you will have to
> widen your window to read this message comfortably.

With your explanation I'm able to read the tcp dump, one day I will fully
understand it...
> 
>       Wietse
> 
> The three-way TCP handshake show no wscale options, so we don't
> have to worry about borked firewalls mis-handling this:
> 
> 09:16:40.786945 IP 213.207.90.2.59301 > 145.222.14.10.25: S
> 2466228028:2466228028(0) win 49640 <mss 1460,nop,nop,sackOK>
> 09:16:40.789806 IP 145.222.14.10.25 > 213.207.90.2.59301: S
> 3886146351:3886146351(0) ack 2466228029 win 5840 <mss 1380>
> 09:16:40.789829 IP 213.207.90.2.59301 > 145.222.14.10.25: . ack 1 win
49680
> 
> The handshake shows a round-trip time of 2.9ms. The receiver's MTU
> is smaller than 1460, which suggests that his packets are encapsulated
> in some other protocol. It does not matter for the problem at hand.
> 
> Next is a segment from the middle of transmission. The sender has
> sent a full window up to byte 22233, but apparently there was major
> packet loss after byte 5673.
> 
> 09:16:51.869847 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win
> 16560
> 09:16:51.869861 IP 213.207.90.2.59301 > 145.222.14.10.25: P
> 20853:22233(1380) ack 137 win 49680
> 09:16:51.869874 IP 213.207.90.2.59301 > 145.222.14.10.25: .
> 5673:7053(1380) ack 137 win 49680
> 09:16:51.869976 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win
> 16560
> 09:16:51.869977 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win
> 16560
> 09:16:51.869978 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win
> 16560
> 09:16:51.870110 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win
> 16560
> 09:16:51.870111 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win
> 16560
> 09:16:51.870376 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win
> 16560
> 09:16:56.216513 IP 213.207.90.2.59301 > 145.222.14.10.25: .
> 5673:7053(1380) ack 137 win 49680
> 09:16:56.260383 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 7053 win
> 19320
> 09:16:56.260414 IP 213.207.90.2.59301 > 145.222.14.10.25: .
> 7053:8433(1380) ack 137 win 49680
> 09:16:56.260421 IP 213.207.90.2.59301 > 145.222.14.10.25: .
> 8433:9813(1380) ack 137 win 49680
> 09:16:56.263577 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 8433 win
> 22080
> 09:16:56.263588 IP 213.207.90.2.59301 > 145.222.14.10.25: .
> 9813:11193(1380) ack 137 win 49680
> 09:16:56.263610 IP 213.207.90.2.59301 > 145.222.14.10.25: .
> 11193:12573(1380) ack 137 win 49680
> 09:16:56.263844 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 9813 win
> 24840
> 09:16:56.263855 IP 213.207.90.2.59301 > 145.222.14.10.25: .
> 12573:13953(1380) ack 137 win 49680
> 09:16:56.263865 IP 213.207.90.2.59301 > 145.222.14.10.25: .
> 13953:15333(1380) ack 137 win 49680
> 09:16:56.266641 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 9813 win
> 24840
> 09:16:56.266776 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 11193
win
> 28980
> 09:16:56.266777 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 11193
win
> 28980
> 09:16:56.266800 IP 213.207.90.2.59301 > 145.222.14.10.25: .
> 15333:16713(1380) ack 137 win 49680
> 
> Sequences like this repeat through the entire session. This network
> connection is so bad that your machine can send only 107656 bytes
> in 85 seconds or 1.27 kbyte/s.
> 
> I notice there are many sequences like this:
> 
> 09:16:51.869844 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win
> 16560
> 09:16:51.869845 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win
> 16560
> 09:16:51.869846 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win
> 16560
> 09:16:51.869847 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win
> 16560
> 
> and this:
> 
> 09:16:51.869976 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win
> 16560
> 09:16:51.869977 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win
> 16560
> 09:16:51.869978 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win
> 16560
> 
> and this:
> 
> 09:17:20.597737 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173
win
> 32767
> 09:17:20.597738 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173
win
> 32767
> 09:17:20.597739 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173
win
> 32767
> 
> And many more.  This is consistent with network congestion. These
> identical ACKs would have been sent several ms separated from each
> other (remember, the round-trip time is about 3ms). Apparently the
> ACKs were queued up behind other data that was clogging some pipe
> somewere, and so they ended up traveling together.
> 
> Network congestion is also consistent with increased packet loss.
> 
> And now for the surprise: after 09:18:16.515592 the sender stops
> retransmitting. That would be a sender TCP bug.
> 
> The first thing that happens next is 30s later, at 09:18:46.512567,
> when the server reports a 421 timeout error and closes the connection.
> 
> 09:18:16.512400 IP 213.207.90.2.59301 > 145.222.14.10.25: .
> 106413:107793(1380) ack 137 win 49680
> 09:18:16.512408 IP 213.207.90.2.59301 > 145.222.14.10.25: .
> 107793:109173(1380) ack 137 win 49680
> 09:18:16.512627 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 106413
win
> 32767
> 09:18:16.512644 IP 213.207.90.2.59301 > 145.222.14.10.25: .
> 109173:110553(1380) ack 137 win 49680
> 09:18:16.515557 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 107793
win
> 32767
> 09:18:16.515592 IP 213.207.90.2.59301 > 145.222.14.10.25: P
> 110553:111933(1380) ack 137 win 49680
> 09:18:16.515823 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 107793
win
> 32767
> 09:18:16.518354 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 107793
win
> 32767
> [nothing for 30s]
> 09:18:46.512567 IP 145.222.14.10.25 > 213.207.90.2.59301: P 137:188(51)
> ack 107793 win 32767
> 09:18:46.513098 IP 145.222.14.10.25 > 213.207.90.2.59301: F 188:188(0)
ack
> 107793 win 32767
> 
> The sender should not stop retransmitting like this. You may want
> to provide the complete trace to the Solaris maintainers.

-- 
YoungGuns
Kasteleinenkampweg 7b
5222 AX 's-Hertogenbosch
T. 073 623 56 40
F. 073 623 56 39
www.youngguns.nl
KvK 18076568

Reply via email to