On Thu, 21 Jan 2010 10:30:54 -0500 (EST), wie...@porcupine.org (Wietse Venema) wrote: > It's nice for a change to work with someone who provides actual > information, instead of that dork from last week who was just > venting his opinions and not giving people a chance to help.
I don't know how one can help me if I don't supply info ;) > > In this case it looks like a Solaris TCP bug when sending data over > a connection that suffers from heavy packet loss. Apparently, > Solaris 10 stops retransmitting, and therefore the other side times > out. Ok, this is useful help. I did reroute the message via a linux postfix server and now it is successfully delivered. I try to find out if I can solve the Solaris bug but at least our customer is happy now! > > I'm not wrapping the packets this time, so you will have to > widen your window to read this message comfortably. With your explanation I'm able to read the tcp dump, one day I will fully understand it... > > Wietse > > The three-way TCP handshake show no wscale options, so we don't > have to worry about borked firewalls mis-handling this: > > 09:16:40.786945 IP 213.207.90.2.59301 > 145.222.14.10.25: S > 2466228028:2466228028(0) win 49640 <mss 1460,nop,nop,sackOK> > 09:16:40.789806 IP 145.222.14.10.25 > 213.207.90.2.59301: S > 3886146351:3886146351(0) ack 2466228029 win 5840 <mss 1380> > 09:16:40.789829 IP 213.207.90.2.59301 > 145.222.14.10.25: . ack 1 win 49680 > > The handshake shows a round-trip time of 2.9ms. The receiver's MTU > is smaller than 1460, which suggests that his packets are encapsulated > in some other protocol. It does not matter for the problem at hand. > > Next is a segment from the middle of transmission. The sender has > sent a full window up to byte 22233, but apparently there was major > packet loss after byte 5673. > > 09:16:51.869847 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869861 IP 213.207.90.2.59301 > 145.222.14.10.25: P > 20853:22233(1380) ack 137 win 49680 > 09:16:51.869874 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 5673:7053(1380) ack 137 win 49680 > 09:16:51.869976 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869977 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869978 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.870110 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.870111 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.870376 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:56.216513 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 5673:7053(1380) ack 137 win 49680 > 09:16:56.260383 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 7053 win > 19320 > 09:16:56.260414 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 7053:8433(1380) ack 137 win 49680 > 09:16:56.260421 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 8433:9813(1380) ack 137 win 49680 > 09:16:56.263577 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 8433 win > 22080 > 09:16:56.263588 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 9813:11193(1380) ack 137 win 49680 > 09:16:56.263610 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 11193:12573(1380) ack 137 win 49680 > 09:16:56.263844 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 9813 win > 24840 > 09:16:56.263855 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 12573:13953(1380) ack 137 win 49680 > 09:16:56.263865 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 13953:15333(1380) ack 137 win 49680 > 09:16:56.266641 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 9813 win > 24840 > 09:16:56.266776 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 11193 win > 28980 > 09:16:56.266777 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 11193 win > 28980 > 09:16:56.266800 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 15333:16713(1380) ack 137 win 49680 > > Sequences like this repeat through the entire session. This network > connection is so bad that your machine can send only 107656 bytes > in 85 seconds or 1.27 kbyte/s. > > I notice there are many sequences like this: > > 09:16:51.869844 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869845 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869846 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869847 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > > and this: > > 09:16:51.869976 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869977 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > 09:16:51.869978 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 5673 win > 16560 > > and this: > > 09:17:20.597737 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173 win > 32767 > 09:17:20.597738 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173 win > 32767 > 09:17:20.597739 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 40173 win > 32767 > > And many more. This is consistent with network congestion. These > identical ACKs would have been sent several ms separated from each > other (remember, the round-trip time is about 3ms). Apparently the > ACKs were queued up behind other data that was clogging some pipe > somewere, and so they ended up traveling together. > > Network congestion is also consistent with increased packet loss. > > And now for the surprise: after 09:18:16.515592 the sender stops > retransmitting. That would be a sender TCP bug. > > The first thing that happens next is 30s later, at 09:18:46.512567, > when the server reports a 421 timeout error and closes the connection. > > 09:18:16.512400 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 106413:107793(1380) ack 137 win 49680 > 09:18:16.512408 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 107793:109173(1380) ack 137 win 49680 > 09:18:16.512627 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 106413 win > 32767 > 09:18:16.512644 IP 213.207.90.2.59301 > 145.222.14.10.25: . > 109173:110553(1380) ack 137 win 49680 > 09:18:16.515557 IP 145.222.14.10.25 > 213.207.90.2.59301: . ack 107793 win > 32767 > 09:18:16.515592 IP 213.207.90.2.59301 > 145.222.14.10.25: P > 110553:111933(1380) ack 137 win 49680 > 09:18:16.515823 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 107793 win > 32767 > 09:18:16.518354 IP 145.222.14.10.25 > 213.207.90.2.59301: P ack 107793 win > 32767 > [nothing for 30s] > 09:18:46.512567 IP 145.222.14.10.25 > 213.207.90.2.59301: P 137:188(51) > ack 107793 win 32767 > 09:18:46.513098 IP 145.222.14.10.25 > 213.207.90.2.59301: F 188:188(0) ack > 107793 win 32767 > > The sender should not stop retransmitting like this. You may want > to provide the complete trace to the Solaris maintainers. -- YoungGuns Kasteleinenkampweg 7b 5222 AX 's-Hertogenbosch T. 073 623 56 40 F. 073 623 56 39 www.youngguns.nl KvK 18076568