Nicolas Williams wrote:
> I see.  Well, we can always start with a very large SO_RCVBUF and to
> hell with tuning TCP.  My only concern with that is that this may
> reserve a large amount of memory, but the buffers should only ever get
> really big in large-delay, WAN situations.
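(For reference, reserving a large receive buffer up front is just a
setsockopt() call made before the connection is established; the 4 MB
size below is an arbitrary example, not a value from this thread.)

#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>

/*
 * Sketch only: ask the kernel for a large receive buffer on 'sock'.
 * The kernel may clamp the value to its configured maximum.
 */
int
set_large_rcvbuf(int sock)
{
	int bufsize = 4 * 1024 * 1024;	/* example size, not tuned */

	if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bufsize,
	    sizeof (bufsize)) < 0) {
		perror("setsockopt(SO_RCVBUF)");
		return (-1);
	}
	return (0);
}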
Assuming an app is well behaved, the buffer is only used when there is
data lost.  TCP needs to deliver data in sequence, so if data is lost,
TCP has to buffer subsequent data until the lost data is recovered.  If
congestion control is done right, meaning not much data is lost due to
congestion, the kernel memory issue should not matter.

> *Exactly*.  ssh/sshd could track the running average of RTTs over two
> different time periods, and when the short-term average is smaller than
> the long-term then available bandwidth is growing, and when it's higher
> we have congestion.

There has been a lot of research in this area, but it is on how TCP
itself should work.  It is not clear to me how the above will work on
top of TCP, which is also doing its own congestion control, and I think
the end result may still be limited by TCP.  The sending rate is always
limited by TCP's congestion window, and that window is increased as
fast as the underlying TCP algorithm allows, assuming there is always
data to be sent.  So if an app just keeps dumping data into TCP, TCP
will send it out as fast as it thinks it can, and TCP will react to
congestion events.

> Well, we're talking about bulk data transfers.  Congestion will be
> noticeable.  I'm more concerned about detecting when congestion is
> resolved.  Also, the application will be able to measure both RTTs and
> actual bandwidth for the connection.
>
> Yes, getting this right will be tricky.  It doesn't help that we have
> two layers of flow control.  But perhaps your comments about TCP buffer
> sizing limitations are actually a boon in disguise: just don't auto-tune
> TCP and start with very large TCP buffer sizes but small SSHv2 channel
> windows, and slow start those.

Note that my comment on buffer size is about the receiving side; an app
cannot reduce the receive buffer.  And yes, in theory, if the receiver
advertises a huge window, the TCP bulk transfer rate will be controlled
mostly by the sending side.  If the sending side's algorithm is good, it
can react well to both very low and very high bandwidth environments.

> The implementation is real dumb: fixed window sizes without relation to
> TCP buffer sizes.  (Actually the window size shrinks when the sender
> sends data and grows when the receiver drains it, but it never exceeds
> the original.)  See $SRC/cmd/ssh/libssh/common/channels.c, and search
> for "adjust" case-insensitively -- it's pretty obvious.
>
> The SSHv2 spec covering this (RFC4254) allows the channel window size to
> grow, and it would be silly to over-subscribe the connection's buffers
> for long.  Each channel has an initial window size.  Sending data
> consumes space from the window.  The receiver can send an unsigned
> integer adjustment whenever it wants.

So the issue is how to grow the window size of each channel, and only
those channels which have used up their windows need to have their
windows adjusted.  I think your scheme may work: let the sender know the
receiver's buffer size to avoid over-subscription, then apply an
appropriate fairness control and grow the bulk transfer channel's
window.  This makes sure that there is always data queued in TCP to be
sent, and TCP will send it as fast as it thinks it can.

--
K. Poon.
kacheong.poon at sun.com
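A rough illustration of the receiver-side window-growth scheme discussed
above (the struct fields and the send_window_adjust() hook are
hypothetical placeholders, not the actual channels.c code): whenever the
local consumer drains data, the receiver re-advertises the drained bytes
and, while still below a cap tied to its receive buffer, also doubles
the nominal window, so a bulk transfer channel's sender never runs out
of window and TCP always has data queued to send.

#include <stdint.h>

struct channel {
	uint32_t remote_id;	/* peer's channel number */
	uint32_t local_window;	/* bytes the peer may still send to us */
	uint32_t window_size;	/* current nominal window size */
	uint32_t window_max;	/* cap, e.g. tied to the TCP receive buffer */
};

/* Hypothetical transport hook: emit SSH_MSG_CHANNEL_WINDOW_ADJUST. */
extern void send_window_adjust(uint32_t remote_id, uint32_t bytes_to_add);

void
channel_after_drain(struct channel *c, uint32_t drained)
{
	uint32_t adjust = drained;

	/*
	 * Grow the nominal window whenever it is still below the cap,
	 * doubling it each time.  A real implementation would grow only
	 * those channels that actually run their window down to zero,
	 * and would apply a fairness policy across channels.
	 */
	if (c->window_size < c->window_max) {
		uint32_t grow = c->window_size;

		if (grow > c->window_max - c->window_size)
			grow = c->window_max - c->window_size;
		c->window_size += grow;
		adjust += grow;
	}

	c->local_window += adjust;
	send_window_adjust(c->remote_id, adjust);
}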