On Sat, Jan 18, 2003 at 08:42:04AM +0000, Steve Schmitz wrote:

> Does the Linux NAT code already do this?

Possibly, but I'll have to check the source code to verify. It could
either strip the option or set the scale factor inside the option to
zero. But doing that is not much simpler than actually supporting
non-zero factors. All these approaches share the same limitation: they
only work if the code sees the TCP handshake of the connection.
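
To illustrate what zeroing the factor would involve, here's a rough
Python sketch (my own illustration, not taken from any actual NAT
implementation); a real implementation would of course also have to
update the TCP checksum afterwards:

    # Walk the TCP option bytes of a SYN and zero the shift count in
    # a window scale option (kind 3, length 3). Purely illustrative.
    def zero_wscale(options):
        out = bytearray(options)
        i = 0
        while i < len(out):
            kind = out[i]
            if kind == 0:               # end of option list
                break
            if kind == 1:               # NOP padding, single byte
                i += 1
                continue
            length = out[i + 1]
            if length < 2:              # malformed option, give up
                break
            if kind == 3:               # window scale option
                out[i + 2] = 0          # force a factor of 2^0 = 1
            i += length
        return bytes(out)

    # a 'wscale 9' option followed by end-of-list padding
    print(zero_wscale(bytes([3, 3, 9, 0])).hex())   # 03030000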

> So I conclude that either the OpenBSD firewall code has no trouble with 
> wscale but the NAT code has, or the Linux NAT clears out the wscale TCP 
> options from the initial SYN packet - i.e. does exactly what you propose.

It's the OpenBSD TCP sequence number tracking code that stalls such
connections, and that is used whenever you filter a TCP connection
statefully (when using 'keep state'). pf always creates a state entry
when any translation (like nat, rdr or binat) is applied to a
connection. If you were filtering statelessly with pf and doing nat on
the Linux box, that might explain why the connection didn't stall.
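
To make the distinction concrete, a minimal pf.conf fragment (current
syntax, where 'keep state' is not implicit; $ext_if and the addresses
are placeholders):

    # any translation creates a state entry, so sequence tracking applies
    nat on $ext_if from 192.168.0.0/24 to any -> 192.0.2.1

    # stateful filtering: state entry, sequence/window tracking
    pass out on $ext_if proto tcp from any to any keep state

    # stateless filtering: no state entry, no sequence tracking
    pass in on $ext_if proto tcp from any to any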

In the tcpdumped session you quoted, the client was using 'wscale 0' and
the server 'wscale 9'. That means the client's window values didn't get
shifted/multiplied at all, and the server's were shifted left by 9 bits
(multiplied by a factor of 2^9=512).
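
In other words, each side's window values get multiplied by the factor
that side announced in its own SYN (the client window value below is
made up for illustration):

    client_win, server_win = 16384, 12   # hypothetical raw 16-bit values
    print(client_win << 0)               # 16384: client's wscale 0
    print(server_win << 9)               # 6144:  server's 12 means 12*512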

The server started out sending window values of 12 (meaning 12*512=6144)
and increased them to 52 (meaning 52*512=26624). As long as the client's
segments fit within the unscaled window, pf let them through. But the
first larger packet got dropped, and the client retransmitted it until
the connection timed out. So you might not always see a stall, depending
on the kind of traffic the client sends. If it's all small packets (like
an interactive SQL session, where the client sends only small commands),
it could work.
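
Here's a rough sketch of the effect (the names are mine, not pf's): a
tracker that ignores the negotiated shift count bounds the client's
data by the raw 16-bit window instead of the scaled one:

    # would a segment ending at seq_end fit inside the receiver's window?
    def in_window(seq_end, last_ack, raw_win, shift):
        return seq_end <= last_ack + (raw_win << shift)

    last_ack, raw_win = 1000, 52    # server advertised win 52, wscale 9
    seq_end = last_ack + 1460       # client sends one full-size segment

    print(in_window(seq_end, last_ack, raw_win, 9))  # True: 26624 bytes fit
    print(in_window(seq_end, last_ack, raw_win, 0))  # False: only 52 seen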

Also, the server might have used a lower scaling factor on other
connections. wscale 9 is quite large; it means the server wants to be
able to advertise a maximum window of 65535*512 bytes, about 32 MB. Such
a large window would mean the client is invited to send up to 32 MB of
data before getting an acknowledgment. I don't know how Linux calculates
the scaling factors, but I guess it might depend on the memory available
for such buffers at run-time. It might have chosen a lower scaling
factor during the second test. But that's just a guess :)
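
For what it's worth, the smallest factor that covers a given buffer
size is easy to compute (a back-of-the-envelope sketch; the protocol
caps the shift count at 14):

    import math

    # smallest shift so that 65535 << shift covers buf_bytes
    def min_wscale(buf_bytes):
        return max(0, math.ceil(math.log2(buf_bytes / 65535.0)))

    for buf in (100000, 1000000, 30000000):
        print(buf, "->", min_wscale(buf))
    # 100000 -> 1, 1000000 -> 4, 30000000 -> 9 (roughly the 32 MB case)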

It's also interesting that your client chose wscale 0, indicating that
it doesn't want to scale its own windows (because it has no large
buffers?) but does want to support the peer doing so. If you're worried
about the performance impact of disabled window scaling, it depends on
the nature of your traffic. If only the server uses large windows (using
scaling factors), only bulk traffic client -> server would benefit. If
your client only sends small queries but gets large results back, a
scaling factor on the server's windows alone wouldn't improve
performance.

Daniel
