Re: PF NAT and Oracle/Linux mystery
> Maybe someone with experience in fast networks can comment on how average packet loss and latency affect the maximum TCP window you might want to use. With an MTU of 1500 or lower, over the Internet, do your windows actually go beyond 65535 bytes, even if you enable scaling?

We have a (partly) gigabit copper internal network at the institute, and the MTU can be set to 9000. The nodes that have to access the Oracle DB only have 100 Mbit/s links, and deactivating wscale on them had no measurable effect on network throughput between the nodes and the internal file servers (which have Gbit links). One reason for this, I suppose, is that this is about _TCP_ window size, whereas NFS (the transfer method we mainly use) is _UDP_ only under Linux, and thus unaffected.

If you consider gigabit copper a fast network and can suggest experiments/measurements, I'll be happy to conduct them.

Cheers,
Steve
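A common rule of thumb for the question above is the bandwidth-delay product (BDP): to keep a link busy, a single TCP connection needs roughly bandwidth × round-trip time bytes in flight, and anything above 65535 bytes is only reachable with window scaling. Note that the MTU does not appear in the formula. The sketch below is illustrative only: the RTT values are hypothetical placeholders, not measurements from this network, and it ignores packet loss, which limits throughput independently of the window.

```python
# Bandwidth-delay product (BDP): roughly how much unacknowledged data a single
# TCP connection must keep in flight to fill a link.
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit TCP window field


def bdp_bytes(link_bits_per_s: float, rtt_s: float) -> float:
    """Return the bandwidth-delay product in bytes."""
    return link_bits_per_s * rtt_s / 8


def needs_window_scaling(link_bits_per_s: float, rtt_s: float) -> bool:
    """True if filling the link needs a window larger than 65535 bytes."""
    return bdp_bytes(link_bits_per_s, rtt_s) > MAX_UNSCALED_WINDOW


if __name__ == "__main__":
    # Hypothetical RTTs; measure real ones (e.g. with ping) before concluding anything.
    for name, bps, rtt in [
        ("100 Mbit/s, 0.5 ms LAN", 100e6, 0.0005),
        ("1 Gbit/s, 0.5 ms LAN", 1e9, 0.0005),
        ("100 Mbit/s, 20 ms WAN", 100e6, 0.020),
    ]:
        print(f"{name}: BDP = {bdp_bytes(bps, rtt):,.0f} bytes, "
              f"scaling needed: {needs_window_scaling(bps, rtt)}")
```

On a LAN with sub-millisecond RTTs the BDP stays well below 64 KB even at 1 Gbit/s, which would be consistent with disabling wscale on the 100 Mbit/s nodes having no measurable effect there.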
Re: PF NAT and Oracle/Linux mystery
> We could add a "strip-wscale" option to scrub. It doesn't solve the state pickup issue, but could prevent clients communicating through the firewall from negotiating this option. Does the Linux NAT code already do this?

We tried an experiment and temporarily split our combined firewall/NAT machine into two: a firewall (the original combined machine with NAT commented out) and an extra NAT machine. When the NAT machine ran OpenBSD, contact with the wscale-using Linux/Oracle server failed. When we installed Linux on the NAT machine, it worked, even though in both cases the OpenBSD firewall was still between the NAT machine and the Oracle server. So I conclude that either the OpenBSD firewall code has no trouble with wscale but the NAT code does, or the Linux NAT clears the wscale TCP option from the initial SYN packet, i.e. does exactly what you propose. I have not tried flushing the Linux NAT state (and thus the wscale size) to see whether that breaks the connection. I only understood these issues after Daniel's explanation.

Cheers,
Steve
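To make the proposed "strip-wscale" idea concrete, here is a minimal sketch of what stripping the option from a SYN could look like: walk the TCP options field and overwrite the window-scale option (kind 3, RFC 7323) with NOPs, so the header length stays the same. This is a hypothetical illustration in Python, not PF or Linux NAT code; `strip_wscale` is an invented name, and a real packet filter would also have to update the TCP checksum after modifying the options.

```python
# TCP option kinds: end-of-list, no-operation, window scale (RFC 7323).
EOL, NOP, WSCALE = 0, 1, 3


def strip_wscale(options: bytes) -> bytes:
    """Return a copy of a TCP options field with any window-scale option
    overwritten by NOPs. The length (and thus the TCP header length) is unchanged.
    NOTE: hypothetical helper; a real implementation must also fix the checksum."""
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == EOL:
            break  # end of option list
        if kind == NOP:
            i += 1  # single-byte padding option
            continue
        if i + 1 >= len(out):
            break  # truncated option; leave the rest untouched
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed length; stop parsing
        if kind == WSCALE:
            out[i:i + length] = bytes([NOP]) * length
        i += length
    return bytes(out)
```

For example, a typical Linux SYN carries MSS, SACK-permitted, timestamps, a NOP and window scale; after `strip_wscale` the first four options are untouched and the last four bytes are all NOPs, so the peer sees no window-scale offer and, per RFC 7323, must not scale in either direction.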
Resolved: PF NAT and Oracle/Linux mystery
Hi Daniel, hi Mike, and everyone else,

Thank you very, very much for your help! Now I know what caused the problem (TCP window scaling) and how to fix it ("echo 0 > /proc/sys/net/ipv4/tcp_window_scaling" on the clients), all without requiring access to the Oracle server machine and without measurable performance loss for the clients in the private network! In a word, perfect. What a way to end a week.

Thanks again!

Cheers,
Steve
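Why a client-side setting is enough: under RFC 7323, window scaling is only in effect if both the SYN and the SYN-ACK carry the window-scale option. A Linux client with tcp_window_scaling set to 0 does not offer the option, so the server cannot use it either, no matter how the server is configured. The sketch below models just that negotiation rule; the function names are hypothetical, and it ignores the rule that window fields in SYN segments themselves are never scaled.

```python
from typing import Optional, Tuple

MAX_SHIFT = 14  # RFC 7323 caps the shift count at 14


def negotiated_shifts(client_ws: Optional[int],
                      server_ws: Optional[int]) -> Tuple[int, int]:
    """Return (client_shift, server_shift) actually used on the connection.
    None means that side did not send the option in its SYN / SYN-ACK.
    Scaling is in effect only if BOTH sides sent it; otherwise both shifts are 0."""
    if client_ws is None or server_ws is None:
        return 0, 0
    return min(client_ws, MAX_SHIFT), min(server_ws, MAX_SHIFT)


def effective_window(raw_window: int, shift: int) -> int:
    """Bytes a receiver of this 16-bit window field should allow: raw << shift."""
    return raw_window << shift


# Client with tcp_window_scaling=0 sends no option: no scaling in either direction.
print(negotiated_shifts(None, 7))  # (0, 0)
# Both sides offer it: each side's windows are shifted by its own announced count.
print(negotiated_shifts(2, 7))     # (2, 7)
```

The shift each side announces applies to the windows that side advertises, which is why a middlebox that misses or ignores the negotiation will misread every later window field from that side.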
Re: PF NAT and Oracle/Linux mystery
> You mentioned the behavior depends on the OS (and application) of the server. When Oracle runs on Solaris, it works. And when you connect to another service (ssh, etc.) on the Linux Oracle machine, it works, too?

I am not allowed to log into the Linux/Oracle server. I tried with netcat on a sister machine of the L/O server, and this worked okay.

> Could you run a tcpdump -nvvvSpi to catch all packets of a new connection up to the point where it stalls? You can use a filter expression (like 'host 192.168.101.14') to capture only the packets of a single connection; as the stall occurs after around 130 packets, the log shouldn't get too large.

Find the log attached. The client this time was 192.168.101.9.

Cheers,
Steve

[Attachment: oracle-hang.log (binary data)]
Re: PF NAT and Oracle/Linux mystery
> Could be fragments. Can you try with
>
>   scrub in on $ext_if all no-df
>   scrub out on $ext_if all no-df
>
> If you run pfctl -si, do you see any of the 'Counters' at the bottom increase when you get a stalled connection? Also, can you enable debug logging (pfctl -x m) and check /var/log/messages for relevant entries after reproducing the problem?

I added the two scrub lines to the ruleset and flushed and reloaded pf, but to no avail. Log attached. The firewall is not running quite the newest version of OpenBSD/PF (a 3.2 beta). Is it advisable to upgrade, given the interruption in service?

Cheers,
Steve

192.168.101.14 - the node which tries to connect to Oracle/Linux
141.225.240.34 - the Oracle/Linux server
139.33.102.140 - the OpenBSD/PF NAT (and FW) machine

Jan 16 18:41:32 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=130 dir=out,fwd
Jan 16 18:41:32 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=130 dir=out,fwd
Jan 16 18:41:32 firewall /bsd: pf: State failure on: 1
Jan 16 18:41:32 firewall /bsd: pf: State failure on: 1
Jan 16 18:41:44 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=131 dir=out,fwd
Jan 16 18:41:44 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=131 dir=out,fwd
Jan 16 18:41:44 firewall /bsd: pf: State failure on: 1
Jan 16 18:41:44 firewall /bsd: pf: State failure on: 1

Counters
match 308080.0/s
bad-offset 00.0/s
fragment 00.0/s
short 00.0/s
normalize 00.0/s
memory 00.0/s

[ shortly after ]

Counters
match 325000.0/s
bad-offset 00.0/s
fragment 00.0/s
short 00.0/s
normalize 00.0/s
memory 00.0/s
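The numbers in the BAD state lines can be checked by hand. For the client side the state allows sequence numbers from lo=3987556722 up to high=3987556777, i.e. only 55 bytes, but the rejected segment starts at seq=3987556722 with len=121 and so ends at 3987556843, past the upper edge. The sketch below is a simplified, hypothetical version of one window check a stateful filter performs ("does the segment end at or below the highest acceptable sequence number?"); it is not pf's actual code, which checks several more conditions. The 55 bytes of room would be consistent with the Oracle server advertising a small raw window value meant to be left-shifted by a negotiated scale factor (for a hypothetical shift of 7, 55 << 7 = 7040 bytes) that the state tracking took at face value.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap around at 2**32


def seq_diff(a: int, b: int) -> int:
    """Signed distance a - b in 32-bit modular sequence space."""
    d = (a - b) % SEQ_MOD
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


def seq_leq(a: int, b: int) -> bool:
    """a <= b in sequence space (the idea behind SEQ_LEQ in BSD kernels)."""
    return seq_diff(a, b) <= 0


def fits_in_window(seq: int, length: int, seqhi: int) -> bool:
    """Simplified check: does the segment's end stay at or below seqhi?"""
    return seq_leq(seq + length, seqhi)


# Values copied from the 18:41:32 log line (client side of the state).
lo, high = 3987556722, 3987556777
seq, length = 3987556722, 121

print("room in tracked window:", seq_diff(high, lo), "bytes")   # 55 bytes
print("segment ends at:", seq + length)                          # 3987556843
print("accepted:", fits_in_window(seq, length, high))            # False
```

The server side of the same state spans 3963208296 - 3963179816 = 28480 bytes, exactly the client's advertised win=28480; the client does not scale its windows, so that side is tracked correctly, while the side that does scale is not.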
PF NAT and Oracle/Linux mystery
Hi,

I have a problem with access to an Oracle database through an OpenBSD PF NAT setup. We (a particle physics institute) have a Linux cluster for our computations; the nodes have private IP addresses and reach the outside world via an OpenBSD/PF NAT machine. The NAT machine works perfectly fine for SSH/SCP, DNS and everything else we tried. Everything except access to an Oracle database on a Linux machine, that is.

A connection can be opened, and a query can be sent. However, after a few lines of results are printed, the connection freezes. pfctl -s state reports the connection as ESTABLISHED:ESTABLISHED, even minutes after the connection went south.

Interestingly, two variations of this setup do work well: access via an OpenBSD/PF NAT to a Solaris Oracle database works, and access via a Linux/iptables NAT to Oracle on both Solaris and Linux works, too. The problem seems to be an interaction between the OpenBSD/PF NAT and the Linux/Oracle server.

Any ideas?

Cheers,
Steve