Re: PF NAT and Oracle/Linux mystery

2003-01-18 Thread Steve Schmitz
Maybe someone with experience in fast networks can comment on how
average packet loss and latency affect the maximum TCP window you might
want to use. With an MTU of 1500 or lower, over the Internet, do your
windows actually go beyond 65535 bytes, even if you enable scaling?


We have a (partly) gigabit/copper internal network at the institute, and the 
MTU can be set to 9000.

The nodes which have to access the Oracle DB only have 100 Mbit links, and 
deactivating wscale on them had no measurable effect on the network 
throughput between nodes and internal file servers (which have Gbit links). 
One reason for this is, I suppose, that this is about _TCP_ window size, 
whereas NFS (the transfer method we mainly use) is _UDP_ only under Linux, 
thus unaffected.
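
A rough way to judge whether scaling can matter on a given path is the bandwidth-delay product: the number of bytes that must be in flight to keep the link busy. The sketch below compares it with the 65535-byte limit of an unscaled window. The round-trip times are assumed values for illustration, not measurements from this network.

```python
MAX_UNSCALED_WINDOW = 65535  # largest TCP window without the wscale option


def bandwidth_delay_product(link_bits_per_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the link full."""
    return link_bits_per_s / 8 * rtt_s


def max_throughput(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on TCP throughput (bits/s) for a given window and RTT."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    # Link speeds from the thread; RTTs are assumptions for illustration.
    cases = [
        ("100 Mbit LAN, 0.5 ms RTT", 100e6, 0.0005),
        ("1 Gbit LAN, 0.5 ms RTT", 1e9, 0.0005),
        ("100 Mbit WAN, 50 ms RTT", 100e6, 0.050),
    ]
    for name, rate, rtt in cases:
        bdp = bandwidth_delay_product(rate, rtt)
        verdict = "fits" if bdp <= MAX_UNSCALED_WINDOW else "needs scaling"
        print(f"{name}: BDP = {bdp:.0f} bytes ({verdict})")
```

With sub-millisecond round trips, a 100 Mbit link stays far below 65535 bytes, which would be consistent with seeing no change after turning wscale off on the nodes.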

If you consider gigabit/copper a fast network and can suggest 
experiments/measurements, I'll be happy to conduct them.

Cheers, Steve






Re: PF NAT and Oracle/Linux mystery

2003-01-18 Thread Steve Schmitz
We could add a "strip-wscale" option to scrub. It doesn't solve
the state pickup issue, but could prevent clients communicating
through the firewall from negotiating this option.


Does the Linux NAT code already do this?

As a test, we temporarily split our combined firewall/NAT machine into 
two: a firewall (the original combined machine with NAT commented out) and 
a separate NAT machine. When the NAT machine ran OpenBSD, contact with the 
wscale-ing Linux/Oracle server failed. When we installed Linux on the NAT 
machine, it worked, although in both cases the OpenBSD firewall was still 
between the NAT machine and the Oracle server.

So I conclude that either the OpenBSD firewall code has no trouble with 
wscale but the NAT code has, or the Linux NAT clears out the wscale TCP 
options from the initial SYN packet - i.e. does exactly what you propose.
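
To make the "strip the option from the SYN" idea concrete, here is a minimal sketch of what such a step could look like on the raw TCP options bytes. It is not taken from pf or from the Linux NAT code. The function name `strip_wscale` is hypothetical, and whether either implementation actually does this is not established in this thread.

```python
TCPOPT_EOL = 0     # end of option list
TCPOPT_NOP = 1     # one-byte padding
TCPOPT_WSCALE = 3  # window scale option (kind 3, length 3)


def strip_wscale(options: bytes) -> bytes:
    """Return the TCP options with any window scale option turned into NOPs.

    Overwriting with NOPs keeps the options the same length, so the TCP
    data offset stays valid. A real packet filter would also have to fix
    up the TCP checksum afterwards (not shown here).
    """
    out = bytearray(options)
    i = 0
    while i < len(out):
        kind = out[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(out):
            break  # truncated option, leave the rest untouched
        length = out[i + 1]
        if length < 2 or i + length > len(out):
            break  # malformed length, leave the rest untouched
        if kind == TCPOPT_WSCALE:
            out[i:i + length] = bytes([TCPOPT_NOP]) * length
        i += length
    return bytes(out)


if __name__ == "__main__":
    # Example SYN options: MSS 1460, SACK permitted, timestamps, NOP, wscale 0
    syn_opts = bytes.fromhex("020405b40402080a000000010000000001030300")
    print(strip_wscale(syn_opts).hex())
```

If the option never reaches the peer, neither side scales its windows, so a stateful device in the middle has no scale factor to track.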

I have not tried flushing the Linux NAT state (and thus the wscale size) to 
see whether it breaks the connection. I only understood these issues after 
Daniel's explanation.

Cheers, Steve





Resolved: PF NAT and Oracle/Linux mystery

2003-01-17 Thread Steve Schmitz
Hi Daniel, hi Mike, and the others.

Thank you very very much for your help!

Now I know what caused the problem (TCP Window Scaling) and how
to fix it ("echo 0 > /proc/sys/net/ipv4/tcp_window_scaling" on
the clients), all without requiring access to the Oracle server
machine, and without measurable performance loss for the clients
in the private network!
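
A value written with echo lasts only until the next reboot. The usual way to make it permanent is a line "net.ipv4.tcp_window_scaling = 0" in /etc/sysctl.conf. To confirm on each node that the setting is in effect, a small check along these lines may help. The helper name `window_scaling_enabled` is made up for this sketch.

```python
from pathlib import Path

# Same file the echo command above writes to.
WSCALE_PATH = Path("/proc/sys/net/ipv4/tcp_window_scaling")


def window_scaling_enabled(path: Path = WSCALE_PATH) -> bool:
    """Return True unless the file contains "0" (scaling switched off)."""
    return path.read_text().strip() != "0"


if __name__ == "__main__" and WSCALE_PATH.exists():
    state = "enabled" if window_scaling_enabled() else "disabled"
    print(f"TCP window scaling is {state}")
```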

In one word, perfect. What a way to end a week.

Thanks again!

Cheers, Steve





Re: PF NAT and Oracle/Linux mystery

2003-01-17 Thread Steve Schmitz
You mentioned the behavior depends on the OS (and application) of the 
server. When Oracle runs on Solaris, it works. And when you connect to 
another service (ssh, etc.) on the Linux Oracle machine, does it work, too?

I am not allowed to log into the Linux/Oracle server. I tried with netcat 
on a sister machine of the L/O server and this worked okay.

Could you run a tcpdump -nvvvSpi  to catch all packets of a new 
connection up to the point where it stalls? You can use a filter expression 
(like 'host 192.168.101.14') to only capture packets of a single 
connection, as the stall occurs after around 130 packets, the log shouldn't 
get too large.

Find the log attached. The client this time was 192.168.101.9.

Cheers, Steve




oracle-hang.log
Description: Binary data


Re: PF NAT and Oracle/Linux mystery

2003-01-16 Thread Steve Schmitz
Could be fragments. Can you try with

  scrub in on $ext_if all no-df
  scrub out on $ext_if all no-df

If you run pfctl -si, do you see any of the 'Counters' at the bottom
increase when you get a stalled connection?

Also, can you enable debug logging (pfctl -x m) and check
/var/log/messages for relevant entries, after reproducing the problem?


I included the two scrub lines into the ruleset and flushed and reloaded the 
pf, but to no avail. Log attached.

The firewall is running not quite the newest version of OpenBSD/PF (a 3.2 
beta). Is it advisable to upgrade, given the interruption in service?

Cheers, Steve


192.168.101.14 - the node which tries to connect to Oracle/Linux
141.225.240.34 - the Oracle/Linux server
139.33.102.140 - the OpenBSD/PF NAT (and FW) machine


Jan 16 18:41:32 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 
139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 
win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 
4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=130 dir=out,fwd
Jan 16 18:41:32 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 
139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 
win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 
4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=130 dir=out,fwd
Jan 16 18:41:32 firewall /bsd: pf: State failure on: 1
Jan 16 18:41:32 firewall /bsd: pf: State failure on: 1
Jan 16 18:41:44 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 
139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 
win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 
4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=131 dir=out,fwd
Jan 16 18:41:44 firewall /bsd: pf: BAD state: TCP 192.168.101.14:32863 
139.33.102.140:50237 141.225.240.34:1521 [lo=3987556722 high=3987556777 
win=28480 modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] 
4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=131 dir=out,fwd
Jan 16 18:41:44 firewall /bsd: pf: State failure on: 1
Jan 16 18:41:44 firewall /bsd: pf: State failure on: 1
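
One way to read the "BAD state" entries above is to compare where the rejected packet ends (seq + len) with the "high" value recorded for its sender. The sketch below does that for one log entry. It assumes the first bracketed group describes the sending side and that "high" is the highest sequence number pf will accept from it. Both are interpretations of the log format, not something the log states. Sequence number wraparound is ignored.

```python
import re

# First "[lo=... high=... win=...]" group: assumed to be the sending peer.
STATE_RE = re.compile(r"\[lo=(?P<lo>\d+)\s+high=(?P<high>\d+)\s*win=(?P<win>\d+)")
PKT_RE = re.compile(r"seq=(?P<seq>\d+)\s+ack=(?P<ack>\d+)\s+len=(?P<len>\d+)")


def overshoot(entry: str) -> int:
    """Bytes by which the packet's end exceeds the sender's `high` bound."""
    sender = STATE_RE.search(entry)
    pkt = PKT_RE.search(entry)
    if sender is None or pkt is None:
        raise ValueError("not a pf 'BAD state' log entry")
    end = int(pkt["seq"]) + int(pkt["len"])
    return end - int(sender["high"])


if __name__ == "__main__":
    # First entry from the log above, joined onto one line.
    entry = (
        "pf: BAD state: TCP 192.168.101.14:32863 139.33.102.140:50237 "
        "141.225.240.34:1521 [lo=3987556722 high=3987556777 win=28480 "
        "modulator=0] [lo=3963179816 high=3963208296 win=5792 modulator=0] "
        "4:4 PA seq=3987556722 ack=3963179816 len=121 ackskew=0 pkts=130 "
        "dir=out,fwd"
    )
    print(overshoot(entry))  # 66
```

Under these assumptions the 121-byte packet ends 66 bytes past the bound. That is the kind of mismatch one would expect if the server's advertised window was recorded without its scale factor applied.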



Counters
 match  308080.0/s
 bad-offset 00.0/s
 fragment   00.0/s
 short  00.0/s
 normalize  00.0/s
 memory 00.0/s

[ shortly after ]

Counters
 match  325000.0/s
 bad-offset 00.0/s
 fragment   00.0/s
 short  00.0/s
 normalize  00.0/s
 memory 00.0/s




PF NAT and Oracle/Linux mystery

2003-01-16 Thread Steve Schmitz
Hi,

I have a problem with access to an Oracle database over
an OpenBSD PF NAT setup.

We (a particle physics institute) have a Linux cluster for
our computations; the nodes have private IP addresses and
contact the outside world via an OpenBSD/PF NAT machine.

The NAT machine works perfectly fine for SSH/SCP, DNS and
everything else we tried. Everything except access to an
Oracle database on a Linux machine, that is. A connection
can be opened, and a query can be sent. However, after a
few lines of results printed out, the connection freezes.
pfctl -s state reports the connection as
ESTABLISHED:ESTABLISHED, even minutes after the connection
went south.

It is interesting to notice that two variations of this
situation do indeed work well: access via an OpenBSD/PF
NAT to a Solaris Oracle database works, and access via
a Linux/iptables NAT to both Oracle on Solaris and on
Linux works, too.

The problem seems to be an interference of the OpenBSD/PF
NAT with the Linux/Oracle.

Any ideas?

Cheers, Steve

