vpn bridge misbehavior

2007-02-05 Thread Jonathan Whiteman

Greetings all,

Last week I briefly described a problem with *return* TCP/IP traffic
across a LAN-to-LAN VPN network bridge, which occurs only on the first
connection.  I appreciate your responses, and as you requested I have
put together a detailed description of the network topology and
configuration.

This email best viewed with a fixed-width font.  (Pardon the ASCII art.)

#
#THE NETWORK:
#
                           PUBLIC INTERNET

         |                        |                        |
         |                        |                        |
  +-------------+          +-------------+          +-------------+
  | firewall 3  |__link_A__| firewall 1  |__link_B__| firewall 2  |
  | OpenBSD 4.0 |          | OpenBSD 3.9 |          | OpenBSD 4.0 |
  +-------------+          +-------------+          +-------------+
         |                        |                        |
   172.18.1.0/24          192.168.248.0/21         192.168.254.0/24
   172.18.2.0/24                  |                        |
   172.18.3.0/24           +-------------+          +-------------+
                           |  router 2   |          |  router 1   |
                           | OpenBSD 4.0 |          | OS X 10.4.8 |
                           +-------------+          +-------------+
                                  |                        |
                              X.X.X.X/X              172.17.1.0/24
                                                     172.17.2.0/24
                                                     172.17.3.0/24
#--
#Internal IP addresses:
#--

firewall 1: 192.168.250.1
firewall 2: 192.168.254.1
firewall 3: 172.18.1.1, 172.18.2.1, 172.18.3.1
  router 1: 192.168.254.2, 172.17.1.1, 172.17.2.1, 172.17.3.1
  router 2: 192.168.250.3, X.X.X.X

The network behind firewall 1 is the primary network.  Software
developers sit within its private address space.  Firewall 1 is also
running two OpenVPN server instances; one instance is configured as a
routed tunnel, one instance is configured as a bridged tunnel.

Firewalls 2 and 3 are both at remote locations.  They each run an
OpenVPN client instance which connects back to firewall 1.  In the
diagram above link A represents the routed OpenVPN tunnel and link B
represents the bridged tunnel.  You'll notice that firewall 2, which
connects to the bridged tunnel, handles a private IP range that is in
fact a subset of firewall 1's.  The art department's desktops lie within
that 192.168.254.0/24 range at firewall 2's remote site.  It should be
noted here that developers also regularly connect to either tunnel when
working from home or the road.  The bridged tunnel is configured to
forward all traffic while the routed tunnel is configured to only
forward appropriate traffic.
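
The subset relationship between the two address ranges can be checked
mechanically.  The following is only a minimal sketch using Python's
standard ipaddress module; the prefixes and host addresses are the ones
listed above, while the constant and function names are hypothetical and
chosen purely for illustration.

```python
import ipaddress

# Prefixes taken from the topology description above.
FW1_LAN = ipaddress.ip_network("192.168.248.0/21")  # firewall 1's private range
FW2_LAN = ipaddress.ip_network("192.168.254.0/24")  # bridged range at firewall 2


def is_subset(inner, outer):
    """Return True if every address of `inner` also lies inside `outer`."""
    return (inner.network_address in outer
            and inner.broadcast_address in outer)


def on_fw1_lan(addr):
    """Return True if the host address falls inside firewall 1's range."""
    return ipaddress.ip_address(addr) in FW1_LAN


if __name__ == "__main__":
    print(is_subset(FW2_LAN, FW1_LAN))
    for host in ("192.168.250.1", "192.168.254.2", "172.17.1.1"):
        print(host, on_fw1_lan(host))
```

Under these assumptions, router 1's bridged address (192.168.254.2)
lands inside firewall 1's range, while the 172.17 subnets behind it do
not and so have to be reached by routing.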

Router 1 sits within the range of addresses that firewall 2 bridges to
firewall 1's network and routes traffic for 3 separate subnets
(sub-subnets?) of Intel Mac mini development server clusters (God,
please don't ask why).  Router 2 sits within the main part of firewall
1's local network and acts as an OpenVPN client just for routing traffic
to/from a network at another remote site whose administration is not
within my jurisdiction.  Theoretically TCP/IP traffic should be able to
pass from any part of the network to any other part of the network and
back.  For that matter, this all in fact seems to work correctly despite
the fact that I'm a total rookie network admin... with one notable
exception, which brings us to the problem.

#
#THE PROBLEM:
#

Now for the fun part.  I've been adding routes and configurations and pf
rules and such as we build this network out.  Some branches (notably the
172.17 and 172.18 subnets) are very new additions.  The only part where
known problems persist (and here lies the point of this whole email) is
with access from developer desktops behind firewall 1 (also all OS X
10.4.8 machines, fwiw) to the Mac mini dev cluster behind router 1.  The
really odd part is that it's only a problem for any given client the
first time it connects to a Mac dev server on any given day.  What's
even weirder is that it doesn't appear to be a problem with the Macs
themselves or with router 1, which is also a Mac.  Traffic reaches the
Mac mini server, and tcpdump verifies that the reply comes all the way
back through the tunnel and appears on tun1 of firewall 1, but it fails
to get passed over the bridge (bridge0) between firewall 1's tunnel
endpoint and its local ethernet device (sis0).  This first connection
always times out; however, immediately retrying always works.  Pinging
always works, though, and pinging first always eliminates the
first-connection failure of the following TCP/IP connections, but only
for that client, and the
strange little cycle seems to reset itself sometime between the end of
the business day and 

Re: vpn bridge misbehavior

2007-02-02 Thread Joachim Schipper
On Thu, Feb 01, 2007 at 05:25:05PM -0800, Jonathan Whiteman wrote:
> Greetings.
>
> Is there a commonly known cause of *return* TCP/IP traffic
> to reach but be dropped rather than passed back across a
> bridge (ala bridgename.bridge0) but... get this... only on
> the first try?
>
> I'd like to get into a detailed explanation of the network
> topology I'm working with here but I don't want to scare off
> anyone by opening with a 3 page email.
>
> The bridge seems to work fine for everything except every
> 24 hours or so (may be less... like say 2-8 hours actually?)
> individual clients trying to access services on a *certain*
> cluster of servers on the other side of the bridge have to
> either first ping the server (which always works) or
> else just accept that their first connection attempt WILL
> time out but the second one WILL succeed.
>
> Obvious issues like the server machines or even just their
> network devices going to sleep because of misconfigured
> power management have already been excluded as a possibility
> because tcpdumping on both devices in the bridge clearly
> shows missing return traffic only being passed back to the
> other device AFTER the first attempt.
>
> Anyway, any advice is greatly appreciated.

While OpenBSD doesn't do that, ISTR some other VPN implementations
offering to open tunnels 'on demand' (and, presumably, close them when
not needed). Could this be involved in this case?

Still, I don't know why that would only be a problem one way, but if
this seems to depend on the tunnel in use, something like this might
be the case.

Joachim



Re: vpn bridge misbehavior

2007-02-02 Thread Stuart Henderson
> Is there a commonly known cause of *return* TCP/IP traffic
> to reach but be dropped rather than passed back across a
> bridge (ala bridgename.bridge0) but... get this... only on
> the first try?

if this is a long-lived TCP connection, perhaps firewall states
have timed out.

if so, adjusting timers may help, either for certain rules, or use
the 'set optimization' shortcuts (these set up default values for
tcp.first, tcp.opening etc - src/sbin/pfctl/pfctl.c shows that
aggressive sets tcp.established to 5h, normal 24h, conservative 5d)
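
As a rough illustration of how those defaults compare with the "2-8
hours" and "24 hours or so" intervals mentioned in the original post,
here is a minimal sketch in Python.  The three timeout values are the
ones quoted above; the names and the simple "idle time >= timeout"
comparison are assumptions made for illustration, not a description of
how pf actually expires states.

```python
# tcp.established defaults per 'set optimization' profile, in seconds,
# as quoted above (aggressive 5h, normal 24h, conservative 5d).
TCP_ESTABLISHED = {
    "aggressive": 5 * 3600,
    "normal": 24 * 3600,
    "conservative": 5 * 24 * 3600,
}


def expired_profiles(idle_hours):
    """Return the profiles whose timeout an idle connection would reach.

    Assumes (hypothetically) that a state is gone once the idle time is
    greater than or equal to the profile's tcp.established value.
    """
    idle_seconds = idle_hours * 3600
    return sorted(name for name, timeout in TCP_ESTABLISHED.items()
                  if idle_seconds >= timeout)


if __name__ == "__main__":
    for hours in (2, 8, 24):
        print(hours, expired_profiles(hours))
```

Under these assumptions, a connection idle for 8 hours would outlive
only the aggressive default, and one idle for a full day would outlive
the normal default as well.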

without the 3 page email it's guesswork though.

> I'd like to get into a detailed explanation of the network
> topology I'm working with here but I don't want to scare off
> anyone by opening with a 3 page email.

people can always skip the email, most people who will be able to
help would prefer to have the information in one place rather than
back-and-forth to find it out. in most cases, actual configs and
output from system commands work better than descriptions.

note that the process of gathering all the relevant information
for a good list post will often highlight the actual problem :-)



vpn bridge misbehavior

2007-02-01 Thread Jonathan Whiteman

Greetings.

Is there a commonly known cause of *return* TCP/IP traffic
to reach but be dropped rather than passed back across a
bridge (ala bridgename.bridge0) but... get this... only on
the first try?

I'd like to get into a detailed explanation of the network
topology I'm working with here but I don't want to scare off
anyone by opening with a 3 page email.

The bridge seems to work fine for everything except every
24 hours or so (may be less... like say 2-8 hours actually?)
individual clients trying to access services on a *certain*
cluster of servers on the other side of the bridge have to
either first ping the server (which always works) or
else just accept that their first connection attempt WILL
time out but the second one WILL succeed.

Obvious issues like the server machines or even just their
network devices going to sleep because of misconfigured
power management have already been excluded as a possibility
because tcpdumping on both devices in the bridge clearly
shows missing return traffic only being passed back to the
other device AFTER the first attempt.

Anyway, any advice is greatly appreciated.
thanks,

~jon



Re: vpn bridge misbehavior

2007-02-01 Thread Rolf Sommerhalder

Hi,

On 2/2/07, Jonathan Whiteman wrote:

> I'd like to get into a detailed explanation of the network
> topology I'm working with here but I don't want to scare off
> anyone by opening with a 3 page email.


Your subject implies that you built a layer-2 LAN-to-LAN bridge over
an (IPsec or OpenVPN?) VPN tunnel.
Not being an OpenBSD or VPN specialist yet, I would still love to
see your detailed three page description, as I am currently
experimenting with a similar setup (and got a prototype to work last
night in the lab).

Rolf