Re: Mechanisms for a multi-homed host to pick the best router

2008-09-19 Thread Brighten Godfrey
 When the server sends TCP traffic for that same connection back to host A,
 it needs to pick one of the N routers, in other words, it needs to pick an
 outbound interface from its N interfaces.

 ...
 The problem is that some routers are better than other routers in the
 sense that they are closer to the final destination address A. (For example,
 each router could be connected to a different ISP.)

 One way for the server to pick the optimal downstream router, is to run
 stub BGP between the server and each of the routers.

 ...
 While this approach would certainly allow the server to pick the optimal
 downstream router in all cases, I would prefer not to run routing protocols
 on this server for a number of reasons:


It's probably good to keep in mind that this would be "optimal" only in
BGP's sense, not truly optimal.  As far as I know, the best you would get
is to minimize the number of AS hops, which is probably correlated with,
but definitely not the same as, metrics you actually care about, like
latency.  All in all, running BGP does seem like an awful lot of work just
to let you optimize for the wrong metric.


Here's another thought, though.  You don't need to run BGP to get the  
data that BGP will give you.  There exist approximate maps of the  
Internet at the router or AS level with IP prefixes attached.  It  
would be possible to periodically obtain one of these graphs, e.g.  
from CAIDA, and then run a shortest-paths algorithm on that graph to  
decide based on the destination IP address which router is best.  Not  
only does this let you avoid running BGP, it also saves memory since  
you need only one copy of the graph, rather than one copy for each of  
the N BGP sessions.  Of course, it's not real-time data, but if all  
you need is a good guess as to which of the outbound interfaces is  
best, it might be sufficient.
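
As a rough illustration of the map-based approach, the following Python sketch runs a breadth-first search over a small AS-level graph and picks the router whose upstream AS is the fewest AS hops from the destination's origin AS. Everything in it is an assumption made for illustration: the graph, the prefix-to-AS table, the router-to-AS mapping, and the function names are hypothetical, and a real map (for example one derived from CAIDA data) would need its own parsing. The search also ignores BGP policy, so it only approximates what BGP would actually select.

```python
from collections import deque
import ipaddress


def hop_counts(graph, source):
    """Breadth-first search: AS-hop distance from `source` to each reachable AS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def origin_as(prefix_to_as, address):
    """Return the AS of the most specific prefix containing `address`, or None."""
    addr = ipaddress.ip_address(address)
    best = None  # (prefix length, AS number)
    for prefix, asn in prefix_to_as.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, asn)
    return None if best is None else best[1]


def best_router(graph, prefix_to_as, router_as, address):
    """Pick the router whose upstream AS is fewest hops from the destination AS."""
    target = origin_as(prefix_to_as, address)
    if target is None:
        return None
    best = None  # (hop count, router name)
    for router in sorted(router_as):  # sorted for deterministic tie-breaking
        hops = hop_counts(graph, router_as[router]).get(target)
        if hops is not None and (best is None or hops < best[0]):
            best = (hops, router)
    return None if best is None else best[1]


# Hypothetical toy data (private-use AS numbers, documentation prefixes).
AS_GRAPH = {
    64501: [64510],
    64502: [64520],
    64510: [64501, 64520],
    64520: [64502, 64510],
}
PREFIX_TO_AS = {"198.51.100.0/24": 64520}
ROUTER_AS = {"R1": 64501, "R2": 64502}

if __name__ == "__main__":
    print(best_router(AS_GRAPH, PREFIX_TO_AS, ROUTER_AS, "198.51.100.7"))
```

With the toy data above, R2's upstream AS is directly adjacent to the destination's AS, so the sketch chooses R2; an address not covered by any prefix in the map yields `None`.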


Does anyone actually do something like this in practice?  (I'm  
guessing no)


 Someone suggested an idea to me which seems almost too simple to work, but I
 cannot find any good reason why it would not work.

 The idea is the server simply sends all outbound traffic for the TCP
 connection out over the same interface over which the most recent TCP
 traffic for that connection was received.


So the underlying idea here is that the source (or its ISP) has  
effectively done the work of picking a good path, and by replying on  
the same interface, you use the reverse of that path, which is also  
likely to be pretty good.


Some of the assumptions in that reasoning seem imperfect:

- There's a good chance the forward path (i.e. the one the source  
picked) isn't the best.
- Asymmetry, as you noted: Even if the forward path was the best, the  
reverse of it is not necessarily the best.
- A different asymmetry: Even if the forward was best and the reverse  
of it is best, the path followed by sending on the same interface is  
not necessarily the reverse of the forward path.


So I understand that this heuristic could perform pretty well in  
practice, and certainly better than sending on a random interface (in  
terms of latency, not traffic engineering).  But I can't see how it's  
the optimal strategy.


I think there are commercial products that solve this problem the  
right way, by automatically and dynamically monitoring path quality  
and availability, and selecting paths for you.  I think the Avaya  
Converged Network Analyzer is one.  I recall speaking with an  
operator from a major content provider who said that they use an  
intelligent route selection product similar to this for their  
outbound traffic.  I'd be personally interested to hear what other  
operators typically use.


~Brighten Godfrey



Re: Mechanisms for a multi-homed host to pick the best router

2008-09-18 Thread Cayle Spandon
Hi Paul,

Thank you very much for the confirmation that the idea is sane and for the
pointers to the additional information.

-- Cayle

On Wed, Sep 17, 2008 at 11:49 PM, Paul Vixie [EMAIL PROTECTED] wrote:

 [EMAIL PROTECTED] (Cayle Spandon) writes:

  (My apologies, in advance, for the fact that this question is very long
  winded.)

 np.

  I have a server which is multi-homed to N routers as shown below:
 
   +---+
  R1---|   |
   |   |
  R2---|   |
  ...  | S |
   |   |
  Rn---|   |
   +---+
 
  This server is a host; it is not a router in the sense that it will
  never forward any packets (but it might run routing protocols as
  discussed below).
 
  Also, for the sake of simplicity in this discussion, let's say this
  server only receives inbound TCP connections; it never initiates
  outbound TCP connections.
 
  Finally, this server has a loopback address L. All traffic destined to
  the server uses address L as the destination address. All N routers
  have a static route to L and inject that route into their routing
  protocol (possibly as part of an aggregate).
 
  Now, imagine the server receives an inbound connection from another host
  whose address is A. Thus, the TCP SYN packet which S receives has source
  address A and destination address L.
 ...
  For all these reasons, I don't want to run BGP on the server.

 too many moving parts.

  Someone suggested an idea to me which seems almost too simple to work,
  but I cannot find any good reason why it would not work.
 
  The idea is the server simply sends all outbound traffic for the TCP
  connection out over the same interface over which the most recent TCP
  traffic for that connection was received.
 
  So, for example, if the server receives the SYN from router R3, it would
  send the SYN ACK and all subsequent packets for the TCP connection over
  that same interface R3.
  ...

 right idea.  works great.  see the following:

 http://www.academ.com/nanog/feb1997/multihoming.html
 http://www.irbs.net/internet/nanog/9706/0232.html
 http://gatekeeper.dec.com/pub/misc/vixie/ifdefault/

  ...
  I can think of the following problems with this approach:
 
  (Problem 1) It only works for inbound TCP connections and not for
  outbound TCP connections. For outbound TCP connections we would not know
  which router to send the first SYN packet to.

 you said above you only needed inbound.  for outbound and udp: round robin.

  ...
  My questions for the NANOG community are these:
 
  (Question 1) Can you think of any additional problems with this approach?
  Specifically, I am interested in persistent failures in the absence of
  topology changes.

 topology change frequency is in a different order of magnitude than the
 usual tcp session startup frequency, so unless you've got long running tcp
 sessions which won't restart on a connection reset, you've got no problem,
 and if you do have that kind of tcp session, you've already got problems.

  (Question 2) Is there another mechanism for the server (a multi-homed
  host) to pick a best router, short of running stub BGP? Are there any
  standards for this?

 there are a bazillion patented and/or ubersecret ways to do this.  no one
 has ever demonstrated anything that works better than an undercommitted
 network with undercommitted connections to other undercommitted first-hop
 networks.

  (Question 3) If the answer to question 2 is no, is there any interest
  in standardizing a protocol for a multi-homed host to pick a best
  next-hop router? One could think of this as a host-to-router routing
  protocol. One might call the existing routing protocols router-to-router
  protocols (because I think we are abusing them by running them on
  hosts). This is somewhat analogous to the multicast routing world where
  we use different protocols for router-to-router multicast (PIM) versus
  host-to-router (IGMP).

 sadly, this has been tried, but it always runs into least-cost routing
 issues whereby not only the predicted connection quality but also contract
 details like whether this is over or under the per-period minima and how
 many quatloos per kilosegment it will cost all have to get exchanged at
 high speed with low latency and good accuracy.  been there, did that, got
 no useful t-shirt even.
 --
 Paul Vixie




Re: Mechanisms for a multi-homed host to pick the best router

2008-09-18 Thread Cayle Spandon
Hi Laurence,

RE why would you not send the reply out the same spigot you got the request
on?

Yes, that's exactly what I was trying to ask in the e-mail (in a much more
verbose way than you :-).

The problems I could think of are:

- It only works for inbound TCP connections.

- The TCP connections might be dropped after a topology change despite the
existence of an alternative path.

I was wondering if anyone else knows of any additional problems.

-- Cayle


Re: Mechanisms for a multi-homed host to pick the best router

2008-09-18 Thread Valdis . Kletnieks
On Wed, 17 Sep 2008 22:32:29 EDT, Cayle Spandon said:

 (Problem 2) If there is a topology change after the TCP connection has been
 established, the traffic might follow a sub-optimal path.

Another possibility is that the connection was originally established *during*
a link outage, so the initial part of the connection was done over a sub-optimal
path, and that the topology change puts it back to the normal better path...

A possible *bigger* issue is that tossing the reply packet back where the
original came from makes traffic-engineering your outbound packets a lot more
challenging - you end up having to play announcement games upstream of your
N routers to engineer your *inbound* traffic so your outbound packets do what
you want.




Re: Mechanisms for a multi-homed host to pick the best router

2008-09-18 Thread William Waites


  So, for example, if the server receives the SYN from router R3, it would
  send the SYN ACK and all subsequent packets for the TCP connection over
  that same interface R3.
  ...

 right idea.  works great.  see the following:

 http://www.academ.com/nanog/feb1997/multihoming.html
 http://www.irbs.net/internet/nanog/9706/0232.html
 http://gatekeeper.dec.com/pub/misc/vixie/ifdefault/



This approach is particularly useful for hosts with multiple IPv6 tunnels.
Some tunnel providers implement strict RPF, some don't. Where this is the
case, having multiple tunnels (cf. multiple address ranges) is problematic.
Of course, these days perhaps the IPv4 variant could be done with a
stateful NAT.

Maybe a case could be made for IPv6 NAT (and site-local addresses?) in
this scenario...


-w
--
William Waites   [EMAIL PROTECTED]
http://www.irl.styx.org/  +49 30 8894 9942
CD70 0498 8AE4 36EA 1CD7  281C 427A 3F36 2130 E9F5




Mechanisms for a multi-homed host to pick the best router

2008-09-17 Thread Cayle Spandon
(My apologies, in advance, for the fact that this question is very long
winded.)

I have a server which is multi-homed to N routers as shown below:

 +---+
R1---|   |
 |   |
R2---|   |
...  | S |
 |   |
Rn---|   |
 +---+

This server is a host; it is not a router in the sense that it will never
forward any packets (but it might run routing protocols as discussed below).

Also, for the sake of simplicity in this discussion, let's say this server
only receives inbound TCP connections; it never initiates outbound TCP
connections.

Finally, this server has a loopback address L. All traffic destined to the
server uses address L as the destination address. All N routers have a
static route to L and inject that route into their routing protocol
(possibly as part of an aggregate).

Now, imagine the server receives an inbound connection from another host
whose address is A. Thus, the TCP SYN packet which S receives has source
address A and destination address L.

When the server sends TCP traffic for that same connection back to host A,
it needs to pick one of the N routers, in other words, it needs to pick an
outbound interface from its N interfaces.

Traditionally, this is done by doing a best-match lookup for address A in
the forwarding table of the server.

One could install an ECMP default route which points to all N routers. In
this case, the downstream router would essentially be picked at random (for
each connection, assuming 5-tuple hashing).
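
To make the 5-tuple hashing concrete, here is a minimal Python sketch of how a per-connection hash can map each flow to one of the N routers. It is only an illustration under stated assumptions: real ECMP implementations use their own hash functions in the forwarding plane, SHA-256 is a stand-in chosen because it ships with Python, and the function and variable names are hypothetical.

```python
import hashlib


def ecmp_next_hop(five_tuple, next_hops):
    """Map a flow's 5-tuple to one next hop; the same flow always maps the same way."""
    # Serialize (src addr, src port, dst addr, dst port, protocol) into bytes.
    key = "|".join(str(field) for field in five_tuple).encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer and reduce modulo N.
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical example values (documentation addresses).
ROUTERS = ["R1", "R2", "R3", "R4"]
FLOW = ("198.51.100.7", 40000, "192.0.2.1", 80, "tcp")

if __name__ == "__main__":
    print(ecmp_next_hop(FLOW, ROUTERS))
```

Every packet of one connection hashes to the same router, while different connections spread across the routers with no regard to which router is actually closer to the destination, which is the limitation described next.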

The problem is that some routers are better than other routers in the
sense that they are closer to the final destination address A. (For example,
each router could be connected to a different ISP.)

One way for the server to pick the optimal downstream router, is to run
stub BGP between the server and each of the routers. By stub BGP I mean
that the server uses the BGP session only to learn routes. It advertises its
own loopback L, but it never advertises any other routes, and it never
propagates any routes from one BGP session to another BGP session.

The server would have N BGP sessions and learn the full default-free BGP
route table over each of those sessions. In other words, the server would
end up with approximately N x 250,000 routes in its RIB and 250,000 routes
in its FIB.

While this approach would certainly allow the server to pick the optimal
downstream router in all cases, I would prefer not to run routing protocols
on this server for a number of reasons:

- I don't want to spend the memory and CPU on such large RIBs and FIBs.

- I'm afraid that other routers will attempt to forward traffic through the
server (due to accidental misconfigurations) once it starts participating in
the routing protocols.

- Since there might be many of these servers (many more than the number of
routers) I might end up stretching the routers beyond their scaling limits
(number of BGP sessions, link state database size, etc.) and destabilizing
the network.

- I know there are good open source implementations of routing protocols, but
still, I'm nervous that any instability or bugs on the servers could end up
screwing up the routers (e.g. persistent BGP flaps).

- One possible variation is that the server is a client of some
route-reflectors which are not in the forwarding path (i.e. next-hop-self is
not enabled). In that case, I might end up needing to do BGP next-hop
resolution for a very large number of BGP next-hops. This, in turn, implies
that the server might need to also run OSPF in a very large flat area 0.

For all these reasons, I don't want to run BGP on the server.

Someone suggested an idea to me which seems almost too simple to work, but I
cannot find any good reason why it would not work.

The idea is the server simply sends all outbound traffic for the TCP
connection out over the same interface over which the most recent TCP
traffic for that connection was received.

So, for example, if the server receives the SYN from router R3, it would
send the SYN ACK and all subsequent packets for the TCP connection over that
same interface R3.

If the inbound packets for that same TCP connection start arriving from a
different router (e.g. because of link failure), say R4, then the server
also switches the outbound packets to that same router R4.
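
The behavior described above can be sketched as a small per-connection table that remembers the interface on which the most recent packet arrived. This is a hedged illustration only: a real implementation would live in the kernel or in a policy-routing hook, and the class name, method names, and the 4-tuple used as the connection key are all hypothetical.

```python
class ReplyInterfaceTable:
    """Remember, per TCP connection, the interface of the most recent inbound packet."""

    def __init__(self):
        self._last_seen = {}  # connection 4-tuple -> interface name

    def packet_received(self, conn, interface):
        """Record that a packet for `conn` just arrived on `interface`."""
        self._last_seen[conn] = interface

    def interface_for_reply(self, conn):
        """Return the interface to send on, or None if nothing was received yet."""
        return self._last_seen.get(conn)

    def connection_closed(self, conn):
        """Forget the connection so the table does not grow without bound."""
        self._last_seen.pop(conn, None)


if __name__ == "__main__":
    table = ReplyInterfaceTable()
    # Hypothetical connection: (remote addr, remote port, local addr, local port).
    conn = ("198.51.100.7", 40000, "192.0.2.1", 80)
    table.packet_received(conn, "R3")       # SYN arrives via R3
    print(table.interface_for_reply(conn))  # reply goes out via R3
    table.packet_received(conn, "R4")       # later packets arrive via R4
    print(table.interface_for_reply(conn))  # replies follow to R4
```

A connection with no recorded inbound packet returns `None`, which corresponds to the outbound-connection case where the server has nothing to go on for the first SYN.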

I am aware that routing is not always symmetrical. In other words, I am
aware that the best route from A to Z might be A-B-C-Z while the best
route from Z to A might be Z-D-A.

However, since the IP routing tables form an inverted tree, it seems to me
that in realistic scenarios the traffic should still arrive at A (maybe over
a non-optimal path in rare cases) if Z sends the reverse traffic to C
instead of D. It seems unlikely (impossible?) that this would cause a
routing loop.

I can think of the following problems with this approach:

(Problem 1) It only works for inbound TCP connections and not for outbound
TCP connections. For outbound TCP connections we would not know which router
to send the first SYN packet 

Re: Mechanisms for a multi-homed host to pick the best router

2008-09-17 Thread Laurence F. Sheldon, Jr.

Cayle Spandon wrote:


I have a server which is multi-homed to N routers as shown below:

 +---+
R1---|   |
 |   |
R2---|   |
...  | S |
 |   |
Rn---|   |
 +---+

This server is a host; it is not a router in the sense that it will never
forward any packets (but it might run routing protocols as discussed below).


This is going to be the stupid question of the day, but unless you have
a route policy (in which case, what was the question again?) why would
you not send the reply out the same spigot you got the request on?




Re: Mechanisms for a multi-homed host to pick the best router

2008-09-17 Thread Paul Vixie
[EMAIL PROTECTED] (Cayle Spandon) writes:

 (My apologies, in advance, for the fact that this question is very long
 winded.)

np.

 I have a server which is multi-homed to N routers as shown below:

  +---+
 R1---|   |
  |   |
 R2---|   |
 ...  | S |
  |   |
 Rn---|   |
  +---+

 This server is a host; it is not a router in the sense that it will never
 forward any packets (but it might run routing protocols as discussed below).

 Also, for the sake of simplicity in this discussion, let's say this server
 only receives inbound TCP connections; it never initiates outbound TCP
 connections.

 Finally, this server has a loopback address L. All traffic destined to the
 server uses address L as the destination address. All N routers have a
 static route to L and inject that route into their routing protocol
 (possibly as part of an aggregate).

 Now, imagine the server receives an inbound connection from another host
 whose address is A. Thus, the TCP SYN packet which S receives has source
 address A and destination address L.
...
 For all these reasons, I don't want to run BGP on the server.

too many moving parts.

 Someone suggested an idea to me which seems almost too simple to work, but I
 cannot find any good reason why it would not work.

 The idea is the server simply sends all outbound traffic for the TCP
 connection out over the same interface over which the most recent TCP
 traffic for that connection was received.

 So, for example, if the server receives the SYN from router R3, it would
 send the SYN ACK and all subsequent packets for the TCP connection over that
 same interface R3.
 ...

right idea.  works great.  see the following:

http://www.academ.com/nanog/feb1997/multihoming.html
http://www.irbs.net/internet/nanog/9706/0232.html
http://gatekeeper.dec.com/pub/misc/vixie/ifdefault/

 ...
 I can think of the following problems with this approach:

 (Problem 1) It only works for inbound TCP connections and not for outbound
 TCP connections. For outbound TCP connections we would not know which router
 to send the first SYN packet to.

you said above you only needed inbound.  for outbound and udp: round robin.
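
For the outbound and UDP case, round robin can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular implementation; the class and method names are hypothetical.

```python
from itertools import cycle


class RoundRobinSelector:
    """Hand out routers in a fixed rotation, one per new outbound flow."""

    def __init__(self, routers):
        routers = list(routers)
        if not routers:
            raise ValueError("at least one router is required")
        self._rotation = cycle(routers)

    def next_router(self):
        """Return the next router in the rotation."""
        return next(self._rotation)


if __name__ == "__main__":
    selector = RoundRobinSelector(["R1", "R2", "R3"])
    print([selector.next_router() for _ in range(5)])
```

Successive calls walk through the routers in order and wrap around, so new outbound flows are spread evenly without any knowledge of path quality.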

 ...
 My questions for the NANOG community are these:

 (Question 1) Can you think of any additional problems with this approach?
 Specifically, I am interested in persistent failures in the absence of
 topology changes.

topology change frequency is in a different order of magnitude than the
usual tcp session startup frequency, so unless you've got long running tcp
sessions which won't restart on a connection reset, you've got no problem,
and if you do have that kind of tcp session, you've already got problems.

 (Question 2) Is there another mechanism for the server (a multi-homed host)
 to pick a best router, short of running stub BGP? Are there any standards
 for this?

there are a bazillion patented and/or ubersecret ways to do this.  no one has
ever demonstrated anything that works better than an undercommitted network
with undercommitted connections to other undercommitted first-hop networks.

 (Question 3) If the answer to question 2 is no, is there any interest
 in standardizing a protocol for a multi-homed host to pick a best
 next-hop router? One could think of this as a host-to-router routing
 protocol. One might call the existing routing protocols router-to-router
 protocols (because I think we are abusing them by running them on
 hosts). This is somewhat analogous to the multicast routing world where
 we use different protocols for router-to-router multicast (PIM) versus
 host-to-router (IGMP).

sadly, this has been tried, but it always runs into least-cost routing issues
whereby not only the predicted connection quality but also contract details
like whether this is over or under the per-period minima and how many quatloos
per kilosegment it will cost all have to get exchanged at high speed with low
latency and good accuracy.  been there, did that, got no useful t-shirt even.
-- 
Paul Vixie