RE: routing policy based on u32 classifier

2007-12-10 Thread Brian S Julin


Marco wrote:

 Hello everybody.
 Kindly, I would like to know if there is any plan to add this feature to a 
 future kernel release.
 I know that fwmark is able to do this, but there is the limitation in source 
 ip address selection.

Could you explain the limitation?  My iptables manpage seems to suggest
that u32 is pretty general.  Are you just asking if the pom-ng ipt_u32
will be mainlined?

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] [PATCH] easier PBR for dynamic source tables (via multipath)

2007-12-07 Thread Brian S Julin

This is a first swat and not in final form.  I hope folks here will help vet
my thinking on it.

This fills in a missed niche in policy routing support.  It allows multipath
routes to select nexthop based on the source realm, inside the routing
decision step, immediately after RPF is performed.  It moves RPF before
multipath selection.
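
For illustration only, the selection step described above can be modelled
in userspace as below.  This is not code from the patch: struct nh_model,
NH_F_BYREALM and select_by_realm() are hypothetical names, and src_realm
is assumed to be a plain realm number standing in for the tag that RPF
reports for the packet's source address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit value as RTNH_F_DSAD in the patch; the name is illustrative. */
#define NH_F_BYREALM 8u

/* Hypothetical userspace stand-in for a multipath nexthop. */
struct nh_model {
	unsigned int flags;
	uint32_t weight;	/* holds the realm when NH_F_BYREALM is set */
	const char *gw;		/* gateway, for display only */
};

/*
 * Return the index of the first by-realm nexthop whose realm equals
 * src_realm, or -1 when none matches.
 */
static int select_by_realm(const struct nh_model *nh, size_t n,
			   uint32_t src_realm)
{
	assert(nh != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if ((nh[i].flags & NH_F_BYREALM) && nh[i].weight == src_realm)
			return (int)i;
	}
	return -1;
}
```

Because the realm is already known once RPF has run, the choice is a
single pass over the nexthops of the matched route, with no second table
lookup on the source address.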

This would be for people wanting to do policy routing based on a table
injected by a dynamic routing protocol, e.g. quagga, rather than static rules.

The existing methods for achieving this effect are all a bit tacky for
various reasons:

1) iptables -m realm --realm X -j ROUTE in FORWARD,mangle
because ipt_ROUTE is not a well-supported iptables target
and has started to get dropped from mainstream distros.  Maybe
that is for lack of maintenance, but perhaps it is intentionally
deprecated. (?)

2) tc route from ... action mirred egress redirect
because it happens too late in the packet processing to do much
else to the packet, such as editing the MAC addresses, which remain
what they were on the original output dev.  Doing this is really an
abuse of the queueing system and involves setting up qdiscs in
weird ways when one may only want to route.

3) Userspace scripts that load entries from a kernel routing table
into a pre-routing ipset, then iptables -j MARK, then ip rule add fwmark
because the kernel then has to check the source address against two
tables rather than one, and they could get quite large.  Plus it's
hackery.

This patch is a raw proof-of-concept I put together to get
things working just enough to ensure that nothing blows up when
packets are routed this way.  As such it does a couple of distasteful
things and has a couple rough edges:

  Reuses the nh_weight field as the realm
  Does not allow normal load balancing to fully mix in
  ipv4 only
  forward only, no code for local/output route.
  probably will break ifndef CONFIG_NET_CLS_ROUTE

Were this general idea to be deemed worthy, and as long as limiting
sizeof(struct fib_nh) is not a major concern to any Linux routing
application, I could work up a more thorough/cleaner patch allowing
statistical multipath and SAD policy-routing multipath to play nicely
together.

Especially needing comments on proper multipath RPF: The mainline code
only checks the selected path and if RPF fails it does not choose a
different one.  From this I assumed it is OK to do RPF on any old nexthop,
and we just assume the user won't or can't put any PR rule in that would gum
that up.  Otherwise both the mainline code and this code would have to
RPF multiple times, defeating the goal of good performance.  (Not to
mention that could get extra confusing when you are using the source
realm to choose.)  Special attention is needed on the spec_dst handling,
which should be (?) OK since this is forward-only.

Also to consider is what this means to multipath caching should that
make a comeback.

I've only tested this code lightly so far, just bouncing things around
to static arp maps on the same interface.

After patching iproute2, just substitute weight X with byrealm X to
activate it.  Probably you want to avoid realm 0.  You should be able to
put catch-all nexthops in with weight X alongside the byrealm ones
but they do not interact statistically.  Comments on that syntax
also welcome.
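
As a rough userspace model of how byrealm and plain weight nexthops could
coexist (not code from the patch): by-realm entries are tried first, and
only when none matches does a weighted choice run over the catch-all
entries, so the two groups never mix statistically.  The names nh_model,
NH_F_BYREALM and select_nexthop() are hypothetical, the ticket argument
replaces the kernel's own balancing state to keep the example
deterministic, and treating realm 0 as "no realm" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NH_F_BYREALM 8u	/* same bit value as RTNH_F_DSAD in the patch */

struct nh_model {
	unsigned int flags;
	uint32_t weight;	/* realm if NH_F_BYREALM is set, else a weight */
};

/*
 * Pick a nexthop index.  Returns -1 if no by-realm entry matches and
 * there is no catch-all entry with a non-zero weight.
 */
static int select_nexthop(const struct nh_model *nh, size_t n,
			  uint32_t src_realm, uint32_t ticket)
{
	uint64_t total = 0;
	size_t i;

	assert(nh != NULL || n == 0);

	/* Pass 1: exact realm match; realm 0 is treated as "no realm". */
	if (src_realm != 0) {
		for (i = 0; i < n; i++)
			if ((nh[i].flags & NH_F_BYREALM) &&
			    nh[i].weight == src_realm)
				return (int)i;
	}

	/* Pass 2: weighted choice among catch-all nexthops only. */
	for (i = 0; i < n; i++)
		if (!(nh[i].flags & NH_F_BYREALM))
			total += nh[i].weight;
	if (total == 0)
		return -1;

	uint64_t w = ticket % total;
	for (i = 0; i < n; i++) {
		if (nh[i].flags & NH_F_BYREALM)
			continue;
		if (w < nh[i].weight)
			return (int)i;
		w -= nh[i].weight;
	}
	return -1;	/* not reached when total > 0 */
}
```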

Sorry about the attachments, no real MUAs available here that won't
corrupt tabs.
diff -r -U2 linux-source-2.6.23-dsc/include/linux/rtnetlink.h linux-source-2.6.23-dsc-dsad/include/linux/rtnetlink.h
--- linux-source-2.6.23-dsc/include/linux/rtnetlink.h   2007-10-09 16:31:38.0 -0400
+++ linux-source-2.6.23-dsc-dsad/include/linux/rtnetlink.h  2007-12-06 20:23:25.0 -0500
@@ -294,4 +294,5 @@
 #define RTNH_F_PERVASIVE   2   /* Do recursive gateway lookup  */
 #define RTNH_F_ONLINK  4   /* Gateway is forced on link*/
+#define RTNH_F_DSAD    8   /* Dynamic PBR (weight = source realm)  */
 
 /* Macros to handle hexthops */
diff -r -U2 linux-source-2.6.23-dsc/include/net/ip_fib.h linux-source-2.6.23-dsc-dsad/include/net/ip_fib.h
--- linux-source-2.6.23-dsc/include/net/ip_fib.h    2007-10-09 16:31:38.0 -0400
+++ linux-source-2.6.23-dsc-dsad/include/net/ip_fib.h   2007-12-06 20:23:25.0 -0500
@@ -202,5 +202,6 @@
 extern int fib_validate_source(__be32 src, __be32 dst, u8 tos, int oif,
   struct net_device *dev, __be32 *spec_dst, u32 *itag);
-extern void fib_select_multipath(const struct flowi *flp, struct fib_result *res);
+extern void fib_select_multipath(const struct flowi *flp, 
+  struct fib_result *res, u32 itag);
 
 struct rtentry;
diff -r -U2 linux-source-2.6.23-dsc/net/ipv4/fib_semantics.c linux-source-2.6.23-dsc-dsad/net/ipv4/fib_semantics.c
--- linux-source-2.6.23-dsc/net/ipv4/fib_semantics.c    2007-10-09 16:31:38.0 -0400
+++ linux-source-2.6.23-dsc-dsad/net/ipv4/fib_semantics.c   2007-12-07 14:36:10.0 -0500
@@ -1164,5 +1164,6 @@
  */