Re: [PATCH]: Fix ipv6 round-robin locking

2007-03-28 Thread David Miller
From: YOSHIFUJI Hideaki / 吉藤英明 <[EMAIL PROTECTED]>
Date: Wed, 28 Mar 2007 18:16:42 +0900 (JST)

> I hoped we could save some memory per fib6_node,
> but I'm fine with it.

I know, I did not want to add it either :(

Speaking of which, several of the potential fixes for the rt6_probe()
deadlock require adding even more things to the fib6_node (a linked
list which some workqueue or similar can run, or a timer, etc.).

So, I'm trying to figure out a way to get the rt6_probe() to run
outside of the per-table rwlock without adding more state to
fib6_node.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: Fix ipv6 round-robin locking

2007-03-28 Thread YOSHIFUJI Hideaki / 吉藤英明
Hello.

In article <[EMAIL PROTECTED]> (at Sat, 24 Mar 2007 12:44:36 -0700 (PDT)), 
David Miller <[EMAIL PROTECTED]> says:

> The fix for the most serious of them is below, and I'd appreciate
> any feedback if people spot any problems or holes in that approach.

I hoped we could save some memory per fib6_node,
but I'm fine with it.

Regards,

--yoshfuji
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: Fix ipv6 round-robin locking

2007-03-24 Thread David Miller
From: Thomas Graf <[EMAIL PROTECTED]>
Date: Sun, 25 Mar 2007 00:40:30 +0100

> Seondly, I'm not sure I've fully understood why this round robin mechanism
> is needed to ensure as many routers as possible are probed as soon as
> possible.
 ...
> I'm just making sure we know what we're doing :-)

Ignore rt6_probe(), pretend that it's disabled.

It's the neigh probe that occurs when we _return_ and
_use_ the route for something that elicits the probe
we are interested in.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH]: Fix ipv6 round-robin locking

2007-03-24 Thread Thomas Graf
* David Miller <[EMAIL PROTECTED]> 2007-03-24 12:44
> As per RFC2461, section 6.3.6, item #2, when no routers on the
> matching list are known to be reachable or probably reachable we
> do round robin on those available routes so that we make sure
> to probe as many of them as possible to detect when one becomes
> reachable faster.
> 
> +static struct rt6_info *find_match(struct rt6_info *rt, int oif, int strict,
> +int *mpri, struct rt6_info *match)
>  {
> + int m;
> +
> + if (rt6_check_expired(rt))
> + goto out;
> +
> + m = rt6_score_route(rt, oif, strict);
> + if (m < 0)
> + goto out;
> +
> + if (m > *mpri) {
> + if (strict & RT6_LOOKUP_F_REACHABLE)
> + rt6_probe(match);
> + *mpri = m;
> + match = rt;
> + } else if (strict & RT6_LOOKUP_F_REACHABLE) {
> + rt6_probe(rt);
> + }
> +
> +out:
> + return match;
> +}

First of all, I haven't spotted any errors in your patch.

Seondly, I'm not sure I've fully understood why this round robin mechanism
is needed to ensure as many routers as possible are probed as soon as
possible. I'd understand it if we'd only probe just one router per selection
process but to my undertanding we try and probe _all_ routers as long as
its score is positive (and rate limitation doesn't step in). The score can
only be negative  if rt6_check_dev() returns 0, the neighbour state is
NUD_FAILED or not present at all. I don't see how the position in the list
would have any affect on this. rt6_select() is called for all routers
regardless of its score, the position of a router doesn't matter.

The only effect of this code I can see is that we return a different
_probably_ reachable router while no router is reachable. This would implement
what item #1 of 6.3.6 proposes.

I'm just making sure we know what we're doing :-)
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH]: Fix ipv6 round-robin locking

2007-03-24 Thread David Miller

There are many, many, many severe SMP locking bugs in the ipv6
advanced routing selection code.  I wish the author had thought about
these issues more carefully when writing that code because I'm now
stuck here fixing all of this :-/

The fix for the most serious of them is below, and I'd appreciate
any feedback if people spot any problems or holes in that approach.
This will fix bugs that look like an OOPS in fib6_add(), essentially
rt6_select() corrupts the list by making fn->leaf NULL, this is an
illegal state and will cause a crash because the fib6_add() code
(correctly) assumes fn->leaf can never be NULL.

The other bug is that rt6_probe() executes with the per-routing-table
lock held as a reader, but when NDISC is protected by tunneled IPSEC,
this can call right back into the routing code and if that tunneled
IPSEC route lookup COWs a route it will try to take the rwlock as a
writer and deadlock.

There are probably other bugs of this nature lurking in this new code.

commit 4c68db63b8314df3cf30b7fe595a1b8935bb2cb0
Author: David S. Miller <[EMAIL PROTECTED]>
Date:   Sat Mar 24 12:06:32 2007 -0700

[IPV6]: Fix routing round-robin locking.

As per RFC2461, section 6.3.6, item #2, when no routers on the
matching list are known to be reachable or probably reachable we
do round robin on those available routes so that we make sure
to probe as many of them as possible to detect when one becomes
reachable faster.

Each routing table has a rwlock protecting the tree and the linked
list of routes at each leaf.  The round robin code executes during
lookup and thus with the rwlock taken as a reader.  A small local
spinlock tries to provide protection but this does not work at all
for two reasons:

1) The round-robin list manipulation, as coded, goes like this (with
   read lock held):

walk routes finding head and tail

spin_lock();
rotate list using head and tail
spin_unlock();

   While one thread is rotating the list, another thread can
   end up with stale values of head and tail and then proceed
   to corrupt the list when it gets the lock.  This ends up causing
   the OOPS in fib6_add() later onthat many people have been hitting.

2) All the other code paths that run with the rwlock held as
   a reader do not expect the list to change on them, they
   expect it to remain completely fixed while they hold the
   lock in that way.

So, simply stated, it is impossible to implement this correctly using
a manipulation of the list without violating the rwlock locking
semantics.

Reimplement using a per-fib6_node round-robin pointer.  This way we
don't need to manipulate the list at all, and since the round-robin
pointer can only ever point to real existing entries we don't need
to perform any locking on the changing of the round-robin pointer
itself.  We only need to reset the round-robin pointer to NULL when
the entry it is pointing to is removed.

The idea is from Thomas Graf and it is very similar to how this
was implemented before the advanced router selection code when in.

Signed-off-by: David S. Miller <[EMAIL PROTECTED]>

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 9eda572..cf355a3 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -58,6 +58,7 @@ struct fib6_node
__u16   fn_bit; /* bit key */
__u16   fn_flags;
__u32   fn_sernum;
+   struct rt6_info *rr_ptr;
 };
 
 #ifndef CONFIG_IPV6_SUBTREES
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index f4d7be7..c46f909 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1109,6 +1109,10 @@ static void fib6_del_route(struct fib6_node *fn, struct 
rt6_info **rtp,
rt6_stats.fib_rt_entries--;
rt6_stats.fib_discarded_routes++;
 
+   /* Reset round-robin state, if necessary */
+   if (fn->rr_ptr == rt)
+   fn->rr_ptr = NULL;
+
/* Adjust walkers */
read_lock(&fib6_walker_lock);
FOR_WALKERS(w) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index a6b3117..3931b33 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -363,55 +363,76 @@ static int rt6_score_route(struct rt6_info *rt, int oif,
return m;
 }
 
-static struct rt6_info *rt6_select(struct rt6_info **head, int oif,
-  int strict)
+static struct rt6_info *find_match(struct rt6_info *rt, int oif, int strict,
+  int *mpri, struct rt6_info *match)
 {
-   struct rt6_info *match = NULL, *last = NULL;
-   struct rt6_info *rt, *rt0 = *head;
-   u32 metric;
+   int m;
+
+   if (rt6_check_expired(rt))
+   goto out;
+
+   m = rt6_score_route(rt, oif, strict);
+   if (m < 0)
+   goto out;
+
+