jsing@ and matthew@ reported a problem with /32 cloning routes, used at
least on Google Compute Engine.
The problem is that when I have the following routing table, no route
can be clone from 10.128.0.1/32:
10.128.0.1/32 10.128.0.3 UCS 1 0 - 8 vio0
10.128.0.3 70:5f:ca:21:8d:70 UHLl 0 0 - 1 vio0
10.128.0.3/32 10.128.0.3 UCn 0 0 - 4 vio0
The reason is that cloned route inherit the priority of their parent. So
when the parent is a /32, the kernel will try to insert a cloned route with
the same destination and priority. This is a multipath case.
However if we use the RTF_MPATH here we still end up with broken ARP.
Because the ARP lookup is built on top of the route lookup, we want to
ensure that no RTF_CLONING route is found.
Hence the fix below that use a higher priority for cloned routes. With it
the table now looks like:
10.128.0.1 link#1 UHLc 0 1 - 7 vio0
10.128.0.1/32 10.128.0.3 UCS 1 0 - 8 vio0
10.128.0.3 70:5f:ca:21:8d:70 UHLl 0 0 - 1 vio0
10.128.0.3/32 10.128.0.3 UCn 0 0 - 4 vio0
Since priority 1 is reserved, it is safe to use priority of the cloning
route minus one.
ok?
Index: net/route.c
===================================================================
RCS file: /cvs/src/sys/net/route.c,v
retrieving revision 1.342
diff -u -p -r1.342 route.c
--- net/route.c 4 Dec 2016 09:46:39 -0000 1.342
+++ net/route.c 13 Dec 2016 08:46:33 -0000
@@ -252,8 +252,15 @@ rt_match(struct sockaddr *dst, uint32_t
info.rti_info[RTAX_DST] = dst;
KERNEL_LOCK();
- error = rtrequest(RTM_RESOLVE, &info, RTP_DEFAULT,
- &rt, tableid);
+ /*
+ * The priority of cloned route should be different
+ * to avoid conflict with /32 cloning routes.
+ *
+ * It should also be higher to let the ARP layer find
+ * cloned routes instead of the cloning one.
+ */
+ error = rtrequest(RTM_RESOLVE, &info,
+ rt->rt_priority - 1, &rt, tableid);
if (error) {
rt_missmsg(RTM_MISS, &info, 0, RTP_NONE, 0,
error, tableid);
@@ -1046,15 +1053,14 @@ rtrequest(int req, struct rt_addrinfo *i
ifa->ifa_refcnt++;
rt->rt_ifa = ifa;
rt->rt_ifidx = ifp->if_index;
- if (rt->rt_flags & RTF_CLONED) {
- /*
- * Copy both metrics and a back pointer to the cloned
- * route's parent.
- */
- rt->rt_rmx = (*ret_nrt)->rt_rmx; /* copy metrics */
- rt->rt_priority = (*ret_nrt)->rt_priority;
- rt->rt_parent = *ret_nrt; /* Back ptr. to
parent. */
- rtref(rt->rt_parent);
+ /*
+ * Copy metrics and a back pointer from the cloned
+ * route's parent.
+ */
+ if (ISSET(rt->rt_flags, RTF_CLONED)) {
+ rtref(*ret_nrt);
+ rt->rt_parent = *ret_nrt;
+ rt->rt_rmx = (*ret_nrt)->rt_rmx;
}
/*