Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread David Ahern
On 1/30/17 4:56 PM, Stephen Hemminger wrote:
>>
>> As I mentioned quagga does not work with IPv6 multipath as is today. 
>>
>> I just looked at bird. IPv6 mpath support was added in Sept. 2016. It 
>> specifically hard codes not accepting RTA_MULTIPATH for IPv6 which I think 
>> is an odd choice and clearly coding to quirks as opposed to rtnetlink 
>> design. Having never looked at bird code I was able to get it working in < 1 
>> hour. I will contact the patch author about that limitation. That said, the 
>> bird implementation needs work when you look at the 
>> add/delete/replace/append permutations, so the current code has its problems 
>> as well.
> 
> Also what if quagga was fixed but  had to work with existing enterprise 
> distros?
> 

I am testing the patches; it will work on both.




Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread Stephen Hemminger
On Mon, 30 Jan 2017 16:53:44 -0700
David Ahern  wrote:

> On 1/30/17 2:16 PM, Stephen Hemminger wrote:
> > My fear is that routing daemons already adapt to the funny semantics of 
> > multi-path routing in IPv4 vs IPv6
> > and therefore any change in semantics or flags risks breaking existing user 
> > space.  
> 
> That is a possibility, but so far the 2 open source code bases I know of have 
> problems with IPv6 mpath.

Breaking closed source is not acceptable either.

> 
> As I mentioned quagga does not work with IPv6 multipath as is today. 
> 
> I just looked at bird. IPv6 mpath support was added in Sept. 2016. It 
> specifically hard codes not accepting RTA_MULTIPATH for IPv6 which I think is 
> an odd choice and clearly coding to quirks as opposed to rtnetlink design. 
> Having never looked at bird code I was able to get it working in < 1 hour. I 
> will contact the patch author about that limitation. That said, the bird 
> implementation needs work when you look at the add/delete/replace/append 
> permutations, so the current code has its problems as well.

Also what if quagga was fixed but  had to work with existing enterprise distros?


Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread David Ahern
On 1/30/17 2:16 PM, Stephen Hemminger wrote:
> My fear is that routing daemons already adapt to the funny semantics of 
> multi-path routing in IPv4 vs IPv6
> and therefore any change in semantics or flags risks breaking existing user 
> space.

That is a possibility, but so far the 2 open source code bases I know of have 
problems with IPv6 mpath.

As I mentioned quagga does not work with IPv6 multipath as is today. 

I just looked at bird. IPv6 mpath support was added in Sept. 2016. It 
specifically hard codes not accepting RTA_MULTIPATH for IPv6 which I think is 
an odd choice and clearly coding to quirks as opposed to rtnetlink design. 
Having never looked at bird code I was able to get it working in < 1 hour. I 
will contact the patch author about that limitation. That said, the bird 
implementation needs work when you look at the add/delete/replace/append 
permutations, so the current code has its problems as well.




Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread Stephen Hemminger
On Mon, 30 Jan 2017 10:45:57 -0800
Roopa Prabhu  wrote:

> On 1/30/17, 8:12 AM, David Ahern wrote:
> > On 1/30/17 8:49 AM, Roopa Prabhu wrote:  
> >>> Single next hop delete will be around because IPv6 allows it -- and 
> >>> because IPv4 needs to support it.
> >>>  
> >> understand single next hop delete for ipv6 will be around..and my only 
> >> point was to leave it around but not optimize for that case.
> >> I don't think we should support single nexthop delete in ipv4 (I have not 
> >> seen a requirement for that)... ipv4 is good as it is right now.
> >> the additional complexity is not needed.
> >>  
> > IPv4 has a known bug -- delete a virtual interface in a multihop route and 
> > the entire route is deleted, including the nexthops for other devices. This 
> > does not happen for IPv6.
> >
> > Simple example of that bug:
> >
> > ip li add dummy1 type dummy
> > ip li add dummy2 type dummy
> > ip addr add dev dummy1 10.11.1.1/28
> > ip li set dummy1 up
> > ip addr add dev dummy2 10.11.2.1/28
> > ip li set dummy2 up
> > ip ro add 1.1.1.0/24 nexthop via 10.11.1.2 nexthop via 10.11.2.2
> > ip li del dummy2
> >  
> > --> the entire multipath route has been deleted.  
> >
> >
> > And, fixing this bug enables work to make IPv4 append to be sane -- 
> > appending a route should modify an existing route by adding the nexthop, 
> > not adding a new route that I believe can never actually be hit.
> >
> > Both cases mean modifying an IPv4 route -- adding or removing nexthops -- a 
> > capability that IPv6 allows so fixing this means closing another difference 
> > between the stacks.  
> 
> good point on the bug you point out. agree the bug needs to be fixed ...but 
> routing daemons react to this behavior pretty well...because they get a link 
> notification and a route notification. I was ok with fixing ipv6 to be 
> similar to ipv4...but I am not in favor of bringing in design choices that 
> ipv6 made into ipv4 :).
> In all cases, in my experience with routes, the update of ecmp route as a 
> whole has always been ok (at-least not until now...maybe in the future
> for new usecases)
> 
> In the case of the bug you point out, can the user be notified of the 
> implicit update of the route in the kernel ...via replace flag or something ?.
> regarding append..., ipv4 never really supported appending to an existing 
> route..even in the case of a non-ecmp routes.
> append just dictates the order where the route is added IIRC  (i maybe 
> mistaken here..its been long i tried it).

My fear is that routing daemons already adapt to the funny semantics of 
multi-path routing in IPv4 vs IPv6
and therefore any change in semantics or flags risks breaking existing user 
space.


Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread Roopa Prabhu
On 1/30/17, 8:12 AM, David Ahern wrote:
> On 1/30/17 8:49 AM, Roopa Prabhu wrote:
>>> Single next hop delete will be around because IPv6 allows it -- and because 
>>> IPv4 needs to support it.
>>>
>> understand single next hop delete for ipv6 will be around..and my only point 
>> was to leave it around but not optimize for that case.
>> I don't think we should support single nexthop delete in ipv4 (I have not 
>> seen a requirement for that)... ipv4 is good as it is right now.
>> the additional complexity is not needed.
>>
> IPv4 has a known bug -- delete a virtual interface in a multihop route and 
> the entire route is deleted, including the nexthops for other devices. This 
> does not happen for IPv6.
>
> Simple example of that bug:
>
> ip li add dummy1 type dummy
> ip li add dummy2 type dummy
> ip addr add dev dummy1 10.11.1.1/28
> ip li set dummy1 up
> ip addr add dev dummy2 10.11.2.1/28
> ip li set dummy2 up
> ip ro add 1.1.1.0/24 nexthop via 10.11.1.2 nexthop via 10.11.2.2
> ip li del dummy2
>
> --> the entire multipath route has been deleted.
>
>
> And, fixing this bug enables work to make IPv4 append to be sane -- appending 
> a route should modify an existing route by adding the nexthop, not adding a 
> new route that I believe can never actually be hit.
>
> Both cases mean modifying an IPv4 route -- adding or removing nexthops -- a 
> capability that IPv6 allows so fixing this means closing another difference 
> between the stacks.

good point on the bug you point out. agree the bug needs to be fixed ...but 
routing daemons react to this behavior pretty well...because they get a link 
notification and a route notification. I was ok with fixing ipv6 to be similar 
to ipv4...but I am not in favor of bringing in design choices that ipv6 made 
into ipv4 :).
In all cases, in my experience with routes, the update of ecmp route as a whole 
has always been ok (at-least not until now...maybe in the future
for new usecases)

In the case of the bug you point out, can the user be notified of the implicit 
update of the route in the kernel ...via replace flag or something ?.
regarding append..., ipv4 never really supported appending to an existing 
route..even in the case of a non-ecmp routes.
append just dictates the order where the route is added IIRC  (i maybe mistaken 
here..its been long i tried it).


Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread Nicolas Dichtel
Le 30/01/2017 à 16:23, David Ahern a écrit :
> On 1/30/17 4:07 AM, Nicolas Dichtel wrote:
>> Le 29/01/2017 à 19:02, David Ahern a écrit :
>> [snip]
>>> Data centers are moving to L3, and multipath is a big part of that. Anyone 
>>> who looks at ip -6 route enough knows it gets painful mentally pulling the 
>>> individual routes into a single one.
>> I agree, but it's only an iproute2 problem. iproute2 could group routes to 
>> have
>> a better output, there is no need to have a kernel patch for this ;-)
>>
> 
> iproute2 is not the only rtnetlink user. The comment above uses ip show as an 
> example. libnl has a workaround for IPv6 to update route objects versus 
> replacing them - unnecessary complexity that does not need to replicated to 
> iproute2, Quagga/FRR or python libraries implementing rtnetlink. Really, 
> RTA_MULTIPATH support in notifications should have been added when multipath 
> support was added to the IPv6. Patch 3 is mostly a refactoring of 
> rt6_fill_node to fill in nexthop information. This could have been done 4+ 
> years ago when RTA_MULTIPATH route adds was added to the stack.
> 
Like I said, I fully agree that RTA_MULTIPATH is better for the dump. For the
notifications, I'm not convinced. I did not this 4 years ago on purpose ;-)

I don't think that ipv4 is the right reference because the implementation is
really different. ipv6 is more flexible and this implies differences in the
notifications.


Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread Nicolas Dichtel
Le 30/01/2017 à 16:45, Roopa Prabhu a écrit :
> On 1/30/17, 3:08 AM, Nicolas Dichtel wrote:
>> Le 30/01/2017 à 00:55, Roopa Prabhu a écrit :
>>> On 1/29/17, 10:02 AM, David Ahern wrote:
 On 1/28/17 6:00 PM, Roopa Prabhu wrote:
>> 4. Route Appends
>>- IPv6 allows nexthops to be appended to an existing route. In this
>>  case one notification is sent per nexthop added
> thanks for listing all of these...I think you mentioned this case to me..
> but I don't remember now why this notification is
> sent per nexthop added. This is an update to an existing multipath route.
> so seems like the notification should be a RTM_NEWROUTE with the full 
> RTA_MULTIPATH route
> (similar to route add)
 It could be; it's a question of what should userspace get -- the full 
 route or the change? Append to me suggests the latter - userspace is told 
 what changed. It is simpler kernel code wise to send the full new route. 
 The append changes were done after our conversation. ;-)
>>> ok, yeah. you listing all the cases here made it more simpler to understand 
>>> in the context of other notifications :). I would prefer all
>>> RTM_NEWLINK notifications (ie new add or update to an existing 
>>> route..replace/append), contain the full route via RTA_MULTIPATH.
>> I don't agree. With the previous proposal, you know *exactly* what happens 
>> with
>> each notification and this is the primary goal of the notifications. With the
>> last proposal, where RTA_MULTIPATH is used for replace and append, you have 
>> the
>> new result, but you don't know what has been done.
>> Usually, notifications are used to notify an event, not the result of an 
>> event.
>> If you want the result, you can use the dump cmd.
> 
> what has been done is conveyed by the APPEND and REPLACE flag in the 
> notification. The user only cares about the updated multipath route...
APPEND with a full route is bit ambiguous ;-)

> giving parts of the route has never been very useful... and this is 
> consistent with ipv4.
> 
I don't agree, the user cares about what happens. The dump is here to have the
result if you don't have it already (if the user has a cache, like the libnl,
it's easy to have the result). Daemons that listen nl usually start by a dump
and after they apply the change when they get notifications (they already know
the previous state).

And even if I agree that it's better to be consistent with ipv4, saying "it's
like this in ipv4" is not an argument to made the behavior good/better.
The need to fully remove an ecmp route to add a nexthop in ipv4 is really
painful (see David's example about the 'interface removal' case). And this
functional difference also impacts the notifications.


Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread David Ahern
On 1/30/17 8:49 AM, Roopa Prabhu wrote:
>> Single next hop delete will be around because IPv6 allows it -- and because 
>> IPv4 needs to support it.
>>
> understand single next hop delete for ipv6 will be around..and my only point 
> was to leave it around but not optimize for that case.
> I don't think we should support single nexthop delete in ipv4 (I have not 
> seen a requirement for that)... ipv4 is good as it is right now.
> the additional complexity is not needed.
> 

IPv4 has a known bug -- delete a virtual interface in a multihop route and the 
entire route is deleted, including the nexthops for other devices. This does 
not happen for IPv6.

Simple example of that bug:

ip li add dummy1 type dummy
ip li add dummy2 type dummy
ip addr add dev dummy1 10.11.1.1/28
ip li set dummy1 up
ip addr add dev dummy2 10.11.2.1/28
ip li set dummy2 up
ip ro add 1.1.1.0/24 nexthop via 10.11.1.2 nexthop via 10.11.2.2
ip li del dummy2

--> the entire multipath route has been deleted.


And, fixing this bug enables work to make IPv4 append to be sane -- appending a 
route should modify an existing route by adding the nexthop, not adding a new 
route that I believe can never actually be hit.

Both cases mean modifying an IPv4 route -- adding or removing nexthops -- a 
capability that IPv6 allows so fixing this means closing another difference 
between the stacks.


Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread David Ahern
On 1/30/17 4:13 AM, Nicolas Dichtel wrote:
> Le 30/01/2017 à 03:57, David Ahern a écrit :
>> On 1/29/17 7:20 PM, Roopa Prabhu wrote:
 2. Delete - 1 notification for each hop for all combinations of delete 
 commands
>>>
>>> here I was trying to say, for people deleting the full multipath route, you 
>>> should send a RTA_MULTIPATH.
>>
>> The only way to do that is to call inet6_rt_notify() *before* the route 
>> delete work has been done in fib6_del_route. Right now it is called at the 
>> end - after all sibling associations have been dropped, a time in which it 
>> can no longer fill in RTA_MULTIPATH.
>>
>> fib6_del_route function does not have any failure paths and is called with 
>> the write lock held, so moving the notification might be ok. But it also 
>> means ignoring all subsequent failures from fib6_del, again which might be 
>> ok since the only failure is ENOENT which can not happen if we are walking 
>> the sibling list.
> Is it not possible to prepare the RTA_MULTIPATH attribute during the deletion
> (when needed information are still available) and to use/copy it later when 
> the
> notification is done?

What I outlined above has the least complexity and does not require copies. And 
it is only proper for the 1 case where a multipath route is deleted by prefix 
and length -- the ability enabled by Patch 2.

> 
> Or at least, having RTA_MUTLTIPATH with one nexthop only, so that the user 
> knows
> that this route was part of an ECMP routes?
> 

For the ip6_route_multipath_del path, sure the nexthops deleted can be wrapped 
in RTA_MULTIPATH if the nexthop indeed has siblings, but does it help 
userspace? If all attributes outside of the nexthops are identical, then 
userspace would match the route without it.


Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread Roopa Prabhu
On 1/29/17, 6:57 PM, David Ahern wrote:
> On 1/29/17 7:20 PM, Roopa Prabhu wrote:
>>> 2. Delete - 1 notification for each hop for all combinations of delete 
>>> commands
>> here I was trying to say, for people deleting the full multipath route, you 
>> should send a RTA_MULTIPATH.
> The only way to do that is to call inet6_rt_notify() *before* the route 
> delete work has been done in fib6_del_route. Right now it is called at the 
> end - after all sibling associations have been dropped, a time in which it 
> can no longer fill in RTA_MULTIPATH.
>
> fib6_del_route function does not have any failure paths and is called with 
> the write lock held, so moving the notification might be ok. But it also 
> means ignoring all subsequent failures from fib6_del, again which might be ok 
> since the only failure is ENOENT which can not happen if we are walking the 
> sibling list.
>
>
>> For the odd case of ipv6 single nexthop delete, you can continue to send a 
>> single nexthop delete (like today ..without RTA_MULTIPATH)
>> because there is no other way to notify a single nexthop delete with a 
>> RTM_DELROUTE.
>>
>> The reason I say this is because: keeping future in mind, this will make 
>> things consistent for majority of people who will
>> start using RTA_MULTIPATH for both ipv4 and ipv6 for adds and deletes 
>> (requests/notifications and dumps) the same way.
>> single next hop delete will just be around because of fear of breaking 
>> existing applications.
> Single next hop delete will be around because IPv6 allows it -- and because 
> IPv4 needs to support it.
>
understand single next hop delete for ipv6 will be around..and my only point 
was to leave it around but not optimize for that case.
I don't think we should support single nexthop delete in ipv4 (I have not seen 
a requirement for that)... ipv4 is good as it is right now.
the additional complexity is not needed.



Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread David Ahern
On 1/30/17 4:07 AM, Nicolas Dichtel wrote:
> Le 29/01/2017 à 19:02, David Ahern a écrit :
> [snip]
>> Data centers are moving to L3, and multipath is a big part of that. Anyone 
>> who looks at ip -6 route enough knows it gets painful mentally pulling the 
>> individual routes into a single one.
> I agree, but it's only an iproute2 problem. iproute2 could group routes to 
> have
> a better output, there is no need to have a kernel patch for this ;-)
> 

iproute2 is not the only rtnetlink user. The comment above uses ip show as an 
example. libnl has a workaround for IPv6 to update route objects versus 
replacing them - unnecessary complexity that does not need to replicated to 
iproute2, Quagga/FRR or python libraries implementing rtnetlink. Really, 
RTA_MULTIPATH support in notifications should have been added when multipath 
support was added to the IPv6. Patch 3 is mostly a refactoring of rt6_fill_node 
to fill in nexthop information. This could have been done 4+ years ago when 
RTA_MULTIPATH route adds was added to the stack.


Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread Roopa Prabhu
On 1/30/17, 3:08 AM, Nicolas Dichtel wrote:
> Le 30/01/2017 à 00:55, Roopa Prabhu a écrit :
>> On 1/29/17, 10:02 AM, David Ahern wrote:
>>> On 1/28/17 6:00 PM, Roopa Prabhu wrote:
> 4. Route Appends
>- IPv6 allows nexthops to be appended to an existing route. In this
>  case one notification is sent per nexthop added
 thanks for listing all of these...I think you mentioned this case to me..
 but I don't remember now why this notification is
 sent per nexthop added. This is an update to an existing multipath route.
 so seems like the notification should be a RTM_NEWROUTE with the full 
 RTA_MULTIPATH route
 (similar to route add)
>>> It could be; it's a question of what should userspace get -- the full route 
>>> or the change? Append to me suggests the latter - userspace is told what 
>>> changed. It is simpler kernel code wise to send the full new route. The 
>>> append changes were done after our conversation. ;-)
>> ok, yeah. you listing all the cases here made it more simpler to understand 
>> in the context of other notifications :). I would prefer all
>> RTM_NEWLINK notifications (ie new add or update to an existing 
>> route..replace/append), contain the full route via RTA_MULTIPATH.
> I don't agree. With the previous proposal, you know *exactly* what happens 
> with
> each notification and this is the primary goal of the notifications. With the
> last proposal, where RTA_MULTIPATH is used for replace and append, you have 
> the
> new result, but you don't know what has been done.
> Usually, notifications are used to notify an event, not the result of an 
> event.
> If you want the result, you can use the dump cmd.

what has been done is conveyed by the APPEND and REPLACE flag in the 
notification. The user only cares about the updated multipath route...
giving parts of the route has never been very useful... and this is consistent 
with ipv4.


Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread Nicolas Dichtel
Le 30/01/2017 à 03:57, David Ahern a écrit :
> On 1/29/17 7:20 PM, Roopa Prabhu wrote:
>>> 2. Delete - 1 notification for each hop for all combinations of delete 
>>> commands
>>
>> here I was trying to say, for people deleting the full multipath route, you 
>> should send a RTA_MULTIPATH.
> 
> The only way to do that is to call inet6_rt_notify() *before* the route 
> delete work has been done in fib6_del_route. Right now it is called at the 
> end - after all sibling associations have been dropped, a time in which it 
> can no longer fill in RTA_MULTIPATH.
> 
> fib6_del_route function does not have any failure paths and is called with 
> the write lock held, so moving the notification might be ok. But it also 
> means ignoring all subsequent failures from fib6_del, again which might be ok 
> since the only failure is ENOENT which can not happen if we are walking the 
> sibling list.
Is it not possible to prepare the RTA_MULTIPATH attribute during the deletion
(when needed information are still available) and to use/copy it later when the
notification is done?

Or at least, having RTA_MUTLTIPATH with one nexthop only, so that the user knows
that this route was part of an ECMP routes?


Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread Nicolas Dichtel
Le 30/01/2017 à 00:55, Roopa Prabhu a écrit :
> On 1/29/17, 10:02 AM, David Ahern wrote:
>> On 1/28/17 6:00 PM, Roopa Prabhu wrote:
 4. Route Appends
- IPv6 allows nexthops to be appended to an existing route. In this
  case one notification is sent per nexthop added
>>> thanks for listing all of these...I think you mentioned this case to me..
>>> but I don't remember now why this notification is
>>> sent per nexthop added. This is an update to an existing multipath route.
>>> so seems like the notification should be a RTM_NEWROUTE with the full 
>>> RTA_MULTIPATH route
>>> (similar to route add)
>> It could be; it's a question of what should userspace get -- the full route 
>> or the change? Append to me suggests the latter - userspace is told what 
>> changed. It is simpler kernel code wise to send the full new route. The 
>> append changes were done after our conversation. ;-)
> ok, yeah. you listing all the cases here made it more simpler to understand 
> in the context of other notifications :). I would prefer all
> RTM_NEWLINK notifications (ie new add or update to an existing 
> route..replace/append), contain the full route via RTA_MULTIPATH.
I don't agree. With the previous proposal, you know *exactly* what happens with
each notification and this is the primary goal of the notifications. With the
last proposal, where RTA_MULTIPATH is used for replace and append, you have the
new result, but you don't know what has been done.
Usually, notifications are used to notify an event, not the result of an event.
If you want the result, you can use the dump cmd.


Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-30 Thread Nicolas Dichtel
Le 29/01/2017 à 19:02, David Ahern a écrit :
[snip]
> Data centers are moving to L3, and multipath is a big part of that. Anyone 
> who looks at ip -6 route enough knows it gets painful mentally pulling the 
> individual routes into a single one.
I agree, but it's only an iproute2 problem. iproute2 could group routes to have
a better output, there is no need to have a kernel patch for this ;-)


Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-29 Thread David Ahern
On 1/29/17 7:20 PM, Roopa Prabhu wrote:
>> 2. Delete - 1 notification for each hop for all combinations of delete 
>> commands
> 
> here I was trying to say, for people deleting the full multipath route, you 
> should send a RTA_MULTIPATH.

The only way to do that is to call inet6_rt_notify() *before* the route delete 
work has been done in fib6_del_route. Right now it is called at the end - after 
all sibling associations have been dropped, a time in which it can no longer 
fill in RTA_MULTIPATH.

fib6_del_route function does not have any failure paths and is called with the 
write lock held, so moving the notification might be ok. But it also means 
ignoring all subsequent failures from fib6_del, again which might be ok since 
the only failure is ENOENT which can not happen if we are walking the sibling 
list.


> For the odd case of ipv6 single nexthop delete, you can continue to send a 
> single nexthop delete (like today ..without RTA_MULTIPATH)
> because there is no other way to notify a single nexthop delete with a 
> RTM_DELROUTE.
> 
> The reason I say this is because: keeping future in mind, this will make 
> things consistent for majority of people who will
> start using RTA_MULTIPATH for both ipv4 and ipv6 for adds and deletes 
> (requests/notifications and dumps) the same way.
> single next hop delete will just be around because of fear of breaking 
> existing applications.

Single next hop delete will be around because IPv6 allows it -- and because 
IPv4 needs to support it.



Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-29 Thread Roopa Prabhu
On 1/29/17, 5:29 PM, David Ahern wrote:
[snip]
> Let's give an example for each case:
>
> 1. Add - full multipath route
>
> This route command:
> ip -6 ro add vrf red 2001:db8:200::/120 nexthop via 2001:db8:1::2 nexthop via 
> 2001:db8:2::2
>
> generates this notification (ip -t mon ro):
>
> 2001:db8:200::/120 table red metric 1024
> nexthop via 2001:db8:1::2  dev eth1 weight 1
> nexthop via 2001:db8:2::2  dev eth2 weight 1
>

ack,
> 2. Delete - 1 notification for each hop for all combinations of delete 
> commands

here I was trying to say, for people deleting the full multipath route, you 
should send a RTA_MULTIPATH.
For the odd case of ipv6 single nexthop delete, you can continue to send a 
single nexthop delete (like today ..without RTA_MULTIPATH)
because there is no other way to notify a single nexthop delete with a 
RTM_DELROUTE.

The reason I say this is because: keeping future in mind, this will make things 
consistent for majority of people who will
start using RTA_MULTIPATH for both ipv4 and ipv6 for adds and deletes 
(requests/notifications and dumps) the same way.
single next hop delete will just be around because of fear of breaking existing 
applications.

>
>
> 3. Replace -  full multipath route
>
> This route command (with the one from Add case already in place):
> ip -6 ro replace vrf red 2001:db8:200::/120 nexthop via 2001:db8:1::16 
> nexthop via 2001:db8:2::16
>
> generates a single notification:
>
> Replaced 2001:db8:200::/120 table red metric 1024
> nexthop via 2001:db8:1::16  dev eth1 weight 1
> nexthop via 2001:db8:2::16  dev eth2 weight 1

ack,
>
> 4. Append - 1 route after all hops are appended
>
> This append command (starting with the one installed from the Replace case):
> ip -6 ro append vrf red 2001:db8:200::/120 nexthop via 2001:db8:2::20 nexthop 
> via 2001:db8:1::20
>
> generates a single notification:
>
> Append 2001:db8:200::/120 table red metric 1024
> nexthop via 2001:db8:2::20  dev eth2 weight 1
> nexthop via 2001:db8:1::20  dev eth1 weight 1
> nexthop via 2001:db8:1::16  dev eth1 weight 1
> nexthop via 2001:db8:2::16  dev eth2 weight 1
>
> (The Replaced and Append annotations are due to a local iproute2 patch; 
> iproute2 does
> not currently distinguish NEWROUTE cases.)
>

ack,

thanks


Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-29 Thread David Ahern
On 1/29/17 4:55 PM, Roopa Prabhu wrote:
> On 1/29/17, 10:02 AM, David Ahern wrote:
>> On 1/28/17 6:00 PM, Roopa Prabhu wrote:
 4. Route Appends
- IPv6 allows nexthops to be appended to an existing route. In this
  case one notification is sent per nexthop added
>>> thanks for listing all of these...I think you mentioned this case to me..
>>> but I don't remember now why this notification is
>>> sent per nexthop added. This is an update to an existing multipath route.
>>> so seems like the notification should be a RTM_NEWROUTE with the full 
>>> RTA_MULTIPATH route
>>> (similar to route add)
>> It could be; it's a question of what should userspace get -- the full route 
>> or the change? Append to me suggests the latter - userspace is told what 
>> changed. It is simpler kernel code wise to send the full new route. The 
>> append changes were done after our conversation. ;-)
> ok, yeah. you listing all the cases here made it more simpler to understand 
> in the context of other notifications :). I would prefer all
> RTM_NEWLINK notifications (ie new add or update to an existing 
> route..replace/append), contain the full route via RTA_MULTIPATH.

Ok. I will send a v4 in a day or 2 (wait for more comments) with APPEND only 
generating 1 notification. Below response is based on this change.

> 
>>
>>> Same holds for replace, I know the code might be tricky here...but the 
>>> route replace
>>> is also an update to an existing multipath route and hence should be a 
>>> RTM_NEWROUTE
>>> with the full multipath route (RTA_MULTIPATH) that changed (from userspace 
>>> semantics POV)
>> It is. The only change on a replace is the encoding of all routes in 
>> RTA_MULTIPATH which is the point of this patch set. On successful replace, 
>> only 1 notification is sent with the full route.
> ok, good.
>>
>>> I don't have a better solution, but with the above still being different, 
>>> wondering
>>> if its worth the risk changing the api for just a few notifications.
>> Data centers are moving to L3, and multipath is a big part of that. Anyone 
>> who looks at ip -6 route enough knows it gets painful mentally pulling the 
>> individual routes into a single one. In addition, using RTA_MULTIPATH is 
>> more efficient at scale.
>>
>> Furthermore, I have a follow on patch set that returns the route matched on 
>> an ip route get lookup. Returning the full multipath route is an important 
>> part to understanding the full context of the route to an address.
> 
> ok, sure.  If we are moving to RTA_MULTIPATH, I just want to make sure all 
> the notifications consistently move to multipath.
> 
> so, to summarize and to get on the same page as you,
> - all RTM_NEWLINK notifications will contain RTA_MULTIPATH when the route in 
> the kernel is a multipath route:
> - since ipv6 allows add/append of a single nexthop, the 
> notification for the first nexthop may go out without a RTA_MULTIPATH

No. I notification for all cases - new, replace, append.

> - RTM_DELETE will also contain RTA_MULTIPATH when the user tries to delete 
> the full route with RTA_MULTIPATH
> - since ipv6 allows deleting of a single nexthop, RTM_DELETE may 
> contain a single nexthop delete ie without RTA_MULTIPATH (this is just to 
> continue
> supporting the single nexthop delete support that ipv6 has)

Let's give an example for each case:

1. Add - full multipath route

This route command:
ip -6 ro add vrf red 2001:db8:200::/120 nexthop via 2001:db8:1::2 nexthop via 
2001:db8:2::2

generates this notification (ip -t mon ro):

2001:db8:200::/120 table red metric 1024
nexthop via 2001:db8:1::2  dev eth1 weight 1
nexthop via 2001:db8:2::2  dev eth2 weight 1


2. Delete - 1 notification for each hop for all combinations of delete commands


3. Replace -  full multipath route

This route command (with the one from Add case already in place):
ip -6 ro replace vrf red 2001:db8:200::/120 nexthop via 2001:db8:1::16 nexthop 
via 2001:db8:2::16

generates a single notification:

Replaced 2001:db8:200::/120 table red metric 1024
nexthop via 2001:db8:1::16  dev eth1 weight 1
nexthop via 2001:db8:2::16  dev eth2 weight 1


4. Append - 1 route after all hops are appended

This append command (starting with the one installed from the Replace case):
ip -6 ro append vrf red 2001:db8:200::/120 nexthop via 2001:db8:2::20 nexthop 
via 2001:db8:1::20

generates a single notification:

Append 2001:db8:200::/120 table red metric 1024
nexthop via 2001:db8:2::20  dev eth2 weight 1
nexthop via 2001:db8:1::20  dev eth1 weight 1
nexthop via 2001:db8:1::16  dev eth1 weight 1
nexthop via 2001:db8:2::16  dev eth2 weight 1

(The Replaced and Append annotations are due to a local iproute2 patch; 
iproute2 does
not currently distinguish NEWROUTE cases.)



Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-29 Thread Roopa Prabhu
On 1/29/17, 10:02 AM, David Ahern wrote:
> On 1/28/17 6:00 PM, Roopa Prabhu wrote:
>>> 4. Route Appends
>>>- IPv6 allows nexthops to be appended to an existing route. In this
>>>  case one notification is sent per nexthop added
>> thanks for listing all of these...I think you mentioned this case to me..
>> but I don't remember now why this notification is
>> sent per nexthop added. This is an update to an existing multipath route.
>> so seems like the notification should be a RTM_NEWROUTE with the full 
>> RTA_MULTIPATH route
>> (similar to route add)
> It could be; it's a question of what should userspace get -- the full route 
> or the change? Append to me suggests the latter - userspace is told what 
> changed. It is simpler kernel code wise to send the full new route. The 
> append changes were done after our conversation. ;-)
ok, yeah. you listing all the cases here made it more simpler to understand in 
the context of other notifications :). I would prefer all
RTM_NEWLINK notifications (ie new add or update to an existing 
route..replace/append), contain the full route via RTA_MULTIPATH.

>
>> Same holds for replace, I know the code might be tricky here...but the route 
>> replace
>> is also an update to an existing multipath route and hence should be a 
>> RTM_NEWROUTE
>> with the full multipath route (RTA_MULTIPATH) that changed (from userspace 
>> semantics POV)
> It is. The only change on a replace is the encoding of all routes in 
> RTA_MULTIPATH which is the point of this patch set. On successful replace, 
> only 1 notification is sent with the full route.
ok, good.
>
>> I don't have a better solution, but with the above still being different, 
>> wondering
>> if its worth the risk changing the api for just a few notifications.
> Data centers are moving to L3, and multipath is a big part of that. Anyone 
> who looks at ip -6 route enough knows it gets painful mentally pulling the 
> individual routes into a single one. In addition, using RTA_MULTIPATH is more 
> efficient at scale.
>
> Furthermore, I have a follow on patch set that returns the route matched on 
> an ip route get lookup. Returning the full multipath route is an important 
> part to understanding the full context of the route to an address.

ok, sure.  If we are moving to RTA_MULTIPATH, I just want to make sure all the 
notifications consistently move to multipath.

so, to summarize and to get on the same page as you,
- all RTM_NEWLINK notifications will contain RTA_MULTIPATH when the route in 
the kernel is a multipath route:
- since ipv6 allows add/append of a single nexthop, the 
notification for the first nexthop may go out without a RTA_MULTIPATH
- RTM_DELETE will also contain RTA_MULTIPATH when the user tries to delete the 
full route with RTA_MULTIPATH
- since ipv6 allows deleting of a single nexthop, RTM_DELETE may 
contain a single nexthop delete ie without RTA_MULTIPATH (this is just to 
continue
supporting the single nexthop delete support that ipv6 has)

Thanks.


Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-29 Thread David Ahern
On 1/28/17 6:00 PM, Roopa Prabhu wrote:
>> 4. Route Appends
>>- IPv6 allows nexthops to be appended to an existing route. In this
>>  case one notification is sent per nexthop added
> 
> thanks for listing all of these...I think you mentioned this case to me..
> but I don't remember now why this notification is
> sent per nexthop added. This is an update to an existing multipath route.
> so seems like the notification should be a RTM_NEWROUTE with the full 
> RTA_MULTIPATH route
> (similar to route add)

It could be; it's a question of what should userspace get -- the full route or 
the change? Append to me suggests the latter - userspace is told what changed. 
It is simpler kernel code wise to send the full new route. The append changes 
were done after our conversation. ;-)

> 
> Same holds for replace, I know the code might be tricky here...but the route 
> replace
> is also an update to an existing multipath route and hence should be a 
> RTM_NEWROUTE
> with the full multipath route (RTA_MULTIPATH) that changed (from userspace 
> semantics POV)

It is. The only change on a replace is the encoding of all routes in 
RTA_MULTIPATH which is the point of this patch set. On successful replace, only 
1 notification is sent with the full route.

> 
> I don't have a better solution, but with the above still being different, 
> wondering
> if its worth the risk changing the api for just a few notifications.

Data centers are moving to L3, and multipath is a big part of that. Anyone who 
looks at ip -6 route enough knows it gets painful mentally pulling the 
individual routes into a single one. In addition, using RTA_MULTIPATH is more 
efficient at scale.

Furthermore, I have a follow on patch set that returns the route matched on an 
ip route get lookup. Returning the full multipath route is an important part to 
understanding the full context of the route to an address.



Re: [PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-28 Thread Roopa Prabhu
On 1/27/17, 3:20 PM, David Ahern wrote:
> This series closes a couple of gaps between IPv4 and IPv6 with respect
> to multipath routes:
>
> 1. IPv4 allows all nexthops of multipath routes to be deleted using just
>the prefix and length; IPv6 only deletes the first nexthop for the
>route if only the prefix and length are given.
>
> 2. IPv4 returns multipath routes encoded in the RTA_MULTIPATH attribute.
>IPv6 returns a series of routes with the same prefix and length - one
>for each nexthop. This happens for both dumps and notifications.
>
> IPv6 does accept RTA_MULTIPATH encoded routes, but installs them as a
> series of routes.
>
> Patch 2 addresses the first item by allowing IPv6 multipath routes to be
> deleted using just the prefix and length. Patch 3 addresses the second
> allowing IPv6 multipath routes to be returned encoded in the RTA_MULTIPATH.
>
> Patch 1 adds the NLM_F_APPEND flag to notifications when the flag is
> present in the request. The lack of this flag was noted testing route
> appends and comparing to IPv4.
>
> Patch 4 prints IPv6 addresses in compressed format when showing route
> replace errors. This was noticed testing REPLACE failures.
>
> The end result for multipath routes:
> 1. Route Add
>- one notification with RTA_MULTIPATH attribute
>
> 2. Route Replace
>- notification for first route and all siblings that have
>  succeeded. This is needed regardless of success of remaining
>  nexthops to maintain add/delete consistency should a failure
>  happens on the second or following nexthop (ie., need to tell
>  userspace that original route has been replaced and then the
>  failure logic deletes all routes inserted thus far).
>  
> 3. Route Delete
>- for multipath route only given nexthops are deleted. This path
>  is hit when DELETE contains RTA_MULTIPATH. All other route deletes,
>  all nexthops are deleted for given prefix and length (and any
>  other specs if given)
>
>- one notification sent per nexthop deleted. This is unavoidable
>  since IPv6 alllows a single nexthop to be deleted within a multipath
>  route
>
> 4. Route Appends
>- IPv6 allows nexthops to be appended to an existing route. In this
>  case one notification is sent per nexthop added

thanks for listing all of these...I think you mentioned this case to me..
but I don't remember now why this notification is
sent per nexthop added. This is an update to an existing multipath route.
so seems like the notification should be a RTM_NEWROUTE with the full 
RTA_MULTIPATH route
(similar to route add)

Same holds for replace, I know the code might be tricky here...but the route 
replace
is also an update to an existing multipath route and hence should be a 
RTM_NEWROUTE
with the full multipath route (RTA_MULTIPATH) that changed (from userspace 
semantics POV)

I don't have a better solution, but with the above still being different, 
wondering
if its worth the risk changing the api for just a few notifications.

>
> Addresses some of the inconsistencies also noted by Roopa at netdev0.1:
> https://www.netdev01.org/docs/prabhu-linux_ipv4_ipv6_inconsistencies_talk_slides.pdf
>
> v3
> - removed the need for a user API to opt-in to change. Requiring an
>   API just shifts the difference from same API with different
>   behavior to different API to achieve equivalent behavior
>
> - route notifications changed to use RTA_MULTIPATH for add and replace
>
> - upated commit messages and cover letter
>
> v2
> - fixed locking in patch 1 as noted by DaveM
> - changed user API for patch 2 to require an rtmsg with RTM_F_ALL_NEXTHOPS
>   set in rtm_flags
> - revamped explanation of patch 2 and cover letter
>
> David Ahern (4):
>   net: ipv6: add NLM_F_APPEND in notifications when applicable
>   net: ipv6: Allow shorthand delete of all nexthops in multipath route
>   net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH
> attribute
>   net: ipv6: Use compressed IPv6 addresses showing route replace error
>
>  include/net/ip6_fib.h |   4 +-
>  include/net/netlink.h |   1 +
>  net/ipv6/ip6_fib.c|  19 +-
>  net/ipv6/route.c  | 163 
> --
>  4 files changed, 165 insertions(+), 22 deletions(-)
>



[PATCH net-next v3 0/4] net: ipv6: Improve user experience with multipath routes

2017-01-27 Thread David Ahern
This series closes a couple of gaps between IPv4 and IPv6 with respect
to multipath routes:

1. IPv4 allows all nexthops of multipath routes to be deleted using just
   the prefix and length; IPv6 only deletes the first nexthop for the
   route if only the prefix and length are given.

2. IPv4 returns multipath routes encoded in the RTA_MULTIPATH attribute.
   IPv6 returns a series of routes with the same prefix and length - one
   for each nexthop. This happens for both dumps and notifications.

IPv6 does accept RTA_MULTIPATH encoded routes, but installs them as a
series of routes.

Patch 2 addresses the first item by allowing IPv6 multipath routes to be
deleted using just the prefix and length. Patch 3 addresses the second
allowing IPv6 multipath routes to be returned encoded in the RTA_MULTIPATH.

Patch 1 adds the NLM_F_APPEND flag to notifications when the flag is
present in the request. The lack of this flag was noted testing route
appends and comparing to IPv4.

Patch 4 prints IPv6 addresses in compressed format when showing route
replace errors. This was noticed testing REPLACE failures.

The end result for multipath routes:
1. Route Add
   - one notification with RTA_MULTIPATH attribute

2. Route Replace
   - notification for first route and all siblings that have
 succeeded. This is needed regardless of success of remaining
 nexthops to maintain add/delete consistency should a failure
 happens on the second or following nexthop (ie., need to tell
 userspace that original route has been replaced and then the
 failure logic deletes all routes inserted thus far).
 
3. Route Delete
   - for multipath route only given nexthops are deleted. This path
 is hit when DELETE contains RTA_MULTIPATH. All other route deletes,
 all nexthops are deleted for given prefix and length (and any
 other specs if given)

   - one notification sent per nexthop deleted. This is unavoidable
 since IPv6 alllows a single nexthop to be deleted within a multipath
 route

4. Route Appends
   - IPv6 allows nexthops to be appended to an existing route. In this
 case one notification is sent per nexthop added

Addresses some of the inconsistencies also noted by Roopa at netdev0.1:
https://www.netdev01.org/docs/prabhu-linux_ipv4_ipv6_inconsistencies_talk_slides.pdf

v3
- removed the need for a user API to opt-in to change. Requiring an
  API just shifts the difference from same API with different
  behavior to different API to achieve equivalent behavior

- route notifications changed to use RTA_MULTIPATH for add and replace

- upated commit messages and cover letter

v2
- fixed locking in patch 1 as noted by DaveM
- changed user API for patch 2 to require an rtmsg with RTM_F_ALL_NEXTHOPS
  set in rtm_flags
- revamped explanation of patch 2 and cover letter

David Ahern (4):
  net: ipv6: add NLM_F_APPEND in notifications when applicable
  net: ipv6: Allow shorthand delete of all nexthops in multipath route
  net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH
attribute
  net: ipv6: Use compressed IPv6 addresses showing route replace error

 include/net/ip6_fib.h |   4 +-
 include/net/netlink.h |   1 +
 net/ipv6/ip6_fib.c|  19 +-
 net/ipv6/route.c  | 163 --
 4 files changed, 165 insertions(+), 22 deletions(-)

-- 
2.1.4