Re: [PATCH 2/3] ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table

2020-08-13 Thread David Ahern
On 8/11/20 1:50 PM, Mathieu Desnoyers wrote:
> As per RFC792, ICMP errors should be sent to the source host.
> 
> However, in configurations with Virtual Routing and Forwarding tables,
> looking up which routing table to use is currently done by using the
> destination net_device.
> 
> commit 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to
> determine L3 domain") changes the interface passed to
> l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev
> to skb_dst(skb_in)->dev. This effectively uses the destination device
> rather than the source device for choosing which routing table should be
> used to lookup where to send the ICMP error.
> 
> Therefore, if the source and destination interfaces are within separate
> VRFs, or one in the global routing table and the other in a VRF, looking
> up the source host in the destination interface's routing table will
> fail if the destination interface's routing table contains no route to
> the source host.
> 
> One observable effect of this issue is that traceroute does not work in
> the following cases:
> 
> - Route leaking between global routing table and VRF
> - Route leaking between VRFs
> 
> Preferably use the source device routing table when sending ICMP error
> messages. If no source device is set, fall-back on the destination
> device routing table.
> 
> Fixes: 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to determine L3 domain")
> Link: https://tools.ietf.org/html/rfc792
> Signed-off-by: Mathieu Desnoyers 
> Cc: David Ahern 
> Cc: David S. Miller 
> Cc: net...@vger.kernel.org
> ---
>  net/ipv4/icmp.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index cf36f955bfe6..1eb83d82ec68 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -465,6 +465,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
>  					int type, int code,
>  					struct icmp_bxm *param)
>  {
> +	struct net_device *route_lookup_dev = NULL;
>  	struct rtable *rt, *rt2;
>  	struct flowi4 fl4_dec;
>  	int err;
> @@ -479,7 +480,17 @@ static struct rtable *icmp_route_lookup(struct net *net,
>  	fl4->flowi4_proto = IPPROTO_ICMP;
>  	fl4->fl4_icmp_type = type;
>  	fl4->fl4_icmp_code = code;
> -	fl4->flowi4_oif = l3mdev_master_ifindex(skb_dst(skb_in)->dev);
> +	/*
> +	 * The device used for looking up which routing table to use is
> +	 * preferably the source whenever it is set, which should ensure
> +	 * the icmp error can be sent to the source host, else fallback
> +	 * on the destination device.
> +	 */
> +	if (skb_in->dev)
> +		route_lookup_dev = skb_in->dev;
> +	else if (skb_dst(skb_in))
> +		route_lookup_dev = skb_dst(skb_in)->dev;
> +	fl4->flowi4_oif = l3mdev_master_ifindex(route_lookup_dev);
>  
>  	security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
>  	rt = ip_route_output_key_hash(net, fl4, skb_in);
> @@ -503,7 +514,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
>  	if (err)
>  		goto relookup_failed;
>  
> -	if (inet_addr_type_dev_table(net, skb_dst(skb_in)->dev,
> +	if (inet_addr_type_dev_table(net, route_lookup_dev,
>  				     fl4_dec.saddr) == RTN_LOCAL) {
>  		rt2 = __ip_route_output_key(net, &fl4_dec);
>  		if (IS_ERR(rt2))
> 

ICMPs can be generated in many locations:
1. forward path - I think the skb_in dev is always set.

2. ingress and upper layer protocols - dev is dropped prior to
transport layers, so, for example, UDP sending port unreachable calls
icmp_send with skb_in->dev set to NULL.

3. local packets and egress - e.g., link failures, and here I believe skb
dev is set.

If in and out are in the same L3 domain, either device works. For VRF
route leaking on the forward path, in and out are in separate domains,
so yes, you want the ingress device.

This change seems fine to me and I have not seen any issues with
existing selftests.

Reviewed-by: David Ahern 


But I did notice that unreachable / fragmentation needed messages are
NOT working with this change. You can see that by changing the MTU of
eth1 in r1 to 1400 and running:
   ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2

You really should get that working as well with VRF route leaking.




Re: [PATCH 2/3] ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table

2020-08-13 Thread Mathieu Desnoyers
- On Aug 12, 2020, at 5:43 PM, David S. Miller da...@davemloft.net wrote:

> From: Mathieu Desnoyers 
> Date: Tue, 11 Aug 2020 15:50:02 -0400
> 
>> @@ -465,6 +465,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
>>  					int type, int code,
>>  					struct icmp_bxm *param)
>>  {
>> +	struct net_device *route_lookup_dev = NULL;
>>  	struct rtable *rt, *rt2;
>>  	struct flowi4 fl4_dec;
>>  	int err;
>> @@ -479,7 +480,17 @@ static struct rtable *icmp_route_lookup(struct net *net,
>>  	fl4->flowi4_proto = IPPROTO_ICMP;
>>  	fl4->fl4_icmp_type = type;
>>  	fl4->fl4_icmp_code = code;
>> -	fl4->flowi4_oif = l3mdev_master_ifindex(skb_dst(skb_in)->dev);
>> +	/*
>> +	 * The device used for looking up which routing table to use is
>> +	 * preferably the source whenever it is set, which should ensure
>> +	 * the icmp error can be sent to the source host, else fallback
>> +	 * on the destination device.
>> +	 */
>> +	if (skb_in->dev)
>> +		route_lookup_dev = skb_in->dev;
>> +	else if (skb_dst(skb_in))
>> +		route_lookup_dev = skb_dst(skb_in)->dev;
>> +	fl4->flowi4_oif = l3mdev_master_ifindex(route_lookup_dev);
> 
> The caller of icmp_route_lookup() uses the opposite prioritization of
> devices for determining the network namespace to use:
> 
> 	if (rt->dst.dev)
> 		net = dev_net(rt->dst.dev);
> 	else if (skb_in->dev)
> 		net = dev_net(skb_in->dev);
> 	else
> 		goto out;
> 
> Do we have to reverse the ordering there too?

Looking at the history:

Originally dst.dev was used as network namespace for icmp errors:

dde1bc0e6f861 (Denis V. Lunev   2008-01-22 23:50:57 -0800  450)         net = rt->u.dst.dev->nd_net;

commit dde1bc0e6f86183bc095d0774cd109f4edf66ea2
Author: Denis V. Lunev 
Date:   Tue Jan 22 23:50:57 2008 -0800

[NETNS]: Add namespace for ICMP replying code.

All needed API is done, the namespace is available when required from
the device on the DST entry from the incoming packet. So, just replace
init_net with proper namespace.

I wonder what motivated the use of the DST entry here?

Note that this choice of DST network namespace applies to both __icmp_send and
icmp_unreach.

It has been followed by a few data structure layout changes:

c346dca10840a (YOSHIFUJI Hideaki 2008-03-25 21:47:49 +0900  430)         net = dev_net(rt->u.dst.dev);
d8d1f30b95a63 (Changli Gao       2010-06-10 23:31:35 -0700  585)         net = dev_net(rt->dst.dev);

It was then changed to fix a NULL pointer deref:

e2c693934194f (Hangbin Liu  2019-08-22 22:19:48 +0800  586)
e2c693934194f (Hangbin Liu  2019-08-22 22:19:48 +0800  587)         if (rt->dst.dev)
e2c693934194f (Hangbin Liu  2019-08-22 22:19:48 +0800  588)                 net = dev_net(rt->dst.dev);
e2c693934194f (Hangbin Liu  2019-08-22 22:19:48 +0800  589)         else if (skb_in->dev)
e2c693934194f (Hangbin Liu  2019-08-22 22:19:48 +0800  590)                 net = dev_net(skb_in->dev);
e2c693934194f (Hangbin Liu  2019-08-22 22:19:48 +0800  591)         else
e2c693934194f (Hangbin Liu  2019-08-22 22:19:48 +0800  592)                 goto out;


> And when I read fallback in your commit message description, I
> imagined that you would have a two tiered lookup scheme.  First you
> would be trying the skb_in->dev for a lookup (to accommodate the VRF
> case), and if that failed you'd try again with skb_dst()->dev.

The code I proposed basically does use the skb_in->dev (if non-null)
for looking up which VRF table to use, else use skb_dst(skb_in) (if non-null)
for looking up which VRF table to use, else route_lookup_dev is NULL, which
means use the master table.
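
For readers following the thread, the selection order described above can be modelled in userspace as a minimal sketch. All `toy_*` types and functions are hypothetical stand-ins, not the kernel's `struct sk_buff`, `struct net_device`, or `l3mdev_master_ifindex()`. The sketch assumes a NULL device maps to ifindex 0, and it records the owning L3 domain in a single field instead of walking the real device hierarchy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's struct definitions. */
struct toy_dev {
	int ifindex;
	int master_ifindex;	/* 0 when the device is not enslaved to a VRF */
};

struct toy_dst {
	struct toy_dev *dev;
};

struct toy_skb {
	struct toy_dev *dev;	/* ingress device, may be NULL */
	struct toy_dst *dst;	/* attached route, may be NULL */
};

/* Prefer the ingress device, else the dst device, else NULL. */
static struct toy_dev *toy_route_lookup_dev(const struct toy_skb *skb)
{
	assert(skb != NULL);
	if (skb->dev)
		return skb->dev;
	if (skb->dst)
		return skb->dst->dev;
	return NULL;
}

/* Assumption: a NULL device selects no L3 master, modelled as ifindex 0. */
static int toy_master_ifindex(const struct toy_dev *dev)
{
	return dev ? dev->master_ifindex : 0;
}
```

With an ingress device in one VRF and a dst device in another, this model picks the ingress side; only when the ingress device is missing does the dst device decide.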

Whether this should instead try to look up the source address with the
skb_in->dev table, and if that fails go to the next, is a good question.
I think the context I am missing in order to understand which approach is
appropriate is which scenario can cause skb_in->dev to be NULL, which can
cause skb_dst(skb_in) to be NULL, and what the expected behavior for icmp
error route lookup is in those cases?
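
To make the alternative concrete, here is a rough sketch of a two-tiered scheme: try the ingress table first and retry with the destination table only if that lookup fails. This is not what the patch does. The `toy_table` type and both functions are hypothetical, and a routing table is reduced to a flat list of reachable host addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy routing table: a flat list of reachable host addresses. */
struct toy_table {
	const uint32_t *hosts;
	size_t nr_hosts;
};

/* Returns true when the table holds an exact entry for addr. */
static bool toy_table_has_route(const struct toy_table *t, uint32_t addr)
{
	size_t i;

	if (!t)
		return false;
	assert(t->nr_hosts == 0 || t->hosts != NULL);
	for (i = 0; i < t->nr_hosts; i++)
		if (t->hosts[i] == addr)
			return true;
	return false;
}

/*
 * Two-tiered lookup: the ingress table is tried first, then the table
 * of the dst device. Returns the table that resolved addr, or NULL.
 */
static const struct toy_table *
toy_two_tier_lookup(const struct toy_table *in_table,
		    const struct toy_table *dst_table, uint32_t addr)
{
	if (toy_table_has_route(in_table, addr))
		return in_table;
	if (toy_table_has_route(dst_table, addr))
		return dst_table;
	return NULL;
}
```

The difference from the proposed patch is that here a failed lookup in the first table triggers a second lookup, whereas the patch chooses a single table up front based on which device pointer is set.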

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 2/3] ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table

2020-08-12 Thread David Miller
From: Mathieu Desnoyers 
Date: Tue, 11 Aug 2020 15:50:02 -0400

> @@ -465,6 +465,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
>  					int type, int code,
>  					struct icmp_bxm *param)
>  {
> +	struct net_device *route_lookup_dev = NULL;
>  	struct rtable *rt, *rt2;
>  	struct flowi4 fl4_dec;
>  	int err;
> @@ -479,7 +480,17 @@ static struct rtable *icmp_route_lookup(struct net *net,
>  	fl4->flowi4_proto = IPPROTO_ICMP;
>  	fl4->fl4_icmp_type = type;
>  	fl4->fl4_icmp_code = code;
> -	fl4->flowi4_oif = l3mdev_master_ifindex(skb_dst(skb_in)->dev);
> +	/*
> +	 * The device used for looking up which routing table to use is
> +	 * preferably the source whenever it is set, which should ensure
> +	 * the icmp error can be sent to the source host, else fallback
> +	 * on the destination device.
> +	 */
> +	if (skb_in->dev)
> +		route_lookup_dev = skb_in->dev;
> +	else if (skb_dst(skb_in))
> +		route_lookup_dev = skb_dst(skb_in)->dev;
> +	fl4->flowi4_oif = l3mdev_master_ifindex(route_lookup_dev);

The caller of icmp_route_lookup() uses the opposite prioritization of
devices for determining the network namespace to use:

	if (rt->dst.dev)
		net = dev_net(rt->dst.dev);
	else if (skb_in->dev)
		net = dev_net(skb_in->dev);
	else
		goto out;

Do we have to reverse the ordering there too?

And when I read fallback in your commit message description, I
imagined that you would have a two tiered lookup scheme.  First you
would be trying the skb_in->dev for a lookup (to accommodate the VRF
case), and if that failed you'd try again with skb_dst()->dev.


[PATCH 2/3] ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table

2020-08-11 Thread Mathieu Desnoyers
As per RFC792, ICMP errors should be sent to the source host.

However, in configurations with Virtual Routing and Forwarding tables,
looking up which routing table to use is currently done by using the
destination net_device.

commit 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to
determine L3 domain") changes the interface passed to
l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev
to skb_dst(skb_in)->dev. This effectively uses the destination device
rather than the source device for choosing which routing table should be
used to lookup where to send the ICMP error.

Therefore, if the source and destination interfaces are within separate
VRFs, or one in the global routing table and the other in a VRF, looking
up the source host in the destination interface's routing table will
fail if the destination interface's routing table contains no route to
the source host.

One observable effect of this issue is that traceroute does not work in
the following cases:

- Route leaking between global routing table and VRF
- Route leaking between VRFs

Preferably use the source device routing table when sending ICMP error
messages. If no source device is set, fall-back on the destination
device routing table.

Fixes: 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to determine L3 domain")
Link: https://tools.ietf.org/html/rfc792
Signed-off-by: Mathieu Desnoyers 
Cc: David Ahern 
Cc: David S. Miller 
Cc: net...@vger.kernel.org
---
 net/ipv4/icmp.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index cf36f955bfe6..1eb83d82ec68 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -465,6 +465,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
 					int type, int code,
 					struct icmp_bxm *param)
 {
+	struct net_device *route_lookup_dev = NULL;
 	struct rtable *rt, *rt2;
 	struct flowi4 fl4_dec;
 	int err;
@@ -479,7 +480,17 @@ static struct rtable *icmp_route_lookup(struct net *net,
 	fl4->flowi4_proto = IPPROTO_ICMP;
 	fl4->fl4_icmp_type = type;
 	fl4->fl4_icmp_code = code;
-	fl4->flowi4_oif = l3mdev_master_ifindex(skb_dst(skb_in)->dev);
+	/*
+	 * The device used for looking up which routing table to use is
+	 * preferably the source whenever it is set, which should ensure
+	 * the icmp error can be sent to the source host, else fallback
+	 * on the destination device.
+	 */
+	if (skb_in->dev)
+		route_lookup_dev = skb_in->dev;
+	else if (skb_dst(skb_in))
+		route_lookup_dev = skb_dst(skb_in)->dev;
+	fl4->flowi4_oif = l3mdev_master_ifindex(route_lookup_dev);
 
 	security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
 	rt = ip_route_output_key_hash(net, fl4, skb_in);
@@ -503,7 +514,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
 	if (err)
 		goto relookup_failed;
 
-	if (inet_addr_type_dev_table(net, skb_dst(skb_in)->dev,
+	if (inet_addr_type_dev_table(net, route_lookup_dev,
 				     fl4_dec.saddr) == RTN_LOCAL) {
 		rt2 = __ip_route_output_key(net, &fl4_dec);
 		if (IS_ERR(rt2))
-- 
2.17.1



[RFC PATCH 2/3] ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table

2020-07-29 Thread Mathieu Desnoyers
As per RFC792, ICMP errors should be sent to the source host.

However, in configurations with Virtual Routing and Forwarding tables,
looking up which routing table to use is currently done by using the
destination net_device.

commit 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to
determine L3 domain") changes the interface passed to
l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev
to skb_dst(skb_in)->dev. This effectively uses the destination device
rather than the source device for choosing which routing table should be
used to lookup where to send the ICMP error.

Therefore, if the source and destination interfaces are within separate
VRFs, or one in the global routing table and the other in a VRF, looking
up the source host in the destination interface's routing table will
fail if the destination interface's routing table contains no route to
the source host.

One observable effect of this issue is that traceroute does not work in
the following cases:

- Route leaking between global routing table and VRF
- Route leaking between VRFs

Preferably use the source device routing table when sending ICMP error
messages. If no source device is set, fall-back on the destination
device routing table.

Fixes: 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to determine L3 domain")
Link: https://tools.ietf.org/html/rfc792
Signed-off-by: Mathieu Desnoyers 
Cc: David Ahern 
Cc: David S. Miller 
Cc: net...@vger.kernel.org
---
 net/ipv4/icmp.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index e30515f89802..029e12d35b19 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -465,6 +465,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
 					int type, int code,
 					struct icmp_bxm *param)
 {
+	struct net_device *route_lookup_dev = NULL;
 	struct rtable *rt, *rt2;
 	struct flowi4 fl4_dec;
 	int err;
@@ -479,7 +480,17 @@ static struct rtable *icmp_route_lookup(struct net *net,
 	fl4->flowi4_proto = IPPROTO_ICMP;
 	fl4->fl4_icmp_type = type;
 	fl4->fl4_icmp_code = code;
-	fl4->flowi4_oif = l3mdev_master_ifindex(skb_dst(skb_in)->dev);
+	/*
+	 * The device used for looking up which routing table to use is
+	 * preferably the source whenever it is set, which should ensure
+	 * the icmp error can be sent to the source host, else fallback
+	 * on the destination device.
+	 */
+	if (skb_in->dev)
+		route_lookup_dev = skb_in->dev;
+	else if (skb_dst(skb_in))
+		route_lookup_dev = skb_dst(skb_in)->dev;
+	fl4->flowi4_oif = l3mdev_master_ifindex(route_lookup_dev);
 
 	security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
 	rt = ip_route_output_key_hash(net, fl4, skb_in);
@@ -503,7 +514,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
 	if (err)
 		goto relookup_failed;
 
-	if (inet_addr_type_dev_table(net, skb_dst(skb_in)->dev,
+	if (inet_addr_type_dev_table(net, route_lookup_dev,
 				     fl4_dec.saddr) == RTN_LOCAL) {
 		rt2 = __ip_route_output_key(net, &fl4_dec);
 		if (IS_ERR(rt2))
-- 
2.17.1