Re: v6/sit tunnels and VRFs

2018-04-14 Thread Jeff Barnhill
I didn't see an easy way to achieve this behavior without affecting
the non-VRF routing lookups (such as deleting non-VRF rules).  We have
some automated tests that were looking for specific responses, but, of
course, those can be changed.  Among a few of my colleagues, this
sparked a discussion about maintaining consistent behavior between VRF
and non-VRF setups, so that a ping or some other tool wouldn't respond
differently.  That's the main reason I asked the question here: to
see how important this is in general use.  It sounds like, in your
experience, the specific error message/code hasn't been an issue.

Thanks,
Jeff


On Fri, Apr 13, 2018 at 4:31 PM, David Ahern  wrote:
> On 4/13/18 2:23 PM, Jeff Barnhill wrote:
>> It seems that the ENETUNREACH response is still desirable in the VRF
>> case since the only difference (when using VRF vs. not) is that the
>> lookup should be restricted to a specific VRF.
>
> VRF is just policy routing to a table. If the table wants the lookup to
> stop, then it needs a default route. What you are referring to is the case
> where the lookup goes through all tables without finding an answer, so it
> fails with -ENETUNREACH. I do not know of any way to make that happen with
> the existing default route options, and in the past 2+ years we have not
> hit any software that distinguishes -ENETUNREACH from -EHOSTUNREACH.
>
> I take it this is code from your internal code base. Why does it care
> between those two failures?


Re: v6/sit tunnels and VRFs

2018-04-13 Thread David Ahern
On 4/13/18 2:23 PM, Jeff Barnhill wrote:
> It seems that the ENETUNREACH response is still desirable in the VRF
> case since the only difference (when using VRF vs. not) is that the
> lookup should be restricted to a specific VRF.

VRF is just policy routing to a table. If the table wants the lookup to
stop, then it needs a default route. What you are referring to is the case
where the lookup goes through all tables without finding an answer, so it
fails with -ENETUNREACH. I do not know of any way to make that happen with
the existing default route options, and in the past 2+ years we have not
hit any software that distinguishes -ENETUNREACH from -EHOSTUNREACH.

I take it this is code from your internal code base. Why does it care
between those two failures?


Re: v6/sit tunnels and VRFs

2018-04-13 Thread Jeff Barnhill
Thanks for the response, David. I'm not questioning the need to stop
the fib lookup once the end of the VRF table is reached - I agree that
is needed.  I'm concerned with the difference in the response/error
returned from the failed lookup.

For instance, with vrf "unreachable default" route, I get this:
# ip route get 1.1.1.1 vrf vrf_258
RTNETLINK answers: No route to host

Without it (and assuming no match for 1.1.1.1 in local/main/default
tables), I get this:
# ip route get 1.1.1.1 vrf vrf_258
RTNETLINK answers: Network is unreachable

Which is also what happens when not using VRFs at all.

It seems that the ENETUNREACH response is still desirable in the VRF
case since the only difference (when using VRF vs. not) is that the
lookup should be restricted to a specific VRF.

Jeff



On Thu, Apr 12, 2018 at 10:25 PM, David Ahern  wrote:
> On 4/12/18 10:54 AM, Jeff Barnhill wrote:
>> Hi David,
>>
>> In the slides referenced, you recommend adding an "unreachable
>> default" route to the end of each VRF route table.  In my testing (for
>> v4) this results in a change to fib lookup failures such that instead
>> of ENETUNREACH being returned, EHOSTUNREACH is returned since the fib
>> finds the unreachable route, versus failing to find a route
>> altogether.
>>
>> Have the implications of this been considered?  I don't see a
>> clean/easy way to achieve the old behavior without affecting non-VRF
>> routing (e.g., removing the unreachable route and deleting the non-VRF
>> rules).  I'm guessing that programmatically it may not make much
>> difference, i.e., the lookup fails either way, but for debugging or for
>> a user looking at it, the difference matters.  Do you (or anyone else)
>> have any thoughts on this?
>
> We have recommended moving the local table down in the FIB rules:
>
> # ip ru ls
> 1000:   from all lookup [l3mdev-table]
> 32765:  from all lookup local
> 32766:  from all lookup main
> 32767:  from all lookup default
>
> and adding a default route to VRF tables:
>
> # ip ro ls vrf red
> unreachable default  metric 4278198272
> 172.16.2.0/24  proto bgp  metric 20
> nexthop via 169.254.0.1  dev swp3 weight 1 onlink
> nexthop via 169.254.0.1  dev swp4 weight 1 onlink
>
> # ip -6 ro ls vrf red
> 2001:db8:2::/64  proto bgp  metric 20
> nexthop via fe80::202:ff:fe00:e  dev swp3 weight 1
> nexthop via fe80::202:ff:fe00:f  dev swp4 weight 1
> anycast fe80:: dev lo  proto kernel  metric 0  pref medium
> anycast fe80:: dev lo  proto kernel  metric 0  pref medium
> fe80::/64 dev swp3  proto kernel  metric 256  pref medium
> fe80::/64 dev swp4  proto kernel  metric 256  pref medium
> ff00::/8 dev swp3  metric 256  pref medium
> ff00::/8 dev swp4  metric 256  pref medium
> unreachable default dev lo  metric 4278198272  error -101 pref medium
>
> Over the last 2 years we have not seen any negative side effects from
> this, and it is what you want for proper VRF separation.
>
> Without a default route, lookups will proceed to the next FIB rule, which
> means a lookup in the next table; barring other PBR rules, that will be
> the main table. This leads to wrong lookups.
>
> Here is an example:
>   ip netns add foo
>   ip netns exec foo bash
>   ip li set lo up
>   ip li add red type vrf table 123
>   ip li set red up
>   ip li add dummy1 type dummy
>   ip addr add 10.100.1.1/24 dev dummy1
>   ip li set dummy1 master red
>   ip li set dummy1 up
>   ip li add dummy2 type dummy
>   ip addr add 10.100.1.1/24 dev dummy2
>   ip li set dummy2 up
>   ip ro get 10.100.2.2
>   ip ro get 10.100.2.2 vrf red
>
> # ip ru ls
> 0:  from all lookup local
> 1000:   from all lookup [l3mdev-table]
> 32766:  from all lookup main
> 32767:  from all lookup default
>
> # ip ro ls
> 10.100.1.0/24 dev dummy2 proto kernel scope link src 10.100.1.1
> 10.100.2.0/24 via 10.100.1.2 dev dummy2
>
> # ip ro ls vrf red
> 10.100.1.0/24 dev dummy1 proto kernel scope link src 10.100.1.1
>
> That's the setup. What happens on route lookups?
> # ip ro get vrf red 10.100.2.1
> 10.100.2.1 via 10.100.1.2 dev dummy2 src 10.100.1.1 uid 0
> cache
>
> which is clearly wrong. Let's look at the lookup sequence
>
> # perf record -e fib:* ip ro get vrf red 10.100.2.1
> 10.100.2.1 via 10.100.1.2 dev dummy2 src 10.100.1.1 uid 0
> cache
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.003 MB perf.data (4 samples) ]
>
> #  perf script --fields trace:trace
> table 255 oif 2 iif 1 src 0.0.0.0 dst 10.100.2.1 tos 0 scope 0 flags 4
> table 123 oif 2 iif 1 src 0.0.0.0 dst 10.100.2.1 tos 0 scope 0 flags 4
> table 254 oif 2 iif 1 src 0.0.0.0 dst 10.100.2.1 tos 0 scope 0 flags 4
> nexthop dev dummy2 oif 4 src 10.100.1.1
>
> The first lookup (table 255) happened because I did not move the local
> table down. The second (table 123) is the correct VRF lookup. The third
> (table 254) is the continuation to the next table, the main table.
>
> Adding a default route:
> # ip ro add vrf red unreachable default
>
> And the lookup is proper:
> # ip ro get vrf red 10.100.2.1
> RTNETLINK answers: No route to host

Re: v6/sit tunnels and VRFs

2018-04-12 Thread David Ahern
On 4/12/18 10:54 AM, Jeff Barnhill wrote:
> Hi David,
> 
> In the slides referenced, you recommend adding an "unreachable
> default" route to the end of each VRF route table.  In my testing (for
> v4) this results in a change to fib lookup failures such that instead
> of ENETUNREACH being returned, EHOSTUNREACH is returned since the fib
> finds the unreachable route, versus failing to find a route
> altogether.
> 
> Have the implications of this been considered?  I don't see a
> clean/easy way to achieve the old behavior without affecting non-VRF
> routing (e.g., removing the unreachable route and deleting the non-VRF
> rules).  I'm guessing that programmatically it may not make much
> difference, i.e., the lookup fails either way, but for debugging or for
> a user looking at it, the difference matters.  Do you (or anyone else)
> have any thoughts on this?

We have recommended moving the local table down in the FIB rules:

# ip ru ls
1000:   from all lookup [l3mdev-table]
32765:  from all lookup local
32766:  from all lookup main
32767:  from all lookup default

and adding a default route to VRF tables:

# ip ro ls vrf red
unreachable default  metric 4278198272
172.16.2.0/24  proto bgp  metric 20
nexthop via 169.254.0.1  dev swp3 weight 1 onlink
nexthop via 169.254.0.1  dev swp4 weight 1 onlink

# ip -6 ro ls vrf red
2001:db8:2::/64  proto bgp  metric 20
nexthop via fe80::202:ff:fe00:e  dev swp3 weight 1
nexthop via fe80::202:ff:fe00:f  dev swp4 weight 1
anycast fe80:: dev lo  proto kernel  metric 0  pref medium
anycast fe80:: dev lo  proto kernel  metric 0  pref medium
fe80::/64 dev swp3  proto kernel  metric 256  pref medium
fe80::/64 dev swp4  proto kernel  metric 256  pref medium
ff00::/8 dev swp3  metric 256  pref medium
ff00::/8 dev swp4  metric 256  pref medium
unreachable default dev lo  metric 4278198272  error -101 pref medium

Over the last 2 years we have not seen any negative side effects from
this, and it is what you want for proper VRF separation.

Without a default route, lookups will proceed to the next FIB rule, which
means a lookup in the next table; barring other PBR rules, that will be
the main table. This leads to wrong lookups.

Here is an example:
  ip netns add foo
  ip netns exec foo bash
  ip li set lo up
  ip li add red type vrf table 123
  ip li set red up
  ip li add dummy1 type dummy
  ip addr add 10.100.1.1/24 dev dummy1
  ip li set dummy1 master red
  ip li set dummy1 up
  ip li add dummy2 type dummy
  ip addr add 10.100.1.1/24 dev dummy2
  ip li set dummy2 up
  ip ro get 10.100.2.2
  ip ro get 10.100.2.2 vrf red

# ip ru ls
0:  from all lookup local
1000:   from all lookup [l3mdev-table]
32766:  from all lookup main
32767:  from all lookup default

# ip ro ls
10.100.1.0/24 dev dummy2 proto kernel scope link src 10.100.1.1
10.100.2.0/24 via 10.100.1.2 dev dummy2

# ip ro ls vrf red
10.100.1.0/24 dev dummy1 proto kernel scope link src 10.100.1.1

That's the setup. What happens on route lookups?
# ip ro get vrf red 10.100.2.1
10.100.2.1 via 10.100.1.2 dev dummy2 src 10.100.1.1 uid 0
cache

which is clearly wrong. Let's look at the lookup sequence

# perf record -e fib:* ip ro get vrf red 10.100.2.1
10.100.2.1 via 10.100.1.2 dev dummy2 src 10.100.1.1 uid 0
cache
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (4 samples) ]

#  perf script --fields trace:trace
table 255 oif 2 iif 1 src 0.0.0.0 dst 10.100.2.1 tos 0 scope 0 flags 4
table 123 oif 2 iif 1 src 0.0.0.0 dst 10.100.2.1 tos 0 scope 0 flags 4
table 254 oif 2 iif 1 src 0.0.0.0 dst 10.100.2.1 tos 0 scope 0 flags 4
nexthop dev dummy2 oif 4 src 10.100.1.1

The first lookup (table 255) happened because I did not move the local
table down. The second (table 123) is the correct VRF lookup. The third
(table 254) is the continuation to the next table, the main table.

Adding a default route:
# ip ro add vrf red unreachable default

And the lookup is proper:
# ip ro get vrf red 10.100.2.1
RTNETLINK answers: No route to host


Re: v6/sit tunnels and VRFs

2018-04-12 Thread Jeff Barnhill
Hi David,

In the slides referenced, you recommend adding an "unreachable
default" route to the end of each VRF route table.  In my testing (for
v4) this results in a change to fib lookup failures such that instead
of ENETUNREACH being returned, EHOSTUNREACH is returned since the fib
finds the unreachable route, versus failing to find a route
altogether.

Have the implications of this been considered?  I don't see a
clean/easy way to achieve the old behavior without affecting non-VRF
routing (e.g., removing the unreachable route and deleting the non-VRF
rules).  I'm guessing that programmatically it may not make much
difference, i.e., the lookup fails either way, but for debugging or for
a user looking at it, the difference matters.  Do you (or anyone else)
have any thoughts on this?

Thanks,
Jeff


On Sun, Oct 29, 2017 at 11:48 AM, David Ahern  wrote:
> On 10/27/17 8:43 PM, Jeff Barnhill wrote:
>> ping v4 loopback...
>>
>> jeff@VM2:~$ ip route list vrf myvrf
>> 127.0.0.0/8 dev myvrf proto kernel scope link src 127.0.0.1
>> 192.168.200.0/24 via 192.168.210.3 dev enp0s8
>> 192.168.210.0/24 dev enp0s8 proto kernel scope link src 192.168.210.2
>>
>> Lookups shown in perf script were for table 255.  Is it necessary to
>> put the l3mdev table first?  If I re-order the tables, it starts
>> working:
>
> Yes, we advise moving the local table down to avoid false hits (e.g.,
> duplicate addresses like this between the default VRF and another VRF).
>
> I covered that and a few other things at OSS 2017. Latest VRF slides for
> users:
>   http://schd.ws/hosted_files/ossna2017/fe/vrf-tutorial-oss.pdf


Re: v6/sit tunnels and VRFs

2017-10-31 Thread David Ahern
On 10/31/17 4:20 PM, Jeff Barnhill wrote:
> I was surprised that nlmsg_parse in fib_nl_newrule() didn't pick this
> up, but I verified that the received value for this attribute was 0,
> not 1 (w/o the patch).

It only checks minimum length, not exact length.


Re: v6/sit tunnels and VRFs

2017-10-31 Thread David Ahern
On 10/31/17 4:20 PM, Jeff Barnhill wrote:
> Thanks, David.  Those slides are extremely helpful.
> 
> Also, I ran into a bug that manifested on big endian architecture:
> 
> diff --git i/drivers/net/vrf.c w/drivers/net/vrf.c
> index b23bb2fae5f8..a5f984689aee 100644
> --- i/drivers/net/vrf.c
> +++ w/drivers/net/vrf.c
> @@ -1130,7 +1130,7 @@ static int vrf_fib_rule(const struct net_device
> *dev, __u8 family, bool add_it)
> frh->family = family;
> frh->action = FR_ACT_TO_TBL;
> 
> -   if (nla_put_u32(skb, FRA_L3MDEV, 1))
> +   if (nla_put_u8(skb, FRA_L3MDEV, 1))
> goto nla_put_failure;
> 
> if (nla_put_u32(skb, FRA_PRIORITY, FIB_RULE_PREF))
> 
> I was surprised that nlmsg_parse in fib_nl_newrule() didn't pick this
> up, but I verified that the received value for this attribute was 0,
> not 1 (w/o the patch).
> 

yikes, I am surprised the fib rule policy did not catch that.

Please submit formally with:

Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create")


Re: v6/sit tunnels and VRFs

2017-10-31 Thread Jeff Barnhill
Thanks, David.  Those slides are extremely helpful.

Also, I ran into a bug that manifested on big endian architecture:

diff --git i/drivers/net/vrf.c w/drivers/net/vrf.c
index b23bb2fae5f8..a5f984689aee 100644
--- i/drivers/net/vrf.c
+++ w/drivers/net/vrf.c
@@ -1130,7 +1130,7 @@ static int vrf_fib_rule(const struct net_device
*dev, __u8 family, bool add_it)
frh->family = family;
frh->action = FR_ACT_TO_TBL;

-   if (nla_put_u32(skb, FRA_L3MDEV, 1))
+   if (nla_put_u8(skb, FRA_L3MDEV, 1))
goto nla_put_failure;

if (nla_put_u32(skb, FRA_PRIORITY, FIB_RULE_PREF))

I was surprised that nlmsg_parse in fib_nl_newrule() didn't pick this
up, but I verified that the received value for this attribute was 0,
not 1 (w/o the patch).

Jeff



On Sun, Oct 29, 2017 at 11:48 AM, David Ahern  wrote:
> On 10/27/17 8:43 PM, Jeff Barnhill wrote:
>> ping v4 loopback...
>>
>> jeff@VM2:~$ ip route list vrf myvrf
>> 127.0.0.0/8 dev myvrf proto kernel scope link src 127.0.0.1
>> 192.168.200.0/24 via 192.168.210.3 dev enp0s8
>> 192.168.210.0/24 dev enp0s8 proto kernel scope link src 192.168.210.2
>>
>> Lookups shown in perf script were for table 255.  Is it necessary to
>> put the l3mdev table first?  If I re-order the tables, it starts
>> working:
>
> Yes, we advise moving the local table down to avoid false hits (e.g.,
> duplicate addresses like this between the default VRF and another VRF).
>
> I covered that and a few other things at OSS 2017. Latest VRF slides for
> users:
>   http://schd.ws/hosted_files/ossna2017/fe/vrf-tutorial-oss.pdf


Re: v6/sit tunnels and VRFs

2017-10-29 Thread David Ahern
On 10/27/17 8:43 PM, Jeff Barnhill wrote:
> ping v4 loopback...
> 
> jeff@VM2:~$ ip route list vrf myvrf
> 127.0.0.0/8 dev myvrf proto kernel scope link src 127.0.0.1
> 192.168.200.0/24 via 192.168.210.3 dev enp0s8
> 192.168.210.0/24 dev enp0s8 proto kernel scope link src 192.168.210.2
> 
> Lookups shown in perf script were for table 255.  Is it necessary to
> put the l3mdev table first?  If I re-order the tables, it starts
> working:

Yes, we advise moving the local table down to avoid false hits (e.g.,
duplicate addresses like this between the default VRF and another VRF).

I covered that and a few other things at OSS 2017. Latest VRF slides for
users:
  http://schd.ws/hosted_files/ossna2017/fe/vrf-tutorial-oss.pdf


Re: v6/sit tunnels and VRFs

2017-10-27 Thread Jeff Barnhill
Your comments on the tunnel VRF and underlay VRF being different make
sense, and that approach is more flexible.  I think assigning the
dev/link to the same VRF as the tunnel master accomplishes what I
originally had in mind.

ping v4 loopback...

jeff@VM2:~$ ip route list vrf myvrf
127.0.0.0/8 dev myvrf proto kernel scope link src 127.0.0.1
192.168.200.0/24 via 192.168.210.3 dev enp0s8
192.168.210.0/24 dev enp0s8 proto kernel scope link src 192.168.210.2

Lookups shown in perf script were for table 255.  Is it necessary to
put the l3mdev table first?  If I re-order the tables, it starts
working:

jeff@VM2:~$ ip rule list
0:  from all lookup local
1000:   from all lookup [l3mdev-table]
32766:  from all lookup main
32767:  from all lookup default

jeff@VM2:~$ sudo ip rule del pref 0
jeff@VM2:~$ sudo ip rule add pref 32765 table 255

jeff@VM2:~$ ip rule list
1000:   from all lookup [l3mdev-table]
32765:  from all lookup local
32766:  from all lookup main
32767:  from all lookup default

jeff@VM2:~$ ping -c1 -I myvrf 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 myvrf: 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.029 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.029/0.029/0.029/0.000 ms


On Fri, Oct 27, 2017 at 6:53 PM, David Ahern  wrote:
> On 10/27/17 2:59 PM, Jeff Barnhill wrote:
>> With regard to this comment:
>>    You have a remote address with no qualification about which VRF to
>> use for the lookup.
>>
>> I was using this to enslave the tunnel:
>> sudo ip link set jtun vrf myvrf
>>
>> and assumed this would be enough to cause all tunnel traffic to be
>> part of this VRF.
>>
>> You are right about the tunnel link/dev configuration. (re-stating to
>> be sure we are saying the same thing)  I can use either the VRF device
>> or the v4 device and the packet will get sent.  If I use the VRF
>> device, when the reply packet is received, the tunnel is found
>> successfully.  If I use the v4 device, then I need the patch you
>> provided earlier to successfully look up the tunnel.  You mentioned
>> that both should be supported...Back to my previous comment about
>> enslaving the tunnel...shouldn't a tunnel being enslaved to a VRF also
>> be enough to allow sending and receiving on any device in that VRF
>> (that is, not having to configure the link/dev on the tunnel) ?
>
> Sure, it could be done that way. I guess the question is whether we want
> to allow the tunnel device and the underlying device to be in separate VRFs.
>
> I have always taken the approach that every device with an address can
> be in a separate routing domain. Here, that means the tunnel devices can
> be in 1 VRF, and the underlying connection in a separate VRF (or no VRF).
>
>>
>> FYI, with regard to the v4 ping, adding the loopback address/network
>> didn't work for me.  I'll see if I can get any more info on this and
>> maybe start a new thread.
>>
>> Thanks,
>> Jeff
>>
>>
>> jeff@VM2:~$ ping -I myvrf 127.0.0.1
>> ping: Warning: source address might be selected on device other than myvrf.
>> PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 myvrf: 56(84) bytes of data.
>> ^C
>> --- 127.0.0.1 ping statistics ---
>> 3 packets transmitted, 0 received, 100% packet loss, time 2051ms
>>
>> jeff@VM2:~$ sudo ip addr add dev myvrf 127.0.0.1/8
>> jeff@VM2:~$ ping -I myvrf 127.0.0.1
>> PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 myvrf: 56(84) bytes of data.
>> ^C
>> --- 127.0.0.1 ping statistics ---
>> 7 packets transmitted, 0 received, 100% packet loss, time 6149ms
>>
>> jeff@VM2:~$ ip addr list myvrf
>> 4: myvrf:  mtu 65536 qdisc noqueue state UP
>> group default qlen 1000
>> link/ether 82:2b:08:8f:a9:f3 brd ff:ff:ff:ff:ff:ff
>> inet 127.0.0.1/8 scope host myvrf
>>valid_lft forever preferred_lft forever
>
> This is v4.14 kernel? Something else? The 127.0.0.1 address on vrf
> device has been supported since the v4.9 kernel.
>
> ip ro ls vrf myvrf?
>
>
> perf record -e fib:* -a -g -- ping -I myvrf -c1 -w1 127.0.0.1
> perf script
> --> does it show lookups in the correct table for this address?


Re: v6/sit tunnels and VRFs

2017-10-27 Thread David Ahern
On 10/27/17 2:59 PM, Jeff Barnhill wrote:
> With regard to this comment:
>    You have a remote address with no qualification about which VRF to
> use for the lookup.
> 
> I was using this to enslave the tunnel:
> sudo ip link set jtun vrf myvrf
> 
> and assumed this would be enough to cause all tunnel traffic to be
> part of this VRF.
> 
> You are right about the tunnel link/dev configuration. (re-stating to
> be sure we are saying the same thing)  I can use either the VRF device
> or the v4 device and the packet will get sent.  If I use the VRF
> device, when the reply packet is received, the tunnel is found
> successfully.  If I use the v4 device, then I need the patch you
> provided earlier to successfully look up the tunnel.  You mentioned
> that both should be supported...Back to my previous comment about
> enslaving the tunnel...shouldn't a tunnel being enslaved to a VRF also
> be enough to allow sending and receiving on any device in that VRF
> (that is, not having to configure the link/dev on the tunnel) ?

Sure, it could be done that way. I guess the question is whether we want
to allow the tunnel device and the underlying device to be in separate VRFs.

I have always taken the approach that every device with an address can
be in a separate routing domain. Here, that means the tunnel devices can
be in 1 VRF, and the underlying connection in a separate VRF (or no VRF).

> 
> FYI, with regard to the v4 ping, adding the loopback address/network
> didn't work for me.  I'll see if I can get any more info on this and
> maybe start a new thread.
> 
> Thanks,
> Jeff
> 
> 
> jeff@VM2:~$ ping -I myvrf 127.0.0.1
> ping: Warning: source address might be selected on device other than myvrf.
> PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 myvrf: 56(84) bytes of data.
> ^C
> --- 127.0.0.1 ping statistics ---
> 3 packets transmitted, 0 received, 100% packet loss, time 2051ms
> 
> jeff@VM2:~$ sudo ip addr add dev myvrf 127.0.0.1/8
> jeff@VM2:~$ ping -I myvrf 127.0.0.1
> PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 myvrf: 56(84) bytes of data.
> ^C
> --- 127.0.0.1 ping statistics ---
> 7 packets transmitted, 0 received, 100% packet loss, time 6149ms
> 
> jeff@VM2:~$ ip addr list myvrf
> 4: myvrf:  mtu 65536 qdisc noqueue state UP
> group default qlen 1000
> link/ether 82:2b:08:8f:a9:f3 brd ff:ff:ff:ff:ff:ff
> inet 127.0.0.1/8 scope host myvrf
>valid_lft forever preferred_lft forever

This is v4.14 kernel? Something else? The 127.0.0.1 address on vrf
device has been supported since the v4.9 kernel.

ip ro ls vrf myvrf?


perf record -e fib:* -a -g -- ping -I myvrf -c1 -w1 127.0.0.1
perf script
--> does it show lookups in the correct table for this address?


Re: v6/sit tunnels and VRFs

2017-10-27 Thread Jeff Barnhill
With regard to this comment:
   You have a remote address with no qualification about which VRF to
use for the lookup.

I was using this to enslave the tunnel:
sudo ip link set jtun vrf myvrf

and assumed this would be enough to cause all tunnel traffic to be
part of this VRF.

You are right about the tunnel link/dev configuration. (re-stating to
be sure we are saying the same thing)  I can use either the VRF device
or the v4 device and the packet will get sent.  If I use the VRF
device, when the reply packet is received, the tunnel is found
successfully.  If I use the v4 device, then I need the patch you
provided earlier to successfully look up the tunnel.  You mentioned
that both should be supported...Back to my previous comment about
enslaving the tunnel...shouldn't a tunnel being enslaved to a VRF also
be enough to allow sending and receiving on any device in that VRF
(that is, not having to configure the link/dev on the tunnel) ?

FYI, with regard to the v4 ping, adding the loopback address/network
didn't work for me.  I'll see if I can get any more info on this and
maybe start a new thread.

Thanks,
Jeff


jeff@VM2:~$ ping -I myvrf 127.0.0.1
ping: Warning: source address might be selected on device other than myvrf.
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 myvrf: 56(84) bytes of data.
^C
--- 127.0.0.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2051ms

jeff@VM2:~$ sudo ip addr add dev myvrf 127.0.0.1/8
jeff@VM2:~$ ping -I myvrf 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 myvrf: 56(84) bytes of data.
^C
--- 127.0.0.1 ping statistics ---
7 packets transmitted, 0 received, 100% packet loss, time 6149ms

jeff@VM2:~$ ip addr list myvrf
4: myvrf:  mtu 65536 qdisc noqueue state UP
group default qlen 1000
link/ether 82:2b:08:8f:a9:f3 brd ff:ff:ff:ff:ff:ff
inet 127.0.0.1/8 scope host myvrf
   valid_lft forever preferred_lft forever

On Fri, Oct 27, 2017 at 12:25 PM, David Ahern  wrote:
> On 10/26/17 11:19 PM, Jeff Barnhill wrote:
>> Thanks, David.
>>
>> I corrected the static route, applied the patch, and set the
>> link/output dev on the tunnel and it works now.  Is it required to set
>> the link/output dev?  I was thinking that this should not be required
>> for cases where the outgoing device is not known, for instance on a
>> router or device with multiple interfaces.
>
> In a VRF environment, addresses have to be qualified with a VRF. Running
> the command:
>
> ip tunnel add jtun mode sit remote 192.168.200.1 local 192.168.210.2
>
> You have a remote address with no qualification about which VRF to use
> for the lookup.
>
> Digging into the sit code and how the link parameter is used, the
> existing code works just fine if you use:
>
> sudo ip tunnel add jtun mode sit remote 192.168.200.1 local
> 192.168.210.2 dev myvrf
>
> The tunnel link parameter is myvrf and it is used for tunnel lookups and
> route lookups to forward the packet. So, really I should allow both
> cases -- dev is a VRF and dev is a link that could be in a vrf.
>
>>
>> Also, what is the expected behavior of loopback addresses in a VRF
>> context?  For instance, if an application were being run under "ip vrf
>> exec" and it tried to use these addresses.
>
> The 127.0.0.1 address needs to exist in the VRF. The one on 'lo' is in
> the default VRF only. VRF devices act as the VRF-local loopback, so you
> can put the IPv4 loopback address on it.
>
>>
>> jeff@VM2:~$ ping -I myvrf 127.0.0.1
>> PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 myvrf: 56(84) bytes of data.
>> ^C
>> --- 127.0.0.1 ping statistics ---
>> 3 packets transmitted, 0 received, 100% packet loss, time 2033ms
>
> If you add the loopback address to the VRF device:
> ip addr add dev myvrf 127.0.0.1/8
>
> root@kenny-jessie3:~# ip addr add dev myvrf 127.0.0.1/8
> root@kenny-jessie3:~# ping -I myvrf 127.0.0.1
> PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 myvrf: 56(84) bytes of data.
> 64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.045 ms
> ^C
> --- 127.0.0.1 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 0.045/0.045/0.045/0.000 ms
>
>
>>
>> jeff@VM2:~$ ping -I myvrf ::1
>> connect: Network is unreachable
>
> I need to add support for ::1/128 on a VRF device.


Re: v6/sit tunnels and VRFs

2017-10-27 Thread David Ahern
On 10/26/17 11:19 PM, Jeff Barnhill wrote:
> Thanks, David.
> 
> I corrected the static route, applied the patch, and set the
> link/output dev on the tunnel and it works now.  Is it required to set
> the link/output dev?  I was thinking that this should not be required
> for cases where the outgoing device is not known, for instance on a
> router or device with multiple interfaces.

In a VRF environment, addresses have to be qualified with a VRF. Running
the command:

ip tunnel add jtun mode sit remote 192.168.200.1 local 192.168.210.2

You have a remote address with no qualification about which VRF to use
for the lookup.

Digging into the sit code and how the link parameter is used, the
existing code works just fine if you use:

sudo ip tunnel add jtun mode sit remote 192.168.200.1 local
192.168.210.2 dev myvrf

The tunnel link parameter is myvrf and it is used for tunnel lookups and
route lookups to forward the packet. So, really I should allow both
cases -- dev is a VRF and dev is a link that could be in a vrf.

> 
> Also, what is the expected behavior of loopback addresses in a VRF
> context?  For instance, if an application were being run under "ip vrf
> exec" and it tried to use these addresses.

The 127.0.0.1 address needs to exist in the VRF. The one on 'lo' is in
the default VRF only. VRF devices act as the VRF-local loopback, so you
can put the IPv4 loopback address on it.

> 
> jeff@VM2:~$ ping -I myvrf 127.0.0.1
> PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 myvrf: 56(84) bytes of data.
> ^C
> --- 127.0.0.1 ping statistics ---
> 3 packets transmitted, 0 received, 100% packet loss, time 2033ms

If you add the loopback address to the VRF device:
ip addr add dev myvrf 127.0.0.1/8

root@kenny-jessie3:~# ip addr add dev myvrf 127.0.0.1/8
root@kenny-jessie3:~# ping -I myvrf 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 myvrf: 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.045 ms
^C
--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.045/0.045/0.045/0.000 ms


> 
> jeff@VM2:~$ ping -I myvrf ::1
> connect: Network is unreachable

I need to add support for ::1/128 on a VRF device.


Re: v6/sit tunnels and VRFs

2017-10-26 Thread Jeff Barnhill
Thanks, David.

I corrected the static route, applied the patch, and set the
link/output dev on the tunnel and it works now.  Is it required to set
the link/output dev?  I was thinking that this should not be required
for cases where the outgoing device is not known, for instance on a
router or device with multiple interfaces.

Also, what is the expected behavior of loopback addresses in a VRF
context?  For instance, if an application were being run under "ip vrf
exec" and it tried to use these addresses.

jeff@VM2:~$ ping -I myvrf 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 myvrf: 56(84) bytes of data.
^C
--- 127.0.0.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2033ms

jeff@VM2:~$ ping -I myvrf ::1
connect: Network is unreachable

Thanks,
Jeff


On Thu, Oct 26, 2017 at 1:24 PM, David Ahern  wrote:
> On 10/25/17 9:28 PM, Jeff Barnhill wrote:
>> Thanks, David.
>>
>> VM1:
>> sudo ip addr add 192.168.200.1/24 dev enp0s8 broadcast 192.168.200.255
>> sudo ip link set enp0s8 up
>> sudo ip route add 192.168.210.0/24 nexthop via 192.168.200.3 dev enp0s8
>> sudo ip tunnel add jtun mode sit remote 192.168.210.2 local 192.168.200.1
>> sudo ip -6 addr add 2001::1/64 dev jtun
>> sudo ip link set jtun up
>>
>> VM2:
>> sudo ip addr add 192.168.210.2/24 dev enp0s8 broadcast 192.168.210.255
>> sudo ip link set enp0s8 up
>> sudo ip route add 192.168.200.0/24 nexthop via 192.168.210.3 dev enp0s8
>> sudo ip link add dev myvrf type vrf table 256
>> sudo ip link set myvrf up
>> sudo ip link set enp0s8 vrf myvrf
>
> You lost the static route by doing the enslaving here. When the device
> is added to or removed from a VRF it is cycled specifically to dump
> routes and neighbor entries associated with the prior vrf. Always create
> the vrf and enslave first, then add routes:
>
> sudo ip link add dev myvrf type vrf table 256
> sudo ip link set myvrf up
> sudo ip link set enp0s8 vrf myvrf
>
> sudo ip addr add 192.168.210.2/24 dev enp0s8 broadcast 192.168.210.255
> sudo ip link set enp0s8 up
> sudo ip route add 192.168.200.0/24 nexthop via 192.168.210.3 dev enp0s8
>
> That said, the above works for the wrong reason -- it is not really
> doing VRF based routing. For that to happen, the static route should be
> added to the vrf table:
>
> sudo ip route add vrf myvrf 192.168.200.0/24 nexthop via 192.168.210.3
> dev enp0s8
>
> And ...
>
>> sudo ip tunnel add jtun mode sit remote 192.168.200.1 local 192.168.210.2
>
> you need to specify the link on the tunnel create:
>
> sudo ip tunnel add jtun mode sit remote 192.168.200.1 local
> 192.168.210.2 dev enp0s8
>
> And ...
>
> The tunnel lookup needs to account for the VRF device switch:
>
> (whitespace damaged on paste)
>
> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index a799f5258614..cf0512054fa7 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -632,11 +632,18 @@ static bool packet_is_spoofed(struct sk_buff *skb,
>  static int ipip6_rcv(struct sk_buff *skb)
>  {
>  	const struct iphdr *iph = ip_hdr(skb);
> +	struct net_device *dev = skb->dev;
> +	struct net *net = dev_net(dev);
>  	struct ip_tunnel *tunnel;
>  	int err;
>  
> -	tunnel = ipip6_tunnel_lookup(dev_net(skb->dev), skb->dev,
> -				     iph->saddr, iph->daddr);
> +	if (netif_is_l3_master(dev)) {
> +		dev = dev_get_by_index_rcu(net, IPCB(skb)->iif);
> +		if (!dev)
> +			goto out;
> +	}
> +
> +	tunnel = ipip6_tunnel_lookup(net, dev, iph->saddr, iph->daddr);
>  	if (tunnel) {
>  		struct pcpu_sw_netstats *tstats;
>


Re: v6/sit tunnels and VRFs

2017-10-26 Thread David Ahern
On 10/25/17 9:28 PM, Jeff Barnhill wrote:
> Thanks, David.
> 
> VM1:
> sudo ip addr add 192.168.200.1/24 dev enp0s8 broadcast 192.168.200.255
> sudo ip link set enp0s8 up
> sudo ip route add 192.168.210.0/24 nexthop via 192.168.200.3 dev enp0s8
> sudo ip tunnel add jtun mode sit remote 192.168.210.2 local 192.168.200.1
> sudo ip -6 addr add 2001::1/64 dev jtun
> sudo ip link set jtun up
> 
> VM2:
> sudo ip addr add 192.168.210.2/24 dev enp0s8 broadcast 192.168.210.255
> sudo ip link set enp0s8 up
> sudo ip route add 192.168.200.0/24 nexthop via 192.168.210.3 dev enp0s8
> sudo ip link add dev myvrf type vrf table 256
> sudo ip link set myvrf up
> sudo ip link set enp0s8 vrf myvrf

You lost the static route by doing the enslaving here. When the device
is added to or removed from a VRF it is cycled specifically to dump
routes and neighbor entries associated with the prior vrf. Always create
the vrf and enslave first, then add routes:

sudo ip link add dev myvrf type vrf table 256
sudo ip link set myvrf up
sudo ip link set enp0s8 vrf myvrf

sudo ip addr add 192.168.210.2/24 dev enp0s8 broadcast 192.168.210.255
sudo ip link set enp0s8 up
sudo ip route add 192.168.200.0/24 nexthop via 192.168.210.3 dev enp0s8

That said, the above works for the wrong reason -- it is not really
doing VRF based routing. For that to happen, the static route should be
added to the vrf table:

sudo ip route add vrf myvrf 192.168.200.0/24 nexthop via 192.168.210.3
dev enp0s8
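
As a sanity check (assuming the `myvrf`/table 256 names used above), the VRF-scoped route can be confirmed like this; the 192.168.200.0/24 route should appear in the VRF table rather than in the main table:

```shell
# List routes scoped to the VRF (equivalently, its routing table).
ip route show vrf myvrf
ip route show table 256
```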

And ...

> sudo ip tunnel add jtun mode sit remote 192.168.200.1 local 192.168.210.2

you need to specify the link on the tunnel create:

sudo ip tunnel add jtun mode sit remote 192.168.200.1 local
192.168.210.2 dev enp0s8

And ...

The tunnel lookup needs to account for the VRF device switch:

(whitespace damaged on paste)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index a799f5258614..cf0512054fa7 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -632,11 +632,18 @@ static bool packet_is_spoofed(struct sk_buff *skb,
 static int ipip6_rcv(struct sk_buff *skb)
 {
 	const struct iphdr *iph = ip_hdr(skb);
+	struct net_device *dev = skb->dev;
+	struct net *net = dev_net(dev);
 	struct ip_tunnel *tunnel;
 	int err;
 
-	tunnel = ipip6_tunnel_lookup(dev_net(skb->dev), skb->dev,
-				     iph->saddr, iph->daddr);
+	if (netif_is_l3_master(dev)) {
+		dev = dev_get_by_index_rcu(net, IPCB(skb)->iif);
+		if (!dev)
+			goto out;
+	}
+
+	tunnel = ipip6_tunnel_lookup(net, dev, iph->saddr, iph->daddr);
 	if (tunnel) {
 		struct pcpu_sw_netstats *tstats;
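
Pulling the three fixes above together (create the VRF and enslave before adding routes, put the static route in the VRF table, and bind the tunnel to the link device), a corrected VM2 setup would look roughly like the following. This is a sketch reusing the names and addresses from the thread, not a tested recipe:

```shell
# 1) Create the VRF and enslave enp0s8 first; enslaving cycles the
#    device and flushes routes tied to the prior VRF, so order matters.
sudo ip link add dev myvrf type vrf table 256
sudo ip link set myvrf up
sudo ip link set enp0s8 vrf myvrf
sudo ip addr add 192.168.210.2/24 dev enp0s8 broadcast 192.168.210.255
sudo ip link set enp0s8 up

# 2) Add the static route to the VRF table, not the main table.
sudo ip route add vrf myvrf 192.168.200.0/24 nexthop via 192.168.210.3 dev enp0s8

# 3) Specify the link device on tunnel creation so the sit lookup can
#    account for the VRF, then enslave the tunnel itself.
sudo ip tunnel add jtun mode sit remote 192.168.200.1 local 192.168.210.2 dev enp0s8
sudo ip link set jtun vrf myvrf
sudo ip -6 addr add 2001::2/64 dev jtun
sudo ip link set jtun up
```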



Re: v6/sit tunnels and VRFs

2017-10-25 Thread Jeff Barnhill
Thanks, David.

VM1:
sudo ip addr add 192.168.200.1/24 dev enp0s8 broadcast 192.168.200.255
sudo ip link set enp0s8 up
sudo ip route add 192.168.210.0/24 nexthop via 192.168.200.3 dev enp0s8
sudo ip tunnel add jtun mode sit remote 192.168.210.2 local 192.168.200.1
sudo ip -6 addr add 2001::1/64 dev jtun
sudo ip link set jtun up

VM2:
sudo ip addr add 192.168.210.2/24 dev enp0s8 broadcast 192.168.210.255
sudo ip link set enp0s8 up
sudo ip route add 192.168.200.0/24 nexthop via 192.168.210.3 dev enp0s8
sudo ip link add dev myvrf type vrf table 256
sudo ip link set myvrf up
sudo ip link set enp0s8 vrf myvrf
sudo ip tunnel add jtun mode sit remote 192.168.200.1 local 192.168.210.2
sudo ip link set jtun vrf myvrf
sudo ip -6 addr add 2001::2/64 dev jtun
sudo ip link set jtun up

VM3:
sudo ip addr add 192.168.200.3/24 dev enp0s8 broadcast 192.168.200.255
sudo ip addr add 192.168.210.3/24 dev enp0s9 broadcast 192.168.210.255
sudo ip link set enp0s8 up
sudo ip link set enp0s9 up
sudo sysctl net.ipv4.conf.enp0s8.forwarding=1
sudo sysctl net.ipv4.conf.enp0s9.forwarding=1

jeff@VM2:~$ ping -c 3 -I jtun 2001::1
PING 2001::1(2001::1) from 2001::2 jtun: 56 data bytes
From 2001::2 icmp_seq=1 Destination unreachable: Address unreachable
From 2001::2 icmp_seq=2 Destination unreachable: Address unreachable
From 2001::2 icmp_seq=3 Destination unreachable: Address unreachable

--- 2001::1 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2039ms

jeff@VM2:~$ ping -c 3 -I myvrf 2001::1
ping6: Warning: source address might be selected on device other than myvrf.
PING 2001::1(2001::1) from 2001::2 myvrf: 56 data bytes
From 2001::2 icmp_seq=1 Destination unreachable: Address unreachable
From 2001::2 icmp_seq=2 Destination unreachable: Address unreachable
From 2001::2 icmp_seq=3 Destination unreachable: Address unreachable

--- 2001::1 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2045ms

Let me know if you have any questions or if you think I've done something wrong.

Thanks,
Jeff


On Wed, Oct 25, 2017 at 5:31 PM, David Ahern  wrote:
> On 10/25/17 2:45 PM, Jeff Barnhill wrote:
>> Are v6/sit tunnels working with VRFs?
>>
>> For instance, I have a very simple configuration with three VMs
running 4.13.0-16 (Ubuntu Server 17.10) kernels.  VM3 is set up as a
>> router for separation.  VM1 and VM2 have static routes to each other
>> via VM3.  All VMs have v4 interfaces configured.  If I setup a sit
tunnel with v6 addrs from VM1 to VM2, tunneled data flows as expected
>> (verified with ping) and can be seen via tcpdump on VM3.  However, if
>> I create a VRF on VM2 and enslave the v4 interface and tunnel to that
>> VRF, data does not leave VM2 and ping displays "Destination Host
>> Unreachable".  I did verify that basic v4 ping works between VM1 and
>> VM2 with the v4 interface on VM2 enslaved to VRF device.
>>
>> If this should work, I can provide more details with configuration commands.
>
> Please provide configuration details and I'll take a look


Re: v6/sit tunnels and VRFs

2017-10-25 Thread David Ahern
On 10/25/17 2:45 PM, Jeff Barnhill wrote:
> Are v6/sit tunnels working with VRFs?
> 
> For instance, I have a very simple configuration with three VMs
> running 4.13.0-16 (Ubuntu Server 17.10) kernels.  VM3 is set up as a
> router for separation.  VM1 and VM2 have static routes to each other
> via VM3.  All VMs have v4 interfaces configured.  If I setup a sit
> tunnel with v6 addrs from VM1 to VM2, tunneled data flows as expected
> (verified with ping) and can be seen via tcpdump on VM3.  However, if
> I create a VRF on VM2 and enslave the v4 interface and tunnel to that
> VRF, data does not leave VM2 and ping displays "Destination Host
> Unreachable".  I did verify that basic v4 ping works between VM1 and
> VM2 with the v4 interface on VM2 enslaved to VRF device.
> 
> If this should work, I can provide more details with configuration commands.

Please provide configuration details and I'll take a look