Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge

2018-09-07 Thread D'Souza, Nelson
Thanks, David and Ido, for finding the root cause of the bridge Rx packets getting 
dropped, and for coming up with a patch.

Regards,
Nelson

On 9/7/18, 9:09 AM, "David Ahern"  wrote:

On 9/7/18 9:56 AM, D'Souza, Nelson wrote:

> 
> *From:* David Ahern 
> *Sent:* Thursday, September 6, 2018 5:27 PM
    > *To:* D'Souza, Nelson; netdev@vger.kernel.org
> *Subject:* Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
>  
> On 9/5/18 12:00 PM, D'Souza, Nelson wrote:
>> Just following up: would you be able to confirm that this is a
> Linux VRF issue?
> 
> I can confirm that I can reproduce the problem. Need to find time to dig
> into it.

The bridge netfilter hook is dropping the packet.

The bridge netfilter code registers hook operations that are invoked when
nf_hook is called; it then sees all subsequent calls to nf_hook.

Packet wise, the bridge netfilter hook runs first. br_nf_pre_routing
allocates nf_bridge, sets in_prerouting to 1 and calls NF_HOOK for
NF_INET_PRE_ROUTING. Its finish function, br_nf_pre_routing_finish,
then resets the in_prerouting flag to 0. Any subsequent call to nf_hook
invokes ip_sabotage_in. That function sees that in_prerouting is not
set and steals (drops) the packet.

The simplest change is to have ip_sabotage_in recognize that the bridge
can be enslaved to a VRF (L3 master device) and allow the packet to
continue.

Thanks to Ido for the hint on ip_sabotage_in.

This patch works for me:

diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 6e0dc6bcd32a..37278dc280eb 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -835,7 +835,8 @@ static unsigned int ip_sabotage_in(void *priv,
 				   struct sk_buff *skb,
 				   const struct nf_hook_state *state)
 {
-	if (skb->nf_bridge && !skb->nf_bridge->in_prerouting) {
+	if (skb->nf_bridge && !skb->nf_bridge->in_prerouting &&
+	    !netif_is_l3_master(skb->dev)) {
 		state->okfn(state->net, state->sk, skb);
 		return NF_STOLEN;
 	}
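
With that change applied (rebuilding the br_netfilter module or the kernel),
the failing case reported earlier in the thread should pass, e.g. re-running
the test from the br0/test-vrf setup quoted below:

  ping -I br0 172.16.2.2 -c 2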




Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge

2018-09-05 Thread D'Souza, Nelson
Hi David,

Just following up: would you be able to confirm that this is a Linux VRF 
issue? 

Also, how do I log a VRF-related defect to ensure this gets resolved in a 
subsequent release?

Thanks,
Nelson

On 8/2/18, 4:12 PM, "D'Souza, Nelson"  wrote:

Hi David,

Turns out the VRF bridge Rx issue is triggered by a docker install.

Docker makes the following sysctl changes:
  net.bridge.bridge-nf-call-arptables = 1
  net.bridge.bridge-nf-call-ip6tables = 1
  net.bridge.bridge-nf-call-iptables = 1 <<< exposes the IPv4 VRF Rx 
issue when a bridge is enslaved to a VRF

which cause packets flowing through all bridges to be subjected to 
netfilter rules. This is required for bridge netfilter when IP forwarding 
is enabled.

Please refer to 
https://github.com/docker/libnetwork/blob/master/drivers/bridge/setup_bridgenetfiltering.go#L53

Setting net.bridge.bridge-nf-call-iptables = 0 resolves the issue, but is 
not really a viable option given that bridge net filtering is a basic 
requirement in existing docker deployments.
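
For a quick test, the relevant setting can be checked and temporarily cleared
(sysctl names as above; this assumes the br_netfilter module is loaded):

  sysctl net.bridge.bridge-nf-call-iptables
  sysctl -w net.bridge.bridge-nf-call-iptables=0   # confirm Rx local delivery recovers
  sysctl -w net.bridge.bridge-nf-call-iptables=1   # restore the Docker default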

It's not clear to me why this conf setting breaks local Rx delivery for a 
bridge enslaved to a VRF, because these packets would always be sent up by the 
bridge for IP netfilter processing.

This issue is easily reproducible on an Ubuntu 18.04.1 VM. Simply 
installing docker will cause pings running on test-vrf to fail. Clearing the 
sysctl conf restores Rx local delivery.

Thanks,
Nelson

On 7/27/18, 4:29 PM, "D'Souza, Nelson"  wrote:

David,

With Ubuntu 18.04.1 (kernel 4.15.0-29) pings sent out on test-vrf and 
br0 are successful.

# uname -rv
4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018

# ping -c 1 -I test-vrf 172.16.2.2
ping: Warning: source address might be selected on device other than 
test-vrf.
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 test-vrf: 56(84) bytes of 
data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.050 ms

--- 172.16.2.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.050/0.050/0.050/0.000 ms

# ping -c 1 -I br0 172.16.2.2
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 br0: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.026 ms

--- 172.16.2.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.026/0.026/0.026/0.000 ms

However, with Ubuntu 17.10.1 (kernel 4.13.0-21) only pings on test-vrf 
are successful. Pings on br0 are not successful.
So it seems like there may be a change in kernel versions after 4.13.0-21 that 
allows pings on br0 to pass.

Nelson

    On 7/25/18, 5:35 PM, "D'Souza, Nelson"  wrote:

David, 

I tried out the commands on an Ubuntu 17.10.1 VM.
The pings on test-vrf are successful, but the pings on br0 are not 
successful.

# uname -rv  
4.13.0-21-generic #24-Ubuntu SMP Mon Dec 18 17:29:16 UTC 2017

 # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 17.10
Release:17.10
Codename:   artful

# ip rule  --> Note: it's missing the l3mdev rule
0:  from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default
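
(The kernel normally adds the l3mdev rule when the first VRF device is created,
so it should show up once the config below has run; it can be confirmed with the
command below, the expected entry being taken from a working setup elsewhere in
this thread:

  ip rule show
  # 1000:   from all lookup [l3mdev-table]
)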

Ran the configs from a bash script vrf.sh

 # ./vrf.sh 
+ ip netns add foo
+ ip li add veth1 type veth peer name veth2
+ ip li set veth2 netns foo
+ ip -netns foo li set lo up
+ ip -netns foo li set veth2 up
+ ip -netns foo addr add 172.16.1.2/24 dev veth2
+ ip li add test-vrf type vrf table 123
+ ip li set test-vrf up
+ ip ro add vrf test-vrf unreachable default
+ ip li add br0 type bridge
+ ip li set veth1 master br0
+ ip li set veth1 up
+ ip li set br0 up
+ ip addr add dev br0 172.16.1.1/24
+ ip li set br0 master test-vrf
+ ip -netns foo addr add 172.16.2.2/32 dev lo
+ ip ro add vrf test-vrf 172.16.2.2/32 via 172.16.1.2

# ping -I test-vrf 172.16.2.2 -c 2  <<< successful on test-vrf
ping: Warning: source address might be selected on device other 
than test-vrf.
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 test-vrf: 56(84) bytes 
of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64

Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge

2018-08-02 Thread D'Souza, Nelson
Hi David,

Turns out the VRF bridge Rx issue is triggered by a docker install.

Docker makes the following sysctl changes:
  net.bridge.bridge-nf-call-arptables = 1
  net.bridge.bridge-nf-call-ip6tables = 1
  net.bridge.bridge-nf-call-iptables = 1 <<< exposes the IPv4 VRF Rx issue 
when a bridge is enslaved to a VRF

which cause packets flowing through all bridges to be subjected to netfilter 
rules. This is required for bridge netfilter when IP forwarding is enabled.

Please refer to 
https://github.com/docker/libnetwork/blob/master/drivers/bridge/setup_bridgenetfiltering.go#L53

Setting net.bridge.bridge-nf-call-iptables = 0 resolves the issue, but is not 
really a viable option given that bridge net filtering is a basic requirement 
in existing docker deployments.

It's not clear to me why this conf setting breaks local Rx delivery for a 
bridge enslaved to a VRF, because these packets would always be sent up by the 
bridge for IP netfilter processing.

This issue is easily reproducible on an Ubuntu 18.04.1 VM. Simply installing 
docker will cause pings running on test-vrf to fail. Clearing the sysctl conf 
restores Rx local delivery.

Thanks,
Nelson

On 7/27/18, 4:29 PM, "D'Souza, Nelson"  wrote:

David,

With Ubuntu 18.04.1 (kernel 4.15.0-29) pings sent out on test-vrf and br0 
are successful.

# uname -rv
4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018

# ping -c 1 -I test-vrf 172.16.2.2
ping: Warning: source address might be selected on device other than 
test-vrf.
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 test-vrf: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.050 ms

--- 172.16.2.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.050/0.050/0.050/0.000 ms

# ping -c 1 -I br0 172.16.2.2
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 br0: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.026 ms

--- 172.16.2.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.026/0.026/0.026/0.000 ms

However, with Ubuntu 17.10.1 (kernel 4.13.0-21) only pings on test-vrf are 
successful. Pings on br0 are not successful.
So it seems like there may be a change in kernel versions after 4.13.0-21 that 
allows pings on br0 to pass.

Nelson

On 7/25/18, 5:35 PM, "D'Souza, Nelson"  wrote:

David, 

I tried out the commands on an Ubuntu 17.10.1 VM.
The pings on test-vrf are successful, but the pings on br0 are not 
successful.

# uname -rv  
4.13.0-21-generic #24-Ubuntu SMP Mon Dec 18 17:29:16 UTC 2017

 # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 17.10
Release:17.10
Codename:   artful

# ip rule  --> Note: it's missing the l3mdev rule
0:  from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default

Ran the configs from a bash script vrf.sh

 # ./vrf.sh 
+ ip netns add foo
+ ip li add veth1 type veth peer name veth2
+ ip li set veth2 netns foo
+ ip -netns foo li set lo up
+ ip -netns foo li set veth2 up
+ ip -netns foo addr add 172.16.1.2/24 dev veth2
+ ip li add test-vrf type vrf table 123
+ ip li set test-vrf up
+ ip ro add vrf test-vrf unreachable default
+ ip li add br0 type bridge
+ ip li set veth1 master br0
+ ip li set veth1 up
+ ip li set br0 up
+ ip addr add dev br0 172.16.1.1/24
+ ip li set br0 master test-vrf
+ ip -netns foo addr add 172.16.2.2/32 dev lo
+ ip ro add vrf test-vrf 172.16.2.2/32 via 172.16.1.2

# ping -I test-vrf 172.16.2.2 -c 2  <<< successful on test-vrf
ping: Warning: source address might be selected on device other than 
test-vrf.
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 test-vrf: 56(84) bytes of 
data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 172.16.2.2: icmp_seq=2 ttl=64 time=0.045 ms

--- 172.16.2.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1022ms
rtt min/avg/max/mdev = 0.035/0.040/0.045/0.005 ms

#ping -I br0 172.16.2.2 -c 2   <<< fails on br0
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 br0: 56(84) bytes of data.

--- 172.16.2.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1022ms

Please let me know if I should try a different version.

Nelson

On 7/24/18, 9:08 AM, &

Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge

2018-07-27 Thread D'Souza, Nelson
David,

With Ubuntu 18.04.1 (kernel 4.15.0-29) pings sent out on test-vrf and br0 are 
successful.

# uname -rv
4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018

# ping -c 1 -I test-vrf 172.16.2.2
ping: Warning: source address might be selected on device other than test-vrf.
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 test-vrf: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.050 ms

--- 172.16.2.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.050/0.050/0.050/0.000 ms

# ping -c 1 -I br0 172.16.2.2
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 br0: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.026 ms

--- 172.16.2.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.026/0.026/0.026/0.000 ms

However, with Ubuntu 17.10.1 (kernel 4.13.0-21) only pings on test-vrf are 
successful. Pings on br0 are not successful.
So it seems like there may be a change in kernel versions after 4.13.0-21 that allows 
pings on br0 to pass.

Nelson

On 7/25/18, 5:35 PM, "D'Souza, Nelson"  wrote:

David, 

I tried out the commands on an Ubuntu 17.10.1 VM.
The pings on test-vrf are successful, but the pings on br0 are not 
successful.

# uname -rv  
4.13.0-21-generic #24-Ubuntu SMP Mon Dec 18 17:29:16 UTC 2017

 # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 17.10
Release:17.10
Codename:   artful

# ip rule  --> Note: it's missing the l3mdev rule
0:  from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default

Ran the configs from a bash script vrf.sh

 # ./vrf.sh 
+ ip netns add foo
+ ip li add veth1 type veth peer name veth2
+ ip li set veth2 netns foo
+ ip -netns foo li set lo up
+ ip -netns foo li set veth2 up
+ ip -netns foo addr add 172.16.1.2/24 dev veth2
+ ip li add test-vrf type vrf table 123
+ ip li set test-vrf up
+ ip ro add vrf test-vrf unreachable default
+ ip li add br0 type bridge
+ ip li set veth1 master br0
+ ip li set veth1 up
+ ip li set br0 up
+ ip addr add dev br0 172.16.1.1/24
+ ip li set br0 master test-vrf
+ ip -netns foo addr add 172.16.2.2/32 dev lo
+ ip ro add vrf test-vrf 172.16.2.2/32 via 172.16.1.2

# ping -I test-vrf 172.16.2.2 -c 2  <<< successful on test-vrf
ping: Warning: source address might be selected on device other than 
test-vrf.
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 test-vrf: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 172.16.2.2: icmp_seq=2 ttl=64 time=0.045 ms

--- 172.16.2.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1022ms
rtt min/avg/max/mdev = 0.035/0.040/0.045/0.005 ms

#ping -I br0 172.16.2.2 -c 2   <<< fails on br0
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 br0: 56(84) bytes of data.

--- 172.16.2.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1022ms

Please let me know if I should try a different version.

Nelson

    On 7/24/18, 9:08 AM, "D'Souza, Nelson"  wrote:

It's strange that enslaving eth1 -> br0 -> test-vrf does not work, but 
enslaving eth1->test-vrf works fine.

Nelson

On 7/24/18, 8:58 AM, "D'Souza, Nelson"  wrote:

Thank you David, really appreciate the help. Most likely something 
specific to my environment.

ip vrf id, does not report anything on my system. Here's the result 
after running the command.

# ip vrf id
#

I'll follow up with a VM.

Nelson

On 7/24/18, 5:55 AM, "David Ahern"  wrote:

On 7/23/18 7:43 PM, D'Souza, Nelson wrote:
> I copy and pasted the configs onto my device, but pings on 
test-vrf do not work in my setup. 
> I'm essentially seeing the same issue as I reported before.
> 
> In this case, pings sent out on test-vrf (host ns) are 
received and replied to by the loopback interface (foo ns). Although the 
replies are seen at the test-vrf level, they are not locally delivered to the 
ping application.
> 

I just built v4.14.52 kernel and ran those commands - worked 
fine. It is
something specific to your environment. Is your shell tied to a 
VRF --
(ip vrf id)?

After that, I suggest you create a VM running a newer 
distribution of

Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge

2018-07-25 Thread D'Souza, Nelson
David,

To narrow down the issue, our kernel team has asked for the following 
information:

"Can you clarify what kernel configuration was used for the clean 4.14.52 
kernel (no changes)

 The kernel configuration may be available in /proc/config.gz, or it might be 
available as a text file in the /boot directory."

Would you be able to provide this?

Nelson

On 7/25/18, 5:35 PM, "D'Souza, Nelson"  wrote:

David, 

I tried out the commands on an Ubuntu 17.10.1 VM.
The pings on test-vrf are successful, but the pings on br0 are not 
successful.

# uname -rv  
4.13.0-21-generic #24-Ubuntu SMP Mon Dec 18 17:29:16 UTC 2017

 # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 17.10
Release:17.10
Codename:   artful

# ip rule  --> Note: it's missing the l3mdev rule
0:  from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default

Ran the configs from a bash script vrf.sh

 # ./vrf.sh 
+ ip netns add foo
+ ip li add veth1 type veth peer name veth2
+ ip li set veth2 netns foo
+ ip -netns foo li set lo up
+ ip -netns foo li set veth2 up
+ ip -netns foo addr add 172.16.1.2/24 dev veth2
+ ip li add test-vrf type vrf table 123
+ ip li set test-vrf up
+ ip ro add vrf test-vrf unreachable default
+ ip li add br0 type bridge
+ ip li set veth1 master br0
+ ip li set veth1 up
+ ip li set br0 up
+ ip addr add dev br0 172.16.1.1/24
+ ip li set br0 master test-vrf
+ ip -netns foo addr add 172.16.2.2/32 dev lo
+ ip ro add vrf test-vrf 172.16.2.2/32 via 172.16.1.2

# ping -I test-vrf 172.16.2.2 -c 2  <<< successful on test-vrf
ping: Warning: source address might be selected on device other than 
test-vrf.
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 test-vrf: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 172.16.2.2: icmp_seq=2 ttl=64 time=0.045 ms

--- 172.16.2.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1022ms
rtt min/avg/max/mdev = 0.035/0.040/0.045/0.005 ms

#ping -I br0 172.16.2.2 -c 2   <<< fails on br0
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 br0: 56(84) bytes of data.

--- 172.16.2.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1022ms

Please let me know if I should try a different version.

    Nelson

On 7/24/18, 9:08 AM, "D'Souza, Nelson"  wrote:

It's strange that enslaving eth1 -> br0 -> test-vrf does not work, but 
enslaving eth1->test-vrf works fine.
    
    Nelson

On 7/24/18, 8:58 AM, "D'Souza, Nelson"  wrote:

Thank you David, really appreciate the help. Most likely something 
specific to my environment.

ip vrf id, does not report anything on my system. Here's the result 
after running the command.

# ip vrf id
#

I'll follow up with a VM.

Nelson

On 7/24/18, 5:55 AM, "David Ahern"  wrote:

On 7/23/18 7:43 PM, D'Souza, Nelson wrote:
> I copy and pasted the configs onto my device, but pings on 
test-vrf do not work in my setup. 
> I'm essentially seeing the same issue as I reported before.
> 
> In this case, pings sent out on test-vrf (host ns) are 
received and replied to by the loopback interface (foo ns). Although the 
replies are seen at the test-vrf level, they are not locally delivered to the 
ping application.
> 

I just built v4.14.52 kernel and ran those commands - worked 
fine. It is
something specific to your environment. Is your shell tied to a 
VRF --
(ip vrf id)?

After that, I suggest you create a VM running a newer 
distribution of
your choice (Ubuntu 17.10 or newer, debian stretch with 4.14 
kernel, or
Fedora 26 or newer) and run the commands there.










Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge

2018-07-25 Thread D'Souza, Nelson
David, 

I tried out the commands on an Ubuntu 17.10.1 VM.
The pings on test-vrf are successful, but the pings on br0 are not successful.

# uname -rv  
4.13.0-21-generic #24-Ubuntu SMP Mon Dec 18 17:29:16 UTC 2017

 # lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:Ubuntu 17.10
Release:17.10
Codename:   artful

# ip rule  --> Note: it's missing the l3mdev rule
0:  from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default

Ran the configs from a bash script vrf.sh

 # ./vrf.sh 
+ ip netns add foo
+ ip li add veth1 type veth peer name veth2
+ ip li set veth2 netns foo
+ ip -netns foo li set lo up
+ ip -netns foo li set veth2 up
+ ip -netns foo addr add 172.16.1.2/24 dev veth2
+ ip li add test-vrf type vrf table 123
+ ip li set test-vrf up
+ ip ro add vrf test-vrf unreachable default
+ ip li add br0 type bridge
+ ip li set veth1 master br0
+ ip li set veth1 up
+ ip li set br0 up
+ ip addr add dev br0 172.16.1.1/24
+ ip li set br0 master test-vrf
+ ip -netns foo addr add 172.16.2.2/32 dev lo
+ ip ro add vrf test-vrf 172.16.2.2/32 via 172.16.1.2

# ping -I test-vrf 172.16.2.2 -c 2  <<< successful on test-vrf
ping: Warning: source address might be selected on device other than test-vrf.
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 test-vrf: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 172.16.2.2: icmp_seq=2 ttl=64 time=0.045 ms

--- 172.16.2.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1022ms
rtt min/avg/max/mdev = 0.035/0.040/0.045/0.005 ms

#ping -I br0 172.16.2.2 -c 2   <<< fails on br0
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 br0: 56(84) bytes of data.

--- 172.16.2.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1022ms

Please let me know if I should try a different version.

Nelson

On 7/24/18, 9:08 AM, "D'Souza, Nelson"  wrote:

It's strange that enslaving eth1 -> br0 -> test-vrf does not work, but 
enslaving eth1->test-vrf works fine.

    Nelson

    On 7/24/18, 8:58 AM, "D'Souza, Nelson"  wrote:

Thank you David, really appreciate the help. Most likely something 
specific to my environment.

ip vrf id, does not report anything on my system. Here's the result 
after running the command.

# ip vrf id
#

I'll follow up with a VM.

Nelson

On 7/24/18, 5:55 AM, "David Ahern"  wrote:

On 7/23/18 7:43 PM, D'Souza, Nelson wrote:
> I copy and pasted the configs onto my device, but pings on 
test-vrf do not work in my setup. 
> I'm essentially seeing the same issue as I reported before.
> 
> In this case, pings sent out on test-vrf (host ns) are received 
and replied to by the loopback interface (foo ns). Although the replies are 
seen at the test-vrf level, they are not locally delivered to the ping 
application.
> 

I just built v4.14.52 kernel and ran those commands - worked fine. 
It is
something specific to your environment. Is your shell tied to a VRF 
--
(ip vrf id)?

After that, I suggest you create a VM running a newer distribution 
of
your choice (Ubuntu 17.10 or newer, debian stretch with 4.14 
kernel, or
Fedora 26 or newer) and run the commands there.








Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge

2018-07-24 Thread D'Souza, Nelson
It's strange that enslaving eth1 -> br0 -> test-vrf does not work, but 
enslaving eth1->test-vrf works fine.

Nelson

On 7/24/18, 8:58 AM, "D'Souza, Nelson"  wrote:

Thank you David, really appreciate the help. Most likely something specific 
to my environment.

ip vrf id, does not report anything on my system. Here's the result after 
running the command.

# ip vrf id
#

I'll follow up with a VM.

Nelson

On 7/24/18, 5:55 AM, "David Ahern"  wrote:

    On 7/23/18 7:43 PM, D'Souza, Nelson wrote:
> I copy and pasted the configs onto my device, but pings on test-vrf 
do not work in my setup. 
> I'm essentially seeing the same issue as I reported before.
> 
> In this case, pings sent out on test-vrf (host ns) are received and 
replied to by the loopback interface (foo ns). Although the replies are seen at 
the test-vrf level, they are not locally delivered to the ping application.
> 

I just built v4.14.52 kernel and ran those commands - worked fine. It is
something specific to your environment. Is your shell tied to a VRF --
(ip vrf id)?

After that, I suggest you create a VM running a newer distribution of
your choice (Ubuntu 17.10 or newer, debian stretch with 4.14 kernel, or
Fedora 26 or newer) and run the commands there.






Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge

2018-07-24 Thread D'Souza, Nelson
Thank you David, really appreciate the help. Most likely something specific to 
my environment.

ip vrf id, does not report anything on my system. Here's the result after 
running the command.

# ip vrf id
#

I'll follow up with a VM.

Nelson

On 7/24/18, 5:55 AM, "David Ahern"  wrote:

On 7/23/18 7:43 PM, D'Souza, Nelson wrote:
> I copy and pasted the configs onto my device, but pings on test-vrf do 
not work in my setup. 
> I'm essentially seeing the same issue as I reported before.
> 
> In this case, pings sent out on test-vrf (host ns) are received and 
replied to by the loopback interface (foo ns). Although the replies are seen at 
the test-vrf level, they are not locally delivered to the ping application.
> 

I just built v4.14.52 kernel and ran those commands - worked fine. It is
something specific to your environment. Is your shell tied to a VRF --
(ip vrf id)?

After that, I suggest you create a VM running a newer distribution of
your choice (Ubuntu 17.10 or newer, debian stretch with 4.14 kernel, or
Fedora 26 or newer) and run the commands there.




Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge

2018-07-23 Thread D'Souza, Nelson
Hi David,

I copy and pasted the configs onto my device, but pings on test-vrf do not work 
in my setup. 
I'm essentially seeing the same issue as I reported before.

In this case, pings sent out on test-vrf (host ns) are received and replied to 
by the loopback interface (foo ns). Although the replies are seen at the 
test-vrf level, they are not locally delivered to the ping application.

Logs are as follows...

a) pings on test-vrf or br0 fail.

# ping -I test-vrf 172.16.2.2 -c1 -w1
PING 172.16.2.2 (172.16.2.2): 56 data bytes

--- 172.16.2.2 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

b) tcpdump in the foo namespace, shows icmp echos/replies on veth2

# ip netns exec foo tcpdump -i veth2 icmp -c 2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth2, link-type EN10MB (Ethernet), capture size 262144 bytes
18:34:13.205210 IP 172.16.1.1 > 172.16.2.2: ICMP echo request, id 19513, seq 0, 
length 64
18:34:13.205253 IP 172.16.2.2 > 172.16.1.1: ICMP echo reply, id 19513, seq 0, 
length 64
2 packets captured
2 packets received by filter
0 packets dropped by kernel

c) tcpdump in the host namespace, shows icmp echos/replies on test-vrf, br0 and 
veth1:

# tcpdump -i test-vrf icmp -c 2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on test-vrf, link-type EN10MB (Ethernet), capture size 262144 bytes
18:34:13.204061 IP 172.16.1.1 > 172.16.2.2: ICMP echo request, id 19513, seq 0, 
length 64
18:34:13.205278 IP 172.16.2.2 > 172.16.1.1: ICMP echo reply, id 19513, seq 0, 
length 64
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Thanks,
Nelson

On 7/23/18, 3:00 PM, "David Ahern"  wrote:

    On 7/20/18 1:03 PM, D'Souza, Nelson wrote:
> Setup is as follows:
> 
> ethUSB(ingress port) -> mgmtbr0 (bridge) -> mgmtvrf (vrf)



                       |  netns foo
   [ test-vrf ]        |
        |              |
   [ br0 ] 172.16.1.1  |
        |              |
   [ veth1 ] ==========|=== [ veth2 ]       lo
                       |    172.16.1.2      172.16.2.2
                       |


Copy and paste the following into your environment:

ip netns add foo
ip li add veth1 type veth peer name veth2
ip li set veth2 netns foo

ip -netns foo li set lo up
ip -netns foo li set veth2 up
ip -netns foo addr add 172.16.1.2/24 dev veth2


ip li add test-vrf type vrf table 123
ip li set test-vrf up
ip ro add vrf test-vrf unreachable default

ip li add  br0 type bridge
ip li set veth1 master br0
ip li set veth1 up
ip li set br0 up
ip addr add dev br0 172.16.1.1/24
ip li set br0 master test-vrf

ip -netns foo addr add 172.16.2.2/32 dev lo
ip ro add vrf test-vrf 172.16.2.2/32 via 172.16.1.2

Does ping work?
# ping -I test-vrf 172.16.2.2
ping: Warning: source address might be selected on device other than
test-vrf.
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 test-vrf: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.228 ms
64 bytes from 172.16.2.2: icmp_seq=2 ttl=64 time=0.263 ms

and:
# ping -I br0 172.16.2.2
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 br0: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.227 ms
64 bytes from 172.16.2.2: icmp_seq=2 ttl=64 time=0.223 ms
^C
--- 172.16.2.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.223/0.225/0.227/0.002 ms




Re: [**EXTERNAL**] Re: DNAT with VRF support in Linux Kernel

2018-07-20 Thread D'Souza, Nelson
Hi David,

Assuming your setup is as follows:

eth (ingress interface) --> br0 (bridge) --> mgmt (vrf)

is it possible that the DNAT rule matches on the eth ingress interface and not 
the mgmt vrf device?

If the LOG rule does not match with dev == mgmt, it seems like the DNAT rule 
with dev == mgmt wouldn't match either?
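
To illustrate the distinction, here are two hypothetical DNAT rules, one keyed
on the bridge/ingress device and one on the vrf device (port and target address
are borrowed from the iptables output quoted below, purely as placeholders):

  iptables -t nat -A PREROUTING -i br0  -p tcp --dport 2201 -j DNAT --to-destination 10.1.1.1:22
  iptables -t nat -A PREROUTING -i mgmt -p tcp --dport 2201 -j DNAT --to-destination 10.1.1.1:22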

Nelson

On 7/19/18, 8:42 PM, "David Ahern"  wrote:

On 7/19/18 7:52 PM, D'Souza, Nelson wrote:
> Hi,
> 
>  
> 
> I'm seeing a VRF/Netfilter related issue on a system running a 4.14.52
> Linux kernel.
> 
>  
> 
> I have an eth interface enslaved to l3mdev mgmtvrf device.
> 
>  
> 
> After reviewing
> https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf, I was
> expecting that the Netfilter NF_INET_PRE_ROUTING rules would be applied
> to packets at the ingress eth interface and VRF device level. I
> confirmed that this works for pre-routing rules added to the raw and
> mangle tables at the ingress interface and VRF device level. I'm having
> issues though with pre-routing rules that are applied to the NAT table.
> NAT pre-routing rules only match on the ingress eth interface, not on
> the mgmtVRF device. As a result, I'm not able to apply DNAT at the
> mgmtvrf device level for IPv4 packets sourced from an external host and
> destined to the eth interface ip address.
> 
>  
> 
> Also observed that a tcpdump on the mgmtvrf device captures packets
> ingressing on the mgmtvrf.
> 
>  
> 
> Please let me know if my understanding is correct, and if so, if this is
> a resolved/outstanding issue.
> 

I am puzzled by this one. My main dev server uses mgmt vrf with DNAT
rules to access VMs running on it, so I know it works to some degree. e.g.,

$ sudo iptables -nvL -t nat
Chain PREROUTING (policy ACCEPT 409 packets, 68587 bytes)
 pkts bytes target prot opt in   out  source      destination
 8761  583K ACCEPT all  --  br0  *    0.0.0.0/0   0.0.0.0/0
    5   320 DNAT   tcp  --  *    *    0.0.0.0/0   0.0.0.0/0    tcp dpt:2201 to:10.1.1.1:22
...

But adding a LOG rule does not show a hit with dev == mgmt.




Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge

2018-07-20 Thread D'Souza, Nelson
The Linux kernel has patches applied on top of 4.14.52, but aside from that 
it has no custom changes.
I currently don't have perf on the Linux system, so I will have to get back to you 
with the perf traces.

Meanwhile,  here's the ip outputs you requested.

root@x10sdv-4c-tln4f:~# ip rule ls
0:  from all lookup local
1000:   from all lookup [l3mdev-table]
32766:  from all lookup main
32767:  from all lookup default

root@x10sdv-4c-tln4f:~# ip route ls vrf mgmtvrf
default via 10.33.96.1 dev mgmtbr0
10.33.96.0/24 dev mgmtbr0 proto kernel scope link src 10.33.96.131

root@x10sdv-4c-tln4f:~# ip link show vrf mgmtvrf
16: mgmtbr0:  mtu 1500 qdisc noqueue master 
mgmtvrf state UP mode DEFAULT group default qlen 1000
link/ether c0:56:27:90:4f:75 brd ff:ff:ff:ff:ff:ff

Thanks,
Nelson

On 7/20/18, 12:11 PM, "David Ahern"  wrote:

On 7/20/18 1:03 PM, D'Souza, Nelson wrote:
> Hi Dave,
> 
> It is good to know that this works in your case. However, I'm not able to 
pinpoint what the issue is and looking for a way to narrow down to the root 
cause.
> Do you know if this has been an issue in the past and resolved in Linux 
kernel versions after 4.14.52? 

It has always worked as far as I recall.

> 
> I have the same setup as you and tcpdump works at all levels (eth, 
bridge, vrf).
> 
> Setup is as follows:
> 
> ethUSB(ingress port) -> mgmtbr0 (bridge) -> mgmtvrf (vrf)
> 
> Logs from my setup:
> 
> b) ethUSB is enslaved to mgmtbr0 (bridge)
> 
> root@x10sdv-4c-tln4f:~# ip link show master mgmtbr0
> 6: ethUSB:  mtu 1500 qdisc pfifo_fast 
master mgmtbr0 state UNKNOWN mode DEFAULT group default qlen 1000
> link/ether c0:56:27:90:4f:75 brd ff:ff:ff:ff:ff:ff
> 
> b) mgmtbr0 bridge is enslaved to mgmtvrf  (vrf)
> 
> root@x10sdv-4c-tln4f:~# ip link show master mgmtvrf
> 16: mgmtbr0:  mtu 1500 qdisc noqueue 
master mgmtvrf state UP mode DEFAULT group default qlen 1000
> link/ether c0:56:27:90:4f:75 brd ff:ff:ff:ff:ff:ff
> 
> c) ip address configured on mgmtbr0
> 
> root@x10sdv-4c-tln4f:~# ip addr show dev mgmtbr0
> 16: mgmtbr0:  mtu 1500 qdisc noqueue 
master mgmtvrf state UP group default qlen 1000
> link/ether c0:56:27:90:4f:75 brd ff:ff:ff:ff:ff:ff
> inet 10.33.96.131/24 brd 10.33.96.255 scope global mgmtbr0
>valid_lft forever preferred_lft forever
> inet6 fe80::c256:27ff:fe90:4f75/64 scope link
>valid_lft forever preferred_lft forever
> 
> d) tcpdump on ethUSB successful, but ping fails
> root@x10sdv-4c-tln4f:~# ping 10.32.8.135 -I mgmtvrf -c1 -w1
> PING 10.32.8.135 (10.32.8.135): 56 data bytes
> --- 10.32.8.135 ping statistics ---
> 1 packets transmitted, 0 packets received, 100% packet loss
> 
> root@x10sdv-4c-tln4f:~# tcpdump -i ethUSB icmp 
> 11:38:37.169678 IP 10.33.96.131 > 10.32.8.135: ICMP echo request, id 
62312, seq 0, length 64
> 11:38:37.170906 IP 10.32.8.135 > 10.33.96.131: ICMP echo reply, id 62312, 
seq 0, length 64

First, is this a modified kernel?

What does the following show?
$ ip ru ls
$ ip route ls vrf mgmt
$ ip li sh vrf mgmt

Try perf:
perf record -e fib:* -a -g -- sleep 3
(run ping during record)
perf script

Look at the table used for lookups. Is it the correct one for the mgmt vrf?
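
For example, one way to pull just the FIB lookups out of the trace (assuming
the fib:fib_table_lookup tracepoint is available on this kernel):

  perf script | grep fib_table_lookup
  # check that the table id in each lookup is the mgmt vrf table, not main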





Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge

2018-07-20 Thread D'Souza, Nelson
Hi Dave,

It is good to know that this works in your case. However, I'm not able to 
pinpoint what the issue is and am looking for a way to narrow down to the root 
cause.
Do you know if this has been an issue in the past that was resolved in Linux kernel 
versions after 4.14.52? 

I have the same setup as you and tcpdump works at all levels (eth, bridge, vrf).

Setup is as follows:

ethUSB(ingress port) -> mgmtbr0 (bridge) -> mgmtvrf (vrf)

Logs from my setup:

a) ethUSB is enslaved to mgmtbr0 (bridge)

root@x10sdv-4c-tln4f:~# ip link show master mgmtbr0
6: ethUSB:  mtu 1500 qdisc pfifo_fast master 
mgmtbr0 state UNKNOWN mode DEFAULT group default qlen 1000
link/ether c0:56:27:90:4f:75 brd ff:ff:ff:ff:ff:ff

b) mgmtbr0 bridge is enslaved to mgmtvrf  (vrf)

root@x10sdv-4c-tln4f:~# ip link show master mgmtvrf
16: mgmtbr0:  mtu 1500 qdisc noqueue master 
mgmtvrf state UP mode DEFAULT group default qlen 1000
link/ether c0:56:27:90:4f:75 brd ff:ff:ff:ff:ff:ff

c) ip address configured on mgmtbr0

root@x10sdv-4c-tln4f:~# ip addr show dev mgmtbr0
16: mgmtbr0:  mtu 1500 qdisc noqueue master 
mgmtvrf state UP group default qlen 1000
link/ether c0:56:27:90:4f:75 brd ff:ff:ff:ff:ff:ff
inet 10.33.96.131/24 brd 10.33.96.255 scope global mgmtbr0
   valid_lft forever preferred_lft forever
inet6 fe80::c256:27ff:fe90:4f75/64 scope link
   valid_lft forever preferred_lft forever

d) tcpdump on ethUSB successful, but ping fails
root@x10sdv-4c-tln4f:~# ping 10.32.8.135 -I mgmtvrf -c1 -w1
PING 10.32.8.135 (10.32.8.135): 56 data bytes
--- 10.32.8.135 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

root@x10sdv-4c-tln4f:~# tcpdump -i ethUSB icmp 
11:38:37.169678 IP 10.33.96.131 > 10.32.8.135: ICMP echo request, id 62312, seq 
0, length 64
11:38:37.170906 IP 10.32.8.135 > 10.33.96.131: ICMP echo reply, id 62312, seq 
0, length 64

e) tcpdump on mgmtbr0 successful, but ping fails

root@x10sdv-4c-tln4f:~# ping 10.32.8.135 -I mgmtvrf -c1 -w1
PING 10.32.8.135 (10.32.8.135): 56 data bytes
--- 10.32.8.135 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

root@x10sdv-4c-tln4f:~# tcpdump -i mgmtbr0 icmp
11:46:21.566739 IP 10.33.96.131 > 10.32.8.135: ICMP echo request, id 617, seq 
0, length 64
11:46:21.567982 IP 10.32.8.135 > 10.33.96.131: ICMP echo reply, id 617, seq 0, 
length 64

f) tcpdump on mgmtvrf successful, but ping fails
root@x10sdv-4c-tln4f:~# ping 10.32.8.135 -I mgmtvrf -c1 -w1
PING 10.32.8.135 (10.32.8.135): 56 data bytes
--- 10.32.8.135 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

root@x10sdv-4c-tln4f:~# tcpdump -i mgmtvrf icmp
11:50:24.155706 IP 10.33.96.131 > 10.32.8.135: ICMP echo request, id 2153, seq 
0, length 64
11:50:24.156977 IP 10.32.8.135 > 10.33.96.131: ICMP echo reply, id 2153, seq 0, 
length 64

g) Netfilter prerouting rules added to the raw table only see packets 
ingressing on mgmtbr0, not mgmtvrf. 

root@x10sdv-4c-tln4f:~# iptables -t raw -nvL PREROUTING
Chain PREROUTING (policy ACCEPT 3 packets, 252 bytes)
 pkts bytes target prot opt in       out  source        destination
    3   252 LOG    all  --  mgmtbr0  *    10.32.8.135   0.0.0.0/0    LOG flags 0 level 4
    0     0 LOG    all  --  mgmtvrf  *    10.32.8.135   0.0.0.0/0    LOG flags 0 level 4

It's strange that while the tcpdump works at the mgmtvrf level, netfilter 
prerouting rules do not match on the mgmtvrf level.
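
For reference, rules along these lines would produce the counters above
(reconstructed from the -nvL output; exact options assumed):

  iptables -t raw -A PREROUTING -i mgmtbr0 -s 10.32.8.135 -j LOG
  iptables -t raw -A PREROUTING -i mgmtvrf -s 10.32.8.135 -j LOG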

Appreciate the help, please let me know if you need additional logs.

Thanks,
Nelson

On 7/19/18, 8:37 PM, "David Ahern"  wrote:

On 7/19/18 8:19 PM, D'Souza, Nelson wrote:
> Hi,
> 
>  
> 
> I'm seeing the following issue on a system running a 4.14.52 Linux kernel.
> 
>  
> 
> With an eth interface enslaved to a VRF device, pings sent out on the
> VRF to an neighboring host are successful. But, with an eth interface
> enslaved to a L3 enabled bridge (mgmtbr0), and the bridge enslaved to a
> l3mdev VRF (mgmtvrf), the pings sent out on the VRF are not received
> back at the application level.

you mean this setup:
eth1 (ingress port) -> br0 (bridge) -> red (vrf)

IP address on br0:

9: br0:  mtu 1500 qdisc noqueue master
red state UP group default qlen 1000
link/ether 02:e0:f9:1c:00:37 brd ff:ff:ff:ff:ff:ff
inet 10.100.1.4/24 scope global br0
   valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe1c:37/64 scope link
   valid_lft forever preferred_lft forever

And then ping a neighbor:

# ping -I red -c1 -w1 10.100.1.254
ping: Warning: source address might be selected on device other than red.
PING 10.100.1.254 (10.100.1.254) from 10.100.1.4 red: 56(84) bytes of data.
64 bytes from 10.100.1.254: icmp_seq=1 t