Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
Thanks, David and Ido, for finding the root cause of the bridge Rx packets getting dropped, and for coming up with a patch.

Regards,
Nelson
Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
On 9/7/18 9:56 AM, D'Souza, Nelson wrote:
>
> *From:* David Ahern
> *Sent:* Thursday, September 6, 2018 5:27 PM
> *To:* D'Souza, Nelson; netdev@vger.kernel.org
> *Subject:* Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
>
> On 9/5/18 12:00 PM, D'Souza, Nelson wrote:
>> Just following up would you be able to confirm that this is a Linux VRF issue?
>
> I can confirm that I can reproduce the problem. Need to find time to dig into it.

The bridge's netfilter hook is dropping the packet.

The bridge's netfilter code registers hook operations that are invoked when nf_hook is called. It then sees all subsequent calls to nf_hook.

Packet-wise, the bridge netfilter hook runs first. br_nf_pre_routing allocates nf_bridge, sets in_prerouting to 1 and calls NF_HOOK for NF_INET_PRE_ROUTING. Its finish function, br_nf_pre_routing_finish, then resets the in_prerouting flag to 0. Any subsequent calls to nf_hook invoke ip_sabotage_in. That function sees in_prerouting is not set and steals (drops) the packet.

The simplest change is to have ip_sabotage_in recognize that the bridge can be enslaved to a VRF (L3 master device) and allow the packet to continue.

Thanks to Ido for the hint on ip_sabotage_in.

This patch works for me:

diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index 6e0dc6bcd32a..37278dc280eb 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -835,7 +835,8 @@ static unsigned int ip_sabotage_in(void *priv,
 				   struct sk_buff *skb,
 				   const struct nf_hook_state *state)
 {
-	if (skb->nf_bridge && !skb->nf_bridge->in_prerouting) {
+	if (skb->nf_bridge && !skb->nf_bridge->in_prerouting &&
+	    !netif_is_l3_master(skb->dev)) {
 		state->okfn(state->net, state->sk, skb);
 		return NF_STOLEN;
 	}
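The condition changed by the patch can be modelled outside the kernel to see which packets are affected. What follows is a minimal Python sketch, not kernel code. The `Skb` and `NfBridge` classes, the `dev_is_l3_master` field (standing in for `netif_is_l3_master(skb->dev)`) and the `patched` switch are hypothetical names introduced for illustration. The sketch also assumes the function returns `NF_ACCEPT` when the condition is false, which the quoted hunk does not show, and it leaves out the `state->okfn()` call made before `NF_STOLEN` is returned.

```python
from dataclasses import dataclass
from typing import Optional

NF_ACCEPT = "NF_ACCEPT"
NF_STOLEN = "NF_STOLEN"


@dataclass
class NfBridge:
    """Hypothetical stand-in for skb->nf_bridge; only the flag discussed above."""
    in_prerouting: bool


@dataclass
class Skb:
    """Hypothetical stand-in for struct sk_buff with the fields the check reads."""
    nf_bridge: Optional[NfBridge]
    dev_is_l3_master: bool  # models netif_is_l3_master(skb->dev)


def ip_sabotage_in(skb: Skb, patched: bool) -> str:
    """Return the verdict of the modelled check, before or after the patch."""
    steal = skb.nf_bridge is not None and not skb.nf_bridge.in_prerouting
    if patched:
        # The patch adds one more condition: never steal on an L3 master device.
        steal = steal and not skb.dev_is_l3_master
    # Assumption: the fall-through verdict is NF_ACCEPT (not shown in the hunk).
    return NF_STOLEN if steal else NF_ACCEPT


if __name__ == "__main__":
    # Assumed scenario: nf_bridge is attached, in_prerouting was already reset
    # to 0, and skb->dev is an L3 master (VRF) device.
    vrf_pass = Skb(nf_bridge=NfBridge(in_prerouting=False), dev_is_l3_master=True)
    print("unpatched:", ip_sabotage_in(vrf_pass, patched=False))
    print("patched:  ", ip_sabotage_in(vrf_pass, patched=True))
```

In this model the only input whose verdict changes is the one where `nf_bridge` is set, `in_prerouting` is 0 and the device is an L3 master; every other combination gives the same verdict with and without the patch.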
Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
On 9/5/18 12:00 PM, D'Souza, Nelson wrote:
> Just following up would you be able to confirm that this is a Linux VRF
> issue?

I can confirm that I can reproduce the problem. Need to find time to dig into it.
Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
Hi David,

Just following up: would you be able to confirm that this is a Linux VRF issue?

Also, how do I log a VRF-related defect to ensure this gets resolved in a subsequent release?

Thanks,
Nelson
Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
Hi David,

Turns out the VRF bridge Rx issue is triggered by a docker install.

Docker makes the following sysctl changes:

net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1   <<< exposes the ipv4 VRF Rx issue when a bridge is enslaved to a VRF

which causes packets flowing through all bridges to be subjected to netfilter rules. This is required for bridge net filtering when ip forwarding is enabled.

Please refer to https://github.com/docker/libnetwork/blob/master/drivers/bridge/setup_bridgenetfiltering.go#L53

Setting net.bridge.bridge-nf-call-iptables = 0 resolves the issue, but is not really a viable option given that bridge net filtering is a basic requirement in existing docker deployments.

It's not clear to me why this conf setting breaks local Rx delivery for a bridge enslaved to a VRF, because these packets would always be sent up by the bridge for IP netfilter processing.

This issue is easily reproducible on an Ubuntu 18.04.1 VM. Simply installing docker will cause pings running on test-vrf to fail. Clearing the sysctl conf restores Rx local delivery.

Thanks,
Nelson
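To check whether a host has the bridge netfilter sysctls from the Docker message enabled, the values can be read from procfs. The following is a rough Python sketch under stated assumptions: the keys are read from `/proc/sys/net/bridge`, the procfs location of the `net.bridge.*` sysctls, and a missing file is reported as `None` (for example when the bridge netfilter module is not loaded). The function names `read_bridge_nf` and `ipv4_bridge_filtering_on` are made up for this example.

```python
from pathlib import Path

# The three keys listed in the message about the Docker install.
BRIDGE_NF_KEYS = (
    "bridge-nf-call-arptables",
    "bridge-nf-call-ip6tables",
    "bridge-nf-call-iptables",
)


def read_bridge_nf(base: Path = Path("/proc/sys/net/bridge")) -> dict:
    """Map each key to its integer value, or None if the file is absent."""
    values = {}
    for key in BRIDGE_NF_KEYS:
        path = base / key
        values[key] = int(path.read_text().strip()) if path.is_file() else None
    return values


def ipv4_bridge_filtering_on(values: dict) -> bool:
    """True when bridge-nf-call-iptables is set to 1."""
    return values.get("bridge-nf-call-iptables") == 1


if __name__ == "__main__":
    current = read_bridge_nf()
    for key, value in current.items():
        print(f"net.bridge.{key} = {value}")
    if ipv4_bridge_filtering_on(current):
        print("bridge-nf-call-iptables is 1, the setting reported to expose the issue")
```

The `base` parameter exists only so the reader can point the function at a scratch directory when trying it out; on a real host the default path is the one to use.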
Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
David,

With Ubuntu 18.04.1 (kernel 4.15.0-29) pings sent out on test-vrf and br0 are successful.

# uname -rv
4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018

# ping -c 1 -I test-vrf 172.16.2.2
ping: Warning: source address might be selected on device other than test-vrf.
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 test-vrf: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.050 ms

--- 172.16.2.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.050/0.050/0.050/0.000 ms

# ping -c 1 -I br0 172.16.2.2
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 br0: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.026 ms

--- 172.16.2.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.026/0.026/0.026/0.000 ms

However, with Ubuntu 17.10.1 (kernel 4.13.0-21) only the pings on test-vrf are successful. Pings on br0 are not successful.

So it seems there may be a change in versions after 4.13.0-21 that causes pings on br0 to pass.

Nelson
Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
David,

To narrow down the issue, our kernel team has asked me for the following information:

"Can you clarify what kernel configuration was used for the clean 4.14.52 kernel (no changes)? The kernel configuration may be available in /proc/config.gz, or it might be available as a text file in the /boot directory."

Would you be able to provide this?

Nelson
Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
David,

I tried out the commands on an Ubuntu 17.10.1 VM. The pings on test-vrf are successful, but the pings on br0 are not successful.

# uname -rv
4.13.0-21-generic #24-Ubuntu SMP Mon Dec 18 17:29:16 UTC 2017

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 17.10
Release:        17.10
Codename:       artful

# ip rule    --> Note: it's missing the l3mdev rule
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default

Ran the configs from a bash script vrf.sh:

# ./vrf.sh
+ ip netns add foo
+ ip li add veth1 type veth peer name veth2
+ ip li set veth2 netns foo
+ ip -netns foo li set lo up
+ ip -netns foo li set veth2 up
+ ip -netns foo addr add 172.16.1.2/24 dev veth2
+ ip li add test-vrf type vrf table 123
+ ip li set test-vrf up
+ ip ro add vrf test-vrf unreachable default
+ ip li add br0 type bridge
+ ip li set veth1 master br0
+ ip li set veth1 up
+ ip li set br0 up
+ ip addr add dev br0 172.16.1.1/24
+ ip li set br0 master test-vrf
+ ip -netns foo addr add 172.16.2.2/32 dev lo
+ ip ro add vrf test-vrf 172.16.2.2/32 via 172.16.1.2

# ping -I test-vrf 172.16.2.2 -c 2    <<< successful on test-vrf
ping: Warning: source address might be selected on device other than test-vrf.
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 test-vrf: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 172.16.2.2: icmp_seq=2 ttl=64 time=0.045 ms

--- 172.16.2.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1022ms
rtt min/avg/max/mdev = 0.035/0.040/0.045/0.005 ms

# ping -I br0 172.16.2.2 -c 2    <<< fails on br0
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 br0: 56(84) bytes of data.

--- 172.16.2.2 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1022ms

Please let me know if I should try a different version.

Nelson
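The note about the missing l3mdev rule in the `ip rule` listing can be turned into a small check. This is a sketch only: it parses text in the shape of the `ip rule` output pasted in this thread, and it assumes the rule shows up as a lookup of `[l3mdev-table]`, the spelling seen in the `ip rule ls` output elsewhere in the thread. The helper names `rule_tables` and `has_l3mdev_rule` are hypothetical.

```python
def rule_tables(ip_rule_output: str) -> list:
    """Return the table named after each 'lookup' keyword, in order."""
    tables = []
    for line in ip_rule_output.splitlines():
        fields = line.split()
        if "lookup" in fields:
            idx = fields.index("lookup")
            if idx + 1 < len(fields):
                tables.append(fields[idx + 1])
    return tables


def has_l3mdev_rule(ip_rule_output: str) -> bool:
    """True if any rule looks up the l3mdev table (assumed spelling)."""
    return "[l3mdev-table]" in rule_tables(ip_rule_output)


# Output pasted from the Ubuntu 17.10 VM in this thread.
UBUNTU_17_10 = """\
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
"""

if __name__ == "__main__":
    print("l3mdev rule present:", has_l3mdev_rule(UBUNTU_17_10))
```

To use it on a live system, the output of `ip rule` would be captured and passed in as a string; the sketch deliberately does not run the command itself.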
Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
It's strange that enslaving eth1 -> br0 -> test-vrf does not work, but enslaving eth1->test-vrf works fine.

Nelson
Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
Thank you David, really appreciate the help. Most likely something specific to my environment.

ip vrf id does not report anything on my system. Here's the result after running the command:

# ip vrf id
#

I'll follow up with a VM.

Nelson
Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
On 7/23/18 7:43 PM, D'Souza, Nelson wrote:
> I copy and pasted the configs onto my device, but pings on test-vrf do not
> work in my setup.
> I'm essentially seeing the same issue as I reported before.
>
> In this case, pings sent out on test-vrf (host ns) are received and replied
> to by the loopback interface (foo ns). Although the replies are seen at the
> test-vrf level, they are not locally delivered to the ping application.
>

I just built v4.14.52 kernel and ran those commands - worked fine. It is something specific to your environment. Is your shell tied to a VRF -- (ip vrf id)?

After that, I suggest you create a VM running a newer distribution of your choice (Ubuntu 17.10 or newer, debian stretch with 4.14 kernel, or Fedora 26 or newer) and run the commands there.
Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
Hi David,

I copied and pasted the configs onto my device, but pings on test-vrf do not work in my setup. I'm essentially seeing the same issue as I reported before.

In this case, pings sent out on test-vrf (host ns) are received and replied to by the loopback interface (foo ns). Although the replies are seen at the test-vrf level, they are not locally delivered to the ping application.

Logs are as follows...

a) pings on test-vrf or br0 fail.

# ping -I test-vrf 172.16.2.2 -c1 -w1
PING 172.16.2.2 (172.16.2.2): 56 data bytes

--- 172.16.2.2 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

b) tcpdump in the foo namespace shows icmp echos/replies on veth2:

# ip netns exec foo tcpdump -i veth2 icmp -c 2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on veth2, link-type EN10MB (Ethernet), capture size 262144 bytes
18:34:13.205210 IP 172.16.1.1 > 172.16.2.2: ICMP echo request, id 19513, seq 0, length 64
18:34:13.205253 IP 172.16.2.2 > 172.16.1.1: ICMP echo reply, id 19513, seq 0, length 64
2 packets captured
2 packets received by filter
0 packets dropped by kernel

c) tcpdump in the host namespace shows icmp echos/replies on test-vrf, br0 and veth1:

# tcpdump -i test-vrf icmp -c 2
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on test-vrf, link-type EN10MB (Ethernet), capture size 262144 bytes
18:34:13.204061 IP 172.16.1.1 > 172.16.2.2: ICMP echo request, id 19513, seq 0, length 64
18:34:13.205278 IP 172.16.2.2 > 172.16.1.1: ICMP echo reply, id 19513, seq 0, length 64
2 packets captured
2 packets received by filter
0 packets dropped by kernel

Thanks,
Nelson
Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
On 7/20/18 1:03 PM, D'Souza, Nelson wrote:
> Setup is as follows:
>
> ethUSB(ingress port) -> mgmtbr0 (bridge) -> mgmtvrf (vrf)

                       |  netns foo
       [ test-vrf ]    |
            |          |
   [ br0 ] 172.16.1.1  |
            |          |
       [ veth1 ]       |=== [ veth2 ]      lo
                       |   172.16.1.2   172.16.2.2
                       |

Copy and paste the following into your environment:

ip netns add foo
ip li add veth1 type veth peer name veth2
ip li set veth2 netns foo

ip -netns foo li set lo up
ip -netns foo li set veth2 up
ip -netns foo addr add 172.16.1.2/24 dev veth2

ip li add test-vrf type vrf table 123
ip li set test-vrf up
ip ro add vrf test-vrf unreachable default

ip li add br0 type bridge
ip li set veth1 master br0
ip li set veth1 up
ip li set br0 up
ip addr add dev br0 172.16.1.1/24
ip li set br0 master test-vrf

ip -netns foo addr add 172.16.2.2/32 dev lo
ip ro add vrf test-vrf 172.16.2.2/32 via 172.16.1.2

Does ping work?

# ping -I test-vrf 172.16.2.2
ping: Warning: source address might be selected on device other than test-vrf.
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 test-vrf: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.228 ms
64 bytes from 172.16.2.2: icmp_seq=2 ttl=64 time=0.263 ms

and:

# ping -I br0 172.16.2.2
PING 172.16.2.2 (172.16.2.2) from 172.16.1.1 br0: 56(84) bytes of data.
64 bytes from 172.16.2.2: icmp_seq=1 ttl=64 time=0.227 ms
64 bytes from 172.16.2.2: icmp_seq=2 ttl=64 time=0.223 ms
^C
--- 172.16.2.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.223/0.225/0.227/0.002 ms
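For repeated runs on different kernels, the command list above can be kept in one place and replayed. The following is one possible Python wrapper, offered as a sketch rather than a recommended tool. The `SETUP` string holds the commands from the message; the `commands` and `run` helpers and the `dry_run` flag are hypothetical names added here. With `dry_run=True` (the default) nothing is executed and the commands are only printed; running them for real needs root and the `ip` utility.

```python
import shlex
import subprocess

# Commands copied from the message above, one per line.
SETUP = """\
ip netns add foo
ip li add veth1 type veth peer name veth2
ip li set veth2 netns foo
ip -netns foo li set lo up
ip -netns foo li set veth2 up
ip -netns foo addr add 172.16.1.2/24 dev veth2
ip li add test-vrf type vrf table 123
ip li set test-vrf up
ip ro add vrf test-vrf unreachable default
ip li add br0 type bridge
ip li set veth1 master br0
ip li set veth1 up
ip li set br0 up
ip addr add dev br0 172.16.1.1/24
ip li set br0 master test-vrf
ip -netns foo addr add 172.16.2.2/32 dev lo
ip ro add vrf test-vrf 172.16.2.2/32 via 172.16.1.2
"""


def commands(script: str) -> list:
    """Split a script into argv lists, skipping blank lines."""
    return [shlex.split(line) for line in script.splitlines() if line.strip()]


def run(script: str, dry_run: bool = True) -> list:
    """Print each command, or execute it and stop at the first failure."""
    argv_list = commands(script)
    for argv in argv_list:
        if dry_run:
            print("+", " ".join(argv))
        else:
            subprocess.run(argv, check=True)
    return argv_list


if __name__ == "__main__":
    run(SETUP)  # dry run: prints the commands without touching the system
```

Calling `run(SETUP, dry_run=False)` as root would apply the configuration; the two ping commands from the message can then be issued by hand to compare results across kernels.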
Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
The Linux kernel has patches applied beyond 4.14.52, but aside from that it has no custom changes.

I currently don't have perf on the Linux system, so I will have to get back to you with the perf traces.

Meanwhile, here are the ip outputs you requested.

root@x10sdv-4c-tln4f:~# ip rule ls
0:      from all lookup local
1000:   from all lookup [l3mdev-table]
32766:  from all lookup main
32767:  from all lookup default

root@x10sdv-4c-tln4f:~# ip route ls vrf mgmtvrf
default via 10.33.96.1 dev mgmtbr0
10.33.96.0/24 dev mgmtbr0 proto kernel scope link src 10.33.96.131

root@x10sdv-4c-tln4f:~# ip link show vrf mgmtvrf
16: mgmtbr0: mtu 1500 qdisc noqueue master mgmtvrf state UP mode DEFAULT group default qlen 1000
    link/ether c0:56:27:90:4f:75 brd ff:ff:ff:ff:ff:ff

Thanks,
Nelson

On 7/20/18, 12:11 PM, "David Ahern" wrote:

    On 7/20/18 1:03 PM, D'Souza, Nelson wrote:
    > Hi Dave,
    >
    > It is good to know that this works in your case. However, I'm not able to pinpoint what the issue is and am looking for a way to narrow down to the root cause.
    > Do you know if this has been an issue in the past and resolved in Linux kernel versions after 4.14.52?

    It has always worked as far as I recall.

    >
    > I have the same setup as you and tcpdump works at all levels (eth, bridge, vrf).
    >
    > Setup is as follows:
    >
    > ethUSB(ingress port) -> mgmtbr0 (bridge) -> mgmtvrf (vrf)
    >
    > Logs from my setup:
    >
    > a) ethUSB is enslaved to mgmtbr0 (bridge)
    >
    > root@x10sdv-4c-tln4f:~# ip link show master mgmtbr0
    > 6: ethUSB: mtu 1500 qdisc pfifo_fast master mgmtbr0 state UNKNOWN mode DEFAULT group default qlen 1000
    >     link/ether c0:56:27:90:4f:75 brd ff:ff:ff:ff:ff:ff
    >
    > b) mgmtbr0 bridge is enslaved to mgmtvrf (vrf)
    >
    > root@x10sdv-4c-tln4f:~# ip link show master mgmtvrf
    > 16: mgmtbr0: mtu 1500 qdisc noqueue master mgmtvrf state UP mode DEFAULT group default qlen 1000
    >     link/ether c0:56:27:90:4f:75 brd ff:ff:ff:ff:ff:ff
    >
    > c) ip address configured on mgmtbr0
    >
    > root@x10sdv-4c-tln4f:~# ip addr show dev mgmtbr0
    > 16: mgmtbr0: mtu 1500 qdisc noqueue master mgmtvrf state UP group default qlen 1000
    >     link/ether c0:56:27:90:4f:75 brd ff:ff:ff:ff:ff:ff
    >     inet 10.33.96.131/24 brd 10.33.96.255 scope global mgmtbr0
    >        valid_lft forever preferred_lft forever
    >     inet6 fe80::c256:27ff:fe90:4f75/64 scope link
    >        valid_lft forever preferred_lft forever
    >
    > d) tcpdump on ethUSB successful, but ping fails
    > root@x10sdv-4c-tln4f:~# ping 10.32.8.135 -I mgmtvrf -c1 -w1
    > PING 10.32.8.135 (10.32.8.135): 56 data bytes
    > --- 10.32.8.135 ping statistics ---
    > 1 packets transmitted, 0 packets received, 100% packet loss
    >
    > root@x10sdv-4c-tln4f:~# tcpdump -i ethUSB icmp
    > 11:38:37.169678 IP 10.33.96.131 > 10.32.8.135: ICMP echo request, id 62312, seq 0, length 64
    > 11:38:37.170906 IP 10.32.8.135 > 10.33.96.131: ICMP echo reply, id 62312, seq 0, length 64

    First, is this a modified kernel? What does the following show?

    $ ip ru ls
    $ ip route ls vrf mgmt
    $ ip li sh vrf mgmt

    Try perf:
        perf record -e fib:* -a -g -- sleep 3
        (run ping during record)
        perf script

    Look at the table used for lookups. Is it the correct one for the mgmt vrf?
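One rough way to act on the perf suggestion is to count which FIB tables show up in the recorded lookups and compare them with the table bound to the VRF. The sketch below makes two assumptions: each fib_table_lookup event in `perf script` output contains "table <id>", and `ip -d link show` prints a "vrf table <id>" detail line for a VRF device. The helper names fib_tables_seen and vrf_table_id are made up for illustration.

```shell
#!/bin/sh
# Sketch only. Helper names are illustrative, not part of perf or iproute2.

# stdin:  output of `perf script` from a recording made with -e 'fib:*'
# stdout: one "<count> <table-id>" line per FIB table seen in lookups
# Assumes events are printed as "... fib:fib_table_lookup: table <id> ...".
fib_tables_seen() {
    sed -n 's/.*fib_table_lookup: *table \([0-9][0-9]*\).*/\1/p' |
        sort -n | uniq -c | awk '{ print $1, $2 }'
}

# stdin:  output of `ip -d link show dev <vrf>`
# stdout: the id from the "vrf table <id>" detail line, if there is one
vrf_table_id() {
    sed -n 's/.*vrf table \([0-9][0-9]*\).*/\1/p'
}
```

Possible usage, with the device name taken from this thread:

    perf script | fib_tables_seen
    ip -d link show dev mgmtvrf | vrf_table_id

If the id printed by the second command never appears in the first list, the lookups are probably not reaching the VRF table. Lookups in table 255 (local) are likely to appear as well, since the local rule is consulted ahead of the l3mdev rule.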
Re: [**EXTERNAL**] Re: VRF with enslaved L3 enabled bridge
Hi Dave,

It is good to know that this works in your case. However, I'm not able to pinpoint what the issue is and am looking for a way to narrow down to the root cause.
Do you know if this has been an issue in the past and resolved in Linux kernel versions after 4.14.52?

I have the same setup as you and tcpdump works at all levels (eth, bridge, vrf).

Setup is as follows:

ethUSB(ingress port) -> mgmtbr0 (bridge) -> mgmtvrf (vrf)

Logs from my setup:

a) ethUSB is enslaved to mgmtbr0 (bridge)

root@x10sdv-4c-tln4f:~# ip link show master mgmtbr0
6: ethUSB: mtu 1500 qdisc pfifo_fast master mgmtbr0 state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether c0:56:27:90:4f:75 brd ff:ff:ff:ff:ff:ff

b) mgmtbr0 bridge is enslaved to mgmtvrf (vrf)

root@x10sdv-4c-tln4f:~# ip link show master mgmtvrf
16: mgmtbr0: mtu 1500 qdisc noqueue master mgmtvrf state UP mode DEFAULT group default qlen 1000
    link/ether c0:56:27:90:4f:75 brd ff:ff:ff:ff:ff:ff

c) ip address configured on mgmtbr0

root@x10sdv-4c-tln4f:~# ip addr show dev mgmtbr0
16: mgmtbr0: mtu 1500 qdisc noqueue master mgmtvrf state UP group default qlen 1000
    link/ether c0:56:27:90:4f:75 brd ff:ff:ff:ff:ff:ff
    inet 10.33.96.131/24 brd 10.33.96.255 scope global mgmtbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::c256:27ff:fe90:4f75/64 scope link
       valid_lft forever preferred_lft forever

d) tcpdump on ethUSB successful, but ping fails

root@x10sdv-4c-tln4f:~# ping 10.32.8.135 -I mgmtvrf -c1 -w1
PING 10.32.8.135 (10.32.8.135): 56 data bytes
--- 10.32.8.135 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

root@x10sdv-4c-tln4f:~# tcpdump -i ethUSB icmp
11:38:37.169678 IP 10.33.96.131 > 10.32.8.135: ICMP echo request, id 62312, seq 0, length 64
11:38:37.170906 IP 10.32.8.135 > 10.33.96.131: ICMP echo reply, id 62312, seq 0, length 64

e) tcpdump on mgmtbr0 successful, but ping fails

root@x10sdv-4c-tln4f:~# ping 10.32.8.135 -I mgmtvrf -c1 -w1
PING 10.32.8.135 (10.32.8.135): 56 data bytes
--- 10.32.8.135 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

root@x10sdv-4c-tln4f:~# tcpdump -i mgmtbr0 icmp
11:46:21.566739 IP 10.33.96.131 > 10.32.8.135: ICMP echo request, id 617, seq 0, length 64
11:46:21.567982 IP 10.32.8.135 > 10.33.96.131: ICMP echo reply, id 617, seq 0, length 64

f) tcpdump on mgmtvrf successful, but ping fails

root@x10sdv-4c-tln4f:~# ping 10.32.8.135 -I mgmtvrf -c1 -w1
PING 10.32.8.135 (10.32.8.135): 56 data bytes
--- 10.32.8.135 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss

root@x10sdv-4c-tln4f:~# tcpdump -i mgmtvrf icmp
11:50:24.155706 IP 10.33.96.131 > 10.32.8.135: ICMP echo request, id 2153, seq 0, length 64
11:50:24.156977 IP 10.32.8.135 > 10.33.96.131: ICMP echo reply, id 2153, seq 0, length 64

g) Netfilter prerouting rules added to the raw table only see packets ingressing on mgmtbr0, not mgmtvrf.

root@x10sdv-4c-tln4f:~# iptables -t raw -nvL PREROUTING
Chain PREROUTING (policy ACCEPT 3 packets, 252 bytes)
 pkts bytes target  prot opt in       out  source       destination
    3   252 LOG     all  --  mgmtbr0  *    10.32.8.135  0.0.0.0/0    LOG flags 0 level 4
    0     0 LOG     all  --  mgmtvrf  *    10.32.8.135  0.0.0.0/0    LOG flags 0 level 4

It's strange that while tcpdump works at the mgmtvrf level, the netfilter prerouting rules do not match at the mgmtvrf level.

Appreciate the help, please let me know if you need additional logs.

Thanks,
Nelson

On 7/19/18, 8:37 PM, "David Ahern" wrote:

    On 7/19/18 8:19 PM, D'Souza, Nelson wrote:
    > Hi,
    >
    > I'm seeing the following issue on a system running a 4.14.52 Linux kernel.
    >
    > With an eth interface enslaved to a VRF device, pings sent out on the
    > VRF to a neighboring host are successful. But, with an eth interface
    > enslaved to an L3 enabled bridge (mgmtbr0), and the bridge enslaved to a
    > l3mdev VRF (mgmtvrf), the pings sent out on the VRF are not received
    > back at the application level.

    you mean this setup:
        eth1 (ingress port) -> br0 (bridge) -> red (vrf)

    IP address on br0:
        9: br0: mtu 1500 qdisc noqueue master red state UP group default qlen 1000
            link/ether 02:e0:f9:1c:00:37 brd ff:ff:ff:ff:ff:ff
            inet 10.100.1.4/24 scope global br0
               valid_lft forever preferred_lft forever
            inet6 fe80::e0:f9ff:fe1c:37/64 scope link
               valid_lft forever preferred_lft forever

    And then ping a neighbor:
        # ping -I red -c1 -w1 10.100.1.254
        ping: Warning: source address might be selected on device other than red.
        PING 10.100.1.254 (10.100.1.254) from 10.100.1.4 red: 56(84) bytes of data.
        64 bytes from 10.100.1.254: icmp_seq=1 ttl=64 time=0.810 ms

        --- 10.100.1.254 ping statistics ---
        1 packets transmitted