On 2/17/2021 7:54 PM, Tyler Stachecki wrote:
Thanks - I was worried about the unforeseen side effects, especially given
my unfamiliarity with this part of the tree and OVS.

After looking at the tree, I see uses of dev_net(dev) for this kind of test
within IP tunneling sections, which also handles the CONFIG_NET_NS=n case.
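
Roughly what I have in mind - an untested sketch, and the exact hunk
placement inside ovs_vport_send() is a guess on my part:

--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -503,6 +503,9 @@ void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto)
         }
 
+        if (skb->dev && !net_eq(dev_net(skb->dev), dev_net(vport->dev)))
+                skb->tstamp = 0;
+
         skb->dev = vport->dev;
         vport->ops->send(skb);
         return;

i.e. only zero the timestamp when the packet is actually crossing into a
different namespace, mirroring what skb_scrub_packet() does under @xnet.
Since dev_net() falls back to &init_net, it should also behave with
CONFIG_NET_NS=n.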

Thanks for your time,
Tyler

Right, dev_net(dev) is the proper way to do that test.  Glad you got it
figured out.

- Greg



On Tue, Feb 16, 2021 at 3:03 PM Gregory Rose <[email protected]> wrote:



On 2/13/2021 8:01 AM, Tyler Stachecki wrote:
I've fixed the issue in such a way that it works for me (TM), but would
appreciate confirmation from an OVS expert that I'm not overlooking
something here:

Based on my last post, we need:
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -503,6 +503,7 @@ void ovs_vport_send(struct vport *vport, struct sk_buff *skb, u8 mac_proto)
         }
 
         skb->dev = vport->dev;
+        skb->tstamp = 0;
         vport->ops->send(skb);
         return;

Hmm... I'm not so sure about this.  The skb_scrub_packet() function only
clears skb->tstamp if the @xnet boolean parameter is true.  In this case
you are doing it unconditionally which very well might have unforeseen
side effects.
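
For reference, the tail end of skb_scrub_packet() looks roughly like this
(quoting from memory, so double-check against your tree):

void skb_scrub_packet(struct sk_buff *skb, bool xnet)
{
        ...
        if (!xnet)
                return;

        ipvs_reset(skb);
        skb->mark = 0;
        skb->tstamp = 0;
}

So the timestamp (and mark) only get wiped when the caller says the packet
is crossing network namespaces.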

Maybe test skb->dev->nd_net and if it isn't NULL then clear the
tstamp?

What do you think?

- Greg


The timestamp must be cleared when forwarding packets to a different
namespace; ref:

https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/#1871003

Cheers,
Tyler

On Sat, Feb 13, 2021 at 12:04 AM Tyler Stachecki <[email protected]> wrote:

Here's the offender:

commit fb420d5d91c1274d5966917725e71f27ed092a85 (refs/bisect/bad)
Author: Eric Dumazet <[email protected]>
Date:   Fri Sep 28 10:28:44 2018 -0700

      tcp/fq: move back to CLOCK_MONOTONIC

Without this, I wasn't able to make it past the 4.20 series.  I
forward-ported a revert to 5.4 LTS for fun and things still work great.
Though it sounds like simply reverting this is not the right fix -- there
is some interesting discussion of this commit's wider impact here:
https://lists.openwall.net/netdev/2019/01/10/36

Then, we probably need to clear skb->tstamp in more paths (you are
mentioning bridge ...)

I will try to take a peek sometime this weekend to see if I can spot where
the clearing needs to happen in OVS, assuming it is there.
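
My rough mental model of why that commit matters here - strictly a toy
illustration of the idea, not kernel code, with made-up numbers:

/* Toy model, not sch_fq.c: fq now treats skb->tstamp as an "earliest
 * departure time" in CLOCK_MONOTONIC.  If a forwarded skb still carries
 * a receive timestamp taken against a different clock base, the pacer
 * thinks the packet must not leave yet and holds (or drops) it.
 */
#include <stdio.h>
#include <stdint.h>

static const uint64_t now_mono_ns = 1000;            /* pretend "now" */
static const uint64_t stale_rx_tstamp_ns = 9000000;  /* leftover RX stamp */

static void pacer_enqueue(uint64_t tstamp_ns)
{
        uint64_t edt = tstamp_ns ? tstamp_ns : now_mono_ns;

        if (edt > now_mono_ns)
                printf("held: departure time %llu looks far in the future\n",
                       (unsigned long long)edt);
        else
                printf("sent immediately\n");
}

int main(void)
{
        pacer_enqueue(stale_rx_tstamp_ns); /* what I suspect is happening */
        pacer_enqueue(0);                  /* after clearing skb->tstamp */
        return 0;
}

If that mental model is anywhere near correct, it would explain why
forwarded traffic stalls while locally generated traffic is fine.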

On Tue, Feb 9, 2021 at 4:22 PM Gregory Rose <[email protected]>
wrote:



On 2/8/2021 4:19 PM, Tyler Stachecki wrote:
Thanks for the reply.  This is a router, so it is using conntrack; I'm
unsure whether there is additional connection tracking in OVS.
`ovs-ofctl dump-flows br-util` shows exactly one flow: the default one.

Here's my approximate /etc/network/interfaces.  I just attach VMs to this
with libvirt and have nothing else added at this point:
allow-ovs br-util
iface br-util inet manual
           ovs_type OVSBridge
           ovs_ports enp0s20f1.102 vrf-util

allow-br-util enp0s20f1.102
auto enp0s20f1.102
iface enp0s20f1.102 inet manual
           ovs_bridge br-util
           ovs_type OVSPort
           mtu 9000

allow-br-util vrf-util
iface vrf-util inet static
           ovs_bridge br-util
           ovs_type OVSIntPort
           address 10.10.2.1/24
           mtu 9000

I roughly transcribed what I was doing into a Linux bridge, and it works
as expected in 5.10... e.g. this in my /etc/network/interfaces:
auto enp0s20f1.102
iface enp0s20f1.102 inet manual
           mtu 9000

auto vrf-util
iface vrf-util inet static
           bridge_ports enp0s20f1.102
           bridge-vlan-aware no
           address 10.10.2.1/24
           mtu 9000

I'm having a bit of a tough time following the dataflow code, and the ~1
commit or so I was missing from the kernel staging tree does not seem to
have fixed the issue.

Hi Tyler,

This does not sound like the previous issue I mentioned, because that one
was caused by flow programming for dropping packets.

I hate to say it, but you're probably going to have to resort to a bisect
to find this one.

- Greg


On Mon, Feb 8, 2021 at 6:21 PM Gregory Rose <[email protected]>
wrote:



On 2/6/2021 9:50 AM, Tyler Stachecki wrote:
I have simple forwarding issues when running the Debian stable backports
kernel (5.9) that I don't see with the stable, non-backported 4.19 kernel.
Big fat disclaimer: I compiled my OVS (2.14.1) from source, but given it
works with the 4.19 kernel I doubt it has anything to do with it.  For
good measure, I also compiled 5.10.8 from source and see the same issue I
do in 5.9.

The issue I see on 5.x (config snippets below):
My VM (vnet0 - 10.10.0.16/24) can ARP/ping for other physical hosts on its
subnet (e.g. 00:07:32:4d:2f:71 = 10.10.0.23/24 below), but only the first
echo request in a sequence is seen by the destination host.  I then have
to wait about 10 seconds before pinging the destination host from the VM
again, but again only the first echo in a sequence gets a reply.

I've tried tcpdump'ing enp0s20f1.102 (the external interface on the
hypervisor) and see the pings going out that interface at the rate I would
expect.  OTOH, when I tcpdump on the destination host, I only see the
first of the ICMP echo requests in a sequence (for which an echo reply is
sent).

I then added an OVS internal port on the hypervisor (i.e., on br-util) and
gave it an IP address (10.10.2.1/24).  It is able to ping that same
external host just fine.  Likewise, I am able to ping between the VM and
the OVS internal port just fine.

When I roll back to 4.19, this weirdness about traffic going out of
enp0s20f1.102 *for the VM* goes away and everything just works.  Any clues
while I start ripping into code?

Are you using any of the connection tracking capabilities?  I vaguely
recall some issue that sounds a lot like what you're seeing but do not see
anything in the git log to stir my memory.  IIRC though it was a similar
problem.

Maybe provide a dump of your flows.

- Greg






