Hello,

I’m trying to run down an issue with a couple of my servers, but I’m
having a really hard time pinpointing the root cause.  I have around 250
servers up and running.  After about a year, one of the servers was no
longer able to communicate over OVN, and about two months later another
server fell into the same state.  For a given OVN switch, any two VMs
connected to that switch can talk to each other unless one of the endpoints
resides on one of these failed servers.  If both VMs are on the same server,
they have no problem communicating through the OVS bridge.

Even after turning up various debug logging, I can’t determine why these
servers are having issues.  ovn-trace shows that it should work, and I see
the failed servers’ chassis in the southbound database.  Running tcpdump on
the different servers, I can see a Geneve-encapsulated ARP going out of the
server and coming back in, but it never seems to reach the VM interface:
tcpdump on the VM interface only shows the ARP going out and never coming
back.  With Open vSwitch debug turned up, I see debug statements saying the
flow is sent, but I never see "flow received" like I do on working boxes.

What other tools or debug options can I bring to bear to figure out what is
wrong?  It feels like perhaps something isn’t getting cleaned up somewhere.
Again, I have many servers working with the same configuration as these two,
and these two servers used to work without issue.  I’ve tried completely
reinstalling the OS and reconfiguring the bad servers, and the problem still
persists.  I have a lot of users on this setup, but if I can get some system
downtime I may try to upgrade from OVS 2.7-2, which I’m on now, to a newer
version (2.12).  I’m currently running RHEL 7.8 as the OS.
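
The tcpdump comparison described above can be sketched roughly as follows.
This is only an illustrative sketch, not the exact commands that were run:
the helper names and the interface names (eth0, tap0) are hypothetical
placeholders, and it assumes the Geneve tunnel uses the default UDP port
6081.

```python
# Sketch: build the two tcpdump invocations used to compare what arrives
# on the physical NIC with what reaches the VM interface.
# Assumptions: Geneve on its default UDP port; placeholder interface names.

GENEVE_PORT = 6081  # IANA-assigned default UDP port for Geneve


def tunnel_capture_cmd(iface):
    """tcpdump command for Geneve-encapsulated traffic on the physical NIC."""
    return ["tcpdump", "-nn", "-e", "-i", iface,
            "udp", "port", str(GENEVE_PORT)]


def vm_arp_capture_cmd(iface):
    """tcpdump command for ARP traffic on the VM's interface."""
    return ["tcpdump", "-nn", "-e", "-i", iface, "arp"]


if __name__ == "__main__":
    # Placeholder interface names; substitute the real NIC and VM port.
    for cmd in (tunnel_capture_cmd("eth0"), vm_arp_capture_cmd("tap0")):
        print(" ".join(cmd))
```

Running both captures at the same time on a failed server and on a working
one makes it easier to see where the returning ARP stops.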

Thanks, john
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
