Re: [ovs-discuss] Unexpected upcalls
Hi,

On 2023-06-12 18:40, Ilya Maximets wrote:
> Flushing datapath flows underneath a running ovs-vswitchd process
> may cause all kinds of weird behaviour. So, del-flows, as well as
> most of the other ovs-dpctl commands, should not be used under normal
> circumstances. Here is what the man page says:
>
> """
>    Do not use commands to add or remove or modify datapath flows if
>    ovs-vswitchd is running because it interferes with ovs-vswitchd's own
>    datapath flow management. Use ovs-ofctl(8), instead, to work with
>    OpenFlow flow entries.
> """

Ok. I didn't notice this paragraph. Thank you for pointing that out.

Could you please explain in more detail why flushing the datapath flows
would cause these problems? Some of my assumptions about OVS must be
wrong and I don't know which. Just a link to the relevant source code
would probably help me a lot.

My current understanding is that datapath flows are always installed
as a reaction to an upcall; there are no statically installed datapath
flows. Also, the revalidator threads always dump the real contents of
the datapath and then check, for every dumped flow, whether it should
be deleted. They do not have to keep track of the flows between runs.

So, assuming the above, when I flush the flows, all packets will
generate upcalls and the necessary flows will immediately be installed
again, business as usual.

I'd guess that if ovs-vswitchd kept a synchronized copy of the kernel
flow table in userspace, and we desynced it by flushing the kernel
table, the next revalidator run would fix the problem and everything
would behave normally again. (I didn't find any code in the revalidator
that would do this, though.) What's the catch?

On 6/11/23 14:20, Vašek Šraier via discuss wrote:
> Test environment
>
> 3 node Kubernetes cluster using OVN-Kubernetes, Docker and Fedora 38.

I've managed to reproduce the same behavior on a single system with just
OVS and a single network namespace. No explicit OpenFlow rules are
needed.

I guess this is not unexpected, but I wanted to confirm it regardless. :)

Best,
Vašek

___
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
Re: [ovs-discuss] Unexpected upcalls
On 6/13/23 15:55, Vašek Šraier wrote:
> Hi,
>
> On 2023-06-12 18:40, Ilya Maximets wrote:
>> Flushing datapath flows underneath a running ovs-vswitchd process
>> may cause all kinds of weird behaviour. So, del-flows, as well as
>> most of the other ovs-dpctl commands, should not be used under normal
>> circumstances. Here is what the man page says:
>>
>> """
>>    Do not use commands to add or remove or modify datapath flows if
>>    ovs-vswitchd is running because it interferes with ovs-vswitchd's own
>>    datapath flow management. Use ovs-ofctl(8), instead, to work with
>>    OpenFlow flow entries.
>> """
>
> Ok. I didn't notice this paragraph. Thank you for pointing that out.
>
> Could you please explain in more detail why flushing the datapath flows
> would cause these problems? Some of my assumptions about OVS must be
> wrong and I don't know which. Just a link to the relevant source code
> would probably help me a lot.
>
> My current understanding is that datapath flows are always installed
> as a reaction to an upcall; there are no statically installed datapath
> flows. Also, the revalidator threads always dump the real contents of
> the datapath and then check, for every dumped flow, whether it should
> be deleted. They do not have to keep track of the flows between runs.
>
> So, assuming the above, when I flush the flows, all packets will
> generate upcalls and the necessary flows will immediately be installed
> again, business as usual.

See the following code path in ofproto/ofproto-dpif-upcall.c:

  handle_upcalls()
    --> ukey_install()
      --> ukey_install__()
        --> try_ukey_replace()

Only if ukey_install() succeeds will the flow be installed into the
datapath. In your case, try_ukey_replace() will return false, because we
already have exactly the same ukey, and this result will float up: the
actions will be executed, but the flow will not be installed.
> I'd guess that if ovs-vswitchd kept a synchronized copy of the kernel
> flow table in userspace, and we desynced it by flushing the kernel
> table, the next revalidator run would fix the problem and everything
> would behave normally again. (I didn't find any code in the revalidator
> that would do this, though.) What's the catch?

Revalidators revalidate ukeys for the flows that they dump from the
datapath, i.e. during the main phase they iterate over the dumped flows,
not over all the ukeys. Also, these ukeys keep getting their stats
updated, because there is still traffic going through them via upcalls,
so they do not look idle and are not cleaned up during the
revalidator_sweep__() phase either.

Also, the dump is not a reliable source of information, because it
happens concurrently from multiple threads, and also while handlers are
installing new flows. So, if a flow is not in the dump, it doesn't
generally mean that it's not in the datapath.

Best regards, Ilya Maximets.

> On 6/11/23 14:20, Vašek Šraier via discuss wrote:
>> Test environment
>>
>> 3 node Kubernetes cluster using OVN-Kubernetes, Docker and Fedora 38.
>
> I've managed to reproduce the same behavior on a single system with just
> OVS and a single network namespace. No explicit OpenFlow rules are needed.
>
> I guess this is not unexpected, but I wanted to confirm it regardless. :)
>
> Best,
> Vašek
Re: [ovs-discuss] Unexpected upcalls
On 6/11/23 14:20, Vašek Šraier via discuss wrote:
> Hi,
>
> I think I might have found a bug or just a really weird unexpected
> behavior. I am reporting it here, but I've reproduced it only on systems
> with OVN-Kubernetes and OVN. I don't know whether it can be replicated
> with standalone OVS.
>
> Test environment
>
> 3 node Kubernetes cluster using OVN-Kubernetes, Docker and Fedora 38.
> All nodes are in a single LAN with IPs 192.168.1.{221,222,223}. From now
> on, I will refer to the nodes only by the last digit of their IP address.
>
> Steps to reproduce:
> ===
>
> 1. install a pod with a shell and a ping command
>    (I am using docker.io/archlinux:latest, always running on node 222)
> 2. run `kubectl exec -ti $POD_NAME -- ping 192.168.1.221`
> 3. in a different terminal, ssh into the host of the pod (in my case
>    node 222) and run `ovs-dpctl del-flows`
> 4. observe the measured latencies
> 5. keep the ping running, kill ovs-vswitchd on the host and wait for
>    its restart
> 6. observe the latencies
>
> What I observe
> ==
>
> [root@wsfd-netdev64 ~]# kubectl exec -ti arch -- ping 192.168.1.221
> PING 192.168.1.221 (192.168.1.221) 56(84) bytes of data.
> 64 bytes from 192.168.1.221: icmp_seq=1 ttl=63 time=0.543 ms
> 64 bytes from 192.168.1.221: icmp_seq=2 ttl=63 time=0.160 ms
> 64 bytes from 192.168.1.221: icmp_seq=3 ttl=63 time=0.119 ms
> 64 bytes from 192.168.1.221: icmp_seq=4 ttl=63 time=0.144 ms
> 64 bytes from 192.168.1.221: icmp_seq=5 ttl=63 time=0.137 ms
> 64 bytes from 192.168.1.221: icmp_seq=6 ttl=63 time=0.996 ms  # < ovs-dpctl del-flows
> 64 bytes from 192.168.1.221: icmp_seq=7 ttl=63 time=0.808 ms
> 64 bytes from 192.168.1.221: icmp_seq=8 ttl=63 time=1.01 ms
> 64 bytes from 192.168.1.221: icmp_seq=9 ttl=63 time=1.24 ms
> 64 bytes from 192.168.1.221: icmp_seq=10 ttl=63 time=1.20 ms
> 64 bytes from 192.168.1.221: icmp_seq=11 ttl=63 time=1.14 ms
> 64 bytes from 192.168.1.221: icmp_seq=12 ttl=63 time=1.10 ms  # < killall ovs-vswitchd
> From 10.244.1.5 icmp_seq=22 Destination Host Unreachable
> From 10.244.1.5 icmp_seq=23 Destination Host Unreachable
> From 10.244.1.5 icmp_seq=24 Destination Host Unreachable
> From 10.244.1.5 icmp_seq=25 Destination Host Unreachable
> From 10.244.1.5 icmp_seq=26 Destination Host Unreachable
> From 10.244.1.5 icmp_seq=27 Destination Host Unreachable
> From 10.244.1.5 icmp_seq=28 Destination Host Unreachable
> From 10.244.1.5 icmp_seq=29 Destination Host Unreachable
> From 10.244.1.5 icmp_seq=31 Destination Host Unreachable
> From 10.244.1.5 icmp_seq=32 Destination Host Unreachable
> 64 bytes from 192.168.1.221: icmp_seq=34 ttl=63 time=1371 ms
> 64 bytes from 192.168.1.221: icmp_seq=35 ttl=63 time=322 ms
> 64 bytes from 192.168.1.221: icmp_seq=36 ttl=63 time=0.186 ms
> 64 bytes from 192.168.1.221: icmp_seq=37 ttl=63 time=0.192 ms
> 64 bytes from 192.168.1.221: icmp_seq=38 ttl=63 time=0.140 ms
> 64 bytes from 192.168.1.221: icmp_seq=39 ttl=63 time=0.163 ms
> ^C
> --- 192.168.1.221 ping statistics ---
> 39 packets transmitted, 18 received, +10 errors, 53.8462% packet loss,
> time 38769ms
> rtt min/avg/max/mdev = 0.119/94.570/1370.551/318.102 ms, pipe 3
>
> After flushing the datapath flow rules, the RTTs increase. This
> indicates an upcall for every single packet. I confirmed this by tracing
> the kernel, and the upcalls are definitely there (actually, this is how
> I discovered the whole issue). The system can remain in this state for a
> long time (at least minutes).
>
> After restarting vswitchd, the upcalls stop happening and everything is
> fast again.
>
> I have also confirmed that the same thing happens to UDP packets. The
> same also happens regardless of which node in the cluster I target, even
> the pod's host.
>
> What I expect
> =
>
> I expected only a few upcalls in response to flushing the datapath,
> after which new datapath rules would get inserted into the kernel.
>
> Repeatability
> =
>
> Flushing the rules and restarting vswitchd seems to be a reliable way to
> flip the system between the two states. However, the upcall-making state
> seems unstable and sometimes reverts back to the expected lower-latency
> state by itself. It appears to me that flooding the system with
> unrelated upcalls and flows increases the chance of an "autonomous fix".
>
> At this point, I don't believe this would cause significant problems. I
> just find the behavior extremely weird.

Hi. Flushing datapath flows underneath a running ovs-vswitchd process
may cause all kinds of weird behaviour. So, del-flows, as well as most
of the other ovs-dpctl commands, should not be used under normal
circumstances. Here is what the man page says:

"""
   Do not use commands to add or remove or modify datapath flows if
   ovs-vswitchd is running because it interferes with ovs-vswitchd's own
   datapath flow management. Use ovs-ofctl(8), instead, to work with
   OpenFlow flow entries.
"""
[ovs-discuss] Unexpected upcalls
Hi,

I think I might have found a bug or just a really weird unexpected
behavior. I am reporting it here, but I've reproduced it only on systems
with OVN-Kubernetes and OVN. I don't know whether it can be replicated
with standalone OVS.

Test environment

3 node Kubernetes cluster using OVN-Kubernetes, Docker and Fedora 38.
All nodes are in a single LAN with IPs 192.168.1.{221,222,223}. From now
on, I will refer to the nodes only by the last digit of their IP address.

Steps to reproduce:
===

1. install a pod with a shell and a ping command
   (I am using docker.io/archlinux:latest, always running on node 222)
2. run `kubectl exec -ti $POD_NAME -- ping 192.168.1.221`
3. in a different terminal, ssh into the host of the pod (in my case
   node 222) and run `ovs-dpctl del-flows`
4. observe the measured latencies
5. keep the ping running, kill ovs-vswitchd on the host and wait for
   its restart
6. observe the latencies

What I observe
==

[root@wsfd-netdev64 ~]# kubectl exec -ti arch -- ping 192.168.1.221
PING 192.168.1.221 (192.168.1.221) 56(84) bytes of data.
64 bytes from 192.168.1.221: icmp_seq=1 ttl=63 time=0.543 ms
64 bytes from 192.168.1.221: icmp_seq=2 ttl=63 time=0.160 ms
64 bytes from 192.168.1.221: icmp_seq=3 ttl=63 time=0.119 ms
64 bytes from 192.168.1.221: icmp_seq=4 ttl=63 time=0.144 ms
64 bytes from 192.168.1.221: icmp_seq=5 ttl=63 time=0.137 ms
64 bytes from 192.168.1.221: icmp_seq=6 ttl=63 time=0.996 ms  # < ovs-dpctl del-flows
64 bytes from 192.168.1.221: icmp_seq=7 ttl=63 time=0.808 ms
64 bytes from 192.168.1.221: icmp_seq=8 ttl=63 time=1.01 ms
64 bytes from 192.168.1.221: icmp_seq=9 ttl=63 time=1.24 ms
64 bytes from 192.168.1.221: icmp_seq=10 ttl=63 time=1.20 ms
64 bytes from 192.168.1.221: icmp_seq=11 ttl=63 time=1.14 ms
64 bytes from 192.168.1.221: icmp_seq=12 ttl=63 time=1.10 ms  # < killall ovs-vswitchd
From 10.244.1.5 icmp_seq=22 Destination Host Unreachable
From 10.244.1.5 icmp_seq=23 Destination Host Unreachable
From 10.244.1.5 icmp_seq=24 Destination Host Unreachable
From 10.244.1.5 icmp_seq=25 Destination Host Unreachable
From 10.244.1.5 icmp_seq=26 Destination Host Unreachable
From 10.244.1.5 icmp_seq=27 Destination Host Unreachable
From 10.244.1.5 icmp_seq=28 Destination Host Unreachable
From 10.244.1.5 icmp_seq=29 Destination Host Unreachable
From 10.244.1.5 icmp_seq=31 Destination Host Unreachable
From 10.244.1.5 icmp_seq=32 Destination Host Unreachable
64 bytes from 192.168.1.221: icmp_seq=34 ttl=63 time=1371 ms
64 bytes from 192.168.1.221: icmp_seq=35 ttl=63 time=322 ms
64 bytes from 192.168.1.221: icmp_seq=36 ttl=63 time=0.186 ms
64 bytes from 192.168.1.221: icmp_seq=37 ttl=63 time=0.192 ms
64 bytes from 192.168.1.221: icmp_seq=38 ttl=63 time=0.140 ms
64 bytes from 192.168.1.221: icmp_seq=39 ttl=63 time=0.163 ms
^C
--- 192.168.1.221 ping statistics ---
39 packets transmitted, 18 received, +10 errors, 53.8462% packet loss,
time 38769ms
rtt min/avg/max/mdev = 0.119/94.570/1370.551/318.102 ms, pipe 3

After flushing the datapath flow rules, the RTTs increase. This
indicates an upcall for every single packet. I confirmed this by tracing
the kernel, and the upcalls are definitely there (actually, this is how
I discovered the whole issue). The system can remain in this state for a
long time (at least minutes).

After restarting vswitchd, the upcalls stop happening and everything is
fast again.

I have also confirmed that the same thing happens to UDP packets. The
same also happens regardless of which node in the cluster I target, even
the pod's host.

What I expect
=

I expected only a few upcalls in response to flushing the datapath,
after which new datapath rules would get inserted into the kernel.

Repeatability
=

Flushing the rules and restarting vswitchd seems to be a reliable way to
flip the system between the two states. However, the upcall-making state
seems unstable and sometimes reverts back to the expected lower-latency
state by itself. It appears to me that flooding the system with
unrelated upcalls and flows increases the chance of an "autonomous fix".

At this point, I don't believe this would cause significant problems. I
just find the behavior extremely weird.

Is there something more I could provide to help with reproducing this?
Or should I report it elsewhere?

Best,
Vašek Šraier