Proxied from Joe Stringer@VMware:

"I wonder if this patch fixes the issue:
https://github.com/openvswitch/ovs/commit/546953509095cec6fad42663b659171618b765d2
Note that the last three lines within the bad_key_len / bad_mask_len conditional statement are the following:

    format_generic_odp_key(ma, ds);
    ds_put_char(ds, ')');
    return;

This is the same logic as the end of the function, which is where the backtrace reports the call stack to be. Jarno pointed out that the compiler could optimize out the first copy of this code, turning it into a jump instruction that jumps inside the if (!is_exact) statement. Hence the backtrace shows this confusing call stack.

Note that this problem would only present itself if there is:
A) A mismatch between a newer kernel version and an older userspace (OVS<2.3), where
B) The kernel has a new flow match field available which ovs-vswitchd doesn't understand, and
C) A flow_del command fails for some reason."

It would be great if we could confirm by getting the existing build of OVS and applying the patch above.

** Description changed:

+ [Impact]
+ Open vSwitch daemon crashes, causing flow data to be lost and, in an OpenStack cloud, instance connectivity to be lost.
+
+ [Test Case]
+ <trivialized step>
+ Install an OpenStack cloud using Neutron + ML2 plugin and Open vSwitch
+ Run cloud for some time - ovs-vswitchd will crash, causing loss of instance connectivity.
+
+ [Regression Potential]
+ Minimal - this code has been in versions > 2.0.2 for some time.
+
+ [Original Bug Report]
  Hi

  I find that every 2 days or so I lose part of my cluster. It seems that openvswitch is crashing...

  The only message left on syslog is as follows:

  syslog:Jul 1 22:52:32 blue-compute kernel: [530482.190688] ovs-vswitchd[1935]: segfault at 0 ip 0000000000459110 sp 00007fff85804758 error 4 in ovs-vswitchd[400000+133000]

  And this is the last message.

  I'm unable to reboot gracefully. I have to reset. (This may also be because ceph is not giving up.)

  And I can see a lot of traffic going around in the network. There is so much traffic that some low-end routers/switches fail.
  This may be caused by another problem (machines stalled because of the ovs fault and others trying to connect; maybe it fails because of too much traffic). But I mention this for completeness.

  Now some info:

  Linux version 3.13.0-30-generic (buildd@allspice) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #54-Ubuntu SMP Mon Jun 9 22:45:01 UTC 2014 (Ubuntu 3.13.0-30.54-generic 3.13.11.2)

  vendor_id : AuthenticAMD
  cpu family : 16
  model : 4
  model name : AMD Phenom(tm) II X4 810 Processor

  Ubuntu 14.04 LTS (server).

  ovs-vsctl --version
  ovs-vsctl (Open vSwitch) 2.0.1
  Compiled Feb 23 2014 14:42:32

  I can attach full logs, but I think there's nothing useful in them because only one line refers to the problem.

  NOTE: restarting ovs does not solve the problem.

--
You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1336555

Title:
  ovs-vswitchd crashed with SIGSEGV in nl_attr_get_size()

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1336555/+subscriptions

--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
