On Fri, May 17, 2019 at 09:45:36AM +0000, SCHAER Frederic wrote:
> Hi
>
> Thank you for your answer.
> I actually forgot to say I already had checked the syslogs, the ovs and the
> network journals/logs... no coredump reference anywhere.
>
> For me a core dump or a crash would not return an exit code of 0, which seems
> to be what systemd saw :/
> I even straced -f the ovs-vswitchd process and made it stop/crash with an
> ifdown/ifup, but it looks to me like this is an exit...
>
> (I can retry and save the strace output if necessary or useful.)
> The end of the strace output was (I see "brflat" in the long strings, which is
> the bridge hosting em1):
Is this a strace of ovs-vswitchd or ovs-vsctl? Because SIGABRT happens when
ovs-vsctl is stuck and the alarm fires. That would just indicate that
ovs-vswitchd is not running. If ovs-vswitchd is not crashing, something is
stopping the service; maybe running 'sh -x /sbin/ifdown <iface>' helps shed
some light? Or add 'set -x' to the /etc/sysconfig/network-scripts/if*-ovs
scripts.

fbl

> [pid 175068] sendmsg(18, {msg_name(0)=NULL,
> msg_iov(1)=[{",\0\0\0\22\0\1\0\223\6\0\0!\353\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\v\0\3\0brflat\0\0",
> 44}], msg_controllen=0, msg_flags=0}, 0 <unfinished ...>
> [pid 175233] <... futex resumed> ) = 0
> [pid 175068] <... sendmsg resumed> ) = 44
> [pid 175068] recvmsg(18, <unfinished ...>
> [pid 175234] futex(0x55b8aaa19128, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> ...skipping...
> [pid 175233] futex(0x7f7f226b9140, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 175068] <... sendmsg resumed> ) = 44
> [pid 175234] <... futex resumed> ) = 0
> [pid 175233] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 175068] recvmsg(18, <unfinished ...>
> [pid 175234] futex(0x7f7f226b9140, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid 175233] futex(0x7f7f226b9140, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> [pid 175068] <...
> recvmsg resumed> {msg_name(0)=NULL,
> msg_iov(2)=[{"\360\4\0\0\20\0\0\0\224\6\0\0!\353\377\377\0\0\1\0\36\0\0\0C\20\1\0\0\0\0\0\v\0\3\0brflat\0\0\10\0\r\0\350\3\0\0\5\0\20\0\0\0\0\0\5\0\21\0\0\0\0\0\10\0\4\0\334\5\0\0\10\0\33\0\0\0\0\0\10\0\36\0\1\0\0\0\10\0\37\0\1\0\0\0\10\0(\0\377\377\0\0\10\0)\0\0\0\1\0\10\0
> \0\1\0\0\0\5\0!\0\1\0\0\0\f\0\6\0noqueue\0\10\0#\0\0\0\0\0\5\0'\0\0\0\0\0$\0\16\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0H\234\377\377\n\0\1\0\276J\307\307\207I\0\0\n\0\2\0\377\377\377\377\377\377\0\0\304\0\27\0Y\22\5\0\0\0\0\0Uf\0\0\0\0\0\0^0j\1\0\0\0\0\372\371k\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0d\0\7\0Y\22\5\0Uf\0\0^0j\1\372\371k\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024},
> {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0y0\2\0\0\0\0\0\256\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\214\254\235\0\0\0\0\0
> z\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\211\223\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0004\0\6\0\6\0\0\0\0\0\0\0r\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0A\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\24\0\7\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\10\0\0\0\0\0", 65536}],
> msg_controllen=0, msg_flags=0}, MSG_DONTWAIT) = 1264
> [pid 175234] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
> [pid 175233] <... futex resumed> ) = 0
> [pid 175234] futex(0x7f7f226b9140, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> [pid 175068] rt_sigprocmask(SIG_UNBLOCK, [ABRT], <unfinished ...>
> [pid 175234] <...
> futex resumed> ) = 0
> [pid 175068] <... rt_sigprocmask resumed> NULL, 8) = 0
> [pid 175068] tgkill(175068, 175068, SIGABRT <unfinished ...>
> [pid 175233] futex(0x7f7f226b9140, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
> [pid 175068] <... tgkill resumed> ) = 0
> [pid 175233] <... futex resumed> ) = 0
> [pid 175068] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=175068, si_uid=393} ---
> [pid 189862] +++ killed by SIGABRT +++
> [pid 175237] +++ killed by SIGABRT +++
> [pid 175236] +++ killed by SIGABRT +++
> [pid 175235] +++ killed by SIGABRT +++
> [pid 175234] +++ killed by SIGABRT +++
> [pid 175233] +++ killed by SIGABRT +++
> [pid 175232] +++ killed by SIGABRT +++
> [pid 175231] +++ killed by SIGABRT +++
> [pid 175230] +++ killed by SIGABRT +++
> [pid 175229] +++ killed by SIGABRT +++
> [pid 175228] +++ killed by SIGABRT +++
> [pid 175227] +++ killed by SIGABRT +++
> [pid 175226] +++ killed by SIGABRT +++
> [pid 175225] +++ killed by SIGABRT +++
> [pid 175224] +++ killed by SIGABRT +++
> [pid 175223] +++ killed by SIGABRT +++
> [pid 175222] +++ killed by SIGABRT +++
> [pid 175085] +++ killed by SIGABRT +++
> +++ killed by SIGABRT +++
>
> Regards
>
> > -----Original Message-----
> > From: Flavio Leitner <f...@sysclose.org>
> > Sent: Friday, May 17, 2019 10:29
> > To: SCHAER Frederic <frederic.sch...@cea.fr>
> > Cc: b...@openvswitch.org
> > Subject: Re: [ovs-discuss] Restarting network kills ovs-vswitchd (and network)... ?
> >
> > On Thu, May 16, 2019 at 09:34:28AM +0000, SCHAER Frederic wrote:
> > > Hi,
> > > I'm facing an issue with openvswitch, which I think is new (I'm not even sure).
> > > Here is the description:
> > >
> > > * What you did that made the problem appear.
> > >
> > > I am configuring openstack (compute, network) nodes using OVS networks
> > > for the main interfaces and RHEL network scripts, basically using
> > > openvswitch to create bridges, set the bridges' IPs, and include the
> > > real Ethernet devices in the bridges.
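[For readers following along: the setup described above — an OVS bridge carrying the IPs, with the physical NIC enslaved as a port — is typically expressed in RHEL network-scripts as ifcfg files along these lines. This is a sketch using the thread's device names (brflat, em1) and the IPADDR1/IPADDR2 style mentioned below; the addresses themselves are placeholders, not taken from the report.]

```shell
# /etc/sysconfig/network-scripts/ifcfg-brflat -- the OVS bridge holds the IPs
DEVICE=brflat
DEVICETYPE=ovs
TYPE=OVSBridge
ONBOOT=yes
BOOTPROTO=static
IPADDR1=192.0.2.10      # placeholder address
PREFIX1=24
IPADDR2=198.51.100.10   # placeholder address
PREFIX2=24

# /etc/sysconfig/network-scripts/ifcfg-em1 -- the NIC is just an OVS port
DEVICE=em1
DEVICETYPE=ovs
TYPE=OVSPort
OVS_BRIDGE=brflat
ONBOOT=yes
BOOTPROTO=none
```

With this layout, ifup/ifdown on either file goes through the ifup-ovs/ifdown-ovs scripts, which is why the 'set -x' suggestion above targets /etc/sysconfig/network-scripts/if*-ovs.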
> > > On a compute machine (not in production, so not using 3 or more
> > > interfaces), I have for instance brflat -> em1.
> > > brflat has multiple IPs defined using IPADDR1, IPADDR2, etc.
> > > Now: at boot, the machine has network. But if I ever change anything in
> > > the network scripts and issue either a network restart, an ifup or an
> > > ifdown, the network breaks and connectivity is lost.
> > >
> > > Also, on network restarts, I'm getting these logs in the network journal:
> > > May 16 10:26:41 cloud1 ovs-vsctl[1766678]: ovs|00001|vsctl|INFO|Called
> > > as ovs-vsctl -t 10 -- --may-exist add-br brflat
> > > May 16 10:26:51 cloud1 ovs-vsctl[1766678]: ovs|00002|fatal_signal|WARN|terminating
> > > with signal 14 (Alarm clock)
> > > May 16 10:26:51 cloud1 network[1766482]: Bringing up interface brflat:
> > > 2019-05-16T08:26:51Z|00002|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
> > >
> > > * What you expected to happen.
> > >
> > > On network restart... to get back a working network. Not to be forced to
> > > log in using the ipmi console and fix the network manually.
> > >
> > > * What actually happened.
> > >
> > > What actually happens is that on ifup/ifdown/network restart, the
> > > ovs-vswitchd daemon stops working. According to systemctl, it is
> > > actually exiting with code 0.
> > > If I do an ifdown on one interface, then ovs-vswitchd is down.
> > > After an ovs-vswitchd restart, I can then ifup that interface: the
> > > network is still down (no ping, nothing).
> > > ovs-vswitchd is again dead/stopped/exited 0.
> > > Then: manually starting ovs-vswitchd restores connectivity.
> > >
> > > Please also include the following information:
> > > * The Open vSwitch version number (as output by ovs-vswitchd --version).
> > > ovs-vswitchd (Open vSwitch) 2.10.1
> >
> > Sounds like OVS is crashing. Please check 'dmesg' to see if there are
> > segmentation fault messages in there. Or the journal logs.
> > Or the systemd service status.
> >
> > If it is, then the next step is to enable coredumps to grab one core. Then
> > install the openvswitch-debuginfo package to see the stack trace.
> >
> > You're right that ifdown should not put the service down.
> >
> > fbl

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
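[A closing note on the exit-code question in this thread: Frederic's reasoning is sound — a genuine crash cannot surface to systemd as "exited with code 0", because a process killed by a signal yields exit status 128 + signal number. A minimal, OVS-independent shell illustration:]

```shell
#!/bin/sh
# A child killed by a signal exits with status 128 + signal number,
# so a crash can never look like a clean "exit 0" to a supervisor.
sh -c 'kill -ABRT $$'; echo "SIGABRT: $?"   # 128 + 6  = 134
sh -c 'kill -ALRM $$'; echo "SIGALRM: $?"   # 128 + 14 = 142, signal 14 being the
                                            # "Alarm clock" seen in the logs above
sh -c 'exit 0';        echo "clean:   $?"   # 0
```

So an ovs-vswitchd unit reporting status 0 points at an orderly shutdown — something stopping the service, as Flavio suggests — rather than a crash.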