The same is reproducible using SIGABRT (kill -6):

ps aux | grep ovn-controller
root 927884 0.0 0.0 26792 956 ? S<s 12:03 0:00 ovn-controller: monitoring pid 927885 (healthy)
root 927885 0.0 0.0 27060 2484 ? S< 12:03 0:00 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --no-chdir --log-file=/var/log/openvswitch/ovn-controller.log --pidfile=/var/run/openvswitch/ovn-controller.pid --detach --monitor

kill -6 927884
kill -6 927885
service ovn-host restart
2017-04-29T19:46:53Z|00001|unixctl|WARN|failed to connect to /var/run/openvswitch/ovn-controller.927885.ctl
ovs-appctl: cannot connect to "/var/run/openvswitch/ovn-controller.927885.ctl" (Connection refused)
 * Starting ovn-controller

We are trying to solve a real use case here. --monitor takes care of code crashes, based on the SIG* signals mentioned. However, we also want to cover the case where someone kills the controller pid: a newly provisioned VM will then not get its ACLs properly in place, because the controller has died. Respawning at least ensures that there is no *control-plane impact*.

The same applies to ovs-vswitchd: if someone kills the pid, it brings down the host (with or without VMs on it), since there is no respawn mechanism apart from the code-crash handling that the monitor takes care of. Also, for production we always choose a stable release to avoid such random code-crash issues, which the monitor option handles by default. Here again, respawning helps avoid *data-plane impact*.

On Sat, Apr 29, 2017 at 12:26 PM, Ben Pfaff <b...@ovn.org> wrote:

> Please read the ovs-vswitchd manpage.  It says:
>
>     --monitor
>         Creates an additional process to monitor the ovs-vswitchd daemon.
>         If the daemon dies due to a signal that indicates a programming
>         error (SIGABRT, SIGALRM, SIGBUS, SIGFPE, SIGILL, SIGPIPE, SIGSEGV,
>         SIGXCPU, or SIGXFSZ) then the monitor process starts a new copy
>         of it.  If the daemon dies or exits for another reason, the
>         monitor process exits.
>
>         This option is normally used with --detach, but it also
>         functions without it.
>
> SIGKILL (signal 9) does not indicate a bug, so the monitor process does
> not restart OVS.  If you want to test the monitoring feature, use one of
> the signals listed above that indicates a bug.
>
> OVS solves the PID file management problem by holding a lock on the
> pidfile.  The pidfile is only valid if it is locked.
>
> I don't think you're solving real problems.
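To make the restart rule in the manpage excerpt above concrete, here is a minimal shell sketch. It only encodes the signal list quoted from the manpage. The function name monitor_would_restart is hypothetical and not part of OVS, and it describes the daemon process dying from a signal, not the monitor process itself being killed.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVS): succeeds if the given signal is in
# the "programming error" list quoted from the ovs-vswitchd manpage, i.e. a
# signal after which --monitor is documented to start a new daemon copy.
monitor_would_restart() {
    sig=${1#SIG}    # accept both "ABRT" and "SIGABRT"
    case "$sig" in
        ABRT|ALRM|BUS|FPE|ILL|PIPE|SEGV|XCPU|XFSZ) return 0 ;;
        *) return 1 ;;
    esac
}

# Walk through a few signals, including the ones used in this thread.
for s in ABRT KILL TERM SEGV; do
    if monitor_would_restart "$s"; then
        echo "SIG$s: monitor starts a new copy of the daemon"
    else
        echo "SIG$s: monitor exits, daemon stays down"
    fi
done
```

Under this reading, a daemon killed with kill -6 (SIGABRT) counts as a crash, while kill -9 (SIGKILL) or a plain kill (SIGTERM) does not, which is the gap an upstart respawn stanza would be meant to cover.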
>
> On Sat, Apr 29, 2017 at 12:10:58PM -0700, Aliasgar Mikail Ginwala wrote:
> > When you say that ovn-controller crashed, what do you mean?
> > I mean that if someone kills the pid or it crashes, it never comes back
> > up unless I do service ovn-host restart.
> > Do you mean that you killed it? Yes
> > Which process, and how did you kill it? Restating the example I posted
> > above:
> > ps aux | grep controller
> > root 3639845 0.0 0.0 26792 952 ? S<s 17:24 0:00 ovn-controller: monitoring pid 3639846 (healthy)
> > root 3639846 0.0 0.0 27060 2484 ? S< 17:24 0:00 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --no-chdir --log-file=/var/log/openvswitch/ovn-controller.log --pidfile=/var/run/openvswitch/ovn-controller.pid --detach --monitor
> >
> > Issuing kill -9 3639845 and kill -9 3639846 of course kills the whole
> > service.
> >
> > Also, we have a known issue with pid file management, as the pidfile
> > goes stale, which I already highlighted in the example and reference at:
> > http://stackoverflow.com/questions/696839/how-do-i-write-a-bash-script-to-restart-a-process-if-it-dies
> >
> > My sample service with respawn is as follows; as soon as you kill the
> > pid, it just respawns:
> > ps aux | grep fakeservice
> > root 924307 2.7 0.0 782872 23844 ? Sl 12:01 0:00 /fake/fakeservice --v=10 --fakeservice-resource-point=http://fakeurl
> > kill -9 924307
> > ps aux | grep fakeservice
> > root 924653 12.0 0.0 774420 23728 ? Sl 12:01 0:00 /fake/fakeservice --v=10 --fakeservice-resource-point=http://fakeurl
> >
> > So why can't we get rid of it and just add ovn-host in /etc/init/ with
> > the lines below, which respawn it immediately?
> > respawn
> > respawn limit x x
> >
> > On Sat, Apr 29, 2017 at 10:04 AM, Ben Pfaff <b...@ovn.org> wrote:
> >
> > > When you say that ovn-controller crashed, what do you mean?  Do you
> > > mean that you killed it?  Which process, and how did you kill it?
> > >
> > > On Fri, Apr 28, 2017 at 10:51:04PM -0700, Aliasgar Mikail Ginwala wrote:
> > > > Yes:
> > > >
> > > > ps aux | grep controller
> > > > root 3639845 0.0 0.0 26792 952 ? S<s 17:24 0:00 ovn-controller: monitoring pid 3639846 (healthy)
> > > > root 3639846 0.0 0.0 27060 2484 ? S< 17:24 0:00 ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --no-chdir --log-file=/var/log/openvswitch/ovn-controller.log --pidfile=/var/run/openvswitch/ovn-controller.pid --detach --monitor
> > > > root 4067233 0.0 0.0 11744 936 pts/9 S+ 22:46 0:00 grep --color=auto controller
> > > >
> > > > /etc/init.d/ovn-host, installed via a Debian package compiled from
> > > > source code, only adds --monitor.
> > > >
> > > > On Fri, Apr 28, 2017 at 9:08 PM, Ben Pfaff <b...@ovn.org> wrote:
> > > >
> > > > > Is it running with the --monitor option?  If not, either --monitor
> > > > > should be added or the upstart features should be used.
> > > > >
> > > > > On Fri, Apr 28, 2017 at 05:16:09PM -0700, Aliasgar Mikail Ginwala wrote:
> > > > > > I did double-check:
> > > > > >
> > > > > > This is what happens after crashing the ovn pid:
> > > > > >
> > > > > > service ovn-host status
> > > > > > Pidfile for ovn-controller (/var/run/openvswitch/ovn-controller.pid) is stale
> > > > > >
> > > > > > It works only after a manual restart and did not respawn:
> > > > > > service ovn-host restart
> > > > > > 2017-04-29T00:14:37Z|00001|unixctl|WARN|failed to connect to /var/run/openvswitch/ovn-controller.3623709.ctl
> > > > > > ovs-appctl: cannot connect to "/var/run/openvswitch/ovn-controller.3623709.ctl" (Connection refused)
> > > > > >  * Starting ovn-controller
> > > > > >
> > > > > > Regards,
> > > > > > Aliasgar
> > > > > >
> > > > > > On Fri, Apr 28, 2017 at 4:50 PM, Ben Pfaff <b...@ovn.org> wrote:
> > > > > >
> > > > > > > On Fri, Apr 28, 2017 at 04:02:26PM -0700, Aliasgar Mikail Ginwala wrote:
> > > > > > > > Recently, when I was adding monitoring and alerting for ovs and
> > > > > > > > ovn version 2.7.0, I found that both of the upstart services are
> > > > > > > > missing *respawn*. Is it on purpose? If it's not, then let's
> > > > > > > > handle it as an improvement and add it in the upstart.
> > > > > > > > Suggestions welcome.
> > > > > > >
> > > > > > > OVS and OVN already restart themselves, so probably nothing is
> > > > > > > needed.

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev