Please read the ovs-vswitchd manpage.  It says:

       --monitor
              Creates an additional process to monitor the ovs-vswitchd
              daemon.  If the daemon dies due to a signal that indicates a
              programming error (SIGABRT, SIGALRM, SIGBUS, SIGFPE, SIGILL,
              SIGPIPE, SIGSEGV, SIGXCPU, or SIGXFSZ) then the monitor
              process starts a new copy of it.  If the daemon dies or exits
              for another reason, the monitor process exits.
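
For illustration only, the restart rule in the paragraph above can be sketched in Python. This is not how OVS implements it (OVS itself is written in C); `should_restart` and `monitor` are hypothetical names, and the convention that a negative return code means "killed by that signal" is the one used by Python's subprocess module, not anything taken from OVS.

```python
import signal
import subprocess
import sys

# Signals the ovs-vswitchd manpage lists as indicating a programming error.
RESTART_SIGNALS = frozenset(int(s) for s in (
    signal.SIGABRT, signal.SIGALRM, signal.SIGBUS, signal.SIGFPE,
    signal.SIGILL, signal.SIGPIPE, signal.SIGSEGV, signal.SIGXCPU,
    signal.SIGXFSZ,
))


def should_restart(returncode: int) -> bool:
    """Decide whether a monitor following the manpage rule would restart.

    `returncode` uses the subprocess convention: -N means the child was
    killed by signal N; zero or positive means it exited on its own.
    """
    return returncode < 0 and -returncode in RESTART_SIGNALS


def monitor(argv):
    """Hypothetical monitor loop: rerun `argv` while the rule says restart.

    Returns the return code of the final run, i.e. the one that did not
    qualify for a restart.
    """
    while True:
        returncode = subprocess.call(argv)
        if not should_restart(returncode):
            return returncode


if __name__ == "__main__":
    # A child that kills itself with SIGKILL is not restarted.
    child = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print("final return code:", monitor(child))
```

Under this rule a child killed with SIGKILL ends the loop after one run, while a child killed with, say, SIGSEGV would be started again.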
              This option is normally used with --detach, but it also
              functions without it.

SIGKILL (signal 9) does not indicate a bug, so the monitor process does
not restart OVS.  If you want to test the monitoring feature, use one of
the signals listed above that indicates a bug.

OVS solves the PID file management problem by holding a lock on the
pidfile.  The pidfile is only valid if it is locked.

I don't think you're solving real problems.

On Sat, Apr 29, 2017 at 12:10:58PM -0700, Aliasgar Mikail Ginwala wrote:
> When you say that ovn-controller crashed, what do you mean?
>
> I mean if someone kills the pid or it crashes, it never comes back up until
> and unless I do service ovn-host restart.
>
> Do you mean that you killed it?
>
> Yes
>
> Which process, and how did you kill it?
>
> Stating the e.g. I posted above:
> ps aux | grep controller
> root 3639845 0.0 0.0 26792 952 ? S<s 17:24 0:00
> ovn-controller: monitoring pid 3639846 (healthy)
> root 3639846 0.0 0.0 27060 2484 ? S< 17:24 0:00
> ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer
> -vsyslog:err -vfile:info --no-chdir
> --log-file=/var/log/openvswitch/ovn-controller.log
> --pidfile=/var/run/openvswitch/ovn-controller.pid --detach --monitor
>
> Kill -9 3639845 and issuing kill -9 3639846 of course kill the whole service.
>
> Also we have a known issue for pid file management as it goes stale which I
> already highlighted in the example and reference @.
> http://stackoverflow.com/questions/696839/how-do-i-write-a-bash-script-to-restart-a-process-if-it-dies
>
> My sample service with respawn is as follow ; as soon as you kill the pid,
> it just respawns:
> ps aux | grep fakeservice
> root 924307 2.7 0.0 782872 23844 ? Sl 12:01 0:00
> /fake/fakeservice --v=10 --fakeservice-resource-point=http://fakeurl
> kill -9 924307
> ps aux | grep fakeservice
> root 924653 12.0 0.0 774420 23728 ? Sl 12:01 0:00
> /fake/fakeservice --v=10 --fakeservice-resource-point=http://fakeurl
>
> So why can't we get rid of it and just add ovn-host in /etc/init/ and add
> below lines which immediately respawns?
> respawn
> respawn limit x x
>
> On Sat, Apr 29, 2017 at 10:04 AM, Ben Pfaff <b...@ovn.org> wrote:
>
> > When you say that ovn-controller crashed, what do you mean?  Do you mean
> > that you killed it?  Which process, and how did you kill it?
> >
> > On Fri, Apr 28, 2017 at 10:51:04PM -0700, Aliasgar Mikail Ginwala wrote:
> > > Yes:
> > >
> > > ps aux | grep controller
> > > root 3639845 0.0 0.0 26792 952 ? S<s 17:24 0:00
> > > ovn-controller: monitoring pid 3639846 (healthy)
> > > root 3639846 0.0 0.0 27060 2484 ? S< 17:24 0:00
> > > ovn-controller unix:/var/run/openvswitch/db.sock -vconsole:emer
> > > -vsyslog:err -vfile:info --no-chdir
> > > --log-file=/var/log/openvswitch/ovn-controller.log
> > > --pidfile=/var/run/openvswitch/ovn-controller.pid --detach --monitor
> > > root 4067233 0.0 0.0 11744 936 pts/9 S+ 22:46 0:00 grep
> > > --color=auto controller
> > >
> > > /etc/init.d/ovn-host installed via Debian that is compiled from source
> > > code only adds --monitor
> > >
> > > On Fri, Apr 28, 2017 at 9:08 PM, Ben Pfaff <b...@ovn.org> wrote:
> > >
> > > > Is it running with the --monitor option?  If not, either --monitor
> > > > should be added or the upstart features should be used.
> > > >
> > > > On Fri, Apr 28, 2017 at 05:16:09PM -0700, Aliasgar Mikail Ginwala wrote:
> > > > > I did double verify:
> > > > >
> > > > > This is what is happening after crashing the ovn pid:
> > > > >
> > > > > service ovn-host status
> > > > > Pidfile for ovn-controller (/var/run/openvswitch/ovn-controller.pid) is
> > > > > stale
> > > > >
> > > > > Works only after manual restart and didn't respawn
> > > > > service ovn-host restart
> > > > > 2017-04-29T00:14:37Z|00001|unixctl|WARN|failed to connect to
> > > > > /var/run/openvswitch/ovn-controller.3623709.ctl
> > > > > ovs-appctl: cannot connect to
> > > > > "/var/run/openvswitch/ovn-controller.3623709.ctl" (Connection refused)
> > > > > * Starting ovn-controller
> > > > >
> > > > > Regards,
> > > > > Aliasgar
> > > > >
> > > > > On Fri, Apr 28, 2017 at 4:50 PM, Ben Pfaff <b...@ovn.org> wrote:
> > > > >
> > > > > > On Fri, Apr 28, 2017 at 04:02:26PM -0700, Aliasgar Mikail Ginwala wrote:
> > > > > > > Recently when I was adding monitoring and alerting for ovs and ovn
> > > > > > > version 2.7.0, I found both of the upstart services are missing
> > > > > > > *respawn*.  Is it on purpose?  If it's not then let's handle it as an
> > > > > > > improvement to add it in the upstart.  Suggestions welcome.
> > > > > >
> > > > > > OVS and OVN already restart themselves, so probably nothing is needed.
> > > > > >
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev