On Thu, Jun 28, 2012 at 11:01 PM, Alexander E. Patrakov <patra...@gmail.com> wrote:
> 2012/6/29 Kok, Auke-jan H <auke-jan.h....@intel.com>:
>> On Fri, Jun 29, 2012 at 12:49 AM, Nathan <qwerty....@gmail.com> wrote:
>>> Another issue (though slightly related) is we have an external binary
>>> that when run will return 0 or 1 depending if we should run a service
>>> is there a way to run this command in the service_name.service and start the
>>> service if it returns 0 and stop the service if the script
>>> returns 1 (retrying the script every 5 minutes or so).
>>
>> cheap trick: make a script and run it from a timer, have the script
>> run `systemctl ...`
>>
>> better trick: fix the daemon to do all of this properly.
>
> Hello. The company I work for has a similar need. The director has
> permitted me to disclose the details in full, in hope that this will
> permit you to understand the use case better and understand why "fix
> the daemon" is not a possible solution in our case. We are not using
> systemd yet on our servers, but this doesn't make the problem
> statement invalid.
>
> We have several servers hosted at different ISPs, and our own
> autonomous system. The service is provided to our clients via IPv4
> anycast. So, at each of the servers, we run bgpd (from quagga) and
> announce a route to our own IPv4 block. This means that each client
> will be routed to the nearest (in the BGP sense) server. It also
> protects our service against outages that affect the entire ISP, and
> allows us to perform maintenance and software upgrades safely (i.e.
> with near zero visible downtime for clients) by stopping bgpd first.
>
> The issue is that twice in the company's lifetime there was a payment
> problem with one of the servers. When this happened, the ISP did not
> shut down the affected server. Instead, they somehow firewalled the
> packets destined to it, but the BGP session was left intact. End
> result: the route is still announced into the global routing table,
> but doesn't work, and some clients see service interruption. So, as a
> protection against such mistakes, we need some form of a custom dead
> man's switch that would stop bgpd if none of the test IPv4 addresses
> is pingable.
>
> Of course, such monitoring need is specific to our use case, and other
> companies will either not need it at all or write a dead man's switch
> with a different logic.
>
> So the logic, as I understand it, should be as follows: run bgpd if
> the administrator has not prohibited this due to maintenance or
> similar reasons, and the periodically-executed (?) dead-man's-switch
> script doesn't say that bgpd should not run.
>
> The "run systemctl from timer" is close, but not close enough: extra
> care is needed during maintenance periods to disable the dead man's
> switch script (so it doesn't restart bgpd contrary to the
> administrator's decision) and not to forget to reenable it later.
Nothing a sticky note on a monitor couldn't fix.

A real solution would be to use some sort of heartbeat feature, or just wrap bgpd in a wrapper program that takes care of starting and stopping it. That lets you keep the wrapper running under systemd at all times. No timers needed.

Auke
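A minimal sketch of the wrapper approach, in Python. It assumes the health check is an external command that exits 0 when bgpd should run and non-zero otherwise (as in Nathan's question; Alexander's ping test could be such a command), and that bgpd is started without its daemonizing option so it stays in the foreground as the wrapper's child. The name `bgpd-guard`, the 300-second interval, the 60-second check timeout and the 30-second stop grace period are illustrative assumptions, not anything systemd or quagga defines.

```python
#!/usr/bin/env python3
# bgpd-guard (hypothetical name): keep a daemon running only while an
# external check command exits 0. Meant to run as a long-lived systemd service.
import signal
import subprocess
import sys
import time

CHECK_INTERVAL = 300   # seconds between checks ("every 5 minutes or so")
CHECK_TIMEOUT = 60     # assumed limit; a hung check counts as a failure
USAGE = "usage: bgpd-guard CHECK_CMD [ARGS...] -- DAEMON_CMD [ARGS...]"


def check_says_run(check_cmd, timeout=CHECK_TIMEOUT):
    """True if the check exits 0; a missing, failing or hung check means 'stop'."""
    try:
        return subprocess.run(check_cmd, timeout=timeout).returncode == 0
    except (OSError, subprocess.SubprocessError):
        return False


def stop(child, grace=30):
    """Ask the daemon to exit with SIGTERM, then SIGKILL it if it lingers."""
    child.terminate()
    try:
        child.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        child.kill()
        child.wait()


def reconcile(child, check_cmd, daemon_cmd):
    """One step: start or stop the daemon so its state matches the check."""
    wanted = check_says_run(check_cmd)
    running = child is not None and child.poll() is None
    if wanted and not running:
        # Also covers a daemon that crashed since the last round.
        return subprocess.Popen(daemon_cmd)
    if not wanted and running:
        stop(child)
        return None
    return child if running else None


def supervise(check_cmd, daemon_cmd, interval=CHECK_INTERVAL):
    """Loop forever; always stop the daemon when the wrapper itself exits."""
    child = None
    try:
        while True:
            child = reconcile(child, check_cmd, daemon_cmd)
            time.sleep(interval)
    finally:
        if child is not None and child.poll() is None:
            stop(child)


def main(argv):
    if "--" not in argv:
        print(USAGE, file=sys.stderr)
        return 2
    split = argv.index("--")
    check_cmd, daemon_cmd = argv[:split], argv[split + 1:]
    if not check_cmd or not daemon_cmd:
        print(USAGE, file=sys.stderr)
        return 2
    # systemd sends SIGTERM when the unit is stopped; turning it into
    # SystemExit lets the finally block in supervise() stop the daemon.
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
    supervise(check_cmd, daemon_cmd)
    return 0


# Only supervise when given arguments; with no arguments this does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Used as the service's `ExecStart=` command (the check command, then `--`, then bgpd's command line), the wrapper stays under systemd at all times and no timer is involved. Maintenance then becomes a plain `systemctl stop` of that one service, which takes bgpd and the dead man's switch down together, and `systemctl start` brings both back, so there is no separate switch to forget to re-enable.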