On Wed, 11.05.16 11:27, Brian Kroth (bpkr...@gmail.com) wrote:

> Hi all, I'm in the midst of steeping myself in systemd docs as I prepare to
> face lift a slew of services for Debian Jessie updates.
>
> As I read through things I'm starting to think through a number of new ways
> I could potentially reorganize some of our services, which is cool. With my
> ideas though I think I'm finding a few gaps in either my understanding or
> systemd capabilities, so I wanted to send a few questions to the list.
> Hopefully this is the right place.
>
> The first should hopefully be a bit of a softball:
>
> With .service units one can specify OnFailure and other sorts of restart
> behaviors, including thresholds and backoffs for when to stop retrying and
> what to do then. Essentially a lightweight service problem escalation
> procedure.
>
> However, in reading systemd-system.conf, I don't see any way to specify
> something like DefaultOnFailure behavior for what to do on failure, perhaps
> after some simple restart attempts, for all services. Seems like it can
> only be done on a per unit basis, no?
That is correct, yes.

> Ideally, I'd like to be able to do something very simply like, declare
> if any service fails to restart itself or does so too often and enters a
> hard failure state, then systemd should (attempt to) fire off an
> escalation procedure unit like send a passive check status to Nagios or
> send an email, accepting that such procedures may depend upon network
> connectivity which may or may not be available (so maybe there's some
> circular dependency issues to work through in such a scenario, but I
> presume systemd already has facilities for handling that case, maybe via
> OnFailureJobMode= settings).
>
> Thoughts?

That sounds like it goes towards service monitoring? I figure our theory
there was that monitoring systems should probably keep an eye on the
journal stream, where events are generated about these issues. These log
entries are recognizable by their message ID and carry both human-readable
text and structured metadata that let you know what's going on.

Our plan was originally to then add a concept of "activation-by-log-event"
to systemd, so that you could activate some service each time a log event
of a certain kind happens. However, we never got around to actually hacking
that up; it's still on the TODO list.

I think OnFailure= and similar settings are pretty useful for some things,
but for the monitoring case such journal-based logic would be nicer: it
copes better with events that fire in quick succession or during early
boot, since their processing can happen serially and asynchronously. It
would also allow much nicer filtering for any kind of event on the system,
and we wouldn't have to hook up every kind of failure of each service with
an OnFailure=-like dependency.

So yeah, I think we should have better support for what you are trying to
do, but I think the best way to get there is by delivering the
activate-by-log-message feature after all...
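Until something like activate-by-log-message exists, the journal-watching approach can be approximated by a small userspace monitor. The following is a minimal Python sketch, not part of systemd. It assumes the `journalctl -o json` output format (one JSON object per line) and assumes that `be02cf6855d2428ba40df7e9d022f03d` is the message ID systemd logs when a unit's start job fails (believed to be `SD_MESSAGE_UNIT_FAILED`; confirm with `journalctl --list-catalog be02cf6855d2428ba40df7e9d022f03d`). The `escalate()` function is a hypothetical hook standing in for a Nagios passive check or an e-mail.

```python
import json
import subprocess
from typing import Iterable, Iterator

# Assumption: message ID logged by systemd when a unit's start job fails
# (believed to be SD_MESSAGE_UNIT_FAILED; verify with `journalctl --list-catalog`).
UNIT_FAILED_ID = "be02cf6855d2428ba40df7e9d022f03d"


def failed_units(json_lines: Iterable[str], message_id: str = UNIT_FAILED_ID) -> Iterator[str]:
    """Yield the unit name of every journal entry that carries `message_id`.

    `json_lines` is journalctl's `-o json` output: one JSON object per line.
    """
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("MESSAGE_ID") != message_id:
            continue
        # Unit-related messages carry UNIT= (system manager) or USER_UNIT= (user manager).
        unit = entry.get("UNIT") or entry.get("USER_UNIT")
        if unit:
            yield unit


def escalate(unit: str) -> None:
    """Hypothetical escalation hook: replace with a Nagios passive check, e-mail, etc."""
    print(f"ALERT: {unit} failed")


def follow_journal(message_id: str = UNIT_FAILED_ID) -> None:
    """Follow the live journal and escalate each matching failure (needs journalctl)."""
    proc = subprocess.Popen(
        # -n 0 skips historical entries; the MESSAGE_ID= match filters inside journalctl.
        ["journalctl", "-f", "-n", "0", "-o", "json", f"MESSAGE_ID={message_id}"],
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None
    for unit in failed_units(proc.stdout, message_id):
        escalate(unit)
```

Calling `follow_journal()` on a systemd host would then run the escalation hook once per failed start job. Because filtering happens on the log stream rather than through per-unit OnFailure= dependencies, the same loop could watch for any other message ID by passing a different `message_id`.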
Lennart

-- 
Lennart Poettering, Red Hat
_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel