Re: [systemd-devel] default service restart action?

Brian Kroth Mon, 16 May 2016 11:12:06 -0700

On May 11, 2016 12:07, "Lennart Poettering" <[email protected]> wrote:
>
> On Wed, 11.05.16 11:27, Brian Kroth ([email protected]) wrote:
>
> > Hi all, I'm in the midst of steeping myself in systemd docs as I prepare to
> > face lift a slew of services for Debian Jessie updates.
> >
> > As I read through things I'm starting to think through a number of new ways
> > I could potentially reorganize some of our services, which is cool. With my
> > ideas though I think I'm finding a few gaps in either my understanding or
> > systemd capabilities, so I wanted to send a few questions to the list.
> > Hopefully this is the right place.
> >
> > The first should hopefully be a bit of a softball:
> >
> > With .service units one can specify OnFailure and other sorts of restart
> > behaviors, including thresholds and backoffs for when to stop retrying and
> > what to do then. Essentially a lightweight service problem escalation
> > procedure.
> >
> > However, in reading systemd-system.conf, I don't see any way to specify
> > something like DefaultOnFailure behavior for what to do on failure, perhaps
> > after some simple restart attempts, for all services.  Seems like it can
> > only be done on a per unit basis, no?
>
> That is correct, yes.
>
> > Ideally, I'd like to be able to do something very simple, like declare
> > that if any service fails to restart itself, or does so too often and
> > enters a hard failure state, then systemd should (attempt to) fire off an
> > escalation procedure unit, such as sending a passive check status to
> > Nagios or sending an email, accepting that such procedures may depend upon
> > network connectivity which may or may not be available (so maybe there are
> > some circular dependency issues to work through in such a scenario, but I
> > presume systemd already has facilities for handling that case, maybe via
> > OnFailureJobMode= settings).
> >
> > Thoughts?
>
> That sounds like it goes towards service monitoring?
>
> I figure our theory there was that monitoring systems should probably
> keep an eye on the journal stream generated, where there are events
> generated about these issues. These log entries are recognizable by
> their message ID and carry both human-readable and structured
> metadata that let you know what's going on. Our plan was originally to
> then add a concept of "activation-by-log-event" to systemd, so that
> you could activate some service each time a log event of a certain
> kind happens. However, we never got around to actually hacking that up;
> it's still on the TODO list.
>
> I think OnFailure= and stuff are pretty useful for some things, but
> for the monitoring case such a journal-based logic would be nicer,
> because it can better cover events triggered at a quick pace and during
> early boot, as the processing of this can happen serially and
> asynchronously... Also, it would allow much nicer filtering for any
> kind of event on the system, and we wouldn't have to hook up every
> kind of failure of each service with an OnFailure=-like dependency.
>
> So yeah, I think we should have better support for what you are trying
> to do, but I think we should best do that by delivering the
> activate-by-log-message feature after all...
>
> Lennart


Thanks, I'll look into that technique.

Essentially in this case it'd be another .service script monitoring journal
activity, perhaps with some filters, or else just a periodic cron job.
Either way, I think you're right - that's probably the more generally
applicable approach.
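
As a rough illustration of such a journal-watching service, here is a minimal Python sketch. It is not an existing systemd feature and not the activate-by-log-message mechanism discussed above. It assumes that unit failures are logged with the message ID shown in UNIT_FAILED_ID (believed to be SD_MESSAGE_UNIT_FAILED; check sd-messages.h or the journal catalog for your systemd version) and that the entry carries a UNIT field. The function names failed_units and follow_journal are hypothetical.

```python
import json
import subprocess

# Assumption: believed to be SD_MESSAGE_UNIT_FAILED. Verify it against
# sd-messages.h or the journal catalog of the systemd version in use.
UNIT_FAILED_ID = "be02cf6855d2428ba40df7e9d022f03d"


def failed_units(lines, message_id=UNIT_FAILED_ID):
    """Yield (unit, message) for each JSON journal line with the given ID.

    `lines` is any iterable of strings in `journalctl --output=json` format,
    one JSON object per line. Malformed or non-matching lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except ValueError:
            continue
        if not isinstance(entry, dict):
            continue
        if entry.get("MESSAGE_ID") != message_id:
            continue
        unit = entry.get("UNIT")
        message = entry.get("MESSAGE")
        # journalctl may emit non-string values for binary or repeated
        # fields, so fall back to placeholders in that case.
        yield (
            unit if isinstance(unit, str) else "unknown",
            message if isinstance(message, str) else "",
        )


def follow_journal(message_id=UNIT_FAILED_ID):
    """Follow the live journal and print one line per matching event.

    The print call stands in for a real escalation step, such as
    submitting a passive check result to Nagios or sending an email.
    """
    cmd = [
        "journalctl", "--follow", "--lines=0", "--output=json",
        "MESSAGE_ID=" + message_id,
    ]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          universal_newlines=True) as proc:
        for unit, message in failed_units(proc.stdout, message_id):
            print("ESCALATE {}: {}".format(unit, message))
```

Calling follow_journal() from the ExecStart= script of a small .service unit would give the long-running variant. A periodic cron job could instead feed failed_units() the output of a bounded journalctl query (for example one limited with --since).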

I must admit I'd only done enough research into journald to get my syslog
stuff working again. I hadn't really thought through what else it might
offer/enable.  Now that I have, I'm starting to see nice aspects to it.  Too
bad Debian Jessie is a little bit behind on a number of its features
(coredumpctl) and those of its supporting cast (syslog-ng).

Thanks,
Brian

_______________________________________________
systemd-devel mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
