Yes, I agree with that.

We use Restart=on-abnormal in our systemd unit for unbound. If unbound hits a runtime error it cannot recover from, this makes sure a new instance is started, hopefully restoring it to a working state. On the other hand, if I just make a typo in the configuration file, that is a final failure too, but not the kind of issue a restart is likely to fix. That is why I do not like Restart=on-failure for services: it would also restart after such a non-zero exit code, only to fail the same way again.

It would indeed be great if errors like the one described could lead to fatal_abort(). Of course, the root cause should be fixed as well. But the described error is one where a restart should fix the issue. On Fedora, at least, services commonly do not create core dumps by default, although that can be changed. Even without a core dump, ending with a different termination status when a runtime error happens is useful: abort() terminates the process with SIGABRT instead of an exit code, and systemd handles that differently.

Cheers,
Petr

On 15. 10. 24 5:33, David Pfitzner via Unbound-users wrote:
Recently, I have had cases where unbound (v1.20.0) occasionally exits with
a log message like:

   fatal error: event_dispatch returned error -1, errno is Bad file descriptor

If that is a known issue I would be interested to hear, but that is not
actually my main point, which is: In this case it is not clear (at least to
me) what the detailed cause of the fatal error was, and so I think it would
be useful if unbound would generate a core file in cases like this, as that
might help to understand the problem. That is, for the fatal_exit()
function (which generates the above message) to call abort() rather than
exit(1). So my question is, would it be reasonable to modify fatal_exit()
to do that?

I could imagine possibly not, because fatal_exit() may be called in a lot
of cases, including, for example, bad configuration, and in many of those
cases the cause of the error may be immediately obvious, so a core file
could be considered superfluous.

Or, would such a feature be more palatable if it was enabled by some sort
of global config option or command-line parameter etc?

Alternatively, one could change just the code (in comm_base_dispatch())
which calls fatal_exit() with the above message, so that it dumps core
instead of calling fatal_exit(). But then I would worry that other calls to
fatal_exit() may have a similar problem in future.

Or, maybe there should be two functions, eg fatal_exit() and fatal_abort(),
and cases where the cause could be unclear could use the latter one.

Any thoughts?

I have applied a local change to make fatal_exit() dump core for me, but
was wondering whether something like that could be applied upstream.

Regards,
David Pfitzner

--
Petr Menšík
Software Engineer, RHEL
Red Hat, http://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB