Re: Unbound exiting on stats write failure?
>> one of our unbound hosts recently exited, and before it did, it
>> logged this:
>>
>> Sep 19 14:25:56 xxx unbound: [96:4] error: tube msg write failed:
>> Resource temporarily unavailable
>> Sep 19 14:25:56 xxx unbound: [96:4] fatal error: could not write stat
>> values over cmd channel
>
> The error is on a pipe between unbound processes (threads). It should
> not be out of resources (it might block of course, waiting for them, and
> blocking pipes are not a problem for unbound, but this error is like a
> pipe randomly breaks up).

This turned out to be caused by our running too old a version of
unbound, namely 1.5.4. I've since upgraded to 1.5.9, so this exact
problem should not happen again for us. Somewhere between those two
versions, tube_write_msg() grew a test for EAGAIN (causing a retry)
in the non-blocking case.

Regards,

- Håvard
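For readers who have not looked at this kind of code: "Resource temporarily unavailable" is the message text for EAGAIN, which a non-blocking write() returns when the pipe is full. The sketch below shows the general retry-on-EAGAIN pattern described above. It is not Unbound's actual tube_write_msg(); the function name write_all_retry() and the timeout_ms parameter are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, NOT the Unbound implementation.
 * Writes all len bytes to fd, which may be in non-blocking mode.
 * On EAGAIN/EWOULDBLOCK it waits (up to timeout_ms per wait) until the
 * descriptor is writable again and retries instead of giving up.
 * Returns 0 on success, -1 on timeout or any other error.
 */
static int write_all_retry(int fd, const void *buf, size_t len, int timeout_ms)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);

    while (done < len) {
        ssize_t r = write(fd, p + done, len - done);
        if (r >= 0) {
            done += (size_t)r;      /* partial writes are fine, keep going */
            continue;
        }
        if (errno == EINTR)
            continue;               /* interrupted by a signal, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            /* Pipe is full: wait for room rather than treating it as fatal. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
            int pr = poll(&pfd, 1, timeout_ms);
            if (pr < 0 && errno == EINTR)
                continue;
            if (pr <= 0)
                return -1;          /* timed out or poll() failed */
            continue;               /* writable (or error flagged): try write() again */
        }
        return -1;                  /* real error, e.g. EPIPE or EBADF */
    }
    return 0;
}
```

Without the EAGAIN branch, a momentarily full pipe is indistinguishable from a broken one, which matches the symptom in the log above. Whether the caller should then treat a -1 as fatal is a separate design decision.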
Re: Unbound exiting on stats write failure?
> The error is on a pipe between unbound processes (threads). It should
> not be out of resources (it might block of course, waiting for them, and
> blocking pipes are not a problem for unbound, but this error is like a
> pipe randomly breaks up).

Hm.

> Are you on OpenBSD? Perhaps upgrade the kernel?

Nope, on NetBSD 7.0.

Regards,

- Håvard
Re: Unbound exiting on stats write failure?
Hi Havard,

The error is on a pipe between unbound processes (threads). It should
not be out of resources (it might block of course, waiting for them, and
blocking pipes are not a problem for unbound, but this error is like a
pipe randomly breaks up).

Are you on OpenBSD? Perhaps upgrade the kernel?

Best regards,
Wouter

On 20/09/16 09:47, Havard Eidnes via Unbound-users wrote:
> Hi,
>
> one of our unbound hosts recently exited, and before it did, it
> logged this:
>
> Sep 19 14:25:56 xxx unbound: [96:4] error: tube msg write failed:
> Resource temporarily unavailable
> Sep 19 14:25:56 xxx unbound: [96:4] fatal error: could not write stat
> values over cmd channel
>
> Now, we're periodically polling stats via "unbound-control stats" and
> feeding this into collectd, and our collectd hasn't exactly been fully
> stable. However, is there a good reason the failure to write the
> stats values is considered a fatal error? One would have thought that
> it would not be, and that abandoning the output channel would be a
> reasonable error recovery mechanism, allowing the main task of unbound
> to proceed uninterrupted?
>
> Regards,
>
> - Håvard