On Wed, Apr 6, 2011 at 2:06 AM, W.C.A. Wijngaards <[email protected]> wrote:
> > When this issue happens, I can't communicate with unbound via
> > unbound-control and it will never resolve anything. I can cleanly shut
> > it down and start a new instance and it will behave exactly the same.
> > The only solution I've found is to restart the VPS. I have another VPS
> > from the same provider which is setup almost identically and it has
> > never had this issue.
>
> So, it is somehow unique to that machine. Can you see in 'top' what
> unbound is doing? (is it using cpu, 100% in a busy loop?, it is not
> responding to unbound-control, so it must be completely hosed somehow)

Sorry, I meant to include that in my original email. It does not appear to be in a busy loop; top shows 0% CPU usage for unbound.

> netstat -su may be interesting (packet counters for UDP).

Okay, I'll remember to take a look and see if packets are sitting unread.

> Another thing you can do is use 'gcore' to make a coredump of the
> 'failed' unbound process. (and then kill it and start a new unbound for
> your production). Then you can use 'gdb' and your compiled unbound
> executable to read the core image and produce a stack backtrace what it
> is doing.

I'm not familiar with "gcore". Can I just configure ulimit to allow core dumps and then send the ABRT signal? I'll make sure I install the debug libraries so I get something useful there. The weird thing is that restarting unbound won't fix it. I really have to restart the machine (so it's likely something else is really broken).

> Well it should respond to the unbound-control utility. If it does not
> this means it is somehow no longer processing the main loop, or that
> network traffic does not reach it.

Interesting, all the requests should be done over localhost. My resolv.conf only contains the line "nameserver 127.0.0.1", and doing "dig @localhost foo.com" also fails. I can check the routing table and do the obvious pings and see if those at least work.
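
For the localhost check, something like the following might help tell "nothing is listening" apart from "listening but never answering". It is only a rough sketch, assuming unbound is listening on UDP 127.0.0.1 port 53; the names build_query and probe are my own and not part of unbound or dig. It sends one hand-built A query and waits for any reply.

```python
import socket
import struct


def build_query(name, qid=0x1234):
    """Build a minimal DNS query packet for an A record (recursion desired)."""
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def probe(server="127.0.0.1", port=53, name="foo.com", timeout=3.0):
    """Send one query over UDP and report what came back.

    Returns "answered", "timeout", "mismatch", or "error: <reason>".
    """
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            # Connecting the UDP socket lets the kernel hand back an ICMP
            # "port unreachable" as an error (typically ECONNREFUSED on
            # Linux) instead of leaving us to wait for the timeout.
            sock.connect((server, port))
            sock.send(query)
            reply = sock.recv(4096)
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            return "error: %s" % exc
    # A genuine reply echoes the 16-bit query ID in its first two bytes.
    if reply[:2] != query[:2]:
        return "mismatch"
    return "answered"
```

Calling probe() with the defaults and getting "timeout" would suggest the socket is open but the process is not reading from it, whereas an "error: ..." result would point at the port not being bound at all.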

I did run strace last time this happened, but I wasn't really sure what to look for; I was really just checking that it was doing something and not just hanging. Next time I'll capture the output and try and take a better look.

If it matters, this is on an amd64 Debian GNU/Linux Squeeze (6.0) system.

Thanks for the tips,
--Will

_______________________________________________ Unbound-users mailing list [email protected] http://unbound.nlnetlabs.nl/mailman/listinfo/unbound-users
