By any chance, does it stop around 4-4:30 AM, when the backups are being
done? I am starting to see zenperfsnmp croak around this time. The I/O wait for
the system goes through the roof and zenperfsnmp just doesn't recover from the
problem devices. It then just stops in the middle of a cycle with no errors.
A strace on the pid reports this...
recvmsg(66,
66 in this case seems to be the file descriptor of a UDP socket for a specific machine.
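The first argument to recvmsg is a file descriptor, so one way to check what 66 refers to is to read the process's entry under /proc. Below is a minimal sketch, assuming a Linux system with /proc mounted; the helper name describe_fd is hypothetical and is not part of Zenoss.

```python
import os

def describe_fd(pid, fd):
    """Return what /proc says a descriptor of a process points at.

    Assumes Linux with /proc mounted. For a socket the result looks like
    'socket:[<inode>]'; the inode can then be looked up in /proc/net/udp
    (or with lsof) to find the local and remote endpoints.
    """
    return os.readlink("/proc/%d/fd/%d" % (pid, fd))
```

For the hung collector this would be called with the zenperfsnmp pid and 66. Reading another user's process normally requires running as that user or as root.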
When trying to restart normally it reports this...
recvmsg(66, 0x7fff1b2cefa0, 0) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigaction(SIGTERM, {0x3eacebbab0, [], SA_RESTORER, 0x317d20dd40},
{0x3eacebbab0, [], SA_RESTORER, 0x317d20dd40}, 8) = 0
rt_sigreturn(0xf) = -1 EINTR (Interrupted system call)
recvmsg(66, 0x7fff1b2cefa0, 0) = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigaction(SIGTERM, {0x3eacebbab0, [], SA_RESTORER, 0x317d20dd40},
{0x3eacebbab0, [], SA_RESTORER, 0x317d20dd40}, 8) = 0
rt_sigreturn(0xf) = -1 EINTR (Interrupted system call)
recvmsg(66, <unfinished ...>
An lsof shows around 71 UDP ports open by the process.
It is hung in this state. When I killed it with -USR1 (I was trying different
signals before resorting to -9), it died. After restarting via the GUI, some
machines are not being collected, as reported in the "Recovery Issues" thread:
http://community.zenoss.com/forums/viewtopic.php?t=3920
When I started from the command line, it started up just fine.
zenperfsnmp needs to be able to recover from this.
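The strace above shows the process sitting in a recvmsg call that never returns. As a general illustration of how a collector can avoid blocking forever on one unresponsive device, here is a minimal sketch of a UDP receive with a timeout. It is not zenperfsnmp's actual code; the function name recv_with_timeout and the default values are assumptions made for the example.

```python
import socket

def recv_with_timeout(sock, timeout_s=5.0, bufsize=65535):
    """Wait up to timeout_s seconds for one datagram on a UDP socket.

    Returns (data, sender_address), or None if nothing arrived in time,
    so the caller can mark the device as unresponsive and move on to the
    next one instead of hanging.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recvfrom(bufsize)
    except socket.timeout:
        return None
```

A caller would treat a None result as a failed poll for that device and continue with the rest of the cycle.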