One of the processes will not die. (kill -9 <pid>, killall -9 <pid>, and killall -9 zenperfsnmp all do nothing.)

I had to reboot the system yesterday to try & get the rogue processes to die. It looks like I might have to do the same today. Any suggestions on trying to kill the last process?

zenoss 2015 0.0 0.4 28140 2284 ? D Aug16 0:02 /usr/local/zenoss/bin/python /usr/local/zenoss/Products/ZenRRD/zenperfsnmp.py --configfile /usr/local/zenoss/etc/zenperfsnmp.conf --cycle --daemon
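
The D in the STAT column above means uninterruptible sleep. A process in that state does not act on signals, SIGKILL included, until the kernel call it is blocked in returns, which would be consistent with kill -9 appearing to do nothing. What PID 2015 is actually waiting on cannot be told from this output. As a rough sketch, a small filter like the one below picks such processes out of `ps auxww`-style output; the function name `d_state_pids` is made up for illustration, and it assumes STAT is the eighth column, as it is in the listing above.

```shell
# Hypothetical helper: read `ps aux`-style lines on stdin and print the PID
# (column 2) of every line whose STAT field (column 8) starts with D,
# i.e. uninterruptible sleep.
d_state_pids() {
    awk '$8 ~ /^D/ { print $2 }'
}

# Typical use (not executed here):
#   ps auxww | grep '[z]enperfsnmp' | d_state_pids
```

Writing the grep pattern as '[z]enperfsnmp' keeps the grep process itself out of the list.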

Todd M. Hebert

Eric Newton wrote:
Oh. Well, you have too many zenperfsnmp processes running. Kill them all, then run zenperfsnmp under strace:

$ strace -f -e trace=network -s 2000 zenperfsnmp run -v 10 2>send-to-eric

Stop it when you see the errors in the log file. Then restart zenperfsnmp as a daemon:

   $ zenperfsnmp start

-Eric

Todd Michael Hebert wrote:

This is all that ended up in the file:

Process 2026 attached - interrupt to quit
Process 2026 detached


Zenoss status says the processes are running.

This is what I get from the ps auxww command:

zenoss 2027 0.0 0.4 28140 2312 ? S Aug16 0:00 /usr/local/zenoss/bin/python /usr/local/zenoss/Products/ZenRRD/zenperfsnmp.py --configfile /usr/local/zenoss/etc/zenperfsnmp.conf --cycle --daemon
zenoss 2028 0.0 0.4 28140 2312 ? S Aug16 0:00 /usr/local/zenoss/bin/python /usr/local/zenoss/Products/ZenRRD/zenperfsnmp.py --configfile /usr/local/zenoss/etc/zenperfsnmp.conf --cycle --daemon
zenoss 2015 0.0 0.4 28140 2312 ? D Aug16 0:02 /usr/local/zenoss/bin/python /usr/local/zenoss/Products/ZenRRD/zenperfsnmp.py --configfile /usr/local/zenoss/etc/zenperfsnmp.conf --cycle --daemon
zenoss 2026 0.0 0.4 28140 2312 ? S Aug16 0:00 /usr/local/zenoss/bin/python /usr/local/zenoss/Products/ZenRRD/zenperfsnmp.py --configfile /usr/local/zenoss/etc/zenperfsnmp.conf --cycle --daemon
root 8680 0.0 0.1 1828 584 pts/0 S+ 15:13 0:00 grep zenperfsnmp

Todd M. Hebert

Eric Newton wrote:

Oops... sorry for my last content-free reply.

For some reason, zenperfsnmp isn't getting through all the devices. This will cause heartbeat failures, and the log messages below.

Can you do this for me:

   $ ps auxww | grep zenperfsnmp

Note the process id

$ strace -e trace=network -s 2000 -p [process id from above] 2>file-to-send-to-eric

Interrupt it after a minute.

Zip the file and send it to me. This will let me see what packets are going out and coming back.

This will include things like IP addresses and community strings, so send it off list. I'm specifically looking for sendto/recvfrom on the IP addresses that are not collecting. We should see several attempts to talk to those devices in one minute. I will check the integrity of the packets and verify that we are decoding them properly.
-Eric
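
For anyone who wants a first look at the capture before sending it, the sendto/recvfrom check described above can be roughed out as a per-address tally. This is only a sketch: `snmp_traffic_by_ip` is a made-up name, and it assumes strace prints the peer as `sin_addr=inet_addr("a.b.c.d")` and, when run with -f, prefixes lines with `[pid N]`. The exact line format can differ between strace versions.

```shell
# Hypothetical helper: read strace output on stdin and count sendto and
# recvfrom calls per peer IPv4 address. Prints one line per address in the
# form "ADDRESS sendto=N recvfrom=M", sorted by address text.
snmp_traffic_by_ip() {
    awk '
    {
        line = $0
        # Drop the "[pid N] " prefix that strace -f adds, if present.
        sub(/^\[pid +[0-9]+\] +/, "", line)
        if (line ~ /^sendto\(/)        call = "s"
        else if (line ~ /^recvfrom\(/) call = "r"
        else next
        # Pull the address out of inet_addr("a.b.c.d").
        if (!match(line, /inet_addr\("[0-9.]+"\)/)) next
        ip = substr(line, RSTART + 11, RLENGTH - 13)
        seen[ip] = 1
        if (call == "s") sent[ip]++; else rcvd[ip]++
    }
    END {
        for (ip in seen)
            printf "%s sendto=%d recvfrom=%d\n", ip, sent[ip], rcvd[ip]
    }' | LC_ALL=C sort
}

# Typical use (not executed here):
#   snmp_traffic_by_ip < file-to-send-to-eric
```

An address that shows sendto calls but no recvfrom calls during the capture is a candidate for a device that is not answering, though a one-minute sample is not conclusive on its own.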

Todd Michael Hebert wrote:

These are in a rotated-out zenperfsnmp.log:

2006-08-11 16:23:02 WARNING zen.zenperfsnmp: Deleting old RRD file: /usr/local/zenoss/perf/B1/ifOutOctets.rrd

There's one of these, it would appear, for every single rrd.

I'm also getting these errors, and lots of them:

2006-08-14 18:59:29 WARNING zen.zenperfsnmp: Devices status is not clearing. Restarting.
2006-08-14 19:08:17 WARNING zen.zenperfsnmp: Devices status is not clearing. Restarting.
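
To put a number on "lots of them", the restart warnings can be counted per day. The snippet below is a minimal sketch, assuming the log holds one entry per line with the date as the first field, as in the warnings quoted above; `restarts_per_day` is a hypothetical name and the log file is whichever zenperfsnmp.log you point it at.

```shell
# Hypothetical helper: read a zenperfsnmp log on stdin and print, for each
# date, how many "Devices status is not clearing" warnings were logged.
# Output lines have the form "YYYY-MM-DD COUNT", sorted by date.
restarts_per_day() {
    awk '/WARNING zen\.zenperfsnmp: Devices status is not clearing/ { n[$1]++ }
         END { for (d in n) print d, n[d] }' | LC_ALL=C sort
}

# Typical use (not executed here):
#   restarts_per_day < zenperfsnmp.log
```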

Right now I also have 2-3 devices with events that just won't clear, but everything else seems to be monitoring normally.

I'm getting very frustrated, and I really don't want to take the whole Zenoss install down & start over. I'm monitoring 152 devices at this point... and the system has really been great for diagnosing network problems and knowing when anything goes down.

Todd M. Hebert

_______________________________________________
zenoss-users mailing list
[email protected]
http://lists.zenoss.org/mailman/listinfo/zenoss-users



