The biggest performance drain I saw in our Zenoss installation was devices that were timing out. Normally zenperfsnmp does a series of SNMP GETs for the OIDs specified in the templates bound to a device, and that's a very quick process. If zenperfsnmp (or any of the other polling processes) can't connect to the device, it has to wait for a timeout, then try again, then wait for another timeout. I think most device classes default to an SNMP timeout of 2.5 seconds and 2 retries.
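To put a rough number on that, here is a minimal sketch of the worst-case wait for a device that never answers. The function name is made up for illustration, and it assumes "retries" means extra attempts after the first one; if your setting counts total tries instead, the figure is lower. It also adds the waits up serially, which overstates the cost if the poller has several requests in flight at once.

```python
def worst_case_wait(timeout_s, retries, requests=1):
    """Seconds a poller can spend waiting on a device that never answers.

    Hypothetical helper, not part of Zenoss.
    """
    # Assumption: "retries" are extra attempts made after the first one fails.
    attempts = 1 + retries
    return timeout_s * attempts * requests


# Defaults quoted above: 2.5 second timeout, 2 retries.
print(worst_case_wait(2.5, 2))      # 7.5 seconds for a single request
print(worst_case_wait(2.5, 2, 50))  # 375.0 seconds if 50 requests each time out in turn
```

Even a handful of dead devices adds up quickly at those defaults, which is why fixing or trimming them pays off.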
The first thing we did was identify and correct any devices that had SNMP configured incorrectly, both within Zenoss and in the devices' snmpd configuration files. We found devices that only supported SNMP v1 but were configured with v2c, and others with incorrect rocommunity strings.

Next we found devices that had a few OIDs that couldn't be queried correctly. They'll show up in Zenoss as a grey event stating something like:

> Error reading value for "whatever" on devicename (oid .1.3.6.1.4.1.9.2.1.58.0 is bad)

More often than not, these were old Debian Linux servers running old versions of snmpd that didn't support some of the newer OIDs being queried. We upgraded snmpd on the servers that would support it. On the others, we created a custom template and removed the OID queries that wouldn't work for the device. If you have a lot of the same kind of device, you can create a custom class, copy and then modify the template in question, and move those devices to that class.

Lastly, we lowered the SNMP timeout and/or the number of retries on the devices that weren't critical to us. We have a number of blades running in a cluster that do random jobs. Sometimes they get overloaded, and we don't really care if we miss CPU or memory stats on them for a cycle or two.

As mentioned above, there comes a point where your hardware can't really do any more with only a single collector and it needs to be distributed. The nice thing is that Zenoss kicks off its daemons as separate processes, so you can take advantage of more than one CPU. However, I've hit a wall in that zenhub is constantly pegging one of my CPUs at 100%, and the rest of the processes depend on zenhub, so that's my biggest bottleneck. One of these days I'll get around to troubleshooting it.
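For the first step (finding devices configured with the wrong SNMP version or community string), something like the following can be run from the collector. It is only a sketch: it assumes the net-snmp `snmpget` command is installed and on the PATH, the function names are made up for illustration, and the host and community values are placeholders. It asks for sysUpTime.0 with a short timeout and no retries, trying v2c first and then v1.

```python
import subprocess

# sysUpTime.0, which nearly any SNMP agent will answer.
SYS_UPTIME_OID = ".1.3.6.1.2.1.1.3.0"


def snmpget_command(host, community, version, timeout_s=1, retries=0,
                    oid=SYS_UPTIME_OID):
    """Build the net-snmp snmpget argument list (hypothetical helper)."""
    return ["snmpget", "-v", version, "-c", community,
            "-t", str(timeout_s), "-r", str(retries), host, oid]


def probe(host, community, versions=("2c", "1")):
    """Return the first SNMP version the device answers with, or None."""
    for version in versions:
        result = subprocess.run(snmpget_command(host, community, version),
                                capture_output=True, text=True)
        # snmpget exits non-zero on a timeout or an error response.
        if result.returncode == 0:
            return version
    return None


if __name__ == "__main__":
    # Placeholder host and community string; substitute your own.
    print(probe("192.0.2.10", "public"))
```

A device that answers only to "1" while Zenoss has it set to v2c is exactly the kind of mismatch described above.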
