The biggest performance drain I saw in our Zenoss installation was devices 
that were timing out.  Normally zenperfsnmp does a series of SNMP gets for 
the OIDs specified in the templates bound to a device, which is a very quick 
process.  If zenperfsnmp (or any of the other polling processes) can't connect 
to the device, it has to wait for a timeout, then it tries to connect again 
and has to wait for another timeout.  I think most device classes default to 
an SNMP timeout of 2.5 seconds and 2 retries.  
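
To put rough numbers on that, here is a minimal sketch of the arithmetic.  The 
function names are my own, and it assumes the waits simply add up with no 
backoff and no overlap between devices, so treat the result as an upper bound 
rather than what zenperfsnmp actually spends.  I'm also not sure whether 
"2 retries" means two or three attempts in total, so both are shown.

```python
def worst_case_wait(timeout_s, attempts):
    """Seconds spent waiting on one device that never answers.

    Assumes every attempt waits the full timeout (no backoff).
    """
    return timeout_s * attempts


def cycle_cost(dead_devices, timeout_s, attempts):
    """Total wait per polling cycle if the waits were purely serial."""
    return dead_devices * worst_case_wait(timeout_s, attempts)


if __name__ == "__main__":
    # 2.5 second timeout, counted as two attempts and as three attempts
    print(worst_case_wait(2.5, 2))
    print(worst_case_wait(2.5, 3))
    # 40 unreachable devices (a made-up number) at two attempts each
    print(cycle_cost(40, 2.5, 2))
```

Even a few seconds per device adds up quickly once a few dozen devices are 
unreachable.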

The first thing we did was identify and correct any devices that had SNMP 
configured incorrectly (both within Zenoss and in the devices' snmpd 
configuration files).  We found devices that only supported SNMP v1 but were 
configured for v2c, and others with incorrect rocommunity strings.  
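
One way to check a device from the collector is to try a single snmpget with 
each version and see which one answers.  This is only a sketch, not what we 
ran: it assumes the net-snmp `snmpget` command is installed on the collector, 
the helper names are made up, and the host and community string are 
placeholders.  It queries sysUpTime (.1.3.6.1.2.1.1.3.0), which pretty much 
any agent should answer.

```python
import subprocess

SYS_UPTIME_OID = ".1.3.6.1.2.1.1.3.0"  # sysUpTime.0


def build_snmpget_cmd(host, community, version, timeout_s=1, retries=0):
    """Build a net-snmp snmpget command line as an argument list."""
    return [
        "snmpget",
        "-v", version,         # "1" or "2c"
        "-c", community,       # read-only community string
        "-t", str(timeout_s),  # per-attempt timeout in seconds
        "-r", str(retries),    # number of retries
        host,
        SYS_UPTIME_OID,
    ]


def working_version(host, community, versions=("2c", "1")):
    """Return the first SNMP version that gets an answer, or None.

    Raises FileNotFoundError if snmpget is not installed.
    """
    for version in versions:
        cmd = build_snmpget_cmd(host, community, version)
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return version
    return None


if __name__ == "__main__":
    # Placeholder host and community; replace with your own.
    print(working_version("192.0.2.10", "public"))
```

If this returns "1" for a device that Zenoss has set to v2c, or None with the 
community string you expected to work, that device is a candidate for the 
fixes above.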

Next we found devices that had a few OIDs that couldn't be queried correctly.  
They show up in Zenoss as a grey event stating something like:


> Error reading value for "whatever" on devicename (oid .1.3.6.1.4.1.9.2.1.58.0 
> is bad


More often than not, these were old Debian Linux servers running old versions 
of snmpd that didn't support some of the newer OIDs being queried.  We upgraded 
snmpd on the servers that would support it.  On the others, we created a custom 
template and removed the OID queries that wouldn't work for the device.  If you 
have a lot of similar devices, you can create a custom class, copy and then 
modify the template in question, and move those devices to that class.
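
If you have a lot of these events, it helps to group the bad OIDs by device 
before deciding which devices need a custom template.  Here is a rough sketch 
that does that from the event text.  It assumes the messages look like the 
example quoted above (the exact wording may differ between versions) and that 
you have already exported the event summaries into a list of strings; the 
function name is made up.

```python
import re
from collections import defaultdict

# Matches messages shaped like the example event quoted above.
BAD_OID_RE = re.compile(
    r'Error reading value for "(?P<name>[^"]*)" on (?P<device>\S+) '
    r'\(oid (?P<oid>[.\d]+)\s+is bad'
)


def bad_oids_by_device(messages):
    """Map each device name to the sorted list of OIDs reported as bad."""
    found = defaultdict(set)
    for msg in messages:
        match = BAD_OID_RE.search(msg)
        if match:
            found[match.group("device")].add(match.group("oid"))
    return {device: sorted(oids) for device, oids in found.items()}


if __name__ == "__main__":
    events = [
        'Error reading value for "whatever" on devicename '
        '(oid .1.3.6.1.4.1.9.2.1.58.0 is bad',
        "some unrelated event",
    ]
    for device, oids in bad_oids_by_device(events).items():
        print(device, oids)
```

Devices that share the same set of bad OIDs are good candidates for the same 
custom class and template.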

Lastly, we lowered the SNMP timeout and/or the number of retries on the 
devices that weren't critical to us.  We have a number of blades running in a 
cluster that do random jobs.  Sometimes they get overloaded, and we don't 
really care if we miss CPU or memory stats on them for a cycle or two.

As mentioned above, there comes a point where your hardware can't really do any 
more with only a single collector and it needs to be distributed.  The nice 
thing is that the zen daemons run as separate processes, so you can take 
advantage of more than one CPU.  However, I've hit a wall in that zenhub is 
constantly pegging one of my CPUs at 100%, and the rest of the processes use 
zenhub, so that's my biggest bottleneck.  One of these days, I'll get around to 
troubleshooting it.



