Thanks for sharing your experiences. I'm currently deploying a monitoring
cluster with two 8-way Intel systems with 8 GB of RAM to handle the RRDs and
web interface, and six 4-way Opterons to handle MySQL and monitoring tasks.
From your specs I assume your RRDs are on disk rather than in RAM. If so, that
may well be one serious cause of the inability to scale.

After running Ganglia and Cacti for years, I'm familiar with the serious
scalability demands of large RRD deployments. I'm actually evaluating a 64 GB
TMS solid-state disk for storing the RRDs, to allow full HA for this cluster
(so the RRDs don't go dead when a node dies).

How frequent/chatty are your traps? I'm intending to do full IF-MIB polling of
a few hundred routers/switches, plus process and other polling of several
thousand Linux/Windows hosts. I expect this to be as difficult as, or more
difficult than, the trap load from 16k network devices.

Cheers,


/eli



-----Original Message-----
From: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
To: [email protected] <[email protected]>
Sent: Wed Jul 25 11:58:13 2007
Subject: [zenoss-users] Re: Monitored processes


tringer wrote:
> I agree completely. We are at the point where we would be ready to move from 
> nagios/cacti/custom over to ZenOSS, but I cannot yet trust it completely. 
> 
> I have heard that ZenOSS is a "resource hog", but I am not sure what that 
> means yet. It doesn't seem as if the problems you and I are seeing are 
> related to resources. 
> 
> Has anyone else had these types of problems?


I have found out the hard way that Zenoss is only a resource hog when you are
monitoring A LOT of devices. Watching it in htop, it looks like it eats up
resources because it cannot write to MySQL fast enough. I had 16000 devices
sending SNMP traps to it, and that's when I saw the issue: when it is pushed
this hard, you can see its limitations. When I was using it for only 300+
devices it was very good, but it needed to be watched closely, since there are
no watchdogs on the daemons and they get grumpy when busy.

This was on a dual-processor, quad-core 3 GHz box with 4 GB of RAM.

For small to mid-size installs with fewer than 1000 devices on a quiet
network, it will work really well from what I can see, and most of the bugs can
be worked around. If you have a noisy network and a lot of devices, I don't
believe Zenoss is ready for it yet. It does not appear able to handle the load
on a single machine well enough, and I cannot find any real documentation on
whether it is possible to split event monitoring out to multiple installs and
still have an integrated front end for our NOC to use.

It looks like Zenoss is going to have to support a heavy SQL backend, as well
as process monitoring/restarting of its own daemons, before it can really be
considered enterprise-ready. I am really looking forward to seeing what the
next major update will have.

Currently I am running two 2.0.2 instances and one 2.0.3 instance side by side
for performance management only, and they are working quite well for that,
apart from the zenperfsnmp processes dying every once in a while. This happens
on all three boxes: two with identical hardware and OS, and a third that is
only a dual-processor box with 1 GB of RAM. They are currently watching ~500
devices, and I am going to load up the "light" box with all 16000 and see how
the performance daemon behaves.

Grin, this was a long response to your question, but hopefully there is some
useful info in here...

------------------------
 Christopher Hubbard




-------------------- m2f --------------------

Read this topic online here:
http://community.zenoss.com/forums/viewtopic.php?p=9140#9140

-------------------- m2f --------------------



_______________________________________________
zenoss-users mailing list
[email protected]
http://lists.zenoss.org/mailman/listinfo/zenoss-users
