Re: [zenoss-users] Zenoss evaluation

Todd M. Hebert Thu, 01 Mar 2007 02:35:11 -0800

See comments interspersed below:

lionelt wrote:

Hello,


I am discovering Zenoss today, and I have some questions that the website could 
not answer.

Congratulations on the first part, and you'll probably get good feedback on the second, hopefully from me and probably from others too. I've been using Zenoss here since the middle of last year, and it's really very powerful.

First of all, as an introduction : the network to monitor is a quite large 
network (200 servers and growing, 10 to 30 metrics to check on each -we'll have 
to develop specific agents to check some metrics-) organized in comprehensive 
clusters, themselves made of sub-clusters, and so on... to final hosts (which 
are usually Xen virtual machines).

I'm approaching 200 devices on a REALLY weak old server (we're replacing it this week with something newer & faster).

1/ Can a Zenoss probe handle flaps? As an example: the CPU load of a system 
is constant around 4, it rapidly rises to 15 and goes back to 4. Will Zenoss 
generate an error immediately, send an SMS and wake me up at 4 a.m.? Or will 
Zenoss probe the CPU load again to check whether it stays high before treating 
it as a problem?
Flaps may also be due to problems in getting the probe value at a precise 
instant, which may go back to normal a minute later...

Zenoss by default now takes readings every 5 minutes (and pings every minute). If the CPU load average doesn't go crazy for more than 5 minutes, I doubt you'd see any problems with alerts. If there are times of day when you expect those kinds of loads, you could schedule maintenance periods.
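
To illustrate the sampling point, here is a minimal Python sketch. It is not Zenoss code: the function names, the load curve, and the threshold of 10 are all hypothetical. It only shows why a spike shorter than the polling interval can fall between two readings.

```python
def sample_breaches(load_at, interval_s, duration_s, threshold):
    """Return the poll times (in seconds) at which the reading exceeds the threshold."""
    return [t for t in range(0, duration_s + 1, interval_s) if load_at(t) > threshold]


def spiky_load(t):
    # Hypothetical load curve: steady at 4, spiking to 15 for two minutes.
    return 15.0 if 400 <= t < 520 else 4.0


# Poll every 300 seconds (5 minutes) for half an hour.
breaches = sample_breaches(spiky_load, interval_s=300, duration_s=1800, threshold=10.0)
print(breaches)
```

With these made-up numbers the polls land at 0, 300, 600 and so on, so the two-minute spike is never sampled. A spike that outlasts the interval would be caught by at least one poll.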

2/ Can Zenoss be configured to aggregate hosts and groups of hosts on a multi 
level basis. Example : I have a web server, in a group of web servers, the 
groups of web servers are in a cluster, and clusters are in a country.
I want to be able to have a very quick look at a single page to check if 
everything is OK in a country, and if a problem occurs, be able to drill-down 
exactly to where the problem is.
I read on the website that it is possible to aggregate but not sure if it is 
possible on a multi level basis.

You can organise them in whatever hierarchy you wish. There are "groups", which are completely assignable by you, and "device types", and you can drill down either way. You can create sub-groups of device types, or of groups, and you will have status pages for each type, sub-type, etc.

For example, I have a device type of routers, in which I have sub-types of edge and leased-line. I can look at the "Routers" status page & see what's going on with any routers. I can look at either sub-type and see at a glance if there's a problem with any router there. If there are problems, trouble indicators will show on the dashboard (overall health monitor for the entire monitored enterprise), on the Routers page, on the appropriate sub-type page, and then again on the page for the device that has a problem (along with which monitored service or component has the problem).
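
The roll-up idea can be sketched in a few lines of Python. This is a toy model, not how Zenoss stores its organizers: the `Node` class, the 0 to 5 severity scale, and the device names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A group, sub-group, or device in a hypothetical monitoring hierarchy."""
    name: str
    severity: int = 0  # assumed scale: 0 = OK, 5 = critical
    children: List["Node"] = field(default_factory=list)

    def rollup(self) -> int:
        # A node's status is the worst severity found anywhere beneath it.
        return max([self.severity] + [c.rollup() for c in self.children])


def problem_path(node):
    """Follow the worst child at each level and return the names visited."""
    path = [node.name]
    while node.children:
        node = max(node.children, key=lambda c: c.rollup())
        if node.rollup() == 0:
            break  # nothing wrong below this point
        path.append(node.name)
    return path


edge = Node("edge", children=[Node("edge-gw-1")])
leased = Node("leased-line", children=[Node("ll-gw-2", severity=5)])
routers = Node("Routers", children=[edge, leased])

print(routers.rollup())       # worst severity under "Routers"
print(problem_path(routers))  # drill-down from the top to the failing device
```

Here the top-level node reports the critical status of one leased-line router, and the drill-down path leads from "Routers" through "leased-line" to the failing device.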

3/ Is it possible to tell Zenoss to keep silent during the first n minutes of 
life of a system? Usually at startup, the disk usage and the CPU load will be 
high, so is it possible to tell Zenoss to ignore the problems during the first 
n minutes after system startup?

You can switch devices into "maintenance" mode on demand, so if you're doing a restart, you can switch the device into maintenance mode, and any events generated during that time will be collected & put into history, but they will not generate alerts. Switch that system back into "Production" mode when you're done with the reboot, and it will be monitored normally again.
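
As a rough model of that behaviour, the Python sketch below routes events by device state. It is only an illustration of the description above, not the Zenoss event pipeline: `route_event`, the state strings, and the event dictionaries are hypothetical.

```python
def route_event(production_state, event, history, alerts):
    """Toy model: events from a device in maintenance are kept but do not alert."""
    if production_state == "Maintenance":
        history.append(event)  # recorded for later review, nobody is paged
    else:
        alerts.append(event)   # device is in production: alert as usual


history, alerts = [], []

# During the reboot the device is in maintenance mode.
route_event("Maintenance", {"device": "web01", "summary": "high load"}, history, alerts)

# After the reboot it is switched back to production.
route_event("Production", {"device": "web01", "summary": "disk 95% full"}, history, alerts)

print(len(history), len(alerts))
```

The first event ends up only in the history list, while the second one, raised after the switch back to production, is the only one that alerts.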

4/ Is it possible to switch off an indicator? Example: a disk in a RAID array 
is broken. There is still a spare disk in the array. I scheduled its replacement 
for next week because I planned a maintenance visit at the datacenter.
But before the maintenance, I would like to switch this indicator off to keep a 
high-level view of my network and be able to quickly see whether an incident 
has occurred, without having to drill down because an alert is still present 
on this disk drive...

Yes, you can "acknowledge" a problem. It will still show in the dashboard and will still be checked until it clears, but it will not generate further alerts unless the condition is moved to history and a new "event" is created.

5/ Is it difficult to create a new probe? Or does it only consist of building 
a string containing a value that is sent back to the Zenoss server?

Some are easier than others, but I'm not using much of this, so hopefully someone will answer more fully!

Thanks in advance for all the information you will bring to me !

LT

------------------------
lionelt




-------------------- m2f --------------------

Read this topic online here:
http://community.zenoss.com/forums/viewtopic.php?p=4509#4509

-------------------- m2f --------------------



_______________________________________________
zenoss-users mailing list
[email protected]
http://lists.zenoss.org/mailman/listinfo/zenoss-users


-- Todd M. Hebert
