See comments interspersed below:

lionelt wrote:
Hello,

I am discovering Zenoss today, and I have some questions that the website could 
not answer.

Congratulations on the first part, and you'll probably get good feedback on the second: hopefully from me, and probably from others too. I've been using Zenoss here since the middle of last year, and it's really very powerful.

First of all, as an introduction : the network to monitor is a quite large 
network (200 servers and growing, 10 to 30 metrics to check on each -we'll have 
to develop specific agents to check some metrics-) organized in comprehensive 
clusters, themselves made of sub-clusters, and so on... to final hosts (which 
are usually Xen virtual machines).

I'm approaching 200 devices on a REALLY weak old server (we're replacing it this week with something newer and faster).

1/ Can a Zenoss probe handle flaps? As an example: the CPU load of a system 
is constant around 4, it rapidly rises to 15 and goes back to 4. Will Zenoss 
generate an error immediately, send an SMS and wake me up at 4 a.m.? Or will 
Zenoss probe the CPU load again to check if it stays high before considering it 
a problem?
Flaps may be also due to problems in getting the probe value at a precise 
instant, but may go back to normal the minute after...

By default, Zenoss now takes readings every 5 minutes (and pings every minute). If the CPU load average doesn't stay high for more than 5 minutes, I doubt you'd see any alerts. If there are times of day when you expect those kinds of loads, you could schedule maintenance periods.
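
To make the sampling point concrete, here is a rough sketch (plain Python, not Zenoss code; the function name and the threshold of 10 are made up for illustration). It assumes one load value per minute and a collector that only looks at every fifth one:

```python
# Hypothetical illustration, not Zenoss code: a collector that samples
# every `interval` minutes only sees the values at its sampling instants.

def sampled_breaches(per_minute_load, threshold, interval=5):
    """Return the minutes at which a sampled reading exceeds threshold."""
    return [
        minute
        for minute in range(0, len(per_minute_load), interval)
        if per_minute_load[minute] > threshold
    ]

# Load sits at 4, spikes to 15 during minutes 6-8, then returns to 4.
short_spike = [4] * 6 + [15] * 3 + [4] * 11
# Load stays at 15 from minute 6 through minute 17.
long_spike = [4] * 6 + [15] * 12 + [4] * 2

print(sampled_breaches(short_spike, threshold=10))  # []
print(sampled_breaches(long_spike, threshold=10))   # [10, 15]
```

Note that a short spike can still land exactly on a sampling instant, so a long polling interval makes flap alerts less likely but does not rule them out.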

2/ Can Zenoss be configured to aggregate hosts and groups of hosts on a multi 
level basis. Example : I have a web server, in a group of web servers, the 
groups of web servers are in a cluster, and clusters are in a country.
I want to be able to have a very quick look at a single page to check if 
everything is OK in a country, and if a problem occurs, be able to drill-down 
exactly to where the problem is.
I read on the website that it is possible to aggregate but not sure if it is 
possible on a multi level basis.

You can organise them in whatever hierarchy you wish. There are "groups", which are completely assignable by you, and "device types", and you can drill down either way. You can create sub-groups of device types or of groups, and you will have status pages for each type, sub-type, etc.

For example, I have a device type of routers, in which I have sub-types of edge and leased-line. I can look at the "Routers" status page and see what's going on with any router. I can look at either sub-type and see at a glance if there's a problem with any router there. If there are problems, trouble indicators will show on the dashboard (the overall health monitor for the entire monitored enterprise), on the Routers page, on the appropriate sub-type page, and then again on the page for the device that has the problem (along with which monitored service or component is affected).
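
The roll-up idea can be sketched in a few lines of Python. This is only a toy model of the behaviour described above, not the Zenoss API: the `Organizer` class, the severity names, and the device names are all invented for the example.

```python
# Hypothetical model, not the Zenoss API: each organizer holds devices and
# sub-organizers, and its status is the worst severity found beneath it.
from dataclasses import dataclass, field

SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


@dataclass
class Organizer:
    name: str
    devices: dict = field(default_factory=dict)   # device name -> severity
    children: list = field(default_factory=list)  # nested Organizer objects

    def status(self):
        """Worst severity among this organizer's devices and descendants."""
        found = list(self.devices.values())
        found += [child.status() for child in self.children]
        return max(found, key=SEVERITY.get, default="ok")


edge = Organizer("edge", devices={"edge-rtr-1": "ok", "edge-rtr-2": "critical"})
leased = Organizer("leased-line", devices={"ll-rtr-1": "ok"})
routers = Organizer("Routers", children=[edge, leased])

print(routers.status())  # critical
print(leased.status())   # ok
```

A problem on one edge router shows up at the "Routers" level and at the "edge" level, while the "leased-line" branch stays clean, which is what lets you drill down from the top.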

3/ Is it possible to tell Zenoss to keep silent during the first n minutes of 
life of a system? Usually at startup, the disk usage and the CPU load will be 
high, so is it possible to tell Zenoss to ignore the problems during the first 
n minutes after system startup ?

You can switch devices into "maintenance" mode on demand, so if you're doing a restart, you can switch the device into maintenance mode first. Any events generated during that time will be collected and put into history, but they will not generate alerts. Switch the system back into "Production" mode when you're done with the reboot, and it will be monitored normally again.
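
In rough terms the behaviour looks like the sketch below. It is a simplified model written for this reply, not Zenoss code; `handle_event` and the two lists are hypothetical names.

```python
# Hypothetical model, not Zenoss code: events are always recorded, but
# only devices in the "Production" state trigger an alert.

def handle_event(device_state, event, history, alerts):
    history.append(event)            # always kept for later review
    if device_state == "Production":
        alerts.append(event)         # would page or email someone


history, alerts = [], []
handle_event("Maintenance", "high load after reboot", history, alerts)
handle_event("Production", "disk 95% full", history, alerts)

print(len(history))  # 2
print(alerts)        # ['disk 95% full']
```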

4/ Is it possible to switch off an indicator? Example: a disk in a RAID array 
is broken. There is still a spare disk in the array. I scheduled to change it 
next week because I planned a maintenance at the datacenter.
But before the maintenance, I would like to switch this indicator off to keep a 
high level supervision of my network and be able to quickly see if an incident 
occurred without having to drill down because an alert is still present on this 
disk drive...

Yes, you can "acknowledge" a problem. It will still show in the dashboard and will still be checked until it clears, but it will not generate further alerts unless the condition is moved to history and a new "event" is created.

5/ Is it difficult to create a new probe ? Or does it only consist in building 
a string containing a value that is sent back to Zenoss server ?

Some are easier than others, but I'm not using much of this, so hopefully someone will answer more fully!
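
On the "building a string" part: one widely used convention for script-based checks is the Nagios plugin format, where the script prints a single status line followed by `|label=value;warn;crit` and exits with 0, 1 or 2 for OK, WARNING or CRITICAL. Whether your Zenoss version accepts that format for custom checks is an assumption to verify against its documentation. The sketch below only builds such a string; `format_check` and the thresholds are hypothetical.

```python
# Sketch of a Nagios-plugin-style result line. Whether Zenoss consumes this
# format in your setup is an assumption; the helper name is hypothetical.

def format_check(label, value, warn, crit):
    """Return (output_line, exit_code) following the Nagios plugin convention."""
    if value >= crit:
        status, code = "CRITICAL", 2
    elif value >= warn:
        status, code = "WARNING", 1
    else:
        status, code = "OK", 0
    line = f"{label} {status} - {value}|{label}={value};{warn};{crit}"
    return line, code


line, code = format_check("load", 4.0, warn=10, crit=15)
print(line)  # load OK - 4.0|load=4.0;10;15
print(code)  # 0
```

A real check script would print the line and then call `sys.exit(code)` so the caller can read both the text and the exit status.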

Thanks in advance for all the information you will bring to me !

LT

------------------------
lionelt




-------------------- m2f --------------------

Read this topic online here:
http://community.zenoss.com/forums/viewtopic.php?p=4509#4509

-------------------- m2f --------------------



_______________________________________________
zenoss-users mailing list
[email protected]
http://lists.zenoss.org/mailman/listinfo/zenoss-users

-- Todd M. Hebert
