See comments interspersed below:
lionelt wrote:
Hello,
I am discovering Zenoss today, and I have some questions that the website could
not answer.
Congratulations on the first part, and you'll probably get good feedback
on the second: hopefully from me, and probably from others too.
I've been using Zenoss here since the middle of last year, and it's really
very powerful.
First of all, as an introduction: the network to monitor is quite large
(200 servers and growing, with 10 to 30 metrics to check on each; we'll have
to develop specific agents to check some metrics), organized in comprehensive
clusters, themselves made of sub-clusters, and so on down to the final hosts
(which are usually Xen virtual machines).
I'm approaching 200 devices on a REALLY weak old server (we're replacing
it this week with something newer & faster.)
1/ Can a Zenoss probe handle flaps? As an example: the CPU load of a system
is constant around 4, it rapidly rises to 15 and goes back to 4. Will Zenoss
generate an error immediately, send an SMS and wake me up at 4 a.m.? Or will
Zenoss probe the CPU load again to check whether it stays high before
considering it a problem?
Flaps may also be due to problems in getting the probe value at a precise
instant, which may go back to normal the minute after...
By default, Zenoss now takes readings every 5 minutes (and pings every
minute). If the CPU load average doesn't go crazy for more than 5
minutes, I doubt you'd see any problems with alerts. If there are times
of day when you expect those types of loads, you could schedule
maintenance periods.
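If you end up writing your own agents, one common way to damp flaps on the
agent side is to require several consecutive high readings before reporting
a problem. This is only a generic sketch, not how Zenoss itself evaluates
thresholds; the function name, the threshold of 10 and the count of 2 are
all made up for illustration.

```python
def should_alert(readings, threshold, required_consecutive=2):
    """Return True only if the most recent readings ALL exceed the threshold.

    `readings` is a list of samples, oldest first. The names and default
    values here are hypothetical, chosen only for this illustration.
    """
    if len(readings) < required_consecutive:
        return False
    recent = readings[-required_consecutive:]
    return all(value > threshold for value in recent)


# A short spike that has already gone back to normal: no alert.
print(should_alert([4, 15, 4], threshold=10))   # False
# The load stays high for two samples in a row: alert.
print(should_alert([4, 15, 16], threshold=10))  # True
```

With 5-minute readings, requiring two consecutive samples means a problem
has to last roughly 5 to 10 minutes before anyone gets woken up.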
2/ Can Zenoss be configured to aggregate hosts and groups of hosts on a
multi-level basis? Example: I have a web server, in a group of web servers;
the groups of web servers are in a cluster, and clusters are in a country.
I want to be able to take a very quick look at a single page to check that
everything is OK in a country and, if a problem occurs, be able to drill down
to exactly where the problem is.
I read on the website that it is possible to aggregate, but I'm not sure
whether it is possible on a multi-level basis.
You can organise them in whatever hierarchy you wish. There are
"groups", which are completely assignable by you, and "device types", and
you can drill down either way.
You can create sub-groups of device types, or of groups, and you will
have status pages for each type, sub-type, etc.
For example, I have a device type of routers, in which I have sub-types
of edge and leased-line.
I can look at the "Routers" status page & see what's going on with any
routers. I can look at either sub-type and see at a glance if there's a
problem with any router there.
If there are problems, trouble indicators will show on the dashboard
(overall health monitor for the entire monitored enterprise), on the
Routers page, and on the appropriate sub-type page, and then again on
the page for the device that has a problem. (along with what monitored
service or component has a problem.)
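To picture how that roll-up behaves, here is a small sketch of "worst status
bubbles up the tree, then drill down to find it". It is not Zenoss code; the
dictionary layout, the host names and the numeric severity scale (0 = clear,
higher = worse) are assumptions made up for the example.

```python
def worst_severity(node):
    """Highest severity found on this node or anywhere below it."""
    own = max(node.get("events", []), default=0)
    below = [worst_severity(child) for child in node.get("children", [])]
    return max([own] + below)


def drill_down(node):
    """Follow the worst child at each level and return the names visited."""
    path = [node["name"]]
    children = node.get("children", [])
    while children:
        node = max(children, key=worst_severity)
        path.append(node["name"])
        children = node.get("children", [])
    return path


# Hypothetical hierarchy: country -> cluster -> group -> host.
france = {
    "name": "France",
    "children": [
        {"name": "cluster-1", "children": [
            {"name": "web-servers", "children": [
                {"name": "web01", "events": []},
                {"name": "web02", "events": [4]},
            ]},
        ]},
        {"name": "cluster-2", "children": [
            {"name": "db01", "events": [2]},
        ]},
    ],
}

print(worst_severity(france))  # 4
print(drill_down(france))      # ['France', 'cluster-1', 'web-servers', 'web02']
```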
3/ Is it possible to tell Zenoss to keep silent during the first n minutes of
a system's life? Usually at startup, the disk usage and the CPU load will be
high, so is it possible to tell Zenoss to ignore problems during the first
n minutes after system startup?
You can switch devices into "maintenance" mode on demand, so if you're
doing a restart, you can switch the device into maintenance mode. Any
events generated during that time will be collected & put into history,
but they will not generate alerts. Switch that system back into
"Production" mode when you're done with the reboot, and it will be
monitored normally again.
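For unplanned restarts, where nobody is around to flip the device into
maintenance mode, a custom agent could apply its own grace period based on
uptime. This is a sketch of that idea only; it is not a Zenoss feature that I
know of, and the function name and the 10-minute default are invented here.

```python
def in_startup_grace_period(uptime_seconds, grace_minutes=10):
    """True while the system has been up for less than `grace_minutes`.

    Both the name and the 10-minute default are hypothetical.
    """
    return uptime_seconds < grace_minutes * 60


def should_report(value, threshold, uptime_seconds, grace_minutes=10):
    """Report a high value only once the startup grace period is over."""
    if in_startup_grace_period(uptime_seconds, grace_minutes):
        return False
    return value > threshold


# Load of 15 three minutes after boot: stay silent.
print(should_report(15, threshold=10, uptime_seconds=180))   # False
# Same load after an hour of uptime: report it.
print(should_report(15, threshold=10, uptime_seconds=3600))  # True
```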
4/ Is it possible to switch off an indicator? Example: a disk in a RAID array
is broken. There is still a spare disk in the array. I have scheduled its
replacement for next week because I planned a maintenance visit to the
datacenter.
But before the maintenance, I would like to switch this indicator off to keep
a high-level view of my network and be able to quickly see if an incident
has occurred, without having to drill down because an alert is still present
on this disk drive...
Yes. You can "acknowledge" a problem: it will still show in the
dashboard and will still be checked until it clears, but it will not
generate further alerts unless the condition is moved to history and a
new "event" is created.
5/ Is it difficult to create a new probe? Or does it only consist of building
a string containing a value that is sent back to the Zenoss server?
Some are easier than others, but I'm not using much of this, so
hopefully someone will answer more fully!
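As a rough idea of the "build a string" approach: my understanding is that
command-style checks can return Nagios-plugin-style output, i.e. a status
text, a "|" separator, then label=value pairs, plus an exit code of 0, 1 or 2
for OK, warning or critical. Please treat that as an assumption to check
against the Zenoss documentation. The function name, label and thresholds
below are invented for the example.

```python
def format_probe_output(label, value, warn, crit):
    """Build a Nagios-plugin-style line and the matching exit code.

    Returns (line, exit_code). Everything here is illustrative: the
    label and the warn/crit thresholds are hypothetical.
    """
    if value >= crit:
        status, exit_code = "CRITICAL", 2
    elif value >= warn:
        status, exit_code = "WARNING", 1
    else:
        status, exit_code = "OK", 0
    line = f"{status} - {label} is {value:.2f} | {label}={value:.2f}"
    return line, exit_code


# Sample value standing in for whatever the real probe would measure.
line, code = format_probe_output("load1", 4.0, warn=8.0, crit=12.0)
print(line)  # OK - load1 is 4.00 | load1=4.00
print(code)  # 0
```

A real probe script would measure the value itself, print the line, and exit
with the returned code (for example via sys.exit).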
Thanks in advance for any information you can give me!
LT
------------------------
lionelt
-------------------- m2f --------------------
Read this topic online here:
http://community.zenoss.com/forums/viewtopic.php?p=4509#4509
-------------------- m2f --------------------
_______________________________________________
zenoss-users mailing list
[email protected]
http://lists.zenoss.org/mailman/listinfo/zenoss-users
-- Todd M. Hebert