Good morning,

I know I have written maybe a lot, but could someone help me with my questions?
Thanks

Simon

Simone Felici wrote on 19/02/2010 17:35:

Hello to all!

I've been working very happily with Nagios since 2006.
Meanwhile, our company has been adding monitored devices day by day.
Today our monitoring system is composed as follows:

1 cluster (2 Xeon nodes, 2 GB RAM each) in HA with heartbeat, so only one
node is active at a time. The other one is there in case of failure of the
first (a rough sketch of the failover config is below).
- CentOS 5.4
- Nagios 3.2.0 (Monarch as web GUI for configuration)
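Just to give an idea of how the failover is set up, it's the classic
heartbeat v1 style; something roughly like this (node name, address and
interface are only placeholders):

    # /etc/ha.d/haresources, identical on both nodes (heartbeat v1 style).
    # node1 is the preferred node; the virtual IP and the nagios init
    # script fail over together. Names and addresses are placeholders.
    node1 IPaddr::192.168.10.5/24/eth0 nagios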
It currently monitors 800 hosts and 2000 services with the following stats:
Metric                  Min.       Max.        Average
Check Execution Time:   0.00 sec   15.05 sec   2.681 sec
Check Latency:          0.00 sec   12.09 sec   0.785 sec
Percent State Change:   0.00%      12.11%      0.18%

There are some distributed installations that report some status back
to the core via NSCA.
All of this is set up manually, with only the (great) help of Monarch.
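For reference, those installations push their results back roughly like
this (server name, host and service are just placeholders):

    # On a distributed slave: submit one passive service result to the
    # central server via NSCA. Default input format is
    # host<TAB>service<TAB>return_code<TAB>plugin_output
    # (central.example.com and the host/service names are placeholders)
    printf 'webserver01\tHTTP\t0\tHTTP OK - 200 in 0.042s\n' | \
        /usr/sbin/send_nsca -H central.example.com -c /etc/nagios/send_nsca.cfg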

In the next months we need to merge in another big monitoring system that
will drive the numbers up a lot.
In the end we may have to monitor 2000 hosts and 13,000 services. This
could be a problem for my Xeon servers and I'll need new hardware. It
could be a good moment to start using a distributed solution to reduce
the load on any single machine.
The native distributed solution (NSCA) could be good but has two big
limitations:
1. the need to maintain duplicate configuration across the different Nagios
installations (see the example after this list)
2. if one distributed Nagios server goes down, all checks done by that
remote server would be considered CRITICAL.
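To illustrate point 1: for every service actively checked on a slave, the
master also needs a matching passive definition, something roughly like
this (host/service names and thresholds are only placeholders):

    # Master side: passive shadow of a service actively checked on a slave.
    # The same host/service must also be defined on the slave, hence the
    # duplicated configuration. Names and values below are placeholders.
    define service{
        use                     generic-service
        host_name               webserver01
        service_description     HTTP
        active_checks_enabled   0     ; results arrive via NSCA
        passive_checks_enabled  1
        check_freshness         1     ; flag stale results if the slave dies
        freshness_threshold     900
        check_command           check_dummy!2!"No fresh result from slave"
        }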

Googling around, the first thing I found was DNX, but officially it DOES NOT
support a distributed solution with some servers in a DMZ. That is, I have
some Nagios servers (distributed slaves) that are behind a firewall and can
reach ONLY the devices behind that firewall. The check results must be sent
(via NSCA) to the master, which sends out SMS notifications and collects
logs (for SLA), and those slaves must check ONLY the devices in that LAN.
DNX (officially, as of now) doesn't support selectively allocating devices
to be monitored by single slaves only; instead the checks are load-balanced
across all workers.

And here comes Opsview Community Edition! It could help me extend my
installation... I hope :)
But I have some questions, and I hope someone can help me:

1. Is it compatible AND stable with Nagios 3.2.0? That is, can I still
import all the configuration AND ALL LOGS into the new system?

2. How many hosts/services could I manage from the central core with this
solution? Some production scenario examples with numbers would help.

3. Is what I've understood from reading the documentation correct?
There is one master (or more, in heartbeat active/passive) that collects
all the info and sends out notifications (mail, SMS via GSM modem, or
whatever I like).
It is possible to add two different types of slave servers:
a) single-slave installations. Each single slave is handled as a
separate datacenter. This is the solution that could be used to monitor
devices not directly reachable from the master. Status is sent back to
the master with NSCA.
b) multiple slaves in a cluster. They are handled, like above, as a
separate datacenter, but the checks in this slave cluster are divided
among all the slaves for load balancing as well as high availability.
If one slave dies, the others take over the services/hosts to be monitored.

4. What happens if a slave fails? I mean a slave like in point 3a, i.e. a
slave that is not part of a cluster.

5. Can the master do active checks too, or when slaves are present does it
delegate the checks to the slaves only?

6. Can I still use our custom plugins? They are bash scripts that
perform checks based on Nagios macros (HOSTADDRESS, ...) and return, as
expected, a message and an exit code, very simple.
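Just to show what I mean, a minimal sketch of one of these plugins (the
command name and paths are only placeholders):

    #!/bin/bash
    # Minimal sketch of one of our custom plugins. Nagios passes the macro
    # value on the command line, e.g.:
    #   define command{
    #       command_name  check_my_ping
    #       command_line  /usr/local/nagios/libexec/check_my_ping $HOSTADDRESS$
    #       }
    HOST="$1"

    if ping -c 1 -W 2 "$HOST" > /dev/null 2>&1; then
        echo "OK - $HOST is reachable"
        exit 0          # OK
    else
        echo "CRITICAL - $HOST is not reachable"
        exit 2          # CRITICAL
    fi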

I know I may have asked a lot, but based on the answers I'll start some
tests.
Thanks a lot for your help!

Warmest Regards,

Simon
_______________________________________________
Opsview-users mailing list
[email protected]
http://lists.opsview.org/lists/listinfo/opsview-users