This is my 5-minute review of Zabbix (http://www.zabbix.com).

Demo* (for the impatient):
I have set up a demo showing the PLUG data that all members can look at. I
will leave the account active for a few days. Here are the details:
https://monitor.ryansimpkins.com
u:plug
p:zabbix

What I needed:
-Performance monitoring.
-Alerting.
-Support for multiple users.
-Not too intensive for 10-15 hosts.
-Remote support (monitoring hosts outside the datacenter).

I could have gone with the regular Nagios/RRDTool based solutions, but I
wanted to try something new to me. Zabbix combines performance and
availability monitoring into one system. Backed by a database (I'm using
MySQL) with a PHP front end, the thing is pretty simple. I chose to install
the latest stable release: 1.6.5.

The good:
It is pretty easy to set up. Everything makes sense, and it is clear where you
draw the line between a performance metric (items) and an alert (triggers).
Basically every actionable object is a trigger, and a trigger uses data
collected from performance metrics to determine the state. Therefore, every
item you monitor has a historical record. How long you keep history is tunable
per item.
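
To make the item/trigger split concrete, here is a minimal Python sketch. It
is not Zabbix code: the Item and Trigger classes, the "cpu_load" name, and the
threshold are all made up for illustration. It only shows the idea that an
item stores history (with per-item retention) and a trigger derives its state
from that data.

```python
from collections import deque


class Item:
    """A performance metric that keeps a bounded history of samples."""

    def __init__(self, name, history_size):
        self.name = name
        # Retention is set per item, here as a maximum number of samples.
        self.history = deque(maxlen=history_size)

    def record(self, value):
        self.history.append(value)


class Trigger:
    """An actionable state derived from an item's collected data."""

    def __init__(self, item, threshold):
        self.item = item
        self.threshold = threshold

    def state(self):
        if not self.item.history:
            return "UNKNOWN"
        latest = self.item.history[-1]
        return "PROBLEM" if latest > self.threshold else "OK"


# Hypothetical item and threshold, for illustration only.
load = Item("cpu_load", history_size=3)
high_load = Trigger(load, threshold=5.0)
for sample in (0.7, 1.2, 6.3, 8.1):
    load.record(sample)

print(list(load.history), high_load.state())
```

The oldest sample (0.7) falls out of the history once the retention limit is
reached, and the trigger only ever looks at data the item has collected.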

Another win is a web application monitoring system (uses curl, supports
cookies). It is monitoring plug.org, logging in and ensuring everything is
working. While testing it, I monitored my company's web application for a
short time. It spotted trouble that our very expensive external monitoring
services didn't catch.

Alerts are rules that do something based on trigger state (like e-mail you).
It only takes a few alert rules to cover the most common scenarios. New in
1.6.5 (and a must-have for me) are recovery messages. I have alerts tied to
Jabber and a custom script. Triggers can have dependencies to ensure you
don't get spammed with alerts if a core item breaks down. It supports
escalations and many other features.
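
The dependency and recovery behavior can be sketched roughly like this. This
is an illustration of the idea only, not how Zabbix implements it; the
function name, the trigger names, and the dict layout are all hypothetical.

```python
def alerts_to_send(states, previous, depends_on):
    """Return (trigger, message) pairs for triggers whose state changed.

    states / previous: dict of trigger name -> "OK" or "PROBLEM"
    depends_on: dict of trigger name -> the trigger it depends on
    """
    alerts = []
    for name, state in sorted(states.items()):
        if state == previous.get(name, "OK"):
            continue  # no state change, nothing to send
        parent = depends_on.get(name)
        if state == "PROBLEM" and parent and states.get(parent) == "PROBLEM":
            continue  # suppressed: what it depends on is already down
        message = "PROBLEM" if state == "PROBLEM" else "RECOVERY"
        alerts.append((name, message))
    return alerts


# Hypothetical triggers: the web check depends on the router being up.
previous = {"router_down": "OK", "web_down": "OK", "disk_full": "PROBLEM"}
states = {"router_down": "PROBLEM", "web_down": "PROBLEM", "disk_full": "OK"}
depends_on = {"web_down": "router_down"}

print(alerts_to_send(states, previous, depends_on))
```

Only the router problem and the disk recovery produce messages; the web
trigger stays quiet because its dependency is already in a problem state. The
sketch does not remember which problems were suppressed, so a real system
would need more bookkeeping than this.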

Data arrives into the system via external checks, custom scripts, SNMP, or
the Zabbix agent, which has an extensive feature list. Zabbix also supports a
clustered monitoring environment, giving it the ability to scale.

The bad:
The default template results in tons of performance metrics and triggers being
added at high rates. In the first day of use it generated 23MB of data for me.
For example, it was collecting network usage stats every 5 seconds! Cleaning
up the performance metrics brought things down to a much more reasonable
level. This took a lot of time to tweak.
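
A quick back-of-the-envelope calculation shows why the collection interval
matters so much. The helper below is just arithmetic, not anything from
Zabbix, and the 60-second interval is an example value rather than a
recommendation.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds, item_count=1):
    """Number of stored samples per day for items polled at a fixed interval."""
    return item_count * SECONDS_PER_DAY // interval_seconds


print(samples_per_day(5))   # 17280 samples per item per day
print(samples_per_day(60))  # 1440 samples per item per day
```

Moving a single item from a 5-second to a 60-second interval cuts its stored
samples by a factor of 12, and that multiplies across every item on every
host.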

The default agent is also missing a lot of useful metrics for certain
platforms. For example, there is no support for I/O metrics on Linux. You can
write custom scripts for this, and some community implementations are
available. However, you have to deploy these custom solutions to each
monitored host. If you need in-depth performance stats, expect to be adding a
lot of custom metrics.
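
As an idea of what such a custom I/O script might look like, here is a minimal
Python sketch that parses the Linux /proc/diskstats format. It assumes the
usual layout (major, minor, device name, then at least 11 counters); the
sample text is made up, and how the script gets wired into the agent is left
out because that depends on your setup.

```python
def parse_diskstats(text):
    """Map device name -> a few I/O counters from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank lines and short (older-format) entries
        name = fields[2]
        stats[name] = {
            "reads_completed": int(fields[3]),
            "sectors_read": int(fields[5]),
            "writes_completed": int(fields[7]),
            "sectors_written": int(fields[9]),
        }
    return stats


# Made-up sample in the /proc/diskstats layout, for illustration only.
SAMPLE = """\
   8       0 sda 4520 1230 98200 3100 2210 870 45600 5400 0 2900 8500
   8       1 sda1 300 10 2400 120 15 2 136 40 0 150 160
"""

disks = parse_diskstats(SAMPLE)
print(disks["sda"]["reads_completed"], disks["sda"]["sectors_written"])
```

On a real host you would feed it the contents of /proc/diskstats instead of
SAMPLE. The counters are cumulative, so a rate needs two readings and a
subtraction.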

Because everything is backed by a database, certain operations can result in
significant load on the monitoring system. Graphs are generated on-demand from
data out of the database. If you view a screen showing the last 6 months' worth
of metrics on multiple complex graphs, expect your monitoring system to get a
workout.

ACLs still have a long way to go. This is a newer feature in Zabbix. For
example, I was unable to allow a certain user permission to modify some
elements but not others. The choices are write, read, or deny based on host
groups only. Having this feature set expanded would enhance the usefulness of
the application.

The documentation is mediocre for an open source project. Quite a few features
are hard to understand or are not documented at all. The Zabbix community
forums help to fill in the gaps. Every question I could think to ask was
answered previously on the forums.

Conclusion:
Zabbix took a little longer to set up than other monitoring solutions I've
used, a symptom of unreasonable defaults that ship with the package. However,
once set up and optimized, Zabbix offers monitoring and historical data with a
minimum of pain and hassle. It is nice having everything self-contained in one
package, making deployment and management of the system much easier.

Please feel free to ask any questions.

-Ryan

* I just added the Zabbix agent to the PLUG server, so it is going to take a
while to fully populate everything.


/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/