This is my 5-minute review of Zabbix (http://www.zabbix.com).

Demo* (for the impatient): I have set up a demo showing the PLUG data that all members can look at. I will leave the account active for a few days. Here are the details:

https://monitor.ryansimpkins.com
u:plug
p:zabbix

What I needed:
-Performance monitoring.
-Alerting.
-Support for multiple users.
-Not too intensive for 10-15 hosts.
-Remote support (monitoring hosts outside the datacenter).

I could have gone with the regular Nagios/RRDTool based solutions, but I wanted to try something new to me. Zabbix combines performance and availability monitoring into one system. Backed by a database (I'm using MySQL) with a PHP front end, the thing is pretty simple. I chose to install the latest stable release: 1.6.5.

The good:

It is pretty easy to set up. Everything makes sense, and it is clear where the line is drawn between a performance metric (an item) and an alertable condition (a trigger). Basically every actionable object is a trigger, and a trigger uses the data collected by items to determine its state. Therefore, everything you alert on has a historical record. How long you keep history is tunable per item.

Another win is the web application monitoring system (uses curl, supports cookies). It is monitoring plug.org, logging in and ensuring everything is working. While testing it, I monitored my company's web application for a short time. It spotted trouble that our very expensive external monitoring services didn't catch.

Alerts are rules that do something based on trigger state (like e-mail you). It only takes a few alert rules to cover the most common scenarios. New in 1.6.5 (and a must-have for me) are recovery messages. I have alerts tied to Jabber and to a custom script. Triggers can have dependencies, so you don't get spammed with alerts if a core item breaks down. It supports escalations and many other features.

Data gets into the system via external checks, custom scripts, SNMP, or the Zabbix agent, which has an extensive feature list.

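To make the item/trigger split and the dependency behavior described above a bit more concrete, here is a minimal Python sketch. It is not Zabbix code and does not use any Zabbix API; the class names (Item, Trigger), the item keys, and the thresholds are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Item:
    """A performance metric: every collected value is kept as history."""
    key: str
    history: List[float] = field(default_factory=list)

    def record(self, value: float) -> None:
        self.history.append(value)

    def last(self) -> float:
        return self.history[-1]


@dataclass
class Trigger:
    """An alertable condition whose state is computed from item data."""
    name: str
    item: Item
    is_problem: Callable[[Item], bool]
    depends_on: List["Trigger"] = field(default_factory=list)

    def in_problem(self) -> bool:
        # No data yet means no problem state.
        return bool(self.item.history) and self.is_problem(self.item)

    def should_alert(self) -> bool:
        # Stay quiet if a trigger we depend on is already in a problem state.
        if any(dep.in_problem() for dep in self.depends_on):
            return False
        return self.in_problem()


# Hypothetical setup: a web check that depends on the router being reachable.
router_ping = Item("router.ping")
web_time = Item("web.response_time")

router_down = Trigger("Router unreachable", router_ping,
                      lambda i: i.last() == 0)
web_slow = Trigger("Web response slow", web_time,
                   lambda i: i.last() > 5.0,
                   depends_on=[router_down])

router_ping.record(1)
web_time.record(9.2)
print(web_slow.should_alert())     # True: router is fine, web is slow

router_ping.record(0)
print(router_down.should_alert())  # True: the core problem alerts
print(web_slow.should_alert())     # False: suppressed by the dependency
```

The point is only that alerts hang off triggers, triggers read item history, and a dependency keeps the downstream trigger quiet while the core one is in a problem state.
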
Zabbix also supports a clustered monitoring environment, giving it the ability to scale.

The bad:

The default template adds tons of performance metrics and triggers, collected at very high rates. In the first day of use it generated 23 MB of data for me. For example, it was collecting network usage stats every 5 seconds! Cleaning up the performance metrics brought things down to a much more reasonable level, but it took a lot of time to tweak.

The default agent is also missing a lot of useful metrics for certain platforms. For example, there is no support for I/O metrics on Linux. You can write custom scripts for this, and some community implementations are available. However, you have to deploy these custom solutions to each monitored host. If you need in-depth performance stats, expect to be adding a lot of custom metrics.

Because everything is backed by a database, certain operations can put significant load on the monitoring system. Graphs are generated on demand from data in the database. If you view a screen showing the last 6 months' worth of metrics on multiple complex graphs, expect your monitoring system to get a workout.

ACLs still have a long way to go. This is a newer feature in Zabbix. For example, I was unable to give a certain user permission to modify some elements but not others. The choices are write, read, or deny, based on host groups only. Expanding this feature set would make the application more useful.

The documentation is mediocre for an open source project. Quite a few features are hard to understand or are not documented at all. The Zabbix community forums help to fill in the gaps. Every question I could think to ask had already been answered on the forums.

Conclusion: Zabbix took a little longer to set up than other monitoring solutions I've used, a symptom of the unreasonable defaults that ship with the package.
However, once set up and optimized, Zabbix offers monitoring and historical data with a minimum of pain and hassle. It is nice having everything self-contained in one package, which makes deployment and management of the system much easier.

Please feel free to ask any questions.

-Ryan

* I just added the Zabbix agent to the PLUG server, so it is going to take a while to fully populate everything.

/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/
