On 09/28/2012 12:02 AM, Johan Olsson wrote:
> Hi
>
> I've been looking at how to monitor Varnish. I've found that there
> exists an SNMP project for Varnish which gives some info that is good
> to have (http://sourceforge.net/projects/varnishsnmp/), but it doesn't
> give all that I need (I think). What I'm missing is being able to
> monitor how much traffic one site is using. So if I have two sites,
> like www.example1.com and www.example2.com, I would like to be able to
> get how many connections each one receives and how many Mbps each one
> is using.
>
> Is this possible to do?

Hi Johan,
Maybe I can help. I've got about 35 sites running on a 4-node Varnish cluster here and monitor throughput, request rate and HTTP status codes per site using Cacti and Nagios via SNMP.

The way it works is like this: each server runs the exact same Varnish config. In this config there's a VCL chunk that defines a set of macros for gathering site info:

- STATS_NODE - Defines a new node. This generates a structure at compile time, where the statistics will be stored.
- STATS_INIT - Initializes a node. Unfortunately this gets called each time a site is accessed, but the code is only a few lines and very lightweight.
- STATS_SET_BACKEND - Defines the current backend to use. This is called each time a site is accessed.
- STATS_UPDATE - Updates the site's statistics. This is called each time a site is accessed.
- STATS_DUMP - This gets called periodically to dump the entire statistics linked list to syslog.

The flow is roughly as follows:

- Each site references the macros at certain points to generate the statistics;
- The main configuration calls STATS_DUMP periodically, which sends statistics info to syslog;
- Syslog then sends it to a dedicated FIFO;
- A script (varnish-snmp-stats-prep-backends.sh) listens on the FIFO and parses the stats;
- The parsed stats are written to a per-site text file;
- SNMPD is configured to serve the per-site stats files.

Each server also generates varnishd-specific data every 5 minutes, using a script (varnish-snmp-stats-prep-srv.sh) that calls varnishstat. The parsed output is dumped to a text file and made available to SNMPD.

One Varnish server is appointed the main statistics server. On that server a cronjob calls "varnish-snmp-summarize-backends.py" every 5 minutes, which gathers and summarizes the statistics of all 4 servers using SNMP. This data is then dumped to per-site text files again, this time containing the aggregate per-site counts.
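To make the FIFO step above a bit more concrete, here's a minimal sketch of what a listener like that could look like. Note this is an illustration, not the real varnish-snmp-stats-prep-backends.sh (which isn't public yet); the "site=... req=... bytes=..." line format is an assumption invented for the example.

```shell
# Hypothetical sketch of the FIFO-listener step. Assumed (invented)
# stats line format: "site=<name> req=<count> bytes=<count>"
process_stats_stream() {
    outdir=$1
    while IFS= read -r line; do
        # Pull the site name out of the stats line.
        site=$(printf '%s\n' "$line" | sed -n 's/.*site=\([^ ]*\).*/\1/p')
        [ -n "$site" ] || continue
        # Overwrite the per-site text file that SNMPD serves.
        printf '%s\n' "$line" > "$outdir/$site.stats"
    done
}

# In production this would read from the dedicated syslog FIFO, e.g.:
#   mkdir -p /var/lib/varnish-stats
#   process_stats_stream /var/lib/varnish-stats < /var/run/varnish-stats.fifo
```

SNMPD can then be pointed at the per-site files (e.g. via an exec/extend directive) without ever touching the FIFO itself.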
Cacti can then query this server for the combined per-site and varnishd statistics.

Another approach to generating the per-site statistics could be to pipe the varnishlog output to a script that parses this data. Though I fear this method might put quite a heavy load on the machine doing the parsing, so it may have to be offloaded to another machine. But this is not the path I chose.

Note: we're still running Varnish 2 in production. Version 3 is in test, but the conversion is trivial.

I've prepared a tarball of this setup for sharing, but I have to get permission to release this (anonymized) configuration to the public. I'll get back to you on this subject tomorrow or the day after to hopefully supply you with the entire setup (varnish, cron, support scripts, syslog). Just let me know the best way to share this on this list.

And somewhat unrelated to your question, but interesting nonetheless: another bit of VCL code dumps each request to syslog in a modified NCSA format, for debugging, traceability and such. But because the machines sometimes generate more than 2MB of log data per second per server, and I like to keep the logs for a few weeks, the logs need to be rotated fairly often to prevent gigantic files, and they need to be compressed to minimize storage requirements. There are two separate scripts to handle log rotation:

- varnish-log-rotate.sh - Checks the size of the log and rotates it if it exceeds 2GB.
- varnish-log-compress.sh - Waits for rotated logs, then compresses and archives them at idle priority to minimize CPU impact.

This allows you to store 2.5TB of logs on 250GB of storage and minimize the log compression load on the servers.

Cheers,
Johnny
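For reference, the size-check half of that rotation scheme could be sketched roughly like this. Again, this is not the real varnish-log-rotate.sh, just an illustration of the idea; the paths and the exact 2GB threshold are taken from the description above, everything else is assumption.

```shell
# Hypothetical sketch of a size-based log rotation check.
rotate_if_large() {
    log=$1
    max=$2      # threshold in bytes
    [ -f "$log" ] || return 1
    size=$(wc -c < "$log")
    if [ "$size" -gt "$max" ]; then
        # Move the log aside with a timestamp and start a fresh, empty one.
        # (The compressor script would pick up the rotated file later.)
        mv "$log" "$log.$(date +%Y%m%d%H%M%S)"
        : > "$log"
        return 0
    fi
    return 1
}

# Example cron usage, with a 2GB threshold:
#   rotate_if_large /var/log/varnish/requests.log $((2 * 1024 * 1024 * 1024))
```

After rotating, the logger writing the file (syslog in this setup) may need to be signalled to reopen its output; the compressor can then run on the timestamped files with nice/ionice at idle priority.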
_______________________________________________
varnish-misc mailing list
[email protected]
https://www.varnish-cache.org/lists/mailman/listinfo/varnish-misc
