Hi guys I've just had an issue on one of our 4 varnishservers - that I cannot explain.. The varnishd-child just dies.

I have a guess though - Might it have something to do with me (a check on my behalf) telnetting to the serviceport to check the backend.health?
Is my perl-fu not strong enough?


After the child is dead, I do not see the alarm until 10 mins later ... and startup varnishd - I have no varnishlog from the incident but as I start up varnish everything works again...

Every 5 minutes my nagios does four checks on the varnishmachines, to check health of the backend. command[check_varnish_backends]=/usr/lib/nagios/plugins/check_varnish_backend_health -all command[check_varnish_backend_euros08]=/usr/lib/nagios/plugins/check_varnish_backend_health -h euros08 command[check_varnish_backend_euros12]=/usr/lib/nagios/plugins/check_varnish_backend_health -h euros12 command[check_varnish_backend_carbon]=/usr/lib/nagios/plugins/check_varnish_backend_health -h carbon

I've inlined the .pl doing the nagioscheck and the syslog output from varnishd.

Does anyone have a clue ? Sorry in advance for the rathe long mail ..
Regards

/Eivind

# varnishd -V
varnishd (varnish-2.0.6)
Copyright (c) 2006-2009 Linpro AS / Verdens Gang A
______________________________________
SYSLOG:

May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 open from telnet 127.0.0.1:46759 127.0.0.1:6082
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 result 200 "debug.health"
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 closed
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 open from telnet 127.0.0.1:46761 127.0.0.1:6082
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 result 200 "debug.health"
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 closed
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 open from telnet 127.0.0.1:46762 127.0.0.1:6082
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 result 200 "debug.health"
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 closed
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 open from telnet 127.0.0.1:46764 127.0.0.1:6082
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 result 200 "debug.health"
May 12 10:05:54 varnish1 varnishd[19448]: CLI 7 closed
....
May 12 10:07:58 varnish1 varnishd[19448]: Child (19449) not responding to ping, killing it. May 12 10:08:00 varnish1 varnishd[19448]: Child (19449) not responding to ping, killing it.
May 12 10:08:00 varnish1 varnishd[19448]: Child (19449) died signal=3
May 12 10:08:02 varnish1 varnishd[19448]: Child cleanup complete
May 12 10:08:02 varnish1 varnishd[19448]: child (18174) Started
May 12 10:08:04 varnish1 kernel: [6214551.451581] Inbound IN=eth0 OUT= MAC=00:14:22:1e:ba:b8:00:0e:84:3b:ec:c0:08:00 SRC=66.151.125.8 DST=130.225.244.83 LEN=32 TOS=0x00 PREC=0x00 TTL=1 ID=1802 PROTO=UDP SPT=11057 DPT=33442 LEN=12 May 12 10:08:06 varnish1 kernel: [6214553.449867] Inbound IN=eth0 OUT= MAC=00:14:22:1e:ba:b8:00:0e:84:3b:ec:c0:08:00 SRC=66.151.125.8 DST=130.225.244.83 LEN=32 TOS=0x00 PREC=0x00 TTL=1 ID=1802 PROTO=UDP SPT=11057 DPT=33442 LEN=12 May 12 10:08:07 varnish1 varnishd[19448]: Pushing vcls failed: CLI communication error
May 12 10:08:07 varnish1 varnishd[19448]: Stopping Child
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said Closed fds: 4 5 6 10 11 13 14
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said Child starts
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said managed to mmap 2147483648 bytes of 2147483648
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said Ready
....
May 12 10:08:07 varnish1 varnishd[19448]: Pushing vcls failed: CLI communication error
May 12 10:08:07 varnish1 varnishd[19448]: Stopping Child
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said Closed fds: 4 5 6 10 11 13 14
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said Child starts
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said managed to mmap 2147483648 bytes of 2147483648
May 12 10:08:07 varnish1 varnishd[19448]: Child (18174) said Ready
....
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Probe("GET /vandstand/servlet/ImageServlet HTTP/1.1
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Host: carbon.dmi.dk
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Connection: close
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: last message repeated 2 times
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said ", 3, 10)
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Probe("GET / HTTP/1.1
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Host: localhost
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Connection: close
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: last message repeated 2 times
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said ", 3, 5)
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Probe("GET /dmi/dmi.css HTTP/1.1
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Host: euros08.dmi.dk
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said Connection: close
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said
May 12 10:08:08 varnish1 varnishd[19448]: last message repeated 2 times
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) said ", 2, 10)
May 12 10:08:08 varnish1 varnishd[19448]: Child (18174) ended
May 12 10:08:08 varnish1 varnishd[19448]: Child (-1) said Probe("GET /dmi/dmi.css HTTP/1.1
May 12 10:08:08 varnish1 varnishd[19448]: Child (-1) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (-1) said Host: euros12.dmi.dk
May 12 10:08:08 varnish1 varnishd[19448]: Child (-1) said
May 12 10:08:08 varnish1 varnishd[19448]: Child (-1) said Connection: close
May 12 10:08:08 varnish1 varnishd[19448]: Child (-1) said
May 12 10:08:08 varnish1 varnishd[19448]: last message repeated 2 times
May 12 10:08:08 varnish1 varnishd[19448]: Child (-1) said ", 2, 10)
May 12 10:08:08 varnish1 varnishd[19448]: Child cleanup complete
....
May 12 10:21:42 varnish1 varnishd[19448]: Manager got SIGINT
May 12 10:21:43 varnish1 varnishd[19184]: child (19185) Started
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said Closed fds: 4 5 6 10 11 13 14
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said Child starts
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said managed to mmap 2147483648 bytes of 2147483648
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said Ready
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said Probe("GET /vandstand/servlet/ImageServlet HTTP/1.1
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said Host: carbon.dmi.dk
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said Connection: close
May 12 10:21:43 varnish1 varnishd[19184]: Child (19185) said
...And so on....

_________________________________
# cat /usr/lib/nagios/plugins/check_varnish_backend_health
#!/usr/bin/perl

use strict;
use Net::Telnet ();
use Getopt::Long;
my $version = '';
my $help = '';
my $result = '';
my $status = '';
my $host = '';
my $all = '';
my $verbose = '';
my $handle = new Net::Telnet ( Port=> '6082', Timeout => 8, prompt => '/\.\n|\n\n/' );

## Print Help
sub usage {
  my $msg = shift;
  if (defined $msg) {
     print "$msg\n";
  }
  print << "END";
       Usage: check_varnish_backend_health option
       Options:
               --help
               -h|--host hostname
               --all
               --verbose
               --version
       NB! Only one option at the time! :-)
END
  exit(3);
}

#Get options from command line
$result = GetOptions(
       "host|h=s"      => \$host,
       "all"           => \$all,
       "version"       => \$version,
       "verbose"       => \$verbose,
       "help"          => \$help
);

if ($help)      {usage();}
if ($version)   {usage('Version is 0.9_EBE');}
if ($all)       {if ($host){usage('Either --all _or_ --host hostname')};}
if ($verbose) {if ($host){usage('Either --verbose _or_ --host hostname')};}

my $ip = "127.0.0.1";
$handle->open("$ip");
my @lines = $handle->cmd("debug\.health");
$handle->close;
my %backends_state_hash;
my @sick_backends;
foreach (@lines) {
if (/Backend/) {
       my @line = split(' ',$_);
       $status = pop(@line);
       my $be = @line[1];
       if ($verbose) {
               print "Host: $be => $status\n";
               }
       if ($status eq 'Sick'){
               push(@sick_backends,$be);
               }
       $backends_state_hash{$be} = $status;
       }
}

if ($host) {
       if ( $backends_state_hash{$host} eq 'Healthy'){
               print "OK: Host $host is Healthy";
               exit 0;
       }
       elsif ( $backends_state_hash{$host} eq 'Sick'){
               print "Critical: Host $host is Sick";
               exit 2;
       }
       else {
               print "Warning: Host $host is not Healthy, nor is it Sick";
               exit 1;
       }
}
my $number_of_sick_backen...@sick_backends;
my $number_of_backends = keys %backends_state_hash;
if ( $number_of_backends ==  0 ) {
       print "Warning: No backends are probed";
       exit 1;
}
if ( $number_of_sick_backends == 0 ) {
       print "OK: All hosts are Healthy";
       exit 0;
}
if ( $number_of_sick_backends == $number_of_backends ) {
       print "Critical: All hosts are Sick - @sick_backends";
       exit 2;
}
else {
       print "Warning: Some hosts are sick: @sick_backends";
       exit 1;
}
______________________________________

--
Eivind Bengtsson
Systemadministrator - Cand.merc.(dat)
Danmarks Meteorologiske Institut
Lyngbyvej 100
2100 København Ø
Direkte tlf. : 39157544
Email: [email protected]
echo 'This is not a pipe.' | cat -> /dev/tty


_______________________________________________
varnish-misc mailing list
[email protected]
http://lists.varnish-cache.org/mailman/listinfo/varnish-misc

Reply via email to