Today Simon Westlake wrote:
Hi Simon,
try this patch:
--- Smokeping.pm.orig Thu Jan 15 17:14:34 2004
+++ Smokeping.pm Thu Jan 15 17:15:55 2004
@@ -1899,6 +1899,7 @@
do_log("Launched successfully");
report_probes($probes);
while (1) {
+ my $now = time;
if ($opt{debug}) {
map { $probes->{$_}->debug(1) if $probes->{$_}->can('debug') }
keys %$probes;
@@ -1906,6 +1907,10 @@
run_probes $probes;
update_rrds $cfg, $probes, $cfg->{Targets}{probe}, $cfg->{Targets}, $cfg->{General}{datadir};
exit 0 if $opt{debug};
+ my $runtime = time - $now;
+ warn "WARNING: smokeping took $runtime seconds to complete 1 round of polling. ".
+ "It should complete polling in $cfg->{Database}{step} seconds. ".
+ "You may have unresponsive devices in your setup.\n" if $runtime > $cfg->{Database}{step};
sleep $cfg->{Database}{step} - time % $cfg->{Database}{step};
}
}
Now smokeping will complain when it is taking too long to complete a round.
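To make the timing in the patch concrete, here is a small shell sketch of the same arithmetic. The function names and the sample numbers are made up for illustration; only the two formulas (step - now % step for the sleep, runtime > step for the warning) come from the patched loop.

```shell
#!/bin/sh
# Sketch of the arithmetic in the patched main loop.
# Function names and sample values are illustrative only.

# Seconds to sleep so the next round starts on a step boundary:
# step - (now % step), as in the loop's sleep statement.
seconds_to_next_round() {
    now=$1; step=$2
    echo $(( step - now % step ))
}

# Prints "overrun" if a round took longer than one step, else "ok".
# This mirrors the condition that triggers the new warning.
check_round() {
    runtime=$1; step=$2
    if [ "$runtime" -gt "$step" ]; then
        echo overrun
    else
        echo ok
    fi
}

seconds_to_next_round 1074183355 300   # prints 245
check_round 412 300                    # prints overrun
check_round 180 300                    # prints ok
```

Note that after an overrun the loop still waits for the next step boundary, so a round that takes 412 seconds with a 300 second step misses one polling slot completely.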
How is the load on your machine while smokeping is polling?
The reason you see more gaps when you widen the step is that your rrds
store the maximum acceptable interval between updates (the heartbeat)
internally. You can use rrdtool tune to change that.
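As a rough sketch of that rrdtool tune step: rrdtool tune accepts one --heartbeat NAME:SECONDS option per data source, and rrdtool info lists the data sources in a file. The helper below turns rrdtool info output into those options, so you do not need to know the data source names in advance. The function name heartbeat_args, the file path in the comment, and the value 1200 are assumptions for illustration; pick a heartbeat that suits your new step.

```shell
#!/bin/sh
# Build "--heartbeat NAME:SECONDS" arguments for every data source
# found in `rrdtool info` output read from stdin.
# heartbeat_args is a hypothetical helper, not part of rrdtool.
heartbeat_args() {
    new_hb=$1
    # Relevant info lines look like: ds[loss].minimal_heartbeat = 600
    sed -n 's/^ds\[\([^]]*\)]\.minimal_heartbeat = .*/\1/p' |
    while read -r ds; do
        printf '%s %s:%s\n' --heartbeat "$ds" "$new_hb"
    done
}

# Intended use (path and value are examples, nothing is run here):
#   rrdtool tune /path/to/target.rrd \
#       $(rrdtool info /path/to/target.rrd | heartbeat_args 1200)
```

Try it on a copy of one rrd file first, and check the result with rrdtool info before touching the rest.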
tobi
> Hi,
>
> A few weeks ago I posted about gaps in Smokeping graphs, and the eventual
> conclusion was that it was simply taking too long for Smokeping to run.
>
> I tried running it at 10 minutes rather than 5 but, strangely, there were
> more gaps at 10 minutes than at 5 (this always seems to be the case for me;
> I tried it again recently and had the same problem).
>
> My previous solution was to remove devices from Smokeping that were regularly
> unresponsive - their removal seemed to resolve the problem.
>
> So, I'm stuck running at 5 minutes, as anything above that seems to produce
> more gaps. However, I'm adding 20+ devices a week to Smokeping and I'm
> starting to get gaps again. I'm assuming it's taking too long to run again,
> but I only have ~15 unresponsive devices at a time (out of 1300) so it
> doesn't seem to be a problem with excessive timeouts.
>
> I measure the amount of time MRTG takes to run for monitoring purposes by
> doing:
>
> x=`date +%s`; z=`date`
> /usr/local/mrtg-2/bin/mrtg /usr/local/mrtg.cfg
> y=`date +%s`; runtime=`expr $y - $x`
> echo "$z runtime was $runtime seconds" >>/home/simon/runtime
>
> I can't, however, think of a good way to do this for Smokeping.
>
> So, two quick questions..
>
> Can anyone think of a way of doing something similar for Smokeping?
> Does anyone have an example of a relatively aggressive probe configuration
> for fping for monitoring large numbers of devices? I did try modifying
> parameters to pass to fping as specified in the Smokeping documentation, but
> I think I must be reading it incorrectly, as I couldn't get the syntax right.
> I'd be happy to increase the wait by a very slight increment for successive
> timeouts and to give up at ~800ms or so.
>
> The eventual solution is going to be to split the polling over a large number
> of servers, but for the time being, I'm stuck running it on a single box. I'm
> 99% sure it's a case of excessive wait time, as the server is relatively
> powerful.
>
> Thanks for any help you can provide.
>
>
> --
> Unsubscribe mailto:[EMAIL PROTECTED]
> Help mailto:[EMAIL PROTECTED]
> Archive http://www.ee.ethz.ch/~slist/smokeping-users
> WebAdmin http://www.ee.ethz.ch/~slist/lsg2.cgi
>
--
______ __ _
/_ __/_ / / (_) Oetiker @ ISG.EE, ETZ J97, ETH, CH-8092 Zurich
/ // _ \/ _ \/ / System Manager, Time Lord, Coder, Designer, Coach
/_/ \.__/_.__/_/ http://people.ee.ethz.ch/~oetiker +41(0)1-632-5286