So, I've solved my problem, and it was as I suspected. [The alert script/MTR
seems to run in the foreground, and everything in smokeping stops until the
script finishes, so the RRDs stop getting updated, etc.]
So, now, I call a bash script from smokeping like so:
---
to=|/etc/smokeping/call-smoke-mtr
---
Contents of call-smoke-mtr:
---
#!/bin/bash
/etc/smokeping/smoke-mtr "$@" &
---
The "$@" expands to all the command-line arguments the script was called with,
each passed through as a separate, intact argument.
So we just run the real script with exactly the same arguments we were called
with, and do it in the background...
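A quick way to see what the quotes around "$@" buy you. The function names are made up for illustration, and the "0%, 0%, 20%" value is only a stand-in for any argument that contains spaces, not necessarily what smokeping actually sends.

```shell
#!/bin/bash
# Report how many arguments a command actually receives.
count_args() {
    echo "$#"
}

# Forward arguments the way call-smoke-mtr does: quoted "$@" passes each
# original argument through intact, even one that contains spaces.
forward_quoted() {
    count_args "$@"
}

# Unquoted $* (or unquoted $@) re-splits the arguments on whitespace, so one
# argument with spaces in it arrives as several.
forward_unquoted() {
    count_args $*
}

forward_quoted   alert1 target1 "0%, 0%, 20%"   # prints 3
forward_unquoted alert1 target1 "0%, 0%, 20%"   # prints 5
```

So the quoted form is the one to use in a wrapper: the real script sees the same argument list smokeping handed to the wrapper.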
The smoke-mtr is the "real" Python script that actually runs the MTR, generates
the report and sends it.
Each of these goes into the background and runs just fine.
A warning: if a large number of alerts fired at once [I have no idea what
number would be a problem; I'd guess perhaps 100+], there's no control in my
setup to keep them from consuming all the available resources and killing the
smokeping monitor.
Perhaps a few lines to keep the number of background processes below some
threshold would be good, but I've not tinkered with that.
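One way such a threshold could be added, sketched under assumptions: the wrapper counts running copies of smoke-mtr with pgrep -f and skips the MTR report when MAX_JOBS copies are already running. MAX_JOBS=10 is an arbitrary placeholder, and MTR_SCRIPT, running_jobs and launch_if_room are hypothetical names, not part of smokeping or the scripts in this thread.

```shell
#!/bin/bash
# call-smoke-mtr, throttled variant: a sketch, not the script from this thread.
# MTR_SCRIPT and MAX_JOBS are assumptions; 10 is an arbitrary placeholder.
MTR_SCRIPT="${MTR_SCRIPT:-/etc/smokeping/smoke-mtr}"
MAX_JOBS="${MAX_JOBS:-10}"

# Count running copies of the real script by looking for its path in each
# process's full command line (pgrep -f). This also catches an interpreter
# running it, e.g. "python /etc/smokeping/smoke-mtr ...".
running_jobs() {
    pgrep -f -- "$MTR_SCRIPT" | wc -l
}

# Start one MTR report in the background unless MAX_JOBS copies are already
# running. Returns 1, and starts nothing, when the limit has been reached.
launch_if_room() {
    local n
    n=$(running_jobs)
    if [ "$n" -ge "$MAX_JOBS" ]; then
        echo "call-smoke-mtr: $n reports already running, skipping this one" >&2
        return 1
    fi
    # Same idea as the one-line wrapper: forward "$@" unchanged and add "&".
    # The standard streams are redirected so the background job keeps none
    # of the caller's stdin/stdout/stderr open.
    "$MTR_SCRIPT" "$@" </dev/null >/dev/null 2>&1 &
    return 0
}

launch_if_room "$@"
```

The count and the launch are not atomic, so two alerts arriving at the same instant can both slip under the limit; treat MAX_JOBS as approximate. An alert that hits the limit simply gets no MTR report.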
But the above works: smokeping no longer chokes on writing the RRDs while the
MTR is running, so there are no gaps in your RRDs, no nagios complaining
bitterly, etc. :)
-Greg
So, I've changed the thread title...
A few updates.
I didn't think it was load, so I tried running the Alert/MTR script *by hand*
while smokeping and nagios were doing their thing, just to see what the load
was and what effect it had.
I ran about 15 alerts/MTR runs in quick succession, all while smokeping and
nagios were also running and doing their work.
Load does peak higher than I expected, at ~2 for the 1-minute average, but
those runs complete fairly quickly and load drops back to around 0.3-0.4.
[And this is far more load from the MTR script than should have occurred in
the automated runs I was doing before.]
However, even with the much higher load, there were no gaps in writing the
smokeping RRDs, and Nagios didn't complain about them.
So I think it's safe to say it's not a load issue. For some reason, when
smokeping runs the "alert" script, it has to wait for that script to finish
before it does anything more, and this causes the other issues.
---
So, I also tried appending a "&" to the smokeping alert line in the config, in
the hope that it would run the process in the background. No luck. [I'd guess
it places the "&" before the passed arguments, so the script doesn't get any of
the arguments it needs.]
I thought about creating a script that would take the arguments smokeping
passes, then run a second script with those arguments and a "&" appended.
e.g.
"MTR-Create" [a (bash?) script] would take the arguments it was passed from
smokeping [you'd call MTR-Create from the smokeping alert].
MTR-Create would simply take its arguments and call the "regular" MTR/Alert
script, passing along those arguments and appending "&" at the end to run it
in the background.
I suspect I can struggle my way through doing that, but does any BASH guru
know offhand how best to do it? It could save me a lot of poking and trial and
error! :)
TIA!
-Greg
Greg,
How many alerts are firing when your box starts to bog down? I have been running
my fork of the mtr script for several months now with no issues. As a matter of
fact, I am now working on an expanded version that will dump the MTRs into MySQL
for easy access by our NOC. Currently, I just have the script append each mtr
to a file in /var/log.
Could you be pushing the box you are running it on too hard?
On Wed, Jun 25, 2014 at 8:12 PM, Gregory Sloop <[email protected]> wrote:
FP> On 21.02.2014 06:42, Philip Wehunt wrote:
>> I could hackishly work around this in my python but I wanted to
>> identify if I am doing something wrong on the SP side or if it is a
>> bug. Mainly in the spirit of KISS. I don't like to let hackish
>> scripts linger.
FP> You probably found the same script on gist, but here's my version[1]
FP> which doesn't fail when the 6th arg is missing. It will not add "
FP> cleared" to the subject without the arg, but it will send you the report.
FP> [1]: https://git.server-speed.net/users/flo/bin/tree/smokemtr.py
FP> From the documentation in smokeping_config I'd say this is a bug, but
FP> given I get my mails I didn't bother fixing it yet.
Florian et al.,
First, thanks for the script. I've had to mod it a bit (my MTR isn't quite the
same as yours, and I want to use a non-local SMTP server and port), but those
were easy mods. [MTR is in a different location too; again, an easy mod.]
So, I'm very excited about the prospect of automated mtr stats when a
smokeping alert gets triggered; however, I've run into a substantial snag.
I use a 60s poll in smokeping, and if a bunch of [smokeping] alerts kick off,
each MTR takes a while to run and smokeping stalls.
This causes a ripple effect and a raft of nagios alerts, since I use a
smokeping nagios plug-in. When SP stalls [running the MTRs], the RRDs go dry,
and nagios starts alerting on an "unknown" target state. ["This RRD hasn't
been written to in 180s" etc.]
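The arithmetic behind that ripple effect, as a rough sketch: with a 60s poll, a 180s staleness limit is crossed after about three missed updates, so a stall of around three minutes is enough to set nagios off. MAX_AGE, is_stale and missed_steps are hypothetical names, and the actual smokeping nagios plug-in may test freshness differently; `rrdtool last` prints the time of an RRD's most recent update as Unix epoch seconds.

```shell
#!/bin/bash
# Staleness check sketch. MAX_AGE=180 mirrors the "180s" message quoted above;
# the real smokeping nagios plug-in may work differently.
MAX_AGE="${MAX_AGE:-180}"

# is_stale LAST NOW: exit status 0 (stale) when the last update is more than
# MAX_AGE seconds old, 1 otherwise. Times are Unix epoch seconds.
is_stale() {
    local last=$1 now=$2
    (( now - last > MAX_AGE ))
}

# missed_steps LAST NOW [STEP]: how many whole polling steps (default 60 s)
# have passed since the last update.
missed_steps() {
    local last=$1 now=$2 step=${3:-60}
    echo $(( (now - last) / step ))
}

# Against a real file (rrdtool last prints the last-update time as epoch
# seconds); the path is a placeholder:
#   is_stale "$(rrdtool last /path/to/target.rrd)" "$(date +%s)" && echo stale
```

For example, `missed_steps 1000 1180` prints 3, and `is_stale 1000 1181` succeeds while `is_stale 1000 1180` does not.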
So, is there some way I can fork off the mtr script and let smokeping
continue while the mtr stats are gathered and a report is sent?
[This is something I know woefully little about...]
TIA
-Greg
_______________________________________________
smokeping-users mailing list
[email protected]
https://lists.oetiker.ch/cgi-bin/listinfo/smokeping-users
--
Gregory Sloop, Principal: Sloop Network & Computer Consulting
Voice: 503.251.0452 x82
EMail: [email protected]
http://www.sloop.net
---