So, I've solved my problem, and it was like I envisioned. [The alert script/MTR 
seems to run in the foreground and everything stops in smokeping until the 
script finishes, thus causing the RRD's to stop getting updated etc.]

So, now, I call a bash script from smokeping like so:
---
to=|/etc/smokeping/call-smoke-mtr
---

Contents of call-smoke-mtr:
---
#!/bin/bash
# Run the real script in the background so smokeping doesn't wait on it.
/etc/smokeping/smoke-mtr "$@" &
---

The "$@" expands to all the command line args the wrapper was called with, 
each one kept as a separate argument [as long as you keep the quotes].
So we just run the real script with all the same input args we got called with, 
and do it in the background...
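
To show why the quotes around $@ matter, here is a small sketch. The function names (show_args, forward_quoted, forward_unquoted) and the sample arguments are made up for illustration; they are not part of smokeping or the smoke-mtr script.

```shell
#!/bin/bash
# show_args prints each argument it receives on its own line, in brackets.
show_args() {
    local arg
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# forward_quoted passes its arguments along unchanged, the way
# call-smoke-mtr does with "$@".
forward_quoted() {
    show_args "$@"
}

# forward_unquoted lets the shell re-split the arguments on whitespace,
# so an argument containing a space turns into two arguments.
forward_unquoted() {
    show_args $@
}

# Hypothetical alert arguments; the first one contains a space.
forward_quoted "loss alert" "host.example"
forward_unquoted "loss alert" "host.example"
```

The quoted version hands two arguments to show_args; the unquoted one hands over three, because "loss alert" gets split. If any of the values smokeping passes contain spaces, the quoted form is the one you want.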

The smoke-mtr is the "real" python script that actually does the MTR, generates 
the report and sends it.

And these all go into the background and run just fine.

A warning - if a large number of alerts fire at once [I have no idea what 
large number might be a problem - I'd guess perhaps 100+] there's no control in 
my setup to keep the background MTR runs from consuming all the available 
resources and killing the smokeping monitor.

Perhaps a few lines to keep the number of background processes from exceeding 
some threshold would be good - but I've not tinkered with that.
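
For what it's worth, one way such a limit might look is sketched below. This is untested against a real smokeping install. MAX_JOBS, SLOT_DIR, acquire_slot and run_limited are all names invented for this sketch, and the default of 10 is an arbitrary guess. It relies on mkdir being atomic: each running MTR holds one "slot" directory, and when all slots are taken the new alert is skipped.

```shell
#!/bin/bash
# Assumed settings (hypothetical names and values, adjust to taste).
MAX_JOBS=${MAX_JOBS:-10}
SLOT_DIR=${SLOT_DIR:-/tmp/smoke-mtr-slots}

# acquire_slot tries to claim one of MAX_JOBS slot directories.
# mkdir either creates the directory or fails, so two callers can
# never claim the same slot. Prints the slot path on success.
acquire_slot() {
    local i=1
    mkdir -p "$SLOT_DIR" || return 1
    while [ "$i" -le "$MAX_JOBS" ]; do
        if mkdir "$SLOT_DIR/slot.$i" 2>/dev/null; then
            echo "$SLOT_DIR/slot.$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# run_limited runs the given command in the background if a slot is
# free, and releases the slot when the command finishes.
run_limited() {
    local slot
    if ! slot=$(acquire_slot); then
        echo "too many MTR runs in progress, skipping this one" >&2
        return 1
    fi
    ( "$@"; rmdir "$slot" ) &
}

# In call-smoke-mtr the last line would then become something like:
# run_limited /etc/smokeping/smoke-mtr "$@"
```

One caveat: if a run gets killed hard, its slot directory stays behind and has to be cleaned up by hand or at boot. Skipping alerts when the limit is hit also means you lose those MTR reports, which may or may not be acceptable.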

But the above works, and keeps smokeping from choking on writing the RRD's 
while the MTR is running. So there are no gaps in your RRD's, no nagios 
complaining bitterly etc. :)

-Greg



So, I've changed the thread title...

A few updates.

I didn't think it was load, so I tried running the Alert/MTR script *by 
hand/manually*, while smokeping and nagios were doing their thing - just to 
test what the load was and what effect it had.

I ran about 15 alerts/MTR runs in quick succession - all while smokeping and 
nagios were also running doing their work.
Load does peak higher than I suspected - at ~2 for the 1 min average - but 
those queries complete fairly quickly and load drops back to around 0.3-0.4. 
[and this is way more load from the MTR script than should have been occurring 
in the automated runs I was doing before.]

However, even with the much higher load, there were no drops in writing the 
smokeping RRD's and Nagios doesn't complain about them.

So, I think it's safe to say that it's not a load issue - it's that, for some 
reason, when smokeping runs the "alert" script it has to wait for that 
script to finish before it goes on to do anything more - and this causes the 
other issues.

---
So, I also tried appending a "&" to the smokeping alert line in the config - in 
the hopes that it would run the process in the background. No luck. [I'd guess 
it places the "&" before the passed arguments and the script doesn't get any of 
the passed arguments it needs.]

I thought about creating a script that would run a second script and append the 
"&" to it, and run it.

e.g. 
"MTR-Create" [a (bash?) script] - would take the arguments it was passed from 
smokeping [you'd call MTR-Create from the smokeping alert]
MTR-Create would simply take its arguments and call the "regular" MTR/Alert, 
passing along those arguments and appending "&" at the end to run it in the 
background.

I suspect I can struggle my way through doing that - but does any BASH guru 
know how best to do that, offhand? It could save me a lot of poking, trial and 
error! :)

TIA!

-Greg



Greg,

How many alerts are firing when your box starts to bog?  I have been running my 
fork of the mtr script for several months now with no issues.  Matter of fact, 
I am now working on an expanded version that will dump the mtr's into mysql for 
easy access for our NOC.  Currently, I just have the script appending a file in 
/var/log with each mtr.

Could you be pushing the box you are running from too hard?




On Wed, Jun 25, 2014 at 8:12 PM, Gregory Sloop <[email protected]> wrote:



FP> On 21.02.2014 06:42, Philip Wehunt wrote:
>> I could hackishly work around this in my python but I wanted to
>> identify if I am doing something wrong on the SP side or if it is a
>> bug. Mainly in the spirit of KISS. I don't like to let hackish
>> scripts linger.

FP> You probably found the same script on gist, but here's my version[1]
FP> which doesn't fail when the 6th arg is missing. It will not add "
FP> cleared" to the subject without the arg, but it will send you the report.

FP> [1]: https://git.server-speed.net/users/flo/bin/tree/smokemtr.py

FP> From the documentation in smokeping_config I'd say this is a bug, but
FP> given I get my mails I didn't bother fixing it yet.

Florian et al.,

First, thanks for the script. I've had to mod it a bit - my MTR isn't quite the 
same as yours and I want to use a non-local SMTP server and port - but those 
were easy mods. [MTR is in a different spot too, again easy mod.]

So, I'm very excited about the prospects of automated mtr stats when a 
smokeping alert gets triggered - however I run into a substantial snag.

I use a 60s poll in smokeping, and if I get a bunch of [smokeping] alerts that 
kick off, then, when each MTR takes a while to run, it stalls smokeping. 

This causes a ripple-effect, and a raft of nagios alerts...since I use a 
smokeping nagios plug-in.  When SP stalls [running the mtr's] the RRD's go dry, 
and then nagios starts alerting on an "unknown" target state. ["This RRD hasn't 
been written to in 180s" etc.]
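
To make the "RRD hasn't been written to" condition concrete, here is a rough sketch of that kind of staleness test. It is not the actual nagios plug-in; is_stale is a made-up function name, and the path and 3-minute threshold in the usage comment are only examples.

```shell
#!/bin/bash
# is_stale FILE MINUTES
# Returns 0 if FILE was last modified more than MINUTES minutes ago,
# 1 if it is fresh, and 2 if the file does not exist.
is_stale() {
    local file=$1 max_min=$2
    [ -e "$file" ] || return 2
    # find prints the path only when the modification time is older
    # than the threshold, so non-empty output means "stale".
    [ -n "$(find "$file" -mmin +"$max_min" 2>/dev/null)" ]
}

# Example use (hypothetical path):
# if is_stale /var/lib/smokeping/some-target.rrd 3; then
#     echo "UNKNOWN - RRD not written to in over 3 minutes"
# fi
```

A check like this fires whenever smokeping is blocked long enough to miss a few 60s polls, which is what happens while it waits on the MTR script.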

So, is there some way I can fork off the mtr script, and allow smokeping to 
continue while the mtr stats are gathered and a report sent?

[This is something I'm woefully un-knowledgeable about...]

TIA
-Greg

_______________________________________________
smokeping-users mailing list
[email protected]
https://lists.oetiker.ch/cgi-bin/listinfo/smokeping-users


-- 
Gregory Sloop, Principal: Sloop Network & Computer Consulting
Voice: 503.251.0452 x82
EMail: [email protected]
http://www.sloop.net
---
