Hey Greg, I took your advice and disabled AppArmor, but that still doesn't help. This box is definitely not doing anything I/O intensive. It's also a relatively powerful box (12 cores, 24 GB RAM). It's basically a cron server and a web server for a local intranet used by maybe 20 people at the moment, though it will handle more CPU-intensive work later on. I watched top while loading the page and the footprint was minimal.
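Greg asked about load averages in particular. The shell sketch below reports them as numbers instead of relying on a glance at top. It assumes Linux (`/proc/loadavg`) and GNU coreutils (`nproc`). The function names `read_loadavg` and `load_below_cores` are hypothetical helpers, not part of SmokePing. On Linux the load average also counts tasks blocked in uninterruptible (usually disk) I/O, so a high load with idle CPUs can point to I/O pressure.

```shell
#!/usr/bin/env bash
# Sketch only: read the load averages and compare the 1-minute value
# against the core count. Assumes Linux (/proc/loadavg) and GNU coreutils (nproc).

# Print the 1-, 5- and 15-minute load averages from a loadavg-format file.
read_loadavg() {
    local file="${1:-/proc/loadavg}"
    local one five fifteen rest
    read -r one five fifteen rest < "$file" || return 1
    printf '%s %s %s\n' "$one" "$five" "$fifteen"
}

# Return 0 if the 1-minute load is below the core count, 1 otherwise.
# On Linux the load also counts tasks blocked in uninterruptible (often disk) I/O.
load_below_cores() {
    local file="${1:-/proc/loadavg}"
    local cores="${2:-$(nproc)}"
    local one rest
    read -r one rest < "$file" || return 1
    # Bash has no floating-point comparison, so awk compares the values.
    awk -v load="$one" -v cores="$cores" 'BEGIN { exit !(load < cores) }'
}
```

To check the box while the SmokePing page is loading, run `read_loadavg` to print the three averages. `load_below_cores || echo "load at or above core count"` flags a busy box.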
I didn't realize I had slaves listed in the slave section. I was just using the template config, so I removed them, and no slaves are configured now. I debugged this further by raising the default timeout. I found that after apache2 is restarted, the initial page load takes longer the more Targets I add. However, once the page has loaded, it refreshes almost instantly. I also upgraded my local install of RRDTool, but this doesn't seem to help the initial load time.

On Tue, Mar 4, 2014 at 4:51 PM, Gregory Sloop <[email protected]> wrote:
> [Tue Mar 04 15:15:36 2014] [warn] [client 192.168.1.66] mod_fcgid: read data timeout in 40 seconds, referer: http://pipeline/
>
> Looks like the smokeping cgi times out reading data.
> Is this box I/O bound?
> What does top show when you try to get a web-page from SP? [load averages in particular]
>
> In any case, you need to figure out why the CGI is failing to read the data in the allowed time of 40 secs.
> Changing the default time-out might help if the box is I/O bound, but not totally buried. [And I'm not sure where that might be.]
>
> However, if the box is seriously overloaded I/O wise, then waiting longer won't really solve your problem - it will just push the box further below the water.
> [And this all gets back to - how many RRD's and how big are they. See the database section. Are there slaves? If so, how many?]
>
> Finally:
>
> >Is fping being ran as soon as the cgi script is executed from the webserver?
>
> You appear to misunderstand how SP works. The daemon runs fping and logs the results and writes to the RRD's. The CGI pulls data from the RRD and generates graphs for the http output.
>
> It appears from the debug log from SP that writing the data went fine. [At least for the small subset of targets.]
> However reading the RRD's and generating the graphs appears to fail/timeout when reading the RRD's.
> [Or reading something - in any case.]
>
> Is selinux or apparmour running? If so, then stop them or run in permissive mode and see if that helps.
>
> -Greg
>
> Forgot to add the smoke.log:
>
> http://pastebin.com/20UbvJVx
>
> At the bottom of the log you can see that I also tried timing fping (the same command that smokeping was running) and it looks like it took 19.3 seconds to run for a small number of machines. Would that cause it to time out? Is fping being ran as soon as the cgi script is executed from the webserver?
>
> On Tue, Mar 4, 2014 at 4:10 PM, Brett Bronson <[email protected]> wrote:
> Here is the apache error log that is listing smokeping:
> http://pastebin.com/Knm1Cmw1
>
> As for debug mode, here's my output:
> http://pastebin.com/8txnhnkv
>
> The host names do resolve; here's an example:
> [04:07 PM]superuser@pipeline[/opt/smokeping/bin]
> time fping larender001a
> larender001a is alive
>
> real 0m0.014s
> user 0m0.000s
> sys 0m0.000s
>
> On Tue, Mar 4, 2014 at 3:32 PM, Brett Bronson <[email protected]> wrote:
> Also, it looks like the version I have running is actually the latest, I assumed it would output the version as 2.6.9. Sorry
>
> On Tue, Mar 4, 2014 at 3:29 PM, Brett Bronson <[email protected]> wrote:
> Okay, it looks like I was actually using an older version of smokeping.
> I've removed it and installed the latest version on the site and my config is as follows:
> http://pastebin.com/ZsLE8uCp
>
> Before, I was able to get smokeping to work fine up until I added the section:
>
> + nodes
> menu = Render Node Latency
> title = Render Node Latency (ICMP Pings)
>
> ++ larender001a
> host = larender001a
> ++ larender001b
> host = larender001b
> ++ larender001c
> host = larender001c
> ++ larender001d
> host = larender001d
>
> ++ larender002a
> host = larender002a
> ++ larender002b
> host = larender002b
> ++ larender002c
> host = larender002c
> ++ larender002d
> host = larender002d
>
> Now that I look at the logs, it looks like it's still using the old version....
> [ ... ]
> Tue Mar 4 15:03:05 2014 - FPing: probing 5 targets with step 300 s and offset 116 s.
> Tue Mar 4 15:16:01 2014 - Smokeping version 2.006009 successfully launched.
> Tue Mar 4 15:16:01 2014 - Not entering multiprocess mode for just a single probe.
> Tue Mar 4 15:16:01 2014 - FPing: probing 13 targets with step 300 s and offset 163 s.
> Tue Mar 4 15:25:59 2014 - Smokeping version 2.006009 successfully launched.
> Tue Mar 4 15:25:59 2014 - Not entering multiprocess mode for just a single probe.
> Tue Mar 4 15:25:59 2014 - FPing: probing 13 targets with step 300 s and offset 159 s.
>
> Before, I used sudo apt-get install smokeping to install, but I later removed it using sudo apt-get remove smokeping; however, it looks like it didn't remove the old version? Any idea how I could resolve this so that it loads up the newer version?
>
> On Tue, Mar 4, 2014 at 2:28 PM, Gregory Sloop <[email protected]> wrote:
> I don't see a database section, so I assume it's somewhere else. [Nothing looks obviously wrong - but that was just a quick glance.]
>
> But when you first start SP after adding a bunch of targets, it's going to have to allocate/create the RRD for each of the targets.
> [Also, are there slaves, because it will create X * 60 new RRD's - where X is how many slave SP instances you have. (In addition to the master RRD's) ]
>
> I wouldn't think that would take 10m, but I can't see how much data you're stuffing in each RRD, or if you have slaves, which might help explain it.
>
> As to why web-pages won't work, I'm not sure. Have you looked at the apache logs to see what they say? Or run SP in debug mode? [smokeping --debug IIRC]
>
> -Greg
>
> Hello,
>
> I recently updated my smokeping Target configuration to include about 60 of our machines in our render farm and noticed that restarting the smokeping service took about 10 minutes, and now our webpage will not load.
>
> Any ideas?
>
> My config:
> http://pastebin.com/ibNmGhAF
>
> --
> Brett Bronson
> Big Block | Pipeline TD
> http://www.bigblockla.com
> [m] 805-338-6520
>
> --
> Gregory Sloop, Principal: Sloop Network & Computer Consulting
> Voice: 503.251.0452 x82
> EMail: [email protected]
> http://www.sloop.net

--
Brett Bronson
Big Block | Pipeline TD
http://www.bigblockla.com
[m] 805-338-6520
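Greg keeps coming back to how many RRDs there are and how big they are, and that can be answered from the shell. The sketch below assumes GNU find (for `-printf`) and that the RRD files live under SmokePing's `datadir`. The default path `/opt/smokeping/data` and the function name `rrd_inventory` are hypothetical, so substitute the `datadir` from your own config. With no slaves, SmokePing keeps one RRD per target, so roughly 60 files would be expected for this setup. Per Greg's note, each slave would add about another 60.

```shell
#!/usr/bin/env bash
# Sketch only: count SmokePing RRD files and their combined size.
# Assumes GNU find (-printf). The default datadir is a hypothetical
# path; use the "datadir" value from your SmokePing config.

rrd_inventory() {
    local datadir="${1:-/opt/smokeping/data}"
    # Print one size (bytes) per .rrd file, then count and sum them in awk.
    find "$datadir" -type f -name '*.rrd' -printf '%s\n' |
        awk '{ n++; total += $1 }
             END { printf "rrd_files=%d total_bytes=%.0f\n", n, total }'
}
```

`rrd_inventory /path/to/datadir` prints one line with the number of RRD files and their total size in bytes, which is the figure Greg asked for.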
_______________________________________________
smokeping-users mailing list
[email protected]
https://lists.oetiker.ch/cgi-bin/listinfo/smokeping-users
