On Friday, October 31, 2014 9:50:41 AM UTC-4, Georgi Todorov wrote:
>
> Actually, sometime last night something happened and puppet stopped
> processing requests altogether. Stopping and starting httpd fixed this, but
> this could be just some bug in one of the new versions of software I
> upgraded to. I'll keep monitoring.
>
So, unfortunately, the issue is not fixed :(. For whatever reason, everything
ran great for a day: catalog compiles were taking around 7 seconds and client
runs finished in about 20 seconds - happy days. Then overnight, the catalog
compile times jumped to 20-30 seconds and client runs were taking 200+
seconds. A few hours later, no requests were arriving at the puppet master
at all. Is my HTTP server flaking out?
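(Next time it wedges, one thing I plan to check - a minimal sketch, assuming
the Passenger admin tools that ship with the gem are installed - is whether
Passenger's request queue is backing up:)

    # Show worker processes, per-process request counts, and the
    # global queue depth:
    sudo passenger-status

    # List the requests currently in flight on each worker:
    sudo passenger-status --show=requests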
Running the agent with --trace --evaltrace, and strace-ing the master, it
looks like most of the time is spent stat-ing:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.01    5.743474           9    673606    612864 stat
  7.72    0.534393           7     72102     71510 lstat
  6.76    0.467930       77988         6           wait4
That's a pretty poor "hit" rate - 612864 of the 673606 stats failed, so only
about 61k of them actually found a file...
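(For reference, the summary above came from attaching strace to the master's
worker processes; a minimal sketch - <worker_pid> is a placeholder for an
Apache/Passenger worker:)

    # Attach to a running worker, follow forks, and aggregate
    # per-syscall time/call/error counts; Ctrl-C prints the table.
    sudo strace -c -f -p <worker_pid>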
I've increased the run interval to 1 hour on all clients, and the master
seems to be keeping up for now: catalog compile avg 8 seconds, client run
avg 15 seconds, queue size = 0.
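(For the record, the interval change is just the stock agent setting; a
sketch of the puppet.conf change, assuming the default config location:)

    # /etc/puppet/puppet.conf on each client
    [agent]
        # check in hourly instead of every 30 minutes (the default)
        runinterval = 3600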
Here is what a client run looks like when the server is keeping up:
Notice: Finished catalog run in *11.93* seconds
Changes:
Events:
Resources:
           Total: 522
Time:
      Filebucket: 0.00
            Cron: 0.00
        Schedule: 0.00
         Package: 0.00
         Service: 0.68
            Exec: 1.07
           *File: 1.72*
Config retrieval: 13.35
        Last run: 1415032387
           Total: 16.82
Version:
          Config: 1415031292
          Puppet: 3.7.2
And when the server is just about dead:
Notice: Finished catalog run in 214.21 seconds
Changes:
Events:
Resources:
           Total: 522
Time:
            Cron: 0.00
      Filebucket: 0.00
        Schedule: 0.01
         Package: 0.02
         Service: 1.19
            File: 128.94
        Last run: 1415027092
           Total: 159.21
            Exec: 2.25
Config retrieval: 26.80
Version:
          Config: 1415025705
          Puppet: 3.7.2
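The jump is almost entirely in File (1.72 vs 128.94 seconds), which points at
the metadata/checksum requests the agent makes back to the master for every
sourced file. One way I can try to isolate that - a hedged sketch, assuming
Puppet 3's REST endpoints and the default agent ssldir; the master hostname
and the autofs/auto.home path are only placeholders - is to time a single
file_metadata request by hand:

    # Time one file_metadata request against the master; cert paths
    # are the Puppet 3 agent defaults, and puppetmaster:8140 plus
    # modules/autofs/auto.home are placeholders for real values.
    time curl --silent \
        --cert /var/lib/puppet/ssl/certs/$(hostname -f).pem \
        --key /var/lib/puppet/ssl/private_keys/$(hostname -f).pem \
        --cacert /var/lib/puppet/ssl/certs/ca.pem \
        -H 'Accept: pson' \
        "https://puppetmaster:8140/production/file_metadata/modules/autofs/auto.home"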
Probably 500 of the 522 resources are autofs maps, managed with
https://github.com/pdxcat/puppet-module-autofs/commits/master.
So there is definitely a bottleneck somewhere in the system; the problem is I
can't figure out what it is. Is it disk I/O (iostat doesn't seem to think
so)? Is it CPU (top looks fine)? Memory (ditto)? Is the httpd/Passenger combo
not up to the task, or is the postgres server not keeping up? There are so
many components that it is hard for me to do a proper profile to find where
the bottleneck is. Any ideas?
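(One thing I haven't tried yet: Puppet 3.2 and later ship a built-in profiler
that logs per-phase timings, which might narrow this down - a minimal sketch:)

    # One-off profiled run from a client; PROFILE lines breaking the
    # run down phase by phase then show up in the logs.
    puppet agent --test --profile

    # Or enable it permanently on the master side:
    # /etc/puppet/puppet.conf
    [master]
        profile = true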
So far I've timed the ENC script that pulls the classes for a node - it takes
less than 1 second.
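(A minimal sketch of how I timed it - the script path and node name are
placeholders for my actual ENC:)

    # Run the ENC by hand against a known node and time it.
    time /etc/puppet/enc.rb node01.example.com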
From the messages log, catalog compiles take from 7 seconds to 25 seconds
(worst case, overloaded server).
Anyway, figured I'd share that - unfortunately, Ruby was not the issue. Back
to poking around and testing.