I am also now fairly certain that this issue (ticket #11140) is tied directly
to the 3 systems in ticket #11143 that can't retrieve catalogs. I believe
each of their catalog requests leaves a puppetmaster process hung. At 3 hung
processes every 30 minutes (6 per hour), all 20 puppetmasters are hung in
just over 3 hours (20 / 6 is about 3.3 hours), and the global queue grows
out of control.
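
The estimate above can be checked with a back-of-the-envelope sketch. It assumes a steady rate of 3 new hangs per 30-minute agent cycle and that a hung worker is never recycled; the function name is hypothetical.

```python
def minutes_until_pool_exhausted(pool_size: int, hangs_per_cycle: int, cycle_minutes: int) -> float:
    """Estimate how long until every Passenger worker is hung.

    Assumes `hangs_per_cycle` new hangs every `cycle_minutes` at a constant rate,
    and that hung workers never finish a request, so nothing recycles them.
    """
    return pool_size * cycle_minutes / hangs_per_cycle


minutes = minutes_until_pool_exhausted(pool_size=20, hangs_per_cycle=3, cycle_minutes=30)
print(f"{minutes:.0f} minutes (~{minutes / 60:.1f} hours)")  # 200 minutes (~3.3 hours)
```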

I would deeply appreciate some information on how to diagnose the catalog 
failures and related puppetmaster hangs.

On Dec 2, 2011, at 3:09 PM, Jo Rhett wrote:
> Hm, you know, I don't think it's a sudden lockup of all 20 Passenger
> workers.  I think puppet sessions lock up one at a time until all 20 are
> locked.  Here's an example: every "active" session below with an uptime
> longer than 30 minutes has had the same "Processed" count for more than 30
> minutes now.  In theory, then, each has been processing the same request
> for more than 30 minutes.  Somehow, I don't think so.  I think those
> sessions are locked up, and what eventually happens is that all 20
> processes are hung and we are dead in the water.
> 
> Fri Dec  2 23:05:59 UTC 2011
> ----------- General information -----------
> max      = 20
> count    = 18
> active   = 12
> inactive = 6
> Waiting on global queue: 0
> 
> ----------- Domains -----------
> /etc/puppet/rack: 
>   PID: 21021   Sessions: 0    Processed: 362     Uptime: 5m 37s
>   PID: 21005   Sessions: 0    Processed: 537     Uptime: 5m 38s
>   PID: 21555   Sessions: 0    Processed: 69      Uptime: 30s
>   PID: 21571   Sessions: 0    Processed: 62      Uptime: 29s
>   PID: 20989   Sessions: 0    Processed: 209     Uptime: 5m 39s
>   PID: 20968   Sessions: 0    Processed: 157     Uptime: 5m 41s
>   PID: 9221    Sessions: 1    Processed: 903     Uptime: 2h 5m 55s
>   PID: 9340    Sessions: 1    Processed: 764     Uptime: 2h 4m 58s
>   PID: 10379   Sessions: 1    Processed: 568     Uptime: 1h 57m 37s
>   PID: 11847   Sessions: 1    Processed: 712     Uptime: 1h 41m 13s
>   PID: 11686   Sessions: 1    Processed: 314     Uptime: 1h 41m 19s
>   PID: 10845   Sessions: 1    Processed: 511     Uptime: 1h 48m 52s
>   PID: 11650   Sessions: 1    Processed: 747     Uptime: 1h 41m 21s
>   PID: 14967   Sessions: 1    Processed: 84      Uptime: 1h 8m 28s
>   PID: 17605   Sessions: 1    Processed: 497     Uptime: 44m 41s
>   PID: 20342   Sessions: 1    Processed: 0       Uptime: 13m 14s
>   PID: 20358   Sessions: 1    Processed: 54      Uptime: 13m 13s
>   PID: 18098   Sessions: 1    Processed: 854     Uptime: 35m 46s
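
The heuristic above (a worker with an open session whose Processed count has not moved in 30 minutes is probably hung) can be checked mechanically. A minimal Python sketch, assuming two raw passenger-status snapshots saved as text some time apart; the function names are hypothetical and the regex assumes the worker-line layout shown above.

```python
import re

# Matches worker lines such as
#   "PID: 9221    Sessions: 1    Processed: 903     Uptime: 2h 5m 55s"
WORKER_RE = re.compile(r"PID:\s*(\d+)\s+Sessions:\s*(\d+)\s+Processed:\s*(\d+)")


def parse_workers(status_text: str) -> dict[int, tuple[int, int]]:
    """Map each worker PID to (sessions, processed) from passenger-status output."""
    workers = {}
    for pid, sessions, processed in WORKER_RE.findall(status_text):
        workers[int(pid)] = (int(sessions), int(processed))
    return workers


def stuck_workers(earlier: str, later: str) -> list[int]:
    """PIDs that were busy in both snapshots without completing a single request."""
    before = parse_workers(earlier)
    after = parse_workers(later)
    stuck = []
    for pid, (sessions, processed) in after.items():
        if pid not in before:
            continue  # worker was spawned between the two snapshots
        old_sessions, old_processed = before[pid]
        if sessions > 0 and old_sessions > 0 and processed == old_processed:
            stuck.append(pid)
    return sorted(stuck)
```

For example, save passenger-status output to two files 30 minutes apart and pass their contents to `stuck_workers`; any PID it returns held an open session the whole time without finishing a request.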
> 
> On Dec 2, 2011, at 2:22 PM, Jo Rhett wrote:
> 
>> On Dec 2, 2011, at 1:30 PM, Nigel Kersten wrote:
>>> On Fri, Dec 2, 2011 at 1:03 PM, Jo Rhett <[email protected]> wrote:
>>> Okay, this has happened again.  Puppet master stopped logging catalog
>>> compiles, every server stopped returning results, and the global queue
>>> shot through the roof in about 9 minutes.  It appears puppet master is
>>> stopping dead in its tracks without logging any errors.
>>> 
>>> A really quick test would be to start a WEBrick puppetmaster on an
>>> alternate port with the same configuration file in debug mode, and then
>>> run puppet agent against it to see if there's a problem at that level.
>>>
>>> (on the master; port 9140 is just an example)
>>> puppet master --no-daemonize --verbose --debug --masterport 9140
>>>
>>> (on an agent)
>>> puppet agent --test --masterport 9140
>> 
>> This works perfectly fine.
>> 
>>> If that doesn't show anything, let us know whether you're running Apache 
>>> prefork or worker, and your relevant pool regulation settings like:
>>> 
>>> StartServers
>>> MinSpareServers
>>> MaxSpareServers
>>> ServerLimit
>>> MaxClients
>>> MaxRequestsPerChild
>> 
>> Prefork, with the following settings:
>> 
>> StartServers       8
>> MinSpareServers    5
>> MaxSpareServers   20
>> ServerLimit      256
>> MaxClients       256
>> MaxRequestsPerChild  4000
>> 
>>> # passenger-status
>>> ----------- General information -----------
>>> max      = 20
>>> count    = 20
>>> active   = 20
>>> inactive = 0
>>> Waiting on global queue: 209
>>> 
>>> ----------- Domains -----------
>>> /etc/puppet/rack: 
>>>   PID: 25783   Sessions: 1    Processed: 329     Uptime: 2h 52m 7s
>>>   PID: 25831   Sessions: 1    Processed: 4       Uptime: 2h 52m 5s
>>>   PID: 28517   Sessions: 1    Processed: 6       Uptime: 2h 22m 0s
>>>   PID: 25802   Sessions: 1    Processed: 714     Uptime: 2h 52m 6s
>>>   PID: 30905   Sessions: 1    Processed: 13      Uptime: 1h 50m 27s
>>>   PID: 25864   Sessions: 1    Processed: 709     Uptime: 2h 52m 4s
>>>   PID: 31028   Sessions: 1    Processed: 347     Uptime: 1h 50m 21s
>>>   PID: 28944   Sessions: 1    Processed: 377     Uptime: 2h 21m 50s
>>>   PID: 31090   Sessions: 1    Processed: 266     Uptime: 1h 50m 18s
>>>   PID: 577     Sessions: 1    Processed: 400     Uptime: 1h 27m 27s
>>>   PID: 418     Sessions: 1    Processed: 647     Uptime: 1h 28m 2s
>>>   PID: 1247    Sessions: 1    Processed: 133     Uptime: 1h 19m 3s
>>>   PID: 1474    Sessions: 1    Processed: 52      Uptime: 1h 18m 9s
>>>   PID: 594     Sessions: 1    Processed: 378     Uptime: 1h 27m 26s
>>>   PID: 4706    Sessions: 1    Processed: 414     Uptime: 48m 5s
>>>   PID: 4775    Sessions: 1    Processed: 218     Uptime: 47m 28s
>>>   PID: 4854    Sessions: 1    Processed: 584     Uptime: 47m 23s
>>>   PID: 7774    Sessions: 1    Processed: 165     Uptime: 14m 27s
>>>   PID: 7902    Sessions: 1    Processed: 44      Uptime: 13m 44s
>>>   PID: 8149    Sessions: 1    Processed: 541     Uptime: 11m 21s
>>> 
>>> 
>>> On Dec 2, 2011, at 10:58 AM, Jo Rhett wrote:
>>>> I came in this morning to find all the servers locked up solid:
>>>> 
>>>> # passenger-status
>>>> ----------- General information -----------
>>>> max      = 20
>>>> count    = 20
>>>> active   = 20
>>>> inactive = 0
>>>> Waiting on global queue: 236
>>>> 
>>>> ----------- Domains -----------
>>>> /etc/puppet/rack: 
>>>>  PID: 2720    Sessions: 1    Processed: 939     Uptime: 9h 22m 18s
>>>>  PID: 1615    Sessions: 1    Processed: 947     Uptime: 9h 23m 14s
>>>>  PID: 1596    Sessions: 1    Processed: 607     Uptime: 9h 23m 15s
>>>>  PID: 1722    Sessions: 1    Processed: 953     Uptime: 9h 23m 9s
>>>>  PID: 2218    Sessions: 1    Processed: 378     Uptime: 9h 22m 43s
>>>>  PID: 4286    Sessions: 1    Processed: 178     Uptime: 8h 50m 58s
>>>>  PID: 5749    Sessions: 1    Processed: 708     Uptime: 8h 20m 20s
>>>>  PID: 4253    Sessions: 1    Processed: 820     Uptime: 8h 51m 1s
>>>>  PID: 5624    Sessions: 1    Processed: 126     Uptime: 8h 20m 24s
>>>>  PID: 7328    Sessions: 1    Processed: 811     Uptime: 7h 49m 17s
>>>>  PID: 7274    Sessions: 1    Processed: 984     Uptime: 7h 49m 20s
>>>>  PID: 8761    Sessions: 1    Processed: 85      Uptime: 7h 18m 50s
>>>>  PID: 9135    Sessions: 1    Processed: 907     Uptime: 7h 16m 27s
>>>>  PID: 8777    Sessions: 1    Processed: 342     Uptime: 7h 18m 49s
>>>>  PID: 10508   Sessions: 1    Processed: 51      Uptime: 6h 47m 6s
>>>>  PID: 10853   Sessions: 1    Processed: 603     Uptime: 6h 43m 9s
>>>>  PID: 10620   Sessions: 1    Processed: 939     Uptime: 6h 45m 52s
>>>>  PID: 11438   Sessions: 1    Processed: 870     Uptime: 6h 30m 8s
>>>>  PID: 12582   Sessions: 1    Processed: 448     Uptime: 6h 9m 59s
>>>>  PID: 12670   Sessions: 1    Processed: 400     Uptime: 6h 8m 46s
>>>> 
>>>> For comparison, our server processes normally recycle within 20 minutes,
>>>> because they reach the 1000-request limit (PassengerMaxRequests) quickly.
>>>> 
>>>> # you probably want to tune these settings
>>>> PassengerHighPerformance on
>>>> PassengerUseGlobalQueue on
>>>> PassengerMaxPoolSize 20
>>>> PassengerPoolIdleTime 1800
>>>> PassengerMaxRequests 1000
>>>> #PassengerStatThrottleRate 120
>>>> RackAutoDetect Off
>>>> RailsAutoDetect Off
>>>> 
>>>> There is nothing useful in the system logs.  They just stopped:
>>>> 
>>>> Dec  2 12:06:34 axxats003 puppet-master[12670]: Compiled catalog for
>>>> axxamx001.sjc.company.com in environment production in 1.76 seconds
>>>> Dec  2 12:06:37 axxats003 puppet-master[12670]: Compiled catalog for
>>>> axxatn016.sjc.company.com in environment production in 1.64 seconds
>>>> Dec  2 12:06:40 axxats003 puppet-master[12670]: Compiled catalog for
>>>> axaafc001.company.com in environment production in 1.70 seconds
>>>> Dec  2 14:10:02 axxats003 puppet-agent[16965]: Reopening log files
>>>> Dec  2 14:10:02 axxats003 puppet-agent[16965]: Starting Puppet client 
>>>> version 2.6.12
>>>> Dec  2 14:12:04 axxats003 puppet-agent[16965]: Could not retrieve catalog 
>>>> from remote server: execution expired
>>>> Dec  2 14:12:04 axxats003 puppet-agent[16965]: Using cached catalog
>>>> 
>>>> (every 30 minutes puppet agent says the same thing until I restart the 
>>>> puppet master)
>>>> 
>>>> Dec  2 18:06:09 axxats003 puppet-master[25783]: Starting Puppet master 
>>>> version 2.6.12
>>>> Dec  2 18:06:10 axxats003 puppet-master[25802]: Starting Puppet master 
>>>> version 2.6.12
>>>> Dec  2 18:06:11 axxats003 puppet-master[25831]: Starting Puppet master 
>>>> version 2.6.12
>>>> Dec  2 18:06:12 axxats003 puppet-master[25864]: Starting Puppet master 
>>>> version 2.6.12
>>>> Dec  2 18:06:13 axxats003 puppet-master[25897]: Starting Puppet master 
>>>> version 2.6.12
>>>> Dec  2 18:06:14 axxats003 puppet-master[25922]: Starting Puppet master 
>>>> version 2.6.12
>>>> Dec  2 18:06:15 axxats003 puppet-master[25947]: Starting Puppet master 
>>>> version 2.6.12
>>>> Dec  2 18:06:16 axxats003 puppet-master[25972]: Starting Puppet master 
>>>> version 2.6.12
>>>> Dec  2 18:06:17 axxats003 puppet-master[25997]: Starting Puppet master 
>>>> version 2.6.12
>>>> Dec  2 18:06:18 axxats003 puppet-master[26019]: Starting Puppet master 
>>>> version 2.6.12
>>>> Dec  2 18:06:19 axxats003 puppet-master[26056]: Starting Puppet master 
>>>> version 2.6.12
>>>> Dec  2 18:06:20 axxats003 puppet-master[26081]: Starting Puppet master 
>>>> version 2.6.12
>>>> Dec  2 18:06:21 axxats003 puppet-master[26115]: Starting Puppet master 
>>>> version 2.6.12
>>>> Dec  2 18:14:32 axxats003 puppet-master[26115]: Compiled catalog for 
>>>> axxatn018.sjc.company.com in environment production in 3.63 seconds
>>>> Dec  2 18:14:37 axxats003 puppet-master[26115]: Compiled catalog for 
>>>> axxamb002.sjc.company.com in environment production in 1.47 seconds
>>>> Dec  2 18:14:50 axxats003 puppet-master[26115]: Compiled catalog for 
>>>> axxasn001.sjc.company.com in environment production in 1.57 seconds
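
Since the three systems from ticket #11143 should never show a successful compile, the master's syslog can be checked for them directly. A minimal Python sketch, assuming one unwrapped syslog line per entry in the format shown above; the regex and function names are hypothetical. It also records each puppet-master PID's last successful compile, which shows roughly when a worker went quiet (it does not identify the request that hung).

```python
import re
from collections.abc import Iterable

# One syslog entry such as:
# "Dec  2 12:06:34 axxats003 puppet-master[12670]: Compiled catalog for node in ..."
COMPILE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+puppet-master\[(\d+)\]:\s+"
    r"Compiled catalog for\s+(\S+)"
)


def last_compile_by_pid(log_lines: Iterable[str]) -> dict[int, tuple[str, str]]:
    """Map each puppet-master PID to (timestamp, node) of its latest compile."""
    last = {}
    for line in log_lines:
        match = COMPILE_RE.match(line)
        if match:
            timestamp, pid, node = match.groups()
            last[int(pid)] = (timestamp, node)  # later lines overwrite earlier ones
    return last


def nodes_never_compiled(log_lines: Iterable[str], expected_nodes: Iterable[str]) -> list[str]:
    """Expected nodes that have no 'Compiled catalog' entry in the log at all."""
    compiled = set()
    for line in log_lines:
        match = COMPILE_RE.match(line)
        if match:
            compiled.add(match.group(3))
    return sorted(node for node in expected_nodes if node not in compiled)
```

For example, read /var/log/messages into a list of lines and pass it to `nodes_never_compiled` along with the hostnames from ticket #11143 (they are not listed in this thread).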
>>>> 
>>>> There are no other messages in /var/log/messages; the system was
>>>> otherwise not busy.  The Apache error log shows only that MaxClients was hit:
>>>> [Fri Dec 02 08:42:43 2011] [notice] Apache/2.2.3 (CentOS) configured -- 
>>>> resuming normal operations
>>>> [Fri Dec 02 12:23:46 2011] [error] server reached MaxClients setting, 
>>>> consider raising the MaxClients setting
>>>> [Fri Dec 02 18:06:07 2011] [notice] caught SIGTERM, shutting down
>>>> [Fri Dec 02 18:06:08 2011] [notice] suEXEC mechanism enabled (wrapper: 
>>>> /usr/sbin/suexec)
>>>> [Fri Dec 02 18:06:08 2011] [warn] RSA server certificate CommonName (CN) 
>>>> `puppetmaster.company.com' does NOT match server name!?
>>>> [Fri Dec 02 18:06:08 2011] [notice] Digest: generating secret for digest 
>>>> authentication ...
>>>> [Fri Dec 02 18:06:08 2011] [notice] Digest: done
>>>> [Fri Dec 02 18:06:08 2011] [warn] RSA server certificate CommonName (CN) 
>>>> `puppetmaster.company.com' does NOT match server name!?
>>>> [Fri Dec 02 18:06:08 2011] [notice] Apache/2.2.3 (CentOS) configured -- 
>>>> resuming normal operations
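
One reading of the numbers in this thread: in the morning snapshot, 20 active workers plus 236 requests waiting on the global queue add up to 256, the MaxClients value given in the prefork settings, which is consistent with the "server reached MaxClients" error. Under prefork each waiting request presumably holds an Apache child, so the sum is a rough gauge of how close Apache is to its limit. A minimal sketch, assuming raw (unquoted) passenger-status output; the function names are hypothetical.

```python
import re


def request_backlog(status_text: str) -> int:
    """Requests being served plus requests waiting on Passenger's global queue."""
    active = re.search(r"^[ \t]*active[ \t]*=[ \t]*(\d+)", status_text, re.MULTILINE)
    queued = re.search(r"Waiting on global queue:\s*(\d+)", status_text)
    if active is None or queued is None:
        raise ValueError("unrecognized passenger-status output")
    return int(active.group(1)) + int(queued.group(1))


def apache_saturated(status_text: str, max_clients: int) -> bool:
    """True if the backlog alone would occupy every Apache child (prefork)."""
    return request_backlog(status_text) >= max_clients
```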
>>>> 
>>>> 
>>>> -- 
>>>> Jo Rhett
>>>> [email protected]
>>>> (415) 999-1798
>>>> 
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "Puppet Users" group.
>>> To post to this group, send email to [email protected].
>>> To unsubscribe from this group, send email to 
>>> [email protected].
>>> For more options, visit this group at 
>>> http://groups.google.com/group/puppet-users?hl=en.
>>> 
>>> 
>>> 
>>> -- 
>>> Nigel Kersten
>>> Product Manager, Puppet Labs
>> 

-- 
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source and other 
randomness
