Hm, you know, I don't think it's a sudden lockup of all 20 passenger clients.
I think it's a slow lockup of individual puppet sessions until all 20 are locked.
Here's an example: every one of the "active" sessions below with an uptime
longer than 30 minutes has shown the same "Processed" number for more than 30
minutes at this point.  So in theory, each has been processing the same session
for more than 30 minutes.  Somehow, I don't think so.  I think those sessions
are locked up.  What is happening is that eventually all 20 processes are
hung and we are dead in the water.

Fri Dec  2 23:05:59 UTC 2011
----------- General information -----------
max      = 20
count    = 18
active   = 12
inactive = 6
Waiting on global queue: 0

----------- Domains -----------
/etc/puppet/rack: 
  PID: 21021   Sessions: 0    Processed: 362     Uptime: 5m 37s
  PID: 21005   Sessions: 0    Processed: 537     Uptime: 5m 38s
  PID: 21555   Sessions: 0    Processed: 69      Uptime: 30s
  PID: 21571   Sessions: 0    Processed: 62      Uptime: 29s
  PID: 20989   Sessions: 0    Processed: 209     Uptime: 5m 39s
  PID: 20968   Sessions: 0    Processed: 157     Uptime: 5m 41s
  PID: 9221    Sessions: 1    Processed: 903     Uptime: 2h 5m 55s
  PID: 9340    Sessions: 1    Processed: 764     Uptime: 2h 4m 58s
  PID: 10379   Sessions: 1    Processed: 568     Uptime: 1h 57m 37s
  PID: 11847   Sessions: 1    Processed: 712     Uptime: 1h 41m 13s
  PID: 11686   Sessions: 1    Processed: 314     Uptime: 1h 41m 19s
  PID: 10845   Sessions: 1    Processed: 511     Uptime: 1h 48m 52s
  PID: 11650   Sessions: 1    Processed: 747     Uptime: 1h 41m 21s
  PID: 14967   Sessions: 1    Processed: 84      Uptime: 1h 8m 28s
  PID: 17605   Sessions: 1    Processed: 497     Uptime: 44m 41s
  PID: 20342   Sessions: 1    Processed: 0       Uptime: 13m 14s
  PID: 20358   Sessions: 1    Processed: 54      Uptime: 13m 13s
  PID: 18098   Sessions: 1    Processed: 854     Uptime: 35m 46s
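
A rough way to check this instead of eyeballing it is to save the passenger-status output twice, some minutes apart, and compare the two. The Python sketch below assumes the per-process line format shown above. The names parse_status and stuck_pids are made up for illustration and are not part of Passenger.

```python
import re

# One process line of passenger-status output, in the format shown above.
LINE_RE = re.compile(r"PID:\s*(\d+)\s+Sessions:\s*(\d+)\s+Processed:\s*(\d+)")


def parse_status(text):
    """Return {pid: (sessions, processed)} for every process line in text."""
    procs = {}
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            pid, sessions, processed = (int(g) for g in m.groups())
            procs[pid] = (sessions, processed)
    return procs


def stuck_pids(earlier, later):
    """PIDs that hold a session in both snapshots with an unchanged Processed count."""
    old, new = parse_status(earlier), parse_status(later)
    return sorted(
        pid
        for pid, (sessions, processed) in new.items()
        if pid in old
        and sessions >= 1
        and old[pid][0] >= 1
        and old[pid][1] == processed
    )
```

Passing the contents of the two saved outputs to stuck_pids returns the PIDs that were busy both times without finishing a request in between. That is only a hint of a hang, not proof, since one very long request would look the same.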

On Dec 2, 2011, at 2:22 PM, Jo Rhett wrote:

> On Dec 2, 2011, at 1:30 PM, Nigel Kersten wrote:
>> On Fri, Dec 2, 2011 at 1:03 PM, Jo Rhett <[email protected]> wrote:
>> Okay, this has happened again.  Puppet master stopped logging catalog 
>> compiles, every server stopped returning results and the global queue went 
>> quickly through the roof in like 9 minutes.  It appears puppet master is 
>> stopping dead in its tracks without logging any errors.
>> 
>> A really quick test would be to start a webrick puppetmaster on an alternate 
>> port with the same configuration file in debug mode and then run an agent 
>> against it to see if there's a problem at that level:
>> 
>> (on master)
>> puppet master --no-daemonize --verbose --debug --masterport 9140
>> (9140 is just an example port)
>> 
>> (on an agent)
>> puppet agent --test --masterport 9140
> 
> This works perfectly fine.
> 
>> If that doesn't show anything, let us know whether you're running Apache 
>> prefork or worker, and your relevant pool regulation settings like:
>> 
>> StartServers
>> MinSpareServers
>> MaxSpareServers
>> ServerLimit
>> MaxClients
>> MaxRequestsPerChild
> 
> prefork, with the following settings:
> 
> StartServers       8
> MinSpareServers    5
> MaxSpareServers   20
> ServerLimit      256
> MaxClients       256
> MaxRequestsPerChild  4000
> 
>> # passenger-status
>> ----------- General information -----------
>> max      = 20
>> count    = 20
>> active   = 20
>> inactive = 0
>> Waiting on global queue: 209
>> 
>> ----------- Domains -----------
>> /etc/puppet/rack: 
>>   PID: 25783   Sessions: 1    Processed: 329     Uptime: 2h 52m 7s
>>   PID: 25831   Sessions: 1    Processed: 4       Uptime: 2h 52m 5s
>>   PID: 28517   Sessions: 1    Processed: 6       Uptime: 2h 22m 0s
>>   PID: 25802   Sessions: 1    Processed: 714     Uptime: 2h 52m 6s
>>   PID: 30905   Sessions: 1    Processed: 13      Uptime: 1h 50m 27s
>>   PID: 25864   Sessions: 1    Processed: 709     Uptime: 2h 52m 4s
>>   PID: 31028   Sessions: 1    Processed: 347     Uptime: 1h 50m 21s
>>   PID: 28944   Sessions: 1    Processed: 377     Uptime: 2h 21m 50s
>>   PID: 31090   Sessions: 1    Processed: 266     Uptime: 1h 50m 18s
>>   PID: 577     Sessions: 1    Processed: 400     Uptime: 1h 27m 27s
>>   PID: 418     Sessions: 1    Processed: 647     Uptime: 1h 28m 2s
>>   PID: 1247    Sessions: 1    Processed: 133     Uptime: 1h 19m 3s
>>   PID: 1474    Sessions: 1    Processed: 52      Uptime: 1h 18m 9s
>>   PID: 594     Sessions: 1    Processed: 378     Uptime: 1h 27m 26s
>>   PID: 4706    Sessions: 1    Processed: 414     Uptime: 48m 5s
>>   PID: 4775    Sessions: 1    Processed: 218     Uptime: 47m 28s
>>   PID: 4854    Sessions: 1    Processed: 584     Uptime: 47m 23s
>>   PID: 7774    Sessions: 1    Processed: 165     Uptime: 14m 27s
>>   PID: 7902    Sessions: 1    Processed: 44      Uptime: 13m 44s
>>   PID: 8149    Sessions: 1    Processed: 541     Uptime: 11m 21s
>> 
>> 
>> On Dec 2, 2011, at 10:58 AM, Jo Rhett wrote:
>>> I came in this morning to find all the servers locked up solid:
>>> 
>>> # passenger-status
>>> ----------- General information -----------
>>> max      = 20
>>> count    = 20
>>> active   = 20
>>> inactive = 0
>>> Waiting on global queue: 236
>>> 
>>> ----------- Domains -----------
>>> /etc/puppet/rack: 
>>>  PID: 2720    Sessions: 1    Processed: 939     Uptime: 9h 22m 18s
>>>  PID: 1615    Sessions: 1    Processed: 947     Uptime: 9h 23m 14s
>>>  PID: 1596    Sessions: 1    Processed: 607     Uptime: 9h 23m 15s
>>>  PID: 1722    Sessions: 1    Processed: 953     Uptime: 9h 23m 9s
>>>  PID: 2218    Sessions: 1    Processed: 378     Uptime: 9h 22m 43s
>>>  PID: 4286    Sessions: 1    Processed: 178     Uptime: 8h 50m 58s
>>>  PID: 5749    Sessions: 1    Processed: 708     Uptime: 8h 20m 20s
>>>  PID: 4253    Sessions: 1    Processed: 820     Uptime: 8h 51m 1s
>>>  PID: 5624    Sessions: 1    Processed: 126     Uptime: 8h 20m 24s
>>>  PID: 7328    Sessions: 1    Processed: 811     Uptime: 7h 49m 17s
>>>  PID: 7274    Sessions: 1    Processed: 984     Uptime: 7h 49m 20s
>>>  PID: 8761    Sessions: 1    Processed: 85      Uptime: 7h 18m 50s
>>>  PID: 9135    Sessions: 1    Processed: 907     Uptime: 7h 16m 27s
>>>  PID: 8777    Sessions: 1    Processed: 342     Uptime: 7h 18m 49s
>>>  PID: 10508   Sessions: 1    Processed: 51      Uptime: 6h 47m 6s
>>>  PID: 10853   Sessions: 1    Processed: 603     Uptime: 6h 43m 9s
>>>  PID: 10620   Sessions: 1    Processed: 939     Uptime: 6h 45m 52s
>>>  PID: 11438   Sessions: 1    Processed: 870     Uptime: 6h 30m 8s
>>>  PID: 12582   Sessions: 1    Processed: 448     Uptime: 6h 9m 59s
>>>  PID: 12670   Sessions: 1    Processed: 400     Uptime: 6h 8m 46s
>>> 
>>> For comparison, most of our server processes normally recycle within 20 
>>> minutes, as they hit the PassengerMaxRequests limit of 1000 really fast.
>>> 
>>> # you probably want to tune these settings
>>> PassengerHighPerformance on
>>> PassengerUseGlobalQueue on
>>> PassengerMaxPoolSize 20
>>> PassengerPoolIdleTime 1800
>>> PassengerMaxRequests 1000
>>> #PassengerStatThrottleRate 120
>>> RackAutoDetect Off
>>> RailsAutoDetect Off
>>> 
>>> There is nothing useful in the system logs.  They just stopped:
>>> 
>>> Dec  2 12:06:34 axxats003 puppet-master[12670]: Compiled catalog for 
>>> axxamx001.sjc.company.com in environment production in 1.76 seconds
>>> Dec  2 12:06:37 axxats003 puppet-master[12670]: Compiled catalog for 
>>> axxatn016.sjc.company.com in environment production in 1.64 seconds
>>> Dec  2 12:06:40 axxats003 puppet-master[12670]: Compiled catalog for 
>>> axaafc001.company.com in environment production in 1.70 seconds
>>> Dec  2 14:10:02 axxats003 puppet-agent[16965]: Reopening log files
>>> Dec  2 14:10:02 axxats003 puppet-agent[16965]: Starting Puppet client 
>>> version 2.6.12
>>> Dec  2 14:12:04 axxats003 puppet-agent[16965]: Could not retrieve catalog 
>>> from remote server: execution expired
>>> Dec  2 14:12:04 axxats003 puppet-agent[16965]: Using cached catalog
>>> 
>>> (every 30 minutes puppet agent says the same thing until I restart the 
>>> puppet master)
>>> 
>>> Dec  2 18:06:09 axxats003 puppet-master[25783]: Starting Puppet master 
>>> version 2.6.12
>>> Dec  2 18:06:10 axxats003 puppet-master[25802]: Starting Puppet master 
>>> version 2.6.12
>>> Dec  2 18:06:11 axxats003 puppet-master[25831]: Starting Puppet master 
>>> version 2.6.12
>>> Dec  2 18:06:12 axxats003 puppet-master[25864]: Starting Puppet master 
>>> version 2.6.12
>>> Dec  2 18:06:13 axxats003 puppet-master[25897]: Starting Puppet master 
>>> version 2.6.12
>>> Dec  2 18:06:14 axxats003 puppet-master[25922]: Starting Puppet master 
>>> version 2.6.12
>>> Dec  2 18:06:15 axxats003 puppet-master[25947]: Starting Puppet master 
>>> version 2.6.12
>>> Dec  2 18:06:16 axxats003 puppet-master[25972]: Starting Puppet master 
>>> version 2.6.12
>>> Dec  2 18:06:17 axxats003 puppet-master[25997]: Starting Puppet master 
>>> version 2.6.12
>>> Dec  2 18:06:18 axxats003 puppet-master[26019]: Starting Puppet master 
>>> version 2.6.12
>>> Dec  2 18:06:19 axxats003 puppet-master[26056]: Starting Puppet master 
>>> version 2.6.12
>>> Dec  2 18:06:20 axxats003 puppet-master[26081]: Starting Puppet master 
>>> version 2.6.12
>>> Dec  2 18:06:21 axxats003 puppet-master[26115]: Starting Puppet master 
>>> version 2.6.12
>>> Dec  2 18:14:32 axxats003 puppet-master[26115]: Compiled catalog for 
>>> axxatn018.sjc.company.com in environment production in 3.63 seconds
>>> Dec  2 18:14:37 axxats003 puppet-master[26115]: Compiled catalog for 
>>> axxamb002.sjc.company.com in environment production in 1.47 seconds
>>> Dec  2 18:14:50 axxats003 puppet-master[26115]: Compiled catalog for 
>>> axxasn001.sjc.company.com in environment production in 1.57 seconds
>>> 
>>> There are no other messages in /var/log/messages -- the system was 
>>> otherwise not busy.  The Apache error log shows only MaxClients being hit:
>>> [Fri Dec 02 08:42:43 2011] [notice] Apache/2.2.3 (CentOS) configured -- 
>>> resuming normal operations
>>> [Fri Dec 02 12:23:46 2011] [error] server reached MaxClients setting, 
>>> consider raising the MaxClients setting
>>> [Fri Dec 02 18:06:07 2011] [notice] caught SIGTERM, shutting down
>>> [Fri Dec 02 18:06:08 2011] [notice] suEXEC mechanism enabled (wrapper: 
>>> /usr/sbin/suexec)
>>> [Fri Dec 02 18:06:08 2011] [warn] RSA server certificate CommonName (CN) 
>>> `puppetmaster.company.com' does NOT match server name!?
>>> [Fri Dec 02 18:06:08 2011] [notice] Digest: generating secret for digest 
>>> authentication ...
>>> [Fri Dec 02 18:06:08 2011] [notice] Digest: done
>>> [Fri Dec 02 18:06:08 2011] [warn] RSA server certificate CommonName (CN) 
>>> `puppetmaster.company.com' does NOT match server name!?
>>> [Fri Dec 02 18:06:08 2011] [notice] Apache/2.2.3 (CentOS) configured -- 
>>> resuming normal operations
>>> 
>>> 
>>> -- 
>>> Jo Rhett
>>> [email protected]
>>> (415) 999-1798
>>> 
>>> -- 
>>> Jo Rhett
>>> Net Consonance : consonant endings by net philanthropy, open source and 
>>> other randomness
>>> 
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Puppet Users" group.
>> To post to this group, send email to [email protected].
>> To unsubscribe from this group, send email to 
>> [email protected].
>> For more options, visit this group at 
>> http://groups.google.com/group/puppet-users?hl=en.
>> 
>> 
>> 
>> -- 
>> Nigel Kersten
>> Product Manager, Puppet Labs
>> 
>> 
>> 
> 

-- 
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source and other 
randomness

