Hm, you know, I don't think it's a sudden lockup of all 20 Passenger processes. I think it's a slow lockup of individual puppet sessions until all 20 are locked. Here's an example: every one of the "active" sessions below with an uptime longer than 30 minutes has shown the same "Processed" number for more than 30 minutes. So in theory, each of them has been processing the same request for more than 30 minutes. Somehow, I don't think so. I think those sessions are locked up, and eventually all 20 processes hang and we are dead in the water.
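Rather than eyeballing the counters, the same check could be scripted. This is only a minimal Python sketch, assuming two saved passenger-status outputs taken roughly 30 minutes apart. The function names are made up for illustration, and the regex is based solely on the output format shown in this thread, so it may need adjusting for other Passenger versions.

```python
import re

# Matches one process line of passenger-status output as it appears in this
# thread (assumed format; other Passenger versions may print it differently).
PROC_RE = re.compile(
    r"PID:\s*(\d+)\s+Sessions:\s*(\d+)\s+Processed:\s*(\d+)"
    r"\s+Uptime:\s*((?:\d+[hms]\s*)+)"
)


def uptime_seconds(text):
    """Convert an uptime such as '2h 5m 55s' into seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)([hms])", text))


def parse_status(text):
    """Return {pid: (sessions, processed, uptime_seconds)} for one snapshot."""
    return {
        int(pid): (int(sessions), int(processed), uptime_seconds(uptime))
        for pid, sessions, processed, uptime in PROC_RE.findall(text)
    }


def stalled_pids(earlier, later):
    """PIDs that hold a session in both snapshots with an unchanged Processed count."""
    old, new = parse_status(earlier), parse_status(later)
    return sorted(
        pid
        for pid, (sessions, processed, _) in new.items()
        if pid in old
        and sessions >= 1
        and old[pid][0] >= 1
        and old[pid][1] == processed
    )
```

Feeding it the text of two snapshots (for example, read from two files written by cron) returns the PIDs that look hung. A process that recycled between snapshots gets a new PID, so it simply drops out of the comparison.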

Fri Dec 2 23:05:59 UTC 2011
----------- General information -----------
max = 20
count = 18
active = 12
inactive = 6
Waiting on global queue: 0

----------- Domains -----------
/etc/puppet/rack:
PID: 21021 Sessions: 0 Processed: 362 Uptime: 5m 37s
PID: 21005 Sessions: 0 Processed: 537 Uptime: 5m 38s
PID: 21555 Sessions: 0 Processed: 69 Uptime: 30s
PID: 21571 Sessions: 0 Processed: 62 Uptime: 29s
PID: 20989 Sessions: 0 Processed: 209 Uptime: 5m 39s
PID: 20968 Sessions: 0 Processed: 157 Uptime: 5m 41s
PID: 9221 Sessions: 1 Processed: 903 Uptime: 2h 5m 55s
PID: 9340 Sessions: 1 Processed: 764 Uptime: 2h 4m 58s
PID: 10379 Sessions: 1 Processed: 568 Uptime: 1h 57m 37s
PID: 11847 Sessions: 1 Processed: 712 Uptime: 1h 41m 13s
PID: 11686 Sessions: 1 Processed: 314 Uptime: 1h 41m 19s
PID: 10845 Sessions: 1 Processed: 511 Uptime: 1h 48m 52s
PID: 11650 Sessions: 1 Processed: 747 Uptime: 1h 41m 21s
PID: 14967 Sessions: 1 Processed: 84 Uptime: 1h 8m 28s
PID: 17605 Sessions: 1 Processed: 497 Uptime: 44m 41s
PID: 20342 Sessions: 1 Processed: 0 Uptime: 13m 14s
PID: 20358 Sessions: 1 Processed: 54 Uptime: 13m 13s
PID: 18098 Sessions: 1 Processed: 854 Uptime: 35m 46s

On Dec 2, 2011, at 2:22 PM, Jo Rhett wrote:
> On Dec 2, 2011, at 1:30 PM, Nigel Kersten wrote:
>> On Fri, Dec 2, 2011 at 1:03 PM, Jo Rhett <[email protected]> wrote:
>> Okay, this has happened again. Puppet master stopped logging catalog
>> compiles, every server stopped returning results and the global queue went
>> quickly through the roof in like 9 minutes. It appears puppet master is
>> stopping dead in its tracks without logging any errors.
>>
>> A really quick test would be to start a webrick puppetmaster on an alternate
>> port with the same configuration file in debug mode and then puppet against
>> it to see if there's a problem at that level,
>>
>> (on master)
>> puppet master --no-daemonize --verbose --debug --masterport 9140 (for
>> example)
>>
>> (on an agent)
>> puppet agent --test --masterport 9140
>
> This works perfectly fine.
>
>> If that doesn't show anything, let us know whether you're running Apache
>> prefork or worker, and your relevant pool regulation settings like:
>>
>> StartServers
>> MinSpareServers
>> MaxSpareServers
>> ServerLimit
>> MaxClients
>> MaxRequestsPerChild
>
> pre fork with the following settings:
>
> StartServers 8
> MinSpareServers 5
> MaxSpareServers 20
> ServerLimit 256
> MaxClients 256
> MaxRequestsPerChild 4000
>
>> # passenger-status
>> ----------- General information -----------
>> max = 20
>> count = 20
>> active = 20
>> inactive = 0
>> Waiting on global queue: 209
>>
>> ----------- Domains -----------
>> /etc/puppet/rack:
>> PID: 25783 Sessions: 1 Processed: 329 Uptime: 2h 52m 7s
>> PID: 25831 Sessions: 1 Processed: 4 Uptime: 2h 52m 5s
>> PID: 28517 Sessions: 1 Processed: 6 Uptime: 2h 22m 0s
>> PID: 25802 Sessions: 1 Processed: 714 Uptime: 2h 52m 6s
>> PID: 30905 Sessions: 1 Processed: 13 Uptime: 1h 50m 27s
>> PID: 25864 Sessions: 1 Processed: 709 Uptime: 2h 52m 4s
>> PID: 31028 Sessions: 1 Processed: 347 Uptime: 1h 50m 21s
>> PID: 28944 Sessions: 1 Processed: 377 Uptime: 2h 21m 50s
>> PID: 31090 Sessions: 1 Processed: 266 Uptime: 1h 50m 18s
>> PID: 577 Sessions: 1 Processed: 400 Uptime: 1h 27m 27s
>> PID: 418 Sessions: 1 Processed: 647 Uptime: 1h 28m 2s
>> PID: 1247 Sessions: 1 Processed: 133 Uptime: 1h 19m 3s
>> PID: 1474 Sessions: 1 Processed: 52 Uptime: 1h 18m 9s
>> PID: 594 Sessions: 1 Processed: 378 Uptime: 1h 27m 26s
>> PID: 4706 Sessions: 1 Processed: 414 Uptime: 48m 5s
>> PID: 4775 Sessions: 1 Processed: 218 Uptime: 47m 28s
>> PID: 4854 Sessions: 1 Processed: 584 Uptime: 47m 23s
>> PID: 7774 Sessions: 1 Processed: 165 Uptime: 14m 27s
>> PID: 7902 Sessions: 1 Processed: 44 Uptime: 13m 44s
>> PID: 8149 Sessions: 1 Processed: 541 Uptime: 11m 21s
>>
>>
>> On Dec 2, 2011, at 10:58 AM, Jo Rhett wrote:
>>> I came in this morning to find all the servers all locked up solid:
>>>
>>> # passenger-status
>>> ----------- General information -----------
>>> max = 20
>>> count = 20
>>> active = 20
>>> inactive = 0
>>> Waiting on global queue: 236
>>>
>>> ----------- Domains -----------
>>> /etc/puppet/rack:
>>> PID: 2720 Sessions: 1 Processed: 939 Uptime: 9h 22m 18s
>>> PID: 1615 Sessions: 1 Processed: 947 Uptime: 9h 23m 14s
>>> PID: 1596 Sessions: 1 Processed: 607 Uptime: 9h 23m 15s
>>> PID: 1722 Sessions: 1 Processed: 953 Uptime: 9h 23m 9s
>>> PID: 2218 Sessions: 1 Processed: 378 Uptime: 9h 22m 43s
>>> PID: 4286 Sessions: 1 Processed: 178 Uptime: 8h 50m 58s
>>> PID: 5749 Sessions: 1 Processed: 708 Uptime: 8h 20m 20s
>>> PID: 4253 Sessions: 1 Processed: 820 Uptime: 8h 51m 1s
>>> PID: 5624 Sessions: 1 Processed: 126 Uptime: 8h 20m 24s
>>> PID: 7328 Sessions: 1 Processed: 811 Uptime: 7h 49m 17s
>>> PID: 7274 Sessions: 1 Processed: 984 Uptime: 7h 49m 20s
>>> PID: 8761 Sessions: 1 Processed: 85 Uptime: 7h 18m 50s
>>> PID: 9135 Sessions: 1 Processed: 907 Uptime: 7h 16m 27s
>>> PID: 8777 Sessions: 1 Processed: 342 Uptime: 7h 18m 49s
>>> PID: 10508 Sessions: 1 Processed: 51 Uptime: 6h 47m 6s
>>> PID: 10853 Sessions: 1 Processed: 603 Uptime: 6h 43m 9s
>>> PID: 10620 Sessions: 1 Processed: 939 Uptime: 6h 45m 52s
>>> PID: 11438 Sessions: 1 Processed: 870 Uptime: 6h 30m 8s
>>> PID: 12582 Sessions: 1 Processed: 448 Uptime: 6h 9m 59s
>>> PID: 12670 Sessions: 1 Processed: 400 Uptime: 6h 8m 46s
>>>
>>> For comparison, most of our server processes recycle within 20 minutes
>>> normally, as they hit 1000 really fast.
>>>
>>> # you probably want to tune these settings
>>> PassengerHighPerformance on
>>> PassengerUseGlobalQueue on
>>> PassengerMaxPoolSize 20
>>> PassengerPoolIdleTime 1800
>>> PassengerMaxRequests 1000
>>> #PassengerStatThrottleRate 120
>>> RackAutoDetect Off
>>> RailsAutoDetect Off
>>>
>>> There is nothing useful in the system logs. They just stopped:
>>>
>>> Dec 2 12:06:34 axxats003 puppet-master[12670]: Compiled catalog for axxamx001.sjc.company.com in environment production in 1.76 seconds
>>> Dec 2 12:06:37 axxats003 puppet-master[12670]: Compiled catalog for axxatn016.sjc.company.com in environment production in 1.64 seconds
>>> Dec 2 12:06:40 axxats003 puppet-master[12670]: Compiled catalog for axaafc001.company.com in environment production in 1.70 seconds
>>> Dec 2 14:10:02 axxats003 puppet-agent[16965]: Reopening log files
>>> Dec 2 14:10:02 axxats003 puppet-agent[16965]: Starting Puppet client version 2.6.12
>>> Dec 2 14:12:04 axxats003 puppet-agent[16965]: Could not retrieve catalog from remote server: execution expired
>>> Dec 2 14:12:04 axxats003 puppet-agent[16965]: Using cached catalog
>>>
>>> (every 30 minutes puppet agent says the same thing until I restart the
>>> puppet master)
>>>
>>> Dec 2 18:06:09 axxats003 puppet-master[25783]: Starting Puppet master version 2.6.12
>>> Dec 2 18:06:10 axxats003 puppet-master[25802]: Starting Puppet master version 2.6.12
>>> Dec 2 18:06:11 axxats003 puppet-master[25831]: Starting Puppet master version 2.6.12
>>> Dec 2 18:06:12 axxats003 puppet-master[25864]: Starting Puppet master version 2.6.12
>>> Dec 2 18:06:13 axxats003 puppet-master[25897]: Starting Puppet master version 2.6.12
>>> Dec 2 18:06:14 axxats003 puppet-master[25922]: Starting Puppet master version 2.6.12
>>> Dec 2 18:06:15 axxats003 puppet-master[25947]: Starting Puppet master version 2.6.12
>>> Dec 2 18:06:16 axxats003 puppet-master[25972]: Starting Puppet master version 2.6.12
>>> Dec 2 18:06:17 axxats003 puppet-master[25997]: Starting Puppet master version 2.6.12
>>> Dec 2 18:06:18 axxats003 puppet-master[26019]: Starting Puppet master version 2.6.12
>>> Dec 2 18:06:19 axxats003 puppet-master[26056]: Starting Puppet master version 2.6.12
>>> Dec 2 18:06:20 axxats003 puppet-master[26081]: Starting Puppet master version 2.6.12
>>> Dec 2 18:06:21 axxats003 puppet-master[26115]: Starting Puppet master version 2.6.12
>>> Dec 2 18:14:32 axxats003 puppet-master[26115]: Compiled catalog for axxatn018.sjc.company.com in environment production in 3.63 seconds
>>> Dec 2 18:14:37 axxats003 puppet-master[26115]: Compiled catalog for axxamb002.sjc.company.com in environment production in 1.47 seconds
>>> Dec 2 18:14:50 axxats003 puppet-master[26115]: Compiled catalog for axxasn001.sjc.company.com in environment production in 1.57 seconds
>>>
>>> There are no other messages in /var/log/messages -- the system was
>>> otherwise not busy. Apache error log only observed max clients get hit:
>>> [Fri Dec 02 08:42:43 2011] [notice] Apache/2.2.3 (CentOS) configured -- resuming normal operations
>>> [Fri Dec 02 12:23:46 2011] [error] server reached MaxClients setting, consider raising the MaxClients setting
>>> [Fri Dec 02 18:06:07 2011] [notice] caught SIGTERM, shutting down
>>> [Fri Dec 02 18:06:08 2011] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
>>> [Fri Dec 02 18:06:08 2011] [warn] RSA server certificate CommonName (CN) `puppetmaster.company.com' does NOT match server name!?
>>> [Fri Dec 02 18:06:08 2011] [notice] Digest: generating secret for digest authentication ...
>>> [Fri Dec 02 18:06:08 2011] [notice] Digest: done
>>> [Fri Dec 02 18:06:08 2011] [warn] RSA server certificate CommonName (CN) `puppetmaster.company.com' does NOT match server name!?
>>> [Fri Dec 02 18:06:08 2011] [notice] Apache/2.2.3 (CentOS) configured -- resuming normal operations
>>>
>>> --
>>> Jo Rhett
>>> [email protected]
>>> (415) 999-1798
>>
>> --
>> Nigel Kersten
>> Product Manager, Puppet Labs

--
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source and other randomness

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
