Okay, this has happened again. The puppet master stopped logging catalog compiles, every server stopped returning results, and the global queue went through the roof within about 9 minutes. The puppet master appears to stop dead in its tracks without logging any errors.
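To catch this state before the queue climbs into the hundreds, something like the following could run from cron against the text that passenger-status prints. It is only a minimal sketch: the function names and the queue_threshold value are hypothetical, and the regular expressions assume the output layout shown in this message (max/count/active/inactive counters, a "Waiting on global queue" line, and one PID line per process).

```python
import re


def parse_passenger_status(text):
    """Pull the pool counters and per-process stats out of passenger-status text.

    Assumes the layout shown in this thread; other Passenger versions may differ.
    """
    stats = {}
    for key in ("max", "count", "active", "inactive"):
        # Anchor at line start so "active" does not match the "inactive" line.
        match = re.search(rf"^\s*{key}\s*=\s*(\d+)", text, re.MULTILINE)
        if match:
            stats[key] = int(match.group(1))

    match = re.search(r"Waiting on global queue:\s*(\d+)", text)
    if match:
        stats["queue"] = int(match.group(1))

    stats["processes"] = [
        {"pid": int(pid), "sessions": int(sessions), "processed": int(processed)}
        for pid, sessions, processed in re.findall(
            r"PID:\s*(\d+)\s+Sessions:\s*(\d+)\s+Processed:\s*(\d+)", text
        )
    ]
    return stats


def pool_is_wedged(stats, queue_threshold=50):
    """Heuristic: every worker is busy, none are idle, and requests are piling up.

    queue_threshold is an arbitrary illustrative value, not a Passenger setting.
    """
    return (
        "max" in stats
        and stats.get("active") == stats["max"]
        and stats.get("inactive", 0) == 0
        and stats.get("queue", 0) >= queue_threshold
    )
```

Feeding it the captured output of passenger-status (for example via subprocess or a pipe) and alerting when pool_is_wedged returns True would flag the condition shown in this message, where active equals max and more than 200 requests are waiting. It does not explain why the workers hang; it only detects that they have.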

# passenger-status
----------- General information -----------
max      = 20
count    = 20
active   = 20
inactive = 0
Waiting on global queue: 209

----------- Domains -----------
/etc/puppet/rack:
  PID: 25783   Sessions: 1   Processed: 329   Uptime: 2h 52m 7s
  PID: 25831   Sessions: 1   Processed: 4     Uptime: 2h 52m 5s
  PID: 28517   Sessions: 1   Processed: 6     Uptime: 2h 22m 0s
  PID: 25802   Sessions: 1   Processed: 714   Uptime: 2h 52m 6s
  PID: 30905   Sessions: 1   Processed: 13    Uptime: 1h 50m 27s
  PID: 25864   Sessions: 1   Processed: 709   Uptime: 2h 52m 4s
  PID: 31028   Sessions: 1   Processed: 347   Uptime: 1h 50m 21s
  PID: 28944   Sessions: 1   Processed: 377   Uptime: 2h 21m 50s
  PID: 31090   Sessions: 1   Processed: 266   Uptime: 1h 50m 18s
  PID: 577     Sessions: 1   Processed: 400   Uptime: 1h 27m 27s
  PID: 418     Sessions: 1   Processed: 647   Uptime: 1h 28m 2s
  PID: 1247    Sessions: 1   Processed: 133   Uptime: 1h 19m 3s
  PID: 1474    Sessions: 1   Processed: 52    Uptime: 1h 18m 9s
  PID: 594     Sessions: 1   Processed: 378   Uptime: 1h 27m 26s
  PID: 4706    Sessions: 1   Processed: 414   Uptime: 48m 5s
  PID: 4775    Sessions: 1   Processed: 218   Uptime: 47m 28s
  PID: 4854    Sessions: 1   Processed: 584   Uptime: 47m 23s
  PID: 7774    Sessions: 1   Processed: 165   Uptime: 14m 27s
  PID: 7902    Sessions: 1   Processed: 44    Uptime: 13m 44s
  PID: 8149    Sessions: 1   Processed: 541   Uptime: 11m 21s

On Dec 2, 2011, at 10:58 AM, Jo Rhett wrote:
> I came in this morning to find all the servers all locked up solid:
>
> # passenger-status
> ----------- General information -----------
> max      = 20
> count    = 20
> active   = 20
> inactive = 0
> Waiting on global queue: 236
>
> ----------- Domains -----------
> /etc/puppet/rack:
>   PID: 2720    Sessions: 1   Processed: 939   Uptime: 9h 22m 18s
>   PID: 1615    Sessions: 1   Processed: 947   Uptime: 9h 23m 14s
>   PID: 1596    Sessions: 1   Processed: 607   Uptime: 9h 23m 15s
>   PID: 1722    Sessions: 1   Processed: 953   Uptime: 9h 23m 9s
>   PID: 2218    Sessions: 1   Processed: 378   Uptime: 9h 22m 43s
>   PID: 4286    Sessions: 1   Processed: 178   Uptime: 8h 50m 58s
>   PID: 5749    Sessions: 1   Processed: 708   Uptime: 8h 20m 20s
>   PID: 4253    Sessions: 1   Processed: 820   Uptime: 8h 51m 1s
>   PID: 5624    Sessions: 1   Processed: 126   Uptime: 8h 20m 24s
>   PID: 7328    Sessions: 1   Processed: 811   Uptime: 7h 49m 17s
>   PID: 7274    Sessions: 1   Processed: 984   Uptime: 7h 49m 20s
>   PID: 8761    Sessions: 1   Processed: 85    Uptime: 7h 18m 50s
>   PID: 9135    Sessions: 1   Processed: 907   Uptime: 7h 16m 27s
>   PID: 8777    Sessions: 1   Processed: 342   Uptime: 7h 18m 49s
>   PID: 10508   Sessions: 1   Processed: 51    Uptime: 6h 47m 6s
>   PID: 10853   Sessions: 1   Processed: 603   Uptime: 6h 43m 9s
>   PID: 10620   Sessions: 1   Processed: 939   Uptime: 6h 45m 52s
>   PID: 11438   Sessions: 1   Processed: 870   Uptime: 6h 30m 8s
>   PID: 12582   Sessions: 1   Processed: 448   Uptime: 6h 9m 59s
>   PID: 12670   Sessions: 1   Processed: 400   Uptime: 6h 8m 46s
>
> For comparison, most of our server processes recycle within 20 minutes
> normally, as they hit 1000 really fast.
>
> # you probably want to tune these settings
> PassengerHighPerformance on
> PassengerUseGlobalQueue on
> PassengerMaxPoolSize 20
> PassengerPoolIdleTime 1800
> PassengerMaxRequests 1000
> #PassengerStatThrottleRate 120
> RackAutoDetect Off
> RailsAutoDetect Off
>
> There is nothing useful in the system logs. They just stopped:
>
> Dec 2 12:06:34 axxats003 puppet-master[12670]: Compiled catalog for axxamx001.sjc.company.com in environment production in 1.76 seconds
> Dec 2 12:06:37 axxats003 puppet-master[12670]: Compiled catalog for axxatn016.sjc.company.com in environment production in 1.64 seconds
> Dec 2 12:06:40 axxats003 puppet-master[12670]: Compiled catalog for axaafc001.company.com in environment production in 1.70 seconds
> Dec 2 14:10:02 axxats003 puppet-agent[16965]: Reopening log files
> Dec 2 14:10:02 axxats003 puppet-agent[16965]: Starting Puppet client version 2.6.12
> Dec 2 14:12:04 axxats003 puppet-agent[16965]: Could not retrieve catalog from remote server: execution expired
> Dec 2 14:12:04 axxats003 puppet-agent[16965]: Using cached catalog
>
> (every 30 minutes puppet agent says the same thing until I restart the puppet
> master)
>
> Dec 2 18:06:09 axxats003 puppet-master[25783]: Starting Puppet master version 2.6.12
> Dec 2 18:06:10 axxats003 puppet-master[25802]: Starting Puppet master version 2.6.12
> Dec 2 18:06:11 axxats003 puppet-master[25831]: Starting Puppet master version 2.6.12
> Dec 2 18:06:12 axxats003 puppet-master[25864]: Starting Puppet master version 2.6.12
> Dec 2 18:06:13 axxats003 puppet-master[25897]: Starting Puppet master version 2.6.12
> Dec 2 18:06:14 axxats003 puppet-master[25922]: Starting Puppet master version 2.6.12
> Dec 2 18:06:15 axxats003 puppet-master[25947]: Starting Puppet master version 2.6.12
> Dec 2 18:06:16 axxats003 puppet-master[25972]: Starting Puppet master version 2.6.12
> Dec 2 18:06:17 axxats003 puppet-master[25997]: Starting Puppet master version 2.6.12
> Dec 2 18:06:18 axxats003 puppet-master[26019]: Starting Puppet master version 2.6.12
> Dec 2 18:06:19 axxats003 puppet-master[26056]: Starting Puppet master version 2.6.12
> Dec 2 18:06:20 axxats003 puppet-master[26081]: Starting Puppet master version 2.6.12
> Dec 2 18:06:21 axxats003 puppet-master[26115]: Starting Puppet master version 2.6.12
> Dec 2 18:14:32 axxats003 puppet-master[26115]: Compiled catalog for axxatn018.sjc.company.com in environment production in 3.63 seconds
> Dec 2 18:14:37 axxats003 puppet-master[26115]: Compiled catalog for axxamb002.sjc.company.com in environment production in 1.47 seconds
> Dec 2 18:14:50 axxats003 puppet-master[26115]: Compiled catalog for axxasn001.sjc.company.com in environment production in 1.57 seconds
>
> There are no other messages in /var/log/messages -- the system was otherwise
> not busy. Apache error log only observed max clients get hit:
>
> [Fri Dec 02 08:42:43 2011] [notice] Apache/2.2.3 (CentOS) configured -- resuming normal operations
> [Fri Dec 02 12:23:46 2011] [error] server reached MaxClients setting, consider raising the MaxClients setting
> [Fri Dec 02 18:06:07 2011] [notice] caught SIGTERM, shutting down
> [Fri Dec 02 18:06:08 2011] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
> [Fri Dec 02 18:06:08 2011] [warn] RSA server certificate CommonName (CN) `puppetmaster.company.com' does NOT match server name!?
> [Fri Dec 02 18:06:08 2011] [notice] Digest: generating secret for digest authentication ...
> [Fri Dec 02 18:06:08 2011] [notice] Digest: done
> [Fri Dec 02 18:06:08 2011] [warn] RSA server certificate CommonName (CN) `puppetmaster.company.com' does NOT match server name!?
> [Fri Dec 02 18:06:08 2011] [notice] Apache/2.2.3 (CentOS) configured -- resuming normal operations
>
> --
> Jo Rhett
> [email protected]
> (415) 999-1798
>
> --
> Jo Rhett
> Net Consonance : consonant endings by net philanthropy, open source and other
> randomness

--
Jo Rhett
Net Consonance : consonant endings by net philanthropy, open source and other randomness

--
You received this message because you are subscribed to the Google Groups "Puppet Users" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/puppet-users?hl=en.
