Issue #11360 has been updated by Jo Rhett.

Completely different. If you read 10418 you'll see that the puppet agents never ran at all, no matter how many times you restarted them.

In this situation we had 50-something systems leave puppetdlock files in the state directory at exactly the same time. Every one of the puppetdlocks was dated 2:45pm PST. No commonality of kernel version. Restarting the puppet agent works perfectly fine, and they are all back up again and running every 30 minutes. Thus, not the same problem.

I haven't had a chance to correlate times, but I imagine this is exactly when I restarted the puppet server because it had run out of swap space and become unusable. That's a separate issue I'll be writing in about: how an LDAP failure caused ruby processes to zombie and fill up swap.

But this ticket is about how 50-something systems were left in a state which needed manual intervention to get them going again. Yes, the puppet server not responding is traumatic. No, it shouldn't leave the client in a state where it can't clean up after itself and get on with business.

----------------------------------------
Bug #11360: puppet client hangs after period of being unable to contact server
https://projects.puppetlabs.com/issues/11360

Author: Jo Rhett
Status: Re-opened
Priority: Normal
Assignee:
Category: agent
Target version:
Affected Puppet version: 2.6.12
Keywords:
Branch:

We had some serious memory/swap issues with the puppet master today. I spent a few hours getting that worked out, and upgrading to passenger 3.0.11. After clearing up the issues we found that some 50 systems weren't up to date. Logging into these systems, I found a puppetdlock file which was about 4.5 hours old and a running puppetd which was looping, doing nothing.

<pre>
[03:30 root@ald002 ~]$ ls -la /var/lib/puppet/state
total 164
drwxr-xr-t 3 root root 4096 Dec 12 22:45 .
drwxr-xr-x 10 puppet puppet 4096 Oct 24 18:28 ..
drwxr-xr-x 2 root root 4096 Sep 23 05:03 graphs
-rw-rw---- 1 root root 1448 Dec 12 22:15 last_run_report.yaml
-rw-rw---- 1 root root 38 Dec 12 22:15 last_run_summary.yaml
-rw-r--r-- 1 root root 4 Dec 12 22:45 puppetdlock
-rw-rw---- 1 root root 109285 Dec 12 22:15 state.yaml
</pre>

Here's an example log:

<pre>
Dec 12 21:12:27 ald002 puppet-agent[6945]: Could not retrieve catalog from remote server: Connection refused - connect(2)
Dec 12 21:12:27 ald002 puppet-agent[6945]: Using cached catalog
Dec 12 21:12:27 ald002 puppet-agent[6945]: Could not retrieve catalog; skipping run
Dec 12 21:12:27 ald002 puppet-agent[6945]: Could not send report: Connection refused - connect(2)
Dec 12 21:42:30 ald002 puppet-agent[6945]: Could not retrieve catalog from remote server: Connection refused - connect(2)
Dec 12 21:42:30 ald002 puppet-agent[6945]: Using cached catalog
Dec 12 21:42:30 ald002 puppet-agent[6945]: Could not retrieve catalog; skipping run
Dec 12 21:42:30 ald002 puppet-agent[6945]: Could not send report: Connection refused - connect(2)
Dec 12 22:15:57 ald002 puppet-agent[6945]: Could not run Puppet configuration client: execution expired
</pre>

<pre>
[03:32 root@ald002 ~]$ strace -p 6945
Process 6945 attached - interrupt to quit
select(11, [7 9], [], [], {1, 107000}) = 0 (Timeout)
select(11, [7 9], [], [], {0, 0}) = 0 (Timeout)
select(11, [9], [], [], {0, 0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
select(11, [7 9], [], [], {1, 999999}) = 0 (Timeout)
select(11, [7 9], [], [], {0, 0}) = 0 (Timeout)
select(11, [9], [], [], {0, 0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
select(11, [7 9], [], [], {1, 999999}) = 0 (Timeout)
select(11, [7 9], [], [], {0, 602}) = 0 (Timeout)
select(11, [7 9], [], [], {0, 0}) = 0 (Timeout)
select(11, [9], [], [], {0, 0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
select(11, [7 9], [], [], {1, 999997}) = 0 (Timeout)
select(11, [7 9], [], [], {0, 566}) = 0 (Timeout)
select(11, [7 9], [], [], {0, 0}) = 0 (Timeout)
select(11, [9], [], [], {0, 0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
select(11, [7 9], [], [], {1, 999998}) = 0 (Timeout)
select(11, [7 9], [], [], {0, 0}) = 0 (Timeout)
select(11, [9], [], [], {0, 0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
select(11, [7 9], [], [], {1, 999999} <unfinished ...>
Process 6945 detached
</pre>
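
For anyone who needs to find affected hosts in the meantime, the symptom in this ticket (a puppetdlock file in the state directory that is far older than the 30-minute run interval) can be checked for mechanically. The following is only a rough sketch and not part of Puppet: the function names and the "twice the run interval" threshold are assumptions made for illustration, and it looks only at the file's modification time, not at whether the agent process is still doing useful work.

```python
import os
import time

# The agents in this report run every 30 minutes.
RUN_INTERVAL_SECONDS = 30 * 60


def lock_age_seconds(lock_path, now=None):
    """Return the age of the lock file in seconds, or None if it does not exist."""
    try:
        mtime = os.stat(lock_path).st_mtime
    except FileNotFoundError:
        return None
    if now is None:
        now = time.time()
    return now - mtime


def is_stale_lock(lock_path, max_age_seconds=2 * RUN_INTERVAL_SECONDS, now=None):
    """Return True if the lock file exists and is older than max_age_seconds.

    The default threshold (two run intervals) is an arbitrary assumption.
    """
    age = lock_age_seconds(lock_path, now=now)
    return age is not None and age > max_age_seconds
```

Calling `is_stale_lock("/var/lib/puppet/state/puppetdlock")` on a host like ald002 above would flag the 4.5-hour-old lock; what to do about it (for example, restarting the agent, which worked here) is left to the operator.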
