Issue #14108 has been updated by John Vestrum.
It turns out that the closed sockets were connections to the ldap server, and the issue lies somewhere in the system ldap libraries. I replaced the Debian packages libnss-ldap/nscd with libnss-ldapd/nslcd/unscd and it's fine now.

----------------------------------------
Bug #14108: puppet agent stuck in poll(), getting POLLERR
https://projects.puppetlabs.com/issues/14108#change-61311

Author: John Vestrum
Status: Investigating
Priority: Normal
Assignee:
Category:
Target version:
Affected Puppet version: 2.7.13
Keywords:
Branch:

On a couple of my nodes (running Debian 2.6.32-41squeeze2), the puppet agent appears to start normally and "puppet agent --test" completes fine. However within a couple hours or less, the agent is running at 99% CPU and when I strace it I see this:

<pre>
poll([{fd=7, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, -1) = 1 ([{fd=7, revents=POLLIN|POLLERR|POLLHUP}])
poll([{fd=7, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, -1) = 1 ([{fd=7, revents=POLLIN|POLLERR|POLLHUP}])
poll([{fd=7, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, -1) = 1 ([{fd=7, revents=POLLIN|POLLERR|POLLHUP}])
poll([{fd=7, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, -1) = 1 ([{fd=7, revents=POLLIN|POLLERR|POLLHUP}])
... etc, 1000's per second ...
</pre>

/proc/pid/fd shows that fd=7 (and it is _always_ fd=7 when this happens) is pointing to a socket:

<pre>
# ls -l /proc/18026/fd
total 0
lr-x------ 1 root root 64 Apr 20 08:30 0 -> /dev/null
l-wx------ 1 root root 64 Apr 20 08:30 1 -> /dev/null
l-wx------ 1 root root 64 Apr 20 08:30 2 -> /dev/null
lr-x------ 1 root root 64 Apr 20 08:30 3 -> pipe:[3687384]
l-wx------ 1 root root 64 Apr 20 08:30 4 -> pipe:[3687384]
lrwx------ 1 root root 64 Apr 20 08:30 5 -> socket:[3687393]
lrwx------ 1 root root 64 Apr 20 08:30 7 -> socket:[3689060]
lr-x------ 1 root root 64 Apr 20 08:30 8 -> /dev/urandom
</pre>

But that socket doesn't seem to exist:

<pre>
# lsof -p 18026 | grep 3689060
puppet 18026 root 7u sock 0,6 0t0 3689060 can't identify protocol
# netstat -lanep | grep 3689060
#
</pre>

So it appears that poll() is returning POLLERR but puppet continues to attempt to use the fd. Which I think is a bug but I'm not really a socket programmer.

I'm still trying to figure out why this happens on only a couple of my nodes and not others. I upgraded the client from 2.6.2 to 2.7.13, same behaviour.
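
For illustration only: the strace output above is what a poll loop looks like when it never retires a descriptor that keeps reporting POLLERR or POLLHUP. The Python sketch below is not Puppet's or libnss-ldap's code. The names `drain_until_closed` and `demo` are hypothetical, and it assumes a Linux host where `select.poll` is available. It shows one way a loop can stop watching a dead socket instead of spinning on it.

```python
import select
import socket


def drain_until_closed(sock, max_iterations=1000):
    """Poll `sock` until it reports EOF or an error, then close it.

    Returns (poll_calls, received_bytes). `max_iterations` is only a
    safety net for this demo (hypothetical parameter).
    """
    poller = select.poll()
    # POLLERR and POLLHUP are always reported; they need not be requested.
    poller.register(sock, select.POLLIN)
    received = b""
    for poll_calls in range(1, max_iterations + 1):
        for fd, revents in poller.poll(1000):  # timeout in milliseconds
            if revents & select.POLLIN:
                try:
                    data = sock.recv(4096)
                except OSError:
                    data = b""  # treat a failed read like a dead socket
                if data:
                    received += data
                    continue  # more may follow, so poll again
            # Reaching here means EOF, or POLLERR/POLLHUP/POLLNVAL with
            # nothing left to read. Stop watching the descriptor; polling
            # it again would return immediately, forever.
            poller.unregister(fd)
            sock.close()
            return poll_calls, received
    raise RuntimeError("descriptor never reported closed")


def demo():
    # A local socket pair stands in for the connection whose peer vanished.
    ours, peer = socket.socketpair()
    peer.sendall(b"hello")
    peer.close()
    poll_calls, received = drain_until_closed(ours)
    return poll_calls, received, ours.fileno()
```

Calling `demo()` should return after a handful of poll() calls with the socket closed (a closed Python socket reports a file descriptor of -1), rather than looping thousands of times per second on the same fd.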
