Issue #1095 has been updated by josb.
demotivator wrote:
> 23775 select(16, [8 10 14], [], [], {0, 832767}) = -1 EBADF (Bad file
> descriptor)
>
> I've tried to track down what filehandle that is, but I never did see an
> open() call returning 16 (I'm not great at reading strace dumps, maybe I'm
> not doing it quite right).
select(2) (on FreeBSD) says:
<pre>
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds,
struct timeval *timeout);
</pre>
So 16 is not a descriptor itself: it is nfds, which tells select() to examine
descriptors 0 through 15 in each set. In the above example EBADF indicates
that one of the descriptor sets specified an invalid descriptor, so at least
one of 8, 10, or 14 is invalid.
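One way to narrow down which descriptor is stale is to probe each one separately. The following is only a sketch in Python, not anything taken from Puppet: the helper name invalid_fds is hypothetical, and it assumes the check runs inside the process that owns the descriptors. It relies on fstat(2) failing with EBADF for a descriptor that is not open, which is the same condition select() is complaining about.

```python
import errno
import os
import select


def invalid_fds(fds):
    """Hypothetical helper: return the descriptors in fds that are not open."""
    bad = []
    for fd in fds:
        try:
            os.fstat(fd)  # fails with EBADF if fd is not an open descriptor
        except OSError as exc:
            if exc.errno != errno.EBADF:
                raise
            bad.append(fd)
    return bad


# Reproduce the failure: create a pipe, close one end, hand both ends to select().
r, w = os.pipe()
os.close(w)  # w is now a stale descriptor number

try:
    select.select([r, w], [], [], 0)
    select_errno = None
except OSError as exc:
    select_errno = exc.errno  # EBADF, as in the strace line quoted above

stale = invalid_fds([r, w])  # only the closed end is reported
os.close(r)
```

In the strace case the candidates would be 8, 10, and 14. From outside the process, comparing them against lsof -p for the PID gives the same information.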
----------------------------------------
Bug #1095: Puppetmaster leaving half-open connections
http://projects.reductivelabs.com/issues/show/1095
Author: fs
Status: Needs more information
Priority: High
Assigned to: luke
Category: network
Target version: 0.25.0
Complexity: Medium
Patch: None
Affected version:
Keywords:
After a period ranging from a few hours to several days, puppetmaster begins
leaving half-open TCP connections in the CLOSE_WAIT state. It usually happens
to connections from clients, though at least once I've seen it hit the
database connection (MySQL). Here's an example:
<pre>
[EMAIL PROTECTED] ~]# lsof -i | grep 8140
puppetd 13420 root 7u IPv4 48150014 TCP lorien.wpi.edu:52225->lorien.wpi.edu:8140 (ESTABLISHED)
puppetmas 13744 puppet 10u IPv4 47981997 TCP *:8140 (LISTEN)
puppetmas 13744 puppet 205u IPv4 48146861 TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63688 (CLOSE_WAIT)
puppetmas 13744 puppet 206u IPv4 48145681 TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:54630 (CLOSE_WAIT)
puppetmas 13744 puppet 208u IPv4 48146636 TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63687 (CLOSE_WAIT)
puppetmas 13744 puppet 210u IPv4 48146848 TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:58605 (CLOSE_WAIT)
</pre>
Once puppetmaster starts leaking sockets like this, it seems unable to answer
any new requests. In this example, you can see that the puppet client on the
local machine (lorien) has opened a connection to puppetmaster, but
puppetmaster has not responded. None of the log files on either master or
client show that any progress has been made.
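For context, a socket sits in CLOSE_WAIT when the remote end has closed its side (sent a FIN) and the local process has not yet called close() on its own descriptor. The Python sketch below illustrates that mechanism on a loopback connection. It is an illustration under stated assumptions, not puppetmaster's code: the helper name peer_has_closed is hypothetical, and the listener on 127.0.0.1 exists only for the demonstration.

```python
import select
import socket


def peer_has_closed(conn, timeout=0.0):
    """Hypothetical helper: True if the remote end has shut down its side."""
    readable, _, _ = select.select([conn], [], [], timeout)
    if not readable:
        return False  # nothing pending: the peer is still connected
    try:
        # An empty read on a readable socket means the peer sent a FIN.
        return conn.recv(1, socket.MSG_PEEK) == b""
    except ConnectionResetError:
        return True


# Demonstration on loopback; the port is chosen by the kernel.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
client = socket.create_connection(listener.getsockname())
conn, _ = listener.accept()

open_before = peer_has_closed(conn)  # client is still connected
client.close()                       # server side now enters CLOSE_WAIT
closed_after = peer_has_closed(conn, timeout=1.0)

conn.close()  # only this close() takes the socket out of CLOSE_WAIT
listener.close()
```

A server that never reaches the final conn.close() for a finished client accumulates descriptors in exactly the state lsof reports here.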
Sending a HUP to the server generates "Restarting" and "Shutting down" messages
in syslog, but it never restarts. lsof shows that the puppetmaster process is
still around, holding the original set of half-open sockets, but nothing is
listening for new connections anymore:
<pre>
[EMAIL PROTECTED] ~]# lsof -i | grep 8140
puppetmas 13744 puppet 205u IPv4 48146861 TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63688 (CLOSE_WAIT)
puppetmas 13744 puppet 206u IPv4 48145681 TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:54630 (CLOSE_WAIT)
puppetmas 13744 puppet 208u IPv4 48146636 TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63687 (CLOSE_WAIT)
puppetmas 13744 puppet 210u IPv4 48146848 TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:58605 (CLOSE_WAIT)
</pre>
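To see how quickly the leak grows, lsof output like the above could be tallied periodically. The following Python sketch counts CLOSE_WAIT connections per remote host. The function name close_wait_by_peer and the regular expression are assumptions made for illustration, matched only against the output format shown in this ticket. It scans the text as a whole rather than line by line, so it does not matter whether a record is wrapped.

```python
import re
from collections import Counter

# Matches "local:port->remote:port (STATE)" as it appears in the lsof output
# in this ticket; captures the remote host and the TCP state.
CONN = re.compile(r"\S+:\d+->(\S+):\d+\s+\((\w+)\)")


def close_wait_by_peer(lsof_text):
    """Hypothetical helper: count CLOSE_WAIT connections per remote host."""
    return Counter(
        peer for peer, state in CONN.findall(lsof_text) if state == "CLOSE_WAIT"
    )


# Sample copied from the lsof output in this report.
sample = """\
puppetd 13420 root 7u IPv4 48150014 TCP
lorien.wpi.edu:52225->lorien.wpi.edu:8140 (ESTABLISHED)
puppetmas 13744 puppet 10u IPv4 47981997 TCP *:8140 (LISTEN)
puppetmas 13744 puppet 205u IPv4 48146861 TCP
lorien.wpi.edu:8140->DELENN.WPI.EDU:63688 (CLOSE_WAIT)
puppetmas 13744 puppet 206u IPv4 48145681 TCP
lorien.wpi.edu:8140->IVANOVA.WPI.EDU:54630 (CLOSE_WAIT)
puppetmas 13744 puppet 208u IPv4 48146636 TCP
lorien.wpi.edu:8140->DELENN.WPI.EDU:63687 (CLOSE_WAIT)
puppetmas 13744 puppet 210u IPv4 48146848 TCP
lorien.wpi.edu:8140->IVANOVA.WPI.EDU:58605 (CLOSE_WAIT)
"""

counts = close_wait_by_peer(sample)
```

Run from cron against live lsof -i output, a count that only ever rises for a given client would point at where the sockets are being left open.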
A full restart of puppetmaster appears to be the only way to get things flowing
again.
This is on 0.24.1 plus the patch from ticket 959. Let me know what other
debugging info you'd like me to gather up.