Issue #1095 has been updated by demotivator.

I think I'm suffering from this issue as well. I can't say I've noticed the 
open sockets specifically, but ever since I enabled storeconfigs I can't keep 
my puppetmaster running for more than a few minutes. Typically it starts up OK, 
handles a few requests, and then at some point it starts consuming 100% CPU. 
At that point it stops responding to new connections from puppetd, and if I 
leave it alone for a while it eventually crashes. I'm using PostgreSQL as the 
storeconfigs database. I tried turning on debugging, tracing, etc., but 
puppetmaster never logs any real errors. I ran strace against it, and once it 
hits the 100% CPU point, all it logs from then on are thousands of lines like 
this:

<pre>
23775 select(16, [8 10 14], [], [], {0, 832767}) = -1 EBADF (Bad file descriptor)
</pre>

I've tried to track down which file handle that is, but I never saw an open() 
call returning 16 (I'm not great at reading strace dumps, so I may not be 
doing it quite right).
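For what it's worth, select(2)'s first argument is nfds (the highest-numbered descriptor plus one), not a file handle, so the bad descriptor is one of 8, 10, or 14 in the read set. A minimal Ruby sketch (not Puppet's actual code) of how a stale descriptor left in a select set produces exactly this busy loop:

```ruby
# Minimal sketch, not Puppet code: selecting on a descriptor that has
# already been closed fails immediately, so an event loop that never
# removes the dead descriptor from its read set spins at 100% CPU --
# the same busy loop the strace output shows.
r, w = IO.pipe
r.close                        # simulate a client socket closed elsewhere
begin
  IO.select([r], [], [], 0.5)  # WEBrick-style readiness check
rescue IOError => e
  puts "select failed: #{e.message}"
end
w.close
```

At the C level the failed select returns EBADF; Ruby surfaces the stale handle as an IOError instead, but the effect on a retry loop is the same.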

Oh, I should mention I'm running Ubuntu 8.04 with the 0.24.4-3 packages of 
puppet and puppetmaster. 
ulimit -n => 1024
I'm currently using WEBrick, since I only have about 40 clients checking in at 
30-minute intervals and didn't think I really needed to add load balancing.

Let me know if I can provide any more details or if there's any debugging 
anyone wants me to try; I've got time for it, since this is causing me a lot 
of pain. I really wanted storeconfigs to work, as I'm trying out exported 
resource collection to generate Nagios configs.
----------------------------------------
Bug #1095: Puppetmaster leaving half-open connections
http://reductivelabs.com/redmine/issues/show/1095

Author: fs
Status: Needs more information
Priority: High
Assigned to: luke
Category: network
Target version: 0.25.0
Complexity: Medium
Patch: None
Affected version: 
Keywords: 


After a period of time ranging from a few hours to several days, puppetmaster 
begins leaving half-open TCP connections in a CLOSE_WAIT state. It usually 
seems to happen to connections from clients, though at least once I've seen it 
hit the database connection (MySQL). Here's an example:


<pre>
[EMAIL PROTECTED] ~]# lsof -i | grep 8140
puppetd   13420     root    7u  IPv4 48150014       TCP lorien.wpi.edu:52225->lorien.wpi.edu:8140 (ESTABLISHED)
puppetmas 13744   puppet   10u  IPv4 47981997       TCP *:8140 (LISTEN)
puppetmas 13744   puppet  205u  IPv4 48146861       TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63688 (CLOSE_WAIT)
puppetmas 13744   puppet  206u  IPv4 48145681       TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:54630 (CLOSE_WAIT)
puppetmas 13744   puppet  208u  IPv4 48146636       TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63687 (CLOSE_WAIT)
puppetmas 13744   puppet  210u  IPv4 48146848       TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:58605 (CLOSE_WAIT)
</pre>

Once puppetmaster starts leaking sockets like this, it seems unable to answer 
any new requests.  In this example, you can see that the puppet client on the 
local machine (lorien) has opened a connection to puppetmaster, but 
puppetmaster has not responded.  None of the log files on either master or 
client show that any progress has been made.

Sending a HUP to the server generates "Restarting" and "Shutting down" 
messages in syslog, but it never actually restarts. lsof shows puppetmaster 
processes hanging around, keeping the original set of half-open sockets open, 
but nothing is listening for new connections anymore:


<pre>
[EMAIL PROTECTED] ~]# lsof -i | grep 8140
puppetmas 13744   puppet  205u  IPv4 48146861       TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63688 (CLOSE_WAIT)
puppetmas 13744   puppet  206u  IPv4 48145681       TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:54630 (CLOSE_WAIT)
puppetmas 13744   puppet  208u  IPv4 48146636       TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63687 (CLOSE_WAIT)
puppetmas 13744   puppet  210u  IPv4 48146848       TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:58605 (CLOSE_WAIT)
</pre>

A full restart of puppetmaster appears to be the only way to get things flowing 
again.
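For reference, CLOSE_WAIT is purely a symptom of the local process never closing its end after the peer has sent its FIN. A minimal Ruby sketch (not the puppetmaster code itself) that reproduces the state:

```ruby
require 'socket'

# Sketch of how CLOSE_WAIT accumulates: the peer closes its end (we
# receive its FIN), but our process never calls close on the accepted
# socket, so the kernel parks the connection in CLOSE_WAIT until we do.
server = TCPServer.new('127.0.0.1', 0)
client = TCPSocket.new('127.0.0.1', server.addr[1])
conn   = server.accept
client.close       # peer disconnects; conn's kernel state becomes CLOSE_WAIT
sleep 0.1          # give the kernel a moment to process the FIN
puts conn.closed?  # false: the leaked handle keeps the socket pinned
conn.close         # the fix is simply to close (or GC) the handle
server.close
```

So wherever puppetmaster is losing track of these client sockets, closing the handles (or letting them be garbage-collected) is what would release the CLOSE_WAIT entries.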

This is on 0.24.1 plus the patch from ticket #959. Let me know what other 
debugging info you'd like me to gather.


