Issue #1095 has been updated by Ricky Zhou.

Status changed from Closed to Re-opened
Affected version changed from 0.22.1 to 0.25.0beta2
Hi, I am currently testing 0.25beta2 and I am also seeing this issue (specifically when stored configs are enabled). I have ruby-mysql installed and I don't see any warnings about using the built-in mysql functions, so that doesn't seem to be the problem in this case. Specifically, I am seeing the same symptom: after starting puppetmasterd (I use mongrel behind apache/mod_proxy_balancer) and attempting a puppet run, the puppet run times out, with the connection stuck in CLOSE_WAIT. The mongrel instance then stops handling requests (so something like curl http://127.0.0.1:18141/ will hang after this happens).

The output of puppetd -t I get (after a long wait) is:

<pre>
notice: Ignoring --listen on onetime run
/usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill': execution expired (Timeout::Error)
        from /usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
        from /usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
        from /usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line'
        from /usr/lib/ruby/1.8/net/http.rb:2009:in `read_new'
        from /usr/lib/ruby/1.8/net/http.rb:1050:in `request'
        from /usr/lib/ruby/1.8/net/http.rb:1037:in `request'
        from /usr/lib/ruby/1.8/net/http.rb:543:in `start'
        from /usr/lib/ruby/1.8/net/http.rb:1035:in `request'
         ... 18 levels...
        from /usr/lib/ruby/site_ruby/1.8/puppet/application.rb:226:in `send'
        from /usr/lib/ruby/site_ruby/1.8/puppet/application.rb:226:in `run_command'
        from /usr/lib/ruby/site_ruby/1.8/puppet/application.rb:217:in `run'
        from /usr/sbin/puppetd:159
</pre>

and netstat -pan | grep 1814 gives:

<pre>
tcp   0   0 127.0.0.1:18140   0.0.0.0:*           LISTEN        23420/ruby
tcp   0   0 127.0.0.1:18141   0.0.0.0:*           LISTEN        23472/ruby
tcp   0   0 127.0.0.1:18142   0.0.0.0:*           LISTEN        23524/ruby
tcp   0   0 127.0.0.1:18143   0.0.0.0:*           LISTEN        23576/ruby
tcp   0   0 127.0.0.1:18140   127.0.0.1:46870     ESTABLISHED   23420/ruby
tcp   0   0 127.0.0.1:47823   127.0.0.1:18141     FIN_WAIT2     -
tcp   0   0 127.0.0.1:46870   127.0.0.1:18140     ESTABLISHED   6050/httpd
tcp   1   0 127.0.0.1:18141   127.0.0.1:47823     CLOSE_WAIT    23472/ruby
tcp   1   0 127.0.0.1:18141   127.0.0.1:47829     CLOSE_WAIT    23472/ruby
</pre>

I did one more test where I pointed mod_proxy_balancer at only one mongrel instance, then ran strace -p on it and tried a puppet run. The strace output I got from that is available at http://ricky.fedorapeople.org/puppetmaster.strace. If I can help by providing any more debugging information, I'm ricky in #puppet on Freenode - feel free to ping me any time.

----------------------------------------
Bug #1095: Puppetmaster leaving half-open connections
http://projects.reductivelabs.com/issues/1095

Author: Frank Sweetser
Status: Re-opened
Priority: High
Assigned to: Luke Kanies
Category: network
Target version: 0.24.8
Complexity: Medium
Patch: None
Affected version: 0.25.0beta2
Keywords:

After a period of time ranging from a few hours to several days, puppetmaster begins leaving half-open TCP connections in a CLOSE_WAIT state. It usually seems to happen to connections from clients, though at least once I've seen it hit the database connection (MySQL).
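For reference, the client-side Timeout::Error in the trace above (raised from rbuf_fill) is what Net::HTTP produces when a server accepts the TCP connection but never sends a response. A minimal sketch reproducing just that symptom in isolation — this is illustrative only, not Puppet's code; the port and timeout are arbitrary:

```ruby
require 'net/http'
require 'socket'

# Hypothetical stand-in for a wedged mongrel: accept the connection,
# then never read the request or send a response.
server  = TCPServer.new('127.0.0.1', 0)   # 0 = pick any free port
port    = server.addr[1]
handler = Thread.new { server.accept; sleep }

http = Net::HTTP.new('127.0.0.1', port)
http.read_timeout = 1   # fail fast instead of the long default wait

begin
  http.get('/')
  result = :responded
rescue Timeout::Error, IOError
  # Ruby 1.8 raises Timeout::Error here, as in the trace above;
  # newer Rubies raise Net::ReadTimeout instead.
  result = :timed_out
end

handler.kill
```

The client hangs in read_status_line until read_timeout expires, matching the backtrace in the report.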
Here's an example:

<pre>
[r...@lorien ~]# lsof -i | grep 8140
puppetd   13420   root    7u IPv4 48150014 TCP lorien.wpi.edu:52225->lorien.wpi.edu:8140 (ESTABLISHED)
puppetmas 13744 puppet   10u IPv4 47981997 TCP *:8140 (LISTEN)
puppetmas 13744 puppet  205u IPv4 48146861 TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63688 (CLOSE_WAIT)
puppetmas 13744 puppet  206u IPv4 48145681 TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:54630 (CLOSE_WAIT)
puppetmas 13744 puppet  208u IPv4 48146636 TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63687 (CLOSE_WAIT)
puppetmas 13744 puppet  210u IPv4 48146848 TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:58605 (CLOSE_WAIT)
</pre>

Once puppetmaster starts leaking sockets like this, it seems unable to answer any new requests. In this example, you can see that the puppet client on the local machine (lorien) has opened a connection to puppetmaster, but puppetmaster has not responded. None of the log files on either master or client show that any progress has been made.

Sending a HUP to the server generates "Restarting" and "Shutting down" messages in syslog, but it never restarts. lsof shows that there are puppetmaster processes hanging around keeping the original set of half-open sockets open, but nothing is listening for new connections anymore:

<pre>
[r...@lorien ~]# lsof -i | grep 8140
puppetmas 13744 puppet  205u IPv4 48146861 TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63688 (CLOSE_WAIT)
puppetmas 13744 puppet  206u IPv4 48145681 TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:54630 (CLOSE_WAIT)
puppetmas 13744 puppet  208u IPv4 48146636 TCP lorien.wpi.edu:8140->DELENN.WPI.EDU:63687 (CLOSE_WAIT)
puppetmas 13744 puppet  210u IPv4 48146848 TCP lorien.wpi.edu:8140->IVANOVA.WPI.EDU:58605 (CLOSE_WAIT)
</pre>

A full restart of puppetmaster appears to be the only way to get things flowing again. This is on 0.24.1 plus the patch from ticket 959. Let me know what other debugging info you'd like me to gather up.
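For context on the lsof output: CLOSE_WAIT means the remote peer has already sent its FIN, and the local process simply never called close() on its own descriptor. A minimal sketch of that leak pattern — an assumed mechanism for illustration, not Puppet's actual code:

```ruby
require 'socket'

# Illustrative leak pattern (not Puppet's actual code): the client closes,
# the kernel receives the FIN, but the server process never calls close()
# on its descriptor, so the socket lingers in CLOSE_WAIT.
server = TCPServer.new('127.0.0.1', 0)
client = TCPSocket.new('127.0.0.1', server.addr[1])
conn   = server.accept

client.close   # peer sends FIN; the server's side enters CLOSE_WAIT
# A handler that returns here without calling conn.close leaks the fd.

eof_seen   = conn.eof?     # the half-close is visible as EOF on read...
still_open = !conn.closed? # ...yet the process still holds the descriptor
```

Until the process calls conn.close (or exits), netstat/lsof will show that socket in CLOSE_WAIT, matching the listings above; a pile-up of them suggests a handler path that returns without closing its client socket.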
--
You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here: http://reductivelabs.com/redmine/my/account

You received this message because you are subscribed to the Google Groups "Puppet Bugs" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/puppet-bugs?hl=en
