Issue #2089 has been updated by Wouter D'Haeseleer.

I believe I have found the issue.
After adding a lot of debug prints in puppet, I saw that the read syscall 
on the socket had no timeout specified, and therefore puppet hangs.

After looking at the source code of the latest ruby version, I found this 
issue: http://bugs.ruby-lang.org/issues/show/4246

This is exactly what I see:
<pre>
If you open a SSL connection and it hangs while doing the SSL connection 
handshake then it does not timeout unless you end up hitting a TCP keepalive 
timeout.

The problem is that the open_timeout is only applied to the actual TCP socket 
being opened and not the SSL negotiation. 
</pre>


Since I'm running ruby 1.8.7 I'm hitting this bug.
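
To illustrate the behaviour described in that ruby issue, here is a minimal sketch of putting the TCP connect and the SSL handshake under a single deadline. It is not puppet's or net/http's actual code: the helper name ssl_connect_with_timeout is hypothetical, and the demo assumes a local listener that accepts TCP connections at the kernel level but never answers the handshake, which stands in for a master that hangs after a network glitch.

```ruby
require 'socket'
require 'openssl'
require 'timeout'

# Hypothetical helper (not part of puppet or net/http): open a TCP socket
# and run the SSL handshake, with BOTH steps covered by one deadline.
def ssl_connect_with_timeout(host, port, seconds)
  tcp = nil
  Timeout.timeout(seconds) do
    tcp = TCPSocket.new(host, port)
    ctx = OpenSSL::SSL::SSLContext.new
    ctx.verify_mode = OpenSSL::SSL::VERIFY_NONE # demo only, do not use in production
    ssl = OpenSSL::SSL::SSLSocket.new(tcp, ctx)
    ssl.sync_close = true
    ssl.connect # blocks until the peer answers the handshake
    ssl
  end
rescue Timeout::Error
  # Handshake (or connect) did not finish in time: clean up and report it.
  tcp.close if tcp && !tcp.closed?
  :timed_out
end

# Demo: a listener that never calls accept and never speaks SSL, so the
# TCP connection is established but the handshake cannot complete.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
result = ssl_connect_with_timeout('127.0.0.1', port, 1)
puts result.inspect
server.close
```

Without the outer Timeout.timeout block, the ssl.connect call in this demo would wait indefinitely, which matches the hang seen in the strace output below.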

Could puppet publish some advice or an FAQ entry for this case? Googling 
around, I can see quite a lot of people are suffering from this issue.

----------------------------------------
Bug #2089: puppet client on nodes hangs after various networking glitches
https://projects.puppetlabs.com/issues/2089#change-91101

* Author: Max Stepanov
* Status: Re-opened
* Priority: Normal
* Assignee: Nigel Kersten
* Category: plumbing
* Target version: 
* Affected Puppet version: 0.24.7
* Keywords: 
* Branch: 
----------------------------------------
Sometimes I find puppet stuck on nodes, usually several nodes together.
They hang there doing "nothing", and I restart them to get them running 
again.
It seems network glitches are responsible for this behavior.
Here is a quick trace (it's the same on all nodes):

hey:~# ps aux| grep puppet
root     12629  0.5  2.1 129820 88896 ?        Ss   03:41   2:26 ruby /usr/sbin/puppetd -w 0

hey:~# strace -f -p 12629
Process 12629 attached - interrupt to quit
select(12, [9 10], [], [], {0, 184000}) = 0 (Timeout)
select(12, [9 10], [], [], {0, 1477})   = 0 (Timeout)
select(12, [9 10], [], [], {0, 0})      = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
select(12, [9 10], [], [], {1, 999998}) = 0 (Timeout)
select(12, [9 10], [], [], {0, 1706})   = 0 (Timeout)
select(12, [9 10], [], [], {0, 0})      = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
select(12, [9 10], [], [], {1, 999997} <unfinished ...>
Process 12629 detached

hey:~# file /proc/12629/fd/{9,10}
/proc/12629/fd/9:  broken symbolic link to `socket:[93425425]'
/proc/12629/fd/10: broken symbolic link to `socket:[93847992]'

hey:~# lsof -n | egrep "(93425425|93847992)"
ruby      12629     root    9u     IPv4           93425425                  TCP *:8139 (LISTEN)
ruby      12629     root   10u     IPv4           93847992                  TCP localip:46803->puppetmaster:8140 (ESTABLISHED)

hey:# file /proc/12629/fd/*
0:  symbolic link to `/dev/null'
1:  symbolic link to `/dev/null'
10: broken symbolic link to `socket:[93847992]'
2:  symbolic link to `/dev/null'
3:  broken symbolic link to `pipe:[93425407]'
4:  broken symbolic link to `pipe:[93425407]'
5:  broken symbolic link to `socket:[93425419]'
6:  symbolic link to `/var/log/puppet/http.log'
7:  symbolic link to `/var/log/puppet/http.log'
8:  symbolic link to `/var/log/puppet/http.log'
9:  broken symbolic link to `socket:[93425425]'

I'm ready to provide more info (if requested) next time it happens.

