Issue #2211 has been updated by Marcin Owsiany.
I first noticed the problem when working with puppet on a train,
without any internet connection. In such situation, two problems appear:
1) Mildly annoying, frequent 2-second pauses followed by:
<pre>
Timed out seeking value for ipaddress
</pre>
I think this is because when I have no network connection, there is
no nameserver in /etc/resolv.conf, and despite my hostname being in
/etc/hosts, some resolver library does stupid things (I think it
tries to contact a purely imaginary nameserver running on localhost,
but I haven't thorougly investigated that). This is a more general
system issue, outside of puppet, as "host `hostname`" also hangs for
5 seconds before returning:
<pre>
;; connection timed out; no servers could be reached
</pre>
2) More seriously, most - if not all - resoruces fail with error such as:
<pre>
err: //beczulka/common/common_tweaks/common_bash_rc/File[/root/.bashrc]:
Failed
to retrieve current state of resource: No child processes
</pre>
Adding --trace on the command line shows for example:
<pre>
/usr/lib/ruby/1.8/puppet/util.rb:290:in `waitpid2'
/usr/lib/ruby/1.8/puppet/util.rb:290:in `execute'
/usr/lib/ruby/1.8/puppet/util/diff.rb:12:in `diff'
/usr/lib/ruby/1.8/puppet/util/diff.rb:67:in `string_file_diff'
/usr/lib/ruby/1.8/puppet/type/file/source.rb:163:in `insync?'
/usr/lib/ruby/1.8/puppet/type.rb:965:in `propertychanges'
/usr/lib/ruby/1.8/puppet/util/posix.rb:156:in `find_all'
/usr/lib/ruby/1.8/puppet/type.rb:963:in `each'
/usr/lib/ruby/1.8/puppet/type.rb:963:in `find_all'
/usr/lib/ruby/1.8/puppet/type.rb:963:in `propertychanges'
/usr/lib/ruby/1.8/puppet/type/file.rb:1123:in `propertychanges'
/usr/lib/ruby/1.8/puppet/type.rb:833:in `evaluate'
/usr/lib/ruby/1.8/puppet/transaction.rb:60:in `apply'
/usr/lib/ruby/1.8/puppet/transaction.rb:239:in `eval_resource'
/usr/lib/ruby/1.8/puppet/util.rb:425:in `thinmark'
/usr/lib/ruby/1.8/benchmark.rb:293:in `measure'
/usr/lib/ruby/1.8/benchmark.rb:307:in `realtime'
/usr/lib/ruby/1.8/puppet/util.rb:424:in `thinmark'
/usr/lib/ruby/1.8/puppet/transaction.rb:238:in `eval_resource'
/usr/lib/ruby/1.8/puppet/transaction.rb:310:in `evaluate'
/usr/lib/ruby/1.8/puppet/util.rb:425:in `thinmark'
/usr/lib/ruby/1.8/benchmark.rb:293:in `measure'
/usr/lib/ruby/1.8/benchmark.rb:307:in `realtime'
/usr/lib/ruby/1.8/puppet/util.rb:424:in `thinmark'
/usr/lib/ruby/1.8/puppet/transaction.rb:309:in `evaluate'
/usr/lib/ruby/1.8/puppet/transaction.rb:303:in `collect'
/usr/lib/ruby/1.8/puppet/transaction.rb:303:in `evaluate'
/usr/lib/ruby/1.8/puppet/node/catalog.rb:124:in `apply'
/usr/bin/puppet:233
</pre>
Now, when I do have internet connection, things are working just fine, so there
must be some connection between (1) and (2). However ignoring it for a while, I
tried to strace puppet to see what happens around before the wait() call. This
was a bit hard. Stracing all syscalls takes a huge amount of time and produces
massive output. So in the end, after ignoring some syscalls with "-e
'!nanosleep,gettimeofday,sigprocmask'" I managed to catch puppet hanging
forever after executing the following (line numbers prepended for easy
commenting later):
<pre>
1 10764 pipe([5, 7]) = 0
2 10764 clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7cf0b18)
= 10833
3 10833 close(5 <unfinished ...>
4 10764 close(7 <unfinished ...>
5 10833 <... close resumed> ) = 0
6 10764 <... close resumed> ) = 0
7 10833 dup2(7, 1 <unfinished ...>
8 10764 fcntl64(5, F_GETFL <unfinished ...>
9 10833 <... dup2 resumed> ) = 1
10 10764 <... fcntl64 resumed> ) = 0 (flags O_RDONLY)
11 10833 close(7 <unfinished ...>
12 10764 fstat64(5, <unfinished ...>
13 10833 <... close resumed> ) = 0
14 10764 <... fstat64 resumed> {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
15 10833 close(3 <unfinished ...>
16 10764 mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
17 10833 <... close resumed> ) = 0
18 10764 <... mmap2 resumed> ) = 0xb7843000
19 10833 close(4 <unfinished ...>
20 10764 _llseek(5, 0, <unfinished ...>
21 10833 <... close resumed> ) = 0
22 10764 <... _llseek resumed> 0xbf7f5270, SEEK_CUR) = -1 ESPIPE (Illegal
seek)
23 10833 close(5 <unfinished ...>
24 10764 fstat64(5, <unfinished ...>
25 10833 <... close resumed> ) = -1 EBADF (Bad file descriptor)
26 10764 <... fstat64 resumed> {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
27 10833 close(6) = 0
28 10833 close(7 <unfinished ...>
29 10764 select(6, [5], [], [], {0, 16463} <unfinished ...>
30 10833 <... close resumed> ) = -1 EBADF (Bad file descriptor)
31 10833 close(8) = -1 EBADF (Bad file descriptor)
32 10833 close(9) = -1 EBADF (Bad file descriptor)
33 10833 close(10) = -1 EBADF (Bad file descriptor)
34 10833 close(11) = -1 EBADF (Bad file descriptor)
35 [...]
36 10833 close(253) = -1 EBADF (Bad file descriptor)
37 10833 close(254) = -1 EBADF (Bad file descriptor)
38 10833 close(255) = -1 EBADF (Bad file descriptor)
39 10833 execve("/sbin/ifconfig", ["/sbin/ifconfig", "eth0"], [/* 18 vars
*/]) = 0
40 10833 brk(0) = 0x8058000
41 10833 uname({sys="Linux", node="beczulka", ...}) = 0
42 [...]
43 10833 write(1, "eth0 Link encap:Ethernet H"..., 378 <unfinished
...>
44 10764 <... select resumed> ) = 1 (in [5], left {0, 40000})
45 10833 <... write resumed> ) = 378
46 10833 exit_group(0) = ?
47 10764 --- SIGCHLD (Child exited) @ 0 (0) ---
48 10764 select(6, [5], [], [], {0, 44385}) = 1 (in [5], left {0, 44385})
49 10764 read(5, "eth0 Link encap:Ethernet H"..., 1024) = 378
50 10764 select(6, [5], [], [], {0, 44025}) = 1 (in [5], left {0, 44025})
51 10764 read(5, "", 1024) = 0
52 10764 close(5) = 0
53 10764 munmap(0xb7843000, 4096) = 0
54 10764 rt_sigaction(SIGHUP, {SIG_IGN}, {0xb7f438f0, [HUP],
SA_RESTORER|SA_RESTART, 0xb7d1c128}, 8) = 0
55 10764 rt_sigaction(SIGQUIT, {SIG_IGN}, {0xb7f438f0, [QUIT],
SA_RESTORER|SA_RESTART, 0xb7d1c128}, 8) = 0
56 10764 rt_sigaction(SIGINT, {SIG_IGN}, {0xb7f438f0, [INT],
SA_RESTORER|SA_RESTART, 0xb7d1c128}, 8) = 0
57 10764 waitpid(10833, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG)
= 10833
58 10764 rt_sigaction(SIGHUP, {0xb7f438f0, [HUP], SA_RESTORER|SA_RESTART,
0xb7d1c128}, {SIG_IGN}, 8) = 0
59 10764 rt_sigaction(SIGQUIT, {0xb7f438f0, [QUIT],
SA_RESTORER|SA_RESTART, 0xb7d1c128}, {SIG_IGN}, 8) = 0
60 10764 rt_sigaction(SIGINT, {0xb7f438f0, [INT], SA_RESTORER|SA_RESTART,
0xb7d1c128}, {SIG_IGN}, 8) = 0
61 10764 select(0, [], [], [], {0, 0}) = 0 (Timeout)
62 10764 pipe([5, 7]) = 0
63 10764 clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7cf0b18)
= 10834
64 10834 close(5 <unfinished ...>
65 10764 close(7 <unfinished ...>
66 10834 <... close resumed> ) = 0
67 10764 <... close resumed> ) = 0
68 10834 dup2(7, 1 <unfinished ...>
69 10764 fcntl64(5, F_GETFL <unfinished ...>
70 10834 <... dup2 resumed> ) = 1
71 10764 <... fcntl64 resumed> ) = 0 (flags O_RDONLY)
72 10834 close(7 <unfinished ...>
73 10764 fstat64(5, <unfinished ...>
74 10834 <... close resumed> ) = 0
75 10764 <... fstat64 resumed> {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
76 10834 close(3 <unfinished ...>
77 10764 mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
78 10834 <... close resumed> ) = 0
79 10764 <... mmap2 resumed> ) = 0xb7843000
80 10834 close(4 <unfinished ...>
81 10764 _llseek(5, 0, <unfinished ...>
82 10834 <... close resumed> ) = 0
83 10764 <... _llseek resumed> 0xbf7f6510, SEEK_CUR) = -1 ESPIPE (Illegal
seek)
84 10834 close(5 <unfinished ...>
85 10764 fstat64(5, <unfinished ...>
86 10834 <... close resumed> ) = -1 EBADF (Bad file descriptor)
87 10764 <... fstat64 resumed> {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
88 10834 close(6) = 0
89 10834 close(7 <unfinished ...>
90 10764 select(6, [5], [], [], {0, 32148} <unfinished ...>
91 10834 <... close resumed> ) = -1 EBADF (Bad file descriptor)
92 10834 close(8) = -1 EBADF (Bad file descriptor)
93 10834 close(9) = -1 EBADF (Bad file descriptor)
94 10834 close(10) = -1 EBADF (Bad file descriptor)
95 [...]
96 10834 close(253) = -1 EBADF (Bad file descriptor)
97 10834 close(254) = -1 EBADF (Bad file descriptor)
98 10834 close(255) = -1 EBADF (Bad file descriptor)
99 10834 execve("/sbin/ifconfig", ["/sbin/ifconfig"], [/* 18 vars */]) = 0
100 10834 brk(0) = 0x8058000
101 10834 uname({sys="Linux", node="beczulka", ...}) = 0
102 10834 write(1, "lo Link encap:Local Loopb"..., 781 <unfinished
...>
103 10764 <... select resumed> ) = 1 (in [5], left {0, 0})
104 10834 <... write resumed> ) = 781
105 10834 exit_group(0) = ?
106 10764 --- SIGCHLD (Child exited) @ 0 (0) ---
107 10764 select(6, [5], [], [], {0, 0}) = 1 (in [5], left {0, 0})
108 10764 waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) =
10834
109 10764 waitpid(-1, 0xbf7ee868, WNOHANG) = 0
110 10764 select(0, [], [], [], {0, 0}) = 0 (Timeout)
111 10764 read(5, "lo Link encap:Local Loopb"..., 1024) = 781
112 10764 select(0, [], [], [], {0, 0}) = 0 (Timeout)
113 10764 select(6, [5], [], [], {0, 59201}) = 1 (in [5], left {0, 59201})
114 10764 read(5, "", 1024) = 0
115 10764 close(5) = 0
116 10764 munmap(0xb7843000, 4096) = 0
117 10764 rt_sigaction(SIGHUP, {SIG_IGN}, {0xb7f438f0, [HUP],
SA_RESTORER|SA_RESTART, 0xb7d1c128}, 8) = 0
118 10764 rt_sigaction(SIGQUIT, {SIG_IGN}, {0xb7f438f0, [QUIT],
SA_RESTORER|SA_RESTART, 0xb7d1c128}, 8) = 0
119 10764 rt_sigaction(SIGINT, {SIG_IGN}, {0xb7f438f0, [INT],
SA_RESTORER|SA_RESTART, 0xb7d1c128}, 8) = 0
120 10764 waitpid(10834, 0xbf7f66a8, WNOHANG) = 0
121 10764 select(0, [], [], [], {0, 57695}) = 0 (Timeout)
122 10764 select(0, [], [], [], {0, 4040}) = 0 (Timeout)
123 10764 select(0, [], [], [], {0, 0}) = 0 (Timeout)
124 10764 waitpid(-1, 0xbf7ee868, WNOHANG) = 0
125 10764 waitpid(10834, 0xbf7f66a8, WNOHANG) = 0
126 10764 select(0, [], [], [], {0, 0}) = 0 (Timeout)
127 10764 select(0, [], [], [], {0, 59357}) = 0 (Timeout)
128 10764 select(0, [], [], [], {0, 920}) = 0 (Timeout)
129 10764 select(0, [], [], [], {0, 0}) = 0 (Timeout)
130 10764 waitpid(10834, 0xbf7f66a8, WNOHANG) = 0
131 10764 waitpid(-1, 0xbf7ee868, WNOHANG) = 0
132 10764 select(0, [], [], [], {0, 0}) = 0 (Timeout)
133 10764 select(0, [], [], [], {0, 59303}) = 0 (Timeout)
134 10764 select(0, [], [], [], {0, 901}) = 0 (Timeout)
135 10764 select(0, [], [], [], {0, 0}) = 0 (Timeout)
136 10764 waitpid(-1, 0xbf7ee868, WNOHANG) = 0
137 10764 waitpid(10834, 0xbf7f66a8, WNOHANG) = 0
138 10764 select(0, [], [], [], {0, 0}) = 0 (Timeout)
139 10764 select(0, [], [], [], {0, 59654}) = 0 (Timeout)
[...] goes on forever
</pre>
My comments:
* lines 1-38 is basic pipe setup/fork/close all FDs
* 39-46: "ifconfig wlan0" does its thing
* 47-60: puppet reads ifconfig's output first, and then reaps it
So far, so good - all is well. But now notice what happens on the following
lines: the command this time is just "ifconfig", and all looks similar until
line 106. However then:
* line 107: there the timeout in select() is zero,
* line 108: the child is reaped straight away,
* line 111: only now the output is read,
* line 120: puppet makes an attempt to reap the same child again!
The circumstances are slightly different, because in the first case ECHILD is
returned and reported, but in the second case, no error is returned, and puppet
waits forever for a child with PID 10834 to exit, which will never happen.
However I think that the reason for a race might be the same - some generic
SIGCHLD handler that comes in and reaps the child before some other code can
call wait()?
FYI grepping through the log reveals that the following execve()s were called
before it hung:
<pre>
10766 execve("/usr/bin/which", ["which", "which"], [/* 21 vars */]) = 0
10768 execve("/usr/bin/which", ["which", "uname"], [/* 21 vars */]) = 0
10769 execve("/bin/uname", ["uname", "-s"], [/* 18 vars */]) = 0
10771 execve("/usr/bin/which", ["which", "uname"], [/* 21 vars */]) = 0
10772 execve("/bin/uname", ["uname", "-r"], [/* 18 vars */]) = 0
10774 execve("/usr/bin/which", ["which", "hostname"], [/* 21 vars */]) = 0
10775 execve("/bin/hostname", ["hostname"], [/* 18 vars */]) = 0
10777 execve("/usr/bin/which", ["which", "dnsdomainname"], [/* 21 vars */]) = 0
10778 execve("/bin/dnsdomainname", ["dnsdomainname"], [/* 18 vars */]) = 0
10780 execve("/usr/bin/which", ["which", "domainname"], [/* 21 vars */]) = 0
10782 execve("/usr/bin/which", ["which", "sendmail"], [/* 21 vars */]) = 0
10784 execve("/usr/bin/which", ["which", "dnsdomainname"], [/* 21 vars */]) = 0
10785 execve("/bin/dnsdomainname", ["dnsdomainname"], [/* 18 vars */]) = 0
10787 execve("/usr/bin/which", ["which", "domainname"], [/* 21 vars */]) = 0
10789 execve("/usr/bin/which", ["which", "lsb_release"], [/* 21 vars */]) = 0
10791 execve("/usr/bin/lsb_release", ["lsb_release", "-a"], [/* 21 vars */]) = 0
10792 <... execve resumed> ) = 0
10793 execve("/usr/bin/apt-cache", ["apt-cache", "policy"], [/* 21 vars */]) = 0
10796 <... execve resumed> ) = 0
10797 execve("/usr/bin/dpkg-query", ["dpkg-query", "-f", "${Version}
${Provides}\n", "-W", "lsb-core", "lsb-cxx", "lsb-graphics", "lsb-desktop",
"lsb-qt4", "lsb-languages", "lsb-multimedia",
+"lsb-printing"], [/* 21 vars */]) = 0
10802 execve("/sbin/ifconfig", ["/sbin/ifconfig", "-a"], [/* 18 vars */]) = 0
10804 execve("/usr/sbin/dmidecode", ["/usr/sbin/dmidecode"], [/* 21 vars */]) = 0
10805 execve("/sbin/ifconfig", ["/sbin/ifconfig", "-a"], [/* 18 vars */]) = 0
10806 execve("/sbin/ip", ["/sbin/ip", "link", "show", "wmaster0"], [/* 18 vars
*/]) = 0
10807 execve("/sbin/ifconfig", ["/sbin/ifconfig", "wmaster0"], [/* 18 vars */])
= 0
10808 execve("/sbin/ip", ["/sbin/ip", "link", "show", "eth0"], [/* 18 vars */])
= 0
10809 execve("/sbin/ifconfig", ["/sbin/ifconfig", "eth0"], [/* 18 vars */]) = 0
10811 execve("/usr/bin/which", ["which", "uptime"], [/* 21 vars */]) = 0
10812 execve("/usr/bin/uptime", ["uptime"], [/* 18 vars */]) = 0
10813 execve("/sbin/ifconfig", ["/sbin/ifconfig"], [/* 18 vars */]) = 0
10816 execve("/usr/bin/which", ["which", "host"], [/* 21 vars */]) = 0
10817 execve("/usr/bin/host", ["host", "beczulka"], [/* 18 vars */]) = 0
10821 execve("/sbin/ifconfig", ["/sbin/ifconfig"], [/* 18 vars */]) = 0
10823 execve("/usr/bin/which", ["which", "grep"], [/* 21 vars */]) = 0
10826 execve("/bin/grep", ["grep", "physical id", "/proc/cpuinfo"], [/* 21 vars
*/] <unfinished ...>
10826 <... execve resumed> ) = 0
10827 execve("/usr/bin/cut", ["cut", "-d:", "-f", "2"], [/* 21 vars */]
<unfinished ...>
10827 <... execve resumed> ) = 0
10828 execve("/usr/bin/sort", ["sort", "-u"], [/* 21 vars */] <unfinished ...>
10828 <... execve resumed> ) = 0
10829 execve("/usr/bin/wc", ["wc", "-l"], [/* 21 vars */] <unfinished ...>
10829 <... execve resumed> ) = 0
10830 execve("/sbin/ip", ["/sbin/ip", "link", "show", "wlan0"], [/* 18 vars
*/]) = 0
10831 execve("/sbin/ifconfig", ["/sbin/ifconfig", "wlan0"], [/* 18 vars */]) = 0
10832 execve("/sbin/ip", ["/sbin/ip", "link", "show", "eth0"], [/* 18 vars */])
= 0
10833 execve("/sbin/ifconfig", ["/sbin/ifconfig", "eth0"], [/* 18 vars */]) = 0
10834 execve("/sbin/ifconfig", ["/sbin/ifconfig"], [/* 18 vars */]) = 0
</pre>
This is puppet 0.24.8-1gg4 (the differences between this and Debian 0.24.8-1
only seem to touch packaging).
----------------------------------------
Bug #2211: Facter timeouts reap all subprocesses thus confusing Puppet
http://projects.reductivelabs.com/issues/2211
Author: John Florian
Status: Accepted
Priority: High
Assigned to: Luke Kanies
Category: library
Target version: 1.6.0
Complexity: Unknown
Affected version: 0.24.8
Keywords:
It is no longer possible to have puppet install packages via yum/rpm if the
network interface is not bound to an IP address. Our use case requires using
puppet in the non-daemon mode and this is possible for us because the system
will have all necessary manifests and other necessary files locally. This
worked just fine with 0.24.6 on Fedora 10, but began failing upon the upgrade
to 0.24.8.
See the attachments for failure messages and a code diff that seems to have
introduced the regression. If I revert this one change, things work nicely
once again. Looks like a very simple fix if it weren't for the ominous looking
comment in the code. :-)
--
You have received this notification because you have either subscribed to it,
or are involved in it.
To change your notification preferences, please click here:
http://reductivelabs.com/redmine/my/account
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups
"Puppet Bugs" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/puppet-bugs?hl=en
-~----------~----~----~----~------~----~------~--~---