Issue #4923 has been updated by Brice Figureau.
I think I now understand the problem of the filelocking, although I didn't
prove it by a testcase.
Currently we're doing something like this when write locking:
<pre>
File.open(file, "w", mode) do |rf|
rf.lock_exclusive do |lrf|
yield lrf
end
end
</pre>
where file is the target file. Unfortunately open in 'w' mode does an open with
O_TRUNC which truncates the file immediately.
Yes, way before we get an exclusive lock on the file :)
Which means another process (not another thread since we use a Sync) can lock
the file and write or read during the truncation.
I propose the following (untested) fix:
<pre>
File.open(file, File::Constants::WRONLY | File::Constants::CREATE |
File::Constants::TRUNC | 0x020, mode) do |rf|
yield rf
end
</pre>
(0x20 is O_EXLOCK but it doesn't seem to be defined in ruby :().
A more portable solution of course would be to write the file in a temporary
file and atomically move it to the target, as it was done before. We could
eliminate the locking altogether (but that would certainly require to sync the
directory or sth which might not be possible in ruby).
----------------------------------------
Bug #4923: Puppet file locking is not thread safe
https://projects.puppetlabs.com/issues/4923
Author: Brice Figureau
Status: Accepted
Priority: High
Assignee: Markus Roberts
Category: threading
Target version: 2.6.x
Affected version: 2.6.1
Keywords:
Branch:
Jason Wright discovered that running puppet-load with a high concurrency (ie >
10) was randomly producing the following error on a multiprocess passenger
system:
<pre>
failed: Could not parse YAML data
for node thiscert-isss-forr-thee-healthchecks: syntax error on line
10, col 2: ` domain: thiscert-isss-forr-thee-healthchecks'
</pre>
With the following stacktrace:
<pre>
/usr/lib/ruby/1.8/puppet/indirector/yaml.rb:22:in `find'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:208:in `find_in_cache'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:184:in `find'
/usr/lib/ruby/1.8/puppet/indirector.rb:50:in `find'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:90:in `find_node'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:114:in
`node_from_request'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:32:in `find'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:193:in `find'
/usr/lib/ruby/1.8/puppet/indirector.rb:50:in `find'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:101:in `do_find'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `send'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `process'
/usr/lib/ruby/1.8/puppet/network/http/rack.rb:51:in `call'
/usr/lib/ruby/1.8/phusion_passenger/rack/request_handler.rb:95:in
`process_request'
/usr/lib/ruby/1.8/phusion_passenger/abstract_request_handler.rb:207:in
`main_loop'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:118:in `run'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:69:in
`spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/utils.rb:184:in `safe_fork'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:62:in
`spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:45:in
`spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/spawn_manager.rb:159:in `spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/spawn_manager.rb:287:in
`handle_spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:352:in `__send__'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:352:in `main_loop'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:196:in
`start_synchronously'
/usr/lib/phusion_passenger/passenger-spawn-server:61
</pre>
I was able to reproduce the issue locally with a 2 processes mongrel, and could
find that the issue is a corruption of the node facts cache on the master.
It does happen frequently with puppet-load because this one asks the catalog of
only one node but more than one time concurrently (which is improbable in
production).
I found that the Puppet::Util::FileLocking module wasn't correctly threadsafe,
certainly because the reader/writer ruby Sync lock has a bug (at least on MRI
1.8.7).
Here is the failure scenario:
* process 1, thread 1 enters the thread write lock, calls flock in exclusive
mode, and starts writing the yaml file.
* process 1, thread 2 enters the thread read lock (which shouldn't happen) and
call flock in shared mode. This *downgrades* the exclusive file lock to a
shared lock
* process 2, thread 1 enters the thread write lock, calls flock in exclusive
mode _and_ succeed. It starts writing the yaml file. *Corruption happens*
* process 1, thread 1 resumes and finishes writing the yaml. *file is corrupted*
I was able to fix locally the issue by using a mutually exclusive critical
section (see the soon to come patch).
--
You have received this notification because you have either subscribed to it,
or are involved in it.
To change your notification preferences, please click here:
http://projects.puppetlabs.com/my/account
--
You received this message because you are subscribed to the Google Groups
"Puppet Bugs" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/puppet-bugs?hl=en.