[Puppet - Bug #4923] Puppet file locking is not thread safe

tickets Sat, 16 Oct 2010 03:46:18 -0700

Issue #4923 has been updated by Brice Figureau.


I think I now understand the file-locking problem, although I haven't proven it 
with a test case.

Currently we do something like this when taking the write lock:
<pre>
      File.open(file, "w", mode) do |rf|
        rf.lock_exclusive do |lrf|
          yield lrf
        end
      end
</pre>

where file is the target file. Unfortunately, opening in 'w' mode calls open(2) 
with O_TRUNC, which truncates the file immediately.
Yes, well before we get an exclusive lock on the file :)
This means another process (not another thread, since we use a Sync) can be 
holding the lock and reading or writing the file while it is being truncated.
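
The ordering problem can be reproduced without Puppet. The sketch below assumes a POSIX system where flock(2) locks are advisory, so open(2) never waits for them. One handle holds LOCK_EX, and a second File.open in 'w' mode still empties the file at once. The helper name size_after_w_open_while_locked is hypothetical.

```ruby
require "tmpdir"

# Hypothetical helper: returns the file size seen right after a second handle
# opens the file in "w" mode while a first handle holds an exclusive flock.
def size_after_w_open_while_locked(path)
  File.open(path, "w") { |f| f.write("complete yaml document\n") }

  File.open(path, "r+") do |holder|
    holder.flock(File::LOCK_EX)   # the "writer" owns the exclusive lock
    # open(2) with O_TRUNC does not consult flock, so this returns at once
    # and truncates the file before any lock would even be requested.
    File.open(path, "w") {}
    return File.size(path)
  end
end

Dir.mktmpdir do |dir|
  puts size_after_w_open_while_locked(File.join(dir, "node.yaml"))
end
```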

I propose the following (untested) fix:
<pre>
      File.open(file, File::Constants::WRONLY | File::Constants::CREAT |
                      File::Constants::TRUNC | 0x20, mode) do |rf|
          yield rf
      end
</pre>
(0x20 is O_EXLOCK on the BSDs, but Ruby doesn't seem to define the constant, and 
Linux doesn't provide O_EXLOCK at all :( ).
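
Without O_EXLOCK, a more portable way to get the same "lock before truncate" ordering is to open without O_TRUNC, take the flock, and only then truncate. This is a sketch under that assumption, not the patch from this ticket. write_locked and read_locked are hypothetical names.

```ruby
require "tmpdir"

# Hypothetical writer: O_TRUNC is deliberately left out of the open flags.
def write_locked(path, mode = 0644)
  File.open(path, File::WRONLY | File::CREAT, mode) do |f|
    f.flock(File::LOCK_EX)   # wait until no reader or writer holds the lock
    f.truncate(0)            # only now is it safe to discard the old contents
    yield f
    f.flush                  # hand data to the OS before close drops the lock
  end
end

# Hypothetical reader: a shared lock excludes writers but not other readers.
def read_locked(path)
  File.open(path, "r") do |f|
    f.flock(File::LOCK_SH)
    f.read
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "node.yaml")
  write_locked(path) { |f| f.write("--- a much longer first version\n") }
  write_locked(path) { |f| f.write("--- short\n") }
  puts read_locked(path)
end
```

The second write is shorter than the first, so any leftover bytes in the result would mean the truncation did not happen.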

A more portable solution, of course, would be to write to a temporary file and 
atomically rename it onto the target, as was done before. We could then 
eliminate the locking altogether (but that would certainly require syncing the 
directory or something similar, which might not be possible in Ruby).
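
The temporary-file approach might look like the following sketch. It places the temporary file in the target's own directory, because rename(2) is only atomic within one filesystem. The directory fsync is treated as best effort, since it may not work on every platform. atomic_write is a hypothetical name.

```ruby
require "tmpdir"

# Hypothetical helper: write a complete new file next to the target, then
# rename it into place so readers see either the old or the new contents.
def atomic_write(path, mode = 0644)
  dir = File.dirname(path)
  # Unique per process and thread; same directory as the target.
  tmp = File.join(dir, ".#{File.basename(path)}.#{Process.pid}.#{Thread.current.object_id}.tmp")
  begin
    File.open(tmp, File::WRONLY | File::CREAT | File::EXCL, mode) do |f|
      yield f
      f.flush
      f.fsync                     # file contents on disk before the rename
    end
    File.rename(tmp, path)        # atomic replacement of the target
  rescue StandardError
    File.unlink(tmp) if File.exist?(tmp)
    raise
  end

  # Best effort: persist the directory entry change. Opening a directory and
  # calling fsync on it is commonly done on Linux but may fail elsewhere.
  begin
    File.open(dir, "r") { |d| d.fsync }
  rescue SystemCallError
    nil
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "node.yaml")
  atomic_write(path) { |f| f.write("--- first\n") }
  atomic_write(path) { |f| f.write("--- second\n") }
  puts File.read(path)
end
```

Readers need no lock here. Concurrent writers still race, but the last rename wins with a complete file rather than an interleaved one.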
----------------------------------------
Bug #4923: Puppet file locking is not thread safe
https://projects.puppetlabs.com/issues/4923

Author: Brice Figureau
Status: Accepted
Priority: High
Assignee: Markus Roberts
Category: threading
Target version: 2.6.x
Affected version: 2.6.1
Keywords: 
Branch: 


Jason Wright discovered that running puppet-load with high concurrency (i.e. > 
10) randomly produced the following error on a multi-process Passenger 
system:
<pre>
failed: Could not parse YAML data for node thiscert-isss-forr-thee-healthchecks: syntax error on line 10, col 2: `  domain: thiscert-isss-forr-thee-healthchecks'
</pre>

With the following stacktrace:
<pre>
/usr/lib/ruby/1.8/puppet/indirector/yaml.rb:22:in `find'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:208:in `find_in_cache'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:184:in `find'
/usr/lib/ruby/1.8/puppet/indirector.rb:50:in `find'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:90:in `find_node'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:114:in `node_from_request'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:32:in `find'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:193:in `find'
/usr/lib/ruby/1.8/puppet/indirector.rb:50:in `find'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:101:in `do_find'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `send'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `process'
/usr/lib/ruby/1.8/puppet/network/http/rack.rb:51:in `call'
/usr/lib/ruby/1.8/phusion_passenger/rack/request_handler.rb:95:in `process_request'
/usr/lib/ruby/1.8/phusion_passenger/abstract_request_handler.rb:207:in `main_loop'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:118:in `run'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:69:in `spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/utils.rb:184:in `safe_fork'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:62:in `spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:45:in `spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/spawn_manager.rb:159:in `spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/spawn_manager.rb:287:in `handle_spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:352:in `__send__'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:352:in `main_loop'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:196:in `start_synchronously'
/usr/lib/phusion_passenger/passenger-spawn-server:61
</pre>

I was able to reproduce the issue locally with a two-process Mongrel setup, and 
found that the cause is corruption of the node facts cache on the master.

It happens frequently with puppet-load because it requests the catalog of a 
single node several times concurrently (which is improbable in production).

I found that the Puppet::Util::FileLocking module isn't correctly thread-safe, 
probably because Ruby's reader/writer Sync lock has a bug (at least on MRI 
1.8.7).

Here is the failure scenario:
* process 1, thread 1 enters the thread write lock, calls flock in exclusive 
mode, and starts writing the YAML file.
* process 1, thread 2 enters the thread read lock (which shouldn't happen) and 
calls flock in shared mode. This *downgrades* the exclusive file lock to a 
shared lock.
* process 2, thread 1 enters the thread write lock, calls flock in exclusive 
mode _and_ succeeds. It starts writing the YAML file. *Corruption happens.*
* process 1, thread 1 resumes and finishes writing the YAML. *The file is corrupted.*
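
The *downgrade* step relies on flock(2) converting an existing lock when it is called again on the same open file description. The sketch below shows that conversion in isolation. It assumes a POSIX system with a native flock(2), and the helper name is hypothetical. A second, independent descriptor cannot take a shared lock while the first holds LOCK_EX, but it can as soon as the first calls flock(LOCK_SH). It does not show how the two Puppet threads came to act on the same lock.

```ruby
require "tmpdir"

# Hypothetical helper: returns [before, after], the results of a non-blocking
# shared-lock attempt on a second descriptor around the first one's downgrade.
def shared_lock_around_downgrade(path)
  File.open(path, "w") {}                    # make sure the file exists
  File.open(path, "r+") do |first|
    File.open(path, "r+") do |second|        # independent open file description
      first.flock(File::LOCK_EX)
      before = second.flock(File::LOCK_SH | File::LOCK_NB)  # false: blocked by LOCK_EX
      first.flock(File::LOCK_SH)             # converts LOCK_EX into LOCK_SH
      after = second.flock(File::LOCK_SH | File::LOCK_NB)   # 0: shared locks coexist
      [before, after]
    end
  end
end

Dir.mktmpdir do |dir|
  p shared_lock_around_downgrade(File.join(dir, "node.yaml"))
end
```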

I was able to fix the issue locally by using a mutually exclusive critical 
section (see the upcoming patch).
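
A mutually exclusive critical section of that kind could be sketched as follows. This is an illustration under assumptions, not the actual patch, and MutexFileLock is a hypothetical module rather than Puppet::Util::FileLocking. A single process-wide Mutex replaces the reader/writer Sync, so no two threads in one process hold flock on the file at the same time. flock still coordinates separate processes.

```ruby
require "tmpdir"

# Hypothetical module, not Puppet::Util::FileLocking.
module MutexFileLock
  # One process-wide lock: at most one thread at a time is inside any
  # readlock/writelock section, readers included.
  LOCK = Mutex.new

  def self.readlock(path)
    LOCK.synchronize do
      File.open(path, "r") do |f|
        f.flock(File::LOCK_SH)   # still needed against other processes
        yield f
      end
    end
  end

  def self.writelock(path, mode = 0644)
    LOCK.synchronize do
      # Open without O_TRUNC and truncate only once the exclusive lock is held.
      File.open(path, File::WRONLY | File::CREAT, mode) do |f|
        f.flock(File::LOCK_EX)
        f.truncate(0)
        yield f
        f.flush
      end
    end
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "node.yaml")
  MutexFileLock.writelock(path) { |f| f.write("--- cached facts\n") }
  MutexFileLock.readlock(path) { |f| puts f.read }
end
```

The design trades reader concurrency within a process for correctness. Ruby's Mutex is not reentrant, so calling writelock from inside a readlock block in the same thread raises ThreadError instead of deadlocking silently.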






