Issue #4923 has been updated by Brice Figureau.

Brice Figureau wrote:
> I think I now understand the problem of the filelocking, although I didn't 
> prove it by a testcase.
> 
> Currently we're doing something like this when write locking:
> [...]
> 
> where file is the target file. Unfortunately open in 'w' mode does an open 
> with O_TRUNC which truncates the file immediately.
> Yes, way before we get an exclusive lock on the file :)
> Which means another process (not another thread since we use a Sync) can lock 
> the file and write or read during the truncation.
> 
> I propose the following (untested) fix:
> [...]
> (0x20 is O_EXLOCK but it doesn't seem to be defined in ruby :().
> 
> A more portable solution of course would be to write the file in a temporary 
> file and atomically move it to the target, as it was done before. We could 
> eliminate the locking altogether (but that would certainly require to sync 
> the directory or sth which might not be possible in ruby).

OK, this change fixes the issue. Unfortunately the open flag O_EXLOCK is not 
POSIX, so I think the following can be used and still be portable:
<pre>
      File.open(file, File::Constants::CREAT | File::Constants::WRONLY , mode) 
do |rf|
        rf.lock_exclusive do |lrf|
          # poor's man open(2) O_EXLOCK|O_TRUNC 
          lrf.seek(0, IO::SEEK_SET)
          lrf.truncate(0)
          yield lrf
        end
      end
</pre>

This second fix also fixes the issue. I'll send a patch later.
----------------------------------------
Bug #4923: Puppet file locking is not thread safe
https://projects.puppetlabs.com/issues/4923

Author: Brice Figureau
Status: Accepted
Priority: High
Assignee: Markus Roberts
Category: threading
Target version: 2.6.x
Affected version: 2.6.1
Keywords: 
Branch: 


Jason Wright discovered that running puppet-load with a high concurrency (ie > 
10) was randomly producing the following error on a multiprocess passenger 
system:
<pre>
failed: Could not parse YAML data
for node thiscert-isss-forr-thee-healthchecks: syntax error on line
10, col 2: `  domain: thiscert-isss-forr-thee-healthchecks'
</pre>

With the following stacktrace:
<pre>
/usr/lib/ruby/1.8/puppet/indirector/yaml.rb:22:in `find'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:208:in `find_in_cache'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:184:in `find'
/usr/lib/ruby/1.8/puppet/indirector.rb:50:in `find'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:90:in `find_node'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:114:in 
`node_from_request'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:32:in `find'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:193:in `find'
/usr/lib/ruby/1.8/puppet/indirector.rb:50:in `find'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:101:in `do_find'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `send'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `process'
/usr/lib/ruby/1.8/puppet/network/http/rack.rb:51:in `call'
/usr/lib/ruby/1.8/phusion_passenger/rack/request_handler.rb:95:in 
`process_request'
/usr/lib/ruby/1.8/phusion_passenger/abstract_request_handler.rb:207:in 
`main_loop'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:118:in `run'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:69:in 
`spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/utils.rb:184:in `safe_fork'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:62:in 
`spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:45:in 
`spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/spawn_manager.rb:159:in `spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/spawn_manager.rb:287:in 
`handle_spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:352:in `__send__'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:352:in `main_loop'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:196:in 
`start_synchronously'
/usr/lib/phusion_passenger/passenger-spawn-server:61

</pre>

I was able to reproduce the issue locally with a 2 processes mongrel, and could 
find that the issue is a corruption of the node facts cache on the master.

It does happen frequently with puppet-load because this one asks the catalog of 
only one node but more than one time concurrently (which is improbable in 
production).

I found that the Puppet::Util::FileLocking module wasn't correctly threadsafe, 
certainly because the reader/writer ruby Sync lock has a bug (at least on MRI 
1.8.7).

Here is the failure scenario:
* process 1, thread 1 enters the thread write lock, calls flock in exclusive 
mode, and starts writing the yaml file.
* process 1, thread 2 enters the thread read lock (which shouldn't happen) and 
call flock in shared mode. This *downgrades* the exclusive file lock to a 
shared lock
* process 2, thread 1 enters the thread write lock, calls flock in exclusive 
mode _and_ succeed. It starts writing the yaml file. *Corruption happens*
* process 1, thread 1 resumes and finishes writing the yaml. *file is corrupted*

I was able to fix locally the issue by using a mutually exclusive critical 
section (see the soon to come patch).






-- 
You have received this notification because you have either subscribed to it, 
or are involved in it.
To change your notification preferences, please click here: 
http://projects.puppetlabs.com/my/account

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Bugs" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/puppet-bugs?hl=en.

Reply via email to