Issue #4923 has been updated by Brice Figureau.
Brice Figureau wrote:
> I think I now understand the problem of the filelocking, although I didn't
> prove it by a testcase.
>
> Currently we're doing something like this when write locking:
> [...]
>
> where file is the target file. Unfortunately open in 'w' mode does an open
> with O_TRUNC which truncates the file immediately.
> Yes, way before we get an exclusive lock on the file :)
> Which means another process (not another thread since we use a Sync) can lock
> the file and write or read during the truncation.
>
> I propose the following (untested) fix:
> [...]
> (0x20 is O_EXLOCK but it doesn't seem to be defined in ruby :().
>
> A more portable solution of course would be to write the file in a temporary
> file and atomically move it to the target, as it was done before. We could
> eliminate the locking altogether (but that would certainly require to sync
> the directory or sth which might not be possible in ruby).
OK, this change fixes the issue. Unfortunately the open flag O_EXLOCK is not
POSIX, so I think the following can be used and still be portable:
<pre>
File.open(file, File::Constants::CREAT | File::Constants::WRONLY , mode)
do |rf|
rf.lock_exclusive do |lrf|
# poor's man open(2) O_EXLOCK|O_TRUNC
lrf.seek(0, IO::SEEK_SET)
lrf.truncate(0)
yield lrf
end
end
</pre>
This second fix also fixes the issue. I'll send a patch later.
----------------------------------------
Bug #4923: Puppet file locking is not thread safe
https://projects.puppetlabs.com/issues/4923
Author: Brice Figureau
Status: Accepted
Priority: High
Assignee: Markus Roberts
Category: threading
Target version: 2.6.x
Affected version: 2.6.1
Keywords:
Branch:
Jason Wright discovered that running puppet-load with a high concurrency (ie >
10) was randomly producing the following error on a multiprocess passenger
system:
<pre>
failed: Could not parse YAML data
for node thiscert-isss-forr-thee-healthchecks: syntax error on line
10, col 2: ` domain: thiscert-isss-forr-thee-healthchecks'
</pre>
With the following stacktrace:
<pre>
/usr/lib/ruby/1.8/puppet/indirector/yaml.rb:22:in `find'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:208:in `find_in_cache'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:184:in `find'
/usr/lib/ruby/1.8/puppet/indirector.rb:50:in `find'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:90:in `find_node'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:114:in
`node_from_request'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:32:in `find'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:193:in `find'
/usr/lib/ruby/1.8/puppet/indirector.rb:50:in `find'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:101:in `do_find'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `send'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `process'
/usr/lib/ruby/1.8/puppet/network/http/rack.rb:51:in `call'
/usr/lib/ruby/1.8/phusion_passenger/rack/request_handler.rb:95:in
`process_request'
/usr/lib/ruby/1.8/phusion_passenger/abstract_request_handler.rb:207:in
`main_loop'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:118:in `run'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:69:in
`spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/utils.rb:184:in `safe_fork'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:62:in
`spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:45:in
`spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/spawn_manager.rb:159:in `spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/spawn_manager.rb:287:in
`handle_spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:352:in `__send__'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:352:in `main_loop'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:196:in
`start_synchronously'
/usr/lib/phusion_passenger/passenger-spawn-server:61
</pre>
I was able to reproduce the issue locally with a 2 processes mongrel, and could
find that the issue is a corruption of the node facts cache on the master.
It does happen frequently with puppet-load because this one asks the catalog of
only one node but more than one time concurrently (which is improbable in
production).
I found that the Puppet::Util::FileLocking module wasn't correctly threadsafe,
certainly because the reader/writer ruby Sync lock has a bug (at least on MRI
1.8.7).
Here is the failure scenario:
* process 1, thread 1 enters the thread write lock, calls flock in exclusive
mode, and starts writing the yaml file.
* process 1, thread 2 enters the thread read lock (which shouldn't happen) and
call flock in shared mode. This *downgrades* the exclusive file lock to a
shared lock
* process 2, thread 1 enters the thread write lock, calls flock in exclusive
mode _and_ succeed. It starts writing the yaml file. *Corruption happens*
* process 1, thread 1 resumes and finishes writing the yaml. *file is corrupted*
I was able to fix locally the issue by using a mutually exclusive critical
section (see the soon to come patch).
--
You have received this notification because you have either subscribed to it,
or are involved in it.
To change your notification preferences, please click here:
http://projects.puppetlabs.com/my/account
--
You received this message because you are subscribed to the Google Groups
"Puppet Bugs" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to
[email protected].
For more options, visit this group at
http://groups.google.com/group/puppet-bugs?hl=en.