Issue #4923 has been updated by Brice Figureau.

Chintan Shah wrote:
> Brice Figureau wrote:
> > Markus Roberts wrote:
> > > I'd like to see this fixed in 2.6.x if possible, though I'm not seeing it 
> > > as new (and thus our policy would be to fix it in 2.7).  I'm also raising 
> > > it to high as we have to start flushing out these threading problems.
> > 
> > I'd really like someone other than me to look at this file_locking code. It 
> > looks really suspicious, but then check the git history for more puzzles. 
> > Then look at the integration test, which doesn't test multi-process file 
> > locking (flock works at the process level, not the thread level)...
> 
> Looking at your patch, I feel that the issue has merely shifted from a 
> file-based lock to a hash, but the goal of thread safety is still not 
> achieved. The hash itself is thread-unsafe. Doesn't one have to use new_cond, 
> wait and signal on the @@mutexes?

I rejected my own patch because my analysis of the issue was quite wrong.

Regarding the original issue, the problem is a multi-process file locking 
issue, not a threading issue (although the original version is still incorrect, 
because the Puppet::Util.sync method is not thread safe).
I'm not sure I understand your comment. The @@mutexes hash is extended with 
MonitorMixin, and its only access is thread safe (it happens inside a mutually 
exclusive critical section), or did I miss something?
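For reference, here is a minimal sketch of the pattern I'm describing, with hypothetical names (`LockRegistry`, `mutex_for`) standing in for the actual patch: a class-level hash of per-path mutexes, where every access to the hash itself goes through MonitorMixin's critical section.

```ruby
require 'monitor'

# Illustrative sketch, not the actual patch. The hash of per-path mutexes
# is itself protected by MonitorMixin: extending an object with
# MonitorMixin gives it a #synchronize method backed by a monitor, so two
# threads can never race on reading/writing the hash.
class LockRegistry
  @@mutexes = {}
  @@mutexes.extend(MonitorMixin)

  # Returns the one Mutex associated with +path+, creating it on first use.
  # The ||= inside the critical section is safe: only one thread at a time
  # can be inside @@mutexes.synchronize.
  def self.mutex_for(path)
    @@mutexes.synchronize do
      @@mutexes[path] ||= Mutex.new
    end
  end
end
```

Callers would then take `LockRegistry.mutex_for(path).synchronize { ... }` around the file access itself; the monitor only guards the registry, not the file.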

----------------------------------------
Bug #4923: Puppet file locking is not thread safe
http://projects.puppetlabs.com/issues/4923

Author: Brice Figureau
Status: Accepted
Priority: High
Assignee: Markus Roberts
Category: threading
Target version: 2.6.x
Affected version: 2.6.1
Keywords: 
Branch: 


Jason Wright discovered that running puppet-load with high concurrency (i.e. > 
10) randomly produced the following error on a multi-process Passenger 
system:
<pre>
failed: Could not parse YAML data
for node thiscert-isss-forr-thee-healthchecks: syntax error on line
10, col 2: `  domain: thiscert-isss-forr-thee-healthchecks'
</pre>

With the following stacktrace:
<pre>
/usr/lib/ruby/1.8/puppet/indirector/yaml.rb:22:in `find'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:208:in `find_in_cache'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:184:in `find'
/usr/lib/ruby/1.8/puppet/indirector.rb:50:in `find'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:90:in `find_node'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:114:in 
`node_from_request'
/usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:32:in `find'
/usr/lib/ruby/1.8/puppet/indirector/indirection.rb:193:in `find'
/usr/lib/ruby/1.8/puppet/indirector.rb:50:in `find'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:101:in `do_find'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `send'
/usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `process'
/usr/lib/ruby/1.8/puppet/network/http/rack.rb:51:in `call'
/usr/lib/ruby/1.8/phusion_passenger/rack/request_handler.rb:95:in 
`process_request'
/usr/lib/ruby/1.8/phusion_passenger/abstract_request_handler.rb:207:in 
`main_loop'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:118:in `run'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:69:in 
`spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/utils.rb:184:in `safe_fork'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:62:in 
`spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:45:in 
`spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/spawn_manager.rb:159:in `spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/spawn_manager.rb:287:in 
`handle_spawn_application'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:352:in `__send__'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:352:in `main_loop'
/usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:196:in 
`start_synchronously'
/usr/lib/phusion_passenger/passenger-spawn-server:61

</pre>

I was able to reproduce the issue locally with a 2-process Mongrel setup, and 
found that the problem is corruption of the node facts cache on the master.

It happens frequently with puppet-load because puppet-load requests the catalog 
of a single node more than once concurrently (which is unlikely in 
production).

I found that the Puppet::Util::FileLocking module wasn't actually thread safe, 
most likely because Ruby's reader/writer Sync lock has a bug (at least on MRI 
1.8.7).

Here is the failure scenario:
* process 1, thread 1 enters the thread write lock, calls flock in exclusive 
mode, and starts writing the YAML file.
* process 1, thread 2 enters the thread read lock (which shouldn't happen) and 
calls flock in shared mode. This *downgrades* the exclusive file lock to a 
shared lock.
* process 2, thread 1 enters the thread write lock, calls flock in exclusive 
mode _and_ succeeds. It starts writing the YAML file. *Corruption happens.*
* process 1, thread 1 resumes and finishes writing the YAML. *The file is 
corrupted.*
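The downgrade in the second step can be demonstrated directly (this assumes a POSIX system where Kernel#fork is available and the usual BSD flock semantics apply, e.g. Linux): re-flocking the *same* file descriptor converts the existing lock to the new mode instead of queueing a second request.

```ruby
require 'tempfile'

file = Tempfile.new('flock-demo')
path = file.path

# Step 1: the "writer" takes an exclusive lock, as process 1, thread 1 does.
file.flock(File::LOCK_EX)

# A fresh descriptor in another process cannot get even a shared lock now;
# with LOCK_NB, flock returns false instead of blocking.
pid = fork do
  exit!(File.open(path).flock(File::LOCK_SH | File::LOCK_NB) == false ? 0 : 1)
end
Process.wait(pid)
refused_while_exclusive = $?.exitstatus == 0

# Step 2: re-flocking the SAME descriptor in shared mode does not block;
# it silently *converts* (downgrades) the exclusive lock to a shared one.
file.flock(File::LOCK_SH)

# Now the other process gets its shared lock, even though the "writer"
# never released anything.
pid = fork do
  exit!(File.open(path).flock(File::LOCK_SH | File::LOCK_NB) == 0 ? 0 : 1)
end
Process.wait(pid)
granted_after_downgrade = $?.exitstatus == 0

puts "shared lock refused while exclusive held: #{refused_while_exclusive}"
puts "shared lock granted after downgrade: #{granted_after_downgrade}"
```

In the real failure the second flock(LOCK_SH) comes from another thread sharing the descriptor, but the effect on the kernel lock is the same.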

I was able to fix the issue locally by using a mutually exclusive critical 
section (see the soon-to-come patch).
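As an illustration only (the name `locked_access` is made up, and the actual patch may differ), the idea is that a plain Mutex serializes threads within the process while an exclusive flock serializes processes; since no shared flock is ever requested, the exclusive-to-shared downgrade cannot happen:

```ruby
require 'tempfile'

# Illustrative sketch of the fix, not the actual patch.
LOCK_MUTEX = Mutex.new

def locked_access(path)
  LOCK_MUTEX.synchronize do                     # excludes other threads
    File.open(path, File::CREAT | File::RDWR) do |f|
      f.flock(File::LOCK_EX)                    # excludes other processes,
      yield f                                   # for readers and writers alike
    end                                         # flock released on close
  end
end

# Usage: concurrent appends stay intact because each block runs alone.
tmp = Tempfile.new('locked-access-demo')
10.times.map { |i|
  Thread.new do
    locked_access(tmp.path) do |f|
      f.seek(0, IO::SEEK_END)
      f.write("entry #{i}\n")
    end
  end
}.each(&:join)
puts File.readlines(tmp.path).length  # 10 complete lines, none interleaved
```

A single global Mutex is the bluntest version of this; a per-path mutex would give finer-grained locking at the cost of managing a registry.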






-- 
You have received this notification because you have either subscribed to it, 
or are involved in it.
To change your notification preferences, please click here: 
http://projects.puppetlabs.com/my/account

-- 
You received this message because you are subscribed to the Google Groups 
"Puppet Bugs" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/puppet-bugs?hl=en.
