Issue #4923 has been updated by Brice Figureau.
Markus Roberts wrote: > I'd like to see this fixed in 2.6.x if possible, though I'm not seeing it as > new (and thus our policy would be to fix it in 2.7). I'm also raising it to > high as we have to start flushing out these threading problems. I'd really like someone else than me looking at this file_locking code. It looks really suspicious, but then check the git history for more puzzling. Then look at the integration test that doesn't test multi-process file locking (flock is active on a process level, not a thread level)... ---------------------------------------- Bug #4923: Puppet file locking is not thread safe http://projects.puppetlabs.com/issues/4923 Author: Brice Figureau Status: Accepted Priority: High Assignee: Category: threading Target version: 2.6.x Affected version: 2.6.1 Keywords: Branch: Jason Wright discovered that running puppet-load with a high concurrency (ie > 10) was randomly producing the following error on a multiprocess passenger system: <pre> failed: Could not parse YAML data for node thiscert-isss-forr-thee-healthchecks: syntax error on line 10, col 2: ` domain: thiscert-isss-forr-thee-healthchecks' </pre> With the following stacktrace: <pre> /usr/lib/ruby/1.8/puppet/indirector/yaml.rb:22:in `find' /usr/lib/ruby/1.8/puppet/indirector/indirection.rb:208:in `find_in_cache' /usr/lib/ruby/1.8/puppet/indirector/indirection.rb:184:in `find' /usr/lib/ruby/1.8/puppet/indirector.rb:50:in `find' /usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:90:in `find_node' /usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:114:in `node_from_request' /usr/lib/ruby/1.8/puppet/indirector/catalog/compiler.rb:32:in `find' /usr/lib/ruby/1.8/puppet/indirector/indirection.rb:193:in `find' /usr/lib/ruby/1.8/puppet/indirector.rb:50:in `find' /usr/lib/ruby/1.8/puppet/network/http/handler.rb:101:in `do_find' /usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `send' /usr/lib/ruby/1.8/puppet/network/http/handler.rb:68:in `process' /usr/lib/ruby/1.8/puppet/network/http/rack.rb:51:in `call' /usr/lib/ruby/1.8/phusion_passenger/rack/request_handler.rb:95:in `process_request' /usr/lib/ruby/1.8/phusion_passenger/abstract_request_handler.rb:207:in `main_loop' /usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:118:in `run' /usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:69:in `spawn_application' /usr/lib/ruby/1.8/phusion_passenger/utils.rb:184:in `safe_fork' /usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:62:in `spawn_application' /usr/lib/ruby/1.8/phusion_passenger/rack/application_spawner.rb:45:in `spawn_application' /usr/lib/ruby/1.8/phusion_passenger/spawn_manager.rb:159:in `spawn_application' /usr/lib/ruby/1.8/phusion_passenger/spawn_manager.rb:287:in `handle_spawn_application' /usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:352:in `__send__' /usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:352:in `main_loop' /usr/lib/ruby/1.8/phusion_passenger/abstract_server.rb:196:in `start_synchronously' /usr/lib/phusion_passenger/passenger-spawn-server:61 </pre> I was able to reproduce the issue locally with a 2 processes mongrel, and could find that the issue is a corruption of the node facts cache on the master. It does happen frequently with puppet-load because this one asks the catalog of only one node but more than one time concurrently (which is improbable in production). I found that the Puppet::Util::FileLocking module wasn't correctly threadsafe, certainly because the reader/writer ruby Sync lock has a bug (at least on MRI 1.8.7). Here is the failure scenario: # process 1, thread 1 enters the thread write lock, calls flock in exclusive mode, and starts writing the yaml file. # process 1, thread 2 enters the thread read lock (which shouldn't happen) and call flock in shared mode. This *downgrades* the exclusive file lock to a shared lock # process 2, thread 1 enters the thread write lock, calls flock in exclusive mode _and_ succeed. It starts writing the yaml file. *Corruption happens* # process 1, thread 1 resumes and finishes writing the yaml. *file is corrupted* I was able to fix locally the issue by using a mutually exclusive critical section (see the soon to come patch). -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here: http://projects.puppetlabs.com/my/account -- You received this message because you are subscribed to the Google Groups "Puppet Bugs" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/puppet-bugs?hl=en.
