On Thu, 2010-09-30 at 06:01 -0700, Nigel Kersten wrote:
> On Thu, Sep 30, 2010 at 1:21 AM, Brice Figureau
> <[email protected]> wrote:
> > On Wed, 2010-09-29 at 17:32 -0700, Jason Wright wrote:
> >> On Wed, Sep 29, 2010 at 1:54 PM, Brice Figureau
> >> <[email protected]> wrote:
> >> > It would be great if you could add some debug statements to the
> >> > lib/puppet/indirector/yaml.rb file around line 22 to show what the YAML
> >> > look like, and/or what cache it was trying to load.
> >>
> >> I added
> >>
> >> Puppet.debug("FOO: failed to read YAML from #{file}") if yaml.nil?
> >> or yaml.to_s == ""
> >>
> >> at line 19 of puppet/indirector/yaml.rb and it's logging when I run
> >> puppet-load so it looks like something is failing in readlock().
> >
> > Yes that was my gut feeling too.
> > I think part of the issue is that puppet-load asks always for the same
> > node. In real world setups it is improbable that the master has to
> > answer the same question at exactly the same time.
> > So I think there is a race in the indirector yaml caching subsystem. It
> > looks like readlock and writelock are not doing their job.

I found several issues that are worth looking into:

1) Puppet::Util.sync doesn't seem thread-safe

Two threads can enter this method at the same time for the same
resource, so it might be possible to exit with two different Sync
instances for the same resource. The chance is low with MRI
green threads, but it can happen under JRuby. That means one thread can
write the file at the same time as another reads it (flock is per
process, so it shouldn't block a thread within the same process).

2) lib/puppet/external/lock.rb seems incomplete

Notice how the lock_shared part calls flock(LOCK_UN) based only on
$reader_count, which is never incremented (you can compare with the
original version linked in the comment). So basically we never unlock
our read locks :)
I suppose that closing the file is enough to remove the lock
(hopefully).

It would be great if someone besides Jason, Nigel and me could have a
look at this issue (this is a hint for the PL team) :)

I'll try to reproduce it on my side if I can achieve the same
concurrency as you have (I don't have any powerful test machines, nor
any load balancers :)).

> > Can you summarize on what os/filesystems type/ruby versions you are running
> > your master?
> >
> > Hmm, could it be that the node yaml (ie $yamldir) is on NFS or any
> > filesystem that have issues with file locks?
>
> Just to avoid the timezone round trip because I woke up early :) Jason
> will either be benchmarking on Ubuntu Hardy or Lucid, and I think he's
> just on the standard Ruby versions there at the moment.
>
> Probably 1.8.6.111-2ubuntu1.3 or 1.8.7.249-2

OK, nothing fancy, then.

> They're definitely not on NFS.

But can $vardir be on NFS or any unlockable filesystem?
-- 
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!
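
To illustrate the race described in point 1: a minimal sketch, not
Puppet's actual code, of a per-resource lock registry where creation of
the lock object is itself guarded by a mutex. LockRegistry and lock_for
are hypothetical names chosen for this example, and Monitor merely
stands in for whatever lock object the real code hands out.

```ruby
require 'thread'
require 'monitor'

# Hypothetical stand-in for a per-resource lock table (not Puppet's code).
class LockRegistry
  def initialize
    @guard = Mutex.new # protects the @locks hash itself
    @locks = {}
  end

  # Return the single lock object for +resource+. The lookup-or-create
  # step runs under @guard, so two threads asking for the same resource
  # at the same time cannot each end up with their own lock object.
  def lock_for(resource)
    @guard.synchronize { @locks[resource] ||= Monitor.new }
  end
end

registry = LockRegistry.new

# Several threads request the lock for the same (made-up) resource name.
locks = Array.new(8) do
  Thread.new { registry.lock_for("node.yaml") }
end.map(&:value)
```

Without the guard around the lookup-or-create step, two threads could
both see a missing entry and each create a separate lock, which is the
situation suspected above.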
