On Thu, 2010-09-30 at 06:01 -0700, Nigel Kersten wrote:
> On Thu, Sep 30, 2010 at 1:21 AM, Brice Figureau
> <[email protected]> wrote:
> > On Wed, 2010-09-29 at 17:32 -0700, Jason Wright wrote:
> >> On Wed, Sep 29, 2010 at 1:54 PM, Brice Figureau
> >> <[email protected]> wrote:
> >> > It would be great if you could add some debug statements to the
> >> > lib/puppet/indirector/yaml.rb file around line 22 to show what the YAML
> >> > look like, and/or what cache it was trying to load.
> >>
> >> I added
> >>
> >>     Puppet.debug("FOO: failed to read YAML from #{file}") if yaml.nil? or yaml.to_s == ""
> >>
> >> at line 19 of puppet/indirector/yaml.rb and it's logging when I run
> >> puppet-load so it looks like something is failing in readlock().
> >
> > Yes that was my gut feeling too.
> > I think part of the issue is that puppet-load asks always for the same
> > node. In real world setups it is improbable that the master has to
> > answer the same question at exactly the same time.
> > So I think there is a race in the indirector yaml caching subsystem. It
> > looks like readlock and writelock are not doing their job.

I found several issues that are worth looking into:

1) Puppet::Util.sync doesn't seem thread-safe
Two threads can enter this method at the same time for the same
resource, so it might be possible to exit with two different Sync
instances for the same resource. The chance is low with MRI
green threads, but it can happen under JRuby. That means one thread
can write the file while another is reading it (flock is per
process and shouldn't block another thread of the same process).
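
To illustrate (1), here is a minimal sketch of a per-resource lock lookup
that cannot hand out two different lock objects for the same resource. It
is not Puppet's code: the SyncRegistry module is a hypothetical name, and
Monitor stands in for the Sync class only to keep the example
self-contained. The idea is simply to guard the lookup-or-create step with
a Mutex.

```ruby
require 'monitor'

# Hypothetical stand-in for a per-resource lock registry; not Puppet's code.
module SyncRegistry
  @guard = Mutex.new   # protects the @locks hash itself
  @locks = {}

  # Return the single lock object for +resource+, creating it at most once
  # even when several threads ask for it at the same time.
  def self.sync(resource)
    @guard.synchronize do
      @locks[resource] ||= Monitor.new
    end
  end
end

# Many threads asking for the same resource should all see the same object.
threads = (1..20).map do
  Thread.new { SyncRegistry.sync("/tmp/node.yaml").object_id }
end
distinct_locks = threads.map(&:value).uniq.size
puts "distinct lock objects: #{distinct_locks}"
```

Without the guard, the check and the assignment are two separate steps, and
that gap is where two threads could each create their own lock.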

2) lib/puppet/external/lock.rb seems incomplete
Notice how the lock_shared part does flock(LOCK_UN) only based on
$reader_count which is never incremented (you can compare with the
original version linked in the comment).
So basically we never unlock our read locks :)
I suppose that closing the file is enough to remove the lock
(hopefully).
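
To make (2) concrete, here is a rough sketch of how a reader count is
supposed to work: each reader increments it on entry, decrements it on exit,
and the last reader out releases the shared lock. This is not the code from
lock.rb nor the original it links to; the CountedSharedLock class and its
method names are hypothetical, and it assumes all readers in the process
share one File object.

```ruby
require 'tempfile'

# Hypothetical helper, not Puppet's lock.rb: counts readers so that the
# last one out releases the shared flock.
class CountedSharedLock
  attr_reader :readers

  def initialize(file)
    @file = file
    @readers = 0
    @guard = Mutex.new   # protects @readers
  end

  # Take a shared lock, yield the file, and release the lock once no
  # reader is left.
  def lock_shared
    acquire
    begin
      yield @file
    ensure
      release
    end
  end

  private

  def acquire
    @guard.synchronize do
      @file.flock(File::LOCK_SH) if @readers.zero?
      @readers += 1   # the increment that seems to be missing
    end
  end

  def release
    @guard.synchronize do
      @readers -= 1
      @file.flock(File::LOCK_UN) if @readers.zero?
    end
  end
end

Tempfile.create("node") do |f|
  lock = CountedSharedLock.new(f)
  lock.lock_shared { |io| io.read }
  # Here the count is back to zero and the flock has been released.
end
```

If the count is never incremented, the unlock condition no longer tracks
the number of readers at all, which is the problem described above.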

I think it would be great if someone besides Jason, Nigel and me could
have a look at this issue (this is a hint for the PL team) :)

I'll try to reproduce it on my side if I can achieve the same
concurrency as you have (I don't have any powerful test machines, nor
any load balancers :)).

> > Can you summarize on what os/filesystems type/ruby versions you are running 
> > your master?
> >
> > Hmm, could it be that the node yaml (ie $yamldir) is on NFS or any
> > filesystem that have issues with file locks?
> 
> Just to avoid the timezone round trip because I woke up early :) Jason
> will either be benchmarking on Ubuntu Hardy or Lucid, and I think he's
> just on the standard Ruby versions there at the moment.
> 
> Probably 1.8.6.111-2ubuntu1.3 or 1.8.7.249-2

OK, nothing fancy, then.

> They're definitely not on NFS.

But can $vardir be on NFS or any unlockable filesystem?
-- 
Brice Figureau
Follow the latest Puppet Community evolutions on www.planetpuppet.org!
