On Thu, 2011-07-21 at 21:00 -0700, Linus Torvalds wrote:
> On Thu, Jul 21, 2011 at 8:52 PM, Ian Kent <ra...@themaw.net> wrote:
> >
> > Oh, sorry my bad, obviously I missed the significance of moving the
> > inode update, I still don't get it, so I'll have a look at the
> > surrounding code again.
> 
> So it's admittedly pretty subtle. The issue is that during the RCU
> pathname walk, since we (very much on purpose) do not take any d_lock
> or even increment the dentry count, the d_inode field is not stable.

Thanks very much for the explanation.

In hindsight, the specific case I was addressing was for a dentry that
has no mount on it so I shouldn't have changed the inode assignment
anyway, oops.

> 
> So when we read from dentry->d_inode, we need to re-check the sequence
> count afterwards.
> 
> The exceptions ends up being:
> 
>  - In __d_lookup_rcu(), we carefully load d_inode (and check the
> name), and verify the dentry sequence number afterwards with the
> proper memory barriers, so the *inode value that __d_lookup_rcu()
> returns is valid.
> 
>  - for mount-points (ie at the *end* of the loop, after we've checked
> it), we can do the load, because those dentries are guaranteed to be
> pinned.
> 
>  - after we've turned the speculative RCU walk into a non-speculative
> walk with 'unlazy_walk()' (ie we've incremented the dentry refcount
> and we've checked the sequence count), d_inode is stable again. So we
> can trust dentries once we've fully done the walk and finished the
> whole link_path_walk() or whatever.
> 
> Those exceptions (especially the last one) cover *most* of the kernel.
> But __follow_mount_rcu() is solidly in the speculative RCU path walk
> area, which is why reading d_inode was buggy there without the extra
> tests for pinning.
> 
>                        Linus


_______________________________________________
stable mailing list
stable@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/stable

Reply via email to