On Thu, Jul 21, 2011 at 8:52 PM, Ian Kent <[email protected]> wrote:
>
> Oh, sorry my bad, obviously I missed the significance of moving the
> inode update, I still don't get it, so I'll have a look at the
> surrounding code again.

So it's admittedly pretty subtle. The issue is that during the RCU
pathname walk, since we (very much on purpose) do not take any d_lock
or even increment the dentry count, the d_inode field is not stable.

So when we read from dentry->d_inode, we need to re-check the sequence
count afterwards.

The exceptions ends up being:

 - In __d_lookup_rcu(), we carefully load d_inode (and check the
name), and verify the dentry sequence number afterwards with the
proper memory barriers, so the *inode value that __d_lookup_rcu()
returns is valid.

 - for mount-points (ie at the *end* of the loop, after we've checked
it), we can do the load, because those dentries are guaranteed to be
pinned.

 - after we've turned the speculative RCU walk into a non-speculative
walk with 'unlazy_walk()' (ie we've incremented the dentry refcount
and we've checked the sequence count), d_inode is stable again. So we
can trust dentries once we've fully done the walk and finished the
whole link_path_walk() or whatever.

Those exceptions (especially the last one) cover *most* of the kernel.
But __follow_mount_rcu() is solidly in the speculative RCU path walk
area, which is why reading d_inode was buggy there without the extra
tests for pinning.

                       Linus

_______________________________________________
stable mailing list
[email protected]
http://linux.kernel.org/mailman/listinfo/stable

Reply via email to