Date: Sat, 20 Oct 2012 04:06:13 +0000
From: David Holland <dholland-sourcechan...@netbsd.org>

On Fri, Oct 19, 2012 at 07:58:34AM +0100, David Laight wrote:
 > > Committed By: riastradh
 > > Date: Fri Oct 19 02:07:23 UTC 2012
 > >
 > > rename("a/b", "a/c") and rename("a/c/x", "a/b/y") will deadlock.
 >
 > Surely it just converts rename("a/c/x", "a/b/y") into
 > rename("a/c/x", "a/c/y") which isn't quite the intended operation.

No, it will (or can) deadlock. The problem is that for the second
(cross-directory) rename, as far as it can tell a/c and a/b are
incommensurate, so it locks in an arbitrary order (which for reasons
of internal convenience is a/b first then a/c) but the same-dir rename
locks a/b first then a/c.

We could avoid this case by breaking ties using vnode address.

 > Maybe convert the fs-wide rename lock into a rw-lock and only require
 > read access for a same-directory rename.

That will not help.

It seems pretty clear to me that an rwlock would avoid this particular
case too. However, it's not clear to me that either of these
approaches yields a deadlock-free result.

The proof that our current rename locking is deadlock-free is easy:
Except for rename, everything holds at most two vnode locks at once,
in parent-to-child order, and nothing holds locks on pairs of
incommensurate vnodes. Rename locks in ancestor-to-descendant order
when the order exists, which includes parent-to-child order. It is
also the only operation that can lock sets of incommensurate nodes,
and it does so only within directories that nobody else can have
locked simultaneously, so if there's only one rename operation in
flight, the lock order it uses on incommensurate nodes doesn't matter.
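
For concreteness, the "break ties using vnode address" idea mentioned
above could be sketched roughly as follows. This is only an
illustration, not the NetBSD code: struct node, order_pair(),
lock_pair() and unlock_pair() are hypothetical names, and pthread
mutexes stand in for vnode locks. It shows the tie-break alone and
does nothing to settle whether the overall scheme is deadlock-free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a directory vnode; not NetBSD's struct vnode. */
struct node {
	pthread_mutex_t lock;
};

/*
 * Sort two distinct nodes by address so that every caller agrees on
 * which one gets locked first.  Comparing through uintptr_t avoids
 * relying on relational comparison of pointers to unrelated objects.
 */
static void
order_pair(struct node *a, struct node *b,
    struct node **first, struct node **second)
{
	assert(a != b);
	if ((uintptr_t)a < (uintptr_t)b) {
		*first = a;
		*second = b;
	} else {
		*first = b;
		*second = a;
	}
}

/* Lock two incommensurate nodes, lower address first. */
static void
lock_pair(struct node *a, struct node *b)
{
	struct node *first, *second;

	order_pair(a, b, &first, &second);
	pthread_mutex_lock(&first->lock);
	pthread_mutex_lock(&second->lock);
}

/* Release both locks; the unlock order does not matter here. */
static void
unlock_pair(struct node *a, struct node *b)
{
	pthread_mutex_unlock(&a->lock);
	pthread_mutex_unlock(&b->lock);
}
```

With this, lock_pair(x, y) and lock_pair(y, x) acquire the two locks
in the same order, so two callers that disagree about which node is
"first" can no longer deadlock against each other on that pair alone.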