I saw a deadlock between an "ls" and an "rm" command on our Regatta AIX 5.2 system, which I was able to analyze using kdb.

It turned out that the lock order was violated in afs_vnop_remove.c:
"rm" held a dcache lock while trying to obtain a vcache lock.
"ls" held a read lock on the vcache while trying to obtain the dcache lock.
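To make the ordering rule concrete, here is a minimal sketch of a lock-ranking check in C. It assumes the order implied above (vcache lock before dcache lock) and only models those two levels. RANK_VCACHE, RANK_DCACHE, acquire(), release() and the *_path() functions are hypothetical names for illustration, not part of OpenAFS. Each simulated path reports whether it could take its locks without violating the order.

```c
#include <stdbool.h>

/* Hypothetical ranks: a lock may only be taken if no lock of equal or
 * higher rank is already held. Assumed order: vcache before dcache. */
enum lock_rank { RANK_VCACHE = 1, RANK_DCACHE = 2 };

/* Bitmask of ranks currently held by one (simulated) thread. */
struct lock_state { unsigned held; };

/* True if no bit at position >= want is set, i.e. the order is respected. */
static bool may_acquire(const struct lock_state *st, enum lock_rank want)
{
    return (st->held >> want) == 0;
}

static bool acquire(struct lock_state *st, enum lock_rank r)
{
    if (!may_acquire(st, r))
        return false;            /* would invert the lock order */
    st->held |= 1u << r;
    return true;
}

static void release(struct lock_state *st, enum lock_rank r)
{
    st->held &= ~(1u << r);
}

/* "ls": read lock on the vcache, then the dcache lock. */
static bool ls_path(void)
{
    struct lock_state st = {0};
    return acquire(&st, RANK_VCACHE) && acquire(&st, RANK_DCACHE);
}

/* "rm" before the patch: dcache lock held while asking for the vcache lock. */
static bool rm_path_unpatched(void)
{
    struct lock_state st = {0};
    return acquire(&st, RANK_DCACHE) && acquire(&st, RANK_VCACHE);
}

/* "rm" after the patch: drop the dcache lock, take and drop the vcache
 * lock, then reacquire the dcache lock. */
static bool rm_path_patched(void)
{
    struct lock_state st = {0};
    if (!acquire(&st, RANK_DCACHE))
        return false;
    release(&st, RANK_DCACHE);
    if (!acquire(&st, RANK_VCACHE))
        return false;
    release(&st, RANK_VCACHE);
    return acquire(&st, RANK_DCACHE);
}
```

In this model ls_path() and rm_path_patched() respect the order, while rm_path_unpatched() is rejected at the point where the real "rm" blocked.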

--- afs_vnop_remove.c.orig      2005-05-30 06:05:44.000000000 +0200
+++ afs_vnop_remove.c   2006-02-21 16:05:06.000000000 +0100
@@ -349,6 +349,8 @@
     if (tvc && osi_Active(tvc)) {
        /* about to delete whole file, prefetch it first */
        ReleaseWriteLock(&adp->lock);
+       if (tdc)
+           ReleaseSharedLock(&tdc->lock);
        ObtainWriteLock(&tvc->lock, 143);
 #if    defined(AFS_OSF_ENV)
        afs_Wire(tvc, &treq);
@@ -357,6 +359,8 @@
 #endif
        ReleaseWriteLock(&tvc->lock);
        ObtainWriteLock(&adp->lock, 144);
+        if (tdc)
+           ObtainSharedLock(&tdc->lock, 1638);
     }

     osi_dnlc_remove(adp, aname, tvc);


This diff applies to OpenAFS 1.4.0. The lock-site number 1638 is meant to echo 638, the number used where this lock was originally obtained.

Hartmut
-----------------------------------------------------------------
Hartmut Reuter                           e-mail [EMAIL PROTECTED]
                                           phone +49-89-3299-1328
RZG (Rechenzentrum Garching)               fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
