Re: svn commit: r250411 - in head/sys: conf kern sys

Jeff Roberson Sat, 11 May 2013 17:48:52 -0700

On Thu, 9 May 2013, Marcel Moolenaar wrote:

Author: marcel
Date: Thu May  9 16:28:18 2013
New Revision: 250411
URL: http://svnweb.freebsd.org/changeset/base/250411


Log:
 Add option WITNESS_NO_VNODE to suppress printing LORs between VNODE
 locks. To support this, VNODE locks are created with the LK_IS_VNODE
 flag. This flag is propagated down using the LO_IS_VNODE flag.

 Note that WITNESS still records the LOR. Only the printing and the
 optional entering into the kernel debugger is bypassed with the
 WITNESS_NO_VNODE option.

I'm replying to the original commit because the resulting thread got wayout of hand. We need to all take a deep breath and take a pragmaticapproach to solving the problem at hand.

Let me first say I understand the utility here as this is also coming upin my organization. Test, and users, do not want to see erroneous warningmessages. I understand that. Let's find a solution.

Secondly, I think this project has grown too far for us to commit changeslike this without some focused discussion. We need to be more mindful ofthe size of the impact and the number of people who are interested in aparticular area. I'm not picking on you Marcel because this sort of thinghas been coming up lately and we have all been guilty of it from time totime. There are more companies and individuals than ever trying to pushwork into the repository and we're having some growing pains.

I am intimately familiar with the problems that lead to these erroneouswitness messages as I have tracked down many of them and am evenresponsible for the code that generates them in some cases. Let me firstoutline a handful of generic problems. The root cause is that witness cannot determine the real order between two locks due to relationships toocomplex to describe with a pair of strings.

One example, which has been brought up, is the hierarchical nature ofvnode locks. This impacts vnodes within one filesystem but it alsoinvolves vnodes between two different filesystems as you cross mountpoints. We can construct perfectly valid and deadlock free chains ofmount points that have two different filesystem types in different orderswhich will LOR at the boundaries. We already skip duplicates to avoidthis problem within each filesystem. We need to skip cross-filesystemduplicates, most desirably at the few specific places where this happens.This problem comes up especially for devfs because we lock devvps whilefile vnodes are locked but we lock devfs directories after the rootfs lockwhen crossing mountpoints in lookup.

A second example, is locks of a fundamentally different type that have acomplex ordering relationship. For example, a vnode lock may be acquiredafter a buf lock belonging to the parent's directory block. A cg buf lockmay be acquired after any file buf lock. Here we want to ignoreinteractions between these two specific types at this particular locationbut not others as they may be unsafe.

The third example, is a complex locking pattern with shared locks aspresented by dirhash. We are seeing a similar pattern develop in the vmwhere we are going to use an exclusive object lock to protect pages or ashared object lock + a page lock. The semantics only get more complex aswe push for more scalability. I expect to see more of these patternsdevelop.

None of these problems can be solved with names alone. So far we'vejust lived with the warnings and we're no longer willing to accept that.What we need is a solution that blesses the specific instances and thespecific lock classes involved without silencing legitimate warnings thatmay only occur after new code is added. For example, it may be safe toadd a sx lock around some vnode code but you may not notice that you LORif you silence all witness warnings related to the vnode lock site.

I believe that the perfect solution would be a mechanism that could teachwitness about and enforce these specific relationships. However, that maybe computationally prohibitive and too complex to code. A more reasonableoption would be to bless the specific relationships at the specific callsites. Turning all witness off at particular sites or with particulartypes renders important infrastructure useless for very large functionalareas. It's also important to distinguish between squelching the errormessage from eliminating the other state that is saved at lock sites.

We already have lock names and types. What I would propose we do is makethe type 'vnode' for all vnodes and 'buf' for all bufs with the names usedfor the specific filesystems. Then you could specify a DUPOK thatautomatically blesses any filesystem to filesystem related LORs. In thisway witness still records the call sites and unrelated LORs or panicsstill have the acquisition information. You could eventually unwind thisto only DUPOK at the specific currently known places that we anticipatemultiple vnodes.

To solve the buf and other complex LORs involving multiple types you wouldmake lock variants that accept a blessed list of the specific locks youare blessing. We already have support for a blessed list in witness. Wejust want something like this per-call.

For example; When acquiring a child vnode lock after a parent lock whena parent's buf is held I would do something like this:


vn_lock_flags(vp, LK_EXCLUSIVE, { "vnode", "bufwait", NULL });

Written properly dead code elimination will take care of it for non-debugbuilds. You could come up with other ways of writing this that don'tinvolve passing pointers down the stack. For example, making a uniquetoken for this blessed pair and passing it in the high bits of the flags.I don't really care about the particular implementation but I think thisis the right model.


Thanks,
Jeff


Modified:
 head/sys/conf/options
 head/sys/kern/kern_lock.c
 head/sys/kern/subr_witness.c
 head/sys/kern/vfs_subr.c
 head/sys/sys/lock.h
 head/sys/sys/lockmgr.h

Modified: head/sys/conf/options
==============================================================================
--- head/sys/conf/options       Thu May  9 16:09:39 2013        (r250410)
+++ head/sys/conf/options       Thu May  9 16:28:18 2013        (r250411)
@@ -672,6 +672,7 @@ KTR_ENTRIES         opt_global.h
KTR_VERBOSE             opt_ktr.h
WITNESS                 opt_global.h
WITNESS_KDB             opt_witness.h
+WITNESS_NO_VNODE       opt_witness.h
WITNESS_SKIPSPIN        opt_witness.h

# options for ACPI support

Modified: head/sys/kern/kern_lock.c
==============================================================================
--- head/sys/kern/kern_lock.c   Thu May  9 16:09:39 2013        (r250410)
+++ head/sys/kern/kern_lock.c   Thu May  9 16:28:18 2013        (r250411)
@@ -393,6 +393,8 @@ lockinit(struct lock *lk, int pri, const
                iflags |= LO_WITNESS;
        if (flags & LK_QUIET)
                iflags |= LO_QUIET;
+       if (flags & LK_IS_VNODE)
+               iflags |= LO_IS_VNODE;
        iflags |= flags & (LK_ADAPTIVE | LK_NOSHARE);

        lk->lk_lock = LK_UNLOCKED;

Modified: head/sys/kern/subr_witness.c
==============================================================================
--- head/sys/kern/subr_witness.c        Thu May  9 16:09:39 2013        
(r250410)
+++ head/sys/kern/subr_witness.c        Thu May  9 16:28:18 2013        
(r250411)
@@ -1289,7 +1289,19 @@ witness_checkorder(struct lock_object *l
                        w->w_reversed = w1->w_reversed = 1;
                        witness_increment_graph_generation();
                        mtx_unlock_spin(&w_mtx);
-
+
+#ifdef WITNESS_NO_VNODE
+                       /*
+                        * There are known LORs between VNODE locks. They are
+                        * not an indication of a bug. VNODE locks are flagged
+                        * as such (LO_IS_VNODE) and we don't yell if the LOR
+                        * is between 2 VNODE locks.
+                        */
+                       if ((lock->lo_flags & LO_IS_VNODE) != 0 &&
+                           (lock1->li_lock->lo_flags & LO_IS_VNODE) != 0)
+                               return;
+#endif
+
                        /*
                         * Ok, yell about it.
                         */

Modified: head/sys/kern/vfs_subr.c
==============================================================================
--- head/sys/kern/vfs_subr.c    Thu May  9 16:09:39 2013        (r250410)
+++ head/sys/kern/vfs_subr.c    Thu May  9 16:28:18 2013        (r250411)
@@ -1037,7 +1037,7 @@ alloc:
         * By default, don't allow shared locks unless filesystems
         * opt-in.
         */
-       lockinit(vp->v_vnlock, PVFS, tag, VLKTIMEOUT, LK_NOSHARE);
+       lockinit(vp->v_vnlock, PVFS, tag, VLKTIMEOUT, LK_NOSHARE | LK_IS_VNODE);
        /*
         * Initialize bufobj.
         */

Modified: head/sys/sys/lock.h
==============================================================================
--- head/sys/sys/lock.h Thu May  9 16:09:39 2013        (r250410)
+++ head/sys/sys/lock.h Thu May  9 16:28:18 2013        (r250411)
@@ -79,6 +79,7 @@ struct lock_class {
#define LO_SLEEPABLE    0x00100000      /* Lock may be held while sleeping. */
#define LO_UPGRADABLE   0x00200000      /* Lock may be upgraded/downgraded. */
#define LO_DUPOK        0x00400000      /* Don't check for duplicate acquires */
+#define        LO_IS_VNODE     0x00800000      /* Tell WITNESS about a VNODE 
lock */
#define LO_CLASSMASK    0x0f000000      /* Class index bitmask. */
#define LO_NOPROFILE    0x10000000      /* Don't profile this lock */


Modified: head/sys/sys/lockmgr.h
==============================================================================
--- head/sys/sys/lockmgr.h      Thu May  9 16:09:39 2013        (r250410)
+++ head/sys/sys/lockmgr.h      Thu May  9 16:28:18 2013        (r250411)
@@ -146,6 +146,7 @@ _lockmgr_args_rw(struct lock *lk, u_int
#define LK_NOWITNESS    0x000010
#define LK_QUIET        0x000020
#define LK_ADAPTIVE     0x000040
+#define        LK_IS_VNODE     0x000080        /* Tell WITNESS about a VNODE 
lock */

/*
 * Additional attributes to be used in lockmgr().

_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

Re: svn commit: r250411 - in head/sys: conf kern sys

Reply via email to