On Thu, 9 May 2013, Marcel Moolenaar wrote:

Author: marcel
Date: Thu May  9 16:28:18 2013
New Revision: 250411
URL: http://svnweb.freebsd.org/changeset/base/250411

Log:
 Add option WITNESS_NO_VNODE to suppress printing LORs between VNODE
 locks. To support this, VNODE locks are created with the LK_IS_VNODE
 flag. This flag is propagated down using the LO_IS_VNODE flag.

 Note that WITNESS still records the LOR. Only the printing and the
 optional entering into the kernel debugger is bypassed with the
 WITNESS_NO_VNODE option.

I'm replying to the original commit because the resulting thread got way out of hand. We need to all take a deep breath and take a pragmatic approach to solving the problem at hand.

Let me first say I understand the utility here as this is also coming up in my organization. Test, and users, do not want to see erroneous warning messages. I understand that. Let's find a solution.

Secondly, I think this project has grown too far for us to commit changes like this without some focused discussion. We need to be more mindful of the size of the impact and the number of people who are interested in a particular area. I'm not picking on you Marcel because this sort of thing has been coming up lately and we have all been guilty of it from time to time. There are more companies and individuals than ever trying to push work into the repository and we're having some growing pains.

I am intimately familiar with the problems that lead to these erroneous witness messages as I have tracked down many of them and am even responsible for the code that generates them in some cases. Let me first outline a handful of generic problems. The root cause is that witness can not determine the real order between two locks due to relationships too complex to describe with a pair of strings.

One example, which has been brought up, is the hierarchical nature of vnode locks. This impacts vnodes within one filesystem but it also involves vnodes between two different filesystems as you cross mount points. We can construct perfectly valid and deadlock free chains of mount points that have two different filesystem types in different orders which will LOR at the boundaries. We already skip duplicates to avoid this problem within each filesystem. We need to skip cross-filesystem duplicates, most desirably at the few specific places where this happens. This problem comes up especially for devfs because we lock devvps while file vnodes are locked but we lock devfs directories after the rootfs lock when crossing mountpoints in lookup.

A second example, is locks of a fundamentally different type that have a complex ordering relationship. For example, a vnode lock may be acquired after a buf lock belonging to the parent's directory block. A cg buf lock may be acquired after any file buf lock. Here we want to ignore interactions between these two specific types at this particular location but not others as they may be unsafe.

The third example, is a complex locking pattern with shared locks as presented by dirhash. We are seeing a similar pattern develop in the vm where we are going to use an exclusive object lock to protect pages or a shared object lock + a page lock. The semantics only get more complex as we push for more scalability. I expect to see more of these patterns develop.

None of these problems can be solved with names alone. So far we've just lived with the warnings and we're no longer willing to accept that. What we need is a solution that blesses the specific instances and the specific lock classes involved without silencing legitimate warnings that may only occur after new code is added. For example, it may be safe to add a sx lock around some vnode code but you may not notice that you LOR if you silence all witness warnings related to the vnode lock site.

I believe that the perfect solution would be a mechanism that could teach witness about and enforce these specific relationships. However, that may be computationally prohibitive and too complex to code. A more reasonable option would be to bless the specific relationships at the specific call sites. Turning all witness off at particular sites or with particular types renders important infrastructure useless for very large functional areas. It's also important to distinguish between squelching the error message from eliminating the other state that is saved at lock sites.

We already have lock names and types. What I would propose we do is make the type 'vnode' for all vnodes and 'buf' for all bufs with the names used for the specific filesystems. Then you could specify a DUPOK that automatically blesses any filesystem to filesystem related LORs. In this way witness still records the call sites and unrelated LORs or panics still have the acquisition information. You could eventually unwind this to only DUPOK at the specific currently known places that we anticipate multiple vnodes.

To solve the buf and other complex LORs involving multiple types you would make lock variants that accept a blessed list of the specific locks you are blessing. We already have support for a blessed list in witness. We just want something like this per-call.

For example; When acquiring a child vnode lock after a parent lock when a parent's buf is held I would do something like this:

vn_lock_flags(vp, LK_EXCLUSIVE, { "vnode", "bufwait", NULL });

Written properly dead code elimination will take care of it for non-debug builds. You could come up with other ways of writing this that don't involve passing pointers down the stack. For example, making a unique token for this blessed pair and passing it in the high bits of the flags. I don't really care about the particular implementation but I think this is the right model.

Thanks,
Jeff


Modified:
 head/sys/conf/options
 head/sys/kern/kern_lock.c
 head/sys/kern/subr_witness.c
 head/sys/kern/vfs_subr.c
 head/sys/sys/lock.h
 head/sys/sys/lockmgr.h

Modified: head/sys/conf/options
==============================================================================
--- head/sys/conf/options       Thu May  9 16:09:39 2013        (r250410)
+++ head/sys/conf/options       Thu May  9 16:28:18 2013        (r250411)
@@ -672,6 +672,7 @@ KTR_ENTRIES         opt_global.h
KTR_VERBOSE             opt_ktr.h
WITNESS                 opt_global.h
WITNESS_KDB             opt_witness.h
+WITNESS_NO_VNODE       opt_witness.h
WITNESS_SKIPSPIN        opt_witness.h

# options for ACPI support

Modified: head/sys/kern/kern_lock.c
==============================================================================
--- head/sys/kern/kern_lock.c   Thu May  9 16:09:39 2013        (r250410)
+++ head/sys/kern/kern_lock.c   Thu May  9 16:28:18 2013        (r250411)
@@ -393,6 +393,8 @@ lockinit(struct lock *lk, int pri, const
                iflags |= LO_WITNESS;
        if (flags & LK_QUIET)
                iflags |= LO_QUIET;
+       if (flags & LK_IS_VNODE)
+               iflags |= LO_IS_VNODE;
        iflags |= flags & (LK_ADAPTIVE | LK_NOSHARE);

        lk->lk_lock = LK_UNLOCKED;

Modified: head/sys/kern/subr_witness.c
==============================================================================
--- head/sys/kern/subr_witness.c        Thu May  9 16:09:39 2013        
(r250410)
+++ head/sys/kern/subr_witness.c        Thu May  9 16:28:18 2013        
(r250411)
@@ -1289,7 +1289,19 @@ witness_checkorder(struct lock_object *l
                        w->w_reversed = w1->w_reversed = 1;
                        witness_increment_graph_generation();
                        mtx_unlock_spin(&w_mtx);
-
+
+#ifdef WITNESS_NO_VNODE
+                       /*
+                        * There are known LORs between VNODE locks. They are
+                        * not an indication of a bug. VNODE locks are flagged
+                        * as such (LO_IS_VNODE) and we don't yell if the LOR
+                        * is between 2 VNODE locks.
+                        */
+                       if ((lock->lo_flags & LO_IS_VNODE) != 0 &&
+                           (lock1->li_lock->lo_flags & LO_IS_VNODE) != 0)
+                               return;
+#endif
+
                        /*
                         * Ok, yell about it.
                         */

Modified: head/sys/kern/vfs_subr.c
==============================================================================
--- head/sys/kern/vfs_subr.c    Thu May  9 16:09:39 2013        (r250410)
+++ head/sys/kern/vfs_subr.c    Thu May  9 16:28:18 2013        (r250411)
@@ -1037,7 +1037,7 @@ alloc:
         * By default, don't allow shared locks unless filesystems
         * opt-in.
         */
-       lockinit(vp->v_vnlock, PVFS, tag, VLKTIMEOUT, LK_NOSHARE);
+       lockinit(vp->v_vnlock, PVFS, tag, VLKTIMEOUT, LK_NOSHARE | LK_IS_VNODE);
        /*
         * Initialize bufobj.
         */

Modified: head/sys/sys/lock.h
==============================================================================
--- head/sys/sys/lock.h Thu May  9 16:09:39 2013        (r250410)
+++ head/sys/sys/lock.h Thu May  9 16:28:18 2013        (r250411)
@@ -79,6 +79,7 @@ struct lock_class {
#define LO_SLEEPABLE    0x00100000      /* Lock may be held while sleeping. */
#define LO_UPGRADABLE   0x00200000      /* Lock may be upgraded/downgraded. */
#define LO_DUPOK        0x00400000      /* Don't check for duplicate acquires */
+#define        LO_IS_VNODE     0x00800000      /* Tell WITNESS about a VNODE 
lock */
#define LO_CLASSMASK    0x0f000000      /* Class index bitmask. */
#define LO_NOPROFILE    0x10000000      /* Don't profile this lock */


Modified: head/sys/sys/lockmgr.h
==============================================================================
--- head/sys/sys/lockmgr.h      Thu May  9 16:09:39 2013        (r250410)
+++ head/sys/sys/lockmgr.h      Thu May  9 16:28:18 2013        (r250411)
@@ -146,6 +146,7 @@ _lockmgr_args_rw(struct lock *lk, u_int
#define LK_NOWITNESS    0x000010
#define LK_QUIET        0x000020
#define LK_ADAPTIVE     0x000040
+#define        LK_IS_VNODE     0x000080        /* Tell WITNESS about a VNODE 
lock */

/*
 * Additional attributes to be used in lockmgr().

_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

Reply via email to