Re: [PATCH] [RFC] bcachefs: SIX locks (shared/intent/exclusive)

2018-05-21 Thread Kent Overstreet
On Mon, May 21, 2018 at 08:04:16PM -0700, Matthew Wilcox wrote:
> On Mon, May 21, 2018 at 10:19:51PM -0400, Kent Overstreet wrote:
> > New lock for bcachefs, like read/write locks but with a third state,
> > intent.
> > 
> > Intent locks conflict with each other, but not with read locks; taking a
> > write lock requires first holding an intent lock.
> 
> Can you put something in the description that these are sleeping locks
> (like mutexes), not spinning locks (like spinlocks)?  (Yeah, I know
> there's the opportunistic spin, but conceptually, they're sleeping locks).

Yup, I'll add that

> 
> Some other things I'd like documented:
> 
>  - Any number of readers can hold the lock
>  - Once one thread acquires the lock for intent, further intent acquisitions
>will block.  May new readers acquire the lock?

I think I should have that covered already - "Intent does not block read, but
does block other intent locks"

>  - You cannot acquire the lock for write directly, you must acquire it for
>intent first, then upgrade to write.
>  - Can you downgrade to read from intent, or downgrade from write back to
>intent?

You hold both write and intent, like so:

six_lock_intent(>lock);
six_lock_write(>lock);
six_unlock_write(>lock);
six_unlock_intent(>lock);


>  - Once you are trying to upgrade from intent to write, are new read
>acquisitions blocked? (can readers starve writers?)

Readers can starve writers in the current implementation, but that's something
that should probably be fixed...

>  - When you drop the lock as a writer, do we prefer reader acquisitions
>over intent acquisitions?  That is, if we have a queue of RRIRIRIR,
>and we drop the lock, does the queue look like II or IRIR?

Separate queues per lock type, so dropping a write lock will wake up everyone
trying to take a read lock, and dropping an intent lock wakes up everyone trying
to take an intent lock.

---

Here's the new documentation I just wrote:

/*
 * Shared/intent/exclusive locks: sleepable read/write locks, much like rw
 * semaphores, except with a third intermediate state, intent. Basic operations
 * are:
 *
 * six_lock_read(>lock);
 * six_unlock_read(>lock);
 *
 * six_lock_intent(>lock);
 * six_unlock_intent(>lock);
 *
 * six_lock_write(>lock);
 * six_unlock_write(>lock);
 *
 * Intent locks block other intent locks, but do not block read locks, and you
 * must have an intent lock held before taking a write lock, like so:
 *
 * six_lock_intent(>lock);
 * six_lock_write(>lock);
 * six_unlock_write(>lock);
 * six_unlock_intent(>lock);
 *
 * Other operations:
 *
 *   six_trylock_read()
 *   six_trylock_intent()
 *   six_trylock_write()
 *
 *   six_lock_downgrade():  convert from intent to read
 *   six_lock_tryupgrade(): attempt to convert from read to intent
 *
 * Locks also embed a sequence number, which is incremented when the lock is
 * locked or unlocked for write. The current sequence number can be grabbed
 * while a lock is held from lock->state.seq; then, if you drop the lock you can
 * use six_relock_(read|intent_write)(lock, seq) to attempt to retake the lock
 * iff it hasn't been locked for write in the meantime.
 *
 * There are also operations that take the lock type as a parameter, where the
 * type is one of SIX_LOCK_read, SIX_LOCK_intent, or SIX_LOCK_write:
 *
 *   six_lock_type(lock, type)
 *   six_unlock_type(lock, type)
 *   six_relock(lock, type, seq)
 *   six_trylock_type(lock, type)
 *   six_trylock_convert(lock, from, to)
 *
 * A lock may be held multiple types by the same thread (for read or intent,
 * not write) - up to SIX_LOCK_MAX_RECURSE. However, the six locks code does
 * _not_ implement the actual recursive checks itself though - rather, if your
 * code (e.g. btree iterator code) knows that the current thread already has a
 * lock held, and for the correct type, six_lock_increment() may be used to
 * bump up the counter for that type - the only effect is that one more call to
 * unlock will be required before the lock is unlocked.
 *
 */
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [RFC] bcachefs: SIX locks (shared/intent/exclusive)

2018-05-21 Thread Matthew Wilcox
On Mon, May 21, 2018 at 10:19:51PM -0400, Kent Overstreet wrote:
> New lock for bcachefs, like read/write locks but with a third state,
> intent.
> 
> Intent locks conflict with each other, but not with read locks; taking a
> write lock requires first holding an intent lock.

Can you put something in the description that these are sleeping locks
(like mutexes), not spinning locks (like spinlocks)?  (Yeah, I know
there's the opportunistic spin, but conceptually, they're sleeping locks).

Some other things I'd like documented:

 - Any number of readers can hold the lock
 - Once one thread acquires the lock for intent, further intent acquisitions
   will block.  May new readers acquire the lock?
 - You cannot acquire the lock for write directly, you must acquire it for
   intent first, then upgrade to write.
 - Can you downgrade to read from intent, or downgrade from write back to
   intent?
 - Once you are trying to upgrade from intent to write, are new read
   acquisitions blocked? (can readers starve writers?)
 - When you drop the lock as a writer, do we prefer reader acquisitions
   over intent acquisitions?  That is, if we have a queue of RRIRIRIR,
   and we drop the lock, does the queue look like II or IRIR?

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] [RFC] bcachefs: SIX locks (shared/intent/exclusive)

2018-05-21 Thread Kent Overstreet
New lock for bcachefs, like read/write locks but with a third state,
intent.

Intent locks conflict with each other, but not with read locks; taking a
write lock requires first holding an intent lock.

The purpose is for multi node data structures (i.e. btrees), where if we were
using read/write locks we might need to hold a write lock for the duration of an
operation purely to avoid deadlocks.

For example, when splitting a btree node we lock a leaf node, allocate two new
nodes, copy the contents of the old node into the new nodes, then update the
parent node. With read write locks, we'd need to hold a write lock on the parent
node for the entire duration, because if we don't take it until we have to
update the parent we deadlock - because we still have a child locked - and we
can't unlock the child, because we need to free it as soon as we've deleted the
pointer to it. This blocks not only lookups, but updates that would only touch
unrelated leaf nodes.

With intent locks, we can hold an intent lock on the parent node for the
duration of the operation, only taking a write lock on it for the update to that
specific node.

Signed-off-by: Kent Overstreet 
---

I implemented these for bcachefs's btrees, but they're really not bcachefs
specific - I figure since I don't have the only btree implementation it might be
worth at least seeing if anyone else sees a use for them.

Also, they use osq_lock()/unlock(), which Peter Zijlstra really doesn't want
exported, so if anyone else would like to use them they could be moved out of
fs/bcachefs/ and that would solve another problem for me :)

 fs/bcachefs/six.c | 516 ++
 fs/bcachefs/six.h | 190 +
 2 files changed, 706 insertions(+)
 create mode 100644 fs/bcachefs/six.c
 create mode 100644 fs/bcachefs/six.h

diff --git a/fs/bcachefs/six.c b/fs/bcachefs/six.c
new file mode 100644
index 00..afa59a476a
--- /dev/null
+++ b/fs/bcachefs/six.c
@@ -0,0 +1,516 @@
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "six.h"
+
+#define six_acquire(l, t)  lock_acquire(l, 0, t, 0, 0, NULL, _RET_IP_)
+#define six_release(l) lock_release(l, 0, _RET_IP_)
+
+struct six_lock_vals {
+   /* Value we add to the lock in order to take the lock: */
+   u64 lock_val;
+
+   /* If the lock has this value (used as a mask), taking the lock fails: 
*/
+   u64 lock_fail;
+
+   /* Value we add to the lock in order to release the lock: */
+   u64 unlock_val;
+
+   /* Mask that indicates lock is held for this type: */
+   u64 held_mask;
+
+   /* Waitlist we wakeup when releasing the lock: */
+   enum six_lock_type  unlock_wakeup;
+};
+
+#define __SIX_LOCK_HELD_read   __SIX_VAL(read_lock, ~0)
+#define __SIX_LOCK_HELD_intent __SIX_VAL(intent_lock, ~0)
+#define __SIX_LOCK_HELD_write  __SIX_VAL(seq, 1)
+
+#define LOCK_VALS {\
+   [SIX_LOCK_read] = { \
+   .lock_val   = __SIX_VAL(read_lock, 1),  \
+   .lock_fail  = __SIX_LOCK_HELD_write,\
+   .unlock_val = -__SIX_VAL(read_lock, 1), \
+   .held_mask  = __SIX_LOCK_HELD_read, \
+   .unlock_wakeup  = SIX_LOCK_write,   \
+   },  \
+   [SIX_LOCK_intent] = {   \
+   .lock_val   = __SIX_VAL(intent_lock, 1),\
+   .lock_fail  = __SIX_LOCK_HELD_intent,   \
+   .unlock_val = -__SIX_VAL(intent_lock, 1),   \
+   .held_mask  = __SIX_LOCK_HELD_intent,   \
+   .unlock_wakeup  = SIX_LOCK_intent,  \
+   },  \
+   [SIX_LOCK_write] = {\
+   .lock_val   = __SIX_VAL(seq, 1),\
+   .lock_fail  = __SIX_LOCK_HELD_read, \
+   .unlock_val = __SIX_VAL(seq, 1),\
+   .held_mask  = __SIX_LOCK_HELD_write,\
+   .unlock_wakeup  = SIX_LOCK_read,\
+   },  \
+}
+
+static inline void six_set_owner(struct six_lock *lock, enum six_lock_type 
type,
+union six_lock_state old)
+{
+   if (type != SIX_LOCK_intent)
+   return;
+
+   if (!old.intent_lock) {
+   EBUG_ON(lock->owner);
+   lock->owner = current;
+   } else {
+