Re: [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it

2017-05-10 Thread Jeff Layton
On Wed, 2017-05-10 at 07:18 -0700, Matthew Wilcox wrote:
> On Tue, May 09, 2017 at 11:49:16AM -0400, Jeff Layton wrote:
> > +++ b/lib/errseq.c
> > @@ -0,0 +1,199 @@
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +/*
> > + * An errseq_t is a way of recording errors in one place, and allowing any
> > + * number of "subscribers" to tell whether it has changed since an 
> > arbitrary
> > + * time of their choosing.
> 
> You use the word "time" in several places in the documentation, but I think
> it's clearer to say "sampling point" or "sample", since you're not using 
> jiffies
> or nanoseconds.  For example, I'd phrase this paragraph this way:
> 
>  * An errseq_t is a way of recording errors in one place, and allowing any
>  * number of "subscribers" to tell whether it has changed since they last
>  * sampled it.
> 
> > + * The general idea is for consumers to sample an errseq_t value at a
> > + * particular point in time. Later, that value can be used to tell whether 
> > any
> > + * new errors have occurred since that time.
> 
>  * The general idea is for consumers to sample an errseq_t value.  That
>  * value can be used to tell whether any new errors have occurred since
>  * the last time it was sampled.
> 
> > +/* The "ones" bit for the counter */
> 
> Maybe "The lowest bit of the counter"?
> 
> > +/**
> > + * errseq_check - has an error occurred since a particular point in time?
> 
> "has an error occurred since the last time it was sampled"
> 
> > +/**
> > + * errseq_check_and_advance - check an errseq_t and advance it to the 
> > current value
> > + * @eseq: pointer to value being checked reported
> 
> "value being checked reported"?
> 

Thanks. I'm cleaning up the comments like you suggest.

> > +int errseq_check_and_advance(errseq_t *eseq, errseq_t *since)
> > +{
> > +   int err = 0;
> > +   errseq_t old, new;
> > +
> > +   /*
> > +* Most callers will want to use the inline wrapper to check this,
> > +* so that the common case of no error is handled without needing
> > +* to lock.
> > +*/
> > +   old = READ_ONCE(*eseq);
> > +   if (old != *since) {
> > +   /*
> > +* Set the flag and try to swap it into place if it has
> > +* changed.
> > +*
> > +* We don't care about the outcome of the swap here. If the
> > +* swap doesn't occur, then it has either been updated by a
> > +* writer who is bumping the seq count anyway, or another
> > +* reader who is just setting the "seen" flag. Either outcome
> > +* is OK, and we can advance "since" and return an error based
> > +* on what we have.
> > +*/
> > +   new = old | ERRSEQ_SEEN;
> > +   if (new != old)
> > +   cmpxchg(eseq, old, new);
> > +   *since = new;
> > +   err = -(new & MAX_ERRNO);
> > +   }
> 
> I probably need to read through the patchset some more to understand this.
> Naively, surely "since" should be updated to the current value of 'eseq'
> if we failed the cmpxchg()?

I don't think so. If we want to do that, then we'll need to redrive the
cmpxchg to set the SEEN flag if it's now clear. Storing the value in
"since" is effectively sampling it, so you do have to mark it seen.

The good news is that I think that "new" is just as valid a value to
store here as *eseq would be. It ends up representing an errseq_t value
that never actually got stored in eseq, but that's OK with the way this
works.

-- 
Jeff Layton 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it

2017-05-10 Thread Matthew Wilcox
On Tue, May 09, 2017 at 11:49:16AM -0400, Jeff Layton wrote:
> +++ b/lib/errseq.c
> @@ -0,0 +1,199 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * An errseq_t is a way of recording errors in one place, and allowing any
> + * number of "subscribers" to tell whether it has changed since an arbitrary
> + * time of their choosing.

You use the word "time" in several places in the documentation, but I think
it's clearer to say "sampling point" or "sample", since you're not using jiffies
or nanoseconds.  For example, I'd phrase this paragraph this way:

 * An errseq_t is a way of recording errors in one place, and allowing any
 * number of "subscribers" to tell whether it has changed since they last
 * sampled it.

> + * The general idea is for consumers to sample an errseq_t value at a
> + * particular point in time. Later, that value can be used to tell whether 
> any
> + * new errors have occurred since that time.

 * The general idea is for consumers to sample an errseq_t value.  That
 * value can be used to tell whether any new errors have occurred since
 * the last time it was sampled.

> +/* The "ones" bit for the counter */

Maybe "The lowest bit of the counter"?

> +/**
> + * errseq_check - has an error occurred since a particular point in time?

"has an error occurred since the last time it was sampled"

> +/**
> + * errseq_check_and_advance - check an errseq_t and advance it to the 
> current value
> + * @eseq: pointer to value being checked reported

"value being checked reported"?

> +int errseq_check_and_advance(errseq_t *eseq, errseq_t *since)
> +{
> + int err = 0;
> + errseq_t old, new;
> +
> + /*
> +  * Most callers will want to use the inline wrapper to check this,
> +  * so that the common case of no error is handled without needing
> +  * to lock.
> +  */
> + old = READ_ONCE(*eseq);
> + if (old != *since) {
> + /*
> +  * Set the flag and try to swap it into place if it has
> +  * changed.
> +  *
> +  * We don't care about the outcome of the swap here. If the
> +  * swap doesn't occur, then it has either been updated by a
> +  * writer who is bumping the seq count anyway, or another
> +  * reader who is just setting the "seen" flag. Either outcome
> +  * is OK, and we can advance "since" and return an error based
> +  * on what we have.
> +  */
> + new = old | ERRSEQ_SEEN;
> + if (new != old)
> + cmpxchg(eseq, old, new);
> + *since = new;
> + err = -(new & MAX_ERRNO);
> + }

I probably need to read through the patchset some more to understand this.
Naively, surely "since" should be updated to the current value of 'eseq'
if we failed the cmpxchg()?
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it

2017-05-10 Thread Jan Kara
On Tue 09-05-17 11:49:16, Jeff Layton wrote:
> An errseq_t is a way of recording errors in one place, and allowing any
> number of "subscribers" to tell whether an error has been set again
> since a previous time.
> 
> It's implemented as an unsigned 32-bit value that is managed with atomic
> operations. The low order bits are designated to hold an error code
> (max size of MAX_ERRNO). The upper bits are used as a counter.
> 
> The API works with consumers sampling an errseq_t value at a particular
> point in time. Later, that value can be used to tell whether new errors
> have been set since that time.
> 
> Note that there is a 1 in 512k risk of collisions here if new errors
> are being recorded frequently, since we have so few bits to use as a
> counter. To mitigate this, one bit is used as a flag to tell whether the
> value has been sampled since a new value was recorded. That allows
> us to avoid bumping the counter if no one has sampled it since it
> was last bumped.
> 
> Later patches will build on this infrastructure to change how writeback
> errors are tracked in the kernel.
> 
> Signed-off-by: Jeff Layton 

The patch looks good to me. Feel free to add:

Reviewed-by: Jan Kara 

Just two nits below:
...
> +int errseq_check_and_advance(errseq_t *eseq, errseq_t *since)
> +{
> + int err = 0;
> + errseq_t old, new;
> +
> + /*
> +  * Most callers will want to use the inline wrapper to check this,
> +  * so that the common case of no error is handled without needing
> +  * to lock.
> +  */

I'm not sure which locking you are speaking about here. Is the comment
stale?

> + old = READ_ONCE(*eseq);
> + if (old != *since) {
> + /*
> +  * Set the flag and try to swap it into place if it has
> +  * changed.
> +  *
> +  * We don't care about the outcome of the swap here. If the
> +  * swap doesn't occur, then it has either been updated by a
> +  * writer who is bumping the seq count anyway, or another

"bumping the seq count anyway" part is not quite true. Writer may see
ERRSEQ_SEEN not set and so just update the error code and leave seq count
as is. But since you compare full errseq_t for equality, this works out as
designed...

> +  * reader who is just setting the "seen" flag. Either outcome
> +  * is OK, and we can advance "since" and return an error based
> +  * on what we have.
> +  */
> + new = old | ERRSEQ_SEEN;
> + if (new != old)
> + cmpxchg(eseq, old, new);
> + *since = new;
> + err = -(new & MAX_ERRNO);
> + }
> + return err;
> +}
> +EXPORT_SYMBOL(errseq_check_and_advance);

Honza
-- 
Jan Kara 
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it

2017-05-10 Thread Jeff Layton
On Wed, 2017-05-10 at 08:03 +1000, NeilBrown wrote:
> On Tue, May 09 2017, Jeff Layton wrote:
> 
> > An errseq_t is a way of recording errors in one place, and allowing any
> > number of "subscribers" to tell whether an error has been set again
> > since a previous time.
> > 
> > It's implemented as an unsigned 32-bit value that is managed with atomic
> > operations. The low order bits are designated to hold an error code
> > (max size of MAX_ERRNO). The upper bits are used as a counter.
> > 
> > The API works with consumers sampling an errseq_t value at a particular
> > point in time. Later, that value can be used to tell whether new errors
> > have been set since that time.
> > 
> > Note that there is a 1 in 512k risk of collisions here if new errors
> > are being recorded frequently, since we have so few bits to use as a
> > counter. To mitigate this, one bit is used as a flag to tell whether the
> > value has been sampled since a new value was recorded. That allows
> > us to avoid bumping the counter if no one has sampled it since it
> > was last bumped.
> > 
> > Later patches will build on this infrastructure to change how writeback
> > errors are tracked in the kernel.
> > 
> > Signed-off-by: Jeff Layton 
> 
> I like that this is a separate lib/*.c - nicely structured too.
> 
> Reviewed-by: NeilBrown 
> 
> 

Thanks, yeah...it occurred to me that this scheme is not really specific
to writeback errors. While I can't think of another use-case for
errseq_t's right offhand, I think this makes for cleaner layering and
should make it easy to use in other ways should they arise.

-- 
Jeff Layton 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it

2017-05-09 Thread NeilBrown
On Tue, May 09 2017, Jeff Layton wrote:

> An errseq_t is a way of recording errors in one place, and allowing any
> number of "subscribers" to tell whether an error has been set again
> since a previous time.
>
> It's implemented as an unsigned 32-bit value that is managed with atomic
> operations. The low order bits are designated to hold an error code
> (max size of MAX_ERRNO). The upper bits are used as a counter.
>
> The API works with consumers sampling an errseq_t value at a particular
> point in time. Later, that value can be used to tell whether new errors
> have been set since that time.
>
> Note that there is a 1 in 512k risk of collisions here if new errors
> are being recorded frequently, since we have so few bits to use as a
> counter. To mitigate this, one bit is used as a flag to tell whether the
> value has been sampled since a new value was recorded. That allows
> us to avoid bumping the counter if no one has sampled it since it
> was last bumped.
>
> Later patches will build on this infrastructure to change how writeback
> errors are tracked in the kernel.
>
> Signed-off-by: Jeff Layton 

I like that this is a separate lib/*.c - nicely structured too.

Reviewed-by: NeilBrown 

Thanks,
NeilBrown


> ---
>  include/linux/errseq.h |  19 +
>  lib/Makefile   |   2 +-
>  lib/errseq.c   | 199 
> +
>  3 files changed, 219 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/errseq.h
>  create mode 100644 lib/errseq.c
>
> diff --git a/include/linux/errseq.h b/include/linux/errseq.h
> new file mode 100644
> index ..0d2555f310cd
> --- /dev/null
> +++ b/include/linux/errseq.h
> @@ -0,0 +1,19 @@
> +#ifndef _LINUX_ERRSEQ_H
> +#define _LINUX_ERRSEQ_H
> +
> +/* See lib/errseq.c for more info */
> +
> +typedef u32  errseq_t;
> +
> +void __errseq_set(errseq_t *eseq, int err);
> +static inline void errseq_set(errseq_t *eseq, int err)
> +{
> + /* Optimize for the common case of no error */
> + if (unlikely(err))
> + __errseq_set(eseq, err);
> +}
> +
> +errseq_t errseq_sample(errseq_t *eseq);
> +int errseq_check(errseq_t *eseq, errseq_t since);
> +int errseq_check_and_advance(errseq_t *eseq, errseq_t *since);
> +#endif
> diff --git a/lib/Makefile b/lib/Makefile
> index 320ac46a8725..2423afef40f7 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -41,7 +41,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o 
> random32.o \
>gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
>bsearch.o find_bit.o llist.o memweight.o kfifo.o \
>percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
> -  once.o refcount.o
> +  once.o refcount.o errseq.o
>  obj-y += string_helpers.o
>  obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
>  obj-y += hexdump.o
> diff --git a/lib/errseq.c b/lib/errseq.c
> new file mode 100644
> index ..0f8b4ed460f0
> --- /dev/null
> +++ b/lib/errseq.c
> @@ -0,0 +1,199 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * An errseq_t is a way of recording errors in one place, and allowing any
> + * number of "subscribers" to tell whether it has changed since an arbitrary
> + * time of their choosing.
> + *
> + * It's implemented as an unsigned 32-bit value. The low order bits are
> + * designated to hold an error code (between 0 and -MAX_ERRNO). The upper 
> bits
> + * are used as a counter. This is done with atomics instead of locking so 
> that
> + * these functions can be called from any context.
> + *
> + * The general idea is for consumers to sample an errseq_t value at a
> + * particular point in time. Later, that value can be used to tell whether 
> any
> + * new errors have occurred since that time.
> + *
> + * Note that there is a risk of collisions, if new errors are being recorded
> + * frequently, since we have so few bits to use as a counter.
> + *
> + * To mitigate this, one bit is used as a flag to tell whether the value has
> + * been sampled since a new value was recorded. That allows us to avoid 
> bumping
> + * the counter if no one has sampled it since the last time an error was
> + * recorded.
> + *
> + * A new errseq_t should always be zeroed out.  A errseq_t value of all 
> zeroes
> + * is the special (but common) case where there has never been an error. An 
> all
> + * zero value thus serves as the "epoch" if one wishes to know whether there
> + * has ever been an error set since it was first initialized.
> + */
> +
> +/* The low bits are designated for error code (max of MAX_ERRNO) */
> +#define ERRSEQ_SHIFT ilog2(MAX_ERRNO + 1)
> +
> +/* This bit is used as a flag to indicate whether the value has been seen */
> +#define ERRSEQ_SEEN  (1 << ERRSEQ_SHIFT)
> +
> +/* The "ones" bit for the counter */
> +#define ERRSEQ_CTR_INC   (1 << (ERRSEQ_SHIFT + 1))
> +
> +/**
> + * __errseq_set - set a errseq_t for 

[PATCH v4 13/27] lib: add errseq_t type and infrastructure for handling it

2017-05-09 Thread Jeff Layton
An errseq_t is a way of recording errors in one place, and allowing any
number of "subscribers" to tell whether an error has been set again
since a previous time.

It's implemented as an unsigned 32-bit value that is managed with atomic
operations. The low order bits are designated to hold an error code
(max size of MAX_ERRNO). The upper bits are used as a counter.

The API works with consumers sampling an errseq_t value at a particular
point in time. Later, that value can be used to tell whether new errors
have been set since that time.

Note that there is a 1 in 512k risk of collisions here if new errors
are being recorded frequently, since we have so few bits to use as a
counter. To mitigate this, one bit is used as a flag to tell whether the
value has been sampled since a new value was recorded. That allows
us to avoid bumping the counter if no one has sampled it since it
was last bumped.

Later patches will build on this infrastructure to change how writeback
errors are tracked in the kernel.

Signed-off-by: Jeff Layton 
---
 include/linux/errseq.h |  19 +
 lib/Makefile   |   2 +-
 lib/errseq.c   | 199 +
 3 files changed, 219 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/errseq.h
 create mode 100644 lib/errseq.c

diff --git a/include/linux/errseq.h b/include/linux/errseq.h
new file mode 100644
index ..0d2555f310cd
--- /dev/null
+++ b/include/linux/errseq.h
@@ -0,0 +1,19 @@
+#ifndef _LINUX_ERRSEQ_H
+#define _LINUX_ERRSEQ_H
+
+/* See lib/errseq.c for more info */
+
+typedef u32errseq_t;
+
+void __errseq_set(errseq_t *eseq, int err);
+static inline void errseq_set(errseq_t *eseq, int err)
+{
+   /* Optimize for the common case of no error */
+   if (unlikely(err))
+   __errseq_set(eseq, err);
+}
+
+errseq_t errseq_sample(errseq_t *eseq);
+int errseq_check(errseq_t *eseq, errseq_t since);
+int errseq_check_and_advance(errseq_t *eseq, errseq_t *since);
+#endif
diff --git a/lib/Makefile b/lib/Makefile
index 320ac46a8725..2423afef40f7 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -41,7 +41,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o 
random32.o \
 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
-once.o refcount.o
+once.o refcount.o errseq.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
 obj-y += hexdump.o
diff --git a/lib/errseq.c b/lib/errseq.c
new file mode 100644
index ..0f8b4ed460f0
--- /dev/null
+++ b/lib/errseq.c
@@ -0,0 +1,199 @@
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * An errseq_t is a way of recording errors in one place, and allowing any
+ * number of "subscribers" to tell whether it has changed since an arbitrary
+ * time of their choosing.
+ *
+ * It's implemented as an unsigned 32-bit value. The low order bits are
+ * designated to hold an error code (between 0 and -MAX_ERRNO). The upper bits
+ * are used as a counter. This is done with atomics instead of locking so that
+ * these functions can be called from any context.
+ *
+ * The general idea is for consumers to sample an errseq_t value at a
+ * particular point in time. Later, that value can be used to tell whether any
+ * new errors have occurred since that time.
+ *
+ * Note that there is a risk of collisions, if new errors are being recorded
+ * frequently, since we have so few bits to use as a counter.
+ *
+ * To mitigate this, one bit is used as a flag to tell whether the value has
+ * been sampled since a new value was recorded. That allows us to avoid bumping
+ * the counter if no one has sampled it since the last time an error was
+ * recorded.
+ *
+ * A new errseq_t should always be zeroed out.  A errseq_t value of all zeroes
+ * is the special (but common) case where there has never been an error. An all
+ * zero value thus serves as the "epoch" if one wishes to know whether there
+ * has ever been an error set since it was first initialized.
+ */
+
+/* The low bits are designated for error code (max of MAX_ERRNO) */
+#define ERRSEQ_SHIFT   ilog2(MAX_ERRNO + 1)
+
+/* This bit is used as a flag to indicate whether the value has been seen */
+#define ERRSEQ_SEEN(1 << ERRSEQ_SHIFT)
+
+/* The "ones" bit for the counter */
+#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 1))
+
+/**
+ * __errseq_set - set a errseq_t for later reporting
+ * @eseq: errseq_t field that should be set
+ * @err: error to set
+ *
+ * This function sets the error in *eseq, and increments the sequence counter
+ * if the last sequence was sampled at some point in the past.
+ *
+ * Any error set will always overwrite an existing error.
+ *
+ * Most callers will want to use the errseq_set inline wrapper to efficiently
+ * handle the common case