Re: ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)

2016-10-25 Thread Kent Overstreet
On Tue, Oct 25, 2016 at 09:55:21AM -0700, Eric Wheeler wrote:
> Sure, I'll put it up with my -rc2 pull request to Jens.  
> 
> A couple of sanity checks (for my understanding at least):
> 
> Why does bch_data_insert_start() no longer need to call 
> set_gc_sectors(op->c) now that bch_cache_set_alloc and bch_gc_thread do?  

Before, the gc thread wasn't being explicitly signalled that it was time to run
- it was just being woken up, meaning that a spurious wakeup would cause gc to
run. At best it was sketchy and fragile, for multiple reasons - wait_event() is
a much better mechanism.

So with wait_event() gc is explicitly checking if it's time to run - the wakeup
doesn't really make anything happen by itself, it just pokes the gc thread so
it's able to notice that it should run.

When you're signalling a thread this way - we're in effect setting a global
variable that says "gc should run now", then kicking the gc thread so it can
check the "gc should run" variable. But wakeups aren't synchronous - our call to
wake_up() doesn't make the gc thread check that variable before it returns, all
we know when the wake_up() call returns is that the gc thread is going to check
that variable some point in the future. So we can't set the "gc should run"
varible, wake up the gc thread, and then set it back to "gc shouldn't run yet" -
what'll happen most of the time is that the gc thread won't run before we set
the variable back to "gc shouldn't run yet", it'll never see that it was
supposed to run and it'll go back to sleep.

So the way you make this work is you have the gc thread has to set the variable
back to "gc shouldn't run yet" _after_ it's seen it and decided to run.

> Does bch_cache_set_alloc() even need to call set_gc_sectors since 
> bch_gc_thread() does before calling bch_btree_gc?

Yes, because the gc thread only resets the counter when it's decided to run - we
don't want it to run right away at startup.

> Also I'm curious, why change invalidate_needs_gc from a bitfield? 

Bitfields are particularly unsafe for multiple threads to access - the compiler
has to emit instructions to do read/modify/write, which will clobber adjacent
data. A bare int is also not in _general_ safe for multiple threads to access
without a lock, but for what it's doing here it's fine.


Re: ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)

2016-10-25 Thread Kent Overstreet
On Tue, Oct 25, 2016 at 09:55:21AM -0700, Eric Wheeler wrote:
> Sure, I'll put it up with my -rc2 pull request to Jens.  
> 
> A couple of sanity checks (for my understanding at least):
> 
> Why does bch_data_insert_start() no longer need to call 
> set_gc_sectors(op->c) now that bch_cache_set_alloc and bch_gc_thread do?  

Before, the gc thread wasn't being explicitly signalled that it was time to run
- it was just being woken up, meaning that a spurious wakeup would cause gc to
run. At best it was sketchy and fragile, for multiple reasons - wait_event() is
a much better mechanism.

So with wait_event() gc is explicitly checking if it's time to run - the wakeup
doesn't really make anything happen by itself, it just pokes the gc thread so
it's able to notice that it should run.

When you're signalling a thread this way - we're in effect setting a global
variable that says "gc should run now", then kicking the gc thread so it can
check the "gc should run" variable. But wakeups aren't synchronous - our call to
wake_up() doesn't make the gc thread check that variable before it returns, all
we know when the wake_up() call returns is that the gc thread is going to check
that variable some point in the future. So we can't set the "gc should run"
varible, wake up the gc thread, and then set it back to "gc shouldn't run yet" -
what'll happen most of the time is that the gc thread won't run before we set
the variable back to "gc shouldn't run yet", it'll never see that it was
supposed to run and it'll go back to sleep.

So the way you make this work is you have the gc thread has to set the variable
back to "gc shouldn't run yet" _after_ it's seen it and decided to run.

> Does bch_cache_set_alloc() even need to call set_gc_sectors since 
> bch_gc_thread() does before calling bch_btree_gc?

Yes, because the gc thread only resets the counter when it's decided to run - we
don't want it to run right away at startup.

> Also I'm curious, why change invalidate_needs_gc from a bitfield? 

Bitfields are particularly unsafe for multiple threads to access - the compiler
has to emit instructions to do read/modify/write, which will clobber adjacent
data. A bare int is also not in _general_ safe for multiple threads to access
without a lock, but for what it's doing here it's fine.


Re: ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)

2016-10-25 Thread Eric Wheeler
On Mon, 24 Oct 2016, Kent Overstreet wrote:
> On Sun, Oct 23, 2016 at 06:57:26PM -0700, Davidlohr Bueso wrote:
> > On Wed, 19 Oct 2016, Peter Zijlstra wrote:
> > 
> > > Subject: sched: Better explain sleep/wakeup
> > > From: Peter Zijlstra 
> > > Date: Wed Oct 19 15:45:27 CEST 2016
> > > 
> > > There were a few questions wrt how sleep-wakeup works. Try and explain
> > > it more.
> > > 
> > > Requested-by: Will Deacon 
> > > Signed-off-by: Peter Zijlstra (Intel) 
> > > ---
> > > include/linux/sched.h |   52 
> > > ++
> > > kernel/sched/core.c   |   15 +++---
> > > 2 files changed, 44 insertions(+), 23 deletions(-)
> > > 
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -262,20 +262,9 @@ extern char ___assert_task_state[1 - 2*!
> > > #define set_task_state(tsk, state_value)  \
> > >   do {\
> > >   (tsk)->task_state_change = _THIS_IP_;   \
> > > - smp_store_mb((tsk)->state, (state_value));  \
> > > + smp_store_mb((tsk)->state, (state_value));  \
> > >   } while (0)
> > > 
> > > -/*
> > > - * set_current_state() includes a barrier so that the write of 
> > > current->state
> > > - * is correctly serialised wrt the caller's subsequent test of whether to
> > > - * actually sleep:
> > > - *
> > > - *   set_current_state(TASK_UNINTERRUPTIBLE);
> > > - *   if (do_i_need_to_sleep())
> > > - *   schedule();
> > > - *
> > > - * If the caller does not need such serialisation then use 
> > > __set_current_state()
> > > - */
> > > #define __set_current_state(state_value)  \
> > >   do {\
> > >   current->task_state_change = _THIS_IP_; \
> > > @@ -284,11 +273,19 @@ extern char ___assert_task_state[1 - 2*!
> > > #define set_current_state(state_value)\
> > >   do {\
> > >   current->task_state_change = _THIS_IP_; \
> > > - smp_store_mb(current->state, (state_value));\
> > > + smp_store_mb(current->state, (state_value));\
> > >   } while (0)
> > > 
> > > #else
> > > 
> > > +/*
> > > + * @tsk had better be current, or you get to keep the pieces.
> > 
> > That reminds me we were getting rid of the set_task_state() calls. Bcache 
> > was
> > pending, being only user in the kernel that doesn't actually use current; 
> > but
> > instead breaks newly (yet blocked/uninterruptible) created garbage 
> > collection
> > kthread. I cannot figure out why this is done (ie purposely accounting the
> > load avg. Furthermore gc kicks in in very specific scenarios obviously, such
> > as as by the allocator task, so I don't see why bcache gc should want to be
> > interruptible.
> > 
> > Kent, Jens, can we get rid of this?
> 
> Here's a patch that just fixes the way gc gets woken up. Eric, you want to 
> take
> this?

Sure, I'll put it up with my -rc2 pull request to Jens.  

A couple of sanity checks (for my understanding at least):

Why does bch_data_insert_start() no longer need to call 
set_gc_sectors(op->c) now that bch_cache_set_alloc and bch_gc_thread do?  

Does bch_cache_set_alloc() even need to call set_gc_sectors since 
bch_gc_thread() does before calling bch_btree_gc?

Also I'm curious, why change invalidate_needs_gc from a bitfield? 

-Eric

> 
> -- >8 --
> Subject: [PATCH] bcache: Make gc wakeup saner
> 
> This lets us ditch a set_task_state() call.
> 
> Signed-off-by: Kent Overstreet 
> ---
>  drivers/md/bcache/bcache.h  |  4 ++--
>  drivers/md/bcache/btree.c   | 39 ---
>  drivers/md/bcache/btree.h   |  3 +--
>  drivers/md/bcache/request.c |  4 +---
>  drivers/md/bcache/super.c   |  2 ++
>  5 files changed, 26 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 6b420a55c7..c3ea03c9a1 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -425,7 +425,7 @@ struct cache {
>* until a gc finishes - otherwise we could pointlessly burn a ton of
>* cpu
>*/
> - unsignedinvalidate_needs_gc:1;
> + unsignedinvalidate_needs_gc;
>  
>   booldiscard; /* Get rid of? */
>  
> @@ -593,8 +593,8 @@ struct cache_set {
>  
>   /* Counts how many sectors bio_insert has added to the cache */
>   atomic_tsectors_to_gc;
> + wait_queue_head_t   gc_wait;
>  
> - wait_queue_head_t   moving_gc_wait;
>   struct keybuf   moving_gc_keys;
>   /* Number of moving GC bios in flight */
>   struct semaphoremoving_in_flight;
> diff --git a/drivers/md/bcache/btree.c 

Re: ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)

2016-10-25 Thread Eric Wheeler
On Mon, 24 Oct 2016, Kent Overstreet wrote:
> On Sun, Oct 23, 2016 at 06:57:26PM -0700, Davidlohr Bueso wrote:
> > On Wed, 19 Oct 2016, Peter Zijlstra wrote:
> > 
> > > Subject: sched: Better explain sleep/wakeup
> > > From: Peter Zijlstra 
> > > Date: Wed Oct 19 15:45:27 CEST 2016
> > > 
> > > There were a few questions wrt how sleep-wakeup works. Try and explain
> > > it more.
> > > 
> > > Requested-by: Will Deacon 
> > > Signed-off-by: Peter Zijlstra (Intel) 
> > > ---
> > > include/linux/sched.h |   52 
> > > ++
> > > kernel/sched/core.c   |   15 +++---
> > > 2 files changed, 44 insertions(+), 23 deletions(-)
> > > 
> > > --- a/include/linux/sched.h
> > > +++ b/include/linux/sched.h
> > > @@ -262,20 +262,9 @@ extern char ___assert_task_state[1 - 2*!
> > > #define set_task_state(tsk, state_value)  \
> > >   do {\
> > >   (tsk)->task_state_change = _THIS_IP_;   \
> > > - smp_store_mb((tsk)->state, (state_value));  \
> > > + smp_store_mb((tsk)->state, (state_value));  \
> > >   } while (0)
> > > 
> > > -/*
> > > - * set_current_state() includes a barrier so that the write of 
> > > current->state
> > > - * is correctly serialised wrt the caller's subsequent test of whether to
> > > - * actually sleep:
> > > - *
> > > - *   set_current_state(TASK_UNINTERRUPTIBLE);
> > > - *   if (do_i_need_to_sleep())
> > > - *   schedule();
> > > - *
> > > - * If the caller does not need such serialisation then use 
> > > __set_current_state()
> > > - */
> > > #define __set_current_state(state_value)  \
> > >   do {\
> > >   current->task_state_change = _THIS_IP_; \
> > > @@ -284,11 +273,19 @@ extern char ___assert_task_state[1 - 2*!
> > > #define set_current_state(state_value)\
> > >   do {\
> > >   current->task_state_change = _THIS_IP_; \
> > > - smp_store_mb(current->state, (state_value));\
> > > + smp_store_mb(current->state, (state_value));\
> > >   } while (0)
> > > 
> > > #else
> > > 
> > > +/*
> > > + * @tsk had better be current, or you get to keep the pieces.
> > 
> > That reminds me we were getting rid of the set_task_state() calls. Bcache 
> > was
> > pending, being only user in the kernel that doesn't actually use current; 
> > but
> > instead breaks newly (yet blocked/uninterruptible) created garbage 
> > collection
> > kthread. I cannot figure out why this is done (ie purposely accounting the
> > load avg. Furthermore gc kicks in in very specific scenarios obviously, such
> > as as by the allocator task, so I don't see why bcache gc should want to be
> > interruptible.
> > 
> > Kent, Jens, can we get rid of this?
> 
> Here's a patch that just fixes the way gc gets woken up. Eric, you want to 
> take
> this?

Sure, I'll put it up with my -rc2 pull request to Jens.  

A couple of sanity checks (for my understanding at least):

Why does bch_data_insert_start() no longer need to call 
set_gc_sectors(op->c) now that bch_cache_set_alloc and bch_gc_thread do?  

Does bch_cache_set_alloc() even need to call set_gc_sectors since 
bch_gc_thread() does before calling bch_btree_gc?

Also I'm curious, why change invalidate_needs_gc from a bitfield? 

-Eric

> 
> -- >8 --
> Subject: [PATCH] bcache: Make gc wakeup saner
> 
> This lets us ditch a set_task_state() call.
> 
> Signed-off-by: Kent Overstreet 
> ---
>  drivers/md/bcache/bcache.h  |  4 ++--
>  drivers/md/bcache/btree.c   | 39 ---
>  drivers/md/bcache/btree.h   |  3 +--
>  drivers/md/bcache/request.c |  4 +---
>  drivers/md/bcache/super.c   |  2 ++
>  5 files changed, 26 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
> index 6b420a55c7..c3ea03c9a1 100644
> --- a/drivers/md/bcache/bcache.h
> +++ b/drivers/md/bcache/bcache.h
> @@ -425,7 +425,7 @@ struct cache {
>* until a gc finishes - otherwise we could pointlessly burn a ton of
>* cpu
>*/
> - unsignedinvalidate_needs_gc:1;
> + unsignedinvalidate_needs_gc;
>  
>   booldiscard; /* Get rid of? */
>  
> @@ -593,8 +593,8 @@ struct cache_set {
>  
>   /* Counts how many sectors bio_insert has added to the cache */
>   atomic_tsectors_to_gc;
> + wait_queue_head_t   gc_wait;
>  
> - wait_queue_head_t   moving_gc_wait;
>   struct keybuf   moving_gc_keys;
>   /* Number of moving GC bios in flight */
>   struct semaphoremoving_in_flight;
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index 81d3db40cd..2efdce0724 100644
> --- a/drivers/md/bcache/btree.c
> 

Re: ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)

2016-10-24 Thread Kent Overstreet
On Sun, Oct 23, 2016 at 06:57:26PM -0700, Davidlohr Bueso wrote:
> On Wed, 19 Oct 2016, Peter Zijlstra wrote:
> 
> > Subject: sched: Better explain sleep/wakeup
> > From: Peter Zijlstra 
> > Date: Wed Oct 19 15:45:27 CEST 2016
> > 
> > There were a few questions wrt how sleep-wakeup works. Try and explain
> > it more.
> > 
> > Requested-by: Will Deacon 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > ---
> > include/linux/sched.h |   52 
> > ++
> > kernel/sched/core.c   |   15 +++---
> > 2 files changed, 44 insertions(+), 23 deletions(-)
> > 
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -262,20 +262,9 @@ extern char ___assert_task_state[1 - 2*!
> > #define set_task_state(tsk, state_value)\
> > do {\
> > (tsk)->task_state_change = _THIS_IP_;   \
> > -   smp_store_mb((tsk)->state, (state_value));  \
> > +   smp_store_mb((tsk)->state, (state_value));  \
> > } while (0)
> > 
> > -/*
> > - * set_current_state() includes a barrier so that the write of 
> > current->state
> > - * is correctly serialised wrt the caller's subsequent test of whether to
> > - * actually sleep:
> > - *
> > - * set_current_state(TASK_UNINTERRUPTIBLE);
> > - * if (do_i_need_to_sleep())
> > - * schedule();
> > - *
> > - * If the caller does not need such serialisation then use 
> > __set_current_state()
> > - */
> > #define __set_current_state(state_value)\
> > do {\
> > current->task_state_change = _THIS_IP_; \
> > @@ -284,11 +273,19 @@ extern char ___assert_task_state[1 - 2*!
> > #define set_current_state(state_value)  \
> > do {\
> > current->task_state_change = _THIS_IP_; \
> > -   smp_store_mb(current->state, (state_value));\
> > +   smp_store_mb(current->state, (state_value));\
> > } while (0)
> > 
> > #else
> > 
> > +/*
> > + * @tsk had better be current, or you get to keep the pieces.
> 
> That reminds me we were getting rid of the set_task_state() calls. Bcache was
> pending, being only user in the kernel that doesn't actually use current; but
> instead breaks newly (yet blocked/uninterruptible) created garbage collection
> kthread. I cannot figure out why this is done (ie purposely accounting the
> load avg. Furthermore gc kicks in in very specific scenarios obviously, such
> as as by the allocator task, so I don't see why bcache gc should want to be
> interruptible.
> 
> Kent, Jens, can we get rid of this?

Here's a patch that just fixes the way gc gets woken up. Eric, you want to take
this?

-- >8 --
Subject: [PATCH] bcache: Make gc wakeup saner

This lets us ditch a set_task_state() call.

Signed-off-by: Kent Overstreet 
---
 drivers/md/bcache/bcache.h  |  4 ++--
 drivers/md/bcache/btree.c   | 39 ---
 drivers/md/bcache/btree.h   |  3 +--
 drivers/md/bcache/request.c |  4 +---
 drivers/md/bcache/super.c   |  2 ++
 5 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 6b420a55c7..c3ea03c9a1 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -425,7 +425,7 @@ struct cache {
 * until a gc finishes - otherwise we could pointlessly burn a ton of
 * cpu
 */
-   unsignedinvalidate_needs_gc:1;
+   unsignedinvalidate_needs_gc;
 
booldiscard; /* Get rid of? */
 
@@ -593,8 +593,8 @@ struct cache_set {
 
/* Counts how many sectors bio_insert has added to the cache */
atomic_tsectors_to_gc;
+   wait_queue_head_t   gc_wait;
 
-   wait_queue_head_t   moving_gc_wait;
struct keybuf   moving_gc_keys;
/* Number of moving GC bios in flight */
struct semaphoremoving_in_flight;
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 81d3db40cd..2efdce0724 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1757,32 +1757,34 @@ static void bch_btree_gc(struct cache_set *c)
bch_moving_gc(c);
 }
 
-static int bch_gc_thread(void *arg)
+static bool gc_should_run(struct cache_set *c)
 {
-   struct cache_set *c = arg;
struct cache *ca;
unsigned i;
 
-   while (1) {
-again:
-   bch_btree_gc(c);
+   for_each_cache(ca, c, i)
+   if (ca->invalidate_needs_gc)
+   return true;
 
-   set_current_state(TASK_INTERRUPTIBLE);
-   if (kthread_should_stop())
-  

Re: ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)

2016-10-24 Thread Kent Overstreet
On Sun, Oct 23, 2016 at 06:57:26PM -0700, Davidlohr Bueso wrote:
> On Wed, 19 Oct 2016, Peter Zijlstra wrote:
> 
> > Subject: sched: Better explain sleep/wakeup
> > From: Peter Zijlstra 
> > Date: Wed Oct 19 15:45:27 CEST 2016
> > 
> > There were a few questions wrt how sleep-wakeup works. Try and explain
> > it more.
> > 
> > Requested-by: Will Deacon 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > ---
> > include/linux/sched.h |   52 
> > ++
> > kernel/sched/core.c   |   15 +++---
> > 2 files changed, 44 insertions(+), 23 deletions(-)
> > 
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -262,20 +262,9 @@ extern char ___assert_task_state[1 - 2*!
> > #define set_task_state(tsk, state_value)\
> > do {\
> > (tsk)->task_state_change = _THIS_IP_;   \
> > -   smp_store_mb((tsk)->state, (state_value));  \
> > +   smp_store_mb((tsk)->state, (state_value));  \
> > } while (0)
> > 
> > -/*
> > - * set_current_state() includes a barrier so that the write of 
> > current->state
> > - * is correctly serialised wrt the caller's subsequent test of whether to
> > - * actually sleep:
> > - *
> > - * set_current_state(TASK_UNINTERRUPTIBLE);
> > - * if (do_i_need_to_sleep())
> > - * schedule();
> > - *
> > - * If the caller does not need such serialisation then use 
> > __set_current_state()
> > - */
> > #define __set_current_state(state_value)\
> > do {\
> > current->task_state_change = _THIS_IP_; \
> > @@ -284,11 +273,19 @@ extern char ___assert_task_state[1 - 2*!
> > #define set_current_state(state_value)  \
> > do {\
> > current->task_state_change = _THIS_IP_; \
> > -   smp_store_mb(current->state, (state_value));\
> > +   smp_store_mb(current->state, (state_value));\
> > } while (0)
> > 
> > #else
> > 
> > +/*
> > + * @tsk had better be current, or you get to keep the pieces.
> 
> That reminds me we were getting rid of the set_task_state() calls. Bcache was
> pending, being only user in the kernel that doesn't actually use current; but
> instead breaks newly (yet blocked/uninterruptible) created garbage collection
> kthread. I cannot figure out why this is done (ie purposely accounting the
> load avg. Furthermore gc kicks in in very specific scenarios obviously, such
> as as by the allocator task, so I don't see why bcache gc should want to be
> interruptible.
> 
> Kent, Jens, can we get rid of this?

Here's a patch that just fixes the way gc gets woken up. Eric, you want to take
this?

-- >8 --
Subject: [PATCH] bcache: Make gc wakeup saner

This lets us ditch a set_task_state() call.

Signed-off-by: Kent Overstreet 
---
 drivers/md/bcache/bcache.h  |  4 ++--
 drivers/md/bcache/btree.c   | 39 ---
 drivers/md/bcache/btree.h   |  3 +--
 drivers/md/bcache/request.c |  4 +---
 drivers/md/bcache/super.c   |  2 ++
 5 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 6b420a55c7..c3ea03c9a1 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -425,7 +425,7 @@ struct cache {
 * until a gc finishes - otherwise we could pointlessly burn a ton of
 * cpu
 */
-   unsignedinvalidate_needs_gc:1;
+   unsignedinvalidate_needs_gc;
 
booldiscard; /* Get rid of? */
 
@@ -593,8 +593,8 @@ struct cache_set {
 
/* Counts how many sectors bio_insert has added to the cache */
atomic_tsectors_to_gc;
+   wait_queue_head_t   gc_wait;
 
-   wait_queue_head_t   moving_gc_wait;
struct keybuf   moving_gc_keys;
/* Number of moving GC bios in flight */
struct semaphoremoving_in_flight;
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 81d3db40cd..2efdce0724 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1757,32 +1757,34 @@ static void bch_btree_gc(struct cache_set *c)
bch_moving_gc(c);
 }
 
-static int bch_gc_thread(void *arg)
+static bool gc_should_run(struct cache_set *c)
 {
-   struct cache_set *c = arg;
struct cache *ca;
unsigned i;
 
-   while (1) {
-again:
-   bch_btree_gc(c);
+   for_each_cache(ca, c, i)
+   if (ca->invalidate_needs_gc)
+   return true;
 
-   set_current_state(TASK_INTERRUPTIBLE);
-   if (kthread_should_stop())
-   break;
+   if (atomic_read(>sectors_to_gc) < 0)
+   return true;
 
-   

Re: ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)

2016-10-24 Thread Kent Overstreet
On Sun, Oct 23, 2016 at 06:57:26PM -0700, Davidlohr Bueso wrote:
> On Wed, 19 Oct 2016, Peter Zijlstra wrote:
> 
> > Subject: sched: Better explain sleep/wakeup
> > From: Peter Zijlstra 
> > Date: Wed Oct 19 15:45:27 CEST 2016
> > 
> > There were a few questions wrt how sleep-wakeup works. Try and explain
> > it more.
> > 
> > Requested-by: Will Deacon 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > ---
> > include/linux/sched.h |   52 
> > ++
> > kernel/sched/core.c   |   15 +++---
> > 2 files changed, 44 insertions(+), 23 deletions(-)
> > 
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -262,20 +262,9 @@ extern char ___assert_task_state[1 - 2*!
> > #define set_task_state(tsk, state_value)\
> > do {\
> > (tsk)->task_state_change = _THIS_IP_;   \
> > -   smp_store_mb((tsk)->state, (state_value));  \
> > +   smp_store_mb((tsk)->state, (state_value));  \
> > } while (0)
> > 
> > -/*
> > - * set_current_state() includes a barrier so that the write of 
> > current->state
> > - * is correctly serialised wrt the caller's subsequent test of whether to
> > - * actually sleep:
> > - *
> > - * set_current_state(TASK_UNINTERRUPTIBLE);
> > - * if (do_i_need_to_sleep())
> > - * schedule();
> > - *
> > - * If the caller does not need such serialisation then use 
> > __set_current_state()
> > - */
> > #define __set_current_state(state_value)\
> > do {\
> > current->task_state_change = _THIS_IP_; \
> > @@ -284,11 +273,19 @@ extern char ___assert_task_state[1 - 2*!
> > #define set_current_state(state_value)  \
> > do {\
> > current->task_state_change = _THIS_IP_; \
> > -   smp_store_mb(current->state, (state_value));\
> > +   smp_store_mb(current->state, (state_value));\
> > } while (0)
> > 
> > #else
> > 
> > +/*
> > + * @tsk had better be current, or you get to keep the pieces.
> 
> That reminds me we were getting rid of the set_task_state() calls. Bcache was
> pending, being only user in the kernel that doesn't actually use current; but
> instead breaks newly (yet blocked/uninterruptible) created garbage collection
> kthread. I cannot figure out why this is done (ie purposely accounting the
> load avg. Furthermore gc kicks in in very specific scenarios obviously, such
> as as by the allocator task, so I don't see why bcache gc should want to be
> interruptible.
> 
> Kent, Jens, can we get rid of this?
> 
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index 76f7534d1dd1..6e3c358b5759 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -1798,7 +1798,6 @@ int bch_gc_thread_start(struct cache_set *c)
>if (IS_ERR(c->gc_thread))
>return PTR_ERR(c->gc_thread);
> 
> -   set_task_state(c->gc_thread, TASK_INTERRUPTIBLE);
>return 0;
> }

Actually, that code looks broken, or at least stupid. Let me do a proper fix...


Re: ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)

2016-10-24 Thread Kent Overstreet
On Sun, Oct 23, 2016 at 06:57:26PM -0700, Davidlohr Bueso wrote:
> On Wed, 19 Oct 2016, Peter Zijlstra wrote:
> 
> > Subject: sched: Better explain sleep/wakeup
> > From: Peter Zijlstra 
> > Date: Wed Oct 19 15:45:27 CEST 2016
> > 
> > There were a few questions wrt how sleep-wakeup works. Try and explain
> > it more.
> > 
> > Requested-by: Will Deacon 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > ---
> > include/linux/sched.h |   52 
> > ++
> > kernel/sched/core.c   |   15 +++---
> > 2 files changed, 44 insertions(+), 23 deletions(-)
> > 
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -262,20 +262,9 @@ extern char ___assert_task_state[1 - 2*!
> > #define set_task_state(tsk, state_value)\
> > do {\
> > (tsk)->task_state_change = _THIS_IP_;   \
> > -   smp_store_mb((tsk)->state, (state_value));  \
> > +   smp_store_mb((tsk)->state, (state_value));  \
> > } while (0)
> > 
> > -/*
> > - * set_current_state() includes a barrier so that the write of 
> > current->state
> > - * is correctly serialised wrt the caller's subsequent test of whether to
> > - * actually sleep:
> > - *
> > - * set_current_state(TASK_UNINTERRUPTIBLE);
> > - * if (do_i_need_to_sleep())
> > - * schedule();
> > - *
> > - * If the caller does not need such serialisation then use 
> > __set_current_state()
> > - */
> > #define __set_current_state(state_value)\
> > do {\
> > current->task_state_change = _THIS_IP_; \
> > @@ -284,11 +273,19 @@ extern char ___assert_task_state[1 - 2*!
> > #define set_current_state(state_value)  \
> > do {\
> > current->task_state_change = _THIS_IP_; \
> > -   smp_store_mb(current->state, (state_value));\
> > +   smp_store_mb(current->state, (state_value));\
> > } while (0)
> > 
> > #else
> > 
> > +/*
> > + * @tsk had better be current, or you get to keep the pieces.
> 
> That reminds me we were getting rid of the set_task_state() calls. Bcache was
> pending, being only user in the kernel that doesn't actually use current; but
> instead breaks newly (yet blocked/uninterruptible) created garbage collection
> kthread. I cannot figure out why this is done (ie purposely accounting the
> load avg. Furthermore gc kicks in in very specific scenarios obviously, such
> as as by the allocator task, so I don't see why bcache gc should want to be
> interruptible.
> 
> Kent, Jens, can we get rid of this?
> 
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index 76f7534d1dd1..6e3c358b5759 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -1798,7 +1798,6 @@ int bch_gc_thread_start(struct cache_set *c)
>if (IS_ERR(c->gc_thread))
>return PTR_ERR(c->gc_thread);
> 
> -   set_task_state(c->gc_thread, TASK_INTERRUPTIBLE);
>return 0;
> }

Actually, that code looks broken, or at least stupid. Let me do a proper fix...


ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)

2016-10-23 Thread Davidlohr Bueso

On Wed, 19 Oct 2016, Peter Zijlstra wrote:


Subject: sched: Better explain sleep/wakeup
From: Peter Zijlstra 
Date: Wed Oct 19 15:45:27 CEST 2016

There were a few questions wrt how sleep-wakeup works. Try and explain
it more.

Requested-by: Will Deacon 
Signed-off-by: Peter Zijlstra (Intel) 
---
include/linux/sched.h |   52 ++
kernel/sched/core.c   |   15 +++---
2 files changed, 44 insertions(+), 23 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -262,20 +262,9 @@ extern char ___assert_task_state[1 - 2*!
#define set_task_state(tsk, state_value)\
do {\
(tsk)->task_state_change = _THIS_IP_;\
-   smp_store_mb((tsk)->state, (state_value));   \
+   smp_store_mb((tsk)->state, (state_value));   \
} while (0)

-/*
- * set_current_state() includes a barrier so that the write of current->state
- * is correctly serialised wrt the caller's subsequent test of whether to
- * actually sleep:
- *
- * set_current_state(TASK_UNINTERRUPTIBLE);
- * if (do_i_need_to_sleep())
- * schedule();
- *
- * If the caller does not need such serialisation then use 
__set_current_state()
- */
#define __set_current_state(state_value)\
do {\
current->task_state_change = _THIS_IP_;  \
@@ -284,11 +273,19 @@ extern char ___assert_task_state[1 - 2*!
#define set_current_state(state_value)  \
do {\
current->task_state_change = _THIS_IP_;  \
-   smp_store_mb(current->state, (state_value)); \
+   smp_store_mb(current->state, (state_value)); \
} while (0)

#else

+/*
+ * @tsk had better be current, or you get to keep the pieces.


That reminds me we were getting rid of the set_task_state() calls. Bcache was
pending, being only user in the kernel that doesn't actually use current; but
instead breaks newly (yet blocked/uninterruptible) created garbage collection
kthread. I cannot figure out why this is done (ie purposely accounting the
load avg. Furthermore gc kicks in in very specific scenarios obviously, such
as as by the allocator task, so I don't see why bcache gc should want to be
interruptible.

Kent, Jens, can we get rid of this?

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 76f7534d1dd1..6e3c358b5759 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1798,7 +1798,6 @@ int bch_gc_thread_start(struct cache_set *c)
   if (IS_ERR(c->gc_thread))
   return PTR_ERR(c->gc_thread);

-   set_task_state(c->gc_thread, TASK_INTERRUPTIBLE);
   return 0;
}

Thanks,
Davidlohr


+ *
+ * The only reason is that computing current can be more expensive than
+ * using a pointer that's already available.
+ *
+ * Therefore, see set_current_state().
+ */
#define __set_task_state(tsk, state_value)  \
do { (tsk)->state = (state_value); } while (0)
#define set_task_state(tsk, state_value)\
@@ -299,11 +296,34 @@ extern char ___assert_task_state[1 - 2*!
 * is correctly serialised wrt the caller's subsequent test of whether to
 * actually sleep:
 *
+ *   for (;;) {
 *  set_current_state(TASK_UNINTERRUPTIBLE);
- * if (do_i_need_to_sleep())
- * schedule();
+ * if (!need_sleep)
+ * break;
+ *
+ * schedule();
+ *   }
+ *   __set_current_state(TASK_RUNNING);
+ *
+ * If the caller does not need such serialisation (because, for instance, the
+ * condition test and condition change and wakeup are under the same lock) then
+ * use __set_current_state().
+ *
+ * The above is typically ordered against the wakeup, which does:
+ *
+ * need_sleep = false;
+ * wake_up_state(p, TASK_UNINTERRUPTIBLE);
+ *
+ * Where wake_up_state() (and all other wakeup primitives) imply enough
+ * barriers to order the store of the variable against wakeup.
+ *
+ * Wakeup will do: if (@state & p->state) p->state = TASK_RUNNING, that is,
+ * once it observes the TASK_UNINTERRUPTIBLE store the waking CPU can issue a
+ * TASK_RUNNING store which can collide with __set_current_state(TASK_RUNNING).
+ *
+ * This is obviously fine, since they both store the exact same value.
 *
- * If the caller does not need such serialisation then use 
__set_current_state()
+ * Also see the comments of try_to_wake_up().
 */
#define __set_current_state(state_value)\
do { current->state = (state_value); } while (0)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2000,14 +2000,15 @@ static void ttwu_queue(struct task_struc
 * @state: the mask of task states that 

ciao set_task_state() (was Re: [PATCH -v4 6/8] locking/mutex: Restructure wait loop)

2016-10-23 Thread Davidlohr Bueso

On Wed, 19 Oct 2016, Peter Zijlstra wrote:


Subject: sched: Better explain sleep/wakeup
From: Peter Zijlstra 
Date: Wed Oct 19 15:45:27 CEST 2016

There were a few questions wrt how sleep-wakeup works. Try and explain
it more.

Requested-by: Will Deacon 
Signed-off-by: Peter Zijlstra (Intel) 
---
include/linux/sched.h |   52 ++
kernel/sched/core.c   |   15 +++---
2 files changed, 44 insertions(+), 23 deletions(-)

--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -262,20 +262,9 @@ extern char ___assert_task_state[1 - 2*!
#define set_task_state(tsk, state_value)\
do {\
(tsk)->task_state_change = _THIS_IP_;\
-   smp_store_mb((tsk)->state, (state_value));   \
+   smp_store_mb((tsk)->state, (state_value));   \
} while (0)

-/*
- * set_current_state() includes a barrier so that the write of current->state
- * is correctly serialised wrt the caller's subsequent test of whether to
- * actually sleep:
- *
- * set_current_state(TASK_UNINTERRUPTIBLE);
- * if (do_i_need_to_sleep())
- * schedule();
- *
- * If the caller does not need such serialisation then use 
__set_current_state()
- */
#define __set_current_state(state_value)\
do {\
current->task_state_change = _THIS_IP_;  \
@@ -284,11 +273,19 @@ extern char ___assert_task_state[1 - 2*!
#define set_current_state(state_value)  \
do {\
current->task_state_change = _THIS_IP_;  \
-   smp_store_mb(current->state, (state_value)); \
+   smp_store_mb(current->state, (state_value)); \
} while (0)

#else

+/*
+ * @tsk had better be current, or you get to keep the pieces.


That reminds me we were getting rid of the set_task_state() calls. Bcache was
pending, being only user in the kernel that doesn't actually use current; but
instead breaks newly (yet blocked/uninterruptible) created garbage collection
kthread. I cannot figure out why this is done (ie purposely accounting the
load avg. Furthermore gc kicks in in very specific scenarios obviously, such
as as by the allocator task, so I don't see why bcache gc should want to be
interruptible.

Kent, Jens, can we get rid of this?

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 76f7534d1dd1..6e3c358b5759 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1798,7 +1798,6 @@ int bch_gc_thread_start(struct cache_set *c)
   if (IS_ERR(c->gc_thread))
   return PTR_ERR(c->gc_thread);

-   set_task_state(c->gc_thread, TASK_INTERRUPTIBLE);
   return 0;
}

Thanks,
Davidlohr


+ *
+ * The only reason is that computing current can be more expensive than
+ * using a pointer that's already available.
+ *
+ * Therefore, see set_current_state().
+ */
#define __set_task_state(tsk, state_value)  \
do { (tsk)->state = (state_value); } while (0)
#define set_task_state(tsk, state_value)\
@@ -299,11 +296,34 @@ extern char ___assert_task_state[1 - 2*!
 * is correctly serialised wrt the caller's subsequent test of whether to
 * actually sleep:
 *
+ *   for (;;) {
 *  set_current_state(TASK_UNINTERRUPTIBLE);
- * if (do_i_need_to_sleep())
- * schedule();
+ * if (!need_sleep)
+ * break;
+ *
+ * schedule();
+ *   }
+ *   __set_current_state(TASK_RUNNING);
+ *
+ * If the caller does not need such serialisation (because, for instance, the
+ * condition test and condition change and wakeup are under the same lock) then
+ * use __set_current_state().
+ *
+ * The above is typically ordered against the wakeup, which does:
+ *
+ * need_sleep = false;
+ * wake_up_state(p, TASK_UNINTERRUPTIBLE);
+ *
+ * Where wake_up_state() (and all other wakeup primitives) imply enough
+ * barriers to order the store of the variable against wakeup.
+ *
+ * Wakeup will do: if (@state & p->state) p->state = TASK_RUNNING, that is,
+ * once it observes the TASK_UNINTERRUPTIBLE store the waking CPU can issue a
+ * TASK_RUNNING store which can collide with __set_current_state(TASK_RUNNING).
+ *
+ * This is obviously fine, since they both store the exact same value.
 *
- * If the caller does not need such serialisation then use 
__set_current_state()
+ * Also see the comments of try_to_wake_up().
 */
#define __set_current_state(state_value)\
do { current->state = (state_value); } while (0)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2000,14 +2000,15 @@ static void ttwu_queue(struct task_struc
 * @state: the mask of task states that can be woken
 * @wake_flags: wake modifier flags (WF_*)
 *
- * Put