Re: [RFC][PATCH] sched/wait_bit: Introduce wait_var_event()/wake_up_var()
On Thu, Mar 15, 2018 at 02:45:20PM +, David Howells wrote:
> Peter Zijlstra wrote:
>
> > Does the below address things sufficiently clear?
>
> Yep.

Thanks!

> > +wait_queue_head_t *__var_waitqueue(void *p)
> > +{
> > +	if (BITS_PER_LONG == 64) {
> > +		unsigned long q = (unsigned long)p;
> > +
> > +		return bit_waitqueue((void *)(q & ~1), q & 1);
> > +	}
> > +	return bit_waitqueue(p, 0);
> > +}
>
> You might be better off not using bit_waitqueue() but rather do the
> calculation directly since you don't actually have a bit number.

Yes, I did that in patch 11. The initial version uses the exact same
stuff wait_on_atomic_t() uses to avoid spurious changes.
Re: [RFC][PATCH] sched/wait_bit: Introduce wait_var_event()/wake_up_var()
Peter Zijlstra wrote:

> Does the below address things sufficiently clear?

Yep.

> +wait_queue_head_t *__var_waitqueue(void *p)
> +{
> +	if (BITS_PER_LONG == 64) {
> +		unsigned long q = (unsigned long)p;
> +
> +		return bit_waitqueue((void *)(q & ~1), q & 1);
> +	}
> +	return bit_waitqueue(p, 0);
> +}

You might be better off not using bit_waitqueue() but rather do the
calculation directly since you don't actually have a bit number.

David
Re: [RFC][PATCH] sched/wait_bit: Introduce wait_var_event()/wake_up_var()
On Thu, Mar 15, 2018 at 09:58:42AM +, David Howells wrote:
> Peter Zijlstra wrote:
>
> > > > Argh, no no no.. That whole wait_for_atomic_t thing is a giant
> > > > trainwreck already and now you're making it worse still.
>
> Your patch description needs to say why this isn't a trainwreck when you
> consider wait_for_atomic_t() to be one since it does things in a very similar
> way.

Does the below address things sufficiently clear?

---
Subject: sched/wait: Introduce wait_var_event()
From: Peter Zijlstra
Date: Thu Mar 15 11:40:33 CET 2018

As a replacement for the wait_on_atomic_t() API, provide the
wait_var_event() API.

The wait_var_event() API is based on the very same hashed-waitqueue
idea, but doesn't care about the type (atomic_t) or the specific
condition (atomic_read() == 0). IOW, it's much more widely
applicable/flexible.

It shares all the benefits/disadvantages of a hashed-waitqueue
approach with the existing wait_on_atomic_t()/wait_on_bit() APIs.

The API is modeled after the existing wait_event() API, but instead of
taking a wait_queue_head, it takes an address. This address is hashed
to obtain a wait_queue_head from the bit_wait_table.

Similar to the wait_event() API, it takes a condition expression as
second argument and will wait until this expression becomes true.

The following are (mostly) identical replacements:

	wait_on_atomic_t(_atomic, atomic_t_wait, TASK_UNINTERRUPTIBLE);
	wake_up_atomic_t(_atomic);

	wait_var_event(_atomic, !atomic_read(_atomic));
	wake_up_var(_atomic);

The only difference is that wake_up_var() is an unconditional wakeup
and doesn't check the previously hard-coded (atomic_read() == 0)
condition here. This is of little consequence, since most callers are
already conditional on atomic_dec_and_test() and the ones that are
not, are trivial to make so.

Cc: David Howells
Tested-by: Dan Williams
Signed-off-by: Peter Zijlstra (Intel)
---
 include/linux/wait_bit.h |   70 +++
 kernel/sched/wait_bit.c  |   48
 2 files changed, 118 insertions(+)

--- a/include/linux/wait_bit.h
+++ b/include/linux/wait_bit.h
@@ -262,4 +262,74 @@ int wait_on_atomic_t(atomic_t *val, wait
 	return out_of_line_wait_on_atomic_t(val, action, mode);
 }

+extern void init_wait_var_entry(struct wait_bit_queue_entry *wbq_entry, void *var, int flags);
+extern void wake_up_var(void *var);
+extern wait_queue_head_t *__var_waitqueue(void *p);
+
+#define ___wait_var_event(var, condition, state, exclusive, ret, cmd)	\
+({									\
+	__label__ __out;						\
+	struct wait_queue_head *__wq_head = __var_waitqueue(var);	\
+	struct wait_bit_queue_entry __wbq_entry;			\
+	long __ret = ret; /* explicit shadow */				\
+									\
+	init_wait_var_entry(&__wbq_entry, var,				\
+			    exclusive ? WQ_FLAG_EXCLUSIVE : 0);		\
+	for (;;) {							\
+		long __int = prepare_to_wait_event(__wq_head,		\
+						   &__wbq_entry.wq_entry, \
+						   state);		\
+		if (condition)						\
+			break;						\
+									\
+		if (___wait_is_interruptible(state) && __int) {		\
+			__ret = __int;					\
+			goto __out;					\
+		}							\
+									\
+		cmd;							\
+	}								\
+	finish_wait(__wq_head, &__wbq_entry.wq_entry);			\
+__out:	__ret;								\
+})
+
+#define __wait_var_event(var, condition)				\
+	___wait_var_event(var, condition, TASK_UNINTERRUPTIBLE, 0, 0,	\
+			  schedule())
+
+#define wait_var_event(var, condition)					\
+do {									\
+	might_sleep();							\
+	if (condition)							\
+		break;							\
+	__wait_var_event(var, condition);
Re: [RFC][PATCH] sched/wait_bit: Introduce wait_var_event()/wake_up_var()
On Thu, Mar 15, 2018 at 09:58:42AM +, David Howells wrote:
> Peter Zijlstra wrote:
>
> > > > Argh, no no no.. That whole wait_for_atomic_t thing is a giant
> > > > trainwreck already and now you're making it worse still.
>
> Your patch description needs to say why this isn't a trainwreck when you
> consider wait_for_atomic_t() to be one since it does things in a very similar
> way.

Yeah, still writing changelogs..
Re: [RFC][PATCH] sched/wait_bit: Introduce wait_var_event()/wake_up_var()
Peter Zijlstra wrote:

> > > Argh, no no no.. That whole wait_for_atomic_t thing is a giant
> > > trainwreck already and now you're making it worse still.

Your patch description needs to say why this isn't a trainwreck when you
consider wait_for_atomic_t() to be one since it does things in a very similar
way.

David
Re: [RFC][PATCH] sched/wait_bit: Introduce wait_var_event()/wake_up_var()
On Tue, Mar 13, 2018 at 3:20 AM, Peter Zijlstra wrote:
> On Sun, Mar 11, 2018 at 10:15:55AM -0700, Dan Williams wrote:
>> On Sun, Mar 11, 2018 at 4:27 AM, Peter Zijlstra wrote:
>> > On Fri, Mar 09, 2018 at 10:55:32PM -0800, Dan Williams wrote:
>> >> Add a generic facility for awaiting an atomic_t to reach a value of 1.
>> >>
>> >> Page reference counts typically need to reach 0 to be considered a
>> >> free / inactive page. However, ZONE_DEVICE pages allocated via
>> >> devm_memremap_pages() are never 'onlined', i.e. the put_page() typically
>> >> done at init time to assign pages to the page allocator is skipped.
>> >>
>> >> These pages will have their reference count elevated > 1 by
>> >> get_user_pages() when they are under DMA. In order to coordinate DMA to
>> >> these pages vs filesystem operations like hole-punch and truncate, the
>> >> filesystem-dax implementation needs to capture the DMA-idle event (i.e.
>> >> the 2 to 1 count transition).
>> >>
>> >> For now, this implementation does not have functional behavior change;
>> >> follow-on patches will add waiters for these page-idle events.
>> >
>> > Argh, no no no.. That whole wait_for_atomic_t thing is a giant
>> > trainwreck already and now you're making it worse still.
>> >
>> > Please have a look here:
>> >
>> >   https://lkml.kernel.org/r/20171101190644.chwhfpoz3ywxx...@hirez.programming.kicks-ass.net
>>
>> That thread seems to be worried about the object disappearing the
>> moment its reference count reaches a target. That isn't the case with
>> the memmap / struct page objects for ZONE_DEVICE pages. I understand
>> wait_for_atomic_one() is broken in the general case, but as far as I
>> can see it works fine specifically for ZONE_DEVICE page busy tracking,
>> just not generic object lifetime.
>
> How's this, compile tested (x86_64-allmodconfig) only.
>
> This allows you to write:
>
>	wait_var_event(_atomic, atomic_read(_atomic) == 1);

This works for me, you can add:

Tested-by: Dan Williams

...to the upstream version. Can we add this new API in an immutable
commit in the tip/sched/core tree, so I can base my fix on it? The
wait_for_atomic_t removal can then come in follow-on patches.
Re: [RFC][PATCH] sched/wait_bit: Introduce wait_var_event()/wake_up_var()
On Tue, Mar 13, 2018 at 3:20 AM, Peter Zijlstra wrote:
> On Sun, Mar 11, 2018 at 10:15:55AM -0700, Dan Williams wrote:
>> On Sun, Mar 11, 2018 at 4:27 AM, Peter Zijlstra wrote:
>> > On Fri, Mar 09, 2018 at 10:55:32PM -0800, Dan Williams wrote:
>> >> Add a generic facility for awaiting an atomic_t to reach a value of 1.
>> >>
>> >> Page reference counts typically need to reach 0 to be considered a
>> >> free / inactive page. However, ZONE_DEVICE pages allocated via
>> >> devm_memremap_pages() are never 'onlined', i.e. the put_page() typically
>> >> done at init time to assign pages to the page allocator is skipped.
>> >>
>> >> These pages will have their reference count elevated > 1 by
>> >> get_user_pages() when they are under DMA. In order to coordinate DMA to
>> >> these pages vs filesystem operations like hole-punch and truncate, the
>> >> filesystem-dax implementation needs to capture the DMA-idle event (i.e.
>> >> the 2 to 1 count transition).
>> >>
>> >> For now, this implementation does not have functional behavior change;
>> >> follow-on patches will add waiters for these page-idle events.
>> >
>> > Argh, no no no.. That whole wait_for_atomic_t thing is a giant
>> > trainwreck already and now you're making it worse still.
>> >
>> > Please have a look here:
>> >
>> >   https://lkml.kernel.org/r/20171101190644.chwhfpoz3ywxx...@hirez.programming.kicks-ass.net
>>
>> That thread seems to be worried about the object disappearing the
>> moment its reference count reaches a target. That isn't the case with
>> the memmap / struct page objects for ZONE_DEVICE pages. I understand
>> wait_for_atomic_one() is broken in the general case, but as far as I
>> can see it works fine specifically for ZONE_DEVICE page busy tracking,
>> just not generic object lifetime.
>
> How's this, compile tested (x86_64-allmodconfig) only.
>
> This allows you to write:
>
>	wait_var_event(_atomic, atomic_read(_atomic) == 1);

Nice! I'll give this a shot.

I will need to add wait_var_event_interruptible(), but other than that
it looks workable to me.