subject:"wake_up_process implied memory barrier clarification"

Re: wake_up_process implied memory barrier clarification

2015-09-07 Thread Boqun Feng

Hi Oleg,

On Mon, Sep 07, 2015 at 07:06:52PM +0200, Oleg Nesterov wrote:
> Sorry for delay,
> 
> On 09/02, Boqun Feng wrote:
> >
> > On Tue, Sep 01, 2015 at 06:39:23PM +0200, Oleg Nesterov wrote:
> > > On 09/01, Boqun Feng wrote:
> > > >
> > > > On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> > > > >
> > > > > And just in case, wake_up() differs in a sense that it doesn't even 
> > > > > need
> > > > > that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> > > > > wait_queue_head_t->lock. Assuming that wake_up() pairs with the 
> > > > > "normal"
> > > > > wait_event()-like code.
> > >
> > > Looks like, you have missed this part of my previous email. See below.
> >
> > I guess I need to think through this, though I haven't found any problem
> > in wake_up() if we remove the STORE-LOAD barrier in try_to_wake_up().
> > And I know that in wake_up(), try_to_wake_up() will be called with
> > holding wait_queue_head_t->lock, however, only part of wait_event()
> > holds the same lock, I can't figure out why the barrier is not needed
> > because of the lock..
> 
> This is very simple. __wait_event() does
> 
>   for (;;) {
>   prepare_to_wait_event(WQ, ...); // takes WQ->lock
> 
>   if (CONDITION)
>   break;
> 
>   schedule();
>   }
> 
> and we have
> 
>   CONDITION = 1;
>   wake_up(WQ);// takes WQ->lock
> 
> on another side.
> 
> Suppose that __wait_event() wins and takes WQ->lock first. It can block
> then. In this case wake_up() must see the result of set_current_state()
> and list_add() when it takes the same lock, otherwise spin_lock() would
> be simply buggy. So it will wake the waiter up.
> 
> At the same time, if __wait_event() takes this lock after wake_up(), it
> can not miss CONDITION = 1. It must see it after it takes the lock, and
> of course after it drops the lock too.
> 

Yes, you're right! I wasn't aware that in prepare_to_wait_event(),
set_current_state() is called with the WQ->lock.

> So I am not sure I understand your concerns in this case...
> 

It's my mistake. Thank you for your explanation ;-)

Regards,
Boqun


signature.asc
Description: PGP signature

Re: wake_up_process implied memory barrier clarification

2015-09-07 Thread Oleg Nesterov

Sorry for delay,

On 09/02, Boqun Feng wrote:
>
> On Tue, Sep 01, 2015 at 06:39:23PM +0200, Oleg Nesterov wrote:
> > On 09/01, Boqun Feng wrote:
> > >
> > > On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> > > >
> > > > And just in case, wake_up() differs in a sense that it doesn't even need
> > > > that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> > > > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
> > > > wait_event()-like code.
> >
> > Looks like, you have missed this part of my previous email. See below.
>
> I guess I need to think through this, though I haven't found any problem
> in wake_up() if we remove the STORE-LOAD barrier in try_to_wake_up().
> And I know that in wake_up(), try_to_wake_up() will be called with
> holding wait_queue_head_t->lock, however, only part of wait_event()
> holds the same lock, I can't figure out why the barrier is not needed
> because of the lock..

This is very simple. __wait_event() does

for (;;) {
prepare_to_wait_event(WQ, ...); // takes WQ->lock

if (CONDITION)
break;

schedule();
}

and we have

CONDITION = 1;
wake_up(WQ);// takes WQ->lock

on another side.

Suppose that __wait_event() wins and takes WQ->lock first. It can block
then. In this case wake_up() must see the result of set_current_state()
and list_add() when it takes the same lock, otherwise spin_lock() would
be simply buggy. So it will wake the waiter up.

At the same time, if __wait_event() takes this lock after wake_up(), it
can not miss CONDITION = 1. It must see it after it takes the lock, and
of course after it drops the lock too.

So I am not sure I understand your concerns in this case...

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-09-07 Thread Oleg Nesterov

Sorry for delay,

On 09/02, Boqun Feng wrote:
>
> On Tue, Sep 01, 2015 at 06:39:23PM +0200, Oleg Nesterov wrote:
> > On 09/01, Boqun Feng wrote:
> > >
> > > On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> > > >
> > > > And just in case, wake_up() differs in a sense that it doesn't even need
> > > > that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> > > > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
> > > > wait_event()-like code.
> >
> > Looks like, you have missed this part of my previous email. See below.
>
> I guess I need to think through this, though I haven't found any problem
> in wake_up() if we remove the STORE-LOAD barrier in try_to_wake_up().
> And I know that in wake_up(), try_to_wake_up() will be called with
> holding wait_queue_head_t->lock, however, only part of wait_event()
> holds the same lock, I can't figure out why the barrier is not needed
> because of the lock..

This is very simple. __wait_event() does

for (;;) {
prepare_to_wait_event(WQ, ...); // takes WQ->lock

if (CONDITION)
break;

schedule();
}

and we have

CONDITION = 1;
wake_up(WQ);// takes WQ->lock

on another side.

Suppose that __wait_event() wins and takes WQ->lock first. It can block
then. In this case wake_up() must see the result of set_current_state()
and list_add() when it takes the same lock, otherwise spin_lock() would
be simply buggy. So it will wake the waiter up.

At the same time, if __wait_event() takes this lock after wake_up(), it
can not miss CONDITION = 1. It must see it after it takes the lock, and
of course after it drops the lock too.

So I am not sure I understand your concerns in this case...

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-09-07 Thread Boqun Feng

Hi Oleg,

On Mon, Sep 07, 2015 at 07:06:52PM +0200, Oleg Nesterov wrote:
> Sorry for delay,
> 
> On 09/02, Boqun Feng wrote:
> >
> > On Tue, Sep 01, 2015 at 06:39:23PM +0200, Oleg Nesterov wrote:
> > > On 09/01, Boqun Feng wrote:
> > > >
> > > > On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> > > > >
> > > > > And just in case, wake_up() differs in a sense that it doesn't even 
> > > > > need
> > > > > that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> > > > > wait_queue_head_t->lock. Assuming that wake_up() pairs with the 
> > > > > "normal"
> > > > > wait_event()-like code.
> > >
> > > Looks like, you have missed this part of my previous email. See below.
> >
> > I guess I need to think through this, though I haven't found any problem
> > in wake_up() if we remove the STORE-LOAD barrier in try_to_wake_up().
> > And I know that in wake_up(), try_to_wake_up() will be called with
> > holding wait_queue_head_t->lock, however, only part of wait_event()
> > holds the same lock, I can't figure out why the barrier is not needed
> > because of the lock..
> 
> This is very simple. __wait_event() does
> 
>   for (;;) {
>   prepare_to_wait_event(WQ, ...); // takes WQ->lock
> 
>   if (CONDITION)
>   break;
> 
>   schedule();
>   }
> 
> and we have
> 
>   CONDITION = 1;
>   wake_up(WQ);// takes WQ->lock
> 
> on another side.
> 
> Suppose that __wait_event() wins and takes WQ->lock first. It can block
> then. In this case wake_up() must see the result of set_current_state()
> and list_add() when it takes the same lock, otherwise spin_lock() would
> be simply buggy. So it will wake the waiter up.
> 
> At the same time, if __wait_event() takes this lock after wake_up(), it
> can not miss CONDITION = 1. It must see it after it takes the lock, and
> of course after it drops the lock too.
> 

Yes, you're right! I wasn't aware that in prepare_to_wait_event(),
set_current_state() is called with the WQ->lock.

> So I am not sure I understand your concerns in this case...
> 

It's my mistake. Thank you for your explanation ;-)

Regards,
Boqun


signature.asc
Description: PGP signature

Re: wake_up_process implied memory barrier clarification

2015-09-01 Thread Boqun Feng

On Tue, Sep 01, 2015 at 06:39:23PM +0200, Oleg Nesterov wrote:
> On 09/01, Boqun Feng wrote:
> >
> > On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> > >
> > > And just in case, wake_up() differs in a sense that it doesn't even need
> > > that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> > > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
> > > wait_event()-like code.
> 
> Looks like, you have missed this part of my previous email. See below.

I guess I need to think through this, though I haven't found any problem
in wake_up() if we remove the STORE-LOAD barrier in try_to_wake_up().
And I know that in wake_up(), try_to_wake_up() will be called with
holding wait_queue_head_t->lock, however, only part of wait_event()
holds the same lock, I can't figure out why the barrier is not needed
because of the lock.. I will read the code carefully to see whether I
miss something.

> 
> > I think maybe I have a misunderstanding of barrier pairing.
> 
> Or me. I can only say how it is supposed to work.
> 
> > think that a barrier pairing can only happen:
> 
> Well, no. See for example https://lkml.org/lkml/2014/7/16/310

This helps a lot, so does the LWN article it references:

https://lwn.net/Articles/573436/

The barrier pairing here is scenario 10 in that article. I'm sure that
is my misunderstanding of barrier pairing now, thank you!

> Or, say, the comment in completion_done().
> 
> And please do not assume I can answer authoritatively the questions
> in this area. Fortunately we have paulmck/peterz in CC, they can
> correct me.
> 

Your explanation and references help a lot, now I understand how the
barrier works in try_to_wake_up() ;-)

> Plus I didn't sleep today, not sure I can understand you correctly
> and/or answer your question ;)
> 
> > One example of #2 pairing is the following sequence of events:
> >
> > Initially X = 0, Y = 0
> >
> > CPU 1   CPU 2
> > === 
> > WRITE_ONCE(Y, 1);
> > smp_mb();
> > r1 = READ_ONCE(X); // r1 == 0
> > WRITE_ONCE(X, 1);
> > smp_mb();
> > r2 = READ_ONCE(Y);
> >
> > 
> > { assert(!(r1 == 0 && r2 == 0)); // means if r1 == 0 then r2 == 1}
> >
> > If CPU 1 _fails_ to read the value of X written by CPU 1, r2 is
> > guaranteed to 1, which means assert(!(r1 == 0 && r2 == 0)) afterwards
> > wouldn't be triggered in any case.
> >
> > And this is actually the case of wake_up/wait, assuming that
> > prepare_to_wait() is called on CPU 1 and wake_up() is called on CPU 2,
> 
> See above, wake_up/wait_event are fine in any case because they use
> the same lock.
> 

I think I wanted to say wake_up_proccess() here, which calls
try_to_wake_up() directly, sorry about that mistake..

> But as for try_to_wake_up() you are right, we rely on STORE-MB-LOAD.
> Now let me quote another previous email,
> 
>   So in fact try_to_wake_up() needs mb() before it does
> 
>   if (!(p->state & state))
>   goto out;
> 
>   But mb() is heavy, we can use wmb() instead, but only in this particular
>   case: before spin_lock(), which guarantees that LOAD(p->state) can't 
> leak
>   out of the critical section. And since spin_lock() itself is the STORE,
>   this guarantees that STORE(CONDITION) can't leak _into_ the critical 
> section
>   and thus it can't be reordered with LOAD(p->state).
> 

This also helps a lot, thank you ;-)

> And I think that smp_mb__before_spinlock() + spin_lock() should pair
> correctly with mb(). If not - we should redefine it.
> 
> > X is the condition and Y is the task state,
> 
> Yes,
> 
> > and replace smp_mb() with really necessary barriers, right?
> 
> Sorry, can't understand this part...
> 

I just want to be accurate by saying that, because the barrier in
try_to_wake_up() is not a smp_mb(), is a smp_mb__before_spinlock() +
spin_lock().

Regards,
Boqun


signature.asc
Description: PGP signature

Re: wake_up_process implied memory barrier clarification

2015-09-01 Thread Oleg Nesterov

On 09/01, Boqun Feng wrote:
>
> On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> >
> > And just in case, wake_up() differs in a sense that it doesn't even need
> > that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
> > wait_event()-like code.

Looks like, you have missed this part of my previous email. See below.

> I think maybe I have a misunderstanding of barrier pairing.

Or me. I can only say how it is supposed to work.

> think that a barrier pairing can only happen:

Well, no. See for example https://lkml.org/lkml/2014/7/16/310
Or, say, the comment in completion_done().

And please do not assume I can answer authoritatively the questions
in this area. Fortunately we have paulmck/peterz in CC, they can
correct me.

Plus I didn't sleep today, not sure I can understand you correctly
and/or answer your question ;)

> One example of #2 pairing is the following sequence of events:
>
> Initially X = 0, Y = 0
>
> CPU 1 CPU 2
> ===   
> WRITE_ONCE(Y, 1);
> smp_mb();
> r1 = READ_ONCE(X); // r1 == 0
>   WRITE_ONCE(X, 1);
>   smp_mb();
>   r2 = READ_ONCE(Y);
>
> 
> { assert(!(r1 == 0 && r2 == 0)); // means if r1 == 0 then r2 == 1}
>
> If CPU 1 _fails_ to read the value of X written by CPU 1, r2 is
> guaranteed to 1, which means assert(!(r1 == 0 && r2 == 0)) afterwards
> wouldn't be triggered in any case.
>
> And this is actually the case of wake_up/wait, assuming that
> prepare_to_wait() is called on CPU 1 and wake_up() is called on CPU 2,

See above, wake_up/wait_event are fine in any case because they use
the same lock.

But as for try_to_wake_up() you are right, we rely on STORE-MB-LOAD.
Now let me quote another previous email,

So in fact try_to_wake_up() needs mb() before it does

if (!(p->state & state))
goto out;

But mb() is heavy, we can use wmb() instead, but only in this particular
case: before spin_lock(), which guarantees that LOAD(p->state) can't 
leak
out of the critical section. And since spin_lock() itself is the STORE,
this guarantees that STORE(CONDITION) can't leak _into_ the critical 
section
and thus it can't be reordered with LOAD(p->state).

And I think that smp_mb__before_spinlock() + spin_lock() should pair
correctly with mb(). If not - we should redefine it.

> X is the condition and Y is the task state,

Yes,

> and replace smp_mb() with really necessary barriers, right?

Sorry, can't understand this part...

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-09-01 Thread Boqun Feng

On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> On 09/01, Boqun Feng wrote:
> >
> > But I'm still a little confused at Oleg's words:
> >
> > "What is really important is that we have a barrier before we _read_ the
> > task state."
> >
> > I read is as "What is really important is that we have a barrier before
> > we _read_ the task state and _after_ we write the CONDITION", if I don't
> > misunderstand Oleg, this means a STORE-barrier-LOAD sequence,
> 
> Yes, exactly.
> 
> Let's look at this trivial code again,
> 
>   CONDITION = 1;
>   wake_up_process();
> 
> note that try_to_wake_up() does
> 
>   if (!(p->state & state))
>   goto out;
> 
> If this LOAD could be reordered with STORE(CONDITION) above we can obviously
> race with
> 
>   set_current_state(...);
>   if (!CONDITION)
>   schedule();
> 
> See the comment at the start of try_to_wake_up(). And again, again, please

Thank you for your detailed explanation!

I read the comment, but just couldn't understand how the pairing
happens, I think I have a misunderstanding of pairing, please see below.

> note that initially the only documented behaviour of smp_mb__before_spinlock()
> was the STORE - LOAD serialization. This is what try_to_wake_up() needs, it
> doesn't actually need the write barrier after STORE(CONDITION).
> 
> And just in case, wake_up() differs in a sense that it doesn't even need
> that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
> wait_event()-like code.
> 
> > which IIUC
> > can't pair with anything.
> 
> It pairs with the barrier implied by set_current_state().

I think maybe I have a misunderstanding of barrier pairing. I used to
think that a barrier pairing can only happen:

1.  between a "MEM-barrier-STORE" and "LOAD-barrier-MEM", when the
LOAD reads the value which the STORE writes

where MEM is a memory operation(STORE or LOAD).

However, wait and wakeup don't fit in the #1 barrier pairing, so I guess
I was wrong, there is a kind of barrier pairings which also happen:

2.  between a "MEM-barrier-LOAD" and "STORE-barrier-MEM", when the
LOAD _fails_ to read the value which the STORE writes,

#1 pairing is easy to understand and memory-barriers.txt already has
some examples. I admit that I haven't seen any usage of #2 pairing
other than wait_event() and wake_up(). Of course, this may be because
I'm not an expert of parallel programming and don't know too much..

Have to ask Paul, when we are talking about barrier pairing, we are
actually talking about the combination of #1 and #2, right?

And Oleg, the barrier pairing we are talking here is the kind of #2,
right?

Add two examples, in case that I fail to make clear the difference of
two barrier pairings.

One example of #1 pairing is the following sequence of events:

Initially X = 0, Y = 0

CPU 1   CPU 2
=== 
WRITE_ONCE(Y, 1);
smp_mb();
WRITE_ONCE(X, 1);
r1 = READ_ONCE(X); // r1 == 1
smp_mb();
r2 = READ_ONCE(Y);

{ assert(!(r1 == 1 && r2 == 0)); // means if r1 == 1 then r2 == 1}

If CPU 2 reads the value of X written by CPU 1, r2 is guaranteed to be
1, which means assert(!(r1 == 1 && r2 == 0)) afterwards wouldn't be
triggered in any case.

One example of #2 pairing is the following sequence of events:

Initially X = 0, Y = 0

CPU 1   CPU 2
=== 
WRITE_ONCE(Y, 1);
smp_mb();
r1 = READ_ONCE(X); // r1 == 0
WRITE_ONCE(X, 1);
smp_mb();
r2 = READ_ONCE(Y);

{ assert(!(r1 == 0 && r2 == 0)); // means if r1 == 0 then r2 == 1}

If CPU 1 _fails_ to read the value of X written by CPU 1, r2 is
guaranteed to 1, which means assert(!(r1 == 0 && r2 == 0)) afterwards
wouldn't be triggered in any case.

And this is actually the case of wake_up/wait, assuming that
prepare_to_wait() is called on CPU 1 and wake_up() is called on CPU 2, X
is the condition and Y is the task state, and replace smp_mb() with
really necessary barriers, right?

Regards,
Boqun

signature.asc
Description: PGP signature

Re: wake_up_process implied memory barrier clarification

2015-09-01 Thread Oleg Nesterov

On 09/01, Boqun Feng wrote:
>
> But I'm still a little confused at Oleg's words:
>
> "What is really important is that we have a barrier before we _read_ the
> task state."
>
> I read is as "What is really important is that we have a barrier before
> we _read_ the task state and _after_ we write the CONDITION", if I don't
> misunderstand Oleg, this means a STORE-barrier-LOAD sequence,

Yes, exactly.

Let's look at this trivial code again,

CONDITION = 1;
wake_up_process();

note that try_to_wake_up() does

if (!(p->state & state))
goto out;

If this LOAD could be reordered with STORE(CONDITION) above we can obviously
race with

set_current_state(...);
if (!CONDITION)
schedule();

See the comment at the start of try_to_wake_up(). And again, again, please
note that initially the only documented behaviour of smp_mb__before_spinlock()
was the STORE - LOAD serialization. This is what try_to_wake_up() needs, it
doesn't actually need the write barrier after STORE(CONDITION).

And just in case, wake_up() differs in a sense that it doesn't even need
that STORE-LOAD barrier in try_to_wake_up(), we can rely on
wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
wait_event()-like code.

> which IIUC
> can't pair with anything.

It pairs with the barrier implied by set_current_state().

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-09-01 Thread Oleg Nesterov

On 08/31, Paul E. McKenney wrote:
>
> On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote:
> > On 08/31, Boqun Feng wrote:
> > >
> > > Fair enough, I went too far. How about just a single paragraph saying
> > > that:
> > >
> > > The wake_up(), wait_event() and their friends have proper barriers in
> > > them, but these implicity barriers are only for the correctness for
> > > sleep and wakeup. So don't rely on these barriers for things that are
> > > neither wait-conditons nor task states.
> > >
> > > Is that OK to you?
> >
> > Ask Paul ;) but personally I agree.
> >
> > To me, the only thing a user should know about wake_up/try_to_wake_up
> > and barriers is that you do not need another barrier between setting
> > condition and waking up.
>
> Sounds like an excellent idea in general.  But could you please show me
> a short code snippet illustrating where you don't need the additional
> barrier, even if the fastpaths are taken so that there is no sleep and
> no wakeup?

I guess I wasn't clear... All I tried to say is that

CONDITION = 1;
wake_up_process();

does not need any _additional_ barrier in between.

I mentioned this because afaics people are often unsure if this is true
or not, and to some degree this question initiated this discussion.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-09-01 Thread Boqun Feng

On Tue, Sep 01, 2015 at 06:39:23PM +0200, Oleg Nesterov wrote:
> On 09/01, Boqun Feng wrote:
> >
> > On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> > >
> > > And just in case, wake_up() differs in a sense that it doesn't even need
> > > that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> > > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
> > > wait_event()-like code.
> 
> Looks like, you have missed this part of my previous email. See below.

I guess I need to think through this, though I haven't found any problem
in wake_up() if we remove the STORE-LOAD barrier in try_to_wake_up().
And I know that in wake_up(), try_to_wake_up() will be called with
holding wait_queue_head_t->lock, however, only part of wait_event()
holds the same lock, I can't figure out why the barrier is not needed
because of the lock.. I will read the code carefully to see whether I
miss something.

> 
> > I think maybe I have a misunderstanding of barrier pairing.
> 
> Or me. I can only say how it is supposed to work.
> 
> > think that a barrier pairing can only happen:
> 
> Well, no. See for example https://lkml.org/lkml/2014/7/16/310

This helps a lot, so does the LWN article it references:

https://lwn.net/Articles/573436/

The barrier pairing here is scenario 10 in that article. I'm sure that
is my misunderstanding of barrier pairing now, thank you!

> Or, say, the comment in completion_done().
> 
> And please do not assume I can answer authoritatively the questions
> in this area. Fortunately we have paulmck/peterz in CC, they can
> correct me.
> 

Your explanation and references help a lot, now I understand how the
barrier works in try_to_wake_up() ;-)

> Plus I didn't sleep today, not sure I can understand you correctly
> and/or answer your question ;)
> 
> > One example of #2 pairing is the following sequence of events:
> >
> > Initially X = 0, Y = 0
> >
> > CPU 1   CPU 2
> > === 
> > WRITE_ONCE(Y, 1);
> > smp_mb();
> > r1 = READ_ONCE(X); // r1 == 0
> > WRITE_ONCE(X, 1);
> > smp_mb();
> > r2 = READ_ONCE(Y);
> >
> > 
> > { assert(!(r1 == 0 && r2 == 0)); // means if r1 == 0 then r2 == 1}
> >
> > If CPU 1 _fails_ to read the value of X written by CPU 1, r2 is
> > guaranteed to 1, which means assert(!(r1 == 0 && r2 == 0)) afterwards
> > wouldn't be triggered in any case.
> >
> > And this is actually the case of wake_up/wait, assuming that
> > prepare_to_wait() is called on CPU 1 and wake_up() is called on CPU 2,
> 
> See above, wake_up/wait_event are fine in any case because they use
> the same lock.
> 

I think I wanted to say wake_up_proccess() here, which calls
try_to_wake_up() directly, sorry about that mistake..

> But as for try_to_wake_up() you are right, we rely on STORE-MB-LOAD.
> Now let me quote another previous email,
> 
>   So in fact try_to_wake_up() needs mb() before it does
> 
>   if (!(p->state & state))
>   goto out;
> 
>   But mb() is heavy, we can use wmb() instead, but only in this particular
>   case: before spin_lock(), which guarantees that LOAD(p->state) can't 
> leak
>   out of the critical section. And since spin_lock() itself is the STORE,
>   this guarantees that STORE(CONDITION) can't leak _into_ the critical 
> section
>   and thus it can't be reordered with LOAD(p->state).
> 

This also helps a lot, thank you ;-)

> And I think that smp_mb__before_spinlock() + spin_lock() should pair
> correctly with mb(). If not - we should redefine it.
> 
> > X is the condition and Y is the task state,
> 
> Yes,
> 
> > and replace smp_mb() with really necessary barriers, right?
> 
> Sorry, can't understand this part...
> 

I just want to be accurate by saying that, because the barrier in
try_to_wake_up() is not a smp_mb(), is a smp_mb__before_spinlock() +
spin_lock().

Regards,
Boqun


signature.asc
Description: PGP signature

Re: wake_up_process implied memory barrier clarification

2015-09-01 Thread Oleg Nesterov

On 09/01, Boqun Feng wrote:
>
> But I'm still a little confused at Oleg's words:
>
> "What is really important is that we have a barrier before we _read_ the
> task state."
>
> I read is as "What is really important is that we have a barrier before
> we _read_ the task state and _after_ we write the CONDITION", if I don't
> misunderstand Oleg, this means a STORE-barrier-LOAD sequence,

Yes, exactly.

Let's look at this trivial code again,

CONDITION = 1;
wake_up_process();

note that try_to_wake_up() does

if (!(p->state & state))
goto out;

If this LOAD could be reordered with STORE(CONDITION) above we can obviously
race with

set_current_state(...);
if (!CONDITION)
schedule();

See the comment at the start of try_to_wake_up(). And again, again, please
note that initially the only documented behaviour of smp_mb__before_spinlock()
was the STORE - LOAD serialization. This is what try_to_wake_up() needs, it
doesn't actually need the write barrier after STORE(CONDITION).

And just in case, wake_up() differs in a sense that it doesn't even need
that STORE-LOAD barrier in try_to_wake_up(), we can rely on
wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
wait_event()-like code.

> which IIUC
> can't pair with anything.

It pairs with the barrier implied by set_current_state().

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-09-01 Thread Oleg Nesterov

On 08/31, Paul E. McKenney wrote:
>
> On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote:
> > On 08/31, Boqun Feng wrote:
> > >
> > > Fair enough, I went too far. How about just a single paragraph saying
> > > that:
> > >
> > > The wake_up(), wait_event() and their friends have proper barriers in
> > > them, but these implicity barriers are only for the correctness for
> > > sleep and wakeup. So don't rely on these barriers for things that are
> > > neither wait-conditons nor task states.
> > >
> > > Is that OK to you?
> >
> > Ask Paul ;) but personally I agree.
> >
> > To me, the only thing a user should know about wake_up/try_to_wake_up
> > and barriers is that you do not need another barrier between setting
> > condition and waking up.
>
> Sounds like an excellent idea in general.  But could you please show me
> a short code snippet illustrating where you don't need the additional
> barrier, even if the fastpaths are taken so that there is no sleep and
> no wakeup?

I guess I wasn't clear... All I tried to say is that

CONDITION = 1;
wake_up_process();

does not need any _additional_ barrier in between.

I mentioned this because afaics people are often unsure if this is true
or not, and to some degree this question initiated this discussion.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-09-01 Thread Oleg Nesterov

On 09/01, Boqun Feng wrote:
>
> On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> >
> > And just in case, wake_up() differs in a sense that it doesn't even need
> > that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
> > wait_event()-like code.

Looks like, you have missed this part of my previous email. See below.

> I think maybe I have a misunderstanding of barrier pairing.

Or me. I can only say how it is supposed to work.

> think that a barrier pairing can only happen:

Well, no. See for example https://lkml.org/lkml/2014/7/16/310
Or, say, the comment in completion_done().

And please do not assume I can answer authoritatively the questions
in this area. Fortunately we have paulmck/peterz in CC, they can
correct me.

Plus I didn't sleep today, not sure I can understand you correctly
and/or answer your question ;)

> One example of #2 pairing is the following sequence of events:
>
> Initially X = 0, Y = 0
>
> CPU 1 CPU 2
> ===   
> WRITE_ONCE(Y, 1);
> smp_mb();
> r1 = READ_ONCE(X); // r1 == 0
>   WRITE_ONCE(X, 1);
>   smp_mb();
>   r2 = READ_ONCE(Y);
>
> 
> { assert(!(r1 == 0 && r2 == 0)); // means if r1 == 0 then r2 == 1}
>
> If CPU 1 _fails_ to read the value of X written by CPU 1, r2 is
> guaranteed to 1, which means assert(!(r1 == 0 && r2 == 0)) afterwards
> wouldn't be triggered in any case.
>
> And this is actually the case of wake_up/wait, assuming that
> prepare_to_wait() is called on CPU 1 and wake_up() is called on CPU 2,

See above, wake_up/wait_event are fine in any case because they use
the same lock.

But as for try_to_wake_up() you are right, we rely on STORE-MB-LOAD.
Now let me quote another previous email,

So in fact try_to_wake_up() needs mb() before it does

if (!(p->state & state))
goto out;

But mb() is heavy, we can use wmb() instead, but only in this particular
case: before spin_lock(), which guarantees that LOAD(p->state) can't 
leak
out of the critical section. And since spin_lock() itself is the STORE,
this guarantees that STORE(CONDITION) can't leak _into_ the critical 
section
and thus it can't be reordered with LOAD(p->state).

And I think that smp_mb__before_spinlock() + spin_lock() should pair
correctly with mb(). If not - we should redefine it.

> X is the condition and Y is the task state,

Yes,

> and replace smp_mb() with really necessary barriers, right?

Sorry, can't understand this part...

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-09-01 Thread Boqun Feng

On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> On 09/01, Boqun Feng wrote:
> >
> > But I'm still a little confused at Oleg's words:
> >
> > "What is really important is that we have a barrier before we _read_ the
> > task state."
> >
> > I read is as "What is really important is that we have a barrier before
> > we _read_ the task state and _after_ we write the CONDITION", if I don't
> > misunderstand Oleg, this means a STORE-barrier-LOAD sequence,
> 
> Yes, exactly.
> 
> Let's look at this trivial code again,
> 
>   CONDITION = 1;
>   wake_up_process();
> 
> note that try_to_wake_up() does
> 
>   if (!(p->state & state))
>   goto out;
> 
> If this LOAD could be reordered with STORE(CONDITION) above we can obviously
> race with
> 
>   set_current_state(...);
>   if (!CONDITION)
>   schedule();
> 
> See the comment at the start of try_to_wake_up(). And again, again, please

Thank you for your detailed explanation!

I read the comment, but just couldn't understand how the pairing
happens, I think I have a misunderstanding of pairing, please see below.

> note that initially the only documented behaviour of smp_mb__before_spinlock()
> was the STORE - LOAD serialization. This is what try_to_wake_up() needs, it
> doesn't actually need the write barrier after STORE(CONDITION).
> 
> And just in case, wake_up() differs in a sense that it doesn't even need
> that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
> wait_event()-like code.
> 
> > which IIUC
> > can't pair with anything.
> 
> It pairs with the barrier implied by set_current_state().

I think maybe I have a misunderstanding of barrier pairing. I used to
think that a barrier pairing can only happen:

1.  between a "MEM-barrier-STORE" and "LOAD-barrier-MEM", when the
LOAD reads the value which the STORE writes

where MEM is a memory operation(STORE or LOAD).

However, wait and wakeup don't fit in the #1 barrier pairing, so I guess
I was wrong, there is a kind of barrier pairings which also happen:

2.  between a "MEM-barrier-LOAD" and "STORE-barrier-MEM", when the
LOAD _fails_ to read the value which the STORE writes,

#1 pairing is easy to understand and memory-barriers.txt already has
some examples. I admit that I haven't seen any usage of #2 pairing
other than wait_event() and wake_up(). Of course, this may be because
I'm not an expert of parallel programming and don't know too much..

Have to ask Paul, when we are talking about barrier pairing, we are
actually talking about the combination of #1 and #2, right?

And Oleg, the barrier pairing we are talking here is the kind of #2,
right?

Add two examples, in case that I fail to make clear the difference of
two barrier pairings.

One example of #1 pairing is the following sequence of events:

Initially X = 0, Y = 0

CPU 1   CPU 2
=== 
WRITE_ONCE(Y, 1);
smp_mb();
WRITE_ONCE(X, 1);
r1 = READ_ONCE(X); // r1 == 1
smp_mb();
r2 = READ_ONCE(Y);

{ assert(!(r1 == 1 && r2 == 0)); // means if r1 == 1 then r2 == 1}

If CPU 2 reads the value of X written by CPU 1, r2 is guaranteed to be
1, which means assert(!(r1 == 1 && r2 == 0)) afterwards wouldn't be
triggered in any case.

One example of #2 pairing is the following sequence of events:

Initially X = 0, Y = 0

CPU 1   CPU 2
=== 
WRITE_ONCE(Y, 1);
smp_mb();
r1 = READ_ONCE(X); // r1 == 0
WRITE_ONCE(X, 1);
smp_mb();
r2 = READ_ONCE(Y);

{ assert(!(r1 == 0 && r2 == 0)); // means if r1 == 0 then r2 == 1}

If CPU 1 _fails_ to read the value of X written by CPU 1, r2 is
guaranteed to 1, which means assert(!(r1 == 0 && r2 == 0)) afterwards
wouldn't be triggered in any case.

And this is actually the case of wake_up/wait, assuming that
prepare_to_wait() is called on CPU 1 and wake_up() is called on CPU 2, X
is the condition and Y is the task state, and replace smp_mb() with
really necessary barriers, right?

Regards,
Boqun

signature.asc
Description: PGP signature

Re: wake_up_process implied memory barrier clarification

2015-08-31 Thread Paul E. McKenney

On Tue, Sep 01, 2015 at 11:40:14AM +0800, Boqun Feng wrote:
> Hi Paul,
> 
> On Mon, Aug 31, 2015 at 01:37:39PM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote:
> > > On 08/31, Boqun Feng wrote:
> > > >
> > > > Fair enough, I went too far. How about just a single paragraph saying
> > > > that:
> > > >
> > > > The wake_up(), wait_event() and their friends have proper barriers in
> > > > them, but these implicity barriers are only for the correctness for
> > > > sleep and wakeup. So don't rely on these barriers for things that are
> > > > neither wait-conditons nor task states.
> > > >
> > > > Is that OK to you?
> > > 
> > > Ask Paul ;) but personally I agree.
> > > 
> > > To me, the only thing a user should know about wake_up/try_to_wake_up
> > > and barriers is that you do not need another barrier between setting
> > > condition and waking up.
> > 
> > Sounds like an excellent idea in general.  But could you please show me
> > a short code snippet illustrating where you don't need the additional
> > barrier, even if the fastpaths are taken so that there is no sleep and
> > no wakeup?
> 
> If there is no sleep and no wakeup, it means only CONDITION changed.
> Either CONDITION is a single variable or it should maintains internal
> ordering guarantee itself. And there is no need for barriers, because
> there is only one shared resource we are talking about, right?

I could imagine all sorts of combinations, which is why I would like
to see a code snippet showing exactly what Oleg is talking about.  ;-)

Thanx, Paul

> But I'm still a little confused at Oleg's words:
> 
> "What is really important is that we have a barrier before we _read_ the
> task state."
> 
> I read is as "What is really important is that we have a barrier before
> we _read_ the task state and _after_ we write the CONDITION", if I don't
> misunderstand Oleg, this means a STORE-barrier-LOAD sequence, which IIUC
> can't pair with anything.
> 
> So, there might be some tricky barrier usage here?
> 
> Regards,
> Boqun


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-31 Thread Boqun Feng

Hi Paul,

On Mon, Aug 31, 2015 at 01:37:39PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote:
> > On 08/31, Boqun Feng wrote:
> > >
> > > Fair enough, I went too far. How about just a single paragraph saying
> > > that:
> > >
> > > The wake_up(), wait_event() and their friends have proper barriers in
> > > them, but these implicity barriers are only for the correctness for
> > > sleep and wakeup. So don't rely on these barriers for things that are
> > > neither wait-conditons nor task states.
> > >
> > > Is that OK to you?
> > 
> > Ask Paul ;) but personally I agree.
> > 
> > To me, the only thing a user should know about wake_up/try_to_wake_up
> > and barriers is that you do not need another barrier between setting
> > condition and waking up.
> 
> Sounds like an excellent idea in general.  But could you please show me
> a short code snippet illustrating where you don't need the additional
> barrier, even if the fastpaths are taken so that there is no sleep and
> no wakeup?
> 

If there is no sleep and no wakeup, it means only CONDITION changed.
Either CONDITION is a single variable or it should maintains internal
ordering guarantee itself. And there is no need for barriers, because
there is only one shared resource we are talking about, right?

But I'm still a little confused at Oleg's words:

"What is really important is that we have a barrier before we _read_ the
task state."

I read is as "What is really important is that we have a barrier before
we _read_ the task state and _after_ we write the CONDITION", if I don't
misunderstand Oleg, this means a STORE-barrier-LOAD sequence, which IIUC
can't pair with anything.

So, there might be some tricky barrier usage here?

Regards,
Boqun

signature.asc
Description: PGP signature

Re: wake_up_process implied memory barrier clarification

2015-08-31 Thread Paul E. McKenney

On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote:
> On 08/31, Boqun Feng wrote:
> >
> > Fair enough, I went too far. How about just a single paragraph saying
> > that:
> >
> > The wake_up(), wait_event() and their friends have proper barriers in
> > them, but these implicity barriers are only for the correctness for
> > sleep and wakeup. So don't rely on these barriers for things that are
> > neither wait-conditons nor task states.
> >
> > Is that OK to you?
> 
> Ask Paul ;) but personally I agree.
> 
> To me, the only thing a user should know about wake_up/try_to_wake_up
> and barriers is that you do not need another barrier between setting
> condition and waking up.

Sounds like an excellent idea in general.  But could you please show me
a short code snippet illustrating where you don't need the additional
barrier, even if the fastpaths are taken so that there is no sleep and
no wakeup?

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-31 Thread Oleg Nesterov

On 08/31, Boqun Feng wrote:
>
> Fair enough, I went too far. How about just a single paragraph saying
> that:
>
> The wake_up(), wait_event() and their friends have proper barriers in
> them, but these implicity barriers are only for the correctness for
> sleep and wakeup. So don't rely on these barriers for things that are
> neither wait-conditons nor task states.
>
> Is that OK to you?

Ask Paul ;) but personally I agree.

To me, the only thing a user should know about wake_up/try_to_wake_up
and barriers is that you do not need another barrier between setting
condition and waking up.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-31 Thread Oleg Nesterov

On 08/31, Boqun Feng wrote:
>
> Fair enough, I went too far. How about just a single paragraph saying
> that:
>
> The wake_up(), wait_event() and their friends have proper barriers in
> them, but these implicity barriers are only for the correctness for
> sleep and wakeup. So don't rely on these barriers for things that are
> neither wait-conditons nor task states.
>
> Is that OK to you?

Ask Paul ;) but personally I agree.

To me, the only thing a user should know about wake_up/try_to_wake_up
and barriers is that you do not need another barrier between setting
condition and waking up.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-31 Thread Paul E. McKenney

On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote:
> On 08/31, Boqun Feng wrote:
> >
> > Fair enough, I went too far. How about just a single paragraph saying
> > that:
> >
> > The wake_up(), wait_event() and their friends have proper barriers in
> > them, but these implicity barriers are only for the correctness for
> > sleep and wakeup. So don't rely on these barriers for things that are
> > neither wait-conditons nor task states.
> >
> > Is that OK to you?
> 
> Ask Paul ;) but personally I agree.
> 
> To me, the only thing a user should know about wake_up/try_to_wake_up
> and barriers is that you do not need another barrier between setting
> condition and waking up.

Sounds like an excellent idea in general.  But could you please show me
a short code snippet illustrating where you don't need the additional
barrier, even if the fastpaths are taken so that there is no sleep and
no wakeup?

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-31 Thread Paul E. McKenney

On Tue, Sep 01, 2015 at 11:40:14AM +0800, Boqun Feng wrote:
> Hi Paul,
> 
> On Mon, Aug 31, 2015 at 01:37:39PM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote:
> > > On 08/31, Boqun Feng wrote:
> > > >
> > > > Fair enough, I went too far. How about just a single paragraph saying
> > > > that:
> > > >
> > > > The wake_up(), wait_event() and their friends have proper barriers in
> > > > them, but these implicity barriers are only for the correctness for
> > > > sleep and wakeup. So don't rely on these barriers for things that are
> > > > neither wait-conditons nor task states.
> > > >
> > > > Is that OK to you?
> > > 
> > > Ask Paul ;) but personally I agree.
> > > 
> > > To me, the only thing a user should know about wake_up/try_to_wake_up
> > > and barriers is that you do not need another barrier between setting
> > > condition and waking up.
> > 
> > Sounds like an excellent idea in general.  But could you please show me
> > a short code snippet illustrating where you don't need the additional
> > barrier, even if the fastpaths are taken so that there is no sleep and
> > no wakeup?
> 
> If there is no sleep and no wakeup, it means only CONDITION changed.
> Either CONDITION is a single variable or it should maintains internal
> ordering guarantee itself. And there is no need for barriers, because
> there is only one shared resource we are talking about, right?

I could imagine all sorts of combinations, which is why I would like
to see a code snippet showing exactly what Oleg is talking about.  ;-)

Thanx, Paul

> But I'm still a little confused at Oleg's words:
> 
> "What is really important is that we have a barrier before we _read_ the
> task state."
> 
> I read is as "What is really important is that we have a barrier before
> we _read_ the task state and _after_ we write the CONDITION", if I don't
> misunderstand Oleg, this means a STORE-barrier-LOAD sequence, which IIUC
> can't pair with anything.
> 
> So, there might be some tricky barrier usage here?
> 
> Regards,
> Boqun


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-31 Thread Boqun Feng

Hi Paul,

On Mon, Aug 31, 2015 at 01:37:39PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote:
> > On 08/31, Boqun Feng wrote:
> > >
> > > Fair enough, I went too far. How about just a single paragraph saying
> > > that:
> > >
> > > The wake_up(), wait_event() and their friends have proper barriers in
> > > them, but these implicity barriers are only for the correctness for
> > > sleep and wakeup. So don't rely on these barriers for things that are
> > > neither wait-conditons nor task states.
> > >
> > > Is that OK to you?
> > 
> > Ask Paul ;) but personally I agree.
> > 
> > To me, the only thing a user should know about wake_up/try_to_wake_up
> > and barriers is that you do not need another barrier between setting
> > condition and waking up.
> 
> Sounds like an excellent idea in general.  But could you please show me
> a short code snippet illustrating where you don't need the additional
> barrier, even if the fastpaths are taken so that there is no sleep and
> no wakeup?
> 

If there is no sleep and no wakeup, it means only CONDITION changed.
Either CONDITION is a single variable or it should maintains internal
ordering guarantee itself. And there is no need for barriers, because
there is only one shared resource we are talking about, right?

But I'm still a little confused at Oleg's words:

"What is really important is that we have a barrier before we _read_ the
task state."

I read is as "What is really important is that we have a barrier before
we _read_ the task state and _after_ we write the CONDITION", if I don't
misunderstand Oleg, this means a STORE-barrier-LOAD sequence, which IIUC
can't pair with anything.

So, there might be some tricky barrier usage here?

Regards,
Boqun

signature.asc
Description: PGP signature

Re: wake_up_process implied memory barrier clarification

2015-08-30 Thread Boqun Feng

Hi Oleg,

On Sat, Aug 29, 2015 at 04:27:07PM +0200, Oleg Nesterov wrote:
> Hello Boqun,
> 
...
> > By this, I think you actually means the example below the added text,
> > i.e. the example for "to repeat..", right?
> 
> And above. Even
> 
>   The barrier occurs before the task state is cleared
> 
> is not actually right. This is misleading. What is really important is that
> we have a barrier before we _read_ the task state. And again, again, the
> fact that we actually have the write barrier is just the implementation
> detail.
> 

Yes, agree with you. However, I didn't realize this problem before, I
will see what I can do ;-)

> > Subject: Documentation: call out conditional barriers of wait_*() and 
> > wake_up*()
> >
> > The memory barriers in some sleep and wakeup functions are conditional,
> > there are several situations that there is no barriers:
> >
> > 1.  If wait_event() and co. actually don't prepare to sleep, there
> > may be no barrier in them.
> 
> And thus (imo) we should not even try to document this. I mean, a user
> should not make any assumptions about the barriers inside wait_event().
> 

Yep.

> > 2.  No matter whether a sleep occurs or not, there may be no barrier
> > between a successful wait-condition checking(the result of which
> > is true) in wait_event() and the following instructions.
> 
> Yes, this is true in any case. Not sure this deserves addtionional
> documentation, but see below.
> 
> > 3.  If wake_up() and co. actually wake up no one, there may be no
> > write barrier in them.
> 
> See above.
> 
> > --- a/Documentation/memory-barriers.txt
> > +++ b/Documentation/memory-barriers.txt
> > @@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by:
> >
> >  which therefore also imply a general memory barrier after setting the 
> > state.
> >  The whole sequence above is available in various canned forms, all of which
> > -interpolate the memory barrier in the right place:
> > +imply a general barrier if and only if a sleep is at least about to happen,
> > +i.e. prepare_to_wait*() is called.
> >
> > wait_event();
> > wait_event_interruptible();
> > @@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place:
> > wait_on_bit();
> > wait_on_bit_lock();
> >
> > +Further more, no barrier is guaranteed after the successful wait condition
> > +checkings, whose results are true, in wait_*() and before the instructions
> > +following wait_*().
> 
> Yes, perhaps this makes sense, but (to me) only because the explanation above
> looks a bit confusing to me. I simply can't understand why memory-barriers.txt
> mentions that barrier implied by set_current_state(). You need to know this
> if you want to understand how wait_event() works. You do not need to know
> about this barrier if you want to use wait_event(). If nothing else, because
> you can never rely on this barrier. Anf your examples below shows this.
> 

Fair enough, I went too far. How about just a single paragraph saying
that:

The wake_up(), wait_event() and their friends have proper barriers in
them, but these implicity barriers are only for the correctness for
sleep and wakeup. So don't rely on these barriers for things that are
neither wait-conditons nor task states.

Is that OK to you?

Thank you for your comments ;-)

Regards,
Boqun



signature.asc
Description: PGP signature

Re: wake_up_process implied memory barrier clarification

2015-08-30 Thread Boqun Feng

Hi Oleg,

On Sat, Aug 29, 2015 at 04:27:07PM +0200, Oleg Nesterov wrote:
 Hello Boqun,
 
...
  By this, I think you actually means the example below the added text,
  i.e. the example for to repeat.., right?
 
 And above. Even
 
   The barrier occurs before the task state is cleared
 
 is not actually right. This is misleading. What is really important is that
 we have a barrier before we _read_ the task state. And again, again, the
 fact that we actually have the write barrier is just the implementation
 detail.
 

Yes, agree with you. However, I didn't realize this problem before, I
will see what I can do ;-)

  Subject: Documentation: call out conditional barriers of wait_*() and 
  wake_up*()
 
  The memory barriers in some sleep and wakeup functions are conditional,
  there are several situations that there is no barriers:
 
  1.  If wait_event() and co. actually don't prepare to sleep, there
  may be no barrier in them.
 
 And thus (imo) we should not even try to document this. I mean, a user
 should not make any assumptions about the barriers inside wait_event().
 

Yep.

  2.  No matter whether a sleep occurs or not, there may be no barrier
  between a successful wait-condition checking(the result of which
  is true) in wait_event() and the following instructions.
 
 Yes, this is true in any case. Not sure this deserves addtionional
 documentation, but see below.
 
  3.  If wake_up() and co. actually wake up no one, there may be no
  write barrier in them.
 
 See above.
 
  --- a/Documentation/memory-barriers.txt
  +++ b/Documentation/memory-barriers.txt
  @@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by:
 
   which therefore also imply a general memory barrier after setting the 
  state.
   The whole sequence above is available in various canned forms, all of which
  -interpolate the memory barrier in the right place:
  +imply a general barrier if and only if a sleep is at least about to happen,
  +i.e. prepare_to_wait*() is called.
 
  wait_event();
  wait_event_interruptible();
  @@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place:
  wait_on_bit();
  wait_on_bit_lock();
 
  +Further more, no barrier is guaranteed after the successful wait condition
  +checkings, whose results are true, in wait_*() and before the instructions
  +following wait_*().
 
 Yes, perhaps this makes sense, but (to me) only because the explanation above
 looks a bit confusing to me. I simply can't understand why memory-barriers.txt
 mentions that barrier implied by set_current_state(). You need to know this
 if you want to understand how wait_event() works. You do not need to know
 about this barrier if you want to use wait_event(). If nothing else, because
 you can never rely on this barrier. Anf your examples below shows this.
 

Fair enough, I went too far. How about just a single paragraph saying
that:

The wake_up(), wait_event() and their friends have proper barriers in
them, but these implicity barriers are only for the correctness for
sleep and wakeup. So don't rely on these barriers for things that are
neither wait-conditons nor task states.

Is that OK to you?

Thank you for your comments ;-)

Regards,
Boqun



signature.asc
Description: PGP signature

Re: wake_up_process implied memory barrier clarification

2015-08-29 Thread Oleg Nesterov

Hello Boqun,

On 08/29, Boqun Feng wrote:
> Hi Oleg
>
> On Fri, Aug 28, 2015 at 06:06:37PM +0200, Oleg Nesterov wrote:
> > On 08/28, Michal Hocko wrote:
> > >
> > > On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
> > > > On 08/27, Michal Hocko wrote:
> > > > >
> > > > > --- a/Documentation/memory-barriers.txt
> > > > > +++ b/Documentation/memory-barriers.txt
> > > > > @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the 
> > > > > task state is cleared, and so sits
> > > > >STORE current->state
> > > > >   LOAD event_indicated
> > > > >
> > > > > +Please note that wake_up_process is an exception here because it 
> > > > > implies
> > > > > +the write memory barrier unconditionally.
> > > > > +
> > > >
> > > > I simply can't understand (can't even parse) this part of 
> > > > memory-barriers.txt.
> > >
> > > Do you mean the added text or the example above it?
> >
> > Both ;)
> >
> > but note that "load from X might see 0" is true of course, and in this
>
> By this, I think you actually means the example below the added text,
> i.e. the example for "to repeat..", right?

And above. Even

The barrier occurs before the task state is cleared

is not actually right. This is misleading. What is really important is that
we have a barrier before we _read_ the task state. And again, again, the
fact that we actually have the write barrier is just the implementation
detail.

> Subject: Documentation: call out conditional barriers of wait_*() and 
> wake_up*()
>
> The memory barriers in some sleep and wakeup functions are conditional,
> there are several situations that there is no barriers:
>
> 1.If wait_event() and co. actually don't prepare to sleep, there
>   may be no barrier in them.

And thus (imo) we should not even try to document this. I mean, a user
should not make any assumptions about the barriers inside wait_event().

> 2.No matter whether a sleep occurs or not, there may be no barrier
>   between a successful wait-condition checking(the result of which
>   is true) in wait_event() and the following instructions.

Yes, this is true in any case. Not sure this deserves addtionional
documentation, but see below.

> 3.If wake_up() and co. actually wake up no one, there may be no
>   write barrier in them.

See above.

> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by:
>
>  which therefore also imply a general memory barrier after setting the state.
>  The whole sequence above is available in various canned forms, all of which
> -interpolate the memory barrier in the right place:
> +imply a general barrier if and only if a sleep is at least about to happen,
> +i.e. prepare_to_wait*() is called.
>
>   wait_event();
>   wait_event_interruptible();
> @@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place:
>   wait_on_bit();
>   wait_on_bit_lock();
>
> +Further more, no barrier is guaranteed after the successful wait condition
> +checkings, whose results are true, in wait_*() and before the instructions
> +following wait_*().

Yes, perhaps this makes sense, but (to me) only because the explanation above
looks a bit confusing to me. I simply can't understand why memory-barriers.txt
mentions that barrier implied by set_current_state(). You need to know this
if you want to understand how wait_event() works. You do not need to know
about this barrier if you want to use wait_event(). If nothing else, because
you can never rely on this barrier. Anf your examples below shows this.

> +To repeat, barriers in wait_event(), wake_up() and co. are conditional, 
> meaning
> +they are present if and only if a sleep or a wakeup actually occurs. To see
> +this, consider the following three examples.
> +
> +The first example is for the conditional barriers in wait_*(), say X and Y
> +are both initially zero:
> +
> + CPU 1   CPU 2
> + === ===
> + X = 1;
> + smp_mb();
> + wait_event(wq, Y == 1); Y = 1;
> +   load from Y sees 1wake_up();
> +   
> + load from X might see 0
> +
> +And even if a sleep and a wakeup really occurs, there might be no barrier
> +between the last load from Y(which the makes wait condition checking succeed)
> +and load from X, so that load from X might still see 0.
...

> +So that, to guarantee that CPU 1's load from X is 1 in the two examples 
> above,
> +a read barrier must be added after wait_event() in the code running on CPU 1.

Yes. So why should we try to document that barrier in wait_event() ?

> +The third example is for the conditional write barriers in wake_up*(), say
> +X, Y and Z are all initially zero:
> +
> + CPU 1   CPU 2
> + ===

Re: wake_up_process implied memory barrier clarification

2015-08-29 Thread Boqun Feng

Hi Oleg

On Fri, Aug 28, 2015 at 06:06:37PM +0200, Oleg Nesterov wrote:
> On 08/28, Michal Hocko wrote:
> >
> > On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
> > > On 08/27, Michal Hocko wrote:
> > > >
> > > > --- a/Documentation/memory-barriers.txt
> > > > +++ b/Documentation/memory-barriers.txt
> > > > @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task 
> > > > state is cleared, and so sits
> > > >  STORE current->state
> > > > LOAD event_indicated
> > > >
> > > > +Please note that wake_up_process is an exception here because it 
> > > > implies
> > > > +the write memory barrier unconditionally.
> > > > +
> > >
> > > I simply can't understand (can't even parse) this part of 
> > > memory-barriers.txt.
> >
> > Do you mean the added text or the example above it?
> 
> Both ;)
> 
> but note that "load from X might see 0" is true of course, and in this

By this, I think you actually means the example below the added text,
i.e. the example for "to repeat..", right?

I think that's not a good example, and actually that example explains
that the barriers in -wait_event()- rather than in -wake_up()- are
conditional.

I wrote a patch to make things more clear, hope that helps.

(Add Paul and Jonathan to CCed)

Regards,
Boqun

> sense wake_up_process() is not exception:
> 
>   X = 1;
>   wmb();  // unless I am totally confused this just adds more confusion
>   Y = 1;
>   wake_up_process(TASK);
> 
> vs TASK doing
> 
>   for (;;) {
>   set_current_state(...);
>   if (Y)
>   break;
>   schedule();
>   }
> 
>   BUG_ON(X == 0)
> 
> is not correct, it can actually can hit the BUG_ON() above. However, if
> wake_up_process() actually wakes a sleeping TASK up, then it should also
> see X = 1. Even without wmb(), even if we do
> 
>   Y = 1;
>   X = 1;
>   wake_up_process(TASK);
> 

--->8
Subject: Documentation: call out conditional barriers of wait_*() and wake_up*()

The memory barriers in some sleep and wakeup functions are conditional,
there are several situations that there is no barriers:

1.  If wait_event() and co. actually don't prepare to sleep, there
may be no barrier in them.

2.  No matter whether a sleep occurs or not, there may be no barrier
between a successful wait-condition checking(the result of which
is true) in wait_event() and the following instructions.

3.  If wake_up() and co. actually wake up no one, there may be no
write barrier in them.

However, the current version of memory-barriers.txt doesn't call these
out. Further more, the example which wanted to explain that write
barriers in wake_up*() are conditional fails its job. To see that,
consider a similar example:

CPU 1   CPU 2
=== ===
X = 1;
smp_mb();
Y = 1;  wait_event(wq, Y == 1);
  load from Y sees 1, no memory barrier
smp_wmb();
wake_up();
load from X might see 0

CPU 2's load from X might still be 0 even if we add a write barrier
explicitly, which means that even if wake_up() guarantees a write
barrier, the original example still doesn't have the desired order
guarantee. In fact, the original example can explains the conditionality
of barriers in wait_*() rather than wake_up*().

This patch makes things clear, by calling out that the barriers in
wait_*() and wake_up*() are conditional and giving exact examples to
explain that.

Signed-off-by: Boqun Feng 
---

 Documentation/memory-barriers.txt | 81 +++
 1 file changed, 65 insertions(+), 16 deletions(-)

diff --git a/Documentation/memory-barriers.txt 
b/Documentation/memory-barriers.txt
index eafa6a5..df69841 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by:

 which therefore also imply a general memory barrier after setting the state.
 The whole sequence above is available in various canned forms, all of which
-interpolate the memory barrier in the right place:
+imply a general barrier if and only if a sleep is at least about to happen,
+i.e. prepare_to_wait*() is called.

wait_event();
wait_event_interruptible();
@@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place:
wait_on_bit();
wait_on_bit_lock();

+Further more, no barrier is guaranteed after the successful wait condition
+checkings, whose results are true, in wait_*() and before the instructions
+following wait_*().

 Secondly, code that performs a wake up normally follows something like this:

@@ -2009,21 +2013,6 @@ between the STORE to indicate the event and the STORE to 
set TASK_RUNNING:

Re: wake_up_process implied memory barrier clarification

2015-08-29 Thread Boqun Feng

Hi Oleg

On Fri, Aug 28, 2015 at 06:06:37PM +0200, Oleg Nesterov wrote:
 On 08/28, Michal Hocko wrote:
 
  On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
   On 08/27, Michal Hocko wrote:
   
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task 
state is cleared, and so sits
general barrier STORE current-state
LOAD event_indicated
   
+Please note that wake_up_process is an exception here because it 
implies
+the write memory barrier unconditionally.
+
  
   I simply can't understand (can't even parse) this part of 
   memory-barriers.txt.
 
  Do you mean the added text or the example above it?
 
 Both ;)
 
 but note that load from X might see 0 is true of course, and in this

By this, I think you actually means the example below the added text,
i.e. the example for to repeat.., right?

I think that's not a good example, and actually that example explains
that the barriers in -wait_event()- rather than in -wake_up()- are
conditional.

I wrote a patch to make things more clear, hope that helps.

(Add Paul and Jonathan to CCed)

Regards,
Boqun

 sense wake_up_process() is not exception:
 
   X = 1;
   wmb();  // unless I am totally confused this just adds more confusion
   Y = 1;
   wake_up_process(TASK);
 
 vs TASK doing
 
   for (;;) {
   set_current_state(...);
   if (Y)
   break;
   schedule();
   }
 
   BUG_ON(X == 0)
 
 is not correct, it can actually can hit the BUG_ON() above. However, if
 wake_up_process() actually wakes a sleeping TASK up, then it should also
 see X = 1. Even without wmb(), even if we do
 
   Y = 1;
   X = 1;
   wake_up_process(TASK);
 

---8
Subject: Documentation: call out conditional barriers of wait_*() and wake_up*()

The memory barriers in some sleep and wakeup functions are conditional,
there are several situations that there is no barriers:

1.  If wait_event() and co. actually don't prepare to sleep, there
may be no barrier in them.

2.  No matter whether a sleep occurs or not, there may be no barrier
between a successful wait-condition checking(the result of which
is true) in wait_event() and the following instructions.

3.  If wake_up() and co. actually wake up no one, there may be no
write barrier in them.

However, the current version of memory-barriers.txt doesn't call these
out. Further more, the example which wanted to explain that write
barriers in wake_up*() are conditional fails its job. To see that,
consider a similar example:

CPU 1   CPU 2
=== ===
X = 1;
smp_mb();
Y = 1;  wait_event(wq, Y == 1);
  load from Y sees 1, no memory barrier
smp_wmb();
wake_up();
load from X might see 0

CPU 2's load from X might still be 0 even if we add a write barrier
explicitly, which means that even if wake_up() guarantees a write
barrier, the original example still doesn't have the desired order
guarantee. In fact, the original example can explains the conditionality
of barriers in wait_*() rather than wake_up*().

This patch makes things clear, by calling out that the barriers in
wait_*() and wake_up*() are conditional and giving exact examples to
explain that.

Signed-off-by: Boqun Feng boqun.f...@gmail.com
---

 Documentation/memory-barriers.txt | 81 +++
 1 file changed, 65 insertions(+), 16 deletions(-)

diff --git a/Documentation/memory-barriers.txt 
b/Documentation/memory-barriers.txt
index eafa6a5..df69841 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by:
 
 which therefore also imply a general memory barrier after setting the state.
 The whole sequence above is available in various canned forms, all of which
-interpolate the memory barrier in the right place:
+imply a general barrier if and only if a sleep is at least about to happen,
+i.e. prepare_to_wait*() is called.
 
wait_event();
wait_event_interruptible();
@@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place:
wait_on_bit();
wait_on_bit_lock();
 
+Further more, no barrier is guaranteed after the successful wait condition
+checkings, whose results are true, in wait_*() and before the instructions
+following wait_*().
 
 Secondly, code that performs a wake up normally follows something like this:
 
@@ -2009,21 +2013,6 @@ between the STORE to indicate the event and the STORE to 
set TASK_RUNNING:
general barrier STORE current-state
LOAD event_indicated

Re: wake_up_process implied memory barrier clarification

2015-08-29 Thread Oleg Nesterov

Hello Boqun,

On 08/29, Boqun Feng wrote:
 Hi Oleg

 On Fri, Aug 28, 2015 at 06:06:37PM +0200, Oleg Nesterov wrote:
  On 08/28, Michal Hocko wrote:
  
   On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
On 08/27, Michal Hocko wrote:

 --- a/Documentation/memory-barriers.txt
 +++ b/Documentation/memory-barriers.txt
 @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the 
 task state is cleared, and so sits
   general barrier STORE current-state
   LOAD event_indicated

 +Please note that wake_up_process is an exception here because it 
 implies
 +the write memory barrier unconditionally.
 +
   
I simply can't understand (can't even parse) this part of 
memory-barriers.txt.
  
   Do you mean the added text or the example above it?
 
  Both ;)
 
  but note that load from X might see 0 is true of course, and in this

 By this, I think you actually means the example below the added text,
 i.e. the example for to repeat.., right?

And above. Even

The barrier occurs before the task state is cleared

is not actually right. This is misleading. What is really important is that
we have a barrier before we _read_ the task state. And again, again, the
fact that we actually have the write barrier is just the implementation
detail.

 Subject: Documentation: call out conditional barriers of wait_*() and 
 wake_up*()

 The memory barriers in some sleep and wakeup functions are conditional,
 there are several situations that there is no barriers:

 1.If wait_event() and co. actually don't prepare to sleep, there
   may be no barrier in them.

And thus (imo) we should not even try to document this. I mean, a user
should not make any assumptions about the barriers inside wait_event().

 2.No matter whether a sleep occurs or not, there may be no barrier
   between a successful wait-condition checking(the result of which
   is true) in wait_event() and the following instructions.

Yes, this is true in any case. Not sure this deserves addtionional
documentation, but see below.

 3.If wake_up() and co. actually wake up no one, there may be no
   write barrier in them.

See above.

 --- a/Documentation/memory-barriers.txt
 +++ b/Documentation/memory-barriers.txt
 @@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by:

  which therefore also imply a general memory barrier after setting the state.
  The whole sequence above is available in various canned forms, all of which
 -interpolate the memory barrier in the right place:
 +imply a general barrier if and only if a sleep is at least about to happen,
 +i.e. prepare_to_wait*() is called.

   wait_event();
   wait_event_interruptible();
 @@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place:
   wait_on_bit();
   wait_on_bit_lock();

 +Further more, no barrier is guaranteed after the successful wait condition
 +checkings, whose results are true, in wait_*() and before the instructions
 +following wait_*().

Yes, perhaps this makes sense, but (to me) only because the explanation above
looks a bit confusing to me. I simply can't understand why memory-barriers.txt
mentions that barrier implied by set_current_state(). You need to know this
if you want to understand how wait_event() works. You do not need to know
about this barrier if you want to use wait_event(). If nothing else, because
you can never rely on this barrier. Anf your examples below shows this.

 +To repeat, barriers in wait_event(), wake_up() and co. are conditional, 
 meaning
 +they are present if and only if a sleep or a wakeup actually occurs. To see
 +this, consider the following three examples.
 +
 +The first example is for the conditional barriers in wait_*(), say X and Y
 +are both initially zero:
 +
 + CPU 1   CPU 2
 + === ===
 + X = 1;
 + smp_mb();
 + wait_event(wq, Y == 1); Y = 1;
 +   load from Y sees 1wake_up();
 +   no memory barrier
 + load from X might see 0
 +
 +And even if a sleep and a wakeup really occurs, there might be no barrier
 +between the last load from Y(which the makes wait condition checking succeed)
 +and load from X, so that load from X might still see 0.
...

 +So that, to guarantee that CPU 1's load from X is 1 in the two examples 
 above,
 +a read barrier must be added after wait_event() in the code running on CPU 1.

Yes. So why should we try to document that barrier in wait_event() ?

 +The third example is for the conditional write barriers in wake_up*(), say
 +X, Y and Z are all initially zero:
 +
 + CPU 1   CPU 2
 + === ===
 + X = 1;
 + wait_event(wq, Y == 1); Y = 1;
 +   load from Y

Re: wake_up_process implied memory barrier clarification

2015-08-28 Thread Oleg Nesterov

On 08/28, Michal Hocko wrote:
>
> On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
> > On 08/27, Michal Hocko wrote:
> > >
> > > --- a/Documentation/memory-barriers.txt
> > > +++ b/Documentation/memory-barriers.txt
> > > @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task 
> > > state is cleared, and so sits
> > >STORE current->state
> > >   LOAD event_indicated
> > >
> > > +Please note that wake_up_process is an exception here because it implies
> > > +the write memory barrier unconditionally.
> > > +
> >
> > I simply can't understand (can't even parse) this part of 
> > memory-barriers.txt.
>
> Do you mean the added text or the example above it?

Both ;)

but note that "load from X might see 0" is true of course, and in this
sense wake_up_process() is not exception:

X = 1;
wmb();  // unless I am totally confused this just adds more confusion
Y = 1;
wake_up_process(TASK);

vs TASK doing

for (;;) {
set_current_state(...);
if (Y)
break;
schedule();
}

BUG_ON(X == 0)

is not correct, it can actually can hit the BUG_ON() above. However, if
wake_up_process() actually wakes a sleeping TASK up, then it should also
see X = 1. Even without wmb(), even if we do

Y = 1;
X = 1;
wake_up_process(TASK);

> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct 
> > > *p)
> > >   *
> > >   * Return: 1 if the process was woken up, 0 if it was already running.
> > >   *
> > > - * It may be assumed that this function implies a write memory barrier 
> > > before
> > > - * changing the task state if and only if any tasks are woken up.
> > > + * It may be assumed that this function implies a write memory barrier.
> > >   */
> >
> > I won't argue, technically this is correct of course. And I agree that
> > the old comment is misleading.
>
> Well the reason I've noticed is the following race in the scsi code
> CPU0CPU1
> scsi_error_handler  scsi_host_dev_release
>   kthread_stop()
>   while (!kthread_should_stop()) {
> set_bit(KTHREAD_SHOULD_STOP)
> wake_up_process()
> wait_for_completion()
>
> set_current_state(TASK_INTERRUPTIBLE)
> schedule()

Heh. This looks like a common mistake. See fecdf8be2d91e04b0a9a4f79ff06499 ;)

But I believe this is another thing.

> I have read the comment for wake_up_process and was wondering that
> moving set_current_state before kthread_should_stop wouldn't be enough
> because the the task at CPU0 might be TASK_RUNNIG and so wake_up_process
> wouldn't wake up it and the missing write barrier could lead to a missed
> KTHREAD_SHOULD_STOP.

And that is why try_to_wake_up()->smp_mb__before_spinlock() needs to
serialize STORE(CONDITION) and the subsequent LOAD(p->state). The fact
that it actually does wmb() is just implementation detail, that is what
I tried to say.

> > To me, this comment should just explain that this function implies a barrier
> > but only in a sense that you do not need another one after CONDITION = T and
> > before wake_up_process().
>
> I have no objection against more precise wording here but what we have is just
> misleading.

Yes, yes, I agree. Just I do not know what exactly it should document.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-28 Thread Michal Hocko

On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
> On 08/27, Michal Hocko wrote:
> >
> > --- a/Documentation/memory-barriers.txt
> > +++ b/Documentation/memory-barriers.txt
> > @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task 
> > state is cleared, and so sits
> >  STORE current->state
> > LOAD event_indicated
> >
> > +Please note that wake_up_process is an exception here because it implies
> > +the write memory barrier unconditionally.
> > +
> 
> I simply can't understand (can't even parse) this part of memory-barriers.txt.

Do you mean the added text or the example above it?

> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct 
> > *p)
> >   *
> >   * Return: 1 if the process was woken up, 0 if it was already running.
> >   *
> > - * It may be assumed that this function implies a write memory barrier 
> > before
> > - * changing the task state if and only if any tasks are woken up.
> > + * It may be assumed that this function implies a write memory barrier.
> >   */
> 
> I won't argue, technically this is correct of course. And I agree that
> the old comment is misleading.

Well the reason I've noticed is the following race in the scsi code
CPU0CPU1
scsi_error_handler  scsi_host_dev_release
  kthread_stop()
  while (!kthread_should_stop()) {
set_bit(KTHREAD_SHOULD_STOP)
wake_up_process()
wait_for_completion()

set_current_state(TASK_INTERRUPTIBLE)
schedule()
[...]
  }

I have read the comment for wake_up_process and was wondering that
moving set_current_state before kthread_should_stop wouldn't be enough
because the the task at CPU0 might be TASK_RUNNIG and so wake_up_process
wouldn't wake up it and the missing write barrier could lead to a missed
KTHREAD_SHOULD_STOP. A look into ttwu made my worry void.

> But the new comment looks as if it is fine to avoid wmb() if you do
> wake_up_process(). Say,
> 
>   void w(void)
>   {
>   A = 1;
>   wake_up_process(something_unrelated);
>   // we know that it implies wmb().
>   B = 1;
>   }
> 
>   void r(void)
>   {
>   int a, b;
> 
>   b = B;
>   rmb();
>   a = A;
> 
>   BUG_ON(b && !a);
>   }
> 
> Perhaps this part of the comment should be simply removed, the unconditional
> wmb() in ttwu() is just implementation detail. And note that initially the
> documented behaviour of smp_mb__before_spinlock() was only the STORE - LOAD
> serialization. Then people noticed that it actually does wmb() and started
> to rely on this fact.
> 
> To me, this comment should just explain that this function implies a barrier
> but only in a sense that you do not need another one after CONDITION = T and
> before wake_up_process().

I have no objection against more precise wording here but what we have is just
misleading.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-28 Thread Michal Hocko

On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
 On 08/27, Michal Hocko wrote:
 
  --- a/Documentation/memory-barriers.txt
  +++ b/Documentation/memory-barriers.txt
  @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task 
  state is cleared, and so sits
  general barrier STORE current-state
  LOAD event_indicated
 
  +Please note that wake_up_process is an exception here because it implies
  +the write memory barrier unconditionally.
  +
 
 I simply can't understand (can't even parse) this part of memory-barriers.txt.

Do you mean the added text or the example above it?

  --- a/kernel/sched/core.c
  +++ b/kernel/sched/core.c
  @@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct 
  *p)
*
* Return: 1 if the process was woken up, 0 if it was already running.
*
  - * It may be assumed that this function implies a write memory barrier 
  before
  - * changing the task state if and only if any tasks are woken up.
  + * It may be assumed that this function implies a write memory barrier.
*/
 
 I won't argue, technically this is correct of course. And I agree that
 the old comment is misleading.

Well the reason I've noticed is the following race in the scsi code
CPU0CPU1
scsi_error_handler  scsi_host_dev_release
  kthread_stop()
  while (!kthread_should_stop()) {
set_bit(KTHREAD_SHOULD_STOP)
wake_up_process()
wait_for_completion()

set_current_state(TASK_INTERRUPTIBLE)
schedule()
[...]
  }

I have read the comment for wake_up_process and was wondering that
moving set_current_state before kthread_should_stop wouldn't be enough
because the the task at CPU0 might be TASK_RUNNIG and so wake_up_process
wouldn't wake up it and the missing write barrier could lead to a missed
KTHREAD_SHOULD_STOP. A look into ttwu made my worry void.

 But the new comment looks as if it is fine to avoid wmb() if you do
 wake_up_process(). Say,
 
   void w(void)
   {
   A = 1;
   wake_up_process(something_unrelated);
   // we know that it implies wmb().
   B = 1;
   }
 
   void r(void)
   {
   int a, b;
 
   b = B;
   rmb();
   a = A;
 
   BUG_ON(b  !a);
   }
 
 Perhaps this part of the comment should be simply removed, the unconditional
 wmb() in ttwu() is just implementation detail. And note that initially the
 documented behaviour of smp_mb__before_spinlock() was only the STORE - LOAD
 serialization. Then people noticed that it actually does wmb() and started
 to rely on this fact.
 
 To me, this comment should just explain that this function implies a barrier
 but only in a sense that you do not need another one after CONDITION = T and
 before wake_up_process().

I have no objection against more precise wording here but what we have is just
misleading.
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-28 Thread Oleg Nesterov

On 08/28, Michal Hocko wrote:

 On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
  On 08/27, Michal Hocko wrote:
  
   --- a/Documentation/memory-barriers.txt
   +++ b/Documentation/memory-barriers.txt
   @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task 
   state is cleared, and so sits
 general barrier STORE current-state
 LOAD event_indicated
  
   +Please note that wake_up_process is an exception here because it implies
   +the write memory barrier unconditionally.
   +
 
  I simply can't understand (can't even parse) this part of 
  memory-barriers.txt.

 Do you mean the added text or the example above it?

Both ;)

but note that load from X might see 0 is true of course, and in this
sense wake_up_process() is not exception:

X = 1;
wmb();  // unless I am totally confused this just adds more confusion
Y = 1;
wake_up_process(TASK);

vs TASK doing

for (;;) {
set_current_state(...);
if (Y)
break;
schedule();
}

BUG_ON(X == 0)

is not correct, it can actually can hit the BUG_ON() above. However, if
wake_up_process() actually wakes a sleeping TASK up, then it should also
see X = 1. Even without wmb(), even if we do

Y = 1;
X = 1;
wake_up_process(TASK);

   --- a/kernel/sched/core.c
   +++ b/kernel/sched/core.c
   @@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct 
   *p)
 *
 * Return: 1 if the process was woken up, 0 if it was already running.
 *
   - * It may be assumed that this function implies a write memory barrier 
   before
   - * changing the task state if and only if any tasks are woken up.
   + * It may be assumed that this function implies a write memory barrier.
 */
 
  I won't argue, technically this is correct of course. And I agree that
  the old comment is misleading.

 Well the reason I've noticed is the following race in the scsi code
 CPU0CPU1
 scsi_error_handler  scsi_host_dev_release
   kthread_stop()
   while (!kthread_should_stop()) {
 set_bit(KTHREAD_SHOULD_STOP)
 wake_up_process()
 wait_for_completion()

 set_current_state(TASK_INTERRUPTIBLE)
 schedule()

Heh. This looks like a common mistake. See fecdf8be2d91e04b0a9a4f79ff06499 ;)

But I believe this is another thing.

 I have read the comment for wake_up_process and was wondering that
 moving set_current_state before kthread_should_stop wouldn't be enough
 because the the task at CPU0 might be TASK_RUNNIG and so wake_up_process
 wouldn't wake up it and the missing write barrier could lead to a missed
 KTHREAD_SHOULD_STOP.

And that is why try_to_wake_up()-smp_mb__before_spinlock() needs to
serialize STORE(CONDITION) and the subsequent LOAD(p-state). The fact
that it actually does wmb() is just implementation detail, that is what
I tried to say.

  To me, this comment should just explain that this function implies a barrier
  but only in a sense that you do not need another one after CONDITION = T and
  before wake_up_process().

 I have no objection against more precise wording here but what we have is just
 misleading.

Yes, yes, I agree. Just I do not know what exactly it should document.

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-27 Thread Oleg Nesterov

On 08/27, Michal Hocko wrote:
>
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task state 
> is cleared, and so sits
>STORE current->state
>   LOAD event_indicated
>
> +Please note that wake_up_process is an exception here because it implies
> +the write memory barrier unconditionally.
> +

I simply can't understand (can't even parse) this part of memory-barriers.txt.

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct *p)
>   *
>   * Return: 1 if the process was woken up, 0 if it was already running.
>   *
> - * It may be assumed that this function implies a write memory barrier before
> - * changing the task state if and only if any tasks are woken up.
> + * It may be assumed that this function implies a write memory barrier.
>   */

I won't argue, technically this is correct of course. And I agree that
the old comment is misleading.

But the new comment looks as if it is fine to avoid wmb() if you do
wake_up_process(). Say,

void w(void)
{
A = 1;
wake_up_process(something_unrelated);
// we know that it implies wmb().
B = 1;
}

void r(void)
{
int a, b;

b = B;
rmb();
a = A;

BUG_ON(b && !a);
}

Perhaps this part of the comment should be simply removed, the unconditional
wmb() in ttwu() is just implementation detail. And note that initially the
documented behaviour of smp_mb__before_spinlock() was only the STORE - LOAD
serialization. Then people noticed that it actually does wmb() and started
to rely on this fact.

To me, this comment should just explain that this function implies a barrier
but only in a sense that you do not need another one after CONDITION = T and
before wake_up_process().

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-27 Thread Michal Hocko

On Thu 27-08-15 14:43:34, Peter Zijlstra wrote:
> On Thu, Aug 27, 2015 at 02:27:27PM +0200, Michal Hocko wrote:
> > Hi,
> > I have just stumbled over the comment above wake_up_process which
> > claims:
> > "
> >  * It may be assumed that this function implies a write memory barrier 
> > before
> >  * changing the task state if and only if any tasks are woken up.
> > "
> > 
> > but try_to_wake_up does smp_mb__before_spinlock and did smp_wmb
> > since 04e2f1741d235 unconditionally. The comment was added when the
> > smp_wmb was in place already so I am wondering whether the comment is
> > wrong/misleading.
> > 
> > Could somebody clarify please?
> 
> Its true for wake_up(), since that bails early if the waitqueue list is
> empty.
> 
> I suspect there was no exception made for wake_up_process() to simplify
> the rules.

Thanks for the confirmation. Shouldn't we rather change the
documentation because this is clearly misleading and confusing.
--- 
>From b70d9a384cfd018e686c0aca06e830f564a34dd9 Mon Sep 17 00:00:00 2001
From: Michal Hocko 
Date: Thu, 27 Aug 2015 15:10:55 +0200
Subject: [PATCH] sched: Clarigy wake_up_process memory barrier semantic

wake_up_process unlike other wake up primitives based on __wake_up
implies the write memory barrier unconditionally because it relies
on try_to_wake_up directly.

Clarify this in the function comment and memory-barriers.txt because the
current doc is quite misleading.

Signed-off-by: Michal Hocko 
---
 Documentation/memory-barriers.txt | 3 +++
 kernel/sched/core.c   | 3 +--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/memory-barriers.txt 
b/Documentation/memory-barriers.txt
index 13feb697271f..c4f180caf0ff 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task state 
is cleared, and so sits
 STORE current->state
LOAD event_indicated
 
+Please note that wake_up_process is an exception here because it implies
+the write memory barrier unconditionally.
+
 To repeat, this write memory barrier is present if and only if something
 is actually awakened.  To see this, consider the following sequence of
 events, where X and Y are both initially zero:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78b4bad10081..39583b76ad2c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct *p)
  *
  * Return: 1 if the process was woken up, 0 if it was already running.
  *
- * It may be assumed that this function implies a write memory barrier before
- * changing the task state if and only if any tasks are woken up.
+ * It may be assumed that this function implies a write memory barrier.
  */
 int wake_up_process(struct task_struct *p)
 {
-- 
2.5.0


-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-27 Thread Peter Zijlstra

On Thu, Aug 27, 2015 at 02:27:27PM +0200, Michal Hocko wrote:
> Hi,
> I have just stumbled over the comment above wake_up_process which
> claims:
> "
>  * It may be assumed that this function implies a write memory barrier before
>  * changing the task state if and only if any tasks are woken up.
> "
> 
> but try_to_wake_up does smp_mb__before_spinlock and did smp_wmb
> since 04e2f1741d235 unconditionally. The comment was added when the
> smp_wmb was in place already so I am wondering whether the comment is
> wrong/misleading.
> 
> Could somebody clarify please?

Its true for wake_up(), since that bails early if the waitqueue list is
empty.

I suspect there was no exception made for wake_up_process() to simplify
the rules.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

wake_up_process implied memory barrier clarification

2015-08-27 Thread Michal Hocko

Hi,
I have just stumbled over the comment above wake_up_process which
claims:
"
 * It may be assumed that this function implies a write memory barrier before
 * changing the task state if and only if any tasks are woken up.
"

but try_to_wake_up does smp_mb__before_spinlock and did smp_wmb
since 04e2f1741d235 unconditionally. The comment was added when the
smp_wmb was in place already so I am wondering whether the comment is
wrong/misleading.

Could somebody clarify please?
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

wake_up_process implied memory barrier clarification

2015-08-27 Thread Michal Hocko

Hi,
I have just stumbled over the comment above wake_up_process which
claims:

 * It may be assumed that this function implies a write memory barrier before
 * changing the task state if and only if any tasks are woken up.


but try_to_wake_up does smp_mb__before_spinlock and did smp_wmb
since 04e2f1741d235 unconditionally. The comment was added when the
smp_wmb was in place already so I am wondering whether the comment is
wrong/misleading.

Could somebody clarify please?
-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-27 Thread Michal Hocko

On Thu 27-08-15 14:43:34, Peter Zijlstra wrote:
 On Thu, Aug 27, 2015 at 02:27:27PM +0200, Michal Hocko wrote:
  Hi,
  I have just stumbled over the comment above wake_up_process which
  claims:
  
   * It may be assumed that this function implies a write memory barrier 
  before
   * changing the task state if and only if any tasks are woken up.
  
  
  but try_to_wake_up does smp_mb__before_spinlock and did smp_wmb
  since 04e2f1741d235 unconditionally. The comment was added when the
  smp_wmb was in place already so I am wondering whether the comment is
  wrong/misleading.
  
  Could somebody clarify please?
 
 Its true for wake_up(), since that bails early if the waitqueue list is
 empty.
 
 I suspect there was no exception made for wake_up_process() to simplify
 the rules.

Thanks for the confirmation. Shouldn't we rather change the
documentation because this is clearly misleading and confusing.
--- 
From b70d9a384cfd018e686c0aca06e830f564a34dd9 Mon Sep 17 00:00:00 2001
From: Michal Hocko mho...@suse.com
Date: Thu, 27 Aug 2015 15:10:55 +0200
Subject: [PATCH] sched: Clarigy wake_up_process memory barrier semantic

wake_up_process unlike other wake up primitives based on __wake_up
implies the write memory barrier unconditionally because it relies
on try_to_wake_up directly.

Clarify this in the function comment and memory-barriers.txt because the
current doc is quite misleading.

Signed-off-by: Michal Hocko mho...@suse.com
---
 Documentation/memory-barriers.txt | 3 +++
 kernel/sched/core.c   | 3 +--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/memory-barriers.txt 
b/Documentation/memory-barriers.txt
index 13feb697271f..c4f180caf0ff 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task state 
is cleared, and so sits
general barrier STORE current-state
LOAD event_indicated
 
+Please note that wake_up_process is an exception here because it implies
+the write memory barrier unconditionally.
+
 To repeat, this write memory barrier is present if and only if something
 is actually awakened.  To see this, consider the following sequence of
 events, where X and Y are both initially zero:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78b4bad10081..39583b76ad2c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct *p)
  *
  * Return: 1 if the process was woken up, 0 if it was already running.
  *
- * It may be assumed that this function implies a write memory barrier before
- * changing the task state if and only if any tasks are woken up.
+ * It may be assumed that this function implies a write memory barrier.
  */
 int wake_up_process(struct task_struct *p)
 {
-- 
2.5.0


-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-27 Thread Peter Zijlstra

On Thu, Aug 27, 2015 at 02:27:27PM +0200, Michal Hocko wrote:
 Hi,
 I have just stumbled over the comment above wake_up_process which
 claims:
 
  * It may be assumed that this function implies a write memory barrier before
  * changing the task state if and only if any tasks are woken up.
 
 
 but try_to_wake_up does smp_mb__before_spinlock and did smp_wmb
 since 04e2f1741d235 unconditionally. The comment was added when the
 smp_wmb was in place already so I am wondering whether the comment is
 wrong/misleading.
 
 Could somebody clarify please?

Its true for wake_up(), since that bails early if the waitqueue list is
empty.

I suspect there was no exception made for wake_up_process() to simplify
the rules.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: wake_up_process implied memory barrier clarification

2015-08-27 Thread Oleg Nesterov

On 08/27, Michal Hocko wrote:

 --- a/Documentation/memory-barriers.txt
 +++ b/Documentation/memory-barriers.txt
 @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task state 
 is cleared, and so sits
   general barrier STORE current-state
   LOAD event_indicated

 +Please note that wake_up_process is an exception here because it implies
 +the write memory barrier unconditionally.
 +

I simply can't understand (can't even parse) this part of memory-barriers.txt.

 --- a/kernel/sched/core.c
 +++ b/kernel/sched/core.c
 @@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct *p)
   *
   * Return: 1 if the process was woken up, 0 if it was already running.
   *
 - * It may be assumed that this function implies a write memory barrier before
 - * changing the task state if and only if any tasks are woken up.
 + * It may be assumed that this function implies a write memory barrier.
   */

I won't argue, technically this is correct of course. And I agree that
the old comment is misleading.



But the new comment looks as if it is fine to avoid wmb() if you do
wake_up_process(). Say,

void w(void)
{
A = 1;
wake_up_process(something_unrelated);
// we know that it implies wmb().
B = 1;
}

void r(void)
{
int a, b;

b = B;
rmb();
a = A;

BUG_ON(b  !a);
}

Perhaps this part of the comment should be simply removed, the unconditional
wmb() in ttwu() is just implementation detail. And note that initially the
documented behaviour of smp_mb__before_spinlock() was only the STORE - LOAD
serialization. Then people noticed that it actually does wmb() and started
to rely on this fact.

To me, this comment should just explain that this function implies a barrier
but only in a sense that you do not need another one after CONDITION = T and
before wake_up_process().

Oleg.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

40 matches

Mail list logo