Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-23 Thread Boqun Feng
On Sat, Aug 19, 2017 at 03:34:01PM +0200, Arnd Bergmann wrote:
> On Sat, Aug 19, 2017 at 2:51 PM, Arnd Bergmann  wrote:
> 
> >> --- a/include/linux/completion.h
> >> +++ b/include/linux/completion.h
> >> @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct 
> >> completion *x) {}
> >>  #endif
> >>
> >>  #define COMPLETION_INITIALIZER_ONSTACK(work) \
> >> -   ({ init_completion(); work; })
> >> +   (*({ init_completion();  }))
> >>
> >>  /**
> >>   * DECLARE_COMPLETION - declare and initialize a completion structure
> >
> > Nice hack. Any idea why that's different to the compiler?
> >

So I find this link:

https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html

it says:

"In G++, the result value of a statement expression undergoes array and
function pointer decay, and is returned by value to the enclosing
expression. "

I think this is why the temporary variable is constructed(or at least
allocated). Lemme put this in my commit log.

> > I've applied that one to my test tree now, and reverted my own patch,
> > will let you know if anything else shows up. I think we probably want
> > to merge both patches to mainline.
> 
> There is apparently one user of COMPLETION_INITIALIZER_ONSTACK
> that causes a regression with the patch above:
> 
> drivers/acpi/nfit/core.c: In function 'acpi_nfit_flush_probe':
> include/linux/completion.h:77:3: error: value computed is not used
> [-Werror=unused-value]
>   (*({ init_completion();  }))
> 
> It would be trivial to convert to init_completion(), which seems to be
> what was intended there.
> 

Thanks. Will send the conversion as a separate patch along with my
patch.

Regards,
Boqun

> Arnd


signature.asc
Description: PGP signature


Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-23 Thread Boqun Feng
On Sat, Aug 19, 2017 at 03:34:01PM +0200, Arnd Bergmann wrote:
> On Sat, Aug 19, 2017 at 2:51 PM, Arnd Bergmann  wrote:
> 
> >> --- a/include/linux/completion.h
> >> +++ b/include/linux/completion.h
> >> @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct 
> >> completion *x) {}
> >>  #endif
> >>
> >>  #define COMPLETION_INITIALIZER_ONSTACK(work) \
> >> -   ({ init_completion(); work; })
> >> +   (*({ init_completion();  }))
> >>
> >>  /**
> >>   * DECLARE_COMPLETION - declare and initialize a completion structure
> >
> > Nice hack. Any idea why that's different to the compiler?
> >

So I find this link:

https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html

it says:

"In G++, the result value of a statement expression undergoes array and
function pointer decay, and is returned by value to the enclosing
expression. "

I think this is why the temporary variable is constructed(or at least
allocated). Lemme put this in my commit log.

> > I've applied that one to my test tree now, and reverted my own patch,
> > will let you know if anything else shows up. I think we probably want
> > to merge both patches to mainline.
> 
> There is apparently one user of COMPLETION_INITIALIZER_ONSTACK
> that causes a regression with the patch above:
> 
> drivers/acpi/nfit/core.c: In function 'acpi_nfit_flush_probe':
> include/linux/completion.h:77:3: error: value computed is not used
> [-Werror=unused-value]
>   (*({ init_completion();  }))
> 
> It would be trivial to convert to init_completion(), which seems to be
> what was intended there.
> 

Thanks. Will send the conversion as a separate patch along with my
patch.

Regards,
Boqun

> Arnd


signature.asc
Description: PGP signature


Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-19 Thread Boqun Feng
On Sat, Aug 19, 2017 at 02:51:17PM +0200, Arnd Bergmann wrote:
[...]
> > Those two "rep movsq"s are very suspicious, because
> > COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place,
> > rather than move it to some temporary variable and copy it back.
> 
> Right. I've seen this behavior before when using c99 compound
> literals, but I was surprised to see it here.
> 
> I also submitted a patch for the one driver that turned up a new
> warning because of this behavior:
> 
> https://www.spinics.net/lists/raid/msg58766.html
> 

This solution also came up into my mind but then I found there are
several callsites of COMPLETION_INITIALIZER_ONSTACK(), so I then tried
to find a way to fix the macro itself. But your patch looks good to me
;-)

> In case of the mmc driver, the behavior was as expected, it was
> just a little too large and I sent the obvious workaround for it
> 
> https://patchwork.kernel.org/patch/9902063/
> 

Yep.

> > I tried to reduce the size of completion struct, and the "rep movsq" did
> > go away, however it seemed the compiler still allocated the memory for
> > the temporary variables on the stack, because whenever I
> > increased/decreased  the size of completion, the stack size of
> > write_journal() got increased/decreased *7* times, but there are only
> > 3 journal_completion structures in write_journal(). So the *4* callsites
> > of COMPLETION_INITIALIZER_ONSTACK() looked very suspicous.
> >
> > So I come up with the following patch, trying to teach the compiler not
> > to do the unnecessary allocation, could you give it a try?
> >
> > Besides, I could also observe the stack size reduction of
> > write_journal() even for !LOCKDEP kernel.
> 
> Ok.
> 
> > ---
> > Reported-by: Arnd Bergmann 
> > Signed-off-by: Boqun Feng 
> > ---
> >  include/linux/completion.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/linux/completion.h b/include/linux/completion.h
> > index 791f053f28b7..cae5400022a3 100644
> > --- a/include/linux/completion.h
> > +++ b/include/linux/completion.h
> > @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct 
> > completion *x) {}
> >  #endif
> >
> >  #define COMPLETION_INITIALIZER_ONSTACK(work) \
> > -   ({ init_completion(); work; })
> > +   (*({ init_completion();  }))
> >
> >  /**
> >   * DECLARE_COMPLETION - declare and initialize a completion structure
> 
> Nice hack. Any idea why that's different to the compiler?
> 

So *I think* the block {init_completion(); } now will return
a pointer rather than a whole structure, and a pointer could fit in a
register, so the compiler won't bother to allocate the memory for it.

> I've applied that one to my test tree now, and reverted my own patch,
> will let you know if anything else shows up. I think we probably want

Thanks ;-)

> to merge both patches to mainline.
> 

Agreed! Unless we want to remove COMPLETION_INITIALIZER_ONSTACK() for
some reason, then my patch is not needed.

Regards,
Boqun

>   Arnd


signature.asc
Description: PGP signature


Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-19 Thread Boqun Feng
On Sat, Aug 19, 2017 at 02:51:17PM +0200, Arnd Bergmann wrote:
[...]
> > Those two "rep movsq"s are very suspicious, because
> > COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place,
> > rather than move it to some temporary variable and copy it back.
> 
> Right. I've seen this behavior before when using c99 compound
> literals, but I was surprised to see it here.
> 
> I also submitted a patch for the one driver that turned up a new
> warning because of this behavior:
> 
> https://www.spinics.net/lists/raid/msg58766.html
> 

This solution also came up into my mind but then I found there are
several callsites of COMPLETION_INITIALIZER_ONSTACK(), so I then tried
to find a way to fix the macro itself. But your patch looks good to me
;-)

> In case of the mmc driver, the behavior was as expected, it was
> just a little too large and I sent the obvious workaround for it
> 
> https://patchwork.kernel.org/patch/9902063/
> 

Yep.

> > I tried to reduce the size of completion struct, and the "rep movsq" did
> > go away, however it seemed the compiler still allocated the memory for
> > the temporary variables on the stack, because whenever I
> > increased/decreased  the size of completion, the stack size of
> > write_journal() got increased/decreased *7* times, but there are only
> > 3 journal_completion structures in write_journal(). So the *4* callsites
> > of COMPLETION_INITIALIZER_ONSTACK() looked very suspicous.
> >
> > So I come up with the following patch, trying to teach the compiler not
> > to do the unnecessary allocation, could you give it a try?
> >
> > Besides, I could also observe the stack size reduction of
> > write_journal() even for !LOCKDEP kernel.
> 
> Ok.
> 
> > ---
> > Reported-by: Arnd Bergmann 
> > Signed-off-by: Boqun Feng 
> > ---
> >  include/linux/completion.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/linux/completion.h b/include/linux/completion.h
> > index 791f053f28b7..cae5400022a3 100644
> > --- a/include/linux/completion.h
> > +++ b/include/linux/completion.h
> > @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct 
> > completion *x) {}
> >  #endif
> >
> >  #define COMPLETION_INITIALIZER_ONSTACK(work) \
> > -   ({ init_completion(); work; })
> > +   (*({ init_completion();  }))
> >
> >  /**
> >   * DECLARE_COMPLETION - declare and initialize a completion structure
> 
> Nice hack. Any idea why that's different to the compiler?
> 

So *I think* the block {init_completion(); } now will return
a pointer rather than a whole structure, and a pointer could fit in a
register, so the compiler won't bother to allocate the memory for it.

> I've applied that one to my test tree now, and reverted my own patch,
> will let you know if anything else shows up. I think we probably want

Thanks ;-)

> to merge both patches to mainline.
> 

Agreed! Unless we want to remove COMPLETION_INITIALIZER_ONSTACK() for
some reason, then my patch is not needed.

Regards,
Boqun

>   Arnd


signature.asc
Description: PGP signature


Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-19 Thread Arnd Bergmann
On Sat, Aug 19, 2017 at 2:51 PM, Arnd Bergmann  wrote:

>> --- a/include/linux/completion.h
>> +++ b/include/linux/completion.h
>> @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct 
>> completion *x) {}
>>  #endif
>>
>>  #define COMPLETION_INITIALIZER_ONSTACK(work) \
>> -   ({ init_completion(); work; })
>> +   (*({ init_completion();  }))
>>
>>  /**
>>   * DECLARE_COMPLETION - declare and initialize a completion structure
>
> Nice hack. Any idea why that's different to the compiler?
>
> I've applied that one to my test tree now, and reverted my own patch,
> will let you know if anything else shows up. I think we probably want
> to merge both patches to mainline.

There is apparently one user of COMPLETION_INITIALIZER_ONSTACK
that causes a regression with the patch above:

drivers/acpi/nfit/core.c: In function 'acpi_nfit_flush_probe':
include/linux/completion.h:77:3: error: value computed is not used
[-Werror=unused-value]
  (*({ init_completion();  }))

It would be trivial to convert to init_completion(), which seems to be
what was intended there.

Arnd


Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-19 Thread Arnd Bergmann
On Sat, Aug 19, 2017 at 2:51 PM, Arnd Bergmann  wrote:

>> --- a/include/linux/completion.h
>> +++ b/include/linux/completion.h
>> @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct 
>> completion *x) {}
>>  #endif
>>
>>  #define COMPLETION_INITIALIZER_ONSTACK(work) \
>> -   ({ init_completion(); work; })
>> +   (*({ init_completion();  }))
>>
>>  /**
>>   * DECLARE_COMPLETION - declare and initialize a completion structure
>
> Nice hack. Any idea why that's different to the compiler?
>
> I've applied that one to my test tree now, and reverted my own patch,
> will let you know if anything else shows up. I think we probably want
> to merge both patches to mainline.

There is apparently one user of COMPLETION_INITIALIZER_ONSTACK
that causes a regression with the patch above:

drivers/acpi/nfit/core.c: In function 'acpi_nfit_flush_probe':
include/linux/completion.h:77:3: error: value computed is not used
[-Werror=unused-value]
  (*({ init_completion();  }))

It would be trivial to convert to init_completion(), which seems to be
what was intended there.

Arnd


Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-19 Thread Arnd Bergmann
On Sat, Aug 19, 2017 at 1:43 AM, Boqun Feng  wrote:
> Hi Arnd,
>
> On Mon, Aug 14, 2017 at 10:50:24AM +0200, Arnd Bergmann wrote:
>> On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Park  
>> wrote:
>> > Although wait_for_completion() and its family can cause deadlock, the
>> > lock correctness validator could not be applied to them until now,
>> > because things like complete() are usually called in a different context
>> > from the waiting context, which violates lockdep's assumption.
>> >
>> > Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
>> > detector to those completion operations. Applied it.
>> >
>> > Signed-off-by: Byungchul Park 
>>
>> This patch introduced a significant growth in kernel stack usage for a small
>> set of functions. I see two new warnings for functions that get tipped over 
>> the
>> 1024 or 2048 byte frame size limit in linux-next (with a few other patches
>> applied):
>>
>> Before:
>>
>> drivers/md/dm-integrity.c: In function 'write_journal':
>> drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is
>> larger than xxx bytes [-Werror=frame-larger-than=]
>> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
>> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes
>> is larger than 104 bytes [-Werror=frame-larger-than=]
>>
>> After:
>>
>> drivers/md/dm-integrity.c: In function 'write_journal':
>> drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes
>> is larger than 1024 bytes [-Werror=frame-larger-than=]
>> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
>> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072
>> bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
>>
>> I have not checked in detail why this happens, but I'm guessing that
>> there is an overall increase in stack usage with
>> CONFIG_LOCKDEP_COMPLETE in functions using completions,
>> and I think it would be good to try to come up with a version that doesn't
>> add as much.
>>
>
> So I have been staring at this for a while, and below is what I found:
>
> (BTW, Arnd, may I know your compiler version? Mine is 7.1.1)

That is what I used as well, on x86, arm32 and arm64.

> In write_journal(), I can see the code generated like this on x86:
>
> io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
> 2462:   e8 00 00 00 00  callq  2467 
> 2467:   48 8d 85 80 fd ff fflea-0x280(%rbp),%rax
> 246e:   48 c7 c6 00 00 00 00mov$0x0,%rsi
> 2475:   48 c7 c2 00 00 00 00mov$0x0,%rdx
> x->done = 0;
> 247c:   c7 85 90 fd ff ff 00movl   $0x0,-0x270(%rbp)
> 2483:   00 00 00
> init_waitqueue_head(>wait);
> 2486:   48 8d 78 18 lea0x18(%rax),%rdi
> 248a:   e8 00 00 00 00  callq  248f 
> if (commit_start + commit_sections <= ic->journal_sections) {
> 248f:   41 8b 87 a8 00 00 00mov0xa8(%r15),%eax
> io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
> 2496:   48 8d bd e8 f9 ff fflea-0x618(%rbp),%rdi
> 249d:   48 8d b5 90 fd ff fflea-0x270(%rbp),%rsi
> 24a4:   b9 17 00 00 00  mov$0x17,%ecx
> 24a9:   f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi)
> if (commit_start + commit_sections <= ic->journal_sections) {
> 24ac:   41 39 c6cmp%eax,%r14d
> io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
> 24af:   48 8d bd 90 fd ff fflea-0x270(%rbp),%rdi
> 24b6:   48 8d b5 e8 f9 ff fflea-0x618(%rbp),%rsi
> 24bd:   b9 17 00 00 00  mov$0x17,%ecx
> 24c2:   f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi)
>
> Those two "rep movsq"s are very suspicious, because
> COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place,
> rather than move it to some temporary variable and copy it back.

Right. I've seen this behavior before when using c99 compound
literals, but I was surprised to see it here.

I also submitted a patch for the one driver that turned up a new
warning because of this behavior:

https://www.spinics.net/lists/raid/msg58766.html

In case of the mmc driver, the behavior was as expected, it was
just a little too large and I sent the obvious workaround for it

https://patchwork.kernel.org/patch/9902063/

> I tried to reduce the size of completion struct, and the "rep movsq" did
> go away, however it seemed the compiler still allocated the memory for
> the temporary variables on the stack, because whenever I
> increased/decreased  the size of completion, the stack size of
> write_journal() got increased/decreased *7* times, but there are only
> 3 journal_completion structures in write_journal(). So the *4* callsites
> of 

Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-19 Thread Arnd Bergmann
On Sat, Aug 19, 2017 at 1:43 AM, Boqun Feng  wrote:
> Hi Arnd,
>
> On Mon, Aug 14, 2017 at 10:50:24AM +0200, Arnd Bergmann wrote:
>> On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Park  
>> wrote:
>> > Although wait_for_completion() and its family can cause deadlock, the
>> > lock correctness validator could not be applied to them until now,
>> > because things like complete() are usually called in a different context
>> > from the waiting context, which violates lockdep's assumption.
>> >
>> > Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
>> > detector to those completion operations. Applied it.
>> >
>> > Signed-off-by: Byungchul Park 
>>
>> This patch introduced a significant growth in kernel stack usage for a small
>> set of functions. I see two new warnings for functions that get tipped over 
>> the
>> 1024 or 2048 byte frame size limit in linux-next (with a few other patches
>> applied):
>>
>> Before:
>>
>> drivers/md/dm-integrity.c: In function 'write_journal':
>> drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is
>> larger than xxx bytes [-Werror=frame-larger-than=]
>> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
>> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes
>> is larger than 104 bytes [-Werror=frame-larger-than=]
>>
>> After:
>>
>> drivers/md/dm-integrity.c: In function 'write_journal':
>> drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes
>> is larger than 1024 bytes [-Werror=frame-larger-than=]
>> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
>> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072
>> bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
>>
>> I have not checked in detail why this happens, but I'm guessing that
>> there is an overall increase in stack usage with
>> CONFIG_LOCKDEP_COMPLETE in functions using completions,
>> and I think it would be good to try to come up with a version that doesn't
>> add as much.
>>
>
> So I have been staring at this for a while, and below is what I found:
>
> (BTW, Arnd, may I know your compiler version? Mine is 7.1.1)

That is what I used as well, on x86, arm32 and arm64.

> In write_journal(), I can see the code generated like this on x86:
>
> io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
> 2462:   e8 00 00 00 00  callq  2467 
> 2467:   48 8d 85 80 fd ff fflea-0x280(%rbp),%rax
> 246e:   48 c7 c6 00 00 00 00mov$0x0,%rsi
> 2475:   48 c7 c2 00 00 00 00mov$0x0,%rdx
> x->done = 0;
> 247c:   c7 85 90 fd ff ff 00movl   $0x0,-0x270(%rbp)
> 2483:   00 00 00
> init_waitqueue_head(>wait);
> 2486:   48 8d 78 18 lea0x18(%rax),%rdi
> 248a:   e8 00 00 00 00  callq  248f 
> if (commit_start + commit_sections <= ic->journal_sections) {
> 248f:   41 8b 87 a8 00 00 00mov0xa8(%r15),%eax
> io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
> 2496:   48 8d bd e8 f9 ff fflea-0x618(%rbp),%rdi
> 249d:   48 8d b5 90 fd ff fflea-0x270(%rbp),%rsi
> 24a4:   b9 17 00 00 00  mov$0x17,%ecx
> 24a9:   f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi)
> if (commit_start + commit_sections <= ic->journal_sections) {
> 24ac:   41 39 c6cmp%eax,%r14d
> io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
> 24af:   48 8d bd 90 fd ff fflea-0x270(%rbp),%rdi
> 24b6:   48 8d b5 e8 f9 ff fflea-0x618(%rbp),%rsi
> 24bd:   b9 17 00 00 00  mov$0x17,%ecx
> 24c2:   f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi)
>
> Those two "rep movsq"s are very suspicious, because
> COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place,
> rather than move it to some temporary variable and copy it back.

Right. I've seen this behavior before when using c99 compound
literals, but I was surprised to see it here.

I also submitted a patch for the one driver that turned up a new
warning because of this behavior:

https://www.spinics.net/lists/raid/msg58766.html

In case of the mmc driver, the behavior was as expected, it was
just a little too large and I sent the obvious workaround for it

https://patchwork.kernel.org/patch/9902063/

> I tried to reduce the size of completion struct, and the "rep movsq" did
> go away, however it seemed the compiler still allocated the memory for
> the temporary variables on the stack, because whenever I
> increased/decreased  the size of completion, the stack size of
> write_journal() got increased/decreased *7* times, but there are only
> 3 journal_completion structures in write_journal(). So the *4* callsites
> of COMPLETION_INITIALIZER_ONSTACK() looked very suspicous.
>
> So I come up with the following patch, trying to teach the compiler 

Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-18 Thread Boqun Feng
Hi Arnd,

On Mon, Aug 14, 2017 at 10:50:24AM +0200, Arnd Bergmann wrote:
> On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Park  wrote:
> > Although wait_for_completion() and its family can cause deadlock, the
> > lock correctness validator could not be applied to them until now,
> > because things like complete() are usually called in a different context
> > from the waiting context, which violates lockdep's assumption.
> >
> > Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
> > detector to those completion operations. Applied it.
> >
> > Signed-off-by: Byungchul Park 
> 
> This patch introduced a significant growth in kernel stack usage for a small
> set of functions. I see two new warnings for functions that get tipped over 
> the
> 1024 or 2048 byte frame size limit in linux-next (with a few other patches
> applied):
> 
> Before:
> 
> drivers/md/dm-integrity.c: In function 'write_journal':
> drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is
> larger than xxx bytes [-Werror=frame-larger-than=]
> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes
> is larger than 104 bytes [-Werror=frame-larger-than=]
> 
> After:
> 
> drivers/md/dm-integrity.c: In function 'write_journal':
> drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes
> is larger than 1024 bytes [-Werror=frame-larger-than=]
> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072
> bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
> 
> I have not checked in detail why this happens, but I'm guessing that
> there is an overall increase in stack usage with
> CONFIG_LOCKDEP_COMPLETE in functions using completions,
> and I think it would be good to try to come up with a version that doesn't
> add as much.
> 

So I have been staring at this for a while, and below is what I found:

(BTW, Arnd, may I know your compiler version? Mine is 7.1.1)

In write_journal(), I can see the code generated like this on x86:

io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
2462:   e8 00 00 00 00  callq  2467 
2467:   48 8d 85 80 fd ff fflea-0x280(%rbp),%rax
246e:   48 c7 c6 00 00 00 00mov$0x0,%rsi
2475:   48 c7 c2 00 00 00 00mov$0x0,%rdx
x->done = 0;
247c:   c7 85 90 fd ff ff 00movl   $0x0,-0x270(%rbp)
2483:   00 00 00 
init_waitqueue_head(>wait);
2486:   48 8d 78 18 lea0x18(%rax),%rdi
248a:   e8 00 00 00 00  callq  248f 
if (commit_start + commit_sections <= ic->journal_sections) {
248f:   41 8b 87 a8 00 00 00mov0xa8(%r15),%eax
io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
2496:   48 8d bd e8 f9 ff fflea-0x618(%rbp),%rdi
249d:   48 8d b5 90 fd ff fflea-0x270(%rbp),%rsi
24a4:   b9 17 00 00 00  mov$0x17,%ecx
24a9:   f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi)
if (commit_start + commit_sections <= ic->journal_sections) {
24ac:   41 39 c6cmp%eax,%r14d
io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
24af:   48 8d bd 90 fd ff fflea-0x270(%rbp),%rdi
24b6:   48 8d b5 e8 f9 ff fflea-0x618(%rbp),%rsi
24bd:   b9 17 00 00 00  mov$0x17,%ecx
24c2:   f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi)

Those two "rep movsq"s are very suspicious, because
COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place,
rather than move it to some temporary variable and copy it back.

I tried to reduce the size of completion struct, and the "rep movsq" did
go away, however it seemed the compiler still allocated the memory for
the temporary variables on the stack, because whenever I
increased/decreased  the size of completion, the stack size of
write_journal() got increased/decreased *7* times, but there are only
3 journal_completion structures in write_journal(). So the *4* callsites
of COMPLETION_INITIALIZER_ONSTACK() looked very suspicous.

So I come up with the following patch, trying to teach the compiler not
to do the unnecessary allocation, could you give it a try?

Besides, I could also observe the stack size reduction of
write_journal() even for !LOCKDEP kernel.

-->8
Subject: [PATCH] completion: Avoid unnecessary stack allocation for
 COMPLETION_INITIALIZER_ONSTACK()

In theory, COMPLETION_INITIALIZER_ONSTACK() should never affect the
stack allocation of the caller. However, on some compilers, a temporary
structure was allocated for the return value of
COMPLETION_INITIALIZER_ONSTACK(), for example in write_journal() with

Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-18 Thread Boqun Feng
Hi Arnd,

On Mon, Aug 14, 2017 at 10:50:24AM +0200, Arnd Bergmann wrote:
> On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Park  wrote:
> > Although wait_for_completion() and its family can cause deadlock, the
> > lock correctness validator could not be applied to them until now,
> > because things like complete() are usually called in a different context
> > from the waiting context, which violates lockdep's assumption.
> >
> > Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
> > detector to those completion operations. Applied it.
> >
> > Signed-off-by: Byungchul Park 
> 
> This patch introduced a significant growth in kernel stack usage for a small
> set of functions. I see two new warnings for functions that get tipped over 
> the
> 1024 or 2048 byte frame size limit in linux-next (with a few other patches
> applied):
> 
> Before:
> 
> drivers/md/dm-integrity.c: In function 'write_journal':
> drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is
> larger than xxx bytes [-Werror=frame-larger-than=]
> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes
> is larger than 104 bytes [-Werror=frame-larger-than=]
> 
> After:
> 
> drivers/md/dm-integrity.c: In function 'write_journal':
> drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes
> is larger than 1024 bytes [-Werror=frame-larger-than=]
> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072
> bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
> 
> I have not checked in detail why this happens, but I'm guessing that
> there is an overall increase in stack usage with
> CONFIG_LOCKDEP_COMPLETE in functions using completions,
> and I think it would be good to try to come up with a version that doesn't
> add as much.
> 

So I have been staring at this for a while, and below is what I found:

(BTW, Arnd, may I know your compiler version? Mine is 7.1.1)

In write_journal(), I can see the code generated like this on x86:

io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
2462:   e8 00 00 00 00  callq  2467 
2467:   48 8d 85 80 fd ff fflea-0x280(%rbp),%rax
246e:   48 c7 c6 00 00 00 00mov$0x0,%rsi
2475:   48 c7 c2 00 00 00 00mov$0x0,%rdx
x->done = 0;
247c:   c7 85 90 fd ff ff 00movl   $0x0,-0x270(%rbp)
2483:   00 00 00 
init_waitqueue_head(>wait);
2486:   48 8d 78 18 lea0x18(%rax),%rdi
248a:   e8 00 00 00 00  callq  248f 
if (commit_start + commit_sections <= ic->journal_sections) {
248f:   41 8b 87 a8 00 00 00mov0xa8(%r15),%eax
io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
2496:   48 8d bd e8 f9 ff fflea-0x618(%rbp),%rdi
249d:   48 8d b5 90 fd ff fflea-0x270(%rbp),%rsi
24a4:   b9 17 00 00 00  mov$0x17,%ecx
24a9:   f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi)
if (commit_start + commit_sections <= ic->journal_sections) {
24ac:   41 39 c6cmp%eax,%r14d
io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp);
24af:   48 8d bd 90 fd ff fflea-0x270(%rbp),%rdi
24b6:   48 8d b5 e8 f9 ff fflea-0x618(%rbp),%rsi
24bd:   b9 17 00 00 00  mov$0x17,%ecx
24c2:   f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi)

Those two "rep movsq"s are very suspicious, because
COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place,
rather than move it to some temporary variable and copy it back.

I tried to reduce the size of completion struct, and the "rep movsq" did
go away, however it seemed the compiler still allocated the memory for
the temporary variables on the stack, because whenever I
increased/decreased  the size of completion, the stack size of
write_journal() got increased/decreased *7* times, but there are only
3 journal_completion structures in write_journal(). So the *4* callsites
of COMPLETION_INITIALIZER_ONSTACK() looked very suspicous.

So I come up with the following patch, trying to teach the compiler not
to do the unnecessary allocation, could you give it a try?

Besides, I could also observe the stack size reduction of
write_journal() even for !LOCKDEP kernel.

-->8
Subject: [PATCH] completion: Avoid unnecessary stack allocation for
 COMPLETION_INITIALIZER_ONSTACK()

In theory, COMPLETION_INITIALIZER_ONSTACK() should never affect the
stack allocation of the caller. However, on some compilers, a temporary
structure was allocated for the return value of
COMPLETION_INITIALIZER_ONSTACK(), for example in write_journal() with
LOCKDEP_COMPLETIONS=y(gcc is 7.1.1):

io_comp.comp = 

Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-14 Thread Arnd Bergmann
On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Park  wrote:
> Although wait_for_completion() and its family can cause deadlock, the
> lock correctness validator could not be applied to them until now,
> because things like complete() are usually called in a different context
> from the waiting context, which violates lockdep's assumption.
>
> Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
> detector to those completion operations. Applied it.
>
> Signed-off-by: Byungchul Park 

This patch introduced a significant growth in kernel stack usage for a small
set of functions. I see two new warnings for functions that get tipped over the
1024 or 2048 byte frame size limit in linux-next (with a few other patches
applied):

Before:

drivers/md/dm-integrity.c: In function 'write_journal':
drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is
larger than xxx bytes [-Werror=frame-larger-than=]
drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes
is larger than 104 bytes [-Werror=frame-larger-than=]

After:

drivers/md/dm-integrity.c: In function 'write_journal':
drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes
is larger than 1024 bytes [-Werror=frame-larger-than=]
drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072
bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

I have not checked in detail why this happens, but I'm guessing that
there is an overall increase in stack usage with
CONFIG_LOCKDEP_COMPLETE in functions using completions,
and I think it would be good to try to come up with a version that doesn't
add as much.

Arnd


Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-14 Thread Arnd Bergmann
On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Park  wrote:
> Although wait_for_completion() and its family can cause deadlock, the
> lock correctness validator could not be applied to them until now,
> because things like complete() are usually called in a different context
> from the waiting context, which violates lockdep's assumption.
>
> Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
> detector to those completion operations. Applied it.
>
> Signed-off-by: Byungchul Park 

This patch introduced a significant growth in kernel stack usage for a small
set of functions. I see two new warnings for functions that get tipped over the
1024 or 2048 byte frame size limit in linux-next (with a few other patches
applied):

Before:

drivers/md/dm-integrity.c: In function 'write_journal':
drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is
larger than xxx bytes [-Werror=frame-larger-than=]
drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes
is larger than 104 bytes [-Werror=frame-larger-than=]

After:

drivers/md/dm-integrity.c: In function 'write_journal':
drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes
is larger than 1024 bytes [-Werror=frame-larger-than=]
drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq':
drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072
bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

I have not checked in detail why this happens, but I'm guessing that
there is an overall increase in stack usage with
CONFIG_LOCKDEP_COMPLETE in functions using completions,
and I think it would be good to try to come up with a version that doesn't
add as much.

Arnd


Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-09 Thread Byungchul Park
On Wed, Aug 09, 2017 at 12:24:39PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 09, 2017 at 11:51:07AM +0200, Peter Zijlstra wrote:
> > On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote:
> > > +static inline void wait_for_completion(struct completion *x)
> > > +{
> > > + complete_acquire(x);
> > > + __wait_for_completion(x);
> > > + complete_release(x);
> > > +}
> > > +
> > > +static inline void wait_for_completion_io(struct completion *x)
> > > +{
> > > + complete_acquire(x);
> > > + __wait_for_completion_io(x);
> > > + complete_release(x);
> > > +}
> > > +
> > > +static inline int wait_for_completion_interruptible(struct completion *x)
> > > +{
> > > + int ret;
> > > + complete_acquire(x);
> > > + ret = __wait_for_completion_interruptible(x);
> > > + complete_release(x);
> > > + return ret;
> > > +}
> > > +
> > > +static inline int wait_for_completion_killable(struct completion *x)
> > > +{
> > > + int ret;
> > > + complete_acquire(x);
> > > + ret = __wait_for_completion_killable(x);
> > > + complete_release(x);
> > > + return ret;
> > > +}
> > 
> > I don't understand, why not change __wait_for_common() ?
> 
> That is what is wrong with the below?
> 
> Yes, it adds acquire/release to the timeout variants too, but I don't

Yes, I didn't want to involve them in lockdep play which reports _deadlock_
warning since it's not a dependency causing a deadlock.

> see why we should exclude those, and even if we'd want to do that, it
> would be trivial:
> 
>   bool timo = (timeout == MAX_SCHEDULE_TIMEOUT);
> 
>   if (!timo)
>   complete_acquire(x);
> 
>   /* ... */
> 
>   if (!timo)
>   complete_release(x);

Yes, frankly I wanted to use this.. but skip it.

> But like said, I think we very much want to annotate waits with timeouts
> too. Hitting the max timo doesn't necessarily mean we'll make fwd
> progress, we could be stuck in a loop doing something else again before
> returning to wait.

In that case, it should be detected by other dependencies which makes
problems, not the dependency by wait_for_complete().

> Also, even if we'd make fwd progress, hitting that max timo is still not
> desirable.

It's not desirable but it's not a dependency causing a deadlock, so I did
not want to _deadlock_ warning in that cases.. I didn't want to abuse
lockdep reports..

However, it's OK if you think it's worth warning even in that cases.

Thank you very much,
Byungchul



Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-09 Thread Byungchul Park
On Wed, Aug 09, 2017 at 12:24:39PM +0200, Peter Zijlstra wrote:
> On Wed, Aug 09, 2017 at 11:51:07AM +0200, Peter Zijlstra wrote:
> > On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote:
> > > +static inline void wait_for_completion(struct completion *x)
> > > +{
> > > + complete_acquire(x);
> > > + __wait_for_completion(x);
> > > + complete_release(x);
> > > +}
> > > +
> > > +static inline void wait_for_completion_io(struct completion *x)
> > > +{
> > > + complete_acquire(x);
> > > + __wait_for_completion_io(x);
> > > + complete_release(x);
> > > +}
> > > +
> > > +static inline int wait_for_completion_interruptible(struct completion *x)
> > > +{
> > > + int ret;
> > > + complete_acquire(x);
> > > + ret = __wait_for_completion_interruptible(x);
> > > + complete_release(x);
> > > + return ret;
> > > +}
> > > +
> > > +static inline int wait_for_completion_killable(struct completion *x)
> > > +{
> > > + int ret;
> > > + complete_acquire(x);
> > > + ret = __wait_for_completion_killable(x);
> > > + complete_release(x);
> > > + return ret;
> > > +}
> > 
> > I don't understand, why not change __wait_for_common() ?
> 
> That is what is wrong with the below?
> 
> Yes, it adds acquire/release to the timeout variants too, but I don't

Yes, I didn't want to involve them in lockdep play which reports _deadlock_
warning since it's not a dependency causing a deadlock.

> see why we should exclude those, and even if we'd want to do that, it
> would be trivial:
> 
>   bool timo = (timeout == MAX_SCHEDULE_TIMEOUT);
> 
>   if (!timo)
>   complete_acquire(x);
> 
>   /* ... */
> 
>   if (!timo)
>   complete_release(x);

Yes, frankly I wanted to use this.. but skip it.

> But like said, I think we very much want to annotate waits with timeouts
> too. Hitting the max timo doesn't necessarily mean we'll make fwd
> progress, we could be stuck in a loop doing something else again before
> returning to wait.

In that case, it should be detected by other dependencies which makes
problems, not the dependency by wait_for_complete().

> Also, even if we'd make fwd progress, hitting that max timo is still not
> desirable.

It's not desirable but it's not a dependency causing a deadlock, so I did
not want to _deadlock_ warning in that cases.. I didn't want to abuse
lockdep reports..

However, it's OK if you think it's worth warning even in that cases.

Thank you very much,
Byungchul



Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-09 Thread Peter Zijlstra
On Wed, Aug 09, 2017 at 11:51:07AM +0200, Peter Zijlstra wrote:
> On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote:
> > +static inline void wait_for_completion(struct completion *x)
> > +{
> > +   complete_acquire(x);
> > +   __wait_for_completion(x);
> > +   complete_release(x);
> > +}
> > +
> > +static inline void wait_for_completion_io(struct completion *x)
> > +{
> > +   complete_acquire(x);
> > +   __wait_for_completion_io(x);
> > +   complete_release(x);
> > +}
> > +
> > +static inline int wait_for_completion_interruptible(struct completion *x)
> > +{
> > +   int ret;
> > +   complete_acquire(x);
> > +   ret = __wait_for_completion_interruptible(x);
> > +   complete_release(x);
> > +   return ret;
> > +}
> > +
> > +static inline int wait_for_completion_killable(struct completion *x)
> > +{
> > +   int ret;
> > +   complete_acquire(x);
> > +   ret = __wait_for_completion_killable(x);
> > +   complete_release(x);
> > +   return ret;
> > +}
> 
> I don't understand, why not change __wait_for_common() ?

That is what is wrong with the below?

Yes, it adds acquire/release to the timeout variants too, but I don't
see why we should exclude those, and even if we'd want to do that, it
would be trivial:

bool timo = (timeout == MAX_SCHEDULE_TIMEOUT);

if (!timo)
complete_acquire(x);

/* ... */

if (!timo)
complete_release(x);

But like said, I think we very much want to annotate waits with timeouts
too. Hitting the max timo doesn't necessarily mean we'll make fwd
progress, we could be stuck in a loop doing something else again before
returning to wait.

Also, even if we'd make fwd progress, hitting that max timo is still not
desirable.

---
Subject: lockdep: Apply crossrelease to completions
From: Byungchul Park 
Date: Mon, 7 Aug 2017 16:12:56 +0900

Although wait_for_completion() and its family can cause deadlock, the
lock correctness validator could not be applied to them until now,
because things like complete() are usually called in a different context
from the waiting context, which violates lockdep's assumption.

Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
detector to those completion operations. Applied it.

Cc: npig...@gmail.com
Cc: mi...@kernel.org
Cc: a...@linux-foundation.org
Cc: t...@linutronix.de
Cc: boqun.f...@gmail.com
Cc: wi...@infradead.org
Cc: wal...@google.com
Cc: kernel-t...@lge.com
Cc: kir...@shutemov.name
Signed-off-by: Byungchul Park 
Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/completion.h |   45 -
 kernel/sched/completion.c  |   11 +++
 lib/Kconfig.debug  |8 
 3 files changed, 63 insertions(+), 1 deletion(-)

--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,6 +9,9 @@
  */
 
 #include 
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#include 
+#endif
 
 /*
  * struct completion - structure used to maintain state for a "completion"
@@ -25,10 +28,50 @@
 struct completion {
unsigned int done;
wait_queue_head_t wait;
+#ifdef CONFIG_LOCKDEP_COMPLETE
+   struct lockdep_map_cross map;
+#endif
 };
 
+#ifdef CONFIG_LOCKDEP_COMPLETE
+static inline void complete_acquire(struct completion *x)
+{
+   lock_acquire_exclusive((struct lockdep_map *)>map, 0, 0, NULL, 
_RET_IP_);
+}
+
+static inline void complete_release(struct completion *x)
+{
+   lock_release((struct lockdep_map *)>map, 0, _RET_IP_);
+}
+
+static inline void complete_release_commit(struct completion *x)
+{
+   lock_commit_crosslock((struct lockdep_map *)>map);
+}
+
+#define init_completion(x) \
+do {   \
+   static struct lock_class_key __key; \
+   lockdep_init_map_crosslock((struct lockdep_map *)&(x)->map, \
+   "(complete)" #x,\
+   &__key, 0); \
+   __init_completion(x);   \
+} while (0)
+#else
+#define init_completion(x) __init_completion(x)
+static inline void complete_acquire(struct completion *x) {}
+static inline void complete_release(struct completion *x) {}
+static inline void complete_release_commit(struct completion *x) {}
+#endif
+
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#define COMPLETION_INITIALIZER(work) \
+   { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait), \
+   STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
+#else
 #define COMPLETION_INITIALIZER(work) \
{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+#endif
 
 #define COMPLETION_INITIALIZER_ONSTACK(work) \
({ init_completion(); work; })
@@ -70,7 +113,7 @@ struct completion {
  * This inline function will initialize a dynamically created 

Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-09 Thread Peter Zijlstra
On Wed, Aug 09, 2017 at 11:51:07AM +0200, Peter Zijlstra wrote:
> On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote:
> > +static inline void wait_for_completion(struct completion *x)
> > +{
> > +   complete_acquire(x);
> > +   __wait_for_completion(x);
> > +   complete_release(x);
> > +}
> > +
> > +static inline void wait_for_completion_io(struct completion *x)
> > +{
> > +   complete_acquire(x);
> > +   __wait_for_completion_io(x);
> > +   complete_release(x);
> > +}
> > +
> > +static inline int wait_for_completion_interruptible(struct completion *x)
> > +{
> > +   int ret;
> > +   complete_acquire(x);
> > +   ret = __wait_for_completion_interruptible(x);
> > +   complete_release(x);
> > +   return ret;
> > +}
> > +
> > +static inline int wait_for_completion_killable(struct completion *x)
> > +{
> > +   int ret;
> > +   complete_acquire(x);
> > +   ret = __wait_for_completion_killable(x);
> > +   complete_release(x);
> > +   return ret;
> > +}
> 
> I don't understand, why not change __wait_for_common() ?

That is what is wrong with the below?

Yes, it adds acquire/release to the timeout variants too, but I don't
see why we should exclude those, and even if we'd want to do that, it
would be trivial:

bool timo = (timeout == MAX_SCHEDULE_TIMEOUT);

if (!timo)
complete_acquire(x);

/* ... */

if (!timo)
complete_release(x);

But like said, I think we very much want to annotate waits with timeouts
too. Hitting the max timo doesn't necessarily mean we'll make fwd
progress, we could be stuck in a loop doing something else again before
returning to wait.

Also, even if we'd make fwd progress, hitting that max timo is still not
desirable.

---
Subject: lockdep: Apply crossrelease to completions
From: Byungchul Park 
Date: Mon, 7 Aug 2017 16:12:56 +0900

Although wait_for_completion() and its family can cause deadlock, the
lock correctness validator could not be applied to them until now,
because things like complete() are usually called in a different context
from the waiting context, which violates lockdep's assumption.

Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep
detector to those completion operations. Applied it.

Cc: npig...@gmail.com
Cc: mi...@kernel.org
Cc: a...@linux-foundation.org
Cc: t...@linutronix.de
Cc: boqun.f...@gmail.com
Cc: wi...@infradead.org
Cc: wal...@google.com
Cc: kernel-t...@lge.com
Cc: kir...@shutemov.name
Signed-off-by: Byungchul Park 
Signed-off-by: Peter Zijlstra (Intel) 
---
 include/linux/completion.h |   45 -
 kernel/sched/completion.c  |   11 +++
 lib/Kconfig.debug  |8 
 3 files changed, 63 insertions(+), 1 deletion(-)

--- a/include/linux/completion.h
+++ b/include/linux/completion.h
@@ -9,6 +9,9 @@
  */
 
 #include 
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#include 
+#endif
 
 /*
  * struct completion - structure used to maintain state for a "completion"
@@ -25,10 +28,50 @@
 struct completion {
unsigned int done;
wait_queue_head_t wait;
+#ifdef CONFIG_LOCKDEP_COMPLETE
+   struct lockdep_map_cross map;
+#endif
 };
 
+#ifdef CONFIG_LOCKDEP_COMPLETE
+static inline void complete_acquire(struct completion *x)
+{
+   lock_acquire_exclusive((struct lockdep_map *)>map, 0, 0, NULL, 
_RET_IP_);
+}
+
+static inline void complete_release(struct completion *x)
+{
+   lock_release((struct lockdep_map *)>map, 0, _RET_IP_);
+}
+
+static inline void complete_release_commit(struct completion *x)
+{
+   lock_commit_crosslock((struct lockdep_map *)>map);
+}
+
+#define init_completion(x) \
+do {   \
+   static struct lock_class_key __key; \
+   lockdep_init_map_crosslock((struct lockdep_map *)&(x)->map, \
+   "(complete)" #x,\
+   &__key, 0); \
+   __init_completion(x);   \
+} while (0)
+#else
+#define init_completion(x) __init_completion(x)
+static inline void complete_acquire(struct completion *x) {}
+static inline void complete_release(struct completion *x) {}
+static inline void complete_release_commit(struct completion *x) {}
+#endif
+
+#ifdef CONFIG_LOCKDEP_COMPLETE
+#define COMPLETION_INITIALIZER(work) \
+   { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait), \
+   STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
+#else
 #define COMPLETION_INITIALIZER(work) \
{ 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) }
+#endif
 
 #define COMPLETION_INITIALIZER_ONSTACK(work) \
({ init_completion(); work; })
@@ -70,7 +113,7 @@ struct completion {
  * This inline function will initialize a dynamically created completion
  * structure.
  */
-static inline void init_completion(struct 

Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-09 Thread Peter Zijlstra
On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote:
> +static inline void wait_for_completion(struct completion *x)
> +{
> + complete_acquire(x);
> + __wait_for_completion(x);
> + complete_release(x);
> +}
> +
> +static inline void wait_for_completion_io(struct completion *x)
> +{
> + complete_acquire(x);
> + __wait_for_completion_io(x);
> + complete_release(x);
> +}
> +
> +static inline int wait_for_completion_interruptible(struct completion *x)
> +{
> + int ret;
> + complete_acquire(x);
> + ret = __wait_for_completion_interruptible(x);
> + complete_release(x);
> + return ret;
> +}
> +
> +static inline int wait_for_completion_killable(struct completion *x)
> +{
> + int ret;
> + complete_acquire(x);
> + ret = __wait_for_completion_killable(x);
> + complete_release(x);
> + return ret;
> +}

I don't understand, why not change __wait_for_common() ?


Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-09 Thread Peter Zijlstra
On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote:
> +static inline void wait_for_completion(struct completion *x)
> +{
> + complete_acquire(x);
> + __wait_for_completion(x);
> + complete_release(x);
> +}
> +
> +static inline void wait_for_completion_io(struct completion *x)
> +{
> + complete_acquire(x);
> + __wait_for_completion_io(x);
> + complete_release(x);
> +}
> +
> +static inline int wait_for_completion_interruptible(struct completion *x)
> +{
> + int ret;
> + complete_acquire(x);
> + ret = __wait_for_completion_interruptible(x);
> + complete_release(x);
> + return ret;
> +}
> +
> +static inline int wait_for_completion_killable(struct completion *x)
> +{
> + int ret;
> + complete_acquire(x);
> + ret = __wait_for_completion_killable(x);
> + complete_release(x);
> + return ret;
> +}

I don't understand, why not change __wait_for_common() ?


Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-07 Thread kbuild test robot
Hi Byungchul,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.13-rc4 next-20170804]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617
config: cris-allmodconfig (attached as .config)
compiler: cris-linux-gcc (GCC) 6.2.0
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=cris 

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/pm.h:29:0,
from include/linux/device.h:25,
from include/linux/pci.h:30,
from drivers/usb/host/ehci-hcd.c:24:
   include/linux/completion.h:32:27: error: field 'map' has incomplete type
 struct lockdep_map_cross map;
  ^~~
   In file included from include/linux/spinlock_types.h:18:0,
from include/linux/spinlock.h:81,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/linux/stat.h:18,
from include/linux/module.h:10,
from drivers/usb/host/ehci-hcd.c:23:
   drivers/usb/host/ehci-hub.c: In function 'ehset_single_step_set_feature':
>> include/linux/lockdep.h:578:4: error: field name not in record or union 
>> initializer
 { .map.name = (_name), .map.key = (void *)(_key), \
   ^
>> include/linux/completion.h:70:2: note: in expansion of macro 
>> 'STATIC_CROSS_LOCKDEP_MAP_INIT'
 STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
 ^
   include/linux/completion.h:88:27: note: in expansion of macro 
'COMPLETION_INITIALIZER'
 struct completion work = COMPLETION_INITIALIZER(work)
  ^~
   include/linux/completion.h:106:43: note: in expansion of macro 
'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
  ^~
   drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 
'DECLARE_COMPLETION_ONSTACK'
 DECLARE_COMPLETION_ONSTACK(done);
 ^~
   include/linux/lockdep.h:578:4: note: (near initialization for 'done.map')
 { .map.name = (_name), .map.key = (void *)(_key), \
   ^
>> include/linux/completion.h:70:2: note: in expansion of macro 
>> 'STATIC_CROSS_LOCKDEP_MAP_INIT'
 STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
 ^
   include/linux/completion.h:88:27: note: in expansion of macro 
'COMPLETION_INITIALIZER'
 struct completion work = COMPLETION_INITIALIZER(work)
  ^~
   include/linux/completion.h:106:43: note: in expansion of macro 
'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
  ^~
   drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 
'DECLARE_COMPLETION_ONSTACK'
 DECLARE_COMPLETION_ONSTACK(done);
 ^~
   include/linux/lockdep.h:578:25: error: field name not in record or union 
initializer
 { .map.name = (_name), .map.key = (void *)(_key), \
^
>> include/linux/completion.h:70:2: note: in expansion of macro 
>> 'STATIC_CROSS_LOCKDEP_MAP_INIT'
 STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
 ^
   include/linux/completion.h:88:27: note: in expansion of macro 
'COMPLETION_INITIALIZER'
 struct completion work = COMPLETION_INITIALIZER(work)
  ^~
   include/linux/completion.h:106:43: note: in expansion of macro 
'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
  ^~
   drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 
'DECLARE_COMPLETION_ONSTACK'
 DECLARE_COMPLETION_ONSTACK(done);
 ^~
   include/linux/lockdep.h:578:25: note: (near initialization for 'done.map')
 { .map.name = (_name), .map.key = (void *)(_key), \
^
>> include/linux/completion.h:70:2: note: in expansion of macro 
>> 'STATIC_CROSS_LOCKDEP_MAP_INIT'
 STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
 ^
   include/linux/completion.h:88:27: note: in expansion of macro 
'COMPLETION_INITIALIZER'
 struct completion work = COMPLETION_INITIALIZER(work)
  ^~
   include/linux/completion.h:106:43: note: in expansion of 

Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-07 Thread kbuild test robot
Hi Byungchul,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.13-rc4 next-20170804]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617
config: cris-allmodconfig (attached as .config)
compiler: cris-linux-gcc (GCC) 6.2.0
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=cris 

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/pm.h:29:0,
from include/linux/device.h:25,
from include/linux/pci.h:30,
from drivers/usb/host/ehci-hcd.c:24:
   include/linux/completion.h:32:27: error: field 'map' has incomplete type
 struct lockdep_map_cross map;
  ^~~
   In file included from include/linux/spinlock_types.h:18:0,
from include/linux/spinlock.h:81,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/linux/stat.h:18,
from include/linux/module.h:10,
from drivers/usb/host/ehci-hcd.c:23:
   drivers/usb/host/ehci-hub.c: In function 'ehset_single_step_set_feature':
>> include/linux/lockdep.h:578:4: error: field name not in record or union 
>> initializer
 { .map.name = (_name), .map.key = (void *)(_key), \
   ^
>> include/linux/completion.h:70:2: note: in expansion of macro 
>> 'STATIC_CROSS_LOCKDEP_MAP_INIT'
 STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
 ^
   include/linux/completion.h:88:27: note: in expansion of macro 
'COMPLETION_INITIALIZER'
 struct completion work = COMPLETION_INITIALIZER(work)
  ^~
   include/linux/completion.h:106:43: note: in expansion of macro 
'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
  ^~
   drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 
'DECLARE_COMPLETION_ONSTACK'
 DECLARE_COMPLETION_ONSTACK(done);
 ^~
   include/linux/lockdep.h:578:4: note: (near initialization for 'done.map')
 { .map.name = (_name), .map.key = (void *)(_key), \
   ^
>> include/linux/completion.h:70:2: note: in expansion of macro 
>> 'STATIC_CROSS_LOCKDEP_MAP_INIT'
 STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
 ^
   include/linux/completion.h:88:27: note: in expansion of macro 
'COMPLETION_INITIALIZER'
 struct completion work = COMPLETION_INITIALIZER(work)
  ^~
   include/linux/completion.h:106:43: note: in expansion of macro 
'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
  ^~
   drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 
'DECLARE_COMPLETION_ONSTACK'
 DECLARE_COMPLETION_ONSTACK(done);
 ^~
   include/linux/lockdep.h:578:25: error: field name not in record or union 
initializer
 { .map.name = (_name), .map.key = (void *)(_key), \
^
>> include/linux/completion.h:70:2: note: in expansion of macro 
>> 'STATIC_CROSS_LOCKDEP_MAP_INIT'
 STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
 ^
   include/linux/completion.h:88:27: note: in expansion of macro 
'COMPLETION_INITIALIZER'
 struct completion work = COMPLETION_INITIALIZER(work)
  ^~
   include/linux/completion.h:106:43: note: in expansion of macro 
'DECLARE_COMPLETION'
# define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work)
  ^~
   drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 
'DECLARE_COMPLETION_ONSTACK'
 DECLARE_COMPLETION_ONSTACK(done);
 ^~
   include/linux/lockdep.h:578:25: note: (near initialization for 'done.map')
 { .map.name = (_name), .map.key = (void *)(_key), \
^
>> include/linux/completion.h:70:2: note: in expansion of macro 
>> 'STATIC_CROSS_LOCKDEP_MAP_INIT'
 STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) }
 ^
   include/linux/completion.h:88:27: note: in expansion of macro 
'COMPLETION_INITIALIZER'
 struct completion work = COMPLETION_INITIALIZER(work)
  ^~
   include/linux/completion.h:106:43: note: in expansion of 

Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-07 Thread kbuild test robot
Hi Byungchul,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.13-rc4 next-20170804]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617
config: alpha-allmodconfig (attached as .config)
compiler: alpha-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=alpha 

All errors (new ones prefixed by >>):

   warning: (LOCKDEP_COMPLETE) selects LOCKDEP_CROSSRELEASE which has unmet 
direct dependencies (PROVE_LOCKING)
   warning: (LOCKDEP_COMPLETE) selects LOCKDEP_CROSSRELEASE which has unmet 
direct dependencies (PROVE_LOCKING)
   In file included from include/linux/srcutree.h:28:0,
from include/linux/srcu.h:62,
from include/linux/notifier.h:15,
from include/linux/memory_hotplug.h:6,
from include/linux/mmzone.h:771,
from include/linux/gfp.h:5,
from include/linux/mm.h:9,
from include/linux/pid_namespace.h:6,
from include/linux/ptrace.h:9,
from arch/alpha/kernel/asm-offsets.c:10:
>> include/linux/completion.h:32:27: error: field 'map' has incomplete type
 struct lockdep_map_cross map;
  ^~~
   make[2]: *** [arch/alpha/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [sub-make] Error 2

vim +/map +32 include/linux/completion.h

15  
16  /*
17   * struct completion - structure used to maintain state for a 
"completion"
18   *
19   * This is the opaque structure used to maintain the state for a 
"completion".
20   * Completions currently use a FIFO to queue threads that have to wait 
for
21   * the "completion" event.
22   *
23   * See also:  complete(), wait_for_completion() (and friends _timeout,
24   * _interruptible, _interruptible_timeout, and _killable), 
init_completion(),
25   * reinit_completion(), and macros DECLARE_COMPLETION(),
26   * DECLARE_COMPLETION_ONSTACK().
27   */
28  struct completion {
29  unsigned int done;
30  wait_queue_head_t wait;
31  #ifdef CONFIG_LOCKDEP_COMPLETE
  > 32  struct lockdep_map_cross map;
33  #endif
34  };
35  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions

2017-08-07 Thread kbuild test robot
Hi Byungchul,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.13-rc4 next-20170804]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617
config: alpha-allmodconfig (attached as .config)
compiler: alpha-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=alpha 

All errors (new ones prefixed by >>):

   warning: (LOCKDEP_COMPLETE) selects LOCKDEP_CROSSRELEASE which has unmet 
direct dependencies (PROVE_LOCKING)
   warning: (LOCKDEP_COMPLETE) selects LOCKDEP_CROSSRELEASE which has unmet 
direct dependencies (PROVE_LOCKING)
   In file included from include/linux/srcutree.h:28:0,
from include/linux/srcu.h:62,
from include/linux/notifier.h:15,
from include/linux/memory_hotplug.h:6,
from include/linux/mmzone.h:771,
from include/linux/gfp.h:5,
from include/linux/mm.h:9,
from include/linux/pid_namespace.h:6,
from include/linux/ptrace.h:9,
from arch/alpha/kernel/asm-offsets.c:10:
>> include/linux/completion.h:32:27: error: field 'map' has incomplete type
 struct lockdep_map_cross map;
  ^~~
   make[2]: *** [arch/alpha/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [sub-make] Error 2

vim +/map +32 include/linux/completion.h

15  
16  /*
17   * struct completion - structure used to maintain state for a 
"completion"
18   *
19   * This is the opaque structure used to maintain the state for a 
"completion".
20   * Completions currently use a FIFO to queue threads that have to wait 
for
21   * the "completion" event.
22   *
23   * See also:  complete(), wait_for_completion() (and friends _timeout,
24   * _interruptible, _interruptible_timeout, and _killable), 
init_completion(),
25   * reinit_completion(), and macros DECLARE_COMPLETION(),
26   * DECLARE_COMPLETION_ONSTACK().
27   */
28  struct completion {
29  unsigned int done;
30  wait_queue_head_t wait;
31  #ifdef CONFIG_LOCKDEP_COMPLETE
  > 32  struct lockdep_map_cross map;
33  #endif
34  };
35  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip