Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Sat, Aug 19, 2017 at 03:34:01PM +0200, Arnd Bergmann wrote: > On Sat, Aug 19, 2017 at 2:51 PM, Arnd Bergmannwrote: > > >> --- a/include/linux/completion.h > >> +++ b/include/linux/completion.h > >> @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct > >> completion *x) {} > >> #endif > >> > >> #define COMPLETION_INITIALIZER_ONSTACK(work) \ > >> - ({ init_completion(); work; }) > >> + (*({ init_completion(); })) > >> > >> /** > >> * DECLARE_COMPLETION - declare and initialize a completion structure > > > > Nice hack. Any idea why that's different to the compiler? > > So I find this link: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html it says: "In G++, the result value of a statement expression undergoes array and function pointer decay, and is returned by value to the enclosing expression. " I think this is why the temporary variable is constructed(or at least allocated). Lemme put this in my commit log. > > I've applied that one to my test tree now, and reverted my own patch, > > will let you know if anything else shows up. I think we probably want > > to merge both patches to mainline. > > There is apparently one user of COMPLETION_INITIALIZER_ONSTACK > that causes a regression with the patch above: > > drivers/acpi/nfit/core.c: In function 'acpi_nfit_flush_probe': > include/linux/completion.h:77:3: error: value computed is not used > [-Werror=unused-value] > (*({ init_completion(); })) > > It would be trivial to convert to init_completion(), which seems to be > what was intended there. > Thanks. Will send the conversion as a separate patch along with my patch. Regards, Boqun > Arnd signature.asc Description: PGP signature
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Sat, Aug 19, 2017 at 03:34:01PM +0200, Arnd Bergmann wrote: > On Sat, Aug 19, 2017 at 2:51 PM, Arnd Bergmann wrote: > > >> --- a/include/linux/completion.h > >> +++ b/include/linux/completion.h > >> @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct > >> completion *x) {} > >> #endif > >> > >> #define COMPLETION_INITIALIZER_ONSTACK(work) \ > >> - ({ init_completion(); work; }) > >> + (*({ init_completion(); })) > >> > >> /** > >> * DECLARE_COMPLETION - declare and initialize a completion structure > > > > Nice hack. Any idea why that's different to the compiler? > > So I find this link: https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html it says: "In G++, the result value of a statement expression undergoes array and function pointer decay, and is returned by value to the enclosing expression. " I think this is why the temporary variable is constructed(or at least allocated). Lemme put this in my commit log. > > I've applied that one to my test tree now, and reverted my own patch, > > will let you know if anything else shows up. I think we probably want > > to merge both patches to mainline. > > There is apparently one user of COMPLETION_INITIALIZER_ONSTACK > that causes a regression with the patch above: > > drivers/acpi/nfit/core.c: In function 'acpi_nfit_flush_probe': > include/linux/completion.h:77:3: error: value computed is not used > [-Werror=unused-value] > (*({ init_completion(); })) > > It would be trivial to convert to init_completion(), which seems to be > what was intended there. > Thanks. Will send the conversion as a separate patch along with my patch. Regards, Boqun > Arnd signature.asc Description: PGP signature
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Sat, Aug 19, 2017 at 02:51:17PM +0200, Arnd Bergmann wrote: [...] > > Those two "rep movsq"s are very suspicious, because > > COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place, > > rather than move it to some temporary variable and copy it back. > > Right. I've seen this behavior before when using c99 compound > literals, but I was surprised to see it here. > > I also submitted a patch for the one driver that turned up a new > warning because of this behavior: > > https://www.spinics.net/lists/raid/msg58766.html > This solution also came up into my mind but then I found there are several callsites of COMPLETION_INITIALIZER_ONSTACK(), so I then tried to find a way to fix the macro itself. But your patch looks good to me ;-) > In case of the mmc driver, the behavior was as expected, it was > just a little too large and I sent the obvious workaround for it > > https://patchwork.kernel.org/patch/9902063/ > Yep. > > I tried to reduce the size of completion struct, and the "rep movsq" did > > go away, however it seemed the compiler still allocated the memory for > > the temporary variables on the stack, because whenever I > > increased/decreased the size of completion, the stack size of > > write_journal() got increased/decreased *7* times, but there are only > > 3 journal_completion structures in write_journal(). So the *4* callsites > > of COMPLETION_INITIALIZER_ONSTACK() looked very suspicous. > > > > So I come up with the following patch, trying to teach the compiler not > > to do the unnecessary allocation, could you give it a try? > > > > Besides, I could also observe the stack size reduction of > > write_journal() even for !LOCKDEP kernel. > > Ok. > > > --- > > Reported-by: Arnd Bergmann> > Signed-off-by: Boqun Feng > > --- > > include/linux/completion.h | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/include/linux/completion.h b/include/linux/completion.h > > index 791f053f28b7..cae5400022a3 100644 > > --- a/include/linux/completion.h > > +++ b/include/linux/completion.h > > @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct > > completion *x) {} > > #endif > > > > #define COMPLETION_INITIALIZER_ONSTACK(work) \ > > - ({ init_completion(); work; }) > > + (*({ init_completion(); })) > > > > /** > > * DECLARE_COMPLETION - declare and initialize a completion structure > > Nice hack. Any idea why that's different to the compiler? > So *I think* the block {init_completion(); } now will return a pointer rather than a whole structure, and a pointer could fit in a register, so the compiler won't bother to allocate the memory for it. > I've applied that one to my test tree now, and reverted my own patch, > will let you know if anything else shows up. I think we probably want Thanks ;-) > to merge both patches to mainline. > Agreed! Unless we want to remove COMPLETION_INITIALIZER_ONSTACK() for some reason, then my patch is not needed. Regards, Boqun > Arnd signature.asc Description: PGP signature
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Sat, Aug 19, 2017 at 02:51:17PM +0200, Arnd Bergmann wrote: [...] > > Those two "rep movsq"s are very suspicious, because > > COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place, > > rather than move it to some temporary variable and copy it back. > > Right. I've seen this behavior before when using c99 compound > literals, but I was surprised to see it here. > > I also submitted a patch for the one driver that turned up a new > warning because of this behavior: > > https://www.spinics.net/lists/raid/msg58766.html > This solution also came up into my mind but then I found there are several callsites of COMPLETION_INITIALIZER_ONSTACK(), so I then tried to find a way to fix the macro itself. But your patch looks good to me ;-) > In case of the mmc driver, the behavior was as expected, it was > just a little too large and I sent the obvious workaround for it > > https://patchwork.kernel.org/patch/9902063/ > Yep. > > I tried to reduce the size of completion struct, and the "rep movsq" did > > go away, however it seemed the compiler still allocated the memory for > > the temporary variables on the stack, because whenever I > > increased/decreased the size of completion, the stack size of > > write_journal() got increased/decreased *7* times, but there are only > > 3 journal_completion structures in write_journal(). So the *4* callsites > > of COMPLETION_INITIALIZER_ONSTACK() looked very suspicous. > > > > So I come up with the following patch, trying to teach the compiler not > > to do the unnecessary allocation, could you give it a try? > > > > Besides, I could also observe the stack size reduction of > > write_journal() even for !LOCKDEP kernel. > > Ok. > > > --- > > Reported-by: Arnd Bergmann > > Signed-off-by: Boqun Feng > > --- > > include/linux/completion.h | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/include/linux/completion.h b/include/linux/completion.h > > index 791f053f28b7..cae5400022a3 100644 > > --- a/include/linux/completion.h > > +++ b/include/linux/completion.h > > @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct > > completion *x) {} > > #endif > > > > #define COMPLETION_INITIALIZER_ONSTACK(work) \ > > - ({ init_completion(); work; }) > > + (*({ init_completion(); })) > > > > /** > > * DECLARE_COMPLETION - declare and initialize a completion structure > > Nice hack. Any idea why that's different to the compiler? > So *I think* the block {init_completion(); } now will return a pointer rather than a whole structure, and a pointer could fit in a register, so the compiler won't bother to allocate the memory for it. > I've applied that one to my test tree now, and reverted my own patch, > will let you know if anything else shows up. I think we probably want Thanks ;-) > to merge both patches to mainline. > Agreed! Unless we want to remove COMPLETION_INITIALIZER_ONSTACK() for some reason, then my patch is not needed. Regards, Boqun > Arnd signature.asc Description: PGP signature
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Sat, Aug 19, 2017 at 2:51 PM, Arnd Bergmannwrote: >> --- a/include/linux/completion.h >> +++ b/include/linux/completion.h >> @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct >> completion *x) {} >> #endif >> >> #define COMPLETION_INITIALIZER_ONSTACK(work) \ >> - ({ init_completion(); work; }) >> + (*({ init_completion(); })) >> >> /** >> * DECLARE_COMPLETION - declare and initialize a completion structure > > Nice hack. Any idea why that's different to the compiler? > > I've applied that one to my test tree now, and reverted my own patch, > will let you know if anything else shows up. I think we probably want > to merge both patches to mainline. There is apparently one user of COMPLETION_INITIALIZER_ONSTACK that causes a regression with the patch above: drivers/acpi/nfit/core.c: In function 'acpi_nfit_flush_probe': include/linux/completion.h:77:3: error: value computed is not used [-Werror=unused-value] (*({ init_completion(); })) It would be trivial to convert to init_completion(), which seems to be what was intended there. Arnd
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Sat, Aug 19, 2017 at 2:51 PM, Arnd Bergmann wrote: >> --- a/include/linux/completion.h >> +++ b/include/linux/completion.h >> @@ -74,7 +74,7 @@ static inline void complete_release_commit(struct >> completion *x) {} >> #endif >> >> #define COMPLETION_INITIALIZER_ONSTACK(work) \ >> - ({ init_completion(); work; }) >> + (*({ init_completion(); })) >> >> /** >> * DECLARE_COMPLETION - declare and initialize a completion structure > > Nice hack. Any idea why that's different to the compiler? > > I've applied that one to my test tree now, and reverted my own patch, > will let you know if anything else shows up. I think we probably want > to merge both patches to mainline. There is apparently one user of COMPLETION_INITIALIZER_ONSTACK that causes a regression with the patch above: drivers/acpi/nfit/core.c: In function 'acpi_nfit_flush_probe': include/linux/completion.h:77:3: error: value computed is not used [-Werror=unused-value] (*({ init_completion(); })) It would be trivial to convert to init_completion(), which seems to be what was intended there. Arnd
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Sat, Aug 19, 2017 at 1:43 AM, Boqun Fengwrote: > Hi Arnd, > > On Mon, Aug 14, 2017 at 10:50:24AM +0200, Arnd Bergmann wrote: >> On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Park >> wrote: >> > Although wait_for_completion() and its family can cause deadlock, the >> > lock correctness validator could not be applied to them until now, >> > because things like complete() are usually called in a different context >> > from the waiting context, which violates lockdep's assumption. >> > >> > Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep >> > detector to those completion operations. Applied it. >> > >> > Signed-off-by: Byungchul Park >> >> This patch introduced a significant growth in kernel stack usage for a small >> set of functions. I see two new warnings for functions that get tipped over >> the >> 1024 or 2048 byte frame size limit in linux-next (with a few other patches >> applied): >> >> Before: >> >> drivers/md/dm-integrity.c: In function 'write_journal': >> drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is >> larger than xxx bytes [-Werror=frame-larger-than=] >> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq': >> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes >> is larger than 104 bytes [-Werror=frame-larger-than=] >> >> After: >> >> drivers/md/dm-integrity.c: In function 'write_journal': >> drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes >> is larger than 1024 bytes [-Werror=frame-larger-than=] >> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq': >> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072 >> bytes is larger than 1024 bytes [-Werror=frame-larger-than=] >> >> I have not checked in detail why this happens, but I'm guessing that >> there is an overall increase in stack usage with >> CONFIG_LOCKDEP_COMPLETE in functions using completions, >> and I think it would be good to try to come up with a version that doesn't >> add as much. >> > > So I have been staring at this for a while, and below is what I found: > > (BTW, Arnd, may I know your compiler version? Mine is 7.1.1) That is what I used as well, on x86, arm32 and arm64. > In write_journal(), I can see the code generated like this on x86: > > io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); > 2462: e8 00 00 00 00 callq 2467 > 2467: 48 8d 85 80 fd ff fflea-0x280(%rbp),%rax > 246e: 48 c7 c6 00 00 00 00mov$0x0,%rsi > 2475: 48 c7 c2 00 00 00 00mov$0x0,%rdx > x->done = 0; > 247c: c7 85 90 fd ff ff 00movl $0x0,-0x270(%rbp) > 2483: 00 00 00 > init_waitqueue_head(>wait); > 2486: 48 8d 78 18 lea0x18(%rax),%rdi > 248a: e8 00 00 00 00 callq 248f > if (commit_start + commit_sections <= ic->journal_sections) { > 248f: 41 8b 87 a8 00 00 00mov0xa8(%r15),%eax > io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); > 2496: 48 8d bd e8 f9 ff fflea-0x618(%rbp),%rdi > 249d: 48 8d b5 90 fd ff fflea-0x270(%rbp),%rsi > 24a4: b9 17 00 00 00 mov$0x17,%ecx > 24a9: f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi) > if (commit_start + commit_sections <= ic->journal_sections) { > 24ac: 41 39 c6cmp%eax,%r14d > io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); > 24af: 48 8d bd 90 fd ff fflea-0x270(%rbp),%rdi > 24b6: 48 8d b5 e8 f9 ff fflea-0x618(%rbp),%rsi > 24bd: b9 17 00 00 00 mov$0x17,%ecx > 24c2: f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi) > > Those two "rep movsq"s are very suspicious, because > COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place, > rather than move it to some temporary variable and copy it back. Right. I've seen this behavior before when using c99 compound literals, but I was surprised to see it here. I also submitted a patch for the one driver that turned up a new warning because of this behavior: https://www.spinics.net/lists/raid/msg58766.html In case of the mmc driver, the behavior was as expected, it was just a little too large and I sent the obvious workaround for it https://patchwork.kernel.org/patch/9902063/ > I tried to reduce the size of completion struct, and the "rep movsq" did > go away, however it seemed the compiler still allocated the memory for > the temporary variables on the stack, because whenever I > increased/decreased the size of completion, the stack size of > write_journal() got increased/decreased *7* times, but there are only > 3 journal_completion structures in write_journal(). So the *4* callsites > of
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Sat, Aug 19, 2017 at 1:43 AM, Boqun Feng wrote: > Hi Arnd, > > On Mon, Aug 14, 2017 at 10:50:24AM +0200, Arnd Bergmann wrote: >> On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Park >> wrote: >> > Although wait_for_completion() and its family can cause deadlock, the >> > lock correctness validator could not be applied to them until now, >> > because things like complete() are usually called in a different context >> > from the waiting context, which violates lockdep's assumption. >> > >> > Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep >> > detector to those completion operations. Applied it. >> > >> > Signed-off-by: Byungchul Park >> >> This patch introduced a significant growth in kernel stack usage for a small >> set of functions. I see two new warnings for functions that get tipped over >> the >> 1024 or 2048 byte frame size limit in linux-next (with a few other patches >> applied): >> >> Before: >> >> drivers/md/dm-integrity.c: In function 'write_journal': >> drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is >> larger than xxx bytes [-Werror=frame-larger-than=] >> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq': >> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes >> is larger than 104 bytes [-Werror=frame-larger-than=] >> >> After: >> >> drivers/md/dm-integrity.c: In function 'write_journal': >> drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes >> is larger than 1024 bytes [-Werror=frame-larger-than=] >> drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq': >> drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072 >> bytes is larger than 1024 bytes [-Werror=frame-larger-than=] >> >> I have not checked in detail why this happens, but I'm guessing that >> there is an overall increase in stack usage with >> CONFIG_LOCKDEP_COMPLETE in functions using completions, >> and I think it would be good to try to come up with a version that doesn't >> add as much. >> > > So I have been staring at this for a while, and below is what I found: > > (BTW, Arnd, may I know your compiler version? Mine is 7.1.1) That is what I used as well, on x86, arm32 and arm64. > In write_journal(), I can see the code generated like this on x86: > > io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); > 2462: e8 00 00 00 00 callq 2467 > 2467: 48 8d 85 80 fd ff fflea-0x280(%rbp),%rax > 246e: 48 c7 c6 00 00 00 00mov$0x0,%rsi > 2475: 48 c7 c2 00 00 00 00mov$0x0,%rdx > x->done = 0; > 247c: c7 85 90 fd ff ff 00movl $0x0,-0x270(%rbp) > 2483: 00 00 00 > init_waitqueue_head(>wait); > 2486: 48 8d 78 18 lea0x18(%rax),%rdi > 248a: e8 00 00 00 00 callq 248f > if (commit_start + commit_sections <= ic->journal_sections) { > 248f: 41 8b 87 a8 00 00 00mov0xa8(%r15),%eax > io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); > 2496: 48 8d bd e8 f9 ff fflea-0x618(%rbp),%rdi > 249d: 48 8d b5 90 fd ff fflea-0x270(%rbp),%rsi > 24a4: b9 17 00 00 00 mov$0x17,%ecx > 24a9: f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi) > if (commit_start + commit_sections <= ic->journal_sections) { > 24ac: 41 39 c6cmp%eax,%r14d > io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); > 24af: 48 8d bd 90 fd ff fflea-0x270(%rbp),%rdi > 24b6: 48 8d b5 e8 f9 ff fflea-0x618(%rbp),%rsi > 24bd: b9 17 00 00 00 mov$0x17,%ecx > 24c2: f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi) > > Those two "rep movsq"s are very suspicious, because > COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place, > rather than move it to some temporary variable and copy it back. Right. I've seen this behavior before when using c99 compound literals, but I was surprised to see it here. I also submitted a patch for the one driver that turned up a new warning because of this behavior: https://www.spinics.net/lists/raid/msg58766.html In case of the mmc driver, the behavior was as expected, it was just a little too large and I sent the obvious workaround for it https://patchwork.kernel.org/patch/9902063/ > I tried to reduce the size of completion struct, and the "rep movsq" did > go away, however it seemed the compiler still allocated the memory for > the temporary variables on the stack, because whenever I > increased/decreased the size of completion, the stack size of > write_journal() got increased/decreased *7* times, but there are only > 3 journal_completion structures in write_journal(). So the *4* callsites > of COMPLETION_INITIALIZER_ONSTACK() looked very suspicous. > > So I come up with the following patch, trying to teach the compiler
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
Hi Arnd, On Mon, Aug 14, 2017 at 10:50:24AM +0200, Arnd Bergmann wrote: > On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Parkwrote: > > Although wait_for_completion() and its family can cause deadlock, the > > lock correctness validator could not be applied to them until now, > > because things like complete() are usually called in a different context > > from the waiting context, which violates lockdep's assumption. > > > > Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep > > detector to those completion operations. Applied it. > > > > Signed-off-by: Byungchul Park > > This patch introduced a significant growth in kernel stack usage for a small > set of functions. I see two new warnings for functions that get tipped over > the > 1024 or 2048 byte frame size limit in linux-next (with a few other patches > applied): > > Before: > > drivers/md/dm-integrity.c: In function 'write_journal': > drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is > larger than xxx bytes [-Werror=frame-larger-than=] > drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq': > drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes > is larger than 104 bytes [-Werror=frame-larger-than=] > > After: > > drivers/md/dm-integrity.c: In function 'write_journal': > drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes > is larger than 1024 bytes [-Werror=frame-larger-than=] > drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq': > drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072 > bytes is larger than 1024 bytes [-Werror=frame-larger-than=] > > I have not checked in detail why this happens, but I'm guessing that > there is an overall increase in stack usage with > CONFIG_LOCKDEP_COMPLETE in functions using completions, > and I think it would be good to try to come up with a version that doesn't > add as much. > So I have been staring at this for a while, and below is what I found: (BTW, Arnd, may I know your compiler version? Mine is 7.1.1) In write_journal(), I can see the code generated like this on x86: io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); 2462: e8 00 00 00 00 callq 2467 2467: 48 8d 85 80 fd ff fflea-0x280(%rbp),%rax 246e: 48 c7 c6 00 00 00 00mov$0x0,%rsi 2475: 48 c7 c2 00 00 00 00mov$0x0,%rdx x->done = 0; 247c: c7 85 90 fd ff ff 00movl $0x0,-0x270(%rbp) 2483: 00 00 00 init_waitqueue_head(>wait); 2486: 48 8d 78 18 lea0x18(%rax),%rdi 248a: e8 00 00 00 00 callq 248f if (commit_start + commit_sections <= ic->journal_sections) { 248f: 41 8b 87 a8 00 00 00mov0xa8(%r15),%eax io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); 2496: 48 8d bd e8 f9 ff fflea-0x618(%rbp),%rdi 249d: 48 8d b5 90 fd ff fflea-0x270(%rbp),%rsi 24a4: b9 17 00 00 00 mov$0x17,%ecx 24a9: f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi) if (commit_start + commit_sections <= ic->journal_sections) { 24ac: 41 39 c6cmp%eax,%r14d io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); 24af: 48 8d bd 90 fd ff fflea-0x270(%rbp),%rdi 24b6: 48 8d b5 e8 f9 ff fflea-0x618(%rbp),%rsi 24bd: b9 17 00 00 00 mov$0x17,%ecx 24c2: f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi) Those two "rep movsq"s are very suspicious, because COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place, rather than move it to some temporary variable and copy it back. I tried to reduce the size of completion struct, and the "rep movsq" did go away, however it seemed the compiler still allocated the memory for the temporary variables on the stack, because whenever I increased/decreased the size of completion, the stack size of write_journal() got increased/decreased *7* times, but there are only 3 journal_completion structures in write_journal(). So the *4* callsites of COMPLETION_INITIALIZER_ONSTACK() looked very suspicous. So I come up with the following patch, trying to teach the compiler not to do the unnecessary allocation, could you give it a try? Besides, I could also observe the stack size reduction of write_journal() even for !LOCKDEP kernel. -->8 Subject: [PATCH] completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK() In theory, COMPLETION_INITIALIZER_ONSTACK() should never affect the stack allocation of the caller. However, on some compilers, a temporary structure was allocated for the return value of COMPLETION_INITIALIZER_ONSTACK(), for example in write_journal() with
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
Hi Arnd, On Mon, Aug 14, 2017 at 10:50:24AM +0200, Arnd Bergmann wrote: > On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Park wrote: > > Although wait_for_completion() and its family can cause deadlock, the > > lock correctness validator could not be applied to them until now, > > because things like complete() are usually called in a different context > > from the waiting context, which violates lockdep's assumption. > > > > Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep > > detector to those completion operations. Applied it. > > > > Signed-off-by: Byungchul Park > > This patch introduced a significant growth in kernel stack usage for a small > set of functions. I see two new warnings for functions that get tipped over > the > 1024 or 2048 byte frame size limit in linux-next (with a few other patches > applied): > > Before: > > drivers/md/dm-integrity.c: In function 'write_journal': > drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is > larger than xxx bytes [-Werror=frame-larger-than=] > drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq': > drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes > is larger than 104 bytes [-Werror=frame-larger-than=] > > After: > > drivers/md/dm-integrity.c: In function 'write_journal': > drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes > is larger than 1024 bytes [-Werror=frame-larger-than=] > drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq': > drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072 > bytes is larger than 1024 bytes [-Werror=frame-larger-than=] > > I have not checked in detail why this happens, but I'm guessing that > there is an overall increase in stack usage with > CONFIG_LOCKDEP_COMPLETE in functions using completions, > and I think it would be good to try to come up with a version that doesn't > add as much. > So I have been staring at this for a while, and below is what I found: (BTW, Arnd, may I know your compiler version? Mine is 7.1.1) In write_journal(), I can see the code generated like this on x86: io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); 2462: e8 00 00 00 00 callq 2467 2467: 48 8d 85 80 fd ff fflea-0x280(%rbp),%rax 246e: 48 c7 c6 00 00 00 00mov$0x0,%rsi 2475: 48 c7 c2 00 00 00 00mov$0x0,%rdx x->done = 0; 247c: c7 85 90 fd ff ff 00movl $0x0,-0x270(%rbp) 2483: 00 00 00 init_waitqueue_head(>wait); 2486: 48 8d 78 18 lea0x18(%rax),%rdi 248a: e8 00 00 00 00 callq 248f if (commit_start + commit_sections <= ic->journal_sections) { 248f: 41 8b 87 a8 00 00 00mov0xa8(%r15),%eax io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); 2496: 48 8d bd e8 f9 ff fflea-0x618(%rbp),%rdi 249d: 48 8d b5 90 fd ff fflea-0x270(%rbp),%rsi 24a4: b9 17 00 00 00 mov$0x17,%ecx 24a9: f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi) if (commit_start + commit_sections <= ic->journal_sections) { 24ac: 41 39 c6cmp%eax,%r14d io_comp.comp = COMPLETION_INITIALIZER_ONSTACK(io_comp.comp); 24af: 48 8d bd 90 fd ff fflea-0x270(%rbp),%rdi 24b6: 48 8d b5 e8 f9 ff fflea-0x618(%rbp),%rsi 24bd: b9 17 00 00 00 mov$0x17,%ecx 24c2: f3 48 a5rep movsq %ds:(%rsi),%es:(%rdi) Those two "rep movsq"s are very suspicious, because COMPLETION_INITIALIZER_ONSTACK() should initialize the data in-place, rather than move it to some temporary variable and copy it back. I tried to reduce the size of completion struct, and the "rep movsq" did go away, however it seemed the compiler still allocated the memory for the temporary variables on the stack, because whenever I increased/decreased the size of completion, the stack size of write_journal() got increased/decreased *7* times, but there are only 3 journal_completion structures in write_journal(). So the *4* callsites of COMPLETION_INITIALIZER_ONSTACK() looked very suspicous. So I come up with the following patch, trying to teach the compiler not to do the unnecessary allocation, could you give it a try? Besides, I could also observe the stack size reduction of write_journal() even for !LOCKDEP kernel. -->8 Subject: [PATCH] completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK() In theory, COMPLETION_INITIALIZER_ONSTACK() should never affect the stack allocation of the caller. However, on some compilers, a temporary structure was allocated for the return value of COMPLETION_INITIALIZER_ONSTACK(), for example in write_journal() with LOCKDEP_COMPLETIONS=y(gcc is 7.1.1): io_comp.comp =
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Parkwrote: > Although wait_for_completion() and its family can cause deadlock, the > lock correctness validator could not be applied to them until now, > because things like complete() are usually called in a different context > from the waiting context, which violates lockdep's assumption. > > Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep > detector to those completion operations. Applied it. > > Signed-off-by: Byungchul Park This patch introduced a significant growth in kernel stack usage for a small set of functions. I see two new warnings for functions that get tipped over the 1024 or 2048 byte frame size limit in linux-next (with a few other patches applied): Before: drivers/md/dm-integrity.c: In function 'write_journal': drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is larger than xxx bytes [-Werror=frame-larger-than=] drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq': drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes is larger than 104 bytes [-Werror=frame-larger-than=] After: drivers/md/dm-integrity.c: In function 'write_journal': drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq': drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] I have not checked in detail why this happens, but I'm guessing that there is an overall increase in stack usage with CONFIG_LOCKDEP_COMPLETE in functions using completions, and I think it would be good to try to come up with a version that doesn't add as much. Arnd
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Mon, Aug 7, 2017 at 9:12 AM, Byungchul Park wrote: > Although wait_for_completion() and its family can cause deadlock, the > lock correctness validator could not be applied to them until now, > because things like complete() are usually called in a different context > from the waiting context, which violates lockdep's assumption. > > Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep > detector to those completion operations. Applied it. > > Signed-off-by: Byungchul Park This patch introduced a significant growth in kernel stack usage for a small set of functions. I see two new warnings for functions that get tipped over the 1024 or 2048 byte frame size limit in linux-next (with a few other patches applied): Before: drivers/md/dm-integrity.c: In function 'write_journal': drivers/md/dm-integrity.c:827:1: error: the frame size of 504 bytes is larger than xxx bytes [-Werror=frame-larger-than=] drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq': drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 680 bytes is larger than 104 bytes [-Werror=frame-larger-than=] After: drivers/md/dm-integrity.c: In function 'write_journal': drivers/md/dm-integrity.c:827:1: error: the frame size of 1280 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] drivers/mmc/core/mmc_test.c: In function 'mmc_test_area_io_seq': drivers/mmc/core/mmc_test.c:1491:1: error: the frame size of 1072 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] I have not checked in detail why this happens, but I'm guessing that there is an overall increase in stack usage with CONFIG_LOCKDEP_COMPLETE in functions using completions, and I think it would be good to try to come up with a version that doesn't add as much. Arnd
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Wed, Aug 09, 2017 at 12:24:39PM +0200, Peter Zijlstra wrote: > On Wed, Aug 09, 2017 at 11:51:07AM +0200, Peter Zijlstra wrote: > > On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote: > > > +static inline void wait_for_completion(struct completion *x) > > > +{ > > > + complete_acquire(x); > > > + __wait_for_completion(x); > > > + complete_release(x); > > > +} > > > + > > > +static inline void wait_for_completion_io(struct completion *x) > > > +{ > > > + complete_acquire(x); > > > + __wait_for_completion_io(x); > > > + complete_release(x); > > > +} > > > + > > > +static inline int wait_for_completion_interruptible(struct completion *x) > > > +{ > > > + int ret; > > > + complete_acquire(x); > > > + ret = __wait_for_completion_interruptible(x); > > > + complete_release(x); > > > + return ret; > > > +} > > > + > > > +static inline int wait_for_completion_killable(struct completion *x) > > > +{ > > > + int ret; > > > + complete_acquire(x); > > > + ret = __wait_for_completion_killable(x); > > > + complete_release(x); > > > + return ret; > > > +} > > > > I don't understand, why not change __wait_for_common() ? > > That is what is wrong with the below? > > Yes, it adds acquire/release to the timeout variants too, but I don't Yes, I didn't want to involve them in lockdep play which reports _deadlock_ warning since it's not a dependency causing a deadlock. > see why we should exclude those, and even if we'd want to do that, it > would be trivial: > > bool timo = (timeout == MAX_SCHEDULE_TIMEOUT); > > if (!timo) > complete_acquire(x); > > /* ... */ > > if (!timo) > complete_release(x); Yes, frankly I wanted to use this.. but skip it. > But like said, I think we very much want to annotate waits with timeouts > too. Hitting the max timo doesn't necessarily mean we'll make fwd > progress, we could be stuck in a loop doing something else again before > returning to wait. In that case, it should be detected by other dependencies which makes problems, not the dependency by wait_for_complete(). > Also, even if we'd make fwd progress, hitting that max timo is still not > desirable. It's not desirable but it's not a dependency causing a deadlock, so I did not want to _deadlock_ warning in that cases.. I didn't want to abuse lockdep reports.. However, it's OK if you think it's worth warning even in that cases. Thank you very much, Byungchul
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Wed, Aug 09, 2017 at 12:24:39PM +0200, Peter Zijlstra wrote: > On Wed, Aug 09, 2017 at 11:51:07AM +0200, Peter Zijlstra wrote: > > On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote: > > > +static inline void wait_for_completion(struct completion *x) > > > +{ > > > + complete_acquire(x); > > > + __wait_for_completion(x); > > > + complete_release(x); > > > +} > > > + > > > +static inline void wait_for_completion_io(struct completion *x) > > > +{ > > > + complete_acquire(x); > > > + __wait_for_completion_io(x); > > > + complete_release(x); > > > +} > > > + > > > +static inline int wait_for_completion_interruptible(struct completion *x) > > > +{ > > > + int ret; > > > + complete_acquire(x); > > > + ret = __wait_for_completion_interruptible(x); > > > + complete_release(x); > > > + return ret; > > > +} > > > + > > > +static inline int wait_for_completion_killable(struct completion *x) > > > +{ > > > + int ret; > > > + complete_acquire(x); > > > + ret = __wait_for_completion_killable(x); > > > + complete_release(x); > > > + return ret; > > > +} > > > > I don't understand, why not change __wait_for_common() ? > > That is what is wrong with the below? > > Yes, it adds acquire/release to the timeout variants too, but I don't Yes, I didn't want to involve them in lockdep play which reports _deadlock_ warning since it's not a dependency causing a deadlock. > see why we should exclude those, and even if we'd want to do that, it > would be trivial: > > bool timo = (timeout == MAX_SCHEDULE_TIMEOUT); > > if (!timo) > complete_acquire(x); > > /* ... */ > > if (!timo) > complete_release(x); Yes, frankly I wanted to use this.. but skip it. > But like said, I think we very much want to annotate waits with timeouts > too. Hitting the max timo doesn't necessarily mean we'll make fwd > progress, we could be stuck in a loop doing something else again before > returning to wait. In that case, it should be detected by other dependencies which makes problems, not the dependency by wait_for_complete(). > Also, even if we'd make fwd progress, hitting that max timo is still not > desirable. It's not desirable but it's not a dependency causing a deadlock, so I did not want to _deadlock_ warning in that cases.. I didn't want to abuse lockdep reports.. However, it's OK if you think it's worth warning even in that cases. Thank you very much, Byungchul
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Wed, Aug 09, 2017 at 11:51:07AM +0200, Peter Zijlstra wrote: > On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote: > > +static inline void wait_for_completion(struct completion *x) > > +{ > > + complete_acquire(x); > > + __wait_for_completion(x); > > + complete_release(x); > > +} > > + > > +static inline void wait_for_completion_io(struct completion *x) > > +{ > > + complete_acquire(x); > > + __wait_for_completion_io(x); > > + complete_release(x); > > +} > > + > > +static inline int wait_for_completion_interruptible(struct completion *x) > > +{ > > + int ret; > > + complete_acquire(x); > > + ret = __wait_for_completion_interruptible(x); > > + complete_release(x); > > + return ret; > > +} > > + > > +static inline int wait_for_completion_killable(struct completion *x) > > +{ > > + int ret; > > + complete_acquire(x); > > + ret = __wait_for_completion_killable(x); > > + complete_release(x); > > + return ret; > > +} > > I don't understand, why not change __wait_for_common() ? That is what is wrong with the below? Yes, it adds acquire/release to the timeout variants too, but I don't see why we should exclude those, and even if we'd want to do that, it would be trivial: bool timo = (timeout == MAX_SCHEDULE_TIMEOUT); if (!timo) complete_acquire(x); /* ... */ if (!timo) complete_release(x); But like said, I think we very much want to annotate waits with timeouts too. Hitting the max timo doesn't necessarily mean we'll make fwd progress, we could be stuck in a loop doing something else again before returning to wait. Also, even if we'd make fwd progress, hitting that max timo is still not desirable. --- Subject: lockdep: Apply crossrelease to completions From: Byungchul ParkDate: Mon, 7 Aug 2017 16:12:56 +0900 Although wait_for_completion() and its family can cause deadlock, the lock correctness validator could not be applied to them until now, because things like complete() are usually called in a different context from the waiting context, which violates lockdep's assumption. Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep detector to those completion operations. Applied it. Cc: npig...@gmail.com Cc: mi...@kernel.org Cc: a...@linux-foundation.org Cc: t...@linutronix.de Cc: boqun.f...@gmail.com Cc: wi...@infradead.org Cc: wal...@google.com Cc: kernel-t...@lge.com Cc: kir...@shutemov.name Signed-off-by: Byungchul Park Signed-off-by: Peter Zijlstra (Intel) --- include/linux/completion.h | 45 - kernel/sched/completion.c | 11 +++ lib/Kconfig.debug |8 3 files changed, 63 insertions(+), 1 deletion(-) --- a/include/linux/completion.h +++ b/include/linux/completion.h @@ -9,6 +9,9 @@ */ #include +#ifdef CONFIG_LOCKDEP_COMPLETE +#include +#endif /* * struct completion - structure used to maintain state for a "completion" @@ -25,10 +28,50 @@ struct completion { unsigned int done; wait_queue_head_t wait; +#ifdef CONFIG_LOCKDEP_COMPLETE + struct lockdep_map_cross map; +#endif }; +#ifdef CONFIG_LOCKDEP_COMPLETE +static inline void complete_acquire(struct completion *x) +{ + lock_acquire_exclusive((struct lockdep_map *)>map, 0, 0, NULL, _RET_IP_); +} + +static inline void complete_release(struct completion *x) +{ + lock_release((struct lockdep_map *)>map, 0, _RET_IP_); +} + +static inline void complete_release_commit(struct completion *x) +{ + lock_commit_crosslock((struct lockdep_map *)>map); +} + +#define init_completion(x) \ +do { \ + static struct lock_class_key __key; \ + lockdep_init_map_crosslock((struct lockdep_map *)&(x)->map, \ + "(complete)" #x,\ + &__key, 0); \ + __init_completion(x); \ +} while (0) +#else +#define init_completion(x) __init_completion(x) +static inline void complete_acquire(struct completion *x) {} +static inline void complete_release(struct completion *x) {} +static inline void complete_release_commit(struct completion *x) {} +#endif + +#ifdef CONFIG_LOCKDEP_COMPLETE +#define COMPLETION_INITIALIZER(work) \ + { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait), \ + STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) } +#else #define COMPLETION_INITIALIZER(work) \ { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) } +#endif #define COMPLETION_INITIALIZER_ONSTACK(work) \ ({ init_completion(); work; }) @@ -70,7 +113,7 @@ struct completion { * This inline function will initialize a dynamically created
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Wed, Aug 09, 2017 at 11:51:07AM +0200, Peter Zijlstra wrote: > On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote: > > +static inline void wait_for_completion(struct completion *x) > > +{ > > + complete_acquire(x); > > + __wait_for_completion(x); > > + complete_release(x); > > +} > > + > > +static inline void wait_for_completion_io(struct completion *x) > > +{ > > + complete_acquire(x); > > + __wait_for_completion_io(x); > > + complete_release(x); > > +} > > + > > +static inline int wait_for_completion_interruptible(struct completion *x) > > +{ > > + int ret; > > + complete_acquire(x); > > + ret = __wait_for_completion_interruptible(x); > > + complete_release(x); > > + return ret; > > +} > > + > > +static inline int wait_for_completion_killable(struct completion *x) > > +{ > > + int ret; > > + complete_acquire(x); > > + ret = __wait_for_completion_killable(x); > > + complete_release(x); > > + return ret; > > +} > > I don't understand, why not change __wait_for_common() ? That is what is wrong with the below? Yes, it adds acquire/release to the timeout variants too, but I don't see why we should exclude those, and even if we'd want to do that, it would be trivial: bool timo = (timeout == MAX_SCHEDULE_TIMEOUT); if (!timo) complete_acquire(x); /* ... */ if (!timo) complete_release(x); But like said, I think we very much want to annotate waits with timeouts too. Hitting the max timo doesn't necessarily mean we'll make fwd progress, we could be stuck in a loop doing something else again before returning to wait. Also, even if we'd make fwd progress, hitting that max timo is still not desirable. --- Subject: lockdep: Apply crossrelease to completions From: Byungchul Park Date: Mon, 7 Aug 2017 16:12:56 +0900 Although wait_for_completion() and its family can cause deadlock, the lock correctness validator could not be applied to them until now, because things like complete() are usually called in a different context from the waiting context, which violates lockdep's assumption. Thanks to CONFIG_LOCKDEP_CROSSRELEASE, we can now apply the lockdep detector to those completion operations. Applied it. Cc: npig...@gmail.com Cc: mi...@kernel.org Cc: a...@linux-foundation.org Cc: t...@linutronix.de Cc: boqun.f...@gmail.com Cc: wi...@infradead.org Cc: wal...@google.com Cc: kernel-t...@lge.com Cc: kir...@shutemov.name Signed-off-by: Byungchul Park Signed-off-by: Peter Zijlstra (Intel) --- include/linux/completion.h | 45 - kernel/sched/completion.c | 11 +++ lib/Kconfig.debug |8 3 files changed, 63 insertions(+), 1 deletion(-) --- a/include/linux/completion.h +++ b/include/linux/completion.h @@ -9,6 +9,9 @@ */ #include +#ifdef CONFIG_LOCKDEP_COMPLETE +#include +#endif /* * struct completion - structure used to maintain state for a "completion" @@ -25,10 +28,50 @@ struct completion { unsigned int done; wait_queue_head_t wait; +#ifdef CONFIG_LOCKDEP_COMPLETE + struct lockdep_map_cross map; +#endif }; +#ifdef CONFIG_LOCKDEP_COMPLETE +static inline void complete_acquire(struct completion *x) +{ + lock_acquire_exclusive((struct lockdep_map *)>map, 0, 0, NULL, _RET_IP_); +} + +static inline void complete_release(struct completion *x) +{ + lock_release((struct lockdep_map *)>map, 0, _RET_IP_); +} + +static inline void complete_release_commit(struct completion *x) +{ + lock_commit_crosslock((struct lockdep_map *)>map); +} + +#define init_completion(x) \ +do { \ + static struct lock_class_key __key; \ + lockdep_init_map_crosslock((struct lockdep_map *)&(x)->map, \ + "(complete)" #x,\ + &__key, 0); \ + __init_completion(x); \ +} while (0) +#else +#define init_completion(x) __init_completion(x) +static inline void complete_acquire(struct completion *x) {} +static inline void complete_release(struct completion *x) {} +static inline void complete_release_commit(struct completion *x) {} +#endif + +#ifdef CONFIG_LOCKDEP_COMPLETE +#define COMPLETION_INITIALIZER(work) \ + { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait), \ + STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) } +#else #define COMPLETION_INITIALIZER(work) \ { 0, __WAIT_QUEUE_HEAD_INITIALIZER((work).wait) } +#endif #define COMPLETION_INITIALIZER_ONSTACK(work) \ ({ init_completion(); work; }) @@ -70,7 +113,7 @@ struct completion { * This inline function will initialize a dynamically created completion * structure. */ -static inline void init_completion(struct
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote: > +static inline void wait_for_completion(struct completion *x) > +{ > + complete_acquire(x); > + __wait_for_completion(x); > + complete_release(x); > +} > + > +static inline void wait_for_completion_io(struct completion *x) > +{ > + complete_acquire(x); > + __wait_for_completion_io(x); > + complete_release(x); > +} > + > +static inline int wait_for_completion_interruptible(struct completion *x) > +{ > + int ret; > + complete_acquire(x); > + ret = __wait_for_completion_interruptible(x); > + complete_release(x); > + return ret; > +} > + > +static inline int wait_for_completion_killable(struct completion *x) > +{ > + int ret; > + complete_acquire(x); > + ret = __wait_for_completion_killable(x); > + complete_release(x); > + return ret; > +} I don't understand, why not change __wait_for_common() ?
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
On Mon, Aug 07, 2017 at 04:12:56PM +0900, Byungchul Park wrote: > +static inline void wait_for_completion(struct completion *x) > +{ > + complete_acquire(x); > + __wait_for_completion(x); > + complete_release(x); > +} > + > +static inline void wait_for_completion_io(struct completion *x) > +{ > + complete_acquire(x); > + __wait_for_completion_io(x); > + complete_release(x); > +} > + > +static inline int wait_for_completion_interruptible(struct completion *x) > +{ > + int ret; > + complete_acquire(x); > + ret = __wait_for_completion_interruptible(x); > + complete_release(x); > + return ret; > +} > + > +static inline int wait_for_completion_killable(struct completion *x) > +{ > + int ret; > + complete_acquire(x); > + ret = __wait_for_completion_killable(x); > + complete_release(x); > + return ret; > +} I don't understand, why not change __wait_for_common() ?
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
Hi Byungchul, [auto build test ERROR on linus/master] [also build test ERROR on v4.13-rc4 next-20170804] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617 config: cris-allmodconfig (attached as .config) compiler: cris-linux-gcc (GCC) 6.2.0 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=cris All error/warnings (new ones prefixed by >>): In file included from include/linux/pm.h:29:0, from include/linux/device.h:25, from include/linux/pci.h:30, from drivers/usb/host/ehci-hcd.c:24: include/linux/completion.h:32:27: error: field 'map' has incomplete type struct lockdep_map_cross map; ^~~ In file included from include/linux/spinlock_types.h:18:0, from include/linux/spinlock.h:81, from include/linux/seqlock.h:35, from include/linux/time.h:5, from include/linux/stat.h:18, from include/linux/module.h:10, from drivers/usb/host/ehci-hcd.c:23: drivers/usb/host/ehci-hub.c: In function 'ehset_single_step_set_feature': >> include/linux/lockdep.h:578:4: error: field name not in record or union >> initializer { .map.name = (_name), .map.key = (void *)(_key), \ ^ >> include/linux/completion.h:70:2: note: in expansion of macro >> 'STATIC_CROSS_LOCKDEP_MAP_INIT' STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) } ^ include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER' struct completion work = COMPLETION_INITIALIZER(work) ^~ include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION' # define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work) ^~ drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK' DECLARE_COMPLETION_ONSTACK(done); ^~ include/linux/lockdep.h:578:4: note: (near initialization for 'done.map') { .map.name = (_name), .map.key = (void *)(_key), \ ^ >> include/linux/completion.h:70:2: note: in expansion of macro >> 'STATIC_CROSS_LOCKDEP_MAP_INIT' STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) } ^ include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER' struct completion work = COMPLETION_INITIALIZER(work) ^~ include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION' # define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work) ^~ drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK' DECLARE_COMPLETION_ONSTACK(done); ^~ include/linux/lockdep.h:578:25: error: field name not in record or union initializer { .map.name = (_name), .map.key = (void *)(_key), \ ^ >> include/linux/completion.h:70:2: note: in expansion of macro >> 'STATIC_CROSS_LOCKDEP_MAP_INIT' STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) } ^ include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER' struct completion work = COMPLETION_INITIALIZER(work) ^~ include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION' # define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work) ^~ drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK' DECLARE_COMPLETION_ONSTACK(done); ^~ include/linux/lockdep.h:578:25: note: (near initialization for 'done.map') { .map.name = (_name), .map.key = (void *)(_key), \ ^ >> include/linux/completion.h:70:2: note: in expansion of macro >> 'STATIC_CROSS_LOCKDEP_MAP_INIT' STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) } ^ include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER' struct completion work = COMPLETION_INITIALIZER(work) ^~ include/linux/completion.h:106:43: note: in expansion of
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
Hi Byungchul, [auto build test ERROR on linus/master] [also build test ERROR on v4.13-rc4 next-20170804] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617 config: cris-allmodconfig (attached as .config) compiler: cris-linux-gcc (GCC) 6.2.0 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=cris All error/warnings (new ones prefixed by >>): In file included from include/linux/pm.h:29:0, from include/linux/device.h:25, from include/linux/pci.h:30, from drivers/usb/host/ehci-hcd.c:24: include/linux/completion.h:32:27: error: field 'map' has incomplete type struct lockdep_map_cross map; ^~~ In file included from include/linux/spinlock_types.h:18:0, from include/linux/spinlock.h:81, from include/linux/seqlock.h:35, from include/linux/time.h:5, from include/linux/stat.h:18, from include/linux/module.h:10, from drivers/usb/host/ehci-hcd.c:23: drivers/usb/host/ehci-hub.c: In function 'ehset_single_step_set_feature': >> include/linux/lockdep.h:578:4: error: field name not in record or union >> initializer { .map.name = (_name), .map.key = (void *)(_key), \ ^ >> include/linux/completion.h:70:2: note: in expansion of macro >> 'STATIC_CROSS_LOCKDEP_MAP_INIT' STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) } ^ include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER' struct completion work = COMPLETION_INITIALIZER(work) ^~ include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION' # define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work) ^~ drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK' DECLARE_COMPLETION_ONSTACK(done); ^~ include/linux/lockdep.h:578:4: note: (near initialization for 'done.map') { .map.name = (_name), .map.key = (void *)(_key), \ ^ >> include/linux/completion.h:70:2: note: in expansion of macro >> 'STATIC_CROSS_LOCKDEP_MAP_INIT' STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) } ^ include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER' struct completion work = COMPLETION_INITIALIZER(work) ^~ include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION' # define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work) ^~ drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK' DECLARE_COMPLETION_ONSTACK(done); ^~ include/linux/lockdep.h:578:25: error: field name not in record or union initializer { .map.name = (_name), .map.key = (void *)(_key), \ ^ >> include/linux/completion.h:70:2: note: in expansion of macro >> 'STATIC_CROSS_LOCKDEP_MAP_INIT' STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) } ^ include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER' struct completion work = COMPLETION_INITIALIZER(work) ^~ include/linux/completion.h:106:43: note: in expansion of macro 'DECLARE_COMPLETION' # define DECLARE_COMPLETION_ONSTACK(work) DECLARE_COMPLETION(work) ^~ drivers/usb/host/ehci-hub.c:811:2: note: in expansion of macro 'DECLARE_COMPLETION_ONSTACK' DECLARE_COMPLETION_ONSTACK(done); ^~ include/linux/lockdep.h:578:25: note: (near initialization for 'done.map') { .map.name = (_name), .map.key = (void *)(_key), \ ^ >> include/linux/completion.h:70:2: note: in expansion of macro >> 'STATIC_CROSS_LOCKDEP_MAP_INIT' STATIC_CROSS_LOCKDEP_MAP_INIT("(complete)" #work, &(work)) } ^ include/linux/completion.h:88:27: note: in expansion of macro 'COMPLETION_INITIALIZER' struct completion work = COMPLETION_INITIALIZER(work) ^~ include/linux/completion.h:106:43: note: in expansion of
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
Hi Byungchul, [auto build test ERROR on linus/master] [also build test ERROR on v4.13-rc4 next-20170804] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617 config: alpha-allmodconfig (attached as .config) compiler: alpha-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=alpha All errors (new ones prefixed by >>): warning: (LOCKDEP_COMPLETE) selects LOCKDEP_CROSSRELEASE which has unmet direct dependencies (PROVE_LOCKING) warning: (LOCKDEP_COMPLETE) selects LOCKDEP_CROSSRELEASE which has unmet direct dependencies (PROVE_LOCKING) In file included from include/linux/srcutree.h:28:0, from include/linux/srcu.h:62, from include/linux/notifier.h:15, from include/linux/memory_hotplug.h:6, from include/linux/mmzone.h:771, from include/linux/gfp.h:5, from include/linux/mm.h:9, from include/linux/pid_namespace.h:6, from include/linux/ptrace.h:9, from arch/alpha/kernel/asm-offsets.c:10: >> include/linux/completion.h:32:27: error: field 'map' has incomplete type struct lockdep_map_cross map; ^~~ make[2]: *** [arch/alpha/kernel/asm-offsets.s] Error 1 make[2]: Target '__build' not remade because of errors. make[1]: *** [prepare0] Error 2 make[1]: Target 'prepare' not remade because of errors. make: *** [sub-make] Error 2 vim +/map +32 include/linux/completion.h 15 16 /* 17 * struct completion - structure used to maintain state for a "completion" 18 * 19 * This is the opaque structure used to maintain the state for a "completion". 20 * Completions currently use a FIFO to queue threads that have to wait for 21 * the "completion" event. 22 * 23 * See also: complete(), wait_for_completion() (and friends _timeout, 24 * _interruptible, _interruptible_timeout, and _killable), init_completion(), 25 * reinit_completion(), and macros DECLARE_COMPLETION(), 26 * DECLARE_COMPLETION_ONSTACK(). 27 */ 28 struct completion { 29 unsigned int done; 30 wait_queue_head_t wait; 31 #ifdef CONFIG_LOCKDEP_COMPLETE > 32 struct lockdep_map_cross map; 33 #endif 34 }; 35 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip
Re: [PATCH v8 09/14] lockdep: Apply crossrelease to completions
Hi Byungchul, [auto build test ERROR on linus/master] [also build test ERROR on v4.13-rc4 next-20170804] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Byungchul-Park/lockdep-Implement-crossrelease-feature/20170807-172617 config: alpha-allmodconfig (attached as .config) compiler: alpha-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705 reproduce: wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=alpha All errors (new ones prefixed by >>): warning: (LOCKDEP_COMPLETE) selects LOCKDEP_CROSSRELEASE which has unmet direct dependencies (PROVE_LOCKING) warning: (LOCKDEP_COMPLETE) selects LOCKDEP_CROSSRELEASE which has unmet direct dependencies (PROVE_LOCKING) In file included from include/linux/srcutree.h:28:0, from include/linux/srcu.h:62, from include/linux/notifier.h:15, from include/linux/memory_hotplug.h:6, from include/linux/mmzone.h:771, from include/linux/gfp.h:5, from include/linux/mm.h:9, from include/linux/pid_namespace.h:6, from include/linux/ptrace.h:9, from arch/alpha/kernel/asm-offsets.c:10: >> include/linux/completion.h:32:27: error: field 'map' has incomplete type struct lockdep_map_cross map; ^~~ make[2]: *** [arch/alpha/kernel/asm-offsets.s] Error 1 make[2]: Target '__build' not remade because of errors. make[1]: *** [prepare0] Error 2 make[1]: Target 'prepare' not remade because of errors. make: *** [sub-make] Error 2 vim +/map +32 include/linux/completion.h 15 16 /* 17 * struct completion - structure used to maintain state for a "completion" 18 * 19 * This is the opaque structure used to maintain the state for a "completion". 20 * Completions currently use a FIFO to queue threads that have to wait for 21 * the "completion" event. 22 * 23 * See also: complete(), wait_for_completion() (and friends _timeout, 24 * _interruptible, _interruptible_timeout, and _killable), init_completion(), 25 * reinit_completion(), and macros DECLARE_COMPLETION(), 26 * DECLARE_COMPLETION_ONSTACK(). 27 */ 28 struct completion { 29 unsigned int done; 30 wait_queue_head_t wait; 31 #ifdef CONFIG_LOCKDEP_COMPLETE > 32 struct lockdep_map_cross map; 33 #endif 34 }; 35 --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: application/gzip