Re: [PATCH] x86: support i386 with Clang
On Mon, May 11, 2020 at 1:03 PM David Woodhouse wrote: > > On Mon, 2020-05-11 at 13:01 -0700, Linus Torvalds wrote: > > On Mon, May 11, 2020 at 12:52 PM Nick Desaulniers > > wrote: > > > > > > Interesting approach. Researching __builtin_choose_expr, it looks > > > like it was cited as prior art for C11's _Generic keyword. > > > > Well, the thing that made me think that __builtin_choose_expr() would > > work is that unlike the switch statement, you absolutely _have_ to do > > the choice in the front end. You can't leave it as some kind of > > optimization for later phases, because the choice od expression ends > > up also determining the type of the result, so it isn't just a local > > choice - it affects everything around that expression. > > > > But clang still doesn't like that "qi" constraint with a (non-chosen) > > expression that has a "u64" type. > > > > I guess we can take the stupid extra cast, but I think it would at > > least need a comment (maybe through a helper function) about why "qi" > > needs it, but "ri" does not, and why the cast to "unsigned long" is > > needed, even though "clearly" the type is already just 8 bits. > > > > Otherwise somebody will just remove that "obviously pointless" cast, > > and gcc will eat the result happily, and clang will fail. > > I'm also mildly concerned that LLVM will start to whine about the 'ri' > case too. It's odd that it doesn't, even when GCC does. Ah, sorry, it took me a while to understand what case you meant by this. I recall you pointing this out previously in https://bugs.llvm.org/show_bug.cgi?id=33587#c16, but at the time I simply wasn't well versed enough in inline asm to understand. The case you're referring to is 64b operands, "r" constraint, -m32 target. Yes, if I fix that: https://reviews.llvm.org/D79804, then this whole patch needs to be reworked. Back to the drawing board. -- Thanks, ~Nick Desaulniers
Re: [PATCH] x86: support i386 with Clang
On Mon, May 11, 2020 at 3:34 PM Brian Gerst wrote: > > On Mon, May 11, 2020 at 2:46 PM Nick Desaulniers > wrote: > > > > On Mon, May 11, 2020 at 11:09 AM Brian Gerst wrote: > > > This looks like the same issue that we just discussed for bitops.h. > > > Add the "b" operand size modifier to force it to use the 8-bit > > > register names (and probably also needs the "w" modifier in the 16-bit > > > case). > > > > While it does feel familiar, it is slightly different. > > https://godbolt.org/z/Rme4Zg > > That case was both compilers validating the inline asm, yet generating > > assembly that the assembler would choke on. This case is validation > > in the front end failing. > > > long long ret; > > switch (sizeof(ret)) { > > case 1: > > asm ("movb $5, %0" : "=q" (ret)); > > break; > > case 8:; > > } > > So if the issue here is that the output variable type is long long, > what code is using a 64-bit percpu variable on a 32-bit kernel? Can > you give a specific file that fails to build with Clang? If Clang is > choking on it it may be silently miscompiling on GCC. On further investigation, 64-bit percpu operations fall back to the generic code on x86-32, so there is no problem with miscompiling here. On a side note from looking at the preprocessed output of the percpu macros: they generate a ton of extra dead code because the core macros also have a switch on data size. I will take a stab at cleaning that up. -- Brian Gerst
Re: [PATCH] x86: support i386 with Clang
On Mon, May 11, 2020 at 12:34 PM Brian Gerst wrote: > > On Mon, May 11, 2020 at 2:46 PM Nick Desaulniers > wrote: > > > > On Mon, May 11, 2020 at 11:09 AM Brian Gerst wrote: > > > This looks like the same issue that we just discussed for bitops.h. > > > Add the "b" operand size modifier to force it to use the 8-bit > > > register names (and probably also needs the "w" modifier in the 16-bit > > > case). > > > > While it does feel familiar, it is slightly different. > > https://godbolt.org/z/Rme4Zg > > That case was both compilers validating the inline asm, yet generating > > assembly that the assembler would choke on. This case is validation > > in the front end failing. > > > long long ret; > > switch (sizeof(ret)) { > > case 1: > > asm ("movb $5, %0" : "=q" (ret)); > > break; > > case 8:; > > } > > So if the issue here is that the output variable type is long long, > what code is using a 64-bit percpu variable on a 32-bit kernel? Can > you give a specific file that fails to build with Clang? If Clang is > choking on it it may be silently miscompiling on GCC. I'm not sure that's the case. Applying this patch, undoing the hunk in percpu_from_op() we get tons of errors. Looking at one: kernel/events/core.c:8679:8: error: invalid output size for constraint '=q' ./include/linux/percpu-defs.h:446:2: note: expanded from macro '__this_cpu_read' raw_cpu_read(pcp); \ ^ ... There's nothing wrong with this line, it's reading a percpu u64 into a local u64. The error comes from validating the inline asm in the dead branch. -- Thanks, ~Nick Desaulniers
Re: [PATCH] x86: support i386 with Clang
On Mon, 2020-05-11 at 13:01 -0700, Linus Torvalds wrote: > On Mon, May 11, 2020 at 12:52 PM Nick Desaulniers > wrote: > > > > Interesting approach. Researching __builtin_choose_expr, it looks > > like it was cited as prior art for C11's _Generic keyword. > > Well, the thing that made me think that __builtin_choose_expr() would > work is that unlike the switch statement, you absolutely _have_ to do > the choice in the front end. You can't leave it as some kind of > optimization for later phases, because the choice od expression ends > up also determining the type of the result, so it isn't just a local > choice - it affects everything around that expression. > > But clang still doesn't like that "qi" constraint with a (non-chosen) > expression that has a "u64" type. > > I guess we can take the stupid extra cast, but I think it would at > least need a comment (maybe through a helper function) about why "qi" > needs it, but "ri" does not, and why the cast to "unsigned long" is > needed, even though "clearly" the type is already just 8 bits. > > Otherwise somebody will just remove that "obviously pointless" cast, > and gcc will eat the result happily, and clang will fail. I'm also mildly concerned that LLVM will start to whine about the 'ri' case too. It's odd that it doesn't, even when GCC does. smime.p7s Description: S/MIME cryptographic signature
Re: [PATCH] x86: support i386 with Clang
On Mon, May 11, 2020 at 12:52 PM Nick Desaulniers wrote: > > Interesting approach. Researching __builtin_choose_expr, it looks > like it was cited as prior art for C11's _Generic keyword. Well, the thing that made me think that __builtin_choose_expr() would work is that unlike the switch statement, you absolutely _have_ to do the choice in the front end. You can't leave it as some kind of optimization for later phases, because the choice od expression ends up also determining the type of the result, so it isn't just a local choice - it affects everything around that expression. But clang still doesn't like that "qi" constraint with a (non-chosen) expression that has a "u64" type. I guess we can take the stupid extra cast, but I think it would at least need a comment (maybe through a helper function) about why "qi" needs it, but "ri" does not, and why the cast to "unsigned long" is needed, even though "clearly" the type is already just 8 bits. Otherwise somebody will just remove that "obviously pointless" cast, and gcc will eat the result happily, and clang will fail. And nobody will notice for a while anyway, because this issue only happens on 32-bit targets, and developers don't use those any more. Linus
Re: [PATCH] x86: support i386 with Clang
On Mon, May 11, 2020 at 11:24 AM Linus Torvalds wrote: > > On Mon, May 11, 2020 at 11:12 AM Linus Torvalds > wrote: > > > > Would using "__builtin_choose_expr()" be able to avoid this whole issue? > > We actually have a fair amount of "pick expression based on size", so > with a few helper macros we could make the code look better than the > case statements too. > > Something (ENTIRELY UNTESTED!) like the attached patch, perhaps? > > NOTE! I only converted one single use to that "pick_size_xyz()" model. > If this actually works for clang too, we could do the others. > > I guess I should just test it, since I have that clang tree. Interesting approach. Researching __builtin_choose_expr, it looks like it was cited as prior art for C11's _Generic keyword. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1404.htm I'm kind of tempted now to play with _Generic and see if that makes a difference, though I'm not hopeful. Without checking, I don't know if that will produce warnings with `-std=gnu89`. ... I'm playing around with _Generic now to see if that's a possible solution, but it looks like it can't be used for selecting inline asm constraint strings. But maybe there's another way to use _Generic here. TODO: more testing. I don't quite understand the use of GNU C statement expressions in pick_type_statement, though I'm guessing the return value is important somewhere. Maybe just looking at the preprocessed source would make it clearer. > "why do we cast to unsigned long and then to char, > when we just checked that the size of the type is 1"? Deja vu. I don't remember who I discussed that with, maybe Arnd, but that was something else I had asked at some point. I must have forgotten to look into it more before sending the patch. Can likely drop that at least. -- Thanks, ~Nick Desaulniers
Re: [PATCH] x86: support i386 with Clang
On Mon, May 11, 2020 at 2:46 PM Nick Desaulniers wrote: > > On Mon, May 11, 2020 at 11:09 AM Brian Gerst wrote: > > This looks like the same issue that we just discussed for bitops.h. > > Add the "b" operand size modifier to force it to use the 8-bit > > register names (and probably also needs the "w" modifier in the 16-bit > > case). > > While it does feel familiar, it is slightly different. > https://godbolt.org/z/Rme4Zg > That case was both compilers validating the inline asm, yet generating > assembly that the assembler would choke on. This case is validation > in the front end failing. > long long ret; > switch (sizeof(ret)) { > case 1: > asm ("movb $5, %0" : "=q" (ret)); > break; > case 8:; > } So if the issue here is that the output variable type is long long, what code is using a 64-bit percpu variable on a 32-bit kernel? Can you give a specific file that fails to build with Clang? If Clang is choking on it it may be silently miscompiling on GCC. -- Brian Gerst
Re: [PATCH] x86: support i386 with Clang
On Mon, May 11, 2020 at 11:09 AM Brian Gerst wrote: > This looks like the same issue that we just discussed for bitops.h. > Add the "b" operand size modifier to force it to use the 8-bit > register names (and probably also needs the "w" modifier in the 16-bit > case). While it does feel familiar, it is slightly different. https://godbolt.org/z/Rme4Zg That case was both compilers validating the inline asm, yet generating assembly that the assembler would choke on. This case is validation in the front end failing. Side note: would you mind sending a review by tag for v5 of that patch if you think it's good to go? It does fix a regression I'd prefer didn't ship in 5.7. -- Thanks, ~Nick Desaulniers
Re: [PATCH] x86: support i386 with Clang
On Mon, May 11, 2020 at 11:24 AM Linus Torvalds wrote: > > I guess I should just test it, since I have that clang tree. No, clang doesn't seem to handle it even with __builtin_choose_expr(), and has that invalid input size for constraint 'qi' even when it's in a side that is never chosen. Very annoying. A lot of our magic macros are literally about "pick one case when the others are not valid for this type". Linus
Re: [PATCH] x86: support i386 with Clang
On Mon, May 11, 2020 at 11:12 AM Linus Torvalds wrote: > > Would using "__builtin_choose_expr()" be able to avoid this whole issue? We actually have a fair amount of "pick expression based on size", so with a few helper macros we could make the code look better than the case statements too. Something (ENTIRELY UNTESTED!) like the attached patch, perhaps? NOTE! I only converted one single use to that "pick_size_xyz()" model. If this actually works for clang too, we could do the others. I guess I should just test it, since I have that clang tree. Linus patch Description: Binary data
Re: [PATCH] x86: support i386 with Clang
On Mon, May 11, 2020 at 10:24 AM Nick Desaulniers wrote: > > Bumping for comment+review. > > On Mon, May 4, 2020 at 4:03 PM Nick Desaulniers > wrote: > > > > - : "qi" ((pto_T__)(val))); \ > > + : "qi" ((unsigned char)(unsigned long)(val))); \ I find the extra casts really annoying and illogical. I kind of see why they happen: just casting to 'unsigned char' can cause warnings, when this case isn't even taken, and the sizeof(var) is something else (and you cast a pointer to a char without going through the 'unsigned long' first. But that doesn't make this change any more sensible or readable. Would using "__builtin_choose_expr()" be able to avoid this whole issue? Because I'd rather re-write the whole thing with a different model that doesn't have this issue, than make the code look insane. And it does look insane, because that whole thing makes for a very natural question: "why do we cast to unsigned long and then to char, when we just checked that the size of the type is 1"? Linus
Re: [PATCH] x86: support i386 with Clang
On Mon, May 11, 2020 at 1:26 PM Nick Desaulniers wrote: > > Bumping for comment+review. > > On Mon, May 4, 2020 at 4:03 PM Nick Desaulniers > wrote: > > > > GCC and Clang are architecturally different, which leads to subtle > > issues for code that's invalid but clearly dead. This can happen with > > code that emulates polymorphism with the preprocessor and sizeof. > > > > GCC will perform semantic analysis after early inlining and dead code > > elimination, so it will not warn on invalid code that's dead. Clang > > strictly performs optimizations after semantic analysis, so it will warn > > for dead code. > > > > Neither Clang nor GCC like this very much with -m32: > > > > long long ret; > > asm ("movb $5, %0" : "=q" (ret)); > > > > However, GCC can tolerate this variant: > > > > long long ret; > > switch (sizeof(ret)) { > > case 1: > > asm ("movb $5, %0" : "=q" (ret)); > > break; > > case 8:; > > } > > > > Clang, on the other hand, won't accept that because it validates the > > inline asm for the '1' case *before* the optimisation phase where it > > realises that it wouldn't have to emit it anyway. > > > > If LLVM (Clang's "back end") fails such as during instruction selection > > or register allocation, it cannot provide accurate diagnostics > > (warnings/errors) that contain line information, as the AST has been > > discarded from memory at that point. > > > > While there have been early discussions about having C/C++ specific > > language optimizations in Clang via the use of MLIR, which would enable > > such earlier optimizations, such work is not scoped and likely a > > multi-year endeavor. > > > > We also don't want to swap the use of "=q" with "=r". For 64b, it > > doesn't matter. For 32b, it's possible that a 32b register without a 8b > > lower alias (i.e. ESI, EDI, EBP) is selected which the assembler will > > then reject. > > > > With this, Clang can finally build an i386 defconfig. > > > > Reported-by: Arnd Bergmann > > Reported-by: David Woodhouse > > Reported-by: Dmitry Golovin > > Reported-by: Linus Torvalds > > Link: https://bugs.llvm.org/show_bug.cgi?id=33587 > > Link: https://github.com/ClangBuiltLinux/linux/issues/3 > > Link: https://github.com/ClangBuiltLinux/linux/issues/194 > > Link: https://github.com/ClangBuiltLinux/linux/issues/781 > > Link: > > https://lore.kernel.org/lkml/20180209161833.4605-1-dw...@infradead.org/ > > Link: > > https://lore.kernel.org/lkml/cak8p3a1ebawdbaezirfdsghvjmtwjunt2hgg8z+vpxenhwe...@mail.gmail.com/ > > Signed-off-by: Nick Desaulniers > > --- > > Note: this is a resend of: > > https://lore.kernel.org/lkml/20180209161833.4605-1-dw...@infradead.org/ > > rebased on today's Linux next, and with the additional change to > > uaccess.h. > > > > I'm happy to resend with authorship and reported by tags changed to > > suggested by's or whatever, just let me know. > > > > Part of the commit message is stolen from David, and part from Linus. > > Shall I resend with David's authorship and > > [Nick: reworded] > > ??? > > > > I don't really care, I just don't really want to carry this out of tree > > for our CI, which is green for i386 otherwise. > > > > > > arch/x86/include/asm/percpu.h | 12 > > arch/x86/include/asm/uaccess.h | 4 +++- > > 2 files changed, 11 insertions(+), 5 deletions(-) > > > > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h > > index 2278797c769d..826d086f71c9 100644 > > --- a/arch/x86/include/asm/percpu.h > > +++ b/arch/x86/include/asm/percpu.h > > @@ -99,7 +99,7 @@ do { \ > > case 1: \ > > asm qual (op "b %1,"__percpu_arg(0) \ > > : "+m" (var)\ > > - : "qi" ((pto_T__)(val))); \ > > + : "qi" ((unsigned char)(unsigned long)(val))); \ > > break; \ > > case 2: \ > > asm qual (op "w %1,"__percpu_arg(0) \ > > @@ -144,7 +144,7 @@ do { > > \ > > else\ > > asm qual ("addb %1, "__percpu_arg(0)\ > > : "+m" (var)\ > > - : "qi" ((pao_T__)(val))); \ > > + : "qi" ((unsigned char)(unsigned long)(val))); \ > > break; \ > > case 2: \ > > if (pao_ID__ == 1) \ > > @@ -182,12 +182,14 @@ do { > > \ > > > > #define
Re: [PATCH] x86: support i386 with Clang
Bumping for comment+review. On Mon, May 4, 2020 at 4:03 PM Nick Desaulniers wrote: > > GCC and Clang are architecturally different, which leads to subtle > issues for code that's invalid but clearly dead. This can happen with > code that emulates polymorphism with the preprocessor and sizeof. > > GCC will perform semantic analysis after early inlining and dead code > elimination, so it will not warn on invalid code that's dead. Clang > strictly performs optimizations after semantic analysis, so it will warn > for dead code. > > Neither Clang nor GCC like this very much with -m32: > > long long ret; > asm ("movb $5, %0" : "=q" (ret)); > > However, GCC can tolerate this variant: > > long long ret; > switch (sizeof(ret)) { > case 1: > asm ("movb $5, %0" : "=q" (ret)); > break; > case 8:; > } > > Clang, on the other hand, won't accept that because it validates the > inline asm for the '1' case *before* the optimisation phase where it > realises that it wouldn't have to emit it anyway. > > If LLVM (Clang's "back end") fails such as during instruction selection > or register allocation, it cannot provide accurate diagnostics > (warnings/errors) that contain line information, as the AST has been > discarded from memory at that point. > > While there have been early discussions about having C/C++ specific > language optimizations in Clang via the use of MLIR, which would enable > such earlier optimizations, such work is not scoped and likely a > multi-year endeavor. > > We also don't want to swap the use of "=q" with "=r". For 64b, it > doesn't matter. For 32b, it's possible that a 32b register without a 8b > lower alias (i.e. ESI, EDI, EBP) is selected which the assembler will > then reject. > > With this, Clang can finally build an i386 defconfig. > > Reported-by: Arnd Bergmann > Reported-by: David Woodhouse > Reported-by: Dmitry Golovin > Reported-by: Linus Torvalds > Link: https://bugs.llvm.org/show_bug.cgi?id=33587 > Link: https://github.com/ClangBuiltLinux/linux/issues/3 > Link: https://github.com/ClangBuiltLinux/linux/issues/194 > Link: https://github.com/ClangBuiltLinux/linux/issues/781 > Link: https://lore.kernel.org/lkml/20180209161833.4605-1-dw...@infradead.org/ > Link: > https://lore.kernel.org/lkml/cak8p3a1ebawdbaezirfdsghvjmtwjunt2hgg8z+vpxenhwe...@mail.gmail.com/ > Signed-off-by: Nick Desaulniers > --- > Note: this is a resend of: > https://lore.kernel.org/lkml/20180209161833.4605-1-dw...@infradead.org/ > rebased on today's Linux next, and with the additional change to > uaccess.h. > > I'm happy to resend with authorship and reported by tags changed to > suggested by's or whatever, just let me know. > > Part of the commit message is stolen from David, and part from Linus. > Shall I resend with David's authorship and > [Nick: reworded] > ??? > > I don't really care, I just don't really want to carry this out of tree > for our CI, which is green for i386 otherwise. > > > arch/x86/include/asm/percpu.h | 12 > arch/x86/include/asm/uaccess.h | 4 +++- > 2 files changed, 11 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h > index 2278797c769d..826d086f71c9 100644 > --- a/arch/x86/include/asm/percpu.h > +++ b/arch/x86/include/asm/percpu.h > @@ -99,7 +99,7 @@ do { \ > case 1: \ > asm qual (op "b %1,"__percpu_arg(0) \ > : "+m" (var)\ > - : "qi" ((pto_T__)(val))); \ > + : "qi" ((unsigned char)(unsigned long)(val))); \ > break; \ > case 2: \ > asm qual (op "w %1,"__percpu_arg(0) \ > @@ -144,7 +144,7 @@ do { > \ > else\ > asm qual ("addb %1, "__percpu_arg(0)\ > : "+m" (var)\ > - : "qi" ((pao_T__)(val))); \ > + : "qi" ((unsigned char)(unsigned long)(val))); \ > break; \ > case 2: \ > if (pao_ID__ == 1) \ > @@ -182,12 +182,14 @@ do { > \ > > #define percpu_from_op(qual, op, var) \ > ({ \ > + unsigned char pfo_u8__; \ > typeof(var) pfo_ret__; \ > switch (sizeof(var)) { \ > case
[PATCH] x86: support i386 with Clang
GCC and Clang are architecturally different, which leads to subtle issues for code that's invalid but clearly dead. This can happen with code that emulates polymorphism with the preprocessor and sizeof. GCC will perform semantic analysis after early inlining and dead code elimination, so it will not warn on invalid code that's dead. Clang strictly performs optimizations after semantic analysis, so it will warn for dead code. Neither Clang nor GCC like this very much with -m32: long long ret; asm ("movb $5, %0" : "=q" (ret)); However, GCC can tolerate this variant: long long ret; switch (sizeof(ret)) { case 1: asm ("movb $5, %0" : "=q" (ret)); break; case 8:; } Clang, on the other hand, won't accept that because it validates the inline asm for the '1' case *before* the optimisation phase where it realises that it wouldn't have to emit it anyway. If LLVM (Clang's "back end") fails such as during instruction selection or register allocation, it cannot provide accurate diagnostics (warnings/errors) that contain line information, as the AST has been discarded from memory at that point. While there have been early discussions about having C/C++ specific language optimizations in Clang via the use of MLIR, which would enable such earlier optimizations, such work is not scoped and likely a multi-year endeavor. We also don't want to swap the use of "=q" with "=r". For 64b, it doesn't matter. For 32b, it's possible that a 32b register without a 8b lower alias (i.e. ESI, EDI, EBP) is selected which the assembler will then reject. With this, Clang can finally build an i386 defconfig. Reported-by: Arnd Bergmann Reported-by: David Woodhouse Reported-by: Dmitry Golovin Reported-by: Linus Torvalds Link: https://bugs.llvm.org/show_bug.cgi?id=33587 Link: https://github.com/ClangBuiltLinux/linux/issues/3 Link: https://github.com/ClangBuiltLinux/linux/issues/194 Link: https://github.com/ClangBuiltLinux/linux/issues/781 Link: https://lore.kernel.org/lkml/20180209161833.4605-1-dw...@infradead.org/ Link: https://lore.kernel.org/lkml/cak8p3a1ebawdbaezirfdsghvjmtwjunt2hgg8z+vpxenhwe...@mail.gmail.com/ Signed-off-by: Nick Desaulniers --- Note: this is a resend of: https://lore.kernel.org/lkml/20180209161833.4605-1-dw...@infradead.org/ rebased on today's Linux next, and with the additional change to uaccess.h. I'm happy to resend with authorship and reported by tags changed to suggested by's or whatever, just let me know. Part of the commit message is stolen from David, and part from Linus. Shall I resend with David's authorship and [Nick: reworded] ??? I don't really care, I just don't really want to carry this out of tree for our CI, which is green for i386 otherwise. arch/x86/include/asm/percpu.h | 12 arch/x86/include/asm/uaccess.h | 4 +++- 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h index 2278797c769d..826d086f71c9 100644 --- a/arch/x86/include/asm/percpu.h +++ b/arch/x86/include/asm/percpu.h @@ -99,7 +99,7 @@ do { \ case 1: \ asm qual (op "b %1,"__percpu_arg(0) \ : "+m" (var)\ - : "qi" ((pto_T__)(val))); \ + : "qi" ((unsigned char)(unsigned long)(val))); \ break; \ case 2: \ asm qual (op "w %1,"__percpu_arg(0) \ @@ -144,7 +144,7 @@ do { \ else\ asm qual ("addb %1, "__percpu_arg(0)\ : "+m" (var)\ - : "qi" ((pao_T__)(val))); \ + : "qi" ((unsigned char)(unsigned long)(val))); \ break; \ case 2: \ if (pao_ID__ == 1) \ @@ -182,12 +182,14 @@ do { \ #define percpu_from_op(qual, op, var) \ ({ \ + unsigned char pfo_u8__; \ typeof(var) pfo_ret__; \ switch (sizeof(var)) { \ case 1: \ asm qual (op "b "__percpu_arg(1)",%0" \ - : "=q" (pfo_ret__) \ + : "=q" (pfo_u8__) \ : "m" (var)); \ +