RE: [PATCH] x86/uaccess: fix code generation in put_user()

2020-10-24 Thread David Laight
From: David Laight
> Sent: 23 October 2020 22:52
...
> Could do_put_user() do an initial check for 64 bit
> then expand a different #define that contains the actual
> code passing either "a" or "A" for the constriant.
> 
> Apart from another level of indirection nothing is duplicated.

This code seems to compile to something sensible.
It does need change the registers that get_user_n() must
use - the normal return value is now in %ax (and %dx for
64bit values on 32bit systems, with the error in %cx.
(I've not actually tested it.)

#define __inttype_max(x, _max) __typeof__(  \
__typefits(x,char,  \
  __typefits(x,short,   \
__typefits(x,int,   \
  __typefits(x,long,_max)

#define __inttype(x) __inttype_max(x, 0ULL)

#define get_user_1(x, ptr, type, constraint)\
({  \
int __ret_gu;   \
type __val_gu;  \
asm volatile("call __get_user_%P4"  \
 : "=c" (__ret_gu), constraint (__val_gu),  \
ASM_CALL_CONSTRAINT \
 : "a" (ptr), "i" (sizeof(*(ptr;\
(x) = (__force __typeof__(*(ptr))) __val_gu;\
__builtin_expect(__ret_gu, 0);  \
})

#define get_user(x, ptr)\
({  \
__chk_user_ptr(ptr);\
might_fault();  \
(sizeof *(ptr) > sizeof(long))  \
? get_user_1(x, ptr, long long, "=A")   \
: get_user_1(x, ptr, __inttype_max(*(ptr),0ul), "=a");  \
})

The __inttype_max() is needed (I think) because clang will try (and fail)
to generate the asm for 64bit values on 32bit systems.
So the type needs limiting to 32bits.
Always using 'long' works - but generates extra casts.

The "=A" constraint (%rax or %rdx) is never used on 64bit because
the test is always false.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


RE: [PATCH] x86/uaccess: fix code generation in put_user()

2020-10-23 Thread hpa
On October 23, 2020 2:52:16 PM PDT, David Laight  
wrote:
>From: Linus Torvalds
>> Sent: 23 October 2020 22:11
>> 
>> On Fri, Oct 23, 2020 at 2:00 PM  wrote:
>> >
>> > There is no same reason to mess around with hacks when we are
>talking about dx:ax, though.
>> 
>> Sure there is.
>> 
>> "A" doesn't actually mean %edx:%eax like you seem to think.
>> 
>> It actually means %eax OR %edx, and then if given a 64-bit value, it
>> will use the combination (with %edx being the high bits).
>> 
>> So using "A" unconditionally doesn't work - it gives random behavior
>> for 32-bit (or smaller) types.
>> 
>> Or you'd have to cast the value to always be 64-bit, and have the
>> extra code generation.
>> 
>> IOW, an unconditional "A" is wrong.
>> 
>> And the alternative is to just duplicate things, and go back to the
>> explicit size testing, but honestly, I really think that's much worse
>> than relying on a documented feature of "register asm()" that gcc
>> _documents_ is for this kind of inline asm use.
>
>Could do_put_user() do an initial check for 64 bit
>then expand a different #define that contains the actual
>code passing either "a" or "A" for the constriant.
>
>Apart from another level of indirection nothing is duplicated.
>
>   David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>MK1 1PT, UK
>Registration No: 1397386 (Wales)

Maybe #define ASM_AX64 or some such.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86/uaccess: fix code generation in put_user()

2020-10-23 Thread hpa
On October 23, 2020 2:11:19 PM PDT, Linus Torvalds 
 wrote:
>On Fri, Oct 23, 2020 at 2:00 PM  wrote:
>>
>> There is no same reason to mess around with hacks when we are talking
>about dx:ax, though.
>
>Sure there is.
>
>"A" doesn't actually mean %edx:%eax like you seem to think.
>
>It actually means %eax OR %edx, and then if given a 64-bit value, it
>will use the combination (with %edx being the high bits).
>
>So using "A" unconditionally doesn't work - it gives random behavior
>for 32-bit (or smaller) types.
>
>Or you'd have to cast the value to always be 64-bit, and have the
>extra code generation.
>
>IOW, an unconditional "A" is wrong.
>
>And the alternative is to just duplicate things, and go back to the
>explicit size testing, but honestly, I really think that's much worse
>than relying on a documented feature of "register asm()" that gcc
>_documents_ is for this kind of inline asm use.
>
>So the "don't do pointless conditional duplication" is certainly a
>very sane reason for the code.
>
>Linus

Unconditional "A" is definitely wrong, no argument there.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


RE: [PATCH] x86/uaccess: fix code generation in put_user()

2020-10-23 Thread David Laight
From: Linus Torvalds
> Sent: 23 October 2020 22:11
> 
> On Fri, Oct 23, 2020 at 2:00 PM  wrote:
> >
> > There is no same reason to mess around with hacks when we are talking about 
> > dx:ax, though.
> 
> Sure there is.
> 
> "A" doesn't actually mean %edx:%eax like you seem to think.
> 
> It actually means %eax OR %edx, and then if given a 64-bit value, it
> will use the combination (with %edx being the high bits).
> 
> So using "A" unconditionally doesn't work - it gives random behavior
> for 32-bit (or smaller) types.
> 
> Or you'd have to cast the value to always be 64-bit, and have the
> extra code generation.
> 
> IOW, an unconditional "A" is wrong.
> 
> And the alternative is to just duplicate things, and go back to the
> explicit size testing, but honestly, I really think that's much worse
> than relying on a documented feature of "register asm()" that gcc
> _documents_ is for this kind of inline asm use.

Could do_put_user() do an initial check for 64 bit
then expand a different #define that contains the actual
code passing either "a" or "A" for the constriant.

Apart from another level of indirection nothing is duplicated.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [PATCH] x86/uaccess: fix code generation in put_user()

2020-10-23 Thread Linus Torvalds
On Fri, Oct 23, 2020 at 2:00 PM  wrote:
>
> There is no same reason to mess around with hacks when we are talking about 
> dx:ax, though.

Sure there is.

"A" doesn't actually mean %edx:%eax like you seem to think.

It actually means %eax OR %edx, and then if given a 64-bit value, it
will use the combination (with %edx being the high bits).

So using "A" unconditionally doesn't work - it gives random behavior
for 32-bit (or smaller) types.

Or you'd have to cast the value to always be 64-bit, and have the
extra code generation.

IOW, an unconditional "A" is wrong.

And the alternative is to just duplicate things, and go back to the
explicit size testing, but honestly, I really think that's much worse
than relying on a documented feature of "register asm()" that gcc
_documents_ is for this kind of inline asm use.

So the "don't do pointless conditional duplication" is certainly a
very sane reason for the code.

Linus


Re: [PATCH] x86/uaccess: fix code generation in put_user()

2020-10-23 Thread hpa
On October 23, 2020 1:55:22 PM PDT, Linus Torvalds 
 wrote:
>Thanks, applied.
>
>On Fri, Oct 23, 2020 at 1:32 PM Rasmus Villemoes
> wrote:
>>
>> I'm wondering if one would also need to make __ptr_pu and __ret_pu
>> explicitly "%"_ASM_CX".
>
>No, the "c"/"0" thing is much better, and makes it properly atomic wrt
>the actual asm.
>
>As mentioned to Andy, the "register asm()" thing is not uncommon and
>often useful, but when you can specify the register directly in asm,
>that's certainly simpler and more straightforward and preferred.
>
>  Linus

There is no same reason to mess around with hacks when we are talking about 
dx:ax, though. We have to do pretty ugly hacks when other register pairs are 
involved, but "A" is there for a reason. _ASM_AX64 maybe...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86/uaccess: fix code generation in put_user()

2020-10-23 Thread Linus Torvalds
Thanks, applied.

On Fri, Oct 23, 2020 at 1:32 PM Rasmus Villemoes
 wrote:
>
> I'm wondering if one would also need to make __ptr_pu and __ret_pu
> explicitly "%"_ASM_CX".

No, the "c"/"0" thing is much better, and makes it properly atomic wrt
the actual asm.

As mentioned to Andy, the "register asm()" thing is not uncommon and
often useful, but when you can specify the register directly in asm,
that's certainly simpler and more straightforward and preferred.

  Linus


Re: [PATCH] x86/uaccess: fix code generation in put_user()

2020-10-23 Thread hpa
On October 23, 2020 1:42:39 PM PDT, Andy Lutomirski  wrote:
>On Fri, Oct 23, 2020 at 1:32 PM Rasmus Villemoes
> wrote:
>>
>> Quoting
>> https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html:
>>
>>   You can define a local register variable and associate it with a
>>   specified register...
>>
>>   The only supported use for this feature is to specify registers for
>>   input and output operands when calling Extended asm (see Extended
>>   Asm). This may be necessary if the constraints for a particular
>>   machine don't provide sufficient control to select the desired
>>   register.
>>
>> On 32-bit x86, this is used to ensure that gcc will put an 8-byte
>> value into the %edx:%eax pair, while all other cases will just use
>the
>> single register %eax (%rax on x86-64). While the _ASM_AX actually
>just
>> expands to "%eax", note this comment next to get_user() which does
>> something very similar:
>>
>>  * The use of _ASM_DX as the register specifier is a bit of a
>>  * simplification, as gcc only cares about it as the starting point
>>  * and not size: for a 64-bit value it will use %ecx:%edx on 32 bits
>>  * (%ecx being the next register in gcc's x86 register sequence), and
>>  * %rdx on 64 bits.
>>
>> However, getting this to work requires that there is no code between
>> the assignment to the local register variable and its use as an input
>> to the asm() which can possibly clobber any of the registers involved
>> - including evaluation of the expressions making up other inputs.
>
>This looks like the patch is an improvement, but this is still IMO
>likely to be very fragile.  Can we just do the size-dependent "a" vs
>"A" selection method instead?  Sure, it's a little more code, but it
>will be Obviously Correct.  As it stands, I can easily see our code
>failing on some gcc or clang version and the compiler authors telling
>us that we're relying on unsupportable behavior.

Yeah, the reason get_user hacks it is because there is no equivalent to "A" for 
other register pairs, but not using it for dx:ax is just silly.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86/uaccess: fix code generation in put_user()

2020-10-23 Thread Linus Torvalds
On Fri, Oct 23, 2020 at 1:42 PM Andy Lutomirski  wrote:
>
> This looks like the patch is an improvement, but this is still IMO
> likely to be very fragile.  Can we just do the size-dependent "a" vs
> "A" selection method instead?  Sure, it's a little more code, but it
> will be Obviously Correct.  As it stands, I can easily see our code
> failing on some gcc or clang version and the compiler authors telling
> us that we're relying on unsupportable behavior.

Well, the "register asm()" thing actually _is_ documented, and it's
something we've used before too (and still use in other places).

We actually have quite a bit of them - just do a

git grep 'register.*asm('

to see literally hundreds of them. So gcc (and clang) actually has a
lot of test-cases for it.

In many ways, x86 actually tends to need _fewer_ of these than most
other architectures, since on x86, we almost always have specific
register naming in ways that other architectures often don't (because
x86 has all those specific register rules).

So the 64-bit issue for x86-32 is a bit of a special case for x86, but
this is in no way a special case in general, and is quite the common
pattern on other architectures.

The fact that KASAN could generate code in between the register asm
assignment was something I just overlooked (and embarrassingly similar
to the issues we had with objdump checking STAC/CLAC back when we
inlined that all).

It's a bit sad that gcc doesn't _warn_ about this kind of issue, since
the compiler certainly should see "oh, we just had a register clash".
But it is what it is..

Linus


Re: [PATCH] x86/uaccess: fix code generation in put_user()

2020-10-23 Thread Andy Lutomirski
On Fri, Oct 23, 2020 at 1:32 PM Rasmus Villemoes
 wrote:
>
> Quoting
> https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html:
>
>   You can define a local register variable and associate it with a
>   specified register...
>
>   The only supported use for this feature is to specify registers for
>   input and output operands when calling Extended asm (see Extended
>   Asm). This may be necessary if the constraints for a particular
>   machine don't provide sufficient control to select the desired
>   register.
>
> On 32-bit x86, this is used to ensure that gcc will put an 8-byte
> value into the %edx:%eax pair, while all other cases will just use the
> single register %eax (%rax on x86-64). While the _ASM_AX actually just
> expands to "%eax", note this comment next to get_user() which does
> something very similar:
>
>  * The use of _ASM_DX as the register specifier is a bit of a
>  * simplification, as gcc only cares about it as the starting point
>  * and not size: for a 64-bit value it will use %ecx:%edx on 32 bits
>  * (%ecx being the next register in gcc's x86 register sequence), and
>  * %rdx on 64 bits.
>
> However, getting this to work requires that there is no code between
> the assignment to the local register variable and its use as an input
> to the asm() which can possibly clobber any of the registers involved
> - including evaluation of the expressions making up other inputs.

This looks like the patch is an improvement, but this is still IMO
likely to be very fragile.  Can we just do the size-dependent "a" vs
"A" selection method instead?  Sure, it's a little more code, but it
will be Obviously Correct.  As it stands, I can easily see our code
failing on some gcc or clang version and the compiler authors telling
us that we're relying on unsupportable behavior.