Re: [RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-30 Thread Ard Biesheuvel
On Mon, 30 Nov 2020 at 18:52, Nicolas Pitre  wrote:
>
> On Mon, 30 Nov 2020, Ard Biesheuvel wrote:
>
> > On Mon, 30 Nov 2020 at 16:51, Nicolas Pitre  wrote:
> >
> > > Here's my version of the fix which should be correct. Warning: this
> > > is completely untested, but should in theory produce the same code on
> > > modern gcc.
> > >
> > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > index 898e9c78a7..595e538f5b 100644
> > > --- a/arch/arm/include/asm/div64.h
> > > +++ b/arch/arm/include/asm/div64.h
> > > @@ -21,29 +21,20 @@
> > >   * assembly implementation with completely non standard calling 
> > > convention
> > >   * for arguments and results (beware).
> > >   */
> > > -
> > > -#ifdef __ARMEB__
> > > -#define __xh "r0"
> > > -#define __xl "r1"
> > > -#else
> > > -#define __xl "r0"
> > > -#define __xh "r1"
> > > -#endif
> > > -
> > >  static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> > >  {
> > > register unsigned int __base  asm("r4") = base;
> > > register unsigned long long __n   asm("r0") = *n;
> > > register unsigned long long __res asm("r2");
> > > -   register unsigned int __rem   asm(__xh);
> > > -   asm(__asmeq("%0", __xh)
> > > +   unsigned int __rem;
> > > +   asm(__asmeq("%0", "r0")
> > > __asmeq("%1", "r2")
> > > -   __asmeq("%2", "r0")
> > > -   __asmeq("%3", "r4")
> > > +   __asmeq("%2", "r4")
> > > "bl __do_div64"
> > > -   : "=r" (__rem), "=r" (__res)
> > > -   : "r" (__n), "r" (__base)
> > > +   : "+r" (__n), "=r" (__res)
> > > +   : "r" (__base)
> > > : "ip", "lr", "cc");
> > > +   __rem = __n >> 32;
> >
> > This treats {r0, r1} as a {low, high} pair, regardless of endianness,
> > and so it puts the value of r0 into r1. Doesn't that mean the shift
> > should only be done on little endian?
>
> Not quite. r0-r1 = low-high is for little endian. Then "__n >> 32" is
> actually translated into "mov r0, r1" to move it into __rem and returned
> through r0.
>
> On big endial it is r0-r1 = high-low.  Here "__n >> 32" picks r0 and
> moves it to __rem which is returned through r0 so no extra instruction
> needed.
>
> Of course the function is inlined so r0 can be anything, or optimized
> away if__rem is not used.
>

OK, you're right. I got myself confused there, but a quick test with
GCC confirms your explanation:

$ arm-linux-gnueabihf-gcc -mbig-endian -O2 -S -o - \
   -xc - <<<"long f(long long l) { return l >> 32; }"

just produces

bx lr

whereas removing the -mbig-endian gives

mov r0, r1
bx lr


I tested the change and it builds and runs fine (although I am not
sure how much coverage this code gets on an ordinary boot):

Reviewed-by: Ard Biesheuvel 
Tested-by: Ard Biesheuvel 


Re: [RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-30 Thread Nicolas Pitre
On Mon, 30 Nov 2020, Ard Biesheuvel wrote:

> On Mon, 30 Nov 2020 at 16:51, Nicolas Pitre  wrote:
> 
> > Here's my version of the fix which should be correct. Warning: this
> > is completely untested, but should in theory produce the same code on
> > modern gcc.
> >
> > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > index 898e9c78a7..595e538f5b 100644
> > --- a/arch/arm/include/asm/div64.h
> > +++ b/arch/arm/include/asm/div64.h
> > @@ -21,29 +21,20 @@
> >   * assembly implementation with completely non standard calling convention
> >   * for arguments and results (beware).
> >   */
> > -
> > -#ifdef __ARMEB__
> > -#define __xh "r0"
> > -#define __xl "r1"
> > -#else
> > -#define __xl "r0"
> > -#define __xh "r1"
> > -#endif
> > -
> >  static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
> >  {
> > register unsigned int __base  asm("r4") = base;
> > register unsigned long long __n   asm("r0") = *n;
> > register unsigned long long __res asm("r2");
> > -   register unsigned int __rem   asm(__xh);
> > -   asm(__asmeq("%0", __xh)
> > +   unsigned int __rem;
> > +   asm(__asmeq("%0", "r0")
> > __asmeq("%1", "r2")
> > -   __asmeq("%2", "r0")
> > -   __asmeq("%3", "r4")
> > +   __asmeq("%2", "r4")
> > "bl __do_div64"
> > -   : "=r" (__rem), "=r" (__res)
> > -   : "r" (__n), "r" (__base)
> > +   : "+r" (__n), "=r" (__res)
> > +   : "r" (__base)
> > : "ip", "lr", "cc");
> > +   __rem = __n >> 32;
> 
> This treats {r0, r1} as a {low, high} pair, regardless of endianness,
> and so it puts the value of r0 into r1. Doesn't that mean the shift
> should only be done on little endian?

Not quite. r0-r1 = low-high is for little endian. Then "__n >> 32" is 
actually translated into "mov r0, r1" to move it into __rem and returned 
through r0.

On big endial it is r0-r1 = high-low. Here "__n >> 32" picks r0 and 
moves it to __rem which is returned through r0 so no extra instruction 
needed.

Of course the function is inlined so r0 can be anything, or optimized 
away if__rem is not used.


Nicolas


Re: [RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-30 Thread Ard Biesheuvel
On Mon, 30 Nov 2020 at 16:51, Nicolas Pitre  wrote:
>
> On Mon, 30 Nov 2020, Ard Biesheuvel wrote:
>
> > (+ Nico)
> >
> > On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel  wrote:
> > >
> > > On Mon, 23 Nov 2020 at 08:39, Antony Yu  wrote:
> > > >
> > > > __do_div64 clobbers the input register r0 in little endian system.
> > > > According to the inline assembly document, if an input operand is
> > > > modified, it should be tied to a output operand. This patch can
> > > > prevent compilers from reusing r0 register after asm statements.
> > > >
> > > > Signed-off-by: Antony Yu 
> > > > ---
> > > >  arch/arm/include/asm/div64.h | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > > index 898e9c78a7e7..809efc51e90f 100644
> > > > --- a/arch/arm/include/asm/div64.h
> > > > +++ b/arch/arm/include/asm/div64.h
> > > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, 
> > > > uint32_t base)
> > > > asm(__asmeq("%0", __xh)
> > > > __asmeq("%1", "r2")
> > > > __asmeq("%2", "r0")
> > > > -   __asmeq("%3", "r4")
> > > > +   __asmeq("%3", "r0")
> > > > +   __asmeq("%4", "r4")
> > > > "bl __do_div64"
> > > > -   : "=r" (__rem), "=r" (__res)
> > > > +   : "=r" (__rem), "=r" (__res), "=r" (__n)
> > > > : "r" (__n), "r" (__base)
> > > > : "ip", "lr", "cc");
> > > > *n = __res;
> > > > --
> > > > 2.23.0
> > > >
> > >
> > > Agree that using r0 as an input operand only is incorrect, and not
> > > only on Clang. The compiler might assume that r0 will retain its value
> > > across the asm() block, which is obviously not the case.
>
> You're right.
>
> This was done like that most likely to work around some stupid code
> generation with "__n >> 32" while using gcc from about 20 years ago. IOW
> I don't exactly remember why I did it like that, but it is certainly
> flawed.
>
> Here's my version of the fix which should be correct. Warning: this
> is completely untested, but should in theory produce the same code on
> modern gcc.
>
> diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> index 898e9c78a7..595e538f5b 100644
> --- a/arch/arm/include/asm/div64.h
> +++ b/arch/arm/include/asm/div64.h
> @@ -21,29 +21,20 @@
>   * assembly implementation with completely non standard calling convention
>   * for arguments and results (beware).
>   */
> -
> -#ifdef __ARMEB__
> -#define __xh "r0"
> -#define __xl "r1"
> -#else
> -#define __xl "r0"
> -#define __xh "r1"
> -#endif
> -
>  static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
>  {
> register unsigned int __base  asm("r4") = base;
> register unsigned long long __n   asm("r0") = *n;
> register unsigned long long __res asm("r2");
> -   register unsigned int __rem   asm(__xh);
> -   asm(__asmeq("%0", __xh)
> +   unsigned int __rem;
> +   asm(__asmeq("%0", "r0")
> __asmeq("%1", "r2")
> -   __asmeq("%2", "r0")
> -   __asmeq("%3", "r4")
> +   __asmeq("%2", "r4")
> "bl __do_div64"
> -   : "=r" (__rem), "=r" (__res)
> -   : "r" (__n), "r" (__base)
> +   : "+r" (__n), "=r" (__res)
> +   : "r" (__base)
> : "ip", "lr", "cc");
> +   __rem = __n >> 32;

This treats {r0, r1} as a {low, high} pair, regardless of endianness,
and so it puts the value of r0 into r1. Doesn't that mean the shift
should only be done on little endian?


> *n = __res;
> return __rem;
>  }
>
> I'll submit it if someone confirms it works.
>
>
> Nicolas
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: [RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-30 Thread Nicolas Pitre
On Mon, 30 Nov 2020, Ard Biesheuvel wrote:

> (+ Nico)
> 
> On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel  wrote:
> >
> > On Mon, 23 Nov 2020 at 08:39, Antony Yu  wrote:
> > >
> > > __do_div64 clobbers the input register r0 in little endian system.
> > > According to the inline assembly document, if an input operand is
> > > modified, it should be tied to a output operand. This patch can
> > > prevent compilers from reusing r0 register after asm statements.
> > >
> > > Signed-off-by: Antony Yu 
> > > ---
> > >  arch/arm/include/asm/div64.h | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > index 898e9c78a7e7..809efc51e90f 100644
> > > --- a/arch/arm/include/asm/div64.h
> > > +++ b/arch/arm/include/asm/div64.h
> > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, 
> > > uint32_t base)
> > > asm(__asmeq("%0", __xh)
> > > __asmeq("%1", "r2")
> > > __asmeq("%2", "r0")
> > > -   __asmeq("%3", "r4")
> > > +   __asmeq("%3", "r0")
> > > +   __asmeq("%4", "r4")
> > > "bl __do_div64"
> > > -   : "=r" (__rem), "=r" (__res)
> > > +   : "=r" (__rem), "=r" (__res), "=r" (__n)
> > > : "r" (__n), "r" (__base)
> > > : "ip", "lr", "cc");
> > > *n = __res;
> > > --
> > > 2.23.0
> > >
> >
> > Agree that using r0 as an input operand only is incorrect, and not
> > only on Clang. The compiler might assume that r0 will retain its value
> > across the asm() block, which is obviously not the case.

You're right.

This was done like that most likely to work around some stupid code 
generation with "__n >> 32" while using gcc from about 20 years ago. IOW 
I don't exactly remember why I did it like that, but it is certainly 
flawed.

Here's my version of the fix which should be correct. Warning: this 
is completely untested, but should in theory produce the same code on 
modern gcc.

diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
index 898e9c78a7..595e538f5b 100644
--- a/arch/arm/include/asm/div64.h
+++ b/arch/arm/include/asm/div64.h
@@ -21,29 +21,20 @@
  * assembly implementation with completely non standard calling convention
  * for arguments and results (beware).
  */
-
-#ifdef __ARMEB__
-#define __xh "r0"
-#define __xl "r1"
-#else
-#define __xl "r0"
-#define __xh "r1"
-#endif
-
 static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
 {
register unsigned int __base  asm("r4") = base;
register unsigned long long __n   asm("r0") = *n;
register unsigned long long __res asm("r2");
-   register unsigned int __rem   asm(__xh);
-   asm(__asmeq("%0", __xh)
+   unsigned int __rem;
+   asm(__asmeq("%0", "r0")
__asmeq("%1", "r2")
-   __asmeq("%2", "r0")
-   __asmeq("%3", "r4")
+   __asmeq("%2", "r4")
"bl __do_div64"
-   : "=r" (__rem), "=r" (__res)
-   : "r" (__n), "r" (__base)
+   : "+r" (__n), "=r" (__res)
+   : "r" (__base)
: "ip", "lr", "cc");
+   __rem = __n >> 32;
*n = __res;
return __rem;
 }

I'll submit it if someone confirms it works.


Nicolas


Re: [RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-30 Thread Russell King - ARM Linux admin
On Mon, Nov 30, 2020 at 01:58:27PM +, David Laight wrote:
> > And actually, the same applies on BE, but the other way around. So we
> > should mark __xl as an output register as well, as __xl will assume
> > the right value depending on the endianness.
> 
> Why not use "+r" to indicate than an 'output' parameter is also
> used as an input.
> 
> Rather cleaner than specifying the same C variable as both
> input and output.

You have an incorrect understanding. "__n" is the input operand in r0.
"__rem" is the output operand in r0/r1.

No single C variable is used as both an input and an output.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


RE: [RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-30 Thread David Laight
> And actually, the same applies on BE, but the other way around. So we
> should mark __xl as an output register as well, as __xl will assume
> the right value depending on the endianness.

Why not use "+r" to indicate than an 'output' parameter is also
used as an input.

Rather cleaner than specifying the same C variable as both
input and output.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-30 Thread Ard Biesheuvel
On Mon, 30 Nov 2020 at 11:21, Russell King - ARM Linux admin
 wrote:
>
> On Mon, Nov 30, 2020 at 11:12:33AM +0100, Ard Biesheuvel wrote:
> > (+ Nico)
> >
> > On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel  wrote:
> > >
> > > On Mon, 23 Nov 2020 at 08:39, Antony Yu  wrote:
> > > >
> > > > __do_div64 clobbers the input register r0 in little endian system.
> > > > According to the inline assembly document, if an input operand is
> > > > modified, it should be tied to a output operand. This patch can
> > > > prevent compilers from reusing r0 register after asm statements.
> > > >
> > > > Signed-off-by: Antony Yu 
> > > > ---
> > > >  arch/arm/include/asm/div64.h | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > > index 898e9c78a7e7..809efc51e90f 100644
> > > > --- a/arch/arm/include/asm/div64.h
> > > > +++ b/arch/arm/include/asm/div64.h
> > > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, 
> > > > uint32_t base)
> > > > asm(__asmeq("%0", __xh)
> > > > __asmeq("%1", "r2")
> > > > __asmeq("%2", "r0")
> > > > -   __asmeq("%3", "r4")
> > > > +   __asmeq("%3", "r0")
> > > > +   __asmeq("%4", "r4")
> > > > "bl __do_div64"
> > > > -   : "=r" (__rem), "=r" (__res)
> > > > +   : "=r" (__rem), "=r" (__res), "=r" (__n)
> > > > : "r" (__n), "r" (__base)
> > > > : "ip", "lr", "cc");
> > > > *n = __res;
> > > > --
> > > > 2.23.0
> > > >
> > >
> > > Agree that using r0 as an input operand only is incorrect, and not
> > > only on Clang. The compiler might assume that r0 will retain its value
> > > across the asm() block, which is obviously not the case.
>
> However, you can _not_ have an asm block that names two outputs using
> the same physical register - that's why both the original patch and
> the posted v2 will fail.
>
> You also can't mark r0 as clobbered because it's used as an operand
> and that is not allowed by gcc.
>
> The fact is, we have two register variables occupying the same register,
> which are __n and __rem. It doesn't matter which endian-ness __rem is,
> r0 will be used for both __n (input) and __rem (output).
>

__rem is a 32-bit variable, so in LE mode, only r1 is used for __rem,
not r0. So r0/r1 are used as an input operand pair, and r1 is used as
an output operand.

So I don't think the compiler has to be buggy in order for it to
assume that r0 will still contain the low word of the dividend
afterwards.

And actually, the same applies on BE, but the other way around. So we
should mark __xl as an output register as well, as __xl will assume
the right value depending on the endianness.

I suggest something like the below,

diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
index 898e9c78a7e7..85ff9109595e 100644
--- a/arch/arm/include/asm/div64.h
+++ b/arch/arm/include/asm/div64.h
@@ -36,12 +36,14 @@ static inline uint32_t __div64_32(uint64_t *n,
uint32_t base)
register unsigned long long __n   asm("r0") = *n;
register unsigned long long __res asm("r2");
register unsigned int __rem   asm(__xh);
+   register unsigned int __dummy asm(__xl);
asm(__asmeq("%0", __xh)
__asmeq("%1", "r2")
-   __asmeq("%2", "r0")
-   __asmeq("%3", "r4")
+   __asmeq("%2", __xl)
+   __asmeq("%3", "r0")
+   __asmeq("%4", "r4")
"bl __do_div64"
-   : "=r" (__rem), "=r" (__res)
+   : "=r" (__rem), "=r" (__res), "=r"(__dummy)
: "r" (__n), "r" (__base)
: "ip", "lr", "cc");
*n = __res;



> If the compiler can't work out that if a physical register used as an
> output operand will be written by the assembler, then the compiler is
> quite simply buggy.
>
> The code is correct as it stands; Clang is buggy.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-30 Thread Russell King - ARM Linux admin
On Mon, Nov 30, 2020 at 11:12:33AM +0100, Ard Biesheuvel wrote:
> (+ Nico)
> 
> On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel  wrote:
> >
> > On Mon, 23 Nov 2020 at 08:39, Antony Yu  wrote:
> > >
> > > __do_div64 clobbers the input register r0 in little endian system.
> > > According to the inline assembly document, if an input operand is
> > > modified, it should be tied to a output operand. This patch can
> > > prevent compilers from reusing r0 register after asm statements.
> > >
> > > Signed-off-by: Antony Yu 
> > > ---
> > >  arch/arm/include/asm/div64.h | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > index 898e9c78a7e7..809efc51e90f 100644
> > > --- a/arch/arm/include/asm/div64.h
> > > +++ b/arch/arm/include/asm/div64.h
> > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, 
> > > uint32_t base)
> > > asm(__asmeq("%0", __xh)
> > > __asmeq("%1", "r2")
> > > __asmeq("%2", "r0")
> > > -   __asmeq("%3", "r4")
> > > +   __asmeq("%3", "r0")
> > > +   __asmeq("%4", "r4")
> > > "bl __do_div64"
> > > -   : "=r" (__rem), "=r" (__res)
> > > +   : "=r" (__rem), "=r" (__res), "=r" (__n)
> > > : "r" (__n), "r" (__base)
> > > : "ip", "lr", "cc");
> > > *n = __res;
> > > --
> > > 2.23.0
> > >
> >
> > Agree that using r0 as an input operand only is incorrect, and not
> > only on Clang. The compiler might assume that r0 will retain its value
> > across the asm() block, which is obviously not the case.

However, you can _not_ have an asm block that names two outputs using
the same physical register - that's why both the original patch and
the posted v2 will fail.

You also can't mark r0 as clobbered because it's used as an operand
and that is not allowed by gcc.

The fact is, we have two register variables occupying the same register,
which are __n and __rem. It doesn't matter which endian-ness __rem is,
r0 will be used for both __n (input) and __rem (output).

If the compiler can't work out that if a physical register used as an
output operand will be written by the assembler, then the compiler is
quite simply buggy.

The code is correct as it stands; Clang is buggy.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


Re: [RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-30 Thread Ard Biesheuvel
On Mon, 23 Nov 2020 at 08:39, Antony Yu  wrote:
>
> __do_div64 clobbers the input register r0 in little endian system.
> According to the inline assembly document, if an input operand is
> modified, it should be tied to a output operand. This patch can
> prevent compilers from reusing r0 register after asm statements.
>
> Signed-off-by: Antony Yu 
> ---
>  arch/arm/include/asm/div64.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> index 898e9c78a7e7..809efc51e90f 100644
> --- a/arch/arm/include/asm/div64.h
> +++ b/arch/arm/include/asm/div64.h
> @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t 
> base)
> asm(__asmeq("%0", __xh)
> __asmeq("%1", "r2")
> __asmeq("%2", "r0")
> -   __asmeq("%3", "r4")
> +   __asmeq("%3", "r0")
> +   __asmeq("%4", "r4")
> "bl __do_div64"
> -   : "=r" (__rem), "=r" (__res)
> +   : "=r" (__rem), "=r" (__res), "=r" (__n)
> : "r" (__n), "r" (__base)
> : "ip", "lr", "cc");
> *n = __res;
> --
> 2.23.0
>

Agree that using r0 as an input operand only is incorrect, and not
only on Clang. The compiler might assume that r0 will retain its value
across the asm() block, which is obviously not the case.

However, your patch will likely break big-endian, since in that case,
__xh == r0, and so it will appear twice.

Perhaps it would be better to change the type of __rem to unsigned
long long as well?


Re: [RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-30 Thread Ard Biesheuvel
(+ Nico)

On Mon, 30 Nov 2020 at 11:11, Ard Biesheuvel  wrote:
>
> On Mon, 23 Nov 2020 at 08:39, Antony Yu  wrote:
> >
> > __do_div64 clobbers the input register r0 in little endian system.
> > According to the inline assembly document, if an input operand is
> > modified, it should be tied to a output operand. This patch can
> > prevent compilers from reusing r0 register after asm statements.
> >
> > Signed-off-by: Antony Yu 
> > ---
> >  arch/arm/include/asm/div64.h | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > index 898e9c78a7e7..809efc51e90f 100644
> > --- a/arch/arm/include/asm/div64.h
> > +++ b/arch/arm/include/asm/div64.h
> > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t 
> > base)
> > asm(__asmeq("%0", __xh)
> > __asmeq("%1", "r2")
> > __asmeq("%2", "r0")
> > -   __asmeq("%3", "r4")
> > +   __asmeq("%3", "r0")
> > +   __asmeq("%4", "r4")
> > "bl __do_div64"
> > -   : "=r" (__rem), "=r" (__res)
> > +   : "=r" (__rem), "=r" (__res), "=r" (__n)
> > : "r" (__n), "r" (__base)
> > : "ip", "lr", "cc");
> > *n = __res;
> > --
> > 2.23.0
> >
>
> Agree that using r0 as an input operand only is incorrect, and not
> only on Clang. The compiler might assume that r0 will retain its value
> across the asm() block, which is obviously not the case.
>
> However, your patch will likely break big-endian, since in that case,
> __xh == r0, and so it will appear twice.
>
> Perhaps it would be better to change the type of __rem to unsigned
> long long as well?


Re: [RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-24 Thread kernel test robot
Hi Antony,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.10-rc5 next-20201124]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Antony-Yu/ARM-fix-__div64_32-error-when-compiling-with-clang/20201123-154454
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
418baf2c28f3473039f2f7377760bd8f6897ae18
config: arm-randconfig-r036-20201124 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
df9ae5992889560a8f3c6760b54d5051b47c7bf5)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm cross compiling tool for clang build
# apt-get install binutils-arm-linux-gnueabi
# 
https://github.com/0day-ci/linux/commit/83df1f983ed4219556020bf30cc819e66bb7e69c
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Antony-Yu/ARM-fix-__div64_32-error-when-compiling-with-clang/20201123-154454
git checkout 83df1f983ed4219556020bf30cc819e66bb7e69c
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   drivers/tty/serial/imx.c:359:19: warning: unused function 
'imx_uart_is_imx21' [-Wunused-function]
   static inline int imx_uart_is_imx21(struct imx_port *sport)
 ^
   drivers/tty/serial/imx.c:364:19: warning: unused function 
'imx_uart_is_imx53' [-Wunused-function]
   static inline int imx_uart_is_imx53(struct imx_port *sport)
 ^
   drivers/tty/serial/imx.c:369:19: warning: unused function 
'imx_uart_is_imx6q' [-Wunused-function]
   static inline int imx_uart_is_imx6q(struct imx_port *sport)
 ^
   In file included from drivers/tty/serial/imx.c:11:
   In file included from include/linux/module.h:12:
   In file included from include/linux/list.h:9:
   In file included from include/linux/kernel.h:19:
>> arch/arm/include/asm/div64.h:39:2: error: multiple outputs to hard register: 
>> r0
   asm(__asmeq("%0", __xh)
   ^
   3 warnings and 1 error generated.
--
   In file included from drivers/gpu/drm/tegra/dc.c:7:
   In file included from include/linux/clk.h:13:
   In file included from include/linux/kernel.h:19:
>> arch/arm/include/asm/div64.h:39:2: error: multiple outputs to hard register: 
>> r0
   asm(__asmeq("%0", __xh)
   ^
   1 error generated.

vim +39 arch/arm/include/asm/div64.h

^1da177e4c3f415 include/asm-arm/div64.h  Linus Torvalds 2005-04-16  32  
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  33  
static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  34  {
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  35  
register unsigned int __base  asm("r4") = base;
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  36  
register unsigned long long __n   asm("r0") = *n;
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  37  
register unsigned long long __res asm("r2");
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  38  
register unsigned int __rem   asm(__xh);
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02 @39  
asm(__asmeq("%0", __xh)
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  40  
__asmeq("%1", "r2")
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  41  
__asmeq("%2", "r0")
83df1f983ed4219 arch/arm/include/asm/div64.h Antony Yu  2020-11-23  42  
__asmeq("%3", "r0")
83df1f983ed4219 arch/arm/include/asm/div64.h Antony Yu  2020-11-23  43  
__asmeq("%4", "r4")
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  44  
"bl __do_div64"
83df1f983ed4219 arch/arm/include/asm/div64.h Antony Yu  2020-11-23  45  
: "=r" (__rem), "=r" (__res), "=r" (__n)
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  46  
: "r" (__n), "r" (__base)
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  47  
: "ip", "lr", "cc");
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  48  
*n = __res;
040b323b5012b55 arch/arm/include/asm/div64.h Nicolas Pitre  2015-11-02  49  
return __rem;
040b323b5012b55 

Re: [RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-24 Thread Nick Desaulniers
Thanks for the report, it probably was not fun to debug. I'll take a
closer look at this after the Thanksgiving holiday.

On Tue, Nov 24, 2020 at 2:14 AM Antony Yu  wrote:
>
> Antony Yu  於 2020年11月24日 週二 下午3:43寫道:
> >
> > On Mon, Nov 23, 2020 at 11:16:02AM -0700, Nathan Chancellor wrote:
> > > On Mon, Nov 23, 2020 at 03:36:32PM +0800, Antony Yu wrote:
> > > > __do_div64 clobbers the input register r0 in little endian system.
> > > > According to the inline assembly document, if an input operand is
> > > > modified, it should be tied to a output operand. This patch can
> > > > prevent compilers from reusing r0 register after asm statements.
> > > >
> > > > Signed-off-by: Antony Yu 
> > > > ---
> > > >  arch/arm/include/asm/div64.h | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > > index 898e9c78a7e7..809efc51e90f 100644
> > > > --- a/arch/arm/include/asm/div64.h
> > > > +++ b/arch/arm/include/asm/div64.h
> > > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, 
> > > > uint32_t base)
> > > > asm(__asmeq("%0", __xh)
> > > > __asmeq("%1", "r2")
> > > > __asmeq("%2", "r0")
> > > > -   __asmeq("%3", "r4")
> > > > +   __asmeq("%3", "r0")
> > > > +   __asmeq("%4", "r4")
> > > > "bl __do_div64"
> > > > -   : "=r" (__rem), "=r" (__res)
> > > > +   : "=r" (__rem), "=r" (__res), "=r" (__n)
> > > > : "r" (__n), "r" (__base)
> > > > : "ip", "lr", "cc");
> > > > *n = __res;
> > > > --
> > > > 2.23.0
> > > >
> > >
> > > I am not sure that I am qualified to review this (my assembly knowledge
> > > is not the best) but your commit title mentions an error when compiling
> > > with clang. What is the exact error, what configuration generates it,
> > > and what version of clang? We have done fairly decent testing for
> > > 32-bit ARM, I would like to know what we are missing.
> > >
> > > Cheers,
> > > Nathan
> >
> > We have run fail on android R vts vts_libsnapshot_test with kernel 4.14.
> > This bug is triggered accidently by a workaround patch in our code base.
> > It is fine on a pure clean 4.14 branch since __do_div64 may not be
> > executed in skip_metadata.
> >
> > The attachment are .i and generated .s file. .s file can be reproduced
> > with clang -target arm-linux-eabi -march=armv8.2-a -O2.
> >
> > In function skip_metadata, it loads some value to r0, calls __do_div64,
> > adds 1 to r0 and stores it to [r5]. It gets wrong value since __do_div64
> > clobbers r0 register.
> >
> > We have tried clang-10, clang-11 and android prebuilt clang-r383902b. All
> > of them have the same problem.
>
> Sorry for the large attachment.
> I put .i and .s files on
> https://gist.github.com/penyung/274b0c697062a1c776994bb40243cfff
>
> Antony Yu



-- 
Thanks,
~Nick Desaulniers


Re: [RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-24 Thread Antony Yu
Antony Yu  於 2020年11月24日 週二 下午3:43寫道:
>
> On Mon, Nov 23, 2020 at 11:16:02AM -0700, Nathan Chancellor wrote:
> > On Mon, Nov 23, 2020 at 03:36:32PM +0800, Antony Yu wrote:
> > > __do_div64 clobbers the input register r0 in little endian system.
> > > According to the inline assembly document, if an input operand is
> > > modified, it should be tied to a output operand. This patch can
> > > prevent compilers from reusing r0 register after asm statements.
> > >
> > > Signed-off-by: Antony Yu 
> > > ---
> > >  arch/arm/include/asm/div64.h | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> > > index 898e9c78a7e7..809efc51e90f 100644
> > > --- a/arch/arm/include/asm/div64.h
> > > +++ b/arch/arm/include/asm/div64.h
> > > @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, 
> > > uint32_t base)
> > > asm(__asmeq("%0", __xh)
> > > __asmeq("%1", "r2")
> > > __asmeq("%2", "r0")
> > > -   __asmeq("%3", "r4")
> > > +   __asmeq("%3", "r0")
> > > +   __asmeq("%4", "r4")
> > > "bl __do_div64"
> > > -   : "=r" (__rem), "=r" (__res)
> > > +   : "=r" (__rem), "=r" (__res), "=r" (__n)
> > > : "r" (__n), "r" (__base)
> > > : "ip", "lr", "cc");
> > > *n = __res;
> > > --
> > > 2.23.0
> > >
> >
> > I am not sure that I am qualified to review this (my assembly knowledge
> > is not the best) but your commit title mentions an error when compiling
> > with clang. What is the exact error, what configuration generates it,
> > and what version of clang? We have done fairly decent testing for
> > 32-bit ARM, I would like to know what we are missing.
> >
> > Cheers,
> > Nathan
>
> We have run fail on android R vts vts_libsnapshot_test with kernel 4.14.
> This bug is triggered accidently by a workaround patch in our code base.
> It is fine on a pure clean 4.14 branch since __do_div64 may not be
> executed in skip_metadata.
>
> The attachment are .i and generated .s file. .s file can be reproduced
> with clang -target arm-linux-eabi -march=armv8.2-a -O2.
>
> In function skip_metadata, it loads some value to r0, calls __do_div64,
> adds 1 to r0 and stores it to [r5]. It gets wrong value since __do_div64
> clobbers r0 register.
>
> We have tried clang-10, clang-11 and android prebuilt clang-r383902b. All
> of them have the same problem.

Sorry for the large attachment.
I put .i and .s files on
https://gist.github.com/penyung/274b0c697062a1c776994bb40243cfff

Antony Yu


Re: [RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-23 Thread Nathan Chancellor
On Mon, Nov 23, 2020 at 03:36:32PM +0800, Antony Yu wrote:
> __do_div64 clobbers the input register r0 in little endian system.
> According to the inline assembly document, if an input operand is
> modified, it should be tied to a output operand. This patch can
> prevent compilers from reusing r0 register after asm statements.
> 
> Signed-off-by: Antony Yu 
> ---
>  arch/arm/include/asm/div64.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
> index 898e9c78a7e7..809efc51e90f 100644
> --- a/arch/arm/include/asm/div64.h
> +++ b/arch/arm/include/asm/div64.h
> @@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t 
> base)
>   asm(__asmeq("%0", __xh)
>   __asmeq("%1", "r2")
>   __asmeq("%2", "r0")
> - __asmeq("%3", "r4")
> + __asmeq("%3", "r0")
> + __asmeq("%4", "r4")
>   "bl __do_div64"
> - : "=r" (__rem), "=r" (__res)
> + : "=r" (__rem), "=r" (__res), "=r" (__n)
>   : "r" (__n), "r" (__base)
>   : "ip", "lr", "cc");
>   *n = __res;
> -- 
> 2.23.0
> 

I am not sure that I am qualified to review this (my assembly knowledge
is not the best) but your commit title mentions an error when compiling
with clang. What is the exact error, what configuration generates it,
and what version of clang? We have done fairly decent testing for
32-bit ARM, I would like to know what we are missing.

Cheers,
Nathan


[RESEND,PATCH] ARM: fix __div64_32() error when compiling with clang

2020-11-22 Thread Antony Yu
__do_div64 clobbers the input register r0 in little endian system.
According to the inline assembly document, if an input operand is
modified, it should be tied to a output operand. This patch can
prevent compilers from reusing r0 register after asm statements.

Signed-off-by: Antony Yu 
---
 arch/arm/include/asm/div64.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/div64.h b/arch/arm/include/asm/div64.h
index 898e9c78a7e7..809efc51e90f 100644
--- a/arch/arm/include/asm/div64.h
+++ b/arch/arm/include/asm/div64.h
@@ -39,9 +39,10 @@ static inline uint32_t __div64_32(uint64_t *n, uint32_t base)
asm(__asmeq("%0", __xh)
__asmeq("%1", "r2")
__asmeq("%2", "r0")
-   __asmeq("%3", "r4")
+   __asmeq("%3", "r0")
+   __asmeq("%4", "r4")
"bl __do_div64"
-   : "=r" (__rem), "=r" (__res)
+   : "=r" (__rem), "=r" (__res), "=r" (__n)
: "r" (__n), "r" (__base)
: "ip", "lr", "cc");
*n = __res;
-- 
2.23.0