Re: [U-Boot] imx: get_ticks in syscounter.c get miscompiled by GCC 6

2018-03-09 Thread Yasushi SHOJI
Hi,

On Wed, Mar 7, 2018 at 11:27 PM, Tom Rini  wrote:
> On Wed, Mar 07, 2018 at 10:42:44AM -0300, Fabio Estevam wrote:
>> Patch looks good. Make sure to add your Signed-off-by line, then you
>> can send it via git send-email.
>
> Yes please, thanks!

I've sent it to you with my signed-of-by.

thanks,
-- 
  yashi
___
U-Boot mailing list
U-Boot@lists.denx.de
https://lists.denx.de/listinfo/u-boot


Re: [U-Boot] imx: get_ticks in syscounter.c get miscompiled by GCC 6

2018-03-07 Thread Tom Rini
On Wed, Mar 07, 2018 at 10:42:44AM -0300, Fabio Estevam wrote:
> Hi Yasushi ,
> 
> On Wed, Mar 7, 2018 at 2:57 AM, Yasushi SHOJI  wrote:
> 
> > Do you guys really want to put volatile on all of these now?
> > We are at rc4 and Tom is planing to cut the release
> > March 12th.
> 
> This can be done at a later step.

Yes.  And it should be a little bit manual too.  For example, using your
regex (thanks!) I see we have some powerpc code that's doing
asm("eieio") and that should be eieio() (which is in turn an asm
volatile ...), as well as some sync;isync or just sync/isync that should
be sync();isync(); or similar.  And people that know x86 might have some
useful comments there too.

> > I'm attaching a tentative patch to fix only syscounter.c.
> > If it looks good, I'l resend it by git-send-email.
> 
> Patch looks good. Make sure to add your Signed-off-by line, then you
> can send it via git send-email.

Yes please, thanks!

-- 
Tom


signature.asc
Description: PGP signature
___
U-Boot mailing list
U-Boot@lists.denx.de
https://lists.denx.de/listinfo/u-boot


Re: [U-Boot] imx: get_ticks in syscounter.c get miscompiled by GCC 6

2018-03-07 Thread Fabio Estevam
Hi Yasushi ,

On Wed, Mar 7, 2018 at 2:57 AM, Yasushi SHOJI  wrote:

> Do you guys really want to put volatile on all of these now?
> We are at rc4 and Tom is planing to cut the release
> March 12th.

This can be done at a later step.

> I'm attaching a tentative patch to fix only syscounter.c.
> If it looks good, I'l resend it by git-send-email.

Patch looks good. Make sure to add your Signed-off-by line, then you
can send it via git send-email.

Thanks
___
U-Boot mailing list
U-Boot@lists.denx.de
https://lists.denx.de/listinfo/u-boot


Re: [U-Boot] imx: get_ticks in syscounter.c get miscompiled by GCC 6

2018-03-06 Thread Yasushi SHOJI
Hi,

On Tue, Mar 6, 2018 at 10:11 PM, Fabio Estevam  wrote:
> On Tue, Mar 6, 2018 at 9:31 AM, Lothar Waßmann  
> wrote:
>
>> Without the 'volatile' attribute the compiler is entitled to move the
>> asm code around or optimize it out.
>> So, your patch is the correct fix independent from the gcc version
>> used.
>
> Yes, but then it would be better to fix all the places where asm is
> used without volatile.

That'd be a quite big patch.  A quick grep shows me that
there is over 100 asm() without volatile.

git grep -e '\basm *(' | grep  -v -e volatile -e nop | wc -l
153

Do you guys really want to put volatile on all of these now?
We are at rc4 and Tom is planing to cut the release
March 12th.

I'm attaching a tentative patch to fix only syscounter.c.
If it looks good, I'l resend it by git-send-email.

Best,
-- 
yashi
From 8772a9b5b532da4c1d0656ffa21f324e72d369e8 Mon Sep 17 00:00:00 2001
From: Yasushi SHOJI 
Date: Wed, 7 Mar 2018 14:38:05 +0900
Subject: [PATCH] imx: syscounter: make sure asm is volatile

Without the volatile attribute, compilers are entitled to optimize out
the same asm().  In the case of __udelay() in syscounter.c, it calls
`get_ticks()` twice, one for the starting time and the second in the
loop to check the current time.  When compilers inline `get_ticks()`
they see the same `mrrc` instructions and optimize out the second one.
This leads to infinite loop since we don't get updated value from the
system counter.

Here is a portion of the disassembly of __udelay:

  88:	428b  	cmp	r3, r1
  8a:	f8ce 20a4 	str.w	r2, [lr, #164]	; 0xa4
  8e:	bf08  	it	eq
  90:	4282  	cmpeq	r2, r0
  92:	f8ce 30a0 	str.w	r3, [lr, #160]	; 0xa0
  96:	d3f7  	bcc.n	88 <__udelay+0x88>
  98:	e8bd 8cf0 	ldmia.w	sp!, {r4, r5, r6, r7, sl, fp, pc}

Note that final jump / loop at 96 to 88, we don't have any `mrrc`.

With a volatile attribute, the above changes to this:

  8a:	ec53 2f0e 	mrrc	15, 0, r2, r3, cr14
  8e:	42ab  	cmp	r3, r5
  90:	f8c1 20a4 	str.w	r2, [r1, #164]	; 0xa4
  94:	bf08  	it	eq
  96:	42a2  	cmpeq	r2, r4
  98:	f8c1 30a0 	str.w	r3, [r1, #160]	; 0xa0
  9c:	d3f5  	bcc.n	8a <__udelay+0x8a>
  9e:	e8bd 8cf0 	ldmia.w	sp!, {r4, r5, r6, r7, sl, fp, pc}
  a2:	bf00  	nop

I'm advised to put volatile on all asm(), so this commit also adds it
to the asm() in timer_init().
---
 arch/arm/mach-imx/syscounter.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-imx/syscounter.c b/arch/arm/mach-imx/syscounter.c
index 9290918dca..1d4ebfe343 100644
--- a/arch/arm/mach-imx/syscounter.c
+++ b/arch/arm/mach-imx/syscounter.c
@@ -62,7 +62,7 @@ int timer_init(void)
 	unsigned long val, freq;
 
 	freq = CONFIG_SC_TIMER_CLK;
-	asm("mcr p15, 0, %0, c14, c0, 0" : : "r" (freq));
+	asm volatile("mcr p15, 0, %0, c14, c0, 0" : : "r" (freq));
 
 	writel(freq, >cntfid0);
 
@@ -82,7 +82,7 @@ unsigned long long get_ticks(void)
 {
 	unsigned long long now;
 
-	asm("mrrc p15, 0, %Q0, %R0, c14" : "=r" (now));
+	asm volatile("mrrc p15, 0, %Q0, %R0, c14" : "=r" (now));
 
 	gd->arch.tbl = (unsigned long)(now & 0x);
 	gd->arch.tbu = (unsigned long)(now >> 32);
-- 
2.16.2

___
U-Boot mailing list
U-Boot@lists.denx.de
https://lists.denx.de/listinfo/u-boot


Re: [U-Boot] imx: get_ticks in syscounter.c get miscompiled by GCC 6

2018-03-06 Thread Fabio Estevam
On Tue, Mar 6, 2018 at 9:31 AM, Lothar Waßmann  wrote:

> Without the 'volatile' attribute the compiler is entitled to move the
> asm code around or optimize it out.
> So, your patch is the correct fix independent from the gcc version
> used.

Yes, but then it would be better to fix all the places where asm is
used without volatile.
___
U-Boot mailing list
U-Boot@lists.denx.de
https://lists.denx.de/listinfo/u-boot


Re: [U-Boot] imx: get_ticks in syscounter.c get miscompiled by GCC 6

2018-03-06 Thread Lothar Waßmann
Hi,

On Tue, 6 Mar 2018 15:06:08 +0900 Yasushi SHOJI wrote:
> Hi,
> 
> It seems to me that both GCC 6.3 and 6.4 mis-compiles
>
s/mis-compiles/optimizes/

Without the 'volatile' attribute the compiler is entitled to move the
asm code around or optimize it out.
So, your patch is the correct fix independent from the gcc version
used.

> arch/arm/mach-imx/syscounter.c.
> 
> I'm attaching two files, bad.txt is the original syscounter.c and
> good.txt is the one
> with the following patch.
> 
> diff --git a/arch/arm/mach-imx/syscounter.c b/arch/arm/mach-imx/syscounter.c
> index 9290918dca..30ed0109a2 100644
> --- a/arch/arm/mach-imx/syscounter.c
> +++ b/arch/arm/mach-imx/syscounter.c
> @@ -82,7 +82,7 @@ unsigned long long get_ticks(void)
>  {
> unsigned long long now;
> 
> -   asm("mrrc p15, 0, %Q0, %R0, c14" : "=r" (now));
> +   asm volatile("mrrc p15, 0, %Q0, %R0, c14" : "=r" (now));
> 
> gd->arch.tbl = (unsigned long)(now & 0x);
> gd->arch.tbu = (unsigned long)(now >> 32);
> 


Lothar Waßmann
___
U-Boot mailing list
U-Boot@lists.denx.de
https://lists.denx.de/listinfo/u-boot


[U-Boot] imx: get_ticks in syscounter.c get miscompiled by GCC 6

2018-03-06 Thread Yasushi SHOJI
Hi,

It seems to me that both GCC 6.3 and 6.4 mis-compiles
arch/arm/mach-imx/syscounter.c.

I'm attaching two files, bad.txt is the original syscounter.c and
good.txt is the one
with the following patch.

diff --git a/arch/arm/mach-imx/syscounter.c b/arch/arm/mach-imx/syscounter.c
index 9290918dca..30ed0109a2 100644
--- a/arch/arm/mach-imx/syscounter.c
+++ b/arch/arm/mach-imx/syscounter.c
@@ -82,7 +82,7 @@ unsigned long long get_ticks(void)
 {
unsigned long long now;

-   asm("mrrc p15, 0, %Q0, %R0, c14" : "=r" (now));
+   asm volatile("mrrc p15, 0, %Q0, %R0, c14" : "=r" (now));

gd->arch.tbl = (unsigned long)(now & 0x);
gd->arch.tbu = (unsigned long)(now >> 32);


The target code is the while loop in the __udelay.
void __udelay(unsigned long usec)
{
unsigned long long tmp;
ulong tmo;

tmo = us_to_tick(usec);
tmp = get_ticks() + tmo; /* get current timestamp */

while (get_ticks() < tmp) /* loop till event */
/*NOP*/;
}


Here is the mis compiled asm from the above code (whole function is
attached as bad.txt)

  88: 428b  cmp r3, r1
  8a: f8ce 20a4 str.w r2, [lr, #164] ; 0xa4
  8e: bf08  it eq
  90: 4282  cmpeq r2, r0
  92: f8ce 30a0 str.w r3, [lr, #160] ; 0xa0
  96: d3f7  bcc.n 88 <__udelay+0x88>

Note that the last bcc.n to 88 and we don't see mrrc.

This seems to be that both get_ticks() are inlined and "mrrc"s are
duplicated in the
__udealy() and GCC sees it as an opportunity to optimize out.

GCC 5 and 8 seems to work fine.  Unfortunately I don't have GCC 7 ATM so no
idea how it compiles.

Does anyone see this?
-- 
yashi
Disassembly of section .text.__udelay:

 <__udelay>:
   0:   e92d 4cf0   stmdb   sp!, {r4, r5, r6, r7, sl, fp, lr}
   4:   ee1e 1f10   mrc 15, 0, r1, cr14, cr0, {0}
   8:   f244 243f   movwr4, #16959  ; 0x423f
   c:   f2c0 040f   movtr4, #15
  10:   2500movsr5, #0
  12:   f243 4edb   movwlr, #13531  ; 0x34db
  16:   fbe1 4500   umlal   r4, r5, r1, r0
  1a:   f2cd 7eb6   movtlr, #55222  ; 0xd7b6
  1e:   f64d 6c82   movwip, #56962  ; 0xde82
  22:   f2c4 3c1b   movtip, #17179  ; 0x431b
  26:   2300movsr3, #0
  28:   fba4 670e   umull   r6, r7, r4, lr
  2c:   4629mov r1, r5
  2e:   2500movsr5, #0
  30:   2600movsr6, #0
  32:   fba4 ab0c   umull   sl, fp, r4, ip
  36:   fb0e 7205   mla r2, lr, r5, r7
  3a:   fb0c bb05   mla fp, ip, r5, fp
  3e:   2500movsr5, #0
  40:   fbee 2301   umlal   r2, r3, lr, r1
  44:   46cemov lr, r9
  46:   eb1a 0a02   adds.w  sl, sl, r2
  4a:   eb4b 0b03   adc.w   fp, fp, r3
  4e:   459bcmp fp, r3
  50:   f64d 6382   movwr3, #56962  ; 0xde82
  54:   f2c4 331b   movtr3, #17179  ; 0x431b
  58:   465cmov r4, fp
  5a:   bf08it  eq
  5c:   4592cmpeq   sl, r2
  5e:   fbe3 4501   umlal   r4, r5, r3, r1
  62:   ec51 0f0e   mrrc15, 0, r0, r1, cr14
  66:   bf2cite cs
  68:   2700movcs   r7, #0
  6a:   2701movcc   r7, #1
  6c:   f8c9 00a4   str.w   r0, [r9, #164]  ; 0xa4
  70:   19a4addsr4, r4, r6
  72:   4602mov r2, r0
  74:   417dadcsr5, r7
  76:   f8c9 10a0   str.w   r1, [r9, #160]  ; 0xa0
  7a:   0ca4lsrsr4, r4, #18
  7c:   460bmov r3, r1
  7e:   ea44 3485   orr.w   r4, r4, r5, lsl #14
  82:   1900addsr0, r0, r4
  84:   f141 0100   adc.w   r1, r1, #0
  88:   428bcmp r3, r1
  8a:   f8ce 20a4   str.w   r2, [lr, #164]  ; 0xa4
  8e:   bf08it  eq
  90:   4282cmpeq   r2, r0
  92:   f8ce 30a0   str.w   r3, [lr, #160]  ; 0xa0
  96:   d3f7bcc.n   88 <__udelay+0x88>
  98:   e8bd 8cf0   ldmia.w sp!, {r4, r5, r6, r7, sl, fp, pc}
Disassembly of section .text.__udelay:

 <__udelay>:
   0:   e92d 4cf0   stmdb   sp!, {r4, r5, r6, r7, sl, fp, lr}
   4:   ee1e 1f10   mrc 15, 0, r1, cr14, cr0, {0}
   8:   f244 243f   movwr4, #16959  ; 0x423f
   c:   f2c0 040f   movtr4, #15
  10:   2500movsr5, #0
  12:   f243 4edb   movwlr, #13531  ; 0x34db
  16:   fbe1 4500   umlal   r4, r5, r1, r0
  1a:   f2cd 7eb6   movtlr, #55222  ; 0xd7b6
  1e:   f64d 6c82   movwip, #56962  ; 0xde82
  22:   f2c4 3c1b   movtip, #17179  ; 0x431b
  26:   2300movsr3, #0
  28:   f64d 6082   movwr0, #56962  ; 0xde82
  2c:   f2c4 301b   movtr0, #17179  ; 0x431b
  30:   fba4 670e   umull   r6, r7, r4, lr
  34:   4629mov r1, r5
  36:   2500movsr5, #0
  38:   fba4 ab0c   umull   sl, fp, r4, ip
  3c: