Re: Surprising code generated for vdso_read_begin()

2020-02-19 Thread Segher Boessenkool
On Wed, Feb 19, 2020 at 10:52:16AM +0100, Arnd Bergmann wrote:
> On Wed, Feb 19, 2020 at 9:45 AM Christophe Leroy
>  wrote:
> > Le 16/02/2020 à 19:10, Arnd Bergmann a écrit :
> > > On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
> > >  wrote:
> > >>
> > >> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> > >>> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
> >  It looks like the compiler did loop peeling.  What GCC version is this?
> >  Please try current trunk (to become GCC 10), or at least GCC 9?
> > >>>
> > >>> It is with GCC 5.5
> > >>>
> > >>> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
> > >>> recent than 8.1
> > >>
> > >> Arnd, can you update the tools?  We are at 8.3 and 9.2 now :-)  Or is
> > >> this hard and/or painful to do?
> > >
> > > To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 
> > > 9.2
> > > binaries, as well as a recent 10.0 snapshot.
> > >
> >
> > Thanks Arnd,
> >
> > I have built the VDSO with 9.2, I get less performant result than with
> > 8.2 (same performance as with 5.5).
> >
> > After a quick look, I see:
> > - Irrelevant NOPs to align loops and stuff, allthough -mpcu=860 should
> > avoid that.
> > - A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC
> > 8.1 don't need that, all VDSO functions are frameless with 8.1
> 
> If you think it should be fixed in gcc, maybe try to reproduce it in
> https://godbolt.org/

(Feel free to skip this step; and don't put links to godbolt (or anything
else external) in our bugzilla, please; such links go stale before you
know it.)

> and open a gcc bug against that.

Yes please :-)

> Also, please try the gcc-10 snapshot, which has the highest chance
> of getting fixes if it shows the same issue (or worse).

If it is a regression, chances are it will be backported.  (But not to
9.3, which is due in just a few weeks, just like 8.4).  If it is just a
side effect of some other change, it will probably *not* be undone, not
on trunk (GCC 10) either.  It depends.

But sure, always test trunk if you can.


Segher


Re: Surprising code generated for vdso_read_begin()

2020-02-19 Thread Arnd Bergmann
On Wed, Feb 19, 2020 at 9:45 AM Christophe Leroy
 wrote:
> Le 16/02/2020 à 19:10, Arnd Bergmann a écrit :
> > On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
> >  wrote:
> >>
> >> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> >>> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
>  It looks like the compiler did loop peeling.  What GCC version is this?
>  Please try current trunk (to become GCC 10), or at least GCC 9?
> >>>
> >>> It is with GCC 5.5
> >>>
> >>> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
> >>> recent than 8.1
> >>
> >> Arnd, can you update the tools?  We are at 8.3 and 9.2 now :-)  Or is
> >> this hard and/or painful to do?
> >
> > To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2
> > binaries, as well as a recent 10.0 snapshot.
> >
>
> Thanks Arnd,
>
> I have built the VDSO with 9.2, I get less performant result than with
> 8.2 (same performance as with 5.5).
>
> After a quick look, I see:
> - Irrelevant NOPs to align loops and stuff, allthough -mpcu=860 should
> avoid that.
> - A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC
> 8.1 don't need that, all VDSO functions are frameless with 8.1

If you think it should be fixed in gcc, maybe try to reproduce it in
https://godbolt.org/ and open a gcc bug against that.

Also, please try the gcc-10 snapshot, which has the highest chance
of getting fixes if it shows the same issue (or worse).

 Arnd


Re: Surprising code generated for vdso_read_begin()

2020-02-19 Thread Christophe Leroy




Le 16/02/2020 à 19:10, Arnd Bergmann a écrit :

On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
 wrote:


On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:

Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :

It looks like the compiler did loop peeling.  What GCC version is this?
Please try current trunk (to become GCC 10), or at least GCC 9?


It is with GCC 5.5

https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
recent than 8.1


Arnd, can you update the tools?  We are at 8.3 and 9.2 now :-)  Or is
this hard and/or painful to do?


To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2
binaries, as well as a recent 10.0 snapshot.



Thanks Arnd,

I have built the VDSO with 9.2, I get less performant result than with 
8.2 (same performance as with 5.5).


After a quick look, I see:
- Irrelevant NOPs to align loops and stuff, allthough -mpcu=860 should 
avoid that.
- A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC 
8.1 don't need that, all VDSO functions are frameless with 8.1


Christophe


Re: Surprising code generated for vdso_read_begin()

2020-02-16 Thread Arnd Bergmann
On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
 wrote:
>
> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> > Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
> > >It looks like the compiler did loop peeling.  What GCC version is this?
> > >Please try current trunk (to become GCC 10), or at least GCC 9?
> >
> > It is with GCC 5.5
> >
> > https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
> > recent than 8.1
>
> Arnd, can you update the tools?  We are at 8.3 and 9.2 now :-)  Or is
> this hard and/or painful to do?

To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2
binaries, as well as a recent 10.0 snapshot.

I hope these work, let me know if there are problems.

   Arnd


Re: Surprising code generated for vdso_read_begin()

2020-01-11 Thread Segher Boessenkool
On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
> >It looks like the compiler did loop peeling.  What GCC version is this?
> >Please try current trunk (to become GCC 10), or at least GCC 9?
> 
> It is with GCC 5.5
> 
> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more 
> recent than 8.1

Arnd, can you update the tools?  We are at 8.3 and 9.2 now :-)  Or is
this hard and/or painful to do?

> With 8.1, the problem doesn't show up.

Good to hear that.  Hrm, I wonder when it was fixed...


Segher


Re: Surprising code generated for vdso_read_begin()

2020-01-09 Thread Christophe Leroy




Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :

On Thu, Jan 09, 2020 at 05:52:34PM +, Christophe Leroy wrote:

Wondering why we get something so complicated/redundant for
vdso_read_begin() 

static __always_inline u32 vdso_read_begin(const struct vdso_data *vd)
{
u32 seq;

while ((seq = READ_ONCE(vd->seq)) & 1)
cpu_relax();

smp_rmb();
return seq;
}


  6e0:   81 05 00 f0 lwz r8,240(r5)
  6e4:   71 09 00 01 andi.   r9,r8,1
  6e8:   41 82 00 10 beq 6f8 <__c_kernel_clock_gettime+0x158>
  6ec:   81 05 00 f0 lwz r8,240(r5)
  6f0:   71 0a 00 01 andi.   r10,r8,1
  6f4:   40 82 ff f8 bne 6ec <__c_kernel_clock_gettime+0x14c>
  6f8:

r5 being vd pointer

Why the first triplet, not only the second triplet ? Something wrong
with using READ_ONCE() for that ?


It looks like the compiler did loop peeling.  What GCC version is this?
Please try current trunk (to become GCC 10), or at least GCC 9?



It is with GCC 5.5

https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more 
recent than 8.1


With 8.1, the problem doesn't show up.

Thanks
Christophe


Re: Surprising code generated for vdso_read_begin()

2020-01-09 Thread Segher Boessenkool
On Thu, Jan 09, 2020 at 05:52:34PM +, Christophe Leroy wrote:
> Wondering why we get something so complicated/redundant for 
> vdso_read_begin() 
> 
> static __always_inline u32 vdso_read_begin(const struct vdso_data *vd)
> {
>   u32 seq;
> 
>   while ((seq = READ_ONCE(vd->seq)) & 1)
>   cpu_relax();
> 
>   smp_rmb();
>   return seq;
> }
> 
> 
>  6e0:   81 05 00 f0 lwz r8,240(r5)
>  6e4:   71 09 00 01 andi.   r9,r8,1
>  6e8:   41 82 00 10 beq 6f8 <__c_kernel_clock_gettime+0x158>
>  6ec:   81 05 00 f0 lwz r8,240(r5)
>  6f0:   71 0a 00 01 andi.   r10,r8,1
>  6f4:   40 82 ff f8 bne 6ec <__c_kernel_clock_gettime+0x14c>
>  6f8:
> 
> r5 being vd pointer
> 
> Why the first triplet, not only the second triplet ? Something wrong 
> with using READ_ONCE() for that ?

It looks like the compiler did loop peeling.  What GCC version is this?
Please try current trunk (to become GCC 10), or at least GCC 9?


Segher