Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-06 Thread Michael Ellerman
Segher Boessenkool  writes:
> On Thu, Aug 06, 2020 at 12:03:33PM +1000, Michael Ellerman wrote:
>> Segher Boessenkool  writes:
>> > On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote:
>> >> Christophe Leroy  writes:
>> >> > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack 
>> >> > frame whenever it has anything to same.
>
> ^^^
>
>> >> > fbb60:  94 21 ff e0 stwur1,-32(r1)
>> >
>> > This is the *only* place where you can use a negative offset from r1:
>> > in the stwu to extend the stack (set up a new stack frame, or make the
>> > current one bigger).
>> 
>> (You're talking about 32-bit code here right?)
>
> The "SYSV" ELF binding, yeah, which is used for 32-bit on Linux (give or
> take, ho hum).
>
> The ABIs that have a red zone are much nicer here (but less simple) :-)

Yep, just checking I wasn't misunderstanding your comment about negative
offsets.

>> >> At the same time it's much safer for us to just save/restore r2, and
>> >> probably in the noise performance wise.
>> >
>> > If you want a function to be able to work with ABI-compliant code safely
>> > (in all cases), you'll have to make it itself ABI-compliant as well,
>> > yes :-)
>> 
>> True. Except this is the VDSO which has previously been a bit wild west
>> as far as ABI goes :)
>
> It could get away with many things because it was guaranteed to be a
> leaf function.  Some of those things even violate the ABIs, but you can
> get away with it easily, much reduced scope.  Now if this is generated
> code, violating the rules will catch up with you sooner rather than
> later ;-)

Agreed.

cheers


Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-06 Thread Segher Boessenkool
Hi!

On Thu, Aug 06, 2020 at 12:03:33PM +1000, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote:
> >> Christophe Leroy  writes:
> >> > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack 
> >> > frame whenever it has anything to same.

^^^

> >> > fbb60:   94 21 ff e0 stwur1,-32(r1)
> >
> > This is the *only* place where you can use a negative offset from r1:
> > in the stwu to extend the stack (set up a new stack frame, or make the
> > current one bigger).
> 
> (You're talking about 32-bit code here right?)

The "SYSV" ELF binding, yeah, which is used for 32-bit on Linux (give or
take, ho hum).

The ABIs that have a red zone are much nicer here (but less simple) :-)

> >> At the same time it's much safer for us to just save/restore r2, and
> >> probably in the noise performance wise.
> >
> > If you want a function to be able to work with ABI-compliant code safely
> > (in all cases), you'll have to make it itself ABI-compliant as well,
> > yes :-)
> 
> True. Except this is the VDSO which has previously been a bit wild west
> as far as ABI goes :)

It could get away with many things because it was guaranteed to be a
leaf function.  Some of those things even violate the ABIs, but you can
get away with it easily, much reduced scope.  Now if this is generated
code, violating the rules will catch up with you sooner rather than
later ;-)


Segher


Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-05 Thread Michael Ellerman
Segher Boessenkool  writes:
> On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote:
>> Christophe Leroy  writes:
>> > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack 
>> > frame whenever it has anything to same.
>> 
>> Yeah OK that would explain it.
>> 
>> > Here is what I have in libc.so:
>> >
>> > 000fbb60 <__clock_gettime>:
>> > fbb60: 94 21 ff e0 stwur1,-32(r1)
>
> This is the *only* place where you can use a negative offset from r1:
> in the stwu to extend the stack (set up a new stack frame, or make the
> current one bigger).

(You're talking about 32-bit code here right?)

>> > diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
>> > b/arch/powerpc/include/asm/vdso/gettimeofday.h
>> > index a0712a6e80d9..0b6fa245d54e 100644
>> > --- a/arch/powerpc/include/asm/vdso/gettimeofday.h
>> > +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
>> > @@ -10,6 +10,7 @@
>> > .cfi_startproc
>> >PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
>> >mflrr0
>> > +  PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
>> > .cfi_register lr, r0
>> 
>> The cfi_register should come directly after the mflr I think.
>
> That is the idiomatic way to write it, and most obviously correct.  But
> as long as the value in LR at function entry is available in multiple
> places (like, in LR and in R0 here), it is fine to use either for
> unwinding.  Sometimes you can use this to optimise the unwind tables a
> bit -- not really worth it in hand-written code, it's more important to
> make it legible ;-)

OK. Because LR still holds the LR value until it's clobbered later, by
which point the cfi_register has taken effect.

But yeah I think for readability it's best to keep the cfi_register next
to the mflr.

>> >> There's also no code to load/restore the TOC pointer on BE, which I
>> >> think we'll need to handle.
>> >
>> > I see no code in the generated vdso64.so doing anything with r2, but if 
>> > you think that's needed, just let's do it:
>> 
>> Hmm, true.
>> 
>> The compiler will use the toc for globals (and possibly also for large
>> constants?)
>
> And anything else it bloody well wants to, yeah :-)

Haha yeah OK.

>> AFAIK there's no way to disable use of the toc, or make it a build error
>> if it's needed.
>
> Yes.
>
>> At the same time it's much safer for us to just save/restore r2, and
>> probably in the noise performance wise.
>
> If you want a function to be able to work with ABI-compliant code safely
> (in all cases), you'll have to make it itself ABI-compliant as well,
> yes :-)

True. Except this is the VDSO which has previously been a bit wild west
as far as ABI goes :)

>> So yeah we should probably do as below.
>
> [ snip ]
>
> Looks good yes.

Thanks for reviewing.

cheers


Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-05 Thread Segher Boessenkool
Hi!

On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote:
> Christophe Leroy  writes:
> > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack 
> > frame whenever it has anything to same.
> 
> Yeah OK that would explain it.
> 
> > Here is what I have in libc.so:
> >
> > 000fbb60 <__clock_gettime>:
> > fbb60:  94 21 ff e0 stwur1,-32(r1)

This is the *only* place where you can use a negative offset from r1:
in the stwu to extend the stack (set up a new stack frame, or make the
current one bigger).

> > diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
> > b/arch/powerpc/include/asm/vdso/gettimeofday.h
> > index a0712a6e80d9..0b6fa245d54e 100644
> > --- a/arch/powerpc/include/asm/vdso/gettimeofday.h
> > +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
> > @@ -10,6 +10,7 @@
> > .cfi_startproc
> > PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
> > mflrr0
> > +   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
> > .cfi_register lr, r0
> 
> The cfi_register should come directly after the mflr I think.

That is the idiomatic way to write it, and most obviously correct.  But
as long as the value in LR at function entry is available in multiple
places (like, in LR and in R0 here), it is fine to use either for
unwinding.  Sometimes you can use this to optimise the unwind tables a
bit -- not really worth it in hand-written code, it's more important to
make it legible ;-)

> >> There's also no code to load/restore the TOC pointer on BE, which I
> >> think we'll need to handle.
> >
> > I see no code in the generated vdso64.so doing anything with r2, but if 
> > you think that's needed, just let's do it:
> 
> Hmm, true.
> 
> The compiler will use the toc for globals (and possibly also for large
> constants?)

And anything else it bloody well wants to, yeah :-)

> AFAIK there's no way to disable use of the toc, or make it a build error
> if it's needed.

Yes.

> At the same time it's much safer for us to just save/restore r2, and
> probably in the noise performance wise.

If you want a function to be able to work with ABI-compliant code safely
(in all cases), you'll have to make it itself ABI-compliant as well,
yes :-)

> So yeah we should probably do as below.

[ snip ]

Looks good yes.


Segher


Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-05 Thread Michael Ellerman
Christophe Leroy  writes:
> On 07/15/2020 01:04 AM, Michael Ellerman wrote:
>> Christophe Leroy  writes:
>>> Prepare for switching VDSO to generic C implementation in following
>>> patch. Here, we:
>>> - Modify __get_datapage() to take an offset
>>> - Prepare the helpers to call the C VDSO functions
>>> - Prepare the required callbacks for the C VDSO functions
>>> - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
>>> - Add the C trampolines to the generic C VDSO functions
>>>
>>> powerpc is a bit special for VDSO as well as system calls in the
>>> way that it requires setting CR SO bit which cannot be done in C.
>>> Therefore, entry/exit needs to be performed in ASM.
>>>
>>> Implementing __arch_get_vdso_data() would clobber the link register,
>>> requiring the caller to save it. As the ASM calling function already
>>> has to set a stack frame and saves the link register before calling
>>> the C vdso function, retriving the vdso data pointer there is lighter.
>> ...
>> 
>>> diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
>>> b/arch/powerpc/include/asm/vdso/gettimeofday.h
>>> new file mode 100644
>>> index ..4452897f9bd8
>>> --- /dev/null
>>> +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
>>> @@ -0,0 +1,175 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +#ifndef __ASM_VDSO_GETTIMEOFDAY_H
>>> +#define __ASM_VDSO_GETTIMEOFDAY_H
>>> +
>>> +#include 
>>> +
>>> +#ifdef __ASSEMBLY__
>>> +
>>> +.macro cvdso_call funct
>>> +  .cfi_startproc
>>> +   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
>>> +   mflrr0
>>> +  .cfi_register lr, r0
>>> +   PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
>> 
>> This doesn't work for me on ppc64(le) with glibc.
>> 
>> glibc doesn't create a stack frame before making the VDSO call, so the
>> store of r0 (LR) goes into the caller's frame, corrupting the saved LR,
>> leading to an infinite loop.
>> 
>> This is an example from a statically built program that calls
>> clock_gettime():
>> 
>> 10030cb0 <__clock_gettime>:
>>  10030cb0:   0e 10 40 3c lis r2,4110
>>  10030cb4:   00 7a 42 38 addir2,r2,31232
>>  10030cb8:   a6 02 08 7c mflrr0
>>  10030cbc:   ff ff 22 3d addis   r9,r2,-1
>>  10030cc0:   58 6d 29 39 addir9,r9,27992
>>  10030cc4:   f0 ff c1 fb std r30,-16(r1) <-- 
>> redzone store
>>  10030cc8:   78 23 9e 7c mr  r30,r4
>>  10030ccc:   f8 ff e1 fb std r31,-8(r1)  <-- 
>> redzone store
>>  10030cd0:   78 1b 7f 7c mr  r31,r3
>>  10030cd4:   10 00 01 f8 std r0,16(r1)   <-- 
>> save LR to caller's frame
>>  10030cd8:   00 00 09 e8 ld  r0,0(r9)
>>  10030cdc:   00 00 20 2c cmpdi   r0,0
>>  10030ce0:   50 00 82 41 beq 10030d30 <__clock_gettime+0x80>
>>  10030ce4:   a6 03 09 7c mtctr   r0
>>  10030ce8:   21 04 80 4e bctrl   
>> <-- vdso call
>>  10030cec:   26 00 00 7c mfcrr0
>>  10030cf0:   00 10 09 74 andis.  r9,r0,4096
>>  10030cf4:   78 1b 69 7c mr  r9,r3
>>  10030cf8:   28 00 82 40 bne 10030d20 <__clock_gettime+0x70>
>>  10030cfc:   b4 07 23 7d extsw   r3,r9
>>  10030d00:   10 00 01 e8 ld  r0,16(r1)   <-- 
>> load saved LR, since clobbered by the VDSO
>>  10030d04:   f0 ff c1 eb ld  r30,-16(r1)
>>  10030d08:   f8 ff e1 eb ld  r31,-8(r1)
>>  10030d0c:   a6 03 08 7c mtlrr0  <-- 
>> restore LR
>>  10030d10:   20 00 80 4e blr <-- 
>> jumps to 10030cec
>> 
>> 
>> I'm kind of confused how it worked for you on 32-bit.
>
> Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack 
> frame whenever it has anything to same.

Yeah OK that would explain it.

> Here is what I have in libc.so:
>
> 000fbb60 <__clock_gettime>:
> fbb60:94 21 ff e0 stwur1,-32(r1)
> fbb64:7c 08 02 a6 mflrr0
> fbb68:48 09 75 c9 bl  193130 <_IO_stdout_+0x24b0>
> fbb6c:93 c1 00 18 stw r30,24(r1)
> fbb70:7f c8 02 a6 mflrr30
> fbb74:90 01 00 24 stw r0,36(r1)
> fbb78:93 81 00 10 stw r28,16(r1)
> fbb7c:93 a1 00 14 stw r29,20(r1)
> fbb80:83 9e fc 98 lwz r28,-872(r30)
> fbb84:93 e1 00 1c stw r31,28(r1)
> fbb88:80 1c 00 00 lwz r0,0(r28)
> fbb8c:83 82 8f f4 lwz r28,-28684(r2)
> fbb90:7c 7f 1b 78 mr  r31,r3
> fbb94:7c 00 e2 79 xor.r0,r0,r28
> fbb98:7c 9d 23 78 mr  r29,r4
> fbb9c:41 82 00 40 beq fbbdc <__clock_gettime+0x7c>
> fbba0:7c 09 03 a6 mtctr   r0
> fbba4:4e 80 04 21 bctrl
> fbba8:7c 00 00 26 mfcrr0
> fbbac:74 1c 10 

Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-08-04 Thread Christophe Leroy




On 07/15/2020 01:04 AM, Michael Ellerman wrote:

Christophe Leroy  writes:

Prepare for switching VDSO to generic C implementation in following
patch. Here, we:
- Modify __get_datapage() to take an offset
- Prepare the helpers to call the C VDSO functions
- Prepare the required callbacks for the C VDSO functions
- Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
- Add the C trampolines to the generic C VDSO functions

powerpc is a bit special for VDSO as well as system calls in the
way that it requires setting CR SO bit which cannot be done in C.
Therefore, entry/exit needs to be performed in ASM.

Implementing __arch_get_vdso_data() would clobber the link register,
requiring the caller to save it. As the ASM calling function already
has to set a stack frame and saves the link register before calling
the C vdso function, retriving the vdso data pointer there is lighter.

...


diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
b/arch/powerpc/include/asm/vdso/gettimeofday.h
new file mode 100644
index ..4452897f9bd8
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -0,0 +1,175 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSO_GETTIMEOFDAY_H
+#define __ASM_VDSO_GETTIMEOFDAY_H
+
+#include 
+
+#ifdef __ASSEMBLY__
+
+.macro cvdso_call funct
+  .cfi_startproc
+   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
+   mflrr0
+  .cfi_register lr, r0
+   PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)


This doesn't work for me on ppc64(le) with glibc.

glibc doesn't create a stack frame before making the VDSO call, so the
store of r0 (LR) goes into the caller's frame, corrupting the saved LR,
leading to an infinite loop.

This is an example from a statically built program that calls
clock_gettime():

10030cb0 <__clock_gettime>:
 10030cb0:   0e 10 40 3c lis r2,4110
 10030cb4:   00 7a 42 38 addir2,r2,31232
 10030cb8:   a6 02 08 7c mflrr0
 10030cbc:   ff ff 22 3d addis   r9,r2,-1
 10030cc0:   58 6d 29 39 addir9,r9,27992
 10030cc4:   f0 ff c1 fb std r30,-16(r1)<-- 
redzone store
 10030cc8:   78 23 9e 7c mr  r30,r4
 10030ccc:   f8 ff e1 fb std r31,-8(r1) <-- 
redzone store
 10030cd0:   78 1b 7f 7c mr  r31,r3
 10030cd4:   10 00 01 f8 std r0,16(r1)  <-- 
save LR to caller's frame
 10030cd8:   00 00 09 e8 ld  r0,0(r9)
 10030cdc:   00 00 20 2c cmpdi   r0,0
 10030ce0:   50 00 82 41 beq 10030d30 <__clock_gettime+0x80>
 10030ce4:   a6 03 09 7c mtctr   r0
 10030ce8:   21 04 80 4e bctrl  <-- 
vdso call
 10030cec:   26 00 00 7c mfcrr0
 10030cf0:   00 10 09 74 andis.  r9,r0,4096
 10030cf4:   78 1b 69 7c mr  r9,r3
 10030cf8:   28 00 82 40 bne 10030d20 <__clock_gettime+0x70>
 10030cfc:   b4 07 23 7d extsw   r3,r9
 10030d00:   10 00 01 e8 ld  r0,16(r1)  <-- 
load saved LR, since clobbered by the VDSO
 10030d04:   f0 ff c1 eb ld  r30,-16(r1)
 10030d08:   f8 ff e1 eb ld  r31,-8(r1)
 10030d0c:   a6 03 08 7c mtlrr0 <-- 
restore LR
 10030d10:   20 00 80 4e blr<-- 
jumps to 10030cec


I'm kind of confused how it worked for you on 32-bit.


Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack 
frame whenever it has anything to same. Here is what I have in libc.so:


000fbb60 <__clock_gettime>:
   fbb60:   94 21 ff e0 stwur1,-32(r1)
   fbb64:   7c 08 02 a6 mflrr0
   fbb68:   48 09 75 c9 bl  193130 <_IO_stdout_+0x24b0>
   fbb6c:   93 c1 00 18 stw r30,24(r1)
   fbb70:   7f c8 02 a6 mflrr30
   fbb74:   90 01 00 24 stw r0,36(r1)
   fbb78:   93 81 00 10 stw r28,16(r1)
   fbb7c:   93 a1 00 14 stw r29,20(r1)
   fbb80:   83 9e fc 98 lwz r28,-872(r30)
   fbb84:   93 e1 00 1c stw r31,28(r1)
   fbb88:   80 1c 00 00 lwz r0,0(r28)
   fbb8c:   83 82 8f f4 lwz r28,-28684(r2)
   fbb90:   7c 7f 1b 78 mr  r31,r3
   fbb94:   7c 00 e2 79 xor.r0,r0,r28
   fbb98:   7c 9d 23 78 mr  r29,r4
   fbb9c:   41 82 00 40 beq fbbdc <__clock_gettime+0x7c>
   fbba0:   7c 09 03 a6 mtctr   r0
   fbba4:   4e 80 04 21 bctrl
   fbba8:   7c 00 00 26 mfcrr0
   fbbac:   74 1c 10 00 andis.  r28,r0,4096
   fbbb0:   40 82 00 24 bne fbbd4 <__clock_gettime+0x74>
   fbbb4:   80 01 00 24 lwz r0,36(r1)
   fbbb8:   83 81 00 10 lwz r28,16(r1)
   fbbbc:   7c 08 03 a6 mtlrr0
   fbbc0:   83 a1 00 14 lwz r29,20(r1)
   fbbc4:   83 c1 00 18 lwz   

Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-07-16 Thread Tulio Magno Quites Machado Filho
Christophe Leroy  writes:

> Michael Ellerman  a écrit :
>
>> Christophe Leroy  writes:
>>> Prepare for switching VDSO to generic C implementation in following
>>> patch. Here, we:
>>> - Modify __get_datapage() to take an offset
>>> - Prepare the helpers to call the C VDSO functions
>>> - Prepare the required callbacks for the C VDSO functions
>>> - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
>>> - Add the C trampolines to the generic C VDSO functions
>>>
>>> powerpc is a bit special for VDSO as well as system calls in the
>>> way that it requires setting CR SO bit which cannot be done in C.
>>> Therefore, entry/exit needs to be performed in ASM.
>>>
>>> Implementing __arch_get_vdso_data() would clobber the link register,
>>> requiring the caller to save it. As the ASM calling function already
>>> has to set a stack frame and saves the link register before calling
>>> the C vdso function, retriving the vdso data pointer there is lighter.
>> ...
>>
>>> diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h  
>>> b/arch/powerpc/include/asm/vdso/gettimeofday.h
>>> new file mode 100644
>>> index ..4452897f9bd8
>>> --- /dev/null
>>> +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
>>> @@ -0,0 +1,175 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +#ifndef __ASM_VDSO_GETTIMEOFDAY_H
>>> +#define __ASM_VDSO_GETTIMEOFDAY_H
>>> +
>>> +#include 
>>> +
>>> +#ifdef __ASSEMBLY__
>>> +
>>> +.macro cvdso_call funct
>>> +  .cfi_startproc
>>> +   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
>>> +   mflrr0
>>> +  .cfi_register lr, r0
>>> +   PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
>>
>> This doesn't work for me on ppc64(le) with glibc.
>>
>> glibc doesn't create a stack frame before making the VDSO call, so the
>> store of r0 (LR) goes into the caller's frame, corrupting the saved LR,
>> leading to an infinite loop.
>
> Where should it be saved if it can't be saved in the standard location ?

As Michael pointed out, userspace doesn't treat the VDSO as a normal function
call.  In order to keep compatibility with existent software, LR would need to
be saved on another stack frame.

-- 
Tulio Magno


Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-07-15 Thread Christophe Leroy

Michael Ellerman  a écrit :


Christophe Leroy  writes:

Prepare for switching VDSO to generic C implementation in following
patch. Here, we:
- Modify __get_datapage() to take an offset
- Prepare the helpers to call the C VDSO functions
- Prepare the required callbacks for the C VDSO functions
- Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
- Add the C trampolines to the generic C VDSO functions

powerpc is a bit special for VDSO as well as system calls in the
way that it requires setting CR SO bit which cannot be done in C.
Therefore, entry/exit needs to be performed in ASM.

Implementing __arch_get_vdso_data() would clobber the link register,
requiring the caller to save it. As the ASM calling function already
has to set a stack frame and saves the link register before calling
the C vdso function, retriving the vdso data pointer there is lighter.

...

diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h  
b/arch/powerpc/include/asm/vdso/gettimeofday.h

new file mode 100644
index ..4452897f9bd8
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -0,0 +1,175 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSO_GETTIMEOFDAY_H
+#define __ASM_VDSO_GETTIMEOFDAY_H
+
+#include 
+
+#ifdef __ASSEMBLY__
+
+.macro cvdso_call funct
+  .cfi_startproc
+   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
+   mflrr0
+  .cfi_register lr, r0
+   PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)


This doesn't work for me on ppc64(le) with glibc.

glibc doesn't create a stack frame before making the VDSO call, so the
store of r0 (LR) goes into the caller's frame, corrupting the saved LR,
leading to an infinite loop.


Where should it be saved if it can't be saved in the standard location ?



This is an example from a statically built program that calls
clock_gettime():

10030cb0 <__clock_gettime>:
10030cb0:   0e 10 40 3c lis r2,4110
10030cb4:   00 7a 42 38 addir2,r2,31232
10030cb8:   a6 02 08 7c mflrr0
10030cbc:   ff ff 22 3d addis   r9,r2,-1
10030cc0:   58 6d 29 39 addir9,r9,27992
10030cc4:   f0 ff c1 fb std r30,-16(r1) <-- 
redzone store
10030cc8:   78 23 9e 7c mr  r30,r4
10030ccc:   f8 ff e1 fb std r31,-8(r1)  <-- 
redzone store
10030cd0:   78 1b 7f 7c mr  r31,r3
10030cd4:   10 00 01 f8 std r0,16(r1)			<-- save LR to  
caller's frame

10030cd8:   00 00 09 e8 ld  r0,0(r9)
10030cdc:   00 00 20 2c cmpdi   r0,0
10030ce0:   50 00 82 41 beq 10030d30 <__clock_gettime+0x80>
10030ce4:   a6 03 09 7c mtctr   r0
10030ce8:   21 04 80 4e bctrl   <-- 
vdso call
10030cec:   26 00 00 7c mfcrr0
10030cf0:   00 10 09 74 andis.  r9,r0,4096
10030cf4:   78 1b 69 7c mr  r9,r3
10030cf8:   28 00 82 40 bne 10030d20 <__clock_gettime+0x70>
10030cfc:   b4 07 23 7d extsw   r3,r9
10030d00:   10 00 01 e8 ld  r0,16(r1)			<-- load saved  
LR, since clobbered by the VDSO

10030d04:   f0 ff c1 eb ld  r30,-16(r1)
10030d08:   f8 ff e1 eb ld  r31,-8(r1)
10030d0c:   a6 03 08 7c mtlrr0  <-- 
restore LR
10030d10:   20 00 80 4e blr <-- 
jumps to 10030cec


I'm kind of confused how it worked for you on 32-bit.


So am I then. I'm away for 3 weeks, summer break. I'll check when I'm back.



There's also no code to load/restore the TOC pointer on BE, which I
think we'll need to handle.


What does it means exactly ? Just saving r2 all the time ? Is there a  
dedicated location in the stack frame for it ? Is that only for 64 be ?


Christophe




Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-07-14 Thread Michael Ellerman
Christophe Leroy  writes:
> Prepare for switching VDSO to generic C implementation in following
> patch. Here, we:
> - Modify __get_datapage() to take an offset
> - Prepare the helpers to call the C VDSO functions
> - Prepare the required callbacks for the C VDSO functions
> - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
> - Add the C trampolines to the generic C VDSO functions
>
> powerpc is a bit special for VDSO as well as system calls in the
> way that it requires setting CR SO bit which cannot be done in C.
> Therefore, entry/exit needs to be performed in ASM.
>
> Implementing __arch_get_vdso_data() would clobber the link register,
> requiring the caller to save it. As the ASM calling function already
> has to set a stack frame and saves the link register before calling
> the C vdso function, retriving the vdso data pointer there is lighter.
...

> diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
> b/arch/powerpc/include/asm/vdso/gettimeofday.h
> new file mode 100644
> index ..4452897f9bd8
> --- /dev/null
> +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
> @@ -0,0 +1,175 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_VDSO_GETTIMEOFDAY_H
> +#define __ASM_VDSO_GETTIMEOFDAY_H
> +
> +#include 
> +
> +#ifdef __ASSEMBLY__
> +
> +.macro cvdso_call funct
> +  .cfi_startproc
> + PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
> + mflrr0
> +  .cfi_register lr, r0
> + PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)

This doesn't work for me on ppc64(le) with glibc.

glibc doesn't create a stack frame before making the VDSO call, so the
store of r0 (LR) goes into the caller's frame, corrupting the saved LR,
leading to an infinite loop.

This is an example from a statically built program that calls
clock_gettime():

10030cb0 <__clock_gettime>:
10030cb0:   0e 10 40 3c lis r2,4110
10030cb4:   00 7a 42 38 addir2,r2,31232
10030cb8:   a6 02 08 7c mflrr0
10030cbc:   ff ff 22 3d addis   r9,r2,-1
10030cc0:   58 6d 29 39 addir9,r9,27992
10030cc4:   f0 ff c1 fb std r30,-16(r1) <-- 
redzone store
10030cc8:   78 23 9e 7c mr  r30,r4
10030ccc:   f8 ff e1 fb std r31,-8(r1)  <-- 
redzone store
10030cd0:   78 1b 7f 7c mr  r31,r3
10030cd4:   10 00 01 f8 std r0,16(r1)   <-- 
save LR to caller's frame
10030cd8:   00 00 09 e8 ld  r0,0(r9)
10030cdc:   00 00 20 2c cmpdi   r0,0
10030ce0:   50 00 82 41 beq 10030d30 <__clock_gettime+0x80>
10030ce4:   a6 03 09 7c mtctr   r0
10030ce8:   21 04 80 4e bctrl   <-- 
vdso call
10030cec:   26 00 00 7c mfcrr0
10030cf0:   00 10 09 74 andis.  r9,r0,4096
10030cf4:   78 1b 69 7c mr  r9,r3
10030cf8:   28 00 82 40 bne 10030d20 <__clock_gettime+0x70>
10030cfc:   b4 07 23 7d extsw   r3,r9
10030d00:   10 00 01 e8 ld  r0,16(r1)   <-- 
load saved LR, since clobbered by the VDSO
10030d04:   f0 ff c1 eb ld  r30,-16(r1)
10030d08:   f8 ff e1 eb ld  r31,-8(r1)
10030d0c:   a6 03 08 7c mtlrr0  <-- 
restore LR
10030d10:   20 00 80 4e blr <-- 
jumps to 10030cec


I'm kind of confused how it worked for you on 32-bit.

There's also no code to load/restore the TOC pointer on BE, which I
think we'll need to handle.

cheers


[PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.

2020-04-28 Thread Christophe Leroy
Prepare for switching VDSO to generic C implementation in following
patch. Here, we:
- Modify __get_datapage() to take an offset
- Prepare the helpers to call the C VDSO functions
- Prepare the required callbacks for the C VDSO functions
- Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES
- Add the C trampolines to the generic C VDSO functions

powerpc is a bit special for VDSO as well as system calls in the
way that it requires setting CR SO bit which cannot be done in C.
Therefore, entry/exit needs to be performed in ASM.

Implementing __arch_get_vdso_data() would clobber the link register,
requiring the caller to save it. As the ASM calling function already
has to set a stack frame and saves the link register before calling
the C vdso function, retriving the vdso data pointer there is lighter.

Implement __arch_vdso_capable() and:
- When the timebase is used, make it always return true.
- When the RTC clock is used, make it always return false.

Provide vdso_shift_ns(), as the generic x >> s gives the following
bad result:

  18:   35 25 ff e0 addic.  r9,r5,-32
  1c:   41 80 00 10 blt 2c 
  20:   7c 64 4c 30 srw r4,r3,r9
  24:   38 60 00 00 li  r3,0
...
  2c:   54 69 08 3c rlwinm  r9,r3,1,0,30
  30:   21 45 00 1f subfic  r10,r5,31
  34:   7c 84 2c 30 srw r4,r4,r5
  38:   7d 29 50 30 slw r9,r9,r10
  3c:   7c 63 2c 30 srw r3,r3,r5
  40:   7d 24 23 78 or  r4,r9,r4

In our case the shift is always <= 32. In addition,  the upper 32 bits
of the result are likely nul. Lets GCC know it, it also optimises the
following calculations.

With the patch, we get:
   0:   21 25 00 20 subfic  r9,r5,32
   4:   7c 69 48 30 slw r9,r3,r9
   8:   7c 84 2c 30 srw r4,r4,r5
   c:   7d 24 23 78 or  r4,r9,r4
  10:   7c 63 2c 30 srw r3,r3,r5

Signed-off-by: Christophe Leroy 
---
v8:
- New, splitted out of last patch of the series
---
 arch/powerpc/include/asm/clocksource.h   |   7 +
 arch/powerpc/include/asm/vdso/clocksource.h  |   7 +
 arch/powerpc/include/asm/vdso/gettimeofday.h | 175 +++
 arch/powerpc/include/asm/vdso_datapage.h |   6 +-
 arch/powerpc/kernel/vdso32/vgettimeofday.c   |  29 +++
 arch/powerpc/kernel/vdso64/vgettimeofday.c   |  29 +++
 6 files changed, 250 insertions(+), 3 deletions(-)
 create mode 100644 arch/powerpc/include/asm/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h
 create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c
 create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c

diff --git a/arch/powerpc/include/asm/clocksource.h 
b/arch/powerpc/include/asm/clocksource.h
new file mode 100644
index ..482185566b0c
--- /dev/null
+++ b/arch/powerpc/include/asm/clocksource.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_CLOCKSOURCE_H
+#define _ASM_CLOCKSOURCE_H
+
+#include 
+
+#endif
diff --git a/arch/powerpc/include/asm/vdso/clocksource.h 
b/arch/powerpc/include/asm/vdso/clocksource.h
new file mode 100644
index ..ec5d672d2569
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/clocksource.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSOCLOCKSOURCE_H
+#define __ASM_VDSOCLOCKSOURCE_H
+
+#define VDSO_ARCH_CLOCKMODES   VDSO_CLOCKMODE_ARCHTIMER
+
+#endif
diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h 
b/arch/powerpc/include/asm/vdso/gettimeofday.h
new file mode 100644
index ..4452897f9bd8
--- /dev/null
+++ b/arch/powerpc/include/asm/vdso/gettimeofday.h
@@ -0,0 +1,175 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_VDSO_GETTIMEOFDAY_H
+#define __ASM_VDSO_GETTIMEOFDAY_H
+
+#include 
+
+#ifdef __ASSEMBLY__
+
+.macro cvdso_call funct
+  .cfi_startproc
+   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
+   mflrr0
+  .cfi_register lr, r0
+   PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+   get_datapager5, VDSO_DATA_OFFSET
+   bl  \funct
+   PPC_LL  r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+   cmpwi   r3, 0
+   mtlrr0
+  .cfi_restore lr
+   addir1, r1, STACK_FRAME_OVERHEAD
+   crclr   so
+   beqlr+
+   crset   so
+   neg r3, r3
+   blr
+  .cfi_endproc
+.endm
+
+.macro cvdso_call_time funct
+  .cfi_startproc
+   PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1)
+   mflrr0
+  .cfi_register lr, r0
+   PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+   get_datapager4, VDSO_DATA_OFFSET
+   bl  \funct
+   PPC_LL  r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1)
+   crclr   so
+   mtlrr0
+  .cfi_restore lr
+   addir1, r1, STACK_FRAME_OVERHEAD
+   blr
+  .cfi_endproc
+.endm
+
+#else
+
+#include