Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
Segher Boessenkool writes: > On Thu, Aug 06, 2020 at 12:03:33PM +1000, Michael Ellerman wrote: >> Segher Boessenkool writes: >> > On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote: >> >> Christophe Leroy writes: >> >> > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack >> >> > frame whenever it has anything to same. > > ^^^ > >> >> > fbb60: 94 21 ff e0 stwur1,-32(r1) >> > >> > This is the *only* place where you can use a negative offset from r1: >> > in the stwu to extend the stack (set up a new stack frame, or make the >> > current one bigger). >> >> (You're talking about 32-bit code here right?) > > The "SYSV" ELF binding, yeah, which is used for 32-bit on Linux (give or > take, ho hum). > > The ABIs that have a red zone are much nicer here (but less simple) :-) Yep, just checking I wasn't misunderstanding your comment about negative offsets. >> >> At the same time it's much safer for us to just save/restore r2, and >> >> probably in the noise performance wise. >> > >> > If you want a function to be able to work with ABI-compliant code safely >> > (in all cases), you'll have to make it itself ABI-compliant as well, >> > yes :-) >> >> True. Except this is the VDSO which has previously been a bit wild west >> as far as ABI goes :) > > It could get away with many things because it was guaranteed to be a > leaf function. Some of those things even violate the ABIs, but you can > get away with it easily, much reduced scope. Now if this is generated > code, violating the rules will catch up with you sooner rather than > later ;-) Agreed. cheers
Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
Hi! On Thu, Aug 06, 2020 at 12:03:33PM +1000, Michael Ellerman wrote: > Segher Boessenkool writes: > > On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote: > >> Christophe Leroy writes: > >> > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack > >> > frame whenever it has anything to same. ^^^ > >> > fbb60: 94 21 ff e0 stwur1,-32(r1) > > > > This is the *only* place where you can use a negative offset from r1: > > in the stwu to extend the stack (set up a new stack frame, or make the > > current one bigger). > > (You're talking about 32-bit code here right?) The "SYSV" ELF binding, yeah, which is used for 32-bit on Linux (give or take, ho hum). The ABIs that have a red zone are much nicer here (but less simple) :-) > >> At the same time it's much safer for us to just save/restore r2, and > >> probably in the noise performance wise. > > > > If you want a function to be able to work with ABI-compliant code safely > > (in all cases), you'll have to make it itself ABI-compliant as well, > > yes :-) > > True. Except this is the VDSO which has previously been a bit wild west > as far as ABI goes :) It could get away with many things because it was guaranteed to be a leaf function. Some of those things even violate the ABIs, but you can get away with it easily, much reduced scope. Now if this is generated code, violating the rules will catch up with you sooner rather than later ;-) Segher
Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
Segher Boessenkool writes: > On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote: >> Christophe Leroy writes: >> > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack >> > frame whenever it has anything to same. >> >> Yeah OK that would explain it. >> >> > Here is what I have in libc.so: >> > >> > 000fbb60 <__clock_gettime>: >> > fbb60: 94 21 ff e0 stwur1,-32(r1) > > This is the *only* place where you can use a negative offset from r1: > in the stwu to extend the stack (set up a new stack frame, or make the > current one bigger). (You're talking about 32-bit code here right?) >> > diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h >> > b/arch/powerpc/include/asm/vdso/gettimeofday.h >> > index a0712a6e80d9..0b6fa245d54e 100644 >> > --- a/arch/powerpc/include/asm/vdso/gettimeofday.h >> > +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h >> > @@ -10,6 +10,7 @@ >> > .cfi_startproc >> >PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1) >> >mflrr0 >> > + PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1) >> > .cfi_register lr, r0 >> >> The cfi_register should come directly after the mflr I think. > > That is the idiomatic way to write it, and most obviously correct. But > as long as the value in LR at function entry is available in multiple > places (like, in LR and in R0 here), it is fine to use either for > unwinding. Sometimes you can use this to optimise the unwind tables a > bit -- not really worth it in hand-written code, it's more important to > make it legible ;-) OK. Because LR still holds the LR value until it's clobbered later, by which point the cfi_register has taken effect. But yeah I think for readability it's best to keep the cfi_register next to the mflr. >> >> There's also no code to load/restore the TOC pointer on BE, which I >> >> think we'll need to handle. >> > >> > I see no code in the generated vdso64.so doing anything with r2, but if >> > you think that's needed, just let's do it: >> >> Hmm, true. >> >> The compiler will use the toc for globals (and possibly also for large >> constants?) > > And anything else it bloody well wants to, yeah :-) Haha yeah OK. >> AFAIK there's no way to disable use of the toc, or make it a build error >> if it's needed. > > Yes. > >> At the same time it's much safer for us to just save/restore r2, and >> probably in the noise performance wise. > > If you want a function to be able to work with ABI-compliant code safely > (in all cases), you'll have to make it itself ABI-compliant as well, > yes :-) True. Except this is the VDSO which has previously been a bit wild west as far as ABI goes :) >> So yeah we should probably do as below. > > [ snip ] > > Looks good yes. Thanks for reviewing. cheers
Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
Hi! On Wed, Aug 05, 2020 at 04:24:16PM +1000, Michael Ellerman wrote: > Christophe Leroy writes: > > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack > > frame whenever it has anything to same. > > Yeah OK that would explain it. > > > Here is what I have in libc.so: > > > > 000fbb60 <__clock_gettime>: > > fbb60: 94 21 ff e0 stwur1,-32(r1) This is the *only* place where you can use a negative offset from r1: in the stwu to extend the stack (set up a new stack frame, or make the current one bigger). > > diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h > > b/arch/powerpc/include/asm/vdso/gettimeofday.h > > index a0712a6e80d9..0b6fa245d54e 100644 > > --- a/arch/powerpc/include/asm/vdso/gettimeofday.h > > +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h > > @@ -10,6 +10,7 @@ > > .cfi_startproc > > PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1) > > mflrr0 > > + PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1) > > .cfi_register lr, r0 > > The cfi_register should come directly after the mflr I think. That is the idiomatic way to write it, and most obviously correct. But as long as the value in LR at function entry is available in multiple places (like, in LR and in R0 here), it is fine to use either for unwinding. Sometimes you can use this to optimise the unwind tables a bit -- not really worth it in hand-written code, it's more important to make it legible ;-) > >> There's also no code to load/restore the TOC pointer on BE, which I > >> think we'll need to handle. > > > > I see no code in the generated vdso64.so doing anything with r2, but if > > you think that's needed, just let's do it: > > Hmm, true. > > The compiler will use the toc for globals (and possibly also for large > constants?) And anything else it bloody well wants to, yeah :-) > AFAIK there's no way to disable use of the toc, or make it a build error > if it's needed. Yes. > At the same time it's much safer for us to just save/restore r2, and > probably in the noise performance wise. If you want a function to be able to work with ABI-compliant code safely (in all cases), you'll have to make it itself ABI-compliant as well, yes :-) > So yeah we should probably do as below. [ snip ] Looks good yes. Segher
Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
Christophe Leroy writes: > On 07/15/2020 01:04 AM, Michael Ellerman wrote: >> Christophe Leroy writes: >>> Prepare for switching VDSO to generic C implementation in following >>> patch. Here, we: >>> - Modify __get_datapage() to take an offset >>> - Prepare the helpers to call the C VDSO functions >>> - Prepare the required callbacks for the C VDSO functions >>> - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES >>> - Add the C trampolines to the generic C VDSO functions >>> >>> powerpc is a bit special for VDSO as well as system calls in the >>> way that it requires setting CR SO bit which cannot be done in C. >>> Therefore, entry/exit needs to be performed in ASM. >>> >>> Implementing __arch_get_vdso_data() would clobber the link register, >>> requiring the caller to save it. As the ASM calling function already >>> has to set a stack frame and saves the link register before calling >>> the C vdso function, retriving the vdso data pointer there is lighter. >> ... >> >>> diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h >>> b/arch/powerpc/include/asm/vdso/gettimeofday.h >>> new file mode 100644 >>> index ..4452897f9bd8 >>> --- /dev/null >>> +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h >>> @@ -0,0 +1,175 @@ >>> +/* SPDX-License-Identifier: GPL-2.0 */ >>> +#ifndef __ASM_VDSO_GETTIMEOFDAY_H >>> +#define __ASM_VDSO_GETTIMEOFDAY_H >>> + >>> +#include >>> + >>> +#ifdef __ASSEMBLY__ >>> + >>> +.macro cvdso_call funct >>> + .cfi_startproc >>> + PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1) >>> + mflrr0 >>> + .cfi_register lr, r0 >>> + PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1) >> >> This doesn't work for me on ppc64(le) with glibc. >> >> glibc doesn't create a stack frame before making the VDSO call, so the >> store of r0 (LR) goes into the caller's frame, corrupting the saved LR, >> leading to an infinite loop. >> >> This is an example from a statically built program that calls >> clock_gettime(): >> >> 10030cb0 <__clock_gettime>: >> 10030cb0: 0e 10 40 3c lis r2,4110 >> 10030cb4: 00 7a 42 38 addir2,r2,31232 >> 10030cb8: a6 02 08 7c mflrr0 >> 10030cbc: ff ff 22 3d addis r9,r2,-1 >> 10030cc0: 58 6d 29 39 addir9,r9,27992 >> 10030cc4: f0 ff c1 fb std r30,-16(r1) <-- >> redzone store >> 10030cc8: 78 23 9e 7c mr r30,r4 >> 10030ccc: f8 ff e1 fb std r31,-8(r1) <-- >> redzone store >> 10030cd0: 78 1b 7f 7c mr r31,r3 >> 10030cd4: 10 00 01 f8 std r0,16(r1) <-- >> save LR to caller's frame >> 10030cd8: 00 00 09 e8 ld r0,0(r9) >> 10030cdc: 00 00 20 2c cmpdi r0,0 >> 10030ce0: 50 00 82 41 beq 10030d30 <__clock_gettime+0x80> >> 10030ce4: a6 03 09 7c mtctr r0 >> 10030ce8: 21 04 80 4e bctrl >> <-- vdso call >> 10030cec: 26 00 00 7c mfcrr0 >> 10030cf0: 00 10 09 74 andis. r9,r0,4096 >> 10030cf4: 78 1b 69 7c mr r9,r3 >> 10030cf8: 28 00 82 40 bne 10030d20 <__clock_gettime+0x70> >> 10030cfc: b4 07 23 7d extsw r3,r9 >> 10030d00: 10 00 01 e8 ld r0,16(r1) <-- >> load saved LR, since clobbered by the VDSO >> 10030d04: f0 ff c1 eb ld r30,-16(r1) >> 10030d08: f8 ff e1 eb ld r31,-8(r1) >> 10030d0c: a6 03 08 7c mtlrr0 <-- >> restore LR >> 10030d10: 20 00 80 4e blr <-- >> jumps to 10030cec >> >> >> I'm kind of confused how it worked for you on 32-bit. > > Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack > frame whenever it has anything to same. Yeah OK that would explain it. > Here is what I have in libc.so: > > 000fbb60 <__clock_gettime>: > fbb60:94 21 ff e0 stwur1,-32(r1) > fbb64:7c 08 02 a6 mflrr0 > fbb68:48 09 75 c9 bl 193130 <_IO_stdout_+0x24b0> > fbb6c:93 c1 00 18 stw r30,24(r1) > fbb70:7f c8 02 a6 mflrr30 > fbb74:90 01 00 24 stw r0,36(r1) > fbb78:93 81 00 10 stw r28,16(r1) > fbb7c:93 a1 00 14 stw r29,20(r1) > fbb80:83 9e fc 98 lwz r28,-872(r30) > fbb84:93 e1 00 1c stw r31,28(r1) > fbb88:80 1c 00 00 lwz r0,0(r28) > fbb8c:83 82 8f f4 lwz r28,-28684(r2) > fbb90:7c 7f 1b 78 mr r31,r3 > fbb94:7c 00 e2 79 xor.r0,r0,r28 > fbb98:7c 9d 23 78 mr r29,r4 > fbb9c:41 82 00 40 beq fbbdc <__clock_gettime+0x7c> > fbba0:7c 09 03 a6 mtctr r0 > fbba4:4e 80 04 21 bctrl > fbba8:7c 00 00 26 mfcrr0 > fbbac:74 1c 10
Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
On 07/15/2020 01:04 AM, Michael Ellerman wrote: Christophe Leroy writes: Prepare for switching VDSO to generic C implementation in following patch. Here, we: - Modify __get_datapage() to take an offset - Prepare the helpers to call the C VDSO functions - Prepare the required callbacks for the C VDSO functions - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES - Add the C trampolines to the generic C VDSO functions powerpc is a bit special for VDSO as well as system calls in the way that it requires setting CR SO bit which cannot be done in C. Therefore, entry/exit needs to be performed in ASM. Implementing __arch_get_vdso_data() would clobber the link register, requiring the caller to save it. As the ASM calling function already has to set a stack frame and saves the link register before calling the C vdso function, retriving the vdso data pointer there is lighter. ... diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h b/arch/powerpc/include/asm/vdso/gettimeofday.h new file mode 100644 index ..4452897f9bd8 --- /dev/null +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h @@ -0,0 +1,175 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_VDSO_GETTIMEOFDAY_H +#define __ASM_VDSO_GETTIMEOFDAY_H + +#include + +#ifdef __ASSEMBLY__ + +.macro cvdso_call funct + .cfi_startproc + PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1) + mflrr0 + .cfi_register lr, r0 + PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1) This doesn't work for me on ppc64(le) with glibc. glibc doesn't create a stack frame before making the VDSO call, so the store of r0 (LR) goes into the caller's frame, corrupting the saved LR, leading to an infinite loop. This is an example from a statically built program that calls clock_gettime(): 10030cb0 <__clock_gettime>: 10030cb0: 0e 10 40 3c lis r2,4110 10030cb4: 00 7a 42 38 addir2,r2,31232 10030cb8: a6 02 08 7c mflrr0 10030cbc: ff ff 22 3d addis r9,r2,-1 10030cc0: 58 6d 29 39 addir9,r9,27992 10030cc4: f0 ff c1 fb std r30,-16(r1)<-- redzone store 10030cc8: 78 23 9e 7c mr r30,r4 10030ccc: f8 ff e1 fb std r31,-8(r1) <-- redzone store 10030cd0: 78 1b 7f 7c mr r31,r3 10030cd4: 10 00 01 f8 std r0,16(r1) <-- save LR to caller's frame 10030cd8: 00 00 09 e8 ld r0,0(r9) 10030cdc: 00 00 20 2c cmpdi r0,0 10030ce0: 50 00 82 41 beq 10030d30 <__clock_gettime+0x80> 10030ce4: a6 03 09 7c mtctr r0 10030ce8: 21 04 80 4e bctrl <-- vdso call 10030cec: 26 00 00 7c mfcrr0 10030cf0: 00 10 09 74 andis. r9,r0,4096 10030cf4: 78 1b 69 7c mr r9,r3 10030cf8: 28 00 82 40 bne 10030d20 <__clock_gettime+0x70> 10030cfc: b4 07 23 7d extsw r3,r9 10030d00: 10 00 01 e8 ld r0,16(r1) <-- load saved LR, since clobbered by the VDSO 10030d04: f0 ff c1 eb ld r30,-16(r1) 10030d08: f8 ff e1 eb ld r31,-8(r1) 10030d0c: a6 03 08 7c mtlrr0 <-- restore LR 10030d10: 20 00 80 4e blr<-- jumps to 10030cec I'm kind of confused how it worked for you on 32-bit. Indeed, 32-bit doesn't have a redzone, so I believe it needs a stack frame whenever it has anything to same. Here is what I have in libc.so: 000fbb60 <__clock_gettime>: fbb60: 94 21 ff e0 stwur1,-32(r1) fbb64: 7c 08 02 a6 mflrr0 fbb68: 48 09 75 c9 bl 193130 <_IO_stdout_+0x24b0> fbb6c: 93 c1 00 18 stw r30,24(r1) fbb70: 7f c8 02 a6 mflrr30 fbb74: 90 01 00 24 stw r0,36(r1) fbb78: 93 81 00 10 stw r28,16(r1) fbb7c: 93 a1 00 14 stw r29,20(r1) fbb80: 83 9e fc 98 lwz r28,-872(r30) fbb84: 93 e1 00 1c stw r31,28(r1) fbb88: 80 1c 00 00 lwz r0,0(r28) fbb8c: 83 82 8f f4 lwz r28,-28684(r2) fbb90: 7c 7f 1b 78 mr r31,r3 fbb94: 7c 00 e2 79 xor.r0,r0,r28 fbb98: 7c 9d 23 78 mr r29,r4 fbb9c: 41 82 00 40 beq fbbdc <__clock_gettime+0x7c> fbba0: 7c 09 03 a6 mtctr r0 fbba4: 4e 80 04 21 bctrl fbba8: 7c 00 00 26 mfcrr0 fbbac: 74 1c 10 00 andis. r28,r0,4096 fbbb0: 40 82 00 24 bne fbbd4 <__clock_gettime+0x74> fbbb4: 80 01 00 24 lwz r0,36(r1) fbbb8: 83 81 00 10 lwz r28,16(r1) fbbbc: 7c 08 03 a6 mtlrr0 fbbc0: 83 a1 00 14 lwz r29,20(r1) fbbc4: 83 c1 00 18 lwz
Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
Christophe Leroy writes: > Michael Ellerman a écrit : > >> Christophe Leroy writes: >>> Prepare for switching VDSO to generic C implementation in following >>> patch. Here, we: >>> - Modify __get_datapage() to take an offset >>> - Prepare the helpers to call the C VDSO functions >>> - Prepare the required callbacks for the C VDSO functions >>> - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES >>> - Add the C trampolines to the generic C VDSO functions >>> >>> powerpc is a bit special for VDSO as well as system calls in the >>> way that it requires setting CR SO bit which cannot be done in C. >>> Therefore, entry/exit needs to be performed in ASM. >>> >>> Implementing __arch_get_vdso_data() would clobber the link register, >>> requiring the caller to save it. As the ASM calling function already >>> has to set a stack frame and saves the link register before calling >>> the C vdso function, retriving the vdso data pointer there is lighter. >> ... >> >>> diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h >>> b/arch/powerpc/include/asm/vdso/gettimeofday.h >>> new file mode 100644 >>> index ..4452897f9bd8 >>> --- /dev/null >>> +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h >>> @@ -0,0 +1,175 @@ >>> +/* SPDX-License-Identifier: GPL-2.0 */ >>> +#ifndef __ASM_VDSO_GETTIMEOFDAY_H >>> +#define __ASM_VDSO_GETTIMEOFDAY_H >>> + >>> +#include >>> + >>> +#ifdef __ASSEMBLY__ >>> + >>> +.macro cvdso_call funct >>> + .cfi_startproc >>> + PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1) >>> + mflrr0 >>> + .cfi_register lr, r0 >>> + PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1) >> >> This doesn't work for me on ppc64(le) with glibc. >> >> glibc doesn't create a stack frame before making the VDSO call, so the >> store of r0 (LR) goes into the caller's frame, corrupting the saved LR, >> leading to an infinite loop. > > Where should it be saved if it can't be saved in the standard location ? As Michael pointed out, userspace doesn't treat the VDSO as a normal function call. In order to keep compatibility with existent software, LR would need to be saved on another stack frame. -- Tulio Magno
Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
Michael Ellerman a écrit : Christophe Leroy writes: Prepare for switching VDSO to generic C implementation in following patch. Here, we: - Modify __get_datapage() to take an offset - Prepare the helpers to call the C VDSO functions - Prepare the required callbacks for the C VDSO functions - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES - Add the C trampolines to the generic C VDSO functions powerpc is a bit special for VDSO as well as system calls in the way that it requires setting CR SO bit which cannot be done in C. Therefore, entry/exit needs to be performed in ASM. Implementing __arch_get_vdso_data() would clobber the link register, requiring the caller to save it. As the ASM calling function already has to set a stack frame and saves the link register before calling the C vdso function, retriving the vdso data pointer there is lighter. ... diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h b/arch/powerpc/include/asm/vdso/gettimeofday.h new file mode 100644 index ..4452897f9bd8 --- /dev/null +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h @@ -0,0 +1,175 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_VDSO_GETTIMEOFDAY_H +#define __ASM_VDSO_GETTIMEOFDAY_H + +#include + +#ifdef __ASSEMBLY__ + +.macro cvdso_call funct + .cfi_startproc + PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1) + mflrr0 + .cfi_register lr, r0 + PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1) This doesn't work for me on ppc64(le) with glibc. glibc doesn't create a stack frame before making the VDSO call, so the store of r0 (LR) goes into the caller's frame, corrupting the saved LR, leading to an infinite loop. Where should it be saved if it can't be saved in the standard location ? This is an example from a statically built program that calls clock_gettime(): 10030cb0 <__clock_gettime>: 10030cb0: 0e 10 40 3c lis r2,4110 10030cb4: 00 7a 42 38 addir2,r2,31232 10030cb8: a6 02 08 7c mflrr0 10030cbc: ff ff 22 3d addis r9,r2,-1 10030cc0: 58 6d 29 39 addir9,r9,27992 10030cc4: f0 ff c1 fb std r30,-16(r1) <-- redzone store 10030cc8: 78 23 9e 7c mr r30,r4 10030ccc: f8 ff e1 fb std r31,-8(r1) <-- redzone store 10030cd0: 78 1b 7f 7c mr r31,r3 10030cd4: 10 00 01 f8 std r0,16(r1) <-- save LR to caller's frame 10030cd8: 00 00 09 e8 ld r0,0(r9) 10030cdc: 00 00 20 2c cmpdi r0,0 10030ce0: 50 00 82 41 beq 10030d30 <__clock_gettime+0x80> 10030ce4: a6 03 09 7c mtctr r0 10030ce8: 21 04 80 4e bctrl <-- vdso call 10030cec: 26 00 00 7c mfcrr0 10030cf0: 00 10 09 74 andis. r9,r0,4096 10030cf4: 78 1b 69 7c mr r9,r3 10030cf8: 28 00 82 40 bne 10030d20 <__clock_gettime+0x70> 10030cfc: b4 07 23 7d extsw r3,r9 10030d00: 10 00 01 e8 ld r0,16(r1) <-- load saved LR, since clobbered by the VDSO 10030d04: f0 ff c1 eb ld r30,-16(r1) 10030d08: f8 ff e1 eb ld r31,-8(r1) 10030d0c: a6 03 08 7c mtlrr0 <-- restore LR 10030d10: 20 00 80 4e blr <-- jumps to 10030cec I'm kind of confused how it worked for you on 32-bit. So am I then. I'm away for 3 weeks, summer break. I'll check when I'm back. There's also no code to load/restore the TOC pointer on BE, which I think we'll need to handle. What does it means exactly ? Just saving r2 all the time ? Is there a dedicated location in the stack frame for it ? Is that only for 64 be ? Christophe
Re: [PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
Christophe Leroy writes: > Prepare for switching VDSO to generic C implementation in following > patch. Here, we: > - Modify __get_datapage() to take an offset > - Prepare the helpers to call the C VDSO functions > - Prepare the required callbacks for the C VDSO functions > - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES > - Add the C trampolines to the generic C VDSO functions > > powerpc is a bit special for VDSO as well as system calls in the > way that it requires setting CR SO bit which cannot be done in C. > Therefore, entry/exit needs to be performed in ASM. > > Implementing __arch_get_vdso_data() would clobber the link register, > requiring the caller to save it. As the ASM calling function already > has to set a stack frame and saves the link register before calling > the C vdso function, retriving the vdso data pointer there is lighter. ... > diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h > b/arch/powerpc/include/asm/vdso/gettimeofday.h > new file mode 100644 > index ..4452897f9bd8 > --- /dev/null > +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h > @@ -0,0 +1,175 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +#ifndef __ASM_VDSO_GETTIMEOFDAY_H > +#define __ASM_VDSO_GETTIMEOFDAY_H > + > +#include > + > +#ifdef __ASSEMBLY__ > + > +.macro cvdso_call funct > + .cfi_startproc > + PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1) > + mflrr0 > + .cfi_register lr, r0 > + PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1) This doesn't work for me on ppc64(le) with glibc. glibc doesn't create a stack frame before making the VDSO call, so the store of r0 (LR) goes into the caller's frame, corrupting the saved LR, leading to an infinite loop. This is an example from a statically built program that calls clock_gettime(): 10030cb0 <__clock_gettime>: 10030cb0: 0e 10 40 3c lis r2,4110 10030cb4: 00 7a 42 38 addir2,r2,31232 10030cb8: a6 02 08 7c mflrr0 10030cbc: ff ff 22 3d addis r9,r2,-1 10030cc0: 58 6d 29 39 addir9,r9,27992 10030cc4: f0 ff c1 fb std r30,-16(r1) <-- redzone store 10030cc8: 78 23 9e 7c mr r30,r4 10030ccc: f8 ff e1 fb std r31,-8(r1) <-- redzone store 10030cd0: 78 1b 7f 7c mr r31,r3 10030cd4: 10 00 01 f8 std r0,16(r1) <-- save LR to caller's frame 10030cd8: 00 00 09 e8 ld r0,0(r9) 10030cdc: 00 00 20 2c cmpdi r0,0 10030ce0: 50 00 82 41 beq 10030d30 <__clock_gettime+0x80> 10030ce4: a6 03 09 7c mtctr r0 10030ce8: 21 04 80 4e bctrl <-- vdso call 10030cec: 26 00 00 7c mfcrr0 10030cf0: 00 10 09 74 andis. r9,r0,4096 10030cf4: 78 1b 69 7c mr r9,r3 10030cf8: 28 00 82 40 bne 10030d20 <__clock_gettime+0x70> 10030cfc: b4 07 23 7d extsw r3,r9 10030d00: 10 00 01 e8 ld r0,16(r1) <-- load saved LR, since clobbered by the VDSO 10030d04: f0 ff c1 eb ld r30,-16(r1) 10030d08: f8 ff e1 eb ld r31,-8(r1) 10030d0c: a6 03 08 7c mtlrr0 <-- restore LR 10030d10: 20 00 80 4e blr <-- jumps to 10030cec I'm kind of confused how it worked for you on 32-bit. There's also no code to load/restore the TOC pointer on BE, which I think we'll need to handle. cheers
[PATCH v8 5/8] powerpc/vdso: Prepare for switching VDSO to generic C implementation.
Prepare for switching VDSO to generic C implementation in following patch. Here, we: - Modify __get_datapage() to take an offset - Prepare the helpers to call the C VDSO functions - Prepare the required callbacks for the C VDSO functions - Prepare the clocksource.h files to define VDSO_ARCH_CLOCKMODES - Add the C trampolines to the generic C VDSO functions powerpc is a bit special for VDSO as well as system calls in the way that it requires setting CR SO bit which cannot be done in C. Therefore, entry/exit needs to be performed in ASM. Implementing __arch_get_vdso_data() would clobber the link register, requiring the caller to save it. As the ASM calling function already has to set a stack frame and saves the link register before calling the C vdso function, retriving the vdso data pointer there is lighter. Implement __arch_vdso_capable() and: - When the timebase is used, make it always return true. - When the RTC clock is used, make it always return false. Provide vdso_shift_ns(), as the generic x >> s gives the following bad result: 18: 35 25 ff e0 addic. r9,r5,-32 1c: 41 80 00 10 blt 2c 20: 7c 64 4c 30 srw r4,r3,r9 24: 38 60 00 00 li r3,0 ... 2c: 54 69 08 3c rlwinm r9,r3,1,0,30 30: 21 45 00 1f subfic r10,r5,31 34: 7c 84 2c 30 srw r4,r4,r5 38: 7d 29 50 30 slw r9,r9,r10 3c: 7c 63 2c 30 srw r3,r3,r5 40: 7d 24 23 78 or r4,r9,r4 In our case the shift is always <= 32. In addition, the upper 32 bits of the result are likely nul. Lets GCC know it, it also optimises the following calculations. With the patch, we get: 0: 21 25 00 20 subfic r9,r5,32 4: 7c 69 48 30 slw r9,r3,r9 8: 7c 84 2c 30 srw r4,r4,r5 c: 7d 24 23 78 or r4,r9,r4 10: 7c 63 2c 30 srw r3,r3,r5 Signed-off-by: Christophe Leroy --- v8: - New, splitted out of last patch of the series --- arch/powerpc/include/asm/clocksource.h | 7 + arch/powerpc/include/asm/vdso/clocksource.h | 7 + arch/powerpc/include/asm/vdso/gettimeofday.h | 175 +++ arch/powerpc/include/asm/vdso_datapage.h | 6 +- arch/powerpc/kernel/vdso32/vgettimeofday.c | 29 +++ arch/powerpc/kernel/vdso64/vgettimeofday.c | 29 +++ 6 files changed, 250 insertions(+), 3 deletions(-) create mode 100644 arch/powerpc/include/asm/clocksource.h create mode 100644 arch/powerpc/include/asm/vdso/clocksource.h create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c create mode 100644 arch/powerpc/kernel/vdso64/vgettimeofday.c diff --git a/arch/powerpc/include/asm/clocksource.h b/arch/powerpc/include/asm/clocksource.h new file mode 100644 index ..482185566b0c --- /dev/null +++ b/arch/powerpc/include/asm/clocksource.h @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_CLOCKSOURCE_H +#define _ASM_CLOCKSOURCE_H + +#include + +#endif diff --git a/arch/powerpc/include/asm/vdso/clocksource.h b/arch/powerpc/include/asm/vdso/clocksource.h new file mode 100644 index ..ec5d672d2569 --- /dev/null +++ b/arch/powerpc/include/asm/vdso/clocksource.h @@ -0,0 +1,7 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_VDSOCLOCKSOURCE_H +#define __ASM_VDSOCLOCKSOURCE_H + +#define VDSO_ARCH_CLOCKMODES VDSO_CLOCKMODE_ARCHTIMER + +#endif diff --git a/arch/powerpc/include/asm/vdso/gettimeofday.h b/arch/powerpc/include/asm/vdso/gettimeofday.h new file mode 100644 index ..4452897f9bd8 --- /dev/null +++ b/arch/powerpc/include/asm/vdso/gettimeofday.h @@ -0,0 +1,175 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_VDSO_GETTIMEOFDAY_H +#define __ASM_VDSO_GETTIMEOFDAY_H + +#include + +#ifdef __ASSEMBLY__ + +.macro cvdso_call funct + .cfi_startproc + PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1) + mflrr0 + .cfi_register lr, r0 + PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1) + get_datapager5, VDSO_DATA_OFFSET + bl \funct + PPC_LL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1) + cmpwi r3, 0 + mtlrr0 + .cfi_restore lr + addir1, r1, STACK_FRAME_OVERHEAD + crclr so + beqlr+ + crset so + neg r3, r3 + blr + .cfi_endproc +.endm + +.macro cvdso_call_time funct + .cfi_startproc + PPC_STLUr1, -STACK_FRAME_OVERHEAD(r1) + mflrr0 + .cfi_register lr, r0 + PPC_STL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1) + get_datapager4, VDSO_DATA_OFFSET + bl \funct + PPC_LL r0, STACK_FRAME_OVERHEAD + PPC_LR_STKOFF(r1) + crclr so + mtlrr0 + .cfi_restore lr + addir1, r1, STACK_FRAME_OVERHEAD + blr + .cfi_endproc +.endm + +#else + +#include