Re: [PATCH 0/7] x86 vdso32 cleanups

2015-09-01 Thread Andy Lutomirski
On Mon, Aug 31, 2015 at 6:37 PM, Andy Lutomirski wrote:
> I got random errors from perf kvm, but I think I found at least part
> of the issue.  The two irqs_disabled() calls in common.c are kind of
> expensive.  I should disable them on non-lockdep kernels.
>
> The context tracking hooks are also too expensive, even when disabled.
> I should do something to optimize those.  Hello, static keys?  This
> doesn't affect syscalls, though.
>
> With context tracking off and the irqs_disabled checks commented out,
> we're probably doing well enough.  We can always tweak the C code and
> aggressively force inlining if we want a few cycles back.

Currently, a compat AT_SYSINFO syscall (getpid) is 171 cycles for me.
With my patches, it's 196 cycles, so it's really not that bad.  The
impact will probably be slightly worse on native 32-bit because of
increased register pressure and because one of the micro-optimizations
I threw in is 64-bit specific.  We could probably tune the C code a
bit more to get a few of the cycles back.

On the flip side, the rewrite is *far* faster in some of the slow path
cases because the slow path no longer forces IRET.

On 32-bit, there's the added benefit that we could drop asmlinkage
from the syscall bodies on top of the rewrite.

--Andy

>
> --Andy



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/7] x86 vdso32 cleanups

2015-08-31 Thread Andy Lutomirski
On Mon, Aug 31, 2015 at 6:19 PM, Andy Lutomirski wrote:
>
> On Sun, Aug 30, 2015 at 7:52 PM, Andy Lutomirski wrote:
>>
>> On Sun, Aug 30, 2015 at 2:18 PM, Brian Gerst wrote:
>> > On Sat, Aug 29, 2015 at 12:10 PM, Andy Lutomirski wrote:
>> >> On Sat, Aug 29, 2015 at 8:20 AM, Brian Gerst wrote:
>> >>> This patch set contains several cleanups to the 32-bit VDSO.  The
>> >>> main change is to only build one VDSO image, and select the syscall
>> >>> entry point at runtime.
>> >>
>> >> Oh no, we have dueling patches!
>> >>
>> >> I have a 2/3 finished series that cleans up the AT_SYSINFO mess
>> >> differently, as I outlined earlier.  I've only done the compat and
>> >> common bits (no 32-bit native support quite yet), and it enters
>> >> successfully on Intel using SYSENTER and on (fake) AMD using SYSCALL.
>> >> The SYSRET bit isn't there yet.
>> >>
>> >> Other than some ifdeffery, the final system_call.S looks like this:
>> >>
>> >> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tree/arch/x86/entry/vdso/vdso32/system_call.S?h=x86/entry_compat
>> >>
>> >> The meat is (sorry for whitespace damage):
>> >>
>> >> .text
>> >> .globl __kernel_vsyscall
>> >> .type __kernel_vsyscall,@function
>> >> ALIGN
>> >> __kernel_vsyscall:
>> >> CFI_STARTPROC
>> >> /*
>> >> * Reshuffle regs so that all of any of the entry instructions
>> >> * will preserve enough state.
>> >> */
>> >> pushl %edx
>> >> CFI_ADJUST_CFA_OFFSET 4
>> >> CFI_REL_OFFSET edx, 0
>> >> pushl %ecx
>> >> CFI_ADJUST_CFA_OFFSET 4
>> >> CFI_REL_OFFSET ecx, 0
>> >> movl %esp, %ecx
>> >>
>> >> #ifdef CONFIG_X86_64
>> >> /* If SYSENTER is available, use it. */
>> >> ALTERNATIVE_2 "", "sysenter", X86_FEATURE_SYSENTER32, \
>> >>  "syscall",  X86_FEATURE_SYSCALL32
>> >> #endif
>> >>
>> >> /* Enter using int $0x80 */
>> >> movl (%esp), %ecx
>> >> int $0x80
>> >> GLOBAL(int80_landing_pad)
>> >>
>> >> /* Restore ECX and EDX in case they were clobbered. */
>> >> popl %ecx
>> >> CFI_RESTORE ecx
>> >> CFI_ADJUST_CFA_OFFSET -4
>> >> popl %edx
>> >> CFI_RESTORE edx
>> >> CFI_ADJUST_CFA_OFFSET -4
>> >> ret
>> >> CFI_ENDPROC
>> >>
>> >> .size __kernel_vsyscall,.-__kernel_vsyscall
>> >> .previous
>> >>
>> >> And that's it.
>> >>
>> >> What do you think?  This comes with massively cleaned up kernel-side
>> >> asm as well as a test case that actually validates the CFI directives.
>> >>
>> >> Certainly, a bunch of your patches make sense regardless, and I'll
>> >> review them and add them to my queue soon.
>> >>
>> >> --Andy
>> >
>> > How does the performance compare to the original?  Looking at the
>> > disassembly, there are two added function calls, and it reloads the
>> > args from the stack instead of just shuffling registers.
>>
>> The replacement is dramatically faster, which means I probably
>> benchmarked it wrong.  I'll try again in a day or two.
>
>
> It's enough slower to be problematic.  I need to figure out how to trace it 
> properly.  (Hmm?  Maybe it's time to learn how to get perf on the host to 
> trace a KVM guest.)
>
> Everything is and was hilariously slow with context tracking on.  That needs 
> to get fixed, and hopefully once this entry stuff is done someone will do the 
> other end of it.
>

I got random errors from perf kvm, but I think I found at least part
of the issue.  The two irqs_disabled() calls in common.c are kind of
expensive.  I should disable them on non-lockdep kernels.

The context tracking hooks are also too expensive, even when disabled.
I should do something to optimize those.  Hello, static keys?  This
doesn't affect syscalls, though.

With context tracking off and the irqs_disabled checks commented out,
we're probably doing well enough.  We can always tweak the C code and
aggressively force inlining if we want a few cycles back.

--Andy


Re: [PATCH 0/7] x86 vdso32 cleanups

2015-08-30 Thread Andy Lutomirski
On Sun, Aug 30, 2015 at 2:18 PM, Brian Gerst wrote:
> On Sat, Aug 29, 2015 at 12:10 PM, Andy Lutomirski wrote:
>> On Sat, Aug 29, 2015 at 8:20 AM, Brian Gerst wrote:
>>> This patch set contains several cleanups to the 32-bit VDSO.  The
>>> main change is to only build one VDSO image, and select the syscall
>>> entry point at runtime.
>>
>> Oh no, we have dueling patches!
>>
>> I have a 2/3 finished series that cleans up the AT_SYSINFO mess
>> differently, as I outlined earlier.  I've only done the compat and
>> common bits (no 32-bit native support quite yet), and it enters
>> successfully on Intel using SYSENTER and on (fake) AMD using SYSCALL.
>> The SYSRET bit isn't there yet.
>>
>> Other than some ifdeffery, the final system_call.S looks like this:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tree/arch/x86/entry/vdso/vdso32/system_call.S?h=x86/entry_compat
>>
>> The meat is (sorry for whitespace damage):
>>
>> .text
>> .globl __kernel_vsyscall
>> .type __kernel_vsyscall,@function
>> ALIGN
>> __kernel_vsyscall:
>> CFI_STARTPROC
>> /*
>> * Reshuffle regs so that all of any of the entry instructions
>> * will preserve enough state.
>> */
>> pushl %edx
>> CFI_ADJUST_CFA_OFFSET 4
>> CFI_REL_OFFSET edx, 0
>> pushl %ecx
>> CFI_ADJUST_CFA_OFFSET 4
>> CFI_REL_OFFSET ecx, 0
>> movl %esp, %ecx
>>
>> #ifdef CONFIG_X86_64
>> /* If SYSENTER is available, use it. */
>> ALTERNATIVE_2 "", "sysenter", X86_FEATURE_SYSENTER32, \
>>  "syscall",  X86_FEATURE_SYSCALL32
>> #endif
>>
>> /* Enter using int $0x80 */
>> movl (%esp), %ecx
>> int $0x80
>> GLOBAL(int80_landing_pad)
>>
>> /* Restore ECX and EDX in case they were clobbered. */
>> popl %ecx
>> CFI_RESTORE ecx
>> CFI_ADJUST_CFA_OFFSET -4
>> popl %edx
>> CFI_RESTORE edx
>> CFI_ADJUST_CFA_OFFSET -4
>> ret
>> CFI_ENDPROC
>>
>> .size __kernel_vsyscall,.-__kernel_vsyscall
>> .previous
>>
>> And that's it.
>>
>> What do you think?  This comes with massively cleaned up kernel-side
>> asm as well as a test case that actually validates the CFI directives.
>>
>> Certainly, a bunch of your patches make sense regardless, and I'll
>> review them and add them to my queue soon.
>>
>> --Andy
>
> How does the performance compare to the original?  Looking at the
> disassembly, there are two added function calls, and it reloads the
> args from the stack instead of just shuffling registers.

The replacement is dramatically faster, which means I probably
benchmarked it wrong.  I'll try again in a day or two.

--Andy


Re: [PATCH 0/7] x86 vdso32 cleanups

2015-08-30 Thread Brian Gerst
On Sat, Aug 29, 2015 at 12:10 PM, Andy Lutomirski wrote:
> On Sat, Aug 29, 2015 at 8:20 AM, Brian Gerst wrote:
>> This patch set contains several cleanups to the 32-bit VDSO.  The
>> main change is to only build one VDSO image, and select the syscall
>> entry point at runtime.
>
> Oh no, we have dueling patches!
>
> I have a 2/3 finished series that cleans up the AT_SYSINFO mess
> differently, as I outlined earlier.  I've only done the compat and
> common bits (no 32-bit native support quite yet), and it enters
> successfully on Intel using SYSENTER and on (fake) AMD using SYSCALL.
> The SYSRET bit isn't there yet.
>
> Other than some ifdeffery, the final system_call.S looks like this:
>
> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tree/arch/x86/entry/vdso/vdso32/system_call.S?h=x86/entry_compat
>
> The meat is (sorry for whitespace damage):
>
> .text
> .globl __kernel_vsyscall
> .type __kernel_vsyscall,@function
> ALIGN
> __kernel_vsyscall:
> CFI_STARTPROC
> /*
> * Reshuffle regs so that all of any of the entry instructions
> * will preserve enough state.
> */
> pushl %edx
> CFI_ADJUST_CFA_OFFSET 4
> CFI_REL_OFFSET edx, 0
> pushl %ecx
> CFI_ADJUST_CFA_OFFSET 4
> CFI_REL_OFFSET ecx, 0
> movl %esp, %ecx
>
> #ifdef CONFIG_X86_64
> /* If SYSENTER is available, use it. */
> ALTERNATIVE_2 "", "sysenter", X86_FEATURE_SYSENTER32, \
>  "syscall",  X86_FEATURE_SYSCALL32
> #endif
>
> /* Enter using int $0x80 */
> movl (%esp), %ecx
> int $0x80
> GLOBAL(int80_landing_pad)
>
> /* Restore ECX and EDX in case they were clobbered. */
> popl %ecx
> CFI_RESTORE ecx
> CFI_ADJUST_CFA_OFFSET -4
> popl %edx
> CFI_RESTORE edx
> CFI_ADJUST_CFA_OFFSET -4
> ret
> CFI_ENDPROC
>
> .size __kernel_vsyscall,.-__kernel_vsyscall
> .previous
>
> And that's it.
>
> What do you think?  This comes with massively cleaned up kernel-side
> asm as well as a test case that actually validates the CFI directives.
>
> Certainly, a bunch of your patches make sense regardless, and I'll
> review them and add them to my queue soon.
>
> --Andy

How does the performance compare to the original?  Looking at the
disassembly, there are two added function calls, and it reloads the
args from the stack instead of just shuffling registers.

--
Brian Gerst


Re: [PATCH 0/7] x86 vdso32 cleanups

2015-08-29 Thread Andy Lutomirski
On Sat, Aug 29, 2015 at 8:20 AM, Brian Gerst wrote:
> This patch set contains several cleanups to the 32-bit VDSO.  The
> main change is to only build one VDSO image, and select the syscall
> entry point at runtime.

Oh no, we have dueling patches!

I have a 2/3 finished series that cleans up the AT_SYSINFO mess
differently, as I outlined earlier.  I've only done the compat and
common bits (no 32-bit native support quite yet), and it enters
successfully on Intel using SYSENTER and on (fake) AMD using SYSCALL.
The SYSRET bit isn't there yet.

Other than some ifdeffery, the final system_call.S looks like this:

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tree/arch/x86/entry/vdso/vdso32/system_call.S?h=x86/entry_compat

The meat is:

	.text
	.globl __kernel_vsyscall
	.type __kernel_vsyscall,@function
	ALIGN
__kernel_vsyscall:
	CFI_STARTPROC
	/*
	 * Reshuffle regs so that any of the entry instructions
	 * will preserve enough state.
	 */
	pushl	%edx
	CFI_ADJUST_CFA_OFFSET	4
	CFI_REL_OFFSET		edx, 0
	pushl	%ecx
	CFI_ADJUST_CFA_OFFSET	4
	CFI_REL_OFFSET		ecx, 0
	movl	%esp, %ecx

#ifdef CONFIG_X86_64
	/* If SYSENTER (Intel) or SYSCALL (AMD) is available, use it. */
	ALTERNATIVE_2 "", "sysenter", X86_FEATURE_SYSENTER32, \
	                  "syscall",  X86_FEATURE_SYSCALL32
#endif

	/* Enter using int $0x80 */
	movl	(%esp), %ecx
	int	$0x80
GLOBAL(int80_landing_pad)

	/* Restore ECX and EDX in case they were clobbered. */
	popl	%ecx
	CFI_RESTORE		ecx
	CFI_ADJUST_CFA_OFFSET	-4
	popl	%edx
	CFI_RESTORE		edx
	CFI_ADJUST_CFA_OFFSET	-4
	ret
	CFI_ENDPROC

	.size __kernel_vsyscall,.-__kernel_vsyscall
	.previous

And that's it.

What do you think?  This comes with massively cleaned up kernel-side
asm as well as a test case that actually validates the CFI directives.

Certainly, a bunch of your patches make sense regardless, and I'll
review them and add them to my queue soon.

--Andy


[PATCH 0/7] x86 vdso32 cleanups

2015-08-29 Thread Brian Gerst
This patch set contains several cleanups to the 32-bit VDSO.  The
main change is to only build one VDSO image, and select the syscall
entry point at runtime.

 arch/x86/entry/vdso/.gitignore                 |  4 +---
 arch/x86/entry/vdso/Makefile                   | 53 ++---
 arch/x86/entry/vdso/{vdso32 => }/int80.S       | 13 +
 arch/x86/entry/vdso/{vdso32 => }/sigreturn.S   |  9 +++--
 arch/x86/entry/vdso/{vdso32 => }/syscall.S     | 23 +--
 arch/x86/entry/vdso/{vdso32 => }/sysenter.S    | 19 +--
 arch/x86/entry/vdso/vclock_gettime.c           | 31 +++
 arch/x86/entry/vdso/vdso-note.S                | 32 +++-
 arch/x86/entry/vdso/vdso2c.c                   |  2 ++
 arch/x86/entry/vdso/vdso32-setup.c             | 15 ---
 arch/x86/entry/vdso/{vdso32 => }/vdso32.lds.S  |  2 +-
 arch/x86/entry/vdso/vdso32/.gitignore          |  1 -
 arch/x86/entry/vdso/vdso32/note.S              | 44
 arch/x86/entry/vdso/vdso32/vclock_gettime.c    | 30 --
 arch/x86/entry/vdso/vdso32/vdso-fakesections.c |  1 -
 arch/x86/entry/vdso/vma.c                      |  6 +++---
 arch/x86/ia32/ia32_signal.c                    |  4 ++--
 arch/x86/include/asm/elf.h                     |  3 +--
 arch/x86/include/asm/vdso.h                    | 20 +---
 arch/x86/kernel/signal.c                       |  4 ++--
 arch/x86/xen/setup.c                           | 13 ++---
 arch/x86/xen/vdso.h                            |  4
 22 files changed, 137 insertions(+), 196 deletions(-)


