Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path

2015-12-09 Thread Brian Gerst
On Wed, Dec 9, 2015 at 1:21 AM, Andy Lutomirski  wrote:
> On Tue, Dec 8, 2015 at 9:45 PM, Andy Lutomirski  wrote:
>> On Tue, Dec 8, 2015 at 8:43 PM, Brian Gerst  wrote:
>>> On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski  wrote:
 64-bit syscalls currently have an optimization in which they are
 called with partial pt_regs.  A small handful require full pt_regs.

 In the 32-bit and compat cases, I cleaned this up by forcing full
 pt_regs for all syscalls.  The performance hit doesn't really matter.

 I want to clean up the 64-bit case as well, but I don't want to hurt
 fast path performance.  To do that, I want to force the syscalls
 that use pt_regs onto the slow path.  This will enable us to make
 slow path syscalls be real ABI-compliant C functions.

 Use the new syscall entry qualification machinery for this.
 stub_clone is now stub_clone/ptregs.

 The next patch will eliminate the stubs, and we'll just have
 sys_clone/ptregs.

 Signed-off-by: Andy Lutomirski 
>>>
>>> Fails to boot, bisected to this patch:
>>> [   32.675319] kernel BUG at kernel/auditsc.c:1504!
>>> [   32.675325] invalid opcode:  [#65] SMP
>>> [   32.675328] Modules linked in:
>>> [   32.675333] CPU: 1 PID: 216 Comm: systemd-cgroups Tainted: G  D
>>> 4.3.0-rc4+ #7
>>> [   32.675336] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>> [   32.675339] task: 88075340 ti: 88003652 task.ti:
>>> 88003652
>>> [   32.675350] RIP: 0010:[]  []
>>> __audit_syscall_entry+0xcd/0xf0
>>> [   32.675353] RSP: 0018:880036523ef0  EFLAGS: 00010202
>>> [   32.675355] RAX: 000c RBX: 8800797b3000 RCX: 
>>> 7ffef8504e88
>>> [   32.675357] RDX: 56172f37cfd0 RSI:  RDI: 
>>> 000c
>>> [   32.675359] RBP: 880036523f00 R08: 0001 R09: 
>>> 88075340
>>> [   32.675361] R10:  R11: 0001 R12: 
>>> 
>>> [   32.675363] R13: c03e R14: 0001 R15: 
>>> 1000
>>> [   32.675380] FS:  7f02b4ff48c0() GS:88007fc8()
>>> knlGS:
>>> [   32.675383] CS:  0010 DS:  ES:  CR0: 8005003b
>>> [   32.675385] CR2: 7f93d47ea0e0 CR3: 36aa9000 CR4: 
>>> 06e0
>>> [   32.675391] Stack:
>>> [   32.675396]  880036523f58  880036523f10
>>> 8100321b
>>> [   32.675401]  880036523f48 81003ad0 56172f374040
>>> 7f93d45c9990
>>> [   32.675404]  0001 0001 1000
>>> 000a
>>> [   32.675405] Call Trace:
>>> [   32.675414]  [] do_audit_syscall_entry+0x4b/0x70
>>> [   32.675420]  [] syscall_trace_enter_phase2+0x110/0x1d0
>>> [   32.675425]  [] tracesys+0x3a/0x96
>>> [   32.675464] Code: 00 00 00 00 e8 a5 e0 fc ff c7 43 04 01 00 00 00
>>> 48 89 43 18 48 89 53 20 44 89 63 0c c7 83 94 02 00 00 00 00 00 00 5b
>>> 41 5c 5d c3 <0f> 0b 48 c7 43 50 00 00 00 00 48 c7 c2 60 b4 c5 81 48 89
>>> de 4c
>>> [   32.675469] RIP  [] __audit_syscall_entry+0xcd/0xf0
>>> [   32.675471]  RSP 
>>
>> I'm not reproducing this, even with audit manually enabled.  Can you
>> send a .config?
>
> Never mind, I found the bug by inspection.  I'll send a fixed up
> series tomorrow.
>
> Can you send the boot failure you got with the full series applied,
> though?  I think that the bug I found is only triggerable part-way
> through the series -- I think I inadvertently fixed it later on.

I can't reproduce it now.  It was a hang, or I just didn't get the
oops displayed on the screen.  Could have been something unrelated.

--
Brian Gerst


Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path

2015-12-08 Thread Andy Lutomirski
On Tue, Dec 8, 2015 at 9:45 PM, Andy Lutomirski  wrote:
> On Tue, Dec 8, 2015 at 8:43 PM, Brian Gerst  wrote:
>> On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski  wrote:
>>> 64-bit syscalls currently have an optimization in which they are
>>> called with partial pt_regs.  A small handful require full pt_regs.
>>>
>>> In the 32-bit and compat cases, I cleaned this up by forcing full
>>> pt_regs for all syscalls.  The performance hit doesn't really matter.
>>>
>>> I want to clean up the 64-bit case as well, but I don't want to hurt
>>> fast path performance.  To do that, I want to force the syscalls
>>> that use pt_regs onto the slow path.  This will enable us to make
>>> slow path syscalls be real ABI-compliant C functions.
>>>
>>> Use the new syscall entry qualification machinery for this.
>>> stub_clone is now stub_clone/ptregs.
>>>
>>> The next patch will eliminate the stubs, and we'll just have
>>> sys_clone/ptregs.
>>>
>>> Signed-off-by: Andy Lutomirski 
>>
>> Fails to boot, bisected to this patch:
>> [   32.675319] kernel BUG at kernel/auditsc.c:1504!
>> [   32.675325] invalid opcode:  [#65] SMP
>> [   32.675328] Modules linked in:
>> [   32.675333] CPU: 1 PID: 216 Comm: systemd-cgroups Tainted: G  D
>> 4.3.0-rc4+ #7
>> [   32.675336] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> [   32.675339] task: 88075340 ti: 88003652 task.ti:
>> 88003652
>> [   32.675350] RIP: 0010:[]  []
>> __audit_syscall_entry+0xcd/0xf0
>> [   32.675353] RSP: 0018:880036523ef0  EFLAGS: 00010202
>> [   32.675355] RAX: 000c RBX: 8800797b3000 RCX: 
>> 7ffef8504e88
>> [   32.675357] RDX: 56172f37cfd0 RSI:  RDI: 
>> 000c
>> [   32.675359] RBP: 880036523f00 R08: 0001 R09: 
>> 88075340
>> [   32.675361] R10:  R11: 0001 R12: 
>> 
>> [   32.675363] R13: c03e R14: 0001 R15: 
>> 1000
>> [   32.675380] FS:  7f02b4ff48c0() GS:88007fc8()
>> knlGS:
>> [   32.675383] CS:  0010 DS:  ES:  CR0: 8005003b
>> [   32.675385] CR2: 7f93d47ea0e0 CR3: 36aa9000 CR4: 
>> 06e0
>> [   32.675391] Stack:
>> [   32.675396]  880036523f58  880036523f10
>> 8100321b
>> [   32.675401]  880036523f48 81003ad0 56172f374040
>> 7f93d45c9990
>> [   32.675404]  0001 0001 1000
>> 000a
>> [   32.675405] Call Trace:
>> [   32.675414]  [] do_audit_syscall_entry+0x4b/0x70
>> [   32.675420]  [] syscall_trace_enter_phase2+0x110/0x1d0
>> [   32.675425]  [] tracesys+0x3a/0x96
>> [   32.675464] Code: 00 00 00 00 e8 a5 e0 fc ff c7 43 04 01 00 00 00
>> 48 89 43 18 48 89 53 20 44 89 63 0c c7 83 94 02 00 00 00 00 00 00 5b
>> 41 5c 5d c3 <0f> 0b 48 c7 43 50 00 00 00 00 48 c7 c2 60 b4 c5 81 48 89
>> de 4c
>> [   32.675469] RIP  [] __audit_syscall_entry+0xcd/0xf0
>> [   32.675471]  RSP 
>
> I'm not reproducing this, even with audit manually enabled.  Can you
> send a .config?

Never mind, I found the bug by inspection.  I'll send a fixed up
series tomorrow.

Can you send the boot failure you got with the full series applied,
though?  I think that the bug I found is only triggerable part-way
through the series -- I think I inadvertently fixed it later on.

--Andy


Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path

2015-12-08 Thread Andy Lutomirski
On Tue, Dec 8, 2015 at 8:43 PM, Brian Gerst  wrote:
> On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski  wrote:
>> 64-bit syscalls currently have an optimization in which they are
>> called with partial pt_regs.  A small handful require full pt_regs.
>>
>> In the 32-bit and compat cases, I cleaned this up by forcing full
>> pt_regs for all syscalls.  The performance hit doesn't really matter.
>>
>> I want to clean up the 64-bit case as well, but I don't want to hurt
>> fast path performance.  To do that, I want to force the syscalls
>> that use pt_regs onto the slow path.  This will enable us to make
>> slow path syscalls be real ABI-compliant C functions.
>>
>> Use the new syscall entry qualification machinery for this.
>> stub_clone is now stub_clone/ptregs.
>>
>> The next patch will eliminate the stubs, and we'll just have
>> sys_clone/ptregs.
>>
>> Signed-off-by: Andy Lutomirski 
>
> Fails to boot, bisected to this patch:
> [   32.675319] kernel BUG at kernel/auditsc.c:1504!
> [   32.675325] invalid opcode:  [#65] SMP
> [   32.675328] Modules linked in:
> [   32.675333] CPU: 1 PID: 216 Comm: systemd-cgroups Tainted: G  D
> 4.3.0-rc4+ #7
> [   32.675336] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [   32.675339] task: 88075340 ti: 88003652 task.ti:
> 88003652
> [   32.675350] RIP: 0010:[]  []
> __audit_syscall_entry+0xcd/0xf0
> [   32.675353] RSP: 0018:880036523ef0  EFLAGS: 00010202
> [   32.675355] RAX: 000c RBX: 8800797b3000 RCX: 
> 7ffef8504e88
> [   32.675357] RDX: 56172f37cfd0 RSI:  RDI: 
> 000c
> [   32.675359] RBP: 880036523f00 R08: 0001 R09: 
> 88075340
> [   32.675361] R10:  R11: 0001 R12: 
> 
> [   32.675363] R13: c03e R14: 0001 R15: 
> 1000
> [   32.675380] FS:  7f02b4ff48c0() GS:88007fc8()
> knlGS:
> [   32.675383] CS:  0010 DS:  ES:  CR0: 8005003b
> [   32.675385] CR2: 7f93d47ea0e0 CR3: 36aa9000 CR4: 
> 06e0
> [   32.675391] Stack:
> [   32.675396]  880036523f58  880036523f10
> 8100321b
> [   32.675401]  880036523f48 81003ad0 56172f374040
> 7f93d45c9990
> [   32.675404]  0001 0001 1000
> 000a
> [   32.675405] Call Trace:
> [   32.675414]  [] do_audit_syscall_entry+0x4b/0x70
> [   32.675420]  [] syscall_trace_enter_phase2+0x110/0x1d0
> [   32.675425]  [] tracesys+0x3a/0x96
> [   32.675464] Code: 00 00 00 00 e8 a5 e0 fc ff c7 43 04 01 00 00 00
> 48 89 43 18 48 89 53 20 44 89 63 0c c7 83 94 02 00 00 00 00 00 00 5b
> 41 5c 5d c3 <0f> 0b 48 c7 43 50 00 00 00 00 48 c7 c2 60 b4 c5 81 48 89
> de 4c
> [   32.675469] RIP  [] __audit_syscall_entry+0xcd/0xf0
> [   32.675471]  RSP 

I'm not reproducing this, even with audit manually enabled.  Can you
send a .config?

--Andy


Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path

2015-12-08 Thread Brian Gerst
On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski  wrote:
> 64-bit syscalls currently have an optimization in which they are
> called with partial pt_regs.  A small handful require full pt_regs.
>
> In the 32-bit and compat cases, I cleaned this up by forcing full
> pt_regs for all syscalls.  The performance hit doesn't really matter.
>
> I want to clean up the 64-bit case as well, but I don't want to hurt
> fast path performance.  To do that, I want to force the syscalls
> that use pt_regs onto the slow path.  This will enable us to make
> slow path syscalls be real ABI-compliant C functions.
>
> Use the new syscall entry qualification machinery for this.
> stub_clone is now stub_clone/ptregs.
>
> The next patch will eliminate the stubs, and we'll just have
> sys_clone/ptregs.
>
> Signed-off-by: Andy Lutomirski 

Fails to boot, bisected to this patch:
[   32.675319] kernel BUG at kernel/auditsc.c:1504!
[   32.675325] invalid opcode:  [#65] SMP
[   32.675328] Modules linked in:
[   32.675333] CPU: 1 PID: 216 Comm: systemd-cgroups Tainted: G  D
4.3.0-rc4+ #7
[   32.675336] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   32.675339] task: 88075340 ti: 88003652 task.ti:
88003652
[   32.675350] RIP: 0010:[]  []
__audit_syscall_entry+0xcd/0xf0
[   32.675353] RSP: 0018:880036523ef0  EFLAGS: 00010202
[   32.675355] RAX: 000c RBX: 8800797b3000 RCX: 7ffef8504e88
[   32.675357] RDX: 56172f37cfd0 RSI:  RDI: 000c
[   32.675359] RBP: 880036523f00 R08: 0001 R09: 88075340
[   32.675361] R10:  R11: 0001 R12: 
[   32.675363] R13: c03e R14: 0001 R15: 1000
[   32.675380] FS:  7f02b4ff48c0() GS:88007fc8()
knlGS:
[   32.675383] CS:  0010 DS:  ES:  CR0: 8005003b
[   32.675385] CR2: 7f93d47ea0e0 CR3: 36aa9000 CR4: 06e0
[   32.675391] Stack:
[   32.675396]  880036523f58  880036523f10
8100321b
[   32.675401]  880036523f48 81003ad0 56172f374040
7f93d45c9990
[   32.675404]  0001 0001 1000
000a
[   32.675405] Call Trace:
[   32.675414]  [] do_audit_syscall_entry+0x4b/0x70
[   32.675420]  [] syscall_trace_enter_phase2+0x110/0x1d0
[   32.675425]  [] tracesys+0x3a/0x96
[   32.675464] Code: 00 00 00 00 e8 a5 e0 fc ff c7 43 04 01 00 00 00
48 89 43 18 48 89 53 20 44 89 63 0c c7 83 94 02 00 00 00 00 00 00 5b
41 5c 5d c3 <0f> 0b 48 c7 43 50 00 00 00 00 48 c7 c2 60 b4 c5 81 48 89
de 4c
[   32.675469] RIP  [] __audit_syscall_entry+0xcd/0xf0
[   32.675471]  RSP 

--
Brian Gerst


Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path

2015-12-08 Thread Andy Lutomirski
On Tue, Dec 8, 2015 at 10:56 AM, Ingo Molnar  wrote:
>
> * Brian Gerst  wrote:
>
>> > We could adjust it a bit and check whether we're in C land (by checking rsp
>> > for ts) and jump into the slow path if we aren't, but I'm not sure this is
>> > a huge win.  It does save some rodata space by avoiding duplicating the
>> > table.
>>
>> The syscall table is huge.  545*8 bytes, over a full page. Duplicating it for
>> just a few different entries is wasteful.
>
> Note that what matters more is cache footprint, not pure size: 1K of RAM
> overhead for something as fundamental as system calls is trivial cost.
>
> So the questions to ask are along these lines:
>
>  - what is the typical locality of access (do syscall numbers cluster in time
>    and space)
>

I suspect that they do.  Web servers will call send over and over, for example.

>  - how frequently would the two tables be accessed (is one accessed less
>    frequently than the other?)

On setups that don't bail right away, the fast path table gets hit
most of the time.  On setups that do bail right away (context tracking
on, for example), we exclusively use the slow path table.

>
>  - subsequently how does the effective cache footprint change with the
>    duplication?

In the worst case (repeatedly forking, for example, but I doubt we
care about that case), the duplication adds one extra cacheline.

>
> it might still end up not being worth it - but it's not the RAM cost that is
> the main factor IMHO.

Agreed.

One option: borrow the high bit to indicate "needs ptregs".  This adds
a branch to both the fast path and the slow path, but it avoids the
cache hit.
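
Concretely, something along these lines -- a rough sketch only, not code from
this series, and every name below is invented for illustration:

#include <linux/types.h>

/*
 * Sketch of the "borrow the high bit" idea.  x86_64 kernel text pointers
 * always have bit 63 set, so storing a ptregs-using entry with bit 63
 * cleared tags it without needing a second table.  Both paths pay the
 * one extra test mentioned above.
 */
typedef long (*sys_call_ptr_t)(unsigned long, unsigned long, unsigned long,
                               unsigned long, unsigned long, unsigned long);

#define PTREGS_TAG(sym)      ((unsigned long)(sym) & ~(1UL << 63))  /* mark "needs full pt_regs" */
#define NEEDS_PTREGS(entry)  (!((entry) & (1UL << 63)))
#define UNTAG(entry)         ((entry) | (1UL << 63))

extern const unsigned long tagged_sys_call_table[];     /* hypothetical single table */

/*
 * Fast path: one extra test.  Tagged entries are punted to the slow path,
 * which has already saved the extra registers.
 */
static bool try_fastpath(unsigned long nr, unsigned long args[6], long *ret)
{
        unsigned long entry = tagged_sys_call_table[nr];

        if (NEEDS_PTREGS(entry))
                return false;           /* caller bails to the slow path */

        *ret = ((sys_call_ptr_t)UNTAG(entry))(args[0], args[1], args[2],
                                              args[3], args[4], args[5]);
        return true;
}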

Brian's approach gets the best of all worlds except that, if I
understand it right, it's a bit fragile.

--Andy


Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path

2015-12-08 Thread Ingo Molnar

* Brian Gerst  wrote:

> > We could adjust it a bit and check whether we're in C land (by checking rsp
> > for ts) and jump into the slow path if we aren't, but I'm not sure this is
> > a huge win.  It does save some rodata space by avoiding duplicating the table.
> 
> The syscall table is huge.  545*8 bytes, over a full page. Duplicating it for 
> just a few different entries is wasteful.

Note that what matters more is cache footprint, not pure size: 1K of RAM
overhead for something as fundamental as system calls is trivial cost.

So the questions to ask are along these lines:

 - what is the typical locality of access (do syscall numbers cluster in time
   and space)

 - how frequently would the two tables be accessed (is one accessed less 
   frequently than the other?)

 - subsequently how does the effective cache footprint change with the 
   duplication?

it might still end up not being worth it - but it's not the RAM cost that is
the main factor IMHO.

Thanks,

Ingo


Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path

2015-12-08 Thread Brian Gerst
On Mon, Dec 7, 2015 at 8:12 PM, Andy Lutomirski  wrote:
> On Mon, Dec 7, 2015 at 4:54 PM, Brian Gerst  wrote:
>> On Mon, Dec 7, 2015 at 7:50 PM, Brian Gerst  wrote:
>>> On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski  wrote:
 64-bit syscalls currently have an optimization in which they are
 called with partial pt_regs.  A small handful require full pt_regs.

 In the 32-bit and compat cases, I cleaned this up by forcing full
 pt_regs for all syscalls.  The performance hit doesn't really matter.

 I want to clean up the 64-bit case as well, but I don't want to hurt
 fast path performance.  To do that, I want to force the syscalls
 that use pt_regs onto the slow path.  This will enable us to make
 slow path syscalls be real ABI-compliant C functions.

 Use the new syscall entry qualification machinery for this.
 stub_clone is now stub_clone/ptregs.

 The next patch will eliminate the stubs, and we'll just have
 sys_clone/ptregs.
>>
>> [Resend after gmail web interface fail]
>>
>> I've got an idea on how to do this without the duplicate syscall table.
>>
>> ptregs_foo:
>> leaq sys_foo(%rip), %rax
>> jmp stub_ptregs_64
>>
>> stub_ptregs_64:
>> testl $TS_EXTRAREGS, ti_status>
>> jnz 1f
>> SAVE_EXTRA_REGS
>> call *%rax
>> RESTORE_EXTRA_REGS
>> ret
>> 1:
>> call *%rax
>> ret
>>
>> This makes sure that the extra regs don't get saved a second time if
>> coming in from the slow path, but preserves the fast path if not
>> tracing.
>
> I think there's value in having the entries in the table be genuine C
> ABI-compliant function pointers.  In your example, it only barely
> works -- you can call them from C only if you have TS_EXTRAREGS set
>> appropriately -- otherwise you crash and burn.  That will break the
> rest of the series.

I'm working on a full patch.  It will set the flag (renamed
TS_SLOWPATH) in do_syscall_64(), which is the only place these
functions can get called from C code.  Your changes already have it
set up so that the slow path saved these registers before calling any
C code.  Where else do you expect them to be called from?
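
Roughly like this, as a sketch only -- not the actual patch; do_syscall_64()
and TS_SLOWPATH are the names discussed above, everything else here is a
placeholder, and it assumes the six-argument sys_call_ptr_t this series
introduces:

#include <linux/thread_info.h>
#include <asm/ptrace.h>
#include <asm/syscall.h>

/*
 * Sketch: the C slow path flags the task before dispatching, so
 * stub_ptregs_64 can tell that the extra registers were already saved
 * and must not be saved a second time.
 */
void do_syscall_64(struct pt_regs *regs)
{
        struct thread_info *ti = current_thread_info();
        unsigned long nr = regs->orig_ax;

        ti->status |= TS_SLOWPATH;      /* extra regs already saved on this path */

        if (nr <= __NR_syscall_max)
                regs->ax = sys_call_table[nr](regs->di, regs->si, regs->dx,
                                              regs->r10, regs->r8, regs->r9);

        ti->status &= ~TS_SLOWPATH;
}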

> We could adjust it a bit and check whether we're in C land (by
> checking rsp for ts) and jump into the slow path if we aren't, but I'm
> not sure this is a huge win.  It does save some rodata space by
> avoiding duplicating the table.

The syscall table is huge.  545*8 bytes, over a full page.
Duplicating it for just a few different entries is wasteful.

--
Brian Gerst


Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path

2015-12-07 Thread Andy Lutomirski
On Mon, Dec 7, 2015 at 4:54 PM, Brian Gerst  wrote:
> On Mon, Dec 7, 2015 at 7:50 PM, Brian Gerst  wrote:
>> On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski  wrote:
>>> 64-bit syscalls currently have an optimization in which they are
>>> called with partial pt_regs.  A small handful require full pt_regs.
>>>
>>> In the 32-bit and compat cases, I cleaned this up by forcing full
>>> pt_regs for all syscalls.  The performance hit doesn't really matter.
>>>
>>> I want to clean up the 64-bit case as well, but I don't want to hurt
>>> fast path performance.  To do that, I want to force the syscalls
>>> that use pt_regs onto the slow path.  This will enable us to make
>>> slow path syscalls be real ABI-compliant C functions.
>>>
>>> Use the new syscall entry qualification machinery for this.
>>> stub_clone is now stub_clone/ptregs.
>>>
>>> The next patch will eliminate the stubs, and we'll just have
>>> sys_clone/ptregs.
>
> [Resend after gmail web interface fail]
>
> I've got an idea on how to do this without the duplicate syscall table.
>
> ptregs_foo:
> leaq sys_foo(%rip), %rax
> jmp stub_ptregs_64
>
> stub_ptregs_64:
> testl $TS_EXTRAREGS, ti_status>
> jnz 1f
> SAVE_EXTRA_REGS
> call *%rax
> RESTORE_EXTRA_REGS
> ret
> 1:
> call *%rax
> ret
>
> This makes sure that the extra regs don't get saved a second time if
> coming in from the slow path, but preserves the fast path if not
> tracing.

I think there's value in having the entries in the table be genuine C
ABI-compliant function pointers.  In your example, it only barely
works -- you can call them from C only if you have TS_EXTRAREGS set
appropriately -- otherwise you crash and burn.  That will break the
rest of the series.

We could adjust it a bit and check whether we're in C land (by
checking rsp for ts) and jump into the slow path if we aren't, but I'm
not sure this is a huge win.  It does save some rodata space by
avoiding duplicating the table.
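
To make that concrete, a sketch (not code from the series; the dispatcher
name is invented) of what genuinely ABI-compliant entries buy the C slow path:

#include <asm/ptrace.h>

/*
 * Sketch only.  If every entry -- including the former ptregs stubs -- is
 * an ordinary C function of six unsigned longs, plain C can call any of
 * them with no hidden precondition such as a TS_EXTRAREGS-style flag.
 */
typedef long (*sys_call_ptr_t)(unsigned long, unsigned long, unsigned long,
                               unsigned long, unsigned long, unsigned long);

extern const sys_call_ptr_t sys_call_table[];

static long slow_path_dispatch(unsigned long nr, struct pt_regs *regs)  /* made-up name */
{
        return sys_call_table[nr](regs->di, regs->si, regs->dx,
                                  regs->r10, regs->r8, regs->r9);
}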

--Andy


Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path

2015-12-07 Thread Brian Gerst
On Mon, Dec 7, 2015 at 7:50 PM, Brian Gerst  wrote:
> On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski  wrote:
>> 64-bit syscalls currently have an optimization in which they are
>> called with partial pt_regs.  A small handful require full pt_regs.
>>
>> In the 32-bit and compat cases, I cleaned this up by forcing full
>> pt_regs for all syscalls.  The performance hit doesn't really matter.
>>
>> I want to clean up the 64-bit case as well, but I don't want to hurt
>> fast path performance.  To do that, I want to force the syscalls
>> that use pt_regs onto the slow path.  This will enable us to make
>> slow path syscalls be real ABI-compliant C functions.
>>
>> Use the new syscall entry qualification machinery for this.
>> stub_clone is now stub_clone/ptregs.
>>
>> The next patch will eliminate the stubs, and we'll just have
>> sys_clone/ptregs.

[Resend after gmail web interface fail]

I've got an idea on how to do this without the duplicate syscall table.

ptregs_foo:
leaq sys_foo(%rip), %rax
jmp stub_ptregs_64

stub_ptregs_64:
testl $TS_EXTRAREGS, ti_status>
jnz 1f
SAVE_EXTRA_REGS
call *%rax
RESTORE_EXTRA_REGS
ret
1:
call *%rax
ret

This makes sure that the extra regs don't get saved a second time if
coming in from the slow path, but preserves the fast path if not
tracing.

--
Brian Gerst


Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path

2015-12-07 Thread Brian Gerst
On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski  wrote:
> 64-bit syscalls currently have an optimization in which they are
> called with partial pt_regs.  A small handful require full pt_regs.
>
> In the 32-bit and compat cases, I cleaned this up by forcing full
> pt_regs for all syscalls.  The performance hit doesn't really matter.
>
> I want to clean up the 64-bit case as well, but I don't want to hurt
> fast path performance.  To do that, I want to force the syscalls
> that use pt_regs onto the slow path.  This will enable us to make
> slow path syscalls be real ABI-compliant C functions.
>
> Use the new syscall entry qualification machinery for this.
> stub_clone is now stub_clone/ptregs.
>
> The next patch will eliminate the stubs, and we'll just have
> sys_clone/ptregs.

I've got an idea on how to do this without the duplicate syscall table.

ptregs_foo:
leaq sys_foo(%rip), %rax
jmp stub_ptregs_64

stub_ptregs_64:
testl $TS_EXTRAREGS, ti_status>
jnz 1f
SAVE_EXTRA_REGS
call *%rax
RESTORE_EXTRA_REGS
ret
1:
call *%rax


--
Brian Gerst


[PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path

2015-12-07 Thread Andy Lutomirski
64-bit syscalls currently have an optimization in which they are
called with partial pt_regs.  A small handful require full pt_regs.

In the 32-bit and compat cases, I cleaned this up by forcing full
pt_regs for all syscalls.  The performance hit doesn't really matter.

I want to clean up the 64-bit case as well, but I don't want to hurt
fast path performance.  To do that, I want to force the syscalls
that use pt_regs onto the slow path.  This will enable us to make
slow path syscalls be real ABI-compliant C functions.

Use the new syscall entry qualification machinery for this.
stub_clone is now stub_clone/ptregs.

The next patch will eliminate the stubs, and we'll just have
sys_clone/ptregs.

Signed-off-by: Andy Lutomirski 
---
 arch/x86/entry/entry_64.S              | 17 +
 arch/x86/entry/syscall_64.c            | 18 ++
 arch/x86/entry/syscalls/syscall_64.tbl | 16 
 3 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9d34d3cfceb6..a698b8092831 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -182,7 +182,7 @@ entry_SYSCALL_64_fastpath:
 #endif
 	ja	1f				/* return -ENOSYS (already in pt_regs->ax) */
 	movq	%r10, %rcx
-	call	*sys_call_table(, %rax, 8)
+	call	*sys_call_table_fastpath_64(, %rax, 8)
 	movq	%rax, RAX(%rsp)
 1:
 /*
@@ -238,13 +238,6 @@ tracesys:
 	movq	%rsp, %rdi
 	movl	$AUDIT_ARCH_X86_64, %esi
 	call	syscall_trace_enter_phase1
-	test	%rax, %rax
-	jnz	tracesys_phase2			/* if needed, run the slow path */
-	RESTORE_C_REGS_EXCEPT_RAX		/* else restore clobbered regs */
-	movq	ORIG_RAX(%rsp), %rax
-	jmp	entry_SYSCALL_64_fastpath	/* and return to the fast path */
-
-tracesys_phase2:
 	SAVE_EXTRA_REGS
 	movq	%rsp, %rdi
 	movl	$AUDIT_ARCH_X86_64, %esi
@@ -355,6 +348,14 @@ opportunistic_sysret_failed:
 	jmp	restore_c_regs_and_iret
 END(entry_SYSCALL_64)
 
+ENTRY(stub_ptregs_64)
+	/*
+	 * Syscalls marked as needing ptregs that go through the fast path
+	 * land here.  We transfer to the slow path.
+	 */
+	addq	$8, %rsp
+	jmp	tracesys
+END(stub_ptregs_64)
 
 	.macro FORK_LIKE func
 ENTRY(stub_\func)
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index a1d408772ae6..601745c667ce 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -22,3 +22,21 @@ asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
 	[0 ... __NR_syscall_max] = &sys_ni_syscall,
 #include <asm/syscalls_64.h>
 };
+
+#undef __SYSCALL_64
+
+extern long stub_ptregs_64(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+
+#define __SYSCALL_64_QUAL_(nr, sym) [nr] = sym,
+#define __SYSCALL_64_QUAL_ptregs(nr, sym) [nr] = stub_ptregs_64,
+
+#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(nr, sym)
+
+asmlinkage const sys_call_ptr_t sys_call_table_fastpath_64[__NR_syscall_max+1] = {
+	/*
+	 * Smells like a compiler bug -- it doesn't work
+	 * when the & below is removed.
+	 */
+	[0 ... __NR_syscall_max] = &sys_ni_syscall,
+#include <asm/syscalls_64.h>
+};
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 278842fdf1f6..6b9db2e338f4 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -21,7 +21,7 @@
 12    common   brk               sys_brk
 13    64       rt_sigaction      sys_rt_sigaction
 14    common   rt_sigprocmask    sys_rt_sigprocmask
-15    64       rt_sigreturn      stub_rt_sigreturn
+15    64       rt_sigreturn      stub_rt_sigreturn/ptregs
 16    64       ioctl             sys_ioctl
 17    common   pread64           sys_pread64
 18    common   pwrite64          sys_pwrite64
@@ -62,10 +62,10 @@
 53    common   socketpair        sys_socketpair
 54    64       setsockopt        sys_setsockopt
 55    64       getsockopt        sys_getsockopt
-56    common   clone             stub_clone
-57    common   fork              stub_fork
-58    common   vfork             stub_vfork
-59    64       execve            stub_execve
+56    common   clone             stub_clone/ptregs
+57    common   fork              stub_fork/ptregs
+58    common   vfork             stub_vfork/ptregs
+59    64       execve            stub_execve/ptregs
 60    common   exit              sys_exit
 61    common   wait4             sys_wait4
 62    common   kill              sys_kill
@@ -328,7 +328,7 @@
 319   common   memfd_create      sys_memfd_create
 320   common   kexec_file_load   sys_kexec_file_load
 321   common   bpf
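
For reference, a sketch of how one qualified .tbl entry flows through the
macros above -- an illustrative expansion, not additional patch code (the main
sys_call_table's own __SYSCALL_64 definition is outside this hunk, so its half
is inferred):

/* .tbl entry:       15   64   rt_sigreturn   stub_rt_sigreturn/ptregs        */
/* generated as:     __SYSCALL_64(15, stub_rt_sigreturn, ptregs)              */
/*                                                                            */
/* fastpath table:   __SYSCALL_64_QUAL_ptregs(15, stub_rt_sigreturn)          */
/*                     -> [15] = stub_ptregs_64,   which bounces to tracesys  */
/* main table:       presumably [15] = stub_rt_sigreturn, so the slow path    */
/*                   calls the real stub with full pt_regs already saved      */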
