Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path
On Wed, Dec 9, 2015 at 1:21 AM, Andy Lutomirski wrote:
> On Tue, Dec 8, 2015 at 9:45 PM, Andy Lutomirski wrote:
>> On Tue, Dec 8, 2015 at 8:43 PM, Brian Gerst wrote:
>>> On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski wrote:
>>>> 64-bit syscalls currently have an optimization in which they are
>>>> called with partial pt_regs.  A small handful require full pt_regs.
>>>>
>>>> In the 32-bit and compat cases, I cleaned this up by forcing full
>>>> pt_regs for all syscalls.  The performance hit doesn't really matter.
>>>>
>>>> I want to clean up the 64-bit case as well, but I don't want to hurt
>>>> fast path performance.  To do that, I want to force the syscalls
>>>> that use pt_regs onto the slow path.  This will enable us to make
>>>> slow path syscalls be real ABI-compliant C functions.
>>>>
>>>> Use the new syscall entry qualification machinery for this.
>>>> stub_clone is now stub_clone/ptregs.
>>>>
>>>> The next patch will eliminate the stubs, and we'll just have
>>>> sys_clone/ptregs.
>>>>
>>>> Signed-off-by: Andy Lutomirski
>>>
>>> Fails to boot, bisected to this patch:
>>> [ 32.675319] kernel BUG at kernel/auditsc.c:1504!
>>> [ 32.675325] invalid opcode: [#65] SMP
>>> [ 32.675328] Modules linked in:
>>> [ 32.675333] CPU: 1 PID: 216 Comm: systemd-cgroups Tainted: G D 4.3.0-rc4+ #7
>>> [ 32.675336] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>> [ 32.675339] task: 88075340 ti: 88003652 task.ti: 88003652
>>> [ 32.675350] RIP: 0010:[] [] __audit_syscall_entry+0xcd/0xf0
>>> [ 32.675353] RSP: 0018:880036523ef0 EFLAGS: 00010202
>>> [ 32.675355] RAX: 000c RBX: 8800797b3000 RCX: 7ffef8504e88
>>> [ 32.675357] RDX: 56172f37cfd0 RSI: RDI: 000c
>>> [ 32.675359] RBP: 880036523f00 R08: 0001 R09: 88075340
>>> [ 32.675361] R10: R11: 0001 R12:
>>> [ 32.675363] R13: c03e R14: 0001 R15: 1000
>>> [ 32.675380] FS: 7f02b4ff48c0() GS:88007fc8() knlGS:
>>> [ 32.675383] CS: 0010 DS: ES: CR0: 8005003b
>>> [ 32.675385] CR2: 7f93d47ea0e0 CR3: 36aa9000 CR4: 06e0
>>> [ 32.675391] Stack:
>>> [ 32.675396] 880036523f58 880036523f10 8100321b
>>> [ 32.675401] 880036523f48 81003ad0 56172f374040 7f93d45c9990
>>> [ 32.675404] 0001 0001 1000 000a
>>> [ 32.675405] Call Trace:
>>> [ 32.675414] [] do_audit_syscall_entry+0x4b/0x70
>>> [ 32.675420] [] syscall_trace_enter_phase2+0x110/0x1d0
>>> [ 32.675425] [] tracesys+0x3a/0x96
>>> [ 32.675464] Code: 00 00 00 00 e8 a5 e0 fc ff c7 43 04 01 00 00 00 48 89 43 18 48 89 53 20 44 89 63 0c c7 83 94 02 00 00 00 00 00 00 5b 41 5c 5d c3 <0f> 0b 48 c7 43 50 00 00 00 00 48 c7 c2 60 b4 c5 81 48 89 de 4c
>>> [ 32.675469] RIP [] __audit_syscall_entry+0xcd/0xf0
>>> [ 32.675471] RSP
>>
>> I'm not reproducing this, even with audit manually enabled.  Can you
>> send a .config?
>
> Never mind, I found the bug by inspection.  I'll send a fixed up
> series tomorrow.
>
> Can you send the boot failure you got with the full series applied,
> though?  I think that the bug I found is only triggerable part-way
> through the series -- I think I inadvertently fixed it later on.

I can't reproduce it now.  It was a hang, or I just didn't get the oops
displayed on the screen.  Could have been something unrelated.

--
Brian Gerst
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path
On Tue, Dec 8, 2015 at 9:45 PM, Andy Lutomirski wrote:
> On Tue, Dec 8, 2015 at 8:43 PM, Brian Gerst wrote:
>> On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski wrote:
>>> 64-bit syscalls currently have an optimization in which they are
>>> called with partial pt_regs.  A small handful require full pt_regs.
>>>
>>> In the 32-bit and compat cases, I cleaned this up by forcing full
>>> pt_regs for all syscalls.  The performance hit doesn't really matter.
>>>
>>> I want to clean up the 64-bit case as well, but I don't want to hurt
>>> fast path performance.  To do that, I want to force the syscalls
>>> that use pt_regs onto the slow path.  This will enable us to make
>>> slow path syscalls be real ABI-compliant C functions.
>>>
>>> Use the new syscall entry qualification machinery for this.
>>> stub_clone is now stub_clone/ptregs.
>>>
>>> The next patch will eliminate the stubs, and we'll just have
>>> sys_clone/ptregs.
>>>
>>> Signed-off-by: Andy Lutomirski
>>
>> Fails to boot, bisected to this patch:
>> [ 32.675319] kernel BUG at kernel/auditsc.c:1504!
>> [ 32.675325] invalid opcode: [#65] SMP
>> [ 32.675328] Modules linked in:
>> [ 32.675333] CPU: 1 PID: 216 Comm: systemd-cgroups Tainted: G D 4.3.0-rc4+ #7
>> [ 32.675336] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> [ 32.675339] task: 88075340 ti: 88003652 task.ti: 88003652
>> [ 32.675350] RIP: 0010:[] [] __audit_syscall_entry+0xcd/0xf0
>> [ 32.675353] RSP: 0018:880036523ef0 EFLAGS: 00010202
>> [ 32.675355] RAX: 000c RBX: 8800797b3000 RCX: 7ffef8504e88
>> [ 32.675357] RDX: 56172f37cfd0 RSI: RDI: 000c
>> [ 32.675359] RBP: 880036523f00 R08: 0001 R09: 88075340
>> [ 32.675361] R10: R11: 0001 R12:
>> [ 32.675363] R13: c03e R14: 0001 R15: 1000
>> [ 32.675380] FS: 7f02b4ff48c0() GS:88007fc8() knlGS:
>> [ 32.675383] CS: 0010 DS: ES: CR0: 8005003b
>> [ 32.675385] CR2: 7f93d47ea0e0 CR3: 36aa9000 CR4: 06e0
>> [ 32.675391] Stack:
>> [ 32.675396] 880036523f58 880036523f10 8100321b
>> [ 32.675401] 880036523f48 81003ad0 56172f374040 7f93d45c9990
>> [ 32.675404] 0001 0001 1000 000a
>> [ 32.675405] Call Trace:
>> [ 32.675414] [] do_audit_syscall_entry+0x4b/0x70
>> [ 32.675420] [] syscall_trace_enter_phase2+0x110/0x1d0
>> [ 32.675425] [] tracesys+0x3a/0x96
>> [ 32.675464] Code: 00 00 00 00 e8 a5 e0 fc ff c7 43 04 01 00 00 00 48 89 43 18 48 89 53 20 44 89 63 0c c7 83 94 02 00 00 00 00 00 00 5b 41 5c 5d c3 <0f> 0b 48 c7 43 50 00 00 00 00 48 c7 c2 60 b4 c5 81 48 89 de 4c
>> [ 32.675469] RIP [] __audit_syscall_entry+0xcd/0xf0
>> [ 32.675471] RSP
>
> I'm not reproducing this, even with audit manually enabled.  Can you
> send a .config?

Never mind, I found the bug by inspection.  I'll send a fixed up
series tomorrow.

Can you send the boot failure you got with the full series applied,
though?  I think that the bug I found is only triggerable part-way
through the series -- I think I inadvertently fixed it later on.

--Andy
Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path
On Tue, Dec 8, 2015 at 8:43 PM, Brian Gerst wrote:
> On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski wrote:
>> 64-bit syscalls currently have an optimization in which they are
>> called with partial pt_regs.  A small handful require full pt_regs.
>>
>> In the 32-bit and compat cases, I cleaned this up by forcing full
>> pt_regs for all syscalls.  The performance hit doesn't really matter.
>>
>> I want to clean up the 64-bit case as well, but I don't want to hurt
>> fast path performance.  To do that, I want to force the syscalls
>> that use pt_regs onto the slow path.  This will enable us to make
>> slow path syscalls be real ABI-compliant C functions.
>>
>> Use the new syscall entry qualification machinery for this.
>> stub_clone is now stub_clone/ptregs.
>>
>> The next patch will eliminate the stubs, and we'll just have
>> sys_clone/ptregs.
>>
>> Signed-off-by: Andy Lutomirski
>
> Fails to boot, bisected to this patch:
> [ 32.675319] kernel BUG at kernel/auditsc.c:1504!
> [ 32.675325] invalid opcode: [#65] SMP
> [ 32.675328] Modules linked in:
> [ 32.675333] CPU: 1 PID: 216 Comm: systemd-cgroups Tainted: G D 4.3.0-rc4+ #7
> [ 32.675336] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 32.675339] task: 88075340 ti: 88003652 task.ti: 88003652
> [ 32.675350] RIP: 0010:[] [] __audit_syscall_entry+0xcd/0xf0
> [ 32.675353] RSP: 0018:880036523ef0 EFLAGS: 00010202
> [ 32.675355] RAX: 000c RBX: 8800797b3000 RCX: 7ffef8504e88
> [ 32.675357] RDX: 56172f37cfd0 RSI: RDI: 000c
> [ 32.675359] RBP: 880036523f00 R08: 0001 R09: 88075340
> [ 32.675361] R10: R11: 0001 R12:
> [ 32.675363] R13: c03e R14: 0001 R15: 1000
> [ 32.675380] FS: 7f02b4ff48c0() GS:88007fc8() knlGS:
> [ 32.675383] CS: 0010 DS: ES: CR0: 8005003b
> [ 32.675385] CR2: 7f93d47ea0e0 CR3: 36aa9000 CR4: 06e0
> [ 32.675391] Stack:
> [ 32.675396] 880036523f58 880036523f10 8100321b
> [ 32.675401] 880036523f48 81003ad0 56172f374040 7f93d45c9990
> [ 32.675404] 0001 0001 1000 000a
> [ 32.675405] Call Trace:
> [ 32.675414] [] do_audit_syscall_entry+0x4b/0x70
> [ 32.675420] [] syscall_trace_enter_phase2+0x110/0x1d0
> [ 32.675425] [] tracesys+0x3a/0x96
> [ 32.675464] Code: 00 00 00 00 e8 a5 e0 fc ff c7 43 04 01 00 00 00 48 89 43 18 48 89 53 20 44 89 63 0c c7 83 94 02 00 00 00 00 00 00 5b 41 5c 5d c3 <0f> 0b 48 c7 43 50 00 00 00 00 48 c7 c2 60 b4 c5 81 48 89 de 4c
> [ 32.675469] RIP [] __audit_syscall_entry+0xcd/0xf0
> [ 32.675471] RSP

I'm not reproducing this, even with audit manually enabled.  Can you
send a .config?

--Andy
Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path
On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski wrote:
> 64-bit syscalls currently have an optimization in which they are
> called with partial pt_regs.  A small handful require full pt_regs.
>
> In the 32-bit and compat cases, I cleaned this up by forcing full
> pt_regs for all syscalls.  The performance hit doesn't really matter.
>
> I want to clean up the 64-bit case as well, but I don't want to hurt
> fast path performance.  To do that, I want to force the syscalls
> that use pt_regs onto the slow path.  This will enable us to make
> slow path syscalls be real ABI-compliant C functions.
>
> Use the new syscall entry qualification machinery for this.
> stub_clone is now stub_clone/ptregs.
>
> The next patch will eliminate the stubs, and we'll just have
> sys_clone/ptregs.
>
> Signed-off-by: Andy Lutomirski

Fails to boot, bisected to this patch:
[ 32.675319] kernel BUG at kernel/auditsc.c:1504!
[ 32.675325] invalid opcode: [#65] SMP
[ 32.675328] Modules linked in:
[ 32.675333] CPU: 1 PID: 216 Comm: systemd-cgroups Tainted: G D 4.3.0-rc4+ #7
[ 32.675336] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 32.675339] task: 88075340 ti: 88003652 task.ti: 88003652
[ 32.675350] RIP: 0010:[] [] __audit_syscall_entry+0xcd/0xf0
[ 32.675353] RSP: 0018:880036523ef0 EFLAGS: 00010202
[ 32.675355] RAX: 000c RBX: 8800797b3000 RCX: 7ffef8504e88
[ 32.675357] RDX: 56172f37cfd0 RSI: RDI: 000c
[ 32.675359] RBP: 880036523f00 R08: 0001 R09: 88075340
[ 32.675361] R10: R11: 0001 R12:
[ 32.675363] R13: c03e R14: 0001 R15: 1000
[ 32.675380] FS: 7f02b4ff48c0() GS:88007fc8() knlGS:
[ 32.675383] CS: 0010 DS: ES: CR0: 8005003b
[ 32.675385] CR2: 7f93d47ea0e0 CR3: 36aa9000 CR4: 06e0
[ 32.675391] Stack:
[ 32.675396] 880036523f58 880036523f10 8100321b
[ 32.675401] 880036523f48 81003ad0 56172f374040 7f93d45c9990
[ 32.675404] 0001 0001 1000 000a
[ 32.675405] Call Trace:
[ 32.675414] [] do_audit_syscall_entry+0x4b/0x70
[ 32.675420] [] syscall_trace_enter_phase2+0x110/0x1d0
[ 32.675425] [] tracesys+0x3a/0x96
[ 32.675464] Code: 00 00 00 00 e8 a5 e0 fc ff c7 43 04 01 00 00 00 48 89 43 18 48 89 53 20 44 89 63 0c c7 83 94 02 00 00 00 00 00 00 5b 41 5c 5d c3 <0f> 0b 48 c7 43 50 00 00 00 00 48 c7 c2 60 b4 c5 81 48 89 de 4c
[ 32.675469] RIP [] __audit_syscall_entry+0xcd/0xf0
[ 32.675471] RSP

--
Brian Gerst
Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path
On Tue, Dec 8, 2015 at 10:56 AM, Ingo Molnar wrote:
>
> * Brian Gerst wrote:
>
>> > We could adjust it a bit and check whether we're in C land (by checking rsp
>> > for ts) and jump into the slow path if we aren't, but I'm not sure this is a
>> > huge win.  It does save some rodata space by avoiding duplicating the table.
>>
>> The syscall table is huge.  545*8 bytes, over a full page.  Duplicating it for
>> just a few different entries is wasteful.
>
> Note that what matters more is cache footprint, not pure size: 1K of RAM overhead
> for something as fundamental as system calls is trivial cost.
>
> So the questions to ask are along these lines:
>
>  - what is the typical locality of access (do syscall numbers cluster in time and
>    space)

I suspect that they do.  Web servers will call send over and over, for example.

>  - how frequently would the two tables be accessed (is one accessed less
>    frequently than the other?)

On setups that don't bail right away, the fast path table gets hit most of
the time.  On setups that do bail right away (context tracking on, for
example), we exclusively use the slow path table.

>  - subsequently how does the effective cache footprint change with the
>    duplication?

In the worst case (repeatedly forking, for example, but I doubt we care
about that case), the duplication adds one extra cacheline.

> it might still end up not being worth it - but it's not the RAM cost that is the
> main factor IMHO.

Agreed.

One option: borrow the high bit to indicate "needs ptregs".  This adds a
branch to both the fast path and the slow path, but it avoids the cache hit.

Brian's approach gets the best of all worlds except that, if I understand
it right, it's a bit fragile.

--Andy
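[Editor's note: the "borrow the high bit" idea can be modeled in plain C. This is a hypothetical userspace sketch, not the kernel implementation; `NEEDS_PTREGS`, `do_syscall()`, and the stub functions are invented names for illustration.]

```c
#include <assert.h>
#include <stdint.h>

/* Tag bit stored in the table entry itself: the few ptregs-using
 * syscalls carry it, everything else is a plain function pointer.
 * All names here are invented for this sketch. */
#define NEEDS_PTREGS (1ULL << 63)

typedef long (*sys_call_ptr_t)(void);

static long sys_read_stub(void)  { return 10; }
static long sys_clone_stub(void) { return 20; }  /* wants full pt_regs */

static uint64_t sys_call_table[2];

void init_table(void)
{
    sys_call_table[0] = (uint64_t)(uintptr_t)sys_read_stub;
    sys_call_table[1] = (uint64_t)(uintptr_t)sys_clone_stub | NEEDS_PTREGS;
}

/* One branch on the tag picks the path; a single table means no
 * duplicated cachelines, at the cost of a test on the fast path too. */
long do_syscall(unsigned int nr, int *took_slow_path)
{
    uint64_t entry = sys_call_table[nr];

    *took_slow_path = (entry & NEEDS_PTREGS) != 0;
    entry &= ~NEEDS_PTREGS;          /* strip the tag before calling */
    return ((sys_call_ptr_t)(uintptr_t)entry)();
}
```

(In the real kernel the entries are kernel virtual addresses whose high bits are already set, so the actual spare bit chosen would differ; the point of the sketch is only the tag-and-branch dispatch shape.)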
Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path
* Brian Gerst wrote:

> > We could adjust it a bit and check whether we're in C land (by checking rsp
> > for ts) and jump into the slow path if we aren't, but I'm not sure this is a
> > huge win.  It does save some rodata space by avoiding duplicating the table.
>
> The syscall table is huge.  545*8 bytes, over a full page.  Duplicating it for
> just a few different entries is wasteful.

Note that what matters more is cache footprint, not pure size: 1K of RAM
overhead for something as fundamental as system calls is trivial cost.

So the questions to ask are along these lines:

 - what is the typical locality of access (do syscall numbers cluster in time
   and space)

 - how frequently would the two tables be accessed (is one accessed less
   frequently than the other?)

 - subsequently how does the effective cache footprint change with the
   duplication?

it might still end up not being worth it - but it's not the RAM cost that is
the main factor IMHO.

Thanks,

	Ingo
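[Editor's note: for concreteness, the sizes being traded off above work out as below. The 64-byte cacheline is a typical x86 value assumed here, not stated in the thread.]

```c
/* 545 entries * 8 bytes = 4360 bytes, i.e. just over one 4096-byte
 * page, and 69 cachelines at an assumed 64-byte x86 line size. */
unsigned int table_bytes(void)      { return 545 * 8; }
unsigned int table_cachelines(void) { return (545 * 8 + 63) / 64; }
```

So duplicating the table costs about a page of rodata, but the cache cost Ingo is pointing at is bounded by how many of those ~69 lines are actually hot.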
Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path
On Mon, Dec 7, 2015 at 8:12 PM, Andy Lutomirski wrote:
> On Mon, Dec 7, 2015 at 4:54 PM, Brian Gerst wrote:
>> On Mon, Dec 7, 2015 at 7:50 PM, Brian Gerst wrote:
>>> On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski wrote:
>>>> 64-bit syscalls currently have an optimization in which they are
>>>> called with partial pt_regs.  A small handful require full pt_regs.
>>>>
>>>> In the 32-bit and compat cases, I cleaned this up by forcing full
>>>> pt_regs for all syscalls.  The performance hit doesn't really matter.
>>>>
>>>> I want to clean up the 64-bit case as well, but I don't want to hurt
>>>> fast path performance.  To do that, I want to force the syscalls
>>>> that use pt_regs onto the slow path.  This will enable us to make
>>>> slow path syscalls be real ABI-compliant C functions.
>>>>
>>>> Use the new syscall entry qualification machinery for this.
>>>> stub_clone is now stub_clone/ptregs.
>>>>
>>>> The next patch will eliminate the stubs, and we'll just have
>>>> sys_clone/ptregs.
>>
>> [Resend after gmail web interface fail]
>>
>> I've got an idea on how to do this without the duplicate syscall table.
>>
>> ptregs_foo:
>>         leaq sys_foo(%rip), %rax
>>         jmp stub_ptregs_64
>>
>> stub_ptregs_64:
>>         testl $TS_EXTRAREGS, ti_status
>>         jnz 1f
>>         SAVE_EXTRA_REGS
>>         call *%rax
>>         RESTORE_EXTRA_REGS
>>         ret
>> 1:
>>         call *%rax
>>         ret
>>
>> This makes sure that the extra regs don't get saved a second time if
>> coming in from the slow path, but preserves the fast path if not
>> tracing.
>
> I think there's value in having the entries in the table be genuine C
> ABI-compliant function pointers.  In your example, it only barely
> works -- you can call them from C only if you have TS_EXTRAREGS set
> appropriately -- otherwise you crash and burn.  That will break the
> rest of the series.

I'm working on a full patch.  It will set the flag (renamed TS_SLOWPATH)
in do_syscall_64(), which is the only place these functions can get
called from C code.  Your changes already have it set up so that the
slow path saves these registers before calling any C code.  Where else
do you expect them to be called from?

> We could adjust it a bit and check whether we're in C land (by
> checking rsp for ts) and jump into the slow path if we aren't, but I'm
> not sure this is a huge win.  It does save some rodata space by
> avoiding duplicating the table.

The syscall table is huge.  545*8 bytes, over a full page.  Duplicating
it for just a few different entries is wasteful.

--
Brian Gerst
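[Editor's note: the invariant Brian's stub maintains can be modeled in C: the extra registers are saved exactly once whether the syscall is reached from the fast path (flag clear) or the slow path (which has already saved them and set the flag). This is an illustrative userspace sketch; the counter, the flag variable, and the function names are invented, and the real code is the assembly above.]

```c
#include <stdbool.h>

/* Models SAVE_EXTRA_REGS as a counter and TS_EXTRAREGS/TS_SLOWPATH as
 * a plain flag.  All names here are illustrative only. */
int  extra_regs_saves;
bool ts_slowpath;

static long sys_foo(void) { return 42; }

/* The stub: save the extra regs only if the slow path hasn't already. */
static long stub_ptregs_64(long (*fn)(void))
{
    if (!ts_slowpath) {
        extra_regs_saves++;      /* SAVE_EXTRA_REGS on the fast path */
        return fn();             /* RESTORE_EXTRA_REGS would follow */
    }
    return fn();                 /* slow path: regs already saved */
}

/* Fast path: dispatches through the stub with the flag clear. */
long fast_path_syscall(void)
{
    extra_regs_saves = 0;
    return stub_ptregs_64(sys_foo);
}

/* Slow path: full pt_regs saved on entry, flag set around the call. */
long slow_path_syscall(void)
{
    extra_regs_saves = 0;
    extra_regs_saves++;          /* slow-path entry saved full pt_regs */
    ts_slowpath = true;
    long ret = stub_ptregs_64(sys_foo);
    ts_slowpath = false;
    return ret;
}
```

Either way in, `extra_regs_saves` ends at 1, which is the "don't save a second time" property the assembly's `jnz 1f` branch provides.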
Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path
On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski wrote:
> 64-bit syscalls currently have an optimization in which they are
> called with partial pt_regs.  A small handful require full pt_regs.
>
> In the 32-bit and compat cases, I cleaned this up by forcing full
> pt_regs for all syscalls.  The performance hit doesn't really matter.
>
> I want to clean up the 64-bit case as well, but I don't want to hurt
> fast path performance.  To do that, I want to force the syscalls
> that use pt_regs onto the slow path.  This will enable us to make
> slow path syscalls be real ABI-compliant C functions.
>
> Use the new syscall entry qualification machinery for this.
> stub_clone is now stub_clone/ptregs.
>
> The next patch will eliminate the stubs, and we'll just have
> sys_clone/ptregs.
>
> Signed-off-by: Andy Lutomirski

Fails to boot, bisected to this patch:

[ 32.675319] kernel BUG at kernel/auditsc.c:1504!
[ 32.675325] invalid opcode: [#65] SMP
[ 32.675328] Modules linked in:
[ 32.675333] CPU: 1 PID: 216 Comm: systemd-cgroups Tainted: G D 4.3.0-rc4+ #7
[ 32.675336] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 32.675339] task: 88075340 ti: 88003652 task.ti: 88003652
[ 32.675350] RIP: 0010:[] [] __audit_syscall_entry+0xcd/0xf0
[ 32.675353] RSP: 0018:880036523ef0 EFLAGS: 00010202
[ 32.675355] RAX: 000c RBX: 8800797b3000 RCX: 7ffef8504e88
[ 32.675357] RDX: 56172f37cfd0 RSI: RDI: 000c
[ 32.675359] RBP: 880036523f00 R08: 0001 R09: 88075340
[ 32.675361] R10: R11: 0001 R12:
[ 32.675363] R13: c03e R14: 0001 R15: 1000
[ 32.675380] FS: 7f02b4ff48c0() GS:88007fc8() knlGS:
[ 32.675383] CS: 0010 DS: ES: CR0: 8005003b
[ 32.675385] CR2: 7f93d47ea0e0 CR3: 36aa9000 CR4: 06e0
[ 32.675391] Stack:
[ 32.675396] 880036523f58 880036523f10 8100321b
[ 32.675401] 880036523f48 81003ad0 56172f374040 7f93d45c9990
[ 32.675404] 0001 0001 1000 000a
[ 32.675405] Call Trace:
[ 32.675414] [] do_audit_syscall_entry+0x4b/0x70
[ 32.675420] [] syscall_trace_enter_phase2+0x110/0x1d0
[ 32.675425] [] tracesys+0x3a/0x96
[ 32.675464] Code: 00 00 00 00 e8 a5 e0 fc ff c7 43 04 01 00 00 00 48 89 43 18 48 89 53 20 44 89 63 0c c7 83 94 02 00 00 00 00 00 00 5b 41 5c 5d c3 <0f> 0b 48 c7 43 50 00 00 00 00 48 c7 c2 60 b4 c5 81 48 89 de 4c
[ 32.675469] RIP [] __audit_syscall_entry+0xcd/0xf0
[ 32.675471] RSP

--
Brian Gerst
Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path
On Tue, Dec 8, 2015 at 10:56 AM, Ingo Molnar wrote:
>
> * Brian Gerst wrote:
>
>> > We could adjust it a bit and check whether we're in C land (by checking rsp
>> > for ts) and jump into the slow path if we aren't, but I'm not sure this is a
>> > huge win.  It does save some rodata space by avoiding duplicating the table.
>>
>> The syscall table is huge.  545*8 bytes, over a full page.  Duplicating it for
>> just a few different entries is wasteful.
>
> Note that what matters more is cache footprint, not pure size: 1K of RAM overhead
> for something as fundamental as system calls is trivial cost.
>
> So the questions to ask are along these lines:
>
>  - what is the typical locality of access (do syscall numbers cluster in time and
>    space)

I suspect that they do.  Web servers will call send over and over, for
example.

>  - how frequently would the two tables be accessed (is one accessed less
>    frequently than the other?)

On setups that don't bail right away, the fast path table gets hit most of
the time.  On setups that do bail right away (context tracking on, for
example), we exclusively use the slow path table.

>  - subsequently how does the effective cache footprint change with the
>    duplication?

In the worst case (repeatedly forking, for example, but I doubt we care
about that case), the duplication adds one extra cacheline.

> it might still end up not being worth it - but it's not the RAM cost that is the
> main factor IMHO.

Agreed.

One option: borrow the high bit to indicate "needs ptregs".  This adds a
branch to both the fast path and the slow path, but it avoids the cache
hit.

Brian's approach gets the best of all worlds except that, if I understand
it right, it's a bit fragile.

--Andy
Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path
On Mon, Dec 7, 2015 at 4:54 PM, Brian Gerst wrote:
> On Mon, Dec 7, 2015 at 7:50 PM, Brian Gerst wrote:
>> On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski wrote:
>>> 64-bit syscalls currently have an optimization in which they are
>>> called with partial pt_regs.  A small handful require full pt_regs.
>>>
>>> In the 32-bit and compat cases, I cleaned this up by forcing full
>>> pt_regs for all syscalls.  The performance hit doesn't really matter.
>>>
>>> I want to clean up the 64-bit case as well, but I don't want to hurt
>>> fast path performance.  To do that, I want to force the syscalls
>>> that use pt_regs onto the slow path.  This will enable us to make
>>> slow path syscalls be real ABI-compliant C functions.
>>>
>>> Use the new syscall entry qualification machinery for this.
>>> stub_clone is now stub_clone/ptregs.
>>>
>>> The next patch will eliminate the stubs, and we'll just have
>>> sys_clone/ptregs.
>
> [Resend after gmail web interface fail]
>
> I've got an idea on how to do this without the duplicate syscall table.
>
> ptregs_foo:
>         leaq sys_foo(%rip), %rax
>         jmp stub_ptregs_64
>
> stub_ptregs_64:
>         testl $TS_EXTRAREGS, ti_status
>         jnz 1f
>         SAVE_EXTRA_REGS
>         call *%rax
>         RESTORE_EXTRA_REGS
>         ret
> 1:
>         call *%rax
>         ret
>
> This makes sure that the extra regs don't get saved a second time if
> coming in from the slow path, but preserves the fast path if not
> tracing.

I think there's value in having the entries in the table be genuine C
ABI-compliant function pointers.  In your example, it only barely works --
you can call them from C only if you have TS_EXTRAREGS set appropriately --
otherwise you crash and burn.  That will break the rest of the series.

We could adjust it a bit and check whether we're in C land (by checking rsp
for ts) and jump into the slow path if we aren't, but I'm not sure this is a
huge win.  It does save some rodata space by avoiding duplicating the
table.
--Andy
Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path
On Mon, Dec 7, 2015 at 7:50 PM, Brian Gerst wrote:
> On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski wrote:
>> 64-bit syscalls currently have an optimization in which they are
>> called with partial pt_regs.  A small handful require full pt_regs.
>>
>> In the 32-bit and compat cases, I cleaned this up by forcing full
>> pt_regs for all syscalls.  The performance hit doesn't really matter.
>>
>> I want to clean up the 64-bit case as well, but I don't want to hurt
>> fast path performance.  To do that, I want to force the syscalls
>> that use pt_regs onto the slow path.  This will enable us to make
>> slow path syscalls be real ABI-compliant C functions.
>>
>> Use the new syscall entry qualification machinery for this.
>> stub_clone is now stub_clone/ptregs.
>>
>> The next patch will eliminate the stubs, and we'll just have
>> sys_clone/ptregs.

[Resend after gmail web interface fail]

I've got an idea on how to do this without the duplicate syscall table.

ptregs_foo:
        leaq sys_foo(%rip), %rax
        jmp stub_ptregs_64

stub_ptregs_64:
        testl $TS_EXTRAREGS, ti_status
        jnz 1f
        SAVE_EXTRA_REGS
        call *%rax
        RESTORE_EXTRA_REGS
        ret
1:
        call *%rax
        ret

This makes sure that the extra regs don't get saved a second time if
coming in from the slow path, but preserves the fast path if not
tracing.

--
Brian Gerst
Re: [PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path
On Mon, Dec 7, 2015 at 4:51 PM, Andy Lutomirski wrote:
> 64-bit syscalls currently have an optimization in which they are
> called with partial pt_regs.  A small handful require full pt_regs.
>
> In the 32-bit and compat cases, I cleaned this up by forcing full
> pt_regs for all syscalls.  The performance hit doesn't really matter.
>
> I want to clean up the 64-bit case as well, but I don't want to hurt
> fast path performance.  To do that, I want to force the syscalls
> that use pt_regs onto the slow path.  This will enable us to make
> slow path syscalls be real ABI-compliant C functions.
>
> Use the new syscall entry qualification machinery for this.
> stub_clone is now stub_clone/ptregs.
>
> The next patch will eliminate the stubs, and we'll just have
> sys_clone/ptregs.

I've got an idea on how to do this without the duplicate syscall table.

ptregs_foo:
        leaq sys_foo(%rip), %rax
        jmp stub_ptregs_64

stub_ptregs_64:
        testl $TS_EXTRAREGS, ti_status
        jnz 1f
        SAVE_EXTRA_REGS
        call *%rax
        RESTORE_EXTRA_REGS
        ret
1:
        call *%rax

--
Brian Gerst
[PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path
64-bit syscalls currently have an optimization in which they are
called with partial pt_regs.  A small handful require full pt_regs.

In the 32-bit and compat cases, I cleaned this up by forcing full
pt_regs for all syscalls.  The performance hit doesn't really matter.

I want to clean up the 64-bit case as well, but I don't want to hurt
fast path performance.  To do that, I want to force the syscalls
that use pt_regs onto the slow path.  This will enable us to make
slow path syscalls be real ABI-compliant C functions.

Use the new syscall entry qualification machinery for this.
stub_clone is now stub_clone/ptregs.

The next patch will eliminate the stubs, and we'll just have
sys_clone/ptregs.

Signed-off-by: Andy Lutomirski
---
 arch/x86/entry/entry_64.S              | 17 +
 arch/x86/entry/syscall_64.c            | 18 ++
 arch/x86/entry/syscalls/syscall_64.tbl | 16
 3 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 9d34d3cfceb6..a698b8092831 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -182,7 +182,7 @@ entry_SYSCALL_64_fastpath:
 #endif
 	ja	1f	/* return -ENOSYS (already in pt_regs->ax) */
 	movq	%r10, %rcx
-	call	*sys_call_table(, %rax, 8)
+	call	*sys_call_table_fastpath_64(, %rax, 8)
 	movq	%rax, RAX(%rsp)
 1:
 	/*
@@ -238,13 +238,6 @@ tracesys:
 	movq	%rsp, %rdi
 	movl	$AUDIT_ARCH_X86_64, %esi
 	call	syscall_trace_enter_phase1
-	test	%rax, %rax
-	jnz	tracesys_phase2		/* if needed, run the slow path */
-	RESTORE_C_REGS_EXCEPT_RAX	/* else restore clobbered regs */
-	movq	ORIG_RAX(%rsp), %rax
-	jmp	entry_SYSCALL_64_fastpath	/* and return to the fast path */
-
-tracesys_phase2:
 	SAVE_EXTRA_REGS
 	movq	%rsp, %rdi
 	movl	$AUDIT_ARCH_X86_64, %esi
@@ -355,6 +348,14 @@ opportunistic_sysret_failed:
 	jmp	restore_c_regs_and_iret
 END(entry_SYSCALL_64)

+ENTRY(stub_ptregs_64)
+	/*
+	 * Syscalls marked as needing ptregs that go through the fast path
+	 * land here.  We transfer to the slow path.
+	 */
+	addq	$8, %rsp
+	jmp	tracesys
+END(stub_ptregs_64)

 .macro FORK_LIKE func
 ENTRY(stub_\func)
diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c
index a1d408772ae6..601745c667ce 100644
--- a/arch/x86/entry/syscall_64.c
+++ b/arch/x86/entry/syscall_64.c
@@ -22,3 +22,21 @@ asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = {
 	[0 ... __NR_syscall_max] = _ni_syscall,
 #include
 };
+
+#undef __SYSCALL_64
+
+extern long stub_ptregs_64(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long);
+
+#define __SYSCALL_64_QUAL_(nr, sym) [nr] = sym,
+#define __SYSCALL_64_QUAL_ptregs(nr, sym) [nr] = stub_ptregs_64,
+
+#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(nr, sym)
+
+asmlinkage const sys_call_ptr_t sys_call_table_fastpath_64[__NR_syscall_max+1] = {
+	/*
+	 * Smells like a compiler bug -- it doesn't work
+	 * when the & below is removed.
+	 */
+	[0 ... __NR_syscall_max] = _ni_syscall,
+#include
+};
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 278842fdf1f6..6b9db2e338f4 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -21,7 +21,7 @@
 12	common	brk		sys_brk
 13	64	rt_sigaction	sys_rt_sigaction
 14	common	rt_sigprocmask	sys_rt_sigprocmask
-15	64	rt_sigreturn	stub_rt_sigreturn
+15	64	rt_sigreturn	stub_rt_sigreturn/ptregs
 16	64	ioctl		sys_ioctl
 17	common	pread64		sys_pread64
 18	common	pwrite64	sys_pwrite64
@@ -62,10 +62,10 @@
 53	common	socketpair	sys_socketpair
 54	64	setsockopt	sys_setsockopt
 55	64	getsockopt	sys_getsockopt
-56	common	clone		stub_clone
-57	common	fork		stub_fork
-58	common	vfork		stub_vfork
-59	64	execve		stub_execve
+56	common	clone		stub_clone/ptregs
+57	common	fork		stub_fork/ptregs
+58	common	vfork		stub_vfork/ptregs
+59	64	execve		stub_execve/ptregs
 60	common	exit		sys_exit
 61	common	wait4		sys_wait4
 62	common	kill		sys_kill
@@ -328,7 +328,7 @@
 319	common	memfd_create	sys_memfd_create
 320	common	kexec_file_load	sys_kexec_file_load
 321	common	bpf
[PATCH 07/12] x86/entry/64: Always run ptregs-using syscalls on the slow path
64-bit syscalls currently have an optimization in which they are called with partial pt_regs. A small handful require full pt_regs. In the 32-bit and compat cases, I cleaned this up by forcing full pt_regs for all syscalls. The performance hit doesn't really matter. I want to clean up the 64-bit case as well, but I don't want to hurt fast path performance. To do that, I want to force the syscalls that use pt_regs onto the slow path. This will enable us to make slow path syscalls be real ABI-compliant C functions. Use the new syscall entry qualification machinery for this. stub_clone is now stub_clone/ptregs. The next patch will eliminate the stubs, and we'll just have sys_clone/ptregs. Signed-off-by: Andy Lutomirski--- arch/x86/entry/entry_64.S | 17 + arch/x86/entry/syscall_64.c| 18 ++ arch/x86/entry/syscalls/syscall_64.tbl | 16 3 files changed, 35 insertions(+), 16 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 9d34d3cfceb6..a698b8092831 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -182,7 +182,7 @@ entry_SYSCALL_64_fastpath: #endif ja 1f /* return -ENOSYS (already in pt_regs->ax) */ movq%r10, %rcx - call*sys_call_table(, %rax, 8) + call*sys_call_table_fastpath_64(, %rax, 8) movq%rax, RAX(%rsp) 1: /* @@ -238,13 +238,6 @@ tracesys: movq%rsp, %rdi movl$AUDIT_ARCH_X86_64, %esi callsyscall_trace_enter_phase1 - test%rax, %rax - jnz tracesys_phase2 /* if needed, run the slow path */ - RESTORE_C_REGS_EXCEPT_RAX /* else restore clobbered regs */ - movqORIG_RAX(%rsp), %rax - jmp entry_SYSCALL_64_fastpath /* and return to the fast path */ - -tracesys_phase2: SAVE_EXTRA_REGS movq%rsp, %rdi movl$AUDIT_ARCH_X86_64, %esi @@ -355,6 +348,14 @@ opportunistic_sysret_failed: jmp restore_c_regs_and_iret END(entry_SYSCALL_64) +ENTRY(stub_ptregs_64) + /* +* Syscalls marked as needing ptregs that go through the fast path +* land here. We transfer to the slow path. 
+*/ + addq$8, %rsp + jmp tracesys +END(stub_ptregs_64) .macro FORK_LIKE func ENTRY(stub_\func) diff --git a/arch/x86/entry/syscall_64.c b/arch/x86/entry/syscall_64.c index a1d408772ae6..601745c667ce 100644 --- a/arch/x86/entry/syscall_64.c +++ b/arch/x86/entry/syscall_64.c @@ -22,3 +22,21 @@ asmlinkage const sys_call_ptr_t sys_call_table[__NR_syscall_max+1] = { [0 ... __NR_syscall_max] = _ni_syscall, #include }; + +#undef __SYSCALL_64 + +extern long stub_ptregs_64(unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long); + +#define __SYSCALL_64_QUAL_(nr, sym) [nr] = sym, +#define __SYSCALL_64_QUAL_ptregs(nr, sym) [nr] = stub_ptregs_64, + +#define __SYSCALL_64(nr, sym, qual) __SYSCALL_64_QUAL_##qual(nr, sym) + +asmlinkage const sys_call_ptr_t sys_call_table_fastpath_64[__NR_syscall_max+1] = { + /* +* Smells like a compiler bug -- it doesn't work +* when the & below is removed. +*/ + [0 ... __NR_syscall_max] = _ni_syscall, +#include +}; diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 278842fdf1f6..6b9db2e338f4 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -21,7 +21,7 @@ 12 common brk sys_brk 13 64 rt_sigactionsys_rt_sigaction 14 common rt_sigprocmask sys_rt_sigprocmask -15 64 rt_sigreturnstub_rt_sigreturn +15 64 rt_sigreturnstub_rt_sigreturn/ptregs 16 64 ioctl sys_ioctl 17 common pread64 sys_pread64 18 common pwrite64sys_pwrite64 @@ -62,10 +62,10 @@ 53 common socketpair sys_socketpair 54 64 setsockopt sys_setsockopt 55 64 getsockopt sys_getsockopt -56 common clone stub_clone -57 common forkstub_fork -58 common vfork stub_vfork -59 64 execve stub_execve +56 common clone stub_clone/ptregs +57 common forkstub_fork/ptregs +58 common vfork stub_vfork/ptregs +59 64 execve stub_execve/ptregs 60 common exitsys_exit 61 common wait4 sys_wait4 62 common killsys_kill @@ -328,7 +328,7 @@ 319common memfd_createsys_memfd_create 320common 
kexec_file_load sys_kexec_file_load 321