Re: Fast syscalls via sysenter
On 06/23/2012 08:58 PM, Konstantin Belousov wrote:
> On Sat, Jun 23, 2012 at 02:17:53PM +0800, David Xu wrote:
> > I think it might be time to implement a Linux-like vdso syscall now,
> > based on the work kib@ has recently done, though I don't know how to
> > hook it into kib's code. I googled it quickly and found that they put
> > some data into the aux vector:
> >
> > http://www.trilithium.com/johan/2005/08/linux-gate/
> > http://www.takatan.net/lxr/source/arch/um/os-Linux/elf_aux.c?a=x86_64#L40
>
> Yes, the intent is to eventually switch to VDSO from the current
> situation where libc is aware of the shared page content.
>
> This was extensively discussed in the flame that resulted in me writing
> the current gettimeofday(2) patch. It was on arch@ several weeks ago,
> AFAIR. The committed gettimeofday() code structure allows for VDSO
> interposing without breaking normal symbol visibility rules.
>
> I do not see the sense in implementing syscall or sysenter support for
> the i386 kernel. On the other hand, using syscall for 32bit binaries on
> amd64 looks reasonable.

Sorry, I was not able to write for some time.

So, what about implementing VDSO now? I know there was a patch and a
feature request:
http://lists.freebsd.org/pipermail/freebsd-bugs/2010-April/039597.html

About sysenter: I have ported the sysenter patch to 9.0-RELEASE-p4, and it
looks fine. I made some fixes in SYS.h. The reason (if I understand it
right) is that ld-elf.so has to be an ELF object without DT_TEXTREL.
You can find the patch here:

https://redmine.sportcomitet.org/projects/dev-freebsd/repository/revisions/master/raw/sysenter.patch
https://redmine.sportcomitet.org/projects/dev-freebsd/repository/revisions/master/raw/sys/i386/i386/sysenter.s

However, this patch currently breaks compatibility with the i386 XEN PV
kernel. I wanted to fix that, but without VDSO it would be a limited
solution. That is one of the reasons why I am interested in the VDSO
status.

As for using syscall for 32bit binaries on amd64: it is reasonable. But if
we do that, I think we also have to implement VDSO support in the i386
kernel for compatibility, and it would be better to implement sysenter
there too.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Fast syscalls via sysenter
On 2012/06/21 20:11, John Baldwin wrote:
> On Monday, June 18, 2012 2:56:30 pm Daniil Cherednik wrote:
> > I am trying to continue the work started by DavidXu on implementation
> > of fast syscalls via sysenter/sysexit.
> > http://people.freebsd.org/~davidxu/sysenter/kernel/
> > I have ported it to FreeBSD 9. It looks like it works.

[kib@ is cc'ed]

I implemented the sysenter syscall a long time ago; it can indeed reduce
system call overhead on i386. I think it might be time to implement a
Linux-like vdso syscall now, based on the work kib@ has recently done,
though I don't know how to hook it into kib's code. I googled it quickly
and found that they put some data into the aux vector:

http://www.trilithium.com/johan/2005/08/linux-gate/
http://www.takatan.net/lxr/source/arch/um/os-Linux/elf_aux.c?a=x86_64#L40

Regards,
David Xu
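For readers unfamiliar with the aux vector mentioned above: it is a list of
(type, value) pairs that the kernel hands to a new process, terminated by an
entry of type AT_NULL. Linux uses one such entry to tell userland where the
vDSO is mapped. The following is only a minimal sketch in C of walking such
a list, not FreeBSD or Linux code. The names struct auxv_entry, auxv_lookup
and the *_SKETCH constants are hypothetical; the value 33 is the tag Linux
uses for AT_SYSINFO_EHDR, and whether FreeBSD would use the same mechanism
is exactly the open question in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; real code would use <elf.h>. */
#define AT_NULL_SKETCH          0   /* terminator entry of the vector */
#define AT_SYSINFO_EHDR_SKETCH  33  /* Linux tag: address of the vDSO ELF header */

/* One aux vector entry: a tag and its value (layout assumed for the sketch). */
struct auxv_entry {
    uintptr_t a_type;
    uintptr_t a_val;
};

/*
 * Walk the vector until the terminator. Returns 1 and stores the value
 * in *out when an entry with the requested type exists, otherwise 0.
 */
int auxv_lookup(const struct auxv_entry *auxv, uintptr_t type, uintptr_t *out)
{
    assert(auxv != NULL && out != NULL);
    for (; auxv->a_type != AT_NULL_SKETCH; auxv++) {
        if (auxv->a_type == type) {
            *out = auxv->a_val;
            return 1;
        }
    }
    return 0;
}
```

A runtime linker or libc could use a lookup like this at startup to find the
kernel-provided page and bind its syscall entry point from there, instead of
hard-coding int $0x80 or sysenter.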
Re: Fast syscalls via sysenter
On Sat, Jun 23, 2012 at 02:17:53PM +0800, David Xu wrote:
> [kib@ is cc'ed]
>
> I implemented the sysenter syscall a long time ago; it can indeed reduce
> system call overhead on i386. I think it might be time to implement a
> Linux-like vdso syscall now, based on the work kib@ has recently done,
> though I don't know how to hook it into kib's code. I googled it quickly
> and found that they put some data into the aux vector:
>
> http://www.trilithium.com/johan/2005/08/linux-gate/
> http://www.takatan.net/lxr/source/arch/um/os-Linux/elf_aux.c?a=x86_64#L40

Yes, the intent is to eventually switch to VDSO from the current situation
where libc is aware of the shared page content.

This was extensively discussed in the flame that resulted in me writing the
current gettimeofday(2) patch. It was on arch@ several weeks ago, AFAIR.
The committed gettimeofday() code structure allows for VDSO interposing
without breaking normal symbol visibility rules.

I do not see the sense in implementing syscall or sysenter support for the
i386 kernel. On the other hand, using syscall for 32bit binaries on amd64
looks reasonable.
Re: Fast syscalls via sysenter
On Monday, June 18, 2012 2:56:30 pm Daniil Cherednik wrote:
> Hi!
>
> I am trying to continue the work started by DavidXu on implementation of
> fast syscalls via sysenter/sysexit.
> http://people.freebsd.org/~davidxu/sysenter/kernel/
> I have ported it to FreeBSD 9. It looks like it works.
> Unfortunately, I am a beginner in kernel work, so I have some questions:
>
> 1. see http://people.freebsd.org/~davidxu/sysenter/kernel/kernel.patch
>
>	/*
>	 * If %edx was changed, we can not use sysexit, because it
>	 * needs %edx to restore userland %eip.
>	 */
>	if (orig_edx != frame.tf_edx)
>		td->td_pcb->pcb_flags |= PCB_FULLCTX;
>
> Why do we have to do this additional check? In
> http://people.freebsd.org/~davidxu/sysenter/kernel/sysenter.s
> we store %edx to the stack in
>
>	pushl	%edx	/* ring 3 next %eip */
>
> and we restore the register in
>
>	popl	%edx	/* ring 3 %eip */

Some system calls return two return values (pipe(2)) or return a 64-bit
off_t (lseek(2)). Those system calls change %edx's value and need that
changed value to make it out to userland.

> 2. see http://people.freebsd.org/~davidxu/sysenter/kernel/sysenter.s
>
>	movl	PCPU(CURPCB),%esi
>	call	syscall
>
> Why do we do movl PCPU(CURPCB),%esi before calling syscall? syscall is
> just a C function.

No clue on this one, looks like it is not needed.

-- 
John Baldwin
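To make the %edx point concrete: on i386 a 64-bit result comes back with the
low word in %eax and the high word in %edx, while sysexit takes the userland
%eip from %edx. If the syscall wrote a result into the saved %edx, that slot
can no longer double as the return address, so the exit path has to fall
back to a full context restore. Below is a minimal sketch in C of that
reasoning, under stated assumptions: struct frame_sketch, set_retval64 and
exit_flags are hypothetical helpers, and the PCB_FULLCTX value is
illustrative rather than the one from the FreeBSD headers or the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag value only; the real PCB_FULLCTX is defined by the kernel. */
#define PCB_FULLCTX 0x80u

/* Hypothetical cut-down trap frame holding just the two return registers. */
struct frame_sketch {
    uint32_t tf_eax;  /* low word of the result / first return value */
    uint32_t tf_edx;  /* high word of the result / second return value */
};

/* Store a 64-bit result the i386 way: low word in %eax, high word in %edx. */
void set_retval64(struct frame_sketch *f, int64_t v)
{
    assert(f != NULL);
    f->tf_eax = (uint32_t)((uint64_t)v & 0xffffffffu);
    f->tf_edx = (uint32_t)((uint64_t)v >> 32);
}

/*
 * Same comparison as in the quoted patch: if the syscall changed the saved
 * %edx, request the full-context return path instead of sysexit.
 */
unsigned exit_flags(uint32_t orig_edx, const struct frame_sketch *f,
    unsigned flags)
{
    assert(f != NULL);
    if (orig_edx != f->tf_edx)
        flags |= PCB_FULLCTX;
    return flags;
}
```

With orig_edx holding the userland return address, a call like lseek(2) that
returns 0x100000002 leaves tf_edx == 1, so exit_flags() sets PCB_FULLCTX; a
syscall that only touches %eax leaves the flags unchanged and sysexit can be
used.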
Fast syscalls via sysenter
Hi!

I am trying to continue the work started by DavidXu on implementation of
fast syscalls via sysenter/sysexit.
http://people.freebsd.org/~davidxu/sysenter/kernel/
I have ported it to FreeBSD 9. It looks like it works.
Unfortunately, I am a beginner in kernel work, so I have some questions:

1. see http://people.freebsd.org/~davidxu/sysenter/kernel/kernel.patch

	/*
	 * If %edx was changed, we can not use sysexit, because it
	 * needs %edx to restore userland %eip.
	 */
	if (orig_edx != frame.tf_edx)
		td->td_pcb->pcb_flags |= PCB_FULLCTX;

Why do we have to do this additional check? In
http://people.freebsd.org/~davidxu/sysenter/kernel/sysenter.s
we store %edx to the stack in

	pushl	%edx	/* ring 3 next %eip */

and we restore the register in

	popl	%edx	/* ring 3 %eip */

2. see http://people.freebsd.org/~davidxu/sysenter/kernel/sysenter.s

	movl	PCPU(CURPCB),%esi
	call	syscall

Why do we do movl PCPU(CURPCB),%esi before calling syscall? syscall is
just a C function.

-- 
Daniil Cherednik