Re: Proposed change to makesyscalls.sh
I'm still not totally sure if this change is worthwhile. However, it does resolve the issue in kern/45781, and it removes the need for the work-around as currently implemented in src/sys/arch/usermode/modules/syscallemu/syscallemu.c I really would appreciate feedback from others on whether or not this change should be committed. Aside from some very constructive feedback and lots of awk-foo from kre@ I've seen no further feedback. So I'm planning to commit these changes (latest version incorporating kre's feedback is attached) in the next couple of days. Since this changes struct emul, it will require a kernel version bump; if anyone else wants to coordinate commits and ride-the-bump, please let me know! +--+--++ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com | | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org | +--+--++Index: kern/makesyscalls.sh === RCS file: /cvsroot/src/sys/kern/makesyscalls.sh,v retrieving revision 1.169 diff -u -p -r1.169 makesyscalls.sh --- kern/makesyscalls.sh10 May 2017 06:08:56 - 1.169 +++ kern/makesyscalls.sh7 Aug 2018 21:36:07 - @@ -855,9 +855,12 @@ function putent(type, compatwrap) { if (argc != 0) { printf("\n\t\tns(struct %s%s_args),", compatwrap_, funcname) > sysent } - if (modular) + if (modular) { wfn = "sys_nomodule"; - else if (compatwrap == "") + idx = int(syscall / 32); + bit = 2 ^ (syscall % 32); + nomodbits[ idx ] += bit; + } else if (compatwrap == "") wfn = funcname; else wfn = compatwrap "(" funcname ")"; @@ -1126,6 +1129,13 @@ END { } maxsyscall = syscall + + # XXX + # XXX The following comparisons with nsysent will produce + # XXX unexpected results if (for example) syscall has a + # XXX value of 900 and nsysent has a value of "1024". We + # XXX probably ought to use int(nsysent) here... + # XXX if (nsysent) { if (syscall > nsysent) { printf("%s: line %d: too many syscalls [%d > %d]\n", infile, NR, syscall, nsysent) @@ -1142,6 +1152,16 @@ END { } } printf("};\n") > sysent + printf("\nconst uint32_t %s_nomodbits[] = {\n", switchname) > sysent + printf("};\n") > rumpsysent + printf("\nconst uint32_t rump_sysent_nomodbits[] = {\n") > rumpsysent + for (i = 0; i < syscall; i += 32) { + printf("\t0x%08x,\t /* syscalls %3d-%3d */\n", + nomodbits[i / 32], i, i + 31) > sysent + printf("\t0x%08x,\t /* syscalls %3d-%3d */\n", + nomodbits[i / 32], i, i + 31) > rumpsysent + } + printf("};\n") > sysent printf("};\n") > rumpsysent printf("CTASSERT(__arraycount(rump_sysent) == SYS_NSYSENT);\n") > rumpsysent printf("__strong_alias(rumpns_sysent,rump_sysent);\n") > rumpsysent
Re: Proposed change to makesyscalls.sh
| My initial pass at this was to maintain the bit vector at run-time | rather than having makesyscalls.sh calculate the value. The cost to | set the bits in syscall_establish() is not very large. EIther way, but that makes the kernel fractionally bigger for no real benefit - if it can be done at compile time, let it be. Yeah, I knew that! The attached patch calculates the bit vector at "regen" time. So we don't need to maintain the bit vector at run time, and we need only test the bit vector in one place - syscall_disestablish(). | The only drawback here is to add a new entry to struct emul to point | at the bit vector, Sure. | and to initialize it in every place that the entrypoint array is | initialized. Whenever e_sysent is set, so shall be the new one. How many places can there be? About 20 of them! :) Each of the compat- modules has its own struct emul. The attached patch includes all of the required changes. Note that after making these changes, we'll need to regenerate the various syscall files for each emul. The attachment also lists these files. | FWIW, what would you suggest as the name of a new struct emul member | (in sys/proc.h)? Naming ... not my strong suit... e_sc_nomodbits ? I ended up using e_nomodbits[] but it would be quite simple to change if anyone has a strong preference. | I'm beginning to think that this change really isn't worthwhile. I'm still not totally sure if this change is worthwhile. However, it does resolve the issue in kern/45781, and it removes the need for the work-around as currently implemented in src/sys/arch/usermode/modules/syscallemu/syscallemu.c I really would appreciate feedback from others on whether or not this change should be committed. +--+--++ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com | | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org | +--+--++Index: arch/i386/i386/linux_syscall.c === RCS file: /cvsroot/src/sys/arch/i386/i386/linux_syscall.c,v retrieving revision 1.53 diff -u -p -r1.53 linux_syscall.c --- arch/i386/i386/linux_syscall.c 12 Aug 2017 07:21:57 - 1.53 +++ arch/i386/i386/linux_syscall.c 6 Aug 2018 12:02:16 - @@ -53,6 +53,7 @@ __KERNEL_RCSID(0, "$NetBSD: linux_syscal static void linux_syscall(struct trapframe *); extern struct sysent linux_sysent[]; +extern const uint32_t linux_sysent_nomodbits[]; void linux_syscall_intern(struct proc *p) Index: compat/aoutm68k/aoutm68k_exec.c === RCS file: /cvsroot/src/sys/compat/aoutm68k/aoutm68k_exec.c,v retrieving revision 1.29 diff -u -p -r1.29 aoutm68k_exec.c --- compat/aoutm68k/aoutm68k_exec.c 6 May 2018 13:40:50 - 1.29 +++ compat/aoutm68k/aoutm68k_exec.c 6 Aug 2018 12:02:17 - @@ -49,6 +49,7 @@ __KERNEL_RCSID(0, "$NetBSD: aoutm68k_exe #include extern struct sysent aoutm68k_sysent[]; +extern const uint32_t aoutm68k_sysent_nomodbits[]; extern char sigcode[], esigcode[]; void aoutm68k_syscall_intern(struct proc *); @@ -64,6 +65,7 @@ struct emul emul_netbsd_aoutm68k = { .e_nsysent =AOUTM68K_SYS_NSYSENT, #endif .e_sysent = aoutm68k_sysent, + .e_nomodbits = aoutm68k_sysent_nomodbits, #ifdef SYSCALL_DEBUG .e_syscallnames = syscallnames, #endif Index: compat/freebsd/freebsd_exec.c === RCS file: /cvsroot/src/sys/compat/freebsd/freebsd_exec.c,v retrieving revision 1.41 diff -u -p -r1.41 freebsd_exec.c --- compat/freebsd/freebsd_exec.c 6 May 2018 13:40:50 - 1.41 +++ compat/freebsd/freebsd_exec.c 6 Aug 2018 12:02:17 - @@ -54,6 +54,7 @@ __KERNEL_RCSID(0, "$NetBSD: freebsd_exec #include extern struct sysent freebsd_sysent[]; +extern const uint32_t freebsd_sysent_nomodbits[]; extern const char * const freebsd_syscallnames[]; struct uvm_object *emul_freebsd_object; @@ -72,6 +73,7 @@ struct emul emul_freebsd = { .e_nsysent =FREEBSD_SYS_NSYSENT, #endif .e_sysent = freebsd_sysent, + .e_nomodbits = freebsd_sysent_nomodbits, #ifdef SYSCALL_DEBUG .e_syscallnames = freebsd_syscallnames, #else Index: compat/ibcs2/ibcs2_exec.c === RCS file: /cvsroot/src/sys/compat/ibcs2/ibcs2_exec.c,v retrieving revision 1.78 diff -u -p -r1.78 ibcs2_exec.c --- compat/ibcs2/ibcs2_exec.c 6 May 2018 13:40:51 - 1.78 +++ compat/ibcs2/ibcs2_exec.c 6 Aug 2018 12:02:17 - @@ -64,6 +64,7 @@ __KERNEL_RCSID(0, "$NetBSD:
Re: Proposed change to makesyscalls.sh
On Mon 06 Aug 2018 at 05:07:03 +0800, Paul Goyette wrote: > Well, as I indicated before, it's not really essential to restore the > original value. If we blindly reset to sys_nomodule the only down-side > is the attempt to find an auto-loadable module on subsequent calls for > the "wrongly-tagged" syscall. sys_nomodule() could maybe even put back sys_nosys in the table in that case, for later callers. If it is certain it can't be a module. -Olaf. -- ___ Olaf 'Rhialto' Seibert -- Wayland: Those who don't understand X \X/ rhialto/at/falu.nl -- are condemned to reinvent it. Poorly. signature.asc Description: PGP signature
Re: Proposed change to makesyscalls.sh
Date:Mon, 6 Aug 2018 07:06:42 +0800 (+08) From:Paul Goyette Message-ID: | I looked into using bitstring(3) operations, but since that stuff "lives | in" /usr/include/bitstring.h it wasn't clear if using it in kernel code | would be appropriate. Doesn't matter much either way, when all you need is test (plus set, if you're doing it dynamically, rather than pre-prepared) and you don't need all the other crap that is in there, then the ops are trivial and can just be open coded if using that seems like a poor idea -- which I think it would be, the kernel should be self-contained and use only what is in /usr/src/sys (plus the common stuff)). | My initial pass at this was to maintain the bit vector at run-time | rather than having makesyscalls.sh calculate the value. The cost to | set the bits in syscall_establish() is not very large. EIther way, but that makes the kernel fractionally bigger for no real benefit - if it can be done at compile time, let it be. | The only drawback here is to add a new entry to struct emul to point | at the bit vector, Sure. | and to initialize it in every place that the entrypoint array is initialized. Whenever e_sysent is set, so shall be the new one. How many places can there be? | FWIW, what would you suggest as the name of a new struct emul member | (in sys/proc.h)? Naming ... not my strong suit... e_sc_nomodbits ? | I'm beginning to think that this change really isn't worthwhile. Not my decision. Just suggesting a possibility. As you know (I am sure) I never use modules (for anything at all, so every time someone suggests dtrace as useful for anything my skin crawls) so it makes no difference to me. kre
Re: Proposed change to makesyscalls.sh
On Mon, 6 Aug 2018, Robert Elz wrote: Date:Mon, 6 Aug 2018 05:07:03 +0800 (+08) From:Paul Goyette Message-ID: | Or we could just leave things alone, and tolerate the "hack" that is | currently being used. Or there could just be a new 1 bit/syscall const data struct that says whether the original (as compiled originally) syscall entry was sys_nomodule or something different. If the bit is set, put sys_nomodule back, otherwise sys_nosys (those are the only possible values, right). Yes, those are the only two values, as far as I can see. I can't imagine anything else making sense. I looked into using bitstring(3) operations, but since that stuff "lives in" /usr/include/bitstring.h it wasn't clear if using it in kernel code would be appropriate. makesyscalls.sh could easily make this new big vector, which would only be used as part of unloading a syscall module, so need not be particularly effecient (avoid big/little endian issues by making it an array of u_char if that matters, for example). My initial pass at this was to maintain the bit vector at run-time rather than having makesyscalls.sh calculate the value. The cost to set the bits in syscall_establish() is not very large. The only drawback here is to add a new entry to struct emul to point at the bit vector, and to initialize it in every place that the entrypoint array is initialized. Not a big deal, but just a little bit intrusive. FWIW, what would you suggest as the name of a new struct emul member (in sys/proc.h)? struct emul { ... struct sysent *e_sysent; /* System call array */ bitstr_t*e_??? ; /* nomodule/nosys flag for * syscall_disestablish() */ ... }; I'm beginning to think that this change really isn't worthwhile. By simply allowing syscall_establish() to modify both types of entry, we accomplish the desired goal, with only the minor impact of unneeded searches of the autoload list, IFF the syscall has been disestablished AND subsequently invoked. That invocation would fail anyway, so the small amount of extra work for the search should not be significant. +--+--++ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com | | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org | +--+--++
Re: Proposed change to makesyscalls.sh
Date:Mon, 6 Aug 2018 05:07:03 +0800 (+08) From:Paul Goyette Message-ID: | Or we could just leave things alone, and tolerate the "hack" that is | currently being used. Or there could just be a new 1 bit/syscall const data struct that says whether the original (as compiled originally) syscall entry was sys_nomodule or something different. If the bit is set, put sys_nomodule back, otherwise sys_nosys (those are the only possible values, right). makesyscalls.sh could easily make this new big vector, which would only be used as part of unloading a syscall module, so need not be particularly effecient (avoid big/little endian issues by making it an array of u_char if that matters, for example). kre
Re: Proposed change to makesyscalls.sh
On Sun, Aug 05, 2018 at 09:09:54PM +0800, Paul Goyette wrote: > We could change syscall_establish() to install for both sys_nomodule or > sys_nosys entry points. But then we'd need to remember which value to > restore when syscall_disestablish() is called. Change the API and make syscall_establish() return the previous handler as a token, and pass that to syscall_disestablish() ? Then maybe we should check the validity of the token in syscall_disestablish(), but we are deep in "shoot your own foot" teritory here. Martin
Re: Proposed change to makesyscalls.sh
On Sun, 5 Aug 2018, Martin Husemann wrote: On Sun, Aug 05, 2018 at 06:24:25AM +0800, Paul Goyette wrote: The primary usage for this is for arch/usermode's syscallemu() which (according to those working on arch/usermode) could be considered "dangerous" and "should never be autoloaded. Having this change will allow use of a "registered" syscall number rather than having to have a work-around for PR kern/45781. Doesn't (say) 211 work? Then just mark it as reserved in syscalls.master in a comment, like: ; ; Syscalls 210-219 are reserved for dynamically loaded syscalls ; 210 EXTERN MODULAR openafs { int|sys||afssys(long id, long a1, long a2, \ long a3, long a4, long a5, long a6); } 211 UNIMPL /* syscallemu for sys/arch/usermode/modules/syscallemu */ 212 UNIMPL 213 UNIMPL 214 UNIMPL That would work, but only with the "work-around" for 45781. Otherwise, syscall_establish() will only install entries with sys_nomodule as the entrypoint. As far as I understand, UNIMPL generates entries with an entry point of sys_nosys - hence the work-around. Has anyone analyzed PR 45781? Is it just about extending the valid range or does it break for this numbers too? It's not just the range, it's the entrypoint that triggers the autoload stuff. (If the syscall number exceed the limit SYS_NSYSENT, which controls the size of various allocated arrays, there's not much you can do...) We could change syscall_establish() to install for both sys_nomodule or sys_nosys entry points. But then we'd need to remember which value to restore when syscall_disestablish() is called. Hmmm, looking closer at the code, we probably could ignore that issue. If we blindly restore a value of sys_nomodule in syscall_disestablish() the only thing that happens is that any subsequent attempts to use that syscall will go through the search of the syscall_autoload[] list to find one that matches the syscall number. If no entry is found, then we'll simply return sys_nosys() after all. OK, so the original proposal isn't necessary. Here's a proposal to allow syscall_establish() to install new syscalls for either sys_nosys or sys_nomodule entry points: Index: kern_syscall.c === RCS file: /cvsroot/src/sys/kern/kern_syscall.c,v retrieving revision 1.16 diff -u -p -r1.16 kern_syscall.c --- kern_syscall.c 24 Mar 2017 17:40:44 - 1.16 +++ kern_syscall.c 5 Aug 2018 13:06:41 - @@ -123,7 +123,8 @@ syscall_establish(const struct emul *em, * on error. */ for (i = 0; sp[i].sp_call != NULL; i++) { - if (sy[sp[i].sp_code].sy_call != sys_nomodule) { + if (sy[sp[i].sp_code].sy_call != sys_nomodule && + sy[sp[i].sp_code].sy_call != sys_nosys) { #ifdef DIAGNOSTIC printf("syscall %d is busy\n", sp[i].sp_code); #endif This avoids the need for the "work-around" for syscallemu... +--+--++ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com | | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org | +--+--++
Re: Proposed change to makesyscalls.sh
On Sun, Aug 05, 2018 at 06:24:25AM +0800, Paul Goyette wrote: > The primary usage for this is for arch/usermode's syscallemu() which > (according to those working on arch/usermode) could be considered > "dangerous" and "should never be autoloaded. Having this change will > allow use of a "registered" syscall number rather than having to have > a work-around for PR kern/45781. Doesn't (say) 211 work? Then just mark it as reserved in syscalls.master in a comment, like: ; ; Syscalls 210-219 are reserved for dynamically loaded syscalls ; 210 EXTERN MODULAR openafs { int|sys||afssys(long id, long a1, long a2, \ long a3, long a4, long a5, long a6); } 211 UNIMPL /* syscallemu for sys/arch/usermode/modules/syscallemu */ 212 UNIMPL 213 UNIMPL 214 UNIMPL Has anyone analyzed PR 45781? Is it just about extending the valid range or does it break for this numbers too? Martin
Proposed change to makesyscalls.sh
I'd like to propose the following change to makesyscalls.sh to enable specifying a sys-call which _can_ be provided by a module, but only if the module is manually loaded. For these sys-calls, no entry is made in the autoload list, so the module will not be autoloaded. The primary usage for this is for arch/usermode's syscallemu() which (according to those working on arch/usermode) could be considered "dangerous" and "should never be autoloaded. Having this change will allow use of a "registered" syscall number rather than having to have a work-around for PR kern/45781. Side note: the existing work-around is incomplete, as it does not correctly restore the unregistered syscall's entry to sys_nomodule() after the module's MODCMD_FINI routine calls syscall_disestablish(). Comments? Index: syscalls.master === RCS file: /cvsroot/src/sys/kern/syscalls.master,v retrieving revision 1.293 diff -u -p -r1.293 syscalls.master --- syscalls.master 31 Jul 2018 13:00:13 - 1.293 +++ syscalls.master 4 Aug 2018 22:23:51 - @@ -13,6 +13,7 @@ ; Optional fields are specified after the type field ; (NOTE! they *must* be specified in this order): ; MODULAR modname :attempt to autoload system call module if not present +;orMODULAR NONE:optional system call which must be manually loaded ; RUMP: generate rump syscall entry point ; ; types: Index: makesyscalls.sh === RCS file: /cvsroot/src/sys/kern/makesyscalls.sh,v retrieving revision 1.169 diff -u -p -r1.169 makesyscalls.sh --- makesyscalls.sh 10 May 2017 06:08:56 - 1.169 +++ makesyscalls.sh 4 Aug 2018 22:23:51 - @@ -694,12 +694,13 @@ function printproto(wrap) { syscall) > sysnumhdr # output entry for syscall autoload table, if modular - if (modular ) { - printf("\t{ %s%s%s, \"%s\" },\n", constprefix, wrap, - funcalias, modname) > sysautoload + if (modular) { + if (modname != "NONE") { + printf("\t{ %s%s%s, \"%s\" },\n", constprefix, + wrap, funcalias, modname) > sysautoload + } } - # rumpalooza if (!rumpable) return +--+--++ | Paul Goyette | PGP Key fingerprint: | E-mail addresses: | | (Retired)| FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com | | Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org | +--+--++