Re: [PATCH] vmm: Add MSRs to readregs / writeregs
On Sat, Apr 29, 2017 at 05:24:22PM -0700, Pratik Vyas wrote: > Hello tech@, > > This is a patch that extends the readregs and writeregs vmm(4) ioctl to > read and write MSRs as well. > > It also sets the IA32_VMX_IA32E_MODE_GUEST entry control in > vcpu_reset_regs based on the value of EFER_LMA. > > There are changes to vmmvar.h and would require a `make includes` step > to build vmd(8). This should also not alter any behaviour of vmd(8). > > Context: These are the kernel changes required to implement vmctl send > and vmctl receive. vmctl send / receive are two new options that will > support snapshotting VMs and migrating VMs from one host to another. > This project was undertaken at San Jose State University along with my > three teammates CCed with mlarkin@ as our advisor. > > Patches to vmd(8) that implement vmctl send and vmctl receive are > forthcoming. > > > Thanks, > Pratik > committed, thanks. > > > Index: sys/arch/amd64/amd64/vmm.c > === > RCS file: /home/pdvyas/cvs/src/sys/arch/amd64/amd64/vmm.c,v > retrieving revision 1.137 > diff -u -p -a -u -r1.137 vmm.c > --- sys/arch/amd64/amd64/vmm.c28 Apr 2017 10:09:37 - 1.137 > +++ sys/arch/amd64/amd64/vmm.c30 Apr 2017 00:01:21 - > @@ -1334,7 +1334,9 @@ vcpu_readregs_vmx(struct vcpu *vcpu, uin > uint64_t sel, limit, ar; > uint64_t *gprs = vrs->vrs_gprs; > uint64_t *crs = vrs->vrs_crs; > + uint64_t *msrs = vrs->vrs_msrs; > struct vcpu_segment_info *sregs = vrs->vrs_sregs; > + struct vmx_msr_store *msr_store; > > if (vcpu_reload_vmcs_vmx(&vcpu->vc_control_pa)) > return (EINVAL); > @@ -1402,6 +1404,14 @@ vcpu_readregs_vmx(struct vcpu *vcpu, uin > goto errout; > } > > + msr_store = (struct vmx_msr_store *)vcpu->vc_vmx_msr_exit_save_va; > + > + if (regmask & VM_RWREGS_MSRS) { > + for (i = 0; i < VCPU_REGS_NMSRS; i++) { > + msrs[i] = msr_store[i].vms_data; > + } > + } > + > goto out; > > errout: > @@ -1448,7 +1458,9 @@ vcpu_writeregs_vmx(struct vcpu *vcpu, ui > uint64_t limit, ar; > uint64_t *gprs = vrs->vrs_gprs; > uint64_t *crs = vrs->vrs_crs; > + uint64_t *msrs = vrs->vrs_msrs; > struct vcpu_segment_info *sregs = vrs->vrs_sregs; > + struct vmx_msr_store *msr_store; > > if (loadvmcs) { > if (vcpu_reload_vmcs_vmx(&vcpu->vc_control_pa)) > @@ -1518,6 +1530,14 @@ vcpu_writeregs_vmx(struct vcpu *vcpu, ui > goto errout; > } > > + msr_store = (struct vmx_msr_store *)vcpu->vc_vmx_msr_exit_save_va; > + > + if (regmask & VM_RWREGS_MSRS) { > + for (i = 0; i < VCPU_REGS_NMSRS; i++) { > + msr_store[i].vms_data = msrs[i]; > + } > + } > + > goto out; > > errout: > @@ -2128,7 +2148,7 @@ vcpu_reset_regs_vmx(struct vcpu *vcpu, s >* IA32_VMX_LOAD_DEBUG_CONTROLS >* IA32_VMX_LOAD_IA32_PERF_GLOBAL_CTRL_ON_ENTRY >*/ > - if (ug == 1) > + if (ug == 1 && !(vrs->vrs_msrs[VCPU_REGS_EFER] & EFER_LMA)) > want1 = 0; > else > want1 = IA32_VMX_IA32E_MODE_GUEST; > @@ -2277,26 +2297,12 @@ vcpu_reset_regs_vmx(struct vcpu *vcpu, s >*/ > msr_store = (struct vmx_msr_store *)vcpu->vc_vmx_msr_exit_save_va; > > - /* > - * Make sure LME is enabled in EFER if restricted guest mode is > - * needed. 
> - */ > - msr_store[0].vms_index = MSR_EFER; > - if (ug == 1) > - msr_store[0].vms_data = 0ULL; /* Initial value */ > - else > - msr_store[0].vms_data = EFER_LME; > - > - msr_store[1].vms_index = MSR_STAR; > - msr_store[1].vms_data = 0ULL; /* Initial value */ > - msr_store[2].vms_index = MSR_LSTAR; > - msr_store[2].vms_data = 0ULL; /* Initial value */ > - msr_store[3].vms_index = MSR_CSTAR; > - msr_store[3].vms_data = 0ULL; /* Initial value */ > - msr_store[4].vms_index = MSR_SFMASK; > - msr_store[4].vms_data = 0ULL; /* Initial value */ > - msr_store[5].vms_index = MSR_KERNELGSBASE; > - msr_store[5].vms_data = 0ULL; /* Initial value */ > + msr_store[VCPU_REGS_EFER].vms_index = MSR_EFER; > + msr_store[VCPU_REGS_STAR].vms_index = MSR_STAR; > + msr_store[VCPU_REGS_LSTAR].vms_index = MSR_LSTAR; > + msr_store[VCPU_REGS_CSTAR].vms_index = MSR_CSTAR; > + msr_store[VCPU_REGS_SFMASK].vms_index = MSR_SFMASK; > + msr_store[VCPU_REGS_KGSBASE].vms_index = MSR_KERNELGSBASE; > > /* >* Currently we have the same count of entry/exit MSRs loads/stores > @@ -2359,6 +2365,13 @@ vcpu_reset_regs_vmx(struct vcpu *vcpu, s > ret = vcpu_writeregs_vmx(vcpu, VM_RWREGS_ALL, 0, vrs); > > /* > + * Make sure LME is enabled in EFER if restricted guest mode is > + * ne
Re: explicit_bzero after readpassphrase
> On Mon, May 01, 2017 at 04:07:27PM -0600, Theo de Raadt wrote:
> >
> > Let me stop here and ask if the pattern is: "always explicit_bzero
> > a password field once it is used"? It might make sense, but some
> > of these are heading straight to exit immediately. Is it too much
> > to do it then, or is the worry these code patterns might get copied
> > elsewhere?
> >
>
> I would fall on the side of "It could get copied elsewhere or hoisted
> for other reasons (like pledge)" so do it anyway.

OK, the argument it could get copied into another program, where it is
nowhere near a terminal path.. makes sense.

So then all of them should get it. It is simply a safer pattern.
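
A minimal sketch of the pattern being settled on here, for reference;
check_password() and its hash argument are hypothetical, while
readpassphrase(3), crypt_checkpass(3) and explicit_bzero(3) are the
interfaces used in the diffs further down in this thread. Clearing the
whole pbuf buffer (rather than strlen(clear) bytes of it) also answers
the question raised below about wiping pbuf directly.

/*
 * Illustrative sketch only: "always explicit_bzero a password field
 * once it is used".  check_password() and hash are hypothetical.
 */
#include <readpassphrase.h>
#include <string.h>
#include <unistd.h>

static int
check_password(const char *hash)
{
	char pbuf[1024];
	char *clear;
	int ok = 0;

	clear = readpassphrase("Password:", pbuf, sizeof(pbuf), RPP_ECHO_OFF);
	if (clear != NULL && crypt_checkpass(clear, hash) == 0)
		ok = 1;
	/* Wipe the whole buffer, not just up to the NUL, before returning. */
	explicit_bzero(pbuf, sizeof(pbuf));
	return ok;
}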
Re: explicit_bzero after readpassphrase
On Mon, May 01, 2017 at 04:07:27PM -0600, Theo de Raadt wrote:
>
> Let me stop here and ask if the pattern is: "always explicit_bzero
> a password field once it is used"? It might make sense, but some
> of these are heading straight to exit immediately. Is it too much
> to do it then, or is the worry these code patterns might get copied
> elsewhere?
>

I would fall on the side of "It could get copied elsewhere or hoisted
for other reasons (like pledge)" so do it anyway.
Re: arm64 lock: no userland progress, several procs in wchan "vp"
On Mon, May 01, 2017 at 10:18:58PM +0200, Mark Kettenis wrote: > > Date: Mon, 1 May 2017 20:58:29 +0100 > > From: Stuart Henderson > > > > Userland is non-responsive, machine is pingable, tcp connections open > > but no banner from ssh. No failed pool requests. This kernel is from > > today's snapshot but I saw the same with one from a couple of days > > ago. Is there anything else I can get that might be useful? > > > > ddb> ps > >PID TID PPIDUID S FLAGS WAIT COMMAND > > 99554 86967 57409 55 3 0x2 vpcc > > 57409 23557 97377 55 30x82 wait cc > > 97377 51254 49407 55 30x10008a pause sh > > 71034 186155 65198 0 30x11 vpperl > > 49407 183801 58608 55 30x82 wait gmake > > 58608 251568 90720 55 30x10008a pause sh > > 90720 294385 26849 55 30x82 wait gmake > > 26849 434857 31480 55 30x100088 pause sh > > 31480 479316 1945 55 30x10008a pause sh > > 1945 53261 1392 55 30x82 wait gmake > > 1392 297593 28991 55 30x100088 pause sh > > 28991 101756 11650 55 30x10008a pause sh > > 11650 273060 70062 55 30x82 wait gmake > > 70062 380995 21324 55 30x82 wait gmake > > 21324 380494 20357 55 30x10008a pause make > > 20357 495141 79040 55 30x10008a pause sh > > 79040 411698 40069 55 30x10008a pause make > > 40069 407214 61289 55 30x10008a pause sh > > 61289 440156 65198 55 30x10008a pause make > > 16484 143829 63578 55 30x82 nanosleep perl > > 63578 247597 69857 55 30x10008a pause sh > > 698573708 28018 55 30x10008a pause make > > 28018 161747 1 55 30x10008a pause sh > > 78305 185109 40308 1000 30x100083 ttyin ksh > > 65198 454438 69872 0 30x93 wait perl > > 69872 91535 40308 1000 30x10008b pause ksh > > 40308 108204 1 1000 30x100080 kqreadtmux > > 72632 510504 69073 1000 30x100083 kqreadtmux > > 69073 166246 39096 1000 30x10008b pause ksh > > 39096 474432 39165 1000 30x10 vpsshd > > 39165 380864 95218 0 30x92 poll sshd > > 19837 75515 1 0 30x13 vpgetty > > 61 140725 1 0 30x100010 vpcron > > 33247 144573 1110 30x100090 poll sndiod > > 85245 294054 1 99 30x100090 poll sndiod > > 20071 339430 77361 95 30x100092 kqreadsmtpd > > 31714 216717 77361103 30x100092 kqreadsmtpd > > 38145 373966 77361 95 30x100092 kqreadsmtpd > > 73235 449750 77361 95 30x100092 kqreadsmtpd > > 52512 523411 77361 95 30x100092 kqreadsmtpd > > 25217 17706 77361 95 30x100092 kqreadsmtpd > > 77361 512649 1 0 30x100080 kqreadsmtpd > > 95218 352524 1 0 30x80 selectsshd > > 28640 338771 0 0 3 0x14280 nfsidlnfsio > > 30707 131410 0 0 3 0x14280 nfsidlnfsio > > 26109 142203 0 0 3 0x14280 nfsidlnfsio > > 61054 453416 0 0 3 0x14280 nfsidlnfsio > > 20679 124381 1 0 30x80 poll rpc.statd > > 75142 494960 1 28 30x100090 poll portmap > > 13394 497677 1 0 30x10 vpntpd > > 56991 117256 27035 83 30x100092 poll ntpd > > 27035 498377 1 83 30x100092 poll ntpd > > 19071 360785 2016 74 30x100090 bpf pflogd > > 2016 326372 1 0 30x80 netio pflogd > > 75485 263260 29155 73 30x100090 kqreadsyslogd > > 29155 379800 1 0 30x100082 netio syslogd > > 9314 271265 1 77 30x100090 poll dhclient > > 77002 87 1 0 30x80 poll dhclient > > 4332 479844 1 0 30x80 mfsidlmount_mfs > > 58334 330646 0 0 3 0x14200 pgzerozerothread > > 15557 331142 0 0 3 0x14200 aiodoned aiodoned > > 34557 432814 0 0 3 0x14200 syncerupdate > > 82663 208419 0 0 3 0x14200 cleaner cleaner > > 51853 347618 0 0 3 0x14200 reaperreaper > > 18753 499821 0
Re: explicit_bzero after readpassphrase
> Index: sbin/init/init.c
> ===
> RCS file: /cvs/src/sbin/init/init.c,v
> retrieving revision 1.63
> diff -u -p -u -r1.63 init.c
> --- sbin/init/init.c	2 Mar 2017 10:38:09 -	1.63
> +++ sbin/init/init.c	4 Apr 2017 08:50:53 -
> @@ -561,12 +561,13 @@ f_single_user(void)
> 		write(STDERR_FILENO, banner, sizeof banner - 1);
> 		for (;;) {
> 			int ok = 0;
> -			clear = readpassphrase("Password:", pbuf,
> sizeof(pbuf), RPP_ECHO_OFF);
> +			clear = readpassphrase("Password:", pbuf,
> +			    sizeof(pbuf), RPP_ECHO_OFF);
> 			if (clear == NULL || *clear == '\0')
> 				_exit(0);
> 			if (crypt_checkpass(clear, pp->pw_passwd) == 0)
> 				ok = 1;
> -			memset(clear, 0, strlen(clear));
> +			explicit_bzero(clear, strlen(clear));
> 			if (ok)
> 				break;
> 			warning("single-user login failed\n");

I wonder whether explicit_bzero of pbuf would be better.

> Index: usr.bin/encrypt/encrypt.c
> ===
> RCS file: /cvs/src/usr.bin/encrypt/encrypt.c,v
> retrieving revision 1.45
> diff -u -p -u -r1.45 encrypt.c
> --- usr.bin/encrypt/encrypt.c	4 Sep 2016 15:36:13 -	1.45
> +++ usr.bin/encrypt/encrypt.c	4 Apr 2017 08:51:00 -
> @@ -134,6 +134,7 @@ main(int argc, char **argv)
> 			err(1, "readpassphrase");
> 		print_passwd(string, operation, extra);
> 		(void)fputc('\n', stdout);
> +		explicit_bzero(string, sizeof(string));
> 	} else {
> 		size_t len;
> 		/* Encrypt stdin to stdout. */

It is heading straight to return (0) from main(). On the edge of
useless, because if you have a bug in logic that simple..

Let me stop here and ask if the pattern is: "always explicit_bzero
a password field once it is used"? It might make sense, but some
of these are heading straight to exit immediately. Is it too much
to do it then, or is the worry these code patterns might get copied
elsewhere?
sxipio(4) pinctrl binding changes
So the Linux developers decided to change the bindings for the sunxi pinctrl devices. You know, it's supposed to a stable ABI... This diff updates the sxipio(4) driver to support the new binding in addition to the old binding. There is a subtle change in that it now leaves the current configuration of pull up/down and drive strength alone if there is no explicit configuration information. The Linux driver code is pretty incomprehensible, but I think that's what we're supposed to do. Tested on my Banana Pi with a dtb generated from the current Linux sources. The pin configurations change a bit, but all the changes make sense if I compare the old and the new dtb. ok? Index: sxipio.c === RCS file: /cvs/src/sys/dev/fdt/sxipio.c,v retrieving revision 1.1 diff -u -p -r1.1 sxipio.c --- sxipio.c21 Jan 2017 08:26:49 - 1.1 +++ sxipio.c1 May 2017 20:48:50 - @@ -206,6 +206,36 @@ sxipio_attach(struct device *parent, str printf(": %d pins\n", sc->sc_npins); } +int +sxipio_drive(int node) +{ + int drive; + + drive = OF_getpropint(node, "allwinner,drive", -1); + if (drive >= 0) + return drive; + drive = OF_getpropint(node, "drive-strength", 0) - 10; + if (drive >= 0) + return (drive / 10); + return -1; +} + +int +sxipio_pull(int node) +{ + int pull; + + pull = OF_getpropint(node, "allwinner,pull", -1); + if (pull >= 0) + return pull; + if (OF_getproplen(node, "bias-disable") == 0) + return 0; + if (OF_getproplen(node, "bias-pull-up") == 0) + return 1; + if (OF_getproplen(node, "bias-pull-down") == 0) + return 2; + return -1; +} int sxipio_pinctrl(uint32_t phandle, void *cookie) @@ -213,7 +243,7 @@ sxipio_pinctrl(uint32_t phandle, void *c struct sxipio_softc *sc = cookie; char func[32]; char *names, *name; - int port, pin, off; + int port, pin, off, mask; int mux, drive, pull; int node; int len; @@ -225,18 +255,25 @@ sxipio_pinctrl(uint32_t phandle, void *c return -1; len = OF_getprop(node, "allwinner,function", func, sizeof(func)); - if (len <= 0 || len >= sizeof(func)) - return -1; + if (len <= 0 || len >= sizeof(func)) { + len = OF_getprop(node, "function", func, sizeof(func)); + if (len <= 0 || len >= sizeof(func)) + return -1; + } len = OF_getproplen(node, "allwinner,pins"); - if (len <= 0) - return -1; + if (len <= 0) { + len = OF_getproplen(node, "pins"); + if (len <= 0) + return -1; + } names = malloc(len, M_TEMP, M_WAITOK); - OF_getprop(node, "allwinner,pins", names, len); + if (OF_getprop(node, "allwinner,pins", names, len) <= 0) + OF_getprop(node, "pins", names, len); - drive = OF_getpropint(node, "allwinner,drive", 0); - pull = OF_getpropint(node, "allwinner,pull", 0); + drive = sxipio_drive(node); + pull = sxipio_pull(node); name = names; while (len > 0) { @@ -261,11 +298,13 @@ sxipio_pinctrl(uint32_t phandle, void *c mux = sc->sc_pins[i].funcs[j].mux; s = splhigh(); - off = (pin & 0x7) << 2; - SXICMS4(sc, SXIPIO_CFG(port, pin), 0x7 << off, mux << off); - off = (pin & 0xf) << 1; - SXICMS4(sc, SXIPIO_DRV(port, pin), 0x3 << off, drive << off); - SXICMS4(sc, SXIPIO_PUL(port, pin), 0x3 << off, pull << off); + off = (pin & 0x7) << 2, mask = (0x7 << off); + SXICMS4(sc, SXIPIO_CFG(port, pin), mask, mux << off); + off = (pin & 0xf) << 1, mask = (0x3 << off); + if (drive >= 0 && drive < 4) + SXICMS4(sc, SXIPIO_DRV(port, pin), mask, drive << off); + if (pull >= 0 && pull < 3) + SXICMS4(sc, SXIPIO_PUL(port, pin), mask, pull << off); splx(s); len -= strlen(name) + 1;
explicit_bzero after readpassphrase
Hi tech@, After we are done with sensitive data (such as passwords) on readpassphrase(3) we should dispose it with explicit_bzero(3), nevertheless some base applications still rely either on bzero(3), memset(3), or something else entirely. Please find a diff below to change it to explicit_bzero(3). At least for init(8) I recall having discussed this with tb@ aeons ago and it's a little bit paranoid since it only occurs during single user, but we should give the example anyway. Thoughts? Am I missing somethings? Most likely yes... Best regards mestre Index: sbin/init/init.c === RCS file: /cvs/src/sbin/init/init.c,v retrieving revision 1.63 diff -u -p -u -r1.63 init.c --- sbin/init/init.c2 Mar 2017 10:38:09 - 1.63 +++ sbin/init/init.c4 Apr 2017 08:50:53 - @@ -561,12 +561,13 @@ f_single_user(void) write(STDERR_FILENO, banner, sizeof banner - 1); for (;;) { int ok = 0; - clear = readpassphrase("Password:", pbuf, sizeof(pbuf), RPP_ECHO_OFF); + clear = readpassphrase("Password:", pbuf, + sizeof(pbuf), RPP_ECHO_OFF); if (clear == NULL || *clear == '\0') _exit(0); if (crypt_checkpass(clear, pp->pw_passwd) == 0) ok = 1; - memset(clear, 0, strlen(clear)); + explicit_bzero(clear, strlen(clear)); if (ok) break; warning("single-user login failed\n"); Index: usr.bin/encrypt/encrypt.c === RCS file: /cvs/src/usr.bin/encrypt/encrypt.c,v retrieving revision 1.45 diff -u -p -u -r1.45 encrypt.c --- usr.bin/encrypt/encrypt.c 4 Sep 2016 15:36:13 - 1.45 +++ usr.bin/encrypt/encrypt.c 4 Apr 2017 08:51:00 - @@ -134,6 +134,7 @@ main(int argc, char **argv) err(1, "readpassphrase"); print_passwd(string, operation, extra); (void)fputc('\n', stdout); + explicit_bzero(string, sizeof(string)); } else { size_t len; /* Encrypt stdin to stdout. */ Index: usr.bin/lock/lock.c === RCS file: /cvs/src/usr.bin/lock/lock.c,v retrieving revision 1.33 diff -u -p -u -r1.33 lock.c --- usr.bin/lock/lock.c 28 May 2016 16:11:10 - 1.33 +++ usr.bin/lock/lock.c 4 Apr 2017 08:51:00 - @@ -162,7 +162,7 @@ main(int argc, char *argv[]) warnx("\apasswords didn't match."); exit(1); } - s[0] = '\0'; + explicit_bzero(s, sizeof(s)); } /* set signal handlers */ @@ -205,10 +205,16 @@ main(int argc, char *argv[]) p = NULL; else p = s; - if (auth_userokay(pw->pw_name, nstyle, "auth-lock", p)) + if (auth_userokay(pw->pw_name, nstyle, "auth-lock", + p)) { + explicit_bzero(s, sizeof(s)); break; - } else if (strcmp(s, s1) == 0) + } + } else if (strcmp(s, s1) == 0) { + explicit_bzero(s, sizeof(s)); + explicit_bzero(s1, sizeof(s1)); break; + } (void)putc('\a', stderr); cnt %= tries; if (++cnt > backoff) { Index: usr.bin/nc/socks.c === RCS file: /cvs/src/usr.bin/nc/socks.c,v retrieving revision 1.24 diff -u -p -u -r1.24 socks.c --- usr.bin/nc/socks.c 27 Jun 2016 14:43:04 - 1.24 +++ usr.bin/nc/socks.c 4 Apr 2017 08:51:01 - @@ -350,10 +350,13 @@ socks_connect(const char *host, const ch proxypass = getproxypass(proxyuser, proxyhost); r = snprintf(buf, sizeof(buf), "%s:%s", proxyuser, proxypass); + explicit_bzero(proxypass, sizeof(proxypass)); if (r == -1 || (size_t)r >= sizeof(buf) || b64_ntop(buf, strlen(buf), resp, - sizeof(resp)) == -1) + sizeof(resp)) == -1) { + explicit_bzero(r, sizeof(r)); errx(1, "Proxy username/password too long"); +
Re: arm64 lock: no userland progress, several procs in wchan "vp"
> Date: Mon, 1 May 2017 20:58:29 +0100 > From: Stuart Henderson > > Userland is non-responsive, machine is pingable, tcp connections open > but no banner from ssh. No failed pool requests. This kernel is from > today's snapshot but I saw the same with one from a couple of days > ago. Is there anything else I can get that might be useful? > > ddb> ps >PID TID PPIDUID S FLAGS WAIT COMMAND > 99554 86967 57409 55 3 0x2 vpcc > 57409 23557 97377 55 30x82 wait cc > 97377 51254 49407 55 30x10008a pause sh > 71034 186155 65198 0 30x11 vpperl > 49407 183801 58608 55 30x82 wait gmake > 58608 251568 90720 55 30x10008a pause sh > 90720 294385 26849 55 30x82 wait gmake > 26849 434857 31480 55 30x100088 pause sh > 31480 479316 1945 55 30x10008a pause sh > 1945 53261 1392 55 30x82 wait gmake > 1392 297593 28991 55 30x100088 pause sh > 28991 101756 11650 55 30x10008a pause sh > 11650 273060 70062 55 30x82 wait gmake > 70062 380995 21324 55 30x82 wait gmake > 21324 380494 20357 55 30x10008a pause make > 20357 495141 79040 55 30x10008a pause sh > 79040 411698 40069 55 30x10008a pause make > 40069 407214 61289 55 30x10008a pause sh > 61289 440156 65198 55 30x10008a pause make > 16484 143829 63578 55 30x82 nanosleep perl > 63578 247597 69857 55 30x10008a pause sh > 698573708 28018 55 30x10008a pause make > 28018 161747 1 55 30x10008a pause sh > 78305 185109 40308 1000 30x100083 ttyin ksh > 65198 454438 69872 0 30x93 wait perl > 69872 91535 40308 1000 30x10008b pause ksh > 40308 108204 1 1000 30x100080 kqreadtmux > 72632 510504 69073 1000 30x100083 kqreadtmux > 69073 166246 39096 1000 30x10008b pause ksh > 39096 474432 39165 1000 30x10 vpsshd > 39165 380864 95218 0 30x92 poll sshd > 19837 75515 1 0 30x13 vpgetty > 61 140725 1 0 30x100010 vpcron > 33247 144573 1110 30x100090 poll sndiod > 85245 294054 1 99 30x100090 poll sndiod > 20071 339430 77361 95 30x100092 kqreadsmtpd > 31714 216717 77361103 30x100092 kqreadsmtpd > 38145 373966 77361 95 30x100092 kqreadsmtpd > 73235 449750 77361 95 30x100092 kqreadsmtpd > 52512 523411 77361 95 30x100092 kqreadsmtpd > 25217 17706 77361 95 30x100092 kqreadsmtpd > 77361 512649 1 0 30x100080 kqreadsmtpd > 95218 352524 1 0 30x80 selectsshd > 28640 338771 0 0 3 0x14280 nfsidlnfsio > 30707 131410 0 0 3 0x14280 nfsidlnfsio > 26109 142203 0 0 3 0x14280 nfsidlnfsio > 61054 453416 0 0 3 0x14280 nfsidlnfsio > 20679 124381 1 0 30x80 poll rpc.statd > 75142 494960 1 28 30x100090 poll portmap > 13394 497677 1 0 30x10 vpntpd > 56991 117256 27035 83 30x100092 poll ntpd > 27035 498377 1 83 30x100092 poll ntpd > 19071 360785 2016 74 30x100090 bpf pflogd > 2016 326372 1 0 30x80 netio pflogd > 75485 263260 29155 73 30x100090 kqreadsyslogd > 29155 379800 1 0 30x100082 netio syslogd > 9314 271265 1 77 30x100090 poll dhclient > 77002 87 1 0 30x80 poll dhclient > 4332 479844 1 0 30x80 mfsidlmount_mfs > 58334 330646 0 0 3 0x14200 pgzerozerothread > 15557 331142 0 0 3 0x14200 aiodoned aiodoned > 34557 432814 0 0 3 0x14200 syncerupdate > 82663 208419 0 0 3 0x14200 cleaner cleaner > 51853 347618 0 0 3 0x14200 reaperreaper > 18753 499821 0 0 3 0x14200 pgdaemon pagedaemon > 59831 415568 0 0 3 0x14200 bored crynlk > 29354 478337 0 0 3 0x14200 bored crypto > 338986377 0 0
Re: [PATCH] vmm: Add MSRs to readregs / writeregs
On Mon, May 01, 2017 at 10:08:28PM +0200, Mark Kettenis wrote: > > Date: Mon, 1 May 2017 12:35:32 -0700 > > From: Mike Larkin > > > > Yes, the list is locked down to those listed in the diff (those are the > > only > > ones in the permissions bitmap anyway). > > > > I don't think there are any bits we should not allow touching via > > this mechanism. The guest can already do that itself (since we > > whitelist these MSRs in the bitmap). If we find there is a bit, say, > > in EFER, that we need to lock down, then we need to do that > > differently (there will be more changes than just this diff). I > > think this can go in and we can address that potential issue later > > (but I don't think there is an issue). > > > > One thing that will also need to be addressed (later) is the > > handling of FPU state, which was recently improved. We can choose to > > either just flush the state and set CR0_TS on the target after > > receive, or try to figure out all the state that needs to be sent > > over, and transfer that. Of course, there's always the possibility > > that we are sending to a machine with less FPU capability (or more) > > than the source, and we'll need to improve checking there as well > > (that probably amounts to transferring %xcr0 and checking that what > > the "sending side" wants is a subset of what the "receiving side" > > can accept). > > I fear that you'll have to make sure that vmd on the new machine masks > off any functionality not offered by the old machine and refuse a > transfer if the capability set of the new machine isn't a strict > superset of the old machine. > Yes, FPU is just the tip of the iceberg here. More stuff for these guys to implement :) > > I'll work with Pratik and the rest of the team to clean that up later. > > > > Other than the reserved bits question, any other thoughts/ ok? > > Nope. Go ahead. > Thanks, I'll commit it tonight. 
-ml > > > > > === > > > > > RCS file: /home/pdvyas/cvs/src/sys/arch/amd64/amd64/vmm.c,v > > > > > retrieving revision 1.137 > > > > > diff -u -p -a -u -r1.137 vmm.c > > > > > --- sys/arch/amd64/amd64/vmm.c28 Apr 2017 10:09:37 - > > > > > 1.137 > > > > > +++ sys/arch/amd64/amd64/vmm.c30 Apr 2017 00:01:21 - > > > > > @@ -1334,7 +1334,9 @@ vcpu_readregs_vmx(struct vcpu *vcpu, uin > > > > > uint64_t sel, limit, ar; > > > > > uint64_t *gprs = vrs->vrs_gprs; > > > > > uint64_t *crs = vrs->vrs_crs; > > > > > + uint64_t *msrs = vrs->vrs_msrs; > > > > > struct vcpu_segment_info *sregs = vrs->vrs_sregs; > > > > > + struct vmx_msr_store *msr_store; > > > > > > > > > > if (vcpu_reload_vmcs_vmx(&vcpu->vc_control_pa)) > > > > > return (EINVAL); > > > > > @@ -1402,6 +1404,14 @@ vcpu_readregs_vmx(struct vcpu *vcpu, uin > > > > > goto errout; > > > > > } > > > > > > > > > > + msr_store = (struct vmx_msr_store > > > > > *)vcpu->vc_vmx_msr_exit_save_va; > > > > > + > > > > > + if (regmask & VM_RWREGS_MSRS) { > > > > > + for (i = 0; i < VCPU_REGS_NMSRS; i++) { > > > > > + msrs[i] = msr_store[i].vms_data; > > > > > + } > > > > > + } > > > > > + > > > > > goto out; > > > > > > > > > > errout: > > > > > @@ -1448,7 +1458,9 @@ vcpu_writeregs_vmx(struct vcpu *vcpu, ui > > > > > uint64_t limit, ar; > > > > > uint64_t *gprs = vrs->vrs_gprs; > > > > > uint64_t *crs = vrs->vrs_crs; > > > > > + uint64_t *msrs = vrs->vrs_msrs; > > > > > struct vcpu_segment_info *sregs = vrs->vrs_sregs; > > > > > + struct vmx_msr_store *msr_store; > > > > > > > > > > if (loadvmcs) { > > > > > if (vcpu_reload_vmcs_vmx(&vcpu->vc_control_pa)) > > > > > @@ -1518,6 +1530,14 @@ vcpu_writeregs_vmx(struct vcpu *vcpu, ui > > > > > goto errout; > > > > > } > > > > > > > > > > + msr_store = (struct vmx_msr_store > > > > > *)vcpu->vc_vmx_msr_exit_save_va; > > > > > + > > > > > + if (regmask & VM_RWREGS_MSRS) { > > > > > + for (i = 0; i < VCPU_REGS_NMSRS; i++) { > > > > > + msr_store[i].vms_data = msrs[i]; > > > > > + } > > > > > + } > > > > > + > > > > > goto out; > > > > > > > > > > errout: > > > > > @@ -2128,7 +2148,7 @@ vcpu_reset_regs_vmx(struct vcpu *vcpu, s > > > > >* IA32_VMX_LOAD_DEBUG_CONTROLS > > > > >* IA32_VMX_LOAD_IA32_PERF_GLOBAL_CTRL_ON_ENTRY > > > > >*/ > > > > > - if (ug == 1) > > > > > + if (ug == 1 && !(vrs->vrs_msrs[VCPU_REGS_EFER] & EFER_LMA)) > > > > > want1 = 0; > > > > > else > > > > > want1 = IA32_VMX_IA32E_MODE_GUEST; > > > > > @@ -2277,26 +2297,12 @@ vcpu_reset_regs_vmx(struct vcpu *vcpu, s > > > > >*/ > > > > > msr_store = (struct vmx_msr_store > > > > > *)vcpu->vc_vmx_msr_exit_save_va; > > > > > > >
Re: [PATCH] vmm: Add MSRs to readregs / writeregs
> Date: Mon, 1 May 2017 12:35:32 -0700 > From: Mike Larkin > > Yes, the list is locked down to those listed in the diff (those are the only > ones in the permissions bitmap anyway). > > I don't think there are any bits we should not allow touching via > this mechanism. The guest can already do that itself (since we > whitelist these MSRs in the bitmap). If we find there is a bit, say, > in EFER, that we need to lock down, then we need to do that > differently (there will be more changes than just this diff). I > think this can go in and we can address that potential issue later > (but I don't think there is an issue). > > One thing that will also need to be addressed (later) is the > handling of FPU state, which was recently improved. We can choose to > either just flush the state and set CR0_TS on the target after > receive, or try to figure out all the state that needs to be sent > over, and transfer that. Of course, there's always the possibility > that we are sending to a machine with less FPU capability (or more) > than the source, and we'll need to improve checking there as well > (that probably amounts to transferring %xcr0 and checking that what > the "sending side" wants is a subset of what the "receiving side" > can accept). I fear that you'll have to make sure that vmd on the new machine masks off any functionality not offered by the old machine and refuse a transfer if the capability set of the new machine isn't a strict superset of the old machine. > I'll work with Pratik and the rest of the team to clean that up later. > > Other than the reserved bits question, any other thoughts/ ok? Nope. Go ahead. > > > > === > > > > RCS file: /home/pdvyas/cvs/src/sys/arch/amd64/amd64/vmm.c,v > > > > retrieving revision 1.137 > > > > diff -u -p -a -u -r1.137 vmm.c > > > > --- sys/arch/amd64/amd64/vmm.c 28 Apr 2017 10:09:37 - 1.137 > > > > +++ sys/arch/amd64/amd64/vmm.c 30 Apr 2017 00:01:21 - > > > > @@ -1334,7 +1334,9 @@ vcpu_readregs_vmx(struct vcpu *vcpu, uin > > > > uint64_t sel, limit, ar; > > > > uint64_t *gprs = vrs->vrs_gprs; > > > > uint64_t *crs = vrs->vrs_crs; > > > > + uint64_t *msrs = vrs->vrs_msrs; > > > > struct vcpu_segment_info *sregs = vrs->vrs_sregs; > > > > + struct vmx_msr_store *msr_store; > > > > > > > > if (vcpu_reload_vmcs_vmx(&vcpu->vc_control_pa)) > > > > return (EINVAL); > > > > @@ -1402,6 +1404,14 @@ vcpu_readregs_vmx(struct vcpu *vcpu, uin > > > > goto errout; > > > > } > > > > > > > > + msr_store = (struct vmx_msr_store > > > > *)vcpu->vc_vmx_msr_exit_save_va; > > > > + > > > > + if (regmask & VM_RWREGS_MSRS) { > > > > + for (i = 0; i < VCPU_REGS_NMSRS; i++) { > > > > + msrs[i] = msr_store[i].vms_data; > > > > + } > > > > + } > > > > + > > > > goto out; > > > > > > > > errout: > > > > @@ -1448,7 +1458,9 @@ vcpu_writeregs_vmx(struct vcpu *vcpu, ui > > > > uint64_t limit, ar; > > > > uint64_t *gprs = vrs->vrs_gprs; > > > > uint64_t *crs = vrs->vrs_crs; > > > > + uint64_t *msrs = vrs->vrs_msrs; > > > > struct vcpu_segment_info *sregs = vrs->vrs_sregs; > > > > + struct vmx_msr_store *msr_store; > > > > > > > > if (loadvmcs) { > > > > if (vcpu_reload_vmcs_vmx(&vcpu->vc_control_pa)) > > > > @@ -1518,6 +1530,14 @@ vcpu_writeregs_vmx(struct vcpu *vcpu, ui > > > > goto errout; > > > > } > > > > > > > > + msr_store = (struct vmx_msr_store > > > > *)vcpu->vc_vmx_msr_exit_save_va; > > > > + > > > > + if (regmask & VM_RWREGS_MSRS) { > > > > + for (i = 0; i < VCPU_REGS_NMSRS; i++) { > > > > + msr_store[i].vms_data = msrs[i]; > > > > + } > > > > + } > > > > + > > > > goto 
out; > > > > > > > > errout: > > > > @@ -2128,7 +2148,7 @@ vcpu_reset_regs_vmx(struct vcpu *vcpu, s > > > > * IA32_VMX_LOAD_DEBUG_CONTROLS > > > > * IA32_VMX_LOAD_IA32_PERF_GLOBAL_CTRL_ON_ENTRY > > > > */ > > > > - if (ug == 1) > > > > + if (ug == 1 && !(vrs->vrs_msrs[VCPU_REGS_EFER] & EFER_LMA)) > > > > want1 = 0; > > > > else > > > > want1 = IA32_VMX_IA32E_MODE_GUEST; > > > > @@ -2277,26 +2297,12 @@ vcpu_reset_regs_vmx(struct vcpu *vcpu, s > > > > */ > > > > msr_store = (struct vmx_msr_store > > > > *)vcpu->vc_vmx_msr_exit_save_va; > > > > > > > > - /* > > > > -* Make sure LME is enabled in EFER if restricted guest mode is > > > > -* needed. > > > > -*/ > > > > - msr_store[0].vms_index = MSR_EFER; > > > > - if (ug == 1) > > > > - msr_store[0].vms_data = 0ULL; /* Initial value */ > > > > - else > > > > - msr_s
arm64 lock: no userland progress, several procs in wchan "vp"
Userland is non-responsive, machine is pingable, tcp connections open but no banner from ssh. No failed pool requests. This kernel is from today's snapshot but I saw the same with one from a couple of days ago. Is there anything else I can get that might be useful? ddb> ps PID TID PPIDUID S FLAGS WAIT COMMAND 99554 86967 57409 55 3 0x2 vpcc 57409 23557 97377 55 30x82 wait cc 97377 51254 49407 55 30x10008a pause sh 71034 186155 65198 0 30x11 vpperl 49407 183801 58608 55 30x82 wait gmake 58608 251568 90720 55 30x10008a pause sh 90720 294385 26849 55 30x82 wait gmake 26849 434857 31480 55 30x100088 pause sh 31480 479316 1945 55 30x10008a pause sh 1945 53261 1392 55 30x82 wait gmake 1392 297593 28991 55 30x100088 pause sh 28991 101756 11650 55 30x10008a pause sh 11650 273060 70062 55 30x82 wait gmake 70062 380995 21324 55 30x82 wait gmake 21324 380494 20357 55 30x10008a pause make 20357 495141 79040 55 30x10008a pause sh 79040 411698 40069 55 30x10008a pause make 40069 407214 61289 55 30x10008a pause sh 61289 440156 65198 55 30x10008a pause make 16484 143829 63578 55 30x82 nanosleep perl 63578 247597 69857 55 30x10008a pause sh 698573708 28018 55 30x10008a pause make 28018 161747 1 55 30x10008a pause sh 78305 185109 40308 1000 30x100083 ttyin ksh 65198 454438 69872 0 30x93 wait perl 69872 91535 40308 1000 30x10008b pause ksh 40308 108204 1 1000 30x100080 kqreadtmux 72632 510504 69073 1000 30x100083 kqreadtmux 69073 166246 39096 1000 30x10008b pause ksh 39096 474432 39165 1000 30x10 vpsshd 39165 380864 95218 0 30x92 poll sshd 19837 75515 1 0 30x13 vpgetty 61 140725 1 0 30x100010 vpcron 33247 144573 1110 30x100090 poll sndiod 85245 294054 1 99 30x100090 poll sndiod 20071 339430 77361 95 30x100092 kqreadsmtpd 31714 216717 77361103 30x100092 kqreadsmtpd 38145 373966 77361 95 30x100092 kqreadsmtpd 73235 449750 77361 95 30x100092 kqreadsmtpd 52512 523411 77361 95 30x100092 kqreadsmtpd 25217 17706 77361 95 30x100092 kqreadsmtpd 77361 512649 1 0 30x100080 kqreadsmtpd 95218 352524 1 0 30x80 selectsshd 28640 338771 0 0 3 0x14280 nfsidlnfsio 30707 131410 0 0 3 0x14280 nfsidlnfsio 26109 142203 0 0 3 0x14280 nfsidlnfsio 61054 453416 0 0 3 0x14280 nfsidlnfsio 20679 124381 1 0 30x80 poll rpc.statd 75142 494960 1 28 30x100090 poll portmap 13394 497677 1 0 30x10 vpntpd 56991 117256 27035 83 30x100092 poll ntpd 27035 498377 1 83 30x100092 poll ntpd 19071 360785 2016 74 30x100090 bpf pflogd 2016 326372 1 0 30x80 netio pflogd 75485 263260 29155 73 30x100090 kqreadsyslogd 29155 379800 1 0 30x100082 netio syslogd 9314 271265 1 77 30x100090 poll dhclient 77002 87 1 0 30x80 poll dhclient 4332 479844 1 0 30x80 mfsidlmount_mfs 58334 330646 0 0 3 0x14200 pgzerozerothread 15557 331142 0 0 3 0x14200 aiodoned aiodoned 34557 432814 0 0 3 0x14200 syncerupdate 82663 208419 0 0 3 0x14200 cleaner cleaner 51853 347618 0 0 3 0x14200 reaperreaper 18753 499821 0 0 3 0x14200 pgdaemon pagedaemon 59831 415568 0 0 3 0x14200 bored crynlk 29354 478337 0 0 3 0x14200 bored crypto 338986377 0 0 3 0x14200 pftm pfpurge 33736 115197 0 0 3 0x14200 usbtskusbtask 54044 43212 0 0 3 0x14200 usbatsk usbatsk 57344 273215 0 0 3 0x14200
Re: [PATCH] vmm: Add MSRs to readregs / writeregs
On Mon, May 01, 2017 at 08:30:42PM +0200, Mark Kettenis wrote: > > Date: Mon, 1 May 2017 11:16:07 -0700 > > From: Mike Larkin > > > > On Sat, Apr 29, 2017 at 05:24:22PM -0700, Pratik Vyas wrote: > > > Hello tech@, > > > > > > This is a patch that extends the readregs and writeregs vmm(4) ioctl to > > > read and write MSRs as well. > > > > > > It also sets the IA32_VMX_IA32E_MODE_GUEST entry control in > > > vcpu_reset_regs based on the value of EFER_LMA. > > > > > > There are changes to vmmvar.h and would require a `make includes` step > > > to build vmd(8). This should also not alter any behaviour of vmd(8). > > > > > > Context: These are the kernel changes required to implement vmctl send > > > and vmctl receive. vmctl send / receive are two new options that will > > > support snapshotting VMs and migrating VMs from one host to another. > > > This project was undertaken at San Jose State University along with my > > > three teammates CCed with mlarkin@ as our advisor. > > > > > > Patches to vmd(8) that implement vmctl send and vmctl receive are > > > forthcoming. > > > > > > > > > Thanks, > > > Pratik > > > > > > > > > > For some more color commentary - this diff is needed to provide vmd a way to > > access the MSRs via the register set it sends to vmm for VM initialization. > > Without that, the VM resumes with default/power on state, and that won't > > work > > to restore a VM that had already been running. > > > > I'll do some quick testing myself (I know this team has already tested > > OpenBSD > > and Linux guests with this code, for both bios and non-bios boots) and if I > > don't see any problems, I'll commit this tonight. Unless someone objects? > > It only allows reading/writing selected MSRs isn't it? > > Are there any bits in those MSRs that vmd shouldn't touch? > Yes, the list is locked down to those listed in the diff (those are the only ones in the permissions bitmap anyway). I don't think there are any bits we should not allow touching via this mechanism. The guest can already do that itself (since we whitelist these MSRs in the bitmap). If we find there is a bit, say, in EFER, that we need to lock down, then we need to do that differently (there will be more changes than just this diff). I think this can go in and we can address that potential issue later (but I don't think there is an issue). One thing that will also need to be addressed (later) is the handling of FPU state, which was recently improved. We can choose to either just flush the state and set CR0_TS on the target after receive, or try to figure out all the state that needs to be sent over, and transfer that. Of course, there's always the possibility that we are sending to a machine with less FPU capability (or more) than the source, and we'll need to improve checking there as well (that probably amounts to transferring %xcr0 and checking that what the "sending side" wants is a subset of what the "receiving side" can accept). I'll work with Pratik and the rest of the team to clean that up later. Other than the reserved bits question, any other thoughts/ ok? 
-ml > > > === > > > RCS file: /home/pdvyas/cvs/src/sys/arch/amd64/amd64/vmm.c,v > > > retrieving revision 1.137 > > > diff -u -p -a -u -r1.137 vmm.c > > > --- sys/arch/amd64/amd64/vmm.c28 Apr 2017 10:09:37 - 1.137 > > > +++ sys/arch/amd64/amd64/vmm.c30 Apr 2017 00:01:21 - > > > @@ -1334,7 +1334,9 @@ vcpu_readregs_vmx(struct vcpu *vcpu, uin > > > uint64_t sel, limit, ar; > > > uint64_t *gprs = vrs->vrs_gprs; > > > uint64_t *crs = vrs->vrs_crs; > > > + uint64_t *msrs = vrs->vrs_msrs; > > > struct vcpu_segment_info *sregs = vrs->vrs_sregs; > > > + struct vmx_msr_store *msr_store; > > > > > > if (vcpu_reload_vmcs_vmx(&vcpu->vc_control_pa)) > > > return (EINVAL); > > > @@ -1402,6 +1404,14 @@ vcpu_readregs_vmx(struct vcpu *vcpu, uin > > > goto errout; > > > } > > > > > > + msr_store = (struct vmx_msr_store *)vcpu->vc_vmx_msr_exit_save_va; > > > + > > > + if (regmask & VM_RWREGS_MSRS) { > > > + for (i = 0; i < VCPU_REGS_NMSRS; i++) { > > > + msrs[i] = msr_store[i].vms_data; > > > + } > > > + } > > > + > > > goto out; > > > > > > errout: > > > @@ -1448,7 +1458,9 @@ vcpu_writeregs_vmx(struct vcpu *vcpu, ui > > > uint64_t limit, ar; > > > uint64_t *gprs = vrs->vrs_gprs; > > > uint64_t *crs = vrs->vrs_crs; > > > + uint64_t *msrs = vrs->vrs_msrs; > > > struct vcpu_segment_info *sregs = vrs->vrs_sregs; > > > + struct vmx_msr_store *msr_store; > > > > > > if (loadvmcs) { > > > if (vcpu_reload_vmcs_vmx(&vcpu->vc_control_pa)) > > > @@ -1518,6 +1530,14 @@ vcpu_writeregs_vmx(struct vcpu *vcpu, ui > > > goto errout; > > > } > > > > > > + msr_store = (struct vmx_msr_store *)vcpu->vc_vmx_msr_exit_save_va; > > > + > > > + if (regmask & VM_RWREGS_MSRS) { > > > +
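
A hedged sketch of the %xcr0 subset check described above; the function
and argument names are invented for illustration, only the subset
requirement itself comes from the discussion.

/*
 * Hypothetical sketch: a migration would only be accepted if every
 * xsave feature bit the sending side used is also offered by the
 * receiving side.
 */
#include <stdint.h>

static int
xcr0_subset_ok(uint64_t sender_xcr0, uint64_t receiver_xcr0)
{
	/* Bits set on the sender but clear on the receiver are fatal. */
	return (sender_xcr0 & ~receiver_xcr0) == 0;
}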
Re: [PATCH] vmm: Add MSRs to readregs / writeregs
> Date: Mon, 1 May 2017 11:16:07 -0700 > From: Mike Larkin > > On Sat, Apr 29, 2017 at 05:24:22PM -0700, Pratik Vyas wrote: > > Hello tech@, > > > > This is a patch that extends the readregs and writeregs vmm(4) ioctl to > > read and write MSRs as well. > > > > It also sets the IA32_VMX_IA32E_MODE_GUEST entry control in > > vcpu_reset_regs based on the value of EFER_LMA. > > > > There are changes to vmmvar.h and would require a `make includes` step > > to build vmd(8). This should also not alter any behaviour of vmd(8). > > > > Context: These are the kernel changes required to implement vmctl send > > and vmctl receive. vmctl send / receive are two new options that will > > support snapshotting VMs and migrating VMs from one host to another. > > This project was undertaken at San Jose State University along with my > > three teammates CCed with mlarkin@ as our advisor. > > > > Patches to vmd(8) that implement vmctl send and vmctl receive are > > forthcoming. > > > > > > Thanks, > > Pratik > > > > > > For some more color commentary - this diff is needed to provide vmd a way to > access the MSRs via the register set it sends to vmm for VM initialization. > Without that, the VM resumes with default/power on state, and that won't work > to restore a VM that had already been running. > > I'll do some quick testing myself (I know this team has already tested OpenBSD > and Linux guests with this code, for both bios and non-bios boots) and if I > don't see any problems, I'll commit this tonight. Unless someone objects? It only allows reading/writing selected MSRs isn't it? Are there any bits in those MSRs that vmd shouldn't touch? > > Index: sys/arch/amd64/amd64/vmm.c > > === > > RCS file: /home/pdvyas/cvs/src/sys/arch/amd64/amd64/vmm.c,v > > retrieving revision 1.137 > > diff -u -p -a -u -r1.137 vmm.c > > --- sys/arch/amd64/amd64/vmm.c 28 Apr 2017 10:09:37 - 1.137 > > +++ sys/arch/amd64/amd64/vmm.c 30 Apr 2017 00:01:21 - > > @@ -1334,7 +1334,9 @@ vcpu_readregs_vmx(struct vcpu *vcpu, uin > > uint64_t sel, limit, ar; > > uint64_t *gprs = vrs->vrs_gprs; > > uint64_t *crs = vrs->vrs_crs; > > + uint64_t *msrs = vrs->vrs_msrs; > > struct vcpu_segment_info *sregs = vrs->vrs_sregs; > > + struct vmx_msr_store *msr_store; > > > > if (vcpu_reload_vmcs_vmx(&vcpu->vc_control_pa)) > > return (EINVAL); > > @@ -1402,6 +1404,14 @@ vcpu_readregs_vmx(struct vcpu *vcpu, uin > > goto errout; > > } > > > > + msr_store = (struct vmx_msr_store *)vcpu->vc_vmx_msr_exit_save_va; > > + > > + if (regmask & VM_RWREGS_MSRS) { > > + for (i = 0; i < VCPU_REGS_NMSRS; i++) { > > + msrs[i] = msr_store[i].vms_data; > > + } > > + } > > + > > goto out; > > > > errout: > > @@ -1448,7 +1458,9 @@ vcpu_writeregs_vmx(struct vcpu *vcpu, ui > > uint64_t limit, ar; > > uint64_t *gprs = vrs->vrs_gprs; > > uint64_t *crs = vrs->vrs_crs; > > + uint64_t *msrs = vrs->vrs_msrs; > > struct vcpu_segment_info *sregs = vrs->vrs_sregs; > > + struct vmx_msr_store *msr_store; > > > > if (loadvmcs) { > > if (vcpu_reload_vmcs_vmx(&vcpu->vc_control_pa)) > > @@ -1518,6 +1530,14 @@ vcpu_writeregs_vmx(struct vcpu *vcpu, ui > > goto errout; > > } > > > > + msr_store = (struct vmx_msr_store *)vcpu->vc_vmx_msr_exit_save_va; > > + > > + if (regmask & VM_RWREGS_MSRS) { > > + for (i = 0; i < VCPU_REGS_NMSRS; i++) { > > + msr_store[i].vms_data = msrs[i]; > > + } > > + } > > + > > goto out; > > > > errout: > > @@ -2128,7 +2148,7 @@ vcpu_reset_regs_vmx(struct vcpu *vcpu, s > > * IA32_VMX_LOAD_DEBUG_CONTROLS > > * IA32_VMX_LOAD_IA32_PERF_GLOBAL_CTRL_ON_ENTRY > > */ > > - 
if (ug == 1) > > + if (ug == 1 && !(vrs->vrs_msrs[VCPU_REGS_EFER] & EFER_LMA)) > > want1 = 0; > > else > > want1 = IA32_VMX_IA32E_MODE_GUEST; > > @@ -2277,26 +2297,12 @@ vcpu_reset_regs_vmx(struct vcpu *vcpu, s > > */ > > msr_store = (struct vmx_msr_store *)vcpu->vc_vmx_msr_exit_save_va; > > > > - /* > > -* Make sure LME is enabled in EFER if restricted guest mode is > > -* needed. > > -*/ > > - msr_store[0].vms_index = MSR_EFER; > > - if (ug == 1) > > - msr_store[0].vms_data = 0ULL; /* Initial value */ > > - else > > - msr_store[0].vms_data = EFER_LME; > > - > > - msr_store[1].vms_index = MSR_STAR; > > - msr_store[1].vms_data = 0ULL; /* Initial value */ > > - msr_store[2].vms_index = MSR_LSTAR; > > - msr_store[2].vms_data = 0ULL; /* Initial value */ > > - msr_store[3].vms_index = MSR_CSTAR; > > - msr_store[3].vms_data = 0ULL; /* Initial value */ > > - msr_store[4].vms_index = MSR_SFMASK; > > - msr_store[4].vms_data = 0ULL; /* Initial value */ > > -
Re: [PATCH] vmm: Add MSRs to readregs / writeregs
On Sat, Apr 29, 2017 at 05:24:22PM -0700, Pratik Vyas wrote: > Hello tech@, > > This is a patch that extends the readregs and writeregs vmm(4) ioctl to > read and write MSRs as well. > > It also sets the IA32_VMX_IA32E_MODE_GUEST entry control in > vcpu_reset_regs based on the value of EFER_LMA. > > There are changes to vmmvar.h and would require a `make includes` step > to build vmd(8). This should also not alter any behaviour of vmd(8). > > Context: These are the kernel changes required to implement vmctl send > and vmctl receive. vmctl send / receive are two new options that will > support snapshotting VMs and migrating VMs from one host to another. > This project was undertaken at San Jose State University along with my > three teammates CCed with mlarkin@ as our advisor. > > Patches to vmd(8) that implement vmctl send and vmctl receive are > forthcoming. > > > Thanks, > Pratik > > For some more color commentary - this diff is needed to provide vmd a way to access the MSRs via the register set it sends to vmm for VM initialization. Without that, the VM resumes with default/power on state, and that won't work to restore a VM that had already been running. I'll do some quick testing myself (I know this team has already tested OpenBSD and Linux guests with this code, for both bios and non-bios boots) and if I don't see any problems, I'll commit this tonight. Unless someone objects? -ml > > Index: sys/arch/amd64/amd64/vmm.c > === > RCS file: /home/pdvyas/cvs/src/sys/arch/amd64/amd64/vmm.c,v > retrieving revision 1.137 > diff -u -p -a -u -r1.137 vmm.c > --- sys/arch/amd64/amd64/vmm.c28 Apr 2017 10:09:37 - 1.137 > +++ sys/arch/amd64/amd64/vmm.c30 Apr 2017 00:01:21 - > @@ -1334,7 +1334,9 @@ vcpu_readregs_vmx(struct vcpu *vcpu, uin > uint64_t sel, limit, ar; > uint64_t *gprs = vrs->vrs_gprs; > uint64_t *crs = vrs->vrs_crs; > + uint64_t *msrs = vrs->vrs_msrs; > struct vcpu_segment_info *sregs = vrs->vrs_sregs; > + struct vmx_msr_store *msr_store; > > if (vcpu_reload_vmcs_vmx(&vcpu->vc_control_pa)) > return (EINVAL); > @@ -1402,6 +1404,14 @@ vcpu_readregs_vmx(struct vcpu *vcpu, uin > goto errout; > } > > + msr_store = (struct vmx_msr_store *)vcpu->vc_vmx_msr_exit_save_va; > + > + if (regmask & VM_RWREGS_MSRS) { > + for (i = 0; i < VCPU_REGS_NMSRS; i++) { > + msrs[i] = msr_store[i].vms_data; > + } > + } > + > goto out; > > errout: > @@ -1448,7 +1458,9 @@ vcpu_writeregs_vmx(struct vcpu *vcpu, ui > uint64_t limit, ar; > uint64_t *gprs = vrs->vrs_gprs; > uint64_t *crs = vrs->vrs_crs; > + uint64_t *msrs = vrs->vrs_msrs; > struct vcpu_segment_info *sregs = vrs->vrs_sregs; > + struct vmx_msr_store *msr_store; > > if (loadvmcs) { > if (vcpu_reload_vmcs_vmx(&vcpu->vc_control_pa)) > @@ -1518,6 +1530,14 @@ vcpu_writeregs_vmx(struct vcpu *vcpu, ui > goto errout; > } > > + msr_store = (struct vmx_msr_store *)vcpu->vc_vmx_msr_exit_save_va; > + > + if (regmask & VM_RWREGS_MSRS) { > + for (i = 0; i < VCPU_REGS_NMSRS; i++) { > + msr_store[i].vms_data = msrs[i]; > + } > + } > + > goto out; > > errout: > @@ -2128,7 +2148,7 @@ vcpu_reset_regs_vmx(struct vcpu *vcpu, s >* IA32_VMX_LOAD_DEBUG_CONTROLS >* IA32_VMX_LOAD_IA32_PERF_GLOBAL_CTRL_ON_ENTRY >*/ > - if (ug == 1) > + if (ug == 1 && !(vrs->vrs_msrs[VCPU_REGS_EFER] & EFER_LMA)) > want1 = 0; > else > want1 = IA32_VMX_IA32E_MODE_GUEST; > @@ -2277,26 +2297,12 @@ vcpu_reset_regs_vmx(struct vcpu *vcpu, s >*/ > msr_store = (struct vmx_msr_store *)vcpu->vc_vmx_msr_exit_save_va; > > - /* > - * Make sure LME is enabled in EFER if restricted guest mode is > - * needed. 
> - */ > - msr_store[0].vms_index = MSR_EFER; > - if (ug == 1) > - msr_store[0].vms_data = 0ULL; /* Initial value */ > - else > - msr_store[0].vms_data = EFER_LME; > - > - msr_store[1].vms_index = MSR_STAR; > - msr_store[1].vms_data = 0ULL; /* Initial value */ > - msr_store[2].vms_index = MSR_LSTAR; > - msr_store[2].vms_data = 0ULL; /* Initial value */ > - msr_store[3].vms_index = MSR_CSTAR; > - msr_store[3].vms_data = 0ULL; /* Initial value */ > - msr_store[4].vms_index = MSR_SFMASK; > - msr_store[4].vms_data = 0ULL; /* Initial value */ > - msr_store[5].vms_index = MSR_KERNELGSBASE; > - msr_store[5].vms_data = 0ULL; /* Initial value */ > + msr_store[VCPU_REGS_EFER].vms_index = MSR_EFER; > + msr_store[VCPU_REGS_STAR].vms_index = MSR_STAR; > + msr_store[VCPU_REGS_LSTAR].vms_index = MSR_LSTAR; > +
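
For context, a rough sketch (not actual vmd code) of how a userland
restore path might fill the new MSR slots before handing the register
state back to vmm; restore_vcpu_msrs() and saved[] are hypothetical,
struct vcpu_reg_state is assumed to be the register-state type from
vmmvar.h, and only vrs_msrs[], the VCPU_REGS_* indices and the EFER
bits appear in the patch itself.

/*
 * Hedged sketch: populate the MSR block of the register state before
 * a writeregs ioctl.  restore_vcpu_msrs() and saved[] are hypothetical.
 */
#include <sys/types.h>
#include <machine/specialreg.h>		/* EFER_LME, EFER_LMA */
#include <machine/vmmvar.h>		/* struct vcpu_reg_state, VCPU_REGS_* */

static void
restore_vcpu_msrs(struct vcpu_reg_state *vrs, const uint64_t *saved)
{
	int i;

	/* Carry over every MSR captured on the sending side. */
	for (i = 0; i < VCPU_REGS_NMSRS; i++)
		vrs->vrs_msrs[i] = saved[i];

	/*
	 * A guest that was running in long mode will have EFER_LME and
	 * EFER_LMA set here, which is what makes vcpu_reset_regs_vmx()
	 * select the IA32_VMX_IA32E_MODE_GUEST entry control above.
	 */
}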
Re: Atomic copyin(9)/copyout(9) for amd64
Mark Kettenis wrote: > The futex(2) syscall needs to be able to atomically copy the futex in > and out of userland. The current implementation uses copyin(9) and > copyout(9) for that. The futex is a 32-bit integer, and currently our > copyin(9) and copyout(9) don't guarantee an atomic 32-bit access. > Previously mpi@ and I discussed implementing new interfaces that do > guarantee the required atomicity. However, it oocurred to me that we > could simply change our copyin implementations such that they > guarantee atomicity of a properly aligned 32-bit copyin and copyout. > > The i386 version of these calls uses "rep movsl", which means it is > already atomic. At least that is how I interpret 8.2.4 in Volume 3A > of the Intel SDM. The diff below makes the amd64 version safe as > well. This does introduce a few additional instructions in the loop. > Apparently modern Intel CPUs optimize the string loops. If we can > rely on the hardware to turn 32-bit moves into 64-bit moves, we could > simplify the code by using "rep movsl" instead of "rep movsq". That sounds like a somewhat risky, very difficult to verify, optimization for little gain. I'm not sure I would trust move coalescing to apply across page boundaries. But adding support for 32-bit atomic copies sounds like a great idea. > > Thoughts? > > > Index: arch/amd64/amd64/copy.S > === > RCS file: /cvs/src/sys/arch/amd64/amd64/copy.S,v > retrieving revision 1.7 > diff -u -p -r1.7 copy.S > --- arch/amd64/amd64/copy.S 25 Apr 2015 21:31:24 - 1.7 > +++ arch/amd64/amd64/copy.S 1 May 2017 15:32:17 - > @@ -138,7 +138,12 @@ ENTRY(copyout) > rep > movsq > movb%al,%cl > - andb$7,%cl > + shrb$2,%cl > + andb$1,%cl > + rep > + movsl > + movb%al,%cl > + andb$3,%cl > rep > movsb > SMAP_CLAC > @@ -168,7 +173,12 @@ ENTRY(copyin) > rep > movsq > movb%al,%cl > - andb$7,%cl > + shrb$2,%cl > + andb$1,%cl > + rep > + movsl > + movb%al,%cl > + andb$3,%cl > rep > movsb > >
nvme.c: trivial patch fixing a typo in a warning message
Hi, Just to see what would happen, I added a virtual NVMe controller to an OpenBSD 6.1 VM on VMware ESXi 6.5. It seemed to work OK, but on VM shutdown a warning appeared. I wasn't able to find out more, so the best I have is a patch correcting a typo in the warning... Index: src/sys/dev/ic/nvme.c === RCS file: /cvs/src/sys/dev/ic/nvme.c,v retrieving revision 1.54 diff -u -p -u -r1.54 nvme.c --- src/sys/dev/ic/nvme.c 8 Apr 2017 02:57:25 - 1.54 +++ src/sys/dev/ic/nvme.c 30 Apr 2017 13:12:37 - @@ -452,7 +452,7 @@ nvme_shutdown(struct nvme_softc *sc) delay(1000); } - printf("%s: unable to shudown, disabling\n", DEVNAME(sc)); + printf("%s: unable to shutdown, disabling\n", DEVNAME(sc)); disable: nvme_disable(sc); Regards, Jurjen Oskam
Atomic copyin(9)/copyout(9) for amd64
The futex(2) syscall needs to be able to atomically copy the futex in and out of userland. The current implementation uses copyin(9) and copyout(9) for that. The futex is a 32-bit integer, and currently our copyin(9) and copyout(9) don't guarantee an atomic 32-bit access. Previously mpi@ and I discussed implementing new interfaces that do guarantee the required atomicity. However, it oocurred to me that we could simply change our copyin implementations such that they guarantee atomicity of a properly aligned 32-bit copyin and copyout. The i386 version of these calls uses "rep movsl", which means it is already atomic. At least that is how I interpret 8.2.4 in Volume 3A of the Intel SDM. The diff below makes the amd64 version safe as well. This does introduce a few additional instructions in the loop. Apparently modern Intel CPUs optimize the string loops. If we can rely on the hardware to turn 32-bit moves into 64-bit moves, we could simplify the code by using "rep movsl" instead of "rep movsq". Thoughts? Index: arch/amd64/amd64/copy.S === RCS file: /cvs/src/sys/arch/amd64/amd64/copy.S,v retrieving revision 1.7 diff -u -p -r1.7 copy.S --- arch/amd64/amd64/copy.S 25 Apr 2015 21:31:24 - 1.7 +++ arch/amd64/amd64/copy.S 1 May 2017 15:32:17 - @@ -138,7 +138,12 @@ ENTRY(copyout) rep movsq movb%al,%cl - andb$7,%cl + shrb$2,%cl + andb$1,%cl + rep + movsl + movb%al,%cl + andb$3,%cl rep movsb SMAP_CLAC @@ -168,7 +173,12 @@ ENTRY(copyin) rep movsq movb%al,%cl - andb$7,%cl + shrb$2,%cl + andb$1,%cl + rep + movsl + movb%al,%cl + andb$3,%cl rep movsb
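
For readers who prefer C to assembly, a sketch of the loop structure the
diff introduces; it is only an illustration of the atomicity argument
(the real code is the assembly above) and leaves out everything
copyin(9)/copyout(9) otherwise do, such as the SMAP handling and fault
recovery.

/*
 * Illustrative C rendering of the new copy loop: 8-byte moves first,
 * then at most one 4-byte move, then the remaining bytes.  An aligned
 * 32-bit word in the tail is therefore transferred by a single 32-bit
 * access instead of four byte moves.
 */
#include <stddef.h>
#include <stdint.h>

static void
copy_words(void *dst, const void *src, size_t len)
{
	uint64_t *d8 = dst;
	const uint64_t *s8 = src;
	size_t n;

	for (n = len >> 3; n > 0; n--)		/* rep movsq */
		*d8++ = *s8++;

	uint32_t *d4 = (uint32_t *)d8;
	const uint32_t *s4 = (const uint32_t *)s8;
	if ((len >> 2) & 1)			/* rep movsl, at most once */
		*d4++ = *s4++;

	uint8_t *d1 = (uint8_t *)d4;
	const uint8_t *s1 = (const uint8_t *)s4;
	for (n = len & 3; n > 0; n--)		/* rep movsb */
		*d1++ = *s1++;
}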
Re: less(1): plug memory leak
looks good, ok nicm On Mon, May 01, 2017 at 10:35:59AM +0200, Tobias Stoeckmann wrote: > Hi, > > On Mon, May 01, 2017 at 09:15:45AM +0200, Anton Lindqvist wrote: > > While freeing tag entries, make sure to free the copied strings. > > this patch looks good to me. > > Have you reported this to the upstream less maintainers, as well? > The original less and the forked one from Garrett D'Amore, which we > use as foundation, have the same issue in them. > > > Tobias > > > Index: tags.c > > === > > RCS file: /cvs/src/usr.bin/less/tags.c,v > > retrieving revision 1.18 > > diff -u -p -r1.18 tags.c > > --- tags.c 17 Sep 2016 15:06:41 - 1.18 > > +++ tags.c 30 Apr 2017 18:13:24 - > > @@ -77,6 +77,8 @@ cleantags(void) > > */ > > while ((tp = taglist.tl_first) != TAG_END) { > > TAG_RM(tp); > > + free(tp->tag_file); > > + free(tp->tag_pattern); > > free(tp); > > } > > curtag = NULL; > > >
Re: ASan on OpenBSD
On Mon, May 1, 2017 at 12:28 PM, Mark Kettenis wrote: >> From: Dmitry Vyukov >> Date: Mon, 1 May 2017 10:43:26 +0200 >> >> On Mon, May 1, 2017 at 8:51 AM, Greg Steuck wrote: >> > I naively tried to build something with -fsanitize=address using llvm-4.0 >> > port available on OpenBSD 6.1-amd64. I was immediately greeted with: >> > clang-4.0: error: unsupported option '-fsanitize=address' for target >> > 'amd64-unknown-openbsd6.1' >> > >> > How deep a rat hole does one have to go to port ASan to a new >> > flavour of BSD? Is OpenBSD going to be particularly painful with >> > its special malloc and advanced ASLR? Is anybody working on this? > > Our focus is currently still on integrating llvm/clang/lld into > OpenBSD. As far as I know nobody is working on this yet, but it > sounds like something we'd certainly be interested in having. > >> Hi, >> >> I can think of 2 major areas re porting to a new OS: >> 1. Function interception. Presumably our current scheme just works on >> OpenBSD (as it works on Linux and FreeBSD). >> 2. Shadow memory mapping. We need to mmap a multi-TB range in process >> address space. Kernel need to support such huge mappings at all and >> with reasonable performance. How aggressive is ASLR? Is it possible to >> turn it off for a process (that may be the simplest option)? What's so >> special about malloc? Note that asan replaces standard malloc >> entirely. > > The ASLR is pretty aggressive and there is (quite deliberately) no > option to turn it off. Our malloc has some pretty nift bug detection > strategies built-in such as guard pages, aggressive unmapping of freed > memory, aggressive radomization, etc. Many of these features are > turned on by default. But replacing malloc with a different > implementation is still possible in OpenBSD. > > Currently mapping a mult-TB range doesn't work in OpenBSD. But I > think it could be made to work if most of that range will be mapped > with PROT_NONE. It needs to be mapped with PROT_READ|PROT_WRITE and MAP_NORESERVE. We don't have code that later would detect which parts of the mapping actually become active and need to be remapped from PROT_NONE to PROT_RW, we rely on lazy, on-demand paging in.
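
As a concrete illustration of the reservation being discussed, a small
standalone sketch of the way a multi-TB shadow range is reserved on
platforms where ASan already runs; the size, address and MAP_NORESERVE
fallback here are illustrative and say nothing about what OpenBSD
currently accepts.

/*
 * Standalone sketch: reserve a huge shadow region up front and let the
 * pages be faulted in on demand.  Size and flags are illustrative.
 */
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

#ifndef MAP_NORESERVE
#define MAP_NORESERVE 0
#endif

int
main(void)
{
	size_t shadow_size = (size_t)1 << 44;	/* ~16 TB, illustrative */
	void *shadow = mmap(NULL, shadow_size, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON | MAP_NORESERVE, -1, 0);

	if (shadow == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	printf("shadow reserved at %p\n", shadow);
	return 0;
}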
Re: ASan on OpenBSD
> From: Dmitry Vyukov
> Date: Mon, 1 May 2017 10:43:26 +0200
>
> On Mon, May 1, 2017 at 8:51 AM, Greg Steuck wrote:
> > I naively tried to build something with -fsanitize=address using llvm-4.0
> > port available on OpenBSD 6.1-amd64. I was immediately greeted with:
> > clang-4.0: error: unsupported option '-fsanitize=address' for target
> > 'amd64-unknown-openbsd6.1'
> >
> > How deep a rat hole does one have to go to port ASan to a new
> > flavour of BSD? Is OpenBSD going to be particularly painful with
> > its special malloc and advanced ASLR? Is anybody working on this?

Our focus is currently still on integrating llvm/clang/lld into
OpenBSD. As far as I know nobody is working on this yet, but it
sounds like something we'd certainly be interested in having.

> Hi,
>
> I can think of 2 major areas re porting to a new OS:
> 1. Function interception. Presumably our current scheme just works on
> OpenBSD (as it works on Linux and FreeBSD).
> 2. Shadow memory mapping. We need to mmap a multi-TB range in process
> address space. Kernel need to support such huge mappings at all and
> with reasonable performance. How aggressive is ASLR? Is it possible to
> turn it off for a process (that may be the simplest option)? What's so
> special about malloc? Note that asan replaces standard malloc
> entirely.

The ASLR is pretty aggressive and there is (quite deliberately) no
option to turn it off. Our malloc has some pretty nifty bug detection
strategies built-in such as guard pages, aggressive unmapping of freed
memory, aggressive randomization, etc. Many of these features are
turned on by default. But replacing malloc with a different
implementation is still possible in OpenBSD.

Currently mapping a multi-TB range doesn't work in OpenBSD. But I
think it could be made to work if most of that range will be mapped
with PROT_NONE.
Re: ASan on OpenBSD
On Mon, May 1, 2017 at 8:51 AM, Greg Steuck wrote: > I naively tried to build something with -fsanitize=address using llvm-4.0 > port available on OpenBSD 6.1-amd64. I was immediately greeted with: > clang-4.0: error: unsupported option '-fsanitize=address' for target > 'amd64-unknown-openbsd6.1' > > How deep a rat hole does one have to go to port ASan to a new flavour of > BSD? Is OpenBSD going to be particularly painful with its special malloc and > advanced ASLR? Is anybody working on this? Hi, I can think of 2 major areas re porting to a new OS: 1. Function interception. Presumably our current scheme just works on OpenBSD (as it works on Linux and FreeBSD). 2. Shadow memory mapping. We need to mmap a multi-TB range in process address space. Kernel need to support such huge mappings at all and with reasonable performance. How aggressive is ASLR? Is it possible to turn it off for a process (that may be the simplest option)? What's so special about malloc? Note that asan replaces standard malloc entirely. Nobody is working on OpenBSD support as far as I know.
Re: less(1): plug memory leak
Hi, On Mon, May 01, 2017 at 10:35:59AM +0200, Tobias Stoeckmann wrote: > Hi, > > On Mon, May 01, 2017 at 09:15:45AM +0200, Anton Lindqvist wrote: > > While freeing tag entries, make sure to free the copied strings. > > this patch looks good to me. > > Have you reported this to the upstream less maintainers, as well? > The original less and the forked one from Garrett D'Amore, which we > use as foundation, have the same issue in them. Already reported to Garrett[1]. I will make sure to submit it to the original as well. [1] https://github.com/gdamore/less-fork/pull/38
Re: less(1): plug memory leak
Hi, On Mon, May 01, 2017 at 09:15:45AM +0200, Anton Lindqvist wrote: > While freeing tag entries, make sure to free the copied strings. this patch looks good to me. Have you reported this to the upstream less maintainers, as well? The original less and the forked one from Garrett D'Amore, which we use as foundation, have the same issue in them. Tobias > Index: tags.c > === > RCS file: /cvs/src/usr.bin/less/tags.c,v > retrieving revision 1.18 > diff -u -p -r1.18 tags.c > --- tags.c17 Sep 2016 15:06:41 - 1.18 > +++ tags.c30 Apr 2017 18:13:24 - > @@ -77,6 +77,8 @@ cleantags(void) >*/ > while ((tp = taglist.tl_first) != TAG_END) { > TAG_RM(tp); > + free(tp->tag_file); > + free(tp->tag_pattern); > free(tp); > } > curtag = NULL; >
less(1): plug memory leak
Hi, While freeing tag entries, make sure to free the copied strings. Index: tags.c === RCS file: /cvs/src/usr.bin/less/tags.c,v retrieving revision 1.18 diff -u -p -r1.18 tags.c --- tags.c 17 Sep 2016 15:06:41 - 1.18 +++ tags.c 30 Apr 2017 18:13:24 - @@ -77,6 +77,8 @@ cleantags(void) */ while ((tp = taglist.tl_first) != TAG_END) { TAG_RM(tp); + free(tp->tag_file); + free(tp->tag_pattern); free(tp); } curtag = NULL;