Re: KAUTH_SYSTEM_UNENCRYPTED_SWAP
matthew green wrote: > what's the use-case for disabling encrypted swap later? It might be too slow on some machines. > i'd argue we should avoid kauth for this and simply disable > it always as i've been unable to think of any use case that > is the only solution. Always encrypted swap would be even better but ... slow machines. -- Alex
Re: KAUTH_SYSTEM_UNENCRYPTED_SWAP
Greg Troxel wrote: > Kamil Rytarowski writes: > > > Is it possible to avoid negation in the name? > > > > KAUTH_SYSTEM_ENABLE_SWAP_ENCRYPTION > > I think the point is to have one permission to enable it, which is > perhaps just regular root, and another to disable it if securelevel is > elevated. > > So perhaps there should be two names, one to enable, one to disable. Kauth is about security rather than speed or convenience. Disabling encryption may improve speed but it definitely degrades your security level. So, you can enable vm.swap_encrypt at any level but you can't disable it if you care about security. -- Alex
Re: KAUTH_SYSTEM_UNENCRYPTED_SWAP
m...@netbsd.org wrote: > No objections from me, but I feel like "will commit unless objected" > should be done on longer time scales. I spend way too much time on > netbsd and I still have some days I dont get to reading email for > whatever reason. It's a small change, we discussed it on source-changes-d and current-users already. I could have committed it without sending the patch but I introduce a new public constant and I wasn't very sure how to best name it. -- Alex
KAUTH_SYSTEM_UNENCRYPTED_SWAP
Attached patch adds KAUTH_SYSTEM_UNENCRYPTED_SWAP and it forbids changing vm.swap_encrypt from 1 to 0 when securelevel > 0. If there are no objections, I'm going to commit it tomorrow. -- Alex Index: share/man/man9/kauth.9 === RCS file: /cvsroot/src/share/man/man9/kauth.9,v retrieving revision 1.112 diff -p -u -u -r1.112 kauth.9 --- share/man/man9/kauth.9 15 Jul 2018 05:16:41 - 1.112 +++ share/man/man9/kauth.9 16 May 2020 19:22:46 - @@ -25,7 +25,7 @@ .\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF .\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. .\" -.Dd July 14, 2018 +.Dd May 17, 2020 .Dt KAUTH 9 .Os .Sh NAME @@ -488,6 +488,8 @@ Check if changing the RTC offset is allo .It Dv KAUTH_REQ_SYSTEM_TIME_TIMECOUNTERS Check if manipulating timecounters is allowed. .El +.It Dv KAUTH_SYSTEM_UNENCRYPTED_SWAP +Check if encrypted swap can be degraded to unencrypted. .It Dv KAUTH_SYSTEM_VERIEXEC Check if operations on the .Xr veriexec 8 Index: share/man/man9/secmodel_securelevel.9 === RCS file: /cvsroot/src/share/man/man9/secmodel_securelevel.9,v retrieving revision 1.19 diff -p -u -u -r1.19 secmodel_securelevel.9 --- share/man/man9/secmodel_securelevel.9 18 May 2019 10:21:03 - 1.19 +++ share/man/man9/secmodel_securelevel.9 16 May 2020 19:22:46 - @@ -26,7 +26,7 @@ .\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF .\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. .\" -.Dd May 18, 2019 +.Dd May 17, 2020 .Dt SECMODEL_SECURELEVEL 9 .Os .Sh NAME @@ -129,6 +129,11 @@ calls are denied. .It Access to unmanaged memory is denied. .It +The +.Va vm.swap_encrypt +.Xr sysctl 8 +variable may not be changed to 0. +.It Only GPIO pins that have been set at .Em securelevel 0 can be accessed. Index: sys/secmodel/securelevel/secmodel_securelevel.c === RCS file: /cvsroot/src/sys/secmodel/securelevel/secmodel_securelevel.c,v retrieving revision 1.35 diff -p -u -u -r1.35 secmodel_securelevel.c --- sys/secmodel/securelevel/secmodel_securelevel.c 11 May 2020 19:36:39 - 1.35 +++ sys/secmodel/securelevel/secmodel_securelevel.c 16 May 2020 19:22:47 - @@ -343,6 +343,11 @@ secmodel_securelevel_system_cb(kauth_cre result = KAUTH_RESULT_DENY; break; + case KAUTH_SYSTEM_UNENCRYPTED_SWAP: + if (securelevel > 0) + result = KAUTH_RESULT_DENY; + break; + case KAUTH_SYSTEM_DEBUG: default: break; Index: sys/secmodel/suser/secmodel_suser.c === RCS file: /cvsroot/src/sys/secmodel/suser/secmodel_suser.c,v retrieving revision 1.54 diff -p -u -u -r1.54 secmodel_suser.c --- sys/secmodel/suser/secmodel_suser.c 16 May 2020 19:12:38 - 1.54 +++ sys/secmodel/suser/secmodel_suser.c 16 May 2020 19:22:47 - @@ -397,6 +397,11 @@ secmodel_suser_system_cb(kauth_cred_t cr break; + case KAUTH_SYSTEM_UNENCRYPTED_SWAP: + if (isroot) + result = KAUTH_RESULT_ALLOW; + break; + case KAUTH_SYSTEM_VERIEXEC: switch (req) { case KAUTH_REQ_SYSTEM_VERIEXEC_ACCESS: Index: sys/sys/kauth.h === RCS file: /cvsroot/src/sys/sys/kauth.h,v retrieving revision 1.84 diff -p -u -u -r1.84 kauth.h --- sys/sys/kauth.h 29 Apr 2020 05:54:37 - 1.84 +++ sys/sys/kauth.h 16 May 2020 19:22:47 - @@ -152,6 +152,7 @@ enum { KAUTH_SYSTEM_FS_SNAPSHOT, KAUTH_SYSTEM_INTR, KAUTH_SYSTEM_KERNADDR, + KAUTH_SYSTEM_UNENCRYPTED_SWAP, }; /* Index: sys/uvm/uvm_swap.c === RCS file: /cvsroot/src/sys/uvm/uvm_swap.c,v retrieving revision 1.189 diff -p -u -u -r1.189 uvm_swap.c --- sys/uvm/uvm_swap.c 10 May 2020 02:38:10 - 1.189 +++ sys/uvm/uvm_swap.c 16 May 2020 19:22:47 - @@ -2078,12 +2078,34 @@ uvm_swap_decryptpage(struct swapdev *sdp explicit_memset(, 0, sizeof aes); } +static int +sysctl_swap_encrypt(SYSCTLFN_ARGS) +{ + struct sysctlnode node; + int newval, error; + + newval = *(int *)rnode->sysctl_data; + + node = *rnode; + node.sysctl_data = + error = sysctl_lookup(SYSCTLFN_CALL()); + if (error || newp == NULL) + return error; + + if (!newval && kauth_authorize_system(l->l_cred, + KAUTH_SYSTEM_UNENCRYPTED_SWAP, 0, NULL, NULL, NULL)) + return EPERM; + + *(int *)rnode->sysctl_data = newval; + return 0; +} +
Re: fexecve
Christos Zoulas wrote: > - We can completely dissallow fexecve in chrooted environments. Full disk encryption (loaded with cgdroot.kmod) requires a complete system to be chrooted. -- Alex
ASan: Unauthorized Access
Yesterday evening I updated the tree and compiled GENERIC_KASAN kernel. It survived a night of three aft test runs but I see a bunch of 'ASan: Unauthorized Access' stack traces in dmesg, all around the same time. They all look alike. Is it a known problem? [ 10325.124325] ASan: Unauthorized Access In 0x80f60ce0: Addr 0x938023b77f80 [4 bytes, read, RedZone] [ 10325.124325] #0 0x80f60ce0 in kauth_cred_uidmatch [ 10325.134352] #1 0x816f6dc5 in secmodel_extensions_network_cb [ 10325.134352] #2 0x80f5f75c in kauth_authorize_action_internal [ 10325.134352] #3 0x80f61dfa in kauth_authorize_action [ 10325.134352] #4 0x80f627a0 in kauth_authorize_network [ 10325.134352] #5 0x80aeafd3 in sysctl_inpcblist [ 10325.134352] #6 0x80fc447f in sysctl_dispatch [ 10325.134352] #7 0x80fc48dc in sys___sysctl [ 10325.134352] #8 0x8026b3ae in syscall [ 10325.134352] ASan: Unauthorized Access In 0x80f60d05: Addr 0x938023b77f84 [4 bytes, read, RedZone] [ 10325.134352] #0 0x80f60d05 in kauth_cred_uidmatch [ 10325.134352] #1 0x816f6dc5 in secmodel_extensions_network_cb [ 10325.134352] #2 0x80f5f75c in kauth_authorize_action_internal [ 10325.144377] #3 0x80f61dfa in kauth_authorize_action [ 10325.144377] #4 0x80f627a0 in kauth_authorize_network [ 10325.144377] #5 0x80aeafd3 in sysctl_inpcblist [ 10325.144377] #6 0x80fc447f in sysctl_dispatch [ 10325.144377] #7 0x80fc48dc in sys___sysctl [ 10325.144377] #8 0x8026b3ae in syscall [ 10325.144377] ASan: Unauthorized Access In 0x80f60ce0: Addr 0x93805d15db00 [4 bytes, read, RedZone] [ 10325.144377] #0 0x80f60ce0 in kauth_cred_uidmatch [ 10325.144377] #1 0x816f6dc5 in secmodel_extensions_network_cb [ 10325.144377] #2 0x80f5f75c in kauth_authorize_action_internal [ 10325.144377] #3 0x80f61dfa in kauth_authorize_action [ 10325.144377] #4 0x80f627a0 in kauth_authorize_network [ 10325.154403] #5 0x80aeafd3 in sysctl_inpcblist [ 10325.154403] #6 0x80fc447f in sysctl_dispatch [ 10325.154403] #7 0x80fc48dc in sys___sysctl [ 10325.154403] #8 0x8026b3ae in syscall [ 10325.154403] ASan: Unauthorized Access In 0x80f60d05: Addr 0x93805d15db04 [4 bytes, read, RedZone] [ 10325.154403] #0 0x80f60d05 in kauth_cred_uidmatch [ 10325.154403] #1 0x816f6dc5 in secmodel_extensions_network_cb [ 10325.154403] #2 0x80f5f75c in kauth_authorize_action_internal [ 10325.154403] #3 0x80f61dfa in kauth_authorize_action [ 10325.154403] #4 0x80f627a0 in kauth_authorize_network [ 10325.154403] #5 0x80aeafd3 in sysctl_inpcblist [ 10325.154403] #6 0x80fc447f in sysctl_dispatch [ 10325.154403] #7 0x80fc48dc in sys___sysctl [ 10325.164430] #8 0x8026b3ae in syscall [ 10325.164430] ASan: Unauthorized Access In 0x80f60ce0: Addr 0x938023b77f80 [4 bytes, read, RedZone] [ 10325.164430] #0 0x80f60ce0 in kauth_cred_uidmatch [ 10325.164430] #1 0x816f6dc5 in secmodel_extensions_network_cb [ 10325.164430] #2 0x80f5f75c in kauth_authorize_action_internal [ 10325.164430] #3 0x80f61dfa in kauth_authorize_action [ 10325.164430] #4 0x80f627a0 in kauth_authorize_network [ 10325.164430] #5 0x80aeafd3 in sysctl_inpcblist [ 10325.164430] #6 0x80fc447f in sysctl_dispatch [ 10325.164430] #7 0x80fc48dc in sys___sysctl [ 10325.164430] #8 0x8026b3ae in syscall [ 10325.164430] ASan: Unauthorized Access In 0x80f60d05: Addr 0x938023b77f84 [4 bytes, read, RedZone] [ 10325.164430] #0 0x80f60d05 in kauth_cred_uidmatch [ 10325.174456] #1 0x816f6dc5 in secmodel_extensions_network_cb [ 10325.174456] #2 0x80f5f75c in kauth_authorize_action_internal [ 10325.174456] #3 0x80f61dfa in kauth_authorize_action [ 10325.174456] #4 0x80f627a0 in kauth_authorize_network [ 10325.174456] #5 0x80aeafd3 in sysctl_inpcblist [ 10325.174456] #6 0x80fc447f in sysctl_dispatch [ 10325.174456] #7 0x80fc48dc in sys___sysctl [ 10325.174456] #8 0x8026b3ae in syscall [ 10325.174456] ASan: Unauthorized Access In 0x80f60ce0: Addr 0x93805d15db00 [4 bytes, read, RedZone] [ 10325.174456] #0 0x80f60ce0 in kauth_cred_uidmatch [ 10325.174456] #1 0x816f6dc5 in secmodel_extensions_network_cb [ 10325.174456] #2 0x80f5f75c in kauth_authorize_action_internal [ 10325.174456] #3 0x80f61dfa in kauth_authorize_action [ 10325.184518] #4 0x80f627a0 in kauth_authorize_network [ 10325.184518] #5 0x80aeafd3 in sysctl_inpcblist [ 10325.184518] #6 0x80fc447f in sysctl_dispatch [ 10325.184518] #7 0x80fc48dc in sys___sysctl [ 10325.184518] #8 0x8026b3ae in
Re: Removing PF
Mouse wrote: > Security is not a boolean. Some say that security isn't a product, security is a process. Surgeons among us have a particular view on that process ;-) Alex
Re: Time to merge the pgoyette-compat branch (take two)
Martin Husemann wrote: > Also I wonder if we could do some nm digging and awk scripts with tsort > to find potential symbol collisions or missing symbols not properly > covered by module dependencies. BTW, I'm working on Lua bindings for elftoolchain [1]. In principle, it should be a better alternative to nm+awk hacks but I'm not sure if the code can handle all required functionality. You're welcome to take a look :-) [1] https://www.github.com/xmmswap/luaelftoolchain -- Alex
Re: Too many PMC implementations
David Holland wrote: > On Sat, Aug 25, 2018 at 11:26:07AM +0100, Alexander Nasonov wrote: > > 1. It's not standartised and it will very likely change in future versions > > That doesn't really matter as long as you're only using one version at > a time... If bytecode is generated from a valid Lua program, it's indeed takes very little effort to update to a new version. But updating handcrafted bytecode may take a bit of time. > > 2. There is no bpf_validate for Lua bytecode. In fact, Lua team abandoned > >an idea of bytecode validation few years ago. From Lua 5.3 manual: > > > >Lua does not check the consistency of binary chunks. Maliciously > >crafted binary chunks can crash the interpreter. > > Are we talking about installing untrusted/unprivileged kernel trace > logic? Because that seems like a bad idea, or at least a very hard > thing to get right... and if not, it doesn't matter if there's a > validator. Lua bytecode is turing complete and not validatable but I'm pretty sure some subset of it (e.g. no loops, no strings, etc) can be validated. > (Also, isn't EBPF not really validatable either, or am I mixing it > up with something else?) Last I checked, the author(s) of eBPF claimed that it can be validated. -- Alex
Re: Too many PMC implementations
Kamil Rytarowski wrote: > There is already a Lua-powered solution for traces in Linux: ktap. It > uses nice rules written natively in Lua.. however it seems to be > abandoned in favor of eBPF. I see two potential problems with using Lua bytecode: 1. It's not standartised and it will very likely change in future versions 2. There is no bpf_validate for Lua bytecode. In fact, Lua team abandoned an idea of bytecode validation few years ago. From Lua 5.3 manual: Lua does not check the consistency of binary chunks. Maliciously crafted binary chunks can crash the interpreter. Alex signature.asc Description: PGP signature
Re: secmodel_securelevel(9) and machdep.svs.enabled
26.04.2018, 13:03, "Thor Lancelot Simon":If that is true it's a serious regression. Are you talking about thepathological case where "options INSECURE" is set?ThorMy reasoning was based entirely upon reading secmodel_securelevel(9) and kmem(4) man pages. I don’t know whether it accurately reflects the reality.Alex
Re: secmodel_securelevel(9) and machdep.svs.enabled
Alexander Nasonov wrote: > Thinking a bit more about this, I don't think my patch will prevent > data leakage from the kernel because /dev/mem and /dev/kmem are > readable at all securelevels. There is an important distrinction, though. Code in sys/dev/mm.c can be changed to scramble sensitive pages (e.g. cgd(4) keys) while meltdown is a wild beast and it's nearly impossible to control. -- Alex
Re: secmodel_securelevel(9) and machdep.svs.enabled
Maxime Villard wrote: > Yes, it's fine. I've never taken care of securelevel, but your change > can't be incorrect. Perhaps I would use just KAUTH_MACHDEP_SVS instead > of KAUTH_MACHDEP_SVS_DISABLE, in case another operation gets added in > the future, but that doesn't matter. I don't think securelevel should care about details of SVS. If you want to introduce levels of SVS, KAUTH_MACHDEP_SVS_DISABLE can still be used to prevent lowering (instead of disabling SVS completely). Perhaps the name can be changed to KAUTH_MACHDEP_SVS_DEGRADE or something similar but it's not that important. Thinking a bit more about this, I don't think my patch will prevent data leakage from the kernel because /dev/mem and /dev/kmem are readable at all securelevels. It can only prevent leakage in some situations. For example, when root is compromised inside chroot and chroot directory is on a file system mounted with nodev. -- Alex
Re: secmodel_securelevel(9) and machdep.svs.enabled
Alexander Nasonov wrote: > When securelevel is set, should be lock 1->0 change for > machdep.svs.enabled (and possibly for other sysctls related > to recent security mitigations)? Can I commit the attached patch? (doc update will follow) -- Alex Index: src/sys/sys/kauth.h === RCS file: /cvsroot/src/sys/sys/kauth.h,v retrieving revision 1.75 diff -p -u -u -r1.75 kauth.h --- src/sys/sys/kauth.h 28 Aug 2017 00:46:07 - 1.75 +++ src/sys/sys/kauth.h 24 Apr 2018 17:59:13 - @@ -320,7 +320,8 @@ enum { KAUTH_MACHDEP_NVRAM, KAUTH_MACHDEP_UNMANAGEDMEM, KAUTH_MACHDEP_PXG, - KAUTH_MACHDEP_X86PMC + KAUTH_MACHDEP_X86PMC, + KAUTH_MACHDEP_SVS_DISABLE }; /* Index: src/sys/secmodel/suser/secmodel_suser.c === RCS file: /cvsroot/src/sys/secmodel/suser/secmodel_suser.c,v retrieving revision 1.43 diff -p -u -u -r1.43 secmodel_suser.c --- src/sys/secmodel/suser/secmodel_suser.c 14 Jun 2017 17:48:41 - 1.43 +++ src/sys/secmodel/suser/secmodel_suser.c 24 Apr 2018 17:59:13 - @@ -854,6 +854,7 @@ secmodel_suser_machdep_cb(kauth_cred_t c case KAUTH_MACHDEP_UNMANAGEDMEM: case KAUTH_MACHDEP_PXG: case KAUTH_MACHDEP_X86PMC: + case KAUTH_MACHDEP_SVS_DISABLE: if (isroot) result = KAUTH_RESULT_ALLOW; break; Index: src/sys/secmodel/securelevel/secmodel_securelevel.c === RCS file: /cvsroot/src/sys/secmodel/securelevel/secmodel_securelevel.c,v retrieving revision 1.30 diff -p -u -u -r1.30 secmodel_securelevel.c --- src/sys/secmodel/securelevel/secmodel_securelevel.c 25 Feb 2014 18:30:13 - 1.30 +++ src/sys/secmodel/securelevel/secmodel_securelevel.c 24 Apr 2018 17:59:13 - @@ -494,6 +494,11 @@ secmodel_securelevel_machdep_cb(kauth_cr result = KAUTH_RESULT_DENY; break; + case KAUTH_MACHDEP_SVS_DISABLE: + if (securelevel > 0) + result = KAUTH_RESULT_DENY; + break; + case KAUTH_MACHDEP_CPU_UCODE_APPLY: if (securelevel > 1) result = KAUTH_RESULT_DENY; Index: src/sys/arch/x86/x86/svs.c === RCS file: /cvsroot/src/sys/arch/x86/x86/svs.c,v retrieving revision 1.17 diff -p -u -u -r1.17 svs.c --- src/sys/arch/x86/x86/svs.c 30 Mar 2018 19:58:05 - 1.17 +++ src/sys/arch/x86/x86/svs.c 24 Apr 2018 17:59:11 - @@ -38,6 +38,7 @@ __KERNEL_RCSID(0, "$NetBSD: svs.c,v 1.17 #include #include #include +#include #include #include @@ -737,11 +738,13 @@ sysctl_machdep_svs_enabled(SYSCTLFN_ARGS error = 0; else error = EOPNOTSUPP; - } else { - if (svs_enabled) + } else if (svs_enabled) { + error = kauth_authorize_machdep(kauth_cred_get(), + KAUTH_MACHDEP_SVS_DISABLE, NULL, NULL, NULL, NULL); + if (!error) error = svs_disable(); - else - error = 0; + } else { + error = 0; } return error;
secmodel_securelevel(9) and machdep.svs.enabled
When securelevel is set, should be lock 1->0 change for machdep.svs.enabled (and possibly for other sysctls related to recent security mitigations)? -- Alex
Re: KASSERT in exec_elf.c for DYN executable when p_align==0
Christos Zoulas wrote: > In article <20180317225722.GA1538@neva>, > Alexander Nasonov <al...@yandex.ru> wrote: > >Coverity (CID 1427746) complains about a division by zero when > >align is 0 in all PT_LOAD headers. > >... > >I would be nice to perform sanity checks of tainted executable > >instead of panicing. > > Fixed, thanks. But it doesn't fix CID 1427746. Given that both 0 and 1 specify no alignment, the fix is simple: - for (align = i = 0; i < eh->e_phnum; i++) + align = 1; + for (i = 0; i < eh->e_phnum; i++) if (ph[i].p_type == PT_LOAD && ph[i].p_align > align) align = ph[i].p_align; Alex
Re: KASSERT in exec_elf.c for DYN executable when p_align==0
Alexander Nasonov wrote: > Steps to reproduce (on amd64 compiled with MKPIE=yes): > > bvi -s 0x0e2 /bin/echo # change 20 to 00 > bvi -s 0x11a /bin/echo # change 20 to 00 > > /bin/echo # boom! > > I would be nice to perform sanity checks of tainted executable > instead of panicing. Attached is a simple patch. I don't know (yet) if it works. Alex Index: exec_elf.c === RCS file: /cvsroot/src/sys/kern/exec_elf.c,v retrieving revision 1.94 diff -p -u -u -r1.94 exec_elf.c --- exec_elf.c 17 Mar 2018 00:30:50 - 1.94 +++ exec_elf.c 17 Mar 2018 23:10:43 - @@ -129,7 +129,8 @@ elf_placedynexec(struct exec_package *ep Elf_Addr align, offset; int i; - for (align = i = 0; i < eh->e_phnum; i++) + align = 1; + for (i = 0; i < eh->e_phnum; i++) if (ph[i].p_type == PT_LOAD && ph[i].p_align > align) align = ph[i].p_align; @@ -679,6 +680,12 @@ exec_elf_makecmds(struct lwp *l, struct for (i = 0; i < eh->e_phnum; i++) { pp = [i]; + if (pp->p_type == PT_LOAD && + (pp->p_align & (pp->p_align - 1)) != 0) { + DPRINTF("bad alignment %#jx", (uintmax_t)pp->p_align); + error = ENOEXEC; + goto bad; + } if (pp->p_type == PT_INTERP) { if (pp->p_filesz < 2 || pp->p_filesz > MAXPATHLEN) { DPRINTF("bad interpreter namelen %#jx",
KASSERT in exec_elf.c for DYN executable when p_align==0
Coverity (CID 1427746) complains about a division by zero when align is 0 in all PT_LOAD headers. I tried reproducing the problem but the code in question is inside 'if (offset < epp->ep_vm_minaddr)' and it isn't easily reproducable. However, I hit KASSERT panic: "(offset & (align - 1)) == 0" file sys/kern/exec_elf.c, line 139. Steps to reproduce (on amd64 compiled with MKPIE=yes): bvi -s 0x0e2 /bin/echo # change 20 to 00 bvi -s 0x11a /bin/echo # change 20 to 00 /bin/echo # boom! I would be nice to perform sanity checks of tainted executable instead of panicing. -- Alex
Re: 8.0 crash inside KVM
Michael van Elst wrote: > al...@yandex.ru (Alexander Nasonov) writes: > > >inside ProxMox kvm and rebooted it a couple of times. But on the > >last reboot, the kernel stuck at something and I sent a reboot from > >the console. A bit later it panicked. > > If "console" means: issuing the reboot from DDB, then this can be > expected. If it means: running multiple /sbin/reboot concurrently, > then it's a bug. >From ProxMox noVNC console. -- Alex
Re: 8.0 crash inside KVM
Alexander Nasonov wrote: > I don't know if a panic in a VM should be reported to gnats. I'm sending > the report to tech-kern instead. > > I successfully installed 8.0_BETA amd64 GENERIC, which I downloaded from > > https://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-8/201802240710Z/images/ > One more panic on shutdown: panic: kernel diagnostic assertion "pipe != NULL" failed: /usr/src/sys/dev/usb/usbdi.c", line 670 Fatal breakpoint trap in supervisor mode Trap type 1 code 0 rip 0x80224d95 cs 0x8 rflags 0x246 cr2 0 ilevel 0 rsp 0xfe80028a7d30 curlwp 0xfe8022707500 pi 0.24 lowest kstack 0xe80028a42c0 Stopped in pid 0.24 (system) at netbsd:breakpoint+0x5c: leave breakpoint() at netbsd:breakpoint+9x5c vpanic() at netbsd:vpanic+0x140 ch_voltag_convert_in() at netbsd:ch_voltag_convert_in usbd_abort_pipe() at netbsd:usbd_abort_pipe+0x6b usbd_kill_pipe() at netbsd:usbd_kill_pipe+0x11 usbd_new_device() at netbsd:usbd_new_device+0x3ec usb_doattach() at netbsd:usb_doattach+0xa8 config_interrupt_thread() at netbsd:config_interrupt_thread+0x30 Snapshots are available. I can give remote access to noVNC for debugging. -- Alex
8.0 crash inside KVM
I don't know if a panic in a VM should be reported to gnats. I'm sending the report to tech-kern instead. I successfully installed 8.0_BETA amd64 GENERIC, which I downloaded from https://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-8/201802240710Z/images/ inside ProxMox kvm and rebooted it a couple of times. But on the last reboot, the kernel stuck at something and I sent a reboot from the console. A bit later it panicked. Their noVNC console doesn't support copy/paste. The information below is incomplete and may be inaccurate. Mutex error: mutex_destroy,387: assertion failed: !MUTEX_OWNED(mtx->mtx_owner) & !MUTEX_HAS_WAITERS(mtx) lock address: 0xfe81dbfaa0a0 current cpu : 1 current lwp : 0xfe81bdff94a0 owner field : 0x81481c60 wait/spin: 0/0 ... trap type 1 code 0 rip 0x80224d95 cs 0x8 rflags 0x246 cr2 0 ilevel 0 rsp 0xfe810aeccc60 curlwp 0xfe81bdff94a0 pid 0.15 lowest kstack 0xfe810aec92c0 Stopped in pid 0.15 (system) at netbsd:breakpoint+0x5: leave breakpoint() at netbsd:breakpoint+0x5 vpanic() at netbsd:vpanic+x0140 snprintf() at netbsd:snprintf lockdebug_abort() at netbsd:lockdebug_abort+x06e mutex_destroy() at netbsd:mutex_destroy+0x3d disk_destroy() at netbsd:disk_destroy+0x18 wddetach() at netbsd:wddetach+0x10c config_detach() at netbsd:config_detach+0x10b config_detach_all() at netbsd:config_detach_all+0x97 cpu_reboot() at netbsd:cpu_reboot+0x173 sysmon_pswitch_event() at netbsd:sysmon_pswitch_event+0x14e sysmon_task_queue_thread() at netbsd:sysmon_task_queue_thread+x044 Some bits from dmesg: ahcisata0 at pci0 dev 7 function 0: vendor 8086 product 2922 (rev 0x02) ahcisata0: interrupting at ioapic0 pin 11 ahcisata0: 64-bit DMA ahcisata0: AHCI revision 1.0, 6 ports, 32 slots, CAP 0xc0141f05atabus2 at ahcisata0 channel 0 atabus3 at ahcisata0 channel 1 ... ahcisata0 port 0: device present, speed: 1.5Gb/s atapibus0 at atabus1: 2 targets cd0 at atapibus0 drive 0: cdrom removable cd0: 32-bit data port cd0: drive supports PIO mode 4, DMA mode 2 (using DMA) uhub0: device problem, disabling port 1 acpi0: power button pressed, shutting down! syncing disks ... done cd0: detached midi0: detached sysbeep0: detached atapibus0: detached pci2: detached pci1: detached wd0 at atabus2 drive 0 wd0: wd0: drive supports 16-sector PIO transfers, LBA48 addressing wd0: 128 GB, 266305 cyl, 16 head, 63 sec, 512 bytes/sect x 268435456 sectors wd0(ahcisata0:0:0) using PIO mode 4, DMA mode 2, Ultra DMA mode 5 (Ultra/100) (using DMA) atabus7: detached pad0: output 44100Hz, 16-bit, stereo audio0 at pad0: half duplex, playback, capture, mmap pad0: Virtual format configured - Format SLINEAR, precision 16, channels 2, frequency 44100 pad0: Latency: 139 milliseconds spkr1 at audio0: PC speaker (synthesized) atabus6: detached atabus5: detached atabus4: detached atabus3: detached atabus1: detached atabus0: detached WARNNG: 1 error while detecting hardware; check system log. ppb1: detached boot device: wd0 ppb0: detached root on wd0a dumps on wd0bpchb0: detached Skipping crash dump on recursive panic Alex
Re: Spectre
m...@netbsd.org wrote: > Considering JITs are a much bigger risk, and how cheap this suggestion > is, should we use lfence / similar for other architectures within sljit > (and possibly lua)? While everyone seems to be concerned on negative performance impact of Spectre, I recently worked on preventing speculative loads to avoid trashing caches and keeping tails of performance in good shape. I found that lfence was ineffective and I had to insert data dependency and some extra work to distract processor. It worked much better and it improved latencies in tails. If you look at Intel's documentation of lfence, they make it very clear that lfence doesn't prevent speculative loads. -- Alex
Re: Go binary panics on amd64-current - SIG*unknown
Christos Zoulas wrote: > Index: linux_sigaction.c > === > RCS file: /cvsroot/src/sys/compat/linux/common/linux_sigaction.c,v > retrieving revision 1.34 > diff -u -u -r1.34 linux_sigaction.c > --- linux_sigaction.c 17 Oct 2008 20:21:34 - 1.34 > +++ linux_sigaction.c 7 Jan 2018 00:10:43 - > @@ -84,6 +84,15 @@ > linux_old_to_native_sigaction(, ); > } > sig = SCARG(uap, signum); > + /* > + * XXX: Linux has 33 realtime signals, the go binary wants to > + * reset all of them; nothing else uses the last RT signal, so for > + * now ignore it. > + */ > + if (sig == LINUX__NSIG) { > + uprintf("%s: setting signal %d ignored\n", __func__, sig); > + sig--; /* back to 63 which is ignored */ > + } > if (sig < 0 || sig >= LINUX__NSIG) > return (EINVAL); > if (sig > 0 && !linux_to_native_signo[sig]) { it works but you patched the wrong file, this change should go into linux_signal.c. -- Alex
Re: Go binary panics on amd64-current - SIG*unknown
Ah, I thought I mentioned it in my email. Yes, it’s Linux binary.Sent from Yandex.Mail for mobile: http://m.ya.ru/ymail
Go binary panics on amd64-current - SIG*unknown
I downloaded IACA tool from intel.com but I couldn't run it: $ ktrace ./iaca-v3.0-lin64/iaca fatal error: rt_sigaction read failure runtime stack: runtime.throw(0x75de24, 0x19) /nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/panic.go:596 +0x95 runtime.getsig(0x40, 0x441c60) /nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/os_linux.go:427 +0x92 runtime.initsig(0xe59d00) /nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/signal_unix.go:79 +0x98 runtime.mstart1() /nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/proc.go:1175 +0xa4 runtime.mstart() /nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/proc.go:1149 +0x64 goroutine 1 [runnable]: runtime.main() /nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/proc.go:106 runtime.goexit() /nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/asm_amd64.s:2197 +0x1 goroutine 17 [syscall, locked to thread]: runtime.goexit() /nfs/iil/disks/kfw/tools/go/go-latest/src/runtime/asm_amd64.s:2197 +0x1 $ kdump 5477 5477 iaca CALL rt_sigaction(SIG*unknown 64*,0,0x7f7fe580,8) 5477 5477 iaca RET rt_sigaction -1 errno -22 Invalid argument 5477 5477 iaca CALL rt_sigaction(SIG*unknown 64*,0,0x7f7fe8b8,8) 5477 5477 iaca RET rt_sigaction -1 errno -22 Invalid argument $ uname -a NetBSD neva 8.99.7 NetBSD 8.99.7 (GENERIC_KASLR) #0: Sat Nov 18 09:54:53 GMT 2017 alnsn@neva:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC_KASLR amd64 I don't know if it matters but I have PAX protections enabled: $ sysctl -a |grep -w pax | grep enabled security.pax.mprotect.enabled = 1 security.pax.segvguard.enabled = 1 security.pax.aslr.enabled = 1 -- Alex
Re: Restricting rdtsc [was: kernel aslr]
Taylor R Campbell wrote: > > Date: Tue, 28 Mar 2017 16:58:58 +0200 > > From: Maxime Villard> > > > Having read several papers on the exploitation of cache latency to defeat > > aslr (kernel or not), it appears that disabling the rdtsc instruction is a > > good mitigation on x86. However, some applications can legitimately use it, > > so I would rather suggest restricting it to root instead. > > Put barriers in the way of legitimate applications to thwart > hypothetical attackers who will... step around them and use another > time source, of which there are many options in the system? This > sounds more like cutting off the nose to spite the face than a good > mitigation against real attacks. Old thread but the authors of the spectre paper did exactly what Taylor said: https://spectreattack.com/spectre.pdf "JavaScript does not provide access to the rdtscp instruction, and Chrome intentionally degrades the accuracy of its high-resolution timer to dissuade timing attacks using performance.now() [1]. However, the Web Workers feature of HTML5 makes it simple to create a separate thread that repeatedly decrements a value in a shared memory location [18, 32]. This approach yielded a high-resolution timer that provided sufficient resolution." -- Alex
Re: raid and cgd
Edgar Fu? wrote: > RAIDframe operates on units called stripes. FFS mostly operates on FS blocks. > If one FS block is not exactly a whole number of stripes (either because of > misalignment or because an FS block is smaller than a RAID stripe), then each > write of an FS block will force the RAID to do a RMW cycle. > Further down, RAIDframe will operate on the components in units of stripe > size > divided by (components minus one). If this is not aligned to whatever the > component can natively modify (think 512e discs), then the component will RMW. > Doing RMW will dramatically (at least by a factor of three, more likely ten) > reduce performance. Your description matches a paragraph in raidctl(8) that starts with 'Tuning RAID 5 sets is trickier.' while my setup is RAID 1 and recommended values for SectPerSU are 32 to 128. I set it to 128 but I'll try 64 to see if makes a difference. -- Alex
Re: raid and cgd
Edgar Fu? wrote: > > when raid is configured for the first time, it needs to do something > > with every block? > raidctl -i, yes. That's what I thought. Thanks for confirming. > Other than that, you'll need to align your FS blocks with RAID stripes > and each component's stripe part with whatever cgd operates on. Can you please elaborate? Do they have to match for the best performance, or should one be a multiple of the other? > Why do you do RAID-on-CGD and not CGD-on-RAID? Do I? First, I configured raid0 and then cgd0 as following: cgdconfig -s cgd0 /dev/raid0d aes-cbc 256 < /dev/urandom This was a temporary cgd0 configuration to scrub data. -- Alex
Re: raid and cgd
Mouse wrote: > > My raid is made of two disks dk8 and dk9. When there is an activity > > on cgd, dk8 and dk9 shows more than 2x bytes each than cgd. Why is > > that? > > Speculation: read-modify-write cycles? > > Speculation: cgd's encryption leading to accessing more of its > underlying disk than the upper-layer access? One more: when raid is configured for the first time, it needs to do something with every block? I'm a newbie and so far I spent more time learning about raid commands on linux (disgusting!). Time to read raid(4) and raidctl(8) properly. -- Alex
Re: raid and cgd
Alexander Nasonov wrote: > My raid is made of two disks dk8 and dk9. When there is an activity on cgd, > dk8 and dk9 shows more than 2x bytes each than cgd. Why is that? > >wd01525 71M 66.8 msix2 vec 0 2730 > ftarg >wd11524 71M 59.3 msix2 vec 2 75 > wired >dk91524 71M 59.8 >dk81524 71M 67.3 > raid0 761 24M 86.9 > cgd0 761 24M 99.5 It's still scrubbing raid0 (dd if=/dev/zero of=/dev/rcgd0d bs=32k) but stats now look different and there is no 2-3x difference in transfrer rates anymore. cgd0 is configured with aes-cbc 256. Disks: seeks xfers bytes %busy msix1 vec 4 2048 fmin wd03586 112M 30.7 msix2 vec 0 2730 ftarg dk3 msix2 vec 1 itarg wd13586 112M 29.6 msix2 vec 2 75 wired dk4 msix2 vec 3 pdfre dk5 msix2 vec 4 pdscn dk6 dk7 dk93568 111M 31.5 dk10 dk0 dk1 dk2 dk83568 111M 32.0 dk11 raid03568 111M 39.0 cgd03550 111M 94.5 -- Alex
raid and cgd
My raid is made of two disks dk8 and dk9. When there is an activity on cgd, dk8 and dk9 shows more than 2x bytes each than cgd. Why is that? $ systat 2 :vmstat Disks: seeks xfers bytes %busy msix1 vec 4 2048 fmin wd01525 71M 66.8 msix2 vec 0 2730 ftarg dk3 1203 0.3 msix2 vec 1 itarg wd11524 71M 59.3 msix2 vec 2 75 wired dk4 msix2 vec 3 pdfre dk5 msix2 vec 4 pdscn dk6 dk7 dk91524 71M 59.8 dk10 dk0 dk1 dk2 dk81524 71M 67.3 dk11 raid0 761 24M 86.9 cgd0 761 24M 99.5 -- Alex
Re: Restricting rdtsc [was: kernel aslr]
Maxime Villard wrote: > Le 29/03/2017 ? 00:49, Alexander Nasonov a ?crit : > > I think this should be either all-or-nothing. You either have rdtsc as > > a time source or you don't. Similar for rdpmc (and other performance > > counters). > > Well, the idea was to make the availability more fine-grained. > > > Seeing the general skepticism that prevails, I guess we can just forget about > this idea. There are two more or less independent things: fine-grained time source and userspace rdtsc. The latter is often used directly when vdso isn't available. If we implement vdso, I assume that software that needs rdtsc can be taught to call it via vdso. With vdso implemented, we can have a flag that enables/disables vdso globally as well as per process (paxctl?). Independetly, the kernel can be configured to use either fine-grained or hackerproof time source for regular (non-vdso) system calls. Alex
Re: Restricting rdtsc [was: kernel aslr]
Maxime Villard wrote: > Having read several papers on the exploitation of cache latency to defeat > aslr (kernel or not), it appears that disabling the rdtsc instruction is a > good mitigation on x86. However, some applications can legitimately use it, > so I would rather suggest restricting it to root instead. Why does root need it? For ntp? Properly implemented ntp should be privsep'ed. I think this should be either all-or-nothing. You either have rdtsc as a time source or you don't. Similar for rdpmc (and other performance counters). -- Alex
Re: A Fast Alternative to Modulo Reduction
Mouse wrote: > Abhinav Upadhyay wrote: > >> Both operations are not _equivalent_ but he proves that the latter > >> is also a fair mapping of x in the range [0, N) for 32 bit integers. > > Only under certain assumptions about x - loosely put, that the entropy > in x is distributed equally across all of x's bits. But, for example, > if you use the common string hash function that works like h=0 then > loop{h=(h*37)+*cp++}, short strings will have all 0s in the high bits. I can confirm that for several common hash functions (murmur, xxh, jenkins), a distribution of bits isn't good enough to generate an acyclic graph for chm (cdb uses the chm algorithm) for many inputs while fast_divide32 reduction works fine most of the time. Code is here: https://github.com:/alnsn/rgph Alex
Re: EFI native support on amd64
Kimihiro Nonaka wrote: > I updated the patches. It works great. Some observations: - When the kernel starts booting, it prints lines painfully slow on my Asus laptop. When the kernel initialises genfb(4), printing suddenly speeds up. If I connect to Sony TV via HDMI, it somewhat slow but acceptable. - I can't figure out how to change screen size in boot.cfg. Adding text=2 to boot.cfg doesn't work. - Likewise, I don't know how to change screen size in genfb(4). I will google more on the topic. - Does the bootloader suport msdos fs? Currently, I create an ffs partition solely for hosting boot.cfg file. It would be nice to read boot.cfg file from the boot EFI partition. Alex
Re: EFI native support on amd64
Kimihiro Nonaka wrote: > > I'm working on efiboot. > > http://cdn.netbsd.org/pub/NetBSD/misc/nonaka/efiboot/ Hi Kimihiro, That's cool! I can boot from your bootloader. The compilation fails, though: /home/alnsn/netbsd-current/tooldir.amd64/bin/x86_64--netbsd-ld: cannot find startprog32.o: No such file or directory I'm building on amd64 -current. -- Alex
EFI native support on amd64
Hi, I see some pieces of EFI in the base (gnu-efi, grub support) but no complete native support. If I remember correctly, someone worked on it but I don't have any link to their work. Are there any patches I can hack on further? PS I was able to link a hello world amd64 executable. It printed gibberish, though, presumably because it expected CHAR16. Otherwise, coding was quite easy given my lack of knowledge in this area. Alex
Re: small changes in aesxcbcmac.c
Alexander Nasonov wrote: > The first change shrinks aes_xcbc_mac_init by 183 bytes on amd64 > (from 562 to 379 bytes). > The second change avoids a comparison with an address that may > point beyond the end of a buffer. > The third change is stylistic. > Alex If there are no objections I'll commit the code. PS I noticed some excessive memory copying (often of fixed-size blocks). Some of them may be needed to prevent side channel attacks by measuring execution time of cache misses. Data of the stack is more likely to be in cache but it's not bulletproof. If we rely on this at all, buffers on the stack should have __cacheline_aligned attribute but I don't see any in the code. > aes_xcbc_mac_result(u_int8_t *addr, void *vctx) > { > - u_char digest[AES_BLOCKSIZE]; > + u_int8_t digest[AES_BLOCKSIZE]; > aesxcbc_ctx *ctx; > int i; This buffer isn't actually needed. The destination addr can be passed directly to rijndaelEncrypt() calls inside the function. I didn't change it because it is the only array in the function and removing it would disable ssp. Alex
Re: small changes in aesxcbcmac.c
Rhialto wrote: > On Sun 25 Sep 2016 at 22:01:16 +0100, Alexander Nasonov wrote: > > - while (addr + AES_BLOCKSIZE < ep) { > > + while (ep - addr > AES_BLOCKSIZE) { > > I think that if ep points beyond tbe buffer (apart from the > just-past-the-end location), the subtraction is just as undefined > behaviour as before... If understand the code correctly, ep points to the first byte past the end which is well defined. Alex signature.asc Description: PGP signature
Re: small changes in aesxcbcmac.c
Eric Haszlakiewicz wrote: > On September 25, 2016 5:01:16 PM EDT, Alexander Nasonov <al...@yandex.ru> > wrote: > >The first change shrinks aes_xcbc_mac_init by 183 bytes on amd64 > >(from 562 to 379 bytes). > > Do you mean it shrinks its stack usage? Or does that change to static const > vars somehow shrink the function itself? gcc copies each byte to the stack: 80532a10: c6 45 98 01 movb $0x1,-0x68(%rbp) 80532a14: c6 45 99 01 movb $0x1,-0x67(%rbp) 80532a18: c6 45 9a 01 movb $0x1,-0x66(%rbp) 80532a1c: c6 45 9b 01 movb $0x1,-0x65(%rbp) 80532a20: c6 45 9c 01 movb $0x1,-0x64(%rbp) 80532a24: c6 45 9d 01 movb $0x1,-0x63(%rbp) 80532a28: c6 45 9e 01 movb $0x1,-0x62(%rbp) 80532a2c: c6 45 9f 01 movb $0x1,-0x61(%rbp) 80532a30: c6 45 a0 01 movb $0x1,-0x60(%rbp) 80532a34: c6 45 a1 01 movb $0x1,-0x5f(%rbp) 80532a38: c6 45 a2 01 movb $0x1,-0x5e(%rbp) 80532a3c: c6 45 a3 01 movb $0x1,-0x5d(%rbp) 80532a40: c6 45 a4 01 movb $0x1,-0x5c(%rbp) 80532a44: c6 45 a5 01 movb $0x1,-0x5b(%rbp) 80532a48: c6 45 a6 01 movb $0x1,-0x5a(%rbp) 80532a4c: c6 45 a7 01 movb $0x1,-0x59(%rbp) 80532a50: c6 45 a8 02 movb $0x2,-0x58(%rbp) 80532a54: c6 45 a9 02 movb $0x2,-0x57(%rbp) 80532a58: c6 45 aa 02 movb $0x2,-0x56(%rbp) and so on. Alex
small changes in aesxcbcmac.c
The first change shrinks aes_xcbc_mac_init by 183 bytes on amd64 (from 562 to 379 bytes). The second change avoids a comparison with an address that may point beyond the end of a buffer. The third change is stylistic. Alex --- sys/opencrypto/aesxcbcmac.c.orig2016-09-25 21:44:25.344941650 +0100 +++ sys/opencrypto/aesxcbcmac.c 2016-09-25 13:21:43.364224984 +0100 @@ -41,9 +41,12 @@ int aes_xcbc_mac_init(void *vctx, const u_int8_t *key, u_int16_t keylen) { - u_int8_t k1seed[AES_BLOCKSIZE] = { 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }; - u_int8_t k2seed[AES_BLOCKSIZE] = { 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 }; - u_int8_t k3seed[AES_BLOCKSIZE] = { 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3 }; + static const u_int8_t k1seed[AES_BLOCKSIZE] = + { 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }; + static const u_int8_t k2seed[AES_BLOCKSIZE] = + { 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2 }; + static const u_int8_t k3seed[AES_BLOCKSIZE] = + { 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3 }; u_int32_t r_ks[(RIJNDAEL_MAXNR+1)*4]; aesxcbc_ctx *ctx; u_int8_t k1[AES_BLOCKSIZE]; @@ -98,7 +101,7 @@ ctx->buflen = 0; } /* due to the special processing for M[n], "=" case is not included */ - while (addr + AES_BLOCKSIZE < ep) { + while (ep - addr > AES_BLOCKSIZE) { memcpy(buf, addr, AES_BLOCKSIZE); for (i = 0; i < sizeof(buf); i++) buf[i] ^= ctx->e[i]; @@ -115,7 +118,7 @@ void aes_xcbc_mac_result(u_int8_t *addr, void *vctx) { - u_char digest[AES_BLOCKSIZE]; + u_int8_t digest[AES_BLOCKSIZE]; aesxcbc_ctx *ctx; int i;
Re: cgdstrategy: divide fault in supervisor mode
Michael van Elst wrote: > Right. This needs to be written differently. Instead of GETCGD_SOFTC() > use: > > cs = getcgd_softc(bp->b_dev); > if (!cs) { > bp->b_error = ENXIO; > biodone(bp); > return; > } I enabled DEBUG in the config and changed cgdstrategy. Same crash: Stopped in pid 10.1 (mount_ffs) at netbsd:cgdstrategy+0x2d:divl 4 0(%r12),%eax 808edcd8 : 808edcd8: 55 push %rbp 808edcd9: 48 89 e5mov%rsp,%rbp 808edcdc: 53 push %rbx 808edcdd: 48 83 ec 08 sub$0x8,%rsp 808edce1: 48 89 fbmov%rdi,%rbx 808edce4: f6 05 d5 d0 8e 00 01testb $0x1,0x8ed0d5(%rip) # 811dadc0 808edceb: 75 52 jne808edd3f808edced: 48 8b 7b 38 mov0x38(%rbx),%rdi 808edcf1: e8 e5 fd ff ff callq 808edadb 808edcf6: 48 89 c7mov%rax,%rdi 808edcf9: 48 85 c0test %rax,%rax 808edcfc: 74 58 je 808edd56 808edcfe: 48 83 7b 48 00 cmpq $0x0,0x48(%rbx) 808edd03: 8b 4b 34mov0x34(%rbx),%ecx 808edd06: 78 11 js 808edd19 808edd08: 89 c8 mov%ecx,%eax 808edd0a: 31 d2 xor%edx,%edx 808edd0c: f7 77 40divl 0x40(%rdi) 808edd0f: 85 d2 test %edx,%edx 808edd11: 75 06 jne808edd19 808edd13: f6 43 40 03 testb $0x3,0x40(%rbx) 808edd17: 74 18 je 808edd31 808edd19: c7 43 20 16 00 00 00movl $0x16,0x20(%rbx) 808edd20: 89 4b 24mov%ecx,0x24(%rbx) 808edd23: 48 89 dfmov%rbx,%rdi 808edd26: 48 83 c4 08 add$0x8,%rsp 808edd2a: 5b pop%rbx 808edd2b: 5d pop%rbp 808edd2c: e9 f0 c3 fc ff jmpq 808ba121 808edd31: 48 89 demov%rbx,%rsi 808edd34: 48 83 c4 08 add$0x8,%rsp 808edd38: 5b pop%rbx 808edd39: 5d pop%rbp 808edd3a: e9 a1 2e 00 00 jmpq 808f0be0 808edd3f: 48 63 57 34 movslq 0x34(%rdi),%rdx 808edd43: 48 89 femov%rdi,%rsi 808edd46: 48 c7 c7 18 15 f9 80mov$0x80f91518,%rdi 808edd4d: 31 c0 xor%eax,%eax 808edd4f: e8 4f d8 f8 ff callq 8087b5a3 808edd54: eb 97 jmp808edced 808edd56: c7 43 20 06 00 00 00movl $0x6,0x20(%rbx) 808edd5d: eb c4 jmp808edd23 808eeb2e: 48 c7 c7 d8 dc 8e 80mov$0x808edcd8,%rdi 808eeb35: 5b pop%rbx 808eeb36: 41 5c pop%r12 808eeb38: 5d pop%rbp 808eeb39: e9 4f db f4 ff jmpq 8083c68d 808eeb9d: 48 c7 c7 d8 dc 8e 80mov$0x808edcd8,%rdi 808eeba4: 5b pop%rbx 808eeba5: 41 5c pop%r12 808eeba7: 5d pop%rbp 808eeba8: e9 e0 da f4 ff jmpq 8083c68d Alex
Re: cgdstrategy: divide fault in supervisor mode
Michael van Elst wrote: > Right. This needs to be written differently. Instead of GETCGD_SOFTC() > use: > > cs = getcgd_softc(bp->b_dev); > if (!cs) { > bp->b_error = ENXIO; > biodone(bp); > return; > } I tried something similar but with bp->b_resid = bp->b_bcount; instead of biodone(bp); It still crashes. I'll try your code. Alex
Re: cgdstrategy: divide fault in supervisor mode
Michael van Elst wrote: > Ah, maybe then: > > --- cgd.c 5 Aug 2016 08:24:46 - 1.110 > +++ cgd.c 13 Sep 2016 21:43:27 - > @@ -305,13 +305,17 @@ > static void > cgdstrategy(struct buf *bp) > { > - struct cgd_softc *cs = getcgd_softc(bp->b_dev); > - struct dk_softc *dksc = >sc_dksc; > - struct disk_geom *dg = >sc_dkdev.dk_geom; > + struct cgd_softc *cs; > + struct dk_softc *dksc; > + struct disk_geom *dg; > > DPRINTF_FOLLOW(("cgdstrategy(%p): b_bcount = %ld\n", bp, > (long)bp->b_bcount)); > > + GETCGD_SOFTC(cs, bp->b_dev); > + dksc = >sc_dksc; > + dg = >sc_dkdev.dk_geom; > + It will not compile because cgdstrategy() returns void. Alex
Re: cgdstrategy: divide fault in supervisor mode
Alexander Nasonov wrote: > I can examine data at 0x34 offset and it's indeed 0. Correction: $rdi+0x40 is the correct location. I also inspected low half of dev_t ($rbx+0x38) and its value was 0x1423 which corresponds to: brw-r- 1 root operator 20, 35 Dec 13 2015 /dev/cgd2d Alex
Re: cgdstrategy: divide fault in supervisor mode
Michael van Elst wrote: > That would require dg_secsize to be 0 which is difficult to do > because the drivers initialize the value and the common disk_set_info() > function fixes up a zero value. I can reproduce division by zero but not when rebooting. If I take an unconfigured cgd device, i.e. cgd2 and run mount /dev/cgd2d /tmp the kernel will panic instead of returning ENXIO. > But maybe the dg pointer is bad? Please have a look at the %rdi > register. I don't know what was rdi's value when it crashed during reboot but crashes when mounting /dev/cgd2d all have good kernel-space values. I can examine data at 0x34 offset and it's indeed 0. $ crash -M netbsd.12.core Crash version 7.99.36, image version 7.99.36. System panicked: dump forced via kernel debugger Backtrace from time of crash is available. crash> dmesg|tail iwm0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36M bps 48Mbps 54Mbps acpibat0: normal capacity on 'charge state' fatal integer divide fault in supervisor mode trap type 8 code 0 rip 808db36f cs 8 rflags 10246 cr2 4d8000 ilevel 0 rs p fe8116cfba50 curlwp 0xfe836fcbab00 pid 13.1 lowest kstack 0xfe8116cf82c0 dumping to dev 20,17 (offset=212951, size=3119109): dump crash> bt _KERNEL_OPT_NARCNET() at 0 _KERNEL_OPT_NARCNET() at 0 db_reboot_cmd() at db_reboot_cmd db_command() at db_command+0xeb db_command_loop() at db_command_loop+0x90 db_trap() at db_trap+0xe3 kdb_trap() at kdb_trap+0xe1 trap() at trap+0x574 --- trap (number 8) --- cgdstrategy() at cgdstrategy+0x26 bdev_strategy() at bdev_strategy+0x68 spec_strategy() at spec_strategy+0x81 VOP_STRATEGY() at VOP_STRATEGY+0x33 bio_doread() at bio_doread+0x98 bread() at bread+0x1a ffs_mountfs() at ffs_mountfs+0x170 ffs_mount() at ffs_mount+0x227 VFS_MOUNT() at VFS_MOUNT+0x34 mount_domount() at mount_domount+0x122 do_sys_mount() at do_sys_mount+0x20f sys___mount50() at sys___mount50+0x33 syscall() at syscall+0x15b --- syscall (number 410) --- 75c7da: crash> ps PIDLID S CPU FLAGS STRUCT LWP * NAME WAIT 13 > 1 7 1 0 fe836fcbab00 mount_ffs 12 1 2 1 802 fe811681d2a0 mount 81 2 1 802 fe811681d6c0ksh 21 2 1 802 fe811681dae0ksh 11 2 1 802 fe81163f5680 init ... Alex
cgdstrategy: divide fault in supervisor mode
Someone warned me that adding cgd to dump devices can have bad consequences. I think I caught one possible bad case yesterday. I was lucky enough to still have my data. My setup is quite complicated. I have a small root on wd0a which does only one thing: to mount a real root on cgd0a and chroot to it. The rest of the system is on cgd1. I was in a single-user mode, inside /altroot (iirc), all fs mounted but I wanted to remount them in read-only mode. Some of them couldn't be unmounted and I forced umounts with the -f flag. Then I mounted them with read-only flag. I don't remember exact commands but I have nested mount points, e.g. /var/log inside /var and I was definitely trying to remount both inner and outer fs. All mount/umount worked but when I ran reboot, the system trapped here: fatal integer divide fault in supervisor mode trap type 8 code 0 rip 808db36f cs 8 rflags 10246 cr2 efd... curlwp 0xfe81163b4a40 pid 276.1 lowest kstack 0xfe8117343... kernel: integer divide fault trap, code=0 Stopped in pid 276.1 (reboot) atnetbsd:cgdstrategy+0x26: 4 0(%rdi),%eax This it what I run: NetBSD neva 7.99.36 NetBSD 7.99.36 (GENERIC) #0: Fri Sep 2 22:04:02 BST 2016 alnsn@nebeda:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC amd64 Sources checked out on Sep 2. Looking at the assembly, it appears that the fault happened at the second line of this branch: if (bp->b_blkno < 0 || (bp->b_bcount % dg->dg_secsize) != 0 || (offset of b_blkno is 0x48, b_bcount's offset is 0x34). 808db349 : 808db349: 55 push %rbp 808db34a: 48 89 e5mov%rsp,%rbp 808db34d: 53 push %rbx 808db34e: 48 83 ec 08 sub$0x8,%rsp 808db352: 48 89 fbmov%rdi,%rbx 808db355: 48 8b 7f 38 mov0x38(%rdi),%rdi 808db359: e8 4d fe ff ff callq 808db1ab 808db35e: 48 83 7b 48 00 cmpq $0x0,0x48(%rbx) 808db363: 78 3d js 808db3a2 808db365: 48 89 c7mov%rax,%rdi 808db368: 8b 4b 34mov0x34(%rbx),%ecx 808db36b: 89 c8 mov%ecx,%eax 808db36d: 31 d2 xor%edx,%edx 808db36f: f7 77 40divl 0x40(%rdi) Alex
Re: current amd64 crashes in usb_transfer_complete
Alexander Nasonov wrote: > The crash happened only once. I'm not sure that running with these > will prove anything. I'll try to take steps to reproduce the crash > again before running with these patches. Got a different crash in usb_transfer_complete. I was playing with RPI2 powered from USB on my computer and connected my computer over uplcom(4). It likely happened because I unplugged uplcom(4) while minicom was still hanging up. It didn't happen immediately, though. I was able to start X and firefox before the kernel crashed. I also replugged power USB a couple of times but it's a very dumb, I'm not sure if plugging it triggers any code in the kernel. I don't know if this crash is related to the first crash. The first crash happened while I was trying to secure yubikey into the slot. Here's dmesg of the second crash: panic: lock error: Mutex: mutex_vector_enter: locking against myself: lock 0xfe81163ee370 cpu 3 lwp 0xfe8117053b20 cpu3: Begin traceback... vpanic() at netbsd:vpanic+0x140 snprintf() at netbsd:snprintf lockdebug_abort() at netbsd:lockdebug_abort+0x63 mutex_vector_enter() at netbsd:mutex_vector_enter+0x369 ucomwritecb() at netbsd:ucomwritecb+0x22 usb_transfer_complete() at netbsd:usb_transfer_complete+0x149 xhci_abort_xfer() at netbsd:xhci_abort_xfer+0x15d usbd_ar_pipe() at netbsd:usbd_ar_pipe+0x3b usbd_abort_pipe() at netbsd:usbd_abort_pipe+0x27 ucomclose() at netbsd:ucomclose+0x12c spec_close() at netbsd:spec_close+0x11a VOP_CLOSE() at netbsd:VOP_CLOSE+0x33 vn_close() at netbsd:vn_close+0x36 closef() at netbsd:closef+0x54 fd_close() at netbsd:fd_close+0x1ac sys_close() at netbsd:sys_close+0x20 syscall() at netbsd:syscall+0x15b crash> show lock 0xfe81163ee370 doesn't show anything because the kernel isn't compiled with LOCKDEBUG. Alex
Re: current amd64 crashes in usb_transfer_complete
Takahiro Hayashi wrote: > Could you try both of > > http://www.netbsd.org/~skrll/usb.softint.diff > https://mail-index.netbsd.org/tech-kern/2016/08/09/msg020963.html > > and check if you can still see the problem? The crash happened only once. I'm not sure that running with these will prove anything. I'll try to take steps to reproduce the crash again before running with these patches. Alex
Re: current amd64 crashes in usb_transfer_complete
Alexander Nasonov wrote: > Got KASSERT when a yubikey device was plugged in. > > $ crash -M netbsd.9.core > Crash version 7.99.35, image version 7.99.35. > System panicked: kernel diagnostic assertion "xfer->ux_state == XFER_ONQU" > failed: file "/home/alnsn/netbsd-current/clean/src/sys/dev/usb/usbdi.c", line > 910 > Backtrace from time of crash is available. > crash> bt > _KERNEL_OPT_NARCNET() at 0 > _KERNEL_OPT_NARCNET() at 0 > vpanic() at vpanic+0x149 > cd_play_msf() at cd_play_msf > usb_transfer_complete() at usb_transfer_complete+0x24e > xhci_abort_xfer() at xhci_abort_xfer+0x15d > usbd_ar_pipe() at usbd_ar_pipe+0x3b > usbd_abort_pipe() at usbd_abort_pipe+0x27 > uhidev_detach() at uhidev_detach+0x32 > config_detach() at config_detach+0xf8 > usb_disconnect_port() at usb_disconnect_port+0xae > uhub_explore() at uhub_explore+0x1fe > usb_discover.isra.2() at usb_discover.isra.2+0x4e > usb_event_thread() at usb_event_thread+0x7c > > > Kernel built from src updated about a week ago: > > $ uname -a > NetBSD neva 7.99.35 NetBSD 7.99.35 (GENERIC) #0: Wed Aug 3 23:02:01 BST 2016 > > alnsn@nebeda.localdomain:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC > amd64 > > $ gunzip -c /netbsd |ident |grep usbdi.c > $NetBSD: usbdi.c,v 1.171 2016/05/17 11:37:50 pooka Exp $ $ dmesg -M /var/crash/netbsd.9.core 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 usbd_ar_pipe() at WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 125 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 netbsd:usbd_ar_pipe+0x3b WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 132 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWERED ON SYSCALL 107 4 EXIT af227c08 6 WARNING: SPL NOT LOWERED ON SYSCALL 1 6 EXIT 0 7 WARNING: SPL NOT LOWER
current amd64 crashes in usb_transfer_complete
Got KASSERT when a yubikey device was plugged in. $ crash -M netbsd.9.core Crash version 7.99.35, image version 7.99.35. System panicked: kernel diagnostic assertion "xfer->ux_state == XFER_ONQU" failed: file "/home/alnsn/netbsd-current/clean/src/sys/dev/usb/usbdi.c", line 910 Backtrace from time of crash is available. crash> bt _KERNEL_OPT_NARCNET() at 0 _KERNEL_OPT_NARCNET() at 0 vpanic() at vpanic+0x149 cd_play_msf() at cd_play_msf usb_transfer_complete() at usb_transfer_complete+0x24e xhci_abort_xfer() at xhci_abort_xfer+0x15d usbd_ar_pipe() at usbd_ar_pipe+0x3b usbd_abort_pipe() at usbd_abort_pipe+0x27 uhidev_detach() at uhidev_detach+0x32 config_detach() at config_detach+0xf8 usb_disconnect_port() at usb_disconnect_port+0xae uhub_explore() at uhub_explore+0x1fe usb_discover.isra.2() at usb_discover.isra.2+0x4e usb_event_thread() at usb_event_thread+0x7c Kernel built from src updated about a week ago: $ uname -a NetBSD neva 7.99.35 NetBSD 7.99.35 (GENERIC) #0: Wed Aug 3 23:02:01 BST 2016 alnsn@nebeda.localdomain:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC amd64 $ gunzip -c /netbsd |ident |grep usbdi.c $NetBSD: usbdi.c,v 1.171 2016/05/17 11:37:50 pooka Exp $ Alex
Re: UVM and the NULL page
Maxime Villard wrote: > ... > You know - as well as I do - that NULL pointer dereferences are quite common, > and that it is the main way to execute malicious code in kernel mode. Allowing > NULL is a huge problem on architectures like amd64. The way you are talking > about compatibility sounds like you are ready to sacrifice the security of > almost every NetBSD user just to allow a few programs that make use of a > mapping > model which has been known to be flawed for years. This is ridiculous. I don't think that NULL pointer dereference is "is the main way to execute malicious code in kernel mode". There are so many other ways. If you're so scared about being hacked, stop writing in C. -- Alex
Re: dump to cgdNb device
Alexander Nasonov wrote: > Taylor R Campbell wrote: > > Maybe savecore should have an option to specify what the dump device > > is supposed to be. > > I think it has but I didn't try it. Ah, I misread your sentence. savecore understands -N but I didn't try it. -- Alex
Re: dump to cgdNb device
Taylor R Campbell wrote: > Can you ktrace it to see what devices it's actually looking at? > > Every time I try to scrutinize savecore it makes me dizzy. I think if > you selected cgd1b as the dump device with `swapctl -D', rather than > using the default of b, and/or if the running kernel image > doesn't match the dumped kernel image, savecore may get confused. Indeed. When I renamed my kernel to /netbsd, savecore worked. > Maybe savecore should have an option to specify what the dump device > is supposed to be. I think it has but I didn't try it. Alex
Re: dump to cgdNb device
Taylor R Campbell wrote: > So maybe it did get written, but there's just something funny about > it. If you pass -v to savecore, can you find what aspect of the dump > savecore doesn't like? I don't see any difference between -vf and -f. savecore: magic number mismatch (0x0 != 0x8fca0101) savecore: no core dump savecore: warning: /dev/ksyms version mismatch: NetBSD 7.99.30 (GENERIC) #0: Sun May 29 20:24:03 BST 2016 alnsn@nebeda.localdomain:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC and atad savecore: msgbuf magic incorrect savecore: reboot after panic: <<< 1K+ of non-printable nonsense >>> savecore: dump time is zero savecore: writing core to /var/crash/netbsd.5.core savecore: writing kernel to /var/crash/netbsd.5 savecore: (null): Bad address dumplo = 110530048 (215879 * 512) Alex
Re: dump to cgdNb device
Taylor R Campbell wrote: > Just to double-check (since I am likely to make this mistake myself!), > you're not using a random-keyed cgd, right? Random-keyed cgd is great > for swap while the key is in memory but not so great for dump after > rebooting! I have both types but the one I was testing was definitely with a persistent key. > Does strings(1) on /dev/rcgd1b show anything vaguely meaningful? Yes, I see this string: NetBSD 7.99.30 (CGDDEBUG) #0: Thu Jun 16 19:40:27 BST 2016 alnsn@neva:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/CGDDEBUG This is the kernel I was running. The disk was initially filled with random bits and CGDDEBUG kernel didn't exist back then. > Hmm... I might have caused your dump to scribble all over another > part of the disk, by not adjusting the blkno before passing it along > to bdev_dump. I'm not sure about this -- but the blkno adjustments > certainly need to be reviewed. I don't see any problem after reboot. I cat'ed all files in neighbour partitions (496M and 3G) to /dev/null and I don't see any error. There aren't full, though. One if about 50% full, the other is 23% full. Alex
Re: dump to cgdNb device
Taylor R Campbell wrote: > Slightly simpler version attached -- device_private is not > device_lookup; device_private never fails. (If the device_lookup in > cgddump failed, we wouldn't have gotten to cgd_dumpblocks anyway.) It dumped succesfully but savecore doesn't see any valid core. If I run savecore -f, it prints gibberish. Alex
Re: dump to cgdNb device
Michael van Elst wrote: > al...@yandex.ru (Alexander Nasonov) writes: > > >There is a risk even with hardware devices but it's smaller because less > >software is involved. Dumping to cgd is a quite important usecase and > >perhaps we should make an exception. Would it help to RO protect some > >data structures like private keys? > > You would need to protect all data that is required to dump a block, > the keys aren't more important than e.g. the disklabel or the > bus space handle of the disk controller. True, but you have to protect disklabel even for "hardware" devices. My point was about protecting code specific to a "software" device to make it looks more like "hardware" device. Alex
dump to cgdNb device
Hi, I setup an encrypted disk cgd1 (aes-cbc 256 on top of wd0g, disklabel verification) with a dump device cgd1b but I can't dump to it (I enter ddb and type sync to dump). It prints "device bad". If I enable CGDDEBUG and set cgddebug to 1, it prints some additional information (typing from a photo): dumping to dev 20,17 (offset=215879, size=3119109):getcgd_softc(0x1411): unit = 1 cgdsize(0x1411) cgdopen(0x1411, 0) getcgd_softc(0x1411): unit = 1 cgdclose(0x1411, 0) getcgd_softc(0x1411): unit = 1 dump cgddump(0x1411, 215879, 0x8000d6c3e000, 4096) getcgd_softc(0x1411, unit = 1 device bad With more verbose debugging it can hang instead of printing "device bad". For example, if I set cgddebug to 4 and mount cgd1e dumping to dev 20,17 (offset=215879, size=3119109):getcgd_softc(0x1411): unit = 1 cgdsize(0x1411) cgdopen(0x1411, 0) getcgd_softc(0x1411): unit = 1 getcgd_softc(0x1413): unit = 1 cgdstrategy(0xfe836f63b5a8): b_bcount = 512 cgd_diskstart(x0fe836f85cd08, 0xfe836f63b5a8) Is it expected to work at all? I'm running: NetBSD neva 7.99.30 NetBSD 7.99.30 (GENERIC) #0: Sun May 29 20:24:03 BST 2016 alnsn@nebeda.localdomain:/home/alnsn/netbsd-current/clean/src/sys/arch/amd64/compile/obj/GENERIC amd64 wd0 disklabel (cgd1 is on wd0g): #sizeoffset fstype [fsize bsize cpg/sgs] a: 1049328 134219776 4.2BSD 1024 8192 0 # (Cyl. 133154*- 134195*) b: 25168752 135269104 swap # (Cyl. 134195*- 159164*) c: 365898416 134219776 unused 0 0# (Cyl. 133154*- 496148) d: 500118192 0 unused 0 0# (Cyl. 0 - 496148) e: 33554432 2048 Linux Ext2 0 0# (Cyl. 2*- 33290*) f: 100663296 33556480 Linux Ext2 0 0# (Cyl. 33290*- 133154*) g: 339680336 160437856 4.2BSD 2048 16384 0 # (Cyl. 159164*- 496148) cgd1 disklabel: #sizeoffset fstype [fsize bsize cpg/sgs] a: 1048576 0 4.2BSD 1024 8192 0 # (Cyl. 0 -511) b: 25168752 1048576 swap # (Cyl.512 - 12801*) d: 339680336 0 unused 0 0# (Cyl. 0 - 165859*) e: 6292188 26217328 4.2BSD 2048 16384 0 # (Cyl. 12801*- 15873*) f: 227059614 32509516 4.2BSD 2048 16384 0 # (Cyl. 15873*- 126742*) g: 41943040 259569130 4.2BSD 2048 16384 0 # (Cyl. 126742*- 147222*) h: 33554432 301512170 4.2BSD 2048 16384 0 # (Cyl. 147222*- 163606*) i: 2516582 335066602 4.2BSD 2048 16384 0 # (Cyl. 163606*- 164835*) j: 1048576 337583184 4.2BSD 1024 8192 0 # (Cyl. 164835*- 165347*) k: 1048576 338631760 4.2BSD 1024 8192 0 # (Cyl. 165347*- 165859*) Thanks, Alex
Re: status of Linux ptrace on amd64?
Christos Zoulas wrote: In article 20141202221949.GA4053@neva, Alexander Nasonov al...@yandex.ru wrote: I don't think source code is available. It is going to be hard to get it to work Yeah. I can't make any further progress. Working gdb on 32bit Linux binary gave me a false hope. I'm stuck at orig_ax. If I understand correctly, pin (and strace too) want to know a syscall number of a process stopped in a syscall and they query for orig_ax. I use (lt-l_sysent - emul_linux.e_sysent) to get a syscall number but l_sysent is always NULL. I haven't looked at whether a traced process is really in syscall and whether l_sysent is guaranteed to be syncronised accross cpus (I think, it's not but l_sysent is always NULL in nonsmp kernel too). Anyway, I've got some working code. It's not ptrace related, though. To make the tool run, I needed to add support for /proc/sys/kernel/osrelease. The patch is here http://www.netbsd.org/~alnsn/procfs-sys-kernel-osrelease.diff Alex
/proc/sys/kernel/osrelease patch
[Changing the topic from Re: status of Linux ptrace on amd64?] Christos Zoulas wrote: On Dec 4, 10:15am, m...@netbsd.org (Emmanuel Dreyfus) wrote: | That looks fine. Any other reviewer? The code in the PFSsyskern and PFSsys can be factored out to a function (and used perhaps by other directories with single entries). The procfs code has a mini framework for static entries but it's limited to the root directory only. Extending it would be a better approach I think but it would require more work. nm[0] assignment can move outside the if like nm[3]. I'd like to move two almost identical copies of the code in procfs_doversion() and procfs_doosrelease() to a new helper function. I'll check if your still comment applies after the change. Alex
Re: status of Linux ptrace on amd64?
Alexander Nasonov wrote: Christos Zoulas wrote: Should not be that hard, but what is that tool reading from PEEKUSER? Registers? In both cases it reads from user_regs_struct, if I understand everything correctly. But it's the first step, the tool would definitely try other things if PEEKUSER didn't fail. In fact, I'm not sure that thing would work at all because it's a dynamic instrumentation tool called Pin. It updates code on the fly. OK, I implemented reading orig_ax by PEEKUSER. It fixed ptrace failure but it added two problems: 1. I had to disable compat_linux32 to fix compilation. I need to make linux_sys_ptrace_arch multiarch-aware. 2. PEEKUSER didn't change anything. The tool still detaches its child and exits. I can't debug to see what's going on, gdb receives SIGUSR1 from the tool. I suspect PEEKUSER returns unexpected value. Alex
Re: status of Linux ptrace on amd64?
Christos Zoulas wrote: Have you tried porting the tool to NetBSD? What is involved? I don't think source code is available. Alex
status of Linux ptrace on amd64?
Hi, While trying to make some Linux instrumentation tool work on -current amd64 I noticed that ptrace support in compat_linux and compat_linux32 have some notable differences and compat_linux32 has a better support. For instance, I can debug 32bit Linux binaries with NetBSD's gdb but any attempt to set a breakpoint on 64bit Linux results in SIGTRAP. Neither compat_linux32 nor compat_linux work for that particular tool because they don't support LIUNX_PEEKUSER. So, I wonder how much work it is to improve ptrace in compat_linux and compat_linux32? Alex
Re: status of Linux ptrace on amd64?
Christos Zoulas wrote: Should not be that hard, but what is that tool reading from PEEKUSER? Registers? In both cases it reads from user_regs_struct, if I understand everything correctly. But it's the first step, the tool would definitely try other things if PEEKUSER didn't fail. In fact, I'm not sure that thing would work at all because it's a dynamic instrumentation tool called Pin. It updates code on the fly. I've got one more question. Is it possible for Linux emulated binary to control NetBSD native binary with ptrace? I see that it can attach, pokedata and continue but would it be able to do more advanced things correctly? Alex
ACPI warning
Hi, I noticed a strange warning on my console: ACPI Warning: \_SB_.PCI0.LPCB.SNC_.GSNE: Insufficient arguments - Caller passed 0, method requires 1 (20131218/nsarguments-263) What does it mean? In case it matters, I was playing with linux emulation and /emul/linux/proc/version. I'm running -current built in August. NetBSD neva 7.99.1 NetBSD 7.99.1 (GENERIC) #0: Wed Aug 20 18:54:44 BST 2014 alnsn@neva:/home/alnsn/netbsd-current/src/sys/arch/amd64/compile/obj/GENERIC amd64 Alex
Re: icache sync private rump component
Matt Thomas wrote: On Jul 19, 2014, at 2:02 AM, Alexander Nasonov al...@yandex.ru wrote: To compile mips/cache.h in rump kernel, I needed to add -DMIPS3=1 to Makefile.rump for mips platforms. This is the only change outside of sljit scope. the cache instructions are privileged. There's a sysarch interface that you can use the clean the cache. That's exactly what rumpcomp_sync_icache() hypercall does. I need this definition to compile cache.c stub. To draw an analogy, my hack is similar to this hack, except that I define one baseline cpu while they define all cpus: #ifdef _KERNEL #if defined(_MODULAR) || defined(_LKM) || defined(_STANDALONE) /* Assume all CPU architectures are valid for LKM's and standlone progs */ #define MIPS1 1 #define MIPS3 1 #define MIPS4 1 #define MIPS32 1 #define MIPS32R21 #define MIPS64 1 #define MIPS64R21 #endif Alex
Re: icache sync private rump component
Justin Cormack wrote: On Jul 19, 2014 10:01 AM, Alexander Nasonov al...@yandex.ru wrote: To compile mips/cache.h in rump kernel, I needed to add -DMIPS3=1 to Makefile.rump for mips platforms. This is the only change outside of sljit scope. You surely can't do that, people may be trying to compile on non mips3 hardware? Some mips have different ILP models in userspace and in the kernel. This means that rump test doesn't necessarily test the code in the kernel. I came up with the following: In sys/rump/Makefile.rump: # Define baseline cpu for mips ports, required for # rumpcomp_sync_icache() hypercall. .if !empty(MACHINE_ARCH:Mmips*) .if !empty(MACHINE_ARCH:Mmips64*) CPPFLAGS+= -DMIPS64=1 .else CPPFLAGS+= -DMIPS1=1 .endif .endif In sys/arch/mips/include/sljitarch.h: #ifdef _LP64 #define SLJIT_CONFIG_MIPS_64 1 #else #define SLJIT_CONFIG_MIPS_32 1 #endif Alex
icache sync private rump component
Hi, I'd like to commit a private rump component that adds a hypercall for synching icache. This will help us to test bpfjit and npf on arm and mips platforms. I already sent a couple of emails to Antti but because he hasn't replied yet and the branching date is fast approaching, I thought I'd give a heads up to the community. I'll commit my changes if I don't hear any objections in the next few days. Basically, this component resides in librumpkern_sljit and it exposes one function: int rumpcomp_sync_icache(void *, uint64_t); This declaration doesn't go to any public header file because it's only being used inside librumpkern_sljit. On arm, rumpcomp_sync_icache() makes ARM_SYNC_ICACHE sysarch syscall while on mips it calls _cacheflush() which is defined in libc. On the kernel side, both arm and mips use a global object with a bunch of function pointers to various cpu-related routines. Those objects are defined in cpufunc.c and cache.c, respectively. For the rump kernel, I add barebone versions of those objects with only one non-NULL function pointer for icache_rync_range routine. To compile mips/cache.h in rump kernel, I needed to add -DMIPS3=1 to Makefile.rump for mips platforms. This is the only change outside of sljit scope. Alex
Re: icache sync private rump component
Justin Cormack wrote: On Jul 19, 2014 10:01 AM, Alexander Nasonov al...@yandex.ru wrote: To compile mips/cache.h in rump kernel, I needed to add -DMIPS3=1 to Makefile.rump for mips platforms. This is the only change outside of sljit scope. You surely can't do that, people may be trying to compile on non mips3 hardware? Pre mips3 or post mips3? Or even completely different port like amd64? If rump kernel worked without setting any of these variables, chances are, it will work with -DMIPS3. I can actually change it to -DMIPS1. Alex
Re: icache sync private rump component
Alexander Nasonov wrote: Justin Cormack wrote: On Jul 19, 2014 10:01 AM, Alexander Nasonov al...@yandex.ru wrote: To compile mips/cache.h in rump kernel, I needed to add -DMIPS3=1 to Makefile.rump for mips platforms. This is the only change outside of sljit scope. You surely can't do that, people may be trying to compile on non mips3 hardware? Pre mips3 or post mips3? Or even completely different port like amd64? If rump kernel worked without setting any of these variables, chances are, it will work with -DMIPS3. I can actually change it to -DMIPS1. Wikipedia (https://en.wikipedia.org/wiki/MIPS_architecture) states that Each revision is a superset of its predecessors. So, I can use MIPS I as a base and set -DMIPS1=1 for rump. Does it sound good? Alex
Re: icache sync private rump component
Alexander Nasonov wrote: Wikipedia (https://en.wikipedia.org/wiki/MIPS_architecture) states that Each revision is a superset of its predecessors. This is strictly true for MIPS I to MIPS V but we also have MIPS 32 and MIPS 64. MIPS 32 is based on MIPS II (with some features from MIPS III, IV and V) while MIPS 64 is based on MIPS V. So, -DMIPS1=1 should be fine for all ports. Alex
Re: icache sync private rump component
Greg Troxel wrote: Why is this private? If it's generally necessary, it should become part of the standard interface.Is it just that sljit is the only place that is currently creating code that is later executed? I made a change to the standard place about a month ago but Antti objected and I rolled it back. We discussed it on source-changes-d. If my understaning is correct, Antti would like to see a generic interface that handles other related things like W^X protection. In the end we agreed on creating a private component. Yes, sljit is the only place that generate code on the fly. Other executable code is handled by rt-linker. Alex pgpnhjA0K1EsT.pgp Description: PGP signature
Re: icache sync private rump component
19.07.2014, 19:57, Greg Troxel g...@ir.bbn.com: Are you saying that's the evenntual goal and this is just a temporary hack to make sjlit work on rump? Committed code will have a comment stating that it's temporary but because I don't know what exactly should be in the new interface I can't promise that I can undertake that work. Alex PS sorry for the formatting, I'm replying from a tablet. -- Alex
Re: lua: pending patches
Lourival Vieira Neto wrote: Hi folks, Here are some pending patches which I want to commit: http://www.netbsd.org/~lneto/pending/. Please, could someone review them? Thank you in advance! -- 0007: lua: updated from 5.1 to 5.3 work3 I will review the changes later but I wonder why the rush to update lua to work-in-progress version? Alex
Re: one time crash in usb_allocmem_flags
10.02.14, 12:15, Nick Hudson sk...@netbsd.org: Please fill a PR so it doesn't get forgotten about. Sure, will do. At first glance it doesn't look like that usb_frag_freelist isn't protected correctly. I looks more like random corruption. What was the value of %edx? The stack isn't in that function anymore, I'm not sure it shows the right values. 'show registers' command prints all zeroes except rbp=fe80ca6f1450 and rsp=fe80ca6f1410. -- Alex
one time crash in usb_allocmem_flags
Hi, I was running current amd64 (last updated few weeks ago) when I got a random crash shortly after switching to X mode. If my analysis is correct, it crashed in usb_allocmem_flags inside this loop: LIST_FOREACH(f, usb_frag_freelist, next) { KDASSERTMSG(usb_valid_block_p(f-block, usb_blk_fraglist), %s: usb frag %p: unknown block pointer %p, __func__, f, f-block); if (f-block-tag == tag) break; } It couldn't access f-block-tag. I wasn't actively using any of the usb devices at that time. I wonder if it's a known problem or should I file a PR? Details of the analysis is below. Thanks, Alex crash dmesg ... fatal protection fault in supervisor mode trap type 4 code 0 rip 808515e2 cs 8 rflags 13282 cr2 7f7ff5773020 ilevel 0 rsp fe80ca6f16c0 curlwp 0xfe811a8aaba0 pid 475.1 lowest kstack 0xfe80ca6ee000 panic: trap cpu2: Begin traceback... vpanic() at netbsd:vpanic+0x13e printf_nolog() at netbsd:printf_nolog startlwp() at netbsd:startlwp alltraps() at netbsd:alltraps+0x9e ehci_allocm() at netbsd:ehci_allocm+0x2c usbd_transfer() at netbsd:usbd_transfer+0x5f usbd_open_pipe_intr() at netbsd:usbd_open_pipe_intr+0xcb uhidev_open() at netbsd:uhidev_open+0xb3 wsmouseopen() at netbsd:wsmouseopen+0xf3 cdev_open() at netbsd:cdev_open+0x87 spec_open() at netbsd:spec_open+0x183 VOP_OPEN() at netbsd:VOP_OPEN+0x33 vn_open() at netbsd:vn_open+0x1b0 do_open() at netbsd:do_open+0x102 do_sys_openat() at netbsd:do_sys_openat+0x68 sys_open() at netbsd:sys_open+0x24 syscall() at netbsd:syscall+0x9a --- syscall (number 5) --- 7f7ff403af3a: cpu2: End traceback... rebooting in 10 9 8 7 6 5 4 3 2 1 0 crash dmesg|grep usb usb0 at xhci0: USB revision 2.0 usb1 at ehci0: USB revision 2.0 uhub0 at usb0: NetBSD xHCI Root Hub, class 9/0, rev 2.00/1.00, addr 0 uhub1 at usb1: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1 usbd_transfer() at netbsd:usbd_transfer+0x5f usbd_open_pipe_intr() at netbsd:usbd_open_pipe_intr+0xcb crash x 0x808515e2 usb_allocmem_flags+0xfd:751a3948 $ objdump -d /netbsd ... 8085158b: 48 c7 c7 60 15 f8 80mov$0x80f81560,%rdi 80851592: e8 69 42 d3 ff callq 80585800 mutex_enter 80851597: 48 8b 05 c2 bf 69 00mov0x69bfc2(%rip),%rax # 80eed560 usb_frag_freelist 8085159e: 48 85 c0test %rax,%rax 808515a1: 75 3c jne808515df usb_allocmem_flags+0xfa /* You don't need to look at this block */ 808515a3: 48 8d 4d c8 lea-0x38(%rbp),%rcx 808515a7: 45 31 c0xor%r8d,%r8d 808515aa: ba 40 00 00 00 mov$0x40,%edx 808515af: be 00 10 00 00 mov$0x1000,%esi 808515b4: 48 89 dfmov%rbx,%rdi 808515b7: e8 f4 fb ff ff callq 808511b0 usb_block_allocmem 808515bc: 89 c3 mov%eax,%ebx 808515be: 85 c0 test %eax,%eax 808515c0: 75 ac jne8085156e usb_allocmem_flags+0x89 808515c2: 48 8b 4d c8 mov-0x38(%rbp),%rcx 808515c6: c7 41 38 00 00 00 00movl $0x0,0x38(%rcx) 808515cd: bb 40 00 00 00 mov$0x40,%ebx 808515d2: 31 d2 xor%edx,%edx 808515d4: eb 57 jmp8085162d usb_allocmem_flags+0x148 /* end of block. */ /* LIST_FOREACH(f, usb_frag_freelist, next) { */ 808515d6: 48 8b 40 10 mov0x10(%rax),%rax 808515da: 48 85 c0test %rax,%rax 808515dd: 74 c4 je 808515a3 usb_allocmem_flags+0xbe 808515df: 48 8b 10mov(%rax),%rdx 808515e2: 48 39 1acmp%rbx,(%rdx) 808515e5: 75 ef jne808515d6 usb_allocmem_flags+0xf1 crash ps PIDLID S CPU FLAGS STRUCT LWP * NAME WAIT 475 1 7 2 0 fe811a8aaba0 Xorg 72 1 2 3 902 fe811a709b80 xinit 43 1 2 3 802 fe811a709760 sh 437 1 2 3 802 fe811d311720ksh 420 1 2 2 802 fe811e2b6240 getty 435 1 2 0 802 fe811e2b6a80 getty 429 1 2 3 802 fe811e2b6660 login 412 1 2 0 802 fe811e4c1220 getty 390 1 2 0 802 fe8119a90b60 cron 407 1 2 0 802 fe811d767b00 inetd 342 1 2 3 802 fe811d311300
Mostly working uts(4) touchscreen
Touchscreen of my new notebook was reporting touchscreen has no range report which turned out to be Z-axis related and I thought that always passing z=0 might solve the problem. And it indeed mostly solved the problem. I can move the mouse and click inside the modular Xorg. The only problem is that the mouse moves two times faster to the right and 4-5 faster to the bottom. Basically, it works like if I had a small touchpad at the top left corner of my screen. If I'm in a text mode and run 'wsmoused -d /dev/wsmouse1', the cursor doesn't move at all but it flashes (in the center of the screen) when I touch the screen. I wonder where should I make the last change to restore 1:1 mappings between movements and screen pixels? Excerpts from dmesg, xorg.conf and my patch to dev/usb/uts.c are below. Alex Section InputDevice Identifier Mouse1 Driver mouse Option Protocol wsmouse Option Device /dev/wsmouse1 Option ZAxisMapping X EndSection $ dmesg |grep uhid uhidev0 at uhub2 port 3 configuration 1 interface 0 uhidev0: eGalax Inc. eGalaxTouch EXC7910-1031-12.00.03, rev 2.00/31.04, addr 5, iclass 3/1 uhidev0: 7 report ids uhid0 at uhidev0 reportid 3: input=63, output=63, feature=0 uhid1 at uhidev0 reportid 5: input=0, output=0, feature=2 uts0 at uhidev0 reportid 6wsmouse1 at uts0 mux 0 uhid2 at uhidev0 reportid 7: input=0, output=0, feature=256 uhidev0 at uhub2 port 3 configuration 1 interface 0 uhidev0: eGalax Inc. eGalaxTouch EXC7910-1031-12.00.03, rev 2.00/31.04, addr 5, iclass 3/1 uhidev0: 7 report ids uhid0 at uhidev0 reportid 3: input=63, output=63, feature=0 uhid1 at uhidev0 reportid 5: input=0, output=0, feature=2 uts0 at uhidev0 reportid 6wsmouse1 at uts0 mux 0 uhid2 at uhidev0 reportid 7: input=0, output=0, feature=256 uhidev0 at uhub2 port 3 configuration 1 interface 0 uhidev0: eGalax Inc. eGalaxTouch EXC7910-1031-12.00.03, rev 2.00/31.04, addr 4, iclass 3/1 uhidev0: 7 report ids uhid0 at uhidev0 reportid 3: input=63, output=63, feature=0 uhid1 at uhidev0 reportid 5: input=0, output=0, feature=2 uts0 at uhidev0 reportid 6wsmouse1 at uts0 mux 0 uhid2 at uhidev0 reportid 7: input=0, output=0, feature=256 $ cvs diff -u sys/dev/usb/uts.c Index: sys/dev/usb/uts.c === RCS file: /cvsroot/src/sys/dev/usb/uts.c,v retrieving revision 1.3 diff -p -u -u -r1.3 uts.c --- sys/dev/usb/uts.c 5 Jan 2013 23:34:21 - 1.3 +++ sys/dev/usb/uts.c 4 Jan 2014 23:08:03 - @@ -75,6 +75,7 @@ struct uts_softc { struct hid_location sc_loc_btn; int sc_enabled; + int z_enabled; int flags; /* device configuration */ #define UTS_ABS0x1 /* absolute position */ @@ -199,13 +200,9 @@ uts_attach(device_t parent, device_t sel return; } - /* requires HID usage Digitizer:In_Range */ - if (!hid_locate(desc, size, HID_USAGE2(HUP_DIGITIZERS, HUD_IN_RANGE), - uha-reportid, hid_input, sc-sc_loc_z, flags)) { - aprint_error_dev(sc-sc_hdev.sc_dev, - touchscreen has no range report\n); - return; - } + sc-z_enabled = hid_locate(desc, size, + HID_USAGE2(HUP_DIGITIZERS, HUD_IN_RANGE), + uha-reportid, hid_input, sc-sc_loc_z, flags); /* multi-touch support would need HUD_CONTACTID and HUD_CONTACTMAX */ @@ -215,8 +212,12 @@ uts_attach(device_t parent, device_t sel sc-sc_loc_x.pos, sc-sc_loc_x.size)); DPRINTF((uts_attach: Y\t%d/%d\n, sc-sc_loc_y.pos, sc-sc_loc_y.size)); - DPRINTF((uts_attach: Z\t%d/%d\n, - sc-sc_loc_z.pos, sc-sc_loc_z.size)); + if (sc-z_enabled) { + DPRINTF((uts_attach: Z\t%d/%d\n, + sc-sc_loc_z.pos, sc-sc_loc_z.size)); + } else { + DPRINTF((uts_attach: Z disabled\n)); + } #endif a.accessops = uts_accessops; @@ -368,7 +369,7 @@ uts_intr(struct uhidev *addr, void *ibuf } else dy = -hid_get_data(ibuf, sc-sc_loc_y); - dz = hid_get_data(ibuf, sc-sc_loc_z); + dz = sc-z_enabled ? hid_get_data(ibuf, sc-sc_loc_z) : 0; if (hid_get_data(ibuf, sc-sc_loc_btn)) buttons |= 1;
Re: BPF memstore and bpf_validate_ext()
Mindaugas Rasiukevicius wrote: Moreover, the usual byte-code produced by tcpdump/pcap does not even use the memory store so you optimisations would most of the time be applicable anyway! This is not always the case. For instance, # tcpdump -y IEEE802_11 -i urtwn0 -d not tcp tcpdump: data link type IEEE802_11 (000) ldx #0x0 (001) txa (002) add #24 (003) st M[0] (004) ldb [x + 0] (005) jset #0x8 jt 6jf 11 (006) jset #0x4 jt 11 jf 7 (007) jset #0x80jt 8jf 11 (008) ld M[0] (009) add #2 (010) st M[0] (011) ldb [0] (012) jset #0x4 jt 27 jf 13 (013) ldb [0] (014) jset #0x8 jt 15 jf 27 (015) ldx M[0] (016) ldh [x + 6] (017) jeq #0x86dd jt 18 jf 27 (018) ldx M[0] (019) ldb [x + 14] (020) jeq #0x6 jt 37 jf 21 (021) ldx M[0] (022) ldb [x + 14] (023) jeq #0x2cjt 24 jf 27 (024) ldx M[0] (025) ldb [x + 48] (026) jeq #0x6 jt 37 jf 27 (027) ldb [0] (028) jset #0x4 jt 38 jf 29 (029) ldb [0] (030) jset #0x8 jt 31 jf 38 (031) ldx M[0] (032) ldh [x + 6] (033) jeq #0x800 jt 34 jf 38 (034) ldx M[0] (035) ldb [x + 17] (036) jeq #0x6 jt 37 jf 38 (037) ret #0 (038) ret #65535 Alex
Re: BPF memstore and bpf_validate_ext()
Sorry for top-posting. I'm replying from my phone. I've not looked at linux bpf before. I remember taking a quick look at bpf_jit_compile function but I didn't like emitting binary machine code with macro commands. I spent few minutes today looking at linux code and I noticed few interesting things: - They use negative offsets to access auxiliary data. So, there is a clear distinction between local memory store and external data. I don't think it's a new addition, though. - They have a big enum of commands. Many of them translate to bpf commands but there are also special commands like load protocol number into A. There is a decoder from bpf but I have no clue how it works. - Those commands are adapted to work with skbuf data. Alex 20.12.13, 04:16, David Laight da...@l8s.co.uk: On Fri, Dec 20, 2013 at 01:28:12AM +0200, Mindaugas Rasiukevicius wrote: Alexander Nasonov al...@yandex.ru wrote: Well, if it wasn't needed for many year in bpf, why do we need it now? ;-) Because it was decided to use BPF byte-code for more applications and that meant there is a need for improvements. It is called evolution. :) Has anyone here looked closely at the changes linux is making to bpf? David -- David Laight: da...@l8s.co.uk -- Alex
Re: BPF memstore and bpf_validate_ext()
Mindaugas Rasiukevicius wrote: That is great, but we are going circles here. If a program just needs some values stored in the memory store - why would you create a COP to get few integers instead of simply letting the caller to pass them. It is the thing which actually has those numbers. Well, if it wasn't needed for many year in bpf, why do we need it now? ;-) To answer your question about a program that needs some values stored, you want a protocol to pass some values from the host C environment to embedded BPF environment. You do it in a very ad-hoc manner. It's not possible to say by looking at bpf program which memwords are passed from outside and which are really internal. BPF is a language and all its extensions should be designed with this in mind. I have already explained the benefits of the external memory store in this thread (simple, straightforward way to pass values and have their cache). Your main argument seems to be that the external memstore makes it more difficult for bpfjit to optimise certain corner cases (which, as been pointed out, are not common). Well, I was arguing about performance because you were concerned about performance of your bpf programs. Your solution lacks a concept. You're mixing things together without bothering much about how will they interact to achieve a common goal. I already pointed out that your COP is powerful enough to copy external data to a memory store. To add to my other point above, you're mixing local and global memory. Or, if you use an analogy with Lua (that uses a byte code to interpret program), there are local and global variables with a very clear distinction between them. With your proposal, there is no clear distinction inside bpf program which memory words are internal and which are external. Though, I don't think I want to see global memory in bpf. Given that BPF did not have JIT compiler for many years and even a very simplistic JIT compilation is a huge win - I do not understand why do you consider corner case optimisations as a higher value than the benefits provided by the external memstore. I personally see no benefits in external memstore at all. You can copy external data with a special copfunc. BPF_COP is powerful enough for getting external data. It's a bit slower but I can give you SLJIT_FAST_CALL interface. If it's still slow for you, you should stop using sljit and consider other alternatives. That is wonderful, but not the point.. Ok, let me make a point them. If you want to make significant changes to bpf, you'd better start from scrach and design whatever you want. I'd be easier than arguing about bpf changes forever. Alex
Re: BPF memstore and bpf_validate_ext()
Mindaugas Rasiukevicius wrote: Alexander Nasonov al...@yandex.ru wrote: My point is that you mix argument pack with something else. They should be separeted. The external memory store can be used as an argument (and the initial values determined as proposed in this thread). If you want, we can pass it as a third argument, I just think it is a pointless indirection level. Would even need extra wrapping i.e. more work in bpfjit, but if you want that separated - fine. I'll tell you more. Mixing mem with arguments will not only save me one register/one indirection level but it will allow to use memory addressing. It's very attractive solution from a performance perspective. But it's clearly a hack and I don't want to have hacks in public API. I already offered to support SLJIT_FAST_CALL copfuncs in bpfjit. They're much faster than regular copfuncs. But that's mean you will need to emit sljit code and you will have a limited number of sljit registers and all other limitations of sljit. You still should be able to copy data from auxiliary argument to memstore and you can do it quite fast. ... That is great, but we are going circles here. If a program just needs some values stored in the memory store - why would you create a COP to get few integers instead of simply letting the caller to pass them. It is the thing which actually has those numbers. What if your program just needs something else? Does it mean we have to add a new feature to bpf? No. We have to stop somewhere. BPF_COP is powerful enough for getting external data. It's a bit slower but I can give you SLJIT_FAST_CALL interface. If it's still slow for you, you should stop using sljit and consider other alternatives. It's easy to suggest to have a flag but it's actually a lot or work. You need to write several lines of C code to generate a single instruction. The flag would basically say treat the memstore as internal i.e. just do all the optimisations, because I assure there are no side effects. It is a green light for what you said you already want to implement. That would be one-liner check or I miss something? Ok, with my recent change on github, it's indeed one-line change but having two different modes can introduce subtle bugs. If you forget to set the mode properly, you may have a situation when your memory is in a bad state only for packets of length 131 (or whatever) and it's consistent for all other lengths. I'm as a maintainer of bpfjit is responsible for a robust public interface. Alex
Re: BPF memstore and bpf_validate_ext()
Mindaugas Rasiukevicius wrote: Also, it was you who proposed sljit. Proposed for what? I implemented bpfjit using sljit if that's what you mean. I offered you a help with implementing jit compiler for npfcode. It was your idea to add COP/COPX and I agreed to implement a support for it in bpfjit. I never agreed on implementing external memory. It can optimise *most* practical cases (80-20 rule) and I am happy with that. I do not understand why are you concerned about those rare/unusual cases. Do you have some particular application in mind? Something else than in our tree? I don't have any application in mind but I don't understand why are you pushing two extentions to bpf solely to get performance benefit for your cases and you don't care that bpf looses performance even if there are no cop instructions in a program at all. We can pass the memstore pointer as a separate argument (it would be three arguments, fine for sljit), but what's the point.. My point is that you mix argument pack with something else. They should be separeted. Why are you ignoring the fact that your optimisations can still be added and be effective? I already suggested - we can add a flag to indicate that the caller does not care about the result in the memory store. I already offered to support SLJIT_FAST_CALL copfuncs in bpfjit. They're much faster than regular copfuncs. But that's mean you will need to emit sljit code and you will have a limited number of sljit registers and all other limitations of sljit. You still should be able to copy data from auxiliary argument to memstore and you can do it quite fast. You didn't respond to me about it. If you ingored it because you don't want to deal with sljit than you're pulling a blanket. It's easy to suggest to have a flag but it's actually a lot or work. You need to write several lines of C code to generate a single instruction. I don't want to maintain two different modes of code generation. If you want this flag, go ahead, write the code, write the tests and everyone will be happy. Moreover, the usual byte-code produced by tcpdump/pcap does not even use the memory store so you optimisations would most of the time be applicable anyway! Maybe in this case. I don't know all use cases. There are some IDS/IPS that use bpf but I never looked at them. In any case, this functionality will have to be tested. Alex
Re: BPF memstore and bpf_validate_ext()
Mindaugas Rasiukevicius wrote: Alexander Nasonov al...@yandex.ru wrote: In your case BPF_COP is always the first instruction (or in the first linear block). Because it's the first and it's a function call, it can be moved outside of bpf program and inlined. If this is the case, you don't need BPF_COP at all. For this particular case - yes, correct. Except of course you may run out or memwords one day and you want to have BPF_COP as a fallback. I still need BPF_COP for other tasks (e.g. NPF_COP_TABLE). Running out of words is unlikely, but COP can certainly be used to handle that too. I wonder why do you need two different features when one is a superset of the other if you could use BPF_COP? If the only reason is performance, external memory is not a win-win proposition. PS sljit has a non-standard fast call mechanism. If you're really concerned about performance, you can generate copfuncs and I can call them from bpfjit. Alex
Re: BPF memstore and bpf_validate_ext()
Mindaugas Rasiukevicius wrote: Now that the BPF memory store can be external and can be provided by the caller of the BPF program, we can use to it pass some values or reuse it as a cache. I don't think that external memory is needed. You added an auxiliary agrument, what is it for if it's not for passing some values? External memory disables several optimizations in bpfjit for most filter programs even if they don't use BPF_COP. 1. sljit has a limited number of registers especially memory addressable registers. I will have to allocate a register to a memstore pointer. Because all registers are already assigned, I will have to start moving data. The best I can think of is assigning BPFJIT_TMP2 to a new pointer but I will need to switch to EREG register in 32bit BPF_LD. This register is emulated on some arches, for instance on i386. 2. I have (or plan to have) some simple optimizations which aren't possible by rewriting a bpf program. For instance, if there are multiple loads in a linear block, I will only generate one index check. Here is an example: LD+W(0) ST(0) LD+W(4) ST(1) LD+W(8) ST(2) ... It loads packet words with increasing offsets and stores them in memwords for later processing. I generate only only one index check for the first LD instruction (return 0 is packet is shorter than 12 bytes). I can exit early without storing any word because I know that memory will not be available after that bpf program returns. If you make it external, I will have to generate three index checks to make sure that stores are visible to a caller for all possible packet lengths. However, this has to be supported by bpf_validate() - we have to tell it which words are going to be initialised by the caller, since it currently checks and prevents loads of the uninitialised memstore words. Hence, here is the proposed API function for this: int bpf_set_initwords(bpf_ctx_t *bc, uint8_t *wordmask, size_t len); Your description of the new API is too terse to be absolutely certain but it doesn't look like my proposal I sent to you privately. I think you can add a mask to each copfunc which indicates which of the words it loads and which words are stored by the copfunc. For instance, your npf_cop_l3 will look like { .copfunc = npf_cop_l3, .loads = BPF_COP_LOAD_NONE, .stores = BPF_COP_STORE(BPF_MW_IPVER) | BPF_COP_STORE(BPF_MW_L4OFF) | BPF_COP_STORE(BPF_MW_L4PROTO) }, /* ... other copfuncs ... */ Validation of BPF_COP instruction will look very natural: /* inside BPF_COP in bpf_validate_ext */ if (bc-copfuncs) { if (bc-copfuncs[i].inwords invalid) goto out; invalid = ~bc-copfuncs[i].outwords; } It's a bit more trickier for BPF_COPX because you need to pre-calculate loads/stores masks but it's doable. At the moment, bpf_filter_ext() resets the invalid mask. This translates to .stores = BPF_COP_STORE_ALL. Alex
Re: [patch] put Lua standard libraries into the kernel
Marc Balmer wrote: Am 27.11.13 22:23, schrieb Martin Husemann: Can't it be a per-state option, passed by luactl when creating the state? That is actually an excellent idea. So what should be the default, stdlibs enabled or not enabled? I'm a bit late to join the conversation but I wanted to emphasize one quite important thing: Please keep look feel of plain userspace Lua even in the kernel space. If you look at NetBSD C code in the kernel, and compare it with some other's OS kernel code, NetBSD code will likely look more natural. I hope that one day we can say the same about Lua code in NetBSD. There is no need to invent new things if they're already provided by the languange. So, I don't think we need a special sysctl to control loading of standard libraries. Lua already gives you luaL_openlibs() and luaL_requiref(). Alex
Re: [patch] put Lua standard libraries into the kernel
Marc Balmer wrote: If they are, then of course the standard functions like luaL_openlibs() are used. So this not about a replacement mechanism for luaL_openlibs(), but to control whether luaL_openlibs() is called or not. Yes, I understand this but I don't understand why to you need a special control for it. It's your choice as a programmer to call that function or not to call. Why do you need a special variable/syscal to control this? Alex
Re: [patch] put Lua standard libraries into the kernel
Alexander Nasonov wrote: Yes, I understand this but I don't understand why to you need a special control for it. It's your choice as a programmer to call that function or not to call. Why do you need a special variable/syscal to control this? Ok, I missed luactl(8) part I guess. I don't know your usecase but this remote orchestration of kernel Lua state from userspace sounds a bit dodgy to me. Since you support 'luactl require' you can use it to load base libraries. There is no need for a boolean flag if you already have a more generic mechanism in place. Alex
Re: in which we present an ugly hack to make sys/queue.h CIRCLEQ work
matthew green wrote: thanks to dholland, apb, joerg, martin, matt, and skrll for this least-worst-so-far solution. I'm pretty sure that __attribute__((__may_alias__)) was on the table but I wonder why was it rejected? Thanks, Alex
Re: [patch] changing lua_Number to int64_t
Mouse wrote: Also, using an exact-width type assumes that the hardware/compiler in question _has_ such a type. It's possible that lua, NetBSD, or the combination of the two is willing to write off portability to machines where one or both of those potential portability issues becomes actual. But that seems to be asking for trouble to me; history is full of but nobody will ever want to port this to one of _those_ that come back to bite people. I was perfectly fine with long long because it's long enough to represent all integers in range [-2^53-1, 2^53-1]. As Marc pointed out, Lua has a single numeric type which is double by default. Many Lua libraries don't need FP and they use a subset of exactly representable integers (not all of them do range checks, though). Extending the range when porting from userspace to kernel will decrease the pain factor of porting. Alex
Re: A Library for Converting Data to and from C Structs for Lua
Marc Balmer wrote: I came accross a small library for converting data to an from C structs for Lua, written by Roberto Ierusalimschy: http://www.inf.puc-rio.br/~roberto/struct/ I plan to import it and to make it available to both lua(1) and lua(4) as follows: The source code will be imported into ${NETBSDSRCDIR}/sys/external/mit/struct unaltered and then be modified to compile on NetBSD. Shouldn't it be mit/luastruct? Then ${NETBSDSRCDIR}/sys/module/luastruct/ and ${NETBSDSRCDIR}/lib/lua/struct/ directories will be added with the respective Makefiles etc. Are you going to make it kmod? It's an overkill, IMO. Not every Lua module a kernel module. The framework should support this distinction. PS I don't see devel/lua-struct. It'd be nice to have it in pkgsrc. Alex
Re: [patch] changing lua_Number to int64_t
Lourival Vieira Neto wrote: On Sat, Nov 16, 2013 at 8:52 PM, Christos Zoulas chris...@astron.com wrote: In article 52872b0c.5080...@msys.ch, Marc Balmer m...@msys.ch wrote: Changing the number type to int64_t is certainly a good idea. Two questions, however: Why not intmax_t? My only argument is that int64_t has a well-defined width and, AFAIK, intmax_t could vary. But I have no strong feelings about this. Do you think intmax_t would be better? int64_t should be enough to cover a range of exactly representable integers in userspace Lua program where lua_Number is double. I don't see a need for bigger type unless mainstream Lua switches to long double. I don't expect it to happen any time soon. PS Why do you still use a shadow copy of luaconf.h? Please add your changes to the main luaconf.h. If you guard your kernel changes properly with _KERNEL, they will not affect userspace. PPS %PRId64 may break in C++11, space between the literals should fix it. Alex
Re: Moving Lua source codes
Marc Balmer wrote: Yes, this is an issue. Dunno if we need a 'kluac' or so, at the moment I'd say loading code from source form is ok. Supporting binary chunks is more challenging because binary format can change completely in a new Lua version. Source code is more stable, there are often small changes in the language too (like new keywords) but I'd say they are manageable. Alex
Re: Lua in-kernel (lbuf library)
Terry Moore wrote: Just to clarify a bit Indeed, we started with Lua because AWK was not embeddable and because of the 1-origin issue. We thought, mistakenly, that a language that didn't look very much like C would cause fewer problems because of the 0-origin / 1-origin difference. Apparently, however, it's a deeper habit of thought. Ditto for the string escape sequences. Apparently the '[' and ']' for indexing, and the '\' for character escapes, trigger very deeply seated patterns. While I agree that 0-1 switch is mentally hard [*] but if you often need to access arrays by index in Lua code then you either solving a wrong problem of doing something a wrong way. Lua is designed to be a glue language. If you need arrays (as opposite to collections which can be iterated over with ipairs/pairs/iterators), then you're likely doing low-level C stuff in Lua. Data structures in kernel are often organized as linked lists. To iterate from Lua, you probably want to write an iterator and also a lookup function. Lets say you want to iterate over all ps processes from ddb (if gdb has python support, why can't we have a cooler thing?). You do this: ddb lua on ddb for p in processes:match(ps) do print(p.pid) end 906 2245 4935 ... ddb =processes:find(1).path /sbin/init ddb lua off You can do a lot without ever accessing elements by integer indices. Based on our experience, it seems risky to use Lua to implement code that interoperates with kernel-like C code in security-critical contexts. We found this risk to be insurmountable. Engineers who are used to zero-origin code, and who are looking at zero-origin code for reference, will make zero-origin mistakes. Again, don't do C stuff in Lua. However, there is a different kind of risk when developing security code in Lua: it's the layers of Lua itselft. Any complex layer introduces a non-neglidgible risk. [*] I found it out while developing mixed Lua-C module which could also detect LuaJIT and use its zero-based FFI structures. Alex
Re: Lua in-kernel (lbuf library)
Christoph Badura wrote: Also, having to switch mentally between zero-based arrays in the kernel C code and 1-based arrays in the Lua code make my head ache. Yeah, I totally agree here. There are several other reasons why Lua will not become same league player with C in the kernel. But for some projects, the classical module (in C) and scripting (in Lua) separation works extremely well. This includes complex configurations where you need to orchestrate many calls to C code or some complex tasks like generating code for bpf or now defunct npf opcode. Alex
Re: Lua in-kernel (lbuf library)
Lourival Vieira Neto wrote: I'm developing a library to handle buffers in Lua, named lbuf. It is been developed as part of my efforts to perform experimentation in kernel network stack using Lua. Initially, I intended to bind mbuf to allow, for example, to write protocols dissectors in Lua. For example, calling a Lua function to inspect network packets: function filter(packet) if packet.field == value then return DROP end return PASS end Thus, I started to design a Lua binding to mbuf inspired by '#pragma pack' and bitfields of C lang. Then, I realized that this Lua library could be useful to other kernel (and user-space) areas, such as device drivers and user-level protocols. So, I started to develop this binding generically as a independent library to give random access to bits in a buffer. It is just in the early beginning, but I want to share some thoughts. I wonder if you looked at Lua support in Wireshark [1]? Unfortunately, it's GPL and they even have a special section 'Beware the GPL' on wiki. [1] http://wiki.wireshark.org/Lua Alex
Re: Adding Lua to the kernel and moving Lua source codes
Jean-Yves Migeon wrote: Le 07/10/2013 12:05, Alan Barrett a écrit : I still haven't seen a use case for in-kernel Lua. I mean, an example (preferably a working example) of something useful that could not easily be done without in-kernel Lua. I'd prefer not to see it added to the base system without a use case. I second the use case. Not something as polished or finished as possible, but at least shows that it is useful (I am well aware of the cause/effect circle ie. you cannot prove it without having Lua first available, but breaking that vicious circle with a few examples can help). In the early days of bpfjit when I didn't yet know of sljit, I was considering ripping off lua code for generating machine instructions from LuaJIT2 code. I still believe that rewriting bpfjit in Lua would improve readability. I even started a rewrite mostly as a good use-case for Lua bindings for sljit but it's low priority project for me. Alex
lua_Number in the kernel
[ Ccing to Justin who seems to be interested in Lua in NetBSD but I'm not sure whether he's subscribed to tech-kern@ ]. Like some other people, I beleived that Lua kernel project is dormant and was just waiting for any activity before starting a discussion here but Marc replied today to an ongoing discussion on developers@. Hence, my post. It's very important to decide on lua_Number type in kernel early because it affects nearly all Lua code and also because arithmetic operations is often a cause of security holes. In Lua, you can use your-own signed type for numbers. De-facto standard for userspace is double (note that LuaJIT supports _only_ double type) but ptrdiff_t is also recommented if for some reason double isn't an option. So, ptrdiff_t looks like a reasonable choice for the kernel but when I was developing sljit binding for Lua [1] and I was trying to make it robust against overflows for two different types (double and ptrdiff_t) on both 32bit and 64bit platforms, I ended up creating bindings [2] for arbitrary precision library. The problem is that there are three different ranges for integer arithmetic: 1. IEEE 754 double is a always -2^53-1 to 2^53-1 regardless of platform, 2. ptrdiff_t on 2s-complement 32bit platform is -2^31 to 2^31-1, 3. ptrdiff_t on 2s-complement 64bit platform is -2^63 to 2^63-1. Note also that min values of ptrdiff_t on 2s-complement platforms don't have a regular math semantic (e.g. negation produces identical value). It's very challenging to write a code that supports all three options without resorting to #ifdef (or Lua equivalent) or splitting a number into low and high halves. I'd like to propose that lua_Number in the kernel should always be int64_t (*). This type will guarantee regular arithmetic rules for the range (-2^53, 2^53) and for 32-bit signed integer range, in particular. When possible, Lua code can assume 32-bit arithmetic (without 2s-complement iggerularities). Some Lua libraries (for instance, BitOp) assume 32bit. Other libraries that we might port to the kernel are often written in assumption that lua_Number is double and int64_t should be safe to use too. When you need a full 64-bit range (signed or unsigned), arbitrary precision library can be used instead. [1] https://github.com/alnsn/luaSljit [2] https://github.com/alnsn/luaBn (*) I assume that int64_t type is available on all supported platforms but it can be emulated by a compiler, of course. Alex
Re: Importing lua(4), but where in the source tree?
Marc Balmer wrote: Sure. The full diff is at http://www.netbsd.org/~mbalmer/diffs/kernel_lua_010.diff and it's the files that the diff now places in sys/modules/lua/ that I think should better go to sys/dev/lua/ These placeholder files look hackish. Do you compile Lua with all extras like turned off? How many in-place changes do you anticipate? As you move Lua to the kernel space, dist location should be moved to sys/external. Alex
Re: Importing lua(4), but where in the source tree?
Alexander Nasonov wrote: These placeholder files look hackish. Do you compile Lua with all extras like turned off? How many in-place changes do you anticipate? ... and how many of these changes will reside outside if luaconf.h? As you move Lua to the kernel space, dist location should be moved to sys/external. Alex