Re: Panic on a -current from 13/12/2018
On 2018/12/17 1:09, Chavdar Ivanov wrote:
> I have no idea. As I said, it is running under VirtualBox on a
> Windows 10 host; I put the host in hibernation whilst the NetBSD
> guest is running.

I tested today's -current on VirtualBox 5.2.22 on Windows 7 64bit (on
Core i7-2600). I tried hibernating the host (shutdown -> hibernate (H))
a few times, but I couldn't reproduce the problem yet.

	while (deltat > 0) {
		xtick = lapic_gettick();
		if (lapic_broken_periodic && xtick == 0 && otick == 0) {
			lapic_initclocks();
			xtick = lapic_gettick();
			if (xtick == 0)
				panic("lapic timer stopped ticking");   <=== here!
		}

If the panic comes from this code, lapic_broken_periodic must be true,
but it is set only when the VM is KVM:

	/*
	 * Apply workaround for broken periodic timer under KVM
	 */
	if (vm_guest == VM_GUEST_KVM) {
		lapic_broken_periodic = true;
		lapic_timecounter.tc_quality = -100;
		aprint_debug_dev(ci->ci_dev, "applying KVM timer workaround\n");
	}

Could you try to reproduce the problem and capture the panic message?
ci4ic4-panic-01.png shows the backtrace, which scrolled the panic
message off the screen.

Regards.

> Previously it survived this, using the Intel Desktop NIC emulation
> within VirtualBox, even my ssh connections (from the host to the
> guest) remained active.
>
> I switched the NIC emulation for the NetBSD guest to virtio-net, now
> it behaves as before, surviving a hibernation.
>
> There was a VirtualBox upgrade a few weeks ago, perhaps the problem
> is there.
>
> On Sun, 16 Dec 2018 at 15:55, SAITOH Masanobu wrote:
>> Hi.
>>
>> On 2018/12/16 18:09, Chavdar Ivanov wrote:
>>> Repeated this morning. Happens when the host hibernates when the
>>> machine is running. The initial trace is slightly different, but
>>> the lines with wm_gmii are the same, so for now I will switch to a
>>> different NIC emulator.
>>
>> In your .png:
>>
>>	vpanic()
>>	lapic_delay()
>>	wm_gmii_mdic_readreg()
>>	.
>>	.
>>	.
>>
>> There is no panic message itself, but I suspect it's:
>>
>>	static void
>>	lapic_delay(unsigned int usec)
>>	{
>>		int32_t xtick, otick;
>>		int64_t deltat;		/* XXX may want to be 64bit */
>>
>>		otick = lapic_gettick();
>>
>>		if (usec <= 0)
>>			return;
>>		if (usec <= 25)
>>			deltat = lapic_delaytab[usec];
>>		else
>>			deltat = (lapic_frac_cycle_per_usec * usec) >> 32;
>>
>>		while (deltat > 0) {
>>			xtick = lapic_gettick();
>>			if (lapic_broken_periodic && xtick == 0 && otick == 0) {
>>				lapic_initclocks();
>>				xtick = lapic_gettick();
>>				if (xtick == 0)
>>					panic("lapic timer stopped ticking");   <=== here!
>>			}
>>			if (xtick > otick)
>>				deltat -= lapic_tval - (xtick - otick);
>>			else
>>				deltat -= otick - xtick;
>>			otick = xtick;
>>
>>			x86_pause();
>>		}
>>	}
>>
>> Why does this happen?
>>
>>> And yes, it used to survive many hibernations of the hosts before.
>>> I only had to adjust the time after waking the host up.
>>>
>>> On Sat, 15 Dec 2018 at 10:59, Chavdar Ivanov wrote:
>>>> Hi,
>>>>
>>>> On 8.99.27 AMD64 running under VirtualBox I got this morning the
>>>> panic in http://ci4ic4.tx0.org/ci4ic4-panic-01.png
>>>>
>>>> I have the coredump, if it is of interest. I thought it might be
>>>> useful, as it is apparently in the wm driver.
>>>>
>>>> Chavdar

--
SAITOH Masanobu (msai...@execsw.org
                 msai...@netbsd.org)
Re: UVMHIST, pmap_get_physpage panic
On 17/12/2018 at 08:10, Thomas Klausner wrote:
> On Mon, Dec 17, 2018 at 08:06:36AM +0100, Maxime Villard wrote:
>> On 16/12/2018 at 09:09, Thomas Klausner wrote:
>>> [ 16674.534547] panic: pmap_get_physpage: out of memory
>>
>> Well, out of memory means out of memory. KASAN consumes a bit more
>> than 1/8 of the KVA. So if in normal times your system would use 8GB
>> of RAM, KASAN adds an extra ~1.1GB.
>
> So why doesn't it kill userland processes? I don't believe my kernel
> needs all 32GB of RAM.

I don't know. In fact I don't understand how it is normal to get this:

	[ 16674.544550] pmap_growkernel() at netbsd:pmap_growkernel
	[ 16674.544550] kasan_shadow_map() at netbsd:kasan_shadow_map+0xff
	[ 16674.544550] pmap_growkernel() at netbsd:pmap_growkernel+0x283

pmap_growkernel() does mutex_enter(kpm->pm_lock), so if it's called
recursively I think we have a problem. The call path is:

	pmap_growkernel
	  -> kasan_shadow_map
	    -> pmap_get_physpage
	      -> [somewhere we need to allocate KVA]
	        -> pmap_growkernel

This problem is not KASAN-specific, because KASAN just duplicates the
existing logic:

	pmap_growkernel
	  -> pmap_alloc_level
	    -> pmap_get_physpage

Maybe KASAN makes the problem more visible.

Do you also get out-of-memory when you disable UVMHIST?
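[Editorial sketch] The re-entry concern raised in this message can be modelled in a few lines of C. This is only a toy under stated assumptions: struct toy_lock, toy_enter(), toy_exit() and toy_grow() are hypothetical names, not NetBSD kernel interfaces, and the lock reports failure where a real non-recursive kernel mutex would not simply return.

```c
#include <stdbool.h>

/* Toy non-recursive lock: it only remembers whether it is held. */
struct toy_lock {
	bool held;
};

/* Try to take the lock; report failure instead of hanging on re-entry. */
static bool
toy_enter(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void
toy_exit(struct toy_lock *l)
{
	l->held = false;
}

/*
 * Hypothetical grow routine. `nested` says how many times the page
 * allocation inside it has to grow the map again.
 * Returns 0 on success, -1 if the lock was already held on entry.
 */
static int
toy_grow(struct toy_lock *l, int nested)
{
	int error = 0;

	if (!toy_enter(l))
		return -1;
	if (nested > 0)
		error = toy_grow(l, nested - 1);	/* re-entry with lock held */
	toy_exit(l);
	return error;
}
```

With nested == 0 the call succeeds; with nested == 1 the inner call finds the lock already held, which is the situation the call path above would produce if the allocation really does reach pmap_growkernel() again.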
Re: UVMHIST, pmap_get_physpage panic
On Mon, Dec 17, 2018 at 08:06:36AM +0100, Maxime Villard wrote:
> On 16/12/2018 at 09:09, Thomas Klausner wrote:
> > [ 16674.534547] panic: pmap_get_physpage: out of memory
>
> Well, out of memory means out of memory. KASAN consumes a bit more than
> 1/8 of the KVA. So if in normal times your system would use 8GB of ram,
> KASAN adds an extra ~1.1GB.

So why doesn't it kill userland processes? I don't believe my kernel
needs all 32GB of RAM.

 Thomas
Re: UVMHIST, pmap_get_physpage panic
On 16/12/2018 at 09:09, Thomas Klausner wrote:
> [ 16674.534547] panic: pmap_get_physpage: out of memory

Well, out of memory means out of memory. KASAN consumes a bit more than
1/8 of the KVA. So if in normal times your system would use 8GB of RAM,
KASAN adds an extra ~1.1GB.
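[Editorial sketch] The 1/8 figure follows from KASAN keeping one shadow byte for every 8 bytes of tracked memory. A minimal sketch of that arithmetic in C; shadow_bytes() is a hypothetical helper, not a kernel function, and it models only the 1:8 shadow itself, not whatever accounts for the "bit more" in the ~1.1GB estimate above.

```c
#include <stdint.h>

/* KASAN's usual scheme: one shadow byte tracks 8 bytes of memory. */
#define SHADOW_SCALE 8u

/* Hypothetical helper: shadow bytes needed to cover `bytes` of memory. */
static uint64_t
shadow_bytes(uint64_t bytes)
{
	/* Round up so a partial 8-byte granule still gets a shadow byte. */
	return (bytes + SHADOW_SCALE - 1) / SHADOW_SCALE;
}
```

For 8 GiB of tracked memory this gives exactly 1 GiB of shadow.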
daily CVS update output
Updating src tree:
P src/distrib/amd64/liveimage/emuimage/Makefile
P src/doc/CHANGES
P src/lib/libc/hash/md2/md2.3
P src/lib/librumphijack/hijack.c
P src/lib/librumphijack/rumphijack.3
P src/lib/libtelnet/auth.c
P src/sys/arch/arm/cortex/scu_reg.h
P src/sys/arch/arm/imx/imx6_pcie.c
P src/sys/arch/evbarm/nitrogen6/nitrogen6_machdep.c
P src/sys/arch/x86/x86/identcpu.c
P src/sys/arch/x86/x86/lapic.c
P src/sys/kern/files.kern
P src/sys/kern/subr_pool.c
U src/sys/kern/subr_thmap.c
P src/sys/netinet/dccp_usrreq.c
P src/sys/netinet/tcp_usrreq.c
P src/sys/netinet6/nd6.c
P src/sys/rump/librump/rumpkern/Makefile.rumpkern
P src/sys/sys/pool.h
P src/sys/sys/socketvar.h
U src/sys/sys/thmap.h
P src/tests/fs/common/fstest_zfs.c
P src/tests/fs/zfs/t_zpool.sh
P src/tests/lib/libc/net/getaddrinfo/no_serv_v4.exp
P src/usr.bin/make/parse.c
P src/usr.bin/make/unit-tests/varquote.mk
P src/usr.sbin/ndp/ndp.c
P src/usr.sbin/sysinst/Makefile.inc
P src/usr.sbin/sysinst/defs.h
P src/usr.sbin/sysinst/main.c

Updating xsrc tree:

Killing core files:

Updating release-7 src tree (netbsd-7):

Updating release-7 xsrc tree (netbsd-7):

Updating release-8 src tree (netbsd-8):
U doc/CHANGES-8.1
P sys/arch/x86/pci/amdnb_misc.c
P sys/arch/x86/pci/amdtemp.c

Updating release-8 xsrc tree (netbsd-8):

Updating file list:
-rw-rw-r--  1 srcmastr  netbsd  52414655 Dec 17 03:09 ls-lRA.gz
Re: Panic on a -current from 13/12/2018
I have no idea. As I said, it is running under VirtualBox on a Windows
10 host; I put the host in hibernation whilst the NetBSD guest is
running.

Previously it survived this, using the Intel Desktop NIC emulation
within VirtualBox; even my ssh connections (from the host to the guest)
remained active.

I switched the NIC emulation for the NetBSD guest to virtio-net; now it
behaves as before, surviving a hibernation.

There was a VirtualBox upgrade a few weeks ago, perhaps the problem is
there.

On Sun, 16 Dec 2018 at 15:55, SAITOH Masanobu wrote:
> Hi.
>
> On 2018/12/16 18:09, Chavdar Ivanov wrote:
> > Repeated this morning. Happens when the host hibernates when the
> > machine is running. The initial trace is slightly different, but the
> > lines with wm_gmii are the same, so for now I will switch to a
> > different NIC emulator.
>
> In your .png:
>
>	vpanic()
>	lapic_delay()
>	wm_gmii_mdic_readreg()
>	.
>	.
>	.
>
> There is no panic message itself, but I suspect it's:
>
>	static void
>	lapic_delay(unsigned int usec)
>	{
>		int32_t xtick, otick;
>		int64_t deltat;		/* XXX may want to be 64bit */
>
>		otick = lapic_gettick();
>
>		if (usec <= 0)
>			return;
>		if (usec <= 25)
>			deltat = lapic_delaytab[usec];
>		else
>			deltat = (lapic_frac_cycle_per_usec * usec) >> 32;
>
>		while (deltat > 0) {
>			xtick = lapic_gettick();
>			if (lapic_broken_periodic && xtick == 0 && otick == 0) {
>				lapic_initclocks();
>				xtick = lapic_gettick();
>				if (xtick == 0)
>					panic("lapic timer stopped ticking");   <=== here!
>			}
>			if (xtick > otick)
>				deltat -= lapic_tval - (xtick - otick);
>			else
>				deltat -= otick - xtick;
>			otick = xtick;
>
>			x86_pause();
>		}
>	}
>
> Why does this happen?
>
> > And yes, it used to survive many hibernations of the hosts before. I
> > only had to adjust the time after waking the host up.
> >
> > On Sat, 15 Dec 2018 at 10:59, Chavdar Ivanov wrote:
> >>
> >> Hi,
> >>
> >> On 8.99.27 AMD64 running under VirtualBox I got this morning the panic
> >> in http://ci4ic4.tx0.org/ci4ic4-panic-01.png
> >>
> >> I have the coredump, if it is of interest. I thought it might be
> >> useful, as it is apparently in the wm driver.
> >>
> >> Chavdar
>
> --
> SAITOH Masanobu (msai...@execsw.org
>                  msai...@netbsd.org)
Re: Panic on a -current from 13/12/2018
Hi.

On 2018/12/16 18:09, Chavdar Ivanov wrote:
> Repeated this morning. Happens when the host hibernates when the
> machine is running. The initial trace is slightly different, but the
> lines with wm_gmii are the same, so for now I will switch to a
> different NIC emulator.

In your .png:

	vpanic()
	lapic_delay()
	wm_gmii_mdic_readreg()
	.
	.
	.

There is no panic message itself, but I suspect it's:

	static void
	lapic_delay(unsigned int usec)
	{
		int32_t xtick, otick;
		int64_t deltat;		/* XXX may want to be 64bit */

		otick = lapic_gettick();

		if (usec <= 0)
			return;
		if (usec <= 25)
			deltat = lapic_delaytab[usec];
		else
			deltat = (lapic_frac_cycle_per_usec * usec) >> 32;

		while (deltat > 0) {
			xtick = lapic_gettick();
			if (lapic_broken_periodic && xtick == 0 && otick == 0) {
				lapic_initclocks();
				xtick = lapic_gettick();
				if (xtick == 0)
					panic("lapic timer stopped ticking");   <=== here!
			}
			if (xtick > otick)
				deltat -= lapic_tval - (xtick - otick);
			else
				deltat -= otick - xtick;
			otick = xtick;

			x86_pause();
		}
	}

Why does this happen?

> And yes, it used to survive many hibernations of the hosts before. I
> only had to adjust the time after waking the host up.
>
> On Sat, 15 Dec 2018 at 10:59, Chavdar Ivanov wrote:
>>
>> Hi,
>>
>> On 8.99.27 AMD64 running under VirtualBox I got this morning the panic
>> in http://ci4ic4.tx0.org/ci4ic4-panic-01.png
>>
>> I have the coredump, if it is of interest. I thought it might be
>> useful, as it is apparently in the wm driver.
>>
>> Chavdar

--
SAITOH Masanobu (msai...@execsw.org
                 msai...@netbsd.org)
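[Editorial sketch] The tick accounting inside the quoted delay loop can be isolated into one small function. This is a simplified stand-alone model, assuming a down-counting timer that reloads to tval and wraps at most once between two reads; elapsed_ticks() is a hypothetical name and not part of lapic.c.

```c
#include <stdint.h>

/*
 * Ticks elapsed between two reads of a down-counting periodic timer.
 * otick: earlier reading, xtick: later reading, tval: reload value.
 * Assumes at most one reload happened between the two reads.
 */
static int64_t
elapsed_ticks(int32_t otick, int32_t xtick, int32_t tval)
{
	if (xtick > otick) {
		/* The counter hit zero and reloaded between the reads. */
		return (int64_t)tval - ((int64_t)xtick - otick);
	}
	/* Plain countdown, no reload in between. */
	return (int64_t)otick - xtick;
}
```

Note that two successive readings of 0 yield 0 elapsed ticks, so a frozen counter would never reduce the remaining delay. That appears to be the case the lapic_broken_periodic check in the quoted loop is meant to catch.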
Re: build.sh syspkgs
On Sun 16 Dec 2018 at 14:46:54 +0100, Rhialto wrote:
> regpkg: ERROR: The metalog file
>                (/vol1/rhialto/destdir.amd64/METALOG.sanitised) does not
>                contain entries for the following files or directories
>                which should be part of the base-util-root syspkg:
> ./bin/\133
> --- makesyspkgs ---
> *** [makesyspkgs] Error code 128
> nbmake[1]: stopped in /mnt/vol1/rhialto/cvs/src/distrib/sets
> 1 error

From the cvs history, I see that the last struggle with this was in
2014.

Here is a potential patch:

Index: join.awk
===================================================================
RCS file: /cvsroot/src/distrib/sets/join.awk,v
retrieving revision 1.6
diff -u -r1.6 join.awk
--- join.awk	24 Oct 2014 22:19:44 -0000	1.6
+++ join.awk	16 Dec 2018 15:08:42 -0000
@@ -30,6 +30,8 @@
 # join.awk F1 F2
 #	Similar to join(1), this reads a list of words from F1
 #	and outputs lines in F2 with a first word that is in F1.
+#	For purposes of matching the first word, both instances are
+#	canonicalised via unvis(word); the version from F2 is printed.
 #	Neither file needs to be sorted
 
 function unvis(s) \
@@ -79,17 +81,16 @@
 		exit 1
 	}
 	while ( (getline < ARGV[1]) > 0) {
-		$1 = unvis($1)
-		words[$1] = $0
+		f1 = unvis($1)
+		words[f1] = $0
 	}
 	delete ARGV[1]
 }
 
-// { $1 = unvis($1) }
+{ f1 = unvis($1) }
 
-$1 in words \
+f1 in words \
 {
-	f1=$1
 	$1=""
 	print words[f1] $0
 }

This join.awk script is used to take the file names that are in a
PLIST-type file and select just those same lines from the METALOG file.

I think that the issue was that the join.awk script would unvis() the
file names in all cases. The PLIST would have /bin/[ (which is vis()ed
to \133 at regpkg:810 to spec1) and the METALOG would have /bin/\133
too. The resulting output metalog-type file spec2 would contain the
unvis()ed /bin/[ again. This would happen at
cvs/src/distrib/sets/regpkg line 818.

Then after that there would be a check at regpkg:836 which compares
whether spec1 and spec2 contain the same names, but this is not the
case since one of them is unvis()ed. Hence the error message, which for
our purposes is likely spurious.

I fix the undesired unvis()ing in the first chunk of the diff. As it
was, changing $1 (the first field of the line) changes $0, the line as
a whole. Then the unvis()ed line from the METALOG is stored (and maybe
later printed). Using a temporary to store the unvis()ed version, to be
used as the key, preserves the original version.

Likewise, in the remaining changes, I reinstate the unvis() call, but
also use a temporary for cleanness.

So this presumes that the METALOG file is properly vis()ed, but I think
that is a fairly safe assumption.

Comments? Ok to commit?

-Olaf.
--
___ Olaf 'Rhialto' Seibert  -- "What good is a Ring of Power
\X/ rhialto/at/falu.nl      -- if you're unable...to Speak." - Agent Elrond
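[Editorial sketch] The idea behind the patch, matching on a decoded copy of the name while leaving the stored line untouched, can be shown outside awk as well. The C below is a simplified stand-in under stated assumptions: decode_octal() handles only the \ooo octal form seen in ./bin/\133 and is not the real unvis(3); decode_octal() and same_name() are hypothetical names; names are assumed to fit in 255 bytes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Decode "\ooo" octal escapes (the form seen in "./bin/\133") into dst.
 * A simplified stand-in for unvis(3); other vis(3) styles are ignored.
 */
static void
decode_octal(const char *src, char *dst, size_t dstlen)
{
	size_t o = 0;

	if (dstlen == 0)
		return;
	while (*src != '\0' && o + 1 < dstlen) {
		if (src[0] == '\\' &&
		    src[1] >= '0' && src[1] <= '3' &&
		    src[2] >= '0' && src[2] <= '7' &&
		    src[3] >= '0' && src[3] <= '7') {
			dst[o++] = (char)(((src[1] - '0') << 6) |
			    ((src[2] - '0') << 3) | (src[3] - '0'));
			src += 4;
		} else {
			dst[o++] = *src++;
		}
	}
	dst[o] = '\0';
}

/* 1 if both names match once decoded; the inputs are left untouched. */
static int
same_name(const char *a, const char *b)
{
	char ka[256], kb[256];	/* temporaries used only as comparison keys */

	decode_octal(a, ka, sizeof(ka));
	decode_octal(b, kb, sizeof(kb));
	return strcmp(ka, kb) == 0;
}
```

Because the decoded form lives only in the temporaries, whatever is printed afterwards still carries the original \133 spelling, which is what the later comparison of spec1 and spec2 expects.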
build.sh syspkgs
For the fun of it, I tried a "build.sh syspkgs", because I saw it as a
subcommand of build.sh and I hadn't heard about it for a while.

Is this actually supposed to work, or was it in the process of being
removed but not completely?

Anyway, it started out well but then stopped with this error:

regpkg: WARNING: no comment for "base-x11-root" (using placeholder)
regpkg: WARNING: no description for "base-x11-root" (re-using comment)
Registered base-x11-root-8.99.27.0.20181215
Packaged base-x11-root-8.99.27.0.20181215.tgz
Registered base-util-root-8.99.27.0.20181212
regpkg: ERROR: The metalog file
               (/vol1/rhialto/destdir.amd64/METALOG.sanitised) does not
               contain entries for the following files or directories
               which should be part of the base-util-root syspkg:
./bin/\133
--- makesyspkgs ---
*** [makesyspkgs] Error code 128
nbmake[1]: stopped in /mnt/vol1/rhialto/cvs/src/distrib/sets
1 error

Actually, the named METALOG.sanitised does contain a line for exactly
that spelling:

./bin/\133 type=file uname=root gname=wheel mode=0555 size=18416 sha256=887c6f1483584be2d8a8247cccef74592807859f88a5ba1b193f43fe47d81132

However, the file cvs/src/distrib/sets/lists/base/mi references the
file like this:

./bin/[		base-util-root

So it seems that somewhere along the line, the [ gets escaped for the
error message but not for the actual check in the METALOG file. Or
maybe the entry in METALOG should not be escaped? Does anybody know?

-Olaf.
--
___ Olaf 'Rhialto' Seibert  -- "What good is a Ring of Power
\X/ rhialto/at/falu.nl      -- if you're unable...to Speak." - Agent Elrond
Re: Panic on a -current from 13/12/2018
Repeated this morning. Happens when the host hibernates when the
machine is running. The initial trace is slightly different, but the
lines with wm_gmii are the same, so for now I will switch to a
different NIC emulator.

And yes, it used to survive many hibernations of the hosts before. I
only had to adjust the time after waking the host up.

On Sat, 15 Dec 2018 at 10:59, Chavdar Ivanov wrote:
>
> Hi,
>
> On 8.99.27 AMD64 running under VirtualBox I got this morning the panic
> in http://ci4ic4.tx0.org/ci4ic4-panic-01.png
>
> I have the coredump, if it is of interest. I thought it might be
> useful, as it is apparently in the wm driver.
>
> Chavdar
UVMHIST, pmap_get_physpage panic
Hi!

I've been adding UVMHIST to my kernel config (now it's GENERIC + KASAN
+ UVMHIST).

I noticed that UVMHIST slowed the machine down a bit (not by a factor
of two, but in the ballpark, for bulk builds).

And I have had two panics since. The machine is doing a bulk build (in
a tmpfs) and some file I/O (via NFS mostly).

The first panic was the usual SPL NOT LOWERED gibberish (attached).

The second was:

[ 16674.534547] panic: pmap_get_physpage: out of memory
[ 16674.534547] cpu10: Begin traceback...
[ 16674.534547] vpanic() at netbsd:vpanic+0x221
[ 16674.534547] snprintf() at netbsd:snprintf
[ 16674.544550] pmap_growkernel() at netbsd:pmap_growkernel
[ 16674.544550] kasan_shadow_map() at netbsd:kasan_shadow_map+0xff
[ 16674.544550] pmap_growkernel() at netbsd:pmap_growkernel+0x283
[ 16674.554553] uvm_map_prepare() at netbsd:uvm_map_prepare+0xe14
[ 16674.554553] uvm_map() at netbsd:uvm_map+0xec
[ 16674.564557] uvm_km_alloc() at netbsd:uvm_km_alloc+0x466
[ 16674.564557] pool_grow() at netbsd:pool_grow+0xbb
[ 16674.574561] pool_catchup() at netbsd:pool_catchup+0x46
[ 16674.574561] pool_get() at netbsd:pool_get+0x7e1
[ 16674.584564] allocbuf() at netbsd:allocbuf+0x119
[ 16674.584564] getblk() at netbsd:getblk+0x185
[ 16674.584564] bio_doread() at netbsd:bio_doread+0x1b
[ 16674.594568] bread() at netbsd:bread+0x18
[ 16674.594568] ffs_init_vnode() at netbsd:ffs_init_vnode+0x1cd
[ 16674.604572] ffs_loadvnode() at netbsd:ffs_loadvnode+0xc8
[ 16674.604572] vcache_get() at netbsd:vcache_get+0x4f4
[ 16674.604572] ufs_lookup() at netbsd:ufs_lookup+0x1320
[ 16674.614575] VOP_LOOKUP() at netbsd:VOP_LOOKUP+0xb6
[ 16674.614575] lookup_once() at netbsd:lookup_once+0x34b
[ 16674.624579] namei_tryemulroot() at netbsd:namei_tryemulroot+0x87d
[ 16674.624579] namei() at netbsd:namei+0x65
[ 16674.634583] fd_nameiat.isra.2() at netbsd:fd_nameiat.isra.2+0xd1
[ 16674.634583] do_sys_statat() at netbsd:do_sys_statat+0x111
[ 16674.644586] sys___lstat50() at netbsd:sys___lstat50+0x85
[ 16674.644586] syscall() at netbsd:syscall+0x308
[ 16674.644586] --- syscall (number 441) ---
[ 16674.644586] 761a961145aa:
[ 16674.644586] cpu10: End traceback...

I have a kernel core dump for this one.

Is this a bug or do I need to get more RAM?

Comments on UVMHIST performance cost and the first panic are also
appreciated.

Thanks,
 Thomas

Attachment: panic.gz (application/gunzip)