Re: Status of 8.99.12
On Mon, 12 Feb 2018, Paul Goyette wrote:

> 2. Whenever I try to shut down the system, I get a networking-related
>    panic. The following is manually transcribed:
>
>    trap type 4 code 0 rip 0x802d3f75 cs 0x8 rflags 0x10282
>    cr2 0x77e0e931c020 ilevel 0x4 rsp 0x80090a7e3c80
>    curlwp 0xe4afbb6e8700 pid 926.1 lowest kstack 0x80090a7e0c20
>    kernel: protection fault trap, code = 0
>    stopped in 926.1 (avahi-daemon) at ip_setmoptions+0x237: movq 360(%rax),%rdi
>    traceback:
>    ip_setmoptions + 0x237
>    ip_rtloutput + 0x218
>    udp_ctloutput + 0x82
>    udp_ctloutput_wrapper + 0x2c
>    sosetopt + 0x67
>    sys_setsockopt + 0x91
>    syscall + 0x1ed (syscall #105)

This appears to be fixed by a patch provided by ozakir@

> 3. After getting the above, as soon as I type a single character as
>    command input to ddb(4), I get a LOCKDEBUG panic. I didn't yet
>    transcribe the 40+ lines of output, but the backtrace clearly
>    includes a couple entries from the xhci (USB-3) driver.

Here's the console output from the LOCKDEBUG panic - all transcribed by
hand, but hopefully without too many typos!

    Mutex error: mutex_vector_enter,523: spin lock held

    lock address: 0xe410e9d1d9a0  type: spin
    initialized: 0x802bac06
    shared holds: 0  exclusive: 1
    shares wanted: 0  exclusive: 0
    current CPU: 11  last held: 11
    curlwp: 0xe41fc09ad2c0  last held: 0xe41fc09ad2c0
    last locked*: 0x802b81de  unlocked: 0x80291179
    owner field: 0x00010600  wait/spin:0/1

    panic: LOCKDEBUG: Mutex error: mutex_vector_enter,523: spin lock held

And the backtrace is

    vpanic+0x140
    snprintf
    lockdebug_more
    mutex_enter+0x69d
    xhci_device_intr_start+0x125
    usbd_start_next+0x65
    xhci_soft_intr+0x49b
    xhci_poll+0x37
    ukbd_cngetc+0x19
    cngetc+0x34
    db_readline+0x65
    db_read_line+0x15
    db_command_loop+0x84
    db_trap+0xe3
    kbd_trap+0xe2
    trap (number 4)

followed by the original backtrace (see above, starting with
ip_setmoptions).

> 4. While the system is running, I have noticed that un-mounting nullfs
>    mounts is very slow. Using mksandbox (from pkgsrc), I create a
>    sandbox with about 22 null mounts. Creating/mounting is no problem,
>    and everything runs as expected. However, when unmounting these
>    nullfs, each one takes between 3 and 6 wall-seconds, during which
>    the umount process is running at 100% of one CPU. Additionally, some
>    of these umounts seem to grab the CPU with interrupts disabled,
>    resulting in total stall of the machine for the duration (and, in X,
>    cursor movement stalls/gets "jerky"). All the unmounts eventually
>    complete successfully.

+--------------------+--------------------------+----------------------------+
| Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer   | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+--------------------+--------------------------+----------------------------+
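A note on the LOCKDEBUG output in item 3: it reports "spin lock held" with the current CPU and the last holder both being CPU 11, and curlwp equal to the last-held LWP. One plausible reading is that the same context tried to take a spin mutex it already owned, which would otherwise spin forever. The following is only a toy sketch of that kind of check; it is not NetBSD's lockdebug code, and every name in it (toy_spin, toy_spin_try_enter, and so on) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy spin lock; not the kernel's kmutex_t. */
struct toy_spin {
	bool held;
	int owner_cpu;		/* meaningful only while held is true */
};

enum toy_enter {
	TOY_ENTER_OK,			/* lock was free and is now ours */
	TOY_ENTER_BUSY,			/* another CPU holds it; caller would spin */
	TOY_ENTER_SELF_DEADLOCK		/* we already hold it: "spin lock held" */
};

/* Try to take the lock on behalf of the given CPU number. */
static enum toy_enter
toy_spin_try_enter(struct toy_spin *m, int cpu)
{
	if (m->held)
		return (m->owner_cpu == cpu) ?
		    TOY_ENTER_SELF_DEADLOCK : TOY_ENTER_BUSY;
	m->held = true;
	m->owner_cpu = cpu;
	return TOY_ENTER_OK;
}

/* Release the lock; only the owning CPU may do so. */
static void
toy_spin_exit(struct toy_spin *m, int cpu)
{
	assert(m->held && m->owner_cpu == cpu);
	m->held = false;
}
```

In this toy model, a second toy_spin_try_enter() from CPU 11 while CPU 11 still holds the lock is the case that corresponds to the reported panic.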
Re: Status of 8.99.12
On Mon, 12 Feb 2018, Ryota Ozaki wrote:

>> 2. Whenever I try to shut down the system, I get a networking-related
>>    panic. The following is manually transcribed:
>>
>>    trap type 4 code 0 rip 0x802d3f75 cs 0x8 rflags 0x10282
>>    cr2 0x77e0e931c020 ilevel 0x4 rsp 0x80090a7e3c80
>>    curlwp 0xe4afbb6e8700 pid 926.1 lowest kstack 0x80090a7e0c20
>>    kernel: protection fault trap, code = 0
>>    stopped in 926.1 (avahi-daemon) at ip_setmoptions+0x237: movq 360(%rax),%rdi
>>    traceback:
>>    ip_setmoptions + 0x237
>>    ip_rtloutput + 0x218
>>    udp_ctloutput + 0x82
>>    udp_ctloutput_wrapper + 0x2c
>>    sosetopt + 0x67
>>    sys_setsockopt + 0x91
>>    syscall + 0x1ed (syscall #105)
>
> Is the panic fixed by the following patch?
>
> diff --git a/sys/netinet/ip_output.c b/sys/netinet/ip_output.c
> index 44d8032f387..2e5e346af91 100644
> --- a/sys/netinet/ip_output.c
> +++ b/sys/netinet/ip_output.c
> @@ -1927,9 +1927,13 @@ ip_drop_membership(struct ip_moptions *imo, const struct sockopt *sopt)
>  	 * Give up the multicast address record to which the
>  	 * membership points.
>  	 */
> -	IFNET_LOCK(imo->imo_membership[i]->inm_ifp);
> +{
> +	struct ifnet *inm_ifp = imo->imo_membership[i]->inm_ifp;
> +	IFNET_LOCK(inm_ifp);
>  	in_delmulti(imo->imo_membership[i]);
> -	IFNET_UNLOCK(imo->imo_membership[i]->inm_ifp);
> +	/* ifp should not leave thanks to solock */
> +	IFNET_UNLOCK(inm_ifp);
> +}
>  	/*
>  	 * Remove the gap in the membership array.

Yes, it appears to address the problem.

Without this patch, the above crash was 100% reproducible (5 out of 5).
With this patch applied (and no other changes) I have had 3 consecutive
reboots without any problem!

Thanks for the quick turn-around.

+--------------------+--------------------------+----------------------------+
| Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer   | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+--------------------+--------------------------+----------------------------+
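For anyone reading the patch above without the surrounding source: the old code read imo->imo_membership[i]->inm_ifp a second time after calling in_delmulti(), while the new code copies the interface pointer into a local first and unlocks through that copy. Assuming in_delmulti() can free the membership record when the last reference goes away, the second read would touch freed memory, which would fit the protection fault in the report. A minimal stand-alone sketch of the same pattern follows; the toy_* types and functions are hypothetical stand-ins, not the kernel's struct ifnet, struct in_multi, or IFNET_LOCK.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an interface with a lock. */
struct toy_ifnet {
	int lock_depth;		/* greater than zero while "locked" */
};

/* Hypothetical stand-in for a multicast membership record. */
struct toy_multi {
	struct toy_ifnet *ifp;	/* interface the membership belongs to */
	int refcount;
};

static void
toy_ifnet_lock(struct toy_ifnet *ifp)
{
	ifp->lock_depth++;
}

static void
toy_ifnet_unlock(struct toy_ifnet *ifp)
{
	assert(ifp->lock_depth > 0);
	ifp->lock_depth--;
}

/* Drop one reference; free the record if it was the last one. */
static void
toy_delmulti(struct toy_multi *inm)
{
	if (--inm->refcount == 0)
		free(inm);
}

/*
 * Same shape as the patched code: read the interface pointer once,
 * before the call that may free the record, then use only the copy.
 */
static void
toy_drop_membership(struct toy_multi *inm)
{
	struct toy_ifnet *ifp = inm->ifp;

	toy_ifnet_lock(ifp);
	toy_delmulti(inm);	/* inm may be freed from here on */
	toy_ifnet_unlock(ifp);	/* does not dereference inm */
}
```

Writing toy_ifnet_unlock(inm->ifp) in the last step instead would reproduce the unpatched ordering and read through a pointer that may already have been freed.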
Re: Status of 8.99.12
On Mon, Feb 12, 2018 at 9:48 AM, Paul Goyette wrote:
> After an extended period of build breaks, I finally got a new release built
> from sources updated on 2018-02-10 at 04:02:43 UTC
>
> I'm seeing several problems with this release that were not seen with my
> previous installation (from last November).
>
> 1. Starting the gnucash program (from pkgsrc finance/gnucash) now takes
>    about 3 times as long as before. Even after successfully loading
>    the image (to get libraries etc into the file system cache) it takes
>    more than three full minutes for the program to initialize.
>
> 2. Whenever I try to shut down the system, I get a networking-related
>    panic. The following is manually transcribed:
>
>    trap type 4 code 0 rip 0x802d3f75 cs 0x8 rflags 0x10282
>    cr2 0x77e0e931c020 ilevel 0x4 rsp 0x80090a7e3c80
>    curlwp 0xe4afbb6e8700 pid 926.1 lowest kstack 0x80090a7e0c20
>    kernel: protection fault trap, code = 0
>    stopped in 926.1 (avahi-daemon) at ip_setmoptions+0x237: movq 360(%rax),%rdi
>    traceback:
>    ip_setmoptions + 0x237
>    ip_rtloutput + 0x218
>    udp_ctloutput + 0x82
>    udp_ctloutput_wrapper + 0x2c
>    sosetopt + 0x67
>    sys_setsockopt + 0x91
>    syscall + 0x1ed (syscall #105)

Is the panic fixed by the following patch?

Thanks,
  ozaki-r

diff --git a/sys/netinet/ip_output.c b/sys/netinet/ip_output.c
index 44d8032f387..2e5e346af91 100644
--- a/sys/netinet/ip_output.c
+++ b/sys/netinet/ip_output.c
@@ -1927,9 +1927,13 @@ ip_drop_membership(struct ip_moptions *imo, const struct sockopt *sopt)
 	 * Give up the multicast address record to which the
 	 * membership points.
 	 */
-	IFNET_LOCK(imo->imo_membership[i]->inm_ifp);
+{
+	struct ifnet *inm_ifp = imo->imo_membership[i]->inm_ifp;
+	IFNET_LOCK(inm_ifp);
 	in_delmulti(imo->imo_membership[i]);
-	IFNET_UNLOCK(imo->imo_membership[i]->inm_ifp);
+	/* ifp should not leave thanks to solock */
+	IFNET_UNLOCK(inm_ifp);
+}
 	/*
 	 * Remove the gap in the membership array.
daily CVS update output
Updating src tree:
P src/crypto/external/bsd/openssl/dist/crypto/idea/i_skey.c
P src/distrib/sets/lists/comp/mi
P src/distrib/sets/lists/tests/mi
P src/external/gpl3/gcc/dist/gcc/config/i386/i386.c
P src/external/gpl3/gcc/dist/gcc/cp/decl.c
P src/external/gpl3/gcc.old/dist/gcc/config/i386/i386.c
P src/lib/libc/time/ctime.3
P src/lib/libc/time/getdate.3
P src/lib/libc/yp/ypclnt.3
P src/libexec/ld.elf_so/rtld.c
P src/share/man/man4/ahc.4
P src/share/man/man4/bluetooth.4
P src/share/man/man7/ascii.7
P src/share/man/man9/pcmcia.9
P src/share/man/man9/wsbell.9
P src/share/mk/bsd.own.mk
P src/sys/arch/amd64/amd64/db_machdep.c
P src/sys/arch/amd64/amd64/machdep.c
P src/sys/arch/i386/i386/db_machdep.c
P src/sys/arch/x86/conf/files.x86
P src/sys/arch/x86/x86/db_trace.c
U src/sys/arch/x86/x86/svs.c
P src/sys/external/bsd/drm2/dist/drm/nouveau/core/engine/device/nouveau_engine_device_nve0.c

Updating xsrc tree:

Killing core files:

Updating release-6 src tree (netbsd-6):
U doc/CHANGES-6.2
P sys/dist/pf/net/pf.c

Updating release-6 xsrc tree (netbsd-6):

Updating release-7 src tree (netbsd-7):
U doc/CHANGES-7.2
P sys/dist/pf/net/pf.c

Updating release-7 xsrc tree (netbsd-7):

Updating release-8 src tree (netbsd-8):
P distrib/sets/lists/base/shl.mi
P distrib/sets/lists/comp/mi
P distrib/sets/lists/comp/shl.mi
P distrib/sets/lists/debug/mi
P distrib/sets/lists/debug/shl.mi
P distrib/sets/lists/man/mi
P distrib/sets/lists/tests/mi
U doc/CHANGES-8.0
P etc/mtree/NetBSD.dist.tests
P external/gpl2/xcvs/dist/src/rsh-client.c
P share/man/man4/Makefile
P share/man/man4/ipsec.4
U share/man/man4/ipsecif.4
P sys/arch/amd64/conf/ALL
P sys/arch/amd64/conf/GENERIC
P sys/conf/files
P sys/dev/sdmmc/sdmmc_mem.c
P sys/dist/pf/net/pf.c
P sys/external/bsd/drm2/dist/drm/nouveau/core/engine/device/nouveau_engine_device_nve0.c
P sys/net/Makefile
P sys/net/files.net
P sys/net/if.c
P sys/net/if.h
P sys/net/if_gif.c
U sys/net/if_ipsec.c
U sys/net/if_ipsec.h
P sys/net/if_l2tp.c
P sys/net/if_types.h
P sys/netinet/in.c
P sys/netinet/in.h
P sys/netinet/in_gif.c
P sys/netinet/ip_var.h
P sys/netinet6/in6.c
P sys/netinet6/in6.h
P sys/netinet6/in6_gif.c
P sys/netinet6/ip6_var.h
P sys/netipsec/Makefile
P sys/netipsec/files.netipsec
P sys/netipsec/ipsec.h
U sys/netipsec/ipsecif.c
U sys/netipsec/ipsecif.h
P sys/netipsec/key.c
P sys/netipsec/key.h
P sys/rump/dev/lib/libucom/UCOM.ioconf
P sys/rump/net/Makefile.rumpnetcomp
U sys/rump/net/lib/libipsec/IPSEC.ioconf
U sys/rump/net/lib/libipsec/Makefile
U sys/rump/net/lib/libipsec/ipsec_component.c
P tests/net/Makefile
U tests/net/if_ipsec/Makefile
U tests/net/if_ipsec/t_ipsec.sh
P usr.sbin/ypserv/ypserv/ypserv_proc.c

Updating release-8 xsrc tree (netbsd-8):

Updating file list:
-rw-rw-r-- 1 srcmastr netbsd 50375996 Feb 12 03:10 ls-lRA.gz
Status of 8.99.12
After an extended period of build breaks, I finally got a new release
built from sources updated on 2018-02-10 at 04:02:43 UTC

I'm seeing several problems with this release that were not seen with my
previous installation (from last November).

1. Starting the gnucash program (from pkgsrc finance/gnucash) now takes
   about 3 times as long as before. Even after successfully loading
   the image (to get libraries etc into the file system cache) it takes
   more than three full minutes for the program to initialize.

2. Whenever I try to shut down the system, I get a networking-related
   panic. The following is manually transcribed:

   trap type 4 code 0 rip 0x802d3f75 cs 0x8 rflags 0x10282
   cr2 0x77e0e931c020 ilevel 0x4 rsp 0x80090a7e3c80
   curlwp 0xe4afbb6e8700 pid 926.1 lowest kstack 0x80090a7e0c20
   kernel: protection fault trap, code = 0
   stopped in 926.1 (avahi-daemon) at ip_setmoptions+0x237: movq 360(%rax),%rdi
   traceback:
   ip_setmoptions + 0x237
   ip_rtloutput + 0x218
   udp_ctloutput + 0x82
   udp_ctloutput_wrapper + 0x2c
   sosetopt + 0x67
   sys_setsockopt + 0x91
   syscall + 0x1ed (syscall #105)

3. After getting the above, as soon as I type a single character as
   command input to ddb(4), I get a LOCKDEBUG panic. I didn't yet
   transcribe the 40+ lines of output, but the backtrace clearly
   includes a couple entries from the xhci (USB-3) driver.

4. While the system is running, I have noticed that un-mounting nullfs
   mounts is very slow. Using mksandbox (from pkgsrc), I create a
   sandbox with about 22 null mounts. Creating/mounting is no problem,
   and everything runs as expected. However, when unmounting these
   nullfs, each one takes between 3 and 6 wall-seconds, during which
   the umount process is running at 100% of one CPU. Additionally, some
   of these umounts seem to grab the CPU with interrupts disabled,
   resulting in total stall of the machine for the duration (and, in X,
   cursor movement stalls/gets "jerky"). All the unmounts eventually
   complete successfully.

+--------------------+--------------------------+----------------------------+
| Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer   | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+--------------------+--------------------------+----------------------------+
Automated report: NetBSD-current/i386 test failure
This is an automatically generated notice of new failures of the
NetBSD test suite.

The newly failing test cases are:

    crypto/libcrypto/t_certs:x509v3
    crypto/libcrypto/t_ciphers:evp
    crypto/libcrypto/t_ciphers:idea
    crypto/libcrypto/t_hashes:sha
    crypto/libcrypto/t_libcrypto:lhash

The above tests failed in each of the last 3 test runs, and passed in
at least 27 consecutive runs before that.

The following commits were made between the last successful test and
the failed test:

    2018.02.09.04.38.24 christos src/share/mk/bsd.own.mk,v 1.1032
    2018.02.09.08.03.33 maxv src/sys/netinet/ip_mroute.c,v 1.154
    2018.02.09.08.42.26 maxv src/sys/arch/amd64/amd64/vector.S,v 1.58
    2018.02.09.08.54.11 maxv src/sys/arch/amd64/amd64/amd64_trap.S,v 1.24
    2018.02.09.08.58.01 maxv src/sys/arch/x86/x86/fpu.c,v 1.28
    2018.02.09.09.07.13 maxv src/sys/uvm/uvm_bio.c,v 1.92
    2018.02.09.09.36.42 maxv src/sys/arch/amd64/amd64/db_interface.c,v 1.28
    2018.02.09.09.36.42 maxv src/sys/arch/i386/i386/db_interface.c,v 1.77
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/Makefile,v 1.8
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/aes-586.S,v 1.7
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/aesni-x86.S,v 1.8
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/bf-586.S,v 1.4
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/bn-586.S,v 1.9
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/cast-586.S,v 1.5
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/cmll-x86.S,v 1.5
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/co-586.S,v 1.7
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/crypt586.S,v 1.4
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/des-586.S,v 1.6
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/ghash-x86.S,v 1.6
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/md5-586.S,v 1.7
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/rc4-586.S,v 1.6
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/rc5-586.S,v 1.4
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/rmd-586.S,v 1.7
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/sha1-586.S,v 1.7
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/sha256-586.S,v 1.6
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/sha512-586.S,v 1.6
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/uplink-x86.S,v 1.5
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/vpaes-x86.S,v 1.5
    2018.02.09.13.25.41 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/x86cpuid.S,v 1.14
    2018.02.09.13.35.45 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/bn.inc,v 1.3
    2018.02.09.13.35.45 christos src/crypto/external/bsd/openssl/lib/libcrypto/bn.inc,v 1.5
    2018.02.09.13.37.16 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/bf-686.S,v 1.4
    2018.02.09.13.37.16 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/x86.S,v 1.7
    2018.02.09.14.06.17 maxv src/sys/netinet/tcp_input.c,v 1.375
    2018.02.09.15.24.35 tsutsui src/sys/arch/atari/pci/pci_machdep.c,v 1.56
    2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/Makefile,v 1.9
    2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/aes-586.S,v 1.8
    2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/aesni-x86.S,v 1.9
    2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/bn.inc,v 1.4
    2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/cast-586.S,v 1.6
    2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/chacha-x86.S,v 1.2
    2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/cmll-x86.S,v 1.6
    2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/crypt586.S,v 1.5
    2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/e_padlock-x86.S,v 1.2
    2018.02.09.16.06.59 christos src/crypto/external/bsd/openssl/lib/libcrypto/arch/i386/ec.inc,v 1.1
    2018.02.09.16.06.59 christos
Re: Lockups in a Ryzen 7 1800X ASUS Crosshair Hero VI system
kar...@netbsd.org (Frank Kardel) writes:

>Lockups - the Ryzen machine is difficult as most of the time I cannot
>get into the debugger thus I cannot decide whether it is a hard
>(CPU/system) lockup or just a software bug.

>I have seen tstile related lockups on another 8.99.9 machine that ceases
>network operation at the point and processes pile up on tstiles when
>accessing the network. So at least one locking issue seems to be there.

That could be independent of the CPU... after all, there are lots
of changes in the network stack.

I've been running continuous bulk builds on that machine for a week now,
no lockups or crashes.

-- 
-- 
                                Michael van Elst
Internet: mlel...@serpens.de
                                "A potential Snark may lurk in every tree."
Re: Lockups in a Ryzen 7 1800X ASUS Crosshair Hero VI system
On 02/06/18 13:16, m...@netbsd.org wrote:

> upon further reading it's probably not related but it bugs me that this
> patch/similar hack is still not in.

Yes - I have been running with XSAVEOPT disabled since mlelstx sent that
observation.

I still have lockups - the Ryzen machine is difficult as most of the time
I cannot get into the debugger, thus I cannot decide whether it is a hard
(CPU/system) lockup or just a software bug.

I have seen tstile related lockups on another 8.99.9 machine that ceases
network operation at the point and processes pile up on tstiles when
accessing the network. So at least one locking issue seems to be there.

Frank