Re: Missing Drivers?
Anyone? On 08/21/2024 9:40 am, Larry Rosenman wrote: Any chance of getting Bluetooth working on this box and/or any other missing pieces? dmesg.boot: https://www.lerctr.org/~ler/ryzen/dmesg.boot pciconf -lv: https://www.lerctr.org/~ler/ryzen/pciconf.txt Thanks! -- Larry Rosenman http://people.freebsd.org/~ler Phone: +1 214-642-9640 E-Mail: l...@freebsd.org US Mail: 13425 Ranch Road 620 N, Apt 718, Austin, TX 78717-1010
Missing Drivers?
Any chance of getting Bluetooth working on this box and/or any other missing pieces? dmesg.boot: https://www.lerctr.org/~ler/ryzen/dmesg.boot pciconf -lv: https://www.lerctr.org/~ler/ryzen/pciconf.txt Thanks! -- Larry Rosenman
Re: RTL8125: won't stay up
On 08/20/2024 2:08 am, Michael Gmelin wrote: On 20. Aug 2024, at 04:15, Larry Rosenman wrote: Shows up, doesn't ping -- ifconfig up brings it back for ~4 packets ifconfig_re0="DHCP" ifconfig_re0_ipv6="inet6 accept_rtadv" ipv6_activate_all_interfaces="AUTO" rtsold_enable="YES" 15-CURRENT as of just now, port of net/realtek-re-kmod as of now as well. Ideas? Did you already try net/realtek-re-kmod198? -m I did *NOT*, but per a suggestion from Alex Dupre, disabling checksum offload fixes it for me. -- Larry Rosenman
Re: RTL8125: won't stay up
On 08/20/2024 2:55 am, Alex Dupre wrote: On 20/08/2024 04:14, Larry Rosenman wrote: Shows up, doesn't ping -- ifconfig up brings it back for ~4 packets ifconfig_re0="DHCP" ifconfig_re0_ipv6="inet6 accept_rtadv" ipv6_activate_all_interfaces="AUTO" rtsold_enable="YES" 15-CURRENT as of just now, port of net/realtek-re-kmod as of now as well. Ideas? You might try disabling checksum offload and report the result: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275882#c44 Disabling checksum offload fixes it for me as well. -- Larry Rosenman
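Alex's suggestion can be applied at runtime with ifconfig(8) and made persistent in rc.conf(5). A minimal sketch, assuming the interface is re0 as in this thread and that it is the IPv4 receive/transmit checksum offloads that need turning off; the linked bug comment may recommend a different set, and whether the IPv6 counterparts (rxcsum6/txcsum6) also matter for the accept_rtadv setup is untested here.

```sh
# Runtime (as root): turn off IPv4 receive and transmit checksum offload.
ifconfig re0 -rxcsum -txcsum

# /etc/rc.conf: words after the DHCP keyword are handed to ifconfig(8),
# so the setting survives a reboot.
ifconfig_re0="DHCP -rxcsum -txcsum"
```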
RTL8125: won't stay up
Shows up, doesn't ping -- ifconfig up brings it back for ~4 packets ifconfig_re0="DHCP" ifconfig_re0_ipv6="inet6 accept_rtadv" ipv6_activate_all_interfaces="AUTO" rtsold_enable="YES" 15-CURRENT as of just now, port of net/realtek-re-kmod as of now as well. Ideas? -- Larry Rosenman
Re: build failure: clang.full
On 07/30/2024 9:25 am, Larry Rosenman wrote: On 07/30/2024 9:22 am, Ed Maste wrote: On Mon, 29 Jul 2024 at 19:54, Larry Rosenman wrote: I'm getting the following on an up-to-date checkout: Building /usr/obj/usr/src/amd64.amd64/usr.bin/clang/clang/clang.full ld: warning: /usr/obj/usr/src/amd64.amd64/lib/clang/libllvm/libllvm.a: archive member 'FaultMaps.o' is neither ET_REL nor LLVM bitcode This looks like you have a corrupt object in this archive. If you want to start on the path of determining the root cause, you could try extracting FaultMaps.o (using ar or tar) and seeing what file(1) says about it. This happens even with a FRESH (i.e. empty) /usr/obj. It apparently got fixed; I just made it through a buildworld/buildkernel. -- Larry Rosenman
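Ed Maste's suggestion (extract FaultMaps.o with ar or tar, then see what file(1) says about it) can be wrapped in a small shell function. This is a sketch only: the name `inspect_member` is hypothetical, and it assumes ar(1), file(1), realpath(1) and mktemp(1) are on the PATH, as they are on a stock FreeBSD system. It extracts into a scratch directory because `ar x` writes members into the current directory.

```sh
# Hypothetical helper: extract one member of an ar(1) archive into a
# scratch directory and print file(1)'s description of it.
# Usage: inspect_member ARCHIVE MEMBER
inspect_member() {
    archive_abs=$(realpath "$1") || return 1
    member=$2
    tmpdir=$(mktemp -d) || return 1
    # ar x extracts into the current directory, so run it inside tmpdir
    # to avoid clobbering files wherever the caller happens to be.
    if (cd "$tmpdir" && ar x "$archive_abs" "$member"); then
        (cd "$tmpdir" && file "$member")
        status=$?
    else
        echo "inspect_member: could not extract $member from $1" >&2
        status=1
    fi
    rm -rf "$tmpdir"
    return "$status"
}
```

For the archive in the build log, that would be `inspect_member /usr/obj/usr/src/amd64.amd64/lib/clang/libllvm/libllvm.a FaultMaps.o`. A healthy member would normally be described as an ELF relocatable object; anything else would point toward the corruption Ed suspected.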
Re: build failure: clang.full
On 07/30/2024 9:22 am, Ed Maste wrote: On Mon, 29 Jul 2024 at 19:54, Larry Rosenman wrote: I'm getting the following on an up-to-date checkout: Building /usr/obj/usr/src/amd64.amd64/usr.bin/clang/clang/clang.full ld: warning: /usr/obj/usr/src/amd64.amd64/lib/clang/libllvm/libllvm.a: archive member 'FaultMaps.o' is neither ET_REL nor LLVM bitcode This looks like you have a corrupt object in this archive. If you want to start on the path of determining the root cause, you could try extracting FaultMaps.o (using ar or tar) and seeing what file(1) says about it. This happens even with a FRESH (i.e. empty) /usr/obj. -- Larry Rosenman
build failure: clang.full
b/clang/llvm.build.mk /usr/src/share/mk/bsd.prog.mk /usr/src/share/mk/bsd.init.mk /usr/src/share/mk/local.init.mk /usr/src/share/mk/src.init.mk /usr/src/usr.bin/clang/clang/../Makefile.inc /usr/src/usr.bin/clang/clang/../../Makefile.inc /usr/src/share/mk/bsd.sanitizer.mk /usr/src/share/mk/bsd.libnames.mk /usr/src/share/mk/src.libnames.mk /usr/src/share/mk/bsd.nls.mk /usr/src/share/mk/bsd.confs.mk /usr/src/share/mk/bsd.files.mk /usr/src/share/mk/bsd.dirs.mk /usr/src/share/mk/bsd.incs.mk /usr/src/share/mk/bsd.links.mk /usr/src/share/mk/bsd.man.mk /usr/src/share/mk/bsd.dep.mk /usr/src/share/mk/bsd.clang-analyze.mk /usr/src/share/mk/bsd.obj.mk /usr/src/share/mk/bsd.subdir.mk /usr/src/share/mk/bsd.sys.mk /dev/null' .PATH='. /usr/src/usr.bin/clang/clang /usr/src/contrib/llvm-project/clang/tools/driver' make[2]: stopped making "all" in /usr/src make[2]: stopped making "all" in /usr/src make[2]: stopped making "all" in /usr/src make[2]: stopped making "all" in /usr/src 35.46 real 100.11 user 16.43 sys make[1]: stopped making "buildworld" in /usr/src make: stopped making "buildworld buildkernel" in /usr/src ler in playbox in src🔒 on main [?] took 2m4s ❯ cd /usr/src/ && sudo git pull zsh: correct 'git' to '.git' [nyae]? n Already up to date. ler in playbox in src🔒 on main [?] ❯ -- Larry Rosenman
Re: Kerberos doc needs an update
Can you generate a PR with a patch? I'm sure the doc folks would appreciate it. On 12/27/2022 6:34 pm, Rick Macklem wrote: Hi, I just set up a KDC, which is easy once you realize that Sec. 14.5 of the FreeBSD handbook is out of date. (I was a dummy and spent several hours installing stuff from ports before I realized it was all in the system, but the startup file in /etc/rc.d is called "kdc" and not "kerberos".) In 14.5.1, kerberos5_server_enable and kadmind5_server_enable have been renamed, although these old names still work. Further down in 14.5.1, it says "service kerberos start", which doesn't work. It is now "service kdc start". (This was the one that sent me on a wild goose chase. ;-) Maybe someone who can do so could patch the handbook? Thanks, rick -- Larry Rosenman
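Applied to a running system, Rick's corrections amount to roughly the following. This is a sketch assuming the renamed knobs are kdc_enable and kadmind_enable, which is what the stock /etc/defaults/rc.conf uses on recent releases (worth confirming on yours), and that the commands are run as root.

```sh
# New rc.conf(5) names; kerberos5_server_enable and
# kadmind5_server_enable are the old spellings.
sysrc kdc_enable="YES"
sysrc kadmind_enable="YES"

# The rc.d script is /etc/rc.d/kdc, so "service kerberos start" fails:
service kdc start
service kadmind start
```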
BUILD BREAK:
Building /usr/obj/usr/src/amd64.amd64/tests/sys/net/routing/test_rtsock_l3.debug --- all_subdir_tests/sys/kern --- --- subr_physmem_test --- --- subr_physmem_test.o --- In file included from /usr/src/tests/sys/kern/subr_physmem_test.c:34: /usr/obj/usr/src/amd64.amd64/tmp/usr/include/sys/physmem.h:57:1: error: unknown type name 'bool' bool physmem_excluded(vm_paddr_t pa, vm_size_t sz); ^ -- Larry Rosenman
Re: Build Break?
On 10/02/2022 11:44 am, Larry Rosenman wrote: On 10/02/2022 11:27 am, Alexander V. Chernikov wrote: 02.10.2022, 17:18, "Larry Rosenman" : On 10/02/2022 8:12 am, Alexander V. Chernikov wrote: On 1 Oct 2022, at 22:57, Larry Rosenman wrote: --- all_subdir_nfscommon --- Building /usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL/modules/usr/src/sys/modules/nfscommon/nfs_commonkrpc.o --- all_subdir_netgraph --- --- all_subdir_netgraph/deflate --- Building /usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL/modules/usr/src/sys/modules/netgraph/deflate/offset.inc --- all_subdir_netgraph/device --- Building /usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL/modules/usr/src/sys/modules/netgraph/device/i386 --- all_subdir_netgraph/echo --- ===> netgraph/echo (all) --- all_subdir_netlink --- --- netlink_io.o --- /usr/src/sys/netlink/netlink_io.c:146:2: error: implicit declaration of function 'mtx_lock' is invalid in C99 [-Werror,-Wimplicit-function-declaration] NLP_LOCK(nlp); That's interesting. netlink_io.c includes sys/mutex.h, which defines mtx_lock() / mtx_unlock(). Could you share the diff between GENERIC and LER-MINIMAL? I sent the diff in another message, but here is LER-MINIMAL. Thank you! So it's a non-networking config. I'll make the netlink build conditional on INET || INET6 today/tomorrow. I actually kldload a bunch of stuff.
kld_list="aesni coretemp filemon linux ichsmb ichwd cpuctl cryptodev dtraceall ipmi"
kld_list="$kld_list if_bridge bridgestp if_tuntap hwpmc tcp_rack mfip ioat"
kld_list="$kld_list if_bce usb ukbd usb_quirk usb_template ums uhci xhci ehci ohci"
kld_list="$kld_list efirt nfscl nfscommon nfsd nfslockd nfssvc"
kld_list="$kld_list ataintel geom_label"
#kld_list="$kld_list geom_label"
Also, MINIMAL (which I INCLUDE) does have INET/INET6... PFA MINIMAL.
-- Larry Rosenman
#
# MINIMAL -- Mostly Minimal kernel configuration file for FreeBSD/amd64
#
# Many definitions of minimal are possible. The one this file follows is
# GENERIC, minus all functionality that can be replaced by loading kernel
# modules.
#
# Exceptions:
# o While UFS is buildable as a module, the current module lacks
# some features (ACL, GJOURNAL) that GENERIC includes.
# o acpi as a module has been reported flakey and not well tested, so
# is included in the kernel.
# o (non-loaded) random is included due to uncertainty...
# o Many networking things are included
#
# For now, please run changes to these list past i...@freebsd.org
#
# For more information on this file, please read the config(5) manual page,
# and/or the handbook section on Kernel Configuration Files:
#
# https://docs.freebsd.org/en/books/handbook/kernelconfig/#kernelconfig-config
#
# The handbook is also available locally in /usr/share/doc/handbook
# if you've installed the doc distribution, otherwise always see the
# FreeBSD World Wide Web server (https://www.FreeBSD.org/) for the
# latest information.
#
# An exhaustive list of options and more detailed explanations of the
# device lines is also present in the ../../conf/NOTES and NOTES files.
# If you are in doubt as to the purpose or necessity of a line, check first
# in NOTES.
#
# $FreeBSD$
cpu HAMMER
ident MINIMAL
makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
makeoptions WITH_CTF=1 # Run ctfconvert(1) for DTrace support
options SCHED_ULE # ULE scheduler
options NUMA # Non-Uniform Memory Architecture support
options PREEMPTION # Enable kernel thread preemption
options INET # InterNETworking
options INET6 # IPv6 communications protocols
options TCP_OFFLOAD # TCP offload
options SCTP_SUPPORT # Allow kldload of SCTP
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options UFS_ACL # Support for access control lists
options UFS_DIRHASH # Improve performance on big directories
options UFS_GJOURNAL # Enable gjournal-based UFS journaling
options QUOTA # Enable disk quotas for UFS
options MD_ROOT # MD is a potential root device
options COMPAT_FREEBSD32 # C
Re: Build Break?
On 10/02/2022 11:27 am, Alexander V. Chernikov wrote: 02.10.2022, 17:18, "Larry Rosenman" : On 10/02/2022 8:12 am, Alexander V. Chernikov wrote: On 1 Oct 2022, at 22:57, Larry Rosenman wrote: --- all_subdir_nfscommon --- Building /usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL/modules/usr/src/sys/modules/nfscommon/nfs_commonkrpc.o --- all_subdir_netgraph --- --- all_subdir_netgraph/deflate --- Building /usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL/modules/usr/src/sys/modules/netgraph/deflate/offset.inc --- all_subdir_netgraph/device --- Building /usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL/modules/usr/src/sys/modules/netgraph/device/i386 --- all_subdir_netgraph/echo --- ===> netgraph/echo (all) --- all_subdir_netlink --- --- netlink_io.o --- /usr/src/sys/netlink/netlink_io.c:146:2: error: implicit declaration of function 'mtx_lock' is invalid in C99 [-Werror,-Wimplicit-function-declaration] NLP_LOCK(nlp); That's interesting. netlink_io.c includes sys/mutex.h, which defines mtx_lock() / mtx_unlock(). Could you share the diff between GENERIC and LER-MINIMAL? I sent the diff in another message, but here is LER-MINIMAL. Thank you! So it's a non-networking config. I'll make the netlink build conditional on INET || INET6 today/tomorrow. I actually kldload a bunch of stuff.
kld_list="aesni coretemp filemon linux ichsmb ichwd cpuctl cryptodev dtraceall ipmi"
kld_list="$kld_list if_bridge bridgestp if_tuntap hwpmc tcp_rack mfip ioat"
kld_list="$kld_list if_bce usb ukbd usb_quirk usb_template ums uhci xhci ehci ohci"
kld_list="$kld_list efirt nfscl nfscommon nfsd nfslockd nfssvc"
kld_list="$kld_list ataintel geom_label"
#kld_list="$kld_list geom_label"
-- Larry Rosenman
Re: Build Break?
On 10/02/2022 8:12 am, Alexander V. Chernikov wrote: On 1 Oct 2022, at 22:57, Larry Rosenman wrote: --- all_subdir_nfscommon --- Building /usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL/modules/usr/src/sys/modules/nfscommon/nfs_commonkrpc.o --- all_subdir_netgraph --- --- all_subdir_netgraph/deflate --- Building /usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL/modules/usr/src/sys/modules/netgraph/deflate/offset.inc --- all_subdir_netgraph/device --- Building /usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL/modules/usr/src/sys/modules/netgraph/device/i386 --- all_subdir_netgraph/echo --- ===> netgraph/echo (all) --- all_subdir_netlink --- --- netlink_io.o --- /usr/src/sys/netlink/netlink_io.c:146:2: error: implicit declaration of function 'mtx_lock' is invalid in C99 [-Werror,-Wimplicit-function-declaration] NLP_LOCK(nlp); That’s interesting. netlink_io.c includes sys/mutex.h, which defines mtx_lock() / mtx_unlock(). Could you share the diff between GENERIC and LER-MINIMAL? I sent the diff in another message, but here is LER-MINIMAL.
-- Larry Rosenman
# LER-MINIMAL -- kernel config based on MINIMAL
include MINIMAL
ident LER-MINIMAL
nooptions WITNESS # Enable checks to detect deadlocks and cycles
nooptions WITNESS_SKIPSPIN # Don't run witness on spinlocks for speed
options KDB_UNATTENDED
#options DEBUG_MEMGUARD
#options DEBUG_REDZONE
makeoptions WITH_EXTRA_TCP_STACKS=1
options TCPHPTS
device mfi
options TCP_RFC7413
# Kernel dump features.
options EKCD # Support for encrypted kernel dumps
options GZIO # gzip-compressed kernel and user dumps
options ZSTDIO # zstd-compressed kernel and user dumps
options NETDUMP # netdump(4) client support
# ipsec support
options IPSEC_SUPPORT
device crypto
#netgraph debug
options NETGRAPH_DEBUG
#tcp ratelimit
options RATELIMIT
## INVARIANTS
options INVARIANT_SUPPORT
#options INVARIANTS
Re: Build Break?
On 10/02/2022 8:12 am, Alexander V. Chernikov wrote: On 1 Oct 2022, at 22:57, Larry Rosenman wrote: --- all_subdir_nfscommon --- Building /usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL/modules/usr/src/sys/modules/nfscommon/nfs_commonkrpc.o --- all_subdir_netgraph --- --- all_subdir_netgraph/deflate --- Building /usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL/modules/usr/src/sys/modules/netgraph/deflate/offset.inc --- all_subdir_netgraph/device --- Building /usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL/modules/usr/src/sys/modules/netgraph/device/i386 --- all_subdir_netgraph/echo --- ===> netgraph/echo (all) --- all_subdir_netlink --- --- netlink_io.o --- /usr/src/sys/netlink/netlink_io.c:146:2: error: implicit declaration of function 'mtx_lock' is invalid in C99 [-Werror,-Wimplicit-function-declaration] NLP_LOCK(nlp); That’s interesting. netlink_io.c includes sys/mutex.h, which defines mtx_lock() / mtx_unlock(). Could you share the diff between GENERIC and LER-MINIMAL? Attached.
-- Larry Rosenman
--- GENERIC 2022-08-18 14:44:35.576844000 -0500
+++ LER-MINIMAL 2022-10-02 10:46:41.308926000 -0500
@@ -1,401 +1,32 @@
-#
-# GENERIC -- Generic kernel configuration file for FreeBSD/amd64
-#
-# For more information on this file, please read the config(5) manual page,
-# and/or the handbook section on Kernel Configuration Files:
-#
-# https://docs.freebsd.org/en/books/handbook/kernelconfig/#kernelconfig-config
-#
-# The handbook is also available locally in /usr/share/doc/handbook
-# if you've installed the doc distribution, otherwise always see the
-# FreeBSD World Wide Web server (https://www.FreeBSD.org/) for the
-# latest information.
-#
-# An exhaustive list of options and more detailed explanations of the
-# device lines is also present in the ../../conf/NOTES and NOTES files.
-# If you are in doubt as to the purpose or necessity of a line, check first
-# in NOTES.
-#
-# $FreeBSD$
+# LER-MINIMAL -- kernel config based on MINIMAL
-cpu HAMMER
-ident GENERIC
+include MINIMAL
+ident LER-MINIMAL
-makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
-makeoptions WITH_CTF=1 # Run ctfconvert(1) for DTrace support
-
-options SCHED_ULE # ULE scheduler
-options NUMA # Non-Uniform Memory Architecture support
-options PREEMPTION # Enable kernel thread preemption
-options VIMAGE # Subsystem virtualization, e.g. VNET
-options INET # InterNETworking
-options INET6 # IPv6 communications protocols
-options IPSEC_SUPPORT # Allow kldload of ipsec and tcpmd5
-options ROUTE_MPATH # Multipath routing support
-options FIB_ALGO # Modular fib lookups
-options TCP_OFFLOAD # TCP offload
-options TCP_BLACKBOX # Enhanced TCP event logging
-options TCP_HHOOK # hhook(9) framework for TCP
-options TCP_RFC7413 # TCP Fast Open
-options SCTP_SUPPORT # Allow kldload of SCTP
-options KERN_TLS # TLS transmit & receive offload
-options FFS # Berkeley Fast Filesystem
-options SOFTUPDATES # Enable FFS soft updates support
-options UFS_ACL # Support for access control lists
-options UFS_DIRHASH # Improve performance on big directories
-options UFS_GJOURNAL # Enable gjournal-based UFS journaling
-options QUOTA # Enable disk quotas for UFS
-options MD_ROOT # MD is a potential root device
-options NFSCL # Network Filesystem Client
-options NFSD # Network Filesystem Server
-options NFSLOCKD # Network Lock Manager
-options NFS_ROOT # NFS usable as /, requires NFSCL
-options MSDOSFS # MSDOS Filesystem
-options CD9660 # ISO 9660 Filesystem
-options PROCFS # Process filesystem (requires PSEUDOFS)
-options PSEUDOFS # Pseudo-filesystem framework
-options TMPFS # Efficient memory filesystem
-options GEOM_RAID # Soft RAID functionality.
-options GEOM_LABEL # Provides labelization
-options EFIRT # EFI Runtime Services support
-options COMPAT_FREEBSD32 # Compatible with i386 binaries
-options COMPAT_FREEBSD4 # Compatible with FreeBSD4
-options COMPAT_FREEBSD5 # Compatible with FreeBSD5
-options COMPAT_FREEBSD6 # Compatible with FreeBSD6
-options COMPAT_FREEBSD7 # Compatible with FreeBSD7
-options COMPAT_FREEBSD9 # Compatible with FreeBSD9
-options COMPAT_FREEBSD10 # Compatible with FreeBSD10
-options COMPAT_FREEBSD11 # Compatible with FreeBSD11
-options COMPAT_FREEBSD12 # Compatible with FreeBSD12
-options COMPAT_FREEBSD13 # Compatible with FreeBSD13
-options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI
-options KTRACE # ktrace(1) support
-options STACK # stack(9) support
-options SYSVSHM # SYSV-style shared memory
-options SYSVMSG # SYSV-style message queues
-options SYSVSEM # SYSV-style semaphores
-options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
-options PRINTF_BUFR_SIZE=128 # Preve
Re: BOOT CRASH -- Current -CURRENT
On 10/01/2022 10:43 pm, Larry Rosenman wrote: On 10/01/2022 10:08 pm, Warner Losh wrote: On Sat, Oct 1, 2022 at 9:06 PM Larry Rosenman wrote: On 10/01/2022 10:04 pm, Warner Losh wrote: Do you have a /boot tarball that can be loaded in a VM that recreates the problem (along with a clean hash)? But before you try that, have you tried a completely clean rebuild of the kernel to preclude the possibility that something is somehow cross-threaded? Warner On Sat, Oct 1, 2022 at 8:39 PM Larry Rosenman wrote: ❯ more info.11 Dump header from device: /dev/mfid0p3 Architecture: amd64 Architecture Version: 2 Dump Length: 126748815 Blocksize: 512 Compression: zstd Dumptime: 2022-10-01 21:26:40 -0500 Hostname: Magic: FreeBSD Kernel Dump Version String: FreeBSD 14.0-CURRENT #168 ler/freebsd-main-changes-n258354-6cdd871ebc4: Sat Oct 1 21:13:01 CDT 2022 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL Panic String: page fault Dump Parity: 501115454 Bounds: 11 Dump Status: good I do have source and debug stuff, BUT kgdb croaks on me. I *CAN* give access to the machine. The console backtrace showed something about the kld load of dependencies. Let me wipe /usr/obj and rebuild everything (I *DO* use meta-mode). I've had fewer problems with it than non-meta mode, but this looks like a 'corruption' or 'cross-threaded' crash I've chased in the past that went away with a rebuild. So it's better to be sure... Warner Still breaks -- did someone(tm) forget to make netlink a module? -- Larry Rosenman ❯ sudo kgdb -c vmcore.12 /mnt/usr/lib/debug/boot/kernel/kernel.debug GNU gdb (GDB) 12.1 [GDB v12.1 for FreeBSD] Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-portbld-freebsd14.0". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /mnt/usr/lib/debug/boot/kernel/kernel.debug... Unread portion of the kernel message buffer: ---<>--- Copyright (c) 1992-2022 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 14.0-CURRENT #0 ler/freebsd-main-changes-n258354-6cdd871ebc4: Sat Oct 1 22:30:48 CDT 2022 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL amd64 FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c) VT(efifb): resolution 640x480 CPU: Intel(R) Xeon(R) CPU X5660 @ 2.80GHz (2793.16-MHz K8-class CPU) Origin="GenuineIntel" Id=0x206c2 Family=0x6 Model=0x2c Stepping=2 Features=0xbfebfbff Features2=0x29ee3ff AMD Features=0x2c100800 AMD Features2=0x1 Structured Extended Features3=0x9c00 VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID TSC: P-state invariant, performance statistics real memory = 137438953472 (131072 MB) avail memory = 133789515776 (127591 MB) CPU microcode: no matching update found Event timer "LAPIC" quality 600 ACPI APIC Table: FreeBSD/SMP: Multiprocessor System Detected: 24 CPUs FreeBSD/SMP: 2 package(s) x 6 core(s) x 2 hardware threads random: unblocking device. 
ioapic1: MADT APIC ID 1 != hw id 0 ioapic0 irqs 0-23 ioapic1 irqs 32-55 Launching APs: 1 14 12 21 2 6 17 10 18 15 4 19 7 3 8 20 13 5 23 11 9 16 22 TCP_ratelimit: Is now initialized TCP Hpts created 24 swi interrupt threads and bound 24 to NUMA domains random: entropy device external interface kbd1 at kbdmux0 acpi0: acpi0: Power Button (fixed) apei0: on acpi0 cpu0: on acpi0 atrtc0: port 0x70-0x7f irq 8 on acpi0 atrtc0: registered as a time-of-day clock, resolution 1.00s Event timer "RTC" frequency 32768 Hz quality 0 attimer0: port 0x40-0x5f irq 0 on acpi0 Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254" frequency 1193182 Hz quality 100 hpet0: iomem 0xfed0-0xfed003ff on acpi0 Timecounte
Re: BOOT CRASH -- Current -CURRENT
On 10/01/2022 10:08 pm, Warner Losh wrote: On Sat, Oct 1, 2022 at 9:06 PM Larry Rosenman wrote: On 10/01/2022 10:04 pm, Warner Losh wrote: Do you have a /boot tarball that can be loaded in a VM that recreates the problem (along with a clean hash)? But before you try that, have you tried a completely clean rebuild of the kernel to preclude the possibility that something is somehow cross-threaded? Warner On Sat, Oct 1, 2022 at 8:39 PM Larry Rosenman wrote: ❯ more info.11 Dump header from device: /dev/mfid0p3 Architecture: amd64 Architecture Version: 2 Dump Length: 126748815 Blocksize: 512 Compression: zstd Dumptime: 2022-10-01 21:26:40 -0500 Hostname: Magic: FreeBSD Kernel Dump Version String: FreeBSD 14.0-CURRENT #168 ler/freebsd-main-changes-n258354-6cdd871ebc4: Sat Oct 1 21:13:01 CDT 2022 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL Panic String: page fault Dump Parity: 501115454 Bounds: 11 Dump Status: good I do have source and debug stuff, BUT kgdb croaks on me. I *CAN* give access to the machine. The console backtrace showed something about the kld load of dependencies. Let me wipe /usr/obj and rebuild everything (I *DO* use meta-mode). I've had fewer problems with it than non-meta mode, but this looks like a 'corruption' or 'cross-threaded' crash I've chased in the past that went away with a rebuild. So it's better to be sure... Warner Still breaks -- did someone(tm) forget to make netlink a module? -- Larry Rosenman
Re: BOOT CRASH -- Current -CURRENT
On 10/01/2022 10:04 pm, Warner Losh wrote: Do you have a /boot tarball that can be loaded in a VM that recreates the problem (along with a clean hash)? But before you try that, have you tried a completely clean rebuild of the kernel to preclude the possibility that something is somehow cross-threaded? Warner On Sat, Oct 1, 2022 at 8:39 PM Larry Rosenman wrote: ❯ more info.11 Dump header from device: /dev/mfid0p3 Architecture: amd64 Architecture Version: 2 Dump Length: 126748815 Blocksize: 512 Compression: zstd Dumptime: 2022-10-01 21:26:40 -0500 Hostname: Magic: FreeBSD Kernel Dump Version String: FreeBSD 14.0-CURRENT #168 ler/freebsd-main-changes-n258354-6cdd871ebc4: Sat Oct 1 21:13:01 CDT 2022 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL Panic String: page fault Dump Parity: 501115454 Bounds: 11 Dump Status: good I do have source and debug stuff, BUT kgdb croaks on me. I *CAN* give access to the machine. The console backtrace showed something about the kld load of dependencies. Let me wipe /usr/obj and rebuild everything (I *DO* use meta-mode). -- Larry Rosenman
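Warner's "completely clean rebuild" can be done roughly as follows. This is a sketch assuming sources in /usr/src, the default object tree under /usr/obj, the LER-MINIMAL kernel config from this thread, and a root shell. The chflags step follows the Handbook's long-standing advice: some files under /usr/obj can carry the system-immutable flag, and those would otherwise survive the rm -rf.

```sh
# Clear immutable flags, then empty the object tree so meta mode
# has no stale objects or .meta files left to reuse.
cd /usr/obj && chflags -R noschg . && rm -rf -- ./*

# Rebuild userland and the custom kernel from scratch.
cd /usr/src
make -j"$(sysctl -n hw.ncpu)" buildworld buildkernel KERNCONF=LER-MINIMAL
```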
Re: BOOT CRASH -- Current -CURRENT
On 10/01/2022 9:39 pm, Larry Rosenman wrote: ❯ more info.11 Dump header from device: /dev/mfid0p3 Architecture: amd64 Architecture Version: 2 Dump Length: 126748815 Blocksize: 512 Compression: zstd Dumptime: 2022-10-01 21:26:40 -0500 Hostname: Magic: FreeBSD Kernel Dump Version String: FreeBSD 14.0-CURRENT #168 ler/freebsd-main-changes-n258354-6cdd871ebc4: Sat Oct 1 21:13:01 CDT 2022 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL Panic String: page fault Dump Parity: 501115454 Bounds: 11 Dump Status: good I do have source and debug stuff, BUT kgdb croaks on me. I *CAN* give access to the machine. The console backtrace showed something about the kld load of dependencies. Here's the BT: ❯ sudo kgdb -c vmcore.11 /mnt/usr/lib/debug/boot/kernel/kernel.debug GNU gdb (GDB) 12.1 [GDB v12.1 for FreeBSD] Copyright (C) 2022 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-portbld-freebsd14.0". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /mnt/usr/lib/debug/boot/kernel/kernel.debug... Unread portion of the kernel message buffer: ---<>--- Copyright (c) 1992-2022 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 14.0-CURRENT #168 ler/freebsd-main-changes-n258354-6cdd871ebc4: Sat Oct 1 21:13:01 CDT 2022
    r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL amd64
FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)
VT(efifb): resolution 640x480
CPU: Intel(R) Xeon(R) CPU X5660 @ 2.80GHz (2793.07-MHz K8-class CPU)
  Origin="GenuineIntel" Id=0x206c2 Family=0x6 Model=0x2c Stepping=2
  Features=0xbfebfbff
  Features2=0x29ee3ff
  AMD Features=0x2c100800
  AMD Features2=0x1
  Structured Extended Features3=0x9c00
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory = 137438953472 (131072 MB)
avail memory = 133789515776 (127591 MB)
CPU microcode: no matching update found
Event timer "LAPIC" quality 600
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 24 CPUs
FreeBSD/SMP: 2 package(s) x 6 core(s) x 2 hardware threads
random: unblocking device.
ioapic1: MADT APIC ID 1 != hw id 0
ioapic0 irqs 0-23
ioapic1 irqs 32-55
Launching APs: 1 8 7 5 2 12 15 17 14 20 3 18 13 4 19 10 22 11 6 9 16 23 21
TCP_ratelimit: Is now initialized
TCP Hpts created 24 swi interrupt threads and bound 24 to NUMA domains
random: entropy device external interface
kbd1 at kbdmux0
acpi0:
acpi0: Power Button (fixed)
apei0: on acpi0
cpu0: on acpi0
atrtc0: port 0x70-0x7f irq 8 on acpi0
atrtc0: registered as a time-of-day clock, resolution 1.00s
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: port 0x40-0x5f irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
hpet0: iomem 0xfed0-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 350
Event timer "HPET1" frequency 14318180 Hz quality 340
Event timer "HPET2" frequency 14318180 Hz quality 340
Event timer "HPET3" frequency 14318180 Hz quality 340
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
pcib1: at device 1.0 on pci0
pci1: on pcib1
pci1: at device 0.0 (no driver attached)
pci1: at device 0.1 (no driver attached)
pcib2: at device 3.0 on pci0
pci2: on pcib2
pci2: at device 0.0 (no driver attached)
pci2: at device 0.1 (no driver attached)
pcib3: at device 4.0 on pci0
pci3: on pcib3
mfi0: port 0xfc00-0xfcff mem 0xdf1bc000-0xdf1b,0xdf1c-0xdf1f irq 33 at device 0.0 on pci3
mfi0: Using MSI
mfi0: Megaraid SAS driver Ver 4.23
mfi0: FW MaxCmds = 1008, limiting to 128
mfi0: 55158 (717992596s/0x0020/info) - Shutdown command received from host
pcib4: mfi0: 55159 (boot + 33s/0x0020/info) - Firmware initialization started (PCI I
BOOT CRASH -- Current -CURRENT
❯ more info.11
Dump header from device: /dev/mfid0p3
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 126748815
  Blocksize: 512
  Compression: zstd
  Dumptime: 2022-10-01 21:26:40 -0500
  Hostname:
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 14.0-CURRENT #168 ler/freebsd-main-changes-n258354-6cdd871ebc4: Sat Oct 1 21:13:01 CDT 2022
    r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL
  Panic String: page fault
  Dump Parity: 501115454
  Bounds: 11
  Dump Status: good

I do have source and debug stuff, BUT kgdb croaks on me. I *CAN* give access to the machine. The console backtrace showed something about the kld load of dependencies.

-- Larry Rosenman http://people.freebsd.org/~ler Phone: +1 214-642-9640 E-Mail: l...@freebsd.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
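Even while kgdb is failing, the info.N header that savecore(8) writes next to vmcore.N in the dump directory (/var/crash by default) already carries the panic string and dump status. A minimal sketch of a helper that pulls those two fields out; the function name is hypothetical, and it assumes the one "Key: value" pair per line layout shown in info.11:

```shell
#!/bin/sh
# crash_summary (hypothetical helper): print the panic string and the dump
# status recorded in a savecore(8) info.N file.
crash_summary() {
    awk '
        {
            key = $0
            sub(/^[ \t]+/, "", key)          # drop leading indentation
            sep = index(key, ": ")
            if (sep == 0) next               # not a "Key: value" line
            val = substr(key, sep + 2)
            key = substr(key, 1, sep - 1)
        }
        key == "Panic String" { panic = val }
        key == "Dump Status"  { status = val }
        END { printf "panic=%s status=%s\n", panic, status }
    ' "$1"
}
```

Usage would be along the lines of `for f in /var/crash/info.*; do crash_summary "$f"; done`; for info.11 above it should report `panic=page fault status=good`.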
Build Break?
r.h:80:26: note: expanded from macro 'NLP_UNLOCK'
#define NLP_UNLOCK(_nlp)        mtx_unlock(&((_nlp)->nl_lock))
                                ^
/usr/src/sys/netlink/netlink_io.c:354:2: error: implicit declaration of function 'mtx_lock' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        NLP_LOCK(nlp);
        ^
/usr/src/sys/netlink/netlink_var.h:79:25: note: expanded from macro 'NLP_LOCK'
#define NLP_LOCK(_nlp)          mtx_lock(&((_nlp)->nl_lock))
                                ^
/usr/src/sys/netlink/netlink_io.c:357:3: error: implicit declaration of function 'mtx_unlock' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
                NLP_UNLOCK(nlp);
                ^
/usr/src/sys/netlink/netlink_var.h:80:26: note: expanded from macro 'NLP_UNLOCK'
#define NLP_UNLOCK(_nlp)        mtx_unlock(&((_nlp)->nl_lock))
                                ^
/usr/src/sys/netlink/netlink_io.c:369:3: error: implicit declaration of function 'mtx_unlock' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
                NLP_UNLOCK(nlp);
                ^
/usr/src/sys/netlink/netlink_var.h:80:26: note: expanded from macro 'NLP_UNLOCK'
#define NLP_UNLOCK(_nlp)        mtx_unlock(&((_nlp)->nl_lock))
                                ^
/usr/src/sys/netlink/netlink_io.c:395:2: error: implicit declaration of function 'mtx_unlock' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
        NLP_UNLOCK(nlp);
        ^
/usr/src/sys/netlink/netlink_var.h:80:26: note: expanded from macro 'NLP_UNLOCK'
#define NLP_UNLOCK(_nlp)        mtx_unlock(&((_nlp)->nl_lock))
                                ^
/usr/src/sys/netlink/netlink_io.c:519:3: error: implicit declaration of function 'mtx_lock' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
                NLP_LOCK(nlp);
                ^
/usr/src/sys/netlink/netlink_var.h:79:25: note: expanded from macro 'NLP_LOCK'
#define NLP_LOCK(_nlp)          mtx_lock(&((_nlp)->nl_lock))
                                ^
/usr/src/sys/netlink/netlink_io.c:521:3: error: implicit declaration of function 'mtx_unlock' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
                NLP_UNLOCK(nlp);
                ^
/usr/src/sys/netlink/netlink_var.h:80:26: note: expanded from macro 'NLP_UNLOCK'
#define NLP_UNLOCK(_nlp)        mtx_unlock(&((_nlp)->nl_lock))
                                ^
16 errors generated.
--- all_subdir_netgraph ---
--- all_subdir_netgraph/device ---
--- i386 ---
i386 -> /usr/src/sys/i386/include
Building /usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL/modules/usr/src/sys/modules/netgraph/device/vnode_if_newproto.h
--- all_subdir_netlink ---
*** [netlink_io.o] Error code 1

make[4]: stopped in /usr/src/sys/modules/netlink
.ERROR_TARGET='netlink_io.o'
.ERROR_META_FILE='/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL/modules/usr/src/sys/modules/netlink/netlink_io.o.meta'
.MAKE.LEVEL='4'
MAKEFILE=''
.MAKE.MODE='meta missing-filemon=yes missing-meta=yes silent=yes verbose'
        5.79 real        30.04 user         9.35 sys
make[1]: stopped in /usr/src
make: stopped in /usr/src

ler in 🌐 borg in src🔒 on ler/freebsd-main-changes:main [⇡] on ☁️ (us-east-1) took 1m56s
❯
ler in 🌐 borg in src🔒 on ler/freebsd-main-changes:main [⇡] on ☁️ (us-east-1)
❯

-- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: Beadm can't create snapshot
On 08/28/2022 2:28 pm, Ryan Moeller wrote:
On 8/17/22 12:16 PM, Ryan Moeller wrote:
On 8/17/22 12:05 PM, Ryan Moeller wrote:
On 8/17/22 10:35 AM, Thomas Laus wrote:
I attempted to create a ZFS snapshot after upgrading this morning and received this error

# beadm create n257443
cannot create 'zroot/ROOT/n257443': 'snapshots_changed' is readonly
#

This looks like a bug in beadm. It must be trying to set the snapshots_changed property when cloning the snapshot for the BE, but the property is of course readonly.
-Ryan

I took a closer look at what beadm is doing and this appears to be a bug in the property after all. beadm filters by source "local" or "received" and for "snapshots_changed" the source is "local" when it should be "-" like other readonly properties. We'll get this fixed ASAP.
-Ryan

Now fixed as of https://github.com/openzfs/zfs/commit/518b4876022eee58b14903da09b99c01b8caa754

That doesn't look right? It's about arc_c_max, and not properties?

-Ryan

My version info: 14.0-CURRENT FreeBSD 14.0-CURRENT #9 main-n257443-f7413197245: Wed Aug 17 08:15:27 EDT 2022

There was not any information in UPDATING about any required ZFS configuration changes nor any ZFS flags that listed 'snapshots_changed' as something that needed a new value. There is actually a new snapshot created, but 'beadm list' does not show it and the boot menu does not have it listed.

Tom

-- Larry Rosenman
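Ryan's diagnosis is that beadm only carries over properties whose source is "local" or "received", so a read-only property that is wrongly reported with source "local" gets copied onto the new BE and zfs rejects it. The sketch below re-creates just that filter; it is not beadm's actual code, and the function name is hypothetical. It expects the tab-separated output of `zfs get -H -o property,value,source all <dataset>`:

```shell
#!/bin/sh
# copyable_props (hypothetical): keep only properties whose source is
# "local" or "received" and print them as property=value.
# Input: "property<TAB>value<TAB>source" lines on stdin.
copyable_props() {
    awk -F'\t' '$3 == "local" || $3 == "received" { print $1 "=" $2 }'
}
```

Fed a snapshots_changed line with source "local", the filter passes it through (the failure Thomas hit); with source "-", as Ryan says it should be, it is dropped. `zfs get -s local,received` applies the same source filter inside zfs itself.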
Re: Hangs in bacula / NFS? on recent Current
On 08/19/2022 10:36 am, Rick Macklem wrote:
On 08/18/2022 9:49 am, Larry Rosenman wrote:
I didn't get all my mail on my bacula backups today (they backup to NFS mounted TrueNAS). Also a df hangs. Here are procstat -kk's for all:

ler in 🌐 borg in ~ via C v14.0.5-clang on ☁️ (us-east-1)
❯ ps auxxxwww|grep bacula
bacula   2067  0.0 0.0  63188  13652 - Is 11:30 0:17.49 /usr/local/sbin/bacula-sd -u bacula -g bacula -v -c /usr/local/etc/bacula/bacula-sd.conf
root     2072  0.0 0.0  59280  31276 - Is 11:30 0:00.31 /usr/local/sbin/bacula-fd -u root -g wheel -v -c /usr/local/etc/bacula/bacula-fd.conf
bacula   2075  0.0 0.0  86992  19352 - Is 11:30 0:56.95 /usr/local/sbin/bacula-dir -u bacula -g bacula -v -c /usr/local/etc/bacula/bacula-dir.conf
postgres 50241 0.0 0.1 285764 160244 - Is 23:05 0:00.38 postgres: bacula bacula [local] (postgres)
postgres 50244 0.0 0.1 298784  74448 - Ds 23:05 0:00.67 postgres: bacula bacula [local] (postgres)
ler      66595 0.0 0.0  12888   2600 3 S+ 09:46 0:00.00 grep --color=auto bacula

At the end, I'll list what options are needed for ps and all of its output is needed. See the end of the email.

ler in 🌐 borg in ~ via C v14.0.5-clang on ☁️ (us-east-1)
❯ sudo procstat -kk 2067
 PID    TID COMM      TDNAME KSTACK
2067 100742 bacula-sd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_wait_sig+0x9 _cv_wait_sig+0x137 kern_select+0x9fe sys_select+0x56 amd64_syscall+0x12e fast_syscall_common+0xf8
2067 101036 bacula-sd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d umtxq_sleep+0x242 do_wait+0x26b __umtx_op_wait_uint_private+0x54 sys__umtx_op+0x7e amd64_syscall+0x12e fast_syscall_common+0xf8
2067 101038 bacula-sd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d umtxq_sleep+0x242 do_wait+0x26b __umtx_op_wait_uint_private+0x54 sys__umtx_op+0x7e amd64_syscall+0x12e fast_syscall_common+0xf8
2067 124485 bacula-sd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _cv_timedwait_sig_sbt+0x15c kern_poll_kfds+0x457 kern_poll+0x9f sys_poll+0x50 amd64_syscall+0x12e fast_syscall_common+0xf8

ler in 🌐 borg in ~ via C v14.0.5-clang on ☁️ (us-east-1)
❯ sudo procstat -kk 2072
 PID    TID COMM      TDNAME KSTACK
2072 100677 bacula-fd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_wait_sig+0x9 _cv_wait_sig+0x137 kern_select+0x9fe sys_select+0x56 amd64_syscall+0x12e fast_syscall_common+0xf8
2072 101039 bacula-fd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d umtxq_sleep+0x242 do_wait+0x26b __umtx_op_wait_uint_private+0x54 sys__umtx_op+0x7e amd64_syscall+0x12e fast_syscall_common+0xf8
2072 101040 bacula-fd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d umtxq_sleep+0x242 do_wait+0x26b __umtx_op_wait_uint_private+0x54 sys__umtx_op+0x7e amd64_syscall+0x12e fast_syscall_common+0xf8
2072 124490 bacula-fd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _cv_timedwait_sig_sbt+0x15c kern_poll_kfds+0x457 kern_poll+0x9f sys_poll+0x50 amd64_syscall+0x12e fast_syscall_common+0xf8

ler in 🌐 borg in ~ via C v14.0.5-clang on ☁️ (us-east-1)
❯ sudo procstat -kk 2075
 PID    TID COMM       TDNAME KSTACK
2075 101007 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_wait_sig+0x9 _sleep+0x29b umtxq_sleep+0x242 do_wait+0x26b __umtx_op_wait+0x53 sys__umtx_op+0x7e amd64_syscall+0x12e fast_syscall_common+0xf8
2075 101041 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d umtxq_sleep+0x242 do_wait+0x26b __umtx_op_wait_uint_private+0x54 sys__umtx_op+0x7e amd64_syscall+0x12e fast_syscall_common+0xf8
2075 101045 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_wait_sig+0x9 _cv_wait_sig+0x137 kern_select+0x9fe sys_select+0x56 amd64_syscall+0x12e fast_syscall_common+0xf8
2075 101046 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d umtxq_sleep+0x242 do_wait+0x26b __umtx_op_wait_uint_private+0x54 sys__umtx_op+0x7e amd64_syscall+0x12e fast_syscall_common+0xf8
2075 101047 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_
Re: Lots of port failures today?
On 08/19/2022 10:03 am, Tomoaki AOKI wrote:
On Fri, 19 Aug 2022 09:06:11 -0400 Charlie Li wrote:
Mateusz Guzik wrote:
> On 8/18/22, Mateusz Guzik wrote:
>> On 8/18/22, Larry Rosenman wrote:
>>> https://home.lerctr.org:/build.html?mastername=live-host_ports&build=2022-08-18_13h12m51s
>>> circa 97ecdc00ac5 on main
>>> Ideas?
>> try with 9ac6eda6c6a36db6bffa01be7faea24f8bb92a0f reverted
> I'm pretty sure it will be fixed with URL:
> https://cgit.FreeBSD.org/src/commit/?id=545db925c3d5408e71e21432895770cd49fd2cf3

Seems to be fixed with this commit, at least for graphics/jpeg-turbo, whose configure failed with something about platform not supporting SIMD.
-- Charlie Li
…nope, still don't have an exit line.

And so as base /usr/bin/xz (through pipe) and ports lang/ruby30. The former caused x11/linux-nvidia-libs to fail on extract, and the latter caused ports-mgmt/portupgrade (including portsclean) to fail on start. Both are fixed at the commit. Thanks!

And all my unexplained failures are fixed as well. Thanks, Mateusz!

-- Larry Rosenman
Re: Lots of port failures today?
On 08/18/2022 4:25 pm, Mateusz Guzik wrote:
On 8/18/22, Mateusz Guzik wrote:
On 8/18/22, Larry Rosenman wrote:
https://home.lerctr.org:/build.html?mastername=live-host_ports&build=2022-08-18_13h12m51s
circa 97ecdc00ac5 on main
Ideas?

try with 9ac6eda6c6a36db6bffa01be7faea24f8bb92a0f reverted

I'm pretty sure it will be fixed with URL: https://cgit.FreeBSD.org/src/commit/?id=545db925c3d5408e71e21432895770cd49fd2cf3

Should I un-revert 9ac6eda6c6a36db6bffa01be7faea24f8bb92a0f and pick up a new pull?

-- Larry Rosenman
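The revert / un-revert cycle discussed here can be sketched with two small git helpers, assuming /usr/src is a git checkout (the shell prompts show the ler/freebsd-main-changes branch). The function names are hypothetical; the only git commands used are `git revert --no-edit` and `git -C`:

```shell
#!/bin/sh
# revert_suspect (hypothetical): add a local commit that undoes the
# suspect change, so the tree can be rebuilt without it.
revert_suspect() {   # $1 = source tree (e.g. /usr/src), $2 = suspect commit
    git -C "$1" revert --no-edit "$2"
}

# unrevert_last (hypothetical): restore the original change by reverting
# the revert. Assumes that revert is still the newest commit (HEAD).
unrevert_last() {    # $1 = source tree
    git -C "$1" revert --no-edit HEAD
}
```

In this thread that would be `revert_suspect /usr/src 9ac6eda6c6a36db6bffa01be7faea24f8bb92a0f` for the test build, then `unrevert_last /usr/src` followed by `git -C /usr/src pull` once 545db925c3d5 is in the tree.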
Lots of port failures today?
https://home.lerctr.org:/build.html?mastername=live-host_ports&build=2022-08-18_13h12m51s

circa 97ecdc00ac5 on main

Ideas?

-- Larry Rosenman
Re: Hangs in bacula / NFS? on recent Current
On 08/18/2022 9:49 am, Larry Rosenman wrote:
I didn't get all my mail on my bacula backups today (they backup to NFS mounted TrueNAS). Also a df hangs. Here are procstat -kk's for all:

ler in 🌐 borg in ~ via C v14.0.5-clang on ☁️ (us-east-1)
❯ ps auxxxwww|grep bacula
bacula   2067  0.0 0.0  63188  13652 - Is 11:30 0:17.49 /usr/local/sbin/bacula-sd -u bacula -g bacula -v -c /usr/local/etc/bacula/bacula-sd.conf
root     2072  0.0 0.0  59280  31276 - Is 11:30 0:00.31 /usr/local/sbin/bacula-fd -u root -g wheel -v -c /usr/local/etc/bacula/bacula-fd.conf
bacula   2075  0.0 0.0  86992  19352 - Is 11:30 0:56.95 /usr/local/sbin/bacula-dir -u bacula -g bacula -v -c /usr/local/etc/bacula/bacula-dir.conf
postgres 50241 0.0 0.1 285764 160244 - Is 23:05 0:00.38 postgres: bacula bacula [local] (postgres)
postgres 50244 0.0 0.1 298784  74448 - Ds 23:05 0:00.67 postgres: bacula bacula [local] (postgres)
ler      66595 0.0 0.0  12888   2600 3 S+ 09:46 0:00.00 grep --color=auto bacula

ler in 🌐 borg in ~ via C v14.0.5-clang on ☁️ (us-east-1)
❯ sudo procstat -kk 2067
 PID    TID COMM      TDNAME KSTACK
2067 100742 bacula-sd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_wait_sig+0x9 _cv_wait_sig+0x137 kern_select+0x9fe sys_select+0x56 amd64_syscall+0x12e fast_syscall_common+0xf8
2067 101036 bacula-sd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d umtxq_sleep+0x242 do_wait+0x26b __umtx_op_wait_uint_private+0x54 sys__umtx_op+0x7e amd64_syscall+0x12e fast_syscall_common+0xf8
2067 101038 bacula-sd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d umtxq_sleep+0x242 do_wait+0x26b __umtx_op_wait_uint_private+0x54 sys__umtx_op+0x7e amd64_syscall+0x12e fast_syscall_common+0xf8
2067 124485 bacula-sd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _cv_timedwait_sig_sbt+0x15c kern_poll_kfds+0x457 kern_poll+0x9f sys_poll+0x50 amd64_syscall+0x12e fast_syscall_common+0xf8

ler in 🌐 borg in ~ via C v14.0.5-clang on ☁️ (us-east-1)
❯ sudo procstat -kk 2072
 PID    TID COMM      TDNAME KSTACK
2072 100677 bacula-fd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_wait_sig+0x9 _cv_wait_sig+0x137 kern_select+0x9fe sys_select+0x56 amd64_syscall+0x12e fast_syscall_common+0xf8
2072 101039 bacula-fd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d umtxq_sleep+0x242 do_wait+0x26b __umtx_op_wait_uint_private+0x54 sys__umtx_op+0x7e amd64_syscall+0x12e fast_syscall_common+0xf8
2072 101040 bacula-fd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d umtxq_sleep+0x242 do_wait+0x26b __umtx_op_wait_uint_private+0x54 sys__umtx_op+0x7e amd64_syscall+0x12e fast_syscall_common+0xf8
2072 124490 bacula-fd -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _cv_timedwait_sig_sbt+0x15c kern_poll_kfds+0x457 kern_poll+0x9f sys_poll+0x50 amd64_syscall+0x12e fast_syscall_common+0xf8

ler in 🌐 borg in ~ via C v14.0.5-clang on ☁️ (us-east-1)
❯ sudo procstat -kk 2075
 PID    TID COMM       TDNAME KSTACK
2075 101007 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_wait_sig+0x9 _sleep+0x29b umtxq_sleep+0x242 do_wait+0x26b __umtx_op_wait+0x53 sys__umtx_op+0x7e amd64_syscall+0x12e fast_syscall_common+0xf8
2075 101041 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d umtxq_sleep+0x242 do_wait+0x26b __umtx_op_wait_uint_private+0x54 sys__umtx_op+0x7e amd64_syscall+0x12e fast_syscall_common+0xf8
2075 101045 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_wait_sig+0x9 _cv_wait_sig+0x137 kern_select+0x9fe sys_select+0x56 amd64_syscall+0x12e fast_syscall_common+0xf8
2075 101046 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d umtxq_sleep+0x242 do_wait+0x26b __umtx_op_wait_uint_private+0x54 sys__umtx_op+0x7e amd64_syscall+0x12e fast_syscall_common+0xf8
2075 101047 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d kern_clock_nanosleep+0x1d1 sys_nanosleep+0x3b amd64_syscall+0x12e fast_syscall_common+0xf8
2075 124479 bacula-dir - mi_sw
Hangs in bacula / NFS? on recent Current
479 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_wait_sig+0x9 _cv_wait_sig+0x137 kern_poll_kfds+0x48c kern_poll+0x9f sys_poll+0x50 amd64_syscall+0x12e fast_syscall_common+0xf8
2075 124480 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d kern_clock_nanosleep+0x1d1 sys_nanosleep+0x3b amd64_syscall+0x12e fast_syscall_common+0xf8
2075 124481 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d kern_clock_nanosleep+0x1d1 sys_nanosleep+0x3b amd64_syscall+0x12e fast_syscall_common+0xf8
2075 124489 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _cv_timedwait_sig_sbt+0x15c kern_poll_kfds+0x457 kern_poll+0x9f sys_poll+0x50 amd64_syscall+0x12e fast_syscall_common+0xf8
2075 124506 bacula-dir -      mi_switch+0x157 sleepq_switch+0x107 sleepq_catch_signals+0x266 sleepq_timedwait_sig+0x12 _sleep+0x27d kern_clock_nanosleep+0x1d1 sys_nanosleep+0x3b amd64_syscall+0x12e fast_syscall_common+0xf8

ler in 🌐 borg in ~ via C v14.0.5-clang on ☁️ (us-east-1)
❯ sudo procstat -kk 66390
  PID    TID COMM TDNAME KSTACK
66390 101514 df   -      mi_switch+0x157 sleepq_switch+0x107 sleepq_timedwait+0x4b _sleep+0x28e clnt_reconnect_call+0x809 newnfs_request+0xa95 nfscl_request+0x5a nfsrpc_statfs+0x19d nfs_statfs+0x148 vfs_statfs_sigdefer+0x2e kern_getfsstat+0x1f1 sys_getfsstat+0x22 amd64_syscall+0x12e fast_syscall_common+0xf8

ler in 🌐 borg in ~ via C v14.0.5-clang on ☁️ (us-east-1)
❯

This was built yesterday:

❯ uname -a
FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #142 ler/freebsd-main-changes-n257453-175a127a72f: Wed Aug 17 09:23:32 CDT 2022     r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL amd64

ler in 🌐 borg in ~ via C v14.0.5-clang on ☁️ (us-east-1)
❯

What else do we need?
-- Larry Rosenman
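In the procstat output above, the bacula threads sit in ordinary select/poll/umtx/nanosleep waits, while the df thread is sleeping in clnt_reconnect_call under nfsrpc_statfs, which suggests it is waiting on a statfs RPC to the NFS server rather than on anything local. A minimal sketch (hypothetical function name) that picks such NFS-blocked threads out of `procstat -kk` output:

```shell
#!/bin/sh
# nfs_waiters (hypothetical): read `procstat -kk` output on stdin and
# print PID, TID and command of each thread whose kernel stack contains
# an NFS client RPC frame (newnfs_request or clnt_reconnect_call).
nfs_waiters() {
    awk '/newnfs_request|clnt_reconnect_call/ { print $1, $2, $3 }'
}
```

Run system-wide as `procstat -kk -a | nfs_waiters`; against the output above it would list only the hung df (PID 66390, TID 101514).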
Re: Updating EFI boot loader results in boot hangup
Boot off a memstick?

On 08/12/2022 3:25 pm, Nuno Teixeira wrote:
The problem is if boot is failing, how to mount and rename it?
I'm looking for a way, if possible, to boot directly bkp boot64x in case of failure. I was hoping to find it in loader(8) or uefi(8)...

On Friday, 12/08/2022 at 21:09, Larry Rosenman wrote:
I would assume just rename the bootx64.old to bootx64.efi and/or put it in a different directory that EFI can see

On 08/12/2022 3:03 pm, Nuno Teixeira wrote:
I'm searching without success to load a bkp loader in case of boot failure.
Upgrade process will be like:
---
mount -t msdosfs /dev/nvd0p1 /mnt
cp /mnt/efi/boot/bootx64.efi /mnt/efi/boot/bootx64.old
cp /boot/loader.efi /mnt/efi/boot/bootx64.efi
---
I can't find the right docs to load bootx64.old.
Could you tell me what you did to solve your boot?
Thanks

On Friday, 12/08/2022 at 18:45, Yasuhiro Kimura wrote:
From: Nuno Teixeira
Subject: Re: Updating EFI boot loader results in boot hangup
Date: Fri, 12 Aug 2022 18:26:11 +0100

Hello Yasu,
Does it need to update boot loader every time that we upgrade current?

No, you need not.

The only time that I updated was a month ago because of zfs upgrade and I need to practice how to boot loader bkp file :)

I update boot loader every time because I'd like to do it :-). And sometimes problem hits upon me like this time and I contribute to debugging base system :-):-).
---
Yasuhiro Kimura

-- Nuno Teixeira FreeBSD Committer (ports)
-- Larry Rosenman
-- Nuno Teixeira FreeBSD Committer (ports)
-- Larry Rosenman
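Nuno's quoted update steps and Larry's recovery suggestion (boot from a memstick, mount the ESP, rename bootx64.old back to bootx64.efi) can be sketched as two shell functions. The function names are hypothetical; the efi/boot paths come from the quoted commands, and the sketch assumes the firmware boots the default efi/boot/bootx64.efi rather than a separate boot entry:

```shell
#!/bin/sh
# $1 is the directory where the EFI system partition is mounted
# (e.g. /mnt after `mount -t msdosfs /dev/nvd0p1 /mnt`).

# install_loader (hypothetical): keep the current loader as bootx64.old,
# then install the new one as bootx64.efi.
install_loader() {   # $1 = ESP mount point, $2 = new loader (e.g. /boot/loader.efi)
    esp_boot="$1/efi/boot"
    cp "$esp_boot/bootx64.efi" "$esp_boot/bootx64.old" || return 1
    cp "$2" "$esp_boot/bootx64.efi"
}

# restore_loader (hypothetical): put the saved copy back under the
# default bootx64.efi name, e.g. from a memstick-booted system.
restore_loader() {   # $1 = ESP mount point
    esp_boot="$1/efi/boot"
    cp "$esp_boot/bootx64.old" "$esp_boot/bootx64.efi"
}
```

A normal update would then be `install_loader /mnt /boot/loader.efi` after mounting the ESP; if the new loader hangs, boot the memstick, mount the same partition, and run `restore_loader /mnt`.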
Re: Updating EFI boot loader results in boot hangup
I would assume just rename the bootx64.old to bootx64.efi and/or put it in a different directory that EFI can see.

On 08/12/2022 3:03 pm, Nuno Teixeira wrote:
I'm searching without success to load a bkp loader in case of boot failure.
Upgrade process will be like:
---
mount -t msdosfs /dev/nvd0p1 /mnt
cp /mnt/efi/boot/bootx64.efi /mnt/efi/boot/bootx64.old
cp /boot/loader.efi /mnt/efi/boot/bootx64.efi
---
I can't find the right docs to load bootx64.old.
Could you tell me what you did to solve your boot?
Thanks

On Friday, 12/08/2022 at 18:45, Yasuhiro Kimura wrote:
From: Nuno Teixeira
Subject: Re: Updating EFI boot loader results in boot hangup
Date: Fri, 12 Aug 2022 18:26:11 +0100

Hello Yasu,
Does it need to update boot loader every time that we upgrade current?

No, you need not.

The only time that I updated was a month ago because of zfs upgrade and I need to practice how to boot loader bkp file :)

I update boot loader every time because I'd like to do it :-). And sometimes problem hits upon me like this time and I contribute to debugging base system :-):-).
---
Yasuhiro Kimura

-- Nuno Teixeira FreeBSD Committer (ports)

-- Larry Rosenman
Re: limits.conf/stacksize doesn't seem to work?
On 07/15/2022 5:32 pm, Mark Johnston wrote:
On Fri, Jul 15, 2022 at 05:26:09PM -0500, Larry Rosenman wrote:
On 07/15/2022 5:24 pm, Mark Johnston wrote:
> On Fri, Jul 15, 2022 at 05:21:27PM -0500, Larry Rosenman wrote:
>> On 07/15/2022 5:18 pm, Mark Johnston wrote:
>> > On Fri, Jul 15, 2022 at 05:04:18PM -0500, Larry Rosenman wrote:
>> >> I'm using the following kernel config:
>> >> [...]
>> >> and the following login.conf:
>> >> [...]
>> >> bacula_dir:\
>> >> :stacksize-max=68719476736:\
>> >> :stacksize-cur=68719476736:\
>> >> :tc=daemon:
>> >> [...]
>> >> I've updated my (ler) password entry to reference bacula_dir:
>> >> ler::1001:1001:bacula_dir:0:0:Larry
>> >> Rosenman:/home/ler:/usr/local/bin/zsh
>> >>
>> >>
>> >> when I ssh in, the stacklimit is still:
>> >> ❯ ulimit -H -s
>> >> 2097152
>> >
>> > What is the value of the kern.maxssiz sysctl on this system?
>> >
>> >> ler in 🌐 borg in sys/amd64/conf🔒 on ler/freebsd-main-changes:main on
>> >> ☁️ (us-east-1)
>> >> ❯ ulimit -S -s
>> >> 2097152
>> >>
>> >> ler in 🌐 borg in sys/amd64/conf🔒 on ler/freebsd-main-changes:main on
>> >> ☁️ (us-east-1)
>> >> ❯
>> >>
>> >> Where does this number come from? What am I missing here?
>> >
>> > The stack limit cannot be set to an arbitrarily large number. It will
>> > silently be clamped to maxssiz.
>>
>> ❯ sysctl kern.maxssiz
>> kern.maxssiz: 2147483648
>
> Then what you're seeing is expected. The kernel is clamping the stack
> segment limit to 2GB.

I assume this is the default for MAXSSIZ? and if I change that in the kernel config, it will allow bigger? Where is this default defined?

The default value is platform dependent. On amd64 it's 512MB, so I'm not sure where your value is coming from. It's defined in a header. You can set it in the kernel configuration, or as a tunable or sysctl.

OK, so I had (back when, heaven only knows) set it in /boot/loader.conf:
kern.maxssiz="2147483648"

Thank you.
-- Larry Rosenman
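The numbers in this thread line up once the units are accounted for: login.conf sizes are in bytes, so the class asks for 68719476736 bytes (64 GiB); the kernel silently clamps that to kern.maxssiz (2147483648 bytes, 2 GiB, from the old loader.conf line); and `ulimit -s` reports 1024-byte units, giving 2147483648 / 1024 = 2097152. A small worked sketch with a hypothetical helper name:

```shell
#!/bin/sh
# effective_stack_kb (hypothetical): the stack limit `ulimit -s` should
# show, given the login.conf request and kern.maxssiz, both in bytes.
effective_stack_kb() {  # $1 = requested bytes, $2 = kern.maxssiz bytes
    req=$1
    max=$2
    # The kernel silently clamps the request to maxssiz.
    if [ $((req > max)) -eq 1 ]; then
        req=$max
    fi
    # ulimit -s reports 1024-byte units.
    echo $((req / 1024))
}
```

`effective_stack_kb 68719476736 2147483648` gives the 2097152 seen above; with the 512MB amd64 default Mark mentions (536870912 bytes) it would be 524288. On a live system the second argument can come from `sysctl -n kern.maxssiz`.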
Re: limits.conf/stacksize doesn't seem to work?
On 07/15/2022 5:24 pm, Mark Johnston wrote:
On Fri, Jul 15, 2022 at 05:21:27PM -0500, Larry Rosenman wrote:
On 07/15/2022 5:18 pm, Mark Johnston wrote:
> On Fri, Jul 15, 2022 at 05:04:18PM -0500, Larry Rosenman wrote:
>> I'm using the following kernel config:
>> [...]
>> and the following login.conf:
>> [...]
>> bacula_dir:\
>> :stacksize-max=68719476736:\
>> :stacksize-cur=68719476736:\
>> :tc=daemon:
>> [...]
>> I've updated my (ler) password entry to reference bacula_dir:
>> ler::1001:1001:bacula_dir:0:0:Larry
>> Rosenman:/home/ler:/usr/local/bin/zsh
>>
>>
>> when I ssh in, the stacklimit is still:
>> ❯ ulimit -H -s
>> 2097152
>
> What is the value of the kern.maxssiz sysctl on this system?
>
>> ler in 🌐 borg in sys/amd64/conf🔒 on ler/freebsd-main-changes:main on
>> ☁️ (us-east-1)
>> ❯ ulimit -S -s
>> 2097152
>>
>> ler in 🌐 borg in sys/amd64/conf🔒 on ler/freebsd-main-changes:main on
>> ☁️ (us-east-1)
>> ❯
>>
>> Where does this number come from? What am I missing here?
>
> The stack limit cannot be set to an arbitrarily large number. It will
> silently be clamped to maxssiz.

❯ sysctl kern.maxssiz
kern.maxssiz: 2147483648

Then what you're seeing is expected. The kernel is clamping the stack segment limit to 2GB.

I assume this is the default for MAXSSIZ? and if I change that in the kernel config, it will allow bigger? Where is this default defined?

-- Larry Rosenman
Re: limits.conf/stacksize doesn't seem to work?
On 07/15/2022 5:18 pm, Mark Johnston wrote:
On Fri, Jul 15, 2022 at 05:04:18PM -0500, Larry Rosenman wrote:
I'm using the following kernel config:
[...]
and the following login.conf:
[...]
bacula_dir:\
:stacksize-max=68719476736:\
:stacksize-cur=68719476736:\
:tc=daemon:
[...]
I've updated my (ler) password entry to reference bacula_dir:
ler::1001:1001:bacula_dir:0:0:Larry Rosenman:/home/ler:/usr/local/bin/zsh

when I ssh in, the stacklimit is still:
❯ ulimit -H -s
2097152

What is the value of the kern.maxssiz sysctl on this system?

ler in 🌐 borg in sys/amd64/conf🔒 on ler/freebsd-main-changes:main on ☁️ (us-east-1)
❯ ulimit -S -s
2097152

ler in 🌐 borg in sys/amd64/conf🔒 on ler/freebsd-main-changes:main on ☁️ (us-east-1)
❯

Where does this number come from? What am I missing here?

The stack limit cannot be set to an arbitrarily large number. It will silently be clamped to maxssiz.

❯ sysctl kern.maxssiz
kern.maxssiz: 2147483648

-- Larry Rosenman
limits.conf/stacksize doesn't seem to work?
:maxproc-cur=64:\
#	:openfiles-cur=64:\
#	:priority=0:\
#	:requirehome@:\
#	:umask=022:\
#	:tc=auth-defaults:
#
#
##
## standard - standard user defaults
##
#standard:\
#	:copyright=/etc/COPYRIGHT:\
#	:welcome=/var/run/motd:\
#	:setenv=BLOCKSIZE=K:\
#	:mail=/var/mail/$:\
#	:path=~/bin /bin /usr/bin /usr/local/bin:\
#	:manpath=/usr/share/man /usr/local/man:\
#	:nologin=/var/run/nologin:\
#	:cputime=1h30m:\
#	:datasize=8M:\
#	:vmemoryuse=100M:\
#	:stacksize=2M:\
#	:memorylocked=4M:\
#	:memoryuse=8M:\
#	:filesize=8M:\
#	:coredumpsize=8M:\
#	:openfiles=24:\
#	:maxproc=32:\
#	:priority=0:\
#	:requirehome:\
#	:passwordtime=90d:\
#	:umask=002:\
#	:ignoretime@:\
#	:tc=default:
#
#
##
## users of X (needs more resources!)
##
#xuser:\
#	:manpath=/usr/share/man /usr/local/man:\
#	:cputime=4h:\
#	:datasize=12M:\
#	:vmemoryuse=infinity:\
#	:stacksize=4M:\
#	:filesize=8M:\
#	:memoryuse=16M:\
#	:openfiles=32:\
#	:maxproc=48:\
#	:tc=standard:
#
#
##
## Staff users - few restrictions and allow login anytime
##
#staff:\
#	:ignorenologin:\
#	:ignoretime:\
#	:requirehome@:\
#	:accounted@:\
#	:path=~/bin /bin /sbin /usr/bin /usr/sbin /usr/local/bin /usr/local/sbin:\
#	:umask=022:\
#	:tc=standard:
#
#
##
## root - fallback for root logins
##
#root:\
#	:path=~/bin /bin /sbin /usr/bin /usr/sbin /usr/local/bin /usr/local/sbin:\
#	:cputime=infinity:\
#	:datasize=infinity:\
#	:stacksize=infinity:\
#	:memorylocked=infinity:\
#	:memoryuse=infinity:\
#	:filesize=infinity:\
#	:coredumpsize=infinity:\
#	:openfiles=infinity:\
#	:maxproc=infinity:\
#	:memoryuse-cur=32M:\
#	:maxproc-cur=64:\
#	:openfiles-cur=1024:\
#	:priority=0:\
#	:requirehome@:\
#	:umask=022:\
#	:tc=auth-root-defaults:
#
#
##
## Settings used by /etc/rc
##
#daemon:\
#	:coredumpsize@:\
#	:coredumpsize-cur=0:\
#	:datasize=infinity:\
#	:datasize-cur@:\
#	:maxproc=512:\
#	:maxproc-cur@:\
#	:memoryuse-cur=64M:\
#	:memorylocked-cur=64M:\
#	:openfiles=1024:\
#	:openfiles-cur@:\
#	:stacksize=16M:\
#	:stacksize-cur@:\
#	:tc=default:
#
#
##
## Settings used by news subsystem
##
#news:\
#	:path=/usr/local/news/bin /bin /sbin /usr/bin /usr/sbin /usr/local/bin /usr/local/sbin:\
#	:cputime=infinity:\
#	:filesize=128M:\
#	:datasize-cur=64M:\
#	:stacksize-cur=32M:\
#	:coredumpsize-cur=0:\
#	:maxmemorysize-cur=128M:\
#	:memorylocked=32M:\
#	:maxproc=128:\
#	:openfiles=256:\
#	:tc=default:
#
#
##
## The dialer class should be used for a dialup PPP account
## Welcome messages/news suppressed
##
#dialer:\
#	:hushlogin:\
#	:requirehome@:\
#	:cputime=unlimited:\
#	:filesize=2M:\
#	:datasize=2M:\
#	:stacksize=4M:\
#	:coredumpsize=0:\
#	:memoryuse=4M:\
#	:memorylocked=1M:\
#	:maxproc=16:\
#	:openfiles=32:\
#	:tc=standard:
#
#
##
## Site full-time 24/7 PPP connection
## - no time accounting, restricted to access via dialin lines
##
#site:\
#	:ignoretime:\
#	:passwordtime@:\
#	:refreshtime@:\
#	:refreshperiod@:\
#	:sessionlimit@:\
#	:autodelete@:\
#	:expireperiod@:\
#	:graceexpire@:\
#	:gracetime@:\
#	:warnexpire@:\
#	:warnpassword@:\
#	:idletime@:\
#	:sessiontime@:\
#	:daytime@:\
#	:weektime@:\
#	:monthtime@:\
#	:warntime@:\
#	:accounted@:\
#	:tc=dialer:\
#	:tc=staff:
#
#
##
## Example standard accounting entries for subscriber levels
##
#
#subscriber|Subscribers:\
#	:accounted:\
#	:refreshtime=180d:\
#	:refreshperiod@:\
#	:sessionlimit@:\
#	:autodelete=30d:\
#	:expireperiod=180d:\
#	:graceexpire=7d:\
#	:gracetime=10m:\
#	:warnexpire=7d:\
#	:warnpassword=7d:\
#	:idletime=30m:\
#	:sessiontime=4h:\
#	:daytime=6h:\
#	:weektime=40h:\
#	:monthtime=120h:\
#	:warntime=4h:\
#	:tc=standard:
#
#
##
## Subscriber accounts. These accounts have their login times
## accounted and have access limits applied.
##
#subppp|PPP Subscriber Accounts:\
#	:tc=dialer:\
#	:tc=subscriber:
#
#
#subshell|Shell Subscriber Accounts:\
#	:tc=subscriber:
#
##
## If you want some of the accounts to use traditional UNIX DES based
## password hashes.
##
#des_users:\
#	:passwd_format=des:\
#	:tc=default:

ler in 🌐 borg in sys/amd64/conf🔒 on ler/freebsd-main-changes:main on ☁️ (us-east-1)
❯

I've updated my (ler) password entry to reference bacula_dir:
ler::1001:1001:bacula_dir:0:0:Larry Rosenman:/home/ler:/usr/loc
Re: SAS/SATA controllers: 8 port that support 8TB Drives
On 06/18/2022 8:30 am, Michael Gmelin wrote:
That certainly sounds promising.

Best
Michael

Got the new controllers, and no sweat -- saw all 8TB on the drives (modulo one bad drive -- seller is replacing).

Thanks all.

-- Larry Rosenman
Re: MCE: Does this look possibly like a slot issue?
On 06/21/2022 1:23 pm, Chris wrote:
On 2022-06-20 17:23, Larry Rosenman wrote:
I'm seeing them constantly:

FWIW it looks like a sync(ing) problem between your RAM && CPU cache. Are your clocks set correctly for your CPU && RAM? Is your CPU too hot? Is the CPU cache ECC?

root@freenas[~]# mcelog --dmi
[snip]

Hrm. IIRC all the BIOS parameters are default (I could be mistaken).

It's a SuperMicro X8DTN+ motherboard with:

CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz (2400.22-MHz K8-class CPU)
  Origin="GenuineIntel" Id=0x206c2 Family=0x6 Model=0x2c Stepping=2
  Features=0xbfebfbff
  Features2=0x29ee3ff
  AMD Features=0x2c100800
  AMD Features2=0x1
  Structured Extended Features3=0x9c00
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory = 77309411328 (73728 MB)
avail memory = 75186962432 (71703 MB)

(2 packages, 6 cores, 12 threads each) and 18 4GB sticks.

This ONE slot seems to be a problem. How would you recommend looking for an issue, modulo pulling the 2 CPU packages?

-- Larry Rosenman
Re: MCE: Does this look possibly like a slot issue?
Hardware event. This is not a software error. MCE 6 CPU 12 BANK 8 TSC 5f6cbe9ef2bc MISC ac29890200041181 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 81 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 20 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. This is not a software error. MCE 7 CPU 14 BANK 8 TSC 64ba63c66e52 MISC ac29890200041181 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 81 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. This is not a software error. 
MCE 8 CPU 14 BANK 8 TSC 659878c17622 MISC ac29890200040282 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 82 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. This is not a software error. MCE 9 CPU 14 BANK 8 TSC 66b71c1dccf6 MISC ac29890200040183 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 83 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. This is not a software error. MCE 10 CPU 14 BANK 8 TSC 6be0988610ce MISC ac29890200040682 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 82 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE Hardware event. This is not a software error. 
MCE 11 CPU 14 BANK 8 TSC 6be0995926f8 MISC ac29890200044000 ADDR ee2f6e800 TIME 1655827951 Tue Jun 21 11:12:31 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 0 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 22 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Nanya Serial Number: 642264CD Asset Tag: Part Number: NT4GC72B4NA1NL-BE root@freenas[~]#

On 06/21/2022 11:06 am, Rodney W. Grimes wrote:
Swapped 2 DIMMS, now we wait for the ZFS ARC to fill and start using all the memory.

Depending on the results of that, one thing that is often overlooked when trying to troubleshoot memory systems in modern Intel systems is the fact that the DIMM now talks directly to the CPU chip that has the memory controller built into it. THUS these "slot" related ECC/Parity/blowup errors can actually be the CPU and/or the CPU socket and/or the seating of the CPU in the socket. So if the error sticks with the DIMM slot and not the DIMM module, the next thing I would try would be a CPU chip reseat, including a good inspection of the socket for a damaged pin. Also look at the lands on the CPU chip itself, and you can even try swapping CPU chips to see if it follows the CPU or the socket, much as you do with a DIMM.

On 06/20/2022 7:59 pm, Larry Rosenman wrote: > Sup
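Both suggestions in this thread (Ultima's "swap the module and see whether the error migrates" and Rodney's "if it sticks with the slot, look at the CPU and socket") come down to comparing which slot and which module each record names before and after a swap. In the dumps above, the Device Locator stays P2-DIMM2C while the serial changes from 40F3C20F (Hyundai) to 642264CD (Nanya), and every record carries the same ADDR. The sketch below is a hypothetical Python helper, not part of mcelog, that tallies flattened `mcelog --dmi` text by (Device Locator, Serial Number) and collects the distinct ADDR values. It assumes the field labels shown in these dumps; note that mcelog itself warns that SMBIOS data is often unreliable, so the locator names are a hint rather than proof.

```python
# Hypothetical helper: tally flattened `mcelog --dmi` text by DIMM slot and
# module serial number, to check whether corrected errors follow the slot
# or the module after a swap. Records without DMI data count as ("?", "?").
import re
from collections import Counter
from typing import Set

RECORD_START = "Hardware event. This is not a software error."
LOCATOR = re.compile(r"Device Locator:\s*(\S+)")
SERIAL = re.compile(r"Serial Number:\s*(\S+)")
ADDR = re.compile(r"\bADDR\s+([0-9a-f]+)")


def tally(log_text: str) -> Counter:
    """Count records per (Device Locator, Serial Number) pair."""
    counts: Counter = Counter()
    for record in log_text.split(RECORD_START)[1:]:
        loc = LOCATOR.search(record)
        ser = SERIAL.search(record)
        counts[(loc.group(1) if loc else "?", ser.group(1) if ser else "?")] += 1
    return counts


def addresses(log_text: str) -> Set[str]:
    """Distinct physical addresses (ADDR field) reported in the log."""
    return {m.group(1) for m in ADDR.finditer(log_text)}
```

Run against saved output from before and after a swap: if the same locator keeps appearing with a new serial, the error is following the slot (and, per Rodney's note, possibly the CPU or socket) rather than the module. The scrub-time record in the earlier dump has no DMI block, so it would be counted under `("?", "?")`.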
Re: MCE: Does this look possibly like a slot issue?
Swapped 2 DIMMS, now we wait for the ZFS ARC to fill and start using all the memory. On 06/20/2022 7:59 pm, Larry Rosenman wrote: SuperMicro X8DTN+ 2 Processors, 6-core/12-Thread. CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz (2400.20-MHz K8-class CPU) I'll bring it down and swap DIMMS around On 06/20/2022 7:57 pm, Ultima wrote: Hey Larry, One red flag I am seeing is that the error is being produced on the same CPU/bank with each error you have provided so far. Can you try and follow my original recommendation and swap currently installed DIMM with the problem DIMM slot and see if anything changes? Can you also provide the motherboard model? Also, do you have multiple CPUs installed in this system? Best regards, Richard Gallamore On Mon, Jun 20, 2022 at 5:41 PM Larry Rosenman wrote: Yes and Yes. On 06/20/2022 7:37 pm, Ultima wrote: Are you sure that the module you replaced it with was good? Are you sure you replaced the correct module? Best regards, Richard Gallamore On Mon, Jun 20, 2022 at 5:23 PM Larry Rosenman wrote: I'm seeing them constantly: root@freenas[~]# mcelog --dmi Hardware event. This is not a software error. MCE 0 CPU 22 BANK 8 TSC 20aab486464a MISC ac29890200046444 ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 44 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 WARNING: SMBIOS data is often unreliable. Take with a grain of salt! DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. 
MCE 1 CPU 22 BANK 8 TSC 296dfcc82582 MISC ac29890200041381 ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 81 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. MCE 2 CPU 22 BANK 8 TSC 2a5604a6a070 MISC ac29890200044281 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory ECC error occurred during scrub Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 81 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 884200cf MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 Hardware event. This is not a software error. MCE 3 CPU 22 BANK 8 TSC 31e141418eb8 MISC ac29890200046a4a ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 4a Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. 
MCE 4 CPU 22 BANK 8 TSC 3a014afee106 MISC ac29890200046646 ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 46 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. MCE 5 CPU 22 BANK 8 TSC 41d1dbef1a6a MISC ac29890200046141 ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 41 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyun
Re: MCE: Does this look possibly like a slot issue?
SuperMicro X8DTN+ 2 Processors, 6-core/12-Thread. CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz (2400.20-MHz K8-class CPU) I'll bring it down and swap DIMMS around On 06/20/2022 7:57 pm, Ultima wrote: Hey Larry, One red flag I am seeing is that the error is being produced on the same CPU/bank with each error you have provided so far. Can you try and follow my original recommendation and swap currently installed DIMM with the problem DIMM slot and see if anything changes? Can you also provide the motherboard model? Also, do you have multiple CPUs installed in this system? Best regards, Richard Gallamore On Mon, Jun 20, 2022 at 5:41 PM Larry Rosenman wrote: Yes and Yes. On 06/20/2022 7:37 pm, Ultima wrote: Are you sure that the module you replaced it with was good? Are you sure you replaced the correct module? Best regards, Richard Gallamore On Mon, Jun 20, 2022 at 5:23 PM Larry Rosenman wrote: I'm seeing them constantly: root@freenas[~]# mcelog --dmi Hardware event. This is not a software error. MCE 0 CPU 22 BANK 8 TSC 20aab486464a MISC ac29890200046444 ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 44 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 WARNING: SMBIOS data is often unreliable. Take with a grain of salt! DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. 
MCE 1 CPU 22 BANK 8 TSC 296dfcc82582 MISC ac29890200041381 ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 81 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. MCE 2 CPU 22 BANK 8 TSC 2a5604a6a070 MISC ac29890200044281 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory ECC error occurred during scrub Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 81 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 884200cf MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 Hardware event. This is not a software error. MCE 3 CPU 22 BANK 8 TSC 31e141418eb8 MISC ac29890200046a4a ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 4a Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. 
MCE 4 CPU 22 BANK 8 TSC 3a014afee106 MISC ac29890200046646 ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 46 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. MCE 5 CPU 22 BANK 8 TSC 41d1dbef1a6a MISC ac29890200046141 ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 41 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. MCE 6 CPU 22
Re: MCE: Does this look possibly like a slot issue?
Yes and Yes. On 06/20/2022 7:37 pm, Ultima wrote: Are you sure that the module you replaced it with was good? Are you sure you replaced the correct module? Best regards, Richard Gallamore On Mon, Jun 20, 2022 at 5:23 PM Larry Rosenman wrote: I'm seeing them constantly: root@freenas[~]# mcelog --dmi Hardware event. This is not a software error. MCE 0 CPU 22 BANK 8 TSC 20aab486464a MISC ac29890200046444 ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 44 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 WARNING: SMBIOS data is often unreliable. Take with a grain of salt! DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. MCE 1 CPU 22 BANK 8 TSC 296dfcc82582 MISC ac29890200041381 ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 81 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. 
MCE 2 CPU 22 BANK 8 TSC 2a5604a6a070 MISC ac29890200044281 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory ECC error occurred during scrub Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 81 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 884200cf MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 Hardware event. This is not a software error. MCE 3 CPU 22 BANK 8 TSC 31e141418eb8 MISC ac29890200046a4a ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 4a Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. MCE 4 CPU 22 BANK 8 TSC 3a014afee106 MISC ac29890200046646 ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 46 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. 
MCE 5 CPU 22 BANK 8 TSC 41d1dbef1a6a MISC ac29890200046141 ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 41 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. MCE 6 CPU 22 BANK 8 TSC 4a1b1ecef446 MISC ac29890200046a4a ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 4a Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. MCE 7 CPU 22 BANK 8 TSC 527bc27db776
Re: MCE: Does this look possibly like a slot issue?
error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 Hardware event. This is not a software error. MCE 8 CPU 22 BANK 8 TSC 5aa4ecdd795a MISC ac29890200046646 ADDR ee2f6e800 TIME 1655770989 Mon Jun 20 19:23:09 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 46 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9 root@freenas[~]#

and I replaced the DIMM yesterday :(

On 06/20/2022 7:19 pm, Ultima wrote:
Hey Larry,
It is possible it's the motherboard itself, but it's rare. The way I would determine this is to swap the DIMM module with another populated slot on the motherboard and see if the error migrated to the new slot or not. Also, this error doesn't necessarily mean there is a problem that needs to be addressed. If you have been running the system for many months and you see ECC errors a handful of times, it can probably be safely ignored.
Best regards,
Richard Gallamore

On Mon, Jun 20, 2022 at 3:14 PM Larry Rosenman wrote:
I've gotten a BUNCH of these on my TrueNAS server. I've replaced this DIMM a couple of times, and still the MCEs continue. Is it possible it's a motherboard slot issue?
Hardware event. This is not a software error.
MCE 8 CPU 22 BANK 8 TSC 5aa4ecdd795a MISC ac29890200046646 ADDR ee2f6e800 TIME 1655762472 Mon Jun 20 17:01:12 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 46 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9
MCE: Does this look possibly like a slot issue?
I've gotten a BUNCH of these on my TrueNAS server. I've replaced this DIMM a couple of times, and still the MCEs continue. Is it possible it's a motherboard slot issue?

Hardware event. This is not a software error. MCE 8 CPU 22 BANK 8 TSC 5aa4ecdd795a MISC ac29890200046646 ADDR ee2f6e800 TIME 1655762472 Mon Jun 20 17:01:12 2022 MCG status: Memory read ECC error Memory corrected error count (CORE_ERR_CNT): 1 Memory transaction Tracker ID (RTId): 46 Memory DIMM ID of error: 0 Memory channel ID of error: 1 Memory ECC syndrome: ac298902 STATUS 8c41009f MCGSTATUS 0 MCGCAP 1c09 APICID 34 SOCKETID 0 CPUID Vendor Intel Family 6 Model 44 Step 2 DDR3 DIMM 800 Mhz Other Width 72 Data Width 64 Size 4 GB Device Locator: P2-DIMM2C Bank Locator: BANK14 Manufacturer: Hyundai Serial Number: 40F3C20F Asset Tag: Part Number: HMT151R7BFR4C-H9
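mcelog labels most of these records "Memory read ECC error" and one "Memory ECC error occurred during scrub". That classification comes from the low 16 bits of the bank's status register (the MCA error code). Intel's Software Developer's Manual defines the memory-controller compound error code as the bit pattern `000F 0000 1MMM CCCC`, where MMM is the transaction type and CCCC the channel (1111 meaning "not specified"). The Python sketch below decodes only that field. The STATUS values as archived here (`8c41009f`, `884200cf`) look shorter than a full 64-bit register, so the sketch deliberately trusts nothing above bit 15; `decode_mc_error` is a hypothetical helper name.

```python
# Hypothetical sketch: decode the low 16 bits (MCA error code) of an
# IA32_MCi_STATUS value as an Intel "memory controller" compound error,
# whose bit pattern is 000F 0000 1MMM CCCC.
from typing import Optional, Tuple

# MMM: memory transaction type (values 5-7 are reserved)
TRANSACTION = {
    0b000: "generic undefined request",
    0b001: "memory read",
    0b010: "memory write",
    0b011: "address/command",
    0b100: "memory scrubbing",
}


def decode_mc_error(status: int) -> Optional[Tuple[str, Optional[int]]]:
    """Return (transaction type, channel or None), or None if not an MC error."""
    code = status & 0xFFFF                     # MCA error code field, bits 15:0
    # Bit 7 must be set; bits 15:13 and 11:8 must be clear (bit 12, F, ignored).
    if (code & 0xEF80) != 0x0080:
        return None                            # not a memory-controller error
    mmm = (code >> 4) & 0b111
    cccc = code & 0xF
    request = TRANSACTION.get(mmm, "reserved")
    channel = None if cccc == 0xF else cccc    # 1111 means "channel not specified"
    return request, channel
```

`decode_mc_error(0x8c41009f)` returns `("memory read", None)` and `decode_mc_error(0x884200cf)` returns `("memory scrubbing", None)`, which agrees with mcelog's wording. The "Memory channel ID of error: 1" that mcelog prints appears to come from the MISC register on this CPU family, which this sketch does not decode.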
Re: SAS/SATA controllers: 8 port that support 8TB Drives
On 06/18/2022 8:30 am, Michael Gmelin wrote:
On 18. Jun 2022, at 15:10, Larry Rosenman wrote:
On 06/18/2022 3:54 am, Michael Gmelin wrote: [SNIP]
Subvendor is Fujitsu Siemens, so I guess this is integrated into a system by them. Seems like flashing the 2108 to an IT firmware isn't an option (based on what I found online). You could check if there are firmware updates available though. How did you configure the drives in the megaraid utility (ctrl-h after boot)? Did you create a RAID-0 for each disk? And what capacity is shown in there? Based on [0], 2108-based controllers don't support 4kn. IT mode would help (true passthrough), but as written above, I don't think it's an option for this model.
-m
[0] https://bitdeals.tech/blogs/news/4kn-lsi-compatibility-list

As I said earlier in the thread, I've bought 2 of these: https://www.ebay.com/itm/194910024856 which, if I'm reading that chart right, should work with the 4Kn drives.

That certainly sounds promising.
Best
Michael

And I realized I didn't answer the question about how stuff was configured: each disk is RAID-0. The current pool is 10x3T disks, and I'm adding 6x8T since the pool is 70% full (Bacula backups, Time Machine backups, random other stuff).
Re: SAS/SATA controllers: 8 port that support 8TB Drives
On 06/18/2022 3:54 am, Michael Gmelin wrote: [SNIP]
Subvendor is Fujitsu Siemens, so I guess this is integrated into a system by them. Seems like flashing the 2108 to an IT firmware isn't an option (based on what I found online). You could check if there are firmware updates available though. How did you configure the drives in the megaraid utility (ctrl-h after boot)? Did you create a RAID-0 for each disk? And what capacity is shown in there? Based on [0], 2108-based controllers don't support 4kn. IT mode would help (true passthrough), but as written above, I don't think it's an option for this model.
-m
[0] https://bitdeals.tech/blogs/news/4kn-lsi-compatibility-list

As I said earlier in the thread, I've bought 2 of these: https://www.ebay.com/itm/194910024856 which, if I'm reading that chart right, should work with the 4Kn drives.
Re: SAS/SATA controllers: 8 port that support 8TB Drives
On 06/17/2022 6:20 pm, Michael Gmelin wrote:
On 18. Jun 2022, at 00:57, Larry Rosenman wrote:
On 06/17/2022 5:48 pm, Michael Gmelin wrote:
On 18. Jun 2022, at 00:31, Alexander Motin wrote:
On 17.06.2022 18:24, Alexander Motin wrote:
On 17.06.2022 18:16, Larry Rosenman wrote:
On 06/17/2022 5:08 pm, Alexander Motin wrote:
On 17.06.2022 11:59, Larry Rosenman wrote:
I'm looking to upgrade the controllers in my TrueNAS box to something that will support 8TB drives, because apparently my LSI 2108 controllers do not support 8TB drives. What's the community's recommendation? It needs to support SFF connectors, for a total of 4 SFF connectors, as I have 16 slots.

We at iX are still using LSI/Broadcom HBAs, just moved from the long-discontinued mps(4) to the newer mpr(4). And I don't believe the problem is directly related to capacity. According to my observations it may be Seagate HDDs of/above a certain (8TB) generation. We do not use Seagate HDDs in our products, so I have only heard about that instability from forums and TrueNAS community user reports.

This is a mfi(4) set of controllers, and a ST8Nm0045 8TB (CMR) drive. Is this a bad combo?

mfi0: 9973 (708793330s/0x0002/WARN) - PD 00(e0xfc/s3) is not supported
(probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00
(probe0:mfi0:0:0:0): CAM status: CCB request completed with an error
(probe0:mfi0:0:0:0): Retrying command, 3 more tries remain
(probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00
(probe0:mfi0:0:0:0): CAM status: CCB request completed with an error
(probe0:mfi0:0:0:0): Retrying command, 2 more tries remain
(probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00
(probe0:mfi0:0:0:0): CAM status: CCB request completed with an error
(probe0:mfi0:0:0:0): Retrying command, 1 more tries remain
(probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00
(probe0:mfi0:0:0:0): CAM status: CCB request completed with an error
(probe0:mfi0:0:0:0): Retrying command, 0 more tries remain
(probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00
(probe0:mfi0:0:0:0): CAM status: CCB request completed with an error
(probe0:mfi0:0:0:0): Error 5, Retries exhausted
mfi0 Physical Drives: 0 ( 932G) UNCONFIGURED GOOD serial=ZA1AC912> SATA E1:S3

mfi(4) are RAIDs, not HBAs. We do not recommend RAIDs with TrueNAS due to problems with hot-plug, disk identification, etc., and so have limited experience with them. But I know some LSI RAIDs can be reflashed into equivalent HBAs, so if they share the hardware, I can speculate that they may share some issues.

I've just noticed "932G" instead of "8000G". It is obviously a bigger problem than what we heard about for HBAs. It looks like the kind of problem that should not happen with HBAs, since they should not care about disk capacity.

What does `smartctl -a ` report (especially sector sizes)?
-m

-- Alexander Motin

It's not even making a mfid* node (it is a 4Kn disk)

Ok, that’s sad (and explains the wrong size calculation as 4096/512=8). Is this in HBA mode? (Like Alexander suggested, re-/crossflashing using an IT firmware might be an option). What controller / firmware image version is it?
-m

mfi0@pci0:8:0:0: class=0x010400 rev=0x05 hdr=0x00 vendor=0x1000 device=0x0079 subvendor=0x1734 subdevice=0x1176
    vendor = 'Broadcom / LSI'
    device = 'MegaRAID SAS 2108 [Liberator]'
    class = mass storage
    subclass = RAID
mfi1@pci0:3:0:0: class=0x010400 rev=0x05 hdr=0x00 vendor=0x1000 device=0x0079 subvendor=0x1734 subdevice=0x1176
    vendor = 'Broadcom / LSI'
    device = 'MegaRAID SAS 2108 [Liberator]'
    class = mass storage
    subclass = RAID

mfi0: port 0xd000-0xd0ff mem 0xfbc9c000-0xfbc9,0xfbcc -0xfbcf irq 26 at device 0.0 on pci3
mfi0: Using MSI
mfi0: Megaraid SAS driver Ver 4.23
mfi0: FW MaxCmds = 1008, limiting to 128
mfip0: on mfi0
mfi0: 10014 (708822708s/0x0020/info) - Shutdown command received from host
mfi0: 10015 (boot + 3s/0x0020/info) - Firmware initialization started (PCI ID 00 79/1000/1176/1734)
mfi0: 10016 (boot + 3s/0x0020/info) - Firmware version 2.130.353-2727
mfi0: 10017 (boot + 6s/0x0020/info) - Package version 12.12.0-0174
mfi0: 10018 (boot + 6s/0x0020/info) - Board Revision
mfi1: port 0xc000-0xc0ff mem 0xfba9c000-0xfba9,0xfbac -0xfbaf irq 16 at device 0.0 on pci8
mfi1: Using MSI
mfi1: Megaraid SAS driver Ver 4.23
mfi1: FW MaxCmds = 1008, limiting to 128
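Michael's "4096/512=8" remark can be checked numerically. The sketch below assumes, since the thread never states the exact block count, that the 8TB drive is sized by the IDEMA LBA1-03 formula for 4096-byte logical sectors. Taking that block count as if each block were 512 bytes lands almost exactly on the "932G" that mfi(4) printed; `idema_lba_4kn` is a hypothetical helper name.

```python
# Worked check of the "4096/512 = 8" explanation for the 932G size mfi(4)
# reported. Assumption: the drive's capacity follows the IDEMA LBA1-03
# formula; the thread does not state the exact block count.

def idema_lba_4kn(gb: int) -> int:
    """IDEMA LBA1-03 logical block count for a 4096-byte-sector drive of `gb` GB."""
    return 12_212_046 + 244_188 * (gb - 50)


blocks = idema_lba_4kn(8000)      # 1_953_506_646 logical blocks
true_bytes = blocks * 4096        # 8_001_563_222_016 bytes, i.e. "8 TB"
misread_bytes = blocks * 512      # same count, wrongly treated as 512-byte sectors

print(f"true size:  {true_bytes / 1e12:.2f} TB")          # 8.00 TB
print(f"misread as: {misread_bytes / 2**30:.1f} GiB")     # 931.5 GiB, shown as 932G
print(f"ratio:      {true_bytes // misread_bytes}")       # 8
```

The factor of exactly 8 between the two sizes is the 4096/512 sector-size ratio, which is consistent with the controller firmware treating a 4Kn disk's block count as 512-byte sectors.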
Re: SAS/SATA controllers: 8 port that support 8TB Drives
On 06/17/2022 5:48 pm, Michael Gmelin wrote: On 18. Jun 2022, at 00:31, Alexander Motin wrote: On 17.06.2022 18:24, Alexander Motin wrote: On 17.06.2022 18:16, Larry Rosenman wrote: On 06/17/2022 5:08 pm, Alexander Motin wrote: On 17.06.2022 11:59, Larry Rosenman wrote: I'm looking to upgrade the controllers in my TrueNAS box to something that will support 8TB drives, because apparently my LSI 2108 controllers do not support 8TB drives. What's the community's recommendation? It needs to support SFF connectors, for a total of 4 SFF connectors, as I have 16 slots. We at iX are still using LSI/Broadcom HBAs, and just moved from the long-discontinued mps(4) to the newer mpr(4). And I don't believe the problem is directly related to capacity. According to my observations it may be Seagate HDDs of/above a certain (8TB) generation. We do not use Seagate HDDs in our products, so I have only heard about that instability from forums and TrueNAS community user reports. This is an mfi(4) set of controllers, and a ST8Nm0045 8TB (CMR) drive. Is this a bad combo? mfi0: 9973 (708793330s/0x0002/WARN) - PD 00(e0xfc/s3) is not supported (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Retrying command, 3 more tries remain (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Retrying command, 2 more tries remain (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Retrying command, 1 more tries remain (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Retrying command, 0 more tries remain (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Error 5, Retries exhausted mfi0 Physical Drives: 0 ( 932G) UNCONFIGURED GOOD serial=ZA1AC912> SATA E1:S3 mfi(4) controllers are RAID controllers, not HBAs. We do not recommend RAID controllers with TrueNAS due to problems with hot-plug, disk identification, etc., and so have limited experience with them. But I know some LSI RAID controllers can be reflashed into equivalent HBAs, so if they share the hardware, I can speculate that they may share some issues. I've just noticed "932G" instead of "8000G". It is obviously a bigger problem than what we heard about for HBAs. It looks like the kind of problem that should not happen with HBAs, since they should not care about disk capacity. What does `smartctl -a ` report (especially sector sizes)? -m -- Alexander Motin It's not even creating an mfid* node (it is a 4Kn disk). -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
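One plausible reading of the "932G" figure, not confirmed anywhere in the thread: Larry notes the drive is 4Kn, and if the controller counted its 4096-byte logical sectors as if they were 512-byte sectors, an 8 TB disk would look like roughly 1 TB. The sketch below only checks that arithmetic in sh. The sector count, 1,953,506,646, is the usual figure for an 8 TB 4Kn drive and is an assumption, not a value read from this disk.

```sh
#!/bin/sh
# Compare how many GiB the same LBA count represents at 4096-byte versus
# 512-byte sectors. The sector count is an assumed typical value for an
# 8 TB 4Kn drive, not something read from the disk in this thread.

# Round a byte count to the nearest whole GiB (2^30 bytes).
bytes_to_gib() {
    echo $(( ($1 + 536870912) / 1073741824 ))
}

sectors=1953506646
echo "4096-byte sectors: $(bytes_to_gib $(( sectors * 4096 )))G"   # 7452G
echo "512-byte sectors:  $(bytes_to_gib $(( sectors * 512 )))G"    # 932G
```

The 512-byte reading lands exactly on the 932G in the log, which is consistent with, but does not prove, a 4Kn-handling limitation in the controller.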
Re: SAS/SATA controllers: 8 port that support 8TB Drives
On 06/17/2022 5:24 pm, Alexander Motin wrote: On 17.06.2022 18:16, Larry Rosenman wrote: On 06/17/2022 5:08 pm, Alexander Motin wrote: On 17.06.2022 11:59, Larry Rosenman wrote: I'm looking to upgrade the controllers in my TrueNAS box to something that will support 8TB drives, because apparently my LSI 2108 controllers do not support 8TB drives. What's the community's recommendation? It needs to support SFF connectors, for a total of 4 SFF connectors, as I have 16 slots. We at iX are still using LSI/Broadcom HBAs, and just moved from the long-discontinued mps(4) to the newer mpr(4). And I don't believe the problem is directly related to capacity. According to my observations it may be Seagate HDDs of/above a certain (8TB) generation. We do not use Seagate HDDs in our products, so I have only heard about that instability from forums and TrueNAS community user reports. This is an mfi(4) set of controllers, and a ST8Nm0045 8TB (CMR) drive. Is this a bad combo? mfi0: 9973 (708793330s/0x0002/WARN) - PD 00(e0xfc/s3) is not supported (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Retrying command, 3 more tries remain (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Retrying command, 2 more tries remain (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Retrying command, 1 more tries remain (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Retrying command, 0 more tries remain (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Error 5, Retries exhausted mfi0 Physical Drives: 0 ( 932G) UNCONFIGURED GOOD SATA E1:S3 mfi(4) controllers are RAID controllers, not HBAs. We do not recommend RAID controllers with TrueNAS due to problems with hot-plug, disk identification, etc., and so have limited experience with them. But I know some LSI RAID controllers can be reflashed into equivalent HBAs, so if they share the hardware, I can speculate that they may share some issues. I bought 2 of these: https://www.ebay.com/itm/194910024856 to replace the 2 mfi(4)s. Hopefully I can just move the controllers and TrueNAS 13.0-RELEASE will notice them and pick up the new 8T disks. Let me know if I'm setting myself up for failure. -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: SAS/SATA controllers: 8 port that support 8TB Drives
On 06/17/2022 5:08 pm, Alexander Motin wrote: On 17.06.2022 11:59, Larry Rosenman wrote: I'm looking to upgrade the controllers in my TrueNAS box to something that will support 8TB drives, because apparently my LSI 2108 controllers do not support 8TB drives. What's the community's recommendation? It needs to support SFF connectors, for a total of 4 SFF connectors, as I have 16 slots. We at iX are still using LSI/Broadcom HBAs, and just moved from the long-discontinued mps(4) to the newer mpr(4). And I don't believe the problem is directly related to capacity. According to my observations it may be Seagate HDDs of/above a certain (8TB) generation. We do not use Seagate HDDs in our products, so I have only heard about that instability from forums and TrueNAS community user reports. This is an mfi(4) set of controllers, and a ST8Nm0045 8TB (CMR) drive. Is this a bad combo? mfi0: 9973 (708793330s/0x0002/WARN) - PD 00(e0xfc/s3) is not supported (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Retrying command, 3 more tries remain (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Retrying command, 2 more tries remain (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Retrying command, 1 more tries remain (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Retrying command, 0 more tries remain (probe0:mfi0:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00 (probe0:mfi0:0:0:0): CAM status: CCB request completed with an error (probe0:mfi0:0:0:0): Error 5, Retries exhausted mfi0 Physical Drives: 0 ( 932G) UNCONFIGURED GOOD SATA E1:S3 -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
SAS/SATA controllers: 8 port that support 8TB Drives
I'm looking to upgrade the controllers in my TrueNAS box to something that will support 8TB drives, because apparently my LSI 2108 controllers do not support 8TB drives. What's the community's recommendation? It needs to support SFF connectors, for a total of 4 SFF connectors, as I have 16 slots. -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: Zpool with latest feature com.delphix:head_errlog cannot be booted from.
ature@head_errlog active local root@freebsd:~ # After re-installing the boot programs, it does boot; this also works: root@freebsd:~ # /usr/obj/usr/src/amd64.amd64/stand/userboot/test/test -d /dev/da0 The fix is already pushed. rgds, toomas On 21. May 2022, at 03:56, Larry Rosenman wrote: Can you let me know when a replacement binary is available for EFI? I have my buildbox/dev system in a non-bootable state. It's a RAIDZ-1 pool, and there's no place to put another disk. Thanks for any help. (If you can email the replacement binary, that would be wonderful.) On 05/20/2022 4:47 am, Toomas Soome wrote: I'll look into it. It would be nice to have at least a heads-up message about such features, or zfs code does have a means to block feature upgrades on the boot pool. Rgds, Toomas On 20. May 2022, at 11:39, Johan Hendriks wrote: I upgraded my FreeBSD CURRENT and with that I updated my storage pool and my zroot pool. I added the new gptboot code to the disk. After the reboot I cannot boot anymore. So I reinstalled the OS on one disk of the old zroot mirror pool and left the second untouched. Then I can import the pools. If I boot with the latest snapshot ISO (FreeBSD-14.0-CURRENT-amd64-20220519-716fd348e01-255696-disc1.iso) I see the following when I boot. BIOS drive A: is fd0 BIOS drive B: is fd1 BIOS drive K: is disk9 ZFS: unsupported feature: com.delphix:head_errlog ZFS: pool zroot is not supported ZFS: unsupported feature: com.delphix:head_errlog ZFS: pool storage is not supported BIOS 624kB/2000420kB available memory Then the OS is loaded; if I then go to the shell of the installer and do a zpool import, I can import the pools zroot and storage. So this snapshot has the latest ZFS version with the com.delphix:head_errlog feature. So it looks like the bootloader is not able to use the new feature, and this renders your system unbootable. regards Johan -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106 -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106 Links: -- [1] http://148-52-235-80.sta.estpak.ee/boot.tar
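The failure Johan reports comes down to a feature that is active on the pool but unknown to the installed boot loader. Before rebooting after a `zpool upgrade`, it can help to see which features a boot pool actually has active. The sketch below assumes only the scripted (`-H`, tab-separated) output of `zpool get`; the function name is hypothetical, and it does not query the loader itself, since the thread shows no tool for that.

```sh
#!/bin/sh
# List the ZFS features a pool reports as "active" (in use on disk).
# A boot loader that does not understand an active feature refuses the
# pool, as in the "unsupported feature" messages quoted in this thread.
# Relies on `zpool get -H`, which separates fields with a single tab.
list_active_features() {
    zpool get -H -o property,value all "$1" |
        awk -F '\t' '$1 ~ /^feature@/ && $2 == "active" {
            sub(/^feature@/, "", $1)
            print $1
        }'
}
```

For example, `list_active_features zroot` would print one feature name per line; a name such as head_errlog in that list is one the boot code must support before the pool can be booted from.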
Re: Zpool with latest feature com.delphix:head_errlog cannot be booted from.
Can you let me know when a replacement binary is available for EFI? I have my buildbox/dev system in a non-bootable state. It's a RAIDZ-1 pool, and there's no place to put another disk. Thanks for any help. (If you can email the replacement binary, that would be wonderful.) On 05/20/2022 4:47 am, Toomas Soome wrote: I’ll look into it. It would be nice to have at least a heads-up message about such features, or zfs code does have a means to block feature upgrades on the boot pool. Rgds, Toomas On 20. May 2022, at 11:39, Johan Hendriks wrote: I upgraded my FreeBSD CURRENT and with that I updated my storage pool and my zroot pool. I added the new gptboot code to the disk. After the reboot I cannot boot anymore. So I reinstalled the OS on one disk of the old zroot mirror pool and left the second untouched. Then I can import the pools. If I boot with the latest snapshot ISO (FreeBSD-14.0-CURRENT-amd64-20220519-716fd348e01-255696-disc1.iso) I see the following when I boot. BIOS drive A: is fd0 BIOS drive B: is fd1 BIOS drive K: is disk9 ZFS: unsupported feature: com.delphix:head_errlog ZFS: pool zroot is not supported ZFS: unsupported feature: com.delphix:head_errlog ZFS: pool storage is not supported BIOS 624kB/2000420kB available memory Then the OS is loaded; if I then go to the shell of the installer and do a zpool import, I can import the pools zroot and storage. So this snapshot has the latest ZFS version with the com.delphix:head_errlog feature. So it looks like the bootloader is not able to use the new feature, and this renders your system unbootable. regards Johan -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/27/2022 3:58 pm, Mark Johnston wrote: On Sun, Feb 27, 2022 at 01:16:44PM -0600, Larry Rosenman wrote: On 02/26/2022 11:08 am, Larry Rosenman wrote:
> On 02/26/2022 10:57 am, Larry Rosenman wrote:
>> On 02/26/2022 10:37 am, Juraj Lutter wrote:
>>>> On 26 Feb 2022, at 03:03, Larry Rosenman wrote:
>>>> I'm running this script:
>>>> #!/bin/sh
>>>> for i in $(zfs list -H | awk '{print $1}')
>>>> do
>>>> FS=$i
>>>> FN=$(echo ${FS} | sed -e s@/@_@g)
>>>> sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
>>>> done
>>> I’d put, like:
>>>
>>> echo ${FS}
>>>
>>> before “sudo zfs send”, to get at least a bit of a clue on where it can get to.
>>>
>>> otis
>>>
>>> —
>>> Juraj Lutter
>>> o...@freebsd.org
>> I just looked at the destination to see where it died (it did!) and I bectl destroy'd the BE that crashed it, and am running a new scrub -- we'll see whether that was sufficient.
>>
>> Thanks, all!
> Well, it was NOT sufficient. More zfs export fun to come :(
I was able to export the rest of the datasets, re-install 14-CURRENT from a recent snapshot, and restore the datasets I care about. I'm now seeing:
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 48 (zpool), jid 0, uid 0: exited on signal 6
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
mfi0: IOCTL 0x40086481 not handled
pid 54 (zpool), jid 0, uid 0: exited on signal 6
On boot. Ideas?
That ioctl is DIOCGMEDIASIZE, i.e., something is asking /dev/mfi0, the controller device node, about the size of a disk. Presumably this is the result of some kind of misconfiguration somewhere, and /dev/mfid0 was meant instead.
Per advice from markj@, I deleted the /{etc,boot}/zfs/zpool.cache files, and this issue went away. They were stale cache files which are no longer needed.
-- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
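Mark's identification of 0x40086481 can be reproduced by hand: a FreeBSD ioctl command word packs a direction, a parameter length, a group character and a command number, following the layout in <sys/ioccom.h>. The sketch below is a plain sh decoder for that layout, written for illustration (it is not a FreeBSD tool), and it assumes a shell with 64-bit arithmetic. The function names `decode_ioctl` and `ioc_ior` are hypothetical.

```sh
#!/bin/sh
# Decode a FreeBSD ioctl command word using the <sys/ioccom.h> layout:
#   bits 31-29 direction, bits 28-16 parameter length,
#   bits 15-8 group character, bits 7-0 command number.
# Assumes 64-bit shell arithmetic (FreeBSD sh, dash and bash on amd64).

decode_ioctl() {
    cmd=$(( $1 ))
    dirbits=$(( cmd & 0xe0000000 ))
    len=$(( (cmd >> 16) & 0x1fff ))
    grp=$(( (cmd >> 8) & 0xff ))
    num=$(( cmd & 0xff ))
    if [ "$dirbits" -eq $(( 0xc0000000 )) ]; then dir=inout   # _IOWR
    elif [ "$dirbits" -eq $(( 0x80000000 )) ]; then dir=in    # _IOW
    elif [ "$dirbits" -eq $(( 0x40000000 )) ]; then dir=out   # _IOR
    elif [ "$dirbits" -eq $(( 0x20000000 )) ]; then dir=void  # _IO
    else dir=unknown
    fi
    # Print the group byte as a character via an octal escape.
    gchar=$(printf "\\$(printf '%03o' "$grp")")
    echo "dir=$dir len=$len group=$gchar num=$num"
}

# Build an _IOR(group, num, size) command word, e.g. ioc_ior d 129 8.
ioc_ior() {
    g=$(printf '%d' "'$1")    # character code of the group letter
    printf '0x%08x\n' $(( 0x40000000 | (($3 & 0x1fff) << 16) | (g << 8) | $2 ))
}
```

`decode_ioctl 0x40086481` prints `dir=out len=8 group=d num=129`, i.e. `_IOR('d', 129, off_t)`, which is how <sys/disk.h> defines DIOCGMEDIASIZE; `ioc_ior d 129 8` goes the other way and prints `0x40086481`.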
Re: ZFS PANIC: HELP.
On 02/27/2022 3:03 pm, Michael Butler wrote: [ cc list trimmed ] On 2/27/22 14:16, Larry Rosenman wrote: I was able to export the rest of the datasets, re-install 14-CURRENT from a recent snapshot, and restore the datasets I care about. I'm now seeing: mfi0: IOCTL 0x40086481 not handled mfi0: IOCTL 0x40086481 not handled mfi0: IOCTL 0x40086481 not handled mfi0: IOCTL 0x40086481 not handled pid 48 (zpool), jid 0, uid 0: exited on signal 6 mfi0: IOCTL 0x40086481 not handled mfi0: IOCTL 0x40086481 not handled mfi0: IOCTL 0x40086481 not handled mfi0: IOCTL 0x40086481 not handled pid 54 (zpool), jid 0, uid 0: exited on signal 6 On boot. Ideas? These messages may or may not be related. I found both the mfi and mrsas drivers to be 'chatty' in this way, with IOCTL complaints. I ended up setting the debug flag for mrsas in /etc/sysctl.conf: dev.mrsas.0.mrsas_debug=0 There's an equivalent for mfi. Michael I don't see it:
✖1 ❯ sysctl dev.mfi
dev.mfi.0.keep_deleted_volumes: 0
dev.mfi.0.delete_busy_volumes: 0
dev.mfi.0.%parent: pci3
dev.mfi.0.%pnpinfo: vendor=0x1000 device=0x0079 subvendor=0x1028 subdevice=0x1f17 class=0x010400
dev.mfi.0.%location: slot=0 function=0 dbsf=pci0:3:0:0
dev.mfi.0.%driver: mfi
dev.mfi.0.%desc: Dell PERC H700 Integrated
dev.mfi.%parent:
-- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/26/2022 11:08 am, Larry Rosenman wrote: On 02/26/2022 10:57 am, Larry Rosenman wrote: On 02/26/2022 10:37 am, Juraj Lutter wrote: On 26 Feb 2022, at 03:03, Larry Rosenman wrote: I'm running this script: #!/bin/sh for i in $(zfs list -H | awk '{print $1}') do FS=$i FN=$(echo ${FS} | sed -e s@/@_@g) sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN done I’d put, like: echo ${FS} before “sudo zfs send”, to get at least a bit of a clue on where it can get to. otis — Juraj Lutter o...@freebsd.org I just looked at the destination to see where it died (it did!) and I bectl destroy'd the BE that crashed it, and am running a new scrub -- we'll see whether that was sufficient. Thanks, all! Well, it was NOT sufficient. More zfs export fun to come :( I was able to export the rest of the datasets, re-install 14-CURRENT from a recent snapshot, and restore the datasets I care about. I'm now seeing: mfi0: IOCTL 0x40086481 not handled mfi0: IOCTL 0x40086481 not handled mfi0: IOCTL 0x40086481 not handled mfi0: IOCTL 0x40086481 not handled pid 48 (zpool), jid 0, uid 0: exited on signal 6 mfi0: IOCTL 0x40086481 not handled mfi0: IOCTL 0x40086481 not handled mfi0: IOCTL 0x40086481 not handled mfi0: IOCTL 0x40086481 not handled pid 54 (zpool), jid 0, uid 0: exited on signal 6 On boot. Ideas? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/26/2022 10:57 am, Larry Rosenman wrote: On 02/26/2022 10:37 am, Juraj Lutter wrote: On 26 Feb 2022, at 03:03, Larry Rosenman wrote: I'm running this script: #!/bin/sh for i in $(zfs list -H | awk '{print $1}') do FS=$i FN=$(echo ${FS} | sed -e s@/@_@g) sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN done I’d put, like: echo ${FS} before “sudo zfs send”, to get at least a bit of a clue on where it can get to. otis — Juraj Lutter o...@freebsd.org I just looked at the destination to see where it died (it did!) and I bectl destroy'd the BE that crashed it, and am running a new scrub -- we'll see whether that was sufficient. Thanks, all! Well, it was NOT sufficient. More zfs export fun to come :( -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/26/2022 10:37 am, Juraj Lutter wrote: On 26 Feb 2022, at 03:03, Larry Rosenman wrote: I'm running this script: #!/bin/sh for i in $(zfs list -H | awk '{print $1}') do FS=$i FN=$(echo ${FS} | sed -e s@/@_@g) sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN done I’d put, like: echo ${FS} before “sudo zfs send”, to get at least a bit of a clue on where it can get to. otis — Juraj Lutter o...@freebsd.org I just looked at the destination to see where it died (it did!) and I bectl destroy'd the BE that crashed it, and am running a new scrub -- we'll see whether that was sufficient. Thanks, all! -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
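Juraj's suggestion, folded back into Larry's per-dataset loop, might look like the sketch below. Beyond what the thread states, it assumes the script runs as root (so no sudo), that every dataset already has an @REPAIR_SNAP snapshot, and that REMOTE is a placeholder for the real destination; the function names are hypothetical. It lists names with `zfs list -H -o name` instead of awk and reads them one per line.

```sh
#!/bin/sh
# Send every dataset's @REPAIR_SNAP snapshot to a remote host, one file per
# dataset, announcing each dataset before its send starts (Juraj's idea).
# If the machine panics mid-run, the last "sending" line on the console, or
# the last file on the destination, names the dataset that was in flight.
# Assumptions: run as root; each dataset has @$SNAP; REMOTE is a placeholder.

SNAP=${SNAP:-REPAIR_SNAP}
REMOTE=${REMOTE:-user@backup.example.org}

# Flatten a dataset name into a file name: zroot/usr/home -> zroot_usr_home
dataset_to_file() {
    printf '%s\n' "$1" | sed -e 's@/@_@g'
}

export_datasets() {
    zfs list -H -o name | while IFS= read -r fs; do
        fn=$(dataset_to_file "$fs")
        echo "sending ${fs}@${SNAP} -> ${fn}"
        zfs send -vecLp "${fs}@${SNAP}" | ssh "$REMOTE" "cat > '${fn}'"
    done
}
```

Call `export_datasets` after setting REMOTE. The pipeline's exit status is ssh's, so a failing `zfs send` is not reported by the loop itself; the per-dataset line printed before each send is what tells you where it stopped.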
Re: ZFS PANIC: HELP.
On 02/25/2022 2:11 am, Alexander Leidinger wrote: Quoting Larry Rosenman (from Thu, 24 Feb 2022 20:19:45 -0600): I tried a scrub -- it panic'd on a fatal double fault. Suggestions? The safest / cleanest (but not fastest) approach is data export and pool re-creation. If you export dataset by dataset (instead of all recursively), you can even see which dataset is causing the issue. If this per-dataset export narrows down the issue and it is a dataset you don't care about (as in: 1) no issue recreating it from scratch, or 2) there is a backup available), you could delete this (or each such) dataset and re-create it in place (= without re-creating the entire pool). Bye, Alexander. http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
FS=$i
FN=$(echo ${FS} | sed -e s@/@_@g)
sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
done
How will I know which one is a "Problem" dataset? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/24/2022 8:07 pm, Larry Rosenman wrote: On 02/24/2022 1:27 pm, Larry Rosenman wrote: On 02/24/2022 10:48 am, Rob Wing wrote: even with those set, I still get the panic. :( Let me see if I can compile a 14 non-INVARIANTS kernel on the 13-REL system. UGH. I chroot'd to the pool, and built a no-INVARIANTS kernel. It booted and seems(!) to be running. Are there any diagnostics for, or a way of clearing, the crappy ZIL? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106 I tried a scrub -- it panic'd on a fatal double fault. Suggestions? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/24/2022 1:27 pm, Larry Rosenman wrote: On 02/24/2022 10:48 am, Rob Wing wrote: even with those set, I still get the panic. :( Let me see if I can compile a 14 non-INVARIANTS kernel on the 13-REL system. UGH. I chroot'd to the pool, and built a no-INVARIANTS kernel. It booted and seems(!) to be running. Are there any diagnostics for, or a way of clearing, the crappy ZIL? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/24/2022 10:48 am, Rob Wing wrote: Yes, I believe so. On Thu, Feb 24, 2022 at 7:42 AM Larry Rosenman wrote: On 02/24/2022 10:36 am, Rob Wing wrote: You might try setting `sysctl vfs.zfs.recover=1` and `sysctl vfs.zfs.spa.load_verify_metadata=0`. I had a similar error the other day (a couple of months ago). The best I could do was import the pool read-only. I ended up restoring from backup. Are those tunables that I can set in loader.conf? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106 even with those set, I still get the panic. :( Let me see if I can compile a 14 non-INVARIANTS kernel on the 13-REL system. UGH. -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
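Rob's answer means the two recovery knobs can go in /boot/loader.conf, which uses name="value" lines. A small sh helper to set them idempotently might look like the sketch below; the function name is hypothetical. These settings relax ZFS's safety checks on import (and, per Larry, did not avoid the panic here), so they are worth removing once the pool is recovered.

```sh
#!/bin/sh
# Set NAME="VALUE" in a loader.conf-style file, replacing any existing
# assignment to exactly that NAME and leaving all other lines alone.
set_loader_tunable() {
    conf=$1 name=$2 value=$3
    [ -f "$conf" ] || : > "$conf"
    tmp=$(mktemp) || return 1
    # index() does a literal prefix match, so the dots in the name are
    # not treated as regex wildcards.
    awk -v n="$name" 'index($0, n "=") != 1' "$conf" > "$tmp"
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    cat "$tmp" > "$conf"    # rewrite in place so the file keeps its mode
    rm -f "$tmp"
}
```

For the two tunables from the thread: `set_loader_tunable /boot/loader.conf vfs.zfs.recover 1` and `set_loader_tunable /boot/loader.conf vfs.zfs.spa.load_verify_metadata 0`.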
Re: ZFS PANIC: HELP.
On 02/24/2022 10:36 am, Rob Wing wrote: You might try setting `sysctl vfs.zfs.recover=1` and `sysctl vfs.zfs.spa.load_verify_metadata=0`. I had a similar error the other day (a couple of months ago). The best I could do was import the pool read-only. I ended up restoring from backup. Are those tunables that I can set in loader.conf? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/24/2022 10:29 am, Alexander Motin wrote: On 24.02.2022 10:57, Larry Rosenman wrote: On 02/23/2022 9:27 pm, Larry Rosenman wrote: It crashes just after root mount (this is the boot pool and the only pool on the system); see: https://www.lerctr.org/~ler/14-BOOT-Crash.png Where do I go from here? I see 2 ways: 1) Since it is only an assertion and 13 is working (so far), you may just build a 14 kernel without the INVARIANTS option and later recreate the pool when you have time. 2) You may treat it as metadata corruption: import the pool read-only and evacuate the data. If you have recent enough snapshots, you may be able to easily replicate the pool with all the settings to some other disk. The ZIL is not replicated, so corruption there should not be a problem. If there are no snapshots, then either copy at the file level, or you may be able to create a snapshot for replication in 13 (or on 14 without INVARIANTS), importing the pool read-write. Ugh. The box is a 6-disk R710, and all 6 disks are in the pool. I do have a FreeNAS box with enough space to copy the data out. There ARE snaps of MOST filesystems that are taken regularly. The 13 I'm booting from is the 13 memstick image. There are ~70 filesystems (IIRC) with poudriere, ports, et al. I'm not sure how to build the 14 kernel from the booted 13 box. Ideas? Methods? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
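Option 1 is roughly what Larry ended up doing (chroot into the pool and build a kernel without INVARIANTS). A dry-run sketch of those steps from the 13 memstick environment follows. The pool name zroot, the boot environment dataset zroot/ROOT/default, the source tree in /usr/src, and the GENERIC-NODEBUG kernel config are all assumptions, as is the function name; set DRYRUN=1 to print the commands instead of running them. Running a 14 userland's make under a 13 kernel is not a supported combination, so treat this as a best-effort recovery path rather than a procedure.

```sh
#!/bin/sh
# From a 13.x live environment: import the 14 pool under an alternate root,
# mount its boot environment and file systems, then chroot in and build and
# install a kernel without INVARIANTS.
# Assumptions: pool "zroot", BE "zroot/ROOT/default" (canmount=noauto),
# sources in /usr/src, kernel config GENERIC-NODEBUG, altroot /mnt.

run() {
    if [ -n "$DRYRUN" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_nodebug_kernel() {
    pool=${1:-zroot}
    be=${2:-$pool/ROOT/default}
    alt=${3:-/mnt}
    run zpool import -f -N -R "$alt" "$pool" || return 1   # import, mount nothing yet
    run zfs mount "$be"          # BE root first, so it lands at $alt
    run zfs mount -a             # then /usr, /usr/src, ... beneath it
    run mount -t devfs devfs "$alt/dev"
    run chroot "$alt" make -C /usr/src -j4 buildkernel KERNCONF=GENERIC-NODEBUG
    run chroot "$alt" make -C /usr/src installkernel KERNCONF=GENERIC-NODEBUG
}
```

`DRYRUN=1 build_nodebug_kernel zroot` prints the six commands so they can be checked before anything touches the pool.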
Re: ZFS PANIC: HELP.
On 02/23/2022 9:27 pm, Larry Rosenman wrote: On 02/23/2022 9:15 pm, Alexander Motin wrote: On 23.02.2022 22:01, Larry Rosenman wrote: On 02/23/2022 8:58 pm, Alexander Motin wrote: On 23.02.2022 21:52, Larry Rosenman wrote: On 02/23/2022 8:41 pm, Alexander Motin wrote: Hi Larry, The panic you are getting is an assertion, enabled by a kernel built with the INVARIANTS option. On 13 you may just not have that debugging enabled, so you don't hit the issue. But that may be only a consequence. The original problem, I guess, is possibly corrupted ZFS intent log records (or a false positive), which could happen due to use of the -F recovery option on `zpool import`, which is supposed to try to import the pool at an earlier transaction group if some metadata corruption is found. It is not supposed to work 100% and is only a last resort. Though maybe that assertion is just excessively strict for that specific recovery case. If, as you say, the pool can be imported and scrubbed on 13, then I'd expect a subsequent clean export to allow a later import on 14 without -F. On 23.02.2022 21:21, Larry Rosenman wrote: I've got my main dev box, which crashes on 14 with the screen shot at https://www.lerctr.org/~ler/14-zfs-crash.png. Booting from a 13-REL USB installer, it imports and scrubs. Ideas? I can either video conference with a shared screen or give access to the console via my Dominion KVM. Any help/ideas/etc. welcome; I really need to get this box back. How can I import the pool withOUT it mounting the FileSystems, so I can export it cleanly on the 13 system? Why do you need to import without mounting file systems? I think you may actually want them to be mounted, to replay their ZILs. Just use the -R option to mount file systems in some different place. I get the errors shown at: https://www.lerctr.org/~ler/14-mount-R-output.png Should I worry? Or do something(tm) here? This looks weird, but may possibly depend on mount point topology, whether /mnt is writable, etc. What happens if you export it now and try to import it the normal way on 14 without -F? It crashes just after root mount (this is the boot pool and the only pool on the system); see: https://www.lerctr.org/~ler/14-BOOT-Crash.png Where do I go from here? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/23/2022 9:15 pm, Alexander Motin wrote: On 23.02.2022 22:01, Larry Rosenman wrote: On 02/23/2022 8:58 pm, Alexander Motin wrote: On 23.02.2022 21:52, Larry Rosenman wrote: On 02/23/2022 8:41 pm, Alexander Motin wrote: Hi Larry, The panic you are getting is an assertion, enabled by a kernel built with the INVARIANTS option. On 13 you may just not have that debugging enabled, so you don't hit the issue. But that may be only a consequence. The original problem, I guess, is possibly corrupted ZFS intent log records (or a false positive), which could happen due to use of the -F recovery option on `zpool import`, which is supposed to try to import the pool at an earlier transaction group if some metadata corruption is found. It is not supposed to work 100% and is only a last resort. Though maybe that assertion is just excessively strict for that specific recovery case. If, as you say, the pool can be imported and scrubbed on 13, then I'd expect a subsequent clean export to allow a later import on 14 without -F. On 23.02.2022 21:21, Larry Rosenman wrote: I've got my main dev box, which crashes on 14 with the screen shot at https://www.lerctr.org/~ler/14-zfs-crash.png. Booting from a 13-REL USB installer, it imports and scrubs. Ideas? I can either video conference with a shared screen or give access to the console via my Dominion KVM. Any help/ideas/etc. welcome; I really need to get this box back. How can I import the pool withOUT it mounting the FileSystems, so I can export it cleanly on the 13 system? Why do you need to import without mounting file systems? I think you may actually want them to be mounted, to replay their ZILs. Just use the -R option to mount file systems in some different place. I get the errors shown at: https://www.lerctr.org/~ler/14-mount-R-output.png Should I worry? Or do something(tm) here? This looks weird, but may possibly depend on mount point topology, whether /mnt is writable, etc. What happens if you export it now and try to import it the normal way on 14 without -F? It crashes just after root mount (this is the boot pool and the only pool on the system); see: https://www.lerctr.org/~ler/14-BOOT-Crash.png -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/23/2022 8:58 pm, Alexander Motin wrote: On 23.02.2022 21:52, Larry Rosenman wrote: On 02/23/2022 8:41 pm, Alexander Motin wrote: Hi Larry, The panic you are getting is an assertion, enabled by a kernel built with the INVARIANTS option. On 13 you may just not have that debugging enabled, so you don't hit the issue. But that may be only a consequence. The original problem, I guess, is possibly corrupted ZFS intent log records (or a false positive), which could happen due to use of the -F recovery option on `zpool import`, which is supposed to try to import the pool at an earlier transaction group if some metadata corruption is found. It is not supposed to work 100% and is only a last resort. Though maybe that assertion is just excessively strict for that specific recovery case. If, as you say, the pool can be imported and scrubbed on 13, then I'd expect a subsequent clean export to allow a later import on 14 without -F. On 23.02.2022 21:21, Larry Rosenman wrote: I've got my main dev box, which crashes on 14 with the screen shot at https://www.lerctr.org/~ler/14-zfs-crash.png. Booting from a 13-REL USB installer, it imports and scrubs. Ideas? I can either video conference with a shared screen or give access to the console via my Dominion KVM. Any help/ideas/etc. welcome; I really need to get this box back. How can I import the pool withOUT it mounting the FileSystems, so I can export it cleanly on the 13 system? Why do you need to import without mounting file systems? I think you may actually want them to be mounted, to replay their ZILs. Just use the -R option to mount file systems in some different place. I get the errors shown at: https://www.lerctr.org/~ler/14-mount-R-output.png Should I worry? Or do something(tm) here? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
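Motin's suggestion, run from the 13 environment, amounts to importing under an alternate root so the datasets mount (and replay their ZILs) somewhere harmless, then exporting cleanly before booting 14. A dry-run sketch follows, assuming a pool named zroot and /mnt as the alternate root; the function name and the optional boot environment parameter are hypothetical. `zpool import -N` would import without mounting, which is what Larry asked about, but Motin's point is that mounting is what replays the logs. Datasets with canmount=noauto, such as a boot environment root, are not mounted by the import, so the sketch can mount one by name.

```sh
#!/bin/sh
# Import a pool under an alternate root so its file systems mount (replaying
# their ZILs) away from the live system, then export it cleanly.
# Assumptions: pool "zroot", altroot /mnt; BE is an optional boot environment
# dataset with canmount=noauto that should also be mounted before export.
# Set DRYRUN=1 to print the commands instead of running them.

run() {
    if [ -n "$DRYRUN" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reimport_cleanly() {
    pool=${1:-zroot}
    alt=${2:-/mnt}
    be=$3
    run zpool import -f -R "$alt" "$pool" || return 1
    if [ -n "$be" ]; then
        run zfs mount "$be"
    fi
    run zpool export "$pool"
}
```

After a clean export, the next step in the thread is a normal import on 14 without -F.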
Re: ZFS PANIC: HELP.
On 02/23/2022 8:41 pm, Alexander Motin wrote: Hi Larry, The panic you are getting is an assertion, enabled by a kernel built with the INVARIANTS option. On 13 you may just not have that debugging enabled, so you don't hit the issue. But that may be only a consequence. The original problem, I guess, is possibly corrupted ZFS intent log records (or a false positive), which could happen due to use of the -F recovery option on `zpool import`, which is supposed to try to import the pool at an earlier transaction group if some metadata corruption is found. It is not supposed to work 100% and is only a last resort. Though maybe that assertion is just excessively strict for that specific recovery case. If, as you say, the pool can be imported and scrubbed on 13, then I'd expect a subsequent clean export to allow a later import on 14 without -F. On 23.02.2022 21:21, Larry Rosenman wrote: I've got my main dev box, which crashes on 14 with the screen shot at https://www.lerctr.org/~ler/14-zfs-crash.png. Booting from a 13-REL USB installer, it imports and scrubs. Ideas? I can either video conference with a shared screen or give access to the console via my Dominion KVM. Any help/ideas/etc. welcome; I really need to get this box back. How can I import the pool withOUT it mounting the FileSystems, so I can export it cleanly on the 13 system? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
ZFS PANIC: HELP.
I've got my main dev box, which crashes on 14 with the screen shot at https://www.lerctr.org/~ler/14-zfs-crash.png. Booting from a 13-REL USB installer, it imports and scrubs. Ideas? I can either video conference with a shared screen or give access to the console via my Dominion KVM. Any help/ideas/etc. welcome; I really need to get this box back. -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: Panic, CURRENT, yesterday
On 02/09/2022 10:08 pm, Larry Rosenman wrote: Another one today: ❯ more /var/crash/core.txt.1 borg.lerctr.org dumped core - see /var/crash/vmcore.1 Wed Feb 9 19:30:43 CST 2022 core is available, and I can give access and/or send the core and kernel/debug stuff. True for this one too. Yet another one:
❯ more core.txt.3
borg.lerctr.org dumped core - see /var/crash/vmcore.3
Sat Feb 19 00:42:59 CST 2022
FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #56 ler/freebsd-main-changes-n253181-c140933ef40: Tue Feb 15 12:26:23 CST 2022 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL amd64
panic: ng_snd_item: 42 != 173
GNU gdb (GDB) 11.2 [GDB v11.2 for FreeBSD]
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd14.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
Unread portion of the kernel message buffer: panic: ng_snd_item: 42 != 173 cpuid = 0 time = 1645251876 KDB: stack backtrace: #0 0x80516005 at kdb_backtrace+0x65 #1 0x804cba7f at vpanic+0x17f #2 0x804cb853 at panic+0x43 #3 0x82c755b7 at ng_snd_item+0x587 #4 0x82c8e263 at ng_ether_output+0xb3 #5 0x805e0e2d at ether_output+0x6cd #6 0x805f6461 at arpintr+0xd71 #7 0x805e5797 at netisr_dispatch_src+0x97 #8 0x805e112e at ether_demux+0x14e #9 0x82c8e89c at ng_ether_rcv_upper+0x12c #10 0x82c75dab at ng_apply_item+0x7eb #11 0x82c7538d at ng_snd_item+0x35d #12 0x82c75dab at ng_apply_item+0x7eb #13 0x82c7538d at ng_snd_item+0x35d #14 0x82c8e33f at ng_ether_input+0x9f #15 0x805e23e7 at ether_nh_input+0x217 #16 0x805e5797 at netisr_dispatch_src+0x97 #17 0x805e159d at ether_input+0x5d Uptime: 2d6h42m17s Dumping 29172 out of 131023 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:399 #2 0x804cb68f in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487 #3 0x804cbaee in vpanic (fmt=0x82c7ed98 "%s: %d != %d", ap=) at /usr/src/sys/kern/kern_shutdown.c:920 #4 0x804cb853 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:844 #5 0x82c755b7 in ng_snd_item (item=0xf8131de0bd80, flags=0) at /usr/src/sys/netgraph/ng_base.c:2256 #6 0x82c8e263 in ng_ether_output (ifp=, ifp@entry=, mp=0xfe025a044868, mp@entry=) at /usr/src/sys/netgraph/ng_ether.c:294 #7 0x805e0e2d in ether_output (ifp=0xf8010cfe0800, m=0xf81d2e92b000, dst=, ro=) at /usr/src/sys/net/if_ethersubr.c:427 #8 0x805f6461 in in_arpinput (m=0xf81d2e92b000) at /usr/src/sys/netinet/if_ether.c:1129 #9 arpintr (m=0xf81d2e92b000, m@entry=) at /usr/src/sys/netinet/if_ether.c:739 #10 0x805e5797 in netisr_dispatch_src (proto=4, source=source@entry=0, m=0xf81d2e92b000) at 
/usr/src/sys/net/netisr.c:1153 #11 0x805e5aef in netisr_dispatch (proto=, m=) at /usr/src/sys/net/netisr.c:1244 #12 0x805e112e in ether_demux (ifp=ifp@entry=0xf8010cfe0800, m=, m@entry=0xf81d2e92b000) at /usr/src/sys/net/if_ethersubr.c:926 #13 0x82c8e89c in ng_ether_rcv_upper (hook=, hook@entry=, item=0xf8131de0bd80, item@entry=) at /usr/src/sys/netgraph/ng_ether.c:742 #14 0x82c75dab in ng_apply_item (node=node@entry=0xf81365630b00, item=item@entry=0xf8131de0bd80, rw=0) at /usr/src/sys/netgraph/ng_base.c:2406 #15 0x82c7538d in ng_snd_item (item=0xf8131de0bd80, item@entry=, flags=0, flags@entry=) at /usr/src/sys/netgraph/ng_base.c:2323 #16 0x82c75dab in ng_apply_item (node=node@entry=0xf813660f8500, item=item@entry=0xf8131de0bd80, rw=0) at /usr/src/sys/netgraph/ng_base.c:2406 #17 0x82c7538d in ng_snd_item (item=item@entry=0xf8131de0bd80,
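The "time =" field in these panic messages appears to be seconds since the Unix epoch. The sketch below converts it to UTC with gmtime_r(3) and strftime(3) so the panic can be lined up with the CST date that savecore writes into core.txt. The helper name panic_time_to_utc is hypothetical, and reading the field as epoch seconds is an assumption (the resulting dates do match the core.txt headers).

```c
#define _POSIX_C_SOURCE 200809L	/* for gmtime_r() */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: render the "time = N" value from a panic message,
 * assumed to be seconds since the Unix epoch, as a UTC timestamp.
 * Returns 0 on success, -1 if the value cannot be converted or the
 * buffer is too small.
 */
static int
panic_time_to_utc(time_t secs, char *buf, size_t len)
{
	struct tm tm;

	assert(buf != NULL && len > 0);
	if (gmtime_r(&secs, &tm) == NULL)
		return (-1);
	/* strftime() returns 0 if the formatted string does not fit. */
	if (strftime(buf, len, "%Y-%m-%d %H:%M:%S UTC", &tm) == 0)
		return (-1);
	return (0);
}
```

For example, 1645251876 decodes to 2022-02-19 06:24:36 UTC (00:24:36 CST), about 18 minutes before the 00:42:59 CST timestamp in core.txt.3, which is consistent with savecore running after the reboot.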
Re: Panic, CURRENT, yesterday
On 02/08/2022 1:51 pm, Larry Rosenman wrote: I got the following last night while doing a poudriere run as well as a full bacula backup: borg.lerctr.org dumped core - see /var/crash/vmcore.0 Mon Feb 7 23:05:48 CST 2022 FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #54 ler/freebsd-main-changes-n252969-5e5fd0c788c: Sat Feb 5 14:48:30 CST 2022 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL amd64 panic: ng_snd_item: 42 != 290 Another one today: ❯ more /var/crash/core.txt.1 borg.lerctr.org dumped core - see /var/crash/vmcore.1 Wed Feb 9 19:30:43 CST 2022 FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #54 ler/freebsd-main-changes-n252969-5e5fd0c788c: Sat Feb 5 14:48:30 CST 2022 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL amd64 panic: ng_snd_item: 42 != 1414 GNU gdb (GDB) 11.2 [GDB v11.2 for FreeBSD] Copyright (C) 2022 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-portbld-freebsd14.0". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /boot/kernel/kernel... Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug... 
Unread portion of the kernel message buffer: panic: ng_snd_item: 42 != 1414 cpuid = 0 time = 1644455454 KDB: stack backtrace: #0 0x80515fc5 at kdb_backtrace+0x65 #1 0x804cbaef at vpanic+0x17f #2 0x804cb8c3 at panic+0x43 #3 0x82c765b7 at ng_snd_item+0x587 #4 0x82c8f263 at ng_ether_output+0xb3 #5 0x805e0c1d at ether_output+0x6cd #6 0x805f6251 at arpintr+0xd71 #7 0x805e5587 at netisr_dispatch_src+0x97 #8 0x805e0f1e at ether_demux+0x14e #9 0x82c8f89c at ng_ether_rcv_upper+0x12c #10 0x82c76dab at ng_apply_item+0x7eb #11 0x82c7638d at ng_snd_item+0x35d #12 0x82c76dab at ng_apply_item+0x7eb #13 0x82c7638d at ng_snd_item+0x35d #14 0x82c8f33f at ng_ether_input+0x9f #15 0x805e21d7 at ether_nh_input+0x217 #16 0x805e5587 at netisr_dispatch_src+0x97 #17 0x805e138d at ether_input+0x5d Uptime: 1d20h10m31s Dumping 28528 out of 131023 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:399 #2 0x804cb6ff in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487 #3 0x804cbb5e in vpanic (fmt=0x82c7fd98 "%s: %d != %d", ap=) at /usr/src/sys/kern/kern_shutdown.c:920 #4 0x804cb8c3 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:844 #5 0x82c765b7 in ng_snd_item (item=0xf8132d74f880, flags=0) at /usr/src/sys/netgraph/ng_base.c:2256 #6 0x82c8f263 in ng_ether_output (ifp=, ifp@entry=, mp=0xfe02ba63f868, mp@entry=) at /usr/src/sys/netgraph/ng_ether.c:294 #7 0x805e0c1d in ether_output (ifp=0xf80114a43000, m=0xf81f8203e600, dst=, ro=) at /usr/src/sys/net/if_ethersubr.c:427 #8 0x805f6251 in in_arpinput (m=0xf81f8203e600) at /usr/src/sys/netinet/if_ether.c:1129 #9 arpintr (m=0xf81f8203e600, m@entry=) at /usr/src/sys/netinet/if_ether.c:739 #10 0x805e5587 in netisr_dispatch_src (proto=4, source=source@entry=0, m=0xf81f8203e600) at 
/usr/src/sys/net/netisr.c:1153 #11 0x805e58df in netisr_dispatch (proto=, m=) at /usr/src/sys/net/netisr.c:1244 #12 0x805e0f1e in ether_demux (ifp=ifp@entry=0xf80114a43000, m=, m@entry=0xf81f8203e600) at /usr/src/sys/net/if_ethersubr.c:926 #13 0x82c8f89c in ng_ether_rcv_upper (hook=, hook@entry=, item=0xf8132d74f880, item@entry=) at /usr/src/sys/netgraph/ng_ether.c:742 #14 0x82c76dab in ng_apply_item (node=node@entry=0xf812992fe600, item=item@entry=0xf8132d74f880, rw=0) at /usr/src/sys/netgraph/ng_base.c:2406 #15 0x82c7638d in ng_snd_item (item=0xf8132d74f880, item@entry=, flags=0, flags@entry=) at /usr/src/sys/netgraph/ng_base.c:2323 #16 0x82c76dab in ng_apply_item (node=node@entry=0xfff
Panic, CURRENT, yesterday
ther_input_internal (ifp=0xf8010dc57000, m=0xf81736a3f600) at /usr/src/sys/net/if_ethersubr.c:661 #20 ether_nh_input (m=, m@entry=) at /usr/src/sys/net/if_ethersubr.c:742 #21 0x805e5587 in netisr_dispatch_src (proto=proto@entry=5, source=source@entry=0, m=m@entry=0xf81736a3f600) at /usr/src/sys/net/netisr.c:1153 #22 0x805e58df in netisr_dispatch (proto=, proto@entry=5, m=, m@entry=0xf81736a3f600) at /usr/src/sys/net/netisr.c:1244 #23 0x805e138d in ether_input (ifp=0xf8010dc57000, m=0xf81736a3f600) at /usr/src/sys/net/if_ethersubr.c:833 #24 0x821a934d in bce_rx_intr (sc=0xfe02a141c000) at /usr/src/sys/dev/bce/if_bce.c:6721 #25 bce_intr (xsc=) at /usr/src/sys/dev/bce/if_bce.c:7870 #26 0x80490929 in intr_event_execute_handlers (ie=0xf8010dac6900, p=) at /usr/src/sys/kern/kern_intr.c:1205 #27 ithread_execute_handlers (ie=0xf8010dac6900, p=) at /usr/src/sys/kern/kern_intr.c:1218 #28 ithread_loop (arg=, arg@entry=0xf8015d3683a0) at /usr/src/sys/kern/kern_intr.c:1306 #29 0x8048d3a0 in fork_exit ( callout=0x804906d0 , arg=0xf8015d3683a0, frame=0xfe025a5b2f40) at /usr/src/sys/kern/kern_fork.c:1102 #30 (kgdb) core is available, and I can give access and/or send the core and kernel/debug stuff. -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: My -CURRENT crashes....
On Mon, Dec 27, 2021 at 09:15:53PM +0200, Konstantin Belousov wrote:
> On Mon, Dec 27, 2021 at 10:58:02AM -0800, Gleb Smirnoff wrote:
> > On Mon, Dec 27, 2021 at 01:43:01PM -0500, Alexander Motin wrote:
> > A> > This allows us to deduct that the callout belongs to proc subsystem and
> > A> > we can retrieve the proc it points to: c_lock - 0x128 = 0xf8030521e548
> > A> > It is ccache in PRS_NORMAL state. And the "tmp" in our stack frame is its
> > A> > p_itcallout.
> > A> >
> > A> > So there is something that would zero out most of the p_itcallout while
> > A> > it is scheduled?
> > A>
> > A> So carefully zero it, but keep the lock pointer... The only way that
> > A> comes to mind is callout_init_mtx() in do_fork() if we assume the
> > A> process has completed and the struct proc was reused. I guess if we
> > A> could somehow leak scheduled callout in exit1(). May be we could add
> > A> some more assertions to try catch callout still being active there.
> >
> > Note that _callout_stop_safe(p_itcallout) is the only place in kernel where
> > CS_EXECUTING is used.
>
> I would start asking are there any third-party modules loaded.
Nope.
Id Refs Address      Size     Name
 1  239 0x8020       d94b58   kernel
 2    1 0x81441000   f990     ehci.ko
 3   12 0x81451000   3da98    usb.ko
 4    1 0x8148f000   70ae00   zfs.ko
 5    5 0x81b9a000   5338     xdr.ko
 6    1 0x81ba       ccf0     ukbd.ko
 7    7 0x81bad000   5248     hid.ko
 8    1 0x81bb3000   b2c0     uhci.ko
 9    1 0x8203d000   cec8     tmpfs.ko
10    1 0x8204a000   3538     fdescfs.ko
11    2 0x8204e000   3240     procfs.ko
12    3 0x82052000   5778     pseudofs.ko
13    1 0x82058000   9290     aesni.ko
14    1 0x82062000   20f0     coretemp.ko
15    1 0x82065000   3238     filemon.ko
16    1 0x82069000   2dd58    linux.ko
17    4 0x82097000   aea8     linux_common.ko
18    1 0x820a2000   4250     ichsmb.ko
19    2 0x820a7000   2180     smbus.ko
20    1 0x820aa000   4c10     ichwd.ko
21    1 0x820af000   2220     cpuctl.ko
22    1 0x820b2000   4338     cryptodev.ko
23    1 0x820b7000   2238     dtraceall.ko
24    8 0x820ba000   8a60     opensolaris.ko
25    8 0x8220       84a300   dtrace.ko
26    1 0x820c3000   2274     dtmalloc.ko
27    1 0x820c6000   3331     fbt.ko
28    1 0x820ca000   56570    fasttrap.ko
29    1 0x82121000   2258     sdt.ko
30    1 0x82124000   91b4     systrace.ko
31    1 0x8212e000   91b4     systrace_freebsd32.ko
32    1 0x82138000   234c     profile.ko
33    1 0x8213b000   8b38     ipmi.ko
34    3 0x82144000   45b0     efirt.ko
35    1 0x82149000   75b0     if_bridge.ko
36    1 0x82151000   50d8     bridgestp.ko
37    1 0x82157000   1662c    hwpmc.ko
38    1 0x8216e000   28bb8    tcp_rack.ko
39    1 0x82197000   21b8     mfip.ko
40    2 0x82a4b000   84470    cam.ko
41    1 0x8219a000   7d38     ioat.ko
42    1 0x821a20004           if_bce.ko
43    1 0x82ad17a50           miibus.ko
44    1 0x821eb000   44b0     usb_quirk.ko
45    1 0x821f       b3a8     usb_template.ko
46    1 0x821fc000   3268     ums.ko
47    1 0x82ae8000   92d0     xhci.ko
48    1 0x82af2000   6120     ohci.ko
49    1 0x82af9000   43ef8    nfscl.ko
50    3 0x82b3d000   18cf0    nfscommon.ko
51    3 0x82b56000   2168     nfssvc.ko
52    4 0x82b59000   138a0    krpc.ko
53    1 0x82b6d000   4e638    nfsd.ko
54    1 0x82bbc000   bdc0     nfslockd.ko
55    1 0x82bc8000   4168     ataintel.ko
56    2 0x82bcd000   8358     ata.ko
57    1 0x82bd6000   5388     atapci.ko
58    1 0x82bdc000   4d40     geom_label.ko
59    1 0x82be1000   29f58    linux64.ko
60    1 0x82c0b000   2260     pty.ko
61    1 0x82c0e000   639c     linprocfs.ko
62    1 0x82c15000   3284     linsysfs.ko
63    1 0x82c19000   3378     acpi_wmi.ko
64    1 0x82c1d000   2280     uhid.ko
65    1 0x82c2       3320     usbhid.ko
66    1 0x82c24000   31f8     hidbus.ko
67    1 0x82c28000   32c0     wmt.ko
68    1 0x82c2c000   41a38    pf.ko
69    1 0x82c6e000   2a08     mac_ntpd.ko
70    5 0x82c71000   fb28     netgraph.ko
71    1 0x82c81000   63f8     ng_netflow.ko
72    1 0x82c88000   41e8     ng_ksocket.ko
73    1 0x82c8d000   3180     ng_ether.ko
74    1 0x82c91000   3918     ng_socket.ko
75    1 0x82c95000   4708     nullfs.ko
-- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
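The "c_lock - 0x128" step in the quoted analysis recovers the enclosing struct proc from a pointer to a lock embedded inside it, presumably because p_itcallout is initialised with the process mutex as its lock (the thread mentions callout_init_mtx() in do_fork()). A minimal sketch of that pointer arithmetic, in the spirit of FreeBSD's __containerof() macro, follows. The macro name and the fake_* structures are hypothetical stand-ins that do not reproduce the real struct proc layout, so the offset here is not 0x128.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Step back from a pointer to an embedded member to the start of the
 * structure that contains it by subtracting the member's offset.
 * Same idea as FreeBSD's __containerof(); this macro name is hypothetical.
 */
#define CONTAINER_OF(ptr, type, member) \
	((type *)(void *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins; not the real kernel structures. */
struct fake_lock {
	const char *name;
};

struct fake_callout {
	struct fake_lock *c_lock;	/* lock passed at init time */
};

struct fake_proc {
	int p_pid;
	struct fake_lock p_mtx;			/* lock embedded in the process */
	struct fake_callout p_itcallout;	/* its c_lock points at p_mtx */
};

/* Given a callout whose c_lock is some fake_proc's p_mtx, find that fake_proc. */
static struct fake_proc *
proc_from_callout_lock(const struct fake_callout *c)
{
	assert(c != NULL && c->c_lock != NULL);
	return (CONTAINER_OF(c->c_lock, struct fake_proc, p_mtx));
}
```

The subtraction only gives a meaningful answer if the lock really is embedded in that structure; applied to a stray or freed pointer it yields an equally plausible-looking but bogus address.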
Re: Panic: Page Fault in Kernel: Yesterday's CURRENT
On 12/17/2021 1:36 pm, Mark Johnston wrote: On Fri, Dec 10, 2021 at 10:43:19AM -0600, Larry Rosenman wrote: 14-2021_12_07-1217 - - 1.87G 2021-12-07 12:17 14-2021_12_09-1957 NR / 121G 2021-12-09 19:57 If that's any help I can't tell what this is saying. A kernel built on the 7th does not crash, or...? Which revision did you update from before you started seeing crashes? From a kgdb session it'd be useful to see output from (kgdb) frame 8 (kgdb) p/x *tmp to start. Correct, the 7th didn't panic, but the 9th did, and yesterday's too. Grrr ler in borg in /mnt🔒 on ☁️ (us-east-1) ❯ kgdb -c /var/crash/vmcore.0 /mnt/boot/kernel/kernel GNU gdb (GDB) 11.1 [GDB v11.1 for FreeBSD] Copyright (C) 2021 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-portbld-freebsd14.0". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /mnt/boot/kernel/kernel... (No debugging symbols found in /mnt/boot/kernel/kernel) Failed to open vmcore: /var/crash/vmcore.0: Permission denied (kgdb) bt No stack. quitb) ler in borg in /mnt🔒 on ☁️ (us-east-1) took 6s ❯ sudo chmod +r /var/crash/* ler in borg in /mnt🔒 on ☁️ (us-east-1) ❯ kgdb -c /var/crash/vmcore.0 /mnt/boot/kernel/kernel GNU gdb (GDB) 11.1 [GDB v11.1 for FreeBSD] Copyright (C) 2021 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. 
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-portbld-freebsd14.0". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /mnt/boot/kernel/kernel... (No debugging symbols found in /mnt/boot/kernel/kernel) /wrkdirs/usr/ports/devel/gdb/work-py37/gdb-11.1/gdb/thread.c:1345: internal-error: void switch_to_thread(thread_info *): Assertion `thr != NULL' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) n This is a bug, please report it. For instructions, see: <https://www.gnu.org/software/gdb/bugs/>. /wrkdirs/usr/ports/devel/gdb/work-py37/gdb-11.1/gdb/thread.c:1345: internal-error: void switch_to_thread(thread_info *): Assertion `thr != NULL' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Create a core file of GDB? (y or n) n Command aborted. (kgdb) bt No thread selected. (kgdb) fr 8 No thread selected. (kgdb) On 12/10/2021 10:36 am, Alexander Motin wrote: > Hi Larry, > > This looks like some use-after-free or otherwise corrupted callout > structure. Unfortunately the backtrace does not tell what was the > callout. When was the previous update to look what could change? > > On 10.12.2021 11:24, Larry Rosenman wrote: >> FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #15 >> main-n251537-ab639f2398b: Thu Dec 9 19:45:37 CST 2021 >> r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL >> amd64 >> >> VMCORE *IS* available. 
>>
>> Unread portion of the kernel message buffer:
>> kernel trap 12 with interrupts disabled
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 20
>> fault virtual address = 0x0
>> fault code = supervisor write data, page not present
>> instruction pointer = 0x20:0x804e0db4
>> stack pointer = 0x0:0xfe0434de4e10
>> frame pointer = 0x0:0xfe0434de4e70
>> code segment = base 0x0, limit 0xf, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = resume, IOPL = 0
>> curren
Re: Panic: Page Fault in Kernel: Yesterday's CURRENT
On 12/16/2021 9:03 pm, Larry Rosenman wrote: On 12/10/2021 10:43 am, Larry Rosenman wrote: 14-2021_12_07-1217 - - 1.87G 2021-12-07 12:17 14-2021_12_09-1957 NR / 121G 2021-12-09 19:57 If that's any help On 12/10/2021 10:36 am, Alexander Motin wrote: Hi Larry, This looks like some use-after-free or otherwise corrupted callout structure. Unfortunately the backtrace does not tell what was the callout. When was the previous update to look what could change? On 10.12.2021 11:24, Larry Rosenman wrote: FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #15 main-n251537-ab639f2398b: Thu Dec 9 19:45:37 CST 2021 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL amd64 VMCORE *IS* available. Unread portion of the kernel message buffer: kernel trap 12 with interrupts disabled Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 20 fault virtual address = 0x0 fault code = supervisor write data, page not present instruction pointer = 0x20:0x804e0db4 stack pointer = 0x0:0xfe0434de4e10 frame pointer = 0x0:0xfe0434de4e70 code segment = base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = resume, IOPL = 0 current process = 82990 (c++) trap number = 12 panic: page fault cpuid = 0 time = 163998 KDB: stack backtrace: #0 0x8050fc95 at kdb_backtrace+0x65 #1 0x804c468f at vpanic+0x17f #2 0x804c4503 at panic+0x43 #3 0x807a2195 at trap_fatal+0x385 #4 0x807a21ef at trap_pfault+0x4f #5 0x80779c78 at calltrap+0x8 #6 0x8045ddb8 at handleevents+0x188 #7 0x8045ea3e at timercb+0x24e #8 0x807ca9eb at lapic_handle_timer+0x9b #9 0x8077b9b1 at Xtimerint+0xb1 Uptime: 2h28m57s Dumping 12829 out of 131023 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:399 #2 0x804c428c in 
kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487 #3 0x804c46fe in vpanic (fmt=0x807e1276 "%s", ap=) at /usr/src/sys/kern/kern_shutdown.c:920 #4 0x804c4503 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:844 #5 0x807a2195 in trap_fatal (frame=0xfe0434de4d50, eva=0) at /usr/src/sys/amd64/amd64/trap.c:946 #6 0x807a21ef in trap_pfault (frame=0xfe0434de4d50, usermode=false, signo=, ucode=) at /usr/src/sys/amd64/amd64/trap.c:765 #7 #8 0x804e0db4 in callout_process (now=now@entry=38385536922300) at /usr/src/sys/kern/kern_timeout.c:488 #9 0x8045ddb8 in handleevents (now=now@entry=38385536922300, fake=fake@entry=0) at /usr/src/sys/kern/kern_clocksource.c:213 #10 0x8045ea3e in timercb (et=0x80d475e0 , arg=) at /usr/src/sys/kern/kern_clocksource.c:357 #11 0x807ca9eb in lapic_handle_timer (frame=0xfe0434de4f40) at /usr/src/sys/x86/x86/local_apic.c:1364 #12 #13 0x00080df42bb6 in ?? () Backtrace stopped: Cannot access memory at address 0x7def2c90 (kgdb) ' I got a new crash on a today's current: ❯ more core.txt.1 borg.lerctr.org dumped core - see /var/crash/vmcore.1 Thu Dec 16 17:01:37 CST 2021 FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #22 main-n251748-c610426c4de: Thu Dec 16 09:22:52 CST 2021 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL amd64 panic: page fault GNU gdb (GDB) 11.1 [GDB v11.1 for FreeBSD] Copyright (C) 2021 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-portbld-freebsd14.0". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. 
For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from /boot/kernel/kernel... Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug... Unread portion of the kernel message buffer: kernel trap 12 with interrupts disabled Fatal trap 12: page fau
Re: Panic: Page Fault in Kernel: Yesterday's CURRENT
On 12/10/2021 10:43 am, Larry Rosenman wrote: 14-2021_12_07-1217 - - 1.87G 2021-12-07 12:17 14-2021_12_09-1957 NR / 121G 2021-12-09 19:57 If that's any help On 12/10/2021 10:36 am, Alexander Motin wrote: Hi Larry, This looks like some use-after-free or otherwise corrupted callout structure. Unfortunately the backtrace does not tell what was the callout. When was the previous update to look what could change? On 10.12.2021 11:24, Larry Rosenman wrote: FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #15 main-n251537-ab639f2398b: Thu Dec 9 19:45:37 CST 2021 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL amd64 VMCORE *IS* available. Unread portion of the kernel message buffer: kernel trap 12 with interrupts disabled Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 20 fault virtual address = 0x0 fault code = supervisor write data, page not present instruction pointer = 0x20:0x804e0db4 stack pointer = 0x0:0xfe0434de4e10 frame pointer = 0x0:0xfe0434de4e70 code segment = base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = resume, IOPL = 0 current process = 82990 (c++) trap number = 12 panic: page fault cpuid = 0 time = 163998 KDB: stack backtrace: #0 0x8050fc95 at kdb_backtrace+0x65 #1 0x804c468f at vpanic+0x17f #2 0x804c4503 at panic+0x43 #3 0x807a2195 at trap_fatal+0x385 #4 0x807a21ef at trap_pfault+0x4f #5 0x80779c78 at calltrap+0x8 #6 0x8045ddb8 at handleevents+0x188 #7 0x8045ea3e at timercb+0x24e #8 0x807ca9eb at lapic_handle_timer+0x9b #9 0x8077b9b1 at Xtimerint+0xb1 Uptime: 2h28m57s Dumping 12829 out of 131023 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:399 #2 0x804c428c in kern_reboot (howto=260) at 
/usr/src/sys/kern/kern_shutdown.c:487 #3 0x804c46fe in vpanic (fmt=0x807e1276 "%s", ap=) at /usr/src/sys/kern/kern_shutdown.c:920 #4 0x804c4503 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:844 #5 0x807a2195 in trap_fatal (frame=0xfe0434de4d50, eva=0) at /usr/src/sys/amd64/amd64/trap.c:946 #6 0x807a21ef in trap_pfault (frame=0xfe0434de4d50, usermode=false, signo=, ucode=) at /usr/src/sys/amd64/amd64/trap.c:765 #7 #8 0x804e0db4 in callout_process (now=now@entry=38385536922300) at /usr/src/sys/kern/kern_timeout.c:488 #9 0x8045ddb8 in handleevents (now=now@entry=38385536922300, fake=fake@entry=0) at /usr/src/sys/kern/kern_clocksource.c:213 #10 0x8045ea3e in timercb (et=0x80d475e0 , arg=) at /usr/src/sys/kern/kern_clocksource.c:357 #11 0x807ca9eb in lapic_handle_timer (frame=0xfe0434de4f40) at /usr/src/sys/x86/x86/local_apic.c:1364 #12 #13 0x00080df42bb6 in ?? () Backtrace stopped: Cannot access memory at address 0x7def2c90 (kgdb) ' I got a new crash on a today's current: ❯ more core.txt.1 borg.lerctr.org dumped core - see /var/crash/vmcore.1 Thu Dec 16 17:01:37 CST 2021 FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #22 main-n251748-c610426c4de: Thu Dec 16 09:22:52 CST 2021 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL amd64 panic: page fault GNU gdb (GDB) 11.1 [GDB v11.1 for FreeBSD] Copyright (C) 2021 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-portbld-freebsd14.0". Type "show configuration" for configuration details. For bug reporting instructions, please see: <https://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". 
Type "apropos word" to search for commands related to "word"... Reading symbols from /boot/kernel/kernel... Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug... Unread portion of the kernel message buffer: kernel trap 12 with interrupts disabled Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 20 fa
Re: Panic: Page Fault in Kernel: Yesterday's CURRENT
14-2021_12_07-1217 - - 1.87G 2021-12-07 12:17 14-2021_12_09-1957 NR / 121G 2021-12-09 19:57 If that's any help On 12/10/2021 10:36 am, Alexander Motin wrote: Hi Larry, This looks like some use-after-free or otherwise corrupted callout structure. Unfortunately the backtrace does not tell what was the callout. When was the previous update to look what could change? On 10.12.2021 11:24, Larry Rosenman wrote: FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #15 main-n251537-ab639f2398b: Thu Dec 9 19:45:37 CST 2021 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL amd64 VMCORE *IS* available. Unread portion of the kernel message buffer: kernel trap 12 with interrupts disabled Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 20 fault virtual address = 0x0 fault code = supervisor write data, page not present instruction pointer = 0x20:0x804e0db4 stack pointer = 0x0:0xfe0434de4e10 frame pointer = 0x0:0xfe0434de4e70 code segment = base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = resume, IOPL = 0 current process = 82990 (c++) trap number = 12 panic: page fault cpuid = 0 time = 163998 KDB: stack backtrace: #0 0x8050fc95 at kdb_backtrace+0x65 #1 0x804c468f at vpanic+0x17f #2 0x804c4503 at panic+0x43 #3 0x807a2195 at trap_fatal+0x385 #4 0x807a21ef at trap_pfault+0x4f #5 0x80779c78 at calltrap+0x8 #6 0x8045ddb8 at handleevents+0x188 #7 0x8045ea3e at timercb+0x24e #8 0x807ca9eb at lapic_handle_timer+0x9b #9 0x8077b9b1 at Xtimerint+0xb1 Uptime: 2h28m57s Dumping 12829 out of 131023 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:399 #2 0x804c428c in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487 #3 0x804c46fe in vpanic 
(fmt=0x807e1276 "%s", ap=) at /usr/src/sys/kern/kern_shutdown.c:920 #4 0x804c4503 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:844 #5 0x807a2195 in trap_fatal (frame=0xfe0434de4d50, eva=0) at /usr/src/sys/amd64/amd64/trap.c:946 #6 0x807a21ef in trap_pfault (frame=0xfe0434de4d50, usermode=false, signo=, ucode=) at /usr/src/sys/amd64/amd64/trap.c:765 #7 #8 0x804e0db4 in callout_process (now=now@entry=38385536922300) at /usr/src/sys/kern/kern_timeout.c:488 #9 0x8045ddb8 in handleevents (now=now@entry=38385536922300, fake=fake@entry=0) at /usr/src/sys/kern/kern_clocksource.c:213 #10 0x8045ea3e in timercb (et=0x80d475e0 , arg=) at /usr/src/sys/kern/kern_clocksource.c:357 #11 0x807ca9eb in lapic_handle_timer (frame=0xfe0434de4f40) at /usr/src/sys/x86/x86/local_apic.c:1364 #12 #13 0x00080df42bb6 in ?? () Backtrace stopped: Cannot access memory at address 0x7def2c90 (kgdb) ---- -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Panic: Page Fault in Kernel: Yesterday's CURRENT
FreeBSD borg.lerctr.org 14.0-CURRENT FreeBSD 14.0-CURRENT #15 main-n251537-ab639f2398b: Thu Dec 9 19:45:37 CST 2021 r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL amd64 VMCORE *IS* available. Unread portion of the kernel message buffer: kernel trap 12 with interrupts disabled Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 20 fault virtual address = 0x0 fault code = supervisor write data, page not present instruction pointer = 0x20:0x804e0db4 stack pointer = 0x0:0xfe0434de4e10 frame pointer = 0x0:0xfe0434de4e70 code segment= base 0x0, limit 0xf, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags= resume, IOPL = 0 current process = 82990 (c++) trap number = 12 panic: page fault cpuid = 0 time = 163998 KDB: stack backtrace: #0 0x8050fc95 at kdb_backtrace+0x65 #1 0x804c468f at vpanic+0x17f #2 0x804c4503 at panic+0x43 #3 0x807a2195 at trap_fatal+0x385 #4 0x807a21ef at trap_pfault+0x4f #5 0x80779c78 at calltrap+0x8 #6 0x8045ddb8 at handleevents+0x188 #7 0x8045ea3e at timercb+0x24e #8 0x807ca9eb at lapic_handle_timer+0x9b #9 0x8077b9b1 at Xtimerint+0xb1 Uptime: 2h28m57s Dumping 12829 out of 131023 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=) at /usr/src/sys/kern/kern_shutdown.c:399 #2 0x804c428c in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487 #3 0x804c46fe in vpanic (fmt=0x807e1276 "%s", ap=) at /usr/src/sys/kern/kern_shutdown.c:920 #4 0x804c4503 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:844 #5 0x807a2195 in trap_fatal (frame=0xfe0434de4d50, eva=0) at /usr/src/sys/amd64/amd64/trap.c:946 #6 0x807a21ef in trap_pfault (frame=0xfe0434de4d50, usermode=false, signo=, ucode=) at /usr/src/sys/amd64/amd64/trap.c:765 #7 #8 0x804e0db4 in callout_process 
(now=now@entry=38385536922300) at /usr/src/sys/kern/kern_timeout.c:488 #9 0x8045ddb8 in handleevents (now=now@entry=38385536922300, fake=fake@entry=0) at /usr/src/sys/kern/kern_clocksource.c:213 #10 0x8045ea3e in timercb (et=0x80d475e0 , arg=) at /usr/src/sys/kern/kern_clocksource.c:357 #11 0x807ca9eb in lapic_handle_timer (frame=0xfe0434de4f40) at /usr/src/sys/x86/x86/local_apic.c:1364 #12 #13 0x00080df42bb6 in ?? () Backtrace stopped: Cannot access memory at address 0x7def2c90 (kgdb) ---- -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
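The "fault code" line of the trap report is a decoded form of the x86 page-fault error code. The sketch below decodes the architectural bits (present, write, user, reserved-bit, instruction fetch) into the same wording the panic uses. It is modeled on the message text rather than copied from FreeBSD's trap.c, and describe_fault_code and the PF_* names are hypothetical. "supervisor write data, page not present" corresponds to error code 0x2, a kernel-mode write to an unmapped page; together with fault virtual address 0x0 that is a write through a NULL pointer inside callout_process(), which fits the corrupted-callout theory raised in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* x86 page-fault error-code bits (Intel SDM, "Page-Fault Error Code"). */
#define PF_P	0x01	/* 0: page not present, 1: protection violation */
#define PF_W	0x02	/* 0: read access, 1: write access */
#define PF_U	0x04	/* 0: supervisor mode, 1: user mode */
#define PF_RSV	0x08	/* reserved bit set in a paging-structure entry */
#define PF_I	0x10	/* access was an instruction fetch */

/* Hypothetical helper: format an error code like the panic's "fault code". */
static void
describe_fault_code(unsigned int code, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%s %s %s, %s",
	    (code & PF_U) ? "user" : "supervisor",
	    (code & PF_W) ? "write" : "read",
	    (code & PF_I) ? "instruction" : "data",
	    (code & PF_RSV) ? "reserved bits in PTE" :
	    (code & PF_P) ? "protection violation" : "page not present");
}
```

Only the low five bits are handled here; newer CPUs define further bits (protection keys, SGX) that a complete decoder would also report.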
Re: installworld: Certificate Error?
Looks like I picked it up when I installed this box. Removed. Sorry for the noise. :( On 12/07/2021 5:05 pm, Larry Rosenman wrote: I have no clue. On 12/07/2021 5:04 pm, Kyle Evans wrote: Where did this ecpubkey.pem come from? On Tue, Dec 7, 2021, 15:28 Larry Rosenman wrote: -- Installing everything completed on Tue Dec 7 15:23:33 CST 2021 -- 68.45 real 262.43 user 95.61 sys Scanning /mnt/usr/share/certs/untrusted for certificates... Scanning /mnt/usr/share/certs/trusted for certificates... Scanning /mnt/usr/local/share/certs for certificates... Scanning /mnt/usr/local/etc/ssl/certs for certificates... unable to load certificate 67912877395968:error:0909006C:PEM routines:get_name:no start line:/usr/src/crypto/openssl/crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE Error: /mnt/usr/local/etc/ssl/certs/ecpubkey.pem *** [installworld] Error code 1 [I] ➜ cat /mnt/usr/local/etc/ssl/certs/ecpubkey.pem -----BEGIN PUBLIC KEY----- MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE/ZGaXNnGRqI4vEFFlrs3HNfyWjeL 5HcODD2mLHyvI+948pNZ9ngZl/afkZZZOHwcnlChxcBwNsgPFBXf1ZqKIA== -----END PUBLIC KEY----- ler in src at ler-r610 on main [?] [I] ➜ Can someone fix this? -- Larry Rosenman http://people.freebsd.org/~ler Phone: +1 214-642-9640 E-Mail: l...@freebsd.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: installworld: Certificate Error?
I have no clue. On 12/07/2021 5:04 pm, Kyle Evans wrote: Where did this ecpubkey.pem come from? On Tue, Dec 7, 2021, 15:28 Larry Rosenman wrote: -- Installing everything completed on Tue Dec 7 15:23:33 CST 2021 -- 68.45 real 262.43 user 95.61 sys Scanning /mnt/usr/share/certs/untrusted for certificates... Scanning /mnt/usr/share/certs/trusted for certificates... Scanning /mnt/usr/local/share/certs for certificates... Scanning /mnt/usr/local/etc/ssl/certs for certificates... unable to load certificate 67912877395968:error:0909006C:PEM routines:get_name:no start line:/usr/src/crypto/openssl/crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE Error: /mnt/usr/local/etc/ssl/certs/ecpubkey.pem *** [installworld] Error code 1 [I] ➜ cat /mnt/usr/local/etc/ssl/certs/ecpubkey.pem -----BEGIN PUBLIC KEY----- MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE/ZGaXNnGRqI4vEFFlrs3HNfyWjeL 5HcODD2mLHyvI+948pNZ9ngZl/afkZZZOHwcnlChxcBwNsgPFBXf1ZqKIA== -----END PUBLIC KEY----- ler in src at ler-r610 on main [?] [I] ➜ Can someone fix this? -- Larry Rosenman http://people.freebsd.org/~ler Phone: +1 214-642-9640 E-Mail: l...@freebsd.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106 -- Larry Rosenman http://people.freebsd.org/~ler Phone: +1 214-642-9640 E-Mail: l...@freebsd.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
installworld: Certificate Error?
-- Installing everything completed on Tue Dec 7 15:23:33 CST 2021 -- 68.45 real 262.43 user 95.61 sys Scanning /mnt/usr/share/certs/untrusted for certificates... Scanning /mnt/usr/share/certs/trusted for certificates... Scanning /mnt/usr/local/share/certs for certificates... Scanning /mnt/usr/local/etc/ssl/certs for certificates... unable to load certificate 67912877395968:error:0909006C:PEM routines:get_name:no start line:/usr/src/crypto/openssl/crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE Error: /mnt/usr/local/etc/ssl/certs/ecpubkey.pem *** [installworld] Error code 1 [I] ➜ cat /mnt/usr/local/etc/ssl/certs/ecpubkey.pem -----BEGIN PUBLIC KEY----- MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE/ZGaXNnGRqI4vEFFlrs3HNfyWjeL 5HcODD2mLHyvI+948pNZ9ngZl/afkZZZOHwcnlChxcBwNsgPFBXf1ZqKIA== -----END PUBLIC KEY----- ler in src at ler-r610 on main [?] [I] ➜ Can someone fix this? -- Larry Rosenman http://people.freebsd.org/~ler Phone: +1 214-642-9640 E-Mail: l...@freebsd.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
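The rehash step that installworld runs over the certificate directories (certctl(8), judging by the "Scanning ... for certificates..." lines) expects each file it finds to hold a certificate; OpenSSL's "Expecting: TRUSTED CERTIFICATE" shows it tripped over ecpubkey.pem, which contains a PUBLIC KEY block instead. A minimal Python sketch for spotting such stray files ahead of time, assuming (this is an assumption, not certctl's actual logic) that a file is acceptable only if every PEM block in it is labelled CERTIFICATE, TRUSTED CERTIFICATE or X509 CERTIFICATE; the function names are hypothetical:

```python
import os
import re

# Labels this sketch accepts as certificates (assumption: a trust-store
# scan wants X.509 certificates only, never bare keys).
CERT_LABELS = {"CERTIFICATE", "TRUSTED CERTIFICATE", "X509 CERTIFICATE"}

# Matches a PEM armor line such as "-----BEGIN PUBLIC KEY-----".
BEGIN_RE = re.compile(r"^-----BEGIN ([A-Z0-9 ]+)-----\s*$", re.MULTILINE)


def pem_labels(text):
    """Return the label of every PEM block in text, in order."""
    return BEGIN_RE.findall(text)


def is_certificate_pem(text):
    """True if text has at least one PEM block and all of them are certificates."""
    labels = pem_labels(text)
    return bool(labels) and all(label in CERT_LABELS for label in labels)


def stray_files(directory):
    """List regular files (symlinks followed) in directory that are not certificate PEMs."""
    stray = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="ascii", errors="replace") as fh:
            if not is_certificate_pem(fh.read()):
                stray.append(path)
    return stray
```

Run against /mnt/usr/local/etc/ssl/certs on the box above, stray_files would be expected to report ecpubkey.pem. Note that it also flags files with no PEM armor at all (DER files, for example), which may be stricter than the real tool.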
Re: NFSv4 client: Doesn't see files/protocol error 10020
On 10/20/2021 1:58 pm, Larry Rosenman wrote: I have a -CURRENT box that I upgraded yesterday & today, and it no longer can read NFS mounts from my TrueNAS 12.0-U6 server. It mounts, but any access garners: nfsv4 client/server protocol prob err=10020 nfsv4 client/server protocol prob err=10020 the fstab entries: freenas.lerctr.org:/mnt/data/TBH /vault/backup/TBH nfs rw,nfsv4,minorversion=1 0 0 freenas.lerctr.org:/mnt/data/BACULA /vault/backup/BACULA nfs rw,nfsv4,minorversion=1 0 0 Ideas? rmacklem@ helped me diagnose this; the issue (apparently) was that my TrueNAS server had somehow screwed up the NFSv4 exports. I redid them from the GUI, and all is back to normal. Thanks, as always, rmacklem! -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
NFSv4 client: Doesn't see files/protocol error 10020
I have a -CURRENT box that I upgraded yesterday & today, and it no longer can read NFS mounts from my TrueNAS 12.0-U6 server. It mounts, but any access garners: nfsv4 client/server protocol prob err=10020 nfsv4 client/server protocol prob err=10020 the fstab entries: freenas.lerctr.org:/mnt/data/TBH /vault/backup/TBH nfs rw,nfsv4,minorversion=1 0 0 freenas.lerctr.org:/mnt/data/BACULA /vault/backup/BACULA nfs rw,nfsv4,minorversion=1 0 0 Ideas? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
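The err=10020 in the client's console message is a raw NFSv4 status number. As far as I can tell from the nfsstat4 table in RFC 7530, 10020 is NFS4ERR_NOFILEHANDLE (an operation needed a current filehandle and none was set), which is at least consistent with the broken server-side exports that rmacklem@ helped track down. A small lookup sketch in Python; the table is a partial, hand-copied subset and the function names are hypothetical, so check the RFC for anything not listed:

```python
import re

# Hand-picked subset of NFSv4 status codes (nfsstat4) from RFC 7530 /
# RFC 5661. Deliberately incomplete.
NFS4_STATUS = {
    0: "NFS4_OK",
    13: "NFS4ERR_ACCESS",
    70: "NFS4ERR_STALE",
    10001: "NFS4ERR_BADHANDLE",
    10008: "NFS4ERR_DELAY",
    10013: "NFS4ERR_GRACE",
    10016: "NFS4ERR_WRONGSEC",
    10019: "NFS4ERR_MOVED",
    10020: "NFS4ERR_NOFILEHANDLE",
    10021: "NFS4ERR_MINOR_VERS_MISMATCH",
    10022: "NFS4ERR_STALE_CLIENTID",
}

# Matches the "err=NNNNN" part of the client's console message.
ERR_RE = re.compile(r"\berr=(\d+)")


def decode_nfs4_status(code):
    """Return the symbolic name for an nfsstat4 value, if it is in the table."""
    return NFS4_STATUS.get(code, f"unknown nfsstat4 {code}")


def status_from_log(line):
    """Decode the error number in a 'protocol prob err=NNN' line, or None."""
    m = ERR_RE.search(line)
    return decode_nfs4_status(int(m.group(1))) if m else None
```

For example, status_from_log("nfsv4 client/server protocol prob err=10020") returns "NFS4ERR_NOFILEHANDLE".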
Re: poudriere jail with todays current: Install fail?
On 10/20/2021 7:11 am, Dimitry Andric wrote: On 20 Oct 2021, at 13:51, Larry Rosenman wrote: On 10/20/2021 6:41 am, Dimitry Andric wrote: On 20 Oct 2021, at 03:50, Larry Rosenman wrote: Anyone else having poudriere jail -u or jail -c fail in the installworld? log: https://www.lerctr.org/~ler/jail-install.log The actual error is pretty far from the bottom of that log: --- realinstall_subdir_usr.sbin/lpr/chkprintcap --- install: chkprintcap: No such file or directory So probably usr.sbin/lpr wasn't built during buildworld? Do you have any special settings in e.g. src.conf? -Dimitry I had WITHOUT_LPR=yes in make.conf. But I've had that in there for a LONG time, and this is the first time poudriere has complained. So, I commented that out for now, but I'd like to know why the sudden change. I haven't been able to find how poudriere jail -c passes any src.conf settings to its installworld phase. It does seem to have a bunch of stuff that goes through contortions to put a src.conf into the jail directory, but only during *buildworld*, not during installworld. It could very well be that this use case was broken due to a recent poudriere update. I don't see anything in the recent log of -CURRENT that indicates some sort of flipping of the MK_LPR default; it has been "yes" for ages now. Whatever the case may be, for some reason you now run into a common problem with the disconnect between buildworld and installworld: if you run these under even slightly different environments, there can be unexpected consequences... :) -Dimitry Thanks, Dimitry! -- Larry Rosenman http://people.freebsd.org/~ler Phone: +1 214-642-9640 E-Mail: l...@freebsd.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
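Dimitry's diagnosis comes down to buildworld and installworld seeing different WITH_*/WITHOUT_* knobs: WITHOUT_LPR was honored while building, so chkprintcap was never built, but the installworld environment apparently did not carry the same setting and still tried to install it. A rough Python sketch that diffs the knobs set in two configuration texts (for instance the make.conf/src.conf seen by each phase); it only understands simple KNOB=value lines, ignores make(1) conditionals, includes and variable expansion, and the helper names are hypothetical:

```python
import re

# Simple "WITHOUT_FOO=yes" / "WITH_BAR?=" style assignments only.
KNOB_RE = re.compile(r"^\s*(WITH(?:OUT)?_[A-Z0-9_]+)\s*[?:!+]?=", re.MULTILINE)


def knobs(conf_text):
    """Return the set of WITH_*/WITHOUT_* knobs assigned in a config text."""
    # Drop comments first so that "#WITHOUT_LPR=yes" does not count.
    lines = [line.split("#", 1)[0] for line in conf_text.splitlines()]
    return set(KNOB_RE.findall("\n".join(lines)))


def knob_mismatch(build_conf, install_conf):
    """Return (build_only, install_only): knobs seen by just one phase."""
    b, i = knobs(build_conf), knobs(install_conf)
    return sorted(b - i), sorted(i - b)
```

With WITHOUT_LPR=yes in the build-time text and the line commented out at install time, knob_mismatch reports (["WITHOUT_LPR"], []), i.e. a knob only the build saw, which is the kind of split that produced the missing chkprintcap.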
Re: poudriere jail with todays current: Install fail?
On 10/20/2021 6:41 am, Dimitry Andric wrote: On 20 Oct 2021, at 03:50, Larry Rosenman wrote: Anyone else having poudriere jail -u or jail -c fail in the installworld? log: https://www.lerctr.org/~ler/jail-install.log The actual error is pretty far from the bottom of that log: --- realinstall_subdir_usr.sbin/lpr/chkprintcap --- install: chkprintcap: No such file or directory So probably usr.sbin/lpr wasn't built during buildworld? Do you have any special settings in e.g. src.conf? -Dimitry I had WITHOUT_LPR=yes in make.conf. But I've had that in there for a LONG time, and this is the first time poudriere has complained. So, I commented that out for now, but I'd like to know why the sudden change. -- Larry Rosenman http://people.freebsd.org/~ler Phone: +1 214-642-9640 E-Mail: l...@freebsd.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
poudriere jail with todays current: Install fail?
Anyone else having poudriere jail -u or jail -c fail in the installworld? log: https://www.lerctr.org/~ler/jail-install.log -- Larry Rosenman http://people.freebsd.org/~ler Phone: +1 214-642-9640 E-Mail: l...@freebsd.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Fwd: [package - head-amd64-default][sysutils/lsof] Failed for lsof-4.93.2_9,8 in build
ded from /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h:33: In file included from /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h:48: In file included from /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_context.h:73: In file included from /usr/src/sys/cddl/compat/opensolaris/sys/vfs.h:37: /usr/src/sys/cddl/compat/opensolaris/sys/vnode.h:243:10: warning: implicit declaration of function 'VOP_FSYNC' is invalid in C99 [-Wimplicit-function-declaration] error = VOP_FSYNC(vp, MNT_WAIT, curthread); ^ 1 warning generated. --- dproc.o --- cc -pipe -fstack-protector-strong -fno-strict-aliasing -DNEEDS_BOOL_TYPEDEF -DHASTASKS -DHAS_PAUSE_SBT -DHAS_DUP2 -DHAS_CLOSEFROM -DHASEFFNLINK=i_effnlink -DHASF_VNODE -DHAS_FILEDESCENT -DHAS_TMPFS -DHASWCTYPE_H -DHASSBSTATE -DHAS_KVM_VNODE -DHAS_UFS1_2 -DHAS_NO_IDEV -DHAS_VM_MEMATTR_T -DNEEDS_DEVICE_T -DHAS_CDEV2PRIV -DHAS_NO_SI_UDEV -DHAS_SYS_SX_H -DHASFUSEFS -DHAS_ZFS -DHAS_V_LOCKF -DHAS_LOCKF_ENTRY -DHAS_NO_6PORT -DHAS_NO_6PPCB -DNEEDS_BOOLEAN_T -DHAS_SB_CCC -DHAS_FDESCENTTBL -DFREEBSDV=13000 -DHASFDESCFS=2 -DHASPSEUDOFS -DHASNULLFS -DHASIPv6 -DHASUTMPX -DHAS_STRFTIME -DLSOF_VSTR=\"13.0-CURRENT\" -I/usr/src/sys -O2 -c dproc.c -o dproc.o --- lib/liblsof.a --- --- lkud.o --- cc -pipe -fstack-protector-strong -fno-strict-aliasing -DNEEDS_BOOL_TYPEDEF -DHASTASKS -DHAS_PAUSE_SBT -DHAS_DUP2 -DHAS_CLOSEFROM -DHASEFFNLINK=i_effnlink -DHASF_VNODE -DHAS_FILEDESCENT -DHAS_TMPFS -DHASWCTYPE_H -DHASSBSTATE -DHAS_KVM_VNODE -DHAS_UFS1_2 -DHAS_NO_IDEV -DHAS_VM_MEMATTR_T -DNEEDS_DEVICE_T -DHAS_CDEV2PRIV -DHAS_NO_SI_UDEV -DHAS_SYS_SX_H -DHASFUSEFS -DHAS_ZFS -DHAS_V_LOCKF -DHAS_LOCKF_ENTRY -DHAS_NO_6PORT -DHAS_NO_6PPCB -DNEEDS_BOOLEAN_T -DHAS_SB_CCC -DHAS_FDESCENTTBL -DFREEBSDV=13000 -DHASFDESCFS=2 -DHASPSEUDOFS -DHASNULLFS -DHASIPv6 -DHASUTMPX -DHAS_STRFTIME -DLSOF_VSTR="13.0-CURRENT" -I/usr/src/sys -O2 -c lkud.c -o lkud.o --- dproc.o --- dproc.c:350:24: error: no member named 'fd_cdir' in 
'struct filedesc' if (!ckscko && fd.fd_cdir) { ~~ ^ dproc.c:353:25: error: no member named 'fd_cdir' in 'struct filedesc' process_node((KA_T)fd.fd_cdir); ~~ ^ dproc.c:360:24: error: no member named 'fd_rdir' in 'struct filedesc' if (!ckscko && fd.fd_rdir) { ~~ ^ dproc.c:363:25: error: no member named 'fd_rdir' in 'struct filedesc' process_node((KA_T)fd.fd_rdir); ~~ ^ dproc.c:372:24: error: no member named 'fd_jdir' in 'struct filedesc' if (!ckscko && fd.fd_jdir) { ~~ ^ dproc.c:375:25: error: no member named 'fd_jdir' in 'struct filedesc' process_node((KA_T)fd.fd_jdir); ~~ ^ 6 errors generated. *** [dproc.o] Error code 1 make[1]: stopped in /wrkdirs/usr/ports/sysutils/lsof/work/lsof-4.93.2 --- lib/liblsof.a --- A failure has been detected in another branch of the parallel make make[2]: stopped in /wrkdirs/usr/ports/sysutils/lsof/work/lsof-4.93.2/lib *** [lib/liblsof.a] Error code 2 make[1]: stopped in /wrkdirs/usr/ports/sysutils/lsof/work/lsof-4.93.2 2 errors make[1]: stopped in /wrkdirs/usr/ports/sysutils/lsof/work/lsof-4.93.2 ===> Compilation failed unexpectedly. Try to set MAKE_JOBS_UNSAFE=yes and rebuild before reporting the failure to the maintainer. *** Error code 1 Stop. make: stopped in /usr/ports/sysutils/lsof -- Larry Rosenman http://people.freebsd.org/~ler Phone: +1 214-642-9640 E-Mail: l...@freebsd.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106 ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
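Every error in this log is of one kind: dproc.c reads fd_cdir, fd_rdir and fd_jdir from struct filedesc, and the -CURRENT headers it was compiled against no longer provide them (my understanding is that these vnode pointers moved into a separate struct pwd around this time, but check sys/sys/filedesc.h in the matching source tree before relying on that). The separate lsof report further down ("no member named 'next' in 'struct vm_map_entry'") is the same class of breakage. For triaging pkg-fallout logs like these, a hypothetical Python helper that collects each missing struct member once, together with the places it was referenced, might look like this:

```python
import re

# clang's diagnostic, e.g.
#   dproc.c:350:24: error: no member named 'fd_cdir' in 'struct filedesc'
# The \s+ before the struct name tolerates a wrapped message.
MISSING_RE = re.compile(
    r"(\S+?\.c):(\d+):\d+: error: no member named '(\w+)' in\s+'(struct \w+)'"
)


def missing_members(log_text):
    """Map (struct, member) -> ["file:line", ...], in first-seen order."""
    found = {}
    for m in MISSING_RE.finditer(log_text):
        src, line, member, struct = m.groups()
        found.setdefault((struct, member), []).append(f"{src}:{line}")
    return found
```

On the log above this would yield three keys (fd_cdir, fd_rdir and fd_jdir in struct filedesc), each referenced twice, which points at a single header change rather than six unrelated bugs.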
Re: Panic with ataintel and not ready CD on a Dell r710@r357958
On 02/17/2020 3:13 pm, Larry Rosenman wrote: On 02/17/2020 3:07 pm, Warner Losh wrote: On Feb 17, 2020, at 1:24 PM, Mateusz Guzik wrote: On 2/17/20, Larry Rosenman wrote: On 02/17/2020 1:46 pm, Larry Rosenman wrote: Unread portion of the kernel message buffer: panic: aprobe1: freed with 1 active CCBs cpuid = 22 time = 1581771571 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe01fb9a11a0 vpanic() at vpanic+0x185/frame 0xfe01fb9a1200 panic() at panic+0x43/frame 0xfe01fb9a1260 cam_periph_release_locked_buses() at cam_periph_release_locked_buses+0x372/frame 0xfe01fb9a1780 cam_periph_release_locked() at cam_periph_release_locked+0x1b/frame 0xfe01fb9a17a0 probedone() at probedone+0x186/frame 0xfe01fb9a1c60 xpt_done_process() at xpt_done_process+0x358/frame 0xfe01fb9a1ca0 xpt_done_td() at xpt_done_td+0xf5/frame 0xfe01fb9a1cf0 fork_exit() at fork_exit+0x80/frame 0xfe01fb9a1d30 fork_trampoline() at fork_trampoline+0xe/frame 0xfe01fb9a1d30 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- Uptime: 1m8s Dumping 6077 out of 131029 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:393 #2 0x804bdf80 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:480 #3 0x804be3dd in vpanic (fmt=, ap= out>) at /usr/src/sys/kern/kern_shutdown.c:910 #4 0x804be133 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:836 #5 0x823c5bc2 in camperiphfree (periph=0xf80115da2300) at /usr/src/sys/cam/cam_periph.c:685 #6 cam_periph_release_locked_buses (periph=0xf80115da2300) at /usr/src/sys/cam/cam_periph.c:450 #7 0x823c5bfb in cam_periph_release_locked (periph=0xf80115da2300) at /usr/src/sys/cam/cam_periph.c:461 #8 0x8240dce6 in probedone (periph=0xf80115da2300, done_ccb=) at 
/usr/src/sys/cam/ata/ata_xpt.c:1352 #9 0x823cee08 in xpt_done_process (ccb_h=0xf8015013e800) at /usr/src/sys/cam/cam_xpt.c:5488 #10 0x823d0db5 in xpt_done_td (arg=0x8243d780 ) at /usr/src/sys/cam/cam_xpt.c:5515 #11 0x80483200 in fork_exit (callout=0x823d0cc0 , arg=0x8243d780 , frame=0xfe01fb9a1d40) at /usr/src/sys/kern/kern_fork.c:1059 #12 (kgdb) Core IS available, as is the kernel. I do load the ataintel driver as a module. Removing it allows me to boot. What info do you all need? Forgot to include: the previous working version was r356506. Can you try prior to r357647? I’m pretty sure this is mine… and I’ve already reverted the bad change. Warner I've got a world/kernel building at r358050. I'll post back either way. And it boots fine and runs with ataintel back in the mix. Thanks for the quick answer, Warner. -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
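Warner's "Can you try prior to r357647?" short-circuits what would otherwise be a bisection between the last known-good revision (r356506) and the first known-bad one (r357958). When nobody has a suspect commit, a plain binary search over that range needs at most about log2(range) build-and-boot cycles. A minimal Python sketch; is_bad is a hypothetical stand-in for "build world and kernel at this revision, boot it, and check for the panic", and treating every integer as a testable revision is a simplification, since many revision numbers in the base repository are commits to other branches:

```python
def first_bad_revision(good, bad, is_bad):
    """Binary search for the first bad revision in (good, bad].

    Assumes good < bad, is_bad(good) is False, is_bad(bad) is True, and
    that the bug stays once introduced. Returns (first_bad, tested), where
    tested lists the revisions probed, in order.
    """
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(mid)
        if is_bad(mid):
            bad = mid   # bug already present: look earlier
        else:
            good = mid  # still fine: look later
    return bad, tested
```

For the range in this thread (1452 revisions), first_bad_revision(356506, 357958, is_bad) probes at most 11 revisions.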
Re: Panic with ataintel and not ready CD on a Dell r710@r357958
On 02/17/2020 3:07 pm, Warner Losh wrote: On Feb 17, 2020, at 1:24 PM, Mateusz Guzik wrote: On 2/17/20, Larry Rosenman wrote: On 02/17/2020 1:46 pm, Larry Rosenman wrote: Unread portion of the kernel message buffer: panic: aprobe1: freed with 1 active CCBs cpuid = 22 time = 1581771571 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe01fb9a11a0 vpanic() at vpanic+0x185/frame 0xfe01fb9a1200 panic() at panic+0x43/frame 0xfe01fb9a1260 cam_periph_release_locked_buses() at cam_periph_release_locked_buses+0x372/frame 0xfe01fb9a1780 cam_periph_release_locked() at cam_periph_release_locked+0x1b/frame 0xfe01fb9a17a0 probedone() at probedone+0x186/frame 0xfe01fb9a1c60 xpt_done_process() at xpt_done_process+0x358/frame 0xfe01fb9a1ca0 xpt_done_td() at xpt_done_td+0xf5/frame 0xfe01fb9a1cf0 fork_exit() at fork_exit+0x80/frame 0xfe01fb9a1d30 fork_trampoline() at fork_trampoline+0xe/frame 0xfe01fb9a1d30 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- Uptime: 1m8s Dumping 6077 out of 131029 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:393 #2 0x804bdf80 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:480 #3 0x804be3dd in vpanic (fmt=, ap=) at /usr/src/sys/kern/kern_shutdown.c:910 #4 0x804be133 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:836 #5 0x823c5bc2 in camperiphfree (periph=0xf80115da2300) at /usr/src/sys/cam/cam_periph.c:685 #6 cam_periph_release_locked_buses (periph=0xf80115da2300) at /usr/src/sys/cam/cam_periph.c:450 #7 0x823c5bfb in cam_periph_release_locked (periph=0xf80115da2300) at /usr/src/sys/cam/cam_periph.c:461 #8 0x8240dce6 in probedone (periph=0xf80115da2300, done_ccb=) at /usr/src/sys/cam/ata/ata_xpt.c:1352 #9 0x823cee08 in xpt_done_process 
(ccb_h=0xf8015013e800) at /usr/src/sys/cam/cam_xpt.c:5488 #10 0x823d0db5 in xpt_done_td (arg=0x8243d780 ) at /usr/src/sys/cam/cam_xpt.c:5515 #11 0x80483200 in fork_exit (callout=0x823d0cc0 , arg=0x8243d780 , frame=0xfe01fb9a1d40) at /usr/src/sys/kern/kern_fork.c:1059 #12 (kgdb) Core IS available, as is the kernel. I do load the ataintel driver as a module. Removing it allows me to boot. What info do you all need? Forgot to include: the previous working version was r356506. Can you try prior to r357647? I’m pretty sure this is mine… and I’ve already reverted the bad change. Warner I've got a world/kernel building at r358050. I'll post back either way. -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: Panic with ataintel and not ready CD on a Dell r710@r357958
On 02/17/2020 1:46 pm, Larry Rosenman wrote: Unread portion of the kernel message buffer: panic: aprobe1: freed with 1 active CCBs cpuid = 22 time = 1581771571 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe01fb9a11a0 vpanic() at vpanic+0x185/frame 0xfe01fb9a1200 panic() at panic+0x43/frame 0xfe01fb9a1260 cam_periph_release_locked_buses() at cam_periph_release_locked_buses+0x372/frame 0xfe01fb9a1780 cam_periph_release_locked() at cam_periph_release_locked+0x1b/frame 0xfe01fb9a17a0 probedone() at probedone+0x186/frame 0xfe01fb9a1c60 xpt_done_process() at xpt_done_process+0x358/frame 0xfe01fb9a1ca0 xpt_done_td() at xpt_done_td+0xf5/frame 0xfe01fb9a1cf0 fork_exit() at fork_exit+0x80/frame 0xfe01fb9a1d30 fork_trampoline() at fork_trampoline+0xe/frame 0xfe01fb9a1d30 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- Uptime: 1m8s Dumping 6077 out of 131029 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:393 #2 0x804bdf80 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:480 #3 0x804be3dd in vpanic (fmt=, ap=out>) at /usr/src/sys/kern/kern_shutdown.c:910 #4 0x804be133 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:836 #5 0x823c5bc2 in camperiphfree (periph=0xf80115da2300) at /usr/src/sys/cam/cam_periph.c:685 #6 cam_periph_release_locked_buses (periph=0xf80115da2300) at /usr/src/sys/cam/cam_periph.c:450 #7 0x823c5bfb in cam_periph_release_locked (periph=0xf80115da2300) at /usr/src/sys/cam/cam_periph.c:461 #8 0x8240dce6 in probedone (periph=0xf80115da2300, done_ccb=) at /usr/src/sys/cam/ata/ata_xpt.c:1352 #9 0x823cee08 in xpt_done_process (ccb_h=0xf8015013e800) at /usr/src/sys/cam/cam_xpt.c:5488 #10 0x823d0db5 in xpt_done_td (arg=0x8243d780 ) at 
/usr/src/sys/cam/cam_xpt.c:5515 #11 0x80483200 in fork_exit (callout=0x823d0cc0 , arg=0x8243d780 , frame=0xfe01fb9a1d40) at /usr/src/sys/kern/kern_fork.c:1059 #12 (kgdb) Core IS available, as is the kernel. I do load the ataintel driver as a module. Removing it allows me to boot. What info do you all need? Forgot to include: the previous working version was r356506. -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Panic with ataintel and not ready CD on a Dell r710@r357958
Unread portion of the kernel message buffer: panic: aprobe1: freed with 1 active CCBs cpuid = 22 time = 1581771571 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe01fb9a11a0 vpanic() at vpanic+0x185/frame 0xfe01fb9a1200 panic() at panic+0x43/frame 0xfe01fb9a1260 cam_periph_release_locked_buses() at cam_periph_release_locked_buses+0x372/frame 0xfe01fb9a1780 cam_periph_release_locked() at cam_periph_release_locked+0x1b/frame 0xfe01fb9a17a0 probedone() at probedone+0x186/frame 0xfe01fb9a1c60 xpt_done_process() at xpt_done_process+0x358/frame 0xfe01fb9a1ca0 xpt_done_td() at xpt_done_td+0xf5/frame 0xfe01fb9a1cf0 fork_exit() at fork_exit+0x80/frame 0xfe01fb9a1d30 fork_trampoline() at fork_trampoline+0xe/frame 0xfe01fb9a1d30 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- Uptime: 1m8s Dumping 6077 out of 131029 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:393 #2 0x804bdf80 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:480 #3 0x804be3dd in vpanic (fmt=, ap=out>) at /usr/src/sys/kern/kern_shutdown.c:910 #4 0x804be133 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:836 #5 0x823c5bc2 in camperiphfree (periph=0xf80115da2300) at /usr/src/sys/cam/cam_periph.c:685 #6 cam_periph_release_locked_buses (periph=0xf80115da2300) at /usr/src/sys/cam/cam_periph.c:450 #7 0x823c5bfb in cam_periph_release_locked (periph=0xf80115da2300) at /usr/src/sys/cam/cam_periph.c:461 #8 0x8240dce6 in probedone (periph=0xf80115da2300, done_ccb=) at /usr/src/sys/cam/ata/ata_xpt.c:1352 #9 0x823cee08 in xpt_done_process (ccb_h=0xf8015013e800) at /usr/src/sys/cam/cam_xpt.c:5488 #10 0x823d0db5 in xpt_done_td (arg=0x8243d780 ) at /usr/src/sys/cam/cam_xpt.c:5515 #11 0x80483200 in fork_exit 
(callout=0x823d0cc0 , arg=0x8243d780 , frame=0xfe01fb9a1d40) at /usr/src/sys/kern/kern_fork.c:1059 #12 (kgdb) Core IS available, as is the kernel. I do load the ataintel driver as a module. Removing it allows me to boot. What info do you all need? -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
sysutils/lsof: Recent changes have broken lsof
UFS1_2 -DHAS_NO_IDEV -DHAS_VM_MEMATTR_T -DNEEDS_DEVICE_T -DHAS_CDEV2PRIV -DHAS_NO_SI_UDEV -DHAS_SYS_SX_H -DHASFUSEFS -DHAS_ZFS -DHAS_V_LOCKF -DHAS_LOCKF_ENTRY -DHAS_NO_6PORT -DHAS_NO_6PPCB -DNEEDS_BOOLEAN_T -DHAS_SB_CCC -DHAS_FDESCENTTBL -DFREEBSDV=13000 -DHASFDESCFS=2 -DHASPSEUDOFS -DHASNULLFS -DHASIPv6 -DHASUTMPX -DHAS_STRFTIME -DLSOF_VSTR="13.0-CURRENT" -I/usr/src/sys -O2 -c isfn.c -o isfn.o --- dnode2.o --- 1 warning generated. --- dproc.o --- cc -pipe -fstack-protector-strong -fno-strict-aliasing -DNEEDS_BOOL_TYPEDEF -DHASTASKS -DHAS_PAUSE_SBT -DHAS_DUP2 -DHAS_CLOSEFROM -DHASEFFNLINK=i_effnlink -DHASF_VNODE -DHAS_FILEDESCENT -DHAS_TMPFS -DHASWCTYPE_H -DHASSBSTATE -DHAS_KVM_VNODE -DHAS_UFS1_2 -DHAS_NO_IDEV -DHAS_VM_MEMATTR_T -DNEEDS_DEVICE_T -DHAS_CDEV2PRIV -DHAS_NO_SI_UDEV -DHAS_SYS_SX_H -DHASFUSEFS -DHAS_ZFS -DHAS_V_LOCKF -DHAS_LOCKF_ENTRY -DHAS_NO_6PORT -DHAS_NO_6PPCB -DNEEDS_BOOLEAN_T -DHAS_SB_CCC -DHAS_FDESCENTTBL -DFREEBSDV=13000 -DHASFDESCFS=2 -DHASPSEUDOFS -DHASNULLFS -DHASIPv6 -DHASUTMPX -DHAS_STRFTIME -DLSOF_VSTR=\"13.0-CURRENT\" -I/usr/src/sys -O2 -c dproc.c -o dproc.o dproc.c:693:23: error: no member named 'next' in 'struct vm_map_entry' if (!(ka = (KA_T)e->next)) ~ ^ 1 error generated. *** [dproc.o] Error code 1 from pkg-fallout. Thanks! -- Larry Rosenman http://people.freebsd.org/~ler Phone: +1 214-642-9640 E-Mail: l...@freebsd.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
ng_snd_item: I thought(?) we fixed this :( r354843
I thought someone, somewhere fixed this, but it's hit again. core *IS* available, and I can give access as well. Unread portion of the kernel message buffer: panic: ng_snd_item: 42 != 1414 cpuid = 0 time = 1574707403 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0215a304d0 vpanic() at vpanic+0x17e/frame 0xfe0215a30530 panic() at panic+0x43/frame 0xfe0215a30590 ng_snd_item() at ng_snd_item+0x482/frame 0xfe0215a305d0 ng_ether_output() at ng_ether_output+0x5e/frame 0xfe0215a30600 ether_output() at ether_output+0x661/frame 0xfe0215a306a0 arpintr() at arpintr+0xf0c/frame 0xfe0215a30840 netisr_dispatch_src() at netisr_dispatch_src+0x94/frame 0xfe0215a308c0 ether_demux() at ether_demux+0x15e/frame 0xfe0215a308f0 ng_ether_rcv_upper() at ng_ether_rcv_upper+0xb2/frame 0xfe0215a30940 ng_apply_item() at ng_apply_item+0xa4/frame 0xfe0215a309c0 ng_snd_item() at ng_snd_item+0x2b0/frame 0xfe0215a30a00 ng_apply_item() at ng_apply_item+0xa4/frame 0xfe0215a30a80 ng_snd_item() at ng_snd_item+0x2b0/frame 0xfe0215a30ac0 ng_ether_input() at ng_ether_input+0x4c/frame 0xfe0215a30af0 ether_nh_input() at ether_nh_input+0x2c9/frame 0xfe0215a30b40 netisr_dispatch_src() at netisr_dispatch_src+0x94/frame 0xfe0215a30bc0 ether_input() at ether_input+0x58/frame 0xfe0215a30c10 bce_intr() at bce_intr+0x6b7/frame 0xfe0215a30c90 ithread_loop() at ithread_loop+0x1c6/frame 0xfe0215a30cf0 fork_exit() at fork_exit+0x80/frame 0xfe0215a30d30 fork_trampoline() at fork_trampoline+0xe/frame 0xfe0215a30d30 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- Uptime: 6d7h53m25s Dumping 27613 out of 131029 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:392 #2 0x804bbc20 in kern_reboot (howto=260) at 
/usr/src/sys/kern/kern_shutdown.c:479 #3 0x804bc076 in vpanic (fmt=, ap=out>) at /usr/src/sys/kern/kern_shutdown.c:908 #4 0x804bbdd3 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:835 #5 0x8262f442 in ng_snd_item (item=0xf813c2e41280, flags=0) at /usr/src/sys/netgraph/ng_base.c:2252 #6 0x82643c0e in ng_ether_output (ifp=, mp=0xfe0215a30658) at /usr/src/sys/netgraph/ng_ether.c:294 #7 0x805c44e1 in ether_output (ifp=, m=0xf80be7cfe500, dst=0xfe0215a30800, ro=) at /usr/src/sys/net/if_ethersubr.c:430 #8 0x805ded6c in in_arpinput (m=) at /usr/src/sys/netinet/if_ether.c:1144 #9 arpintr (m=0xf80be7cfe500) at /usr/src/sys/netinet/if_ether.c:747 #10 0x805cff94 in netisr_dispatch_src (proto=4, source=0, m=0xf80be7cfe500) at /usr/src/sys/net/netisr.c:1127 #11 0x805c47ce in ether_demux (ifp=0xf8107db75800, m=) at /usr/src/sys/net/if_ethersubr.c:916 #12 0x82644042 in ng_ether_rcv_upper (hook=, item=) at /usr/src/sys/netgraph/ng_ether.c:744 #13 0x8262f514 in ng_apply_item (node=0xf8106c02c200, item=0xf813c2e41280, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403 #14 0x8262f270 in ng_snd_item (item=0xf813c2e41280, flags=0) at /usr/src/sys/netgraph/ng_base.c:2320 #15 0x8262f514 in ng_apply_item (node=0xf8012c69de00, item=0xf813c2e41280, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403 #16 0x8262f270 in ng_snd_item (item=0xf813c2e41280, flags=0) at /usr/src/sys/netgraph/ng_base.c:2320 #17 0x82643c9c in ng_ether_input (ifp=, mp=0xfe0215a30b18) at /usr/src/sys/netgraph/ng_ether.c:255 #18 0x805c5a59 in ether_input_internal (ifp=0xf8107db75800, m=0xf80be7cfe500) at /usr/src/sys/net/if_ethersubr.c:654 #19 ether_nh_input (m=) at /usr/src/sys/net/if_ethersubr.c:735 #20 0x805cff94 in netisr_dispatch_src (proto=5, source=0, m=0xf80be7cfe500) at /usr/src/sys/net/netisr.c:1127 #21 0x805c4c48 in ether_input (ifp=0xf8107db75800, m=0x0) at /usr/src/sys/net/if_ethersubr.c:824 #22 0x82455767 in bce_rx_intr (sc=0xfe0234fc8000) at /usr/src/sys/dev/bce/if_bce.c:6848 #23 bce_intr (xsc=0xfe0234fc8000) 
at /usr/src/sys/dev/bce/if_bce.c:8017 #24 0x80485166 in intr_event_execute_handlers (p=out>, ie=) at /usr/src/sys/kern/kern_intr.c:1148 #25 ithread_execute_handlers (p=, ie=) at /usr/src/sys/kern/kern_intr.c:1161 #26 ithread_loop (arg=) at /usr/src/sys/kern/kern_intr.c:1241 #27 0x80481ca0 in fork_exit ( callout=0x80484fa0 , arg=0xf8107d1ffcc0, frame=0xfe0215a30d40) at /usr/src/sys/kern/kern_fork.c:1059 #28 (kgdb) -- Larry Rosenman http://peopl
ZFS Panic: Current: r354843: panic: solaris assert: error || lr->lr_length <= size, file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c, line: 1324
Ideas? Core *IS* available, and I can give access. Unread portion of the kernel message buffer: panic: solaris assert: error || lr->lr_length <= size, file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c, line: 1324 cpuid = 20 time = 1574159903 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe028c4d1920 vpanic() at vpanic+0x17e/frame 0xfe028c4d1980 panic() at panic+0x43/frame 0xfe028c4d19e0 assfail() at assfail+0x1a/frame 0xfe028c4d19f0 zfs_get_data() at zfs_get_data+0x358/frame 0xfe028c4d1a60 zil_commit_impl() at zil_commit_impl+0xfa5/frame 0xfe028c4d1bb0 zfs_sync() at zfs_sync+0xa2/frame 0xfe028c4d1bd0 sys_sync() at sys_sync+0xf5/frame 0xfe028c4d1c00 amd64_syscall() at amd64_syscall+0x29b/frame 0xfe028c4d1d30 fast_syscall_common() at fast_syscall_common+0x101/frame 0xfe028c4d1d30 --- syscall (36, FreeBSD ELF64, sys_sync), rip = 0x80030d7aa, rsp = 0x7fffe138, rbp = 0x7fffe260 --- Uptime: 4h32m18s Dumping 24794 out of 131029 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:392 #2 0x804bbc20 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:479 #3 0x804bc076 in vpanic (fmt=, ap=out>) at /usr/src/sys/kern/kern_shutdown.c:908 #4 0x804bbdd3 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:835 #5 0x8177021a in assfail (a=, f=, l=) at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81 #6 0x81418e98 in zfs_get_data (arg=, lr=0xfe0365716b60, buf=, lwb=0xf813d468a000, zio=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1324 #7 0x813e1775 in zil_lwb_commit (zilog=0xf81044baa800, itx=, lwb=0xf813d468a000) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:1610 #8 zil_process_commit_list 
(zilog=0xf81044baa800) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:2188 #9 zil_commit_writer (zilog=0xf81044baa800, zcw=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:2321 #10 zil_commit_impl (zilog=, foid=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c:2835 #11 0x81415752 in zfs_sync (vfsp=, waitfor=) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:331 #12 0x80593e35 in sys_sync (td=, uap=out>) at /usr/src/sys/kern/vfs_syscalls.c:142 #13 0x8080c75b in syscallenter (td=0xf816486ce000) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:144 #14 amd64_syscall (td=0xf816486ce000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1163 #15 #16 0x00080030d7aa in ?? () Backtrace stopped: Cannot access memory at address 0x7fffe138 (kgdb) -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: My buildfarm member now giving permission denied
On 10/01/2019 8:27 pm, Larry Rosenman wrote: FreeBSD SVN rev: r352600 - - 1.69G 2019-09-22 13:13 r352873 NR / 43.1G 2019-09-29 16:36 I went from r352600 to r352873 and now I'm getting PostgreSQL permission denied errors on the check phase of the build. FreeBSD folks: Any ideas? PostgreSQL folks: FYI. latest build log: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2019-10-02%2001%3A20%3A14 -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 214-642-9640 E-Mail: l...@lerctr.org US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
My buildfarm member now giving permission denied
FreeBSD SVN rev:
r352600 -   -  1.69G 2019-09-22 13:13
r352873 NR  /  43.1G 2019-09-29 16:36
I went from r352600 to r352873 and now I'm getting PostgreSQL permission denied errors on the check phase of the build.
FreeBSD folks: Any ideas?
PostgreSQL folks: FYI.
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640
E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: panic: rcv_start < rcv_end
On 09/10/2019 9:20 am, Michael Tuexen wrote:
On 10. Sep 2019, at 14:37, Yuri Pankov wrote:
Just seen this almost immediately after booting the system installed from amd64-20190906-r351901 snapshot, trying to do initial pkg bootstrap. Sadly, I didn't have the swap/dump device configured at the time, so no dump was saved. But it looks like I'm not alone, seeing the https://forums.freebsd.org/threads/kernel-panic-on-bhyve-virtualization.7/ topic. Note that I'm running on bare metal, so bhyve isn't involved. My panic screenshot is at https://pasteboard.co/IwLaXXb.jpg. In (the most likely) case it's not helpful enough, I'm now running with dump device configured, and will update if/when the panic reproduces.
This panic should be fixed by: https://svnweb.freebsd.org/changeset/base/352072
Please drop me a note if not.
Best regards
Michael
Is this the same panic? https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240471
I *DO* have a core.
--
Larry Rosenman http://people.freebsd.org/~ler
Phone: +1 214-642-9640
E-Mail: l...@freebsd.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
panic: VOP_UNSET_TEXT returned 22: on r351627
I got the following panic this AM during a poudriere run. r351627 is the revision I'm at. Core *IS* available. Ideas?
Unread portion of the kernel message buffer:
VNASSERT failed
0xf809e6335960: tag tmpfs, type VREG
usecount 1, writecount 0, refcount 2
flags (VI_ACTIVE)
v_object 0xf81f37227000 ref 2 pages 1063 cleanbuf 0 dirtybuf 0
lock type tmpfs: SHARED (count 1)
tag VT_TMPFS, tmpfs_node 0xf803214f83a0, flags 0x0, links 1
mode 0755, owner 65534, group 0, size 4352808, status 0x0
panic: VOP_UNSET_TEXT returned 22
cpuid = 22
time = 1567862254
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe01bfd618b0
vpanic() at vpanic+0x19d/frame 0xfe01bfd61900
panic() at panic+0x43/frame 0xfe01bfd61960
vm_map_entry_set_vnode_text() at vm_map_entry_set_vnode_text+0x275/frame 0xfe01bfd619b0
vm_map_process_deferred() at vm_map_process_deferred+0x70/frame 0xfe01bfd619d0
vm_map_remove() at vm_map_remove+0xc6/frame 0xfe01bfd61a00
vmspace_exit() at vmspace_exit+0xd8/frame 0xfe01bfd61a40
exit1() at exit1+0x57d/frame 0xfe01bfd61ab0
sys_sys_exit() at sys_sys_exit+0xd/frame 0xfe01bfd61ac0
amd64_syscall() at amd64_syscall+0x29f/frame 0xfe01bfd61bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfe01bfd61bf0
--- syscall (1, FreeBSD ELF64, sys_sys_exit), rip = 0x8008326aa, rsp = 0x7fffe1b8, rbp = 0x7fffe1d0 ---
Uptime: 7d15h33m31s
Dumping 23246 out of 131027 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:392
#2 0x804bcf60 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:479
#3 0x804bd3d9 in vpanic (fmt=, ap=out>) at /usr/src/sys/kern/kern_shutdown.c:905
#4 0x804bd113 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:832
#5 0x807644e5 in vm_map_entry_set_vnode_text (entry=out>, add=) at /usr/src/sys/vm/vm_map.c:557
#6 0x807645a0 in vm_map_process_deferred () at /usr/src/sys/vm/vm_map.c:593
#7 0x8076a1b6 in _vm_map_unlock (map=, file=, line=3653) at /usr/src/sys/vm/vm_map.c:607
#8 vm_map_remove (map=, start=4096, end=140737488355328) at /usr/src/sys/vm/vm_map.c:3653
#9 0x80764118 in vmspace_dofree (vm=) at /usr/src/sys/vm/vm_map.c:335
#10 vmspace_exit (td=0xf8016632c000) at /usr/src/sys/vm/vm_map.c:416
#11 0x8047d27d in exit1 (td=0xf8016632c000, rval=out>, signo=0) at /usr/src/sys/kern/kern_exit.c:416
#12 0x8047ccfd in sys_sys_exit (td=, uap=out>) at /usr/src/sys/kern/kern_exit.c:195
#13 0x807f13df in syscallenter (td=0xf8016632c000) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:144
#14 amd64_syscall (td=0xf8016632c000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1180
#15
#16 0x0008008326aa in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffe1b8
(kgdb)
--
Larry Rosenman http://people.freebsd.org/~ler
Phone: +1 214-642-9640
E-Mail: l...@freebsd.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: sysutils/lsof: VOP_FSYNC definition moved?
On 08/30/2019 10:20 pm, Yuri Pankov wrote:
Larry Rosenman wrote:
http://home.lerctr.org:/data/live-host-ports/2019-08-30_20h25m06s/logs/errors/lsof-4.93.2_4,8.log
--- dnode2.o ---
In file included from dnode2.c:56:
In file included from /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h:33:
In file included from /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h:47:
In file included from /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_context.h:73:
In file included from /usr/src/sys/cddl/compat/opensolaris/sys/vfs.h:37:
/usr/src/sys/cddl/compat/opensolaris/sys/vnode.h:243:10: warning: implicit declaration of function 'VOP_FSYNC' is invalid in C99 [-Wimplicit-function-declaration]
error = VOP_FSYNC(vp, MNT_WAIT, curthread);
^
1 warning generated.
A failure has been detected in another branch of the parallel make
Real error seems to be way above that (see below), and VOP_FSYNC one is just a fallout from that. It is likely related to r351594 by Konstantin, but I didn't look into the details. You could try defining _SYS_PCPU_H_ before including in dlsof.h with _KERNEL defined -- this seems to fix the lsof build for me.
-
In file included from ckkv.c:43:
In file included from ./../lsof.h:221:
In file included from ./../dlsof.h:412:
In file included from /usr/src/sys/sys/file.h:44:
In file included from /usr/src/sys/sys/refcount.h:36:
In file included from /usr/src/sys/sys/systm.h:126:
In file included from /usr/src/sys/sys/pcpu.h:223:
/usr/include/machine/pcpu_aux.h:55:55: error: expected expression
__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
^
/usr/include/machine/pcpu_aux.h:56:6: error: use of undeclared identifier 'pc_curthread'; did you mean '__curthread'?
pc_curthread)));
^
/usr/include/machine/pcpu_aux.h:51:1: note: '__curthread' declared here
__curthread(void)
^
/usr/include/machine/pcpu_aux.h:66:56: error: expected expression
__asm("movq %%gs:%P1,%0" : "=r" (pcb) : "n" (offsetof(struct pcpu,
^
/usr/include/machine/pcpu_aux.h:67:6: error: use of undeclared identifier 'pc_curpcb'; did you mean '__curpcb'?
pc_curpcb)));
^
/usr/include/machine/pcpu_aux.h:62:1: note: '__curpcb' declared here
__curpcb(void)
Thanks, Yuri. I'd *REALLY* like someone with real kernel knowledge to look at lsof and help modernize the #ifdef mess.
--
Larry Rosenman http://people.freebsd.org/~ler
Phone: +1 214-642-9640
E-Mail: l...@freebsd.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
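Yuri's workaround appears to rely on the include guard of sys/pcpu.h. The error chain above shows dlsof.h pulling in sys/file.h, which reaches sys/pcpu.h through refcount.h and systm.h, and pcpu.h then pulls in machine/pcpu_aux.h, where the errors occur. If the guard macro _SYS_PCPU_H_ is already defined, the preprocessor skips pcpu.h's body, so pcpu_aux.h is never seen. The sketch below illustrates only that preprocessor mechanism, assuming pcpu.h uses the conventional #ifndef/#define guard; the header is simulated inline, and PCPU_BODY_SEEN and pcpu_body_seen() are hypothetical names for illustration. Skipping pcpu.h also hides anything lsof might need from it, so this is a workaround rather than a fix.

```c
/*
 * Sketch of the include-guard workaround, assuming sys/pcpu.h is protected
 * by the conventional guard:
 *
 *     #ifndef _SYS_PCPU_H_
 *     #define _SYS_PCPU_H_
 *     (header body)
 *     #endif
 *
 * HYPOTHETICAL: the guarded block below simulates that header inline, and
 * PCPU_BODY_SEEN / pcpu_body_seen() exist only for this illustration.
 */

/* Pre-define the guard before the header is reached (the workaround). */
#define _SYS_PCPU_H_

/* ---- simulated header, normally reached via sys/file.h and friends ---- */
#ifndef _SYS_PCPU_H_
#define _SYS_PCPU_H_
#define PCPU_BODY_SEEN 1 /* the real body would include machine/pcpu_aux.h */
#endif
/* ---- end of simulated header ---- */

#ifndef PCPU_BODY_SEEN
#define PCPU_BODY_SEEN 0 /* guard was already defined, so the body was skipped */
#endif

/* Returns 1 if the guarded header body was compiled, 0 if it was skipped. */
int
pcpu_body_seen(void)
{
	return PCPU_BODY_SEEN;
}
```

With the guard pre-defined, pcpu_body_seen() returns 0; deleting the first #define makes it return 1.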
sysutils/lsof: VOP_FSYNC definition moved?
http://home.lerctr.org:/data/live-host-ports/2019-08-30_20h25m06s/logs/errors/lsof-4.93.2_4,8.log
--- dnode2.o ---
In file included from dnode2.c:56:
In file included from /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h:33:
In file included from /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/dmu.h:47:
In file included from /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_context.h:73:
In file included from /usr/src/sys/cddl/compat/opensolaris/sys/vfs.h:37:
/usr/src/sys/cddl/compat/opensolaris/sys/vnode.h:243:10: warning: implicit declaration of function 'VOP_FSYNC' is invalid in C99 [-Wimplicit-function-declaration]
error = VOP_FSYNC(vp, MNT_WAIT, curthread);
^
1 warning generated.
A failure has been detected in another branch of the parallel make
Can some of the kernel folks help me here? Thanks!
--
Larry Rosenman http://people.freebsd.org/~ler
Phone: +1 214-642-9640
E-Mail: l...@freebsd.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
panic... r350849M panic: ng_snd_item: 42 != 1414
The M is a patch from rrs@ for the previous crash on m_pullup. Ideas? I have a core.
Unread portion of the kernel message buffer:
panic: ng_snd_item: 42 != 1414
cpuid = 0
time = 1566303841
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe01265c0400
vpanic() at vpanic+0x19d/frame 0xfe01265c0450
panic() at panic+0x43/frame 0xfe01265c04b0
ng_snd_item() at ng_snd_item+0x455/frame 0xfe01265c04f0
ng_ether_output() at ng_ether_output+0x5e/frame 0xfe01265c0520
ether_output() at ether_output+0x665/frame 0xfe01265c05c0
arpintr() at arpintr+0xfe3/frame 0xfe01265c0780
netisr_dispatch_src() at netisr_dispatch_src+0x89/frame 0xfe01265c07f0
ether_demux() at ether_demux+0x13b/frame 0xfe01265c0820
ng_ether_rcv_upper() at ng_ether_rcv_upper+0x95/frame 0xfe01265c0840
ng_apply_item() at ng_apply_item+0xf9/frame 0xfe01265c08c0
ng_snd_item() at ng_snd_item+0x2af/frame 0xfe01265c0900
ng_apply_item() at ng_apply_item+0xf9/frame 0xfe01265c0980
ng_snd_item() at ng_snd_item+0x2af/frame 0xfe01265c09c0
ng_ether_input() at ng_ether_input+0x4c/frame 0xfe01265c09f0
ether_nh_input() at ether_nh_input+0x2cd/frame 0xfe01265c0a40
netisr_dispatch_src() at netisr_dispatch_src+0x89/frame 0xfe01265c0ab0
ether_input() at ether_input+0x48/frame 0xfe01265c0ad0
bce_intr() at bce_intr+0x697/frame 0xfe01265c0b50
ithread_loop() at ithread_loop+0x187/frame 0xfe01265c0bb0
fork_exit() at fork_exit+0x84/frame 0xfe01265c0bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe01265c0bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 19h52m2s
Dumping 27502 out of 131026 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
__curthread () at /usr/src/sys/amd64/include/pcpu.h:246
246 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));
(kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu.h:246
#1 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:392
#2 0x804bb950 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:479
#3 0x804bbdc9 in vpanic (fmt=, ap=out>) at /usr/src/sys/kern/kern_shutdown.c:905
#4 0x804bbb03 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:832
#5 0x828e1515 in ng_snd_item (item=0xf81769193580, flags=0) at /usr/src/sys/netgraph/ng_base.c:2252
#6 0x828f2c2e in ng_ether_output (ifp=, mp=0xfe01265c0578) at /usr/src/sys/netgraph/ng_ether.c:294
#7 0x805c9975 in ether_output (ifp=0xf80122488000, m=0xf818181a8b00, dst=0xfe01265c0740, ro=) at /usr/src/sys/net/if_ethersubr.c:430
#8 0x805e2e43 in in_arpinput (m=) at /usr/src/sys/netinet/if_ether.c:1152
#9 arpintr (m=0xf818181a8b00) at /usr/src/sys/netinet/if_ether.c:749
#10 0x805d4959 in netisr_dispatch_src (proto=4, source=, m=) at /usr/src/sys/net/netisr.c:1123
#11 0x805c9c3b in ether_demux (ifp=0xf80122488000, m=) at /usr/src/sys/net/if_ethersubr.c:913
#12 0x828f3045 in ng_ether_rcv_upper (hook=, item=) at /usr/src/sys/netgraph/ng_ether.c:741
#13 0x828e1639 in ng_apply_item (node=0xf801c6ae3c00, item=0xf81769193580, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403
#14 0x828e136f in ng_snd_item (item=0xf81769193580, flags=0) at /usr/src/sys/netgraph/ng_base.c:2320
#15 0x828e1639 in ng_apply_item (node=0xf810410a0400, item=0xf81769193580, rw=0) at /usr/src/sys/netgraph/ng_base.c:2403
#16 0x828e136f in ng_snd_item (item=0xf81769193580, flags=0) at /usr/src/sys/netgraph/ng_base.c:2320
#17 0x828f2cbc in ng_ether_input (ifp=, mp=0xfe01265c0a18) at /usr/src/sys/netgraph/ng_ether.c:255
#18 0x805cae8d in ether_input_internal (ifp=0xf80122488000, m=0xf818181a8b00) at /usr/src/sys/net/if_ethersubr.c:654
#19 ether_nh_input (m=) at /usr/src/sys/net/if_ethersubr.c:735
#20 0x805d4959 in netisr_dispatch_src (proto=5, source=, m=) at /usr/src/sys/net/netisr.c:1123
#21 0x805ca078 in ether_input (ifp=0xf80122488000, m=0x0) at /usr/src/sys/net/if_ethersubr.c:823
#22 0x8272f877 in bce_rx_intr (sc=) at /usr/src/sys/dev/bce/if_bce.c:6848
#23 bce_intr (xsc=0xfe013abc2000) at /usr/src/sys/dev/bce/if_bce.c:8017
#24 0x80484997 in intr_event_execute_handlers (p=out>, ie=) at /usr/src/sys/kern/kern_intr.c:1148
#25 ithread_execute_handlers (p=, ie=) at /usr/src/sys/kern/kern_intr.c:1161
#26 ithread_loop (arg=) at /usr/src/sys/kern/kern_intr.c:1241
#27 0x80481544 in fork_exit ( callout=0x80484810 , arg=0xf81058226460, frame=0xfe01265c0c00) at /usr/src/sys/kern/kern_fork.c:1057
#28
(kgdb)
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640
E-Mail: l...@lerctr.org
US Mail:
Panic... r350849 panic: m_copydata, negative off -1
I do have a core if folks want to look. r350849
Unread portion of the kernel message buffer:
panic: m_copydata, negative off -1
cpuid = 0
time = 1566090669
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe011a798720
vpanic() at vpanic+0x19d/frame 0xfe011a798770
panic() at panic+0x43/frame 0xfe011a7987d0
m_copydata() at m_copydata+0x17a/frame 0xfe011a798850
rack_output() at rack_output+0x2c00/frame 0xfe011a798a70
tcp_hpts_thread() at tcp_hpts_thread+0x5e6/frame 0xfe011a798b50
ithread_loop() at ithread_loop+0x187/frame 0xfe011a798bb0
fork_exit() at fork_exit+0x84/frame 0xfe011a798bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe011a798bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 6d22h22m59s
Dumping 28815 out of 131026 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
__curthread () at /usr/src/sys/amd64/include/pcpu.h:246
246 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));
(kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu.h:246
#1 doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:392
#2 0x804bb950 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:479
#3 0x804bbdc9 in vpanic (fmt=, ap=out>) at /usr/src/sys/kern/kern_shutdown.c:905
#4 0x804bbb03 in panic (fmt=) at /usr/src/sys/kern/kern_shutdown.c:832
#5 0x8054868a in m_copydata (m=, off=, len=, cp=) at /usr/src/sys/kern/uipc_mbuf.c:622
#6 0x8268bda0 in rack_output (tp=) at /usr/src/sys/modules/tcp/rack/../../../netinet/tcp_stacks/rack.c:7957
#7 0x80679176 in tcp_hptsi (hpts=) at /usr/src/sys/netinet/tcp_hpts.c:1621
#8 tcp_hpts_thread (ctx=) at /usr/src/sys/netinet/tcp_hpts.c:1842
#9 0x80484997 in intr_event_execute_handlers (p=out>, ie=) at /usr/src/sys/kern/kern_intr.c:1148
#10 ithread_execute_handlers (p=, ie=) at /usr/src/sys/kern/kern_intr.c:1161
#11 ithread_loop (arg=) at /usr/src/sys/kern/kern_intr.c:1241
#12 0x80481544 in fork_exit ( callout=0x80484810 , arg=0xf80106ec01a0, frame=0xfe011a798c00) at /usr/src/sys/kern/kern_fork.c:1057
#13
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640
E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: [package - head-i386-default][sysutils/lsof] Failed for lsof-4.93.2_2,8 in build
On 07/25/2019 1:40 pm, Justin Hibbits wrote:
On Thu, 25 Jul 2019 12:35:32 -0600 Alan Somers wrote:
On Thu, Jul 25, 2019 at 12:13 PM Larry Rosenman wrote:
>
> On 07/25/2019 1:10 pm, Alan Somers wrote:
> > On Thu, Jul 25, 2019 at 12:05 PM Larry Rosenman
> > wrote:
> >>
> >> Um Who broke this? ...
> > "svn blame" suggests r350199 by kib. However, refcount.h should
> > only be included if lsof defines _KERNEL, which normal programs
> > shouldn't. So I think this should be considered a bug in lsof.
> > -Alan
> >
> we *HAVE* to define _KERNEL, to get at the kernel structures.
Then I think you have to live with this amount of instability. refcount(9) says that you should include . Did you do that? If so, then this is a man page bug and refcount(9) should also specify stdbool.h.
-Alan
includes already, which typedefs bool. So should suffice to include in lsof.
- Justin
Thanks all! I've got a PR into the lsof repo, and I'll fix it there. If we can't get a release out in the next day or 2, I'll patch it in the port. https://github.com/lsof-org/lsof/pull/70
--
Larry Rosenman http://people.freebsd.org/~ler
Phone: +1 214-642-9640
E-Mail: l...@freebsd.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
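The discussion above suggests the lsof breakage comes from kernel code (refcount.h) using `bool` in a userland build that defines _KERNEL, and Justin's point is that having a header that already typedefs `bool` in place first should be enough. The sketch below only illustrates that ordering in portable C: <stdbool.h> stands in for whichever header actually supplies `bool` in the lsof build, and toy_refcount_release() is a hypothetical stand-in, not FreeBSD's refcount implementation (no atomics, no overflow or underflow checks).

```c
/*
 * Sketch, assuming the build failure is a header that uses `bool` without
 * declaring it. Making `bool` visible first lets the later code compile.
 * <stdbool.h> is the portable userland choice; in the lsof case the thread
 * says a header lsof already includes typedefs bool.
 */
#include <stdbool.h>

/*
 * HYPOTHETICAL stand-in for a kernel-style inline that returns bool, in the
 * spirit of the refcount(9) helpers. Not FreeBSD's implementation.
 * Drops one reference and returns true when it was the last one.
 */
static inline bool
toy_refcount_release(unsigned int *count)
{
	*count -= 1;
	return *count == 0;
}
```

Starting from a count of 2, the first call returns false and the second returns true.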