Re: Files not deleted via update procedure: rescue/gbde usr/include/machine/fiq.h usr/lib/include/machine/fiq.h usr/share/man/man4/CAM.4.gz
On 7/27/24 19:28, Mark Millard wrote:
> On Jul 27, 2024, at 16:07, Mark Millard wrote:
>> The following old files were in the historically incrementally updated
>> directory tree but not in the installation to an empty directory tree
>> (checked via diff -rq):
>>
>> /usr/obj/DESTDIRs/main-CA7-poud/rescue/gbde
>> /usr/obj/DESTDIRs/main-CA7-poud/usr/include/machine/fiq.h
>> /usr/obj/DESTDIRs/main-CA7-poud/usr/lib/include/machine/fiq.h
>> /usr/obj/DESTDIRs/main-CA7-poud/usr/share/man/man4/CAM.4.gz
>
> That was an armv7 context. For comparison/contrast, aarch64 had:
>
> /usr/obj/DESTDIRs/main-CA76-poud/rescue/gbde
> /usr/obj/DESTDIRs/main-CA76-poud/usr/lib/debug/usr/tests/cddl/usr.sbin/dtrace/amd64/kinst/
> /usr/obj/DESTDIRs/main-CA76-poud/usr/lib/debug/usr/tests/lib/libc/ssp/h_raw.debug
> /usr/obj/DESTDIRs/main-CA76-poud/usr/share/examples/IPv6/USAGE
> /usr/obj/DESTDIRs/main-CA76-poud/usr/share/man/man2/recvmmsg.2.gz
> /usr/obj/DESTDIRs/main-CA76-poud/usr/share/man/man2/sendmmsg.2.gz
> /usr/obj/DESTDIRs/main-CA76-poud/usr/share/man/man4/CAM.4.gz
> /usr/obj/DESTDIRs/main-CA76-poud/usr/share/man/man4/geom_map.4.gz
> /usr/obj/DESTDIRs/main-CA76-poud/usr/tests/cddl/usr.sbin/dtrace/amd64/kinst/
> /usr/obj/DESTDIRs/main-CA76-poud/usr/tests/lib/libc/ssp/h_raw

Thanks, I've pushed fixes for most of these. The *mmsg.2.gz links are
actually not supposed to be stale and D46200 should fix those. h_raw is a
bit more of an odd duck that isn't easily solved. I'm not sure why it was
installed in the past for you but isn't installed anymore.

--
John Baldwin
Re: aesni_load present in /boot/loader.conf on arm64
On 7/31/24 08:15, void wrote:
> Hi,
>
> Looking at man 4 aesni it appears this pertains to Intel and AMD only?
> Is its presence on arm64 a bug? It seems to be added to
> /boot/loader.conf by default.
>
> The method I used to install was to boot to the latest snapshot at the
> time, plug in a USB3 disk, run bsdinstall to that disk, reboot (this
> booted initially to the installer image), mount the msdos partition on
> /mnt, move the /boot/efi/efi from the installed-to disk out of the way,
> copy everything in /mnt to /boot/efi, move the /boot/efi/efi back to
> where it originally was, halt the machine and remove the installer image.
>
> This was to achieve zfs-on-root. Maybe something about the way I
> installed meant aesni was added?

Looks like bsdinstall hardcodes aesni without doing an architecture check
for both ZFS and geli. Probably the bits of the zfsboot script referencing
aesni need to switch on the architecture. The trick is that depending on
the architecture you may want to load more than one module. For 14 I think
you could get by with something like:

    crypto_kld()
    {
        case `uname -m` in
        amd64|i386)
            echo "aesni"
            ;;
        arm64)
            echo "armv8crypto"
            ;;
        *)
            echo ""
            ;;
        esac
    }

Then in the other parts of zfsboot call this function and treat it as a
list of modules. On main I think you would want 32-bit arm and powerpc64
to list ossl, and you might want to include ossl for x86 and arm64 as well
(eventually ossl should replace aesni and armv8crypto IMO).

Side topic: the ossl(4) manpage in main is stale and needs to be updated to
reflect armv7 and powerpc64 support. I'm not sure yet if it supports
AES-GCM for armv8 as well.

--
John Baldwin
Re: armv7-on-aarch64 stuck at urdlck
> On Jul 24, 2024, at 06:50, Konstantin Belousov wrote:
>
> On Wed, Jul 24, 2024 at 12:34:57PM +0200, m...@freebsd.org wrote:
>>
>> On 24.07.2024 12:24, Konstantin Belousov wrote:
>>> On Tue, Jul 23, 2024 at 08:11:13PM +, John F Carr wrote:
>>>> On Jul 23, 2024, at 13:46, Michal Meloun wrote:
>>>>>
>>>>> On 23.07.2024 11:36, Konstantin Belousov wrote:
>>>>>> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
>>>>>>> The good news is that I'm finally able to generate a working/locking
>>>>>>> test case. The culprit (at least for me) is if "-mcpu" is used when
>>>>>>> compiling libthr (e.g. indirectly injected via CPUTYPE in
>>>>>>> /etc/make.conf).
>>>>>>> If it is not used, libthr is broken (regardless of -O level or
>>>>>>> debug/normal build), but -mcpu=cortex-a15 will always produce a
>>>>>>> working libthr.
>>>>>> I think this is very significant progress.
>>>>>> Do you plan to drill down more to see what is going on?
>>>>>
>>>>> So the problem is now clear, and I fear it may apply to other
>>>>> architectures as well.
>>>>> dlopen_object() (from rtld_elf),
>>>>> https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
>>>>> holds the rtld_bind_lock write lock for almost the entire time a new
>>>>> library is loaded.
>>>>> If the code uses a yet unresolved symbol to load the library, the
>>>>> rtl_bind() function attempts to get read lock of rtld_bind_lock and a
>>>>> deadlock occurs.
>>>>>
>>>>> In this case, it is round_up() in _thr_stack_fix_protection,
>>>>> https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
>>>>> Issued by __aeabi_uidiv (since not all armv7 processors support HW
>>>>> divide).
>>>>>
>>>>> Unfortunately, I'm not sure how to fix it. The compiler can emit
>>>>> __aeabi_<> in any place, and I'm not sure if it can resolve all the
>>>>> symbols used by rtld_elf and libthr beforehand.
>>>>>
>>>>> Michal
>>>>>
>>>>
>>>> In this case (but not for all _aeabi_ functions) we can avoid division
>>>> as long as page size is a power of 2.
>>>>
>>>> The function is
>>>>
>>>> static inline size_t
>>>> round_up(size_t size)
>>>> {
>>>>         if (size % _thr_page_size != 0)
>>>>                 size = ((size / _thr_page_size) + 1) *
>>>>                     _thr_page_size;
>>>>         return size;
>>>> }
>>>>
>>>> The body can be condensed to
>>>>
>>>> return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);
>>>>
>>>> This is shorter in both lines of code and instruction bytes.
>>>
>>> Let's not allow this to be lost. Could anybody confirm that the patch
>>> below fixes the issue?
>>>
>>> commit d560f4f6690a48476565278fd07ca131bf4eeb3c
>>> Author: Konstantin Belousov
>>> Date: Wed Jul 24 13:17:55 2024 +0300
>>>
>>> rtld: avoid division in __thr_map_stacks_exec()
>>>
>>> The function is called by rtld with the rtld bind lock write-locked,
>>> when fixing the stack permission during dso load. Not every ARMv7 CPU
>>> supports the div, which causes the recursive entry into rtld to resolve
>>> the __aeabi_uidiv symbol, causing self-lock.
>>>
>>> Workaround the problem by using roundup2() instead of open-coding less
>>> efficient formula.
>>>
>>> Diagnosed by: mmel
>>> Based on submission by: John F Carr
>>> Sponsored by: The FreeBSD Foundation
>>> MFC after: 1 week
>>>
> Just realized that it is wrong. Stack size is user-controlled and it does
> not need to be power of two.

Your change is correct. _thr_page_size is set to getpagesize(),
which is a power of 2. The call to roundup2 takes a user-provided
size and rounds it up to a multiple of the system page size.

I tested the change and it works. My change also works and
should compile to identical code. I forgot there was a standard
function to do the rounding.

>> For final resolving of deadlocks, after a full day of digging, I'm very
>> much inclined to add -znow to the linker flags for libthr.so (and maybe
>> also for ld-elf.so). The runtime cost of resolving all symbols at startup
>> is very low. Direct pre-solving in _thr_rtld_init() is problematic for
>> the _aeabi_* symbols, since they don't have official C prototypes, and
>> some are not compatible with C calling conventions.
> I do not like it. `-z now' changes (breaks) the ABI and makes some symbols
> not preemptible.
>
> In the worst case, we would need a call to the asm routine which causes the
> resolution of the _eabi_* symbols on arm.
>

It would also be possible to link libthr with libgcc.a and use a linker map
to hide the _eabi_ symbols.
Re: armv7-on-aarch64 stuck at urdlck
On Jul 23, 2024, at 13:46, Michal Meloun wrote:
>
> On 23.07.2024 11:36, Konstantin Belousov wrote:
>> On Tue, Jul 23, 2024 at 09:53:41AM +0200, Michal Meloun wrote:
>>> The good news is that I'm finally able to generate a working/locking
>>> test case. The culprit (at least for me) is if "-mcpu" is used when
>>> compiling libthr (e.g. indirectly injected via CPUTYPE in /etc/make.conf).
>>> If it is not used, libthr is broken (regardless of -O level or debug/normal
>>> build), but -mcpu=cortex-a15 will always produce a working libthr.
>> I think this is very significant progress.
>> Do you plan to drill down more to see what is going on?
>
> So the problem is now clear, and I fear it may apply to other architectures
> as well.
> dlopen_object() (from rtld_elf),
> https://cgit.freebsd.org/src/tree/libexec/rtld-elf/rtld.c#n3766,
> holds the rtld_bind_lock write lock for almost the entire time a new library
> is loaded.
> If the code uses a yet unresolved symbol to load the library, the rtl_bind()
> function attempts to get read lock of rtld_bind_lock and a deadlock occurs.
>
> In this case, it is round_up() in _thr_stack_fix_protection,
> https://cgit.freebsd.org/src/tree/lib/libthr/thread/thr_stack.c#n136.
> Issued by __aeabi_uidiv (since not all armv7 processors support HW divide).
>
> Unfortunately, I'm not sure how to fix it. The compiler can emit __aeabi_<>
> in any place, and I'm not sure if it can resolve all the symbols used by
> rtld_elf and libthr beforehand.
>
> Michal
>

In this case (but not for all _aeabi_ functions) we can avoid division
as long as page size is a power of 2.

The function is

static inline size_t
round_up(size_t size)
{
        if (size % _thr_page_size != 0)
                size = ((size / _thr_page_size) + 1) *
                    _thr_page_size;
        return size;
}

The body can be condensed to

        return (size + _thr_page_size - 1) & ~(_thr_page_size - 1);

This is shorter in both lines of code and instruction bytes.

John Carr
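The equivalence claimed above can be checked with a small standalone sketch. The names round_up_div and round_up_mask are made up for this illustration, and the page size is passed as a parameter instead of being read from _thr_page_size. The mask form is the same arithmetic as the roundup2() macro named in the later commit message, and it is only valid when the page size is a power of two; the size being rounded can be any value.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Division-based rounding, following the round_up() quoted in the thread.
 * On an ARMv7 build without hardware divide the compiler may turn the
 * '%' and '/' into a call to __aeabi_uidiv.
 */
size_t
round_up_div(size_t size, size_t page_size)
{
	if (size % page_size != 0)
		size = ((size / page_size) + 1) * page_size;
	return (size);
}

/*
 * Mask-based rounding. No division is emitted, but the result is only
 * correct when page_size is a nonzero power of two, so that is asserted.
 */
size_t
round_up_mask(size_t size, size_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return ((size + page_size - 1) & ~(page_size - 1));
}
```

Both functions return the same value for any size that does not overflow when rounded, including sizes that are not themselves powers of two (for example 12345 with a 4096-byte page rounds to 16384 either way).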
Re: armv7-on-aarch64 stuck at urdlck: I got a replication of the "ampere2" bulk build hangup problem on a Windows DevKit 2023
> On Jul 22, 2024, at 12:51, Mark Millard wrote:
>
> Another systematic difference in my personal builds vs.
> official pkgbase builds, snapshots, releases, etc. is
> that my armv7 builds are built on aarch64-as-armv7, not
> on amd64. Not that I have any specific evidence that
> such matters here.
>
> But Michal Meloun's report indicated not using builds
> done on amd64 as well. ("Tegra" models and examples of
> ARMv7-A and of ARMv8-A.)
>
> For John Carr, I do not know if amd64 based builds of
> the world were systematically in use, never in use,
> or some mix in his tests.
>
> ===
> Mark Millard
> marklmi at yahoo.com
>

I reproduced the hang with code built on aarch64. I have not been
cross-compiling from amd64.

For poudriere I use armv7 jails running on aarch64. One of them just hit
the hang with 14.1-STABLE kernel and 15.0-CURRENT userspace.

# ps -d -J 1021
PID TT STAT TIME COMMAND
77550 1 IJ 0:00.27 /usr/bin/make -C /usr/ports/graphics/librsvg2-rust stage
77574 1 IJ 0:00.00 - /bin/sh -e /wrkdirs/usr/ports/graphics/librsvg2-rust/work/makeiFVIOP
77575 1 IJ 0:00.06 `-- gmake -f Makefile DESTDIR=/wrkdirs/usr/ports/graphics/librsvg2-rust/wo
77576 1 IJ 0:00.06 `-- gmake INSTALL_PROGRAM=/bin/sh /wrkdirs/usr/ports/graphics/librsvg2-r
77577 1 IJ 0:00.06 `-- gmake install-recursive
77578 1 IJ 0:00.00 `-- /bin/sh -c fail=; \\\nif (target_option=k; case ${target_option-
77709 1 IJ 0:00.01 `-- gmake install
77710 1 IJ 0:00.00 `-- /bin/sh -c ( /usr/local/bin/gdk-pixbuf-query-loaders ./libpi
77711 1 IJ 0:00.01 `-- /usr/local/bin/gdk-pixbuf-query-loaders ./libpixbufloader-

# ps -l -p 77711
UID PID PPID C PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
65534 77711 77710 27 20 0 27520 16660 urdlck IJ 1 0:00.01 /usr/local/bin/gdk-pixbuf-query-l

Poudriere told me I shouldn't run a newer userspace than kernel. It
usually works despite the warning.
Re: 41dfea24eec panics during ata attach on ESXi VM
On 6/5/24 4:35 AM, Yuri Pankov wrote:
> After updating to 41dfea24eec (GENERIC-NODEBUG), ESXi VM started to panic
> while attaching atapci children. I was unable to grab original boot
> panic data ("keyboard" dead), but was able to boot with
> hint.ata.0.disabled=1, hint.ata.1.disabled=1, and `devctl enable ata0`
> reproduced the issue:
>
> ata0: at channel 0 on atapci0

This should be fixed now by commit 56b822a17cde5940909633c50623d463191a7852.
Sorry for the breakage.

--
John Baldwin
Re: Deprecating smbfs(5) and removing it before FreeBSD 14
> Thank you for the message. I'm glad someone has the courage to take the
> plunge. Smbfs is still very important to me. In a heterogeneous environment
> it is still the most common way to share data between systems.
> Are you planning the final version as a kernel module, or will the final
> version be via FUSE? I have had bad experiences with FUSE in the past with
> stability and performance.

The final version will be a kernel module. It will also be BSD licensed. I
am not an expert at the VFS layer so I want to get the stack ironed out in
FUSE before moving it into kernel space.

- John
Re: Deprecating smbfs(5) and removing it before FreeBSD 14
On Mon, Jun 27, 2022 at 03:27:54PM +0200, Miroslav Lachman wrote:
> On 16/06/2022 15:56, Rick Macklem wrote:
> > Miroslav Lachman <000.f...@quip.cz> wrote:
> > > On 24/01/2022 16:13, Rick Macklem wrote:
> > > > [...]
> > > >
> > > > So, I think Mark and Yuri are correct and looking at up to date
> > > > Illumos sources is the next step.
> > > > (As I mentioned, porting the Apple sources is beyond what I am
> > > > willing to attempt.)
> > > >
> > > > rick
> > >
> > > Hello Rick,
> > > I would like to ask you if there is some progress with porting newer
> > > SMBFS / CIFS version to FreeBSD? Did you find Illumos sources as a
> > > possibility where to start porting?
> > Yes. I have the stuff off Illumos-gate, which I think is pretty up-to-date
> > and I agree that it should be easier than the Apple stuff to port into
> > FreeBSD. I don't think it is "straightforward" as someone involved
> > with Illumos said, due to the big differences in VFS/locking, but...
> >
> > Having said the above, I have not done much yet. I've been cleaning up
> > NFS stuff, although I am nearly done with that now.
> > I do plan on starting to work on it soon, but have no idea if/when I
> > will have something that might be useful for others.
>
> I'm glad to hear that.
>
> > > We have more and more problems with current state of mount_smbfs. I
> > > would be really glad if "somebody" can do the heroic work of
> > > implementing SMBv2 in FreeBSD.
> > > Maybe it's time to start some fundraising for sponsoring this work?
> > Well, funding isn't an issue for me (I'm just a retired guy who does this
> > stuff as a hobby). However, if there is someone else who is capable of
> > doing it if they are funded, I have no problem with that.
> > I could either help them, or simply stick with working on NFS and leave
> > SMBv2/3 to them.
> >
> > Sorry, but I cannot report real progress on this as yet, rick
>
> No need to be sorry. I really appreciate your endless work on NFS and that
> you still have kind of interest to try porting SMBv2/3.
> Unfortunately I don't know anybody else trying to do this tremendous work.
>

I am working on a from scratch implementation of smbfs. I do not have any
kind of time estimate since it is in my spare time. I chose this route
after spending considerable time looking at Apple and Solaris
implementations and wanting something without all of the legacy 1.0 crap.
I do have a very minimal working FUSE version at this point, but there is
much to do, and even more to abide by the various specifications.

I just thought I'd share in case anyone is interested.

- John
Re: gcc behavior of init priority of .ctors and .dtors section
On 5/16/24 4:05 PM, Lorenzo Salvadore wrote:

On Thursday, May 16th, 2024 at 20:26, Konstantin Belousov wrote:

gcc13 from ports

```
# gcc ctors.c && ./a.out
init 1
init 2
init 5
init 4
init 3
main
fini 3
fini 4
fini 5
fini 2
fini 1
```

The above order is not expected. I think clang's one is correct.

Further hacking with readelf shows that clang produces the right order of
section .rela.ctors but gcc does not.

```
# clang -fno-use-init-array -c ctors.c && readelf -r ctors.o | grep 'Relocation section with addend (.rela.ctors)' -A5 > clang.txt
# gcc -c ctors.c && readelf -r ctors.o | grep 'Relocation section with addend (.rela.ctors)' -A5 > gcc.txt
# diff clang.txt gcc.txt
3,5c3,5
< 00080001 R_X86_64_64 0060 init_65535_2 + 0
< 0008 00070001 R_X86_64_64 0040 init + 0
< 0010 00060001 R_X86_64_64 0020 init_65535 + 0
---
> 00060001 R_X86_64_64 0011 init_65535 + 0
> 0008 00070001 R_X86_64_64 0022 init + 0
> 0010 00080001 R_X86_64_64 0033 init_65535_2 + 0
```

The above shows clearly that gcc produces the wrong order of section
`.rela.ctors`.

Is that expected behavior? I have not tried the Linux version of gcc.

Note that init array vs. init function behavior is encoded by a note added
by crt1.o. I suspect that the problem is that the gcc port is built without
the --enable-initfini-array configure option.

Indeed, support for .init_array and .fini_array has been added to the GCC
ports but is present in the *-devel ports only for now. I will soon proceed
to enable it for the GCC standard ports too. lang/gcc14 is soon to be added
to the ports tree and will have it since the beginning.

If this is indeed the issue, switching to a -devel GCC port should fix it.

FWIW, the devel/freebsd-gcc* ports have passed this flag to GCC's configure
for a long time (since we made the switch in clang).

--
John Baldwin
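The ctors.c test file is not included in the thread, so the following is only a rough sketch of the kind of code involved, not a reconstruction of it. It uses the GCC/Clang constructor attribute with explicit priorities (lower numbers run first) and records the order in which the constructors fire. All names here (ctor_order, ctor_count, record and the init_* functions) are invented for the illustration. It deliberately avoids the case the thread is about, a constructor with explicit priority 65535 next to one with no priority, since the relative order of those is exactly what differed between the two compilers.

```c
#include <assert.h>

#define MAX_EVENTS 8

/* Order in which constructors ran, filled in before main() starts. */
int ctor_order[MAX_EVENTS];
int ctor_count;

static void
record(int tag)
{
	assert(ctor_count < MAX_EVENTS);
	ctor_order[ctor_count++] = tag;
}

/* Declared out of order on purpose: priority, not source order, decides. */
__attribute__((constructor(102))) static void
init_second(void)
{
	record(102);
}

__attribute__((constructor(101))) static void
init_first(void)
{
	record(101);
}

/* No explicit priority: runs after the prioritized constructors. */
__attribute__((constructor)) static void
init_default(void)
{
	record(0);
}
```

Whether the toolchain emits these into .ctors or .init_array depends on how the compiler was configured, which is the --enable-initfini-array point made above; the priority ordering between 101 and 102 should hold either way.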
Kernel build broken without "options KTRACE"
Getting a set but not used warning for “td” in sys/kern/kern_condvar.c when doing a buildkernel for a config file without “options KTRACE”. I failed to copy the full error message/line numbers but I will reproduce this evening if needed. JN
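The report does not quote the offending code, so the following is a generic sketch of how this class of warning arises and one common way to avoid it, not the actual kern_condvar.c source or its fix. DEMO_KTRACE, struct thread_stub, demo_trace and demo_wait are all hypothetical stand-ins. The idea is that a local assigned unconditionally but read only inside an #ifdef block becomes "set but not used" when the option is absent; declaring it inside the same conditional block removes the warning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct thread. */
struct thread_stub {
	int tid;
};

#ifdef DEMO_KTRACE
/* Counts trace events; only exists when tracing is compiled in. */
int demo_trace_events;

static void
demo_trace(struct thread_stub *td)
{
	assert(td != NULL);
	demo_trace_events++;
}
#endif

/*
 * Returns 0 on success. The local "td" is declared inside the same
 * #ifdef as its only consumer, so a build without DEMO_KTRACE has no
 * assigned-but-never-read variable for the compiler to warn about.
 */
int
demo_wait(struct thread_stub *cur)
{
#ifdef DEMO_KTRACE
	struct thread_stub *td = cur;

	demo_trace(td);
#else
	(void)cur;	/* parameter intentionally unused in this configuration */
#endif
	return (0);
}
```

Compiling this with and without -DDEMO_KTRACE should be warning-free under -Wall -Wextra; moving the td declaration above the #ifdef and assigning it there reproduces the set-but-not-used pattern.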
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic [now fixed]
On Wed, Feb 14, 2024 at 06:19:04PM -0800, Mark Millard wrote:
> Your changes have the RPi4B that previously got the
> panic to boot all the way instead. Details:
>
> I have updated my pkg base environment to have the
> downloaded kernels (and kernel source) with your
> changes and have booted with each of:
>
> /boot/kernel/kernel
> /boot/kernel.GENERIC-NODEBUG/kernel
>
> For reference:
>
> # uname -apKU
> FreeBSD aarch64-main-pkgs 15.0-CURRENT FreeBSD 15.0-CURRENT
> main-n268300-d79b6b8ec267 GENERIC-NODEBUG arm64 aarch64 1500014 1500012
>
> Thanks for the fix.
>
> Now I'll update the rest of pkg base materials.

The recent changes resolved my boot issues as well.

FreeBSD 15.0-CURRENT #245 main-n268300-d79b6b8ec26 (GENERIC-NODEBUG arm64 1500014)
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic [now fixed]
On 2/14/24 11:03 PM, Mark Millard wrote:
> On Feb 14, 2024, at 18:19, Mark Millard wrote:
>> Your changes have the RPi4B that previously got the
>> panic to boot all the way instead. Details:
>>
>> I have updated my pkg base environment to have the
>> downloaded kernels (and kernel source) with your
>> changes and have booted with each of:
>>
>> /boot/kernel/kernel
>> /boot/kernel.GENERIC-NODEBUG/kernel
>>
>> For reference:
>>
>> # uname -apKU
>> FreeBSD aarch64-main-pkgs 15.0-CURRENT FreeBSD 15.0-CURRENT
>> main-n268300-d79b6b8ec267 GENERIC-NODEBUG arm64 aarch64 1500014 1500012
>>
>> Thanks for the fix.
>>
>> Now I'll update the rest of pkg base materials.
>
> Question:
>
> Are any of the changes to be MFC'd at some point?

If I do I will merge a large batch at once, and probably adjust the order.
For example, I'll merge the pci_host_generic changes before pci_pci changes
so that stable branches will be bisectable.

--
John Baldwin
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic
On 2/14/24 10:16 AM, Mark Millard wrote:
> Top posting a related but separate item:
>
> I looked up some old (2022-Dec-17) lspci -v output from a Linux
> boot. Note the "Memory at" value 6 (in the 35 bit BCM2711 address
> space) and the "(64-bit, non-prefetchable)" (and "[size=4K]").
>
> 01:00.0 USB controller: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller (rev 01) (prog-if 30 [XHCI])
>         Subsystem: VIA Technologies, Inc. VL805/806 xHCI USB 3.0 Controller
>         Device tree node: /sys/firmware/devicetree/base/scb/pcie@7d50/pci@0,0/usb@0,0
>         Flags: bus master, fast devsel, latency 0, IRQ 51
>         Memory at 6 (64-bit, non-prefetchable) [size=4K]
>         Capabilities: [80] Power Management version 3
>         Capabilities: [90] MSI: Enable+ Count=1/4 Maskable- 64bit+
>         Capabilities: [c4] Express Endpoint, MSI 00
>         Capabilities: [100] Advanced Error Reporting
>         Kernel driver in use: xhci_hcd
>
> "Memory at 6 (64-bit, non-prefetchable)": Violation of a PCIe standard?

No, this is a device BAR which can be 64-bit (memory BARs can either be
32-bits or 64-bits). However, the "window" in a PCI _bridge_ for memory
is only defined to be 32-bits. Windows in PCI-PCI bridges are a special
type of BAR that defines the address ranges that the bridge decodes on the
parent side and passes down to child devices. The prefetchable window in
PCI-PCI bridges can optionally be 64-bit.

BAR == a range of memory or I/O port addresses decoded by a device, usually
mapped to a register bank, but sometimes mapped to internal memory (e.g. a
framebuffer)

Window == a range of memory or I/O port addresses decoded by a bridge for
which transactions are passed across the bridge to be handled by a child
device.

--
John Baldwin
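The practical consequence of the BAR/window distinction can be shown with a minimal sketch: a range can only be programmed into a bridge's non-prefetchable memory window if it lies entirely below 4 GiB. The function name and the sample addresses used with it are illustrative assumptions, not values taken from the thread, and this is not the kernel's rman code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Highest address a 32-bit bridge memory window can describe. */
#define BRIDGE_MEM_WINDOW_MAX	UINT64_C(0xffffffff)

/*
 * Returns true if the inclusive range [start, end] could be decoded by
 * the 32-bit (non-prefetchable) memory window of a PCI-PCI bridge.
 */
bool
range_fits_bridge_mem_window(uint64_t start, uint64_t end)
{
	assert(start <= end);	/* ranges are inclusive start/end pairs */
	return (end <= BRIDGE_MEM_WINDOW_MAX);
}
```

For example, a bus-side range such as 0xc0000000-0xc00fffff fits, while a CPU-side range placed above 4 GiB (say 0x600000000-0x6000fffff) does not, which is why a device BAR can be 64-bit while the addresses seen by the bridge window still have to be 32-bit.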
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic
On 2/14/24 9:57 AM, Mark Millard wrote: On Feb 14, 2024, at 08:08, John Baldwin wrote: On 2/12/24 5:57 PM, Mark Millard wrote: On Feb 12, 2024, at 16:36, Mark Millard wrote: On Feb 12, 2024, at 16:10, Mark Millard wrote: On Feb 12, 2024, at 12:00, Mark Millard wrote: [Gack: I was looking at the wrong vintage of source code, predating your changes: wrong system used.] On Feb 12, 2024, at 10:41, Mark Millard wrote: On Feb 12, 2024, at 09:32, John Baldwin wrote: On 2/9/24 8:13 PM, Mark Millard wrote: Summary: pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 . . . rman_manage_region: request: start 0x6, end 0x6000f panic: Failed to add resource to rman Hmmm, I suspect this is due to the way that bus_translate_resource works which is fundamentally broken. It rewrites the start address of a resource in-situ instead of keeping downstream resources separate from the upstream resources. For example, I don't see how you could ever release a resource in this design without completely screwing up your rman. That is, I expect trying to detach a PCI device behind a translating bridge that uses the current approach should corrupt the allocated resource ranges in an rman long before my changes. That said, that doesn't really explain the panic. Hmm, the panic might be because for PCI bridge windows the driver now passes RF_ACTIVE and the bus_translate_resource hack only kicks in the activate_resource method of pci_host_generic.c. Detail: . . . pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 This indicates this is a translating bus. 
pcib1: irq 91 at device 0.0 on pci0 rman_manage_region: request: start 0x1, end 0x1 pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10 rman_reserve_resource_bound: request: [0xc000, 0xc00f], length 0x10, flags 102, device pcib1 rman_reserve_resource_bound: trying 0x <0xc000,0xf> considering [0xc000, 0x] truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10) candidate region: [0xc000, 0xc00f], size 0x10 allocating from the beginning rman_manage_region: request: start 0x6, end 0x6000f What you later typed does not match: 0x6 0x6000f You later typed: 0x6000 0x600fff This seems to have lead to some confusion from using the wrong figure(s). The fact that we are trying to reserve the CPU addresses in the rman is because bus_translate_resource rewrote the start address in the resource after it was allocated. That said, I can't see why rman_manage_region would actually fail. At this point the rman is empty (this is the first call to rman_manage_region for "pcib1 memory window"), so only the check that should be failing are the checks against rm_start and rm_end. For the memory window, rm_start is always 0, and rm_end is always 0x, so both the old (0xc - 0xc00f) and new (0x6000 - 0x600fff) ranges are within those bounds. No: 0x .vs (actual): 0x6 0x6000f Ok, then this explains the failure if the "raw" addresses are above 4G. I have access to an emag I'm currently using to test fixes to pci_host_generic.c to avoid corrupting struct resource objects. I'll post the diff once I've got something verified to work. It looks to me like in sys/dev/pci/pci_pci.c the: static void pcib_probe_windows(struct pcib_softc *sc) { . . . pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0x); . . . is just inappropriately restrictive about where in the system address space a PCIe can validly be mapped to on the high end. That, in turn, leads to the rejection on the RPi4B now that the range use is checked. No, the physical register in PCI-PCI bridges is only 32-bits. 
Only the prefetchable BAR supports 64-bit addresses. Just for my edification . . . As I understand, SYS_RES_MEMORY for the BCM2711 means the 35 bit addressing space in the BCM2711, not a PCIe device internal address range that corresponds. Am I wrong about that? If I'm wrong, what does identify the 35 bit addressing space in the BCM2711? If I'm correct, then the 0..0x seems to be from the wrong address space up front. Or, may be, the SYS_RES_MEMORY and the 0x argments are not related as I expected and the 0x is not a SYS_RES_MEMORY value? We use SYS_RES_MEMORY for both address spaces. SYS_RES_MEMORY is more of an address space "type" and doesn't necessarily name a single, unique address space. The way to think about these address spaces is instances of 'struct rman'. There's a global 'struct rman' in the arm64
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic
On 2/14/24 8:42 AM, Warner Losh wrote: On Wed, Feb 14, 2024 at 9:08 AM John Baldwin wrote: On 2/12/24 5:57 PM, Mark Millard wrote: On Feb 12, 2024, at 16:36, Mark Millard wrote: On Feb 12, 2024, at 16:10, Mark Millard wrote: On Feb 12, 2024, at 12:00, Mark Millard wrote: [Gack: I was looking at the wrong vintage of source code, predating your changes: wrong system used.] On Feb 12, 2024, at 10:41, Mark Millard wrote: On Feb 12, 2024, at 09:32, John Baldwin wrote: On 2/9/24 8:13 PM, Mark Millard wrote: Summary: pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 . . . rman_manage_region: request: start 0x6, end 0x6000f panic: Failed to add resource to rman Hmmm, I suspect this is due to the way that bus_translate_resource works which is fundamentally broken. It rewrites the start address of a resource in-situ instead of keeping downstream resources separate from the upstream resources. For example, I don't see how you could ever release a resource in this design without completely screwing up your rman. That is, I expect trying to detach a PCI device behind a translating bridge that uses the current approach should corrupt the allocated resource ranges in an rman long before my changes. That said, that doesn't really explain the panic. Hmm, the panic might be because for PCI bridge windows the driver now passes RF_ACTIVE and the bus_translate_resource hack only kicks in the activate_resource method of pci_host_generic.c. Detail: . . . pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 This indicates this is a translating bus. 
pcib1: irq 91 at device 0.0 on pci0 rman_manage_region: request: start 0x1, end 0x1 pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10 rman_reserve_resource_bound: request: [0xc000, 0xc00f], length 0x10, flags 102, device pcib1 rman_reserve_resource_bound: trying 0x <0xc000,0xf> considering [0xc000, 0x] truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10) candidate region: [0xc000, 0xc00f], size 0x10 allocating from the beginning rman_manage_region: request: start 0x6, end 0x6000f What you later typed does not match: 0x6 0x6000f You later typed: 0x6000 0x600fff This seems to have lead to some confusion from using the wrong figure(s). The fact that we are trying to reserve the CPU addresses in the rman is because bus_translate_resource rewrote the start address in the resource after it was allocated. That said, I can't see why rman_manage_region would actually fail. At this point the rman is empty (this is the first call to rman_manage_region for "pcib1 memory window"), so only the check that should be failing are the checks against rm_start and rm_end. For the memory window, rm_start is always 0, and rm_end is always 0x, so both the old (0xc - 0xc00f) and new (0x6000 - 0x600fff) ranges are within those bounds. No: 0x .vs (actual): 0x6 0x6000f Ok, then this explains the failure if the "raw" addresses are above 4G. I have access to an emag I'm currently using to test fixes to pci_host_generic.c to avoid corrupting struct resource objects. I'll post the diff once I've got something verified to work. It looks to me like in sys/dev/pci/pci_pci.c the: static void pcib_probe_windows(struct pcib_softc *sc) { . . . pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0x); . . . is just inappropriately restrictive about where in the system address space a PCIe can validly be mapped to on the high end. That, in turn, leads to the rejection on the RPi4B now that the range use is checked. No, the physical register in PCI-PCI bridges is only 32-bits. 
Only the prefetchable BAR supports 64-bit addresses. This is why the host bridge is doing a translation from the CPU side (0x6) to the PCI BAR addresses (0xc000) so that the BAR addresses are down in the 32-bit address range. It's also true that many PCI devices only support 32-bit addresses in memory BARs. 64-bit BARs are an optional extension not universally supported. The translation here is somewhat akin to a type of MMU where the CPU addresses are mapped to PCI addresses. The problem here is that the PCI BAR resources need to "stay" as PCI addresses since we depend on being able to use rman_get_start/end to get the PCI addresses of allocated resources, but pci_host_generic.c currently rewrites the addresses. Probably I should remove rman_set_start/end entirely (Warner added them back in 2004) as the methods don't
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic
On 2/12/24 5:57 PM, Mark Millard wrote: On Feb 12, 2024, at 16:36, Mark Millard wrote: On Feb 12, 2024, at 16:10, Mark Millard wrote: On Feb 12, 2024, at 12:00, Mark Millard wrote: [Gack: I was looking at the wrong vintage of source code, predating your changes: wrong system used.] On Feb 12, 2024, at 10:41, Mark Millard wrote: On Feb 12, 2024, at 09:32, John Baldwin wrote: On 2/9/24 8:13 PM, Mark Millard wrote: Summary: pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 . . . rman_manage_region: request: start 0x6, end 0x6000f panic: Failed to add resource to rman Hmmm, I suspect this is due to the way that bus_translate_resource works which is fundamentally broken. It rewrites the start address of a resource in-situ instead of keeping downstream resources separate from the upstream resources. For example, I don't see how you could ever release a resource in this design without completely screwing up your rman. That is, I expect trying to detach a PCI device behind a translating bridge that uses the current approach should corrupt the allocated resource ranges in an rman long before my changes. That said, that doesn't really explain the panic. Hmm, the panic might be because for PCI bridge windows the driver now passes RF_ACTIVE and the bus_translate_resource hack only kicks in the activate_resource method of pci_host_generic.c. Detail: . . . pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 This indicates this is a translating bus. 
pcib1: irq 91 at device 0.0 on pci0 rman_manage_region: request: start 0x1, end 0x1 pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10 rman_reserve_resource_bound: request: [0xc000, 0xc00f], length 0x10, flags 102, device pcib1 rman_reserve_resource_bound: trying 0x <0xc000,0xf> considering [0xc000, 0x] truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10) candidate region: [0xc000, 0xc00f], size 0x10 allocating from the beginning rman_manage_region: request: start 0x6, end 0x6000f What you later typed does not match: 0x6 0x6000f You later typed: 0x6000 0x600fff This seems to have lead to some confusion from using the wrong figure(s). The fact that we are trying to reserve the CPU addresses in the rman is because bus_translate_resource rewrote the start address in the resource after it was allocated. That said, I can't see why rman_manage_region would actually fail. At this point the rman is empty (this is the first call to rman_manage_region for "pcib1 memory window"), so only the check that should be failing are the checks against rm_start and rm_end. For the memory window, rm_start is always 0, and rm_end is always 0x, so both the old (0xc - 0xc00f) and new (0x6000 - 0x600fff) ranges are within those bounds. No: 0x .vs (actual): 0x6 0x6000f Ok, then this explains the failure if the "raw" addresses are above 4G. I have access to an emag I'm currently using to test fixes to pci_host_generic.c to avoid corrupting struct resource objects. I'll post the diff once I've got something verified to work. It looks to me like in sys/dev/pci/pci_pci.c the: static void pcib_probe_windows(struct pcib_softc *sc) { . . . pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0x); . . . is just inappropriately restrictive about where in the system address space a PCIe can validly be mapped to on the high end. That, in turn, leads to the rejection on the RPi4B now that the range use is checked. No, the physical register in PCI-PCI bridges is only 32-bits. 
Only the prefetchable BAR supports 64-bit addresses. This is why the host bridge is doing a translation from the CPU side (0x6) to the PCI BAR addresses (0xc000) so that the BAR addresses are down in the 32-bit address range. It's also true that many PCI devices only support 32-bit addresses in memory BARs. 64-bit BARs are an optional extension not universally supported. The translation here is somewhat akin to a type of MMU where the CPU addresses are mapped to PCI addresses. The problem here is that the PCI BAR resources need to "stay" as PCI addresses since we depend on being able to use rman_get_start/end to get the PCI addresses of allocated resources, but pci_host_generic.c currently rewrites the addresses. Probably I should remove rman_set_start/end entirely (Warner added them back in 2004) as the methods don't do anything to deal with the fallout that the rman.rm_list linked-list is no longer sorted by address once some addresses get rewritten, etc. -- John Baldwin
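The translation and the 32-bit window limit described in the message above can be sketched with shell arithmetic. This is only an illustration under stated assumptions: the helper names cpu_to_pci and fits_window are made up here, and the base addresses and range are illustrative values chosen to resemble the setup being discussed, not figures copied from the log.

```sh
#!/bin/sh
# Sketch of a translating host bridge: CPU addresses map to PCI addresses
# by a fixed offset, and a 32-bit bridge window can only describe ranges
# that end at or below 0xffffffff.
# ASSUMPTION: every address below is an illustrative value.

PCI_BASE=$((0xc0000000))     # assumed PCI-side base of the translated range
CPU_BASE=$((0x600000000))    # assumed CPU-side base of the same range
WINDOW_END=$((0xffffffff))   # highest address a 32-bit window register holds

# cpu_to_pci ADDR: translate a CPU address into the matching PCI address.
cpu_to_pci() {
    echo $(( $1 - CPU_BASE + PCI_BASE ))
}

# fits_window START END: succeed if [START, END] lies inside the 32-bit window.
fits_window() {
    [ $(($1)) -ge 0 ] && [ $(($2)) -le "$WINDOW_END" ] && [ $(($1)) -le $(($2)) ]
}

cpu_start=$((0x600000000))
cpu_end=$((0x6000fffff))

# The CPU-side range lies above 4G, so a 32-bit window cannot hold it.
if fits_window "$cpu_start" "$cpu_end"; then
    echo "CPU range fits"
else
    echo "CPU range rejected"
fi

# The same range expressed as PCI addresses stays below 4G.
pci_start=$(cpu_to_pci "$cpu_start")
pci_end=$(cpu_to_pci "$cpu_end")
if fits_window "$pci_start" "$pci_end"; then
    printf 'PCI range 0x%x-0x%x fits\n' "$pci_start" "$pci_end"
fi
```

The point of the sketch is that the range has to stay in PCI-address form when it is checked against the bridge window; once it has been rewritten to the CPU-side address, the same check rejects it.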
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic
On 2/10/24 2:09 PM, Michael Butler wrote: I have stability problems with anything at or after this commit (b377ff8) on an amd64 laptop. While I see the following panic logged, no crash dump is preserved :-( It happens after ~5-6 minutes running in KDE (X). Reverting to 36efc64 seems to work reliably (after ACPI changes but before the problematic PCI one) kernel: Fatal trap 12: page fault while in kernel mode kernel: cpuid = 2; apic id = 02 kernel: fault virtual address = 0x48 kernel: fault code= supervisor read data, page not present kernel: instruction pointer = 0x20:0x80acb962 kernel: stack pointer = 0x28:0xfe00c4318d80 kernel: frame pointer = 0x28:0xfe00c4318d80 kernel: code segment = base 0x0, limit 0xf, type 0x1b kernel: = DPL 0, pres 1, long 1, def32 0, gran 1 kernel: processor eflags = interrupt enabled, resume, IOPL = 0 kernel: current process = 2 (clock (0)) kernel: rdi: f802e460c000 rsi: rdx: 0002 kernel: rcx: r8: 001e r9: fe00c4319000 kernel: rax: 0002 rbx: f802e460c000 rbp: fe00c4318d80 kernel: r10: 1388 r11: 7ffc765d r12: 000f kernel: r13: 0002 r14: f8000193e740 r15: kernel: trap number = 12 kernel: panic: page fault kernel: cpuid = 2 kernel: time = 1707573802 kernel: Uptime: 6m19s kernel: Dumping 942 out of 16242 MB:..2%..11%..21%..31%..41%..51%..62%..72%..82%..92% kernel: Dump complete kernel: Automatic reboot in 15 seconds - press a key on the console to abort Without a stack trace it is pretty much impossible to debug a panic like this. Do you have KDB_TRACE enabled in your kernel config? I'm also not sure how the PCI changes can result in a panic post-boot. If you were going to have problems they would be during device attach, not after you are booted and running X. Short of a stack trace, you can at least use lldb or gdb to lookup the source line associated with the faulting instruction pointer (as long as it isn't in a kernel module), e.g. for gdb you would use 'gdb /boot/kernel/kernel' and then 'l *', e.g. 
from above: 'l *0x80acb962' -- John Baldwin
Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic
On 2/9/24 8:13 PM, Mark Millard wrote: Summary: pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 . . . rman_manage_region: request: start 0x6, end 0x6000f panic: Failed to add resource to rman Hmmm, I suspect this is due to the way that bus_translate_resource works which is fundamentally broken. It rewrites the start address of a resource in-situ instead of keeping downstream resources separate from the upstream resources. For example, I don't see how you could ever release a resource in this design without completely screwing up your rman. That is, I expect trying to detach a PCI device behind a translating bridge that uses the current approach should corrupt the allocated resource ranges in an rman long before my changes. That said, that doesn't really explain the panic. Hmm, the panic might be because for PCI bridge windows the driver now passes RF_ACTIVE and the bus_translate_resource hack only kicks in the activate_resource method of pci_host_generic.c. Detail: . . . pcib0: mem 0x7d50-0x7d50930f irq 80,81 on simplebus2 pcib0: parsing FDT for ECAM0: pcib0: PCI addr: 0xc000, CPU addr: 0x6, Size: 0x4000 This indicates this is a translating bus. pcib1: irq 91 at device 0.0 on pci0 rman_manage_region: request: start 0x1, end 0x1 pcib0: rman_reserve_resource: start=0xc000, end=0xc00f, count=0x10 rman_reserve_resource_bound: request: [0xc000, 0xc00f], length 0x10, flags 102, device pcib1 rman_reserve_resource_bound: trying 0x <0xc000,0xf> considering [0xc000, 0x] truncated region: [0xc000, 0xc00f]; size 0x10 (requested 0x10) candidate region: [0xc000, 0xc00f], size 0x10 allocating from the beginning rman_manage_region: request: start 0x6, end 0x6000f The fact that we are trying to reserve the CPU addresses in the rman is because bus_translate_resource rewrote the start address in the resource after it was allocated. That said, I can't see why rman_manage_region would actually fail. 
At this point the rman is empty (this is the first call to rman_manage_region for "pcib1 memory window"), so the only checks that could be failing are the checks against rm_start and rm_end. For the memory window, rm_start is always 0, and rm_end is always 0x, so both the old (0xc - 0xc00f) and new (0x6000 - 0x600fff) ranges are within those bounds. I would instead expect to see some other issue later on where we fail to allocate a resource for a child BAR, but I wouldn't expect rman_manage_region to fail. Logging the return value from rman_manage_region would be the first step, I think, to see which error value it is returning. I should probably fix pci_host_generic.c to handle translation properly, however. I can work on a patch for that. -- John Baldwin
Re: Alder lake supported? (graphics)
In message , Chris writes: >I upgraded to an alder lake based machine and installed 14. >But I can't seem to get the intel graphics loaded (drm-515-kmod). >It simply freezes at load. Shot in the dark: # pkg delete drm-515-kmod && pkg install drm-510-kmod && kldload i915kms John groenv...@acm.org
Re: How to upgrade an EOL FreeBSD release or how to make it working again
Judging by a commit message BSD on the ARM Chromebook didn't work when support was removed in 2019. >RK* Exynos* and Meson*/Odroid* don't even work with current >source code, if someone wants to make them work again they >better use the Linux DTS. https://cgit.freebsd.org/src/commit?id=9dfa2a54684978d1d6cef67bbf6242e825801f18 I have one of the "snow" Chromebooks. The warnings in the web page https://wiki.freebsd.org/arm/Chromebook led me not to try FreeBSD. None of the many bugs seemed likely to ever be fixed. I'm not using it so I could try an experiment, but fighting with u-boot is not how I want to spend my days. Even the popular Raspberry Pi takes skill or luck. (So "build an arm6 world and copy X, Y, and Z to the DOS partition on your USB drive" is the kind of advice I need to supplement the old Chromebook wiki page.) There is at least a little value in getting it to work because the armv6 code is bit rotting and will go away entirely unless people use it. John Carr > On Jan 15, 2024, at 10:59, Mario Marietto wrote: > > Hello to everyone. > > I'm trying to install FreeBSD 14 natively on my ARM Chromebook model xe303c12 > ; I've found only one tutorial that teaches how to do that,that's it : > > https://wiki.freebsd.org/arm/Chromebook > > The problem is that it ends with the installation of FreeBSD 11,that's very > EOL. > I can't use it as is. I need to upgrade it to 14 (but I'm on arm 32 > bit,that's TIER-2,so I can't upgrade it automatically using the > freebsd-update script. It is also true that I can't install 14 directly on > that machine,as you can read below : > > > > > I've looked all around and I found the tool pkgbase,that I'm talking about on > the FreeBSD forum,to understand if it allows the 11 to be usable or > upgradable. It does not seem to be the proper tool to achieve my goal. Do you > have any suggestions that can help me ? Thanks. > > -- > Mario.
Re: ZFS problems since recently ?
On Tue, Jan 02, 2024 at 05:51:32PM -0500, Alexander Motin wrote: > Please see/test: https://github.com/openzfs/zfs/pull/15732 . Looks like that has landed in current: commit f552d7adebb13e24f65276a6c4822bffeeac3993 Merge: 13720136fbf a382e21194c Author: Martin Matuska Date: Wed Jan 10 09:07:45 2024 +0100 zfs: merge openzfs/zfs@a382e2119 Notable upstream pull request merges: #15693 a382e2119 Add Gotify notification support to ZED --> #15732 e78aca3b3 Fix livelist assertions for dedup and cloning #15733 7ecaa0758 make zdb_decompress_block check decompression reliably #15735 255741fc9 Improve block sizes checks during cloning Obtained from: OpenZFS OpenZFS commit: a382e21194c1690951d2eee8ebd98bc096f01c83
Re: ZFS problems since recently ?
On Tue, Jan 02, 2024 at 08:02:04PM -0800, John Kennedy wrote: > On Tue, Jan 02, 2024 at 05:51:32PM -0500, Alexander Motin wrote: > > On 01.01.2024 08:59, John Kennedy wrote: > > > ... > > >My poudriere build did eventually fail as well: > > > ... > > > [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success > > > [05:40:24] Stopping 2 builders > > > panic: VERIFY(BP_GET_DEDUP(bp)) failed > > > > Please see/test: https://github.com/openzfs/zfs/pull/15732 . > > It came back today at the end of my poudriere build. Your patch has fixed > it, so far at least. At the risk of conflating this with other ZFS issues, I beat on the VM a lot more last night without triggering any panics. My usual busy-workload is a total kernel+world rebuild (with whatever pending patches might be out), then a poudriere run (~230 or so packages). It's weird that the first (much bigger) run worked but later ones didn't (where maybe I had one port that failed to build), triggering the panic. Seemed repeatable, but don't have a feel for the exact trigger like the sysctl issue.
Re: ZFS problems since recently ?
On Tue, Jan 02, 2024 at 05:51:32PM -0500, Alexander Motin wrote: > On 01.01.2024 08:59, John Kennedy wrote: > > ... > >My poudriere build did eventually fail as well: > > ... > > [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success > > [05:40:24] Stopping 2 builders > > panic: VERIFY(BP_GET_DEDUP(bp)) failed > > Please see/test: https://github.com/openzfs/zfs/pull/15732 . It came back today at the end of my poudriere build. Your patch has fixed it, so far at least.
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 02:27:17PM +0100, Kurt Jaeger wrote: > > On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote: > > > markj@ pointed me in > > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039 > > > to > > > https://github.com/openzfs/zfs/pull/15719 > > > > > > So it will probably be fixed sooner or later. > > > > > > The other ZFS crashes I've seen are still an issue. > > > > My poudriere build did eventually fail as well: > > ... > > [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success > > [05:40:24] Stopping 2 builders > > panic: VERIFY(BP_GET_DEDUP(bp)) failed > > That's one of the panic messages I had as well. > > See > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276051 > > for additional crashes and dumps. > > > I didn't tweak this system off defaults for block-cloning. I haven't > > been following > > that issue 100%. > > Do you have > vfs.zfs.dmu_offset_next_sync=0 > ? I reverted everything and reinstalled. The VERIFY(BP_GET_DEDUP(bp)) panic hasn't reoccurred (tended to happen on poudriere-build cleanup), which may lean it more towards corruption, or maybe I just haven't been "lucky" with my small random chance of corruption. I did set vfs.zfs.dmu_offset_next_sync=0 after the bsdinstall was complete (maybe I could have loaded the zfs kernel module from the shell and set it before things kicked off).
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 08:42:26AM -0800, John Kennedy wrote: > Applying the two ZFS kernel patches fixes that issue: commit 09af4bf2c987f6f57804162cef8aeee05575ad1d (zfs: Fix SPA sysctl handlers) landed too. root@bsd15:~ # sysctl -a | grep vfs.zfs.zio vfs.zfs.zio.deadman_log_all: 0 vfs.zfs.zio.dva_throttle_enabled: 1 vfs.zfs.zio.requeue_io_start_cut_in_line: 1 vfs.zfs.zio.slow_io_ms: 3 vfs.zfs.zio.taskq_wr_iss_ncpus: 0 vfs.zfs.zio.taskq_write: sync fixed,1,5 scale fixed,1,5 vfs.zfs.zio.taskq_read: fixed,1,8 null scale null vfs.zfs.zio.taskq_batch_tpq: 0 vfs.zfs.zio.taskq_batch_pct: 80 vfs.zfs.zio.exclude_metadata: 0 root@bsd15:~ # uname -aUK FreeBSD bsd15 15.0-CURRENT FreeBSD 15.0-CURRENT #1 main-n267336-09af4bf2c98: Mon Jan 1 12:04:15 PST 2024 warlock@bsd15:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 158 158
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote: > > > I can crash mine with "sysctl -a" as well. > > markj@ pointed me in > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039 > to > https://github.com/openzfs/zfs/pull/15719 > > So it will probably be fixed sooner or later. > > The other ZFS crashes I've seen are still an issue. Applying the two ZFS kernel patches fixes that issue: root@bsd15:~ # sysctl -a | grep vfs.zfs.zio vfs.zfs.zio.deadman_log_all: 0 vfs.zfs.zio.dva_throttle_enabled: 1 vfs.zfs.zio.requeue_io_start_cut_in_line: 1 vfs.zfs.zio.slow_io_ms: 3 vfs.zfs.zio.taskq_wr_iss_ncpus: 0 vfs.zfs.zio.taskq_write: sync fixed,1,5 scale fixed,1,5 vfs.zfs.zio.taskq_read: fixed,1,8 null scale null vfs.zfs.zio.taskq_batch_tpq: 0 vfs.zfs.zio.taskq_batch_pct: 80 vfs.zfs.zio.exclude_metadata: 0 root@bsd15:~ # uname -aUK FreeBSD bsd15 15.0-CURRENT FreeBSD 15.0-CURRENT #2 main-n267335-499e84e16f5-dirty: Mon Jan 1 08:04:59 PST 2024 warlock@bsd15:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 158 158
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 02:27:17PM +0100, Kurt Jaeger wrote: > Do you have > vfs.zfs.dmu_offset_next_sync=0 I didn't initially; I do now. Like I said, I haven't been following that one 100%. I know it isn't block-clone per se, so much as some underlying problem it pokes with a pointy stick. Small chance multiplied by a bunch of ZFS IOPS. Seems like I'd have to revert it all the way back to a fresh install if I want to get rid of all potential corruption unrelated to the sysctl panic. But I'll do my busy-work cycle (*) with that one and maybe another with it off and see what happens. * full kernel+world, plus my local poudriere package build, currently wedged a bit with the heimdall build issue.
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote: > markj@ pointed me in > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039 > to > https://github.com/openzfs/zfs/pull/15719 > > So it will probably be fixed sooner or later. > > The other ZFS crashes I've seen are still an issue. My poudriere build did eventually fail as well: ... [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success [05:40:24] Stopping 2 builders panic: VERIFY(BP_GET_DEDUP(bp)) failed cpuid = 2 time = 1704091946 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00f62898c0 vpanic() at vpanic+0x131/frame 0xfe00f62899f0 spl_panic() at spl_panic+0x3a/frame 0xfe00f6289a50 dsl_livelist_iterate() at dsl_livelist_iterate+0x2de/frame 0xfe00f6289b30 bpobj_iterate_blkptrs() at bpobj_iterate_blkptrs+0x235/frame 0xfe00f6289bf0 bpobj_iterate_impl() at bpobj_iterate_impl+0x16e/frame 0xfe00f6289c80 dsl_process_sub_livelist() at dsl_process_sub_livelist+0x5c/frame 0xfe00f6289d00 spa_livelist_delete_cb() at spa_livelist_delete_cb+0xf6/frame 0xfe00f6289ea0 zthr_procedure() at zthr_procedure+0xa5/frame 0xfe00f6289ef0 fork_exit() at fork_exit+0x82/frame 0xfe00f6289f30 fork_trampoline() at fork_trampoline+0xe/frame 0xfe00f6289f30 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- KDB: enter: panic [ thread pid 9 tid 100223 ] Stopped at kdb_enter+0x33: movq$0,0xe3a582(%rip) db> Trying to do another poudriere build fails almost immediatly with that verify error. Your verify errors don't match up exactly. I've got snapshots from before I started freaking it out with the sysctl calls and possibly inducing corruption. I didn't tweak this system off defaults for block-cloning. I haven't been following that issue 100%.
Re: ZFS problems since recently ?
> I can crash mine with "sysctl -a" as well. Smaller test, this is sufficient to crash things: root@bsd15:~ # sysctl vfs.zfs.zio vfs.zfs.zio.deadman_log_all: 0 vfs.zfs.zio.dva_throttle_enabled: 1 vfs.zfs.ziopanic: sbuf_clear makes no sense on sbuf 0xf8002c8dc300 with drain cpuid = 3 time = 1704069514 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00fa502960 vpanic() at vpanic+0x131/frame 0xfe00fa502a90 panic() at panic+0x43/frame 0xfe00fa502af0 sbuf_clear() at sbuf_clear+0xa8/frame 0xfe00fa502b00 sbuf_cpy() at sbuf_cpy+0x56/frame 0xfe00fa502b20 spa_taskq_write_param() at spa_taskq_write_param+0x85/frame 0xfe00fa502bd0 sysctl_root_handler_locked() at sysctl_root_handler_locked+0x9c/frame 0xfe00fa502c20 sysctl_root() at sysctl_root+0x21e/frame 0xfe00fa502ca0 userland_sysctl() at userland_sysctl+0x184/frame 0xfe00fa502d50 sys___sysctl() at sys___sysctl+0x60/frame 0xfe00fa502e00 amd64_syscall() at amd64_syscall+0x153/frame 0xfe00fa502f30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe00fa502f30 --- syscall (202, FreeBSD ELF64, __sysctl), rip = 0x3733c1e5619a, rsp = 0x3733bf494538, rbp = 0x3733bf494570 --- KDB: enter: panic [ thread pid 780 tid 100237 ] Stopped at kdb_enter+0x33: movq$0,0xe3a582(%rip) db>
Re: ZFS problems since recently ?
On Sun, Dec 31, 2023 at 07:34:45PM +0100, Kurt Jaeger wrote: > Hi! > > Short overview: > - Had CURRENT system from around September > - Upgrade on the 23th of December > - crashes in ZFS, see > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261538 > for details > - Reinstalled from scratch with new SSDs drives from > https://download.freebsd.org/snapshots/amd64/amd64/ISO-IMAGES/15.0/ > freebsd-openzfs-amd64-2020081900-memstick.img.xz > - Had one crash with > sysctl -a > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039 > - Still see crashes with ZFS (and other) when using poudriere to > build ports. > > Problem: > > I happen to run in several cases of crashes in ZFS, some of > them fatal (zpool non-recoverable). I can crash mine with "sysctl -a" as well. I seeded my bhyve with: FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242-disc1.iso Rebuilt the kernel (so now at main-n267320-4d08b569a01) and started crunching through poudriere package builds. Sorta stock install of encrypted ZFS. I didn't get it to crash with poudriere (yet). Mine lives in bhyve, so maybe less possible destruction via crashes. 
KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00fa5f3960 vpanic() at vpanic+0x131/frame 0xfe00fa5f3a90 panic() at panic+0x43/frame 0xfe00fa5f3af0 sbuf_clear() at sbuf_clear+0xa8/frame 0xfe00fa5f3b00 sbuf_cpy() at sbuf_cpy+0x56/frame 0xfe00fa5f3b20 spa_taskq_write_param() at spa_taskq_write_param+0x85/frame 0xfe00fa5f3bd0 sysctl_root_handler_locked() at sysctl_root_handler_locked+0x9c/frame 0xfe00fa5f3c20 sysctl_root() at sysctl_root+0x21e/frame 0xfe00fa5f3ca0 userland_sysctl() at userland_sysctl+0x184/frame 0xfe00fa5f3d50 sys___sysctl() at sys___sysctl+0x60/frame 0xfe00fa5f3e00 amd64_syscall() at amd64_syscall+0x153/frame 0xfe00fa5f3f30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe00fa5f3f30 --- syscall (202, FreeBSD ELF64, __sysctl), rip = 0x22e42167019a, rsp = 0x22e41ee72518, rbp = 0x22e41ee72550 --- KDB: enter: panic The sysctl died at this point, but who knows if it had pending buffered output or anything... ... vfs.zfs.zio.deadman_log_all: 0 vfs.zfs.zio.dva_throttle_enabled: 1 vfs.zfs.zio.requeue_io_start_cut_in_line: 1 vfs.zfs.zio.slow_io_ms: 3 vfs.zfs.zio.taskq_wr_iss_ncpus: 0
Re: make installworld fails because /usr/include/c++/v1/__tuple is a file
On 12/10/23 8:43 AM, Dimitry Andric wrote: On 10 Dec 2023, at 15:11, Herbert J. Skuhra wrote: On Sun, Dec 10, 2023 at 01:22:38PM +, John F Carr wrote: On arm64 running CURRENT from two weeks ago I updated to c711af772782 Bump __FreeBSD_version for llvm 17.0.6 merge and built and installed from source. make installworld failed: install: target directory `/usr/include/c++/v1/__tuple/' does not exist That pathname is a file: -r--r--r-- 1 root wheel 20512 Feb 15 2023 /usr/include/c++/v1/__tuple Early in make output is mtree -deU -i -f /usr/src/etc/mtree/BSD.include.dist -p /usr/include ./c++/v1/__algorithm/pstl_backends missing (created) [...] ./c++/v1/__tuple missing (not created: File exists) Should I remove the file and try again, or is there a more elegant fix? The word "tuple" does not appear in UPDATING. 'make delete-old' should have removed this file. bdd1243df58e6 (Dimitry Andric 2023-04-14 23:41:27 +0200 965) OLD_FILES+=usr/include/c++/v1/__tuple Ah yes, that's it. The file was removed during the upgrade from libc++ 15.0 to 16.0, while its contents was split into a subdirectory named __tuple_dir. In libc++ 17.0.0 they renamed this subdirectory back to just __tuple. This means that apparently people are not running "make delete-old" after installations. Please don't forget that. :) Well, but if you have an old system with LLVM 15 that you upgrade directly to LLVM 17 you will hit this even if you ran delete-old after your last upgrading that used LLVM 15. We might need something to cope with this during the install target for libc++ in particular where this has occurred multiple times historically. -- John Baldwin
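The manual workaround discussed in this thread (get the stale regular file out of the way so the directory of the same name can be created, then rerun the install) can be sketched as a small shell helper. This is only an illustration, not the fix being proposed for the libc++ install target, and the function name remove_if_file is made up here.

```sh
#!/bin/sh
# remove_if_file PATH
# If PATH exists as a regular file, remove it so that a directory with the
# same name can be created afterwards. Directories and missing paths are
# left untouched.
remove_if_file() {
    if [ -f "$1" ]; then
        rm -f "$1" && echo "removed stale file: $1"
    fi
}

# Intended use on an affected system, as root, before rerunning
# "make installworld" (path taken from the error message in this thread):
#   remove_if_file /usr/include/c++/v1/__tuple
```

Running "make delete-old" after each upgrade remains the usual way to clear such files; the helper only covers the case where the stale file is still present when the install runs.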
make installworld fails because /usr/include/c++/v1/__tuple is a file
On arm64 running CURRENT from two weeks ago I updated to c711af772782 Bump __FreeBSD_version for llvm 17.0.6 merge and built and installed from source. make installworld failed: install: target directory `/usr/include/c++/v1/__tuple/' does not exist That pathname is a file: -r--r--r-- 1 root wheel 20512 Feb 15 2023 /usr/include/c++/v1/__tuple Early in make output is mtree -deU -i -f /usr/src/etc/mtree/BSD.include.dist -p /usr/include ./c++/v1/__algorithm/pstl_backends missing (created) [...] ./c++/v1/__tuple missing (not created: File exists) Should I remove the file and try again, or is there a more elegant fix? The word "tuple" does not appear in UPDATING.
Re: How do I update the kernel of FreeBSD-CURRENT
On Nov 29, 2023, at 12:21 PM, Manoel Games wrote: > > I am a new FreeBSD user, and I am using FreeBSD-CURRENT. How do I update the > FreeBSD-CURRENT kernel, and is it done through pkg? I installed > FreeBSD-CURRENT without src. As a new user you should probably run a supported release version, such as 14.0. Releases have binary updates available via freebsd-update. (Upgrading the base OS via pkg is still experimental.) Current has no such feature, so you need to download/update the source and recompile. See the Handbook chapter on upgrading FreeBSD: https://docs.freebsd.org/en/books/handbook/cutting-edge/ JN
Re: bhyve -G
On 11/15/23 3:06 PM, Bakul Shah wrote: On Nov 15, 2023, at 7:57 AM, John Baldwin wrote: On 10/9/23 5:21 PM, Bakul Shah wrote: Any hints on how to use bhyve's -G option to debug a VM kernel? I can connect to it from gdb with "target remote :" & bhyve stops the VM initially but beyond that I am not sure. Ideally this should work just like an in-circuit-emulator, not requiring anything special in the VM or kernel itself. step only works on Intel CPUs currently (and is a bit fragile anyway due to interrupts firing while you try to step, but that happens for me in QEMU as well). Breakpoints should work fine. I tend to use 'until' to do stepping (basically stepping via temporary breakpoints) when debugging the kernel this way. Thanks for your response! I can ^C to stop the VM, examine the stack, set breakpoints, continue etc. but when the breakpoint is hit, kgdb doesn't regain control -- instead I get the usual db> ... prompt on the console. I guess I have to set some sysctl for this? Hmm, no, it shouldn't be breaking into DDB in the guest as the breakpoint exception should be intercepted by the stub and never made visible to the guest. -- John Baldwin
Re: [HEADS-UP] Quick update to 14.0-RELEASE schedule
On 11/14/23 8:52 PM, Glen Barber wrote: On Tue, Nov 14, 2023 at 08:10:23PM -0700, The Doctor wrote: On Wed, Nov 15, 2023 at 02:27:01AM +, Glen Barber wrote: On Tue, Nov 14, 2023 at 05:15:48PM -0700, The Doctor wrote: On Tue, Nov 14, 2023 at 08:36:54PM +, Glen Barber wrote: We are still waiting for a few (non-critical) things to complete before the announcement of 14.0-RELEASE will be ready. It should only be another day or so before these things complete. Thank you for your understanding. I always just installed my copy. Ok. I do not know what exactly is your point, but releases are never official until there is a PGP-signed email sent. The email is intended for the general public of consumers of official releases, not "yeah, but"s. However, if you do a freebsd-update upgrade, you can upgrade. Is that supposed to happen? That does not say that the freebsd-update bits will not change *until* the official release announcement has been sent. In my past 15 years involved in the Project, I think we have been very clear on that. A RELEASE IS NOT FINAL UNTIL THE PGP-SIGNED ANNOUNCEMENT IS SENT. I mean, c'mon, dude. We really, seriously, for all intents and purposes, cannot be any more clear than that. So, yes, *IF* an update necessitates a new freebsd-update build, what you are running is *NOT* official. For at least 15 years, we have all said the same entire thing. Yes, but, if at this point we had to rebuild, it would have to be 14.0.1 or something (which we have done a few times in the past). It would be too confusing otherwise once the bits are built and published (where published means "uploaded to our CDN"). These are the 14.0 release bits; the only question is whether for some reason we had a dire emergency that meant we had to pull them at the last minute and publish different bits (under a different release name). 
Realistically, once the bits are available, we can't prevent people from using them, it's just at their own risk to do so until the project says "yes, we believe these are good". Granted, they are under the same risk if they are still running the last RC. The best way to minimize that risk going forward is to add more automation of testing/CI to go along with the process of building release bits so that the build artifacts from the release build run through CI and are only published if the CI is green as that would give us greater confidence of "we believe these are good" before they are uploaded for publishing. -- John Baldwin
Re: bsdinstall/scriptedpart could not run ;-(
On 11/12/23 11:00 PM, KIRIYAMA Kazuhiko wrote: Hi, all I usually run bsdinstall with an installerconfig, but bsdinstall/scriptedpart could not run ;-( My installerconfig is: PARTITIONS='nda0 gpt { 200M efi, 804G freebsd-ufs /, 128G freebsd-swap }' DISTRIBUTIONS='base.txz kernel-dbg.txz kernel.txz lib32.txz tests.txz' ZFSBOOT_DISKS="" #!/bin/sh /bin/mkdir -p /.dake /bin/cp /usr/share/zoneinfo/Asia/Tokyo /etc/localtime /bin/cp /root/.cshrc /root/.cshrc.org /bin/cat <> /etc/fstab 192.168.1.17:/.dake /.dake nfs rw 0 0 EOF sed -i".bak" -Ee '/^#BDS_install.sh_added:start_line$/,/^#BDS_install.sh_added:end_line$/d' /root/.cshrc /bin/cat <<'EOF' >> /root/.cshrc #BDS_install.sh_added:start_line setenv PATH${PATH}:/.dake/bin setenv MGRHOME /usr/home/admin setenv OPENTOOLSDIR/.dake setenv DAKEDIR /.dake #BDS_install.sh_added:end_line EOF : (snip) : I investigated the bsdinstall script and found that scriptedpart, which actually runs partedit under the name scriptedpart, does not destroy the existing partitions. In fact, when I changed scriptedpart to partedit in the script as follows, the partition editor ran at the terminal. My guess is something to do with commit 23099099196548550461ba427dcf09dcfb01878d, though I don't see how it could work any differently in this case, as the only change to part_config there was to return if geom_gettree fails, and if it fails, provider_for_name would presumably have failed anyway. -- John Baldwin
Re: bhyve -G
On 10/9/23 5:21 PM, Bakul Shah wrote: Any hints on how to use bhyve's -G option to debug a VM kernel? I can connect to it from gdb with "target remote :" & bhyve stops the VM initially but beyond that I am not sure. Ideally this should work just like an in-circuit-emulator, not requiring anything special in the VM or kernel itself. step only works on Intel CPUs currently (and is a bit fragile anyway due to interrupts firing while you try to step, but that happens for me in QEMU as well). Breakpoints should work fine. I tend to use 'until' to do stepping (basically stepping via temporary breakpoints) when debugging the kernel this way. -- John Baldwin
Re: KTLS thread on 14.0-RC3
On 10/30/23 3:41 AM, Zhenlei Huang wrote: On Oct 30, 2023, at 12:09 PM, Zhenlei Huang wrote: On Oct 29, 2023, at 5:43 PM, Gordon Bergling wrote: Hi, I am currently building a new system, which should be based on 14.0-RELEASE. Therefore I have been tracking releng/14.0 since its creation and am currently updating it via the usual buildworld steps. What I have noticed recently is that the [KTLS] thread is missing. I have a stable/13 system which shows the [KTLS] thread and a very recent -CURRENT that also shows the [KTLS] thread. The stable/13 and releng/14.0 systems both use the GENERIC kernel, without any custom modifications. Loaded KLDs are also the same. Did I miss something, or is there something missing in releng/14.0 which is currently enabled in stable/13? KTLS should still work as intended; the creation of its threads is deferred. See a72ee355646c (ktls: Defer creation of threads and zones until first use) Run ktls_init() when the first KTLS session is created rather than unconditionally during boot. This avoids creating unused threads and allocating unused resources on systems which do not use KTLS. ``` -SYSINIT(ktls, SI_SUB_SMP + 1, SI_ORDER_ANY, ktls_init, NULL); ``` It seems 14.0 only creates one KTLS thread. IIRC 13.2 creates one thread per core. That part should not be different. There should always be one thread per core. -- John Baldwin
15/14 upgrades break old sudo, maybe bump PAM's shlib?
I upgraded my laptop today from a late-June current to yesterday's current, and after installworld sudo stopped working (it dies with a SIGBUS). After some debugging, the issue turned out to be an OpenSSL library version mismatch: sudo uses PAM, and PAM is linked against OpenSSL 3, but sudo is linked against OpenSSL 1.1.1. Both shlibs get mapped into the process, and at some point sudo crosses the streams and the crash occurs inside OpenSSL 3's libcrypto.

I realize that we do have a general note about needing to update third-party packages after an upgrade, but I tend to use sudo as part of my workflow for doing that sort of thing. I generally build all my own packages via poudriere and use sudo at various points in that process, but even if I were using FreeBSD.org packages I would be using sudo to try to run 'pkg upgrade'. su(8) in base works fine, so that's my workaround for now on my laptop, but I wonder if we want to make this particular bump on the upgrade path a little less bumpy?

One option is to be clear in our release notes that tools like sudo can break (I suspect the same applies to any other third-party su wrappers that also use PAM; xscreensaver's screen lock doesn't seem to be affected, thankfully, probably because it doesn't use OpenSSL directly). Another route would be to bump the DSO versions of things in base that depend on libcrypto/libssl. We did not take this latter approach for the OpenSSL 1.0.2 -> 1.1.1 upgrade FWIW. If we wanted to do the shlib bump approach, Enji had a good list from a while back (though Enji wanted to make them all private rather than bumping):

- kerberos
- libarchive
- libbsnmp
- libfetch
- libgeli
- libldns
- libmp
- libradius
- libunbound

From my research it seems that PAM (library and modules), gssapi libraries, and libzfs would also need to be on the list. libldns is already private, as is libunbound, though bumping them might be safer anyway. There is no libgeli; instead there is geli_eli.so, which has no version but hopefully is not widely used in ports the way PAM is. Note also that if we did this, we would want to do it for 14.0, as 13.x -> 14 upgrades are affected in the same way. -- John Baldwin
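One rough way to spot the kind of mix described above is to look at which libcrypto sonames appear in a library listing for the affected program. The sketch below is only illustrative: the helper names are made up here, and it assumes the listing (for example, saved output from ldd(1) or procstat -v) is fed in on stdin. Libraries pulled in later through dlopen(3), as PAM modules are, do not appear in plain ldd output, so a listing from the running process is likely more telling.

```sh
# Hypothetical helpers, not part of any FreeBSD tool.

# Print each distinct libcrypto soname seen in a library listing on stdin.
libcrypto_versions() {
    grep -o 'libcrypto\.so\.[0-9][0-9]*' | sort -u
}

# Succeed (exit 0) when the listing on stdin names more than one libcrypto.
has_mixed_libcrypto() {
    count=$(libcrypto_versions | wc -l)
    # Unquoted on purpose: some wc(1) implementations pad the number.
    [ $count -gt 1 ]
}
```

For a running process, something like `procstat -v <pid> | has_mixed_libcrypto && echo mixed` would apply the same check to everything actually mapped.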
Re: user problems when upgrading to v15
On 9/2/23 7:11 AM, Dimitry Andric wrote: On 1 Sep 2023, at 03:42, brian whalen wrote:

Repeating the entire process: I created a 13.2 vm with 6 cores and 8GB of ram. Ran freebsd-update fetch and install. Ran pkg install git bash ccache open-vm-tools-nox11. Used git clone to get current and ports source files. Edited /etc/make.conf to use ccache. Ran make -j6 buildworld && make -j6 kernel. I then rebooted in single user mode and did the next steps saving output to a file with > filename. etcupdate -p was pretty uneventful. It did show the below and did not prompt to edit.

root@f15:~ # less etcupdatep
C /etc/group
C /etc/master.passwd

This is a problem: the "C" characters mean there were conflicts, and it's indeed very unfortunate that etcupdate does not immediately force you to resolve them. Because now you basically have mangled group and master.passwd files, with conflict markers in them!

No, the conflicted files are in /var/db/etcupdate/conflicts, the files in /etc are still the old ones at this point and won't be updated until you run 'etcupdate resolve' to fix them. I suspect what happened here is that Brian chose the 'tf' (theirs-full) option for 'etcupdate resolve' when he really wanted to do 'e' to edit the conflicted version.

Immediately after this, you should run "etcupdate resolve", and fix any conflicts that it has found. Note that recently there was a lot of churn due to the removal of $FreeBSD$ keywords, and this almost always creates conflicts in the group and passwd files. For lots of other files in /etc, the conflicts are resolved automatically, but unfortunately not for the files that are essential to log in!

make installworld seemed mostly error free though I did see a nonzero status for a man page that failed in the man4 directory. etcupdate -B only showed the below. This was my first build after install.

root@f15:~ # less etcupdateB
Conflicts remain from previous update, aborting.

Yes, that is indeed the problem. You must first resolve conflicts from any previous etcupdate run, before doing anything else. As to why it does not immediately force you to do so, and delegates this to a separate step, which can easily be forgotten, I have no idea.

So that if you are doing scripted upgrades, you don't hang forever in a script. The intention is that after doing a bunch of scripted installworld + etcupdate's on various hosts you can use 'etcupdate status' to see if there are any remaining steps requiring manual intervention. There could be an option to request batched vs interactive updates perhaps.

If I type exit in single user mode to go multi user mode, the local user still works. After a reboot the local user still works. This local user can also sudo as expected. This wasn't the case for the previous build when I first reported this. However, if I run etcupdate resolve it is still presenting /etc/group and /etc/master.passwd as problems. If this is expected behavior for current then no big deal. I just wasn't sure.

The conflicts themselves are expected, alas. But you _must_ resolve them, otherwise you can end up with a mostly-bricked system.

No, the conflict markers are not placed in the versions in /etc. However, etcupdate does refuse to do a "new" upgrade until you resolve all the conflicts from your previous upgrade to ensure that conflicted upgrades aren't missed. -- John Baldwin
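For scripted upgrades along the lines described above, a small check over the saved etcupdate output can flag hosts that still need a manual 'etcupdate resolve'. This is a minimal sketch, not part of etcupdate: the function names are hypothetical, and it assumes the log has one entry per line in the "C /path" form shown in the thread.

```sh
# Hypothetical helpers for scanning a saved etcupdate log.

# Print the paths that the log marks as conflicted ("C <path>").
conflicted_files() {
    awk '$1 == "C" { print $2 }' "$@"
}

# Report whether the log named in $1 still shows conflicts.
# Returns non-zero when a manual 'etcupdate resolve' is still needed.
check_etcupdate_log() {
    conflicts=$(conflicted_files "$1")
    if [ -n "$conflicts" ]; then
        echo "unresolved conflicts:"
        echo "$conflicts"
        echo "run 'etcupdate resolve' before the next etcupdate run"
        return 1
    fi
    echo "no conflicts recorded"
}
```

With the file from the thread this would be called as `check_etcupdate_log etcupdatep`. On a live system, 'etcupdate status' remains the authoritative check.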
sscanf change prevents build of CURRENT
I had a problem yesterday and today rebuilding a -CURRENT system from source:

--- magic.mgc ---
./mkmagic magic
magic, 4979: Warning: Current entry does not yet have a description for adding a MIME type
mkmagic: could not find any valid magic files!

The cause was an sscanf call unexpectedly failing to parse the input. This caused the mkmagic program (internal tool used to build magic number table for file) to fail. If I link mkmagic against the static libc.a in /usr/obj then it works. So my installed libc.so is broken and the latest source works. I think. My installed kernel is at 76edfabbecde, the end of the binary integer parsing commit series, so my libc should be the same.

The program below demonstrates the bug. See src/contrib/file/src for context. I am trying to manually compile a working mkmagic and restart the build to get unstuck.

    #include <stdint.h>
    #include <stdio.h>

    struct guid {
        uint32_t data1;
        uint16_t data2;
        uint16_t data3;
        uint8_t data4[8];
    };

    int main(int argc, char *argv[])
    {
        struct guid g = {0, 0, 0, {0}};
        char *text = "75B22630-668E-11CF-A6D9-00AA0062CE6C";

        /* An alternative GUID string may be given on the command line. */
        if (argc > 1)
            text = argv[1];
        int count = sscanf(text,
            "%8x-%4hx-%4hx-%2hhx%2hhx-%2hhx%2hhx%2hhx%2hhx%2hhx%2hhx",
            &g.data1, &g.data2, &g.data3,
            &g.data4[0], &g.data4[1], &g.data4[2], &g.data4[3],
            &g.data4[4], &g.data4[5], &g.data4[6], &g.data4[7]);
        fprintf(stdout,
            "[%d]:\n%08x-%04hx-%04hx-%02hhx%02hhx-%02hhx%02hhx%02hhx%02hhx%02hhx%02hhx\n",
            count, g.data1, g.data2, g.data3,
            g.data4[0], g.data4[1], g.data4[2], g.data4[3],
            g.data4[4], g.data4[5], g.data4[6], g.data4[7]);
        /* All 11 fields must convert; exit status is non-zero otherwise. */
        return count != 11;
    }
Re: shell hung in fork system call
> On Jul 9, 2023, at 19:59, Konstantin Belousov wrote:
>
> On Sun, Jul 09, 2023 at 11:36:03PM +0000, John F Carr wrote:
>>
>>
>>> On Jul 9, 2023, at 19:25, Konstantin Belousov wrote:
>>>
>>> On Sun, Jul 09, 2023 at 10:41:27PM +0000, John F Carr wrote:
>>>> Kernel and system at a146207d66f320ed239c1059de9df854b66b55b7 plus some
>>>> irrelevant local changes, four 64 bit ARM processors, make.conf sets
>>>> CPUTYPE?=cortex-a57.
>>>>
>>>> I typed ^C while /bin/sh was starting a pipeline and my shell got hung in
>>>> the middle of fork().
>>>>
>>>> From the terminal:
>>>>
>>>> # git log --oneline --|more
>>>> ^C^C^C
>>>> load: 3.26 cmd: sh 95505 [fork] 5308.67r 0.00u 0.03s 0% 2860k
>>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264
>>>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44
>>>> load: 3.16 cmd: sh 95505 [fork] 5311.75r 0.00u 0.03s 0% 2860k
>>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264
>>>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44
>>>>
>>>> According to ps -d on another terminal the shell has no children:
>>>>
>>>>   PID TT  STAT    TIME COMMAND
>>>> [...]
>>>>   873 u0  IWs  0:00.00 `-- login [pam] (login)
>>>>   874 u0  I    0:00.17 `-- -sh (sh)
>>>> 95504 u0  I    0:00.01 `-- su -
>>>> 95505 u0  D+   0:00.05 `-- -su (sh)
>>>> [...]
>>>>
>>>> Nothing on the (115200 bps serial) console. No change in system
>>>> performance.
>>>>
>>>> The system is busy copying a large amount of data from the network to a
>>>> ZFS pool on spinning disks. The git|more pipeline could have taken some
>>>> time to get going while I/O requests worked their way through the queue.
>>>> It would not have touched the busy pool, only the zroot pool on an SSD.
>>>>
>>>> Has anything changed recently that might cause this?
>>>
>>> There was some change around fork, but your sleep seems to be not from
>>> that change. Can you show the wait channel for the process? Do something
>>> like
>>> $ ps alxww
>>>
>>
>> UID   PID  PPID C PRI NI   VSZ  RSS MWCHAN STAT TT     TIME COMMAND
>>   0 95505 95504 2  20  0 13508 2876 fork   D+   u0  0:00.13 -su (sh)
>>
>> This is probably the same information displayed as [fork] in the output from
>> ^T.
>>
>> Does it correspond to the source line
>>
>> pause("fork", hz / 2);
>>
>> ?
>
> Yes, it is rate-limiting code. Still it is interesting to see the whole
> ps output.
>
> Do you have 7a70f17ac4bd64dc1a5020f in your source?

No, I do not have that commit. The comment mentions livelock. CPU use as reported by iostat did not change after the process hung.
Re: shell hung in fork system call
> On Jul 9, 2023, at 19:25, Konstantin Belousov wrote:
>
> On Sun, Jul 09, 2023 at 10:41:27PM +0000, John F Carr wrote:
>> Kernel and system at a146207d66f320ed239c1059de9df854b66b55b7 plus some
>> irrelevant local changes, four 64 bit ARM processors, make.conf sets
>> CPUTYPE?=cortex-a57.
>>
>> I typed ^C while /bin/sh was starting a pipeline and my shell got hung in
>> the middle of fork().
>>
>> From the terminal:
>>
>> # git log --oneline --|more
>> ^C^C^C
>> load: 3.26 cmd: sh 95505 [fork] 5308.67r 0.00u 0.03s 0% 2860k
>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264
>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44
>> load: 3.16 cmd: sh 95505 [fork] 5311.75r 0.00u 0.03s 0% 2860k
>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264
>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44
>>
>> According to ps -d on another terminal the shell has no children:
>>
>>   PID TT  STAT    TIME COMMAND
>> [...]
>>   873 u0  IWs  0:00.00 `-- login [pam] (login)
>>   874 u0  I    0:00.17 `-- -sh (sh)
>> 95504 u0  I    0:00.01 `-- su -
>> 95505 u0  D+   0:00.05 `-- -su (sh)
>> [...]
>>
>> Nothing on the (115200 bps serial) console. No change in system performance.
>>
>> The system is busy copying a large amount of data from the network to a ZFS
>> pool on spinning disks. The git|more pipeline could have taken some time to
>> get going while I/O requests worked their way through the queue. It would
>> not have touched the busy pool, only the zroot pool on an SSD.
>>
>> Has anything changed recently that might cause this?
>
> There was some change around fork, but your sleep seems to be not from
> that change. Can you show the wait channel for the process? Do something
> like
> $ ps alxww
>

UID   PID  PPID C PRI NI   VSZ  RSS MWCHAN STAT TT     TIME COMMAND
  0 95505 95504 2  20  0 13508 2876 fork   D+   u0  0:00.13 -su (sh)

This is probably the same information displayed as [fork] in the output from ^T.

Does it correspond to the source line

    pause("fork", hz / 2);

?
shell hung in fork system call
Kernel and system at a146207d66f320ed239c1059de9df854b66b55b7 plus some irrelevant local changes, four 64 bit ARM processors, make.conf sets CPUTYPE?=cortex-a57.

I typed ^C while /bin/sh was starting a pipeline and my shell got hung in the middle of fork().

From the terminal:

# git log --oneline --|more
^C^C^C
load: 3.26 cmd: sh 95505 [fork] 5308.67r 0.00u 0.03s 0% 2860k
mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264
fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44
load: 3.16 cmd: sh 95505 [fork] 5311.75r 0.00u 0.03s 0% 2860k
mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264
fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44

According to ps -d on another terminal the shell has no children:

  PID TT  STAT    TIME COMMAND
[...]
  873 u0  IWs  0:00.00 `-- login [pam] (login)
  874 u0  I    0:00.17 `-- -sh (sh)
95504 u0  I    0:00.01 `-- su -
95505 u0  D+   0:00.05 `-- -su (sh)
[...]

Nothing on the (115200 bps serial) console. No change in system performance.

The system is busy copying a large amount of data from the network to a ZFS pool on spinning disks. The git|more pipeline could have taken some time to get going while I/O requests worked their way through the queue. It would not have touched the busy pool, only the zroot pool on an SSD.

Has anything changed recently that might cause this?
Re: For snapshot builds: armv7 chroot on aarch64 has kyua test -k /usr/tests/Kyuafile sys/kern/kern_copyin hung up [in getpid?], unkillable, prevents reboot
On Jul 6, 2023, at 20:42, Mike Karels wrote: > > > Thanks for isolating this. Let me know when you have the bug number. > I just tested a fix (the compat code drops the reference on the current > address space an extra time, probably freeing it). > > Mike The bug was introduced in January, 2022. It allows 32 bit binaries to crash a 64 bit system when COMPAT_FREEBSD32 is on. Test coverage of the buggy function (sysctl_kern_proc_vm_layout) was added at the same time. There should be routine runs of 32 bit test suites on 64 bit systems. Although i386 and armv7 are tier 2 systems, the tier 1 COMPAT_FREEBSD32 kernel code needs to be exercised. This bug was only discovered by manually running tests in the right environment, 17 months after automated testing could have discovered it.
Re: For snapshot builds: armv7 chroot on aarch64 has kyua test -k /usr/tests/Kyuafile sys/kern/kern_copyin hung up [in getpid?], unkillable, prevents reboot
> On Jun 25, 2023, at 20:16, Mark Millard wrote:
>
> Using the likes of:
>
> FreeBSD-14.0-CURRENT-arm64-aarch64-ROCK64-20230622-b95d2237af40-263748.img
> and:
> FreeBSD-14.0-CURRENT-arm-armv7-GENERICSD-20230622-b95d2237af40-263748.img
>
> I have shown the following behavior after setting up storage
> media based on them. (This was a test that my builds were not
> odd for the issue.)
>
> Boot the aarch64 media and log in. (Note: I logged in
> as root.)
>
> mount the armv7 media (-noatime is just my habit)
> and then put it to use:
>
> # mount -onoatime /dev/da1s2a /mnt
>
> # chroot /mnt/
>
> # kyua test -k /usr/tests/Kyuafile sys/kern/kern_copyin
> sys/kern/kern_copyin:kern_copyin ->
>
> On the serial console:
>
> # ps -xu
> USER  PID   %CPU %MEM   VSZ  RSS TT STAT STARTED      TIME COMMAND
> root   11 1498.4  0.0     0  256 -  RNL  23:24   542:52.92 [idle]
> root 1174  100.0  0.0     0   16 -  Rs   23:37     0:00.00 /usr/tests/sys/kern/kern_copyin -vunprivileged-user=tests -r/tmp/kyua.9YUttj/2/result.atf kern_copyin
> root    0    0.0  0.0     0 1616 -  DLs  23:24     0:00.50 [kernel]
> root    1    0.0  0.0 11704 1288 -  ILs  23:24     0:00.02 /sbin/init
> root    2    0.0  0.0     0  256 -  WL   23:24     0:00.26 [clock]
> root    3    0.0  0.0     0  272 -  DL   23:24     0:00.00 [crypto]
> root    4    0.0  0.0     0   80 -  DL   23:24     0:00.95 [cam]
> root    5    0.0  0.0     0   16 -  DL   23:24     0:00.00 [busdma]
> root    6    0.0  0.0     0   16 -  DL   23:24     0:00.03 [rand_harvestq]
> root    7    0.0  0.0     0   48 -  DL   23:24     0:00.06 [pagedaemon]
> root    8    0.0  0.0     0   16 -  DL   23:24     0:00.00 [vmdaemon]
> root    9    0.0  0.0     0  160 -  DL   23:24     0:00.38 [bufdaemon]
> root   10    0.0  0.0     0   16 -  DL   23:24     0:00.00 [audit]
> root   12    0.0  0.0     0  880 -  WL   23:24     0:11.81 [intr]
> root   13    0.0  0.0     0   48 -  DL   23:24     0:00.04 [geom]
> root   14    0.0  0.0     0   16 -  DL   23:24     0:00.00 [sequencer 00]
> root   15    0.0  0.0     0  160 -  DL   23:24     0:06.42 [usb]
> root   16    0.0  0.0     0   16 -  DL   23:24     0:00.10 [acpi_thermal]
> root   17    0.0  0.0     0   16 -  DL   23:24     0:00.00 [acpi_cooling0]
> root   18    0.0  0.0     0   16 -  DL   23:24     0:00.04 [syncer]
> root   19    0.0  0.0     0   16 -  DL   23:24     0:00.00 [vnlru]
> root  671    0.0  0.0 13260 2600 -  Is   23:25     0:00.00 dhclient: system.syslog (dhclient)
> root  674    0.0  0.0 13260 2752 -  Is   23:25     0:00.00 dhclient: dpni0 [priv] (dhclient)
> root  761    0.0  0.0 14572 3972 -  Ss   23:25     0:00.02 /sbin/devd
> root  964    0.0  0.0 12832 2764 -  Is   23:25     0:00.02 /usr/sbin/syslogd -s
> root 1033    0.0  0.0 13012 2604 -  Ss   23:25     0:00.01 /usr/sbin/cron -s
> root 1058    0.0  0.0 21052 8308 -  Is   23:25     0:00.01 sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups (sshd)
> root 1078    0.0  0.0 21288 9304 -  Is   23:26     0:00.09 sshd: root@pts/0 (sshd)
> root 1175    0.0  0.0 21288 9496 -  Is   23:37     0:00.04 sshd: root@pts/1 (sshd)
> root 1074    0.0  0.0 13380 3008 u0 Is   23:25     0:00.01 login [pam] (login)
> root 1075    0.0  0.0 13460 3292 u0 S    23:25     0:00.02 -sh (sh)
> root 1233    0.0  0.0 13588 3016 u0 R+   00:00     0:00.00 ps -xu
> root 1081    0.0  0.0 13460 3328 0  Is   23:26     0:00.02 -sh (sh)
> root 1170    0.0  0.0  5788 2884 0  I    23:36     0:00.02 /bin/sh -i
> root 1172    0.0  0.0 10408 7192 0  I+   23:37     0:00.30 kyua test -k /usr/tests/Kyuafile sys/kern/kern_copyin
> root 1178    0.0  0.0 13460 3320 1  Is+  23:38     0:00.01 -sh (sh)
>
> 1174 is stuck, even if one waits for 30min+.
> kill and kill -9 will not kill 1174.
>
> "shutdown -r now" hangs before the reboot happens
> and reports: "some processes would not die".
>
> An interesting property is that ps and top disagree
> about 1174 CPU usage: ps 100%, top 0%. But top also
> indicates 1174 always has CPU0 "STATE". (Across
> tests CPUn varies but within a test it has
> a fixed n.)
>
> I have also seen ps "STAT" being RXs.
>
> The following is from my earlier activity with my own
> builds involved, here 1119, not the 1174 from above.
> truss reports as the last thing for the stuck process
> as "getpid()".
>
> . . .
> 1119: 0.588983953 fstatat(AT_FDCWD,"/usr/tests/sys/kern/kern_copyin",{ > mode=-r-xr-xr-x ,inode=111756,size=9776,blksize=10240 },AT_SYMLINK_NOFOLLOW) > = 0 (0x0) > 1119: 0.589065030 > mmap(0x0,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(12),-1,0x0) > = 1074188288 (0x4006d000) > 1119: 0.589227544 > openat(AT_FDCWD,"/tmp/kyua.aBQv6E/2/result.atf",O_WRONLY|O_CREAT|O_TRUNC,0644) > = 3 (0x3) > 1119: 0.589276503 getpid() = 1119 (0x45f) > > > > For reference, from inside an armv7 chroot session > before doing such a test: > > # uname -apKU > FreeBSD generic 14.0-CURRE
Re: aarch64 main-n263493-4e8d558c9d1c-dirty (so: 2023-Jun-10) Kyuafile run: "Fatal data abort" crash during vnet_register_sysinit
> On Jun 26, 2023, at 04:32, Mark Millard wrote: > > On Jun 24, 2023, at 17:25, Mark Millard wrote: > >> On Jun 24, 2023, at 14:26, John F Carr wrote: >> >>> >>>> On Jun 24, 2023, at 13:00, Mark Millard wrote: >>>> >>>> The running system build is a non-debug build (but >>>> with symbols not stripped). >>>> >>>> The HoneyComb's console log shows: >>>> >>>> . . . >>>> GEOM_STRIPE: Device stripe.IMfBZr destroyed. >>>> GEOM_NOP: Device md0.nop created. >>>> g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5 >>>> GEOM_NOP: Device md0.nop removed. >>>> GEOM_NOP: Device md0.nop created. >>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5 >>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5 >>>> GEOM_NOP: Device md0.nop removed. >>>> GEOM_NOP: Device md0.nop created. >>>> GEOM_NOP: Device md0.nop removed. >>>> Fatal data abort: >>>> x0: a02506e64400 >>>> x1: 0001ea401880 (g_raid3_post_sync + 3a145f8) >>>> x2: 4b >>>> x3: a343932b0b22fb30 >>>> x4:0 >>>> x5: 3310b0d062d0e1d >>>> x6: 1d0e2d060d0b3103 >>>> x7:0 >>>> x8: ea325df8 >>>> x9: 0001eec946d0 ($d.6 + 0) >>>> x10: 0001ea401880 (g_raid3_post_sync + 3a145f8) >>>> x11:0 >>>> x12:0 >>>> x13: 00cd8960 (lock_class_mtx_sleep + 0) >>>> x14:0 >>>> x15: a02506e64405 >>>> x16: 0001eec94860 (_DYNAMIC + 160) >>>> x17: 0063a450 (ifc_attach_cloner + 0) >>>> x18: 0001eb290400 (g_raid3_post_sync + 48a3178) >>>> x19: 0001eec94600 (vnet_epair_init_vnet_init + 0) >>>> x20: 00fa5b68 (vnet_sysinit_sxlock + 18) >>>> x21: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0) >>>> x22: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0) >>>> x23: a042e500 >>>> x24: a042e500 >>>> x25: 00ce0788 (linker_lookup_set_desc + 0) >>>> x26: a0203cdef780 >>>> x27: 0001eec94698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0) >>>> x28: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0) >>>> x29: 0001eb290430 (g_raid3_post_sync + 48a31a8) >>>> sp: 0001eb290400 >>>> lr: 0001eec82a4c ($x.1 + 3c) >>>> elr: 0001eec82a60 ($x.1 + 
50) >>>> spsr: 6045 >>>> far: 0002d8fba4c8 >>>> esr: 9646 >>>> panic: vm_fault failed: 0001eec82a60 error 1 >>>> cpuid = 14 >>>> time = 1687625470 >>>> KDB: stack backtrace: >>>> db_trace_self() at db_trace_self >>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30 >>>> vpanic() at vpanic+0x13c >>>> panic() at panic+0x44 >>>> data_abort() at data_abort+0x2fc >>>> handle_el1h_sync() at handle_el1h_sync+0x14 >>>> --- exception, esr 0x9646 >>>> $x.1() at $x.1+0x50 >>>> vnet_register_sysinit() at vnet_register_sysinit+0x114 >>>> linker_load_module() at linker_load_module+0xae4 >>>> kern_kldload() at kern_kldload+0xfc >>>> sys_kldload() at sys_kldload+0x60 >>>> do_el0_sync() at do_el0_sync+0x608 >>>> handle_el0_sync() at handle_el0_sync+0x44 >>>> --- exception, esr 0x5600 >>>> KDB: enter: panic >>>> [ thread pid 70419 tid 101003 ] >>>> Stopped at kdb_enter+0x44: str xzr, [x19, #3200] >>>> db> >>> >>> The failure appears to be initializing module if_epair. >> >> Yep: trying: >> >> # kldload if_epair.ko >> >> was enough to cause the crash. (Just a HoneyComb context at >> that point.) >> >> I tried media dd'd from the recent main snapshot, booting the >> same system. No crash. I moved my build boot media to some >> other systems and tested them: crashes. I tried my boot media >> built optimized for Cortex-A53 or Cortex-X1C/Cortex-A78C >> instead of Cortex-A72: no crashes. (But only one system can >> use the X1C/A78C code in that build.) >> >> So variation testing only gets the crashes for my builds >> that are code-optimized for Cortex-A72's. The same source >
Re: twe(4) removed
> On Jun 24, 2023, at 4:16 AM, Marcin Cieslak wrote: > > I just noticed that I had to remove "device twe" > from my kernel configuration when rebuilding my -CURRENT today. > > Is there any problem with this driver that makes it difficult > to keep around? > > Believe or not, I still rent a machine using it in JBOD mode > (running 13 right now but I could switch it to -CURRENT for testing if > needed). The deprecation notice and partial justification are here: https://cgit.freebsd.org/src/commit/?id=4b22ce07306243d6641c93efcf315a787dd0876c JN
Re: aarch64 main-n263493-4e8d558c9d1c-dirty (so: 2023-Jun-10) Kyuafile run: "Fatal data abort" crash during vnet_register_sysinit
> On Jun 24, 2023, at 13:00, Mark Millard wrote: > > The running system build is a non-debug build (but > with symbols not stripped). > > The HoneyComb's console log shows: > > . . . > GEOM_STRIPE: Device stripe.IMfBZr destroyed. > GEOM_NOP: Device md0.nop created. > g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5 > GEOM_NOP: Device md0.nop removed. > GEOM_NOP: Device md0.nop created. > g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5 > g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5 > GEOM_NOP: Device md0.nop removed. > GEOM_NOP: Device md0.nop created. > GEOM_NOP: Device md0.nop removed. > Fatal data abort: > x0: a02506e64400 > x1: 0001ea401880 (g_raid3_post_sync + 3a145f8) > x2: 4b > x3: a343932b0b22fb30 > x4:0 > x5: 3310b0d062d0e1d > x6: 1d0e2d060d0b3103 > x7:0 > x8: ea325df8 > x9: 0001eec946d0 ($d.6 + 0) > x10: 0001ea401880 (g_raid3_post_sync + 3a145f8) > x11:0 > x12:0 > x13: 00cd8960 (lock_class_mtx_sleep + 0) > x14:0 > x15: a02506e64405 > x16: 0001eec94860 (_DYNAMIC + 160) > x17: 0063a450 (ifc_attach_cloner + 0) > x18: 0001eb290400 (g_raid3_post_sync + 48a3178) > x19: 0001eec94600 (vnet_epair_init_vnet_init + 0) > x20: 00fa5b68 (vnet_sysinit_sxlock + 18) > x21: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0) > x22: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0) > x23: a042e500 > x24: a042e500 > x25: 00ce0788 (linker_lookup_set_desc + 0) > x26: a0203cdef780 > x27: 0001eec94698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0) > x28: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0) > x29: 0001eb290430 (g_raid3_post_sync + 48a31a8) > sp: 0001eb290400 > lr: 0001eec82a4c ($x.1 + 3c) > elr: 0001eec82a60 ($x.1 + 50) > spsr: 6045 > far: 0002d8fba4c8 > esr: 9646 > panic: vm_fault failed: 0001eec82a60 error 1 > cpuid = 14 > time = 1687625470 > KDB: stack backtrace: > db_trace_self() at db_trace_self > db_trace_self_wrapper() at db_trace_self_wrapper+0x30 > vpanic() at vpanic+0x13c > panic() at panic+0x44 > data_abort() at 
data_abort+0x2fc > handle_el1h_sync() at handle_el1h_sync+0x14 > --- exception, esr 0x9646 > $x.1() at $x.1+0x50 > vnet_register_sysinit() at vnet_register_sysinit+0x114 > linker_load_module() at linker_load_module+0xae4 > kern_kldload() at kern_kldload+0xfc > sys_kldload() at sys_kldload+0x60 > do_el0_sync() at do_el0_sync+0x608 > handle_el0_sync() at handle_el0_sync+0x44 > --- exception, esr 0x5600 > KDB: enter: panic > [ thread pid 70419 tid 101003 ] > Stopped at kdb_enter+0x44: str xzr, [x19, #3200] > db> The failure appears to be initializing module if_epair. I see no recent changes in that module that would be likely to break initialization. a9bfd080d09a if_epair: do not transmit packets that exceed the interface MTU 4d846d260e2b spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD a6b55ee6be15 net: replace IFF_KNOWSEPOCH with IFF_NEEDSEPOCH c69ae8419734 if_epair: also remove vlan metadata from mbufs 29c9b1673305 epair: Remove unneeded includes and sort some of the rest
Re: Support for more than 256 CPU cores
On 5/5/23 6:38 AM, Ed Maste wrote: FreeBSD supports up to 256 CPU cores in the default kernel configuration (on Tier-1 architectures). Systems with more than 256 cores are available now, and will become increasingly common over FreeBSD 14’s lifetime. The FreeBSD Foundation is supporting the effort to increase MAXCPU, and PR269572[1] is open to track tasks and changes. As a project we have scalability work ahead of us to make best use of high core count machines, but at a minimum we should be able to boot a GENERIC kernel on such systems, and have an ABI for the FreeBSD 14 release that supports such a configuration. Some changes have already been committed in support of increased MAXCPU, including increasing MAX_APIC_ID (commit c8113dad7ed4) and a number of changes to reduce bloat (such as commits 42f722e721cd, e72f7ed43eef, 78cfa762ebf2 and 74ac712f72cf). The next step is to increase the maximum cpuset size for userland. I have this change open in review D39941[2] and an exp-run request in PR271213[3]. Following that the kernel change for increasing MAXCPU is in D36838[4]. Additional work on bloat reduction will continue after this change, and looking forward FreeBSD is going to need ongoing effort from the community and the FreeBSD Foundation to continue improving scalability. [1] https://bugs.freebsd.org/269572 [2] https://reviews.freebsd.org/D39941 [3] https://bugs.freebsd.org/271213 [4] https://reviews.freebsd.org/D36838 FWIW, I think it will be useful for main to run with a larger userspace MAXCPU than kernel for at least a while so that we have better testing of that configuration and to give headroom for bumping MAXCPU in the kernel during the 14.x branch. The only other viable path I think which would be more work would be to rework cpuset_t in userspace to always use a dynamically sized mask. 
This could perhaps be done in an API-preserving manner by making cpuset_t an opaque wrapper type in userland and requiring CPU_* to indirect to functions in libc, etc. That's a fair bit more work however. -- John Baldwin
Re: morse(6) sound
Nuno Teixeira wrote this message on Fri, Oct 28, 2022 at 19:36 +0100: > Is there any way to get sound from morse(6) without speaker(4) device? I mean, I guess you could use sox (play command) and sed to make the audio.. morse -s converts it to . and -'s, so then you convert each one of those to a frequency and necessary delay. -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
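The idea above can be sketched in plain sh. This is only an illustration under stated assumptions: the function names are made up, it walks the string in a shell loop rather than using sed, it assumes the input contains only '.', '-' and whitespace (as produced by morse -s), it treats any whitespace as a three-unit gap, and it relies on sox's play command being installed. The 0.08 second unit and 700 Hz tone are arbitrary choices.

```sh
# Hypothetical helpers; neither is part of morse(6) or sox.

# Turn a string of '.' and '-' into "tone N" / "gap N" lines,
# where N is a length in dot units (dot = 1, dash = 3).
morse_plan() {
    s=$1
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        case $c in
        .) echo "tone 1" ;;
        -) echo "tone 3" ;;
        *) echo "gap 3" ;;   # whitespace between letters (assumed spacing)
        esac
        s=$rest
    done
}

# Play the plan through sox. Each tone is followed by one unit of silence
# so that consecutive dots and dashes stay distinguishable.
morse_play() {
    unit=0.08    # seconds per dot unit (arbitrary)
    freq=700     # tone frequency in Hz (arbitrary)
    morse_plan "$1" | while read -r kind n; do
        secs=$(awk -v n="$n" -v u="$unit" 'BEGIN { printf "%.2f", n * u }')
        case $kind in
        tone) play -q -n synth "$secs" sine "$freq" </dev/null
              sleep "$unit" ;;
        gap)  sleep "$secs" ;;
        esac
    done
}
```

A call would look like `morse_play "$(morse -s sos)"`; the timing is rough because starting play for every symbol adds its own delay.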
Re: How to Enable support for IPsec deprecated algorithms: 3DES, MD5-HMAC
On 10/4/22 1:53 AM, alfadev wrote: Hi, i am trying to move my gateway from FreeBSD 11.0 to FreeBSD 14.0 to use newly added ipfw table lookup for mac addresses (https://reviews.freebsd.org/D35103) Also I have too many IPSec connections between fortigate, cisco etc. And their operators use only 3DES algorithms and they have no intention to change it for me. So, now i have to enable 3DES support for FreeBSD 14.0 . To add 3DES support again i changed some files shown below. I am not sure what i did any help welcomes. You do not want to just restore the files as-is. You instead want to revert some of the diffs from the first commit. The second commit for /dev/crypto doesn't matter for IPsec and you can ignore it. However, you will need to also partially revert commit 0e00c709d7f1cdaeb584d244df9534bcdd0ac527 which removes DES and 3DES from OCF itself. This is what removed enc_xform_des for example. -- John Baldwin
Re: pkg: Newer FreeBSD version for package... but why?
On 7/13/22 3:17 AM, Andriy Gapon wrote: On 2022-07-13 13:09, Michael Gmelin wrote: On Wed, 13 Jul 2022 10:29:06 +0300 Andriy Gapon wrote:

# uname -U
1400063
# uname -K
1400063
# pkg upgrade
Updating FreeBSD repository catalogue...
Fetching packagesite.pkg: 100%    5 MiB   4.8MB/s    00:01
Processing entries:   0%
Newer FreeBSD version for package zyre:
To ignore this error set IGNORE_OSVERSION=yes
- package: 1400063
- running kernel: 1400051
Ignore the mismatch and continue? [y/N]:

Does anyone know why this would happen? Where does pkg get its notion of the running kernel version?

If I'm reading the sources correctly, it's determining the OS version by looking at the elf headers of various files in this order:

getenv("ABI_FILE")
/usr/bin/uname
/bin/sh

So I would assume that `file /usr/bin/uname` shows 1400051 on your system.

Thank you very much! That's it:

# file /usr/bin/uname
/usr/bin/uname: ELF 32-bit LSB executable, ARM, EABI5 version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, FreeBSD-style, for FreeBSD 14.0 (1400051), stripped

You can point it to checking another file by setting ABI_FILE[0] in the environment or ignore the check by setting IGNORE_OSVERSION (like advised). The "running kernel:" label seems a bit misleading.

Indeed. Now the next thing (for me) to research is why the binaries were built "for FreeBSD 14.0 (1400051)" when the source tree has 1400063 and uname -U also reports 1400063. FWIW, this was a cross-build, maybe that played a role too.

If you do a NO_CLEAN=yes build, we don't relink binaries just because crt*.o changed (where the note is stored). -- John Baldwin
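The comparison made by hand in this thread can be scripted. The following is a rough sketch rather than anything pkg itself does: the function names are hypothetical, and the sed pattern assumes file(1) words the note as "for FreeBSD X.Y (NNNNNNN)", as in the output quoted above.

```sh
# Hypothetical helpers for comparing a binary's ELF note with userland.

# Pull NNNNNNN out of "for FreeBSD X.Y (NNNNNNN)" in file(1) output
# read from stdin. Prints nothing if the text is not found.
elf_osversion() {
    sed -n 's/.*for FreeBSD [0-9.]* (\([0-9][0-9]*\)).*/\1/p'
}

# $1 = file(1) output for the binary, $2 = output of `uname -U`.
report_abi() {
    built=$(printf '%s\n' "$1" | elf_osversion)
    if [ "$built" = "$2" ]; then
        echo "match: $built"
    else
        echo "mismatch: binary=$built userland=$2"
    fi
}
```

On the affected system this would be run as `report_abi "$(file /usr/bin/uname)" "$(uname -U)"`. A mismatch here points at stale binaries, such as those left by a NO_CLEAN=yes build, rather than at the kernel.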
Re: BLAKE3 unstability?
On 7/12/22 1:41 AM, Evgeniy Khramtsov wrote: I can reproduce via: $ truncate -s 10G /tmp/test $ mdconfig -f /tmp/test -S 4096 $ zpool create test /dev/md1 $ zfs create -o checksum=blake3 test/b $ dd if=/dev/random of=/test/b/noise bs=1M count=4096 $ sync $ zpool scrub test $ zpool status I cannot reproduce this on openzfs/zfs@cb01da68057 (the commit that was most recently merged) built out of tree on either stable/13 70fd40edb86 or main 9aa02d5120a. I'll update a system and see if I can reproduce it with the in-tree ZFS. - Ryan It did not reproduce for me with in-tree ZFS on main@3c9ad9398fcd either. Could you share sysctl kstat.zfs.misc.chksum_bench, maybe we are using different implementations? I do see that blake3 went in with only a Linux module parameter for the implementation selection, so I'll have to fix that. For now we can at least see which was fastest, which should be the one selected. You just won't be able to manually change it to see if that helps. - Ryan I found the culprit (kernel and base from download.FreeBSD.org kernel.txz and base.txz respectively) (I forgot about local sysctl.conf...): kern.sched.steal_thresh=1 kern.sched.preempt_thresh=121 Then #!/bin/sh truncate -s 10G /tmp/test mdconfig -f /tmp/test -S 4096 zpool create test /dev/md0 zfs create -o checksum=blake3 test/b dd if=/dev/random of=/test/b/noise bs=1M count=4096 sync zpool scrub test sleep 3 zpool status zpool destroy test mdconfig -d -u 0 rm /tmp/test As for ULE "tuning", these values give me fine desktop interactivity when building lang/rust when nice and idprio did not help, so I left them in sysctl.conf. Not sure if scheduling parameters are worthy of a ZFS PR, maybe something essential is preempted. It could be missing fpu_kern_enter/leave that lack of preemption would cover over. I thought that missing that would give a panic in the kernel though due to FPU instructions being disabled (including vector instructions). 
Maybe ZFS isn't using fpu_kern_enter() with FPU_KERN_NOCTX and is instead trying to juggle contexts, and it has a bug in how it manages saved FPU contexts and reuses a context? If so, I would just suggest that ZFS switch to using FPU_KERN_NOCTX instead, which runs all SSE type code in a critical section to disable preemption but avoids having to allocate and manage FPU contexts. -- John Baldwin
Re: Accessibility in the FreeBSD installer and console
On Thu, Jul 07, 2022 at 10:11:52PM +0200, Klaus Küchemann wrote: > > Am 07.07.2022 um 19:32 schrieb Hans Petter Selasky : > > The only argument I've heard from some non-sighted friends about not using > > FreeBSD natively is that ooh, MacOSX is so cool. It starts speaking from > > the start if I press this and this key. Is anyone here working on or > > wanting such a feature? > > Possibly they didn’t want to be rude and your friends didn't tell you the > other argument :-) : according to the corresponding wiki page FreeBSD > doesn't natively support any audio output at all on your friends current M1 > Mac hardware. > since quite nothing is currently supported you probably will first take over > working on the Audio driver …..and of course USB :-) I think a huge benefit that Apple would have is that they might be able to guarantee some sort of audio speaker, period, since they control the hardware that the software runs on. That might be a big ask on FreeBSD, but maybe if there was some relatively ubiquitous assistance hardware, maybe doable. But text-to-speech (and then WHAT language's speech) is a big software chunk, audio layers seems large, and then having to worry about the potential driver issues (while not being able to see-to-hear any potential setup issues) seems huge. Everybody seems happy farming that out to the internet, except on system setup you're not connected to the internet yet. Plus Apple has some deep hooks into the app-stack since you're basically using their toolkit to make a graphical app, so they can guarantee some potential for GUI-textbox-speech, where FreeBSD has a hodgepodge of graphical toolkits (KDE, GTK, Gnome, etc).
Re: Posting Netiquette [ref: Threads "look definitely like" unreadable mess. Handbook project.]
Greg 'groggy' Lehey wrote this message on Thu, Jun 23, 2022 at 16:33 +1000: > Does anybody have an opinion on character set recommendations? I > think we should ask for UTF-8 if at all possible. I don't think there's any need for a recommendation. All [modern] MUA should tag the post appropriately and each MUA be able to convert as needed between them. -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
Re: Profiled libraries on freebsd-current
On 5/4/22 1:38 PM, Steve Kargl wrote: On Wed, May 04, 2022 at 01:22:57PM -0700, John Baldwin wrote: On 5/4/22 12:53 PM, Steve Kargl wrote: On Wed, May 04, 2022 at 11:12:55AM -0700, John Baldwin wrote: I don't know the entire FreeBSD ecosystem. Do people use FreeBSD on embedded systems (e.g., nanobsd) where libthr may be stripped out? Thus, --enable-threads=no is needed. If they do, they are also using a constrained userland and probably are not shipping a GCC binary either. However, it's not clear to me what --enable-threads means. Does this enable -pthread as an option? If so, that should definitely just always be on. It's still an option users have to opt into via a command line flag and doesn't prevent building non-threaded programs. If it's enabling use of threads at runtime within GCC itself, I'd say that also should probably just be allowed to be on. I can't really imagine what else it might mean (and I doubt it means the latter). AFAICT, it controls whether -lpthread is automatically added to the command line. In the case of -pg, it is -lpthread_p. The relevant lines are

#ifdef FBSD_NO_THREADS
#define FBSD_LIB_SPEC "\
  %{pthread: %eThe -pthread option is only supported on FreeBSD when gcc \
is built with the --enable-threads configure-time option.} \
  %{!shared: \
    %{!pg: -lc} \
    %{pg: -lc_p} \
  }"
#else
#define FBSD_LIB_SPEC "\
  %{!shared: \
    %{!pg: %{pthread:-lpthread} -lc} \
    %{pg: %{pthread:-lpthread_p} -lc_p} \
  }\
  %{shared:\
    %{pthread:-lpthread} -lc \
  }"
#endif

Ed is wondering if one can get rid of FBSD_NO_THREADS. With the pending removal of WITH_PROFILE, the above reduces to

#define FBSD_LIB_SPEC " \
  %{!shared:\
    %{pthread:-lpthread} -lc\
  } \
  %{shared: \
    %{pthread:-lpthread} -lc\
  }"

If one can do the above, then freebsd-nthr.h is no longer needed and can be deleted and config.gcc's handling of --enable-threads can be updated/removed.

Ok, so it's just if -pthread is supported (%{pthread:-lpthread} only adds -lpthread if -pthread was given on the command line). That can just be on all the time and Ed is correct that it is safe to remove the FBSD_NO_THREADS case and assume it is always present instead. -- John Baldwin
Re: Profiled libraries on freebsd-current
On 5/4/22 12:53 PM, Steve Kargl wrote: On Wed, May 04, 2022 at 11:12:55AM -0700, John Baldwin wrote: On 5/2/22 10:37 AM, Steve Kargl wrote: On Mon, May 02, 2022 at 12:32:25PM -0400, Ed Maste wrote: On Sun, 1 May 2022 at 11:54, Steve Kargl wrote: diff --git a/gcc/config/freebsd-spec.h b/gcc/config/freebsd-spec.h index 594487829b5..1e8ab2e1827 100644 --- a/gcc/config/freebsd-spec.h +++ b/gcc/config/freebsd-spec.h @@ -93,14 +93,22 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see (similar to the default, except no -lg, and no -p). */ #ifdef FBSD_NO_THREADS I wonder if we can simplify things now, and remove this `FBSD_NO_THREADS` case. I didn't see anything similar in other GCC targets I looked at. That I don't know. FBSD_NO_THREADS is defined in freebsd-nthr.h. In fact, it's the only thing in that header (except copyright boilerplate). freebsd-nthr.h only appears in config.gcc and seems to only get added to the build if someone runs configure with --enable-threads=no. Looking at my last config.log for gcc trunk, I see "Thread model: posix", which appears to be the default case or if someone does --enable-threads=yes or --enable-threads=posix. So, I suppose it comes down to two questions: (1) is libpthread.* available on all supported targets and versions? (2) does anyone build gcc without threads support? libpthread is available on all supported architectures on all supported versions. libthr has been the default threading library since 7.0 and the only supported library since 8.0. In GDB I just assume libthr style threads, and I think GCC can safely do the same. I don't know the entire FreeBSD ecosystem. Do people use FreeBSD on embedded systems (e.g., nanobsd) where libthr may be stripped out? Thus, --enable-threads=no is needed. If they do, they are also using a constrained userland and probably are not shipping a GCC binary either. However, it's not clear to me what --enable-threads means. Does this enable -pthread as an option? 
If so, that should definitely just always be on. It's still an option users have to opt into via a command line flag and doesn't prevent building non-threaded programs. If it's enabling use of threads at runtime within GCC itself, I'd say that also should probably just be allowed to be on. I can't really imagine what else it might mean (and I doubt it means the latter). -- John Baldwin
Re: Profiled libraries on freebsd-current
On 5/2/22 10:37 AM, Steve Kargl wrote: On Mon, May 02, 2022 at 12:32:25PM -0400, Ed Maste wrote: On Sun, 1 May 2022 at 11:54, Steve Kargl wrote: diff --git a/gcc/config/freebsd-spec.h b/gcc/config/freebsd-spec.h index 594487829b5..1e8ab2e1827 100644 --- a/gcc/config/freebsd-spec.h +++ b/gcc/config/freebsd-spec.h @@ -93,14 +93,22 @@ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see (similar to the default, except no -lg, and no -p). */ #ifdef FBSD_NO_THREADS I wonder if we can simplify things now, and remove this `FBSD_NO_THREADS` case. I didn't see anything similar in other GCC targets I looked at. That I don't know. FBSD_NO_THREADS is defined in freebsd-nthr.h. In fact, it's the only thing in that header (except copyright boilerplate). freebsd-nthr.h only appears in config.gcc and seems to only get added to the build if someone runs configure with --enable-threads=no. Looking at my last config.log for gcc trunk, I see "Thread model: posix", which appears to be the default case or if someone does --enable-threads=yes or --enable-threads=posix. So, I suppose it comes down to two questions: (1) is libpthread.* available on all supported targets and versions? (2) does anyone build gcc without threads support? libpthread is available on all supported architectures on all supported versions. libthr has been the default threading library since 7.0 and the only supported library since 8.0. In GDB I just assume libthr style threads, and I think GCC can safely do the same. -- John Baldwin
Re: 'set but unused' breaks drm-*-kmod
On 4/21/22 6:45 AM, Emmanuel Vadot wrote: On Thu, 21 Apr 2022 08:51:26 -0400 Michael Butler wrote: On 4/21/22 03:42, Emmanuel Vadot wrote: Hello Michael, On Wed, 20 Apr 2022 23:39:12 -0400 Michael Butler wrote: Seems this new requirement breaks kmod builds too .. The first of many errors was (I stopped chasing them all for lack of time) .. --- amdgpu_cs.o --- /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.7.19_3/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c:1210:26: error: variable 'priority' set but not used [-Werror,-Wunused-but-set-variable] enum drm_sched_priority priority; ^ 1 error generated. *** [amdgpu_cs.o] Error code 1 How are you building the port, directly or with PORTS_MODULES ? I do make passes on the warning for drm and I did for set-but-not-used case but unfortunately this option doesn't exist in 13.0 so I couldn't apply those in every branch. I build this directly on -current. I'm guessing that these are what triggered this behaviour: commit 8b83d7e0ee54416b0ee58bd85f9c0ae7fb3357a1 Author: John Baldwin Date: Mon Apr 18 16:06:27 2022 -0700 Make -Wunused-but-set-variable a fatal error for clang 13+ for kernel builds. Reviewed by: imp, emaste Differential Revision: https://reviews.freebsd.org/D34949 commit 615d289ffefe2b175f80caa9b1e113c975576472 Author: John Baldwin Date: Mon Apr 18 16:06:14 2022 -0700 Re-enable set but not used warnings for kernel builds. make tinderbox now passes with this warning enabled as a fatal error, so revert the change to hide it in preparation for making it fatal. This reverts commit e8e691983bb75e80153b802f47733f1531615fa2. Reviewed by: imp, emaste Differential Revision: https://reviews.freebsd.org/D34948 Ok I see, I won't have time until monday (maybe tuesday to fix this) but if someone wants to beat me to it we should add some new CWARNFLAGS for each problematic file in the 5.4-lts and 5.7-stable branches of drm-kmod (master which is following 5.10 is already good) only if ${COMPILER_VERSION} >= 13. 
There is already a helper you can use that deals with compiler versions: CWARNFLAGS+= ${NO_WUNUSED_BUT_SET_VARIABLE} or some such. -- John Baldwin
Can't build with INVARIANTS but not WITNESS
My -CURRENT kernel has INVARIANTS (inherited from GENERIC) but not WITNESS:

include GENERIC
ident STRIATUS
nooptions WITNESS
nooptions WITNESS_SKIPSPIN

My kernel build fails:

/usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:102:13: error: variable 'line' set but not used [-Werror,-Wunused-but-set-variable]
int flags, line __diagused;
^
/usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:101:14: error: variable 'file' set but not used [-Werror,-Wunused-but-set-variable]
const char *file __diagused;

The problem is, __diagused expands to nothing if INVARIANTS _or_ WITNESS is defined, but the variable in vfs_lookup.c is only used if WITNESS is defined.

#if defined(INVARIANTS) || defined(WITNESS)
#define __diagused
#else
#define __diagused __unused
#endif

I think this code is trying to be too clever and causing more trouble than it prevents. Change the || to &&, or replace __diagused with __unused everywhere.
Re: ktrace on NFSroot failing?
On 3/10/22 8:14 AM, Mateusz Guzik wrote: On 3/10/22, Bjoern A. Zeeb wrote: Hi, I am having a weird issue with ktrace on an nfsroot machine: root:/tmp # ktrace sleep 1 root:/tmp # kdump -559038242 Events dropped. kdump: bogus length 0xdeadc0de Anyone seen something like this before? I just did a quick check and it definitely fails on nfs mounts: # ktrace pwd /root/mjg # kdump -559038242 Events dropped. kdump: bogus length 0xdeadc0de I don't have time to look into it this week though. Possibly related: core dumps are no longer working for me on NFS mounts. I get a 0 byte foo.core instead of a valid core dump. -- John Baldwin
Re: Buildworld fails with external GCC toolchain
On 2/12/22 11:34 AM, Yasuhiro Kimura wrote: From: Dimitry Andric Subject: Re: Buildworld fails with external GCC toolchain Date: Fri, 11 Feb 2022 22:53:44 +0100 Not really, the gcc 9 build has been broken for months, as far as I know. See also: https://ci.freebsd.org/job/FreeBSD-main-amd64-gcc9_build/ The last build(s) show a different error from yours, though: /workspace/src/tests/sys/netinet/libalias/util.c: In function 'set_udp': /workspace/src/tests/sys/netinet/libalias/util.c:112:2: error: converting a packed 'struct ip' pointer (alignment 2) to a 'uint32_t' {aka 'unsigned int'} pointer (alignment 4) may result in an unaligned pointer value [-Werror=address-of-packed-member] 112 | uint32_t *up = (void *)p; | ^~~~ In file included from /workspace/src/tests/sys/netinet/libalias/util.h:37, from /workspace/src/tests/sys/netinet/libalias/util.c:39: /workspace/src/sys/netinet/ip.h:51:8: note: defined here 51 | struct ip { |^~ -Dimitry Thanks for information. I went back the commit history of main branch about every month and check if buildworld succeeds with GCC. But it didn't succeed even if I went back about a year. And devel/binutils port was update to 2.37 on last August. So I suspect external GCC toolchain doesn't work well after binutils is updated to current version. I have amd64 world + kernel building with GCC 9 and the only remaining open review not merged yet is https://reviews.freebsd.org/D34147. It is work to keep it working though and I hadn't worked on it again until recently. -- John Baldwin
Re: Dragonfly Mail Agent (dma) in the base system
On 1/27/22 1:34 PM, Ed Maste wrote: The Dragonfly Mail Agent (dma) is a small Mail Transport Agent (MTA) which accepts mail from a local Mail User Agent (MUA) and delivers it locally or to a smarthost for delivery. dma does not accept inbound mail (i.e., it does not listen on port 25) and is not intended to provide the same functionality as a full MTA like postfix or sendmail. It is intended for use cases such as delivering cron(8) mail. Since 2014 we have a copy of dma in the base system available as an optional component, enabled via the WITH_DMAGENT src.conf knob. I am interested in determining whether dma is a viable minimal base system MTA, and if not what gaps remain. If you have enabled DMA on your systems (or are willing to give it a try) and have any feedback or are aware of issues please follow up or submit a PR as appropriate. I've used DMA on systems without local mail accounts to forward cron periodic e-mails just fine. It even supports STARTTLS and SMTP AUTH. I haven't tried using it for simple local delivery to /var/mail/root. -- John Baldwin
Re: git: 5e6a2d6eb220 - main - Reapply: move libc++ from /usr/lib to /lib [add /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 ?]
On 1/1/22 9:00 AM, Ed Maste wrote: On Fri, 31 Dec 2021 at 18:04, John Baldwin wrote: However, your point about libcxxrt.so.1 is valid. It needs to also be moved to /lib if libc++.so.1 is moved to /lib. libcxxrt.so.1 has always been in /lib. Oh, I was thrown off by the .so indirection for libcxxrt in the linker script. -- John Baldwin
Re: git: 5e6a2d6eb220 - main - Reapply: move libc++ from /usr/lib to /lib [add /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 ?]
On 12/31/21 2:59 PM, Mark Millard wrote: On 2021-Dec-31, at 14:28, Mark Millard wrote: On 2021-Dec-30, at 14:04, John Baldwin wrote: On 12/30/21 1:09 PM, Mark Millard wrote: On 2021-Dec-30, at 13:05, Mark Millard wrote: This asks a question in a different direction that my prior reports about my builds vs. Cy's reported build. Background: /usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libc++.so:GROUP ( /lib/libc++.so.1 /usr/lib/libcxxrt.so and: lrwxr-xr-x 1 root wheel23 Dec 29 13:17:01 2021 /usr/lib/libcxxrt.so -> ../../lib/libcxxrt.so.1 Why did libc++.so.1 not get a: /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 I forgot to remove the .1 on the left hand side: /usr/lib/libc++.so -> ../../lib/libc++.so.1 Because for libc++.so we don't just symlink to the current version of the library (as we do for most other shared libraries) to tell the compiler what to link against for -lc++, instead we use a linker script that tells the compiler to link against both of those libraries when -lc++ is encountered. A better identification of what looks odd to me is the path variations in: # more /usr/lib/libc++.so Another not great day on my part: That path alone makes the mix of /lib/ and /usr/lib/ use involved, given the reference to /lib/libc++.so.1 . That would still be true if the other path had been /lib/libcxxrt.so . /usr/lib/libc++.so is only used by the compiler/linker when linking a binary. The resulting binary has the associated paths (/lib/libc++.so.1 and /usr/lib/libcxxrt.so.1) in its DT_NEEDED. So it is fine for the .so to be in /usr/lib. This is the same with /usr/lib/libc.so vs /lib/libc.so.7. However, your point about libcxxrt.so.1 is valid. It needs to also be moved to /lib if libc++.so.1 is moved to /lib. Doing so will also require yet another depend-clean.sh fixup (well, probably just adjusting the one I added to check the libcxxrt path instead of libc++ path). -- John Baldwin
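To make the symlink versus linker script distinction discussed above concrete, here is a rough sketch of a helper that reports which kind of file a given .so path is. The name so_kind is hypothetical, and the detection rules (a leading GROUP or INPUT command for linker scripts, the ELF magic for real objects) are simplifying assumptions, not anything the build system itself does.

```sh
#!/bin/sh
# Hypothetical helper: classify a .so path as a symlink, a real ELF
# object, or a text linker script such as "GROUP ( ... )".
so_kind() {
	if [ -L "$1" ]; then
		echo "symlink"
	# Bytes 1-3 of an ELF file spell "ELF" (byte 0 is 0x7f).
	elif [ "$(dd if="$1" bs=1 skip=1 count=3 2>/dev/null)" = "ELF" ]; then
		echo "elf"
	# Assumption: a linker script has a line starting with GROUP( or INPUT(.
	elif grep -Eq '^[[:space:]]*(GROUP|INPUT)[[:space:]]*\(' "$1" 2>/dev/null; then
		echo "linker-script"
	else
		echo "unknown"
	fi
}
```

On a system laid out as described in this thread, one would expect `so_kind /usr/lib/libc++.so` to report a linker script and `so_kind /usr/lib/libcxxrt.so` to report a symlink, though that depends on the installed tree.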
Re: git: 5e6a2d6eb220 - main - Reapply: move libc++ from /usr/lib to /lib [add /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 ?]
On 12/30/21 1:09 PM, Mark Millard wrote: On 2021-Dec-30, at 13:05, Mark Millard wrote: This asks a question in a different direction that my prior reports about my builds vs. Cy's reported build. Background: /usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/tmp/usr/lib/libc++.so:GROUP ( /lib/libc++.so.1 /usr/lib/libcxxrt.so and: lrwxr-xr-x 1 root wheel23 Dec 29 13:17:01 2021 /usr/lib/libcxxrt.so -> ../../lib/libcxxrt.so.1 Why did libc++.so.1 not get a: /usr/lib/libc++.so.1 -> ../../lib/libc++.so.1 I forgot to remove the .1 on the left hand side: /usr/lib/libc++.so -> ../../lib/libc++.so.1 Because for libc++.so we don't just symlink to the current version of the library (as we do for most other shared libraries) to tell the compiler what to link against for -lc++, instead we use a linker script that tells the compiler to link against both of those libraries when -lc++ is encountered. I have finally reproduced Cy's build error locally and am testing my fix. If it works I'll commit it. -- John Baldwin
Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore
On 12/14/21 9:40 AM, Gleb Smirnoff wrote: On Tue, Dec 14, 2021 at 09:28:07AM -0800, John Baldwin wrote: J> > AFAIK, today it will always panic only with WITNESS. Without WITNESS it would J> > pass through mtx_lock as long as the mutex is not locked. J> J> Yes, but the default kernel on head is GENERIC which has witness enabled, hence J> the out of the box kernel panics reliably. :) J> J> > So, do you suggest to push D33340 before finalizing D9? J> J> Yes, I think so. Pushed. And I plan to post new version of D9 today. Thanks! -- John Baldwin
Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore
On 12/13/21 12:25 PM, Gleb Smirnoff wrote: On Mon, Dec 13, 2021 at 11:56:35AM -0800, John Baldwin wrote: J> > J> So there are two things here. The root issue is that the devel/apr1 port J> > J> runs a configure test for TCP_NDELAY being inherited by accepted sockets. J> > J> This test panics because prison_check_ip4() tries to lock a prison mutex J> > J> to walk the IPs assigned to a jail, but the caller (in_pcblookup_hash()) has J> > J> done an smr_enter() which is a critical_enter(): J> > J> > The first one is known, and I got a patch to fix it: J> > J> > https://reviews.freebsd.org/D33340 J> > J> > However, a pre-requisite to this simple patch is more complex: J> > J> > https://reviews.freebsd.org/D9 J> > J> > There is some discussion on how to improve that, and I decided to do that J> > rather than stick to original version. So I takes a few extra days. J> > J> > We could push D33340 into main, if the negative effects (raciness of J> > the prison check) is considered lesser evil then potentially contested J> > mtx_lock in smr section. J> J> I think raciness is probably better than always panicking as it does today. AFAIK, today it will always panic only with WITNESS. Without WITNESS it would pass through mtx_lock as long as the mutex is not locked. Yes, but the default kernel on head is GENERIC which has witness enabled, hence the out of the box kernel panics reliably. :) So, do you suggest to push D33340 before finalizing D9? Yes, I think so. -- John Baldwin
Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore
On 12/14/21 2:14 AM, Alexey Dokuchaev wrote: How do you mean? Most FreeBSD people, not some random Twitter crowd, want the bell to be on by default, but it's still off. I don't know that that's true, and I myself am not sure that I want it back on by default. Previously my laptop had a rather annoying beep whose volume I couldn't control that I actually prefer to have off normally. On further reflection, the beep I was looking for for bad input may actually be an xscreensaver thing for an invalid character to unlock the screen vs a sysbeep anyway. -- John Baldwin
Re: RFC: What to do about Allocate in the NFS server for FreeBSD13?
On 12/13/21 8:30 AM, Konstantin Belousov wrote: On Mon, Dec 13, 2021 at 04:26:42PM +, Rick Macklem wrote: Hi, There are two problems with Allocate in the NFSv4.2 server in FreeBSD13:
1 - It uses the thread credentials instead of the ones for the RPC.
2 - It does not ensure that file changes are committed to stable storage.
These problems are fixed by commit f0c9847a6c47 in main, which added ioflag and cred arguments to VOP_ALLOCATE(). I can think of 3 ways to fix Allocate in FreeBSD13:
1 - Apply a *hackish* patch like this:

+	savcred = p->td_ucred;
+	p->td_ucred = cred;
	do {
		olen = len;
		error = VOP_ALLOCATE(vp, &off, &len);
		if (error == 0 && len > 0 && olen > len)
			maybe_yield();
	} while (error == 0 && len > 0 && olen > len);
+	p->td_ucred = savcred;
	if (error == 0 && len > 0)
		error = NFSERR_IO;
+	if (error == 0)
+		error = VOP_FSYNC(vp, MNT_WAIT, p);

The worst part of it is temporarily setting td_ucred to cred.
2 - MFC'ng commit f0c9847a6c47. Normally changes to the VOP/VFS are not MFC'd. However, in this case, it might be ok to do so, since it is unlikely there is an out of source tree file system with a custom VOP_ALLOCATE() method?

I do not see much wrong with #2, this is what I would do myself.

I also think this is fine. -- John Baldwin
Re: smr inp breaks some jail use cases and panics with i915kms don't switch to the console anymore
On 12/13/21 9:35 AM, Gleb Smirnoff wrote: Hi John, On Mon, Dec 13, 2021 at 07:45:07AM -0800, John Baldwin wrote: J> So there are two things here. The root issue is that the devel/apr1 port J> runs a configure test for TCP_NDELAY being inherited by accepted sockets. J> This test panics because prison_check_ip4() tries to lock a prison mutex J> to walk the IPs assigned to a jail, but the caller (in_pcblookup_hash()) has J> done an smr_enter() which is a critical_enter(): The first one is known, and I got a patch to fix it: https://reviews.freebsd.org/D33340 However, a pre-requisite to this simple patch is more complex: https://reviews.freebsd.org/D9 There is some discussion on how to improve that, and I decided to do that rather than stick to original version. So I takes a few extra days. We could push D33340 into main, if the negative effects (raciness of the prison check) is considered lesser evil then potentially contested mtx_lock in smr section. I think raciness is probably better than always panicking as it does today. J> However, it was a bit harder to see this originally as the 915kms driver J> tries to do a malloc(M_WAITOK) from cn_grab() when entering DDB which J> recursively panics (even a malloc(M_NOWAIT) from cn_grab() is probably a J> bad idea). When it panicked in X the result was that the screen just froze J> on whatever it had most recently drawn and the machine looked hung. (The J> fact that that sysbeep is off so I couldn't tell if typing in commands was J> doing anything vs emitting errors probably didn't improve trying to diagnose J> the hang as "sitting in ddb" initially, though I don't know if DDB itself J> emits a beep for invalid commands, etc.) Didn't know about this one. Is this isolated to actually entering DDB or there is some path that in a normal inpcb lookup we would M_WAITOK? This is in the drm(4) driver, nothing to do with in_pcb, just made it harder to see the in_pcb issue. -- John Baldwin
Re: Make etcupdate bootstrap requirement due to previous mergemaster usage more clear in handbook
On 12/3/21 6:09 PM, Tomoaki AOKI wrote: On Fri, 03 Dec 2021 05:54:37 -0800 (PST) "Jeffrey Bouquet" wrote: On Fri, 3 Dec 2021 13:58:39 +0100, Miroslav Lachman <000.f...@quip.cz> wrote: On 03/12/2021 12:52, Yetoo Happy wrote: [...] Quick Start* and follow the instructions and get to step 7 and may think that even though etcupdate is different from mergemaster from the last time they used the handbook they have faith that following the instructions won't brick their system. This user will instead find that faith in general is just a very complex facade for the pain and suffering of not following *24.5.6.1 Merging Configuration Files* because the user doesn't know that step exists or relevant to the current step and ends up unknowingly having etcupdate append "<<<< yours ... >>>>> new" to the top of the user's very important configuration files that they didn't expect the program to actually modify that way when they resolved differences nor could they predict easily because the diff format is so unintuitive and different from mergemaster. Now unable to login or boot into single user mode because redirections instead of the actual configuration is parsed the user goes to the handbook to find out what might have happened and scrolls down to find *24.5.6.1 Merging Configuration Files* is under *24.5.6. [...] That's why I think etcupdate is not so intuitive as tool like this should be and etcupdate is extremely dangerous because it intentionally breaks syntax of files vital to have system up and running. If anything goes wrong with mergemaster automatic process then your have configuration not updated which is almost always fine to boot the system and fix it. But after etcupdate? Much worse... I maintain about 30 machines for 2 decades and had problems with etcupdate many times. I had ti use mergemaster as fall back many times. 
Mainly because of etcupdate said "Reference tree to diff against unavailable" or "No previous tree to compare against, a sane comparison is not possible.". And sometimes because etcupdate cannot automatically update many files in /etc/rc.d and manual merging of a lot of files with "<<<< >>>>" is realy painful while with mergemaster only simple keyboard shortcuts will solve it. All of this must be very stressful for beginners. So beside the update of documentation I really would like to see some changes to etcupdate workflow where files are modified in temporary location and moved to destination only if they do not contain any syntax breaking changes like <<<<, , >>>>. Kind regards Miroslav Lachman Agree. I fell back to mergemaster this Nov on 13-stable when the /var files pertaining to etcupdate were all missing current /etc data, and no study of man etcupdate was clear enough to rectify such a scenario, and suspect my initial use of etcupdate will or may require a planned reinstall, not having had to do so since Jan 2004 iirc, [ vs failed hard disk migrations ] and I am just hoping mergemaster stays in /usr/src and updated for system changes, even if moved to 'tools' or something, since its use seems intuitive and much less of a black box. Also, /usr/src/UPDATING still at the bottom emphasizes mergemaster still. Not sure it's fixed or not (tooo dangerous to try...), -n (dry-run) option of etcupdate is now quite harmful. Maybe by any commit done in this april on main (MFC'ed to stable/13 in june). *I got busy manually checking and applying changes to /etc, and forgot to file PR. Doing `etcupdate -n` itself runs OK, but following `etcupdate -B` does NOT do anything, hence nothing is actually updated. The only workaround I have is NOT to try dry-run. Humm. It would be because the same trees are used for dry-run and actual run. (Not looked into the code. Just a thought.) 
So the new changes always build a temporary tree (vs trying to build /var/db/etcupdate/current in place). For -n it should be that it just doesn't change /var/db/etcupdate/current at the end, but if it did the move anyway that would explain the bug you are seeing. That does indeed look broken. Please file a PR as a reminder for me to fix it. -- John Baldwin
Re: Make etcupdate bootstrap requirement due to previous mergemaster usage more clear in handbook
On 12/3/21 4:58 AM, Miroslav Lachman wrote: So beside the update of documentation I really would like to see some changes to etcupdate workflow where files are modified in temporary location and moved to destination only if they do not contain any syntax breaking changes like <<<<, , >>>>. This is what etcupdate does, so I'm a bit confused why you are getting merge markers in /etc. When an automated 3-way merge doesn't work due to conflicts, the file with the conflicts is saved in /var/db/etcupdate/conflicts/. It is only copied to /etc when you mark it as fully resolved when running 'etcupdate resolve'. Perhaps you had multiple conflicts in a modified file and when editing the file you only fixed the first one and then marked it as resolved at the prompt? Even in that case etcupdate explicitly prompts you a second time after you say "r" with "File still has conflicts, are you sure?", so it will only install a file to /etc with those changes if you have explicitly confirmed you want it. -- John Baldwin
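For anyone who wants an extra guard of their own on top of the prompts described above, a check for leftover merge markers can be sketched as follows. This is not etcupdate's code: the name has_conflict_markers is hypothetical, and the pattern assumes the seven-character markers that diff3/merge style tools usually emit.

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the file still contains
# merge conflict markers, fail otherwise.
# Assumption: markers look like "<<<<<<< label", "=======" and
# ">>>>>>> label" at the start of a line.
has_conflict_markers() {
	grep -Eq '^(<<<<<<< |=======$|>>>>>>> )' "$1"
}
```

A wrapper script could call this on each file before copying it into /etc and refuse to install anything for which it returns success.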
Re: failure of pructl (atexit/_Block_copy/--no-allow-shlib-undefined)
John-Mark Gurney wrote this message on Thu, Dec 02, 2021 at 15:43 -0800: > David Chisnall wrote this message on Thu, Dec 02, 2021 at 10:34 +: > > On 02/12/2021 09:51, Dimitry Andric wrote: > > > Apparently the "block runtime" is supposed to provide the actual object, > > > so I guess you have to explicitly link to that runtime? > > > > The block runtime provides this symbol. You use this libc API, you must > > be compiling with a toolchain that supports blocks and must be providing > > the blocks symbols. If you don't use `atexit_b` or any of the other > > `_b` APIs then you don't need to link the blocks runtime. > > > > I am not sure why this is causing linker failures - if it's a weak > > symbol and it's not defined then that's entirely expected: the point of > > a weak symbol is that it might not be defined. This avoids the need to > > link libc to the blocks runtime for code that doesn't use blocks (i.e. > > most code that doesn't come from macOS). > > > > This code is not using `atexit_b`, but because it is using `atexit` the > > linker is complaining that the compilation unit containing `atexit` is > > referring to a symbol that isn't defined. > > I assume that this failure was due to a recent llvm change, because I > haven't received any failures about pructl until Nov 16th, 2021, > despite the port and code being untouched since 2020-09-22. > > Digging in a bit more, it looks like libpru is compiled w/ -fblocks, > and so depending upon the _Block_copy symbol, the atexit is just the > "closest" symbol that's defined". pructl is not, but even compiling > pructl w/ -fblocks, doesn't fix the link error, as it looks like the > block runtime isn't linked. If I manually include > /usr/lib/libBlocksRuntime.so, then pructl is able to link. 
> > I can't seem to find any docs on clang about how to properly compile > code that uses blocks, so, unless someone points me to docs on how to > compile blocks enable programs, I'll just patch libpru to not use > blocks since it seems like blocks is well supported. I don't want > to fix this code every few years when things change. Thanks to some off-list comms, it appears that this was a regression in lld 13, and will be fixed by: https://reviews.llvm.org/D115041 Thanks to jrtc27 for [helping] tracking this down! -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
Re: failure of pructl (atexit/_Block_copy/--no-allow-shlib-undefined)
David Chisnall wrote this message on Thu, Dec 02, 2021 at 10:34 +: > On 02/12/2021 09:51, Dimitry Andric wrote: > > Apparently the "block runtime" is supposed to provide the actual object, > > so I guess you have to explicitly link to that runtime? > > The block runtime provides this symbol. You use this libc API, you must > be compiling with a toolchain that supports blocks and must be providing > the blocks symbols. If you don't use `atexit_b` or any of the other > `_b` APIs then you don't need to link the blocks runtime. > > I am not sure why this is causing linker failures - if it's a weak > symbol and it's not defined then that's entirely expected: the point of > a weak symbol is that it might not be defined. This avoids the need to > link libc to the blocks runtime for code that doesn't use blocks (i.e. > most code that doesn't come from macOS). > > This code is not using `atexit_b`, but because it is using `atexit` the > linker is complaining that the compilation unit containing `atexit` is > referring to a symbol that isn't defined. I assume that this failure was due to a recent llvm change, because I haven't received any failures about pructl until Nov 16th, 2021, despite the port and code being untouched since 2020-09-22. Digging in a bit more, it looks like libpru is compiled w/ -fblocks, and so depending upon the _Block_copy symbol, the atexit is just the "closest" symbol that's defined". pructl is not, but even compiling pructl w/ -fblocks, doesn't fix the link error, as it looks like the block runtime isn't linked. If I manually include /usr/lib/libBlocksRuntime.so, then pructl is able to link. I can't seem to find any docs on clang about how to properly compile code that uses blocks, so, unless someone points me to docs on how to compile blocks enable programs, I'll just patch libpru to not use blocks since it seems like blocks is well supported. I don't want to fix this code every few years when things change. 
-- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
failure of pructl (atexit/_Block_copy/--no-allow-shlib-undefined)
Hello, It seems like the recent changes to make --no-allow-shlib-undefined broke pructl. lib/libc/stdlib/atexit.c uses a weak _Block_copy symbol, but pructl does not use atexit_b, and yet gets the following error: : && /usr/bin/cc -Werror -O2 -pipe -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing -std=c99 -fstack-protector-strong CMakeFiles/pructl.dir/pructl.c.o -o pructl -Wl,-rpath,/usr/local/lib: /usr/local/lib/libpru.so && : ld: error: /lib/libc.so.7: undefined reference to _Block_copy [--no-allow-shlib-undefined] cc: error: linker command failed with exit code 1 (use -v to see invocation) What is the correct fix? It seems like atexit.c or the linker should be fixed, as pructl doesn't use atexit_b at all. -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
Re: amd64 (example) main [so: 14]: delete-old check-old delete-old-libs missing a bunch of files?
d without updating ObsoleteFiles.inc. Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/mixer: Kyuafile Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/mixer: mixer_test Fallout from recent mixer changes? Hans might know more. Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-sav.in Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-sav.out Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-u.out Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-usr.in Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v1-sparc64-usr.out Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v2-sparc64-sav.in Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v2-sparc64-u.out Only in /usr/obj/DESTDIRs/main-amd64-poud/usr/tests/usr.sbin/sa: v2-sparc64-usr.in I'll commit fixes for some of these. -- John Baldwin
Re: make cleandiry tries to access /lib/geom
On 11/24/21 3:30 AM, Bjoern A. Zeeb wrote: Hi, 673 ===> usr.bin/diff/tests (cleandir) 674 ===> lib/geom (cleandir) 675 ===> sbin/mount_udf (cleandir) 676 make[6] warning: /lib/geom: Permission denied. not sure what is going on here? 677 ===> share/i18n/esdb/ISO-8859 (cleandir) 678 ===> tests/sys/cddl/zfs/tests/cli_root/zfs_clone (cleandir) I think Jess has a possible fix. This is some regression added in the build system several months ago. -- John Baldwin
Re: "Khelp module "ertt" can't unload until its refcount drops from 1 to 0." after "All buffers synced."?
On 11/19/21 4:29 AM, tue...@freebsd.org wrote: On 19. Nov 2021, at 00:11, Mark Millard wrote: On 2021-Nov-18, at 12:31, tue...@freebsd.org wrote: On 17. Nov 2021, at 21:13, Mark Millard via freebsd-current wrote: I've not noticed the ertt message before in: . . . Waiting (max 60 seconds) for system thread `bufspacedaemon-1' to stop... done All buffers synced. Uptime: 1d9h57m18s Khelp module "ertt" can't unload until its refcount drops from 1 to 0. Hi Mark, what kernel configuration are you using? What kernel modules are loaded? The shutdown was of my ZFS boot media but the machine is currently doing builds on the UFS media. (The ZFS media is present but not mounted). For now I provide information from the booted UFS system. The UFS context is intended to be nearly a copy of the brctl selection for main [so: 14] from the ZFS media. Both systems have been doing the same poudriere builds for various comparison/contrast purposes. The current build activity will likely take 16+ hrs. Hi Mark, thanks a lot for the information. I was contemplating whether this message was related to a recent change in the TCP congestion module handling, but since it was already there in February, this is not the case. Will try to reproduce this, but wasn't able up to now. The congestion control changes just probably exacerbated the bug by adding a new reference on this module, just as they exposed the bug with khelp using the wrong SYSINIT subsystem. -- John Baldwin
Re: cross-compiling for i386 on amd64 fails
On 11/15/21 8:34 PM, Michael Butler via freebsd-current wrote: Haven't had time to identify which change caused this yet but I now get .. ===> lib/libsbuf (obj,all,install) ===> cddl/lib/libumem (obj,all,install) ===> cddl/lib/libnvpair (obj,all,install) ===> cddl/lib/libavl (obj,all,install) ld: error: /usr/obj/usr/src/i386.i386/tmp/usr/lib/libspl.a(assert.o) is incompatible with elf_i386_fbsd ===> cddl/lib/libspl (obj,all,install) cc: error: linker command failed with exit code 1 (use -v to see invocation) --- libavl.so.2 --- *** [libavl.so.2] Error code 1 make[4]: stopped in /usr/src/cddl/lib/libavl My guess is that this was fixed by git: 9e9c651caceb - main - cddl: fix missing ZFS library dependencies -- John Baldwin
Re: Problems with getting a crash dump
On Mon, Nov 08, 2021 at 07:08:31PM +, Alexander wrote: > Hello, I am currently using FreeBSD 14.0-CURRENT and I found a bug that > triggers a kernel panic. I wanted to make a kernel crash dump to further > investigate the issue, but after a few tries I still did not manage to do it. > I started by following the instructions in the FreeBSD Handbook. ... > /dev/nvd0p2.eli is an active swap device and I configured it to be used as a > dump device like this: ... Much like you, I found that my current (encrypted) swap files weren't going to work and I used an external USB stick. [/etc/rc.conf] # Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable #dumpdev="AUTO" dumpdev="/dev/da0p1" [dumpon -vl] kernel dumps on priority: device 0: da0p1 [gpart show da0] => 40 240353200 da0 GPT (115G) 40 240353200 1 freebsd-swap (115G) [swapctl -lm] Device: 1MB-blocks Used: /dev/nvd1p3.eli 8192 2932 Apparently the last time I crashed was ~Mar 2021 so your version mileage may vary (not 14), but make sure the OS didn't already do it for you (at least if you're booting up fully into multi-user mode; you did say single). The /var/crash directory is the default location for where savecore stashes the info for you. Note that I made da0p1 swap, but I didn't actually configure it that way in /etc/fstab so I'm not using slow, unencrypted USB for swap, just dumps. The stick had a little write-LED on it, so it was obvious when it was being hit and I think the kernel panic-dump had a status output of some sort (it's been a while), although that might be obscured (under X11, etc). I sort of remember a prompt where I could have done something interactive that I might have had to continue on from before it did the dump. Again, it's been a while since I had a dump that I was trying hard to report. 115G is more than enough to hold 32G of RAM and 8G of swap. Remember that some of your RAM might *be* swapped out (so, worst case, RAM+swap). 
Seems like you'd have good odds in a nice, controlled test of not needing all that space but kernel crash dumps are often pretty brainless because they know they've just lost at Russian roulette and don't know what they can trust (don't know about FreeBSD specifically). Let's just say that it has a very different approach to swap than ancient SunOS. You've got some interesting physical quirks (ala, 14 + USB stick) that I couldn't test with my setup, but I do have a bhyve running 14 that I could probably try crashing in a similar way (no USB of course). It sounds like you're going down the right path, although I'd try to borrow a bigger USB stick and see if that helps.
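The sizing rule of thumb from this message (worst case, the dump device should hold RAM plus swap) can be written down as a tiny sh check. This is a rough sketch only: the helper names are hypothetical, sizes are whole gigabytes, and real dumps are often much smaller than this upper bound.

```shell
#!/bin/sh
# Hypothetical helpers for the "RAM + swap" worst-case estimate.

# Print the worst-case dump size in GB: all of RAM ($1) plus all swap ($2).
worst_case_dump_gb() {
    echo $(( $1 + $2 ))
}

# Succeed when a dump device of $3 GB can hold RAM ($1) plus swap ($2).
dump_device_fits() {
    [ "$(worst_case_dump_gb "$1" "$2")" -le "$3" ]
}

# Figures quoted in the message: 32G of RAM, 8G of swap, a 115G stick.
if dump_device_fits 32 8 115; then
    echo "115G device holds a worst-case $(worst_case_dump_gb 32 8)G dump"
fi
```

With the numbers from the message the worst case is 40G, so the 115G stick has plenty of headroom; a 16G stick on the same machine would fail the check.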
Re: LAN ure interface problem
Ludovit Koren wrote this message on Fri, Oct 22, 2021 at 16:00 +0200: > I have installed FreeBSD 14.0-CURRENT #1 main-n250134-225639e7db6-dirty > on my notebook HP EliteBook 830 G7 and I am using RealTek usb LAN > interface: > > ure0 on uhub0 > ure0: on > usbus1 > miibus0: on ure0 > rgephy0: PHY 0 on miibus0 > rgephy0: OUI 0x00e04c, model 0x, rev. 0 > rgephy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, > 1000baseT-FDX, 1000baseT-FDX-master, auto > ue0: on ure0 > ue0: bpf attached > ue0: Ethernet address: 00:e0:4c:68:04:20 > > > When there is bigger load on the interface, for example rsync of the big > directory, the carrier is lost. The only solution I found is to remove > and insert the usb interface; ifconfig ue0 down, ifconfig ue0 up did not > help. The output of the ifconfig: > > ue0: flags=8843 metric 0 mtu 1500 > > options=68009b > ether 00:e0:4c:68:04:20 > inet 192.168.1.18 netmask 0xff00 broadcast 192.168.1.255 > media: Ethernet autoselect (100baseTX ) > status: active > nd6 options=29 > > I do not know and did not find anything relevant, if the driver is buggy > or the hardware has some problems. Please, advice. I have seen similar behavior, and was unable to get any vendor support, so I have stopped working on the driver. I have not found a reliable way to reset the hardware to a working state, even via power_off/power_on commands. Sorry that I don't have a solution for you. The closest that I could suggest is to try to drop the USB id from the ure driver or switch its mode to try the cdce driver instead. I've seen that it's been more reliable, but it could be because it also runs MUCH slower, and doesn't hit the same bug. -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
Re: git: 2f7f8995367b - main - libdialog: Bump shared library version to 10. [ the .so.10 is listed in mk/OptionalObsoleteFiles.inc ?]
On 10/27/21 3:23 PM, Mark Millard via freebsd-current wrote: On 2021-Oct-27, at 15:21, Mark Millard wrote: Unfortunately(?) this update added the .so.10 to mk/OptionalObsoleteFiles.inc : diff --git a/tools/build/mk/OptionalObsoleteFiles.inc b/tools/build/mk/OptionalObsoleteFiles.inc index a8b0329104c4..91822aac492a 100644 --- a/tools/build/mk/OptionalObsoleteFiles.inc +++ b/tools/build/mk/OptionalObsoleteFiles.inc @@ -1663,11 +1663,11 @@ OLD_FILES+=usr/bin/dialog . . . OLD_FILES+=usr/lib/libdialog.so -OLD_FILES+=usr/lib/libdialog.so.8 +OLD_FILES+=usr/lib/libdialog.so.10 . . . Looks to me like that +line should have been: +OLD_FILES+=usr/lib/libdialog.so.9 (presuming the original .so.8 was correct during .so.9 's time frame). Looks like: +OLD_FILES+=usr/lib/libdpv.so.3 is the same sort of issue and possibly should have been: +OLD_FILES+=usr/lib/libdpv.so.2 No, these lines are for removing the current versions of the libraries if you do 'make delete-old WITHOUT_DIALOG=yes'. They weren't bumped previously when I bumped them for ncurses (probably my fault). -- John Baldwin
Re: main changed DIALOG_STATE, DIALOG_VARS, and DIALOG_COLORS but /usr/lib/libdialog.so.? naming was not adjusted? (crashes in releng/13 programs on main [so: 14] can result)
On 10/22/21 1:08 AM, Mark Millard via freebsd-current wrote: main [so: 14] commit a96ef450 (2021-02-26 09:16:49 +) changed DIALOG_STATE, DIALOG_VARS, and DIALOG_COLORS . These are publicly exposed in (ones that I noticed): /usr/include/dialog.h:extern DIALOG_STATE dialog_state; /usr/include/dialog.h:extern DIALOG_VARS dialog_vars; /usr/include/dialog.h:extern DIALOG_COLORS dlg_color_table[]; Then we need to bump libdialog's so version to 10? (I don't think libdialog has symbol versioning) -- John Baldwin
Re: ELF binary type "0" not known. (while compiling buildworld on risc-v/qemu)
On 9/27/21 7:40 AM, Karel Gardas wrote: Hello, I'm playing with compiling freebsd 13 (releng/13.0 2 days ago) and current (git HEAD as of today) on qemu-5.1.0/qemu-6.1.0 on riscv64 platform. The emulator invocation is: qemu-system-riscv64 -machine virt -smp 8 -m 16G -nographic -device virtio-blk-device,drive=hd -drive file=FreeBSD-14.0-CURRENT-riscv-riscv64.qcow2,if=none,id=hd -device virtio-net-device,netdev=net -netdev user,id=net,hostfwd=tcp::2233-:22 -bios /usr/lib/riscv64-linux-gnu/opensbi/generic/fw_jump.elf -kernel /usr/lib/u-boot/qemu-riscv64_smode/uboot.elf -object rng-random,filename=/dev/urandom,id=rng -device virtio-rng-device,rng=rng -nographic -append "root=LABEL=rootfs console=ttyS0" and the host is Ubuntu 20.04.x LTS. Both qemu 5.1.0 and qemu 6.1.0 are compiled from source, but both OpenSBI and u-boot for risc-v are Ubuntu packages provided (to accompany ubuntu provided qemu 4.2.1) My issue while compiling both 13 and current is that compilation after some time fails with: root@freebsd:/usr/src # time make -j8 buildworld > /tmp/build-j8-2.txt ELF binary type "0" not known. 17784.134u 21388.907s 1:50:13.83 592.2% 30721+572k 10+2177io 0pf+0w I'm curious if this is a known issue either in Qemu or in FreeBSD for risc-v or if I'm doing anything wrong here? It is a known issue with how we brand FreeBSD/riscv binaries. Jess (cc'd) has a WIP review with a possible fix IIRC. -- John Baldwin
Re: [HEADSUP] making /bin/sh the default shell for root
On 9/22/21 1:36 AM, Baptiste Daroussin wrote: Hello, TL;DR: this is not a proposal to deorbit csh from base!!! For years now, csh has been the default root shell for FreeBSD. csh can be confusing as a default shell for many, as all other Unix-like systems have settled on a Bourne shell compatible interactive shell: zsh, bash, or a variant of ksh. Recently our sh(1) has received updates to make it more user friendly in interactive mode: * command completion (thanks pstef@) * improvement in the emacs mode, to make it behave by default like other shells * improvement in the vi mode (in particular the vi edit to respect $EDITOR) * support for history as described by POSIX. This makes it a usable shell by default, which is why I would like to propose to make it the default shell for root starting with FreeBSD 14.0-RELEASE (not MFCed). If no strong arguments have been raised by October 15th, I will make this proposal happen. Again just in case: THIS IS NOT A PROPOSAL TO REMOVE CSH FROM BASE! I think this is fine. I would also be fine with either removing 'toor' from the default password file or just leaving it as-is for POLA. (I would probably prefer removing it outright.) -- John Baldwin
Re: rescue/sh check failed, installation aborted
On 8/23/21 12:18 PM, Graham Perrin wrote: Encountered whilst attempting to build and install 14.0-CURRENT over 13.0-RELEASE-p3 (experimental, helloSystem): <https://i.imgur.com/euFBA8M.png> Background, condensed, to the best of my recollection: cd /usr/src make buildworld # succeeded make kernel # failed make clean LOCAL_MODULES= # added to /etc/src.conf make kernel-toolchain make kernel restarted in single user mode mount -uw / service zfs start cd /usr/src make installworld – failed as pictured. I see <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231325>, fixed in 2018. Any suggestions? I'm not sure what the 'make clean' would have done. Did you mean 'make cleanworld'? If so, you will need to do a 'make buildworld' again before trying to do 'make installworld'. The error message implies that there is no 'make buildworld' output in /usr/obj (as if you had run 'make cleanworld' up above where you list 'make clean') -- John Baldwin
Re: etcupdate: Failed to build new tree
On 7/2/21 2:30 AM, Nuno Teixeira wrote: Hello, Last update I have some issues with etcupdate: etcupdate warning: "No previous tree to compare against, a sane comparison is not possible." That I corrected with: etcupdate extract etcupdate diff > /tmp/etc.diff patch -R < /tmp/etc.diff (etcupdate diff doesn't show any diffs.) Today I've just updated current and etcupdate -p gives: "Failed to build new tree" What might be wrong? You can look in /var/db/etcupdate/log to check for errors. -- John Baldwin
Re: CURRENT: acpi_wakecode.S error: unknown -Werror warning specifier: '-Wno-error-tautological-compare'
On 6/22/21 11:13 AM, O. Hartmann wrote: Hello, on a recent CURRENT (FreeBSD 14.0-CURRENT #6 main-n247512-e3be51b2bc7c: Tue Jun 22 15:31:03 CEST 2021 amd64) we build a 13-STABLE based NanoBSD from a dedicated source tree for a small routing appliance. It should be " ...we built ..." because since the introduction of LLVM/CLANG 12 on FreeBSD, the build of the source tree fails with the error shown below. Since these errors are due to some compiler knobs, the question is how to avoid them and make the tree of 13-STABLE build again? We do not do explicitly cross compiling, so if there is in general an issue with this "brute force method" I would appreciate any recommendation to avoid such malfunctions using other techniques - as long as they are moderate to implement. Thanks in advance and kind regards, You can use 'make buildworld WITHOUT_SYSTEM_COMPILER=yes' to force your builds to use the clang 11 included in stable/13 instead of the host clang 12. You could also MFC the fixes from head to use -Wno-error= instead of -Wno-error-. -- John Baldwin
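To make the spelling difference concrete: clang 12 rejects '-Wno-error-tautological-compare' (see the subject line), while the '-Wno-error=' form is the one the fixes in head use. The sketch below only illustrates the rewrite on a flags string; the function name is hypothetical and this is not the actual commit, which changes the build makefiles.

```shell
#!/bin/sh
# Hypothetical filter: rewrite "-Wno-error-<name>" to "-Wno-error=<name>".
# Flags that already use '=' and a bare "-Wno-error" are left untouched.
fix_wno_error() {
    sed 's/-Wno-error-\([A-Za-z]\)/-Wno-error=\1/g'
}

# Example: show the corrected form of a flags string.
echo "-O2 -Wno-error-tautological-compare -pipe" | fix_wno_error
```

If MFCing is not an option, the other route from the reply is simply 'make buildworld WITHOUT_SYSTEM_COMPILER=yes', which sidesteps the host compiler entirely.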
Re: etcupdate warning: "No previous tree to compare against, a sane comparison is not possible."
On 6/22/21 12:34 AM, Nuno Teixeira wrote: Hello, Should I be worry about etcupdate warning "No previous tree to compare against, a sane comparison is not possible." when I recompile and update current? I receive same warning when I do a 'etcupdate -p' after installworld too. Yes, this means etcupdate is not merging any changes to /etc. You should run 'etcupdate extract' before your next upgrade cycle. You should then review the output of 'etcupdate diff' to see if there are files in /etc that need updating. If there are files that you want to update to stock versions you can use 'etcupdate revert /path/to/file'. Otherwise, you can use the patch generated by 'etcupdate diff' either as a guide to manually update files to remove unwanted differences, or as input to patch -R. -- John Baldwin
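The one-time bootstrap described above can be collected into a small sh wrapper. Treat it as a sketch under stated assumptions: the function names and the DRY_RUN switch are hypothetical conveniences, and only 'etcupdate extract' and 'etcupdate diff' are taken from the message itself. The review of the generated patch is deliberately left as a manual step.

```shell
#!/bin/sh
# Hypothetical wrapper around the bootstrap steps from the message.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

bootstrap_etcupdate() {
    patchfile=${1:-/tmp/etc.diff}
    # Step 1: build the reference tree etcupdate compares against.
    run etcupdate extract || return 1
    # Step 2: save the local differences for manual review.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "etcupdate diff > $patchfile"
    else
        etcupdate diff > "$patchfile" || return 1
    fi
    echo "Review $patchfile before the next upgrade."
}
```

After reviewing the patch, unwanted differences can be fixed by hand, or the patch can be fed to patch -R as the message suggests.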
Re: drm-kmod kernel crash fatal trap 12
On 6/15/21 11:22 AM, Bakul Shah wrote: On Jun 15, 2021, at 9:03 AM, John Baldwin wrote: On 6/10/21 8:13 AM, Bakul Shah wrote: On Jun 10, 2021, at 7:13 AM, Thomas Laus wrote: The drm-kmod module is the latest from the pkg server. It all worked this past Monday after the recent drm-kmod update. This is what I did: git clone https://github.com/freebsd/drm-kmod ln -s $PWD/drm-kmod /usr/local/sys/modules Now it gets compiled every time you do make buildkernel. If things break you can do a git pull in the drm-kmod dir and rebuild. This is what I do now as well. I think this is probably the sanest approach to use on HEAD at least. IIRC I learned this from one of your posts. The PORTS_MODULES approach results in installing kernel modules /boot/modules, which doesn't track /boot/kernel*/. Yes, PORTS_MODULES is not so great when you are building test kernels from branches that are different points in time and then go back to booting your "stock" kernel as the module is now built against the wrong ABI and breaks your "stock" kernel. This is why I added LOCAL_MODULES and the SRC knob to drm-kmod, but the source knob is a bit bumpy in practice as you sometimes need newer source than your current package. (For example, if your "stock" kernel only changes every few months, but you pull newer work trees for test kernels.) For that case, it has proven simpler to just do the direct checkout that I can git pull when needed. -- John Baldwin
Re: drm-kmod kernel crash fatal trap 12
On 6/10/21 8:13 AM, Bakul Shah wrote: On Jun 10, 2021, at 7:13 AM, Thomas Laus wrote: The drm-kmod module is the latest from the pkg server. It all worked this past Monday after the recent drm-kmod update. This is what I did: git clone https://github.com/freebsd/drm-kmod ln -s $PWD/drm-kmod /usr/local/sys/modules Now it gets compiled every time you do make buildkernel. If things break you can do a git pull in the drm-kmod dir and rebuild. This is what I do now as well. I think this is probably the sanest approach to use on HEAD at least. -- John Baldwin
Re: Files in /etc containing empty VCSId header
On 6/7/21 12:58 PM, Ian Lepore wrote: On Mon, 2021-06-07 at 13:53 -0600, Warner Losh wrote: On Mon, Jun 7, 2021 at 12:26 PM John Baldwin wrote: On 5/20/21 9:37 AM, Michael Gmelin wrote: Hi, After a binary update using freebsd-update, all files in /etc contain "empty" VCS Id headers, e.g., $ head /etc/nsswitch.conf # # nsswitch.conf(5) - name service switch configuration file # $FreeBSD$ # group: compat group_compat: nis hosts: files dns netgroup: compat networks: files passwd: compat After migrating to git, I would've expected those to contain something else or disappear completely. Is this expected and are there any plans to remove them completely? I believe we might eventually remove them in the future, but doing so right now would introduce a lot of churn and the conversion to git had enough other churn going on. We'd planned on not removing things that might be merged to stable/12 since those releases (12.3 only I think) will be built out of svn. We'll likely start to remove things more widely as the stable/12 branch reaches EOL and after. Warner It would be really nice if, instead of just deleting the $FreeBSD$ markers, they could be replaced with the path/filename of the file in the source tree. Sometimes it's a real interesting exercise to figure out where a file on your runtime system comes from in the source world. All the source tree layout changes that happened for packaged-base makes it even more interesting. My hope is that we un-break src/etc. :( A few folks have looked at doing that (notably Kyle). -- John Baldwin
Re: Files in /etc containing empty VCSId header
On 5/20/21 9:37 AM, Michael Gmelin wrote: Hi, After a binary update using freebsd-update, all files in /etc contain "empty" VCS Id headers, e.g., $ head /etc/nsswitch.conf # # nsswitch.conf(5) - name service switch configuration file # $FreeBSD$ # group: compat group_compat: nis hosts: files dns netgroup: compat networks: files passwd: compat After migrating to git, I would've expected those to contain something else or disappear completely. Is this expected and are there any plans to remove them completely? I believe we might eventually remove them in the future, but doing so right now would introduce a lot of churn and the conversion to git had enough other churn going on. -- John Baldwin
RFT: improvements to if_cdce driver
Hello, I decided to make some improvements to the CDCE driver as at least the RealTek devices (what I tested them with) when they aren't supported by ure will present as cdce devices. https://reviews.freebsd.org/D30625 This adds if_media support and link state support. The most significant change is this means that if a ue device is configured for DHCP, devd will now launch dhclient, where previously it would not, as it would neither receive the link up status (for when a cable is plugged in) nor would it be the requisite ethernet media type. The device I tested with was a RealTek 2.5G device. So, other non-RealTek devices would be great to test with. Let me know if you have any issues with the change! Thanks. -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
Re: tuning a zfs-mounted /var
Michael Gmelin wrote this message on Sat, May 22, 2021 at 21:13 +0200: > > On 22. May 2021, at 20:32, tech-lists wrote: > > > > Hi, > > > > What options could one pass to zfs to speed it up to characteristics > > favourable to what's usually in /var ? Like lots of fast writes, lots of > > files smaller than what's on /usr, lots of file creation and deletion > > but also quite a few files that might become large, like what's in > > /var/log, things like that. > > > > Make sure your pool (or at least the /var file system) has compression=lz4 > and that atime is off, beyond that I wouldn't bother to try to optimize > manually there, unless you run a database like MySQL in /var/db/..., in which > case setting a fixed record size might make sense. And if you're running a db in /var, you should just create a new dataset for the database instead of reuse /var's dataset, that way the fixed record size does not cause problems for the rest of /var... -- John-Mark Gurney Voice: +1 415 225 5579 "All that I will do, has been done, All that I have, has not."
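Put together, the advice in this thread comes down to three zfs commands. The sketch below only prints them so they can be reviewed first; the function name, the dataset names (tank/var, tank/var/mysql) and the 16k record size are assumptions for illustration, not values from the thread, and the right record size depends on the database in question.

```shell
#!/bin/sh
# Hypothetical helper: print the zfs commands for the advice above.
# Nothing is executed; pipe the output to sh once it looks right.
# The parent of the database dataset must already exist.
var_tuning_commands() {
    var_ds=$1            # dataset mounted at /var (assumed name)
    db_ds=$2             # separate dataset for the database (assumed name)
    recsize=${3:-16k}    # assumed record size; tune for your database
    echo "zfs set compression=lz4 $var_ds"
    echo "zfs set atime=off $var_ds"
    echo "zfs create -o recordsize=$recsize $db_ds"
}

var_tuning_commands tank/var tank/var/mysql
```

Keeping the database in its own dataset means the fixed record size applies only there, which is the point made in the reply.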
Re: etcupdate -p: No previous tree to compare against, a sane comparison is not possible. (was: Review D28062 …)
On 4/24/21 4:42 AM, Graham Perrin wrote: On 21/04/2021 18:19, John Baldwin wrote: On 4/17/21 12:52 PM, Graham Perrin wrote: 2) <https://reviews.freebsd.org/D28062#change-5KzY5tEtVUor> line 2274 etcupdate -p I get: > No previous tree to compare against, a sane comparison is not possible. Hmm, how did you initially install this machine? Release images should generally include a pre-populated /var/db/etcupdate so that etcupdate works. If you don't have one of those, you will have to perform an initial bootstrap of etcupdate (only once) by running 'etcupdate extract'. If you do this before you update /usr/src then 'etcupdate' will later work fine. If you are doing this after you have already updated /usr/src, you will need to run 'etcupdate diff' after 'etcupdate extract' and fix any unexpected local differences in the generated patch, e.g. by copying files from /var/db/etcupdate/current/etc to /etc. Once you have done this, 'etcupdate' will work fine on the next upgrade. However, I'm curious how you didn't get the etcupdate bootstrap when you initially installed. Sorry for not replying sooner. It's not an answer to your question, but might the thread at <https://lists.freebsd.org/pipermail/freebsd-current/2021-April/079538.html> be relevant? Yes, you might indeed have hit this bug (which has since been fixed). You might have to 'etcupdate extract' and then manually review 'etcupdate diff' to see if you have any unexpected diffs to recover. Sorry. :-/ -- John Baldwin ___ freebsd-current@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"