Re: Graph of the FreeBSD memory fragmentation
Am 2024-05-14 03:54, schrieb Ryan Libby:

> That was a long winded way of saying: the "UMA bucket" axis is actually "vm phys free list order". That said, I find that dimension confusing because in fact there's just one piece of information there, the average size of a free list entry, and it doesn't actually depend on the free list order. The graph could be 2D.

It evolved into that... At first I had a 3-dimensional dataset and the first try was to plot it as is (3D). The outcome (as points) was not as good as I wanted it to be, and plotting as lines gave the wrong direction of lines. I massaged the plotting instructions until it looked good enough. I did not try a 2D plot. I agree, with different colors for each free list order a 2D plot may work too. Whether a 2D plot is better than a 3D plot in this case depends on the mental model the viewer has of the topic. One size may not fit all. Feel free to experiment with other plotting styles.

> The paper that defines this fragmentation index also says that "the fragmentation index is only meaningful when an allocation fails". Are you actually seeing any contiguous allocation failures in your measurements?

I'm not aware of any. The index may only be meaningful for the purposes of the paper when there are such failures, but if you look at the graph and how it changed when Bojan changed the guard pages, I see value in the graph beyond what the paper suggests.

> Without that context, it seems like what the proposed sysctl reports is indirectly just the average size of free list entries. We could just report that.

The calculation of the value is part of a bigger picture. The value returned is used by some other code to make decisions.

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
signature.asc Description: OpenPGP digital signature
Re: Graph of the FreeBSD memory fragmentation
Am 2024-05-08 18:45, schrieb Bojan Novković:

> Hi, On 5/7/24 14:02, Alexander Leidinger wrote:
>> Hi, I created some graphs of the memory fragmentation. https://www.leidinger.net/blog/2024/05/07/plotting-the-freebsd-memory-fragmentation/ My goal was not comparing a specific change on a given benchmark, but to "have something which visualizes memory fragmentation". As part of that, Bojan's commit https://cgit.freebsd.org/src/commit/?id=7a79d066976149349ecb90240d02eed0c4268737 was just in the middle of my data collection. I have the impression that it made a positive difference in my non-deterministic workload.
> Thank you for working on this, the plots look great! They provide a really clean visual overview of what's happening. I'm working on another type of memory visualization which might interest you, I'll share it with you once it's done. One small nit - the fragmentation index does not quantify fragmentation for UMA buckets, but for page allocator freelists.

Do I get it more correctly now: UMA buckets are type/structure-specific allocation lists, and the page allocator freelists are size-specific allocation lists (which are used by UMA when no free item is available in a bucket)?

>> Is there anything which prevents https://reviews.freebsd.org/D40575 from being committed?
> D40575 is closely tied to the compaction patch (D40772) which is currently on hold until another issue is solved (see D45046 and related revisions for more details).

Any idea about https://reviews.freebsd.org/D16620 ? Is D45046 supposed to replace this, or is it about something else? I wanted to try D16620, but it doesn't apply, and my naive/mechanical way of applying it panics. I didn't consider landing D40575 because of that, but I guess it could be useful on its own. It at least gives a way to quantify fragmentation with numbers resp. visualize it qualitatively. And as such it may help in visualizing differences like with your guard-pages commit.
I wonder if the segregation of nofree allocations may result in a similar improvement for long-running systems.

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Graph of the FreeBSD memory fragmentation
Hi, I created some graphs of the memory fragmentation: https://www.leidinger.net/blog/2024/05/07/plotting-the-freebsd-memory-fragmentation/

My goal was not comparing a specific change on a given benchmark, but to "have something which visualizes memory fragmentation". As part of that, Bojan's commit https://cgit.freebsd.org/src/commit/?id=7a79d066976149349ecb90240d02eed0c4268737 landed just in the middle of my data collection. I have the impression that it made a positive difference in my non-deterministic workload.

Is there anything which prevents https://reviews.freebsd.org/D40575 from being committed?

Maybe some other people want to have a look at the memory fragmentation and some of Bojan's work (https://wiki.freebsd.org/SummerOfCode2023Projects/PhysicalMemoryAntiFragmentationMechanisms).

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
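For anyone who wants to plot similar data themselves: an untested gnuplot sketch (the data file name and column layout are assumptions on my part, this is not the script from the blog post) for the "2D with one color per free list order" variant discussed in the thread:

```gnuplot
# Assumed input: whitespace-separated "timestamp order index" triples
# in fragindex.dat (hypothetical file name and layout).
set xlabel "time (s)"
set ylabel "fragmentation index"
set yrange [0:1]
set key outside title "free list order"
# One line per order, colored by order.
plot for [o=0:12] "fragindex.dat" \
    using ($2 == o ? $1 : NaN):3 \
    with lines lc palette frac (o / 12.0) title sprintf("%d", o)
```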
Re: Strange network/socket anomalies since about a month
art (those with "Timed out waiting for server startup" are maybe the processes which fork to start the server and wait for it to be started), some are the stat-query, and some seem to be a successful start in another poudriere builder (those with a successful /root/.ccache/sccache/5/4/ access look like they come from a successful start in another jail). Maybe there is also a --stop-server from poudriere somewhere.

What I noticed (apart from the fact that printing the new CAP stuff for non-CAP-enabled processes by default is disturbing) is that compat11 stuff is called (it seems the rust ecosystem is not keeping up with our speed of development...). Not sure if it matters here that some compat stuff is called.

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Strange network/socket anomalies since about a month
Hi, I have seen a higher failure rate of socket/network related stuff for a while now. Those failures are transient; directly executing the same thing again may or may not succeed. I'm not able to reproduce this at will. Examples:

- poudriere runs with the sccache overlay (like ccache, but it also works for rust) sometimes fail to create the communication socket and as such the build fails. I have 3 different poudriere bulk runs after each other in my build script, and when the first one fails, the second and third still run. If the first fails due to the sccache issue, the second and third may or may not fail. Sometimes the first fails and the rest is OK. Sometimes all fail, and if I then run one by hand it works (the script does the same as the manual run, it is simply a "for type in A B C; do poudriere bulk -O sccache -j $type -f ${type}.pkglist; done" which I execute from the same shell, and the script doesn't do any env-sanitizing).
- A webmail interface (inet / local net -> nginx (rev-proxy) -> nginx (webmail service) -> php -> imap) sees intermittent issues. Opening the same email directly again afterwards normally works. I've also seen transient issues with pgp signing (webmail interface -> gnupg / gpg-agent on the server); simply hitting send again after a failure works fine.

Gleb, could this be related to the socket stuff you did 2 weeks ago? My world is from 2024-04-17-112537. I have noticed this since at least then, but I'm not sure if the failures were there before that and I simply didn't notice them. They are surely new recently; I hadn't seen this amount of issues in January. The last two updates of current I did before the last one were on 2024-03-31-120210 and 2024-04-08-112551. I could also imagine that some memory related transient failure could cause this, but with >3 GB free I do not expect that.
Important here may be that I have https://reviews.freebsd.org/D40575 in my tree, which is memory related, but it's only a metric to quantify memory fragmentation.

Any ideas how to track this down more easily than running the entire poudriere build under ktrace (e.g. a hint/script for which dtrace probes to use)?

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
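As a starting point, an untested DTrace sketch along these lines (standard syscall provider probe names; check `dtrace -l` and adjust) might catch the failing socket syscalls system-wide without ktrace'ing all of poudriere:

```
/* failed-sockets.d: log socket-related syscalls that return an error.
 * Run as: dtrace -s failed-sockets.d
 * Untested sketch; probe names and the errno variable are the usual
 * syscall provider ones, adjust to what your system actually offers. */
syscall::socket:return,
syscall::bind:return,
syscall::connect:return,
syscall::listen:return,
syscall::accept:return
/errno != 0/
{
	printf("%Y %s[%d] %s() -> errno %d\n",
	    walltimestamp, execname, pid, probefunc, errno);
}
```

That would at least narrow down which syscall fails with which errno, and in which process, when one of the transient failures happens.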
Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
Am 2024-03-29 18:21, schrieb Alexander Leidinger:
> Am 2024-03-29 18:13, schrieb Mark Johnston:
>> On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote:
>>> Hi, sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work (see below for the issue). As the monthly stabilisation pass didn't find obvious issues, it is something related to my setup:
>>> - not a generic kernel
>>> - very modular kernel (as much as possible as a module)
>>> - bind_now (a build without fails too, tested with clean /usr/obj)
>>> - ccache (a build without fails too, tested with clean /usr/obj)
>>> - kernel retpoline (build without in progress)
>>> - userland retpoline (build without in progress)
>>> - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't retpoline)
>>> - -fno-builtin
>>> - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
>>> - malloc production
>>> - COPTFLAGS= -O2 -pipe
>>> The issue is, that kernel modules load OK from loader, but once it starts init any module fails to load (e.g. via autodetection of hardware or rc.conf kld_list) with the message that the kernel and module versions are out of sync and the module refuses to load.
>> What is the exact revision you're running? There were some unrelated changes to the kernel linker around the same time.
> The working src is from 2024-03-11-094351 (GMT+0100). The failing src was fetched after Gleb's stabilization week message (and today's src before the sound stuff still fails). Retpoline wasn't the cause, next test is the CTF stuff in the kernel...

A rather obscure problem was causing this. The "last" BE had canmount set to "on" instead of "noauto". No idea how this happened, but it resulted in the "last" BE being mounted on top of the current BE by "zfs mount -a". This means that every module loaded after the zfs rc script had run was an old kernel module, and the error message about the kernel version mismatch was correct.
I found the issue while bisecting the tree: suddenly the error message went away, but the new issue of missing dev entries popped up (/dev was mounted correctly on the booting dataset, but the last BE was mounted on top of it and /dev went empty...). It looks to me like bectl was doing this (from "zpool history"):

2024-03-11.14:16:31 zpool set bootfs=rpool/ROOT/2024-03-11-094351 rpool
2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-01-18-092730
2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-02-10-144617
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-11-212006
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-16-082836
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211_ok
2024-03-11.14:16:33 zfs set canmount=on rpool/ROOT/2024-03-11-094351
2024-03-11.14:16:33 zfs promote rpool/ROOT/2024-03-11-094351
2024-03-11.14:17:03 zfs destroy -r rpool/ROOT/2024-02-24-140211_ok

I surely didn't do the "zfs set canmount=..." for those by hand.

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
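In case anyone wants to check their own pools for the same foot-gun, standard zfs(8)/zpool(8) commands like these should do (the rpool/ROOT dataset layout is an assumption, adjust to your pool):

```
# List canmount for all boot environments; everything except the
# active BE should say "noauto", or "zfs mount -a" will stack it
# on top of the running system.
zfs get -r -t filesystem -H -o name,value canmount rpool/ROOT

# The BE the bootloader will use:
zpool get -H -o value bootfs rpool
```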
Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
Am 2024-03-29 18:13, schrieb Mark Johnston:
> On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote:
>> Hi, sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work (see below for the issue). As the monthly stabilisation pass didn't find obvious issues, it is something related to my setup:
>> - not a generic kernel
>> - very modular kernel (as much as possible as a module)
>> - bind_now (a build without fails too, tested with clean /usr/obj)
>> - ccache (a build without fails too, tested with clean /usr/obj)
>> - kernel retpoline (build without in progress)
>> - userland retpoline (build without in progress)
>> - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't retpoline)
>> - -fno-builtin
>> - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
>> - malloc production
>> - COPTFLAGS= -O2 -pipe
>> The issue is, that kernel modules load OK from loader, but once it starts init any module fails to load (e.g. via autodetection of hardware or rc.conf kld_list) with the message that the kernel and module versions are out of sync and the module refuses to load.
> What is the exact revision you're running? There were some unrelated changes to the kernel linker around the same time.

The working src is from 2024-03-11-094351 (GMT+0100). The failing src was fetched after Gleb's stabilization week message (and today's src before the sound stuff still fails). Retpoline wasn't the cause, next test is the CTF stuff in the kernel...

I tried the workaround of loading the modules from the loader, which works, but then I can't log in remotely as ssh fails to allocate a pty. By loading modules via the loader, I can see messages about missing CTF info when the nvidia modules (from ports = not yet rebuilt = in /boot/modules/...ko instead of /boot/kernel/...ko) try to get initialised...
and it looks like they are failing to get initialised because of this missing CTF stuff (I'm back to the previous boot env to be able to log in remotely and send mails; I don't have a copy of the failure message at hand).

>> I assume the missing CTF stuff is due to the CTF-based pretty printing (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3). Is this supposed to fail to load modules which are compiled without CTF data? Shouldn't this work gracefully (e.g. spit out a warning that pretty printing is not available for module X and have the module working)?
> From my reading of linker_ctf_load_file(), this is exactly how it already works.

Great that it works this way; I still suggest printing a message explaining what the warning about the missing CTF data means.

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)
Hi, sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work (see below for the issue). As the monthly stabilisation pass didn't find obvious issues, it is something related to my setup:
- not a generic kernel
- very modular kernel (as much as possible as a module)
- bind_now (a build without fails too, tested with clean /usr/obj)
- ccache (a build without fails too, tested with clean /usr/obj)
- kernel retpoline (build without in progress)
- userland retpoline (build without in progress)
- kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't retpoline)
- -fno-builtin
- CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
- malloc production
- COPTFLAGS= -O2 -pipe

The issue is that kernel modules load OK from the loader, but once it starts init, any module load (e.g. via autodetection of hardware or rc.conf kld_list) fails with the message that the kernel and module versions are out of sync, and the module refuses to load.

I tried the workaround of loading the modules from the loader, which works, but then I can't log in remotely as ssh fails to allocate a pty. By loading modules via the loader, I can see messages about missing CTF info when the nvidia modules (from ports = not yet rebuilt = in /boot/modules/...ko instead of /boot/kernel/...ko) try to get initialised... and it looks like they are failing to get initialised because of this missing CTF stuff (I'm back to the previous boot env to be able to log in remotely and send mails; I don't have a copy of the failure message at hand).

I assume the missing CTF stuff is due to the CTF-based pretty printing (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3). Is this supposed to fail to load modules which are compiled without CTF data? Shouldn't this work gracefully (e.g. spit out a warning that pretty printing is not available for module X and have the module working)?
Next steps:
- try a world without retpoline (bind_now and ccache active)
- try a kernel without CTF (bind_now, ccache, retpoline active)
- try a world without bind_now, retpoline, CTF, CPUFLAGS, COPTFLAGS

If anyone has an idea how to debug this in some other way...

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Reason why "nocache" option is not displayed in "mount"?
Am 2024-03-10 22:57, schrieb Konstantin Belousov:

> We are already low on the free bits in the flags, even after expanding them to 64bit. More, there are useful common fs services continuously consuming that flags, e.g. the recent NFS TLS options. I object against using the flags for absolutely not important things, like this nullfs "cache" option. In long term, we would have to export nmount(2) strings since bits in flags are finite, but I prefer to delay it as much as possible.

Why do you want to delay this? Personal priorities, or technical reasons?

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Reason why "nocache" option is not displayed in "mount"?
Am 2024-03-09 15:27, schrieb Rick Macklem: On Sat, Mar 9, 2024 at 5:08 AM Alexander Leidinger wrote: Am 2024-03-09 06:07, schrieb Warner Losh: > On Thu, Mar 7, 2024 at 1:05 PM Jamie Landeg-Jones > wrote: > >> Alexander Leidinger wrote: >> >>> Hi, >>> >>> what is the reason why "nocache" is not displayed in the output of >>> "mount" for nullfs options? >> >> Good catch. I also notice that "hidden" is not shown either. >> >> I guess that as for some time, "nocache" was a "secret" option, no-one >> update "mount" to display it? > > So a couple of things to know. > > First, there's a list of known options. These are converted to a > bitmask. This is then decoded and reported by mount. The other strings > are passed to the filesystem directly. They decode it and do things, > but they don't export them (that I can find). I believe that's why they > aren't reported with 'mount'. There's a couple of other options in > /etc/fstab that are pseudo options too. That's the technical explanation why it doesn't work. I'm a step further since initial mail, I even had a look at the code and know that nocache is recorded in a nullfs private flag and that the userland can not access this (mount looks at struct statfs which doesn't provide info to this and some other things). My question was targeted more in the direction if there is a conceptual reason or if it was an oversight that it is not displayed. I admit that this was lost in translation... Regarding the issue of not being able to see all options which are in effect for a given mount point (not specific to nocache): I consider this to be a bug. Pseudo options like "late" or "noauto" in fstab which don't make sense to use when you use mount(8) a FS by hand, I do not consider here. As a data point, I added the "-m"option to nfsstat(1) so that all the nfs related options get displayed. Part of the problem is that this will be file system specific, since nmount() defers processing options to the file systems. 
There exist values for a lot of the mount options which are not displayed. For example, the nocache option for nullfs is MNTK_NULL_NOCACHE in https://cgit.freebsd.org/src/tree/sys/sys/mount.h#n515

This may not be usable as is, but it shows that some bits about it are already public, just not in the proper place to be useful to userland. Even FS-specific options could be set as part of statfs (by letting the FS set them in struct statfs). Or there could be a per-mount callback / ioctl / whatever which provides the options in some way to userland if requested. So we either have something which could be used but requires some interface to let a FS set a value somewhere, or, if that is too gross a hack, we would need to come up with a new interface to query this info.

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Reason why "nocache" option is not displayed in "mount"?
Am 2024-03-09 06:07, schrieb Warner Losh:
> On Thu, Mar 7, 2024 at 1:05 PM Jamie Landeg-Jones wrote:
>> Alexander Leidinger wrote:
>>> Hi, what is the reason why "nocache" is not displayed in the output of "mount" for nullfs options?
>> Good catch. I also notice that "hidden" is not shown either. I guess that as for some time, "nocache" was a "secret" option, no-one updated "mount" to display it?
> So a couple of things to know. First, there's a list of known options. These are converted to a bitmask. This is then decoded and reported by mount. The other strings are passed to the filesystem directly. They decode it and do things, but they don't export them (that I can find). I believe that's why they aren't reported with 'mount'. There's a couple of other options in /etc/fstab that are pseudo options too.

That's the technical explanation why it doesn't work. I'm a step further since the initial mail: I even had a look at the code and know that nocache is recorded in a nullfs-private flag and that userland can not access this (mount looks at struct statfs, which doesn't provide info about this and some other things). My question was targeted more in the direction of whether there is a conceptual reason, or whether it was an oversight that it is not displayed. I admit that this was lost in translation...

Regarding the issue of not being able to see all options which are in effect for a given mount point (not specific to nocache): I consider this to be a bug. Pseudo options like "late" or "noauto" in fstab, which don't make sense when you mount(8) a FS by hand, I do not consider here. I'm not sure if this warrants a bug tracker item (which maybe nobody is interested in taking ownership of), or if we need to extend the man pages with info about which options will not be displayed in the output for mounted FS, or both.

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Reason why "nocache" option is not displayed in "mount"?
Am 2024-03-07 14:59, schrieb Christos Chatzaras:

>> what is the reason why "nocache" is not displayed in the output of "mount" for nullfs options?
>>
>> # grep packages /etc/fstab.commit_leidinger_net
>> /shared/ports/packages /space/jails/commit.leidinger.net/shared/ports/packages nullfs rw,noatime,nocache 0 0
>> # mount | grep commit | grep packages
>> /shared/ports/packages on /space/jails/commit.leidinger.net/shared/ports/packages (nullfs, local, noatime, noexec, nosuid, nfsv4acls)
>>
>> Context: I wanted to check if poudriere is mounting with or without "nocache", and instead of reading the source I wanted to do it more quickly by looking at the mount options.
>
> In my setup, I mount the /home directory using nullfs with the nocache option to facilitate access for certain jails. The primary reason for employing nocache is the implementation of ZFS quotas on the main system, which do not accurately reflect changes in file usage by users within the jail unless nocache is used. When files are added or removed by a user within the jail, their disk usage wasn't properly updated on the main system until I started using nocache. Based on this experience, I'm confident that applying nocache works as expected in your scenario as well.

It does. The question is how do I _see_ that a mount point is _set up_ with nocache? In the above example the FS _is_ mounted with nocache, but it is _not displayed_ in the output.

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Reason why "nocache" option is not displayed in "mount"?
Hi, what is the reason why "nocache" is not displayed in the output of "mount" for nullfs options?

# grep packages /etc/fstab.commit_leidinger_net
/shared/ports/packages /space/jails/commit.leidinger.net/shared/ports/packages nullfs rw,noatime,nocache 0 0
# mount | grep commit | grep packages
/shared/ports/packages on /space/jails/commit.leidinger.net/shared/ports/packages (nullfs, local, noatime, noexec, nosuid, nfsv4acls)

Context: I wanted to check if poudriere is mounting with or without "nocache", and instead of reading the source I wanted to do it more quickly by looking at the mount options.

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: main [so: 15] context, 7950X3D and RTL8251/8153 based Ethernet dongle: loss of state, example log information
On 04.03.2024 15:33, Jakob Alvermark wrote:
> On 3/4/24 21:13, Alexander Motin wrote:
>> On 04.03.2024 15:00, Poul-Henning Kamp wrote:
>>> Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to DOWN
>>> Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to UP
>>> Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to DOWN
>>> Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to UP
>>> Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to DOWN
>>> Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to UP
>>> I consistently had similar problems with my 0x17ef/0x3066 "ThinkPad Thunderbolt 3 Dock MCU", but they went away after I forced it to use the if_cdce driver instead with this quirk:
>>> /* This works much better with if_cdce than if_ure */
>>> USB_QUIRK(LENOVO, TBT3LAN, 0x, 0x, UQ_CFG_INDEX_1),
>> AFAIK it is only a workaround. I saw it myself on a number of different USB dongles and laptops: USB starts experiencing problems with multiple NIC queues and some other factors. IIRC the Realtek driver was much more stable once I limited it to one queue and applied some other hacks. IIRC if_cdce has only one queue and other limitations, which not only makes it more stable, but also much slower. It would be good to understand what exactly is wrong there, since IMHO it is a big problem now. Unfortunately HPS was unable to reproduce it on his laptop (which makes me wonder if it is specific to chipset(s) or thunderbolt?), so it ended nowhere so far.
> I have a Lenovo USB 3 dongle, so no thunderbolt.

I also use USB3 dongles. But in my laptops the USB 3 ports are provided by an Intel Thunderbolt controller, while in HPS's they came from a plain USB3 controller. Though it may be just a coincidence.

> USB ID 0x17ef/0x7205 rgephy1: PHY 0 on miibus1
> I tried using the cdce driver, it gives me < 100Mb/s, while the ure driver gets > 500Mb/s

Right, I saw about the same.

-- Alexander Motin
Re: main [so: 15] context, 7950X3D and RTL8251/8153 based Ethernet dongle: loss of state, example log information
On 04.03.2024 15:00, Poul-Henning Kamp wrote:
> Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to DOWN
> Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to UP
> Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to DOWN
> Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to UP
> Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to DOWN
> Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to UP
> I consistently had similar problems with my 0x17ef/0x3066 "ThinkPad Thunderbolt 3 Dock MCU", but they went away after I forced it to use the if_cdce driver instead with this quirk:
> /* This works much better with if_cdce than if_ure */
> USB_QUIRK(LENOVO, TBT3LAN, 0x, 0x, UQ_CFG_INDEX_1),

AFAIK it is only a workaround. I have seen it myself on a number of different USB dongles and laptops: USB starts experiencing problems with multiple NIC queues and some other factors. IIRC the Realtek driver was much more stable once I limited it to one queue and applied some other hacks. IIRC if_cdce has only one queue and other limitations, which not only makes it more stable, but also much slower. It would be good to understand what exactly is wrong there, since IMHO it is a big problem now. Unfortunately HPS was unable to reproduce it on his laptop (which makes me wonder if it is specific to chipset(s) or thunderbolt?), so it has ended nowhere so far.

-- Alexander Motin
Re: February 2024 stabilization week
Am 2024-02-24 21:18, schrieb Konstantin Belousov:
> On Fri, Feb 23, 2024 at 08:34:21PM -0800, Gleb Smirnoff wrote:
>> Hi FreeBSD/main users, the February 2024 stabilization week started with 03cc3489a02d that was tagged as main-stabweek-2024-Feb. At the moment of the tag creation we already knew about several regressions caused by the libc/libsys split. In the stabilization branch stabweek-2024-Feb we accumulated the following cherry-picks from FreeBSD/main:
>> 1) closefrom() syscall was failing unless you have COMPAT_FREEBSD12 in kernel
>> 99ea67573164637d633e8051eb0a5d52f1f9488e
>> eb90239d08863bcff3cf82a556ad9d89776cdf3f
>> 2) nextboot -k broken on ZFS
>> 3aefe6759669bbadeb1a24a8956bf222ce279c68
>> 0c3ade2cf13df1ed5cd9db4081137ec90fcd19d0
>> 3) libsys links to libc
>> baa7d0741b9a2117410d558c6715906980723eed
>> 4) sleep(3) no longer being a pthread cancellation point
>> 7d233b2220cd3d23c028bdac7eb3b6b7b2025125
>> We are aware of two regressions still unresolved:
>> 1) libsys/rtld breaks bind 9.18 / mysql / java / ... https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277222 Konstantin, can you please check me? Is this the same issue fixed by baa7d0741b9a2117410d558c6715906980723eed or a different one?
> Most likely. Since no useful diagnostic was provided, I cannot confirm.

It is. And for the curious reader: this affected a world which was built with WITH_BIND_NOW (ports built with RELRO and BIND_NOW were unaffected, as long as the base system was not built with BIND_NOW).

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: sanitizers broken (was RE: libc/libsys split coming soon)
Am 2024-02-21 10:52, schrieb hartmut.bra...@dlr.de:
> Hi, I updated yesterday and now even a minimal program with cc -fsanitize=address produces
> ld: error: undefined symbol: __elf_aux_vector
> referenced by sanitizer_linux_libcdep.cpp:950 (/usr/src/contrib/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp:950)
> sanitizer_linux_libcdep.o:(__sanitizer::ReExec()) in archive /usr/lib/clang/17/lib/freebsd/libclang_rt.asan-x86_64.a
> cc: error: linker command failed with exit code 1 (use -v to see invocation)
> I think this is caused by the libsys split.

There are other issues too, discussed in multiple places. I opened https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277222 this morning; maybe it can be used to centralize the libsys issues (= I don't mind if you add a comment there, but maybe brooks wants to have a separate PR).

Bye, Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: segfault in ld-elf.so.1
Am 2024-02-13 01:58, schrieb Konstantin Belousov: On Mon, Feb 12, 2024 at 11:54:02AM +0200, Konstantin Belousov wrote: On Mon, Feb 12, 2024 at 10:35:56AM +0100, Alexander Leidinger wrote: > Hi, > > dovecot (and no other program I use on this machine... at least not that I > notice it) segfaults in ld-elf.so.1 after an update from 2024-01-18-092730 > to 2024-02-10-144617 (and now 2024-02-11-212006 in the hope the issue would > have been fixed by changes to libc/libsys since 2024-02-10-144617). The > issue shows up when I try to do an IMAP login. A successful authentication > starts the imap process which immediately segfaults. > > I didn't recompile dovecot for the initial update, but I did now to rule > out a regression in this area (and to get access via imap to my normal mail > account). > > > Backtrace: The backtrace looks incomplete. It might be the case of infinite recursion, but I cannot claim it from the trace. Does the program segfault if you run it manually? If yes, please provide No. me with the tarball of the binary and all required shared libs, including base system libraries, from your machine. Regardless of my request, you might try the following. Note that I did not test the patch; ensure that you have a way to recover ld-elf.so.1 if something goes wrong. [inline patch] This did the trick and I have IMAP access to my emails again. As this runs in a jail, it was easy to test without fear of killing something. I will try the patch in the review next. Bye, Alexander.
kernel crash in tcp_subr.c:2386
Hi, I got a coredump with sources from 2024-02-10-144617 (GMT+0100): ---snip--- __curthread () at /space/system/usr_src/sys/amd64/include/pcpu_aux.h:57 57 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /space/system/usr_src/sys/amd64/include/pcpu_aux.h:57 td = #1 doadump (textdump=textdump@entry=1) at /space/system/usr_src/sys/kern/kern_shutdown.c:403 error = 0 coredump = #2 0x8052fe85 in kern_reboot (howto=260) at /space/system/usr_src/sys/kern/kern_shutdown.c:521 once = 0 __pc = #3 0x80530382 in vpanic ( fmt=0x808df476 "Assertion %s failed at %s:%d", ap=ap@entry=0xfe08a079ebf0) at /space/system/usr_src/sys/kern/kern_shutdown.c:973 buf = "Assertion !callout_active(&tp->t_callout) failed at /space/system/usr_src/sys/netinet/tcp_subr.c:2386", '\000' __pc = __pc = __pc = other_cpus = {__bits = {14680063, 0 }} td = 0xf8068ef99740 bootopt = newpanic = #4 0x805301d3 in panic (fmt=) at /space/system/usr_src/sys/kern/kern_shutdown.c:889 ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0xfe08a079ec20, reg_save_area = 0xfe08a079ebc0}} #5 0x806c9d8c in tcp_discardcb (tp=tp@entry=0xf80af441ba80) at /space/system/usr_src/sys/netinet/tcp_subr.c:2386 inp = 0xf80af441ba80 so = 0xf804d23d2780 m = isipv6 = #6 0x806d6291 in tcp_usr_detach (so=0xf804d23d2780) at /space/system/usr_src/sys/netinet/tcp_usrreq.c:214 inp = 0xf80af441ba80 tp = 0xf80af441ba80 #7 0x805dba57 in sofree (so=0xf804d23d2780) at /space/system/usr_src/sys/kern/uipc_socket.c:1205 pr = 0x80a8bd18 #8 sorele_locked (so=so@entry=0xf804d23d2780) at /space/system/usr_src/sys/kern/uipc_socket.c:1232 No locals. #9 0x805dc8c0 in soclose (so=0xf804d23d2780) at /space/system/usr_src/sys/kern/uipc_socket.c:1302 lqueue = {tqh_first = 0xf8068ef99740, tqh_last = 0xfe08a079ed40} error = 0 saved_vnet = 0x0 last = listening = #10 0x804ccbd1 in fo_close (fp=0xf805f2dfc500, td=) at /space/system/usr_src/sys/sys/file.h:390 No locals. 
#11 _fdrop (fp=fp@entry=0xf805f2dfc500, td=, td@entry=0xf8068ef99740) at /space/system/usr_src/sys/kern/kern_descrip.c:3666 count = error = #12 0x804d02f3 in closef (fp=fp@entry=0xf805f2dfc500, td=td@entry=0xf8068ef99740) at /space/system/usr_src/sys/kern/kern_descrip.c:2839 _error = 0 _fp = 0xf805f2dfc500 lf = {l_start = -8791759350504, l_len = -8791759350528, l_pid = 0, l_type = 0, l_whence = 0, l_sysid = 0} vp = fdtol = fdp = #13 0x804cd50c in closefp_impl (fdp=0xfe07afebf860, fd=19, fp=0xf805f2dfc500, td=0xf8068ef99740, audit=) at /space/system/usr_src/sys/kern/kern_descrip.c:1315 error = #14 closefp (fdp=0xfe07afebf860, fd=19, fp=0xf805f2dfc500, td=0xf8068ef99740, holdleaders=true, audit=) at /space/system/usr_src/sys/kern/kern_descrip.c:1372 No locals. #15 0x808597d6 in syscallenter (td=0xf8068ef99740) at /space/system/usr_src/sys/amd64/amd64/../../kern/subr_syscall.c:186 se = 0x80a48330 p = 0xfe07f29995c0 sa = 0xf8068ef99b30 error = sy_thr_static = traced = #16 amd64_syscall (td=0xf8068ef99740, traced=0) at /space/system/usr_src/sys/amd64/amd64/trap.c:1192 ksi = {ksi_link = {tqe_next = 0xfe08a079ef30, tqe_prev = 0x808588af }, ksi_info = { si_signo = 1, si_errno = 0, si_code = 2015268872, si_pid = -512, si_uid = 2398721856, si_status = -2042, si_addr = 0xfe08a079ef40, si_value = {sival_int = -1602621824, sival_ptr = 0xfe08a079ee80, sigval_int = -1602621824, sigval_ptr = 0xfe08a079ee80}, _reason = {_fault = { _trapno = 1489045984}, _timer = {_timerid = 1489045984, _overrun = 17999}, _mesgq = {_mqd = 1489045984}, _poll = { _band = 77306605406688}, _capsicum = {_syscall = 1489045984}, __spare__ = {__spare1__ = 77306605406688, __spare2__ = { 1489814048, 17999, 208, 0, 0, 0, 992191072, ksi_flags = 975329968, ksi_sigq = 0x8082f8f3 } #17 No locals. #18 0x3af13b17fc9a in ?? () No symbol table info available. Backtrace stopped: Cannot access memory at address 0x3af13a225ab8 ---snip--- Any ideas? 
Due to another issue in userland, I updated to 2024-02-11-212006, but I have the above mentioned version and core still in a BE if needed. Bye, Alexander.
segfault in ld-elf.so.1
.1`symlook_obj [inlined] load_filtees(obj=0x49a47c228008, flags=0, lockstate=0x1ded0f98cb80) at rtld.c:2589:2 frame #29: 0x4d3dfa2a223e ld-elf.so.1`symlook_obj(req=0x1ded011519c0, obj=0x49a47c228008) at rtld.c:4735:6 frame #30: 0x4d3dfa2a6992 ld-elf.so.1`symlook_list(req=0x1ded01151a48, objlist=, dlp=0x1ded01151b90) at rtld.c:4637:13 frame #31: 0x4d3dfa2a680b ld-elf.so.1`symlook_global(req=0x1ded01151b50, donelist=0x1ded01151b90) at rtld.c:4541:8 frame #32: 0x4d3dfa2a6673 ld-elf.so.1`get_program_var_addr(name=, lockstate=0x1ded0f98cb80) at rtld.c:4483:9 frame #33: 0x4d3dfa2a4374 ld-elf.so.1`dlopen_object [inlined] distribute_static_tls(list=0x1ded01152068, lockstate=0x1ded0f98cb80) at rtld.c:5908:6 frame #34: 0x4d3dfa2a4364 ld-elf.so.1`dlopen_object(name="", fd=-1, refobj=0x49a47c228008, lo_flags=0, mode=1, lockstate=0x1ded0f98cb80) at rtld.c:3831:6 frame #35: 0x4d3dfa2a2274 ld-elf.so.1`symlook_obj [inlined] load_filtee1(obj=, needed=0x49a47c2007c8, flags=, lockstate=) at rtld.c:2576:16 frame #36: 0x4d3dfa2a2245 ld-elf.so.1`symlook_obj [inlined] load_filtees(obj=0x49a47c228008, flags=0, lockstate=0x1ded0f98cb80) at rtld.c:2589:2 frame #37: 0x4d3dfa2a223e ld-elf.so.1`symlook_obj(req=0x1ded01152160, obj=0x49a47c228008) at rtld.c:4735:6 frame #38: 0x4d3dfa2a6992 ld-elf.so.1`symlook_list(req=0x1ded011521e8, objlist=, dlp=0x1ded01152330) at rtld.c:4637:13 frame #39: 0x4d3dfa2a680b ld-elf.so.1`symlook_global(req=0x1ded011522f0, donelist=0x1ded01152330) at rtld.c:4541:8 frame #40: 0x4d3dfa2a6673 ld-elf.so.1`get_program_var_addr(name=, lockstate=0x1ded0f98cb80) at rtld.c:4483:9 frame #41: 0x4d3dfa2a4374 ld-elf.so.1`dlopen_object [inlined] distribute_static_tls(list=0x1ded01152808, lockstate=0x1ded0f98cb80) at rtld.c:5908:6 frame #42: 0x4d3dfa2a4364 ld-elf.so.1`dlopen_object(name="", fd=-1, refobj=0x49a47c228008, lo_flags=0, mode=1, lockstate=0x1ded0f98cb80) at rtld.c:3831:6 frame #43: 0x4d3dfa2a2274 ld-elf.so.1`symlook_obj [inlined] load_filtee1(obj=, 
needed=0x49a47c2007c8, flags=, lockstate=) at rtld.c:2576:16 frame #44: 0x4d3dfa2a2245 ld-elf.so.1`symlook_obj [inlined] load_filtees(obj=0x49a47c228008, flags=0, lockstate=0x1ded0f98cb80) at rtld.c:2589:2 frame #45: 0x4d3dfa2a223e ld-elf.so.1`symlook_obj(req=0x1ded01152900, obj=0x49a47c228008) at rtld.c:4735:6 frame #46: 0x4d3dfa2a6992 ld-elf.so.1`symlook_list(req=0x1ded01152988, objlist=, dlp=0x1ded01152ad0) at rtld.c:4637:13 frame #47: 0x4d3dfa2a680b ld-elf.so.1`symlook_global(req=0x1ded01152a90, donelist=0x1ded01152ad0) at rtld.c:4541:8 frame #48: 0x4d3dfa2a6673 ld-elf.so.1`get_program_var_addr(name=, lockstate=0x1ded0f98cb80) at rtld.c:4483:9 frame #49: 0x4d3dfa2a4374 ld-elf.so.1`dlopen_object [inlined] distribute_static_tls(list=0x1ded01152fa8, lockstate=0x1ded0f98cb80) at rtld.c:5908:6 frame #50: 0x4d3dfa2a4364 ld-elf.so.1`dlopen_object(name="", fd=-1, refobj=0x49a47c228008, lo_flags=0, mode=1, lockstate=0x1ded0f98cb80) at rtld.c:3831:6 frame #51: 0x4d3dfa2a2274 ld-elf.so.1`symlook_obj [inlined] load_filtee1(obj=, needed=0x49a47c2007c8, flags=, lockstate=) at rtld.c:2576:16 frame #52: 0x4d3dfa2a2245 ld-elf.so.1`symlook_obj [inlined] load_filtees(obj=0x49a47c228008, flags=0, lockstate=0x1ded0f98cb80) at rtld.c:2589:2 frame #53: 0x4d3dfa2a223e ld-elf.so.1`symlook_obj(req=0x1ded011530a0, obj=0x49a47c228008) at rtld.c:4735:6 frame #54: 0x4d3dfa2a6992 ld-elf.so.1`symlook_list(req=0x1ded01153128, objlist=, dlp=0x1ded01153270) at rtld.c:4637:13 frame #55: 0x4d3dfa2a680b ld-elf.so.1`symlook_global(req=0x1ded01153230, donelist=0x1ded01153270) at rtld.c:4541:8 frame #56: 0x4d3dfa2a6673 ld-elf.so.1`get_program_var_addr(name=, lockstate=0x1ded0f98cb80) at rtld.c:4483:9 ---snip--- Bye, Alexander.
Re: noatime on ufs2
Am 2024-01-30 01:21, schrieb Warner Losh: On Mon, Jan 29, 2024 at 2:31 PM Olivier Certner wrote: It also seems undesirable to add a sysctl to control a value that the kernel doesn't use. The kernel has to use it to guarantee some uniform behavior irrespective of the mount being performed through mount(8) or by a direct call to nmount(2). I think this consistency is important. Perhaps all auto-mounters and mount helpers always run mount(8) and never deal with nmount(2), I would have to check (I seem to remember that, a long time ago, when nmount(2) was introduced as an enhancement over mount(2), the stance was that applications should use mount(8) and not nmount(2) directly). Even if there were no obvious callers of nmount(2), I would be a bit uncomfortable with this discrepancy in behavior. I disagree. I think Mike's suggestion was better and dealt with POLA and POLA breaking in a sane way. If the default is applied universally in user space, then we need not change the kernel at all. We lose all the chicken and egg problems and the non-linearness of the sysctl idea. I would like to add that a sysctl is some kind of a hidden setting, whereas /etc/fstab + /etc/defaults/fstab is a "right in the face" way of setting filesystem / mount related stuff. [...] It could also be generalized so that the FSTYPE could have different settings for different types of filesystem (maybe unique flags that some file systems don't understand). +1 nosuid for tmpfs comes into my mind here... One could also put it in /etc/defaults/fstab too and not break POLA since that's the pattern we use elsewhere. +1 Anyway, I've said my piece. I agree with Mike that there's consensus for this from the installer, and after that consensus falls away. Mike's idea is one that I can get behind since it elegantly solves the general problem. +1 Bye, Alexander. 
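To make Mike's idea concrete: none of the following exists today; it is only a sketch of what a per-FSTYPE defaults file could look like, with every name and the column layout being my guess, not an implemented format.

```
# Hypothetical /etc/defaults/fstab (proposal sketch, nothing here is
# implemented): per-FSTYPE default mount options, applied by mount(8)
# when an fstab entry does not specify them explicitly.
# FSTYPE   Default-Options
ufs        rw,noatime
zfs        rw,noatime
tmpfs      rw,nosuid
```

An fstab line that spells out its own options would override these, which keeps the "right in the face" property of /etc/fstab that the thread argues for.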
Re: Removing fdisk and bsdlabel (legacy partition tools)
Am 2024-01-25 18:49, schrieb Rodney W. Grimes: On Thu, Jan 25, 2024, 9:11 AM Ed Maste wrote: > On Thu, 25 Jan 2024 at 11:00, Rodney W. Grimes > wrote: > > > > > These will need to be addressed before actually removing any of these > > > binaries, of course. > > > > You seem to have missed /rescue. Now think about that long > > and hard, these tools classified as so important that they > > are part of /rescue. Again I can not stress enough how often > > I turn to these tools in a repair mode situation. > > I haven't missed rescue, it is included in the work in progress I > mentioned. Note that rescue has included gpart since 2007. > What can fdisk and/or disklabel repair that gpart can't? As far as I know there is no way in gpart to get to the MBR cyl/hd/sec values, you can only get to the LBA start and end values: sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD) start 63, size 8388513 (4095 Meg), flag 80 (active) beg: cyl 0/ head 1/ sector 1; end: cyl 1023/ head 15/ sector 63 gpart show ada0 => 63 8388545 ada0 MBR (4.0G) 63 8388513 1 freebsd [active] (4.0G) 8388576 32- free - (16K) What are you using cyl/hd/sec values for on a system which runs FreeBSD current or on which you would have to use FreeBSD-current in case of a repair need? What is the disk hardware on those systems that you still need cyl/hd/sec and LBA doesn't work? Serious questions out of curiosity. Bye, Alexander.
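For what it's worth, the CHS values fdisk prints can be recomputed from the LBA values gpart shows. This is only a sketch (the helper name is mine, and it assumes the classic 16-head/63-sector translation that the fdisk output above uses, with the usual clamp of the cylinder field at 1023):

```shell
# Hypothetical helper: convert an LBA into the legacy "cyl/head/sector"
# triple that fdisk prints, assuming H=16 heads and S=63 sectors/track.
lba_to_chs() {
    # $1 = LBA; prints "cyl/head/sector"
    awk -v lba="$1" -v H=16 -v S=63 'BEGIN {
        c = int(lba / (H * S));
        if (c > 1023) c = 1023;   # MBR cylinder field is 10 bits wide
        h = int(lba / S) % H;
        s = (lba % S) + 1;        # CHS sectors are 1-based
        printf "%d/%d/%d\n", c, h, s;
    }'
}

lba_to_chs 63        # start of the slice above -> 0/1/1
lba_to_chs 8388575   # last sector of the slice -> 1023/15/63
```

Both results match the beg/end values in the fdisk output quoted above, so for this common translation nothing is lost by only having the LBA values.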
Re: noatime on ufs2
e of having 'noatime' the default is in less tweaking by most people, and one less thing to worry about (for them). I proposed in another mail having a sysctl which indicates the default ('noatime' or 'atime') for all filesystems. This default would be used at mount time if neither 'atime' nor 'noatime' is explicitly specified. That way, people wanting 'noatime' by default everywhere could just set it to that. It may also convince reticent people to have the default (i.e., this sysctl's default value) changed to 'noatime', by providing a very simple way to revert to the old behavior. While I agree that this would be an easy way of globally changing the default, what makes noatime special compared to nocover, or nfs4acl, or noexec, or nosuid, or whatever other option? Mounting noexec and nosuid by default, and explicitly mounting suid/exec only those filesystems which really need it, would be a security benefit. And cover/nocover would prevent accidental foot-shooting. Where do you want to draw the line between "easy" and "explicit"? Only having atime/noatime handled like that looks inconsistent to me (which, I hope, not only I think is a POLA violation). I fully agree with you regarding switching to noatime by default. I think this should not be done by changing the defaults in each FS. I think that having a sysctl only for atime/noatime is an ugly inconsistency (probably I wouldn't use a generic framework which handles all sensible mount options like that, and I think it would be overkill, but I wouldn't object to it). In my opinion the correct way of handling it is to ask the user at install time, and existing systems shall be handled by those who administrate them (don't touch an existing fstab; changing the default in the automounter config for a .0 release would be OK in my opinion, for a .x release in the middle of a stable branch I would add a commented-out noatime option to make it visible but not active). Bye, Alexander. 
Re: noatime on ufs2
Am 2024-01-11 18:15, schrieb Rodney W. Grimes: Am 2024-01-10 22:49, schrieb Mark Millard: > I never use atime, always noatime, for UFS. That said, I'd never > propose > changing the long standing defaults for commands and calls. I'd avoid: [good points I fully agree on] There's one possibility which nobody talked about yet... changing the default to noatime at install time in fstab / zfs set. Perhaps you should take a closer look at what bsdinstall does when it creates a zfs install pool and boot environment, you might just find that noatime is already set everywhere but on /var/mail: /usr/libexec/bsdinstall/zfsboot:: ${ZFSBOOT_POOL_CREATE_OPTIONS:=-O compress=lz4 -O atime=off} /usr/libexec/bsdinstall/zfsboot:/var/mail atime=on While zfs is a part of what I talked about, it is not the complete picture. bsdinstall covers UFS and ZFS, and we should keep them in sync in this regard. Ideally with an option the user can modify. Personally I don't mind if the default setting for this option would be noatime. A quick search in the scripts of bsdinstall didn't reveal to me what we use for UFS. I assume we use atime. I fully agree to not violate POLA by changing the default to noatime in any FS. I always set noatime everywhere on systems I take care about, no exceptions (any user visible mail is handled via maildir/IMAP, not mbox). I haven't made up my mind if it would be a good idea to change bsdinstall to set noatime (after asking the user about it, and later maybe offer the possibility to use relatime in case it gets implemented). I think it is at least worthwhile to discuss this possibility (including what the default setting of bsdinstall should be for this option). Little late... iirc it's been that way since day one of zfs support in bsdinstall. Which I don't mind, as this is what I use anyway. But the correct way would be to let the user decide. Bye, Alexander. 
Re: noatime on ufs2
Am 2024-01-10 22:49, schrieb Mark Millard: I never use atime, always noatime, for UFS. That said, I'd never propose changing the long standing defaults for commands and calls. I'd avoid: [good points I fully agree on] There's one possibility which nobody talked about yet... changing the default to noatime at install time in fstab / zfs set. I fully agree to not violate POLA by changing the default to noatime in any FS. I always set noatime everywhere on systems I take care about, no exceptions (any user visible mail is handled via maildir/IMAP, not mbox). I haven't made up my mind if it would be a good idea to change bsdinstall to set noatime (after asking the user about it, and later maybe offer the possibility to use relatime in case it gets implemented). I think it is at least worthwhile to discuss this possibility (including what the default setting of bsdinstall should be for this option). Bye, Alexander.
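Concretely, "noatime at install time" amounts to two small changes per installed system. This is a sketch; the device and pool names are placeholders, not anything bsdinstall actually writes today:

```shell
# UFS: add noatime to the options column of the root entry in /etc/fstab
# (device name is a placeholder):
#   /dev/ada0p2   /   ufs   rw,noatime   1   1

# ZFS: atime is a dataset property; setting it on the pool's root dataset
# lets all child datasets inherit it ("zroot" is a placeholder name):
zfs set atime=off zroot
```

Individual datasets or fstab entries can still opt back in (atime=on, or the atime mount option) where access times matter, e.g. /var/mail with mbox delivery.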
Re: ZFS problems since recently ?
John, On 04.01.2024 09:20, John Kennedy wrote: On Tue, Jan 02, 2024 at 08:02:04PM -0800, John Kennedy wrote: On Tue, Jan 02, 2024 at 05:51:32PM -0500, Alexander Motin wrote: On 01.01.2024 08:59, John Kennedy wrote: ... My poudriere build did eventually fail as well: ... [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success [05:40:24] Stopping 2 builders panic: VERIFY(BP_GET_DEDUP(bp)) failed Please see/test: https://github.com/openzfs/zfs/pull/15732 . It came back today at the end of my poudriere build. Your patch has fixed it, so far at least. At the risk of conflating this with other ZFS issues, I beat on the VM a lot more last night without triggering any panics. My usual busy-workload is a total kernel+world rebuild (with whatever pending patches might be out), then a poudriere run (~230 or so packages). It's weird that the first (much bigger) run worked but later ones didn't (where maybe I had one port that failed to build), triggering the panic. Seemed repeatable, but I don't have a feel for the exact trigger like the sysctl issue. What is the panic you see now? It cannot be the same, since the dedup assertion is no longer there. -- Alexander Motin
Re: ZFS problems since recently ?
On 01.01.2024 08:59, John Kennedy wrote: On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote: markj@ pointed me in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039 to https://github.com/openzfs/zfs/pull/15719 So it will probably be fixed sooner or later. The other ZFS crashes I've seen are still an issue. My poudriere build did eventually fail as well: ... [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success [05:40:24] Stopping 2 builders panic: VERIFY(BP_GET_DEDUP(bp)) failed Please see/test: https://github.com/openzfs/zfs/pull/15732 . -- Alexander Motin
Re: ZFS problems since recently ?
Am 2024-01-02 08:22, schrieb Kurt Jaeger: Hi! The sysctl for block cloning is vfs.zfs.bclone_enabled. To check if a pool has made use of block cloning: zpool get all poolname | grep bclone One more thing: I have two pools on that box, and one of them has some bclone files: # zpool get all ref | grep bclone ref bcloneused 21.8M - ref bclonesaved 24.4M - ref bcloneratio 2.12x - # zpool get all pou | grep bclone pou bcloneused 0 - pou bclonesaved 0 - pou bcloneratio 1.00x - The ref pool contains the system and some files. The pou pool is for poudriere only. How do I find which files on ref are bcloned and how can I remove the bcloning from them ? No idea about the detection (I don't expect an easy way), but the answer to the second part is to copy the files after disabling block cloning. As this is system stuff, I would expect it is not much data, and you could copy everything and then move it back to the original place. I would also assume original log files are not affected, and only files which were copied (installworld or installkernel or backup files or manual copies or port install (not sure about pkg install)) are possible targets. Bye, Alexander.
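The zpool get check above can be wrapped in a small helper. This is a sketch (the function name and the parsing are mine, not from the thread); it reads `zpool get all <pool>` style output on stdin, so it can be fed live or saved output:

```shell
# Hypothetical helper: succeed (exit 0) when the pool's bcloneused
# property is non-zero, i.e. the pool has made use of block cloning.
# Input: `zpool get` output (columns: NAME PROPERTY VALUE SOURCE).
pool_uses_bclone() {
    awk '$2 == "bcloneused" && $3 != "0" && $3 != "-" { found = 1 }
         END { exit !found }'
}

# Example with the value posted for the "ref" pool:
printf 'ref bcloneused 21.8M -\n' | pool_uses_bclone && echo "bclone in use"
```

Fed the "pou" pool's line (bcloneused 0) the helper exits non-zero, matching the observation that only ref has cloned blocks.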
Re: bridge(4) and IPv6 broken?
Am 2024-01-02 00:40, schrieb Lexi Winter: hello, i'm having an issue with bridge(4) and IPv6, with a configuration which is essentially identical to a working system running releng/14.0. ifconfig: lo0: flags=1008049 metric 0 mtu 16384 options=680003 inet 127.0.0.1 netmask 0xff00 inet6 ::1 prefixlen 128 inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1 groups: lo nd6 options=21 pflog0: flags=1000141 metric 0 mtu 33152 options=0 groups: pflog alc0: flags=1008943 metric 0 mtu 1500 options=c3098 ether 30:9c:23:a8:89:a0 inet6 fe80::329c:23ff:fea8:89a0%alc0 prefixlen 64 scopeid 0x3 media: Ethernet autoselect (1000baseT ) status: active nd6 options=1 wg0: flags=10080c1 metric 0 mtu 1420 options=8 inet 172.16.145.21 netmask 0x inet6 fd00:0:1337:cafe:::829a:595e prefixlen 128 groups: wg tunnelfib: 1 nd6 options=101 bridge0: flags=1008843 metric 0 mtu 1500 options=0 ether 58:9c:fc:10:ff:b6 inet 10.1.4.101 netmask 0xff00 broadcast 10.1.4.255 inet6 2001:8b0:aab5:104:3::101 prefixlen 64 id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200 root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0 member: tap0 flags=143 ifmaxaddr 0 port 6 priority 128 path cost 200 member: alc0 flags=143 ifmaxaddr 0 port 3 priority 128 path cost 55 groups: bridge nd6 options=1 tap0: flags=9903 metric 0 mtu 1500 options=8 ether 58:9c:fc:10:ff:89 groups: tap media: Ethernet 1000baseT status: no carrier nd6 options=29 the issue is that the bridge doesn't seem to respond to IPv6 ICMP Neighbour Solicitation. 
for example, while running ping, tcpdump shows this: 23:30:16.567071 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 13, length 16 23:30:16.634860 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32 23:30:17.567080 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 14, length 16 23:30:17.674842 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32 23:30:17.936956 1e:ab:48:c1:f6:62 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 166: fe80::1cab:48ff:fec1:f662 > ff02::1: ICMP6, router advertisement, length 112 23:30:18.567093 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 15, length 16 23:30:19.567104 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 16, length 16 23:30:19.567529 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32 fe80::1cab:48ff:fec1:f662 is the subnet router; it's sending solicitations but FreeBSD doesn't send a response, if i remove alc0 from the bridge and configure the IPv6 address directly on alc0 instead, everything works fine. i'm testing without any packet filter (ipfw/pf) in the kernel. it's possible i'm missing something obvious here; does anyone have an idea? Just an idea. 
I'm not sure if it is the right track... There is code in the kernel which ignores NS requests from "non-valid" sources (security / spoofing reasons). The NS request is from a link-local address. Your bridge has no link-local address (and your tap has the auto linklocal flag set, which I would have expected to be on the bridge instead). I'm not sure, but I would guess it could be because of this. If my guess is not too far off, I would suggest to try: - remove auto linklocal from tap0 (like for alc0) - add auto linklocal to bridge0 If this doesn't help, there is the sysctl net.inet6.icmp6.nd6_onlink_ns_rfc4861 which you could try to set to 1. Please read https://www.freebsd.org/security/advisories/FreeBSD-SA-08:10.nd6.asc before you do that. Bye, Alexander.
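Spelled out as an /etc/rc.conf sketch, the two suggested steps could look like this. Treat it as an untested guess: the interface names and the address are taken from the ifconfig output earlier in the thread, and whether moving the flag this way fixes the NS problem is exactly what is being debugged:

```shell
# /etc/rc.conf — sketch, untested: move the IPv6 auto link-local flag
# from tap0 to bridge0 so the bridge gets an fe80:: address.
ifconfig_tap0="up inet6 -auto_linklocal"
ifconfig_bridge0_ipv6="inet6 auto_linklocal 2001:8b0:aab5:104:3::101 prefixlen 64"
```

The same can be tried on a live system with `ifconfig tap0 inet6 -auto_linklocal` and `ifconfig bridge0 inet6 auto_linklocal` before committing it to rc.conf.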
Re: ZFS problems since recently ?
Am 2023-12-31 19:34, schrieb Kurt Jaeger: I already have vfs.zfs.dmu_offset_next_sync=0 which is supposed to disable block-cloning. It isn't. This one is supposed to fix an issue which is unrelated to block cloning (but can be amplified by block cloning). This issue has been fixed for some weeks, your Dec 23 build should not need it (when the issue happens, you have files with zeroes as parts of the data instead of the real data, and it happens only if you copy files at the same time as those files are modified, and then only if you happen to get the timing right). The sysctl for block cloning is vfs.zfs.bclone_enabled. To check if a pool has made use of block cloning: zpool get all poolname | grep bclone Bye, Alexander.
What is rc.d/opensm?
Hi, for my work on service jails (https://reviews.freebsd.org/D40370) I am trying to find out what opensm is. On my amd64 system I have neither a man page nor the binary (and man.freebsd.org doesn't know about opensm either). Bye, Alexander.
Re: openzfs and block cloning question
Am 2023-11-24 08:10, schrieb Oleksandr Kryvulia: Hi, Recently cperciva@ published in his twitter [1] that enabling the block cloning feature can lead to data loss on 14. Is this statement true for current? Since I am using current for daily work and block cloning is enabled by default, how can I verify that my data is not affected? Thank you. Block cloning may have an issue, or it does things which amplify an old existing issue, or there are two issues... The full story is at https://github.com/openzfs/zfs/issues/15526 To be on the safe side, you may want to have vfs.zfs.dmu_offset_next_sync=0 (loader.conf / sysctl.conf) for the moment. Bye, Alexander.
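For reference, the "safe side" settings discussed in this thread as a persistent config fragment. This is a sketch: the first line is the mitigation named above; the second additionally turns off block cloning itself (the knob is mentioned elsewhere in these threads), which you may or may not want:

```shell
# /etc/sysctl.conf (or /boot/loader.conf) — sketch of the mitigations
# discussed in the thread and in the linked OpenZFS issue #15526:
vfs.zfs.dmu_offset_next_sync=0   # avoid the hole-reporting race
vfs.zfs.bclone_enabled=0         # optionally disable block cloning entirely
```

Neither setting un-clones already cloned blocks; existing clones stay shared until the files are rewritten or copied afresh.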
Re: Request for Testing: TCP RACK
Am 2023-11-17 14:29, schrieb void: On Thu, Nov 16, 2023 at 10:13:05AM +0100, tue...@freebsd.org wrote: You can load the kernel module using kldload tcp_rack You can make the RACK stack the default stack using sysctl net.inet.tcp.functions_default=rack Hi, thank you for this. https://klarasystems.com/articles/using-the-freebsd-rack-tcp-stack/ mentions this needs to be set in /etc/src.conf : WITH_EXTRA_TCP_STACKS=1 Is this still the case? Context here is -current both in a vm and bare metal, on various machines, on various connections, from DSL to 10Gb. On a recent -current this is not needed anymore, it is part of the defaults now. But you may still need to compile the kernel with "option TCPHPTS" (until it's added to the defaults too). Is there a method (yet) for enabling this functionality in various -RELENG, maybe where one can compile in a vm built for that purpose, then transfer to the production vm? Copy the kernel which was built according to the article from Klara Systems to your target VM. Would it be expected to work on arm64? Yes (I use it on an ampere VM in the cloud). Bye, Alexander.
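The two commands quoted at the top of this message can be made persistent across reboots. A sketch, assuming the usual loader.conf module-loading convention for tcp_rack.ko:

```shell
# /boot/loader.conf — load the RACK stack module at boot
# (assumes the standard <module>_load="YES" convention):
tcp_rack_load="YES"

# /etc/sysctl.conf — make RACK the default TCP stack:
net.inet.tcp.functions_default=rack

# To verify at runtime which stacks are available and which is default:
#   sysctl net.inet.tcp.functions_available
```

On kernels without "option TCPHPTS" the module will not provide the RACK stack, so check the kernel config first as discussed above.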
Re: crash zfs_clone_range()
On 14.11.2023 12:44, Alexander Motin wrote: On 14.11.2023 12:39, Mateusz Guzik wrote: One of the vnodes is probably not zfs, I suspect this will do it (untested): diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c index 107cd69c756c..e799a7091b8e 100644 --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c @@ -6270,6 +6270,11 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap) goto bad_write_fallback; } } + + if (invp->v_mount->mnt_vfc != outvp->v_mount->mnt_vfc) { + goto bad_write_fallback; + } + if (invp == outvp) { if (vn_lock(outvp, LK_EXCLUSIVE) != 0) { goto bad_write_fallback; vn_copy_file_range() verifies for that: /* * If the two vnodes are for the same file system type, call * VOP_COPY_FILE_RANGE(), otherwise call vn_generic_copy_file_range() * which can handle copies across multiple file system types. */ *lenp = len; if (inmp == outmp || strcmp(inmp->mnt_vfc->vfc_name, outmp->mnt_vfc->vfc_name) == 0) error = VOP_COPY_FILE_RANGE(invp, inoffp, outvp, outoffp, lenp, flags, incred, outcred, fsize_td); else error = vn_generic_copy_file_range(invp, inoffp, outvp, outoffp, lenp, flags, incred, outcred, fsize_td); Thinking again, what happens if there are two nullfs mounts on top of two different file systems, one of which is indeed not ZFS? Do we need to add such checks to all of ZFS, NFS and FUSE (everything implementing VOP_COPY_FILE_RANGE), or is it the responsibility of nullfs or VFS? -- Alexander Motin
Re: crash zfs_clone_range()
On 14.11.2023 12:39, Mateusz Guzik wrote: One of the vnodes is probably not zfs, I suspect this will do it (untested): diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c index 107cd69c756c..e799a7091b8e 100644 --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c @@ -6270,6 +6270,11 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap) goto bad_write_fallback; } } + + if (invp->v_mount->mnt_vfc != outvp->v_mount->mnt_vfc) { + goto bad_write_fallback; + } + if (invp == outvp) { if (vn_lock(outvp, LK_EXCLUSIVE) != 0) { goto bad_write_fallback; vn_copy_file_range() verifies for that: /* * If the two vnodes are for the same file system type, call * VOP_COPY_FILE_RANGE(), otherwise call vn_generic_copy_file_range() * which can handle copies across multiple file system types. */ *lenp = len; if (inmp == outmp || strcmp(inmp->mnt_vfc->vfc_name, outmp->mnt_vfc->vfc_name) == 0) error = VOP_COPY_FILE_RANGE(invp, inoffp, outvp, outoffp, lenp, flags, incred, outcred, fsize_td); else error = vn_generic_copy_file_range(invp, inoffp, outvp, outoffp, lenp, flags, incred, outcred, fsize_td); -- Alexander Motin
Re: crash zfs_clone_range()
Hi Ronald,

As I can see, the clone request to ZFS came through nullfs, and it crashed immediately on entry. I've never been a VFS layer expert, but to me it looks like a nullfs problem, not a ZFS one. Is there a chance you were (un-)mounting something when this happened?

On 10.11.2023 05:12, Ronald Klop wrote:
Hi,

Had this crash today on RPI4/15-CURRENT.

FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #19 main-b0203aaa46-dirty: Sat Nov 4 11:48:33 CET 2023 ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG arm64

$ sysctl -a | grep bclon
vfs.zfs.bclone_enabled: 1

I started a jail with poudriere to build a package. The jail uses null mounts over ZFS.

[root]# cu -s 115200 -l /dev/cuaU0
Connected

db> bt
Tracing pid 95213 tid 100438 td 0xe1e97900
db_trace_self() at db_trace_self
db_stack_trace() at db_stack_trace+0x120
db_command() at db_command+0x2e4
db_command_loop() at db_command_loop+0x58
db_trap() at db_trap+0x100
kdb_trap() at kdb_trap+0x334
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0xf200
kdb_enter() at kdb_enter+0x48
vpanic() at vpanic+0x1dc
panic() at panic+0x48
data_abort() at data_abort+0x2fc
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0x9604
rms_rlock() at rms_rlock+0x1c
zfs_clone_range() at zfs_clone_range+0x68
zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
null_bypass() at null_bypass+0x118
vn_copy_file_range() at vn_copy_file_range+0x18c
kern_copy_file_range() at kern_copy_file_range+0x36c
sys_copy_file_range() at sys_copy_file_range+0x8c
do_el0_sync() at do_el0_sync+0x634
handle_el0_sync() at handle_el0_sync+0x48
--- exception, esr 0x5600

Oh.. While typing this I rebooted the machine and it happened again. I didn't start anything in particular, although the machine runs some jails.
x0: 0x00e0 x1: 0xa00090317a48 x2: 0xa000f79d4f00 x3: 0xa000c61a44a8 x4: 0xdeefe460 ($d.2 + 0xdd776560) x5: 0xa001250e4c00 x6: 0xe54025b5 ($d.5 + 0xc) x7: 0x030a x8: 0xe1559000 ($d.2 + 0xdfdd1100) x9: 0x0001 x10: 0x x11: 0x0001 x12: 0x0002 x13: 0x x14: 0x0001 x15: 0x x16: 0x016dce88 (__stop_set_modmetadata_set + 0x1310) x17: 0x004e0d44 (rms_rlock + 0x0) x18: 0xdeefe280 ($d.2 + 0xdd776380) x19: 0x x20: 0xdeefe460 ($d.2 + 0xdd776560) x21: 0x7fff x22: 0xa00090317a48 x23: 0xa000f79d4f00 x24: 0xa001067ef910 x25: 0x00e0 x26: 0xa000158a8000 x27: 0x x28: 0xa000158a8000 x29: 0xdeefe280 ($d.2 + 0xdd776380) sp: 0xdeefe280 lr: 0x01623564 (zfs_clone_range + 0x6c) elr: 0x004e0d60 (rms_rlock + 0x1c) spsr: 0xa045 far: 0x0108 esr: 0x9604 panic: data abort in critical section or under mutex cpuid = 1 time = 1699610885 KDB: stack backtrace: db_trace_self() at db_trace_self db_trace_self_wrapper() at db_trace_self_wrapper+0x38 vpanic() at vpanic+0x1a0 panic() at panic+0x48 data_abort() at data_abort+0x2fc handle_el1h_sync() at handle_el1h_sync+0x18 --- exception, esr 0x9604 rms_rlock() at rms_rlock+0x1c zfs_clone_range() at zfs_clone_range+0x68 zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c null_bypass() at null_bypass+0x118 vn_copy_file_range() at vn_copy_file_range+0x18c kern_copy_file_range() at kern_copy_file_range+0x36c sys_copy_file_range() at sys_copy_file_range+0x8c do_el0_sync() at do_el0_sync+0x634 handle_el0_sync() at handle_el0_sync+0x48 --- exception, esr 0x5600 KDB: enter: panic [ thread pid 3792 tid 100394 ] Stopped at kdb_enter+0x48: str xzr, [x19, #768] db> I'll keep the debugger open for a while. Can I type something for additional info? Regards, Ronald. -- Alexander Motin
Re: poudriere job && find jobs which received signal 11
Am 2023-10-18 09:54, schrieb Matthias Apitz:
Hello,

I'm compiling with poudriere on 14.0-CURRENT 1400094 amd64 "my" ports, from git October 14, 2023. In the last two days 2229 packages were produced fine; one job failed (p5-Gtk2-1.24993_3, which is known to be broken). This morning I was looking for something in /var/log/messages and noticed by accident that yesterday a few compilations failed:

# grep 'signal 11' /var/log/messages | grep -v conftest
Oct 17 10:58:02 jet kernel: pid 12765 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 10:59:32 jet kernel: pid 27104 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:07:38 jet kernel: pid 85640 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:08:17 jet kernel: pid 94451 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)
Oct 17 12:36:01 jet kernel: pid 77914 (cc1plus), jid 24, uid 65534: exited on signal 11 (core dumped)

As I said, this was without any of the 2229 jobs failing:

# cd /usr/local/poudriere/data/logs/bulk/140-CURRENT-ports20231014/latest-per-pkg
# ls -C1 | wc -l
2229
# grep -l 'build failure' *
p5-Gtk2-1.24993_3.log

How is it possible that the make engines didn't fail?

That can be part of configure runs which try to test some features.

The uid 65534 is the one used by poudriere; can I use the jid 24 somehow to find the job which received the signal 11? Or is the time the only way to

jid = jail ID, the first column in the output of "jls". If you have the poudriere runtime logs (where it lists which package it is processing ATM), you will see a number from 1 to the max number of jails which run in parallel. This number is part of the hostname of the jail. So if you have the poudriere jails still running, you can make a mapping from the jid to the name to the number, and together with the time you can see which package it was building at that time.
Unfortunately poudriere doesn't list the hostname of the builder nor the jid (feature request, anyone?). Example poudriere runtime log:
---snip---
[00:54:11] [03] [00:00:00] Building security/nss | nss-3.94
[00:56:46] [03] [00:02:35] Finished security/nss | nss-3.94: Success
[00:56:47] [03] [00:00:00] Building textproc/gsed | gsed-4.9
[00:57:41] [01] [00:06:18] Finished x11-toolkits/gtk30 | gtk3-3.24.34_1: Success
[00:57:42] [01] [00:00:00] Building devel/qt6-base | qt6-base-6.5.3
---snip---
While poudriere is running, jls reports this:
---snip---
# jls
jid  host.hostname
[...]
91   poudriere-bastille-default
92   poudriere-bastille-default
93   poudriere-bastille-default-job-01
94   poudriere-bastille-default-job-01
95   poudriere-bastille-default-job-02
96   poudriere-bastille-default-job-03
97   poudriere-bastille-default-job-02
98   poudriere-bastille-default-job-03
---snip---
So if we assume a coredump in jid 96 or 98, this means it was in builder 3. nss and gsed were built by poudriere builder number 3 (both about 56 minutes after the start of poudriere), and gtk30 and qt6-base by poudriere builder number 1. If we assume further that the coredumps are in the time range of 54 to 56 minutes after the poudriere start, the logs of nss may have a trace of it (or not; if it was part of configure, then you would have to do the configure run and check the messages to see if it generates similar coredumps).

look, which of the 4 poudriere engines were running at this time? I'd like to rerun/reproduce the package again.

Bye, Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF

signature.asc Description: OpenPGP digital signature
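The jid-to-builder mapping described above can be sketched in a few lines. This is an illustrative Python sketch, not part of poudriere; the hostname format is the one shown in the jls output above, and `builder_number` is a hypothetical helper name:

```python
import re

def builder_number(hostname):
    """Extract the poudriere builder number from a jail hostname like
    'poudriere-bastille-default-job-03'; returns None for the reference
    jails that carry no -job suffix."""
    m = re.search(r'-job-(\d+)$', hostname)
    return int(m.group(1)) if m else None

# jls output from the message above, as a jid -> hostname mapping.
jls = {
    93: "poudriere-bastille-default-job-01",
    95: "poudriere-bastille-default-job-02",
    96: "poudriere-bastille-default-job-03",
    98: "poudriere-bastille-default-job-03",
}

# A coredump logged with jid 96 maps to builder 3, whose [03] lines in the
# poudriere runtime log then narrow down which package was building.
assert builder_number(jls[96]) == 3
assert builder_number("poudriere-bastille-default") is None
```

Combined with the `[NN]` column of the runtime log and the kernel message timestamps, this gives the package that was building when the signal 11 occurred.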
Re: issue: poudriere jail update fails after recent changes around certctl
Am 2023-10-13 17:42, schrieb Dag-Erling Smørgrav: Alexander Leidinger writes: some change around certctl (world from 2023-10-09) has broken the poudriere jail update command. The complete install finishes, certctl is run, and then there is an exit code 1. This is because I have some certs listed as untrusted, and this seems to give a retval of 1 inside certctl. This only happens if a certificate is listed as both trusted and untrusted, and I'm pretty sure the previous version would return 1 in that case as well. Can you check? I compared /usr/share/certs/untrusted/ with /usr/share/certs/trusted/ and some of them match with certs in /usr/share/certs/trusted/. Nothing in /usr/local/etc/ssl/untrusted/, one cert (as hash) in /usr/local/etc/ssl/blacklisted/ which is also in /usr/share/certs/untrusted/. If FreeBSD provides some certs as trusted (as part of e.g. installworld), and I have some of them listed in untrusted, I would not expect an error case, but a failsafe action of not trusting them and not complaining... am I doing something wrong? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
issue: poudriere jail update fails after recent changes around certctl
Hi,

some change around certctl (world from 2023-10-09) has broken the poudriere jail update command. The complete install finishes, certctl is run, and then there is an exit code 1. This is because I have some certs listed as untrusted, and this seems to give a retval of 1 inside certctl.

Testcase: set a cert as untrusted and try to use "poudriere jail -u -j YOUR_JAIL_NAME -m src=/usr/src"

Relevant log:
---snip---
-- Installing everything completed on Fri Oct 13 10:00:04 CEST 2023
--
83.55 real 103.83 user 109.42 sys
certctl.sh: Skipping untrusted certificate ad088e1d (/space/poudriere/jails/poudriere-x11/etc/ssl/untrusted/ad088e1d.0)
[some more untrusted]
*** [installworld] Error code 1

make[1]: stopped in /space/system/usr_src
1 error

make[1]: stopped in /space/system/usr_src
make: stopped in /usr/src
[00:01:32] Error: Failed to 'make installworld'
---snip---

Bye, Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: git: 989c5f6da990 - main - freebsd-update: create deep BEs by default [really about if -r for bectl create should just go away]
Am 2023-10-12 07:08, schrieb Mark Millard:
I use the likes of:

BE                        Active Mountpoint Space Created
build_area_for-main-CA72  -      -          1.99G 2023-09-20 10:19
main-CA72                 NR     /          4.50G 2023-09-21 10:10

NAME                                 CANMOUNT  MOUNTPOINT
zopt0                                on        /zopt0
. . .
zopt0/ROOT                           on        none
zopt0/ROOT/build_area_for-main-CA72  noauto    none
zopt0/ROOT/main-CA72                 noauto    none
zopt0/poudriere                      on        /usr/local/poudriere
zopt0/poudriere/data                 on        /usr/local/poudriere/data
zopt0/poudriere/data/.m              on        /usr/local/poudriere/data/.m
zopt0/poudriere/data/cache           on        /usr/local/poudriere/data/cache
zopt0/poudriere/data/images          on        /usr/local/poudriere/data/images
zopt0/poudriere/data/logs            on        /usr/local/poudriere/data/logs
zopt0/poudriere/data/packages        on        /usr/local/poudriere/data/packages
zopt0/poudriere/data/wrkdirs         on        /usr/local/poudriere/data/wrkdirs
zopt0/poudriere/jails                on        /usr/local/poudriere/jails
zopt0/poudriere/ports                on        /usr/local/poudriere/ports
zopt0/tmp                            on        /tmp
zopt0/usr                            off       /usr
zopt0/usr/13_0R-src                  on        /usr/13_0R-src
zopt0/usr/alt-main-src               on        /usr/alt-main-src
zopt0/usr/home                       on        /usr/home
zopt0/usr/local                      on        /usr/local
[...]

If such ends up as unsupportable, it will effectively eliminate my reason for using bectl (and, so, zfs): the sharing is important to my use.

Additionally/complementary to what Kyle said... The -r option is about:

zopt0/ROOT/main-CA72
zopt0/ROOT/main-CA72/subDS1
zopt0/ROOT/main-CA72/subDS2

A shallow clone only takes zopt0/ROOT/main-CA72 into account, while a -r clone also clones subDS1 and subDS2. So as Kyle said, your (and my) use case is not affected by this.

Bye, Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
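The shallow-versus-recursive distinction explained above can be sketched as a simple prefix match over dataset names. This is an illustrative Python model, not bectl code; the dataset names are taken from the listing above and `datasets_cloned` is a hypothetical helper:

```python
def datasets_cloned(all_datasets, be_root, recursive):
    """Which datasets a BE clone touches: the BE dataset itself, plus --
    only with -r -- every dataset nested below it in the ZFS hierarchy."""
    if recursive:
        return [d for d in all_datasets
                if d == be_root or d.startswith(be_root + "/")]
    return [d for d in all_datasets if d == be_root]

datasets = [
    "zopt0/ROOT/main-CA72",
    "zopt0/ROOT/main-CA72/subDS1",
    "zopt0/ROOT/main-CA72/subDS2",
    "zopt0/usr/home",  # mounted separately, never part of the BE dataset tree
]

# A shallow clone touches only the BE dataset itself...
assert datasets_cloned(datasets, "zopt0/ROOT/main-CA72", recursive=False) == \
    ["zopt0/ROOT/main-CA72"]
# ...while -r also clones the nested sub-datasets. Either way, datasets
# outside zopt0/ROOT/main-CA72 (like zopt0/usr/home) are untouched.
assert datasets_cloned(datasets, "zopt0/ROOT/main-CA72", recursive=True) == [
    "zopt0/ROOT/main-CA72",
    "zopt0/ROOT/main-CA72/subDS1",
    "zopt0/ROOT/main-CA72/subDS2",
]
```

This is why the shared datasets in the listing (poudriere, /usr/home, /usr/local, ...) are unaffected regardless of whether -r is used: they do not live under the BE root.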
base-krb5 issues (segfaults when adding principals in openssl)
Hi,

is anyone else having issues with krb5 on -current when adding principals? With -current as of 2023-09-11 I get a segfault in openssl:
---snip---
Reading symbols from /usr/bin/kadmin...
Reading symbols from /usr/lib/debug//usr/bin/kadmin.debug...
[New LWP 270171]
bt
Core was generated by `kadmin -l'.
Program terminated with signal SIGSEGV, Segmentation fault.
Address not mapped to object.
#0 0x in ?? ()
(gdb) bt
#0 0x in ?? ()
#1 0x0e118da145f8 in ARCFOUR_string_to_key (context=0x44f9fba1a000, enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, password=..., salt=..., opaque=..., key=0x44f9fba211d8) at /space/system/usr_src/crypto/heimdal/lib/krb5/salt-arcfour.c:84
#2 0x0e118da156e9 in krb5_string_to_key_data_salt_opaque (enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, salt=..., opaque=..., context=<optimized out>, password=..., key=<optimized out>) at /space/system/usr_src/crypto/heimdal/lib/krb5/salt.c:201
#3 krb5_string_to_key_data_salt (context=0x44f9fba1a000, enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, password=..., salt=..., key=0x44f9fba211d8) at /space/system/usr_src/crypto/heimdal/lib/krb5/salt.c:173
#4 0x0e118da158cb in krb5_string_to_key_salt (context=0x44f9fba4bc60, context@entry=0x44f9fba1a000, enctype=-1980854121, password=0x0, password@entry=0xe1189ee9510 "1kad$uwi6!", salt=..., key=0x5) at /space/system/usr_src/crypto/heimdal/lib/krb5/salt.c:225
#5 0x0e118ba75423 in hdb_generate_key_set_password (context=0x44f9fba1a000, principal=<optimized out>, password=password@entry=0xe1189ee9510 "1kad$uwi6!", keys=keys@entry=0xe1189ee9210, num_keys=num_keys@entry=0xe1189ee9208) at /space/system/usr_src/crypto/heimdal/lib/hdb/keys.c:381
#6 0x0e118ca91c9a in _kadm5_set_keys (context=context@entry=0x44f9fba1a140, ent=ent@entry=0xe1189ee9258, password=0x1 , password@entry=0xe1189ee9510 "1kad$uwi6!") at /space/system/usr_src/crypto/heimdal/lib/kadm5/set_keys.c:51
#7 0x0e118ca8caac in kadm5_s_create_principal (server_handle=0x44f9fba1a140, princ=<optimized out>, mask=<optimized out>, password=0xe1189ee9510 "1kad$uwi6!") at
/space/system/usr_src/crypto/heimdal/lib/kadm5/create_s.c:172
#8 0x0e0969e1a57b in add_one_principal (name=<optimized out>, rand_key=0, rand_password=0, use_defaults=0, password=0xe1189ee9510 "1kad$uwi6!", key_data=0x0, max_ticket_life=<optimized out>, max_renewable_life=<optimized out>, attributes=0x0, expiration=<optimized out>, pw_expiration=0x0) at /space/system/usr_src/crypto/heimdal/kadmin/ank.c:141
#9 add_new_key (opt=opt@entry=0xe1189ee9960, argc=argc@entry=1, argv=0x44f9fba49238, argv@entry=0x44f9fba49230) at /space/system/usr_src/crypto/heimdal/kadmin/ank.c:243
#10 0x0e0969e1e124 in add_wrap (argc=<optimized out>, argv=0x44f9fba49230) at kadmin-commands.c:210
#11 0x0e0969e23945 in sl_command (cmds=<optimized out>, argc=2, argv=0x44f9fba49230) at /space/system/usr_src/crypto/heimdal/lib/sl/sl.c:209
#12 sl_command_loop (cmds=cmds@entry=0xe0969e282a0 , prompt=prompt@entry=0xe0969e15cca "kadmin> ", data=<optimized out>) at /space/system/usr_src/crypto/heimdal/lib/sl/sl.c:328
#13 0x0e0969e1d876 in main (argc=<optimized out>, argv=<optimized out>) at /space/system/usr_src/crypto/heimdal/kadmin/kadmin.c:275
(gdb) up 1
#1 0x0e118da145f8 in ARCFOUR_string_to_key (context=0x44f9fba1a000, enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, password=..., salt=..., opaque=..., key=0x44f9fba211d8) at /space/system/usr_src/crypto/heimdal/lib/krb5/salt-arcfour.c:84
84	EVP_DigestUpdate (m, &p, 1);
(gdb) list
79
80	/* LE encoding */
81	for (i = 0; i < len; i++) {
82	    unsigned char p;
83	    p = (s[i] & 0xff);
84	    EVP_DigestUpdate (m, &p, 1);
85	    p = (s[i] >> 8) & 0xff;
86	    EVP_DigestUpdate (m, &p, 1);
87	}
88
(gdb) print i
$1 = 0
(gdb) print len
$2 = <optimized out>
(gdb) print p
$3 = 49 '1'
(gdb) print m
$4 = (EVP_MD_CTX *) 0x43e31de4bc60
(gdb) print *m
$5 = {reqdigest = 0x17e678afd470, digest = 0x0, engine = 0x0, flags = 0, md_data = 0x0, pctx = 0x0, update = 0x0, algctx = 0x0, fetched_digest = 0x0}
(gdb)
---snip---

Bye, Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
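The gdb dump above shows an EVP_MD_CTX whose `digest` and `update` members are both 0x0, and frame #0 sits at address 0, which is consistent with a call through an uninitialized function pointer. The following is a toy Python model of that failure mode, for illustration only; `EVPMDContext` and `digest_update` are hypothetical stand-ins, not the OpenSSL API:

```python
class EVPMDContext:
    """Toy stand-in for OpenSSL's EVP_MD_CTX: 'digest' and 'update' stay
    NULL (None) until a successful EVP_DigestInit, matching the gdb dump
    above where digest = 0x0 and update = 0x0."""
    def __init__(self):
        self.digest = None   # no message digest selected yet
        self.update = None   # function pointer not filled in yet

def digest_update(ctx, data):
    # In C, calling through the NULL update pointer jumps to address 0,
    # which is what frame #0 ("0x in ?? ()") suggests happened here.
    if ctx.update is None:
        raise RuntimeError("EVP_DigestUpdate on uninitialized context")
    ctx.update(data)

ctx = EVPMDContext()
try:
    digest_update(ctx, b"1")
    raise AssertionError("expected failure on uninitialized context")
except RuntimeError:
    pass
```

Under this reading, the interesting question is why EVP_DigestInit (or its Heimdal wrapper) left the context uninitialized, e.g. because the requested digest could not be fetched from the new OpenSSL.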
Re: vfs.zfs.bclone_enabled (was: FreeBSD 14.0-BETA2 Now Available) [block_cloning and zilsaxattr missing from loader's features_for_read]
On 18.09.2023 19:21, Mark Millard wrote:
On Sep 18, 2023, at 15:51, Mark Millard wrote:
Alexander Motin wrote on Date: Mon, 18 Sep 2023 13:26:56 UTC:
block_cloning feature is marked as READONLY_COMPAT. It should not require any special handling from the boot code.

From stand/libsa/zfs/zfsimpl.c, but adding a comment about the read-only compatibility status of each entry:

/*
 * List of ZFS features supported for read
 */
static const char *features_for_read[] = {
	"com.datto:bookmark_v2",		// READ-ONLY COMPATIBLE no
	"com.datto:encryption",			// READ-ONLY COMPATIBLE no
	"com.datto:resilver_defer",		// READ-ONLY COMPATIBLE yes
	"com.delphix:bookmark_written",		// READ-ONLY COMPATIBLE no
	"com.delphix:device_removal",		// READ-ONLY COMPATIBLE no
	"com.delphix:embedded_data",		// READ-ONLY COMPATIBLE no
	"com.delphix:extensible_dataset",	// READ-ONLY COMPATIBLE no
	"com.delphix:head_errlog",		// READ-ONLY COMPATIBLE no
	"com.delphix:hole_birth",		// READ-ONLY COMPATIBLE no
	"com.delphix:obsolete_counts",		// READ-ONLY COMPATIBLE yes
	"com.delphix:spacemap_histogram",	// READ-ONLY COMPATIBLE yes
	"com.delphix:spacemap_v2",		// READ-ONLY COMPATIBLE yes
	"com.delphix:zpool_checkpoint",		// READ-ONLY COMPATIBLE yes
	"com.intel:allocation_classes",		// READ-ONLY COMPATIBLE yes
	"com.joyent:multi_vdev_crash_dump",	// READ-ONLY COMPATIBLE no
	"com.klarasystems:vdev_zaps_v2",	// READ-ONLY COMPATIBLE no
	"org.freebsd:zstd_compress",		// READ-ONLY COMPATIBLE no
	"org.illumos:lz4_compress",		// READ-ONLY COMPATIBLE no
	"org.illumos:sha512",			// READ-ONLY COMPATIBLE no
	"org.illumos:skein",			// READ-ONLY COMPATIBLE no
	"org.open-zfs:large_blocks",		// READ-ONLY COMPATIBLE no
	"org.openzfs:blake3",			// READ-ONLY COMPATIBLE no
	"org.zfsonlinux:allocation_classes",	// READ-ONLY COMPATIBLE yes
	"org.zfsonlinux:large_dnode",		// READ-ONLY COMPATIBLE no
	NULL
};

So it appears that the design is that both "no" and "yes" ones that are known to be supported are listed, and anything else is supposed to lead to rejection until explicitly
added as known-compatible.

I don't think so. I think somebody by mistake first added features that should not be here, and then others continued this irrelevant routine. My own development server/builder is happily running latest main with ZFS root without any patches and with block cloning not only enabled, but even active. So as I have told, it is not needed:

mav@srv:/root# zpool get all | grep clon
mavlab  bcloneused             20.5M   -
mavlab  bclonesaved            20.9M   -
mavlab  bcloneratio            2.02x   -
mavlab  feature@block_cloning  active  local

Somebody should go through the list, clean it up from read-compatible features, and document it, unless there are some features that were re-qualified at some point; I haven't checked whether that could be.

This matches up with stand/libsa/zfs/zfsimpl.c's:

static int nvlist_check_features_for_read(nvlist_t *nvl) { ... rc = nvlist_find(nvl, ZPOOL_CONFIG_FEATURES_FOR_READ, DATA_TYPE_NVLIST, NULL, , NULL);

Take note that it reads ZPOOL_CONFIG_FEATURES_FOR_READ. At the same time, features declared as READONLY_COMPAT are stored in FEATURES_FOR_WRITE, about which the boot loader does not even care.

I do not know if vfs.zfs.bclone_enabled=0 leads the loader to see vs. not-see a "com.fudosecurity:block_cloning".

bclone_enabled=0 blocks copy_file_range() usage; that should keep the feature enabled, but not active. It could be relevant if the feature were in FEATURES_FOR_READ, but here and now it is not.

It appears that 2 additions after openzfs-2.1-freebsd are missing from the above list:
com.fudosecurity:block_cloning
org.openzfs:zilsaxattr

Nothing of ZIL is required for read-only import. So no, it is also not needed.

-- Alexander Motin
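The distinction argued above (the loader checks only FEATURES_FOR_READ, while READONLY_COMPAT features live in FEATURES_FOR_WRITE) can be sketched as a small model. This is an illustrative Python sketch of the decision logic, with made-up feature sets; it is not the zfsimpl.c code:

```python
def can_import_for_read(pool_features_for_read, loader_supported):
    """Model of nvlist_check_features_for_read(): a read-only import is
    rejected only if FEATURES_FOR_READ names a feature the loader does
    not know. READONLY_COMPAT features are recorded under
    FEATURES_FOR_WRITE, which this check never looks at."""
    return all(f in loader_supported for f in pool_features_for_read)

# Hypothetical loader feature list (subset for illustration).
supported = {"org.illumos:lz4_compress", "com.delphix:hole_birth"}

# block_cloning is READONLY_COMPAT, so on an affected pool it would sit
# in FEATURES_FOR_WRITE and never reach this check:
features_for_read = {"org.illumos:lz4_compress"}
features_for_write = {"com.fudosecurity:block_cloning"}

assert can_import_for_read(features_for_read, supported)

# Only an unknown entry in FEATURES_FOR_READ would actually block boot:
assert not can_import_for_read({"org.example:new_format"}, supported)
```

Under this model, adding READONLY_COMPAT entries to features_for_read[] is harmless but redundant, which is the cleanup suggested above.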
Re: vfs.zfs.bclone_enabled (was: FreeBSD 14.0-BETA2 Now Available)
block_cloning feature is marked as READONLY_COMPAT. It should not require any special handling from the boot code.

On 18.09.2023 07:22, Tomoaki AOKI wrote:
Really OK? I cannot find block_cloning in the array *features_for_read[] of stand/libsa/zfs/zfsimpl.c, which could mean the boot code (including the loader) cannot boot from a Root-on-ZFS pool having block_cloning active. Not sure whether adding '"com.fudosecurity:block_cloning",' here is sufficient; possibly more work is needed. IMHO, all default-enabled features should be safe for booting. Implement features as disabled, implement boot code to support them, then finally enable them by default; that should be the only valid route.

[1] https://cgit.freebsd.org/src/tree/stand/libsa/zfs/zfsimpl.c

On Mon, 18 Sep 2023 07:31:46 +0200 Martin Matuska wrote:
I vote for enabling block cloning on main :-)
mm

On 16. 9. 2023 19:14, Alexander Motin wrote:
On 16.09.2023 01:25, Graham Perrin wrote:
On 16/09/2023 01:28, Glen Barber wrote:
o A fix for the ZFS block_cloning feature has been implemented.

Thanks. I see <https://github.com/openzfs/zfs/commit/5cc1876f14f90430b24f1ad2f231de936691940f>, with <https://github.com/freebsd/freebsd-src/commit/9dcf00aa404bb62052433c45aaa5475e2760f5ed> in stable/14. As vfs.zfs.bclone_enabled is still 0 (at least with 15.0-CURRENT n265350-72d97e1dd9cc): should we assume that additional fixes, not necessarily in time for 14.0-RELEASE, will be required before vfs.zfs.bclone_enabled can default to 1?

I am not aware of any block cloning issues now. All this thread about bclone_enabled actually started after I asked why it is still disabled. Thanks to Mark Millard for spotting this issue I could fix, but now we are back at the point of re-enabling it again. Since the tunable does not even exist anywhere outside of the FreeBSD base tree, I'd propose to give this code another try here too. I see no point in having it disabled, at least in main, unless somebody needs time to run some specific tests first.
-- Alexander Motin
Re: vfs.zfs.bclone_enabled (was: FreeBSD 14.0-BETA2 Now Available)
On 16.09.2023 01:25, Graham Perrin wrote: On 16/09/2023 01:28, Glen Barber wrote: o A fix for the ZFS block_cloning feature has been implemented. Thanks I see <https://github.com/openzfs/zfs/commit/5cc1876f14f90430b24f1ad2f231de936691940f>, with <https://github.com/freebsd/freebsd-src/commit/9dcf00aa404bb62052433c45aaa5475e2760f5ed> in stable/14. As vfs.zfs.bclone_enabled is still 0 (at least, with 15.0-CURRENT n265350-72d97e1dd9cc): should we assume that additional fixes, not necessarily in time for 14.0-RELEASE, will be required before vfs.zfs.bclone_enabled can default to 1? I am not aware of any block cloning issues now. All this thread about bclone_enabled actually started after I asked why it is still disabled. Thanks to Mark Millard for spotting this issue I could fix, but now we are back at the point of re-enabling it again. Since the tunable does not even exist anywhere outside of FreeBSD base tree, I'd propose to give this code another try here too. I see no point to have it disabled at least in main unless somebody needs time to run some specific tests first. -- Alexander Motin
Re: Speed improvements in ZFS
Am 2023-09-15 13:40, schrieb George Michaelson:
Not wanting to hijack threads, I am interested whether any of this can translate back up the tree and make Linux ZFS faster. And whether there are simple sysctl tunings worth trying in large (TB) memory model pre-14 FreeBSD systems with slow ZFS. Older FreeBSD, alas.

The current part of the discussion is not really about ZFS (I use a lot of nullfs on top of ZFS), so no to the first part. The tuning I did (maxvnodes) doesn't really depend on the FreeBSD version, but on the number of files touched/contained in the FS. The only other change I made was updating the OS itself, so this part doesn't apply to pre-14 systems.

If you think your ZFS (with a large ARC) is slow, you need to review your primary cache settings per dataset, check the arcstats, and maybe think about a second-level ARC on fast storage (a cache device on NVMe or SSD). If you have a read-once workload, none of this will help. So it all depends on your workload.

Bye, Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Speed improvements in ZFS
Am 2023-09-04 14:26, schrieb Mateusz Guzik: On 9/4/23, Alexander Leidinger wrote: Am 2023-08-28 22:33, schrieb Alexander Leidinger: Am 2023-08-22 18:59, schrieb Mateusz Guzik: On 8/22/23, Alexander Leidinger wrote: Am 2023-08-21 10:53, schrieb Konstantin Belousov: On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote: Am 2023-08-20 23:17, schrieb Konstantin Belousov: > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote: > > On 8/20/23, Alexander Leidinger wrote: > > > Am 2023-08-20 22:02, schrieb Mateusz Guzik: > > >> On 8/20/23, Alexander Leidinger > > >> wrote: > > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik: > > >>>> On 8/18/23, Alexander Leidinger > > >>>> wrote: > > >>> > > >>>>> I have a 51MB text file, compressed to about 1MB. Are you > > >>>>> interested > > >>>>> to > > >>>>> get it? > > >>>>> > > >>>> > > >>>> Your problem is not the vnode limit, but nullfs. > > >>>> > > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg > > >>> > > >>> 122 nullfs mounts on this system. And every jail I setup has > > >>> several > > >>> null mounts. One basesystem mounted into every jail, and then > > >>> shared > > >>> ports (packages/distfiles/ccache) across all of them. > > >>> > > >>>> First, some of the contention is notorious VI_LOCK in order > > >>>> to > > >>>> do > > >>>> anything. > > >>>> > > >>>> But more importantly the mind-boggling off-cpu time comes > > >>>> from > > >>>> exclusive locking which should not be there to begin with -- > > >>>> as > > >>>> in > > >>>> that xlock in stat should be a slock. > > >>>> > > >>>> Maybe I'm going to look into it later. > > >>> > > >>> That would be fantastic. > > >>> > > >> > > >> I did a quick test, things are shared locked as expected. 
> > >> > > >> However, I found the following: > > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) { > > >> mp->mnt_kern_flag |= > > >> lowerrootvp->v_mount->mnt_kern_flag & > > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | > > >> MNTK_EXTENDED_SHARED); > > >> } > > >> > > >> are you using the "nocache" option? it has a side effect of > > >> xlocking > > > > > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache. > > > > > > > If you don't have "nocache" on null mounts, then I don't see how > > this > > could happen. > > There is also MNTK_NULL_NOCACHE on lower fs, which is currently set > for > fuse and nfs at least. 11 of those 122 nullfs mounts are ZFS datasets which are also NFS exported. 6 of those nullfs mounts are also exported via Samba. The NFS exports shouldn't be needed anymore, I will remove them. By nfs I meant nfs client, not nfs exports. No NFS client mounts anywhere on this system. So where is this exclusive lock coming from then... This is a ZFS system. 2 pools: one for the root, one for anything I need space for. Both pools reside on the same disks. The root pool is a 3-way mirror, the "space-pool" is a 5-disk raidz2. All jails are on the space-pool. The jails are all basejail-style jails. While I don't see why xlocking happens, you should be able to dtrace or printf your way into finding out. dtrace looks to me like a faster approach to get to the root than printf... my first naive try is to detect exclusive locks. I'm not 100% sure I got it right, but at least dtrace doesn't complain about it: ---snip--- #pragma D option dynvarsize=32m fbt:nullfs:null_lock:entry /args[0]->a_flags & 0x08 != 0/ { stack(); } ---snip--- In which direction should I look with dtrace if this works in tonights run of periodic? I don't have enough knowledge about VFS to come up with some immediate ideas. After your sysctl fix for maxvnodes I increased the amount of vnodes 10 times compared to the initial report. 
This has increased the speed of the operation, the find runs in all those jails finished today after ~5h (@~8am) instead of in the afternoon as before. Could this suggest that in parallel some null_reclaim() is running which does the exclusive locks and slows down the entire operation? That may be a slowdown to some extent, but the primary problem is exclusive vnode locking for stat lookup, which should not be happening. With -current as of 2023-09-03 (and right now 2023-09-11), the periodic daily runs are down to less than an hour... and this didn't happen directly after switching to 2023-09-13. First it went down to 4h, then down to 1h without any update of the OS. The only thing what I did was modifying the number of maxfiles. First to some huge amount after your commit in the sysctl affecting part. Then after noticing way more freevnodes than configured down to 5. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
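The NULLM_CACHE snippet quoted earlier in this thread (a null mount inheriting the MNTK_* shared-locking capabilities from the lower mount only when caching is enabled) can be sketched as follows. This is a Python illustration only; the flag values are made up, and just the masking logic mirrors the quoted C code:

```python
# Illustrative bit values; the real constants live in sys/mount.h and differ.
MNTK_SHARED_WRITES   = 0x1
MNTK_LOOKUP_SHARED   = 0x2
MNTK_EXTENDED_SHARED = 0x4
NULLM_CACHE          = 0x8

def nullfs_kern_flags(lower_kern_flag, nullm_flags):
    """Model of the quoted null-mount logic: shared-locking capabilities
    are inherited from the lower mount only when NULLM_CACHE is set;
    mounting with 'nocache' leaves them off, forcing exclusive locking."""
    flags = 0
    if nullm_flags & NULLM_CACHE:
        flags |= lower_kern_flag & (MNTK_SHARED_WRITES |
                                    MNTK_LOOKUP_SHARED |
                                    MNTK_EXTENDED_SHARED)
    return flags

lower = MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | MNTK_EXTENDED_SHARED

# Default (cached) nullfs mount: shared lookups survive the stacking.
assert nullfs_kern_flags(lower, NULLM_CACHE) & MNTK_LOOKUP_SHARED

# 'nocache' mount: no shared-locking flags, so e.g. stat() lookups
# through the nullfs layer fall back to exclusive vnode locks.
assert nullfs_kern_flags(lower, 0) == 0
```

This is why the earlier question "are you using the nocache option?" matters: without NULLM_CACHE the null mount loses the lower filesystem's shared-locking flags even though the lower fs supports them.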
Re: sed in CURRENT fails in textproc/jq
Am 2023-09-10 18:53, schrieb Robert Clausecker:
Hi Warner,

Thank you for your response.

Am Sun, Sep 10, 2023 at 09:53:03AM -0600 schrieb Warner Losh:
On Sun, Sep 10, 2023, 7:36 AM Robert Clausecker wrote:
> Hi Warner,
>
> I have pushed a fix. It should hopefully address those failing tests.
> The same issue should also affect memcmp(), but unlike for memchr(), it is
> illegal to pass a length to memcmp() that extends past the actual end of
> the buffer, as memcmp() is permitted to examine the whole buffer regardless
> of where the first mismatch is.
>
> I am considering a change to improve the behaviour of memcmp() on such
> erroneous inputs. There are two options: (a) I could change memcmp() the
> same way I fixed memchr() and have implausible buffer lengths behave as if
> the buffer goes to the end of the address space, or (b) I could change
> memcmp() to crash loudly if it detects such a case. I could also
> (c) leave memcmp() as is. Which of these three choices is preferable?
>

What does the standard say? I'm highly skeptical that these corner cases are UB. I'd like actual support for this statement, rather than your conjecture that it's illegal. Even if you can come up with that, preserving the old behavior is my first choice. Especially since many of these functions aren't well defined by a standard, but are extensions. As for memchr, https://pubs.opengroup.org/onlinepubs/009696799/functions/memchr.html has no such permission to examine 'the entire buffer at once' nor any restriction as to the length extending beyond the address space. I'm skeptical of your reading that it allows one to examine all of [b, b + len), so please explain where the standard supports reading past the first occurrence.
memchr() in particular is specified to only examine the input until the matching character is found (ISO/IEC 9899:2011 § 7.24.5.1): *** The memchr function locates the first occurrence of c (converted to an unsigned char) in the initial n characters (each interpreted as unsigned char) of the object pointed to by s. The implementation shall behave as if it reads the characters sequentially and stops as soon as a matching character is found. *** Therefore, it appears reasonable that calls with fake buffer lengths (e.g. SIZE_MAX, to read until a mismatch occurs) must be supported. However, memcmp() has no such language and the text explicitly states that the whole buffer is compared (ISO/IEC 9899:2011 § 7.24.4.1): *** The memcmp function compares the first n characters of the object pointed to by s1 to the first n characters of the object pointed to by s2. *** By omission, this seems to give license to e.g. implement memcmp() like timingsafe_memcmp() where it inspects all n characters of both buffers and only then gives a result. So if n is longer than the actual buffer (e.g. n == SIZE_MAX), behaviour may not be defined (e.g. there could be a crash due to crossing into an unmapped page). Thus I have patched memchr() to behave correctly when length SIZE_MAX is given (commit b2618b65). My memcmp() suffers from similarly flawed logic and may need to be patched. However, as the language I cited above does not indicate that such usage needs to be supported for memcmp() (whereas it must be for memchr(), contrary to my assumptions), I was asking you for how to proceed with memcmp (hence choices (a)--(c)). My 2ct: What did the previous implementation of memcmp() do in this case? - If it was generous and behaved similar to the requirements of memchr(), POLA requires to have the same now too. 
- If it was crashing or silently going on (= lurking bugs in 3rd party code), we may have the possibility to do a coredump in case of running past the end of the buffer to prevent malicious use. - In general I go with the robustness principle, "be liberal in what you accept, but strict in what you provide" = memcmp() should behave as if it is supported. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
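The distinction argued above can be made concrete with a small sketch. These are illustrative reference implementations only, not FreeBSD's actual libc code; the demo_* names are invented for this example:

```c
#include <stddef.h>

/*
 * Illustrative only -- NOT FreeBSD's libc code.  A memchr() that follows
 * the "reads the characters sequentially and stops as soon as a matching
 * character is found" wording of ISO/IEC 9899:2011 7.24.5.1: bytes after
 * the first match are never touched, so an overstated n (even SIZE_MAX)
 * is harmless as long as a match exists within the real buffer.
 */
static void *
demo_memchr(const void *s, int c, size_t n)
{
	const unsigned char *p = s;

	while (n-- > 0) {
		if (*p == (unsigned char)c)
			return ((void *)p);	/* stop at first match */
		p++;
	}
	return (NULL);
}

/*
 * A timingsafe-style comparison in the spirit of the memcmp() argument:
 * it deliberately reads all n bytes of both buffers before answering,
 * which is exactly why an n that extends past the real buffer can walk
 * into an unmapped page.
 */
static int
demo_timingsafe_memcmp(const void *a, const void *b, size_t n)
{
	const unsigned char *pa = a, *pb = b;
	unsigned char diff = 0;
	size_t i;

	for (i = 0; i < n; i++)
		diff |= pa[i] ^ pb[i];	/* no early exit */
	return (diff != 0);
}
```

Both behaviors are permitted for memcmp() by the standard's wording, which is why option (a) above would be a courtesy rather than an obligation.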
Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
On 09.09.2023 12:32, Mark Millard wrote: On Sep 8, 2023, at 21:54, Mark Millard wrote: On Sep 8, 2023, at 18:19, Mark Millard wrote: On Sep 8, 2023, at 17:03, Mark Millard wrote: On Sep 8, 2023, at 15:30, Martin Matuska wrote: On 9. 9. 2023 0:09, Alexander Motin wrote: Thank you, Martin. I was able to reproduce the issue with your script and found the cause. I first thought the issue was triggered by the `cp`, but it appeared to be triggered by `cat`. It also got copy_file_range() support, but later than `cp`. That is probably why it slipped through testing. This patch fixes it for me: https://github.com/openzfs/zfs/pull/15251 . Mark, could you please try the patch? I finally stopped it at 7473 built (a little over 13 hrs elapsed):
^C[13:08:30] Error: Signal SIGINT caught, cleaning up and exiting [main-amd64-bulk_a-default] [2023-09-08_19h51m52s] [sigint:]
Queued: 34588 Built: 7473 Failed: 23 Skipped: 798 Ignored: 335 Fetched: 0 Tobuild: 25959 Time: 13:08:26
[13:08:30] Logs: /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-09-08_19h51m52s
[13:08:31] Cleaning up
[13:17:10] Unmounting file systems
Exiting with status 1
In part that was more evidence for deadlocks at least being fairly rare as well. None of the failed ones looked odd. (A fair portion are because the bulk -a was mostly doing WITH_DEBUG= builds. Many upstreams change library names, some other file names, or paths used for debug builds, and ports generally do not cover building the debug builds well for such. I've used these runs to extend my list of exceptions that avoid using WITH_DEBUG.) So no evidence of corruptions. Thank you, Mark. The patch was accepted upstream and merged to both master and zfs-2.2-release branches. -- Alexander Motin
Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
On 08.09.2023 09:52, Martin Matuska wrote: I dug a little and was able to reproduce the panic without poudriere with a shell script.
#!/bin/sh
nl='
'
sed_script=s/aaa/b/
for ac_i in 1 2 3 4 5 6 7; do
  sed_script="$sed_script$nl$sed_script"
done
echo "$sed_script" 2>/dev/null | sed 99q >conftest.sed
repeats=8
count=0
echo -n 0123456789 >"conftest.in"
while :
do
  cat "conftest.in" "conftest.in" >"conftest.tmp"
  mv "conftest.tmp" "conftest.in"
  cp "conftest.in" "conftest.nl"
  echo '' >> "conftest.nl"
  sed -f conftest.sed < "conftest.nl" >"conftest.out" 2>/dev/null || break
  diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break
  count=$(($count + 1))
  echo "count: $count"
  # 10*(2^10) chars as input seems more than enough
  test $count -gt $repeats && break
done
rm -f conftest.in conftest.tmp conftest.nl conftest.out
Thank you, Martin. I was able to reproduce the issue with your script and found the cause. I first thought the issue was triggered by the `cp`, but it appeared to be triggered by `cat`. It also got copy_file_range() support, but later than `cp`. That is probably why it slipped through testing. This patch fixes it for me: https://github.com/openzfs/zfs/pull/15251 . Mark, could you please try the patch? -- Alexander Motin
Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics
Thanks, Mark. On 07.09.2023 15:40, Mark Millard wrote: On Sep 7, 2023, at 11:48, Glen Barber wrote: On Thu, Sep 07, 2023 at 11:17:22AM -0700, Mark Millard wrote: When I next have time, should I retry based on a more recent vintage of main that includes 969071be938c ? Yes, please, if you can. As it stands, I rebooted that machine into my normal environment, so the after-crash-with-dump-info context is preserved. I'll presume lack of a need to preserve that context unless I hear otherwise. (But I'll work on this until later today.) Even my normal environment predates the commit in question by a few commits. So I'll end up doing a more general round of updates overall. Someone can let me know if there is a preference for debug over non-debug for the next test run. It is not unknown for some bugs to disappear once debugging is enabled due to different execution timings, but generally debug may detect the problem closer to its origin instead of looking at random consequences. I am only starting to look at this report (unless Pawel or somebody beats me to it), and don't have additional requests yet, but if you can repeat the same with a debug kernel (in-base ZFS's ZFS_DEBUG setting follows the kernel's INVARIANTS), it may give us some additional information. Looking at "git: 969071be938c - main", the relevant part seems to be just (white space possibly not preserved accurately):
diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
index 9fb5aee6a023..4e4161ef1a7f 100644
--- a/sys/kern/vfs_vnops.c
+++ b/sys/kern/vfs_vnops.c
@@ -3076,12 +3076,14 @@ vn_copy_file_range(struct vnode *invp, off_t *inoffp, struct vnode *outvp,
 		goto out;
 	/*
-	 * If the two vnode are for the same file system, call
+	 * If the two vnodes are for the same file system type, call
 	 * VOP_COPY_FILE_RANGE(), otherwise call vn_generic_copy_file_range()
-	 * which can handle copies across multiple file systems.
+	 * which can handle copies across multiple file system types.
 	 */
 	*lenp = len;
-	if (invp->v_mount == outvp->v_mount)
+	if (invp->v_mount == outvp->v_mount ||
+	    strcmp(invp->v_mount->mnt_vfc->vfc_name,
+	    outvp->v_mount->mnt_vfc->vfc_name) == 0)
 		error = VOP_COPY_FILE_RANGE(invp, inoffp, outvp, outoffp,
 		    lenp, flags, incred, outcred, fsize_td);
 	else
That looks to call VOP_COPY_FILE_RANGE in more contexts and vn_generic_copy_file_range in fewer. The backtrace I reported involves: VOP_COPY_FILE_RANGE So it appears this change is unlikely to invalidate my test result, although failure might happen sooner if more VOP_COPY_FILE_RANGE calls happen with the newer code. Your logic is likely right, but if you have block cloning requests both within and between datasets, this patch may change the pattern. Though it is obviously not a fix for the issue. I responded to the commit email only because it makes no difference while vfs.zfs.bclone_enabled is 0. That in turn means that someone may come up with some other change for me to test by the time I get around to setting up another test. Let me know if so. -- Alexander Motin
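For readers skimming the diff, the new condition boils down to the following decision. This is sketched with stand-in structs; the mock_* names are invented for illustration, while the real struct mount/vfsconf/vnode definitions live in the kernel headers:

```c
#include <string.h>

/*
 * Minimal mocks of the mount/vfsconf linkage the diff consults; the
 * real structs (sys/mount.h, sys/vnode.h) carry far more state.
 */
struct mock_vfsconf { const char *vfc_name; };
struct mock_mount   { struct mock_vfsconf *mnt_vfc; };
struct mock_vnode   { struct mock_mount *v_mount; };

/*
 * Decision logic of the patched vn_copy_file_range(): the same mount,
 * or merely the same file system *type* (e.g. two ZFS datasets), now
 * takes the VOP_COPY_FILE_RANGE() path; only genuinely mixed types
 * fall back to vn_generic_copy_file_range().
 */
static int
use_vop_copy_file_range(struct mock_vnode *invp, struct mock_vnode *outvp)
{
	return (invp->v_mount == outvp->v_mount ||
	    strcmp(invp->v_mount->mnt_vfc->vfc_name,
	    outvp->v_mount->mnt_vfc->vfc_name) == 0);
}
```

That is why, as noted above, cross-dataset block cloning requests can now reach VOP_COPY_FILE_RANGE where they previously went through the generic path.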
Re: 100% CPU time for sysctl command, not killable
Am 2023-09-03 21:22, schrieb Alexander Leidinger: Am 2023-09-02 16:56, schrieb Mateusz Guzik: On 8/20/23, Alexander Leidinger wrote: Hi, sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable sysctl program. This is somewhat unexpected... fixed here https://cgit.freebsd.org/src/commit/?id=32988c1499f8698b41e15ed40a46d271e757bba3 I confirm. There may be dragons...:
kern.maxvnodes: 1048576000
vfs.wantfreevnodes: 262144000
vfs.freevnodes: 0 <---
vfs.vnodes_created: 11832359
vfs.numvnodes: 146699
vfs.recycles_free: 4700765
vfs.recycles: 0
vfs.vnode_alloc_sleeps: 0
Another time I got an insanely huge amount of free vnodes (more than maxvnodes). Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF signature.asc Description: OpenPGP digital signature
Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned
On 04.09.2023 11:45, Mark Millard wrote: On Sep 4, 2023, at 06:09, Alexander Motin wrote: per_txg_dirty_frees_percent is directly related to the delete delays we see here. You are forcing ZFS to commit transactions each 5% of the dirty ARC limit, which is 5% of 10% of memory size. I haven't looked at that code recently, but I guess setting it too low can make ZFS commit transactions too often, increasing write inflation for the underlying storage. I would propose you restore the default and try again. While this machine is different, the original problem was worse than the issue here: the load average was less than 1 for most of the parallel bulk build when 30 was used. The fraction of time waiting was much longer than with 5. If I understand right, both too high and too low for a type of context can lead to increased elapsed time, and getting it set to a near-optimal value is a non-obvious exploration. IIRC this limit was modified several times since it was originally implemented. Maybe it could benefit from another look, if the default 30% is not good. It would be good if generic ZFS issues like this were reported to OpenZFS upstream to be visible to a wider public. Unfortunately I have several other projects I must work on, so if it is not a regression I can't promise I'll take it right now, so anybody else is welcome. An overall point for the goal of my activity is: what makes a good test context for checking if ZFS is again safe to use? Maybe other tradeoffs make, say, 4 hardware threads more reasonable than 32. Thank you for your testing. The best test is one that nobody else runs. It also correlates with the topic of "safe to use", which also depends on what it is used for. :) -- Alexander Motin
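To put rough numbers on the tradeoff discussed above, here is a back-of-the-envelope model. It assumes the stock 10% dirty-data cap (zfs_dirty_data_max_percent) and treats per_txg_dirty_frees_percent as a straight share of that cap; the real OpenZFS accounting is more involved, so this is a simplification, not the kernel's code:

```c
#include <stdint.h>

/*
 * Simplified sketch: the number of dirty-free bytes after which ZFS
 * forces a TXG commit, expressed as a percent of the dirty-data cap,
 * which is itself a percent of physical memory.
 */
static uint64_t
dirty_frees_threshold(uint64_t physmem, unsigned dirty_max_pct,
    unsigned per_txg_frees_pct)
{
	/* e.g. 10% of RAM with the default zfs_dirty_data_max_percent */
	uint64_t dirty_data_max = physmem / 100 * dirty_max_pct;

	/* share of that cap that triggers a forced commit on frees */
	return (dirty_data_max / 100 * per_txg_frees_pct);
}
```

Under this model, with 64 GiB of RAM the default 10%/30% pair allows roughly 1.9 GiB of freed dirty data per TXG, while a per_txg_dirty_frees_percent of 5 drops that to roughly 320 MiB, forcing commits about six times as often — consistent with the delete delays described in the thread.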
Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned
On 04.09.2023 05:56, Mark Millard wrote: On Sep 4, 2023, at 02:00, Mark Millard wrote: On Sep 3, 2023, at 23:35, Mark Millard wrote: On Sep 3, 2023, at 22:06, Alexander Motin wrote: On 03.09.2023 22:54, Mark Millard wrote: After that ^t produced the likes of: load: 6.39 cmd: sh 4849 [tx->tx_quiesce_done_cv] 10047.33r 0.51u 121.32s 1% 13004k So the full state is not "tx->tx", but is actually a "tx->tx_quiesce_done_cv", which means the thread is waiting for new transaction to be opened, which means some previous to be quiesced and then synced. #0 0x80b6f103 at mi_switch+0x173 #1 0x80bc0f24 at sleepq_switch+0x104 #2 0x80aec4c5 at _cv_wait+0x165 #3 0x82aba365 at txg_wait_open+0xf5 #4 0x82a11b81 at dmu_free_long_range+0x151 Here it seems like transaction commit is waited due to large amount of delete operations, which ZFS tries to spread between separate TXGs. That fit the context: cleaning out /usr/local/poudriere/data/.m/ You should probably see some large and growing number in sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay . After the reboot I started a -J64 example. It has avoided the early "witness exhausted". Again I ^C'd after about an hours after the 2nd builder had started. So: again cleaning out /usr/local/poudriere/data/.m/ Only seconds between: # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276042 # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276427 # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 277323 # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 278027 As expected, deletes trigger and wait for TXG commits. I have found a measure of progress: zfs list's USED for /usr/local/poudriere/data/.m is decreasing. So ztop's d/s was a good classification: deletes. 
#5 0x829a87d2 at zfs_rmnode+0x72 #6 0x829b658d at zfs_freebsd_reclaim+0x3d #7 0x8113a495 at VOP_RECLAIM_APV+0x35 #8 0x80c5a7d9 at vgonel+0x3a9 #9 0x80c5af7f at vrecycle+0x3f #10 0x829b643e at zfs_freebsd_inactive+0x4e #11 0x80c598cf at vinactivef+0xbf #12 0x80c590da at vput_final+0x2aa #13 0x80c68886 at kern_funlinkat+0x2f6 #14 0x80c68588 at sys_unlink+0x28 #15 0x8106323f at amd64_syscall+0x14f #16 0x8103512b at fast_syscall_common+0xf8 What we don't see here is what quiesce and sync threads of the pool are actually doing. Sync thread has plenty of different jobs, including async write, async destroy, scrub and others, that all may delay each other. Before you rebooted the system, depending how alive it is, could you save a number of outputs of `procstat -akk`, or at least specifically `procstat -akk | grep txg_thread_enter` if the full is hard? Or somehow else observe what they are doing. # grep txg_thread_enter ~/mmjnk0[0-5].txt /usr/home/root/mmjnk00.txt:6 100881 zfskern txg_thread_enter mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe /usr/home/root/mmjnk00.txt:6 100882 zfskern txg_thread_enter mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe /usr/home/root/mmjnk01.txt:6 100881 zfskern txg_thread_enter mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe /usr/home/root/mmjnk01.txt:6 100882 zfskern txg_thread_enter mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe /usr/home/root/mmjnk02.txt:6 100881 zfskern txg_thread_enter mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 
fork_exit+0x82 fork_trampoline+0xe /usr/home/root/mmjnk02.txt:6 100882 zfskern txg_thread_enter mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe /usr/home/root/mmjnk03.txt:6 100881 zfskern txg_thread_enter mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe /usr/home/root/mmjnk03.txt:6 100882 zfskern txg_thread_enter mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thre
Re: Speed improvements in ZFS
Am 2023-08-28 22:33, schrieb Alexander Leidinger: Am 2023-08-22 18:59, schrieb Mateusz Guzik: On 8/22/23, Alexander Leidinger wrote: Am 2023-08-21 10:53, schrieb Konstantin Belousov: On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote: Am 2023-08-20 23:17, schrieb Konstantin Belousov: > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote: > > On 8/20/23, Alexander Leidinger wrote: > > > Am 2023-08-20 22:02, schrieb Mateusz Guzik: > > >> On 8/20/23, Alexander Leidinger wrote: > > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik: > > >>>> On 8/18/23, Alexander Leidinger > > >>>> wrote: > > >>> > > >>>>> I have a 51MB text file, compressed to about 1MB. Are you > > >>>>> interested > > >>>>> to > > >>>>> get it? > > >>>>> > > >>>> > > >>>> Your problem is not the vnode limit, but nullfs. > > >>>> > > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg > > >>> > > >>> 122 nullfs mounts on this system. And every jail I setup has > > >>> several > > >>> null mounts. One basesystem mounted into every jail, and then > > >>> shared > > >>> ports (packages/distfiles/ccache) across all of them. > > >>> > > >>>> First, some of the contention is notorious VI_LOCK in order to > > >>>> do > > >>>> anything. > > >>>> > > >>>> But more importantly the mind-boggling off-cpu time comes from > > >>>> exclusive locking which should not be there to begin with -- as > > >>>> in > > >>>> that xlock in stat should be a slock. > > >>>> > > >>>> Maybe I'm going to look into it later. > > >>> > > >>> That would be fantastic. > > >>> > > >> > > >> I did a quick test, things are shared locked as expected. > > >> > > >> However, I found the following: > > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) { > > >> mp->mnt_kern_flag |= > > >> lowerrootvp->v_mount->mnt_kern_flag & > > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | > > >> MNTK_EXTENDED_SHARED); > > >> } > > >> > > >> are you using the "nocache" option? 
it has a side effect of > > >> xlocking > > > > > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache. > > > > > > > If you don't have "nocache" on null mounts, then I don't see how > > this > > could happen. > > There is also MNTK_NULL_NOCACHE on lower fs, which is currently set > for > fuse and nfs at least. 11 of those 122 nullfs mounts are ZFS datasets which are also NFS exported. 6 of those nullfs mounts are also exported via Samba. The NFS exports shouldn't be needed anymore, I will remove them. By nfs I meant nfs client, not nfs exports. No NFS client mounts anywhere on this system. So where is this exclusive lock coming from then... This is a ZFS system. 2 pools: one for the root, one for anything I need space for. Both pools reside on the same disks. The root pool is a 3-way mirror, the "space-pool" is a 5-disk raidz2. All jails are on the space-pool. The jails are all basejail-style jails. While I don't see why xlocking happens, you should be able to dtrace or printf your way into finding out. dtrace looks to me like a faster approach to get to the root than printf... my first naive try is to detect exclusive locks. I'm not 100% sure I got it right, but at least dtrace doesn't complain about it: ---snip--- #pragma D option dynvarsize=32m fbt:nullfs:null_lock:entry /args[0]->a_flags & 0x08 != 0/ { stack(); } ---snip--- In which direction should I look with dtrace if this works in tonights run of periodic? I don't have enough knowledge about VFS to come up with some immediate ideas. After your sysctl fix for maxvnodes I increased the amount of vnodes 10 times compared to the initial report. This has increased the speed of the operation, the find runs in all those jails finished today after ~5h (@~8am) instead of in the afternoon as before. Could this suggest that in parallel some null_reclaim() is running which does the exclusive locks and slows down the entire operation? Bye, Alexander. 
-- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned
Mark, On 03.09.2023 22:54, Mark Millard wrote: After that ^t produced the likes of: load: 6.39 cmd: sh 4849 [tx->tx_quiesce_done_cv] 10047.33r 0.51u 121.32s 1% 13004k So the full state is not "tx->tx", but is actually "tx->tx_quiesce_done_cv", which means the thread is waiting for a new transaction to be opened, which means some previous one has to be quiesced and then synced.
#0 0x80b6f103 at mi_switch+0x173
#1 0x80bc0f24 at sleepq_switch+0x104
#2 0x80aec4c5 at _cv_wait+0x165
#3 0x82aba365 at txg_wait_open+0xf5
#4 0x82a11b81 at dmu_free_long_range+0x151
Here it seems like the transaction commit is being waited on due to a large amount of delete operations, which ZFS tries to spread between separate TXGs. You should probably see a large and growing number in sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay .
#5 0x829a87d2 at zfs_rmnode+0x72
#6 0x829b658d at zfs_freebsd_reclaim+0x3d
#7 0x8113a495 at VOP_RECLAIM_APV+0x35
#8 0x80c5a7d9 at vgonel+0x3a9
#9 0x80c5af7f at vrecycle+0x3f
#10 0x829b643e at zfs_freebsd_inactive+0x4e
#11 0x80c598cf at vinactivef+0xbf
#12 0x80c590da at vput_final+0x2aa
#13 0x80c68886 at kern_funlinkat+0x2f6
#14 0x80c68588 at sys_unlink+0x28
#15 0x8106323f at amd64_syscall+0x14f
#16 0x8103512b at fast_syscall_common+0xf8
What we don't see here is what the quiesce and sync threads of the pool are actually doing. The sync thread has plenty of different jobs, including async write, async destroy, scrub and others, that all may delay each other. Before you reboot the system, depending on how alive it is, could you save a number of outputs of `procstat -akk`, or at least specifically `procstat -akk | grep txg_thread_enter` if the full output is hard? Or somehow else observe what they are doing. `zpool status`, `zpool get all` and `sysctl -a` would also not harm. PS: I may be wrong, but USB in "USB3 NVMe SSD storage" makes me shiver.
Make sure there are no storage problems, like some huge delays, timeouts, etc, that can be seen, for example, as busy percents regularly spiking far above 100% in your `gstat -spod`. -- Alexander Motin
Re: 100% CPU time for sysctl command, not killable
Am 2023-09-02 16:56, schrieb Mateusz Guzik: On 8/20/23, Alexander Leidinger wrote: Hi, sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable sysctl program. This is somewhat unexpected... fixed here https://cgit.freebsd.org/src/commit/?id=32988c1499f8698b41e15ed40a46d271e757bba3 I confirm. Thanks! Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: kernel 100% CPU, and ports-mgmt/poudriere-devel 'Inspecting ports tree for modifications to git checkout...' for an extraordinarily long time
On 02.09.2023 09:32, Graham Perrin wrote: On 02/09/2023 10:17, Mateusz Guzik wrote: get a flamegraph with dtrace https://github.com/brendangregg/FlameGraph See <https://bz-attachments.freebsd.org/attachment.cgi?id=244586> for a PDF of a reply that probably did not reach the list. Graham, the original SVG was scalable and searchable in a browser. Your PNG inside a PDF is not. -- Alexander Motin
Re: 100% CPU time for sysctl command, not killable
Am 2023-08-20 21:23, schrieb Alexander Leidinger: Am 2023-08-20 18:55, schrieb Mina Galić: procstat(1) kstack could be helpful here. Original Message On 20 Aug 2023, 17:29, Alexander Leidinger alexan...@leidinger.net> wrote: Hi, sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable sysctl program. This is somewhat unexpected... Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
PID TID COMM TDNAME KSTACK
94391 118678 sysctl - sysctl_maxvnodes sysctl_root_handler_locked sysctl_root userland_sysctl sys___sysctl amd64_syscall fast_syscall_common
I experimented a bit by multiplying my initial value of 104857600. It fails between 5 and 6 times the initial value. sysctl kern.maxvnodes=524288000 is successful within 4 seconds. sysctl kern.maxvnodes=629145600 goes into a loop with the same procstat -k output. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Possible issue with linux xattr support?
Am 2023-08-29 21:31, schrieb Felix Palmen: * Shawn Webb [20230829 15:25]: On Tue, Aug 29, 2023 at 09:15:03PM +0200, Felix Palmen wrote: > * Kyle Evans [20230829 14:07]: > > On 8/29/23 14:02, Shawn Webb wrote: > > > Back in 2019, I had a similar issue: I needed access to be able to > > > read/write to the system extended attribute namespace from within a > > > jailed context. I wrote a rather simple patch that provides that > > > support on a per-jail basis: > > > > > > https://git.hardenedbsd.org/hardenedbsd/HardenedBSD/-/commit/96c85982b45e44a6105664c7068a92d0a61da2a3 > > > > > > Hopefully that's useful to someone. > > > > > > Thanks, > > > > > > > FWIW (which likely isn't much), I like this approach much better; it makes > > more sense to me that it's a feature controlled by the creator of the jail > > and not one allowed just by using a compat ABI within a jail. > > Well, a typical GNU userland won't work in a jail without this, that's > what I know now. But I'm certainly with you, it doesn't feel logical > that a Linux binary can do something in a jail a FreeBSD binary can't. > > So, indeed, making it a jail option sounds better. > > Unless, bringing back a question raised earlier in this thread: What's > the reason to restrict this in a jailed context in the first place? IOW, > could it just be allowed unconditionally? In HardenedBSD's case, since we use filesystem extended attributes to toggle exploit mitigations on a per-application basis, there's now a conceptual security boundary between the host and the jail. Should the jail and the host share resources, like executables, a jailed process could toggle an exploit mitigation, and the toggle would bubble up to the host. So the next time the host executed /shared/app/executable/here, the security posture of the host would be affected. Isn't the sane approach here *not* to share any executables with a jail other than via a read-only nullfs mount? 
In https://reviews.freebsd.org/D40370 I provide infrastructure to automatically jail rc.d services. It will use the complete filesystem of the system, but apply all the other restrictions of jails. So the answer to your question is "it depends". Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Possible issue with linux xattr support?
Am 2023-08-29 21:02, schrieb Shawn Webb: Back in 2019, I had a similar issue: I needed access to be able to read/write to the system extended attribute namespace from within a jailed context. I wrote a rather simple patch that provides that support on a per-jail basis: https://git.hardenedbsd.org/hardenedbsd/HardenedBSD/-/commit/96c85982b45e44a6105664c7068a92d0a61da2a3 You enabled it by default. I would assume you had a thought about the implications... any memories about it? What I'm after is: - What can go wrong if we enable it by default? - Why would we like to disable it (or any ideas why it is disabled by default in FreeBSD)? Depending on the answers we may even use a simpler patch and have it allowed in jails even without the possibility to configure it. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Speed improvements in ZFS
Am 2023-08-22 18:59, schrieb Mateusz Guzik: On 8/22/23, Alexander Leidinger wrote: Am 2023-08-21 10:53, schrieb Konstantin Belousov: On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote: Am 2023-08-20 23:17, schrieb Konstantin Belousov: > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote: > > On 8/20/23, Alexander Leidinger wrote: > > > Am 2023-08-20 22:02, schrieb Mateusz Guzik: > > >> On 8/20/23, Alexander Leidinger wrote: > > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik: > > >>>> On 8/18/23, Alexander Leidinger > > >>>> wrote: > > >>> > > >>>>> I have a 51MB text file, compressed to about 1MB. Are you > > >>>>> interested > > >>>>> to > > >>>>> get it? > > >>>>> > > >>>> > > >>>> Your problem is not the vnode limit, but nullfs. > > >>>> > > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg > > >>> > > >>> 122 nullfs mounts on this system. And every jail I setup has > > >>> several > > >>> null mounts. One basesystem mounted into every jail, and then > > >>> shared > > >>> ports (packages/distfiles/ccache) across all of them. > > >>> > > >>>> First, some of the contention is notorious VI_LOCK in order to > > >>>> do > > >>>> anything. > > >>>> > > >>>> But more importantly the mind-boggling off-cpu time comes from > > >>>> exclusive locking which should not be there to begin with -- as > > >>>> in > > >>>> that xlock in stat should be a slock. > > >>>> > > >>>> Maybe I'm going to look into it later. > > >>> > > >>> That would be fantastic. > > >>> > > >> > > >> I did a quick test, things are shared locked as expected. > > >> > > >> However, I found the following: > > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) { > > >> mp->mnt_kern_flag |= > > >> lowerrootvp->v_mount->mnt_kern_flag & > > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | > > >> MNTK_EXTENDED_SHARED); > > >> } > > >> > > >> are you using the "nocache" option? it has a side effect of > > >> xlocking > > > > > > I use noatime, noexec, nosuid, nfsv4acls. 
I do NOT use nocache. > > > > > > If you don't have "nocache" on null mounts, then I don't see how > > this > > could happen. > > There is also MNTK_NULL_NOCACHE on lower fs, which is currently set > for > fuse and nfs at least. 11 of those 122 nullfs mounts are ZFS datasets which are also NFS exported. 6 of those nullfs mounts are also exported via Samba. The NFS exports shouldn't be needed anymore, I will remove them. By nfs I meant nfs client, not nfs exports. No NFS client mounts anywhere on this system. So where is this exclusive lock coming from then... This is a ZFS system. 2 pools: one for the root, one for anything I need space for. Both pools reside on the same disks. The root pool is a 3-way mirror, the "space-pool" is a 5-disk raidz2. All jails are on the space-pool. The jails are all basejail-style jails. While I don't see why xlocking happens, you should be able to dtrace or printf your way into finding out. dtrace looks to me like a faster approach to get to the root than printf... my first naive try is to detect exclusive locks. I'm not 100% sure I got it right, but at least dtrace doesn't complain about it:
---snip---
#pragma D option dynvarsize=32m

fbt:nullfs:null_lock:entry
/args[0]->a_flags & 0x08 != 0/
{
    stack();
}
---snip---
In which direction should I look with dtrace if this works in tonight's run of periodic? I don't have enough knowledge about VFS to come up with some immediate ideas. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: zfs autotrim default to off now
On 28.08.2023 13:56, Pete Wright wrote: So to be clear, if we were using the default autotrim=enabled behavior we in fact weren't having our SSDs trimmed? I think that's my concern, as an admin I was under the impression that it was enabled by default but apparently that wasn't actually happening. We wanted autotrim to be enabled by default, but it was not enabled, and it was reported as not enabled, so there should be no confusion. The only confusion may have been if you tried to read the code and saw that it should have been enabled. -- Alexander Motin
Re: zfs autotrim default to off now
Hi Pete, On 27.08.2023 23:34, Pete Wright wrote: looking at a recent pull of CURRENT i'm noticing this in the git logs: #15079 set autotrim default to 'off' everywhere which references this openzfs PR: https://github.com/openzfs/zfs/pull/15079 looking at the PR i'm not seeing a reference to a bug report or anything, is anyone able to point me to a bug report for this. it seems like a pretty major issue: "As it turns out having autotrim default to 'on' on FreeBSD never really worked due to mess with defines where userland and kernel module were getting different default values (userland was defaulting to 'off', module was thinking it's 'on')." i'd just like to make sure i better understand the issue and can see if my systems are impacted. You are probably misinterpreting the quote. There is nothing wrong with autotrim itself, assuming your specific devices properly handle it. It is just saying that setting it to "on" by default on FreeBSD, which was done to keep pre-OpenZFS behavior, appeared broken for a while. So that commit merely confirmed the status quo. It should not affect any already existing pools. On a new pool creation the default is now officially "off", matching OpenZFS on other platforms, but there is no reason why you cannot set it to "on", if it is beneficial for your devices and workloads. As an alternative, for example, you may run trim manually from time to time during any low-activity periods. -- Alexander Motin
Re: Possible issue with linux xattr support?
Am 2023-08-28 13:06, schrieb Dmitry Chagin: On Sun, Aug 27, 2023 at 09:55:23PM +0200, Felix Palmen wrote: * Dmitry Chagin [20230827 22:46]: > I can fix this by completely disabling extattr for jailed processes, > however, that would be bullshit, though Would probably be better than nothing. AFAIK, "Linux jails" are used a lot, probably with userlands from distributions actually using xattr. It might make sense to allow this priv (PRIV_VFS_EXTATTR_SYSTEM) for linux jails by default? What do you think, James? I think the question is more whether we want to allow it in jails at all (not specific to linux jails, as in: if it is ok for linux jails, it should be ok for FreeBSD jails too). So the question is: what does disallowing this in jails protect the host from? Some kind of possibility to DoS the host? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: ZFS deadlock in 14
Martin, The PR was just merged to upstream master. The merge to zfs-2.2-release should follow shortly: https://github.com/openzfs/zfs/pull/15204 , same as some other 2.2 fixes: https://github.com/openzfs/zfs/pull/15205 . Can't wait to get back in sync with ZFS master in FreeBSD main. ;) On 22.08.2023 12:18, Alexander Motin wrote: Hi Martin, I am waiting for final test results from George Wilson and will then request a quick merge of both to the zfs-2.2-release branch. Unfortunately there are still not many reviewers for the PR, since the code is not trivial, but at least with the test reports Brian Behlendorf and Mark Maybee seem to be OK with merging the two PRs into 2.2. If somebody else has tested and/or reviewed the PR, you may comment on it. On 22.08.2023 04:26, Martin Matuska wrote: as 15107 is a prerequisite for 15122, would it be possible to have https://github.com/openzfs/zfs/pull/15107 merged into the OpenZFS zfs-2.2-release branch (and of course later 15122)? If the patches help I can cherry-pick them into main. Alexander Motin wrote: On 17.08.2023 15:41, Dag-Erling Smørgrav wrote: Alexander Motin writes: Trying to run your test (so far without reproduction) I see it producing a substantial amount of ZIL writes. The range of commits you reduced the scope to so far includes my ZIL locking refactoring, where I know for sure there are some deadlocks. I have already been waiting 3 weeks now for reviews and tests for the PR that should fix it: https://github.com/openzfs/zfs/pull/15122 . It would be good if you could test it, though it seems to depend on a few more earlier patches not merged to FreeBSD yet. Do you have a FreeBSD branch with your patch applied? I don't have a FreeBSD branch, but these two patches apply cleanly and build on top of today's FreeBSD main branch: https://github.com/openzfs/zfs/pull/15107 https://github.com/openzfs/zfs/pull/15122 And if you still experience the issue, please show all stacks, or at least include the ZFS sync threads. -- Alexander Motin
Re: ZFS deadlock in 14
On 22.08.2023 14:24, Mark Millard wrote: Alexander Motin wrote on Date: Tue, 22 Aug 2023 16:18:12 UTC : I am waiting for final test results from George Wilson and will then request a quick merge of both to the zfs-2.2-release branch. Unfortunately there are still not many reviewers for the PR, since the code is not trivial, but at least with the test reports Brian Behlendorf and Mark Maybee seem to be OK with merging the two PRs into 2.2. If somebody else has tested and/or reviewed the PR, you may comment on it. I had written to the list that when I tried to test the system doing poudriere builds (initially with your patches) using USE_TMPFS=no so that zfs had to deal with all the file I/O, I instead got only one builder that ended up active, the others never reaching "Builder started": Top was showing lots of "vlruwk" for the cpdup's. For example:
. . .
 362  0 root  40  0 27076Ki 13776Ki CPU19   19  4:23 0.00% cpdup -i0 -o ref 32
 349  0 root  53  0 27076Ki 13776Ki vlruwk  22  4:20 0.01% cpdup -i0 -o ref 31
 328  0 root  68  0 27076Ki 13804Ki vlruwk   8  4:30 0.01% cpdup -i0 -o ref 30
 304  0 root  37  0 27076Ki 13792Ki vlruwk   6  4:18 0.01% cpdup -i0 -o ref 29
 282  0 root  42  0 33220Ki 13956Ki vlruwk   8  4:33 0.01% cpdup -i0 -o ref 28
 242  0 root  56  0 27076Ki 13796Ki vlruwk   4  4:28 0.00% cpdup -i0 -o ref 27
. . .
But those processes did show CPU?? on occasion, as well as *vnode less often. None of the cpdup's was stuck in Removing your patches did not change the behavior. Mark, to me "vlruwk" looks like hitting a limit on the number of vnodes. I have not been deep in that area recently, so somebody with more experience there could try to diagnose it. At the very least it does not look related to the ZIL issue discussed in this thread, at least with the information provided, so I am not surprised that the mentioned patches do not affect it. -- Alexander Motin
Re: ZFS deadlock in 14
Hi Martin, I am waiting for final test results from George Wilson and will then request a quick merge of both to the zfs-2.2-release branch. Unfortunately there are still not many reviewers for the PR, since the code is not trivial, but at least with the test reports Brian Behlendorf and Mark Maybee seem to be OK with merging the two PRs into 2.2. If somebody else has tested and/or reviewed the PR, you may comment on it. On 22.08.2023 04:26, Martin Matuska wrote: as 15107 is a prerequisite for 15122, would it be possible to have https://github.com/openzfs/zfs/pull/15107 merged into the OpenZFS zfs-2.2-release branch (and of course later 15122)? If the patches help I can cherry-pick them into main. Alexander Motin wrote: On 17.08.2023 15:41, Dag-Erling Smørgrav wrote: Alexander Motin writes: Trying to run your test (so far without reproduction) I see it producing a substantial amount of ZIL writes. The range of commits you reduced the scope to so far includes my ZIL locking refactoring, where I know for sure there are some deadlocks. I have already been waiting 3 weeks now for reviews and tests for the PR that should fix it: https://github.com/openzfs/zfs/pull/15122 . It would be good if you could test it, though it seems to depend on a few more earlier patches not merged to FreeBSD yet. Do you have a FreeBSD branch with your patch applied? I don't have a FreeBSD branch, but these two patches apply cleanly and build on top of today's FreeBSD main branch: https://github.com/openzfs/zfs/pull/15107 https://github.com/openzfs/zfs/pull/15122 And if you still experience the issue, please show all stacks, or at least include the ZFS sync threads. -- Alexander Motin
Re: Speed improvements in ZFS
Am 2023-08-21 10:53, schrieb Konstantin Belousov: On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote: Am 2023-08-20 23:17, schrieb Konstantin Belousov: > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote: > > On 8/20/23, Alexander Leidinger wrote: > > > Am 2023-08-20 22:02, schrieb Mateusz Guzik: > > >> On 8/20/23, Alexander Leidinger wrote: > > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik: > > >>>> On 8/18/23, Alexander Leidinger wrote: > > >>> > > >>>>> I have a 51MB text file, compressed to about 1MB. Are you interested > > >>>>> to > > >>>>> get it? > > >>>>> > > >>>> > > >>>> Your problem is not the vnode limit, but nullfs. > > >>>> > > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg > > >>> > > >>> 122 nullfs mounts on this system. And every jail I setup has several > > >>> null mounts. One basesystem mounted into every jail, and then shared > > >>> ports (packages/distfiles/ccache) across all of them. > > >>> > > >>>> First, some of the contention is notorious VI_LOCK in order to do > > >>>> anything. > > >>>> > > >>>> But more importantly the mind-boggling off-cpu time comes from > > >>>> exclusive locking which should not be there to begin with -- as in > > >>>> that xlock in stat should be a slock. > > >>>> > > >>>> Maybe I'm going to look into it later. > > >>> > > >>> That would be fantastic. > > >>> > > >> > > >> I did a quick test, things are shared locked as expected. > > >> > > >> However, I found the following: > > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) { > > >> mp->mnt_kern_flag |= > > >> lowerrootvp->v_mount->mnt_kern_flag & > > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | > > >> MNTK_EXTENDED_SHARED); > > >> } > > >> > > >> are you using the "nocache" option? it has a side effect of xlocking > > > > > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache. > > > > > > > If you don't have "nocache" on null mounts, then I don't see how this > > could happen. 
> > There is also MNTK_NULL_NOCACHE on lower fs, which is currently set for > fuse and nfs at least. 11 of those 122 nullfs mounts are ZFS datasets which are also NFS exported. 6 of those nullfs mounts are also exported via Samba. The NFS exports shouldn't be needed anymore, I will remove them. By nfs I meant nfs client, not nfs exports. No NFS client mounts anywhere on this system. So where is this exclusive lock coming from then... This is a ZFS system. 2 pools: one for the root, one for anything I need space for. Both pools reside on the same disks. The root pool is a 3-way mirror, the "space-pool" is a 5-disk raidz2. All jails are on the space-pool. The jails are all basejail-style jails. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Speed improvements in ZFS
Am 2023-08-20 23:17, schrieb Konstantin Belousov: On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote: On 8/20/23, Alexander Leidinger wrote: > Am 2023-08-20 22:02, schrieb Mateusz Guzik: >> On 8/20/23, Alexander Leidinger wrote: >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik: >>>> On 8/18/23, Alexander Leidinger wrote: >>> >>>>> I have a 51MB text file, compressed to about 1MB. Are you interested >>>>> to >>>>> get it? >>>>> >>>> >>>> Your problem is not the vnode limit, but nullfs. >>>> >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg >>> >>> 122 nullfs mounts on this system. And every jail I setup has several >>> null mounts. One basesystem mounted into every jail, and then shared >>> ports (packages/distfiles/ccache) across all of them. >>> >>>> First, some of the contention is notorious VI_LOCK in order to do >>>> anything. >>>> >>>> But more importantly the mind-boggling off-cpu time comes from >>>> exclusive locking which should not be there to begin with -- as in >>>> that xlock in stat should be a slock. >>>> >>>> Maybe I'm going to look into it later. >>> >>> That would be fantastic. >>> >> >> I did a quick test, things are shared locked as expected. >> >> However, I found the following: >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) { >> mp->mnt_kern_flag |= >> lowerrootvp->v_mount->mnt_kern_flag & >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | >> MNTK_EXTENDED_SHARED); >> } >> >> are you using the "nocache" option? it has a side effect of xlocking > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache. > If you don't have "nocache" on null mounts, then I don't see how this could happen. There is also MNTK_NULL_NOCACHE on lower fs, which is currently set for fuse and nfs at least. 11 of those 122 nullfs mounts are ZFS datasets which are also NFS exported. 6 of those nullfs mounts are also exported via Samba. The NFS exports shouldn't be needed anymore, I will remove them. 
Shouldn't this implicit nocache propagate to the mount of the upper fs to give the user feedback about the effective state? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Speed improvements in ZFS
Am 2023-08-20 22:02, schrieb Mateusz Guzik: On 8/20/23, Alexander Leidinger wrote: Am 2023-08-20 19:10, schrieb Mateusz Guzik: On 8/18/23, Alexander Leidinger wrote: I have a 51MB text file, compressed to about 1MB. Are you interested to get it? Your problem is not the vnode limit, but nullfs. https://people.freebsd.org/~mjg/netchild-periodic-find.svg 122 nullfs mounts on this system. And every jail I setup has several null mounts. One basesystem mounted into every jail, and then shared ports (packages/distfiles/ccache) across all of them. First, some of the contention is notorious VI_LOCK in order to do anything. But more importantly the mind-boggling off-cpu time comes from exclusive locking which should not be there to begin with -- as in that xlock in stat should be a slock. Maybe I'm going to look into it later. That would be fantastic. I did a quick test, things are shared locked as expected. However, I found the following: if ((xmp->nullm_flags & NULLM_CACHE) != 0) { mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag & (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED | MNTK_EXTENDED_SHARED); } are you using the "nocache" option? it has a side effect of xlocking I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
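For context, a null mount with the options listed above (and without "nocache", which per the discussion forces exclusive locking) might look like this in fstab syntax; the paths are placeholders, not taken from the actual setup:

```
# hypothetical basejail null mount, options as stated in the thread
/usr/local/basejail  /jails/j1/basejail  nullfs  noatime,noexec,nosuid,nfsv4acls  0  0
```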
Re: Speed improvements in ZFS
Am 2023-08-20 19:10, schrieb Mateusz Guzik: On 8/18/23, Alexander Leidinger wrote: I have a 51MB text file, compressed to about 1MB. Are you interested to get it? Your problem is not the vnode limit, but nullfs. https://people.freebsd.org/~mjg/netchild-periodic-find.svg 122 nullfs mounts on this system. And every jail I setup has several null mounts. One basesystem mounted into every jail, and then shared ports (packages/distfiles/ccache) across all of them. First, some of the contention is notorious VI_LOCK in order to do anything. But more importantly the mind-boggling off-cpu time comes from exclusive locking which should not be there to begin with -- as in that xlock in stat should be a slock. Maybe I'm going to look into it later. That would be fantastic. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: 100% CPU time for sysctl command, not killable
Am 2023-08-20 18:55, schrieb Mina Galić: procstat(1) kstack could be helpful here. Original Message On 20 Aug 2023, 17:29, Alexander Leidinger <alexan...@leidinger.net> wrote: Hi, sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable sysctl program. This is somewhat unexpected... Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
  PID    TID COMM    TDNAME  KSTACK
94391 118678 sysctl  -       sysctl_maxvnodes sysctl_root_handler_locked sysctl_root userland_sysctl sys___sysctl amd64_syscall fast_syscall_common
Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
100% CPU time for sysctl command, not killable
Hi, sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable sysctl program. This is somewhat unexpected... Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
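Later messages in the "Speed improvements in ZFS" thread show this system running with kern.maxvnodes at 10485760, so the command above asks the kernel for exactly 100 times the current vnode table size in one step. A quick sanity check of that factor (plain arithmetic, nothing FreeBSD-specific):

```shell
requested=1048576000   # value passed to sysctl above
current=10485760       # kern.maxvnodes as reported later in this digest
factor=$((requested / current))
echo "$factor"         # prints 100
```

Wherever the kernel is actually spinning, a single 100x jump is a plausible trigger; growing the limit in smaller steps would narrow it down.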
Re: ZFS deadlock in 14
On 18.08.2023 18:34, Dag-Erling Smørgrav wrote: Dag-Erling Smørgrav writes: Plot twist: c47116e909 _without_ the patches also appears to be working fine. The last kernel I know for sure deadlocks is b36f469a15, so I'm going to test cd25b0f740 and 28d2e3b5de. c47116e909 with cd25b0f740 and 28d2e3b5de reverted deadlocks, see attached ddb.txt. I'm going to see if reverting only 28d2e3b5de but not cd25b0f740 changes anything. Yes, it looks like ZIL-related deadlock: ZFS sync thread in zil_sync() is waiting for allocated ZIL zios to complete: Tracing command zfskern pid 5 tid 101124 td 0xfe0408cbb020 sched_switch() at sched_switch+0x5da/frame 0xfe04090f7900 mi_switch() at mi_switch+0x173/frame 0xfe04090f7920 sleepq_switch() at sleepq_switch+0x104/frame 0xfe04090f7960 _cv_wait() at _cv_wait+0x165/frame 0xfe04090f79c0 zil_sync() at zil_sync+0x9b/frame 0xfe04090f7aa0 dmu_objset_sync() at dmu_objset_sync+0x51b/frame 0xfe04090f7b70 dsl_pool_sync() at dsl_pool_sync+0x11d/frame 0xfe04090f7bf0 spa_sync() at spa_sync+0xc68/frame 0xfe04090f7e20 txg_sync_thread() at txg_sync_thread+0x2eb/frame 0xfe04090f7ef0 fork_exit() at fork_exit+0x82/frame 0xfe04090f7f30 fork_trampoline() at fork_trampoline+0xe/frame 0xfe04090f7f30 --- trap 0, rip = 0, rsp = 0, rbp = 0 --- Some thread requested fsync(), allocated zil zio, but stuck waiting for z_teardown_inactive_lock in attempt to get data to be written into zil, so zios were never even issued: Tracing command blacklistd pid 521 tid 101136 td 0xfe040d08d000 sched_switch() at sched_switch+0x5da/frame 0xfe040c25c710 mi_switch() at mi_switch+0x173/frame 0xfe040c25c730 sleepq_switch() at sleepq_switch+0x104/frame 0xfe040c25c770 _sleep() at _sleep+0x2d6/frame 0xfe040c25c810 rms_rlock_fallback() at rms_rlock_fallback+0xd0/frame 0xfe040c25c850 zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0x2b/frame 0xfe040c25c880 VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0x35/frame 0xfe040c25c8a0 vgonel() at vgonel+0x3a9/frame 0xfe040c25c910 vnlru_free_impl() at 
vnlru_free_impl+0x371/frame 0xfe040c25c990 vn_alloc_hard() at vn_alloc_hard+0xd3/frame 0xfe040c25c9b0 getnewvnode_reserve() at getnewvnode_reserve+0xa0/frame 0xfe040c25c9d0 zfs_zget() at zfs_zget+0x1f/frame 0xfe040c25ca80 zfs_get_data() at zfs_get_data+0x62/frame 0xfe040c25cb20 zil_lwb_commit() at zil_lwb_commit+0x32f/frame 0xfe040c25cb70 zil_lwb_write_issue() at zil_lwb_write_issue+0x4e/frame 0xfe040c25cbb0 zil_commit_impl() at zil_commit_impl+0x943/frame 0xfe040c25cd40 zfs_fsync() at zfs_fsync+0x8f/frame 0xfe040c25cd80 kern_fsync() at kern_fsync+0x18a/frame 0xfe040c25ce00 amd64_syscall() at amd64_syscall+0x138/frame 0xfe040c25cf30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe040c25cf30 --- syscall (95, FreeBSD ELF64, fsync), rip = 0x3f979eeb074a, rsp = 0x3f979a449c38, rbp = 0x3f979a449c50 --- Third thread doing zfs rollback while holding z_teardown_inactive_lock tries to wait for transaction commit, causing deadlock: Tracing command zfs pid 65636 tid 138109 td 0xfe0438d721e0 sched_switch() at sched_switch+0x5da/frame 0xfe0439b2b950 mi_switch() at mi_switch+0x173/frame 0xfe0439b2b970 sleepq_switch() at sleepq_switch+0x104/frame 0xfe0439b2b9b0 _cv_wait() at _cv_wait+0x165/frame 0xfe0439b2ba10 txg_wait_synced_impl() at txg_wait_synced_impl+0xeb/frame 0xfe0439b2ba50 txg_wait_synced() at txg_wait_synced+0xb/frame 0xfe0439b2ba60 zfsvfs_teardown() at zfsvfs_teardown+0x203/frame 0xfe0439b2bab0 zfs_ioc_rollback() at zfs_ioc_rollback+0x12f/frame 0xfe0439b2bb00 zfsdev_ioctl_common() at zfsdev_ioctl_common+0x612/frame 0xfe0439b2bbc0 zfsdev_ioctl() at zfsdev_ioctl+0x12a/frame 0xfe0439b2bbf0 devfs_ioctl() at devfs_ioctl+0xd2/frame 0xfe0439b2bc40 vn_ioctl() at vn_ioctl+0xc2/frame 0xfe0439b2bcb0 devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfe0439b2bcd0 kern_ioctl() at kern_ioctl+0x286/frame 0xfe0439b2bd30 sys_ioctl() at sys_ioctl+0x152/frame 0xfe0439b2be00 amd64_syscall() at amd64_syscall+0x138/frame 0xfe0439b2bf30 fast_syscall_common() at 
fast_syscall_common+0xf8/frame 0xfe0439b2bf30 --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x1afaddea3aaa, rsp = 0x1afad4058328, rbp = 0x1afad40583a0 --- Unfortunately I think the current code in main should still suffer from this specific deadlock. cd25b0f740 fixes some deadlocks in this area; maybe that is why you are getting issues less often, but I don't believe it fixes this specific one, maybe you were just lucky. I believe only https://github.com/openzfs/zfs/pull/15122 should fix them. -- Alexander Motin
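Taken together, the three stacks form a closed wait-for cycle, which is why no thread can ever make progress. A toy sketch of following those edges (the thread labels are shorthand for the stacks quoted above):

```shell
# Wait-for edges from the three stacks:
#   txg sync thread -> fsync thread    (zil_sync() waits on ZIL zios never issued)
#   fsync thread    -> rollback thread (blocked on z_teardown_inactive_lock)
#   rollback thread -> txg sync thread (txg_wait_synced() waits on the sync thread)
next() {
  case $1 in
    txg_sync) echo fsync ;;
    fsync)    echo rollback ;;
    rollback) echo txg_sync ;;
  esac
}
t=txg_sync
t=$(next "$t"); t=$(next "$t"); t=$(next "$t")
echo "$t"   # prints "txg_sync": after three hops we are back where we started
```

Three hops return to the starting thread, i.e. a cycle in the wait-for graph, the defining condition of a deadlock.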
Re: Speed improvements in ZFS
Am 2023-08-16 18:48, schrieb Alexander Leidinger: Am 2023-08-15 23:29, schrieb Mateusz Guzik: On 8/15/23, Alexander Leidinger wrote: Am 2023-08-15 14:41, schrieb Mateusz Guzik: With this in mind can you provide: sysctl kern.maxvnodes vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes vfs.recycles_free vfs.recycles
After a reboot:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 24696
vfs.vnodes_created: 1658162
vfs.numvnodes: 173937
vfs.recycles_free: 0
vfs.recycles: 0
New values after one round of periodic:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 356202
vfs.vnodes_created: 427696288
vfs.numvnodes: 532620
vfs.recycles_free: 20213257
vfs.recycles: 0
And after the second round, which only took 7h this night:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 3071754
vfs.vnodes_created: 1275963316
vfs.numvnodes: 3414906
vfs.recycles_free: 58411371
vfs.recycles: 0
Meanwhile if there is tons of recycles, you can damage control by bumping kern.maxvnodes. What's the difference between recycles and recycles_free? Does the above count as bumping the maxvnodes? ^ It looks like there are not many free vnodes directly after the reboot. I will check the values again tomorrow after the periodic run and maybe increase by a factor of 10 or 100 to see if it makes a difference. If this is not the problem you can use dtrace to figure it out. dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or something else? I mean checking where find is spending time instead of speculating. There is no productized way to do it so to speak, but the following crapper should be good enough: [script] I will let it run tonight. I have a 51MB text file, compressed to about 1MB. Are you interested to get it? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: ZFS deadlock in 14
On 17.08.2023 15:41, Dag-Erling Smørgrav wrote: Alexander Motin writes: Trying to run your test (so far without reproduction) I see it producing a substantial amount of ZIL writes. The range of commits you reduced the scope to so far includes my ZIL locking refactoring, where I know for sure there are some deadlocks. I have already been waiting 3 weeks now for reviews and tests for the PR that should fix it: https://github.com/openzfs/zfs/pull/15122 . It would be good if you could test it, though it seems to depend on a few more earlier patches not merged to FreeBSD yet. Do you have a FreeBSD branch with your patch applied? I don't have a FreeBSD branch, but these two patches apply cleanly and build on top of today's FreeBSD main branch: https://github.com/openzfs/zfs/pull/15107 https://github.com/openzfs/zfs/pull/15122 And if you still experience the issue, please show all stacks, or at least include the ZFS sync threads. -- Alexander Motin
Re: ZFS deadlock in 14
On 17.08.2023 14:57, Alexander Motin wrote: On 15.08.2023 12:28, Dag-Erling Smørgrav wrote: Mateusz Guzik writes: Going through the list may or may not reveal other threads doing something in the area, and it very well may be that they are deadlocked, which then results in other processes hanging on them. Just like in your case, the process reported as hung is a random victim, and the real culprit is deeper. We already know the real culprit, see upthread. Dag, I looked through the thread once more, and, while I thank you for tracing it, you never went beyond txg_wait_synced() in the `zfs revert` thread. If you are saying that thread is holding the lock, then the question is why the transaction commit is stuck. I need to see stacks for the ZFS sync threads, or better all kernel stacks, just in case. Without that information I can only speculate. Trying to run your test (so far without reproduction) I see it producing a substantial amount of ZIL writes. The range of commits you reduced the scope to so far includes my ZIL locking refactoring, where I know for sure there are some deadlocks. I have already been waiting 3 weeks now for reviews and tests for the PR that should fix it: https://github.com/openzfs/zfs/pull/15122 . It would be good if you could test it, though it seems to depend on a few more earlier patches not merged to FreeBSD yet. Ah, it appears that on the pool I tested first I had sync=always left over from earlier tests; that explains the high amount of ZIL traffic I saw, so it may be irrelevant. But I still wonder what the sync threads are doing in your case. -- Alexander Motin
Re: ZFS deadlock in 14
On 15.08.2023 12:28, Dag-Erling Smørgrav wrote: Mateusz Guzik writes: Going through the list may or may not reveal other threads doing something in the area, and it very well may be that they are deadlocked, which then results in other processes hanging on them. Just like in your case, the process reported as hung is a random victim, and the real culprit is deeper. We already know the real culprit, see upthread. Dag, I looked through the thread once more, and, while I thank you for tracing it, you never went beyond txg_wait_synced() in the `zfs revert` thread. If you are saying that thread is holding the lock, then the question is why the transaction commit is stuck. I need to see stacks for the ZFS sync threads, or better all kernel stacks, just in case. Without that information I can only speculate. Trying to run your test (so far without reproduction) I see it producing a substantial amount of ZIL writes. The range of commits you reduced the scope to so far includes my ZIL locking refactoring, where I know for sure there are some deadlocks. I have already been waiting 3 weeks now for reviews and tests for the PR that should fix it: https://github.com/openzfs/zfs/pull/15122 . It would be good if you could test it, though it seems to depend on a few more earlier patches not merged to FreeBSD yet. -- Alexander Motin
Re: Defaulting serial communication to 115200 bps for FreeBSD 14
On 16.08.2023 18:14, Dennis Clarke wrote: The default serial communications config on most telecom equipment that I have seen (in the last forty years) defaults to 9600 8n1. If people want something faster from FreeBSD then do the trivial: set comconsole_speed="115200" set console="comconsole" Is that not trivial enough? Except this is not telecom equipment from 40 years ago. Even at the 115200 that I routinely use on my development systems, I feel that serial console output affects verbose boot time and kernel console debugging output. I also have BIOS console redirection enabled on my systems, and I believe the default there is also 115200, and even that is pretty slow. I see no point in staying compatible if it is unusable. -- Alexander Motin
Re: Speed improvements in ZFS
Am 2023-08-15 23:29, schrieb Mateusz Guzik: On 8/15/23, Alexander Leidinger wrote: Am 2023-08-15 14:41, schrieb Mateusz Guzik: With this in mind can you provide: sysctl kern.maxvnodes vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes vfs.recycles_free vfs.recycles
After a reboot:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 24696
vfs.vnodes_created: 1658162
vfs.numvnodes: 173937
vfs.recycles_free: 0
vfs.recycles: 0
New values after one round of periodic:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 356202
vfs.vnodes_created: 427696288
vfs.numvnodes: 532620
vfs.recycles_free: 20213257
vfs.recycles: 0
Meanwhile if there is tons of recycles, you can damage control by bumping kern.maxvnodes. What's the difference between recycles and recycles_free? Does the above count as bumping the maxvnodes? It looks like there are not many free vnodes directly after the reboot. I will check the values again tomorrow after the periodic run and maybe increase by a factor of 10 or 100 to see if it makes a difference. If this is not the problem you can use dtrace to figure it out. dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or something else? I mean checking where find is spending time instead of speculating. There is no productized way to do it so to speak, but the following crapper should be good enough: [script] I will let it run tonight. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Speed improvements in ZFS
Am 2023-08-15 14:41, schrieb Mateusz Guzik: With this in mind can you provide: sysctl kern.maxvnodes vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes vfs.recycles_free vfs.recycles
After a reboot:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 24696
vfs.vnodes_created: 1658162
vfs.numvnodes: 173937
vfs.recycles_free: 0
vfs.recycles: 0
Meanwhile if there is tons of recycles, you can damage control by bumping kern.maxvnodes. It looks like there are not many free vnodes directly after the reboot. I will check the values again tomorrow after the periodic run and maybe increase by a factor of 10 or 100 to see if it makes a difference. If this is not the problem you can use dtrace to figure it out. dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or something else? Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
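Something along the lines of the dtrace-count asked about above could look like this D script. This is an illustrative sketch only (not the script Mateusz later posted, which is elided from the archive); note that vnlru_read_freevnodes() is a static function, so its fbt probe may not exist if the compiler inlined it:

```d
/* Count calls per second to the two vnlru functions named above. */
fbt::vnlru_read_freevnodes:entry,
fbt::vnlru_free_locked:entry
{
        @calls[probefunc] = count();
}

tick-1s
{
        printa(@calls);
        trunc(@calls);
}
```

Run via `dtrace -s` while the periodic find jobs are active; a rapidly growing vnlru_free_locked count would point at vnode recycling pressure.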
Re: Strange network issues with -current
Am 2023-08-15 14:24, schrieb Alexander Leidinger: Am 2023-08-15 13:48, schrieb Alexander Leidinger: since a while I have some strange network issues in some parts of a particular system. I just stumbled upon the mail which discusses issues with commit e3ba0d6adde3, and when I look into this I see changes related to the use of SO_REUSEPORT flags, and all my nginx systems use the reuseport directive in their config. I'm compiling right now with his change reverted. Once tested I will report back. Unfortunately it wasn't that. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Re: Strange network issues with -current
Am 2023-08-15 13:48, schrieb Alexander Leidinger: since a while I have some strange network issues in some parts of a particular system. I just stumbled upon the mail which discusses issues with commit e3ba0d6adde3, and when I look into this I see changes related to the use of SO_REUSEPORT flags, and all my nginx systems use the reuseport directive in their config. I'm compiling right now with his change reverted. Once tested I will report back. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.orgnetch...@freebsd.org : PGP 0x8F31830F9F2772BF
Speed improvements in ZFS
Hi, just a report that I noticed a very high speed improvement in ZFS in -current. For a long time (at least since last year), on a jail host of mine with more than 20 jails, each running periodic daily, the periodic daily runs of the jails took from about 3am until 5pm or longer. I don't remember when this started, and I thought at the time that the problem might be data related. It's the long runs of "find" in one of the periodic daily jobs which take that long, and the number of jails, together with a null-mounted base system inside every jail and a null-mounted package repository inside each jail, means the number of files and the concurrent access to the spinning rust (with first an SSD-based and now an NVMe-based cache) may have reached some tipping point. I have all the periodic daily mails around, so theoretically I may be able to find out when this started, but as can be seen in another mail to this mailing list, the system which has all the periodic mails has some issues which have higher priority for me to track down... Since I updated to a src from 2023-07-20, this is not the case anymore. The data is the same (maybe even a bit more, as I have added 2 more jails since then, and the periodic daily runs, which run more or less in parallel, are not taking considerably longer). The speed increase with the July build is in the area of 3-4 hours for 23 parallel periodic daily runs. So instead of finishing the periodic runs around 5pm, they already finish around 1pm/2pm. So whatever was done inside ZFS or VFS or nullfs between 2023-06-19 and 2023-07-20 has given a huge speed improvement. From my memory I would say there is still room for improvement, as I think the periodic daily runs used to end in the morning instead of the afternoon, but my memory may be flaky in this regard... Great work to whoever was involved. Bye, Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
Strange network issues with -current
Hi, for a while now I have had some strange network issues in some parts of a particular system. A build with src from 2023-07-26 was still working OK. An update to 2023-08-07 broke some parts in a strange way. Trying again with src from 2023-08-11 didn't fix things. What I see is... strange and complex.

I have a jail host with about 23 jails. All the jails are sitting on a bridge and have IPv6 and IPv4 addresses. One jail is a DNS server for a domain which contains all the DNS entries for all the jails on the system (and more). Other jails have mysql (the FS socket for mysql is nullfs-mounted into other jails for connecting to mysql via the FS socket instead of the network), a dovecot IMAP server, a postfix SMTP server, an nginx based reverse proxy, and 2 different kinds of webmail solutions (an old php74 based one on the way out in favour of a php81 based one), a wiki, and other things.

With the old working basesystem I can log in to the old webmail system and read mails. With the newer non-working basesystem I can still log in, but the auth credentials are not stored in the backend session, and as such no mail is listed at all, as this requires subsequent connections from php to dovecot. This webmail system goes via the reverse proxy to the webmail jail, which has another nginx configured to connect to the php-fpm backend.

With the new webmail system I can log in, read mails, and I am even writing this email from it. The first login to it fails; the second succeeds. It is not behind the reverse proxy (as it is not fully ready yet for access from the outside (DSL with NAT on the DSL box to the reverse proxy)), but a single nginx with a php-fpm backend (instead of 2 nginx + php-fpm as in the old webmail).

The wiki behind the reverse proxy is sometimes working and sometimes not. Sometimes it provides everything, sometimes parts of the site are missing (e.g. pictures / icons). Sometimes there is simply a blank page, sometimes it gives an error message from the wiki about an unforeseen bug...
The error message in the nginx reverse proxy log for all the strange failure cases is "accept4() failed (53: Software caused connection abort)". Sometimes I get "upstream timed out". When it times out in the reverse proxy instead of giving the accept4 errors, I see the same accept4 error message in the nginx inside the wiki or webmail jail instead. I tried to recompile all the components of the wiki, the reverse proxy, and the php81 based webmail, to no avail. The issue persists.

Does this ring a bell for anyone? Maybe some network, socket, or VM based changes in this timeframe which smell like they could be related and would be good candidates for a back-out test? Any ideas how to drill down with debugging to get a simpler test case than the complex setup of if_bridge, epair, jails, wiki, php, nginx, ...?

Bye, Alexander.
Re: RTM_NEWNEIGH message for static ARP entry
On Wed, 21 Jun 2023, at 5:19 PM, Hartmut Brandt wrote: > Hi, > > when I set a static ARP entry I see an RTM_NEWNEIGH message on a netlink > socket as expected, but the ndm_state is NUD_INCOMPLETE. Shouldn't this be > NUD_NOARP? At least this is what Linux returns. Thanks for the report, I’ll take a look. To me, NUD_REACHABLE | NUD_PERMANENT looks better suited for the particular case, but I’ll dive deeper tomorrow. Anyway, NUD_INCOMPLETE is certainly wrong. > > Cheers, > Harti > > /Alexander
Re: kernel: sonewconn: pcb 0xfffff8002b255a00 (local:/var/run/devd.seqpacket.pipe): Listen queue overflow: 1 already in queue awaiting acceptance (60 occurrences), ?
Quoting Gary Jennejohn (from Tue, 20 Jun 2023 14:41:41 +): On Tue, 20 Jun 2023 12:04:13 +0200 Alexander Leidinger wrote: "listen X backlog=y" and "sysctl kern.ipx.somaxconn=X" for FreeBSD On my FreeBSD14 system these things are all under kern.ipc. Typo on my side... it was supposed to read ipc, not ipx. Bye, Alexander.
Re: kernel: sonewconn: pcb 0xfffff8002b255a00 (local:/var/run/devd.seqpacket.pipe): Listen queue overflow: 1 already in queue awaiting acceptance (60 occurrences), ?
Quoting Gary Jennejohn (from Tue, 20 Jun 2023 07:41:08 +): On Tue, 20 Jun 2023 06:25:05 +0100 Graham Perrin wrote: Please, what's the meaning of the sonewconn lines? sonewconn is described in socket(9). Below is a copy/paste of the description from socket(9): Protocol implementations can use sonewconn() to create a socket and attach protocol state to that socket. This can be used to create new sockets available for soaccept() on a listen socket. The returned socket has a reference count of zero. Apparently there was already a connection in the queue of the listen socket which had not been consumed by soaccept() when a new sonewconn() call was made. Anyway, that's my understanding. Might be wrong.

In other words, the software listening on the socket didn't process the requests fast enough and a backlog piled up (e.g. apache ListenBacklog or nginx "listen X backlog=y", and "sysctl kern.ipx.somaxconn=X" for FreeBSD itself). You may need faster hardware, more processes/threads to handle the traffic, or to configure your software to do less work to produce the same result (e.g. no real-time DNS resolution in the logging of a webserver, or increasing the number of allowed items in the backlog). If you can change the software, there's also the possibility to switch from blocking sockets to non-blocking sockets (so the select/accept loop doesn't block / run into contention) or to use kqueue.

Bye, Alexander.
Re: ifconfig dumps core and gdb uses an undefined symbol
> On 14 Jun 2023, at 11:35, Gary Jennejohn wrote: > > On Wed, 14 Jun 2023 11:05:31 +0100 > Alexander Chernikov wrote: > >>> On 14 Jun 2023, at 10:53, Gary Jennejohn wrote: >>> >>> On Wed, 14 Jun 2023 09:01:35 + >>> Gary Jennejohn <ga...@gmx.de> wrote: >>> >>>> On Wed, 14 Jun 2023 09:09:04 +0100 >>>> Alexander Chernikov wrote: >>>> >>>>>> On 14 Jun 2023, at 08:59, Gary Jennejohn wrote: >>>>> Hi Gary, >>>>>> >>>>>> So, now I have a new problem with current. >>>>>> >>>>>> I just now updated my current sources and ran buildworld and buildkernel, >>>>>> since Gleb fixed the WITHOUT_PF problem. >>>>>> >>>>>> After installing the new world and kernel I see that ifconfig is dumping >>>>>> a core, apparently when it tries to show lo0, since re0 is correctly >>>>>> shown: >>>>>> >>>>>> ifconfig >>>>>> re0: flags=8843 metric 0 mtu >>>>>> 4088 >>>>>> options=82098 >>>>>> ether redacted >>>>>> inet 192.168.178.XXX netmask 0xff00 broadcast 192.168.178.255 >>>>>> Segmentation fault (core dumped) >>>>> Could you please try to narrow down the crashing command? e.g. >>>>> Ifconfig lo0 >>>>> Ifconfig lo0 net >>>>> Ifconfig lo0 inet6 >>>>> Could you try to rebuild ifconfig w/o netlink (e.g. set >>>>> WITHOUT_NETLINK=yes in the make.conf & make -C sbin/ifconfig clean all >>>>> install) and see if the new binary works? >>>>> >>>> >>>> I already have WITHOUT_NETLINK=yes in my /etc/src.conf. >>>> >>>> I didn't install ifconfig. I simply started it from the build directory. >>>> >>>> ifconfig lo0 shows the settings for lo0 and then dumps core. >>>> >>> >>> After your most recent changes "ifconfig re0" and "ifconfg lo0" don't >>> result in any errors. But "ifconfig" alone still results in a core >>> dump, which per gdb is happening in the strlcpy() call at in_status_tunnel() >>> in af_inet.c. >> Indeed. 
>> >> diff --git a/sbin/ifconfig/ifconfig.c b/sbin/ifconfig/ifconfig.c >> index d30d3e1909ae..6a80ad5763b2 100644 >> --- a/sbin/ifconfig/ifconfig.c >> +++ b/sbin/ifconfig/ifconfig.c >> @@ -822,6 +822,7 @@ list_interfaces_ioctl(if_ctx *ctx) >>continue; >>if (!group_member(ifa->ifa_name, args->matchgroup, >> args->nogroup)) >>continue; >> + ctx->ifname = cp; >>/* >> * Are we just listing the interfaces? >> */ >> >> Does this one fix the crash? >>> > > YES! Should be fixed by 52ff8883185a then. Thank you for the report and sorry for the breakage! > > -- > Gary Jennejohn >
Re: ifconfig dumps core and gdb uses an undefined symbol
> On 14 Jun 2023, at 10:53, Gary Jennejohn wrote: > > On Wed, 14 Jun 2023 09:01:35 + > Gary Jennejohn <ga...@gmx.de> wrote: > >> On Wed, 14 Jun 2023 09:09:04 +0100 >> Alexander Chernikov wrote: >> >>>> On 14 Jun 2023, at 08:59, Gary Jennejohn wrote: >>> Hi Gary, >>>> >>>> So, now I have a new problem with current. >>>> >>>> I just now updated my current sources and ran buildworld and buildkernel, >>>> since Gleb fixed the WITHOUT_PF problem. >>>> >>>> After installing the new world and kernel I see that ifconfig is dumping >>>> a core, apparently when it tries to show lo0, since re0 is correctly >>>> shown: >>>> >>>> ifconfig >>>> re0: flags=8843 metric 0 mtu 4088 >>>> options=82098 >>>> ether redacted >>>> inet 192.168.178.XXX netmask 0xff00 broadcast 192.168.178.255 >>>> Segmentation fault (core dumped) >>> Could you please try to narrow down the crashing command? e.g. >>> Ifconfig lo0 >>> Ifconfig lo0 net >>> Ifconfig lo0 inet6 >>> Could you try to rebuild ifconfig w/o netlink (e.g. set WITHOUT_NETLINK=yes >>> in the make.conf & make -C sbin/ifconfig clean all install) and see if the >>> new binary works? >>> >> >> I already have WITHOUT_NETLINK=yes in my /etc/src.conf. >> >> I didn't install ifconfig. I simply started it from the build directory. >> >> ifconfig lo0 shows the settings for lo0 and then dumps core. >> > > After your most recent changes "ifconfig re0" and "ifconfg lo0" don't > result in any errors. But "ifconfig" alone still results in a core > dump, which per gdb is happening in the strlcpy() call at in_status_tunnel() > in af_inet.c. Indeed. diff --git a/sbin/ifconfig/ifconfig.c b/sbin/ifconfig/ifconfig.c index d30d3e1909ae..6a80ad5763b2 100644 --- a/sbin/ifconfig/ifconfig.c +++ b/sbin/ifconfig/ifconfig.c @@ -822,6 +822,7 @@ list_interfaces_ioctl(if_ctx *ctx) continue; if (!group_member(ifa->ifa_name, args->matchgroup, args->nogroup)) continue; + ctx->ifname = cp; /* * Are we just listing the interfaces? */ Does this one fix the crash? 
> > -- > Gary Jennejohn
Re: ifconfig dumps core and gdb uses an undefined symbol
> On 14 Jun 2023, at 10:01, Gary Jennejohn wrote: > > On Wed, 14 Jun 2023 09:09:04 +0100 > Alexander Chernikov <melif...@freebsd.org> > wrote: > >>> On 14 Jun 2023, at 08:59, Gary Jennejohn wrote: >> Hi Gary, >>> >>> So, now I have a new problem with current. >>> >>> I just now updated my current sources and ran buildworld and buildkernel, >>> since Gleb fixed the WITHOUT_PF problem. >>> >>> After installing the new world and kernel I see that ifconfig is dumping >>> a core, apparently when it tries to show lo0, since re0 is correctly >>> shown: >>> >>> ifconfig >>> re0: flags=8843 metric 0 mtu 4088 >>> options=82098 >>> ether redacted >>> inet 192.168.178.XXX netmask 0xff00 broadcast 192.168.178.255 >>> Segmentation fault (core dumped) >> Could you please try to narrow down the crashing command? e.g. >> Ifconfig lo0 >> Ifconfig lo0 net >> Ifconfig lo0 inet6 >> Could you try to rebuild ifconfig w/o netlink (e.g. set WITHOUT_NETLINK=yes >> in the make.conf & make -C sbin/ifconfig clean all install) and see if the >> new binary works? >> > > I already have WITHOUT_NETLINK=yes in my /etc/src.conf. > > I didn't install ifconfig. I simply started it from the build directory. > > ifconfig lo0 shows the settings for lo0 and then dumps core. > >>> >>> Unfortunately, I see this error message when I try to look at the core >>> file with gdb: >>> >>> gdb /sbin/ifconfig ifconfig.core >>> ld-elf.so.1: Undefined symbol "rl_eof_found" referenced from COPY >>> relocation in /usr/local/bin/gdb >> Not a specialist here, but if you could build the binary with debug >> (make DEBUG_FLAGS="-O0 -g3" sbin/ifconfig clean all install) & share the >> binary & core with me, I could take a look on what's happening. >>> > > I compiled gdb under /usr/ports and it now works, although it's emitting > some weird errors. > > -O0 -g3 removes too much and gdb shows no useful information. 
> > With just -g3 I get this output from gdb after running the newly compiled
> ifconfig:
>
> Program terminated with signal SIGSEGV, Segmentation fault
> warning: Section `.reg-xstate/100294' in core file too small.
> #0  lagg_status (ctx=0x2f051660ba00) at /usr/src/sbin/ifconfig/iflagg.c:223
> 223     const int verbose = ctx->args->verbose;
> (gdb) bt
> #0  lagg_status (ctx=0x2f051660ba00) at /usr/src/sbin/ifconfig/iflagg.c:223
> #1  0x2efcf610ea55 in af_other_status (ctx=0x2f051660ba00)
>    at /usr/src/sbin/ifconfig/ifconfig.c:964
> #2  status (args=0x2f051660ba70, ifa=0x2f051a2f2000, sdl=)
>    at /usr/src/sbin/ifconfig/ifconfig.c:1788
> #3  list_interfaces_ioctl (args=0x2f051660ba70)
>    at /usr/src/sbin/ifconfig/ifconfig.c:845
> #4  list_interfaces (args=0x2f051660ba70)
>    at /usr/src/sbin/ifconfig/ifconfig.c:428
> #5  main (ac=, av=)
>    at /usr/src/sbin/ifconfig/ifconfig.c:724
> (gdb)
>
> I looked at ctx:
>
> (gdb) p ctx
> $1 = (if_ctx *) 0x2f051660ba00
> (gdb) p/x *0x2f051660ba00
> $2 = 0x0 <==
> (gdb)
>
> So, looks like the problem is in iflagg and ctx is NULL.

Ack. Does bbad5525fabf fix the issue? > > -- > Gary Jennejohn
Re: ifconfig dumps core and gdb uses an undefined symbol
> On 14 Jun 2023, at 08:59, Gary Jennejohn wrote: Hi Gary, > > So, now I have a new problem with current. > > I just now updated my current sources and ran buildworld and buildkernel, > since Gleb fixed the WITHOUT_PF problem. > > After installing the new world and kernel I see that ifconfig is dumping > a core, apparently when it tries to show lo0, since re0 is correctly > shown: > > ifconfig > re0: flags=8843 metric 0 mtu 4088 > options=82098 > ether redacted > inet 192.168.178.XXX netmask 0xff00 broadcast 192.168.178.255 > Segmentation fault (core dumped) Could you please try to narrow down the crashing command? e.g. Ifconfig lo0 Ifconfig lo0 net Ifconfig lo0 inet6 Could you try to rebuild ifconfig w/o netlink (e.g. set WITHOUT_NETLINK=yes in the make.conf & make -C sbin/ifconfig clean all install) and see if the new binary works? > > Unfortunately, I see this error message when I try to look at the core > file with gdb: > > gdb /sbin/ifconfig ifconfig.core > ld-elf.so.1: Undefined symbol "rl_eof_found" referenced from COPY > relocation in /usr/local/bin/gdb Not a specialist here, but if you could build the binary with debug (make DEBUG_FLAGS=“-O0 -g3” sbin/ifconfig clean all install) & share the binary & core with me, I could take a look on what’s happening. > > pkg claims that my packages are all up to date. > > Not exactly a fatal error, but still rather surprising. > > -- > Gary Jennejohn >
Re: panic(s) in ZFS on CURRENT
Hi Gleb, There are two probably related PRs upstream: https://github.com/openzfs/zfs/pull/14939 https://github.com/openzfs/zfs/pull/14954

On 09.06.2023 00:57, Gleb Smirnoff wrote: On Thu, Jun 08, 2023 at 07:56:07PM -0700, Gleb Smirnoff wrote: T> I'm switching to INVARIANTS kernel right now and will see if that panics earlier. This is what I got with INVARIANTS:

panic: VERIFY3(dev->l2ad_hand <= dev->l2ad_evict) failed (225142071296 <= 225142063104)
cpuid = 17
time = 1686286015
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfe0160dcea90
kdb_backtrace() at kdb_backtrace+0x46/frame 0xfe0160dceb40
vpanic() at vpanic+0x21f/frame 0xfe0160dcebe0
spl_panic() at spl_panic+0x4d/frame 0xfe0160dcec60
l2arc_write_buffers() at l2arc_write_buffers+0xcda/frame 0xfe0160dcedf0
l2arc_feed_thread() at l2arc_feed_thread+0x547/frame 0xfe0160dceec0
fork_exit() at fork_exit+0x122/frame 0xfe0160dcef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe0160dcef30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 1m4s
Dumping 5473 out of 65308 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

(kgdb) frame 4
#4 0x804342ea in l2arc_write_buffers (spa=0xfe022e942000, dev=0xfe023116a000, target_sz=16777216) at /usr/src/FreeBSD/sys/contrib/openzfs/module/zfs/arc.c:9445
9445    ASSERT3U(dev->l2ad_hand, <=, dev->l2ad_evict);
(kgdb) p dev
$1 = (l2arc_dev_t *) 0xfe023116a000
(kgdb) p dev->l2ad_hand
$2 = 225142071296
(kgdb) p dev->l2ad_evict
$3 = 225142063104
(kgdb) p *dev
value of type `l2arc_dev_t' requires 66136 bytes, which is more than max-value-size

I have never before seen kgdb unable to print a structure because it is reported to be too big.

-- Alexander Motin
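The max-value-size message at the end is a stock gdb guard (64 KiB by default), not a kgdb limitation; it can be raised interactively. A sketch, assuming a gdb/kgdb recent enough to have the setting:

```
(kgdb) show max-value-size
Maximum value size is 65536 bytes.
(kgdb) set max-value-size unlimited
(kgdb) p *dev
```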
Re: Error building kernel in current
On Fri, 2 Jun 2023, at 4:30 PM, Gary Jennejohn wrote: > On Fri, 2 Jun 2023 09:59:40 + > Gary Jennejohn wrote: > > > On Fri, 2 Jun 2023 09:56:44 + > > Gary Jennejohn wrote: > > > > > Error building kernel in current: > > > > > > -- > > > >>> stage 3.1: building everything > > > -- > > > /usr/src/sys/netlink/route/iface.c:1315:22: error: use of undeclared > > > identifier 'if_flags' > > > if (error == 0 && !(if_flags & IFF_UP) && (if_getflags(ifp) & > > > IFF_UP)) > > > ^ > > > 1 error generated. > > > --- iface.o --- > > > *** [iface.o] Error code 1 Sorry for the breakage, I’ll fix it in a couple of hours. > > > > > > My source tree was updated just a few minutes ago and I didn't see any > > > recent changes to iface.c. > > > > > > I have WITHOUT_NETLINK_SUPPORT= in my src.conf. > > > > > > > Ah, my error. The failure occurs while building the kernel, so I fixed > > Subject accordingly. > > > > OK, this is another INET6 error. I don't have INET6 enabled. > > At line 1280 we have: > #ifdef INET6 > int if_flags = if_getflags(ifp); > #endif > > and if_flags is used at line 1315 without checking whether INET6 is > defined. > > if_flags seems to be totally redundant, since the code at line 1315 will > invoke if_getflags(ifp) if !(if_flags & IFF_UP) is true. I wish it was true. The case here is that interface flags can change after adding the address, as many interface drivers silently bring the interface up upon the first address addition. Please see https://cgit.freebsd.org/src/commit/sys/netinet6?id=a77facd27368f618520d25391cfce11149879a41 description for a more detailed explanation. > > -- > Gary Jennejohn > > /Alexander
Re: Surprise null root password
Quoting bob prohaska (from Tue, 30 May 2023 08:36:21 -0700): I suggest reviewing the changes ("df" instead of "tf" in etcupdate) to at least those files which you know you have modified, including the password/group stuff. After that you can decide if the diff which is shown with "df" can be applied ("tf"), or if you want to keep the old version ("mf"), or if you want to modify the current file ("e", with both versions present in the file so that you can copy/paste between the different versions and keep what you need). The key sequences required to copy and paste between files in the edit screen were elusive. Probably it was thought self-evident, but not for me. I last tried it long ago, via mergemaster. Is there a guide to the commands for merging files using etcupdate? Is it in the vi man page? I couldn't find it. etcupdate respects the EDITOR environment variable; you can use any editor you like there. Typically I copy with the mouse, and I google it every time I can't (https://linuxize.com/post/how-to-copy-cut-paste-in-vim/). Bye, Alexander.
Re: Surprise null root password
Quoting bob prohaska (from Fri, 26 May 2023 16:26:06 -0700): On Fri, May 26, 2023 at 10:55:49PM +0200, Yuri wrote: The question is how you update the configuration files, mergemaster/etcupdate/something else? Via etcupdate after installworld. In the event the system requests manual intervention I accept "theirs all". It seems odd if that can null a root password. Still, it does seem an outside possibility. I could see it adding system users, but messing with root's existing password seems a bit unexpected. As you are posting to -current@, I assume you are reporting this issue about a 14-current system. As such: there was a "recent" change (2021-10-20) to the root entry to change the shell. https://cgit.freebsd.org/src/commit/etc/master.passwd?id=d410b585b6f00a26c2de7724d6576a3ea7d548b7 By blindly accepting all changes, this reset the password to the default setting (empty). I suggest reviewing the changes ("df" instead of "tf" in etcupdate) to at least those files which you know you have modified, including the password/group stuff. After that you can decide if the diff which is shown with "df" can be applied ("tf"), or if you want to keep the old version ("mf"), or if you want to modify the current file ("e", with both versions present in the file so that you can copy/paste between the different versions and keep what you need). Bye, Alexander.
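An illustrative resolve session for the master.passwd case described above, using the commands Alexander lists (not a verbatim transcript; the prompts and the diff contents vary by etcupdate version and system, and the hash shown is a placeholder):

```
# etcupdate resolve
Conflict in /etc/master.passwd.
> df                    # review the diff before deciding
-root:$6$existinghash...:0:0::0:0:Charlie &:/root:/bin/csh
+root::0:0::0:0:Charlie &:/root:/bin/sh
> e                     # merge by hand: keep the password hash,
                        # take only the new /bin/sh shell
```

Accepting "tf" (theirs-full) here is exactly the "theirs all" shortcut that would install the empty password field.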
Re: builworld fails due to error in af_inet.c
Sorry for the breakage (and thanks to markj@ for the prompt fix) > On 22 May 2023, at 16:00, Gary Jennejohn wrote: > > I just ran buildworld using the latest current source. > > It dies due to this error in line 385 of /usr/src/sbin/ifconfig/af_inet.c: > > static void > warn_nomask(ifflags) > > The compiler really doesn't like not seeing a type for ifflags and bails > out as the result. > > Strangely enough, in_proc() a few lines later clearly has int ifflags in > its list of variables. > > Setting ifflags to int in warn_nomask() fixes the build. > > Wasn't this compile tested before it was committed? It was & it didn’t yell on my setup. > > -- > Gary Jennejohn >
Re: change in compat/linux breaking net/citrix_ica
Quoting Jakob Alvermark (from Wed, 26 Apr 2023 09:01:00 +0200): Hi, I use net/citrix_ica for work. After a recent change to -current in compat/linux it no longer works. The binary just segfaults. What does "sysctl compat.linux.osrelease" display? If it is not 2.6.30 or higher, try to set it to 2.6.30 or higher. Bye, Alexander. I have bisected and it happened after this commit:

commit 40c36c4674eb9602709cf9d0483a4f34ad9753f6
Author: Dmitry Chagin
Date: Sat Apr 22 22:17:17 2023 +0300

    linux(4): Export the AT_RANDOM depending on the process osreldata

    AT_RANDOM has appeared in the 2.6.30 Linux kernel first time.
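The check and workaround from the reply above, as commands (a sketch of FreeBSD-specific sysctls; setting the value needs root, and the exact version to set is a reasonable guess, not from the thread):

```
# what the Linuxulator reports to Linux binaries
sysctl compat.linux.osrelease
# report 2.6.30 or newer so AT_RANDOM is exported to the binary
sysctl compat.linux.osrelease=2.6.36
```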
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75
Quoting Mark Millard (from Wed, 12 Apr 2023 22:28:13 -0700): A fair number of errors are of the form: the build installing a previously built package for use in the builder but later the builder can not find some file from the package's installation. As a data point, last year I had such issues with one particular package. It was consistent no matter how often I updated the ports tree. Poudriere always failed on port X, which was depending on port Y (I don't remember the names). The problem was that port Y was built successfully, but an extract of it was missing a file it was supposed to have. IIRC I fixed the issue by building port Y manually, as re-building port Y with poudriere didn't change the outcome. So it seems this may not be specific to the most recent ZFS version, but could be an older issue. It may be the case that the more recent ZFS version amplifies the problem. It can also be that it is related to a specific use case in poudriere. I remember a recent mail which talks about poudriere failing to copy files in resource-limited environments, see https://lists.freebsd.org/archives/dev-commits-src-all/2023-April/025153.html While the issue you are trying to pin-point may not be related to this discussion, I mention it because it smells to me like we could be in a situation where a similar combination of FreeBSD features, unrelated to each other, triggers the issue at hand. Bye, Alexander.
Re: /usr/src/sys/netlink/route/iface.c:738:1: warning: unused function
> On 8 Apr 2023, at 20:21, Gary Jennejohn wrote: > > This isn't a fatal error, but it would be easy to fix: > > /usr/src/sys/netlink/route/iface.c:738:1: warning: unused function > 'inet6_get_plen' [-Wunused-function] > inet6_get_plen(const struct in6_addr *addr) > ^ > 1 warning generated. > > This function is called in get_sa_plen(const struct sockaddr *sa) and the > call is done inside #ifdef INET6...#endif, whereas the implementation is > NOT inside #ifdef INET6...#endif, as it should be. Thanks for the report, should be fixed by 39c0036d881b. > > I do not have INET6 in my kernel config file. > > -- > Gary Jennejohn >