Re: git: 4a864f624a70 - main - vm_pageout: Print a more accurate message to the console before an OOM kill [MFC in time for 13.1?]
On 2022-Jan-15, at 07:55, Mark Johnston wrote:

> On Fri, Jan 14, 2022 at 09:38:56PM -0800, Mark Millard wrote:
>> Thanks. This will allow me to remove part of my personal additions
>> in this area --and my having to explain the misnomer when trying
>> to help someone analyze why they end up with OOM activity so they
>> can figure out what to do about it.
>>
>> There seem to be two separate sources of VM_OOM_SWAPZ. Showing
>> my personal additions for them (just making them explicit in the
>> sequence of messages generated):
>>
>> diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c
>> index 01cf9233329f..280621ca51be 100644
>> --- a/sys/vm/swap_pager.c
>> +++ b/sys/vm/swap_pager.c
>> @@ -2091,6 +2091,7 @@ swp_pager_meta_build(vm_object_t object, vm_pindex_t pindex, daddr_t swapblk)
>>  			    0, 1))
>>  				printf("swap blk zone exhausted, "
>>  				    "increase kern.maxswzone\n");
>> +			printf("swp_pager_meta_build: swap blk uma zone exhausted\n");
>>  			vm_pageout_oom(VM_OOM_SWAPZ);
>>  			pause("swzonxb", 10);
>>  		} else
>> @@ -2121,6 +2122,7 @@ swp_pager_meta_build(vm_object_t object, vm_pindex_t pindex, daddr_t swapblk)
>>  			    0, 1))
>>  				printf("swap pctrie zone exhausted, "
>>  				    "increase kern.maxswzone\n");
>> +			printf("swp_pager_meta_build: swap pctrie uma zone exhausted\n");
>>  			vm_pageout_oom(VM_OOM_SWAPZ);
>>  			pause("swzonxp", 10);
>>  		} else
>>
>> Care to comment on the distinctions and why there are two
>> contexts classified as "out of swap space"? Would either
>> one show the swap space as (nearly?) all used in, say, top?
>> Or might one of them still end up looking like a misnomer
>> from just a top (or whatever) display?
>
> Hmm, those cases should likely be changed from "out of swap space" to
> "failed to allocate swap metadata" or something like that.

The above does not seem to have happened yet in main [so: 14].

Will 13.1 get an MFC of 4a864f624a70 in time, possibly with the
above change also in place to fully avoid misnomer reporting that
misleads folks? 4a864f624a70 listed:

MFC after: 2 weeks

but it has been more than a month.

> . . .

===
Mark Millard
marklmi at yahoo.com
Re: ZFS PANIC: HELP.
On 02/26/2022 10:57 am, Larry Rosenman wrote:
> On 02/26/2022 10:37 am, Juraj Lutter wrote:
>> On 26 Feb 2022, at 03:03, Larry Rosenman wrote:
>>> I'm running this script:
>>>
>>> #!/bin/sh
>>> for i in $(zfs list -H | awk '{print $1}')
>>> do
>>>   FS=$i
>>>   FN=$(echo ${FS} | sed -e s@/@_@g)
>>>   sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
>>> done
>>
>> I’d put, like:
>>
>>   echo ${FS}
>>
>> before “sudo zfs send”, to get at least a bit of a clue on where it can get to.
>>
>> otis
>>
>> --
>> Juraj Lutter
>> o...@freebsd.org
>
> I just looked at the destination to see where it died (it did!) and I
> bectl destroy'd the BE that crashed it, and am running a new scrub --
> we'll see whether that was sufficient.
>
> Thanks, all!

Well, it was NOT sufficient. More zfs export fun to come :(

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 214-642-9640                 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
Re: ZFS PANIC: HELP.
On 02/26/2022 10:37 am, Juraj Lutter wrote:
> On 26 Feb 2022, at 03:03, Larry Rosenman wrote:
>> I'm running this script:
>>
>> #!/bin/sh
>> for i in $(zfs list -H | awk '{print $1}')
>> do
>>   FS=$i
>>   FN=$(echo ${FS} | sed -e s@/@_@g)
>>   sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
>> done
>
> I’d put, like:
>
>   echo ${FS}
>
> before “sudo zfs send”, to get at least a bit of a clue on where it can get to.
>
> otis
>
> --
> Juraj Lutter
> o...@freebsd.org

I just looked at the destination to see where it died (it did!) and I
bectl destroy'd the BE that crashed it, and am running a new scrub --
we'll see whether that was sufficient.

Thanks, all!

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 214-642-9640                 E-Mail: l...@lerctr.org
US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
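Juraj's suggestion can be folded into the loop. What follows is a minimal sketch, not Larry's actual script: it assumes every dataset already has a @REPAIR_SNAP snapshot, takes dataset names from `zfs list -H -o name` on stdin, and uses hypothetical names (`dataset_to_file`, `send_datasets`, `DEST`, `DRY_RUN`); `user@backup.example.org` is only a placeholder for the receiving host. Each name is printed to stderr before its send starts, so when the box panics, the last line on a terminal attached over ssh should name the dataset that was being read.

```sh
#!/bin/sh
# Sketch only: send each dataset's @REPAIR_SNAP snapshot to a remote
# host as one file per dataset, printing the dataset name first.
# Assumes every dataset already has an @REPAIR_SNAP snapshot.
# DEST is a hypothetical placeholder for the receiving ssh host.
DEST=${DEST:-user@backup.example.org}

# zroot/usr/home -> zroot_usr_home (same sed as the original script)
dataset_to_file() {
    printf '%s\n' "$1" | sed -e 's@/@_@g'
}

# Reads dataset names, one per line, on stdin.
# With DRY_RUN=1 the commands are printed instead of run.
send_datasets() {
    while IFS= read -r fs; do
        fn=$(dataset_to_file "$fs")
        # Announce the dataset before touching it (Juraj's suggestion).
        printf 'sending %s -> %s\n' "$fs" "$fn" >&2
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s\n' "sudo zfs send -vecLep ${fs}@REPAIR_SNAP | ssh ${DEST} 'cat - > ${fn}'"
        else
            # ssh reads the zfs stream from the pipe, so it does not
            # consume the list of names on the loop's stdin.
            sudo zfs send -vecLep "${fs}@REPAIR_SNAP" |
                ssh "$DEST" "cat - > '${fn}'"
        fi
    done
}
```

Typical use: `. ./send_datasets.sh; zfs list -H -o name | send_datasets`. Prefixing the call with `DRY_RUN=1` only prints the commands, which is a cheap way to check the file names before sending anything.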
Re: Build failure of editors/libreoffice only on src main (stable/13 is OK)
On 26.02.22 at 14:14, Tomoaki AOKI wrote:
> Thanks.
> But unfortunately, as I've described in Comment 21 [2] of Bug 262008,
> setting kern.elf64.aslr.enable=0 didn't help.
> As I'm building on amd64 and not building for compat32, I've not touched
> kern.elf32.aslr.enable.
> And as these are regular writable sysctls (and tunables, too), I have not
> tested setting them in /boot/loader.conf and rebooting before the build.

I just tried building _after a reboot_ with kern.elf64.aslr.enable=0 on
recent CURRENT and it doesn't work for me.

14.0-CURRENT #0 main-n253393-2bfdc1ee9b1 amd64

Best wishes,
Rainer

> Should I set more sysctls? I thought setting the above actually disables
> all ASLR-related features (for 64-bit), regardless of whether the other
> knobs are 1 or 0.
>
> Error messages (with "MAKE_JOBS_UNSAFE=yes") and backtraces are
> described in Comment 20 [3].
>
> [2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262008#c21
> [3] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262008#c20
>
> On Sat, 26 Feb 2022 13:29:26 +0100
> Michael Gmelin wrote:
>
>> Maybe it’s related to ASLR? (or is it also enabled in 13/stable?)
>>
>>> On 26. Feb 2022, at 13:05, Tomoaki AOKI wrote:
>>>
>>> (Re-sent as not yet delivered in more than 5 hours)
>>>
>>> Hi.
>>>
>>> I have a build failure of editors/libreoffice on src main, amd64.
>>> As I've reported in Bug 262008 [1], the problems on stable/13 are
>>> already fixed, but the build still fails on main with a different
>>> failure mode.
>>>
>>> A tool, gengal.bin, built as part of the whole libreoffice build,
>>> dumps core on main, while it runs fine on stable/13.
>>>
>>> Port options are now the defaults on both main and stable/13.
>>>
>>> I now suspect differences between the toolchains in main and
>>> stable/13, but as editors/libreoffice is giant and this failure
>>> happens almost at the end of the build, the usual bisecting is not
>>> realistic. (It would require tens of weekends, maybe.)
>>>
>>> Any thoughts? Or am I missing something to check for?
>>>
>>> Regards.
>>>
>>> [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262008
>>>
>>> -- 
>>> Tomoaki AOKI
Re: ZFS PANIC: HELP.
Quoting Larry Rosenman (from Fri, 25 Feb 2022 20:03:51 -0600):

> On 02/25/2022 2:11 am, Alexander Leidinger wrote:
>> Quoting Larry Rosenman (from Thu, 24 Feb 2022 20:19:45 -0600):
>>
>>> I tried a scrub -- it panic'd on a fatal double fault.
>>>
>>> Suggestions?
>>
>> The safest / cleanest (but not fastest) is data export and pool
>> re-creation. If you export dataset by dataset (instead of recursively
>> all), you can even see which dataset is causing the issue. In case
>> this per-dataset export narrows down the issue and it is a dataset you
>> don't care about (as in: 1) no issue to recreate from scratch or 2)
>> there is a backup available), you could delete this (or each such)
>> dataset and re-create it in place (= not re-creating the entire pool).
>>
>> Bye,
>> Alexander.
>>
>> http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
>> http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF
>
> I'm running this script:
>
> #!/bin/sh
> for i in $(zfs list -H | awk '{print $1}')
> do
>   FS=$i
>   FN=$(echo ${FS} | sed -e s@/@_@g)
>   sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
> done
>
> How will I know a "Problem" dataset?

You said a scrub is panicking the system. A scrub only touches occupied
blocks. As such, sending a problem dataset should panic your system as
well. If the export doesn't panic at all, the problem may be within a
snapshot which contains data that is deleted in later versions of the
dataset.

Bye,
Alexander.

-- 
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF
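To follow up on the snapshot case, each snapshot can be read on its own. A minimal sketch, assuming it runs as root; `check_snapshots` and `DRY_RUN` are hypothetical names. A full (non-incremental) `zfs send` of a snapshot has to read the data that snapshot references, so writing the stream to /dev/null exercises those blocks without storing anything, and each name is printed first so the last one shown before a panic points at the suspect.

```sh
#!/bin/sh
# Sketch only: read each snapshot in full by sending it to /dev/null,
# printing its name first so the last name shown before a panic
# points at the suspect snapshot. Run as root.
# Reads snapshot names, one per line, on stdin.
# With DRY_RUN=1 the commands are printed instead of run.
check_snapshots() {
    while IFS= read -r snap; do
        printf 'reading %s\n' "$snap" >&2
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s\n' "zfs send ${snap} > /dev/null"
        elif ! zfs send "$snap" > /dev/null; then
            printf 'zfs send failed for %s\n' "$snap" >&2
        fi
    done
}
```

Typical use: `zfs list -H -o name -t snapshot -r zroot/some/dataset | check_snapshots`, where `zroot/some/dataset` is a placeholder for the dataset under suspicion.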
Re: Build failure of editors/libreoffice only on src main (stable/13 is OK)
Thanks.
But unfortunately, as I've described in Comment 21 [2] of Bug 262008,
setting kern.elf64.aslr.enable=0 didn't help.
As I'm building on amd64 and not building for compat32, I've not touched
kern.elf32.aslr.enable.
And as these are regular writable sysctls (and tunables, too), I have not
tested setting them in /boot/loader.conf and rebooting before the build.

Should I set more sysctls? I thought setting the above actually disables
all ASLR-related features (for 64-bit), regardless of whether the other
knobs are 1 or 0.

Error messages (with "MAKE_JOBS_UNSAFE=yes") and backtraces are
described in Comment 20 [3].

[2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262008#c21
[3] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262008#c20

On Sat, 26 Feb 2022 13:29:26 +0100
Michael Gmelin wrote:

> Maybe it’s related to ASLR? (or is it also enabled in 13/stable?)
>
>
> > On 26. Feb 2022, at 13:05, Tomoaki AOKI wrote:
> >
> > (Re-sent as not yet delivered in more than 5 hours)
> >
> > Hi.
> >
> > I have a build failure of editors/libreoffice on src main, amd64.
> > As I've reported in Bug 262008 [1], the problems on stable/13 are
> > already fixed, but the build still fails on main with a different
> > failure mode.
> >
> > A tool, gengal.bin, built as part of the whole libreoffice build,
> > dumps core on main, while it runs fine on stable/13.
> >
> > Port options are now the defaults on both main and stable/13.
> >
> > I now suspect differences between the toolchains in main and
> > stable/13, but as editors/libreoffice is giant and this failure
> > happens almost at the end of the build, the usual bisecting is not
> > realistic. (It would require tens of weekends, maybe.)
> >
> > Any thoughts? Or am I missing something to check for?
> >
> > Regards.
> >
> > [1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262008
> >
> > -- 
> > Tomoaki AOKI
>
-- 
青木 知明 [Tomoaki AOKI]
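The untested loader.conf route could look like the sketch below. It relies only on what the message says, that kern.elf64.aslr.enable is both a writable sysctl and a loader tunable; `set_loader_tunable`, `disable_aslr64` and the `LOADER_CONF` override are hypothetical helpers for illustration. Running `sysctl kern.elf64.aslr` on its own lists every knob under that node, which is a quick way to see which other ASLR settings a given build has.

```sh
#!/bin/sh
# Sketch only: turn off ELF64 ASLR now and persist it as a loader
# tunable. Run as root and reboot before the build. LOADER_CONF can be
# pointed at a copy for a dry run.
LOADER_CONF=${LOADER_CONF:-/boot/loader.conf}

# Replace any existing name=... line with name="value".
# Only lines starting exactly with "name=" are replaced; a spelling such
# as 'name = "1"' (with spaces) would be left in place.
set_loader_tunable() {
    name=$1
    value=$2
    tmp=$(mktemp) || return 1
    if [ -f "$LOADER_CONF" ]; then
        awk -v prefix="${name}=" 'index($0, prefix) != 1' "$LOADER_CONF" > "$tmp"
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    # Copy back in place so the file keeps its owner and mode.
    cat "$tmp" > "$LOADER_CONF"
    rm -f "$tmp"
}

disable_aslr64() {
    set_loader_tunable kern.elf64.aslr.enable 0
    # Also apply it to the running system; only programs exec'd
    # after this point get the new setting.
    sysctl kern.elf64.aslr.enable=0
    # List every knob under the same node for inspection.
    sysctl kern.elf64.aslr
}
```

Run `. ./aslr64.sh; disable_aslr64` as root, then reboot before starting the port build so that the build begins on a system booted with the tunable in place.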
Build failure of editors/libreoffice only on src main (stable/13 is OK)
(Re-sent as not yet delivered in more than 5 hours)

Hi.

I have a build failure of editors/libreoffice on src main, amd64.
As I've reported in Bug 262008 [1], the problems on stable/13 are
already fixed, but the build still fails on main with a different
failure mode.

A tool, gengal.bin, built as part of the whole libreoffice build, dumps
core on main, while it runs fine on stable/13.

Port options are now the defaults on both main and stable/13.

I now suspect differences between the toolchains in main and stable/13,
but as editors/libreoffice is giant and this failure happens almost at
the end of the build, the usual bisecting is not realistic. (It would
require tens of weekends, maybe.)

Any thoughts? Or am I missing something to check for?

Regards.

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262008

-- 
Tomoaki AOKI