Re: poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder
On Aug 24, 2023, at 00:22, Mark Millard wrote:

> On Aug 23, 2023, at 22:54, Mateusz Guzik wrote:
>
>> On 8/24/23, Mark Millard wrote:
>>
>>> On Aug 23, 2023, at 15:10, Mateusz Guzik wrote:
>>>
>>>> On 8/23/23, Mark Millard wrote:
>>>>> [Forked off the ZFS deadlock 14 discussion, per feedback.]
>>>>> . . .
>>>>
>>>> This is a known problem, but it is unclear if you should be running
>>>> into it in this setup.
>>>
>>> The change fixed the issue: so I do run into the issue
>>> for this setup. See below.
>>>
>>>> Can you try again but this time *revert*
>>>> 138a5dafba312ff39ce0eefdbe34de95519e600d, like so:
>>>>
>>>> git revert 138a5dafba312ff39ce0eefdbe34de95519e600d
>>>>
>>>> may want to switch to a different branch first, for example:
>>>> git checkout -b vfstesting
>>>
>>> # git -C /usr/main-src/ diff sys/kern/vfs_subr.c
>>> diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
>>> index 0f3f00abfd4a..5dff556ac258 100644
>>> --- a/sys/kern/vfs_subr.c
>>> +++ b/sys/kern/vfs_subr.c
>>> @@ -3528,25 +3528,17 @@ vdbatch_process(struct vdbatch *vd)
>>>  	MPASS(curthread->td_pinned > 0);
>>>  	MPASS(vd->index == VDBATCH_SIZE);
>>>
>>> +	mtx_lock(&vnode_list_mtx);
>>>  	critical_enter();
>>> -	if (mtx_trylock(&vnode_list_mtx)) {
>>> -		for (i = 0; i < VDBATCH_SIZE; i++) {
>>> -			vp = vd->tab[i];
>>> -			vd->tab[i] = NULL;
>>> -			TAILQ_REMOVE(&vnode_list, vp, v_vnodelist);
>>> -			TAILQ_INSERT_TAIL(&vnode_list, vp, v_vnodelist);
>>> -			MPASS(vp->v_dbatchcpu != NOCPU);
>>> -			vp->v_dbatchcpu = NOCPU;
>>> -		}
>>> -		mtx_unlock(&vnode_list_mtx);
>>> -	} else {
>>> -		for (i = 0; i < VDBATCH_SIZE; i++) {
>>> -			vp = vd->tab[i];
>>> -			vd->tab[i] = NULL;
>>> -			MPASS(vp->v_dbatchcpu != NOCPU);
>>> -			vp->v_dbatchcpu = NOCPU;
>>> -		}
>>> +	for (i = 0; i < VDBATCH_SIZE; i++) {
>>> +		vp = vd->tab[i];
>>> +		TAILQ_REMOVE(&vnode_list, vp, v_vnodelist);
>>> +		TAILQ_INSERT_TAIL(&vnode_list, vp, v_vnodelist);
>>> +		MPASS(vp->v_dbatchcpu != NOCPU);
>>> +		vp->v_dbatchcpu = NOCPU;
>>>  	}
>>> +	mtx_unlock(&vnode_list_mtx);
>>> +	bzero(vd->tab, sizeof(vd->tab));
>>>  	vd->index = 0;
>>>  	critical_exit();
>>> }
>>>
>>> Still with:
>>>
>>> # grep USE_TMPFS= /usr/local/etc/poudriere.conf
>>> # EXAMPLE: USE_TMPFS="wrkdir data"
>>> #USE_TMPFS=all
>>> #USE_TMPFS="data"
>>> USE_TMPFS=no
>>>
>>> That allowed the other builders to eventually reach "Builder started"
>>> and later activity, [00:05:50] [27] [00:02:29] Builder started
>>> being the first non-[01] to do so, with no vlruwk's observed in what
>>> I saw in top:
>>>
>>> . . .
>>>
>>> Now testing for the zfs deadlock issue should be possible for
>>> this setup.
>>
>> Thanks for testing, I wrote a fix:
>>
>> https://people.freebsd.org/~mjg/vfs-recycle-fix.diff
>>
>> Applies to the *stock* kernel (as in without the revert).
>
> I'm going to leave the deadlock test running while I sleep
> tonight. So it is going to be a while before I get to testing
> this. $work will likely happen first as well. (No deadlock
> observed yet, by the way: 6+ hrs and 3000+ ports built so far.)
>
> I can easily restore sys/kern/vfs_subr.c and then do normal
> 14.0-ALPHA2-ish based patching: so not a problem. Thanks.

I stopped the deadlock experiment, cleaned out the partial bulk -a,
put back the modern sys/kern/vfs_subr.c, applied your patch, built,
installed, rebooted, and started another bulk -a run. It made
progress on all the builders to and past "Builder started":

. . .
[00:01:34] Building 34042 packages using up to 32 builders
[00:01:34] Hit CTRL+t at any time to see build progress and stats
[00:01:34] [01] [00:00:00] Builder starting
[00:01:57] [01] [00:00:23] Builder started
[00:01:57] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.20.4
[00:03:09] [01] [00:01:12] Finished ports-mgmt/pkg | pkg-1.20.4: Success
[00:03:22] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1
[00:03:22] [02] [00:00:00] Builder starting
[00:03:22] [03] [00:00:00] Builder starting
[00:03:22] [04] [00:00:00] Builder starting
[00:03:22] [05] [00:00:00] Builder starting
[00:03:22] [06] [00:00:00] Builder starting
[00:03:22] [07] [00:00:00] Builder starting
[00:03:22] [08] [00:00:00] Builder starting
[00:03:22] [09] [00:00:00] Builder starting
[00:03:22] [10] [00:00:00] Builder starting
[00:03:22] [11] [00:00:00] Builder starting
[00:03:22] [12] [00:00:00] Builder starting
[00:03:22] [13] [00:00:00] Builder starting
[00:03:22] [14] [00:00:00] Builder starting
[00:03:22] [15] [00:00:00] Builder starting
[00:03:22] [16] [00:00:00] Builder starting
[00:03:22] [17] [00:00:00] Builder s
Re: poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder
On Aug 23, 2023, at 22:54, Mateusz Guzik wrote:

> On 8/24/23, Mark Millard wrote:
>
>> On Aug 23, 2023, at 15:10, Mateusz Guzik wrote:
>>
>>> On 8/23/23, Mark Millard wrote:
>>>> [Forked off the ZFS deadlock 14 discussion, per feedback.]
>>>> . . .
>>>
>>> This is a known problem, but it is unclear if you should be running
>>> into it in this setup.
>>
>> The change fixed the issue: so I do run into the issue
>> for this setup. See below.
>>
>>> Can you try again but this time *revert*
>>> 138a5dafba312ff39ce0eefdbe34de95519e600d, like so:
>>>
>>> git revert 138a5dafba312ff39ce0eefdbe34de95519e600d
>>>
>>> may want to switch to a different branch first, for example:
>>> git checkout -b vfstesting
>>
>> # git -C /usr/main-src/ diff sys/kern/vfs_subr.c
>> diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
>> index 0f3f00abfd4a..5dff556ac258 100644
>> --- a/sys/kern/vfs_subr.c
>> +++ b/sys/kern/vfs_subr.c
>> @@ -3528,25 +3528,17 @@ vdbatch_process(struct vdbatch *vd)
>>  	MPASS(curthread->td_pinned > 0);
>>  	MPASS(vd->index == VDBATCH_SIZE);
>>
>> +	mtx_lock(&vnode_list_mtx);
>>  	critical_enter();
>> -	if (mtx_trylock(&vnode_list_mtx)) {
>> -		for (i = 0; i < VDBATCH_SIZE; i++) {
>> -			vp = vd->tab[i];
>> -			vd->tab[i] = NULL;
>> -			TAILQ_REMOVE(&vnode_list, vp, v_vnodelist);
>> -			TAILQ_INSERT_TAIL(&vnode_list, vp, v_vnodelist);
>> -			MPASS(vp->v_dbatchcpu != NOCPU);
>> -			vp->v_dbatchcpu = NOCPU;
>> -		}
>> -		mtx_unlock(&vnode_list_mtx);
>> -	} else {
>> -		for (i = 0; i < VDBATCH_SIZE; i++) {
>> -			vp = vd->tab[i];
>> -			vd->tab[i] = NULL;
>> -			MPASS(vp->v_dbatchcpu != NOCPU);
>> -			vp->v_dbatchcpu = NOCPU;
>> -		}
>> +	for (i = 0; i < VDBATCH_SIZE; i++) {
>> +		vp = vd->tab[i];
>> +		TAILQ_REMOVE(&vnode_list, vp, v_vnodelist);
>> +		TAILQ_INSERT_TAIL(&vnode_list, vp, v_vnodelist);
>> +		MPASS(vp->v_dbatchcpu != NOCPU);
>> +		vp->v_dbatchcpu = NOCPU;
>>  	}
>> +	mtx_unlock(&vnode_list_mtx);
>> +	bzero(vd->tab, sizeof(vd->tab));
>>  	vd->index = 0;
>>  	critical_exit();
>> }
>>
>> Still with:
>>
>> # grep USE_TMPFS= /usr/local/etc/poudriere.conf
>> # EXAMPLE: USE_TMPFS="wrkdir data"
>> #USE_TMPFS=all
>> #USE_TMPFS="data"
>> USE_TMPFS=no
>>
>> That allowed the other builders to eventually reach "Builder started"
>> and later activity, [00:05:50] [27] [00:02:29] Builder started
>> being the first non-[01] to do so, with no vlruwk's observed in what
>> I saw in top:
>>
>> . . .
>>
>> Now testing for the zfs deadlock issue should be possible for
>> this setup.
>
> Thanks for testing, I wrote a fix:
>
> https://people.freebsd.org/~mjg/vfs-recycle-fix.diff
>
> Applies to the *stock* kernel (as in without the revert).

I'm going to leave the deadlock test running while I sleep tonight.
So it is going to be a while before I get to testing this. $work
will likely happen first as well. (No deadlock observed yet, by the
way: 6+ hrs and 3000+ ports built so far.)

I can easily restore sys/kern/vfs_subr.c and then do normal
14.0-ALPHA2-ish based patching: so not a problem. Thanks.

===
Mark Millard
marklmi at yahoo.com
Re: poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder
On 8/24/23, Mark Millard wrote:

> On Aug 23, 2023, at 15:10, Mateusz Guzik wrote:
>
>> On 8/23/23, Mark Millard wrote:
>>> [Forked off the ZFS deadlock 14 discussion, per feedback.]
>>>
>>> On Aug 23, 2023, at 11:40, Alexander Motin wrote:
>>>> On 22.08.2023 14:24, Mark Millard wrote:
>>>>> Alexander Motin wrote on
>>>>> Date: Tue, 22 Aug 2023 16:18:12 UTC :
>>>>>> I am waiting for final test results from George Wilson and then will
>>>>>> request a quick merge of both to the zfs-2.2-release branch. Unfortunately
>>>>>> there are still not many reviewers for the PR, since the code is not
>>>>>> trivial, but at least with the test reports Brian Behlendorf and Mark
>>>>>> Maybee seem to be OK to merge the two PRs into 2.2. If somebody else
>>>>>> has tested and/or reviewed the PR, you may comment on it.
>>>>> I had written to the list that when I tried to test the system
>>>>> doing poudriere builds (initially with your patches) using
>>>>> USE_TMPFS=no so that zfs had to deal with all the file I/O, I
>>>>> instead got only one builder that ended up active, the others
>>>>> never reaching "Builder started":
>>>>> Top was showing lots of "vlruwk" for the cpdup's. For example:
>>>>> . . .
>>>>>  362  0 root  40  0  27076Ki 13776Ki CPU19   19  4:23  0.00% cpdup -i0 -o ref 32
>>>>>  349  0 root  53  0  27076Ki 13776Ki vlruwk  22  4:20  0.01% cpdup -i0 -o ref 31
>>>>>  328  0 root  68  0  27076Ki 13804Ki vlruwk   8  4:30  0.01% cpdup -i0 -o ref 30
>>>>>  304  0 root  37  0  27076Ki 13792Ki vlruwk   6  4:18  0.01% cpdup -i0 -o ref 29
>>>>>  282  0 root  42  0  33220Ki 13956Ki vlruwk   8  4:33  0.01% cpdup -i0 -o ref 28
>>>>>  242  0 root  56  0  27076Ki 13796Ki vlruwk   4  4:28  0.00% cpdup -i0 -o ref 27
>>>>> . . .
>>>>> But those processes did show CPU?? on occasion, as well as
>>>>> *vnode less often. None of the cpdup's was stuck in
>>>>> Removing your patches did not change the behavior.
>>>> Mark, to me "vlruwk" looks like a limit on number of vnodes. I was not
>>>> deep in that area at least recently, so somebody with more experience
>>>> there could try to diagnose it.
>>>> At very least it does not look related to the ZIL issue discussed in
>>>> this thread, at least with the information provided, so I am not
>>>> surprised that the mentioned patches do not affect it.
>>>
>>> I did the above intending to test the deadlock in my context but
>>> ended up not getting that far when I tried to make zfs handle all
>>> the file I/O (USE_TMPFS=no and no other use of tmpfs or the like).
>>>
>>> The zfs context is a simple single partition on the boot media. I
>>> use ZFS for bectl BE use, not for other typical reasons. The media
>>> here is PCIe Optane 1.4T media. The machine is a ThreadRipper
>>> 1950X, so first generation. 128 GiBytes of RAM. 491520 MiBytes of
>>> swap, also on that Optane.
>>>
>>> # uname -apKU
>>> FreeBSD amd64-ZFS 14.0-ALPHA2 FreeBSD 14.0-ALPHA2 amd64 1400096 #112
>>> main-n264912-b1d3e2b77155-dirty: Sun Aug 20 10:01:48 PDT 2023
>>> root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG
>>> amd64 amd64 1400096 1400096
>>>
>>> The GENERIC-DBG variant of the kernel did not report any issues in
>>> earlier testing.
>>>
>>> The later referenced /usr/obj/DESTDIRs/main-amd64-poud-bulk_a was
>>> installed from the same build.
>>>
>>> # zfs list
>>> NAME                                          USED  AVAIL  REFER  MOUNTPOINT
>>> zoptb                                        79.9G   765G    96K  /zoptb
>>> zoptb/BUILDs                                 20.5G   765G  8.29M  /usr/obj/BUILDs
>>> zoptb/BUILDs/alt-main-amd64-dbg-clang-alt    1.86M   765G  1.86M  /usr/obj/BUILDs/alt-main-amd64-dbg-clang-alt
>>> zoptb/BUILDs/alt-main-amd64-nodbg-clang-alt  30.2M   765G  30.2M  /usr/obj/BUILDs/alt-main-amd64-nodbg-clang-alt
>>> zoptb/BUILDs/main-amd64-dbg-clang            9.96G   765G  9.96G  /usr/obj/BUILDs/main-amd64-dbg-clang
>>> zoptb/BUILDs/main-amd64-dbg-gccxtc           38.5M   765G  38.5M  /usr/obj/BUILDs/main-amd64-dbg-gccxtc
>>> zoptb/BUILDs/main-amd64-nodbg-clang          10.3G   765G  10.3G  /usr/obj/BUILDs/main-amd64-nodbg-clang
>>> zoptb/BUILDs/main-amd64-nodbg-clang-alt      37.2M   765G  37.2M  /usr/obj/BUILDs/main-amd64-nodbg-clang-alt
>>> zoptb/BUILDs/main-amd64-nodbg-gccxtc         94.6M   765G  94.6M  /usr/obj/BUILDs/main-amd64-nodbg-gccxtc
>>> zoptb/DESTDIRs                               4.33G   765G   104K  /usr/obj/DESTDIRs
>>> zoptb/DESTDIRs/main-amd64-poud               2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud
>>> zoptb/DESTDIRs/main-amd64-poud-bulk_a        2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud-bulk_a
>>> zoptb/ROOT                                   13.1G   765G    96K  none
>>> zoptb/ROOT/build_area_for-main-amd64         5.03G   765G  3.24G
Re: poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder
On Aug 23, 2023, at 18:14, Mark Millard wrote:

> On Aug 23, 2023, at 15:10, Mateusz Guzik wrote:
>
>> On 8/23/23, Mark Millard wrote:
>>> [Forked off the ZFS deadlock 14 discussion, per feedback.]
>>>
>>> On Aug 23, 2023, at 11:40, Alexander Motin wrote:
>>>> On 22.08.2023 14:24, Mark Millard wrote:
>>>>> Alexander Motin wrote on
>>>>> Date: Tue, 22 Aug 2023 16:18:12 UTC :
>>>>>> I am waiting for final test results from George Wilson and then will
>>>>>> request a quick merge of both to the zfs-2.2-release branch. Unfortunately
>>>>>> there are still not many reviewers for the PR, since the code is not
>>>>>> trivial, but at least with the test reports Brian Behlendorf and Mark
>>>>>> Maybee seem to be OK to merge the two PRs into 2.2. If somebody else
>>>>>> has tested and/or reviewed the PR, you may comment on it.
>>>>> I had written to the list that when I tried to test the system
>>>>> doing poudriere builds (initially with your patches) using
>>>>> USE_TMPFS=no so that zfs had to deal with all the file I/O, I
>>>>> instead got only one builder that ended up active, the others
>>>>> never reaching "Builder started":
>>>>> Top was showing lots of "vlruwk" for the cpdup's. For example:
>>>>> . . .
>>>>>  362  0 root  40  0  27076Ki 13776Ki CPU19   19  4:23  0.00% cpdup -i0 -o ref 32
>>>>>  349  0 root  53  0  27076Ki 13776Ki vlruwk  22  4:20  0.01% cpdup -i0 -o ref 31
>>>>>  328  0 root  68  0  27076Ki 13804Ki vlruwk   8  4:30  0.01% cpdup -i0 -o ref 30
>>>>>  304  0 root  37  0  27076Ki 13792Ki vlruwk   6  4:18  0.01% cpdup -i0 -o ref 29
>>>>>  282  0 root  42  0  33220Ki 13956Ki vlruwk   8  4:33  0.01% cpdup -i0 -o ref 28
>>>>>  242  0 root  56  0  27076Ki 13796Ki vlruwk   4  4:28  0.00% cpdup -i0 -o ref 27
>>>>> . . .
>>>>> But those processes did show CPU?? on occasion, as well as
>>>>> *vnode less often. None of the cpdup's was stuck in
>>>>> Removing your patches did not change the behavior.
>>>> Mark, to me "vlruwk" looks like a limit on number of vnodes. I was not
>>>> deep in that area at least recently, so somebody with more experience
>>>> there could try to diagnose it.
>>>> At very least it does not look related to the ZIL issue discussed in
>>>> this thread, at least with the information provided, so I am not
>>>> surprised that the mentioned patches do not affect it.
>>>
>>> I did the above intending to test the deadlock in my context but
>>> ended up not getting that far when I tried to make zfs handle all
>>> the file I/O (USE_TMPFS=no and no other use of tmpfs or the like).
>>>
>>> The zfs context is a simple single partition on the boot media. I
>>> use ZFS for bectl BE use, not for other typical reasons. The media
>>> here is PCIe Optane 1.4T media. The machine is a ThreadRipper
>>> 1950X, so first generation. 128 GiBytes of RAM. 491520 MiBytes of
>>> swap, also on that Optane.
>>>
>>> # uname -apKU
>>> FreeBSD amd64-ZFS 14.0-ALPHA2 FreeBSD 14.0-ALPHA2 amd64 1400096 #112
>>> main-n264912-b1d3e2b77155-dirty: Sun Aug 20 10:01:48 PDT 2023
>>> root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG
>>> amd64 amd64 1400096 1400096
>>>
>>> The GENERIC-DBG variant of the kernel did not report any issues in
>>> earlier testing.
>>>
>>> The later referenced /usr/obj/DESTDIRs/main-amd64-poud-bulk_a was
>>> installed from the same build.
>>>
>>> # zfs list
>>> NAME                                          USED  AVAIL  REFER  MOUNTPOINT
>>> zoptb                                        79.9G   765G    96K  /zoptb
>>> zoptb/BUILDs                                 20.5G   765G  8.29M  /usr/obj/BUILDs
>>> zoptb/BUILDs/alt-main-amd64-dbg-clang-alt    1.86M   765G  1.86M  /usr/obj/BUILDs/alt-main-amd64-dbg-clang-alt
>>> zoptb/BUILDs/alt-main-amd64-nodbg-clang-alt  30.2M   765G  30.2M  /usr/obj/BUILDs/alt-main-amd64-nodbg-clang-alt
>>> zoptb/BUILDs/main-amd64-dbg-clang            9.96G   765G  9.96G  /usr/obj/BUILDs/main-amd64-dbg-clang
>>> zoptb/BUILDs/main-amd64-dbg-gccxtc           38.5M   765G  38.5M  /usr/obj/BUILDs/main-amd64-dbg-gccxtc
>>> zoptb/BUILDs/main-amd64-nodbg-clang          10.3G   765G  10.3G  /usr/obj/BUILDs/main-amd64-nodbg-clang
>>> zoptb/BUILDs/main-amd64-nodbg-clang-alt      37.2M   765G  37.2M  /usr/obj/BUILDs/main-amd64-nodbg-clang-alt
>>> zoptb/BUILDs/main-amd64-nodbg-gccxtc         94.6M   765G  94.6M  /usr/obj/BUILDs/main-amd64-nodbg-gccxtc
>>> zoptb/DESTDIRs                               4.33G   765G   104K  /usr/obj/DESTDIRs
>>> zoptb/DESTDIRs/main-amd64-poud               2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud
>>> zoptb/DESTDIRs/main-amd64-poud-bulk_a        2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud-bulk_a
>>> zoptb/ROOT                                   13.1G   765G    96K  none
>>> zoptb/ROOT/build_area_for-main-amd64         5.
Re: poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder
On Aug 23, 2023, at 15:10, Mateusz Guzik wrote:

> On 8/23/23, Mark Millard wrote:
>> [Forked off the ZFS deadlock 14 discussion, per feedback.]
>>
>> On Aug 23, 2023, at 11:40, Alexander Motin wrote:
>>
>>> On 22.08.2023 14:24, Mark Millard wrote:
>>>> Alexander Motin wrote on
>>>> Date: Tue, 22 Aug 2023 16:18:12 UTC :
>>>>> I am waiting for final test results from George Wilson and then will
>>>>> request a quick merge of both to the zfs-2.2-release branch. Unfortunately
>>>>> there are still not many reviewers for the PR, since the code is not
>>>>> trivial, but at least with the test reports Brian Behlendorf and Mark
>>>>> Maybee seem to be OK to merge the two PRs into 2.2. If somebody else
>>>>> has tested and/or reviewed the PR, you may comment on it.
>>>> I had written to the list that when I tried to test the system
>>>> doing poudriere builds (initially with your patches) using
>>>> USE_TMPFS=no so that zfs had to deal with all the file I/O, I
>>>> instead got only one builder that ended up active, the others
>>>> never reaching "Builder started":
>>>> Top was showing lots of "vlruwk" for the cpdup's. For example:
>>>> . . .
>>>>  362  0 root  40  0  27076Ki 13776Ki CPU19   19  4:23  0.00% cpdup -i0 -o ref 32
>>>>  349  0 root  53  0  27076Ki 13776Ki vlruwk  22  4:20  0.01% cpdup -i0 -o ref 31
>>>>  328  0 root  68  0  27076Ki 13804Ki vlruwk   8  4:30  0.01% cpdup -i0 -o ref 30
>>>>  304  0 root  37  0  27076Ki 13792Ki vlruwk   6  4:18  0.01% cpdup -i0 -o ref 29
>>>>  282  0 root  42  0  33220Ki 13956Ki vlruwk   8  4:33  0.01% cpdup -i0 -o ref 28
>>>>  242  0 root  56  0  27076Ki 13796Ki vlruwk   4  4:28  0.00% cpdup -i0 -o ref 27
>>>> . . .
>>>> But those processes did show CPU?? on occasion, as well as
>>>> *vnode less often. None of the cpdup's was stuck in
>>>> Removing your patches did not change the behavior.
>>>
>>> Mark, to me "vlruwk" looks like a limit on number of vnodes. I was not
>>> deep in that area at least recently, so somebody with more experience
>>> there could try to diagnose it.
>>> At very least it does not look related to the ZIL issue discussed in
>>> this thread, at least with the information provided, so I am not
>>> surprised that the mentioned patches do not affect it.
>>
>> I did the above intending to test the deadlock in my context but
>> ended up not getting that far when I tried to make zfs handle all
>> the file I/O (USE_TMPFS=no and no other use of tmpfs or the like).
>>
>> The zfs context is a simple single partition on the boot media. I
>> use ZFS for bectl BE use, not for other typical reasons. The media
>> here is PCIe Optane 1.4T media. The machine is a ThreadRipper
>> 1950X, so first generation. 128 GiBytes of RAM. 491520 MiBytes of
>> swap, also on that Optane.
>>
>> # uname -apKU
>> FreeBSD amd64-ZFS 14.0-ALPHA2 FreeBSD 14.0-ALPHA2 amd64 1400096 #112
>> main-n264912-b1d3e2b77155-dirty: Sun Aug 20 10:01:48 PDT 2023
>> root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG
>> amd64 amd64 1400096 1400096
>>
>> The GENERIC-DBG variant of the kernel did not report any issues in
>> earlier testing.
>>
>> The later referenced /usr/obj/DESTDIRs/main-amd64-poud-bulk_a was
>> installed from the same build.
>>
>> # zfs list
>> NAME                                          USED  AVAIL  REFER  MOUNTPOINT
>> zoptb                                        79.9G   765G    96K  /zoptb
>> zoptb/BUILDs                                 20.5G   765G  8.29M  /usr/obj/BUILDs
>> zoptb/BUILDs/alt-main-amd64-dbg-clang-alt    1.86M   765G  1.86M  /usr/obj/BUILDs/alt-main-amd64-dbg-clang-alt
>> zoptb/BUILDs/alt-main-amd64-nodbg-clang-alt  30.2M   765G  30.2M  /usr/obj/BUILDs/alt-main-amd64-nodbg-clang-alt
>> zoptb/BUILDs/main-amd64-dbg-clang            9.96G   765G  9.96G  /usr/obj/BUILDs/main-amd64-dbg-clang
>> zoptb/BUILDs/main-amd64-dbg-gccxtc           38.5M   765G  38.5M  /usr/obj/BUILDs/main-amd64-dbg-gccxtc
>> zoptb/BUILDs/main-amd64-nodbg-clang          10.3G   765G  10.3G  /usr/obj/BUILDs/main-amd64-nodbg-clang
>> zoptb/BUILDs/main-amd64-nodbg-clang-alt      37.2M   765G  37.2M  /usr/obj/BUILDs/main-amd64-nodbg-clang-alt
>> zoptb/BUILDs/main-amd64-nodbg-gccxtc         94.6M   765G  94.6M  /usr/obj/BUILDs/main-amd64-nodbg-gccxtc
>> zoptb/DESTDIRs                               4.33G   765G   104K  /usr/obj/DESTDIRs
>> zoptb/DESTDIRs/main-amd64-poud               2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud
>> zoptb/DESTDIRs/main-amd64-poud-bulk_a        2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud-bulk_a
>> zoptb/ROOT                                   13.1G   765G    96K  none
>> zoptb/ROOT/build_area_for-main-amd64         5.03G   765G  3.24G  none
>> zoptb/ROOT/main-amd64                        8.04G   765G  3.23G  none
>> zoptb/poudriere                              6.5
Re: poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder
On 8/23/23, Mark Millard wrote:

> [Forked off the ZFS deadlock 14 discussion, per feedback.]
>
> On Aug 23, 2023, at 11:40, Alexander Motin wrote:
>
>> On 22.08.2023 14:24, Mark Millard wrote:
>>> Alexander Motin wrote on
>>> Date: Tue, 22 Aug 2023 16:18:12 UTC :
>>>> I am waiting for final test results from George Wilson and then will
>>>> request a quick merge of both to the zfs-2.2-release branch. Unfortunately
>>>> there are still not many reviewers for the PR, since the code is not
>>>> trivial, but at least with the test reports Brian Behlendorf and Mark
>>>> Maybee seem to be OK to merge the two PRs into 2.2. If somebody else
>>>> has tested and/or reviewed the PR, you may comment on it.
>>> I had written to the list that when I tried to test the system
>>> doing poudriere builds (initially with your patches) using
>>> USE_TMPFS=no so that zfs had to deal with all the file I/O, I
>>> instead got only one builder that ended up active, the others
>>> never reaching "Builder started":
>>> Top was showing lots of "vlruwk" for the cpdup's. For example:
>>> . . .
>>>  362  0 root  40  0  27076Ki 13776Ki CPU19   19  4:23  0.00% cpdup -i0 -o ref 32
>>>  349  0 root  53  0  27076Ki 13776Ki vlruwk  22  4:20  0.01% cpdup -i0 -o ref 31
>>>  328  0 root  68  0  27076Ki 13804Ki vlruwk   8  4:30  0.01% cpdup -i0 -o ref 30
>>>  304  0 root  37  0  27076Ki 13792Ki vlruwk   6  4:18  0.01% cpdup -i0 -o ref 29
>>>  282  0 root  42  0  33220Ki 13956Ki vlruwk   8  4:33  0.01% cpdup -i0 -o ref 28
>>>  242  0 root  56  0  27076Ki 13796Ki vlruwk   4  4:28  0.00% cpdup -i0 -o ref 27
>>> . . .
>>> But those processes did show CPU?? on occasion, as well as
>>> *vnode less often. None of the cpdup's was stuck in
>>> Removing your patches did not change the behavior.
>>
>> Mark, to me "vlruwk" looks like a limit on number of vnodes. I was not
>> deep in that area at least recently, so somebody with more experience
>> there could try to diagnose it.
>> At very least it does not look related to the ZIL issue discussed in
>> this thread, at least with the information provided, so I am not
>> surprised that the mentioned patches do not affect it.
>
> I did the above intending to test the deadlock in my context but
> ended up not getting that far when I tried to make zfs handle all
> the file I/O (USE_TMPFS=no and no other use of tmpfs or the like).
>
> The zfs context is a simple single partition on the boot media. I
> use ZFS for bectl BE use, not for other typical reasons. The media
> here is PCIe Optane 1.4T media. The machine is a ThreadRipper
> 1950X, so first generation. 128 GiBytes of RAM. 491520 MiBytes of
> swap, also on that Optane.
>
> # uname -apKU
> FreeBSD amd64-ZFS 14.0-ALPHA2 FreeBSD 14.0-ALPHA2 amd64 1400096 #112
> main-n264912-b1d3e2b77155-dirty: Sun Aug 20 10:01:48 PDT 2023
> root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG
> amd64 amd64 1400096 1400096
>
> The GENERIC-DBG variant of the kernel did not report any issues in
> earlier testing.
>
> The later referenced /usr/obj/DESTDIRs/main-amd64-poud-bulk_a was
> installed from the same build.
>
> # zfs list
> NAME                                          USED  AVAIL  REFER  MOUNTPOINT
> zoptb                                        79.9G   765G    96K  /zoptb
> zoptb/BUILDs                                 20.5G   765G  8.29M  /usr/obj/BUILDs
> zoptb/BUILDs/alt-main-amd64-dbg-clang-alt    1.86M   765G  1.86M  /usr/obj/BUILDs/alt-main-amd64-dbg-clang-alt
> zoptb/BUILDs/alt-main-amd64-nodbg-clang-alt  30.2M   765G  30.2M  /usr/obj/BUILDs/alt-main-amd64-nodbg-clang-alt
> zoptb/BUILDs/main-amd64-dbg-clang            9.96G   765G  9.96G  /usr/obj/BUILDs/main-amd64-dbg-clang
> zoptb/BUILDs/main-amd64-dbg-gccxtc           38.5M   765G  38.5M  /usr/obj/BUILDs/main-amd64-dbg-gccxtc
> zoptb/BUILDs/main-amd64-nodbg-clang          10.3G   765G  10.3G  /usr/obj/BUILDs/main-amd64-nodbg-clang
> zoptb/BUILDs/main-amd64-nodbg-clang-alt      37.2M   765G  37.2M  /usr/obj/BUILDs/main-amd64-nodbg-clang-alt
> zoptb/BUILDs/main-amd64-nodbg-gccxtc         94.6M   765G  94.6M  /usr/obj/BUILDs/main-amd64-nodbg-gccxtc
> zoptb/DESTDIRs                               4.33G   765G   104K  /usr/obj/DESTDIRs
> zoptb/DESTDIRs/main-amd64-poud               2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud
> zoptb/DESTDIRs/main-amd64-poud-bulk_a        2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud-bulk_a
> zoptb/ROOT                                   13.1G   765G    96K  none
> zoptb/ROOT/build_area_for-main-amd64         5.03G   765G  3.24G  none
> zoptb/ROOT/main-amd64                        8.04G   765G  3.23G  none
> zoptb/poudriere                              6.58G   765G   112K  /usr/local/poudriere
> zoptb/poudriere/data                         6.58G   765G   128K  /usr/local/poudriere/data
> zoptb/poudriere/
poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder
[Forked off the ZFS deadlock 14 discussion, per feedback.]

On Aug 23, 2023, at 11:40, Alexander Motin wrote:

> On 22.08.2023 14:24, Mark Millard wrote:
>> Alexander Motin wrote on
>> Date: Tue, 22 Aug 2023 16:18:12 UTC :
>>> I am waiting for final test results from George Wilson and then will
>>> request a quick merge of both to the zfs-2.2-release branch. Unfortunately
>>> there are still not many reviewers for the PR, since the code is not
>>> trivial, but at least with the test reports Brian Behlendorf and Mark
>>> Maybee seem to be OK to merge the two PRs into 2.2. If somebody else
>>> has tested and/or reviewed the PR, you may comment on it.
>> I had written to the list that when I tried to test the system
>> doing poudriere builds (initially with your patches) using
>> USE_TMPFS=no so that zfs had to deal with all the file I/O, I
>> instead got only one builder that ended up active, the others
>> never reaching "Builder started":
>>
>> Top was showing lots of "vlruwk" for the cpdup's. For example:
>> . . .
>>  362  0 root  40  0  27076Ki 13776Ki CPU19   19  4:23  0.00% cpdup -i0 -o ref 32
>>  349  0 root  53  0  27076Ki 13776Ki vlruwk  22  4:20  0.01% cpdup -i0 -o ref 31
>>  328  0 root  68  0  27076Ki 13804Ki vlruwk   8  4:30  0.01% cpdup -i0 -o ref 30
>>  304  0 root  37  0  27076Ki 13792Ki vlruwk   6  4:18  0.01% cpdup -i0 -o ref 29
>>  282  0 root  42  0  33220Ki 13956Ki vlruwk   8  4:33  0.01% cpdup -i0 -o ref 28
>>  242  0 root  56  0  27076Ki 13796Ki vlruwk   4  4:28  0.00% cpdup -i0 -o ref 27
>> . . .
>> But those processes did show CPU?? on occasion, as well as
>> *vnode less often. None of the cpdup's was stuck in
>> Removing your patches did not change the behavior.
>
> Mark, to me "vlruwk" looks like a limit on number of vnodes. I was not deep
> in that area at least recently, so somebody with more experience there could
> try to diagnose it.
> At very least it does not look related to the ZIL issue
> discussed in this thread, at least with the information provided, so I am not
> surprised that the mentioned patches do not affect it.

I did the above intending to test the deadlock in my context but
ended up not getting that far when I tried to make zfs handle all
the file I/O (USE_TMPFS=no and no other use of tmpfs or the like).

The zfs context is a simple single partition on the boot media. I
use ZFS for bectl BE use, not for other typical reasons. The media
here is PCIe Optane 1.4T media. The machine is a ThreadRipper
1950X, so first generation. 128 GiBytes of RAM. 491520 MiBytes of
swap, also on that Optane.

# uname -apKU
FreeBSD amd64-ZFS 14.0-ALPHA2 FreeBSD 14.0-ALPHA2 amd64 1400096 #112
main-n264912-b1d3e2b77155-dirty: Sun Aug 20 10:01:48 PDT 2023
root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG
amd64 amd64 1400096 1400096

The GENERIC-DBG variant of the kernel did not report any issues in
earlier testing.

The later referenced /usr/obj/DESTDIRs/main-amd64-poud-bulk_a was
installed from the same build.
# zfs list
NAME                                          USED  AVAIL  REFER  MOUNTPOINT
zoptb                                        79.9G   765G    96K  /zoptb
zoptb/BUILDs                                 20.5G   765G  8.29M  /usr/obj/BUILDs
zoptb/BUILDs/alt-main-amd64-dbg-clang-alt    1.86M   765G  1.86M  /usr/obj/BUILDs/alt-main-amd64-dbg-clang-alt
zoptb/BUILDs/alt-main-amd64-nodbg-clang-alt  30.2M   765G  30.2M  /usr/obj/BUILDs/alt-main-amd64-nodbg-clang-alt
zoptb/BUILDs/main-amd64-dbg-clang            9.96G   765G  9.96G  /usr/obj/BUILDs/main-amd64-dbg-clang
zoptb/BUILDs/main-amd64-dbg-gccxtc           38.5M   765G  38.5M  /usr/obj/BUILDs/main-amd64-dbg-gccxtc
zoptb/BUILDs/main-amd64-nodbg-clang          10.3G   765G  10.3G  /usr/obj/BUILDs/main-amd64-nodbg-clang
zoptb/BUILDs/main-amd64-nodbg-clang-alt      37.2M   765G  37.2M  /usr/obj/BUILDs/main-amd64-nodbg-clang-alt
zoptb/BUILDs/main-amd64-nodbg-gccxtc         94.6M   765G  94.6M  /usr/obj/BUILDs/main-amd64-nodbg-gccxtc
zoptb/DESTDIRs                               4.33G   765G   104K  /usr/obj/DESTDIRs
zoptb/DESTDIRs/main-amd64-poud               2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud
zoptb/DESTDIRs/main-amd64-poud-bulk_a        2.16G   765G  2.16G  /usr/obj/DESTDIRs/main-amd64-poud-bulk_a
zoptb/ROOT                                   13.1G   765G    96K  none
zoptb/ROOT/build_area_for-main-amd64         5.03G   765G  3.24G  none
zoptb/ROOT/main-amd64                        8.04G   765G  3.23G  none
zoptb/poudriere                              6.58G   765G   112K  /usr/local/poudriere
zoptb/poudriere/data                         6.58G   765G   128K  /usr/local/poudriere/data
zoptb/poudriere/data/.m                       112K   765G   112K  /usr/local/poudriere/data/.m
zoptb/poudriere/data/cache                   17.4M