Re: ZFS problems since recently ?
On Tue, Jan 02, 2024 at 05:51:32PM -0500, Alexander Motin wrote:
> Please see/test: https://github.com/openzfs/zfs/pull/15732 .

Looks like that has landed in current:

commit f552d7adebb13e24f65276a6c4822bffeeac3993
Merge: 13720136fbf a382e21194c
Author: Martin Matuska
Date:   Wed Jan 10 09:07:45 2024 +0100

    zfs: merge openzfs/zfs@a382e2119

    Notable upstream pull request merges:
     #15693 a382e2119 Add Gotify notification support to ZED
 --> #15732 e78aca3b3 Fix livelist assertions for dedup and cloning
     #15733 7ecaa0758 make zdb_decompress_block check decompression reliably
     #15735 255741fc9 Improve block sizes checks during cloning

    Obtained from:  OpenZFS
    OpenZFS commit: a382e21194c1690951d2eee8ebd98bc096f01c83
Re: ZFS problems since recently ?
John,

On 04.01.2024 09:20, John Kennedy wrote:
> On Tue, Jan 02, 2024 at 08:02:04PM -0800, John Kennedy wrote:
> > On Tue, Jan 02, 2024 at 05:51:32PM -0500, Alexander Motin wrote:
> > > On 01.01.2024 08:59, John Kennedy wrote:
> > > > ...
> > > > My poudriere build did eventually fail as well:
> > > > ...
> > > > [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
> > > > [05:40:24] Stopping 2 builders
> > > > panic: VERIFY(BP_GET_DEDUP(bp)) failed
> > >
> > > Please see/test: https://github.com/openzfs/zfs/pull/15732 .
> >
> > It came back today at the end of my poudriere build.  Your patch has
> > fixed it, so far at least.
>
> At the risk of conflating this with other ZFS issues, I beat on the VM
> a lot more last night without triggering any panics.
>
> My usual busy-workload is a total kernel+world rebuild (with whatever
> pending patches might be out), then a poudriere run (~230 or so
> packages).
>
> It's weird that the first (much bigger) run worked but later ones
> didn't (where maybe I had one port that failed to build), triggering
> the panic.  Seemed repeatable, but don't have a feel for the exact
> trigger like the sysctl issue.

What is the panic you see now?  It cannot be the same, since the dedup
assertion is no longer there.

-- 
Alexander Motin
Re: ZFS problems since recently ?
On Tue, Jan 02, 2024 at 08:02:04PM -0800, John Kennedy wrote:
> On Tue, Jan 02, 2024 at 05:51:32PM -0500, Alexander Motin wrote:
> > On 01.01.2024 08:59, John Kennedy wrote:
> > > ...
> > > My poudriere build did eventually fail as well:
> > > ...
> > > [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
> > > [05:40:24] Stopping 2 builders
> > > panic: VERIFY(BP_GET_DEDUP(bp)) failed
> >
> > Please see/test: https://github.com/openzfs/zfs/pull/15732 .
>
> It came back today at the end of my poudriere build.  Your patch has fixed
> it, so far at least.

At the risk of conflating this with other ZFS issues, I beat on the VM
a lot more last night without triggering any panics.

My usual busy-workload is a total kernel+world rebuild (with whatever
pending patches might be out), then a poudriere run (~230 or so
packages).

It's weird that the first (much bigger) run worked but later ones
didn't (where maybe I had one port that failed to build), triggering
the panic.  Seemed repeatable, but don't have a feel for the exact
trigger like the sysctl issue.
Re: ZFS problems since recently ?
On Tue, Jan 02, 2024 at 05:51:32PM -0500, Alexander Motin wrote:
> On 01.01.2024 08:59, John Kennedy wrote:
> > ...
> > My poudriere build did eventually fail as well:
> > ...
> > [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
> > [05:40:24] Stopping 2 builders
> > panic: VERIFY(BP_GET_DEDUP(bp)) failed
>
> Please see/test: https://github.com/openzfs/zfs/pull/15732 .

It came back today at the end of my poudriere build.  Your patch has
fixed it, so far at least.
Re: ZFS problems since recently ?
On 01.01.2024 08:59, John Kennedy wrote:
> On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> > markj@ pointed me in
> >   https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> > to
> >   https://github.com/openzfs/zfs/pull/15719
> >
> > So it will probably be fixed sooner or later.
> >
> > The other ZFS crashes I've seen are still an issue.
>
> My poudriere build did eventually fail as well:
> ...
> [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
> [05:40:24] Stopping 2 builders
> panic: VERIFY(BP_GET_DEDUP(bp)) failed

Please see/test: https://github.com/openzfs/zfs/pull/15732 .

-- 
Alexander Motin
Re: ZFS problems since recently ?
On 2024-01-02 08:22, Kurt Jaeger wrote:
> Hi!
>
> > The sysctl for block cloning is vfs.zfs.bclone_enabled.
> > To check if a pool has made use of block cloning:
> > zpool get all poolname | grep bclone
>
> One more thing:
>
> I have two pools on that box, and one of them has some bclone files:
>
> # zpool get all ref | grep bclone
> ref   bcloneused    21.8M   -
> ref   bclonesaved   24.4M   -
> ref   bcloneratio   2.12x   -
> # zpool get all pou | grep bclone
> pou   bcloneused    0       -
> pou   bclonesaved   0       -
> pou   bcloneratio   1.00x   -
>
> The ref pool contains the system and some files.
> The pou pool is for poudriere only.
>
> How do I find which files on ref are bcloned and how can I remove the
> bcloning from them ?

No idea about the detection (I don't expect an easy way), but the
answer to the second part is to copy the files after disabling block
cloning.  As this is system stuff, I would expect it is not much data,
and you could copy everything and then move it back to the original
place.  I would also assume original log files are not affected, and
only files which were copied (installworld or installkernel or backup
files or manual copies or port install (not sure about pkg install))
are possible targets.

Bye,
Alexander.

-- 
http://www.Leidinger.net  alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF
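The "copy, then move back" advice above can be sketched for a single file as follows. This is only a rough illustration, not a tested recovery procedure: the helper name rewrite_file and the temporary file suffix are made up here, and the sketch assumes block cloning has already been switched off (sysctl vfs.zfs.bclone_enabled=0) so that cp writes new blocks rather than cloning the existing ones.

```shell
#!/bin/sh
# Sketch: replace a file with a freshly written copy of itself.
# Assumption: block cloning was disabled beforehand, e.g. with
#   sysctl vfs.zfs.bclone_enabled=0
# so that cp writes new blocks instead of cloning the old ones.
rewrite_file() {
    f=$1
    tmp="$f.rewrite.$$"        # hypothetical temp name next to the original
    # Copy with mode/owner/timestamps preserved; clean up on failure.
    cp -p "$f" "$tmp" || { rm -f "$tmp"; return 1; }
    # Rename over the original (same directory, so same dataset).
    mv -f "$tmp" "$f"
}

# Example on a scratch file:
demo=$(mktemp)
echo "hello" > "$demo"
rewrite_file "$demo" && cat "$demo"
rm -f "$demo"
```

Note that a rewrite like this breaks hard links to the original file, and any snapshot taken earlier keeps referencing the old blocks, so the bclone counters of the pool may not drop until those snapshots are gone.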
Re: ZFS problems since recently ?
Hi!

> The sysctl for block cloning is vfs.zfs.bclone_enabled.
> To check if a pool has made use of block cloning:
> zpool get all poolname | grep bclone

One more thing:

I have two pools on that box, and one of them has some bclone files:

# zpool get all ref | grep bclone
ref   bcloneused    21.8M   -
ref   bclonesaved   24.4M   -
ref   bcloneratio   2.12x   -
# zpool get all pou | grep bclone
pou   bcloneused    0       -
pou   bclonesaved   0       -
pou   bcloneratio   1.00x   -

The ref pool contains the system and some files.
The pou pool is for poudriere only.

How do I find which files on ref are bcloned and how can I remove the
bcloning from them ?

-- 
p...@freebsd.org         +49 171 3101372                  Now what ?
Re: ZFS problems since recently ?
Hi!

> On 2023-12-31 19:34, Kurt Jaeger wrote:
> > I already have
> >
> > vfs.zfs.dmu_offset_next_sync=0
> >
> > which is supposed to disable block-cloning.
>
> It isn't. This one is supposed to fix an issue which is unrelated to block
> cloning (but can be amplified by block cloning). This issue has been fixed
> for some weeks, your Dec 23 build should not need it (when the issue happens,
> you have files with zero as parts of the data instead of the real data,
> and only if you copy files at the same time as those files are modified,
> and then only if you happen to get the timing right).
>
> The sysctl for block cloning is vfs.zfs.bclone_enabled.
> To check if a pool has made use of block cloning:
> zpool get all poolname | grep bclone

Thanks. I now used that sysctl and my test case (test build of
shells/bash) and it did not crash.

-- 
p...@freebsd.org         +49 171 3101372                  Now what ?
Re: ZFS problems since recently ?
On 2023-12-31 19:34, Kurt Jaeger wrote:
> I already have
>
> vfs.zfs.dmu_offset_next_sync=0
>
> which is supposed to disable block-cloning.

It isn't. This one is supposed to fix an issue which is unrelated to
block cloning (but can be amplified by block cloning). This issue has
been fixed for some weeks, your Dec 23 build should not need it (when
the issue happens, you have files with zero as parts of the data
instead of the real data, and only if you copy files at the same time
as those files are modified, and then only if you happen to get the
timing right).

The sysctl for block cloning is vfs.zfs.bclone_enabled.
To check if a pool has made use of block cloning:
zpool get all poolname | grep bclone

Bye,
Alexander.

-- 
http://www.Leidinger.net  alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF
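The pool check described above can also be scripted. The following is a minimal sketch, assuming the tab-separated output of "zpool get -Hp -o name,value bcloneused" (-H drops the header, -p prints exact numbers); the function name pools_with_bclone and the byte counts in the example are made up for illustration.

```shell
#!/bin/sh
# Sketch: list pools whose bcloneused property is non-zero.
# Assumes input in the form produced by
#   zpool get -Hp -o name,value bcloneused
# i.e. one "pool<TAB>bytes" line per pool.
pools_with_bclone() {
    # $2 + 0 forces a numeric comparison; a value of "-" counts as 0.
    awk -F '\t' '$2 + 0 > 0 { print $1 }'
}

# Real use (needs a system with ZFS):
#   sysctl vfs.zfs.bclone_enabled
#   zpool get -Hp -o name,value bcloneused | pools_with_bclone
# Example with illustrative byte counts for two pools:
printf 'ref\t22858957\npou\t0\n' | pools_with_bclone
```

With the illustrative input, only the pool name "ref" is printed, since "pou" reports 0 bytes of cloned data.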
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 02:27:17PM +0100, Kurt Jaeger wrote:
> > On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> > > markj@ pointed me in
> > >   https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> > > to
> > >   https://github.com/openzfs/zfs/pull/15719
> > >
> > > So it will probably be fixed sooner or later.
> > >
> > > The other ZFS crashes I've seen are still an issue.
> >
> > My poudriere build did eventually fail as well:
> > ...
> > [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
> > [05:40:24] Stopping 2 builders
> > panic: VERIFY(BP_GET_DEDUP(bp)) failed
>
> That's one of the panic messages I had as well.
>
> See
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276051
>
> for additional crashes and dumps.
>
> > I didn't tweak this system off defaults for block-cloning.  I haven't
> > been following that issue 100%.
>
> Do you have
>   vfs.zfs.dmu_offset_next_sync=0
> ?

I reverted everything and reinstalled.  The VERIFY(BP_GET_DEDUP(bp))
panic hasn't reoccurred (tended to happen on poudriere-build cleanup),
which may lean it more towards corruption, or maybe I just haven't been
"lucky" with my small random chance of corruption.

I did set vfs.zfs.dmu_offset_next_sync=0 after the bsdinstall was
complete (maybe I could have loaded the zfs kernel module from the
shell and set it before things kicked off).
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 08:42:26AM -0800, John Kennedy wrote:
> Applying the two ZFS kernel patches fixes that issue:

commit 09af4bf2c987f6f57804162cef8aeee05575ad1d (zfs: Fix SPA sysctl
handlers) landed too.

root@bsd15:~ # sysctl -a | grep vfs.zfs.zio
vfs.zfs.zio.deadman_log_all: 0
vfs.zfs.zio.dva_throttle_enabled: 1
vfs.zfs.zio.requeue_io_start_cut_in_line: 1
vfs.zfs.zio.slow_io_ms: 3
vfs.zfs.zio.taskq_wr_iss_ncpus: 0
vfs.zfs.zio.taskq_write: sync fixed,1,5 scale fixed,1,5
vfs.zfs.zio.taskq_read: fixed,1,8 null scale null
vfs.zfs.zio.taskq_batch_tpq: 0
vfs.zfs.zio.taskq_batch_pct: 80
vfs.zfs.zio.exclude_metadata: 0

root@bsd15:~ # uname -aUK
FreeBSD bsd15 15.0-CURRENT FreeBSD 15.0-CURRENT #1 main-n267336-09af4bf2c98: Mon Jan 1 12:04:15 PST 2024 warlock@bsd15:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 158 158
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> > > I can crash mine with "sysctl -a" as well.
>
> markj@ pointed me in
>   https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> to
>   https://github.com/openzfs/zfs/pull/15719
>
> So it will probably be fixed sooner or later.
>
> The other ZFS crashes I've seen are still an issue.

Applying the two ZFS kernel patches fixes that issue:

root@bsd15:~ # sysctl -a | grep vfs.zfs.zio
vfs.zfs.zio.deadman_log_all: 0
vfs.zfs.zio.dva_throttle_enabled: 1
vfs.zfs.zio.requeue_io_start_cut_in_line: 1
vfs.zfs.zio.slow_io_ms: 3
vfs.zfs.zio.taskq_wr_iss_ncpus: 0
vfs.zfs.zio.taskq_write: sync fixed,1,5 scale fixed,1,5
vfs.zfs.zio.taskq_read: fixed,1,8 null scale null
vfs.zfs.zio.taskq_batch_tpq: 0
vfs.zfs.zio.taskq_batch_pct: 80
vfs.zfs.zio.exclude_metadata: 0

root@bsd15:~ # uname -aUK
FreeBSD bsd15 15.0-CURRENT FreeBSD 15.0-CURRENT #2 main-n267335-499e84e16f5-dirty: Mon Jan 1 08:04:59 PST 2024 warlock@bsd15:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 158 158
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 02:27:17PM +0100, Kurt Jaeger wrote:
> Do you have
>   vfs.zfs.dmu_offset_next_sync=0

I didn't initially, I do now.  Like I said, I haven't been following
that one 100%.  I know it isn't block-clone per se, so much as some
underlying problem it pokes with a pointy stick.  Small chance
multiplied by a bunch of ZFS IOPS.

Seems like I'd have to revert it all the way back to fresh install if
I want to get rid of all potential corruption unrelated to the sysctl
panic.  But I'll do my busy-work cycle (*) with that one and maybe
another with it off and see what happens.

* full kernel+world, plus my local poudriere package build, currently
  wedged a bit with the heimdall build issue.
Re: ZFS problems since recently ?
Hi!

> On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> > markj@ pointed me in
> >   https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> > to
> >   https://github.com/openzfs/zfs/pull/15719
> >
> > So it will probably be fixed sooner or later.
> >
> > The other ZFS crashes I've seen are still an issue.
>
> My poudriere build did eventually fail as well:
> ...
> [05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
> [05:40:24] Stopping 2 builders
> panic: VERIFY(BP_GET_DEDUP(bp)) failed

That's one of the panic messages I had as well.

See

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276051

for additional crashes and dumps.

> I didn't tweak this system off defaults for block-cloning.  I haven't been
> following that issue 100%.

Do you have
  vfs.zfs.dmu_offset_next_sync=0
?

-- 
p...@freebsd.org         +49 171 3101372                  Now what ?
Re: ZFS problems since recently ?
On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:
> markj@ pointed me in
>   https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> to
>   https://github.com/openzfs/zfs/pull/15719
>
> So it will probably be fixed sooner or later.
>
> The other ZFS crashes I've seen are still an issue.

My poudriere build did eventually fail as well:

...
[05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
[05:40:24] Stopping 2 builders
panic: VERIFY(BP_GET_DEDUP(bp)) failed

cpuid = 2
time = 1704091946
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00f62898c0
vpanic() at vpanic+0x131/frame 0xfe00f62899f0
spl_panic() at spl_panic+0x3a/frame 0xfe00f6289a50
dsl_livelist_iterate() at dsl_livelist_iterate+0x2de/frame 0xfe00f6289b30
bpobj_iterate_blkptrs() at bpobj_iterate_blkptrs+0x235/frame 0xfe00f6289bf0
bpobj_iterate_impl() at bpobj_iterate_impl+0x16e/frame 0xfe00f6289c80
dsl_process_sub_livelist() at dsl_process_sub_livelist+0x5c/frame 0xfe00f6289d00
spa_livelist_delete_cb() at spa_livelist_delete_cb+0xf6/frame 0xfe00f6289ea0
zthr_procedure() at zthr_procedure+0xa5/frame 0xfe00f6289ef0
fork_exit() at fork_exit+0x82/frame 0xfe00f6289f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe00f6289f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 9 tid 100223 ]
Stopped at      kdb_enter+0x33: movq    $0,0xe3a582(%rip)
db>

Trying to do another poudriere build fails almost immediately with
that verify error.  Your verify errors don't match up exactly.

I've got snapshots from before I started freaking it out with the
sysctl calls and possibly inducing corruption.  I didn't tweak this
system off defaults for block-cloning.  I haven't been following that
issue 100%.
Re: ZFS problems since recently ?
Hi!

> > I can crash mine with "sysctl -a" as well.

markj@ pointed me in
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
to
  https://github.com/openzfs/zfs/pull/15719

So it will probably be fixed sooner or later.

The other ZFS crashes I've seen are still an issue.

-- 
p...@freebsd.org         +49 171 3101372                  Now what ?
Re: ZFS problems since recently ?
> I can crash mine with "sysctl -a" as well.

Smaller test, this is sufficient to crash things:

root@bsd15:~ # sysctl vfs.zfs.zio
vfs.zfs.zio.deadman_log_all: 0
vfs.zfs.zio.dva_throttle_enabled: 1
vfs.zfs.ziopanic: sbuf_clear makes no sense on sbuf 0xf8002c8dc300 with drain
cpuid = 3
time = 1704069514
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00fa502960
vpanic() at vpanic+0x131/frame 0xfe00fa502a90
panic() at panic+0x43/frame 0xfe00fa502af0
sbuf_clear() at sbuf_clear+0xa8/frame 0xfe00fa502b00
sbuf_cpy() at sbuf_cpy+0x56/frame 0xfe00fa502b20
spa_taskq_write_param() at spa_taskq_write_param+0x85/frame 0xfe00fa502bd0
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x9c/frame 0xfe00fa502c20
sysctl_root() at sysctl_root+0x21e/frame 0xfe00fa502ca0
userland_sysctl() at userland_sysctl+0x184/frame 0xfe00fa502d50
sys___sysctl() at sys___sysctl+0x60/frame 0xfe00fa502e00
amd64_syscall() at amd64_syscall+0x153/frame 0xfe00fa502f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe00fa502f30
--- syscall (202, FreeBSD ELF64, __sysctl), rip = 0x3733c1e5619a, rsp = 0x3733bf494538, rbp = 0x3733bf494570 ---
KDB: enter: panic
[ thread pid 780 tid 100237 ]
Stopped at      kdb_enter+0x33: movq    $0,0xe3a582(%rip)
db>
Re: ZFS problems since recently ?
On Sun, Dec 31, 2023 at 07:34:45PM +0100, Kurt Jaeger wrote:
> Hi!
>
> Short overview:
> - Had CURRENT system from around September
> - Upgrade on the 23rd of December
> - crashes in ZFS, see
>   https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261538
>   for details
> - Reinstalled from scratch with new SSD drives from
>   https://download.freebsd.org/snapshots/amd64/amd64/ISO-IMAGES/15.0/
>   freebsd-openzfs-amd64-2020081900-memstick.img.xz
> - Had one crash with
>   sysctl -a
>   https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
> - Still see crashes with ZFS (and other) when using poudriere to
>   build ports.
>
> Problem:
>
> I happen to run into several cases of crashes in ZFS, some of
> them fatal (zpool non-recoverable).

I can crash mine with "sysctl -a" as well.

I seeded my bhyve with:
FreeBSD-15.0-CURRENT-amd64-20231228-fb03f7f8e30d-267242-disc1.iso

Rebuilt the kernel (so now at main-n267320-4d08b569a01) and started
crunching through poudriere package builds.  Sorta stock install of
encrypted ZFS.  I didn't get it to crash with poudriere (yet).  Mine
lives in bhyve, so maybe less possible destruction via crashes.

KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00fa5f3960
vpanic() at vpanic+0x131/frame 0xfe00fa5f3a90
panic() at panic+0x43/frame 0xfe00fa5f3af0
sbuf_clear() at sbuf_clear+0xa8/frame 0xfe00fa5f3b00
sbuf_cpy() at sbuf_cpy+0x56/frame 0xfe00fa5f3b20
spa_taskq_write_param() at spa_taskq_write_param+0x85/frame 0xfe00fa5f3bd0
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x9c/frame 0xfe00fa5f3c20
sysctl_root() at sysctl_root+0x21e/frame 0xfe00fa5f3ca0
userland_sysctl() at userland_sysctl+0x184/frame 0xfe00fa5f3d50
sys___sysctl() at sys___sysctl+0x60/frame 0xfe00fa5f3e00
amd64_syscall() at amd64_syscall+0x153/frame 0xfe00fa5f3f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe00fa5f3f30
--- syscall (202, FreeBSD ELF64, __sysctl), rip = 0x22e42167019a, rsp = 0x22e41ee72518, rbp = 0x22e41ee72550 ---
KDB: enter: panic

The sysctl died at this point, but who knows if it had pending
buffered output or anything...

...
vfs.zfs.zio.deadman_log_all: 0
vfs.zfs.zio.dva_throttle_enabled: 1
vfs.zfs.zio.requeue_io_start_cut_in_line: 1
vfs.zfs.zio.slow_io_ms: 3
vfs.zfs.zio.taskq_wr_iss_ncpus: 0
ZFS problems since recently ?
Hi!

Short overview:
- Had CURRENT system from around September
- Upgrade on the 23rd of December
- crashes in ZFS, see
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261538
  for details
- Reinstalled from scratch with new SSD drives from
  https://download.freebsd.org/snapshots/amd64/amd64/ISO-IMAGES/15.0/
  freebsd-openzfs-amd64-2020081900-memstick.img.xz
- Had one crash with
  sysctl -a
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
- Still see crashes with ZFS (and other) when using poudriere to
  build ports.

Problem:

I happen to run into several cases of crashes in ZFS, some of
them fatal (zpool non-recoverable).

The latest was:

panic: VERIFY(BP_GET_DEDUP(bp)) failed

cpuid = 29
time = 1704050745
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe02def538c0
vpanic() at vpanic+0x131/frame 0xfe02def539f0
spl_panic() at spl_panic+0x3a/frame 0xfe02def53a50
dsl_livelist_iterate() at dsl_livelist_iterate+0x2de/frame 0xfe02def53b30
bpobj_iterate_blkptrs() at bpobj_iterate_blkptrs+0x235/frame 0xfe02def53bf0
bpobj_iterate_impl() at bpobj_iterate_impl+0x16e/frame 0xfe02def53c80
dsl_process_sub_livelist() at dsl_process_sub_livelist+0x5c/frame 0xfe02def53d00
spa_livelist_delete_cb() at spa_livelist_delete_cb+0xf6/frame 0xfe02def53ea0
zthr_procedure() at zthr_procedure+0xa5/frame 0xfe02def53ef0
fork_exit() at fork_exit+0x82/frame 0xfe02def53f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe02def53f30
--- trap 0x85baac33, rip = 0xdfda59b01bda59f, rsp = 0xaaca3ed7a68a3ed3, rbp = 0x16f512f91ab512fd ---

before I had to nuke that pool and restart...

I already have

vfs.zfs.dmu_offset_next_sync=0

which is supposed to disable block-cloning.

Does anyone else see this with recent versions or is it just me ?

-- 
p...@freebsd.org         +49 171 3101372                  Now what ?