Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

2023-09-07 Thread Mark Millard
[Today's main-snapshot kernel panics as well.]

On Sep 7, 2023, at 16:32, Mark Millard  wrote:

> On Sep 7, 2023, at 13:07, Alexander Motin  wrote:
> 
>> Thanks, Mark.
>> 
>> On 07.09.2023 15:40, Mark Millard wrote:
>>> On Sep 7, 2023, at 11:48, Glen Barber  wrote:
>>>> On Thu, Sep 07, 2023 at 11:17:22AM -0700, Mark Millard wrote:
>>>>> When I next have time, should I retry based on a more recent
>>>>> vintage of main that includes 969071be938c ?
>>>> 
>>>> Yes, please, if you can.
>>> As it stands, I rebooted that machine into my normal
>>> environment, so the after-crash-with-dump-info
>>> context is preserved. I'll presume there is no need
>>> to preserve that context unless I hear otherwise.
>>> (But I'll work on this until later today.)
>>> Even my normal environment predates the commit in
>>> question by a few commits. So I'll end up doing a
>>> more general round of updates overall.
>>> Someone can let me know if there is a preference
>>> for debug over non-debug for the next test run.
>> 
>> It is not unknown for some bugs to disappear once debugging is enabled,
>> due to different execution timings, but generally a debug build may detect
>> the problem closer to its origin instead of at random consequences. I am
>> only starting to look at this report (unless Pawel or somebody beats me to
>> it), and don't have additional requests yet, but if you can repeat the same
>> with a debug kernel (in-base ZFS's ZFS_DEBUG setting follows the kernel's
>> INVARIANTS), it may give us some additional information.
> 
> So I did a zpool import, rewinding to the checkpoint.
> (This depends on the questionable ZFS doing everything
> correctly for this. Notably, the normal environment has
> vfs.zfs.bclone_enabled=0 , including when it was
> doing this activity.) My normal environment reported
> no problems.
> 
> Note: the earlier snapshot from my first setup was
> still in place since it was made just before the
> original checkpoint used above.
> 
> However, the rewind did remove the /var/crash/
> material that had been added.
> 
> I did the appropriate zfs mount.
> 
> I installed a debug kernel and world into the imported pool. Again,
> no problems were reported.
> 
> I did the appropriate zfs umount.
> 
> I did the appropriate zpool export.
> 
> I rebooted with the test media.
> 
> # sysctl vfs.zfs.bclone_enabled
> vfs.zfs.bclone_enabled: 1
> 
> # zpool trim -w zamd64
> 
> # zpool checkpoint zamd64
> 
> # uname -apKU
> FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #74 
> main-n265188-117c54a78ccd-dirty: Tue Sep  5 21:29:53 PDT 2023 
> root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-dbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-DBG
>  amd64 amd64 150 150
> 
> (So, before the 969071be938c vintage, same sources as for
> my last run but a debug build.)
> 
> # poudriere bulk -jmain-amd64-bulk_a -a
> . . .
> [00:03:23] Building 34214 packages using up to 32 builders
> [00:03:23] Hit CTRL+t at any time to see build progress and stats
> [00:03:23] [01] [00:00:00] Builder starting
> [00:04:19] [01] [00:00:56] Builder started
> [00:04:20] [01] [00:00:01] Building ports-mgmt/pkg | pkg-1.20.6
> [00:05:33] [01] [00:01:14] Finished ports-mgmt/pkg | pkg-1.20.6: Success
> [00:05:53] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1
> [00:05:53] [02] [00:00:00] Builder starting
> . . .
> [00:05:54] [32] [00:00:00] Builder starting
> [00:06:11] [01] [00:00:18] Finished print/indexinfo | indexinfo-0.3.1: Success
> [00:06:12] [01] [00:00:00] Building devel/gettext-runtime | gettext-runtime-0.22_1
> [00:08:24] [01] [00:02:12] Finished devel/gettext-runtime | gettext-runtime-0.22_1: Success
> [00:08:31] [01] [00:00:00] Building devel/libtextstyle | libtextstyle-0.22
> [00:10:06] [05] [00:04:13] Builder started
> [00:10:06] [05] [00:00:00] Building devel/autoconf-switch | autoconf-switch-20220527
> [00:10:06] [31] [00:04:12] Builder started
> [00:10:06] [31] [00:00:00] Building devel/libatomic_ops | libatomic_ops-7.8.0
> . . .
> 
> Crashed again, with 158 *.pkg files in .building/All/ after
> rebooting.
> 
> The crash is similar to the non-debug one. No extra output
> from the debug build.
> 
> For reference:
> 
> Unread portion of the kernel message buffer:
> panic: Solaris(panic): zfs: accessing past end of object 422/10b1c02 (size=2560 access=2560+2560)
> . . .
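
For context: the panic is the bounds check in dmu_buf_hold_array_by_dnode()
(the frame is visible in the fuller backtraces elsewhere in this thread).
Paraphrasing from my reading of the OpenZFS sources (a sketch, not the
verbatim code):

	/*
	 * Sketch: for an object whose data fits in a single block
	 * (dn_datablkshift == 0), any access must stay inside that
	 * one block of dn_datablksz bytes.
	 */
	if (offset + length > dn->dn_datablksz) {
		zfs_panic_recover("zfs: accessing past end of object "
		    "%llx/%llx (size=%u access=%llu+%llu)",
		    /* ...dataset and object ids... */
		    dn->dn_datablksz, offset, length);
		return (SET_ERROR(EIO));
	}

So size=2560 access=2560+2560 means the clone touched offset 2560 of an
object whose single data block is only 2560 bytes: the access would end
at byte 5120, past the end of the object.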

Same world, but with a newer snapshot main kernel that
should be compatible with that world:

# uname -apKU
FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #0 
main-n265205-03a7c36ddbc0: Thu Sep  7 03:10:34 UTC 2023 
r...@releng3.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 
amd64 150 150


Steps:

#NOTE: (re)boot to normal environment
#NOTE: login

cd ~/artifacts/

#NOTE: as needed . . .
fetch http://ftp3.freebsd.org/pub/FreeBSD/snapshots/amd64/15.0-CURRENT/kernel.txz
fetch http://ftp3.freebsd.org/pub/FreeBSD/snapshots/amd64/15.0-CURRENT/kernel-dbg.txz
fetch 

Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

2023-09-07 Thread Mark Millard
On Sep 7, 2023, at 13:07, Alexander Motin  wrote:

> Thanks, Mark.
> 
> On 07.09.2023 15:40, Mark Millard wrote:
>> On Sep 7, 2023, at 11:48, Glen Barber  wrote:
>>> On Thu, Sep 07, 2023 at 11:17:22AM -0700, Mark Millard wrote:
>>>> When I next have time, should I retry based on a more recent
>>>> vintage of main that includes 969071be938c ?
>>> 
>>> Yes, please, if you can.
>> As it stands, I rebooted that machine into my normal
>> environment, so the after-crash-with-dump-info
>> context is preserved. I'll presume there is no need
>> to preserve that context unless I hear otherwise.
>> (But I'll work on this until later today.)
>> Even my normal environment predates the commit in
>> question by a few commits. So I'll end up doing a
>> more general round of updates overall.
>> Someone can let me know if there is a preference
>> for debug over non-debug for the next test run.
> 
> It is not unknown for some bugs to disappear once debugging is enabled,
> due to different execution timings, but generally a debug build may detect
> the problem closer to its origin instead of at random consequences. I am
> only starting to look at this report (unless Pawel or somebody beats me to
> it), and don't have additional requests yet, but if you can repeat the same
> with a debug kernel (in-base ZFS's ZFS_DEBUG setting follows the kernel's
> INVARIANTS), it may give us some additional information.

So I did a zpool import, rewinding to the checkpoint.
(This depends on the questionable ZFS doing everything
correctly for this. Notably, the normal environment has
vfs.zfs.bclone_enabled=0 , including when it was
doing this activity.) My normal environment reported
no problems.
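
For reference, the rewind is of the following form (a sketch, not a
paste of my exact command history; see zpool-import(8)):

# zpool export zamd64
# zpool import --rewind-to-checkpoint zamd64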

Note: the earlier snapshot from my first setup was
still in place since it was made just before the
original checkpoint used above.

However, the rewind did remove the /var/crash/
material that had been added.

I did the appropriate zfs mount.

I installed a debug kernel and world into the imported pool. Again,
no problems were reported.

I did the appropriate zfs umount.

I did the appropriate zpool export.

I rebooted with the test media.

# sysctl vfs.zfs.bclone_enabled
vfs.zfs.bclone_enabled: 1

# zpool trim -w zamd64

# zpool checkpoint zamd64

# uname -apKU
FreeBSD amd64-ZFS 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #74 
main-n265188-117c54a78ccd-dirty: Tue Sep  5 21:29:53 PDT 2023 
root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-dbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-DBG
 amd64 amd64 150 150

(So, before the 969071be938c vintage, same sources as for
my last run but a debug build.)

# poudriere bulk -jmain-amd64-bulk_a -a
. . .
[00:03:23] Building 34214 packages using up to 32 builders
[00:03:23] Hit CTRL+t at any time to see build progress and stats
[00:03:23] [01] [00:00:00] Builder starting
[00:04:19] [01] [00:00:56] Builder started
[00:04:20] [01] [00:00:01] Building ports-mgmt/pkg | pkg-1.20.6
[00:05:33] [01] [00:01:14] Finished ports-mgmt/pkg | pkg-1.20.6: Success
[00:05:53] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1
[00:05:53] [02] [00:00:00] Builder starting
. . .
[00:05:54] [32] [00:00:00] Builder starting
[00:06:11] [01] [00:00:18] Finished print/indexinfo | indexinfo-0.3.1: Success
[00:06:12] [01] [00:00:00] Building devel/gettext-runtime | gettext-runtime-0.22_1
[00:08:24] [01] [00:02:12] Finished devel/gettext-runtime | gettext-runtime-0.22_1: Success
[00:08:31] [01] [00:00:00] Building devel/libtextstyle | libtextstyle-0.22
[00:10:06] [05] [00:04:13] Builder started
[00:10:06] [05] [00:00:00] Building devel/autoconf-switch | autoconf-switch-20220527
[00:10:06] [31] [00:04:12] Builder started
[00:10:06] [31] [00:00:00] Building devel/libatomic_ops | libatomic_ops-7.8.0
. . .

Crashed again, with 158 *.pkg files in .building/All/ after
rebooting.

The crash is similar to the non-debug one. No extra output
from the debug build.

For reference:

Unread portion of the kernel message buffer:
panic: Solaris(panic): zfs: accessing past end of object 422/10b1c02 (size=2560 access=2560+2560)
cpuid = 15
time = 1694127988
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe02e783b5a0
vpanic() at vpanic+0x132/frame 0xfe02e783b6d0
panic() at panic+0x43/frame 0xfe02e783b730
vcmn_err() at vcmn_err+0xeb/frame 0xfe02e783b860
zfs_panic_recover() at zfs_panic_recover+0x59/frame 0xfe02e783b8c0
dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0xb8/frame 0xfe02e783b970
dmu_brt_clone() at dmu_brt_clone+0x61/frame 0xfe02e783b9f0
zfs_clone_range() at zfs_clone_range+0xaa3/frame 0xfe02e783bbc0
zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x18a/frame 0xfe02e783bc40
vn_copy_file_range() at vn_copy_file_range+0x114/frame 0xfe02e783bce0
kern_copy_file_range() at kern_copy_file_range+0x36c/frame 0xfe02e783bdb0
sys_copy_file_range() at sys_copy_file_range+0x78/frame 0xfe02e783be00
amd64_syscall() at amd64_syscall+0x14f/frame 

Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

2023-09-07 Thread Alexander Motin

Thanks, Mark.

On 07.09.2023 15:40, Mark Millard wrote:

On Sep 7, 2023, at 11:48, Glen Barber  wrote:


On Thu, Sep 07, 2023 at 11:17:22AM -0700, Mark Millard wrote:

When I next have time, should I retry based on a more recent
vintage of main that includes 969071be938c ?


Yes, please, if you can.


As it stands, I rebooted that machine into my normal
environment, so the after-crash-with-dump-info
context is preserved. I'll presume there is no need
to preserve that context unless I hear otherwise.
(But I'll work on this until later today.)

Even my normal environment predates the commit in
question by a few commits. So I'll end up doing a
more general round of updates overall.

Someone can let me know if there is a preference
for debug over non-debug for the next test run.


It is not unknown for some bugs to disappear once debugging is enabled,
due to different execution timings, but generally a debug build may
detect the problem closer to its origin instead of at random
consequences. I am only starting to look at this report (unless Pawel
or somebody beats me to it), and don't have additional requests yet,
but if you can repeat the same with a debug kernel (in-base ZFS's
ZFS_DEBUG setting follows the kernel's INVARIANTS), it may give us some
additional information.
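
(For reference, a debug kernel config along these lines turns on the
relevant checks; a sketch assuming a GENERIC-derived config, not
necessarily your exact config:

include GENERIC
ident	GENERIC-DBG
options	INVARIANTS
options	INVARIANT_SUPPORT
options	WITNESS
options	WITNESS_SKIPSPIN

With INVARIANTS on, the in-base ZFS build gets ZFS_DEBUG as noted.)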



Looking at "git: 969071be938c - main", the relevant
part seems to be just (white space possibly not
preserved accurately):

diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
index 9fb5aee6a023..4e4161ef1a7f 100644
--- a/sys/kern/vfs_vnops.c
+++ b/sys/kern/vfs_vnops.c
@@ -3076,12 +3076,14 @@ vn_copy_file_range(struct vnode *invp, off_t *inoffp, struct vnode *outvp,
 		goto out;
 
 	/*
-	 * If the two vnode are for the same file system, call
+	 * If the two vnodes are for the same file system type, call
 	 * VOP_COPY_FILE_RANGE(), otherwise call vn_generic_copy_file_range()
-	 * which can handle copies across multiple file systems.
+	 * which can handle copies across multiple file system types.
 	 */
 	*lenp = len;
-	if (invp->v_mount == outvp->v_mount)
+	if (invp->v_mount == outvp->v_mount ||
+	    strcmp(invp->v_mount->mnt_vfc->vfc_name,
+	    outvp->v_mount->mnt_vfc->vfc_name) == 0)
 		error = VOP_COPY_FILE_RANGE(invp, inoffp, outvp, outoffp,
 		    lenp, flags, incred, outcred, fsize_td);
 	else

That looks to call VOP_COPY_FILE_RANGE in more contexts and
vn_generic_copy_file_range in fewer.

The backtrace I reported involves VOP_COPY_FILE_RANGE, so it
appears this change is unlikely to invalidate my test result,
although failure might happen sooner if more VOP_COPY_FILE_RANGE
calls happen with the newer code.


Your logic is likely right, but if you have block cloning requests both
within and between datasets, this patch may change the pattern. It is
obviously not a fix for the issue, though. I responded to the commit
email only because the commit makes no difference while
vfs.zfs.bclone_enabled is 0.
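
Until the underlying bug is fixed, the safe configuration is the default
vfs.zfs.bclone_enabled=0. On a system where it was enabled, presumably a
line like the following in /etc/sysctl.conf (the knob is also settable
as a loader tunable):

vfs.zfs.bclone_enabled=0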



That in turn means that someone may come up with some
other change for me to test by the time I get around to
setting up another test. Let me know if so.


--
Alexander Motin



Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

2023-09-07 Thread Mark Millard
On Sep 7, 2023, at 11:48, Glen Barber  wrote:

> On Thu, Sep 07, 2023 at 11:17:22AM -0700, Mark Millard wrote:
>> When I next have time, should I retry based on a more recent
>> vintage of main that includes 969071be938c ?
>> 
> 
> Yes, please, if you can.

As it stands, I rebooted that machine into my normal
environment, so the after-crash-with-dump-info
context is preserved. I'll presume there is no need
to preserve that context unless I hear otherwise.
(But I'll work on this until later today.)

Even my normal environment predates the commit in
question by a few commits. So I'll end up doing a
more general round of updates overall.

Someone can let me know if there is a preference
for debug over non-debug for the next test run.

Looking at "git: 969071be938c - main", the relevant
part seems to be just (white space possibly not
preserved accurately):

diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
index 9fb5aee6a023..4e4161ef1a7f 100644
--- a/sys/kern/vfs_vnops.c
+++ b/sys/kern/vfs_vnops.c
@@ -3076,12 +3076,14 @@ vn_copy_file_range(struct vnode *invp, off_t *inoffp, struct vnode *outvp,
 		goto out;
 
 	/*
-	 * If the two vnode are for the same file system, call
+	 * If the two vnodes are for the same file system type, call
 	 * VOP_COPY_FILE_RANGE(), otherwise call vn_generic_copy_file_range()
-	 * which can handle copies across multiple file systems.
+	 * which can handle copies across multiple file system types.
 	 */
 	*lenp = len;
-	if (invp->v_mount == outvp->v_mount)
+	if (invp->v_mount == outvp->v_mount ||
+	    strcmp(invp->v_mount->mnt_vfc->vfc_name,
+	    outvp->v_mount->mnt_vfc->vfc_name) == 0)
 		error = VOP_COPY_FILE_RANGE(invp, inoffp, outvp, outoffp,
 		    lenp, flags, incred, outcred, fsize_td);
 	else

That looks to call VOP_COPY_FILE_RANGE in more contexts and
vn_generic_copy_file_range in fewer.

The backtrace I reported involves VOP_COPY_FILE_RANGE, so it
appears this change is unlikely to invalidate my test result,
although failure might happen sooner if more VOP_COPY_FILE_RANGE
calls happen with the newer code.
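
For reference, the failing path is reachable from any userland caller of
copy_file_range(2); a minimal sketch of my own (an illustration, not the
actual call sequence poudriere's tooling performs):

#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s infile outfile\n", argv[0]);
		return (1);
	}
	int in = open(argv[1], O_RDONLY);
	int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (in < 0 || out < 0) {
		perror("open");
		return (1);
	}
	/*
	 * In-kernel copy: on ZFS with vfs.zfs.bclone_enabled=1 this can
	 * become a block-clone (BRT) operation, i.e. the path in the
	 * reported backtrace.
	 */
	ssize_t n;
	while ((n = copy_file_range(in, NULL, out, NULL, SSIZE_MAX, 0)) > 0)
		;
	if (n < 0)
		perror("copy_file_range");
	close(in);
	close(out);
	return (n < 0 ? 1 : 0);
}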

That in turn means that someone may come up with some
other change for me to test by the time I get around to
setting up another test. Let me know if so.

===
Mark Millard
marklmi at yahoo.com




Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

2023-09-07 Thread Glen Barber
On Thu, Sep 07, 2023 at 11:17:22AM -0700, Mark Millard wrote:
> When I next have time, should I retry based on a more recent
> vintage of main that includes 969071be938c ?
> 

Yes, please, if you can.

Glen





Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

2023-09-07 Thread Mark Millard
[Drat, the request to rerun my tests did not mention
the more recent change:

vfs: copy_file_range() between multiple mountpoints of the same fs type

and I'd not noticed it on my own, so I ran the test without updating.]


On Sep 7, 2023, at 11:02, Mark Millard  wrote:

> I was requested to do a test with vfs.zfs.bclone_enabled=1 and
> the bulk -a build panicked (having stored 128 *.pkg files in
> .building/ first):

Unfortunately, rerunning my tests with this set was testing a
context predating:

Wed, 06 Sep 2023
. . .
• git: 969071be938c - main - vfs: copy_file_range() between multiple
mountpoints of the same fs type (Martin Matuska)

So the information might be out of date for main and for
stable/14: I've no clue how good a test it was.

Maybe some of those I've cc'd would know.

When I next have time, should I retry based on a more recent
vintage of main that includes 969071be938c ?

> # more /var/crash/core.txt.3
> . . .
> Unread portion of the kernel message buffer:
> panic: Solaris(panic): zfs: accessing past end of object 422/1108c16 (size=2560 access=2560+2560)
> cpuid = 15
> time = 1694103674
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0352758590
> vpanic() at vpanic+0x132/frame 0xfe03527586c0
> panic() at panic+0x43/frame 0xfe0352758720
> vcmn_err() at vcmn_err+0xeb/frame 0xfe0352758850
> zfs_panic_recover() at zfs_panic_recover+0x59/frame 0xfe03527588b0
> dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0x97/frame 
> 0xfe0352758960
> dmu_brt_clone() at dmu_brt_clone+0x61/frame 0xfe03527589f0
> zfs_clone_range() at zfs_clone_range+0xa6a/frame 0xfe0352758bc0
> zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x1ae/frame 
> 0xfe0352758c40
> vn_copy_file_range() at vn_copy_file_range+0x11e/frame 0xfe0352758ce0
> kern_copy_file_range() at kern_copy_file_range+0x338/frame 0xfe0352758db0
> sys_copy_file_range() at sys_copy_file_range+0x78/frame 0xfe0352758e00
> amd64_syscall() at amd64_syscall+0x109/frame 0xfe0352758f30
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0352758f30
> --- syscall (569, FreeBSD ELF64, copy_file_range), rip = 0x1ce4506d155a, rsp 
> = 0x1ce44ec71e88, rbp = 0x1ce44ec72320 ---
> KDB: enter: panic
> 
> __curthread () at /usr/main-src/sys/amd64/include/pcpu_aux.h:57
> 57  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct 
> pcpu,
> (kgdb) #0  __curthread () at /usr/main-src/sys/amd64/include/pcpu_aux.h:57
> #1  doadump (textdump=textdump@entry=0)
>   at /usr/main-src/sys/kern/kern_shutdown.c:405
> #2  0x804a442a in db_dump (dummy=,  
> dummy2=, dummy3=, dummy4=)
>   at /usr/main-src/sys/ddb/db_command.c:591
> #3  0x804a422d in db_command (last_cmdp=,  
> cmd_table=, dopager=true)
>   at /usr/main-src/sys/ddb/db_command.c:504
> #4  0x804a3eed in db_command_loop ()
>   at /usr/main-src/sys/ddb/db_command.c:551
> #5  0x804a7876 in db_trap (type=, code=)
>   at /usr/main-src/sys/ddb/db_main.c:268
> #6  0x80bb9e57 in kdb_trap (type=type@entry=3, code=code@entry=0, 
>  tf=tf@entry=0xfe03527584d0) at /usr/main-src/sys/kern/subr_kdb.c:790
> #7  0x8104ad3d in trap (frame=0xfe03527584d0)
>   at /usr/main-src/sys/amd64/amd64/trap.c:608
> #8  
> #9  kdb_enter (why=, msg=)
>   at /usr/main-src/sys/kern/subr_kdb.c:556
> #10 0x80b6aab3 in vpanic (fmt=0x82be52d6 "%s%s",  
> ap=ap@entry=0xfe0352758700)
>   at /usr/main-src/sys/kern/kern_shutdown.c:958
> #11 0x80b6a943 in panic (
>   fmt=0x820aa2e8  "\312C$\201\377\377\377\377")
>   at /usr/main-src/sys/kern/kern_shutdown.c:894
> #12 0x82993c5b in vcmn_err (ce=,  
> fmt=0x82bfdd1f "zfs: accessing past end of object %llx/%llx (size=%u 
> access=%llu+%llu)", adx=0xfe0352758890)
>   at /usr/main-src/sys/contrib/openzfs/module/os/freebsd/spl/spl_cmn_err.c:60
> #13 0x82a84d69 in zfs_panic_recover (
>   fmt=0x12 )
>   at /usr/main-src/sys/contrib/openzfs/module/zfs/spa_misc.c:1594
> #14 0x829f8e27 in dmu_buf_hold_array_by_dnode (dn=0xf813dfc48978, 
>  offset=offset@entry=2560, length=length@entry=2560, read=read@entry=0,   
>tag=0x82bd8175, numbufsp=numbufsp@entry=0xfe03527589bc,  
> dbpp=0xfe03527589c0, flags=0)
>   at /usr/main-src/sys/contrib/openzfs/module/zfs/dmu.c:543
> #15 0x829fc6a1 in dmu_buf_hold_array (os=,  
> object=, read=0, numbufsp=0xfe03527589bc,  
> dbpp=0xfe03527589c0, offset=, length=,  
> tag=)
>   at /usr/main-src/sys/contrib/openzfs/module/zfs/dmu.c:654
> #16 dmu_brt_clone (os=os@entry=0xf8010ae0e000, object=,
>   offset=offset@entry=2560, length=length@entry=2560,  
> tx=tx@entry=0xf81aaeb6e100, bps=bps@entry=0xfe0595931000, nbps=1, 
>  replay=0) at /usr/main-src/sys/contrib/openzfs/module/zfs/dmu.c:2301
> #17 0x82b4440a in 

main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

2023-09-07 Thread Mark Millard
I was requested to do a test with vfs.zfs.bclone_enabled=1 and
the bulk -a build panicked (having stored 128 *.pkg files in
.building/ first):

# more /var/crash/core.txt.3
. . .
Unread portion of the kernel message buffer:
panic: Solaris(panic): zfs: accessing past end of object 422/1108c16 (size=2560 access=2560+2560)
cpuid = 15
time = 1694103674
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0352758590
vpanic() at vpanic+0x132/frame 0xfe03527586c0
panic() at panic+0x43/frame 0xfe0352758720
vcmn_err() at vcmn_err+0xeb/frame 0xfe0352758850
zfs_panic_recover() at zfs_panic_recover+0x59/frame 0xfe03527588b0
dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0x97/frame 
0xfe0352758960
dmu_brt_clone() at dmu_brt_clone+0x61/frame 0xfe03527589f0
zfs_clone_range() at zfs_clone_range+0xa6a/frame 0xfe0352758bc0
zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x1ae/frame 
0xfe0352758c40
vn_copy_file_range() at vn_copy_file_range+0x11e/frame 0xfe0352758ce0
kern_copy_file_range() at kern_copy_file_range+0x338/frame 0xfe0352758db0
sys_copy_file_range() at sys_copy_file_range+0x78/frame 0xfe0352758e00
amd64_syscall() at amd64_syscall+0x109/frame 0xfe0352758f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0352758f30
--- syscall (569, FreeBSD ELF64, copy_file_range), rip = 0x1ce4506d155a, rsp = 
0x1ce44ec71e88, rbp = 0x1ce44ec72320 ---
KDB: enter: panic

__curthread () at /usr/main-src/sys/amd64/include/pcpu_aux.h:57
57  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct 
pcpu,
(kgdb) #0  __curthread () at /usr/main-src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=0)
   at /usr/main-src/sys/kern/kern_shutdown.c:405
#2  0x804a442a in db_dump (dummy=,  
dummy2=, dummy3=, dummy4=)
   at /usr/main-src/sys/ddb/db_command.c:591
#3  0x804a422d in db_command (last_cmdp=,  
cmd_table=, dopager=true)
   at /usr/main-src/sys/ddb/db_command.c:504
#4  0x804a3eed in db_command_loop ()
   at /usr/main-src/sys/ddb/db_command.c:551
#5  0x804a7876 in db_trap (type=, code=)
   at /usr/main-src/sys/ddb/db_main.c:268
#6  0x80bb9e57 in kdb_trap (type=type@entry=3, code=code@entry=0,  
tf=tf@entry=0xfe03527584d0) at /usr/main-src/sys/kern/subr_kdb.c:790
#7  0x8104ad3d in trap (frame=0xfe03527584d0)
   at /usr/main-src/sys/amd64/amd64/trap.c:608
#8  
#9  kdb_enter (why=, msg=)
   at /usr/main-src/sys/kern/subr_kdb.c:556
#10 0x80b6aab3 in vpanic (fmt=0x82be52d6 "%s%s",  
ap=ap@entry=0xfe0352758700)
   at /usr/main-src/sys/kern/kern_shutdown.c:958
#11 0x80b6a943 in panic (
   fmt=0x820aa2e8  "\312C$\201\377\377\377\377")
   at /usr/main-src/sys/kern/kern_shutdown.c:894
#12 0x82993c5b in vcmn_err (ce=,  
fmt=0x82bfdd1f "zfs: accessing past end of object %llx/%llx (size=%u 
access=%llu+%llu)", adx=0xfe0352758890)
   at /usr/main-src/sys/contrib/openzfs/module/os/freebsd/spl/spl_cmn_err.c:60
#13 0x82a84d69 in zfs_panic_recover (
   fmt=0x12 )
   at /usr/main-src/sys/contrib/openzfs/module/zfs/spa_misc.c:1594
#14 0x829f8e27 in dmu_buf_hold_array_by_dnode (dn=0xf813dfc48978,   
   offset=offset@entry=2560, length=length@entry=2560, read=read@entry=0,  
tag=0x82bd8175, numbufsp=numbufsp@entry=0xfe03527589bc,  
dbpp=0xfe03527589c0, flags=0)
   at /usr/main-src/sys/contrib/openzfs/module/zfs/dmu.c:543
#15 0x829fc6a1 in dmu_buf_hold_array (os=,  
object=, read=0, numbufsp=0xfe03527589bc,  
dbpp=0xfe03527589c0, offset=, length=,  
tag=)
   at /usr/main-src/sys/contrib/openzfs/module/zfs/dmu.c:654
#16 dmu_brt_clone (os=os@entry=0xf8010ae0e000, object=,  
offset=offset@entry=2560, length=length@entry=2560,  
tx=tx@entry=0xf81aaeb6e100, bps=bps@entry=0xfe0595931000, nbps=1,  
replay=0) at /usr/main-src/sys/contrib/openzfs/module/zfs/dmu.c:2301
#17 0x82b4440a in zfs_clone_range (inzp=0xf8100054c910,  
inoffp=0xf81910c3c7c8, outzp=0xf80fb3233000,  
outoffp=0xf819860a2c78, lenp=lenp@entry=0xfe0352758c00,  
cr=0xf80e32335200)
   at /usr/main-src/sys/contrib/openzfs/module/zfs/zfs_vnops.c:1302
#18 0x829b4ece in zfs_freebsd_copy_file_range (ap=0xfe0352758c58)
   at 
/usr/main-src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:6294
#19 0x80c7160e in VOP_COPY_FILE_RANGE (invp=,  
inoffp=0x40, outvp=0xfe03527581d0, outoffp=0x811e6eb7,  
lenp=0x0, flags=0, incred=0xf80e32335200, outcred=0x0,  
fsizetd=0xfe03586c0720) at ./vnode_if.h:2381
#20 vn_copy_file_range (invp=invp@entry=0xf8095e1e8000, inoffp=0x40,  
inoffp@entry=0xf81910c3c7c8, outvp=0xfe03527581d0,  
outvp@entry=0xf805d6107380, outoffp=0x811e6eb7,  

Re: 100% CPU time for sysctl command, not killable

2023-09-07 Thread Alexander Leidinger

Am 2023-09-03 21:22, schrieb Alexander Leidinger:

Am 2023-09-02 16:56, schrieb Mateusz Guzik:

On 8/20/23, Alexander Leidinger  wrote:

Hi,

sysctl kern.maxvnodes=1048576000 results in 100% CPU and a
non-killable sysctl program. This is somewhat unexpected...



fixed here: https://cgit.freebsd.org/src/commit/?id=32988c1499f8698b41e15ed40a46d271e757bba3


I confirm.


There may be dragons...:
kern.maxvnodes: 1048576000
vfs.wantfreevnodes: 262144000
vfs.freevnodes: 0  <---
vfs.vnodes_created: 11832359
vfs.numvnodes: 146699
vfs.recycles_free: 4700765
vfs.recycles: 0
vfs.vnode_alloc_sleeps: 0

Another time I got an insanely huge amount of free vnodes (more than maxvnodes).
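
For anyone watching for the bogus counts, a loop over the same sysctls
should be enough (a sketch):

# while sleep 1; do sysctl vfs.freevnodes vfs.numvnodes vfs.recycles_free; done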


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net : PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org  netch...@freebsd.org    : PGP 0x8F31830F9F2772BF




Re: kernel 100% CPU, and ports-mgmt/poudriere-devel 'Inspecting ports tree for modifications to git checkout...' for an extraordinarily long time

2023-09-07 Thread Graham Perrin

On 03/09/2023 10:26, Michael Gmelin wrote:


On Sat, 2 Sep 2023 09:53:38 +0100
Graham Perrin  wrote:


Some inspections are extraordinarily time-consuming. Others complete
very quickly, as they should.

One recent inspection took more than half an hour.

Anyone else?


Does `git clone https://git.freebsd.org/ports.git` work for you?
(currently it's not working from where I am). Maybe related.

Best
Michael


Today, yes.

Sorry for not replying sooner, Gmail sent your 2nd September email to spam.

% pwd
/tmp
% time git clone https://git.freebsd.org/ports.git && rm -r ports
Cloning into 'ports'...
remote: Enumerating objects: 5943170, done.
remote: Counting objects: 100% (943/943), done.
remote: Compressing objects: 100% (127/127), done.
remote: Total 5943170 (delta 923), reused 816 (delta 816), pack-reused 5942227

Receiving objects: 100% (5943170/5943170), 1.11 GiB | 6.29 MiB/s, done.
Resolving deltas: 100% (3586216/3586216), done.
Updating files: 100% (156931/156931), done.
941.630u 59.980s 10:11.66 163.7%    +442k 14+0io 58pf+16w
override r--r--r-- grahamperrin/wheel for ports/.git/objects/pack/pack-d72c55d78249720bb87ae380c69ecb3c6dc5fe94.idx? ^C

% sudo rm -r /tmp/ports
grahamperrin's password:
%
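
For a pure timing comparison, a shallow clone would presumably be much
lighter when full history is not needed:

% git clone --depth 1 https://git.freebsd.org/ports.git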




sysutils/panicmail and boot environments

2023-09-07 Thread Graham Perrin



For the penultimate panic: panicmail was received (in my Gmail Inbox).

For the most recent panic: no panicmail in the same Inbox. Might this
be because the kernel panicked in a different boot environment? If so,
will a temporary boot of that environment be enough for delivery of the
mail?



From :


Before the panic: with n265135-07bc20e4740d-b running, I created then 
mounted n265135-07bc20e4740d-c, upgraded its packages, unmounted then 
activated the environment, then restarted the OS.
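
Presumably that was a bectl(8) sequence along these lines (the mount
point and the pkg chroot flag here are my assumptions, not a record of
the exact commands):

# bectl create n265135-07bc20e4740d-c
# bectl mount n265135-07bc20e4740d-c /mnt
# pkg -c /mnt upgrade
# bectl umount n265135-07bc20e4740d-c
# bectl activate n265135-07bc20e4740d-c
# shutdown -r now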


If I recall correctly: the panic occurred very close to the end of the 
shutdown routine.