truss -f timeout 2 sleep 10 causes breakage

2024-03-27 Thread Mateusz Guzik
Top of main, but I reproduced it on stable/14-e64d827d3 as well.

Mere "timeout 2 sleep 10" correctly times out.

Running "truss -f timeout 2 sleep 10" prevents timeout from killing
sleep and the entire thing refuses to exit, truss has to be killed off
with SIGKILL.

Here is the best part: after doing the above, going back to mere
"timeout 2 sleep 10" (without truss!) no longer works -- timeout gets
stuck in the kernel with the following backtrace:

mi_switch
sleepq_catch_signals
sleepq_wait_sig
_sx_xlock_hard
stop_all_proc_block
kern_procctl
sys_procctl
amd64_syscall
fast_syscall_common

It does react to -9 though.
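
For context, timeout(1) uses the process-reaper facility of procctl(2),
and the backtrace above ends inside a procctl call. A minimal,
illustrative reaper that signals its descendants looks roughly like the
sketch below; this is an assumption drawn from the backtrace and the
documented procctl(2) interface, not a quote of timeout's source.

#include <sys/types.h>
#include <sys/procctl.h>
#include <sys/wait.h>

#include <err.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	struct procctl_reaper_kill rk = { .rk_sig = SIGTERM };
	pid_t pid;

	/* Become a reaper so all descendants get re-parented to us. */
	if (procctl(P_PID, getpid(), PROC_REAP_ACQUIRE, NULL) == -1)
		err(1, "PROC_REAP_ACQUIRE");

	pid = fork();
	if (pid == -1)
		err(1, "fork");
	if (pid == 0) {
		execl("/bin/sleep", "sleep", "10", (char *)NULL);
		_exit(127);
	}

	/* Pretend the timeout expired: signal every descendant. */
	sleep(2);
	if (procctl(P_PID, getpid(), PROC_REAP_KILL, &rk) == -1)
		err(1, "PROC_REAP_KILL");
	printf("signalled %u descendant(s)\n", rk.rk_killed);

	while (waitpid(-1, NULL, 0) > 0)
		;
	return (0);
}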

-- 
Mateusz Guzik 



Re: libc/libsys split coming soon

2024-02-03 Thread Mateusz Guzik
On 2/3/24, David Chisnall  wrote:
> On 3 Feb 2024, at 09:15, Mateusz Guzik  wrote:
>>
>> Binary startup is very slow, for example execve of a hello world
>> binary in a Linux-based chroot on FreeBSD is faster by a factor of 2
>> compared to a native one. As such perf-wise this looks like a step in
>> the wrong direction.
>
> Have you profiled this?  Is the Linux version using BIND_NOW (which comes
> with a load of problems, but is often the default for Linux systems and
> reduces the number of slow-path entries into rtld)?  Do they trigger the
> same number of CoW faults?  Is there a path in rtld that’s slower than the
> equivalent ld-linux.so path?
>

I only profiled FreeBSD, and that was 4 years ago. I have neither time
nor interest in working on this.

Relevant excerpts from profiling an fexecve loop:

Sampling which syscall was being executed while in kernel mode
(or handling a trap):

syscalls:
   pread  108
   fstat  162
   issetugid  250
 sigprocmask  303
read  310
mprotect  341
open  380
   close 1547
mmap 2787
trap 5421
[snip]
In userspace most of the time is spent here:
  ld-elf.so.1`memset  406
  ld-elf.so.1`matched_symbol  431
  ld-elf.so.1`strcmp 1078
   ld-elf.so.1`reloc_non_plt 1102
 ld-elf.so.1`symlook_obj 1102
 ld-elf.so.1`find_symdef 1439

find_symdef iterates a linked list, which I presume induces strcmp calls
due to unwanted entries.
[snip]

Full profile
 user:
  libc.so.7`__je_extent_heap_new   71
  libc.so.7`__vdso_clock_gettime   73
libc.so.7`memset   75
   ld-elf.so.1`_rtld   83
  ld-elf.so.1`getenv   85
libc.so.7`__je_malloc_mutex_boot  132
   ld-elf.so.1`reloc_plt  148
ld-elf.so.1`__crt_malloc  163
 ld-elf.so.1`symlook_default  166
 ld-elf.so.1`digest_dynamic1  184
libc.so.7`__je_malloc_mutex_init  205
  ld-elf.so.1`symlook_global  281
  ld-elf.so.1`memset  406
  ld-elf.so.1`matched_symbol  431
  ld-elf.so.1`strcmp 1078
   ld-elf.so.1`reloc_non_plt 1102
 ld-elf.so.1`symlook_obj 1102
 ld-elf.so.1`find_symdef 1439
 kernel:
 kernel`vm_reserv_alloc_page   89
kernel`amd64_syscall   95
 kernel`0x80  102
   kernel`vm_page_alloc_domain_after  114
 kernel`vm_object_deallocate  117
kernel`vm_map_pmap_enter  122
kernel`pmap_enter_object  140
   kernel`uma_zalloc_arg  148
  kernel`vm_map_lookup_entry  148
 kernel`pmap_try_insert_pv_entry  156
   kernel`vm_fault_dirty  168
 kernel`pagecopy  177
 kernel`vm_fault  260
 kernel`get_pv_entry  265
kernel`pagezero_erms  367
  kernel`pmap_enter_quick_locked  380
   kernel`pmap_enter  432
 kernel`0x80 1126
 kernel`0x80 2017
 kernel`trap 2097
syscalls:
   pread  108
   fstat  162
   issetugid  250
 sigprocmask  303
read  310
mprotect  341
open  380
   close 1547
mmap 2787
trap 5421
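
For reference, the kind of harness being profiled -- a tight
fork-and-fexecve loop over a hello-world binary -- could look like the
sketch below. This is an illustrative reconstruction, not the original
test program; the "a.out" name only mirrors the assumption stated for
the dtrace script further down.

#include <sys/wait.h>

#include <err.h>
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
	/* fexecve(2) wants a descriptor opened for execution. */
	int fd = open("a.out", O_EXEC);

	if (fd == -1)
		err(1, "open");
	for (int i = 0; i < 100000; i++) {
		pid_t pid = fork();

		if (pid == -1)
			err(1, "fork");
		if (pid == 0) {
			char *argv[] = { "a.out", NULL };
			char *envp[] = { NULL };

			fexecve(fd, argv, envp);
			_exit(127);
		}
		if (waitpid(pid, NULL, 0) == -1)
			err(1, "waitpid");
	}
	return (0);
}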

Counting fexecve:
dtrace -n 'fbt::sys_fexecve:entry { @[stack()] = count(); } tick-30s { exit(0); }'

dtrace script, can be run as: dtrace -w -x aggsize=128M -s script.d
assumes binary name is a.out

syscall::fexecve:entry
{
self->inexec = 1;
}

syscall::fexecve:return
{
self->inexec = 0;
}

fbt::trap:entry
{
self->tra

Re: libc/libsys split coming soon

2024-02-03 Thread Mateusz Guzik
On 2/2/24, Brooks Davis  wrote:
> TL;DR: The implementation of system calls is moving to a separate
> library (libsys).  No changes are required to existing software (except
> to ensure that libsys is present when building custom disk images).
>
> Code: https://github.com/freebsd/freebsd-src/pull/908
>
> After nearly a decade of intermittent work, I'm about to land a series
> of patches which moves system calls, vdso support, and libc's parsing of
> the ELF auxiliary argument vector into a separate library (libsys).  I
> plan to do this early next week (February 5th).
>
> This change serves three primary purposes:
>   1. It's easier to completely replace system call implementations for
>  tracing or compartmentalization purposes.
>   2. It simplifies the implementation of restrictions on system calls such
>  as those implemented by OpenBSD's msyscall(2)
>  (https://man.openbsd.org/msyscall.2).
>   3. It allows language runtimes to link with libsys for system call
>  implementations without requiring libc.
>
> libsys is an auxiliary filter for libc.  This means that for any symbol
> defined by both, the libsys version takes precedence at runtime.  For
> system call implementations, libc contains empty stubs.  For others it
> contains copies of the functions (this could be further refined at a
> later date).  The statically linked libc contains the full
> implementations so linking libsys is not required.
>

Do I read it correctly that everything dynamically linked will also be
linked to libsys, as in executing such a binary will now require
loading one extra .so?

Binary startup is very slow, for example execve of a hello world
binary in a Linux-based chroot on FreeBSD is faster by a factor of 2
compared to a native one. As such perf-wise this looks like a step in
the wrong direction.

Is there a problem making it so that libc ends up unchanged, but all
the bits are available separately in libsys if one does not want libc?

-- 
Mateusz Guzik 



Re: crash zfs_clone_range()

2023-11-14 Thread Mateusz Guzik
On 11/14/23, Rick Macklem  wrote:
> On Tue, Nov 14, 2023 at 10:46 AM Alexander Motin  wrote:
>>
>> On 14.11.2023 12:44, Alexander Motin wrote:
>> > On 14.11.2023 12:39, Mateusz Guzik wrote:
>> >> One of the vnodes is probably not zfs, I suspect this will do it
>> >> (untested):
>> >>
>> >> diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
>> >> b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
>> >> index 107cd69c756c..e799a7091b8e 100644
>> >> --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
>> >> +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
>> >> @@ -6270,6 +6270,11 @@ zfs_freebsd_copy_file_range(struct
>> >> vop_copy_file_range_args *ap)
>> >>  goto bad_write_fallback;
>> >>  }
>> >>  }
>> >> +
>> >> +   if (invp->v_mount->mnt_vfc != outvp->v_mount->mnt_vfc) {
>> >> +   goto bad_write_fallback;
>> >> +   }
>> >> +
>> >>  if (invp == outvp) {
>> >>  if (vn_lock(outvp, LK_EXCLUSIVE) != 0) {
>> >>  goto bad_write_fallback;
>> >>
>> >
>> > vn_copy_file_range() verifies for that:
>> >
>> >  /*
>> >   * If the two vnodes are for the same file system type, call
>> >   * VOP_COPY_FILE_RANGE(), otherwise call
>> > vn_generic_copy_file_range()
>> >   * which can handle copies across multiple file system types.
>> >   */
>> >  *lenp = len;
>> >  if (inmp == outmp || strcmp(inmp->mnt_vfc->vfc_name,
>> >  outmp->mnt_vfc->vfc_name) == 0)
>> >  error = VOP_COPY_FILE_RANGE(invp, inoffp, outvp,
>> > outoffp,
>> >  lenp, flags, incred, outcred, fsize_td);
>> >  else
>> >  error = vn_generic_copy_file_range(invp, inoffp,
>> > outvp,
>> >  outoffp, lenp, flags, incred, outcred, fsize_td);
>>
>> Thinking again, what happens if there are two nullfs mounts on top of two
>> different file systems, one of which is indeed not ZFS?  Do we need to
>> add those checks to all ZFS, NFS and FUSE, implementing
>> VOP_COPY_FILE_RANGE, or is it the responsibility of nullfs or VFS?
> Although it would be nice to do the check before the VOP call, I don't
> see an easy way to do that.
>
> It looks like the simple solution is to add a check in each of the
> VOP_COPY_FILE_RANGE() calls, such as mjg@ has proposed
> for ZFS. At this point there are only the three and I can easily do the
> NFS one.
>

All filesystems except for zfs are already covered because they check
for mismatched mount.

-- 
Mateusz Guzik 



Re: crash zfs_clone_range()

2023-11-14 Thread Mateusz Guzik
On 11/14/23, Alexander Motin  wrote:
> On 14.11.2023 12:44, Alexander Motin wrote:
>> On 14.11.2023 12:39, Mateusz Guzik wrote:
>>> One of the vnodes is probably not zfs, I suspect this will do it
>>> (untested):
>>>
>>> diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
>>> b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
>>> index 107cd69c756c..e799a7091b8e 100644
>>> --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
>>> +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
>>> @@ -6270,6 +6270,11 @@ zfs_freebsd_copy_file_range(struct
>>> vop_copy_file_range_args *ap)
>>>  goto bad_write_fallback;
>>>  }
>>>  }
>>> +
>>> +   if (invp->v_mount->mnt_vfc != outvp->v_mount->mnt_vfc) {
>>> +   goto bad_write_fallback;
>>> +   }
>>> +
>>>  if (invp == outvp) {
>>>  if (vn_lock(outvp, LK_EXCLUSIVE) != 0) {
>>>  goto bad_write_fallback;
>>>
>>
>> vn_copy_file_range() verifies for that:
>>
>>  /*
>>   * If the two vnodes are for the same file system type, call
>>   * VOP_COPY_FILE_RANGE(), otherwise call
>> vn_generic_copy_file_range()
>>   * which can handle copies across multiple file system types.
>>   */
>>  *lenp = len;
>>  if (inmp == outmp || strcmp(inmp->mnt_vfc->vfc_name,
>>  outmp->mnt_vfc->vfc_name) == 0)
>>  error = VOP_COPY_FILE_RANGE(invp, inoffp, outvp,
>> outoffp,
>>  lenp, flags, incred, outcred, fsize_td);
>>  else
>>  error = vn_generic_copy_file_range(invp, inoffp, outvp,
>>  outoffp, lenp, flags, incred, outcred, fsize_td);
>
> Thinking again, what happens if there are two nullfs mounts on top of two
> different file systems, one of which is indeed not ZFS?  Do we need to
> add those checks to all ZFS, NFS and FUSE, implementing
> VOP_COPY_FILE_RANGE, or is it the responsibility of nullfs or VFS?
>

I already advocated for not trying to guess on behalf of filesystems
what they can or cannot handle internally.

That is to say, vn_copy_file_range should always call
VOP_COPY_FILE_RANGE; the filesystem can try to figure out what to do
and, if it comes up with nothing, punt to the fallback. This already
happens for some of the cases.

-- 
Mateusz Guzik 



Re: crash zfs_clone_range()

2023-11-14 Thread Mateusz Guzik
On 11/14/23, Alexander Motin  wrote:
> On 14.11.2023 12:39, Mateusz Guzik wrote:
>> One of the vnodes is probably not zfs, I suspect this will do it
>> (untested):
>>
>> diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
>> b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
>> index 107cd69c756c..e799a7091b8e 100644
>> --- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
>> +++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
>> @@ -6270,6 +6270,11 @@ zfs_freebsd_copy_file_range(struct
>> vop_copy_file_range_args *ap)
>>  goto bad_write_fallback;
>>  }
>>  }
>> +
>> +   if (invp->v_mount->mnt_vfc != outvp->v_mount->mnt_vfc) {
>> +   goto bad_write_fallback;
>> +   }
>> +
>>  if (invp == outvp) {
>>  if (vn_lock(outvp, LK_EXCLUSIVE) != 0) {
>>  goto bad_write_fallback;
>>
>
> vn_copy_file_range() verifies for that:
>
>  /*
>   * If the two vnodes are for the same file system type, call
>   * VOP_COPY_FILE_RANGE(), otherwise call
> vn_generic_copy_file_range()
>   * which can handle copies across multiple file system types.
>   */
>  *lenp = len;
>  if (inmp == outmp || strcmp(inmp->mnt_vfc->vfc_name,
>  outmp->mnt_vfc->vfc_name) == 0)
>  error = VOP_COPY_FILE_RANGE(invp, inoffp, outvp, outoffp,
>  lenp, flags, incred, outcred, fsize_td);
>  else
>  error = vn_generic_copy_file_range(invp, inoffp, outvp,
>          outoffp, lenp, flags, incred, outcred, fsize_td);
>
>

The crash at hand comes from nullfs. If the "outward" vnodes are both
nullfs, but only one of the underlying vnodes is ZFS, you get the above.
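
To make the scenario concrete, a minimal userspace trigger would be a
copy_file_range(2) call where both paths live on nullfs mounts whose
lower filesystems differ (one ZFS, one not). The sketch below is
illustrative; the mount points are made-up assumptions, not taken from
the report.

#include <sys/types.h>

#include <err.h>
#include <fcntl.h>
#include <unistd.h>

int
main(void)
{
	/* Assumed layout: /mnt/null-zfs is nullfs over ZFS,
	 * /mnt/null-ufs is nullfs over a non-ZFS filesystem. */
	int in = open("/mnt/null-zfs/input", O_RDONLY);
	int out = open("/mnt/null-ufs/output", O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (in == -1 || out == -1)
		err(1, "open");
	/* Both vnodes look like nullfs here, so the ZFS VOP can end up
	 * entered with a non-ZFS vnode on the other side. */
	if (copy_file_range(in, NULL, out, NULL, 1024 * 1024, 0) == -1)
		err(1, "copy_file_range");
	return (0);
}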

-- 
Mateusz Guzik 



Re: crash zfs_clone_range()

2023-11-14 Thread Mateusz Guzik
76380)
>> > >sp: 0xdeefe280
>> > >lr: 0x01623564 (zfs_clone_range + 0x6c)
>> > >   elr: 0x004e0d60 (rms_rlock + 0x1c)
>> > > spsr: 0xa045
>> > >   far: 0x0108
>> > >   esr: 0x9604
>> > > panic: data abort in critical section or under mutex
>> > > cpuid = 1
>> > > time = 1699610885
>> > > KDB: stack backtrace:
>> > > db_trace_self() at db_trace_self
>> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x38
>> > > vpanic() at vpanic+0x1a0
>> > > panic() at panic+0x48
>> > > data_abort() at data_abort+0x2fc
>> > > handle_el1h_sync() at handle_el1h_sync+0x18
>> > > --- exception, esr 0x9604
>> > > rms_rlock() at rms_rlock+0x1c
>> > > zfs_clone_range() at zfs_clone_range+0x68
>> > > zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
>> > > null_bypass() at null_bypass+0x118
>> > > vn_copy_file_range() at vn_copy_file_range+0x18c
>> > > kern_copy_file_range() at kern_copy_file_range+0x36c
>> > > sys_copy_file_range() at sys_copy_file_range+0x8c
>> > > do_el0_sync() at do_el0_sync+0x634
>> > > handle_el0_sync() at handle_el0_sync+0x48
>> > > --- exception, esr 0x5600
>> > > KDB: enter: panic
>> > > [ thread pid 3792 tid 100394 ]
>> > > Stopped at  kdb_enter+0x48: str xzr, [x19, #768]
>> > > db>
>> > >
>> > > I'll keep the debugger open for a while. Can I type something for
>> > > additional info?
>> > >
>> > > Regards,
>> > > Ronald.
>> >
>> > --
>> > Alexander Motin
>>
>>
>>
>
>
> Hi,
>
> Build a new kernel today.
> FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #20 main-051d69d6f8: Tue Nov
> 14 12:16:28 CET 2023
> ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG
> arm64
>
> x0: 0x08e0
>
>   x1: 0xa0006ce1fb38
>
>   x2: 0xa0006837a400
>
>   x3: 0xa0012c503a48
>
>   x4: 0xeb0ef430 (next_index + 0x815d790)
>
>   x5: 0xa00152636300
>
>   x6: 0xe2e025b5 ($d.5 + 0xc)
>
>   x7: 0x030a
>
>   x8: 0xeb3212c0 (next_index + 0x838f620)
>
>   x9: 0x0001
>
>  x10: 0x
>
>  x11: 0x0001
>
>  x12: 0x0002
>
>  x13: 0x
>
>  x14: 0x0001
>
>  x15: 0x
>  x16: 0x016e5b58 (__stop_set_modmetadata_set + 0x1328)
>  x17: 0x004e0c28 (rms_rlock + 0x0)
>  x18: 0xeb0ef250 (next_index + 0x815d5b0)
>  x19: 0x0800
>  x20: 0xeb0ef430 (next_index + 0x815d790)
>  x21: 0x7fff
>  x22: 0xa0006ce1fb38
>  x23: 0xa0006837a400
>  x24: 0xa001ee486000
>  x25: 0x08e0
>  x26: 0xa000135ca000
>  x27: 0x0800
>  x28: 0xa000135ca000
>  x29: 0xeb0ef250 (next_index + 0x815d5b0)
>   sp: 0xeb0ef250
>   lr: 0x0162bee8 (zfs_clone_range + 0x6c)
>  elr: 0x004e0c44 (rms_rlock + 0x1c)
> spsr: 0xa045
>  far: 0x0908
>  esr: 0x9604
> panic: data abort in critical section or under mutex
> cpuid = 2
> time = 1699966486
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x38
> vpanic() at vpanic+0x1a0
> panic() at panic+0x48
> data_abort() at data_abort+0x2fc
> handle_el1h_sync() at handle_el1h_sync+0x18
> --- exception, esr 0x9604
> rms_rlock() at rms_rlock+0x1c
> zfs_clone_range() at zfs_clone_range+0x68
> zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
> null_bypass() at null_bypass+0x118
> vn_copy_file_range() at vn_copy_file_range+0x1c0
> kern_copy_file_range() at kern_copy_file_range+0x36c
> sys_copy_file_range() at sys_copy_file_range+0x8c
> do_el0_sync() at do_el0_sync+0x634
> handle_el0_sync() at handle_el0_sync+0x48
> --- exception, esr 0x5600
> KDB: enter: panic
> [ thread pid 3620 tid 100911 ]
> Stopped at  kdb_enter+0x48: str xzr, [x19, #768]
> db>
>
> This happens as soon as I start poudriere in a jenkins-agent jail.
>
> AFAIK this includes the two recent vn_copy_file_range changes of
> Konstantin.
>
> Next I will install a GENERIC kernel instead of GENERIC-NODEBUG.
>

One of the vnodes is probably not zfs, I suspect this will do it (untested):

diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
index 107cd69c756c..e799a7091b8e 100644
--- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
+++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
@@ -6270,6 +6270,11 @@ zfs_freebsd_copy_file_range(struct
vop_copy_file_range_args *ap)
goto bad_write_fallback;
}
}
+
+   if (invp->v_mount->mnt_vfc != outvp->v_mount->mnt_vfc) {
+   goto bad_write_fallback;
+   }
+
if (invp == outvp) {
if (vn_lock(outvp, LK_EXCLUSIVE) != 0) {
goto bad_write_fallback;



-- 
Mateusz Guzik 



Re: kernel 100% CPU

2023-10-16 Thread Mateusz Guzik
On 10/14/23, Graham Perrin  wrote:
> On 03/09/2023 20:25, Mateusz Guzik wrote:
>
>> …
>>
>> Sorry mate, neglected to specify: collect sysctl -a once you run into
>> the problem.
>>
>> Once I look at that I'm probably going to ship some debug patches to
>> narrow it down.
>
>
> I had what might be the same issue this afternoon, for the first time in
> weeks. With the slowdown, it took a few minutes for me to find your
> email, unfortunately the symptoms subsided moments before finding it.
>
> So, I collected the information, but it's probably another missed
> opportunity (the information collected too late).
>
> The busy period lasted for around six minutes:
>
> root@mowa219-gjp4-8570p-freebsd:~ # poudriere bulk -j main -J 3 -Ctv
> devel/pkgconf
> [00:00:00] Creating the reference jail... done
> [00:00:01] Mounting system devices for main-default
> [00:00:02] Warning: Using packages from previously failed, or
> uncommitted, build:
> /usr/local/poudriere/data/packages/main-default/.building
> [00:00:02] Mounting ccache from: /usr/.ccache
> [00:00:02] Mounting ports from: /usr/local/poudriere/ports/default
> [00:00:02] Mounting packages from:
> /usr/local/poudriere/data/packages/main-default
> [00:00:02] Mounting distfiles from: /usr/ports/distfiles
> [00:00:02] Copying /var/db/ports from:
> /usr/local/etc/poudriere.d/main-options
> [00:00:02] Appending to make.conf: /usr/local/etc/poudriere.d/make.conf
> /etc/resolv.conf ->
> /usr/local/poudriere/data/.m/main-default/ref/etc/resolv.conf
> [00:00:02] Starting jail main-default
> [00:00:02] Will build as nobody:nobody (65534:65534)
> [00:00:05] Logs:
> /usr/local/poudriere/data/logs/bulk/main-default/2023-10-14_16h16m30s
> [00:00:05] Loading MOVED for
> /usr/local/poudriere/data/.m/main-default/ref/usr/ports
> [00:00:06] Ports supports: FLAVORS SELECTED_OPTIONS
> [00:00:06] Inspecting ports tree for modifications to git checkout... yes
> [00:06:39] Ports top-level git hash: e843b8293c (dirty)
> …
>

The sysctl output you pasted privately shows you are running a kernel from
October 9th, fee14577d590. This predates a slew of fixes to vnlru I
committed the day after.

They fix some of the problems (not all of them!), but it should be good enough.

tl;dr just update to newest main and it should be fine.

-- 
Mateusz Guzik 



Re: Continually count the number of open files

2023-09-14 Thread Mateusz Guzik
On 9/13/23, David Chisnall  wrote:
> On 12 Sep 2023, at 17:19, Bakul Shah  wrote:
>>
>> FreeBSD
>> should add inotify.
>
> inotify is also probably not the right thing.  If someone is interested in
> adding this, Apple’s fsevents API is a better inspiration.  It is carefully
> designed to ensure that the things monitoring for events can’t ever block
> filesystem operations from making progress.

I'm not sure what you mean here specifically and I don't see anything
careful about what they did.

From userspace POV the API is allowed to drop events, which makes life
easy on this front and is probably the right call.

The implementation is utterly horrid. For one, the non-blocking aspect
starts with the obvious equivalent of uma_zalloc(..., M_NOWAIT) and
bailing if it fails, except if you read past that to actual
registration it can perform an alloc which can block indefinitely
while holding on to some vnodes:
// if we haven't gotten the path yet, get it.
if (pathbuff == NULL) {
pathbuff = get_pathbuff();
pathbuff_len = MAXPATHLEN;

where get_pathbuff() is:
return zalloc(ZV_NAMEI);

So the notification routine can block indefinitely in a low-memory
condition. I tried to figure out if this is ever called without other
vnodes write-locked (as opposed to "just" refed), but their code is
such a mess that my level of curiosity was dwarfed by difficulty of
getting a definitive answer.

Other than that it is terribly inefficient and artificially limited to
8 processes which can do anything.

That is to say it is unfit for anything but laptop-scale usage.

Perhaps you meant it does not block if the watchers decide to not
process any events, but that's almost inherently true if one allows
for lossy notifications.

> I think there’s a nice design
> possible with a bloom filter in the kernel of events that ensures that
> monitors may get spurious events but don’t miss out on anything.
>
[snip]
>  I think the right kernel API would walk the directory and add the vnodes to
> a bloom filter and trigger a notification on a match in the filter.  You’d
> then have occasional spurious notifications but you’d have something that
> could be monitored via kqueue and could be made to not block anything else
> in the kernel.
>

I don't see how this can work.

A directory can have more inodes in it than you can have vnodes at any
given point. So if you add vnodes to a list as you go, some may have to
fall off of it so that you can continue adding other entries.

But perhaps you mean you could store the inode number as opposed to
holding on to the vnode? Even then, the number of entries to scan to make
this happen is so big that it is going to be impractical on anything but
laptop scale.

What can be fast is checking if the parent dir wants notifications,
but this ignores changes to hardlinks. Except *currently* the VFS
layer does not reliably track who the parent is (and in fact it can
fail to spot one).

The VFS layer contains a lot of cruft and design decisions which at
least today are questionable at best, but fixable. A big chunk of this
concerns name caching, which currently is entirely optional. Should
someone want to propose an API for file notification changes, they
need to state something which if implemented does not result in
unfixable drag on the layer, even if initial implementation would be
suboptimal. Handling arbitrary hardlinks looks like a drag to me, but
I'm happy to review an implementation which avoids being a problem.

That is to say, a laptop-scale API can probably be implemented as is,
but solution which can provide reliable events (not to be confused
with reliably notifying about all events) would require numerous
changes.
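
For comparison, the per-descriptor mechanism FreeBSD already ships is
kevent(2) with EVFILT_VNODE, which also shows why the current approach
is laptop-scale: every watched file or directory needs its own open
descriptor. A minimal watcher (illustrative only, not tied to any
proposal in this thread) looks roughly like this:

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

#include <err.h>
#include <fcntl.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
	struct kevent ev, got;
	int kq, fd;

	if (argc != 2)
		errx(1, "usage: watch <path>");
	kq = kqueue();
	if (kq == -1)
		err(1, "kqueue");
	/* One open descriptor per watched path -- the scaling problem. */
	fd = open(argv[1], O_RDONLY);
	if (fd == -1)
		err(1, "open");

	EV_SET(&ev, fd, EVFILT_VNODE, EV_ADD | EV_CLEAR,
	    NOTE_WRITE | NOTE_DELETE | NOTE_RENAME | NOTE_ATTRIB, 0, NULL);
	if (kevent(kq, &ev, 1, NULL, 0, NULL) == -1)
		err(1, "kevent register");

	for (;;) {
		if (kevent(kq, NULL, 0, &got, 1, NULL) == -1)
			err(1, "kevent wait");
		printf("event fflags 0x%x on %s\n", got.fflags, argv[1]);
	}
}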

-- 
Mateusz Guzik 



Re: Speed improvements in ZFS

2023-09-04 Thread Mateusz Guzik
On 9/4/23, Alexander Leidinger  wrote:
> On 2023-08-28 22:33, Alexander Leidinger wrote:
>> On 2023-08-22 18:59, Mateusz Guzik wrote:
>>> On 8/22/23, Alexander Leidinger  wrote:
>>>> On 2023-08-21 10:53, Konstantin Belousov wrote:
>>>>> On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:
>>>>>> On 2023-08-20 23:17, Konstantin Belousov wrote:
>>>>>> > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
>>>>>> > > On 8/20/23, Alexander Leidinger  wrote:
>>>>>> > > > On 2023-08-20 22:02, Mateusz Guzik wrote:
>>>>>> > > >> On 8/20/23, Alexander Leidinger 
>>>>>> > > >> wrote:
>>>>>> > > >>> On 2023-08-20 19:10, Mateusz Guzik wrote:
>>>>>> > > >>>> On 8/18/23, Alexander Leidinger 
>>>>>> > > >>>> wrote:
>>>>>> > > >>>
>>>>>> > > >>>>> I have a 51MB text file, compressed to about 1MB. Are you
>>>>>> > > >>>>> interested
>>>>>> > > >>>>> to
>>>>>> > > >>>>> get it?
>>>>>> > > >>>>>
>>>>>> > > >>>>
>>>>>> > > >>>> Your problem is not the vnode limit, but nullfs.
>>>>>> > > >>>>
>>>>>> > > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
>>>>>> > > >>>
>>>>>> > > >>> 122 nullfs mounts on this system. And every jail I setup has
>>>>>> > > >>> several
>>>>>> > > >>> null mounts. One basesystem mounted into every jail, and then
>>>>>> > > >>> shared
>>>>>> > > >>> ports (packages/distfiles/ccache) across all of them.
>>>>>> > > >>>
>>>>>> > > >>>> First, some of the contention is notorious VI_LOCK in order
>>>>>> > > >>>> to
>>>>>> > > >>>> do
>>>>>> > > >>>> anything.
>>>>>> > > >>>>
>>>>>> > > >>>> But more importantly the mind-boggling off-cpu time comes
>>>>>> > > >>>> from
>>>>>> > > >>>> exclusive locking which should not be there to begin with --
>>>>>> > > >>>> as
>>>>>> > > >>>> in
>>>>>> > > >>>> that xlock in stat should be a slock.
>>>>>> > > >>>>
>>>>>> > > >>>> Maybe I'm going to look into it later.
>>>>>> > > >>>
>>>>>> > > >>> That would be fantastic.
>>>>>> > > >>>
>>>>>> > > >>
>>>>>> > > >> I did a quick test, things are shared locked as expected.
>>>>>> > > >>
>>>>>> > > >> However, I found the following:
>>>>>> > > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
>>>>>> > > >> mp->mnt_kern_flag |=
>>>>>> > > >> lowerrootvp->v_mount->mnt_kern_flag &
>>>>>> > > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
>>>>>> > > >> MNTK_EXTENDED_SHARED);
>>>>>> > > >> }
>>>>>> > > >>
>>>>>> > > >> are you using the "nocache" option? it has a side effect of
>>>>>> > > >> xlocking
>>>>>> > > >
>>>>>> > > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
>>>>>> > > >
>>>>>> > >
>>>>>> > > If you don't have "nocache" on null mounts, then I don't see how
>>>>>> > > this
>>>>>> > > could happen.
>>>>>> >
>>>>>> > There is also MNTK_NULL_NOCACHE on lower fs, which is currently set
>>>>>> > for
>>>>>> > fuse and nfs at least.
>>>>>>
>>>>>> 11 of those 122 nullfs mounts are ZFS datasets which are also NFS
>>>>>> exported.
>>>>>> 6 of those nullfs mounts are also exported via Samba. The NFS
>>>>>> exports
>>>>>> shouldn't be needed anymore, I will remove them.
>>>>> By nfs I meant nfs client, not nfs exports.
>>>>
>>>> No NFS client mounts anywhere on this system. So where is this
>>>> exclusive
>>>> lock coming from then...
>>>> This is a ZFS system. 2 pools: one for the root, one for anything I
>>>> need
>>>> space for. Both pools reside on the same disks. The root pool is a
>>>> 3-way
>>>> mirror, the "space-pool" is a 5-disk raidz2. All jails are on the
>>>> space-pool. The jails are all basejail-style jails.
>>>>
>>>
>>> While I don't see why xlocking happens, you should be able to dtrace
>>> or printf your way into finding out.
>>
>> dtrace looks to me like a faster approach to get to the root than
>> printf... my first naive try is to detect exclusive locks. I'm not 100%
>> sure I got it right, but at least dtrace doesn't complain about it:
>> ---snip---
>> #pragma D option dynvarsize=32m
>>
>> fbt:nullfs:null_lock:entry
>> /args[0]->a_flags & 0x08 != 0/
>> {
>> stack();
>> }
>> ---snip---
>>
>> In which direction should I look with dtrace if this works in tonights
>> run of periodic? I don't have enough knowledge about VFS to come up
>> with some immediate ideas.
>
> After your sysctl fix for maxvnodes I increased the amount of vnodes 10
> times compared to the initial report. This has increased the speed of
> the operation, the find runs in all those jails finished today after ~5h
> (@~8am) instead of in the afternoon as before. Could this suggest that
> in parallel some null_reclaim() is running which does the exclusive
> locks and slows down the entire operation?
>

That may be a slowdown to some extent, but the primary problem is
exclusive vnode locking for stat lookup, which should not be
happening.

-- 
Mateusz Guzik 



Re: kernel 100% CPU

2023-09-03 Thread Mateusz Guzik
On 9/3/23, Graham Perrin  wrote:
> On 03/09/2023 17:55, Mateusz Guzik wrote:
>> On 9/3/23, Graham Perrin  wrote:
>>> On 02/09/2023 18:31, Mateusz Guzik wrote:
>>>> On 9/2/23, Graham Perrin wrote:
>>>>> … I began the trace /after/ the issue became observable.
>>>>> Will it be more meaningful to begin a trace and then reproduce the
>>>>> issue
>>>>> (before the trace ends)?
>>>>>
>>>>> …
>>>> Looks like you have a lot of unrelated traffic in there.
>>>>
>>>> …
>>> Instead, <https://mega.nz/folder/dQdgXK4K#Eb-uC02fT63eweQWWwD8TA> the
>>> two files from 09:21 this morning. Are these useful?
>>>
>>> Before this run of DTrace, I quit Firefox and other applications that
>>> might be causing noise (and the OS has been restarted since my last run
>>> of poudriere-bulk(8)).
>>>
>>> dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count();
>>> } tick-60s { exit(0); }' -o out.kern_stacks
>>>
>> Post your "sysctl -a" somewhere.
>>
> sysctl-a-2023-09-03-18-22.txt added to the MEGA folder is complete,
> including TSLOG-related lines.
>
> Alternatively, tslog under
> <https://bsd-hardware.info/?probe=d240fba8b7#Logs> is automatically
> pruned to exclude such lines. Hopefully not excessively pruned.
>
> TSLOG is one of three things in a Git stash that I apply before most
> builds, <https://reviews.freebsd.org/P601>.
>

Sorry mate, neglected to specify: collect sysctl -a once you run into
the problem.

Once I look at that I'm probably going to ship some debug patches to
narrow it down.

-- 
Mateusz Guzik 



Re: kernel 100% CPU

2023-09-03 Thread Mateusz Guzik
On 9/3/23, Graham Perrin  wrote:
> On 02/09/2023 18:31, Mateusz Guzik wrote:
>> On 9/2/23, Graham Perrin wrote:
>>> … I began the trace /after/ the issue became observable.
>>> Will it be more meaningful to begin a trace and then reproduce the issue
>>> (before the trace ends)?
>>>
>>> …
>> Looks like you have a lot of unrelated traffic in there.
>>
>> …
>
> Instead, <https://mega.nz/folder/dQdgXK4K#Eb-uC02fT63eweQWWwD8TA> the
> two files from 09:21 this morning. Are these useful?
>
> Before this run of DTrace, I quit Firefox and other applications that
> might be causing noise (and the OS has been restarted since my last run
> of poudriere-bulk(8)).
>
> dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count();
> } tick-60s { exit(0); }' -o out.kern_stacks
>

Post your "sysctl -a" somewhere.


-- 
Mateusz Guzik 



Re: kernel 100% CPU

2023-09-03 Thread Mateusz Guzik
On 9/3/23, Graham Perrin  wrote:
> On 03/09/2023 09:01, Juraj Lutter wrote:
>> … The script mjg@ provided is not a shell script.
>>
>> The script filename is “script.d” where you should put the
>> above-mentioned DTrace script (without the "dtrace -s script.d -o out”
>> line).
>
>
> Thanks, I guess that I'm still doing something wrong:
>
>
> root@mowa219-gjp4-8570p-freebsd:/home/grahamperrin/Documents/IT/BSD/FreeBSD/kernel-cpu
>
> # time dtrace -s script.d -o /tmp/out
> dtrace: script 'script.d' matched 4 probes
> ^C0.246u 4.049s 27:25.70 0.2%   14+91k 261+0io 274pf+0w
> root@mowa219-gjp4-8570p-freebsd:/home/grahamperrin/Documents/IT/BSD/FreeBSD/kernel-cpu
>
> # cat /tmp/out
>
> CPU IDFUNCTION:NAME
>3  2 :END
>
> root@mowa219-gjp4-8570p-freebsd:/home/grahamperrin/Documents/IT/BSD/FreeBSD/kernel-cpu
>

The script is intended to run when you have git executing for a long time.

-- 
Mateusz Guzik 



Re: kernel 100% CPU, and ports-mgmt/poudriere-devel 'Inspecting ports tree for modifications to git checkout...' for an extraordinarily long time

2023-09-02 Thread Mateusz Guzik
On 9/2/23, Graham Perrin  wrote:
> On 02/09/2023 10:17, Mateusz Guzik wrote:
>> On 9/2/23, Graham Perrin  wrote:
>>> Some inspections are extraordinarily time-consuming. Others complete
>>> very quickly, as they should.
>>>
>>> One recent inspection took more than half an hour.
>>>
>>> Anyone else?
>>>
>>> A screenshot: <https://i.imgur.com/SK9qvfw.png>
>>>
>>> % pkg iinfo poudriere-devel
>>> poudriere-devel-3.3.99.20220831
>>> % uname -aKU
>>> FreeBSD mowa219-gjp4-8570p-freebsd 15.0-CURRENT FreeBSD 15.0-CURRENT
>>> amd64 150 #10 main-n265053-315ee00fa961-dirty: Mon Aug 28 06:22:31
>>> BST 2023
>>> grahamperrin@mowa219-gjp4-8570p-freebsd:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG
>>>
>>> amd64 150 150
>>> %
>> get a flamegraph with dtrace
>>
>> https://github.com/brendangregg/FlameGraph
>
> Thanks! TIL, it's ported.
>
> Graph attached, as a PNG, although I don't know whether it'll be useful,
> because I began the trace /after/ the issue became observable.
>
> Will it be more meaningful to begin a trace and then reproduce the issue
> (before the trace ends)?
>
> 
>
> root@mowa219-gjp4-8570p-freebsd:/tmp # dtrace -x stackframes=100 -n
> 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' -o
> out.kern_stacks
> dtrace: description 'profile-997 ' matched 2 probes
> root@mowa219-gjp4-8570p-freebsd:/tmp # stackcollapse.pl out.kern_stacks
>  > out.kern_folded
> root@mowa219-gjp4-8570p-freebsd:/tmp # flamegraph.pl out.kern_folded >
> kernel.svg
> root@mowa219-gjp4-8570p-freebsd:/tmp #
>

Looks like you have a lot of unrelated traffic in there.

Run this script:
#pragma D option dynvarsize=32m

profile:::profile-997
/execname == "find"/
{
@oncpu[stack(), "oncpu"] = count();
}

/*
 * The p_flag & 0x4 test filters out kernel threads.
 */

sched:::off-cpu
/execname == "find"/
{
self->ts = timestamp;
}

sched:::on-cpu
/self->ts/
{
@offcpu[stack(30), "offcpu"] = sum(timestamp - self->ts);
self->ts = 0;
}

dtrace:::END
{
normalize(@offcpu, 100);
printa("%k\n%s\n%@d\n\n", @offcpu);
printa("%k\n%s\n%@d\n\n", @oncpu);
}

dtrace -s script.d -o out

this can be fed to generate a flamegraph the same way; upload it to freefall

-- 
Mateusz Guzik 



Re: 100% CPU time for sysctl command, not killable

2023-09-02 Thread Mateusz Guzik
On 8/20/23, Alexander Leidinger  wrote:
> Hi,
>
> sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable
> sysctl program. This is somewhat unexpected...
>

fixed here 
https://cgit.freebsd.org/src/commit/?id=32988c1499f8698b41e15ed40a46d271e757bba3

employing basic profiling tooling immediately shows the issue, in this
case for example:
pmcstat -S inst_retired.any_p -TI

> Bye,
> Alexander.
>
> --
> http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
> http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
>
>


-- 
Mateusz Guzik 



Re: kernel 100% CPU, and ports-mgmt/poudriere-devel 'Inspecting ports tree for modifications to git checkout...' for an extraordinarily long time

2023-09-02 Thread Mateusz Guzik
On 9/2/23, Graham Perrin  wrote:
> Some inspections are extraordinarily time-consuming. Others complete
> very quickly, as they should.
>
> One recent inspection took more than half an hour.
>
> Anyone else?
>
> A screenshot: <https://i.imgur.com/SK9qvfw.png>
>
> % pkg iinfo poudriere-devel
> poudriere-devel-3.3.99.20220831
> % uname -aKU
> FreeBSD mowa219-gjp4-8570p-freebsd 15.0-CURRENT FreeBSD 15.0-CURRENT
> amd64 150 #10 main-n265053-315ee00fa961-dirty: Mon Aug 28 06:22:31
> BST 2023
> grahamperrin@mowa219-gjp4-8570p-freebsd:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG
>
> amd64 150 150
> %
>
>
>

get a flamegraph with dtrace

https://github.com/brendangregg/FlameGraph

-- 
Mateusz Guzik 



Re: Panic with 15.0-CURRENT

2023-08-25 Thread Mateusz Guzik
On 8/25/23, Mateusz Guzik  wrote:
> On 8/25/23, Yasuhiro Kimura  wrote:
>> Hello,
>>
>> I made regular update of my amd64 system from main-n264870-e5e6a865358
>> to main-n265022-1554ba03b65 and system crashed with panic while
>> building packages with poudriere.
>>
>> Screenshot of console:
>> https://people.freebsd.org/~yasu/FreeBSD-15-CURRENT-amd64-main-n265022-1554ba03b65.20230825.panic.png
>>
>
> this is fallout from the recent timerfd commit. I'll fix it in a few.
>

fixed in 
https://cgit.freebsd.org/src/commit/?id=02f534b57f84d6f4f97c337b05b383c8b3aaf18c

-- 
Mateusz Guzik 



Re: Panic with 15.0-CURRENT

2023-08-25 Thread Mateusz Guzik
On 8/25/23, Yasuhiro Kimura  wrote:
> Hello,
>
> I made regular update of my amd64 system from main-n264870-e5e6a865358
> to main-n265022-1554ba03b65 and system crashed with panic while
> building packages with poudriere.
>
> Screenshot of console:
> https://people.freebsd.org/~yasu/FreeBSD-15-CURRENT-amd64-main-n265022-1554ba03b65.20230825.panic.png
>

this is fallout from the recent timerfd commit. I'll fix it in a few.


-- 
Mateusz Guzik 



Re: poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder

2023-08-23 Thread Mateusz Guzik
On 8/24/23, Mark Millard  wrote:
> On Aug 23, 2023, at 15:10, Mateusz Guzik  wrote:
>
>> On 8/23/23, Mark Millard  wrote:
>>> [Forked off the ZFS deadlock 14 discussion, per feedback.]
>>>
>>> On Aug 23, 2023, at 11:40, Alexander Motin  wrote:
>>>
>>>> On 22.08.2023 14:24, Mark Millard wrote:
>>>>> Alexander Motin  wrote on
>>>>> Date: Tue, 22 Aug 2023 16:18:12 UTC :
>>>>>> I am waiting for final test results from George Wilson and then will
>>>>>> request quick merge of both to zfs-2.2-release branch. Unfortunately
>>>>>> there are still not many reviewers for the PR, since the code is not
>>>>>> trivial, but at least with the test reports Brian Behlendorf and Mark
>>>>>> Maybee seem to be OK to merge the two PRs into 2.2. If somebody else
>>>>>> have tested and/or reviewed the PR, you may comment on it.
>>>>> I had written to the list that when I tried to test the system
>>>>> doing poudriere builds (initially with your patches) using
>>>>> USE_TMPFS=no so that zfs had to deal with all the file I/O, I
>>>>> instead got only one builder that ended up active, the others
>>>>> never reaching "Builder started":
>>>>
>>>>> Top was showing lots of "vlruwk" for the cpdup's. For example:
>>>>> . . .
>>>>> 362 0 root 400  27076Ki   13776Ki CPU19   19   4:23
>>>>> 0.00% cpdup -i0 -o ref 32
>>>>> 349 0 root 530  27076Ki   13776Ki vlruwk  22   4:20
>>>>> 0.01% cpdup -i0 -o ref 31
>>>>> 328 0 root 680  27076Ki   13804Ki vlruwk   8   4:30
>>>>> 0.01% cpdup -i0 -o ref 30
>>>>> 304 0 root 370  27076Ki   13792Ki vlruwk   6   4:18
>>>>> 0.01% cpdup -i0 -o ref 29
>>>>> 282 0 root 420  33220Ki   13956Ki vlruwk   8   4:33
>>>>> 0.01% cpdup -i0 -o ref 28
>>>>> 242 0 root 560  27076Ki   13796Ki vlruwk   4   4:28
>>>>> 0.00% cpdup -i0 -o ref 27
>>>>> . . .
>>>>> But those processes did show CPU?? on occasion, as well as
>>>>> *vnode less often. None of the cpdup's was stuck in
>>>>> Removing your patches did not change the behavior.
>>>>
>>>> Mark, to me "vlruwk" looks like a limit on number of vnodes.  I was not
>>>> deep in that area at least recently, so somebody with more experience
>>>> there could try to diagnose it.  At very least it does not look related
>>>> to
>>>> the ZIL issue discussed in this thread, at least with the information
>>>> provided, so I am not surprised that the mentioned patches do not
>>>> affect
>>>> it.
>>>
>>> I did the above intending to test the deadlock in my context but
>>> ended up not getting that far when I tried to make zfs handle all
>>> the file I/O (USE_TMPFS=no and no other use of tmpfs or the like).
>>>
>>> The zfs context is a simple single partition on the boot media. I
>>> use ZFS for bectl BE use, not for other typical reasons. The media
>>> here is PCIe Optane 1.4T media. The machine is a ThreadRipper
>>> 1950X, so first generation. 128 GiBytes of RAM. 491520 MiBytes of
>>> swap, also on that Optane.
>>>
>>> # uname -apKU
>>> FreeBSD amd64-ZFS 14.0-ALPHA2 FreeBSD 14.0-ALPHA2 amd64 1400096 #112
>>> main-n264912-b1d3e2b77155-dirty: Sun Aug 20 10:01:48 PDT 2023
>>> root@amd64-ZFS:/usr/obj/BUILDs/main-amd64-nodbg-clang/usr/main-src/amd64.amd64/sys/GENERIC-NODBG
>>> amd64 amd64 1400096 1400096
>>>
>>> The GENERIC-DBG variant of the kernel did not report any issues in
>>> earlier testing.
>>>
>>> The alter referenced /usr/obj/DESTDIRs/main-amd64-poud-bulk_a was
>>> installed from the same build.
>>>
>>> # zfs list
>>> NAME  USED  AVAIL  REFER
>>> MOUNTPOINT
>>> zoptb79.9G   765G96K  /zoptb
>>> zoptb/BUILDs 20.5G   765G  8.29M
>>> /usr/obj/BUILDs
>>> zoptb/BUILDs/alt-main-amd64-dbg-clang-alt1.86M   765G  1.86M
>>> /usr/obj/BUILDs/alt-main-amd64-dbg-clang-alt
>>> zoptb/BUILDs/alt-main-amd64-nodbg-clang-alt  30.2M   765G  30.2M
>>> /usr/obj/BUILDs/alt-main-amd64-nodbg-clang-alt
>>> zoptb/B

Re: poudriere bulk with ZFS and USE_TMPFS=no on main [14-ALPHA2 based]: extensive vlruwk for cpdup's on new builders after pkg builds in first builder

2023-08-23 Thread Mateusz Guzik
>  51 0x83b14000 3220 intpm.ko
>  61 0x83b18000 2178 smbus.ko
>  71 0x83b1b000 2240 cpuctl.ko
>  81 0x83b1e000 3360 uhid.ko
>  91 0x83b22000 4364 ums.ko
> 101 0x83b27000 33c0 usbhid.ko
> 111 0x83b2b000 3380 hidbus.ko
> 121 0x83b2f000 4d20 ng_ubt.ko
> 136 0x83b34000 abb8 netgraph.ko
> 142 0x83b3f000 a250 ng_hci.ko
> 154 0x83b4a000 2670 ng_bluetooth.ko
> 161 0x83b4d000 83a0 uftdi.ko
> 17    1 0xffff83b56000 4e58 ucom.ko
> 181 0x83b5b000 3360 wmt.ko
> 191 0x83b5f000 e268 ng_l2cap.ko
> 201 0x83b6e0001bf68 ng_btsocket.ko
> 211 0x83b8a000 38f8 ng_socket.ko
> 221 0x83b8e000 3250 filemon.ko
> 231 0x83b92000 4758 nullfs.ko
> 241 0x83b97000 73c0 linprocfs.ko
> 253 0x83b9f000 be70 linux_common.ko
> 261 0x83bab000 3558 fdescfs.ko
> 271 0x83baf00031b20 linux.ko
> 281 0x83be10002ed40 linux64.ko
>
>
> Note that before the "Cleaning up" notice, the vfs.freevnodes
> shows as being around (for example) 2210796. But after
> "Exiting with status": 61362. vfs.vnodes_created has a
> similar staging of in the ball park of up to 1343 but
> then the change to: 20135479. Similarly, vfs.numvnodes
> 2216564 -> 59860.
>
>
>
> Anything else I should gather and report as basic information?
>

This is a known problem, but it is unclear if you should be running
into it in this setup.

Can you try again but this time *revert*
138a5dafba312ff39ce0eefdbe34de95519e600d, like so:
git revert 138a5dafba312ff39ce0eefdbe34de95519e600d

may want to switch to a different branch first, for example: git
checkout -b vfstesting

-- 
Mateusz Guzik 



Re: Speed improvements in ZFS

2023-08-22 Thread Mateusz Guzik
On 8/22/23, Alexander Leidinger  wrote:
> On 2023-08-21 10:53, Konstantin Belousov wrote:
>> On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:
>>> On 2023-08-20 23:17, Konstantin Belousov wrote:
>>> > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
>>> > > On 8/20/23, Alexander Leidinger  wrote:
>>> > > > On 2023-08-20 22:02, Mateusz Guzik wrote:
>>> > > >> On 8/20/23, Alexander Leidinger  wrote:
>>> > > >>> On 2023-08-20 19:10, Mateusz Guzik wrote:
>>> > > >>>> On 8/18/23, Alexander Leidinger 
>>> > > >>>> wrote:
>>> > > >>>
>>> > > >>>>> I have a 51MB text file, compressed to about 1MB. Are you
>>> > > >>>>> interested
>>> > > >>>>> to
>>> > > >>>>> get it?
>>> > > >>>>>
>>> > > >>>>
>>> > > >>>> Your problem is not the vnode limit, but nullfs.
>>> > > >>>>
>>> > > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
>>> > > >>>
>>> > > >>> 122 nullfs mounts on this system. And every jail I setup has
>>> > > >>> several
>>> > > >>> null mounts. One basesystem mounted into every jail, and then
>>> > > >>> shared
>>> > > >>> ports (packages/distfiles/ccache) across all of them.
>>> > > >>>
>>> > > >>>> First, some of the contention is notorious VI_LOCK in order to
>>> > > >>>> do
>>> > > >>>> anything.
>>> > > >>>>
>>> > > >>>> But more importantly the mind-boggling off-cpu time comes from
>>> > > >>>> exclusive locking which should not be there to begin with -- as
>>> > > >>>> in
>>> > > >>>> that xlock in stat should be a slock.
>>> > > >>>>
>>> > > >>>> Maybe I'm going to look into it later.
>>> > > >>>
>>> > > >>> That would be fantastic.
>>> > > >>>
>>> > > >>
>>> > > >> I did a quick test, things are shared locked as expected.
>>> > > >>
>>> > > >> However, I found the following:
>>> > > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
>>> > > >> mp->mnt_kern_flag |=
>>> > > >> lowerrootvp->v_mount->mnt_kern_flag &
>>> > > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
>>> > > >> MNTK_EXTENDED_SHARED);
>>> > > >> }
>>> > > >>
>>> > > >> are you using the "nocache" option? it has a side effect of
>>> > > >> xlocking
>>> > > >
>>> > > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
>>> > > >
>>> > >
>>> > > If you don't have "nocache" on null mounts, then I don't see how
>>> > > this
>>> > > could happen.
>>> >
>>> > There is also MNTK_NULL_NOCACHE on lower fs, which is currently set
>>> > for
>>> > fuse and nfs at least.
>>>
>>> 11 of those 122 nullfs mounts are ZFS datasets which are also NFS
>>> exported.
>>> 6 of those nullfs mounts are also exported via Samba. The NFS exports
>>> shouldn't be needed anymore, I will remove them.
>> By nfs I meant nfs client, not nfs exports.
>
> No NFS client mounts anywhere on this system. So where is this exclusive
> lock coming from then...
> This is a ZFS system. 2 pools: one for the root, one for anything I need
> space for. Both pools reside on the same disks. The root pool is a 3-way
> mirror, the "space-pool" is a 5-disk raidz2. All jails are on the
> space-pool. The jails are all basejail-style jails.
>

While I don't see why xlocking happens, you should be able to dtrace
or printf your way into finding out.

-- 
Mateusz Guzik 



Re: Speed improvements in ZFS

2023-08-20 Thread Mateusz Guzik
On 8/20/23, Alexander Leidinger  wrote:
> On 2023-08-20 22:02, Mateusz Guzik wrote:
>> On 8/20/23, Alexander Leidinger  wrote:
>>> On 2023-08-20 19:10, Mateusz Guzik wrote:
>>>> On 8/18/23, Alexander Leidinger  wrote:
>>>
>>>>> I have a 51MB text file, compressed to about 1MB. Are you interested
>>>>> to
>>>>> get it?
>>>>>
>>>>
>>>> Your problem is not the vnode limit, but nullfs.
>>>>
>>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
>>>
>>> 122 nullfs mounts on this system. And every jail I setup has several
>>> null mounts. One basesystem mounted into every jail, and then shared
>>> ports (packages/distfiles/ccache) across all of them.
>>>
>>>> First, some of the contention is notorious VI_LOCK in order to do
>>>> anything.
>>>>
>>>> But more importantly the mind-boggling off-cpu time comes from
>>>> exclusive locking which should not be there to begin with -- as in
>>>> that xlock in stat should be a slock.
>>>>
>>>> Maybe I'm going to look into it later.
>>>
>>> That would be fantastic.
>>>
>>
>> I did a quick test, things are shared locked as expected.
>>
>> However, I found the following:
>> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
>> mp->mnt_kern_flag |=
>> lowerrootvp->v_mount->mnt_kern_flag &
>> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
>> MNTK_EXTENDED_SHARED);
>> }
>>
>> are you using the "nocache" option? it has a side effect of xlocking
>
> I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
>

If you don't have "nocache" on null mounts, then I don't see how this
could happen.

-- 
Mateusz Guzik 



Re: Speed improvements in ZFS

2023-08-20 Thread Mateusz Guzik
On 8/20/23, Alexander Leidinger  wrote:
> On 2023-08-20 19:10, Mateusz Guzik wrote:
>> On 8/18/23, Alexander Leidinger  wrote:
>
>>> I have a 51MB text file, compressed to about 1MB. Are you interested
>>> to
>>> get it?
>>>
>>
>> Your problem is not the vnode limit, but nullfs.
>>
>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
>
> 122 nullfs mounts on this system. And every jail I setup has several
> null mounts. One basesystem mounted into every jail, and then shared
> ports (packages/distfiles/ccache) across all of them.
>
>> First, some of the contention is notorious VI_LOCK in order to do
>> anything.
>>
>> But more importantly the mind-boggling off-cpu time comes from
>> exclusive locking which should not be there to begin with -- as in
>> that xlock in stat should be a slock.
>>
>> Maybe I'm going to look into it later.
>
> That would be fantastic.
>

I did a quick test, things are shared locked as expected.

However, I found the following:
if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
(MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
MNTK_EXTENDED_SHARED);
}

are you using the "nocache" option? it has a side effect of xlocking

-- 
Mateusz Guzik 



Re: Speed improvements in ZFS

2023-08-20 Thread Mateusz Guzik
On 8/18/23, Alexander Leidinger  wrote:
> On 2023-08-16 18:48, Alexander Leidinger wrote:
>> On 2023-08-15 23:29, Mateusz Guzik wrote:
>>> On 8/15/23, Alexander Leidinger  wrote:
>>>> On 2023-08-15 14:41, Mateusz Guzik wrote:
>>>>
>>>>> With this in mind can you provide: sysctl kern.maxvnodes
>>>>> vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes
>>>>> vfs.recycles_free vfs.recycles
>>>>
>>>> After a reboot:
>>>> kern.maxvnodes: 10485760
>>>> vfs.wantfreevnodes: 2621440
>>>> vfs.freevnodes: 24696
>>>> vfs.vnodes_created: 1658162
>>>> vfs.numvnodes: 173937
>>>> vfs.recycles_free: 0
>>>> vfs.recycles: 0
>>
>> New values after one rund of periodic:
>> kern.maxvnodes: 10485760
>> vfs.wantfreevnodes: 2621440
>> vfs.freevnodes: 356202
>> vfs.vnodes_created: 427696288
>> vfs.numvnodes: 532620
>> vfs.recycles_free: 20213257
>> vfs.recycles: 0
>
> And after the second round which only took 7h this night:
> kern.maxvnodes: 10485760
> vfs.wantfreevnodes: 2621440
> vfs.freevnodes: 3071754
> vfs.vnodes_created: 1275963316
> vfs.numvnodes: 3414906
> vfs.recycles_free: 58411371
> vfs.recycles: 0
>
>>>>> Meanwhile if there is tons of recycles, you can damage control by
>>>>> bumping kern.maxvnodes.
>>
>> What's the difference between recycles and recycles_free? Does the
>> above count as bumping the maxvnodes?
>
> ^
>
>>>> Looks like there are not much free directly after the reboot. I will
>>>> check the values tomorrow after the periodic run again and maybe
>>>> increase by 10 or 100 so see if it makes a difference.
>>>>
>>>>> If this is not the problem you can use dtrace to figure it out.
>>>>
>>>> dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or
>>>> something else?
>>>>
>>>
>>> I mean checking where find is spending time instead of speculating.
>>>
>>> There is no productized way to do it so to speak, but the following
>>> crapper should be good enough:
>> [script]
>>
>> I will let it run this night.
>
> I have a 51MB text file, compressed to about 1MB. Are you interested to
> get it?
>

Your problem is not the vnode limit, but nullfs.

https://people.freebsd.org/~mjg/netchild-periodic-find.svg

First, some of the contention is notorious VI_LOCK in order to do anything.

But more importantly the mind-boggling off-cpu time comes from
exclusive locking which should not be there to begin with -- as in
that xlock in stat should be a slock.

Maybe I'm going to look into it later.

-- 
Mateusz Guzik 



Re: Speed improvements in ZFS

2023-08-18 Thread Mateusz Guzik
On 8/18/23, Alexander Leidinger  wrote:
> On 2023-08-16 18:48, Alexander Leidinger wrote:
>> On 2023-08-15 23:29, Mateusz Guzik wrote:
>>> On 8/15/23, Alexander Leidinger  wrote:
>>>> On 2023-08-15 14:41, Mateusz Guzik wrote:
>>>>
>>>>> With this in mind can you provide: sysctl kern.maxvnodes
>>>>> vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes
>>>>> vfs.recycles_free vfs.recycles
>>>>
>>>> After a reboot:
>>>> kern.maxvnodes: 10485760
>>>> vfs.wantfreevnodes: 2621440
>>>> vfs.freevnodes: 24696
>>>> vfs.vnodes_created: 1658162
>>>> vfs.numvnodes: 173937
>>>> vfs.recycles_free: 0
>>>> vfs.recycles: 0
>>
>> New values after one rund of periodic:
>> kern.maxvnodes: 10485760
>> vfs.wantfreevnodes: 2621440
>> vfs.freevnodes: 356202
>> vfs.vnodes_created: 427696288
>> vfs.numvnodes: 532620
>> vfs.recycles_free: 20213257
>> vfs.recycles: 0
>
> And after the second round which only took 7h this night:
> kern.maxvnodes: 10485760
> vfs.wantfreevnodes: 2621440
> vfs.freevnodes: 3071754
> vfs.vnodes_created: 1275963316
> vfs.numvnodes: 3414906
> vfs.recycles_free: 58411371
> vfs.recycles: 0
>

so your setup has a vastly higher number of vnodes to inspect than the
number of vnodes it allows to exist at the same time, which further
suggests it easily may be about that msleep.

>>>>> Meanwhile if there is tons of recycles, you can damage control by
>>>>> bumping kern.maxvnodes.
>>
>> What's the difference between recycles and recycles_free? Does the
>> above count as bumping the maxvnodes?
>
> ^
>

"free" vnodes are just hanging around and can be directly whacked, the
others are used but *maybe* freeable (say a directory with a bunch of
vnodes already established).

>>>> Looks like there are not much free directly after the reboot. I will
>>>> check the values tomorrow after the periodic run again and maybe
>>>> increase by 10 or 100 so see if it makes a difference.
>>>>
>>>>> If this is not the problem you can use dtrace to figure it out.
>>>>
>>>> dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or
>>>> something else?
>>>>
>>>
>>> I mean checking where find is spending time instead of speculating.
>>>
>>> There is no productized way to do it so to speak, but the following
>>> crapper should be good enough:
>> [script]
>>
>> I will let it run this night.
>
> I have a 51MB text file, compressed to about 1MB. Are you interested to
> get it?
>

Yea, put it on freefall for example.

or feed it directly to flamegraph: cat file | ./stackcollapse.pl |
./flamegraph.pl > out.svg

see this repo https://github.com/brendangregg/FlameGraph.git


-- 
Mateusz Guzik 



Re: Speed improvements in ZFS

2023-08-15 Thread Mateusz Guzik
On 8/15/23, Alexander Leidinger  wrote:
> On 2023-08-15 14:41, Mateusz Guzik wrote:
>
>> With this in mind can you provide: sysctl kern.maxvnodes
>> vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes
>> vfs.recycles_free vfs.recycles
>
> After a reboot:
> kern.maxvnodes: 10485760
> vfs.wantfreevnodes: 2621440
> vfs.freevnodes: 24696
> vfs.vnodes_created: 1658162
> vfs.numvnodes: 173937
> vfs.recycles_free: 0
> vfs.recycles: 0
>
>> Meanwhile if there is tons of recycles, you can damage control by
>> bumping kern.maxvnodes.
>
> Looks like there are not much free directly after the reboot. I will
> check the values tomorrow after the periodic run again and maybe
> increase by 10 or 100 so see if it makes a difference.
>
>> If this is not the problem you can use dtrace to figure it out.
>
> dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or
> something else?
>

I mean checking where find is spending time instead of speculating.

There is no productized way to do it so to speak, but the following
crapper should be good enough:
#pragma D option dynvarsize=32m

profile:::profile-997
/execname == "find"/
{
@oncpu[stack(), "oncpu"] = count();
}

/*
 * The p_flag & 0x4 test filters out kernel threads.
 */

sched:::off-cpu
/execname == "find"/
{
self->ts = timestamp;
}

sched:::on-cpu
/self->ts/
{
@offcpu[stack(30), "offcpu"] = sum(timestamp - self->ts);
self->ts = 0;
}

dtrace:::END
{
normalize(@offcpu, 100);
printa("%k\n%s\n%@d\n\n", @offcpu);
printa("%k\n%s\n%@d\n\n", @oncpu);
}

just leave it running as: dtrace -s script.d -o output

kill it after periodic finishes. it blindly assumes there will be no
other processes named "find" messing around.


> Bye,
> Alexander.
>
> --
> http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
> http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF
>


-- 
Mateusz Guzik 



Re: ZFS deadlock in 14

2023-08-15 Thread Mateusz Guzik
On 8/15/23, Dag-Erling Smørgrav  wrote:
> Mateusz Guzik  writes:
>> Given that the custom reproducer failed I think the most prudent
>> course of action is to reproduce again with poudriere, but this time
>> arrange to have all stacktraces dumped.
>
> Why?  What more information do you need?
>

Going through the list may or may not reveal other threads doing
something in the area and it very well may be they are deadlocked,
which then results in other processes hanging on them.

Just like in your case: the process reported as hung is a random victim,
and the real culprit is deeper.

-- 
Mateusz Guzik 



Re: ZFS deadlock in 14

2023-08-15 Thread Mateusz Guzik
On 8/15/23, Dag-Erling Smørgrav  wrote:
> Dag-Erling Smørgrav  writes:
>> I managed to geat a deadlock with 4e8d558c9d1c.  Its predecessor
>> 5ca7f02946 appears to be working.  I'm going to try to come up with a
>> more efficient way to reproduce the deadlock than running poudriere.
>
> I wrote a script that creates multiple filesystems, snapshots them,
> populates them and rolls them back continuously but so far I have not
> succeeded in triggering the deadlock without poudriere.  I guess my
> script doesn't consume enough vnodes.
>
> Also, 9228ac3a69c4 (9 August, last commit before the contrib/googletest
> breakage) still deadlocks.
>

Given that the custom reproducer failed I think the most prudent
course of action is to reproduce again with poudriere, but this time
arrange to have all stacktraces dumped.

this should do it:
sbin/ddb/ddb.conf:script kdb.enter.panic=textdump set; capture on; run
lockinfo; show pcpu; bt; ps; alltrace; capture off; textdump dump;
reset

it is a slightly finicky beast so I would trigger a panic by hand
first to validate it works as expected.
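A sketch of that by-hand check (assuming dumpdev is already set in rc.conf):

sysctl debug.kdb.panic=1

which should run the ddb script, write the textdump and reset; after the box
comes back up, savecore should leave a textdump.tar.N in /var/crash with the
alltrace output inside.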

-- 
Mateusz Guzik 



Re: Speed improvements in ZFS

2023-08-15 Thread Mateusz Guzik
On 8/15/23, Alexander Leidinger  wrote:
> Hi,
>
> just a report that I noticed a very high speed improvement in ZFS in
> -current. Since a looong time (at least since last year), for a
> jail-host of mine with about >20 jails on it which each runs periodic
> daily, the periodic daily runs of the jails take from about 3 am to 5pm
> or longer. I don't remember when this started, and I thought at that
> time that the problem may be data related. It's the long runs of "find"
> in one of the periodic daily jobs which takes that long, and the number
> of jails together with null-mounted basesystem inside the jail and a
> null-mounted package repository inside each jail, the number of files and
> concurrent access to the spinning rust with first SSD and now NVME based
> cache may have reached some tipping point. I have all the periodic daily
> mails around, so theoretically I may be able to find when this started,
> but as can be seen in another mail to this mailinglist, the system which
> has all the periodic mails has some issues which have higher priority
> for me to track down...
>
> Since I updated to a src from 2023-07-20, this is not the case anymore.
> The data is the same (maybe even a bit more, as I have added 2 more
> jails since then and the periodic daily runs which run more or less in
> parallel, are not taking considerably longer). The speed increase with
> the July-build are in the area of 3-4 hours for 23 parallel periodic
> daily runs. So instead of finishing the periodic runs around 5pm, they
> finish already around 1pm/2pm.
>
> So whatever was done inside ZFS or VFS or nullfs between 2023-06-19 and
> 2023-07-20 has given a huge speed improvement. From my memory I would
> say there is still room for improvement, as I think it may be the case
> that the periodic daily runs ended in the morning instead of the
> afteroon, but my memory may be flaky in this regard...
>
> Great work to whoever was involved.
>

several hours to run periodic is still unusably slow.

have you tried figuring out where is the time spent?

I don't know what caused the change here, but do know of one major
bottleneck which you are almost guaranteed to run into if you inspect
all files everywhere -- namely bumping over a vnode limit.

In vn_alloc_hard you can find:
        msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk", hz);
        if (atomic_load_long(&numvnodes) + 1 > desiredvnodes &&
            vnlru_read_freevnodes() > 1)
                vnlru_free_locked(1);

that is, the allocating thread will sleep up to 1 second if there are
no vnodes up for grabs and then go ahead and allocate one anyway.
Going over the numvnodes limit is partially rate-limited, but in a manner
which is not very usable.

The entire mechanism is mostly borked and in desperate need of a rewrite.

With this in mind can you provide: sysctl kern.maxvnodes
vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes
vfs.recycles_free vfs.recycles

Meanwhile if there is tons of recycles, you can damage control by
bumping kern.maxvnodes.
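For example (the value is just a made-up starting point, scale it to the
machine and re-check vfs.recycles afterwards):

sysctl kern.maxvnodes=20971520

and put the same line in /etc/sysctl.conf to keep it across reboots.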

If this is not the problem you can use dtrace to figure it out.

-- 
Mateusz Guzik 



Re: confusion about root partition causes panic during startup

2023-07-20 Thread Mateusz Guzik
On 7/20/23, Mike Karels  wrote:
> I installed an additional NVME drive on a system, and then booted.  It
> turns
> out that the new drive became nda0, renumbering the other drives.  The
> loader
> found the correct partition to boot (the only choice), and loaded the
> kernel
> correctly.  However, /etc/fstab still had the old name (nvd1p2), which is
> now drive 2.  I expected it to drop into single user, but instead the
> system
> panicked in vfs_mountroot_shuffle trying to switch root devices (see
> below).
> It doesn't seem that having the wrong root device in /etc/fstab should
> cause
> a panic; it makes it harder to patch the system.  I was unable to get the
> system to boot using boot-to-single-user or setting currdev, but I managed
> to remember doing "boot -a" from a loader prompt to get the system to ask
> the root device before mounting it.  I can easily reproduce this to test.
> Probably the NDFREE_PNBUF() shouldn't happen if namei() returned an error.
>

ye, this should do it (untested):

diff --git a/sys/kern/vfs_mountroot.c b/sys/kern/vfs_mountroot.c
index 956d29e3f084..85398ff781e4 100644
--- a/sys/kern/vfs_mountroot.c
+++ b/sys/kern/vfs_mountroot.c
@@ -352,13 +352,13 @@ vfs_mountroot_shuffle(struct thread *td, struct mount *mpdevfs)
 	NDINIT(&nd, LOOKUP, FOLLOW | LOCKLEAF, UIO_SYSSPACE, fspath);
 	error = namei(&nd);
 	if (error) {
-		NDFREE_PNBUF(&nd);
 		fspath = "/mnt";
 		NDINIT(&nd, LOOKUP, FOLLOW | LOCKLEAF, UIO_SYSSPACE, fspath);
 		error = namei(&nd);
 	}
 	if (!error) {
+		NDFREE_PNBUF(&nd);
 		vp = nd.ni_vp;
 		error = (vp->v_type == VDIR) ? 0 : ENOTDIR;
 		if (!error)
@@ -376,7 +376,6 @@ vfs_mountroot_shuffle(struct thread *td, struct mount *mpdevfs)
 		} else
 			vput(vp);
 	}
-	NDFREE_PNBUF(&nd);
 
 	if (error)
 		printf("mountroot: unable to remount previous root "
@@ -387,6 +386,7 @@ vfs_mountroot_shuffle(struct thread *td, struct mount *mpdevfs)
 	NDINIT(&nd, LOOKUP, FOLLOW | LOCKLEAF, UIO_SYSSPACE, "/dev");
 	error = namei(&nd);
 	if (!error) {
+		NDFREE_PNBUF(&nd);
 		vp = nd.ni_vp;
 		error = (vp->v_type == VDIR) ? 0 : ENOTDIR;
 		if (!error)


>   Mike
>
> Trying to mount root from ufs:/dev/nvd1p2 [rw]...
> WARNING: WITNESS option enabled, expect reduced performance.
> mountroot: unable to remount devfs under /dev (error 2)
> panic: Assertion _ndp->ni_cnd.cn_pnbuf != NULL failed at
> ../../../kern/vfs_mountroot.c:416
> cpuid = 19
> time = 11
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe006d3bac40
> vpanic() at vpanic+0x149/frame 0xfe006d3bac90
> panic() at panic+0x43/frame 0xfe006d3bacf0
> vfs_mountroot() at vfs_mountroot+0x1bf7/frame 0xfe006d3bae60
> start_init() at start_init+0x23/frame 0xfe006d3baef0
> fork_exit() at fork_exit+0x82/frame 0xfe006d3baf30
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe006d3baf30
> --- trap 0x5c035c02, rip = 0x680c680c680c680c, rsp = 0x1b6b1f6b1b6b1b6b, rbp
> = 0x4eb54eb54eb54eb5 ---
> KDB: enter: panic
> [ thread pid 1 tid 12 ]
> Stopped at  kdb_enter+0x32: movq$0,0xde7643(%rip)
>
>


-- 
Mateusz Guzik 



Re: another crash and going forward with zfs

2023-04-24 Thread Mateusz Guzik
On 4/18/23, Pawel Jakub Dawidek  wrote:
> On 4/18/23 05:14, Mateusz Guzik wrote:
>> On 4/17/23, Pawel Jakub Dawidek  wrote:
>>> Correct me if I'm wrong, but from my understanding there were zero
>>> problems with block cloning when it wasn't in use or now disabled.
>>>
>>> The reason I've introduced vfs.zfs.bclone_enabled sysctl, was to exactly
>>> avoid mess like this and give us more time to sort all the problems out
>>> while making it easy for people to try it.
>>>
>>> If there is no plan to revert the whole import, I don't see what value
>>> removing just block cloning will bring if it is now disabled by default
>>> and didn't cause any problems when disabled.
>>>
>>
>> The feature definitely was not properly stress tested and what not and
>> trying to do it keeps running into panics. Given the complexity of the
>> feature I would expect there are many bugs lurking, some of which
>> possibly related to the on disk format. Not having to deal with any of
>> this can be arranged as described above and is imo the most
>> sensible route given the timeline for 14.0
>
> Block cloning doesn't create, remove or modify any on-disk data until it
> is in use.
>
> Again, if we are not going to revert the whole merge, I see no point in
> reverting block cloning as until it is enabled, its code is not
> executed. This allow people who upgraded the pools to do nothing special
> and it will allow people to test it easily.
>

Some people will zpool upgrade out of habit or whatever after moving
to 14.0, which will then make them unable to go back to 13.x if woes
show up.

Woes don't even have to be zfs-related. This is a major release, one
has to suspect there will be some breakage, and it may be that the best
way forward for some of the users will be to downgrade (e.g., with boot
environments). As is they won't be able to do that if they zpool
upgrade.
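A quick way to check whether a pool has already been upgraded to the new
feature (pool name made up):

zpool get feature@block_cloning zroot

which reports disabled, enabled or active.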

If someone *does* zpool upgrade and there is further data corruption
due to block cloning (which you really can't rule out given that the
feature so far did not survive under load), telephone game is going to
turn this into "14.0 corrupts data" and no amount of clarifying about
an optional feature is going to help the press.

If anything the real question is how come the feature got merged upstream, when:
1. FreeBSD CI for the project is offline
2. There is no Linux support
3. ... basic usage showed numerous bugs

Should the feature get whipped into shape, it can be a 14.1 candidate.

-- 
Mateusz Guzik 



Re: Problem compiling py-* ports

2023-04-18 Thread Mateusz Guzik
On 4/18/23, Filippo Moretti  wrote:
> Good morning, I run this version of FreeBSD and all
> py-* ports fail with the following message. Sincerely, Filippo
>
> FreeBSD STING 14.0-CURRENT FreeBSD 14.0-CURRENT #6
> main-n261981-63b113af5706: Tue Apr  4 16:57:47 CEST 2023
> filippo@STING:/usr/obj/usr/src/amd64.amd64/sys/STING amd64
>
>

you are on a zfs commit with known data corruption (in fact, 2)

bare minimum you need to update to fresh main and reinstall all python stuff
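A hedged sketch of the reinstall, assuming the packages came from pkg rather
than being built locally (the py39- prefix is a guess based on the traceback
below):

pkg install -fy $(pkg query -g %n 'py39-*')

for locally built ports the equivalent is rebuilding them with portmaster or
poudriere after updating.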

>
>
>
>return _bootstrap._gcd_import(name[level:], package,
> level)
>File "", line 1030, in
> _gcd_import
>
>   File "", line 1007, in
> _find_and_load
>
>   File "", line 972, in
> _find_and_load_unlocked
>
>   File "", line 228, in
> _call_with_frames_removed
>   File "", line 1030, in _gcd_import
>   File "", line 1007, in _find_and_load
>   File "", line 986, in
> _find_and_load_unlocked
>   File "", line 680, in _load_unlocked
>   File "", line 850, in exec_module
>   File "", line 228, in
> _call_with_frames_removed
>   File "/usr/local/lib/python3.9/site-packages/setuptools/__init__.py", line
> 18, in 
> from setuptools.dist import Distribution
>   File "/usr/local/lib/python3.9/site-packages/setuptools/dist.py", line 34,
> in 
> from ._importlib import metadata
>   File "/usr/local/lib/python3.9/site-packages/setuptools/_importlib.py",
> line 39, in 
> disable_importlib_metadata_finder(metadata)
>   File "/usr/local/lib/python3.9/site-packages/setuptools/_importlib.py",
> line 28, in disable_importlib_metadata_finder
> to_remove = [
>   File "/usr/local/lib/python3.9/site-packages/setuptools/_importlib.py",
> line 31, in 
> if isinstance(ob, importlib_metadata.MetadataPathFinder)
> AttributeError: module 'importlib_metadata' has no attribute
> 'MetadataPathFinder'
>
> ERROR Backend subprocess exited when trying to invoke
> get_requires_for_build_wheel
> *** Error code 1
>
> Stop.
> make: stopped in /usr/ports/textproc/py-pygments
>
> ===>>> make build failed for textproc/py-pygments@py39
> ===>>> Aborting update
>
> ===>>> Update for textproc/py-pygments@py39 failed
> ===>>> Aborting update
>
>
> ===>>> You can restart from the point of failure with this command line:
>
>
>
>


-- 
Mateusz Guzik 



Re: another crash and going forward with zfs

2023-04-17 Thread Mateusz Guzik
On 4/17/23, Pawel Jakub Dawidek  wrote:
> On 4/18/23 03:51, Mateusz Guzik wrote:
>> After bugfixes got committed I decided to zpool upgrade and sysctl
>> vfs.zfs.bclone_enabled=1 vs poudriere for testing purposes. I very
>> quickly got a new crash:
>>
>> panic: VERIFY(arc_released(db->db_buf)) failed
>>
>> cpuid = 9
>> time = 1681755046
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> 0xfe0a90b8e5f0
>> vpanic() at vpanic+0x152/frame 0xfe0a90b8e640
>> spl_panic() at spl_panic+0x3a/frame 0xfe0a90b8e6a0
>> dbuf_redirty() at dbuf_redirty+0xbd/frame 0xfe0a90b8e6c0
>> dmu_buf_will_dirty_impl() at dmu_buf_will_dirty_impl+0xa2/frame
>> 0xfe0a90b8e700
>> dmu_write_uio_dnode() at dmu_write_uio_dnode+0xe9/frame
>> 0xfe0a90b8e780
>> dmu_write_uio_dbuf() at dmu_write_uio_dbuf+0x42/frame 0xfe0a90b8e7b0
>> zfs_write() at zfs_write+0x672/frame 0xfe0a90b8e960
>> zfs_freebsd_write() at zfs_freebsd_write+0x39/frame 0xfe0a90b8e980
>> VOP_WRITE_APV() at VOP_WRITE_APV+0xdb/frame 0xfe0a90b8ea90
>> vn_write() at vn_write+0x325/frame 0xfe0a90b8eb20
>> vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfe0a90b8eb80
>> vn_io_fault1() at vn_io_fault1+0x161/frame 0xfe0a90b8ecc0
>> vn_io_fault() at vn_io_fault+0x1b5/frame 0xfe0a90b8ed40
>> dofilewrite() at dofilewrite+0x81/frame 0xfe0a90b8ed90
>> sys_write() at sys_write+0xc0/frame 0xfe0a90b8ee00
>> amd64_syscall() at amd64_syscall+0x157/frame 0xfe0a90b8ef30
>> fast_syscall_common() at fast_syscall_common+0xf8/frame
>> 0xfe0a90b8ef30
>> --- syscall (4, FreeBSD ELF64, write), rip = 0x103cddf7949a, rsp =
>> 0x103cdc85dd48, rbp = 0x103cdc85dd80 ---
>> KDB: enter: panic
>> [ thread pid 95000 tid 135035 ]
>> Stopped at  kdb_enter+0x32: movq$0,0x9e4153(%rip)
>>
>> The posted 14.0 schedule plans to branch stable/14 on May 12 and
>> one cannot bet on the feature getting beaten into production shape
>> by that time. Given whatever non-block_cloning (and not even zfs) bugs
>> are likely to come out, I think this makes the feature a
>> non-starter for said release.
>>
>> I note:
>> 1. the current problems did not make it into stable branches.
>> 2. there was block_cloning-related data corruption (fixed) and there may
>> be more
>> 3. there was unrelated data corruption (see
>> https://github.com/openzfs/zfs/issues/14753), sorted out by reverting
>> the problematic commit in FreeBSD, not yet sorted out upstream
>>
>> As such people's data may be partially hosed as is.
>>
>> Consequently the proposed plan is as follows:
>> 1. whack the block cloning feature for the time being, but make sure
>> pools which upgraded to it can be mounted read-only
>> 2. run ztest and whatever other stress testing on FreeBSD, along with
>> restoring openzfs CI -- I can do the first part, I'm sure pho will not
>> mind to run some tests of his own
>> 3. recommend people create new pools and restore data from backup. if
>> restoring from backup is not an option, tar or cp (not zfs send) from
>> the read-only mount
>>
>> block cloning beaten into shape would use block_cloning_v2 or whatever
>> else, key point that the current feature name would be considered
>> bogus (not blocking RO import though) to prevent RW usage of the
>> current pools with it enabled.
>>
>> Comments?
>
> Correct me if I'm wrong, but from my understanding there were zero
> problems with block cloning when it wasn't in use or now disabled.
>
> The reason I've introduced vfs.zfs.bclone_enabled sysctl, was to exactly
> avoid mess like this and give us more time to sort all the problems out
> while making it easy for people to try it.
>
> If there is no plan to revert the whole import, I don't see what value
> removing just block cloning will bring if it is now disabled by default
> and didn't cause any problems when disabled.
>

The feature definitely was not properly stress tested and what not and
trying to do it keeps running into panics. Given the complexity of the
feature I would expect there are many bugs lurking, some of which
possibly related to the on disk format. Not having to deal with any of
this can be arranged as described above and is imo the most
sensible route given the timeline for 14.0

-- 
Mateusz Guzik 



another crash and going forward with zfs

2023-04-17 Thread Mateusz Guzik
After bugfixes got committed I decided to zpool upgrade and sysctl
vfs.zfs.bclone_enabled=1 vs poudriere for testing purposes. I very
quickly got a new crash:

panic: VERIFY(arc_released(db->db_buf)) failed

cpuid = 9
time = 1681755046
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0a90b8e5f0
vpanic() at vpanic+0x152/frame 0xfe0a90b8e640
spl_panic() at spl_panic+0x3a/frame 0xfe0a90b8e6a0
dbuf_redirty() at dbuf_redirty+0xbd/frame 0xfe0a90b8e6c0
dmu_buf_will_dirty_impl() at dmu_buf_will_dirty_impl+0xa2/frame
0xfe0a90b8e700
dmu_write_uio_dnode() at dmu_write_uio_dnode+0xe9/frame 0xfe0a90b8e780
dmu_write_uio_dbuf() at dmu_write_uio_dbuf+0x42/frame 0xfe0a90b8e7b0
zfs_write() at zfs_write+0x672/frame 0xfe0a90b8e960
zfs_freebsd_write() at zfs_freebsd_write+0x39/frame 0xfe0a90b8e980
VOP_WRITE_APV() at VOP_WRITE_APV+0xdb/frame 0xfe0a90b8ea90
vn_write() at vn_write+0x325/frame 0xfe0a90b8eb20
vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfe0a90b8eb80
vn_io_fault1() at vn_io_fault1+0x161/frame 0xfe0a90b8ecc0
vn_io_fault() at vn_io_fault+0x1b5/frame 0xfe0a90b8ed40
dofilewrite() at dofilewrite+0x81/frame 0xfe0a90b8ed90
sys_write() at sys_write+0xc0/frame 0xfe0a90b8ee00
amd64_syscall() at amd64_syscall+0x157/frame 0xfe0a90b8ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0a90b8ef30
--- syscall (4, FreeBSD ELF64, write), rip = 0x103cddf7949a, rsp =
0x103cdc85dd48, rbp = 0x103cdc85dd80 ---
KDB: enter: panic
[ thread pid 95000 tid 135035 ]
Stopped at  kdb_enter+0x32: movq$0,0x9e4153(%rip)

The posted 14.0 schedule plans to branch stable/14 on May 12 and
one cannot bet on the feature getting beaten into production shape
by that time. Given whatever non-block_cloning (and not even zfs) bugs
are likely to come out, I think this makes the feature a
non-starter for said release.

I note:
1. the current problems did not make it into stable branches.
2. there was block_cloning-related data corruption (fixed) and there may be more
3. there was unrelated data corruption (see
https://github.com/openzfs/zfs/issues/14753), sorted out by reverting
the problematic commit in FreeBSD, not yet sorted out upstream

As such people's data may be partially hosed as is.

Consequently the proposed plan is as follows:
1. whack the block cloning feature for the time being, but make sure
pools which upgraded to it can be mounted read-only
2. run ztest and whatever other stress testing on FreeBSD, along with
restoring openzfs CI -- I can do the first part, I'm sure pho will not
mind to run some tests of his own
3. recommend people create new pools and restore data from backup. if
restoring from backup is not an option, tar or cp (not zfs send) from
the read-only mount
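
For point 3 the read-only mount would be along the lines of (pool name made
up):

zpool export tank
zpool import -o readonly=on -R /mnt tank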

block cloning beaten into shape would use block_cloning_v2 or whatever
else, key point that the current feature name would be considered
bogus (not blocking RO import though) to prevent RW usage of the
current pools with it enabled.

Comments?

-- 
Mateusz Guzik 



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Mateusz Guzik
e with the
>> > new option enabled on ZFS pools. Any advice?
>> >
>> > In an act of precaution (or call it panic) I shutdown several servers to
>> > prevent irreversible damages to databases and data storages. We face on
>> > one host with /usr/ports residing on ZFS always errors on the same files
>> > created while staging (using portmaster, which leaves the system with
>> > noninstalled software, i.e. www/apache24 in our case). Deleting the work
>> > folder doesn't seem to change anything, even when starting a scrubbing of
>> > the entire pool (RAIDZ1 pool) - cause unknown, why it always affects the
>> > same files to be corrupted. Same with devel/ruby-gems.
>> >
>> > Poudriere has been shutdown for the time being to avoid further issues.
>> >
>> >
>> > Is there any advice on how to proceed apart from conserving the boxes via
>> > shutdown?
>> >
>> > Thank you ;-)
>> > oh
>> >
>> >
>> >
>> > --
>> > O. Hartmann
>>
>> With an up-to-date tree + pjd@'s "Fix data corruption when cloning
>> embedded
>> blocks. #14739" patch I didn't have any issues, except for email messages
>>
>> with corruption in my sent directory, nowhere else. I'm still
>> investigating
>> the email messages issue. IMO one is generally safe to run poudriere on
>> the
>> latest ZFS with the additional patch.
>>
>> My tests of the additional patch concluded that it resolved my last
>> problems, except for the sent email problem I'm still investigating. I'm
>> sure there's a simple explanation for it, i.e. the email thread was
>> corrupted by the EXDEV regression which cannot be fixed by anything, even
>>
>> reverting to the previous ZFS -- the data in those files will remain
>> damaged regardless.
>>
>> I cannot speak to the others who have had poudriere and other issues. I
>> never had any problems with poudriere on top of the new ZFS.
>>
>> WRT reverting block_cloning pools to without, your only option is to
>> backup
>> your pool and recreate it without block_cloning. Then restore your data.
>>
>>
>
> All right, I interpret the answer that way, that I need a most recent source
> tree (and
> accordingly built and installed OS) AND a patch that isn't officially
> committed?
>
> On a box I'm with:
>
> FreeBSD 14.0-CURRENT #8 main-n262175-5ee1c90e50ce: Sat Apr 15 07:57:16 CEST
> 2023 amd64
>
> The box is crashing while trying to update ports with the well known issue:
>
> Panic String: VERIFY(!zil_replaying(zilog, tx)) failed
>
> At the moment all alarm bells are ringing and I lost track about what has
> been patched and
> already committed and what is still available as a patch but in the phase of
> evaluation or
> unofficially submitted here.
>
> According to the EXDEV issue: in cases of poudriere or ports trees on ZFS,
> what do I have to
> do to ensure that those datasets are clean? The OS should detect file
> corruption but in my
> case the box is crashing :-(
>
> I did several times scrubbing, but this seems to be the action of a helpless
> and desperate man
> ... ;-/
>
> Greetings
>

Using block cloning is still not safe, but somewhere in this thread
pjd had a patch to keep it operational for already cloned files without
adding new ones.
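
The knob in question can be checked and, as damage control, forced off at
runtime:

sysctl vfs.zfs.bclone_enabled
sysctl vfs.zfs.bclone_enabled=0

note this only stops new clones from being created; already cloned blocks
stay as they are.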

Anyhow, as was indicated by vishwin@ there was data corruption
*unrelated* to block cloning which also came with the import, I
narrowed it down: https://github.com/openzfs/zfs/issues/14753

That said now I'm testing a kernel which does not do block cloning and
does not have the other problematic commit, we will see if things
work.

-- 
Mateusz Guzik 



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Mateusz Guzik
On 4/14/23, Charlie Li  wrote:
> Pawel Jakub Dawidek wrote:
>> On 4/14/23 07:52, Charlie Li wrote:
>>> Pawel Jakub Dawidek wrote:
>>>> thank you for your testing and patience so far. I'm working on a
>>>> patch to revert block cloning without affecting people who already
>>>> upgraded their pools.
>>>>
>>> Testing with mjg@ earlier today revealed that block_cloning was not
>>> the cause of poudriere bulk build (and similar cp(1)/install(1)-based)
>>> corruption, although may have exacerbated it.
>>
>> Can you please elaborate how were you testing and what exactly did you
>> exclude?
>>
> mjg@ prepared
> https://gitlab.com/vishwin/freebsd-src/-/commit/b41f187ba329621cda1e8e67a0786f07b1221a3c
>
> which only removes block_cloning, rebuilding kernel only (buildworld
> fails) for me to test poudriere bulk -c builds with. I used a world from
> https://gitlab.com/vishwin/freebsd-src/-/tree/zfs-revert which consists
> of reverting the merge commit plus a few other conflicts, but keeping
> vop_fplookup_vexec.
>

I'm going to narrow down the non-blockcopy corruption after my testjig
gets off the ground.

Basically I expect to have it sorted out on Friday.


-- 
Mateusz Guzik 



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Mateusz Guzik
On 4/13/23, Cy Schubert  wrote:
> On Thu, 13 Apr 2023 19:54:42 +0900
> Paweł Jakub Dawidek  wrote:
>
>> On Apr 13, 2023, at 16:10, Cy Schubert  wrote:
>> >
>> > In message <20230413070426.8a54f...@slippy.cwsent.com>, Cy Schubert
>> > writes:
>> > In message <20230413064252.1e5c1...@slippy.cwsent.com>, Cy Schubert
>> > writes:
>> >> In message, Mark Millard writes:
>> >>> [This just puts my prior reply's material into Cy's
>> >>> adjusted resend of the original. The To/Cc should
>> >>> be complete this time.]
>> >>>
>> >>> On Apr 12, 2023, at 22:52, Cy Schubert wrote:
>> >>>
>> >>>> In message, Mark Millard writes:
>> >>>>> From: Charlie Li wrote on
>> >>>>> Date: Wed, 12 Apr 2023 20:11:16 UTC :
>> >>>>>
>> >>>>>> Charlie Li wrote:
>> >>>>>>> Mateusz Guzik wrote:
>> >>>>>>>> can you please test poudriere with
>> >>>>>>>> https://github.com/openzfs/zfs/pull/14739/files
>> >>>>>>>
>> >>>>>>> After applying, on the md(4)-backed pool regardless of block_cloning,
>> >>>>>>> the cy@ `cp -R` test reports no differing (ie corrupted) files. Will
>> >>>>>>> report back on poudriere results (no block_cloning).
>> >>>>>>
>> >>>>>> As for poudriere, build failures are still rolling in. These are (and
>> >>>>>> have been) entirely random on every run. Some examples from this run:
>> >>>>>>
>> >>>>>> lang/php81:
>> >>>>>> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
>> >>>>>>   ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
>> >>>>>> - consumers fail to build due to corrupted php.conf packaged
>> >>>>>>
>> >>>>>> devel/ninja:
>> >>>>>> - phase: stage
>> >>>>>> - install -s -m 555
>> >>>>>>   /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
>> >>>>>>   /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
>> >>>>>> - consumers fail to build due to corrupted bin/ninja packaged
>> >>>>>>
>> >>>>>> devel/netsurf-buildsystem:
>> >>>>>> - phase: stage
>> >>>>>> - mkdir -p
>> >>>>>>   /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
>> >>>>>>   /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
>> >>>>>>   for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
>> >>>>>>   Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
>> >>>>>>   cp makefiles/$M
>> >>>>>>   /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/; \
>> >>>>>>   done
>> >>>>>> - graphics/libnsgif fails to build due to NUL char

Re: CURRENT: Panic VERIFY(!zil_replaying(zilog, tx)) failed (and crashing)

2023-04-09 Thread Mateusz Guzik
On 4/9/23, FreeBSD User  wrote:
> Today, after upgrading to FreeBSD 14.0-CURRENT #8 main-n262052-0d4038e3012b:
> Sun Apr  9
> 12:01:02 CEST 2023  amd64, AND upgrading ZPOOLs via
>
> zpool upgrade POOLNAME
>
> some boxes keep crashing when starting compiler runs (the trigger is
> different on boxes).
>
> ZFS module is statically compiled into the kernel (if this is of
> importance)
>
> Last known good was:
>
> [...]
> Apr  9 07:10:04 <0.2> thor kernel: FreeBSD 14.0-CURRENT #7
> main-n262051-75379ea2e461: Sun Apr
> 9 00:12:57 CEST 2023 Apr  9 07:10:04 <0.2> thor kernel:
> root@thor:/usr/obj/usr/src/amd64.amd64/sys/THOR amd64 Apr  9 07:10:04 <0.2>
> thor kernel:
> FreeBSD clang version 15.0.7 (https://github.com/llvm/llvm-project.git
> llvmorg-15.0.7-0-g8dfdcc7b7bf6) Apr  9 07:10:04 <0.2> thor kernel:
> VT(efifb): resolution
> 2560x1440 Apr  9 07:10:04 <0.2> thor kernel: module zfsctrl already
> present!
> [...]
>
> The file /var/crash/info.X
>
> contains:
>
> [...]
>
> root@thor:/var/crash # more info.2
> Dump header from device: /dev/gpt/swap
>   Architecture: amd64
>   Architecture Version: 2
>   Dump Length: 1095192576
>   Blocksize: 512
>   Compression: none
>   Dumptime: 2023-04-09 11:43:41 +
>   Hostname: thor.local
>   Magic: FreeBSD Kernel Dump
>   Version String: FreeBSD 14.0-CURRENT #8 main-n262052-0d4038e3012b: Sun Apr
>  9 12:01:02 CEST
> 2023
> root@thor:/usr/obj/usr/src/amd64.amd64/sys/THOR
>   Panic String: VERIFY(!zil_replaying(zilog, tx)) failed
>
>   Dump Parity: 2961465682
>   Bounds: 2
>   Dump Status: good
>
> Until reconfigured for more debug stuff I do not have more to present.
>
> I remember now, really scared, that there was a HEADSUP on the list regarding
> some serious ZFS
> problems - I didn't find it right now.
>
> Thanks in advance,
>

That's fallout from the new block cloning feature, adding the author

-- 
Mateusz Guzik 



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75 [separate aarch64 panic for zpool import]

2023-04-08 Thread Mateusz Guzik
On 4/8/23, Kyle Evans  wrote:
> On Fri, Apr 7, 2023 at 4:54 PM Mateusz Guzik  wrote:
>>
>> On 4/7/23, Mark Millard  wrote:
>> > On Apr 7, 2023, at 14:26, Mateusz Guzik  wrote:
>> >
>> >> On 4/7/23, Mateusz Guzik  wrote:
>> >>> can you try with this:
>> >>>
>> >>> diff --git
>> >>> a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>> >>> b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>> >>> index 16276b08c759..e1bca9ef140a 100644
>> >>> ---
>> >>> a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>> >>> +++
>> >>> b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>> >>> @@ -71,7 +71,7 @@
>> >>> #defineID_AA64PFR0_EL1 sys_reg(3, 0, 0, 1, 0)
>> >>> #defineID_AA64ISAR0_EL1sys_reg(3, 0, 0, 6, 0)
>> >>>
>> >>> -#definekfpu_allowed()  1
>> >>> +#definekfpu_allowed()  0
>> >>> #definekfpu_begin()kernel_neon_begin()
>> >>> #definekfpu_end()  kernel_neon_end()
>> >>> #definekfpu_init() (0)
>> >>>
>> >>>
>> >>
>> >> ops, wrong file
>> >>
>> >> diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> >> b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> >> index 178fbc3b3c6e..c462220289d6 100644
>> >> --- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> >> +++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> >> @@ -46,7 +46,7 @@
>> >> #include 
>> >> #include 
>> >>
>> >> -#definekfpu_allowed()  1
>> >> +#definekfpu_allowed()  0
>> >> #definekfpu_initialize(tsk)do {} while (0)
>> >> #definekfpu_begin()do {} while (0)
>> >> #definekfpu_end()  do {} while (0)
>> >
>> > It will take me a bit to setup a separate build/install
>> > context for the source code vintage involved. Then more
>> > time to do the build, install, and test. (I'm keeping
>> > my normal environments completely before the mess.)
>> >
>> > FYI:
>> >
>> > I have used the artifact build just after your pair of zfs
>> > related updates to confirm the VFP problem is still in
>> > place as of that point:
>> >
>> > https://artifact.ci.freebsd.org/snapshot/main/5e2e3615d91f9c0c688987915ff5c8de23c22bde/arm64/aarch64/kernel.txz
>> >
>> > (No artifact build was exactly at either of your commits.)
>> >
>> > ===
>> > Mark Millard
>> > marklmi at yahoo.com
>> >
>> >
>>
>> I have arm64 + zfs at $job and just verified the above lets it boot
>> again, so I committed already.
>>
>
> This was a known issue that we were working on fixing properly over in
> https://reviews.freebsd.org/D39448... this really could have waited
> just a little bit longer. This problem was already brought up in
> response to the commit in question days ago.
>

Mate, that's one confusing email.

I had seen the upstream review, apparently there is opposition to the
patch, it is clearly not going to land within hours.

Whatever the Real Fix(tm) might be, I'm confident my change has no
impact on work on it, past the need to flip kfpu_allowed back to 1.

At the same time things were broken to the point where aarch64 + zfs
literally did not boot. Once more, I fail to see how restoring basic
operation by flipping a macro to 0 throws any wrenches into the effort
to get simd working.

If anything the question is how come a clearly *not* implemented simd
support got kfpu_allowed set to 1.

-- 
Mateusz Guzik 



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75 [separate aarch64 panic for zpool import]

2023-04-07 Thread Mateusz Guzik
On 4/7/23, Mark Millard  wrote:
> On Apr 7, 2023, at 14:26, Mateusz Guzik  wrote:
>
>> On 4/7/23, Mateusz Guzik  wrote:
>>> can you try with this:
>>>
>>> diff --git
>>> a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>>> b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>>> index 16276b08c759..e1bca9ef140a 100644
>>> --- a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>>> +++ b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>>> @@ -71,7 +71,7 @@
>>> #defineID_AA64PFR0_EL1 sys_reg(3, 0, 0, 1, 0)
>>> #defineID_AA64ISAR0_EL1sys_reg(3, 0, 0, 6, 0)
>>>
>>> -#definekfpu_allowed()  1
>>> +#definekfpu_allowed()  0
>>> #definekfpu_begin()kernel_neon_begin()
>>> #definekfpu_end()  kernel_neon_end()
>>> #definekfpu_init() (0)
>>>
>>>
>>
>> ops, wrong file
>>
>> diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> index 178fbc3b3c6e..c462220289d6 100644
>> --- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> +++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> @@ -46,7 +46,7 @@
>> #include 
>> #include 
>>
>> -#definekfpu_allowed()  1
>> +#definekfpu_allowed()  0
>> #definekfpu_initialize(tsk)do {} while (0)
>> #definekfpu_begin()do {} while (0)
>> #definekfpu_end()  do {} while (0)
>
> It will take me a bit to setup a separate build/install
> context for the source code vintage involved. Then more
> time to do the build, install, and test. (I'm keeping
> my normal environments completely before the mess.)
>
> FYI:
>
> I have used the artifact build just after your pair of zfs
> related updates to confirm the VFP problem is still in
> place as of that point:
>
> https://artifact.ci.freebsd.org/snapshot/main/5e2e3615d91f9c0c688987915ff5c8de23c22bde/arm64/aarch64/kernel.txz
>
> (No artifact build was exactly at either of your commits.)
>
> ===
> Mark Millard
> marklmi at yahoo.com
>
>

I have arm64 + zfs at $job and just verified the above lets it boot
again, so I committed already.

-- 
Mateusz Guzik 



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75 [separate aarch64 panic for zpool import]

2023-04-07 Thread Mateusz Guzik
On 4/7/23, Mateusz Guzik  wrote:
> can you try with this:
>
> diff --git
> a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> index 16276b08c759..e1bca9ef140a 100644
> --- a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> +++ b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> @@ -71,7 +71,7 @@
>  #defineID_AA64PFR0_EL1 sys_reg(3, 0, 0, 1, 0)
>  #defineID_AA64ISAR0_EL1sys_reg(3, 0, 0, 6, 0)
>
> -#definekfpu_allowed()  1
> +#definekfpu_allowed()  0
>  #definekfpu_begin()kernel_neon_begin()
>  #definekfpu_end()  kernel_neon_end()
>  #definekfpu_init() (0)
>
>

ops, wrong file

diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
index 178fbc3b3c6e..c462220289d6 100644
--- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
+++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
@@ -46,7 +46,7 @@
 #include 
 #include 

-#define	kfpu_allowed()		1
+#define	kfpu_allowed()		0
 #define	kfpu_initialize(tsk)	do {} while (0)
 #define	kfpu_begin()		do {} while (0)
 #define	kfpu_end()		do {} while (0)


> On 4/7/23, Mark Millard  wrote:
>> Turns out that as of this commit aarch64 (Cortex-A72 and Cortex-A57
>> examples reported) gets the following even when no zfs media is
>> present (UFS boot):
>>
>> # zpool import
>>  x0: f0fa9168 (ucom_cons_softc + efbf1bb8)
>>  x1: ff90 ($d.1 + afa318)
>>  x2: ff900400 ($d.1 + afa718)
>>  x3: fec1b0a4 (sha_incremental + 0)
>>  x4:0
>>  x5:   10
>>  x6: 8e16db93
>>  x7:0
>>  x8: feb06168 (tf_sha256_neon + 0)
>>  x9: fea931fb ($d.1 + b)
>> x10: feb045f4 (SHA2Update + f4)
>> x11:   29
>> x12:1
>> x13:0
>> x14:0
>> x15:2
>> x16: feaf7500 ($d.0 + 0)
>> x17: 00476cf0 (nanouptime + 0)
>> x18: f0fa9000 (ucom_cons_softc + efbf1a50)
>> x19: f0fa9168 (ucom_cons_softc + efbf1bb8)
>> x20:  400
>> x21: ff90 ($d.1 + afa318)
>> x22: f0fa9198 (ucom_cons_softc + efbf1be8)
>> x23:0
>> x24:0
>> x25:0
>> x26: fed2df70 (sha256_neon_impl + 0)
>> x27:  203
>> x28:   31
>> x29: f0fa9040 (ucom_cons_softc + efbf1a90)
>>  sp: f0fa9000
>>  lr: feb04668 (SHA2Update + 168)
>> elr: feaf8684 (zfs_sha256_block_neon + 14)
>> spsr: 2045
>> esr: 1fe0
>> panic: VFP exception in the kernel
>> cpuid = 3
>> time = 1680786034
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>> vpanic() at vpanic+0x13c
>> panic() at panic+0x44
>> do_el1h_sync() at do_el1h_sync+0x210
>> handle_el1h_sync() at handle_el1h_sync+0x10
>> --- exception, esr 0xf0fa9198
>> (null)() at 0x400
>> KDB: enter: panic
>> [ thread pid 1446 tid 100101 ]
>> Stopped at  kdb_enter+0x44: undefined   f905c27f
>> db>
>>
>> The above was produced via using an artifact build's
>> kernel based on that exact commit:
>>
>> https://artifact.ci.freebsd.org/snapshot/main/2a58b312b62f908ec92311d1bd8536dbaeb8e55b/arm64/aarch64/kernel.txz
>>
>> By contrast, the prior commit had an artifact build
>> as well, but it's kernel does not get the panic for
>> zpool import :
>>
>> https://artifact.ci.freebsd.org/snapshot/main/b98fbf3781df16f7797b2bbeabf205dc7d4985ae/arm64/aarch64/kernel.txz
>>
>> See also:
>>
>> https://lists.freebsd.org/archives/freebsd-current/2023-April/003417.html
>>
>> ===
>> Mark Millard
>> marklmi at yahoo.com
>>
>>
>
>
> --
> Mateusz Guzik 
>


-- 
Mateusz Guzik 



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75 [separate aarch64 panic for zpool import]

2023-04-07 Thread Mateusz Guzik
can you try with this:

diff --git a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
index 16276b08c759..e1bca9ef140a 100644
--- a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
+++ b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
@@ -71,7 +71,7 @@
 #define	ID_AA64PFR0_EL1		sys_reg(3, 0, 0, 1, 0)
 #define	ID_AA64ISAR0_EL1	sys_reg(3, 0, 0, 6, 0)
 
-#define	kfpu_allowed()		1
+#define	kfpu_allowed()		0
 #define	kfpu_begin()		kernel_neon_begin()
 #define	kfpu_end()		kernel_neon_end()
 #define	kfpu_init()		(0)


On 4/7/23, Mark Millard  wrote:
> Turns out that as of this commit aarch64 (Cortex-A72 and Cortex-A57
> examples reported) gets the following even when no zfs media is
> present (UFS boot):
>
> # zpool import
>  x0: f0fa9168 (ucom_cons_softc + efbf1bb8)
>  x1: ff90 ($d.1 + afa318)
>  x2: ff900400 ($d.1 + afa718)
>  x3: fec1b0a4 (sha_incremental + 0)
>  x4:0
>  x5:   10
>  x6: 8e16db93
>  x7:0
>  x8: feb06168 (tf_sha256_neon + 0)
>  x9: fea931fb ($d.1 + b)
> x10: feb045f4 (SHA2Update + f4)
> x11:   29
> x12:1
> x13:0
> x14:0
> x15:2
> x16: feaf7500 ($d.0 + 0)
> x17: 00476cf0 (nanouptime + 0)
> x18: f0fa9000 (ucom_cons_softc + efbf1a50)
> x19: f0fa9168 (ucom_cons_softc + efbf1bb8)
> x20:  400
> x21: ff90 ($d.1 + afa318)
> x22: f0fa9198 (ucom_cons_softc + efbf1be8)
> x23:0
> x24:0
> x25:0
> x26: fed2df70 (sha256_neon_impl + 0)
> x27:  203
> x28:   31
> x29: f0fa9040 (ucom_cons_softc + efbf1a90)
>  sp: f0fa9000
>  lr: feb04668 (SHA2Update + 168)
> elr: feaf8684 (zfs_sha256_block_neon + 14)
> spsr: 2045
> esr: 1fe0
> panic: VFP exception in the kernel
> cpuid = 3
> time = 1680786034
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x13c
> panic() at panic+0x44
> do_el1h_sync() at do_el1h_sync+0x210
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0xf0fa9198
> (null)() at 0x400
> KDB: enter: panic
> [ thread pid 1446 tid 100101 ]
> Stopped at  kdb_enter+0x44: undefined   f905c27f
> db>
>
> The above was produced via using an artifact build's
> kernel based on that exact commit:
>
> https://artifact.ci.freebsd.org/snapshot/main/2a58b312b62f908ec92311d1bd8536dbaeb8e55b/arm64/aarch64/kernel.txz
>
> By contrast, the prior commit had an artifact build
> as well, but it's kernel does not get the panic for
> zpool import :
>
> https://artifact.ci.freebsd.org/snapshot/main/b98fbf3781df16f7797b2bbeabf205dc7d4985ae/arm64/aarch64/kernel.txz
>
> See also:
>
> https://lists.freebsd.org/archives/freebsd-current/2023-April/003417.html
>
> ===
> Mark Millard
> marklmi at yahoo.com
>
>


-- 
Mateusz Guzik 



Re: n262026-37d97b10ff0e installworld failure

2023-04-07 Thread Mateusz Guzik
On 4/7/23, Graham Perrin  wrote:
> Log: <https://bsd.to/SSV4/raw>
>
> Any ideas?
>
> 37d97b10ff0e was around twelve hours ago,
> <https://cgit.freebsd.org/src/log/?qt=range=37d97b10ff0e>
>

I pushed the fix. git pull, make sure you are at
20be1b4fc4b72f10d5f9411e5bbde0f46a98be5b or later. build and install
the new kernel, only then proceed with installworld and you should be
fine.
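
A sketch of the usual sequence (rebooting into the fixed kernel before
touching world is my assumption here, but it is the standard order anyway):

cd /usr/src && git pull
make -j$(sysctl -n hw.ncpu) buildworld buildkernel
make installkernel
shutdown -r now
# after reboot:
cd /usr/src && make installworld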

-- 
Mateusz Guzik 



Re: n262026-37d97b10ff0e installworld failure

2023-04-07 Thread Mateusz Guzik
yes, this is the recent zfs breakage. temporarily you can work around
by installing the patched cp by hand. it is already in the tree.
alternatively get yourself https://github.com/openzfs/zfs/pull/14723
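
A sketch of the by-hand install of the fixed cp (run from an up-to-date tree;
a prior make obj or buildworld may be needed depending on the setup):

cd /usr/src/bin/cp
make && make install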

On 4/7/23, Graham Perrin  wrote:
> Log: <https://bsd.to/SSV4/raw>
>
> Any ideas?
>
> 37d97b10ff0e was around twelve hours ago,
> <https://cgit.freebsd.org/src/log/?qt=range=37d97b10ff0e>
>
>


-- 
Mateusz Guzik 



Re: NanoBSD: CURRENT unable to compile 13-STABLE : ld: error: args.o: Opaque pointers are only supported in -opaque-pointers mode (Producer: 'LLVM15.0.7' Reader: 'LLVM 14.0.5')

2023-03-30 Thread Mateusz Guzik
On 3/30/23, Mateusz Guzik  wrote:
> On 3/30/23, FreeBSD User  wrote:
>> Hello folks,
>>
>> some strange misbehaviour in a NanoBSD compilation is driving me nuts.
>> Recently I posted some
>> error messages regarding
>>
>> [...]
>> src/sys/dev/an/if_an_pci.c:143:1: error: a
>> function definition without a prototype is deprecated in all versions of
>> C
>> and is not
>> supported in C2x [-Werror,-Wdeprecated-non-prototype]
>> [...]
>>
>> but being able compiling the kernel was "a lucky shot/mistake" and in the
>> vain of discussion
>> it has been revealed that my nanoBSD specific "make.conf/src.conf"
>> configurations were wrong.
>>
>> So, again:
>>
>> The builder host is a recent CURRENT (FreeBSD 14.0-CURRENT #2
>> main-n261876-f5a365e51fee: Thu
>> Mar 30 11:23:19 CEST 2023 amd64), the target is a most recent 13-STABLE
>> (git
>> pull on a
>> daily/hourly/most recentl basis when trying to build).
>>
>> As I understand the src/buildworld config, it seems crucial to have
>> CURRENT
>> and 13-STABLE
>> somehow separated due to their divergende in used LLVM/CLANG (CURRENT has
>> LLVM 15, 13-STABLE
>> is with LLVM 14).
>>
>> Putting
>>
>> WITHOUT_SYSTEM_COMPILER=YES
>> WITHOUT_SYSTEM_LINKER=YES
>>
>> into CONF_BUILD= AND CONF_WORLD= of NanoBSD configuration should prevent
>> the
>> usage of
>> CURRENT's LLVM 15 and instead a cross compiling with 13-STABLE's LLVM 14
>> compiler and linker
>> should be used to buildworld.
>>
>> But this doesn't seem to happen (at least in my case), since buildworld
>> fails to build with:
>>
>> [...]
>> cc -target x86_64-unknown-freebsd13.2
>> --sysroot=/pool/home/ohartmann/Projects/router/router/apu2c4/world/obj/amd64/ALERICH_13-STABLE_amd64/pool/home/ohartmann/Projects/router/router/apu2c4/src/amd64.amd64/tmp
>> -B/pool/home/ohartmann/Projects/router/router/apu2c4/world/obj/amd64/ALERICH_13-STABLE_amd64/pool/home/ohartmann/Projects/router/router/apu2c4/src/amd64.amd64/tmp/usr/bin
>> -O2 -pipe -fno-common -DMAINEXEC=bc -DNLSPATH=/usr/share/nls/%L/%N.cat
>> -DBUILD_TYPE=A
>> -DBC_DEFAULT_BANNER=0 -DBC_DEFAULT_PROMPT=0 -DBC_DEFAULT_SIGINT_RESET
>> -DBC_DEFAULT_TTY_MODE
>> -DBC_ENABLED -DBC_ENABLE_EDITLINE -DBC_ENABLE_EXTRA_MATH
>> -DBC_ENABLE_LIBRARY=0
>> -DBC_ENABLE_LONG_OPTIONS -DBC_ENABLE_HISTORY -DBC_ENABLE_PROMPT
>> -DBC_ENABLE_RAND
>> -DDC_DEFAULT_PROMPT=0 -DDC_DEFAULT_SIGINT_RESET -DDC_DEFAULT_TTY_MODE=0
>> -DDC_ENABLED -DNDEBUG
>> -I/pool/home/ohartmann/Projects/router/router/apu2c4/src/contrib/bc/include
>> -DBC_ENABLE_NLS=1
>> -flto -DNDEBUG -fPIE -mretpoline -ftrivial-auto-var-init=zero
>> -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
>> -std=gnu99
>> -Wno-format-zero-length -fstack-protector-strong -Wsystem-headers -Wall
>> -Wno-format-y2k -W
>> -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes
>> -Wpointer-arith -Wreturn-type
>> -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter
>> -Wcast-align
>> -Wchar-subscripts -Wnested-externs -Wold-style-definition
>> -Wno-pointer-sign
>> -Wmissing-variable-declarations -Wthread-safety -Wno-empty-body
>> -Wno-string-plus-int
>> -Wno-unused-const-variable -Wno-error=unused-but-set-variable
>> -Qunused-arguments  -Wl,-zrelro
>> -pie -Wl,-zretpolineplt   -o gh-bc args.o bc.o bc_lex.o bc_parse.o data.o
>> dc.o dc_lex.o
>> dc_parse.o file.o history.o lang.o lex.o main.o num.o opt.o parse.o
>> program.o rand.o read.o
>> vector.o vm.o bc_help.o dc_help.o lib.o lib2.o   -ledit ld: error:
>> args.o:
>> Opaque pointers are
>> only supported in -opaque-pointers mode (Producer: 'LLVM15.0.7' Reader:
>> 'LLVM 14.0.5') cc:
>> error: linker command failed with exit code 1 (use -v to see invocation)
>> ***
>> [gh-bc] Error
>> code 1
>>
>> make[5]: stopped in
>> /pool/home/ohartmann/Projects/router/router/apu2c4/src/usr.bin/gh-bc
>> [...]
>>
>>
>> I'm now out of options here :-(
>>
>
> are you even using the dev/an driver?
>
> you should probably just remove it from the kernel (and any other
> driver of the sort)
>
> ultimately you should be able to stick to the compiler from main. in
> the worst case the commit to turn "function definition without a
> prototype is deprecated" from errors to warnings could be merged to
> stable/13 to facilitate the build
>
> it may be you will be able to get away with modifying CFLAGS like so:
> CFLAGS+=-Wno-deprecated-non-prototype
>
> in src.conf and/or make.conf
>

So I looked into it and landed
https://cgit.FreeBSD.org/src/commit/?id=82eb549f800e08158802b74bef62e7db0939a3fe

As of that commit I can both buildworld and buildkernel a stable/13
tree while running main, without any magic to change compilers.

-- 
Mateusz Guzik 



Re: NanoBSD: CURRENT unable to compile 13-STABLE : ld: error: args.o: Opaque pointers are only supported in -opaque-pointers mode (Producer: 'LLVM15.0.7' Reader: 'LLVM 14.0.5')

2023-03-30 Thread Mateusz Guzik
On 3/30/23, FreeBSD User  wrote:
> Hello folks,
>
> some strange misbehaviour in a NanoBSD compilation is driving me nuts.
> Recently I posted some
> error messages regarding
>
> [...]
> src/sys/dev/an/if_an_pci.c:143:1: error: a
> function definition without a prototype is deprecated in all versions of C
> and is not
> supported in C2x [-Werror,-Wdeprecated-non-prototype]
> [...]
>
> but being able compiling the kernel was "a lucky shot/mistake" and in the
> vain of discussion
> it has been revealed that my nanoBSD specific "make.conf/src.conf"
> configurations were wrong.
>
> So, again:
>
> The builder host is a recent CURRENT (FreeBSD 14.0-CURRENT #2
> main-n261876-f5a365e51fee: Thu
> Mar 30 11:23:19 CEST 2023 amd64), the target is a most recent 13-STABLE (git
> pull on a
> daily/hourly/most recentl basis when trying to build).
>
> As I understand the src/buildworld config, it seems crucial to have CURRENT
> and 13-STABLE
> somehow separated due to their divergende in used LLVM/CLANG (CURRENT has
> LLVM 15, 13-STABLE
> is with LLVM 14).
>
> Putting
>
> WITHOUT_SYSTEM_COMPILER=YES
> WITHOUT_SYSTEM_LINKER=YES
>
> into CONF_BUILD= AND CONF_WORLD= of NanoBSD configuration should prevent the
> usage of
> CURRENT's LLVM 15 and instead a cross compiling with 13-STABLE's LLVM 14
> compiler and linker
> should be used to buildworld.
>
> But this doesn't seem to happen (at least in my case), since buildworld
> fails to build with:
>
> [...]
> cc -target x86_64-unknown-freebsd13.2
> --sysroot=/pool/home/ohartmann/Projects/router/router/apu2c4/world/obj/amd64/ALERICH_13-STABLE_amd64/pool/home/ohartmann/Projects/router/router/apu2c4/src/amd64.amd64/tmp
> -B/pool/home/ohartmann/Projects/router/router/apu2c4/world/obj/amd64/ALERICH_13-STABLE_amd64/pool/home/ohartmann/Projects/router/router/apu2c4/src/amd64.amd64/tmp/usr/bin
> -O2 -pipe -fno-common -DMAINEXEC=bc -DNLSPATH=/usr/share/nls/%L/%N.cat
> -DBUILD_TYPE=A
> -DBC_DEFAULT_BANNER=0 -DBC_DEFAULT_PROMPT=0 -DBC_DEFAULT_SIGINT_RESET
> -DBC_DEFAULT_TTY_MODE
> -DBC_ENABLED -DBC_ENABLE_EDITLINE -DBC_ENABLE_EXTRA_MATH
> -DBC_ENABLE_LIBRARY=0
> -DBC_ENABLE_LONG_OPTIONS -DBC_ENABLE_HISTORY -DBC_ENABLE_PROMPT
> -DBC_ENABLE_RAND
> -DDC_DEFAULT_PROMPT=0 -DDC_DEFAULT_SIGINT_RESET -DDC_DEFAULT_TTY_MODE=0
> -DDC_ENABLED -DNDEBUG
> -I/pool/home/ohartmann/Projects/router/router/apu2c4/src/contrib/bc/include
> -DBC_ENABLE_NLS=1
> -flto -DNDEBUG -fPIE -mretpoline -ftrivial-auto-var-init=zero
> -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
> -std=gnu99
> -Wno-format-zero-length -fstack-protector-strong -Wsystem-headers -Wall
> -Wno-format-y2k -W
> -Wno-unused-parameter -Wstrict-prototypes -Wmissing-prototypes
> -Wpointer-arith -Wreturn-type
> -Wcast-qual -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter
> -Wcast-align
> -Wchar-subscripts -Wnested-externs -Wold-style-definition -Wno-pointer-sign
> -Wmissing-variable-declarations -Wthread-safety -Wno-empty-body
> -Wno-string-plus-int
> -Wno-unused-const-variable -Wno-error=unused-but-set-variable
> -Qunused-arguments  -Wl,-zrelro
> -pie -Wl,-zretpolineplt   -o gh-bc args.o bc.o bc_lex.o bc_parse.o data.o
> dc.o dc_lex.o
> dc_parse.o file.o history.o lang.o lex.o main.o num.o opt.o parse.o
> program.o rand.o read.o
> vector.o vm.o bc_help.o dc_help.o lib.o lib2.o   -ledit ld: error: args.o:
> Opaque pointers are
> only supported in -opaque-pointers mode (Producer: 'LLVM15.0.7' Reader:
> 'LLVM 14.0.5') cc:
> error: linker command failed with exit code 1 (use -v to see invocation) ***
> [gh-bc] Error
> code 1
>
> make[5]: stopped in
> /pool/home/ohartmann/Projects/router/router/apu2c4/src/usr.bin/gh-bc
> [...]
>
>
> I'm now out of options here :-(
>

are you even using the dev/an driver?

you should probably just remove it from the kernel (and any other
driver of the sort)

ultimately you should be able to stick to the compiler from main. in
the worst case the commit to turn "function definition without a
prototype is deprecated" from errors to warnings could be merged to
stable/13 to facilitate the build

it may be you will be able to get away with modifying CFLAGS like so:
CFLAGS+=-Wno-deprecated-non-prototype

in src.conf and/or make.conf

-- 
Mateusz Guzik 



Re: oudriere/CURRENT jail upgrade: install error: libc.so.7: Operation not permitted *** [_libinstall]

2023-03-15 Thread Mateusz Guzik
I don't know why this broke, but as a temp hack you can probably get
away with just clearing the schg flags, like so: chflags -R noschg
/path/to/the/jail
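
i.e. something along the lines of (jail name and path taken from your mail,
so adjust as needed):

chflags -R noschg /pool/poudriere/jails/head-amd64
poudriere jail -u -j head-amd64

and see whether the update gets further.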

On 3/15/23, FreeBSD User  wrote:
> Hello,
>
> running CURRENT on the host and a CURRENT jail on that specific host for
> poudriere test
> purposes, building the jail from sources succeeded, but installing stopped
> working a couple of
> weeks ago (installing 13-STABLE jails stopped working, too, but building
> 13-STABLE on top of
> CURRENT works fine).
>
> The jail's base is built like a PkgBase. Then I try to update the existing
> jail via
>
> poudriere jail -j head-amd64
>
> and that fails with (as 13-stable does also):
>
> [...]
> install -N /pool/sources/CURRENT/src/etc  -s -o root -g wheel -m 555
> mknetid
> /pool/poudriere/jails/head-amd64/usr/libexec/mknetid ---
> realinstall_subdir_lib ---
> install: rename: /pool/poudriere/jails/head-amd64/lib/INS@uPWcSw to
> /pool/poudriere/jails/head-amd64/lib/libc.so.7: Operation not permitted ***
> [_libinstall]
> Error code 71
> [...]
>
> The same with building the jail running "poudriere -u -b". The same is for
> 13-STABLE.
>
> The only way to circumvent this issue is to delete the jail and install it.
> The installation
> succeeds in both updating-failing scenarios (pkgbase and poudriere built
> case), which leads me
> to the conclusion, that a minor bug is preventing the update.
>
> Any suggestions how to make updating work again?
>
> Kind regards,
>
> oh
>
>
> --
> O. Hartmann
>
>


-- 
Mateusz Guzik 



Re: FreeBSD 13.2-stable crash in /usr/src/sys/amd64/include/pcpu_aux.h:55

2023-02-19 Thread Mateusz Guzik
looks like a jail problem, maintainer cc'ed

On 2/19/23, Michael Jung  wrote:
> After upgrading from
>
> FreeBSD firewall.mikej.com 13.1-STABLE FreeBSD 13.1-STABLE #21
> stable/13-n253337-16603f60156e:Wed Dec 28 08:22:48 EST 2022
> mi...@firewall.mikej.com:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
>
> TO
>
> FreeBSD firewall.mikej.com 13.2-STABLE FreeBSD 13.2-STABLE #3
> stable/13-n254483-e0c3f2a1e296: Tue Feb 14 19:25:51 EST 2023
> mi...@firewall.mikej.com:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
>
> I had a kernel crash which can be found here.
>
> http://mail.mikej.com/core.txt.0
>
> It has not happened again, but I'm putting it though it's normal paces.
>
> The only thing that was occurring which is not a normal thing for me to
> is I was moving TB's worth of data between directly attached two zpools.
>
> Regards,
>
> Michael Jung
>
>
>
>
>


-- 
Mateusz Guzik 



buildkernel avoidably building modules specified in the config

2023-02-07 Thread Mateusz Guzik
... then a lot of the code ends up being compiled twice for no good reason.

This popped up again as clang 15 emits a ton of warnings vs K&R funcs.

I don't know how this works internally, is it really a big problem to sort out?
I figured config(8) could generate a bunch of WITHOUT_ knobs or similar, but I guess
sys/modules/Makefile will have to be patched to support it, which it
only does for some modules at the moment.
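
For anyone who just wants to stop building the unneeded modules today, the
existing (if blunt) knob is MODULES_OVERRIDE, e.g. in the kernel config:

makeoptions	MODULES_OVERRIDE="zfs linux linux64"

(module list made up); the point above is that config(8) could in principle
derive something like this automatically from the device lines.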

I don't have any interest in working on it, so just bringing this up
for interested.
-- 
Mateusz Guzik 



Re: 1 year src-patch anniversary!

2023-01-29 Thread Mateusz Guzik
On 1/29/23, Mateusz Guzik  wrote:
> On 1/29/23, Jamie Landeg-Jones  wrote:
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261657 is a trivial fix
>> to an admittedly trivial issue, but it's soon going to hit one year old,
>> and has not had any feedback. Not even "this is rubbish. close ticket"
>>
>> | jamie@catwalk:~ % stat 'so good they named it twice'
>> | stat: so good they named it twice: stat: No such file or directory
>>
>> As such, it's the oldest of my patches to be completely ignored, but
>> then,
>> most of my fixes I haven't even submitted, because, what's the point?
>> I've instead spent time writing something so the patches are
>> automatically
>> applied to my src tree, and distributed to all my servers.
>>
>> I know it's a volunteer effort, but I've been here 25 years, and whilst
>> I could (and should) take on more port-maintainership, any other offers
>> of help have fallen on deaf ears.
>>
>
> Well I was not aware of it.
>
> mail me with git format-patch result and I'll commit.
>

also make sure the commit message starts with: stat:

-- 
Mateusz Guzik 



Re: 1 year src-patch anniversary!

2023-01-29 Thread Mateusz Guzik
On 1/29/23, Jamie Landeg-Jones  wrote:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261657 is a trivial fix
> to an admittedly trivial issue, but it's soon going to hit one year old,
> and has not had any feedback. Not even "this is rubbish. close ticket"
>
> | jamie@catwalk:~ % stat 'so good they named it twice'
> | stat: so good they named it twice: stat: No such file or directory
>
> As such, it's the oldest of my patches to be completely ignored, but then,
> most of my fixes I haven't even submitted, because, what's the point?
> I've instead spent time writing something so the patches are automatically
> applied to my src tree, and distributed to all my servers.
>
> I know it's a volunteer effort, but I've been here 25 years, and whilst
> I could (and should) take on more port-maintainership, any other offers
> of help have fallen on deaf ears.
>

Well I was not aware of it.

mail me with git format-patch result and I'll commit.

-- 
Mateusz Guzik 



Re: A panic a day

2022-09-22 Thread Mateusz Guzik
On 9/22/22, Steve Kargl  wrote:
> On Thu, Sep 22, 2022 at 03:00:53PM -0400, Mark Johnston wrote:
>> On Thu, Sep 22, 2022 at 11:31:40AM -0700, Steve Kargl wrote:
>> > All,
>> >
>> > I updated my kernel/world/all ports on Sept 19 2022.
>> > Since then, I have had daily panics and hard lock-up
>> > (no panic, keyboard, mouse, network, ...).  The one
>> > panic I did witness sent text scrolling off the screen.
>> > There is no dump, or at least, I haven't figured out
>> > a way to get a dump.
>> >
>> > Using ports/graphics/tesseract and then hand editing
>> > the OCR result, the last visible portions is
>> >
>> >
>
> (panic messages removed).
>
>> It looks like you use the 4BSD scheduler?  I think there's a bug in
>> kick_other_cpu() in that it doesn't make sure that the remote CPU's
>> curthread lock is held when modifying thread state.  Because 4BSD has a
>> global scheduler lock, this is often true in practice, but doesn't have
>> to be.
>
> Yes, I use 4BSD.  ULE has very poor performance for HPC type work with
> OpenMPI.
>

Is there an easy way to set it up for testing purposes?

>> I think this untested patch will address the panics.  The bug was there
>> for a long time but some recent restructuring added an assertion which
>> caught it.
>
> I'll give it a try, and report back.  Thanks!
>
> --
> steve
>
>> diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
>> index 9d48aa746f6d..484864b66c1c 100644
>> --- a/sys/kern/sched_4bsd.c
>> +++ b/sys/kern/sched_4bsd.c
>> @@ -1282,9 +1282,10 @@ kick_other_cpu(int pri, int cpuid)
>>  }
>>  #endif /* defined(IPI_PREEMPTION) && defined(PREEMPTION) */
>>
>> -ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
>> -ipi_cpu(cpuid, IPI_AST);
>> -return;
>> +if (pcpu->pc_curthread->td_lock == &sched_lock) {
>> +ast_sched_locked(pcpu->pc_curthread, TDA_SCHED);
>> +ipi_cpu(cpuid, IPI_AST);
>> +}
>>  }
>>  #endif /* SMP */
>>
>> @@ -1397,7 +1398,7 @@ sched_add(struct thread *td, int flags)
>>
>>  cpuid = PCPU_GET(cpuid);
>>  if (single_cpu && cpu != cpuid) {
>> -kick_other_cpu(td->td_priority, cpu);
>> +kick_other_cpu(td->td_priority, cpu);
>>  } else {
>>  if (!single_cpu) {
>>  tidlemsk = idle_cpus_mask;
>
> --
> Steve
>
>


-- 
Mateusz Guzik 



Re: panic after update from main-n258027-c9baa974717a to main-n258075-5b5b7e2ca2fa

2022-09-17 Thread Mateusz Guzik
this is already fixed, please update

On 9/17/22, David Wolfskill  wrote:
> Not reproducible on reboot; only happened on one machine (main laptop)
> out of 3 that I updated this morning.  No dump. :-/
>
> A screenshot, a copy of the full dmesg.boot from the
> immediately-following successful (verbose) boot, and a copy of the
> uname output:
>
> FreeBSD 14.0-CURRENT #589 main-n258075-5b5b7e2ca2fa: Sat Sep 17 12:22:57 UTC
> 2022
> r...@g1-70.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> amd64 1400068 1400068
>
> may be found at https://www.catwhisker.org/~david/FreeBSD/head/n258075/
>
> The screenshot includes a backtrace; a hand-transcription:
>
> Trying to mount root from ufs:/dev/ada0s4a [rw]...
> panic: Assertion _ndp->ni_cnd.cn_pnbuf != NULL failed at
> /usr/src/sys/kern/vfs_mountroot.c:731
> cpuid = 1
> time = 2
> KDB: stack backtrace:
> db_trace_self_wrapper() at 0x804b9dfb =
> db_trace_self_wrapper+0x2b/frame 0xfe0fba5d4af0
> vpanic() at 0x80bd90f1 = vpanic+0x151/frame 0xfe0fba5d4b40
> panic() at 0x80bd8ec3 = panic+0x43/frame 0xfe0fba5d4ba0
> parse_mount_dev_present() at 0x80cc7f76 =
> parse_mount_dev_present+0x116/frame 0xfe0fba5d4c90
> parse_mount() at 0x80cc7c49 = parse_mount+0x5c9/frame
> 0xfe0fba5d4d00
> vfs_mountroot() at 0x80cc60c3 = vfs_mountroot+0x7c3/frame
> 0xfe0fba5d4e60
> start_init() at 0x80b60093 = start_init+0x23/frame
> 0xfe0fba5d4ef0
> fork_exit() at 0x80b8f770 = fork_exit+0x80/frame 0xfe0fba5d4f30
> fork_trampoline() at 0x810c264e = fork_trampoline+0xe/frame
> 0xfe0fba5d4f30
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
> [ thread pid 1 tid 12 ]
> Stopped at0x80c26ac2 = kdb_enter+0x32:movq
> $0,0x12ab643(%rip)
> db>
>
>
>
> For all I know, the machine may be a little flaky -- it is the
> oldest of the 3.  But I thought it might be worth mentioning.
>
> Peace,
> david
> --
> David H. Wolfskill  da...@catwhisker.org
> "In my administration, I'm going to enforce all laws concerning the
> protection of classified information. No one will be above the law."
>  -- D. Trump, August, 2016
>
> See https://www.catwhisker.org/~david/publickey.gpg for my public key.
>


-- 
Mateusz Guzik 



Re: build of vfs_lookup.c now broken in non-INVARIANTS kernels

2022-09-17 Thread Mateusz Guzik
fixed in 
https://cgit.freebsd.org/src/commit/?id=b77bdfdb67c2e9660658a0373662e4263a905e90

On 9/17/22, Gary Jennejohn  wrote:
> Compiling vfs_lookup.c now fails when INVARIANTS is not included in
> the kernel config file because NDVALIDATE is defined as NDVALIDATE_impl,
> which itself is only defined when INVARIANTS is defined.
>
> This breaks buildkernel.
>
> --
> Gary Jennejohn
>
>


-- 
Mateusz Guzik 



Re: kernel-side thread stack swapping

2022-09-02 Thread Mateusz Guzik
On 9/2/22, Konstantin Belousov  wrote:
> On Fri, Sep 02, 2022 at 04:11:40PM +0200, Mateusz Guzik wrote:
>> On 9/2/22, Konstantin Belousov  wrote:
>> > On Fri, Sep 02, 2022 at 02:05:37PM +0200, Mateusz Guzik wrote:
>> >> Is this really of practical use today?
>> >>
>> >> I have a WIP patch which needs to temporarily store something on the
>> >> stack and should things go wrong enough it will be accessed by UMA,
>> >> which can't handle the fault nor decide to skip the access.
>> >>
>> >> I can add something like td_pinstack or whatever to keep it around,
>> >> but perhaps the entire machinery can be just whacked?
>> > p_hold already does that.
>> >
>>
>> I only need to protect the one stack and more importantly don't want
>> to take the proc lock to bump p_hold (nor convert it to atomics), it's
>> all thread-local so to speak.
>
> You do not want to take proc lock, or cannot?  Note that only sleeping
> thread's stack can be swapped out.
>

To add some context here I'm looking at reworking vnode batching in
vdrop -> vdbatch_enqueue to remove vnode interlock -> vdbatch lock ->
vnode list lock dependency (and improve scalability of the thing).

Adding a proc lock here would negatively affect performance for
everyone *and* weirdly serialize same-proc consumers.

-- 
Mateusz Guzik 



Re: kernel-side thread stack swapping

2022-09-02 Thread Mateusz Guzik
On 9/2/22, Konstantin Belousov  wrote:
> On Fri, Sep 02, 2022 at 02:05:37PM +0200, Mateusz Guzik wrote:
>> Is this really of practical use today?
>>
>> I have a WIP patch which needs to temporarily store something on the
>> stack and should things go wrong enough it will be accessed by UMA,
>> which can't handle the fault nor decide to skip the access.
>>
>> I can add something like td_pinstack or whatever to keep it around,
>> but perhaps the entire machinery can be just whacked?
> p_hold already does that.
>

I only need to protect the one stack and more importantly don't want
to take the proc lock to bump p_hold (nor convert it to atomics), it's
all thread-local so to speak.

-- 
Mateusz Guzik 



kernel-side thread stack swapping

2022-09-02 Thread Mateusz Guzik
Is this really of practical use today?

I have a WIP patch which needs to temporarily store something on the
stack and should things go wrong enough it will be accessed by UMA,
which can't handle the fault nor decide to skip the access.

I can add something like td_pinstack or whatever to keep it around,
but perhaps the entire machinery can be just whacked?
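
To make the td_pinstack idea a bit more concrete, a rough sketch is below;
the field, the helper names and the check site in the swap-out path are all
assumptions for illustration, not existing kernel API:

/* Hypothetical per-thread stack pin; none of these names exist in the tree. */
static __inline void
thread_pin_stack(struct thread *td)
{
	td->td_pinstack++;		/* assumed new counter in struct thread */
}

static __inline void
thread_unpin_stack(struct thread *td)
{
	MPASS(td->td_pinstack > 0);
	td->td_pinstack--;
}

/* The stack swap-out path would then skip pinned threads, roughly: */
if (td->td_pinstack != 0)
	continue;			/* keep this kernel stack resident */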

-- 
Mateusz Guzik 



Re: Lots of port failures today?

2022-08-18 Thread Mateusz Guzik
this should do it:
https://cgit.FreeBSD.org/src/commit/?id=545db925c3d5408e71e21432895770cd49fd2cf3

On 8/19/22, Shawn Webb  wrote:
> On Thu, Aug 18, 2022 at 02:28:58PM -0700, Mark Millard wrote:
>> Larry Rosenman  wrote on
>> Date: Thu, 18 Aug 2022 19:45:10 UTC :
>>
>> > https://home.lerctr.org:/build.html?mastername=live-host_ports=2022-08-18_13h12m51s
>> >
>> > circa 97ecdc00ac5 on main
>> > Ideas?
>>
>> Unsure but . . .
>>
>> A bunch of your errors start with text looking like:
>>
>> QUOTE
>> CMake Error:
>>   The detected version of Ninja () is less than the version of Ninja
>> required
>>   by CMake (1.3).
>> END QUOTE
>>
>
> The 14-CURRENT/amd64 package build I kicked off yesterday for
> HardenedBSD is experiencing the same exact failure. Nearly 12,000
> ports skipped:
>
> http://ci-08.md.hardenedbsd.org/build.html?mastername=hardenedbsd-current_amd64-local=2022-08-17_20h01m01s
>
> Thanks,
>
> --
> Shawn Webb
> Cofounder / Security Engineer
> HardenedBSD
>
> https://git.hardenedbsd.org/hardenedbsd/pubkeys/-/raw/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc
>


-- 
Mateusz Guzik 



Re: Lots of port failures today?

2022-08-18 Thread Mateusz Guzik
yeah, just get the stock top of the tree

On 8/18/22, Larry Rosenman  wrote:
> On 08/18/2022 4:25 pm, Mateusz Guzik wrote:
>> On 8/18/22, Mateusz Guzik  wrote:
>>> On 8/18/22, Larry Rosenman  wrote:
>>>> https://home.lerctr.org:/build.html?mastername=live-host_ports=2022-08-18_13h12m51s
>>>>
>>>> circa 97ecdc00ac5 on main
>>>> Ideas?
>>>>
>>>
>>> try with 9ac6eda6c6a36db6bffa01be7faea24f8bb92a0f reverted
>>>
>>
>> I'm pretty sure it will be fixed with  URL:
>> https://cgit.FreeBSD.org/src/commit/?id=545db925c3d5408e71e21432895770cd49fd2cf3
> should I un-revert 9ac6eda6c6a36db6bffa01be7faea24f8bb92a0f and pick up
> a new pull?
> --
> Larry Rosenman http://www.lerctr.org/~ler
> Phone: +1 214-642-9640 E-Mail: l...@lerctr.org
> US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
>


-- 
Mateusz Guzik 



Re: Lots of port failures today?

2022-08-18 Thread Mateusz Guzik
On 8/18/22, Mateusz Guzik  wrote:
> On 8/18/22, Larry Rosenman  wrote:
>> https://home.lerctr.org:/build.html?mastername=live-host_ports=2022-08-18_13h12m51s
>>
>> circa 97ecdc00ac5 on main
>> Ideas?
>>
>
> try with 9ac6eda6c6a36db6bffa01be7faea24f8bb92a0f reverted
>

I'm pretty sure it will be fixed with  URL:
https://cgit.FreeBSD.org/src/commit/?id=545db925c3d5408e71e21432895770cd49fd2cf3

-- 
Mateusz Guzik 



Re: Lots of port failures today?

2022-08-18 Thread Mateusz Guzik
On 8/18/22, Larry Rosenman  wrote:
> https://home.lerctr.org:/build.html?mastername=live-host_ports=2022-08-18_13h12m51s
>
> circa 97ecdc00ac5 on main
> Ideas?
>

try with 9ac6eda6c6a36db6bffa01be7faea24f8bb92a0f reverted

-- 
Mateusz Guzik 



Re: Can't build with INVARIANTS but not WITNESS

2022-04-27 Thread Mateusz Guzik
On 4/27/22, John F Carr  wrote:
> My -CURRENT kernel has INVARIANTS (inherited from GENERIC) but not WITNESS:
>
> include GENERIC
> ident   STRIATUS
> nooptions   WITNESS
> nooptions   WITNESS_SKIPSPIN
>
> My kernel build fails:
>
> /usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:102:13: error: variable
> 'line' set but not used [-Werror,-Wunused-but-set-variable]
> int flags, line __diagused;
>^
> /usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:101:14: error: variable
> 'file' set but not used [-Werror,-Wunused-but-set-variable]
> const char *file __diagused;
>
> The problem is, __diagused expands to nothing if INVARIANTS _or_ WITNESS is
> defined, but the variable in vfs_lookup.c is only used if WITNESS is
> defined.
>
> #if defined(INVARIANTS) || defined(WITNESS)
> #define __diagused
> #else
> #define __diagused  __unused
> #endif
>
> I think this code is trying to be too clever and causing more trouble than
> it prevents.  Change the || to &&, or replace __diagused with __unused
> everywhere.
>

I disagree. The entire point is to not end up with actually unused
variables even when the relevant option is enabled.

I patched it up in
https://cgit.FreeBSD.org/src/commit/?id=b40c0db6f6d61ed594118d81dc691b9263a7e4d7
. This still allows for actually unused vars when only one of INVARIANTS or
WITNESS is defined, but that's a much smaller problem than allowing it
in general.

-- 
Mateusz Guzik 



Re: nullfs and ZFS issues

2022-04-21 Thread Mateusz Guzik
On 4/21/22, Alexander Leidinger  wrote:
> Quoting Doug Ambrisko  (from Wed, 20 Apr 2022
> 09:20:33 -0700):
>
>> On Wed, Apr 20, 2022 at 11:39:44AM +0200, Alexander Leidinger wrote:
>> | Quoting Doug Ambrisko  (from Mon, 18 Apr 2022
>> | 16:32:38 -0700):
>> |
>> | > With nullfs, nocache and settings max vnodes to a low number I can
>> |
>> | Where is nocache documented? I don't see it in mount_nullfs(8),
>> | mount(8) or nullfs(5).
>>
>> I didn't find it but it is in:
>>  src/sys/fs/nullfs/null_vfsops.c:  if (vfs_getopt(mp->mnt_optnew,
>> "nocache", NULL, NULL) == 0 ||
>>
>> Also some file systems disable it via MNTK_NULL_NOCACHE
>
> Does the attached diff look ok?
>
>> | I tried a nullfs mount with nocache and it doesn't show up in the
>> | output of "mount".
>>
>> Yep, I saw that as well.  I could tell by dropping into ddb and then
>> do a show mount on the FS and look at the count.  That is why I added
>> the vnode count to mount -v so I could see the usage without dropping
>> into ddb.
>
> I tried nocache on a system with a lot of jails which use nullfs,
> which showed very slow behavior in the daily periodic runs (12h runs
> in the night after boot, 24h or more in subsequent nights). Now the
> first nightly run after boot was finished after 4h.
>
> What is the benefit of not disabling the cache in nullfs? I would
> expect zfs (or ufs) to cache the (meta)data anyway.
>

does the poor performance show up with
https://people.freebsd.org/~mjg/vnlru_free_pick.diff ?

if the long runs are still there, can you get some profiling from it?
sysctl -a before and after would be a start.

My guess is that you are at the vnode limit and bumping into the 1 second sleep.

-- 
Mateusz Guzik 



Re: nullfs and ZFS issues

2022-04-20 Thread Mateusz Guzik
On 4/19/22, Doug Ambrisko  wrote:
> On Tue, Apr 19, 2022 at 11:47:22AM +0200, Mateusz Guzik wrote:
> | Try this: https://people.freebsd.org/~mjg/vnlru_free_pick.diff
> |
> | this is not committable but should validate whether it works fine
>
> As a POC it's working.  I see the vnode count for the nullfs and
> ZFS go up.  The ARC cache also goes up until it exceeds the ARC max.
> size, then the vnodes for nullfs and ZFS go down.  The ARC cache goes
> down as well.  This all repeats over and over.  The systems seems
> healthy.  No excessive running of arc_prune or arc_evict.
>
> My only comment is that the vnode freeing seems a bit aggressive.
> Going from ~15,000 to ~200 vnode for nullfs and the same for ZFS.
> The ARC drops from 70M to 7M (max is set at 64M) for this unit
> test.
>

Can you check what kind of shrinking is requested by arc to begin
with? I imagine encountering a nullfs vnode may end up recycling 2
instead of 1, but even repeated a lot it does not explain the above.

>
> | On 4/19/22, Mateusz Guzik  wrote:
> | > On 4/19/22, Mateusz Guzik  wrote:
> | >> On 4/19/22, Doug Ambrisko  wrote:
> | >>> I've switched my laptop to use nullfs and ZFS.  Previously, I used
> | >>> localhost NFS mounts instead of nullfs when nullfs would complain
> | >>> that it couldn't mount.  Since that check has been removed, I've
> | >>> switched to nullfs only.  However, every so often my laptop would
| >>> get slow and the ARC evict and prune thread would consume two
> | >>> cores 100% until I rebooted.  I had a 1G max. ARC and have increased
> | >>> it to 2G now.  Looking into this has uncovered some issues:
> | >>>  -  nullfs would prevent vnlru_free_vfsops from doing anything
> | >>> when called from ZFS arc_prune_task
> | >>>  -  nullfs would hang onto a bunch of vnodes unless mounted with
> | >>> nocache
> | >>>  -  nullfs and nocache would break untar.  This has been fixed
> now.
> | >>>
> | >>> With nullfs, nocache and settings max vnodes to a low number I can
> | >>> keep the ARC around the max. without evict and prune consuming
| >>> 100% of 2 cores.  This doesn't seem like the best solution but it's
| >>> better than when the ARC starts spinning.
> | >>>
> | >>> Looking into this issue with bhyve and a md drive for testing I
> create
> | >>> a brand new zpool mounted as /test and then nullfs mount /test to
> /mnt.
> | >>> I loop through untaring the Linux kernel into the nullfs mount, rm
> -rf
> | >>> it
> | >>> and repeat.  I set the ARC to the smallest value I can.  Untarring
> the
> | >>> Linux kernel was enough to get the ARC evict and prune to spin since
> | >>> they couldn't evict/prune anything.
> | >>>
> | >>> Looking at vnlru_free_vfsops called from ZFS arc_prune_task I see it
> | >>>   static int
> | >>>   vnlru_free_impl(int count, struct vfsops *mnt_op, struct vnode
> *mvp)
> | >>>   {
> | >>> ...
> | >>>
> | >>> for (;;) {
> | >>> ...
> | >>> vp = TAILQ_NEXT(vp, v_vnodelist);
> | >>> ...
> | >>>
> | >>> /*
> | >>>  * Don't recycle if our vnode is from different type
> | >>>  * of mount point.  Note that mp is type-safe, the
> | >>>  * check does not reach unmapped address even if
> | >>>  * vnode is reclaimed.
> | >>>  */
> | >>> if (mnt_op != NULL && (mp = vp->v_mount) != NULL &&
> | >>> mp->mnt_op != mnt_op) {
> | >>> continue;
> | >>> }
> | >>> ...
> | >>>
| >>> The vp ends up being the nullfs mount and then hits the continue
> | >>> even though the passed in mvp is on ZFS.  If I do a hack to
> | >>> comment out the continue then I see the ARC, nullfs vnodes and
> | >>> ZFS vnodes grow.  When the ARC calls arc_prune_task that calls
> | >>> vnlru_free_vfsops and now the vnodes go down for nullfs and ZFS.
> | >>> The ARC cache usage also goes down.  Then they increase again until
> | >>> the ARC gets full and then they go down again.  So with this hack
> | >>> I don't need nocache passed to nullfs and I don't need to limit
> | >>> the max vno

Re: nullfs and ZFS issues

2022-04-19 Thread Mateusz Guzik
Try this: https://people.freebsd.org/~mjg/vnlru_free_pick.diff

this is not committable but should validate whether it works fine

On 4/19/22, Mateusz Guzik  wrote:
> On 4/19/22, Mateusz Guzik  wrote:
>> On 4/19/22, Doug Ambrisko  wrote:
>>> I've switched my laptop to use nullfs and ZFS.  Previously, I used
>>> localhost NFS mounts instead of nullfs when nullfs would complain
>>> that it couldn't mount.  Since that check has been removed, I've
>>> switched to nullfs only.  However, every so often my laptop would
>>> get slow and the ARC evict and prune thread would consume two
>>> cores 100% until I rebooted.  I had a 1G max. ARC and have increased
>>> it to 2G now.  Looking into this has uncovered some issues:
>>>  -  nullfs would prevent vnlru_free_vfsops from doing anything
>>> when called from ZFS arc_prune_task
>>>  -  nullfs would hang onto a bunch of vnodes unless mounted with
>>> nocache
>>>  -  nullfs and nocache would break untar.  This has been fixed now.
>>>
>>> With nullfs, nocache and settings max vnodes to a low number I can
>>> keep the ARC around the max. without evict and prune consuming
>>> 100% of 2 cores.  This doesn't seem like the best solution but it's
>>> better than when the ARC starts spinning.
>>>
>>> Looking into this issue with bhyve and a md drive for testing I create
>>> a brand new zpool mounted as /test and then nullfs mount /test to /mnt.
>>> I loop through untaring the Linux kernel into the nullfs mount, rm -rf
>>> it
>>> and repeat.  I set the ARC to the smallest value I can.  Untarring the
>>> Linux kernel was enough to get the ARC evict and prune to spin since
>>> they couldn't evict/prune anything.
>>>
>>> Looking at vnlru_free_vfsops called from ZFS arc_prune_task I see it
>>>   static int
>>>   vnlru_free_impl(int count, struct vfsops *mnt_op, struct vnode *mvp)
>>>   {
>>> ...
>>>
>>> for (;;) {
>>> ...
>>> vp = TAILQ_NEXT(vp, v_vnodelist);
>>> ...
>>>
>>> /*
>>>  * Don't recycle if our vnode is from different type
>>>  * of mount point.  Note that mp is type-safe, the
>>>  * check does not reach unmapped address even if
>>>  * vnode is reclaimed.
>>>  */
>>> if (mnt_op != NULL && (mp = vp->v_mount) != NULL &&
>>> mp->mnt_op != mnt_op) {
>>> continue;
>>> }
>>> ...
>>>
>>> The vp ends up being the nullfs mount and then hits the continue
>>> even though the passed in mvp is on ZFS.  If I do a hack to
>>> comment out the continue then I see the ARC, nullfs vnodes and
>>> ZFS vnodes grow.  When the ARC calls arc_prune_task that calls
>>> vnlru_free_vfsops and now the vnodes go down for nullfs and ZFS.
>>> The ARC cache usage also goes down.  Then they increase again until
>>> the ARC gets full and then they go down again.  So with this hack
>>> I don't need nocache passed to nullfs and I don't need to limit
>>> the max vnodes.  Doing multiple untars in parallel over and over
>>> doesn't seem to cause any issues for this test.  I'm not saying
>>> commenting out continue is the fix but a simple POC test.
>>>
>>
>> I don't see an easy way to say "this is a nullfs vnode holding onto a
>> zfs vnode". Perhaps the routine can be extended to issue a nullfs
>> callback, if the module is loaded.
>>
>> In the meantime I think a good enough(tm) fix would be to check that
>> nothing was freed and fallback to good old regular clean up without
>> filtering by vfsops. This would be very similar to what you are doing
>> with your hack.
>>
>
> Now that I wrote this perhaps an acceptable hack would be to extend
> struct mount with a pointer to "lower layer" mount (if any) and patch
> the vfsops check to also look there.
>
>>
>>> It appears that when ZFS is asking for cached vnodes to be
>>> free'd nullfs also needs to free some up as well so that
>>> they are free'd on the VFS level.  It seems that vnlru_free_impl
>>> should allow some of the related nullfs vnodes to be free'd so
>>> the ZFS ones can be free'd and reduce the size of the ARC.
>>>
>>> BTW, I also hacked the kernel and mount to show the vnodes used
>>> per mount ie. mount -v:
>>>   test on /test (zfs, NFS exported, local, nfsv4acls, fsid
>>> 2b23b2a1de21ed66,
>>> vnodes: count 13846 lazy 0)
>>>   /test on /mnt (nullfs, NFS exported, local, nfsv4acls, fsid
>>> 11ff00292900, vnodes: count 13846 lazy 0)
>>>
>>> Now I can easily see how the vnodes are used without going into ddb.
>>> On my laptop I have various vnet jails and nullfs mount my homedir into
>>> them so pretty much everything goes through nullfs to ZFS.  I'm limping
>>> along with the nullfs nocache and small number of vnodes but it would be
>>> nice to not need that.
>>>
>>> Thanks,
>>>
>>> Doug A.
>>>
>>>
>>
>>
>> --
>> Mateusz Guzik 
>>
>
>
> --
> Mateusz Guzik 
>


-- 
Mateusz Guzik 



Re: nullfs and ZFS issues

2022-04-19 Thread Mateusz Guzik
On 4/19/22, Mateusz Guzik  wrote:
> On 4/19/22, Doug Ambrisko  wrote:
>> I've switched my laptop to use nullfs and ZFS.  Previously, I used
>> localhost NFS mounts instead of nullfs when nullfs would complain
>> that it couldn't mount.  Since that check has been removed, I've
>> switched to nullfs only.  However, every so often my laptop would
>> get slow and the ARC evict and prune thread would consume two
>> cores 100% until I rebooted.  I had a 1G max. ARC and have increased
>> it to 2G now.  Looking into this has uncovered some issues:
>>  -   nullfs would prevent vnlru_free_vfsops from doing anything
>>  when called from ZFS arc_prune_task
>>  -   nullfs would hang onto a bunch of vnodes unless mounted with
>>  nocache
>>  -   nullfs and nocache would break untar.  This has been fixed now.
>>
>> With nullfs, nocache and settings max vnodes to a low number I can
>> keep the ARC around the max. without evict and prune consuming
>> 100% of 2 cores.  This doesn't seem like the best solution but it's
>> better than when the ARC starts spinning.
>>
>> Looking into this issue with bhyve and a md drive for testing I create
>> a brand new zpool mounted as /test and then nullfs mount /test to /mnt.
>> I loop through untaring the Linux kernel into the nullfs mount, rm -rf it
>> and repeat.  I set the ARC to the smallest value I can.  Untarring the
>> Linux kernel was enough to get the ARC evict and prune to spin since
>> they couldn't evict/prune anything.
>>
>> Looking at vnlru_free_vfsops called from ZFS arc_prune_task I see it
>>   static int
>>   vnlru_free_impl(int count, struct vfsops *mnt_op, struct vnode *mvp)
>>   {
>>  ...
>>
>> for (;;) {
>>  ...
>> vp = TAILQ_NEXT(vp, v_vnodelist);
>>  ...
>>
>> /*
>>  * Don't recycle if our vnode is from different type
>>  * of mount point.  Note that mp is type-safe, the
>>  * check does not reach unmapped address even if
>>  * vnode is reclaimed.
>>  */
>> if (mnt_op != NULL && (mp = vp->v_mount) != NULL &&
>> mp->mnt_op != mnt_op) {
>> continue;
>> }
>>  ...
>>
>> The vp ends up being the nullfs mount and then hits the continue
>> even though the passed in mvp is on ZFS.  If I do a hack to
>> comment out the continue then I see the ARC, nullfs vnodes and
>> ZFS vnodes grow.  When the ARC calls arc_prune_task that calls
>> vnlru_free_vfsops and now the vnodes go down for nullfs and ZFS.
>> The ARC cache usage also goes down.  Then they increase again until
>> the ARC gets full and then they go down again.  So with this hack
>> I don't need nocache passed to nullfs and I don't need to limit
>> the max vnodes.  Doing multiple untars in parallel over and over
>> doesn't seem to cause any issues for this test.  I'm not saying
>> commenting out continue is the fix but a simple POC test.
>>
>
> I don't see an easy way to say "this is a nullfs vnode holding onto a
> zfs vnode". Perhaps the routine can be extended to issue a nullfs
> callback, if the module is loaded.
>
> In the meantime I think a good enough(tm) fix would be to check that
> nothing was freed and fallback to good old regular clean up without
> filtering by vfsops. This would be very similar to what you are doing
> with your hack.
>

Now that I wrote this perhaps an acceptable hack would be to extend
struct mount with a pointer to "lower layer" mount (if any) and patch
the vfsops check to also look there.
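
A minimal sketch of that hack, applied to the check quoted above; "mnt_lower"
is an assumed new struct mount field, it does not exist today:

/*
 * Illustrative only: also accept a vnode whose mount sits on top of a
 * lower mount (e.g. nullfs over ZFS) with the requested vfsops.
 */
if (mnt_op != NULL && (mp = vp->v_mount) != NULL &&
    mp->mnt_op != mnt_op &&
    (mp->mnt_lower == NULL || mp->mnt_lower->mnt_op != mnt_op)) {
	continue;
}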

>
>> It appears that when ZFS is asking for cached vnodes to be
>> free'd nullfs also needs to free some up as well so that
>> they are free'd on the VFS level.  It seems that vnlru_free_impl
>> should allow some of the related nullfs vnodes to be free'd so
>> the ZFS ones can be free'd and reduce the size of the ARC.
>>
>> BTW, I also hacked the kernel and mount to show the vnodes used
>> per mount ie. mount -v:
>>   test on /test (zfs, NFS exported, local, nfsv4acls, fsid
>> 2b23b2a1de21ed66,
>> vnodes: count 13846 lazy 0)
>>   /test on /mnt (nullfs, NFS exported, local, nfsv4acls, fsid
>> 11ff00292900, vnodes: count 13846 lazy 0)
>>
>> Now I can easily see how the vnodes are used without going into ddb.
>> On my laptop I have various vnet jails and nullfs mount my homedir into
>> them so pretty much everything goes through nullfs to ZFS.  I'm limping
>> along with the nullfs nocache and small number of vnodes but it would be
>> nice to not need that.
>>
>> Thanks,
>>
>> Doug A.
>>
>>
>
>
> --
> Mateusz Guzik 
>


-- 
Mateusz Guzik 



Re: nullfs and ZFS issues

2022-04-19 Thread Mateusz Guzik
On 4/19/22, Doug Ambrisko  wrote:
> I've switched my laptop to use nullfs and ZFS.  Previously, I used
> localhost NFS mounts instead of nullfs when nullfs would complain
> that it couldn't mount.  Since that check has been removed, I've
> switched to nullfs only.  However, every so often my laptop would
> get slow and the ARC evict and prune thread would consume two
> cores 100% until I rebooted.  I had a 1G max. ARC and have increased
> it to 2G now.  Looking into this has uncovered some issues:
>  -nullfs would prevent vnlru_free_vfsops from doing anything
>   when called from ZFS arc_prune_task
>  -nullfs would hang onto a bunch of vnodes unless mounted with
>   nocache
>  -nullfs and nocache would break untar.  This has been fixed now.
>
> With nullfs, nocache and settings max vnodes to a low number I can
> keep the ARC around the max. without evict and prune consuming
> 100% of 2 cores.  This doesn't seem like the best solution but it's
> better than when the ARC starts spinning.
>
> Looking into this issue with bhyve and a md drive for testing I create
> a brand new zpool mounted as /test and then nullfs mount /test to /mnt.
> I loop through untaring the Linux kernel into the nullfs mount, rm -rf it
> and repeat.  I set the ARC to the smallest value I can.  Untarring the
> Linux kernel was enough to get the ARC evict and prune to spin since
> they couldn't evict/prune anything.
>
> Looking at vnlru_free_vfsops called from ZFS arc_prune_task I see it
>   static int
>   vnlru_free_impl(int count, struct vfsops *mnt_op, struct vnode *mvp)
>   {
>   ...
>
> for (;;) {
>   ...
> vp = TAILQ_NEXT(vp, v_vnodelist);
>   ...
>
> /*
>  * Don't recycle if our vnode is from different type
>  * of mount point.  Note that mp is type-safe, the
>  * check does not reach unmapped address even if
>  * vnode is reclaimed.
>  */
> if (mnt_op != NULL && (mp = vp->v_mount) != NULL &&
> mp->mnt_op != mnt_op) {
> continue;
> }
>   ...
>
> The vp ends up being the nullfs mount and then hits the continue
> even though the passed in mvp is on ZFS.  If I do a hack to
> comment out the continue then I see the ARC, nullfs vnodes and
> ZFS vnodes grow.  When the ARC calls arc_prune_task that calls
> vnlru_free_vfsops and now the vnodes go down for nullfs and ZFS.
> The ARC cache usage also goes down.  Then they increase again until
> the ARC gets full and then they go down again.  So with this hack
> I don't need nocache passed to nullfs and I don't need to limit
> the max vnodes.  Doing multiple untars in parallel over and over
> doesn't seem to cause any issues for this test.  I'm not saying
> commenting out continue is the fix but a simple POC test.
>

I don't see an easy way to say "this is a nullfs vnode holding onto a
zfs vnode". Perhaps the routine can be extended to issue a nullfs
callback, if the module is loaded.

In the meantime I think a good enough(tm) fix would be to check that
nothing was freed and fallback to good old regular clean up without
filtering by vfsops. This would be very similar to what you are doing
with your hack.
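
A rough sketch of that fallback, assuming the int return value of
vnlru_free_impl() is the count of vnodes actually freed (names and placement
are illustrative, not the committed change):

/* Illustrative fallback: retry once without the vfsops filter. */
freed = vnlru_free_impl(count, mnt_op, mvp);
if (freed == 0 && mnt_op != NULL)
	freed = vnlru_free_impl(count, NULL, mvp);
return (freed);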


> It appears that when ZFS is asking for cached vnodes to be
> free'd nullfs also needs to free some up as well so that
> they are free'd on the VFS level.  It seems that vnlru_free_impl
> should allow some of the related nullfs vnodes to be free'd so
> the ZFS ones can be free'd and reduce the size of the ARC.
>
> BTW, I also hacked the kernel and mount to show the vnodes used
> per mount ie. mount -v:
>   test on /test (zfs, NFS exported, local, nfsv4acls, fsid 2b23b2a1de21ed66,
> vnodes: count 13846 lazy 0)
>   /test on /mnt (nullfs, NFS exported, local, nfsv4acls, fsid
> 11ff00292900, vnodes: count 13846 lazy 0)
>
> Now I can easily see how the vnodes are used without going into ddb.
> On my laptop I have various vnet jails and nullfs mount my homedir into
> them so pretty much everything goes through nullfs to ZFS.  I'm limping
> along with the nullfs nocache and small number of vnodes but it would be
> nice to not need that.
>
> Thanks,
>
> Doug A.
>
>


-- 
Mateusz Guzik 



"set but not used" warnings in the kernel

2022-03-29 Thread Mateusz Guzik
This is way too spammy and there is no consistent effort to clean them up,
that I can see anyway.

As such, I think these warnings are doing more harm than good and should be
disabled by default.

Alternatively, perhaps people can step up. I'm pretty sure to date I got
rid of more of these than anyone else.

Comments?
-- 
Mateusz Guzik 



Re: ktrace on NFSroot failing?

2022-03-10 Thread Mateusz Guzik
On 3/10/22, Bjoern A. Zeeb  wrote:
> Hi,
>
> I am having a weird issue with ktrace on an nfsroot machine:
>
> root:/tmp # ktrace sleep 1
> root:/tmp # kdump
> -559038242  Events dropped.
> kdump: bogus length 0xdeadc0de
>
> Anyone seen something like this before?
>

I just did a quick check and it definitely fails on nfs mounts:
# ktrace pwd
/root/mjg
# kdump
-559038242  Events dropped.
kdump: bogus length 0xdeadc0de

I don't have time to look into it this week though.

> --
> Bjoern A. Zeeb         r15:7
>
>


-- 
Mateusz Guzik 



Re: bastille : poudriere not working in jail: jail: jail:_set: Operation not permitted!

2022-02-28 Thread Mateusz Guzik
KER00/poudriere";
> exec.start += "/sbin/zfs mount -a";
> exec.poststop += "/sbin/zfs unjail BUNKER00/poudriere";
>
> }
> [...]
>
> Tracking the execution of the build process by issuing
>
> poudriere -x bulk ...
>
> and examining the resulting trace doesn't give me any hint, the error reported
> above
> immediately occurs when the jail is about to be started:
>
> + set -u +x
> + jail -c persist 'name=123-amd64-head-default'
> 'path=/mnt/poudriere/data/.m/ \
>   123-amd64-head-default/ref' 'host.hostname=basehost.local.domain' \
>   'ip4.addr=127.0.0.1' 'ip6.addr=::1' allow.chflags allow.sysvipc
> jail: jail_set: Operation not permitted
> + exit_handler
> [...]
>
> Searching the net revealed some issues with setting IP4 and IP6 in
> poudriere, but those
> findings date back to 2017 and 2014 and I guess this is solved right
> now.
>
> The difference between our manually jail.conf driven setup and the
> XigmaNAS/bastille
> based one is that bastille uses jib/netgraph based setups of the vnet and the
> ip4/ip6 is
> setup from rc.conf, while we use epair in the other world and the ip is
> setup from
> withing the jail definition in jail.conf.
>
> I'm out of ideas here; after two days of trial and error and trying to
> understand
> what's going on I'm lost ... Any hints or tips?
>
> Thanks in advance,
>
> O. Hartmann
>
>


-- 
Mateusz Guzik 



Re: Benchmarks: FreeBSD 13 vs. NetBSD 9.2 vs. OpenBSD 7 vs. DragonFlyBSD 6 vs. Linux

2021-12-11 Thread Mateusz Guzik
On 12/11/21, Mateusz Guzik  wrote:
> On 12/11/21, Mateusz Guzik  wrote:
>> On 12/11/21, Piper H  wrote:
>>> I read this article from Reddit:
>>> https://www.phoronix.com/scan.php?page=article=bsd-linux-eo2021=1
>>>
>>> I am surprised to see that the BSD cluster today has much worse
>>> performance
>>> than Linux.
>>> What do you think of this?
>>>
>>
>> There is a lot to say here.
>>
>> One has to own up to Linux likely being a little bit (or even more so)
>> faster for some of the legitimate tests. One, there are certain
>> multicore scalability issues compared to Linux, which should be pretty
>> mild given the scale (16 cores/32 threads). A more important problem
>> is userspace which fails to take advantage of SIMD instructions for
>> core primitives like memset, memcpy et al. However, if the difference
>> is more than a few %, the result is likely bogus. Key thing to do when
>> benchmarking is being able to explain the result, most notably if you
>> run into huge discrepancies.
>>
>> I had a look at the most egregious result -- zstd and spoiler, it is a
>> bug in result reporting in zstd.
>>
>> I got FreeBSD and Linux (Ubuntu Focal) vms running on:
>> Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
>>
>> Their zstd test ultimately ends up spawning: zstd -T24 -S -i15 -b19
>> FreeBSD-12.2-RELEASE-amd64-memstick.img (yes, they compress a ~1GB
>> FreeBSD image).
>>
>> Side note, it does not matter, but I happen to have CURRENT kernel
>> running on the FreeBSD 13 vm right now.
>>
>> [16:37] freebsd13:~ # time zstd -T24 -S -i15 -b19
>> FreeBSD-12.2-RELEASE-amd64-memstick.img
>> 19#md64-memstick.img :1055957504 -> 692662162 (1.524),  3.97 MB/s ,2156.8
>> MB/s
>> zstd -T24 -S -i15 -b19 FreeBSD-12.2-RELEASE-amd64-memstick.img
>> 274.10s user 12.90s system 763% cpu 37.602 total
>>
>> In contrast:
>>
>> [16:37] ubuntu:...tem/compress-zstd-1.5.0 (130) # time zstd -T24 -S
>> -i15 -b19 FreeBSD-12.2-RELEASE-amd64-memstick.img
>> 19#md64-memstick.img :1055957504 -> 692662162 (1.524),  60.1 MB/s ,2030.6
>> MB/s
>> zstd -T24 -S -i15 -b19 FreeBSD-12.2-RELEASE-amd64-memstick.img
>> 328.65s user 3.48s system 850% cpu 39.070 total
>>
>> This is repeatable. If anything, FreeBSD did it *faster*. Yet zstd
>> reports:
>> FreeBSD: 3.97 MB/s ,2156.8 MB/s [total time real time of  37.602 seconds]
>> Linux:  60.1 MB/s ,2030.6 MB/s [total time real time of 39.070 seconds]
>>
>> I don't know what these numbers are supposed to be, but it is pretty
>> clear Phoronix grabs the first one.
>>
>> I'll look into sorting this out some time later.
>>
>
> So I cloned https://github.com/facebook/zstd/ and got the v1.4.8 tag,
> as currently imported into FreeBSD. The diff is pretty minimal and
> deals with exposing extra symbols.
>
> zstd directly compiled from that source (with mere gmake) correctly
> shows 2-digit MB speeds, so it has to be something in the FreeBSD
>> build which ends up messing with it. I ran out of curiosity (and more
>> so time) at this point, but I invite someone else to
> get to the bottom of this.
>
> Bottom line though: there is no zstd performance problem on FreeBSD.
>

Well I had another look and found it: the low number is computed from
the supposed total time spent on CPU. Compiling by hand gives the C11
primitives to do it, while the FreeBSD source tree build lands on the
C90 fallback, which ends up giving bogus results.
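
To make the failure mode concrete: the C90 fallback removed in the diff below
uses clock(), which reports CPU time accumulated across all threads of the
process rather than wall time, so dividing the data size by it under -T24
deflates the MB/s figure. A small standalone illustration (not part of zstd):

#include <stdio.h>
#include <time.h>
#include <pthread.h>

/* Burn CPU on several threads so process CPU time grows much faster than wall time. */
static void *
spin(void *arg)
{
	volatile unsigned long x = 0;

	(void)arg;
	while (x < 500000000UL)
		x++;
	return (NULL);
}

int
main(void)
{
	pthread_t t[4];
	struct timespec w0, w1;
	clock_t c0, c1;
	int i;

	c0 = clock();
	clock_gettime(CLOCK_MONOTONIC, &w0);
	for (i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, spin, NULL);
	for (i = 0; i < 4; i++)
		pthread_join(t[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &w1);
	c1 = clock();
	printf("clock(): %.2fs, wall clock: %.2fs\n",
	    (double)(c1 - c0) / CLOCKS_PER_SEC,
	    (double)(w1.tv_sec - w0.tv_sec) + (w1.tv_nsec - w0.tv_nsec) / 1e9);
	return (0);
}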

A hack which I can't be bothered to productize is pasted below.

I can't easily repeat the test with patched zstd on the same box, but
on another one this goes from a supposed ~3.3MB/s to 70.2MB/s, which I
assume sorts it out.

diff --git a/sys/contrib/zstd/programs/timefn.c
b/sys/contrib/zstd/programs/timefn.c
index 95460d0d971d..f5dcdf84186e 100644
--- a/sys/contrib/zstd/programs/timefn.c
+++ b/sys/contrib/zstd/programs/timefn.c
@@ -84,8 +84,7 @@ PTime UTIL_getSpanTimeNano(UTIL_time_t clockStart,
UTIL_time_t clockEnd)

 /* C11 requires timespec_get, but FreeBSD 11 lacks it, while still
claiming C11 compliance.
Android also lacks it but does define TIME_UTC. */
-#elif (defined (__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L) /* C11 */) \
-&& defined(TIME_UTC) && !defined(__ANDROID__)
+#else

 #include/* abort */
 #include /* perror */
@@ -133,14 +132,6 @@ PTime UTIL_getSpanTimeNano(UTIL_time_t begin,
UTIL_time_t end)
 return nano;
 }

-
-
-#else   /* relies on standard C90 (note : clock_t measurements can be
wrong when using multi-threading) */
-
-UTIL_time_t UTIL_getTime(void) { return clock(); }
-PTime UTIL_getSpanTimeMicro(UTIL_tim

Re: Benchmarks: FreeBSD 13 vs. NetBSD 9.2 vs. OpenBSD 7 vs. DragonFlyBSD 6 vs. Linux

2021-12-11 Thread Mateusz Guzik
On 12/11/21, Mateusz Guzik  wrote:
> On 12/11/21, Piper H  wrote:
>> I read this article from Reddit:
>> https://www.phoronix.com/scan.php?page=article=bsd-linux-eo2021=1
>>
>> I am surprised to see that the BSD cluster today has much worse
>> performance
>> than Linux.
>> What do you think of this?
>>
>
> There is a lot to say here.
>
> One has to own up to Linux likely being a little bit (or even more so)
> faster for some of the legitimate tests. One, there are certain
> multicore scalability issues compared to Linux, which should be pretty
> mild given the scale (16 cores/32 threads). A more important problem
> is userspace which fails to take advantage of SIMD instructions for
> core primitives like memset, memcpy et al. However, if the difference
> is more than a few %, the result is likely bogus. Key thing to do when
> benchmarking is being able to explain the result, most notably if you
> run into huge discrepancies.
>
> I had a look at the most egregious result -- zstd and spoiler, it is a
> bug in result reporting in zstd.
>
> I got FreeBSD and Linux (Ubuntu Focal) vms running on:
> Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
>
> Their zstd test ultimately ends up spawning: zstd -T24 -S -i15 -b19
> FreeBSD-12.2-RELEASE-amd64-memstick.img (yes, they compress a ~1GB
> FreeBSD image).
>
> Side note, it does not matter, but I happen to have CURRENT kernel
> running on the FreeBSD 13 vm right now.
>
> [16:37] freebsd13:~ # time zstd -T24 -S -i15 -b19
> FreeBSD-12.2-RELEASE-amd64-memstick.img
> 19#md64-memstick.img :1055957504 -> 692662162 (1.524),  3.97 MB/s ,2156.8
> MB/s
> zstd -T24 -S -i15 -b19 FreeBSD-12.2-RELEASE-amd64-memstick.img
> 274.10s user 12.90s system 763% cpu 37.602 total
>
> In contrast:
>
> [16:37] ubuntu:...tem/compress-zstd-1.5.0 (130) # time zstd -T24 -S
> -i15 -b19 FreeBSD-12.2-RELEASE-amd64-memstick.img
> 19#md64-memstick.img :1055957504 -> 692662162 (1.524),  60.1 MB/s ,2030.6
> MB/s
> zstd -T24 -S -i15 -b19 FreeBSD-12.2-RELEASE-amd64-memstick.img
> 328.65s user 3.48s system 850% cpu 39.070 total
>
> This is repeatable. If anything, FreeBSD did it *faster*. Yet zstd reports:
> FreeBSD: 3.97 MB/s ,2156.8 MB/s [total time real time of  37.602 seconds]
> Linux:  60.1 MB/s ,2030.6 MB/s [total time real time of 39.070 seconds]
>
> I don't know what these numbers are supposed to be, but it is pretty
> clear Phoronix grabs the first one.
>
> I'll look into sorting this out some time later.
>

So I cloned https://github.com/facebook/zstd/ and got the v1.4.8 tag,
as currently imported into FreeBSD. The diff is pretty minimal and
deals with exposing extra symbols.

zstd directly compiled from that source (with mere gmake) correctly
shows 2-digit MB speeds, so it has to be something in the FreeBSD
build which ends up messing with it. I ran out of curiosity (and more
so time) at this point, but I invite someone else to
get to the bottom of this.

Bottom line though: there is no zstd performance problem on FreeBSD.

-- 
Mateusz Guzik 



Re: Benchmarks: FreeBSD 13 vs. NetBSD 9.2 vs. OpenBSD 7 vs. DragonFlyBSD 6 vs. Linux

2021-12-11 Thread Mateusz Guzik
On 12/11/21, Piper H  wrote:
> I read this article from Reddit:
> https://www.phoronix.com/scan.php?page=article=bsd-linux-eo2021=1
>
> I am surprised to see that the BSD cluster today has much worse performance
> than Linux.
> What do you think of this?
>

There is a lot to say here.

One has to own up to Linux likely being a little bit (or even more so)
faster for some of the legitimate tests. One, there are certain
multicore scalability issues compared to Linux, which should be pretty
mild given the scale (16 cores/32 threads). A more important problem
is userspace which fails to take advantage of SIMD instructions for
core primitives like memset, memcpy et al. However, if the difference
is more than a few %, the result is likely bogus. Key thing to do when
benchmarking is being able to explain the result, most notably if you
run into huge discrepancies.

I had a look at the most egregious result -- zstd and spoiler, it is a
bug in result reporting in zstd.

I got FreeBSD and Linux (Ubuntu Focal) vms running on:
Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz

Their zstd test ultimately ends up spawning: zstd -T24 -S -i15 -b19
FreeBSD-12.2-RELEASE-amd64-memstick.img (yes, they compress a ~1GB
FreeBSD image).

Side note, it does not matter, but I happen to have CURRENT kernel
running on the FreeBSD 13 vm right now.

[16:37] freebsd13:~ # time zstd -T24 -S -i15 -b19
FreeBSD-12.2-RELEASE-amd64-memstick.img
19#md64-memstick.img :1055957504 -> 692662162 (1.524),  3.97 MB/s ,2156.8 MB/s
zstd -T24 -S -i15 -b19 FreeBSD-12.2-RELEASE-amd64-memstick.img
274.10s user 12.90s system 763% cpu 37.602 total

In contrast:

[16:37] ubuntu:...tem/compress-zstd-1.5.0 (130) # time zstd -T24 -S
-i15 -b19 FreeBSD-12.2-RELEASE-amd64-memstick.img
19#md64-memstick.img :1055957504 -> 692662162 (1.524),  60.1 MB/s ,2030.6 MB/s
zstd -T24 -S -i15 -b19 FreeBSD-12.2-RELEASE-amd64-memstick.img
328.65s user 3.48s system 850% cpu 39.070 total

This is repeatable. If anything, FreeBSD did it *faster*. Yet zstd reports:
FreeBSD: 3.97 MB/s ,2156.8 MB/s [total time real time of  37.602 seconds]
Linux:  60.1 MB/s ,2030.6 MB/s [total time real time of 39.070 seconds]

I don't know what these numbers are supposed to be, but it is pretty
clear Phoronix grabs the first one.

I'll look into sorting this out some time later.

TL;DR don't drink and benchmark
-- 
Mateusz Guzik 



Re: Kernel panic by executing `poudriere bulk`

2021-11-26 Thread Mateusz Guzik
On 11/26/21, Yasuhiro Kimura  wrote:
> yasu@rolling-vm-freebsd1[1015]% uname -a
> ~
> FreeBSD rolling-vm-freebsd1.home.utahime.org 14.0-CURRENT FreeBSD
> 14.0-CURRENT #0 main-n251115-ae92ace05fd: Sat Nov 27 01:47:15 JST 2021
> ro...@rolling-vm-freebsd1.home.utahime.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
>  amd64
> yasu@rolling-vm-freebsd1[1016]%
>
> After a regular weekly update of my 14-current amd64 system, a kernel
> panic happens when I execute `poudriere bulk`.
>
> Snapshot of console:
> https://www.utahime.org/FreeBSD/FreeBSD-14-CURRENT-amd64-main-n251115-ae92ace05fd.panic.png
>

Should be fixed by
https://cgit.freebsd.org/src/commit?id=1879021942f56c8b264f4aeb1966b3733908ef62

-- 
Mateusz Guzik 



Re: drm-devel-kmod build failures

2021-10-11 Thread Mateusz Guzik
This should do it (untested):

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 37b268afa..f05de73fa 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -117,9 +117,15 @@ dma_buf_close(struct file *fp, struct thread *td)
return (0);
 }

+#if __FreeBSD_version >= 1400037
+static int
+dma_buf_stat(struct file *fp, struct stat *sb,
+struct ucred *active_cred __unused)
+#else
 static int
 dma_buf_stat(struct file *fp, struct stat *sb,
 struct ucred *active_cred __unused, struct thread *td __unused)
+#endif
 {

/* XXX need to define flags for st_mode */


On 10/11/21, Michael Butler via freebsd-current
 wrote:
> After the latest freebsd version bump in param.h, I tried to rebuild the
> DRM modules. It failed with ..
>
> --- dma-buf.o ---
> /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.5.19_4/drivers/dma-buf//dma-buf.c:121:1:
>
> error: conflicting types for 'dma_buf_stat'
> dma_buf_stat(struct file *fp, struct stat *sb,
> ^
> /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.5.19_4/drivers/dma-buf//dma-buf.c:70:18:
>
> note: previous declaration is here
> static fo_stat_t dma_buf_stat;
>   ^
> 1 error generated.
> *** [dma-buf.o] Error code 1
>
> make[3]: stopped in
> /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.5.19_4/linuxkpi
> 1 error
>
> make[3]: stopped in
> /usr/ports/graphics/drm-devel-kmod/work/drm-kmod-drm_v5.5.19_4/linuxkpi
>
> I get a similar error with drm-current-kmod. What changed?
>
>   imb
>
>


-- 
Mateusz Guzik 



Re: witness_lock_list_get: witness exhausted

2021-10-04 Thread Mateusz Guzik
Just take it and change as you see fit, I don't have time to work on it.

On 10/4/21, Alan Somers  wrote:
> On Mon, Jan 8, 2018 at 5:31 PM Mateusz Guzik  wrote:
>>
>> On Tue, Jan 9, 2018 at 12:41 AM, Michael Jung  wrote:
>>
>> > On 2018-01-08 13:39, John Baldwin wrote:
>> >
>> >> On Tuesday, November 28, 2017 02:46:03 PM Michael Jung wrote:
>> >>
>> >>> Hi!
>> >>>
>> >>> I've recently up'd my processor count on our poudriere box and have
>> >>> started noticing the error
>> >>> "witness_lock_list_get: witness exhausted" on the console.  The
>> >>> kernel
>> >>> *DOES NOT* crash but I
>> >>> thought the report may be useful to someone.
>> >>>
>> >>> $ uname -a
>> >>> FreeBSD poudriere 12.0-CURRENT FreeBSD 12.0-CURRENT #1 r325999: Sun
>> >>> Nov
>> >>> 19 18:41:20 EST 2017
>> >>> mikej@poudriere:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
>> >>>
>> >>> The machine is pretty busy running four poudriere build instances.
>> >>>
>> >>> last pid: 76584;  load averages: 115.07, 115.96, 98.30
>> >>>
>> >>>   up 6+07:32:59  14:44:03
>> >>> 763 processes: 117 running, 581 sleeping, 2 zombie, 63 lock
>> >>> CPU: 59.0% user,  0.0% nice, 40.7% system,  0.1% interrupt,  0.1%
>> >>> idle
>> >>> Mem: 12G Active, 2003M Inact, 44G Wired, 29G Free
>> >>> ARC: 28G Total, 11G MFU, 16G MRU, 122M Anon, 359M Header, 1184M Other
>> >>>   25G Compressed, 32G Uncompressed, 1.24:1 Ratio
>> >>>
>> >>> Let me know what additional information I might supply.
>> >>>
>> >>
>> >> This just means that WITNESS stopped working because it ran out of
>> >> pre-allocated objects.  In particular the objects used to track how
>> >> many locks are held by how many threads:
>> >>
>> >> /*
>> >>  * XXX: This is somewhat bogus, as we assume here that at most 2048
>> >> threads
>> >>  * will hold LOCK_NCHILDREN locks.  We handle failure ok, and we
>> >> should
>> >>  * probably be safe for the most part, but it's still a SWAG.
>> >>  */
>> >> #define LOCK_NCHILDREN  5
>> >> #define LOCK_CHILDCOUNT 2048
>> >>
>> >> Probably the '2048' (max number of concurrent threads) needs to scale
>> >> with
>> >> MAXCPU.  2048 threads is probably a bit low on big x86 boxes.
>> >>
>> >
>> >
>> > Thank you for you explanation.  We are expanding our ESXi cluster and
>> > even
>> > though with standard edition I can only assign 64 vCPU's to a guest and
>> > as
>> > much
>> > RAM as I want, I do like to help with edge cases if I can make them
>> > occur
>> > pushing
>> > boundaries as I can towards additional improvements in FreeBSD.
>> >
>>
>> Can you apply this and re-run the test?
>>
>> https://people.freebsd.org/~mjg/witness.diff
>>
>> It bumps the counters to be "high enough" but also starts tracking usage.
>> If you get
>> the message again, bump the values even higher.
>>
>> Once you get a complete poudriere run which did not result in the
>> problem,
>> do:
>> $ sysctl debug.witness.list_used debug.witness.list_max_used
>>
>> to dump the actual usage.
>
> This is a nice little patch.  Can we commit to head?  Even better
> would be if LOCK_CHILDCOUNT could be a tunable.  On my largish system,
> here's what I get shortly after boot:
>
> debug.witness.list_max_used: 8432
> debug.witness.list_used: 8420
>
> -Alan
>
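
For what it's worth, the simplest shape of the suggestion above (scale the
pool with MAXCPU instead of hardcoding 2048) would be a compile-time change
like the below; the multiplier is an arbitrary illustration, not a tuned
value, and making LOCK_CHILDCOUNT a real loader tunable would additionally
require the pre-allocated pool to be sized at boot rather than statically:

/* Illustrative only: scale the lock list pool with MAXCPU. */
#define	LOCK_NCHILDREN	5
#define	LOCK_CHILDCOUNT	(32 * MAXCPU > 2048 ? 32 * MAXCPU : 2048)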


-- 
Mateusz Guzik 



Re: kernel panic while copying files

2021-06-08 Thread Mateusz Guzik
.0-CURRENT #34 main-n247239-f570a6723e1: Tue Jun
> 8 09:34:32 CEST 2021
>
> Here the kgdb backtrace:
>
> Unread portion of the kernel message buffer:
> panic: Duplicate free of 0xf800356b9000 from zone
> 0xfe00dcbdd800(da_ccb) slab 0xf800356b9fd8(0)
> cpuid = 8
> time = 1623140519
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe00c5f398c0
> vpanic() at vpanic+0x181/frame 0xfe00c5f39910
> panic() at panic+0x43/frame 0xfe00c5f39970
> uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfe00c5f399b0
> uma_zfree_arg() at uma_zfree_arg+0x147/frame 0xfe00c5f39a00
> camperiphdone() at camperiphdone+0x1b7/frame 0xfe00c5f39b20
> xpt_done_process() at xpt_done_process+0x3dd/frame 0xfe00c5f39b60
> xpt_done_td() at xpt_done_td+0xf5/frame 0xfe00c5f39bb0
> fork_exit() at fork_exit+0x80/frame 0xfe00c5f39bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe00c5f39bf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
>
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> 55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct
> pcpu,
> (kgdb) bt
> #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> #1  doadump (textdump=textdump@entry=0)
> at /usr/src/sys/kern/kern_shutdown.c:399
> #2  0x8040c39a in db_dump (dummy=,
> dummy2=, dummy3=, dummy4=)
> at /usr/src/sys/ddb/db_command.c:575
> #3  0x8040c192 in db_command (last_cmdp=,
> cmd_table=, dopager=dopager@entry=1)
> at /usr/src/sys/ddb/db_command.c:482
> #4  0x8040beed in db_command_loop ()
> at /usr/src/sys/ddb/db_command.c:535
> #5  0x8040f616 in db_trap (type=, code= out>)
> at /usr/src/sys/ddb/db_main.c:270
> #6  0x8066b1c4 in kdb_trap (type=type@entry=3, code=code@entry=0,
> tf=, tf@entry=0xfe00c5f397f0)
> at /usr/src/sys/kern/subr_kdb.c:727
> #7  0x809a4e96 in trap (frame=0xfe00c5f397f0)
> at /usr/src/sys/amd64/amd64/trap.c:604
> #8  
> #9  kdb_enter (why=0x80a61a23 "panic", msg=)
> at /usr/src/sys/kern/subr_kdb.c:506
> #10 0x806207a2 in vpanic (fmt=, ap=,
> ap@entry=0xfe00c5f39950) at /usr/src/sys/kern/kern_shutdown.c:907
> #11 0x80620533 in panic (
> fmt=0x80d635c8  ".\024\244\200\377\377\377\377")
> at /usr/src/sys/kern/kern_shutdown.c:843
> #12 0x808e12b1 in uma_dbg_free (zone=0xfe00dcbdd800,
> slab=0xf800356b9fd8, item=0xf800356b9000)
> at /usr/src/sys/vm/uma_core.c:5664
> #13 0xffff808d9de7 in item_dtor (zone=0xfe00dcbdd800,
> item=0xf800356b9000, size=544, udata=0x0, skip=SKIP_NONE)
> at /usr/src/sys/vm/uma_core.c:3418
> #14 uma_zfree_arg (zone=0xfe00dcbdd800, item=0xf800356b9000,
> udata=udata@entry=0x0) at /usr/src/sys/vm/uma_core.c:4374
> #15 0x802da503 in uma_zfree (zone=0x80d635c8 ,
> item=0x200) at /usr/src/sys/vm/uma.h:404
> #16 0x802d9117 in camperiphdone (periph=0xf800061e2c00,
> done_ccb=0xf800355d6cc0) at /usr/src/sys/cam/cam_periph.c:1427
> #17 0x802dfebd in xpt_done_process (ccb_h=0xf800355d6cc0)
> at /usr/src/sys/cam/cam_xpt.c:5491
> #18 0x802e1ec5 in xpt_done_td (
> arg=arg@entry=0x80d33d80 )
> at /usr/src/sys/cam/cam_xpt.c:5546
> #19 0x805dad80 in fork_exit (callout=0x802e1dd0
> ,
> arg=0x80d33d80 , frame=0xfe00c5f39c00)
> at /usr/src/sys/kern/kern_fork.c:1083
> #20 
>
> Apparently caused by recent changes to CAM.
>
> Let me know if you want more information.
>
> --
> Gary Jennejohn
>
>


-- 
Mateusz Guzik 



Re: Panics in recent NFS server

2021-05-31 Thread Mateusz Guzik
I reproduced the panic; things work for me with the patch below.
However, there may be more to it, so I'm going to ask Rick to weigh in,
but the short version is that the length returned by nfsrv_parsename is off
by one compared to copyinstr.

diff --git a/sys/fs/nfsserver/nfs_nfsdsubs.c b/sys/fs/nfsserver/nfs_nfsdsubs.c
index 2b6e17752544..8c7db36bbd05 100644
--- a/sys/fs/nfsserver/nfs_nfsdsubs.c
+++ b/sys/fs/nfsserver/nfs_nfsdsubs.c
@@ -2065,7 +2065,7 @@ nfsrv_parsename(struct nfsrv_descript *nd, char *bufp, u_long *hashp,
 		}
 	}
 	*tocp = '\0';
-	*outlenp = (size_t)outlen;
+	*outlenp = (size_t)outlen + 1;
 	if (hashp != NULL)
 		*hashp = hash;
  nfsmout:


On 5/31/21, Mateusz Guzik  wrote:
> On 5/31/21, Mateusz Guzik  wrote:
>> It's probably my commit d81aefa8b7dd8cbeffeda541fca9962802404983 ,
>> I'll look at this later.
>
> Well let me rephrase. While the panic was added in said commit, I
> suspect the bug is on the NFS side -- it has its own namei variant which I
> suspect is managing ni_pathlen in a manner different from the
> original; it just happens to not panic on kernels prior to the above
> change.
>
>>
>> On 5/31/21, Dimitry Andric  wrote:
>>> Hi,
>>>
>>> I recently upgraded a -CURRENT NFS server from 2021-05-12 to today
>>> (2021-05-31), and when the first NFS client attempted to connect, I got
>>> this
>>> panic:
>>>
>>> panic: lookup: expected nul at 0xf800104b3002; string [dim]
>>>
>>> cpuid = 0
>>> time = 1622463863
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>>> 0xfe00747e89b0
>>> vpanic() at vpanic+0x187/frame 0xfe00747e8a10
>>> panic() at panic+0x43/frame 0xfe00747e8a70
>>> lookup() at lookup+0xad2/frame 0xfe00747e8b10
>>> nfsvno_namei() at nfsvno_namei+0x1a4/frame 0xfe00747e8bc0
>>> nfsrvd_lookup() at nfsrvd_lookup+0x191/frame 0xfe00747e8eb0
>>> nfsrvd_dorpc() at nfsrvd_dorpc+0xfab/frame 0xfe00747e90c0
>>> nfssvc_program() at nfssvc_program+0x604/frame 0xfe00747e92a0
>>> svc_run_internal() at svc_run_internal+0xa72/frame 0xfe00747e93d0
>>> svc_run() at svc_run+0x250/frame 0xfe00747e9430
>>> nfsrvd_nfsd() at nfsrvd_nfsd+0x33c/frame 0xfe00747e9590
>>> nfssvc_nfsd() at nfssvc_nfsd+0x473/frame 0xfe00747e9aa0
>>> sys_nfssvc() at sys_nfssvc+0xc7/frame 0xfe00747e9ac0
>>> amd64_syscall() at amd64_syscall+0x12e/frame 0xfe00747e9bf0
>>> fast_syscall_common() at fast_syscall_common+0xf8/frame
>>> 0xfe00747e9bf0
>>> --- syscall (155, FreeBSD ELF64, sys_nfssvc), rip = 0x8011aa59a, rsp =
>>> 0x7fffe4e8, rbp = 0x7fffe780 ---
>>> KDB: enter: panic
>>>
>>> __curthread ()
>>> at /share/dim/src/freebsd/src-dim/sys/amd64/include/pcpu_aux.h:55
>>> 55  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct 
>>> pcpu,
>>> (kgdb) #0  __curthread ()
>>> at /share/dim/src/freebsd/src-dim/sys/amd64/include/pcpu_aux.h:55
>>> #1  doadump (textdump=textdump@entry=0)
>>> at /share/dim/src/freebsd/src-dim/sys/kern/kern_shutdown.c:399
>>> #2  0x804cca5a in db_dump (dummy=,
>>> dummy2=, dummy3=, dummy4=)
>>> at /share/dim/src/freebsd/src-dim/sys/ddb/db_command.c:575
>>> #3  0x804cc912 in db_command (last_cmdp=,
>>> cmd_table=, dopager=dopager@entry=1)
>>> at /share/dim/src/freebsd/src-dim/sys/ddb/db_command.c:482
>>> #4  0x804cc58d in db_command_loop ()
>>> at /share/dim/src/freebsd/src-dim/sys/ddb/db_command.c:535
>>> #5  0x804cfd06 in db_trap (type=, code=>> out>)
>>> at /share/dim/src/freebsd/src-dim/sys/ddb/db_main.c:270
>>> #6  0x80c69f17 in kdb_trap (type=type@entry=3,
>>> code=code@entry=0,
>>> tf=tf@entry=0xfe00747e88e0)
>>> at /share/dim/src/freebsd/src-dim/sys/kern/subr_kdb.c:727
>>> #7  0x810d7aee in trap (frame=0xfe00747e88e0)
>>> at /share/dim/src/freebsd/src-dim/sys/amd64/amd64/trap.c:576
>>> #8  
>>> #9  kdb_enter (why=0x812d3d27 "panic", msg=)
>>> at /share/dim/src/freebsd/src-dim/sys/kern/subr_kdb.c:506
>>> #10 0x80c1d248 in vpanic (
>>> fmt=0x8129dfef "%s: expected nul at %p; string [%s]\n",
>>> ap=, ap@entry=0xfe00747e8a50)
>>> at /share/dim/src/freebsd/src-dim/sys/kern/kern_shutdown.c:907
>>> #11

Re: Panics in recent NFS server

2021-05-31 Thread Mateusz Guzik
On 5/31/21, Mateusz Guzik  wrote:
> It's probably my commit d81aefa8b7dd8cbeffeda541fca9962802404983 ,
> I'll look at this later.

Well let me rephrase. While the panic was added in said commit, I
suspect the bug is on the NFS side -- it has its own namei variant
which I suspect manages ni_pathlen differently from the original; it
just happens not to panic on kernels prior to the above change.
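
For anyone trying to follow along, here is a minimal userspace model of
the bookkeeping the new assert is about: ni_pathlen counts the remaining
bytes of the path including the terminating NUL (at least in the stock
namei), so at the final component the two have to agree. The walker
below is made up for illustration -- it is not the kernel lookup() code
and not the exact check from the commit -- but a consumer that accounts
for the length differently trips it in the same way:

#include <err.h>
#include <stdio.h>
#include <string.h>

static void
walk(const char *path, size_t pathlen)
{
	const char *nameptr = path;

	for (;;) {
		size_t namelen = strcspn(nameptr, "/");
		int last = (nameptr[namelen] == '\0');

		/* roughly the consistency the new assert in lookup() checks */
		if (last && pathlen != namelen + 1)
			errx(1, "expected nul after [%s]: pathlen %zu != %zu",
			    nameptr, pathlen, namelen + 1);
		if (last) {
			printf("final component [%s], pathlen %zu\n",
			    nameptr, pathlen);
			return;
		}
		/* consume the component and the '/' separator */
		nameptr += namelen + 1;
		pathlen -= namelen + 1;
	}
}

int
main(void)
{
	const char *p = "exports/dim/file";

	walk(p, strlen(p) + 1);	/* counts the NUL: consistent, no panic */
	walk(p, strlen(p));	/* off-by-one bookkeeping: trips the check */
	return (0);
}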

>
> On 5/31/21, Dimitry Andric  wrote:
>> Hi,
>>
>> I recently upgraded a -CURRENT NFS server from 2021-05-12 to today
>> (2021-05-31), and when the first NFS client attempted to connect, I got
>> this
>> panic:
>>
>> panic: lookup: expected nul at 0xf800104b3002; string [dim]
>>
>> cpuid = 0
>> time = 1622463863
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> 0xfe00747e89b0
>> vpanic() at vpanic+0x187/frame 0xfe00747e8a10
>> panic() at panic+0x43/frame 0xfe00747e8a70
>> lookup() at lookup+0xad2/frame 0xfe00747e8b10
>> nfsvno_namei() at nfsvno_namei+0x1a4/frame 0xfe00747e8bc0
>> nfsrvd_lookup() at nfsrvd_lookup+0x191/frame 0xfe00747e8eb0
>> nfsrvd_dorpc() at nfsrvd_dorpc+0xfab/frame 0xfe00747e90c0
>> nfssvc_program() at nfssvc_program+0x604/frame 0xfe00747e92a0
>> svc_run_internal() at svc_run_internal+0xa72/frame 0xfe00747e93d0
>> svc_run() at svc_run+0x250/frame 0xfe00747e9430
>> nfsrvd_nfsd() at nfsrvd_nfsd+0x33c/frame 0xfe00747e9590
>> nfssvc_nfsd() at nfssvc_nfsd+0x473/frame 0xfe00747e9aa0
>> sys_nfssvc() at sys_nfssvc+0xc7/frame 0xfe00747e9ac0
>> amd64_syscall() at amd64_syscall+0x12e/frame 0xfe00747e9bf0
>> fast_syscall_common() at fast_syscall_common+0xf8/frame
>> 0xfe00747e9bf0
>> --- syscall (155, FreeBSD ELF64, sys_nfssvc), rip = 0x8011aa59a, rsp =
>> 0x7fffe4e8, rbp = 0x7fffe780 ---
>> KDB: enter: panic
>>
>> __curthread ()
>> at /share/dim/src/freebsd/src-dim/sys/amd64/include/pcpu_aux.h:55
>> 55   __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct 
>> pcpu,
>> (kgdb) #0  __curthread ()
>> at /share/dim/src/freebsd/src-dim/sys/amd64/include/pcpu_aux.h:55
>> #1  doadump (textdump=textdump@entry=0)
>> at /share/dim/src/freebsd/src-dim/sys/kern/kern_shutdown.c:399
>> #2  0x804cca5a in db_dump (dummy=,
>> dummy2=, dummy3=, dummy4=)
>> at /share/dim/src/freebsd/src-dim/sys/ddb/db_command.c:575
>> #3  0x804cc912 in db_command (last_cmdp=,
>> cmd_table=, dopager=dopager@entry=1)
>> at /share/dim/src/freebsd/src-dim/sys/ddb/db_command.c:482
>> #4  0x804cc58d in db_command_loop ()
>> at /share/dim/src/freebsd/src-dim/sys/ddb/db_command.c:535
>> #5  0x804cfd06 in db_trap (type=, code=> out>)
>> at /share/dim/src/freebsd/src-dim/sys/ddb/db_main.c:270
>> #6  0x80c69f17 in kdb_trap (type=type@entry=3, code=code@entry=0,
>> tf=tf@entry=0xfe00747e88e0)
>> at /share/dim/src/freebsd/src-dim/sys/kern/subr_kdb.c:727
>> #7  0x810d7aee in trap (frame=0xfe00747e88e0)
>> at /share/dim/src/freebsd/src-dim/sys/amd64/amd64/trap.c:576
>> #8  
>> #9  kdb_enter (why=0x812d3d27 "panic", msg=)
>> at /share/dim/src/freebsd/src-dim/sys/kern/subr_kdb.c:506
>> #10 0x80c1d248 in vpanic (
>> fmt=0x8129dfef "%s: expected nul at %p; string [%s]\n",
>> ap=, ap@entry=0xfe00747e8a50)
>> at /share/dim/src/freebsd/src-dim/sys/kern/kern_shutdown.c:907
>> #11 0x80c1cfd3 in panic (
>> fmt=0x81e9b9c8  "=\t)\201\377\377\377\377")
>> at /share/dim/src/freebsd/src-dim/sys/kern/kern_shutdown.c:843
>> #12 0x80cfa992 in lookup (ndp=ndp@entry=0xfe00747e8d90)
>> at /share/dim/src/freebsd/src-dim/sys/kern/vfs_lookup.c:919
>> #13 0x80b33f84 in nfsvno_namei (nd=nd@entry=0xfe00747e9100,
>> ndp=ndp@entry=0xfe00747e8d90, dp=,
>> dp@entry=0xf80010544380, islocked=,
>> islocked@entry=0,
>> exp=exp@entry=0xfe00747e8fd8, p=p@entry=0xfe00bbfa3000,
>> retdirp=0xfe00747e8e78)
>> at /share/dim/src/freebsd/src-dim/sys/fs/nfsserver/nfs_nfsdport.c:597
>> #14 0x80b266a1 in nfsrvd_lookup (nd=0xfe00747e9100,
>> isdgram=, dp=0xf80010544380,
>> vpp=0xfe00747e9010,
>> fhp=0xfe00747e9074, exp=0xfe00747e8fd8)
>> at /share/dim/src/freebsd/src-dim/sys/fs/nfsserver/nfs_nfsdserv.c:607
>> #15 0x80b1073b in n

Re: Panics in recent NFS server

2021-05-31 Thread Mateusz Guzik
ff80003a14e00)
> at /share/dim/src/freebsd/src-dim/sys/fs/nfsserver/nfs_nfsdkrpc.c:288
> #19 0x80edead2 in svc_executereq (rqstp=0xf80010455800)
> at /share/dim/src/freebsd/src-dim/sys/rpc/svc.c:1037
> #20 svc_run_internal (grp=, grp@entry=0xf800100e6100,
> ismaster=ismaster@entry=1)
> at /share/dim/src/freebsd/src-dim/sys/rpc/svc.c:1313
> #21 0x80eddf80 in svc_run (pool=)
> at /share/dim/src/freebsd/src-dim/sys/rpc/svc.c:1392
> #22 0x80b251ec in nfsrvd_nfsd (td=,
> td@entry=0xfe00bbfa3000, args=args@entry=0xfe00747e9660)
> at /share/dim/src/freebsd/src-dim/sys/fs/nfsserver/nfs_nfsdkrpc.c:561
> #23 0x80b3ec93 in nfssvc_nfsd (td=0xfe00bbfa3000,
> uap=)
> at /share/dim/src/freebsd/src-dim/sys/fs/nfsserver/nfs_nfsdport.c:3714
> #24 0x80e6f647 in sys_nfssvc (td=0xfe00bbfa3000,
> uap=0xfe00bbfa33e8)
> at /share/dim/src/freebsd/src-dim/sys/nfs/nfs_nfssvc.c:111
> #25 0x810d891e in syscallenter (td=)
> at
> /share/dim/src/freebsd/src-dim/sys/amd64/amd64/../../kern/subr_syscall.c:189
> #26 amd64_syscall (td=0xfe00bbfa3000, traced=0)
> at /share/dim/src/freebsd/src-dim/sys/amd64/amd64/trap.c:1156
> #27 
> #28 0x0008011aa59a in ?? ()
>
> Is anybody seeing this too? :)
>
> I can probably bisect, but it'll take quite a while.
>
> -Dimitry
>
>


-- 
Mateusz Guzik 



Re: [SOLVED] Re: Strange behavior after running under high load

2021-04-04 Thread Mateusz Guzik
On 4/3/21, Poul-Henning Kamp  wrote:
> 
> Mateusz Guzik writes:
>
>> It is high because of this:
>> msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk",
>> hz);
>>
>> i.e. it literally sleeps for 1 second.
>
> Before the line looked like that, it slept on "lbolt" aka "lightning
> bolt" which was woken once a second.
>
> The calculations which come up with those "constants" have always
> been utterly bogus math, not quite "square-root of shoe-size
> times sun-angle in Patagonia", but close.
>
> The original heuristic came from university environments with tons of
> students doing assignments and nethack behind VT102 terminals, on
> filesystems where files only seldom grew past 100KB, so it made sense
> to scale number of vnodes to how much RAM was in the system, because
> that also scaled the size of the buffer-cache.
>
> With a merged VM buffer-cache, whatever validity that heuristic had
> was lost, and we tweaked the bogomath in various ways until it
> seemed to mostly work, trusting the users for which it did not, to
> tweak things themselves.
>
> Please dont tweak the Finagle Constants again.
>
> Rip all that crap out and come up with something fundamentally better.
>

Some level of pacing is probably useful to control total memory use --
there can be A LOT of memory tied up in the mere fact that a vnode is
fully cached. IMO the thing to do is to come up with some watermarks,
to be revisited every 1-2 years, and to change the behavior when they
get exceeded -- try to whack some stuff, but in the face of trouble
just go ahead and allocate without the 1-second sleep. Should the load
spike sort itself out, vnlru will slowly get things back down to the
watermark. If the watermark is too low, maybe it can autotune. The
bottom line is that even with the current idea of limiting the
preferred total vnode count, the corner-case behavior can be
drastically better: suffering SOME perf loss from recycling vnodes, but
not sleeping for a second for every single one.

I think the notion of 'struct vnode' being a separately allocated
object is not very useful and it comes with complexity (and happens to
suffer from several bugs).

That said, the easiest and safest thing to do in the meantime is to
bump the limit. Perhaps the sleep can be whacked as it is, which would
largely sort things out.
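
For illustration, a minimal userspace model of the watermark idea above
(all names made up, not the kernel code): allocations never block, the
reclaimer only gets kicked once the soft watermark is exceeded and
trims back down in the background:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define SOFT_WATERMARK	1000

static atomic_long numvnodes;
static pthread_mutex_t reclaim_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t reclaim_cv = PTHREAD_COND_INITIALIZER;
static int reclaim_kicked;

static void
alloc_vnode(void)
{
	long n = atomic_fetch_add(&numvnodes, 1) + 1;

	if (n > SOFT_WATERMARK) {
		/* kick the reclaimer, but never wait for it */
		pthread_mutex_lock(&reclaim_mtx);
		reclaim_kicked = 1;
		pthread_cond_signal(&reclaim_cv);
		pthread_mutex_unlock(&reclaim_mtx);
	}
}

static void *
vnlru(void *arg)
{
	(void)arg;
	for (;;) {
		pthread_mutex_lock(&reclaim_mtx);
		while (!reclaim_kicked)
			pthread_cond_wait(&reclaim_cv, &reclaim_mtx);
		reclaim_kicked = 0;
		pthread_mutex_unlock(&reclaim_mtx);

		/* trim back down to the watermark in the background */
		while (atomic_load(&numvnodes) > SOFT_WATERMARK)
			atomic_fetch_sub(&numvnodes, 1);
	}
	return (NULL);
}

int
main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, vnlru, NULL);
	for (int i = 0; i < 100000; i++)
		alloc_vnode();		/* never sleeps for a second */
	sleep(1);			/* let the reclaimer catch up */
	printf("numvnodes after trim: %ld\n", atomic_load(&numvnodes));
	return (0);
}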

-- 
Mateusz Guzik 


Re: [SOLVED] Re: Strange behavior after running under high load

2021-04-02 Thread Mateusz Guzik
On 4/2/21, Stefan Esser  wrote:
> Am 28.03.21 um 16:39 schrieb Stefan Esser:
>> After a period of high load, my now idle system needs 4 to 10 seconds to
>> run any trivial command - even after 20 minutes of no load ...
>>
>>
>> I have run some Monte-Carlo simulations for a few hours, with initially
> 35
>> processes running in parallel for some 10 seconds each.
>>
>> The load decreased over time since some parameter sets were faster to
>> process.
>> All in all 63000 processes ran within some 3 hours.
>>
>> When the system became idle, interactive performance was very bad.
>> Running
>> any trivial command (e.g. uptime) takes some 5 to 10 seconds. Since I
>> have
>> to have this system working, I plan to reboot it later today, but will
>> keep
>> it in this state for some more time to see whether this state persists or
>> whether the system recovers from it.
>>
>> Any ideas what might cause such a system state???
>
> Seems that Mateusz Guzik was right to mention performance issues when
> the system is very low on vnodes. (Thanks!)
>
> I have been able to reproduce the issue and have checked vnode stats:
>
> kern.maxvnodes: 620370
> kern.minvnodes: 155092
> vm.stats.vm.v_vnodepgsout: 6890171
> vm.stats.vm.v_vnodepgsin: 18475530
> vm.stats.vm.v_vnodeout: 228516
> vm.stats.vm.v_vnodein: 1592444
> vfs.wantfreevnodes: 155092
> vfs.freevnodes: 47<- obviously too low ...
> vfs.vnodes_created: 19554702
> vfs.numvnodes: 621284
> vfs.cache.debug.vnodes_cel_3_failures: 0
> vfs.cache.stats.heldvnodes: 6412
>
> The freevnodes value stayed in this region over several minutes, with
> typical program start times (e.g. for "uptime") in the region of 10 to
> 15 seconds.
>
> After rising maxvnodes to 2,000,000 form 600,000 the system performance
> is restored and I get:
>
> kern.maxvnodes: 200
> kern.minvnodes: 50
> vm.stats.vm.v_vnodepgsout: 7875198
> vm.stats.vm.v_vnodepgsin: 20788679
> vm.stats.vm.v_vnodeout: 261179
> vm.stats.vm.v_vnodein: 1817599
> vfs.wantfreevnodes: 50
> vfs.freevnodes: 205988<- still a lot higher than wantfreevnodes
> vfs.vnodes_created: 19956502
> vfs.numvnodes: 912880
> vfs.cache.debug.vnodes_cel_3_failures: 0
> vfs.cache.stats.heldvnodes: 20702
>
> I do not know why the performance impact is so high - there are a few
> free vnodes (more than required for the shared libraries to start e.g.
> the uptime program). Most probably each attempt to get a vnode triggers
> a clean-up attempt that runs for a significant time, but has no chance
> to actually reach near the goal of 155k or 500k free vnodes.
>

It is high because of this:
msleep(&vnlruproc_sig, &vnode_list_mtx, PVFS, "vlruwk", hz);

i.e. it literally sleeps for 1 second.

The vnode limit is probably too conservative, and the behavior when the
limit is reached is rather broken. Probably the thing to do is to let
allocations go through while kicking vnlru to free some stuff up. I'll
have to sleep on it.


> Anyway, kern.maxvnodes can be changed at run-time and it is thus easy
> to fix. It seems that no message is logged to report this situation.
> A rate limited hint to rise the limit should help other affected users.
>
> Regards, STefan
>
>


-- 
Mateusz Guzik 


Re: Strange behavior after running under high load

2021-03-28 Thread Mateusz Guzik
This may be the problem fixed in
e9272225e6bed840b00eef1c817b188c172338ee ("vfs: fix vnlru marker
handling for filtered/unfiltered cases").

However, there is a long-standing performance bug where, if the vnode
limit is hit and there is nothing to reclaim, the code just goes to
sleep for one second.

On 3/28/21, Stefan Esser  wrote:
> Am 28.03.21 um 17:44 schrieb Andriy Gapon:
>> On 28/03/2021 17:39, Stefan Esser wrote:
>>> After a period of high load, my now idle system needs 4 to 10 seconds to
>>> run any trivial command - even after 20 minutes of no load ...
>>>
>>>
>>> I have run some Monte-Carlo simulations for a few hours, with initially
>>> 35
>>> processes running in parallel for some 10 seconds each.
>>
>> I saw somewhat similar symptoms with 13-CURRENT some time ago.
>> To me it looked like even small kernel memory allocations took a very long
>> time.
>> But it was hard to properly diagnose that as my favorite tool, dtrace, was
>> also
>> affected by the same problem.
>
> That could have been the case - but I had to reboot to recover the system.
>
> I had let it sit idle fpr a few hours and the last "time uptime" before
> the reboot took 15 second real time to complete.
>
> Response from within the shell (e.g. "echo *") was instantaneous, though.
>
> I tried to trace the program execution of "uptime" with truss and found,
> that the loading of shared libraries proceeded at about one or two per
> second until all were attached and then the program quickly printed the
> expected results.
>
> I could probably recreate the issue by running the same set of programs
> that triggered it a few hours ago, but this is a production system and
> I need it to be operational through the week ...
>
> Regards, STefan
>
>


-- 
Mateusz Guzik 


Re: 13.0-RC2 / 14-CURRENT: Processes getting stuck in vlruwk state

2021-03-18 Thread Mateusz Guzik
To sum up what happened, Yamagi was kind enough to test several
patches and ultimately the issue was solved here:
https://cgit.freebsd.org/src/commit/?id=e9272225e6bed840b00eef1c817b188c172338ee
The patch was also merged into releng/13.0.

On 3/17/21, Yamagi  wrote:
> Hi,
> me and some other users in the ##bsdforen.de IRC channel have the
> problem that during Poudriere runs processes getting stuck in the
> 'vlruwk' state.
>
> For me it's fairly reproduceable. The problems begin about 20 to 25
> minutes after I've started poudriere. At first only some ccache
> processes hang in the 'vlruwk' state, after another 2 to 3 minutes
> nearly everything hangs and the total CPU load drops to about 5%.
> When I stop poudriere with ctrl-c it takes another 3 to 5 minutes
> until the system recovers.
>
> First the setup:
> * poudriere runs in a bhyve vm on zvol. The host is a 12.2-RELEASE-p2.
>   The zvol has a 8k blocksize, the guests partition are aligned to 8k.
>   The guest has only zpool, the pool was created with ashift=13. The
>   vm has 16 E5-2620 and 16 gigabytes RAM assigned to it.
> * poudriere is configured with ccache and ALLOW_MAKE_JOBS=yes. Removing
>   either of these options lowers the probability of the problem to show
>   up significantly.
>
> I've tried several git revisions starting with 14-CURRENT at
> 54ac6f721efccdba5a09aa9f38be0a1c4ef6cf14 in the hope that I can find at
> least one known to be good revision. No chance, even a kernel build
> from 0932ee9fa0d82b2998993b649f9fa4cc95ba77d6 (Wed Sep 2 19:18:27 2020
> +) has the problem. The problem isn't reproduceable with
> 12.2-RELEASE.
>
> The kernel stack ('procstat -kk') of a hanging process is:
> mi_switch+0x155 sleepq_switch+0x109 sleepq_catch_signals+0x3f1
> sleepq_wait_sig+0x9 _sleep+0x2aa kern_wait6+0x482 sys_wait4+0x7d
> amd64_syscall+0x140 fast_syscall_common+0xf8
>
> The kernel stack of vnlru is changing, even while the processes are
> hanging:
> * mi_switch+0x155 sleepq_switch+0x109 sleepq_timedwait+0x4b
> _sleep+0x29b vnlru_proc+0xa05 fork_exit+0x80 fork_trampoline+0xe
> * fork_exit+0x80 fork_trampoline+0xe
>
> Since vnlru is accumulating CPU time it looks like it's doing at least
> something. As an educated guess I would say that vn_alloc_hard() is
> waiting a long time or even forever to allocate new vnodes.
>
> I can provide more information, I just need to know what.
>
>
> Regards,
> Yamagi
>
> --
> Homepage: https://www.yamagi.org
> Github:   https://github.com/yamagi
> GPG:  0x1D502515
>


-- 
Mateusz Guzik 


Re: 13.0-RC2 / 14-CURRENT: Processes getting stuck in vlruwk state

2021-03-17 Thread Mateusz Guzik
Thanks, I'm going to have to ponder this a little bit.

In the meantime can you apply this:
https://people.freebsd.org/~mjg/maxvnodes.diff

Once you boot, tweak maxvnodes:
sysctl kern.maxvnodes=1049226

Run poudriere. Once it finishes, inspect sysctl vfs.highest_numvnodes
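
For completeness, the same tweak can be scripted from C via
sysctlbyname(3) instead of sysctl(8). This is only a convenience
sketch; it assumes these oids are ULONG, as they are on recent
-CURRENT, and setting kern.maxvnodes requires root:

#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdio.h>
#include <stdlib.h>

static u_long
get_ul(const char *name)
{
	u_long val;
	size_t len = sizeof(val);

	if (sysctlbyname(name, &val, &len, NULL, 0) == -1)
		err(1, "sysctlbyname(%s)", name);
	return (val);
}

int
main(int argc, char **argv)
{
	printf("kern.maxvnodes: %lu\n", get_ul("kern.maxvnodes"));
	printf("vfs.numvnodes:  %lu\n", get_ul("vfs.numvnodes"));
	printf("vfs.freevnodes: %lu\n", get_ul("vfs.freevnodes"));

	if (argc > 1) {
		/* e.g. ./vnstat 1049226 to bump the limit */
		u_long newmax = strtoul(argv[1], NULL, 10);

		if (sysctlbyname("kern.maxvnodes", NULL, NULL, &newmax,
		    sizeof(newmax)) == -1)
			err(1, "set kern.maxvnodes");
		printf("kern.maxvnodes set to %lu\n", newmax);
	}
	return (0);
}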

On 3/17/21, Yamagi  wrote:
> Hi Mateusz,
> the sysctl output after about 10 minutes into the problem is attached.
> In case that its stripped by Mailman a copy can be found here:
> https://deponie.yamagi.org/temp/sysctl_vlruwk.txt.xz
>
> Regards,
> Yamagi
>
> On Wed, 17 Mar 2021 15:57:59 +0100
> Mateusz Guzik  wrote:
>
>> Can you reproduce the problem and run obtain "sysctl -a"?
>>
>> In general, there is a vnode limit which is probably too small. The
>> reclamation mechanism is deficient in that it will eventually inject
>> an arbitrary pause.
>>
>> On 3/17/21, Yamagi  wrote:
>> > Hi,
>> > me and some other users in the ##bsdforen.de IRC channel have the
>> > problem that during Poudriere runs processes getting stuck in the
>> > 'vlruwk' state.
>> >
>> > For me it's fairly reproduceable. The problems begin about 20 to 25
>> > minutes after I've started poudriere. At first only some ccache
>> > processes hang in the 'vlruwk' state, after another 2 to 3 minutes
>> > nearly everything hangs and the total CPU load drops to about 5%.
>> > When I stop poudriere with ctrl-c it takes another 3 to 5 minutes
>> > until the system recovers.
>> >
>> > First the setup:
>> > * poudriere runs in a bhyve vm on zvol. The host is a 12.2-RELEASE-p2.
>> >   The zvol has a 8k blocksize, the guests partition are aligned to 8k.
>> >   The guest has only zpool, the pool was created with ashift=13. The
>> >   vm has 16 E5-2620 and 16 gigabytes RAM assigned to it.
>> > * poudriere is configured with ccache and ALLOW_MAKE_JOBS=yes. Removing
>> >   either of these options lowers the probability of the problem to show
>> >   up significantly.
>> >
>> > I've tried several git revisions starting with 14-CURRENT at
>> > 54ac6f721efccdba5a09aa9f38be0a1c4ef6cf14 in the hope that I can find at
>> > least one known to be good revision. No chance, even a kernel build
>> > from 0932ee9fa0d82b2998993b649f9fa4cc95ba77d6 (Wed Sep 2 19:18:27 2020
>> > +) has the problem. The problem isn't reproduceable with
>> > 12.2-RELEASE.
>> >
>> > The kernel stack ('procstat -kk') of a hanging process is:
>> > mi_switch+0x155 sleepq_switch+0x109 sleepq_catch_signals+0x3f1
>> > sleepq_wait_sig+0x9 _sleep+0x2aa kern_wait6+0x482 sys_wait4+0x7d
>> > amd64_syscall+0x140 fast_syscall_common+0xf8
>> >
>> > The kernel stack of vnlru is changing, even while the processes are
>> > hanging:
>> > * mi_switch+0x155 sleepq_switch+0x109 sleepq_timedwait+0x4b
>> > _sleep+0x29b vnlru_proc+0xa05 fork_exit+0x80 fork_trampoline+0xe
>> > * fork_exit+0x80 fork_trampoline+0xe
>> >
>> > Since vnlru is accumulating CPU time it looks like it's doing at least
>> > something. As an educated guess I would say that vn_alloc_hard() is
>> > waiting a long time or even forever to allocate new vnodes.
>> >
>> > I can provide more information, I just need to know what.
>> >
>> >
>> > Regards,
>> > Yamagi
>> >
>> > --
>> > Homepage: https://www.yamagi.org
>> > Github:   https://github.com/yamagi
>> > GPG:  0x1D502515
>> >
>>
>>
>> --
>> Mateusz Guzik 
>
>
> --
> Homepage: https://www.yamagi.org
> Github:   https://github.com/yamagi
> GPG:  0x1D502515
>


-- 
Mateusz Guzik 


Re: 13.0-RC2 / 14-CURRENT: Processes getting stuck in vlruwk state

2021-03-17 Thread Mateusz Guzik
Can you reproduce the problem and capture the output of "sysctl -a"?

In general, there is a vnode limit which is probably too small. The
reclamation mechanism is deficient in that it will eventually inject
an arbitrary pause.

On 3/17/21, Yamagi  wrote:
> Hi,
> me and some other users in the ##bsdforen.de IRC channel have the
> problem that during Poudriere runs processes getting stuck in the
> 'vlruwk' state.
>
> For me it's fairly reproduceable. The problems begin about 20 to 25
> minutes after I've started poudriere. At first only some ccache
> processes hang in the 'vlruwk' state, after another 2 to 3 minutes
> nearly everything hangs and the total CPU load drops to about 5%.
> When I stop poudriere with ctrl-c it takes another 3 to 5 minutes
> until the system recovers.
>
> First the setup:
> * poudriere runs in a bhyve vm on zvol. The host is a 12.2-RELEASE-p2.
>   The zvol has a 8k blocksize, the guests partition are aligned to 8k.
>   The guest has only zpool, the pool was created with ashift=13. The
>   vm has 16 E5-2620 and 16 gigabytes RAM assigned to it.
> * poudriere is configured with ccache and ALLOW_MAKE_JOBS=yes. Removing
>   either of these options lowers the probability of the problem to show
>   up significantly.
>
> I've tried several git revisions starting with 14-CURRENT at
> 54ac6f721efccdba5a09aa9f38be0a1c4ef6cf14 in the hope that I can find at
> least one known to be good revision. No chance, even a kernel build
> from 0932ee9fa0d82b2998993b649f9fa4cc95ba77d6 (Wed Sep 2 19:18:27 2020
> +) has the problem. The problem isn't reproduceable with
> 12.2-RELEASE.
>
> The kernel stack ('procstat -kk') of a hanging process is:
> mi_switch+0x155 sleepq_switch+0x109 sleepq_catch_signals+0x3f1
> sleepq_wait_sig+0x9 _sleep+0x2aa kern_wait6+0x482 sys_wait4+0x7d
> amd64_syscall+0x140 fast_syscall_common+0xf8
>
> The kernel stack of vnlru is changing, even while the processes are
> hanging:
> * mi_switch+0x155 sleepq_switch+0x109 sleepq_timedwait+0x4b
> _sleep+0x29b vnlru_proc+0xa05 fork_exit+0x80 fork_trampoline+0xe
> * fork_exit+0x80 fork_trampoline+0xe
>
> Since vnlru is accumulating CPU time it looks like it's doing at least
> something. As an educated guess I would say that vn_alloc_hard() is
> waiting a long time or even forever to allocate new vnodes.
>
> I can provide more information, I just need to know what.
>
>
> Regards,
> Yamagi
>
> --
> Homepage: https://www.yamagi.org
> Github:   https://github.com/yamagi
> GPG:  0x1D502515
>


-- 
Mateusz Guzik 


Re: -CURRENT panics in NFS

2021-02-27 Thread Mateusz Guzik
Thanks. I adjusted the namecache. However, the nfs fix provided by
Rick should go in regardless.

On 2/27/21, Juraj Lutter  wrote:
>
>
>> On 27 Feb 2021, at 21:49, Mateusz Guzik  wrote:
>>
>> Can you dump 'struct componentname *cnp'? This should do the trick:
>> f 12
>> p cnp
>>
>> Most notably I want to know if the name to added is a literal dot.
>>
>
> Yes, it is a dot (the directory itself):
>
> cn_nameptr = 0xfffffe0011428018 ".", cn_namelen = 1
>
> otis
>
>


-- 
Mateusz Guzik 


Re: -CURRENT panics in NFS

2021-02-27 Thread Mateusz Guzik
You should be able to just use kgdb on the old kernel and the
crashdump you already collected, provided both are still around.
Alternatively boot with this without the fix:

diff --git a/sys/kern/vfs_cache.c b/sys/kern/vfs_cache.c
index fef1e31d197b..c4d2990b155d 100644
--- a/sys/kern/vfs_cache.c
+++ b/sys/kern/vfs_cache.c
@@ -2266,6 +2266,9 @@ cache_enter_time(struct vnode *dvp, struct vnode
*vp, struct componentname *cnp,
KASSERT(cnp->cn_namelen <= NAME_MAX,
("%s: passed len %ld exceeds NAME_MAX (%d)", __func__,
cnp->cn_namelen,
NAME_MAX));
+   if (dvp == vp) {
+   panic("%s: same vnodes; cnp [%s] len %ld\n", __func__,
+       cnp->cn_nameptr, cnp->cn_namelen);
+   }
VNPASS(dvp != vp, dvp);
VNPASS(!VN_IS_DOOMED(dvp), dvp);
VNPASS(dvp->v_type != VNON, dvp);


On 2/27/21, Juraj Lutter  wrote:
> I am now running a patched kernel, without problems.
>
> I can boot to unpatched one and see the output of these ddb commands.
>
> otis
>
> —
> Juraj Lutter
> XMPP: juraj (at) lutter.sk
> GSM: +421907986576
>
>> On 27 Feb 2021, at 21:49, Mateusz Guzik  wrote:
>>
>> Can you dump 'struct componentname *cnp'? This should do the trick:
>> f 12
>> p cnp
>>
>
>
>


-- 
Mateusz Guzik 


Re: -CURRENT panics in NFS

2021-02-27 Thread Mateusz Guzik
Can you dump 'struct componentname *cnp'? This should do the trick:
f 12
p cnp

Most notably I want to know if the name to be added is a literal dot.

That case is handled where necessary, but the assert was added to start
making the interface stricter. If the name is a dot, I'm inclined to
remove the assert for 13.x to avoid problems with other callers of
this sort.

Otherwise I'll have to check what's going on there.
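
For clarity, a toy model of the calling convention in question -- this
is not the actual NFS fix, just an illustration with made-up
structures: the cache refuses dvp == vp, so a caller that just resolved
"." skips the insertion instead of tripping the assert:

#include <assert.h>
#include <stdio.h>
#include <string.h>

struct vnode { int id; };

static void
cache_enter_toy(struct vnode *dvp, struct vnode *vp, const char *name)
{
	assert(dvp != vp);	/* mirrors the new VNPASS(dvp != vp, dvp) */
	printf("cached [%s]: vnode %d -> vnode %d\n", name, dvp->id, vp->id);
}

static void
lookup_done(struct vnode *dvp, struct vnode *vp, const char *name)
{
	/* the kind of caller-side guard implied above */
	if (dvp == vp || strcmp(name, ".") == 0)
		return;		/* "." is the directory itself; nothing to add */
	cache_enter_toy(dvp, vp, name);
}

int
main(void)
{
	struct vnode dir = { 1 }, file = { 2 };

	lookup_done(&dir, &file, "report.txt");	/* cached */
	lookup_done(&dir, &dir, ".");		/* skipped, no assert */
	return (0);
}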

On 2/27/21, Juraj Lutter  wrote:
> Hi,
>
> thank you for the swift reaction. This patch fixed my problem.
>
> otis
>
> —
> Juraj Lutter
> XMPP: juraj (at) lutter.sk
> GSM: +421907986576
>
>> On 27 Feb 2021, at 16:53, Rick Macklem  wrote:
>>
>> I reproduced the problem and the attached trivial patch
>> seems to fix it. Please test the patch if you can.
>>
>
>


-- 
Mateusz Guzik 


Re: panic: condition seqc_in_modify(_vp->v_seqc) not met at zfs_acl.c:1147 (zfs_acl_chown_setattr)

2021-02-16 Thread Mateusz Guzik
I think for future-proofing it would be best if all vnodes going there
had the seqc marked, so I think this should do the trick:

diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
index d5f0da9ecd4b..8172916c4329 100644
--- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
+++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
@@ -2756,7 +2756,9 @@ zfs_setattr(znode_t *zp, vattr_t *vap, int
flags, cred_t *cr)
err = zfs_acl_chown_setattr(zp);
ASSERT(err == 0);
if (attrzp) {
+   vn_seqc_write_begin(ZTOV(attrzp));
err = zfs_acl_chown_setattr(attrzp);
+   vn_seqc_write_end(ZTOV(attrzp));
ASSERT(err == 0);
}
}

I don't see other calls to the routine.

On 2/16/21, Andriy Gapon  wrote:
> On 15/02/2021 11:45, Andriy Gapon wrote:
>> On 15/02/2021 10:22, Andriy Gapon wrote:
>>>
>>> I've got this panic once when copying a couple of files.
>>> The system is stable/13 as of 1996360d7338d, a custom kernel
>>> configuration, but
>>> no local source code modifications.
>>>
>>> Unread portion of the kernel message buffer:
>>> VNASSERT failed: ({ seqc_t __seqc = (_vp->v_seqc);
>>> __builtin_expect((__seqc &
>>> 1), 0); }) not true at
>>> /usr/devel/git/trant/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_acl.c:1147
>>> (zfs_acl_chown_setattr)
>>> 0xf8013e4e85b8: type VDIR
>>> usecount 1, writecount 0, refcount 1 seqc users 0 mountedhere 0
>>> hold count flags ()
>>> flags ()
>>> lock type zfs: EXCL by thread 0xfe01dd1cd560 (pid 30747,
>>> kdeinit5, tid
>>> 159911)
>>> panic: condition seqc_in_modify(_vp->v_seqc) not met at
>>> /usr/devel/git/trant/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_acl.c:1147
>>> (zfs_acl_chown_setattr)
>>>
>>> Any ideas, suggestions, hints?
>>> Thanks!
>>>
>> ...
>>> #4  0x8036fd21 in zfs_acl_chown_setattr (zp=0xf801ccd203b0)
>>> at
>>> /usr/devel/git/trant/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_acl.c:1147
>>> #5  0x8037e52d in zfs_setattr (zp=0xf8024b04f760,
>>> vap=vap@entry=0xfe029a36c870, flags=flags@entry=0,
>>> cr=, cr@entry=0xf8003ecedc00)
>>> at
>>> /usr/devel/git/trant/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:2758
>>
>> So, this is actually the second zfs_acl_chown_setattr call here:
>> err = zfs_acl_chown_setattr(zp);
>> ASSERT(err == 0);
>> if (attrzp) {
>> err = zfs_acl_chown_setattr(attrzp);
>> ASSERT(err == 0);
>> }
>>
>> I am not sure if the assertion is actually applicable to attrzp (extended
>> attributes "directory").
>> At least I do not see any seq calls for it.
>>
>
> So, I think that the problem should be reproducible by simply chown-ing a
> file
> with an extended attribute.  The kernel should be compiled with both
> DEBUG_VFS_LOCKS and INVARIANTS.
>
> --
> Andriy Gapon
>


-- 
Mateusz Guzik 


Re: jails: /pool/jails/fulljailmake -> /pool/jails/fulljailbmake: No such file or directory

2021-02-15 Thread Mateusz Guzik
Can you try this with the following commit reverted:

commit ee10666327b622c2f20a4ac17e7a5673b04e7c9a
Author: Simon J. Gerraty 
Date:   Sun Feb 14 17:20:10 2021 -0800

Links for bmake and bmake.1

Some folk forget that make is bmake, and want the links...

MFC after: 1 week

diff --git a/usr.bin/bmake/Makefile.inc b/usr.bin/bmake/Makefile.inc
index 96431c19d2af..8c4cb659e1d8 100644
--- a/usr.bin/bmake/Makefile.inc
+++ b/usr.bin/bmake/Makefile.inc
@@ -9,6 +9,8 @@

 .if exists(${.CURDIR}/tests)
 PROG= make
+LINKS= make bmake
+MLINKS= ${MAN} b${MAN}
 .endif

 .if !defined(MK_SHARED_TOOLCHAIN) || ${MK_SHARED_TOOLCHAIN} == "no"

If reverting this does not help, can you try with:
sysctl vfs.cache_fast_lookup=0

On 2/15/21, O. Hartmann  wrote:
> The base host is running FreeBSD 14.0-CURRENT #6 main-n244784-8563de2f279:
> Fri
> Feb 12 12:48:34 CET 2021 amd64, the source tree is at "commit
> 5dce03847fdc7bc6eb959282c0ae2117b1991746".
>
>
> Updating jails via "ezjail-admin update -i", or for poudriere based CURRENT
> (14-CURRENT) jails via "poudriere jail -j jail -u -b", installation of
> world
> fails due to an error, shown below:
>
> [...]
>
> ===> usr.bin/bmake (install)
> install  -s -o root -g wheel -m 555   make
> /pool/jails/fulljail/usr/bin/make
> install  -o root -g wheel -m 444 make.1.gz
> /pool/jails/fulljail/usr/share/man/man1/ rm -f
> /pool/jails/fulljail/usr/share/man/man1/bmake.1
> /pool/jails/fulljail/usr/share/man/man1/bmake.1.gz;  install -l h -o root
> -g
> wheel -m 444  /pool/jails/fulljail/usr/share/man/man1/make.1.gz
> /pool/jails/fulljail/usr/share/man/man1/bmake.1.gz install -l h -o root -g
> wheel -m 555  /pool/jails/fulljailmake /pool/jails/fulljailbmake install:
> link
> /pool/jails/fulljailmake -> /pool/jails/fulljailbmake: No such file or
> directory *** Error code 71
>
> Stop.
> make[5]: stopped in /usr/src/usr.bin/bmake
> *** Error code 1
>


-- 
Mateusz Guzik 


Re: Problem compiling git from ports

2021-01-01 Thread Mateusz Guzik
What filesystem is this?

Does it work if you sysctl vfs.cache_fast_lookup=0 ?

On 1/1/21, Filippo Moretti  wrote:
> I could not update git from 2.29 in my system
> [root@STING /usr/ports/devel/git]# uname -a
> FreeBSD STING 13.0-CURRENT FreeBSD 13.0-CURRENT #14
> main-c255423-gee938b20335: Wed Dec 30 10:41:00 CET 2020
> root@STING:/usr/obj/usr/src/amd64.amd64/sys/STING  amd64
>
>
>
>
> This is the error which can be duplicatedsincerelyFilippo
>
>
>
> [root@STING /usr/ports/devel/git]# gmake[3]: 'GIT-VERSION-FILE' is up to
> date.
> gmake[3]: Leaving directory '/usr/ports/devel/git/work-default/git-2.30.0'
> sed -e '1s|#!.*/sh|#!/bin/sh|' git-subtree.sh >git-subtree
> chmod +x git-subtree
> install -d -m 755
> /usr/ports/devel/git/work-default/stage/usr/local/libexec/git-core
> install -m 755 git-subtree
> /usr/ports/devel/git/work-default/stage/usr/local/libexec/git-core
> asciidoctor -b docbook -d manpage  \
> -agit_version=2.30.0 -I../../Documentation -rasciidoctor-extensions
> -alitdd='' git-subtree.txt
> gmake[2]: asciidoctor: No such file or directory
> gmake[2]: *** [Makefile:86: git-subtree.xml] Error 127
> gmake[2]: Leaving directory
> '/usr/ports/devel/git/work-default/git-2.30.0/contrib/subtree'
> *** Error code 2
>
> Stop.
> make[1]: stopped in /usr/ports/devel/git
> *** Error code 1
>
> Stop.
> make: stopped in /usr/ports/devel/git
> [root@STING /usr/ports/devel/git]#
>
>


-- 
Mateusz Guzik 


Re: panic: general protection fault from uipc_sockaddr+0x4c

2020-12-08 Thread Mateusz Guzik
On 12/8/20, Mark Johnston  wrote:
> On Tue, Dec 08, 2020 at 04:40:16PM +0100, Mateusz Guzik wrote:
>> I think this is a long standing bug against exiting processes.
>>
>> filedesc_out only increments *hold* count, but that does not prevent
>> fdescfree_fds from progressing and freeing everything without any
>> locks held.
>
> I think it is fallout from r36: before that, fdescfree() acquired
> and released the exclusive fd table lock between decrementing
> fdp->fd_refcount and calling fdescfree_fds().  This would serialize with
> the loop in kern_proc_fildesc_out(), which checks fdp->fd_refcount > 0
> at the beginning of each iteration.  Now there is no serialization and
> they can race.
>

Oh, I forgot consumers keep checking fd_refcount. In that case it would
probably be best to add sx_wait_unlocked.

>> A hotfix (for mfc) would add locking around it, but a long term fix
>> should wait for hold count to drain. By that point there can't be any
>> new arrivals due to:
>>
>> PROC_LOCK(p);
>> p->p_fd = NULL;
>> PROC_UNLOCK(p);
>>
>> I'll code both later today.
>


-- 
Mateusz Guzik 


Re: panic: general protection fault from uipc_sockaddr+0x4c

2020-12-08 Thread Mateusz Guzik
I think this is a long-standing bug involving exiting processes.

filedesc_out only increments the *hold* count, but that does not prevent
fdescfree_fds from progressing and freeing everything without any
locks held.

A hotfix (for MFC) would add locking around it, but a long-term fix
should wait for the hold count to drain. By that point there can't be any
new arrivals due to:

PROC_LOCK(p);
p->p_fd = NULL;
PROC_UNLOCK(p);

I'll code both later today.
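
A minimal userspace sketch of the long-term fix described above
(made-up names, not the kernel code): the exiting side unpublishes the
pointer first, so no new holds can show up, and only frees once the
hold count has drained:

#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

struct fdtable {
	atomic_int holdcnt;	/* models the table hold count */
	int nfiles;
};

static pthread_mutex_t proc_lock = PTHREAD_MUTEX_INITIALIZER; /* PROC_LOCK */
static struct fdtable *published;	/* models p->p_fd */

static struct fdtable *
fdtable_hold(void)
{
	struct fdtable *fdp;

	/* taking a hold is serialized against unpublishing */
	pthread_mutex_lock(&proc_lock);
	fdp = published;
	if (fdp != NULL)
		atomic_fetch_add(&fdp->holdcnt, 1);
	pthread_mutex_unlock(&proc_lock);
	return (fdp);
}

static void
fdtable_drop(struct fdtable *fdp)
{
	atomic_fetch_sub(&fdp->holdcnt, 1);
}

static void *
reader(void *arg)
{
	(void)arg;
	for (;;) {
		struct fdtable *fdp = fdtable_hold();

		if (fdp == NULL)
			return (NULL);	/* process is exiting */
		/* ... inspect fdp->nfiles entries here ... */
		fdtable_drop(fdp);
	}
}

int
main(void)
{
	struct fdtable *fdp = calloc(1, sizeof(*fdp));
	pthread_t t;

	atomic_init(&fdp->holdcnt, 0);
	fdp->nfiles = 20;
	published = fdp;
	pthread_create(&t, NULL, reader, NULL);
	usleep(10000);			/* let the reader take some holds */

	/* exit path: unpublish first, so no new holds can arrive ... */
	pthread_mutex_lock(&proc_lock);
	published = NULL;
	pthread_mutex_unlock(&proc_lock);

	/* ... then wait for the hold count to drain before freeing */
	while (atomic_load(&fdp->holdcnt) != 0)
		sched_yield();
	free(fdp);
	printf("table freed after holds drained\n");

	pthread_join(t, NULL);
	return (0);
}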

On 12/8/20, Mark Johnston  wrote:
> On Tue, Dec 08, 2020 at 12:47:18PM +0100, Peter Holm wrote:
>> I just got this panic:
>>
>> Fatal trap 9: general protection fault while in kernel mode
>> cpuid = 9; apic id = 09
>> instruction pointer = 0x20:0x80bc6e22
>> stack pointer = 0x28:0xfe0698887630
>> frame pointer = 0x28:0xfe06988876b0
>> code segment  = base 0x0, limit 0xf, type 0x1b
>>= DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = interrupt enabled, resume, IOPL = 0
>> current process  = 45966 (fstat)
>> trap number  = 9
>> panic: general protection fault
>> cpuid = 9
>> time = 1607416693
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> 0xfe0698887340
>> vpanic() at vpanic+0x181/frame 0xfe0698887390
>> panic() at panic+0x43/frame 0xfe06988873f0
>> trap_fatal() at trap_fatal+0x387/frame 0xfe0698887450
>> trap() at trap+0xa4/frame 0xfe0698887560
>> calltrap() at calltrap+0x8/frame 0xfe0698887560
>> --- trap 0x9, rip = 0x80bc6e22, rsp = 0xfe0698887630, rbp =
>> 0xfe06988876b0 ---
>> __mtx_lock_sleep() at __mtx_lock_sleep+0xd2/frame 0xfe06988876b0
>> __mtx_lock_flags() at __mtx_lock_flags+0xe5/frame 0xfe0698887700
>> uipc_sockaddr() at uipc_sockaddr+0x4c/frame 0xfe0698887730
>> soo_fill_kinfo() at soo_fill_kinfo+0x11e/frame 0xfe0698887770
>> kern_proc_filedesc_out() at kern_proc_filedesc_out+0xb57/frame
>> 0xfe0698887810
>> sysctl_kern_proc_filedesc() at sysctl_kern_proc_filedesc+0x7d/frame
>> 0xfe0698887890
>> sysctl_root_handler_locked() at sysctl_root_handler_locked+0x9c/frame
>> 0xfe06988878e0
>> sysctl_root() at sysctl_root+0x20d/frame 0xfe0698887960
>> userland_sysctl() at userland_sysctl+0x180/frame 0xfe0698887a10
>> sys___sysctl() at sys___sysctl+0x5f/frame 0xfe0698887ac0
>> amd64_syscall() at amd64_syscall+0x147/frame 0xfe0698887bf0
>> fast_syscall_common() at fast_syscall_common+0xf8/frame
>> 0xfe0698887bf0
>> --- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x8003948ea, rsp =
>> 0x7fffc138, rbp = 0x7fffc170 ---
>>
>> https://people.freebsd.org/~pho/stress/log/log0004.txt
>
> So here the unpcb is freed, and indeed the file itself has been closed:
>
> $3 = {f_flag = 0x3, f_count = 0x0, f_data = 0x0, f_ops = 0x81901f50
> ,
>   f_vnode = 0x0, f_cred = 0xf80248beb600, f_type = 0x2,
> f_vnread_flags = 0x0,
>   {f_seqcount = {0x0, 0x0}, f_pipegen = 0x0}, f_nextoff = {0x0, 0x0},
>   f_vnun = {fvn_cdevpriv = 0x0, fvn_advice = 0x0}, f_offset = 0x0}
>
> However, it must have happened very recently because soo_fill_kinfo()
> dereferences fp->f_data and yet we did not panic due to a null
> dereference.
>
> kern_proc_filedesc_out() holds the fdtable shared lock thoughout all of
> this, which is supposed to prevent the table entry from being freed
> since that requires the exclusive lock.
>
> Could you show fdp->fd_ofiles[3] and fdp->fd_map[0] from frame 26?
>


-- 
Mateusz Guzik 


Re: page fault due to close(2), possibly drm and i915kms related

2020-12-03 Thread Mateusz Guzik
0xfe018500e780 ---
> [160176] __mtx_lock_sleep() at 0x808cbd2c =
> __mtx_lock_sleep+0xfc/frame 0xfe018500e780
> [160176] doselwakeup() at 0x8095fbee = doselwakeup+0xde/frame
> 0xfe018500e7c0
> [160176] sowakeup() at 0x80988c7e = sowakeup+0x1e/frame
> 0xfe018500e7f0
> [160176] soisdisconnected() at 0x8099235a =
> soisdisconnected+0x8a/frame 0xfe018500e810
> [160176] unp_disconnect() at 0x8099a9fe = unp_disconnect+0x12e/frame
> 0xfe018500e850
> [160176] uipc_disconnect() at 0x809982a2 =
> uipc_disconnect+0x42/frame 0xfe018500e870
> [160176] soclose() at 0x8098cc96 = soclose+0x76/frame
> 0xfe018500e8d0
> [160176] _fdrop() at 0x80891eb1 = _fdrop+0x11/frame
> 0xfe018500e8f0
> [160176] closef() at 0x80895098 = closef+0x278/frame
> 0xfe018500e980
> [160176] closefp() at 0x808921d9 = closefp+0x89/frame
> 0xfe018500e9c0
> [160176] amd64_syscall() at 0x80cf8a45 = amd64_syscall+0x755/frame
> 0xfe018500eaf0
> [160176] fast_syscall_common() at 0x80ccfd0e =
> fast_syscall_common+0xf8/frame 0xfe018500eaf0
> [160176] --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x8003b0d7a, rsp =
> 0x7fffe998, rbp = 0x7fffe9b0 ---
> [160176] Uptime: 1d20h29m36s
> [160176] Dumping 6415 out of 32449
> MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>
> No symbol "zombproc" in current context.
> Reading symbols from /boot/kernel/dtraceall.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/dtraceall.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/dtraceall.ko
> Reading symbols from /boot/kernel/profile.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/profile.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/profile.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/opensolaris.ko
> Reading symbols from /boot/kernel/dtrace.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/dtrace.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/dtrace.ko
> Reading symbols from /boot/kernel/systrace_freebsd32.ko...Reading symbols
> from /usr/lib/debug//boot/kernel/systrace_freebsd32.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/systrace_freebsd32.ko
> Reading symbols from /boot/kernel/systrace.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/systrace.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/systrace.ko
> Reading symbols from /boot/kernel/sdt.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/sdt.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/sdt.ko
> Reading symbols from /boot/kernel/fasttrap.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/fasttrap.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/fasttrap.ko
> Reading symbols from /boot/kernel/fbt.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/fbt.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/fbt.ko
> Reading symbols from /boot/kernel/dtnfscl.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/dtnfscl.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/dtnfscl.ko
> Reading symbols from /boot/kernel/dtmalloc.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/dtmalloc.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/dtmalloc.ko
> Reading symbols from /boot/modules/sysctlinfo.ko...done.
> Loaded symbols for /boot/modules/sysctlinfo.ko
> Reading symbols from /boot/kernel/cc_htcp.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/cc_htcp.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/cc_htcp.ko
> Reading symbols from /boot/kernel/lindebugfs.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/lindebugfs.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/lindebugfs.ko
> Reading symbols from /boot/kernel/linuxkpi.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/linuxkpi.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/linuxkpi.ko
> Reading symbols from /boot/kernel/pchtherm.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/pchtherm.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/pchtherm.ko
> Reading symbols from /boot/kernel/drm.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/drm.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/drm.ko
> Reading symbols from /boot/kernel/linuxkpi_gplv2.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/linuxkpi_gplv2.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/linuxkpi_gplv2.ko
> Reading symbols from /boot/kernel/i915kms.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/i915kms.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/i915kms.ko
> Reading symbols from /boot/modules/i915_kbl_dmc_ver1_04_bin.ko...done.
> Loaded symbols for /boot/modules/i915_kbl_dmc_ver1_04_bin.ko
> Reading symbols from /boot/kernel/mac_ntpd.ko...Reading symbols from
> /usr/lib/debug//boot/kernel/mac_ntpd.ko.debug...done.
> done.
> Loaded symbols for /boot/kernel/mac_ntpd.ko
> #0  doadump (textdump=1) at src/sys/amd64/include/pcpu_aux.h:55
> 55__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct 
> pcpu,
> (kgdb) q
>
> Command exit status: 0
> Script done on Thu Dec  3 07:13:48 2020
>
> kkgdb is a shell script of mine that prepends /usr/libexec to PATH and
> exec's kgdb with the given command line. That shell script makes life
> in kgdb bearable.
>
> The minidump is available on request.
>
> Is this a known case?
>
> --
> Trond.
>


-- 
Mateusz Guzik 


Re: Laptop exhibits erratic responsiveness

2020-11-30 Thread Mateusz Guzik
On 11/30/20, David Wolfskill  wrote:
> On Sun, Nov 29, 2020 at 03:20:15PM +0100, Mateusz Guzik wrote:
>> On 11/29/20, David Wolfskill  wrote:
>> > On Sat, Nov 28, 2020 at 10:47:57AM -0500, Jonathan Looney wrote:
>> >> FWIW, I would try running lockstat on the box. (My supposition is that
>> >> the
>> >> delay is due to a lock. That could be incorrect.  Lockstat may provide
>> >> some
>> >> clue as to whether this is a line of inquiry worth pursuing.)
>> >> 
>> >
>> > Thanks (again), Jonathan.
>> >
>> > So... I did that (during this morning's daily upgrade cycle); the
>> > results may be "of interest" to some.
>> ...
>> > http://www.catwhisker.org/~david/FreeBSD/head/lockstat/README
>> ...
>>
>> According to the data you got the entire kernel "freezes" every 11-12
>> seconds. So something way off is going on there.
>>
>> Given that the bug seems to be reproducible I think it would be best
>> if you just bisected to the offending commit.
>>
>> --
>> Mateusz Guzik 
>
> I had thought that the issue arose  (as noted in my initial message)
> around 09 Nov, which was the day that my daily update on head went from
> r367484 to r367517.
>
> So for my inital attempt at a "before the breakage" kernel (& world), I
> "updated" sources to r367444, as I had updated to that revision on 07
> Nov.
>
> Testing (again, via "ping albert") still showed most RTTs around 0.600
> ms, but (around) every 11th packet, I would see an RTT in the 400 - 650
> ms range -- around a factor of 1000(ish).  So while I was not seeing
> RTTs in excess of 68 seconds (as I did at least one time for the initial
> message), I believe(d) that the underlying issue still existed at
> r367444.
>
> Lather, rinse, repeat -- for:
>
> * r367484 (from 08 Nov)
> * r367243 (from 01 Nov)
> * r366312 (from 01 Oct)
> * r363759 (from 01 Aug)
>
> At this point, I was questioning many assumptions, so I cleared /usr/obj
> completely and then updated back to r368143... and the issue remained.
>
> (The laptop was quite busy yesterday.)
>
> I definitely do NOT see anything like this running stable/12 (presently
> at r368181).
>
> I will try some experiments with another laptop (a newer one, for which
> the built-in mouse is detected weirdly, making it annoying to use for me
> -- but I can still ping from it).
>

As a sanity check, does the issue manifest itself if you build a
GENERIC-NODEBUG kernel?

-- 
Mateusz Guzik 


Re: Laptop exhibits erratic responsiveness

2020-11-29 Thread Mateusz Guzik
On 11/29/20, David Wolfskill  wrote:
> On Sat, Nov 28, 2020 at 10:47:57AM -0500, Jonathan Looney wrote:
>> FWIW, I would try running lockstat on the box. (My supposition is that
>> the
>> delay is due to a lock. That could be incorrect.  Lockstat may provide
>> some
>> clue as to whether this is a line of inquiry worth pursuing.)
>> 
>
> Thanks (again), Jonathan.
>
> So... I did that (during this morning's daily upgrade cycle); the
> results may be "of interest" to some.
>
> I have placed copies of the typescripts in:
>
> http://www.catwhisker.org/~david/FreeBSD/head/lockstat/
>
> I also scribbled a "README" in that same directory (though it doesn't
> seem to show up in the listing); it may be accessed via
>
> http://www.catwhisker.org/~david/FreeBSD/head/lockstat/README
>
> My prior message in this thread showed what I saw during a "ping albert"
> from the laptop while it was running head -- most RTTs were around 0.600
> ms, but some were notably longer, with at least one that was over 68
> seconds.
>
> So I did a "lockstat ping -c 64 albert" while the laptop was running
> stable/12@r368123 (as a reference point); it is probably boring. :-}
>
> Then (this morning), I tried a simple "lockstat sleep 600" on the laptop
> while it was running head@r368119 (and building head@r368143); we see
> the "lockstat" output in the "lockstat_head" file.
>
> It then occurred to me that trying a "lockstat ping albert" might be
> useful, so I fired up "lockstat ping -c 600 albert" -- which started up
> OK, and demonstrated some long RTTs about every 11 packets or so, but we
> see thing come to a screeching halt with:
>
> ...
> 64 bytes from 172.16.8.13: icmp_seq=534 ttl=63 time=0.664 ms
> lockstat: dtrace_status(): Abort due to systemic unresponsiveness
> 64 bytes from 172.16.8.13: icmp_seq=535 ttl=63 time=9404.383 ms
>
> and we get no lockstat output. :-/
>
>
> Finally, as another "control," I ran similar commands from freebeast,
> while it was running head@r368119 (and building head@r368143).  Those
> results are in the "lockstat_freebeast" file.
>

According to the data you collected, the entire kernel "freezes" every
11-12 seconds, so something is way off there.

Given that the bug seems to be reproducible, I think it would be best
if you just bisected to the offending commit.

-- 
Mateusz Guzik 


Re: panic: VERIFY(ZFS_TEARDOWN_READ_HELD(zfsvfs)) failed

2020-11-07 Thread Mateusz Guzik
Fixed as of r367454 (also see r367453).

On 11/6/20, Mateusz Guzik  wrote:
> I think I have an idea how to keep this. In the meantime you can just
> comment it out.
>
> On 11/6/20, Mateusz Guzik  wrote:
>> On 11/6/20, Andriy Gapon  wrote:
>>> On 06/11/2020 22:58, Mateusz Guzik wrote:
>>>> Note the underlying primitive was recently replaced.
>>>>
>>>> One immediate thing to check would be exact state of the lock.
>>>> READ_HELD checks for reading only, fails if you have this
>>>> write-locked, which is a plausible explanation if you are coming in
>>>> from less likely codepath.
>>>>
>>>> iow what's the backtrace and can you print both rms->readers and
>>>> rms->owner (+ curthread)
>>>
>>> Unfortunately, I do not have a vmcore, only a picture of the screen.
>>>
>>> ZFS code looks correct, the lock should be held in read mode, so indeed
>>> I
>>> suspect that the problem is with rms.
>>>
>>> It looks like rms_rlock() does not change rmslock::readers, but
>>> rms_rowned()
>>> checks it?
>>>
>>> That's just from a first, super-quick look at the code.
>>>
>>
>> Heh, now that you mention it, I remember wanting to just remove the
>> arguably spurious assert. Linux is never doing it for reading. The
>> only state asserts made are for writing which works fine.
>>
>> As for reading assertions, there is no performant way to make it work
>> and I don't think it is worth it as it is.
>>
>> As such, I vote for just removing these 2 asserts. They really don't
>> buy anything to begin with.
>>
>> --
>> Mateusz Guzik 
>>
>
>
> --
> Mateusz Guzik 
>


-- 
Mateusz Guzik 


Re: panic: VERIFY(ZFS_TEARDOWN_READ_HELD(zfsvfs)) failed

2020-11-06 Thread Mateusz Guzik
I think I have an idea of how to keep this. In the meantime you can just
comment it out.

On 11/6/20, Mateusz Guzik  wrote:
> On 11/6/20, Andriy Gapon  wrote:
>> On 06/11/2020 22:58, Mateusz Guzik wrote:
>>> Note the underlying primitive was recently replaced.
>>>
>>> One immediate thing to check would be exact state of the lock.
>>> READ_HELD checks for reading only, fails if you have this
>>> write-locked, which is a plausible explanation if you are coming in
>>> from less likely codepath.
>>>
>>> iow what's the backtrace and can you print both rms->readers and
>>> rms->owner (+ curthread)
>>
>> Unfortunately, I do not have a vmcore, only a picture of the screen.
>>
>> ZFS code looks correct, the lock should be held in read mode, so indeed I
>> suspect that the problem is with rms.
>>
>> It looks like rms_rlock() does not change rmslock::readers, but
>> rms_rowned()
>> checks it?
>>
>> That's just from a first, super-quick look at the code.
>>
>
> Heh, now that you mention it, I remember wanting to just remove the
> arguably spurious assert. Linux is never doing it for reading. The
> only state asserts made are for writing which works fine.
>
> As for reading assertions, there is no performant way to make it work
> and I don't think it is worth it as it is.
>
> As such, I vote for just removing these 2 asserts. They really don't
> buy anything to begin with.
>
> --
> Mateusz Guzik 
>


-- 
Mateusz Guzik 
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: panic: VERIFY(ZFS_TEARDOWN_READ_HELD(zfsvfs)) failed

2020-11-06 Thread Mateusz Guzik
On 11/6/20, Andriy Gapon  wrote:
> On 06/11/2020 22:58, Mateusz Guzik wrote:
>> Note the underlying primitive was recently replaced.
>>
>> One immediate thing to check would be exact state of the lock.
>> READ_HELD checks for reading only, fails if you have this
>> write-locked, which is a plausible explanation if you are coming in
>> from less likely codepath.
>>
>> iow what's the backtrace and can you print both rms->readers and
>> rms->owner (+ curthread)
>
> Unfortunately, I do not have a vmcore, only a picture of the screen.
>
> ZFS code looks correct, the lock should be held in read mode, so indeed I
> suspect that the problem is with rms.
>
> It looks like rms_rlock() does not change rmslock::readers, but
> rms_rowned()
> checks it?
>
> That's just from a first, super-quick look at the code.
>

Heh, now that you mention it, I remember wanting to just remove the
arguably spurious assert. Linux never does this for reading; the only
state asserts made are for writing, which works fine.

As for read assertions, there is no performant way to make them work,
and I don't think it is worth it as it is.

As such, I vote for just removing these two asserts. They really don't
buy anything to begin with.
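
To illustrate why: the scalable read path only bumps a per-CPU counter,
so there is no per-thread ownership record a READ_HELD assertion could
cheaply check. A tiny userspace model (made-up names, not the actual
rms code):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NSLOTS	16			/* stand-in for per-CPU counters */

struct rmslock_toy {
	atomic_int readers[NSLOTS];
};

static struct rmslock_toy rms;		/* zero-initialized */
static _Thread_local int myslot;	/* stand-in for curcpu */

static void
rms_rlock_toy(struct rmslock_toy *l)
{
	/* the fast path only bumps a local counter; no ownership record */
	atomic_fetch_add(&l->readers[myslot], 1);
}

static void
rms_runlock_toy(struct rmslock_toy *l)
{
	atomic_fetch_sub(&l->readers[myslot], 1);
}

/*
 * The best a read-side assertion can do is sum all the slots: that
 * proves *some* thread holds the lock for reading, not that the caller
 * does.  Recording per-thread ownership would defeat the fast path.
 */
static bool
rms_some_reader_held(struct rmslock_toy *l)
{
	int sum = 0;

	for (int i = 0; i < NSLOTS; i++)
		sum += atomic_load(&l->readers[i]);
	return (sum > 0);
}

int
main(void)
{
	rms_rlock_toy(&rms);
	printf("some reader held: %d\n", rms_some_reader_held(&rms));
	rms_runlock_toy(&rms);
	printf("some reader held: %d\n", rms_some_reader_held(&rms));
	return (0);
}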

-- 
Mateusz Guzik 

