Re: Reason why "nocache" option is not displayed in "mount"?

2024-03-11 Thread Alexander Leidinger

On 2024-03-10 22:57, Konstantin Belousov wrote:

We are already low on the free bits in the flags, even after expanding
them to 64bit.  Moreover, there are useful common fs services
continuously consuming those flags, e.g. the recent NFS TLS options.

I object to using the flags for unimportant things, like this nullfs
"cache" option.

In the long term we will have to export nmount(2) strings, since bits in
flags are finite, but I prefer to delay that as much as possible.


Why do you want to delay this? Personal priorities, or technical 
reasons?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: Reason why "nocache" option is not displayed in "mount"?

2024-03-09 Thread Alexander Leidinger

On 2024-03-09 15:27, Rick Macklem wrote:

On Sat, Mar 9, 2024 at 5:08 AM Alexander Leidinger
 wrote:


On 2024-03-09 06:07, Warner Losh wrote:

> On Thu, Mar 7, 2024 at 1:05 PM Jamie Landeg-Jones 
> wrote:
>
>> Alexander Leidinger  wrote:
>>
>>> Hi,
>>>
>>> what is the reason why "nocache" is not displayed in the output of
>>> "mount" for nullfs options?
>>
>> Good catch. I also notice that "hidden" is not shown either.
>>
>> I guess that, as "nocache" was a "secret" option for some time, no-one
>> updated "mount" to display it?
>
> So a couple of things to know.
>
> First, there's a list of known options. These are converted to a
> bitmask. This is then decoded and reported by mount. The other strings
> are passed to the filesystem directly. They decode it and do things,
> but they don't export them (that I can find). I believe that's why they
> aren't reported with 'mount'. There's a couple of other options in
> /etc/fstab that are pseudo options too.

That's the technical explanation why it doesn't work. I'm a step further
than in my initial mail: I even had a look at the code and know that
nocache is recorded in a nullfs-private flag and that userland can not
access this (mount looks at struct statfs, which doesn't provide info
about this and some other things).

My question was aimed more at whether there is a conceptual reason, or
whether it was an oversight that it is not displayed. I admit that this
was lost in translation...

Regarding the issue of not being able to see all options which are in
effect for a given mount point (not specific to nocache): I consider
this to be a bug.
Pseudo options like "late" or "noauto" in fstab, which don't make sense
when you mount(8) a FS by hand, I do not consider here.
As a data point, I added the "-m" option to nfsstat(1) so that all the
nfs-related options get displayed.

Part of the problem is that this will be file system specific, since
nmount() defers processing options to the file systems.


There exist values for a lot of the mount options which are not
displayed. For example, the nocache option for nullfs is
MNTK_NULL_NOCACHE in
https://cgit.freebsd.org/src/tree/sys/sys/mount.h#n515
This may not be usable as is, but I use it to show that there are
already bits public about it, just not in a place where they are useful
to userland.


Even FS-specific options could be set as part of statfs (by letting the
FS set them in struct statfs). Or there could be a per-mount callback /
ioctl / whatever which provides the options in some way to userland if
requested.


So we either have something which could be used but requires some
interface to let a FS set a value somewhere, or, if this is too gross a
hack, we would need to come up with a new interface to query this info.
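As an illustration of the limitation, here is a minimal C sketch (not
from the thread; output format is just for the example) of what
mount(8) can see today via statfs(2): only the generic MNT_* bits in
f_flags, which is exactly why fs-private options like the nullfs
nocache flag never show up:
---snip---
#include <sys/param.h>
#include <sys/mount.h>

#include <stdint.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
	struct statfs sfs;

	if (argc < 2 || statfs(argv[1], &sfs) != 0) {
		perror("statfs");
		return (1);
	}
	/*
	 * f_flags only carries the generic MNT_* bits; nullfs keeps
	 * "nocache" in its private mount data, so it cannot appear here.
	 */
	printf("%s on %s (%s), f_flags=0x%jx\n", sfs.f_mntfromname,
	    sfs.f_mntonname, sfs.f_fstypename, (uintmax_t)sfs.f_flags);
	return (0);
}
---snip---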


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: Reason why "nocache" option is not displayed in "mount"?

2024-03-09 Thread Alexander Leidinger

On 2024-03-09 06:07, Warner Losh wrote:

On Thu, Mar 7, 2024 at 1:05 PM Jamie Landeg-Jones  
wrote:



Alexander Leidinger  wrote:


Hi,

what is the reason why "nocache" is not displayed in the output of
"mount" for nullfs options?


Good catch. I also notice that "hidden" is not shown either.

I guess that, as "nocache" was a "secret" option for some time, no-one
updated "mount" to display it?


So a couple of things to know.

First, there's a list of known options. These are converted to a 
bitmask. This is then decoded and reported by mount. The other strings 
are passed to the filesystem directly. They decode it and do things, 
but they don't export them (that I can find). I believe that's why they 
aren't reported with 'mount'. There's a couple of other options in 
/etc/fstab that are pseudo options too.


That's the technical explanation why it doesn't work. I'm a step further
than in my initial mail: I even had a look at the code and know that
nocache is recorded in a nullfs-private flag and that userland can not
access this (mount looks at struct statfs, which doesn't provide info
about this and some other things).


My question was aimed more at whether there is a conceptual reason, or
whether it was an oversight that it is not displayed. I admit that this
was lost in translation...


Regarding the issue of not being able to see all options which are in
effect for a given mount point (not specific to nocache): I consider
this to be a bug.
Pseudo options like "late" or "noauto" in fstab, which don't make sense
when you mount(8) a FS by hand, I do not consider here.


I'm not sure if this warrants a bug tracker item (which maybe nobody is
interested in taking ownership of), or if we need to extend the man pages
with info about which options will not be displayed in the output for
mounted FS, or both.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: Reason why "nocache" option is not displayed in "mount"?

2024-03-07 Thread Alexander Leidinger

On 2024-03-07 14:59, Christos Chatzaras wrote:
what is the reason why "nocache" is not displayed in the output of 
"mount" for nullfs options?


# grep packages /etc/fstab.commit_leidinger_net
/shared/ports/packages  
/space/jails/commit.leidinger.net/shared/ports/packages nullfs 
 rw,noatime,nocache  0 0


# mount | grep commit | grep packages
/shared/ports/packages on 
/space/jails/commit.leidinger.net/shared/ports/packages (nullfs, 
local, noatime, noexec, nosuid, nfsv4acls)


Context: I wanted to check if poudriere is mounting with or without 
"nocache", and instead of reading the source I wanted to do it more 
quickly by looking at the mount options.


In my setup, I mount the /home directory using nullfs with the nocache
option to facilitate access for certain jails. The primary reason for
employing nocache is the implementation of ZFS quotas on the main
system, which do not accurately reflect changes in file usage by users
within the jail unless nocache is used. When files were added or removed
by a user within the jail, their disk usage wasn't properly updated on
the main system until I started using nocache. Based on this experience,
I'm confident that applying nocache works as expected in your scenario
as well.


It does. The question is: how do I _see_ that a mount point is _set up_
with nocache? In the above example the FS _is_ mounted with nocache, but
it is _not displayed_ in the output.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Reason why "nocache" option is not displayed in "mount"?

2024-03-07 Thread Alexander Leidinger

Hi,

what is the reason why "nocache" is not displayed in the output of 
"mount" for nullfs options?


# grep packages /etc/fstab.commit_leidinger_net
/shared/ports/packages  
/space/jails/commit.leidinger.net/shared/ports/packages nullfs  
rw,noatime,nocache  0 0


# mount | grep commit | grep packages
/shared/ports/packages on 
/space/jails/commit.leidinger.net/shared/ports/packages (nullfs, local, 
noatime, noexec, nosuid, nfsv4acls)


Context: I wanted to check if poudriere is mounting with or without 
"nocache", and instead of reading the source I wanted to do it more 
quickly by looking at the mount options.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: February 2024 stabilization week

2024-02-24 Thread Alexander Leidinger

On 2024-02-24 21:18, Konstantin Belousov wrote:

On Fri, Feb 23, 2024 at 08:34:21PM -0800, Gleb Smirnoff wrote:

  Hi FreeBSD/main users,

the February 2024 stabilization week started with 03cc3489a02d, which
was tagged as main-stabweek-2024-Feb.  At the moment of the tag creation
we already knew about several regressions caused by the libc/libsys
split.

In the stabilization branch stabweek-2024-Feb we accumulated the
following cherry-picks from FreeBSD/main:

1) closefrom() syscall was failing unless you have COMPAT_FREEBSD12 in 
kernel

   99ea67573164637d633e8051eb0a5d52f1f9488e
   eb90239d08863bcff3cf82a556ad9d89776cdf3f
2) nextboot -k broken on ZFS
   3aefe6759669bbadeb1a24a8956bf222ce279c68
   0c3ade2cf13df1ed5cd9db4081137ec90fcd19d0
3) libsys links to libc
   baa7d0741b9a2117410d558c6715906980723eed
4) sleep(3) no longer being a pthread cancellation point
   7d233b2220cd3d23c028bdac7eb3b6b7b2025125

We are aware of two regressions still unresolved:

1) libsys/rtld breaks bind 9.18 / mysql / java / ...
   https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277222

   Konstantin, can you please check me? Is this the same issue fixed 
by

   baa7d0741b9a2117410d558c6715906980723eed or a different one?

Most likely. Since no useful diagnostic was provided, I cannot confirm.


It is.
And for the curious reader: this affected a world which was built with
WITH_BIND_NOW (ports built with RELRO and BIND_NOW were unaffected, as
long as the base system was not built with BIND_NOW).


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: sanitizers broken (was RE: libc/libsys split coming soon)

2024-02-22 Thread Alexander Leidinger

On 2024-02-21 10:52, hartmut.bra...@dlr.de wrote:

Hi,

I updated yesterday and now even a minimal program with

cc -fsanitize=address

produces

ld: error: undefined symbol: __elf_aux_vector
referenced by sanitizer_linux_libcdep.cpp:950 
(/usr/src/contrib/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp:950)
  sanitizer_linux_libcdep.o:(__sanitizer::ReExec()) in 
archive /usr/lib/clang/17/lib/freebsd/libclang_rt.asan-x86_64.a
cc: error: linker command failed with exit code 1 (use -v to see 
invocation)


I think this is caused by the libsys split.


There are other issues too. Discussed in multiple places.

I opened https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277222 this
morning; maybe it can be used to centralize the libsys issues (= I don't
mind if you add a comment there, but maybe brooks wants to have a
separate PR).


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: segfault in ld-elf.so.1

2024-02-13 Thread Alexander Leidinger

On 2024-02-13 01:58, Konstantin Belousov wrote:

On Mon, Feb 12, 2024 at 11:54:02AM +0200, Konstantin Belousov wrote:

On Mon, Feb 12, 2024 at 10:35:56AM +0100, Alexander Leidinger wrote:
> Hi,
>
> dovecot (and no other program I use on this machine... at least not that I
> notice it) segfaults in ld-elf.so.1 after an update from 2024-01-18-092730
> to 2024-02-10-144617 (and now 2024-02-11-212006 in the hope the issue would
> have been fixed by changes to libc/libsys since 2024-02-10-144617). The
> issue shows up when I try to do an IMAP login. A successful authentication
> starts the imap process which immediately segfaults.
>
> I didn't recompile dovecot for the initial update, but I did now to rule
> out a regression in this area (and to get access via imap to my normal mail
> account).
>
>
> Backtrace:
The backtrace looks incomplete.  It might be a case of infinite
recursion, but I cannot claim that from the trace.

Does the program segfault if you run it manually?  If yes, please 
provide


No.

me with the tarball of the binary and all required shared libs, 
including

base system libraries, from your machine.


Regardless of my request, you might try the following.  Note that I did
not test the patch; ensure that you have a way to recover ld-elf.so.1
if something goes wrong.


[inline patch]

This did the trick and I have IMAP access to my emails again. As this
runs in a jail, it was easy to test without fear of killing something.


I will try the patch in the review next.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




kernel crash in tcp_subr.c:2386

2024-02-12 Thread Alexander Leidinger
Hi,

I got a coredump with sources from 2024-02-10-144617 (GMT+0100):
---snip---
__curthread () at /space/system/usr_src/sys/amd64/include/pcpu_aux.h:57
57  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct
pcpu,
(kgdb) #0  __curthread () at
/space/system/usr_src/sys/amd64/include/pcpu_aux.h:57
td = 
#1  doadump (textdump=textdump@entry=1)
at /space/system/usr_src/sys/kern/kern_shutdown.c:403
error = 0
coredump = 
#2  0x8052fe85 in kern_reboot (howto=260)
at /space/system/usr_src/sys/kern/kern_shutdown.c:521
once = 0
__pc = 
#3  0x80530382 in vpanic (
fmt=0x808df476 "Assertion %s failed at %s:%d",
ap=ap@entry=0xfe08a079ebf0)
at /space/system/usr_src/sys/kern/kern_shutdown.c:973
buf = "Assertion !callout_active(>t_callout) failed at
/space/system/usr_src/sys/netinet/tcp_subr.c:2386", '\000' 
__pc = 
__pc = 
__pc = 
other_cpus = {__bits = {14680063, 0 }}
td = 0xf8068ef99740
bootopt = 
newpanic = 
#4  0x805301d3 in panic (fmt=)
at /space/system/usr_src/sys/kern/kern_shutdown.c:889
ap = {{gp_offset = 32, fp_offset = 48,
overflow_arg_area = 0xfe08a079ec20,
reg_save_area = 0xfe08a079ebc0}}
#5  0x806c9d8c in tcp_discardcb (tp=tp@entry=0xf80af441ba80)
at /space/system/usr_src/sys/netinet/tcp_subr.c:2386
inp = 0xf80af441ba80
so = 0xf804d23d2780
m = 
isipv6 = 
#6  0x806d6291 in tcp_usr_detach (so=0xf804d23d2780)
at /space/system/usr_src/sys/netinet/tcp_usrreq.c:214
inp = 0xf80af441ba80
tp = 0xf80af441ba80
#7  0x805dba57 in sofree (so=0xf804d23d2780)
at /space/system/usr_src/sys/kern/uipc_socket.c:1205
pr = 0x80a8bd18 
#8  sorele_locked (so=so@entry=0xf804d23d2780)
at /space/system/usr_src/sys/kern/uipc_socket.c:1232
No locals.
#9  0x805dc8c0 in soclose (so=0xf804d23d2780)
at /space/system/usr_src/sys/kern/uipc_socket.c:1302
lqueue = {tqh_first = 0xf8068ef99740,
  tqh_last = 0xfe08a079ed40}
error = 0
saved_vnet = 0x0
last = 
listening = 
#10 0x804ccbd1 in fo_close (fp=0xf805f2dfc500, td=)
at /space/system/usr_src/sys/sys/file.h:390
No locals.
#11 _fdrop (fp=fp@entry=0xf805f2dfc500, td=,
td@entry=0xf8068ef99740)
at /space/system/usr_src/sys/kern/kern_descrip.c:3666
count = 
error = 
#12 0x804d02f3 in closef (fp=fp@entry=0xf805f2dfc500,
td=td@entry=0xf8068ef99740)
at /space/system/usr_src/sys/kern/kern_descrip.c:2839
_error = 0
_fp = 0xf805f2dfc500
lf = {l_start = -8791759350504, l_len = -8791759350528, l_pid = 0,
  l_type = 0, l_whence = 0, l_sysid = 0}
vp = 
fdtol = 
fdp = 
#13 0x804cd50c in closefp_impl (fdp=0xfe07afebf860, fd=19,
fp=0xf805f2dfc500, td=0xf8068ef99740, audit=)
at /space/system/usr_src/sys/kern/kern_descrip.c:1315
error = 
#14 closefp (fdp=0xfe07afebf860, fd=19, fp=0xf805f2dfc500,
td=0xf8068ef99740, holdleaders=true, audit=)
at /space/system/usr_src/sys/kern/kern_descrip.c:1372
No locals.
#15 0x808597d6 in syscallenter (td=0xf8068ef99740)
at /space/system/usr_src/sys/amd64/amd64/../../kern/subr_syscall.c:186
se = 0x80a48330 
p = 0xfe07f29995c0
sa = 0xf8068ef99b30
error = 
sy_thr_static = 
traced = 
#16 amd64_syscall (td=0xf8068ef99740, traced=0)
at /space/system/usr_src/sys/amd64/amd64/trap.c:1192
ksi = {ksi_link = {tqe_next = 0xfe08a079ef30,
tqe_prev = 0x808588af }, ksi_info = {
si_signo = 1, si_errno = 0, si_code = 2015268872, si_pid = -512,
si_uid = 2398721856, si_status = -2042,
si_addr = 0xfe08a079ef40, si_value = {sival_int =
-1602621824,
  sival_ptr = 0xfe08a079ee80, sigval_int = -1602621824,
  sigval_ptr = 0xfe08a079ee80}, _reason = {_fault = {
_trapno = 1489045984}, _timer = {_timerid = 1489045984,
_overrun = 17999}, _mesgq = {_mqd = 1489045984}, _poll = {
_band = 77306605406688}, _capsicum = {_syscall =
1489045984},
  __spare__ = {__spare1__ = 77306605406688, __spare2__ = {
  1489814048, 17999, 208, 0, 0, 0, 992191072,
  ksi_flags = 975329968, ksi_sigq = 0x8082f8f3
}
#17 
No locals.
#18 0x3af13b17fc9a in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x3af13a225ab8
---snip---

Any ideas?

Due to another issue in userland, I updated to 2024-02-11-212006, but I
have the above mentioned version and core still in a BE if needed.

Bye,
Alexander.


segfault in ld-elf.so.1

2024-02-12 Thread Alexander Leidinger
Hi,

dovecot (and no other program I use on this machine... at least not that I
notice it) segfaults in ld-elf.so.1 after an update from 2024-01-18-092730
to 2024-02-10-144617 (and now 2024-02-11-212006 in the hope the issue would
have been fixed by changes to libc/libsys since 2024-02-10-144617). The
issue shows up when I try to do an IMAP login. A successful authentication
starts the imap process which immediately segfaults.

I didn't recompile dovecot for the initial update, but I did now to rule
out a regression in this area (and to get access via imap to my normal mail
account).


Backtrace:
---snip---
(lldb) target create "/usr/local/libexec/dovecot/imap" --core
"/var/run/dovecot/imap.core"
Core file '/var/run/dovecot/imap.core' (x86_64) was loaded.
* thread #1, name = 'imap', stop reason = signal SIGSEGV
  * frame #0: 0x4d3dfa2a4761 ld-elf.so.1`load_object [inlined]
object_match_name(obj=0x49a47c203408, name="") at rtld.c:5606:6
frame #1: 0x4d3dfa2a4742 ld-elf.so.1`load_object(name="", fd_u=-1,
refobj=0x49a47c228008, flags=0) at rtld.c:2704:10
frame #2: 0x4d3dfa2a3eaa ld-elf.so.1`dlopen_object(name="", fd=-1,
refobj=0x49a47c228008, lo_flags=0, mode=1,
lockstate=0x1ded0f98cb80) at rtld.c:3747:8
frame #3: 0x4d3dfa2a2274 ld-elf.so.1`symlook_obj [inlined]
load_filtee1(obj=, needed=0x49a47c2007c8,
flags=, lockstate=) at rtld.c:2576:16
frame #4: 0x4d3dfa2a2245 ld-elf.so.1`symlook_obj [inlined]
load_filtees(obj=0x49a47c228008, flags=0, lockstate=0x1ded0f98cb80)
at rtld.c:2589:2
frame #5: 0x4d3dfa2a223e
ld-elf.so.1`symlook_obj(req=0x1ded011502e0, obj=0x49a47c228008) at
rtld.c:4735:6
frame #6: 0x4d3dfa2a6992
ld-elf.so.1`symlook_list(req=0x1ded01150368, objlist=,
dlp=0x1ded011504b0) at rtld.c:4637:13
frame #7: 0x4d3dfa2a680b
ld-elf.so.1`symlook_global(req=0x1ded01150470,
donelist=0x1ded011504b0) at rtld.c:4541:8
frame #8: 0x4d3dfa2a6673
ld-elf.so.1`get_program_var_addr(name=,
lockstate=0x1ded0f98cb80) at rtld.c:4483:9
frame #9: 0x4d3dfa2a4374 ld-elf.so.1`dlopen_object [inlined]
distribute_static_tls(list=0x1ded01150988,
lockstate=0x1ded0f98cb80) at rtld.c:5908:6
frame #10: 0x4d3dfa2a4364 ld-elf.so.1`dlopen_object(name="", fd=-1,
refobj=0x49a47c228008, lo_flags=0, mode=1,
lockstate=0x1ded0f98cb80) at rtld.c:3831:6
frame #11: 0x4d3dfa2a2274 ld-elf.so.1`symlook_obj [inlined]
load_filtee1(obj=, needed=0x49a47c2007c8,
flags=, lockstate=) at rtld.c:2576:16
frame #12: 0x4d3dfa2a2245 ld-elf.so.1`symlook_obj [inlined]
load_filtees(obj=0x49a47c228008, flags=0, lockstate=0x1ded0f98cb80)
at rtld.c:2589:2
frame #13: 0x4d3dfa2a223e
ld-elf.so.1`symlook_obj(req=0x1ded01150a80, obj=0x49a47c228008) at
rtld.c:4735:6
frame #14: 0x4d3dfa2a6992
ld-elf.so.1`symlook_list(req=0x1ded01150b08, objlist=,
dlp=0x1ded01150c50) at rtld.c:4637:13
frame #15: 0x4d3dfa2a680b
ld-elf.so.1`symlook_global(req=0x1ded01150c10,
donelist=0x1ded01150c50) at rtld.c:4541:8
frame #16: 0x4d3dfa2a6673
ld-elf.so.1`get_program_var_addr(name=,
lockstate=0x1ded0f98cb80) at rtld.c:4483:9
frame #17: 0x4d3dfa2a4374 ld-elf.so.1`dlopen_object [inlined]
distribute_static_tls(list=0x1ded01151128,
lockstate=0x1ded0f98cb80) at rtld.c:5908:6
frame #18: 0x4d3dfa2a4364 ld-elf.so.1`dlopen_object(name="", fd=-1,
refobj=0x49a47c228008, lo_flags=0, mode=1,
lockstate=0x1ded0f98cb80) at rtld.c:3831:6
frame #19: 0x4d3dfa2a2274 ld-elf.so.1`symlook_obj [inlined]
load_filtee1(obj=, needed=0x49a47c2007c8,
flags=, lockstate=) at rtld.c:2576:16
frame #20: 0x4d3dfa2a2245 ld-elf.so.1`symlook_obj [inlined]
load_filtees(obj=0x49a47c228008, flags=0, lockstate=0x1ded0f98cb80)
at rtld.c:2589:2
frame #21: 0x4d3dfa2a223e
ld-elf.so.1`symlook_obj(req=0x1ded01151220, obj=0x49a47c228008) at
rtld.c:4735:6
frame #22: 0x4d3dfa2a6992
ld-elf.so.1`symlook_list(req=0x1ded011512a8, objlist=,
dlp=0x1ded011513f0) at rtld.c:4637:13
frame #23: 0x4d3dfa2a680b
ld-elf.so.1`symlook_global(req=0x1ded011513b0,
donelist=0x1ded011513f0) at rtld.c:4541:8
frame #24: 0x4d3dfa2a6673
ld-elf.so.1`get_program_var_addr(name=,
lockstate=0x1ded0f98cb80) at rtld.c:4483:9
frame #25: 0x4d3dfa2a4374 ld-elf.so.1`dlopen_object [inlined]
distribute_static_tls(list=0x1ded011518c8,
lockstate=0x1ded0f98cb80) at rtld.c:5908:6
frame #26: 0x4d3dfa2a4364 ld-elf.so.1`dlopen_object(name="", fd=-1,
refobj=0x49a47c228008, lo_flags=0, mode=1,
lockstate=0x1ded0f98cb80) at rtld.c:3831:6
frame #27: 0x4d3dfa2a2274 ld-elf.so.1`symlook_obj [inlined]
load_filtee1(obj=, needed=0x49a47c2007c8,
flags=, lockstate=) at rtld.c:2576:16
frame #28: 0x4d3dfa2a2245 ld-elf.so.1`symlook_obj [inlined]
load_filtees(obj=0x49a47c228008, flags=0, 

Re: noatime on ufs2

2024-01-29 Thread Alexander Leidinger

On 2024-01-30 01:21, Warner Losh wrote:

On Mon, Jan 29, 2024 at 2:31 PM Olivier Certner  
wrote:



It also seems undesirable to add a sysctl to control a value that the
kernel doesn't use.


The kernel has to use it to guarantee some uniform behavior 
irrespective of the mount being performed through mount(8) or by a 
direct call to nmount(2).  I think this consistency is important.  
Perhaps all auto-mounters and mount helpers always run mount(8) and 
never deal with nmount(2), I would have to check (I seem to remember 
that, a long time ago, when nmount(2) was introduced as an enhancement 
over mount(2), the stance was that applications should use mount(8) 
and not nmount(2) directly).  Even if there were no obvious callers of 
nmount(2), I would be a bit uncomfortable with this discrepancy in 
behavior.


I disagree. I think Mike's suggestion was better and dealt with POLA 
and POLA breaking in a sane way. If the default is applied universally 
in user space, then we need not change the kernel at all. We lose all 
the chicken and egg problems and the non-linearness of the sysctl idea.


I would like to add that a sysctl is a somewhat hidden setting, whereas
/etc/fstab + /etc/defaults/fstab is a "right in the face" way of
setting filesystem / mount related stuff.
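For illustration, an /etc/fstab line makes the choice visible right
where the mount is defined (a generic sketch; the device name is made
up):
---snip---
# Device          Mountpoint  FStype  Options     Dump  Pass#
/dev/gpt/rootfs   /           ufs     rw,noatime  1     1
---snip---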


[...]

It could also be generalized so that the FSTYPE could have different 
settings for different types of filesystem (maybe unique flags that 
some file systems don't understand).


+1

nosuid for tmpfs comes to mind here...

One could also put it in /etc/defaults/fstab too and not break POLA 
since that's the pattern we use elsewhere.


+1

Anyway, I've said my piece. I agree with Mike that there's consensus 
for this from the installer, and after that consensus falls away. 
Mike's idea is one that I can get behind since it elegantly solves the 
general problem.


+1

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: Removing fdisk and bsdlabel (legacy partition tools)

2024-01-26 Thread Alexander Leidinger

On 2024-01-25 18:49, Rodney W. Grimes wrote:

On Thu, Jan 25, 2024, 9:11 AM Ed Maste  wrote:

> On Thu, 25 Jan 2024 at 11:00, Rodney W. Grimes
>  wrote:
> >
> > > These will need to be addressed before actually removing any of these
> > > binaries, of course.
> >
> > You seem to have missed /rescue.  Now think about that long
> > and hard, these tools classified as so important that they
> > are part of /rescue.  Again I can not stress enough how often
> > I turn to these tools in a repair mode situation.
>
> I haven't missed rescue, it is included in the work in progress I
> mentioned. Note that rescue has included gpart since 2007.
>

What can fdisk and/or disklabel repair that gpart can't?


As far as I know there is no way in gpart to get to the
MBR cyl/hd/sec values, you can only get to the LBA start
and end values:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 63, size 8388513 (4095 Meg), flag 80 (active)
beg: cyl 0/ head 1/ sector 1;
end: cyl 1023/ head 15/ sector 63

gpart show ada0
=>      63  8388545  ada0  MBR  (4.0G)
        63  8388513     1  freebsd  [active]  (4.0G)
   8388576       32        - free -  (16K)


What are you using cyl/hd/sec values for on a system which runs
FreeBSD-current, or on which you would have to use FreeBSD-current in
case of a needed repair? What is the disk hardware on those systems such
that you still need cyl/hd/sec and LBA doesn't work? Serious questions
out of curiosity.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: noatime on ufs2

2024-01-14 Thread Alexander Leidinger

On 2024-01-15 00:08, Olivier Certner wrote:

Hi Warner,


The consensus was we'd fix it in the installer.


Isn't speaking about a "consensus", at least as a general response to 
the idea of making 'noatime' the default, a little premature?  I have 
more to say on this topic (see below).  Also, I would not dismiss 
Lyndon's last mail too quickly, and in particular the last paragraph.  
I'm as interested as he is about possible answers for it.



We can't change ZFS easily, and discovery of the problem, should your
assertion be wrong, for UFS means metadata loss that can't be 
recovered.


Why ZFS would need changing?  If you're referring to an earlier 
objection from Mark Millard, I don't think it stands, please see my 
response to him.  Else, I don't know what you're referring to.


ZFS by default has atime=on. It is our installer which sets atime=off in
the ZFS properties. I understood Warner's comment about changing ZFS as
meaning changing the ZFS code to have a default of atime=off.


I agree with Warner that we should not do that. And in my opinion we
should keep the other FS which support atime/noatime consistent (those
which don't support atime/noatime due to technical limitations don't
count, in my opinion).


By pushing to the installer, most installations get most of benefits. 
And
people with special needs see the issue and can make an informed 
choice.


I agree for those who use the installer.  But I'm not sure which 
proportion of users they represent, and especially for those who care 
about disabling access times.  As for me, I don't think I have used the 
installer in the past 10 years (to be safe).  Is this such an atypical 
behavior?


I haven't used an installer myself in even longer (either I was creating
a new system by attaching a disk and prepping it from an existing
system, or by creating an image and transferring it to the target over
the network). But I would say this is atypical behavior by people who
know exactly what they are doing, not what a normal consumer would do.
Such experts know exactly what they want to do with atime and handle it
as needed.



Additionally, the installer doesn't cover other use cases:
- Mounting filesystems by hand on USB keys or other removal medias 
(which are not in '/etc/fstab').  This causes users to have to remember 
to add 'noatime' on the command-line.


Those who care about that and know where it makes a difference have it
in their finger memory.



- Using auto-mounters.  They have to be configured to use 'noatime'.


If our automounter is not able to handle that, it is a bug / missing
feature we can change.
Personally I would have no objection to changing the automounter config
to mount with noatime (if specifying noatime for a FS which doesn't
support atime/noatime doesn't create failures).
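A sketch of what that could look like in /etc/auto_master (assuming the
-media special map accepts the usual mount options):
---snip---
# automounted removable media, without access time updates
/media		-media		-nosuid,noatime
---snip---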


- Desktop environments shipping a mount helper.  Again, they have to be 
configured, if at all possible.


If they are not able to handle that, it is a bug.
Typical media in desktop use cases don't really need this. If you handle
media which really _needs_ noatime in such a case, you may want to
reconsider your way of operating.


So limiting action to the installer, while certainly a sensible and 
pragmatic step, still seems to miss a lot.


Nobody said to limit this action only to the installer. The pragmatic
part here is to ask if it really matters for those use cases. For
mounting by hand I disagree that it matters. For our automounter we
should do something (at least making sure it is able to handle it, and
if we don't want to switch the default, at least have a commented-out
entry in the config with a suitable comment). For the desktop helpers it
is not our responsibility, but interested people surely can file a bug
report upstream.


Though in all honesty, I've never been able to measure a speed 
difference.

Nor have I worn out a ssd due to the tiny increase in write amp. Old
(<100MB) SD and CF cards included. This includes my armv7 based dns 
server
that I ran for a decade on a 256MB SD card with no special settings 
and
full read/write and lots of logging. So the harm is minimal typically. 
I'm
sure there are cases that it matters more than my experience. And it 
is
good practice to enable noatime. Just that failure to do so typically 
has

only a marginal effect.


It seemed to make a difference on slow USB keys (well, not just evenly 
slow, but which could exhibit strange pauses while writing), but I 
didn't gather enough hard data to call that "scientific".  I sometimes 
manage to saturate M2 SSD I/O bandwidth but then I always use 
'noatime', so not sure how much a difference it makes.  The "updatedb" 
scenario that runs 'find' causes access time updates for all 
directories, causing spikes in the number of writes which may affect 
the response time during the process.  That said, it is only run once a 
week by default.


I would say that most of the value of having 'noatime' the default is 
in 

Re: noatime on ufs2

2024-01-12 Thread Alexander Leidinger

On 2024-01-11 18:15, Rodney W. Grimes wrote:

On 2024-01-10 22:49, Mark Millard wrote:

> I never use atime, always noatime, for UFS. That said, I'd never
> propose
> changing the long standing defaults for commands and calls. I'd avoid:

[good points I fully agree on]

There's one possibility which nobody talked about yet... changing the
default to noatime at install time in fstab / zfs set.


Perhaps you should take a closer look at what bsdinstall does
when it creates a zfs install pool and boot environment, you
might just find that noatime is already set everywhere but
on /var/mail:

/usr/libexec/bsdinstall/zfsboot:: ${ZFSBOOT_POOL_CREATE_OPTIONS:=-O 
compress=lz4 -O atime=off}

/usr/libexec/bsdinstall/zfsboot:/var/mail   atime=on


While zfs is a part of what I talked about, it is not the complete
picture. bsdinstall covers UFS and ZFS, and we should keep them in sync
in this regard. Ideally with an option the user can modify. Personally I
don't mind if the default setting for this option would be noatime. A
quick search in the scripts of bsdinstall didn't reveal to me what we
use for UFS. I assume we use atime.


I fully agree to not violate POLA by changing the default to noatime 
in
any FS. I always set noatime everywhere on systems I take care about, 
no

exceptions (any user visible mail is handled via maildir/IMAP, not
mbox). I haven't made up my mind if it would be a good idea to change
bsdinstall to set noatime (after asking the user about it, and later
maybe offer the possibility to use relatime in case it gets
implemented). I think it is at least worthwhile to discuss this
possibility (including what the default setting of bsdinstall should be
for this option).


Little late... iirc it's been that way since day one of zfs support
in bsdinstall.


Which I don't mind, as this is what I use anyway. But the correct way 
would be to let the user decide.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: noatime on ufs2

2024-01-10 Thread Alexander Leidinger

On 2024-01-10 22:49, Mark Millard wrote:

I never use atime, always noatime, for UFS. That said, I'd never 
propose

changing the long standing defaults for commands and calls. I'd avoid:


[good points I fully agree on]

There's one possibility which nobody talked about yet... changing the 
default to noatime at install time in fstab / zfs set.


I fully agree to not violate POLA by changing the default to noatime in 
any FS. I always set noatime everywhere on systems I take care about, no 
exceptions (any user visible mail is handled via maildir/IMAP, not 
mbox). I haven't made up my mind if it would be a good idea to change 
bsdinstall to set noatime (after asking the user about it, and later 
maybe offer the possibility to use relatime in case it gets 
implemented). I think it is at least worthwhile to discuss this 
possibility (including what the default setting of bsdinstall should be 
for this option).


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: ZFS problems since recently ?

2024-01-02 Thread Alexander Leidinger

On 2024-01-02 08:22, Kurt Jaeger wrote:

Hi!


The sysctl for block cloning is vfs.zfs.bclone_enabled.
To check if a pool has made use of block cloning:
zpool get all poolname | grep bclone


One more thing:

I have two pools on that box, and one of them has some bclone files:

# zpool get all ref | grep bclone
ref   bcloneused 21.8M  -
ref   bclonesaved24.4M  -
ref   bcloneratio2.12x  -
# zpool get all pou | grep bclone
pou   bcloneused 0  -
pou   bclonesaved0  -
pou   bcloneratio1.00x  -

The ref pool contains the system and some files.
The pou pool is for poudriere only.

How do I find which files on ref are bcloned and how can I remove the
bcloning from them ?


No idea about the detection (I don't expect there is an easy way), but
the answer to the second part is to copy the files after disabling block
cloning. As this is system stuff, I would expect it is not much data,
and you could copy everything and then move it back to the original
place. I would also assume original log files are not affected, and only
files which were copied (installworld or installkernel or backup files
or manual copies or port install (not sure about pkg install)) are
possible targets.
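A rough sketch of that rewrite approach (assumes block cloning is
disabled first and that there is enough free space for the temporary
copy; the file name is just an example):
---snip---
# stop creating new clones, then rewrite a file in place
sysctl vfs.zfs.bclone_enabled=0
cp /etc/some.conf /etc/some.conf.tmp && mv /etc/some.conf.tmp /etc/some.conf
---snip---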


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: bridge(4) and IPv6 broken?

2024-01-01 Thread Alexander Leidinger

On 2024-01-02 00:40, Lexi Winter wrote:

hello,

i'm having an issue with bridge(4) and IPv6, with a configuration which
is essentially identical to a working system running releng/14.0.

ifconfig:

lo0: flags=1008049 metric 0 mtu 
16384

options=680003
inet 127.0.0.1 netmask 0xff00
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
groups: lo
nd6 options=21
pflog0: flags=1000141 metric 0 mtu 33152
options=0
groups: pflog
alc0: 
flags=1008943 
metric 0 mtu 1500


options=c3098
ether 30:9c:23:a8:89:a0
inet6 fe80::329c:23ff:fea8:89a0%alc0 prefixlen 64 scopeid 0x3
media: Ethernet autoselect (1000baseT )
status: active
nd6 options=1
wg0: flags=10080c1 metric 0 mtu 
1420

options=8
inet 172.16.145.21 netmask 0x
inet6 fd00:0:1337:cafe:::829a:595e prefixlen 128
groups: wg
tunnelfib: 1
nd6 options=101
bridge0: flags=1008843 
metric 0 mtu 1500

options=0
ether 58:9c:fc:10:ff:b6
inet 10.1.4.101 netmask 0xff00 broadcast 10.1.4.255
inet6 2001:8b0:aab5:104:3::101 prefixlen 64
id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
member: tap0 flags=143
ifmaxaddr 0 port 6 priority 128 path cost 200
member: alc0 flags=143
ifmaxaddr 0 port 3 priority 128 path cost 55
groups: bridge
nd6 options=1
tap0: flags=9903 metric 0 
mtu 1500

options=8
ether 58:9c:fc:10:ff:89
groups: tap
media: Ethernet 1000baseT 
status: no carrier
nd6 options=29

the issue is that the bridge doesn't seem to respond to IPv6 ICMP
Neighbour Solicitation.  for example, while running ping, tcpdump shows
this:

23:30:16.567071 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 
(0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: 
ICMP6, echo request, id 34603, seq 13, length 16
23:30:16.634860 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 
(0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: 
ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 
32
23:30:17.567080 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 
(0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: 
ICMP6, echo request, id 34603, seq 14, length 16
23:30:17.674842 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 
(0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: 
ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 
32
23:30:17.936956 1e:ab:48:c1:f6:62 > 33:33:00:00:00:01, ethertype IPv6 
(0x86dd), length 166: fe80::1cab:48ff:fec1:f662 > ff02::1: ICMP6, 
router advertisement, length 112
23:30:18.567093 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 
(0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: 
ICMP6, echo request, id 34603, seq 15, length 16
23:30:19.567104 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 
(0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: 
ICMP6, echo request, id 34603, seq 16, length 16
23:30:19.567529 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 
(0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: 
ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 
32


fe80::1cab:48ff:fec1:f662 is the subnet router; it's sending
solicitations but FreeBSD doesn't send a response,

if i remove alc0 from the bridge and configure the IPv6 address 
directly

on alc0 instead, everything works fine.

i'm testing without any packet filter (ipfw/pf) in the kernel.

it's possible i'm missing something obvious here; does anyone have an
idea?


Just an idea; I'm not sure if it is the right track...

There is code in the kernel which ignores NS traffic from "non-valid"
sources (for security / anti-spoofing reasons). The NS request is from a
link-local address. Your bridge has no link-local address (and your tap
has the auto_linklocal flag set, which I would have expected to be on
the bridge instead). I'm not sure, but I would guess it could be because
of this.


If my guess is not too far off, I would suggest to try:
 - remove auto linklocal from tap0 (like for alc0)
 - add auto linklocal to bridge0

If this doesn't help, there is the sysctl 
net.inet6.icmp6.nd6_onlink_ns_rfc4861 which you could try to set to 1. 
Please read 
https://www.freebsd.org/security/advisories/FreeBSD-SA-08:10.nd6.asc 
before you do that.
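As concrete commands, both suggestions plus the sysctl would look
roughly like this (a sketch; interface names taken from the ifconfig
output above):
---snip---
ifconfig tap0 inet6 -auto_linklocal
ifconfig bridge0 inet6 auto_linklocal
# only if the above doesn't help, and after reading the SA:
sysctl net.inet6.icmp6.nd6_onlink_ns_rfc4861=1
---snip---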


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: ZFS problems since recently ?

2024-01-01 Thread Alexander Leidinger

On 2023-12-31 19:34, Kurt Jaeger wrote:

I already have

vfs.zfs.dmu_offset_next_sync=0

which is supposed to disable block-cloning.


It isn't. This one is supposed to fix an issue which is unrelated to
block cloning (but can be amplified by block cloning). This issue has
been fixed for some weeks; your Dec 23 build should not need it. (When
the issue happens, you get files with zeros in parts of the data instead
of the real data, and only if you copy files at the same time as those
files are modified, and then only if you happen to get the timing
right.)


The sysctl for block cloning is vfs.zfs.bclone_enabled.
To check if a pool has made use of block cloning:
zpool get all poolname | grep bclone

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




What is rc.d/opensm?

2023-11-24 Thread Alexander Leidinger

Hi,

for my work on service jails (https://reviews.freebsd.org/D40370) I'm
trying to find out what opensm is. On my amd64 system I have neither a
man page nor the binary (and man.freebsd.org doesn't know about opensm
either).


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: openzfs and block cloning question

2023-11-24 Thread Alexander Leidinger

On 2023-11-24 08:10, Oleksandr Kryvulia wrote:

Hi,
Recently cperciva@ published on his twitter [1] that enabling the block
cloning feature tends to cause data loss on 14.  Is this statement true
for current? Since I am using current for daily work and block cloning
is enabled by default, how can I verify that my data is not affected?

Thank you.


Block cloning may have an issue, or it does things which amplify an
old existing issue, or there are two issues...

The full story is at
https://github.com/openzfs/zfs/issues/15526

To be on the safe side, you may want to set
vfs.zfs.dmu_offset_next_sync=0 (via loader.conf / sysctl.conf) for the
moment.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: Request for Testing: TCP RACK

2023-11-17 Thread Alexander Leidinger

On 2023-11-17 14:29, void wrote:

On Thu, Nov 16, 2023 at 10:13:05AM +0100, tue...@freebsd.org wrote:


You can load the kernel module using
kldload tcp_rack

You can make the RACK stack the default stack using
sysctl net.inet.tcp.functions_default=rack


Hi, thank you for this.

https://klarasystems.com/articles/using-the-freebsd-rack-tcp-stack/ 
mentions

this needs to be set in /etc/src.conf :

WITH_EXTRA_TCP_STACKS=1

Is this still the case? Context here is -current both in a vm and bare
metal, on various machines, on various connections, from DSL to 10Gb.


On a recent -current this is not needed anymore; it is part of the
defaults now. But you may still need to compile the kernel with "options
TCPHPTS" (until it is added to the defaults too).
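To make this persistent across reboots, the usual knobs would be (a
sketch; the loader variable follows the standard <module>_load
convention for tcp_rack.ko):
---snip---
# /boot/loader.conf
tcp_rack_load="YES"

# /etc/sysctl.conf
net.inet.tcp.functions_default=rack
---snip---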


Is there a method (yet) for enabling this functionality in various 
-RELENG

maybe where one can compile in a vm built for that purpose, then
transferring to the production vm?


Copy the kernel which was built according to the article from Klara
Systems to your target VM.



Would it be expected to work on arm64?


Yes (I use it on an ampere VM in the cloud).

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: poudriere job && find jobs which received signal 11

2023-10-18 Thread Alexander Leidinger

On 2023-10-18 09:54, Matthias Apitz wrote:

Hello,

I'm compiling with poudriere on 14.0-CURRENT 1400094 amd64 "my" ports,
from git October 14, 2023. In the last two days 2229 packages were
produced fine, one job failed (p5-Gtk2-1.24993_3, known to be broken).

This morning I was looking for something in /var/log/messages and
accidentally I detected that yesterday a few compilations failed:

# grep 'signal 11' /var/log/messages | grep -v conftest
Oct 17 10:58:02 jet kernel: pid 12765 (cc1plus), jid 24, uid 65534: 
exited on signal 11 (core dumped)
Oct 17 10:59:32 jet kernel: pid 27104 (cc1plus), jid 24, uid 65534: 
exited on signal 11 (core dumped)
Oct 17 12:07:38 jet kernel: pid 85640 (cc1plus), jid 24, uid 65534: 
exited on signal 11 (core dumped)
Oct 17 12:08:17 jet kernel: pid 94451 (cc1plus), jid 24, uid 65534: 
exited on signal 11 (core dumped)
Oct 17 12:36:01 jet kernel: pid 77914 (cc1plus), jid 24, uid 65534: 
exited on signal 11 (core dumped)


As I said, this was without any of the 2229 jobs failing:

# cd 
/usr/local/poudriere/data/logs/bulk/140-CURRENT-ports20231014/latest-per-pkg

# ls -C1  | wc -l
2229
# grep -l 'build failure' *
p5-Gtk2-1.24993_3.log

How is this possible, that the make engines didn't fail? The uid


That can be part of configure runs which try to test some features.

65534 is the one used by poudriere, can I use the jid 24 somehow to 
find

the job which received the signal 11? Or is the time the only way to


jid = jail ID, the first column in the output of "jls". If you have the
poudriere runtime logs (where it lists which package it is processing at
the moment), you will see a number from 1 to the max number of jails
which run in parallel. This number is part of the hostname of the jail.
So if you still have the poudriere jails running, you can make a mapping
from the jid to the name to the number, and together with the time you
can see which package it was building at that time. Unfortunately
poudriere doesn't list the hostname of the builder nor the jid (feature
request, anyone?).


Example poudriere runtime log:
---snip---
[00:54:11] [03] [00:00:00] Building security/nss | nss-3.94
[00:56:46] [03] [00:02:35] Finished security/nss | nss-3.94: Success
[00:56:47] [03] [00:00:00] Building textproc/gsed | gsed-4.9
[00:57:41] [01] [00:06:18] Finished x11-toolkits/gtk30 | gtk3-3.24.34_1: 
Success

[00:57:42] [01] [00:00:00] Building devel/qt6-base | qt6-base-6.5.3
---snip---

While poudriere is running, jls reports this:
---snip---
# jls jid host.hostname
[...]
91 poudriere-bastille-default
92 poudriere-bastille-default
93 poudriere-bastille-default-job-01
94 poudriere-bastille-default-job-01
95 poudriere-bastille-default-job-02
96 poudriere-bastille-default-job-03
97 poudriere-bastille-default-job-02
98 poudriere-bastille-default-job-03
---snip---

So if we assume a coredump in jid 96 or 98, this means it was in builder
3. nss and gsed were built by poudriere builder number 3 (both about 56
minutes after the start of poudriere), and gtk30 and qt6-base by
poudriere builder number 1.
If we assume further that the coredumps are in the time range of 54 to
56 minutes after the poudriere start, the logs of nss may have a trace
of it (or not, if it was part of configure; then you would have to do
the configure run and check the messages to see if it generates similar
coredumps).
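As a quick sketch, the jid-to-builder mapping can be done while the
jails are still running (jid 96 taken from the example above):
---snip---
# jls jid host.hostname | awk '$1 == 96 { print $2 }'
poudriere-bastille-default-job-03
---snip---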



look, which of the 4 poudriere engines were running at this time?
I'd like to rerun/reproduce the package again.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: issue: poudriere jail update fails after recent changes around certctl

2023-10-14 Thread Alexander Leidinger

On 2023-10-13 17:42, Dag-Erling Smørgrav wrote:

Alexander Leidinger  writes:

some change around certctl (world from 2023-10-09) has broken the
poudriere jail update command. The complete install finishes, certctl
is run, and then there is an exit code 1. This is because I have some
certs listed as untrusted, and this seems to give a retval of 1 inside
certctl.


This only happens if a certificate is listed as both trusted and
untrusted, and I'm pretty sure the previous version would return 1 in
that case as well.  Can you check?


I compared /usr/share/certs/untrusted/ with /usr/share/certs/trusted/,
and some of the untrusted certs match certs in
/usr/share/certs/trusted/. Nothing in /usr/local/etc/ssl/untrusted/; one
cert (as hash) in /usr/local/etc/ssl/blacklisted/ which is also in
/usr/share/certs/untrusted/.


If FreeBSD provides some certs as trusted (as part of e.g.
installworld), and I have some of them listed as untrusted, I would not
expect an error case, but a failsafe action of not trusting them and not
complaining... am I doing something wrong?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




issue: poudriere jail update fails after recent changes around certctl

2023-10-13 Thread Alexander Leidinger

Hi,

some change around certctl (world from 2023-10-09) has broken the 
poudriere jail update command. The complete install finishes, certctl is 
run, and then there is an exit code 1. This is because I have some certs 
listed as untrusted, and this seems to give a retval of 1 inside 
certctl.


Testcase: set a cert as untrusted and try to use "poudriere jail -u -j 
YOUR_JAIL_NAME -m src=/usr/src"
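As commands, the testcase would look roughly like this (a sketch; the
certificate path is just an example):
---snip---
certctl untrust /usr/share/certs/trusted/ExampleCA.pem
poudriere jail -u -j YOUR_JAIL_NAME -m src=/usr/src
---snip---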


Relevant log:
---snip---
--

Installing everything completed on Fri Oct 13 10:00:04 CEST 2023

--
   83.55 real   103.83 user   109.42 sys
certctl.sh: Skipping untrusted certificate ad088e1d 
(/space/poudriere/jails/poudriere-x11/etc/ssl/untrusted/ad088e1d.0)

[some more untrusted]
*** [installworld] Error code 1

make[1]: stopped in /space/system/usr_src
1 error

make[1]: stopped in /space/system/usr_src

make: stopped in /usr/src
[00:01:32] Error: Failed to 'make installworld'
---snip---

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: git: 989c5f6da990 - main - freebsd-update: create deep BEs by default [really about if -r for bectl create should just go away]

2023-10-12 Thread Alexander Leidinger

On 2023-10-12 07:08, Mark Millard wrote:


I use the likes of:

BE                        Active  Mountpoint  Space  Created
build_area_for-main-CA72  -       -           1.99G  2023-09-20 10:19
main-CA72                 NR      /           4.50G  2023-09-21 10:10

NAME                                 CANMOUNT  MOUNTPOINT
zopt0                                on        /zopt0
. . .
zopt0/ROOT                           on        none
zopt0/ROOT/build_area_for-main-CA72  noauto    none
zopt0/ROOT/main-CA72                 noauto    none
zopt0/poudriere                      on        /usr/local/poudriere
zopt0/poudriere/data                 on        /usr/local/poudriere/data
zopt0/poudriere/data/.m              on        /usr/local/poudriere/data/.m
zopt0/poudriere/data/cache           on        /usr/local/poudriere/data/cache
zopt0/poudriere/data/images          on        /usr/local/poudriere/data/images
zopt0/poudriere/data/logs            on        /usr/local/poudriere/data/logs
zopt0/poudriere/data/packages        on        /usr/local/poudriere/data/packages
zopt0/poudriere/data/wrkdirs         on        /usr/local/poudriere/data/wrkdirs
zopt0/poudriere/jails                on        /usr/local/poudriere/jails
zopt0/poudriere/ports                on        /usr/local/poudriere/ports

zopt0/tmp                            on        /tmp
zopt0/usr                            off       /usr
zopt0/usr/13_0R-src                  on        /usr/13_0R-src
zopt0/usr/alt-main-src               on        /usr/alt-main-src
zopt0/usr/home                       on        /usr/home
zopt0/usr/local                      on        /usr/local


[...]


If such ends up as unsupportable, it will effectively eliminate my
reason for using bectl (and, so, zfs): the sharing is important to
my use.


Additionally/complementary to what Kyle said...

The -r option is about
zopt0/ROOT/main-CA72
zopt0/ROOT/main-CA72/subDS1
zopt0/ROOT/main-CA72/subDS2

A shallow clone only takes zopt0/ROOT/main-CA72 into account, while a
-r clone also clones subDS1 and subDS2.
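In bectl terms (a sketch; subDS1/subDS2 are the hypothetical children
from above):
---snip---
# shallow: clones only zopt0/ROOT/main-CA72 itself
bectl create main-CA72-copy
# recursive: also clones the children subDS1 and subDS2
bectl create -r main-CA72-deep-copy
---snip---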


So as Kyle said, your (and my) use cases are not affected by this.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




base-krb5 issues (segfaults when adding principals in openssl)

2023-10-03 Thread Alexander Leidinger

Hi,

is anyone else having issues with krb5 on -current when adding principals?

With -current as of 2023-09-11 I get a segfault in openssl:
---snip---
Reading symbols from /usr/bin/kadmin...
Reading symbols from /usr/lib/debug//usr/bin/kadmin.debug...
[New LWP 270171]
Core was generated by `kadmin -l'.
Program terminated with signal SIGSEGV, Segmentation fault.
Address not mapped to object.
#0  0x in ?? ()
(gdb) bt
#0  0x in ?? ()
#1  0x0e118da145f8 in ARCFOUR_string_to_key (context=0x44f9fba1a000, 
enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, password=..., salt=..., 
opaque=..., key=0x44f9fba211d8)

at /space/system/usr_src/crypto/heimdal/lib/krb5/salt-arcfour.c:84
#2  0x0e118da156e9 in krb5_string_to_key_data_salt_opaque 
(enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, salt=..., opaque=..., 
context=, password=...,
key=) at 
/space/system/usr_src/crypto/heimdal/lib/krb5/salt.c:201
#3  krb5_string_to_key_data_salt (context=0x44f9fba1a000, 
enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, password=..., salt=..., 
key=0x44f9fba211d8)

at /space/system/usr_src/crypto/heimdal/lib/krb5/salt.c:173
#4  0x0e118da158cb in krb5_string_to_key_salt 
(context=0x44f9fba4bc60, context@entry=0x44f9fba1a000, 
enctype=-1980854121, password=0x0,
password@entry=0xe1189ee9510 "1kad$uwi6!", salt=..., key=0x5) at 
/space/system/usr_src/crypto/heimdal/lib/krb5/salt.c:225
#5  0x0e118ba75423 in hdb_generate_key_set_password 
(context=0x44f9fba1a000, principal=, 
password=password@entry=0xe1189ee9510 "1kad$uwi6!",
keys=keys@entry=0xe1189ee9210, 
num_keys=num_keys@entry=0xe1189ee9208) at 
/space/system/usr_src/crypto/heimdal/lib/hdb/keys.c:381
#6  0x0e118ca91c9a in _kadm5_set_keys 
(context=context@entry=0x44f9fba1a140, ent=ent@entry=0xe1189ee9258, 
password=0x1 ,
password@entry=0xe1189ee9510 "1kad$uwi6!") at 
/space/system/usr_src/crypto/heimdal/lib/kadm5/set_keys.c:51
#7  0x0e118ca8caac in kadm5_s_create_principal 
(server_handle=0x44f9fba1a140, princ=<optimized out>, mask=<optimized out>, password=0xe1189ee9510 "1kad$uwi6!")

at /space/system/usr_src/crypto/heimdal/lib/kadm5/create_s.c:172
#8  0x0e0969e1a57b in add_one_principal (name=, 
rand_key=0, rand_password=0, use_defaults=0, password=0xe1189ee9510 
"1kad$uwi6!", key_data=0x0,
max_ticket_life=, max_renewable_life=, 
attributes=0x0, expiration=, pw_expiration=0x0)

at /space/system/usr_src/crypto/heimdal/kadmin/ank.c:141
#9  add_new_key (opt=opt@entry=0xe1189ee9960, argc=argc@entry=1, 
argv=0x44f9fba49238, argv@entry=0x44f9fba49230) at 
/space/system/usr_src/crypto/heimdal/kadmin/ank.c:243
#10 0x0e0969e1e124 in add_wrap (argc=, 
argv=0x44f9fba49230) at kadmin-commands.c:210
#11 0x0e0969e23945 in sl_command (cmds=, argc=2, 
argv=0x44f9fba49230) at 
/space/system/usr_src/crypto/heimdal/lib/sl/sl.c:209
#12 sl_command_loop (cmds=cmds@entry=0xe0969e282a0 , 
prompt=prompt@entry=0xe0969e15cca "kadmin> ", data=)

at /space/system/usr_src/crypto/heimdal/lib/sl/sl.c:328
#13 0x0e0969e1d876 in main (argc=<optimized out>, argv=<optimized out>) at /space/system/usr_src/crypto/heimdal/kadmin/kadmin.c:275

(gdb) up 1
#1  0x0e118da145f8 in ARCFOUR_string_to_key (context=0x44f9fba1a000, 
enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, password=..., salt=..., 
opaque=..., key=0x44f9fba211d8)

at /space/system/usr_src/crypto/heimdal/lib/krb5/salt-arcfour.c:84
84  EVP_DigestUpdate (m, &p, 1);
(gdb) list
79
80  /* LE encoding */
81  for (i = 0; i < len; i++) {
82  unsigned char p;
83  p = (s[i] & 0xff);
84  EVP_DigestUpdate (m, &p, 1);
85  p = (s[i] >> 8) & 0xff;
86  EVP_DigestUpdate (m, &p, 1);
87  }
88
(gdb) print i
$1 = 0
(gdb) print len
$2 = 
(gdb) print p
$3 = 49 '1'
(gdb) print m
$4 = (EVP_MD_CTX *) 0x43e31de4bc60
(gdb) print *m
$5 = {reqdigest = 0x17e678afd470, digest = 0x0, engine = 0x0, flags = 0, 
md_data = 0x0, pctx = 0x0, update = 0x0, algctx = 0x0, fetched_digest = 
0x0}

(gdb)
---snip---
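
Judging from the printed context (digest = 0x0, update = 0x0), EVP_DigestInit_ex() apparently never succeeded before EVP_DigestUpdate() was called. One plausible explanation (an assumption on my side, not something the trace proves) is that MD4, which ARCFOUR string-to-key needs, is only available from OpenSSL 3's legacy provider, and the failed init goes unchecked. A minimal sketch of that failure mode:

---snip---
/*
 * Minimal sketch, not Heimdal's actual code: if EVP_DigestInit_ex()
 * fails (e.g. MD4 only lives in OpenSSL 3's legacy provider and that
 * provider is not loaded), the context keeps update == NULL and a
 * later EVP_DigestUpdate() jumps to address 0, matching frame #0 in
 * the backtrace above.
 */
#include <openssl/evp.h>
#include <stdio.h>

int
main(void)
{
	EVP_MD_CTX *m = EVP_MD_CTX_new();
	unsigned char p = '1';

	if (EVP_DigestInit_ex(m, EVP_md4(), NULL) != 1) {
		/* The check the crashing code path appears to be missing. */
		fprintf(stderr, "MD4 unavailable (legacy provider not loaded?)\n");
		EVP_MD_CTX_free(m);
		return (1);
	}
	EVP_DigestUpdate(m, &p, 1);	/* safe only after a successful init */
	EVP_MD_CTX_free(m);
	return (0);
}
---snip---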

Bye,
Alexander.



Re: Speed improvements in ZFS

2023-09-15 Thread Alexander Leidinger

Am 2023-09-15 13:40, schrieb George Michaelson:

Not wanting to hijack the thread, but I am interested in whether any of this can translate back up the tree and make Linux ZFS faster.


And whether there are any simple sysctl tunings worth trying on large-memory (TB) pre-14 FreeBSD systems with slow ZFS. Older FreeBSD, alas.


The current part of the discussion is not really about ZFS (I use a lot 
of nullfs on top of ZFS). So no to the first part.


The tuning I did (maxvnodes) doesn't really depend on the FreeBSD 
version, but on the number of files touched/contained in the FS. The 
only other change I made is updating the OS itself, so this part doesn't 
apply to pre 14 systems.


If you think your ZFS (with a large ARC) is slow, you need to review your primarycache settings per dataset, check the arcstats, and maybe think about a second-level ARC on fast storage (a cache device on NVMe or SSD). If you have a read-once workload, none of this will help. So it all depends on your workload. A sketch of the knobs involved is below.
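
A minimal sketch of the review steps I mean; pool, dataset, and device names are purely illustrative:

---snip---
# per-dataset cache policy (all | metadata | none)
zfs get primarycache,secondarycache pool/dataset
zfs set primarycache=all pool/dataset

# how effective is the ARC right now?
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses

# add a second-level ARC (L2ARC) on fast storage
zpool add pool cache nvd0p4
---snip---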


Bye,
Alexander.



Re: Speed improvements in ZFS

2023-09-15 Thread Alexander Leidinger

Am 2023-09-04 14:26, schrieb Mateusz Guzik:

On 9/4/23, Alexander Leidinger  wrote:

Am 2023-08-28 22:33, schrieb Alexander Leidinger:

Am 2023-08-22 18:59, schrieb Mateusz Guzik:

On 8/22/23, Alexander Leidinger  wrote:

Am 2023-08-21 10:53, schrieb Konstantin Belousov:
On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger 
wrote:

Am 2023-08-20 23:17, schrieb Konstantin Belousov:
> On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
> > On 8/20/23, Alexander Leidinger  wrote:
> > > Am 2023-08-20 22:02, schrieb Mateusz Guzik:
> > >> On 8/20/23, Alexander Leidinger 
> > >> wrote:
> > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik:
> > >>>> On 8/18/23, Alexander Leidinger 
> > >>>> wrote:
> > >>>
> > >>>>> I have a 51MB text file, compressed to about 1MB. Are you
> > >>>>> interested
> > >>>>> to
> > >>>>> get it?
> > >>>>>
> > >>>>
> > >>>> Your problem is not the vnode limit, but nullfs.
> > >>>>
> > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
> > >>>
> > >>> 122 nullfs mounts on this system. And every jail I setup has
> > >>> several
> > >>> null mounts. One basesystem mounted into every jail, and then
> > >>> shared
> > >>> ports (packages/distfiles/ccache) across all of them.
> > >>>
> > >>>> First, some of the contention is notorious VI_LOCK in order
> > >>>> to
> > >>>> do
> > >>>> anything.
> > >>>>
> > >>>> But more importantly the mind-boggling off-cpu time comes
> > >>>> from
> > >>>> exclusive locking which should not be there to begin with --
> > >>>> as
> > >>>> in
> > >>>> that xlock in stat should be a slock.
> > >>>>
> > >>>> Maybe I'm going to look into it later.
> > >>>
> > >>> That would be fantastic.
> > >>>
> > >>
> > >> I did a quick test, things are shared locked as expected.
> > >>
> > >> However, I found the following:
> > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
> > >> mp->mnt_kern_flag |=
> > >> lowerrootvp->v_mount->mnt_kern_flag &
> > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
> > >> MNTK_EXTENDED_SHARED);
> > >> }
> > >>
> > >> are you using the "nocache" option? it has a side effect of
> > >> xlocking
> > >
> > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
> > >
> >
> > If you don't have "nocache" on null mounts, then I don't see how
> > this
> > could happen.
>
> There is also MNTK_NULL_NOCACHE on lower fs, which is currently set
> for
> fuse and nfs at least.

11 of those 122 nullfs mounts are ZFS datasets which are also NFS
exported.
6 of those nullfs mounts are also exported via Samba. The NFS
exports
shouldn't be needed anymore, I will remove them.

By nfs I meant nfs client, not nfs exports.


No NFS client mounts anywhere on this system. So where is this
exclusive
lock coming from then...
This is a ZFS system. 2 pools: one for the root, one for anything I
need
space for. Both pools reside on the same disks. The root pool is a
3-way
mirror, the "space-pool" is a 5-disk raidz2. All jails are on the
space-pool. The jails are all basejail-style jails.



While I don't see why xlocking happens, you should be able to dtrace
or printf your way into finding out.


dtrace looks to me like a faster approach to get to the root than
printf... my first naive try is to detect exclusive locks. I'm not 100%
sure I got it right, but at least dtrace doesn't complain about it:
---snip---
#pragma D option dynvarsize=32m

fbt:nullfs:null_lock:entry
/args[0]->a_flags & 0x08 != 0/
{
stack();
}
---snip---

In which direction should I look with dtrace if this works in tonight's
run of periodic? I don't have enough knowledge about VFS to come up
with some immediate ideas.


After your sysctl fix for maxvnodes I increased the amount of vnodes 10
times compared to the initial report. This has increased the speed of
the operation; the find runs in all those jails finished today after ~5h
(at ~8 am) instead of in the afternoon as before. Could this suggest that
in parallel some null_reclaim() is running which does the exclusive
locks and slows down the entire operation?



That may be a slowdown to some extent, but the primary problem is
exclusive vnode locking for stat lookup, which should not be
happening.


With -current as of 2023-09-03 (and right now 2023-09-11), the periodic
daily runs are down to less than an hour... and this didn't happen
directly after switching to 2023-09-03. First it went down to 4h, then
down to 1h, without any update of the OS. The only thing I did was
modify the maxvnodes value: first to some huge amount after your commit
to the sysctl handling, then, after noticing way more freevnodes than
configured, down to 5.


Bye,
Alexander.



Re: sed in CURRENT fails in textproc/jq

2023-09-11 Thread Alexander Leidinger

Am 2023-09-10 18:53, schrieb Robert Clausecker:

Hi Warner,

Thank you for your response.

Am Sun, Sep 10, 2023 at 09:53:03AM -0600 schrieb Warner Losh:

On Sun, Sep 10, 2023, 7:36 AM Robert Clausecker  wrote:

> Hi Warner,
>
> I have pushed a fix.  It should hopefully address those failing tests.
> The same issue should also affect memcmp(), but unlike for memchr(), it is
> illegal to pass a length to memcmp() that extends past the actual end of
> the buffer as memcmp() is permitted to examine the whole buffer regardless
> of where the first mismatch is.
>
> I am considering a change to improve the behaviour of memcmp() on such
> errorneous inputs.  There are two options: (a) I could change memcmp() the
> same way I fixed memchr() and have implausible buffer lengths behave as if
> the buffer goes to the end of the address space or (b) I could change
> memcmp() to crash loudly if it detects such a case.  I could also
> (c) leave memcmp() as is.  Which of these three choices is preferable?
>

What does the standard say? I'm highly skeptical that these corner
cases are UB behavior.

I'd like actual support for this statement, rather than your conjecture
that it's illegal. Even if you can come up with that, preserving the old
behavior is my first choice. Especially since many of these functions
aren't well defined by a standard, but are extensions.

As for memchr,
https://pubs.opengroup.org/onlinepubs/009696799/functions/memchr.html
has no such permission to examine 'the entire buffer at once', nor any
restriction as to the length extending beyond the address space. I'm
skeptical of your reading that it allows one to examine all of
[b, b + len), so please explain where the standard supports reading past
the first occurrence.


memchr() in particular is specified to only examine the input until the
matching character is found (ISO/IEC 9899:2011 § 7.24.5.1):

***
The memchr function locates the first occurrence of c (converted to an
unsigned char) in the initial n characters (each interpreted as unsigned
char) of the object pointed to by s. The implementation shall behave as
if it reads the characters sequentially and stops as soon as a matching
character is found.
***

Therefore, it appears reasonable that calls with fake buffer lengths
(e.g. SIZE_MAX, to read until a mismatch occurs) must be supported.
However, memcmp() has no such language, and the text explicitly states
that the whole buffer is compared (ISO/IEC 9899:2011 § 7.24.4.1):

***
The memcmp function compares the first n characters of the object
pointed to by s1 to the first n characters of the object pointed to
by s2.
***

By omission, this seems to give license to e.g. implement memcmp() like
timingsafe_memcmp() where it inspects all n characters of both buffers
and only then gives a result.  So if n is longer than the actual buffer
(e.g. n == SIZE_MAX), behaviour may not be defined (e.g. there could be
a crash due to crossing into an unmapped page).

Thus I have patched memchr() to behave correctly when length SIZE_MAX is
given (commit b2618b65).  My memcmp() suffers from similarly flawed
logic and may need to be patched.  However, as the language I cited
above does not indicate that such usage needs to be supported for
memcmp() (whereas it must be for memchr(), contrary to my assumptions),
I was asking you how to proceed with memcmp (hence choices (a)-(c)).


My 2ct:
What did the previous implementation of memcmp() do in this case?
 - If it was generous and behaved similarly to the requirements of
   memchr(), POLA requires the same now too.
 - If it was crashing or silently going on (= lurking bugs in 3rd
   party code), we may have the possibility to do a coredump in case
   of running past the end of the buffer, to prevent malicious use.
 - In general I go with the robustness principle, "be liberal in what
   you accept, but strict in what you provide" = memcmp() should
   behave as if such use is supported (see the sketch below).
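
To make the distinction concrete, a small self-contained sketch; the oversized-length memchr() call relies on the "stops at the first match" reading cited above, and the memcmp() counterpart is shown only as a comment because it may crash:

---snip---
#include <stdio.h>
#include <string.h>
#include <stdint.h>

int
main(void)
{
	static const char buf[] = "abc,def";

	/*
	 * memchr() must stop at the first match, so an oversized length
	 * is harmless as long as a match exists within the mapped object.
	 */
	const char *comma = memchr(buf, ',', SIZE_MAX);
	printf("comma at offset %td\n", comma - buf);

	/*
	 * memcmp() has no such guarantee: it may inspect all n bytes of
	 * both buffers (cf. timingsafe_memcmp), so an oversized n can run
	 * into an unmapped page.  Never do this:
	 *
	 *	memcmp(buf, buf, SIZE_MAX);
	 */
	return (0);
}
---snip---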

Bye,
Alexander.



Re: 100% CPU time for sysctl command, not killable

2023-09-07 Thread Alexander Leidinger

Am 2023-09-03 21:22, schrieb Alexander Leidinger:

Am 2023-09-02 16:56, schrieb Mateusz Guzik:

On 8/20/23, Alexander Leidinger  wrote:

Hi,

sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable
sysctl program. This is somewhat unexpected...



fixed here 
https://cgit.freebsd.org/src/commit/?id=32988c1499f8698b41e15ed40a46d271e757bba3


I confirm.


There may be dragons...:
kern.maxvnodes: 1048576000
vfs.wantfreevnodes: 262144000
vfs.freevnodes: 0  <---
vfs.vnodes_created: 11832359
vfs.numvnodes: 146699
vfs.recycles_free: 4700765
vfs.recycles: 0
vfs.vnode_alloc_sleeps: 0

Another time I got an insanely huge amount of free vnodes (more than 
maxvnodes).


Bye,
Alexander.



Re: Speed improvements in ZFS

2023-09-04 Thread Alexander Leidinger

Am 2023-08-28 22:33, schrieb Alexander Leidinger:

Am 2023-08-22 18:59, schrieb Mateusz Guzik:

On 8/22/23, Alexander Leidinger  wrote:

Am 2023-08-21 10:53, schrieb Konstantin Belousov:

On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:

Am 2023-08-20 23:17, schrieb Konstantin Belousov:
> On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
> > On 8/20/23, Alexander Leidinger  wrote:
> > > Am 2023-08-20 22:02, schrieb Mateusz Guzik:
> > >> On 8/20/23, Alexander Leidinger  wrote:
> > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik:
> > >>>> On 8/18/23, Alexander Leidinger 
> > >>>> wrote:
> > >>>
> > >>>>> I have a 51MB text file, compressed to about 1MB. Are you
> > >>>>> interested
> > >>>>> to
> > >>>>> get it?
> > >>>>>
> > >>>>
> > >>>> Your problem is not the vnode limit, but nullfs.
> > >>>>
> > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
> > >>>
> > >>> 122 nullfs mounts on this system. And every jail I setup has
> > >>> several
> > >>> null mounts. One basesystem mounted into every jail, and then
> > >>> shared
> > >>> ports (packages/distfiles/ccache) across all of them.
> > >>>
> > >>>> First, some of the contention is notorious VI_LOCK in order to
> > >>>> do
> > >>>> anything.
> > >>>>
> > >>>> But more importantly the mind-boggling off-cpu time comes from
> > >>>> exclusive locking which should not be there to begin with -- as
> > >>>> in
> > >>>> that xlock in stat should be a slock.
> > >>>>
> > >>>> Maybe I'm going to look into it later.
> > >>>
> > >>> That would be fantastic.
> > >>>
> > >>
> > >> I did a quick test, things are shared locked as expected.
> > >>
> > >> However, I found the following:
> > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
> > >> mp->mnt_kern_flag |=
> > >> lowerrootvp->v_mount->mnt_kern_flag &
> > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
> > >> MNTK_EXTENDED_SHARED);
> > >> }
> > >>
> > >> are you using the "nocache" option? it has a side effect of
> > >> xlocking
> > >
> > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
> > >
> >
> > If you don't have "nocache" on null mounts, then I don't see how
> > this
> > could happen.
>
> There is also MNTK_NULL_NOCACHE on lower fs, which is currently set
> for
> fuse and nfs at least.

11 of those 122 nullfs mounts are ZFS datasets which are also NFS
exported.
6 of those nullfs mounts are also exported via Samba. The NFS 
exports

shouldn't be needed anymore, I will remove them.

By nfs I meant nfs client, not nfs exports.


No NFS client mounts anywhere on this system. So where is this 
exclusive

lock coming from then...
This is a ZFS system. 2 pools: one for the root, one for anything I 
need
space for. Both pools reside on the same disks. The root pool is a 
3-way

mirror, the "space-pool" is a 5-disk raidz2. All jails are on the
space-pool. The jails are all basejail-style jails.



While I don't see why xlocking happens, you should be able to dtrace
or printf your way into finding out.


dtrace looks to me like a faster approach to get to the root than 
printf... my first naive try is to detect exclusive locks. I'm not 100% 
sure I got it right, but at least dtrace doesn't complain about it:

---snip---
#pragma D option dynvarsize=32m

fbt:nullfs:null_lock:entry
/args[0]->a_flags & 0x08 != 0/
{
stack();
}
---snip---

In which direction should I look with dtrace if this works in tonight's
run of periodic? I don't have enough knowledge about VFS to come up 
with some immediate ideas.


After your sysctl fix for maxvnodes I increased the amount of vnodes 10
times compared to the initial report. This has increased the speed of
the operation; the find runs in all those jails finished today after ~5h
(at ~8 am) instead of in the afternoon as before. Could this suggest that
in parallel some null_reclaim() is running which does the exclusive
locks and slows down the entire operation?


Bye,
Alexander.




Re: 100% CPU time for sysctl command, not killable

2023-09-03 Thread Alexander Leidinger

Am 2023-09-02 16:56, schrieb Mateusz Guzik:

On 8/20/23, Alexander Leidinger  wrote:

Hi,

sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable
sysctl program. This is somewhat unexpected...



fixed here 
https://cgit.freebsd.org/src/commit/?id=32988c1499f8698b41e15ed40a46d271e757bba3


I confirm.

Thanks!
Alexander.




Re: 100% CPU time for sysctl command, not killable

2023-08-30 Thread Alexander Leidinger

Am 2023-08-20 21:23, schrieb Alexander Leidinger:


Am 2023-08-20 18:55, schrieb Mina Galić:
procstat(1) kstack could be helpful here.

 Original Message 
On 20 Aug 2023, 17:29, Alexander Leidinger alexan...@leidinger.net> 
wrote:
Hi, sysctl kern.maxvnodes=1048576000 results in 100% CPU and a 
non-killable sysctl program. This is somewhat unexpected... Bye, 
Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 
0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 
0x8F31830F9F2772BF


  PIDTID COMMTDNAME  KSTACK
94391 118678 sysctl  -   sysctl_maxvnodes 
sysctl_root_handler_locked sysctl_root userland_sysctl sys___sysctl 
amd64_syscall fast_syscall_common


I experimented a bit by multiplying my initial value of 104857600. It 
fails between 5 and 6 times the initial value.


sysctl kern.maxvnodes=524288000 is successful within 4 seconds.

sysctl kern.maxvnodes=629145600 goes into a loop with the same procstat 
-k output.


Bye,

Alexander.


Re: Possible issue with linux xattr support?

2023-08-29 Thread Alexander Leidinger

Am 2023-08-29 21:31, schrieb Felix Palmen:

* Shawn Webb  [20230829 15:25]:

On Tue, Aug 29, 2023 at 09:15:03PM +0200, Felix Palmen wrote:
> * Kyle Evans  [20230829 14:07]:
> > On 8/29/23 14:02, Shawn Webb wrote:
> > > Back in 2019, I had a similar issue: I needed access to be able to
> > > read/write to the system extended attribute namespace from within a
> > > jailed context. I wrote a rather simple patch that provides that
> > > support on a per-jail basis:
> > >
> > > 
https://git.hardenedbsd.org/hardenedbsd/HardenedBSD/-/commit/96c85982b45e44a6105664c7068a92d0a61da2a3
> > >
> > > Hopefully that's useful to someone.
> > >
> > > Thanks,
> > >
> >
> > FWIW (which likely isn't much), I like this approach much better; it makes
> > more sense to me that it's a feature controlled by the creator of the jail
> > and not one allowed just by using a compat ABI within a jail.
>
> Well, a typical GNU userland won't work in a jail without this, that's
> what I know now. But I'm certainly with you, it doesn't feel logical
> that a Linux binary can do something in a jail a FreeBSD binary can't.
>
> So, indeed, making it a jail option sounds better.
>
> Unless, bringing back a question raised earlier in this thread: What's
> the reason to restrict this in a jailed context in the first place? IOW,
> could it just be allowed unconditionally?

In HardenedBSD's case, since we use filesystem extended attributes to
toggle exploit mitigations on a per-application basis, there's now a
conceptual security boundary between the host and the jail.

Should the jail and the host share resources, like executables, a
jailed process could toggle an exploit mitigation, and the toggle
would bubble up to the host. So the next time the host executed
/shared/app/executable/here, the security posture of the host would be
affected.


Isn't the sane approach here *not* to share any executables with a jail
other than via a read-only nullfs mount?


In https://reviews.freebsd.org/D40370 I provide infrastructure to
automatically jail rc.d services. It will use the complete filesystem of
the system, but applies all the other restrictions of jails. So the answer
to your question is "it depends".
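
For the read-only sharing Felix describes, a minimal fstab sketch (paths purely illustrative):

---snip---
# share one read-only base system into a jail via nullfs
/jails/base   /jails/web/root   nullfs   ro   0   0
---snip---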


Bye,
Alexander.




Re: Possible issue with linux xattr support?

2023-08-29 Thread Alexander Leidinger

Am 2023-08-29 21:02, schrieb Shawn Webb:


Back in 2019, I had a similar issue: I needed access to be able to
read/write to the system extended attribute namespace from within a
jailed context. I wrote a rather simple patch that provides that
support on a per-jail basis:

https://git.hardenedbsd.org/hardenedbsd/HardenedBSD/-/commit/96c85982b45e44a6105664c7068a92d0a61da2a3


You enabled it by default. I assume you had a thought about the
implications... do you have any memories about it?


What I'm after is:
 - What can go wrong if we enable it by default?
 - Why would we like to disable it (or any ideas why it is disabled by 
default in FreeBSD)?


Depending on the answers we may even use a simpler patch and have it
allowed in jails even without the possibility to configure it.


Bye,
Alexander.




Re: Speed improvements in ZFS

2023-08-28 Thread Alexander Leidinger

Am 2023-08-22 18:59, schrieb Mateusz Guzik:

On 8/22/23, Alexander Leidinger  wrote:

Am 2023-08-21 10:53, schrieb Konstantin Belousov:

On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:

Am 2023-08-20 23:17, schrieb Konstantin Belousov:
> On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
> > On 8/20/23, Alexander Leidinger  wrote:
> > > Am 2023-08-20 22:02, schrieb Mateusz Guzik:
> > >> On 8/20/23, Alexander Leidinger  wrote:
> > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik:
> > >>>> On 8/18/23, Alexander Leidinger 
> > >>>> wrote:
> > >>>
> > >>>>> I have a 51MB text file, compressed to about 1MB. Are you
> > >>>>> interested
> > >>>>> to
> > >>>>> get it?
> > >>>>>
> > >>>>
> > >>>> Your problem is not the vnode limit, but nullfs.
> > >>>>
> > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
> > >>>
> > >>> 122 nullfs mounts on this system. And every jail I setup has
> > >>> several
> > >>> null mounts. One basesystem mounted into every jail, and then
> > >>> shared
> > >>> ports (packages/distfiles/ccache) across all of them.
> > >>>
> > >>>> First, some of the contention is notorious VI_LOCK in order to
> > >>>> do
> > >>>> anything.
> > >>>>
> > >>>> But more importantly the mind-boggling off-cpu time comes from
> > >>>> exclusive locking which should not be there to begin with -- as
> > >>>> in
> > >>>> that xlock in stat should be a slock.
> > >>>>
> > >>>> Maybe I'm going to look into it later.
> > >>>
> > >>> That would be fantastic.
> > >>>
> > >>
> > >> I did a quick test, things are shared locked as expected.
> > >>
> > >> However, I found the following:
> > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
> > >> mp->mnt_kern_flag |=
> > >> lowerrootvp->v_mount->mnt_kern_flag &
> > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
> > >> MNTK_EXTENDED_SHARED);
> > >> }
> > >>
> > >> are you using the "nocache" option? it has a side effect of
> > >> xlocking
> > >
> > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
> > >
> >
> > If you don't have "nocache" on null mounts, then I don't see how
> > this
> > could happen.
>
> There is also MNTK_NULL_NOCACHE on lower fs, which is currently set
> for
> fuse and nfs at least.

11 of those 122 nullfs mounts are ZFS datasets which are also NFS
exported.
6 of those nullfs mounts are also exported via Samba. The NFS 
exports

shouldn't be needed anymore, I will remove them.

By nfs I meant nfs client, not nfs exports.


No NFS client mounts anywhere on this system. So where is this 
exclusive

lock coming from then...
This is a ZFS system. 2 pools: one for the root, one for anything I 
need
space for. Both pools reside on the same disks. The root pool is a 
3-way

mirror, the "space-pool" is a 5-disk raidz2. All jails are on the
space-pool. The jails are all basejail-style jails.



While I don't see why xlocking happens, you should be able to dtrace
or printf your way into finding out.


dtrace looks to me like a faster approach to get to the root than 
printf... my first naive try is to detect exclusive locks. I'm not 100% 
sure I got it right, but at least dtrace doesn't complain about it:

---snip---
#pragma D option dynvarsize=32m

fbt:nullfs:null_lock:entry
/args[0]->a_flags & 0x08 != 0/
{
stack();
}
---snip---

In which direction should I look with dtrace if this works in tonight's
run of periodic? I don't have enough knowledge about VFS to come up with
some immediate ideas.
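
One possible direction, sketched under the assumption that hard-coding a flag value is the fragile part: aggregate the raw flag words and the requesting stacks, and compare the observed values against the LK_* definitions in sys/sys/lockmgr.h afterwards:

---snip---
#pragma D option dynvarsize=32m

/* Count every flag combination and every kernel stack that requests
 * a nullfs vnode lock; dtrace prints the aggregations on exit. */
fbt:nullfs:null_lock:entry
{
	@flags[args[0]->a_flags] = count();
	@stacks[stack()] = count();
}
---snip---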


Bye,
Alexander.




Re: Possible issue with linux xattr support?

2023-08-28 Thread Alexander Leidinger

Am 2023-08-28 13:06, schrieb Dmitry Chagin:

On Sun, Aug 27, 2023 at 09:55:23PM +0200, Felix Palmen wrote:

* Dmitry Chagin  [20230827 22:46]:



> I can fix this completely disabling exttatr for jailed proc,
> however, it's gonna be bullshit, though

Would probably be better than nothing. AFAIK, "Linux jails" are used a
lot, probably with userlands from distributions actually using xattr.



It might make sense to allow this priv (PRIV_VFS_EXTATTR_SYSTEM) for linux
jails by default? What do you think, James?


I think the question is more whether we want to allow it in jails in
general (not specific to linux jails; as in: if it is ok for linux jails,
it should be ok for FreeBSD jails too). So the question is: what does this
protect the host from if it is not allowed in jails? Some kind of
possibility to DoS the host?


Bye,
Alexander.




Re: Speed improvements in ZFS

2023-08-21 Thread Alexander Leidinger

Am 2023-08-21 10:53, schrieb Konstantin Belousov:

On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:

Am 2023-08-20 23:17, schrieb Konstantin Belousov:
> On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
> > On 8/20/23, Alexander Leidinger  wrote:
> > > Am 2023-08-20 22:02, schrieb Mateusz Guzik:
> > >> On 8/20/23, Alexander Leidinger  wrote:
> > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik:
> > >>>> On 8/18/23, Alexander Leidinger  wrote:
> > >>>
> > >>>>> I have a 51MB text file, compressed to about 1MB. Are you interested
> > >>>>> to
> > >>>>> get it?
> > >>>>>
> > >>>>
> > >>>> Your problem is not the vnode limit, but nullfs.
> > >>>>
> > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
> > >>>
> > >>> 122 nullfs mounts on this system. And every jail I setup has several
> > >>> null mounts. One basesystem mounted into every jail, and then shared
> > >>> ports (packages/distfiles/ccache) across all of them.
> > >>>
> > >>>> First, some of the contention is notorious VI_LOCK in order to do
> > >>>> anything.
> > >>>>
> > >>>> But more importantly the mind-boggling off-cpu time comes from
> > >>>> exclusive locking which should not be there to begin with -- as in
> > >>>> that xlock in stat should be a slock.
> > >>>>
> > >>>> Maybe I'm going to look into it later.
> > >>>
> > >>> That would be fantastic.
> > >>>
> > >>
> > >> I did a quick test, things are shared locked as expected.
> > >>
> > >> However, I found the following:
> > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
> > >> mp->mnt_kern_flag |=
> > >> lowerrootvp->v_mount->mnt_kern_flag &
> > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
> > >> MNTK_EXTENDED_SHARED);
> > >> }
> > >>
> > >> are you using the "nocache" option? it has a side effect of xlocking
> > >
> > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
> > >
> >
> > If you don't have "nocache" on null mounts, then I don't see how this
> > could happen.
>
> There is also MNTK_NULL_NOCACHE on lower fs, which is currently set for
> fuse and nfs at least.

11 of those 122 nullfs mounts are ZFS datasets which are also NFS 
exported.

6 of those nullfs mounts are also exported via Samba. The NFS exports
shouldn't be needed anymore, I will remove them.

By nfs I meant nfs client, not nfs exports.


No NFS client mounts anywhere on this system. So where is this exclusive 
lock coming from then...
This is a ZFS system. 2 pools: one for the root, one for anything I need 
space for. Both pools reside on the same disks. The root pool is a 3-way 
mirror, the "space-pool" is a 5-disk raidz2. All jails are on the 
space-pool. The jails are all basejail-style jails.


Bye,
Alexander.




Re: Speed improvements in ZFS

2023-08-21 Thread Alexander Leidinger

Am 2023-08-20 23:17, schrieb Konstantin Belousov:

On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:

On 8/20/23, Alexander Leidinger  wrote:
> Am 2023-08-20 22:02, schrieb Mateusz Guzik:
>> On 8/20/23, Alexander Leidinger  wrote:
>>> Am 2023-08-20 19:10, schrieb Mateusz Guzik:
>>>> On 8/18/23, Alexander Leidinger  wrote:
>>>
>>>>> I have a 51MB text file, compressed to about 1MB. Are you interested
>>>>> to
>>>>> get it?
>>>>>
>>>>
>>>> Your problem is not the vnode limit, but nullfs.
>>>>
>>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
>>>
>>> 122 nullfs mounts on this system. And every jail I setup has several
>>> null mounts. One basesystem mounted into every jail, and then shared
>>> ports (packages/distfiles/ccache) across all of them.
>>>
>>>> First, some of the contention is notorious VI_LOCK in order to do
>>>> anything.
>>>>
>>>> But more importantly the mind-boggling off-cpu time comes from
>>>> exclusive locking which should not be there to begin with -- as in
>>>> that xlock in stat should be a slock.
>>>>
>>>> Maybe I'm going to look into it later.
>>>
>>> That would be fantastic.
>>>
>>
>> I did a quick test, things are shared locked as expected.
>>
>> However, I found the following:
>> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
>> mp->mnt_kern_flag |=
>> lowerrootvp->v_mount->mnt_kern_flag &
>> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
>> MNTK_EXTENDED_SHARED);
>> }
>>
>> are you using the "nocache" option? it has a side effect of xlocking
>
> I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
>

If you don't have "nocache" on null mounts, then I don't see how this
could happen.


There is also MNTK_NULL_NOCACHE on lower fs, which is currently set for
fuse and nfs at least.


11 of those 122 nullfs mounts are ZFS datasets which are also NFS 
exported. 6 of those nullfs mounts are also exported via Samba. The NFS 
exports shouldn't be needed anymore, I will remove them.


Shouldn't this implicit nocache propagate to the mount of the upper fs 
to give the user feedback about the effective state?


Bye,
Alexander.




Re: Speed improvements in ZFS

2023-08-20 Thread Alexander Leidinger

Am 2023-08-20 22:02, schrieb Mateusz Guzik:

On 8/20/23, Alexander Leidinger  wrote:

Am 2023-08-20 19:10, schrieb Mateusz Guzik:

On 8/18/23, Alexander Leidinger  wrote:



I have a 51MB text file, compressed to about 1MB. Are you interested
to
get it?



Your problem is not the vnode limit, but nullfs.

https://people.freebsd.org/~mjg/netchild-periodic-find.svg


122 nullfs mounts on this system. And every jail I setup has several
null mounts. One basesystem mounted into every jail, and then shared
ports (packages/distfiles/ccache) across all of them.


First, some of the contention is notorious VI_LOCK in order to do
anything.

But more importantly the mind-boggling off-cpu time comes from
exclusive locking which should not be there to begin with -- as in
that xlock in stat should be a slock.

Maybe I'm going to look into it later.


That would be fantastic.



I did a quick test, things are shared locked as expected.

However, I found the following:
if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
        mp->mnt_kern_flag |= lowerrootvp->v_mount->mnt_kern_flag &
            (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
            MNTK_EXTENDED_SHARED);
}

are you using the "nocache" option? it has a side effect of xlocking


I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.

Bye,
Alexander.




Re: Speed improvements in ZFS

2023-08-20 Thread Alexander Leidinger

Am 2023-08-20 19:10, schrieb Mateusz Guzik:

On 8/18/23, Alexander Leidinger  wrote:


I have a 51MB text file, compressed to about 1MB. Are you interested to
get it?



Your problem is not the vnode limit, but nullfs.

https://people.freebsd.org/~mjg/netchild-periodic-find.svg


122 nullfs mounts on this system. And every jail I setup has several 
null mounts. One basesystem mounted into every jail, and then shared 
ports (packages/distfiles/ccache) across all of them.


First, some of the contention is notorious VI_LOCK in order to do 
anything.


But more importantly the mind-boggling off-cpu time comes from
exclusive locking which should not be there to begin with -- as in
that xlock in stat should be a slock.

Maybe I'm going to look into it later.


That would be fantastic.

Bye,
Alexander.




Re: 100% CPU time for sysctl command, not killable

2023-08-20 Thread Alexander Leidinger

Am 2023-08-20 18:55, schrieb Mina Galić:


procstat(1) kstack could be helpful here.

 Original Message 
On 20 Aug 2023, 17:29, Alexander Leidinger alexan...@leidinger.net> 
wrote:


Hi, sysctl kern.maxvnodes=1048576000 results in 100% CPU and a 
non-killable sysctl program. This is somewhat unexpected... Bye, 
Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 
0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 
0x8F31830F9F2772BF


  PIDTID COMMTDNAME  KSTACK
94391 118678 sysctl  -   sysctl_maxvnodes 
sysctl_root_handler_locked sysctl_root userland_sysctl sys___sysctl 
amd64_syscall fast_syscall_common


Bye,

Alexander.


100% CPU time for sysctl command, not killable

2023-08-20 Thread Alexander Leidinger

Hi,

sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable 
sysctl program. This is somewhat unexpected...


Bye,
Alexander.




Re: Speed improvements in ZFS

2023-08-18 Thread Alexander Leidinger

Am 2023-08-16 18:48, schrieb Alexander Leidinger:

Am 2023-08-15 23:29, schrieb Mateusz Guzik:

On 8/15/23, Alexander Leidinger  wrote:

Am 2023-08-15 14:41, schrieb Mateusz Guzik:


With this in mind can you provide: sysctl kern.maxvnodes
vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes
vfs.recycles_free vfs.recycles


After a reboot:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 24696
vfs.vnodes_created: 1658162
vfs.numvnodes: 173937
vfs.recycles_free: 0
vfs.recycles: 0


New values after one round of periodic:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 356202
vfs.vnodes_created: 427696288
vfs.numvnodes: 532620
vfs.recycles_free: 20213257
vfs.recycles: 0


And after the second round which only took 7h this night:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 3071754
vfs.vnodes_created: 1275963316
vfs.numvnodes: 3414906
vfs.recycles_free: 58411371
vfs.recycles: 0


Meanwhile if there is tons of recycles, you can damage control by
bumping kern.maxvnodes.


What's the difference between recycles and recycles_free? Does the 
above count as bumping the maxvnodes?


^


Looks like there are not many free ones directly after the reboot. I will
check the values tomorrow after the periodic run again, and maybe
increase by 10 or 100 times to see if it makes a difference.


If this is not the problem you can use dtrace to figure it out.


dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or
something else?



I mean checking where find is spending time instead of speculating.

There is no productized way to do it so to speak, but the following
crapper should be good enough:

[script]

I will let it run this night.


I have a 51MB text file, compressed to about 1MB. Are you interested to 
get it?


Bye,
Alexander.




Re: Speed improvements in ZFS

2023-08-16 Thread Alexander Leidinger

Am 2023-08-15 23:29, schrieb Mateusz Guzik:

On 8/15/23, Alexander Leidinger  wrote:

Am 2023-08-15 14:41, schrieb Mateusz Guzik:


With this in mind can you provide: sysctl kern.maxvnodes
vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes
vfs.recycles_free vfs.recycles


After a reboot:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 24696
vfs.vnodes_created: 1658162
vfs.numvnodes: 173937
vfs.recycles_free: 0
vfs.recycles: 0


New values after one round of periodic:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 356202
vfs.vnodes_created: 427696288
vfs.numvnodes: 532620
vfs.recycles_free: 20213257
vfs.recycles: 0


Meanwhile if there is tons of recycles, you can damage control by
bumping kern.maxvnodes.


What's the difference between recycles and recycles_free? Does the above 
count as bumping the maxvnodes?



Looks like there are not many free ones directly after the reboot. I will
check the values tomorrow after the periodic run again, and maybe
increase by 10 or 100 times to see if it makes a difference.


If this is not the problem you can use dtrace to figure it out.


dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or
something else?



I mean checking where find is spending time instead of speculating.

There is no productized way to do it so to speak, but the following
crapper should be good enough:

[script]

I will let it run this night.

Bye,
Alexander.




Re: Speed improvements in ZFS

2023-08-15 Thread Alexander Leidinger

Am 2023-08-15 14:41, schrieb Mateusz Guzik:


With this in mind can you provide: sysctl kern.maxvnodes
vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes
vfs.recycles_free vfs.recycles


After a reboot:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 24696
vfs.vnodes_created: 1658162
vfs.numvnodes: 173937
vfs.recycles_free: 0
vfs.recycles: 0


Meanwhile if there is tons of recycles, you can damage control by
bumping kern.maxvnodes.


Looks like there are not many free ones directly after the reboot. I will
check the values tomorrow after the periodic run again, and maybe
increase by 10 or 100 times to see if it makes a difference.



If this is not the problem you can use dtrace to figure it out.


dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or 
something else?
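
If it helps, counting those two functions can be a one-liner; this assumes the symbols are not inlined away, since fbt can only probe real function entry points:

---snip---
dtrace -n 'fbt::vnlru_free_locked:entry,fbt::vnlru_read_freevnodes:entry { @[probefunc] = count(); }'
---snip---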


Bye,
Alexander.




Re: Strange network issues with -current

2023-08-15 Thread Alexander Leidinger

Am 2023-08-15 14:24, schrieb Alexander Leidinger:

Am 2023-08-15 13:48, schrieb Alexander Leidinger:

For a while now I have been seeing some strange network issues in some
parts of a particular system.


I just stumbled upon the mail which discusses issues with commit
e3ba0d6adde3, and when I look into this I see changes related to the
use of SO_REUSEPORT flags, and all my nginx systems use the reuseport
directive in their config. I'm compiling right now with this change
reverted. Once tested I will report back.


Unfortunately it wasn't that.

Bye,
Alexander.




Re: Strange network issues with -current

2023-08-15 Thread Alexander Leidinger

Am 2023-08-15 13:48, schrieb Alexander Leidinger:

For a while now I have been seeing some strange network issues in some
parts of a particular system.


I just stumbled upon the mail which discusses issues with commit
e3ba0d6adde3, and when I look into this I see changes related to the use
of SO_REUSEPORT flags, and all my nginx systems use the reuseport
directive in their config. I'm compiling right now with this change
reverted. Once tested I will report back.


Bye,
Alexander.




Speed improvements in ZFS

2023-08-15 Thread Alexander Leidinger

Hi,

just a report that I noticed a very high speed improvement in ZFS in
-current. For a long time (at least since last year), on a jail-host of
mine with more than 20 jails which each run periodic daily, the periodic
daily runs of the jails took from about 3 am until 5 pm or longer. I
don't remember when this started, and I thought at the time that the
problem might be data related. It's the long runs of "find" in one of
the periodic daily jobs which take that long: the number of jails,
together with a null-mounted basesystem and a null-mounted package
repository inside each jail, means the number of files and the
concurrent access to the spinning rust (with first an SSD based and now
an NVMe based cache) may have reached some tipping point. I have all the
periodic daily mails around, so theoretically I may be able to find out
when this started, but as can be seen in another mail to this mailing
list, the system which has all the periodic mails has some issues which
have higher priority for me to track down...


Since I updated to a src from 2023-07-20, this is not the case anymore.
The data is the same (maybe even a bit more, as I have added 2 more
jails since then, and the periodic daily runs, which run more or less in
parallel, are not taking considerably longer). The speed increase with
the July build is in the area of 3-4 hours for 23 parallel periodic
daily runs. So instead of finishing the periodic runs around 5 pm, they
already finish around 1 pm/2 pm.


So whatever was done inside ZFS or VFS or nullfs between 2023-06-19 and
2023-07-20 has given a huge speed improvement. From my memory I would
say there is still room for improvement, as I think the periodic daily
runs used to end in the morning instead of the afternoon, but my memory
may be flaky in this regard...


Great work to whoever was involved.

Bye,
Alexander.




Strange network issues with -current

2023-08-15 Thread Alexander Leidinger

Hi,

For a while now I have been seeing some strange network issues in some
parts of a particular system.


A build with src from 2023-07-26 was still working ok. An update to
2023-08-07 broke some parts in a strange way. Trying again with src from
2023-08-11 didn't fix things.


What I see is... strange and complex.

I have a jail host with about 23 jails. All the jails are sitting on a
bridge and have IPv6 and IPv4 addresses. One jail is a DNS server for a
domain which contains all the DNS entries for all the jails on the
system (and more). Other jails have mysql (the mysql FS socket is
nullfs-mounted into other jails so they connect to mysql via the FS
socket instead of the network), a dovecot IMAP server, a postfix SMTP
server, an nginx based reverse proxy, 2 different kinds of webmail
solutions (the old php74 based one on the way out in favour of a php81
based one), a wiki and other things.


With the old working basesystem I can log into the old webmail system
and read mails. With the newer non-working basesystem I can still log
in, but the auth credentials are not stored in the backend session, and
as such no mail is listed at all, as this requires subsequent
connections from php to dovecot. This webmail system goes via the
reverse proxy to the webmail jail, which has another nginx configured to
connect to the php-fpm backend.
With the new webmail system I can log in, read mails, and am even
writing this email from it. The first login to it fails, the second
succeeds. It is not behind the reverse proxy (as it is not fully ready
yet for access from the outside (DSL with NAT on the DSL box to the
reverse proxy)), but a single nginx with a php-fpm backend (instead of
2 nginx + php-fpm as in the old webmail).


The wiki behind the reverse proxy is sometimes working and sometimes
not. Sometimes it provides everything, sometimes parts of the site are
missing (e.g. pictures / icons). Sometimes there is simply a blank page,
sometimes it gives an error message from the wiki about an unforeseen
bug...


The error message in the nginx reverse proxy log for all the strange
failure cases is "accept4() failed (53: Software caused connection
abort)". Sometimes I get "upstream timed out". When it times out in the
reverse proxy instead of producing the accept4 errors, I see the same
accept4 error message in the nginx inside the wiki or webmail jail
instead.


I tried to recompile all the components of the wiki and reverse proxy 
and php81 based webmail, to no avail. The issue persists.


Does this ring a bell for someone? Maybe some network, socket, or VM
based changes in this timeframe which smell like they could be related,
and which would be good candidates for a back-out test? Any ideas how to
drill down with debugging to get a simpler test case than the complex
setup of if_bridge, epair, jails, wiki, php, nginx, ...?


Bye,
Alexander.




Re: kernel: sonewconn: pcb 0xfffff8002b255a00 (local:/var/run/devd.seqpacket.pipe): Listen queue overflow: 1 already in queue awaiting acceptance (60 occurrences), ?

2023-06-21 Thread Alexander Leidinger

Quoting Gary Jennejohn  (from Tue, 20 Jun 2023 14:41:41 +):


On Tue, 20 Jun 2023 12:04:13 +0200
Alexander Leidinger  wrote:



"listen X backlog=y" and "sysctl kern.ipx.somaxconn=X" for FreeBSD



On my FreeBSD14 system these things are all under kern.ipc.


Typo on my side... it was supposed to read ipc, not ipx.

Bye,
Alexander.



Re: kernel: sonewconn: pcb 0xfffff8002b255a00 (local:/var/run/devd.seqpacket.pipe): Listen queue overflow: 1 already in queue awaiting acceptance (60 occurrences), ?

2023-06-20 Thread Alexander Leidinger


Quoting Gary Jennejohn  (from Tue, 20 Jun 2023 07:41:08 +):


On Tue, 20 Jun 2023 06:25:05 +0100
Graham Perrin  wrote:


Please, what's the meaning of the sonewconn lines?



sonewconn is described in socket(9).  Below a copy/paste of the description
from socket(9):

 Protocol implementations can use sonewconn() to create a socket and
 attach protocol state to that socket.  This can be used to create new
 sockets available for soaccept() on a listen socket.  The returned
 socket has a reference count of zero.

Apparently there was already a listen socket in the queue which had not been
consumed by soaccept() when a new sonewconn() call was made.

Anyway, that's my understanding.  Might be wrong.


In other words, the software listening on it didn't process the requests
fast enough and a backlog piled up (e.g. apache's ListenBacklog or
nginx's "listen X backlog=y", and "sysctl kern.ipx.somaxconn=X" for
FreeBSD itself). You may need faster hardware, more processes/threads to
handle the traffic, or to configure your software to do less work for
the same result (e.g. no real-time DNS resolution in the logging of a
webserver, or increasing the number of allowed items in the backlog).
If you can change the software, there's also the possibility to switch
from blocking to non-blocking sockets (so the select/accept loop doesn't
block / run into contention), or to kqueue. A sketch of these knobs is
below.
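
A short sketch of the tuning described above; the numbers are purely illustrative, and kern.ipc.soacceptqueue is the current name of the limit (kern.ipc.somaxconn remains as a compatibility alias):

---snip---
# kernel-wide cap on listen queue backlogs
sysctl kern.ipc.soacceptqueue
sysctl kern.ipc.soacceptqueue=1024

# nginx: raise the per-listener backlog to match, e.g.
#   listen 443 ssl backlog=1024;
---snip---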


Bye,
Alexander.



Re: Surprise null root password

2023-05-31 Thread Alexander Leidinger
Quoting bob prohaska  (from Tue, 30 May 2023  
08:36:21 -0700):



I suggest to review changes ("df" instead of "tf" in etcupdate) to at least
those files which you know you have modified, including the password/group
stuff. After that you can decide if the diff which is shown with "df" can be
applied ("tf"), or if you want to keep the old version ("mf"), or if you
want to modify the current file ("e", with both versions present in the file
so that you can copy/paste between the different versions and keep what you
need).



The key sequences required to copy and paste between files in the edit
screen were elusive. Probably it was thought self-evident, but not for
me. I last tried it long ago, via mergemaster. Is there a guide to
commands for merging files using etcupdate? Is it in the vi man page?
I couldn't find it.


etcupdate respects the EDITOR environment variable. You can use any
editor you like there.

Typically I use the mouse to copy and paste myself, and I google it
every time I can't (https://linuxize.com/post/how-to-copy-cut-paste-in-vim/).


Bye,
Alexander.



Re: Surprise null root password

2023-05-30 Thread Alexander Leidinger


Quoting bob prohaska  (from Fri, 26 May 2023  
16:26:06 -0700):



On Fri, May 26, 2023 at 10:55:49PM +0200, Yuri wrote:


The question is how you update the configuration files,
mergemaster/etcupdate/something else?



Via etcupdate after installworld. In the event the system
requests manual intervention I accept "theirs all". It seems
odd if that can null a root password.

Still, it does seem an outside possibility. I could see it adding
system users, but messing with root's existing password seems a
bit unexpected.


As you are posting to -current@, I assume you are reporting this issue
for a 14-current system. As such: there was a "recent" change
(2021-10-20) to the root entry to change the shell.
 
https://cgit.freebsd.org/src/commit/etc/master.passwd?id=d410b585b6f00a26c2de7724d6576a3ea7d548b7


By blindly accepting all changes, this has reset the PW to the default  
setting (empty).


I suggest to review changes ("df" instead of "tf" in etcupdate) to at  
least those files which you know you have modified, including the  
password/group stuff. After that you can decide if the diff which is  
shown with "df" can be applied ("tf"), or if you want to keep the old  
version ("mf"), or if you want to modify the current file ("e", with  
both versions present in the file so that you can copy/paste between  
the different versions and keep what you need).
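
A sketch of such a session; the prompt letters are the ones described above:

---snip---
# after installworld:
etcupdate            # three-way merge into /etc
etcupdate resolve    # revisit each remaining conflict:
                     #   df - show the pending diff
                     #   tf - take the new file
                     #   mf - keep my (old) file
                     #   e  - edit the merged file in $EDITOR
---snip---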


Bye,
Alexander.



Re: change in compat/linux breaking net/citrix_ica

2023-04-26 Thread Alexander Leidinger
Quoting Jakob Alvermark  (from Wed, 26 Apr 2023  
09:01:00 +0200):



Hi,


I use net/citrix_ica for work.

After a recent change to -current in compat/linux it no longer  
works. The binary just segfaults.


What does "sysctl compat.linux.osrelease" display? If it is not 2.6.30  
or higher, try to set it to 2.6.30 or higher.
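
For example (2.6.32 is just an illustrative value at or above 2.6.30):

---snip---
sysctl compat.linux.osrelease            # what is currently reported
sysctl compat.linux.osrelease=2.6.32     # new enough for AT_RANDOM
---snip---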


Bye,
Alexander.


I have bisected and it happened after this commit:

commit 40c36c4674eb9602709cf9d0483a4f34ad9753f6
Author: Dmitry Chagin 
Date:   Sat Apr 22 22:17:17 2023 +0300

    linux(4): Export the AT_RANDOM depending on the process osreldata

    AT_RANDOM has appeared in the 2.6.30 Linux kernel first time.





Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Alexander Leidinger
Quoting Mark Millard  (from Wed, 12 Apr 2023  
22:28:13 -0700):



A fair number of errors are of the form: the build
installing a previously built package for use in the
builder but later the builder can not find some file
from the package's installation.


As a data point, last year I had such issues with one particular
package. It was consistent no matter how often I updated the ports
tree. Poudriere always failed on port X, which was depending on port Y
(I don't remember the names). The problem was that port Y was built
successfully, but an extract of it did not have a file it was supposed
to have. IIRC I fixed the issue by building port Y manually, as
re-building port Y with poudriere didn't change the outcome.


So it seems this may not be specific to the most recent ZFS version,
but could be an older issue. It may be the case that the more recent
ZFS version amplifies the problem. It could also be that it is related
to a specific use case in poudriere.


I remember a recent mail which talks about poudriere failing to copy
files in resource-limited environments, see
https://lists.freebsd.org/archives/dev-commits-src-all/2023-April/025153.html
While the issue you are trying to pin-point may not be related to that
discussion, I mention it because it smells to me like a similar
combination of mutually unrelated FreeBSD features could be what
triggers the issue at hand.


Bye,
Alexander.



Re: py-libzfs build failure on current, zpool_search_import() missing

2023-02-08 Thread Alexander Leidinger
Quoting Ryan Moeller  (from Fri, 3 Feb 2023  
10:48:35 -0500):



The build still fails on -current as of end of January with "too few
arguments to function call, expected 4, have 3" for zfs_iter_filesystems.

Is a patch for openzfs in -current missing? I haven't seen a commit to
-current in openzfs in the last 2 days.


The openzfs changes aren't that recent, but the py-libzfs port has
been out of date for a while. I'll spin up a new snapshot VM and fix
whatever is still broken.


I can confirm that the 20230207 version of py-libzfs builds (and  
works) on -current. Thanks!


Bye,
Alexander.



Re: py-libzfs build failure on current, zpool_search_import() missing

2023-02-02 Thread Alexander Leidinger
Quoting Ryan Moeller  (from Thu, 2 Feb 2023  
10:43:53 -0500):



I've updated the py-libzfs port to fix the build.


The build still fails on -current as of end of January with "too few
arguments to function call, expected 4, have 3" for zfs_iter_filesystems.


Is a patch for openzfs in -current missing? I haven't seen a commit to  
-current in openzfs in the last 2 days.
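
For reference, a sketch of what the change looks like from the caller's side. The 4-argument shape is inferred from the compiler error above, not checked against the installed header, so treat it as an assumption:

---snip---
#include <libzfs.h>

/* hypothetical callback, only to make the call shapes concrete */
static int
cb(zfs_handle_t *zhp, void *data)
{
	return (0);
}

/* old (3-argument) call that no longer compiles:
 *	zfs_iter_filesystems(zhp, cb, data);
 * new shape implied by "expected 4, have 3" (an added flags word):
 *	zfs_iter_filesystems(zhp, 0, cb, data);
 */
---snip---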


Bye,
Alexander.



Re: py-libzfs build failure on current, zpool_search_import() missing

2023-02-02 Thread Alexander Leidinger
Quoting Alan Somers  (from Thu, 2 Feb 2023  
06:58:35 -0700):



Unfortunately libzfs doesn't have a stable API, so this kind of
breakage is to be expected.  libzfs_core does, but libzfs_core is
incomplete.  You should report this problem upstream at
https://github.com/truenas/py-libzfs .


I did already.

https://github.com/truenas/py-libzfs/issues/224

There is no libzfs_core.h in /usr/include; could it be that we need to  
install it there?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




py-libzfs build failure on current, zpool_search_import() missing

2023-02-02 Thread Alexander Leidinger

Hi,

the build of py-libzfs fails on -current due to a missing  
zpool_search_import(), and as such iocage cannot be built (and the  
old iocage segfaults, so the ABI seems to have changed too). The  
symbol is available in libzutil, but I cannot find  
zpool_search_import() in /usr/include.


Does anyone have an idea whether something is missing (maybe something  
that needs to be installed into /usr/include), or what needs to be done  
to py-libzfs?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: RFC: nfsd in a vnet jail

2022-12-01 Thread Alexander Leidinger


Quoting Alan Somers  (from Tue, 29 Nov 2022  
17:28:10 -0700):



On Tue, Nov 29, 2022 at 5:21 PM Rick Macklem  wrote:



So, what do others think of enforcing the requirement that each jail
have its own file systems for this?


I think that's a totally reasonable requirement.  Especially so for
ZFS users, who already create a filesystem per jail for other reasons.


While I agree that it is a reasonable requirement, just a note that we  
cannot assume that every existing jail resides on its own file  
system. The base system jail infrastructure doesn't check this, and  
the ezjail port doesn't either. The iocage port does.


Is there a way to detect this inside a jail and error out in nfsd/mountd?
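
From the host side a rough check is easy: a jail has its own file  
system iff its root directory is a mount point. A sketch (the path is  
hypothetical); doing the equivalent check from inside the jail is  
exactly the open question:
---snip---
jailroot=/usr/jails/myjail   # hypothetical path
if [ "$(df -P "$jailroot" | awk 'NR==2 {print $6}')" = "$jailroot" ]; then
        echo "jail root is its own file system"
else
        echo "jail root shares a file system with something else"
fi
---snip---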

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: ULE realtime scheduler advice needed

2022-11-18 Thread Alexander Leidinger
Quoting Hans Petter Selasky  (from Fri, 18 Nov 2022  
05:47:58 +0100):



Hi,

I'm doing some work with audio and have noticed some problems with  
the ULE scheduler. I have a program that generates audio based on  
key-presses. When no keys are pressed, the load is near 0%, but as  
soon as you start pressing keys, the load goes to maybe 80% of a CPU  
core. This program I run with rtprio 8 xxx. The issue I observe, or  
rather hear, is that it takes too long until the scheduler grasps  
that this program needs its own CPU core and stops time-sharing the  
program. When I however use cpuset -l xxx rtprio 8 yyy, everything is  
good, and the program outputs realtime audio in time.


I have something in the back of my mind about ULE not handling  
idleprio and/or rtprio correctly, but I have no pointer to a  
validation of this.



Or is this perhaps a CPU frequency stepping issue?


You could play with
rc.conf (/etc/rc.d/power_profile):
performance_cpu_freq="HIGH"
performance_cx_lowest="C3"   # see sysctl dev.cpu.0 | grep cx
economy_cx_lowest="C3"   # see sysctl dev.cpu.0 | grep cx

Your system may provide other Cx possibilities; going to a lower  
number (e.g. C1) means less power-saving but a faster response from  
the CPU (I do not expect that this is causing the issue you have).



Any advice on where to look?


Potential sysctl to play with to change "interactivity detection" in ULE:
https://www.mail-archive.com/freebsd-stable@freebsd.org/msg112118.html
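
For reference, the knobs in question live under kern.sched (a sketch;  
the value shown is illustrative, not a recommendation):
---snip---
# inspect the current ULE interactivity/slice settings
sysctl kern.sched.interact kern.sched.slice
# example: lower the interactivity score threshold
sysctl kern.sched.interact=10
---snip---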

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

2022-11-12 Thread Alexander Leidinger

Quoting Warner Losh  (from Wed, 9 Nov 2022 08:54:33 -0700):


On Wed, Nov 9, 2022 at 5:46 AM Alexander Leidinger 
wrote:



While most of these options look OK on the surface, I'd feel a lot better
if there were tests for these to prove they work. I'd also feel better if
the ZFS experts could explain how those come to be set on a zpool
as well. I'd settle for a good script that could be run as root (better


It is explained in the zpool-features man page.


would be not as root) that would take a filesystem that was created
by makefs -t zfs and turn on these features after an zpool upgrade.


Script attached. Maybe a little bit too verbose, but you can directly  
see which features are active, and which ones are only enabled.


It expects a zroot.img in the current directory and creates copies  
named zroot_num_featurename.img in which it enables the features. At  
the beginning there are some variables to adapt the pool/image name  
and the destination directory.


Bye,
Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


zpool_features.sh
Description: Bourne shell script




Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

2022-11-09 Thread Alexander Leidinger


Quoting "Patrick M. Hausen"  (from Wed, 9 Nov 2022  
22:47:28 +0100):



Hi,


Am 09.11.2022 um 22:38 schrieb Patrick M. Hausen :
Am 09.11.2022 um 22:26 schrieb Alexander Leidinger  
:
On a quick look I haven't found a place where a compatibility  
setting is used for the rpool during creation, so I can't  
point out what the exact difference is.
Given that empty_bpobj is not in the list of the boot code, they  
can't be the same; some limit on enabled features has to be in  
place during the initial install, and your example has to differ.


That feature was imported into FreeBSD in 2012 so it should be
enabled in every pool created since then.


I apologize, should have included that in the last mail.
This is a current FreeBSD 13.1-p2 hosting system we run.

Boots quite fine ;-)


There are several features in this list which are not in the list in  
zfsimpl.c. So that list is not the full truth...


Bye,
Alexander.


---
[ry93@pdn006 ~]$ zpool get all zroot|grep feature
zroot  feature@async_destroy           enabled  local
zroot  feature@empty_bpobj             active   local
zroot  feature@lz4_compress            active   local
zroot  feature@multi_vdev_crash_dump   enabled  local
zroot  feature@spacemap_histogram      active   local
zroot  feature@enabled_txg             active   local
zroot  feature@hole_birth              active   local
zroot  feature@extensible_dataset      active   local
zroot  feature@embedded_data           active   local
zroot  feature@bookmarks               enabled  local
zroot  feature@filesystem_limits       enabled  local
zroot  feature@large_blocks            enabled  local
zroot  feature@large_dnode             enabled  local
zroot  feature@sha512                  enabled  local
zroot  feature@skein                   enabled  local
zroot  feature@userobj_accounting      active   local
zroot  feature@encryption              enabled  local
zroot  feature@project_quota           active   local
zroot  feature@device_removal          enabled  local
zroot  feature@obsolete_counts         enabled  local
zroot  feature@zpool_checkpoint        enabled  local
zroot  feature@spacemap_v2             active   local
zroot  feature@allocation_classes      enabled  local
zroot  feature@resilver_defer          enabled  local
zroot  feature@bookmark_v2             enabled  local
zroot  feature@redaction_bookmarks     enabled  local
zroot  feature@redacted_datasets       enabled  local
zroot  feature@bookmark_written        enabled  local
zroot  feature@log_spacemap            active   local
zroot  feature@livelist                active   local
zroot  feature@device_rebuild          enabled  local
zroot  feature@zstd_compress           enabled  local
zroot  feature@draid                   enabled  local
---



--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

2022-11-09 Thread Alexander Leidinger
Quoting Brooks Davis  (from Wed, 9 Nov 2022  
21:18:41 +):



On Wed, Nov 09, 2022 at 09:19:47PM +0100, Alexander Leidinger wrote:

Quoting Mark Millard  (from Wed, 9 Nov 2022
12:10:18 -0800):

> On Nov 9, 2022, at 11:58, Alexander Leidinger
>  wrote:
>
>> Quoting "Patrick M. Hausen"  (from Wed, 9 Nov 2022
>> 20:49:37 +0100):
>>
>>> Hi,
>>>
>>>> Am 09.11.2022 um 20:45 schrieb Alexander Leidinger
>>>> :
>>>> But "zpool set feature@edonr=enabled rpool" (or any other feature
>>>> not in the list we talk about) would render it unbootable.
>>>
>>> Sorry, just to be sure. So an active change of e.g. checksum or
>>> compression algorithm
>>> might render the system unbootable but a zpool upgrade never will?
>>> At least not intentionally? ;-)
>>
>> If you mean "zpool upgrade", then no (modulo bugs). OpenZFS uses
>> the feature flags instead of zpool upgrade.
>
> I'm confused by that answer:

See my correction in another mail, the behavior seems to have changed
and yes, doing a zpool upgrade on a boot pool should not be done.

Maybe someone wants to check or add provisions to not do that on a
pool which has the bootfs property set.


Literally the entire point of the script added in the commit this thread
is about upgrade the boot pool on first boot so that seems like it would
be counterproductive.


Something is missing here: either some pointer to some safety net for  
pools with the bootfs property set (or a similar "this is a bootable  
pool" flag), or a real-world test of the script.


Any brave soul around to spin up a test-VM and perform a "echo before;  
zpool get all rpool | grep feature; zpool upgrade rpool; echo after;  
zpool get all rpool | grep feature" inside?
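
Spelled out, for copy & paste (in a throwaway VM only, as said):
---snip---
echo before; zpool get all rpool | grep feature
zpool upgrade rpool
echo after;  zpool get all rpool | grep feature
---snip---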


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

2022-11-09 Thread Alexander Leidinger
Quoting "Patrick M. Hausen"  (from Wed, 9 Nov 2022  
22:11:29 +0100):



Hi,

Am 09.11.2022 um 22:05 schrieb Alexander Leidinger  
:
Attention, "upgrade" is overloaded here. "OS upgrade" will not  
render the pool unbootable (modulo bugs), but "zpool upgrade rpool"  
will (except we have provisions that zpool upgrade doesn't enable  
all features in case the bootfs property is set).


And we are back at the start. The "problem" is that I really like  
consistency.

So when "zpool status" throws that ominous message at me - any you have
to admit that it is phrased like a warning - I want simply to get  
rid of that.

After a reasonable after-update grace period.

But during our discussion I have come to wonder:

- I upgrade from 13.0 to 13.1, I do a "zpool upgrade" afterwards, I  
also upgrade the boot loader


- I install 13.1 with ZFS

What is the difference? Shouldn't these two imaginary systems be  
absolutely the same in terms

of ZFS features, boot loader, and all that?


On a quick look I haven't found a place where a compatibility setting  
is used for the rpool during creation, so I can't point out what the  
exact difference is.
Given that empty_bpobj is not in the list of the boot code, they can't  
be the same; some limit on enabled features has to be in place during  
the initial install, and your example has to differ.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

2022-11-09 Thread Alexander Leidinger

 Quoting Warner Losh  (from Wed, 9 Nov 2022 13:53:59 -0700):


 

   On Wed, Nov 9, 2022 at 12:47 PM Alexander Leidinger  
 wrote:



Quoting Warner Losh  (from Wed, 9 Nov 2022 08:54:33 -0700):


as well. I'd settle for a good script that could be run as root (better
would be not as root) that would take a filesystem that was created
by makefs -t zfs and turn on these features after an zpool upgrade.
I have the vague outlines of a test suite for the boot loader that I
could see about integrating something like that into, but most of my
time these days is chasing after 'the last bug' in some kboot stuff I'm
working on (which includes issues with our ZFS in the boot loader
integration).


How would you test a given image? bhyve/qemu/...?


 
I have a script that creates a number of image files and a number of  
qemu scripts that look like the following:

/home/imp/git/qemu/00-build/qemu-system-aarch64 -nographic \
        -machine virt,gic-version=3 -m 512M -smp 4 \
        -cpu cortex-a57 \
        -drive file=/home/imp/stand-test-root/images/arm64-aarch64/linuxboot-arm64-aarch64-zfs.img,if=none,id=drive0,cache=writeback \
        -device virtio-blk,drive=drive0,bootindex=0 \
        -drive file=/home/imp/stand-test-root/bios/edk2-arm64-aarch64-code.fd,format=raw,if=pflash \
        -drive file=/home/imp/stand-test-root/bios/edk2-arm64-aarch64-vars.fd,format=raw,if=pflash \
        -monitor telnet::,server,nowait \
        -serial stdio $*

There's a list of these files that's generated, and the test looks to  
see if it gets to the 'success' echo in the minimal root I have for them.

 


So a little script which makes a copy of a source image, enables  
features on the copies and spits out a list of image files would suit  
your needs?

e.g. (sketch; the pool import/export is done via a md(4)-backed device):
for feature in A B C; do
	# ignoring inter-feature dependencies for a moment
	cp "${source_image}" "zfs_feature_${feature}.img"
	md=$(mdconfig -a -t vnode -f "zfs_feature_${feature}.img")
	# determine the pool name contained in the image
	pool_name=$(zpool import -d "/dev/${md}" | awk '/pool:/ {print $2}')
	zpool import -N -d "/dev/${md}" "${pool_name}"
	zpool set "feature@${feature}=enabled" "${pool_name}"
	zpool export "${pool_name}"
	mdconfig -d -u "${md}"
	echo "zfs_feature_${feature}.img"
done

Bye,
Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

2022-11-09 Thread Alexander Leidinger

Quoting Warner Losh  (from Wed, 9 Nov 2022 13:56:43 -0700):


On Wed, Nov 9, 2022 at 1:54 PM Patrick M. Hausen  wrote:


Hi Warner,

> Am 09.11.2022 um 21:51 schrieb Warner Losh :
> Yes. For safety, boot loader upgrade is mandatory when you do a zpool
upgrade of the root filesystem.
> It was definitely needed in the OpenZFS jump, and we've had one or two
other flag days since.

That's a given and not a problem. What I fear from my understanding of
this thread so far is
that there might be a situation when I upgrade the zpool and the boot
loader and the system
ends up unbootable nonetheless.

Possible or not?



If all you do is upgrade, then no, modulo bugs that we've thankfully not
had yet. It's when you enable something on the zpool that you can run into
trouble, but that's true independent of upgrade :)


Attention, "upgrade" is overloaded here. "OS upgrade" will not render  
the pool unbootable (modulo bugs), but "zpool upgrade rpool" will  
(except we have provisions that zpool upgrade doesn't enable all  
features in case the bootfs property is set).


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

2022-11-09 Thread Alexander Leidinger
Quoting "Patrick M. Hausen"  (from Wed, 9 Nov 2022  
21:19:23 +0100):



Hi,

Am 09.11.2022 um 21:15 schrieb Alexander Leidinger  
:
Quoting "Patrick M. Hausen"  (from Wed, 9 Nov 2022  
21:02:52 +0100):

Yet, I made it a habit to whenever I see this message:

---
status: Some supported features are not enabled on the pool. The pool can
still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
---

to do a "zpool upgrade" after some time of burn in followed by an  
update of the

boot loader.

I desire to know if that is in fact dangerous.


Ugh. This changed. It is indeed dangerous now. I just tested it  
with a non-root pool which didn't have all flags enabled: "zpool  
upgrade" will now enable all features.


I know. But until now I assumed that features *enabled* but not  
*used* were not impeding booting.

And that for all others the boot loader was supposed to keep track.


Some features are used directly when enabled. Some features go back to  
the enabled state when certain conditions are met. Some features are  
not reversible without re-creating the pool (e.g. device_removal). The  
zpool-features man page explains which features belong to which  
category.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

2022-11-09 Thread Alexander Leidinger
Quoting Mark Millard  (from Wed, 9 Nov 2022  
12:10:18 -0800):


On Nov 9, 2022, at 11:58, Alexander Leidinger  
 wrote:


Quoting "Patrick M. Hausen"  (from Wed, 9 Nov 2022  
20:49:37 +0100):



Hi,

Am 09.11.2022 um 20:45 schrieb Alexander Leidinger  
:
But "zpool set feature@edonr=enabled rpool" (or any other feature  
not in the list we talk about) would render it unbootable.


Sorry, just to be sure. So an active change of e.g. checksum or  
compression algorithm
might render the system unbootable but a zpool upgrade never will?  
At least not intentionally? ;-)


If you mean "zpool upgrade", then no (modulo bugs). OpenZFS uses  
the feature flags instead of zpool upgrade.


I'm confused by that answer:


See my correction in another mail, the behavior seems to have changed  
and yes, doing a zpool upgrade on a boot pool should not be done.


Maybe someone wants to check or add provisions to not do that on a  
pool which has the bootfs property set.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

2022-11-09 Thread Alexander Leidinger
Quoting "Patrick M. Hausen"  (from Wed, 9 Nov 2022  
21:02:52 +0100):



Hi,

Am 09.11.2022 um 20:58 schrieb Alexander Leidinger  
:


Quoting "Patrick M. Hausen"  (from Wed, 9 Nov 2022  
20:49:37 +0100):



Hi,

Am 09.11.2022 um 20:45 schrieb Alexander Leidinger  
:
But "zpool set feature@edonr=enabled rpool" (or any other feature  
not in the list we talk about) would render it unbootable.


Sorry, just to be sure. So an active change of e.g. checksum or  
compression algorithm
might render the system unbootable but a zpool upgrade never will?  
At least not intentionally? ;-)


If you mean "zpool upgrade", then no (modulo bugs). OpenZFS uses  
the feature flags instead of zpool upgrade.


I know about feature flags and all my pools are recent enough to have them.

Yet, I made it a habit to whenever I see this message:

---
status: Some supported features are not enabled on the pool. The pool can
still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
---

to do a "zpool upgrade" after some time of burn in followed by an  
update of the

boot loader.

I desire to know if that is in fact dangerous.


Ugh. This changed. It is indeed dangerous now. I just tested it with a  
non-root pool which didn't have all flags enabled: "zpool upgrade"  
will now enable all features.


I remember that it wasn't the case in the past and I had to enable the  
feature flags by hand. I don't know if a pool with bootfs set behaves  
differently, but I consider testing this with a real rpool to be  
dangerous.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

2022-11-09 Thread Alexander Leidinger
Quoting "Patrick M. Hausen"  (from Wed, 9 Nov 2022  
20:49:37 +0100):



Hi,

Am 09.11.2022 um 20:45 schrieb Alexander Leidinger  
:
But "zpool set feature@edonr=enabled rpool" (or any other feature  
not in the list we talk about) would render it unbootable.


Sorry, just to be sure. So an active change of e.g. checksum or  
compression algorithm
might render the system unbootable but a zpool upgrade never will?  
At least not intentionally? ;-)


If you mean "zpool upgrade", then no (modulo bugs). OpenZFS uses the  
feature flags instead of zpool upgrade.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

2022-11-09 Thread Alexander Leidinger

Quoting Warner Losh  (from Wed, 9 Nov 2022 08:54:33 -0700):


as well. I'd settle for a good script that could be run as root (better
would be not as root) that would take a filesystem that was created
by makefs -t zfs and turn on these features after an zpool upgrade.
I have the vague outlines of a test suite for the boot loader that I
could see about integrating something like that into, but most of my
time these days is chasing after 'the last bug' in some kboot stuff I'm
working on (which includes issues with our ZFS in the boot loader
integration).


How would you test a given image? bhyve/qemu/...?

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

2022-11-09 Thread Alexander Leidinger
Quoting "Patrick M. Hausen"  (from Wed, 9 Nov 2022  
20:02:55 +0100):



Hi all,


Am 09.11.2022 um 16:54 schrieb Warner Losh :
>>There is a fixed list of features we support in the boot loader:
>>[...]
>>Any feature not on this list will cause the boot loader to
>> reject the pool.


I admit that I do not grasp the full implications of this thread and  
the proposed

and debated changes. Does that imply that a simple "zpool upgrade" of the
boot/root pool might lead to an unbootable system in the future - even if the
boot loader is upgraded as it should, too?


For a recent pool (zpool get all rpool | grep -q feature && echo  
recent enough): no.


But "zpool set feature@edonr=enabled rpool" (or any other feature not  
in the list we talk about) would render it unbootable.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

2022-11-09 Thread Alexander Leidinger

Quoting Warner Losh  (from Wed, 9 Nov 2022 08:54:33 -0700):


On Wed, Nov 9, 2022 at 5:46 AM Alexander Leidinger 
wrote:


Quoting Alexander Leidinger  (from Tue, 08
Nov 2022 10:50:53 +0100):



> Should the above list be sorted in some way? Maybe in the same order
> as the zpool-features lists them (sort by feature name after the
> colon), or alphabetical?

Is it OK if I commit this alphabetical sorting?


[diff of feature-sorting]



This patch looks good because it's a nop and just tidies things up a bit.

Reviewed by: imp


Will do later.


> As Mark already mentioned some flags, I checked the features marked
> as read only (I checked in the zpool-features man page, including
> the dependencies documented there) and here are those not listed in
> zfsimpl.c. I would assume as they are read-only compatible, we
> should add them:
> com.delphix:async_destroy
> com.delphix:bookmarks
> org.openzfs:device_rebuild
> com.delphix:empty_bpobj
> com.delphix:enable_txg
> com.joyent:filesystem_limits
> com.delphix:livelist
> com.delphix:log_spacemap
> com.zfsonlinux:project_quota
> com.zfsonlinux:userobj_accounting
> com.openzfs:zilsaxattr

If my understanding is correct that the read-only compatible parts
(according to the zpool-features man page) are safe to add to the zfs
boot, here is what I have only build-tested (relative to the above
alphabetical sorting):
---snip---
--- zfsimpl.c_sorted	2022-11-09 12:55:06.346083000 +0100
+++ zfsimpl.c	2022-11-09 13:01:24.083364000 +0100
@@ -121,24 +121,35 @@
 	"com.datto:bookmark_v2",
 	"com.datto:encryption",
 	"com.datto:resilver_defer",
+	"com.delphix:async_destroy",
 	"com.delphix:bookmark_written",
+	"com.delphix:bookmarks",
 	"com.delphix:device_removal",
 	"com.delphix:embedded_data",
+	"com.delphix:empty_bpobj",
+	"com.delphix:enable_txg",
 	"com.delphix:extensible_dataset",
 	"com.delphix:head_errlog",
 	"com.delphix:hole_birth",
+	"com.delphix:livelist",
+	"com.delphix:log_spacemap",
 	"com.delphix:obsolete_counts",
 	"com.delphix:spacemap_histogram",
 	"com.delphix:spacemap_v2",
 	"com.delphix:zpool_checkpoint",
 	"com.intel:allocation_classes",
+	"com.joyent:filesystem_limits",
 	"com.joyent:multi_vdev_crash_dump",
+	"com.openzfs:zilsaxattr",
+	"com.zfsonlinux:project_quota",
+	"com.zfsonlinux:userobj_accounting",
 	"org.freebsd:zstd_compress",
 	"org.illumos:lz4_compress",
 	"org.illumos:sha512",
 	"org.illumos:skein",
 	"org.open-zfs:large_blocks",
 	"org.openzfs:blake3",
+	"org.openzfs:device_rebuild",
 	"org.zfsonlinux:allocation_classes",
 	"org.zfsonlinux:large_dnode",
 	NULL
---snip---

Anyone able to test some of those or confirms my understanding is
correct and would sign-off on a "reviewed by" level?



I'm inclined to strongly NAK this patch, absent some way to test it.
There's no issues today with any of them being absent causing
problems on boot that have been reported. The ZFS that's in the
boot loader is a reduced copy of what's in base and not everything is
supported. There's no urgency here to rush into this. The ones that
are on the list already are for things that we know we support in the
boot loader because we've gone to the trouble to put blake3 or sha512
into it (note: Not all boot loaders will support all ZFS features in the
future... x86 BIOS booting likely is going to have to be frozen at its
current ZFS feature set due to code size issues).

While most of these options look OK on the surface, I'd feel a lot better
if there were tests for these to prove they work. I'd also feel better if
the ZFS experts could explain how those come to be set on a zpool
as well. I'd settle for a good script that could be run as root (better
would be not as root) that would take a filesystem that was created
by makefs -t zfs and turn on these features after an zpool upgrade.
I have the vague outlines of a test suite for the boot loader that I
could see about integrating something like that into, but most of my
time these days is chasing after 'the last bug' in some kboot stuff I'm
working on (which includes issues with our ZFS in the boot loader
integration).

So not a hard no, but I plea for additional scripts to create images
that can be tested.


I didn't want to commit untested or unverified stuff. I fully agree  
with your reasoning.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




changes to the zfs boot (was: Re: git: 72a1cb05cd23 - main - rc(8): Add a zpoolupgrade rc.d script)

2022-11-09 Thread Alexander Leidinger
Quoting Alexander Leidinger  (from Tue, 08  
Nov 2022 10:50:53 +0100):



Quoting Warner Losh  (from Mon, 7 Nov 2022 14:23:11 -0700):


 

  On Mon, Nov 7, 2022 at 4:15 AM Alexander Leidinger  
 wrote:



Quoting Li-Wen Hsu  (from Mon, 7 Nov 2022 03:39:19 GMT):


The branch main has been updated by lwhsu:

URL: 
https://cgit.FreeBSD.org/src/commit/?id=72a1cb05cd230ce0d12a7180ae65ddbba2e0cb6d

commit 72a1cb05cd230ce0d12a7180ae65ddbba2e0cb6d
Author:     Li-Wen Hsu 
AuthorDate: 2022-11-07 03:30:09 +
Commit:     Li-Wen Hsu 
CommitDate: 2022-11-07 03:30:09 +

     rc(8): Add a zpoolupgrade rc.d script

     If a zpool is created by makefs(8), its version is 5000, i.e., all
     feature flags are off.  Introduce an rc script to run `zpool upgrade`
     over the assigned zpools on the first boot.  This is useful to the
     ZFS based VM images built from release(7).



diff --git a/share/man/man5/rc.conf.5 b/share/man/man5/rc.conf.5
index f9ceabc83120..43fa44a5f1cb 100644
--- a/share/man/man5/rc.conf.5
+++ b/share/man/man5/rc.conf.5
@@ -24,7 +24,7 @@
  .\"
  .\" $FreeBSD$
  .\"
-.Dd August 28, 2022
+.Dd November 7, 2022
  .Dt RC.CONF 5
  .Os
  .Sh NAME
@@ -2109,6 +2109,13 @@ A space-separated list of ZFS pool names for 
which new pool GUIDs should be
  assigned upon first boot.
  This is useful when using a ZFS pool copied from a template, such 
as a virtual
  machine image.
+.It Va zpool_upgrade
+.Pq Vt str
+A space-separated list of ZFS pool names for which version should 
be upgraded
+upon first boot.
+This is useful when using a ZFS pool generated by
+.Xr makefs 8
+utility.


For someone who knows ZFS well, it is clear that only a zpool upgrade 
is done. Less experienced people may assume there is a combination 
of zpool upgrade and zfs upgrade (more so people who do not know 
what the difference is). Maybe you want to add some explicit 
documentation that zfs upgrade + feature flags need to be done by 
hand.

And this brings me to a second topic: we don't have an explicit list 
of features which are supported by the bootloader (I had a look at the 
zfs and the boot related man pages; if I overlooked a place, then the 
other places should reference this important part with some text).


    
   There is a fixed list of features we support in the boot loader:
    
   /*
 * List of ZFS features supported for read
 */
static const char *features_for_read[] = {
        "org.illumos:lz4_compress",
        "com.delphix:hole_birth",
        "com.delphix:extensible_dataset",
        "com.delphix:embedded_data",
        "org.open-zfs:large_blocks",
        "org.illumos:sha512",
        "org.illumos:skein",
        "org.zfsonlinux:large_dnode",
        "com.joyent:multi_vdev_crash_dump",
        "com.delphix:spacemap_histogram",
        "com.delphix:zpool_checkpoint",
        "com.delphix:spacemap_v2",
        "com.datto:encryption",
        "com.datto:bookmark_v2",
        "org.zfsonlinux:allocation_classes",
        "com.datto:resilver_defer",
        "com.delphix:device_removal",
        "com.delphix:obsolete_counts",
        "com.intel:allocation_classes",
        "org.freebsd:zstd_compress",
        "com.delphix:bookmark_written",
        "com.delphix:head_errlog",
        "org.openzfs:blake3",
        NULL
};
    
   Any feature not on this list will cause the boot loader to  
reject the pool.

    
Whether or not it should do that by default, always, or never is an open
question. I've thought there should be a 'foot shooting' override that
isn't there today.
    


Thanks for the list. For those interested, it is in
    $SRC/stand/libsa/zfs/zfsimpl.c

Just to make the opinion I expressed before explicit again: this  
should be documented in a boot/bootloader-related man page, but isn't.


Should the above list be sorted in some way? Maybe in the same order  
as the zpool-features lists them (sort by feature name after the  
colon), or alphabetical?


Is it OK if I commit this alphabetical sorting?
---snip---
diff --git a/stand/libsa/zfs/zfsimpl.c b/stand/libsa/zfs/zfsimpl.c
index 6b961f3110a..36c90613e82 100644
--- a/stand/libsa/zfs/zfsimpl.c
+++ b/stand/libsa/zfs/zfsimpl.c
@@ -118,29 +118,29 @@ static vdev_list_t zfs_vdevs;
  * List of ZFS features supported for read
  */
 static const char *features_for_read[] = {
-"org.illumos:lz4_compress",
-"com.delphix:hole_birth",
-"com.delphix:extensible_dataset",
-"com.delphix:embedded_data",
-"org.open-zfs:large_blocks",
-"org.illumos:sha512",
-"org.illumos:skein",
-"org.zfsonlinux:large_dnode",
-"com.j

Re: Did clang 14 lose some intrinsics support?

2022-09-26 Thread Alexander Leidinger
Quoting Dimitry Andric  (from Mon, 26 Sep 2022  
12:03:03 +0200):



Sure, but if you are compiling without -mavx, why would you want the AVX
intrinsics? You cannot use AVX intrinsics anyway, if AVX is not enabled.

So I don't fully understand the problem this configure scripting is
supposed to solve?


Think about a run-time check of available CPU features which then uses  
such code for performance-critical sections only. This allows building  
programs which are generic in the main code paths, yet able to switch  
to high-performance implementations of critical code paths depending  
on the features of the CPU.
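
As a sketch of how such programs are usually built (file names made  
up): only the AVX implementation is compiled with -mavx, while the  
dispatcher and the generic path stay generic:
---snip---
cc -O2 -c dispatch.c generic_impl.c
cc -O2 -mavx -c avx_impl.c    # only this object may use AVX intrinsics
cc -o prog dispatch.o generic_impl.o avx_impl.o
---snip---
At run time the dispatcher checks the CPU (e.g. via CPUID) and only  
calls into avx_impl.o when AVX is actually reported.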


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Good practices with bectl

2022-09-21 Thread Alexander Leidinger
Quoting David Wolfskill  (from Wed, 21 Sep 2022  
03:25:52 -0700):



On Wed, Sep 21, 2022 at 11:27:06AM +0200, Alexander Leidinger wrote:

  ...
make DESTDIR=${BASEDIR} -DBATCH_DELETE_OLD_FILES delete-old delete-old-libs

Usually I replace the delete-old-libs with check-old, as I don't want
to blindly delete them (some ports may depend on them... at least for
the few libs which don't have symbol versioning).



A way to address that issue that may work for you is to install
appropriate misc/compat* ports/packages.


I'm running exclusively on -current. In the cases where this happens,  
there are no compat packages yet. And I'd rather update the ports than  
install a compat package. It doesn't hurt me to keep the libs during  
the pkg rebuild.


In the generic case I prefer to stay safe and keep the libs until I  
have validated that nothing uses them anymore. That's the reason why I  
made the delete-old-libs functionality separate from delete-old  
already in the initial implementation.


Bye,
Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Good practices with bectl

2022-09-21 Thread Alexander Leidinger
Quoting Alan Somers  (from Tue, 20 Sep 2022  
16:19:49 -0600):



sudo bectl activate ${RELEASE}


Failsafe (if the machine is too far away to simply walk over and  
switch to the old BE):

bectl activate -t ${RELEASE}

Needs an activate without -t later.
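
Spelled out, the whole flow would be something like:
---snip---
bectl activate -t ${RELEASE}   # boot the new BE on the next reboot only
shutdown -r now
# after verifying that the new BE works:
bectl activate ${RELEASE}      # make the activation permanent
---snip---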

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Good practices with bectl

2022-09-21 Thread Alexander Leidinger
 Quoting Nuno Teixeira  (from Wed, 21 Sep 2022  
00:11:41 +0100):



(...)
   maybe:
   > yes | make DESTDIR=${BASEDIR} delete-old delete-old-libs


make DESTDIR=${BASEDIR} -DBATCH_DELETE_OLD_FILES delete-old delete-old-libs

Usually I replace the delete-old-libs with check-old, as I don't want  
to blindly delete them (some ports may depend on them... at least for  
the few libs which don't have symbol versioning).


Bye,
Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: domain names and internationalization?

2022-09-19 Thread Alexander Leidinger
Quoting Rick Macklem  (from Mon, 19 Sep 2022  
20:27:29 +):



Hi,

Recently there has been discussion on the NFSv4 IETF working
group email list w.r.t. internationalization for the domain name
it uses for users/groups.

Right now, I am pretty sure the FreeBSD nfsuserd(8) only works
for ascii domain names, but...

I am hoping someone knows what DNS does in this area (the
working group list uses terms like umlaut, which I have never
even heard of;-).


DNS does this:
https://en.wikipedia.org/wiki/Punycode
This page also shows some umlauts (German ones to be precise, e.g.  
"Bücher") as well as examples in other scripts, such as Chinese.

There are libs which do the conversion, e.g.  
https://www.gnu.org/software/libidn/doxygen/index.html

I don't know if there are libs with more preferable licenses.
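
As a quick illustration with the idn(1) tool from libidn (assuming the  
dns/libidn port is installed and a UTF-8 locale):
---snip---
$ idn --idna-to-ascii Bücher
xn--bcher-kva
---snip---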

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: nullfs and ZFS issues

2022-04-26 Thread Alexander Leidinger
Quoting Eirik Øverby  (from Mon, 25 Apr 2022  
18:44:19 +0200):



On Mon, 2022-04-25 at 15:27 +0200, Alexander Leidinger wrote:

Quoting Alexander Leidinger  (from Sun, 24
Apr 2022 19:58:17 +0200):

> Quoting Alexander Leidinger  (from Fri, 22
> Apr 2022 09:04:39 +0200):
>
> > Quoting Doug Ambrisko  (from Thu, 21 Apr
> > 2022 09:38:35 -0700):
>
> > > I've attached mount.patch that when doing mount -v should
> > > show the vnode usage per filesystem.  Note that the problem I was
> > > running into was after some operations arc_prune and arc_evict would
> > > consume 100% of 2 cores and make ZFS really slow.  If you are not
> > > running into that issue then nocache etc. shouldn't be needed.
> >
> > I don't run into this issue, but I have a huge perf difference when
> > using nocache in the nightly periodic runs. 4h instead of 12-24h
> > (22 jails on this system).
> >
> > > On my laptop I set ARC to 1G since I don't use swap and in the past
> > > ARC would consume to much memory and things would die.  When the
> > > nullfs holds a bunch of vnodes then ZFS couldn't release them.
> > >
> > > FYI, on my laptop with nocache and limited vnodes I haven't run
> > > into this problem.  I haven't tried the patch to let ZFS free
> > > it's and nullfs vnodes on my laptop.  I have only tried it via
> >
> > I have this patch and your mount patch installed now, without
> > nocache and reduced arc reclaim settings (100, 1). I will check the
> > runtime for the next 2 days.
>
> 9-10h runtime with the above settings (compared to 4h with nocache
> and 12-24h without any patch and without nocache).
> I changed the sysctls back to the defaults and will see in the next
> run (in 7h) what the result is with just the patches.

And again 9-10h runtime (I've seen a lot of the find processes in the
periodic daily run of those 22 jails in the state "*vnode"). Seems
nocache gives the best perf for me in this case.


Sorry for jumping in here - I've got a couple of questions:
- Will this also apply to nullfs read-only mounts? Or is it only in
case of writing "through" a nullfs mount that these problems are seen?
- Is it a problem also in 13, or is this "new" in -CURRENT?

We're having weird and unexplained CPU spikes on several systems, even
after tuning geli to not use gazillions of threads. So far our
suspicion has been ZFS snapshot cleanups but this is an interesting
contender - unless the whole "read only" part makes it moot.


For me this started after creating one more jail on this system, and I  
don't see CPU spikes (the system is running permanently at 100% and  
the distribution of the CPU time looks as I would expect it). Doug's  
experience is a little bit different, as he sees a high amount of CPU  
usage "for nothing" or even a deadlock-like situation. So I would say  
we see different things based on similar triggers.


The nocache option for nullfs affects the number of vnodes in use on  
the system no matter whether the mount is ro or rw, so you can give it  
a try. Note that depending on the usage pattern, the nocache option  
may increase lock contention, so it may have a positive or a negative  
performance impact (or none at all).
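
For reference, the option is simply passed at mount time, e.g. (paths  
are placeholders):
---snip---
mount -t nullfs -o nocache /usr/local /path/to/jail/usr/local
---snip---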


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: nullfs and ZFS issues

2022-04-25 Thread Alexander Leidinger
Quoting Alexander Leidinger  (from Sun, 24  
Apr 2022 19:58:17 +0200):


Quoting Alexander Leidinger  (from Fri, 22  
Apr 2022 09:04:39 +0200):


Quoting Doug Ambrisko  (from Thu, 21 Apr  
2022 09:38:35 -0700):



I've attached mount.patch that when doing mount -v should
show the vnode usage per filesystem.  Note that the problem I was
running into was after some operations arc_prune and arc_evict would
consume 100% of 2 cores and make ZFS really slow.  If you are not
running into that issue then nocache etc. shouldn't be needed.


I don't run into this issue, but I have a huge perf difference when  
using nocache in the nightly periodic runs. 4h instead of 12-24h  
(22 jails on this system).



On my laptop I set ARC to 1G since I don't use swap and in the past
ARC would consume to much memory and things would die.  When the
nullfs holds a bunch of vnodes then ZFS couldn't release them.

FYI, on my laptop with nocache and limited vnodes I haven't run
into this problem.  I haven't tried the patch to let ZFS free
it's and nullfs vnodes on my laptop.  I have only tried it via


I have this patch and your mount patch installed now, without  
nocache and reduced arc reclaim settings (100, 1). I will check the  
runtime for the next 2 days.


9-10h runtime with the above settings (compared to 4h with nocache  
and 12-24h without any patch and without nocache).
I changed the sysctls back to the defaults and will see in the next  
run (in 7h) what the result is with just the patches.


And again 9-10h runtime (I've seen a lot of the find processes in the  
periodic daily run of those 22 jails in the state "*vnode"). Seems  
nocache gives the best perf for me in this case.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: nullfs and ZFS issues

2022-04-24 Thread Alexander Leidinger
Quoting Alexander Leidinger  (from Fri, 22  
Apr 2022 09:04:39 +0200):


Quoting Doug Ambrisko  (from Thu, 21 Apr 2022  
09:38:35 -0700):



I've attached mount.patch that when doing mount -v should
show the vnode usage per filesystem.  Note that the problem I was
running into was after some operations arc_prune and arc_evict would
consume 100% of 2 cores and make ZFS really slow.  If you are not
running into that issue then nocache etc. shouldn't be needed.


I don't run into this issue, but I have a huge perf difference when  
using nocache in the nightly periodic runs. 4h instead of 12-24h (22  
jails on this system).



On my laptop I set ARC to 1G since I don't use swap and in the past
ARC would consume to much memory and things would die.  When the
nullfs holds a bunch of vnodes then ZFS couldn't release them.

FYI, on my laptop with nocache and limited vnodes I haven't run
into this problem.  I haven't tried the patch to let ZFS free
it's and nullfs vnodes on my laptop.  I have only tried it via


I have this patch and your mount patch installed now, without  
nocache and reduced arc reclaim settings (100, 1). I will check the  
runtime for the next 2 days.


9-10h runtime with the above settings (compared to 4h with nocache and  
12-24h without any patch and without nocache).
I changed the sysctls back to the defaults and will see in the next  
run (in 7h) what the result is with just the patches.


Bye,
Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: nullfs and ZFS issues

2022-04-22 Thread Alexander Leidinger
Quoting Doug Ambrisko  (from Thu, 21 Apr 2022  
09:38:35 -0700):



On Thu, Apr 21, 2022 at 03:44:02PM +0200, Alexander Leidinger wrote:
| Quoting Mateusz Guzik  (from Thu, 21 Apr 2022
| 14:50:42 +0200):
|
| > On 4/21/22, Alexander Leidinger  wrote:
| >> I tried nocache on a system with a lot of jails which use nullfs,
| >> which showed very slow behavior in the daily periodic runs (12h runs
| >> in the night after boot, 24h or more in subsequent nights). Now the
| >> first nightly run after boot was finished after 4h.
| >>
| >> What is the benefit of not disabling the cache in nullfs? I would
| >> expect zfs (or ufs) to cache the (meta)data anyway.
| >>
| >
| > does the poor performance show up with
| > https://people.freebsd.org/~mjg/vnlru_free_pick.diff ?
|
| I would like to have all the 22 jails run the periodic scripts a
| second night in a row before trying this.
|
| > if the long runs are still there, can you get some profiling from it?
| > sysctl -a before and after would be a start.
| >
| > My guess is that you are at the vnode limit and bumping into the 1 second sleep.
second sleep.

|
| That would explain the behavior I see since I added the last jail
| which seems to have crossed a threshold which triggers the slow
| behavior.
|
| Current status (with the 112 nullfs mounts with nocache):
| kern.maxvnodes: 10485760
| kern.numvnodes: 3791064
| kern.freevnodes: 3613694
| kern.cache.stats.heldvnodes: 151707
| kern.vnodes_created: 260288639
|
| The maxvnodes value is already increased by 10 times compared to the
| default value on this system.

I've attached mount.patch that when doing mount -v should
show the vnode usage per filesystem.  Note that the problem I was
running into was after some operations arc_prune and arc_evict would
consume 100% of 2 cores and make ZFS really slow.  If you are not
running into that issue then nocache etc. shouldn't be needed.


I don't run into this issue, but I have a huge perf difference when  
using nocache in the nightly periodic runs. 4h instead of 12-24h (22  
jails on this system).



On my laptop I set ARC to 1G since I don't use swap and in the past
ARC would consume to much memory and things would die.  When the
nullfs holds a bunch of vnodes then ZFS couldn't release them.

FYI, on my laptop with nocache and limited vnodes I haven't run
into this problem.  I haven't tried the patch to let ZFS free
it's and nullfs vnodes on my laptop.  I have only tried it via


I have this patch and your mount patch installed now, without nocache  
and reduced arc reclaim settings (100, 1). I will check the runtime  
for the next 2 days.


Your mount patch to show the per-mount vnode count looks useful, not  
only for this particular case. Do you intend to commit it?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: nullfs and ZFS issues

2022-04-21 Thread Alexander Leidinger
Quoting Mateusz Guzik  (from Thu, 21 Apr 2022  
14:50:42 +0200):



On 4/21/22, Alexander Leidinger  wrote:

I tried nocache on a system with a lot of jails which use nullfs,
which showed very slow behavior in the daily periodic runs (12h runs
in the night after boot, 24h or more in subsequent nights). Now the
first nightly run after boot was finished after 4h.

What is the benefit of not disabling the cache in nullfs? I would
expect zfs (or ufs) to cache the (meta)data anyway.



does the poor performance show up with
https://people.freebsd.org/~mjg/vnlru_free_pick.diff ?


I would like to have all the 22 jails run the periodic scripts a  
second night in a row before trying this.



if the long runs are still there, can you get some profiling from it?
sysctl -a before and after would be a start.

My guess is that you are at the vnode limit and bumping into the 1 second sleep.


That would explain the behavior I see since I added the last jail  
which seems to have crossed a threshold which triggers the slow  
behavior.


Current status (with the 112 nullfs mounts with nocache):
kern.maxvnodes: 10485760
kern.numvnodes: 3791064
kern.freevnodes: 3613694
kern.cache.stats.heldvnodes: 151707
kern.vnodes_created: 260288639

The maxvnodes value is already increased by 10 times compared to the  
default value on this system.
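
For the record, the bump itself is the usual procedure (the value is  
what this machine uses, not a general recommendation):
---snip---
sysctl kern.maxvnodes=10485760
echo 'kern.maxvnodes=10485760' >> /etc/sysctl.conf  # persist across reboots
---snip---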


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: nullfs and ZFS issues

2022-04-21 Thread Alexander Leidinger
Quoting Doug Ambrisko  (from Wed, 20 Apr 2022  
09:20:33 -0700):



On Wed, Apr 20, 2022 at 11:39:44AM +0200, Alexander Leidinger wrote:
| Quoting Doug Ambrisko  (from Mon, 18 Apr 2022
| 16:32:38 -0700):
|
| > With nullfs, nocache and settings max vnodes to a low number I can
|
| Where is nocache documented? I don't see it in mount_nullfs(8),
| mount(8) or nullfs(5).

I didn't find it but it is in:
	src/sys/fs/nullfs/null_vfsops.c:  if (vfs_getopt(mp->mnt_optnew,  
"nocache", NULL, NULL) == 0 ||


Also some file systems disable it via MNTK_NULL_NOCACHE


Does the attached diff look ok?


| I tried a nullfs mount with nocache and it doesn't show up in the
| output of "mount".

Yep, I saw that as well.  I could tell by dropping into ddb and then
do a show mount on the FS and look at the count.  That is why I added
the vnode count to mount -v so I could see the usage without dropping
into ddb.


I tried nocache on a system with a lot of jails which use nullfs,  
which showed very slow behavior in the daily periodic runs (12h runs  
in the night after boot, 24h or more in subsequent nights). Now the  
first nightly run after boot was finished after 4h.


What is the benefit of not disabling the cache in nullfs? I would  
expect zfs (or ufs) to cache the (meta)data anyway.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF
diff --git a/sbin/mount/mount.8 b/sbin/mount/mount.8
index 2a877c04c07..823df63953d 100644
--- a/sbin/mount/mount.8
+++ b/sbin/mount/mount.8
@@ -28,7 +28,7 @@
 .\" @(#)mount.8	8.8 (Berkeley) 6/16/94
 .\" $FreeBSD$
 .\"
-.Dd March 17, 2022
+.Dd April 21, 2022
 .Dt MOUNT 8
 .Os
 .Sh NAME
@@ -245,6 +245,9 @@ This file system should be skipped when
 is run with the
 .Fl a
 flag.
+.It Cm nocache
+Disable caching.
+Some filesystems may not support this.
 .It Cm noclusterr
 Disable read clustering.
 .It Cm noclusterw




Re: nullfs and ZFS issues

2022-04-20 Thread Alexander Leidinger
Quoting Doug Ambrisko  (from Mon, 18 Apr 2022  
16:32:38 -0700):



With nullfs, nocache and settings max vnodes to a low number I can


Where is nocache documented? I don't see it in mount_nullfs(8),  
mount(8) or nullfs(5).


I tried a nullfs mount with nocache and it doesn't show up in the  
output of "mount".


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: main-n254654-d4e8207317c results in "no pools available to import"

2022-04-12 Thread Alexander Leidinger

Quoting Thomas Laus  (from Tue, 12 Apr 2022 11:17:09 +):


On 4/11/22 14:18, Ronald Klop wrote:

On 4/11/22 17:17, Dennis Clarke wrote:


Did the usual git pull origin main and buildworld/buildkernel but  
after installkernel the machine will not boot.


The rev seems to be main-n254654-d4e8207317c.

I can boot single user mode and get a command prompt but nothing past
that. Is there something borked in ZFS in CURRENT ?

Up until now you are the only one with this error on the  
mailinglist today. So I doubt something is borked.
You could consider to share more details about your setup to help  
people to think along with you.


I can confirm this issue.  My last update was  
'main-n253996-1b3af110bc' from March 31, 2022 that worked fine.  My  
update yesterday received the same error and refused to boot past  
looking for kernel modules.  I did receive the "no pools available  
to import" message a couple of lines earlier.  My hardware is a Dell  
Inspiron laptop with a SSD and ZFS filesystem.  I have a little time  
today and plan on git reverting back to March 31 to further isolate  
the problem.


A data point from a system with current as of 2022-04-06 15:23 (not  
sure if it is related or a red herring): the pool imports fine, but  
iocage spits out a lot of "setting up zpool for iocage usage" during  
an "iocage list". And it doesn't auto-start the iocage jails. As I  
only updated the OS and not the ports (besides: no change to iocage in  
ports), it may be the case that some kind of detection logic in the  
zfs code is now misbehaving in some cases...


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: injecting vars into rc-service-scripts at jail-start?

2022-04-01 Thread Alexander Leidinger
Quoting Jens Schweikhardt  (from Fri, 1 Apr  
2022 14:26:27 +0200 (CEST)):



Identifier confusion? You use _rc_svcs and _rc_svcj in your description.


Typo, s/svcs/svcj/ in the explanation.

The diff/code has the variables correct (svcj); the conditional and  
the setting are close to each other and both use "_rc_svcj".


Bye,
Alexander.


--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF




injecting vars into rc-service-scripts at jail-start?

2022-04-01 Thread Alexander Leidinger

Hi,

I'm overlooking something fundamental it seems...

Context:
I'm working on my auto-jailing of services idea: if the auto-jail is  
enabled, a service like syslog is started inside a jail (which  
inherits the FS and, depending on some settings, also inherits the  
network and other stuff or not).

My previous implementation was using _rc_prefix (jailstart) to denote  
the start of a service inside a jail, so that "service XXX start" on  
the host would run "service XXX jailstart" inside a jail. This of  
course had issues, as there is no infrastructure for multiple prefixes  
like onejailstart or jailonestart...


Problem:
Now I try to find a way to do it without a prefix, and the first thing  
which comes to my mind is to do "jail xxx 'exec.start=/usr/bin/env  
_rc_svcs=jailing /usr/bin/service XXX CMD ARGS'".



My expectation is that this would set _rc_svcs=jailing for the command  
"service XXX CMD ARGS". Having a "set -x" in rc.subr clearly shows in  
the jail-console log that inside that jail the variable _rc_svcj is  
not set. Using "-v" for the env command shows in the log that it is  
called, that it sets the var, and that it executes the service command  
with "syslog start" as arguments.


I tried to find some env-cleanup part in rc.subr, which would discard  
all _rc* variables, but if there is something like that I overlooked it.


For a stop, I call "jexec /usr/bin/env _rc_svcj=jailing  
/usr/sbin/service XXX stop args", and it works, so I rather tend to  
believe there is no env-cleanup.


What am I doing wrong so that _rc_svcj is not picked up inside the jail?

So here is my diff between "prefix driven" (= working) and "var  
driven" (var not picked up inside the jail):

---snip---
case "$rc_arg" in
start)
-   if [ "${_rc_prefix}" != jail ]; then
+   if [ "${_rc_svcj}" != jailing ]; then
_return=1
$JAIL_CMD -c  
$_svcj_generic_params $_svcj_cmd_options \
-
exec.start="/usr/sbin/service ${name} jailstart $rc_extra_args" \
-
exec.stop="/usr/sbin/service ${name} jailstop $rc_extra_args" \
+
exec.start="/usr/bin/env _rc_svcj=jailing /usr/sbin/service ${name}  
${rc_arg} $rc_extra_args" \
+
exec.stop="/usr/bin/env _rc_svcj=jailing /usr/sbin/service ${name}  
${rc_arg} $rc_extra_args" \
 
exec.consolelog="/var/log/svcj_${name}_console.log" \
name=svcj-${name}  
&& _return=0

else
# normal start of  
_cmd via _run_rc_doit

---snip---

What "set -x" shows being called:
---snip---
+ /usr/sbin/jail -c 'path=/' mount.nodevfs 'host=inherit' 'ip4=inherit' 'ip6=inherit' allow.reserved_ports 'exec.start=/usr/bin/env -v _rc_svcj=jailing /usr/sbin/service -v syslogd start  ' 'exec.stop=/usr/bin/env _rc_svcj=jailing /usr/sbin/service syslogd start  ' 'exec.consolelog=/var/log/svcj_syslogd_console.log' 'name=svcj-syslogd'
---snip---
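
To check this outside of rc.subr, here is a minimal repro sketch (the jail name, the /tmp path, and the quoting are illustrative and may need adjusting) which tests whether a variable set via /usr/bin/env in exec.start survives into the jailed process:
---snip---
# Create a throwaway jail whose exec.start records the variable's value.
# An empty value in /tmp/svcj_env_test would confirm the variable is lost.
jail -c name=svcj-envtest path=/ host=inherit ip4=inherit ip6=inherit persist \
    'exec.start=/usr/bin/env _rc_svcj=jailing /bin/sh -c "echo _rc_svcj=${_rc_svcj} > /tmp/svcj_env_test"'
cat /tmp/svcj_env_test
jail -r svcj-envtest
---snip---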

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF


pgpqKaeDPTVSH.pgp
Description: Digital PGP signature


Re: What are the in-kernel functions to print human readable timestamps (bintime)?

2022-03-12 Thread Alexander Leidinger

Quoting Warner Losh  (from Fri, 11 Mar 2022 08:57:33 -0700):


On Fri, Mar 11, 2022 at 2:52 AM Alexander Leidinger 
wrote:


Hi,

I'm looking for a function to convert bintime to a human readable
format in the kernel... and what is the usual format we use?



Yes. We don't generally log it.


Would it be acceptable in this particular case (or should I keep this  
change to myself)?


I have to check out the kern.msgbuf_show_timestamp part in this  
thread, which looks interesting. It may fit my needs here.



The use case for this is: if something throws a log from the kernel
about a signal, I want to know when it happened, or in terms of code
see below (tabs are most probably messed up).

Do we have some kind of policy in terms of kernel messages and
timestamps? Like "do not commit logging with timestamps"?



Correct. The kernel doesn't know enough to properly render timestamps,
nor should it. It's a lot more complicated than you'd expect, and the
simple,
naive implementation has too many flaws...


I'm aware that it is complicated; IMO it is too complicated for a full  
implementation that would satisfy all needs.



I have the code below because I needed it at least once and think something like
this (in a human readable shape) would be beneficial to have in the
tree.



I really don't want to see code like that in the tree. Having it per-message
in an ad-hoc manner strikes me as quite unwise, since how do you interpret
it after the fact, especially in the face of adjustments to boottime for any
time adjustments that happen after the event.


Sorry, I was not verbose enough in my mail, it seems. I do _not_ want  
to commit this code as it is. I was looking for something which could  
print a human readable value, like "2022-12-03 14:23:22.45  
'kernel-time'" (just as an example). I don't want to push the TZ into  
the kernel, nor the knowledge of whether the CMOS clock is set to  
localtime or UTC. I want to provide an admin with a way to determine  
when such a message was printed; currently there is no way to know.  
The admin needs to know whether the clock is set to UTC or localtime  
and how to calculate the wall-clock time from the stamp. That is light  
on the implementation side and still provides a means to correlate the  
message with some action on the machine.
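
To illustrate the admin-side calculation I have in mind (a sketch; the stamp value is made up, and it ignores any clock steps after boot, which is part of the complexity mentioned above):
---snip---
# kern.boottime holds the boot time as a UTC epoch; adding a
# seconds-since-boot stamp from a log line yields the wall-clock time.
boot_sec=$(sysctl -n kern.boottime | sed -n 's/.*sec = \([0-9]*\).*/\1/p')
stamp=12345   # hypothetical seconds-since-boot value from the message
date -r $((boot_sec + stamp))
---snip---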



Now, having said that, having good timestamps in the dmesg buffer is
a desirable feature. 'Good' is subjective here, and there are times early
in boot where it's simply not possible to get a time better than '0' until
the timehands are ticking...  But the dmesg buffer has more than what
dmesg prints: it has syslog'd things (at least some of them) as well.
There's also a priority that some lines have. <3>Foo, for example. It would be
better, imho, to add a timestamp to the start of the lines (perhaps optionally,
since that might be expensive in $HUGE systems, or at times of
extreme dmesg spew, and they could be omitted in those cases).
If you are interested in just the log messages, it wouldn't be terrible
since we already add stuff to what's printed for the priority. We could say
<3,seconds-since-boot.fracsec> instead of just <3> and hack dmesg
to print the right thing.


From the other message in the thread it looks like  
kern.msgbuf_show_timestamp is what you describe here?
And it looks like kern.msgbuf_show_timestamp is not the same as  
printing a timestamp to the console... the timestamp shows up in  
dmesg, but not on the console, which is what I have in mind for this  
particular message... alternatively, the timestamp could be added to  
log() with a sysctl to activate it or not.
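
For the archives, this is how the existing knob is used (the sysctl exists today; whether its output also reaches the console is exactly the open point):
---snip---
# At runtime:
sysctl kern.msgbuf_show_timestamp=1
# Or as a boot-time tunable in /boot/loader.conf:
# kern.msgbuf_show_timestamp="1"
# New message buffer entries then carry a seconds-since-boot prefix:
dmesg | tail -1
---snip---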


Bye,
Alexander.


Warner



Code:
---snip---
diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c
index 4a15bd45355..a83eebe0736 100644
--- a/sys/kern/kern_sig.c
+++ b/sys/kern/kern_sig.c
@@ -80,6 +80,7 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3440,14 +3441,18 @@ sigexit(struct thread *td, int sig)
 		 */
 		if (coredump(td) == 0)
 			sig |= WCOREFLAG;
-		if (kern_logsigexit)
+		if (kern_logsigexit) {
+			struct bintime now;
+
+			getbintime(&now);
 			log(LOG_INFO,
-			    "pid %d (%s), jid %d, uid %d: exited on "
-			    "signal %d%s\n", p->p_pid, p->p_comm,
+			    "%zd: pid %d (%s), jid %d, uid %d: exited on "
+			    "signal %d%s\n", now.sec, p->p_pid, p->p_comm,
 			    p->p_ucred->cr_prison->pr_id,
 			    td->td_ucred->cr_uid,
 			    sig &~ WCOREFLAG,
 			    sig & WCOREFLAG ? " (core dumped)" : "");
+		}
 	} else
 		PROC_UNLOCK(p);
 	exit1(td, 0, sig);
---snip---

What are the in-kernel functions to print human readable timestamps (bintime)?

2022-03-11 Thread Alexander Leidinger

Hi,

I'm looking for a function to convert bintime to a human readable  
format in the kernel... and what is the usual format we use?



The use case for this is: if something throws a log from the kernel  
about a signal, I want to know when it happened, or in terms of code  
see below (tabs are most probably messed up).


Do we have some kind of policy in terms of kernel messages and  
timestamps, like "do not commit logging with timestamps"? I have the  
code below because I needed it at least once, and I think something like  
this (in a human readable shape) would be beneficial to have in the  
tree.


Code:
---snip---
diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c
index 4a15bd45355..a83eebe0736 100644
--- a/sys/kern/kern_sig.c
+++ b/sys/kern/kern_sig.c
@@ -80,6 +80,7 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3440,14 +3441,18 @@ sigexit(struct thread *td, int sig)
 */
if (coredump(td) == 0)
sig |= WCOREFLAG;
-   if (kern_logsigexit)
+   if (kern_logsigexit) {
+   struct bintime now;
+
+   getbintime(&now);
log(LOG_INFO,
-   "pid %d (%s), jid %d, uid %d: exited on "
-   "signal %d%s\n", p->p_pid, p->p_comm,
+   "%zd: pid %d (%s), jid %d, uid %d: exited on "
+   "signal %d%s\n", now.sec, p->p_pid, p->p_comm,
p->p_ucred->cr_prison->pr_id,
td->td_ucred->cr_uid,
sig &~ WCOREFLAG,
sig & WCOREFLAG ? " (core dumped)" : "");
+   }
} else
PROC_UNLOCK(p);
exit1(td, 0, sig);
---snip---

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF


pgpG9ftB1Te63.pgp
Description: Digital PGP signature


Re: ZFS PANIC: HELP.

2022-02-26 Thread Alexander Leidinger
 Quoting Larry Rosenman  (from Fri, 25 Feb 2022  
20:03:51 -0600):



On 02/25/2022 2:11 am, Alexander Leidinger wrote:

Quoting Larry Rosenman  (from Thu, 24 Feb 2022  
20:19:45 -0600):



I tried a scrub -- it panic'd on a fatal double fault. 

  Suggestions?


 The safest / cleanest (but not fastest) is data export and  
pool re-creation. If you export dataset by dataset (instead of  
recursively all), you can even see which dataset is causing the  
issue. In case this per dataset export narrows down the issue and  
it is a dataset you don't care about (as in: 1) no issue to  
recreate from scratch or 2) there is a backup available) you could  
delete this (or each such) dataset and re-create it in-place (= not  
re-creating the entire pool).


Bye,
Alexander.
 http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF

http://www.FreeBSD.org    netch...@freebsd.org  : PGP 0x8F31830F9F2772BF


I'm running this script:
#!/bin/sh
for i in $(zfs list -H | awk '{print $1}')
do
  FS=$i
  FN=$(echo ${FS} | sed -e s@/@_@g)
  sudo zfs send -vecLep ${FS}@REPAIR_SNAP | ssh l...@freenas.lerctr.org cat - \> $FN
done

How will I know a "Problem" dataset?


You wrote that a scrub panics the system. A scrub only touches occupied  
blocks; as such, exporting a problem dataset should panic your system as  
well. If it doesn't panic at all, the problem may be within a snapshot  
which contains data that was deleted in later versions of the dataset.
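
A sketch of how the export loop could record its progress to persistent storage, so that the culprit is known even after a panic and reboot (log path and snapshot name are illustrative):
---snip---
#!/bin/sh
for fs in $(zfs list -H -o name)
do
	# Record the dataset before touching it; after a panic the last
	# line in the log points at the problem dataset.
	echo "${fs}" >> /var/log/zfs_repair_progress
	sync
	zfs send -vecLep "${fs}@REPAIR_SNAP" > /dev/null
done
---snip---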


Bye,
Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF


pgpYqsU391ZUr.pgp
Description: Digital PGP signature


Re: ZFS PANIC: HELP.

2022-02-25 Thread Alexander Leidinger
 Quoting Larry Rosenman  (from Thu, 24 Feb 2022  
20:19:45 -0600):



I tried a scrub -- it panic'd on a fatal double fault. 

  Suggestions?


The safest / cleanest (but not fastest) is data export and pool  
re-creation. If you export dataset by dataset (instead of recursively  
all), you can even see which dataset is causing the issue. In case  
this per dataset export narrows down the issue and it is a dataset you  
don't care about (as in: 1) no issue to recreate from scratch or 2)  
there is a backup available) you could delete this (or each such)  
dataset and re-create it in-place (= not re-creating the entire pool).


Bye,
Alexander.
--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF


pgpbleK3b3rSl.pgp
Description: Digital PGP signature


Re: [RFC] Making mount_nfs to attempt NFSv4 before NFSv3 and NFSv2?

2022-01-03 Thread Alexander Leidinger via freebsd-current
Quoting Rick Macklem  (from Tue, 4 Jan 2022  
03:18:36 +):



Konstantin Belousov wrote:
[good stuff snipped]

The v4 NFS is very different from v3, it is not an upgrade, it is rather
a different network filesystem with some (significant) similarities to v3.

That said, it should be fine changing the defaults, but you need to ensure
that reasonable scenarios, like the changed FreeBSD client mounting
from v3-only server, still work correctly.  The change should be made in a
way that only affects client that connects to the server that has both
v4 and v3.

A particular test case that needs to be done is the diskless NFS root fs.
This case must use NFSv3 and if it is not the default, it might break?
I am not really set up to test this at this time.
(There are assorted reasons that NFSv4 does not, or at least might not,
 work for a diskless root fs, but that is a separate topic.)

Other than testing diskless NFS root file systems, I do not have a
strong opinion w.r.t. whether the default should change.

If the default stays as NFSv3, a fallback to NFSv4 could be done, which
would handle the NFSv4 only server case. (No one uses NFSv2 any more,
so the fallback to NFSv2 is almost irrelevant, imho.)


As you participate in interoperability tests, would it make sense to  
check how those other implementations handle this case? I naively  
assume you have some contacts or a mailing list you could use for that.
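
For reference, a sketch of what such a client-side fallback amounts to with today's mount options (server and path are illustrative; the options are documented in mount_nfs(8)):
---snip---
# Try NFSv4 first, fall back to NFSv3 if the server does not offer v4.
mount -t nfs -o nfsv4 server:/export /mnt || \
    mount -t nfs -o nfsv3 server:/export /mnt
---snip---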


Bye,
Alexander.


--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: Waiting for bufdaemon

2021-03-06 Thread Alexander Leidinger via freebsd-current


Quoting Konstantin Belousov  (from Fri, 5 Mar  
2021 22:43:58 +0200):



On Fri, Mar 05, 2021 at 04:03:11PM +0900, Yasuhiro Kimura wrote:

Dear src committers,

From: Yasuhiro Kimura 
Subject: Re: Waiting for bufdaemon
Date: Thu, 28 Jan 2021 05:02:42 +0900 (JST)

>>> I have been experiencing same problem with my 13-CURRENT amd64
>>> VirtualBox VM for about a month. The conditions that the problem
>>> happens are unclear and all what I can say is
>>>
>>> * It happens only after I login in the VM and do something for a
>>>   while. If I boot the VM and shut it down immediately, it never
>>>   happens.
>>> * When the problem happens, one or more unkillable processes seem to
>>>   be left.
>>
>> CPU of my host is not AMD but Intel. According to the
>> /var/run/dmesg.boot of VM, information of CPU is as following.
>>
>> CPU: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz (3000.09-MHz K8-class CPU)
>>   Origin="GenuineIntel"  Id=0x906ed  Family=0x6  Model=0x9e  Stepping=13
>>
Features=0x1783fbff
>>
Features2=0x5eda2203

>>   AMD Features=0x28100800
>>   AMD Features2=0x121
>>   Structured Extended  
Features=0x842421

>>   Structured Extended Features3=0x3400
>>   IA32_ARCH_CAPS=0x29
>>   TSC: P-state invariant
>>
>> Just FYI.
>
> It took for a while to investigate, but according to the result of
> bisect following commit is the source of problem in my case.
>
> --
> commit 84eaf2ccc6a
> Author: Konstantin Belousov 
> Date:   Mon Dec 21 19:02:31 2020 +0200
>
> x86: stop punishing VMs with low priority for TSC timecounter
>
> I suspect that virtualization techniques improved from the  
time when we
> have to effectively disable TSC use in VM.  For instance, it  
was reported

> (complained) in https://github.com/JuliaLang/julia/issues/38877 that
> FreeBSD is groundlessly slow on AWS with some loads.
>
> Remove the check and start watching for complaints.
>
> Reviewed by:emaste, grehan
> Discussed with: cperciva
> Sponsored by:   The FreeBSD Foundation
> Differential Revision:  https://reviews.freebsd.org/D27629
> --
>
> I confirmed the problem still happens with 5c325977b11 but reverting
> above commit fixes it.

Would someone please revert above commit from main, stable/13 and
releng/13.0? As I wrote previous mail I submitted this problem to
Bugzilla as bug 253087 and added the committer to Cc. But there is no
response for 34 days.

I confirmed the problem still happens with 37cd6c20dbc of main and
13.0-RC1.


My belief is that this commit helps more users than it hurts.  Namely,
the VMware and KVM users, which are the majority, use a fast timecounter,
compared to the more niche hypervisors like VirtualBox.

For you, a simple but manual workaround, setting the timecounter to
ACPI (?) or maybe HPET with a loader tunable, should do it.


Do you propose this to him as a test to see if it solves the issue,  
with the intent to change the code to use a more suitable timecounter  
if VirtualBox is detected?
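
For anyone else hitting this, the manual workaround looks like the following (timecounter names vary by machine; check kern.timecounter.choice first):
---snip---
# List the available timecounters and their quality:
sysctl kern.timecounter.choice
# Switch away from TSC at runtime:
sysctl kern.timecounter.hardware=HPET
# Or persistently, as a tunable in /boot/loader.conf:
# kern.timecounter.hardware="HPET"
---snip---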


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF


pgpZtn2nb4T4U.pgp
Description: Digital PGP signature


panic rtentry / hashdestroy /in6_purgeaddr related?

2021-01-21 Thread Alexander Leidinger

Hi,

-current at d7fc908cffa as of 2021-01-20-154143.

I've seen several failures to free an rtentry on the console shortly  
before the panic. May or may not be related...

---snip---
in6_purgeaddr: err=65, destination address delete failed
Freed UMA keg (rtentry) was not empty (1 items).  Lost 1 pages of memory.
j_gallery_hif: link state changed to DOWN
j_gallery_jif: link state changed to DOWN
 samba.leidinger.net
in6_purgeaddr: err=65, destination address delete failed
Freed UMA keg (rtentry) was not empty (1 items).  Lost 1 pages of memory.
j_samba_hif: link state changed to DOWN
j_samba_jif: link state changed to DOWN
---snip---

This is on shutdown with 22 vnet jails:
---snip---
Unread portion of the kernel message buffer:
<6>in6_purgeaddr: err=65, destination address delete failed
panic: hashdestroy: hashtbl 0xf80be7108800 not empty (malloc type ifaddr)
cpuid = 14
time = 1611248408
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0064b55b10
vpanic() at vpanic+0x181/frame 0xfe0064b55b60
panic() at panic+0x43/frame 0xfe0064b55bc0
hashdestroy() at hashdestroy+0x54/frame 0xfe0064b55bd0
vnet_destroy() at vnet_destroy+0x146/frame 0xfe0064b55c00
prison_deref() at prison_deref+0x28e/frame 0xfe0064b55c40
taskqueue_run_locked() at taskqueue_run_locked+0xb0/frame 0xfe0064b55cc0
taskqueue_thread_loop() at taskqueue_thread_loop+0x97/frame 0xfe0064b55cf0
fork_exit() at fork_exit+0x85/frame 0xfe0064b55d30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe0064b55d30
---snip---

Full crashinfo output is available on request. A kernel dump is also  
available if you want me to peek into it at some specific place.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF


pgpXhRFcbfDZO.pgp
Description: Digital PGP signature


Re: Identifying broken applications following careless use of make -DBATCH_DELETE_OLD_FILES delete-old-libs

2020-12-15 Thread Alexander Leidinger
Quoting Jens Schweikhardt  (from Mon, 14  
Dec 2020 14:15:05 +0100 (CET)):



Alexander,

it would seem that

find /usr/local/*bin* /usr/local/lib* -type f \
| xargs ldd -f '%p|%A\n' 2>/dev/null \
| grep '^not found' | cut -d '|' -f2 \
| xargs pkg which -q | sort -u

is prone to false positives, since ldd is sensitive to LD_LIBRARY_PATH, viz.:


Yes. Firefox and LibreOffice/OpenOffice come to mind immediately. There  
may be others. I expect those to be rare (compared to the size of the  
ports collection), but if you encounter a false positive, it's  
probably a big package. Either way, "locate $missing_lib" is a good  
idea here.


[...]

So make sure you look into what exact library is missing and if
it's actually somewhere "non-standard",
that directory should be in LD_LIBRARY_PATH.


Temporarily, for the run of this check, yes.
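
For example (paths are illustrative; Firefox is one of the ports that keeps private libraries outside the default search path):
---snip---
# Scope the extra search path to this one check instead of exporting
# it into the whole environment:
env LD_LIBRARY_PATH=/usr/local/lib/firefox \
    ldd -f '%p|%A\n' /usr/local/lib/firefox/*.so 2>/dev/null | \
    grep '^not found'
---snip---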

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF


pgp867e3Ceh6b.pgp
Description: Digital PGP signature

