Re: Graph of the FreeBSD memory fragmentation

2024-05-14 Thread Alexander Leidinger

On 2024-05-14 03:54, Ryan Libby wrote:

That was a long winded way of saying: the "UMA bucket" axis is
actually "vm phys free list order".

That said, I find that dimension confusing because in fact there's
just one piece of information there, the average size of a free list
entry, and it doesn't actually depend on the free list order.  The
graph could be 2D.


It evolved into that...
At first I had a 3-dimensional dataset, and the first try was to plot it as is (3D). The outcome (as points) was not as good as I wanted it to be, and plotting as lines gave the wrong direction of lines. I massaged the plotting instructions until it looked good enough. I did not try a 2D plot. I agree that with a different color for each free list order a 2D plot may work too. Whether a 2D plot is better than a 3D plot in this case depends on the viewer's mental model of the topic. One size may not fit all. Feel free to experiment with other plotting styles.
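If someone wants a starting point for a 2D variant, something like this should work with gnuplot (a rough sketch; the input layout -- one "timestamp order index" triple per line in fragdata.txt -- is an assumption, adjust the "using" clause to whatever the collection script actually writes):

#!/bin/sh
# one line per free list order, colors assigned automatically by gnuplot
gnuplot <<'EOF'
set terminal png size 1200,800
set output 'frag2d.png'
set xlabel 'sample'
set ylabel 'fragmentation index'
plot for [o=0:12] 'fragdata.txt' \
    using 1:($2 == o ? $3 : 1/0) with lines title sprintf('order %d', o)
EOF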



The paper that defines this fragmentation index also says that "the
fragmentation index is only meaningful when an allocation fails".  Are
you actually seeing any contiguous allocation failures in your
measurements?


I'm not aware of such.
The index may only be meaningful for the paper's purposes when such failures occur, but if you look at the graph and how it changed when Bojan changed the guard pages, I see value in the graph beyond what the paper suggests.



Without that context, it seems like what the proposed sysctl reports
is indirectly just the average size of free list entries.  We could
just report that.


The calculation of the value is part of a bigger picture. The value 
returned is used by some other code to make decisions.
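(For reference: if I read the paper correctly, the index for allocation order j is

    F(j) = 1 - (TotalFree / 2^j) / BlocksFree
         = 1 - (1/2^j) * (TotalFree / BlocksFree)

so at a fixed order j it is a monotone function of TotalFree/BlocksFree, the average size of a free list entry -- which is exactly why the extra dimension collapses as described above.)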


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Graph of the FreeBSD memory fragmentation

2024-05-09 Thread Alexander Leidinger

On 2024-05-08 18:45, Bojan Novković wrote:

Hi,

On 5/7/24 14:02, Alexander Leidinger wrote:


Hi,

I created some graphs of the memory fragmentation.
https://www.leidinger.net/blog/2024/05/07/plotting-the-freebsd-memory-fragmentation/

My goal was not to compare a specific change on a given benchmark, but to "have something which visualizes memory fragmentation". As it happens, Bojan's commit
https://cgit.freebsd.org/src/commit/?id=7a79d066976149349ecb90240d02eed0c4268737
landed right in the middle of my data collection. I have the impression that it made a positive difference in my non-deterministic workload.

Thank you for working on this, the plots look great!
They provide a really clean visual overview of what's happening.
I'm working on another type of memory visualization which might interest you; I'll share it with you once it's done.
One small nit: the fragmentation index does not quantify fragmentation for UMA buckets, but for page allocator freelists.


Do I get it right now: UMA buckets are type/structure-specific allocation lists, and the page allocator freelists are size-specific allocation lists (which are used by UMA when no free item is available in a bucket)?


Is there anything which prevents https://reviews.freebsd.org/D40575 from being committed?
D40575 is closely tied to the compaction patch (D40772) which is 
currently on hold until another issue is solved (see D45046 and related 
revisions for more details).


Any idea about https://reviews.freebsd.org/D16620? Is D45046 supposed to replace it, or is it about something else?
I wanted to try D16620, but it doesn't apply, and my naive/mechanical way of applying it panics.


I didn't consider landing D40575 because of that, but I guess it could 
be useful on its own.


It at least gives a way to quantify the fragmentation numerically and to visualize it qualitatively. As such it may help in visualizing differences, like with your guard-pages commit. I wonder if the segregation of nofree allocations may result in a similar improvement for long-running systems.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Graph of the FreeBSD memory fragmentation

2024-05-07 Thread Alexander Leidinger

Hi,

I created some graphs of the memory fragmentation.

https://www.leidinger.net/blog/2024/05/07/plotting-the-freebsd-memory-fragmentation/


My goal was not to compare a specific change on a given benchmark, but to "have something which visualizes memory fragmentation". As it happens, Bojan's commit
https://cgit.freebsd.org/src/commit/?id=7a79d066976149349ecb90240d02eed0c4268737
landed right in the middle of my data collection. I have the impression that it made a positive difference in my non-deterministic workload.


Is there anything which prevents https://reviews.freebsd.org/D40575 from being committed?


Maybe some other people want to have a look at the memory fragmentation and at some of Bojan's work
(https://wiki.freebsd.org/SummerOfCode2023Projects/PhysicalMemoryAntiFragmentationMechanisms).


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Strange network/socket anomalies since about a month

2024-04-24 Thread Alexander Leidinger
art (those with "Timed out waiting for server startup" are maybe the processes which fork to start the server and wait for it to be started), some are the stat-query, and some seem to be a successful start in another poudriere builder (those with a successful /root/.ccache/sccache/5/4/ access look like they come from a successful start in another jail). Maybe there is also a --stop-server from poudriere somewhere.


What I noticed (apart from the fact that printing the new CAP stuff for non-CAP-enabled processes by default is disturbing) is that compat11 stuff is called (it seems the rust ecosystem is not keeping up with our speed of development...). I'm not sure if it matters here that some compat stuff is called.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Strange network/socket anomalies since about a month

2024-04-22 Thread Alexander Leidinger

Hi,

I have been seeing a higher failure rate of socket/network related stuff for a while now. Those failures are transient: directly executing the same thing again may or may not succeed. I'm not able to reproduce this at will; the failures just show up sometimes.


Examples:
 - poudriere runs with the sccache overlay (like ccache, but it also works for rust) sometimes fail to create the communication socket, and as a consequence the build fails. I have 3 different poudriere bulk runs after each other in my build script, and when the first one fails, the second and third still run. If the first fails due to the sccache issue, the second and third may or may not fail. Sometimes the first fails and the rest is OK. Sometimes all fail, and if I then run one by hand it works (the script does the same as the manual run; it is simply a "for type in A B C; do poudriere bulk -O sccache -j $type -f ${type}.pkglist; done" which I execute from the same shell, and the script doesn't do any env-sanitizing).
 - A webmail interface (inet / local net -> nginx (rev-proxy) -> nginx (webmail service) -> php -> imap) sees intermittent issues. Opening the same email directly again afterwards normally works. I've also seen transient issues with pgp signing (webmail interface -> gnupg / gpg-agent on the server); simply hitting send again after a failure works fine.


Gleb, could this be related to the socket stuff you did 2 weeks ago? My world is from 2024-04-17-112537. I have noticed this since at least then, but I'm not sure if the failures were there before that and I simply didn't notice them. They are surely new as of recently; I hadn't seen that amount of issues in January. The last two updates of current I did before the last one were on 2024-03-31-120210 and 2024-04-08-112551.


I could also imagine that some memory-related transient failure could cause this, but with >3 GB free I do not expect that. It may be important here that I have https://reviews.freebsd.org/D40575 in my tree, which is memory related, but it's only a metric to quantify memory fragmentation.


Any ideas how to track this down more easily than running the entire poudriere under ktrace (e.g. a hint or script regarding which dtrace probes to use)?
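In case it helps as a starting point, a rough system-wide DTrace sketch for failing socket-related syscalls (the probe list is a guess at where the failures come from; errno is the DTrace built-in holding the error of the current syscall):

dtrace -n '
syscall::socket:return,
syscall::connect:return,
syscall::bind:return,
syscall::listen:return
/errno != 0/
{
    printf("%s[%d] %s -> errno %d", execname, pid, probefunc, errno);
}'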


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)

2024-03-31 Thread Alexander Leidinger

On 2024-03-29 18:21, Alexander Leidinger wrote:

On 2024-03-29 18:13, Mark Johnston wrote:

On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote:

Hi,

sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work (see below for the issue). As the monthly stabilisation pass didn't find obvious issues, it is something related to my setup:
 - not a generic kernel
 - very modular kernel (as much as possible as a module)
 - bind_now (a build without fails too, tested with clean /usr/obj)
 - ccache (a build without fails too, tested with clean /usr/obj)
 - kernel retpoline (build without in progress)
 - userland retpoline (build without in progress)
 - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
retpoline)
 - -fno-builtin
 - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
 - malloc production
 - COPTFLAGS= -O2 -pipe

The issue is that kernel modules load OK from the loader, but once it starts init, any module fails to load (e.g. via autodetection of hardware or rc.conf kld_list) with the message that the kernel and module versions are out of sync, and the module refuses to load.


What is the exact revision you're running?  There were some unrelated
changes to the kernel linker around the same time.


The working src is from 2024-03-11-094351 (GMT+0100).
The failing src was fetched after Gleb's stabilization week message (and today's src from before the sound stuff still fails).


Retpoline wasn't the cause, next test is the CTF stuff in the kernel...


A rather obscure problem was causing this. The "last" BE had canmount set to "on" instead of "noauto". No idea how this happened, but it resulted in the "last" BE being mounted by "zfs mount -a" on top of the current BE. This means that all modules loaded after the zfs rc script had run were old kernel modules, and the error message about the kernel version mismatch was correct. I found the issue while bisecting the tree: suddenly the error message went away, but a new issue of missing dev entries popped up (/dev was mounted correctly on the booting dataset, but the last BE was mounted on top of it and /dev appeared empty...).


It looks to me like bectl was doing this (from "zpool history")...
2024-03-11.14:16:31 zpool set bootfs=rpool/ROOT/2024-03-11-094351 rpool
2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-01-18-092730
2024-03-11.14:16:31 zfs set canmount=noauto rpool/ROOT/2024-02-10-144617
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-11-212006
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-16-082836
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211
2024-03-11.14:16:32 zfs set canmount=noauto rpool/ROOT/2024-02-24-140211_ok
2024-03-11.14:16:33 zfs set canmount=on rpool/ROOT/2024-03-11-094351
2024-03-11.14:16:33 zfs promote rpool/ROOT/2024-03-11-094351
2024-03-11.14:17:03 zfs destroy -r rpool/ROOT/2024-02-24-140211_ok

I surely didn't do the "zfs set canmount=..." for those by hand.
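For anyone who wants to check their own pools for the same foot-gun, something like this (rpool/ROOT is my layout; adjust the dataset name) lists every BE which "zfs mount -a" would consider:

zfs list -H -o name,canmount -r rpool/ROOT | awk '$2 != "noauto"'

Normally only the currently booted BE should show up there.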

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)

2024-03-29 Thread Alexander Leidinger

On 2024-03-29 18:13, Mark Johnston wrote:

On Fri, Mar 29, 2024 at 04:52:55PM +0100, Alexander Leidinger wrote:

Hi,

sources from 2024-03-11 work. Sources from 2024-03-25 and today don't work (see below for the issue). As the monthly stabilisation pass didn't find obvious issues, it is something related to my setup:
 - not a generic kernel
 - very modular kernel (as much as possible as a module)
 - bind_now (a build without fails too, tested with clean /usr/obj)
 - ccache (a build without fails too, tested with clean /usr/obj)
 - kernel retpoline (build without in progress)
 - userland retpoline (build without in progress)
 - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't
retpoline)
 - -fno-builtin
 - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
 - malloc production
 - COPTFLAGS= -O2 -pipe

The issue is that kernel modules load OK from the loader, but once it starts init, any module fails to load (e.g. via autodetection of hardware or rc.conf kld_list) with the message that the kernel and module versions are out of sync, and the module refuses to load.


What is the exact revision you're running?  There were some unrelated
changes to the kernel linker around the same time.


The working src is from 2024-03-11-094351 (GMT+0100).
The failing src was fetched after Gleb's stabilization week message (and today's src from before the sound stuff still fails).


Retpoline wasn't the cause, next test is the CTF stuff in the kernel...

I tried the workaround of loading the modules from the loader, which works, but then I can't log in remotely as ssh fails to allocate a pty. When loading the modules via the loader, I can see messages about missing CTF info when the nvidia modules (from ports = not yet rebuilt = in /boot/modules/...ko instead of /boot/kernel/...ko) try to get initialised... and it looks like they fail to initialise because of this missing CTF stuff (I'm back on the previous boot env to be able to log in remotely and send mails, so I don't have a copy of the failure message at hand).

I assume the missing CTF stuff is due to the CTF-based pretty printing (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3).
Is this supposed to fail to load modules which are compiled without CTF data? Shouldn't this work gracefully (e.g. spit out a warning that pretty printing is not available for module X and keep the module working)?


From my reading of linker_ctf_load_file(), this is exactly how it
already works.


Good to hear that it already works this way. I still suggest printing a message which explains what the warning about the missing CTF data means.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Multiple issues with current (kldload failures, missing CTF stuff, pty issues, ...)

2024-03-29 Thread Alexander Leidinger

Hi,

sources from 2024-03-11 work. Sources from 2024-03-25 and today don't 
work (see below for the issue). As the monthly stabilisation pass didn't 
find obvious issues, it is something related to my setup:

 - not a generic kernel
 - very modular kernel (as much as possible as a module)
 - bind_now (a build without fails too, tested with clean /usr/obj)
 - ccache (a build without fails too, tested with clean /usr/obj)
 - kernel retpoline (build without in progress)
 - userland retpoline (build without in progress)
 - kernel build with WITH_CTF / DDB_CTF (next one to test if it isn't 
retpoline)

 - -fno-builtin
 - CPUFLAGS=native (except for stuff in /usr/src/sys/boot)
 - malloc production
 - COPTFLAGS= -O2 -pipe

The issue is that kernel modules load OK from the loader, but once it starts init, any module fails to load (e.g. via autodetection of hardware or rc.conf kld_list) with the message that the kernel and module versions are out of sync, and the module refuses to load.


I tried the workaround of loading the modules from the loader, which works, but then I can't log in remotely as ssh fails to allocate a pty. When loading the modules via the loader, I can see messages about missing CTF info when the nvidia modules (from ports = not yet rebuilt = in /boot/modules/...ko instead of /boot/kernel/...ko) try to get initialised... and it looks like they fail to initialise because of this missing CTF stuff (I'm back on the previous boot env to be able to log in remotely and send mails, so I don't have a copy of the failure message at hand).


I assume the missing CTF stuff is due to the CTF-based pretty printing (https://cgit.freebsd.org/src/commit/?id=c21bc6f3c2425de74141bfee07b609bf65b5a6b3).
Is this supposed to fail to load modules which are compiled without CTF data? Shouldn't this work gracefully (e.g. spit out a warning that pretty printing is not available for module X and keep the module working)?


Next steps:
 - try a world without retpoline (bind_now and ccache active)
 - try a kernel without CTF (bind now, ccache, retpoline active)
 - try a world without bind_now, retpoline, CTF, CPUFLAGS, COPTFLAGS

If anyone has an idea how to debug this in some other way...
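A sanity check that might be worth running (dataset names follow my rpool/ROOT layout): compare what the kernel was told to mount with what actually ended up mounted:

# what the kernel was asked to mount as /
kenv vfs.root.mountfrom
# what is actually mounted (an unexpected overlay would show up here)
mount -t zfs
df /boot/kernel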

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Reason why "nocache" option is not displayed in "mount"?

2024-03-11 Thread Alexander Leidinger

On 2024-03-10 22:57, Konstantin Belousov wrote:

We are already low on free bits in the flags, even after expanding them to 64 bits. Moreover, there are useful common fs services continuously consuming those flags, e.g. the recent NFS TLS options.

I object to using the flags for unimportant things like this nullfs "cache" option.

In the long term, we will have to export the nmount(2) strings, since the bits in the flags are finite, but I prefer to delay that as much as possible.


Why do you want to delay this? Personal priorities, or technical 
reasons?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Reason why "nocache" option is not displayed in "mount"?

2024-03-09 Thread Alexander Leidinger

On 2024-03-09 15:27, Rick Macklem wrote:

On Sat, Mar 9, 2024 at 5:08 AM Alexander Leidinger
 wrote:


On 2024-03-09 06:07, Warner Losh wrote:

> On Thu, Mar 7, 2024 at 1:05 PM Jamie Landeg-Jones 
> wrote:
>
>> Alexander Leidinger  wrote:
>>
>>> Hi,
>>>
>>> what is the reason why "nocache" is not displayed in the output of
>>> "mount" for nullfs options?
>>
>> Good catch. I also notice that "hidden" is not shown either.
>>
>> I guess that, as "nocache" was a "secret" option for some time, no-one
>> updated "mount" to display it?
>
> So a couple of things to know.
>
> First, there's a list of known options. These are converted to a
> bitmask. This is then decoded and reported by mount. The other strings
> are passed to the filesystem directly. They decode it and do things,
> but they don't export them (that I can find). I believe that's why they
> aren't reported with 'mount'. There's a couple of other options in
> /etc/fstab that are pseudo options too.

That's the technical explanation of why it doesn't work. I'm a step further than in my initial mail: I even had a look at the code and know that nocache is recorded in a nullfs-private flag and that userland cannot access it (mount looks at struct statfs, which doesn't provide info about this and some other things).

My question was aimed more at whether there is a conceptual reason, or whether it was an oversight, that it is not displayed. I admit that this was lost in translation...

Regarding the issue of not being able to see all options which are in effect for a given mount point (not specific to nocache): I consider this to be a bug.
Pseudo options like "late" or "noauto" in fstab, which don't make sense when you mount(8) a FS by hand, I do not consider here.
As a data point, I added the "-m" option to nfsstat(1) so that all the nfs related options get displayed.

Part of the problem is that this will be file system specific, since nmount() defers processing options to the file systems.


Values exist for a lot of the mount options which are not displayed. For example, the nocache option for nullfs is MNTK_NULL_NOCACHE in
https://cgit.freebsd.org/src/tree/sys/sys/mount.h#n515
This may not be usable as is, but it shows that there are already public bits about it, just not in the proper place to be useful to userland.


Even FS-specific options could be exported as part of statfs (by letting the FS set them in struct statfs). Or there could be a per-mount callback / ioctl / whatever which provides the options in some way to userland if requested.


So we either have something which could be used but which requires some interface to let a FS set a value somewhere, or, if that is too gross a hack, we would need to come up with a new interface to query this info.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Reason why "nocache" option is not displayed in "mount"?

2024-03-09 Thread Alexander Leidinger

On 2024-03-09 06:07, Warner Losh wrote:

On Thu, Mar 7, 2024 at 1:05 PM Jamie Landeg-Jones  
wrote:



Alexander Leidinger  wrote:


Hi,

what is the reason why "nocache" is not displayed in the output of
"mount" for nullfs options?


Good catch. I also notice that "hidden" is not shown either.

I guess that, as "nocache" was a "secret" option for some time, no-one updated "mount" to display it?


So a couple of things to know.

First, there's a list of known options. These are converted to a 
bitmask. This is then decoded and reported by mount. The other strings 
are passed to the filesystem directly. They decode it and do things, 
but they don't export them (that I can find). I believe that's why they 
aren't reported with 'mount'. There's a couple of other options in 
/etc/fstab that are pseudo options too.


That's the technical explanation of why it doesn't work. I'm a step further than in my initial mail: I even had a look at the code and know that nocache is recorded in a nullfs-private flag and that userland cannot access it (mount looks at struct statfs, which doesn't provide info about this and some other things).


My question was aimed more at whether there is a conceptual reason, or whether it was an oversight, that it is not displayed. I admit that this was lost in translation...


Regarding the issue of not being able to see all options which are in 
effect for a given mount point (not specific to nocache): I consider 
this to be a bug.
Pseudo options like "late" or "noauto" in fstab, which don't make sense when you mount(8) a FS by hand, I do not consider here.


I'm not sure if this warrants a bug tracker item (which maybe nobody is interested in taking ownership of), or if we need to extend the man pages with info about which options will not be displayed in the output for mounted FS, or both.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: Reason why "nocache" option is not displayed in "mount"?

2024-03-07 Thread Alexander Leidinger

On 2024-03-07 14:59, Christos Chatzaras wrote:
what is the reason why "nocache" is not displayed in the output of 
"mount" for nullfs options?


# grep packages /etc/fstab.commit_leidinger_net
/shared/ports/packages /space/jails/commit.leidinger.net/shared/ports/packages nullfs rw,noatime,nocache 0 0


# mount | grep commit | grep packages
/shared/ports/packages on /space/jails/commit.leidinger.net/shared/ports/packages (nullfs, local, noatime, noexec, nosuid, nfsv4acls)


Context: I wanted to check if poudriere is mounting with or without 
"nocache", and instead of reading the source I wanted to do it more 
quickly by looking at the mount options.


In my setup, I mount the /home directory using nullfs with the nocache option to facilitate access for certain jails. The primary reason for employing nocache is that ZFS quotas on the main system do not accurately reflect changes in file usage by users within the jail unless nocache is used. When files were added or removed by a user within the jail, their disk usage wasn't properly updated on the main system until I started using nocache. Based on this experience, I'm confident that applying nocache works as expected in your scenario as well.


It does. The question is: how do I _see_ that a mount point is _set up_ with nocache? In the above example the FS _is_ mounted with nocache, but that is _not displayed_ in the output.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Reason why "nocache" option is not displayed in "mount"?

2024-03-07 Thread Alexander Leidinger

Hi,

what is the reason why "nocache" is not displayed in the output of 
"mount" for nullfs options?


# grep packages /etc/fstab.commit_leidinger_net
/shared/ports/packages /space/jails/commit.leidinger.net/shared/ports/packages nullfs rw,noatime,nocache 0 0


# mount | grep commit | grep packages
/shared/ports/packages on /space/jails/commit.leidinger.net/shared/ports/packages (nullfs, local, noatime, noexec, nosuid, nfsv4acls)


Context: I wanted to check if poudriere is mounting with or without 
"nocache", and instead of reading the source I wanted to do it more 
quickly by looking at the mount options.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: main [so: 15] context, 7950X3D and RTL8251/8153 based Ethernet dongle: loss of state, example log information

2024-03-04 Thread Alexander Motin

On 04.03.2024 15:33, Jakob Alvermark wrote:

On 3/4/24 21:13, Alexander Motin wrote:

On 04.03.2024 15:00, Poul-Henning Kamp wrote:

Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to DOWN
Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to UP
Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to DOWN
Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to UP
Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to DOWN
Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to UP


I consistently had similar problems with my 0x17ef/0x3066 "ThinkPad
Thunderbolt 3 Dock MCU", but they went away after I forced it to
use the if_cdce driver instead with this quirk:

 /* This works much better with if_cdce than if_ure */
 USB_QUIRK(LENOVO, TBT3LAN,  0x, 0x, UQ_CFG_INDEX_1),


AFAIK it is only a workaround. I have seen it myself on a number of different USB dongles and laptops: USB starts experiencing problems with multiple NIC queues and some other factors. IIRC the Realtek driver was much more stable once I limited it to one queue and applied some other hacks. IIRC if_cdce has just one queue and other limitations, which not only makes it more stable, but also much slower. It would be good to understand what exactly is wrong there, since IMHO it is a big problem now. Unfortunately HPS was unable to reproduce it on his laptop (which makes me wonder if it is specific to chipset(s) or thunderbolt?), so it has gone nowhere so far.


I have a Lenovo USB 3 dongle, so no thunderbolt.


I also use USB3 dongles. But in my laptops the USB 3 ports are provided by an Intel Thunderbolt controller, while in HPS's they come from a plain USB3 controller. Though it may be just a coincidence.



USB ID 0x17ef/0x7205

rgephy1:  PHY 0 on miibus1

I tried using the cdce driver; it gives me < 100 Mb/s, while the ure driver gets > 500 Mb/s.


Right, I saw about the same.
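Side note: if someone wants to try the if_cdce workaround without rebuilding anything, usb_quirk(4) quirks can, as far as I know, also be set as loader tunables; the device ID below is the one from Poul-Henning's mail:

# /boot/loader.conf
hw.usb.quirk.0="0x17ef 0x3066 0x0000 0xffff UQ_CFG_INDEX_1"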

--
Alexander Motin



Re: main [so: 15] context, 7950X3D and RTL8251/8153 based Ethernet dongle: loss of state, example log information

2024-03-04 Thread Alexander Motin

On 04.03.2024 15:00, Poul-Henning Kamp wrote:

Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to DOWN
Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to UP
Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to DOWN
Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to UP
Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to DOWN
Nov 30 03:23:18 7950X3D-UFS kernel: ue0: link state changed to UP


I consistently had similar problems with my 0x17ef/0x3066 "ThinkPad
Thunderbolt 3 Dock MCU", but they went away after I forced it to
use the if_cdce driver instead with this quirk:

 /* This works much better with if_cdce than if_ure */
 USB_QUIRK(LENOVO, TBT3LAN,  0x, 0x, UQ_CFG_INDEX_1),


AFAIK it is only a workaround. I have seen it myself on a number of different USB dongles and laptops: USB starts experiencing problems with multiple NIC queues and some other factors. IIRC the Realtek driver was much more stable once I limited it to one queue and applied some other hacks. IIRC if_cdce has just one queue and other limitations, which not only makes it more stable, but also much slower. It would be good to understand what exactly is wrong there, since IMHO it is a big problem now. Unfortunately HPS was unable to reproduce it on his laptop (which makes me wonder if it is specific to chipset(s) or thunderbolt?), so it has gone nowhere so far.


--
Alexander Motin



Re: February 2024 stabilization week

2024-02-24 Thread Alexander Leidinger

On 2024-02-24 21:18, Konstantin Belousov wrote:

On Fri, Feb 23, 2024 at 08:34:21PM -0800, Gleb Smirnoff wrote:

  Hi FreeBSD/main users,

the February 2024 stabilization week started with 03cc3489a02d, which was tagged as main-stabweek-2024-Feb. At the moment of the tag creation we already knew about several regressions caused by the libc/libsys split.

In the stabilization branch stabweek-2024-Feb we accumulated the following cherry-picks from FreeBSD/main:

1) closefrom() syscall was failing unless you have COMPAT_FREEBSD12 in the kernel
   99ea67573164637d633e8051eb0a5d52f1f9488e
   eb90239d08863bcff3cf82a556ad9d89776cdf3f
2) nextboot -k broken on ZFS
   3aefe6759669bbadeb1a24a8956bf222ce279c68
   0c3ade2cf13df1ed5cd9db4081137ec90fcd19d0
3) libsys links to libc
   baa7d0741b9a2117410d558c6715906980723eed
4) sleep(3) no longer being a pthread cancellation point
   7d233b2220cd3d23c028bdac7eb3b6b7b2025125

We are aware of two regressions still unresolved:

1) libsys/rtld breaks bind 9.18 / mysql / java / ...
   https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277222

   Konstantin, can you please check me? Is this the same issue fixed by
   baa7d0741b9a2117410d558c6715906980723eed or a different one?

Most likely. Since no useful diagnostic was provided, I cannot confirm.


It is.
And for the curious reader: this affected a world which was built with WITH_BIND_NOW (ports built with RELRO and BIND_NOW were unaffected, as long as the base system was not built with BIND_NOW).


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: sanitizers broken (was RE: libc/libsys split coming soon)

2024-02-22 Thread Alexander Leidinger

On 2024-02-21 10:52, hartmut.bra...@dlr.de wrote:

Hi,

I updated yesterday and now even a minimal program with

cc -fsanitize=address

produces

ld: error: undefined symbol: __elf_aux_vector
referenced by sanitizer_linux_libcdep.cpp:950 
(/usr/src/contrib/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp:950)
  sanitizer_linux_libcdep.o:(__sanitizer::ReExec()) in 
archive /usr/lib/clang/17/lib/freebsd/libclang_rt.asan-x86_64.a
cc: error: linker command failed with exit code 1 (use -v to see 
invocation)


I think this is caused by the libsys split.


There are other issues too. Discussed in multiple places.

I opened https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277222 this morning; maybe it can be used to centralize the libsys issues (= I don't mind if you add a comment there, but maybe brooks wants to have a separate PR).


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: segfault in ld-elf.so.1

2024-02-13 Thread Alexander Leidinger

On 2024-02-13 01:58, Konstantin Belousov wrote:

On Mon, Feb 12, 2024 at 11:54:02AM +0200, Konstantin Belousov wrote:

On Mon, Feb 12, 2024 at 10:35:56AM +0100, Alexander Leidinger wrote:
> Hi,
>
> dovecot (and no other program I use on this machine... at least not that I
> notice it) segfaults in ld-elf.so.1 after an update from 2024-01-18-092730
> to 2024-02-10-144617 (and now 2024-02-11-212006 in the hope the issue would
> have been fixed by changes to libc/libsys since 2024-02-10-144617). The
> issue shows up when I try to do an IMAP login. A successful authentication
> starts the imap process which immediately segfaults.
>
> I didn't recompile dovecot for the initial update, but I did now to rule
> out a regression in this area (and to get access via imap do my normal mail
> account).
>
>
> Backtrace:
The backtrace looks incomplete. It might be a case of infinite recursion, but I cannot tell from the trace.

Does the program segfault if you run it manually? If yes, please provide me with a tarball of the binary and all required shared libs, including base system libraries, from your machine.

No, it does not segfault when run manually.


Regardless of my request, you might try the following. Note that I have not tested the patch; ensure that you have a way to recover ld-elf.so.1 if something goes wrong.


[inline patch]

This did the trick, and I have IMAP access to my emails again. As this runs in a jail, it was easy to test without fear of killing something.


I will try the patch in the review next.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




kernel crash in tcp_subr.c:2386

2024-02-12 Thread Alexander Leidinger
Hi,

I got a coredump with sources from 2024-02-10-144617 (GMT+0100):
---snip---
__curthread () at /space/system/usr_src/sys/amd64/include/pcpu_aux.h:57
57  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct
pcpu,
(kgdb) #0  __curthread () at
/space/system/usr_src/sys/amd64/include/pcpu_aux.h:57
td = 
#1  doadump (textdump=textdump@entry=1)
at /space/system/usr_src/sys/kern/kern_shutdown.c:403
error = 0
coredump = 
#2  0x8052fe85 in kern_reboot (howto=260)
at /space/system/usr_src/sys/kern/kern_shutdown.c:521
once = 0
__pc = 
#3  0x80530382 in vpanic (
fmt=0x808df476 "Assertion %s failed at %s:%d",
ap=ap@entry=0xfe08a079ebf0)
    at /space/system/usr_src/sys/kern/kern_shutdown.c:973
buf = "Assertion !callout_active(&tp->t_callout) failed at
/space/system/usr_src/sys/netinet/tcp_subr.c:2386", '\000' 
__pc = 
__pc = 
__pc = 
other_cpus = {__bits = {14680063, 0 }}
td = 0xf8068ef99740
bootopt = 
newpanic = 
#4  0x805301d3 in panic (fmt=)
at /space/system/usr_src/sys/kern/kern_shutdown.c:889
ap = {{gp_offset = 32, fp_offset = 48,
overflow_arg_area = 0xfe08a079ec20,
reg_save_area = 0xfe08a079ebc0}}
#5  0x806c9d8c in tcp_discardcb (tp=tp@entry=0xf80af441ba80)
at /space/system/usr_src/sys/netinet/tcp_subr.c:2386
inp = 0xf80af441ba80
so = 0xf804d23d2780
m = 
isipv6 = 
#6  0x806d6291 in tcp_usr_detach (so=0xf804d23d2780)
at /space/system/usr_src/sys/netinet/tcp_usrreq.c:214
inp = 0xf80af441ba80
tp = 0xf80af441ba80
#7  0x805dba57 in sofree (so=0xf804d23d2780)
at /space/system/usr_src/sys/kern/uipc_socket.c:1205
pr = 0x80a8bd18 
#8  sorele_locked (so=so@entry=0xf804d23d2780)
at /space/system/usr_src/sys/kern/uipc_socket.c:1232
No locals.
#9  0x805dc8c0 in soclose (so=0xf804d23d2780)
at /space/system/usr_src/sys/kern/uipc_socket.c:1302
lqueue = {tqh_first = 0xf8068ef99740,
  tqh_last = 0xfe08a079ed40}
error = 0
saved_vnet = 0x0
last = 
listening = 
#10 0x804ccbd1 in fo_close (fp=0xf805f2dfc500, td=)
at /space/system/usr_src/sys/sys/file.h:390
No locals.
#11 _fdrop (fp=fp@entry=0xf805f2dfc500, td=,
td@entry=0xf8068ef99740)
at /space/system/usr_src/sys/kern/kern_descrip.c:3666
count = 
error = 
#12 0x804d02f3 in closef (fp=fp@entry=0xf805f2dfc500,
td=td@entry=0xf8068ef99740)
at /space/system/usr_src/sys/kern/kern_descrip.c:2839
_error = 0
_fp = 0xf805f2dfc500
lf = {l_start = -8791759350504, l_len = -8791759350528, l_pid = 0,
  l_type = 0, l_whence = 0, l_sysid = 0}
vp = 
fdtol = 
fdp = 
#13 0x804cd50c in closefp_impl (fdp=0xfe07afebf860, fd=19,
fp=0xf805f2dfc500, td=0xf8068ef99740, audit=)
at /space/system/usr_src/sys/kern/kern_descrip.c:1315
error = 
#14 closefp (fdp=0xfe07afebf860, fd=19, fp=0xf805f2dfc500,
td=0xf8068ef99740, holdleaders=true, audit=)
at /space/system/usr_src/sys/kern/kern_descrip.c:1372
No locals.
#15 0x808597d6 in syscallenter (td=0xf8068ef99740)
at /space/system/usr_src/sys/amd64/amd64/../../kern/subr_syscall.c:186
se = 0x80a48330 
p = 0xfe07f29995c0
sa = 0xf8068ef99b30
error = 
sy_thr_static = 
traced = 
#16 amd64_syscall (td=0xf8068ef99740, traced=0)
at /space/system/usr_src/sys/amd64/amd64/trap.c:1192
ksi = {ksi_link = {tqe_next = 0xfe08a079ef30,
tqe_prev = 0x808588af }, ksi_info = {
si_signo = 1, si_errno = 0, si_code = 2015268872, si_pid = -512,
si_uid = 2398721856, si_status = -2042,
si_addr = 0xfe08a079ef40, si_value = {sival_int =
-1602621824,
  sival_ptr = 0xfe08a079ee80, sigval_int = -1602621824,
  sigval_ptr = 0xfe08a079ee80}, _reason = {_fault = {
_trapno = 1489045984}, _timer = {_timerid = 1489045984,
_overrun = 17999}, _mesgq = {_mqd = 1489045984}, _poll = {
_band = 77306605406688}, _capsicum = {_syscall =
1489045984},
  __spare__ = {__spare1__ = 77306605406688, __spare2__ = {
  1489814048, 17999, 208, 0, 0, 0, 992191072,
  ksi_flags = 975329968, ksi_sigq = 0x8082f8f3
}
#17 
No locals.
#18 0x3af13b17fc9a in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x3af13a225ab8
---snip---

Any ideas?

Due to another issue in userland, I updated to 2024-02-11-212006, but I still have the above-mentioned version and the core in a BE if needed.

Bye,
Alexander.


segfault in ld-elf.so.1

2024-02-12 Thread Alexander Leidinger
.1`symlook_obj [inlined]
load_filtees(obj=0x49a47c228008, flags=0, lockstate=0x1ded0f98cb80)
at rtld.c:2589:2
frame #29: 0x4d3dfa2a223e
ld-elf.so.1`symlook_obj(req=0x1ded011519c0, obj=0x49a47c228008) at
rtld.c:4735:6
frame #30: 0x4d3dfa2a6992
ld-elf.so.1`symlook_list(req=0x1ded01151a48, objlist=,
dlp=0x1ded01151b90) at rtld.c:4637:13
frame #31: 0x4d3dfa2a680b
ld-elf.so.1`symlook_global(req=0x1ded01151b50,
donelist=0x1ded01151b90) at rtld.c:4541:8
frame #32: 0x4d3dfa2a6673
ld-elf.so.1`get_program_var_addr(name=,
lockstate=0x1ded0f98cb80) at rtld.c:4483:9
frame #33: 0x4d3dfa2a4374 ld-elf.so.1`dlopen_object [inlined]
distribute_static_tls(list=0x1ded01152068,
lockstate=0x1ded0f98cb80) at rtld.c:5908:6
frame #34: 0x4d3dfa2a4364 ld-elf.so.1`dlopen_object(name="", fd=-1,
refobj=0x49a47c228008, lo_flags=0, mode=1,
lockstate=0x1ded0f98cb80) at rtld.c:3831:6
frame #35: 0x4d3dfa2a2274 ld-elf.so.1`symlook_obj [inlined]
load_filtee1(obj=, needed=0x49a47c2007c8,
flags=, lockstate=) at rtld.c:2576:16
frame #36: 0x4d3dfa2a2245 ld-elf.so.1`symlook_obj [inlined]
load_filtees(obj=0x49a47c228008, flags=0, lockstate=0x1ded0f98cb80)
at rtld.c:2589:2
frame #37: 0x4d3dfa2a223e
ld-elf.so.1`symlook_obj(req=0x1ded01152160, obj=0x49a47c228008) at
rtld.c:4735:6
frame #38: 0x4d3dfa2a6992
ld-elf.so.1`symlook_list(req=0x1ded011521e8, objlist=,
dlp=0x1ded01152330) at rtld.c:4637:13
frame #39: 0x4d3dfa2a680b
ld-elf.so.1`symlook_global(req=0x1ded011522f0,
donelist=0x1ded01152330) at rtld.c:4541:8
frame #40: 0x4d3dfa2a6673
ld-elf.so.1`get_program_var_addr(name=,
lockstate=0x1ded0f98cb80) at rtld.c:4483:9
frame #41: 0x4d3dfa2a4374 ld-elf.so.1`dlopen_object [inlined]
distribute_static_tls(list=0x1ded01152808,
lockstate=0x1ded0f98cb80) at rtld.c:5908:6
frame #42: 0x4d3dfa2a4364 ld-elf.so.1`dlopen_object(name="", fd=-1,
refobj=0x49a47c228008, lo_flags=0, mode=1,
lockstate=0x1ded0f98cb80) at rtld.c:3831:6
frame #43: 0x4d3dfa2a2274 ld-elf.so.1`symlook_obj [inlined]
load_filtee1(obj=, needed=0x49a47c2007c8,
flags=, lockstate=) at rtld.c:2576:16
frame #44: 0x4d3dfa2a2245 ld-elf.so.1`symlook_obj [inlined]
load_filtees(obj=0x49a47c228008, flags=0, lockstate=0x1ded0f98cb80)
at rtld.c:2589:2
frame #45: 0x4d3dfa2a223e
ld-elf.so.1`symlook_obj(req=0x1ded01152900, obj=0x49a47c228008) at
rtld.c:4735:6
frame #46: 0x4d3dfa2a6992
ld-elf.so.1`symlook_list(req=0x1ded01152988, objlist=,
dlp=0x1ded01152ad0) at rtld.c:4637:13
frame #47: 0x4d3dfa2a680b
ld-elf.so.1`symlook_global(req=0x1ded01152a90,
donelist=0x1ded01152ad0) at rtld.c:4541:8
frame #48: 0x4d3dfa2a6673
ld-elf.so.1`get_program_var_addr(name=,
lockstate=0x1ded0f98cb80) at rtld.c:4483:9
frame #49: 0x4d3dfa2a4374 ld-elf.so.1`dlopen_object [inlined]
distribute_static_tls(list=0x1ded01152fa8,
lockstate=0x1ded0f98cb80) at rtld.c:5908:6
frame #50: 0x4d3dfa2a4364 ld-elf.so.1`dlopen_object(name="", fd=-1,
refobj=0x49a47c228008, lo_flags=0, mode=1,
lockstate=0x1ded0f98cb80) at rtld.c:3831:6
frame #51: 0x4d3dfa2a2274 ld-elf.so.1`symlook_obj [inlined]
load_filtee1(obj=, needed=0x49a47c2007c8,
flags=, lockstate=) at rtld.c:2576:16
frame #52: 0x4d3dfa2a2245 ld-elf.so.1`symlook_obj [inlined]
load_filtees(obj=0x49a47c228008, flags=0, lockstate=0x1ded0f98cb80)
at rtld.c:2589:2
frame #53: 0x4d3dfa2a223e
ld-elf.so.1`symlook_obj(req=0x1ded011530a0, obj=0x49a47c228008) at
rtld.c:4735:6
frame #54: 0x4d3dfa2a6992
ld-elf.so.1`symlook_list(req=0x1ded01153128, objlist=,
dlp=0x1ded01153270) at rtld.c:4637:13
frame #55: 0x4d3dfa2a680b
ld-elf.so.1`symlook_global(req=0x1ded01153230,
donelist=0x1ded01153270) at rtld.c:4541:8
frame #56: 0x4d3dfa2a6673
ld-elf.so.1`get_program_var_addr(name=,
lockstate=0x1ded0f98cb80) at rtld.c:4483:9
---snip---

Bye,
Alexander.


Re: noatime on ufs2

2024-01-29 Thread Alexander Leidinger

On 2024-01-30 01:21, Warner Losh wrote:

On Mon, Jan 29, 2024 at 2:31 PM Olivier Certner  
wrote:



It also seems undesirable to add a sysctl to control a value that the
kernel doesn't use.


The kernel has to use it to guarantee uniform behavior irrespective of whether the mount is performed through mount(8) or by a direct call to nmount(2). I think this consistency is important. Perhaps all auto-mounters and mount helpers always run mount(8) and never deal with nmount(2); I would have to check (I seem to remember that, a long time ago, when nmount(2) was introduced as an enhancement over mount(2), the stance was that applications should use mount(8) and not nmount(2) directly). Even if there were no obvious callers of nmount(2), I would be a bit uncomfortable with this discrepancy in behavior.


I disagree. I think Mike's suggestion was better and dealt with POLA and POLA breaking in a sane way. If the default is applied universally in user space, then we need not change the kernel at all. We lose all the chicken-and-egg problems and the non-linearity of the sysctl idea.


I would like to add that a sysctl is a kind of hidden setting, whereas /etc/fstab + /etc/defaults/fstab is a "right in the face" way of setting filesystem/mount related stuff.


[...]

It could also be generalized so that the FSTYPE could have different 
settings for different types of filesystem (maybe unique flags that 
some file systems don't understand).


+1

nosuid for tmpfs comes to mind here...

One could also put it in /etc/defaults/fstab too and not break POLA 
since that's the pattern we use elsewhere.


+1
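For illustration, such a defaults file might look like this (purely hypothetical; no per-fstype defaults file exists today, and the format below is made up):

# /etc/defaults/fstab (hypothetical)
# fstype    default options, overridden by explicit /etc/fstab entries
ufs         rw,noatime
tmpfs       rw,nosuid,noatime
nullfs      rw,noatime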

Anyway, I've said my piece. I agree with Mike that there's consensus 
for this from the installer, and after that consensus falls away. 
Mike's idea is one that I can get behind since it elegantly solves the 
general problem.


+1

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


signature.asc
Description: OpenPGP digital signature


Re: Removing fdisk and bsdlabel (legacy partition tools)

2024-01-26 Thread Alexander Leidinger

On 2024-01-25 18:49, Rodney W. Grimes wrote:

On Thu, Jan 25, 2024, 9:11 AM Ed Maste  wrote:

> On Thu, 25 Jan 2024 at 11:00, Rodney W. Grimes
>  wrote:
> >
> > > These will need to be addressed before actually removing any of these
> > > binaries, of course.
> >
> > You seem to have missed /rescue.  Now think about that long
> > and hard: these tools are classified as so important that they
> > are part of /rescue.  Again I can not stress enough how often
> > I turn to these tools in a repair-mode situation.
>
> I haven't missed rescue, it is included in the work in progress I
> mentioned. Note that rescue has included gpart since 2007.
>

What can fdisk and/or disklabel repair that gpart can't?


As far as I know there is no way in gpart to get to the
MBR cyl/hd/sec values, you can only get to the LBA start
and end values:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 63, size 8388513 (4095 Meg), flag 80 (active)
beg: cyl 0/ head 1/ sector 1;
end: cyl 1023/ head 15/ sector 63

gpart show ada0
=> 63  8388545  ada0  MBR  (4.0G)
   63  8388513 1  freebsd  [active]  (4.0G)
  8388576   32- free -  (16K)


What are you using the cyl/hd/sec values for on a system which runs FreeBSD current, or on which you would have to use FreeBSD-current in case of a repair? What is the disk hardware on those systems where you still need cyl/hd/sec and LBA doesn't work? Serious questions, out of curiosity.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: noatime on ufs2

2024-01-14 Thread Alexander Leidinger
The advantage of having 'noatime' as the default is less tweaking by most people, and one less thing to worry about (for them).


I proposed in another mail having a sysctl which indicates the default 
('noatime' or 'atime') for all filesystems.  This default would be used 
at mount time if neither 'atime' nor 'noatime' is explicitly specified. 
 That way, people wanting 'noatime' by default everywhere could just 
set it to that.  It may also convince reticent people to have the 
default (i.e., this sysctl's default value) changed to 'noatime', by 
providing a very simple way to revert to the old behavior.
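As a sketch of how that proposal would be used (the sysctl name is invented here for illustration; no such knob exists today):

# hypothetical global default for mounts without an explicit atime option
sysctl vfs.default_noatime=1
mount /dev/ada0p2 /mnt             # would now behave as if noatime was given
mount -o atime /dev/ada0p2 /mnt    # an explicit option still wins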


While I agree that this would be an easy way of globally changing the default, what makes noatime special compared to nocover, or nfs4acl, or noexec, or nosuid, or whatever other option? Mounting noexec and nosuid by default, and explicitly mounting suid/exec only those filesystems which really need it, would be a security benefit. And cover/nocover would prevent accidental foot-shooting. Where do you want to draw the line between "easy" and "explicit"? Handling only atime/noatime like that looks inconsistent to me (which, I hope, not only I consider a POLA violation).


I fully agree with you regarding switching to noatime by default. I think this should not be done by changing the defaults in each FS. I think that having a sysctl only for atime/noatime is an ugly inconsistency (I probably wouldn't use a generic framework which handles all sensible mount options like that, and I think it would be overkill, but I wouldn't object to it). In my opinion the correct way of handling it is to ask the user at install time; existing systems shall be handled by those who administrate them (don't touch an existing fstab; changing the default in the automounter config for a .0 release would be OK in my opinion, while for a .x release in the middle of a stable branch I would add a commented-out noatime option to make it visible but not active).


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: noatime on ufs2

2024-01-12 Thread Alexander Leidinger

On 2024-01-11 18:15, Rodney W. Grimes wrote:

On 2024-01-10 22:49, Mark Millard wrote:

> I never use atime, always noatime, for UFS. That said, I'd never propose
> changing the long-standing defaults for commands and calls. I'd avoid:

[good points I fully agree on]

There's one possibility which nobody talked about yet... changing the
default to noatime at install time in fstab / zfs set.


Perhaps you should take a closer look at what bsdinstall does when it creates a zfs install pool and boot environment; you might just find that noatime is already set everywhere but on /var/mail:

/usr/libexec/bsdinstall/zfsboot:: ${ZFSBOOT_POOL_CREATE_OPTIONS:=-O 
compress=lz4 -O atime=off}

/usr/libexec/bsdinstall/zfsboot:/var/mail   atime=on


While zfs is a part of what I talked about, it is not the complete picture. bsdinstall covers UFS and ZFS, and we should keep them in sync in this regard. Ideally with an option the user can modify; personally I don't mind if the default setting for this option were noatime. A quick search in the bsdinstall scripts didn't reveal to me what we use for UFS. I assume we use atime.
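A quick way to check this, following Rodney's grep above:

# every place the bsdinstall scripts mention atime; anything
# outside zfsboot would be the UFS handling
grep -rn 'atime' /usr/libexec/bsdinstall/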


I fully agree not to violate POLA by changing the default to noatime in any FS. I always set noatime everywhere on systems I take care of, no exceptions (any user-visible mail is handled via maildir/IMAP, not mbox). I haven't made up my mind whether it would be a good idea to change bsdinstall to set noatime (after asking the user about it, and later maybe offering the possibility to use relatime in case it gets implemented). I think it is at least worthwhile to discuss this possibility (including what the default setting of bsdinstall should be for this option).


A little late... iirc it's been that way since day one of zfs support in bsdinstall.


Which I don't mind, as this is what I use anyway. But the correct way 
would be to let the user decide.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: noatime on ufs2

2024-01-10 Thread Alexander Leidinger

On 2024-01-10 22:49, Mark Millard wrote:

I never use atime, always noatime, for UFS. That said, I'd never propose changing the long-standing defaults for commands and calls. I'd avoid:


[good points I fully agree on]

There's one possibility which nobody talked about yet... changing the 
default to noatime at install time in fstab / zfs set.


I fully agree not to violate POLA by changing the default to noatime in any FS. I always set noatime everywhere on systems I take care of, no exceptions (any user-visible mail is handled via maildir/IMAP, not mbox). I haven't made up my mind whether it would be a good idea to change bsdinstall to set noatime (after asking the user about it, and later maybe offering the possibility to use relatime in case it gets implemented). I think it is at least worthwhile to discuss this possibility (including what the default setting of bsdinstall should be for this option).


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: ZFS problems since recently ?

2024-01-04 Thread Alexander Motin

John,

On 04.01.2024 09:20, John Kennedy wrote:

On Tue, Jan 02, 2024 at 08:02:04PM -0800, John Kennedy wrote:

On Tue, Jan 02, 2024 at 05:51:32PM -0500, Alexander Motin wrote:

On 01.01.2024 08:59, John Kennedy wrote:

  ...
My poudriere build did eventually fail as well:
...
[05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
[05:40:24] Stopping 2 builders
panic: VERIFY(BP_GET_DEDUP(bp)) failed


Please see/test: https://github.com/openzfs/zfs/pull/15732.


   It came back today at the end of my poudriere build.  Your patch has fixed
it, so far at least.


   At the risk of conflating this with other ZFS issues, I beat on the VM a lot more last night without triggering any panics. My usual busy workload is a total kernel+world rebuild (with whatever pending patches might be out), then a poudriere run (~230 or so packages). It's weird that the first (much bigger) run worked but later ones didn't (where maybe I had one port that failed to build), triggering the panic. It seemed repeatable, but I don't have a feel for the exact trigger like with the sysctl issue.


What is the panic you see now? It cannot be the same, since the dedup assertion is no longer there.


--
Alexander Motin



Re: ZFS problems since recently ?

2024-01-02 Thread Alexander Motin

On 01.01.2024 08:59, John Kennedy wrote:

On Mon, Jan 01, 2024 at 06:43:58AM +0100, Kurt Jaeger wrote:

markj@ pointed me in
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276039
to
https://github.com/openzfs/zfs/pull/15719

So it will probably be fixed sooner or later.

The other ZFS crashes I've seen are still an issue.


   My poudriere build did eventually fail as well:

...
[05:40:24] [01] [00:17:20] Finished devel/gdb@py39 | gdb-13.2_1: Success
[05:40:24] Stopping 2 builders
panic: VERIFY(BP_GET_DEDUP(bp)) failed


Please see/test: https://github.com/openzfs/zfs/pull/15732.

--
Alexander Motin



Re: ZFS problems since recently ?

2024-01-02 Thread Alexander Leidinger

On 2024-01-02 08:22, Kurt Jaeger wrote:

Hi!


The sysctl for block cloning is vfs.zfs.bclone_enabled.
To check if a pool has made use of block cloning:
zpool get all poolname | grep bclone


One more thing:

I have two pools on that box, and one of them has some bclone files:

# zpool get all ref | grep bclone
ref   bcloneused 21.8M  -
ref   bclonesaved24.4M  -
ref   bcloneratio2.12x  -
# zpool get all pou | grep bclone
pou   bcloneused 0  -
pou   bclonesaved0  -
pou   bcloneratio1.00x  -

The ref pool contains the system and some files.
The pou pool is for poudriere only.

How do I find which files on ref are bcloned, and how can I remove the bcloning from them?


No idea about the detection (I don't expect an easy way), but the answer to the second part is to copy the files after disabling block cloning. As this is system stuff, I would expect it is not much data, so you could copy everything and then move it back to the original place. I would also assume the original log files are not affected, and that only files which were copied (installworld, installkernel, backup files, manual copies, or port installs (not sure about pkg install)) are possible targets.
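A minimal sketch of that approach for a single file (assuming block cloning is disabled first, so that cp(1) really duplicates the blocks instead of cloning them again; $f is a placeholder):

sysctl vfs.zfs.bclone_enabled=0
cp -p "$f" "$f.tmp" && mv "$f.tmp" "$f"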


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org  : PGP 0x8F31830F9F2772BF




Re: bridge(4) and IPv6 broken?

2024-01-01 Thread Alexander Leidinger

On 2024-01-02 00:40, Lexi Winter wrote:

hello,

i'm having an issue with bridge(4) and IPv6, with a configuration which
is essentially identical to a working system running releng/14.0.

ifconfig:

lo0: flags=1008049 metric 0 mtu 16384
options=680003
inet 127.0.0.1 netmask 0xff00
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
groups: lo
nd6 options=21
pflog0: flags=1000141 metric 0 mtu 33152
options=0
groups: pflog
alc0: flags=1008943 metric 0 mtu 1500
options=c3098
ether 30:9c:23:a8:89:a0
inet6 fe80::329c:23ff:fea8:89a0%alc0 prefixlen 64 scopeid 0x3
media: Ethernet autoselect (1000baseT)
status: active
nd6 options=1
wg0: flags=10080c1 metric 0 mtu 1420
options=8
inet 172.16.145.21 netmask 0x
inet6 fd00:0:1337:cafe:::829a:595e prefixlen 128
groups: wg
tunnelfib: 1
nd6 options=101
bridge0: flags=1008843 metric 0 mtu 1500
options=0
ether 58:9c:fc:10:ff:b6
inet 10.1.4.101 netmask 0xff00 broadcast 10.1.4.255
inet6 2001:8b0:aab5:104:3::101 prefixlen 64
id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
member: tap0 flags=143
ifmaxaddr 0 port 6 priority 128 path cost 200
member: alc0 flags=143
ifmaxaddr 0 port 3 priority 128 path cost 55
groups: bridge
nd6 options=1
tap0: flags=9903 metric 0 mtu 1500
options=8
ether 58:9c:fc:10:ff:89
groups: tap
media: Ethernet 1000baseT
status: no carrier
nd6 options=29

the issue is that the bridge doesn't seem to respond to IPv6 ICMP
Neighbour Solicitation.  for example, while running ping, tcpdump shows
this:

23:30:16.567071 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 13, length 16
23:30:16.634860 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32
23:30:17.567080 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 14, length 16
23:30:17.674842 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32
23:30:17.936956 1e:ab:48:c1:f6:62 > 33:33:00:00:00:01, ethertype IPv6 (0x86dd), length 166: fe80::1cab:48ff:fec1:f662 > ff02::1: ICMP6, router advertisement, length 112
23:30:18.567093 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 15, length 16
23:30:19.567104 58:9c:fc:10:ff:b6 > 1e:ab:48:c1:f6:62, ethertype IPv6 (0x86dd), length 70: 2001:8b0:aab5:104:3::101 > 2001:8b0:aab5:106::12: ICMP6, echo request, id 34603, seq 16, length 16
23:30:19.567529 1e:ab:48:c1:f6:62 > 33:33:ff:00:01:01, ethertype IPv6 (0x86dd), length 86: fe80::1cab:48ff:fec1:f662 > ff02::1:ff00:101: ICMP6, neighbor solicitation, who has 2001:8b0:aab5:104:3::101, length 32


fe80::1cab:48ff:fec1:f662 is the subnet router; it's sending
solicitations but FreeBSD doesn't send a response.

if i remove alc0 from the bridge and configure the IPv6 address
directly on alc0 instead, everything works fine.

i'm testing without any packet filter (ipfw/pf) in the kernel.

it's possible i'm missing something obvious here; does anyone have an
idea?


Just an idea. I'm not sure if it is the right track...

There is code in the kernel which ignores NS requests from "non-valid"
sources (for security / anti-spoofing reasons). The NS request comes
from a link-local address, but your bridge has no link-local address
(and your tap has the auto_linklocal flag set, which I would have
expected on the bridge instead). I'm not sure, but I would guess this
is the cause.


If my guess is not too far off, I would suggest trying (see the
untested sketch below):
 - remove auto_linklocal from tap0 (like for alc0)
 - add auto_linklocal to bridge0
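
As ifconfig commands this would look something like:
---snip---
ifconfig tap0 inet6 -auto_linklocal
ifconfig bridge0 inet6 auto_linklocal
# the bridge may need a down/up cycle for the link-local address to appear
---snip---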

If this doesn't help, there is the sysctl 
net.inet6.icmp6.nd6_onlink_ns_rfc4861 which you could try to set to 1. 
Please read 
https://www.freebsd.org/security/advisories/FreeBSD-SA-08:10.nd6.asc 
before you do that.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


signature.asc
Description: OpenPGP digital signature


Re: ZFS problems since recently ?

2024-01-01 Thread Alexander Leidinger

Am 2023-12-31 19:34, schrieb Kurt Jaeger:

I already have

vfs.zfs.dmu_offset_next_sync=0

which is supposed to disable block-cloning.


It isn't disabling block cloning. This sysctl is supposed to fix an
issue which is unrelated to block cloning (but can be amplified by
block cloning). That issue has been fixed for some weeks; your Dec 23
build should not need it. (When the issue happens, you get files with
zeroes in parts of the data instead of the real data, and only if you
copy files at the same time as those files are modified, and then only
if you happen to get the timing right.)


The sysctl for block cloning is vfs.zfs.bclone_enabled.
To check if a pool has made use of block cloning:
zpool get all poolname | grep bclone

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


signature.asc
Description: OpenPGP digital signature


What is rc.d/opensm?

2023-11-24 Thread Alexander Leidinger

Hi,

for my work on service jails (https://reviews.freebsd.org/D40370) I am 
trying to find out what opensm is. On my amd64 system I have neither a 
man page nor the binary (and man.freebsd.org doesn't know about opensm 
either).


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


signature.asc
Description: OpenPGP digital signature


Re: openzfs and block cloning question

2023-11-24 Thread Alexander Leidinger

Am 2023-11-24 08:10, schrieb Oleksandr Kryvulia:

Hi,
Recently cperciva@ published on his twitter [1] that enabling the block 
cloning feature can lead to data loss on 14. Is this statement true for 
current? Since I am using current for daily work and block cloning is 
enabled by default, how can I verify that my data is not affected?

Thank you.


Block cloning may have an issue, or it does things which amplify an 
old existing issue, or there are two issues...

The full story is at
https://github.com/openzfs/zfs/issues/15526

To be on the safe side, you may want to have 
vfs.zfs.dmu_offset_next_sync=0 (loader.conf / sysctl.conf) for the 
moment.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


signature.asc
Description: OpenPGP digital signature


Re: Request for Testing: TCP RACK

2023-11-17 Thread Alexander Leidinger

Am 2023-11-17 14:29, schrieb void:

On Thu, Nov 16, 2023 at 10:13:05AM +0100, tue...@freebsd.org wrote:


You can load the kernel module using
kldload tcp_rack

You can make the RACK stack the default stack using
sysctl net.inet.tcp.functions_default=rack


Hi, thank you for this.

https://klarasystems.com/articles/using-the-freebsd-rack-tcp-stack/
mentions this needs to be set in /etc/src.conf:

WITH_EXTRA_TCP_STACKS=1

Is this still the case? Context here is -current both in a vm and bare
metal, on various machines, on various connections, from DSL to 10Gb.


On a recent -current this is not needed anymore, it is part of the 
defaults now. But you may still need to compile the kernel with 
"options TCPHPTS" (until it's added to the defaults too).
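
To make this persistent across reboots, something like the following
should work (a hedged sketch, assuming the usual module-load
convention; not verified on every version):
---snip---
# /boot/loader.conf
tcp_rack_load="YES"

# /etc/sysctl.conf
net.inet.tcp.functions_default=rack
---snip---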


Is there a method (yet) for enabling this functionality in various
-RELENG, maybe where one can compile in a vm built for that purpose,
then transfer to the production vm?


Copy the kernel which was built according to the article from Klara 
Systems to your target VM.



Would it be expected to work on arm64?


Yes (I use it on an ampere VM in the cloud).

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


signature.asc
Description: OpenPGP digital signature


Re: crash zfs_clone_range()

2023-11-14 Thread Alexander Motin

On 14.11.2023 12:44, Alexander Motin wrote:

On 14.11.2023 12:39, Mateusz Guzik wrote:
One of the vnodes is probably not zfs, I suspect this will do it 
(untested):


diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
index 107cd69c756c..e799a7091b8e 100644
--- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
+++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
@@ -6270,6 +6270,11 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
 			goto bad_write_fallback;
 		}
 	}
+
+	if (invp->v_mount->mnt_vfc != outvp->v_mount->mnt_vfc) {
+		goto bad_write_fallback;
+	}
+
 	if (invp == outvp) {
 		if (vn_lock(outvp, LK_EXCLUSIVE) != 0) {
 			goto bad_write_fallback;



vn_copy_file_range() verifies for that:

	/*
	 * If the two vnodes are for the same file system type, call
	 * VOP_COPY_FILE_RANGE(), otherwise call vn_generic_copy_file_range()
	 * which can handle copies across multiple file system types.
	 */
	*lenp = len;
	if (inmp == outmp || strcmp(inmp->mnt_vfc->vfc_name,
	    outmp->mnt_vfc->vfc_name) == 0)
		error = VOP_COPY_FILE_RANGE(invp, inoffp, outvp, outoffp,
		    lenp, flags, incred, outcred, fsize_td);
	else
		error = vn_generic_copy_file_range(invp, inoffp, outvp,
		    outoffp, lenp, flags, incred, outcred, fsize_td);


Thinking again, what happens if there are two nullfs mounts on top of two 
different file systems, one of which is indeed not ZFS?  Do we need to 
add those checks to all of ZFS, NFS and FUSE, which implement 
VOP_COPY_FILE_RANGE, or is it the responsibility of nullfs or VFS?


--
Alexander Motin



Re: crash zfs_clone_range()

2023-11-14 Thread Alexander Motin

On 14.11.2023 12:39, Mateusz Guzik wrote:

One of the vnodes is probably not zfs, I suspect this will do it (untested):

diff --git a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
index 107cd69c756c..e799a7091b8e 100644
--- a/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
+++ b/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c
@@ -6270,6 +6270,11 @@ zfs_freebsd_copy_file_range(struct vop_copy_file_range_args *ap)
 			goto bad_write_fallback;
 		}
 	}
+
+	if (invp->v_mount->mnt_vfc != outvp->v_mount->mnt_vfc) {
+		goto bad_write_fallback;
+	}
+
 	if (invp == outvp) {
 		if (vn_lock(outvp, LK_EXCLUSIVE) != 0) {
 			goto bad_write_fallback;



vn_copy_file_range() verifies for that:

	/*
	 * If the two vnodes are for the same file system type, call
	 * VOP_COPY_FILE_RANGE(), otherwise call vn_generic_copy_file_range()
	 * which can handle copies across multiple file system types.
	 */
	*lenp = len;
	if (inmp == outmp || strcmp(inmp->mnt_vfc->vfc_name,
	    outmp->mnt_vfc->vfc_name) == 0)
		error = VOP_COPY_FILE_RANGE(invp, inoffp, outvp, outoffp,
		    lenp, flags, incred, outcred, fsize_td);
	else
		error = vn_generic_copy_file_range(invp, inoffp, outvp,
		    outoffp, lenp, flags, incred, outcred, fsize_td);


--
Alexander Motin



Re: crash zfs_clone_range()

2023-11-12 Thread Alexander Motin

Hi Ronald,

As far as I can see, the clone request to ZFS came through nullfs, and 
it crashed immediately on entry.  I've never been a VFS layer expert, 
but to me it looks like it may be a nullfs problem, not ZFS.  Is there 
a chance you were (un-)mounting something when this happened?


On 10.11.2023 05:12, Ronald Klop wrote:

Hi,

Had this crash today on RPI4/15-CURRENT.

FreeBSD rpi4 15.0-CURRENT FreeBSD 15.0-CURRENT #19 
main-b0203aaa46-dirty: Sat Nov  4 11:48:33 CET 2023 
ronald@rpi4:/home/ronald/dev/freebsd/obj/home/ronald/dev/freebsd/src/arm64.aarch64/sys/GENERIC-NODEBUG arm64


$ sysctl -a | grep bclon
vfs.zfs.bclone_enabled: 1

I started a jail with poudriere to build a package. The jail uses null 
mounts over ZFS.


[root]# cu -s 115200 -l /dev/cuaU0
Connected

db> bt
Tracing pid 95213 tid 100438 td 0xe1e97900
db_trace_self() at db_trace_self
db_stack_trace() at db_stack_trace+0x120
db_command() at db_command+0x2e4
db_command_loop() at db_command_loop+0x58
db_trap() at db_trap+0x100
kdb_trap() at kdb_trap+0x334
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0xf200
kdb_enter() at kdb_enter+0x48
vpanic() at vpanic+0x1dc
panic() at panic+0x48
data_abort() at data_abort+0x2fc
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0x9604
rms_rlock() at rms_rlock+0x1c
zfs_clone_range() at zfs_clone_range+0x68
zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
null_bypass() at null_bypass+0x118
vn_copy_file_range() at vn_copy_file_range+0x18c
kern_copy_file_range() at kern_copy_file_range+0x36c
sys_copy_file_range() at sys_copy_file_range+0x8c
do_el0_sync() at do_el0_sync+0x634
handle_el0_sync() at handle_el0_sync+0x48
--- exception, esr 0x5600


Oh.. While typing this I rebooted the machine and it happened again. I 
didn't start anything in particular although the machine runs some jails.


x0: 0x00e0
   x1: 0xa00090317a48
   x2: 0xa000f79d4f00
   x3: 0xa000c61a44a8
   x4: 0xdeefe460 ($d.2 + 0xdd776560)
   x5: 0xa001250e4c00
   x6: 0xe54025b5 ($d.5 + 0xc)
   x7: 0x030a
   x8: 0xe1559000 ($d.2 + 0xdfdd1100)
   x9: 0x0001
  x10: 0x
  x11: 0x0001
  x12: 0x0002
  x13: 0x
  x14: 0x0001
  x15: 0x
  x16: 0x016dce88 (__stop_set_modmetadata_set + 0x1310)
  x17: 0x004e0d44 (rms_rlock + 0x0)
  x18: 0xdeefe280 ($d.2 + 0xdd776380)
  x19: 0x
  x20: 0xdeefe460 ($d.2 + 0xdd776560)
  x21: 0x7fff
  x22: 0xa00090317a48
  x23: 0xa000f79d4f00
  x24: 0xa001067ef910
  x25: 0x00e0
  x26: 0xa000158a8000
  x27: 0x
  x28: 0xa000158a8000
  x29: 0xdeefe280 ($d.2 + 0xdd776380)
   sp: 0xdeefe280
   lr: 0x01623564 (zfs_clone_range + 0x6c)
  elr: 0x004e0d60 (rms_rlock + 0x1c)
spsr: 0xa045
  far: 0x0108
  esr: 0x9604
panic: data abort in critical section or under mutex
cpuid = 1
time = 1699610885
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a0
panic() at panic+0x48
data_abort() at data_abort+0x2fc
handle_el1h_sync() at handle_el1h_sync+0x18
--- exception, esr 0x9604
rms_rlock() at rms_rlock+0x1c
zfs_clone_range() at zfs_clone_range+0x68
zfs_freebsd_copy_file_range() at zfs_freebsd_copy_file_range+0x19c
null_bypass() at null_bypass+0x118
vn_copy_file_range() at vn_copy_file_range+0x18c
kern_copy_file_range() at kern_copy_file_range+0x36c
sys_copy_file_range() at sys_copy_file_range+0x8c
do_el0_sync() at do_el0_sync+0x634
handle_el0_sync() at handle_el0_sync+0x48
--- exception, esr 0x5600
KDB: enter: panic
[ thread pid 3792 tid 100394 ]
Stopped at  kdb_enter+0x48: str xzr, [x19, #768]
db>

I'll keep the debugger open for a while. Can I type something for 
additional info?


Regards,
Ronald.


--
Alexander Motin



Re: poudriere job && find jobs which received signal 11

2023-10-18 Thread Alexander Leidinger

Am 2023-10-18 09:54, schrieb Matthias Apitz:

Hello,

I'm compiling with poudriere on 14.0-CURRENT 1400094 amd64 "my" ports,
from git October 14, 2023. In the last two days 2229 packages were
produced fine; one job failed (p5-Gtk2-1.24993_3, which is known to be
broken).

This morning I was looking for something in /var/log/messages and
accidentally noticed that yesterday a few compilations failed:

# grep 'signal 11' /var/log/messages | grep -v conftest
Oct 17 10:58:02 jet kernel: pid 12765 (cc1plus), jid 24, uid 65534: 
exited on signal 11 (core dumped)
Oct 17 10:59:32 jet kernel: pid 27104 (cc1plus), jid 24, uid 65534: 
exited on signal 11 (core dumped)
Oct 17 12:07:38 jet kernel: pid 85640 (cc1plus), jid 24, uid 65534: 
exited on signal 11 (core dumped)
Oct 17 12:08:17 jet kernel: pid 94451 (cc1plus), jid 24, uid 65534: 
exited on signal 11 (core dumped)
Oct 17 12:36:01 jet kernel: pid 77914 (cc1plus), jid 24, uid 65534: 
exited on signal 11 (core dumped)


As I said, without that any of the 2229 jobs were failing:

# cd 
/usr/local/poudriere/data/logs/bulk/140-CURRENT-ports20231014/latest-per-pkg

# ls -C1  | wc -l
2229
# grep -l 'build failure' *
p5-Gtk2-1.24993_3.log

How is this possible, that the make engines didn't fail? The uid


That can be part of configure runs which try to test some features.

65534 is the one used by poudriere, can I use the jid 24 somehow to
find the job which received the signal 11? Or is the time the only way
to


jid = jail ID, the first column in the output of "jls". If you have the 
poudriere runtime logs (where it lists which package it is processing 
ATM), you will see a number from 1 to the max number of jails which run 
in parallel. This number is part of the hostname of the jail. So if you 
have the poudriere jails still running, you can make a mapping from the 
jid to the name to the number, and together with the time you can see 
which package it was building at that time. Unfortunately poudriere 
doesn't list the hostname of the builder nor the jid (feature request 
anyone?).


Example poudriere runtime log:
---snip---
[00:54:11] [03] [00:00:00] Building security/nss | nss-3.94
[00:56:46] [03] [00:02:35] Finished security/nss | nss-3.94: Success
[00:56:47] [03] [00:00:00] Building textproc/gsed | gsed-4.9
[00:57:41] [01] [00:06:18] Finished x11-toolkits/gtk30 | gtk3-3.24.34_1: 
Success

[00:57:42] [01] [00:00:00] Building devel/qt6-base | qt6-base-6.5.3
---snip---

While poudriere is running, jls reports this:
---snip---
# jls jid host.hostname
[...]
91 poudriere-bastille-default
92 poudriere-bastille-default
93 poudriere-bastille-default-job-01
94 poudriere-bastille-default-job-01
95 poudriere-bastille-default-job-02
96 poudriere-bastille-default-job-03
97 poudriere-bastille-default-job-02
98 poudriere-bastille-default-job-03
---snip---

So if we assume a coredump in jid 96 or 98, this means it was in builder 
3. nss and gsed were built by poudriere builder number 3 (both about 56 
minutes after the start of poudriere), and gtk30 and qt6-base by 
poudriere builder number 1. If we assume further that the coredumps are 
in the time range of 54 to 56 minutes after the poudriere start, the 
logs of nss may have a trace of it (or not; if it was part of a 
configure run, you would have to repeat the configure run and check 
whether it generates similar coredumps).
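
While the jails are still running, the jid from the log line can be
resolved directly, e.g. (hedged sketch; jid 24 taken from the log
above):
---snip---
jls -j 24 host.hostname
---snip---
Once poudriere tears the jail down the jid is gone, so this only works
during the run.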



look, which of the 4 poudriere engines were running at this time?
I'd like to rerun/reproduce the package again.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


signature.asc
Description: OpenPGP digital signature


Re: issue: poudriere jail update fails after recent changes around certctl

2023-10-14 Thread Alexander Leidinger

Am 2023-10-13 17:42, schrieb Dag-Erling Smørgrav:

Alexander Leidinger  writes:

some change around certctl (world from 2023-10-09) has broken the
poudriere jail update command. The complete install finishes, certctl
is run, and then there is an exit code 1. This is because I have some
certs listed as untrusted, and this seems to give a retval of 1 inside
certctl.


This only happens if a certificate is listed as both trusted and
untrusted, and I'm pretty sure the previous version would return 1 in
that case as well.  Can you check?


I compared /usr/share/certs/untrusted/ with /usr/share/certs/trusted/, 
and some certs in untrusted/ match certs in trusted/. There is nothing 
in /usr/local/etc/ssl/untrusted/, and one cert (as hash) in 
/usr/local/etc/ssl/blacklisted/ which is also in 
/usr/share/certs/untrusted/.


If FreeBSD provides some certs as trusted (as part of e.g. 
installworld), and I have some of them listed in untrusted, I would not 
expect an error case, but a failsafe action of not trusting them and not 
complaining... am I doing something wrong?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


signature.asc
Description: OpenPGP digital signature


issue: poudriere jail update fails after recent changes around certctl

2023-10-13 Thread Alexander Leidinger

Hi,

some change around certctl (world from 2023-10-09) has broken the 
poudriere jail update command. The complete install finishes, certctl is 
run, and then there is an exit code 1. This is because I have some certs 
listed as untrusted, and this seems to give a retval of 1 inside 
certctl.


Testcase: set a cert as untrusted and try to use "poudriere jail -u -j 
YOUR_JAIL_NAME -m src=/usr/src"


Relevant log:
---snip---
--

Installing everything completed on Fri Oct 13 10:00:04 CEST 2023

--
   83.55 real   103.83 user   109.42 sys
certctl.sh: Skipping untrusted certificate ad088e1d 
(/space/poudriere/jails/poudriere-x11/etc/ssl/untrusted/ad088e1d.0)

[some more untrusted]
*** [installworld] Error code 1

make[1]: stopped in /space/system/usr_src
1 error

make[1]: stopped in /space/system/usr_src

make: stopped in /usr/src
[00:01:32] Error: Failed to 'make installworld'
---snip---

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


signature.asc
Description: OpenPGP digital signature


Re: git: 989c5f6da990 - main - freebsd-update: create deep BEs by default [really about if -r for bectl create should just go away]

2023-10-12 Thread Alexander Leidinger

Am 2023-10-12 07:08, schrieb Mark Millard:


I use the likes of:

BE                        Active Mountpoint Space Created
build_area_for-main-CA72  -      -          1.99G 2023-09-20 10:19
main-CA72                 NR     /          4.50G 2023-09-21 10:10

NAME                                 CANMOUNT  MOUNTPOINT
zopt0                                on        /zopt0
. . .
zopt0/ROOT                           on        none
zopt0/ROOT/build_area_for-main-CA72  noauto    none
zopt0/ROOT/main-CA72                 noauto    none
zopt0/poudriere                      on        /usr/local/poudriere
zopt0/poudriere/data                 on        /usr/local/poudriere/data
zopt0/poudriere/data/.m              on        /usr/local/poudriere/data/.m
zopt0/poudriere/data/cache           on        /usr/local/poudriere/data/cache
zopt0/poudriere/data/images          on        /usr/local/poudriere/data/images
zopt0/poudriere/data/logs            on        /usr/local/poudriere/data/logs
zopt0/poudriere/data/packages        on        /usr/local/poudriere/data/packages
zopt0/poudriere/data/wrkdirs         on        /usr/local/poudriere/data/wrkdirs
zopt0/poudriere/jails                on        /usr/local/poudriere/jails
zopt0/poudriere/ports                on        /usr/local/poudriere/ports

zopt0/tmp                            on        /tmp
zopt0/usr                            off       /usr
zopt0/usr/13_0R-src                  on        /usr/13_0R-src
zopt0/usr/alt-main-src               on        /usr/alt-main-src
zopt0/usr/home                       on        /usr/home
zopt0/usr/local                      on        /usr/local


[...]


If such ends up as unsupportable, it will effectively eliminate my
reason for using bectl (and, so, zfs): the sharing is important to
my use.


Additionally/complementary to what Kyle said...

The -r option is about:
zopt0/ROOT/main-CA72
zopt0/ROOT/main-CA72/subDS1
zopt0/ROOT/main-CA72/subDS2

A shallow clone only takes zopt0/ROOT/main-CA72 into account, while a 
-r clone also clones subDS1 and subDS2.


So as Kyle said, your (and my) use case are not affected by this.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


signature.asc
Description: OpenPGP digital signature


base-krb5 issues (segfaults when adding principals in openssl)

2023-10-03 Thread Alexander Leidinger

Hi,

has someone else issues with krb5 on -current when adding principals?

With -current as of 2023-09-11 I get a segfault in openssl:
---snip---
Reading symbols from /usr/bin/kadmin...
Reading symbols from /usr/lib/debug//usr/bin/kadmin.debug...
[New LWP 270171]
Core was generated by `kadmin -l'.
Program terminated with signal SIGSEGV, Segmentation fault.
Address not mapped to object.
#0  0x in ?? ()
(gdb) bt
#0  0x in ?? ()
#1  0x0e118da145f8 in ARCFOUR_string_to_key (context=0x44f9fba1a000, enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, password=..., salt=..., opaque=..., key=0x44f9fba211d8)
    at /space/system/usr_src/crypto/heimdal/lib/krb5/salt-arcfour.c:84
#2  0x0e118da156e9 in krb5_string_to_key_data_salt_opaque (enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, salt=..., opaque=..., context=<optimized out>, password=..., key=<optimized out>)
    at /space/system/usr_src/crypto/heimdal/lib/krb5/salt.c:201
#3  krb5_string_to_key_data_salt (context=0x44f9fba1a000, enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, password=..., salt=..., key=0x44f9fba211d8)
    at /space/system/usr_src/crypto/heimdal/lib/krb5/salt.c:173
#4  0x0e118da158cb in krb5_string_to_key_salt (context=0x44f9fba4bc60, context@entry=0x44f9fba1a000, enctype=-1980854121, password=0x0, password@entry=0xe1189ee9510 "1kad$uwi6!", salt=..., key=0x5)
    at /space/system/usr_src/crypto/heimdal/lib/krb5/salt.c:225
#5  0x0e118ba75423 in hdb_generate_key_set_password (context=0x44f9fba1a000, principal=<optimized out>, password=password@entry=0xe1189ee9510 "1kad$uwi6!", keys=keys@entry=0xe1189ee9210, num_keys=num_keys@entry=0xe1189ee9208)
    at /space/system/usr_src/crypto/heimdal/lib/hdb/keys.c:381
#6  0x0e118ca91c9a in _kadm5_set_keys (context=context@entry=0x44f9fba1a140, ent=ent@entry=0xe1189ee9258, password=0x1, password@entry=0xe1189ee9510 "1kad$uwi6!")
    at /space/system/usr_src/crypto/heimdal/lib/kadm5/set_keys.c:51
#7  0x0e118ca8caac in kadm5_s_create_principal (server_handle=0x44f9fba1a140, princ=<optimized out>, mask=<optimized out>, password=0xe1189ee9510 "1kad$uwi6!")
    at /space/system/usr_src/crypto/heimdal/lib/kadm5/create_s.c:172
#8  0x0e0969e1a57b in add_one_principal (name=<optimized out>, rand_key=0, rand_password=0, use_defaults=0, password=0xe1189ee9510 "1kad$uwi6!", key_data=0x0, max_ticket_life=<optimized out>, max_renewable_life=<optimized out>, attributes=0x0, expiration=<optimized out>, pw_expiration=0x0)
    at /space/system/usr_src/crypto/heimdal/kadmin/ank.c:141
#9  add_new_key (opt=opt@entry=0xe1189ee9960, argc=argc@entry=1, argv=0x44f9fba49238, argv@entry=0x44f9fba49230) at /space/system/usr_src/crypto/heimdal/kadmin/ank.c:243
#10 0x0e0969e1e124 in add_wrap (argc=<optimized out>, argv=0x44f9fba49230) at kadmin-commands.c:210
#11 0x0e0969e23945 in sl_command (cmds=<optimized out>, argc=2, argv=0x44f9fba49230) at /space/system/usr_src/crypto/heimdal/lib/sl/sl.c:209
#12 sl_command_loop (cmds=cmds@entry=0xe0969e282a0 <commands>, prompt=prompt@entry=0xe0969e15cca "kadmin> ", data=<optimized out>)
    at /space/system/usr_src/crypto/heimdal/lib/sl/sl.c:328
#13 0x0e0969e1d876 in main (argc=<optimized out>, argv=<optimized out>) at /space/system/usr_src/crypto/heimdal/kadmin/kadmin.c:275

(gdb) up 1
#1  0x0e118da145f8 in ARCFOUR_string_to_key (context=0x44f9fba1a000, enctype=KRB5_ENCTYPE_ARCFOUR_HMAC_MD5, password=..., salt=..., opaque=..., key=0x44f9fba211d8)
    at /space/system/usr_src/crypto/heimdal/lib/krb5/salt-arcfour.c:84
84      EVP_DigestUpdate (m, &p, 1);
(gdb) list
79
80      /* LE encoding */
81      for (i = 0; i < len; i++) {
82          unsigned char p;
83          p = (s[i] & 0xff);
84          EVP_DigestUpdate (m, &p, 1);
85          p = (s[i] >> 8) & 0xff;
86          EVP_DigestUpdate (m, &p, 1);
87      }
88
(gdb) print i
$1 = 0
(gdb) print len
$2 = <optimized out>
(gdb) print p
$3 = 49 '1'
(gdb) print m
$4 = (EVP_MD_CTX *) 0x43e31de4bc60
(gdb) print *m
$5 = {reqdigest = 0x17e678afd470, digest = 0x0, engine = 0x0, flags = 0, 
md_data = 0x0, pctx = 0x0, update = 0x0, algctx = 0x0, fetched_digest = 
0x0}

(gdb)
---snip---

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


signature.asc
Description: OpenPGP digital signature


Re: vfs.zfs.bclone_enabled (was: FreeBSD 14.0-BETA2 Now Available) [block_cloning and zilsaxattr missing from loader's features_for_read]

2023-09-18 Thread Alexander Motin

On 18.09.2023 19:21, Mark Millard wrote:

On Sep 18, 2023, at 15:51, Mark Millard  wrote:

Alexander Motin  wrote on
Date: Mon, 18 Sep 2023 13:26:56 UTC :

block_cloning feature is marked as READONLY_COMPAT. It should not
require any special handling from the boot code.


 From stand/libsa/zfs/zfsimpl.c but adding a comment about the
read-only compatibility status of each entry:

/*
* List of ZFS features supported for read
*/

static const char *features_for_read[] = {

"com.datto:bookmark_v2", // READ-ONLY COMPATIBLE no
"com.datto:encryption", // READ-ONLY COMPATIBLE no
"com.datto:resilver_defer", // READ-ONLY COMPATIBLE yes
"com.delphix:bookmark_written", // READ-ONLY COMPATIBLE no
"com.delphix:device_removal", // READ-ONLY COMPATIBLE no
"com.delphix:embedded_data", // READ-ONLY COMPATIBLE no
"com.delphix:extensible_dataset", // READ-ONLY COMPATIBLE no
"com.delphix:head_errlog", // READ-ONLY COMPATIBLE no
"com.delphix:hole_birth", // READ-ONLY COMPATIBLE no
"com.delphix:obsolete_counts", // READ-ONLY COMPATIBLE yes
"com.delphix:spacemap_histogram", // READ-ONLY COMPATIBLE yes
"com.delphix:spacemap_v2", // READ-ONLY COMPATIBLE yes
"com.delphix:zpool_checkpoint", // READ-ONLY COMPATIBLE yes
"com.intel:allocation_classes", // READ-ONLY COMPATIBLE yes
"com.joyent:multi_vdev_crash_dump", // READ-ONLY COMPATIBLE no
"com.klarasystems:vdev_zaps_v2", // READ-ONLY COMPATIBLE no
"org.freebsd:zstd_compress", // READ-ONLY COMPATIBLE no
"org.illumos:lz4_compress", // READ-ONLY COMPATIBLE no
"org.illumos:sha512", // READ-ONLY COMPATIBLE no
"org.illumos:skein", // READ-ONLY COMPATIBLE no
"org.open-zfs:large_blocks", // READ-ONLY COMPATIBLE no
"org.openzfs:blake3", // READ-ONLY COMPATIBLE no
"org.zfsonlinux:allocation_classes", // READ-ONLY COMPATIBLE yes
"org.zfsonlinux:large_dnode", // READ-ONLY COMPATIBLE no
NULL
};

So it appears that the design is that both "no" and "yes" ones
that are known to be supported are listed, and anything else is
supposed to lead to rejection until explicitly added as
known-compatible.


I don't think so.  I think somebody by mistake first added features that 
should not be here, and then others continued this irrelevant routine. 
My own development server/builder is happily running latest main with 
ZFS root without any patches and with block cloning not only enabled, 
but even active.  So as I have said, it is not needed:


mav@srv:/root# zpool get all | grep clon
mavlab  bcloneused 20.5M  -
mavlab  bclonesaved20.9M  -
mavlab  bcloneratio2.02x  -
mavlab  feature@block_cloning  active local

Somebody should go through the list, clean it up from read-compatible 
features, and document it, unless there are some features that were 
re-qualified at some point; I haven't checked whether that could be the 
case.



This matches up with stand/libsa/zfs/zfsimpl.c 's:

static int
nvlist_check_features_for_read(nvlist_t *nvl)
{

...

rc = nvlist_find(nvl, ZPOOL_CONFIG_FEATURES_FOR_READ,
    DATA_TYPE_NVLIST, NULL, &features, NULL);


Take note that it reads ZPOOL_CONFIG_FEATURES_FOR_READ.  At the same 
time, features declared as READONLY_COMPAT are stored in 
FEATURES_FOR_WRITE, which the boot loader does not even look at.



I do not know if vfs.zfs.bclone_enabled=0 leads the loader
to see vs. not-see a "com.fudosecurity:block_cloning".


bclone_enabled=0 blocks copy_file_range() usage; that should keep the 
feature enabled, but not active.  It could be related if the feature 
were in FEATURES_FOR_WRITE, but here and now it is not.



It appears that 2 additions after openzfs-2.1-freebsd are
missing from the above list:

com.fudosecurity:block_cloning
org.openzfs:zilsaxattr


Nothing of ZIL is required for read-only import.  So no, it is also not 
needed.


--
Alexander Motin



Re: vfs.zfs.bclone_enabled (was: FreeBSD 14.0-BETA2 Now Available)

2023-09-18 Thread Alexander Motin
block_cloning feature is marked as READONLY_COMPAT.  It should not 
require any special handling from the boot code.


On 18.09.2023 07:22, Tomoaki AOKI wrote:

Really OK?

I cannot find block_cloning in the array *features_for_read[] of
stand/libsa/zfs/zfsimpl.c, which possibly means the boot code (including
the loader) cannot boot from a Root-on-ZFS pool having block_cloning
active.

Not sure whether adding '"com.fudosecurity:block_cloning",' here is
sufficient. Possibly more work is needed.

IMHO, all default-enabled features should be safe for booting.
Implementing features disabled, implementing boot code to support them,
and only then enabling them by default should be the only valid route.


[1] https://cgit.freebsd.org/src/tree/stand/libsa/zfs/zfsimpl.c


On Mon, 18 Sep 2023 07:31:46 +0200
Martin Matuska  wrote:


I vote for enabling block cloning on main :-)

mm

On 16. 9. 2023 19:14, Alexander Motin wrote:

On 16.09.2023 01:25, Graham Perrin wrote:

On 16/09/2023 01:28, Glen Barber wrote:

o A fix for the ZFS block_cloning feature has been implemented.


Thanks

I see
<https://github.com/openzfs/zfs/commit/5cc1876f14f90430b24f1ad2f231de936691940f>,
with
<https://github.com/freebsd/freebsd-src/commit/9dcf00aa404bb62052433c45aaa5475e2760f5ed>
in stable/14.

As vfs.zfs.bclone_enabled is still 0 (at least, with 15.0-CURRENT
n265350-72d97e1dd9cc): should we assume that additional fixes, not
necessarily in time for 14.0-RELEASE, will be required before
vfs.zfs.bclone_enabled can default to 1?


I am not aware of any block cloning issues now.  All this thread about
bclone_enabled actually started after I asked why it is still
disabled. Thanks to Mark Millard for spotting this issue I could fix,
but now we are back at the point of re-enabling it again.  Since the
tunable does not even exist anywhere outside of FreeBSD base tree, I'd
propose to give this code another try here too.  I see no point to
have it disabled at least in main unless somebody needs time to run
some specific tests first.




--
Alexander Motin



Re: vfs.zfs.bclone_enabled (was: FreeBSD 14.0-BETA2 Now Available)

2023-09-16 Thread Alexander Motin

On 16.09.2023 01:25, Graham Perrin wrote:

On 16/09/2023 01:28, Glen Barber wrote:

o A fix for the ZFS block_cloning feature has been implemented.


Thanks

I see 
<https://github.com/openzfs/zfs/commit/5cc1876f14f90430b24f1ad2f231de936691940f>, with <https://github.com/freebsd/freebsd-src/commit/9dcf00aa404bb62052433c45aaa5475e2760f5ed> in stable/14.


As vfs.zfs.bclone_enabled is still 0 (at least, with 15.0-CURRENT 
n265350-72d97e1dd9cc): should we assume that additional fixes, not 
necessarily in time for 14.0-RELEASE, will be required before 
vfs.zfs.bclone_enabled can default to 1?


I am not aware of any block cloning issues now.  All this thread about 
bclone_enabled actually started after I asked why it is still disabled. 
Thanks to Mark Millard for spotting this issue I could fix, but now we 
are back at the point of re-enabling it again.  Since the tunable does 
not even exist anywhere outside of FreeBSD base tree, I'd propose to 
give this code another try here too.  I see no point to have it disabled 
at least in main unless somebody needs time to run some specific tests 
first.


--
Alexander Motin



Re: Speed improvements in ZFS

2023-09-15 Thread Alexander Leidinger

Am 2023-09-15 13:40, schrieb George Michaelson:

Not wanting to hijack the thread, I am interested whether any of this 
can translate back upstream and make Linux ZFS faster.

And whether there are simple sysctl tunings worth trying on large-memory 
(TB) pre-14 FreeBSD systems with slow ZFS. Older FreeBSD, alas.


The current part of the discussion is not really about ZFS (I use a lot 
of nullfs on top of ZFS). So no to the first part.


The tuning I did (maxvnodes) doesn't really depend on the FreeBSD 
version, but on the number of files touched/contained in the FS. The 
only other change I made is updating the OS itself, so this part doesn't 
apply to pre 14 systems.
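
A hedged starting point before changing the limit is to compare the
current usage against the configured maximum:
---snip---
sysctl kern.maxvnodes vfs.numvnodes vfs.freevnodes
---snip---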


If you think your ZFS (with a large ARC) is slow, you need to review 
your primary cache settings per dataset, check the arcstats, and maybe 
think about a second-level ARC on fast storage (a cache device on NVMe 
or SSD). If you have a read-once workload, none of this will help. So 
it all depends on your workload.
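
The corresponding commands, as a hedged sketch ("pool/dataset" and the
cache device name are placeholders):
---snip---
zfs get primarycache,secondarycache pool/dataset
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
zpool add pool cache nda0   # attach an L2ARC device
---snip---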


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF

signature.asc
Description: OpenPGP digital signature


Re: Speed improvements in ZFS

2023-09-15 Thread Alexander Leidinger

Am 2023-09-04 14:26, schrieb Mateusz Guzik:

On 9/4/23, Alexander Leidinger  wrote:

Am 2023-08-28 22:33, schrieb Alexander Leidinger:

Am 2023-08-22 18:59, schrieb Mateusz Guzik:

On 8/22/23, Alexander Leidinger  wrote:

Am 2023-08-21 10:53, schrieb Konstantin Belousov:
On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger 
wrote:

Am 2023-08-20 23:17, schrieb Konstantin Belousov:
> On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
> > On 8/20/23, Alexander Leidinger  wrote:
> > > Am 2023-08-20 22:02, schrieb Mateusz Guzik:
> > >> On 8/20/23, Alexander Leidinger 
> > >> wrote:
> > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik:
> > >>>> On 8/18/23, Alexander Leidinger 
> > >>>> wrote:
> > >>>
> > >>>>> I have a 51MB text file, compressed to about 1MB. Are you
> > >>>>> interested
> > >>>>> to
> > >>>>> get it?
> > >>>>>
> > >>>>
> > >>>> Your problem is not the vnode limit, but nullfs.
> > >>>>
> > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
> > >>>
> > >>> 122 nullfs mounts on this system. And every jail I setup has
> > >>> several
> > >>> null mounts. One basesystem mounted into every jail, and then
> > >>> shared
> > >>> ports (packages/distfiles/ccache) across all of them.
> > >>>
> > >>>> First, some of the contention is notorious VI_LOCK in order
> > >>>> to
> > >>>> do
> > >>>> anything.
> > >>>>
> > >>>> But more importantly the mind-boggling off-cpu time comes
> > >>>> from
> > >>>> exclusive locking which should not be there to begin with --
> > >>>> as
> > >>>> in
> > >>>> that xlock in stat should be a slock.
> > >>>>
> > >>>> Maybe I'm going to look into it later.
> > >>>
> > >>> That would be fantastic.
> > >>>
> > >>
> > >> I did a quick test, things are shared locked as expected.
> > >>
> > >> However, I found the following:
> > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
> > >> mp->mnt_kern_flag |=
> > >> lowerrootvp->v_mount->mnt_kern_flag &
> > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
> > >> MNTK_EXTENDED_SHARED);
> > >> }
> > >>
> > >> are you using the "nocache" option? it has a side effect of
> > >> xlocking
> > >
> > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
> > >
> >
> > If you don't have "nocache" on null mounts, then I don't see how
> > this
> > could happen.
>
> There is also MNTK_NULL_NOCACHE on lower fs, which is currently set
> for
> fuse and nfs at least.

11 of those 122 nullfs mounts are ZFS datasets which are also NFS
exported.
6 of those nullfs mounts are also exported via Samba. The NFS
exports
shouldn't be needed anymore, I will remove them.

By nfs I meant nfs client, not nfs exports.


No NFS client mounts anywhere on this system. So where is this
exclusive
lock coming from then...
This is a ZFS system. 2 pools: one for the root, one for anything I
need
space for. Both pools reside on the same disks. The root pool is a
3-way
mirror, the "space-pool" is a 5-disk raidz2. All jails are on the
space-pool. The jails are all basejail-style jails.



While I don't see why xlocking happens, you should be able to dtrace
or printf your way into finding out.


dtrace looks to me like a faster approach to get to the root than
printf... my first naive try is to detect exclusive locks. I'm not 
100%

sure I got it right, but at least dtrace doesn't complain about it:
---snip---
#pragma D option dynvarsize=32m

fbt:nullfs:null_lock:entry
/(args[0]->a_flags & 0x08) != 0/
{
stack();
}
---snip---
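
A variant which aggregates by kernel stack instead of printing each
event may be easier to read after a long run (my own untested sketch,
keeping the same probe and flag test):
---snip---
#pragma D option dynvarsize=32m

/* count exclusive lock requests per kernel stack */
fbt:nullfs:null_lock:entry
/(args[0]->a_flags & 0x08) != 0/
{
        @[stack()] = count();
}
---snip---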

In which direction should I look with dtrace if this works in tonight's
run of periodic? I don't have enough knowledge about VFS to come up
with immediate ideas.


After your sysctl fix for maxvnodes I increased the number of vnodes
10 times compared to the initial report. This has increased the speed
of the operation: the find runs in all those jails finished today after
~5h (at ~8am) instead of in the afternoon as before. Could this suggest
that in parallel some null_reclaim() is running which takes the
exclusive locks and slows down the entire operation?



That may be a slowdown to some extent, but the primary problem is
exclusive vnode locking for stat lookup, which should not be
happening.


With -current as of 2023-09-03 (and right now 2023-09-11), the periodic 
daily runs are down to less than an hour... and this didn't happen 
directly after switching to 2023-09-03. First it went down to 4h, then 
down to 1h without any update of the OS. The only thing I did was modify 
maxvnodes: first to some huge amount after your commit to the 
sysctl-handling part, then, after noticing way more freevnodes than 
configured, down to 5.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


signature.asc
Description: OpenPGP digital signature


Re: sed in CURRENT fails in textproc/jq

2023-09-11 Thread Alexander Leidinger

Am 2023-09-10 18:53, schrieb Robert Clausecker:

Hi Warner,

Thank you for your response.

Am Sun, Sep 10, 2023 at 09:53:03AM -0600 schrieb Warner Losh:

On Sun, Sep 10, 2023, 7:36 AM Robert Clausecker  wrote:

> Hi Warner,
>
> I have pushed a fix.  It should hopefully address those failing tests.
> The same issue should also affect memcmp(), but unlike for memchr(), it is
> illegal to pass a length to memcmp() that extends past the actual end of
> the buffer as memcmp() is permitted to examine the whole buffer regardless
> of where the first mismatch is.
>
> I am considering a change to improve the behaviour of memcmp() on such
> errorneous inputs.  There are two options: (a) I could change memcmp() the
> same way I fixed memchr() and have implausible buffer lengths behave as if
> the buffer goes to the end of the address space or (b) I could change
> memcmp() to crash loudly if it detects such a case.  I could also
> (c) leave memcmp() as is.  Which of these three choices is preferable?
>

What does the standard say? I'm highly skeptical that these corner
cases are UB behavior.

I'd like actual support for this statement, rather than your conjecture
that it's illegal. Even if you can come up with that, preserving the
old behavior is my first choice. Especially since many of these
functions aren't well defined by a standard, but are extensions.

As for memchr,
https://pubs.opengroup.org/onlinepubs/009696799/functions/memchr.html
has no such permission to examine 'the entire buffer at once' nor any
restriction as to the length extending beyond the address space. I'm
skeptical of your reading that it allows one to examine all of
[b, b + len), so please explain where the standard supports reading
past the first occurrence.


memchr() in particular is specified to only examine the input until the
matching character is found (ISO/IEC 9899:2011 § 7.24.5.1):

***
The memchr function locates the first occurrence of c (converted to an
unsigned char) in the initial n characters (each interpreted as 
unsigned

char) of the object pointed to by s. The implementation shall behave as
if it reads the characters sequentially and stops as soon as a matching
character is found.
***

Therefore, it appears reasonable that calls with fake buffer lengths
(e.g. SIZE_MAX, to read until a match occurs) must be supported.
However, memcmp() has no such language and the text explicitly states
that the whole buffer is compared (ISO/IEC 9899:2011 § 7.24.4.1):

***
The memcmp function compares the first n characters of the object
pointed to by s1 to the first n characters of the object pointed to by 
s2.

***

By omission, this seems to give license to e.g. implement memcmp() like
timingsafe_memcmp() where it inspects all n characters of both buffers
and only then gives a result.  So if n is longer than the actual buffer
(e.g. n == SIZE_MAX), behaviour may not be defined (e.g. there could be
a crash due to crossing into an unmapped page).

Thus I have patched memchr() to behave correctly when length SIZE_MAX
is given (commit b2618b65).  My memcmp() suffers from similarly flawed
logic and may need to be patched.  However, as the language I cited
above does not indicate that such usage needs to be supported for
memcmp() (whereas it must be for memchr(), contrary to my assumptions),
I was asking you how to proceed with memcmp (hence choices (a)--(c)).


My 2ct:
What did the previous implementation of memcmp() do in this case?
 - If it was generous and behaved similarly to the requirements of
   memchr(), POLA requires keeping the same behavior now too.
 - If it was crashing or silently going on (= lurking bugs in 3rd
   party code), we may have the possibility to do a coredump in case
   of running past the end of the buffer, to prevent malicious use.
 - In general I go with the robustness principle, "be liberal in what
   you accept, but strict in what you provide" = memcmp() should
   behave as if such usage is supported (see the sketch below).
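
For illustration, a minimal sketch of the sequential scan ISO C
requires from memchr() (my own example, not the libc implementation;
the function name is made up):
---snip---
#include <stddef.h>

/*
 * Reads strictly sequentially and stops at the first match, so an
 * oversized n (even SIZE_MAX) is harmless as long as the match lies
 * within the real buffer.
 */
void *
memchr_sketch(const void *s, int c, size_t n)
{
	const unsigned char *p = s;
	unsigned char uc = (unsigned char)c;
	size_t i;

	for (i = 0; i < n; i++) {
		if (p[i] == uc)
			return ((void *)(p + i));
	}
	return (NULL);
}
---snip---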

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


signature.asc
Description: OpenPGP digital signature


Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

2023-09-10 Thread Alexander Motin

On 09.09.2023 12:32, Mark Millard wrote:

On Sep 8, 2023, at 21:54, Mark Millard  wrote:

On Sep 8, 2023, at 18:19, Mark Millard  wrote:

On Sep 8, 2023, at 17:03, Mark Millard  wrote:

On Sep 8, 2023, at 15:30, Martin Matuska  wrote:
On 9. 9. 2023 0:09, Alexander Motin wrote:

Thank you, Martin.  I was able to reproduce the issue with your script and 
found the cause.

I first thought the issue was triggered by `cp`, but it appeared to be 
triggered by `cat`.  It also got copy_file_range() support, but later than 
`cp`; that is probably why it slipped through testing.  This patch fixes it 
for me: https://github.com/openzfs/zfs/pull/15251 .

Mark, could you please try the patch?


I finally stopped it at 7473 built (a little over 13 hrs elapsed):

^C[13:08:30] Error: Signal SIGINT caught, cleaning up and exiting
[main-amd64-bulk_a-default] [2023-09-08_19h51m52s] [sigint:] Queued: 34588 
Built: 7473  Failed: 23Skipped: 798   Ignored: 335   Fetched: 0 
Tobuild: 25959  Time: 13:08:26
[13:08:30] Logs: 
/usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-09-08_19h51m52s
[13:08:31] Cleaning up
[13:17:10] Unmounting file systems
Exiting with status 1

In part that was more evidence for deadlocks at least being fairly
rare as well.

None of the failed ones looked odd. (A fair portion are because the
bulk -a was mostly doing WITH_DEBUG= builds. Many upstreams change
library names, some other file names, or paths used for debug
builds and ports generally do not cover well building the debug
builds for such. I've used these runs to extend my list of
exceptions that avoid using WITH_DEBUG .) So no evidence of
corruptions.


Thank you, Mark.  The patch was accepted upstream and merged to both 
master and zfs-2.2-release branches.


--
Alexander Motin



Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

2023-09-08 Thread Alexander Motin

On 08.09.2023 09:52, Martin Matuska wrote:
I dug a little and was able to reproduce the panic without poudriere 
with a shell script.


#!/bin/sh
nl='
'
sed_script=s/aaa/b/
for ac_i in 1 2 3 4 5 6 7; do
     sed_script="$sed_script$nl$sed_script"
done
echo "$sed_script" 2>/dev/null | sed 99q >conftest.sed

repeats=8
count=0
echo -n 0123456789 >"conftest.in"
while :
do
     cat "conftest.in" "conftest.in" >"conftest.tmp"
     mv "conftest.tmp" "conftest.in"
     cp "conftest.in" "conftest.nl"
     echo '' >> "conftest.nl"
     sed -f conftest.sed < "conftest.nl" >"conftest.out" 2>/dev/null || break

     diff "conftest.out" "conftest.nl" >/dev/null 2>&1 || break
     count=$(($count + 1))
     echo "count: $count"
     # 10*(2^10) chars as input seems more than enough
     test $count -gt $repeats && break
done
rm -f conftest.in conftest.tmp conftest.nl conftest.out


Thank you, Martin.  I was able to reproduce the issue with your script 
and found the cause.


I first thought the issue was triggered by `cp`, but it appeared to be 
triggered by `cat`.  It also got copy_file_range() support, but later 
than `cp`; that is probably why it slipped through testing.  This patch 
fixes it for me: https://github.com/openzfs/zfs/pull/15251 .


Mark, could you please try the patch?

--
Alexander Motin



Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

2023-09-07 Thread Alexander Motin

Thanks, Mark.

On 07.09.2023 15:40, Mark Millard wrote:

On Sep 7, 2023, at 11:48, Glen Barber  wrote:


On Thu, Sep 07, 2023 at 11:17:22AM -0700, Mark Millard wrote:

When I next have time, should I retry based on a more recent
vintage of main that includes 969071be938c ?


Yes, please, if you can.


As stands, I rebooted that machine into my normal
enviroment, so the after-crash-with-dump-info
context is preserved. I'll presume lack of a need
to preserve that context unless I hear otherwise.
(But I'll work on this until later today.)

Even my normal environment predates the commit in
question by a few commits. So I'll end up doing a
more general round of updates overall.

Someone can let me know if there is a preference
for debug over non-debug for the next test run.


It is not unknown for some bugs to disappear once debugging is enabled, 
due to different execution timings, but generally debug may detect the 
problem closer to its origin instead of looking at random consequences. 
I am only starting to look at this report (unless Pawel or somebody 
beats me to it), and don't have additional requests yet, but if you can 
repeat the same with a debug kernel (in-base ZFS's ZFS_DEBUG setting 
follows the kernel's INVARIANTS), it may give us some additional 
information.



Looking at "git: 969071be938c - main", the relevant
part seems to be just (white space possibly not
preserved accurately):

diff --git a/sys/kern/vfs_vnops.c b/sys/kern/vfs_vnops.c
index 9fb5aee6a023..4e4161ef1a7f 100644
--- a/sys/kern/vfs_vnops.c
+++ b/sys/kern/vfs_vnops.c
@@ -3076,12 +3076,14 @@ vn_copy_file_range(struct vnode *invp, off_t *inoffp, struct vnode *outvp,
 		goto out;
 
 	/*
-	 * If the two vnode are for the same file system, call
+	 * If the two vnodes are for the same file system type, call
 	 * VOP_COPY_FILE_RANGE(), otherwise call vn_generic_copy_file_range()
-	 * which can handle copies across multiple file systems.
+	 * which can handle copies across multiple file system types.
 	 */
 	*lenp = len;
-	if (invp->v_mount == outvp->v_mount)
+	if (invp->v_mount == outvp->v_mount ||
+	    strcmp(invp->v_mount->mnt_vfc->vfc_name,
+	    outvp->v_mount->mnt_vfc->vfc_name) == 0)
 		error = VOP_COPY_FILE_RANGE(invp, inoffp, outvp, outoffp,
 		    lenp, flags, incred, outcred, fsize_td);
 	else

That looks to call VOP_COPY_FILE_RANGE in more contexts and
vn_generic_copy_file_range in fewer.

The backtrace I reported involves VOP_COPY_FILE_RANGE, so it appears
this change is unlikely to invalidate my test result, although failure
might happen sooner if more VOP_COPY_FILE_RANGE calls happen with the
newer code.


Your logic is likely right, but if you have block cloning requests both 
within and between datasets, this patch may change the pattern.  Though 
it is obviously not a fix for the issue.  I responded to the commit 
email only because it makes no difference while vfs.zfs.bclone_enabled is 0.



That in turns means that someone may come up with some
other change for me to test by the time I get around to
setting up another test. Let me know if so.


--
Alexander Motin



Re: 100% CPU time for sysctl command, not killable

2023-09-07 Thread Alexander Leidinger

Am 2023-09-03 21:22, schrieb Alexander Leidinger:

Am 2023-09-02 16:56, schrieb Mateusz Guzik:

On 8/20/23, Alexander Leidinger  wrote:

Hi,

sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable
sysctl program. This is somewhat unexpected...



fixed here 
https://cgit.freebsd.org/src/commit/?id=32988c1499f8698b41e15ed40a46d271e757bba3


I confirm.


There may be dragons...:
kern.maxvnodes: 1048576000
vfs.wantfreevnodes: 262144000
vfs.freevnodes: 0  <---
vfs.vnodes_created: 11832359
vfs.numvnodes: 146699
vfs.recycles_free: 4700765
vfs.recycles: 0
vfs.vnode_alloc_sleeps: 0

Another time I got an insanely huge amount of free vnodes (more than 
maxvnodes).


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF


signature.asc
Description: OpenPGP digital signature


Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

2023-09-04 Thread Alexander Motin

On 04.09.2023 11:45, Mark Millard wrote:

On Sep 4, 2023, at 06:09, Alexander Motin  wrote:

per_txg_dirty_frees_percent is directly related to the delete delays we see 
here.  You are forcing ZFS to commit transactions every 5% of the dirty ARC 
limit, which is 5% of 10% of memory size.  I haven't looked at that code 
recently, but I guess setting it too low can make ZFS commit transactions too 
often, increasing write inflation for the underlying storage.  I would propose 
you restore the default and try again.
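
As a concrete (hedged) example, restoring the 30% default mentioned
below:

sysctl vfs.zfs.per_txg_dirty_frees_percent=30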


While this machine is different, the original problem was worse than
the issue here: the load average was less than 1 for most of the
parallel bulk build when 30 was used. The fraction of time spent waiting
was much longer than with 5. If I understand right, both too high and
too low a value for a given context can lead to increased elapsed time,
and getting it set near optimal is a non-obvious exploration.


IIRC this limit has been modified several times since it was originally 
implemented.  Maybe it could benefit from another look, if the default of 
30% is not good.  It would be good if generic ZFS issues like this were 
reported to OpenZFS upstream to be visible to a wider public.  Unfortunately 
I have several other projects I must work on, so if it is not a regression I 
can't promise I'll take it right now; anybody else is welcome.



An overall point for the goal of my activity is: what makes a
good test context for checking if ZFS is again safe to use?
May be other tradeoffs make, say, 4 hardware threads more
reasonable than 32.


Thank you for your testing.  The best test is one that nobody else runs. 
It also correlates with the topic of "safe to use", which also depends 
on what it is used for. :)


--
Alexander Motin



Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

2023-09-04 Thread Alexander Motin

On 04.09.2023 05:56, Mark Millard wrote:

On Sep 4, 2023, at 02:00, Mark Millard  wrote:

On Sep 3, 2023, at 23:35, Mark Millard  wrote:

On Sep 3, 2023, at 22:06, Alexander Motin  wrote:

On 03.09.2023 22:54, Mark Millard wrote:

After that ^t produced the likes of:
load: 6.39  cmd: sh 4849 [tx->tx_quiesce_done_cv] 10047.33r 0.51u 121.32s 1% 13004k


So the full state is not "tx->tx", but is actually "tx->tx_quiesce_done_cv", 
which means the thread is waiting for a new transaction to be opened, which 
means some previous one needs to be quiesced and then synced.


#0 0x80b6f103 at mi_switch+0x173
#1 0x80bc0f24 at sleepq_switch+0x104
#2 0x80aec4c5 at _cv_wait+0x165
#3 0x82aba365 at txg_wait_open+0xf5
#4 0x82a11b81 at dmu_free_long_range+0x151


Here it seems the transaction commit is being waited on due to a large 
amount of delete operations, which ZFS tries to spread across separate TXGs.


That fits the context: cleaning out /usr/local/poudriere/data/.m/


You should probably see some large and growing number in sysctl 
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay .


After the reboot I started a -J64 example. It has avoided the
early "witness exhausted". Again I ^C'd about an hour after the
2nd builder had started. So: again cleaning out
/usr/local/poudriere/data/.m/. Only seconds between:

# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276042

# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276427

# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 277323

# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 278027


As expected, deletes trigger and wait for TXG commits.


I have found a measure of progress: zfs list's USED
for /usr/local/poudriere/data/.m is decreasing. So
ztop's d/s was a good classification: deletes.


#5 0x829a87d2 at zfs_rmnode+0x72
#6 0x829b658d at zfs_freebsd_reclaim+0x3d
#7 0x8113a495 at VOP_RECLAIM_APV+0x35
#8 0x80c5a7d9 at vgonel+0x3a9
#9 0x80c5af7f at vrecycle+0x3f
#10 0x829b643e at zfs_freebsd_inactive+0x4e
#11 0x80c598cf at vinactivef+0xbf
#12 0x80c590da at vput_final+0x2aa
#13 0x80c68886 at kern_funlinkat+0x2f6
#14 0x80c68588 at sys_unlink+0x28
#15 0x8106323f at amd64_syscall+0x14f
#16 0x8103512b at fast_syscall_common+0xf8


What we don't see here is what the quiesce and sync threads of the pool are 
actually doing.  The sync thread has plenty of different jobs, including async 
write, async destroy, scrub and others, that all may delay each other.

Before you rebooted the system, depending on how alive it is, could you save a 
number of outputs of `procstat -akk`, or at least specifically `procstat -akk | 
grep txg_thread_enter` if the full output is hard to get?  Or somehow else 
observe what they are doing.
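
Something like the following would do; the file names and the interval 
are only an example:

---snip---
i=0
while [ $i -lt 6 ]; do
    procstat -akk > ~/mmjnk0$i.txt
    sleep 5
    i=$((i+1))
done
---snip---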


# grep txg_thread_enter ~/mmjnk0[0-5].txt
/usr/home/root/mmjnk00.txt:6 100881 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk00.txt:6 100882 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
_cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk01.txt:6 100881 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk01.txt:6 100882 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
_cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk02.txt:6 100881 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk02.txt:6 100882 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
_cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk03.txt:6 100881 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk03.txt:6 100882 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
_cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe

Re: Speed improvements in ZFS

2023-09-04 Thread Alexander Leidinger

Am 2023-08-28 22:33, schrieb Alexander Leidinger:

Am 2023-08-22 18:59, schrieb Mateusz Guzik:

On 8/22/23, Alexander Leidinger  wrote:

Am 2023-08-21 10:53, schrieb Konstantin Belousov:

On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:

Am 2023-08-20 23:17, schrieb Konstantin Belousov:
> On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
> > On 8/20/23, Alexander Leidinger  wrote:
> > > Am 2023-08-20 22:02, schrieb Mateusz Guzik:
> > >> On 8/20/23, Alexander Leidinger  wrote:
> > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik:
> > >>>> On 8/18/23, Alexander Leidinger 
> > >>>> wrote:
> > >>>
> > >>>>> I have a 51MB text file, compressed to about 1MB. Are you
> > >>>>> interested
> > >>>>> to
> > >>>>> get it?
> > >>>>>
> > >>>>
> > >>>> Your problem is not the vnode limit, but nullfs.
> > >>>>
> > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
> > >>>
> > >>> 122 nullfs mounts on this system. And every jail I setup has
> > >>> several
> > >>> null mounts. One basesystem mounted into every jail, and then
> > >>> shared
> > >>> ports (packages/distfiles/ccache) across all of them.
> > >>>
> > >>>> First, some of the contention is notorious VI_LOCK in order to
> > >>>> do
> > >>>> anything.
> > >>>>
> > >>>> But more importantly the mind-boggling off-cpu time comes from
> > >>>> exclusive locking which should not be there to begin with -- as
> > >>>> in
> > >>>> that xlock in stat should be a slock.
> > >>>>
> > >>>> Maybe I'm going to look into it later.
> > >>>
> > >>> That would be fantastic.
> > >>>
> > >>
> > >> I did a quick test, things are shared locked as expected.
> > >>
> > >> However, I found the following:
> > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
> > >> mp->mnt_kern_flag |=
> > >> lowerrootvp->v_mount->mnt_kern_flag &
> > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
> > >> MNTK_EXTENDED_SHARED);
> > >> }
> > >>
> > >> are you using the "nocache" option? it has a side effect of
> > >> xlocking
> > >
> > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
> > >
> >
> > If you don't have "nocache" on null mounts, then I don't see how
> > this
> > could happen.
>
> There is also MNTK_NULL_NOCACHE on lower fs, which is currently set
> for
> fuse and nfs at least.

11 of those 122 nullfs mounts are ZFS datasets which are also NFS
exported.
6 of those nullfs mounts are also exported via Samba. The NFS exports
shouldn't be needed anymore, I will remove them.

By nfs I meant nfs client, not nfs exports.


No NFS client mounts anywhere on this system. So where is this exclusive
lock coming from then...
This is a ZFS system. 2 pools: one for the root, one for anything I need
space for. Both pools reside on the same disks. The root pool is a 3-way
mirror, the "space-pool" is a 5-disk raidz2. All jails are on the
space-pool. The jails are all basejail-style jails.



While I don't see why xlocking happens, you should be able to dtrace
or printf your way into finding out.


dtrace looks to me like a faster approach to get to the root cause than 
printf... my first naive try is to detect exclusive locks. I'm not 100% 
sure I got it right, but at least dtrace doesn't complain about it:

---snip---
#pragma D option dynvarsize=32m

fbt:nullfs:null_lock:entry
/(args[0]->a_flags & 0x08) != 0/
{
stack();
}
---snip---

In which direction should I look with dtrace if this works in tonight's 
run of periodic? I don't have enough knowledge about VFS to come up 
with some immediate ideas.
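
One idea I may try next (my own sketch, untested): aggregate the stacks 
instead of printing each one, so the hottest exclusive-lock paths stand 
out. The 0x08 flag value is copied from the script above and should be 
double-checked against LK_EXCLUSIVE in sys/sys/lockmgr.h:

---snip---
dtrace -x dynvarsize=32m -n '
fbt:nullfs:null_lock:entry
/(args[0]->a_flags & 0x08) != 0/
{ @[stack()] = count(); }
'
# ^C prints the aggregated stacks, most frequent last
---snip---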


After your sysctl fix for maxvnodes I increased the number of vnodes 10 
times compared to the initial report. This has increased the speed of 
the operation; the find runs in all those jails finished today after ~5h 
(at ~8am) instead of in the afternoon as before. Could this suggest that 
in parallel some null_reclaim() is running which takes the exclusive 
locks and slows down the entire operation?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

2023-09-03 Thread Alexander Motin

Mark,

On 03.09.2023 22:54, Mark Millard wrote:

After that ^t produced the likes of:

load: 6.39  cmd: sh 4849 [tx->tx_quiesce_done_cv] 10047.33r 0.51u 121.32s 1% 
13004k


So the full state is not "tx->tx", but is actually 
"tx->tx_quiesce_done_cv", which means the thread is waiting for a new 
transaction to be opened, which means some previous one has to be quiesced 
and then synced.



#0 0x80b6f103 at mi_switch+0x173
#1 0x80bc0f24 at sleepq_switch+0x104
#2 0x80aec4c5 at _cv_wait+0x165
#3 0x82aba365 at txg_wait_open+0xf5
#4 0x82a11b81 at dmu_free_long_range+0x151


Here it seems like the transaction commit is being waited on due to a large 
amount of delete operations, which ZFS tries to spread between separate 
TXGs.  You should probably see some large and growing number in sysctl 
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay .



#5 0x829a87d2 at zfs_rmnode+0x72
#6 0x829b658d at zfs_freebsd_reclaim+0x3d
#7 0x8113a495 at VOP_RECLAIM_APV+0x35
#8 0x80c5a7d9 at vgonel+0x3a9
#9 0x80c5af7f at vrecycle+0x3f
#10 0x829b643e at zfs_freebsd_inactive+0x4e
#11 0x80c598cf at vinactivef+0xbf
#12 0x80c590da at vput_final+0x2aa
#13 0x80c68886 at kern_funlinkat+0x2f6
#14 0x80c68588 at sys_unlink+0x28
#15 0x8106323f at amd64_syscall+0x14f
#16 0x8103512b at fast_syscall_common+0xf8


What we don't see here is what the quiesce and sync threads of the pool are 
actually doing.  The sync thread has plenty of different jobs, including 
async write, async destroy, scrub and others, that all may delay each 
other.

Before you rebooted the system, depending on how alive it is, could you 
save a number of outputs of `procstat -akk`, or at least specifically 
`procstat -akk | grep txg_thread_enter` if the full output is hard to get?  
Or somehow else observe what they are doing.


`zpool status`, `zpool get all` and `sysctl -a` would also not harm.

PS: I may be wrong, but the USB in "USB3 NVMe SSD storage" makes me shiver. 
Make sure there are no storage problems, like some huge delays, timeouts, 
etc, that can be seen, for example, as busy percents regularly spiking 
far above 100% in your `gstat -spod`.


--
Alexander Motin



Re: 100% CPU time for sysctl command, not killable

2023-09-03 Thread Alexander Leidinger

Am 2023-09-02 16:56, schrieb Mateusz Guzik:

On 8/20/23, Alexander Leidinger  wrote:

Hi,

sysctl kern.maxvnodes=1048576000 results in 100% CPU and a 
non-killable

sysctl program. This is somewhat unexpected...



fixed here 
https://cgit.freebsd.org/src/commit/?id=32988c1499f8698b41e15ed40a46d271e757bba3


I confirm.

Thanks!
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: kernel 100% CPU, and ports-mgmt/poudriere-devel 'Inspecting ports tree for modifications to git checkout...' for an extraordinarily long time

2023-09-02 Thread Alexander Motin

On 02.09.2023 09:32, Graham Perrin wrote:

On 02/09/2023 10:17, Mateusz Guzik wrote:

get a flamegraph with dtrace

https://github.com/brendangregg/FlameGraph

See <https://bz-attachments.freebsd.org/attachment.cgi?id=244586> for a 
PDF of a reply that probably did not reach the list.


Graham, the original SVG was scalable and searchable in a browser.  Your 
PNG inside a PDF is not.


--
Alexander Motin



Re: 100% CPU time for sysctl command, not killable

2023-08-30 Thread Alexander Leidinger

Am 2023-08-20 21:23, schrieb Alexander Leidinger:


Am 2023-08-20 18:55, schrieb Mina Galić:
procstat(1) kstack could be helpful here.

 Original Message 
On 20 Aug 2023, 17:29, Alexander Leidinger <alexan...@leidinger.net> 
wrote:
Hi, sysctl kern.maxvnodes=1048576000 results in 100% CPU and a 
non-killable sysctl program. This is somewhat unexpected... Bye, 
Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 
0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 
0x8F31830F9F2772BF


  PID    TID COMM     TDNAME   KSTACK
94391 118678 sysctl  -   sysctl_maxvnodes 
sysctl_root_handler_locked sysctl_root userland_sysctl sys___sysctl 
amd64_syscall fast_syscall_common


I experimented a bit by multiplying my initial value of 104857600. It 
fails between 5 and 6 times the initial value.


sysctl kern.maxvnodes=524288000 is successful within 4 seconds.

sysctl kern.maxvnodes=629145600 goes into a loop with the same procstat 
-k output.


Bye,

Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF

Re: Possible issue with linux xattr support?

2023-08-29 Thread Alexander Leidinger

Am 2023-08-29 21:31, schrieb Felix Palmen:

* Shawn Webb  [20230829 15:25]:

On Tue, Aug 29, 2023 at 09:15:03PM +0200, Felix Palmen wrote:
> * Kyle Evans  [20230829 14:07]:
> > On 8/29/23 14:02, Shawn Webb wrote:
> > > Back in 2019, I had a similar issue: I needed access to be able to
> > > read/write to the system extended attribute namespace from within a
> > > jailed context. I wrote a rather simple patch that provides that
> > > support on a per-jail basis:
> > >
> > > 
https://git.hardenedbsd.org/hardenedbsd/HardenedBSD/-/commit/96c85982b45e44a6105664c7068a92d0a61da2a3
> > >
> > > Hopefully that's useful to someone.
> > >
> > > Thanks,
> > >
> >
> > FWIW (which likely isn't much), I like this approach much better; it makes
> > more sense to me that it's a feature controlled by the creator of the jail
> > and not one allowed just by using a compat ABI within a jail.
>
> Well, a typical GNU userland won't work in a jail without this, that's
> what I know now. But I'm certainly with you, it doesn't feel logical
> that a Linux binary can do something in a jail a FreeBSD binary can't.
>
> So, indeed, making it a jail option sounds better.
>
> Unless, bringing back a question raised earlier in this thread: What's
> the reason to restrict this in a jailed context in the first place? IOW,
> could it just be allowed unconditionally?

In HardenedBSD's case, since we use filesystem extended attributes to
toggle exploit mitigations on a per-application basis, there's now a
conceptual security boundary between the host and the jail.

Should the jail and the host share resources, like executables, a
jailed process could toggle an exploit mitigation, and the toggle
would bubble up to the host. So the next time the host executed
/shared/app/executable/here, the security posture of the host would be
affected.
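
A hypothetical illustration of that scenario (the attribute name and the 
paths are made up, HardenedBSD's real names may differ):

---snip---
# inside the jail: toggle a mitigation attribute on a shared executable
setextattr system hbsd.pax.mprotect 0 /shared/app/bin/tool

# on the host: the very same file now carries the changed attribute
getextattr system hbsd.pax.mprotect /shared/app/bin/tool
---snip---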


Isn't the sane approach here *not* to share any executables with a jail
other than via a read-only nullfs mount?
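
For example (paths made up):

---snip---
# expose the shared tree read-only, so jailed changes cannot stick
mount -t nullfs -o ro /usr/local/base /jails/web/base
---snip---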


In https://reviews.freebsd.org/D40370 I provide infrastructure to 
automatically jail rc.d services. It will use the complete filesystem of 
the system, but apply all the other restrictions of jails. So the answer 
to your question is "it depends".


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: Possible issue with linux xattr support?

2023-08-29 Thread Alexander Leidinger

Am 2023-08-29 21:02, schrieb Shawn Webb:


Back in 2019, I had a similar issue: I needed access to be able to
read/write to the system extended attribute namespace from within a
jailed context. I wrote a rather simple patch that provides that
support on a per-jail basis:

https://git.hardenedbsd.org/hardenedbsd/HardenedBSD/-/commit/96c85982b45e44a6105664c7068a92d0a61da2a3


You enabled it by default. I would assume you thought about the 
implications... any memories about it?


What I'm after is:
 - What can go wrong if we enable it by default?
 - Why would we like to disable it (or any ideas why it is disabled by 
default in FreeBSD)?


Depending on the answers we may even use a simpler patch and have it 
allowed in jails even without the possibility to configure it.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: Speed improvements in ZFS

2023-08-28 Thread Alexander Leidinger

Am 2023-08-22 18:59, schrieb Mateusz Guzik:

On 8/22/23, Alexander Leidinger  wrote:

Am 2023-08-21 10:53, schrieb Konstantin Belousov:

On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:

Am 2023-08-20 23:17, schrieb Konstantin Belousov:
> On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
> > On 8/20/23, Alexander Leidinger  wrote:
> > > Am 2023-08-20 22:02, schrieb Mateusz Guzik:
> > >> On 8/20/23, Alexander Leidinger  wrote:
> > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik:
> > >>>> On 8/18/23, Alexander Leidinger 
> > >>>> wrote:
> > >>>
> > >>>>> I have a 51MB text file, compressed to about 1MB. Are you
> > >>>>> interested
> > >>>>> to
> > >>>>> get it?
> > >>>>>
> > >>>>
> > >>>> Your problem is not the vnode limit, but nullfs.
> > >>>>
> > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
> > >>>
> > >>> 122 nullfs mounts on this system. And every jail I setup has
> > >>> several
> > >>> null mounts. One basesystem mounted into every jail, and then
> > >>> shared
> > >>> ports (packages/distfiles/ccache) across all of them.
> > >>>
> > >>>> First, some of the contention is notorious VI_LOCK in order to
> > >>>> do
> > >>>> anything.
> > >>>>
> > >>>> But more importantly the mind-boggling off-cpu time comes from
> > >>>> exclusive locking which should not be there to begin with -- as
> > >>>> in
> > >>>> that xlock in stat should be a slock.
> > >>>>
> > >>>> Maybe I'm going to look into it later.
> > >>>
> > >>> That would be fantastic.
> > >>>
> > >>
> > >> I did a quick test, things are shared locked as expected.
> > >>
> > >> However, I found the following:
> > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
> > >> mp->mnt_kern_flag |=
> > >> lowerrootvp->v_mount->mnt_kern_flag &
> > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
> > >> MNTK_EXTENDED_SHARED);
> > >> }
> > >>
> > >> are you using the "nocache" option? it has a side effect of
> > >> xlocking
> > >
> > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
> > >
> >
> > If you don't have "nocache" on null mounts, then I don't see how
> > this
> > could happen.
>
> There is also MNTK_NULL_NOCACHE on lower fs, which is currently set
> for
> fuse and nfs at least.

11 of those 122 nullfs mounts are ZFS datasets which are also NFS
exported.
6 of those nullfs mounts are also exported via Samba. The NFS exports
shouldn't be needed anymore, I will remove them.

By nfs I meant nfs client, not nfs exports.


No NFS client mounts anywhere on this system. So where is this exclusive
lock coming from then...
This is a ZFS system. 2 pools: one for the root, one for anything I need
space for. Both pools reside on the same disks. The root pool is a 3-way
mirror, the "space-pool" is a 5-disk raidz2. All jails are on the
space-pool. The jails are all basejail-style jails.



While I don't see why xlocking happens, you should be able to dtrace
or printf your way into finding out.


dtrace looks to me like a faster approach to get to the root cause than 
printf... my first naive try is to detect exclusive locks. I'm not 100% 
sure I got it right, but at least dtrace doesn't complain about it:

---snip---
#pragma D option dynvarsize=32m

fbt:nullfs:null_lock:entry
/(args[0]->a_flags & 0x08) != 0/
{
stack();
}
---snip---

In which direction should I look with dtrace if this works in tonight's 
run of periodic? I don't have enough knowledge about VFS to come up with 
some immediate ideas.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: zfs autotrim default to off now

2023-08-28 Thread Alexander Motin

On 28.08.2023 13:56, Pete Wright wrote:
So to be clear, if we were using the default autotrim=enabled behavior 
we in fact weren't having our SSDs trimmed?  I think that's my concern, 
as an admin I was under the impression that it was enabled by default 
but apparently that wasn't actually happening.


We wanted autotrim to be enabled by default, but it was not enabled, and 
it was reported as not enabled, so there should be no confusion.  The 
only confusion may have been if you tried to read the code and saw that 
it should have been enabled.


--
Alexander Motin



Re: zfs autotrim default to off now

2023-08-28 Thread Alexander Motin

Hi Pete,

On 27.08.2023 23:34, Pete Wright wrote:

looking at a recent pull of CURRENT i'm noticing this in the git logs:

#15079 set autotrim default to 'off' everywhere

which references this openzfs PR:
https://github.com/openzfs/zfs/pull/15079


looking at the PR i'm not seeing a reference to a bug report or anything, is
anyone able to point me to a bug report for this.  it seems like a pretty major
issue:
"As it turns out having autotrim default to 'on' on FreeBSD never really worked
due to mess with defines where userland and kernel module were getting different
default values (userland was defaulting to 'off', module was thinking it's
'on')."

i'd just like to make sure i better understand the issue and can see if my
systems are impacted.


You are probably misinterpreting the quote.  There is nothing wrong with 
autotrim itself, assuming your specific devices handle it properly. 
It is just saying that setting it to "on" by default on FreeBSD, which 
was done to keep pre-OpenZFS behavior, had been broken for a while.  So 
that commit merely confirmed the status quo.  It should not affect any 
already existing pools.  On new pool creation the default is now 
officially "off", matching OpenZFS on other platforms, but there is no 
reason why you can not set it to "on" if it is beneficial for your 
devices and workloads.  As an alternative, for example, you may run trim 
manually from time to time during any low-activity periods.
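
For example (the pool name is just a placeholder):

---snip---
# opt back in per pool, if it is beneficial for your devices
zpool set autotrim=on tank

# or trim manually during quiet periods and watch the progress
zpool trim tank
zpool status -t tank
---snip---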


--
Alexander Motin



Re: Possible issue with linux xattr support?

2023-08-28 Thread Alexander Leidinger

Am 2023-08-28 13:06, schrieb Dmitry Chagin:

On Sun, Aug 27, 2023 at 09:55:23PM +0200, Felix Palmen wrote:

* Dmitry Chagin  [20230827 22:46]:



> I can fix this completely disabling exttatr for jailed proc,
> however, it's gonna be bullshit, though

Would probably be better than nothing. AFAIK, "Linux jails" are used a
lot, probably with userlands from distributions actually using xattr.



It might make sense to allow this priv (PRIV_VFS_EXTATTR_SYSTEM) for linux
jails by default? What do you think, James?


I think the question is more whether we want to allow it in jails (not 
specific to linux jails, as in: if it is ok for linux jails, it should 
be ok for FreeBSD jails too). So the question is what does this protect 
the host from, if this is not allowed in jails? Some kind of 
possibility to DoS the host?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: ZFS deadlock in 14

2023-08-24 Thread Alexander Motin

Martin,

The PR was just merged to upstream master.  Merge to zfs-2.2-release 
should follow shortly: https://github.com/openzfs/zfs/pull/15204 , same 
as some other 2.2 fixes: https://github.com/openzfs/zfs/pull/15205 .


Can't wait to get back in sync with ZFS master in FreeBSD main. ;)

On 22.08.2023 12:18, Alexander Motin wrote:

Hi Martin,

I am waiting for final test results from George Wilson and then will 
request a quick merge of both to the zfs-2.2-release branch.  Unfortunately 
there are still not many reviewers for the PR, since the code is not 
trivial, but at least with the test reports Brian Behlendorf and Mark 
Maybee seem to be OK with merging the two PRs into 2.2.  If somebody else 
has tested and/or reviewed the PR, you may comment on it.


On 22.08.2023 04:26, Martin Matuska wrote:

as 15107 is a prerequisite for 15122,
would it be possible to have https://github.com/openzfs/zfs/pull/15107 
merged into the OpenZFS zfs-2.2-release branch (and of course later 
15122)?


If the patches help I can cherry-pick them into main.

Alexander Motin  wrote:


On 17.08.2023 15:41, Dag-Erling Smørgrav wrote:

Alexander Motin  writes:

Trying to run your test (so far without reproduction) I see it
producing a substantial amount of ZIL writes.  The range of commits
you reduced the scope to so far includes my ZIL locking refactoring,
where I know for sure there are some deadlocks.  I have already been
waiting 3 weeks now for reviews and tests for the PR that should fix it:
https://github.com/openzfs/zfs/pull/15122 .  It would be good if you
could test it, though it seems to depend on a few more earlier patches
not merged to FreeBSD yet.


Do you have a FreeBSD branch with your patch applied?


I don't have a FreeBSD branch, but these two patches apply cleanly and 
build on top of today's FreeBSD main branch:


https://github.com/openzfs/zfs/pull/15107
https://github.com/openzfs/zfs/pull/15122

And if you still experience the issue, please show all stacks, or at 
least include ZFS sync threads.
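
One way to pull them into a src tree (my sketch of the mechanics, assuming 
GitHub's pull/<N>.patch URLs and the OpenZFS sources living under 
sys/contrib/openzfs):

---snip---
cd /usr/src
fetch https://github.com/openzfs/zfs/pull/15107.patch
fetch https://github.com/openzfs/zfs/pull/15122.patch
git apply --directory=sys/contrib/openzfs 15107.patch
git apply --directory=sys/contrib/openzfs 15122.patch
---snip---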




--
Alexander Motin



Re: ZFS deadlock in 14

2023-08-23 Thread Alexander Motin

On 22.08.2023 14:24, Mark Millard wrote:

Alexander Motin  wrote on
Date: Tue, 22 Aug 2023 16:18:12 UTC :


I am waiting for final test results from George Wilson and then will
request a quick merge of both to the zfs-2.2-release branch. Unfortunately
there are still not many reviewers for the PR, since the code is not
trivial, but at least with the test reports Brian Behlendorf and Mark
Maybee seem to be OK with merging the two PRs into 2.2. If somebody else
has tested and/or reviewed the PR, you may comment on it.


I had written to the list that when I tried to test the system
doing poudriere builds (initially with your patches) using
USE_TMPFS=no so that zfs had to deal with all the file I/O, I
instead got only one builder that ended up active, the others
never reaching "Builder started":



Top was showing lots of "vlruwk" for the cpdup's. For example:

. . .
  362 0 root 400  27076Ki   13776Ki CPU19   19   4:23   0.00% 
cpdup -i0 -o ref 32
  349 0 root 530  27076Ki   13776Ki vlruwk  22   4:20   0.01% 
cpdup -i0 -o ref 31
  328 0 root 680  27076Ki   13804Ki vlruwk   8   4:30   0.01% 
cpdup -i0 -o ref 30
  304 0 root 370  27076Ki   13792Ki vlruwk   6   4:18   0.01% 
cpdup -i0 -o ref 29
  282 0 root 420  33220Ki   13956Ki vlruwk   8   4:33   0.01% 
cpdup -i0 -o ref 28
  242 0 root 560  27076Ki   13796Ki vlruwk   4   4:28   0.00% 
cpdup -i0 -o ref 27
. . .

But those processes did show CPU?? on occasion, as well as
*vnode less often. None of the cpdup's was stuck in

Removing your patches did not change the behavior.


Mark, to me "vlruwk" looks like a limit on the number of vnodes.  I have not 
been deep in that area recently, so somebody with more experience 
there could try to diagnose it.  At the very least it does not look related 
to the ZIL issue discussed in this thread, at least with the information 
provided, so I am not surprised that the mentioned patches do not affect it.


--
Alexander Motin



Re: ZFS deadlock in 14

2023-08-22 Thread Alexander Motin

Hi Martin,

I am waiting for final test results from George Wilson and then will 
request a quick merge of both to the zfs-2.2-release branch.  Unfortunately 
there are still not many reviewers for the PR, since the code is not 
trivial, but at least with the test reports Brian Behlendorf and Mark 
Maybee seem to be OK with merging the two PRs into 2.2.  If somebody else 
has tested and/or reviewed the PR, you may comment on it.


On 22.08.2023 04:26, Martin Matuska wrote:

as 15107 is a prerequisite for 15122,
would it be possible to have https://github.com/openzfs/zfs/pull/15107 
merged into the OpenZFS zfs-2.2-release branch (and of course later 15122)?


If the patches help I can cherry-pick them into main.

Alexander Motin  wrote:


On 17.08.2023 15:41, Dag-Erling Smørgrav wrote:

Alexander Motin  writes:

Trying to run your test (so far without reproduction) I see it
producing a substantial amount of ZIL writes.  The range of commits
you reduced the scope to so far includes my ZIL locking refactoring,
where I know for sure there are some deadlocks.  I have already been
waiting 3 weeks now for reviews and tests for the PR that should fix it:
https://github.com/openzfs/zfs/pull/15122 .  It would be good if you
could test it, though it seems to depend on a few more earlier patches
not merged to FreeBSD yet.


Do you have a FreeBSD branch with your patch applied?


I don't have a FreeBSD branch, but these two patches apply cleanly and 
build on top of today's FreeBSD main branch:


https://github.com/openzfs/zfs/pull/15107
https://github.com/openzfs/zfs/pull/15122

And if you still experience the issue, please show all stacks, or at 
least include ZFS sync threads.


--
Alexander Motin



Re: Speed improvements in ZFS

2023-08-21 Thread Alexander Leidinger

Am 2023-08-21 10:53, schrieb Konstantin Belousov:

On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:

Am 2023-08-20 23:17, schrieb Konstantin Belousov:
> On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
> > On 8/20/23, Alexander Leidinger  wrote:
> > > Am 2023-08-20 22:02, schrieb Mateusz Guzik:
> > >> On 8/20/23, Alexander Leidinger  wrote:
> > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik:
> > >>>> On 8/18/23, Alexander Leidinger  wrote:
> > >>>
> > >>>>> I have a 51MB text file, compressed to about 1MB. Are you interested
> > >>>>> to
> > >>>>> get it?
> > >>>>>
> > >>>>
> > >>>> Your problem is not the vnode limit, but nullfs.
> > >>>>
> > >>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
> > >>>
> > >>> 122 nullfs mounts on this system. And every jail I setup has several
> > >>> null mounts. One basesystem mounted into every jail, and then shared
> > >>> ports (packages/distfiles/ccache) across all of them.
> > >>>
> > >>>> First, some of the contention is notorious VI_LOCK in order to do
> > >>>> anything.
> > >>>>
> > >>>> But more importantly the mind-boggling off-cpu time comes from
> > >>>> exclusive locking which should not be there to begin with -- as in
> > >>>> that xlock in stat should be a slock.
> > >>>>
> > >>>> Maybe I'm going to look into it later.
> > >>>
> > >>> That would be fantastic.
> > >>>
> > >>
> > >> I did a quick test, things are shared locked as expected.
> > >>
> > >> However, I found the following:
> > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
> > >> mp->mnt_kern_flag |=
> > >> lowerrootvp->v_mount->mnt_kern_flag &
> > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
> > >> MNTK_EXTENDED_SHARED);
> > >> }
> > >>
> > >> are you using the "nocache" option? it has a side effect of xlocking
> > >
> > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
> > >
> >
> > If you don't have "nocache" on null mounts, then I don't see how this
> > could happen.
>
> There is also MNTK_NULL_NOCACHE on lower fs, which is currently set for
> fuse and nfs at least.

11 of those 122 nullfs mounts are ZFS datasets which are also NFS 
exported.

6 of those nullfs mounts are also exported via Samba. The NFS exports
shouldn't be needed anymore, I will remove them.

By nfs I meant nfs client, not nfs exports.


No NFS client mounts anywhere on this system. So where is this exclusive 
lock coming from then...
This is a ZFS system. 2 pools: one for the root, one for anything I need 
space for. Both pools reside on the same disks. The root pool is a 3-way 
mirror, the "space-pool" is a 5-disk raidz2. All jails are on the 
space-pool. The jails are all basejail-style jails.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: Speed improvements in ZFS

2023-08-21 Thread Alexander Leidinger

Am 2023-08-20 23:17, schrieb Konstantin Belousov:

On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:

On 8/20/23, Alexander Leidinger  wrote:
> Am 2023-08-20 22:02, schrieb Mateusz Guzik:
>> On 8/20/23, Alexander Leidinger  wrote:
>>> Am 2023-08-20 19:10, schrieb Mateusz Guzik:
>>>> On 8/18/23, Alexander Leidinger  wrote:
>>>
>>>>> I have a 51MB text file, compressed to about 1MB. Are you interested
>>>>> to
>>>>> get it?
>>>>>
>>>>
>>>> Your problem is not the vnode limit, but nullfs.
>>>>
>>>> https://people.freebsd.org/~mjg/netchild-periodic-find.svg
>>>
>>> 122 nullfs mounts on this system. And every jail I setup has several
>>> null mounts. One basesystem mounted into every jail, and then shared
>>> ports (packages/distfiles/ccache) across all of them.
>>>
>>>> First, some of the contention is notorious VI_LOCK in order to do
>>>> anything.
>>>>
>>>> But more importantly the mind-boggling off-cpu time comes from
>>>> exclusive locking which should not be there to begin with -- as in
>>>> that xlock in stat should be a slock.
>>>>
>>>> Maybe I'm going to look into it later.
>>>
>>> That would be fantastic.
>>>
>>
>> I did a quick test, things are shared locked as expected.
>>
>> However, I found the following:
>> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
>> mp->mnt_kern_flag |=
>> lowerrootvp->v_mount->mnt_kern_flag &
>> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
>> MNTK_EXTENDED_SHARED);
>> }
>>
>> are you using the "nocache" option? it has a side effect of xlocking
>
> I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
>

If you don't have "nocache" on null mounts, then I don't see how this
could happen.


There is also MNTK_NULL_NOCACHE on lower fs, which is currently set for
fuse and nfs at least.


11 of those 122 nullfs mounts are ZFS datasets which are also NFS 
exported. 6 of those nullfs mounts are also exported via Samba. The NFS 
exports shouldn't be needed anymore, I will remove them.


Shouldn't this implicit nocache propagate to the mount of the upper fs 
to give the user feedback about the effective state?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: Speed improvements in ZFS

2023-08-20 Thread Alexander Leidinger

Am 2023-08-20 22:02, schrieb Mateusz Guzik:

On 8/20/23, Alexander Leidinger  wrote:

Am 2023-08-20 19:10, schrieb Mateusz Guzik:

On 8/18/23, Alexander Leidinger  wrote:



I have a 51MB text file, compressed to about 1MB. Are you interested
to
get it?



Your problem is not the vnode limit, but nullfs.

https://people.freebsd.org/~mjg/netchild-periodic-find.svg


122 nullfs mounts on this system. And every jail I setup has several
null mounts. One basesystem mounted into every jail, and then shared
ports (packages/distfiles/ccache) across all of them.


First, some of the contention is notorious VI_LOCK in order to do
anything.

But more importantly the mind-boggling off-cpu time comes from
exclusive locking which should not be there to begin with -- as in
that xlock in stat should be a slock.

Maybe I'm going to look into it later.


That would be fantastic.



I did a quick test, things are shared locked as expected.

However, I found the following:
if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
mp->mnt_kern_flag |= 
lowerrootvp->v_mount->mnt_kern_flag &

(MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
MNTK_EXTENDED_SHARED);
}

are you using the "nocache" option? it has a side effect of xlocking


I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: Speed improvements in ZFS

2023-08-20 Thread Alexander Leidinger

Am 2023-08-20 19:10, schrieb Mateusz Guzik:

On 8/18/23, Alexander Leidinger  wrote:


I have a 51MB text file, compressed to about 1MB. Are you interested 
to

get it?



Your problem is not the vnode limit, but nullfs.

https://people.freebsd.org/~mjg/netchild-periodic-find.svg


122 nullfs mounts on this system. And every jail I setup has several 
null mounts. One basesystem mounted into every jail, and then shared 
ports (packages/distfiles/ccache) across all of them.


First, some of the contention is notorious VI_LOCK in order to do 
anything.


But more importantly the mind-boggling off-cpu time comes from
exclusive locking which should not be there to begin with -- as in
that xlock in stat should be a slock.

Maybe I'm going to look into it later.


That would be fantastic.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: 100% CPU time for sysctl command, not killable

2023-08-20 Thread Alexander Leidinger

Am 2023-08-20 18:55, schrieb Mina Galić:


procstat(1) kstack could be helpful here.

 Original Message 
On 20 Aug 2023, 17:29, Alexander Leidinger <alexan...@leidinger.net> 
wrote:


Hi, sysctl kern.maxvnodes=1048576000 results in 100% CPU and a 
non-killable sysctl program. This is somewhat unexpected... Bye, 
Alexander. -- http://www.Leidinger.net alexan...@leidinger.net: PGP 
0x8F31830F9F2772BF http://www.FreeBSD.org netch...@freebsd.org : PGP 
0x8F31830F9F2772BF


  PID    TID COMM     TDNAME   KSTACK
94391 118678 sysctl  -   sysctl_maxvnodes 
sysctl_root_handler_locked sysctl_root userland_sysctl sys___sysctl 
amd64_syscall fast_syscall_common


Bye,

Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF

100% CPU time for sysctl command, not killable

2023-08-20 Thread Alexander Leidinger

Hi,

sysctl kern.maxvnodes=1048576000 results in 100% CPU and a non-killable 
sysctl program. This is somewhat unexpected...


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: ZFS deadlock in 14

2023-08-19 Thread Alexander Motin

On 18.08.2023 18:34, Dag-Erling Smørgrav wrote:

Dag-Erling Smørgrav  writes:

Plot twist: c47116e909 _without_ the patches also appears to be working
fine.  The last kernel I know for sure deadlocks is b36f469a15, so I'm
going to test cd25b0f740 and 28d2e3b5de.


c47116e909 with cd25b0f740 and 28d2e3b5de reverted deadlocks, see
attached ddb.txt.  I'm going to see if reverting only 28d2e3b5de but not
cd25b0f740 changes anything.


Yes, it looks like a ZIL-related deadlock:

The ZFS sync thread in zil_sync() is waiting for the allocated ZIL zios to complete:

Tracing command zfskern pid 5 tid 101124 td 0xfe0408cbb020
sched_switch() at sched_switch+0x5da/frame 0xfe04090f7900
mi_switch() at mi_switch+0x173/frame 0xfe04090f7920
sleepq_switch() at sleepq_switch+0x104/frame 0xfe04090f7960
_cv_wait() at _cv_wait+0x165/frame 0xfe04090f79c0
zil_sync() at zil_sync+0x9b/frame 0xfe04090f7aa0
dmu_objset_sync() at dmu_objset_sync+0x51b/frame 0xfe04090f7b70
dsl_pool_sync() at dsl_pool_sync+0x11d/frame 0xfe04090f7bf0
spa_sync() at spa_sync+0xc68/frame 0xfe04090f7e20
txg_sync_thread() at txg_sync_thread+0x2eb/frame 0xfe04090f7ef0
fork_exit() at fork_exit+0x82/frame 0xfe04090f7f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe04090f7f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---



Some thread requested fsync() and allocated a ZIL zio, but is stuck waiting 
for z_teardown_inactive_lock in an attempt to get the data to be written 
into the ZIL, so the zios were never even issued:


Tracing command blacklistd pid 521 tid 101136 td 0xfe040d08d000
sched_switch() at sched_switch+0x5da/frame 0xfe040c25c710
mi_switch() at mi_switch+0x173/frame 0xfe040c25c730
sleepq_switch() at sleepq_switch+0x104/frame 0xfe040c25c770
_sleep() at _sleep+0x2d6/frame 0xfe040c25c810
rms_rlock_fallback() at rms_rlock_fallback+0xd0/frame 0xfe040c25c850
zfs_freebsd_reclaim() at zfs_freebsd_reclaim+0x2b/frame 0xfe040c25c880
VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0x35/frame 0xfe040c25c8a0
vgonel() at vgonel+0x3a9/frame 0xfe040c25c910
vnlru_free_impl() at vnlru_free_impl+0x371/frame 0xfe040c25c990
vn_alloc_hard() at vn_alloc_hard+0xd3/frame 0xfe040c25c9b0
getnewvnode_reserve() at getnewvnode_reserve+0xa0/frame 0xfe040c25c9d0
zfs_zget() at zfs_zget+0x1f/frame 0xfe040c25ca80
zfs_get_data() at zfs_get_data+0x62/frame 0xfe040c25cb20
zil_lwb_commit() at zil_lwb_commit+0x32f/frame 0xfe040c25cb70
zil_lwb_write_issue() at zil_lwb_write_issue+0x4e/frame 0xfe040c25cbb0
zil_commit_impl() at zil_commit_impl+0x943/frame 0xfe040c25cd40
zfs_fsync() at zfs_fsync+0x8f/frame 0xfe040c25cd80
kern_fsync() at kern_fsync+0x18a/frame 0xfe040c25ce00
amd64_syscall() at amd64_syscall+0x138/frame 0xfe040c25cf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe040c25cf30
--- syscall (95, FreeBSD ELF64, fsync), rip = 0x3f979eeb074a, rsp = 
0x3f979a449c38, rbp = 0x3f979a449c50 ---


A third thread doing zfs rollback while holding z_teardown_inactive_lock 
tries to wait for the transaction commit, causing the deadlock:


Tracing command zfs pid 65636 tid 138109 td 0xfe0438d721e0
sched_switch() at sched_switch+0x5da/frame 0xfe0439b2b950
mi_switch() at mi_switch+0x173/frame 0xfe0439b2b970
sleepq_switch() at sleepq_switch+0x104/frame 0xfe0439b2b9b0
_cv_wait() at _cv_wait+0x165/frame 0xfe0439b2ba10
txg_wait_synced_impl() at txg_wait_synced_impl+0xeb/frame 0xfe0439b2ba50
txg_wait_synced() at txg_wait_synced+0xb/frame 0xfe0439b2ba60
zfsvfs_teardown() at zfsvfs_teardown+0x203/frame 0xfe0439b2bab0
zfs_ioc_rollback() at zfs_ioc_rollback+0x12f/frame 0xfe0439b2bb00
zfsdev_ioctl_common() at zfsdev_ioctl_common+0x612/frame 0xfe0439b2bbc0
zfsdev_ioctl() at zfsdev_ioctl+0x12a/frame 0xfe0439b2bbf0
devfs_ioctl() at devfs_ioctl+0xd2/frame 0xfe0439b2bc40
vn_ioctl() at vn_ioctl+0xc2/frame 0xfe0439b2bcb0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfe0439b2bcd0
kern_ioctl() at kern_ioctl+0x286/frame 0xfe0439b2bd30
sys_ioctl() at sys_ioctl+0x152/frame 0xfe0439b2be00
amd64_syscall() at amd64_syscall+0x138/frame 0xfe0439b2bf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0439b2bf30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x1afaddea3aaa, rsp = 0x1afad4058328, rbp = 0x1afad40583a0 ---


Unfortunately I think the current code in main should still suffer from 
this specific deadlock.  cd25b0f740 fixes some deadlocks in this area, 
maybe that is why you are hitting issues less often, but I don't 
believe it fixes this specific one; maybe you were just lucky.  Only 
https://github.com/openzfs/zfs/pull/15122 I believe should fix them.


--
Alexander Motin



Re: Speed improvements in ZFS

2023-08-18 Thread Alexander Leidinger

Am 2023-08-16 18:48, schrieb Alexander Leidinger:

Am 2023-08-15 23:29, schrieb Mateusz Guzik:

On 8/15/23, Alexander Leidinger  wrote:

Am 2023-08-15 14:41, schrieb Mateusz Guzik:


With this in mind can you provide: sysctl kern.maxvnodes
vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes
vfs.recycles_free vfs.recycles


After a reboot:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 24696
vfs.vnodes_created: 1658162
vfs.numvnodes: 173937
vfs.recycles_free: 0
vfs.recycles: 0


New values after one rund of periodic:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 356202
vfs.vnodes_created: 427696288
vfs.numvnodes: 532620
vfs.recycles_free: 20213257
vfs.recycles: 0


And after the second round which only took 7h this night:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 3071754
vfs.vnodes_created: 1275963316
vfs.numvnodes: 3414906
vfs.recycles_free: 58411371
vfs.recycles: 0


Meanwhile if there is tons of recycles, you can damage control by
bumping kern.maxvnodes.


What's the difference between recycles and recycles_free? Does the 
above count as bumping the maxvnodes?


^


Looks like there are not many free ones directly after the reboot. I will
check the values tomorrow after the periodic run again and maybe
increase by 10x or 100x to see if it makes a difference.


If this is not the problem you can use dtrace to figure it out.


dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or
something else?



I mean checking where find is spending time instead of speculating.

There is no productized way to do it so to speak, but the following
crapper should be good enough:

[script]

I will let it run this night.


I have a 51MB text file, compressed to about 1MB. Are you interested to 
get it?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: ZFS deadlock in 14

2023-08-17 Thread Alexander Motin

On 17.08.2023 15:41, Dag-Erling Smørgrav wrote:

Alexander Motin  writes:

Trying to run your test (so far without reproduction) I see it
producing a substantial amount of ZIL writes.  The range of commits
you reduced the scope to so far includes my ZIL locking refactoring,
where I know for sure there are some deadlocks.  I have already been
waiting 3 weeks now for reviews and tests for the PR that should fix it:
https://github.com/openzfs/zfs/pull/15122 .  It would be good if you
could test it, though it seems to depend on a few more earlier patches
not merged to FreeBSD yet.


Do you have a FreeBSD branch with your patch applied?


I don't have a FreeBSD branch, but these two patches apply cleanly and 
build on top of today's FreeBSD main branch:


https://github.com/openzfs/zfs/pull/15107
https://github.com/openzfs/zfs/pull/15122

And if you still experience the issue, please show all stacks, or at 
least include ZFS sync threads.


--
Alexander Motin



Re: ZFS deadlock in 14

2023-08-17 Thread Alexander Motin

On 17.08.2023 14:57, Alexander Motin wrote:

On 15.08.2023 12:28, Dag-Erling Smørgrav wrote:

Mateusz Guzik  writes:

Going through the list may or may not reveal other threads doing
something in the area and it very well may be they are deadlocked,
which then results in other processes hanging on them.

Just like in your case the process reported as hung is a random victim
and whatever the real culprit is deeper.


We already know the real culprit, see upthread.


Dag, I looked through the thread once more, and, while I thank you for 
tracing it, you never went beyond txg_wait_synced() in the `zfs revert` 
thread.  If you are saying that thread is holding the lock, then the 
question is why the transaction commit is stuck.  I need to see stacks for 
the ZFS sync threads, or better all kernel stacks, just in case.  Without 
that information I can only speculate.


Trying to run your test (so far without reproduction) I see it producing 
a substantial amount of ZIL writes.  The range of commits you reduced 
the scope to so far includes my ZIL locking refactoring, where I know 
for sure there are some deadlocks.  I have already been waiting 3 weeks now 
for reviews and tests for the PR that should fix it: 
https://github.com/openzfs/zfs/pull/15122 .  It would be good if you 
could test it, though it seems to depend on a few more earlier patches not 
merged to FreeBSD yet.


Ah, it appears that on the pool I tested first I had sync=always from earlier 
tests, which explains the high amount of ZIL traffic I saw, so it may be 
irrelevant.  But I still wonder what the sync threads are doing in your case.


--
Alexander Motin



Re: ZFS deadlock in 14

2023-08-17 Thread Alexander Motin

On 15.08.2023 12:28, Dag-Erling Smørgrav wrote:

Mateusz Guzik  writes:

Going through the list may or may not reveal other threads doing
something in the area and it very well may be they are deadlocked,
which then results in other processes hanging on them.

Just like in your case the process reported as hung is a random victim
and whatever the real culprit is deeper.


We already know the real culprit, see upthread.


Dag, I looked through the thread once more, and, while I thank you for 
tracing it, you never went beyond txg_wait_synced() in the `zfs revert` 
thread.  If you are saying that thread is holding the lock, then the 
question is why the transaction commit is stuck.  I need to see stacks for 
the ZFS sync threads, or better all kernel stacks, just in case.  Without 
that information I can only speculate.


Trying to run your test (so far without reproduction) I see it producing 
a substantial amount of ZIL writes.  The range of commits you reduced 
the scope to so far includes my ZIL locking refactoring, where I know 
for sure there are some deadlocks.  I have already been waiting 3 weeks now 
for reviews and tests for the PR that should fix it: 
https://github.com/openzfs/zfs/pull/15122 .  It would be good if you 
could test it, though it seems to depend on a few more earlier patches not 
merged to FreeBSD yet.


--
Alexander Motin



Re: Defaulting serial communication to 115200 bps for FreeBSD 14

2023-08-16 Thread Alexander Motin

On 16.08.2023 18:14, Dennis Clarke wrote:

The default serial communications config on most telecom equipment that
I have seen ( in the last forty years ) defaults to 9600 8n1. If people
want something faster from FreeBSD then do the trivial :

     set comconsole_speed="115200"
     set console="comconsole"

Is that not trivial enough?


Except it is not telecom equipment 40 years ago.  Even at the 115200 that 
I routinely use on my development systems I feel serial console output 
affects verbose boot time and kernel console debugging output.  I also 
have BIOS console redirection enabled on my systems, and I believe the 
default there is also 115200, and even that is pretty slow.  I see no 
point in staying compatible if it is unusable.


--
Alexander Motin



Re: Speed improvements in ZFS

2023-08-16 Thread Alexander Leidinger

Am 2023-08-15 23:29, schrieb Mateusz Guzik:

On 8/15/23, Alexander Leidinger  wrote:

Am 2023-08-15 14:41, schrieb Mateusz Guzik:


With this in mind can you provide: sysctl kern.maxvnodes
vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes
vfs.recycles_free vfs.recycles


After a reboot:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 24696
vfs.vnodes_created: 1658162
vfs.numvnodes: 173937
vfs.recycles_free: 0
vfs.recycles: 0


New values after one rund of periodic:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 356202
vfs.vnodes_created: 427696288
vfs.numvnodes: 532620
vfs.recycles_free: 20213257
vfs.recycles: 0


Meanwhile if there is tons of recycles, you can damage control by
bumping kern.maxvnodes.


What's the difference between recycles and recycles_free? Does the above 
count as bumping the maxvnodes?



Looks like there are not many free ones directly after the reboot. I will
check the values tomorrow after the periodic run again and maybe
increase by 10x or 100x to see if it makes a difference.


If this is not the problem you can use dtrace to figure it out.


dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or
something else?



I mean checking where find is spending time instead of speculating.

There is no productized way to do it so to speak, but the following
crapper should be good enough:

[script]

I will let it run this night.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: Speed improvements in ZFS

2023-08-15 Thread Alexander Leidinger

Am 2023-08-15 14:41, schrieb Mateusz Guzik:


With this in mind can you provide: sysctl kern.maxvnodes
vfs.wantfreevnodes vfs.freevnodes vfs.vnodes_created vfs.numvnodes
vfs.recycles_free vfs.recycles


After a reboot:
kern.maxvnodes: 10485760
vfs.wantfreevnodes: 2621440
vfs.freevnodes: 24696
vfs.vnodes_created: 1658162
vfs.numvnodes: 173937
vfs.recycles_free: 0
vfs.recycles: 0


Meanwhile if there is tons of recycles, you can damage control by
bumping kern.maxvnodes.


Looks like there are not many free ones directly after the reboot. I will 
check the values tomorrow after the periodic run again and maybe 
increase by 10x or 100x to see if it makes a difference.
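
The bump itself is a one-liner; 104857600 is the 10x step from my 
current value:

---snip---
sysctl -n kern.maxvnodes        # currently 10485760 here
sysctl kern.maxvnodes=104857600 # 10x, as a first experiment
---snip---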



If this is not the problem you can use dtrace to figure it out.


dtrace-count on vnlru_read_freevnodes() and vnlru_free_locked()? Or 
something else?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: Strange network issues with -current

2023-08-15 Thread Alexander Leidinger

Am 2023-08-15 14:24, schrieb Alexander Leidinger:

Am 2023-08-15 13:48, schrieb Alexander Leidinger:

for a while now I have had some strange network issues in some parts of a 
particular system.


I just stumbled upon the mail which discusses issues with commit 
e3ba0d6adde3, and when I look into it I see changes related to the 
use of SO_REUSEPORT flags, and all my nginx systems use the reuseport 
directive in their config. I'm compiling right now with this change 
reverted. Once tested I will report back.


Unfortunately it wasn't that.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Re: Strange network issues with -current

2023-08-15 Thread Alexander Leidinger

Am 2023-08-15 13:48, schrieb Alexander Leidinger:

for a while now I have had some strange network issues in some parts of a 
particular system.


I just stumbled upon the mail which discusses issues with commit 
e3ba0d6adde3, and when I look into it I see changes related to the use 
of SO_REUSEPORT flags, and all my nginx systems use the reuseport 
directive in their config. I'm compiling right now with this change 
reverted. Once tested I will report back.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Speed improvements in ZFS

2023-08-15 Thread Alexander Leidinger

Hi,

just a report that I noticed a very high speed improvement in ZFS in 
-current. Since a looong time (at least since last year), for a jail-host 
of mine with more than 20 jails on it, each of which runs periodic daily, 
the periodic daily runs of the jails took from about 3 am to 5 pm 
or longer. I don't remember when this started, and I thought at the 
time that the problem may be data related. It's the long runs of "find" 
in one of the periodic daily jobs which take that long, and the number 
of jails together with the null-mounted basesystem inside each jail and a 
null-mounted package repository inside each jail, the number of files, and 
concurrent access to the spinning rust with first SSD and now NVMe based 
cache may have reached some tipping point. I have all the periodic daily 
mails around, so theoretically I may be able to find out when this started, 
but as can be seen in another mail to this mailing list, the system which 
has all the periodic mails has some issues which have higher priority 
for me to track down...


Since I updated to a src from 2023-07-20, this is not the case anymore. 
The data is the same (maybe even a bit more, as I have added 2 more 
jails since then, and the periodic daily runs, which run more or less in 
parallel, are not taking considerably longer). The speed increase with 
the July build is in the area of 3-4 hours for 23 parallel periodic 
daily runs. So instead of finishing the periodic runs around 5 pm, they 
finish already around 1 pm/2 pm.


So whatever was done inside ZFS or VFS or nullfs between 2023-06-19 and 
2023-07-20 has given a huge speed improvement. From my memory I would 
say there is still room for improvement, as I think it may be the case 
that the periodic daily runs used to end in the morning instead of the 
afternoon, but my memory may be flaky in this regard...


Great work to whoever was involved.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF



Strange network issues with -current

2023-08-15 Thread Alexander Leidinger

Hi,

for a while now I have had some strange network issues in some parts of a 
particular system.


A build with src from 2023-07-26 was still working OK. An update to 
2023-08-07 broke some parts in a strange way. Trying again with src 
from 2023-08-11 didn't fix things.


What I see is... strange and complex.

I have a jail host with about 23 jails. All the jails are sitting on a 
bridge and have IPv6 and IPv4 addresses. One jail is a DNS server for a 
domain which contains all the DNS entries for all the jails on the 
system (and more). Other jails have mysql (the FS socket for mysql is 
nullfs-mounted into other jails for connecting to mysql via the FS 
socket instead of the network), a dovecot IMAP server, a postfix SMTP 
server, an nginx based reverse proxy and 2 different kinds of webmail 
solutions (an old php74 based one on the way out in favour of a php81 
based one), a wiki and other things.

With the old working basesystem I can log in to the old webmail system 
and read mails. With the newer non-working basesystem I can still log in, 
but the auth credentials are not stored in the backend session and as 
such no mail is listed at all, as this requires subsequent connections 
from php to dovecot. This webmail system goes via the reverse proxy 
to the webmail jail, which has another nginx configured to connect to the 
php-fpm backend.
With the new webmail system I can log in, read mails, and am even 
writing this email from it. The first login to it fails. The second 
succeeds. It is not behind the reverse proxy (as it is not fully ready 
yet for access from the outside (DSL with NAT on the DSL box to the 
reverse proxy)), but a single nginx with a php-fpm backend (instead of 2 
nginx + php-fpm as in the old webmail).


The wiki behind the reverse proxy is sometimes working and sometimes 
not. Sometimes it provides everything, sometimes parts of the site are 
missing (e.g. pictures / icons). Sometimes there is simply a blank 
page, sometimes it gives an error message from the wiki about an 
unforeseen bug...


The error message in the nginx reverse proxy log for all the strange 
failure cases is "accept4() failed (53: Software caused connection 
abort)". Sometimes I get "upstream timed out" instead. When it times 
out in the reverse proxy instead of producing the accept4 errors, I see 
the same accept4 error message in the nginx inside the wiki or webmail 
jail instead.
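
For readers who have not seen error 53 before: accept4() returns 
ECONNABORTED ("Software caused connection abort") when a queued 
connection is torn down before the server gets around to accepting it. 
Servers normally treat that as transient and retry. A minimal sketch of 
such a wrapper follows (purely illustrative, the function name is made 
up; it is not meant as a fix for the regression described here, which 
tracks a base system update):

#include <sys/socket.h>

#include <errno.h>

int
accept_retry(int lsock)
{
	for (;;) {
		int s = accept(lsock, NULL, NULL);

		if (s >= 0)
			return (s);
		if (errno == ECONNABORTED || errno == EINTR)
			continue;	/* transient: peer vanished or signal */
		return (-1);		/* real error: let the caller decide */
	}
}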


I tried to recompile all the components of the wiki, the reverse proxy 
and the php81-based webmail, to no avail. The issue persists.


Does this ring a bell for someone? Maybe some network, socket or VM 
related changes in this timeframe which smell like they could be 
related, and would be good candidates for a back-out test? Any ideas 
how to drill down with debugging to get a simpler test case than the 
complex setup of if_bridge, epair, jails, wiki, php, nginx, ...?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF



Re: RTM_NEWNEIGH message for static ARP entry

2023-06-21 Thread Alexander Chernikov


On Wed, 21 Jun 2023, at 5:19 PM, Hartmut Brandt wrote:
> Hi,
> 
> when I set a static ARP entry I see an RTM_NEWNEIGH message on a netlink 
> socket as expected, but the ndm_state is NUD_INCOMPLETE. Shouldn't this be 
> NUD_NOARP? At least this is what Linux returns.
Thanks for the report, I’ll take a look.
To me, NUD_REACHABLE | NUD_PERMANENT looks better suited for the particular 
case, but I’ll dive deeper tomorrow. Anyway NUD_INCOMPLETE is certainly wrong.
> 
> Cheers,
> Harti
> 
> 

/Alexander


Re: kernel: sonewconn: pcb 0xfffff8002b255a00 (local:/var/run/devd.seqpacket.pipe): Listen queue overflow: 1 already in queue awaiting acceptance (60 occurrences), ?

2023-06-21 Thread Alexander Leidinger

Quoting Gary Jennejohn  (from Tue, 20 Jun 2023 14:41:41 +):


On Tue, 20 Jun 2023 12:04:13 +0200
Alexander Leidinger  wrote:



"listen X backlog=y" and "sysctl kern.ipx.somaxconn=X" for FreeBSD



On my FreeBSD14 system these things are all under kern.ipc.


Typo on my side... it was supposed to read ipc, not ipx.

Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF


pgpTjpqaetzBB.pgp
Description: Digital PGP signature


Re: kernel: sonewconn: pcb 0xfffff8002b255a00 (local:/var/run/devd.seqpacket.pipe): Listen queue overflow: 1 already in queue awaiting acceptance (60 occurrences), ?

2023-06-20 Thread Alexander Leidinger


Quoting Gary Jennejohn  (from Tue, 20 Jun 2023 07:41:08 +):


On Tue, 20 Jun 2023 06:25:05 +0100
Graham Perrin  wrote:


Please, what's the meaning of the sonewconn lines?



sonewconn is described in socket(9).  Below a copy/paste of the description
from socket(9):

 Protocol implementations can use sonewconn() to create a socket and
 attach protocol state to that socket.  This can be used to create new
 sockets available for soaccept() on a listen socket.  The returned
 socket has a reference count of zero.

Apparently there was already a socket in the listen queue which had not been
consumed by soaccept() when a new sonewconn() call was made.

Anyway, that's my understanding.  Might be wrong.


In other words, the software listening on it didn't process the requests  
fast enough and a backlog piled up (e.g. apache's ListenBacklog or nginx's  
"listen X backlog=y", and "sysctl kern.ipx.somaxconn=X" for FreeBSD  
itself). You may need faster hardware, more processes/threads to handle  
the traffic, or to configure your software to do less work to produce the  
same result (e.g. no real-time DNS resolution in the logging of a  
webserver, or increasing the number of allowed items in the backlog). If  
you can change the software, there is also the possibility to switch from  
blocking to non-blocking sockets (so the select/accept loop does not block  
or run into contention) or to kqueue; a minimal sketch of that follows.
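
A minimal sketch of such a kqueue-driven accept loop, under assumptions: 
the port number is arbitrary, the worker hand-off is elided, and error 
handling is reduced to the essentials. The listen() backlog is clamped 
by the kernel to kern.ipc.somaxconn.

#include <sys/types.h>
#include <sys/event.h>
#include <sys/socket.h>

#include <netinet/in.h>

#include <arpa/inet.h>
#include <err.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct sockaddr_in sin;
	struct kevent kev;
	int kq, lsock;

	if ((lsock = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		err(1, "socket");
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(8080);		/* arbitrary example port */
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	if (bind(lsock, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		err(1, "bind");
	if (listen(lsock, 1024) == -1)		/* clamped to somaxconn */
		err(1, "listen");
	if (fcntl(lsock, F_SETFL, O_NONBLOCK) == -1)
		err(1, "fcntl");

	if ((kq = kqueue()) == -1)
		err(1, "kqueue");
	EV_SET(&kev, lsock, EVFILT_READ, EV_ADD, 0, 0, NULL);
	if (kevent(kq, &kev, 1, NULL, 0, NULL) == -1)
		err(1, "kevent");

	for (;;) {
		if (kevent(kq, NULL, 0, &kev, 1, NULL) < 1)
			continue;
		/* For a listen socket, kev.data is the backlog length. */
		while (kev.data-- > 0) {
			int s = accept(lsock, NULL, NULL);

			if (s == -1)
				break;	/* drained (or transient error) */
			/* hand s off to a worker here */
			close(s);
		}
	}
}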


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF


pgpAjQQlBmAmQ.pgp
Description: Digital PGP signature


Re: ifconfig dumps core and gdb uses an undefined symbol

2023-06-14 Thread Alexander Chernikov



> On 14 Jun 2023, at 11:35, Gary Jennejohn  wrote:
> 
> On Wed, 14 Jun 2023 11:05:31 +0100
> Alexander Chernikov  wrote:
> 
>>> On 14 Jun 2023, at 10:53, Gary Jennejohn  wrote:
>>> 
>>> On Wed, 14 Jun 2023 09:01:35 +
>>> Gary Jennejohn mailto:ga...@gmx.de>> wrote:
>>> 
>>>> On Wed, 14 Jun 2023 09:09:04 +0100
>>>> Alexander Chernikov  wrote:
>>>> 
>>>>>> On 14 Jun 2023, at 08:59, Gary Jennejohn  wrote:
>>>>> Hi Gary,
>>>>>> 
>>>>>> So, now I have a new problem with current.
>>>>>> 
>>>>>> I just now updated my current sources and ran buildworld and buildkernel,
>>>>>> since Gleb fixed the WITHOUT_PF problem.
>>>>>> 
>>>>>> After installing the new world and kernel I see that ifconfig is dumping
>>>>>> a core, apparently when it tries to show lo0, since re0 is correctly
>>>>>> shown:
>>>>>> 
>>>>>> ifconfig
>>>>>> re0: flags=8843 metric 0 mtu 4088
>>>>>> options=82098
>>>>>> ether redacted
>>>>>> inet 192.168.178.XXX netmask 0xff00 broadcast 192.168.178.255
>>>>>> Segmentation fault (core dumped)
>>>>> Could you please try to narrow down the crashing command? e.g.
>>>>> ifconfig lo0
>>>>> ifconfig lo0 inet
>>>>> ifconfig lo0 inet6
>>>>> Could you try to rebuild ifconfig w/o netlink (e.g. set 
>>>>> WITHOUT_NETLINK=yes in the make.conf & make -C sbin/ifconfig clean all 
>>>>> install) and see if the new binary works?
>>>>> 
>>>> 
>>>> I already have WITHOUT_NETLINK=yes in my /etc/src.conf.
>>>> 
>>>> I didn't install ifconfig. I simply started it from the build directory.
>>>> 
>>>> ifconfig lo0 shows the settings for lo0 and then dumps core.
>>>> 
>>> 
>>> After your most recent changes "ifconfig re0" and "ifconfig lo0" don't
>>> result in any errors.  But "ifconfig" alone still results in a core
>>> dump, which per gdb is happening in the strlcpy() call at in_status_tunnel()
>>> in af_inet.c.
>> Indeed.
>> 
>> diff --git a/sbin/ifconfig/ifconfig.c b/sbin/ifconfig/ifconfig.c
>> index d30d3e1909ae..6a80ad5763b2 100644
>> --- a/sbin/ifconfig/ifconfig.c
>> +++ b/sbin/ifconfig/ifconfig.c
>> @@ -822,6 +822,7 @@ list_interfaces_ioctl(if_ctx *ctx)
>>continue;
>>if (!group_member(ifa->ifa_name, args->matchgroup, 
>> args->nogroup))
>>continue;
>> +   ctx->ifname = cp;
>>/*
>> * Are we just listing the interfaces?
>> */
>> 
>> Does this one fix the crash?
>>> 
> 
> YES!
Should be fixed by 52ff8883185a then.
Thank you for the report and sorry for the breakage!
> 
> --
> Gary Jennejohn
> 




Re: ifconfig dumps core and gdb uses an undefined symbol

2023-06-14 Thread Alexander Chernikov


> On 14 Jun 2023, at 10:53, Gary Jennejohn  wrote:
> 
> On Wed, 14 Jun 2023 09:01:35 +
> Gary Jennejohn mailto:ga...@gmx.de>> wrote:
> 
>> On Wed, 14 Jun 2023 09:09:04 +0100
>> Alexander Chernikov  wrote:
>> 
>>>> On 14 Jun 2023, at 08:59, Gary Jennejohn  wrote:
>>> Hi Gary,
>>>> 
>>>> So, now I have a new problem with current.
>>>> 
>>>> I just now updated my current sources and ran buildworld and buildkernel,
>>>> since Gleb fixed the WITHOUT_PF problem.
>>>> 
>>>> After installing the new world and kernel I see that ifconfig is dumping
>>>> a core, apparently when it tries to show lo0, since re0 is correctly
>>>> shown:
>>>> 
>>>> ifconfig
>>>> re0: flags=8843 metric 0 mtu 4088 
>>>> options=82098
>>>>  ether redacted
>>>>  inet 192.168.178.XXX netmask 0xff00 broadcast 192.168.178.255
>>>> Segmentation fault (core dumped)
>>> Could you please try to narrow down the crashing command? e.g.
>>> ifconfig lo0
>>> ifconfig lo0 inet
>>> ifconfig lo0 inet6
>>> Could you try to rebuild ifconfig w/o netlink (e.g. set WITHOUT_NETLINK=yes 
>>> in the make.conf & make -C sbin/ifconfig clean all install) and see if the 
>>> new binary works?
>>> 
>> 
>> I already have WITHOUT_NETLINK=yes in my /etc/src.conf.
>> 
>> I didn't install ifconfig. I simply started it from the build directory.
>> 
>> ifconfig lo0 shows the settings for lo0 and then dumps core.
>> 
> 
> After your most recent changes "ifconfig re0" and "ifconfig lo0" don't
> result in any errors.  But "ifconfig" alone still results in a core
> dump, which per gdb is happening in the strlcpy() call at in_status_tunnel()
> in af_inet.c.
Indeed.

diff --git a/sbin/ifconfig/ifconfig.c b/sbin/ifconfig/ifconfig.c
index d30d3e1909ae..6a80ad5763b2 100644
--- a/sbin/ifconfig/ifconfig.c
+++ b/sbin/ifconfig/ifconfig.c
@@ -822,6 +822,7 @@ list_interfaces_ioctl(if_ctx *ctx)
continue;
if (!group_member(ifa->ifa_name, args->matchgroup, 
args->nogroup))
continue;
+   ctx->ifname = cp;
/*
 * Are we just listing the interfaces?
 */

Does this one fix the crash?
> 
> --
> Gary Jennejohn



Re: ifconfig dumps core and gdb uses an undefined symbol

2023-06-14 Thread Alexander Chernikov


> On 14 Jun 2023, at 10:01, Gary Jennejohn  wrote:
> 
> On Wed, 14 Jun 2023 09:09:04 +0100
> Alexander Chernikov mailto:melif...@freebsd.org>> 
> wrote:
> 
>>> On 14 Jun 2023, at 08:59, Gary Jennejohn  wrote:
>> Hi Gary,
>>> 
>>> So, now I have a new problem with current.
>>> 
>>> I just now updated my current sources and ran buildworld and buildkernel,
>>> since Gleb fixed the WITHOUT_PF problem.
>>> 
>>> After installing the new world and kernel I see that ifconfig is dumping
>>> a core, apparently when it tries to show lo0, since re0 is correctly
>>> shown:
>>> 
>>> ifconfig
>>> re0: flags=8843 metric 0 mtu 4088 
>>> options=82098
>>>  ether redacted
>>>  inet 192.168.178.XXX netmask 0xff00 broadcast 192.168.178.255
>>> Segmentation fault (core dumped)
>> Could you please try to narrow down the crashing command? e.g.
>> ifconfig lo0
>> ifconfig lo0 inet
>> ifconfig lo0 inet6
>> Could you try to rebuild ifconfig w/o netlink (e.g. set WITHOUT_NETLINK=yes 
>> in the make.conf & make -C sbin/ifconfig clean all install) and see if the 
>> new binary works?
>> 
> 
> I already have WITHOUT_NETLINK=yes in my /etc/src.conf.
> 
> I didn't install ifconfig. I simply started it from the build directory.
> 
> ifconfig lo0 shows the settings for lo0 and then dumps core.
> 
>>> 
>>> Unfortunately, I see this error message when I try to look at the core
>>> file with gdb:
>>> 
>>> gdb /sbin/ifconfig ifconfig.core
>>> ld-elf.so.1: Undefined symbol "rl_eof_found" referenced from COPY
>>> relocation in /usr/local/bin/gdb
>> Not a specialist here, but if you could build the binary with debug (make
>> DEBUG_FLAGS="-O0 -g3" sbin/ifconfig clean all install) & share the
>> binary & core with me, I could take a look on what's happening.
>>> 
> 
> I compiled gbd under /usr/ports and it now works, although it's emitting
> some weird errors.
> 
> -O0 -g3 removes too much and gdb shows no useful information.
> 
> With just -g3 I get this output from gdb after running the newly compiled
> ifconfig:
> 
> Program terminated with signal SIGSEGV, Segmentation fault
> warning: Section `.reg-xstate/100294' in core file too small.
> #0  lagg_status (ctx=0x2f051660ba00) at /usr/src/sbin/ifconfig/iflagg.c:223
> 223 const int verbose = ctx->args->verbose;
> (gdb) bt
> #0  lagg_status (ctx=0x2f051660ba00) at /usr/src/sbin/ifconfig/iflagg.c:223
> #1  0x2efcf610ea55 in af_other_status (ctx=0x2f051660ba00)
>at /usr/src/sbin/ifconfig/ifconfig.c:964
> #2  status (args=0x2f051660ba70, ifa=0x2f051a2f2000, sdl=)
>at /usr/src/sbin/ifconfig/ifconfig.c:1788
> #3  list_interfaces_ioctl (args=0x2f051660ba70)
>at /usr/src/sbin/ifconfig/ifconfig.c:845
> #4  list_interfaces (args=0x2f051660ba70)
>at /usr/src/sbin/ifconfig/ifconfig.c:428
> #5  main (ac=, av=)
>at /usr/src/sbin/ifconfig/ifconfig.c:724
> (gdb)
> 
> I looked at ctx:
> 
> (gdb) p ctx
> $1 = (if_ctx *) 0x2f051660ba00
> (gdb) p/x *0x2f051660ba00
> $2 = 0x0 <==
> (gdb)
> 
> So, looks like the problem is in iflagg and ctx is NULL.
Ack. Does bbad5525fabf fix the issue?
> 
> --
> Gary Jennejohn



Re: ifconfig dumps core and gdb uses an undefined symbol

2023-06-14 Thread Alexander Chernikov


> On 14 Jun 2023, at 08:59, Gary Jennejohn  wrote:
Hi Gary,
> 
> So, now I have a new problem with current.
> 
> I just now updated my current sources and ran buildworld and buildkernel,
> since Gleb fixed the WITHOUT_PF problem.
> 
> After installing the new world and kernel I see that ifconfig is dumping
> a core, apparently when it tries to show lo0, since re0 is correctly
> shown:
> 
> ifconfig
> re0: flags=8843 metric 0 mtu 4088 
> options=82098
>   ether redacted
>   inet 192.168.178.XXX netmask 0xff00 broadcast 192.168.178.255
> Segmentation fault (core dumped)
Could you please try to narrow down the crashing command? e.g.
ifconfig lo0
ifconfig lo0 inet
ifconfig lo0 inet6
Could you try to rebuild ifconfig w/o netlink (e.g. set WITHOUT_NETLINK=yes in 
the make.conf & make -C sbin/ifconfig clean all install) and see if the new 
binary works?

> 
> Unfortunately, I see this error message when I try to look at the core
> file with gdb:
> 
> gdb /sbin/ifconfig ifconfig.core
> ld-elf.so.1: Undefined symbol "rl_eof_found" referenced from COPY
> relocation in /usr/local/bin/gdb
Not a specialist here, but if you could build the binary with debug (make 
DEBUG_FLAGS="-O0 -g3" sbin/ifconfig clean all install) & share the binary & 
core with me, I could take a look on what's happening.
> 
> pkg claims that my packages are all up to date.
> 
> Not exactly a fatal error, but still rather surprising.
> 
> --
> Gary Jennejohn
> 




Re: panic(s) in ZFS on CURRENT

2023-06-09 Thread Alexander Motin

Hi Gleb,

There are two probably related PRs upstream:
https://github.com/openzfs/zfs/pull/14939
https://github.com/openzfs/zfs/pull/14954

On 09.06.2023 00:57, Gleb Smirnoff wrote:

On Thu, Jun 08, 2023 at 07:56:07PM -0700, Gleb Smirnoff wrote:
T> I'm switching to INVARIANTS kernel right now and will see if that panics 
earlier.

This is what I got with INVARIANTS:

panic: VERIFY3(dev->l2ad_hand <= dev->l2ad_evict) failed (225142071296 <= 225142063104)

cpuid = 17
time = 1686286015
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfe0160dcea90
kdb_backtrace() at kdb_backtrace+0x46/frame 0xfe0160dceb40
vpanic() at vpanic+0x21f/frame 0xfe0160dcebe0
spl_panic() at spl_panic+0x4d/frame 0xfe0160dcec60
l2arc_write_buffers() at l2arc_write_buffers+0xcda/frame 0xfe0160dcedf0
l2arc_feed_thread() at l2arc_feed_thread+0x547/frame 0xfe0160dceec0
fork_exit() at fork_exit+0x122/frame 0xfe0160dcef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe0160dcef30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 1m4s
Dumping 5473 out of 65308 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

(kgdb) frame 4
#4  0x804342ea in l2arc_write_buffers (spa=0xfe022e942000, 
dev=0xfe023116a000, target_sz=16777216)
 at /usr/src/FreeBSD/sys/contrib/openzfs/module/zfs/arc.c:9445
9445            ASSERT3U(dev->l2ad_hand, <=, dev->l2ad_evict);
(kgdb) p dev
$1 = (l2arc_dev_t *) 0xfe023116a000
(kgdb) p dev->l2ad_hand
$2 = 225142071296
(kgdb) p dev->l2ad_evict
$3 = 225142063104
(kgdb) p *dev
value of type `l2arc_dev_t' requires 66136 bytes, which is more than 
max-value-size

I have never before seen kgdb being unable to print a structure because it is 
reported to be too big.
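
For reference: that limit is gdb's max-value-size guard (it defaults to 
64 KiB, and 66136 bytes is just above that), which kgdb inherits. 
Assuming a stock gdb/kgdb, it can be raised at the prompt before 
printing the structure:

(kgdb) set max-value-size unlimited
(kgdb) p *dev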



--
Alexander Motin



Re: Error building kernel in current

2023-06-02 Thread Alexander Chernikov

On Fri, 2 Jun 2023, at 4:30 PM, Gary Jennejohn wrote:
> On Fri, 2 Jun 2023 09:59:40 +
> Gary Jennejohn  wrote:
> 
> > On Fri, 2 Jun 2023 09:56:44 +
> > Gary Jennejohn  wrote:
> >
> > > Error building kernel in current:
> > >
> > > --
> > > >>> stage 3.1: building everything
> > > --
> > > /usr/src/sys/netlink/route/iface.c:1315:22: error: use of undeclared
> > > identifier 'if_flags'
> > > if (error == 0 && !(if_flags & IFF_UP) && (if_getflags(ifp) & 
> > > IFF_UP))
> > > ^
> > > 1 error generated.
> > > --- iface.o ---
> > > *** [iface.o] Error code 1
Sorry for the breakage, I’ll fix it in a couple of hours.
> > >
> > > My source tree was updated just a few minutes ago and I didn't see any
> > > recent changes to iface.c.
> > >
> > > I have WITHOUT_NETLINK_SUPPORT= in my src.conf.
> > >
> >
> > Ah, my error.  The failure occurs while building the kernel, so I fixed
> > Subject accordingly.
> >
> 
> OK, this is another INET6 error.  I don't have INET6 enabled.
> 
> At line 1280 we have:
> #ifdef INET6
> int if_flags = if_getflags(ifp);
> #endif
> 
> and if_flags is used at line 1315 without checking whether INET6 is
> defined.
> 
> if_flags seems to be totally redundant, since the code at line 1315 will
> invoke if_getflags(ifp) if !(if_flags & IFF_UP) is true.
I wish it were true. The case here is that interface flags can change after 
adding the address, as many interface drivers silently bring the interface up 
upon the first address addition. Please see the description of 
https://cgit.freebsd.org/src/commit/sys/netinet6?id=a77facd27368f618520d25391cfce11149879a41
for a more detailed explanation. A sketch of the pattern follows.
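
A toy, self-contained model of that pattern (the flag value and the 
helper names are illustrative stand-ins, not the kernel API): take a 
snapshot of the flags before the address is added, so a driver silently 
setting IFF_UP can be detected afterwards.

#include <stdio.h>

#define IFF_UP 0x1	/* stand-in value for the illustration */

static int g_flags;	/* stands in for the interface flag word */

static int
get_flags(void)
{
	return (g_flags);
}

static int
add_address(void)
{
	g_flags |= IFF_UP;	/* driver silently brings the interface up */
	return (0);
}

int
main(void)
{
	int if_flags = get_flags();	/* snapshot BEFORE the change */
	int error = add_address();

	if (error == 0 && !(if_flags & IFF_UP) && (get_flags() & IFF_UP))
		printf("interface came up as a side effect\n");
	return (0);
}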
> 
> --
> Gary Jennejohn
> 
> 

/Alexander


Re: Surprise null root password

2023-05-31 Thread Alexander Leidinger
Quoting bob prohaska  (from Tue, 30 May 2023  
08:36:21 -0700):



I suggest reviewing the changes ("df" instead of "tf" in etcupdate) for at least
those files which you know you have modified, including the password/group
files. After that you can decide if the diff which is shown with "df" can be
applied ("tf"), or if you want to keep the old version ("mf"), or if you
want to modify the current file ("e", with both versions present in the file
so that you can copy/paste between the different versions and keep what you
need).



The key sequences required to copy and paste between files in the edit screen
were elusive. Probably it was thought self-evident, but not for me. I last
tried it long ago, via mergemaster. Is there a guide to the commands for
merging files using etcupdate? Is it in the vi man page? I couldn't find it.


etcupdate respects the EDITOR env-variable. You can use any editor you  
like there.


Typically I copy with the mouse myself, and google it every time I  
can't (https://linuxize.com/post/how-to-copy-cut-paste-in-vim/).


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF


pgp4RaBvVQJjb.pgp
Description: Digital PGP signature


Re: Surprise null root password

2023-05-30 Thread Alexander Leidinger


Quoting bob prohaska  (from Fri, 26 May 2023  
16:26:06 -0700):



On Fri, May 26, 2023 at 10:55:49PM +0200, Yuri wrote:


The question is how you update the configuration files,
mergemaster/etcupdate/something else?



Via etcupdate after installworld. In the event the system
requests manual intervention I accept "theirs all". It seems
odd if that can null a root password.

Still, it does seem an outside possibility. I could see it adding
system users, but messing with root's existing password seems a
bit unexpected.


As you are posting to -current@, I assume you are reporting this issue  
for a 14-current system. As such: there was a "recent" change  
(2021-10-20) to the root entry which changed the shell.
 
https://cgit.freebsd.org/src/commit/etc/master.passwd?id=d410b585b6f00a26c2de7724d6576a3ea7d548b7


By blindly accepting all changes, you have reset the root password to the  
default setting (empty).


I suggest reviewing the changes ("df" instead of "tf" in etcupdate) for at  
least those files which you know you have modified, including the  
password/group files. After that you can decide if the diff which is  
shown with "df" can be applied ("tf"), or if you want to keep the old  
version ("mf"), or if you want to modify the current file ("e", with  
both versions present in the file so that you can copy/paste between  
the different versions and keep what you need).


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF


pgpGEjDP92h3s.pgp
Description: Digital PGP signature


Re: buildworld fails due to error in af_inet.c

2023-05-22 Thread Alexander Chernikov
Sorry for the breakage (and thanks to markj@ for the prompt fix)

> On 22 May 2023, at 16:00, Gary Jennejohn  wrote:
> 
> I just ran buildworld using the latest current source.
> 
> It dies due to this error in line 385 of /usr/src/sbin/ifconfig/af_inet.c:
> 
> static void
> warn_nomask(ifflags)
> 
> The compiler really doesn't like not seeing a type for ifflags and bails
> out as a result.
> 
> Strangely enough, in_proc() a few lines later clearly has int ifflags in
> its list of variables.
> 
> Setting ifflags to int in warn_nomask() fixes the build.
> 
> Wasn't this compile tested before it was committed?
It was & it didn’t yell on my setup.
> 
> --
> Gary Jennejohn
> 




Re: change in compat/linux breaking net/citrix_ica

2023-04-26 Thread Alexander Leidinger
Quoting Jakob Alvermark  (from Wed, 26 Apr 2023  
09:01:00 +0200):



Hi,


I use net/citrix_ica for work.

After a recent change to -current in compat/linux it no longer  
works. The binary just segfaults.


What does "sysctl compat.linux.osrelease" display? If it is not 2.6.30  
or higher, try to set it to 2.6.30 or higher.
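
For completeness, the same check can be done from C via sysctlbyname(3); 
a minimal read-only sketch (setting the value still goes through 
sysctl(8) as root):

#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdio.h>

int
main(void)
{
	char buf[64];
	size_t len = sizeof(buf);

	/* Read the Linux kernel version advertised by linux(4). */
	if (sysctlbyname("compat.linux.osrelease", buf, &len, NULL, 0) == -1) {
		perror("sysctlbyname");
		return (1);
	}
	printf("compat.linux.osrelease: %s\n", buf);
	return (0);
}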


Bye,
Alexander.


I have bisected and it happened after this commit:

commit 40c36c4674eb9602709cf9d0483a4f34ad9753f6
Author: Dmitry Chagin 
Date:   Sat Apr 22 22:17:17 2023 +0300

    linux(4): Export the AT_RANDOM depending on the process osreldata

    AT_RANDOM has appeared in the 2.6.30 Linux kernel first time.



--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF


pgpvwszGFGPAo.pgp
Description: Digital PGP signature


Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Alexander Leidinger
Quoting Mark Millard  (from Wed, 12 Apr 2023  
22:28:13 -0700):



A fair number of errors are of the form: the build installs a
previously built package for use in the builder, but later the
builder cannot find some file from the package's installation.


As a data point, last year I had such issues with one particular  
package. It was consistent no matter how often I updated the ports  
tree. Poudriere always failed on port X, which depended on port Y  
(I don't remember the names). The problem was that port Y was built  
successfully, but an extract of it was missing a file it was supposed  
to have. IIRC I fixed the issue by building port Y manually, as  
re-building port Y with poudriere didn't change the outcome.


So it seems this may not be specific to the most recent ZFS version,  
but could be an older issue. It may be the case that the more recent  
ZFS version amplifies the problem. It could also be related to a  
specific use case in poudriere.


I remember a recent mail which talks about poudriere failing to copy  
files in resource-limited environments, see  
https://lists.freebsd.org/archives/dev-commits-src-all/2023-April/025153.html
While the issue you are trying to pinpoint may not be related to that  
discussion, I mention it because it smells to me like a similar  
combination of otherwise unrelated FreeBSD features could be what  
triggers the issue at hand.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF


pgpjoaPNf5aAM.pgp
Description: Digital PGP signature


Re: /usr/src/sys/netlink/route/iface.c:738:1: warning: unused function

2023-04-08 Thread Alexander Chernikov



> On 8 Apr 2023, at 20:21, Gary Jennejohn  wrote:
> 
> This isn't a fatal error, but it would be easy to fix:
> 
> /usr/src/sys/netlink/route/iface.c:738:1: warning: unused function 
> 'inet6_get_plen' [-Wunused-function]
> inet6_get_plen(const struct in6_addr *addr)
> ^
> 1 warning generated.
> 
> This function is called in get_sa_plen(const struct sockaddr *sa) and the
> call is done inside #ifdef INET6...#endif, whereas the implementation is
> NOT inside #ifdef INET6...#endif, as it should be.
Thanks for the report, should be fixed by 39c0036d881b.
> 
> I do not have INET6 in my kernel config file.
> 
> --
> Gary Jennejohn
> 
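
As a side note for readers hitting similar warnings in their own code, 
here is a toy standalone model of the guard pattern at issue (the names 
are illustrative, not the iface.c code): a static helper and its only 
call site have to sit under the same #ifdef, otherwise a build without 
the option is left with an unused function.

#include <stdio.h>

#define INET6		/* comment out to mimic a kernel without INET6 */

#ifdef INET6
static int
get_plen_demo(void)
{
	return (128);	/* pretend prefix length */
}
#endif

int
main(void)
{
#ifdef INET6
	/* The call site is guarded exactly like the definition. */
	printf("plen=%d\n", get_plen_demo());
#else
	printf("no INET6\n");
#endif
	return (0);
}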



