Re: Possible issue with linux xattr support?

2023-09-04 Thread Felix Palmen
* Dmitry Chagin  [20230904 21:49]:
> On Mon, Sep 04, 2023 at 08:43:02PM +0200, Felix Palmen wrote:
> > I guess I'll now do a full rebuild of my linuxulator userspace branch on
> > a kernel with that patch, just to be sure it won't break anything else,
> > this will take a few hours, I will report back. So far looking really
> > good :)

This completed while I was sleeping. No issues!

With ~120 "Linux ports" building fine, using all sorts of tools (like
python and some tools from coreutils), I think it's fair to assume it's
all fixed now.

Cheers, Felix

-- 
 Felix Palmen  {private}   fe...@palmen-it.de
 -- ports committer -- {web}  http://palmen-it.de
 {pgp public key}  http://palmen-it.de/pub.txt
 {pgp fingerprint} 6936 13D5 5BBF 4837 B212  3ACC 54AD E006 9879 F231


signature.asc
Description: PGP signature


Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

2023-09-04 Thread Mark Millard
On Sep 4, 2023, at 18:39, Mark Millard  wrote:

> On Sep 4, 2023, at 10:05, Alexander Motin  wrote:
> 
>> On 04.09.2023 11:45, Mark Millard wrote:
>>> On Sep 4, 2023, at 06:09, Alexander Motin  wrote:
 per_txg_dirty_frees_percent is directly related to the delete delays we 
 see here.  You are forcing ZFS to commit transactions each 5% of dirty ARC 
 limit, which is 5% of 10% or memory size.  I haven't looked on that code 
 recently, but I guess setting it too low can make ZFS commit transactions 
 too often, increasing write inflation for the underlying storage.  I would 
 propose you to restore the default and try again.
>>> While this machine is different, the original problem was worse than
>>> the issue here: the load average was less than 1 for the most part
>>> the parallel bulk build when 30 was used. The fraction of time waiting
>>> was much longer than with 5. If I understand right, both too high and
>>> too low for a type of context can lead to increased elapsed time and
>>> getting it set to a near optimal is a non-obvious exploration.
>> 
>> IIRC this limit was modified several times since originally implemented.  
>> May be it could benefit from another look, if default 30% is not good.  It 
>> would be good if generic ZFS issues like this were reported to OpenZFS 
>> upstream to be visible to a wider public.  Unfortunately I have several 
>> other project I must work on, so if it is not a regression I can't promise 
>> I'll take it right now, so anybody else is welcome.
> 
> As I understand, there are contexts were 5 is inappropriate
> and 30 works fairly well: no good single answer as to what
> value range will avoid problems.
> 
>>> An overall point for the goal of my activity is: what makes a
>>> good test context for checking if ZFS is again safe to use?
>>> May be other tradeoffs make, say, 4 hardware threads more
>>> reasonable than 32.
>> 
>> Thank you for your testing.  The best test is one that nobody else run. It 
>> also correlates with the topic of "safe to use", which also depends on what 
>> it is used for. :)
> 
> Looks like use of a M.2 Samsung SSD 960 PRO 1TB with a
> non-debug FreeBSD build is suitable for the bulk -a -J128
> test (no ALLOW_MAKE_JOBS variants enabled, USE_TMPFS=no in
> use) on the 32 hardware thread system. (The swap partition
> in use is from the normal environment's PCIe Optane media.)
> The %idle and the load averages and %user stayed reasonable
> in a preliminary test. One thing it does introduce is trim
> management (both available and potentially useful). (Optane
> media does not support or need it.) No
> per_txg_dirty_frees_percent adjustment involved (still 5).
> 
> I've learned to not use ^T for fear of /bin/sh aborting
> and messing up poudriere's context. So I now monitor with:
> 
> # poudriere status -b
> 
> in a separate ssh session.
> 
> I'll note that I doubt I'd try for a complete bulk -a .
> I'd probably stop it if I notice that the number of
> active builders drops off for a notable time (normal
> waiting for prerequisites appearing to be why).
> 
> 

So much for that idea. It has reached a state of staying
under 3500 w/s and up to 4.5ms/w (normally above 2ms/w).
%busy wondering in the range 85% to 101%. Lots of top
showing tx->tx. There is some read and other activity as
well. Of course the kBps figures are larger than the
earlier USB3 context (larger kB figures).

It reached about 1350 port->package builds over the first
hour after the 2nd "Buildee started".

autotrim is off. Doing a "zpool trim -w zamd64" leads to
. . . larger w/s figures during the process. So
more exploring to do at some point. Possibly:

autotrim
per_txg_dirty_frees_percent

For now, I'm just running "zpool trim -w zamd64" once
and a while so the process continues better.

Still no evidence of deadlocks. No evidence of builds
failing for corruptions.

. . . At around the end of 2nd hour: 2920 or so built. 

Still no evidence of deadlocks. No evidence of builds
failing for corruptions.

. . . I've turned on autotrim without stopping the bulk
build.

. . . At around the end of 3rd hour: 4080 or so built. 

Still no evidence of deadlocks. No evidence of builds
failing for corruptions.

Looks like the % idle has been high for a significant
time. I think I'll stop this specific test and clean
things out.

Looks like lang/guile* are examples of not respecting
the lack of ALLOW_MAKE_JOBS use.

Hmm. The ^C ended up with:

^C[03:41:07] Error: Signal SIGINT caught, cleaning up and exiting
[main-amd64-bulk_a-default] [2023-09-04_17h49m28s] [sigint:] Queued: 34588 
Built: 4331  Failed: 12Skipped: 303   Ignored: 335   Fetched: 0 
Tobuild: 29607  Time: 03:41:05
[03:41:07] Logs: 
/usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-09-04_17h49m28s
[03:41:19] [95] [00:06:59] Finished databases/php80-pdo_sqlite | 
php80-pdo_sqlite-8.0.30: Failed: build-depends
[03:41:20] [60] [00:10:53] Finished www/php82-session | php82-session-8.2.9: 

Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

2023-09-04 Thread Mark Millard



On Sep 4, 2023, at 10:05, Alexander Motin  wrote:

> On 04.09.2023 11:45, Mark Millard wrote:
>> On Sep 4, 2023, at 06:09, Alexander Motin  wrote:
>>> per_txg_dirty_frees_percent is directly related to the delete delays we see 
>>> here.  You are forcing ZFS to commit transactions each 5% of dirty ARC 
>>> limit, which is 5% of 10% or memory size.  I haven't looked on that code 
>>> recently, but I guess setting it too low can make ZFS commit transactions 
>>> too often, increasing write inflation for the underlying storage.  I would 
>>> propose you to restore the default and try again.
>> While this machine is different, the original problem was worse than
>> the issue here: the load average was less than 1 for the most part
>> the parallel bulk build when 30 was used. The fraction of time waiting
>> was much longer than with 5. If I understand right, both too high and
>> too low for a type of context can lead to increased elapsed time and
>> getting it set to a near optimal is a non-obvious exploration.
> 
> IIRC this limit was modified several times since originally implemented.  May 
> be it could benefit from another look, if default 30% is not good.  It would 
> be good if generic ZFS issues like this were reported to OpenZFS upstream to 
> be visible to a wider public.  Unfortunately I have several other project I 
> must work on, so if it is not a regression I can't promise I'll take it right 
> now, so anybody else is welcome.

As I understand, there are contexts were 5 is inappropriate
and 30 works fairly well: no good single answer as to what
value range will avoid problems.

>> An overall point for the goal of my activity is: what makes a
>> good test context for checking if ZFS is again safe to use?
>> May be other tradeoffs make, say, 4 hardware threads more
>> reasonable than 32.
> 
> Thank you for your testing.  The best test is one that nobody else run. It 
> also correlates with the topic of "safe to use", which also depends on what 
> it is used for. :)

Looks like use of a M.2 Samsung SSD 960 PRO 1TB with a
non-debug FreeBSD build is suitable for the bulk -a -J128
test (no ALLOW_MAKE_JOBS variants enabled, USE_TMPFS=no in
use) on the 32 hardware thread system. (The swap partition
in use is from the normal environment's PCIe Optane media.)
The %idle and the load averages and %user stayed reasonable
in a preliminary test. One thing it does introduce is trim
management (both available and potentially useful). (Optane
media does not support or need it.) No
per_txg_dirty_frees_percent adjustment involved (still 5).

I've learned to not use ^T for fear of /bin/sh aborting
and messing up poudriere's context. So I now monitor with:

# poudriere status -b

in a separate ssh session.

I'll note that I doubt I'd try for a complete bulk -a .
I'd probably stop it if I notice that the number of
active builders drops off for a notable time (normal
waiting for prerequisites appearing to be why).


===
Mark Millard
marklmi at yahoo.com




FYI: ^T use during poudriere bulk vs. /bin/sh operation: I got a "Unsafe ckmalloc() call" Abort trap that left a mess

2023-09-04 Thread Mark Millard
During a (zfs based) poudriere bulk -a run a ^T got a:

Unsafe ckmalloc() call
Abort trap (core dumped)

from /bin/sh . AT that point it had
built something like 350 ports into
packages in .building/ . After the
last finish was reported for the
already building ports, I killed 2
stuck poudriere related processes
(the rest were gone already) and
then I used:

# poudriere jail -jmain-amd64-bulk_a -k
[00:06:03] Unmounting file systems
umount: unmount of 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/usr/tests failed: 
Device busy
umount: unmount of 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/usr/share failed: 
Device busy
umount: unmount of 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/usr/lib32 failed: 
Device busy
umount: unmount of 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/rescue failed: 
Device busy
umount: unmount of 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/usr/tests failed: 
Device busy
umount: unmount of 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/usr/share failed: 
Device busy
umount: unmount of 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/usr/lib32 failed: 
Device busy
umount: unmount of 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/rescue failed: 
Device busy
chflags: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/rescue/rcorder: 
Read-only file system
chflags: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/rescue/mdconfig: 
Read-only file system
chflags: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/rescue/mdmfs: 
Read-only file system
. . .
chflags: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/usr/tests/lib: 
Read-only file system
chflags: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/usr/tests: 
Read-only file system
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/rescue: Device 
busy
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/usr/lib32: 
Device busy
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/usr/share: 
Device busy
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/usr/tests: 
Device busy
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/usr: Directory 
not empty
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref: Directory not 
empty
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/dev: 
Device busy
rm: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/var/db/ports: 
Device busy
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/var/db: 
Directory not empty
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/var/empty: 
Operation not permitted
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/var: 
Directory not empty
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/rescue: 
Device busy
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/sbin/init: 
Operation not permitted
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/sbin: 
Directory not empty
rm: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/libexec/ld-elf.so.1:
 Operation not permitted
rm: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/libexec/ld-elf32.so.1:
 Operation not permitted
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/libexec: 
Directory not empty
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/packages: 
Device busy
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/proc: 
Device busy
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/distfiles: 
Device busy
rm: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/compat/linux/proc:
 Device busy
rm: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/compat/linux: 
Directory not empty
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/compat: 
Directory not empty
rm: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/lib/libthr.so.3:
 Operation not permitted
rm: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/lib/libc.so.7: 
Operation not permitted
rm: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/lib/libcrypt.so.5:
 Operation not permitted
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/lib: 
Directory not empty
rm: /usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/usr/src: 
Device busy
rm: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/usr/bin/crontab:
 Operation not permitted
rm: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/usr/bin/passwd:
 Operation not permitted
rm: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/usr/bin/chpass:
 Operation not permitted
rm: 
/usr/local/poudriere/data/.m/main-amd64-bulk_a-default/ref/../40/usr/bin/su: 

Re: Possible issue with linux xattr support?

2023-09-04 Thread Felix Palmen
* Dmitry Chagin  [20230904 21:49]:
> On Mon, Sep 04, 2023 at 08:43:02PM +0200, Felix Palmen wrote:
> > Thanks a lot for the quick patch! I have to admit I don't fully
> > understand it immediately, but I assume it's an attempt to only list
> > attributes not in system.* when that namespace is restricted?
> 
> no,
> +   if (error == EPERM)
> +   break;
> mean that listxattr will fail if user is unauthorized to access system
> namespace, ie, listxattr will return ENOTSUP

Ah, now I get it, so breaking out of the loop there will leave
attrnamespace at EXTATTR_NAMESPACE_SYSTEM if that was the cause,
therefore error_to_xattrerror() works as expected. Nice!

Well, given that I really don't expect any further problems, but give my
poor builder a few hours to *really* verify :)

Cheers, Felix

-- 
 Felix Palmen  {private}   fe...@palmen-it.de
 -- ports committer -- {web}  http://palmen-it.de
 {pgp public key}  http://palmen-it.de/pub.txt
 {pgp fingerprint} 6936 13D5 5BBF 4837 B212  3ACC 54AD E006 9879 F231


signature.asc
Description: PGP signature


Re: Possible issue with linux xattr support?

2023-09-04 Thread Dmitry Chagin
On Mon, Sep 04, 2023 at 08:43:02PM +0200, Felix Palmen wrote:
> Hello Dmitry,
> 
> * Dmitry Chagin  [20230904 21:29]:
> > Thanks for the report,
> > please, try this: https://people.freebsd.org/~dchagin/lxattr.patch
> > Don't be surprised, there 2 fixes
> 
> Thanks a lot for the quick patch! I have to admit I don't fully
> understand it immediately, but I assume it's an attempt to only list
> attributes not in system.* when that namespace is restricted?

no,
+   if (error == EPERM)
+   break;
mean that listxattr will fail if user is unauthorized to access system
namespace, ie, listxattr will return ENOTSUP

> 
> Well, it *does* solve the issue with 'cp -p' from GNU coreutils!
> 
> I guess I'll now do a full rebuild of my linuxulator userspace branch on
> a kernel with that patch, just to be sure it won't break anything else,
> this will take a few hours, I will report back. So far looking really
> good :)
> 

nice, thanks

> Cheers, Felix
> 
> -- 
>  Felix Palmen  {private}   fe...@palmen-it.de
>  -- ports committer -- {web}  http://palmen-it.de
>  {pgp public key}  http://palmen-it.de/pub.txt
>  {pgp fingerprint} 6936 13D5 5BBF 4837 B212  3ACC 54AD E006 9879 F231





Re: Possible issue with linux xattr support?

2023-09-04 Thread Felix Palmen
Hello Dmitry,

* Dmitry Chagin  [20230904 21:29]:
> Thanks for the report,
> please, try this: https://people.freebsd.org/~dchagin/lxattr.patch
> Don't be surprised, there 2 fixes

Thanks a lot for the quick patch! I have to admit I don't fully
understand it immediately, but I assume it's an attempt to only list
attributes not in system.* when that namespace is restricted?

Well, it *does* solve the issue with 'cp -p' from GNU coreutils!

I guess I'll now do a full rebuild of my linuxulator userspace branch on
a kernel with that patch, just to be sure it won't break anything else,
this will take a few hours, I will report back. So far looking really
good :)

Cheers, Felix

-- 
 Felix Palmen  {private}   fe...@palmen-it.de
 -- ports committer -- {web}  http://palmen-it.de
 {pgp public key}  http://palmen-it.de/pub.txt
 {pgp fingerprint} 6936 13D5 5BBF 4837 B212  3ACC 54AD E006 9879 F231


signature.asc
Description: PGP signature


Re: Possible issue with linux xattr support?

2023-09-04 Thread Dmitry Chagin
On Mon, Sep 04, 2023 at 04:03:54PM +0200, Felix Palmen wrote:
> * Felix Palmen  [20230904 15:39]:
> > For some reason (I still have to try to get more information about it,
> > will do soon), something else is broken now. While install from GNU
> > coreutils works fine, some build systems use 'cp -p' to install files
> > instead, which now fails with:
> > 
> > | cp: preserving permissions for [...]: Operation not permitted
> 
> The issue seems to be with listing attributes:
> 
> | root@15-default:/wrkdirs/usr/ports/x11/linuxsrc-libxcb/work/libxcb-1.15/doc 
> # truss /compat/linux/bin/cp -pR ./tutorial 
> '/wrkdirs/usr/ports/x11/linuxsrc-libxcb/work/stage/compat/linux/usr/share/doc/libxcb/'
>  2>&1 | grep xattr
> | linux_flistxattr(0x4,0x0,0x0)ERR#-1 'Operation not 
> permitted'
> | linux_flistxattr(0x4,0x0,0x0)ERR#-1 'Operation not 
> permitted'
> | linux_llistxattr(0xcdd9,0x0,0x0) ERR#-1 'Operation not 
> permitted'
> 
> The following q patch makes it work again:
> 
> #v+
> diff --git a/sys/compat/linux/linux_xattr.c b/sys/compat/linux/linux_xattr.c
> index 74b47f1cbaec..0b5af084b1b1 100644
> --- a/sys/compat/linux/linux_xattr.c
> +++ b/sys/compat/linux/linux_xattr.c
> @@ -198,7 +198,7 @@ listxattr(struct thread *td, struct listxattr_args *args)
> if (error == 0)
> td->td_retval[0] = cnt;
> free(data, M_LINUX);
> -   return (error_to_xattrerror(attrnamespace, error));
> +   return (error_to_xattrerror(EXTATTR_NAMESPACE_SYSTEM, error));
>  }
>  
>  int
> #v-
> 
> I think this makes sense, because listxattr iterates over all
> namespaces, so there's no sane way to know whether the EPERM was caused
> by trying to access "system" or something else.

Thanks for the report,
please, try this: https://people.freebsd.org/~dchagin/lxattr.patch
Don't be surprised, there 2 fixes


> 
> Cheers, Felix
> 
> -- 
>  Felix Palmen  {private}   fe...@palmen-it.de
>  -- ports committer -- {web}  http://palmen-it.de
>  {pgp public key}  http://palmen-it.de/pub.txt
>  {pgp fingerprint} 6936 13D5 5BBF 4837 B212  3ACC 54AD E006 9879 F231





Re: 14.0-CURRENT boots fine but keyboard does not work

2023-09-04 Thread Warner Losh
On Mon, Sep 4, 2023, 11:40 AM Michael Gmelin  wrote:

>
>
> On 4. Sep 2023, at 19:34, Matthias Apitz  wrote:
>
> El día lunes, septiembre 04, 2023 a las 07:29:41p. m. +0200, Michael
> Gmelin escribió:
>
>
>
> On 4. Sep 2023, at 19:23, Matthias Apitz  wrote:
>
>
> 
>
> Added Alexander Motin  to To: as the origin of the CI;
>
>
> Neither hw.atkbd.hz=1 nor hw.atkbd.hz=10 makes the keyboard working on
>
> my beloved Acer C720. Should I file a new PR?
>
>
>
> Filing a PR makes sense, could you please Cc me on it?
>
>
> Do you know which version of FreeBSD was the last that worked for you?
>
>
> I'm actually using (and typing this on it) r368166. Will file a PR
> tomorrow. Thanks
>
>
> This could also be related:
>
>
> https://cgit.freebsd.org/src/commit/?id=319d2bf407b3762da6f1c67ffe8dce2fee587aaf
>
> You could try to undo that patch and build a new kernel.
>

It shouldn't make a difference... but I may have been given bad advice if
it did... if it's this one, I'll help sort it out.

Warner

Best
> Michael
>
>
>
>matthias
>
> El día lunes, septiembre 04, 2023 a las 06:55:52p. m. +0200, Michael
> Gmelin escribió:
>
>
>
>
> On Mon, 4 Sep 2023 18:43:11 +0200
>
> Matthias Apitz  wrote:
>
>
> I have a 14.0-CURRENT compiled from sources of head from August 4,
>
> which boots fine from a produced USB key, but the keyboard does not
>
> work on an Acer C720 (amd64), on other laptops the keyboard is fine.
>
>
> The keyboard works during the boot menu (for example to enable verbose
>
> boot messages) but not on the login: prompt of the booted system.
>
>
> I've enabled SSH access into the C720 (if someone need more
>
> information) and I'm attaching /var/log/messages of the booted system.
>
>
> Hi Matthias,
>
>
> The C720 required special patches for the keyboard to work, which I
>
> originally added here:
>
>
> https://cgit.freebsd.org/src/commit/?id=6c176113bbdd598231ec47d161d4c3714997169b
>
>
> I assume that something in that area changed recently.
>
>
> Without digging into it, this looks like a possible cause:
>
>
>
> https://cgit.freebsd.org/src/commit/sys/dev/atkbdc/atkbd.c?id=ce881170088c4c98c036fe561f8ee8413c2e2585
>
>
> atkbd: Disable periodic polling by default.
>
> It is one of the few remaining Giant-locked callouts.  It would be
>
> good to remove it, not mentioning that polling itself is not good.
>
>
> If this cause keyboard/mouse freezes on some hardware, please set
>
> loader tunable hw.atkbd.hz=1 as workaround and report the issue.
>
>
> So you could try to set hw.atkbd.hz=1 (or hw.atkbd.hz=10) in
>
> /boot/loader.conf, then reboot and see if it helps.
>
>
> Best
>
> Michael
>
>
> --
>
> Michael Gmelin
>
>
>
> --
>
> Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/
> +49-176-38902045
>
> Public GnuPG key: http://www.unixarea.de/key.pub
>
>
>
>
> --
> Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/
> +49-176-38902045
> Public GnuPG key: http://www.unixarea.de/key.pub
>
>


Re: 14.0-CURRENT boots fine but keyboard does not work

2023-09-04 Thread Michael Gmelin


> On 4. Sep 2023, at 19:34, Matthias Apitz  wrote:
> 
> El día lunes, septiembre 04, 2023 a las 07:29:41p. m. +0200, Michael Gmelin 
> escribió:
> 
>> 
>> 
 On 4. Sep 2023, at 19:23, Matthias Apitz  wrote:
>>> 
>>> 
>>> Added Alexander Motin  to To: as the origin of the CI;
>>> 
>>> Neither hw.atkbd.hz=1 nor hw.atkbd.hz=10 makes the keyboard working on
>>> my beloved Acer C720. Should I file a new PR?
>>> 
>> 
>> Filing a PR makes sense, could you please Cc me on it?
>> 
>> Do you know which version of FreeBSD was the last that worked for you?
> 
> I'm actually using (and typing this on it) r368166. Will file a PR
> tomorrow. Thanks
> 

This could also be related:

https://cgit.freebsd.org/src/commit/?id=319d2bf407b3762da6f1c67ffe8dce2fee587aaf

You could try to undo that patch and build a new kernel.

Best
Michael



>matthias
> 
 El día lunes, septiembre 04, 2023 a las 06:55:52p. m. +0200, Michael 
 Gmelin escribió:
 
 
 
 On Mon, 4 Sep 2023 18:43:11 +0200
 Matthias Apitz  wrote:
 
> I have a 14.0-CURRENT compiled from sources of head from August 4,
> which boots fine from a produced USB key, but the keyboard does not
> work on an Acer C720 (amd64), on other laptops the keyboard is fine.
> 
> The keyboard works during the boot menu (for example to enable verbose
> boot messages) but not on the login: prompt of the booted system.
> 
> I've enabled SSH access into the C720 (if someone need more
> information) and I'm attaching /var/log/messages of the booted system.
 
 Hi Matthias,
 
 The C720 required special patches for the keyboard to work, which I
 originally added here:
 https://cgit.freebsd.org/src/commit/?id=6c176113bbdd598231ec47d161d4c3714997169b
 
 I assume that something in that area changed recently.
 
 Without digging into it, this looks like a possible cause:
 
 https://cgit.freebsd.org/src/commit/sys/dev/atkbdc/atkbd.c?id=ce881170088c4c98c036fe561f8ee8413c2e2585
 
 atkbd: Disable periodic polling by default.
 It is one of the few remaining Giant-locked callouts.  It would be
 good to remove it, not mentioning that polling itself is not good.
 
 If this cause keyboard/mouse freezes on some hardware, please set
 loader tunable hw.atkbd.hz=1 as workaround and report the issue.
 
 So you could try to set hw.atkbd.hz=1 (or hw.atkbd.hz=10) in
 /boot/loader.conf, then reboot and see if it helps.
 
 Best
 Michael
 
 -- 
 Michael Gmelin
 
>>> 
>>> -- 
>>> Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
>>> Public GnuPG key: http://www.unixarea.de/key.pub
>> 
>> 
> 
> -- 
> Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
> Public GnuPG key: http://www.unixarea.de/key.pub


Re: 14.0-CURRENT boots fine but keyboard does not work

2023-09-04 Thread Matthias Apitz
El día lunes, septiembre 04, 2023 a las 07:29:41p. m. +0200, Michael Gmelin 
escribió:

> 
> 
> > On 4. Sep 2023, at 19:23, Matthias Apitz  wrote:
> > 
> > 
> > Added Alexander Motin  to To: as the origin of the CI;
> > 
> > Neither hw.atkbd.hz=1 nor hw.atkbd.hz=10 makes the keyboard working on
> > my beloved Acer C720. Should I file a new PR?
> > 
> 
> Filing a PR makes sense, could you please Cc me on it?
> 
> Do you know which version of FreeBSD was the last that worked for you?

I'm actually using (and typing this on it) r368166. Will file a PR
tomorrow. Thanks

matthias

> >> El día lunes, septiembre 04, 2023 a las 06:55:52p. m. +0200, Michael 
> >> Gmelin escribió:
> >> 
> >> 
> >> 
> >> On Mon, 4 Sep 2023 18:43:11 +0200
> >> Matthias Apitz  wrote:
> >> 
> >>> I have a 14.0-CURRENT compiled from sources of head from August 4,
> >>> which boots fine from a produced USB key, but the keyboard does not
> >>> work on an Acer C720 (amd64), on other laptops the keyboard is fine.
> >>> 
> >>> The keyboard works during the boot menu (for example to enable verbose
> >>> boot messages) but not on the login: prompt of the booted system.
> >>> 
> >>> I've enabled SSH access into the C720 (if someone need more
> >>> information) and I'm attaching /var/log/messages of the booted system.
> >> 
> >> Hi Matthias,
> >> 
> >> The C720 required special patches for the keyboard to work, which I
> >> originally added here:
> >> https://cgit.freebsd.org/src/commit/?id=6c176113bbdd598231ec47d161d4c3714997169b
> >> 
> >> I assume that something in that area changed recently.
> >> 
> >> Without digging into it, this looks like a possible cause:
> >> 
> >>  
> >> https://cgit.freebsd.org/src/commit/sys/dev/atkbdc/atkbd.c?id=ce881170088c4c98c036fe561f8ee8413c2e2585
> >> 
> >>  atkbd: Disable periodic polling by default.
> >>  It is one of the few remaining Giant-locked callouts.  It would be
> >>  good to remove it, not mentioning that polling itself is not good.
> >> 
> >>  If this cause keyboard/mouse freezes on some hardware, please set
> >>  loader tunable hw.atkbd.hz=1 as workaround and report the issue.
> >> 
> >> So you could try to set hw.atkbd.hz=1 (or hw.atkbd.hz=10) in
> >> /boot/loader.conf, then reboot and see if it helps.
> >> 
> >> Best
> >> Michael
> >> 
> >> -- 
> >> Michael Gmelin
> >> 
> > 
> > -- 
> > Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
> > Public GnuPG key: http://www.unixarea.de/key.pub
> 
> 

-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: 14.0-CURRENT boots fine but keyboard does not work

2023-09-04 Thread Michael Gmelin



> On 4. Sep 2023, at 19:23, Matthias Apitz  wrote:
> 
> 
> Added Alexander Motin  to To: as the origin of the CI;
> 
> Neither hw.atkbd.hz=1 nor hw.atkbd.hz=10 makes the keyboard working on
> my beloved Acer C720. Should I file a new PR?
> 

Filing a PR makes sense, could you please Cc me on it?

Do you know which version of FreeBSD was the last that worked for you?

Cheers


> 
> 
>> El día lunes, septiembre 04, 2023 a las 06:55:52p. m. +0200, Michael Gmelin 
>> escribió:
>> 
>> 
>> 
>> On Mon, 4 Sep 2023 18:43:11 +0200
>> Matthias Apitz  wrote:
>> 
>>> I have a 14.0-CURRENT compiled from sources of head from August 4,
>>> which boots fine from a produced USB key, but the keyboard does not
>>> work on an Acer C720 (amd64), on other laptops the keyboard is fine.
>>> 
>>> The keyboard works during the boot menu (for example to enable verbose
>>> boot messages) but not on the login: prompt of the booted system.
>>> 
>>> I've enabled SSH access into the C720 (if someone need more
>>> information) and I'm attaching /var/log/messages of the booted system.
>> 
>> Hi Matthias,
>> 
>> The C720 required special patches for the keyboard to work, which I
>> originally added here:
>> https://cgit.freebsd.org/src/commit/?id=6c176113bbdd598231ec47d161d4c3714997169b
>> 
>> I assume that something in that area changed recently.
>> 
>> Without digging into it, this looks like a possible cause:
>> 
>>  
>> https://cgit.freebsd.org/src/commit/sys/dev/atkbdc/atkbd.c?id=ce881170088c4c98c036fe561f8ee8413c2e2585
>> 
>>  atkbd: Disable periodic polling by default.
>>  It is one of the few remaining Giant-locked callouts.  It would be
>>  good to remove it, not mentioning that polling itself is not good.
>> 
>>  If this cause keyboard/mouse freezes on some hardware, please set
>>  loader tunable hw.atkbd.hz=1 as workaround and report the issue.
>> 
>> So you could try to set hw.atkbd.hz=1 (or hw.atkbd.hz=10) in
>> /boot/loader.conf, then reboot and see if it helps.
>> 
>> Best
>> Michael
>> 
>> -- 
>> Michael Gmelin
>> 
> 
> -- 
> Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
> Public GnuPG key: http://www.unixarea.de/key.pub




Re: 14.0-CURRENT boots fine but keyboard does not work

2023-09-04 Thread Matthias Apitz


Added Alexander Motin  to To: as the origin of the CI;

Neither hw.atkbd.hz=1 nor hw.atkbd.hz=10 makes the keyboard working on
my beloved Acer C720. Should I file a new PR?

Thanks

matthias


El día lunes, septiembre 04, 2023 a las 06:55:52p. m. +0200, Michael Gmelin 
escribió:

> 
> 
> On Mon, 4 Sep 2023 18:43:11 +0200
> Matthias Apitz  wrote:
> 
> > I have a 14.0-CURRENT compiled from sources of head from August 4,
> > which boots fine from a produced USB key, but the keyboard does not
> > work on an Acer C720 (amd64), on other laptops the keyboard is fine.
> > 
> > The keyboard works during the boot menu (for example to enable verbose
> > boot messages) but not on the login: prompt of the booted system.
> > 
> > I've enabled SSH access into the C720 (if someone need more
> > information) and I'm attaching /var/log/messages of the booted system.
> 
> Hi Matthias,
> 
> The C720 required special patches for the keyboard to work, which I
> originally added here:
> https://cgit.freebsd.org/src/commit/?id=6c176113bbdd598231ec47d161d4c3714997169b
> 
> I assume that something in that area changed recently.
> 
> Without digging into it, this looks like a possible cause:
> 
>   
> https://cgit.freebsd.org/src/commit/sys/dev/atkbdc/atkbd.c?id=ce881170088c4c98c036fe561f8ee8413c2e2585
> 
>   atkbd: Disable periodic polling by default.
>   It is one of the few remaining Giant-locked callouts.  It would be
>   good to remove it, not mentioning that polling itself is not good.
> 
>   If this cause keyboard/mouse freezes on some hardware, please set
>   loader tunable hw.atkbd.hz=1 as workaround and report the issue.
> 
> So you could try to set hw.atkbd.hz=1 (or hw.atkbd.hz=10) in
> /boot/loader.conf, then reboot and see if it helps.
> 
> Best
> Michael
> 
> -- 
> Michael Gmelin
> 

-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub



Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

2023-09-04 Thread Alexander Motin

On 04.09.2023 11:45, Mark Millard wrote:

On Sep 4, 2023, at 06:09, Alexander Motin  wrote:

per_txg_dirty_frees_percent is directly related to the delete delays we see 
here.  You are forcing ZFS to commit transactions each 5% of dirty ARC limit, 
which is 5% of 10% or memory size.  I haven't looked on that code recently, but 
I guess setting it too low can make ZFS commit transactions too often, 
increasing write inflation for the underlying storage.  I would propose you to 
restore the default and try again.


While this machine is different, the original problem was worse than
the issue here: the load average was less than 1 for the most part
the parallel bulk build when 30 was used. The fraction of time waiting
was much longer than with 5. If I understand right, both too high and
too low for a type of context can lead to increased elapsed time and
getting it set to a near optimal is a non-obvious exploration.


IIRC this limit was modified several times since originally implemented. 
 May be it could benefit from another look, if default 30% is not good. 
 It would be good if generic ZFS issues like this were reported to 
OpenZFS upstream to be visible to a wider public.  Unfortunately I have 
several other project I must work on, so if it is not a regression I 
can't promise I'll take it right now, so anybody else is welcome.



An overall point for the goal of my activity is: what makes a
good test context for checking if ZFS is again safe to use?
May be other tradeoffs make, say, 4 hardware threads more
reasonable than 32.


Thank you for your testing.  The best test is one that nobody else run. 
It also correlates with the topic of "safe to use", which also depends 
on what it is used for. :)


--
Alexander Motin



Re: 14.0-CURRENT boots fine but keyboard does not work

2023-09-04 Thread Michael Gmelin



On Mon, 4 Sep 2023 18:43:11 +0200
Matthias Apitz  wrote:

> I have a 14.0-CURRENT compiled from sources of head from August 4,
> which boots fine from a produced USB key, but the keyboard does not
> work on an Acer C720 (amd64), on other laptops the keyboard is fine.
> 
> The keyboard works during the boot menu (for example to enable verbose
> boot messages) but not on the login: prompt of the booted system.
> 
> I've enabled SSH access into the C720 (if someone need more
> information) and I'm attaching /var/log/messages of the booted system.

Hi Matthias,

The C720 required special patches for the keyboard to work, which I
originally added here:
https://cgit.freebsd.org/src/commit/?id=6c176113bbdd598231ec47d161d4c3714997169b

I assume that something in that area changed recently.

Without digging into it, this looks like a possible cause:

  
https://cgit.freebsd.org/src/commit/sys/dev/atkbdc/atkbd.c?id=ce881170088c4c98c036fe561f8ee8413c2e2585

  atkbd: Disable periodic polling by default.
  It is one of the few remaining Giant-locked callouts.  It would be
  good to remove it, not mentioning that polling itself is not good.

  If this cause keyboard/mouse freezes on some hardware, please set
  loader tunable hw.atkbd.hz=1 as workaround and report the issue.

So you could try to set hw.atkbd.hz=1 (or hw.atkbd.hz=10) in
/boot/loader.conf, then reboot and see if it helps.

Best
Michael

-- 
Michael Gmelin



14.0-CURRENT boots fine but keyboard does not work

2023-09-04 Thread Matthias Apitz

I have a 14.0-CURRENT compiled from sources of head from August 4, which
boots fine from a produced USB key, but the keyboard does not work on an
Acer C720 (amd64), on other laptops the keyboard is fine.

The keyboard works during the boot menu (for example to enable verbose
boot messages) but not on the login: prompt of the booted system.

I've enabled SSH access into the C720 (if someone need more information)
and I'm attaching /var/log/messages of the booted system.

Any ideas?

matthias


-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
Sep  4 13:44:55 C720-1400094 syslogd: kernel boot file is /boot/kernel/kernel
Sep  4 13:44:55 C720-1400094 kernel: ---<>---
Sep  4 13:44:55 C720-1400094 kernel: APIC: Using the MADT enumerator.
Sep  4 13:44:55 C720-1400094 kernel: Copyright (c) 1992-2023 The FreeBSD 
Project.
Sep  4 13:44:55 C720-1400094 kernel: Copyright (c) 1979, 1980, 1983, 1986, 
1988, 1989, 1991, 1992, 1993, 1994
Sep  4 13:44:55 C720-1400094 kernel:The Regents of the University of 
California. All rights reserved.
Sep  4 13:44:55 C720-1400094 kernel: FreeBSD is a registered trademark of The 
FreeBSD Foundation.
Sep  4 13:44:55 C720-1400094 kernel: FreeBSD 14.0-CURRENT amd64 1400094 #0 
main-n264568-1d7ffb373c9d: Fri Aug 18 12:55:08 CEST 2023
Sep  4 13:44:55 C720-1400094 kernel: 
guru@jet:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
Sep  4 13:44:55 C720-1400094 kernel: FreeBSD clang version 16.0.6 
(https://github.com/llvm/llvm-project.git llvmorg-16.0.6-0-g7cbf1a259152)
Sep  4 13:44:55 C720-1400094 kernel: WARNING: WITNESS option enabled, expect 
reduced performance.
Sep  4 13:44:55 C720-1400094 kernel: PPIM 0: PA=0xa, VA=0x8281, 
size=0x1, mode=0
Sep  4 13:44:55 C720-1400094 kernel: pmap: large map 8 PML4 slots (4096 GB)
Sep  4 13:44:55 C720-1400094 kernel: VT(vga): resolution 640x480
Sep  4 13:44:55 C720-1400094 kernel: Preloaded elf kernel "/boot/kernel/kernel" 
at 0x81fee540.
Sep  4 13:44:55 C720-1400094 kernel: Preloaded boot_entropy_cache 
"/boot/entropy" at 0x81ff77c8.
Sep  4 13:44:55 C720-1400094 kernel: Preloaded TSLOG data "TSLOG" at 
0x81ff7820.
Sep  4 13:44:55 C720-1400094 kernel: Early TSC frequency 1396833720Hz 
calibrated from 8254 PIT
Sep  4 13:44:55 C720-1400094 kernel: CPU: Intel(R) Celeron(R) 2955U @ 1.40GHz 
(1396.83-MHz K8-class CPU)
Sep  4 13:44:55 C720-1400094 kernel:   Origin="GenuineIntel"  Id=0x40651  
Family=0x6  Model=0x45  Stepping=1
Sep  4 13:44:55 C720-1400094 kernel:   
Features=0xbfebfbff
Sep  4 13:44:55 C720-1400094 kernel:   
Features2=0x4ddaebbf
Sep  4 13:44:55 C720-1400094 kernel:   AMD 
Features=0x2c100800
Sep  4 13:44:55 C720-1400094 kernel:   AMD Features2=0x21
Sep  4 13:44:55 C720-1400094 kernel:   Structured Extended 
Features=0x2603
Sep  4 13:44:55 C720-1400094 kernel:   XSAVE Features=0x1
Sep  4 13:44:55 C720-1400094 kernel:   VT-x: (disabled in BIOS) Basic 
Features=0xda0400
Sep  4 13:44:55 C720-1400094 kernel: Pin-Based 
Controls=0x7f
Sep  4 13:44:55 C720-1400094 kernel: Primary Processor 
Controls=0xfff9fffe
Sep  4 13:44:55 C720-1400094 kernel: Secondary Processor 
Controls=0x1cff
Sep  4 13:44:55 C720-1400094 kernel: Exit 
Controls=0xda0400
Sep  4 13:44:55 C720-1400094 kernel: Entry Controls=0xda0400
Sep  4 13:44:55 C720-1400094 kernel: EPT 
Features=0x6334141
Sep  4 13:44:55 C720-1400094 kernel: VPID 
Features=0xf01
Sep  4 13:44:55 C720-1400094 kernel:   TSC: P-state invariant, performance 
statistics
Sep  4 13:44:55 C720-1400094 kernel: Data TLB: 2 MByte or 4 MByte pages, 4-way 
set associative, 32 entries and a separate array with 1 GByte pages, 4-way set 
associative, 4 entries
Sep  4 13:44:55 C720-1400094 kernel: Data TLB: 4 KB pages, 4-way set 
associative, 64 entries
Sep  4 13:44:55 C720-1400094 kernel: Instruction TLB: 2M/4M pages, fully 
associative, 8 entries
Sep  4 13:44:55 C720-1400094 kernel: Instruction TLB: 4KByte pages, 8-way set 
associative, 128 entries
Sep  4 13:44:55 C720-1400094 kernel: 64-Byte prefetching
Sep  4 13:44:55 C720-1400094 kernel: Shared 2nd-Level TLB: 4 KByte/2MByte 
pages, 8-way associative, 1024 entries
Sep  4 13:44:55 C720-1400094 kernel: L2 cache: 256 kbytes, 8-way associative, 
64 bytes/line
Sep  4 13:44:55 C720-1400094 kernel: real memory  = 4301258752 (4102 MB)
Sep  4 13:44:55 C720-1400094 kernel: Physical memory chunk(s):
Sep  4 13:44:55 C720-1400094 kernel: 0x0001 - 0x0002, 
131072 bytes (32 pages)
Sep  4 13:44:55 C720-1400094 kernel: 0x0004 - 0x0009dfff, 
385024 bytes (94 pages)
Sep  4 13:44:55 C720-1400094 kernel: 0x0010 - 0x001f, 
1048576 bytes (256 pages)
Sep  4 13:44:55 C720-1400094 kernel: 0x02402000 - 0x751aefff, 
1926942720 bytes (470445 pages)
Sep  4 13:44:55 C720-1400094 kernel: 0x7bc0 - 0x7bcf3fff, 

Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

2023-09-04 Thread Mark Millard
On Sep 4, 2023, at 06:09, Alexander Motin  wrote:

> On 04.09.2023 05:56, Mark Millard wrote:
>> On Sep 4, 2023, at 02:00, Mark Millard  wrote:
>>> On Sep 3, 2023, at 23:35, Mark Millard  wrote:
 On Sep 3, 2023, at 22:06, Alexander Motin  wrote:
> On 03.09.2023 22:54, Mark Millard wrote:
>> After that ^t produced the likes of:
>> load: 6.39  cmd: sh 4849 [tx->tx_quiesce_done_cv] 10047.33r 0.51u 
>> 121.32s 1% 13004k
> 
> So the full state is not "tx->tx", but is actually a 
> "tx->tx_quiesce_done_cv", which means the thread is waiting for new 
> transaction to be opened, which means some previous to be quiesced and 
> then synced.
> 
>> #0 0x80b6f103 at mi_switch+0x173
>> #1 0x80bc0f24 at sleepq_switch+0x104
>> #2 0x80aec4c5 at _cv_wait+0x165
>> #3 0x82aba365 at txg_wait_open+0xf5
>> #4 0x82a11b81 at dmu_free_long_range+0x151
> 
> Here it seems like transaction commit is waited due to large amount of 
> delete operations, which ZFS tries to spread between separate TXGs.
 
 That fit the context: cleaning out /usr/local/poudriere/data/.m/
 
> You should probably see some large and growing number in sysctl 
> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay .
 
 After the reboot I started a -J64 example. It has avoided the
 early "witness exhausted". Again I ^C'd after about an hours
 after the 2nd builder had started. So: again cleaning out
 /usr/local/poudriere/data/.m/ Only seconds between:
 
 # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
 kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276042
 
 # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
 kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276427
 
 # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
 kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 277323
 
 # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
 kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 278027
> 
> As expected, deletes trigger and wait for TXG commits.
> 
 I have found a measure of progress: zfs list's USED
 for /usr/local/poudriere/data/.m is decreasing. So
 ztop's d/s was a good classification: deletes.
 
>> #5 0x829a87d2 at zfs_rmnode+0x72
>> #6 0x829b658d at zfs_freebsd_reclaim+0x3d
>> #7 0x8113a495 at VOP_RECLAIM_APV+0x35
>> #8 0x80c5a7d9 at vgonel+0x3a9
>> #9 0x80c5af7f at vrecycle+0x3f
>> #10 0x829b643e at zfs_freebsd_inactive+0x4e
>> #11 0x80c598cf at vinactivef+0xbf
>> #12 0x80c590da at vput_final+0x2aa
>> #13 0x80c68886 at kern_funlinkat+0x2f6
>> #14 0x80c68588 at sys_unlink+0x28
>> #15 0x8106323f at amd64_syscall+0x14f
>> #16 0x8103512b at fast_syscall_common+0xf8
> 
> What we don't see here is what quiesce and sync threads of the pool are 
> actually doing.  Sync thread has plenty of different jobs, including 
> async write, async destroy, scrub and others, that all may delay each 
> other.
> 
> Before you rebooted the system, depending how alive it is, could you save 
> a number of outputs of `procstat -akk`, or at least specifically 
> `procstat -akk | grep txg_thread_enter` if the full is hard?  Or somehow 
> else observe what they are doing.
 
 # grep txg_thread_enter ~/mmjnk0[0-5].txt
 /usr/home/root/mmjnk00.txt:6 100881 zfskern 
 txg_thread_entermi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 
 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 
 fork_trampoline+0xe
 /usr/home/root/mmjnk00.txt:6 100882 zfskern 
 txg_thread_entermi_switch+0x173 sleepq_switch+0x104 
 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 
 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 
 fork_trampoline+0xe
 /usr/home/root/mmjnk01.txt:6 100881 zfskern 
 txg_thread_entermi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 
 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 
 fork_trampoline+0xe
 /usr/home/root/mmjnk01.txt:6 100882 zfskern 
 txg_thread_entermi_switch+0x173 sleepq_switch+0x104 
 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 zio_wait+0x3c9 
 dsl_pool_sync+0x139 spa_sync+0xc68 txg_sync_thread+0x2eb fork_exit+0x82 
 fork_trampoline+0xe
 /usr/home/root/mmjnk02.txt:6 100881 zfskern 
 txg_thread_entermi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 
 txg_thread_wait+0xeb txg_quiesce_thread+0x144 fork_exit+0x82 
 fork_trampoline+0xe
 /usr/home/root/mmjnk02.txt:6 100882 zfskern 
 txg_thread_entermi_switch+0x173 sleepq_switch+0x104 
 sleepq_timedwait+0x4b _cv_timedwait_sbt+0x188 

Re: Possible issue with linux xattr support?

2023-09-04 Thread Felix Palmen
* Felix Palmen  [20230904 15:39]:
> For some reason (I still have to try to get more information about it,
> will do soon), something else is broken now. While install from GNU
> coreutils works fine, some build systems use 'cp -p' to install files
> instead, which now fails with:
> 
> | cp: preserving permissions for [...]: Operation not permitted

The issue seems to be with listing attributes:

| root@15-default:/wrkdirs/usr/ports/x11/linuxsrc-libxcb/work/libxcb-1.15/doc # 
truss /compat/linux/bin/cp -pR ./tutorial 
'/wrkdirs/usr/ports/x11/linuxsrc-libxcb/work/stage/compat/linux/usr/share/doc/libxcb/'
 2>&1 | grep xattr
| linux_flistxattr(0x4,0x0,0x0)ERR#-1 'Operation not 
permitted'
| linux_flistxattr(0x4,0x0,0x0)ERR#-1 'Operation not 
permitted'
| linux_llistxattr(0xcdd9,0x0,0x0) ERR#-1 'Operation not 
permitted'

The following q patch makes it work again:

#v+
diff --git a/sys/compat/linux/linux_xattr.c b/sys/compat/linux/linux_xattr.c
index 74b47f1cbaec..0b5af084b1b1 100644
--- a/sys/compat/linux/linux_xattr.c
+++ b/sys/compat/linux/linux_xattr.c
@@ -198,7 +198,7 @@ listxattr(struct thread *td, struct listxattr_args *args)
if (error == 0)
td->td_retval[0] = cnt;
free(data, M_LINUX);
-   return (error_to_xattrerror(attrnamespace, error));
+   return (error_to_xattrerror(EXTATTR_NAMESPACE_SYSTEM, error));
 }
 
 int
#v-

I think this makes sense, because listxattr iterates over all
namespaces, so there's no sane way to know whether the EPERM was caused
by trying to access "system" or something else.

Cheers, Felix

-- 
 Felix Palmen  {private}   fe...@palmen-it.de
 -- ports committer -- {web}  http://palmen-it.de
 {pgp public key}  http://palmen-it.de/pub.txt
 {pgp fingerprint} 6936 13D5 5BBF 4837 B212  3ACC 54AD E006 9879 F231


signature.asc
Description: PGP signature


Re: Possible issue with linux xattr support?

2023-09-04 Thread Felix Palmen
Hello Dmitry,

* Dmitry Chagin  [20230830 14:04]:
> Thanks, I see, I agree with your change, taken into account.

Thanks a lot for committing the fix/workaround as well as the new
feature! I just upgraded my test builders to 15-CURRENT to double-check.

For some reason (I still have to try to get more information about it,
will do soon), something else is broken now. While install from GNU
coreutils works fine, some build systems use 'cp -p' to install files
instead, which now fails with:

| cp: preserving permissions for [...]: Operation not permitted

I'm pretty sure this worked fine with the patch you linked earlier, and
also with my own change taking the namespace into account ...

Did you change anything else in your commits compared to the patch?

I'll post a followup as soon as I could get some truss output to know
exactly which syscall causes the issue now!

Cheers, Felix

-- 
 Felix Palmen  {private}   fe...@palmen-it.de
 -- ports committer -- {web}  http://palmen-it.de
 {pgp public key}  http://palmen-it.de/pub.txt
 {pgp fingerprint} 6936 13D5 5BBF 4837 B212  3ACC 54AD E006 9879 F231


signature.asc
Description: PGP signature


Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

2023-09-04 Thread Alexander Motin

On 04.09.2023 05:56, Mark Millard wrote:

On Sep 4, 2023, at 02:00, Mark Millard  wrote:

On Sep 3, 2023, at 23:35, Mark Millard  wrote:

On Sep 3, 2023, at 22:06, Alexander Motin  wrote:

On 03.09.2023 22:54, Mark Millard wrote:

After that ^t produced the likes of:
load: 6.39  cmd: sh 4849 [tx->tx_quiesce_done_cv] 10047.33r 0.51u 121.32s 1% 
13004k


So the full state is not "tx->tx", but is actually a "tx->tx_quiesce_done_cv", 
which means the thread is waiting for new transaction to be opened, which means some previous to be 
quiesced and then synced.


#0 0x80b6f103 at mi_switch+0x173
#1 0x80bc0f24 at sleepq_switch+0x104
#2 0x80aec4c5 at _cv_wait+0x165
#3 0x82aba365 at txg_wait_open+0xf5
#4 0x82a11b81 at dmu_free_long_range+0x151


Here it seems like transaction commit is waited due to large amount of delete 
operations, which ZFS tries to spread between separate TXGs.


That fit the context: cleaning out /usr/local/poudriere/data/.m/


You should probably see some large and growing number in sysctl 
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay .


After the reboot I started a -J64 example. It has avoided the
early "witness exhausted". Again I ^C'd after about an hours
after the 2nd builder had started. So: again cleaning out
/usr/local/poudriere/data/.m/ Only seconds between:

# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276042

# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276427

# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 277323

# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 278027


As expected, deletes trigger and wait for TXG commits.


I have found a measure of progress: zfs list's USED
for /usr/local/poudriere/data/.m is decreasing. So
ztop's d/s was a good classification: deletes.


#5 0x829a87d2 at zfs_rmnode+0x72
#6 0x829b658d at zfs_freebsd_reclaim+0x3d
#7 0x8113a495 at VOP_RECLAIM_APV+0x35
#8 0x80c5a7d9 at vgonel+0x3a9
#9 0x80c5af7f at vrecycle+0x3f
#10 0x829b643e at zfs_freebsd_inactive+0x4e
#11 0x80c598cf at vinactivef+0xbf
#12 0x80c590da at vput_final+0x2aa
#13 0x80c68886 at kern_funlinkat+0x2f6
#14 0x80c68588 at sys_unlink+0x28
#15 0x8106323f at amd64_syscall+0x14f
#16 0x8103512b at fast_syscall_common+0xf8


What we don't see here is what quiesce and sync threads of the pool are 
actually doing.  Sync thread has plenty of different jobs, including async 
write, async destroy, scrub and others, that all may delay each other.

Before you rebooted the system, depending how alive it is, could you save a 
number of outputs of `procstat -akk`, or at least specifically `procstat -akk | 
grep txg_thread_enter` if the full is hard?  Or somehow else observe what they 
are doing.


# grep txg_thread_enter ~/mmjnk0[0-5].txt
/usr/home/root/mmjnk00.txt:6 100881 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk00.txt:6 100882 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
_cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk01.txt:6 100881 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk01.txt:6 100882 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
_cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk02.txt:6 100881 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk02.txt:6 100882 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
_cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk03.txt:6 100881 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe
/usr/home/root/mmjnk03.txt:6 100882 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
_cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
txg_sync_thread+0x2eb fork_exit+0x82 

Re: Speed improvements in ZFS

2023-09-04 Thread Mateusz Guzik
On 9/4/23, Alexander Leidinger  wrote:
> Am 2023-08-28 22:33, schrieb Alexander Leidinger:
>> Am 2023-08-22 18:59, schrieb Mateusz Guzik:
>>> On 8/22/23, Alexander Leidinger  wrote:
 Am 2023-08-21 10:53, schrieb Konstantin Belousov:
> On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:
>> Am 2023-08-20 23:17, schrieb Konstantin Belousov:
>> > On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
>> > > On 8/20/23, Alexander Leidinger  wrote:
>> > > > Am 2023-08-20 22:02, schrieb Mateusz Guzik:
>> > > >> On 8/20/23, Alexander Leidinger 
>> > > >> wrote:
>> > > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik:
>> > >  On 8/18/23, Alexander Leidinger 
>> > >  wrote:
>> > > >>>
>> > > > I have a 51MB text file, compressed to about 1MB. Are you
>> > > > interested
>> > > > to
>> > > > get it?
>> > > >
>> > > 
>> > >  Your problem is not the vnode limit, but nullfs.
>> > > 
>> > >  https://people.freebsd.org/~mjg/netchild-periodic-find.svg
>> > > >>>
>> > > >>> 122 nullfs mounts on this system. And every jail I setup has
>> > > >>> several
>> > > >>> null mounts. One basesystem mounted into every jail, and then
>> > > >>> shared
>> > > >>> ports (packages/distfiles/ccache) across all of them.
>> > > >>>
>> > >  First, some of the contention is notorious VI_LOCK in order
>> > >  to
>> > >  do
>> > >  anything.
>> > > 
>> > >  But more importantly the mind-boggling off-cpu time comes
>> > >  from
>> > >  exclusive locking which should not be there to begin with --
>> > >  as
>> > >  in
>> > >  that xlock in stat should be a slock.
>> > > 
>> > >  Maybe I'm going to look into it later.
>> > > >>>
>> > > >>> That would be fantastic.
>> > > >>>
>> > > >>
>> > > >> I did a quick test, things are shared locked as expected.
>> > > >>
>> > > >> However, I found the following:
>> > > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
>> > > >> mp->mnt_kern_flag |=
>> > > >> lowerrootvp->v_mount->mnt_kern_flag &
>> > > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
>> > > >> MNTK_EXTENDED_SHARED);
>> > > >> }
>> > > >>
>> > > >> are you using the "nocache" option? it has a side effect of
>> > > >> xlocking
>> > > >
>> > > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
>> > > >
>> > >
>> > > If you don't have "nocache" on null mounts, then I don't see how
>> > > this
>> > > could happen.
>> >
>> > There is also MNTK_NULL_NOCACHE on lower fs, which is currently set
>> > for
>> > fuse and nfs at least.
>>
>> 11 of those 122 nullfs mounts are ZFS datasets which are also NFS
>> exported.
>> 6 of those nullfs mounts are also exported via Samba. The NFS
>> exports
>> shouldn't be needed anymore, I will remove them.
> By nfs I meant nfs client, not nfs exports.

 No NFS client mounts anywhere on this system. So where is this
 exclusive
 lock coming from then...
 This is a ZFS system. 2 pools: one for the root, one for anything I
 need
 space for. Both pools reside on the same disks. The root pool is a
 3-way
 mirror, the "space-pool" is a 5-disk raidz2. All jails are on the
 space-pool. The jails are all basejail-style jails.

>>>
>>> While I don't see why xlocking happens, you should be able to dtrace
>>> or printf your way into finding out.
>>
>> dtrace looks to me like a faster approach to get to the root than
>> printf... my first naive try is to detect exclusive locks. I'm not 100%
>> sure I got it right, but at least dtrace doesn't complain about it:
>> ---snip---
>> #pragma D option dynvarsize=32m
>>
>> fbt:nullfs:null_lock:entry
>> /args[0]->a_flags & 0x08 != 0/
>> {
>> stack();
>> }
>> ---snip---
>>
>> In which direction should I look with dtrace if this works in tonights
>> run of periodic? I don't have enough knowledge about VFS to come up
>> with some immediate ideas.
>
> After your sysctl fix for maxvnodes I increased the amount of vnodes 10
> times compared to the initial report. This has increased the speed of
> the operation, the find runs in all those jails finished today after ~5h
> (@~8am) instead of in the afternoon as before. Could this suggest that
> in parallel some null_reclaim() is running which does the exclusive
> locks and slows down the entire operation?
>

That may be a slowdown to some extent, but the primary problem is
exclusive vnode locking for stat lookup, which should not be
happening.

-- 
Mateusz Guzik 



Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

2023-09-04 Thread Mark Millard
On Sep 4, 2023, at 02:00, Mark Millard  wrote:

> On Sep 3, 2023, at 23:35, Mark Millard  wrote:
> 
>> On Sep 3, 2023, at 22:06, Alexander Motin  wrote:
>> 
>>> 
>>> On 03.09.2023 22:54, Mark Millard wrote:
 After that ^t produced the likes of:
 load: 6.39  cmd: sh 4849 [tx->tx_quiesce_done_cv] 10047.33r 0.51u 121.32s 
 1% 13004k
>>> 
>>> So the full state is not "tx->tx", but is actually a 
>>> "tx->tx_quiesce_done_cv", which means the thread is waiting for new 
>>> transaction to be opened, which means some previous to be quiesced and then 
>>> synced.
>>> 
 #0 0x80b6f103 at mi_switch+0x173
 #1 0x80bc0f24 at sleepq_switch+0x104
 #2 0x80aec4c5 at _cv_wait+0x165
 #3 0x82aba365 at txg_wait_open+0xf5
 #4 0x82a11b81 at dmu_free_long_range+0x151
>>> 
>>> Here it seems like transaction commit is waited due to large amount of 
>>> delete operations, which ZFS tries to spread between separate TXGs.
>> 
>> That fit the context: cleaning out /usr/local/poudriere/data/.m/
>> 
>>> You should probably see some large and growing number in sysctl 
>>> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay .
>> 
>> After the reboot I started a -J64 example. It has avoided the
>> early "witness exhausted". Again I ^C'd after about an hours
>> after the 2nd builder had started. So: again cleaning out
>> /usr/local/poudriere/data/.m/ Only seconds between:
>> 
>> # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
>> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276042
>> 
>> # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
>> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276427
>> 
>> # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
>> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 277323
>> 
>> # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
>> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 278027
>> 
>> I have found a measure of progress: zfs list's USED
>> for /usr/local/poudriere/data/.m is decreasing. So
>> ztop's d/s was a good classification: deletes.
>> 
 #5 0x829a87d2 at zfs_rmnode+0x72
 #6 0x829b658d at zfs_freebsd_reclaim+0x3d
 #7 0x8113a495 at VOP_RECLAIM_APV+0x35
 #8 0x80c5a7d9 at vgonel+0x3a9
 #9 0x80c5af7f at vrecycle+0x3f
 #10 0x829b643e at zfs_freebsd_inactive+0x4e
 #11 0x80c598cf at vinactivef+0xbf
 #12 0x80c590da at vput_final+0x2aa
 #13 0x80c68886 at kern_funlinkat+0x2f6
 #14 0x80c68588 at sys_unlink+0x28
 #15 0x8106323f at amd64_syscall+0x14f
 #16 0x8103512b at fast_syscall_common+0xf8
>>> 
>>> What we don't see here is what quiesce and sync threads of the pool are 
>>> actually doing.  Sync thread has plenty of different jobs, including async 
>>> write, async destroy, scrub and others, that all may delay each other.
>>> 
>>> Before you rebooted the system, depending how alive it is, could you save a 
>>> number of outputs of `procstat -akk`, or at least specifically `procstat 
>>> -akk | grep txg_thread_enter` if the full is hard?  Or somehow else observe 
>>> what they are doing.
>> 
>> # procstat -akk > ~/mmjnk00.txt
>> # procstat -akk > ~/mmjnk01.txt
>> # procstat -akk > ~/mmjnk02.txt
>> # procstat -akk > ~/mmjnk03.txt
>> # procstat -akk > ~/mmjnk04.txt
>> # procstat -akk > ~/mmjnk05.txt
>> # grep txg_thread_enter ~/mmjnk0[0-5].txt
>> /usr/home/root/mmjnk00.txt:6 100881 zfskern txg_thread_enter 
>>mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
>> txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe 
>> /usr/home/root/mmjnk00.txt:6 100882 zfskern txg_thread_enter 
>>mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
>> _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
>> txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe 
>> /usr/home/root/mmjnk01.txt:6 100881 zfskern txg_thread_enter 
>>mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
>> txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe 
>> /usr/home/root/mmjnk01.txt:6 100882 zfskern txg_thread_enter 
>>mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
>> _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
>> txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe 
>> /usr/home/root/mmjnk02.txt:6 100881 zfskern txg_thread_enter 
>>mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
>> txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe 
>> /usr/home/root/mmjnk02.txt:6 100882 zfskern txg_thread_enter 
>>mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
>> _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
>> txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe 
>> /usr/home/root/mmjnk03.txt:

Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

2023-09-04 Thread Mark Millard
On Sep 3, 2023, at 23:35, Mark Millard  wrote:

> On Sep 3, 2023, at 22:06, Alexander Motin  wrote:
> 
>> 
>> On 03.09.2023 22:54, Mark Millard wrote:
>>> After that ^t produced the likes of:
>>> load: 6.39  cmd: sh 4849 [tx->tx_quiesce_done_cv] 10047.33r 0.51u 121.32s 
>>> 1% 13004k
>> 
>> So the full state is not "tx->tx", but is actually a 
>> "tx->tx_quiesce_done_cv", which means the thread is waiting for new 
>> transaction to be opened, which means some previous to be quiesced and then 
>> synced.
>> 
>>> #0 0x80b6f103 at mi_switch+0x173
>>> #1 0x80bc0f24 at sleepq_switch+0x104
>>> #2 0x80aec4c5 at _cv_wait+0x165
>>> #3 0x82aba365 at txg_wait_open+0xf5
>>> #4 0x82a11b81 at dmu_free_long_range+0x151
>> 
>> Here it seems like transaction commit is waited due to large amount of 
>> delete operations, which ZFS tries to spread between separate TXGs.
> 
> That fit the context: cleaning out /usr/local/poudriere/data/.m/
> 
>> You should probably see some large and growing number in sysctl 
>> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay .
> 
> After the reboot I started a -J64 example. It has avoided the
> early "witness exhausted". Again I ^C'd after about an hours
> after the 2nd builder had started. So: again cleaning out
> /usr/local/poudriere/data/.m/ Only seconds between:
> 
> # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276042
> 
> # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276427
> 
> # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 277323
> 
> # sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 278027
> 
> I have found a measure of progress: zfs list's USED
> for /usr/local/poudriere/data/.m is decreasing. So
> ztop's d/s was a good classification: deletes.
> 
>>> #5 0x829a87d2 at zfs_rmnode+0x72
>>> #6 0x829b658d at zfs_freebsd_reclaim+0x3d
>>> #7 0x8113a495 at VOP_RECLAIM_APV+0x35
>>> #8 0x80c5a7d9 at vgonel+0x3a9
>>> #9 0x80c5af7f at vrecycle+0x3f
>>> #10 0x829b643e at zfs_freebsd_inactive+0x4e
>>> #11 0x80c598cf at vinactivef+0xbf
>>> #12 0x80c590da at vput_final+0x2aa
>>> #13 0x80c68886 at kern_funlinkat+0x2f6
>>> #14 0x80c68588 at sys_unlink+0x28
>>> #15 0x8106323f at amd64_syscall+0x14f
>>> #16 0x8103512b at fast_syscall_common+0xf8
>> 
>> What we don't see here is what quiesce and sync threads of the pool are 
>> actually doing.  Sync thread has plenty of different jobs, including async 
>> write, async destroy, scrub and others, that all may delay each other.
>> 
>> Before you rebooted the system, depending how alive it is, could you save a 
>> number of outputs of `procstat -akk`, or at least specifically `procstat 
>> -akk | grep txg_thread_enter` if the full is hard?  Or somehow else observe 
>> what they are doing.
> 
> # procstat -akk > ~/mmjnk00.txt
> # procstat -akk > ~/mmjnk01.txt
> # procstat -akk > ~/mmjnk02.txt
> # procstat -akk > ~/mmjnk03.txt
> # procstat -akk > ~/mmjnk04.txt
> # procstat -akk > ~/mmjnk05.txt
> # grep txg_thread_enter ~/mmjnk0[0-5].txt
> /usr/home/root/mmjnk00.txt:6 100881 zfskern txg_thread_enter  
>   mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
> txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe 
> /usr/home/root/mmjnk00.txt:6 100882 zfskern txg_thread_enter  
>   mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
> _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
> txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe 
> /usr/home/root/mmjnk01.txt:6 100881 zfskern txg_thread_enter  
>   mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
> txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe 
> /usr/home/root/mmjnk01.txt:6 100882 zfskern txg_thread_enter  
>   mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
> _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
> txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe 
> /usr/home/root/mmjnk02.txt:6 100881 zfskern txg_thread_enter  
>   mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
> txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe 
> /usr/home/root/mmjnk02.txt:6 100882 zfskern txg_thread_enter  
>   mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
> _cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
> txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe 
> /usr/home/root/mmjnk03.txt:6 100881 zfskern txg_thread_enter  
>   mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
> 

Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

2023-09-04 Thread Mark Millard
On Sep 3, 2023, at 19:54, Mark Millard  wrote:

> ThreadRipper 1950X (32 hardware threads) doing bulk -J128
> with USE_TMPFS=no , no ALLOW_MAKE_JOBS , no
> ALLOW_MAKE_JOBS_PACKAGES , USB3 NVMe SSD storage/ZFS-boot-media,
> debug system build in use :
> 
> [00:03:44] Building 34214 packages using up to 128 builders
> [00:03:44] Hit CTRL+t at any time to see build progress and stats
> [00:03:44] [01] [00:00:00] Builder starting
> [00:04:37] [01] [00:00:53] Builder started
> [00:04:37] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.20.6
> [00:05:53] [01] [00:01:16] Finished ports-mgmt/pkg | pkg-1.20.6: Success
> [00:06:15] [01] [00:00:00] Building print/indexinfo | indexinfo-0.3.1
> [00:06:15] [02] [00:00:00] Builder starting
> . . .
> [00:06:18] [128] [00:00:00] Builder starting
> [00:07:42] [01] [00:01:27] Finished print/indexinfo | indexinfo-0.3.1: Success
> [00:07:45] [01] [00:00:00] Building devel/gettext-runtime | 
> gettext-runtime-0.22_1
> [00:18:45] [01] [00:11:00] Finished devel/gettext-runtime | 
> gettext-runtime-0.22_1: Success
> [00:19:06] [01] [00:00:00] Building devel/gmake | gmake-4.3_2
> [00:24:13] [01] [00:05:07] Finished devel/gmake | gmake-4.3_2: Success
> [00:24:39] [01] [00:00:00] Building devel/libtextstyle | libtextstyle-0.22
> [00:31:08] [125] [00:24:50] Builder started
> [00:31:08] [125] [00:00:00] Building print/t1utils | t1utils-1.32
> [00:31:15] [33] [00:25:00] Builder started
> [00:31:15] [81] [00:24:59] Builder started
> [00:31:15] [33] [00:00:00] Building databases/xapian-core | 
> xapian-core-1.4.23,1
> [00:31:15] [13] [00:25:00] Builder started
> [00:31:15] [81] [00:00:00] Building devel/bmake | bmake-20230723
> [00:31:15] [13] [00:00:00] Building devel/evdev-proto | evdev-proto-5.8
> [00:31:16] [41] [00:25:00] Builder started
> [00:31:16] [41] [00:00:00] Building devel/pcre | pcre-8.45_3
> . . .
> 
> (Looks like lang/go120 ignores the lack of ALLOW_MAKE_JOBS .
> There may be others that still have signficant parallel
> activity.)
> 
> [main-amd64-bulk_a-default] [2023-09-03_13h48m45s] [parallel_build:] Queued: 
> 34588 Built: 727   Failed: 1 Skipped: 40Ignored: 335   Fetched: 0 
> Tobuild: 33485  Time: 01:36:51
> 
> (So about 1 hr after the last "Builder starting" it had
> built 727.)
> 
> The vast majority of the time: lots of cpdup's with tx->tx
> showing most of the time for STATE but showing having some
> CPU time.
> 
> ^T commonly showed various Builders in starting PHASE for
> 3min..6min.
> 
> Around 66% mean Idle time (guess from watching top).
> 
> After ^C "gstat -spod" reports it is almost always writing
> 2200 to 2500 writes per second or so for *hours* (still
> going on).
> 
> ztop reports 1500 to 3200 d/s or so almost always for
> Dataset zamd64/poudriere/data/.m instead (also still going
> on). Normally no other Dataset is shown.
> 
> With all the disk I/O activity, this is definitely "live"
> in some sense. But I've no clue if it is just repeating
> itself over and over vs. if it making some sort of progress.
> 
> For reference for the ^C and after:
> 
> ^C[01:39:00] [20] [00:00:03] Building sysutils/linux-c7-dosfstools | 
> linux-c7-dosfstools-3.0.20
> [01:39:00] [93] [00:07:12] Finished science/dimod | dimod-0.12.11: Success
> [01:39:00] Error: Signal SIGINT caught, cleaning up and exiting
> [01:39:02] [63] [00:06:34] Finished archivers/unarj | unarj-2.65_2: Success
> [01:39:03] [128] [00:07:47] Finished sysutils/shuf | shuf-3.0: Success
> [01:39:04] [113] [00:07:06] Finished devel/bsddialog | bsddialog-0.4.1: 
> Success
> [main-amd64-bulk_a-default] [2023-09-03_13h48m45s] [sigint:] Queued: 34588 
> Built: 752   Failed: 1 Skipped: 40Ignored: 335   Fetched: 0 
> Tobuild: 33460  Time: 01:38:56
> [01:39:06] Logs: 
> /usr/local/poudriere/data/logs/bulk/main-amd64-bulk_a-default/2023-09-03_13h48m45s
> [01:39:14] [12] [00:09:07] Finished archivers/rzip | rzip-2.1_1: Success
> [01:39:14] Cleaning up
> exit: cannot open ./var/run/49_nohang.pid: No such file or directory
> exit: cannot open ./var/run/87_nohang.pid: No such file or directory
> 
> After that ^t produced the likes of:
> 
> load: 6.39  cmd: sh 4849 [tx->tx_quiesce_done_cv] 10047.33r 0.51u 121.32s 1% 
> 13004k
> #0 0x80b6f103 at mi_switch+0x173
> #1 0x80bc0f24 at sleepq_switch+0x104
> #2 0x80aec4c5 at _cv_wait+0x165
> #3 0x82aba365 at txg_wait_open+0xf5
> #4 0x82a11b81 at dmu_free_long_range+0x151
> #5 0x829a87d2 at zfs_rmnode+0x72
> #6 0x829b658d at zfs_freebsd_reclaim+0x3d
> #7 0x8113a495 at VOP_RECLAIM_APV+0x35
> #8 0x80c5a7d9 at vgonel+0x3a9
> #9 0x80c5af7f at vrecycle+0x3f
> #10 0x829b643e at zfs_freebsd_inactive+0x4e
> #11 0x80c598cf at vinactivef+0xbf
> #12 0x80c590da at vput_final+0x2aa
> #13 0x80c68886 at kern_funlinkat+0x2f6
> #14 0x80c68588 at sys_unlink+0x28
> #15 0x8106323f at amd64_syscall+0x14f
> #16 0x8103512b at 

Re: An attempted test of main's "git: 2ad756a6bbb3" "merge openzfs/zfs@95f71c019" that did not go as planned

2023-09-04 Thread Mark Millard
On Sep 3, 2023, at 22:06, Alexander Motin  wrote:

> 
> On 03.09.2023 22:54, Mark Millard wrote:
>> After that ^t produced the likes of:
>> load: 6.39  cmd: sh 4849 [tx->tx_quiesce_done_cv] 10047.33r 0.51u 121.32s 1% 
>> 13004k
> 
> So the full state is not "tx->tx", but is actually a 
> "tx->tx_quiesce_done_cv", which means the thread is waiting for new 
> transaction to be opened, which means some previous to be quiesced and then 
> synced.
> 
>> #0 0x80b6f103 at mi_switch+0x173
>> #1 0x80bc0f24 at sleepq_switch+0x104
>> #2 0x80aec4c5 at _cv_wait+0x165
>> #3 0x82aba365 at txg_wait_open+0xf5
>> #4 0x82a11b81 at dmu_free_long_range+0x151
> 
> Here it seems like transaction commit is waited due to large amount of delete 
> operations, which ZFS tries to spread between separate TXGs.

That fit the context: cleaning out /usr/local/poudriere/data/.m/

> You should probably see some large and growing number in sysctl 
> kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay .

After the reboot I started a -J64 example. It has avoided the
early "witness exhausted". Again I ^C'd after about an hours
after the 2nd builder had started. So: again cleaning out
/usr/local/poudriere/data/.m/ Only seconds between:

# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276042

# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 276427

# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 277323

# sysctl kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay
kstat.zfs.misc.dmu_tx.dmu_tx_dirty_frees_delay: 278027

I have found a measure of progress: zfs list's USED
for /usr/local/poudriere/data/.m is decreasing. So
ztop's d/s was a good classification: deletes.

>> #5 0x829a87d2 at zfs_rmnode+0x72
>> #6 0x829b658d at zfs_freebsd_reclaim+0x3d
>> #7 0x8113a495 at VOP_RECLAIM_APV+0x35
>> #8 0x80c5a7d9 at vgonel+0x3a9
>> #9 0x80c5af7f at vrecycle+0x3f
>> #10 0x829b643e at zfs_freebsd_inactive+0x4e
>> #11 0x80c598cf at vinactivef+0xbf
>> #12 0x80c590da at vput_final+0x2aa
>> #13 0x80c68886 at kern_funlinkat+0x2f6
>> #14 0x80c68588 at sys_unlink+0x28
>> #15 0x8106323f at amd64_syscall+0x14f
>> #16 0x8103512b at fast_syscall_common+0xf8
> 
> What we don't see here is what quiesce and sync threads of the pool are 
> actually doing.  Sync thread has plenty of different jobs, including async 
> write, async destroy, scrub and others, that all may delay each other.
> 
> Before you rebooted the system, depending how alive it is, could you save a 
> number of outputs of `procstat -akk`, or at least specifically `procstat -akk 
> | grep txg_thread_enter` if the full is hard?  Or somehow else observe what 
> they are doing.

# procstat -akk > ~/mmjnk00.txt
# procstat -akk > ~/mmjnk01.txt
# procstat -akk > ~/mmjnk02.txt
# procstat -akk > ~/mmjnk03.txt
# procstat -akk > ~/mmjnk04.txt
# procstat -akk > ~/mmjnk05.txt
# grep txg_thread_enter ~/mmjnk0[0-5].txt
/usr/home/root/mmjnk00.txt:6 100881 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe 
/usr/home/root/mmjnk00.txt:6 100882 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
_cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe 
/usr/home/root/mmjnk01.txt:6 100881 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe 
/usr/home/root/mmjnk01.txt:6 100882 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
_cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe 
/usr/home/root/mmjnk02.txt:6 100881 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe 
/usr/home/root/mmjnk02.txt:6 100882 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
_cv_timedwait_sbt+0x188 zio_wait+0x3c9 dsl_pool_sync+0x139 spa_sync+0xc68 
txg_sync_thread+0x2eb fork_exit+0x82 fork_trampoline+0xe 
/usr/home/root/mmjnk03.txt:6 100881 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 _cv_wait+0x165 txg_thread_wait+0xeb 
txg_quiesce_thread+0x144 fork_exit+0x82 fork_trampoline+0xe 
/usr/home/root/mmjnk03.txt:6 100882 zfskern txg_thread_enter
mi_switch+0x173 sleepq_switch+0x104 sleepq_timedwait+0x4b 
_cv_timedwait_sbt+0x188 

Re: Speed improvements in ZFS

2023-09-04 Thread Alexander Leidinger

Am 2023-08-28 22:33, schrieb Alexander Leidinger:

Am 2023-08-22 18:59, schrieb Mateusz Guzik:

On 8/22/23, Alexander Leidinger  wrote:

Am 2023-08-21 10:53, schrieb Konstantin Belousov:

On Mon, Aug 21, 2023 at 08:19:28AM +0200, Alexander Leidinger wrote:

Am 2023-08-20 23:17, schrieb Konstantin Belousov:
> On Sun, Aug 20, 2023 at 11:07:08PM +0200, Mateusz Guzik wrote:
> > On 8/20/23, Alexander Leidinger  wrote:
> > > Am 2023-08-20 22:02, schrieb Mateusz Guzik:
> > >> On 8/20/23, Alexander Leidinger  wrote:
> > >>> Am 2023-08-20 19:10, schrieb Mateusz Guzik:
> >  On 8/18/23, Alexander Leidinger 
> >  wrote:
> > >>>
> > > I have a 51MB text file, compressed to about 1MB. Are you
> > > interested
> > > to
> > > get it?
> > >
> > 
> >  Your problem is not the vnode limit, but nullfs.
> > 
> >  https://people.freebsd.org/~mjg/netchild-periodic-find.svg
> > >>>
> > >>> 122 nullfs mounts on this system. And every jail I setup has
> > >>> several
> > >>> null mounts. One basesystem mounted into every jail, and then
> > >>> shared
> > >>> ports (packages/distfiles/ccache) across all of them.
> > >>>
> >  First, some of the contention is notorious VI_LOCK in order to
> >  do
> >  anything.
> > 
> >  But more importantly the mind-boggling off-cpu time comes from
> >  exclusive locking which should not be there to begin with -- as
> >  in
> >  that xlock in stat should be a slock.
> > 
> >  Maybe I'm going to look into it later.
> > >>>
> > >>> That would be fantastic.
> > >>>
> > >>
> > >> I did a quick test, things are shared locked as expected.
> > >>
> > >> However, I found the following:
> > >> if ((xmp->nullm_flags & NULLM_CACHE) != 0) {
> > >> mp->mnt_kern_flag |=
> > >> lowerrootvp->v_mount->mnt_kern_flag &
> > >> (MNTK_SHARED_WRITES | MNTK_LOOKUP_SHARED |
> > >> MNTK_EXTENDED_SHARED);
> > >> }
> > >>
> > >> are you using the "nocache" option? it has a side effect of
> > >> xlocking
> > >
> > > I use noatime, noexec, nosuid, nfsv4acls. I do NOT use nocache.
> > >
> >
> > If you don't have "nocache" on null mounts, then I don't see how
> > this
> > could happen.
>
> There is also MNTK_NULL_NOCACHE on lower fs, which is currently set
> for
> fuse and nfs at least.

11 of those 122 nullfs mounts are ZFS datasets which are also NFS
exported.
6 of those nullfs mounts are also exported via Samba. The NFS 
exports

shouldn't be needed anymore, I will remove them.

By nfs I meant nfs client, not nfs exports.


No NFS client mounts anywhere on this system. So where is this 
exclusive

lock coming from then...
This is a ZFS system. 2 pools: one for the root, one for anything I 
need
space for. Both pools reside on the same disks. The root pool is a 
3-way

mirror, the "space-pool" is a 5-disk raidz2. All jails are on the
space-pool. The jails are all basejail-style jails.



While I don't see why xlocking happens, you should be able to dtrace
or printf your way into finding out.


dtrace looks to me like a faster approach to get to the root than 
printf... my first naive try is to detect exclusive locks. I'm not 100% 
sure I got it right, but at least dtrace doesn't complain about it:

---snip---
#pragma D option dynvarsize=32m

fbt:nullfs:null_lock:entry
/args[0]->a_flags & 0x08 != 0/
{
stack();
}
---snip---

In which direction should I look with dtrace if this works in tonights 
run of periodic? I don't have enough knowledge about VFS to come up 
with some immediate ideas.


After your sysctl fix for maxvnodes I increased the amount of vnodes 10 
times compared to the initial report. This has increased the speed of 
the operation, the find runs in all those jails finished today after ~5h 
(@~8am) instead of in the afternoon as before. Could this suggest that 
in parallel some null_reclaim() is running which does the exclusive 
locks and slows down the entire operation?


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.orgnetch...@freebsd.org  : PGP 0x8F31830F9F2772BF