On Fri, Oct 27, 2023 at 03:34:35PM +0200, Ronald Klop wrote:

> What stands out to me is that you do quite a lot of writes on the disk. (I
> might be mistaken.)
> The max. number of IOPS for a consumer-grade HDD is around 80, and I think
> that applies to USB-connected disks as well:
> https://en.wikipedia.org/wiki/IOPS#Mechanical_hard_drives
> From the stats you posted it looks like you are almost always doing 50+
> writes/second already. That does not leave many IOPS for the find process.

> ZFS tries to bundle the writes every 5 seconds to leave room for the reads
> (see "sysctl vfs.zfs.txg.timeout"), unless it has too much data to write or
> a sync request comes in.

% sysctl vfs.zfs.txg.timeout
vfs.zfs.txg.timeout: 5

Do I need to tune this?
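
If I do end up experimenting, it appears to be a plain runtime sysctl; a
minimal sketch (the value 10 is an arbitrary example, not a recommendation):

% sysctl vfs.zfs.txg.timeout=10                      # change at runtime (as root)
% echo 'vfs.zfs.txg.timeout=10' >> /etc/sysctl.conf  # persist across reboots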

Here's equivalent output from my setup (I ran periodic daily again):

#device       r/s     w/s     kr/s     kw/s  ms/r  ms/w  ms/o  ms/t qlen  %b
da0            16      18    191.9    557.9    50     8   144    29   10  24
da0           107       0    699.7      0.0    52     0     0    52    1  99
da0           102       0    409.2      0.0    71     0     0    71    2  98
da0            65       6    259.6     49.4   101   143     0   105   12 101
da0            57      14    227.7    123.9   153   163     0   155   12 100
da0            40      19    158.8    285.8   205   103     0   172   12  98
da0            46      30    191.1    441.9   180    58     0   132   11  91
da0            63       4    261.6     16.1   162   250   239   170    6 112
da0            67      10    273.7     83.6    99    66     0    95   12  91
da0            32      21    129.4    177.9   223   102     0   175    5  97
da0            48      16    191.9    261.3   173   130     0   162    9 109
da0            38      19    152.2    191.3   168    61   292   139    8 104
da0            92       0    366.9      0.0   104     0     0   104    4 100
da0            73      10    291.7     87.9    76    99     0    79   12  97
da0            49      15    195.2    270.9   156   129     0   150   11 103
da0            53      15    212.3    248.3   139   128     0   137   12  92
da0            54      22    216.1    272.1   151    81    92   130    8 107
da0            80       4    320.9     16.0    74   201   125    80    3 100
da0            55      10    218.8     72.9    89    73     0    87   11  82
^C

% zpool iostat 1
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zroot       93.6G   818G     13     16   161K   506K
zroot       93.6G   818G     91      0   367K      0
zroot       93.6G   818G    113      0   454K      0
zroot       93.6G   818G    102      0   411K      0
zroot       93.6G   818G     98      0   422K      0
zroot       93.6G   818G     67     18   271K   171K
zroot       93.6G   818G     43     16   173K   252K
zroot       93.6G   818G     43     28   173K   376K
zroot       93.6G   818G     78      3   315K  15.9K
zroot       93.6G   818G     94      0   378K      0
zroot       93.6G   818G    103      0   414K      0
zroot       93.6G   818G    102      0   658K      0
zroot       93.6G   818G     98      0   396K      0
zroot       93.6G   818G    109      0   438K      0
zroot       93.6G   818G    101      0   404K      0
zroot       93.6G   818G     47     13   191K  91.4K
zroot       93.6G   818G     52     11   209K   126K
zroot       93.6G   818G     50     20   202K   301K
zroot       93.6G   818G     46     12   186K   128K
zroot       93.6G   818G     86      0   346K  3.93K
zroot       93.6G   818G     45     18   183K   172K
zroot       93.6G   818G     42     15   172K   343K
zroot       93.6G   818G     43     24   173K   211K
zroot       93.6G   818G     87      0   596K      0
^C

> So if my observation is right, it might be interesting to find out what is
> writing.

Would ktrace and/or truss be useful? Something else? The truss -p output on the
find PID produces massive amounts of output, all like this:

fstatat(AT_FDCWD,"5e70d5f895ccc92af6a7d5226f818b-81464.o",{ mode=-rw-r--r-- ,inode=367004,size=10312,blksize=10752 },AT_SYMLINK_NOFOLLOW) = 0 (0x0)

with the filename changing each time
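
If ktrace turns out to be the better tool, something like this rough sketch
might do it (PID standing in for the suspect process; grepping for "write" is
just an example of narrowing down the dump):

% ktrace -p PID            # attach to the running process (as root if needed)
  ... let it run for a while ...
% ktrace -C                # disable tracing again
% kdump | grep -c write    # rough count of write(2) calls in ktrace.out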

(later...)

That file is in ccache!

% locate 5e70d5f895ccc92af6a7d5226f818b-81464.o
/var/cache/ccache/f/5/5e70d5f895ccc92af6a7d5226f818b-81464.o

Maybe if I can exclude that dir (and /usr/obj) it'll lessen the periodic
runtime. But I don't know yet what's calling find(1) when periodic daily runs.
If I can find out, I might be able to tell it not to walk certain hierarchies.
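
One way to find the caller might be to grep the periodic scripts themselves (a
sketch; these are the stock locations, and ports can install more under
/usr/local):

% grep -l 'find ' /etc/periodic/*/* /usr/local/etc/periodic/*/* 2>/dev/null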

> I had similar issues after the number of jails on my RPI4 increased and they
> all were doing a little bit of writing, which accumulated into quite a lot
> of writing.

I'm at a loss as to what's doing the writing (one way to narrow it down is
sketched after the list). The system runs the following:

poudriere-devel # for aarch64 and armv7
apcupsd         # for ups monitoring
vnstat          # bandwidth use, writes to its db in /var/db/vnstat
sshd
exim (local)
pflogd          # right now it's behind a firewall, on NAT, so it's not doing much
pf              # same
ntpd
powerd
nginx           # this serves the poudriere web frontend, and that's it (http-only)
syslogd
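
The narrowing-down sketch mentioned above: top(1) has a per-process I/O mode,
which should show who is issuing the writes:

% top -m io -o total    # per-process I/O counters, sorted by total read+write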

> My solution was to add an SSD.

I have NFS as an alternative; the LAN is 1 Gbit. But I think the fix will be to
tell find not to search some paths. I just need to work out how to do it.
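
Once the calling script is known, find's -prune primary should be able to skip
a subtree; a sketch with made-up search criteria ('*.core' is only an example
pattern):

% find /var -path /var/cache/ccache -prune -o -name '*.core' -print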

What effect would increasing or decreasing the txg timeout have on system
performance?