On Thu, May 28, 2026 at 9:42 PM Andres Freund <[email protected]> wrote:
[..]
> It's definitely slower. I've not fully analyzed why, my suspicion is that we
> end up being rather terribly IO bound - we used bigger and faster disks on
> cirrus than we have access to with github hosted runners (there are large
> runners with more storage, but that's not free).
>
> A full testrun on master creates about 36GB of data directories. If individual
> tests are fast, that's often not *that* bad, because the tests are over before
> linux decides to flush out the data, and then linux never needs to write that
> data back, because we remove the data directories immediately. But once you
> get to the point that several tests take more than 30s (the default time after
> which linux writes dirty data back) or enough dirty data accumulates (20% of
> memory IIRC), you have a lot of IO.
>
> My buildfarm host, which hosts quite a few animals, got a new disk within the
> last year. Here's what smartctl says about disk IO:
>
> Data Units Read:                    43,513,034 [22.2 TB]
> Data Units Written:                 6,062,401,949 [3.10 PB]
>
> A nice indication of how much our tests end up writing...

`ninja -C build test` (without compilation, of course), run in a
dedicated cgroup, gave me these /sys/fs/cgroup/my_test_suite/io.stat
figures:
    rbytes=88616960 wbytes=18406756352 rios=13843 wios=275457 dbytes=0 dios=0

~84.5 MB read
~17.1 GB written (sic!!!)
13k read ops
275k write ops

So yeah, I really didn't think we could generate that much IO there.
BTW, these numbers seem to come from the block IO controller, so they count
only what was actually flushed to disk (the data that was logically written
but removed before being flushed would be way higher). So by what you're
saying, we should tweak that 30s writeback, right? Anyway, I've tried with
far more relaxed dirty/writeback settings and got this, still on the same
laptop with 32GB RAM:
    rbytes=20934656 wbytes=5957296128 rios=3040 wios=81992 dbytes=0 dios=

~20MB read
~5.55 GB written (down from 17GB)
3k read ops
82k write ops
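
For anyone who wants to redo the conversions above, here is a minimal
sketch that parses the two io.stat samples quoted in this mail and compares
the bytes written. The helper names (parse_io_stat, gib) are made up for
illustration. It skips any token that is not key=<digits>, which covers the
leading MAJ:MIN device token of a real io.stat line as well as a field with
a missing value.

```python
def parse_io_stat(line):
    """Parse the key=value fields of one io.stat line into a dict of ints."""
    stats = {}
    for field in line.split():
        key, sep, value = field.partition("=")
        # Skip tokens without "=" (e.g. the "MAJ:MIN" device prefix)
        # and fields whose value is missing or not a plain number.
        if sep and value.isdigit():
            stats[key] = int(value)
    return stats


def gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# The two samples from this mail, copied as-is.
before = parse_io_stat(
    "rbytes=88616960 wbytes=18406756352 rios=13843 wios=275457 dbytes=0 dios=0"
)
after = parse_io_stat(
    "rbytes=20934656 wbytes=5957296128 rios=3040 wios=81992 dbytes=0 dios="
)

print(f"written, default settings: {gib(before['wbytes']):.2f} GiB")
print(f"written, relaxed settings: {gib(after['wbytes']):.2f} GiB")
print(f"reduction in bytes written: {1 - after['wbytes'] / before['wbytes']:.0%}")
print(f"average write size, default: {before['wbytes'] / before['wios'] / 1024:.1f} KiB")
```

On a real system the same function could be applied to each line read from
the cgroup's io.stat file, one line per block device.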

And that was without even trying hard:
    sudo sysctl -w vm.dirty_ratio=50 # ~16GB
    sudo sysctl -w vm.dirty_background_ratio=40
    sudo sysctl -w vm.dirty_expire_centisecs=60000 # default was 3000, as you said
    sudo sysctl -w vm.dirty_writeback_centisecs=50000
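
To make those settings easier to read, here is a rough sketch that converts
them into bytes and seconds. It assumes the whole 32GB of RAM counts as
"dirtyable"; the kernel actually applies the ratios to available (free plus
reclaimable) memory, so the real thresholds are somewhat lower. The function
name dirty_limits is hypothetical.

```python
def dirty_limits(mem_bytes, dirty_ratio, background_ratio, expire_cs, writeback_cs):
    """Translate vm.dirty_* sysctl values into bytes and seconds."""
    return {
        # Point at which writers are throttled and must write back themselves.
        "dirty_bytes": mem_bytes * dirty_ratio // 100,
        # Point at which background writeback starts.
        "background_bytes": mem_bytes * background_ratio // 100,
        # Age after which dirty data becomes eligible for writeback.
        "expire_seconds": expire_cs / 100,
        # Interval between wakeups of the periodic writeback.
        "writeback_seconds": writeback_cs / 100,
    }


ram = 32 * 2**30  # assumption: all 32GB treated as dirtyable memory

relaxed = dirty_limits(ram, dirty_ratio=50, background_ratio=40,
                       expire_cs=60000, writeback_cs=50000)

for name, value in relaxed.items():
    if name.endswith("_bytes"):
        print(f"{name}: {value / 2**30:.1f} GiB")
    else:
        print(f"{name}: {value:.0f} s")
```

With these values, dirty data may sit in memory for up to 600 seconds before
it becomes eligible for writeback, compared with 30 seconds for the default
of 3000 centiseconds.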

Steps were:
  sudo mkdir /sys/fs/cgroup/my_test_suite
  echo $$ | sudo tee /sys/fs/cgroup/my_test_suite/cgroup.procs
  ninja -C build test
  cat /sys/fs/cgroup/my_test_suite/io.stat

-J.

