Tomas,

This is really interesting data. Thanks a lot for collecting all of it and
formatting the helpful graphs.
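One small sanity check on the lvm/ext4 (snapshots) result you quote below
for scale 2000 - recomputing the relative change from the two tps figures
you give does come out at roughly the 10% you mention:

# Relative throughput change for lvm/ext4 (snapshots) at scale 2000,
# using the two tps figures from the message below.
master_tps = 21468      # WAL recycling enabled (master)
no_recycle_tps = 23517  # WAL recycling disabled

improvement = (no_recycle_tps - master_tps) / master_tps
print(f"improvement: {improvement:.1%}")  # prints "improvement: 9.5%"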
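Also, for anyone who wants to reproduce the snapshot-churn setup you
describe below (an LVM snapshot recreated every minute, to force COW-like
behavior on ext4), here is a rough sketch of what I assume the mechanics
look like. To be clear, this is my reconstruction, not your actual script -
the volume group, logical volume, and snapshot size are placeholders:

#!/usr/bin/env python3
# Sketch of the "snapshot recreated every minute" setup described below.
# NOT the actual benchmark script - VG/LV names and the snapshot size
# are placeholders; adjust them for your own layout.
import subprocess
import time

VG = "vg0"            # placeholder volume group
LV = "pgdata"         # placeholder LV holding the PostgreSQL data dir
SNAP = "pgdata_snap"
SNAP_SIZE = "10G"     # must absorb roughly one minute of writes

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Run alongside the benchmark until interrupted (Ctrl-C). While a
# snapshot exists, every block rewritten in the origin LV is first
# copied into the snapshot, so recycled (rewritten-in-place) WAL
# segments behave like newly allocated ones.
while True:
    run(["lvcreate", "--snapshot", "--size", SNAP_SIZE,
         "--name", SNAP, f"{VG}/{LV}"])
    time.sleep(60)
    # Drop the old snapshot; the next iteration immediately creates a
    # fresh one, so the COW window stays open for essentially the whole
    # run (there is only a brief gap between remove and create).
    run(["lvremove", "-f", f"{VG}/{SNAP}"])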
Jerry

On Sun, Aug 26, 2018 at 4:14 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
>
> On 08/25/2018 12:11 AM, Jerry Jelinek wrote:
> > Alvaro,
> >
> > I have previously posted ZFS numbers for SmartOS and FreeBSD to this
> > thread, although not with the exact same benchmark runs that Tomas did.
> >
> > I think the main purpose of running the benchmarks is to demonstrate
> > that there is no significant performance regression with wal recycling
> > disabled on a COW filesystem such as ZFS (which might just be intuitive
> > for a COW filesystem). I've tried to make it clear in the doc change
> > that comes with this patch that this tunable is only applicable to COW
> > filesystems. I do not think the benchmarks will be able to recreate the
> > problematic performance state that was originally described in Dave's
> > email thread here:
> >
> > https://www.postgresql.org/message-id/flat/CACukRjO7DJvub8e2AijOayj8BfKK3XXBTwu3KKARiTr67M3E3w%40mail.gmail.com
> >
>
> I agree - the benchmarks are valuable both to show improvement and lack
> of regression. I do have some numbers from LVM/ext4 (with a snapshot
> recreated every minute, to trigger COW-like behavior, and without the
> snapshots), and from ZFS (on Linux, using zfsonlinux 0.7.9 on kernel
> 4.17.17).
>
> Attached are PDFs with summary charts; more detailed results are
> available at
>
> https://bitbucket.org/tvondra/wal-recycle-test-xeon/src/master/
>
>
> lvm/ext4 (no snapshots)
> -----------------------
> This pretty much behaves like plain ext4, at least for scales 200 and
> 2000. I don't have results for scale 8000, because the test ran out of
> disk space (I've used part of the device for snapshots, and that was
> enough to trigger the disk space issue).
>
>
> lvm/ext4 (snapshots)
> --------------------
> On the smallest scale (200), there's no visible difference. On scale
> 2000, disabling WAL reuse gives about a 10% improvement (21468 vs.
> 23517 tps), although it's not obvious from the chart. On the largest
> scale (6000, reduced from 8000 to prevent the disk space issues) the
> improvement is about 10% again, but it's much clearer.
>
>
> zfs (Linux)
> -----------
> On scale 200, there's pretty much no difference. On scale 2000, the
> throughput actually decreased a bit, by about 5% - from the chart it
> seems that disabling WAL reuse somewhat amplifies the impact of
> checkpoints, for some reason.
>
> I have no idea what happened at the largest scale (8000) - on master
> there's a huge drop after ~120 minutes, which somewhat recovers at ~220
> minutes (but not fully). Without WAL reuse there's no such drop,
> although there seems to be some degradation after ~220 minutes (i.e. at
> about the same time the master run partially recovers). I'm not sure
> what to think about this; I wonder if it might be caused by almost
> filling the disk space, or something like that. I'm rerunning this with
> scale 600.
>
> I'm also not sure how much we can extrapolate this to other ZFS configs
> (I mean, this is ZFS on a single SSD device, while I'd generally expect
> ZFS on multiple devices, etc.).
>
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>