Hi, you should give Ceph Octopus a try. librbd write performance has greatly improved, and I can now recommend enabling writeback by default.
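For reference, enabling writeback for librbd is a client-side setting. A minimal ceph.conf sketch (the option names are standard RBD cache options; the values shown are librbd's defaults, not tuned recommendations from my tests):

```ini
# client-side ceph.conf fragment -- starting-point values, adjust per workload
[client]
rbd cache = true               ; enable the librbd in-memory cache
rbd cache policy = writeback   ; write-back instead of write-through
rbd cache size = 33554432      ; 32 MiB cache per image (default)
rbd cache max dirty = 25165824 ; dirty bytes allowed before writes throttle
```

On Proxmox, setting the disk's Cache option to writeback in the VM hardware settings should have the same effect for librbd-backed disks, since QEMU's cache mode is mapped onto the librbd cache settings.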
Here are some IOPS results with 1 VM, 1 disk, 4k blocks, iodepth=64, librbd, no iothread:

              nautilus-cache=none  nautilus-cache=writeback  octopus-cache=none  octopus-cache=writeback
randread 4k         62.1k                 25.2k                   61.1k                 60.8k
randwrite 4k        27.7k                 19.5k                   34.5k                 53.0k
seqwrite 4k          7850                 37.5k                   24.9k                 82.6k

----- Original Message -----
From: "Mark Schouten" <m...@tuxis.nl>
To: "proxmoxve" <pve-user@pve.proxmox.com>
Sent: Thursday, July 2, 2020 15:15:20
Subject: Re: [PVE-User] High I/O waits, not sure if it's a ceph issue.

On Thu, Jul 02, 2020 at 09:06:54AM +1000, Lindsay Mathieson wrote:
> I did some adhoc testing last night - definitely a difference, in KRBD's
> favour. Both sequential and random IO was much better with it enabled.

Interesting! I just did some testing too on our demo cluster: Ceph with
6 OSDs over three nodes, size 2.

root@node04:~# pveversion
pve-manager/6.2-6/ee1d7754 (running kernel: 5.4.41-1-pve)
root@node04:~# ceph -v
ceph version 14.2.9 (bed944f8c45b9c98485e99b70e11bbcec6f6659a) nautilus (stable)

rbd create fio_test --size 10G -p Ceph
rbd create map_test --size 10G -p Ceph
rbd map Ceph/map_test

When just using a write test (rw=randwrite), krbd wins big.
rbd:
  WRITE: bw=37.9MiB/s (39.8MB/s), 37.9MiB/s-37.9MiB/s (39.8MB/s-39.8MB/s), io=10.0GiB (10.7GB), run=269904-269904msec

krbd:
  WRITE: bw=207MiB/s (217MB/s), 207MiB/s-207MiB/s (217MB/s-217MB/s), io=10.0GiB (10.7GB), run=49582-49582msec

However, using rw=randrw (rwmixread=75), things change a lot:

rbd:
   READ: bw=49.0MiB/s (52.4MB/s), 49.0MiB/s-49.0MiB/s (52.4MB/s-52.4MB/s), io=7678MiB (8051MB), run=153607-153607msec
  WRITE: bw=16.7MiB/s (17.5MB/s), 16.7MiB/s-16.7MiB/s (17.5MB/s-17.5MB/s), io=2562MiB (2687MB), run=153607-153607msec

krbd:
   READ: bw=5511KiB/s (5643kB/s), 5511KiB/s-5511KiB/s (5643kB/s-5643kB/s), io=7680MiB (8053MB), run=1426930-1426930msec
  WRITE: bw=1837KiB/s (1881kB/s), 1837KiB/s-1837KiB/s (1881kB/s-1881kB/s), io=2560MiB (2685MB), run=1426930-1426930msec

Maybe I'm interpreting or testing stuff wrong, but it looks like simply writing to krbd is much faster, while actually trying to use that data seems slower. Let me know what you guys think.

Attachments are being stripped, IIRC, so here's the config and the full output of the tests:

============RESULTS==================
rbd_write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=32
rbd_readwrite: (g=1): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=32
krbd_write: (g=2): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
krbd_readwrite: (g=3): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.12
Starting 4 processes
Jobs: 1 (f=1): [_(3),f(1)][100.0%][eta 00m:00s]
rbd_write: (groupid=0, jobs=1): err= 0: pid=1846441: Thu Jul 2 15:08:42 2020
  write: IOPS=9712, BW=37.9MiB/s (39.8MB/s)(10.0GiB/269904msec); 0 zone resets
    slat (nsec): min=943, max=1131.9k, avg=6367.94, stdev=10934.84
    clat (usec): min=1045, max=259066, avg=3286.70, stdev=4553.24
     lat (usec): min=1053, max=259069, avg=3293.06, stdev=4553.20
    clat percentiles (usec):
     |  1.00th=[  1844],  5.00th=[  2114], 10.00th=[  2311], 20.00th=[  2573],
     | 30.00th=[  2769], 40.00th=[  2933], 50.00th=[  3064], 60.00th=[  3228],
     | 70.00th=[  3425], 80.00th=[  3621], 90.00th=[  3982], 95.00th=[  4359],
     | 99.00th=[  5538], 99.50th=[  6718], 99.90th=[ 82314], 99.95th=[125305],
     | 99.99th=[187696]
   bw (  KiB/s): min=17413, max=40282, per=83.81%, avg=32561.17, stdev=3777.39, samples=539
   iops        : min= 4353, max=10070, avg=8139.93, stdev=944.34, samples=539
  lat (msec)   : 2=2.64%, 4=87.80%, 10=9.37%, 20=0.08%, 50=0.01%
  lat (msec)   : 100=0.02%, 250=0.09%, 500=0.01%
  cpu          : usr=8.73%, sys=5.27%, ctx=1254152, majf=0, minf=8484
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2621440,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32
rbd_readwrite: (groupid=1, jobs=1): err= 0: pid=1852029: Thu Jul 2 15:08:42 2020
  read: IOPS=12.8k, BW=49.0MiB/s (52.4MB/s)(7678MiB/153607msec)
    slat (nsec): min=315, max=4467.8k, avg=3247.91, stdev=7360.28
    clat (usec): min=276, max=160495, avg=1412.53, stdev=656.11
     lat (usec): min=281, max=160497, avg=1415.78, stdev=656.02
    clat percentiles (usec):
     |  1.00th=[  494],  5.00th=[  693], 10.00th=[  832], 20.00th=[ 1012],
     | 30.00th=[ 1139], 40.00th=[ 1254], 50.00th=[ 1352], 60.00th=[ 1450],
     | 70.00th=[ 1549], 80.00th=[ 1696], 90.00th=[ 1926], 95.00th=[ 2343],
     | 99.00th=[ 3621], 99.50th=[ 3949], 99.90th=[ 5604], 99.95th=[ 7373],
     | 99.99th=[11207]
   bw (  KiB/s): min=25546, max=50344, per=78.44%, avg=40147.73, stdev=2610.22, samples=306
   iops        : min= 6386, max=12586, avg=10036.57, stdev=652.54, samples=306
  write: IOPS=4270, BW=16.7MiB/s (17.5MB/s)(2562MiB/153607msec); 0 zone resets
    slat (nsec): min=990, max=555362, avg=5474.97, stdev=6241.91
    clat (usec): min=1052, max=196165, avg=3239.08, stdev=3722.92
     lat (usec): min=1056, max=196171, avg=3244.55, stdev=3722.91
    clat percentiles (usec):
     |  1.00th=[  1663],  5.00th=[  1991], 10.00th=[  2180], 20.00th=[  2442],
     | 30.00th=[  2606], 40.00th=[  2769], 50.00th=[  2966], 60.00th=[  3130],
     | 70.00th=[  3359], 80.00th=[  3654], 90.00th=[  4359], 95.00th=[  5014],
     | 99.00th=[  6325], 99.50th=[  7177], 99.90th=[ 40109], 99.95th=[104334],
     | 99.99th=[175113]
   bw (  KiB/s): min= 8450, max=17786, per=78.45%, avg=13398.97, stdev=891.56, samples=306
   iops        : min= 2112, max= 4446, avg=3349.35, stdev=222.88, samples=306
  lat (usec)   : 500=0.79%, 750=4.30%, 1000=9.19%
  lat (msec)   : 2=55.67%, 4=26.22%, 10=3.78%, 20=0.03%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.02%
  cpu          : usr=13.97%, sys=7.94%, ctx=1729014, majf=0, minf=2214
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=1965537,655903,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32
krbd_write: (groupid=2, jobs=1): err= 0: pid=1855430: Thu Jul 2 15:08:42 2020
  write: IOPS=52.9k, BW=207MiB/s (217MB/s)(10.0GiB/49582msec); 0 zone resets
    slat (nsec): min=1624, max=41411k, avg=17942.28, stdev=482539.15
    clat (nsec): min=1495, max=41565k, avg=586889.90, stdev=2650654.87
     lat (usec): min=3, max=41568, avg=604.90, stdev=2691.73
    clat percentiles (usec):
     |  1.00th=[   92],  5.00th=[   93], 10.00th=[   93], 20.00th=[   94],
     | 30.00th=[   95], 40.00th=[   96], 50.00th=[   99], 60.00th=[  102],
     | 70.00th=[  109], 80.00th=[  120], 90.00th=[  139], 95.00th=[  161],
     | 99.00th=[14877], 99.50th=[18482], 99.90th=[18744], 99.95th=[22676],
     | 99.99th=[22938]
   bw (  KiB/s): min=61770, max=1314960, per=94.28%, avg=199384.09, stdev=331335.15, samples=99
   iops        : min=15442, max=328740, avg=49845.71, stdev=82833.70, samples=99
  lat (usec)   : 2=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 100=55.51%
  lat (usec)   : 250=41.04%, 500=0.12%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 10=0.01%, 20=3.22%, 50=0.07%
  cpu          : usr=6.29%, sys=11.90%, ctx=4350, majf=0, minf=12
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,2621440,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32
krbd_readwrite: (groupid=3, jobs=1): err= 0: pid=1858884: Thu Jul 2 15:08:42 2020
  read: IOPS=1377, BW=5511KiB/s (5643kB/s)(7680MiB/1426930msec)
    slat (usec): min=200, max=145771, avg=716.74, stdev=355.05
    clat (usec): min=31, max=169171, avg=16877.60, stdev=2881.41
     lat (usec): min=692, max=170099, avg=17594.97, stdev=2937.38
    clat percentiles (usec):
     |  1.00th=[11207],  5.00th=[12780], 10.00th=[13698], 20.00th=[14746],
     | 30.00th=[15533], 40.00th=[16188], 50.00th=[16712], 60.00th=[17433],
     | 70.00th=[17957], 80.00th=[18744], 90.00th=[20055], 95.00th=[21103],
     | 99.00th=[25035], 99.50th=[28443], 99.90th=[34866], 99.95th=[39060],
     | 99.99th=[57410]
   bw (  KiB/s): min= 2312, max= 6776, per=99.99%, avg=5510.53, stdev=292.16, samples=2853
   iops        : min=  578, max= 1694, avg=1377.63, stdev=73.04, samples=2853
  write: IOPS=459, BW=1837KiB/s (1881kB/s)(2560MiB/1426930msec); 0 zone resets
    slat (nsec): min=1731, max=131919, avg=8242.43, stdev=5170.50
    clat (usec): min=4, max=169165, avg=16871.71, stdev=2885.14
     lat (usec): min=22, max=169182, avg=16880.10, stdev=2885.33
    clat percentiles (usec):
     |  1.00th=[11207],  5.00th=[12780], 10.00th=[13698], 20.00th=[14746],
     | 30.00th=[15533], 40.00th=[16188], 50.00th=[16712], 60.00th=[17433],
     | 70.00th=[17957], 80.00th=[18744], 90.00th=[20055], 95.00th=[21103],
     | 99.00th=[25297], 99.50th=[28181], 99.90th=[34866], 99.95th=[38536],
     | 99.99th=[58459]
   bw (  KiB/s): min=  696, max= 2368, per=100.00%, avg=1837.14, stdev=169.59, samples=2853
   iops        : min=  174, max=  592, avg=459.28, stdev=42.39, samples=2853
  lat (usec)   : 10=0.01%, 50=0.01%, 750=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.21%, 20=90.02%, 50=9.75%
  lat (msec)   : 100=0.02%, 250=0.01%
  cpu          : usr=1.34%, sys=3.68%, ctx=1966986, majf=0, minf=15
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=1966000,655440,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=37.9MiB/s (39.8MB/s), 37.9MiB/s-37.9MiB/s (39.8MB/s-39.8MB/s), io=10.0GiB (10.7GB), run=269904-269904msec

Run status group 1 (all jobs):
   READ: bw=49.0MiB/s (52.4MB/s), 49.0MiB/s-49.0MiB/s (52.4MB/s-52.4MB/s), io=7678MiB (8051MB), run=153607-153607msec
  WRITE: bw=16.7MiB/s (17.5MB/s), 16.7MiB/s-16.7MiB/s (17.5MB/s-17.5MB/s), io=2562MiB (2687MB), run=153607-153607msec

Run status group 2 (all jobs):
  WRITE: bw=207MiB/s (217MB/s), 207MiB/s-207MiB/s (217MB/s-217MB/s), io=10.0GiB (10.7GB), run=49582-49582msec

Run status group 3 (all jobs):
   READ: bw=5511KiB/s (5643kB/s), 5511KiB/s-5511KiB/s (5643kB/s-5643kB/s), io=7680MiB (8053MB), run=1426930-1426930msec
  WRITE: bw=1837KiB/s (1881kB/s), 1837KiB/s-1837KiB/s (1881kB/s-1881kB/s), io=2560MiB (2685MB), run=1426930-1426930msec

Disk stats (read/write):
  rbd0: ios=1965643/893981, merge=0/2379481, ticks=1366950/16608305, in_queue=14637096, util=95.65%

============FIO CONFIG==================
[global]
invalidate=0
bs=4k
iodepth=32
stonewall

[rbd_write]
ioengine=rbd
clientname=admin
pool=Ceph
rbdname=fio_test
rw=randwrite

[rbd_readwrite]
ioengine=rbd
clientname=admin
pool=Ceph
rbdname=fio_test
rw=randrw
rwmixread=75

[krbd_write]
ioengine=libaio
filename=/dev/rbd0
rw=randwrite

[krbd_readwrite]
ioengine=libaio
filename=/dev/rbd0
rw=randrw
rwmixread=75

-- 
Mark Schouten | Tuxis B.V.
KvK: 74698818 | http://www.tuxis.nl/
T: +31 318 200208 | i...@tuxis.nl

_______________________________________________
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
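PS, an aside on comparing runs like the ones above: fio can emit machine-readable results with `--output-format=json`, which is less error-prone than eyeballing the text output. A minimal sketch; the `jobs`/`read`/`write`/`iops`/`io_bytes` fields are fio's JSON output schema, and the sample below just re-uses the rbd_readwrite figures from this thread rather than a new run:

```python
import json

def summarize(fio_json: str):
    """Return (jobname, read_iops, write_iops, read_mix) for each job."""
    data = json.loads(fio_json)
    rows = []
    for job in data["jobs"]:
        r, w = job["read"], job["write"]
        total = r["io_bytes"] + w["io_bytes"]
        mix = r["io_bytes"] / total if total else 0.0  # fraction of bytes read
        rows.append((job["jobname"], r["iops"], w["iops"], mix))
    return rows

# Hand-built sample mirroring the randrw (rwmixread=75) librbd job above.
sample = json.dumps({"jobs": [{
    "jobname": "rbd_readwrite",
    "read":  {"iops": 12795.0, "io_bytes": 7678 * 2**20},
    "write": {"iops": 4270.0,  "io_bytes": 2562 * 2**20},
}]})

for name, riops, wiops, mix in summarize(sample):
    # prints: rbd_readwrite: 12795 read IOPS, 4270 write IOPS, 75% reads
    print(f"{name}: {riops:.0f} read IOPS, {wiops:.0f} write IOPS, {mix:.0%} reads")
```

The read mix coming out at ~75% is also a quick sanity check that the rwmixread setting actually took effect.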