Allow the user to optionally set QUEUE_FLAG_NOWAIT via a module parameter; the
default behaviour is retained when the parameter is not set. Also, update the
respective allocation flags in the write path.
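For reference, here is a rough sketch of the shape of the change. It is
illustrative only and not the actual patch hunk: the parameter name "nowait"
matches the insmod invocation in the test log below, but the helpers
pmem_nowait_setup() and pmem_write_alloc() are made up for this example, and
the real write-path hunk may differ.

#include <linux/module.h>
#include <linux/blkdev.h>
#include <linux/bio.h>
#include <linux/slab.h>

static bool nowait;	/* default off to retain the current behaviour */
module_param(nowait, bool, 0444);
MODULE_PARM_DESC(nowait, "set QUEUE_FLAG_NOWAIT to enable REQ_NOWAIT support");

/* Called while setting up the request queue (illustrative helper). */
static void pmem_nowait_setup(struct request_queue *q)
{
	/* Only advertise REQ_NOWAIT support when the user opted in. */
	if (nowait)
		blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
}

/*
 * "Update respective allocation flags in the write path": once nowait
 * support is advertised, an allocation done on behalf of a REQ_NOWAIT bio
 * must not block; fail the bio with BLK_STS_AGAIN instead of sleeping.
 * This helper is invented for illustration only.
 */
static void *pmem_write_alloc(struct bio *bio, size_t len)
{
	gfp_t gfp = (bio->bi_opf & REQ_NOWAIT) ? GFP_NOWAIT : GFP_KERNEL;
	void *buf = kmalloc(len, gfp);

	if (!buf && (bio->bi_opf & REQ_NOWAIT))
		bio_wouldblock_error(bio);	/* completes bio with BLK_STS_AGAIN */
	return buf;
}

With the parameter in place the flag can be enabled at module load time, e.g.
"insmod drivers/nvdimm/nd_pmem.ko nowait=1" as in the test log below; loading
the module without the parameter keeps the existing behaviour.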
Following are the performance numbers with the io_uring fio engine for random
read; note that the device has been fully populated with a randwrite workload
before taking these numbers :-

* linux-block (for-next) # grep IOPS pmem*fio | column -t
default-nowait-off-1.fio:  read:  IOPS=3968k,  BW=15.1GiB/s
default-nowait-off-2.fio:  read:  IOPS=4084k,  BW=15.6GiB/s
default-nowait-off-3.fio:  read:  IOPS=3995k,  BW=15.2GiB/s
nowait-on-1.fio:           read:  IOPS=5909k,  BW=22.5GiB/s
nowait-on-2.fio:           read:  IOPS=5997k,  BW=22.9GiB/s
nowait-on-3.fio:           read:  IOPS=6006k,  BW=22.9GiB/s

* linux-block (for-next) # grep cpu pmem*fio | column -t
default-nowait-off-1.fio:  cpu  :  usr=6.38%,   sys=31.37%,  ctx=220427659
default-nowait-off-2.fio:  cpu  :  usr=6.19%,   sys=31.45%,  ctx=229825635
default-nowait-off-3.fio:  cpu  :  usr=6.17%,   sys=31.22%,  ctx=221896158
nowait-on-1.fio:           cpu  :  usr=10.56%,  sys=87.82%,  ctx=24730
nowait-on-2.fio:           cpu  :  usr=9.92%,   sys=88.36%,  ctx=23427
nowait-on-3.fio:           cpu  :  usr=9.85%,   sys=89.04%,  ctx=23237

* linux-block (for-next) # grep slat pmem*fio | column -t
default-nowait-off-1.fio:  slat  (nsec):  min=431,   max=50423k,  avg=9424.06
default-nowait-off-2.fio:  slat  (nsec):  min=420,   max=35992k,  avg=9193.94
default-nowait-off-3.fio:  slat  (nsec):  min=430,   max=40737k,  avg=9244.24
nowait-on-1.fio:           slat  (nsec):  min=1232,  max=40098k,  avg=7518.60
nowait-on-2.fio:           slat  (nsec):  min=1303,  max=52107k,  avg=7423.37
nowait-on-3.fio:           slat  (nsec):  min=1123,  max=40193k,  avg=7409.08

Please let me know if further testing is needed. I have run the fio
verification job in order to verify these changes.

Chaitanya Kulkarni (1):
  pmem: allow user to set QUEUE_FLAG_NOWAIT

 drivers/nvdimm/pmem.c | 6 ++++++
 1 file changed, 6 insertions(+)

linux-block (for-next) # sh test-pmem.sh
+ git log -1
commit 6df7042a11e06465b1b8f275170cb5593d8d7dcc (HEAD -> for-next)
Author: Chaitanya Kulkarni <k...@nvidia.com>
Date:   Fri May 12 03:24:54 2023 -0700

    pmem: allow user to set QUEUE_FLAG_NOWAIT

    Allow user to set the QUEUE_FLAG_NOWAIT optionally using module
    parameter to retain the default behaviour. Also, update respective
    allocation flags in the write path.
    Following are the performance numbers with io_uring fio engine for
    random read, note that device has been populated fully with randwrite
    workload before taking these numbers :-

+ rmmod nd_pmem
rmmod: ERROR: Module nd_pmem is not currently loaded
+ makej M=drivers/nvdimm
+ insmod drivers/nvdimm/nd_pmem.ko
+ sleep 1
+ test_pmem default-nowait-off
+ sleep 1
+ fio fio/verify.fio --ioengine=io_uring --size=896M --filename=/dev/pmem0
write-and-verify: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=16
fio-3.34
Starting 1 process
Jobs: 1 (f=1)
write-and-verify: (groupid=0, jobs=1): err= 0: pid=5358: Fri May 12 03:25:49 2023
  read: IOPS=266k, BW=1039MiB/s (1089MB/s)(566MiB/545msec)
    slat (nsec): min=511, max=49865, avg=2732.59, stdev=1231.82
    clat (nsec): min=1703, max=134486, avg=56488.86, stdev=7471.91
     lat (usec): min=6, max=138, avg=59.22, stdev= 7.76
    clat percentiles (usec):
     |  1.00th=[   43],  5.00th=[   47], 10.00th=[   49], 20.00th=[   51],
     | 30.00th=[   53], 40.00th=[   55], 50.00th=[   57], 60.00th=[   58],
     | 70.00th=[   60], 80.00th=[   62], 90.00th=[   65], 95.00th=[   70],
     | 99.00th=[   83], 99.50th=[   90], 99.90th=[  103], 99.95th=[  111],
     | 99.99th=[  124]
  write: IOPS=214k, BW=835MiB/s (876MB/s)(896MiB/1073msec); 0 zone resets
    slat (nsec): min=1473, max=92145, avg=4049.04, stdev=1645.45
    clat (usec): min=29, max=232, avg=70.52, stdev=12.86
     lat (usec): min=33, max=234, avg=74.57, stdev=13.54
    clat percentiles (usec):
     |  1.00th=[   44],  5.00th=[   53], 10.00th=[   56], 20.00th=[   61],
     | 30.00th=[   65], 40.00th=[   68], 50.00th=[   71], 60.00th=[   73],
     | 70.00th=[   76], 80.00th=[   79], 90.00th=[   85], 95.00th=[   92],
     | 99.00th=[  112], 99.50th=[  121], 99.90th=[  151], 99.95th=[  165],
     | 99.99th=[  188]
   bw (  KiB/s): min=115224, max=909344, per=71.53%, avg=611669.33, stdev=432768.96, samples=3
   iops        : min=28806, max=227336, avg=152917.33, stdev=108192.24, samples=3
  lat (usec)   : 2=0.01%, 10=0.01%, 20=0.01%, 50=7.92%, 100=90.44%
  lat (usec)   : 250=1.64%
  cpu          : usr=41.68%, sys=55.78%, ctx=6691, majf=0, minf=3975
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=144933,229376,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=1039MiB/s (1089MB/s), 1039MiB/s-1039MiB/s (1089MB/s-1089MB/s), io=566MiB (594MB), run=545-545msec
  WRITE: bw=835MiB/s (876MB/s), 835MiB/s-835MiB/s (876MB/s-876MB/s), io=896MiB (940MB), run=1073-1073msec

Disk stats (read/write):
  pmem0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
+ fio fio/randwrite.fio --ioengine=io_uring --size=896M --filename=/dev/pmem0
RANDWRITE: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=2
...
fio-3.34
Starting 48 processes
Jobs: 43 (f=43): [w(16),_(1),w(1),_(1),w(1),_(1),w(11),_(1),w(14),_(1)][80.0%][w=9744MiB/s][w=2494k IOPS][eta 00m:01s]
RANDWRITE: (groupid=0, jobs=48): err= 0: pid=5377: Fri May 12 03:25:54 2023
  write: IOPS=2400k, BW=9374MiB/s (9829MB/s)(42.0GiB/4588msec); 0 zone resets
    slat (nsec): min=411, max=9672.1k, avg=6856.45, stdev=13756.39
    clat (nsec): min=70, max=12541k, avg=28576.27, stdev=32832.06
     lat (nsec): min=1583, max=12543k, avg=35432.72, stdev=34424.05
    clat percentiles (nsec):
     |  1.00th=[   916],  5.00th=[  2288], 10.00th=[  4896], 20.00th=[ 10560],
     | 30.00th=[ 17792], 40.00th=[ 22144], 50.00th=[ 25984], 60.00th=[ 29824],
     | 70.00th=[ 34048], 80.00th=[ 39680], 90.00th=[ 51456], 95.00th=[ 65280],
     | 99.00th=[102912], 99.50th=[122368], 99.90th=[199680], 99.95th=[276480],
     | 99.99th=[888832]
   bw (  MiB/s): min= 8098, max=13617, per=100.00%, avg=10176.24, stdev=45.01, samples=366
   iops        : min=2073313, max=3486177, avg=2605112.23, stdev=11521.54, samples=366
  lat (nsec)   : 100=0.01%, 250=0.47%, 500=0.36%, 750=0.08%, 1000=0.14%
  lat (usec)   : 2=2.52%, 4=5.05%, 10=10.81%, 20=15.78%, 50=53.95%
  lat (usec)   : 100=9.72%, 250=1.06%, 500=0.04%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
  cpu          : usr=6.89%, sys=29.96%, ctx=6989842, majf=0, minf=584
  IO depths    : 1=0.1%, 2=100.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,11010048,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=2

Run status group 0 (all jobs):
  WRITE: bw=9374MiB/s (9829MB/s), 9374MiB/s-9374MiB/s (9829MB/s-9829MB/s), io=42.0GiB (45.1GB), run=4588-4588msec

Disk stats (read/write):
  pmem0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
+ for i in 1 2 3
+ fio fio/randread.fio --ioengine=io_uring --size=896M --filename=/dev/pmem0 --output=pmem-default-nowait-off-1.fio
+ for i in 1 2 3
[r(48)][100.0%][r=16.6GiB/s][r=4348k IOPS][eta 00m:00s]
+ fio fio/randread.fio --ioengine=io_uring --size=896M --filename=/dev/pmem0 --output=pmem-default-nowait-off-2.fio
+ for i in 1 2 3
[r(48)][100.0%][r=15.8GiB/s][r=4138k IOPS][eta 00m:00s]
+ fio fio/randread.fio --ioengine=io_uring --size=896M --filename=/dev/pmem0 --output=pmem-default-nowait-off-3.fio
+ rmmod nd_pmem: [r(48)][100.0%][r=16.6GiB/s][r=4346k IOPS][eta 00m:00s]
+ insmod drivers/nvdimm/nd_pmem.ko nowait=1
+ sleep 1
+ test_pmem nowait-on
+ sleep 1
+ fio fio/verify.fio --ioengine=io_uring --size=896M --filename=/dev/pmem0
write-and-verify: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=16
fio-3.34
Starting 1 process
Jobs: 1 (f=1)
write-and-verify: (groupid=0, jobs=1): err= 0: pid=6062: Fri May 12 03:28:59 2023
  read: IOPS=492k, BW=1923MiB/s (2016MB/s)(567MiB/295msec)
    slat (nsec): min=1021, max=45136, avg=1220.20, stdev=473.55
    clat (nsec): min=812, max=79261, avg=30452.86, stdev=3469.19
     lat (nsec): min=1944, max=81274, avg=31673.05, stdev=3575.81
    clat percentiles (nsec):
     |  1.00th=[28288],  5.00th=[28800], 10.00th=[29056], 20.00th=[29056],
     | 30.00th=[29312], 40.00th=[29568], 50.00th=[29568], 60.00th=[29824],
     | 70.00th=[30080], 80.00th=[30336], 90.00th=[30848], 95.00th=[37120],
     | 99.00th=[48384], 99.50th=[49408], 99.90th=[58112], 99.95th=[59648],
     | 99.99th=[78336]
  write: IOPS=215k, BW=839MiB/s (880MB/s)(896MiB/1068msec); 0 zone resets
    slat (usec): min=2, max=122, avg= 4.30, stdev= 1.52
    clat (nsec): min=401, max=190492, avg=69931.70, stdev=9390.70
     lat (usec): min=3, max=289, avg=74.23, stdev= 9.87
    clat percentiles (usec):
     |  1.00th=[   54],  5.00th=[   58], 10.00th=[   61], 20.00th=[   64],
     | 30.00th=[   67], 40.00th=[   69], 50.00th=[   70], 60.00th=[   72],
     | 70.00th=[   74], 80.00th=[   76], 90.00th=[   79], 95.00th=[   83],
     | 99.00th=[   96], 99.50th=[  122], 99.90th=[  161], 99.95th=[  165],
     | 99.99th=[  176]
   bw (  KiB/s): min=811952, max=899120, per=99.59%, avg=855536.00, stdev=61637.08, samples=2
   iops        : min=202988, max=224780, avg=213884.00, stdev=15409.27, samples=2
  lat (nsec)   : 500=0.01%, 1000=0.01%
  lat (usec)   : 4=0.01%, 10=0.01%, 20=0.01%, 50=38.69%, 100=60.80%
  lat (usec)   : 250=0.51%
  cpu          : usr=38.33%, sys=61.60%, ctx=1, majf=0, minf=3984
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=145223,229376,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=1923MiB/s (2016MB/s), 1923MiB/s-1923MiB/s (2016MB/s-2016MB/s), io=567MiB (595MB), run=295-295msec
  WRITE: bw=839MiB/s (880MB/s), 839MiB/s-839MiB/s (880MB/s-880MB/s), io=896MiB (940MB), run=1068-1068msec

Disk stats (read/write):
  pmem0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
+ fio fio/randwrite.fio --ioengine=io_uring --size=896M --filename=/dev/pmem0
RANDWRITE: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=2
...
fio-3.34
Starting 48 processes
Jobs: 48 (f=48)
RANDWRITE: (groupid=0, jobs=48): err= 0: pid=6065: Fri May 12 03:29:00 2023
  write: IOPS=10.6M, BW=40.3GiB/s (43.2GB/s)(42.0GiB/1043msec); 0 zone resets
    slat (nsec): min=1162, max=10395k, avg=3946.17, stdev=6436.85
    clat (nsec): min=70, max=10396k, avg=4608.73, stdev=6810.12
     lat (nsec): min=1282, max=10403k, avg=8554.90, stdev=9532.53
    clat percentiles (nsec):
     |  1.00th=[ 2224],  5.00th=[ 2544], 10.00th=[ 2800], 20.00th=[ 3184],
     | 30.00th=[ 3472], 40.00th=[ 3760], 50.00th=[ 4080], 60.00th=[ 4448],
     | 70.00th=[ 4896], 80.00th=[ 5408], 90.00th=[ 6304], 95.00th=[ 7200],
     | 99.00th=[14016], 99.50th=[27776], 99.90th=[42752], 99.95th=[46848],
     | 99.99th=[80384]
   bw (  MiB/s): min=40342, max=42969, per=100.00%, avg=41656.06, stdev=29.12, samples=93
   iops        : min=10327717, max=11000181, avg=10663949.00, stdev=7454.42, samples=93
  lat (nsec)   : 100=0.01%, 250=0.03%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.28%, 4=47.32%, 10=50.83%, 20=0.91%, 50=0.58%
  lat (usec)   : 100=0.02%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
  cpu          : usr=15.39%, sys=83.72%, ctx=1002, majf=0, minf=580
  IO depths    : 1=0.1%, 2=100.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,11010048,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=2

Run status group 0 (all jobs):
  WRITE: bw=40.3GiB/s (43.2GB/s), 40.3GiB/s-40.3GiB/s (43.2GB/s-43.2GB/s), io=42.0GiB (45.1GB), run=1043-1043msec

Disk stats (read/write):
  pmem0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
+ for i in 1 2 3
+ fio fio/randread.fio --ioengine=io_uring --size=896M --filename=/dev/pmem0 --output=pmem-nowait-on-1.fio
+ for i in 1 2 3
[r(48)][100.0%][r=22.8GiB/s][r=5987k IOPS][eta 00m:00s]
+ fio fio/randread.fio --ioengine=io_uring --size=896M --filename=/dev/pmem0 --output=pmem-nowait-on-2.fio
+ for i in 1 2 3
[r(48)][100.0%][r=22.8GiB/s][r=5990k IOPS][eta 00m:00s]
+ fio fio/randread.fio --ioengine=io_uring --size=896M --filename=/dev/pmem0 --output=pmem-nowait-on-3.fio
+ rmmod nd_pmem: [r(48)][100.0%][r=23.0GiB/s][r=6016k IOPS][eta 00m:00s]
linux-block (for-next) # for i in IOPS slat cpu; do grep $i bc-*fio | column -t ; done
linux-block (for-next) # for i in IOPS slat cpu; do grep $i pmem-*fio | column -t ; done
pmem-default-nowait-off-1.fio:  read:  IOPS=3968k,  BW=15.1GiB/s  (16.3GB/s)(908GiB/60002msec)
pmem-default-nowait-off-2.fio:  read:  IOPS=4084k,  BW=15.6GiB/s  (16.7GB/s)(935GiB/60001msec)
pmem-default-nowait-off-3.fio:  read:  IOPS=3995k,  BW=15.2GiB/s  (16.4GB/s)(914GiB/60002msec)
pmem-nowait-on-1.fio:           read:  IOPS=5909k,  BW=22.5GiB/s  (24.2GB/s)(1352GiB/60003msec)
pmem-nowait-on-2.fio:           read:  IOPS=5997k,  BW=22.9GiB/s  (24.6GB/s)(1373GiB/60002msec)
pmem-nowait-on-3.fio:           read:  IOPS=6006k,  BW=22.9GiB/s  (24.6GB/s)(1375GiB/60002msec)
pmem-default-nowait-off-1.fio:  slat  (nsec):  min=431,   max=50423k,  avg=9424.06,  stdev=19769.73
pmem-default-nowait-off-2.fio:  slat  (nsec):  min=420,   max=35992k,  avg=9193.94,  stdev=19814.91
pmem-default-nowait-off-3.fio:  slat  (nsec):  min=430,   max=40737k,  avg=9244.24,  stdev=22646.40
pmem-nowait-on-1.fio:           slat  (nsec):  min=1232,  max=40098k,  avg=7518.60,  stdev=26037.75
pmem-nowait-on-2.fio:           slat  (nsec):  min=1303,  max=52107k,  avg=7423.37,  stdev=24122.06
pmem-nowait-on-3.fio:           slat  (nsec):  min=1123,  max=40193k,  avg=7409.08,  stdev=17630.05
pmem-default-nowait-off-1.fio:  cpu  :  usr=6.38%,   sys=31.37%,  ctx=220427659,  majf=0,  minf=641
pmem-default-nowait-off-2.fio:  cpu  :  usr=6.19%,   sys=31.45%,  ctx=229825635,  majf=0,  minf=639
pmem-default-nowait-off-3.fio:  cpu  :  usr=6.17%,   sys=31.22%,  ctx=221896158,  majf=0,  minf=650
pmem-nowait-on-1.fio:           cpu  :  usr=10.56%,  sys=87.82%,  ctx=24730,      majf=0,  minf=784
pmem-nowait-on-2.fio:           cpu  :  usr=9.92%,   sys=88.36%,  ctx=23427,      majf=0,  minf=720
pmem-nowait-on-3.fio:           cpu  :  usr=9.85%,   sys=89.04%,  ctx=23237,      majf=0,  minf=724
linux-block (for-next) #
--
2.40.0