On 5/6/22 06:30, Lukáš Doktor wrote:
Also let me briefly share the details about the execution:
Thanks, this is super useful!
I got very similar results to yours:
QEMU 6.2:                 bw=1132MiB/s
QEMU 7.0:                 bw=1046MiB/s
QEMU 7.0 + patch:         bw=1012MiB/s
QEMU 7.0 + tweaked patch: bw=1077MiB/s
"tweaked patch" is moving qemu_cond_signal after qemu_mutex_unlock.
It's better than QemuSemaphore in QEMU 7.0 but still not as good as
the original. /me thinks
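
For illustration, here is a minimal, self-contained sketch of the two
signalling orders, using plain pthreads rather than QEMU's
qemu_mutex/qemu_cond wrappers; the pending_requests counter and the function
names are invented for the example:

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t request_cond = PTHREAD_COND_INITIALIZER;
static int pending_requests;

/* Patch as posted: signal while still holding the lock. */
static void submit_signal_inside(void)
{
    pthread_mutex_lock(&lock);
    pending_requests++;
    pthread_cond_signal(&request_cond);
    pthread_mutex_unlock(&lock);
}

/* "Tweaked patch": drop the lock first, then signal. */
static void submit_signal_outside(void)
{
    pthread_mutex_lock(&lock);
    pending_requests++;
    pthread_mutex_unlock(&lock);
    pthread_cond_signal(&request_cond);
}

/* The worker side is identical for both variants; the predicate is
 * re-checked under the lock, so no wakeup can be lost. */
static void *worker(void *opaque)
{
    pthread_mutex_lock(&lock);
    for (;;) {
        while (pending_requests == 0) {
            pthread_cond_wait(&request_cond, &lock);
        }
        pending_requests--;
        pthread_mutex_unlock(&lock);
        /* ... process one request outside the lock ... */
        pthread_mutex_lock(&lock);
    }
}

Signalling outside the lock avoids waking a worker that immediately blocks
on the still-held mutex; how much that helps depends on the pthread
implementation, since some implementations transfer such waiters directly
onto the mutex wait queue.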
Paolo
---
# Create the backing image and export it over NBD in the background,
# logging qemu-nbd's output and recording its PID for later cleanup
mkdir -p /var/lib/runperf/runperf-nbd/
truncate -s 256M /var/lib/runperf/runperf-nbd/disk.img
nohup qemu-nbd -t -k /var/lib/runperf/runperf-nbd/socket -f raw \
    /var/lib/runperf/runperf-nbd/disk.img \
    &> $(mktemp /var/lib/runperf/runperf-nbd/qemu_nbd_XXXX.log) &
echo $! >> /var/lib/runperf/runperf-nbd/kill_pids
# Detach the recorded PIDs from the shell so they survive logout
for PID in $(cat /var/lib/runperf/runperf-nbd/kill_pids); do disown -h $PID; done
export TERM=xterm-256color
true
mkdir -p /var/lib/runperf/runperf-nbd/
cat > /var/lib/runperf/runperf-nbd/nbd.fio << \Gr1UaS
# To use fio to test nbdkit:
#
# nbdkit -U - memory size=256M --run 'export unixsocket; fio examples/nbd.fio'
#
# To use fio to test qemu-nbd:
#
# rm -f /tmp/disk.img /tmp/socket
# truncate -s 256M /tmp/disk.img
# export target=/tmp/socket
# qemu-nbd -t -k $target -f raw /tmp/disk.img &
# fio examples/nbd.fio
# killall qemu-nbd
[global]
bs = $@
runtime = 30
ioengine = nbd
iodepth = 32
direct = 1
sync = 0
time_based = 1
clocksource = gettimeofday
ramp_time = 5
write_bw_log = fio
write_iops_log = fio
write_lat_log = fio
log_avg_msec = 1000
write_hist_log = fio
log_hist_msec = 10000
# log_hist_coarseness = 4 # 76 bins
rw = $@
uri=nbd+unix:///?socket=/var/lib/runperf/runperf-nbd/socket
# Starting from nbdkit 1.14 the following will work:
#uri=${uri}
[job0]
offset=0
[job1]
offset=64m
[job2]
offset=128m
[job3]
offset=192m
Gr1UaS
benchmark_bin=/usr/local/bin/fio pbench-fio --block-sizes=4 \
    --job-file=/var/lib/runperf/runperf-nbd/nbd.fio --numjobs=4 \
    --runtime=60 --samples=5 --test-types=write --clients=$WORKER_IP
---
I am using pbench to drive the run, but you can simply replace the two "$@"
placeholders in the produced "/var/lib/runperf/runperf-nbd/nbd.fio" (e.g.
"bs = 4k" and "rw = write") and run it directly with fio.
Regards,
Lukáš
On 5/5/22 15:27, Paolo Bonzini wrote:
On 5/5/22 14:44, Daniel P. Berrangé wrote:
util/thread-pool.c uses qemu_sem_*() to notify worker threads when work
becomes available. It makes sense that this operation is
performance-critical and that's why the benchmark regressed.
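
For reference, the pre-patch scheme is roughly the following; a simplified
sketch with a POSIX semaphore standing in for QemuSemaphore, and with an
invented pending_requests counter in place of the real request list:

#include <pthread.h>
#include <semaphore.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t work_available;            /* initialized to 0 via sem_init() */
static int pending_requests;

static void submit_request(void)
{
    pthread_mutex_lock(&lock);
    pending_requests++;
    pthread_mutex_unlock(&lock);
    sem_post(&work_available);          /* one post per queued request */
}

static void *worker(void *opaque)
{
    for (;;) {
        sem_wait(&work_available);      /* consume exactly one wakeup */
        pthread_mutex_lock(&lock);
        pending_requests--;
        pthread_mutex_unlock(&lock);
        /* ... process the request ... */
    }
}

Since the semaphore keeps its own count of posted wakeups, the submit path
never needs to signal under the lock at all.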
Doh, I questioned whether the change would have a performance impact, and
it was thought not to be used in performance-critical places.
The expectation was that there would be no contention, and thus no overhead
on top of the pool->lock that is taken anyway, but that was optimistic.
Lukáš, can you run a benchmark with this condvar implementation that was
suggested by Stefan?

https://lore.kernel.org/qemu-devel/20220505131346.823941-1-pbonz...@redhat.com/raw
If it still regresses, we can either revert the patch or look at a different
implementation (even getting rid of the global queue is an option).
Thanks,
Paolo