Thank you for the analysis. I have some further comments:
First off, filebench pre-writes the files before doing oltp benchmarks, so I
don't think thin provisioning is at play here.
I will double check this, but if you don't hear otherwise, please presume that
is the case :)
Secondly, I am surprised at your recommendation to use virtio instead of
virtio-scsi, since the writeup for virtio-scsi claims it has equivalent
performance in general, and adds better scaling.
As far as your suggestion for using multiple disks for scaling higher:
We are using an SSD. Isn't the whole advantage of using SSD drives that you can
get the IOPS performance of 10 drives out of a single drive?
We certainly get that using it natively, outside of a VM.
So it would be nice to see performance approaching that within an ovirt VM.
- Original Message -
From: "Nir Soffer"
To: "Philip Brown"
Cc: "users", "qemu-block", "Stefan Hajnoczi", "Paolo Bonzini",
"Sergio Lopez Pascual", "Mordechai Lehrer"
Sent: Tuesday, July 21, 2020 4:23:36 AM
Subject: [BULK] Re: [ovirt-users] very very bad iscsi performance
On Tue, Jul 21, 2020 at 2:20 AM Philip Brown wrote:
> Yes, I am testing small writes. "oltp workload" means a simulation of OLTP
> database access.
> You asked me to test the speed of iscsi from another host, which is very
> reasonable. So here are the results,
> run from another node in the ovirt cluster.
> Setup is using:
> - exact same vg device, exported via iscsi
> - mounted directly into another physical host running centos 7, rather than
> a VM running on it
> - literally the same filesystem, again, mounted noatime
> I ran the same oltp workload. this setup gives the following results over 2
> grep Summary oltp.iscsimount.?
> oltp.iscsimount.1:35906: 63.433: IO Summary: 648762 ops, 10811.365 ops/s,
> (5375/5381 r/w), 21.4mb/s,475us cpu/op, 1.3ms latency
> oltp.iscsimount.2:36830: 61.072: IO Summary: 824557 ops, 13741.050 ops/s,
> (6844/6826 r/w), 27.2mb/s,429us cpu/op, 1.1ms latency
> As requested, I attach virsh output, and qemu log
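For comparing runs like the two above, here is a minimal sketch that pulls the numbers out of the filebench "IO Summary" lines and averages ops/s. The regular expression is only an assumption based on the two lines quoted in this thread, not on any documented filebench output format, and the names parse_summary and SUMMARY_RE are made up for illustration.

```python
import re

# Pattern guessed from the two "IO Summary" lines quoted in this thread.
SUMMARY_RE = re.compile(
    r"IO Summary:\s*(?P<ops>\d+) ops,\s*(?P<ops_per_s>[\d.]+) ops/s,\s*"
    r"\((?P<reads>\d+)/(?P<writes>\d+) r/w\),\s*(?P<mb_per_s>[\d.]+)mb/s,\s*"
    r"(?P<cpu_us>\d+)us cpu/op,\s*(?P<latency_ms>[\d.]+)ms latency"
)


def parse_summary(line):
    """Return the numeric fields of one IO Summary line, or None if no match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {name: float(value) for name, value in match.groupdict().items()}


# The two runs from the iSCSI mount on the physical host, with the mail
# line wrapping undone.
runs = [
    "oltp.iscsimount.1:35906: 63.433: IO Summary: 648762 ops, 10811.365 ops/s, "
    "(5375/5381 r/w), 21.4mb/s,475us cpu/op, 1.3ms latency",
    "oltp.iscsimount.2:36830: 61.072: IO Summary: 824557 ops, 13741.050 ops/s, "
    "(6844/6826 r/w), 27.2mb/s,429us cpu/op, 1.1ms latency",
]

parsed = [parse_summary(line) for line in runs]
mean_ops = sum(p["ops_per_s"] for p in parsed) / len(parsed)
mean_latency = sum(p["latency_ms"] for p in parsed) / len(parsed)
print(f"mean ops/s: {mean_ops:.1f}, mean latency: {mean_latency:.2f} ms")
```

Feeding the summary lines from the in-VM runs through the same function would give a like-for-like comparison with the physical host numbers.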
What we see in your logs:
You are using:
- thin disk - qcow2 image on logical volume:
-object iothread,id=iothread1 \
This is the most flexible option oVirt has, but not the default.
A known issue with such a disk is possible pausing of the VM when the disk
becomes full, if oVirt cannot extend the underlying logical volume fast enough.
It can be mitigated by using larger chunks in vdsm.
We recommend these settings if you are going to use VMs with heavy I/O
with thin disks:
# cat /etc/vdsm/vdsm.conf.d/99-local.conf
# Together with volume_utilization_chunk_mb, set the minimal free
# space before a thin provisioned block volume is extended. Use lower
# values to extend earlier.
# default value:
# volume_utilization_percent = 50
volume_utilization_percent = 25
# Size of extension chunk in megabytes, and together with
# volume_utilization_percent, set the free space limit. Use higher
# values to extend in bigger chunks.
# default value:
# volume_utilization_chunk_mb = 1024
volume_utilization_chunk_mb = 4096
With this configuration, when free space on the disk is 1 GiB, oVirt will extend
the disk by 4 GiB. So your disk may be up to 5 GiB larger than the used space,
but if the VM is writing data very fast, the chance of pausing is reduced.
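To illustrate the "up to 5 GiB larger" figure, here is a simplified sketch of the extend-on-low-free-space behavior. It takes the 1 GiB threshold and 4 GiB chunk directly from the text above and does not model how vdsm derives the threshold from volume_utilization_percent. The function name and the write pattern are hypothetical, and real extension is asynchronous, while this model extends instantly.

```python
def simulate_thin_disk(writes_mib, initial_mib, threshold_mib, chunk_mib):
    """Simplified model of a thin disk that grows in fixed chunks.

    Whenever free space drops to threshold_mib or below, the allocation
    grows by chunk_mib. Returns (final allocated size, largest free space
    seen), both in MiB. Assumption: extension completes instantly.
    """
    allocated = initial_mib
    used = 0
    max_free = allocated - used
    for write in writes_mib:
        used += write
        # Extend until free space is back above the threshold.
        while allocated - used <= threshold_mib:
            allocated += chunk_mib
        max_free = max(max_free, allocated - used)
    return allocated, max_free


# Hypothetical workload: 40 writes of 512 MiB (20 GiB total) to a disk
# that starts with 1 GiB allocated.
writes = [512] * 40
allocated, max_free = simulate_thin_disk(
    writes, initial_mib=1024, threshold_mib=1024, chunk_mib=4096
)
print(f"allocated: {allocated} MiB, largest free space: {max_free} MiB")
```

In this model the free space never exceeds threshold plus chunk, which is 5120 MiB (5 GiB) for the values above.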
If you want to reduce the chance of pausing your database at the busiest times
to zero, using a preallocated disk is the way to go.
In oVirt 4.4 you can check this option when creating a disk:
[x] Enable Incremental Backup
Allocation Policy: [Preallocated]
You will get a preallocated disk in the specified size, using qcow2 format.
This gives you the option to use incremental backup and faster disk operations
in oVirt (since qemu-img does not need to read the entire disk), and it avoids
the pausing issue. It may also defeat thin provisioning, but if your backend
storage supports thin provisioning this does not matter.
For the best performance in a database use case, a preallocated volume
should be better.
Please try to benchmark:
- raw preallocated disk
- using virtio instead of virtio-scsi
If your database can use multiple disks, you may get better
performance by adding multiple
disks and use one