On 2019-04-14 20:27, Jim Kusznir wrote:
> Hi all:
>
> I've had I/O performance problems pretty much since the beginning of using
> oVirt. I've applied several upgrades as time went on, but strangely, none of
> them have alleviated the problem. VM disk I/O is still very slow to the
> point that running VMs is often painful; it notably affects nearly all my
> VMs, and makes me leery of starting any more. I'm currently running 12 VMs
> and the hosted engine on the stack.
>
> My configuration started out with 1Gbps networking and hyperconverged gluster
> running on a single SSD on each node. It worked, but I/O was painfully slow.
> I also started running out of space, so I added an SSHD on each node,
> created another gluster volume, and moved VMs over to it. I also ran that on
> a dedicated 1Gbps network. I had recurring disk failures (seems that disks
> only lasted about 3-6 months; I warrantied all three at least once, and some
> twice before giving up). I suspect the Dell PERC 6/i was partly to blame;
> the raid card refused to see/acknowledge the disk, but plugging it into a
> normal PC showed no signs of problems. In any case, performance on that
> storage was notably bad, even though the gig-e interface was rarely taxed.
>
> I put in 10Gbps ethernet and moved all the storage onto it nonetheless, as
> several people here said that 1Gbps just wasn't fast enough. Some aspects
> improved a bit, but disk I/O is still slow. And I was still having problems
> with the SSHD data gluster volume eating disks, so I bought a dedicated NAS
> server (supermicro 12 disk dedicated FreeNAS NFS storage system on 10Gbps
> ethernet). Set that up. I found that it was actually FASTER than the
> SSD-based gluster volume, but still slow. Lately it's been getting slower,
> too... I don't know why. The FreeNAS server reports network loads around 4MB/s
> on its 10GbE interface, so it's not network constrained. At 4MB/s, I'd sure
> hope the 12 spindle SAS interface wasn't constrained either..... (and disk
> I/O operations on the NAS itself complete much faster).
>
> So, running a test on my NAS against an ISO file I haven't accessed in
> months:
>
> # dd
> if=en_windows_server_2008_r2_standard_enterprise_datacenter_and_web_x64_dvd_x15-59754.iso
> of=/dev/null bs=1024k count=500
>
> 500+0 records in
> 500+0 records out
> 524288000 bytes transferred in 2.459501 secs (213168465 bytes/sec)
>
> Running it on one of my hosts:
>
> root@unifi:/home/kusznir# time dd if=/dev/sda of=/dev/null bs=1024k count=500
> 500+0 records in
> 500+0 records out
> 524288000 bytes (524 MB, 500 MiB) copied, 7.21337 s, 72.7 MB/s
>
> (I don't know if this is a true apples to apples comparison, as I don't have
> a large file inside this VM's image). Even this is faster than I often see.
>
> I have a VoIP Phone server running as a VM. Voicemail and other recordings
> usually fail due to IO issues opening and writing the files. Often, the
> first 4 or so seconds of the recording is missed; sometimes the entire thing
> just fails. I didn't use to have this problem, but it's definitely been
> getting worse. I finally bit the bullet and ordered a physical server
> dedicated for my VoIP System...But I still want to figure out why I'm having
> all these IO problems. I read on the list of people running 30+ VMs...I feel
> that my IO can't take any more VMs with any semblance of reliability. We
> have a Quickbooks server on here too (windows), and the performance is
> abysmal; my CPA is charging me extra because of all the lost staff time
> waiting on the system to respond and generate reports.....
>
> I'm at my wits' end... I started with gluster on SSD with a 1Gbps network,
> migrated to 10Gbps network, and now to dedicated high performance NAS box
> over NFS, and still have performance issues.....I don't know how to
> troubleshoot the issue any further, but I've never had these kinds of issues
> when I was playing with other VM technologies. I'd like to get to the point
> where I can resell virtual servers to customers, but I can't do so with my
> current performance levels.
>
> I'd greatly appreciate help troubleshooting this further.
>
> --Jim
>
> _______________________________________________
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZR64VABNT2SGKLNP3XNTHCGFZXSOJAQF/
I've been working on optimizing the same; this is where I'm at currently.

Gluster volume settings:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.write-behind-window-size: 64MB
performance.flush-behind: on
performance.stat-prefetch: on
server.event-threads: 4
client.event-threads: 8
performance.io-thread-count: 32
network.ping-timeout: 30
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
storage.owner-gid: 36
storage.owner-uid: 36
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
user.cifs: off
transport.address-family: inet
nfs.disable: off
performance.client-io-threads: on
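These options can be applied with `gluster volume set`; a sketch that generates the commands for review first (the volume name "data" and the three-key subset are just examples, not my full list):

```shell
VOL=data   # example volume name; substitute your own
# Emit one `gluster volume set` per key/value pair so the whole batch can
# be reviewed before piping it to sh on a gluster node.
while read -r key val; do
    printf 'gluster volume set %s %s %s\n' "$VOL" "$key" "$val"
done <<'EOF' > gluster-tune.sh
performance.strict-o-direct on
network.remote-dio off
performance.client-io-threads on
EOF
cat gluster-tune.sh
```

Each `volume set` takes effect immediately on the running volume, so it's worth reviewing the generated list before running `sh gluster-tune.sh`.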
sysctl options:
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_congestion_control = htcp
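To make these survive reboots, the usual route is a sysctl drop-in; a sketch (the file name 90-gluster-net.conf is just a convention, and only a subset of the keys above is shown):

```shell
# Write the tuning into a drop-in fragment for /etc/sysctl.d.
cat > 90-gluster-net.conf <<'EOF'
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_congestion_control = htcp
EOF
# Then, as root on each host:
#   install -m 0644 90-gluster-net.conf /etc/sysctl.d/
#   sysctl -p /etc/sysctl.d/90-gluster-net.conf
```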
Custom /sbin/ifup-local file; Storage is the bridge name, which sits on
ens3f0/ens3f1 in bond2:
#!/bin/bash
case "$1" in
    Storage)
        /sbin/ethtool -K ens3f0 tx off rx off tso off gso off
        /sbin/ethtool -K ens3f1 tx off rx off tso off gso off
        /sbin/ip link set dev ens3f0 txqueuelen 10000
        /sbin/ip link set dev ens3f1 txqueuelen 10000
        /sbin/ip link set dev bond2 txqueuelen 10000
        /sbin/ip link set dev Storage txqueuelen 10000
        ;;
    *)
        ;;
esac
exit 0
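One caveat: the initscripts only invoke /sbin/ifup-local if it exists and is executable (chmod +x), and they pass the interface name as $1, so the script only fires for the Storage bridge. A safe dry-run sketch of that dispatch, with the real ethtool/ip calls stubbed out by echo:

```shell
# Stubbed version of the hook's case dispatch; echoes instead of running
# ethtool/ip, so this can be tried anywhere without touching interfaces.
tune_iface() {
    case "$1" in
        Storage)
            echo "would run: ethtool -K ens3f0 tx off rx off tso off gso off"
            echo "would run: ip link set dev bond2 txqueuelen 10000"
            ;;
        *) : ;;   # other interfaces: leave untouched
    esac
}
tune_iface Storage   # prints the two stubbed commands
tune_iface em1       # prints nothing: hook ignores unrelated interfaces
```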
I still have some latency issues, but my writes are up to 264 MB/s
sequential on HDDs.

Output of CrystalDiskMark on a Windows 10 VM:
Sequential Read (Q= 32,T= 1) : 688.536 MB/s
Sequential Write (Q= 32,T= 1) : 264.254 MB/s
Random Read 4KiB (Q= 8,T= 8) : 176.069 MB/s [ 42985.6 IOPS]
Random Write 4KiB (Q= 8,T= 8) : 63.217 MB/s [ 15433.8 IOPS]
Random Read 4KiB (Q= 32,T= 1) : 159.598 MB/s [ 38964.4 IOPS]
Random Write 4KiB (Q= 32,T= 1) : 54.212 MB/s [ 13235.4 IOPS]
Random Read 4KiB (Q= 1,T= 1) : 3.488 MB/s [ 851.6 IOPS]
Random Write 4KiB (Q= 1,T= 1) : 3.006 MB/s [ 733.9 IOPS]
Also, enabling libgfapi on the engine was the best performance option I
ever tweaked; it easily doubled reads/writes.
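For anyone wanting to try it, libgfapi is an engine-side switch (this is the procedure as I understand it for oVirt 4.2+; check your version first, and note running VMs only pick it up after a full power cycle):

```shell
# On the engine host, as root:
engine-config -s LibgfApiSupported=true
systemctl restart ovirt-engine
```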