On 2019-04-14 23:22, Leo David wrote:

> Hi, 
> Thank you Alex, I was looking for some optimisation settings as well, since I 
> am pretty much in the same boat, using SSD-based distributed-replicated 
> volumes across 12 hosts. 
> Could anyone else (maybe even someone from the oVirt or RHEV team) validate 
> these settings or add some other tweaks, so we can use them as a standard? 
> Thank you very much again! 
> 
> On Mon, Apr 15, 2019, 05:56 Alex McWhirter <[email protected]> wrote: 
> 
> On 2019-04-14 20:27, Jim Kusznir wrote: 
> 
> Hi all:
> I've had I/O performance problems pretty much since the beginning of using 
> oVirt.  I've applied several upgrades as time went on, but strangely, none of 
> them have alleviated the problem.  VM disk I/O is still so slow that running 
> VMs is often painful; it notably affects nearly all my VMs, and makes me 
> leery of starting any more.  I'm currently running 12 VMs and the hosted 
> engine on the stack. 
> My configuration started out with 1Gbps networking and hyperconverged gluster 
> running on a single SSD on each node.  It worked, but I/O was painfully slow. 
>  I also started running out of space, so I added an SSHD on each node, 
> created another gluster volume, and moved VMs over to it.  I also ran that on 
> a dedicated 1Gbps network.  I had recurring disk failures (seems that disks 
> only lasted about 3-6 months; I warrantied all three at least once, and some 
> twice before giving up).  I suspect the Dell PERC 6/i was partly to blame; 
> the raid card refused to see/acknowledge the disk, but plugging it into a 
> normal PC showed no signs of problems.  In any case, performance on that 
> storage was notably bad, even though the gig-e interface was rarely taxed. 
> I put in 10Gbps ethernet and moved all the storage onto that nonetheless, as 
> several people here said that 1Gbps just wasn't fast enough.  Some aspects 
> improved a bit, but disk I/O is still slow.  And I was still having problems 
> with the SSHD data gluster volume eating disks, so I bought a dedicated NAS 
> server (a Supermicro 12-disk dedicated FreeNAS NFS storage system on 10Gbps 
> ethernet) and set it up.  I found that it was actually FASTER than the 
> SSD-based gluster volume, but still slow.  Lately it's been getting slower, 
> too... I don't know why.  The FreeNAS server reports network loads around 
> 4MB/s on its 10GbE interface, so it's not network constrained.  At 4MB/s, I'd 
> sure hope the 12-spindle SAS interface wasn't constrained either (and disk 
> I/O operations on the NAS itself complete much faster). 
> So, running a test on my NAS against an ISO file I haven't accessed in 
> months: 
> 
> # dd if=en_windows_server_2008_r2_standard_enterprise_datacenter_and_web_x64_dvd_x15-59754.iso \
>      of=/dev/null bs=1024k count=500 
> 500+0 records in 
> 500+0 records out 
> 524288000 bytes transferred in 2.459501 secs (213168465 bytes/sec) 
> Running it on one of my hosts: 
> 
> root@unifi:/home/kusznir# time dd if=/dev/sda of=/dev/null bs=1024k count=500 
> 500+0 records in 
> 500+0 records out 
> 524288000 bytes (524 MB, 500 MiB) copied, 7.21337 s, 72.7 MB/s 
> (I don't know if this is a true apples-to-apples comparison, as I don't have 
> a large file inside this VM's image.)  Even this is faster than I often see. 
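> One caveat with dd numbers like these: a repeat read of the same file can be 
> served straight from the page cache and say nothing about the disks. A quick 
> sketch of a read test that asks the kernel not to keep the data cached (GNU 
> dd's iflag=nocache; ddtest.bin is just a throwaway scratch file): 

```shell
# Create a scratch file, then read it back while advising the kernel to
# drop it from the page cache, so the read reflects real I/O more closely.
dd if=/dev/zero of=./ddtest.bin bs=1M count=64 conv=fsync
dd if=./ddtest.bin of=/dev/null bs=1M iflag=nocache
rm -f ./ddtest.bin
```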
> I have a VoIP phone server running as a VM.  Voicemail and other recordings 
> usually fail due to I/O issues opening and writing the files.  Often, the 
> first 4 or so seconds of the recording is missed; sometimes the entire thing 
> just fails.  I didn't use to have this problem, but it's definitely been 
> getting worse.  I finally bit the bullet and ordered a physical server 
> dedicated to my VoIP system... but I still want to figure out why I'm having 
> all these I/O problems.  I read on the list of people running 30+ VMs... I 
> feel that my I/O can't take any more VMs with any semblance of reliability. 
> We have a QuickBooks server on here too (Windows), and the performance is 
> abysmal; my CPA is charging me extra because of all the lost staff time 
> waiting on the system to respond and generate reports. 
> I'm at my wits' end... I started with gluster on SSD with a 1Gbps network, 
> migrated to a 10Gbps network, and now to a dedicated high-performance NAS box 
> over NFS, and I still have performance issues.  I don't know how to 
> troubleshoot the issue any further, but I've never had these kinds of issues 
> when I was playing with other VM technologies.  I'd like to get to the point 
> where I can resell virtual servers to customers, but I can't do so with my 
> current performance levels. 
> I'd greatly appreciate help troubleshooting this further. 
> --Jim 
> _______________________________________________
> Users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/[email protected]/message/ZR64VABNT2SGKLNP3XNTHCGFZXSOJAQF/
>  
> 
> Been working on optimizing the same. This is where I'm at currently. 
> 
> Gluster volume settings. 
> 
> diagnostics.count-fop-hits: on
> diagnostics.latency-measurement: on
> performance.write-behind-window-size: 64MB
> performance.flush-behind: on
> performance.stat-prefetch: on
> server.event-threads: 4
> client.event-threads: 8
> performance.io-thread-count: 32
> network.ping-timeout: 30
> cluster.granular-entry-heal: enable
> performance.strict-o-direct: on
> storage.owner-gid: 36
> storage.owner-uid: 36
> features.shard: on
> cluster.shd-wait-qlength: 10000
> cluster.shd-max-threads: 8
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.eager-lock: enable
> network.remote-dio: off
> performance.low-prio-threads: 32
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> auth.allow: *
> user.cifs: off
> transport.address-family: inet
> nfs.disable: off
> performance.client-io-threads: on 
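> Options like these are applied one at a time with "gluster volume set". A 
> sketch that generates the commands into a script for review before running 
> anything (the volume name "data" is a placeholder, and only a subset of the 
> options above is shown): 

```shell
# Generate "gluster volume set" commands for review instead of running
# them directly; "data" is a placeholder volume name.
VOLUME=data
OUT=/tmp/apply-gluster-opts.sh
: > "$OUT"
while read -r opt val; do
    echo "gluster volume set $VOLUME $opt $val" >> "$OUT"
done <<'EOF'
performance.write-behind-window-size 64MB
performance.flush-behind on
server.event-threads 4
client.event-threads 8
performance.io-thread-count 32
features.shard on
cluster.eager-lock enable
EOF
cat "$OUT"    # review, then apply with: sh /tmp/apply-gluster-opts.sh
```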
> 
> sysctl options 
> 
> net.core.rmem_max = 134217728
> net.core.wmem_max = 134217728
> net.ipv4.tcp_rmem = 4096 87380 134217728
> net.ipv4.tcp_wmem = 4096 65536 134217728
> net.core.netdev_max_backlog = 300000
> net.ipv4.tcp_moderate_rcvbuf = 1
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_congestion_control = htcp 
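> To survive a reboot, these belong in a drop-in under /etc/sysctl.d/. A 
> sketch that writes such a file (to /tmp here for illustration) and shows how 
> it would be loaded: 

```shell
# Write the tuning values to a drop-in file (illustrated in /tmp; the real
# location is /etc/sysctl.d/), then load it with "sysctl -p <file>".
cat > /tmp/90-storage-net.conf <<'EOF'
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728
net.core.netdev_max_backlog = 300000
net.ipv4.tcp_moderate_rcvbuf = 1
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_congestion_control = htcp
EOF
# as root: install -m 644 /tmp/90-storage-net.conf /etc/sysctl.d/ \
#          && sysctl -p /etc/sysctl.d/90-storage-net.conf
```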
> 
> custom /sbin/ifup-local file; "Storage" is the bridge name, which sits on 
> ens3f0/ens3f1 in bond2 
> 
> #!/bin/bash
> case "$1" in
>     Storage)
>         /sbin/ethtool -K ens3f0 tx off rx off tso off gso off
>         /sbin/ethtool -K ens3f1 tx off rx off tso off gso off
>         /sbin/ip link set dev ens3f0 txqueuelen 10000
>         /sbin/ip link set dev ens3f1 txqueuelen 10000
>         /sbin/ip link set dev bond2 txqueuelen 10000
>         /sbin/ip link set dev Storage txqueuelen 10000
>         ;;
>     *)
>         ;;
> esac
> exit 0 
> 
> I still have some latency issues, but my writes are up to 264 MB/s 
> sequential on HDDs. 
> 
> Output of CrystalDiskMark on a Windows 10 VM: 
> 
> Sequential Read (Q= 32,T= 1) :   688.536 MB/s
> Sequential Write (Q= 32,T= 1) :   264.254 MB/s
> Random Read 4KiB (Q=  8,T= 8) :   176.069 MB/s [  42985.6 IOPS]
> Random Write 4KiB (Q=  8,T= 8) :    63.217 MB/s [  15433.8 IOPS]
> Random Read 4KiB (Q= 32,T= 1) :   159.598 MB/s [  38964.4 IOPS]
> Random Write 4KiB (Q= 32,T= 1) :    54.212 MB/s [  13235.4 IOPS]
> Random Read 4KiB (Q=  1,T= 1) :     3.488 MB/s [    851.6 IOPS]
> Random Write 4KiB (Q=  1,T= 1) :     3.006 MB/s [    733.9 IOPS] 
> 
> Also, enabling libgfapi on the engine was the best performance option I ever 
> tweaked; it easily doubled reads/writes. 
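> For reference, libgfapi is switched on through engine-config on the engine 
> host. Sketch from memory (verify against the docs for your oVirt version; 
> the 4.2 compat version below is an example, and running VMs only pick the 
> change up after a full power cycle, not a guest reboot): 

```shell
# As root on the engine host (sketch; "--cver=4.2" is an assumed
# cluster compatibility version -- substitute your own):
engine-config -s LibgfApiSupported=true --cver=4.2
systemctl restart ovirt-engine
```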

Also with all of that said, I've mostly solved the rest of my issues by
enabling performance.read-ahead on the gluster volume. I am saturating
my 10G network, which translates to 700MB/s reads, 350MB/s writes
(replica 2). 

Just make sure your local read-ahead settings on the bricks are sane,
i.e. "blockdev --getra /dev/sdx"; mine is 8192.
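
Worth noting that blockdev's unit is 512-byte sectors, so 8192 means 4 MiB of
read-ahead. One way to make that persistent is a udev rule; a sketch below
writes the rule to /tmp for illustration (the real location is
/etc/udev/rules.d/, "sd?" is a placeholder match for your brick devices, and
bdi/read_ahead_kb is in KiB, so 8192 sectors == 4096 KiB):

```shell
# Write a udev rule that sets read-ahead to 4096 KiB (== 8192 sectors)
# on matching block devices; "sd?" is a placeholder for brick disks.
cat > /tmp/60-brick-readahead.rules <<'EOF'
SUBSYSTEM=="block", KERNEL=="sd?", ACTION=="add|change", ATTR{bdi/read_ahead_kb}="4096"
EOF
# as root: copy to /etc/udev/rules.d/ and run "udevadm control --reload"
```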
