On 2019-04-15 13:08, Leo David wrote:

> Thank you, Alex! 
> I will try these performance settings. 
> If someone from the dev team could validate and recommend those as a good 
> standard configuration, it would be just great. 
> If they are OK, wouldn't it be nice to have them applied from within the UI 
> with the "Optimize for Virt Store" button? 
> Thank you! 
> 
> On Mon, Apr 15, 2019 at 7:39 PM Alex McWhirter <a...@triadic.us> wrote: 
> 
> On 2019-04-14 23:22, Leo David wrote: 
> Hi, 
> Thank you Alex, I was looking for some optimisation settings as well, since I 
> am pretty much in the same boat, using SSD-based distributed-replicate 
> volumes across 12 hosts. 
> Could anyone else (maybe even from the oVirt or RHEV team) validate these 
> settings or add some other tweaks as well, so we can use them as a standard? 
> Thank you very much again ! 
> 
> On Mon, Apr 15, 2019, 05:56 Alex McWhirter <a...@triadic.us> wrote: 
> 
> On 2019-04-14 20:27, Jim Kusznir wrote: 
> 
> Hi all:
> I've had I/O performance problems pretty much since the beginning of using 
> oVirt.  I've applied several upgrades as time went on, but strangely, none of 
> them have alleviated the problem.  VM disk I/O is still very slow to the 
> point that running VMs is often painful; it notably affects nearly all my 
> VMs, and makes me leery of starting any more.  I'm currently running 12 VMs 
> and the hosted engine on the stack. 
> My configuration started out with 1Gbps networking and hyperconverged gluster 
> running on a single SSD on each node.  It worked, but I/O was painfully slow. 
>  I also started running out of space, so I added an SSHD on each node, 
> created another gluster volume, and moved VMs over to it.  I also ran that on 
> a dedicated 1Gbps network.  I had recurring disk failures (seems that disks 
> only lasted about 3-6 months; I warrantied all three at least once, and some 
> twice before giving up).  I suspect the Dell PERC 6/i was partly to blame; 
> the raid card refused to see/acknowledge the disk, but plugging it into a 
> normal PC showed no signs of problems.  In any case, performance on that 
> storage was notably bad, even though the gig-e interface was rarely taxed. 
> I put in 10Gbps ethernet and moved all the storage onto that nonetheless, as 
> several people here said that 1Gbps just wasn't fast enough.  Some aspects 
> improved a bit, but disk I/O is still slow.  And I was still having problems 
> with the SSHD data gluster volume eating disks, so I bought a dedicated NAS 
> server (supermicro 12 disk dedicated FreeNAS NFS storage system on 10Gbps 
> ethernet).  Set that up.  I found that it was actually FASTER than the 
> SSD-based gluster volume, but still slow.  Lately it's been getting slower, 
> too... I don't know why.  The FreeNAS server reports network loads around 4MB/s 
> on its 10GbE interface, so it's not network constrained.  At 4MB/s, I'd sure 
> hope the 12 spindle SAS interface wasn't constrained either.....  (and disk 
> I/O operations on the NAS itself complete much faster). 
> So, running a test on my NAS against an ISO file I haven't accessed in 
> months: 
> 
> # dd \
>     if=en_windows_server_2008_r2_standard_enterprise_datacenter_and_web_x64_dvd_x15-59754.iso \
>     of=/dev/null bs=1024k count=500 
> 500+0 records in 
> 500+0 records out 
> 524288000 bytes transferred in 2.459501 secs (213168465 bytes/sec) 
> Running it on one of my hosts: 
> 
> root@unifi:/home/kusznir# time dd if=/dev/sda of=/dev/null bs=1024k count=500 
> 500+0 records in 
> 500+0 records out 
> 524288000 bytes (524 MB, 500 MiB) copied, 7.21337 s, 72.7 MB/s 
> (I don't know if this is a true apples to apples comparison, as I don't have 
> a large file inside this VM's image).  Even this is faster than I often see. 
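> A hedged aside for anyone reproducing the host-side number: GNU dd can 
> bypass the page cache with iflag=direct, so repeated runs measure the 
> storage path rather than cached data (the device name is a placeholder): 
> 
> dd if=/dev/sda of=/dev/null bs=1M count=500 iflag=direct 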
> I have a VoIP Phone server running as a VM.  Voicemail and other recordings 
> usually fail due to IO issues opening and writing the files.  Often, the 
> first 4 or so seconds of the recording is missed; sometimes the entire thing 
> just fails.  I didn't use to have this problem, but it's definitely been 
> getting worse.  I finally bit the bullet and ordered a physical server 
> dedicated for my VoIP System...But I still want to figure out why I'm having 
> all these IO problems.  I read on the list of people running 30+ VMs...I feel 
> that my IO can't take any more VMs with any semblance of reliability.  We 
> have a Quickbooks server on here too (windows), and the performance is 
> abysmal; my CPA is charging me extra because of all the lost staff time 
> waiting on the system to respond and generate reports..... 
> I'm at my wits' end... I started with gluster on SSD with a 1Gbps network, 
> migrated to 10Gbps network, and now to dedicated high performance NAS box 
> over NFS, and still have performance issues.....I don't know how to 
> troubleshoot the issue any further, but I've never had these kinds of issues 
> when I was playing with other VM technologies.  I'd like to get to the point 
> where I can resell virtual servers to customers, but I can't do so with my 
> current performance levels. 
> I'd greatly appreciate help troubleshooting this further. 
> --Jim 
> 
> I've been working on optimizing the same. This is where I'm at currently. 
> 
> Gluster volume settings. 
> 
> diagnostics.count-fop-hits: on
> diagnostics.latency-measurement: on
> performance.write-behind-window-size: 64MB
> performance.flush-behind: on
> performance.stat-prefetch: on
> server.event-threads: 4
> client.event-threads: 8
> performance.io-thread-count: 32
> network.ping-timeout: 30
> cluster.granular-entry-heal: enable
> performance.strict-o-direct: on
> storage.owner-gid: 36
> storage.owner-uid: 36
> features.shard: on
> cluster.shd-wait-qlength: 10000
> cluster.shd-max-threads: 8
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.eager-lock: enable
> network.remote-dio: off
> performance.low-prio-threads: 32
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> auth.allow: *
> user.cifs: off
> transport.address-family: inet
> nfs.disable: off
> performance.client-io-threads: on 
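> 
> For reference, a minimal sketch of applying options like these from the CLI 
> (the volume name "data" is just a placeholder for your own volume): 
> 
> #!/bin/bash
> # set a couple of the options above on an existing gluster volume
> VOL=data
> gluster volume set "$VOL" performance.write-behind-window-size 64MB
> gluster volume set "$VOL" server.event-threads 4
> gluster volume set "$VOL" client.event-threads 8
> # confirm what is actually in effect
> gluster volume get "$VOL" all | grep -E 'event-threads|write-behind-window'
> 
> Volume options are cluster-wide, so this only needs to run once per volume 
> from any node in the trusted pool. 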
> 
> sysctl options 
> 
> net.core.rmem_max = 134217728
> net.core.wmem_max = 134217728
> net.ipv4.tcp_rmem = 4096 87380 134217728
> net.ipv4.tcp_wmem = 4096 65536 134217728
> net.core.netdev_max_backlog = 300000
> net.ipv4.tcp_moderate_rcvbuf = 1
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_congestion_control = htcp 
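> 
> A hedged note on applying these: they can be tested live with "sysctl -w", 
> and persisted in a drop-in file so they survive reboots (the file name is 
> just an example): 
> 
> # persist the keys above in a drop-in and load them without a reboot
> cat > /etc/sysctl.d/90-gluster-net.conf <<'EOF'
> net.core.rmem_max = 134217728
> net.core.wmem_max = 134217728
> net.ipv4.tcp_rmem = 4096 87380 134217728
> net.ipv4.tcp_wmem = 4096 65536 134217728
> net.core.netdev_max_backlog = 300000
> net.ipv4.tcp_moderate_rcvbuf = 1
> net.ipv4.tcp_no_metrics_save = 1
> net.ipv4.tcp_congestion_control = htcp
> EOF
> sysctl --system
> 
> If htcp is not accepted, the tcp_htcp module may need to be loaded first 
> (modprobe tcp_htcp). 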
> 
> A custom /sbin/ifup-local file; "Storage" is the bridge name, which sits on 
> top of ens3f0/1 bonded as bond2: 
> 
> #!/bin/bash
> case "$1" in
>     Storage)
>         /sbin/ethtool -K ens3f0 tx off rx off tso off gso off
>         /sbin/ethtool -K ens3f1 tx off rx off tso off gso off
>         /sbin/ip link set dev ens3f0 txqueuelen 10000
>         /sbin/ip link set dev ens3f1 txqueuelen 10000
>         /sbin/ip link set dev bond2 txqueuelen 10000
>         /sbin/ip link set dev Storage txqueuelen 10000
>         ;;
>     *)
>         ;;
> esac
> exit 0 
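> 
> One hedged operational note: the RHEL/CentOS network scripts only run 
> /sbin/ifup-local if it exists and is executable (the device name arrives as 
> $1), so the file needs a one-time: 
> 
> chmod +x /sbin/ifup-local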
> 
> I still have some latency issues, but my writes are up to 264 MB/s sequential 
> on HDDs. 
> 
> Output of CrystalDiskMark on a Windows 10 VM: 
> 
> Sequential Read (Q= 32,T= 1) :   688.536 MB/s
> Sequential Write (Q= 32,T= 1) :   264.254 MB/s
> Random Read 4KiB (Q=  8,T= 8) :   176.069 MB/s [  42985.6 IOPS]
> Random Write 4KiB (Q=  8,T= 8) :    63.217 MB/s [  15433.8 IOPS]
> Random Read 4KiB (Q= 32,T= 1) :   159.598 MB/s [  38964.4 IOPS]
> Random Write 4KiB (Q= 32,T= 1) :    54.212 MB/s [  13235.4 IOPS]
> Random Read 4KiB (Q=  1,T= 1) :     3.488 MB/s [    851.6 IOPS]
> Random Write 4KiB (Q=  1,T= 1) :     3.006 MB/s [    733.9 IOPS] 
> 
> Also, enabling libgfapi on the engine was the best performance option I ever 
> tweaked; it easily doubled reads/writes. 
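> 
> (For anyone hunting for that knob, a hedged sketch of how it is usually 
> enabled on recent oVirt releases; verify the key and cluster compatibility 
> version against your own engine before relying on it: 
> 
> # on the engine host; the setting is per cluster compatibility version
> engine-config -s LibgfApiSupported=true --cver=4.2
> systemctl restart ovirt-engine
> 
> Running VMs only pick this up after a full shut down and start from the 
> engine, not a guest-level reboot.) 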

Also, with all of that said, I've mostly solved the rest of my issues by
enabling performance.read-ahead on the gluster volume. I am saturating
my 10G network, which translates to 700 MB/s reads and 350 MB/s writes
(replica 2). 

Just make sure your local read-ahead settings on the brick devices are sane,
i.e. "blockdev --getra /dev/sdx"; mine is 8192.
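
A hedged sketch of bumping that, with the device name as a placeholder (the
value is in 512-byte sectors, so 8192 works out to 4 MiB of read-ahead):

blockdev --setra 8192 /dev/sdx
# the equivalent sysfs knob, expressed in KiB
echo 4096 > /sys/block/sdx/queue/read_ahead_kb

Both reset on reboot, so they usually end up in a udev rule or a small boot
script rather than being run by hand.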


To be fair, most of these are defaults; the ones I have changed from the
defaults are: 

performance.read-ahead: on 

performance.stat-prefetch: on 

performance.flush-behind: on (pretty sure this was on by default, but I
explicitly set it) 

performance.client-io-threads: on 

performance.write-behind-window-size: 64MB (this was set to 1MB, but I
set it to 64MB, which is the size of a single shard in distributed-replicate
mode) 
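
A hedged way to confirm the shard size you are matching against (the volume
name "data" is just a placeholder):

gluster volume get data features.shard-block-size

The gluster default is 64MB, which is why the two values line up here.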

These are environment specific; I have 48 cores per host, so adding a few
threads for this helped make things more consistent. 

server.event-threads: 4
client.event-threads: 8 

As far as NIC tuning goes, with gluster basically working exclusively with
large files, you want some big buffers. Also, the HTCP congestion control
algorithm was basically designed for this use case. In my case TCP offload
on the NICs was hurting me, so I disabled it. Then I upped the txqueuelen,
again because we are working exclusively with large files. 
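
For what it's worth, a hedged sketch of verifying those pieces on a storage
node (interface names are placeholders for your own):

# congestion control should report htcp after the sysctl changes
sysctl net.ipv4.tcp_congestion_control
# offloads should show "off" and the queue length should show 10000
ethtool -k ens3f0 | grep -E 'segmentation-offload|checksumming'
ip link show dev bond2 | grep -o 'qlen [0-9]*'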

The NIC tuning stuff is pretty hardware specific; I can't see the oVirt devs
using it as defaults, especially since it would be really bad to do on
1Gbps networks. The gluster defaults also have some valid points.
stat-prefetch is off by default because at one point it used to corrupt data
on live migration. This was fixed in gluster, but the default appears to be
a bit of a leftover now. read-ahead can slow you down on 1Gbps networks.
client-io-threads may be a bad idea if you are really packing the hosts
up with VMs or have low core counts / no SMT. Large write-behind windows are
dangerous on power loss, etc... 

The defaults from oVirt are fairly sane, and really only needed minimal
tweaking to get optimal performance.
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WVG2CPEYHZ6XLKGG5BZTLIBLA57PAHKU/
