Thank you again, Alex. It makes a lot of sense now with this detailed explanation.
On Mon, Apr 15, 2019, 20:25 Alex McWhirter <[email protected]> wrote:

> On 2019-04-15 13:08, Leo David wrote:
>
> Thank you Alex !
> I will try these performance settings.
> If someone from the dev guys could validate and recommend those as a
> good standard configuration, it would be just great.
> If they are ok, wouldn't it be nice to have them applied from within
> the UI with the "Optimize for VirtStore" button ?
> Thank you !
>
> On Mon, Apr 15, 2019 at 7:39 PM Alex McWhirter <[email protected]> wrote:
>
>> On 2019-04-14 23:22, Leo David wrote:
>>
>> Hi,
>> Thank you Alex, I was looking for some optimisation settings as well,
>> since I am pretty much in the same boat, using ssd based
>> replicate-distributed volumes across 12 hosts.
>> Could anyone else (maybe even from the oVirt or RHEV team) validate
>> these settings or add some other tweaks as well, so we can use them
>> as standard ?
>> Thank you very much again !
>>
>> On Mon, Apr 15, 2019, 05:56 Alex McWhirter <[email protected]> wrote:
>>
>>> On 2019-04-14 20:27, Jim Kusznir wrote:
>>>
>>> Hi all:
>>> I've had I/O performance problems pretty much since the beginning of
>>> using oVirt. I've applied several upgrades as time went on, but
>>> strangely, none of them have alleviated the problem. VM disk I/O is
>>> still so slow that running VMs is often painful; it notably affects
>>> nearly all my VMs, and makes me leery of starting any more. I'm
>>> currently running 12 VMs and the hosted engine on the stack.
>>> My configuration started out with 1Gbps networking and
>>> hyperconverged gluster running on a single SSD on each node. It
>>> worked, but I/O was painfully slow. I also started running out of
>>> space, so I added an SSHD on each node, created another gluster
>>> volume, and moved VMs over to it. I also ran that on a dedicated
>>> 1Gbps network. I had recurring disk failures (it seems that disks
>>> only lasted about 3-6 months; I warrantied all three at least once,
>>> and some twice, before giving up). I suspect the Dell PERC 6/i was
>>> partly to blame; the raid card refused to see/acknowledge the disk,
>>> but plugging it into a normal PC showed no signs of problems. In any
>>> case, performance on that storage was notably bad, even though the
>>> gig-e interface was rarely taxed.
>>> I put in 10Gbps ethernet and moved all the storage onto it
>>> nonetheless, as several people here said that 1Gbps just wasn't fast
>>> enough. Some aspects improved a bit, but disk I/O is still slow. And
>>> I was still having problems with the SSHD data gluster volume eating
>>> disks, so I bought a dedicated NAS server (Supermicro 12-disk
>>> dedicated FreeNAS NFS storage system on 10Gbps ethernet) and set
>>> that up. I found that it was actually FASTER than the SSD-based
>>> gluster volume, but still slow. Lately it's been getting slower,
>>> too... I don't know why. The FreeNAS server reports network loads
>>> around 4MB/s on its 10GbE interface, so it's not network
>>> constrained. At 4MB/s, I'd sure hope the 12-spindle SAS interface
>>> wasn't constrained either..... (and disk I/O operations on the NAS
>>> itself complete much faster).
>>> So, running a test on my NAS against an ISO file I haven't accessed
>>> in months:
>>> # dd
>>> if=en_windows_server_2008_r2_standard_enterprise_datacenter_and_web_x64_dvd_x15-59754.iso
>>> of=/dev/null bs=1024k count=500
>>> 500+0 records in
>>> 500+0 records out
>>> 524288000 bytes transferred in 2.459501 secs (213168465 bytes/sec)
>>> Running it on one of my hosts:
>>> root@unifi:/home/kusznir# time dd if=/dev/sda of=/dev/null bs=1024k
>>> count=500
>>> 500+0 records in
>>> 500+0 records out
>>> 524288000 bytes (524 MB, 500 MiB) copied, 7.21337 s, 72.7 MB/s
>>> (I don't know if this is a true apples-to-apples comparison, as I
>>> don't have a large file inside this VM's image.) Even this is faster
>>> than I often see.
>>> I have a VoIP phone server running as a VM. Voicemail and other
>>> recordings usually fail due to I/O issues opening and writing the
>>> files. Often the first 4 or so seconds of the recording are missed;
>>> sometimes the entire thing just fails. I didn't use to have this
>>> problem, but it's definitely been getting worse. I finally bit the
>>> bullet and ordered a physical server dedicated to my VoIP system...
>>> but I still want to figure out why I'm having all these I/O
>>> problems. I read on the list of people running 30+ VMs... I feel
>>> that my I/O can't take any more VMs with any semblance of
>>> reliability. We have a QuickBooks server on here too (Windows), and
>>> the performance is abysmal; my CPA is charging me extra because of
>>> all the lost staff time waiting on the system to respond and
>>> generate reports.....
>>> I'm at my wits' end... I started with gluster on SSD with a 1Gbps
>>> network, migrated to a 10Gbps network, and now to a dedicated
>>> high-performance NAS box over NFS, and I still have performance
>>> issues..... I don't know how to troubleshoot the issue any further,
>>> but I've never had these kinds of issues when I was playing with
>>> other VM technologies. I'd like to get to the point where I can
>>> resell virtual servers to customers, but I can't do so with my
>>> current performance levels.
>>> I'd greatly appreciate help troubleshooting this further.
>>> --Jim
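A side note on the dd comparisons above: on Linux, reads served from the
page cache can report far higher numbers than the storage can actually
deliver, so repeat runs are not comparable to first runs. A minimal
sketch of a more cache-resistant test, assuming GNU dd on a Linux host
and a placeholder path /path/to/testfile:

# Hypothetical test file path; use a file at least a few GB in size.
# iflag=direct bypasses the page cache on reads.
dd if=/path/to/testfile of=/dev/null bs=1M count=500 iflag=direct

# For writes, oflag=direct bypasses the cache and conv=fsync forces a
# flush to stable storage before dd reports its throughput.
dd if=/dev/zero of=/path/to/testfile bs=1M count=500 oflag=direct conv=fsync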
>>> Been working on optimizing the same. This is where I'm at currently.
>>>
>>> Gluster volume settings:
>>>
>>> diagnostics.count-fop-hits: on
>>> diagnostics.latency-measurement: on
>>> performance.write-behind-window-size: 64MB
>>> performance.flush-behind: on
>>> performance.stat-prefetch: on
>>> server.event-threads: 4
>>> client.event-threads: 8
>>> performance.io-thread-count: 32
>>> network.ping-timeout: 30
>>> cluster.granular-entry-heal: enable
>>> performance.strict-o-direct: on
>>> storage.owner-gid: 36
>>> storage.owner-uid: 36
>>> features.shard: on
>>> cluster.shd-wait-qlength: 10000
>>> cluster.shd-max-threads: 8
>>> cluster.locking-scheme: granular
>>> cluster.data-self-heal-algorithm: full
>>> cluster.server-quorum-type: server
>>> cluster.quorum-type: auto
>>> cluster.eager-lock: enable
>>> network.remote-dio: off
>>> performance.low-prio-threads: 32
>>> performance.io-cache: off
>>> performance.read-ahead: off
>>> performance.quick-read: off
>>> auth.allow: *
>>> user.cifs: off
>>> transport.address-family: inet
>>> nfs.disable: off
>>> performance.client-io-threads: on
>>>
>>> sysctl options:
>>>
>>> net.core.rmem_max = 134217728
>>> net.core.wmem_max = 134217728
>>> net.ipv4.tcp_rmem = 4096 87380 134217728
>>> net.ipv4.tcp_wmem = 4096 65536 134217728
>>> net.core.netdev_max_backlog = 300000
>>> net.ipv4.tcp_moderate_rcvbuf = 1
>>> net.ipv4.tcp_no_metrics_save = 1
>>> net.ipv4.tcp_congestion_control = htcp
>>>
>>> Custom /sbin/ifup-local file; Storage is the bridge name, which ==
>>> ens3f0/1 in bond2:
>>>
>>> #!/bin/bash
>>> case "$1" in
>>>   Storage)
>>>     /sbin/ethtool -K ens3f0 tx off rx off tso off gso off
>>>     /sbin/ethtool -K ens3f1 tx off rx off tso off gso off
>>>     /sbin/ip link set dev ens3f0 txqueuelen 10000
>>>     /sbin/ip link set dev ens3f1 txqueuelen 10000
>>>     /sbin/ip link set dev bond2 txqueuelen 10000
>>>     /sbin/ip link set dev Storage txqueuelen 10000
>>>     ;;
>>>   *)
>>>     ;;
>>> esac
>>> exit 0
>>>
>>> I still have some latency issues, but my writes are up to 264MB/s
>>> sequential on HDDs.
>>>
>>> Output of CrystalDiskMark on a Windows 10 VM:
>>>
>>> Sequential Read  (Q= 32,T= 1) : 688.536 MB/s
>>> Sequential Write (Q= 32,T= 1) : 264.254 MB/s
>>> Random Read 4KiB (Q=  8,T= 8) : 176.069 MB/s [ 42985.6 IOPS]
>>> Random Write 4KiB (Q= 8,T= 8) :  63.217 MB/s [ 15433.8 IOPS]
>>> Random Read 4KiB (Q= 32,T= 1) : 159.598 MB/s [ 38964.4 IOPS]
>>> Random Write 4KiB (Q= 32,T= 1) : 54.212 MB/s [ 13235.4 IOPS]
>>> Random Read 4KiB (Q=  1,T= 1) :   3.488 MB/s [   851.6 IOPS]
>>> Random Write 4KiB (Q= 1,T= 1) :   3.006 MB/s [   733.9 IOPS]
>>>
>>> Also, enabling libgfapi on the engine was the best performance
>>> option I ever tweaked; it easily doubled reads / writes.
>>
>> Also, with all of that said, I've mostly solved the rest of my issues
>> by enabling performance.read-ahead on the gluster volume. I am
>> saturating my 10G network, which translates to 700MB/s reads, 350MB/s
>> writes (replica 2).
>>
>> Just make sure your local read-ahead settings on the bricks are sane,
>> i.e. "blockdev --getra /dev/sdx"; mine is 8192.
>
> --
> Best regards, Leo David
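For anyone wanting to try the above, the pieces land in three different
places: gluster options are set per volume, sysctl values go in a
persistent drop-in, and brick read-ahead is set with blockdev. A minimal
sketch, assuming a hypothetical volume named "data" and a hypothetical
brick disk /dev/sdb:

# Hypothetical volume name ("data") and brick device (/dev/sdb);
# adjust both to your environment before running anything.

# Gluster options are applied per volume:
gluster volume set data performance.read-ahead on
gluster volume set data performance.write-behind-window-size 64MB

# Persist the TCP tuning so it survives reboots, then load it now:
cat > /etc/sysctl.d/90-gluster-storage.conf <<'EOF'
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_congestion_control = htcp
EOF
sysctl -p /etc/sysctl.d/90-gluster-storage.conf

# Check, then raise, the brick device's read-ahead (units are 512-byte
# sectors; note --setra does not persist across reboots on its own):
blockdev --getra /dev/sdb
blockdev --setra 8192 /dev/sdb

As for the libgfapi remark, on oVirt 4.2-era releases that was an
engine-config toggle, along the lines of the sketch below; check the
documentation for your exact version and cluster level:

engine-config -s LibgfApiSupported=true --cver=4.2
systemctl restart ovirt-engine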
> To be fair, most of these are defaults. The ones I have changed from
> the defaults are:
>
> performance.read-ahead: on
> performance.stat-prefetch: on
> performance.flush-behind: on (pretty sure this was on by default, but
> I explicitly set it)
> performance.client-io-threads: on
> performance.write-behind-window-size: 64MB (this was set to 1MB, but I
> set it to 64MB, which is the size of a single shard in distributed
> replicate mode)
>
> These are env specific; I have 48 cores / host, so adding a few
> threads for this helped make things more consistent:
>
> server.event-threads: 4
> client.event-threads: 8
>
> As far as NIC tuning goes, with gluster working almost exclusively
> with large files, you want some big buffers, and the HTCP congestion
> protocol was basically designed for this use case. In my case TCP
> offload on the NICs was hurting me, so I disabled it, then upped the
> txqueuelen, again because we are working with exclusively large files.
>
> The NIC tuning stuff is pretty hardware specific; I can't see the
> ovirt devs using these as defaults, especially since they would be
> really bad to do on 1Gbps networks. The gluster settings also have
> some valid points. stat-prefetch is off because at one point it used
> to corrupt data on live migration. This was fixed in gluster, but the
> default appears to be a bit of a leftover now. read-ahead can slow you
> down on 1Gbps networks. client-io-threads may be a bad idea if you are
> really packing the hosts with VMs or have low core counts / no SMT.
> Write-behind windows are dangerous on power loss, etc.
>
> The defaults from ovirt are fairly sane, and really only needed
> minimal tweaking to get optimal performance.
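Since most of the long list above really is defaults, a sanity check
before copying anything wholesale is to ask gluster what a volume is
currently using. A minimal sketch, again assuming a hypothetical volume
named "data":

# Show one option's current value on the (hypothetical) "data" volume:
gluster volume get data performance.write-behind-window-size

# Dump every option and filter for the handful Alex changed:
gluster volume get data all | \
  grep -E 'read-ahead|stat-prefetch|flush-behind|client-io-threads|write-behind-window|event-threads'

# Change a value only after noting what it was, so it can be reverted:
gluster volume set data client.event-threads 8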

