[ovirt-users] Re: Tuning and testing GlusterFS performance

William Dossett Sat, 04 Aug 2018 07:34:53 -0700

Interesting… is it doing encryption by default?  That could slow it down for 
sure…

From: Jayme <jay...@gmail.com> 
Sent: Saturday, August 4, 2018 7:37 AM
To: Sahina Bose <sab...@redhat.com>
Cc: users <users@ovirt.org>
Subject: [ovirt-users] Re: Tuning and testing GlusterFS performance

Scratch that, there are actually a couple of subtle changes. I did a diff to 
compare:

< server.allow-insecure: on
29c27
< network.remote-dio: enable
---
> network.remote-dio: off
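
For anyone wanting to repeat this kind of comparison, here is a minimal sketch of one way to diff the reconfigured options of two volumes. The helper name reconfigured_opts is made up for illustration, the sample option lines are only a subset copied from the volume info posted in this thread, and the helper assumes its input is `gluster volume info <volume>` output for a single volume.

```shell
#!/bin/sh
# Print only the option lines that follow "Options Reconfigured:" in
# `gluster volume info <volume>` output, sorted so two volumes line up.
reconfigured_opts() {
    awk 'seen && NF { print } /^Options Reconfigured:/ { seen = 1 }' | sort
}

# On a host with the gluster CLI, the inputs would come from, e.g.:
#   gluster volume info data   | reconfigured_opts > data.opts
#   gluster volume info engine | reconfigured_opts > engine.opts

# Self-contained demo: a subset of the option lines posted in this thread.
data_info='Volume Name: data
Options Reconfigured:
features.barrier: disable
server.allow-insecure: on
cluster.eager-lock: enable
network.remote-dio: enable'

engine_info='Volume Name: engine
Options Reconfigured:
cluster.eager-lock: enable
network.remote-dio: off'

tmp=$(mktemp -d)
printf '%s\n' "$data_info"   | reconfigured_opts > "$tmp/data.opts"
printf '%s\n' "$engine_info" | reconfigured_opts > "$tmp/engine.opts"
# diff exits non-zero when the files differ, which is the expected case here.
diff "$tmp/data.opts" "$tmp/engine.opts" || true
rm -rf "$tmp"
```

Sorting first keeps the diff from flagging options that merely appear in a different order in the two listings.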

On Sat, Aug 4, 2018 at 10:34 AM, Jayme <jay...@gmail.com 
<mailto:jay...@gmail.com> > wrote:

One more interesting thing to note.  As a test I just re-ran dd on the engine VM 
and got around 3-5MB/sec average writes.  I had not previously set this volume to 
optimize for virt store.  So I went ahead and set that option and now I'm 
getting 50MB/sec writes. 

However, if I compare my gluster engine volume info now vs. what I posted 
in my previous reply, before I made the optimize change, all the gluster options 
are identical; not one value has changed as far as I can see.  What exactly is 
the optimize for virt store option in the admin GUI doing?
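
As far as I know, the GUI option corresponds roughly to applying Gluster's packaged "virt" option group and setting the brick owner to uid/gid 36. Treat that mapping as an assumption, not as a statement of what the admin GUI actually runs. The sketch below only prints the commands unless APPLY=1 is set; the names VOLUME, APPLY, run and virt_optimize are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Print (or, with APPLY=1, run) the commands assumed to approximate
# "Optimize for Virt Store" for one volume. VOLUME defaults to "engine".
VOLUME=${VOLUME:-engine}

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

virt_optimize() {
    # Apply the packaged "virt" option group (assumed to be installed on the host).
    run gluster volume set "$1" group virt
    # uid/gid 36 match the storage.owner-* values shown in this thread.
    run gluster volume set "$1" storage.owner-uid 36
    run gluster volume set "$1" storage.owner-gid 36
}

virt_optimize "$VOLUME"
```

Comparing `gluster volume info` before and after is still the way to see what actually changed on a given cluster.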

On Sat, Aug 4, 2018 at 10:29 AM, Jayme <jay...@gmail.com 
<mailto:jay...@gmail.com> > wrote:

One more note on this.  I only set optimize for virt on the data volumes.  I did 
not set it on the engine volume and wasn't sure whether I should.  My dd tests on 
the engine VM are writing at ~8Mb/sec (like my test VM on the data volume was 
before I made the change).  Is it recommended to use optimize for virt on the 
engine volume as well?  

On Sat, Aug 4, 2018 at 10:26 AM, Jayme <jay...@gmail.com 
<mailto:jay...@gmail.com> > wrote:

Interesting that it should have been set by cockpit but seemingly wasn't (at 
least it did not appear so in my case, as setting optimize for virt increased 
performance dramatically).  I did indeed use the cockpit to deploy.  I was 
using oVirt Node on all three hosts, a recent download/burn of 4.2.5.  Here is my 
current gluster volume info if it's helpful to anyone:

Volume Name: data
Type: Replicate
Volume ID: 1428c3d3-8a51-4e45-a7bb-86b3bde8b6ea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: MASKED:/gluster_bricks/data/data
Brick2: MASKED:/gluster_bricks/data/data
Brick3: MASKED:/gluster_bricks/data/data
Options Reconfigured:
features.barrier: disable
server.allow-insecure: on
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
network.ping-timeout: 30
storage.owner-gid: 36
storage.owner-uid: 36
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Volume Name: data2
Type: Replicate
Volume ID: e97a2e9c-cd47-4f18-b2c2-32d917a8c016
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: MASKED:/gluster_bricks/data2/data2
Brick2: MASKED:/gluster_bricks/data2/data2
Brick3: MASKED:/gluster_bricks/data2/data2
Options Reconfigured:
server.allow-insecure: on
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
network.ping-timeout: 30
storage.owner-gid: 36
storage.owner-uid: 36
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Volume Name: engine
Type: Replicate
Volume ID: ae465791-618c-4075-b68c-d4972a36d0b9
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: MASKED:/gluster_bricks/engine/engine
Brick2: MASKED:/gluster_bricks/engine/engine
Brick3: MASKED:/gluster_bricks/engine/engine
Options Reconfigured:
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
network.ping-timeout: 30
storage.owner-gid: 36
storage.owner-uid: 36
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Volume Name: vmstore
Type: Replicate
Volume ID: 7065742b-c09d-410b-9e89-174ade4fc3f5
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: MASKED:/gluster_bricks/vmstore/vmstore
Brick2: MASKED:/gluster_bricks/vmstore/vmstore
Brick3: MASKED:/gluster_bricks/vmstore/vmstore
Options Reconfigured:
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
network.ping-timeout: 30
storage.owner-gid: 36
storage.owner-uid: 36
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

Volume Name: vmstore2
Type: Replicate
Volume ID: 6f9a1c51-c0bc-46ad-b94a-fc2989a36e0c
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: MASKED:/gluster_bricks/vmstore2/vmstore2
Brick2: MASKED:/gluster_bricks/vmstore2/vmstore2
Brick3: MASKED:/gluster_bricks/vmstore2/vmstore2
Options Reconfigured:
cluster.granular-entry-heal: enable
performance.strict-o-direct: on
network.ping-timeout: 30
storage.owner-gid: 36
storage.owner-uid: 36
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.quorum-type: auto
cluster.eager-lock: enable
network.remote-dio: off
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off

On Fri, Aug 3, 2018 at 6:53 AM, Sahina Bose <sab...@redhat.com 
<mailto:sab...@redhat.com> > wrote:

On Fri, 3 Aug 2018 at 3:07 PM, Jayme <jay...@gmail.com 
<mailto:jay...@gmail.com> > wrote:

Hello,

The option to optimize for virt store is tough to find, in my opinion: you have 
to go to Volumes > volume name and then click the two dots in the top right to 
expand further options.  No one would know to find it (or that it even exists) 
if they weren't specifically looking. 

I don't know enough about it, but my assumption is that there are reasons why 
it's not set by default (it may not need to apply to every volume created).  
However, my suggestion would be to include it in the cockpit as a selectable 
option next to each volume you create, with a hint suggesting that for best 
performance you select it for any volume that is going to be a data volume 
for VMs.

If you have installed via Cockpit, the options are set.

Can you provide the “gluster volume info” output after you optimised for virt?

I simply installed using the latest node ISO / default cockpit deployment.

Hope this helps!

- Jayme

On Fri, Aug 3, 2018 at 5:15 AM, Sahina Bose <sab...@redhat.com 
<mailto:sab...@redhat.com> > wrote:

On Fri, Aug 3, 2018 at 5:25 AM, Jayme <jay...@gmail.com 
<mailto:jay...@gmail.com> > wrote:

Bill,

I thought I'd let you (and others) know this, as it might save you some 
headaches.  I found that my performance problem was resolved by clicking the 
"optimize for virt store" option in the volume settings of the hosted engine 
(for the data volume).  This one change alone has increased my I/O performance 
by 10x.  I don't know why this would not be set or recommended by default, 
but I'm glad I found it!

Thanks for the feedback. Could you log a bug to make it the default, describing 
the user flow that you used?

Also, I would be interested to know how you prepared the gluster volume for use 
- if it was using the Cockpit deployment UI, the volume options would have been 
set by default.

- James

On Thu, Aug 2, 2018 at 2:32 PM, William Dossett <william.doss...@gmail.com 
<mailto:william.doss...@gmail.com> > wrote:

Yeah, I am just ramping up here, but this project is mostly on my own time and 
money, hence no SSDs for Gluster… I’ve already blown close to $500 of my own 
money on 10Gb Ethernet cards and SFPs on eBay, as my company frowns on us 
getting good deals for equipment on eBay and would rather go to their preferred 
supplier – where $500 wouldn’t even buy half a 10Gb CNA ☹  but I believe in 
this project and it feels like it is getting ready for showtime – if I can demo 
this in a few weeks and get some interest I’ll be asking them to reimburse me, 
that’s for sure!

Hopefully going to get some of the other work off my plate and work on this 
later this afternoon, will let you know any findings.

Regards

Bill

From: Jayme <jay...@gmail.com <mailto:jay...@gmail.com> > 
Sent: Thursday, August 2, 2018 11:07 AM
To: William Dossett <william.doss...@gmail.com 
<mailto:william.doss...@gmail.com> >
Cc: users <users@ovirt.org <mailto:users@ovirt.org> >
Subject: Re: [ovirt-users] Tuning and testing GlusterFS performance

Bill,

Appreciate the feedback and would be interested to hear some of your results.  
I'm a bit worried about what I'm seeing so far on a very stock 3 node HCI 
setup: 8mb/sec on that dd test mentioned in the original post from within a VM 
(which may be explained by bad testing methods or some other configuration 
considerations).  What is more worrisome to me is that I tried another dd 
test to time creating a 32GB file; it was taking a long time so I exited the 
process and the VM basically locked up on me.  I couldn't access it or the 
console and eventually had to do a hard shutdown of the VM to recover.  

I don't plan to host many VMs, probably around 15.  They aren't super demanding 
servers but some do read/write big directories, such as working with GitHub 
repos and large node_modules folders, rsyncs of fairly large dirs, etc.  I'm 
definitely going to have to do a lot more testing before I can be assured 
enough to put any important VMs on this cluster.

- James

On Thu, Aug 2, 2018 at 1:54 PM, William Dossett <william.doss...@gmail.com 
<mailto:william.doss...@gmail.com> > wrote:

I usually look at IOPS using IOMeter… you usually want several workers running 
reads and writes in different threads at the same time.   You can run Dynamo on 
a Linux instance and then connect it to a Windows GUI running IOMeter to give 
you stats.  I was getting around 250 IOPS on JBOD SATA 7200rpm drives, which 
isn’t bad for cheap and cheerful SATA drives.

As I said, I’ve worked with HCI in VMware now for a couple of years, intensely 
this last year when we had some defective Dell hardware and were trying to 
diagnose the problem.  Since then the hardware has been completely replaced with 
an all-flash solution.   So when I got the all-flash solution I used IOMeter on 
it and was only getting around 3000 IOPS on enterprise flash disks… not exactly 
stellar, but OK for one VM.  The trick there was the scale out.  There is a 
VMware Fling called HCIBench.  It’s very cool in that you spin up one VM and then 
it spawns 40 more VMs across the cluster.  I could then use vSAN Observer and 
it showed my hosts were actually doing 30K IOPS on average, which is absolutely 
stellar performance.  

Anyway, moral of the story there was that your one VM may seem like it’s quick, 
but not what you would expect from flash…   but as you add more VMs in the 
cluster and they are all doing workloads, it scales out beautifully and the 
read/write speed does not slow down as you add more loads.  I’m hoping that’s 
what we are going to see with Gluster.

Also, you are using mb nomenclature below, is that Mb, or MB?  I am sort of 
assuming MB megabytes per second…  it does not seem very fast.  I’m probably 
not going to get to work more on my cluster today as I’ve got other projects 
that I need to get done on time, but I want to try and get some templates up 
and running and do some more testing either tomorrow or this weekend and see 
what I get in just basic writing MB/s and let you know.
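
To make the Mb vs MB distinction concrete, here is a small arithmetic sketch. It assumes the figures in this thread came from dd, which reports decimal megabytes per second (MB/s), and it compares against the raw line rate of a 10 Gb/s link while ignoring protocol and replication overhead. The function names are made up for illustration.

```shell
#!/bin/sh
# Convert megabytes per second (MB/s) to megabits per second (Mb/s).
mbytes_to_mbits() {
    echo $(( $1 * 8 ))
}

# Raw line rate of a link given in Mb/s, expressed in MB/s.
link_mbytes() {
    echo $(( $1 / 8 ))
}

echo "8 MB/s  = $(mbytes_to_mbits 8) Mb/s"
echo "30 MB/s = $(mbytes_to_mbits 30) Mb/s"
echo "10 Gb/s link = $(link_mbytes 10000) MB/s raw line rate"
```

Either way the unit is read, the numbers reported so far are a small fraction of what the 10Gb network can carry.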

Regards

Bill

From: Jayme <jay...@gmail.com <mailto:jay...@gmail.com> > 
Sent: Thursday, August 2, 2018 8:12 AM
To: users <users@ovirt.org <mailto:users@ovirt.org> >
Subject: [ovirt-users] Tuning and testing GlusterFS performance

So I've finally completed my first HCI build using the below configuration:

3x Dell PowerEdge R720
2x 2.9 GHz 8 Core E5-2690
256GB RAM
2x250GB SSD RAID 1 (boot/os)
2x2TB SSD JBOD passthrough (used for gluster bricks)
1GbE NIC for management, 10GbE NIC for Gluster

Using Replica 3 with no arbiter. 

Installed the latest version of oVirt available at the time, 4.2.5.  Created 
the recommended volumes (with an additional data volume on the second SSD).  Not 
using VDO.

First thing I did was set up the glusterFS network on 10GbE and set it to be 
used for glusterFS and migration traffic. 

I've set up a single test VM using CentOS 7 minimal on the default "x-large 
instance" profile. 

Within this VM, if I do a very basic write test using something like:

dd bs=1M count=256 if=/dev/zero of=test conv=fdatasync

I'm seeing quite slow speeds, only 8mb/sec.  

If I do the same from one of the hosts' gluster mounts, i.e.

host1: /rhev/data-center/mnt/glusterSD/HOST:data 

I get about 30mb/sec (which still seems fairly low?)

Am I testing incorrectly here?  Is there anything I should be tuning on the 
Gluster volumes to increase performance with SSDs?  Where can I find out where 
the bottleneck is here, or is this expected performance of Gluster? 
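
One way to make the dd test above a little more informative is to repeat it at several block sizes while keeping the total size the same. This is a minimal sketch, not a substitute for a proper benchmark tool. It assumes GNU dd (for conv=fdatasync), and the names TARGET_DIR, TOTAL_MIB and write_test are hypothetical.

```shell
#!/bin/sh
# Repeat the thread's dd write test at a chosen block size, keeping the
# total amount written constant so results are comparable across sizes.
TARGET_DIR=${TARGET_DIR:-.}   # directory on the storage under test
TOTAL_MIB=${TOTAL_MIB:-256}   # same 256 MiB total as the original test

write_test() {
    # $1 = block size in KiB
    bs_kib=$1
    count=$(( $TOTAL_MIB * 1024 / $bs_kib ))
    file="$TARGET_DIR/ddtest.$$"
    # conv=fdatasync makes dd flush data to storage before reporting a rate.
    # GNU dd prints its summary on stderr; keep only that last line.
    dd if=/dev/zero of="$file" bs="${bs_kib}k" count="$count" conv=fdatasync 2>&1 | tail -n 1
    rm -f "$file"
}

# Example invocation (commented out so that sourcing this file writes nothing):
#   for bs in 4 64 1024; do echo "block size ${bs} KiB:"; write_test "$bs"; done
```

Running it once inside the VM and once with TARGET_DIR pointing at the host's gluster mount gives a like-for-like comparison of the two cases described above.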

_______________________________________________
Users mailing list -- users@ovirt.org <mailto:users@ovirt.org> 
To unsubscribe send an email to users-le...@ovirt.org 
<mailto:users-le...@ovirt.org> 
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/JKUWFXIZOWQ42JFBIJLFJGBKISY7OMPV/

