Re: [ovirt-users] Major Performance Issues with gluster

2018-03-19 Thread Donny Davis
Try the "Optimize for Virt Store" option on the Volumes tab in oVirt for
this volume.

This might help with some of it, but ideally it should have been done before
you connected the volume as a storage domain. The sharding feature helps with
performance, as do some of the other options that are already set on your
other volumes.
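
If you'd rather do the equivalent from the Gluster CLI, it roughly comes down
to applying the stock "virt" option group (a sketch only; the exact options
the group sets vary by Gluster version, so check /var/lib/glusterd/groups/virt
on your nodes first):

# Apply the virt option group (eager-lock, remote-dio, quorum settings, and
# on newer releases sharding); the oVirt button does essentially this.
gluster volume set data-hdd group virt
# oVirt also expects the volume to be owned by vdsm:kvm (uid/gid 36), as it
# already is on your other volumes.
gluster volume set data-hdd storage.owner-uid 36
gluster volume set data-hdd storage.owner-gid 36

One caveat: enabling sharding only affects files created after it is turned
on, so the disk images already sitting on data-hdd will stay unsharded.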

On Mon, Mar 19, 2018, 12:28 PM Jim Kusznir  wrote:

> [quoted text trimmed]

Re: [ovirt-users] Major Performance Issues with gluster

2018-03-19 Thread Jim Kusznir
Here's gluster volume info:

[root@ovirt2 ~]# gluster volume info

Volume Name: data
Type: Replicate
Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
server.allow-insecure: on
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 1
cluster.shd-max-threads: 8
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

Volume Name: data-hdd
Type: Replicate
Volume ID: d342a3ab-16f3-49f0-bbcf-f788be8ac5f1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.172.1.11:/gluster/brick3/data-hdd
Brick2: 172.172.1.12:/gluster/brick3/data-hdd
Brick3: 172.172.1.13:/gluster/brick3/data-hdd
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
transport.address-family: inet
performance.readdir-ahead: on

Volume Name: engine
Type: Replicate
Volume ID: 87ad86b9-d88b-457e-ba21-5d3173c612de
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1.nwfiber.com:/gluster/brick1/engine
Brick2: ovirt2.nwfiber.com:/gluster/brick1/engine
Brick3: ovirt3.nwfiber.com:/gluster/brick1/engine (arbiter)
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 1
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

Volume Name: iso
Type: Replicate
Volume ID: b1ba15f5-0f0f-4411-89d0-595179f02b92
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1.nwfiber.com:/gluster/brick4/iso
Brick2: ovirt2.nwfiber.com:/gluster/brick4/iso
Brick3: ovirt3.nwfiber.com:/gluster/brick4/iso (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 1
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

--

When I try to turn on profiling, I get:

[root@ovirt2 ~]# gluster volume profile data-hdd start
Another transaction is in progress for data-hdd. Please try again after
sometime.

I don't know what that other transaction is, but I am seeing some "odd
behavior" this morning, like a VM disk move between data and data-hdd that
has been stuck at 84% overnight.

I've been asking on IRC how to "un-stick" this transfer, as the VM cannot
be started, and I can't seem to do anything about it.
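
From what I've been able to gather so far (just a guess on my part, not a
verified fix), "Another transaction is in progress" usually means glusterd is
holding a cluster-wide volume lock left over from an earlier or hung
operation. Something along these lines should at least show which node is
holding it:

# glusterd normally logs which peer UUID holds the volume lock
grep -i lock /var/log/glusterfs/glusterd.log | tail
# map that UUID to a hostname
gluster pool list
# restarting glusterd on the node holding the stale lock is the usual remedy;
# brick processes are separate daemons, but I'd still be careful on a live
# cluster
systemctl restart glusterd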

--Jim

On Mon, Mar 19, 2018 at 2:14 AM, Sahina Bose  wrote:

> [quoted text trimmed]

Re: [ovirt-users] Major Performance Issues with gluster

2018-03-19 Thread Sahina Bose
On Mon, Mar 19, 2018 at 7:39 AM, Jim Kusznir  wrote:

> Hello:
>
> This past week, I created a new Gluster store, as I was running out of
> disk space on my main, SSD-backed storage pool.  I used 2TB Seagate
> FireCuda drives (hybrid SSD/spinning).  Hardware is Dell R610s with
> integrated PERC 6/i cards.  I placed one disk per machine, exported each
> disk as a single-disk volume from the RAID controller, formatted it with
> XFS, mounted it, and dedicated it to a new replica 3 Gluster volume.
>
> Since doing so, I've been having major performance problems.  One of my
> Windows VMs sits at 100% disk utilization nearly continuously, and it's
> painful to do anything on it.  A Zabbix install on CentOS using MySQL as
> the backing database has 70%+ iowait nearly all the time, and I can't seem
> to get graphs to load from the web console.  It's also constantly spewing
> errors that ultimately come down to insufficient disk performance.
>
> All of this was working OK before the changes.  There are two:
>
> Old storage was SSD-backed, replica 2 + arbiter, and running on the same
> GigE network as management and the main VM network.
>
> New storage was created on the dedicated Gluster network (running on em4 on
> these servers, on a completely different subnet: 172.x vs. 192.x), and was
> created as replica 3 (no arbiter) on the FireCuda disks (these seemed to be
> the fastest I could afford short of SSD, as I needed a lot more storage).
>
> My monitoring so far has NOT shown maxed-out network interfaces (watching
> with bwm-ng on the command line); in fact, the Gluster interface is usually
> below 20% utilized.
>
> I'm not sure how to meaningfully measure the performance of the disks
> themselves, and I'm not sure what else to look at.  My cluster is not very
> usable at the moment, though.  iowait on my hosts appears to be below 0.5%,
> usually 0.0 to 0.1.  Inside the VMs it's a whole different story.
>
> My cluster is currently running oVirt 4.1.  I'm interested in upgrading to
> 4.2, but I think I need to fix this first.
>


Can you share the output of "gluster volume info", and also profile the
volume while running the workload where you experience the performance
issue and share the results?

For info on how to profile (server-side profiling) -
https://docs.gluster.org/en/latest/Administrator%20Guide/Performance%20Testing/
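
In short, it comes down to something like this (substitute the name of your
new volume):

# enable server-side profiling on the volume
gluster volume profile <VOLNAME> start
# ...run the workload that shows the problem...
# dump per-brick FOP counts and latency statistics
gluster volume profile <VOLNAME> info
# turn profiling off when done
gluster volume profile <VOLNAME> stop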


> Thanks!
> --Jim
>
___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users