Re: [ovirt-users] Major Performance Issues with gluster
Try hitting the optimize-for-virt option in the Volumes tab in oVirt for this volume. This might help with some of it, though ideally that should have been done before you attached it as a storage domain. The sharding feature helps with performance, as do some of the other options already present on your other volumes.

On Mon, Mar 19, 2018, 12:28 PM Jim Kusznir wrote:

> Here's gluster volume info:
>
> [root@ovirt2 ~]# gluster volume info
> [...]
>
> Volume Name: data-hdd
> Type: Replicate
> Volume ID: d342a3ab-16f3-49f0-bbcf-f788be8ac5f1
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: 172.172.1.11:/gluster/brick3/data-hdd
> Brick2: 172.172.1.12:/gluster/brick3/data-hdd
> Brick3: 172.172.1.13:/gluster/brick3/data-hdd
> Options Reconfigured:
> changelog.changelog: on
> geo-replication.ignore-pid-check: on
> geo-replication.indexing: on
> transport.address-family: inet
> performance.readdir-ahead: on
> [...]
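The tuning the reply points at can also be applied from the CLI. A minimal sketch, with option values copied from what is already set on the other (tuned) volumes in this thread, not from oVirt's actual virt option group, which may differ by version:

```shell
# Sketch: apply the virt-oriented options seen on the tuned volumes to the
# untuned data-hdd volume. Verify each option against your gluster version
# before running; word-splitting on $opt yields "name value" as two args.
for opt in \
    "performance.quick-read off" \
    "performance.read-ahead off" \
    "performance.io-cache off" \
    "performance.stat-prefetch off" \
    "performance.low-prio-threads 32" \
    "performance.strict-o-direct on" \
    "network.remote-dio enable" \
    "network.ping-timeout 30" \
    "cluster.eager-lock enable" \
    "cluster.quorum-type auto" \
    "cluster.server-quorum-type server" \
    "cluster.data-self-heal-algorithm full" \
    "cluster.locking-scheme granular" \
    "cluster.shd-max-threads 8" \
    "features.shard on" \
    "storage.owner-uid 36" \
    "storage.owner-gid 36" \
    "user.cifs off" \
    "nfs.disable on"
do
    gluster volume set data-hdd $opt
done
```

Note that `storage.owner-uid/gid 36` (the vdsm/kvm user) is what lets oVirt use the volume, and `features.shard on` only affects files written after it is enabled.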
Re: [ovirt-users] Major Performance Issues with gluster
Here's gluster volume info:

[root@ovirt2 ~]# gluster volume info

Volume Name: data
Type: Replicate
Volume ID: e670c488-ac16-4dd1-8bd3-e43b2e42cc59
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1.nwfiber.com:/gluster/brick2/data
Brick2: ovirt2.nwfiber.com:/gluster/brick2/data
Brick3: ovirt3.nwfiber.com:/gluster/brick2/data (arbiter)
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
server.allow-insecure: on
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 1
cluster.shd-max-threads: 8
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

Volume Name: data-hdd
Type: Replicate
Volume ID: d342a3ab-16f3-49f0-bbcf-f788be8ac5f1
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 172.172.1.11:/gluster/brick3/data-hdd
Brick2: 172.172.1.12:/gluster/brick3/data-hdd
Brick3: 172.172.1.13:/gluster/brick3/data-hdd
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
transport.address-family: inet
performance.readdir-ahead: on

Volume Name: engine
Type: Replicate
Volume ID: 87ad86b9-d88b-457e-ba21-5d3173c612de
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1.nwfiber.com:/gluster/brick1/engine
Brick2: ovirt2.nwfiber.com:/gluster/brick1/engine
Brick3: ovirt3.nwfiber.com:/gluster/brick1/engine (arbiter)
Options Reconfigured:
changelog.changelog: on
geo-replication.ignore-pid-check: on
geo-replication.indexing: on
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 1
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

Volume Name: iso
Type: Replicate
Volume ID: b1ba15f5-0f0f-4411-89d0-595179f02b92
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: ovirt1.nwfiber.com:/gluster/brick4/iso
Brick2: ovirt2.nwfiber.com:/gluster/brick4/iso
Brick3: ovirt3.nwfiber.com:/gluster/brick4/iso (arbiter)
Options Reconfigured:
performance.readdir-ahead: on
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: off
cluster.quorum-type: auto
cluster.server-quorum-type: server
storage.owner-uid: 36
storage.owner-gid: 36
features.shard: on
features.shard-block-size: 512MB
performance.low-prio-threads: 32
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-wait-qlength: 1
cluster.shd-max-threads: 6
network.ping-timeout: 30
user.cifs: off
nfs.disable: on
performance.strict-o-direct: on

--

When I try to turn on profiling, I get:

[root@ovirt2 ~]# gluster volume profile data-hdd start
Another transaction is in progress for data-hdd. Please try again after sometime.

I don't know what that other transaction is, but I am having some "odd behavior" this morning, like a VM disk move between data and data-hdd that got stuck at 84% overnight.
I've been asking on IRC how to "un-stick" this transfer, as the VM cannot be started and I can't seem to do anything about it.

--Jim

On Mon, Mar 19, 2018 at 2:14 AM, Sahina Bose wrote:

> On Mon, Mar 19, 2018 at 7:39 AM, Jim Kusznir wrote:
>> [...]
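Once the stale transaction clears, a server-side profiling run (per the gluster performance-testing docs linked later in the thread) is a start / measure / stop cycle. A minimal sketch:

```shell
# Sketch of a server-side profiling session for the data-hdd volume.
# Run from any node in the trusted pool.
gluster volume profile data-hdd start

# ... reproduce the slow workload inside the VMs for a few minutes ...

# Dump per-brick latency and op statistics gathered since "start"
gluster volume profile data-hdd info

# Stop collecting when done (profiling adds a small overhead)
gluster volume profile data-hdd stop
```

The `info` output breaks latency down per file operation (FOP) and per brick, which is usually enough to tell a slow disk from a slow network path.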
Re: [ovirt-users] Major Performance Issues with gluster
On Mon, Mar 19, 2018 at 7:39 AM, Jim Kusznir wrote:

> Hello:
>
> This past week, I created a new gluster store, as I was running out of
> disk space on my main, SSD-backed storage pool. I used 2TB Seagate
> FireCuda drives (hybrid SSD/spinning). Hardware is Dell R610s with
> integrated PERC 6/i cards. I placed one disk per machine, exported the
> disk as a single-disk volume from the RAID controller, formatted it XFS,
> mounted it, and dedicated it to a new replica 3 gluster volume.
>
> Since doing so, I've been having major performance problems. One of my
> Windows VMs sits at 100% disk utilization nearly continuously, and it's
> painful to do anything on it. A Zabbix install on CentOS using MySQL as
> the backing store has 70%+ iowait nearly all the time, and I can't seem
> to get graphs loaded from the web console. It's also always spewing
> errors that ultimately come down to insufficient disk performance.
>
> All of this was working OK before the changes. There are two:
>
> The old storage was SSD-backed, replica 2 + arbiter, and running on the
> same GigE network as management and the main VM network.
>
> The new storage was created on the dedicated gluster network (on em4 on
> these servers, on a completely different subnet, 172.x vs. 192.x), and
> was created replica 3 (no arbiter) on the FireCuda disks (which seemed
> to be the fastest I could afford for non-SSD, as I needed a lot more
> storage).
>
> My attempts to watch so far have NOT shown maxed network interfaces
> (using bwm-ng on the command line); in fact, the gluster interface is
> usually below 20% utilized.
>
> I'm not sure how to meaningfully measure the performance of the disk
> itself, and I'm not sure what else to look at. My cluster is not very
> usable currently, though. IOWait on my hosts appears to be below 0.5%,
> usually 0.0 to 0.1. Inside the VMs it's a whole different story.
>
> My cluster is currently running oVirt 4.1. I'm interested in going to
> 4.2, but I think I need to fix this first.
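On the "measure the disk itself" question: a synthetic random-I/O run against the brick filesystem, bypassing gluster, is one way to put a number on the raw disks. A sketch using fio; the target directory and sizes here are illustrative, not from the thread:

```shell
# Sketch: measure random-write IOPS/latency on the brick filesystem
# directly. Use a scratch directory on the brick mount, never a path
# inside live VM images; delete the test files afterwards.
fio --name=brick-randwrite \
    --directory=/gluster/brick3/fio-test \
    --rw=randwrite \
    --bs=4k \
    --size=1g \
    --numjobs=4 \
    --iodepth=16 \
    --ioengine=libaio \
    --direct=1 \
    --runtime=60 --time_based \
    --group_reporting
```

Comparing this against the same job run from inside a VM (or on the fuse mount) separates raw-disk limits from replication and network overhead. Hybrid SSHD drives in particular can look fast until their flash cache is exhausted by sustained random writes, so a 60-second `--time_based` run is more telling than a short burst.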
Can you provide the info of the volume using "gluster volume info", and also profile the volume while running the tests where you experience the performance issue, and share the results? For instructions on how to profile (server-side profiling), see:
https://docs.gluster.org/en/latest/Administrator%20Guide/Performance%20Testing/

> Thanks!
> --Jim

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users