Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance
On 21/06/17 11:18, Chris Boot wrote:
> Thanks for your input. I have yet to run any benchmarks, but I'll do
> that once I have a bit more time to work on this.

Is there a particular benchmark test that I should run to gather some
stats for this? Would certain tests be more useful than others?

Thanks,
Chris

--
Chris Boot
bo...@bootc.net

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
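For a VM-storage workload like the one in this thread, a mixed sequential/random fio run is a common way to get comparable numbers. Below is only a sketch under assumptions not stated in the thread: fio is installed inside a test VM whose disk lives on the Gluster volume, and the job sizes/parameters are illustrative, not tuned recommendations.

```shell
# Write a hypothetical fio job file covering sequential throughput and
# mixed random I/O (the two symptoms reported in this thread).
# All parameter values here are illustrative assumptions.
cat > gluster-bench.fio <<'EOF'
[global]
ioengine=libaio
direct=1
runtime=60
time_based
group_reporting

[seq-write]
rw=write
bs=1M
size=4g
numjobs=1

[rand-rw]
rw=randrw
rwmixread=70
bs=4k
size=4g
iodepth=32
numjobs=4
EOF

# Run it from inside a VM backed by the Gluster volume:
# fio gluster-bench.fio
```

Comparing the 4k random-I/O latency numbers against the same job run on the old NFS4 setup would make the "feels like a latency issue" observation concrete.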
Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance
[replying to lists this time]

On 20/06/17 11:23, Krutika Dhananjay wrote:
> Couple of things:
>
> 1. Like Darrell suggested, you should enable stat-prefetch and increase
> client and server event threads to 4.
> # gluster volume set performance.stat-prefetch on
> # gluster volume set client.event-threads 4
> # gluster volume set server.event-threads 4
>
> 2. Also glusterfs-3.10.1 and above has a shard performance bug fix -
> https://review.gluster.org/#/c/16966/
>
> With these two changes, we saw great improvement in performance in our
> internal testing.

Hi Krutika,

Thanks for your input. I have yet to run any benchmarks, but I'll do
that once I have a bit more time to work on this.

I've tweaked the options as you suggest, but that doesn't seem to have
made an appreciable difference. I admit that without benchmarks it's a
bit like sticking your finger in the air, though. Do I need to restart
my bricks and/or remount the volumes for these to take effect?

I'm actually running GlusterFS 3.10.2-1. This is all coming from the
CentOS Storage SIG's centos-release-gluster310 repository.

Thanks again.

Chris

--
Chris Boot
bo...@bootc.net
Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance
No, you don't need to do any of that. Just executing the volume-set
commands is sufficient for the changes to take effect.

-Krutika

On Wed, Jun 21, 2017 at 3:48 PM, Chris Boot wrote:
> [replying to lists this time]
>
> On 20/06/17 11:23, Krutika Dhananjay wrote:
> > Couple of things:
> >
> > 1. Like Darrell suggested, you should enable stat-prefetch and increase
> > client and server event threads to 4.
> > # gluster volume set performance.stat-prefetch on
> > # gluster volume set client.event-threads 4
> > # gluster volume set server.event-threads 4
> >
> > 2. Also glusterfs-3.10.1 and above has a shard performance bug fix -
> > https://review.gluster.org/#/c/16966/
> >
> > With these two changes, we saw great improvement in performance in our
> > internal testing.
>
> Hi Krutika,
>
> Thanks for your input. I have yet to run any benchmarks, but I'll do
> that once I have a bit more time to work on this.
>
> I've tweaked the options as you suggest, but that doesn't seem to have
> made an appreciable difference. I admit that without benchmarks it's a
> bit like sticking your finger in the air, though. Do I need to restart
> my bricks and/or remount the volumes for these to take effect?
>
> I'm actually running GlusterFS 3.10.2-1. This is all coming from the
> CentOS Storage SIG's centos-release-gluster310 repository.
>
> Thanks again.
>
> Chris
>
> --
> Chris Boot
> bo...@bootc.net
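Since the changes apply live, `gluster volume get` can be used to confirm the reconfigured values without touching bricks or mounts. A small sketch, using the volume name "vmssd" from this thread; the commands are echoed here for illustration, so drop the `echo` to actually query a running cluster:

```shell
# Confirm the three reconfigured options took effect live, with no brick
# restart or remount. VOL is the volume name from this thread.
VOL=vmssd
for opt in performance.stat-prefetch client.event-threads server.event-threads; do
  echo "gluster volume get $VOL $opt"   # remove the echo to run for real
done
```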
Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance
No. It's just that in the internal testing that was done here, increasing
the thread count beyond 4 did not improve the performance any further.

-Krutika

On Tue, Jun 20, 2017 at 11:30 PM, mabi <m...@protonmail.ch> wrote:
> Dear Krutika,
>
> Sorry for asking so naively but can you tell me on what factor do you base
> that the client and server event-threads parameters for a volume should be
> set to 4?
>
> Is this metric for example based on the number of cores a GlusterFS server
> has?
>
> I am asking because I saw my GlusterFS volumes are set to 2 and would like
> to set these parameters to something meaningful for performance tuning. My
> setup is a two node replica with GlusterFS 3.8.11.
>
> Best regards,
> M.
>
> ---- Original Message ----
> Subject: Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance
> Local Time: June 20, 2017 12:23 PM
> UTC Time: June 20, 2017 10:23 AM
> From: kdhan...@redhat.com
> To: Lindsay Mathieson <lindsay.mathie...@gmail.com>
> gluster-users <gluster-users@gluster.org>, oVirt users <us...@ovirt.org>
>
> Couple of things:
>
> 1. Like Darrell suggested, you should enable stat-prefetch and increase
> client and server event threads to 4.
> # gluster volume set performance.stat-prefetch on
> # gluster volume set client.event-threads 4
> # gluster volume set server.event-threads 4
>
> 2. Also glusterfs-3.10.1 and above has a shard performance bug fix -
> https://review.gluster.org/#/c/16966/
>
> With these two changes, we saw great improvement in performance in our
> internal testing.
>
> Do you mind trying these two options above?
>
> -Krutika
>
> On Tue, Jun 20, 2017 at 1:00 PM, Lindsay Mathieson
> <lindsay.mathie...@gmail.com> wrote:
>
>> Have you tried with:
>>
>> performance.strict-o-direct : off
>> performance.strict-write-ordering : off
>>
>> They can be changed dynamically.
>>
>> On 20 June 2017 at 17:21, Sahina Bose <sab...@redhat.com> wrote:
>>
>>> [Adding gluster-users]
>>>
>>> On Mon, Jun 19, 2017 at 8:16 PM, Chris Boot <bo...@bootc.net> wrote:
>>>
>>>> Hi folks,
>>>>
>>>> I have 3x servers in a "hyper-converged" oVirt 4.1.2 + GlusterFS 3.10
>>>> configuration. My VMs run off a replica 3 arbiter 1 volume comprised of
>>>> 6 bricks, which themselves live on two SSDs in each of the servers (one
>>>> brick per SSD). The bricks are XFS on LVM thin volumes straight onto the
>>>> SSDs. Connectivity is 10G Ethernet.
>>>>
>>>> Performance within the VMs is pretty terrible. I experience very low
>>>> throughput and random IO is really bad: it feels like a latency issue.
>>>> On my oVirt nodes the SSDs are not generally very busy. The 10G network
>>>> seems to run without errors (iperf3 gives bandwidth measurements of >=
>>>> 9.20 Gbits/sec between the three servers).
>>>>
>>>> To put this into perspective: I was getting better behaviour from NFS4
>>>> on a gigabit connection than I am with GlusterFS on 10G: that doesn't
>>>> feel right at all.
>>>>
>>>> My volume configuration looks like this:
>>>>
>>>> Volume Name: vmssd
>>>> Type: Distributed-Replicate
>>>> Volume ID: d5a5ddd1-a140-4e0d-b514-701cfe464853
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 2 x (2 + 1) = 6
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: ovirt3:/gluster/ssd0_vmssd/brick
>>>> Brick2: ovirt1:/gluster/ssd0_vmssd/brick
>>>> Brick3: ovirt2:/gluster/ssd0_vmssd/brick (arbiter)
>>>> Brick4: ovirt3:/gluster/ssd1_vmssd/brick
>>>> Brick5: ovirt1:/gluster/ssd1_vmssd/brick
>>>> Brick6: ovirt2:/gluster/ssd1_vmssd/brick (arbiter)
>>>> Options Reconfigured:
>>>> nfs.disable: on
>>>> transport.address-family: inet6
>>>> performance.quick-read: off
>>>> performance.read-ahead: off
>>>> performance.io-cache: off
>>>> performance.stat-prefetch: off
>>>> performance.low-prio-threads: 32
>>>> network.remote-dio: off
>>>> cluster.eager-lock: enable
>>>> cluster.quorum-type: auto
>>>> cluster.server-quorum-type: server
>>>> cluster.data-self-heal-algorithm: full
>>>> cluster.locking-scheme: granular
>>>> cluster.shd-max-threads: 8
>>>> cluster.shd-wait-qlength: 1
Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance
Dear Krutika,

Sorry for asking so naively but can you tell me on what factor do you base
that the client and server event-threads parameters for a volume should be
set to 4?

Is this metric for example based on the number of cores a GlusterFS server
has?

I am asking because I saw my GlusterFS volumes are set to 2 and would like
to set these parameters to something meaningful for performance tuning. My
setup is a two node replica with GlusterFS 3.8.11.

Best regards,
M.

---- Original Message ----
Subject: Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance
Local Time: June 20, 2017 12:23 PM
UTC Time: June 20, 2017 10:23 AM
From: kdhan...@redhat.com
To: Lindsay Mathieson <lindsay.mathie...@gmail.com>
gluster-users <gluster-users@gluster.org>, oVirt users <us...@ovirt.org>

Couple of things:

1. Like Darrell suggested, you should enable stat-prefetch and increase
client and server event threads to 4.
# gluster volume set performance.stat-prefetch on
# gluster volume set client.event-threads 4
# gluster volume set server.event-threads 4

2. Also glusterfs-3.10.1 and above has a shard performance bug fix -
https://review.gluster.org/#/c/16966/

With these two changes, we saw great improvement in performance in our
internal testing.

Do you mind trying these two options above?

-Krutika

On Tue, Jun 20, 2017 at 1:00 PM, Lindsay Mathieson
<lindsay.mathie...@gmail.com> wrote:

Have you tried with:

performance.strict-o-direct : off
performance.strict-write-ordering : off

They can be changed dynamically.

On 20 June 2017 at 17:21, Sahina Bose <sab...@redhat.com> wrote:

[Adding gluster-users]

On Mon, Jun 19, 2017 at 8:16 PM, Chris Boot <bo...@bootc.net> wrote:

Hi folks,

I have 3x servers in a "hyper-converged" oVirt 4.1.2 + GlusterFS 3.10
configuration. My VMs run off a replica 3 arbiter 1 volume comprised of
6 bricks, which themselves live on two SSDs in each of the servers (one
brick per SSD). The bricks are XFS on LVM thin volumes straight onto the
SSDs. Connectivity is 10G Ethernet.

Performance within the VMs is pretty terrible. I experience very low
throughput and random IO is really bad: it feels like a latency issue.
On my oVirt nodes the SSDs are not generally very busy. The 10G network
seems to run without errors (iperf3 gives bandwidth measurements of >=
9.20 Gbits/sec between the three servers).

To put this into perspective: I was getting better behaviour from NFS4
on a gigabit connection than I am with GlusterFS on 10G: that doesn't
feel right at all.

My volume configuration looks like this:

Volume Name: vmssd
Type: Distributed-Replicate
Volume ID: d5a5ddd1-a140-4e0d-b514-701cfe464853
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: ovirt3:/gluster/ssd0_vmssd/brick
Brick2: ovirt1:/gluster/ssd0_vmssd/brick
Brick3: ovirt2:/gluster/ssd0_vmssd/brick (arbiter)
Brick4: ovirt3:/gluster/ssd1_vmssd/brick
Brick5: ovirt1:/gluster/ssd1_vmssd/brick
Brick6: ovirt2:/gluster/ssd1_vmssd/brick (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet6
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
features.shard: on
user.cifs: off
storage.owner-uid: 36
storage.owner-gid: 36
features.shard-block-size: 128MB
performance.strict-o-direct: on
network.ping-timeout: 30
cluster.granular-entry-heal: enable

I would really appreciate some guidance on this to try to improve things
because at this rate I will need to reconsider using GlusterFS altogether.

Could you provide the gluster volume profile output while you're running
your I/O tests.

# gluster volume profile start
to start profiling

# gluster volume profile info
for the profile output.

Cheers,
Chris

--
Chris Boot
bo...@bootc.net

___
Users mailing list
us...@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

--
Lindsay
Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance
Couple of things:

1. Like Darrell suggested, you should enable stat-prefetch and increase
client and server event threads to 4.
# gluster volume set performance.stat-prefetch on
# gluster volume set client.event-threads 4
# gluster volume set server.event-threads 4

2. Also glusterfs-3.10.1 and above has a shard performance bug fix -
https://review.gluster.org/#/c/16966/

With these two changes, we saw great improvement in performance in our
internal testing.

Do you mind trying these two options above?

-Krutika

On Tue, Jun 20, 2017 at 1:00 PM, Lindsay Mathieson
<lindsay.mathie...@gmail.com> wrote:

> Have you tried with:
>
> performance.strict-o-direct : off
> performance.strict-write-ordering : off
>
> They can be changed dynamically.
>
> On 20 June 2017 at 17:21, Sahina Bose wrote:
>
>> [Adding gluster-users]
>>
>> On Mon, Jun 19, 2017 at 8:16 PM, Chris Boot wrote:
>>
>>> Hi folks,
>>>
>>> I have 3x servers in a "hyper-converged" oVirt 4.1.2 + GlusterFS 3.10
>>> configuration. My VMs run off a replica 3 arbiter 1 volume comprised of
>>> 6 bricks, which themselves live on two SSDs in each of the servers (one
>>> brick per SSD). The bricks are XFS on LVM thin volumes straight onto the
>>> SSDs. Connectivity is 10G Ethernet.
>>>
>>> Performance within the VMs is pretty terrible. I experience very low
>>> throughput and random IO is really bad: it feels like a latency issue.
>>> On my oVirt nodes the SSDs are not generally very busy. The 10G network
>>> seems to run without errors (iperf3 gives bandwidth measurements of >=
>>> 9.20 Gbits/sec between the three servers).
>>>
>>> To put this into perspective: I was getting better behaviour from NFS4
>>> on a gigabit connection than I am with GlusterFS on 10G: that doesn't
>>> feel right at all.
>>>
>>> My volume configuration looks like this:
>>>
>>> Volume Name: vmssd
>>> Type: Distributed-Replicate
>>> Volume ID: d5a5ddd1-a140-4e0d-b514-701cfe464853
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 2 x (2 + 1) = 6
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: ovirt3:/gluster/ssd0_vmssd/brick
>>> Brick2: ovirt1:/gluster/ssd0_vmssd/brick
>>> Brick3: ovirt2:/gluster/ssd0_vmssd/brick (arbiter)
>>> Brick4: ovirt3:/gluster/ssd1_vmssd/brick
>>> Brick5: ovirt1:/gluster/ssd1_vmssd/brick
>>> Brick6: ovirt2:/gluster/ssd1_vmssd/brick (arbiter)
>>> Options Reconfigured:
>>> nfs.disable: on
>>> transport.address-family: inet6
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.low-prio-threads: 32
>>> network.remote-dio: off
>>> cluster.eager-lock: enable
>>> cluster.quorum-type: auto
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> cluster.locking-scheme: granular
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 1
>>> features.shard: on
>>> user.cifs: off
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>> features.shard-block-size: 128MB
>>> performance.strict-o-direct: on
>>> network.ping-timeout: 30
>>> cluster.granular-entry-heal: enable
>>>
>>> I would really appreciate some guidance on this to try to improve things
>>> because at this rate I will need to reconsider using GlusterFS
>>> altogether.
>>
>> Could you provide the gluster volume profile output while you're running
>> your I/O tests.
>>
>> # gluster volume profile start
>> to start profiling
>>
>> # gluster volume profile info
>> for the profile output.
>>
>>> Cheers,
>>> Chris
>>>
>>> --
>>> Chris Boot
>>> bo...@bootc.net
>
> --
> Lindsay
Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance
Have you tried with:

performance.strict-o-direct : off
performance.strict-write-ordering : off

They can be changed dynamically.

On 20 June 2017 at 17:21, Sahina Bose wrote:

> [Adding gluster-users]
>
> On Mon, Jun 19, 2017 at 8:16 PM, Chris Boot wrote:
>
>> Hi folks,
>>
>> I have 3x servers in a "hyper-converged" oVirt 4.1.2 + GlusterFS 3.10
>> configuration. My VMs run off a replica 3 arbiter 1 volume comprised of
>> 6 bricks, which themselves live on two SSDs in each of the servers (one
>> brick per SSD). The bricks are XFS on LVM thin volumes straight onto the
>> SSDs. Connectivity is 10G Ethernet.
>>
>> Performance within the VMs is pretty terrible. I experience very low
>> throughput and random IO is really bad: it feels like a latency issue.
>> On my oVirt nodes the SSDs are not generally very busy. The 10G network
>> seems to run without errors (iperf3 gives bandwidth measurements of >=
>> 9.20 Gbits/sec between the three servers).
>>
>> To put this into perspective: I was getting better behaviour from NFS4
>> on a gigabit connection than I am with GlusterFS on 10G: that doesn't
>> feel right at all.
>>
>> My volume configuration looks like this:
>>
>> Volume Name: vmssd
>> Type: Distributed-Replicate
>> Volume ID: d5a5ddd1-a140-4e0d-b514-701cfe464853
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 2 x (2 + 1) = 6
>> Transport-type: tcp
>> Bricks:
>> Brick1: ovirt3:/gluster/ssd0_vmssd/brick
>> Brick2: ovirt1:/gluster/ssd0_vmssd/brick
>> Brick3: ovirt2:/gluster/ssd0_vmssd/brick (arbiter)
>> Brick4: ovirt3:/gluster/ssd1_vmssd/brick
>> Brick5: ovirt1:/gluster/ssd1_vmssd/brick
>> Brick6: ovirt2:/gluster/ssd1_vmssd/brick (arbiter)
>> Options Reconfigured:
>> nfs.disable: on
>> transport.address-family: inet6
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> performance.low-prio-threads: 32
>> network.remote-dio: off
>> cluster.eager-lock: enable
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 1
>> features.shard: on
>> user.cifs: off
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>> features.shard-block-size: 128MB
>> performance.strict-o-direct: on
>> network.ping-timeout: 30
>> cluster.granular-entry-heal: enable
>>
>> I would really appreciate some guidance on this to try to improve things
>> because at this rate I will need to reconsider using GlusterFS altogether.
>
> Could you provide the gluster volume profile output while you're running
> your I/O tests.
>
> # gluster volume profile start
> to start profiling
>
> # gluster volume profile info
> for the profile output.
>
>> Cheers,
>> Chris
>>
>> --
>> Chris Boot
>> bo...@bootc.net

--
Lindsay
Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance
[Adding gluster-users]

On Mon, Jun 19, 2017 at 8:16 PM, Chris Boot wrote:

> Hi folks,
>
> I have 3x servers in a "hyper-converged" oVirt 4.1.2 + GlusterFS 3.10
> configuration. My VMs run off a replica 3 arbiter 1 volume comprised of
> 6 bricks, which themselves live on two SSDs in each of the servers (one
> brick per SSD). The bricks are XFS on LVM thin volumes straight onto the
> SSDs. Connectivity is 10G Ethernet.
>
> Performance within the VMs is pretty terrible. I experience very low
> throughput and random IO is really bad: it feels like a latency issue.
> On my oVirt nodes the SSDs are not generally very busy. The 10G network
> seems to run without errors (iperf3 gives bandwidth measurements of >=
> 9.20 Gbits/sec between the three servers).
>
> To put this into perspective: I was getting better behaviour from NFS4
> on a gigabit connection than I am with GlusterFS on 10G: that doesn't
> feel right at all.
>
> My volume configuration looks like this:
>
> Volume Name: vmssd
> Type: Distributed-Replicate
> Volume ID: d5a5ddd1-a140-4e0d-b514-701cfe464853
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 2 x (2 + 1) = 6
> Transport-type: tcp
> Bricks:
> Brick1: ovirt3:/gluster/ssd0_vmssd/brick
> Brick2: ovirt1:/gluster/ssd0_vmssd/brick
> Brick3: ovirt2:/gluster/ssd0_vmssd/brick (arbiter)
> Brick4: ovirt3:/gluster/ssd1_vmssd/brick
> Brick5: ovirt1:/gluster/ssd1_vmssd/brick
> Brick6: ovirt2:/gluster/ssd1_vmssd/brick (arbiter)
> Options Reconfigured:
> nfs.disable: on
> transport.address-family: inet6
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: off
> cluster.eager-lock: enable
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 1
> features.shard: on
> user.cifs: off
> storage.owner-uid: 36
> storage.owner-gid: 36
> features.shard-block-size: 128MB
> performance.strict-o-direct: on
> network.ping-timeout: 30
> cluster.granular-entry-heal: enable
>
> I would really appreciate some guidance on this to try to improve things
> because at this rate I will need to reconsider using GlusterFS altogether.

Could you provide the gluster volume profile output while you're running
your I/O tests.

# gluster volume profile start
to start profiling

# gluster volume profile info
for the profile output.

> Cheers,
> Chris
>
> --
> Chris Boot
> bo...@bootc.net
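The profile subcommands above take the volume name as an argument. A small sketch of the full workflow, using the "vmssd" volume from this thread; the commands are echoed here for illustration, so remove the `echo` to run them against a live cluster:

```shell
# Profiling workflow sketch: start profiling, generate load, read the
# per-brick latency/FOP statistics, then stop. VOL is from this thread.
VOL=vmssd
echo "gluster volume profile $VOL start"
# ... run the I/O benchmark inside a VM while profiling is active ...
echo "gluster volume profile $VOL info"
echo "gluster volume profile $VOL stop"
```

The `info` output breaks down call counts and latencies per FOP per brick, which is usually enough to tell whether the slowness is network round-trips, brick-side disk latency, or lock contention.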