Re: [Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

2018-03-20 Thread Raghavendra Gowdappa
On Tue, Mar 20, 2018 at 9:45 AM, Sam McLeod 
wrote:

> Excellent description, thank you.
>
> With performance.write-behind-trickling-writes ON (default):
>
> ## 4k randwrite
>

> # fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test
> --filename=test --bs=4k --iodepth=32 --size=256MB --readwrite=randwrite
> test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
> 4096B-4096B, ioengine=libaio, iodepth=32
> fio-3.1
> Starting 1 process
> Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=17.3MiB/s][r=0,w=4422 IOPS][eta
> 00m:00s]
> test: (groupid=0, jobs=1): err= 0: pid=42701: Tue Mar 20 15:05:23 2018
>   write: *IOPS=4443*, *BW=17.4MiB/s* (18.2MB/s)(256MiB/14748msec)
>bw (  KiB/s): min=16384, max=19184, per=99.92%, avg=17760.45,
> stdev=602.48, samples=29
>iops: min= 4096, max= 4796, avg=4440.07, stdev=150.66,
> samples=29
>   cpu  : usr=4.00%, sys=18.02%, ctx=131097, majf=0, minf=7
>   IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%,
> >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,
> >=64=0.0%
>  issued rwt: total=0,65536,0, short=0,0,0, dropped=0,0,0
>  latency   : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
>   WRITE: bw=17.4MiB/s (18.2MB/s), 17.4MiB/s-17.4MiB/s (18.2MB/s-18.2MB/s),
> io=256MiB (268MB), run=14748-14748msec
>
>
> ## 2k randwrite
>
> # fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test
> --filename=test --bs=2k --iodepth=32 --size=256MB --readwrite=randwrite
> test: (g=0): rw=randwrite, bs=(R) 2048B-2048B, (W) 2048B-2048B, (T)
> 2048B-2048B, ioengine=libaio, iodepth=32
> fio-3.1
> Starting 1 process
> Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=8624KiB/s][r=0,w=4312 IOPS][eta
> 00m:00s]
> test: (groupid=0, jobs=1): err= 0: pid=42781: Tue Mar 20 15:05:57 2018
>   write: *IOPS=4439, BW=8880KiB/s* (9093kB/s)(256MiB/29522msec)
>bw (  KiB/s): min= 6908, max= 9564, per=99.94%, avg=8874.03,
> stdev=428.92, samples=59
>iops: min= 3454, max= 4782, avg=4437.00, stdev=214.44,
> samples=59
>   cpu  : usr=2.43%, sys=18.18%, ctx=26, majf=0, minf=8
>   IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%,
> >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,
> >=64=0.0%
>  issued rwt: total=0,131072,0, short=0,0,0, dropped=0,0,0
>  latency   : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
>   WRITE: bw=8880KiB/s (9093kB/s), 8880KiB/s-8880KiB/s (9093kB/s-9093kB/s),
> io=256MiB (268MB), run=29522-29522msec
>
>
> With performance.write-behind-trickling-writes OFF:
>
> ## 4k randwrite - just over half the IOPS of having it ON.
>

Note that since the workload is random write, no aggregation is possible.
So there is no point in waiting for future writes, and leaving
trickling-writes on makes sense.

A better test for measuring the impact of this option would be a sequential
write workload. I would guess that the smaller the writes, the more
pronounced the benefit of turning this option off.
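A sequential small-write job along those lines might look like the sketch below. This is illustrative only: the mount path and volume name are placeholders, and fio must run against the gluster FUSE mount.

```shell
# Sequential 4k writes through the FUSE mount -- the pattern where
# write-behind aggregation should help most once trickling-writes is off.
# /mnt/gluster and <volname> are placeholders for your environment.
gluster volume set <volname> performance.write-behind-trickling-writes off
fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=seqtest \
    --filename=/mnt/gluster/seqtest --bs=4k --iodepth=32 --size=256MB \
    --readwrite=write
```

Re-running the same job with the option back on gives a direct before/after comparison with the randwrite numbers in this thread.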


>
> # fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test
> --filename=test --bs=4k --iodepth=32 --size=256MB --readwrite=randwrite
> test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
> 4096B-4096B, ioengine=libaio, iodepth=32
> fio-3.1
> Starting 1 process
> Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta
> 00m:00s]
> test: (groupid=0, jobs=1): err= 0: pid=44225: Tue Mar 20 15:11:04 2018
>   write: *IOPS=2594, BW=10.1MiB/s* (10.6MB/s)(256MiB/25259msec)
>bw (  KiB/s): min= 2248, max=18728, per=100.00%, avg=10454.10,
> stdev=6481.14, samples=50
>iops: min=  562, max= 4682, avg=2613.50, stdev=1620.35,
> samples=50
>   cpu  : usr=2.29%, sys=10.09%, ctx=131141, majf=0, minf=7
>   IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%,
> >=64=0.0%
>  submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
> >=64=0.0%
>  complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%,
> >=64=0.0%
>  issued rwt: total=0,65536,0, short=0,0,0, dropped=0,0,0
>  latency   : target=0, window=0, percentile=100.00%, depth=32
>
> Run status group 0 (all jobs):
>   WRITE: bw=10.1MiB/s (10.6MB/s), 10.1MiB/s-10.1MiB/s (10.6MB/s-10.6MB/s),
> io=256MiB (268MB), run=25259-25259msec
>
>
> ## 2k randwrite - no noticeable change.
>
> # fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test
> --filename=test --bs=2k --iodepth=32 --size=256MB --readwrite=randwrite
> test: (g=0): rw=randwrite, bs=(R) 2048B-2048B, (W) 2048B-2048B, (T)
> 2048B-2048B, ioengine=libaio, iodepth=32
> fio-3.1
> Starting 1 process
> Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=8662KiB/s][r=0,w=4331 IOPS][eta
> 

Re: [Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

2018-03-19 Thread Sam McLeod
Excellent description, thank you.

With performance.write-behind-trickling-writes ON (default):

## 4k randwrite

# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test 
--filename=test --bs=4k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=17.3MiB/s][r=0,w=4422 IOPS][eta 
00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=42701: Tue Mar 20 15:05:23 2018
  write: IOPS=4443, BW=17.4MiB/s (18.2MB/s)(256MiB/14748msec)
   bw (  KiB/s): min=16384, max=19184, per=99.92%, avg=17760.45, stdev=602.48, 
samples=29
   iops: min= 4096, max= 4796, avg=4440.07, stdev=150.66, samples=29
  cpu  : usr=4.00%, sys=18.02%, ctx=131097, majf=0, minf=7
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
 issued rwt: total=0,65536,0, short=0,0,0, dropped=0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=17.4MiB/s (18.2MB/s), 17.4MiB/s-17.4MiB/s (18.2MB/s-18.2MB/s), 
io=256MiB (268MB), run=14748-14748msec


## 2k randwrite

# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test 
--filename=test --bs=2k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 2048B-2048B, (W) 2048B-2048B, (T) 
2048B-2048B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=8624KiB/s][r=0,w=4312 IOPS][eta 
00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=42781: Tue Mar 20 15:05:57 2018
  write: IOPS=4439, BW=8880KiB/s (9093kB/s)(256MiB/29522msec)
   bw (  KiB/s): min= 6908, max= 9564, per=99.94%, avg=8874.03, stdev=428.92, 
samples=59
   iops: min= 3454, max= 4782, avg=4437.00, stdev=214.44, samples=59
  cpu  : usr=2.43%, sys=18.18%, ctx=26, majf=0, minf=8
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
 issued rwt: total=0,131072,0, short=0,0,0, dropped=0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=8880KiB/s (9093kB/s), 8880KiB/s-8880KiB/s (9093kB/s-9093kB/s), 
io=256MiB (268MB), run=29522-29522msec


With performance.write-behind-trickling-writes OFF:

## 4k randwrite - just over half the IOPS of having it ON.


# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test 
--filename=test --bs=4k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=44225: Tue Mar 20 15:11:04 2018
  write: IOPS=2594, BW=10.1MiB/s (10.6MB/s)(256MiB/25259msec)
   bw (  KiB/s): min= 2248, max=18728, per=100.00%, avg=10454.10, 
stdev=6481.14, samples=50
   iops: min=  562, max= 4682, avg=2613.50, stdev=1620.35, samples=50
  cpu  : usr=2.29%, sys=10.09%, ctx=131141, majf=0, minf=7
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
 issued rwt: total=0,65536,0, short=0,0,0, dropped=0,0,0
 latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=10.1MiB/s (10.6MB/s), 10.1MiB/s-10.1MiB/s (10.6MB/s-10.6MB/s), 
io=256MiB (268MB), run=25259-25259msec


## 2k randwrite - no noticeable change.

# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test 
--filename=test --bs=2k --iodepth=32 --size=256MB --readwrite=randwrite
test: (g=0): rw=randwrite, bs=(R) 2048B-2048B, (W) 2048B-2048B, (T) 
2048B-2048B, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=8662KiB/s][r=0,w=4331 IOPS][eta 
00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=45813: Tue Mar 20 15:12:02 2018
  write: IOPS=4291, BW=8583KiB/s (8789kB/s)(256MiB/30541msec)
   bw (  KiB/s): min= 7416, max=10264, per=99.94%, avg=8577.66, stdev=618.31, 
samples=61
   iops: min= 3708, max= 5132, avg=4288.84, stdev=309.15, samples=61
  cpu  : usr=2.87%, sys=15.83%, ctx=262236, majf=0, minf=8
  IO depths: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
 submit: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
 issued rwt: 

Re: [Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

2018-03-19 Thread Raghavendra Gowdappa
On Tue, Mar 20, 2018 at 8:57 AM, Sam McLeod 
wrote:

> Hi Raghavendra,
>
>
> On 20 Mar 2018, at 1:55 pm, Raghavendra Gowdappa 
> wrote:
>
> Aggregating large number of small writes by write-behind into large writes
> has been merged on master:
> https://github.com/gluster/glusterfs/issues/364
>
> Would like to know whether it helps for this use case. Note that it's not
> part of any release yet, so you'd have to build and install from the repo.
>
>
> Sounds interesting, not too keen to build packages at the moment but I've
> added myself as a watcher to that issue on Github and once it's in a 3.x
> release I'll try it and let you know.
>
> Another suggestion is to run tests with the option
> performance.write-behind-trickling-writes turned off.
>
> # gluster volume set  performance.write-behind-trickling-writes
> off
>
> A word of caution, though: if your files are too small, these suggestions
> may not have much impact.
>
>
> I'm looking for documentation on this option but all I could really find
> is in the source for write-behind.c:
>
> if is enabled (which it is), do not hold back writes if there are no
> outstanding requests.
>

Until recently this functionality, though available, couldn't be configured
from the CLI; one had to edit the volume configuration file directly. It is
now configurable through the CLI:

https://review.gluster.org/#/c/18719/
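For completeness, once that change is available the option can be inspected and toggled from the CLI as below (a sketch; substitute your volume name for the placeholder):

```shell
# Show the current value of the option
gluster volume get <volname> performance.write-behind-trickling-writes

# Turn it off for testing, then back on to restore the default
gluster volume set <volname> performance.write-behind-trickling-writes off
gluster volume set <volname> performance.write-behind-trickling-writes on
```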


>
> and a note on aggregate-size stating that
>
> *"aggregation won't happen if performance.write-behind-trickling-writes is
> turned on"*
>
>
> What are the potentially negative performance impacts of disabling this?
>

Even if the aggregation option is turned off, write-behind can still
aggregate writes up to a size of 128 KB. But to make full use of this for
small-write workloads, write-behind has to wait for some time so that enough
write requests accumulate to fill that capacity. With this option enabled,
write-behind aggregates the requests it already has, but won't wait for
future writes. This means descendant xlators of write-behind can see writes
smaller than 128 KB. So, for a scenario where a small number of large writes
is preferred over a large number of small writes, this can be a problem.
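A rough way to build intuition for what that 128 KB aggregation buys is to compare many small writes against the same data issued in 128 KB chunks. The sketch below (not from the thread) uses a temp directory so it runs anywhere; point the paths at a gluster mount to measure the real effect.

```shell
# Same 4 MiB total, written as 1024 x 4 KiB vs 32 x 128 KiB requests.
# On a replicated network filesystem the 128 KiB form typically wins,
# because far fewer write round trips reach the bricks.
d=$(mktemp -d)
dd if=/dev/zero of="$d/small.bin" bs=4K count=1024 status=none
dd if=/dev/zero of="$d/large.bin" bs=128K count=32 status=none
stat -c %s "$d/small.bin" "$d/large.bin"   # both files are 4194304 bytes
rm -rf "$d"
```

Timing the two dd invocations (e.g. with `time`) on a gluster mount shows the per-request overhead that aggregation is designed to hide.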


> --
> Sam McLeod (protoporpoise on IRC)
> https://smcleod.net
> https://twitter.com/s_mcleod
>
> Words are my own opinions and do not necessarily represent those of
> my employer or partners.
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

2018-03-19 Thread Sam McLeod
Hi Raghavendra,


> On 20 Mar 2018, at 1:55 pm, Raghavendra Gowdappa  wrote:
> 
> Aggregating large number of small writes by write-behind into large writes 
> has been merged on master:
> https://github.com/gluster/glusterfs/issues/364 
> 
> 
> Would like to know whether it helps for this use case. Note that it's not part
> of any release yet, so you'd have to build and install from the repo.

Sounds interesting, not too keen to build packages at the moment but I've added 
myself as a watcher to that issue on Github and once it's in a 3.x release I'll 
try it and let you know.

> Another suggestion is to run tests with the option
> performance.write-behind-trickling-writes turned off.
> 
> # gluster volume set  performance.write-behind-trickling-writes off
> 
> A word of caution, though: if your files are too small, these suggestions
> may not have much impact.

I'm looking for documentation on this option but all I could really find is in 
the source for write-behind.c:

if is enabled (which it is), do not hold back writes if there are no 
outstanding requests.


and a note on aggregate-size stating that 

"aggregation won't happen if performance.write-behind-trickling-writes is 
turned on"


What are the potentially negative performance impacts of disabling this?

--
Sam McLeod (protoporpoise on IRC)
https://smcleod.net
https://twitter.com/s_mcleod

Words are my own opinions and do not necessarily represent those of my employer 
or partners.


Re: [Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

2018-03-19 Thread Raghavendra Gowdappa
On Tue, Mar 20, 2018 at 1:55 AM, TomK  wrote:

> On 3/19/2018 10:52 AM, Rik Theys wrote:
>
>> Hi,
>>
>> On 03/19/2018 03:42 PM, TomK wrote:
>>
>>> On 3/19/2018 5:42 AM, Ondrej Valousek wrote:
>>> Removing NFS or NFS Ganesha from the equation, not very impressed on my
>>> own setup either.  For the writes it's doing, that's a lot of CPU usage
>>> in top.  It seems bottlenecked on a single execution core somewhere, trying
>>> to facilitate reads / writes to the other bricks.
>>>
>>> Writes to the gluster FS from within one of the gluster participating
>>> bricks:
>>>
>>> [root@nfs01 n]# dd if=/dev/zero of=./some-file.bin
>>>
>>> 393505+0 records in
>>> 393505+0 records out
>>> 201474560 bytes (201 MB) copied, 50.034 s, 4.0 MB/s
>>>
>>
>> That's not really a fair comparison as you don't specify a blocksize.
>> What does
>>
>> dd if=/dev/zero of=./some-file.bin bs=1M count=1000 oflag=direct
>>
>> give?
>>
>>
>> Rik
>>
>> Correct.  Higher block sizes gave me better numbers earlier.  Curious
> about improving the small file size performance though, preferably via
> gluster tunables, if possible.
>
> Though I guess it could be said that compressing a set of large files and
> transferring them over that way is one solution.  However, I needed the small
> block size on dd to quickly simulate a lot of small requests in a
> somewhat OK-ish way.
>

Aggregating large number of small writes by write-behind into large writes
has been merged on master:
https://github.com/gluster/glusterfs/issues/364

I would like to know whether it helps for this use case. Note that it's not
part of any release yet, so you'd have to build and install from the repo.

Another suggestion is to run tests with the option
performance.write-behind-trickling-writes turned off.

# gluster volume set  performance.write-behind-trickling-writes off

A word of caution, though: if your files are too small, these suggestions
may not have much impact.


> Here's the numbers from the VM:
>
> [ Via Gluster ]
> [root@nfs01 n]# dd if=/dev/zero of=./some-file.bin bs=1M count=1
> oflag=direct
> 1+0 records in
> 1+0 records out
> 1048576 bytes (10 GB) copied, 96.3228 s, 109 MB/s
> [root@nfs01 n]# rm some-file.bin
> rm: remove regular file ‘some-file.bin’? y
>
> [ Via XFS ]
> [root@nfs01 n]# cd /bricks/0/gv01/
> [root@nfs01 gv01]# dd if=/dev/zero of=./some-file.bin bs=1M count=1
> oflag=direct
> 1+0 records in
> 1+0 records out
> 1048576 bytes (10 GB) copied, 44.79 s, 234 MB/s
> [root@nfs01 gv01]#
>
>
>
> top - 12:49:48 up 1 day,  9:39,  2 users,  load average: 0.66, 1.15, 1.82
> Tasks: 165 total,   1 running, 164 sleeping,   0 stopped,   0 zombie
> %Cpu0  : 10.3 us,  9.6 sy,  0.0 ni, 28.0 id, 50.4 wa,  0.0 hi,  1.8 si,
> 0.0 st
> %Cpu1  : 13.8 us, 13.8 sy,  0.0 ni, 38.6 id, 30.0 wa,  0.0 hi,  3.8 si,
> 0.0 st
> %Cpu2  :  8.7 us,  6.9 sy,  0.0 ni, 48.7 id, 34.9 wa,  0.0 hi,  0.7 si,
> 0.0 st
> %Cpu3  : 10.6 us,  7.8 sy,  0.0 ni, 57.1 id, 24.1 wa,  0.0 hi,  0.4 si,
> 0.0 st
> KiB Mem :  3881708 total,  3543280 free,   224008 used,   114420 buff/cache
> KiB Swap:  4063228 total,  3836612 free,   226616 used.  3457708 avail Mem
>
>   PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
> 14115 root  20   0 2504832  27640   2612 S  43.5  0.7 432:10.35
> glusterfsd
>  1319 root  20   0 1269620  23780   2636 S  38.9  0.6 752:44.78
> glusterfs
>  1334 root  20   0 2694264  56988   1672 S  16.3  1.5 311:20.90
> ganesha.nfsd
> 27458 root  20   0  108984   1404540 D   3.0  0.0   0:00.24 dd
> 14127 root  20   0 1164720   4860   1960 S   0.7  0.1   1:47.59
> glusterfs
>   750 root  20   0  389864   5528   3988 S   0.3  0.1   0:08.77 sssd_be
>
> --
> Cheers,
> Tom K.
> 
> -
>
> Living on earth is expensive, but it includes a free trip around the sun.
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>

Re: [Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

2018-03-19 Thread Sam McLeod
Howdy all,

Sorry, I'm in Australia so most of your replies came in overnight for me.

Note: At the end of this reply is a listing of all our volume settings
(gluster volume get <volname> all).
Note 2: I really wish Gluster used Discourse for this kind of community
troubleshooting and analysis; using a mailing list is really painful.


> On 19 Mar 2018, at 4:38 pm, TomK  wrote:
> 
> On 3/19/2018 1:07 AM, TomK wrote:
> A few numbers you could try:
> 
> performance.cache-refresh-timeout Default: 1s

I've actually set this right up to 60 (seconds), I guess it's possible that's 
causing an issue but I thought that was more for forced eviction on idle files.

> cluster.stripe-block-size Default: 128KB

Hmm, yes, I wonder if it might be worth looking at stripe-block-size; I had
forgotten about this as it sounds like it applies to striped volumes (now
deprecated) only.
The issue with this is that I don't want to tune the volume just for small
files and hurt the performance of larger I/O operations.

> 
> Looks like others are having this sort of performance problem:
> 
> http://lists.gluster.org/pipermail/gluster-users/2015-April/021487.html
> 
> Some recommended values by one poster that might help out 
> (https://forum.proxmox.com/threads/horribly-slow-gluster-performance.26319/)  
> Going to try in my LAB and let you know:
> 
> 
> > GlusterFS 3.7 parameters:

GlusterFS 3.7 is really old so I'd be careful looking at settings / tuning for 
it.

> nfs.trusted-sync: on

Not using NFS.

> performance.cache-size: 1GB

Already set to 1024MB, but that's only for reads not writes.

> performance.io-thread-count: 16

That's my current setting.

> performance.write-behind-window-size: 8MB

Currently allowing even more cache up at 256MB.

> performance.readdir-ahead: on

That's my current setting (the default now I believe).

> client.event-threads: 8

That's my current setting (the default now I believe).

> server.event-threads: 8

That's my current setting (the default now I believe).

> cluster.quorum-type: auto

Not sure how that's going to impact small I/O performance.
I currently have this set to none, but do use an arbiter node.

> cluster.server-quorum-type: server

Not sure how that's going to impact small I/O performance.
I currently have this set to off, but do use an arbiter node.

> cluster.server-quorum-ratio: 51%

Not sure how that's going to impact small I/O performance.
I currently have this set to 0, but do use an arbiter node.

> 
> > Kernel parameters:
> net.ipv4.tcp_slow_start_after_idle = 0

That's my current setting.

> net.ipv4.tcp_fin_timeout = 15

I've set this right down to 5.

> net.core.somaxconn = 65535

That's my current setting.

> vm.swappiness = 1

That's my current setting; we don't have swap (other than ZRAM) enabled on any
hosts.

> vm.dirty_ratio = 5

N/A as swap disabled (ZRAM only)

> vm.dirty_background_ratio = 2

N/A as swap disabled (ZRAM only)

> vm.min_free_kbytes = 524288   # this is on 128GB RAM

I have this set to vm.min_free_kbytes = 67584; I'd be worried that setting it
that high would cause an OOM, as per the official kernel docs:

min_free_kbytes:

This is used to force the Linux VM to keep a minimum number
of kilobytes free.  The VM uses this number to compute a
watermark[WMARK_MIN] value for each lowmem zone in the system.
Each lowmem zone gets a number of reserved free pages based
proportionally on its size.

Some minimal amount of memory is needed to satisfy PF_MEMALLOC
allocations; if you set this to lower than 1024KB, your system will
become subtly broken, and prone to deadlock under high loads.

Setting this too high will OOM your machine instantly.
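For reference, the kernel parameters discussed in this thread can be checked and applied as below. The commands are a sketch; the min_free_kbytes value is the conservative one used above, and persistent settings belong under /etc/sysctl.d/.

```shell
# Inspect the current values of the tunables mentioned in this thread
sysctl net.ipv4.tcp_slow_start_after_idle net.ipv4.tcp_fin_timeout \
       net.core.somaxconn vm.swappiness vm.min_free_kbytes

# Apply a value at runtime (requires root); example: the conservative
# min_free_kbytes discussed above
sysctl -w vm.min_free_kbytes=67584
```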


> On 20 Mar 2018, at 1:52 am, Rik Theys  wrote:
> 
> That's not really a fair comparison as you don't specify a blocksize.
> What does
> 
> dd if=/dev/zero of=./some-file.bin bs=1M count=1000 oflag=direct
> 
> give?
> 
> 
> Rik

dd is not going to give anyone particularly useful benchmarks, especially with
small file sizes; in fact it's more likely to mislead you than be useful.
See my short post on fio here:
https://smcleod.net/tech/2016/04/29/benchmarking-io.html
I believe it's one of the most useful tools for I/O benchmarking.
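For small-file / latency-sensitive testing specifically, a queue-depth-1, sync-engine fio job is a useful complement to the libaio runs elsewhere in this thread, since it exposes per-operation latency. This is a sketch; the filename and size are placeholders.

```shell
# One outstanding 4k write at a time, fsync after each: this measures the
# per-write round-trip cost, which is usually the real bottleneck for
# small files on replicated gluster volumes.
fio --name=latency-test --ioengine=sync --iodepth=1 --bs=4k --size=64MB \
    --readwrite=randwrite --fsync=1 --filename=/mnt/gluster/latency-test
```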

Just for a laugh I compared dd writes for 4k (small) writes between the client 
(gluster mounted on the cli) and a gluster host (to a directory on the same 
storage as the bricks).
The client came out faster; likely the direct I/O flag was not working as
intended.

Client:

# dd if=/dev/zero of=./some-file.bin bs=4K count=4096 oflag=direct
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 2.27839 s, 7.4 MB/s

Server:

dd if=/dev/zero of=./some-file.bin bs=4K count=4096 oflag=direct
4096+0 records in
4096+0 records out
16777216 bytes (17 MB) copied, 3.94093 s, 4.3 MB/s



> Note: At the end of this reply is a listing of all our volume settings 

Re: [Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

2018-03-19 Thread TomK

On 3/19/2018 10:52 AM, Rik Theys wrote:

Hi,

On 03/19/2018 03:42 PM, TomK wrote:

On 3/19/2018 5:42 AM, Ondrej Valousek wrote:
Removing NFS or NFS Ganesha from the equation, not very impressed on my
own setup either.  For the writes it's doing, that's a lot of CPU usage
in top.  It seems bottlenecked on a single execution core somewhere, trying
to facilitate reads / writes to the other bricks.

Writes to the gluster FS from within one of the gluster participating
bricks:

[root@nfs01 n]# dd if=/dev/zero of=./some-file.bin

393505+0 records in
393505+0 records out
201474560 bytes (201 MB) copied, 50.034 s, 4.0 MB/s


That's not really a fair comparison as you don't specify a blocksize.
What does

dd if=/dev/zero of=./some-file.bin bs=1M count=1000 oflag=direct

give?


Rik

Correct.  Higher block sizes gave me better numbers earlier.  Curious 
about improving the small file size performance though, preferably via
gluster tunables, if possible.


Though I guess it could be said that compressing a set of large files
and transferring them over that way is one solution.  However, I needed the
small block size on dd to quickly simulate a lot of small requests in a
somewhat OK-ish way.


Here's the numbers from the VM:

[ Via Gluster ]
[root@nfs01 n]# dd if=/dev/zero of=./some-file.bin bs=1M count=1 
oflag=direct

1+0 records in
1+0 records out
1048576 bytes (10 GB) copied, 96.3228 s, 109 MB/s
[root@nfs01 n]# rm some-file.bin
rm: remove regular file ‘some-file.bin’? y

[ Via XFS ]
[root@nfs01 n]# cd /bricks/0/gv01/
[root@nfs01 gv01]# dd if=/dev/zero of=./some-file.bin bs=1M count=1 
oflag=direct

1+0 records in
1+0 records out
1048576 bytes (10 GB) copied, 44.79 s, 234 MB/s
[root@nfs01 gv01]#



top - 12:49:48 up 1 day,  9:39,  2 users,  load average: 0.66, 1.15, 1.82
Tasks: 165 total,   1 running, 164 sleeping,   0 stopped,   0 zombie
%Cpu0  : 10.3 us,  9.6 sy,  0.0 ni, 28.0 id, 50.4 wa,  0.0 hi,  1.8 si, 
0.0 st
%Cpu1  : 13.8 us, 13.8 sy,  0.0 ni, 38.6 id, 30.0 wa,  0.0 hi,  3.8 si, 
0.0 st
%Cpu2  :  8.7 us,  6.9 sy,  0.0 ni, 48.7 id, 34.9 wa,  0.0 hi,  0.7 si, 
0.0 st
%Cpu3  : 10.6 us,  7.8 sy,  0.0 ni, 57.1 id, 24.1 wa,  0.0 hi,  0.4 si, 
0.0 st

KiB Mem :  3881708 total,  3543280 free,   224008 used,   114420 buff/cache
KiB Swap:  4063228 total,  3836612 free,   226616 used.  3457708 avail Mem

  PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+ COMMAND
14115 root  20   0 2504832  27640   2612 S  43.5  0.7 432:10.35 
glusterfsd
 1319 root  20   0 1269620  23780   2636 S  38.9  0.6 752:44.78 
glusterfs
 1334 root  20   0 2694264  56988   1672 S  16.3  1.5 311:20.90 
ganesha.nfsd

27458 root  20   0  108984   1404540 D   3.0  0.0   0:00.24 dd
14127 root  20   0 1164720   4860   1960 S   0.7  0.1   1:47.59 
glusterfs

  750 root  20   0  389864   5528   3988 S   0.3  0.1   0:08.77 sssd_be

--
Cheers,
Tom K.
-

Living on earth is expensive, but it includes a free trip around the sun.



Re: [Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

2018-03-19 Thread Rik Theys
Hi,

On 03/19/2018 03:42 PM, TomK wrote:
> On 3/19/2018 5:42 AM, Ondrej Valousek wrote:
> Removing NFS or NFS Ganesha from the equation, not very impressed on my
> own setup either.  For the writes it's doing, that's a lot of CPU usage
> in top.  It seems bottlenecked on a single execution core somewhere, trying
> to facilitate reads / writes to the other bricks.
> 
> Writes to the gluster FS from within one of the gluster participating
> bricks:
> 
> [root@nfs01 n]# dd if=/dev/zero of=./some-file.bin
> 
> 393505+0 records in
> 393505+0 records out
> 201474560 bytes (201 MB) copied, 50.034 s, 4.0 MB/s

That's not really a fair comparison as you don't specify a blocksize.
What does

dd if=/dev/zero of=./some-file.bin bs=1M count=1000 oflag=direct

give?


Rik

-- 
Rik Theys
System Engineer
KU Leuven - Dept. Elektrotechniek (ESAT)
Kasteelpark Arenberg 10 bus 2440  - B-3001 Leuven-Heverlee
+32(0)16/32.11.07



Re: [Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

2018-03-19 Thread TomK

On 3/19/2018 5:42 AM, Ondrej Valousek wrote:
Removing NFS or NFS Ganesha from the equation, not very impressed on my 
own setup either.  For the writes it's doing, that's a lot of CPU usage
in top.  It seems bottlenecked on a single execution core somewhere, trying
to facilitate reads / writes to the other bricks.


Writes to the gluster FS from within one of the gluster participating 
bricks:


[root@nfs01 n]# dd if=/dev/zero of=./some-file.bin

393505+0 records in
393505+0 records out
201474560 bytes (201 MB) copied, 50.034 s, 4.0 MB/s

[root@nfs01 n]#

Top results (10-second average) won't go over 32%:

top - 00:49:38 up 21:39,  2 users,  load average: 0.42, 0.24, 0.19
Tasks: 164 total,   1 running, 163 sleeping,   0 stopped,   0 zombie
%Cpu0  : 29.3 us, 24.7 sy,  0.0 ni, 45.1 id,  0.0 wa,  0.0 hi,  0.8 si, 
0.0 st
%Cpu1  : 27.2 us, 24.1 sy,  0.0 ni, 47.2 id,  0.0 wa,  0.0 hi,  1.5 si, 
0.0 st
%Cpu2  : 20.2 us, 13.5 sy,  0.0 ni, 64.1 id,  0.0 wa,  0.0 hi,  2.3 si, 
0.0 st
%Cpu3  : 30.0 us, 16.2 sy,  0.0 ni, 47.5 id,  0.0 wa,  0.0 hi,  6.3 si, 
0.0 st

KiB Mem :  3881708 total,  3207488 free,   346680 used,   327540 buff/cache
KiB Swap:  4063228 total,  4062828 free,  400 used.  3232208 avail Mem

   PID USER  PR  NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND
  1319 root  20   0  819036  12928   4036 S 32.3  0.3   1:19.64 
glusterfs
  1310 root  20   0 1232428  25636   4364 S 12.1  0.7   0:41.25 
glusterfsd



Next, the same write but directly to the brick via XFS, which of course 
is faster:



top - 09:45:09 up 1 day,  6:34,  3 users,  load average: 0.61, 1.01, 1.04
Tasks: 171 total,   2 running, 169 sleeping,   0 stopped,   0 zombie
%Cpu0  :  0.6 us,  2.1 sy,  0.0 ni, 82.6 id, 14.5 wa,  0.0 hi,  0.2 si, 
0.0 st
%Cpu1  : 16.7 us, 83.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si, 
0.0 st
%Cpu2  :  0.4 us,  0.9 sy,  0.0 ni, 94.2 id,  4.4 wa,  0.0 hi,  0.0 si, 
0.0 st
%Cpu3  :  1.1 us,  0.6 sy,  0.0 ni, 98.3 id,  0.0 wa,  0.0 hi,  0.0 si, 
0.0 st

KiB Mem :  3881708 total,   501120 free,   230704 used,  3149884 buff/cache
KiB Swap:  4063228 total,  3876896 free,   186332 used.  3343960 avail Mem

  PID USER  PR  NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND
14691 root  20   0  107948608512 R 25.0  0.0   0:34.29 dd
 1334 root  20   0 2694264  61076   2228 S  2.7  1.6 283:55.96 
ganesha.nfsd



The result of a dd command directly against the brick FS itself is of 
course much better:



[root@nfs01 gv01]# dd if=/dev/zero of=./some-file.bin
5771692+0 records in
5771692+0 records out
2955106304 bytes (3.0 GB) copied, 35.3425 s, 83.6 MB/s

[root@nfs01 gv01]# pwd
/bricks/0/gv01
[root@nfs01 gv01]#

Tried a few tweak options with no effect:

[root@nfs01 glusterfs]# gluster volume info

Volume Name: gv01
Type: Replicate
Volume ID: e5ccc75e-5192-45ac-b410-a34ebd777666
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: nfs01:/bricks/0/gv01
Brick2: nfs02:/bricks/0/gv01
Options Reconfigured:
cluster.server-quorum-type: server
cluster.quorum-type: auto
server.event-threads: 8
client.event-threads: 8
performance.readdir-ahead: on
performance.write-behind-window-size: 8MB
performance.io-thread-count: 16
performance.cache-size: 1GB
nfs.trusted-sync: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
[root@nfs01 glusterfs]#

That's despite the fact that I can confirm doing 90+ MB/s on my 1 GbE network.
Thoughts?


--
Cheers,
Tom K.
-

Living on earth is expensive, but it includes a free trip around the sun.




Re: [Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

2018-03-19 Thread Ondrej Valousek
Hi,
As I posted in my previous emails, glusterfs can never match NFS (especially 
async NFS) on small-file and latency-sensitive workloads. That is inherent in 
the design; nothing you can do about it.
Ondrej

-Original Message-
From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Rik Theys
Sent: Monday, March 19, 2018 10:38 AM
To: gluster-users@gluster.org; mailingli...@smcleod.net
Subject: Re: [Gluster-users] Gluster very poor performance when copying small 
files (1x (2+1) = 3, SSD)

Hi,

I've done some similar tests and experience similar performance issues (see my 
'gluster for home directories?' thread on the list).

If I read your mail correctly, you are comparing an NFS mount of the brick disk 
against a gluster mount (using the fuse client)?

Which options do you have set on the NFS export (sync or async)?

From my tests, I concluded that the issue was not bandwidth but latency.
Gluster will only return an IO operation once all bricks have confirmed that 
the data is on disk. If you are using a fuse mount, you might check whether 
the 'direct-io-mode=disable' mount option on the client helps (I have no 
experience with this).
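That latency explanation is consistent with the numbers in Sam's original post; a quick back-of-envelope check (figures taken from his 'time cp -a' run):

```shell
# 21,791 files copied in ~351.9 s of wall-clock time is roughly 16 ms per
# file -- i.e. several network round trips per file. The workload is
# latency-bound, not bandwidth-bound: 128 MB in 352 s is only ~0.36 MB/s.
awk 'BEGIN { files = 21791; secs = 5*60 + 51.862; printf "%.1f ms/file\n", secs*1000/files }'
```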

In our tests, I've used NFS-ganesha to serve the gluster volume over NFS. This 
makes things even worse as NFS-ganesha has no "async" mode, which makes 
performance terrible.

If you find a magic knob to make glusterfs fast on small-file workloads, do let 
me know!

Regards,

Rik

On 03/18/2018 11:13 PM, Sam McLeod wrote:
> Howdy all,
> 
> We're experiencing terrible small file performance when copying or 
> moving files on gluster clients.
> 
> In the example below, Gluster is taking 6mins~ to copy 128MB / 21,000 
> files sideways on a client, doing the same thing on NFS (which I know 
> is a totally different solution etc. etc.) takes approximately 10-15 
> seconds(!).
> 
> Any advice for tuning the volume or XFS settings would be greatly 
> appreciated.
> 
> Hopefully I've included enough relevant information below.
> 
> 
> ## Gluster Client
> 
> root@gluster-client:/mnt/gluster_perf_test/  # du -sh .
> 127M    .
> root@gluster-client:/mnt/gluster_perf_test/  # find . -type f | wc -l
> 21791
> root@gluster-client:/mnt/gluster_perf_test/  # du 9584toto9584.txt
> 4    9584toto9584.txt
> 
> 
> root@gluster-client:/mnt/gluster_perf_test/  # time cp -a private 
> private_perf_test
> 
> real    5m51.862s
> user    0m0.862s
> sys    0m8.334s
> 
> root@gluster-client:/mnt/gluster_perf_test/ # time rm -rf 
> private_perf_test/
> 
> real    0m49.702s
> user    0m0.087s
> sys    0m0.958s
> 
> 
> ## Hosts
> 
> - 16x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz per Gluster host / 
> client
> - Storage: iSCSI provisioned (via 10Gbit DAC/Fibre), SSD disk, 50K 
> R/RW 4k IOP/s, 400MB/s per Gluster host
> - Volumes are replicated across two hosts and one arbiter only host
> - Networking is 10Gbit DAC/Fibre between Gluster hosts and clients
> - 18GB DDR4 ECC memory
> 
> ## Volume Info
> 
> root@gluster-host-01:~ # gluster pool list
> UUID                                  Hostname              State
> ad02970b-e2aa-4ca8-998c-bd10d5970faa  gluster-host-02.fqdn  Connected
> ea116a94-c19e-48db-b108-0be3ae622e2e  gluster-host-03.fqdn  Connected
> 2e855c25-e7ac-4ff6-be85-e8bcc6f45ee4  localhost             Connected
> 
> root@gluster-host-01:~ # gluster volume info uat_storage
> 
> Volume Name: uat_storage
> Type: Replicate
> Volume ID: 7918f1c5-5031-47b8-b054-56f6f0c569a2
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: gluster-host-01.fqdn:/mnt/gluster-storage/uat_storage
> Brick2: gluster-host-02.fqdn:/mnt/gluster-storage/uat_storage
> Brick3: gluster-host-03.fqdn:/mnt/gluster-storage/uat_storage 
> (arbiter) Options Reconfigured:
> performance.rda-cache-limit: 256MB
> network.inode-lru-limit: 5
> server.outstanding-rpc-limit: 256
> performance.client-io-threads: true
> nfs.disable: on
> transport.address-family: inet
> client.event-threads: 8
> cluster.eager-lock: true
> cluster.favorite-child-policy: size
> cluster.lookup-optimize: true
> cluster.readdir-optimize: true
> cluster.use-compound-fops: true
> diagnostics.brick-log-level: ERROR
> diagnostics.client-log-level: ERROR
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: true
> network.ping-timeout: 15
> performance.cache-invalidation: true
> performance.cache-max-file-size: 6MB
> performance.cache-refresh-timeout: 60
> performance.cache-size: 1024MB
> performance.io-thread-count: 16
> performance.md-cache-timeout: 600
> performance.stat-prefetch: true
> performance.write-behind-wind

Re: [Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

2018-03-19 Thread Rik Theys
Hi,

I've done some similar tests and experience similar performance issues
(see my 'gluster for home directories?' thread on the list).

If I read your mail correctly, you are comparing an NFS mount of the
brick disk against a gluster mount (using the fuse client)?

Which options do you have set on the NFS export (sync or async)?

From my tests, I concluded that the issue was not bandwidth but latency.
Gluster will only return an IO operation once all bricks have confirmed
that the data is on disk. If you are using a fuse mount, you might check
whether the 'direct-io-mode=disable' mount option on the client helps (I
have no experience with this).

In our tests, I've used NFS-ganesha to serve the gluster volume over
NFS. This makes things even worse as NFS-ganesha has no "async" mode,
which makes performance terrible.

If you find a magic knob to make glusterfs fast on small-file workloads,
do let me know!

Regards,

Rik

On 03/18/2018 11:13 PM, Sam McLeod wrote:
> Howdy all,
> 
> We're experiencing terrible small file performance when copying or
> moving files on gluster clients.
> 
> In the example below, Gluster is taking 6mins~ to copy 128MB / 21,000
> files sideways on a client, doing the same thing on NFS (which I know is
> a totally different solution etc. etc.) takes approximately 10-15
> seconds(!).
> 
> Any advice for tuning the volume or XFS settings would be greatly
> appreciated.
> 
> Hopefully I've included enough relevant information below.
> 
> 
> ## Gluster Client
> 
> root@gluster-client:/mnt/gluster_perf_test/  # du -sh .
> 127M    .
> root@gluster-client:/mnt/gluster_perf_test/  # find . -type f | wc -l
> 21791
> root@gluster-client:/mnt/gluster_perf_test/  # du 9584toto9584.txt
> 4    9584toto9584.txt
> 
> 
> root@gluster-client:/mnt/gluster_perf_test/  # time cp -a private
> private_perf_test
> 
> real    5m51.862s
> user    0m0.862s
> sys    0m8.334s
> 
> root@gluster-client:/mnt/gluster_perf_test/ # time rm -rf private_perf_test/
> 
> real    0m49.702s
> user    0m0.087s
> sys    0m0.958s
> 
> 
> ## Hosts
> 
> - 16x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz per Gluster host / client
> - Storage: iSCSI provisioned (via 10Gbit DAC/Fibre), SSD disk, 50K R/RW
> 4k IOP/s, 400MB/s per Gluster host
> - Volumes are replicated across two hosts and one arbiter only host
> - Networking is 10Gbit DAC/Fibre between Gluster hosts and clients
> - 18GB DDR4 ECC memory
> 
> ## Volume Info
> 
> root@gluster-host-01:~ # gluster pool list
> UUID          Hostname                        State
> ad02970b-e2aa-4ca8-998c-bd10d5970faa  gluster-host-02.fqdn Connected
> ea116a94-c19e-48db-b108-0be3ae622e2e  gluster-host-03.fqdn Connected
> 2e855c25-e7ac-4ff6-be85-e8bcc6f45ee4  localhost             Connected
> 
> root@gluster-host-01:~ # gluster volume info uat_storage
> 
> Volume Name: uat_storage
> Type: Replicate
> Volume ID: 7918f1c5-5031-47b8-b054-56f6f0c569a2
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: gluster-host-01.fqdn:/mnt/gluster-storage/uat_storage
> Brick2: gluster-host-02.fqdn:/mnt/gluster-storage/uat_storage
> Brick3: gluster-host-03.fqdn:/mnt/gluster-storage/uat_storage (arbiter)
> Options Reconfigured:
> performance.rda-cache-limit: 256MB
> network.inode-lru-limit: 5
> server.outstanding-rpc-limit: 256
> performance.client-io-threads: true
> nfs.disable: on
> transport.address-family: inet
> client.event-threads: 8
> cluster.eager-lock: true
> cluster.favorite-child-policy: size
> cluster.lookup-optimize: true
> cluster.readdir-optimize: true
> cluster.use-compound-fops: true
> diagnostics.brick-log-level: ERROR
> diagnostics.client-log-level: ERROR
> features.cache-invalidation-timeout: 600
> features.cache-invalidation: true
> network.ping-timeout: 15
> performance.cache-invalidation: true
> performance.cache-max-file-size: 6MB
> performance.cache-refresh-timeout: 60
> performance.cache-size: 1024MB
> performance.io-thread-count: 16
> performance.md-cache-timeout: 600
> performance.stat-prefetch: true
> performance.write-behind-window-size: 256MB
> server.event-threads: 8
> transport.listen-backlog: 2048
> 
> root@gluster-host-01:~ # xfs_info /dev/mapper/gluster-storage-unlocked
> meta-data=/dev/mapper/gluster-storage-unlocked isize=512    agcount=4,
> agsize=196607360 blks
>          =                       sectsz=512   attr=2, projid32bit=1
>          =                       crc=1        finobt=0 spinodes=0
> data     =                       bsize=4096   blocks=786429440, imaxpct=5
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
> log      =internal               bsize=4096   blocks=383998, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> 
> --
> Sam McLeod (protoporpoise on IRC)
> 

Re: [Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

2018-03-18 Thread Sam McLeod
Hi Tom,

Thanks for your reply.

1. Yes XFS is on a LUKs LV (see below).
2. Yes, I prefer FIO but each Gluster host gets between 50-100K 4K random IOP/s 
both write and read to disk.
3. Yes, we actually use 2x 10Gbit DACs in LACP, but we get full 10Gbit speeds 
(and very low latency thanks to the DACs).
4. I'd love to see that, it'd be much appreciated thanks.



# lsblk
NAME                           MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
xvdc                           202:32   0  1.5T  0 disk
└─xvdc1                        202:33   0  1.5T  0 part
  └─gluster-storage            253:1    0    3T  0 lvm
    └─gluster-storage-unlocked 253:3    0    3T  0 crypt /mnt/gluster-storage
xvda                           202:0    0   18G  0 disk
├─xvda2                        202:2    0 17.5G  0 part
│ ├─centos-var                 253:2    0  9.5G  0 lvm   /var
│ └─centos-root                253:0    0    8G  0 lvm   /
└─xvda1                        202:1    0  500M  0 part  /boot
sr0                             11:0    1 1024M  0 rom
xvdb                           202:16   0  1.5T  0 disk
└─xvdb1                        202:17   0  1.5T  0 part
  └─gluster-storage            253:1    0    3T  0 lvm
    └─gluster-storage-unlocked 253:3    0    3T  0 crypt /mnt/gluster-storage

--
Sam McLeod
Please respond via email when possible.
https://smcleod.net
https://twitter.com/s_mcleod

> On 19 Mar 2018, at 10:37 am, TomK  wrote:
> 
> On 3/18/2018 6:13 PM, Sam McLeod wrote:
> Even your NFS transfers are 12.5 or so MB per second or less.
> 
> 1) Did you use fdisk and LVM under that XFS filesystem?
> 
> 2) Did you benchmark the XFS with something like bonnie++?  (There are probably 
> newer benchmark suites now.)
> 
> 3) Did you benchmark your Network transfer speeds?  Perhaps your NIC 
> negotiated a lower speed.
> 
> 4) I've done XFS tuning for another purpose and got good results.  If it 
> helps, I can send you the doc.
> 
> Cheers,
> Tom
> 
>> Howdy all,
>> We're experiencing terrible small file performance when copying or moving 
>> files on gluster clients.
>> In the example below, Gluster is taking 6mins~ to copy 128MB / 21,000 files 
>> sideways on a client, doing the same thing on NFS (which I know is a totally 
>> different solution etc. etc.) takes approximately 10-15 seconds(!).
>> Any advice for tuning the volume or XFS settings would be greatly 
>> appreciated.
>> Hopefully I've included enough relevant information below.
>> ## Gluster Client
>> root@gluster-client:/mnt/gluster_perf_test/  # du -sh .
>> 127M    .
>> root@gluster-client:/mnt/gluster_perf_test/  # find . -type f | wc -l
>> 21791
>> root@gluster-client:/mnt/gluster_perf_test/  # du 9584toto9584.txt
>> 4    9584toto9584.txt
>> root@gluster-client:/mnt/gluster_perf_test/  # time cp -a private 
>> private_perf_test
>> real    5m51.862s
>> user    0m0.862s
>> sys     0m8.334s
>> root@gluster-client:/mnt/gluster_perf_test/ # time rm -rf private_perf_test/
>> real    0m49.702s
>> user    0m0.087s
>> sys     0m0.958s
>> ## Hosts
>> - 16x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz per Gluster host / client
>> - Storage: iSCSI provisioned (via 10Gbit DAC/Fibre), SSD disk, 50K R/RW 4k 
>> IOP/s, 400MB/s per Gluster host
>> - Volumes are replicated across two hosts and one arbiter only host
>> - Networking is 10Gbit DAC/Fibre between Gluster hosts and clients
>> - 18GB DDR4 ECC memory
>> ## Volume Info
>> root@gluster-host-01:~ # gluster pool list
>> UUID                                  Hostname              State
>> ad02970b-e2aa-4ca8-998c-bd10d5970faa  gluster-host-02.fqdn  Connected
>> ea116a94-c19e-48db-b108-0be3ae622e2e  gluster-host-03.fqdn  Connected
>> 2e855c25-e7ac-4ff6-be85-e8bcc6f45ee4  localhost             Connected
>> root@gluster-host-01:~ # gluster volume info uat_storage
>> Volume Name: uat_storage
>> Type: Replicate
>> Volume ID: 7918f1c5-5031-47b8-b054-56f6f0c569a2
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: gluster-host-01.fqdn:/mnt/gluster-storage/uat_storage
>> Brick2: gluster-host-02.fqdn:/mnt/gluster-storage/uat_storage
>> Brick3: gluster-host-03.fqdn:/mnt/gluster-storage/uat_storage (arbiter)
>> Options Reconfigured:
>> performance.rda-cache-limit: 256MB
>> network.inode-lru-limit: 5
>> server.outstanding-rpc-limit: 256
>> performance.client-io-threads: true
>> nfs.disable: on
>> transport.address-family: inet
>> client.event-threads: 8
>> cluster.eager-lock: true
>> cluster.favorite-child-policy: size
>> cluster.lookup-optimize: true
>> cluster.readdir-optimize: true
>> cluster.use-compound-fops: true
>> diagnostics.brick-log-level: ERROR
>> diagnostics.client-log-level: ERROR
>> features.cache-invalidation-timeout: 600
>> features.cache-invalidation: true
>> network.ping-timeout: 15
>> performance.cache-invalidation: true
>> performance.cache-max-file-size: 6MB
>> performance.cache-refresh-timeout: 60
>> performance.cache-size: 1024MB
>> 

Re: [Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

2018-03-18 Thread TomK

On 3/18/2018 6:13 PM, Sam McLeod wrote:
Even your NFS transfers are 12.5 or so MB per second or less.

1) Did you use fdisk and LVM under that XFS filesystem?

2) Did you benchmark the XFS with something like bonnie++?  (There are 
probably newer benchmark suites now.)


3) Did you benchmark your Network transfer speeds?  Perhaps your NIC 
negotiated a lower speed.


4) I've done XFS tuning for another purpose and got good results.  If it 
helps, I can send you the doc.
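On point 2: fio, which Sam used earlier in this thread, covers the same ground as bonnie++ and is actively maintained. A hypothetical job file (job name and test-file name are placeholders) mirroring his 4k random-write command-line run:

```ini
[randwrite-4k]
ioengine=libaio
rw=randwrite
bs=4k
iodepth=32
size=256M
filename=fio-testfile
randrepeat=1
gtod_reduce=1
```

Running it once on the brick filesystem and once on the gluster mount gives a like-for-like IOPS comparison.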


Cheers,
Tom


Howdy all,

We're experiencing terrible small file performance when copying or 
moving files on gluster clients.


In the example below, Gluster is taking 6mins~ to copy 128MB / 21,000 
files sideways on a client, doing the same thing on NFS (which I know is 
a totally different solution etc. etc.) takes approximately 10-15 
seconds(!).


Any advice for tuning the volume or XFS settings would be greatly 
appreciated.


Hopefully I've included enough relevant information below.


## Gluster Client

root@gluster-client:/mnt/gluster_perf_test/  # du -sh .
127M    .
root@gluster-client:/mnt/gluster_perf_test/  # find . -type f | wc -l
21791
root@gluster-client:/mnt/gluster_perf_test/  # du 9584toto9584.txt
4    9584toto9584.txt


root@gluster-client:/mnt/gluster_perf_test/  # time cp -a private 
private_perf_test


real    5m51.862s
user    0m0.862s
sys    0m8.334s

root@gluster-client:/mnt/gluster_perf_test/ # time rm -rf private_perf_test/

real    0m49.702s
user    0m0.087s
sys    0m0.958s


## Hosts

- 16x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz per Gluster host / client
- Storage: iSCSI provisioned (via 10Gbit DAC/Fibre), SSD disk, 50K R/RW 
4k IOP/s, 400MB/s per Gluster host

- Volumes are replicated across two hosts and one arbiter only host
- Networking is 10Gbit DAC/Fibre between Gluster hosts and clients
- 18GB DDR4 ECC memory

## Volume Info

root@gluster-host-01:~ # gluster pool list
UUID          Hostname                        State
ad02970b-e2aa-4ca8-998c-bd10d5970faa  gluster-host-02.fqdn Connected
ea116a94-c19e-48db-b108-0be3ae622e2e  gluster-host-03.fqdn Connected
2e855c25-e7ac-4ff6-be85-e8bcc6f45ee4  localhost   
Connected


root@gluster-host-01:~ # gluster volume info uat_storage

Volume Name: uat_storage
Type: Replicate
Volume ID: 7918f1c5-5031-47b8-b054-56f6f0c569a2
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster-host-01.fqdn:/mnt/gluster-storage/uat_storage
Brick2: gluster-host-02.fqdn:/mnt/gluster-storage/uat_storage
Brick3: gluster-host-03.fqdn:/mnt/gluster-storage/uat_storage (arbiter)
Options Reconfigured:
performance.rda-cache-limit: 256MB
network.inode-lru-limit: 5
server.outstanding-rpc-limit: 256
performance.client-io-threads: true
nfs.disable: on
transport.address-family: inet
client.event-threads: 8
cluster.eager-lock: true
cluster.favorite-child-policy: size
cluster.lookup-optimize: true
cluster.readdir-optimize: true
cluster.use-compound-fops: true
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
features.cache-invalidation-timeout: 600
features.cache-invalidation: true
network.ping-timeout: 15
performance.cache-invalidation: true
performance.cache-max-file-size: 6MB
performance.cache-refresh-timeout: 60
performance.cache-size: 1024MB
performance.io-thread-count: 16
performance.md-cache-timeout: 600
performance.stat-prefetch: true
performance.write-behind-window-size: 256MB
server.event-threads: 8
transport.listen-backlog: 2048

root@gluster-host-01:~ # xfs_info /dev/mapper/gluster-storage-unlocked
meta-data=/dev/mapper/gluster-storage-unlocked isize=512    agcount=4, 
agsize=196607360 blks

          =                       sectsz=512   attr=2, projid32bit=1
          =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=786429440, imaxpct=5
          =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=383998, version=2
          =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


--
Sam McLeod (protoporpoise on IRC)
https://smcleod.net
https://twitter.com/s_mcleod

Words are my own opinions and do not necessarily represent those of 
my employer or partners.




___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users




--
Cheers,
Tom K.
-

Living on earth is expensive, but it includes a free trip around the sun.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] Gluster very poor performance when copying small files (1x (2+1) = 3, SSD)

2018-03-18 Thread Sam McLeod
Howdy all,

We're experiencing terrible small file performance when copying or moving files 
on gluster clients.

In the example below, Gluster is taking 6mins~ to copy 128MB / 21,000 files 
sideways on a client, doing the same thing on NFS (which I know is a totally 
different solution etc. etc.) takes approximately 10-15 seconds(!).

Any advice for tuning the volume or XFS settings would be greatly appreciated.

Hopefully I've included enough relevant information below.


## Gluster Client

root@gluster-client:/mnt/gluster_perf_test/  # du -sh .
127M    .
root@gluster-client:/mnt/gluster_perf_test/  # find . -type f | wc -l
21791
root@gluster-client:/mnt/gluster_perf_test/  # du 9584toto9584.txt
4    9584toto9584.txt


root@gluster-client:/mnt/gluster_perf_test/  # time cp -a private 
private_perf_test

real    5m51.862s
user    0m0.862s
sys     0m8.334s

root@gluster-client:/mnt/gluster_perf_test/ # time rm -rf private_perf_test/

real    0m49.702s
user    0m0.087s
sys     0m0.958s
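For anyone wanting to reproduce this on their own volume, a hypothetical seed script (path and file count are placeholders; the real tree above held 21,791 files) that builds a comparable directory of small 4 KiB files:

```shell
# Create 1,000 files of 4 KiB each. Timing 'cp -a' over this tree on a
# gluster mount vs. a local or NFS mount exposes the per-file latency
# difference directly, since per-file overhead dominates at this size.
mkdir -p /tmp/perf_seed
for i in $(seq 1 1000); do
    head -c 4096 /dev/zero > "/tmp/perf_seed/file_$i"
done
find /tmp/perf_seed -type f | wc -l
```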


## Hosts

- 16x Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz per Gluster host / client
- Storage: iSCSI provisioned (via 10Gbit DAC/Fibre), SSD disk, 50K R/RW 4k 
IOP/s, 400MB/s per Gluster host
- Volumes are replicated across two hosts and one arbiter only host
- Networking is 10Gbit DAC/Fibre between Gluster hosts and clients
- 18GB DDR4 ECC memory

## Volume Info

root@gluster-host-01:~ # gluster pool list
UUID                                  Hostname              State
ad02970b-e2aa-4ca8-998c-bd10d5970faa  gluster-host-02.fqdn  Connected
ea116a94-c19e-48db-b108-0be3ae622e2e  gluster-host-03.fqdn  Connected
2e855c25-e7ac-4ff6-be85-e8bcc6f45ee4  localhost             Connected

root@gluster-host-01:~ # gluster volume info uat_storage

Volume Name: uat_storage
Type: Replicate
Volume ID: 7918f1c5-5031-47b8-b054-56f6f0c569a2
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: gluster-host-01.fqdn:/mnt/gluster-storage/uat_storage
Brick2: gluster-host-02.fqdn:/mnt/gluster-storage/uat_storage
Brick3: gluster-host-03.fqdn:/mnt/gluster-storage/uat_storage (arbiter)
Options Reconfigured:
performance.rda-cache-limit: 256MB
network.inode-lru-limit: 5
server.outstanding-rpc-limit: 256
performance.client-io-threads: true
nfs.disable: on
transport.address-family: inet
client.event-threads: 8
cluster.eager-lock: true
cluster.favorite-child-policy: size
cluster.lookup-optimize: true
cluster.readdir-optimize: true
cluster.use-compound-fops: true
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
features.cache-invalidation-timeout: 600
features.cache-invalidation: true
network.ping-timeout: 15
performance.cache-invalidation: true
performance.cache-max-file-size: 6MB
performance.cache-refresh-timeout: 60
performance.cache-size: 1024MB
performance.io-thread-count: 16
performance.md-cache-timeout: 600
performance.stat-prefetch: true
performance.write-behind-window-size: 256MB
server.event-threads: 8
transport.listen-backlog: 2048
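Not a verified fix, but for metadata-heavy small-file workloads two options absent from the list above come up regularly on this list; a hedged sketch using the volume name from this post (availability depends on the gluster version in use, so check `gluster volume set help` first):

```shell
# Suggestions only -- benchmark before and after enabling:
# parallel-readdir fans directory listing out across subvolumes,
# nl-cache caches negative lookups (misses), which small-file
# create/copy workloads generate in large numbers.
gluster volume set uat_storage performance.parallel-readdir on
gluster volume set uat_storage performance.nl-cache on
```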

root@gluster-host-01:~ # xfs_info /dev/mapper/gluster-storage-unlocked
meta-data=/dev/mapper/gluster-storage-unlocked isize=512    agcount=4, 
agsize=196607360 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=786429440, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=8192   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=383998, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


--
Sam McLeod (protoporpoise on IRC)
https://smcleod.net
https://twitter.com/s_mcleod

Words are my own opinions and do not necessarily represent those of my employer 
or partners.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users