Re: [Gluster-users] Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first

2018-04-18 Thread Nithya Balachandran
Hi Artem,

I have not looked into this, but what Amar said seems likely. Including
Poornima in this thread.

Regards,
Nithya


Re: [Gluster-users] Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first

2018-04-17 Thread Artem Russakovskii
Nithya, Amar,

Any movement here? There could be a significant performance gain to be had,
which may also help with other bottlenecks I'm experiencing that make
gluster close to unusable at times.


Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC
beerpla.net | +ArtemRussakovskii | @ArtemR


Re: [Gluster-users] Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first

2018-02-27 Thread Ingard Mevåg
We got extremely slow stat calls on our disperse cluster running the latest
3.12, with clients also running 3.12.
When we downgraded the clients to 3.10 the slow stat problem went away.

We later found out that by disabling disperse.eager-lock we could run the
3.12 clients without much issue (only slightly slower writes).
There is an open issue on this here:
https://bugzilla.redhat.com/show_bug.cgi?id=1546732
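
For anyone hitting the same thing, a minimal sketch of the workaround via the
CLI (the volume name "myvol" is a placeholder; substitute your own):

# Check the current setting, then disable eager-lock on the dispersed volume
gluster volume get myvol disperse.eager-lock
gluster volume set myvol disperse.eager-lock off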

Re: [Gluster-users] Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first

2018-02-27 Thread Artem Russakovskii
Any updates on this one?


Re: [Gluster-users] Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first

2018-02-05 Thread Tom Fite
Hi all,

I have seen this issue as well, on Gluster 3.12.1 (3 bricks per box, 2
boxes, distributed-replicate). My testing shows the same thing -- running a
find on a directory dramatically increases lstat performance. To add
another clue, the performance degrades again after issuing a call to reset
the system's cache of dentries and inodes:

# sync; echo 2 > /proc/sys/vm/drop_caches

I think this shows that it's the system cache that's actually doing
the heavy lifting here. There are a couple of sysctl tunables that I've
found help with this.

See here:

http://docs.gluster.org/en/latest/Administrator%20Guide/Linux%20Kernel%20Tuning/

Contrary to what that doc says, I've found that setting
vm.vfs_cache_pressure to a low value increases performance by allowing more
dentries and inodes to be retained in the cache.

# Set the swappiness to avoid swap when possible.
vm.swappiness = 10

# Set the cache pressure to prefer inode and dentry cache over file cache.
# This is done to keep as many dentries and inodes in cache as possible,
# which dramatically improves gluster small-file performance.
vm.vfs_cache_pressure = 25
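
A minimal sketch of applying these, assuming a distro that reads
/etc/sysctl.d/ (the file name below is arbitrary):

# Apply at runtime
sysctl -w vm.swappiness=10
sysctl -w vm.vfs_cache_pressure=25

# Persist across reboots
cat <<'EOF' > /etc/sysctl.d/90-gluster-tuning.conf
vm.swappiness = 10
vm.vfs_cache_pressure = 25
EOF
sysctl --system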

For comparison, my config is:

Volume Name: gv0
Type: Tier
Volume ID: d490a9ec-f9c8-4f10-a7f3-e1b6d3ced196
Status: Started
Snapshot Count: 13
Number of Bricks: 8
Transport-type: tcp
Hot Tier :
Hot Tier Type : Replicate
Number of Bricks: 1 x 2 = 2
Brick1: gluster2:/data/hot_tier/gv0
Brick2: gluster1:/data/hot_tier/gv0
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6
Brick3: gluster1:/data/brick1/gv0
Brick4: gluster2:/data/brick1/gv0
Brick5: gluster1:/data/brick2/gv0
Brick6: gluster2:/data/brick2/gv0
Brick7: gluster1:/data/brick3/gv0
Brick8: gluster2:/data/brick3/gv0
Options Reconfigured:
performance.cache-max-file-size: 128MB
cluster.readdir-optimize: on
cluster.watermark-hi: 95
features.ctr-sql-db-cachesize: 262144
cluster.read-freq-threshold: 5
cluster.write-freq-threshold: 2
features.record-counters: on
cluster.tier-promote-frequency: 15000
cluster.tier-pause: off
cluster.tier-compact: on
cluster.tier-mode: cache
features.ctr-enabled: on
performance.cache-refresh-timeout: 60
performance.stat-prefetch: on
server.outstanding-rpc-limit: 2056
cluster.lookup-optimize: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
features.barrier: disable
client.event-threads: 4
server.event-threads: 4
performance.cache-size: 1GB
network.inode-lru-limit: 9
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
performance.quick-read: on
performance.io-cache: on
performance.nfs.write-behind-window-size: 4MB
performance.write-behind-window-size: 4MB
performance.nfs.io-threads: off
network.tcp-window-size: 1048576
performance.rda-cache-limit: 64MB
performance.flush-behind: on
server.allow-insecure: on
cluster.tier-demote-frequency: 18000
cluster.tier-max-files: 100
cluster.tier-max-promote-file-size: 10485760
cluster.tier-max-mb: 64000
features.ctr-sql-db-wal-autocheckpoint: 2500
cluster.tier-hot-compact-frequency: 86400
cluster.tier-cold-compact-frequency: 86400
performance.readdir-ahead: off
cluster.watermark-low: 50
storage.build-pgfid: on
performance.rda-request-size: 128KB
performance.rda-low-wmark: 4KB
cluster.min-free-disk: 5%
auto-delete: enable
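
(For reference, a dump like the one above can be reproduced on another setup
with something along these lines, assuming the same volume name gv0:)

gluster volume info gv0      # type, bricks and reconfigured options
gluster volume get gv0 all   # every option, including defaults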



Re: [Gluster-users] Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first

2018-02-04 Thread Amar Tumballi
Thanks for the report, Artem.

It looks like the issue is about cache warm-up. Specifically, I suspect rsync
is doing a 'readdir(), stat(), file operations' loop, whereas when a find or
ls is issued we get a 'readdirp()' request, which returns the stat
information along with the entries and also makes sure the cache is up to
date (at the md-cache layer).

Note that this is just an off-the-top-of-my-head hypothesis; we surely need
to analyse and debug more thoroughly for a proper explanation. Someone on my
team will look at it soon.
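
One rough way to check this from the client side would be something like the
following (the volume name "gluster" and the paths are taken from Artem's
volume config and logs elsewhere in this thread; rsync runs with -n so
nothing is copied):

# Count the file-metadata syscalls rsync issues against the mount (dry run)
strace -f -c -e trace=file rsync -aPrn /srv/www/htdocs/uploads/2017/08/ /mnt/gluster/uploads/2017/08/

# Compare the FOPs gluster sees when find walks the same directory
gluster volume profile gluster start
find /mnt/gluster/uploads/2017/08 > /dev/null
gluster volume profile gluster info | grep -iE 'readdirp|lookup|stat'
gluster volume profile gluster stop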

Regards,
Amar




-- 
Amar Tumballi (amarts)

Re: [Gluster-users] Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first

2018-02-04 Thread Vlad Kopylov
Are you mounting it on the local bricks?

I'm struggling with the same performance issues. Try the volume settings
from this thread:
http://lists.gluster.org/pipermail/gluster-users/2018-January/033397.html
performance.stat-prefetch: on might be it.

It seems like once it gets into the cache it is fast - the stat fetches
that seem to come from .gluster are slow.
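
A sketch of what turning these on via the CLI looks like (option names taken
from the configs posted elsewhere in this thread; "myvol" is a placeholder
and the values are only examples, not recommendations):

gluster volume set myvol performance.stat-prefetch on
gluster volume set myvol performance.md-cache-timeout 600
gluster volume set myvol performance.cache-invalidation on
gluster volume set myvol features.cache-invalidation on
gluster volume set myvol features.cache-invalidation-timeout 600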



Re: [Gluster-users] Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first

2018-02-04 Thread Artem Russakovskii
An update, and a very interesting one!

After I started stracing rsync, all I could see was lstat calls, quite slow
ones, over and over, which is expected.

For example: lstat("uploads/2016/10/nexus2cee_DSC05339_thumb-161x107.jpg",
{st_mode=S_IFREG|0664, st_size=4043, ...}) = 0
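
(The exact strace invocation isn't shown here; something along these lines,
with -T for per-call latency, is enough to see it. Paths are the ones from my
original mail.)

# -T appends the time spent in each syscall, -f follows rsync's children
strace -T -f -o /tmp/rsync-trace.log \
    rsync -aPr /srv/www/htdocs/uploads/2017/08/ /mnt/gluster/uploads/2017/08/
grep lstat /tmp/rsync-trace.log | tail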

I googled around and found
https://gist.github.com/nh2/1836415489e2132cf85ed3832105fcc1, which describes
this exact issue with gluster, rsync and xfs.

Here's the craziest finding so far. If while rsync is running (or right
before), I run /bin/ls or find on the same gluster dirs, it immediately
speeds up rsync by a factor of 100 or maybe even 1000. It's absolutely
insane.

I'm stracing the rsync run, and the previously slow lstat calls start
flooding in at an incredible speed as soon as ls or find runs. Several
hundred files per minute (excruciatingly slow) become thousands or even
tens of thousands of files per second.

What do you make of this?

[Gluster-users] Very slow rsync to gluster volume UNLESS `ls` or `find` scan dir on gluster volume first

2018-02-04 Thread Artem Russakovskii
Hi,

I have been working on setting up a 4-replica gluster volume with over a
million files (~250GB total), and I've seen some really weird stuff happen,
even after trying to optimize for small files. The volume is a 4-brick
replicate volume (gluster 3.13.2).

It took almost 2 days to rsync the data from the local drive to the gluster
volume, and now I'm running a 2nd rsync that just looks for changes in case
more files have been written. I'd like to concentrate this email on a very
specific and odd issue.


The dir structure is:
  uploads/
    YYYY/
      MM/
        10k+ files in each month folder


rsyncing each month folder cold can take 2+ minutes.

However, if I ls the destination folder first, or use find (both of which
finish within 5 seconds), the rsync is almost instant.


Here's a log with time calls that shows what happens:

box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/08/ 08/
sending incremental file list
^Crsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(637) [sender=3.1.0]

real    1m39.848s
user    0m0.010s
sys     0m0.030s
box:/mnt/gluster/uploads/2017 # time find 08 | wc -l
14254

real    0m0.726s
user    0m0.013s
sys     0m0.033s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/08/ 08/
sending incremental file list

real    0m0.562s
user    0m0.057s
sys     0m0.137s
box:/mnt/gluster/uploads/2017 # time find 07 | wc -l
10103

real    0m4.550s
user    0m0.010s
sys     0m0.033s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/07/ 07/
sending incremental file list

real    0m0.428s
user    0m0.030s
sys     0m0.083s
box:/mnt/gluster/uploads/2017 # time ls 06 | wc -l
11890

real    0m1.850s
user    0m0.077s
sys     0m0.040s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/06/ 06/
sending incremental file list

real    0m0.627s
user    0m0.073s
sys     0m0.107s
box:/mnt/gluster/uploads/2017 # time rsync -aPr /srv/www/htdocs/uploads/2017/05/ 05/
sending incremental file list

real    2m24.382s
user    0m0.127s
sys     0m0.357s


Note how if I precede the rsync call with ls or find, the rsync completes
in less than a second (finding no files to sync because they've already
been synced). Otherwise, it takes over 2 minutes (I interrupted the first
call before the 2 minutes because it was already taking too long).

What could be causing rsync to work so slowly unless the dir is primed?
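
In the meantime, one workaround the timings above suggest is to prime each
directory with find right before rsyncing it; a rough sketch (untested as a
general fix, paths follow the uploads/YYYY/MM layout above):

# Walk each month dir on the gluster mount first, then rsync into it
for dir in /mnt/gluster/uploads/2017/*/; do
    month=$(basename "$dir")
    find "$dir" > /dev/null    # priming with find warms the stat/dentry caches
    rsync -aPr "/srv/www/htdocs/uploads/2017/$month/" "$dir"
done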

Volume config:

Volume Name: gluster
Type: Replicate
Volume ID: X
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: server1:/mnt/server1_block4/gluster
Brick2: server2:/mnt/server2_block4/gluster
Brick3: server3:/mnt/server3_block4/gluster
Brick4: server4:/mnt/server4_block4/gluster
Options Reconfigured:
performance.parallel-readdir: off
transport.address-family: inet
nfs.disable: on
cluster.self-heal-daemon: enable
performance.cache-size: 1GB
network.ping-timeout: 5
cluster.quorum-type: fixed
cluster.quorum-count: 1
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 50
performance.rda-cache-limit: 256MB
performance.read-ahead: off
client.event-threads: 4
server.event-threads: 4



Thank you for any insight.

Sincerely,
Artem
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users