Re: [Gluster-users] Slow write times to gluster disk

2018-07-05 Thread Raghavendra Gowdappa
On Fri, Jul 6, 2018 at 5:29 AM, Pat Haley  wrote:

>
> Hi Raghavendra,
>
> Our technician may have some time to look at this issue tomorrow.  Are
> there any tests that you'd like to see?
>

Sorry. I've been busy with other things and was away from work for a couple
of days. It'll take me another 2 days to work on this issue again. So, most
likely you'll have an update on this next week.


> Thanks
>
> Pat
>
>
>
> On 06/29/2018 11:25 PM, Raghavendra Gowdappa wrote:
>
>
>
> On Fri, Jun 29, 2018 at 10:38 PM, Pat Haley  wrote:
>
>>
>> Hi Raghavendra,
>>
>> We ran the tests (write tests) and I copied the log files for both the
>> server and the client to http://mseas.mit.edu/download/
>> phaley/GlusterUsers/2018/Jun29/ .  Is there any additional trace
>> information you need?  (If so, where should I look for it?)
>>
>
> Nothing for now. I can see from the logs that the workaround is not helping:
> fstat requests are not absorbed by md-cache, so read-ahead sees them and
> flushes its read-ahead cache. I am investigating md-cache further (it also
> seems to be invalidating inodes quite frequently, which might actually be
> the root cause of seeing so many fstat requests from the kernel). Will post
> when I find anything relevant.
>
>
>> Also the volume information you requested
>>
>> [root@mseas-data2 ~]# gluster volume info data-volume
>>
>> Volume Name: data-volume
>> Type: Distribute
>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: mseas-data2:/mnt/brick1
>> Brick2: mseas-data2:/mnt/brick2
>> Options Reconfigured:
>> diagnostics.client-log-level: TRACE
>> network.inode-lru-limit: 50000
>> performance.md-cache-timeout: 60
>> performance.open-behind: off
>> disperse.eager-lock: off
>> auth.allow: *
>> server.allow-insecure: on
>> nfs.exports-auth-enable: on
>> diagnostics.brick-sys-log-level: WARNING
>> performance.readdir-ahead: on
>> nfs.disable: on
>> nfs.export-volumes: off
>> [root@mseas-data2 ~]#
>>
>>
>> On 06/29/2018 12:28 PM, Raghavendra Gowdappa wrote:
>>
>>
>>
>> On Fri, Jun 29, 2018 at 8:24 PM, Pat Haley  wrote:
>>
>>>
>>> Hi Raghavendra,
>>>
>>> Our technician was able to try the manual setting today.  He found that
>>> our upper limit for performance.md-cache-timeout was 60, not 600, so he
>>> used that value, along with network.inode-lru-limit=50000.
>>>
>>> The result was another small (~1%) increase in speed.  Does this suggest
>>> any additional tests/changes we could try?
>>>
>>
>> Can you set gluster option diagnostics.client-log-level to TRACE  and run
>> sequential read tests again (with md-cache-timeout value of 60)?
>>
>> #gluster volume set  diagnostics.client-log-level TRACE
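>>
>> TRACE is very verbose, so once the runs are captured you may want to return
>> the option to its default. A minimal sketch, assuming the volume name
>> data-volume from the info output below:
>>
>> # gluster volume reset data-volume diagnostics.client-log-level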
>>
>> Also are you sure that open-behind was turned off? Can you give the
>> output of,
>>
>> # gluster volume info 
>>
>>
>>> Thanks
>>>
>>> Pat
>>>
>>>
>>>
>>>
>>> On 06/25/2018 09:39 PM, Raghavendra Gowdappa wrote:
>>>
>>>
>>>
>>> On Tue, Jun 26, 2018 at 3:21 AM, Pat Haley  wrote:
>>>

 Hi Raghavendra,

 Setting performance.write-behind off had a small improvement on the
 write speed (~3%).

 We were unable to turn on "group metadata-cache".  When we try, we get
 errors like

 # gluster volume set data-volume group metadata-cache
 '/var/lib/glusterd/groups/metadata-cache' file format not valid.
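
 One check we could run on our side, assuming the standard glusterd state
 directory (the same path the error names), is whether that group file
 exists and what it contains:

 # ls /var/lib/glusterd/groups/
 # cat /var/lib/glusterd/groups/metadata-cache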

 Was metadata-cache available for gluster 3.7.11? We ask because the
 release notes for 3.11 mention “Feature for metadata-caching/small file
 performance is production ready.” (https://gluster.readthedocs.io/en/latest/release-notes/3.11.0/).

 Do any of these results suggest anything?  If not, what further tests
 would be useful?

>>>
>>> Group metadata-cache is just a bunch of options one sets on a volume,
>>> so you can set them manually using the gluster cli. Following are the
>>> options and their values:
>>>
>>> performance.md-cache-timeout=600
>>> network.inode-lru-limit=50000
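>>>
>>> A minimal sketch of setting these via the cli, assuming the volume name
>>> data-volume used earlier in this thread:
>>>
>>> # gluster volume set data-volume performance.md-cache-timeout 600
>>> # gluster volume set data-volume network.inode-lru-limit 50000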
>>>
>>>
>>>
 Thanks

 Pat




 On 06/22/2018 07:51 AM, Raghavendra Gowdappa wrote:



 On Thu, Jun 21, 2018 at 8:41 PM, Pat Haley  wrote:

>
> Hi Raghavendra,
>
> Thanks for the suggestions.  Our technician will be in on Monday.
> We'll test then and let you know the results.
>
> One question I have: is the "group metadata-cache" option supposed to
> directly impact performance, or is it to help collect data?  If the
> latter, where will the data be located?
>

 It impacts performance.


> Thanks again.
>
> Pat
>
>
>
> On 06/21/2018 01:01 AM, Raghavendra Gowdappa wrote:
>
>
>
> On Thu, Jun 21, 2018 at 10:24 AM, Raghavendra Gowdappa <
> rgowd...@redhat.com> wrote:
>
>> For the case of writes to glusterfs mount,
>>
>> I saw in earlier conversations that there are too many lookups, but a
>> small number of 

Re: [Gluster-users] Slow write times to gluster disk

2018-07-05 Thread Pat Haley


Hi Raghavendra,

Our technician may have some time to look at this issue tomorrow. Are 
there any tests that you'd like to see?


Thanks

Pat


On 06/29/2018 11:25 PM, Raghavendra Gowdappa wrote:



On Fri, Jun 29, 2018 at 10:38 PM, Pat Haley  wrote:



Hi Raghavendra,

We ran the tests (write tests) and I copied the log files for both
the server and the client to
http://mseas.mit.edu/download/phaley/GlusterUsers/2018/Jun29/ .
Is there any additional trace information you need? (If so, where
should I look for it?)


Nothing for now. I can see from the logs that the workaround is not helping:
fstat requests are not absorbed by md-cache, so read-ahead sees them and
flushes its read-ahead cache. I am investigating md-cache further (it also
seems to be invalidating inodes quite frequently, which might actually be
the root cause of seeing so many fstat requests from the kernel). Will post
when I find anything relevant.



Also the volume information you requested

[root@mseas-data2 ~]# gluster volume info data-volume

Volume Name: data-volume
Type: Distribute
Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: mseas-data2:/mnt/brick1
Brick2: mseas-data2:/mnt/brick2
Options Reconfigured:
diagnostics.client-log-level: TRACE
network.inode-lru-limit: 50000
performance.md-cache-timeout: 60
performance.open-behind: off
disperse.eager-lock: off
auth.allow: *
server.allow-insecure: on
nfs.exports-auth-enable: on
diagnostics.brick-sys-log-level: WARNING
performance.readdir-ahead: on
nfs.disable: on
nfs.export-volumes: off
[root@mseas-data2 ~]#


On 06/29/2018 12:28 PM, Raghavendra Gowdappa wrote:



On Fri, Jun 29, 2018 at 8:24 PM, Pat Haley  wrote:


Hi Raghavendra,

Our technician was able to try the manual setting today.  He
found that our upper limit for performance.md-cache-timeout
was 60, not 600, so he used that value, along with
network.inode-lru-limit=50000.

The result was another small (~1%) increase in speed.  Does
this suggest any additional tests/changes we could try?


Can you set gluster option diagnostics.client-log-level to TRACE 
and run sequential read tests again (with md-cache-timeout value
of 60)?

#gluster volume set  diagnostics.client-log-level TRACE

Also are you sure that open-behind was turned off? Can you give
the output of,

# gluster volume info 


Thanks

Pat




On 06/25/2018 09:39 PM, Raghavendra Gowdappa wrote:



On Tue, Jun 26, 2018 at 3:21 AM, Pat Haley  wrote:


Hi Raghavendra,

Setting performance.write-behind off had a small
improvement on the write speed (~3%).

We were unable to turn on "group metadata-cache".  When
we try, we get errors like

# gluster volume set data-volume group metadata-cache
'/var/lib/glusterd/groups/metadata-cache' file format
not valid.

Was metadata-cache available for gluster 3.7.11? We ask
because the release notes for 3.11 mention “Feature for
metadata-caching/small file performance is production
ready.”
(https://gluster.readthedocs.io/en/latest/release-notes/3.11.0/).

Do any of these results suggest anything?  If not, what
further tests would be useful?


Group metadata-cache is just a bunch of options one sets on
a volume, so you can set them manually using the gluster
cli. Following are the options and their values:

performance.md-cache-timeout=600
network.inode-lru-limit=50000



Thanks

Pat




On 06/22/2018 07:51 AM, Raghavendra Gowdappa wrote:



On Thu, Jun 21, 2018 at 8:41 PM, Pat Haley  wrote:


Hi Raghavendra,

Thanks for the suggestions. Our technician will be
in on Monday.  We'll test then and let you know the
results.

One question I have: is the "group metadata-cache"
option supposed to directly impact performance,
or is it to help collect data? If the latter, where
will the data be located?


It impacts performance.


Thanks again.

Pat



On 06/21/2018 01:01 AM, Raghavendra Gowdappa wrote:



On Thu, Jun 21, 2018 at 10:24 AM, Raghavendra Gowdappa  wrote:

Re: [Gluster-users] Gluster 3.10.5: used disk size reported by quota and du mismatch

2018-07-05 Thread Mauro Tridici
Hi Sanoj,

unfortunately the output of the command execution was not helpful.

[root@s01 ~]# find /tier2/CSP/ans004  | xargs getfattr -d -m. -e hex
[root@s01 ~]# 

Do you have any other ideas about how to detect the cause of the issue?

Thank you again,
Mauro


> On 05 Jul 2018, at 09:08, Sanoj Unnikrishnan  wrote:
> 
> Hi Mauro,
> 
> A script issue meant that not all the necessary xattrs were captured.
> Could you provide the xattrs with:
> find /tier2/CSP/ans004  | xargs getfattr -d -m. -e hex
> 
> Meanwhile, if you are being impacted, you could do the following:
> - back up quota limits
> - disable quota
> - enable quota
> - freshly set the limits
> 
> Please capture the xattr values first, so that we can find out what went 
> wrong.
> Regards,
> Sanoj
> 
> 
> On Tue, Jul 3, 2018 at 4:09 PM, Mauro Tridici  wrote:
> Dear Sanoj,
> 
> thank you very much for your support.
> I just downloaded and executed the script you suggested.
> 
> This is the full command I executed:
> 
> ./quota_fsck_new.py --full-logs --sub-dir /tier2/CSP/ans004/ /gluster
> 
> Attached you can find the logs generated by the script.
> What can I do now?
> 
> Thank you very much for your patience.
> Mauro
> 
> 
> 
> 
>> On 03 Jul 2018, at 11:34, Sanoj Unnikrishnan  wrote:
>> 
>> Hi Mauro, 
>> 
>> This may be an issue with the update of backend xattrs.
>> To RCA further and provide a resolution, could you provide me with the logs by 
>> running the following fsck script:
>> https://review.gluster.org/#/c/19179/6/extras/quota/quota_fsck.py
>> 
>> Try running the script and reply with the logs generated.
>> 
>> Thanks,
>> Sanoj
>> 
>> 
>> On Mon, Jul 2, 2018 at 2:21 PM, Mauro Tridici  wrote:
>> Dear Users,
>> 
>> I just noticed that, after some data deletions executed inside the
>> "/tier2/CSP/ans004" folder, the amount of used disk reported by the quota 
>> command doesn’t match the value indicated by the du command.
>> Searching the web, it seems that this is a bug in previous versions of 
>> GlusterFS that was already fixed.
>> In my case, the problem unfortunately seems to still be here.
>> 
>> How can I solve this issue? Is it possible to do it without a 
>> downtime period?
>> 
>> Thank you very much in advance,
>> Mauro
>> 
>> [root@s01 ~]# glusterfs -V
>> glusterfs 3.10.5
>> Repository revision: git://git.gluster.org/glusterfs.git
>> Copyright (c) 2006-2016 Red Hat, Inc.
>> GlusterFS comes with ABSOLUTELY NO WARRANTY.
>> It is licensed to you under your choice of the GNU Lesser
>> General Public License, version 3 or any later version (LGPLv3
>> or later), or the GNU General Public License, version 2 (GPLv2),
>> in all cases as published by the Free Software Foundation.
>> 
>> [root@s01 ~]# gluster volume quota tier2 list /CSP/ans004
>>   Path         Hard-limit  Soft-limit     Used   Available  Soft-limit exceeded?  Hard-limit exceeded?
>>   ---------------------------------------------------------------------------------------------------
>>   /CSP/ans004  1.0TB       99%(1013.8GB)  3.9TB  0Bytes     Yes                   Yes
>> 
>> [root@s01 ~]# du -hs /tier2/CSP/ans004/
>> 295G /tier2/CSP/ans004/
>> 
>> 
>> 
>> 
>> 
> 


Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-07-05 Thread mabi
Dear Ravi,

Thank you for your mail and info. 

This is great news if these patches can make it into 3.12.12; I will then 
upgrade asap. Could anyone confirm in case these patches do not make it into 
3.12.12? In that case I would rather wait for the next release. I was already 
told on this list that 3.12.9 should have fixed this issue, but unfortunately it 
didn't.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐

On July 4, 2018 5:41 PM, Ravishankar N  wrote:

> Hi mabi, there are a couple of AFR patches from master that I'm
> currently backporting to the 3.12 branch:
> 
> afr: heal gfids when file is not present on all bricks
> afr: don't update readables if inode refresh failed on all children
> afr: fix bug-1363721.t failure
> afr: add quorum checks in pre-op
> afr: don't treat all cases all bricks being blamed as split-brain
> afr: capture the correct errno in post-op quorum check
> afr: add quorum checks in post-op
> 
> Many of these help make the transaction code more robust by fixing
> various corner cases. It would be great if you can wait for the next
> 3.12 minor release (3.12.12?) and upgrade to that build and see if the
> issues go away.
> 
> Note: CC'ing Karthik and Jiffin for their help in reviewing and merging
> the backports for the above patches.
> 
> Thanks,
> 
> Ravi
> 
> On 07/04/2018 06:51 PM, mabi wrote:
> 
> > Hello,
> > 
> > I just wanted to let you know that last week I upgraded my two replica 
> > nodes from Debian 8 to Debian 9, so now all my 3 nodes (including arbiter) 
> > are running Debian 9 with a Linux 4 kernel.
> > 
> > Unfortunately I still have the exact same issue. Another detail I might 
> > not have mentioned yet is that I have quotas enabled on this volume; I 
> > don't really know if that is relevant, but who knows...
> > 
> > As a reminder, here is what happens on the client side which has the volume 
> > mounted via FUSE (taken earlier today from the 
> > /var/log/glusterfs/mnt-myvol-private.log logfile). Note that in this 
> > specific case it's only one single file which had this issue.
> > 
> > [2018-07-04 08:23:49.314252] E [MSGID: 109089] 
> > [dht-helper.c:1481:dht_migration_complete_check_task] 0-myvol-private-dht: 
> > failed to open the fd (0x7fccb00a5120, flags=010) on file 
> > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
> >  @ myvol-replicate-0 [Input/output error]
> > 
> > [2018-07-04 08:23:49.328712] W [MSGID: 108027] 
> > [afr-common.c:2821:afr_discover_done] 0-myvol-private-replicate-0: no read 
> > subvols for 
> > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
> > 
> > [2018-07-04 08:23:49.330749] W [fuse-bridge.c:779:fuse_truncate_cbk] 
> > 0-glusterfs-fuse: 55916791: TRUNCATE() 
> > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
> >  => -1 (Input/output error)
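> > 
> > A quick way to list what AFR still flags as pending heal or split-brain
> > (a hedged aside; the volume name myvol-private is inferred from the log
> > prefixes above):
> > 
> > # gluster volume heal myvol-private info
> > # gluster volume heal myvol-private info split-brain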
> > 
> > Best regards,
> > 
> > M.
> > 
> > ‐‐‐ Original Message ‐‐‐
> > 
> > On June 22, 2018 4:44 PM, mabi m...@protonmail.ch wrote:
> > 
> > > Hi,
> > > 
> > > Now that this issue has happened a few times I noticed a few things which 
> > > might be helpful for debugging:
> > > 
> > > -   This problem happens when files are uploaded via a cloud app called 
> > > Nextcloud, where the files are encrypted by the app itself on the server 
> > > side (PHP code), but only rarely and randomly.
> > > 
> > > -   It does not seem to happen with a Nextcloud installation which does not 
> > > have server-side encryption enabled.
> > > 
> > > -   When this happens, both the first and second nodes of the replica have 120k 
> > > context switches and 25k interrupts, the arbiter node 30k context 
> > > switches/20k interrupts. No nodes are overloaded, there is no I/O wait, and 
> > > there are no network issues or disconnections.
> > > 
> > > -   All of the problematic files to heal have spaces in one of their 
> > > sub-directories (might be totally irrelevant).
> > > 
> > > If that's of any use, my two replica nodes are Debian 8 physical 
> > > servers with ZFS as the file system for the bricks, and the arbiter is a 
> > > Debian 9 virtual machine with XFS as the file system for the brick. To mount 
> > > the volume I use a glusterfs FUSE mount on the web server which has 
> > > Nextcloud running.
> > > 
> > > Regards,
> > > 
> > > M.
> > > 
> > > ‐‐‐ Original Message ‐‐‐
> > > 
> > > On May 25, 2018 5:55 PM, mabi m...@protonmail.ch wrote:
> > > 
> > > 
> > > > Thanks Ravi. Let me know when you have time to have a look. It 
> > > > happens around once or twice per week, but today it was 24 files in one 
> > > > go which are unsynced and where I need to manually reset the xattrs on 
> > > > the arbiter node.
> > > > 
> > > > By the way, on this volume I use quotas which I set on specific 
> > > > directories, I don't know if this 

Re: [Gluster-users] Gluster 3.10.5: used disk size reported by quota and du mismatch

2018-07-05 Thread Sanoj Unnikrishnan
Hi Mauro,

A script issue meant that not all the necessary xattrs were captured.
Could you provide the xattrs with:
find /tier2/CSP/ans004  | xargs getfattr -d -m. -e hex
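
If any path under that directory contains spaces, a null-delimited variant
avoids xargs splitting it (a sketch with the same getfattr flags):

find /tier2/CSP/ans004 -print0 | xargs -0 getfattr -d -m. -e hex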

Meanwhile, if you are being impacted, you could do the following (a sketch
follows below):
- back up quota limits
- disable quota
- enable quota
- freshly set the limits
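
A minimal sketch of that sequence, assuming the volume name tier2 and the
limit from the output earlier in this thread (repeat limit-usage for each
directory you had limits on):

# gluster volume quota tier2 list > /root/quota-limits.backup
# gluster volume quota tier2 disable
# gluster volume quota tier2 enable
# gluster volume quota tier2 limit-usage /CSP/ans004 1TB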

Please capture the xattr values first, so that we can find out what went
wrong.
Regards,
Sanoj


On Tue, Jul 3, 2018 at 4:09 PM, Mauro Tridici  wrote:

> Dear Sanoj,
>
> thank you very much for your support.
> I just downloaded and executed the script you suggested.
>
> This is the full command I executed:
>
> ./quota_fsck_new.py --full-logs --sub-dir /tier2/CSP/ans004/ /gluster
>
> Attached you can find the logs generated by the script.
> What can I do now?
>
> Thank you very much for your patience.
> Mauro
>
>
>
>
> On 03 Jul 2018, at 11:34, Sanoj Unnikrishnan <sunni...@redhat.com> wrote:
>
> Hi Mauro,
>
> This may be an issue with the update of backend xattrs.
> To RCA further and provide a resolution, could you provide me with the logs
> by running the following fsck script:
> https://review.gluster.org/#/c/19179/6/extras/quota/quota_fsck.py
>
> Try running the script and reply with the logs generated.
>
> Thanks,
> Sanoj
>
>
> On Mon, Jul 2, 2018 at 2:21 PM, Mauro Tridici 
> wrote:
>
>> Dear Users,
>>
>> I just noticed that, after some data deletions executed inside the
>> "/tier2/CSP/ans004" folder, the amount of used disk reported by the quota
>> command doesn’t match the value indicated by the du command.
>> Searching the web, it seems that this is a bug in previous versions of
>> GlusterFS that was already fixed.
>> In my case, the problem unfortunately seems to still be here.
>>
>> How can I solve this issue? Is it possible to do it without a
>> downtime period?
>>
>> Thank you very much in advance,
>> Mauro
>>
>> [root@s01 ~]# glusterfs -V
>> glusterfs 3.10.5
>> Repository revision: git://git.gluster.org/glusterfs.git
>> Copyright (c) 2006-2016 Red Hat, Inc. 
>> GlusterFS comes with ABSOLUTELY NO WARRANTY.
>> It is licensed to you under your choice of the GNU Lesser
>> General Public License, version 3 or any later version (LGPLv3
>> or later), or the GNU General Public License, version 2 (GPLv2),
>> in all cases as published by the Free Software Foundation.
>>
>> [root@s01 ~]# gluster volume quota tier2 list /CSP/ans004
>>   Path         Hard-limit  Soft-limit     Used   Available  Soft-limit exceeded?  Hard-limit exceeded?
>>   ---------------------------------------------------------------------------------------------------
>>   /CSP/ans004  1.0TB       99%(1013.8GB)  3.9TB  0Bytes     Yes                   Yes
>>
>> [root@s01 ~]# du -hs /tier2/CSP/ans004/
>> 295G /tier2/CSP/ans004/
>>
>>
>>
>>
>>
>
>
>
> -
> Mauro Tridici
>
> Fondazione CMCC
> CMCC Supercomputing Center
> presso Complesso Ecotekne - Università del Salento -
> Strada Prov.le Lecce - Monteroni sn
> 73100 Lecce  IT
> http://www.cmcc.it
>
> mobile: (+39) 327 5630841
> email: mauro.trid...@cmcc.it
>
>
>