Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-07-05 Thread mabi
Dear Ravi,

Thank you for your mail and info. 

This is great news. If these patches make it into 3.12.12, I will upgrade 
ASAP. Could someone confirm in case they do not make it into 3.12.12? In that 
case I would rather wait for the following release. I was already told on this 
list that 3.12.9 should have fixed this issue, but unfortunately it didn't.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐

On July 4, 2018 5:41 PM, Ravishankar N  wrote:

> 
> Hi mabi, there are a couple of AFR patches  from master that I'm
> 
> currently back porting to the 3.12 branch:
> 
> afr: heal gfids when file is not present on all bricks
> 
> afr: don't update readables if inode refresh failed on all children
> 
> afr: fix bug-1363721.t failure
> 
> afr: add quorum checks in pre-op
> 
> afr: don't treat all cases all bricks being blamed as split-brain
> 
> afr: capture the correct errno in post-op quorum check
> 
> afr: add quorum checks in post-op
> 
> Many of these help make the transaction code more robust by fixing
> 
> various corner cases. It would be great if you can wait for the next
> 
> 3.12 minor release (3.12.12 ?) and upgrade to that build and see if the
> 
> issues go away.
> 
> Note: CC'ing Karthik and Jiffin for their help in reviewing and merging
> 
> the backports for the above patches.
> 
> Thanks,
> 
> Ravi
> 
> On 07/04/2018 06:51 PM, mabi wrote:
> 
> > Hello,
> > 
> > I just wanted to let you know that last week I upgraded my two replica 
> > nodes from Debian 8 to Debian 9, so now all my 3 nodes (including the 
> > arbiter) are running Debian 9 with a Linux 4 kernel.
> > 
> > Unfortunately I still have the exact same issue. Another detail I might 
> > not have mentioned yet is that I have quotas enabled on this volume; I 
> > don't really know if that is relevant, but who knows...
> > 
> > As a reminder, here is what happens on the client side which has the volume 
> > mounted via FUSE (taken earlier today from the 
> > /var/log/glusterfs/mnt-myvol-private.log logfile). Note that in this 
> > specific case only one single file had this issue.
> > 
> > [2018-07-04 08:23:49.314252] E [MSGID: 109089] 
> > [dht-helper.c:1481:dht_migration_complete_check_task] 0-myvol-private-dht: 
> > failed to open the fd (0x7fccb00a5120, flags=010) on file 
> > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
> >  @ myvol-replicate-0 [Input/output error]
> > 
> > [2018-07-04 08:23:49.328712] W [MSGID: 108027] 
> > [afr-common.c:2821:afr_discover_done] 0-myvol-private-replicate-0: no read 
> > subvols for 
> > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
> > 
> > [2018-07-04 08:23:49.330749] W [fuse-bridge.c:779:fuse_truncate_cbk] 
> > 0-glusterfs-fuse: 55916791: TRUNCATE() 
> > /dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
> >  => -1 (Input/output error)
> > 
> > Best regards,
> > 
> > M.
> > 
> > ‐‐‐ Original Message ‐‐‐
> > 
> > On June 22, 2018 4:44 PM, mabi m...@protonmail.ch wrote:
> > 
> > > Hi,
> > > 
> > > Now that this issue has happened a few times I noticed a few things which 
> > > might be helpful for debugging:
> > > 
> > > -   This problem happens when files are uploaded via a cloud app called 
> > > Nextcloud where the files are encrypted by the app itself on the server 
> > > side (PHP code) but only rarely and randomly.
> > > 
> > > -   It does not seem to happen with a Nextcloud installation that does 
> > > not have server-side encryption enabled.
> > > 
> > > -   When this happens both first and second node of the replica have 120k 
> > > of context switches and 25k interrupts, the arbiter node 30k context 
> > > switches/20k interrupts. No nodes are overloaded, there is no io/wait and 
> > > no network issues or disconnections.
> > > 
> > > -   All of the problematic files to heal have spaces in one of their 
> > > sub-directories (might be totally irrelevant).
> > > 
> > > If that's of any use my two replica nodes are Debian 8 physical 
> > > servers with ZFS as file system for the bricks and the arbiter is a 
> > > Debian 9 virtual machine with XFS as file system for the brick. To mount 
> > > the volume I use a glusterfs fuse mount on the web server which has 
> > > Nextcloud running.
> > > 
> > > Regards,
> > > 
> > > M.
> > > 
> > > ‐‐‐ Original Message ‐‐‐
> > > 
> > > On May 25, 2018 5:55 PM, mabi m...@protonmail.ch wrote:
> > > 
> > > 
> > > > Thanks Ravi. Let me know when you have time to have a look. It sort of 
> > > > happens around once or twice per week but today it was 24 files in one 
> > > > go which are unsynched and where I need to manually reset the xattrs on 
> > > > the arbiter node.
> > > > 
> > > > By the way on this volume I use quotas which I set on specific 
> > > > directories, I don't know if this 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-07-04 Thread Ravishankar N
Hi mabi, there are a couple of AFR patches  from master that I'm 
currently back porting to the 3.12 branch:


afr: heal gfids when file is not present on all bricks
afr: don't update readables if inode refresh failed on all children
afr: fix bug-1363721.t failure
afr: add quorum checks in pre-op
afr: don't treat all cases all bricks being blamed as split-brain
afr: capture the correct errno in post-op quorum check
afr: add quorum checks in post-op

Many of these help make the transaction code more robust by fixing 
various corner cases. It would be great if you could wait for the next 
3.12 minor release (3.12.12?), upgrade to that build, and see if the 
issues go away.
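
(Once that build is installed, a quick sanity check along these lines should 
confirm the state of the cluster; myvol-private is the volume name taken from 
the logs elsewhere in this thread, so adjust as needed.)

# after upgrading, confirm every node actually runs the new build and the pool is healthy
gluster --version | head -1
gluster pool list          # all peers should be in the Connected state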


Note: CC'ing Karthik and Jiffin for their help in reviewing and merging 
the backports for the above patches.


Thanks,
Ravi

On 07/04/2018 06:51 PM, mabi wrote:

Hello,

I just wanted to let you know that last week I upgraded my two replica 
nodes from Debian 8 to Debian 9, so now all my 3 nodes (including the arbiter) 
are running Debian 9 with a Linux 4 kernel.

Unfortunately I still have the exact same issue. Another detail I might not 
have mentioned yet is that I have quotas enabled on this volume; I don't 
really know if that is relevant, but who knows...

As a reminder, here is what happens on the client side which has the volume 
mounted via FUSE (taken earlier today from the 
/var/log/glusterfs/mnt-myvol-private.log logfile). Note that in this 
specific case only one single file had this issue.

[2018-07-04 08:23:49.314252] E [MSGID: 109089] 
[dht-helper.c:1481:dht_migration_complete_check_task] 0-myvol-private-dht: 
failed to open the fd (0x7fccb00a5120, flags=010) on file 
/dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
 @ myvol-replicate-0 [Input/output error]
[2018-07-04 08:23:49.328712] W [MSGID: 108027] 
[afr-common.c:2821:afr_discover_done] 0-myvol-private-replicate-0: no read 
subvols for 
/dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
[2018-07-04 08:23:49.330749] W [fuse-bridge.c:779:fuse_truncate_cbk] 
0-glusterfs-fuse: 55916791: TRUNCATE() 
/dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
 => -1 (Input/output error)

Best regards,
M.

‐‐‐ Original Message ‐‐‐

On June 22, 2018 4:44 PM, mabi  wrote:



Hi,

Now that this issue has happened a few times I noticed a few things which might 
be helpful for debugging:

-   This problem happens when files are uploaded via a cloud app called 
Nextcloud where the files are encrypted by the app itself on the server side 
(PHP code) but only rarely and randomly.
-   It does not seem to happen with a Nextcloud installation that does not have 
server-side encryption enabled.
-   When this happens both first and second node of the replica have 120k of 
context switches and 25k interrupts, the arbiter node 30k context switches/20k 
interrupts. No nodes are overloaded, there is no io/wait and no network issues 
or disconnections.
-   All of the problematic files to heal have spaces in one of their 
sub-directories (might be totally irrelevant).
 
 If that's of any use my two replica nodes are Debian 8 physical servers with ZFS as file system for the bricks and the arbiter is a Debian 9 virtual machine with XFS as file system for the brick. To mount the volume I use a glusterfs fuse mount on the web server which has Nextcloud running.
 
 Regards,
 
 M.
 
 ‐‐‐ Original Message ‐‐‐
 
 On May 25, 2018 5:55 PM, mabi m...@protonmail.ch wrote:
 


Thanks Ravi. Let me know when you have time to have a look. It sort of happens 
around once or twice per week but today it was 24 files in one go which are 
unsynched and where I need to manually reset the xattrs on the arbiter node.

By the way, on this volume I use quotas which I set on specific directories; I 
don't know if this is relevant or not but thought I would just mention it.

‐‐‐ Original Message ‐‐‐

On May 23, 2018 9:25 AM, Ravishankar N ravishan...@redhat.com wrote:


On 05/23/2018 12:47 PM, mabi wrote:


Hello,

I just wanted to ask if you had time to look into this bug I am encountering 
and if there is anything else I can do?

For now in order to get rid of these 3 unsynched files shall I do the same 
method that was suggested to me in this thread?

Sorry Mabi,  I haven't had a chance to dig deeper into this. The

workaround of resetting xattrs should be fine though.

Thanks,

Ravi


Thanks,

Mabi

‐‐‐ Original Message ‐‐‐

On May 17, 2018 11:07 PM, mabi m...@protonmail.ch wrote:


Hi Ravi,

Please find below the answers to your questions

1.  I have never touched the cluster.quorum-type option. Currently it is set as 
following for this volume:
 
 Option Value
 


cluster.quorum-type none

2.  The .shareKey files are not supposed to be empty. They should be 512 bytes 
big 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-07-04 Thread mabi
Hello,

I just wanted to let you know that last week I upgraded my two replica 
nodes from Debian 8 to Debian 9, so now all my 3 nodes (including the arbiter) 
are running Debian 9 with a Linux 4 kernel.

Unfortunately I still have the exact same issue. Another detail I might not 
have mentioned yet is that I have quotas enabled on this volume; I don't 
really know if that is relevant, but who knows...

As a reminder, here is what happens on the client side which has the volume 
mounted via FUSE (taken earlier today from the 
/var/log/glusterfs/mnt-myvol-private.log logfile). Note that in this 
specific case only one single file had this issue.

[2018-07-04 08:23:49.314252] E [MSGID: 109089] 
[dht-helper.c:1481:dht_migration_complete_check_task] 0-myvol-private-dht: 
failed to open the fd (0x7fccb00a5120, flags=010) on file 
/dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
 @ myvol-replicate-0 [Input/output error]
[2018-07-04 08:23:49.328712] W [MSGID: 108027] 
[afr-common.c:2821:afr_discover_done] 0-myvol-private-replicate-0: no read 
subvols for 
/dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
[2018-07-04 08:23:49.330749] W [fuse-bridge.c:779:fuse_truncate_cbk] 
0-glusterfs-fuse: 55916791: TRUNCATE() 
/dir1/data/dir2/files_encryption/keys/files/dir3/dir4/dir5/dir6/dir7/OC_DEFAULT_MODULE/file.shareKey
 => -1 (Input/output error) 

Best regards,
M.

‐‐‐ Original Message ‐‐‐

On June 22, 2018 4:44 PM, mabi  wrote:

> 
> Hi,
> 
> Now that this issue has happened a few times I noticed a few things which 
> might be helpful for debugging:
> 
> -   This problem happens when files are uploaded via a cloud app called 
> Nextcloud where the files are encrypted by the app itself on the server side 
> (PHP code) but only rarely and randomly.
> -   It does not seem to happen with a Nextcloud installation that does not 
> have server-side encryption enabled.
> -   When this happens both first and second node of the replica have 120k of 
> context switches and 25k interrupts, the arbiter node 30k context 
> switches/20k interrupts. No nodes are overloaded, there is no io/wait and no 
> network issues or disconnections.
> -   All of the problematic files to heal have spaces in one of their 
> sub-directories (might be totally irrelevant).
> 
> If that's of any use my two replica nodes are Debian 8 physical servers 
> with ZFS as file system for the bricks and the arbiter is a Debian 9 virtual 
> machine with XFS as file system for the brick. To mount the volume I use a 
> glusterfs fuse mount on the web server which has Nextcloud running.
> 
> Regards,
> 
> M.
> 
> ‐‐‐ Original Message ‐‐‐
> 
> On May 25, 2018 5:55 PM, mabi m...@protonmail.ch wrote:
> 
> 
> > Thanks Ravi. Let me know when you have time to have a look. It sort of 
> > happens around once or twice per week but today it was 24 files in one go 
> > which are unsynched and where I need to manually reset the xattrs on the 
> > arbiter node.
> > 
> > By the way on this volume I use quotas which I set on specific directories, 
> > I don't know if this is relevant or not but thought I would just mention.
> > 
> > ‐‐‐ Original Message ‐‐‐
> > 
> > On May 23, 2018 9:25 AM, Ravishankar N ravishan...@redhat.com wrote:
> > 
> > > On 05/23/2018 12:47 PM, mabi wrote:
> > > 
> > > > Hello,
> > > > 
> > > > I just wanted to ask if you had time to look into this bug I am 
> > > > encountering and if there is anything else I can do?
> > > > 
> > > > For now in order to get rid of these 3 unsynched files shall I do the 
> > > > same method that was suggested to me in this thread?
> > > 
> > > Sorry Mabi,  I haven't had a chance to dig deeper into this. The
> > > 
> > > workaround of resetting xattrs should be fine though.
> > > 
> > > Thanks,
> > > 
> > > Ravi
> > > 
> > > > Thanks,
> > > > 
> > > > Mabi
> > > > 
> > > > ‐‐‐ Original Message ‐‐‐
> > > > 
> > > > On May 17, 2018 11:07 PM, mabi m...@protonmail.ch wrote:
> > > > 
> > > > > Hi Ravi,
> > > > > 
> > > > > Please find below the answers to your questions
> > > > > 
> > > > > 1.  I have never touched the cluster.quorum-type option. Currently it 
> > > > > is set as following for this volume:
> > > > > 
> > > > > Option Value
> > > > > 
> > > > > 
> > > > > cluster.quorum-type none
> > > > > 
> > > > > 2.  The .shareKey files are not supposed to be empty. They should be 
> > > > > 512 bytes big and contain binary data (PGP Secret Sub-key). I am not 
> > > > > in a position to say why it is in this specific case only 0 bytes and 
> > > > > if it is the fault of the software (Nextcloud) or GlusterFS. I can 
> > > > > just say here that I have another file server which is a simple NFS 
> > > > > server with another Nextcloud installation and there I never saw any 
> > > > > 0 bytes .shareKey files being created.

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-06-22 Thread mabi
Hi,

Now that this issue has happened a few times I noticed a few things which might 
be helpful for debugging:

- This problem happens when files are uploaded via a cloud app called Nextcloud 
where the files are encrypted by the app itself on the server side (PHP code) 
but only rarely and randomly.
- It does not seem to happen with a Nextcloud installation that does not have 
server-side encryption enabled.
- When this happens, both the first and second node of the replica show 120k 
context switches and 25k interrupts, and the arbiter node 30k context 
switches/20k interrupts. No nodes are overloaded, there is no I/O wait, and 
there are no network issues or disconnections.
- All of the problematic files to heal have spaces in one of their 
sub-directories (might be totally irrelevant).

If that's of any use: my two replica nodes are Debian 8 physical servers with 
ZFS as the file system for the bricks, and the arbiter is a Debian 9 virtual 
machine with XFS as the file system for the brick. To mount the volume I use a 
glusterfs fuse mount on the web server which has Nextcloud running.
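
(For completeness, the mount itself is nothing special; roughly like the 
following sketch, where the hostname and mount point are placeholders and only 
the glusterfs fuse mount is as described above.)

# rough sketch of the fuse mount on the Nextcloud web server (names are placeholders)
mount -t glusterfs node1.example.com:/myvol-private /mnt/myvol-private
# an equivalent /etc/fstab entry can be added so the volume is mounted at boot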

Regards,
M.

‐‐‐ Original Message ‐‐‐

On May 25, 2018 5:55 PM, mabi  wrote:

> 
> Thanks Ravi. Let me know when you have time to have a look. It sort of 
> happens around once or twice per week but today it was 24 files in one go 
> which are unsynched and where I need to manually reset the xattrs on the 
> arbiter node.
> 
> By the way on this volume I use quotas which I set on specific directories, I 
> don't know if this is relevant or not but thought I would just mention.
> 
> ‐‐‐ Original Message ‐‐‐
> 
> On May 23, 2018 9:25 AM, Ravishankar N ravishan...@redhat.com wrote:
> 
> > On 05/23/2018 12:47 PM, mabi wrote:
> > 
> > > Hello,
> > > 
> > > I just wanted to ask if you had time to look into this bug I am 
> > > encountering and if there is anything else I can do?
> > > 
> > > For now in order to get rid of these 3 unsynched files shall I do the 
> > > same method that was suggested to me in this thread?
> > 
> > Sorry Mabi,  I haven't had a chance to dig deeper into this. The
> > 
> > workaround of resetting xattrs should be fine though.
> > 
> > Thanks,
> > 
> > Ravi
> > 
> > > Thanks,
> > > 
> > > Mabi
> > > 
> > > ‐‐‐ Original Message ‐‐‐
> > > 
> > > On May 17, 2018 11:07 PM, mabi m...@protonmail.ch wrote:
> > > 
> > > > Hi Ravi,
> > > > 
> > > > Please find below the answers to your questions
> > > > 
> > > > 1.  I have never touched the cluster.quorum-type option. Currently it 
> > > > is set as following for this volume:
> > > > 
> > > > Option Value
> > > > 
> > > > 
> > > > cluster.quorum-type none
> > > > 
> > > > 2.  The .shareKey files are not supposed to be empty. They should be 
> > > > 512 bytes big and contain binary data (PGP Secret Sub-key). I am not in 
> > > > a position to say why it is in this specific case only 0 bytes and if 
> > > > it is the fault of the software (Nextcloud) or GlusterFS. I can just 
> > > > say here that I have another file server which is a simple NFS server 
> > > > with another Nextcloud installation and there I never saw any 0 bytes 
> > > > .shareKey files being created.
> > > > 
> > > > 3.  It seems to be quite random and I am not the person who uses the 
> > > > Nextcloud software so I can't say what it was doing at that specific 
> > > > time but I guess uploading files or moving files around. Basically I 
> > > > use GlusterFS to store the files/data of the Nextcloud web application 
> > > > where I have it mounted using a fuse mount (mount -t glusterfs).
> > > > 
> > > > 
> > > > Regarding the logs I have attached the mount log file from the client 
> > > > and below are the relevant log entries from the brick log file of all 3 
> > > > nodes. Let me know if you need any other log files. Also if you know 
> > > > any "log file sanitizer tool" which can replace sensitive file names 
> > > > with random file names in log files that would like to use it as right 
> > > > now I have to do that manually.
> > > > 
> > > > NODE 1 brick log:
> > > > 
> > > > [2018-05-15 06:54:20.176679] E [MSGID: 113015] 
> > > > [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on 
> > > > /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE
> > > >  [No such file or directory]
> > > > 
> > > > NODE 2 brick log:
> > > > 
> > > > [2018-05-15 06:54:20.176415] E [MSGID: 113015] 
> > > > [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on 
> > > > /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE
> > > >  [No such file or directory]
> > > > 
> > > > NODE 3 (arbiter) brick log:
> > > > 
> > > > [2018-05-15 06:54:19.898981] W [MSGID: 113103] 
> > > > [posix.c:285:posix_lookup] 0-myvol-private-posix: Found stale gfid 
> > > > handle 
> > > > 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-05-25 Thread mabi
Thanks Ravi. Let me know when you have time to have a look. It sort of happens 
around once or twice per week, but today it was 24 files in one go which are 
unsynced and for which I need to manually reset the xattrs on the arbiter node.

By the way, on this volume I use quotas which I set on specific directories; I 
don't know if this is relevant or not but thought I would just mention it.

‐‐‐ Original Message ‐‐‐

On May 23, 2018 9:25 AM, Ravishankar N  wrote:

> 
> On 05/23/2018 12:47 PM, mabi wrote:
> 
> > Hello,
> > 
> > I just wanted to ask if you had time to look into this bug I am 
> > encountering and if there is anything else I can do?
> > 
> > For now in order to get rid of these 3 unsynched files shall I do the same 
> > method that was suggested to me in this thread?
> 
> Sorry Mabi,  I haven't had a chance to dig deeper into this. The
> 
> workaround of resetting xattrs should be fine though.
> 
> Thanks,
> 
> Ravi
> 
> > Thanks,
> > 
> > Mabi
> > 
> > ‐‐‐ Original Message ‐‐‐
> > 
> > On May 17, 2018 11:07 PM, mabi m...@protonmail.ch wrote:
> > 
> > > Hi Ravi,
> > > 
> > > Please find below the answers to your questions
> > > 
> > > 1.  I have never touched the cluster.quorum-type option. Currently it is 
> > > set as following for this volume:
> > > 
> > > Option Value
> > > 
> > > 
> > > cluster.quorum-type none
> > > 
> > > 2.  The .shareKey files are not supposed to be empty. They should be 512 
> > > bytes big and contain binary data (PGP Secret Sub-key). I am not in a 
> > > position to say why it is in this specific case only 0 bytes and if it is 
> > > the fault of the software (Nextcloud) or GlusterFS. I can just say here 
> > > that I have another file server which is a simple NFS server with another 
> > > Nextcloud installation and there I never saw any 0 bytes .shareKey files 
> > > being created.
> > > 
> > > 3.  It seems to be quite random and I am not the person who uses the 
> > > Nextcloud software so I can't say what it was doing at that specific time 
> > > but I guess uploading files or moving files around. Basically I use 
> > > GlusterFS to store the files/data of the Nextcloud web application where 
> > > I have it mounted using a fuse mount (mount -t glusterfs).
> > > 
> > > 
> > > Regarding the logs I have attached the mount log file from the client and 
> > > below are the relevant log entries from the brick log file of all 3 
> > > nodes. Let me know if you need any other log files. Also if you know any 
> > > "log file sanitizer tool" which can replace sensitive file names with 
> > > random file names in log files that would like to use it as right now I 
> > > have to do that manually.
> > > 
> > > NODE 1 brick log:
> > > 
> > > [2018-05-15 06:54:20.176679] E [MSGID: 113015] 
> > > [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on 
> > > /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE
> > >  [No such file or directory]
> > > 
> > > NODE 2 brick log:
> > > 
> > > [2018-05-15 06:54:20.176415] E [MSGID: 113015] 
> > > [posix.c:1211:posix_opendir] 0-myvol-private-posix: opendir failed on 
> > > /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE
> > >  [No such file or directory]
> > > 
> > > NODE 3 (arbiter) brick log:
> > > 
> > > [2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] 
> > > 0-myvol-private-posix: Found stale gfid handle 
> > > /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155,
> > >  removing it. [Stale file handle]
> > > 
> > > [2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] 
> > > 0-myvol-private-posix: Found stale gfid handle 
> > > /srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13,
> > >  removing it. [Stale file handle]
> > > 
> > > [2018-05-15 06:54:20.172823] I [MSGID: 115056] 
> > > [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 
> > > 14740125: RMDIR 
> > > /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE
> > >  (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: 
> > > nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0,
> > >  error-xlator: myvol-private-posix [Directory not empty]
> > > 
> > > [2018-05-15 06:54:20.190911] I [MSGID: 115056] 
> > > [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 
> > > 14740141: RMDIR 
> > > /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir 
> > > (72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 
> > > AVB_Photovoltaik-Versicherung 2013.pdf), client: 
> > > nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0,
> > >  error-xlator: myvol-private-posix [Directory not empty]
> > > 
> > > Best regards,
> > > 
> > > Mabi
> > > 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-05-23 Thread Ravishankar N



On 05/23/2018 12:47 PM, mabi wrote:

Hello,

I just wanted to ask if you had time to look into this bug I am encountering 
and if there is anything else I can do?

For now in order to get rid of these 3 unsynched files shall I do the same 
method that was suggested to me in this thread?
Sorry Mabi,  I haven't had a chance to dig deeper into this. The 
workaround of resetting xattrs should be fine though.
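
(For reference, the concrete commands for this workaround appear further down 
in this thread; condensed into a rough sketch, with the volume name taken from 
those messages and the file path shortened, it looks like this on the arbiter 
node.)

# remove the pending AFR xattrs of each problematic file directly on the arbiter brick,
# then trigger a heal so the good copies are picked up again
setfattr -x trusted.afr.myvol-private-client-0 /srv/glusterfs/myvol-private/brick/path/to/problematicfile
setfattr -x trusted.afr.myvol-private-client-1 /srv/glusterfs/myvol-private/brick/path/to/problematicfile
gluster volume heal myvol-private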

Thanks,
Ravi


Thanks,
Mabi

‐‐‐ Original Message ‐‐‐

On May 17, 2018 11:07 PM, mabi  wrote:



Hi Ravi,

Please find below the answers to your questions

1.  I have never touched the cluster.quorum-type option. Currently it is set as 
following for this volume:
 
 Option Value
 


cluster.quorum-type none

2) The .shareKey files are not supposed to be empty. They should be 512 bytes 
big and contain binary data (PGP Secret Sub-key). I am not in a position to say 
why it is in this specific case only 0 bytes and if it is the fault of the 
software (Nextcloud) or GlusterFS. I can just say here that I have another file 
server which is a simple NFS server with another Nextcloud installation and 
there I never saw any 0 bytes .shareKey files being created.

3) It seems to be quite random and I am not the person who uses the Nextcloud 
software so I can't say what it was doing at that specific time but I guess 
uploading files or moving files around. Basically I use GlusterFS to store the 
files/data of the Nextcloud web application where I have it mounted using a 
fuse mount (mount -t glusterfs).

Regarding the logs I have attached the mount log file from the client and below are the 
relevant log entries from the brick log file of all 3 nodes. Let me know if you need any 
other log files. Also if you know any "log file sanitizer tool" which can 
replace sensitive file names with random file names in log files that would like to use 
it as right now I have to do that manually.

NODE 1 brick log:

[2018-05-15 06:54:20.176679] E [MSGID: 113015] [posix.c:1211:posix_opendir] 
0-myvol-private-posix: opendir failed on 
/data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE
 [No such file or directory]

NODE 2 brick log:

[2018-05-15 06:54:20.176415] E [MSGID: 113015] [posix.c:1211:posix_opendir] 
0-myvol-private-posix: opendir failed on 
/data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE
 [No such file or directory]

NODE 3 (arbiter) brick log:

[2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] 
0-myvol-private-posix: Found stale gfid handle 
/srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155,
 removing it. [Stale file handle]

[2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] 
0-myvol-private-posix: Found stale gfid handle 
/srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13,
 removing it. [Stale file handle]

[2018-05-15 06:54:20.172823] I [MSGID: 115056] 
[server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740125: 
RMDIR 
/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE
 (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: 
nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0,
 error-xlator: myvol-private-posix [Directory not empty]

[2018-05-15 06:54:20.190911] I [MSGID: 115056] 
[server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740141: 
RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir 
(72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 AVB_Photovoltaik-Versicherung 
2013.pdf), client: 
nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0,
 error-xlator: myvol-private-posix [Directory not empty]

Best regards,

Mabi

‐‐‐ Original Message ‐‐‐

On May 17, 2018 7:00 AM, Ravishankar N ravishan...@redhat.com wrote:


Hi mabi,

Some questions:

-Did you by any chance change the cluster.quorum-type option from the

default values?

-Is filename.shareKey supposed to be an empty file? Looks like the file

was fallocated with the keep-size option but never written to. (On the 2

data bricks, stat output shows Size =0, but non zero Blocks and yet a

'regular empty file').

-Do you have some sort of a reproducer/ steps that you perform when the

issue occurs? Please also share the logs from all 3 nodes and the client(s).

Thanks,

Ravi

On 05/15/2018 05:26 PM, mabi wrote:


Thank you Ravi for your fast answer. As requested you will find below the "stat" and 
"getfattr" of one of the files and its parent directory from all three nodes of my 
cluster.

NODE 1:

File: 
‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’

Size: 0 Blocks: 38 IO Block: 131072 regular empty file

Device: 23h/35d Inode: 744413 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-05-23 Thread mabi
Hello,

I just wanted to ask if you had time to look into this bug I am encountering 
and if there is anything else I can do?

For now, in order to get rid of these 3 unsynced files, shall I use the same 
method that was suggested to me earlier in this thread?

Thanks,
Mabi

‐‐‐ Original Message ‐‐‐

On May 17, 2018 11:07 PM, mabi  wrote:

> 
> Hi Ravi,
> 
> Please find below the answers to your questions
> 
> 1.  I have never touched the cluster.quorum-type option. Currently it is set 
> as following for this volume:
> 
> Option Value
> 
> 
> cluster.quorum-type none
> 
> 2) The .shareKey files are not supposed to be empty. They should be 512 bytes 
> big and contain binary data (PGP Secret Sub-key). I am not in a position to 
> say why it is in this specific case only 0 bytes and if it is the fault of 
> the software (Nextcloud) or GlusterFS. I can just say here that I have 
> another file server which is a simple NFS server with another Nextcloud 
> installation and there I never saw any 0 bytes .shareKey files being created.
> 
> 3) It seems to be quite random and I am not the person who uses the Nextcloud 
> software so I can't say what it was doing at that specific time but I guess 
> uploading files or moving files around. Basically I use GlusterFS to store 
> the files/data of the Nextcloud web application where I have it mounted using 
> a fuse mount (mount -t glusterfs).
> 
> Regarding the logs I have attached the mount log file from the client and 
> below are the relevant log entries from the brick log file of all 3 nodes. 
> Let me know if you need any other log files. Also if you know any "log file 
> sanitizer tool" which can replace sensitive file names with random file names 
> in log files that would like to use it as right now I have to do that 
> manually.
> 
> NODE 1 brick log:
> 
> [2018-05-15 06:54:20.176679] E [MSGID: 113015] [posix.c:1211:posix_opendir] 
> 0-myvol-private-posix: opendir failed on 
> /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE
>  [No such file or directory]
> 
> NODE 2 brick log:
> 
> [2018-05-15 06:54:20.176415] E [MSGID: 113015] [posix.c:1211:posix_opendir] 
> 0-myvol-private-posix: opendir failed on 
> /data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE
>  [No such file or directory]
> 
> NODE 3 (arbiter) brick log:
> 
> [2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] 
> 0-myvol-private-posix: Found stale gfid handle 
> /srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155,
>  removing it. [Stale file handle]
> 
> [2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] 
> 0-myvol-private-posix: Found stale gfid handle 
> /srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13,
>  removing it. [Stale file handle]
> 
> [2018-05-15 06:54:20.172823] I [MSGID: 115056] 
> [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740125: 
> RMDIR 
> /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE
>  (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: 
> nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0,
>  error-xlator: myvol-private-posix [Directory not empty]
> 
> [2018-05-15 06:54:20.190911] I [MSGID: 115056] 
> [server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740141: 
> RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir 
> (72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 
> AVB_Photovoltaik-Versicherung 2013.pdf), client: 
> nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0,
>  error-xlator: myvol-private-posix [Directory not empty]
> 
> Best regards,
> 
> Mabi
> 
> ‐‐‐ Original Message ‐‐‐
> 
> On May 17, 2018 7:00 AM, Ravishankar N ravishan...@redhat.com wrote:
> 
> > Hi mabi,
> > 
> > Some questions:
> > 
> > -Did you by any chance change the cluster.quorum-type option from the
> > 
> > default values?
> > 
> > -Is filename.shareKey supposed to be an empty file? Looks like the file
> > 
> > was fallocated with the keep-size option but never written to. (On the 2
> > 
> > data bricks, stat output shows Size =0, but non zero Blocks and yet a
> > 
> > 'regular empty file').
> > 
> > -Do you have some sort of a reproducer/ steps that you perform when the
> > 
> > issue occurs? Please also share the logs from all 3 nodes and the client(s).
> > 
> > Thanks,
> > 
> > Ravi
> > 
> > On 05/15/2018 05:26 PM, mabi wrote:
> > 
> > > Thank you Ravi for your fast answer. As requested you will find below the 
> > > "stat" and "getfattr" of one of the files and its parent directory from 
> > > all three nodes of my cluster.
> > > 
> > > NODE 1:
> > > 
> > > File: 
> > > 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-05-17 Thread mabi
Hi Ravi,

Please find below the answers to your questions

1) I have never touched the cluster.quorum-type option. Currently it is set as 
follows for this volume:

Option  Value   
--  -   
cluster.quorum-type none

2) The .shareKey files are not supposed to be empty. They should be 512 bytes 
big and contain binary data (PGP Secret Sub-key). I am not in a position to say 
why it is in this specific case only 0 bytes and if it is the fault of the 
software (Nextcloud) or GlusterFS. I can just say here that I have another file 
server which is a simple NFS server with another Nextcloud installation and 
there I never saw any 0 bytes .shareKey files being created.

3) It seems to be quite random and I am not the person who uses the Nextcloud 
software so I can't say what it was doing at that specific time but I guess 
uploading files or moving files around. Basically I use GlusterFS to store the 
files/data of the Nextcloud web application where I have it mounted using a 
fuse mount (mount -t glusterfs).

Regarding the logs, I have attached the mount log file from the client, and below 
are the relevant log entries from the brick log files of all 3 nodes. Let me 
know if you need any other log files. Also, if you know of any "log file sanitizer 
tool" which can replace sensitive file names with random file names in log 
files, I would like to use it, as right now I have to do that manually.
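
(Lacking such a tool, a throwaway sed pass can do a crude job; just a sketch, 
where the strings being replaced are placeholders taken from the logs below.)

# crude sketch: replace known sensitive path components before sharing a log
sed -e 's/admin/userX/g' -e 's/nextcloud\.domain\.com/clienthost/g' \
    mnt-myvol-private.log > mnt-myvol-private.sanitized.log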
NODE 1 brick log:

[2018-05-15 06:54:20.176679] E [MSGID: 113015] [posix.c:1211:posix_opendir] 
0-myvol-private-posix: opendir failed on 
/data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE
 [No such file or directory]

NODE 2 brick log:

[2018-05-15 06:54:20.176415] E [MSGID: 113015] [posix.c:1211:posix_opendir] 
0-myvol-private-posix: opendir failed on 
/data/myvol-private/brick/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE
 [No such file or directory]

NODE 3 (arbiter) brick log:

[2018-05-15 06:54:19.898981] W [MSGID: 113103] [posix.c:285:posix_lookup] 
0-myvol-private-posix: Found stale gfid handle 
/srv/glusterfs/myvol-private/brick/.glusterfs/f0/65/f065a5e7-ac06-445f-add0-83acf8ce4155,
 removing it. [Stale file handle]
[2018-05-15 06:54:20.056196] W [MSGID: 113103] [posix.c:285:posix_lookup] 
0-myvol-private-posix: Found stale gfid handle 
/srv/glusterfs/myvol-private/brick/.glusterfs/8f/a1/8fa15dbd-cd5c-4900-b889-0fe7fce46a13,
 removing it. [Stale file handle]
[2018-05-15 06:54:20.172823] I [MSGID: 115056] 
[server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740125: 
RMDIR 
/cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir/OC_DEFAULT_MODULE
 (f065a5e7-ac06-445f-add0-83acf8ce4155/OC_DEFAULT_MODULE), client: 
nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0,
 error-xlator: myvol-private-posix [Directory not empty]
[2018-05-15 06:54:20.190911] I [MSGID: 115056] 
[server-rpc-fops.c:485:server_rmdir_cbk] 0-myvol-private-server: 14740141: 
RMDIR /cloud/data/admin/files_encryption/keys/files/dir/dir/anotherdir/dir 
(72a1613e-2ac0-48bd-8ace-f2f723f3796c/2016.03.15 AVB_Photovoltaik-Versicherung 
2013.pdf), client: 
nextcloud.domain.com-7972-2018/05/10-20:31:46:163206-myvol-private-client-2-0-0,
 error-xlator: myvol-private-posix [Directory not empty]


Best regards,
Mabi

‐‐‐ Original Message ‐‐‐

On May 17, 2018 7:00 AM, Ravishankar N  wrote:

> 
> Hi mabi,
> 
> Some questions:
> 
> -Did you by any chance change the cluster.quorum-type option from the
> 
> default values?
> 
> -Is filename.shareKey supposed to be an empty file? Looks like the file
> 
> was fallocated with the keep-size option but never written to. (On the 2
> 
> data bricks, stat output shows Size =0, but non zero Blocks and yet a
> 
> 'regular empty file').
> 
> -Do you have some sort of a reproducer/ steps that you perform when the
> 
> issue occurs? Please also share the logs from all 3 nodes and the client(s).
> 
> Thanks,
> 
> Ravi
> 
> On 05/15/2018 05:26 PM, mabi wrote:
> 
> > Thank you Ravi for your fast answer. As requested you will find below the 
> > "stat" and "getfattr" of one of the files and its parent directory from all 
> > three nodes of my cluster.
> > 
> > NODE 1:
> > 
> > File: 
> > ‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
> > 
> > Size: 0 Blocks: 38 IO Block: 131072 regular empty file
> > 
> > Device: 23h/35d Inode: 744413 Links: 2
> > 
> > Access: (0644/-rw-r--r--) Uid: (20936/ UNKNOWN) Gid: (20936/ UNKNOWN)
> > 
> > Access: 2018-05-15 08:54:20.296048887 +0200
> > 
> > Modify: 2018-05-15 08:54:20.296048887 +0200
> > 
> > Change: 2018-05-15 08:54:20.340048505 +0200
> > 
> > Birth: -
> > 
> > File: 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-05-16 Thread Ravishankar N

Hi mabi,
Some questions:
-Did you by any chance change the cluster.quorum-type option from the 
default values?
-Is filename.shareKey supposed to be an empty file? Looks like the file 
was fallocated with the keep-size option but never written to. (On the 2 
data bricks, stat output shows Size = 0, but non-zero Blocks, and yet a 
'regular empty file'.)
-Do you have some sort of a reproducer / steps that you perform when the 
issue occurs? Please also share the logs from all 3 nodes and the client(s).
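
(On the first question, a quick way to check what the volume is actually using 
is the volume get interface; myvol-private is the volume name used elsewhere in 
this thread.)

# show the quorum options currently in effect for the volume
gluster volume get myvol-private cluster.quorum-type
gluster volume get myvol-private cluster.quorum-count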

Thanks,
Ravi

On 05/15/2018 05:26 PM, mabi wrote:

Thank you Ravi for your fast answer. As requested you will find below the "stat" and 
"getfattr" of one of the files and its parent directory from all three nodes of my 
cluster.

NODE 1:

   File: 
‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
   Size: 0  Blocks: 38 IO Block: 131072 regular empty file
Device: 23h/35d Inode: 744413  Links: 2
Access: (0644/-rw-r--r--)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
Access: 2018-05-15 08:54:20.296048887 +0200
Modify: 2018-05-15 08:54:20.296048887 +0200
Change: 2018-05-15 08:54:20.340048505 +0200
  Birth: -

   File: 
‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
   Size: 8  Blocks: 74 IO Block: 131072 directory
Device: 23h/35d Inode: 744410  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.392048056 +0200
Change: 2018-05-15 08:54:20.392048056 +0200
  Birth: -

# file: 
data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x0001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x0001

# file: 
data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x0001
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x06060001
trusted.glusterfs.quota.size.1=0x06060001


NODE 2:

   File: 
‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
   Size: 0  Blocks: 38 IO Block: 131072 regular empty file
Device: 24h/36d Inode: 5428150 Links: 2
Access: (0644/-rw-r--r--)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
Access: 2018-05-15 08:54:20.294280254 +0200
Modify: 2018-05-15 08:54:20.294280254 +0200
Change: 2018-05-15 08:54:20.338279576 +0200
  Birth: -

   File: 
‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
   Size: 8  Blocks: 74 IO Block: 131072 directory
Device: 24h/36d Inode: 5428147 Links: 2
Access: (0755/drwxr-xr-x)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.394278717 +0200
Change: 2018-05-15 08:54:20.394278717 +0200
  Birth: -

# file: 
data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x0001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x0001

# file: 
data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x0001
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x06060001
trusted.glusterfs.quota.size.1=0x06060001


NODE 3 (arbiter):

   File: 
/srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
   Size: 0  Blocks: 8  IO Block: 4096   regular empty file
Device: ca11h/51729dInode: 271434295   Links: 2
Access: (0644/-rw-r--r--)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.322527555 +0200
Modify: 2018-04-25 09:41:24.322527555 +0200
Change: 2018-05-15 08:54:20.343667380 +0200
  Birth: -

   File: 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-05-15 Thread mabi
Thank you Ravi for your fast answer. As requested you will find below the 
"stat" and "getfattr" of one of the files and its parent directory from all 
three nodes of my cluster.

NODE 1:

  File: 
‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
  Size: 0   Blocks: 38 IO Block: 131072 regular empty file
Device: 23h/35d Inode: 744413  Links: 2
Access: (0644/-rw-r--r--)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
Access: 2018-05-15 08:54:20.296048887 +0200
Modify: 2018-05-15 08:54:20.296048887 +0200
Change: 2018-05-15 08:54:20.340048505 +0200
 Birth: -

  File: 
‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
  Size: 8   Blocks: 74 IO Block: 131072 directory
Device: 23h/35d Inode: 744410  Links: 2
Access: (0755/drwxr-xr-x)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.392048056 +0200
Change: 2018-05-15 08:54:20.392048056 +0200
 Birth: -

# file: 
data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x0001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x0001

# file: 
data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x0001
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x06060001
trusted.glusterfs.quota.size.1=0x06060001


NODE 2:

  File: 
‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey’
  Size: 0   Blocks: 38 IO Block: 131072 regular empty file
Device: 24h/36d Inode: 5428150 Links: 2
Access: (0644/-rw-r--r--)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
Access: 2018-05-15 08:54:20.294280254 +0200
Modify: 2018-05-15 08:54:20.294280254 +0200
Change: 2018-05-15 08:54:20.338279576 +0200
 Birth: -

  File: 
‘/data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/’
  Size: 8   Blocks: 74 IO Block: 131072 directory
Device: 24h/36d Inode: 5428147 Links: 2
Access: (0755/drwxr-xr-x)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.394278717 +0200
Change: 2018-05-15 08:54:20.394278717 +0200
 Birth: -

# file: 
data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
trusted.gfid=0x3b6c722cd6c64a4180fa028809671d63
trusted.gfid2path.9cb852a48fe5e361=0x38666131356462642d636435632d343930302d623838392d3066653766636534366131332f6e6361646d696e6973747261746f722e73686172654b6579
trusted.glusterfs.quota.8fa15dbd-cd5c-4900-b889-0fe7fce46a13.contri.1=0x0001
trusted.pgfid.8fa15dbd-cd5c-4900-b889-0fe7fce46a13=0x0001

# file: 
data/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
trusted.gfid=0x8fa15dbdcd5c4900b8890fe7fce46a13
trusted.glusterfs.dht=0x0001
trusted.glusterfs.quota.dirty=0x3000
trusted.glusterfs.quota.f065a5e7-ac06-445f-add0-83acf8ce4155.contri.1=0x06060001
trusted.glusterfs.quota.size.1=0x06060001


NODE 3 (arbiter):

  File: 
/srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/filename.shareKey
  Size: 0   Blocks: 8  IO Block: 4096   regular empty file
Device: ca11h/51729dInode: 271434295   Links: 2
Access: (0644/-rw-r--r--)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.322527555 +0200
Modify: 2018-04-25 09:41:24.322527555 +0200
Change: 2018-05-15 08:54:20.343667380 +0200
 Birth: -

  File: 
/srv/glusterfs/myvolume-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/OC_DEFAULT_MODULE/
  Size: 8192Blocks: 24 IO Block: 4096   directory
Device: ca11h/51729dInode: 271434288   Links: 2
Access: (0755/drwxr-xr-x)  Uid: (20936/ UNKNOWN)   Gid: (20936/ UNKNOWN)
Access: 2018-04-25 09:41:24.276780766 +0200
Modify: 2018-05-15 08:54:20.391667997 +0200
Change: 2018-05-15 08:54:20.395668048 +0200
 Birth: -

# file: 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-05-15 Thread Ravishankar N


On 05/15/2018 12:38 PM, mabi wrote:

Dear all,

I have upgraded my replica 3 GlusterFS cluster (and clients) last Friday from 
3.12.7 to 3.12.9 in order to fix this bug but unfortunately I notice that I 
still have exactly the same problem as initially posted in this thread.

It looks like this bug is not resolved as I just got right now 3 unsynched 
files on my arbiter node like I used to do before upgrading. This problem 
started since I upgraded to 3.12.7...
Could you provide the stat and 'getfattr -d -m . -e hex 
brick/path/to/file' outputs of one of these files and also the 
corresponding parent directory from all 3 bricks?
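
(Roughly, that means running something like the following against the brick 
path, not the fuse mount, on each of the 3 nodes; the paths here are 
placeholders.)

# run on each node, against the brick copy of the file and its parent directory
stat /path/to/brick/dir/file
getfattr -d -m . -e hex /path/to/brick/dir/file
stat /path/to/brick/dir
getfattr -d -m . -e hex /path/to/brick/dir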

Thanks,
Ravi


Thank you very much in advance for your advise.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐

On April 9, 2018 2:31 PM, Ravishankar N  wrote:



On 04/09/2018 05:54 PM, Dmitry Melekhov wrote:


09.04.2018 16:18, Ravishankar N wrote:


On 04/09/2018 05:40 PM, mabi wrote:


Again thanks that worked and I have now no more unsynched files.

You mentioned that this bug has been fixed in 3.13, would it be

possible to backport it to 3.12? I am asking because 3.13 is not a

long-term release and as such I would not like to have to upgrade to

3.13.

I don't think there will be another 3.12 release.

Why not? It is LTS, right?

My bad. Just checked  the schedule [1], and you are right. It is LTM.

[1] https://www.gluster.org/release-schedule/


Gluster-users mailing list

Gluster-users@gluster.org

http://lists.gluster.org/mailman/listinfo/gluster-users

Gluster-users mailing list

Gluster-users@gluster.org

http://lists.gluster.org/mailman/listinfo/gluster-users




___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-05-15 Thread mabi
Dear all,

I upgraded my replica 3 GlusterFS cluster (and clients) last Friday from 
3.12.7 to 3.12.9 in order to fix this bug, but unfortunately I notice that I 
still have exactly the same problem as initially posted in this thread.

It looks like this bug is not resolved, as just now I again got 3 unsynced 
files on my arbiter node, just as before the upgrade. This problem has been 
occurring since I upgraded to 3.12.7...
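
(The entries in question can be listed per brick, and gluster itself will say 
whether it considers any of them split-brain; a quick sketch using the volume 
name from the logs in this thread.)

# list the entries each brick still reports as pending heal, and any flagged as split-brain
gluster volume heal myvol-private info
gluster volume heal myvol-private info split-brain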

Thank you very much in advance for your advise.

Best regards,
Mabi

‐‐‐ Original Message ‐‐‐

On April 9, 2018 2:31 PM, Ravishankar N  wrote:

> 
> On 04/09/2018 05:54 PM, Dmitry Melekhov wrote:
> 
> > 09.04.2018 16:18, Ravishankar N wrote:
> > 
> > > On 04/09/2018 05:40 PM, mabi wrote:
> > > 
> > > > Again thanks that worked and I have now no more unsynched files.
> > > > 
> > > > You mentioned that this bug has been fixed in 3.13, would it be
> > > > 
> > > > possible to backport it to 3.12? I am asking because 3.13 is not a
> > > > 
> > > > long-term release and as such I would not like to have to upgrade to
> > > > 
> > > > 3.13.
> > > 
> > > I don't think there will be another 3.12 release.
> > 
> > Why not? It is LTS, right?
> 
> My bad. Just checked  the schedule [1], and you are right. It is LTM.
> 
> [1] https://www.gluster.org/release-schedule/
> 
> > Gluster-users mailing list
> > 
> > Gluster-users@gluster.org
> > 
> > http://lists.gluster.org/mailman/listinfo/gluster-users
> 
> Gluster-users mailing list
> 
> Gluster-users@gluster.org
> 
> http://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread mabi
Thanks for clarifying that point. Does this mean that the fix for this bug will 
get backported to the next 3.12 release?

‐‐‐ Original Message ‐‐‐

On April 9, 2018 2:31 PM, Ravishankar N  wrote:

> 
> On 04/09/2018 05:54 PM, Dmitry Melekhov wrote:
> 
> > 09.04.2018 16:18, Ravishankar N wrote:
> > 
> > > On 04/09/2018 05:40 PM, mabi wrote:
> > > 
> > > > Again thanks that worked and I have now no more unsynched files.
> > > > 
> > > > You mentioned that this bug has been fixed in 3.13, would it be
> > > > 
> > > > possible to backport it to 3.12? I am asking because 3.13 is not a
> > > > 
> > > > long-term release and as such I would not like to have to upgrade to
> > > > 
> > > > 3.13.
> > > 
> > > I don't think there will be another 3.12 release.
> > 
> > Why not? It is LTS, right?
> 
> My bad. Just checked  the schedule [1], and you are right. It is LTM.
> 
> [1] https://www.gluster.org/release-schedule/
> 
> > Gluster-users mailing list
> > 
> > Gluster-users@gluster.org
> > 
> > http://lists.gluster.org/mailman/listinfo/gluster-users
> 
> Gluster-users mailing list
> 
> Gluster-users@gluster.org
> 
> http://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread Ravishankar N



On 04/09/2018 05:54 PM, Dmitry Melekhov wrote:

09.04.2018 16:18, Ravishankar N wrote:



On 04/09/2018 05:40 PM, mabi wrote:

Again thanks that worked and I have now no more unsynched files.

You mentioned that this bug has been fixed in 3.13, would it be 
possible to backport it to 3.12? I am asking because 3.13 is not a 
long-term release and as such I would not like to have to upgrade to 
3.13.


I don't think there will be another 3.12 release.


Why not? It is LTS, right?


My bad. Just checked  the schedule [1], and you are right. It is LTM.

[1] https://www.gluster.org/release-schedule/


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread Dmitry Melekhov

09.04.2018 16:18, Ravishankar N wrote:



On 04/09/2018 05:40 PM, mabi wrote:

Again thanks that worked and I have now no more unsynched files.

You mentioned that this bug has been fixed in 3.13, would it be 
possible to backport it to 3.12? I am asking because 3.13 is not a 
long-term release and as such I would not like to have to upgrade to 
3.13.


I don't think there will be another 3.12 release.


Why not? It is LTS, right?

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread Ravishankar N



On 04/09/2018 05:40 PM, mabi wrote:

Again thanks that worked and I have now no more unsynched files.

You mentioned that this bug has been fixed in 3.13, would it be possible to 
backport it to 3.12? I am asking because 3.13 is not a long-term release and as 
such I would not like to have to upgrade to 3.13.


I don't think there will be another 3.12 release. Adding Karthik to see 
if this is something he can back-port to 3.10.12, which is likely to be 
out by the end of the month.

-Ravi




‐‐‐ Original Message ‐‐‐

On April 9, 2018 1:46 PM, Ravishankar N  wrote:



On 04/09/2018 05:09 PM, mabi wrote:


Thanks Ravi for your answer.

Stupid question but how do I delete the trusted.afr xattrs on this brick?

And when you say "this brick", do you mean the brick on the arbiter node (node 
3 in my case)?

Sorry I should have been clearer. Yes the brick on the 3rd node.

`setfattr -x trusted.afr.myvol-private-client-0 
/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile`

`setfattr -x trusted.afr.myvol-private-client-1 
/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile`

After doing this for all files, run `gluster volume heal <volname>`.
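
Putting it together, roughly the following on the arbiter node for each file 
listed by heal info (going by your stat output the brick root on node 3 is 
/srv/glusterfs/myvol-private/brick; <relative-path> is a placeholder for the 
file's path under that root):

getfattr -d -m . -e hex /srv/glusterfs/myvol-private/brick/<relative-path>   # inspect first; the trusted.afr.*-client-0/1 entries are the ones to clear
setfattr -x trusted.afr.myvol-private-client-0 /srv/glusterfs/myvol-private/brick/<relative-path>
setfattr -x trusted.afr.myvol-private-client-1 /srv/glusterfs/myvol-private/brick/<relative-path>

gluster volume heal myvol-private   # then trigger the index heal from any node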

HTH,

Ravi


‐‐‐ Original Message ‐‐‐

On April 9, 2018 1:24 PM, Ravishankar N ravishan...@redhat.com wrote:


On 04/09/2018 04:36 PM, mabi wrote:


As was suggested to me in the past on this mailing list, I now ran a stat and 
getfattr on one of the problematic files on all nodes, and at the end a stat on 
the FUSE mount directly. The output is below:

NODE1:

STAT:

File: 
‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’

Size: 0 Blocks: 38 IO Block: 131072 regular empty file

Device: 23h/35d Inode: 6822549 Links: 2

Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)

Access: 2018-04-09 08:58:54.311556621 +0200

Modify: 2018-04-09 08:58:54.311556621 +0200

Change: 2018-04-09 08:58:54.423555611 +0200

Birth: -

GETFATTR:

trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==

trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"

trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==

trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

NODE2:

STAT:

File: 
‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’

Size: 0 Blocks: 38 IO Block: 131072 regular empty file

Device: 24h/36d Inode: 6825876 Links: 2

Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)

Access: 2018-04-09 08:58:54.311775605 +0200

Modify: 2018-04-09 08:58:54.311775605 +0200

Change: 2018-04-09 08:58:54.423774007 +0200

Birth: -

GETFATTR:

trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==

trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"

trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==

trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

NODE3:

STAT:

File: 
/srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile

Size: 0 Blocks: 8 IO Block: 4096 regular empty file

Device: ca11h/51729d Inode: 404058268 Links: 2

Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)

Access: 2018-04-05 16:16:55.292341447 +0200

Modify: 2018-04-05 16:16:55.292341447 +0200

Change: 2018-04-09 08:58:54.428074177 +0200

Birth: -

GETFATTR:

trusted.afr.dirty=0s

trusted.afr.myvol-private-client-0=0sAQAA

trusted.afr.myvol-private-client-1=0sAQAA

Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed

by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and

launch heal, that should fix it. But the file seems to have no content

on both data bricks as well, so you might want to check if that was

expected.

-Ravi


trusted.bit-rot.version=0sAgBavUW2AAGBaA==

trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==

trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==

trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

CLIENT GLUSTER MOUNT:

STAT:

File: 
'/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'

Size: 0 Blocks: 0 IO Block: 131072 regular empty file

Device: 1eh/30d Inode: 13600685574951729371 Links: 1

Access: (0644/-rw-r--r--) Uid: (20909/nch20909) Gid: (20909/nch20909)

Access: 2018-04-09 08:58:54.311556621 +0200

Modify: 2018-04-09 08:58:54.311556621 +0200

Change: 2018-04-09 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread mabi
Thanks again, that worked and I now have no more unsynced files.

You mentioned that this bug has been fixed in 3.13; would it be possible to 
backport it to 3.12? I am asking because 3.13 is not a long-term release, and as 
such I would not like to have to upgrade to 3.13.


‐‐‐ Original Message ‐‐‐

On April 9, 2018 1:46 PM, Ravishankar N  wrote:

> 
> On 04/09/2018 05:09 PM, mabi wrote:
> 
> > Thanks Ravi for your answer.
> > 
> > Stupid question but how do I delete the trusted.afr xattrs on this brick?
> > 
> > And when you say "this brick", do you mean the brick on the arbiter node 
> > (node 3 in my case)?
> 
> Sorry I should have been clearer. Yes the brick on the 3rd node.
> 
> `setfattr -x trusted.afr.myvol-private-client-0 
> /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile`
> 
> `setfattr -x trusted.afr.myvol-private-client-1 
> /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile`
> 
> After doing this for all files, run `gluster volume heal <volname>`.
> 
> HTH,
> 
> Ravi
> 
> > ‐‐‐ Original Message ‐‐‐
> > 
> > On April 9, 2018 1:24 PM, Ravishankar N ravishan...@redhat.com wrote:
> > 
> > > On 04/09/2018 04:36 PM, mabi wrote:
> > > 
> > > > As was suggested to me in the past on this mailing list, I now ran a stat 
> > > > and getfattr on one of the problematic files on all nodes, and at the 
> > > > end a stat on the FUSE mount directly. The output is below:
> > > > 
> > > > NODE1:
> > > > 
> > > > STAT:
> > > > 
> > > > File: 
> > > > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
> > > > 
> > > > Size: 0 Blocks: 38 IO Block: 131072 regular empty file
> > > > 
> > > > Device: 23h/35d Inode: 6822549 Links: 2
> > > > 
> > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
> > > > 
> > > > Access: 2018-04-09 08:58:54.311556621 +0200
> > > > 
> > > > Modify: 2018-04-09 08:58:54.311556621 +0200
> > > > 
> > > > Change: 2018-04-09 08:58:54.423555611 +0200
> > > > 
> > > > Birth: -
> > > > 
> > > > GETFATTR:
> > > > 
> > > > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
> > > > 
> > > > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
> > > > 
> > > > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
> > > > 
> > > > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==
> > > > 
> > > > NODE2:
> > > > 
> > > > STAT:
> > > > 
> > > > File: 
> > > > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
> > > > 
> > > > Size: 0 Blocks: 38 IO Block: 131072 regular empty file
> > > > 
> > > > Device: 24h/36d Inode: 6825876 Links: 2
> > > > 
> > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
> > > > 
> > > > Access: 2018-04-09 08:58:54.311775605 +0200
> > > > 
> > > > Modify: 2018-04-09 08:58:54.311775605 +0200
> > > > 
> > > > Change: 2018-04-09 08:58:54.423774007 +0200
> > > > 
> > > > Birth: -
> > > > 
> > > > GETFATTR:
> > > > 
> > > > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
> > > > 
> > > > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
> > > > 
> > > > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
> > > > 
> > > > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==
> > > > 
> > > > NODE3:
> > > > 
> > > > STAT:
> > > > 
> > > > File: 
> > > > /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile
> > > > 
> > > > Size: 0 Blocks: 8 IO Block: 4096 regular empty file
> > > > 
> > > > Device: ca11h/51729d Inode: 404058268 Links: 2
> > > > 
> > > > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
> > > > 
> > > > Access: 2018-04-05 16:16:55.292341447 +0200
> > > > 
> > > > Modify: 2018-04-05 16:16:55.292341447 +0200
> > > > 
> > > > Change: 2018-04-09 08:58:54.428074177 +0200
> > > > 
> > > > Birth: -
> > > > 
> > > > GETFATTR:
> > > > 
> > > > trusted.afr.dirty=0s
> > > > 
> > > > trusted.afr.myvol-private-client-0=0sAQAA
> > > > 
> > > > trusted.afr.myvol-private-client-1=0sAQAA
> > > > 
> > > Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed
> > > 
> > > by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and
> > > 
> > > launch heal, that should fix it. But the file seems to have no content
> > > 
> > > on both data bricks as well, so you might want to check if that was
> > > 
> > > expected.
> > > 
> > > -Ravi
> > > 
> > > > 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread Ravishankar N



On 04/09/2018 05:09 PM, mabi wrote:

Thanks Ravi for your answer.

Stupid question but how do I delete the trusted.afr xattrs on this brick?

And when you say "this brick", do you mean the brick on the arbiter node (node 
3 in my case)?

Sorry I should have been clearer. Yes the brick on the 3rd node.

`setfattr -x trusted.afr.myvol-private-client-0 
/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile`


`setfattr -x trusted.afr.myvol-private-client-1 
/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile`


After doing this for all files, run `gluster volume heal <volname>`.

HTH,
Ravi


‐‐‐ Original Message ‐‐‐

On April 9, 2018 1:24 PM, Ravishankar N  wrote:



On 04/09/2018 04:36 PM, mabi wrote:


As was suggested to me in the past on this mailing list, I now ran a stat and 
getfattr on one of the problematic files on all nodes, and at the end a stat on 
the FUSE mount directly. The output is below:

NODE1:

STAT:

File: 
‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’

Size: 0 Blocks: 38 IO Block: 131072 regular empty file

Device: 23h/35d Inode: 6822549 Links: 2

Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)

Access: 2018-04-09 08:58:54.311556621 +0200

Modify: 2018-04-09 08:58:54.311556621 +0200

Change: 2018-04-09 08:58:54.423555611 +0200

Birth: -

GETFATTR:

trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==

trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"

trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==

trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

NODE2:

STAT:

File: 
‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’

Size: 0 Blocks: 38 IO Block: 131072 regular empty file

Device: 24h/36d Inode: 6825876 Links: 2

Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)

Access: 2018-04-09 08:58:54.311775605 +0200

Modify: 2018-04-09 08:58:54.311775605 +0200

Change: 2018-04-09 08:58:54.423774007 +0200

Birth: -

GETFATTR:

trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==

trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"

trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==

trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

NODE3:

STAT:

File: 
/srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile

Size: 0 Blocks: 8 IO Block: 4096 regular empty file

Device: ca11h/51729d Inode: 404058268 Links: 2

Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)

Access: 2018-04-05 16:16:55.292341447 +0200

Modify: 2018-04-05 16:16:55.292341447 +0200

Change: 2018-04-09 08:58:54.428074177 +0200

Birth: -

GETFATTR:

trusted.afr.dirty=0s

trusted.afr.myvol-private-client-0=0sAQAA

trusted.afr.myvol-private-client-1=0sAQAA

Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed

by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and

launch heal, that should fix it. But the file seems to have no content

on both data bricks as well, so you might want to check if that was

expected.

-Ravi


trusted.bit-rot.version=0sAgBavUW2AAGBaA==

trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==

trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==

trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

CLIENT GLUSTER MOUNT:

STAT:

File: 
'/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'

Size: 0 Blocks: 0 IO Block: 131072 regular empty file

Device: 1eh/30d Inode: 13600685574951729371 Links: 1

Access: (0644/-rw-r--r--) Uid: (20909/nch20909) Gid: (20909/nch20909)

Access: 2018-04-09 08:58:54.311556621 +0200

Modify: 2018-04-09 08:58:54.311556621 +0200

Change: 2018-04-09 08:58:54.423555611 +0200

Birth: -

‐‐‐ Original Message ‐‐‐

On April 9, 2018 9:49 AM, mabi m...@protonmail.ch wrote:


Here would be also the corresponding log entries on a gluster node brick log 
file:

[2018-04-09 06:58:47.363536] W [MSGID: 113093] 
[posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: 
removing gfid2path xattr failed on 
/data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a:
 key = trusted.gfid2path.2529bb66b56be110 [No data available]

[2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread mabi
Thanks Ravi for your answer.

Stupid question but how do I delete the trusted.afr xattrs on this brick?

And when you say "this brick", do you mean the brick on the arbiter node (node 
3 in my case)?

‐‐‐ Original Message ‐‐‐

On April 9, 2018 1:24 PM, Ravishankar N  wrote:

> 
> On 04/09/2018 04:36 PM, mabi wrote:
> 
> > As was suggested to me in the past on this mailing list, I now ran a stat and 
> > getfattr on one of the problematic files on all nodes, and at the end a stat 
> > on the FUSE mount directly. The output is below:
> > 
> > NODE1:
> > 
> > STAT:
> > 
> > File: 
> > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
> > 
> > Size: 0 Blocks: 38 IO Block: 131072 regular empty file
> > 
> > Device: 23h/35d Inode: 6822549 Links: 2
> > 
> > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
> > 
> > Access: 2018-04-09 08:58:54.311556621 +0200
> > 
> > Modify: 2018-04-09 08:58:54.311556621 +0200
> > 
> > Change: 2018-04-09 08:58:54.423555611 +0200
> > 
> > Birth: -
> > 
> > GETFATTR:
> > 
> > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
> > 
> > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
> > 
> > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
> > 
> > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==
> > 
> > NODE2:
> > 
> > STAT:
> > 
> > File: 
> > ‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
> > 
> > Size: 0 Blocks: 38 IO Block: 131072 regular empty file
> > 
> > Device: 24h/36d Inode: 6825876 Links: 2
> > 
> > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
> > 
> > Access: 2018-04-09 08:58:54.311775605 +0200
> > 
> > Modify: 2018-04-09 08:58:54.311775605 +0200
> > 
> > Change: 2018-04-09 08:58:54.423774007 +0200
> > 
> > Birth: -
> > 
> > GETFATTR:
> > 
> > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
> > 
> > trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
> > 
> > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
> > 
> > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==
> > 
> > NODE3:
> > 
> > STAT:
> > 
> > File: 
> > /srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile
> > 
> > Size: 0 Blocks: 8 IO Block: 4096 regular empty file
> > 
> > Device: ca11h/51729d Inode: 404058268 Links: 2
> > 
> > Access: (0644/-rw-r--r--) Uid: (20909/ UNKNOWN) Gid: (20909/ UNKNOWN)
> > 
> > Access: 2018-04-05 16:16:55.292341447 +0200
> > 
> > Modify: 2018-04-05 16:16:55.292341447 +0200
> > 
> > Change: 2018-04-09 08:58:54.428074177 +0200
> > 
> > Birth: -
> > 
> > GETFATTR:
> > 
> > trusted.afr.dirty=0s
> > 
> > trusted.afr.myvol-private-client-0=0sAQAA
> > 
> > trusted.afr.myvol-private-client-1=0sAQAA
> 
> Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed
> 
> by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and
> 
> launch heal, that should fix it. But the file seems to have no content
> 
> on both data bricks as well, so you might want to check if that was
> 
> expected.
> 
> -Ravi
> 
> > trusted.bit-rot.version=0sAgBavUW2AAGBaA==
> > 
> > trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
> > 
> > trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
> > 
> > trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==
> > 
> > CLIENT GLUSTER MOUNT:
> > 
> > STAT:
> > 
> > File: 
> > '/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
> > 
> > Size: 0 Blocks: 0 IO Block: 131072 regular empty file
> > 
> > Device: 1eh/30d Inode: 13600685574951729371 Links: 1
> > 
> > Access: (0644/-rw-r--r--) Uid: (20909/nch20909) Gid: (20909/nch20909)
> > 
> > Access: 2018-04-09 08:58:54.311556621 +0200
> > 
> > Modify: 2018-04-09 08:58:54.311556621 +0200
> > 
> > Change: 2018-04-09 08:58:54.423555611 +0200
> > 
> > Birth: -
> > 
> > ‐‐‐ Original Message ‐‐‐
> > 
> > On April 9, 2018 9:49 AM, mabi m...@protonmail.ch wrote:
> > 
> > > Here would be also the corresponding log entries on a gluster node brick 
> > > log file:
> > > 
> > > [2018-04-09 06:58:47.363536] W [MSGID: 113093] 
> > > [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 
> > > 0-myvol-private-posix: removing gfid2path xattr failed on 
> > > /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a:
> > >  key = trusted.gfid2path.2529bb66b56be110 [No data available]
> > > 
> > > [2018-04-09 06:58:54.178133] E [MSGID: 113015] 
> > > 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread Ravishankar N



On 04/09/2018 04:36 PM, mabi wrote:

As was suggested to me in the past on this mailing list, I now ran a stat and 
getfattr on one of the problematic files on all nodes, and at the end a stat on 
the FUSE mount directly. The output is below:

NODE1:

STAT:
   File: 
‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
   Size: 0  Blocks: 38 IO Block: 131072 regular empty file
Device: 23h/35d Inode: 6822549 Links: 2
Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)   Gid: (20909/ UNKNOWN)
Access: 2018-04-09 08:58:54.311556621 +0200
Modify: 2018-04-09 08:58:54.311556621 +0200
Change: 2018-04-09 08:58:54.423555611 +0200
  Birth: -

GETFATTR:
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

NODE2:

STAT:
   File: 
‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
   Size: 0  Blocks: 38 IO Block: 131072 regular empty file
Device: 24h/36d Inode: 6825876 Links: 2
Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)   Gid: (20909/ UNKNOWN)
Access: 2018-04-09 08:58:54.311775605 +0200
Modify: 2018-04-09 08:58:54.311775605 +0200
Change: 2018-04-09 08:58:54.423774007 +0200
  Birth: -

GETFATTR:
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

NODE3:
STAT:
   File: 
/srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile
   Size: 0  Blocks: 8  IO Block: 4096   regular empty file
Device: ca11h/51729dInode: 404058268   Links: 2
Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)   Gid: (20909/ UNKNOWN)
Access: 2018-04-05 16:16:55.292341447 +0200
Modify: 2018-04-05 16:16:55.292341447 +0200
Change: 2018-04-09 08:58:54.428074177 +0200
  Birth: -

GETFATTR:
trusted.afr.dirty=0s
trusted.afr.myvol-private-client-0=0sAQAA
trusted.afr.myvol-private-client-1=0sAQAA


Looks like you hit the bug of arbiter becoming source (BZ 1482064) fixed 
by Karthik in 3.13. Just delete the trusted.afr xattrs on this brick and 
launch heal, that should fix it. But the file seems to have no content 
on both data bricks as well, so you might want to check if that was 
expected.
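
If in doubt before removing anything, a quick sanity check along these lines 
helps (the relative file path is a placeholder):

getfattr -d -m . -e hex /data/myvol-private/brick/<relative-path>            # nodes 1 and 2: the trusted.afr.* entries should be all zeroes
getfattr -d -m . -e hex /srv/glusterfs/myvol-private/brick/<relative-path>   # node 3: non-zero trusted.afr.myvol-private-client-0/1 means the arbiter blames both data bricks

Only the brick that wrongly shows non-zero pending counters for the other two, 
i.e. the arbiter here, should have its trusted.afr xattrs removed.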


-Ravi


trusted.bit-rot.version=0sAgBavUW2AAGBaA==
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

CLIENT GLUSTER MOUNT:
STAT:
   File: 
'/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
   Size: 0  Blocks: 0  IO Block: 131072 regular empty file
Device: 1eh/30d Inode: 13600685574951729371  Links: 1
Access: (0644/-rw-r--r--)  Uid: (20909/nch20909)   Gid: (20909/nch20909)
Access: 2018-04-09 08:58:54.311556621 +0200
Modify: 2018-04-09 08:58:54.311556621 +0200
Change: 2018-04-09 08:58:54.423555611 +0200
  Birth: -

‐‐‐ Original Message ‐‐‐

On April 9, 2018 9:49 AM, mabi  wrote:



Here would be also the corresponding log entries on a gluster node brick log 
file:

[2018-04-09 06:58:47.363536] W [MSGID: 113093] 
[posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: 
removing gfid2path xattr failed on 
/data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a:
 key = trusted.gfid2path.2529bb66b56be110 [No data available]

[2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 
0-myvol-private-posix: opendir failed on 
/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE
 [No such file or directory]

Hope that helps to find out the issue.

‐‐‐ Original Message ‐‐‐

On April 9, 2018 9:37 AM, mabi m...@protonmail.ch wrote:


Hello,

Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbiter) cluster to 
3.12.7 and this morning I got a warning that 9 files on one of my volumes are not synced. 
Indeed, checking that volume with a "volume heal info" shows that the third 
node (the arbiter node) has 9 files to be healed, but they are not being healed automatically.

All nodes were always online and there was no network interruption so I am 
wondering if this might not 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread mabi
As was suggested to me in the past on this mailing list, I now ran a stat and 
getfattr on one of the problematic files on all nodes, and at the end a stat on 
the FUSE mount directly. The output is below:
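
For reference, this kind of output comes from plain stat and getfattr run 
against each brick and against the FUSE mount, roughly as follows 
(<relative-path> is a placeholder for the long directory path of the 
problematic file):

stat /data/myvol-private/brick/<relative-path>               # nodes 1 and 2
getfattr -d -m . /data/myvol-private/brick/<relative-path>
stat /srv/glusterfs/myvol-private/brick/<relative-path>      # node 3 (arbiter)
getfattr -d -m . /srv/glusterfs/myvol-private/brick/<relative-path>
stat /mnt/myvol-private/<relative-path>                      # client, through the FUSE mount

(getfattr's default encoding prints binary values base64-encoded with the 0s 
prefix, which is what you see below.)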

NODE1:

STAT:
  File: 
‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
  Size: 0   Blocks: 38 IO Block: 131072 regular empty file
Device: 23h/35d Inode: 6822549 Links: 2
Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)   Gid: (20909/ UNKNOWN)
Access: 2018-04-09 08:58:54.311556621 +0200
Modify: 2018-04-09 08:58:54.311556621 +0200
Change: 2018-04-09 08:58:54.423555611 +0200
 Birth: -

GETFATTR:
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

NODE2:

STAT:
  File: 
‘/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile’
  Size: 0   Blocks: 38 IO Block: 131072 regular empty file
Device: 24h/36d Inode: 6825876 Links: 2
Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)   Gid: (20909/ UNKNOWN)
Access: 2018-04-09 08:58:54.311775605 +0200
Modify: 2018-04-09 08:58:54.311775605 +0200
Change: 2018-04-09 08:58:54.423774007 +0200
 Birth: -

GETFATTR:
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.gfid2path.d40e834f9a258d9f="13880e8c-13da-442f-8180-fa40b6f5327c/problematicfile"
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

NODE3:
STAT:
  File: 
/srv/glusterfs/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile
  Size: 0   Blocks: 8  IO Block: 4096   regular empty file
Device: ca11h/51729dInode: 404058268   Links: 2
Access: (0644/-rw-r--r--)  Uid: (20909/ UNKNOWN)   Gid: (20909/ UNKNOWN)
Access: 2018-04-05 16:16:55.292341447 +0200
Modify: 2018-04-05 16:16:55.292341447 +0200
Change: 2018-04-09 08:58:54.428074177 +0200
 Birth: -

GETFATTR:
trusted.afr.dirty=0s
trusted.afr.myvol-private-client-0=0sAQAA
trusted.afr.myvol-private-client-1=0sAQAA
trusted.bit-rot.version=0sAgBavUW2AAGBaA==
trusted.gfid=0smMGdfAozTLS8v1d4jMb42w==
trusted.glusterfs.quota.13880e8c-13da-442f-8180-fa40b6f5327c.contri.1=0sAQ==
trusted.pgfid.13880e8c-13da-442f-8180-fa40b6f5327c=0sAQ==

CLIENT GLUSTER MOUNT:
STAT:
  File: 
'/mnt/myvol-private/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfiledir.zip/OC_DEFAULT_MODULE/problematicfile'
  Size: 0   Blocks: 0  IO Block: 131072 regular empty file
Device: 1eh/30d Inode: 13600685574951729371  Links: 1
Access: (0644/-rw-r--r--)  Uid: (20909/nch20909)   Gid: (20909/nch20909)
Access: 2018-04-09 08:58:54.311556621 +0200
Modify: 2018-04-09 08:58:54.311556621 +0200
Change: 2018-04-09 08:58:54.423555611 +0200
 Birth: -

‐‐‐ Original Message ‐‐‐

On April 9, 2018 9:49 AM, mabi  wrote:

> 
> Here would be also the corresponding log entries on a gluster node brick log 
> file:
> 
> [2018-04-09 06:58:47.363536] W [MSGID: 113093] 
> [posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: 
> removing gfid2path xattr failed on 
> /data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a:
>  key = trusted.gfid2path.2529bb66b56be110 [No data available]
> 
> [2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 
> 0-myvol-private-posix: opendir failed on 
> /data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE
>  [No such file or directory]
> 
> Hope that helps to find out the issue.
> 
> ‐‐‐ Original Message ‐‐‐
> 
> On April 9, 2018 9:37 AM, mabi m...@protonmail.ch wrote:
> 
> > Hello,
> > 
> > Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbiter) 
> > cluster to 3.12.7 and this morning I got a warning that 9 files on one of 
> > my volumes are not synced. Indeed, checking that volume with a "volume heal 
> > info" shows that the third node (the arbiter node) has 9 files to be 
> > healed, but they are not being healed automatically.
> > 
> > All nodes were always online and there was no network interruption so I am 
> > wondering if this might not really be a split-brain issue but something 
> > else.
> > 
> > I found some interesting log entries on the client log file 
> > (/var/log/glusterfs/myvol-private.log) which I have included below in this 
> > mail. It looks like some renaming has gone wrong because a directory is 

Re: [Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread mabi
Here would be also the corresponding log entries on a gluster node brick log 
file:

[2018-04-09 06:58:47.363536] W [MSGID: 113093] 
[posix-gfid-path.c:84:posix_remove_gfid2path_xattr] 0-myvol-private-posix: 
removing gfid2path xattr failed on 
/data/myvol-private/brick/.glusterfs/12/67/126759f6-8364-453c-9a9c-d9ed39198b7a:
 key = trusted.gfid2path.2529bb66b56be110 [No data available]
[2018-04-09 06:58:54.178133] E [MSGID: 113015] [posix.c:1208:posix_opendir] 
0-myvol-private-posix: opendir failed on 
/data/myvol-private/brick/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip/OC_DEFAULT_MODULE
 [No such file or directory]
Hope that helps to find out the issue.

‐‐‐ Original Message ‐‐‐

On April 9, 2018 9:37 AM, mabi  wrote:

> 
> Hello,
> 
> Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbiter) 
> cluster to 3.12.7 and this morning I got a warning that 9 files on one of my 
> volumes are not synced. Indeed, checking that volume with a "volume heal 
> info" shows that the third node (the arbiter node) has 9 files to be healed, 
> but they are not being healed automatically.
> 
> All nodes were always online and there was no network interruption so I am 
> wondering if this might not really be a split-brain issue but something else.
> 
> I found some interesting log entries on the client log file 
> (/var/log/glusterfs/myvol-private.log) which I have included below in this 
> mail. It looks like some renaming has gone wrong because a directory is not 
> empty.
> 
> For your information I have upgraded my GlusterFS in offline mode and the 
> upgrade went smoothly.
> 
> What can I do to fix that issue?
> 
> Best regards,
> 
> Mabi
> 
> [2018-04-09 06:58:46.906089] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 
> 0-myvol-private-dht: renaming 
> /dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/azipfile.zip 
> (hash=myvol-private-replicate-0/cache=myvol-private-replicate-0) => 
> /dir1/di2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip
>  (hash=myvol-private-replicate-0/cache=)
> 
> [2018-04-09 06:58:53.692440] W [MSGID: 114031] 
> [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote 
> operation failed [Directory not empty]
> 
> [2018-04-09 06:58:53.714129] W [MSGID: 114031] 
> [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-1: 
> remote operation failed. Path: gfid:13880e8c-13da-442f-8180-fa40b6f5327c 
> (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
> 
> [2018-04-09 06:58:53.714161] W [MSGID: 114031] 
> [client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-0: 
> remote operation failed. Path: gfid:13880e8c-13da-442f-8180-fa40b6f5327c 
> (13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
> 
> [2018-04-09 06:58:53.715638] W [MSGID: 114031] 
> [client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote 
> operation failed [Directory not empty]
> 
> [2018-04-09 06:58:53.750372] I [MSGID: 108026] 
> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 
> 0-myvol-private-replicate-0: performing metadata selfheal on 
> 1cc6facf-eca5-481c-a905-7a39faa25156
> 
> [2018-04-09 06:58:53.757677] I [MSGID: 108026] 
> [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: 
> Completed metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. 
> sources=[2] sinks=0 1
> 
> [2018-04-09 06:58:53.775939] I [MSGID: 108026] 
> [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 
> 0-myvol-private-replicate-0: performing entry selfheal on 
> 1cc6facf-eca5-481c-a905-7a39faa25156
> 
> [2018-04-09 06:58:53.776237] I [MSGID: 108026] 
> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 
> 0-myvol-private-replicate-0: performing metadata selfheal on 
> 13880e8c-13da-442f-8180-fa40b6f5327c
> 
> [2018-04-09 06:58:53.781762] I [MSGID: 108026] 
> [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: 
> Completed metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c. 
> sources=[2] sinks=0 1
> 
> [2018-04-09 06:58:53.796950] I [MSGID: 108026] 
> [afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: 
> Completed entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2] 
> sinks=0 1
> 
> [2018-04-09 06:58:53.812682] I [MSGID: 108026] 
> [afr-self-heal-entry.c:887:afr_selfheal_entry_do] 
> 0-myvol-private-replicate-0: performing entry selfheal on 
> 13880e8c-13da-442f-8180-fa40b6f5327c
> 
> [2018-04-09 06:58:53.879382] E [MSGID: 108008] 
> [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: 
> Failing READ on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain 
> observed. [Input/output error]
> 
> [2018-04-09 06:58:53.881514] E [MSGID: 108008] 
> [afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: 
> Failing FGETXATTR on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain 
> 

[Gluster-users] New 3.12.7 possible split-brain on replica 3

2018-04-09 Thread mabi
Hello,

Last Friday I upgraded my GlusterFS 3.10.7 3-way replica (with arbiter) 
cluster to 3.12.7 and this morning I got a warning that 9 files on one of my 
volumes are not synced. Indeed, checking that volume with a "volume heal info" 
shows that the third node (the arbiter node) has 9 files to be healed, but 
they are not being healed automatically.
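
The check itself was along these lines (the volume name is just mine, as an 
example):

gluster volume heal myvol-private info                 # entries each brick still needs to heal
gluster volume heal myvol-private info split-brain     # only entries actually flagged as split-brain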

All nodes were always online and there was no network interruption so I am 
wondering if this might not really be a split-brain issue but something else.

I found some interesting log entries on the client log file 
(/var/log/glusterfs/myvol-private.log) which I have included below in this 
mail. It looks like some renaming has gone wrong because a directory is not 
empty.

For your information I have upgraded my GlusterFS in offline mode and the 
upgrade went smoothly.

What can I do to fix that issue?

Best regards,
Mabi


[2018-04-09 06:58:46.906089] I [MSGID: 109066] [dht-rename.c:1741:dht_rename] 
0-myvol-private-dht: renaming 
/dir1/dir2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/azipfile.zip 
(hash=myvol-private-replicate-0/cache=myvol-private-replicate-0) => 
/dir1/di2/dir3/dir4/dir5/dir6/dir7/dir8/dir9/dir10/dir11/dir12_Archiv/azipfile.zip
 (hash=myvol-private-replicate-0/cache=)
[2018-04-09 06:58:53.692440] W [MSGID: 114031] 
[client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote 
operation failed [Directory not empty]
[2018-04-09 06:58:53.714129] W [MSGID: 114031] 
[client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-1: remote 
operation failed. Path:  
(13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
[2018-04-09 06:58:53.714161] W [MSGID: 114031] 
[client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-myvol-private-client-0: remote 
operation failed. Path:  
(13880e8c-13da-442f-8180-fa40b6f5327c) [No such file or directory]
[2018-04-09 06:58:53.715638] W [MSGID: 114031] 
[client-rpc-fops.c:670:client3_3_rmdir_cbk] 0-myvol-private-client-2: remote 
operation failed [Directory not empty]
[2018-04-09 06:58:53.750372] I [MSGID: 108026] 
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 
0-myvol-private-replicate-0: performing metadata selfheal on 
1cc6facf-eca5-481c-a905-7a39faa25156
[2018-04-09 06:58:53.757677] I [MSGID: 108026] 
[afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: 
Completed metadata selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. 
sources=[2]  sinks=0 1 
[2018-04-09 06:58:53.775939] I [MSGID: 108026] 
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-myvol-private-replicate-0: 
performing entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156
[2018-04-09 06:58:53.776237] I [MSGID: 108026] 
[afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 
0-myvol-private-replicate-0: performing metadata selfheal on 
13880e8c-13da-442f-8180-fa40b6f5327c
[2018-04-09 06:58:53.781762] I [MSGID: 108026] 
[afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: 
Completed metadata selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c. 
sources=[2]  sinks=0 1 
[2018-04-09 06:58:53.796950] I [MSGID: 108026] 
[afr-self-heal-common.c:1656:afr_log_selfheal] 0-myvol-private-replicate-0: 
Completed entry selfheal on 1cc6facf-eca5-481c-a905-7a39faa25156. sources=[2]  
sinks=0 1 
[2018-04-09 06:58:53.812682] I [MSGID: 108026] 
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 0-myvol-private-replicate-0: 
performing entry selfheal on 13880e8c-13da-442f-8180-fa40b6f5327c
[2018-04-09 06:58:53.879382] E [MSGID: 108008] 
[afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: 
Failing READ on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain 
observed. [Input/output error]
[2018-04-09 06:58:53.881514] E [MSGID: 108008] 
[afr-read-txn.c:90:afr_read_txn_refresh_done] 0-myvol-private-replicate-0: 
Failing FGETXATTR on gfid a4c46519-7dda-489d-9f5d-811ededd53f1: split-brain 
observed. [Input/output error]
[2018-04-09 06:58:53.890073] W [MSGID: 108027] 
[afr-common.c:2798:afr_discover_done] 0-myvol-private-replicate-0: no read 
subvols for (null)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users