Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors

2014-10-20 Thread Anirban Ghoshal
Ok, no problem. The issue is very rare, even with our setup - we have seen it 
only once on one site even though we have been in production for several months 
now. For now, we can live with that IMO. 

And, thanks again. 

Anirban
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors

2014-10-19 Thread Anirban Ghoshal
It is possible, yes, because these are actually a kind of log file. I suppose,
like other logging frameworks, these files can remain open for a considerable
period and then get renamed to support log-rotate semantics.

That said, I might need to check with the team that actually manages the 
logging framework to be sure. I only take care of the file-system stuff. I can 
tell you for sure Monday. 

If it is the same race that you mention, is there a fix for it?

Thanks,
Anirban

Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors

2014-10-19 Thread Pranith Kumar Karampuri


On 10/19/2014 01:36 PM, Anirban Ghoshal wrote:
It is possible, yes, because these are actually a kind of log file. I
suppose, like other logging frameworks, these files can remain open for
a considerable period and then get renamed to support log-rotate
semantics.


That said, I might need to check with the team that actually manages 
the logging framework to be sure. I only take care of the file-system 
stuff. I can tell you for sure Monday.


If it is the same race that you mention, is there a fix for it?

Thanks,
Anirban



I am working on the fix.

RCA:
0) Let's say the file 'abc.log' is opened for writing on replica pair
(brick-0, brick-1)

1) brick-0 went down
2) abc.log is renamed to abc.log.1
3) brick-0 comes back up
4) re-open on old abc.log happens from mount to brick-0
5) self-heal kicks in and deletes old abc.log and creates and syncs 
abc.log.1
6) But the mount is still writing to the deleted 'old abc.log' on
brick-0, so abc.log.1 remains at the same size on brick-0 while
abc.log.1 keeps increasing on brick-1. This leads to a size-mismatch
split-brain on abc.log.1.


The race happens between steps 4) and 5). If 5) happens before 4), no
split-brain will be observed.
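The mechanism behind step 6 is plain POSIX semantics: deleting a file removes only its directory entry, and a process holding an open descriptor keeps writing to the now-invisible inode. A minimal sketch on a local filesystem (file names here are illustrative only, not the gluster internals):

```python
import os
import tempfile

# Open a log file for writing and hold the descriptor, as the mount does.
path = os.path.join(tempfile.mkdtemp(), "abc.log")
fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
os.write(fd, b"before rotation\n")

# Simulate self-heal deleting the old abc.log (step 5).
os.unlink(path)

# The writer is unaware: the write still succeeds, growing the unlinked inode,
# even though the name is gone from the directory.
os.write(fd, b"after deletion\n")
print(os.fstat(fd).st_size)   # 31: both writes landed on the orphaned inode
print(os.path.exists(path))   # False: the name is gone
os.close(fd)
```

On brick-0 the mount's writes land on such an orphaned inode, which is why the visible abc.log.1 stops growing there.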


Work-around:

0) Take a backup of the good abc.log.1 file from brick-1. (Just being paranoid.)

Do either of the following to make sure the stale open file gets closed:
1-a) Take down the brick process holding the bad file using kill -9 brick-pid
(in my example, brick-0).

1-b) Introduce a temporary disconnect between mount and brick-0.
(I would choose 1-a)
2) Remove the bad file (abc.log.1) and its gfid backend file from brick-0.
3) Bring the brick back up (gluster volume start volname force) / restore
the connection, and let it heal by doing 'stat' on the file abc.log.1 from
the mount.
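Put together as a command sketch (untested here; the volume name `testvol`, the brick paths, and the mount point are placeholders, and the exact `.glusterfs` backend path must be derived from the gfid that getfattr prints):

```sh
# 0) On the brick-1 node: keep a copy of the good file (paranoia)
cp -a /bricks/brick1/SECLOG/abc.log.1 /root/abc.log.1.bak

# 1-a) On the brick-0 node: find the brick PID and kill it
gluster volume status testvol      # note the PID listed for brick-0
kill -9 <brick-0-pid>

# 2) Remove the bad file and its gfid backend hard-link from brick-0.
#    The backend link lives under .glusterfs/<xx>/<yy>/<gfid>, where xx and yy
#    are the first two byte-pairs of the gfid shown by getfattr.
getfattr -n trusted.gfid -e hex /bricks/brick0/SECLOG/abc.log.1
rm /bricks/brick0/SECLOG/abc.log.1
rm /bricks/brick0/.glusterfs/<xx>/<yy>/<gfid>

# 3) Restart the brick and trigger the heal from the mount
gluster volume start testvol force
stat /mnt/testvol/SECLOG/abc.log.1
```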


This bug has existed since 2012, from the first time I implemented
rename/hard-link self-heal. It is difficult to re-create; I have to put
break-points at several places in the process to hit the race.


Pranith


Thanks,
Anirban


*From: * Pranith Kumar Karampuri pkara...@redhat.com;
*To: * Anirban Ghoshal chalcogen_eg_oxy...@yahoo.com; 
gluster-users@gluster.org;
*Subject: * Re: [Gluster-users] Split-brain seen with [0 0] pending 
matrix and io-cache page errors

*Sent: * Sun, Oct 19, 2014 5:42:24 AM


On 10/18/2014 04:36 PM, Anirban Ghoshal wrote:

Hi,

Yes, they do, and considerably. I'd forgotten to mention that in my
last email. Their mtimes, however, as far as I could tell on separate
servers, seemed to coincide.


Thanks,
Anirban




Are these files always open? And is it possible that the file could 
have been renamed when one of the bricks was offline? I know of a race 
which can introduce this one. Just trying to find if it is the same case.


Pranith




*From: * Pranith Kumar Karampuri pkara...@redhat.com;
*To: * Anirban Ghoshal chalcogen_eg_oxy...@yahoo.com; 
gluster-users@gluster.org gluster-users@gluster.org;
*Subject: * Re: [Gluster-users] Split-brain seen with [0 0] pending 
matrix and io-cache page errors

*Sent: * Sat, Oct 18, 2014 12:26:08 AM

hi,
  Could you see if the size of the file mismatches?

Pranith

On 10/18/2014 04:20 AM, Anirban Ghoshal wrote:

Hi everyone,

I have this really confusing split-brain here that's bothering me. I
am running glusterfs 3.4.2 over linux 2.6.34. I have a replica 2
volume 'testvol'. It seems I cannot read/stat/edit the file
in question, and `gluster volume heal testvol info split-brain`
shows nothing. Here are the logs from the fuse-mount for the volume:


[2014-09-29 07:53:02.867111] W [fuse-bridge.c:1172:fuse_err_cbk] 
0-glusterfs-fuse: 4560969: FLUSH() ERR = -1 (Input/output error)
[2014-09-29 07:54:16.007799] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c8529d20  waitq = 
0x7fd5c8067d40
[2014-09-29 07:54:16.007854] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561103: READ = -1 (Input/output error)
[2014-09-29 07:54:16.008018] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c8607ee0  waitq = 
0x7fd5c8067d40
[2014-09-29 07:54:16.008056] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561104: READ = -1 (Input/output error)
[2014-09-29 07:54:16.008233] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c8066f30  waitq = 
0x7fd5c8067d40
[2014-09-29 07:54:16.008269] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561105: READ = -1 (Input/output error)
[2014-09-29 07:54:16.008800] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c860bcf0  waitq = 
0x7fd5c863b1f0
[2014-09-29 07:54:16.008839] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561107: READ = -1 (Input/output error)
[2014-09-29 07:54:16.009365] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error

Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors

2014-10-19 Thread Anirban Ghoshal
I see. Thanks a tonne for the thorough explanation! :) I can see that our setup 
would be vulnerable here because the logger on one server is not generally 
aware of the state of the replica on the other server. So, it is possible that 
the log files may have been renamed before heal had a chance to kick in. 

Could I also request you for the bug ID (should there be one) against which you 
are coding up the fix, so that we could get a notification once it is passed?

Also, as an aside, is O_DIRECT supposed to prevent this from occurring if one 
were to make allowance for the performance hit? 

Thanks again,
Anirban

Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors

2014-10-19 Thread Pranith Kumar Karampuri


On 10/19/2014 06:05 PM, Anirban Ghoshal wrote:
I see. Thanks a tonne for the thorough explanation! :) I can see that 
our setup would be vulnerable here because the logger on one server is 
not generally aware of the state of the replica on the other server. 
So, it is possible that the log files may have been renamed before 
heal had a chance to kick in.


Could I also request you for the bug ID (should there be one) against 
which you are coding up the fix, so that we could get a notification 
once it is passed?


This bug was reported by Red Hat QE and the bug is cloned upstream. I
copied the relevant content so you would understand the context:

https://bugzilla.redhat.com/show_bug.cgi?id=1154491

Pranith


Also, as an aside, is O_DIRECT supposed to prevent this from occurring 
if one were to make allowance for the performance hit?



Unfortunately no :-(. As far as I understand that was the only work-around.

Pranith


Thanks again,
Anirban



*From: * Pranith Kumar Karampuri pkara...@redhat.com;
*To: * Anirban Ghoshal chalcogen_eg_oxy...@yahoo.com; 
gluster-users@gluster.org;
*Subject: * Re: [Gluster-users] Split-brain seen with [0 0] pending 
matrix and io-cache page errors

*Sent: * Sun, Oct 19, 2014 9:01:58 AM


On 10/19/2014 01:36 PM, Anirban Ghoshal wrote:
It is possible, yes, because these are actually a kind of log file.
I suppose, like other logging frameworks, these files can remain open
for a considerable period and then get renamed to support log-rotate
semantics.


That said, I might need to check with the team that actually manages 
the logging framework to be sure. I only take care of the file-system 
stuff. I can tell you for sure Monday.


If it is the same race that you mention, is there a fix for it?

Thanks,
Anirban



I am working on the fix.

RCA:
0) Let's say the file 'abc.log' is opened for writing on replica pair
(brick-0, brick-1)

1) brick-0 went down
2) abc.log is renamed to abc.log.1
3) brick-0 comes back up
4) re-open on old abc.log happens from mount to brick-0
5) self-heal kicks in and deletes old abc.log and creates and syncs 
abc.log.1
6) But the mount is still writing to the deleted 'old abc.log' on
brick-0, so abc.log.1 remains at the same size on brick-0 while
abc.log.1 keeps increasing on brick-1. This leads to a size-mismatch
split-brain on abc.log.1.


The race happens between steps 4) and 5). If 5) happens before 4), no
split-brain will be observed.


Work-around:

0) Take a backup of the good abc.log.1 file from brick-1. (Just being paranoid.)

Do either of the following to make sure the stale open file gets closed:
1-a) Take down the brick process holding the bad file using kill -9
brick-pid (in my example, brick-0).

1-b) Introduce a temporary disconnect between mount and brick-0.
(I would choose 1-a)
2) Remove the bad file (abc.log.1) and its gfid backend file from brick-0.
3) Bring the brick back up (gluster volume start volname force) / restore
the connection, and let it heal by doing 'stat' on the file abc.log.1 from
the mount.


This bug has existed since 2012, from the first time I implemented
rename/hard-link self-heal. It is difficult to re-create; I have to
put break-points at several places in the process to hit the race.


Pranith



Thanks,
Anirban


*From: * Pranith Kumar Karampuri pkara...@redhat.com;
*To: * Anirban Ghoshal chalcogen_eg_oxy...@yahoo.com; 
gluster-users@gluster.org;
*Subject: * Re: [Gluster-users] Split-brain seen with [0 0] pending 
matrix and io-cache page errors

*Sent: * Sun, Oct 19, 2014 5:42:24 AM


On 10/18/2014 04:36 PM, Anirban Ghoshal wrote:

Hi,

Yes, they do, and considerably. I'd forgotten to mention that in my
last email. Their mtimes, however, as far as I could tell on
separate servers, seemed to coincide.


Thanks,
Anirban




Are these files always open? And is it possible that the file could 
have been renamed when one of the bricks was offline? I know of a 
race which can introduce this one. Just trying to find if it is the 
same case.


Pranith




*From: * Pranith Kumar Karampuri pkara...@redhat.com;
*To: * Anirban Ghoshal chalcogen_eg_oxy...@yahoo.com; 
gluster-users@gluster.org gluster-users@gluster.org;
*Subject: * Re: [Gluster-users] Split-brain seen with [0 0] pending 
matrix and io-cache page errors

*Sent: * Sat, Oct 18, 2014 12:26:08 AM

hi,
  Could you see if the size of the file mismatches?

Pranith

On 10/18/2014 04:20 AM, Anirban Ghoshal wrote:

Hi everyone,

I have this really confusing split-brain here that's bothering me.
I am running glusterfs 3.4.2 over linux 2.6.34. I have a replica 2
volume 'testvol'. It seems I cannot read/stat/edit the file
in question, and `gluster volume heal testvol info split-brain`
shows nothing. Here are the logs from the fuse-mount for the volume

Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors

2014-10-18 Thread Anirban Ghoshal
Hi,

Yes, they do, and considerably. I'd forgotten to mention that in my last
email. Their mtimes, however, as far as I could tell on separate servers,
seemed to coincide.

Thanks,
Anirban

Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors

2014-10-18 Thread Pranith Kumar Karampuri


On 10/18/2014 04:36 PM, Anirban Ghoshal wrote:

Hi,

Yes, they do, and considerably. I'd forgotten to mention that in my
last email. Their mtimes, however, as far as I could tell on separate
servers, seemed to coincide.


Thanks,
Anirban




Are these files always open? And is it possible that the file could have 
been renamed when one of the bricks was offline? I know of a race which 
can introduce this one. Just trying to find if it is the same case.


Pranith



*From: * Pranith Kumar Karampuri pkara...@redhat.com;
*To: * Anirban Ghoshal chalcogen_eg_oxy...@yahoo.com; 
gluster-users@gluster.org gluster-users@gluster.org;
*Subject: * Re: [Gluster-users] Split-brain seen with [0 0] pending 
matrix and io-cache page errors

*Sent: * Sat, Oct 18, 2014 12:26:08 AM

hi,
  Could you see if the size of the file mismatches?

Pranith

On 10/18/2014 04:20 AM, Anirban Ghoshal wrote:

Hi everyone,

I have this really confusing split-brain here that's bothering me. I
am running glusterfs 3.4.2 over linux 2.6.34. I have a replica 2
volume 'testvol'. It seems I cannot read/stat/edit the file in
question, and `gluster volume heal testvol info split-brain` shows
nothing. Here are the logs from the fuse-mount for the volume:


[2014-09-29 07:53:02.867111] W [fuse-bridge.c:1172:fuse_err_cbk] 
0-glusterfs-fuse: 4560969: FLUSH() ERR = -1 (Input/output error)
[2014-09-29 07:54:16.007799] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c8529d20  waitq = 
0x7fd5c8067d40
[2014-09-29 07:54:16.007854] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561103: READ = -1 (Input/output error)
[2014-09-29 07:54:16.008018] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c8607ee0  waitq = 
0x7fd5c8067d40
[2014-09-29 07:54:16.008056] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561104: READ = -1 (Input/output error)
[2014-09-29 07:54:16.008233] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c8066f30  waitq = 
0x7fd5c8067d40
[2014-09-29 07:54:16.008269] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561105: READ = -1 (Input/output error)
[2014-09-29 07:54:16.008800] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c860bcf0  waitq = 
0x7fd5c863b1f0
[2014-09-29 07:54:16.008839] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561107: READ = -1 (Input/output error)
[2014-09-29 07:54:16.009365] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c85fd120  waitq = 
0x7fd5c8067d40
[2014-09-29 07:54:16.009413] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561109: READ = -1 (Input/output error)
[2014-09-29 07:54:16.040549] W [afr-open.c:213:afr_open] 
0-testvol-replicate-0: failed to open as split brain seen, returning EIO
[2014-09-29 07:54:16.040594] W [fuse-bridge.c:915:fuse_fd_cbk] 
0-glusterfs-fuse: 4561142: OPEN() 
/SECLOG/20140908.d/SECLOG_00427425_.log 
= -1 (Input/output error)


Could somebody please give me some clue on where to begin? I checked 
the xattrs on 
/SECLOG/20140908.d/SECLOG_00427425_.log 
and it seems the changelogs are [0, 0] on both replicas, and the 
gfid's match.


Thank you very much for any help on this.
Anirban










[Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors

2014-10-17 Thread Anirban Ghoshal
Hi everyone,

I have this really confusing split-brain here that's bothering me. I am
running glusterfs 3.4.2 over linux 2.6.34. I have a replica 2 volume
'testvol'. It seems I cannot read/stat/edit the file in question, and
`gluster volume heal testvol info split-brain` shows nothing. Here are
the logs from the fuse-mount for the volume:

[2014-09-29 07:53:02.867111] W [fuse-bridge.c:1172:fuse_err_cbk] 
0-glusterfs-fuse: 4560969: FLUSH() ERR = -1 (Input/output error) 
[2014-09-29 07:54:16.007799] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c8529d20  waitq = 
0x7fd5c8067d40 
[2014-09-29 07:54:16.007854] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561103: READ = -1 (Input/output error) 
[2014-09-29 07:54:16.008018] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c8607ee0  waitq = 
0x7fd5c8067d40 
[2014-09-29 07:54:16.008056] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561104: READ = -1 (Input/output error) 
[2014-09-29 07:54:16.008233] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c8066f30  waitq = 
0x7fd5c8067d40 
[2014-09-29 07:54:16.008269] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561105: READ = -1 (Input/output error) 
[2014-09-29 07:54:16.008800] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c860bcf0  waitq = 
0x7fd5c863b1f0 
[2014-09-29 07:54:16.008839] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561107: READ = -1 (Input/output error) 
[2014-09-29 07:54:16.009365] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c85fd120  waitq = 
0x7fd5c8067d40 
[2014-09-29 07:54:16.009413] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561109: READ = -1 (Input/output error) 
[2014-09-29 07:54:16.040549] W [afr-open.c:213:afr_open] 0-testvol-replicate-0: 
failed to open as split brain seen, returning EIO 
[2014-09-29 07:54:16.040594] W [fuse-bridge.c:915:fuse_fd_cbk] 
0-glusterfs-fuse: 4561142: OPEN() 
/SECLOG/20140908.d/SECLOG_00427425_.log = -1 
(Input/output error)


Could somebody please give me some clue on where to begin? I checked the xattrs 
on /SECLOG/20140908.d/SECLOG_00427425_.log and 
it seems the changelogs are [0, 0] on both replicas, and the gfid's match.
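For anyone checking the same thing: the changelog and gfid xattrs can be read directly against each brick's copy of the file. The brick path and volume name below are placeholders, and the exact trusted.afr.* key names depend on the volume's client IDs:

```sh
# Run on each brick server, against the brick's local copy of the file
getfattr -d -m . -e hex /bricks/brick0/SECLOG/20140908.d/<file>.log
# Keys of interest look like:
#   trusted.afr.testvol-client-0=0x000000000000000000000000   # all-zero pending matrix
#   trusted.afr.testvol-client-1=0x000000000000000000000000
#   trusted.gfid=0x<16-byte gfid; must match across both replicas>
```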

Thank you very much for any help on this.
Anirban

Re: [Gluster-users] Split-brain seen with [0 0] pending matrix and io-cache page errors

2014-10-17 Thread Pranith Kumar Karampuri

hi,
  Could you see if the size of the file mismatches?

Pranith

On 10/18/2014 04:20 AM, Anirban Ghoshal wrote:

Hi everyone,

I have this really confusing split-brain here that's bothering me. I
am running glusterfs 3.4.2 over linux 2.6.34. I have a replica 2
volume 'testvol'. It seems I cannot read/stat/edit the file in
question, and `gluster volume heal testvol info split-brain` shows
nothing. Here are the logs from the fuse-mount for the volume:


[2014-09-29 07:53:02.867111] W [fuse-bridge.c:1172:fuse_err_cbk] 
0-glusterfs-fuse: 4560969: FLUSH() ERR = -1 (Input/output error)
[2014-09-29 07:54:16.007799] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c8529d20  waitq = 
0x7fd5c8067d40
[2014-09-29 07:54:16.007854] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561103: READ = -1 (Input/output error)
[2014-09-29 07:54:16.008018] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c8607ee0  waitq = 
0x7fd5c8067d40
[2014-09-29 07:54:16.008056] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561104: READ = -1 (Input/output error)
[2014-09-29 07:54:16.008233] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c8066f30  waitq = 
0x7fd5c8067d40
[2014-09-29 07:54:16.008269] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561105: READ = -1 (Input/output error)
[2014-09-29 07:54:16.008800] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c860bcf0  waitq = 
0x7fd5c863b1f0
[2014-09-29 07:54:16.008839] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561107: READ = -1 (Input/output error)
[2014-09-29 07:54:16.009365] W [page.c:991:__ioc_page_error] 
0-testvol-io-cache: page error for page = 0x7fd5c85fd120  waitq = 
0x7fd5c8067d40
[2014-09-29 07:54:16.009413] W [fuse-bridge.c:2089:fuse_readv_cbk] 
0-glusterfs-fuse: 4561109: READ = -1 (Input/output error)
[2014-09-29 07:54:16.040549] W [afr-open.c:213:afr_open] 
0-testvol-replicate-0: failed to open as split brain seen, returning EIO
[2014-09-29 07:54:16.040594] W [fuse-bridge.c:915:fuse_fd_cbk] 
0-glusterfs-fuse: 4561142: OPEN() 
/SECLOG/20140908.d/SECLOG_00427425_.log = 
-1 (Input/output error)


Could somebody please give me some clue on where to begin? I checked 
the xattrs on 
/SECLOG/20140908.d/SECLOG_00427425_.log and 
it seems the changelogs are [0, 0] on both replicas, and the gfid's match.


Thank you very much for any help on this.
Anirban






