Re: [Gluster-devel] query about why glustershd can not afr_selfheal_recreate_entry because of "afr: Prevent null gfids in self-heal entry re-creation"

2018-01-16 Thread Ravishankar N



On 01/16/2018 02:22 PM, Lian, George (NSB - CN/Hangzhou) wrote:


Hi,

Thanks a lot for your update.

I would like to give some more detail on where this issue came from.

The issue comes from a test case in our team, which follows these 
steps:


1) Set up a GlusterFS environment with a replica-2 volume across 2 
storage server nodes, plus 2 client nodes.


2) Generate a split-brain file: the copy on sn-0 is normal, the copy 
on sn-1 is dirty.

Hi, sorry, I did not understand the test case. What type of split-brain 
did you create (data/metadata, gfid, or file-type mismatch)?


3) Delete the directory before the heal begins (in this phase, the 
normal, correct file on sn-0 is deleted by the "rm" command; the dirty 
file is still there).



Delete from the backend brick directly?


4) After that, the self-heal process always fails, with the log 
attached in the last mail.


Maybe you can write a script or a .t file (like the ones in 
https://github.com/gluster/glusterfs/tree/master/tests/basic/afr) so 
that your test can be understood unambiguously.



I have also attached some command output FYI.

From my understanding, GlusterFS may not be able to handle the 
split-brain file in this case. Could you share your comments and 
confirm whether some enhancement is needed for this case?


If you create a split-brain in gluster, self-heal cannot heal it. You 
need to resolve it using one of the methods listed in 
https://docs.gluster.org/en/latest/Troubleshooting/resolving-splitbrain/#heal-info-and-split-brain-resolution


Thanks,
Ravi

# rm -rf /mnt/export/testdir
rm: cannot remove '/mnt/export/testdir/test file': No data available

[root@sn-1:/root]
# ls -l /mnt/export/testdir/
ls: cannot access '/mnt/export/testdir/IORFILE_82_2': No data available
total 0
-? ? ? ? ?    ? test_file

[root@sn-1:/root]
# getfattr -m . -d -e hex /mnt/bricks/export/brick/testdir/
getfattr: Removing leading '/' from absolute path names
# file: mnt/bricks/export/brick/testdir/
trusted.afr.dirty=0x0001
trusted.afr.export-client-0=0x0054
trusted.gfid=0xb217d6af49024f189a69e0ccf5207572
trusted.glusterfs.dht=0x0001

[root@sn-0:/var/log/glusterfs]
# getfattr -m . -d -e hex /mnt/bricks/export/brick/testdir/
getfattr: Removing leading '/' from absolute path names
# file: mnt/bricks/export/brick/testdir/
trusted.gfid=0xb217d6af49024f189a69e0ccf5207572
trusted.glusterfs.dht=0x0001

Best Regards

George

*From:*gluster-devel-boun...@gluster.org 
[mailto:gluster-devel-boun...@gluster.org] *On Behalf Of *Ravishankar N

*Sent:* Tuesday, January 16, 2018 1:44 PM
*To:* Zhou, Cynthia (NSB - CN/Hangzhou) 
; Gluster Devel 
*Subject:* Re: [Gluster-devel] query about why glustershd can not 
afr_selfheal_recreate_entry because of "afr: Prevent null gfids in 
self-heal entry re-creation"


+ gluster-devel

On 01/15/2018 01:41 PM, Zhou, Cynthia (NSB - CN/Hangzhou) wrote:

Hi glusterfs expert,

    Good day,

    When I do some tests on glusterfs self-heal, I find the
following prints showing that when a dir/file's type gets corrupted,
it cannot get self-healed.

Could you help check whether this is expected behavior? I find that
the code change https://review.gluster.org/#/c/17981/ adds a check
for iatt->ia_type, so what if a file's ia_type gets corrupted? In
that case it would never get self-healed?


Yes, without knowing the ia_type, afr_selfheal_recreate_entry() cannot 
decide which type of FOP (mkdir/link/mknod) to perform to create the 
appropriate file on the sink. You would need to find out why the 
source brick is not returning a valid ia_type, i.e. why 
replies[source].poststat is not valid.

Thanks,
Ravi


Thanks!

//heal info output

[root@sn-0:/home/robot]

# gluster v heal export info

Brick sn-0.local:/mnt/bricks/export/brick

Status: Connected

Number of entries: 0

Brick sn-1.local:/mnt/bricks/export/brick

/testdir - Is in split-brain

Status: Connected

Number of entries: 1

// ---- sn-1 glustershd log ---- //

[2018-01-15 03:53:40.011422] I [MSGID: 108026]
[afr-self-heal-entry.c:887:afr_selfheal_entry_do]
0-export-replicate-0: performing entry selfheal on
b217d6af-4902-4f18-9a69-e0ccf5207572

[2018-01-15 03:53:40.013994] W [MSGID: 114031]
[client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-export-client-1:
remote operation failed. Path: (null)
(----) [No data available]

[2018-01-15 03:53:40.014025] E [MSGID: 108037]
[afr-self-heal-entry.c:92:afr_selfheal_recreate_entry]
0-export-replicate-0: Invalid ia_type (0) or
gfid(----). source brick=1,
pargfid=----, name=IORFILE_82_2

Re: [Gluster-devel] query about why glustershd can not afr_selfheal_recreate_entry because of "afr: Prevent null gfids in self-heal entry re-creation"

2018-01-15 Thread Ravishankar N

// ---- sn-1 glustershd log ---- //
[2018-01-15 03:53:40.011422] I [MSGID: 108026] 
[afr-self-heal-entry.c:887:afr_selfheal_entry_do] 
0-export-replicate-0: performing entry selfheal on 
b217d6af-4902-4f18-9a69-e0ccf5207572
[2018-01-15 03:53:40.013994] W [MSGID: 114031] 
[client-rpc-fops.c:2860:client3_3_lookup_cbk] 0-export-client-1: 
remote operation failed. Path: (null) 
(----) [No data available]
[2018-01-15 03:53:40.014025] E [MSGID: 108037] 
[afr-self-heal-entry.c:92:afr_selfheal_recreate_entry] 
0-export-replicate-0: Invalid ia_type (0) or 
gfid(----). source brick=1, 
pargfid=----, name=IORFILE_82_2
// ---- gdb attached to sn-1 glustershd ---- //

root@sn-1:/var/log/glusterfs]
# gdb attach 2191
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>

This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
attach: No such file or directory.
Attaching to process 2191
[New LWP 2192]
[New LWP 2193]
[New LWP 2194]
[New LWP 2195]
[New LWP 2196]
[New LWP 2197]
[New LWP 2239]
[New LWP 2241]
[New LWP 2243]
[New LWP 2245]
[New LWP 2247]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x7f90aca037bd in __pthread_join (threadid=140259279345408, 
thread_return=0x0) at pthread_join.c:90

90 pthread_join.c: No such file or directory.
(gdb) break afr_selfheal_recreate_entry
Breakpoint 1 at 0x7f90a3b56dec: file afr-self-heal-entry.c, line 73.
(gdb) c
Continuing.
[Switching to Thread 0x7f90a1b8e700 (LWP 2241)]
Thread 9 "glustershdheal" hit Breakpoint 1, 
afr_selfheal_recreate_entry (frame=0x7f90980018d0, dst=0, source=1, 
sources=0x7f90a1b8ceb0 "", dir=0x7f9098011940, name=0x7f909c015d48 
"IORFILE_82_2",

inode=0x7f9098001bd0, replies=0x7f90a1b8c890) at afr-self-heal-entry.c:73
73 afr-self-heal-entry.c: No such file or directory.
(gdb) n
74  in afr-self-heal-entry.c
(gdb) n
75  in afr-self-heal-entry.c
(gdb) n
76  in afr-self-heal-entry.c
(gdb) n
77  in afr-self-heal-entry.c
(gdb) n
78  in afr-self-heal-entry.c
(gdb) n
79  in afr-self-heal-entry.c
(gdb) n
80  in afr-self-heal-entry.c
(gdb) n
81  in afr-self-heal-entry.c
(gdb) n
82  in afr-self-heal-entry.c
(gdb) n
83  in afr-self-heal-entry.c
(gdb) n
85  in afr-self-heal-entry.c
(gdb) n
86  in afr-self-heal-entry.c
(gdb) n
87  in afr-self-heal-entry.c
(gdb) print iatt->ia_type
$1 = IA_INVAL
(gdb) print gf_uuid_is_null(iatt->ia_gfid)
$2 = 1
(gdb) bt
#0 afr_selfheal_recreate_entry (frame=0x7f90980018d0, dst=0, source=1, 
sources=0x7f90a1b8ceb0 "", dir=0x7f9098011940, name=0x7f909c015d48 
"IORFILE_82_2", inode=0x7f9098001bd0, replies=0x7f90a1b8c890)

    at afr-self-heal-entry.c:87
#1 0x7f90a3b57d20 in __afr_selfheal_merge_dirent 
(frame=0x7f90980018d0, this=0x7f90a4024610, fd=0x7f9098413090, 
name=0x7f909c015d48 "IORFILE_82_2", inode=0x7f9098001bd0,
sources=0x7f90a1b8ceb0 "", healed_sinks=0x7f90a1b8ce70 
"\001\001A\230\220\177", locked_on=0x7f90a1b8ce50 
"\001\001\270\241\220\177",