Re: [Gluster-users] Hundreds of duplicate files
Hi Olav,

I have a hunch that our problem was caused by improper unmounting of the gluster volume, and have since found that the proper order should be:

1. kill all jobs using the volume
2. unmount the volume on the clients
3. gluster volume stop
4. stop the gluster service (if necessary)

In my case, I wrote a Python script to find duplicate files on the mounted volume, then delete the corresponding link files on the bricks (making sure to also delete the matching files in the .glusterfs directory). However, your find command was also suggested to me, and I think it's a simpler solution. I believe removing all link files (even ones that are not causing duplicates) is fine, since on the next file access Gluster will do a lookup on all bricks and recreate any link files if necessary. Hopefully a gluster expert can chime in on this point, as I'm not completely sure. Keep in mind your setup is somewhat different from mine, as I have only 5 bricks with no replication.

Regards,
Tom

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Olav Peeters opeet...@gmail.com
Date: 2/18/15 10:52 am
To: gluster-users@gluster.org, tben...@3vgeomatics.com

Hi all,

I'm having this problem after upgrading from 3.5.3 to 3.6.2. At the moment I am still waiting for a heal to finish (on a 31TB volume with 42 bricks, replicated over three nodes). Tom, how did you remove the duplicates? With 42 bricks I will not be able to do this manually. Did a:

    find $brick_root -type f -size 0 -perm 1000 -exec /bin/rm {} \;

work for you? Should this type of thing ideally not be checked and mended by a heal? Does anyone have an idea yet how this happens in the first place? Can it be connected to upgrading?

Cheers,
Olav

On 01/01/15 03:07, tben...@3vgeomatics.com wrote:

No, the files can be read on a newly mounted client! I went ahead and deleted all of the link files associated with these duplicates, and then remounted the volume. The problem is fixed! Thanks again for the help, Joe and Vijay.

Tom

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Vijay Bellur vbel...@redhat.com
Date: 12/28/14 3:23 am
To: tben...@3vgeomatics.com, gluster-users@gluster.org

On 12/28/2014 01:20 PM, tben...@3vgeomatics.com wrote:

Hi Vijay,

Yes, the files are still readable from the .glusterfs path. There is no explicit error. However, trying to read a text file in Python simply gives me null characters:

    open('ott_mf_itab').readlines()
    ['\x00\x00\x00\x00 ... \x00\x00\x00\x00']    (a long run of NUL characters, elided here)

Reading binary files gives the same result.

Is this behavior seen with a freshly mounted client too?

-Vijay

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Vijay Bellur vbel...@redhat.com
Date: 12/27/14 9:57 pm
To: tben...@3vgeomatics.com, gluster-users@gluster.org

On 12/28/2014 10:13 AM, tben...@3vgeomatics.com wrote:

Thanks Joe, I've read your blog post as well as your post regarding the .glusterfs directory. I found some unneeded duplicate files which were not being read properly. I then deleted the link file from the brick. This always removes the duplicate file from the listing, but the file does not always become readable. If I also delete the associated file in the .glusterfs directory on that brick, then some more files become readable. However, this solution still doesn't work for all files. I know the file on the brick is not corrupt, as it can be read directly from the brick directory.

For files that are not readable from the client, can you check if the file is readable from the .glusterfs/ path? What is the specific error that is seen while trying to read one such file from the client?

Thanks,
Vijay

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
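Tom's duplicate-finding script isn't shown anywhere in the thread, but the detection step he describes could be sketched roughly as follows (hypothetical code written for illustration, not Tom's actual script). On a distributed Gluster mount with stale DHT link files, a directory listing can return the same name more than once, which a healthy POSIX filesystem never does:

```python
import os
from collections import Counter

def duplicate_names(entries):
    """Return names appearing more than once in a single directory listing.

    On a distributed Gluster mount with stale link files, readdir can
    return the same name once per brick holding a copy.
    """
    counts = Counter(entries)
    return sorted(name for name, n in counts.items() if n > 1)

def scan_mount(mount_root):
    """Walk a mounted volume and yield (directory, duplicated name) pairs."""
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        for name in duplicate_names(filenames):
            yield dirpath, name
```

A script like this would run against the FUSE mount point, not the bricks; the link-file deletion itself would then happen brick-side.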
Re: [Gluster-users] Hundreds of duplicate files
No, the files can be read on a newly mounted client! I went ahead and deleted all of the link files associated with these duplicates, and then remounted the volume. The problem is fixed! Thanks again for the help, Joe and Vijay.

Tom

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Vijay Bellur vbel...@redhat.com
Date: 12/28/14 3:23 am
To: tben...@3vgeomatics.com, gluster-users@gluster.org

On 12/28/2014 01:20 PM, tben...@3vgeomatics.com wrote:

Hi Vijay,

Yes, the files are still readable from the .glusterfs path. There is no explicit error. However, trying to read a text file in Python simply gives me null characters:

    open('ott_mf_itab').readlines()
    ['\x00\x00\x00\x00 ... \x00\x00\x00\x00']    (a long run of NUL characters, elided here)

Reading binary files gives the same result.

Is this behavior seen with a freshly mounted client too?

-Vijay

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Vijay Bellur vbel...@redhat.com
Date: 12/27/14 9:57 pm
To: tben...@3vgeomatics.com, gluster-users@gluster.org

On 12/28/2014 10:13 AM, tben...@3vgeomatics.com wrote:

Thanks Joe, I've read your blog post as well as your post regarding the .glusterfs directory. I found some unneeded duplicate files which were not being read properly. I then deleted the link file from the brick. This always removes the duplicate file from the listing, but the file does not always become readable. If I also delete the associated file in the .glusterfs directory on that brick, then some more files become readable. However, this solution still doesn't work for all files. I know the file on the brick is not corrupt, as it can be read directly from the brick directory.

For files that are not readable from the client, can you check if the file is readable from the .glusterfs/ path? What is the specific error that is seen while trying to read one such file from the client?

Thanks,
Vijay
Re: [Gluster-users] Hundreds of duplicate files
Moving the file with the linkto attribute worked! Just one copy of the file is retained in the listing and can be read without problems. I will write a script to remove these rogue link files from the bricks - are there any risks associated with this? Thanks everyone for your help; of course, if anyone could explain how this happened I would love to hear it.

Tom

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Vijay Bellur vbel...@redhat.com
Date: 12/27/14 9:12 am
To: tben...@3vgeomatics.com, gluster-users@gluster.org

On 12/27/2014 01:11 PM, tben...@3vgeomatics.com wrote:

Thanks for your continued help, Joe. A demonstration of the problem: in this case I was able to open the file in vim (a text file) without any issues; however, sometimes duplicated text files open in vim as one line consisting of @ characters, and binary data files can also not be opened correctly for reading. Either way, the duplicate listing is still an issue. Note that Dec 13 was the date of a server crash.

[root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
-rw-rw-r-T 1 parwant users 1712 Dec 13 19:02 tif2flt.pro
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 tif2flt.pro

A few minutes later, doing the same listing, the sticky bit has disappeared and the modification date has changed:

[root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 /sar/complete/vancouver/refdem/tif2flt.pro
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 /sar/complete/vancouver/refdem/tif2flt.pro

[root@jongoo ~]# getfattr -m . -d -e hex /data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd
trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3300

[root@ndovu ~]# getfattr -m . -d -e hex /data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd

Is rebalance running on this volume right now? If not, can you please move the file copy with the trusted.glusterfs.dht.linkto attribute (/data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro) out of the brick directory to an alternate location and check the behavior?

Thanks,
Vijay
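As an aside, the trusted.glusterfs.dht.linkto value shown above is just the name of the target subvolume, hex-encoded with a trailing NUL byte. It can be decoded with a couple of lines of Python (a small helper written for this reply, not a Gluster tool):

```python
def decode_linkto(hex_value):
    """Decode a trusted.glusterfs.dht.linkto value as printed by
    `getfattr -e hex`: strip the 0x prefix and the trailing NUL byte."""
    h = hex_value[2:] if hex_value.startswith("0x") else hex_value
    return bytes.fromhex(h).rstrip(b"\x00").decode("ascii")

# The value from brick00 above names the subvolume the link file points at.
print(decode_linkto("0x7361666172692d636c69656e742d3300"))  # safari-client-3
```

So the link file on brick00 claims the real data lives on the subvolume named safari-client-3, which matches the copy found on brick03.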
Re: [Gluster-users] Hundreds of duplicate files
Ok, I am really tearing my hair out here. I tried doing this manually for several other files just to be sure, and in these cases it removed the duplicate file from the directory listing, but the file still cannot be read. Reading directly from the brick works fine.

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/27/14 12:01 pm
To: gluster-users@gluster.org

Should be safe. Here's what I've done in the past to clean up rogue dht link files (not that yours looked rogue, though):

    find $brick_root -type f -size 0 -perm 1000 -exec /bin/rm {} \;

On 12/27/2014 11:09 AM, tben...@3vgeomatics.com wrote:

Moving the file with the linkto attribute worked! Just one copy of the file is retained in the listing and can be read without problems. I will write a script to remove these rogue link files from the bricks - are there any risks associated with this? Thanks everyone for your help; of course, if anyone could explain how this happened I would love to hear it.

Tom

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Vijay Bellur vbel...@redhat.com
Date: 12/27/14 9:12 am
To: tben...@3vgeomatics.com, gluster-users@gluster.org

On 12/27/2014 01:11 PM, tben...@3vgeomatics.com wrote:

Thanks for your continued help, Joe. A demonstration of the problem: in this case I was able to open the file in vim (a text file) without any issues; however, sometimes duplicated text files open in vim as one line consisting of @ characters, and binary data files can also not be opened correctly for reading. Either way, the duplicate listing is still an issue. Note that Dec 13 was the date of a server crash.

[root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
-rw-rw-r-T 1 parwant users 1712 Dec 13 19:02 tif2flt.pro
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 tif2flt.pro

A few minutes later, doing the same listing, the sticky bit has disappeared and the modification date has changed:

[root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 /sar/complete/vancouver/refdem/tif2flt.pro
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 /sar/complete/vancouver/refdem/tif2flt.pro

[root@jongoo ~]# getfattr -m . -d -e hex /data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd
trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3300

[root@ndovu ~]# getfattr -m . -d -e hex /data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd

Is rebalance running on this volume right now? If not, can you please move the file copy with the trusted.glusterfs.dht.linkto attribute (/data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro) out of the brick directory to an alternate location and check the behavior?

Thanks,
Vijay
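For anyone who wants to audit before deleting anything, Joe's find one-liner above can be mirrored as a dry run in Python (an illustrative sketch, not a Gluster tool). Note that find's `-perm 1000` matches files whose permission bits are exactly 01000 (sticky bit only), which is the classic shape of a DHT link file:

```python
import os
import stat

def find_link_files(brick_root):
    """Dry-run equivalent of:
        find $brick_root -type f -size 0 -perm 1000 -print
    i.e. regular files of size 0 whose permission bits are exactly 01000
    (sticky bit only).  Nothing is deleted; matches are returned."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(brick_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if (stat.S_ISREG(st.st_mode) and st.st_size == 0
                    and stat.S_IMODE(st.st_mode) == 0o1000):
                matches.append(path)
    return matches
```

Reviewing the returned list (and only then deleting) is a cheap safeguard, since the thread shows link files whose permission bits are not always exactly 01000 (e.g. -rw-rw-r-T above), which this exact-match rule, like Joe's one-liner, would deliberately skip.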
Re: [Gluster-users] Hundreds of duplicate files
That didn't fix it, unfortunately. In fact, I've done a full rebalance both after initially discovering the problem and after updating Gluster, but nothing changed. I don't know too much about how Gluster works internally; would it be possible to compute the hash for each duplicate filename, figure out which brick it belongs on and where it actually resides, and then recreate the link file or update the linkto attribute? Assuming broken link files are the problem.

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/27/14 1:55 pm
To: tben...@3vgeomatics.com, gluster-users@gluster.org

I'm wondering if this is from a corrupted failed rebalance. In a directory that has duplicates, do:

    setfattr -n trusted.distribute.fix.layout -v 1 .

If that fixes it, do a rebalance ... fix-layout.

On December 27, 2014 12:38:01 PM PST, tben...@3vgeomatics.com wrote:

Ok, I am really tearing my hair out here. I tried doing this manually for several other files just to be sure, and in these cases it removed the duplicate file from the directory listing, but the file still cannot be read. Reading directly from the brick works fine.

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/27/14 12:01 pm
To: gluster-users@gluster.org

Should be safe. Here's what I've done in the past to clean up rogue dht link files (not that yours looked rogue, though):

    find $brick_root -type f -size 0 -perm 1000 -exec /bin/rm {} \;

On 12/27/2014 11:09 AM, tben...@3vgeomatics.com wrote:

Moving the file with the linkto attribute worked! Just one copy of the file is retained in the listing and can be read without problems. I will write a script to remove these rogue link files from the bricks - are there any risks associated with this? Thanks everyone for your help; of course, if anyone could explain how this happened I would love to hear it.

Tom

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Vijay Bellur vbel...@redhat.com
Date: 12/27/14 9:12 am
To: tben...@3vgeomatics.com, gluster-users@gluster.org

On 12/27/2014 01:11 PM, tben...@3vgeomatics.com wrote:

Thanks for your continued help, Joe. A demonstration of the problem: in this case I was able to open the file in vim (a text file) without any issues; however, sometimes duplicated text files open in vim as one line consisting of @ characters, and binary data files can also not be opened correctly for reading. Either way, the duplicate listing is still an issue. Note that Dec 13 was the date of a server crash.

[root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
-rw-rw-r-T 1 parwant users 1712 Dec 13 19:02 tif2flt.pro
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 tif2flt.pro

A few minutes later, doing the same listing, the sticky bit has disappeared and the modification date has changed:

[root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 /sar/complete/vancouver/refdem/tif2flt.pro
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 /sar/complete/vancouver/refdem/tif2flt.pro

[root@jongoo ~]# getfattr -m . -d -e hex /data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd
trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3300

[root@ndovu ~]# getfattr -m . -d -e hex /data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd

Is rebalance running on this volume right now? If not, can you please move the file copy with the trusted.glusterfs.dht.linkto attribute (/data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro) out of the brick directory to an alternate location and check the behavior?

Thanks,
Vijay
Re: [Gluster-users] Hundreds of duplicate files
Thanks Joe, I've read your blog post as well as your post regarding the .glusterfs directory. I found some unneeded duplicate files which were not being read properly. I then deleted the link file from the brick. This always removes the duplicate file from the listing, but the file does not always become readable. If I also delete the associated file in the .glusterfs directory on that brick, then some more files become readable. However, this solution still doesn't work for all files. I know the file on the brick is not corrupt, as it can be read directly from the brick directory.

Hopefully you have some other ideas I could try. Otherwise, I may proceed by writing a script to handle the files one by one: move the actual file from .glusterfs off the brick to a temporary location, remove all references to the file on the bricks, then copy the file back onto the mounted volume. It's not ideal, but hopefully this is a one-time occurrence.

Tom

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/27/14 3:28 pm
To: tben...@3vgeomatics.com, gluster-users@gluster.org

The linkfile you showed earlier was perfect. Check this article on my blog for the details on how dht works and how to calculate hashes: http://joejulian.name/blog/dht-misses-are-expensive/

On December 27, 2014 3:18:00 PM PST, tben...@3vgeomatics.com wrote:

That didn't fix it, unfortunately. In fact, I've done a full rebalance both after initially discovering the problem and after updating Gluster, but nothing changed. I don't know too much about how Gluster works internally; would it be possible to compute the hash for each duplicate filename, figure out which brick it belongs on and where it actually resides, and then recreate the link file or update the linkto attribute? Assuming broken link files are the problem.

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/27/14 1:55 pm
To: tben...@3vgeomatics.com, gluster-users@gluster.org

I'm wondering if this is from a corrupted failed rebalance. In a directory that has duplicates, do:

    setfattr -n trusted.distribute.fix.layout -v 1 .

If that fixes it, do a rebalance ... fix-layout.

On December 27, 2014 12:38:01 PM PST, tben...@3vgeomatics.com wrote:

Ok, I am really tearing my hair out here. I tried doing this manually for several other files just to be sure, and in these cases it removed the duplicate file from the directory listing, but the file still cannot be read. Reading directly from the brick works fine.

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/27/14 12:01 pm
To: gluster-users@gluster.org

Should be safe. Here's what I've done in the past to clean up rogue dht link files (not that yours looked rogue, though):

    find $brick_root -type f -size 0 -perm 1000 -exec /bin/rm {} \;
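Regarding "the associated file in the .glusterfs directory" mentioned above: on each brick, Gluster keeps a hard link to every file at a path derived from its trusted.gfid xattr, .glusterfs/<first two hex digits>/<next two>/<canonical uuid>. A small helper (written for illustration here, using the gfid already shown in this thread) maps the hex value from getfattr to that path:

```python
import uuid

def gfid_backend_path(gfid_hex):
    """Map a trusted.gfid value (as shown by `getfattr -e hex`) to the
    hard link kept under the brick's .glusterfs directory:
    .glusterfs/<first 2 hex chars>/<next 2 hex chars>/<canonical uuid>."""
    h = gfid_hex[2:] if gfid_hex.startswith("0x") else gfid_hex
    u = str(uuid.UUID(h))  # insert the dashes of the canonical uuid form
    return ".glusterfs/{}/{}/{}".format(h[:2], h[2:4], u)

print(gfid_backend_path("0xdfe13dc088bf4a779488ef72f0a879cd"))
# .glusterfs/df/e1/dfe13dc0-88bf-4a77-9488-ef72f0a879cd
```

This is why deleting only the link file in the directory tree is sometimes not enough: the gfid hard link under .glusterfs can keep the stale entry alive.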
Re: [Gluster-users] Hundreds of duplicate files
Hi Vijay,

Yes, the files are still readable from the .glusterfs path. There is no explicit error. However, trying to read a text file in Python simply gives me null characters:

    open('ott_mf_itab').readlines()
    ['\x00\x00\x00\x00 ... \x00\x00\x00\x00']    (a long run of NUL characters, elided here)

Reading binary files gives the same result.

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Vijay Bellur vbel...@redhat.com
Date: 12/27/14 9:57 pm
To: tben...@3vgeomatics.com, gluster-users@gluster.org

On 12/28/2014 10:13 AM, tben...@3vgeomatics.com wrote:

Thanks Joe, I've read your blog post as well as your post regarding the .glusterfs directory. I found some unneeded duplicate files which were not being read properly. I then deleted the link file from the brick. This always removes the duplicate file from the listing, but the file does not always become readable. If I also delete the associated file in the .glusterfs directory on that brick, then some more files become readable. However, this solution still doesn't work for all files. I know the file on the brick is not corrupt, as it can be read directly from the brick directory.

For files that are not readable from the client, can you check if the file is readable from the .glusterfs/ path? What is the specific error that is seen while trying to read one such file from the client?

Thanks,
Vijay
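Since the failure mode above is a file that reads back as nothing but NUL bytes, affected files on the mount could be flagged with a heuristic like the following (an illustrative sketch, not part of the thread; a legitimate file can of course also begin with NULs, so hits are only candidates for inspection):

```python
def reads_as_nul(path, probe_bytes=4096):
    """Heuristic for the symptom described above: a non-empty file whose
    first bytes are all NUL was probably served via a stale link file
    rather than the real data.  Returns False for empty files."""
    with open(path, "rb") as f:
        chunk = f.read(probe_bytes)
    return len(chunk) > 0 and not chunk.strip(b"\x00")
```

Running this over the duplicated paths on the FUSE mount (never on the bricks, where the real data lives) would give a worklist of files whose link-file resolution is broken.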
Re: [Gluster-users] Hundreds of duplicate files
Hello everyone, and happy holidays.

I upgraded both servers so that they are now both running Gluster 3.5.3; in fact, they are both running Fedora 20 with the same kernel version. We have only one client, and that is the first server itself, with plans to change this in the future. As per a previous suggestion, I also ran xfs_repair on each of the five bricks, which reported no errors.

So to recap: doing a file listing on the mounted Gluster volume shows the same filename appearing twice. Trying to access the file either gives me the link file (and an error trying to read it) or the file in the actual brick location; this is entirely random - sometimes a read fails, and a few seconds later reading the same file works. Additionally, there are some cases in which two versions (with different content) of the same filename appear on bricks on the two different servers.

I would much appreciate it if someone could shed some light on this issue.

Best Regards,
Tom

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/21/14 10:34 pm
To: gluster-users@gluster.org

Have you tried upgrading the older server so all are running the same version? Even though it's supposed to work with mixed versions, the goal should always be to have everything running the same version (clients and servers).

On 12/20/2014 09:37 PM, tben...@3vgeomatics.com wrote:

Hi Joe,

Thanks for the reply. That worked; I probably forgot to do this as root last time. Yet the files still show up twice in a directory listing on the mounted volume, and it seems to be random whether reading a file will succeed or not. I've tried with several files, and it sometimes works and sometimes fails; I assume this depends on whether it locates the actual file on the brick or the link file. Let me know if you have any idea what's going on. Output of the command:

$ getfattr -m . -d -e hex /data/glusterfs/safari/brick01/brick/rsc/tsx/montreal_smaller/sm_asc/stack/slc/20130210.slc.ras
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/safari/brick01/brick/rsc/tsx/montreal_smaller/sm_asc/stack/slc/20130210.slc.ras
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0x52c2aed77d09412d8bfd7ca70e87b196
trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3200

Cheers,
Tom

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/20/14 8:53 pm
To: gluster-users@gluster.org

Try 'getfattr -m . -d -e hex' (dot instead of dash) and, of course, do that as root.

On 12/20/2014 06:02 PM, tben...@3vgeomatics.com wrote:

Hi everyone,

We have a distributed Gluster volume on five bricks over two servers (the first server running gluster 3.4.2, the second running gluster 3.5.1, both running Fedora 20). Starting last week, doing a file listing on the mounted volume shows many files with the same name appearing twice (and they are listed with the same inode). Doing a search for these files, I have found 290,000 of them! If I do a listing of these files on the bricks themselves, it looks like most are link files (du shows the file on the first server as 0 bytes, with the sticky bit set). The file is fine on the second server. Unfortunately, running getfattr -m - -e hex -d on the file shows NO gluster-related attributes, and I believe this is why both files appear in the listing. The files cannot be read by any program, as reads go to the link file. I assume the metadata became corrupted.

This is a production server, so we really need to know:

1. How did this happen, and how can we prevent it going forward? There was a server crash a week ago, and I believe that was the cause.
2. How can we heal the Gluster volume/bricks and link files? If there is some straightforward way of restoring the link file pointer, I can write a script to do it; obviously doing this manually will be impossible.

Thanks very much for any and all help - much appreciated!

Regards,
Tom

On Wed, Dec 17, 2014 at 4:07 AM, tben...@3vgeomatics.com wrote:

Hi everyone, we have noticed some extremely odd behaviour with our distributed Gluster volume, where duplicate files (same name, same or different content) are being created and stored on multiple bricks. The only consistent clue is that one of the duplicate files has the sticky bit set. I am hoping someone will be able to shed some light on why this is happening and how we can restore the volume, as there appear to be hundreds of such files. I will try to provide as much pertinent information as I can. We have a 130TB Gluster volume consisting of two 20TB bricks
Re: [Gluster-users] Hundreds of duplicate files
Thanks for your continued help, Joe.

Here is a demonstration of the problem. In this case I was able to open the file in vim (a text file) without any issues; however, duplicated text files sometimes open in vim as a single line of @ characters, and duplicated binary data files also cannot be read correctly. Either way, the duplicate listing itself is still an issue. Note that Dec 13 was the date of a server crash.

[root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
-rw-rw-r-T 1 parwant users 1712 Dec 13 19:02 tif2flt.pro
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 tif2flt.pro

A few minutes later, the same listing shows that the sticky bit has disappeared and the modification date has changed:

[root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 /sar/complete/vancouver/refdem/tif2flt.pro
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 /sar/complete/vancouver/refdem/tif2flt.pro

[root@jongoo ~]# getfattr -m . -d -e hex /data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd
trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3300

[root@ndovu ~]# getfattr -m . -d -e hex /data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd

Log for brick 00:

[2014-12-27 01:59:24.263095] I [server-handshake.c:575:server_setvolume] 0-safari-server: accepted client from jongoo-1910-2014/12/27-01:59:24:173469-safari-client-0-0-0 (version: 3.5.3)
[2014-12-27 02:02:00.772454] I [server-handshake.c:575:server_setvolume] 0-safari-server: accepted client from jongoo-2015-2014/12/27-02:01:55:694478-safari-client-0-0-0 (version: 3.5.3)
[2014-12-27 02:02:05.780497] I [server-handshake.c:575:server_setvolume] 0-safari-server: accepted client from ndovu-16310-2014/12/27-02:02:00:703051-safari-client-0-0-0 (version: 3.5.3)
[2014-12-27 04:41:07.094149] I [server.c:520:server_rpc_notify] 0-safari-server: disconnecting connectionfrom ndovu-16310-2014/12/27-02:02:00:703051-safari-client-0-0-0
[2014-12-27 04:41:07.094187] I [client_t.c:417:gf_client_unref] 0-safari-server: Shutting down connection ndovu-16310-2014/12/27-02:02:00:703051-safari-client-0-0-0
[2014-12-27 04:41:56.979717] I [server.c:520:server_rpc_notify] 0-safari-server: disconnecting connectionfrom jongoo-2015-2014/12/27-02:01:55:694478-safari-client-0-0-0
[2014-12-27 04:41:56.979761] I [client_t.c:417:gf_client_unref] 0-safari-server: Shutting down connection jongoo-2015-2014/12/27-02:01:55:694478-safari-client-0-0-0

Log for brick 03:

[2014-12-27 01:59:24.270123] I [server-handshake.c:575:server_setvolume] 0-safari-server: accepted client from jongoo-1910-2014/12/27-01:59:24:173469-safari-client-3-0-0 (version: 3.5.3)
[2014-12-27 02:02:05.724212] I [server-handshake.c:575:server_setvolume] 0-safari-server: accepted client from jongoo-2015-2014/12/27-02:01:55:694478-safari-client-3-0-0 (version: 3.5.3)
[2014-12-27 02:02:05.778098] I [server-handshake.c:575:server_setvolume] 0-safari-server: accepted client from ndovu-16310-2014/12/27-02:02:00:703051-safari-client-3-0-0 (version: 3.5.3)
[2014-12-27 04:41:07.098381] I [server.c:520:server_rpc_notify] 0-safari-server: disconnecting connectionfrom ndovu-16310-2014/12/27-02:02:00:703051-safari-client-3-0-0
[2014-12-27 04:41:07.098417] I [client_t.c:417:gf_client_unref] 0-safari-server: Shutting down connection ndovu-16310-2014/12/27-02:02:00:703051-safari-client-3-0-0
[2014-12-27 04:41:56.984140] I [server.c:520:server_rpc_notify] 0-safari-server: disconnecting connectionfrom jongoo-2015-2014/12/27-02:01:55:694478-safari-client-3-0-0
[2014-12-27 04:41:56.984203] I [client_t.c:417:gf_client_unref] 0-safari-server: Shutting down connection jongoo-2015-2014/12/27-02:01:55:694478-safari-client-3-0-0

Log for mounted volume:

[2014-12-27 01:59:24.180253] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.5.3 (/usr/sbin/glusterfs --volfile-server=ndovu --volfile-id=/safari /sar)
[2014-12-27 01:59:24.199613] I [socket.c:3645:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-12-27 01:59:24.199684] I [socket.c:3660:socket_init] 0-glusterfs: using
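[Editorial note: the trusted.glusterfs.dht.linkto value in the getfattr output above is simply a hex-encoded, NUL-terminated subvolume name identifying which DHT subvolume holds the real file. A small standalone Python sketch (not a Gluster tool) that decodes such a value:

```python
def decode_linkto(hex_value: str) -> str:
    """Decode a trusted.glusterfs.dht.linkto value as printed by
    `getfattr -e hex`: strip the 0x prefix and the trailing NUL byte,
    then interpret the remaining bytes as an ASCII subvolume name."""
    raw = bytes.fromhex(hex_value.removeprefix("0x"))
    return raw.rstrip(b"\x00").decode("ascii")

# The value from brick00 above:
print(decode_linkto("0x7361666172692d636c69656e742d3300"))  # -> safari-client-3
```

Here the link file on brick00 points at safari-client-3, i.e. the fourth subvolume in volfile order, which matches the real copy being found on brick03.]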
Re: [Gluster-users] Hundreds of duplicate files
Actually we are using XFS for the bricks. Still haven't made any progress on this issue, unfortunately.

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Anders Blomdell anders.blomd...@control.lth.se
Date: 12/21/14 7:42 pm
To: tben...@3vgeomatics.com, gluster-users@gluster.org

On 21 December 2014 06:37:44 CET, tben...@3vgeomatics.com wrote:

Hi Joe,

Thanks for the reply. That worked; I probably forgot to do this as root last time. Yet, the files still show up twice in a directory listing on the mounted volume. And it seems to be random whether reading the file will succeed or not. I've tried with several files and it sometimes works and sometimes fails; I assume this depends on whether it locates the actual file on the brick or the link file. Let me know if you have any idea what's going on.

Does the brick filesystem happen to be ext4? I have had a similar problem with 3.6.x and ext4 (the 64-bit offset problem).

Output of the command:

$ getfattr -m . -d -e hex /data/glusterfs/safari/brick01/brick/rsc/tsx/montreal_smaller/sm_asc/stack/slc/20130210.slc.ras
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/safari/brick01/brick/rsc/tsx/montreal_smaller/sm_asc/stack/slc/20130210.slc.ras
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0x52c2aed77d09412d8bfd7ca70e87b196
trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3200

Cheers,
Tom

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/20/14 8:53 pm
To: gluster-users@gluster.org

Try 'getfattr -m . -d -e hex' (dot instead of dash) and, of course, do that as root.
On 12/20/2014 06:02 PM, tben...@3vgeomatics.com wrote:

Hi everyone,

We have a distributed Gluster volume on five bricks over two servers (first server running gluster 3.4.2, second server running gluster 3.5.1, both running Fedora 20). Starting last week, doing a file listing on the mounted volume shows many files with the same name appearing twice (and they are listed with the same inode). Doing a search for these files, I have found 290,000 of them!! If I do a listing of these files on the bricks themselves, it looks like most are link files (du shows the file on the first server as 0 bytes, with the sticky bit set). The file is fine on the second server.

Unfortunately, running getfattr -m - -e hex -d on the file shows NO gluster-related attributes, and I believe this is why both files appear in the listing. The files cannot be read by any program, since reads go to the link file. I assume the metadata became corrupted. This is a production server, so we really need to know:

1. How did this happen, and how can we prevent it going forward? There was a server crash a week ago and I believe that was the cause.
2. How can we heal the Gluster volume/bricks and link files? If there is some straightforward way of restoring the link file pointer I can write a script to do it; obviously doing this manually will be impossible.

Thanks very much for any and all help - much appreciated!

Regards,
Tom

On Wed, Dec 17, 2014 at 4:07 AM, tben...@3vgeomatics.com wrote:

Hi everyone,

We have noticed some extremely odd behaviour with our distributed Gluster volume where duplicate files (same name, same or different content) are being created and stored on multiple bricks. The only consistent clue is that one of the duplicate files has the sticky bit set. I am hoping someone will be able to shed some light on why this is happening and how we can restore the volume, as there appear to be hundreds of such files. I will try to provide as much pertinent information as I can.

We have a 130TB Gluster volume consisting of two 20TB bricks on server1, and three 40TB bricks on server2 which were added at a later date (and rebalancing was done). The volume is mounted on server1, and accessed only through this server but by many users. Both servers went down due to power loss several days ago, after which this problem was first noticed. We ran a rebalance command on the volume; this has not fixed the problem.

Gluster volume info:

Volume Name: safari
Type: Distribute
Volume ID: d48d0e6b-4389-4c2c-8fd1-cd2854121eda
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: server1:/data/glusterfs/safari/brick00/brick
Brick2: server1:/data/glusterfs/safari/brick01/brick
Brick3: server2:/data/glusterfs/safari/brick02/brick
Brick4: server2:/data/glusterfs/safari/brick03/brick
Brick5: server2:/data/glusterfs/safari/brick04/brick

Size information:
Re: [Gluster-users] Hundreds of duplicate files
Hi everyone,

We have a distributed Gluster volume on five bricks over two servers (first server running gluster 3.4.2, second server running gluster 3.5.1, both running Fedora 20). Starting last week, doing a file listing on the mounted volume shows many files with the same name appearing twice (and they are listed with the same inode). Doing a search for these files, I have found 290,000 of them!! If I do a listing of these files on the bricks themselves, it looks like most are link files (du shows the file on the first server as 0 bytes, with the sticky bit set). The file is fine on the second server.

Unfortunately, running getfattr -m - -e hex -d on the file shows NO gluster-related attributes, and I believe this is why both files appear in the listing. The files cannot be read by any program, since reads go to the link file. I assume the metadata became corrupted. This is a production server, so we really need to know:

1. How did this happen, and how can we prevent it going forward? There was a server crash a week ago and I believe that was the cause.
2. How can we heal the Gluster volume/bricks and link files? If there is some straightforward way of restoring the link file pointer I can write a script to do it; obviously doing this manually will be impossible.

Thanks very much for any and all help - much appreciated!

Regards,
Tom

On Wed, Dec 17, 2014 at 4:07 AM, tben...@3vgeomatics.com wrote:

Hi everyone,

We have noticed some extremely odd behaviour with our distributed Gluster volume where duplicate files (same name, same or different content) are being created and stored on multiple bricks. The only consistent clue is that one of the duplicate files has the sticky bit set. I am hoping someone will be able to shed some light on why this is happening and how we can restore the volume, as there appear to be hundreds of such files. I will try to provide as much pertinent information as I can.

We have a 130TB Gluster volume consisting of two 20TB bricks on server1, and three 40TB bricks on server2 which were added at a later date (and rebalancing was done). The volume is mounted on server1, and accessed only through this server but by many users. Both servers went down due to power loss several days ago, after which this problem was first noticed. We ran a rebalance command on the volume; this has not fixed the problem.

Gluster volume info:

Volume Name: safari
Type: Distribute
Volume ID: d48d0e6b-4389-4c2c-8fd1-cd2854121eda
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: server1:/data/glusterfs/safari/brick00/brick
Brick2: server1:/data/glusterfs/safari/brick01/brick
Brick3: server2:/data/glusterfs/safari/brick02/brick
Brick4: server2:/data/glusterfs/safari/brick03/brick
Brick5: server2:/data/glusterfs/safari/brick04/brick

Size information:

/dev/sdc         37T   16T   22T  42%  /data/glusterfs/safari/brick02
/dev/sdd         37T   16T   22T  42%  /data/glusterfs/safari/brick03
/dev/sde         37T   17T   21T  45%  /data/glusterfs/safari/brick04
/dev/md126       11T  7.7T  2.8T  74%  /data/glusterfs/safari/brick00
/dev/md124       11T  8.0T  2.5T  77%  /data/glusterfs/safari/brick01
server2:/safari 130T   63T   68T  48%  /sar

Example 1:
- Two files with the same name exist in one directory
- They have different contents and attributes
- A file listing on the mounted volume shows the same inode
- The newer file has the sticky bit set
- Neither file is corrupted; both can be viewed by using the absolute path (on the bricks)

File listing on the mounted volume:

13036730497538635177 -rw-rw-r-T 1 jon users 924 Dec 15 10:42 RSLC_tab
13036730497538635177 -rw-rw-r-- 1 jon users 418 Mar 18 2013 RSLC_tab

Listing of the files on the bricks:

8925798411 -rw-rw-r-T+ 2 jon users 924 Dec 15 10:42 /data/glusterfs/safari/brick00/brick/complete/shm/rs2/ottawa/mf6_asc/stack_org/RSLC_tab
51541886672 -rw-rw-r--+ 2 1002 users 418 Mar 18 2013 /data/glusterfs/safari/brick02/brick/complete/shm/rs2/ottawa/mf6_asc/stack_org/RSLC_tab

Example 2:
- Two files with the same name exist in one directory
- They have the same content and attributes
- No sticky bit is set in the file listing on the mounted volume
- The sticky bit is set on one of the files in the listing on the bricks
- The files are corrupted

File listing on the mounted volume:

13012555852904096080 -rw-rw-r-- 1 tom users 2393848 Dec 8 2013 ifg_lr/20130226_20130813.diff.phi.ras
13012555852904096080 -rw-rw-r-- 1 tom users 2393848 Dec 8 2013 ifg_lr/20130226_20130813.diff.phi.ras

Listing of the files on the bricks:

17058578 -rw-rw-r-T+ 2 tom users 2393848 Dec 13 17:11 /data/glusterfs/safari/brick00/brick/rsc/rs2/calgary/u22_dsc/stack_org/ifg_lr/20130226_20130813.diff.phi.ras
57986922129 -rw-rw-r--+ 2 1010 users 2393848 Dec 8 2013 /data/glusterfs/safari/brick02/brick/rsc/rs2/calgary/u22_dsc/stack_org/ifg_lr/20130226_20130813.diff.phi.ras
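[Editorial note: a healthy DHT link file on a brick is normally a zero-length regular file whose permission bits are just the sticky bit (mode 1000), plus a trusted.glusterfs.dht.linkto xattr; the sticky-bit files above, with rw permissions and non-zero sizes, deviate from that pattern. A hedged Python sketch of the classic check (the same condition as `find $brick -type f -size 0 -perm 1000`):

```python
import os
import stat

def looks_like_dht_linkfile(path: str) -> bool:
    """True for a zero-length regular file with the sticky bit set:
    the usual on-brick shape of a GlusterFS DHT link file.
    (A definitive check would also read trusted.glusterfs.dht.linkto.)"""
    st = os.lstat(path)
    return (stat.S_ISREG(st.st_mode)
            and st.st_size == 0
            and bool(st.st_mode & stat.S_ISVTX))
```

Note that this heuristic would deliberately not flag the anomalous files shown in Example 1, which carry data-file permissions and sizes despite the sticky bit.]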
Re: [Gluster-users] Hundreds of duplicate files
Hi Joe,

Thanks for the reply. That worked; I probably forgot to do this as root last time. Yet, the files still show up twice in a directory listing on the mounted volume. And it seems to be random whether reading the file will succeed or not. I've tried with several files and it sometimes works and sometimes fails; I assume this depends on whether it locates the actual file on the brick or the link file. Let me know if you have any idea what's going on.

Output of the command:

$ getfattr -m . -d -e hex /data/glusterfs/safari/brick01/brick/rsc/tsx/montreal_smaller/sm_asc/stack/slc/20130210.slc.ras
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/safari/brick01/brick/rsc/tsx/montreal_smaller/sm_asc/stack/slc/20130210.slc.ras
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0x52c2aed77d09412d8bfd7ca70e87b196
trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3200

Cheers,
Tom

- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/20/14 8:53 pm
To: gluster-users@gluster.org

Try 'getfattr -m . -d -e hex' (dot instead of dash) and, of course, do that as root.

On 12/20/2014 06:02 PM, tben...@3vgeomatics.com wrote:

Hi everyone,

We have a distributed Gluster volume on five bricks over two servers (first server running gluster 3.4.2, second server running gluster 3.5.1, both running Fedora 20). Starting last week, doing a file listing on the mounted volume shows many files with the same name appearing twice (and they are listed with the same inode). Doing a search for these files, I have found 290,000 of them!! If I do a listing of these files on the bricks themselves, it looks like most are link files (du will show the file on the first server as 0 bytes, and the sticky bit set). The file is fine on the second server.

Unfortunately, running getfattr -m - -e hex -d on the file shows NO gluster-related attributes and I believe this is why both files appear in the listing. The files cannot be read by any programs as it is trying to read the link file. I assume the metadata became corrupted. This is a production server so we really need to know:

1. How did this happen, and how can we prevent it going forward? There was a server crash a week ago and I believe that was the cause.
2. How can we heal the Gluster volume/bricks and link files. If there is some straightforward way of restoring the link file pointer I can write a script to do it, obviously doing this manually will be impossible.

Thanks very much for any and all help - much appreciated!

Regards,
Tom

On Wed, Dec 17, 2014 at 4:07 AM, tben...@3vgeomatics.com wrote:

Hi everyone, we have noticed some extremely odd behaviour with our distributed Gluster volume where duplicate files (same name, same or different content) are being created and stored on multiple bricks. The only consistent clue is that one of the duplicate files has the sticky bit set. I am hoping someone will be able to shed some light on why this is happening and how we can restore the volume as there appear to be hundreds of such files. I will try to provide as much pertinent information as I can.

We have a 130TB Gluster volume consisting of two 20TB bricks on server1, and three 40TB bricks on a server2 which were added at a later date (and rebalancing was done). The volume is mounted on server1, and accessed only through this server but by many users. Both servers went down due to power loss several days ago after which this problem was first noticed. We ran a rebalance command on the volumes, this has not fixed the problem.

Gluster volume info:

Volume Name: safari
Type: Distribute
Volume ID: d48d0e6b-4389-4c2c-8fd1-cd2854121eda
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: server1:/data/glusterfs/safari/brick00/brick
Brick2: server1:/data/glusterfs/safari/brick01/brick
Brick3: server2:/data/glusterfs/safari/brick02/brick
Brick4: server2:/data/glusterfs/safari/brick03/brick
Brick5: server2:/data/glusterfs/safari/brick04/brick

Size information:

/dev/sdc         37T   16T   22T  42%  /data/glusterfs/safari/brick02
/dev/sdd         37T   16T   22T  42%  /data/glusterfs/safari/brick03
/dev/sde         37T   17T   21T  45%  /data/glusterfs/safari/brick04
/dev/md126       11T  7.7T  2.8T  74%  /data/glusterfs/safari/brick00
/dev/md124       11T  8.0T  2.5T  77%  /data/glusterfs/safari/brick01
server2:/safari 130T   63T   68T  48%  /sar

Example 1:
- Two files with the same name exist in one directory
- They have different contents and attributes
- A file listing on the mounted volume shows the same inode
- The newer file has the sticky bit set
- Neither file is corrupted, they can both be viewed by using
[Gluster-users] Hundreds of duplicate files
Hi everyone,

We have noticed some extremely odd behaviour with our distributed Gluster volume where duplicate files (same name, same or different content) are being created and stored on multiple bricks. The only consistent clue is that one of the duplicate files has the sticky bit set. I am hoping someone will be able to shed some light on why this is happening and how we can restore the volume, as there appear to be hundreds of such files. I will try to provide as much pertinent information as I can.

We have a 130TB Gluster volume consisting of two 20TB bricks on server1, and three 40TB bricks on server2 which were added at a later date (and rebalancing was done). The volume is mounted on server1, and accessed only through this server but by many users. Both servers went down due to power loss several days ago, after which this problem was first noticed. We ran a rebalance command on the volume; this has not fixed the problem.

Gluster volume info:

Volume Name: safari
Type: Distribute
Volume ID: d48d0e6b-4389-4c2c-8fd1-cd2854121eda
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: server1:/data/glusterfs/safari/brick00/brick
Brick2: server1:/data/glusterfs/safari/brick01/brick
Brick3: server2:/data/glusterfs/safari/brick02/brick
Brick4: server2:/data/glusterfs/safari/brick03/brick
Brick5: server2:/data/glusterfs/safari/brick04/brick

Size information:

/dev/sdc         37T   16T   22T  42%  /data/glusterfs/safari/brick02
/dev/sdd         37T   16T   22T  42%  /data/glusterfs/safari/brick03
/dev/sde         37T   17T   21T  45%  /data/glusterfs/safari/brick04
/dev/md126       11T  7.7T  2.8T  74%  /data/glusterfs/safari/brick00
/dev/md124       11T  8.0T  2.5T  77%  /data/glusterfs/safari/brick01
server2:/safari 130T   63T   68T  48%  /sar

Example 1:
- Two files with the same name exist in one directory
- They have different contents and attributes
- A file listing on the mounted volume shows the same inode
- The newer file has the sticky bit set
- Neither file is corrupted; both can be viewed by using the absolute path (on the bricks)

File listing on the mounted volume:

13036730497538635177 -rw-rw-r-T 1 jon users 924 Dec 15 10:42 RSLC_tab
13036730497538635177 -rw-rw-r-- 1 jon users 418 Mar 18 2013 RSLC_tab

Listing of the files on the bricks:

8925798411 -rw-rw-r-T+ 2 jon users 924 Dec 15 10:42 /data/glusterfs/safari/brick00/brick/complete/shm/rs2/ottawa/mf6_asc/stack_org/RSLC_tab
51541886672 -rw-rw-r--+ 2 1002 users 418 Mar 18 2013 /data/glusterfs/safari/brick02/brick/complete/shm/rs2/ottawa/mf6_asc/stack_org/RSLC_tab

Example 2:
- Two files with the same name exist in one directory
- They have the same content and attributes
- No sticky bit is set in the file listing on the mounted volume
- The sticky bit is set on one of the files in the listing on the bricks
- The files are corrupted

File listing on the mounted volume:

13012555852904096080 -rw-rw-r-- 1 tom users 2393848 Dec 8 2013 ifg_lr/20130226_20130813.diff.phi.ras
13012555852904096080 -rw-rw-r-- 1 tom users 2393848 Dec 8 2013 ifg_lr/20130226_20130813.diff.phi.ras

Listing of the files on the bricks:

17058578 -rw-rw-r-T+ 2 tom users 2393848 Dec 13 17:11 /data/glusterfs/safari/brick00/brick/rsc/rs2/calgary/u22_dsc/stack_org/ifg_lr/20130226_20130813.diff.phi.ras
57986922129 -rw-rw-r--+ 2 1010 users 2393848 Dec 8 2013 /data/glusterfs/safari/brick02/brick/rsc/rs2/calgary/u22_dsc/stack_org/ifg_lr/20130226_20130813.diff.phi.ras

Additionally, only some files in this directory are duplicated. The duplicated files are corrupted (they cannot be viewed as Sun raster images, the original file type); the files which are not duplicated are not corrupted.
File command output (notice the duplicate and singleton files):

ifg_lr/20091021_20100218.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
ifg_lr/20091021_20101016.diff.phi.ras: data
ifg_lr/20091021_20101016.diff.phi.ras: data
ifg_lr/20091021_20101109.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
ifg_lr/20091021_20101203.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
ifg_lr/20091021_20101227.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
ifg_lr/20091021_20110120.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
ifg_lr/20091021_20110213.diff.phi.ras: data
ifg_lr/20091021_20110213.diff.phi.ras: data
ifg_lr/20091021_20110309.diff.phi.ras: data
ifg_lr/20091021_20110309.diff.phi.ras: sticky data
ifg_lr/20091021_20110402.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap

Information from the Gluster log file: the log is full of thousands of lines like the following (possibly one for each directory?), dating back several months:

[2014-12-12 11:10:10.257950] I [dht-layout.c:726:dht_layout_dir_mismatch] 3-safari-dht: /rsc/tsx/lasvegas/spot_asc/stack/ifg_lr - disk layout missing
[2014-12-12 11:10:10.257988] I [dht-common.c:623:dht_revalidate_cbk] 3-safari-dht: mismatching layouts for
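[Editorial note: the duplicate scan described in this thread can be done directly on the brick roots. Below is a minimal, hypothetical Python sketch of the idea, assuming local access to each brick's filesystem; it only reports duplicates and which copy carries the sticky bit (the candidate link file), and deletes nothing:

```python
import os
import stat
from collections import defaultdict

def find_cross_brick_duplicates(brick_roots):
    """Group regular files on the given brick roots by their
    volume-relative path, and return those present on more than one
    brick. Each copy is reported as (brick_root, has_sticky_bit)."""
    seen = defaultdict(list)
    for root in brick_roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Skip gluster's internal hard-link namespace.
            dirnames[:] = [d for d in dirnames if d != ".glusterfs"]
            for name in filenames:
                full = os.path.join(dirpath, name)
                st = os.lstat(full)
                if not stat.S_ISREG(st.st_mode):
                    continue
                rel = os.path.relpath(full, root)
                seen[rel].append((root, bool(st.st_mode & stat.S_ISVTX)))
    return {rel: copies for rel, copies in seen.items() if len(copies) > 1}
```

Run against, say, /data/glusterfs/safari/brick00/brick and /data/glusterfs/safari/brick02/brick, this would report paths like RSLC_tab and the .diff.phi.ras files above, with the sticky flag marking which brick holds the suspect copy.]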