Re: [Gluster-users] Hundreds of duplicate files

2015-02-18 Thread tbenzvi
Hi Olav,

I have a hunch that our problem was caused by improper unmounting of the 
Gluster volume, and have since found that the proper order should be:

1. Kill all jobs using the volume.
2. Unmount the volume on the clients.
3. Run gluster volume stop.
4. Stop the gluster service (if necessary).
 
In my case, I wrote a Python script to find duplicate files on the mounted 
volume, then delete the corresponding link files on the bricks (making sure to 
also delete files in the .glusterfs directory)
 
However, your find command was also suggested to me and I think it's a simpler 
solution. I believe removing all link files (even ones that are not causing 
duplicates) is fine, since on the next file access Gluster will do a lookup on 
all bricks and recreate any link files if necessary. Hopefully a Gluster expert 
can chime in on this point as I'm not completely sure.
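
For anyone who would rather review the candidates before deleting anything, here is a minimal Python sketch of the same test the find command applies (regular file, zero bytes, permission bits exactly 1000). It is not the script mentioned above; the function names are made up for illustration, and it only lists paths. Checking each candidate for the trusted.glusterfs.dht.linkto attribute with getfattr before removing it would be a sensible extra step.

```python
import os
import stat


def is_linkfile_candidate(st_mode, st_size):
    """True for a zero-byte regular file whose permission bits are exactly 1000."""
    return (
        stat.S_ISREG(st_mode)
        and st_size == 0
        and stat.S_IMODE(st_mode) == stat.S_ISVTX
    )


def find_linkfile_candidates(brick_root):
    """Yield matching paths under brick_root. Nothing is deleted."""
    for dirpath, _dirnames, filenames in os.walk(brick_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat so symlinks are not followed
            if is_linkfile_candidate(st.st_mode, st.st_size):
                yield path
```

Running `for path in find_linkfile_candidates(brick_root): print(path)` as root on each brick gives a list that can be compared against the duplicates seen on the mounted volume.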
 
Keep in mind your setup is somewhat different than mine as I have only 5 bricks 
with no replication.
 
Regards,
Tom
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Olav Peeters opeet...@gmail.com
Date: 2/18/15 10:52 am
To: gluster-users@gluster.org, tben...@3vgeomatics.com

 Hi all,
 I'm having this problem after upgrading from 3.5.3 to 3.6.2.
 At the moment I am still waiting for a heal to finish (on a 31TB volume with 
42 bricks, replicated over three nodes).
 
 Tom,
 how did you remove the duplicates?
 with 42 bricks I will not be able to do this manually..
 Did a:
 find $brick_root -type f -size 0 -perm 1000 -exec /bin/rm {} \;
 work for you?
 
 Should this type of thing ideally not be checked and mended by a heal?
 
 Does anyone have an idea yet how this happens in the first place? Can it be 
connected to upgrading?
 
 Cheers,
 Olav
   On 01/01/15 03:07, tben...@3vgeomatics.com wrote:
 No, the files can be read on a newly mounted client! I went ahead and deleted 
all of the link files associated with these duplicates, and then remounted the 
volume. The problem is fixed!
Thanks again for the help, Joe and Vijay.
 
Tom
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
 From: Vijay Bellur vbel...@redhat.com
 Date: 12/28/14 3:23 am
 To: tben...@3vgeomatics.com, gluster-users@gluster.org
 
 On 12/28/2014 01:20 PM, tben...@3vgeomatics.com wrote:
  Hi Vijay,
  Yes the files are still readable from the .glusterfs path.
  There is no explicit error. However, trying to read a text file in
  python simply gives me null characters:
 
   open('ott_mf_itab').readlines()
  ['\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']
 
  And reading binary files does the same
 
 
 Is this behavior seen with a freshly mounted client too?
 
 -Vijay
 
  - Original Message -
  Subject: Re: [Gluster-users] Hundreds of duplicate files
  From: Vijay Bellur vbel...@redhat.com
  Date: 12/27/14 9:57 pm
  To: tben...@3vgeomatics.com, gluster-users@gluster.org
 
  On 12/28/2014 10:13 AM, tben...@3vgeomatics.com wrote:
   Thanks Joe, I've read your blog post as well as your post
  regarding the
   .glusterfs directory.
   I found some unneeded duplicate files which were not being read
   properly. I then deleted the link file from the brick. This always
   removes the duplicate file from the listing, but the file does not
   always become readable. If I also delete the associated file in the
   .glusterfs directory on that brick, then some more files become
   readable. However this solution still doesn't work for all files.
   I know the file on the brick is not corrupt as it can be read
  directly
   from the brick directory.
 
  For files that are not readable from the client, can you check if the
  file is readable from the .glusterfs/ path?
 
  What is the specific error that is seen while trying to read one such
  file from the client?
 
  Thanks,
  Vijay
 
 
 
  ___
  Gluster-users mailing list
  Gluster-users@gluster.org
  http://www.gluster.org/mailman/listinfo/gluster-users
 
 

 

Re: [Gluster-users] Hundreds of duplicate files

2014-12-31 Thread tbenzvi
No, the files can be read on a newly mounted client! I went ahead and deleted 
all of the link files associated with these duplicates, and then remounted the 
volume. The problem is fixed!
Thanks again for the help, Joe and Vijay.
 
Tom
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Vijay Bellur vbel...@redhat.com
Date: 12/28/14 3:23 am
To: tben...@3vgeomatics.com, gluster-users@gluster.org

On 12/28/2014 01:20 PM, tben...@3vgeomatics.com wrote:
  Hi Vijay,
  Yes the files are still readable from the .glusterfs path.
  There is no explicit error. However, trying to read a text file in
  python simply gives me null characters:
 
   open('ott_mf_itab').readlines()
  ['\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']
 
  And reading binary files does the same
 
 
 Is this behavior seen with a freshly mounted client too?
 
 -Vijay
 
  - Original Message -
  Subject: Re: [Gluster-users] Hundreds of duplicate files
  From: Vijay Bellur vbel...@redhat.com
  Date: 12/27/14 9:57 pm
  To: tben...@3vgeomatics.com, gluster-users@gluster.org
 
  On 12/28/2014 10:13 AM, tben...@3vgeomatics.com wrote:
   Thanks Joe, I've read your blog post as well as your post
  regarding the
   .glusterfs directory.
   I found some unneeded duplicate files which were not being read
   properly. I then deleted the link file from the brick. This always
   removes the duplicate file from the listing, but the file does not
   always become readable. If I also delete the associated file in the
   .glusterfs directory on that brick, then some more files become
   readable. However this solution still doesn't work for all files.
   I know the file on the brick is not corrupt as it can be read
  directly
   from the brick directory.
 
  For files that are not readable from the client, can you check if the
  file is readable from the .glusterfs/ path?
 
  What is the specific error that is seen while trying to read one such
  file from the client?
 
  Thanks,
  Vijay
 
 
 

Re: [Gluster-users] Hundreds of duplicate files

2014-12-27 Thread tbenzvi
Moving the file with linkto attribute worked! Just one copy of the file is 
retained in the listing and can be read without problems.
I will write a script to remove these rogue link files from the bricks - any 
risks associated with this?
 
Thanks everyone for your help, of course if anyone could explain how this 
happened I would love to hear it..
 
Tom
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Vijay Bellur vbel...@redhat.com
Date: 12/27/14 9:12 am
To: tben...@3vgeomatics.com, gluster-users@gluster.org

On 12/27/2014 01:11 PM, tben...@3vgeomatics.com wrote:
  Thanks for your continued help Joe.
  A demonstration of the problem, in this case I was able to open the file
  in vim (a text file) without any issues, however sometimes duplicated
  text files open in vim as one line consisting of @ characters, and
  binary data files can also not be opened correctly for reading.
  However the duplicate listing is still an issue. Note that Dec 13 was
  the date of a server crash.
 
  [root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
  -rw-rw-r-T 1 parwant users 1712 Dec 13 19:02 tif2flt.pro
  -rw-rw-r-- 1 parwant users 1712 Jun 17 2010 tif2flt.pro
 
  A few minutes later doing the same listing.. sticky bit disappeared and
  modification date changed
 
  [root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
  -rw-rw-r-- 1 parwant users 1712 Jun 17 2010
  /sar/complete/vancouver/refdem/tif2flt.pro
  -rw-rw-r-- 1 parwant users 1712 Jun 17 2010
  /sar/complete/vancouver/refdem/tif2flt.pro
 
  [root@jongoo ~]# getfattr -m . -d -e hex
  /data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
  getfattr: Removing leading '/' from absolute path names
  # file:
  data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
  system.posix_acl_access=0x0200010006000400060016002400
  trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
  trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd
  trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3300
 
  [root@ndovu ~]# getfattr -m . -d -e hex
  /data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
  getfattr: Removing leading '/' from absolute path names
  # file:
  data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
  system.posix_acl_access=0x0200010006000400060016002400
  trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
  trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd
 
 
 Is rebalance running on this volume right now? If not, can you please 
 move out the file copy with trusted.glusterfs.dht.linkto attribute out 
 of the brick directory 
 (/data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro) 
 to an alternate location and check the behavior?
 
 Thanks,
 Vijay

Re: [Gluster-users] Hundreds of duplicate files

2014-12-27 Thread tbenzvi
Ok, I am really tearing my hair out here. I tried doing this manually for 
several other files just to be sure. And in these cases it removed the 
duplicate file from the directory listing, but the file can still not be read.. 
Reading directly from the brick works fine.
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/27/14 12:01 pm
To: gluster-users@gluster.org

Should be safe.
 
 Here's what I've done in the past to clean up rogue dht link files (not that 
yours looked rogue though):
 
 find $brick_root -type f -size 0 -perm 1000 -exec /bin/rm {} \;
 
 On 12/27/2014 11:09 AM, tben...@3vgeomatics.com wrote:
 Moving the file with linkto attribute worked! Just one copy of the file is 
retained in the listing and can be read without problems.
I will write a script to remove these rogue link files from the bricks - any 
risks associated with this?
 
Thanks everyone for your help, of course if anyone could explain how this 
happened I would love to hear it..
 
Tom
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
 From: Vijay Bellur vbel...@redhat.com
 Date: 12/27/14 9:12 am
 To: tben...@3vgeomatics.com, gluster-users@gluster.org
 
 On 12/27/2014 01:11 PM, tben...@3vgeomatics.com wrote:
  Thanks for your continued help Joe.
  A demonstration of the problem, in this case I was able to open the file
  in vim (a text file) without any issues, however sometimes duplicated
  text files open in vim as one line consisting of @ characters, and
  binary data files can also not be opened correctly for reading.
  However the duplicate listing is still an issue. Note that Dec 13 was
  the date of a server crash.
 
  [root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
  -rw-rw-r-T 1 parwant users 1712 Dec 13 19:02 tif2flt.pro
  -rw-rw-r-- 1 parwant users 1712 Jun 17 2010 tif2flt.pro
 
  A few minutes later doing the same listing.. sticky bit disappeared and
  modification date changed
 
  [root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
  -rw-rw-r-- 1 parwant users 1712 Jun 17 2010
  /sar/complete/vancouver/refdem/tif2flt.pro
  -rw-rw-r-- 1 parwant users 1712 Jun 17 2010
  /sar/complete/vancouver/refdem/tif2flt.pro
 
  [root@jongoo ~]# getfattr -m . -d -e hex
  /data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
  getfattr: Removing leading '/' from absolute path names
  # file:
  data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
  system.posix_acl_access=0x0200010006000400060016002400
  trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
  trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd
  trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3300
 
  [root@ndovu ~]# getfattr -m . -d -e hex
  /data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
  getfattr: Removing leading '/' from absolute path names
  # file:
  data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
  system.posix_acl_access=0x0200010006000400060016002400
  trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
  trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd
 
 
 Is rebalance running on this volume right now? If not, can you please 
 move out the file copy with trusted.glusterfs.dht.linkto attribute out 
 of the brick directory 
 (/data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro) 
 to an alternate location and check the behavior?
 
 Thanks,
 Vijay
 


Re: [Gluster-users] Hundreds of duplicate files

2014-12-27 Thread tbenzvi
That didn't fix it unfortunately. In fact, I've done a full rebalance after 
initially discovering the problem and after updating Gluster, but nothing was 
changed..
 
I don't know too much about how Gluster works internally; is it possible to 
compute the hash for each duplicate filename, figure out which brick it 
belongs on, find where it actually resides, and then recreate the link file or 
update the linkto attribute? Assuming broken link files are the problem..
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/27/14 1:55 pm
To: tben...@3vgeomatics.com, gluster-users@gluster.org

I'm wondering if this is from a corrupted failed rebalance. In a directory that 
has duplicates, do setfattr -n trusted.distribute.fix.layout -v 1 .
 
 If that fixes it, do a rebalance...fix-layout

 On December 27, 2014 12:38:01 PM PST, tben...@3vgeomatics.com wrote:
Ok, I am really tearing my hair out here. I tried doing this manually for 
several other files just to be sure. And in these cases it removed the 
duplicate file from the directory listing, but the file can still not be read.. 
Reading directly from the brick works fine.
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/27/14 12:01 pm
To: gluster-users@gluster.org

Should be safe.
 
 Here's what I've done in the past to clean up rogue dht link files (not that 
yours looked rogue though):
 
 find $brick_root -type f -size 0 -perm 1000 -exec /bin/rm {} \;
 
 On 12/27/2014 11:09 AM, tben...@3vgeomatics.com wrote:
 Moving the file with linkto attribute worked! Just one copy of the file is 
retained in the listing and can be read without problems.
I will write a script to remove these rogue link files from the bricks - any 
risks associated with this?
 
Thanks everyone for your help, of course if anyone could explain how this 
happened I would love to hear it..
 
Tom
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
 From: Vijay Bellur vbel...@redhat.com
 Date: 12/27/14 9:12 am
 To: tben...@3vgeomatics.com, gluster-users@gluster.org
 
 On 12/27/2014 01:11 PM, tben...@3vgeomatics.com wrote:
  Thanks for your continued help Joe.
  A demonstration of the problem, in this case I was able to open the file
  in vim (a text file) without any issues, however sometimes duplicated
  text files open in vim as one line consisting of @ characters, and
  binary data files can also not be opened correctly for reading.
  However the duplicate listing is still an issue. Note that Dec 13 was
  the date of a server crash.
 
  [root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
  -rw-rw-r-T 1 parwant users 1712 Dec 13 19:02 tif2flt.pro
  -rw-rw-r-- 1 parwant users 1712 Jun 17 2010 tif2flt.pro
 
  A few minutes later doing the same listing.. sticky bit disappeared and
  modification date changed
 
  [root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
  -rw-rw-r-- 1 parwant users 1712 Jun 17 2010
  /sar/complete/vancouver/refdem/tif2flt.pro
  -rw-rw-r-- 1 parwant users 1712 Jun 17 2010
  /sar/complete/vancouver/refdem/tif2flt.pro
 
  [root@jongoo ~]# getfattr -m . -d -e hex
  /data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
  getfattr: Removing leading '/' from absolute path names
  # file:
  data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
  system.posix_acl_access=0x0200010006000400060016002400
  trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
  trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd
  trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3300
 
  [root@ndovu ~]# getfattr -m . -d -e hex
  /data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
  getfattr: Removing leading '/' from absolute path names
  # file:
  data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
  system.posix_acl_access=0x0200010006000400060016002400
  trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
  trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd
 
 
 Is rebalance running on this volume right now? If not, can you please 
 move out the file copy with trusted.glusterfs.dht.linkto attribute out 
 of the brick directory 
 (/data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro) 
 to an alternate location and check the behavior?
 
 Thanks,
 Vijay
 


Re: [Gluster-users] Hundreds of duplicate files

2014-12-27 Thread tbenzvi
Thanks Joe, I've read your blog post as well as your post regarding the 
.glusterfs directory.
 
I found some unneeded duplicate files which were not being read properly. I 
then deleted the link file from the brick. This always removes the duplicate 
file from the listing, but the file does not always become readable. If I also 
delete the associated file in the .glusterfs directory on that brick, then some 
more files become readable. However this solution still doesn't work for all 
files.
 
I know the file on the brick is not corrupt as it can be read directly from the 
brick directory.
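
For scripting the .glusterfs cleanup, the sketch below maps a trusted.gfid value, as printed by getfattr -e hex, to the matching entry under a brick's .glusterfs directory. It assumes the commonly documented layout (.glusterfs/<first two hex digits>/<next two>/<gfid as a dashed UUID>); the function name is hypothetical and the result should be verified on a real brick before anything is removed.

```python
import os
import uuid


def gfid_backend_path(brick_root, gfid_hex):
    """Return the assumed .glusterfs path for a gfid given as hex, with or without 0x."""
    digits = gfid_hex[2:] if gfid_hex.lower().startswith("0x") else gfid_hex
    gfid = str(uuid.UUID(hex=digits))  # e.g. dfe13dc0-88bf-4a77-...
    return os.path.join(brick_root, ".glusterfs", gfid[0:2], gfid[2:4], gfid)
```

With the gfid 0xdfe13dc088bf4a779488ef72f0a879cd reported elsewhere in this thread and a brick root of /data/glusterfs/safari/brick00/brick, this yields a path ending in .glusterfs/df/e1/dfe13dc0-88bf-4a77-9488-ef72f0a879cd.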
 
Hopefully you have some other ideas I could try.
 
Otherwise, I may proceed by writing a script to handle the files one-by-one. 
Move the actual file from .glusterfs off the brick to a temporary location, 
remove all references to the file on the bricks, then copy the file back onto 
the mounted volume.. It's not ideal but hopefully this is a one-time occurrence.
 
Tom
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/27/14 3:28 pm
To: tben...@3vgeomatics.com, gluster-users@gluster.org

The linkfile you showed earlier was perfect. 
 
 Check this article on my blog for the details on how dht works and how to 
calculate hashes: http://joejulian.name/blog/dht-misses-are-expensive/

 On December 27, 2014 3:18:00 PM PST, tben...@3vgeomatics.com wrote:
That didn't fix it unfortunately. In fact, I've done a full rebalance after 
initially discovering the problem and after updating Gluster, but nothing was 
changed..
 
I don't know too much about how Gluster works internally; is it possible to 
compute the hash for each duplicate filename, figure out which brick it 
belongs on, find where it actually resides, and then recreate the link file or 
update the linkto attribute? Assuming broken link files are the problem..
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/27/14 1:55 pm
To: tben...@3vgeomatics.com, gluster-users@gluster.org

I'm wondering if this is from a corrupted failed rebalance. In a directory that 
has duplicates, do setfattr -n trusted.distribute.fix.layout -v 1 .
 
 If that fixes it, do a rebalance...fix-layout

 On December 27, 2014 12:38:01 PM PST, tben...@3vgeomatics.com wrote:
Ok, I am really tearing my hair out here. I tried doing this manually for 
several other files just to be sure. And in these cases it removed the 
duplicate file from the directory listing, but the file can still not be read.. 
Reading directly from the brick works fine.
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/27/14 12:01 pm
To: gluster-users@gluster.org

Should be safe.
 
 Here's what I've done in the past to clean up rogue dht link files (not that 
yours looked rogue though):
 
 find $brick_root -type f -size 0 -perm 1000 -exec /bin/rm {} \;

Re: [Gluster-users] Hundreds of duplicate files

2014-12-27 Thread tbenzvi
Hi Vijay,
 
Yes the files are still readable from the .glusterfs path.
There is no explicit error. However, trying to read a text file in python 
simply gives me null characters:
 open('ott_mf_itab').readlines()
['\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00']
 And reading binary files does the same
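
Since there is no explicit error, a script that wants to flag affected files has to look at the data itself. A rough sketch, assuming that a leading block of nothing but NUL bytes is a good enough signal (a legitimately sparse or zero-filled file would also match, so treat hits as candidates only; the function names and the 4096-byte sample size are arbitrary choices for illustration):

```python
def is_all_nulls(data):
    """True if data is non-empty and consists only of NUL bytes."""
    return len(data) > 0 and data.count(b"\x00") == len(data)


def reads_as_nulls(path, sample_size=4096):
    """Read the first sample_size bytes of path and report whether they are all NUL."""
    with open(path, "rb") as handle:
        return is_all_nulls(handle.read(sample_size))
```

Comparing the result for the path on the mounted volume with the result for the same file read directly from the brick should separate the files that are only unreadable through the client from any that are genuinely zero-filled.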
 
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Vijay Bellur vbel...@redhat.com
Date: 12/27/14 9:57 pm
To: tben...@3vgeomatics.com, gluster-users@gluster.org

On 12/28/2014 10:13 AM, tben...@3vgeomatics.com wrote:
  Thanks Joe, I've read your blog post as well as your post regarding the
  .glusterfs directory.
  I found some unneeded duplicate files which were not being read
  properly. I then deleted the link file from the brick. This always
  removes the duplicate file from the listing, but the file does not
  always become readable. If I also delete the associated file in the
  .glusterfs directory on that brick, then some more files become
  readable. However this solution still doesn't work for all files.
  I know the file on the brick is not corrupt as it can be read directly
  from the brick directory.
 
 For files that are not readable from the client, can you check if the 
 file is readable from the .glusterfs/ path?
 
 What is the specific error that is seen while trying to read one such 
 file from the client?
 
 Thanks,
 Vijay

Re: [Gluster-users] Hundreds of duplicate files

2014-12-26 Thread tbenzvi
Hello everyone and happy holidays,
 
I upgraded both servers so that they are now both running Gluster 3.5.3, in 
fact they are both running Fedora 20 with the same kernel version. We have only 
one client and that is the first server itself, with plans to change this in 
the future..
As per a previous suggestion, I also ran xfs_repair on each of the five bricks, 
which reported no errors.
 
So to recap: Doing a file listing on the mounted Gluster volume shows the same 
filename appearing twice. Trying to access the file either gives me the link 
file (and an error trying to read it), or the file on the actual brick location 
(this is entirely random, sometimes it will not work and then a few seconds 
later trying to read the file again works).
Additionally, there are some cases in which two versions (different content) 
with the same filename appear on two bricks on the different servers.
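
To get a list of the affected paths, something along these lines can be run against the mounted volume. It is a sketch that assumes the duplicate directory entries show up through Python's os.walk in the same way they show up in ls; the function names are hypothetical.

```python
import os
from collections import Counter


def duplicated_names(names):
    """Return the sorted names that occur more than once in a directory listing."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def find_duplicate_listings(mount_root):
    """Yield each path under mount_root whose name is listed more than once."""
    for dirpath, _dirnames, filenames in os.walk(mount_root):
        for name in duplicated_names(filenames):
            yield os.path.join(dirpath, name)
```

The output is relative to the mount point, so each path still has to be mapped onto the bricks to find which copy is the link file and which is the real data.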
 
I would much appreciate it if someone could shed some light on this issue.
 
Best Regards,
Tom
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/21/14 10:34 pm
To: gluster-users@gluster.org

Have you tried upgrading the older server so all are running the same version? 
Even though it's supposed to work with mixed versions, the goal should always 
be to have everything running the same version (clients and servers).
 
 On 12/20/2014 09:37 PM, tben...@3vgeomatics.com wrote:
 Hi Joe,
 
Thanks for the reply. That worked; I probably forgot to do this as root last 
time. Yet, the files still show up twice in a directory listing on the mounted 
volume. And it seems to be random whether reading the file will succeed or not. 
I've tried with several files and it sometimes works and sometimes fails; I 
assume this depends on whether it locates the actual file on the brick or the 
link file. Let me know if you have any idea what's going on.
 
Output of the command:
 
$ getfattr -m . -d -e hex /data/glusterfs/safari/brick01/brick/rsc/tsx/montreal_smaller/sm_asc/stack/slc/20130210.slc.ras
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/safari/brick01/brick/rsc/tsx/montreal_smaller/sm_asc/stack/slc/20130210.slc.ras
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0x52c2aed77d09412d8bfd7ca70e87b196
trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3200
 
 
Cheers,
Tom
 
- Original Message -
Subject: Re: [Gluster-users] Hundreds of duplicate files
 From: Joe Julian j...@julianfamily.org
 Date: 12/20/14 8:53 pm
 To: gluster-users@gluster.org
 
 Try 'getfattr -m . -d -e hex' (dot instead of dash) and, of course, do that as 
root.
 
 On 12/20/2014 06:02 PM, tben...@3vgeomatics.com wrote:
 Hi everyone,
 
We have a distributed Gluster volume on five bricks over two servers (first 
server running gluster 3.4.2, second server running gluster 3.5.1, both running 
Fedora 20)
Starting last week, doing a file listing on the mounted volume shows many files 
with the same name appearing twice (and they are listed with the same inode). 
Doing a search for these files, I have found 290,000 of them!!
 
If I do a listing of these files on the bricks themselves, it looks like most 
are link files (du will show the file on the first server as 0 bytes, and the 
sticky bit set). The file is fine on the second server. Unfortunately, running 
getfattr -m - -e hex -d on the file shows NO gluster-related attributes and I 
believe this is why both files appear in the listing. The files cannot be read 
by any programs as it is trying to read the link file. I assume the metadata 
became corrupted. This is a production server so we really need to know:
 
1. How did this happen, and how can we prevent it going forward? There was a 
server crash a week ago and I believe that was the cause.
2. How can we heal the Gluster volume/bricks and link files? If there is some 
straightforward way of restoring the link file pointer I can write a script to 
do it; obviously doing this manually will be impossible.
 
Thanks very much for any and all help - much appreciated!
 
Regards,
Tom
 
 
On Wed, Dec 17, 2014 at 4:07 AM, tben...@3vgeomatics.com wrote:
 Hi everyone, we have noticed some extremely odd behaviour with our
  distributed Gluster volume where duplicate files (same name, same or
  different content) are being created and stored on multiple bricks. The only
  consistent clue is that one of the duplicate files has the sticky bit set. I
  am hoping someone will be able to shed some light on why this is happening
  and how we can restore the volume as there appear to be hundreds of such
  files. I will try to provide as much pertinent information as I can.
 
  We have a 130TB Gluster volume consisting of two 20TB bricks 

Re: [Gluster-users] Hundreds of duplicate files

2014-12-26 Thread tbenzvi
Thanks for your continued help Joe.
 
A demonstration of the problem: in this case I was able to open the file in vim 
(a text file) without any issues; however, sometimes duplicated text files open 
in vim as one line consisting of @ characters, and binary data files also 
cannot be opened correctly for reading.
However, the duplicate listing is still an issue. Note that Dec 13 was the date 
of a server crash.
[root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
-rw-rw-r-T 1 parwant users 1712 Dec 13 19:02 tif2flt.pro
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 tif2flt.pro
 A few minutes later doing the same listing.. sticky bit disappeared and 
modification date changed
 [root@jongoo ~]# ll /sar/complete/vancouver/refdem/tif2flt.pro*
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 
/sar/complete/vancouver/refdem/tif2flt.pro
-rw-rw-r-- 1 parwant users 1712 Jun 17 2010 
/sar/complete/vancouver/refdem/tif2flt.pro
 
[root@jongoo ~]# getfattr -m . -d -e hex 
/data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
getfattr: Removing leading '/' from absolute path names
# file: 
data/glusterfs/safari/brick00/brick/complete/vancouver/refdem/tif2flt.pro
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd
trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3300
 [root@ndovu ~]# getfattr -m . -d -e hex 
/data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
getfattr: Removing leading '/' from absolute path names
# file: 
data/glusterfs/safari/brick03/brick/complete/vancouver/refdem/tif2flt.pro
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0xdfe13dc088bf4a779488ef72f0a879cd
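
For anyone reading the output above: the trusted.glusterfs.dht.linkto value is 
a NUL-terminated ASCII string printed in hex. A minimal Python sketch of 
decoding it follows; the function name decode_linkto is made up for this 
example and is not part of any Gluster tool.

```python
def decode_linkto(hex_value: str) -> str:
    """Decode a getfattr '-e hex' value into the subvolume name it encodes."""
    # Drop the leading "0x" that getfattr prints in hex mode.
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    # The attribute is stored as a C string, so strip the trailing NUL byte.
    return bytes.fromhex(digits).rstrip(b"\x00").decode("ascii")


# Value copied from the brick00 output above.
print(decode_linkto("0x7361666172692d636c69656e742d3300"))
```

This prints safari-client-3. Assuming the client subvolumes are numbered from 
zero in brick order (the usual naming as far as I know, but not confirmed in 
this thread), that would be Brick4, i.e. brick03, which is the copy above that 
carries no linkto attribute.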
  
 Log for brick 00:
 [2014-12-27 01:59:24.263095] I [server-handshake.c:575:server_setvolume] 
0-safari-server: accepted client from 
jongoo-1910-2014/12/27-01:59:24:173469-safari-client-0-0-0 (version: 3.5.3)
[2014-12-27 02:02:00.772454] I [server-handshake.c:575:server_setvolume] 
0-safari-server: accepted client from 
jongoo-2015-2014/12/27-02:01:55:694478-safari-client-0-0-0 (version: 3.5.3)
[2014-12-27 02:02:05.780497] I [server-handshake.c:575:server_setvolume] 
0-safari-server: accepted client from 
ndovu-16310-2014/12/27-02:02:00:703051-safari-client-0-0-0 (version: 3.5.3)
[2014-12-27 04:41:07.094149] I [server.c:520:server_rpc_notify] 
0-safari-server: disconnecting connectionfrom 
ndovu-16310-2014/12/27-02:02:00:703051-safari-client-0-0-0
[2014-12-27 04:41:07.094187] I [client_t.c:417:gf_client_unref] 
0-safari-server: Shutting down connection 
ndovu-16310-2014/12/27-02:02:00:703051-safari-client-0-0-0
[2014-12-27 04:41:56.979717] I [server.c:520:server_rpc_notify] 
0-safari-server: disconnecting connectionfrom 
jongoo-2015-2014/12/27-02:01:55:694478-safari-client-0-0-0
[2014-12-27 04:41:56.979761] I [client_t.c:417:gf_client_unref] 
0-safari-server: Shutting down connection 
jongoo-2015-2014/12/27-02:01:55:694478-safari-client-0-0-0
 Log for brick 03:
 [2014-12-27 01:59:24.270123] I [server-handshake.c:575:server_setvolume] 
0-safari-server: accepted client from 
jongoo-1910-2014/12/27-01:59:24:173469-safari-client-3-0-0 (version: 3.5.3)
[2014-12-27 02:02:05.724212] I [server-handshake.c:575:server_setvolume] 
0-safari-server: accepted client from 
jongoo-2015-2014/12/27-02:01:55:694478-safari-client-3-0-0 (version: 3.5.3)
[2014-12-27 02:02:05.778098] I [server-handshake.c:575:server_setvolume] 
0-safari-server: accepted client from 
ndovu-16310-2014/12/27-02:02:00:703051-safari-client-3-0-0 (version: 3.5.3)
[2014-12-27 04:41:07.098381] I [server.c:520:server_rpc_notify] 
0-safari-server: disconnecting connectionfrom 
ndovu-16310-2014/12/27-02:02:00:703051-safari-client-3-0-0
[2014-12-27 04:41:07.098417] I [client_t.c:417:gf_client_unref] 
0-safari-server: Shutting down connection 
ndovu-16310-2014/12/27-02:02:00:703051-safari-client-3-0-0
[2014-12-27 04:41:56.984140] I [server.c:520:server_rpc_notify] 
0-safari-server: disconnecting connectionfrom 
jongoo-2015-2014/12/27-02:01:55:694478-safari-client-3-0-0
[2014-12-27 04:41:56.984203] I [client_t.c:417:gf_client_unref] 
0-safari-server: Shutting down connection 
jongoo-2015-2014/12/27-02:01:55:694478-safari-client-3-0-0
  
 Log for mounted volume:
 [2014-12-27 01:59:24.180253] I [glusterfsd.c:1959:main] 0-/usr/sbin/glusterfs: 
Started running /usr/sbin/glusterfs version 3.5.3 (/usr/sbin/glusterfs 
--volfile-server=ndovu --volfile-id=/safari /sar)
[2014-12-27 01:59:24.199613] I [socket.c:3645:socket_init] 0-glusterfs: SSL 
support is NOT enabled
[2014-12-27 01:59:24.199684] I [socket.c:3660:socket_init] 0-glusterfs: using 

Re: [Gluster-users] Hundreds of duplicate files

2014-12-21 Thread tbenzvi
Actually we are using XFS for the bricks. Still haven't made any progress on 
this issue, unfortunately..
 
- Original Message - Subject: Re: [Gluster-users] Hundreds of 
duplicate files
From: Anders Blomdell anders.blomd...@control.lth.se
Date: 12/21/14 7:42 pm
To: tben...@3vgeomatics.com, gluster-users@gluster.org


 
 On 21 December 2014 06:37:44 CET, tben...@3vgeomatics.com wrote:
 Hi Joe,
  
 Thanks for the reply. That worked; I probably forgot to do this as root
 last time. Yet, the files still show up twice in a directory listing on
 the mounted volume. And it seems to be random whether reading the file
 will succeed or not. I've tried with several files and it sometimes
 works and sometimes fails; I assume this depends on whether it locates
 the actual file on the brick or the link file. Let me know if you have
 any idea what's going on.
 Does the brick filesystem happen to be ext4? I have had a similar problem 
 with 3.6.x and ext4 (the 64-bit offset problem). 
 
  
 Output of the command:
  
 $ getfattr -m . -d -e hex
 /data/glusterfs/safari/brick01/brick/rsc/tsx/montreal_smaller/sm_asc/stack/slc/20130210.slc.ras
 getfattr: Removing leading '/' from absolute path names
 # file:
 data/glusterfs/safari/brick01/brick/rsc/tsx/montreal_smaller/sm_asc/stack/slc/20130210.slc.ras
 system.posix_acl_access=0x0200010006000400060016002400
 trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
 trusted.gfid=0x52c2aed77d09412d8bfd7ca70e87b196
 trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3200
  
  
 Cheers,
 Tom
  
 - Original Message - Subject: Re: [Gluster-users]
 Hundreds of duplicate files
 From: Joe Julian j...@julianfamily.org
 Date: 12/20/14 8:53 pm
 To: gluster-users@gluster.org
 
 Try 'getfattr -m . -d -e hex' (dot instead of dash) and, of course, do
 that as root.
  
  On 12/20/2014 06:02 PM, tben...@3vgeomatics.com wrote:
  Hi everyone,
  
 We have a distributed Gluster volume on five bricks over two servers
 (first server running gluster 3.4.2, second server running gluster
 3.5.1, both running Fedora 20)
 Starting last week, doing a file listing on the mounted volume shows
 many files with the same name appearing twice (and they are listed with
 the same inode). Doing a search for these files, I have found 290,000
 of them!!
  
 If I do a listing of these files on the bricks themselves, it looks
 like most are link files (du will show the file on the first server as
 0 bytes, and the sticky bit set). The file is fine on the second
 server. Unfortunately, running getfattr -m - -e hex -d on the file
 shows NO gluster-related attributes and I believe this is why both
 files appear in the listing. The files cannot be read by any programs
 as it is trying to read the link file. I assume the metadata became
 corrupted. This is a production server so we really need to know:
  
 1. How did this happen, and how can we prevent it going forward? There
 was a server crash a week ago and I believe that was the cause.
 2. How can we heal the Gluster volume/bricks and link files. If there
 is some straightforward way of restoring the link file pointer I can
 write a script to do it, obviously doing this manually will be
 impossible.
  
 Thanks very much for any and all help - much appreciated!
  
 Regards,
 Tom
  
  
 On Wed, Dec 17, 2014 at 4:07 AM, tben...@3vgeomatics.com wrote:
  Hi everyone, we have noticed some extremely odd behaviour with our
   distributed Gluster volume where duplicate files (same name, same or
  different content) are being created and stored on multiple bricks.
 The only
  consistent clue is that one of the duplicate files has the sticky bit
 set. I
  am hoping someone will be able to shed some light on why this is
 happening
  and how we can restore the volume as there appear to be hundreds of
 such
   files. I will try to provide as much pertinent information as I can.
  
  We have a 130TB Gluster volume consisting of two 20TB bricks on
 server1, and
   three 40TB bricks on a server2 which were added at a later date (and
  rebalancing was done). The volume is mounted on server1, and accessed
 only
  through this server but by many users. Both servers went down due to
 power
  loss several days ago after which this problem was first noticed. We
 ran a
   rebalance command on the volumes, this has not fixed the problem.
  
  
   Gluster volume info:
   Volume Name: safari
   Type: Distribute
   Volume ID: d48d0e6b-4389-4c2c-8fd1-cd2854121eda
   Status: Started
   Number of Bricks: 5
   Transport-type: tcp
   Bricks:
   Brick1: server1:/data/glusterfs/safari/brick00/brick
   Brick2: server1:/data/glusterfs/safari/brick01/brick
   Brick3: server2:/data/glusterfs/safari/brick02/brick
   Brick4: server2:/data/glusterfs/safari/brick03/brick
   Brick5: server2:/data/glusterfs/safari/brick04/brick
  
  
   Size information:
   

Re: [Gluster-users] Hundreds of duplicate files

2014-12-20 Thread tbenzvi
Hi everyone,
 
We have a distributed Gluster volume on five bricks over two servers (first 
server running gluster 3.4.2, second server running gluster 3.5.1, both running 
Fedora 20)
Starting last week, doing a file listing on the mounted volume shows many files 
with the same name appearing twice (and they are listed with the same inode). 
Doing a search for these files, I have found 290,000 of them!!
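
A search like the one described above could be sketched in Python roughly as 
follows. This is only an illustration, not the search that was actually run: 
the helper names are hypothetical, and it assumes that reading a directory on 
the mounted volume really returns the duplicated name twice, as the listing 
suggests.

```python
import os
from collections import Counter


def duplicate_names(names):
    """Return the names that occur more than once, in sorted order."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


def scan_for_duplicates(root):
    """Walk a mounted volume and yield (directory, duplicated names) pairs."""
    for dirpath, _dirnames, filenames in os.walk(root):
        dups = duplicate_names(filenames)
        if dups:
            yield dirpath, dups


if __name__ == "__main__":
    # /sar is the mount point used in this thread; adjust as needed.
    for directory, names in scan_for_duplicates("/sar"):
        for name in names:
            print(os.path.join(directory, name))
```

Only file names are compared here, so two entries with the same name count as 
duplicates regardless of their content.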
 
If I do a listing of these files on the bricks themselves, it looks like most 
are link files (du will show the file on the first server as 0 bytes, and the 
sticky bit set). The file is fine on the second server. Unfortunately, running 
getfattr -m - -e hex -d on the file shows NO gluster-related attributes and I 
believe this is why both files appear in the listing. The files cannot be read 
by any programs as it is trying to read the link file. I assume the metadata 
became corrupted. This is a production server so we really need to know:
 
1. How did this happen, and how can we prevent it going forward? There was a 
server crash a week ago and I believe that was the cause.
2. How can we heal the Gluster volume/bricks and link files. If there is some 
straightforward way of restoring the link file pointer I can write a script to 
do it, obviously doing this manually will be impossible.
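
As a starting point for such a script, here is a rough Python sketch that lists 
candidate link files on one brick. It is a heuristic under stated assumptions, 
not an official procedure: the "sticky bit set and no data blocks" test is 
taken from the description above, the function names are invented for the 
example, and reading trusted.* attributes needs root on Linux.

```python
import os
import stat

LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def looks_like_link_file(st_mode: int, st_blocks: int) -> bool:
    """Heuristic: regular file, sticky bit set, no data blocks allocated."""
    return (
        stat.S_ISREG(st_mode)
        and bool(st_mode & stat.S_ISVTX)
        and st_blocks == 0
    )


def read_linkto(path: str):
    """Return the linkto target as text, or None if the attribute is absent."""
    try:
        raw = os.getxattr(path, LINKTO_XATTR)  # Linux only
    except OSError:
        return None
    return raw.rstrip(b"\x00").decode("ascii", errors="replace")


def find_link_files(brick_root: str):
    """Yield (path, linkto) for every candidate link file under a brick."""
    for dirpath, dirnames, filenames in os.walk(brick_root):
        # Skip Gluster's internal .glusterfs tree at the top of the brick.
        if dirpath == brick_root and ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if looks_like_link_file(st.st_mode, st.st_blocks):
                yield path, read_linkto(path)
```

The sketch only reports candidates; it deliberately does not delete or modify 
anything on the brick.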
 
Thanks very much for any and all help - much appreciated!
 
Regards,
Tom
 
 
On Wed, Dec 17, 2014 at 4:07 AM, tben...@3vgeomatics.com wrote:
 Hi everyone, we have noticed some extremely odd behaviour with our
  distributed Gluster volume where duplicate files (same name, same or
  different content) are being created and stored on multiple bricks. The only
  consistent clue is that one of the duplicate files has the sticky bit set. I
  am hoping someone will be able to shed some light on why this is happening
  and how we can restore the volume as there appear to be hundreds of such
  files. I will try to provide as much pertinent information as I can.
 
  We have a 130TB Gluster volume consisting of two 20TB bricks on server1, and
  three 40TB bricks on a server2 which were added at a later date (and
  rebalancing was done). The volume is mounted on server1, and accessed only
  through this server but by many users. Both servers went down due to power
  loss several days ago after which this problem was first noticed. We ran a
  rebalance command on the volumes, this has not fixed the problem.
 
 
  Gluster volume info:
  Volume Name: safari
  Type: Distribute
  Volume ID: d48d0e6b-4389-4c2c-8fd1-cd2854121eda
  Status: Started
  Number of Bricks: 5
  Transport-type: tcp
  Bricks:
  Brick1: server1:/data/glusterfs/safari/brick00/brick
  Brick2: server1:/data/glusterfs/safari/brick01/brick
  Brick3: server2:/data/glusterfs/safari/brick02/brick
  Brick4: server2:/data/glusterfs/safari/brick03/brick
  Brick5: server2:/data/glusterfs/safari/brick04/brick
 
 
  Size information:
  /dev/sdc 37T 16T 22T 42% /data/glusterfs/safari/brick02
  /dev/sdd 37T 16T 22T 42% /data/glusterfs/safari/brick03
  /dev/sde 37T 17T 21T 45% /data/glusterfs/safari/brick04
  /dev/md126 11T 7.7T 2.8T 74% /data/glusterfs/safari/brick00
  /dev/md124 11T 8.0T 2.5T 77% /data/glusterfs/safari/brick01
  server2:/safari 130T 63T 68T 48% /sar
 
 
  Example 1:
  -Two files with the same name exist in one directory
  -They have different contents and attributes
  -A file listing on the mounted volume shows the same inode
  -The newer file has sticky bit set
  -Neither file is corrupted, they can both be viewed by using the absolute
  path (on the bricks)
 
  File listing on the mounted volume
  13036730497538635177 -rw-rw-r-T 1 jon users 924 Dec 15 10:42 RSLC_tab
  13036730497538635177 -rw-rw-r-- 1 jon users 418 Mar 18 2013 RSLC_tab
 
  Listing of the files on the bricks:
  8925798411 -rw-rw-r-T+ 2 jon users 924 Dec 15 10:42
  /data/glusterfs/safari/brick00/brick/complete/shm/rs2/ottawa/mf6_asc/stack_org/RSLC_tab
  51541886672 -rw-rw-r--+ 2 1002 users 418 Mar 18 2013
  /data/glusterfs/safari/brick02/brick/complete/shm/rs2/ottawa/mf6_asc/stack_org/RSLC_tab
 
 
  Example 2:
  -Two files with the same name exist in one directory
  -They have the same content and attributes
  -No sticky bit is set when looking at file listing on the mounted volume
  -Sticky bit is set for one file when looking at file listing on the bricks
  -Files are corrupted
 
  File listing on the mounted volume:
  13012555852904096080 -rw-rw-r-- 1 tom users 2393848 Dec 8 2013
  ifg_lr/20130226_20130813.diff.phi.ras
  13012555852904096080 -rw-rw-r-- 1 tom users 2393848 Dec 8 2013
  ifg_lr/20130226_20130813.diff.phi.ras
 
  Listing of the files on the bricks:
  17058578 -rw-rw-r-T+ 2 tom users 2393848 Dec 13 17:11
  /data/glusterfs/safari/brick00/brick/rsc/rs2/calgary/u22_dsc/stack_org/ifg_lr/20130226_20130813.diff.phi.ras
  57986922129 -rw-rw-r--+ 2 1010 users 2393848 Dec 8 2013
  /data/glusterfs/safari/brick02/brick/rsc/rs2/calgary/u22_dsc/stack_org/ifg_lr/20130226_20130813.diff.phi.ras
 

Re: [Gluster-users] Hundreds of duplicate files

2014-12-20 Thread tbenzvi
Hi Joe,
 
Thanks for the reply. That worked; I probably forgot to do this as root last 
time. Yet, the files still show up twice in a directory listing on the mounted 
volume. And it seems to be random whether reading the file will succeed or not. 
I've tried with several files and it sometimes works and sometimes fails; I 
assume this depends on whether it locates the actual file on the brick or the 
link file. Let me know if you have any idea what's going on.
 
Output of the command:
 
$ getfattr -m . -d -e hex 
/data/glusterfs/safari/brick01/brick/rsc/tsx/montreal_smaller/sm_asc/stack/slc/20130210.slc.ras
getfattr: Removing leading '/' from absolute path names
# file: 
data/glusterfs/safari/brick01/brick/rsc/tsx/montreal_smaller/sm_asc/stack/slc/20130210.slc.ras
system.posix_acl_access=0x0200010006000400060016002400
trusted.SGI_ACL_FILE=0x000400010006000400060010000600200004
trusted.gfid=0x52c2aed77d09412d8bfd7ca70e87b196
trusted.glusterfs.dht.linkto=0x7361666172692d636c69656e742d3200
 
 
Cheers,
Tom
 
- Original Message - Subject: Re: [Gluster-users] Hundreds of 
duplicate files
From: Joe Julian j...@julianfamily.org
Date: 12/20/14 8:53 pm
To: gluster-users@gluster.org

Try 'getfattr -m . -d -e hex' (dot instead of dash) and, of course, do that as 
root.
 
 On 12/20/2014 06:02 PM, tben...@3vgeomatics.com wrote:
 Hi everyone,
 
We have a distributed Gluster volume on five bricks over two servers (first 
server running gluster 3.4.2, second server running gluster 3.5.1, both running 
Fedora 20)
Starting last week, doing a file listing on the mounted volume shows many files 
with the same name appearing twice (and they are listed with the same inode). 
Doing a search for these files, I have found 290,000 of them!!
 
If I do a listing of these files on the bricks themselves, it looks like most 
are link files (du will show the file on the first server as 0 bytes, and the 
sticky bit set). The file is fine on the second server. Unfortunately, running 
getfattr -m - -e hex -d on the file shows NO gluster-related attributes and I 
believe this is why both files appear in the listing. The files cannot be read 
by any programs as it is trying to read the link file. I assume the metadata 
became corrupted. This is a production server so we really need to know:
 
1. How did this happen, and how can we prevent it going forward? There was a 
server crash a week ago and I believe that was the cause.
2. How can we heal the Gluster volume/bricks and link files. If there is some 
straightforward way of restoring the link file pointer I can write a script to 
do it, obviously doing this manually will be impossible.
 
Thanks very much for any and all help - much appreciated!
 
Regards,
Tom
 
 
On Wed, Dec 17, 2014 at 4:07 AM, tben...@3vgeomatics.com wrote:
 Hi everyone, we have noticed some extremely odd behaviour with our
  distributed Gluster volume where duplicate files (same name, same or
  different content) are being created and stored on multiple bricks. The only
  consistent clue is that one of the duplicate files has the sticky bit set. I
  am hoping someone will be able to shed some light on why this is happening
  and how we can restore the volume as there appear to be hundreds of such
  files. I will try to provide as much pertinent information as I can.
 
  We have a 130TB Gluster volume consisting of two 20TB bricks on server1, and
  three 40TB bricks on a server2 which were added at a later date (and
  rebalancing was done). The volume is mounted on server1, and accessed only
  through this server but by many users. Both servers went down due to power
  loss several days ago after which this problem was first noticed. We ran a
  rebalance command on the volumes, this has not fixed the problem.
 
 
  Gluster volume info:
  Volume Name: safari
  Type: Distribute
  Volume ID: d48d0e6b-4389-4c2c-8fd1-cd2854121eda
  Status: Started
  Number of Bricks: 5
  Transport-type: tcp
  Bricks:
  Brick1: server1:/data/glusterfs/safari/brick00/brick
  Brick2: server1:/data/glusterfs/safari/brick01/brick
  Brick3: server2:/data/glusterfs/safari/brick02/brick
  Brick4: server2:/data/glusterfs/safari/brick03/brick
  Brick5: server2:/data/glusterfs/safari/brick04/brick
 
 
  Size information:
  /dev/sdc 37T 16T 22T 42% /data/glusterfs/safari/brick02
  /dev/sdd 37T 16T 22T 42% /data/glusterfs/safari/brick03
  /dev/sde 37T 17T 21T 45% /data/glusterfs/safari/brick04
  /dev/md126 11T 7.7T 2.8T 74% /data/glusterfs/safari/brick00
  /dev/md124 11T 8.0T 2.5T 77% /data/glusterfs/safari/brick01
  server2:/safari 130T 63T 68T 48% /sar
 
 
  Example 1:
  -Two files with the same name exist in one directory
  -They have different contents and attributes
  -A file listing on the mounted volume shows the same inode
  -The newer file has sticky bit set
  -Neither file is corrupted, they can both be viewed by using 

[Gluster-users] Hundreds of duplicate files

2014-12-17 Thread tbenzvi
 Hi everyone, we have noticed some extremely odd behaviour with our distributed 
Gluster volume where duplicate files (same name, same or different content) are 
being created and stored on multiple bricks. The only consistent clue is that 
one of the duplicate files has the sticky bit set. I am hoping someone will be 
able to shed some light on why this is happening and how we can restore the 
volume as there appear to be hundreds of such files. I will try to provide as 
much pertinent information as I can.
 We have a 130TB Gluster volume consisting of two 20TB bricks on server1, and 
three 40TB bricks on a server2 which were added at a later date (and 
rebalancing was done). The volume is mounted on server1, and accessed only 
through this server but by many users. Both servers went down due to power loss 
several days ago, after which this problem was first noticed. We ran a rebalance 
command on the volume, but this has not fixed the problem.
 
Gluster volume info:
Volume Name: safari
Type: Distribute
Volume ID: d48d0e6b-4389-4c2c-8fd1-cd2854121eda
Status: Started
Number of Bricks: 5
Transport-type: tcp
Bricks:
Brick1: server1:/data/glusterfs/safari/brick00/brick
Brick2: server1:/data/glusterfs/safari/brick01/brick
Brick3: server2:/data/glusterfs/safari/brick02/brick
Brick4: server2:/data/glusterfs/safari/brick03/brick
Brick5: server2:/data/glusterfs/safari/brick04/brick
 
Size information:
Filesystem       Size  Used Avail Use% Mounted on
/dev/sdc          37T   16T   22T  42% /data/glusterfs/safari/brick02
/dev/sdd          37T   16T   22T  42% /data/glusterfs/safari/brick03
/dev/sde          37T   17T   21T  45% /data/glusterfs/safari/brick04
/dev/md126        11T  7.7T  2.8T  74% /data/glusterfs/safari/brick00
/dev/md124        11T  8.0T  2.5T  77% /data/glusterfs/safari/brick01
server2:/safari  130T   63T   68T  48% /sar
 
Example 1:
-Two files with the same name exist in one directory
-They have different contents and attributes
-A file listing on the mounted volume shows the same inode
-The newer file has sticky bit set
-Neither file is corrupted, they can both be viewed by using the absolute path 
(on the bricks)
 File listing on the mounted volume
13036730497538635177 -rw-rw-r-T 1 jon users 924 Dec 15 10:42 RSLC_tab
13036730497538635177 -rw-rw-r-- 1 jon users 418 Mar 18 2013 RSLC_tab
 Listing of the files on the bricks:
8925798411 -rw-rw-r-T+ 2 jon users 924 Dec 15 10:42 
/data/glusterfs/safari/brick00/brick/complete/shm/rs2/ottawa/mf6_asc/stack_org/RSLC_tab
51541886672 -rw-rw-r--+ 2 1002 users 418 Mar 18 2013 
/data/glusterfs/safari/brick02/brick/complete/shm/rs2/ottawa/mf6_asc/stack_org/RSLC_tab
 
Example 2: 
-Two files with the same name exist in one directory
-They have the same content and attributes
-No sticky bit is set when looking at file listing on the mounted volume
-Sticky bit is set for one file when looking at file listing on the bricks
-Files are corrupted
 File listing on the mounted volume:
13012555852904096080 -rw-rw-r-- 1 tom users 2393848 Dec 8 2013 
ifg_lr/20130226_20130813.diff.phi.ras
13012555852904096080 -rw-rw-r-- 1 tom users 2393848 Dec 8 2013 
ifg_lr/20130226_20130813.diff.phi.ras
 Listing of the files on the bricks:
17058578 -rw-rw-r-T+ 2 tom users 2393848 Dec 13 17:11 
/data/glusterfs/safari/brick00/brick/rsc/rs2/calgary/u22_dsc/stack_org/ifg_lr/20130226_20130813.diff.phi.ras
57986922129 -rw-rw-r--+ 2 1010 users 2393848 Dec 8 2013 
/data/glusterfs/safari/brick02/brick/rsc/rs2/calgary/u22_dsc/stack_org/ifg_lr/20130226_20130813.diff.phi.ras
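
To compare brick listings like the two above in bulk, one option is to strip 
the brick root from each path and group by what remains. A small Python sketch, 
assuming the five brick roots from the volume info; the function names are 
hypothetical. Since a DHT link file and its data file sit on different bricks 
by design, a hit here is only a candidate to inspect with getfattr, not proof 
of a problem.

```python
from collections import defaultdict

# Brick roots as listed in the volume info for "safari".
BRICK_ROOTS = [
    "/data/glusterfs/safari/brick00/brick",
    "/data/glusterfs/safari/brick01/brick",
    "/data/glusterfs/safari/brick02/brick",
    "/data/glusterfs/safari/brick03/brick",
    "/data/glusterfs/safari/brick04/brick",
]


def group_by_volume_path(brick_paths, brick_roots=BRICK_ROOTS):
    """Map each volume-relative path to the brick roots that hold a copy."""
    found = defaultdict(list)
    for path in brick_paths:
        for root in brick_roots:
            prefix = root.rstrip("/") + "/"
            if path.startswith(prefix):
                found[path[len(prefix):]].append(root)
                break
    return dict(found)


def on_multiple_bricks(brick_paths, brick_roots=BRICK_ROOTS):
    """Keep only the relative paths that appear on more than one brick."""
    grouped = group_by_volume_path(brick_paths, brick_roots)
    return {rel: roots for rel, roots in grouped.items() if len(roots) > 1}
```

Feeding it the two paths from Example 2 returns a single entry for 
rsc/rs2/calgary/u22_dsc/stack_org/ifg_lr/20130226_20130813.diff.phi.ras, found 
on brick00 and brick02.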
 
Additionally, only some files in this directory are duplicated. The duplicated 
files are corrupted (they cannot be viewed as raster images, their original 
file type). The files which are not duplicated are not corrupted.
 File command: (notice duplicate and singleton files)
ifg_lr/20091021_20100218.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
ifg_lr/20091021_20101016.diff.phi.ras: data
ifg_lr/20091021_20101016.diff.phi.ras: data
ifg_lr/20091021_20101109.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
ifg_lr/20091021_20101203.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
ifg_lr/20091021_20101227.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
ifg_lr/20091021_20110120.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
ifg_lr/20091021_20110213.diff.phi.ras: data
ifg_lr/20091021_20110213.diff.phi.ras: data
ifg_lr/20091021_20110309.diff.phi.ras: data
ifg_lr/20091021_20110309.diff.phi.ras: sticky data
ifg_lr/20091021_20110402.diff.phi.ras: Sun raster image data, 1208 x 1981, 8-bit, RGB colormap
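
The same good/bad split that the file command shows can be checked from a 
script by looking at the first four bytes, since Sun raster files start with 
the magic number 0x59a66a95. A minimal Python sketch, with hypothetical 
function names; it tests only the magic number and does not validate the rest 
of the header.

```python
SUN_RASTER_MAGIC = bytes.fromhex("59a66a95")


def is_sun_raster(header: bytes) -> bool:
    """True if the given leading bytes start with the Sun raster magic."""
    return header[:4] == SUN_RASTER_MAGIC


def check_file(path: str) -> bool:
    """Read the first four bytes of a file and test them for the magic."""
    with open(path, "rb") as fh:
        return is_sun_raster(fh.read(4))
```

Running check_file over the .ras files in a directory should flag the same 
entries that file reports as plain "data".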
 
Information from Gluster log file:
Additionally, the log is full of thousands of lines like the following 
(possibly one for each directory?), dating back several months:
[2014-12-12 11:10:10.257950] I [dht-layout.c:726:dht_layout_dir_mismatch] 
3-safari-dht: /rsc/tsx/lasvegas/spot_asc/stack/ifg_lr - disk layout missing
[2014-12-12 11:10:10.257988] I [dht-common.c:623:dht_revalidate_cbk] 
3-safari-dht: mismatching layouts for