Re: [Gluster-users] self service snapshot access broken with 3.7.11
Hi Alastair,

Can you please provide the snap daemon logs? They are present in
/var/log/glusterfs/snaps/snapd.log. Provide the snapd log of the node
from which you have mounted the volume (i.e. the node whose IP
address/hostname you gave while mounting the volume).

Regards,
Raghavendra

On Fri, Apr 22, 2016 at 5:19 PM, Alastair Neil wrote:
> I just upgraded my cluster to 3.7.11 from 3.7.10, and access to the
> .snaps directories now fails with:
>
>>     bash: cd: .snaps: Transport endpoint is not connected
>
> In the volume log file on the client I see:
>
>>     [2016-04-22 21:08:28.005854] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
>>     2-homes-snapd-client: changing port to 49493 (from 0)
>>     [2016-04-22 21:08:28.009558] E [socket.c:2278:socket_connect_finish]
>>     2-homes-snapd-client: connection to xx.xx.xx.xx.xx:49493 failed
>>     (No route to host)
>
> I'm quite perplexed. It's not a network issue or DNS as far as I can
> tell: the glusterfs client is working fine, and the gluster servers
> all resolve OK. It seems to be happening on all the clients; I have
> tried different systems with 3.7.8, 3.7.10, and 3.7.11 clients and see
> the same failure on all of them.
>
> On the servers the snapshots are being taken as expected and they are
> started:
>
>>     Snapshot                      : Scheduled-Homes_Hourly-homes_GMT-2016.04.22-16.00.01
>>     Snap UUID                     : 91ba50b0-d8f2-4135-9ea5-edfdfe2ce61d
>>     Created                       : 2016-04-22 16:00:01
>>     Snap Volumes:
>>         Snap Volume Name          : 5170144102814026a34f8f948738406f
>>         Origin Volume name        : homes
>>         Snaps taken for homes     : 16
>>         Snaps available for homes : 240
>>         Status                    : Started
>
> The homes volume is replica 3; all the peers are up, and so are all
> the bricks and services:
>
>>     glv status homes
>>     Status of volume: homes
>>     Gluster process                              TCP Port  RDMA Port  Online  Pid
>>     ------------------------------------------------------------------------------
>>     Brick gluster-2:/export/brick2/home          49171     0          Y       38298
>>     Brick gluster0:/export/brick2/home           49154     0          Y       23519
>>     Brick gluster1.vsnet.gmu.edu:/export/brick2
>>     /home                                        49154     0          Y       23794
>>     Snapshot Daemon on localhost                 49486     0          Y       23699
>>     NFS Server on localhost                      2049      0          Y       23486
>>     Self-heal Daemon on localhost                N/A       N/A        Y       23496
>>     Snapshot Daemon on gluster-2                 49261     0          Y       38479
>>     NFS Server on gluster-2                      2049      0          Y       39640
>>     Self-heal Daemon on gluster-2                N/A       N/A        Y       39709
>>     Snapshot Daemon on gluster1                  49480     0          Y       23982
>>     NFS Server on gluster1                       2049      0          Y       23766
>>     Self-heal Daemon on gluster1                 N/A       N/A        Y       23776
>>
>>     Task Status of Volume homes
>>     ------------------------------------------------------------------------------
>>     There are no active volume tasks
>
> I'd appreciate any ideas about troubleshooting this. I tried disabling
> .snaps access on the volume and re-enabling it, but it made no
> difference.
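One thing worth ruling out before digging into the logs: "No route to
host" against a newly assigned snapd port is often a host firewall
rejecting the connection rather than a routing or DNS fault. A quick
check, with the hostname and port below taken from the log excerpt
above as placeholders:

    # from a client: can we reach the snapd port the client is redirected to?
    nc -zv gluster0 49493

    # on that server: is snapd listening where glusterd says it is?
    ss -tlnp | grep 49493

    # if firewalld is in use, the dynamically chosen snapd port may fall
    # outside the opened range; the range below is an assumption -- adjust
    # it to the ports your bricks and snapd actually use
    firewall-cmd --add-port=49152-49664/tcp --permanent
    firewall-cmd --reload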
[Gluster-users] self service snapshot access broken with 3.7.11
I just upgraded my cluster to 3.7.11 from 3.7.10, and access to the
.snaps directories now fails with:

    bash: cd: .snaps: Transport endpoint is not connected

In the volume log file on the client I see:

    [2016-04-22 21:08:28.005854] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
    2-homes-snapd-client: changing port to 49493 (from 0)
    [2016-04-22 21:08:28.009558] E [socket.c:2278:socket_connect_finish]
    2-homes-snapd-client: connection to xx.xx.xx.xx.xx:49493 failed
    (No route to host)

I'm quite perplexed. It's not a network issue or DNS as far as I can
tell: the glusterfs client is working fine, and the gluster servers all
resolve OK. It seems to be happening on all the clients; I have tried
different systems with 3.7.8, 3.7.10, and 3.7.11 clients and see the
same failure on all of them.

On the servers the snapshots are being taken as expected and they are
started:

    Snapshot                      : Scheduled-Homes_Hourly-homes_GMT-2016.04.22-16.00.01
    Snap UUID                     : 91ba50b0-d8f2-4135-9ea5-edfdfe2ce61d
    Created                       : 2016-04-22 16:00:01
    Snap Volumes:
        Snap Volume Name          : 5170144102814026a34f8f948738406f
        Origin Volume name        : homes
        Snaps taken for homes     : 16
        Snaps available for homes : 240
        Status                    : Started

The homes volume is replica 3; all the peers are up, and so are all the
bricks and services:

    glv status homes
    Status of volume: homes
    Gluster process                              TCP Port  RDMA Port  Online  Pid
    ------------------------------------------------------------------------------
    Brick gluster-2:/export/brick2/home          49171     0          Y       38298
    Brick gluster0:/export/brick2/home           49154     0          Y       23519
    Brick gluster1.vsnet.gmu.edu:/export/brick2
    /home                                        49154     0          Y       23794
    Snapshot Daemon on localhost                 49486     0          Y       23699
    NFS Server on localhost                      2049      0          Y       23486
    Self-heal Daemon on localhost                N/A       N/A        Y       23496
    Snapshot Daemon on gluster-2                 49261     0          Y       38479
    NFS Server on gluster-2                      2049      0          Y       39640
    Self-heal Daemon on gluster-2                N/A       N/A        Y       39709
    Snapshot Daemon on gluster1                  49480     0          Y       23982
    NFS Server on gluster1                       2049      0          Y       23766
    Self-heal Daemon on gluster1                 N/A       N/A        Y       23776

    Task Status of Volume homes
    ------------------------------------------------------------------------------
    There are no active volume tasks

I'd appreciate any ideas about troubleshooting this. I tried disabling
.snaps access on the volume and re-enabling it, but it made no
difference.
Re: [Gluster-users] What is the corresponding op-version for a glusterfs release?
-Atin
Sent from one plus one

On 22-Apr-2016 8:04 pm, "Dj Merrill" wrote:
> On 04/20/2016 07:32 PM, Atin Mukherjee wrote:
>> Unfortunately there is no such document, but I can take you through a
>> couple of code files [1] [2]: the first defines all the volume
>> tunables and their respective supported op-versions, and the latter
>> has the exact numbers of all those version variables.
>>
>> [1] https://github.com/gluster/glusterfs/blob/release-3.7/xlators/mgmt/glusterd/src/glusterd-volume-set.c
>> [2] https://github.com/gluster/glusterfs/blob/release-3.7/libglusterfs/src/globals.h
>>
>> ~Atin
>
> Thanks, Atin, this is very helpful!
>
> Looks like I have some research to do to figure out if any of the
> features released since op-version=2 would be useful for us.
>
> Is there any documentation outlining "recommended" settings for a 2
> server replicated setup running the latest version of Gluster?

Nothing as such; just ensure quorum is not enabled.

> Thanks,
>
> -Dj
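On the quorum point: for a two-node replica the relevant options can be
checked per volume. A sketch, assuming a volume named homes and a 3.7.x
CLI that supports 'volume get':

    # options explicitly set on the volume appear under 'Options Reconfigured'
    gluster volume info homes

    # quorum knobs; on a 2-server replica these are typically left disabled,
    # since quorum with only two nodes blocks writes whenever one is down
    gluster volume get homes cluster.quorum-type
    gluster volume get homes cluster.server-quorum-type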
Re: [Gluster-users] What is the corresponding op-version for a glusterfs release?
On 04/20/2016 07:32 PM, Atin Mukherjee wrote:
> Unfortunately there is no such document, but I can take you through a
> couple of code files [1] [2]: the first defines all the volume
> tunables and their respective supported op-versions, and the latter
> has the exact numbers of all those version variables.
>
> [1] https://github.com/gluster/glusterfs/blob/release-3.7/xlators/mgmt/glusterd/src/glusterd-volume-set.c
> [2] https://github.com/gluster/glusterfs/blob/release-3.7/libglusterfs/src/globals.h
>
> ~Atin

Thanks, Atin, this is very helpful!

Looks like I have some research to do to figure out if any of the
features released since op-version=2 would be useful for us.

Is there any documentation outlining "recommended" settings for a 2
server replicated setup running the latest version of Gluster?

Thanks,

-Dj
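For what it's worth, the cluster's current op-version can be read from
glusterd's local store and raised once all servers run the newer
release. A sketch, where 30710 is a placeholder value to be looked up
in globals.h [2]:

    # the op-version this cluster currently enforces
    grep operating-version /var/lib/glusterd/glusterd.info

    # raise it cluster-wide so newer volume options become settable;
    # irreversible, and every server must already support the target value
    gluster volume set all cluster.op-version 30710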
Re: [Gluster-users] disperse volume file to subvolume mapping
The scanned-files count is 1112 only on the node where the rebalance
command was run; all other fields are 0 for every node. If the issue is
happening because of the temporary file names, we will make sure not to
use temporary files while using gluster.

On Fri, Apr 22, 2016 at 9:43 AM, Xavier Hernandez wrote:
> Even the number of scanned files is 0?
>
> This seems an issue with DHT. I'm not an expert on this area. Not sure
> if the regular expression pattern that some files still match could
> cause an interference with rebalance.
>
> Anyway, if you have found a solution for your use case, it's ok for me.
>
> Best regards,
>
> Xavi
>
> On 22/04/16 08:24, Serkan Çoban wrote:
>> Not only the skipped column but all columns are 0 in the rebalance
>> status command. It seems rebalance does not do anything. All '-T'
>> files are there. Anyway, we wrote our custom mapreduce tool and it is
>> copying files right now to gluster, and it is utilizing all 60 nodes
>> as expected. I will delete the distcp folder and continue if you
>> don't need any further log/debug files to examine the issue.
>>
>> Thanks for help,
>> Serkan
>>
>> On Fri, Apr 22, 2016 at 9:15 AM, Xavier Hernandez wrote:
>>> [...]
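For readers retracing this thread, the commands under discussion are,
as a sketch (v0 is the volume name used above; the brick path is a
placeholder):

    # force rebalance so even files whose hashed and actual subvolumes
    # disagree get moved
    gluster volume rebalance v0 start force
    gluster volume rebalance v0 status    # 'skipped' should end up 0

    # DHT link files (the '-T' entries discussed above) are zero-length
    # files with only the sticky bit set; on a brick they can be listed with:
    find /export/brick1 -type f -perm 1000 -size 0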
Re: [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
Some time ago I saw an issue with Gluster-NFS combined with disperse
under high write load. I thought it was already solved, but this issue
is very similar.

The problem seemed to be related to multithreaded epoll and throttling.
For some reason NFS was sending a massive amount of requests, ignoring
the throttling threshold. This caused the NFS connection to become
unresponsive. Combined with a lock held at the time of the hang, which
was consequently never released, this blocked other clients.

Maybe it's not related to this problem, but I thought it could be
important to consider.

Xavi

On 22/04/16 08:19, Ashish Pandey wrote:
> Hi Chen,
>
> I thought I replied to your previous mail. This issue has been faced
> by other users also; Serkan is one, if you follow his mail on
> gluster-users. I still have to dig further into it. Soon we will try
> to reproduce and debug it.
>
> My observation is that we face this issue while IO is going on and one
> of the servers disconnects and reconnects. This might happen because
> of an update or a network issue, but either way we should not end up
> in this situation.
>
> I am adding Pranith and Xavi, who can address any unanswered queries.
>
> - Ashish
>
> - Original Message -
> From: "Chen Chen"
> To: "Joe Julian", "Ashish Pandey"
> Cc: "Gluster Users"
> Sent: Friday, April 22, 2016 8:28:48 AM
> Subject: Re: [Gluster-users] Need some help on Mismatching xdata /
> Failed combine iatt / Too many fd
>
> [...]
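The lock dumps quoted in this thread come from volume statedumps. A
minimal capture-and-inspect sequence, assuming the volume name mainvol
from the thread and the default dump directory (both may differ on your
systems):

    # dump the state (locks included) of every brick in the volume;
    # files land on each brick node, by default under /var/run/gluster
    gluster volume statedump mainvol

    # look for blocked inodelks across the dumps
    grep -B4 'BLOCKED' /var/run/gluster/*.dump.*

    # once a stuck granted lock is identified, it can be cleared by hand
    # (path is the one from the dump below; use with care)
    gluster volume clear-locks mainvol \
        /NTD/variants_calling/primary_gvcf/A2612/13.g.vcf kind granted inode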
Re: [Gluster-users] disperse volume file to subvolume mapping
Even the number of scanned files is 0?

This seems an issue with DHT. I'm not an expert on this area. Not sure
if the regular expression pattern that some files still match could
cause an interference with rebalance.

Anyway, if you have found a solution for your use case, it's ok for me.

Best regards,

Xavi

On 22/04/16 08:24, Serkan Çoban wrote:
> Not only the skipped column but all columns are 0 in the rebalance
> status command. It seems rebalance does not do anything. All '-T'
> files are there. Anyway, we wrote our custom mapreduce tool and it is
> copying files right now to gluster, and it is utilizing all 60 nodes
> as expected. I will delete the distcp folder and continue if you don't
> need any further log/debug files to examine the issue.
>
> Thanks for help,
> Serkan
>
> On Fri, Apr 22, 2016 at 9:15 AM, Xavier Hernandez wrote:
>> [...]
Re: [Gluster-users] disperse volume file to subvolume mapping
Not only the skipped column but all columns are 0 in the rebalance
status command. It seems rebalance does not do anything. All '-T'
files are there. Anyway, we wrote our custom mapreduce tool and it is
copying files right now to gluster, and it is utilizing all 60 nodes as
expected. I will delete the distcp folder and continue if you don't
need any further log/debug files to examine the issue.

Thanks for help,
Serkan

On Fri, Apr 22, 2016 at 9:15 AM, Xavier Hernandez wrote:
> When you execute a rebalance 'force', the skipped column should be 0
> for all nodes and all '-T' files must have disappeared. Otherwise
> something failed. Is this true in your case?
>
> On 21/04/16 15:19, Serkan Çoban wrote:
>> [...]
Re: [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd
Hi Chen,

I thought I replied to your previous mail. This issue has been faced by
other users also; Serkan is one, if you follow his mail on
gluster-users. I still have to dig further into it. Soon we will try to
reproduce and debug it.

My observation is that we face this issue while IO is going on and one
of the servers disconnects and reconnects. This might happen because of
an update or a network issue, but either way we should not end up in
this situation.

I am adding Pranith and Xavi, who can address any unanswered queries.

- Ashish

- Original Message -
From: "Chen Chen"
To: "Joe Julian", "Ashish Pandey"
Cc: "Gluster Users"
Sent: Friday, April 22, 2016 8:28:48 AM
Subject: Re: [Gluster-users] Need some help on Mismatching xdata /
Failed combine iatt / Too many fd

Hi Ashish,

Are you still watching this thread? I got no response after I sent the
info you requested. Also, could anybody explain what lock-heal is
doing?

I got another inode lock yesterday. Only one lock occurred in the whole
12 bricks, yet it stopped the cluster from working again. None of my
peers' OSes were frozen, and this time "start force" worked.

--
[xlator.features.locks.mainvol-locks.inode]
path=/NTD/variants_calling/primary_gvcf/A2612/13.g.vcf
mandatory=0
inodelk-count=2
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
pid = 1, owner=dc3dbfac887f, client=0x7f649835adb0,
connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0,
granted at 2016-04-21 11:45:30
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0,
pid = 1, owner=d433bfac887f, client=0x7f649835adb0,
connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0,
blocked at 2016-04-21 11:45:33
--

I've also filed a bug report on bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=1329466

Best regards,
Chen

On 4/13/2016 10:31 PM, Joe Julian wrote:
> On 04/13/2016 03:29 AM, Ashish Pandey wrote:
>> Hi Chen,
>>
>> What do you mean by "instantly got inode locked and tore down the
>> whole cluster"? Do you mean that the whole disperse volume became
>> unresponsive?
>>
>> I don't have much idea about features.lock-heal so can't comment on
>> how it can help you.
>
> So who should get added to this email that would have an idea? Let's
> get that person looped in.
>
>> Could you please explain the second part of your mail? What exactly
>> are you trying to do, and what is the setup? Also volume info, logs,
>> and statedumps might help.
>>
>> - Ashish
>>
>> From: "Chen Chen"
>> To: "Ashish Pandey"
>> Cc: gluster-users@gluster.org
>> Sent: Wednesday, April 13, 2016 3:26:53 PM
>> Subject: Re: [Gluster-users] Need some help on Mismatching xdata /
>> Failed combine iatt / Too many fd
>>
>> Hi Ashish and other Gluster Users,
>>
>> When I put some heavy IO load onto my cluster (an rsync operation,
>> ~600MB/s), one of the nodes instantly got inode locked and tore down
>> the whole cluster. I've already turned on "features.lock-heal" but
>> it didn't help.
>>
>> My clients are using a round-robin tactic to mount servers, hoping
>> to average out the pressure. Could it be caused by a race between
>> NFS servers on different nodes? Should I instead create a dedicated
>> NFS server with huge memory, no bricks, and multiple Ethernet
>> cables?
>>
>> I really appreciate any help from you guys.
>>
>> Best wishes,
>> Chen
>>
>> PS. Don't know why the native fuse client is 5 times slower than the
>> good old NFSv3.
>>
>> On 4/4/2016 6:11 PM, Ashish Pandey wrote:
>>> Hi Chen,
>>>
>>> As I suspected, there are many blocked calls for inodelk in
>>> sm11/mnt-disk1-mainvol.31115.dump.1459760675.
>>>
>>> =
>>> [xlator.features.locks.mainvol-locks.inode]
>>> path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
>>> mandatory=0
>>> inodelk-count=4
>>> lock-dump.domain.domain=mainvol-disperse-0:self-heal
>>> lock-dump.domain.domain=mainvol-disperse-0
>>> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0,
>>> pid = 1, owner=dc2d3dfcc57f, client=0x7ff03435d5f0,
>>> connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0,
>>> blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58
>>> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0,
>>> pid = 1, owner=1414371e1a7f, client=0x7ff034204490,
>>> connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0,
>>> blocked at 2016-04-01 16:58:51
>>> inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0,
>>> pid = 1, owner=a8eb14cd9b7f,
Re: [Gluster-users] disperse volume file to subvolume mapping
When you execute a rebalance 'force', the skipped column should be 0
for all nodes and all '-T' files must have disappeared. Otherwise
something failed. Is this true in your case?

On 21/04/16 15:19, Serkan Çoban wrote:
> Same result. Also checked the rebalance.log file; it has no reference
> to the part files either...
>
> On Thu, Apr 21, 2016 at 3:34 PM, Xavier Hernandez wrote:
>> Can you try a 'gluster volume rebalance v0 start force'?
>>
>> On 21/04/16 14:23, Serkan Çoban wrote:
>>>> Has the rebalance operation finished successfully? Has it skipped
>>>> any files?
>>>
>>> Yes, according to gluster v rebalance status it completed without
>>> any errors. The rebalance status report is like:
>>>
>>>     Node       Rebalanced files   size     Scanned   failures   skipped
>>>     1.1.1.185  158                29GB     1720      0          314
>>>     1.1.1.205  93                 46.5GB   761       0          95
>>>     1.1.1.225  74                 37GB     779       0          94
>>>
>>> All other hosts have 0 values.
>>>
>>> I double-checked that the files with '-T' attributes are there;
>>> maybe some of them were deleted, but I still see them in the
>>> bricks... I am also concerned why the part files were not
>>> distributed to all 60 nodes. Rebalance should do that?
>>>
>>> On Thu, Apr 21, 2016 at 1:55 PM, Xavier Hernandez wrote:
>>>> Hi Serkan,
>>>>
>>>> On 21/04/16 12:39, Serkan Çoban wrote:
>>>>> I started a 'gluster v rebalance v0 start' command hoping that it
>>>>> would equally redistribute files across the 60 nodes, but it did
>>>>> not do that... Why did it not redistribute the files? Any
>>>>> thoughts?
>>>>
>>>> Has the rebalance operation finished successfully? Has it skipped
>>>> any files?
>>>>
>>>> After a successful rebalance all files with attributes '-T' should
>>>> have disappeared.
>>>>
>>>>> On Thu, Apr 21, 2016 at 11:24 AM, Xavier Hernandez wrote:
>>>>>> Hi Serkan,
>>>>>>
>>>>>> On 21/04/16 10:07, Serkan Çoban wrote:
>>>>>>>> I think the problem is in the temporary name that distcp gives
>>>>>>>> to the file while it's being copied, before renaming it to the
>>>>>>>> real name. Do you know what is the structure of this name?
>>>>>>>
>>>>>>> Distcp's temporary file name format is
>>>>>>> ".distcp.tmp.attempt_1460381790773_0248_m_01_0", and the
>>>>>>> same temporary file name is used by one map process. For
>>>>>>> example, I see in the logs that one map copies the files
>>>>>>> part-m-00031, part-m-00047, part-m-00063 sequentially, and they
>>>>>>> all use the same temporary file name above. So no original file
>>>>>>> name appears in the temporary file name.
>>>>>>
>>>>>> This explains the problem. With the default options, DHT sends
>>>>>> all files to the subvolume that should store a file named
>>>>>> 'distcp.tmp'.
>>>>>>
>>>>>> With this temporary name format, little can be done.
>>>>>>
>>>>>>> I will check if we can modify distcp behaviour, or we will have
>>>>>>> to write our own mapreduce procedures instead of using distcp.
>>>>>>>
>>>>>>>> 2. define the option 'extra-hash-regex' to an expression that
>>>>>>>> matches your temporary file names and returns the same name
>>>>>>>> the file will finally have. Depending on the differences
>>>>>>>> between original and temporary file names, this option could
>>>>>>>> be useless.
>>>>>>>> 3. set the option 'rsync-hash-regex' to 'none'. This will
>>>>>>>> prevent the name conversion, so the files will be evenly
>>>>>>>> distributed. However, this will cause a lot of files to be
>>>>>>>> placed in incorrect subvolumes, creating a lot of link files
>>>>>>>> until a rebalance is executed.
>>>>>>>
>>>>>>> How can I set these options?
>>>>>>
>>>>>> You can set gluster options using:
>>>>>>
>>>>>>     gluster volume set <volname> <option> <value>
>>>>>>
>>>>>> for example:
>>>>>>
>>>>>>     gluster volume set v0 rsync-hash-regex none
>>>>>>
>>>>>> Xavi
>>>>>>
>>>>>>> On Thu, Apr 21, 2016 at 10:00 AM, Xavier Hernandez wrote:
>>>>>>>> Hi Serkan,
>>>>>>>>
>>>>>>>> I think the problem is in the temporary name that distcp gives
>>>>>>>> to the file while it's being copied, before renaming it to the
>>>>>>>> real name. Do you know what is the structure of this name?
>>>>>>>>
>>>>>>>> DHT selects the subvolume (in this case the ec set) on which
>>>>>>>> the file will be stored based on the name of the file. This
>>>>>>>> has a problem when a file is being renamed, because this could
>>>>>>>> change the subvolume where the file should be found.
>>>>>>>>
>>>>>>>> DHT has a feature to avoid incorrect file placements when
>>>>>>>> executing renames for the rsync case. What it does is to check
>>>>>>>> if the file matches the following regular expression:
>>>>>>>>
>>>>>>>>     ^\.(.+)\.[^.]+$
>>>>>>>>
>>>>>>>> If a match is found, it only considers the part between
>>>>>>>> parentheses to calculate the destination subvolume. This is
>>>>>>>> useful for rsync because temporary file names are constructed
>>>>>>>> in the following way: suppose the original filename is 'test'.
>>>>>>>> The temporary filename while rsync is being executed is made
>>>>>>>> by prepending a dot and appending a dot plus a random suffix:
>>>>>>>>
>>>>>>>>     .test.712hd
>>>>>>>>
>>>>>>>> As you can see, the original name and the part of the name
>>>>>>>> between parentheses that matches the regular expression are
>>>>>>>> the same. This causes that, after renaming the temporary file
>>>>>>>> to its original filename, both files will be considered to
>>>>>>>> belong to the same subvolume by DHT.
>>>>>>>>
>>>>>>>> In your case it's very probable that distcp uses a temporary
>>>>>>>> name like '.part.'. In this case the portion of the name used
>>>>>>>> to select the subvolume is always 'part'. This would explain
>>>>>>>> why all the files end up on the same subvolume.
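The effect of that rename heuristic can be reproduced outside Gluster;
a small sketch using sed to mimic the capture, with the file names
taken from the messages above:

    # rsync-style temp name: the captured group equals the final name,
    # so temp file and final file hash to the same DHT subvolume
    echo '.test.712hd' | sed -E 's/^\.(.+)\.[^.]+$/\1/'
    # -> test

    # distcp-style temp name: the greedy capture stops at the last dot,
    # so every copy hashes as 'distcp.tmp' regardless of the final name
    echo '.distcp.tmp.attempt_1460381790773_0248_m_01_0' | \
        sed -E 's/^\.(.+)\.[^.]+$/\1/'
    # -> distcp.tmp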