Re: [Gluster-users] self service snapshot access broken with 3.7.11

2016-04-22 Thread FNU Raghavendra Manjunath
Hi Alastair,

Can you please provide the snap daemon (snapd) logs? They are present in
/var/log/glusterfs/snaps/snapd.log.

Provide the snapd logs of the node from which you have mounted the volume
(i.e. the node whose IP address/hostname you gave while mounting the
volume).
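
A rough sketch of how to grab it, assuming the default log location above
(replace "gluster0" with the server you actually mounted from):

    # on the server named in the mount command
    tail -n 200 /var/log/glusterfs/snaps/snapd.log
    # or copy the whole file off the node to attach it
    scp root@gluster0:/var/log/glusterfs/snaps/snapd.log ./snapd-gluster0.log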

Regards,
Raghavendra



On Fri, Apr 22, 2016 at 5:19 PM, Alastair Neil 
wrote:

> I just upgraded my cluster to 3.7.11 from 3.7.10 and access to the .snaps
> directories now fail with
>
> bash: cd: .snaps: Transport endpoint is not connected
>
>
> in the volume log file on the client I see:
>
> [2016-04-22 21:08:28.005854] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
>> 2-homes-snapd-client: changing port to 49493 (from 0)
>> [2016-04-22 21:08:28.009558] E [socket.c:2278:socket_connect_finish]
>> 2-homes-snapd-client: connection to xx.xx.xx.xx.xx:49493 failed (No route
>> to host)
>
>
> I'm quite perplexed. It's not a network issue or DNS as far as I can
> tell: the glusterfs client is otherwise working fine, and the gluster
> servers all resolve OK.  It seems to be happening on all the clients; I
> have tried different systems with 3.7.8, 3.7.10, and 3.7.11 clients and
> see the same failure on all of them.
>
> On the servers the snapshots are being taken as expected and they are
> started:
>
> Snapshot  :
>> Scheduled-Homes_Hourly-homes_GMT-2016.04.22-16.00.01
>> Snap UUID : 91ba50b0-d8f2-4135-9ea5-edfdfe2ce61d
>> Created   : 2016-04-22 16:00:01
>> Snap Volumes:
>> Snap Volume Name  : 5170144102814026a34f8f948738406f
>> Origin Volume name: homes
>> Snaps taken for homes  : 16
>> Snaps available for homes  : 240
>> Status: Started
>
>
>
> The homes volume is replica 3; all the peers are up, and so are all the
> bricks and services:
>
> glv status homes
>> Status of volume: homes
>> Gluster process TCP Port  RDMA Port  Online
>>  Pid
>>
>> --
>> Brick gluster-2:/export/brick2/home 49171 0  Y
>> 38298
>> Brick gluster0:/export/brick2/home  49154 0  Y
>> 23519
>> Brick gluster1.vsnet.gmu.edu:/export/brick2
>> /home   49154 0  Y
>> 23794
>> Snapshot Daemon on localhost49486 0  Y
>> 23699
>> NFS Server on localhost 2049  0  Y
>> 23486
>> Self-heal Daemon on localhost   N/A   N/AY
>> 23496
>> Snapshot Daemon on gluster-249261 0  Y
>> 38479
>> NFS Server on gluster-2 2049  0  Y
>> 39640
>> Self-heal Daemon on gluster-2   N/A   N/AY
>> 39709
>> Snapshot Daemon on gluster1 49480 0  Y
>> 23982
>> NFS Server on gluster1  2049  0  Y
>> 23766
>> Self-heal Daemon on gluster1N/A   N/AY
>> 23776
>>
>> Task Status of Volume homes
>>
>> --
>> There are no active volume tasks
>
>
> I'd appreciate any ideas about troubleshooting this.  I tried disabling
> .snaps access on the volume and re-enabling it, but it made no difference.
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] self service snapshot access broken with 3.7.11

2016-04-22 Thread Alastair Neil
I just upgraded my cluster to 3.7.11 from 3.7.10 and access to the .snaps
directories now fail with

bash: cd: .snaps: Transport endpoint is not connected


in the volume log file on the client I see:

[2016-04-22 21:08:28.005854] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
> 2-homes-snapd-client: changing port to 49493 (from 0)
> [2016-04-22 21:08:28.009558] E [socket.c:2278:socket_connect_finish]
> 2-homes-snapd-client: connection to xx.xx.xx.xx.xx:49493 failed (No route
> to host)


I'm quite perplexed. It's not a network issue or DNS as far as I can
tell: the glusterfs client is otherwise working fine, and the gluster
servers all resolve OK.  It seems to be happening on all the clients; I
have tried different systems with 3.7.8, 3.7.10, and 3.7.11 clients and
see the same failure on all of them.
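
As a sketch of the connectivity checks that could rule a firewall in or out
(the host and port below are just the values from the log line above; the
snapd port can change when the daemon restarts):

    # from a client that fails to enter .snaps
    ping -c 3 gluster0
    nc -zv gluster0 49493
    # on the server, check whether a firewall rule rejects that port
    iptables -L -n | grep 49493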

On the servers the snapshots are being taken as expected and they are
started:

Snapshot  :
> Scheduled-Homes_Hourly-homes_GMT-2016.04.22-16.00.01
> Snap UUID : 91ba50b0-d8f2-4135-9ea5-edfdfe2ce61d
> Created   : 2016-04-22 16:00:01
> Snap Volumes:
> Snap Volume Name  : 5170144102814026a34f8f948738406f
> Origin Volume name: homes
> Snaps taken for homes  : 16
> Snaps available for homes  : 240
> Status: Started



The homes volume is replica 3; all the peers are up, and so are all the
bricks and services:

glv status homes
> Status of volume: homes
> Gluster process TCP Port  RDMA Port  Online
>  Pid
>
> --
> Brick gluster-2:/export/brick2/home 49171 0  Y
> 38298
> Brick gluster0:/export/brick2/home  49154 0  Y
> 23519
> Brick gluster1.vsnet.gmu.edu:/export/brick2
> /home   49154 0  Y
> 23794
> Snapshot Daemon on localhost49486 0  Y
> 23699
> NFS Server on localhost 2049  0  Y
> 23486
> Self-heal Daemon on localhost   N/A   N/AY
> 23496
> Snapshot Daemon on gluster-249261 0  Y
> 38479
> NFS Server on gluster-2 2049  0  Y
> 39640
> Self-heal Daemon on gluster-2   N/A   N/AY
> 39709
> Snapshot Daemon on gluster1 49480 0  Y
> 23982
> NFS Server on gluster1  2049  0  Y
> 23766
> Self-heal Daemon on gluster1N/A   N/AY
> 23776
>
> Task Status of Volume homes
>
> --
> There are no active volume tasks


I'd appreciate any ideas about troubleshooting this.  I tried disabling
.snaps access on the volume and re-enabling it, but it made no difference.
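
For reference, self-service snapshot access is toggled with the
features.uss volume option; a minimal sketch of the disable/re-enable cycle
mentioned above, using the volume name from the status output:

    gluster volume set homes features.uss disable
    gluster volume set homes features.uss enable
    # the snapd port for the volume should show up again under
    gluster volume status homes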
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] What is the corresponding op-version for a glusterfs release?

2016-04-22 Thread Atin Mukherjee
-Atin
Sent from one plus one
On 22-Apr-2016 8:04 pm, "Dj Merrill"  wrote:
>
> On 04/20/2016 07:32 PM, Atin Mukherjee wrote:
> > Unfortunately there is no such document. But I can take you through a
> > couple of code files [1] [2]: the first one defines all the volume
> > tunables and their respective supported op-versions, while the latter has
> > the exact numbers of all those version variables.
> >
> > [1]
> >
https://github.com/gluster/glusterfs/blob/release-3.7/xlators/mgmt/glusterd/src/glusterd-volume-set.c
> > [2]
> >
https://github.com/gluster/glusterfs/blob/release-3.7/libglusterfs/src/globals.h
> >
> > ~Atin
>
>
> Thanks, Atin, this is very helpful!
>
> Looks like I have some research to do to figure out if any of the
> features released since op-version=2 would be useful for us.
>
> Is there any documentation outlining "recommended" settings for a 2
> server replicated setup running the latest version of Gluster?
Nothing as such; just ensure quorum is not enabled.
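A rough sketch of what to check, with a hypothetical volume name "myvol";
verify the option names against your release:

    # any quorum options explicitly set will show up here
    gluster volume info myvol | grep -i quorum
    # make sure quorum stays off for a 2-node replica
    gluster volume set myvol cluster.quorum-type none
    gluster volume set myvol cluster.server-quorum-type none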
>
> Thanks,
>
> -Dj
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] What is the corresponding op-version for a glusterfs release?

2016-04-22 Thread Dj Merrill
On 04/20/2016 07:32 PM, Atin Mukherjee wrote:
> Unfortunately there is no such document. But I can take you through a
> couple of code files [1] [2]: the first one defines all the volume
> tunables and their respective supported op-versions, while the latter has
> the exact numbers of all those version variables.
> 
> [1]
> https://github.com/gluster/glusterfs/blob/release-3.7/xlators/mgmt/glusterd/src/glusterd-volume-set.c
> [2]
> https://github.com/gluster/glusterfs/blob/release-3.7/libglusterfs/src/globals.h
> 
> ~Atin


Thanks, Atin, this is very helpful!

Looks like I have some research to do to figure out if any of the
features released since op-version=2 would be useful for us.
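
For anyone else following along, a quick way I know of to see the cluster's
current operating version (assuming the default glusterd working directory)
and to raise it; the value below is only an example:

    grep operating-version /var/lib/glusterd/glusterd.info
    # bump it once every peer runs a release that supports the new value
    gluster volume set all cluster.op-version 30707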

Is there any documentation outlining "recommended" settings for a 2
server replicated setup running the latest version of Gluster?

Thanks,

-Dj


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] disperse volume file to subvolume mapping

2016-04-22 Thread Serkan Çoban
The scanned-files count is 1112 only on the node where the rebalance command
was run; all other fields are 0 for every node.
If the issue is happening because of the temporary file name, we will make
sure not to use temp files when using gluster.

On Fri, Apr 22, 2016 at 9:43 AM, Xavier Hernandez  wrote:
> Even the number of scanned files is 0 ?
>
> This seems to be an issue with DHT; I'm not an expert in this area. I'm not
> sure if the regular expression pattern that some files still match could
> interfere with rebalance.
>
> Anyway, if you have found a solution for your use case, it's ok for me.
>
> Best regards,
>
> Xavi
>
>
> On 22/04/16 08:24, Serkan Çoban wrote:
>>
>> Not only the skipped column but all columns are 0 in the rebalance status
>> command. It seems rebalance does not do anything. All '-T'
>> files are still there. Anyway, we wrote our own mapreduce tool and it is
>> copying files to gluster right now, utilizing all 60 nodes as
>> expected. I will delete the distcp folder and continue if you don't need
>> any further log/debug files to examine the issue.
>>
>> Thanks for help,
>> Serkan
>>
>> On Fri, Apr 22, 2016 at 9:15 AM, Xavier Hernandez 
>> wrote:
>>>
>>> When you execute a rebalance 'force' the skipped column should be 0 for
>>> all
>>> nodes and all '-T' files must have disappeared. Otherwise
>>> something
>>> failed. Is this true in your case ?
>>>
>>>
>>> On 21/04/16 15:19, Serkan Çoban wrote:


 Same result. Also checked the rebalance.log file, it has also no
 reference to part files...

 On Thu, Apr 21, 2016 at 3:34 PM, Xavier Hernandez
 
 wrote:
>
>
> Can you try a 'gluster volume rebalance v0 start force' ?
>
>
> On 21/04/16 14:23, Serkan Çoban wrote:
>>>
>>>
>>>
>>> Has the rebalance operation finished successfully ? has it skipped
>>> any
>>> files ?
>>
>>
>>
>> Yes according to gluster v rebalance status it is completed without
>> any
>> errors.
>> rebalance status report is like:
>> Node        Rebalanced-files   size     Scanned   failures   skipped
>> 1.1.1.185   158                29GB     1720      0          314
>> 1.1.1.205   93                 46.5GB   761       0          95
>> 1.1.1.225   74                 37GB     779       0          94
>>
>>
>> All other hosts have 0 values.
>>
>> I double-checked that the files with '-T' attributes are there;
>> maybe some of them were deleted, but I still see them in the bricks...
>> I am also concerned that the part files were not distributed to all 60
>> nodes. Shouldn't rebalance do that?
>>
>> On Thu, Apr 21, 2016 at 1:55 PM, Xavier Hernandez
>> 
>> wrote:
>>>
>>>
>>>
>>> Hi Serkan,
>>>
>>> On 21/04/16 12:39, Serkan Çoban wrote:




 I started a gluster v rebalance v0 start command hoping that it will
 equally redistribute files across 60 nodes but it did not do that...
 why it did not redistribute files? any thoughts?
>>>
>>>
>>>
>>>
>>>
>>> Has the rebalance operation finished successfully ? has it skipped
>>> any
>>> files
>>> ?
>>>
>>> After a successful rebalance all files with attributes '-T'
>>> should
>>> have disappeared.
>>>
>>>

 On Thu, Apr 21, 2016 at 11:24 AM, Xavier Hernandez
  wrote:
>
>
>
>
> Hi Serkan,
>
> On 21/04/16 10:07, Serkan Çoban wrote:
>>>
>>>
>>>
>>>
>>>
>>> I think the problem is in the temporary name that distcp gives to
>>> the
>>> file while it's being copied before renaming it to the real name.
>>> Do
>>> you
>>> know what is the structure of this name ?
>>
>>
>>
>>
>>
>> Distcp temporary file name format is:
>> ".distcp.tmp.attempt_1460381790773_0248_m_01_0" and the same
>> temporary file name used by one map process. For example I see in
>> the
>> logs that one map copies files
>> part-m-00031,part-m-00047,part-m-00063
>> sequentially and they all use same temporary file name above. So
>> no
>> original file name appears in temporary file name.
>
>
>
>
>
>
> This explains the problem. With the default options, DHT sends all
> files
> to
> the subvolume that should store a file named 'distcp.tmp'.
>
> With this temporary name format, little can be done.
>
>>
>> I will check if we can modify distcp behaviour, or we have to

Re: [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd

2016-04-22 Thread Xavier Hernandez
Some time ago I saw an issue with Gluster-NFS combined with disperse 
under high write load. I thought that it was already solved, but this 
issue is very similar.


The problem seemed to be related to multithreaded epoll and throttling.
For some reason NFS was sending a massive number of requests, ignoring
the throttling threshold. This caused the NFS connection to become
unresponsive. Combined with a lock held at the time of the hang, that lock
was never released, blocking other clients.


Maybe it's not related to this problem, but I thought it could be
important to consider it.
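
Not a fix, just a sketch of how the multithreaded-epoll knobs could be
inspected or dialled back if someone wants to test that theory (the volume
name is a placeholder):

    # event-thread counts only show here if they were changed from defaults
    gluster volume info myvol | grep -i event-thread
    # setting them back to 1 effectively serialises epoll handling for a test
    gluster volume set myvol client.event-threads 1
    gluster volume set myvol server.event-threads 1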


Xavi

On 22/04/16 08:19, Ashish Pandey wrote:


Hi Chen,

I thought I replied to your previous mail.
This issue has been faced by other users as well; Serkan is one of them, if
you follow his mail on gluster-users.

I still have to dig further into it. Soon we will try to reproduce and
debug it.
My observation is that we face this issue while IO is going on and one
of the servers gets disconnected and reconnects.
This can happen because of an update or a network issue,
but in any case we should not end up in this situation.

I am adding Pranith  and Xavi who can address any unanswered queries and
explanation.

-
Ashish


*From: *"Chen Chen" 
*To: *"Joe Julian" , "Ashish Pandey"

*Cc: *"Gluster Users" 
*Sent: *Friday, April 22, 2016 8:28:48 AM
*Subject: *Re: [Gluster-users] Need some help on Mismatching xdata /
Failed combine iatt / Too many fd

Hi Ashish,

Are you still watching this thread? I got no response after I sent the
info you requested. Also, could anybody explain what heal-lock is doing?

I got another inode lock yesterday. Only one lock occurred in the whole
12 bricks, yet it stopped the cluster from working again. None of my
peers' OSes are frozen, and this time "start force" worked.

--
[xlator.features.locks.mainvol-locks.inode]
path=/NTD/variants_calling/primary_gvcf/A2612/13.g.vcf
mandatory=0
inodelk-count=2
lock-dump.domain.domain=mainvol-disperse-0
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
1, owner=dc3dbfac887f, client=0x7f649835adb0,
connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0,
granted at 2016-04-21 11:45:30
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid =
1, owner=d433bfac887f, client=0x7f649835adb0,
connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0,
blocked at 2016-04-21 11:45:33
--
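
For reference, the dump above comes from a brick statedump; a minimal sketch
of how I capture them (volume name as in the dump, default dump directory
assumed):

    gluster volume statedump mainvol
    # dumps land in /var/run/gluster/ on each brick node
    grep -A6 'xlator.features.locks' /var/run/gluster/*.dump.*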

I've also filed a bug report on bugzilla.
https://bugzilla.redhat.com/show_bug.cgi?id=1329466

Best regards,
Chen

On 4/13/2016 10:31 PM, Joe Julian wrote:
 >
 >
 > On 04/13/2016 03:29 AM, Ashish Pandey wrote:
 >> Hi Chen,
 >>
 >> What do you mean by "instantly get inode locked and teared down
 >> the whole cluster" ? Do you mean that whole disperse volume became
 >> unresponsive?
 >>
 >> I don't have much idea about features.lock-heal so can't comment how
 >> can it help you.
 >
 > So who should get added to this email that would have an idea? Let's get
 > that person looped in.
 >
 >>
 >> Could you please explain second part of your mail? What exactly are
 >> you trying to do and what is the setup?
 >> Also volume info, logs statedumps might help.
 >>
 >> -
 >> Ashish
 >>
 >>
 >> 
 >> *From: *"Chen Chen" 
 >> *To: *"Ashish Pandey" 
 >> *Cc: *gluster-users@gluster.org
 >> *Sent: *Wednesday, April 13, 2016 3:26:53 PM
 >> *Subject: *Re: [Gluster-users] Need some help on Mismatching xdata /
 >> Failed combine iatt / Too many fd
 >>
 >> Hi Ashish and other Gluster Users,
 >>
 >> When I put some heavy IO load onto my cluster (a rsync operation,
 >> ~600MB/s), one of the node instantly get inode locked and teared down
 >> the whole cluster. I've already turned on "features.lock-heal" but it
 >> didn't help.
 >>
 >> My clients are using a round-robin tactic to mount servers, hoping to
 >> average out the pressure. Could it be caused by a race between NFS servers
 >> on different nodes? Should I instead create a dedicated NFS server with
 >> huge memory, no brick, and multiple Ethernet cables?
 >>
 >> I really appreciate any help from you guys.
 >>
 >> Best wishes,
 >> Chen
 >>
 >> PS. I don't know why the native fuse client is 5 times slower than
 >> good old NFSv3.
 >>
 >> On 4/4/2016 6:11 PM, Ashish Pandey wrote:
 >> > Hi Chen,
 >> >
 >> > As I suspected, there are many blocked call for inodelk in
 >> sm11/mnt-disk1-mainvol.31115.dump.1459760675.
 >> >
 >> > =
 >> > [xlator.features.locks.mainvol-locks.inode]
 >> > path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar
 >> > mandatory=0
 >> > inodelk-count=4
 >> > lock-dump.domain.domain=mainvol-disperse-0:self-heal
 >> > 

Re: [Gluster-users] disperse volume file to subvolume mapping

2016-04-22 Thread Xavier Hernandez

Even the number of scanned files is 0 ?

This seems to be an issue with DHT; I'm not an expert in this area. I'm not
sure if the regular expression pattern that some files still match could
interfere with rebalance.


Anyway, if you have found a solution for your use case, it's ok for me.

Best regards,

Xavi

On 22/04/16 08:24, Serkan Çoban wrote:

Not only the skipped column but all columns are 0 in the rebalance status
command. It seems rebalance does not do anything. All '-T'
files are still there. Anyway, we wrote our own mapreduce tool and it is
copying files to gluster right now, utilizing all 60 nodes as
expected. I will delete the distcp folder and continue if you don't need
any further log/debug files to examine the issue.

Thanks for help,
Serkan

On Fri, Apr 22, 2016 at 9:15 AM, Xavier Hernandez  wrote:

When you execute a rebalance 'force' the skipped column should be 0 for all
nodes and all '-T' files must have disappeared. Otherwise something
failed. Is this true in your case ?


On 21/04/16 15:19, Serkan Çoban wrote:


Same result. Also checked the rebalance.log file, it has also no
reference to part files...

On Thu, Apr 21, 2016 at 3:34 PM, Xavier Hernandez 
wrote:


Can you try a 'gluster volume rebalance v0 start force' ?


On 21/04/16 14:23, Serkan Çoban wrote:



Has the rebalance operation finished successfully ? has it skipped any
files ?



Yes according to gluster v rebalance status it is completed without any
errors.
rebalance status report is like:
Node        Rebalanced-files   size     Scanned   failures   skipped
1.1.1.185   158                29GB     1720      0          314
1.1.1.205   93                 46.5GB   761       0          95
1.1.1.225   74                 37GB     779       0          94


All other hosts have 0 values.

I double-checked that the files with '-T' attributes are there;
maybe some of them were deleted, but I still see them in the bricks...
I am also concerned that the part files were not distributed to all 60
nodes. Shouldn't rebalance do that?

On Thu, Apr 21, 2016 at 1:55 PM, Xavier Hernandez

wrote:



Hi Serkan,

On 21/04/16 12:39, Serkan Çoban wrote:




I started a gluster v rebalance v0 start command hoping that it will
equally redistribute files across 60 nodes but it did not do that...
why it did not redistribute files? any thoughts?





Has the rebalance operation finished successfully ? has it skipped any
files
?

After a successful rebalance all files with attributes '-T'
should
have disappeared.




On Thu, Apr 21, 2016 at 11:24 AM, Xavier Hernandez
 wrote:




Hi Serkan,

On 21/04/16 10:07, Serkan Çoban wrote:





I think the problem is in the temporary name that distcp gives to
the
file while it's being copied before renaming it to the real name.
Do
you
know what is the structure of this name ?





Distcp temporary file name format is:
".distcp.tmp.attempt_1460381790773_0248_m_01_0", and the same
temporary file name is used by one map process. For example, I see in the
logs that one map copies the files
part-m-00031, part-m-00047 and part-m-00063
sequentially, and they all use the same temporary file name above. So no
original file name appears in the temporary file name.






This explains the problem. With the default options, DHT sends all
files
to
the subvolume that should store a file named 'distcp.tmp'.

With this temporary name format, little can be done.



I will check if we can modify distcp behaviour, or we have to write
our mapreduce procedures instead of using distcp.


2. define the option 'extra-hash-regex' to an expression that
matches
your temporary file names and returns the same name that will
finally
have.
Depending on the differences between original and temporary file
names,
this
option could be useless.
3. set the option 'rsync-hash-regex' to 'none'. This will prevent
the
name conversion, so the files will be evenly distributed. However
this
will
cause a lot of files placed in incorrect subvolumes, creating a lot
of
link
files until a rebalance is executed.






How can I set these options?






You can set gluster options using:

gluster volume set   

for example:

gluster volume set v0 rsync-hash-regex none

Xavi






On Thu, Apr 21, 2016 at 10:00 AM, Xavier Hernandez
 wrote:





Hi Serkan,

I think the problem is in the temporary name that distcp gives to
the
file
while it's being copied before renaming it to the real name. Do you
know
what is the structure of this name ?

DHT selects the subvolume (in this case the ec set) on which the
file
will
be stored based on the name of the file. This has a problem when a
file
is
being renamed, because this could change the subvolume where the
file
should
be found.

DHT has a feature to avoid incorrect file placements when executing
renames
for the rsync case. What it does is to check if the file matches
the
following 

Re: [Gluster-users] disperse volume file to subvolume mapping

2016-04-22 Thread Serkan Çoban
Not only the skipped column but all columns are 0 in the rebalance status
command. It seems rebalance does not do anything. All '-T'
files are still there. Anyway, we wrote our own mapreduce tool and it is
copying files to gluster right now, utilizing all 60 nodes as
expected. I will delete the distcp folder and continue if you don't need
any further log/debug files to examine the issue.

Thanks for help,
Serkan

On Fri, Apr 22, 2016 at 9:15 AM, Xavier Hernandez  wrote:
> When you execute a rebalance 'force' the skipped column should be 0 for all
> nodes and all '-T' files must have disappeared. Otherwise something
> failed. Is this true in your case ?
>
>
> On 21/04/16 15:19, Serkan Çoban wrote:
>>
>> Same result. Also checked the rebalance.log file, it has also no
>> reference to part files...
>>
>> On Thu, Apr 21, 2016 at 3:34 PM, Xavier Hernandez 
>> wrote:
>>>
>>> Can you try a 'gluster volume rebalance v0 start force' ?
>>>
>>>
>>> On 21/04/16 14:23, Serkan Çoban wrote:
>
>
> Has the rebalance operation finished successfully ? has it skipped any
> files ?


 Yes according to gluster v rebalance status it is completed without any
 errors.
 rebalance status report is like:
 Node        Rebalanced-files   size     Scanned   failures   skipped
 1.1.1.185   158                29GB     1720      0          314
 1.1.1.205   93                 46.5GB   761       0          95
 1.1.1.225   74                 37GB     779       0          94


 All other hosts have 0 values.

 I double-checked that the files with '-T' attributes are there;
 maybe some of them were deleted, but I still see them in the bricks...
 I am also concerned that the part files were not distributed to all 60
 nodes. Shouldn't rebalance do that?

 On Thu, Apr 21, 2016 at 1:55 PM, Xavier Hernandez
 
 wrote:
>
>
> Hi Serkan,
>
> On 21/04/16 12:39, Serkan Çoban wrote:
>>
>>
>>
>> I started a gluster v rebalance v0 start command hoping that it will
>> equally redistribute files across 60 nodes but it did not do that...
>> why it did not redistribute files? any thoughts?
>
>
>
>
> Has the rebalance operation finished successfully ? has it skipped any
> files
> ?
>
> After a successful rebalance all files with attributes '-T'
> should
> have disappeared.
>
>
>>
>> On Thu, Apr 21, 2016 at 11:24 AM, Xavier Hernandez
>>  wrote:
>>>
>>>
>>>
>>> Hi Serkan,
>>>
>>> On 21/04/16 10:07, Serkan Çoban wrote:
>
>
>
>
> I think the problem is in the temporary name that distcp gives to
> the
> file while it's being copied before renaming it to the real name.
> Do
> you
> know what is the structure of this name ?




 Distcp temporary file name format is:
 ".distcp.tmp.attempt_1460381790773_0248_m_01_0" and the same
 temporary file name used by one map process. For example I see in
 the
 logs that one map copies files
 part-m-00031,part-m-00047,part-m-00063
 sequentially and they all use same temporary file name above. So no
 original file name appears in temporary file name.
>>>
>>>
>>>
>>>
>>>
>>> This explains the problem. With the default options, DHT sends all
>>> files
>>> to
>>> the subvolume that should store a file named 'distcp.tmp'.
>>>
>>> With this temporary name format, little can be done.
>>>

 I will check if we can modify distcp behaviour, or we have to write
 our mapreduce procedures instead of using distcp.

> 2. define the option 'extra-hash-regex' to an expression that
> matches
> your temporary file names and returns the same name that will
> finally
> have.
> Depending on the differences between original and temporary file
> names,
> this
> option could be useless.
> 3. set the option 'rsync-hash-regex' to 'none'. This will prevent
> the
> name conversion, so the files will be evenly distributed. However
> this
> will
> cause a lot of files placed in incorrect subvolumes, creating a lot
> of
> link
> files until a rebalance is executed.





 How can I set these options?
>>>
>>>
>>>
>>>
>>>
>>> You can set gluster options using:
>>>
>>> gluster volume set   
>>>
>>> for example:
>>>
>>> gluster volume set v0 rsync-hash-regex none
>>>
>>> Xavi
>>>
>>>

Re: [Gluster-users] Need some help on Mismatching xdata / Failed combine iatt / Too many fd

2016-04-22 Thread Ashish Pandey

Hi Chen, 

I thought I replied to your previous mail. 
This issue has been faced by other users as well; Serkan is one of them, if you
follow his mail on gluster-users.

I still have to dig further into it. Soon we will try to reproduce and debug
it.
My observation is that we face this issue while IO is going on and one of the
servers gets disconnected and reconnects.
This can happen because of an update or a network issue,
but in any case we should not end up in this situation.

I am adding Pranith and Xavi who can address any unanswered queries and 
explanation. 

- 
Ashish 

- Original Message -

From: "Chen Chen"  
To: "Joe Julian" , "Ashish Pandey"  
Cc: "Gluster Users"  
Sent: Friday, April 22, 2016 8:28:48 AM 
Subject: Re: [Gluster-users] Need some help on Mismatching xdata / Failed 
combine iatt / Too many fd 

Hi Ashish, 

Are you still watching this thread? I got no response after I sent the 
info you requested. Also, could anybody explain what heal-lock is doing? 

I got another inode lock yesterday. Only one lock occurred in the whole
12 bricks, yet it stopped the cluster from working again. None of my
peers' OSes are frozen, and this time "start force" worked.

-- 
[xlator.features.locks.mainvol-locks.inode] 
path=/NTD/variants_calling/primary_gvcf/A2612/13.g.vcf
 
mandatory=0 
inodelk-count=2 
lock-dump.domain.domain=mainvol-disperse-0 
inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid = 
1, owner=dc3dbfac887f, client=0x7f649835adb0, 
connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, 
granted at 2016-04-21 11:45:30 
inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid = 
1, owner=d433bfac887f, client=0x7f649835adb0, 
connection-id=hw10-6664-2016/04/17-14:47:58:6629-mainvol-client-0-0, 
blocked at 2016-04-21 11:45:33 
-- 

I've also filed a bug report on bugzilla. 
https://bugzilla.redhat.com/show_bug.cgi?id=1329466 

Best regards, 
Chen 

On 4/13/2016 10:31 PM, Joe Julian wrote: 
> 
> 
> On 04/13/2016 03:29 AM, Ashish Pandey wrote: 
>> Hi Chen, 
>> 
>> What do you mean by "instantly get inode locked and teared down 
>> the whole cluster" ? Do you mean that whole disperse volume became 
>> unresponsive? 
>> 
>> I don't have much idea about features.lock-heal so can't comment how 
>> can it help you. 
> 
> So who should get added to this email that would have an idea? Let's get 
> that person looped in. 
> 
>> 
>> Could you please explain second part of your mail? What exactly are 
>> you trying to do and what is the setup? 
>> Also volume info, logs statedumps might help. 
>> 
>> - 
>> Ashish 
>> 
>> 
>>  
>> *From: *"Chen Chen"  
>> *To: *"Ashish Pandey"  
>> *Cc: *gluster-users@gluster.org 
>> *Sent: *Wednesday, April 13, 2016 3:26:53 PM 
>> *Subject: *Re: [Gluster-users] Need some help on Mismatching xdata / 
>> Failed combine iatt / Too many fd 
>> 
>> Hi Ashish and other Gluster Users, 
>> 
>> When I put some heavy IO load onto my cluster (a rsync operation, 
>> ~600MB/s), one of the node instantly get inode locked and teared down 
>> the whole cluster. I've already turned on "features.lock-heal" but it 
>> didn't help. 
>> 
>> My clients are using a round-robin tactic to mount servers, hoping to
>> average out the pressure. Could it be caused by a race between NFS servers
>> on different nodes? Should I instead create a dedicated NFS server with
>> huge memory, no brick, and multiple Ethernet cables?
>> 
>> I really appreciate any help from you guys. 
>> 
>> Best wishes, 
>> Chen 
>> 
>> PS. I don't know why the native fuse client is 5 times slower than
>> good old NFSv3.
>> 
>> On 4/4/2016 6:11 PM, Ashish Pandey wrote: 
>> > Hi Chen, 
>> > 
>> > As I suspected, there are many blocked call for inodelk in 
>> sm11/mnt-disk1-mainvol.31115.dump.1459760675. 
>> > 
>> > = 
>> > [xlator.features.locks.mainvol-locks.inode] 
>> > path=/home/analyzer/softs/bin/GenomeAnalysisTK.jar 
>> > mandatory=0 
>> > inodelk-count=4 
>> > lock-dump.domain.domain=mainvol-disperse-0:self-heal 
>> > lock-dump.domain.domain=mainvol-disperse-0 
>> > inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid 
>> = 1, owner=dc2d3dfcc57f, client=0x7ff03435d5f0, 
>> connection-id=sm12-8063-2016/04/01-07:51:46:892384-mainvol-client-0-0-0, 
>> blocked at 2016-04-01 16:52:58, granted at 2016-04-01 16:52:58 
>> > inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, 
>> pid = 1, owner=1414371e1a7f, client=0x7ff034204490, 
>> connection-id=hw10-17315-2016/04/01-07:51:44:421807-mainvol-client-0-0-0, 
>> blocked at 2016-04-01 16:58:51 
>> > inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, 
>> pid = 1, owner=a8eb14cd9b7f, 

Re: [Gluster-users] disperse volume file to subvolume mapping

2016-04-22 Thread Xavier Hernandez
When you execute a rebalance 'force' the skipped column should be 0 for 
all nodes and all '-T' files must have disappeared. Otherwise 
something failed. Is this true in your case ?


On 21/04/16 15:19, Serkan Çoban wrote:

Same result. Also checked the rebalance.log file, it has also no
reference to part files...

On Thu, Apr 21, 2016 at 3:34 PM, Xavier Hernandez  wrote:

Can you try a 'gluster volume rebalance v0 start force' ?


On 21/04/16 14:23, Serkan Çoban wrote:


Has the rebalance operation finished successfully ? has it skipped any
files ?


Yes according to gluster v rebalance status it is completed without any
errors.
rebalance status report is like:
Node        Rebalanced-files   size     Scanned   failures   skipped
1.1.1.185   158                29GB     1720      0          314
1.1.1.205   93                 46.5GB   761       0          95
1.1.1.225   74                 37GB     779       0          94


All other hosts have 0 values.

I double-checked that the files with '-T' attributes are there;
maybe some of them were deleted, but I still see them in the bricks...
I am also concerned that the part files were not distributed to all 60
nodes. Shouldn't rebalance do that?

On Thu, Apr 21, 2016 at 1:55 PM, Xavier Hernandez 
wrote:


Hi Serkan,

On 21/04/16 12:39, Serkan Çoban wrote:



I started a gluster v rebalance v0 start command hoping that it will
equally redistribute files across 60 nodes but it did not do that...
why it did not redistribute files? any thoughts?




Has the rebalance operation finished successfully ? has it skipped any
files
?

After a successful rebalance all files with attributes '-T'
should
have disappeared.




On Thu, Apr 21, 2016 at 11:24 AM, Xavier Hernandez
 wrote:



Hi Serkan,

On 21/04/16 10:07, Serkan Çoban wrote:




I think the problem is in the temporary name that distcp gives to the
file while it's being copied before renaming it to the real name. Do
you
know what is the structure of this name ?




Distcp temporary file name format is:
".distcp.tmp.attempt_1460381790773_0248_m_01_0", and the same
temporary file name is used by one map process. For example, I see in the
logs that one map copies the files part-m-00031, part-m-00047 and part-m-00063
sequentially, and they all use the same temporary file name above. So no
original file name appears in the temporary file name.





This explains the problem. With the default options, DHT sends all
files
to
the subvolume that should store a file named 'distcp.tmp'.

With this temporary name format, little can be done.



I will check if we can modify distcp behaviour, or we have to write
our mapreduce procedures instead of using distcp.


2. define the option 'extra-hash-regex' to an expression that matches
your temporary file names and returns the same name that will finally
have.
Depending on the differences between original and temporary file
names,
this
option could be useless.
3. set the option 'rsync-hash-regex' to 'none'. This will prevent the
name conversion, so the files will be evenly distributed. However
this
will
cause a lot of files placed in incorrect subvolumes, creating a lot
of
link
files until a rebalance is executed.





How can I set these options?





You can set gluster options using:

gluster volume set   

for example:

gluster volume set v0 rsync-hash-regex none

Xavi






On Thu, Apr 21, 2016 at 10:00 AM, Xavier Hernandez
 wrote:




Hi Serkan,

I think the problem is in the temporary name that distcp gives to the
file
while it's being copied before renaming it to the real name. Do you
know
what is the structure of this name ?

DHT selects the subvolume (in this case the ec set) on which the file
will
be stored based on the name of the file. This has a problem when a
file
is
being renamed, because this could change the subvolume where the file
should
be found.

DHT has a feature to avoid incorrect file placements when executing
renames
for the rsync case. What it does is to check if the file matches the
following regular expression:

^\.(.+)\.[^.]+$

If a match is found, it only considers the part between parenthesis
to
calculate the destination subvolume.

This is useful for rsync because temporary file names are constructed
in
the
following way: suppose the original filename is 'test'. The temporary
filename while rsync is being executed is made by prepending a dot
and
appending '.': .test.712hd

As you can see, the original name and the part of the name between
parenthesis that matches the regular expression are the same. This
causes
that, after renaming the temporary file to its original filename,
both
files
will be considered to belong to the same subvolume by DHT.
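
A quick way to see which name DHT will actually hash under that regular
expression (sed is only used here to emulate the match):

    # rsync-style temporary name: the captured part equals the final name
    echo '.test.712hd' | sed -E 's/^\.(.+)\.[^.]+$/\1/'
    # prints: test

    # distcp-style temporary name: every file hashes as 'distcp.tmp'
    echo '.distcp.tmp.attempt_1460381790773_0248_m_01_0' | \
        sed -E 's/^\.(.+)\.[^.]+$/\1/'
    # prints: distcp.tmp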

In your case it's very probable that distcp uses a temporary name
like
'.part.'. In this case the portion of the name used to select
the
subvolume is always 'part'. This would explain why all