Re: [Gluster-users] Issue in Adding/Removing the gluster node

2016-02-22 Thread ABHISHEK PALIWAL
Hi Gaurav,

In my case we are removing the brick in the offline state with the force
option, in the following way:



*gluster volume remove-brick %s replica 1 %s:%s force --mode=script*
but we are still getting a failure on remove-brick.

It seems that the brick we are trying to remove is not present. Here are
the log snippets from both of the boards:


*1st board:*
# gluster volume info
status
gluster volume status c_glusterfs
Volume Name: c_glusterfs
Type: Replicate
Volume ID: 32793e91-6f88-4f29-b3e4-0d53d02a4b99
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.32.0.48:/opt/lvmdir/c2/brick
Brick2: 10.32.1.144:/opt/lvmdir/c2/brick
Options Reconfigured:
nfs.disable: on
network.ping-timeout: 4
performance.readdir-ahead: on
# gluster peer status
Number of Peers: 1

Hostname: 10.32.1.144
Uuid: b88c74b9-457d-4864-9fe6-403f6934d7d1
State: Peer in Cluster (Connected)
# gluster volume status c_glusterfs
Status of volume: c_glusterfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.32.0.48:/opt/lvmdir/c2/brick       49153     0          Y       2537
Self-heal Daemon on localhost               N/A       N/A        Y       5577
Self-heal Daemon on 10.32.1.144             N/A       N/A        Y       3850

Task Status of Volume c_glusterfs
------------------------------------------------------------------------------
There are no active volume tasks

*2nd Board*:

# gluster volume info
status
gluster volume status c_glusterfs
gluster volume heal c_glusterfs info

Volume Name: c_glusterfs
Type: Replicate
Volume ID: 32793e91-6f88-4f29-b3e4-0d53d02a4b99
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.32.0.48:/opt/lvmdir/c2/brick
Brick2: 10.32.1.144:/opt/lvmdir/c2/brick
Options Reconfigured:
performance.readdir-ahead: on
network.ping-timeout: 4
nfs.disable: on
# gluster peer status
Number of Peers: 1

Hostname: 10.32.0.48
Uuid: e7c4494e-aa04-4909-81c9-27a462f6f9e7
State: Peer in Cluster (Connected)
# gluster volume status c_glusterfs
Status of volume: c_glusterfs
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.32.0.48:/opt/lvmdir/c2/brick       49153     0          Y       2537
Self-heal Daemon on localhost               N/A       N/A        Y       3850
Self-heal Daemon on 10.32.0.48              N/A       N/A        Y       5577

Task Status of Volume c_glusterfs
------------------------------------------------------------------------------
There are no active volume tasks

Do you know why these outputs do not show the brick info in
"gluster volume status"? We ask because we are not able to collect the
cmd_history.log file from the 2nd board.

Regards,
Abhishek


On Tue, Feb 23, 2016 at 12:02 PM, Gaurav Garg  wrote:

> Hi abhishek,
>
> >> Can we perform a remove-brick operation on an offline brick? What is the
> meaning of an offline vs. online brick?
>
> No, you can't perform a remove-brick operation on an offline brick. A brick
> being offline means the brick process is not running. You can check this by
> executing #gluster volume status: if a brick is offline, it will show "N" in
> the Online column of that output. Alternatively, you can also check whether
> the glusterfsd process for that brick is running by executing
> #ps aux | grep glusterfsd; this lists all the brick processes, so you can
> see which ones are online and which are not.
>
> But if you want to perform a remove-brick operation on an offline brick,
> then you need to execute it with the force option: #gluster volume
> remove-brick <volname> hostname:/brick_name force. This might lead to data loss.
>
>
>
> >> Also, is there any way in gluster to check whether a node's connectivity
> is established before performing any operation on a brick?
>
> Yes, you can check it by executing #gluster peer status command.
>
>
> Thanks,
>
> ~Gaurav
>
>
> - Original Message -
> From: "ABHISHEK PALIWAL" 
> To: "Gaurav Garg" 
> Cc: gluster-users@gluster.org
> Sent: Tuesday, February 23, 2016 11:50:43 AM
> Subject: Re: [Gluster-users] Issue in Adding/Removing the gluster node
>
> Hi Gaurav,
>
> one general question related to gluster bricks.
>
> Can we perform a remove-brick operation on an offline brick? What is the
> meaning of an offline vs. online brick?
> Also, is there any way in gluster to check whether a node's connectivity is
> established before performing any operation on a brick?
>
> Regards,
> Abhishek
>
> On Mon, Feb 22, 2016 at 2:42 PM, Gaurav Garg  wrote:
>
> > Hi abhishek,
> >
> > I went through your logs of node 1 and by looking glusterd logs its
> > clearly indicate that your 2nd node (10.32.1.144) have disconnected 

Re: [Gluster-users] [ovirt-users] Gluster 3.7.8 from ovirt-3.6-glusterfs-epel breaks vdsm ?

2016-02-22 Thread Sahina Bose



On 02/18/2016 08:33 PM, Kaushal M wrote:

On Thu, Feb 18, 2016 at 6:39 PM, Matteo  wrote:

Seems that the "info" command is broken.

the status command for example seems to work ok.

Matteo

- Il 17-feb-16, alle 14:15, Sahina Bose sab...@redhat.com ha scritto:


[+gluster-users]

Any known compat issues with gluster 3.7.8-1.el7 client packages and
glusterfs 3.7.6-1.el7 server?


There might be some new information that the 3.7.8 CLI expects, but I'd
need to check to verify what it is. We don't really test the Gluster
CLI for backwards compatibility, as we expect the CLI to be used with
GlusterD on its own host. So when using the gluster CLI's
--remote-host option with different versions of the CLI and glusterd, the
user should expect some breakage at some point.

With REST support being planned for 3.8, I'd like to retire the
--remote-host option in its entirety. Going forward, users should
only use the REST APIs when attempting to do remote operations.



The --remote-host option is now used by vdsm to query volume info from 
remote gluster servers, check the replica count, and query 
the backup-volfile-servers that can be used for the mount.


I think  before this option is retired, we need to change vdsm to use 
the alternative REST API.






On 02/17/2016 04:33 PM, Matteo wrote:

Hi,

today I updated one node's OS; in the updates, the gluster client
packages were upgraded from 3.7.6-1.el7 (centos-ovirt36 repository)
to 3.7.8-1.el7 (ovirt-3.6-glusterfs-epel), and after reboot
the node was marked not operational.

looking into the logs, vdsm was failing to get gluster volume information.

the command (ovirt-storage is the gluster storage where the hosted engine is
kept)

gluster --mode=script volume info --remote-host=gluster1 ovirt-storage --xml

was failing, returning error 2 (and no output)
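
A quick way to confirm the failure mode is to check the exit status directly
(same command as above, hostname and volume as in my setup):

gluster --mode=script volume info --remote-host=gluster1 ovirt-storage --xml
echo "exit status: $?"    # 0 on success; 2 is what the 3.7.8 client returned here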

doing yum downgrade on gluster client packages (back to 3.7.6-1.el7,
centos-ovirt36) fixed everything.

Data nodes are running glusterfs 3.7.6-1.el7.

The funny thing is that from the oVirt node I was able to manually mount the
glusterfs shares; only the volume info command was failing, thus breaking vdsm.

Any hint?

regards,
Matteo



___
Users mailing list
us...@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Error between v3.7.8 and v3.7.0

2016-02-22 Thread Kaushal M
The GlusterFS network layer was changed to use unprivileged (>1024)
ports and allow incoming connections from unprivileged ports by
default in 3.7.3.

What this means is that clients/servers older than 3.7.3 will not
accept connections from newer clients/servers: 3.7.3 and above will
try to connect using unprivileged ports, which will be rejected by
<=3.7.2.

You can find more information on the issue, and workarounds at
https://www.gluster.org/pipermail/gluster-users/2015-August/023116.html
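
For reference, the workaround described there is roughly the following
sketch; the option names are the commonly documented ones, so please verify
them against your version before applying (volume name taken from this thread):

# on the older (<=3.7.2) servers, allow connections from unprivileged ports
gluster volume set RaidVolC server.allow-insecure on
# and add the following line to /etc/glusterfs/glusterd.vol on each server,
# then restart glusterd:
#   option rpc-auth-allow-insecure on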

~kaushal


On Sat, Feb 20, 2016 at 11:18 PM, Atin Mukherjee
 wrote:
> I do not see any mount related failures in the glusterd log you have pasted.
> Ideally if mount request fails it could be either the GlusterD is down or
> the brick processes are down. There'd be an error log entry from
> mgmt_getspec().
>
> The log entries do indicate that the n/w is unstable. If you are still stuck
> could you provide the mount log and glusterd log please along with gluster
> volume info output and mount command semantics?
>
> -Atin
> Sent from one plus one
>
> On 20-Feb-2016 4:21 pm, "Ml Ml"  wrote:
>>
>> Hello List,
>>
>> i am running ovirt (CentOS) on top of glusterfs. I have a 3 Node
>> replica. Versions see below.
>>
>> Looks like i can not get my node1 (v 3.7.8) together with the othet
>> two (v3.7.0). The error i get when i try to " mount -t glusterfs
>> 10.10.3.7:/RaidVolC /mnt/":
>>
>> [2016-02-20 10:27:30.890701] W [socket.c:869:__socket_keepalive]
>> 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 14, Invalid
>> argument
>> [2016-02-20 10:27:30.890728] E [socket.c:2965:socket_connect]
>> 0-management: Failed to set keep-alive: Invalid argument
>> [2016-02-20 10:27:30.891296] W [socket.c:588:__socket_rwv]
>> 0-management: readv on 10.10.3.7:24007 failed (No data available)
>> [2016-02-20 10:27:30.891671] E [rpc-clnt.c:362:saved_frames_unwind]
>> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7ff82c50bab2]
>> (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7ff82c2d68de]
>> (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7ff82c2d69ee]
>> (-->
>> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7ff82c2d837a]
>> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7ff82c2d8ba8] )
>> 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1))
>> called at 2016-02-20 10:27:30.891063 (xid=0x35)
>> The message "W [MSGID: 106118]
>> [glusterd-handler.c:5149:__glusterd_peer_rpc_notify] 0-management:
>> Lock not released for RaidVolC" repeated 3 times between [2016-02-20
>> 10:27:24.873207] and [2016-02-20 10:27:27.886916]
>> [2016-02-20 10:27:30.891704] E [MSGID: 106167]
>> [glusterd-handshake.c:2074:__glusterd_peer_dump_version_cbk]
>> 0-management: Error through RPC layer, retry again later
>> [2016-02-20 10:27:30.891871] W
>> [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>>
>> (-->/usr/lib64/glusterfs/3.7.8/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)
>> [0x7ff821062b9c]
>>
>> -->/usr/lib64/glusterfs/3.7.8/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162)
>> [0x7ff82106ce72]
>>
>> -->/usr/lib64/glusterfs/3.7.8/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a)
>> [0x7ff82110c73a] ) 0-management: Lock for vol RaidVolB not held
>> [2016-02-20 10:27:30.892001] W
>> [glusterd-locks.c:681:glusterd_mgmt_v3_unlock]
>>
>> (-->/usr/lib64/glusterfs/3.7.8/xlator/mgmt/glusterd.so(glusterd_big_locked_notify+0x4c)
>> [0x7ff821062b9c]
>>
>> -->/usr/lib64/glusterfs/3.7.8/xlator/mgmt/glusterd.so(__glusterd_peer_rpc_notify+0x162)
>> [0x7ff82106ce72]
>>
>> -->/usr/lib64/glusterfs/3.7.8/xlator/mgmt/glusterd.so(glusterd_mgmt_v3_unlock+0x58a)
>> [0x7ff82110c73a] ) 0-management: Lock for vol RaidVolC not held
>> The message "W [MSGID: 106118]
>> [glusterd-handler.c:5149:__glusterd_peer_rpc_notify] 0-management:
>> Lock not released for RaidVolB" repeated 3 times between [2016-02-20
>> 10:27:24.877923] and [2016-02-20 10:27:30.891888]
>> [2016-02-20 10:27:30.892023] W [MSGID: 106118]
>> [glusterd-handler.c:5149:__glusterd_peer_rpc_notify] 0-management:
>> Lock not released for RaidVolC
>> [2016-02-20 10:27:30.895617] W [socket.c:869:__socket_keepalive]
>> 0-socket: failed to set TCP_USER_TIMEOUT -1000 on socket 14, Invalid
>> argument
>> [2016-02-20 10:27:30.895641] E [socket.c:2965:socket_connect]
>> 0-management: Failed to set keep-alive: Invalid argument
>> [2016-02-20 10:27:30.896300] W [socket.c:588:__socket_rwv]
>> 0-management: readv on 10.10.1.6:24007 failed (No data available)
>> [2016-02-20 10:27:30.896541] E [rpc-clnt.c:362:saved_frames_unwind]
>> (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7ff82c50bab2]
>> (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7ff82c2d68de]
>> (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7ff82c2d69ee]
>> (-->
>> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7ff82c2d837a]
>> (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7ff82c2d8ba8] )

Re: [Gluster-users] Issue in Adding/Removing the gluster node

2016-02-22 Thread Gaurav Garg
Hi abhishek,

>> Can we perform a remove-brick operation on an offline brick? What is the
meaning of an offline vs. online brick?

No, you can't perform a remove-brick operation on an offline brick. A brick being 
offline means the brick process is not running. You can check this by executing 
#gluster volume status: if a brick is offline, it will show "N" in the Online 
column of that output. Alternatively, you can also check whether the glusterfsd 
process for that brick is running by executing #ps aux | grep glusterfsd; this 
lists all the brick processes, so you can see which ones are online and which are not.
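
For example, for the brick discussed in this thread (volume and path taken from 
your earlier mails), a rough check could look like:

# look for "Y" (online) or "N" (offline) in the Online column for this brick
gluster volume status c_glusterfs | grep "/opt/lvmdir/c2/brick"
# and confirm whether a glusterfsd process is serving that brick
ps aux | grep '[g]lusterfsd' | grep '/opt/lvmdir/c2/brick'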

But if you want to perform a remove-brick operation on an offline brick, then you 
need to execute it with the force option: #gluster volume remove-brick <volname> 
hostname:/brick_name force. This might lead to data loss.



>> Also, is there any way in gluster to check whether a node's connectivity is
established before performing any operation on a brick?

Yes, you can check it by executing #gluster peer status command.
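
As a minimal sketch, you could gate the brick operation on the peer being 
connected (IP, volume and brick path taken from this thread, adjust as needed):

PEER=10.32.1.144
if gluster peer status | grep -A2 "Hostname: $PEER" | grep -q "(Connected)"; then
    gluster volume remove-brick c_glusterfs replica 1 $PEER:/opt/lvmdir/c2/brick force --mode=script
else
    echo "peer $PEER is not connected; skipping brick operation"
fi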


Thanks,

~Gaurav


- Original Message -
From: "ABHISHEK PALIWAL" 
To: "Gaurav Garg" 
Cc: gluster-users@gluster.org
Sent: Tuesday, February 23, 2016 11:50:43 AM
Subject: Re: [Gluster-users] Issue in Adding/Removing the gluster node

Hi Gaurav,

one general question related to gluster bricks.

Can we perform a remove-brick operation on an offline brick? What is the
meaning of an offline vs. online brick?
Also, is there any way in gluster to check whether a node's connectivity is
established before performing any operation on a brick?

Regards,
Abhishek

On Mon, Feb 22, 2016 at 2:42 PM, Gaurav Garg  wrote:

> Hi abhishek,
>
> I went through your logs of node 1 and by looking glusterd logs its
> clearly indicate that your 2nd node (10.32.1.144) have disconnected from
> the cluster, because of that remove-brick operation failed. I think you
> need to check your network interface.
>
> But surprising things is that i did not see duplicate peer entry in
> #gluster peer status command output.
>
> May be i will get some more information from your (10.32.1.144) 2nd node
> logs. Could you also attach your 2nd node logs.
>
> after restarting glusterd, are you seeing duplicate peer entry in #gluster
> peer status command output ?
>
> will wait for 2nd node logs for further analyzing duplicate peer entry
> problem.
>
> Thanks,
>
> ~Gaurav
>
> - Original Message -
> From: "ABHISHEK PALIWAL" 
> To: "Gaurav Garg" 
> Cc: gluster-users@gluster.org
> Sent: Monday, February 22, 2016 12:48:55 PM
> Subject: Re: [Gluster-users] Issue in Adding/Removing the gluster node
>
> Hi Gaurav,
>
> Here, You can find the attached logs for the boards in case of remove-brick
> failure.
> In these logs we do not have the cmd_history and
> etc-glusterfs-glusterd.vol.log for the second board.
>
> May be for that we need to some more time.
>
>
> Regards,
> Abhishek
>
> On Mon, Feb 22, 2016 at 10:18 AM, Gaurav Garg  wrote:
>
> > Hi Abhishek,
> >
> > >>  I'll provide the required log to you.
> >
> > sure
> >
> > on both node. do "pkill glusterd" and then start glusterd services.
> >
> > Thanks,
> >
> > ~Gaurav
> >
> > - Original Message -
> > From: "ABHISHEK PALIWAL" 
> > To: "Gaurav Garg" 
> > Cc: gluster-users@gluster.org
> > Sent: Monday, February 22, 2016 10:11:48 AM
> > Subject: Re: [Gluster-users] Issue in Adding/Removing the gluster node
> >
> > Hi Gaurav,
> >
> > Thanks for your prompt reply.
> >
> > I'll provide the required log to you.
> >
> > As a workaround you suggested that restart the glusterd service. Could
> you
> > please tell me the point where I can do this?
> >
> > Regards,
> > Abhishek
> >
> > On Fri, Feb 19, 2016 at 6:11 PM, Gaurav Garg  wrote:
> >
> > > Hi Abhishek,
> > >
> > > Peer status output looks interesting where it have stale entry,
> > > technically it should not happen. Here few thing need to ask
> > >
> > > Did you perform any manual operation with GlusterFS configuration file
> > > which resides in /var/lib/glusterd/* folder.
> > >
> > > Can you provide output of "ls /var/lib/glusterd/peers"  from both of
> your
> > > nodes.
> > >
> > > Could you provide output of #gluster peer status command when 2nd node
> is
> > > down
> > >
> > > Can you provide output of #gluster volume info command
> > >
> > > Can you provide full logs details of cmd_history.log and
> > > etc-glusterfs-glusterd.vol.log from both the nodes.
> > >
> > >
> > > You can restart your glusterd as of now as a workaround but we need to
> > > analysis this issue further.
> > >
> > > Thanks,
> > > Gaurav
> > >
> > > - Original Message -
> > > From: "ABHISHEK PALIWAL" 
> > > To: "Gaurav Garg" 
> > 

Re: [Gluster-users] Gluster Community Newsletter, February 2016

2016-02-22 Thread Kaushal M
On Tue, Feb 23, 2016 at 1:12 AM, Amye Scavarda  wrote:
> What a busy month this past month for Gluster!
> We’ve got updates from SCaLE, FOSDEM, our Developer Gatherings in
> Brno, DevConf, noteworthy threads from the mailing lists, and upcoming
> events.
> This post is also available on the Gluster blog:
> http://blog.gluster.org/2016/02/gluster-community-newsletter-february-2015/
>
> From SCaLE:
> - Richard Wareing gave a talk at the Southern California Linux Expo
> about Scaling Gluster at Facebook. More at
> https://blog.gluster.org/2016/01/scaling-glusterfs-facebook/
>
> From FOSDEM:
> Humble and Kaushal have posted thoughts on FOSDEM
> http://website-humblec.rhcloud.com/me-fosdem-2016/
> https://kshlm.in/fosdem16/
>
> From Developer Gatherings:
> We had a group of developers gather in Brno ahead of DevConf to
> discuss a number of different Gluster related things.
> Highlights were:
> GD2 with Kaushal - https://public.pad.fsfe.org/p/gluster-gd2-kaushal

We discussed volgen for GD2 in this meeting. I've put up a summary of
the discussion and outcomes in the etherpad. I'll be putting up the
same as a design spec into glusterfs-spec soon and keep it updated as
we progress.

> Heketi & Eventing with Luis - https://public.pad.fsfe.org/p/gluster-heketi
> DHT2 with Venky-  https://public.pad.fsfe.org/p/gluster-4.0-dht2
>
> From DevConf
> Ceph vs Gluster vs Swift: Similarities and Differences
> https://devconfcz2016.sched.org/event/5lze/ceph-vs-gluster-vs-swift-similarities-and-differences
> Prashanth Pai, Thiago da Silva
>
> Automated GlusterFS Volume Management with Heketi
> https://devconfcz2016.sched.org/event/5m0P/automated-glusterfs-volume-management-with-
> heketi   - Luis Pabon
> NFS-Ganesha and Distributed Storage Systems -
> https://devconfcz2016.sched.org/event/5m15/nfs-ganesha-and-distributed-storage-systems
> -   Kaleb S. Keithley
>
> Build your own Scale-Out Storage with Gluster
> https://devconfcz2016.sched.org/event/5m1X/build-your-own-scale-out-storage-with-gluster
> - Niels de Vos
>
> Freak show (#2): CTDB -- Scaling The Aliens Back To Outer Space
> https://devconfcz2016.sched.org/event/5m1l/freak-show-2-ctdb-scaling-the-aliens-back-to-outer-space
> Günther Deschner, Michael Adam
>
> oVirt and Gluster Hyperconvergence
> https://devconfcz2016.sched.org/event/5m20/ovirt-and-gluster-hyperconvergence
> Ramesh Nachimuthu
>
> Improvements in gluster for virtualization usecase
> https://devconfcz2016.sched.org/event/5m1p/improvements-in-gluster-for-virtualization-usecase
> Prasanna Kumar Kalever
>
> Test Automation and CI using DiSTAF
> https://devconfcz2016.sched.org/event/5m1U/test-automation-and-ci-using-distaf
> Vishwanath Bhat
> Gluster Developer Gatherings at Brno before DevConf
>
>
>  Noteworthy threads:
>  Soumya Koduri investigates the issue of memory leaks in GlusterFS FUSE
> client and suggests a re-run after application of a few specific
> patches. More at
> .
> Oleksandr reported that it did not make an impact; Xavier confirmed a
> similar issue with 3.7.6 release
> ().
> The thread (at 
> )
> is a good read around the topic of understanding how to work through
> diagnosis and fixes of memory leaks.
>
> Sachidananda provided an update about gdeploy v2.0, including design changes
> to enable modularity and separation of core functionality into
> self-contained units.
>
> Kyle Harris reported an issue around high I/O and processor
> utilization 
> ().
> Ravishankar, Krutika and Pranith worked with the reporter to identify
> specific ways to address the topic. Pranith indicated that a 3.7.7
> release is coming up soonest
> ()
>
> A query was raised about the 3.6.8 release notes
> ()
> and a suggestion to include them at
>  Niels responded
> stating that the notes should be part of the repository at
> 
> and added the release manager to provide additional detail
>
> Vijay provided an update around the changes being discussed for 3.8
> ().
> The maintainers feel it is worthwhile to include some of the key
> features for 4.0 eg. NSR, dht2, glusterd2.0 as experimental in the
> release; ensure a better component coverage for tests in distaf; add a
> forward compatibility section to all the 

Re: [Gluster-users] Issue in Adding/Removing the gluster node

2016-02-22 Thread ABHISHEK PALIWAL
Hi Gaurav,

one general question related to gluster bricks.

Can we perform a remove-brick operation on an offline brick? What is the
meaning of an offline vs. online brick?
Also, is there any way in gluster to check whether a node's connectivity is
established before performing any operation on a brick?

Regards,
Abhishek

On Mon, Feb 22, 2016 at 2:42 PM, Gaurav Garg  wrote:

> Hi abhishek,
>
> I went through your logs of node 1 and by looking glusterd logs its
> clearly indicate that your 2nd node (10.32.1.144) have disconnected from
> the cluster, because of that remove-brick operation failed. I think you
> need to check your network interface.
>
> But surprising things is that i did not see duplicate peer entry in
> #gluster peer status command output.
>
> May be i will get some more information from your (10.32.1.144) 2nd node
> logs. Could you also attach your 2nd node logs.
>
> after restarting glusterd, are you seeing duplicate peer entry in #gluster
> peer status command output ?
>
> will wait for 2nd node logs for further analyzing duplicate peer entry
> problem.
>
> Thanks,
>
> ~Gaurav
>
> - Original Message -
> From: "ABHISHEK PALIWAL" 
> To: "Gaurav Garg" 
> Cc: gluster-users@gluster.org
> Sent: Monday, February 22, 2016 12:48:55 PM
> Subject: Re: [Gluster-users] Issue in Adding/Removing the gluster node
>
> Hi Gaurav,
>
> Here, You can find the attached logs for the boards in case of remove-brick
> failure.
> In these logs we do not have the cmd_history and
> etc-glusterfs-glusterd.vol.log for the second board.
>
> May be for that we need to some more time.
>
>
> Regards,
> Abhishek
>
> On Mon, Feb 22, 2016 at 10:18 AM, Gaurav Garg  wrote:
>
> > Hi Abhishek,
> >
> > >>  I'll provide the required log to you.
> >
> > sure
> >
> > on both node. do "pkill glusterd" and then start glusterd services.
> >
> > Thanks,
> >
> > ~Gaurav
> >
> > - Original Message -
> > From: "ABHISHEK PALIWAL" 
> > To: "Gaurav Garg" 
> > Cc: gluster-users@gluster.org
> > Sent: Monday, February 22, 2016 10:11:48 AM
> > Subject: Re: [Gluster-users] Issue in Adding/Removing the gluster node
> >
> > Hi Gaurav,
> >
> > Thanks for your prompt reply.
> >
> > I'll provide the required log to you.
> >
> > As a workaround you suggested that restart the glusterd service. Could
> you
> > please tell me the point where I can do this?
> >
> > Regards,
> > Abhishek
> >
> > On Fri, Feb 19, 2016 at 6:11 PM, Gaurav Garg  wrote:
> >
> > > Hi Abhishek,
> > >
> > > Peer status output looks interesting where it have stale entry,
> > > technically it should not happen. Here few thing need to ask
> > >
> > > Did you perform any manual operation with GlusterFS configuration file
> > > which resides in /var/lib/glusterd/* folder.
> > >
> > > Can you provide output of "ls /var/lib/glusterd/peers"  from both of
> your
> > > nodes.
> > >
> > > Could you provide output of #gluster peer status command when 2nd node
> is
> > > down
> > >
> > > Can you provide output of #gluster volume info command
> > >
> > > Can you provide full logs details of cmd_history.log and
> > > etc-glusterfs-glusterd.vol.log from both the nodes.
> > >
> > >
> > > You can restart your glusterd as of now as a workaround but we need to
> > > analysis this issue further.
> > >
> > > Thanks,
> > > Gaurav
> > >
> > > - Original Message -
> > > From: "ABHISHEK PALIWAL" 
> > > To: "Gaurav Garg" 
> > > Cc: gluster-users@gluster.org
> > > Sent: Friday, February 19, 2016 5:27:21 PM
> > > Subject: Re: [Gluster-users] Issue in Adding/Removing the gluster node
> > >
> > > Hi Gaurav,
> > >
> > > After the failure of add-brick following is outcome "gluster peer
> status"
> > > command
> > >
> > > Number of Peers: 2
> > >
> > > Hostname: 10.32.1.144
> > > Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
> > > State: Peer in Cluster (Connected)
> > >
> > > Hostname: 10.32.1.144
> > > Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
> > > State: Peer in Cluster (Connected)
> > >
> > > Regards,
> > > Abhishek
> > >
> > > On Fri, Feb 19, 2016 at 5:21 PM, ABHISHEK PALIWAL <
> > abhishpali...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi Gaurav,
> > > >
> > > > Both are the board connect through the backplane using ethernet.
> > > >
> > > > Even this inconsistency also occurs when I am trying to bringing back
> > the
> > > > node in slot. Means some time add-brick executes without failure but
> > some
> > > > time following error occurs.
> > > >
> > > > volume add-brick c_glusterfs replica 2 10.32.1.144:
> > /opt/lvmdir/c2/brick
> > > > force : FAILED : Another transaction is in progress for c_glusterfs.
> > > Please
> > > > try again after sometime.
> > > >
> > > >
> > > > You can also see the attached logs for add-brick failure scenario.
> > > >
> > > > Please let me know if you need 

Re: [Gluster-users] geo-rep: remote operation failed - No such file or directory

2016-02-22 Thread Milind Changire
ML,
You will have to search for the gfid c4b19f1c-cc18-4727-87a4-18de8fe0089e
at the master cluster brick back-ends and run the following command for
that specific file on the master cluster to force triggering a data sync [1]

# setfattr -n glusterfs.geo-rep.trigger-sync <file-path>

To search for the file at the brick back-end:

# find <brick-path>/.glusterfs -name c4b19f1c-cc18-4727-87a4-18de8fe0089e

Once path to the file is found at any of the bricks, you can then use
the setfattr command described above.
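
Putting the two steps together, a rough sketch would be the following (the 
master brick path here is an assumption, adjust it to your layout):

BRICK=/data/myvolume/brick
GFID=c4b19f1c-cc18-4727-87a4-18de8fe0089e
# 1. locate the gfid entry on the brick back-end
GFID_PATH=$(find $BRICK/.glusterfs -name $GFID)
# 2. for a regular file the gfid entry is a hard link; resolve the real path on the brick
REAL_PATH=$(find $BRICK -samefile "$GFID_PATH" -not -path "*/.glusterfs/*")
echo "gfid $GFID -> $REAL_PATH"
# 3. trigger an explicit sync of that file in geo-replication
setfattr -n glusterfs.geo-rep.trigger-sync "$REAL_PATH"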

Reference:
[1] feature/changelog: Virtual xattr to trigger explicit sync in geo-rep
http://review.gluster.org/#/c/9337/
--
Milind

- Original Message -
From: "ML mail" 
To: "Milind Changire" 
Cc: "Gluster-users" 
Sent: Monday, February 22, 2016 9:10:56 PM
Subject: Re: [Gluster-users] geo-rep: remote operation failed - No such file or 
directory

Hi Milind,

Thanks for the suggestion, I did that for a few problematic files and it seems 
to continue but now I am stuck at the following error message on the slave:

[2016-02-22 15:21:30.451133] W [MSGID: 114031] 
[client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-myvolume-geo-client-0: remote 
operation failed. Path:  
(c4b19f1c-cc18-4727-87a4-18de8fe0089e) [No such file or directory]

As you can see this message does not include any file or directory name, so I 
can't go and delete that file or directory. Any other ideas on how I may proceed 
here?

Or maybe would it be easier if I delete the whole directory which I think is 
affected and start geo-rep from there? Or will this mess things up?

Regards
ML



On Monday, February 22, 2016 12:12 PM, Milind Changire  
wrote:
ML,
You could try deleting problematic files on slave to recover geo-replication
from Faulty state.

However, changelogs generated due to logrotate scenario will still cause
geo-replication to go into Faulty state frequently if geo-replication
fails and restarts.

The patches mentioned in an earlier mail are being worked upon and finalized.
They will be available soon in a release which will avoid geo-replication
going into a Faulty state.

--
Milind


- Original Message -
From: "ML mail" 
To: "Milind Changire" , "Gluster-users" 

Sent: Monday, February 22, 2016 1:27:14 PM
Subject: Re: [Gluster-users] geo-rep: remote operation failed - No such file or 
   directory

Hi Milind,

Any news on this issue? I was wondering how can I fix and restart my 
geo-replication? Can I simply delete the problematic file(s) on my slave and 
restart geo-rep?

Regards
ML





On Wednesday, February 17, 2016 4:30 PM, ML mail  wrote:


Hi Milind,

Thank you for your short analysis. Indeed that's exactly what happens, as soon 
as I restart geo-rep it replays the same over and over as it does not succeed. 


Now regarding the sequence of the file management operations I am not totally 
sure how it works but I can tell you that we are using ownCloud v8.2.2 
(www.owncloud.org) and as storage for this cloud software we use GlusterFS. So 
it is very probable that ownCloud works like that: when a user uploads a new 
file it first creates it with another temporary name, which it then either 
renames or moves after successful upload. 


I have the feeling this issue is related to my initial issue which I have 
reported earlier this month: 
https://www.gluster.org/pipermail/gluster-users/2016-February/025176.html

For now my question would be how do I get to restart geo-replication 
succesfully?

Regards
ML



On Wednesday, February 17, 2016 4:10 PM, Milind Changire  
wrote:


As per the slave logs, there is an attempt to RENAME files
i.e. a .part file getting renamed to a name without the
.part suffix

Just restarting geo-rep isn't going to help much if
you've already hit the problem. Since the last CHANGELOG
is replayed by geo-rep on a restart, you'll most probably
encounter the same log messages in the logs.

Are the .part files CREATEd, RENAMEd and DELETEd with the
same name often? Are the operations somewhat in the following
sequence that happen on the geo-replication master cluster?

CREATE f1.part
RENAME f1.part f1
DELETE f1
CREATE f1.part
RENAME f1.part f1
...
...


If not, then it would help if you could send the sequence
of file management operations.

--
Milind


- Original Message -
From: "Kotresh Hiremath Ravishankar" 
To: "ML mail" 
Cc: "Gluster-users" , "Milind Changire" 

Sent: Tuesday, February 16, 2016 6:28:21 PM
Subject: Re: [Gluster-users] geo-rep: remote operation failed - No such file or 
   directory

Ccing Milind, he would be able to help

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "ML mail" 
> To: "Gluster-users" 
> Sent: Monday, February 

Re: [Gluster-users] glusterfs client crashes

2016-02-22 Thread Dj Merrill

On 2/21/2016 2:23 PM, Dj Merrill wrote:
> Very interesting.  They were reporting both bricks offline, but the
> processes on both servers were still running.  Restarting glusterfsd on
> one of the servers brought them both back online.

I realize I wasn't clear in my comments yesterday and would like to 
elaborate on this a bit further. The "very interesting" comment was 
sparked because when we were running 3.7.6, the bricks were not 
reporting as offline when a client was having an issue, so this is new 
behaviour now that we are running 3.7.8 (or a different issue entirely).


The other point that I was not clear on is that we may have one client 
reporting the "Transport endpoint is not connected" error, but the other 
40+ clients all continue to work properly. This is the case with both 
3.7.6 and 3.7.8.


Curious, how can the other clients continue to work fine if both Gluster 
3.7.8 servers are reporting the bricks as offline?


What does "offline" mean in this context?


Re: the server logs, here is what I've found so far listed on both 
gluster servers (glusterfs1 and glusterfs2):


[2016-02-21 08:06:02.785788] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 
0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:20.677010] W [socket.c:588:__socket_rwv] 
0-gv0-client-1: readv on (sanitized IP of glusterfs2):49152 failed (No 
data available)
[2016-02-21 18:48:20.677096] I [MSGID: 114018] 
[client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from 
gv0-client-1. Client process will keep trying to connect to glusterd 
until brick's port is available
[2016-02-21 18:48:31.148564] E [MSGID: 114058] 
[client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-1: 
failed to get the port number for remote subvolume. Please run 'gluster 
volume status' on server to see if brick process is running.
[2016-02-21 18:48:40.941715] W [socket.c:588:__socket_rwv] 0-glusterfs: 
readv on (sanitized IP of glusterfs2):24007 failed (No data available)
[2016-02-21 18:48:51.184424] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 
0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:51.972068] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 
0-mgmt: Volume file changed
[2016-02-21 18:48:51.980210] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 
0-mgmt: Volume file changed
[2016-02-21 18:48:51.985211] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 
0-mgmt: Volume file changed
[2016-02-21 18:48:51.995002] I [glusterfsd-mgmt.c:58:mgmt_cbk_spec] 
0-mgmt: Volume file changed
[2016-02-21 18:48:53.006079] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 
0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:53.018104] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 
0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:53.024060] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 
0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:53.035170] I [glusterfsd-mgmt.c:1596:mgmt_getspec_cbk] 
0-glusterfs: No change in volfile, continuing
[2016-02-21 18:48:53.045637] I [rpc-clnt.c:1847:rpc_clnt_reconfig] 
0-gv0-client-1: changing port to 49152 (from 0)
[2016-02-21 18:48:53.051991] I [MSGID: 114057] 
[client-handshake.c:1437:select_server_supported_programs] 
0-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2016-02-21 18:48:53.052439] I [MSGID: 114046] 
[client-handshake.c:1213:client_setvolume_cbk] 0-gv0-client-1: Connected 
to gv0-client-1, attached to remote volume '/export/brick1/sdb1'.
[2016-02-21 18:48:53.052486] I [MSGID: 114047] 
[client-handshake.c:1224:client_setvolume_cbk] 0-gv0-client-1: Server 
and Client lk-version numbers are not same, reopening the fds
[2016-02-21 18:48:53.052668] I [MSGID: 114035] 
[client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-1: 
Server lk version = 1
[2016-02-21 18:48:31.148706] I [MSGID: 114018] 
[client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from 
gv0-client-1. Client process will keep trying to connect to glusterd 
until brick's port is available
[2016-02-21 18:49:12.271865] W [socket.c:588:__socket_rwv] 0-glusterfs: 
readv on (sanitized IP of glusterfs2):24007 failed (No data available)
[2016-02-21 18:49:15.637745] W [socket.c:588:__socket_rwv] 
0-gv0-client-1: readv on (sanitized IP of glusterfs2):49152 failed (No 
data available)
[2016-02-21 18:49:15.637824] I [MSGID: 114018] 
[client.c:2030:client_rpc_notify] 0-gv0-client-1: disconnected from 
gv0-client-1. Client process will keep trying to connect to glusterd 
until brick's port is available
[2016-02-21 18:49:24.198431] E [socket.c:2278:socket_connect_finish] 
0-glusterfs: connection to (sanitized IP of glusterfs2):24007 failed 
(Connection refused)
[2016-02-21 18:49:26.204811] E [socket.c:2278:socket_connect_finish] 
0-gv0-client-1: connection to (sanitized IP of glusterfs2):24007 failed 
(Connection refused)
[2016-02-21 18:49:38.366559] I [MSGID: 108031] 
[afr-common.c:1883:afr_local_discovery_cbk] 0-gv0-replicate-0: selecting 
local read_child gv0-client-0

[Gluster-users] Gluster Community Newsletter, February 2016

2016-02-22 Thread Amye Scavarda
What a busy month this past month for Gluster!
We’ve got updates from SCaLE, FOSDEM, our Developer Gatherings in
Brno, DevConf, noteworthy threads from the mailing lists, and upcoming
events.
This post is also available on the Gluster blog:
http://blog.gluster.org/2016/02/gluster-community-newsletter-february-2015/

From SCaLE:
- Richard Wareing gave a talk at the Southern California Linux Expo
about Scaling Gluster at Facebook. More at
https://blog.gluster.org/2016/01/scaling-glusterfs-facebook/

From FOSDEM:
Humble and Kaushal have posted thoughts on FOSDEM
http://website-humblec.rhcloud.com/me-fosdem-2016/
https://kshlm.in/fosdem16/

From Developer Gatherings:
We had a group of developers gather in Brno ahead of DevConf to
discuss a number of different Gluster related things.
Highlights were:
GD2 with Kaushal - https://public.pad.fsfe.org/p/gluster-gd2-kaushal
Heketi & Eventing with Luis - https://public.pad.fsfe.org/p/gluster-heketi
DHT2 with Venky-  https://public.pad.fsfe.org/p/gluster-4.0-dht2

From DevConf
Ceph vs Gluster vs Swift: Similarities and Differences
https://devconfcz2016.sched.org/event/5lze/ceph-vs-gluster-vs-swift-similarities-and-differences
Prashanth Pai, Thiago da Silva

Automated GlusterFS Volume Management with Heketi
https://devconfcz2016.sched.org/event/5m0P/automated-glusterfs-volume-management-with-
heketi   - Luis Pabon
NFS-Ganesha and Distributed Storage Systems -
https://devconfcz2016.sched.org/event/5m15/nfs-ganesha-and-distributed-storage-systems
-   Kaleb S. Keithley

Build your own Scale-Out Storage with Gluster
https://devconfcz2016.sched.org/event/5m1X/build-your-own-scale-out-storage-with-gluster
- Niels de Vos

Freak show (#2): CTDB -- Scaling The Aliens Back To Outer Space
https://devconfcz2016.sched.org/event/5m1l/freak-show-2-ctdb-scaling-the-aliens-back-to-outer-space
Günther Deschner, Michael Adam

oVirt and Gluster Hyperconvergence
https://devconfcz2016.sched.org/event/5m20/ovirt-and-gluster-hyperconvergence
Ramesh Nachimuthu

Improvements in gluster for virtualization usecase
https://devconfcz2016.sched.org/event/5m1p/improvements-in-gluster-for-virtualization-usecase
Prasanna Kumar Kalever

Test Automation and CI using DiSTAF
https://devconfcz2016.sched.org/event/5m1U/test-automation-and-ci-using-distaf
Vishwanath Bhat
Gluster Developer Gatherings at Brno before DevConf


 Noteworthy threads:
 Soumya Koduri investigates the issue of memory leaks in GlusterFS FUSE
client and suggests a re-run after application of a few specific
patches. More at
.
Oleksandr reported that it did not make an impact; Xavier confirmed a
similar issue with 3.7.6 release
().
The thread (at 
)
is a good read around the topic of understanding how to work through
diagnosis and fixes of memory leaks.

Sachidananda provided an update about gdeploy v2.0, including design changes
to enable modularity and separation of core functionality into
self-contained units.

Kyle Harris reported an issue around high I/O and processor utilization.
Ravishankar, Krutika and Pranith worked with the reporter to identify
specific ways to address the topic. Pranith indicated that a 3.7.7
release is coming up soonest
()

A query was raised about the 3.6.8 release notes
() and a suggestion about where to include them. Niels responded
stating that the notes should be part of the repository
and added the release manager to provide additional detail

Vijay provided an update around the changes being discussed for 3.8
().
The maintainers feel it is worthwhile to include some of the key
features for 4.0 eg. NSR, dht2, glusterd2.0 as experimental in the
release; ensure a better component coverage for tests in distaf; add a
forward compatibility section to all the feature pages proposed for
3.8 in order to facilitate review for the Gluster.next features. In
the same mail Vijay proposed that Niels de Vos would be the maintainer
for the 3.8 release. And lastly, the projected GA date for 3.8 is now
set to end-May or, early-June 2016.

Ramesh Nachimuthu linked to a blog around designing HyperConverged
Infrastructure using oVirt 3.6 and Gluster 3.7.6
().
More at 

Re: [Gluster-users] [Gluster-devel] 3.7.8 client is slow

2016-02-22 Thread David Robinson

done

-- Original Message --
From: "Oleksandr Natalenko" 
To: gluster-users@gluster.org; "David Robinson" 


Cc: "Gluster Devel" 
Sent: 2/22/2016 1:12:01 PM
Subject: Re: [Gluster-devel] 3.7.8 client is slow


David,

could you please cross-post your observations to the following 
bugreport:


https://bugzilla.redhat.com/show_bug.cgi?id=1309462

?

It seems you have faced similar issue.

On понеділок, 22 лютого 2016 р. 16:46:01 EET David Robinson wrote:

 The 3.7.8 FUSE client is significantly slower than 3.7.6.  Is this
 related to some of the fixes that were done to correct memory leaks?  
Is

 there anything that I can do to recover the performance of 3.7.6?

 My testing involved creating a "bigfile" that is 20GB.  I then 
installed

 the 3.6.6 FUSE client and tested the copy of the bigfile from one
 gluster machine to another.  The test was repeated 2x to make sure 
cache

 wasn't affect performance.

 Using Centos7.1
 FUSE 3.6.6 took 47-seconds and 38-seconds.
 FUSE 3.7.6 took 43-seconds and 34-seconds.
 FUSE 3.7.8 took 205-seconds and 224-seconds

 I repeated the test on another machine that is running centos 6.7 and
 the results were even worse.  98-seconds for FUSE 3.6.6 versus
 575-seconds for FUSE 3.7.8.

 My server setup is:

 Volume Name: gfsbackup
 Type: Distribute
 Volume ID: 29b8fae9-dfbf-4fa4-9837-8059a310669a
 Status: Started
 Number of Bricks: 2
 Transport-type: tcp
 Bricks:
 Brick1: ffib01bkp:/data/brick01/gfsbackup
 Brick2: ffib01bkp:/data/brick02/gfsbackup
 Options Reconfigured:
 performance.readdir-ahead: on
 cluster.rebal-throttle: aggressive
 diagnostics.client-log-level: WARNING
 diagnostics.brick-log-level: WARNING
 changelog.changelog: off
 client.event-threads: 8
 server.event-threads: 8

 David



 



 David F. Robinson, Ph.D.

 President - Corvid Technologies

 145 Overhill Drive

 Mooresville, NC 28117

 704.799.6944 x101   [Office]

 704.252.1310   [Cell]

 704.799.7974   [Fax]

 david.robin...@corvidtec.com

 http://www.corvidtec.com





___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] 3.7.8 client is slow

2016-02-22 Thread Oleksandr Natalenko
David,

could you please cross-post your observations to the following bugreport:

https://bugzilla.redhat.com/show_bug.cgi?id=1309462

?

It seems you have faced similar issue.

On понеділок, 22 лютого 2016 р. 16:46:01 EET David Robinson wrote:
> The 3.7.8 FUSE client is significantly slower than 3.7.6.  Is this
> related to some of the fixes that were done to correct memory leaks?  Is
> there anything that I can do to recover the performance of 3.7.6?
> 
> My testing involved creating a "bigfile" that is 20GB.  I then installed
> the 3.6.6 FUSE client and tested the copy of the bigfile from one
> gluster machine to another.  The test was repeated 2x to make sure cache
> wasn't affect performance.
> 
> Using Centos7.1
> FUSE 3.6.6 took 47-seconds and 38-seconds.
> FUSE 3.7.6 took 43-seconds and 34-seconds.
> FUSE 3.7.8 took 205-seconds and 224-seconds
> 
> I repeated the test on another machine that is running centos 6.7 and
> the results were even worse.  98-seconds for FUSE 3.6.6 versus
> 575-seconds for FUSE 3.7.8.
> 
> My server setup is:
> 
> Volume Name: gfsbackup
> Type: Distribute
> Volume ID: 29b8fae9-dfbf-4fa4-9837-8059a310669a
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: ffib01bkp:/data/brick01/gfsbackup
> Brick2: ffib01bkp:/data/brick02/gfsbackup
> Options Reconfigured:
> performance.readdir-ahead: on
> cluster.rebal-throttle: aggressive
> diagnostics.client-log-level: WARNING
> diagnostics.brick-log-level: WARNING
> changelog.changelog: off
> client.event-threads: 8
> server.event-threads: 8
> 
> David
> 
> 
> 
> 
> 
> 
> 
> David F. Robinson, Ph.D.
> 
> President - Corvid Technologies
> 
> 145 Overhill Drive
> 
> Mooresville, NC 28117
> 
> 704.799.6944 x101   [Office]
> 
> 704.252.1310   [Cell]
> 
> 704.799.7974   [Fax]
> 
> david.robin...@corvidtec.com
> 
> http://www.corvidtec.com


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Establishing Connection (Connected)??

2016-02-22 Thread Atin Mukherjee
-Atin
Sent from one plus one
On 22-Feb-2016 10:52 pm, "Marcos Renato da Silva Junior" <
marco...@dee.feis.unesp.br> wrote:
>
> Hi,
>
> Solved using the link :
>
>
http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected
Great to know that; however, I was wondering how you ended up in that
state :)
>
> Thanks.
>
>
> Em 22-02-2016 00:36, Atin Mukherjee escreveu:
>>
>> Could you attach glusterd.log along with cmd_history.log file of all the
>> nodes? Output of gluster volume status & gluster volume info would be
>> also be helpful here.
>>
>> ~Atin
>>
>> On 02/20/2016 12:20 AM, Marcos Renato da Silva Junior wrote:
>>>
>>> Hi,
>>>
>>>
>>> One of my nodes show "gluster peer status" :
>>>
>>>
>>> Number of Peers: 3
>>>
>>> Hostname: 200.145.239.172
>>> Uuid: 2f3aac03-6b27-4572-8edd-48fbf53b7883
>>> State: Peer in Cluster (Connected)
>>>
>>> Hostname: 200.145.239.172
>>> Uuid: 2f3aac03-6b27-4572-8edd-48fbf53b7883
>>> State: Establishing Connection (Connected)
>>> Other names:
>>> node1
>>>
>>> Hostname: servpos4
>>> Uuid: b712284c-5b42-4c0e-b67c-a908aa47bd3c
>>> State: Peer in Cluster (Connected)
>>>
>>>
>>> Same address but one "State: Peer in Cluster" another "Establishing
>>> Connection (Connected)".
>>>
>>> Its works ok.
>>>
>
> --
> Marcos Renato da Silva Junior
> Universidade Estadual Paulista - Unesp
> Faculdade de Engenharia de Ilha Solteira - FEIS
> Departamento de Engenharia Elétrica
> 15385-000 - Ilha Solteira/SP
> (18) 3743-1164
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Establishing Connection (Connected)??

2016-02-22 Thread Marcos Renato da Silva Junior

Hi,

Solved using the link :

http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected
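
For the archives, the procedure on that page is roughly the following sketch, 
run on the rejected peer (from memory, so double-check the page before using it):

systemctl stop glusterd                      # or: service glusterd stop
# remove everything under /var/lib/glusterd except glusterd.info
find /var/lib/glusterd -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
systemctl start glusterd
gluster peer probe servpos4                  # re-probe one of the healthy peers
systemctl restart glusterd                   # restart once more and re-check peer status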

Thanks.

Em 22-02-2016 00:36, Atin Mukherjee escreveu:

Could you attach glusterd.log along with cmd_history.log file of all the
nodes? Output of gluster volume status & gluster volume info would be
also be helpful here.

~Atin

On 02/20/2016 12:20 AM, Marcos Renato da Silva Junior wrote:

Hi,


One of my nodes show "gluster peer status" :


Number of Peers: 3

Hostname: 200.145.239.172
Uuid: 2f3aac03-6b27-4572-8edd-48fbf53b7883
State: Peer in Cluster (Connected)

Hostname: 200.145.239.172
Uuid: 2f3aac03-6b27-4572-8edd-48fbf53b7883
State: Establishing Connection (Connected)
Other names:
node1

Hostname: servpos4
Uuid: b712284c-5b42-4c0e-b67c-a908aa47bd3c
State: Peer in Cluster (Connected)


Same address but one "State: Peer in Cluster" another "Establishing
Connection (Connected)".

Its works ok.



--
Marcos Renato da Silva Junior
Universidade Estadual Paulista - Unesp
Faculdade de Engenharia de Ilha Solteira - FEIS
Departamento de Engenharia Elétrica
15385-000 - Ilha Solteira/SP
(18) 3743-1164

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] 3.7.8 client is slow

2016-02-22 Thread David Robinson
The 3.7.8 FUSE client is significantly slower than 3.7.6.  Is this 
related to some of the fixes that were done to correct memory leaks?  Is 
there anything that I can do to recover the performance of 3.7.6?


My testing involved creating a "bigfile" that is 20GB.  I then installed 
the 3.6.6 FUSE client and tested the copy of the bigfile from one 
gluster machine to another.  The test was repeated 2x to make sure caching 
wasn't affecting performance.
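
For reference, a minimal sketch of the test I ran looks like the following 
(the mount points are placeholders, not my actual paths):

dd if=/dev/zero of=/mnt/gfs1/bigfile bs=1M count=20480   # create the ~20GB test file
sync
time cp /mnt/gfs1/bigfile /mnt/gfs2/bigfile              # repeated 2x to rule out caching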


Using Centos7.1
FUSE 3.6.6 took 47-seconds and 38-seconds.
FUSE 3.7.6 took 43-seconds and 34-seconds.
FUSE 3.7.8 took 205-seconds and 224-seconds

I repeated the test on another machine that is running centos 6.7 and 
the results were even worse.  98-seconds for FUSE 3.6.6 versus 
575-seconds for FUSE 3.7.8.


My server setup is:

Volume Name: gfsbackup
Type: Distribute
Volume ID: 29b8fae9-dfbf-4fa4-9837-8059a310669a
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: ffib01bkp:/data/brick01/gfsbackup
Brick2: ffib01bkp:/data/brick02/gfsbackup
Options Reconfigured:
performance.readdir-ahead: on
cluster.rebal-throttle: aggressive
diagnostics.client-log-level: WARNING
diagnostics.brick-log-level: WARNING
changelog.changelog: off
client.event-threads: 8
server.event-threads: 8

David







David F. Robinson, Ph.D.

President - Corvid Technologies

145 Overhill Drive

Mooresville, NC 28117

704.799.6944 x101   [Office]

704.252.1310   [Cell]

704.799.7974   [Fax]

david.robin...@corvidtec.com

http://www.corvidtec.com

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] geo-rep: remote operation failed - No such file or directory

2016-02-22 Thread ML mail
Hi Milind,

Thanks for the suggestion, I did that for a few problematic files and it seems 
to continue but now I am stuck at the following error message on the slave:

[2016-02-22 15:21:30.451133] W [MSGID: 114031] 
[client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-myvolume-geo-client-0: remote 
operation failed. Path:  
(c4b19f1c-cc18-4727-87a4-18de8fe0089e) [No such file or directory]

As you can see this message does not include any file or directory name, so I 
can't go and delete that file or directory. Any other ideas on how I may proceed 
here?

Or maybe would it be easier if I delete the whole directory which I think is 
affected and start geo-rep from there? Or will this mess things up?

Regards
ML



On Monday, February 22, 2016 12:12 PM, Milind Changire  
wrote:
ML,
You could try deleting problematic files on slave to recover geo-replication
from Faulty state.

However, changelogs generated due to logrotate scenario will still cause
geo-replication to go into Faulty state frequently if geo-replication
fails and restarts.

The patches mentioned in an earlier mail are being worked upon and finalized.
They will be available soon in a release which will avoid geo-replication
going into a Faulty state.

--
Milind


- Original Message -
From: "ML mail" 
To: "Milind Changire" , "Gluster-users" 

Sent: Monday, February 22, 2016 1:27:14 PM
Subject: Re: [Gluster-users] geo-rep: remote operation failed - No such file or 
   directory

Hi Milind,

Any news on this issue? I was wondering how can I fix and restart my 
geo-replication? Can I simply delete the problematic file(s) on my slave and 
restart geo-rep?

Regards
ML





On Wednesday, February 17, 2016 4:30 PM, ML mail  wrote:


Hi Milind,

Thank you for your short analysis. Indeed that's exactly what happens, as soon 
as I restart geo-rep it replays the same over and over as it does not succeed. 


Now regarding the sequence of the file management operations I am not totally 
sure how it works but I can tell you that we are using ownCloud v8.2.2 
(www.owncloud.org) and as storage for this cloud software we use GlusterFS. So 
it is very probable that ownCloud works like that: when a user uploads a new 
file it first creates it with another temporary name, which it then either 
renames or moves after successful upload. 


I have the feeling this issue is related to my initial issue which I have 
reported earlier this month: 
https://www.gluster.org/pipermail/gluster-users/2016-February/025176.html

For now my question would be how do I get to restart geo-replication 
succesfully?

Regards
ML



On Wednesday, February 17, 2016 4:10 PM, Milind Changire  
wrote:


As per the slave logs, there is an attempt to RENAME files
i.e. a .part file getting renamed to a name without the
.part suffix

Just restarting geo-rep isn't going to help much if
you've already hit the problem. Since the last CHANGELOG
is replayed by geo-rep on a restart, you'll most probably
encounter the same log messages in the logs.

Are the .part files CREATEd, RENAMEd and DELETEd with the
same name often? Are the operations somewhat in the following
sequence that happen on the geo-replication master cluster?

CREATE f1.part
RENAME f1.part f1
DELETE f1
CREATE f1.part
RENAME f1.part f1
...
...


If not, then it would help if you could send the sequence
of file management operations.

--
Milind


- Original Message -
From: "Kotresh Hiremath Ravishankar" 
To: "ML mail" 
Cc: "Gluster-users" , "Milind Changire" 

Sent: Tuesday, February 16, 2016 6:28:21 PM
Subject: Re: [Gluster-users] geo-rep: remote operation failed - No such file or 
   directory

Ccing Milind, he would be able to help

Thanks and Regards,
Kotresh H R

- Original Message -
> From: "ML mail" 
> To: "Gluster-users" 
> Sent: Monday, February 15, 2016 4:41:56 PM
> Subject: [Gluster-users] geo-rep: remote operation failed - No such file or   
>  directory
> 
> Hello,
> 
> I noticed that the geo-replication of a volume has STATUS "Faulty" and while
> looking in the *.gluster.log file in
> /var/log/glusterfs/geo-replication-slaves/ on my slave I can see the
> following relevant problem:
> 
> [2016-02-15 10:58:40.402516] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
> 0-myvolume-geo-client-0: changing port to 49152 (from 0)
> [2016-02-15 10:58:40.403928] I [MSGID: 114057]
> [client-handshake.c:1437:select_server_supported_programs]
> 0-myvolume-geo-client-0: Using Program GlusterFS 3.3, Num (1298437), Version
> (330)
> [2016-02-15 10:58:40.404130] I [MSGID: 114046]
> [client-handshake.c:1213:client_setvolume_cbk] 0-myvolume-geo-client-0:
> Connected to myvolume-geo-client-0, attached to remote volume
> '/data/myvolume-geo/brick'.
> [2016-02-15 10:58:40.404150] 

Re: [Gluster-users] Issue in Adding/Removing the gluster node

2016-02-22 Thread Gaurav Garg
Hi abhishek,

I went through your logs from node 1, and the glusterd logs clearly indicate 
that your 2nd node (10.32.1.144) has disconnected from the cluster; that is why 
the remove-brick operation failed. I think you need to check your 
network interface.
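
A few quick checks can confirm whether the two boards can reach each other 
(a rough sketch; the IP is from this thread and 24007/tcp is the standard 
glusterd port):

ping -c 3 10.32.1.144                                            # basic reachability of the 2nd board
gluster peer status                                              # should show "Peer in Cluster (Connected)"
nc -z -w 3 10.32.1.144 24007 && echo "glusterd port reachable"   # verify glusterd's port from the 1st board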

But the surprising thing is that I did not see a duplicate peer entry in the #gluster 
peer status command output.

Maybe I will get some more information from your 2nd node's (10.32.1.144) logs. 
Could you also attach your 2nd node logs?

After restarting glusterd, are you still seeing a duplicate peer entry in the #gluster peer 
status command output? 

I will wait for the 2nd node logs to further analyze the duplicate peer entry problem. 

Thanks,

~Gaurav

- Original Message -
From: "ABHISHEK PALIWAL" 
To: "Gaurav Garg" 
Cc: gluster-users@gluster.org
Sent: Monday, February 22, 2016 12:48:55 PM
Subject: Re: [Gluster-users] Issue in Adding/Removing the gluster node

Hi Gaurav,

Here, You can find the attached logs for the boards in case of remove-brick
failure.
In these logs we do not have the cmd_history and
etc-glusterfs-glusterd.vol.log for the second board.

May be for that we need to some more time.


Regards,
Abhishek

On Mon, Feb 22, 2016 at 10:18 AM, Gaurav Garg  wrote:

> Hi Abhishek,
>
> >>  I'll provide the required log to you.
>
> sure
>
> on both node. do "pkill glusterd" and then start glusterd services.
>
> Thanks,
>
> ~Gaurav
>
> - Original Message -
> From: "ABHISHEK PALIWAL" 
> To: "Gaurav Garg" 
> Cc: gluster-users@gluster.org
> Sent: Monday, February 22, 2016 10:11:48 AM
> Subject: Re: [Gluster-users] Issue in Adding/Removing the gluster node
>
> Hi Gaurav,
>
> Thanks for your prompt reply.
>
> I'll provide the required log to you.
>
> As a workaround you suggested that restart the glusterd service. Could you
> please tell me the point where I can do this?
>
> Regards,
> Abhishek
>
> On Fri, Feb 19, 2016 at 6:11 PM, Gaurav Garg  wrote:
>
> > Hi Abhishek,
> >
> > Peer status output looks interesting where it have stale entry,
> > technically it should not happen. Here few thing need to ask
> >
> > Did you perform any manual operation with GlusterFS configuration file
> > which resides in /var/lib/glusterd/* folder.
> >
> > Can you provide output of "ls /var/lib/glusterd/peers"  from both of your
> > nodes.
> >
> > Could you provide output of #gluster peer status command when 2nd node is
> > down
> >
> > Can you provide output of #gluster volume info command
> >
> > Can you provide full logs details of cmd_history.log and
> > etc-glusterfs-glusterd.vol.log from both the nodes.
> >
> >
> > You can restart your glusterd as of now as a workaround but we need to
> > analysis this issue further.
> >
> > Thanks,
> > Gaurav
> >
> > - Original Message -
> > From: "ABHISHEK PALIWAL" 
> > To: "Gaurav Garg" 
> > Cc: gluster-users@gluster.org
> > Sent: Friday, February 19, 2016 5:27:21 PM
> > Subject: Re: [Gluster-users] Issue in Adding/Removing the gluster node
> >
> > Hi Gaurav,
> >
> > After the failure of add-brick following is outcome "gluster peer status"
> > command
> >
> > Number of Peers: 2
> >
> > Hostname: 10.32.1.144
> > Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
> > State: Peer in Cluster (Connected)
> >
> > Hostname: 10.32.1.144
> > Uuid: bbe2a458-ad3d-406d-b233-b6027c12174e
> > State: Peer in Cluster (Connected)
> >
> > Regards,
> > Abhishek
> >
> > On Fri, Feb 19, 2016 at 5:21 PM, ABHISHEK PALIWAL <
> abhishpali...@gmail.com
> > >
> > wrote:
> >
> > > Hi Gaurav,
> > >
> > > Both are the board connect through the backplane using ethernet.
> > >
> > > Even this inconsistency also occurs when I am trying to bringing back
> the
> > > node in slot. Means some time add-brick executes without failure but
> some
> > > time following error occurs.
> > >
> > > volume add-brick c_glusterfs replica 2 10.32.1.144:
> /opt/lvmdir/c2/brick
> > > force : FAILED : Another transaction is in progress for c_glusterfs.
> > Please
> > > try again after sometime.
> > >
> > >
> > > You can also see the attached logs for add-brick failure scenario.
> > >
> > > Please let me know if you need more logs.
> > >
> > > Regards,
> > > Abhishek
> > >
> > >
> > > On Fri, Feb 19, 2016 at 5:03 PM, Gaurav Garg  wrote:
> > >
> > >> Hi Abhishek,
> > >>
> > >> How are you connecting two board, and how are you removing it manually
> > >> that need to know because if you are removing your 2nd board from the
> > >> cluster (abrupt shutdown) then you can't perform remove brick
> operation
> > in
> > >> 2nd node from first node and its happening successfully in your case.
> > could
> > >> you ensure your network connection once again while removing and
> > bringing
> > >> back your node again.
> > >>
> > >> Thanks,
> > >> Gaurav
> > >>
> > >> --
> > >> *From: