[Gluster-users] How do I slow down client reconnect attempts?

2017-05-19 Thread Jim Goven
Hello,

I would like to use glusterfs to replicate a volume between my desktop
and my laptop, so that when I am away from home (and likely
disconnected), I can still access some project files. The desktop is on
most of the time, but the laptop is rarely on.

I have successfully set up gluster to accomplish this goal, but I am
looking to improve the disconnected behaviour of the desktop.

While the laptop is disconnected (i.e., most of the time), the desktop
continuously retries connecting to the laptop, and in doing so it
generates a great deal of DNS traffic trying to resolve the laptop's
name - it looks like there is no retry delay at all. In turn, the
gluster client log files grow very fast, the DNS resolver logs also grow
fast, and there is a lot of unnecessary network chatter.
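
(One possible stop-gap for the DNS side only, not for the retry rate
itself: pin the laptop's name to a fixed LAN address so the lookups never
leave the machine. A sketch only; the hostname and address below are
placeholders for whatever your setup uses.)

echo "192.168.1.50  laptop.lan" >> /etc/hosts    # run as root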

I would like to add a delay between subsequent reconnection attempts,
but I can't find any such option at
http://gluster.readthedocs.io/...Managing%20Volumes

The question is: Is there a way to slow down gluster's attempts to
reconnect?

I looked through the source as well, and I can't find a delay mechanism.

Ideally the reconnect attempts would follow an exponential back-off
schedule with a configurable maximum delay, but a fixed, configurable
delay would work too.
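
To make the intent concrete, something along these lines (a shell sketch
only; the hostname is a placeholder, 24007 is just the default glusterd
port, and no such option exists in gluster today):

#!/bin/bash
# Illustrative back-off loop: double the delay after each failed attempt,
# capped at a configurable maximum.
host=laptop.lan
delay=1          # initial retry delay, in seconds
max_delay=300    # configurable cap

try_reconnect() {
    # succeeds as soon as the peer's management port is reachable
    timeout 1 bash -c "exec 3<>/dev/tcp/$host/24007" 2>/dev/null
}

until try_reconnect; do
    sleep "$delay"
    delay=$(( delay * 2 ))
    (( delay > max_delay )) && delay=$max_delay
done
echo "reconnected to $host"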

In case a developer reads this: I would be inclined to implement
exponential backoff myself and submit a patch, but would appreciate
brief advice on where in the codebase this backoff should be inserted.

Thank you
jg


Re: [Gluster-users] gluster remove-brick problem

2017-05-19 Thread Nithya Balachandran
Hi,

The rebalance could have failed for any one of several reasons. You
would need to check the rebalance log for the volume to figure out why it
failed in this case. This should be /var/log/glusterfs/data-rebalance.log
on bigdata-dlp-server00.xg01.

I can take a look at the log if you send it across.
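
(If it helps, the failure lines can usually be pulled out of that log with
the grep below; and once the underlying issue is fixed, the usual sequence
is to re-run the start and commit only after status reports completed. A
sketch, reusing the volume and brick names from your mail:)

# error-level lines from the rebalance log
grep ' E ' /var/log/glusterfs/data-rebalance.log | tail -n 20

# typical remove-brick lifecycle, once the cause of the failure is addressed
gluster volume remove-brick data server1:/vdata/bricks/data start
gluster volume remove-brick data server1:/vdata/bricks/data status
gluster volume remove-brick data server1:/vdata/bricks/data commit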

Regards,
Nithya

On 19 May 2017 at 13:09, Pranith Kumar Karampuri 
wrote:

> Adding gluster-users and developers who work on the distribute module of gluster.
>
> On Fri, May 19, 2017 at 12:58 PM, 郭鸿岩(基础平台部) <
> guohongyan...@didichuxing.com> wrote:
>
>> Hello,
>>
>> I am a user of Gluster 3.8 from Beijing, China.
>> I met a problem.  I added a brick to a volume, but the brick is
>> on the / disk, the same disk as the Linux OS.
>> So I want to remove it. The volume is of distributed type without
>> replica.
>>
>> I used the command: gluster volume remove-brick data
>> server1:/vdata/bricks/data start
>> It was OK; the command was accepted successfully by gluster. But
>> then I checked the status: gluster volume remove-brick data
>> server1:/vdata/bricks/data status
>> It showed:
>>
>> Node                       Rebalanced-files  size    scanned  failures  skipped  status  run time in h:m:s
>> -------------------------  ----------------  ------  -------  --------  -------  ------  -----------------
>> bigdata-dlp-server00.xg01  0                 0Bytes  0        1         0        failed  0:0:0
>>
>> 0 files scanned, 0 bytes transferred, and 1 failure.
>>
>> Please give me some advice: how can I remove this brick without
>> data loss?
>>
>> Thank you very much!
>>
>> best wishes.
>>
>>
>> by starshine.
>
>
>
>
> --
> Pranith
>

Re: [Gluster-users] Rebalance + VM corruption - current status and request for feedback

2017-05-19 Thread Mahdi Adnan
Thank you so much mate.

I'll finish the test tomorrow and let you know the results.

--

Respectfully
Mahdi A. Mahdi


From: Krutika Dhananjay 
Sent: Wednesday, May 17, 2017 6:59:20 AM
To: gluster-user
Cc: Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier; Mahdi Adnan
Subject: Rebalance + VM corruption - current status and request for feedback

Hi,

In the past couple of weeks, we've sent the following fixes concerning VM 
corruption upon doing rebalance - 
https://review.gluster.org/#/q/status:merged+project:glusterfs+branch:master+topic:bug-1440051

These fixes are very much part of the latest 3.10.2 release.

Satheesaran within Red Hat also verified that they work and he's not seeing 
corruption issues anymore.

I'd like to hear feedback from the users themselves on these fixes (on your 
test environments to begin with) before even changing the status of the bug to 
CLOSED.

Although 3.10.2 has a patch that prevents rebalance sub-commands from being 
executed on sharded volumes, you can override the check by using the 'force' 
option.

For example,

# gluster volume rebalance myvol start force
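
(And, in case it is useful, one quick way to confirm whether sharding is
enabled on a volume before running this; 'myvol' is the same placeholder
name as above:)

# gluster volume info myvol | grep -i shard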

Very much looking forward to hearing from you all.

Thanks,
Krutika

Re: [Gluster-users] Fwd: Re: VM going down

2017-05-19 Thread Alessandro Briosi
On 12/05/2017 12:09, Alessandro Briosi wrote:
>> You probably should open a bug so that we have all the troubleshooting
>> and debugging details in one location. Once we find the problem we can
>> move the bug to the right component.
>>   https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
>>
>> HTH,
>> Niels
> The thing is that when the VM is down and I check the logs there's nothing.
> Then when I start the VM the logs get populated with the seek error.
>
> Anyway I'll open a bug for this.

Ok, as it happened again I have opened a bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1452766

I have now started the VM under gdb (maybe I can find more information).
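
(Roughly along these lines; this assumes the VM is a qemu process, and the
process-name match below is only a placeholder:)

gdb -p "$(pgrep -f 'qemu.*myvm' | head -n1)" -ex 'set pagination off' -ex continue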

In the logs I still have "No such file or directory", which at this point
seems to be the culprit (?)

Alessandro

Re: [Gluster-users] gluster remove-brick problem

2017-05-19 Thread Pranith Kumar Karampuri
Adding gluster-users and developers who work on the distribute module of gluster.

On Fri, May 19, 2017 at 12:58 PM, 郭鸿岩(基础平台部) 
wrote:

> Hello,
>
> I am a user of Gluster 3.8 from Beijing, China.
> I met a problem.  I added a brick to a volume, but the brick is on
> the / disk, the same disk as the Linux OS.
> So I want to remove it. The volume is of distributed type without
> replica.
>
> I used the command: gluster volume remove-brick data
> server1:/vdata/bricks/data start
> It was OK; the command was accepted successfully by gluster. But
> then I checked the status: gluster volume remove-brick data
> server1:/vdata/bricks/data status
> It showed:
>
> Node                       Rebalanced-files  size    scanned  failures  skipped  status  run time in h:m:s
> -------------------------  ----------------  ------  -------  --------  -------  ------  -----------------
> bigdata-dlp-server00.xg01  0                 0Bytes  0        1         0        failed  0:0:0
>
> 0 files scanned, 0 bytes transferred, and 1 failure.
>
> Please give me some advice: how can I remove this brick without
> data loss?
>
> Thank you very much!
>
> best wishes.
>
>
> by starshine.




-- 
Pranith

Re: [Gluster-users] GlusterFS+heketi+Kubernetes snapshots fail

2017-05-19 Thread Mohammed Rafi K C


On 05/18/2017 09:13 PM, Chris Jones wrote:
> On 5/18/2017 1:53 AM, Mohammed Rafi K C wrote:
>> On 05/18/2017 10:04 AM, Pranith Kumar Karampuri wrote:
>>> +Snapshot maintainer. I think he is away for a week or so. You may
>>> have to wait a bit more.
>>>
>>> On Wed, May 10, 2017 at 2:39 AM, Chris Jones wrote:
>>>
>>> Hi All,
>>>
>>> This was discussed briefly on IRC, but got no resolution. I have
>>> a Kubernetes cluster running heketi and GlusterFS 3.10.1. When I
>>> try to create a snapshot, I get:
>>>
>>> snapshot create: failed: Commit failed on localhost. Please
>>> check log file for details.
>>>
>>> glusterd log: http://termbin.com/r8s3
>>>
>>
>> I'm not able to open the URL. Could you please paste it in a
>> different domain?
>
> glusterd.log:
> https://gist.github.com/cjyar/aa5dc8bc893d2439823fa11f2373428f
>
> brick log: https://gist.github.com/cjyar/10b2194a4413c6338da0776860a94401
>
> lvs output: https://gist.github.com/cjyar/87cfef8d403ed321bd96798790828d42
It looks like the data part of your VG (the thin pool) is 100% full;
that might be the reason why it failed.
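
(For example, thin-pool usage can be checked with something like this; the
VG and pool names are placeholders:)

# lvs -o lv_name,vg_name,data_percent,metadata_percent vg_name/tp_name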

Rafi KC


>
> "gluster snapshot config" output:
> https://gist.github.com/cjyar/0798a2ba8790f26f7d745f2a67abe5b1
>
>>
>>> brick log: http://termbin.com/l0ya
>>>
>>> lvs output: http://termbin.com/bwug
>>>
>>> "gluster snapshot config" output: http://termbin.com/4t1k
>>>
>>> As you can see, there's not a lot of helpful output in the log
>>> files. I'd be grateful if somebody could help me interpret
>>> what's there.
>>>
>>> Chris
>>>
>>>
>>>
>>>
>>>
>>>
>>> -- 
>>> Pranith
>>>
>>>
>>
>


Re: [Gluster-users] Failure while upgrading gluster to 3.10.1

2017-05-19 Thread Pawan Alwandi
Hello Atin,

Thanks for the continued support.  I've attached the requested files from
all 3 nodes.

(I think we already verified that the UUIDs are correct; anyway, let us
know if you find any more info in the logs.)
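
(The check itself is easy to repeat on each node; a sketch, nothing
node-specific assumed:)

# UUID this node identifies itself with
cat /var/lib/glusterd/glusterd.info

# UUIDs this node has recorded for its peers
gluster peer status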

Pawan

On Thu, May 18, 2017 at 11:45 PM, Atin Mukherjee 
wrote:

>
> On Thu, 18 May 2017 at 23:40, Atin Mukherjee  wrote:
>
>> On Wed, 17 May 2017 at 12:47, Pawan Alwandi  wrote:
>>
>>> Hello Atin,
>>>
>>> I realized that these http://gluster.readthedocs.io/
>>> en/latest/Upgrade-Guide/upgrade_to_3.10/ instructions only work for
>>> upgrades from 3.7, while we are running 3.6.2.  Are there any
>>> instructions/suggestions you have for us to upgrade from the 3.6 version?
>>>
>>> I believe upgrade from 3.6 to 3.7 and then to 3.10 would work, but I see
>>> similar errors reported when I upgraded to 3.7 too.
>>>
>>> For what it's worth, I was able to set the op-version (gluster v set all
>>> cluster.op-version 30702) but that doesn't seem to help.
>>>
>>> [2017-05-17 06:48:33.700014] I [MSGID: 100030] [glusterfsd.c:2338:main]
>>> 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.20
>>> (args: /usr/sbin/glusterd -p /var/run/glusterd.pid)
>>> [2017-05-17 06:48:33.703808] I [MSGID: 106478] [glusterd.c:1383:init]
>>> 0-management: Maximum allowed open file descriptors set to 65536
>>> [2017-05-17 06:48:33.703836] I [MSGID: 106479] [glusterd.c:1432:init]
>>> 0-management: Using /var/lib/glusterd as working directory
>>> [2017-05-17 06:48:33.708866] W [MSGID: 103071]
>>> [rdma.c:4594:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event
>>> channel creation failed [No such device]
>>> [2017-05-17 06:48:33.709011] W [MSGID: 103055] [rdma.c:4901:init]
>>> 0-rdma.management: Failed to initialize IB Device
>>> [2017-05-17 06:48:33.709033] W [rpc-transport.c:359:rpc_transport_load]
>>> 0-rpc-transport: 'rdma' initialization failed
>>> [2017-05-17 06:48:33.709088] W [rpcsvc.c:1642:rpcsvc_create_listener]
>>> 0-rpc-service: cannot create listener, initing the transport failed
>>> [2017-05-17 06:48:33.709105] E [MSGID: 106243] [glusterd.c:1656:init]
>>> 0-management: creation of 1 listeners failed, continuing with succeeded
>>> transport
>>> [2017-05-17 06:48:35.480043] I [MSGID: 106513] 
>>> [glusterd-store.c:2068:glusterd_restore_op_version]
>>> 0-glusterd: retrieved op-version: 30600
>>> [2017-05-17 06:48:35.605779] I [MSGID: 106498] [glusterd-handler.c:3640:
>>> glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
>>> [2017-05-17 06:48:35.607059] I [rpc-clnt.c:1046:rpc_clnt_connection_init]
>>> 0-management: setting frame-timeout to 600
>>> [2017-05-17 06:48:35.607670] I [rpc-clnt.c:1046:rpc_clnt_connection_init]
>>> 0-management: setting frame-timeout to 600
>>> [2017-05-17 06:48:35.607025] I [MSGID: 106498] [glusterd-handler.c:3640:
>>> glusterd_friend_add_from_peerinfo] 0-management: connect returned 0
>>> [2017-05-17 06:48:35.608125] I [MSGID: 106544]
>>> [glusterd.c:159:glusterd_uuid_init] 0-management: retrieved UUID:
>>> 7f2a6e11-2a53-4ab4-9ceb-8be6a9f2d073
>>>
>>
>>> Final graph:
>>> +---
>>> ---+
>>>   1: volume management
>>>   2: type mgmt/glusterd
>>>   3: option rpc-auth.auth-glusterfs on
>>>   4: option rpc-auth.auth-unix on
>>>   5: option rpc-auth.auth-null on
>>>   6: option rpc-auth-allow-insecure on
>>>   7: option transport.socket.listen-backlog 128
>>>   8: option event-threads 1
>>>   9: option ping-timeout 0
>>>  10: option transport.socket.read-fail-log off
>>>  11: option transport.socket.keepalive-interval 2
>>>  12: option transport.socket.keepalive-time 10
>>>  13: option transport-type rdma
>>>  14: option working-directory /var/lib/glusterd
>>>  15: end-volume
>>>  16:
>>> +---
>>> ---+
>>> [2017-05-17 06:48:35.609868] I [MSGID: 101190] 
>>> [event-epoll.c:632:event_dispatch_epoll_worker]
>>> 0-epoll: Started thread with index 1
>>> [2017-05-17 06:48:35.610839] W [socket.c:596:__socket_rwv] 0-management:
>>> readv on 192.168.0.7:24007 failed (No data available)
>>> [2017-05-17 06:48:35.611907] E [rpc-clnt.c:370:saved_frames_unwind]
>>> (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_
>>> callingfn+0x1a3)[0x7fd6c2d70bb3] (--> /usr/lib/x86_64-linux-gnu/
>>> libgfrpc.so.0(saved_frames_unwind+0x1cf)[0x7fd6c2b3a2df] (-->
>>> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fd6c2b3a3fe]
>>> (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_
>>> connection_cleanup+0x89)[0x7fd6c2b3ba39] (--> /usr/lib/x86_64-linux-gnu/
>>> libgfrpc.so.0(rpc_clnt_notify+0x160)[0x7fd6c2b3c380] )
>>> 0-management: forced unwinding frame type(GLUSTERD-DUMP) op(DUMP(1)) called
>>> at 2017-05-17 06:48:35.609965 (xid=0x1)
>>> [2017-05-17 06:48:35.611928] E [MSGID: 106167]
>>> 

Re: [Gluster-users] how to restore snapshot LV's

2017-05-19 Thread Mohammed Rafi K C
I do not know how you ended up in this state. This usually happens when
there is a commit failure. To recover from this state, change the value of
"status" in the file /var/lib/glusterd/snaps///info : set the status to 0
on the nodes where the value is 1, then restart glusterd on the nodes where
you changed it manually.

Then try to activate the snapshot again.
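
A sketch of the sequence, assuming the usual key=value layout of the
glusterd store files; the snapshot name and the id-like path component are
placeholders for whatever exists under /var/lib/glusterd/snaps on your nodes:

# on each node where the info file still shows status=1
sed -i 's/^status=1/status=0/' /var/lib/glusterd/snaps/<SNAPNAME>/<SNAP-VOL-ID>/info
systemctl restart glusterd      # or: service glusterd restart

# then, from any one node
gluster snapshot activate <SNAPNAME>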


Regards

Rafi KC


On 05/18/2017 09:38 AM, Pranith Kumar Karampuri wrote:
> +Rafi, +Raghavendra Bhat
>
> On Tue, May 16, 2017 at 11:55 AM, WoongHee Han wrote:
>
> Hi, all!
>
> I erased the VG containing the snapshot LVs related to gluster volumes,
> and then I tried to restore the volume:
>
> 1. vgcreate vg_cluster /dev/sdb
> 2. lvcreate --size=10G --type=thin-pool -n tp_cluster vg_cluster
> 3. lvcreate -V 5G --thinpool vg_cluster/tp_cluster -n test_vol
> vg_cluster
> 4. gluster v stop test_vol
> 5. getfattr -n trusted.glusterfs.volume-id /volume/test_vol (on
> another node)
> 6. setfattr -n trusted.glusterfs.volume-id -v
>  0sKtUJWIIpTeKWZx+S5PyXtQ== /volume/test_vol (already mounted)
> 7. gluster v start test_vol
> 8. restart glusterd
> 9. lvcreate -s vg_cluster/test_vol --setactivationskip=n
> --name 6564c50651484d09a36b912962c573df_0
> 10. lvcreate -s vg_cluster/test_vol --setactivationskip=n
> --name ee8c32a1941e4aba91feab21fbcb3c6c_0
> 11. lvcreate -s vg_cluster/test_vol --setactivationskip=n
> --name bf93dc34233646128f0c5f84c3ac1f83_0 
> 12. reboot
>
> It works, but the bricks for the snapshots are not working.
>
> 
> --
> ~]# gluster snapshot status
> Brick Path:   192.225.3.35:/var/run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick1
> Volume Group  :   vg_cluster
> Brick Running :   No
> Brick PID :   N/A
> Data Percentage   :   0.22
> LV Size   :   5.00g
>
>
> Brick Path:   192.225.3.36:/var/run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick2
> Volume Group  :   vg_cluster
> Brick Running :   No
> Brick PID :   N/A
> Data Percentage   :   0.22
> LV Size   :   5.00g
>
>
> Brick Path:   192.225.3.37:/var/run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick3
> Volume Group  :   vg_cluster
> Brick Running :   No
> Brick PID :   N/A
> Data Percentage   :   0.22
> LV Size   :   5.00g
>
>
> Brick Path:   192.225.3.38:/var/run/gluster/snaps/bf93dc34233646128f0c5f84c3ac1f83/brick4
> Volume Group  :   vg_cluster
> Brick Running :   Yes
> Brick PID :   N/A
> Data Percentage   :   0.22
> LV Size   :   5.00g
>
> ~]# gluster snapshot deactivate t3_GMT-2017.05.15-08.01.37
> Deactivating snap will make its data inaccessible. Do you want to
> continue? (y/n) y
> snapshot deactivate: failed: Pre Validation failed on
> 192.225.3.36. Snapshot t3_GMT-2017.05.15-08.01.37 is already
> deactivated.
> Snapshot command failed
>
> ~]# gluster snapshot activate t3_GMT-2017.05.15-08.01.37
> snapshot activate: failed: Snapshot t3_GMT-2017.05.15-08.01.37 is
> already activated
>
> 
> --
>
>
> How do I restore the snapshot LVs?
>
> My setup consists of four nodes, distributed-replicated (2x2).
>
>
> thank you.
>
>
>
>
>
>
>
>
> -- 
> Pranith
