Re: [Gluster-users] Error "Failed to find host nfs1.lightspeed.ca" when adding a new node to the cluster.

2016-04-08 Thread Atin Mukherjee
-Atin
Sent from one plus one
On 08-Apr-2016 10:40 pm, "Ernie Dunbar"  wrote:
>
> On 2016-04-07 09:16, Atin Mukherjee wrote:
>>
>> -Atin
>> Sent from one plus one
>> On 07-Apr-2016 9:32 pm, "Ernie Dunbar"  wrote:
>>>
>>>
>>> On 2016-04-06 21:20, Atin Mukherjee wrote:


 On 04/07/2016 04:04 AM, Ernie Dunbar wrote:
>
>
> On 2016-04-06 11:42, Ernie Dunbar wrote:
>>
>>
>> I've already successfully created a Gluster cluster, but when I try to
>> add a new node, gluster on the new node claims it can't find the
>> hostname of the first node in the cluster.
>>
>> I've added the hostname nfs1.lightspeed.ca to /etc/hosts like this:
>>
>> root@nfs3:/home/ernied# cat /etc/hosts
>> 127.0.0.1     localhost
>> 192.168.1.31  nfs1.lightspeed.ca  nfs1
>> 192.168.1.32  nfs2.lightspeed.ca  nfs2
>> 127.0.1.1     nfs3.lightspeed.ca  nfs3
>>
>> # The following lines are desirable for IPv6 capable hosts
>> ::1 localhost ip6-localhost ip6-loopback
>> ff02::1 ip6-allnodes
>> ff02::2 ip6-allrouters
>>
>> I can ping the hostname:
>>
>> root@nfs3:/home/ernied# ping -c 3 nfs1
>> PING nfs1.lightspeed.ca (192.168.1.31) 56(84) bytes of data.
>> 64 bytes from nfs1.lightspeed.ca (192.168.1.31): icmp_seq=1 ttl=64 time=0.148 ms
>> 64 bytes from nfs1.lightspeed.ca (192.168.1.31): icmp_seq=2 ttl=64 time=0.126 ms
>> 64 bytes from nfs1.lightspeed.ca (192.168.1.31): icmp_seq=3 ttl=64 time=0.133 ms
>>
>> --- nfs1.lightspeed.ca ping statistics ---
>> 3 packets transmitted, 3 received, 0% packet loss, time 1998ms
>> rtt min/avg/max/mdev = 0.126/0.135/0.148/0.016 ms
>>
>> I can get gluster to probe the hostname:
>>
>> root@nfs3:/home/ernied# gluster peer probe nfs1
>> peer probe: success. Host nfs1 port 24007 already in peer list
>>
>> But if I try to create the brick on the new node, it says that the
>> host can't be found? Um...
>>
>> root@nfs3:/home/ernied# gluster volume create gv2 replica 3
>> nfs1.lightspeed.ca:/brick1/gv2/ nfs2.lightspeed.ca:/brick1/gv2/
>> nfs3.lightspeed.ca:/brick1/gv2
>> volume create: gv2: failed: Failed to find host nfs1.lightspeed.ca
>>
>> Our logs from /var/log/glusterfs/etc-glusterfs-glusterd.vol.log:
>>
>> [2016-04-06 18:19:18.107459] E [MSGID: 106452]
>> [glusterd-utils.c:5825:glusterd_new_brick_validate] 0-management:
>> Failed to find host nfs1.lightspeed.ca
>>
>> [2016-04-06 18:19:18.107496] E [MSGID: 106536]
>> [glusterd-volume-ops.c:1364:glusterd_op_stage_create_volume]
>> 0-management: Failed to find host nfs1.lightspeed.ca
>>
>> [2016-04-06 18:19:18.107516] E [MSGID: 106301]
>> [glusterd-syncop.c:1281:gd_stage_op_phase] 0-management: Staging of
>> operation 'Volume Create' failed on localhost : Failed to find host
>> nfs1.lightspeed.ca
>>
>> [2016-04-06 18:19:18.231864] E [MSGID: 106170]
>> [glusterd-handshake.c:1051:gd_validate_mgmt_hndsk_req] 0-management:
>> Request from peer 192.168.1.31:65530 has an entry in peerinfo, but
>> uuid does not match


 We have introduced a new check to reject a peer if the request is coming
 from a node where the hostname matches but UUID is different. This can
 happen if a node goes through a re-installation and its
 /var/lib/glusterd/* content is wiped off. Look at [1] for more details.

 [1] http://review.gluster.org/13519

 Do confirm if that's the case.
>>>
>>> I couldn't say if that's *exactly* the case, but it's pretty close.
>>> I don't recall ever removing /var/lib/glusterd/* or any of its
>>> contents, but the operating system isn't exactly the way it was when I
>>> first tried to add this node to the cluster.
>>>
>>> What should I do to *fix* the problem though, so I can add this node
>>> to the cluster? This bug report doesn't appear to provide a solution.
>>> I've tried removing the node from the cluster, and that failed too.
>>> Things seem to be in a very screwy state right now.
>>
>> I should have given the workaround earlier. Find the peer file for the
>> faulty node in /var/lib/glusterd/peers/ and delete it from all the
>> nodes except the faulty one, then restart the glusterd instance on
>> those nodes. On the faulty node, ensure the /var/lib/glusterd/ content
>> is empty, restart glusterd, and then peer probe this node from any
>> node in the existing cluster. You should also bump up the op-version
>> once the cluster is stable.
>>
>
> This mostly solved the problem, but it seems you were missing one step:
>
> # gluster 
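
For anyone hitting the same state, a rough shell sketch of the workaround
described above (the peer UUID filename, node name, service name, and
op-version value are placeholders, not taken from this thread):

# On every healthy node: remove the stale peer file for the faulty node,
# then restart glusterd (the service name varies by distro).
rm /var/lib/glusterd/peers/<uuid-of-faulty-node>
service glusterd restart

# On the faulty node: wipe the stale glusterd state and restart, so it
# comes back up with a fresh UUID.
rm -rf /var/lib/glusterd/*
service glusterd restart

# From any node already in the cluster, probe the faulty node again.
gluster peer probe <faulty-node>

# Once the cluster is stable, bump the op-version to match the installed
# release.
gluster volume set all cluster.op-version <op-version>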

Re: [Gluster-users] non interactive use of gluster snapshot delete

2016-04-08 Thread Rajesh Joseph
Have you tried the "--mode=script" option in Gluster?

e.g.

# gluster --mode=script snapshot delete snap1

On Fri, Apr 8, 2016 at 10:01 PM, Alastair Neil wrote:

> I am crafting a script (well actually I am modifying gcron.py) to retain a
> configurable number of Hourly, Daily, Weekly and Monthly snapshots.  One
> issue is that gluster snapshot delete "snapname" does not seem to have a
> command line switch (like "-y" in yum) to attempt the operation
> non-interactively.  Is there another way of performing this that is more
> friendly to non-interactive use?  From the shell I can pipe "yes"  to the
> command but my python fu is weak so I thought I'd ask if there was a
> simpler way.
>
> Thanks, Alastair
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] GlusterFS Mounts at startup consuming needed local ports for host services

2016-04-08 Thread Ryan.J.Wyler
Glusterfs mounts are consuming local ports that I need available for host
services, specifically port 1002.

How can I configure glusterfs mount points to avoid specific local ports for 
the connections?

A potential workaround I thought of is to stop the mounts from happening
automatically at startup and instead create an rc.d startup script that
mounts the glusterfs filesystems after all the host services have started,
so the ports they need are already in use.
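
A minimal sketch of that workaround, assuming a volume named gv0 served
from server1.domain.com and mounted at /mnt/gv0 (names are illustrative,
not taken from this setup):

# /etc/fstab: keep the entry, but stop it from mounting at boot
server1.domain.com:/gv0  /mnt/gv0  glusterfs  defaults,_netdev,noauto  0 0

# /etc/rc.d/rc.local (or an init script ordered after the services that
# need the low ports): mount once everything else is up
mount -t glusterfs server1.domain.com:/gv0 /mnt/gv0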

Is there a different option?

The local port range in use is 1000-1023, as seen in the lsof output below
for client.domain.com.

# lsof | grep glusterfs | grep IPv4
glusterfs   2893  root   9u  IPv4   15373  0t0  TCP client.domain.com:1023->server1.domain.com:24007 (ESTABLISHED)
glusterfs   2893  root  10u  IPv4   15447  0t0  TCP client.domain.com:1015->server2.domain.com:49171 (ESTABLISHED)
glusterfs   2893  root  11u  IPv4   15446  0t0  TCP client.domain.com:1016->server3.domain.com:49171 (ESTABLISHED)
glusterfs   2893  root  13u  IPv4   15422  0t0  TCP client.domain.com:1020->server1.domain.com:49171 (ESTABLISHED)
glusterfs   2893  root  15u  IPv4   15439  0t0  TCP client.domain.com:1018->server4.domain.com:49171 (ESTABLISHED)
glusterfs   2998  root   9u  IPv4   15523  0t0  TCP client.domain.com:1014->server1.domain.com:24007 (ESTABLISHED)
glusterfs   2998  root  10u  IPv4   15588  0t0  TCP client.domain.com:1006->server2.domain.com:49157 (ESTABLISHED)
glusterfs   2998  root  11u  IPv4   15576  0t0  TCP client.domain.com:1007->server3.domain.com:49157 (ESTABLISHED)
glusterfs   2998  root  14u  IPv4   15549  0t0  TCP client.domain.com:1012->server1.domain.com:49157 (ESTABLISHED)
glusterfs   2998  root  16u  IPv4   15569  0t0  TCP client.domain.com:1009->server4.domain.com:49157 (ESTABLISHED)
glusterfs  46813  root  10u  IPv4  342418  0t0  TCP client.domain.com:1019->server1.domain.com:24007 (ESTABLISHED)
glusterfs  46813  root  11u  IPv4  342445  0t0  TCP client.domain.com:busboy->server2.domain.com:49152 (ESTABLISHED)
glusterfs  46813  root  15u  IPv4  342428  0t0  TCP client.domain.com:surf->server1.domain.com:49152 (ESTABLISHED)
glusterfs  46813  root  16u  IPv4  342434  0t0  TCP client.domain.com:1002->server4.domain.com:49152 (ESTABLISHED)
glusterfs  46813  root  17u  IPv4  342440  0t0  TCP client.domain.com:cadlock2->server3.domain.com:49152 (ESTABLISHED)


Glusterfs RPMs installed:

glusterfs-libs-3.6.0.54-1
glusterfs-api-3.6.0.54-1
glusterfs-fuse-3.6.0.54-1
glusterfs-3.6.0.54-1



Ryan Wyler
ryan.j.wy...@wellsfargo.com

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] non interactive use of gluster snapshot delete

2016-04-08 Thread Alastair Neil
I am crafting a script (well actually I am modifying gcron.py) to retain a
configurable number of Hourly, Daily, Weekly and Monthly snapshots.  One
issue is that gluster snapshot delete "snapname" does not seem to have a
command line switch (like "-y" in yum) to attempt the operation
non-interactively.  Is there another way of performing this that is more
friendly to non-interactive use?  From the shell I can pipe "yes"  to the
command but my python fu is weak so I thought I'd ask if there was a
simpler way.

Thanks, Alastair
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] disperse heal speed up

2016-04-08 Thread Serkan Çoban
Hi,

I am testing the heal speed of a disperse volume, and what I see is
5-10 MB/s per node. I increased disperse.background-heals to 32 and
disperse.heal-wait-qlength to 256, but still no difference.
One thing I noticed is that when I kill a brick process, reformat it,
and restart it, the heal speed is nearly 20x (200 MB/s per node).

But when I kill the brick, then write 100 TB of data, and start the brick
afterwards, the heal is slow (5-10 MB/s per node).

What is the difference between the two scenarios? Why is one heal slow and
the other fast? How can I increase disperse heal speed? Should I increase
the thread count to 128 or 256? I am on a 78x(16+4) disperse volume and my
servers are pretty strong (2x14 cores with 512 GB RAM; each node has
26x8 TB disks).

Gluster version is 3.7.10.
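
For reference, a sketch of how the two options mentioned above are set,
assuming a hypothetical volume name v0 (the real volume name does not
appear in this message):

# gluster volume set v0 disperse.background-heals 32
# gluster volume set v0 disperse.heal-wait-qlength 256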

Thanks,
Serkan
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster-devel] Gluster + Infiniband + 3.x kernel -> hard crash?

2016-04-08 Thread Niels de Vos
On Wed, Apr 06, 2016 at 04:32:40PM -0400, Glomski, Patrick wrote:
> We run gluster 3.7 in a distributed replicated setup. Infiniband (tcp)
> links the gluster peers together and clients use the ethernet interface.
> 
> This setup is stable running CentOS 6.x and using the most recent
> infiniband drivers provided by Mellanox. Uptime was 170 days when we took
> it down to wipe the systems and update to CentOS 7.
> 
> When the exact same setup is loaded onto a CentOS 7 machine (minor setup
> differences, but basically the same; setup is handled by ansible), the
> peers will (seemingly randomly) experience a hard crash and need to be
> power-cycled. There is no output on the screen and nothing in the logs.
> After rebooting, the peer reconnects, heals whatever files it missed, and
> everything is happy again. Maximum uptime for any given peer is 20 days.
> Thanks to the replication, clients maintain connectivity, but from a system
> administration perspective it's driving me crazy!
> 
> We run other storage servers with the same infiniband and CentOS7 setup
> except that they use NFS instead of gluster. NFS shares are served through
> infiniband to some machines and ethernet to others.
> 
> Is it possible that gluster's (and only gluster's) use of the infiniband
> kernel module to send tcp packets to its peers on a 3.x kernel is causing the
> system to have a hard crash? Pretty specific problem and it doesn't make
> much sense to me, but that's sure where the evidence seems to point.
> 
> Anyone running CentOS 7 gluster arrays with infiniband out there to confirm
> that it works fine for them? Gluster devs care to chime in with a better
> theory? I'd love for this random crashing to stop.

Giving suggestions for this is extremely difficult without knowing the
technical details of the crashes. Gluster should not cause kernel panics,
so this is most likely an issue in the kernel or the drivers that are used.
I can only advise you to install and configure kdump. This makes it
possible to capture a vmcore after a kernel panic. The system will
'kexec' into a special kdump kernel and copy /proc/vmcore to stable
storage. There will also be a dmesg or similar log in the same
directory as the vmcore image itself. With the details from the kernel
panic, we can probably make some more useful suggestions (or get a
certain bug fixed).
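
A minimal sketch of that kdump setup on CentOS 7 (the crashkernel value
below is the generic 'auto' setting, not something tuned for these
machines):

# install the kdump tooling
yum install kexec-tools

# reserve memory for the crash kernel on the kernel command line,
# then reboot for the reservation to take effect
grubby --update-kernel=ALL --args="crashkernel=auto"

# enable and start the service; vmcores land under /var/crash by default
systemctl enable kdump
systemctl start kdump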

Thanks,
Niels


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] samba vfs module for debian jessie?

2016-04-08 Thread Ira Cooper
Roman  writes:

> Oh, thanks for pointing that out. My bad.
> It was really there, so in case someone else would like to use this vfs
> module with debian jessie: it is as simple as downloading the source from
> the debian repository and building the samba package from the debian
> sources, then installing it. Everything seems to be working.

No problem, and I'm glad you are up and running!

Cheers,

-Ira
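
A rough sketch of the rebuild Roman describes, on Debian jessie with
deb-src entries enabled (the directory name assumes the samba 4.1.17
version reported elsewhere in this thread; whether vfs_glusterfs gets
built depends on the GlusterFS development packages being present at
build time):

# pull the build tooling, build dependencies, and the samba source package
apt-get install build-essential
apt-get build-dep samba
apt-get source samba

# build the binary packages
cd samba-4.1.17*
dpkg-buildpackage -us -uc

# install the rebuilt packages you need
dpkg -i ../samba*.deb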
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] samba vfs module for debian jessie?

2016-04-08 Thread Ira Cooper
That version of Samba has vfs_glusterfs.c as part of the samba source tree.

source3/modules/vfs_glusterfs.c is the source file.

Cheers,

-Ira

- Original Message -
> 2016-04-06 16:36 GMT+03:00 Ira Cooper :
> 
> > Roman  writes:
> >
> > > Hi,
> > >
> > > does anyone know where I can get
> > > /usr/lib/x86_64-linux-gnu/samba/vfs/glusterfs.so, or where I can get
> > > its source to compile it for debian jessie?
> >
> > What version of samba is in jessie?
> >
> > -Ira
> >
> 
> It's
> root@glstor1:/# smbd  --version
> Version 4.1.17-Debian
> 
> 
> --
> Best regards,
> Roman.
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users