Re: [Gluster-devel] Question on merging zfs snapshot support into the mainline glusterfs

2016-06-20 Thread B.K.Raghuram
Thanks Rajesh,

I was looking at 3.6 only to check on some locking issues that we were
seeing. However, we would like to see this in master. Please feel free to
suggest modifications/modify the code as you see fit. Are there plans for
having a more general way of integrating other underlying snapshotting
mechanisms such as btrfs/lxd as well?

On Mon, Jun 20, 2016 at 3:16 PM, Rajesh Joseph  wrote:

>
>
> On Mon, Jun 20, 2016 at 12:33 PM, Kaushal M  wrote:
>
>> On Mon, Jun 20, 2016 at 11:38 AM, B.K.Raghuram  wrote:
>> > We had hosted some changes to an old version of glusterfs (3.6.1) in
>> order
>> > to incorporate ZFS snapshot support for gluster snapshot commands. These
>> > have been done quite a while back and were not forward ported to newer
>> > versions of glusterfs. I have a couple of questions on this :
>> >
>> > 1. If one needs to incorporate these changes in their current or
>> modified
>> > form into the glusterfs master, what is the procedure to do so?
>> >
>> > 2. Since the above process may take longer to roll in, we would like to
>> get
>> > the changes into at least the latest version of the 3.6 branch. In
>> order to
>> > do this, I tried the following and needed some help :
>> >
>> > I tried to apply the two ZFS-related commits
>> > (https://github.com/fractalio/glusterfs/commits/release-3.6) to the
>> > latest gluster code in the gluster-3.6 branch. I hit one merge conflict
>> > per commit, both in xlators/mgmt/glusterd/src/glusterd-snapshot.c. The
>> > attached glusterd-snapshot.c_1 is the file with the merge conflicts
>> > after applying the first commit, and glusterd-snapshot.c_2 is the one
>> > after applying the second commit. In order to proceed, I removed the
>> > HEAD changes in each of the merge conflicts and continued, just to see
>> > if anything else would break, but it went through.
>> > glusterd-snapshot.c_1_corrected and glusterd-snapshot.c_2_corrected are
>> > the corresponding files after resolving the merge conflicts.
>> >
>> > The question I had is, are the changes that I made to correct the merge
>> > conflicts safe? If not, could someone provide some suggestions on how to
>> > correct the two conflicts?
>> >
>> > The file cmd_log contains the history of commands that I went through
>> > in the process.
>> >
>>
>> Thanks for sharing this Ram!
>>
>> Rajesh is the right person to answer your questions. As a GlusterD
>> maintainer, I'll go through this and see if I can answer as well.
>>
>>
> Overall the merge resolution seems fine, except for a few mistakes. For
> example, in glusterd-snapshot.c_2 you missed adding "(unmount == _gf_true)"
> to the while loop in the function "glusterd_do_lvm_snapshot_remove".
>
> In the function "glusterd_lvm_snapshot_remove" the wrong chunk of code was
> added: the "if" condition should break there instead of continuing.
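>
> For illustration, a minimal compilable sketch of the two control-flow
> points (this is not the actual glusterd-snapshot.c code; the variable
> names and the retry limit below are placeholders):
>
> #include <stdbool.h>
>
> static void snapshot_remove_sketch(bool unmount, int nbricks)
> {
>         int retry_count = 0;
>
>         /* glusterd_do_lvm_snapshot_remove: the retry loop should only
>          * run while an unmount is still pending, i.e. it needs the
>          * "(unmount == _gf_true)" guard that the merge dropped. */
>         while (unmount && retry_count < 10) {
>                 retry_count++;
>                 /* attempt the unmount here */
>         }
>
>         /* glusterd_lvm_snapshot_remove: when the failing condition is
>          * hit, the per-brick loop has to stop, not move on. */
>         for (int i = 0; i < nbricks; i++) {
>                 int ret = 0; /* result of removing brick i */
>                 if (ret != 0)
>                         break;   /* break here, not "continue" */
>         }
> }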
>
> Also I think it would be better to rebase the change against master
> instead of 3.6.
>
> Apart from this, I have yet to review the complete change. I have taken an
> initial look, and it seems we will need some cleanup of the code before it
> can be taken in. I also need to see how well it will work with the existing
> framework. I will go through it and provide detailed comments later.
>
> Thanks & Regards,
> Rajesh
>
>
>
>> > Thanks,
>> > -Ram
>> >
>> > ___
>> > Gluster-devel mailing list
>> > Gluster-devel@gluster.org
>> > http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] Wrong assumptions about disperse

2016-06-20 Thread Xavier Hernandez

Hi Shyam,

On 17/06/16 15:59, Shyam wrote:
> On 06/17/2016 04:59 AM, Xavier Hernandez wrote:
>
> Firstly, thanks for the overall post, it was informative and helps clarify
> some aspects of EC.
>
>> AFAIK the real problem of EC is the communications
>> layer. It adds a lot of latency and having to communicate simultaneously
>> and coordinate 6 or more bricks has a big impact.
>
> Can you elaborate on this more? Is this coordination cost lower if EC is
> coordinated from the server-side graphs (like the leader/follower models in
> JBR)? I have heard some remarks about a transaction manager in Gluster
> that you proposed; how does that help/fit in to resolve this issue?


I think one of the big problems is in the communications layer. I did 
some tests some time ago with unexpected results. On a pure distributed 
volume with a single brick mounted through FUSE on the same server that 
contains the brick (no physical network communications happen) I did the 
following tests:


* Modify protocol/server to immediately return a buffer of 0's for all
readv requests (this virtually disables all server-side xlators for readv
requests).


Observed read speed for a dd with bs=128 KB: 349 MB/s
Observed read speed for a dd with bs=32 MB (multiple 128KB readv 
requests in parallel): 744 MB/s


* Modify protocol/client to immediately return a buffer of 0's for all 
readv requests (this avoids all RPC/networking code for readv requests).


Observed read speed for bs=128 KB: 428 MB/s
Observed read speed for bs=32 MB: 1530 MB/s

* An iperf reported a speed of 4.7 GB/s

The network layer seems to be adding a high overhead, especially when
many requests are sent in parallel. This is very bad for disperse.


I think the coordination effort will be similar on the server side with the
current implementation. If we use the leader approach, coordination will be
much easier/faster in theory. However, all communications will be directed
to a single server. That could make the communications problem worse (I
haven't tested any of this, though).


The transaction approach was conceived with the idea of moving fop sorting
to the server side, without having to explicitly take locks from the
client. This would reduce the number of network round-trips and should
reduce latency, improving overall performance.


This should have a perceptible impact on write requests, which are currently
serialized on the client side. If we move the coordination to the server
side, the client can send multiple write requests in parallel, making better
use of the network bandwidth. This also gives the brick the opportunity to
combine multiple write requests into a single disk write (see the sketch
below). This is especially important for ec, which splits big blocks into
smaller ones for each brick.
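
To illustrate the coalescing idea, here is a toy sketch (not gluster code; a
real implementation inside the brick would operate on its own buffer
structures) that merges two adjacent pending writes so they can be submitted
as a single disk write:

#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

struct pending_write {
        off_t  offset;
        size_t size;
        char  *data;
};

/* Merge w2 into w1 when w2 starts exactly where w1 ends, so both requests
 * can later be submitted to disk as one write. Returns 1 if merged, 0 if
 * the requests are not adjacent, -1 on allocation failure. */
static int try_merge(struct pending_write *w1, const struct pending_write *w2)
{
        char *buf;

        if (w1->offset + (off_t)w1->size != w2->offset)
                return 0;

        buf = realloc(w1->data, w1->size + w2->size);
        if (!buf)
                return -1;

        memcpy(buf + w1->size, w2->data, w2->size);
        w1->data = buf;
        w1->size += w2->size;
        return 1;
}

The same check, applied while draining a queue of pending writes, turns
several small ec fragments into one larger write.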




> I am curious here w.r.t. DHT2, where we are breaking this down into DHT2
> client and server pieces, and on the MDC (metadata cluster) the leader
> brick of the DHT2 subvolume is responsible for some actions, like in-memory
> inode locking (say), which would otherwise be a cross-subvolume lock
> (costlier).


Unfortunately I haven't had time to read the details of the DHT2
implementation, so I cannot say much here.




> We also need transactions when we are going to update 2 different
> objects with contents (the simplest example is creating the inode for the
> file and linking its name into the parent directory), IOW when we have a
> composite operation.
>
> The above xaction needs recording, which is a lesser evil when dealing
> with a local brick, but will suffer performance penalties when dealing
> with replication or EC. I am looking at ways in which this xaction
> recording can be compounded with the first real operation that needs to
> be performed on the subvolume, but that may not always work.
>
> So what are your thoughts with regard to improving the client-side
> coordination problem that you are facing?


My point of view is that *any* coordination will work much better on the
server side. Additionally, one of the features of the transaction
framework was that multiple xlators could share a single transaction on
the same inode, reducing the number of operations needed in the general
case (currently, if two xlators need an exclusive lock, each of them
needs to issue an independent inodelk/entrylk fop). I know this is
evolving toward the leader/follower pattern, and toward having data and
metadata separated for gluster. I'm not a big fan of this approach, though.


Independently of all these changes, improving network performance will 
benefit *all* approaches.


Regards,

Xavi



> Thanks,
> Shyam

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Issues with encrypted management on 3.8.0

2016-06-20 Thread Kaushal M
On Fri, Jun 17, 2016 at 7:39 PM, Michael Wyraz  wrote:
> Hi,
>
> I have set up a 3-node cluster (4 bricks per node, 3 replicas). Without TLS
> on the management connection, everything is fine. With TLS I get several
> errors with different symptoms. Basically, the errors cause some daemons not
> to start. So e.g. quota is not working correctly, since the quota daemon
> cannot be started.
>

This is a known issue that should have been documented in the release
notes. My bad that I didn't do it.
I've added this issue, its cause, and the workaround to an existing bug about
re-connection issues with encrypted connections in glusterfs [1].

tl;dr:
This happens because the GlusterFS rpc layer in 3.8 is configured to
call getaddrinfo() with AF_UNSPEC, which returns both IPv6 and IPv4
addresses.
Most systems are configured to return IPv6 addresses with a higher priority.
GlusterD doesn't listen on IPv6 addresses yet (only on 0.0.0.0), so
the initial connection attempt to glusterd fails.
Encrypted connections have issues right now that prevent
reconnection with the next addresses (mostly IPv4) from happening.
The daemons fail to start because they cannot connect to glusterd to
get volfiles.
This doesn't happen with non-encrypted connections because
reconnection works properly there.
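
As a small standalone illustration (this is not the actual GlusterFS rpc
code), the lookup below shows why the first address tried is usually IPv6
when glusterd's management port is resolved with AF_UNSPEC:

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

int main(void)
{
        struct addrinfo hints, *res, *p;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family   = AF_UNSPEC;    /* what the 3.8 rpc layer asks for */
        hints.ai_socktype = SOCK_STREAM;

        if (getaddrinfo("localhost", "24007", &hints, &res) != 0)
                return 1;

        /* On most systems the IPv6 result (::1) is listed first, but
         * glusterd only listens on 0.0.0.0, so that first attempt fails. */
        for (p = res; p != NULL; p = p->ai_next)
                printf("%s\n", p->ai_family == AF_INET6 ? "IPv6" : "IPv4");

        freeaddrinfo(res);
        return 0;
}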

Two workarounds:
- Increase the preference for IPv4 addresses in getaddrinfo by editing
/etc/gai.conf (see the snippet below),
Or,
- Edit /etc/hosts and remove the IPv6 entry for localhost.
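
For the gai.conf workaround, on glibc systems that honour /etc/gai.conf the
usual change is to add (or uncomment) the precedence line for IPv4-mapped
addresses:

# /etc/gai.conf: prefer IPv4 destinations over IPv6
precedence ::ffff:0:0/96  100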

~kaushal



[1] https://bugzilla.redhat.com/show_bug.cgi?id=117#c11

> My environment is:
>
> Debian Jessie, "LATEST" glusterfs repository
>
> On each node in /etc/hosts:
>xxx.xxx.xxx.xxx full.qualified.node.name
> for each node including the local one.
>
> Certs:
> - Created a self signed CA Cert with XCA /etc/ssl/glusterfs.ca
>
> Certificate:
> Data:
> Version: 3 (0x2)
> Serial Number: 1 (0x1)
> Signature Algorithm: sha224WithRSAEncryption
> Issuer: CN=cluster1.backups
> Validity
> Not Before: Jun 17 00:00:00 2016 GMT
> Not After : Jun 16 23:59:59 2041 GMT
> Subject: CN=cluster1.backups
> Subject Public Key Info:
> Public Key Algorithm: rsaEncryption
> Public-Key: (4096 bit)
> Modulus:
> ...
> X509v3 extensions:
> X509v3 Basic Constraints: critical
> CA:TRUE
> X509v3 Subject Key Identifier:
> FE:BD:92:1D:8D:B5:DB:42:32:7E:BC:A3:EC:15:0D:D3:9F:64:34:69
> X509v3 Key Usage:
> Certificate Sign, CRL Sign
> Netscape Cert Type:
> SSL CA, S/MIME CA, Object Signing CA
> Netscape Comment:
> xca certificate
> Signature Algorithm: sha224WithRSAEncryption
>  
>
> - Created a Cert for each node /etc/ssl/glusterfs.pem
>
> Certificate:
> Data:
> Version: 3 (0x2)
> Serial Number: 4 (0x4)
> Signature Algorithm: sha256WithRSAEncryption
> Issuer: CN=cluster1.backups
> Validity
> Not Before: Jun 17 00:00:00 2016 GMT
> Not After : Jun 16 23:59:59 2041 GMT
> Subject: CN=c1-m3.cluster1.backups
> Subject Public Key Info:
> Public Key Algorithm: rsaEncryption
> Public-Key: (4096 bit)
> Modulus:
> ...
> X509v3 extensions:
> X509v3 Basic Constraints: critical
> CA:FALSE
> X509v3 Subject Key Identifier:
> 35:36:9D:37:BC:AA:59:34:94:3D:D9:20:73:17:74:08:CA:AA:9F:FA
> X509v3 Key Usage:
> Digital Signature, Non Repudiation, Key Encipherment
> Netscape Cert Type:
> SSL Server
> Netscape Comment:
> xca certificate
> Signature Algorithm: sha256WithRSAEncryption
>  ...
>
> - Put the Cert private key to /etc/ssl/glusterfs.key
> - Created 4096 bit dh params to /etc/ssl/dhparam.pem
>
> Here are the corresponding error logs when I start the volume with TLS
> enabled from this node (other nodes are similar):
>
>
> ==> /var/log/glusterfs/cli.log <==
> [2016-06-17 13:44:22.577033] I [cli.c:730:main] 0-cli: Started running
> gluster with version 3.8.0
> [2016-06-17 13:44:22.580865] I [socket.c:4047:socket_init] 0-glusterfs: SSL
> support for glusterd is ENABLED
> [2016-06-17 13:44:22.581855] I [socket.c:4047:socket_init] 0-glusterfs: SSL
> support for glusterd is ENABLED
> [2016-06-17 13:44:22.654191] I [MSGID: 101190]
> [event-epoll.c:628:event_dispatch_epoll_worker] 0-epoll: Started thread with
> index 1
> [2016-06-17 13:44:22.654277] W [socket.c:696:__socket_rwv] 0-glusterfs:
> readv on /var/run/gluster/quotad.socket failed (Invalid argument)
>
> ==> /var/log/glusterfs/etc-glusterfs-glusterd.vol.log <==
> [2016-06-17 13:44:22.668903] W [common-utils.c:1805:gf_string2boolean]
> 

[Gluster-devel] Requesting review for patch: http://review.gluster.org/#/c/13901/

2016-06-20 Thread Susant Palai
Hi,
  The patch in $subject is needed for locks (post lock migration) and leases
to work. Requesting your reviews ASAP so that this can be targeted for 3.8.1.

Thanks,
 Susant~
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel