Re: [Gluster-devel] [Gluster-users] Proposal for GlusterD-2.0

2014-09-06 Thread Pranith Kumar Karampuri


On 09/05/2014 03:51 PM, Kaushal M wrote:
GlusterD performs the following functions as the management daemon for 
GlusterFS:

- Peer membership management
- Maintains consistency of configuration data across nodes 
(distributed configuration store)

- Distributed command execution (orchestration)
- Service management (manage GlusterFS daemons)
- Portmap service for GlusterFS daemons


This proposal aims to delegate the above functions to technologies 
that solve these problems well. We aim to do this in a phased manner.
The technology alternatives we would be looking for should have the 
following properties:

- Open source
- Vibrant community
- Good documentation
- Easy to deploy/manage

This would allow GlusterD's architecture to be more modular. We also 
aim to make GlusterD's architecture as transparent and observable as 
possible. Separating out these functions would allow us to do that.


The bulk of the current GlusterD code deals with keeping the configuration
of the cluster and the volumes in it consistent and available across the
nodes. The current algorithm is not scalable (O(N^2) in the number of
nodes) and doesn't prevent split-brain of the configuration. This is the
problem area we are targeting for the first phase.


As part of the first phase, we aim to delegate the distributed 
configuration store. We are exploring consul [1] as a replacement for 
the existing distributed configuration store (sum total of 
/var/lib/glusterd/* across all nodes). Consul provides a distributed
configuration store which is consistent and partition-tolerant. By
moving all Gluster-related configuration information into Consul, we
could avoid split-brain situations.
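
For illustration, here is a minimal sketch of what writing one volume option
into Consul's key/value store over its HTTP API could look like. The key
layout (gluster/volumes/<vol>/options/<name>), the use of libcurl, and the
local agent address are assumptions made up for the example; they are not
part of the proposal.

#include <stdio.h>
#include <curl/curl.h>

int
main (void)
{
        CURLcode  ret;
        CURL     *curl;

        curl_global_init (CURL_GLOBAL_DEFAULT);
        curl = curl_easy_init ();
        if (!curl)
                return 1;

        /* Consul's KV API: a PUT to /v1/kv/<key> stores the request body
         * as the value for <key>.  The key below is a made-up layout for
         * a per-volume option. */
        curl_easy_setopt (curl, CURLOPT_URL,
                          "http://127.0.0.1:8500/v1/kv/gluster/volumes/"
                          "patchy/options/cluster.server-quorum-type");
        curl_easy_setopt (curl, CURLOPT_CUSTOMREQUEST, "PUT");
        curl_easy_setopt (curl, CURLOPT_POSTFIELDS, "server");

        ret = curl_easy_perform (curl);
        if (ret != CURLE_OK)
                fprintf (stderr, "PUT failed: %s\n",
                         curl_easy_strerror (ret));

        curl_easy_cleanup (curl);
        curl_global_cleanup ();
        return (ret == CURLE_OK) ? 0 : 1;
}

Reading the option back is a GET on the same URL (appending ?raw returns
just the stored value).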
Did you get a chance to go over the following questions while making the
decision? If yes, could you please share the info?

- What are the consistency guarantees for changing the configuration in
  case of network partitions?
  - Specifically, what happens when there are 2 nodes and 1 of them is not
    reachable?
  - What are the consistency guarantees when there are more than 2 nodes?
- What are the consistency guarantees for reading the configuration in case
  of network partitions?


Pranith


All development efforts towards this proposal would happen in parallel 
to the existing GlusterD code base. The existing code base would be 
actively maintained until GlusterD-2.0 is production-ready.


This is in alignment with the GlusterFS Quattro proposals on making 
GlusterFS scalable and easy to deploy. This is the first phase ground 
work towards that goal.


Questions and suggestions are welcome.

~kaushal

[1] : http://www.consul.io/


___
Gluster-users mailing list
gluster-us...@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] basic/afr/gfid-self-heal.t on release-3.6/NetBSD

2014-09-06 Thread Emmanuel Dreyfus
Harshavardhana har...@harshavardhana.net wrote:

 This is the change, you need this patch. Since it isn't committed to
 release-3.6 - you shouldn't be using master regression tests on release-3.6

Right, but that is a bug fixed in master but still present in
release-3.6, isn't it? Why not backport that change to release-3.6?

The patch will not apply cleanly since it requires previous changes, but
perhaps it is worth working on?

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] [Gluster-users] Proposal for GlusterD-2.0

2014-09-06 Thread Atin Mukherjee


On 09/06/2014 05:55 PM, Pranith Kumar Karampuri wrote:
 
 On 09/05/2014 03:51 PM, Kaushal M wrote:
 GlusterD performs the following functions as the management daemon for
 GlusterFS:
 - Peer membership management
 - Maintains consistency of configuration data across nodes
 (distributed configuration store)
 - Distributed command execution (orchestration)
 - Service management (manage GlusterFS daemons)
 - Portmap service for GlusterFS daemons


 This proposal aims to delegate the above functions to technologies
 that solve these problems well. We aim to do this in a phased manner.
 The technology alternatives we would be looking for should have the
 following properties:
 - Open source
 - Vibrant community
 - Good documentation
 - Easy to deploy/manage

 This would allow GlusterD's architecture to be more modular. We also
 aim to make GlusterD's architecture as transparent and observable as
 possible. Separating out these functions would allow us to do that.

 The bulk of the current GlusterD code deals with keeping the configuration
 of the cluster and the volumes in it consistent and available across the
 nodes. The current algorithm is not scalable (O(N^2) in the number of
 nodes) and doesn't prevent split-brain of the configuration. This is the
 problem area we are targeting for the first phase.

 As part of the first phase, we aim to delegate the distributed
 configuration store. We are exploring consul [1] as a replacement for
 the existing distributed configuration store (sum total of
 /var/lib/glusterd/* across all nodes). Consul provides a distributed
 configuration store which is consistent and partition-tolerant. By
 moving all Gluster-related configuration information into Consul, we
 could avoid split-brain situations.
 Did you get a chance to go over the following questions while making the
 decision? If yes, could you please share the info?

 - What are the consistency guarantees for changing the configuration in
   case of network partitions?
   - Specifically, what happens when there are 2 nodes and 1 of them is not
     reachable?
   - What are the consistency guarantees when there are more than 2 nodes?
 - What are the consistency guarantees for reading the configuration in case
   of network partitions?
The Consul documentation claims that it can recover from network partitions:
http://www.consul.io/docs/internals/jepsen.html

Having said that, we are yet to do this PoC.
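
One way such a PoC could probe the read-consistency question raised above:
Consul's KV reads accept a consistency mode, and a read issued with
?consistent has to be confirmed by the current raft leader, so on a node
partitioned away from the leader it should fail rather than return stale
data. A rough, untested sketch (the key path is a made-up example):

#include <stdio.h>
#include <curl/curl.h>

int
main (void)
{
        CURLcode  ret;
        CURL     *curl;
        long      status = 0;

        curl_global_init (CURL_GLOBAL_DEFAULT);
        curl = curl_easy_init ();
        if (!curl)
                return 1;

        /* ?raw asks for the bare value; ?consistent forces a linearizable
         * read confirmed by the current raft leader.  On a node cut off
         * from the leader this should return an error ("no cluster
         * leader") instead of possibly stale data. */
        curl_easy_setopt (curl, CURLOPT_URL,
                          "http://127.0.0.1:8500/v1/kv/gluster/volumes/"
                          "patchy/options/cluster.server-quorum-type"
                          "?raw&consistent");
        ret = curl_easy_perform (curl);
        curl_easy_getinfo (curl, CURLINFO_RESPONSE_CODE, &status);
        printf ("\ncurl: %s, HTTP status: %ld\n",
                ret == CURLE_OK ? "ok" : curl_easy_strerror (ret), status);

        curl_easy_cleanup (curl);
        curl_global_cleanup ();
        return 0;
}

Write behaviour during a partition is the other half: KV writes go through
raft, so a minority partition without quorum should refuse them, which would
speak to the 2-node / unreachable-node case above (no quorum, no
configuration change).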

~Atin
 
 Pranith

 All development efforts towards this proposal would happen in parallel
 to the existing GlusterD code base. The existing code base would be
 actively maintained until GlusterD-2.0 is production-ready.

 This is in alignment with the GlusterFS Quattro proposals on making
 GlusterFS scalable and easy to deploy. This is the first phase ground
 work towards that goal.

 Questions and suggestions are welcome.

 ~kaushal

 [1] : http://www.consul.io/


 ___
 Gluster-users mailing list
 gluster-us...@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-users
 
 
 
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://supercolony.gluster.org/mailman/listinfo/gluster-devel
 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] basic/afr/gfid-self-heal.t on release-3.6/NetBSD

2014-09-06 Thread Harshavardhana

 Right, but that is a bug fixed in master but still present in
 release-3.6, isn't it? Why not backport that change to release-3.6?

 The patch will not apply cleanly, it requires previous changes, but
 perhaps it is worth working on it?



That is left to the changeset owner; perhaps it will be done in the
upcoming weeks.

-- 
Religious confuse piety with mere ritual, the virtuous confuse
regulation with outcomes
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Brick replace

2014-09-06 Thread Emmanuel Dreyfus
Hi

I am trying to get tests/basic/pump.t to pass on NetBSD, but after a few
experiments, it seems the brick-replace functionality is just broken.

I ran these steps one by one on a fresh install:

netbsd0# glusterd
netbsd0# $CLI volume create $V0 $H0:$B0/${V0}0
volume create: patchy: success: please start the volume to access data
netbsd0# $CLI volume start $V0
volume start: patchy: success
netbsd0# $GFS --volfile-id=/$V0 --volfile-server=$H0 $M0;  
netbsd0# cp -r /usr/share/misc/ $M0/
netbsd0# $CLI volume replace-brick $V0 $H0:$B0/${V0}0 $H0:$B0/${V0}1
start
volume replace-brick: success: replace-brick started successfully
ID: 98030ade-2dab-467e-86cb-cbea2436e85f
netbsd0# $CLI volume replace-brick $V0 $H0:$B0/${V0}0 $H0:$B0/${V0}1
commit
volume replace-brick: failed: Commit failed on localhost. Please check
the log file for more details.

Which logs should I look at? There is nothing in the mount log. Here is
the glusterd log:

[2014-09-07 00:39:01.905900] I
[glusterd-replace-brick.c:154:__glusterd_handle_replace_brick]
0-management: Received replace brick commit request
[2014-09-07 00:39:01.948259] I
[glusterd-replace-brick.c:1441:rb_update_srcbrick_port] 0-: adding
src-brick port no
[2014-09-07 00:39:01.951238] I
[glusterd-replace-brick.c:1495:rb_update_dstbrick_port] 0-: adding
dst-brick port no
[2014-09-07 00:39:01.974376] E
[glusterd-replace-brick.c:1780:glusterd_op_replace_brick] 0-management:
Commit operation failed
[2014-09-07 00:39:01.974423] E
[glusterd-op-sm.c:4109:glusterd_op_ac_send_commit_op] 0-management:
Commit of operation 'Volume Replace brick' failed on localhost  

The brick log has a lot of errors:

[2014-09-07 00:44:44.041565] E
[client-handshake.c:1544:client_query_portmap] 0-patchy-replace-brick:
remote-subvolume not set in volfile
[2014-09-07 00:44:44.041636] I [client.c:2215:client_rpc_notify]
0-patchy-replace-brick: disconnected from patchy-replace-brick. Client
process will keep trying to connect to glusterd until brick's port is
available

Any hint on where to look?


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] truncating grouplist

2014-09-06 Thread Emmanuel Dreyfus
Hi

I have a lot of log messages about this:
[2014-09-07 04:07:41.323747] W [rpc-clnt.c:1340:rpc_clnt_record]
0-gfs352-client-3: truncating grouplist from 500 to 87

This is produced here:

/* The number of groups and the size of lk_owner depend on oneother.
 * We can truncate the groups, but should not touch the lk_owner. */
max_groups = GF_AUTH_GLUSTERFS_MAX_GROUPS (au.lk_owner.lk_owner_len);
if (au.groups.groups_len > max_groups) {
        GF_LOG_OCCASIONALLY (gf_auth_max_groups_log, clnt->conn.name,
                             GF_LOG_WARNING, "truncating grouplist "
                             "from %d to %d", au.groups.groups_len,
                             max_groups);

        au.groups.groups_len = max_groups;
}

Wouldn't it make sense to only log if max_groups < NGROUPS_MAX? I do not
really care that a group list is truncated if my system cannot handle the
missing groups anyway.
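
An untested sketch of that guard, applied to the snippet above (assuming
NGROUPS_MAX from <limits.h> is the right limit to compare against):

/* Sketch: warn only when the truncated list is smaller than what the
 * local system could have used anyway. */
if (au.groups.groups_len > max_groups) {
        if (max_groups < NGROUPS_MAX)
                GF_LOG_OCCASIONALLY (gf_auth_max_groups_log,
                                     clnt->conn.name, GF_LOG_WARNING,
                                     "truncating grouplist "
                                     "from %d to %d",
                                     au.groups.groups_len, max_groups);

        au.groups.groups_len = max_groups;
}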

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Another transaction is in progress

2014-09-06 Thread Emmanuel Dreyfus
How am I supposed to address this?

# gluster volume status gfs352
Another transaction is in progress. Please try again after sometime.

Retrying after some time does not help. The glusterd log file at least
tells me who the culprit is:
[2014-09-07 04:20:25.820379] E [glusterd-utils.c:153:glusterd_lock]
0-management: Unable to get lock for uuid:
078015de-2186-4bd7-a4d1-017e39c16dd3, lock held by:
078015de-2186-4bd7-a4d1-017e39c16dd3

This UUID is a brick whose log file does not tell me anything in particular.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Brick replace

2014-09-06 Thread Atin Mukherjee


On 09/07/2014 06:21 AM, Emmanuel Dreyfus wrote:
 Hi
 
 I am trying to get tests/basic/pump.t to pass on NetBSD, but after a few
 experiments, it seems the brick-replace functionality is just broken.
 
 I ran these steps one by one on a fresh install:
 
 netbsd0# glusterd
 netbsd0# $CLI volume create $V0 $H0:$B0/${V0}0
 volume create: patchy: success: please start the volume to access data
 netbsd0# $CLI volume start $V0
 volume start: patchy: success
 netbsd0# $GFS --volfile-id=/$V0 --volfile-server=$H0 $M0;  
 netbsd0# cp -r /usr/share/misc/ $M0/
 netbsd0# $CLI volume replace-brick $V0 $H0:$B0/${V0}0 $H0:$B0/${V0}1
 start
 volume replace-brick: success: replace-brick started successfully
 ID: 98030ade-2dab-467e-86cb-cbea2436e85f
 netbsd0# $CLI volume replace-brick $V0 $H0:$B0/${V0}0 $H0:$B0/${V0}1
 commit
 volume replace-brick: failed: Commit failed on localhost. Please check
 the log file for more details.
 
 Which logs should I look at? There is nothing in the mount log. Here is
 the glusterd log:
 
 [2014-09-07 00:39:01.905900] I
 [glusterd-replace-brick.c:154:__glusterd_handle_replace_brick]
 0-management: Received replace brick commit request
 [2014-09-07 00:39:01.948259] I
 [glusterd-replace-brick.c:1441:rb_update_srcbrick_port] 0-: adding
 src-brick port no
 [2014-09-07 00:39:01.951238] I
 [glusterd-replace-brick.c:1495:rb_update_dstbrick_port] 0-: adding
 dst-brick port no
 [2014-09-07 00:39:01.974376] E
 [glusterd-replace-brick.c:1780:glusterd_op_replace_brick] 0-management:
 Commit operation failed
 [2014-09-07 00:39:01.974423] E
 [glusterd-op-sm.c:4109:glusterd_op_ac_send_commit_op] 0-management:
 Commit of operation 'Volume Replace brick' failed on localhost  
 
 The brick log has a lot of errors:
 
 [2014-09-07 00:44:44.041565] E
 [client-handshake.c:1544:client_query_portmap] 0-patchy-replace-brick:
 remote-subvolume not set in volfile
 [2014-09-07 00:44:44.041636] I [client.c:2215:client_rpc_notify]
 0-patchy-replace-brick: disconnected from patchy-replace-brick. Client
 process will keep trying to connect to glusterd until brick's port is
 available
 
 Any hint on where to look?

Looking at the code, I can see there are a lot of errors logged at debug
level. I would suggest you run glusterd with -LDEBUG and reproduce the
issue, so that you can get to the exact problem area by looking at the
glusterd log (just check why rb_do_operation returns -1).

Having said that, I believe we should also change the log level from
DEBUG to ERROR in a few failure cases in rb_do_operation ().
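
For illustration only, the kind of change meant here; the message text and
variables below are made up, not taken from the actual rb_do_operation ()
code:

/* before: the failure is only visible when running with -LDEBUG */
gf_log (this->name, GF_LOG_DEBUG,
        "replace-brick commit failed for %s", volname);

/* after: the failure shows up at the default log level */
gf_log (this->name, GF_LOG_ERROR,
        "replace-brick commit failed for %s", volname);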

~Atin
 
 
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel