[Gluster-devel] Any review is appreciated. Why gluster server_connection_cleanup cleans up uncleanly and file flocks leak under frequent network disconnection

2014-09-17 Thread Jaden Liang
Hi all,

After several days of tracking, we finally pinpointed why glusterfs
fails to cleanly detach file flocks under frequent network
disconnection. We are now working on a patch to submit. Here are the
details of this issue. Any suggestions will be appreciated!

First of all, as I mentioned in
http://supercolony.gluster.org/pipermail/gluster-devel/2014-September/042233.html
this issue happens under frequent network disconnection.

According to the sources, the server-side cleanup work is done in
server_connection_cleanup(). When RPCSVC_EVENT_DISCONNECT happens,
execution reaches this code:

int
server_rpc_notify ()
{
        ..
        case RPCSVC_EVENT_DISCONNECT:
        ..
                if (!conf->lk_heal) {
                        server_conn_ref (conn);
                        server_connection_put (this, conn, &detached);
                        if (detached)
                                server_connection_cleanup (this, conn,
                                                           INTERNAL_LOCKS |
                                                           POSIX_LOCKS);
                        server_conn_unref (conn);
                        ..
                }
}

server_connection_cleanup() is only called when the variable 'detached'
is true, and 'detached' is set by server_connection_put():
server_connection_t*
server_connection_put (xlator_t *this, server_connection_t *conn,
                       gf_boolean_t *detached)
{
        server_conf_t  *conf  = NULL;
        gf_boolean_t    unref = _gf_false;

        if (detached)
                *detached = _gf_false;
        conf = this->private;
        pthread_mutex_lock (&conf->mutex);
        {
                conn->bind_ref--;
                if (!conn->bind_ref) {
                        list_del_init (&conn->list);
                        unref = _gf_true;
                }
        }
        pthread_mutex_unlock (&conf->mutex);
        if (unref) {
                gf_log (this->name, GF_LOG_INFO, "Shutting down connection %s",
                        conn->id);
                if (detached)
                        *detached = _gf_true;
                server_conn_unref (conn);
                conn = NULL;
        }
        return conn;
}

'detached' is only set to _gf_true when 'conn->bind_ref' drops to 0.
'conn->bind_ref' is managed in server_connection_get(): it is set to 1
for a new connection, or incremented for an existing one.

server_connection_t *
server_connection_get (xlator_t *this, const char *id)
{
        ..
        list_for_each_entry (trav, &conf->conns, list) {
                if (!strcmp (trav->id, id)) {
                        conn = trav;
                        conn->bind_ref++;
                        goto unlock;
                }
        }
        ..
}

When the connection id is the same, 'conn->bind_ref' is incremented.
Therefore, the problem should be a mismatch between reference increments
and decrements. We then added some logs to verify this guess.
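
For reference, the added debug line in server_connection_get() looked
roughly like this (reconstructed from the output below; the exact
wording and the 'found' flag are inferred, not copied from our patch):

gf_log (this->name, GF_LOG_DEBUG,
        "server connection id: %s, conn->bind_ref:%d, found:%d",
        conn->id, conn->bind_ref, found);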

// 1st connection comes in, and there is no id
'host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0'
in the connection table. 'conn->bind_ref' is set to 1.
[2014-09-17 04:42:28.950693] D [server-helpers.c:712:server_connection_get]
0-vs_vol_rep2-server: server connection id:
host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0,
conn->bind_ref:1, found:0
[2014-09-17 04:42:28.950717] D [server-handshake.c:430:server_setvolume]
0-vs_vol_rep2-server: Connected to
host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0
[2014-09-17 04:42:28.950758] I [server-handshake.c:567:server_setvolume]
0-vs_vol_rep2-server: accepted client from
host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0
(version: 3.4.5) (peer: host-000c29e93d20:1015)
..
// Keep running several minutes...
..
// Network disconnects here. The client-side TCP socket is torn down by
timeout, but the server-side socket stays connected. AT THIS MOMENT, the
network is restored. The client reconnects with a new TCP connection
JUST BEFORE the last server-side socket is reset. Note that at this
point there are 2 valid sockets on the server side. The later, new
connection looks up the same conn id
'host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0'
in the connection table and increases 'conn->bind_ref' to 2.

[2014-09-17 04:46:16.135066] D [server-helpers.c:712:server_connection_get]
0-vs_vol_rep2-server: server connection id:
host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0,
conn->bind_ref:2, found:1 // HERE IT IS, the ref increases to 2!!!
[2014-09-17 04:46:16.135113] D [server-handshake.c:430:server_setvolume]
0-vs_vol_rep2-server: Connected to
host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0
[2014-09-17 04:46:16.135157] I [server-handshake.c:567:server_setvolume]
0-vs_vol_rep2-server: accepted client from
host-000c29e93d20-8661-2014/09/13-11:02:26:995090-vs_vol_rep2-client-2-0
(version: 3.4.5) (peer: host-000c29e93d20:1018)

// 
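
The flock leak follows directly from the code above: when the stale
server-side socket is finally reset, server_connection_put() only
decrements conn->bind_ref from 2 to 1, so 'detached' stays _gf_false,
server_connection_cleanup() is never called for that transport, and the
locks it held are never flushed. A minimal standalone sketch of the
sequence (hypothetical illustration, not GlusterFS source):

#include <stdio.h>

int
main (void)
{
        int bind_ref = 0;
        int detached = 0;

        bind_ref = 1;   /* 1st connect: conn created, ref set to 1      */
        bind_ref++;     /* reconnect with the same conn id while the
                           stale server-side socket still exists: ref
                           becomes 2 */

        /* stale socket finally reset -> RPCSVC_EVENT_DISCONNECT */
        detached = (--bind_ref == 0);   /* ref drops to 1, detached = 0 */
        if (!detached)
                printf ("cleanup skipped: locks of the dead transport leak\n");

        return 0;
}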

[Gluster-devel] Reminder: Weekly GlusterFS Community Meeting in 45 minutes

2014-09-17 Thread Justin Clift
Reminder!!!

The weekly Gluster Community meeting is in 45 minutes, in
#gluster-meeting on IRC.

This is a completely public meeting, everyone is encouraged
to attend and be a part of it. :)

To add Agenda items
***

Add new items under the "Other items to discuss" point on the
Etherpad:

  https://public.pad.fsfe.org/p/gluster-community-meetings

And be at the meeting to explain what they're about. :)

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Reminder: Weekly GlusterFS Community Meeting in 45 minutes

2014-09-17 Thread Justin Clift
On 17/09/2014, at 12:14 PM, Justin Clift wrote:
 Reminder!!!
 
 The weekly Gluster Community meeting is in 45 minutes, in
 #gluster-meeting on IRC.
 
 This is a completely public meeting, everyone is encouraged
 to attend and be a part of it. :)

Short meeting today. ;)

Meeting Minutes:

  
http://meetbot.fedoraproject.org/gluster-meeting/2014-09-17/gluster-meeting.2014-09-17-12.02.html

Full logs:

  
http://meetbot.fedoraproject.org/gluster-meeting/2014-09-17/gluster-meeting.2014-09-17-12.02.log.html

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] Transparent encryption in GlusterFS: Implications on manageability

2014-09-17 Thread Edward Shishkin
Hi all,

Unfortunately it is impossible to validate non-trusted volfiles
using the existing glusterfs options. The semantics and format of values
passed via --xlator-option don't allow delivering trusted
values without compromising security.

So I have added a new option, --secure-xlator-option.
Please review:
review.gluster.org/8657

Thanks,
Edward.


On Wed, 13 Aug 2014 12:26:29 -0700
Anand Avati av...@gluster.org wrote:

 +1 for all the points.
 
 
 On Wed, Aug 13, 2014 at 11:22 AM, Jeff Darcy jda...@redhat.com
 wrote:
 
   I.1 Generating the master volume key
  
  
   Master volume key should be generated by the user on a trusted
   machine. Recommendations on master key generation are provided in
   section 6.2 of the manpages [1]. Generating the master volume key
   is the user's responsibility.
 
  That was fine for an initial implementation, but it's still the
  single largest obstacle to adoption of this feature.  Looking
  forward, we need to provide full CLI support for generating keys in
  the necessary format, specifying their location, etc.
 
  I.2 Location of the master volume key when mounting a
  volume
  
  
   At mount time the crypt translator searches for a master volume
   key on the client machine at the location specified by the
   respective translator option. If there is no key at the
   specified location, or the key at the specified location is in an
   improper format, then the mount will fail. Otherwise, the crypt
   translator loads the key into its private memory data structures.

   Location of the master volume key can be specified at volume
   creation time (see option master-key, section 6.7 of the man
   pages [1]). However, this option can be overridden by the user at
   mount time to specify another location; see section 7 of the
   manpages [1], steps 6, 7, 8.
 
  Again, we need to improve on this.  We should support this as a
  volume or mount option in its own right, not rely on the generic
  --xlator-option mechanism.  Adding options to mount.glusterfs isn't
  hard.  Alternatively, we could make this look like a volume option
  settable once through the CLI, even though the path is stored
  locally on the client.  Or we could provide a separate
  special-purpose command/script, which again only needs to be run
  once.  It would even be acceptable to treat the path to the key
  file (not its contents!) as a true volume option, stored on the
  servers.  Any of these would be better than requiring the user to
  understand our volfile format and construction so that they can add
  the necessary option by hand.
 
  II. Check graph of translators on your client
   machine after mount!
  
  
   During mount your client machine receives configuration info from
   the non-trusted server. In particular, this info contains the
   graph of translators, which can be tampered with so
   that encryption won't be invoked for your volume at all. So it is
   highly important to verify this graph. After a successful mount,
   make sure that the graph of translators contains the crypt
   translator with proper options (see FAQ#1, section 11 of the
   manpages [1]).
 
  It is important to verify the graph, but not by poking through log
  files and not without more information about what to look for.  So
  we got a volfile that includes the crypt translator, with some
  options.  The *code* should ensure that the master-key option has
  the value from the command line or local config, and not some
  other.  If we have to add special support for this in
  otherwise-generic graph initialization code, that's fine.
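 
  A minimal sketch of what such a check could look like during
  client-side graph initialization (xlator_t and dict_get_str() are
  existing GlusterFS names; the function, its callsite, and
  local_key_path are assumptions):
 
  #include <string.h>
  #include "xlator.h"   /* xlator_t, dict_get_str() */
 
  /* Hypothetical check while initializing the client graph: reject the
     volfile if the crypt xlator's master-key option does not match the
     locally configured path. */
  static int
  verify_crypt_master_key (xlator_t *crypt_xl, const char *local_key_path)
  {
          char *graph_value = NULL;
 
          /* dict_get_str() returns 0 on success */
          if (dict_get_str (crypt_xl->options, "master-key", &graph_value))
                  return -1;  /* crypt xlator present but option missing */
 
          if (strcmp (graph_value, local_key_path))
                  return -1;  /* server-supplied volfile was tampered with */
 
          return 0;
  }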

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] how do you debug ref leaks?

2014-09-17 Thread Pranith Kumar Karampuri

hi,
Till now, the only method I have used to find ref leaks effectively is to
find which operation is causing the ref leaks and then read the code to
see if there is a ref leak somewhere. Valgrind doesn't solve this problem
because the memory is still reachable from the inode table etc. I am just
wondering if there is a more effective way anyone else knows of. Do you
guys think we need a better mechanism for finding ref leaks? At least one
that decreases the search space significantly, i.e. to xlator y, fop f,
etc. It would be better if we can come up with ways to integrate statedump
with this infra, just like we did for mem-accounting.


One way I thought of was to introduce new APIs called
xl_fop_dict/inode/fd_ref/unref (). Each xlator keeps an array of num_fops
counters per inode/dict/fd and increments/decrements them accordingly, and
dumps this info on statedump.
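
Roughly, a sketch of what the API could look like (glusterfs_fop_t,
GF_FOP_MAXVALUE, dict_ref() and xl->private are existing GlusterFS
names; the tracking struct and wrapper are hypothetical):

#include "xlator.h"   /* glusterfs_fop_t, GF_FOP_MAXVALUE, dict_t, xlator_t */

/* Hypothetical per-xlator table: one counter per fop for each generic
   object type, dumped on statedump. */
struct xl_ref_track {
        int dict_refs[GF_FOP_MAXVALUE];
        int inode_refs[GF_FOP_MAXVALUE];
        int fd_refs[GF_FOP_MAXVALUE];
};

/* Hypothetical wrapper: bump this xlator's (fop, dict) counter, then
   take the real reference. A matching xl_fop_dict_unref() would
   decrement the counter and call dict_unref(). */
static inline dict_t *
xl_fop_dict_ref (xlator_t *xl, glusterfs_fop_t fop, dict_t *dict)
{
        struct xl_ref_track *track = xl->private;  /* assumed placement */

        track->dict_refs[fop]++;
        return dict_ref (dict);
}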


I myself am not completely sure about this idea. It requires all xlators 
to change.


Any ideas?

Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] how do you debug ref leaks?

2014-09-17 Thread Raghavendra Gowdappa


- Original Message -
 From: Raghavendra Gowdappa rgowd...@redhat.com
 To: Pranith Kumar Karampuri pkara...@redhat.com
 Cc: Gluster Devel gluster-devel@gluster.org
 Sent: Thursday, September 18, 2014 10:08:15 AM
 Subject: Re: [Gluster-devel] how do you debug ref leaks?
 
 For eg., if a dictionary is not freed because of a non-zero refcount,
 information about who holds those references would help to narrow down
 the code path or component.

This solution might be rudimentary; someone who has worked on things
like garbage collection can probably give better answers. This discussion
also reminds me of Greenspun's tenth rule [1].

[1] http://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

 
 - Original Message -
  From: Pranith Kumar Karampuri pkara...@redhat.com
  To: Raghavendra Gowdappa rgowd...@redhat.com
  Cc: Gluster Devel gluster-devel@gluster.org
  Sent: Thursday, September 18, 2014 10:05:18 AM
  Subject: Re: [Gluster-devel] how do you debug ref leaks?
  
  
  On 09/18/2014 09:59 AM, Raghavendra Gowdappa wrote:
   One thing that would be helpful is allocator info for generic objects
   like dict, inode, fd etc. That way we wouldn't have to sift through large
   amount of code.
  Could you elaborate the idea please.
  
  Pranith
   - Original Message -
   From: Pranith Kumar Karampuri pkara...@redhat.com
   To: Gluster Devel gluster-devel@gluster.org
   Sent: Thursday, September 18, 2014 7:43:00 AM
   Subject: [Gluster-devel] how do you debug ref leaks?
  
   hi,
 Till now the only method I used to find ref leaks effectively is
 to
   find what operation is causing ref leaks and read the code to find if
   there is a ref-leak somewhere. Valgrind doesn't solve this problem
   because it is reachable memory from inode-table etc. I am just wondering
   if there is an effective way anyone else knows of. Do you guys think we
   need a better mechanism of finding refleaks? At least which decreases
   the search space significantly i.e. xlator y, fop f etc? It would be
   better if we can come up with ways to integrate statedump and this infra
   just like we did for mem-accounting.
  
   One way I thought was to introduce new apis called
   xl_fop_dict/inode/fd_ref/unref (). Each xl keeps an array of num_fops
   per inode/dict/fd and increments/decrements accordingly. Dump this info
   on statedump.
  
   I myself am not completely sure about this idea. It requires all xlators
   to change.
  
   Any ideas?
  
   Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] how do you debug ref leaks?

2014-09-17 Thread Pranith Kumar Karampuri


On 09/18/2014 10:08 AM, Raghavendra Gowdappa wrote:

> For eg., if a dictionary is not freed because of a non-zero refcount,
> information about who holds those references would help to narrow down
> the code path or component.

Yes, that is the aim. The implementation I suggested tries to get that
information per xlator. Are you saying it is better to store it in the
dict/inode/fd itself? I actually wrote a patch some time back to do
something similar (not tested yet :-) ).


diff --git a/libglusterfs/src/dict.c b/libglusterfs/src/dict.c
index 37a6b2c..bd4c438 100644
--- a/libglusterfs/src/dict.c
+++ b/libglusterfs/src/dict.c
@@ -100,8 +100,10 @@ dict_new (void)
 
         dict = get_new_dict_full(1);
 
-        if (dict)
+        if (dict) {
+                dict->refs = get_new_dict_full(1);
                 dict_ref (dict);
+        }
 
         return dict;
 }
@@ -446,6 +448,7 @@ dict_destroy (dict_t *this)
         data_pair_t *pair = this->members_list;
         data_pair_t *prev = this->members_list;
 
+        dict_destroy (this->refs);
         LOCK_DESTROY (&this->lock);
 
         while (prev) {
@@ -495,15 +498,21 @@ dict_unref (dict_t *this)
 dict_t *
 dict_ref (dict_t *this)
 {
+        int32_t   ref = 0;
+        xlator_t *x   = NULL;
         if (!this) {
                 gf_log_callingfn ("dict", GF_LOG_WARNING, "dict is NULL");
                 return NULL;
         }
 
+        x = THIS;
         LOCK (&this->lock);
 
         this->refcount++;
 
+        dict_get_int32 (this->refs, x->name, &ref);
+        dict_set_int32 (this->refs, x->name, ref + 1);
+
         UNLOCK (&this->lock);
 
         return this;
@@ -513,15 +522,20 @@ void
 data_unref (data_t *this)
 {
         int32_t   ref;
+        xlator_t *x = NULL;
 
         if (!this) {
                 gf_log_callingfn ("dict", GF_LOG_WARNING, "dict is NULL");
                 return;
         }
 
+        x = THIS;
         LOCK (&this->lock);
 
         this->refcount--;
+        dict_get_int32 (this->refs, x->name, &ref);
+        dict_set_int32 (this->refs, x->name, ref - 1);
+
         ref = this->refcount;
 
         UNLOCK (&this->lock);
diff --git a/libglusterfs/src/dict.h b/libglusterfs/src/dict.h
index 682c152..33ed7bd 100644
--- a/libglusterfs/src/dict.h
+++ b/libglusterfs/src/dict.h
@@ -93,6 +93,7 @@ struct _dict {
         data_pair_t    *members_internal;
         data_pair_t     free_pair;
         gf_boolean_t    free_pair_in_use;
+        struct _dict   *refs;
 };

I was not happy with this implementation either.

A similar implementation for inode is here: http://review.gluster.com/8302

But I am not happy with any of these implementations, probably because
they are still not granular enough, i.e. fop info is missing. It is better
than no info but still bad. We can't get it into statedump either. So
when a user reports high memory usage we still have to spend a lot of time
looking over all the places where the dicts are allocated, which is bad :-(.
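
Getting it into statedump would need something like this sketch on top
of the patch above (dict_foreach(), gf_proc_dump_write() and
data_to_int32() are existing helpers; the refs field and the hook
itself are hypothetical):

/* Hypothetical statedump hook: emit one "<xlator-name>=<refcount>"
   line per holder recorded in the refs dict added by the patch above. */
static int
dump_ref_holder (dict_t *refs, char *key, data_t *value, void *unused)
{
        gf_proc_dump_write (key, "%d", data_to_int32 (value));
        return 0;
}

static void
dict_dump_ref_holders (dict_t *this)
{
        if (this->refs)
                dict_foreach (this->refs, dump_ref_holder, NULL);
}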


Pranith


- Original Message -

From: Pranith Kumar Karampuri pkara...@redhat.com
To: Raghavendra Gowdappa rgowd...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Thursday, September 18, 2014 10:05:18 AM
Subject: Re: [Gluster-devel] how do you debug ref leaks?


On 09/18/2014 09:59 AM, Raghavendra Gowdappa wrote:

One thing that would be helpful is allocator info for generic objects
like dict, inode, fd etc. That way we wouldn't have to sift through large
amount of code.

Could you elaborate the idea please.

Pranith

- Original Message -

From: Pranith Kumar Karampuri pkara...@redhat.com
To: Gluster Devel gluster-devel@gluster.org
Sent: Thursday, September 18, 2014 7:43:00 AM
Subject: [Gluster-devel] how do you debug ref leaks?

hi,
   Till now the only method I used to find ref leaks effectively is to
find what operation is causing ref leaks and read the code to find if
there is a ref-leak somewhere. Valgrind doesn't solve this problem
because it is reachable memory from inode-table etc. I am just wondering
if there is an effective way anyone else knows of. Do you guys think we
need a better mechanism of finding refleaks? At least which decreases
the search space significantly i.e. xlator y, fop f etc? It would be
better if we can come up with ways to integrate statedump and this infra
just like we did for mem-accounting.

One way I thought was to introduce new apis called
xl_fop_dict/inode/fd_ref/unref (). Each xl keeps an array of num_fops
per inode/dict/fd and increments/decrements accordingly. Dump this info
on statedump.

I myself am not completely sure about this idea. It requires all xlators
to change.

Any ideas?

Pranith





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] glusterfs replica volume self-heal of lots of small files is very, very slow! How to improve it? Why is it slow?

2014-09-17 Thread justgluste...@gmail.com
Hi all:
I did the following test:
I created a glusterfs replica volume (replica count 2) with two server
nodes (server A and server B), using XFS as the underlying filesystem,
then mounted the volume on a client node.
Then I shut down the network of the server A node. On the client node, I
copied in a directory (which contains a lot of small files); the
directory size is 2.9 GB.
When the copy finished, I unmounted the volume from the client, then I
restored the network of the server A node. Now the glusterfs self-heal
daemon starts healing the directory from server B to server A.
In the end, I found that the self-heal daemon took 40 minutes to heal
the directory. That's too slow! Why?

I found the following options related to self-heal:
cluster.self-heal-window-size
cluster.self-heal-readdir-size
cluster.background-self-heal-count

I then configured:
cluster.self-heal-window-size to 1024 (max value)
cluster.self-heal-readdir-size to 131072 (max value)

and ran the same test case again. This time healing the directory took
35 minutes; the improvement is not significant.
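
For reference, the two options above would be set from the gluster CLI
like this (substitute your own volume name for <volname>):

gluster volume set <volname> cluster.self-heal-window-size 1024
gluster volume set <volname> cluster.self-heal-readdir-size 131072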
  

I want to ask: are there better ways to improve replica volume self-heal
performance for lots of small files?

thanks!




justgluste...@gmail.com
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel