Re: [Gluster-devel] races in dict_foreach() causing crashes in tier-file-creat.t

2016-03-14 Thread Xavier Hernandez

Hi,

On 11/03/16 13:55, Jeff Darcy wrote:

> > Raghavendra G and I discussed this problem, and the right way to
> > fix it is for dict_foreach to take a copy of the dictionary (without
> > using dict_foreach) inside a lock and then loop over the local copy.
> > I am worried about the performance implications of this.
>
> I'm worried about the correctness implications.  Any such copy will have
> to do the equivalent of dict_foreach even if it doesn't call the function
> of that name, and will be subject to the same races.
>
> > Also included Xavi, who earlier said we need to change dict.c, but it
> > is a bigger change. Maybe the time has come? I would love to gather
> > all your inputs and implement a better version of dict if we need one.
>
> There are three solutions I can think of.
>
> (1) Have tier use STACK_WIND_COOKIE to pass the address of a lock down
> both paths, so the two can synchronize between themselves.
>
> (2) Have tier issue the lookup down the two paths *serially* instead
> of in parallel, so there's no contention on the dictionary.  This is
> probably the most complicated *and* the worst for performance, but I
> include it for the sake of completeness.
>
> (3) Enhance dict_t with a gf_lock_t that can be used to serialize
> access.  We don't have to use the lock in every invocation of
> dict_foreach (though we should probably investigate that).  For
> now, we can just use it in the code paths we know are contending.
>
> I suspect Xavi was suggesting (3) and I concur that it's the best
> long-term solution.


My initial idea was quite different, and probably impractical to
implement due to its implications for existing code. However, here it is:


The idea was to have a dict based on reference counting controlled by
the caller (instead of by the callee, as it's done now) and without
locks (only atomic operations to increase/decrease the refcount). When
a dict needs to be modified, it's only modified in place if its
refcount is 1; otherwise a copy is made before changing it. This means
that any dict with more than one reference becomes read-only, avoiding
any interference between xlators.
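
Roughly, the copy-on-write part could look like this (just a sketch:
the atomic refcount and dict_prepare_write() don't exist in today's
dict.c):

    /* Return a dict that is safe to modify. If we are the sole owner
       we can change it in place; otherwise we take a private copy and
       drop our share of the old, now read-only, dict. */
    dict_t *dict_prepare_write(dict_t *dict)
    {
        dict_t *copy;

        if (atomic_load(&dict->refcount) == 1)
            return dict;

        copy = dict_copy(dict, NULL);  /* private copy, refcount == 1 */
        dict_unref(dict);

        return copy;
    }

    /* Any modification would then be done as: */
    dict = dict_prepare_write(dict);
    dict_set(dict, key, value);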


The most critical change is the caller-controlled refcounting.
Currently, when an xlator needs to use a dict, it takes a ref. This
makes it impossible to know whether the already existing ref (owned by
the caller) will be used again or not, so we would always have to copy
dicts, which is inefficient.


When it's the caller who controls the refcounting, if it still needs
the reference to do some other work after calling a function that
takes the dict, it increases the refcount. Otherwise it simply "gives"
its reference to the called function. With this approach we'll need to
distinguish between functions that really need their own reference and
those that can share the reference of the caller.



Example:

  Current code:

    dict = ...;

    /* We won't use dict anymore in this function, but we still hold
       a reference on the dict */
    do_something(..., dict, ...);

    /* We release our reference */
    dict_unref(dict);

    -----

    /* Function that needs its own reference to dict */
    do_something(..., dict_t *dict, ...) {
        dict_ref(dict);

        /* We do something in the background that uses the dict. It
           will take care of releasing the ref when it's not needed
           anymore. */
        do_something_in_background(..., dict, ...);
    }

    /* Function that doesn't need its own reference to dict */
    do_something(..., dict_t *dict, ...) {
        /* We do something with the dict here, but we don't return
           until all the work has been done. In this case we can share
           the ref of the caller. */
        ...
    }


  Proposed change:

    dict = ...;

    /* We won't use dict anymore, so we pass our reference to
       do_something() */
    do_something(..., dict, ...);

    /* If do_something() shares our reference, we'll need to release
       it here:

           dict_unref(dict);
    */

    /* Here we cannot use dict because we don't have any reference to
       it */

    -----

    /* Function that needs its own reference to dict */
    do_something(..., dict_t *dict, ...) {
        /* We already have a reference that we can use */
        ...

        /* We do something in the background that uses the dict. It
           will take care of releasing the ref when it's not needed
           anymore. */
        do_something_in_background(..., dict, ...);

        /* dict cannot be used here unless a ref has been taken before
           calling do_something_in_background(). */
    }

    /* Function that doesn't need its own reference to dict */
    do_something(..., dict_t *dict, ...) {
        /* We do something with the dict here, but we don't return
           until all work has been done. In this case we can share the
           ref of the caller. */
        ...
    }


If we need to keep a reference on the caller's side:

    dict = ...;

    /* Acquire a new reference for do_something() if it won't share
       our reference */
    dict_ref(dict);

    do_something(..., dict, ...);

    /* We still hold our own reference, so dict can be used here; we
       release it when we're done */
    ...
    dict_unref(dict);

Re: [Gluster-devel] races in dict_foreach() causing crashes in tier-file-creat.t

2016-03-13 Thread Pranith Kumar Karampuri



On 03/11/2016 10:16 PM, Jeff Darcy wrote:

> > Tier does send lookups serially, which fail on the hashed subvolumes
> > of the dhts. Both of them trigger lookup_everywhere, which is executed
> > in epoll threads, thus they are executed in parallel.
>
> According to your earlier description, items are being deleted by EC
> (i.e. the cold tier) while AFR (i.e. the hot tier) is trying to access
> the same dictionary.  That sounds pretty parallel across the two.  It
> doesn't matter, though, because I think we agree that this solution is
> too messy anyway.
>
> > > (3) Enhance dict_t with a gf_lock_t that can be used to serialize
> > > access.  We don't have to use the lock in every invocation of
> > > dict_foreach (though we should probably investigate that).  For
> > > now, we can just use it in the code paths we know are contending.
> >
> > dict already has a lock.
>
> Yes, we have a lock which is used in get/set/add/delete - but not in
> dict_foreach for the reasons you mention.  I should have been clearer
> that I was suggesting a *different* lock that's only used in this
> case.  Manually locking with the lock we already have might not work
> due to recursive locking, but the lock ordering with a separate
> higher-level lock is pretty simple and it won't affect any other uses.

I didn't quite get it. Could you elaborate, please? The race is
between (1) dict_set() and (2) dict_foreach().


Pranith



> > Xavi was mentioning that dict_copy_with_ref is too costly, which is
> > true; if we make this change it will be even more costly :-(.
>
> There are probably MVCC-ish approaches that could be both safe and
> performant, but they'd be quite complicated to implement.




Re: [Gluster-devel] races in dict_foreach() causing crashes in tier-file-creat.t

2016-03-11 Thread Jeff Darcy
> Tier does send lookups serially, which fail on the hashed subvolumes
> of the dhts. Both of them trigger lookup_everywhere, which is executed
> in epoll threads, thus they are executed in parallel.

According to your earlier description, items are being deleted by EC
(i.e. the cold tier) while AFR (i.e. the hot tier) is trying to access
the same dictionary.  That sounds pretty parallel across the two.  It
doesn't matter, though, because I think we agree that this solution is
too messy anyway.

> > (3) Enhance dict_t with a gf_lock_t that can be used to serialize
> > access.  We don't have to use the lock in every invocation of
> > dict_foreach (though we should probably investigate that).  For
> > now, we can just use it in the code paths we know are contending.
> 
> dict already has a lock.

Yes, we have a lock which is used in get/set/add/delete - but not in
dict_foreach for the reasons you mention.  I should have been clearer
that I was suggesting a *different* lock that's only used in this
case.  Manually locking with the lock we already have might not work
due to recursive locking, but the lock ordering with a separate
higher-level lock is pretty simple and it won't affect any other uses.
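
A rough sketch of what I mean (the extra field and its name are
invented; dict_t has no such field today):

    /* Second, higher-level lock, taken only by the contending code
       paths.  It is always acquired *before* the existing dict->lock
       (which dict_set takes internally), never inside it, so the lock
       ordering is fixed and can't deadlock. */
    struct _dict {
        ...
        gf_lock_t lock;   /* existing: protects get/set/add/delete */
        gf_lock_t xlock;  /* new: serializes whole-dict operations */
    };

    /* Iterating path: */
    LOCK(&dict->xlock);
    dict_foreach(dict, fn, data);
    UNLOCK(&dict->xlock);

    /* Contending writer path: */
    LOCK(&dict->xlock);
    dict_set(dict, key, value);   /* takes dict->lock internally */
    UNLOCK(&dict->xlock);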

> Xavi was mentioning that dict_copy_with_ref is too costly, which is
> true; if we make this change it will be even more costly :-(.

There are probably MVCC-ish approaches that could be both safe and
performant, but they'd be quite complicated to implement.
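
Very roughly, the shape would be: writers never modify a published
dict, they build a new version and publish it with an atomic pointer
swap, and readers just use whatever version is current (sketch only,
with an invented "holder" for wherever the dict pointer lives):

    dict_t *cur = atomic_load(&holder->dict);  /* read side: no lock;
                                                  versions are immutable */
    dict_foreach(cur, fn, data);

    dict_t *next = dict_copy(cur, NULL);       /* write side */
    dict_set(next, key, value);
    atomic_store(&holder->dict, next);         /* publish new version */

    /* The hard parts, safely releasing old versions and handling two
       concurrent writers, are exactly the complication. */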


Re: [Gluster-devel] races in dict_foreach() causing crashes in tier-file-creat.t

2016-03-11 Thread Pranith Kumar Karampuri



On 03/11/2016 06:25 PM, Jeff Darcy wrote:

> > Raghavendra G and I discussed this problem, and the right way to
> > fix it is for dict_foreach to take a copy of the dictionary (without
> > using dict_foreach) inside a lock and then loop over the local copy.
> > I am worried about the performance implications of this.

I think what I forgot to say was that the copy will happen inside the
lock that dict already has.



> I'm worried about the correctness implications.  Any such copy will have
> to do the equivalent of dict_foreach even if it doesn't call the function
> of that name, and will be subject to the same races.
>
> > Also included Xavi, who earlier said we need to change dict.c, but it
> > is a bigger change. Maybe the time has come? I would love to gather
> > all your inputs and implement a better version of dict if we need one.
>
> There are three solutions I can think of.
>
> (1) Have tier use STACK_WIND_COOKIE to pass the address of a lock down
> both paths, so the two can synchronize between themselves.
>
> (2) Have tier issue the lookup down the two paths *serially* instead
> of in parallel, so there's no contention on the dictionary.  This is
> probably the most complicated *and* the worst for performance, but I
> include it for the sake of completeness.

Tier does send lookups serially, which fail on the hashed subvolumes
of the dhts. Both of them trigger lookup_everywhere, which is executed
in epoll threads, thus they are executed in parallel. So (1) won't be
simple enough, in the sense that the lock would have to be remembered
in local, etc.




> (3) Enhance dict_t with a gf_lock_t that can be used to serialize
> access.  We don't have to use the lock in every invocation of
> dict_foreach (though we should probably investigate that).  For
> now, we can just use it in the code paths we know are contending.

dict already has a lock. The performance implication I was talking
about is that if we execute dict_foreach inside that lock, syscalls
like getxattr in posix would then happen while the lock is held, which
is costly. That is the reason Du and I wanted to make the copy of the
dict inside the lock and then run dict_foreach on that copy without
any locks, so that we don't stay inside the lock for too long. The
copy shouldn't use dict_foreach for this reason :-). But I forgot to
mention the lock. I am wondering if anyone has a better solution than
this.
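
Something like this is what we had in mind (sketch only; the copy
walks members_list directly instead of going through dict_foreach):

    /* Snapshot the pairs under the dict's own lock, then run the
       callbacks on the snapshot with no lock held, so syscalls like
       getxattr in posix don't happen inside the lock. */
    dict_t *copy = dict_new();
    data_pair_t *pair;

    LOCK(&dict->lock);
    for (pair = dict->members_list; pair; pair = pair->next)
        dict_set(copy, pair->key, pair->value);  /* takes copy's lock,
            not dict's, so there is no recursive locking */
    UNLOCK(&dict->lock);

    dict_foreach(copy, fn, data);  /* unlocked; can block safely */
    dict_unref(copy);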




> I suspect Xavi was suggesting (3) and I concur that it's the best
> long-term solution.

Xavi was mentioning that dict_copy_with_ref is too costly, which is
true; if we make this change it will be even more costly :-(.


Pranith


Re: [Gluster-devel] races in dict_foreach() causing crashes in tier-file-creat.t

2016-03-11 Thread Jeff Darcy
> Raghavendra G and I discussed this problem, and the right way to
> fix it is for dict_foreach to take a copy of the dictionary (without
> using dict_foreach) inside a lock and then loop over the local copy.
> I am worried about the performance implications of this.

I'm worried about the correctness implications.  Any such copy will have
to do the equivalent of dict_foreach even if it doesn't call the function
of that name, and will be subject to the same races.

> Also included Xavi, who earlier said we need to change dict.c, but it
> is a bigger change. Maybe the time has come? I would love to gather
> all your inputs and implement a better version of dict if we need one.

There are three solutions I can think of.

(1) Have tier use STACK_WIND_COOKIE to pass the address of a lock down
both paths, so the two can synchronize between themselves (a sketch
follows at the end of this message).

(2) Have tier issue the lookup down the two paths *serially* instead
of in parallel, so there's no contention on the dictionary.  This is
probably the most complicated *and* the worst for performance, but I
include it for the sake of completeness.

(3) Enhance dict_t with a gf_lock_t that can be used to serialize
access.  We don't have to use the lock in every invocation of
dict_foreach (though we should probably investigate that).  For
now, we can just use it in the code paths we know are contending.

I suspect Xavi was suggesting (3) and I concur that it's the best
long-term solution.
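
For completeness, a sketch of (1) (hot_subvol/cold_subvol and the lock
field in tier's local are invented here, and since the cbk usually
wants the subvolume as its cookie, the lock might have to travel in
local instead):

    /* Wind the same lookup down both subvolumes, handing each the
       address of one shared lock as its cookie. */
    LOCK_INIT(&local->lock);

    STACK_WIND_COOKIE(frame, tier_lookup_cbk, &local->lock,
                      hot_subvol, hot_subvol->fops->lookup,
                      loc, xattr_req);
    STACK_WIND_COOKIE(frame, tier_lookup_cbk, &local->lock,
                      cold_subvol, cold_subvol->fops->lookup,
                      loc, xattr_req);

    /* In tier_lookup_cbk(), both paths serialize on the cookie: */
    gf_lock_t *lock = cookie;
    LOCK(lock);
    /* ... dict_set()/dict_foreach() on the shared dict ... */
    UNLOCK(lock);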