Re: [Gluster-devel] races in dict_foreach() causing crashes in tier-file-creat.t
Hi,

On 11/03/16 13:55, Jeff Darcy wrote:
>> Raghavendra G and I discussed about this problem and the right way to
>> fix it is to take a copy (without dict_foreach) of the dictionary in
>> dict_foreach inside a lock and then loop over the local dictionary. I am
>> worried about the performance implication of this
>
> I'm worried about the correctness implications. Any such copy will have
> to do the equivalent of dict_foreach even if it doesn't call the function
> of that name, and will be subject to the same races.
>
>> Also included Xavi, who earlier said we need to change dict.c but it is
>> a bigger change. May be the time has come? I would love to gather all
>> your inputs and implement a better version of dict if we need one.
>
> There are three solutions I can think of.
>
> (1) Have tier use STACK_WIND_COOKIE to pass the address of a lock down
> both paths, so the two can synchronize between themselves.
>
> (2) Have tier issue the lookup down the two paths *serially* instead of
> in parallel, so there's no contention on the dictionary. This is
> probably most complicated *and* worst for performance, but I include it
> for the sake of completeness.
>
> (3) Enhance dict_t with a gf_lock_t that can be used to serialize
> access. We don't have to use the lock in every invocation of
> dict_foreach (though we should probably investigate that). For
> now, we can just use it in the code paths we know are contending.
>
> I suspect Xavi was suggesting (3) and I concur that it's the best
> long-term solution.

My initial idea was quite different and probably impractical to implement
due to its implications for existing code. However, here it is:

The idea was to have a dict based on refcounting controlled by the caller
(instead of by the callee, as it is done now) and without locks (only
atomic operations to increase/decrease the refcount). When some dict needs
to be modified, it is only modified in place if its refcount is 1;
otherwise a copy is made before changing it. This means that any dict with
more than one reference becomes read-only, avoiding any interference
between xlators.

The most critical change is the caller-controlled refcounting. Currently,
when some xlator needs to use a dict, it takes a ref. This makes it
impossible to know whether the already existing ref (owned by the caller)
will still be used or not, so we would always have to copy dicts, which is
inefficient. When it is the caller who controls the refcounting, if it
still needs the reference to do some other work after calling a function
that needs the dict, it increases the refcount; otherwise it simply
"gives" its reference to the called function.

With this approach we'll need to distinguish between functions that really
need their own reference from those that can share the reference of the
caller.

Example:

Current code:

dict = ...;
/* We won't use dict anymore in this function, but we still hold a
   reference on the dict */
do_something(..., dict, ...);
/* We release our reference */
dict_unref(dict);

-

/* Function that needs its own reference to dict */
do_something(..., dict_t *dict, ...)
{
    dict_ref(dict);
    /* We do something in background that uses the dict. It will take care
       of releasing the ref when not needed anymore. */
    do_something_in_background(..., dict, ...);
}

/* Function that doesn't need its own reference to dict */
do_something(..., dict_t *dict, ...)
{
    /* We do something with the dict here, but we don't return until all
       work has been done. In this case we can share the ref of the
       caller. */
}

Proposed change:

dict = ...;
/* We won't use dict anymore, so we pass our reference to do_something() */
do_something(..., dict, ...);
/* If do_something() shares our reference, we'll need to release it here:
dict_unref(dict);
*/
/* Here we cannot use dict because we don't have any reference to it */

-

/* Function that needs its own reference to dict */
do_something(..., dict_t *dict, ...)
{
    /* We already have a reference that we can use */
    ...
    /* We do something in background that uses the dict. It will take care
       of releasing the ref when not needed anymore. */
    do_something_in_background(..., dict, ...);
    /* dict cannot be used here unless a ref has been taken before calling
       do_something_in_background(). */
}

/* Function that doesn't need its own reference to dict */
do_something(..., dict_t *dict, ...)
{
    /* We do something with the dict here, but we don't return until all
       work has been done. In this case we can share the ref of the
       caller. */
}

If we need to keep a reference on the caller:

dict = ...;
/* Acquire a new reference for do_something() if it won't share our
   reference */
dict_ref(dict);
do_something(..., dict, ...);
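To make the idea a bit more concrete, here is a minimal, self-contained sketch of such a copy-on-write dictionary. This is not a patch to dict.c; the cow_dict type and every function name below are hypothetical, and the real dict_t stores typed data_t values rather than plain strings:

#include <stdlib.h>
#include <string.h>
#include <stdatomic.h>

/* Hypothetical, simplified stand-in for dict_t; not the real GlusterFS type. */
typedef struct cow_pair {
    char            *key;
    char            *value;
    struct cow_pair *next;
} cow_pair_t;

typedef struct cow_dict {
    atomic_int   refcount;   /* manipulated only with atomics, no lock */
    cow_pair_t  *members;
} cow_dict_t;

cow_dict_t *
cow_dict_new (void)
{
    cow_dict_t *d = calloc (1, sizeof (*d));
    atomic_init (&d->refcount, 1);
    return d;
}

cow_dict_t *
cow_dict_ref (cow_dict_t *d)
{
    atomic_fetch_add (&d->refcount, 1);
    return d;
}

void
cow_dict_unref (cow_dict_t *d)
{
    if (atomic_fetch_sub (&d->refcount, 1) == 1) {   /* we held the last ref */
        cow_pair_t *p = d->members;
        while (p) {
            cow_pair_t *next = p->next;
            free (p->key);
            free (p->value);
            free (p);
            p = next;
        }
        free (d);
    }
}

/* Writers call this first: a dict with more than one reference is read-only,
   so any modification starts by cloning it.  The caller's reference to the
   shared dict is given up in exchange for the private clone. */
cow_dict_t *
cow_dict_prepare_write (cow_dict_t *d)
{
    if (atomic_load (&d->refcount) == 1)
        return d;                       /* sole owner, safe to modify in place */

    cow_dict_t *copy = cow_dict_new ();
    for (cow_pair_t *p = d->members; p; p = p->next) {
        cow_pair_t *np = calloc (1, sizeof (*np));
        np->key       = strdup (p->key);
        np->value     = strdup (p->value);
        np->next      = copy->members;  /* order doesn't matter for a dict */
        copy->members = np;
    }
    cow_dict_unref (d);                 /* drop our ref on the shared dict */
    return copy;
}

The key property is that a dict whose refcount is greater than 1 is never written in place, so readers such as a foreach-style traversal can walk it without taking any lock.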
Re: [Gluster-devel] races in dict_foreach() causing crashes in tier-file-creat.t
On 03/11/2016 10:16 PM, Jeff Darcy wrote:
>> Tier does send lookups serially, which fail on the hashed subvolumes of
>> dhts. Both of them trigger lookup_everywhere which is executed in epoll
>> threads, thus they are executed in parallel.
>
> According to your earlier description, items are being deleted by EC
> (i.e. the cold tier) while AFR (i.e. the hot tier) is trying to access
> the same dictionary. That sounds pretty parallel across the two. It
> doesn't matter, though, because I think we agree that this solution is
> too messy anyway.
>
>>> (3) Enhance dict_t with a gf_lock_t that can be used to serialize
>>> access. We don't have to use the lock in every invocation of
>>> dict_foreach (though we should probably investigate that). For
>>> now, we can just use it in the code paths we know are contending.
>>
>> dict already has a lock.
>
> Yes, we have a lock which is used in get/set/add/delete - but not in
> dict_foreach, for the reasons you mention. I should have been clearer
> that I was suggesting a *different* lock that's only used in this case.
> Manually locking with the lock we already have might not work due to
> recursive locking, but the lock ordering with a separate higher-level
> lock is pretty simple and it won't affect any other uses.

I didn't quite get it. Could you elaborate, please? The race is between
1) dict_set() and 2) dict_foreach().

Pranith

>> Xavi was mentioning that dict_copy_with_ref is too costly, which is
>> true, if we make this change it will be even more costly :-(.
>
> There are probably MVCC-ish approaches that could be both safe and
> performant, but they'd be quite complicated to implement.
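For anyone following the thread, the race in question is roughly the one sketched below. The sdict/pair types are condensed stand-ins, not the real structures from dict.h, and pthread_mutex_t stands in for the dict's gf_lock_t:

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct pair {
    char        *key;
    char        *value;
    struct pair *next;
} pair_t;

typedef struct sdict {
    pthread_mutex_t  lock;      /* the lock taken by set/get/add/delete */
    pair_t          *members;
} sdict_t;

/* delete takes the lock ... */
void
sdict_del (sdict_t *d, const char *key)
{
    pthread_mutex_lock (&d->lock);
    for (pair_t **pp = &d->members; *pp; pp = &(*pp)->next) {
        if (strcmp ((*pp)->key, key) == 0) {
            pair_t *victim = *pp;
            *pp = victim->next;
            free (victim->key);
            free (victim->value);
            free (victim);      /* freed while a foreach may still hold it */
            break;
        }
    }
    pthread_mutex_unlock (&d->lock);
}

/* ... but foreach does not, which is the heart of the race: */
int
sdict_foreach (sdict_t *d,
               int (*fn) (sdict_t *d, char *key, char *value, void *data),
               void *data)
{
    for (pair_t *p = d->members; p; p = p->next) {   /* no lock held */
        /* If another epoll thread runs sdict_del() on p->key right here,
           p is freed under our feet and the next dereference crashes. */
        if (fn (d, p->key, p->value, data) < 0)
            return -1;
    }
    return 0;
}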
Re: [Gluster-devel] races in dict_foreach() causing crashes in tier-file-creat.t
> Tier does send lookups serially, which fail on the hashed subvolumes of
> dhts. Both of them trigger lookup_everywhere which is executed in epoll
> threads, thus they are executed in parallel.

According to your earlier description, items are being deleted by EC (i.e.
the cold tier) while AFR (i.e. the hot tier) is trying to access the same
dictionary. That sounds pretty parallel across the two. It doesn't matter,
though, because I think we agree that this solution is too messy anyway.

> > (3) Enhance dict_t with a gf_lock_t that can be used to serialize
> > access. We don't have to use the lock in every invocation of
> > dict_foreach (though we should probably investigate that). For
> > now, we can just use it in the code paths we know are contending.
>
> dict already has a lock.

Yes, we have a lock which is used in get/set/add/delete - but not in
dict_foreach, for the reasons you mention. I should have been clearer that
I was suggesting a *different* lock that's only used in this case.
Manually locking with the lock we already have might not work due to
recursive locking, but the lock ordering with a separate higher-level lock
is pretty simple and it won't affect any other uses.

> Xavi was mentioning that dict_copy_with_ref is too costly, which is
> true, if we make this change it will be even more costly :-(.

There are probably MVCC-ish approaches that could be both safe and
performant, but they'd be quite complicated to implement.
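A rough sketch of what such a separate higher-level lock could look like. Field and function names here are invented for illustration, not the actual dict.c API, and pthread_mutex_t stands in for gf_lock_t:

#include <pthread.h>

typedef struct sdict {
    pthread_mutex_t  lock;           /* existing lock: get/set/add/delete   */
    pthread_mutex_t  foreach_lock;   /* new: serializes foreach vs. writers */
    /* ... members list ... */
} sdict_t;

/* Writers on the contending path take the outer lock first, then call the
   ordinary set/del, which still takes the inner lock.  The order is always
   foreach_lock -> lock, so there is no recursion and no inversion. */
void
sdict_set_serialized (sdict_t *d, const char *key, const char *value)
{
    (void) key;
    (void) value;
    pthread_mutex_lock (&d->foreach_lock);
    /* sdict_set (d, key, value);   -- takes d->lock internally */
    pthread_mutex_unlock (&d->foreach_lock);
}

/* The foreach on the contending path takes only the outer lock.  Callbacks
   it invokes may still call the plain get/set (inner lock only) without
   deadlocking; they must simply not call the *_serialized variants. */
int
sdict_foreach_serialized (sdict_t *d,
                          int (*fn) (sdict_t *d, char *k, char *v, void *a),
                          void *arg)
{
    int ret = 0;

    (void) fn;
    (void) arg;
    pthread_mutex_lock (&d->foreach_lock);
    /* ret = sdict_foreach (d, fn, arg);   -- plain, lock-free iteration */
    pthread_mutex_unlock (&d->foreach_lock);
    return ret;
}

Because the outer lock is always acquired before the existing inner lock, there is no recursive locking, and only the code paths known to contend pay for the extra serialization.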
Re: [Gluster-devel] races in dict_foreach() causing crashes in tier-file-creat.t
On 03/11/2016 06:25 PM, Jeff Darcy wrote:
>> Raghavendra G and I discussed about this problem and the right way to
>> fix it is to take a copy (without dict_foreach) of the dictionary in
>> dict_foreach inside a lock and then loop over the local dictionary. I am
>> worried about the performance implication of this

I think what I forgot to say was that the copy will happen inside the lock
dict already has.

> I'm worried about the correctness implications. Any such copy will have
> to do the equivalent of dict_foreach even if it doesn't call the function
> of that name, and will be subject to the same races.
>
>> Also included Xavi, who earlier said we need to change dict.c but it is
>> a bigger change. May be the time has come? I would love to gather all
>> your inputs and implement a better version of dict if we need one.
>
> There are three solutions I can think of.
>
> (1) Have tier use STACK_WIND_COOKIE to pass the address of a lock down
> both paths, so the two can synchronize between themselves.
>
> (2) Have tier issue the lookup down the two paths *serially* instead of
> in parallel, so there's no contention on the dictionary. This is
> probably most complicated *and* worst for performance, but I include it
> for the sake of completeness.

Tier does send lookups serially, which fail on the hashed subvolumes of
dhts. Both of them trigger lookup_everywhere, which is executed in epoll
threads, thus they are executed in parallel. So (1) won't be simple
enough, in the sense that it has to remember the lock in local etc.

> (3) Enhance dict_t with a gf_lock_t that can be used to serialize
> access. We don't have to use the lock in every invocation of
> dict_foreach (though we should probably investigate that). For
> now, we can just use it in the code paths we know are contending.

dict already has a lock. The performance implication I was talking about
is that syscalls like getxattr in posix would then happen inside these
locks if we execute dict_foreach inside this lock, which is costly. That
is the reason Du and I wanted to make the copy of the dict inside the lock
and then use this local dictionary to carry out dict_foreach without any
locks, so that we don't stay inside the lock for too long. The copy
shouldn't use dict_foreach, for this very reason :-). But I forgot to
mention the lock. I am wondering if anyone has a better solution than
this.

> I suspect Xavi was suggesting (3) and I concur that it's the best
> long-term solution.

Xavi was mentioning that dict_copy_with_ref is too costly, which is true;
if we make this change it will be even more costly :-(.

Pranith
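The approach described above - duplicate the members under the existing lock without going through dict_foreach, then run the callbacks on the private copy with no lock held - could look roughly like this. The types and names are simplified stand-ins, not the real dict.c code:

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct pair {
    char        *key;
    char        *value;
    struct pair *next;
} pair_t;

typedef struct sdict {
    pthread_mutex_t  lock;        /* the lock dict already has */
    pair_t          *members;
} sdict_t;

int
sdict_foreach_snapshot (sdict_t *d,
                        int (*fn) (sdict_t *d, char *k, char *v, void *a),
                        void *arg)
{
    pair_t  *snap = NULL;
    pair_t **tail = &snap;
    int      ret  = 0;

    /* Copy the pair list with a plain list walk while holding the lock. */
    pthread_mutex_lock (&d->lock);
    for (pair_t *p = d->members; p; p = p->next) {
        pair_t *np = calloc (1, sizeof (*np));
        np->key    = strdup (p->key);
        np->value  = strdup (p->value);
        *tail      = np;
        tail       = &np->next;
    }
    pthread_mutex_unlock (&d->lock);

    /* Invoke the callbacks on the private copy with no lock held, so slow
       callbacks (e.g. a getxattr syscall in posix) never run inside it. */
    for (pair_t *p = snap; p && ret >= 0; p = p->next)
        ret = fn (d, p->key, p->value, arg);

    /* Free the snapshot. */
    while (snap) {
        pair_t *next = snap->next;
        free (snap->key);
        free (snap->value);
        free (snap);
        snap = next;
    }
    return ret;
}

The trade-off is the full copy per traversal, which is essentially the dict_copy_with_ref expense Xavi mentions.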
Re: [Gluster-devel] races in dict_foreach() causing crashes in tier-file-creat.t
> Raghavendra G and I discussed about this problem and the right way to
> fix it is to take a copy(without dict_foreach) of the dictionary in
> dict_foreach inside a lock and then loop over the local dictionary. I am
> worried about the performance implication of this

I'm worried about the correctness implications. Any such copy will have to
do the equivalent of dict_foreach even if it doesn't call the function of
that name, and will be subject to the same races.

> Also included Xavi, who earlier said we need to change dict.c but it is
> a bigger change. May be the time has come? I would love to gather all
> your inputs and implement a better version of dict if we need one.

There are three solutions I can think of.

(1) Have tier use STACK_WIND_COOKIE to pass the address of a lock down
both paths, so the two can synchronize between themselves.

(2) Have tier issue the lookup down the two paths *serially* instead of in
parallel, so there's no contention on the dictionary. This is probably
most complicated *and* worst for performance, but I include it for the
sake of completeness.

(3) Enhance dict_t with a gf_lock_t that can be used to serialize access.
We don't have to use the lock in every invocation of dict_foreach (though
we should probably investigate that). For now, we can just use it in the
code paths we know are contending.

I suspect Xavi was suggesting (3) and I concur that it's the best
long-term solution.
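For option (1), the synchronization pattern would look roughly like the sketch below. It uses plain pthreads so that it is self-contained; in the actual xlator the per-call state would live in frame->local and the lock's address would travel to each child as the cookie argument of STACK_WIND_COOKIE. The names call_local, path_args and lookup_path are invented for illustration:

#include <pthread.h>
#include <stdio.h>

/* Per-call ("local") state allocated by the parent before winding. */
struct call_local {
    pthread_mutex_t dict_lock;
};

struct path_args {
    struct call_local *local;   /* what the cookie would point to */
    const char        *name;
};

/* Each path (hot/cold tier lookup) serializes its dictionary access on the
   shared lock instead of touching the dict concurrently. */
static void *
lookup_path (void *opaque)
{
    struct path_args *args = opaque;

    pthread_mutex_lock (&args->local->dict_lock);
    /* ... dict_foreach()/dict_set() on the shared xattr dict goes here ... */
    printf ("%s: accessing dict under the shared lock\n", args->name);
    pthread_mutex_unlock (&args->local->dict_lock);
    return NULL;
}

int
main (void)
{
    struct call_local local;
    struct path_args  hot;
    struct path_args  cold;
    pthread_t         t1, t2;

    pthread_mutex_init (&local.dict_lock, NULL);
    hot.local  = &local;  hot.name  = "hot-tier";
    cold.local = &local;  cold.name = "cold-tier";

    pthread_create (&t1, NULL, lookup_path, &hot);
    pthread_create (&t2, NULL, lookup_path, &cold);
    pthread_join (t1, NULL);
    pthread_join (t2, NULL);

    pthread_mutex_destroy (&local.dict_lock);
    return 0;
}

As Pranith points out in his reply, the cost of this option is bookkeeping: the lock has to be remembered in the tier xlator's local state and handed down both lookup paths.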