Re: [PATCH v2] pack-objects: fix performance issues on packing large deltas
Am 22.07.2018 um 10:04 schrieb Nguyễn Thái Ngọc Duy: + if (size < pack->oe_delta_size_limit) { + e->delta_size_ = size; + e->delta_size_valid = 1; + } else { + packing_data_lock(pack); + if (!pack->delta_size) + ALLOC_ARRAY(pack->delta_size, pack->nr_alloc); + packing_data_unlock(pack); + + pack->delta_size[e - pack->objects] = size; My first thought was that this is wrong (falling prey to the same mistake as the double-checked locking pattern). But after thinking twice over it again, I think that this unprotected access of pack->delta_size is thread-safe. Of course, I'm assuming that different threads never assign to the same array index. + e->delta_size_valid = 0; + } } -- Hannes
Re: [PATCH v2] pack-objects: fix performance issues on packing large deltas
On Mon, Jul 23, 2018 at 08:49:59PM +0200, Duy Nguyen wrote: > On Mon, Jul 23, 2018 at 8:38 PM Duy Nguyen wrote: > > I will have to study the thread dispatch code more to have a better > > answer, unfortunately. > > Well.. I thought I would need this weekend for this, but a quick look > and ll_find_deltas() suggests that what we're doing is safe. At least > you and Jeff are way to familiar with the delta window concept in > pack-objects. So in multithread mode, we have a big list of all > objects, then the list is cut in N sublists and each is owned entirely > by one thread. Each thread then can slide a window over this sublist > to search for the best delta. > > Because of this partitioning, if trg_entry is being processed now, it > will not be accessed by any other thread. It's owned by this thread > and will be accessed again as src_entry when the next entry becomes > the delta target in the same thread. As long as we don't access > objects of another thread (and my v1 violates this) we should be ok. Yes, that matches my knowledge of how this all works. And if it didn't, I think the code would have been racy even _before_ your patches. The only thing that this pack->delta_size approach is changing is that managing that array needs to happen under lock, because it touches the whole list. And as long as we back-fill it from any arbitrary e->delta_size_, that means that touching e->delta_size_ needs to be done under lock. That's why I think keeping the individual valid flag in each entry makes the most sense. Then whenever we overflow a particular e->delta_size_, we don't have to care about anybody else's size. Which I think is what your v2 patch is doing. -Peff
Re: [PATCH v2] pack-objects: fix performance issues on packing large deltas
On Mon, Jul 23, 2018 at 8:38 PM Duy Nguyen wrote: > I will have to study the thread dispatch code more to have a better > answer, unfortunately. Well.. I thought I would need this weekend for this, but a quick look and ll_find_deltas() suggests that what we're doing is safe. At least you and Jeff are way to familiar with the delta window concept in pack-objects. So in multithread mode, we have a big list of all objects, then the list is cut in N sublists and each is owned entirely by one thread. Each thread then can slide a window over this sublist to search for the best delta. Because of this partitioning, if trg_entry is being processed now, it will not be accessed by any other thread. It's owned by this thread and will be accessed again as src_entry when the next entry becomes the delta target in the same thread. As long as we don't access objects of another thread (and my v1 violates this) we should be ok. At the end of ll_find_deltas() though, we have the "stealing" stuff, but it's protected by progress_lock() already, outside try_delta() so we're sure nobody is updating any object_entry when the stealing happens. I will double check again this weekend just to be sure. -- Duy
Re: [PATCH v2] pack-objects: fix performance issues on packing large deltas
On Mon, Jul 23, 2018 at 8:04 PM Junio C Hamano wrote: > > Nguyễn Thái Ngọc Duy writes: > > > Access to e->delta_size_ (and by extension > > pack->delta_size[e - pack->objects]) is unprotected as before, the > > thread scheduler in pack-objects must make sure "e" is never updated > > by two different threads. > > OK. Do we need to worry about "e" (e.g. "e->delta_size_valid") > being accessed while/before it is set by another thread? I left some details out because I think it's getting to a gray area. We already do this if (!DELTA(trg_entry)) { max_size = trg_size/2 - the_hash_algo->rawsz; ref_depth = 1; } else { max_size = DELTA_SIZE(trg_entry); ... SET_DELTA(trg_entry, src_entry); SET_DELTA_SIZE(trg_entry, delta_size); if the bottom half is running on one thread and stopped in the middle while the top half is running in another thread, we have a problem. But perhaps max_size is not that big a deal because incorrect max_size may produce a bad pack but can't corrupt it. I will have to study the thread dispatch code more to have a better answer, unfortunately. -- Duy
Re: [PATCH v2] pack-objects: fix performance issues on packing large deltas
Nguyễn Thái Ngọc Duy writes: > Access to e->delta_size_ (and by extension > pack->delta_size[e - pack->objects]) is unprotected as before, the > thread scheduler in pack-objects must make sure "e" is never updated > by two different threads. OK. Do we need to worry about "e" (e.g. "e->delta_size_valid") being accessed while/before it is set by another thread? oe_delta_size() makes unprotected accesses to .delta_size_ and pack->delta_size[e - pack->objects], so we apparently do not, and oe_set_delta_size() only protects the allocation call and does not prevent a reader in oe_delta_size() from first reading the _valid field, noticing that it is 0 as initialized, and goes on to read the pack->delta_size[] slot for the entry, while the writer is setting the size to .delta_size_ field and flipping _valid bit, without ever storing the size in the pack->delta_size[] array. > @@ -130,6 +131,7 @@ struct packing_data { > uint32_t index_size; > > unsigned int *in_pack_pos; > + unsigned long *delta_size; > > /* >* Only one of these can be non-NULL and they have different > @@ -140,10 +142,29 @@ struct packing_data { > struct packed_git **in_pack_by_idx; > struct packed_git **in_pack; > > +#ifndef NO_PTHREADS > + pthread_mutex_t lock; I am wondering if we want the variable to say what data it is protecting from simultaneous accesses, or leave it as generic so that any new caller that wants to lock any (new) thing that is associated with a packing_data structure can grab it for other purposes. The design of this patch clearly is the latter, which is OK for now, I think. > @@ -332,18 +353,34 @@ static inline unsigned long oe_delta_size(struct > packing_data *pack, > { > if (e->delta_size_valid) > return e->delta_size_; > - return oe_size(pack, e); > + > + /* > + * pack->detla_size[] can't be NULL because oe_set_delta_size() > + * must have been called when a new delta is saved with > + * oe_set_delta(). > + * If oe_delta() returns NULL (i.e. default state, which means > + * delta_size_valid is also false), then the caller must never > + * call oe_delta_size(). > + */ > + return pack->delta_size[e - pack->objects]; > } > > static inline void oe_set_delta_size(struct packing_data *pack, >struct object_entry *e, >unsigned long size) > { > - e->delta_size_ = size; > - e->delta_size_valid = e->delta_size_ == size; > - if (!e->delta_size_valid && size != oe_size(pack, e)) > - BUG("this can only happen in check_object() " > - "where delta size is the same as entry size"); > + if (size < pack->oe_delta_size_limit) { > + e->delta_size_ = size; > + e->delta_size_valid = 1; > + } else { > + packing_data_lock(pack); > + if (!pack->delta_size) > + ALLOC_ARRAY(pack->delta_size, pack->nr_alloc); > + packing_data_unlock(pack); > + > + pack->delta_size[e - pack->objects] = size; > + e->delta_size_valid = 0; > + } > }
[PATCH v2] pack-objects: fix performance issues on packing large deltas
Let's start with some background about oe_delta_size() and oe_set_delta_size(). If you already know, skip the next paragraph. These two are added in 0aca34e826 (pack-objects: shrink delta_size field in struct object_entry - 2018-04-14) to help reduce 'struct object_entry' size. The delta size field in this struct is reduced to only contain max 1MB. So if any new delta is produced and larger than 1MB, it's dropped because we can't really save such a large size anywhere. Fallback is provided in case existing packfiles already have large deltas, then we can retrieve it from the pack. While this should help small machines repacking large repos without large deltas (i.e. less memory pressure), dropping large deltas during the delta selection process could end up with worse pack files. And if existing packfiles already have >1MB delta and pack-objects is instructed to not reuse deltas, all of them will be dropped on the floor, and the resulting pack would be definitely bigger. There is also a regression in terms of CPU/IO if we have large on-disk deltas because fallback code needs to parse the pack every time the delta size is needed and just access to the mmap'd pack data is enough for extra page faults when memory is under pressure. Both of these issues were reported on the mailing list. Here's some numbers for comparison. Version Pack (MB) MaxRSS(kB) Time (s) --- - -- 2.17.0 5498 435136282494.85 2.18.010531 404495964168.94 This patch provides a better fallback that is - cheaper in terms of cpu and io because we won't have to read existing pack files as much - better in terms of pack size because the pack heuristics is back to 2.17.0 time, we do not drop large deltas at all If we encounter any delta (on-disk or created during try_delta phase) that is larger than the 1MB limit, we stop using delta_size_ field for this because it can't contain such size anyway. A new array of delta size is dynamically allocated and can hold all the deltas that 2.17.0 can. This array only contains delta sizes that delta_size_ can't contain. With this, we do not have to drop deltas in try_delta() anymore. Of course the downside is we use slightly more memory, even compared to 2.17.0. But since this is considered an uncommon case, a bit more memory consumption should not be a problem. Delta size limit is also raised from 1MB to 16MB to better cover common case and avoid that extra memory consumption (99.999% deltas in this reported repo are under 12MB; Jeff noted binary artifacts topped out at about 3MB in some other private repos). Other fields are shuffled around to keep this struct packed tight. We don't use more memory in common case even with this limit update. A note about thread synchronization. Since this code can be run in parallel during delta searching phase, we need a mutex. The realloc part in packlist_alloc() is not protected because it only happens during the object counting phase, which is always single-threaded. Access to e->delta_size_ (and by extension pack->delta_size[e - pack->objects]) is unprotected as before, the thread scheduler in pack-objects must make sure "e" is never updated by two different threads. The area under the new lock is as small as possible, avoiding locking at all in common case, since lock contention with high thread count could be expensive (most blobs are small enough that delta compute time is short and we end up taking the lock very often). The previous attempt to always hold a lock in oe_delta_size() and oe_set_delta_size() increases execution time by 33% when repacking linux.git with with 40 threads. Reported-by: Elijah Newren Helped-by: Elijah Newren Helped-by: Jeff King Signed-off-by: Nguyễn Thái Ngọc Duy --- v2 should keep execution time back to 2.17.0 range (I never thought locks are that expensive..). Changes since v1: - oe_delta_size() and oe_set_delta_size() rewritten to keep locking to minimum. This also brings back delta_size_valid field so that we avoid accessing shared data as much as possible. - delta_size[] is now an array of unsigned long instead of uint32_t (i.e. 64-bit on linux, 32-bit on windows, like in 2.17.0) - delta size limit is reduced to 16 MB - lock_initialized field is removed builtin/pack-objects.c| 5 +--- ci/run-build-and-tests.sh | 1 + pack-objects.c| 4 +++ pack-objects.h| 59 +++ t/README | 4 +++ 5 files changed, 58 insertions(+), 15 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index ebc8cefb53..a77659f9ce 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2023,10 +2023,6 @@ static int try_delta(struct unpacked *trg, struct unpacked *src, delta_buf = create_delta(src->index, trg->data, trg_size, _size, max_size); if (!delta_buf) return 0; - if (delta_size >= (1U