Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-23 Thread Jeff King
On Sat, Jul 21, 2018 at 06:23:32AM +0200, Duy Nguyen wrote:

> > I'm not sure I completely agree with this. While 4GB deltas should be
> > pretty rare, the nice thing about 64-bit is that you never have to even
> > think about whether the variable is large enough. I think the 2^32
> > limitations on Windows are something we should be fixing in the long
> > term (though there it is even worse because it is not just deltas, but
> > entire objects).
> 
> I guess that means we stick to uint64_t then? It does use more memory
> on 32-bit architectures (and probably adds processing cost as well,
> because a 64-bit value takes up two registers).

Yes, but if we switch away from an array to a hash, then we get the best
of both worlds: we are using 64-bits to store the size, but we only need
an entry for deltas that are actually big.

Then the common small-delta case remains efficient in both CPU and
memory, and we pay the costs only in proportion to the number of large
deltas (though the hash is a slightly higher cost for those large deltas
than an array).
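
Roughly, with the khash.h we already bundle, that could look like the
following. This is a standalone sketch with made-up names, not code
from any posted patch, and the 20-bit limit is just for illustration:

  #include <stdint.h>
  #include "khash.h"

  /* map object index -> full 64-bit delta size, for big deltas only */
  KHASH_INIT(big_delta, uint32_t, uint64_t, 1,
             kh_int_hash_func, kh_int_hash_equal)

  struct entry {
          uint32_t delta_size_:20; /* the common case lives here */
          uint32_t big_delta:1;    /* the real size is in the hash */
  };

  static void set_delta_size(kh_big_delta_t *big, struct entry *e,
                             uint32_t idx, uint64_t size)
  {
          if (size < (1u << 20)) {
                  e->delta_size_ = (uint32_t)size;
                  e->big_delta = 0;
          } else {
                  int ret;
                  khiter_t pos = kh_put_big_delta(big, idx, &ret);
                  kh_value(big, pos) = size;
                  e->big_delta = 1;
          }
  }

The caller would create the table once with kh_init_big_delta(), and
the read side does a hash lookup only when e->big_delta is set, so the
common case never touches the hash at all.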

> > This is new in this iteration. I guess this is to cover the case where
> > we are built with pthread support, but --threads=1?
> 
> If you mean the "lock_initialized" variable, not really. The other
> _lock() macros in builtin/ call pthread_mutex_lock() unconditionally,
> which is fine. But I feel a bit uncomfortable doing the same in
> pack-objects.h which technically is library code (but yes practically
> is a long arm of builtin/pack-objects.c), so lock_initialized is there
> to make sure we don't touch uninitialized locks if somebody forgets to
> init them first.

I think the ones in builtin/ check threads_active to avoid actually
locking. And that's set in init_thread(), which we do not call if we are
using a single thread. So I think this is basically doing the same
thing, but with a separate flag (since the library code does not know
about threads_active).
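
For reference, the pattern I mean looks roughly like this (modeled on
the helpers in builtin/index-pack.c; simplified here, not a verbatim
copy):

  #include <pthread.h>

  static int threads_active;         /* set once worker threads exist */
  static pthread_mutex_t cache_mutex;

  static inline void lock_mutex(pthread_mutex_t *mutex)
  {
          if (threads_active)
                  pthread_mutex_lock(mutex);
  }

  static inline void unlock_mutex(pthread_mutex_t *mutex)
  {
          if (threads_active)
                  pthread_mutex_unlock(mutex);
  }

  #define cache_lock()   lock_mutex(&cache_mutex)
  #define cache_unlock() unlock_mutex(&cache_mutex)

Until the flag is flipped, the macros are effectively no-ops, so the
single-threaded path never touches an uninitialized mutex.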

> > Your original patch had to copy the oe_* helpers into the file to handle
> > that. But I think we're essentially just locking the whole functions:
> 
> I'll try to avoid this lock when deltas are small and see if it helps
> the linux.git case on Elijah's machine. So we may end up locking just
> a part of these functions.

Yeah, I think my suggestion doesn't work once we start doing more
complex locking logic. Let's just forget it. I think the
"lock_initialized" thing is probably the right approach.

It might be worth getting rid of builtin/pack-objects.c's local
threads_active variable, and just using to_pack.threads_active. The two
flags would always want to match.

-Peff


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-23 Thread Duy Nguyen
On Mon, Jul 23, 2018 at 2:34 PM Elijah Newren  wrote:
> This patch provides a stop-gap improvement for maint that increases the
> delta size back up to 32 bits (still an improvement over the 64 bit size
> of the original).

The "still an improvement" is actually not true. Due to padding and
stuff, struct object_entry's size is increased by 8 bytes, even though
you don't use all of it. It's possible to maybe shuffle fields around
a bit to keep this struct smaller. But it's not really worth it since
this is temporary anyway.
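
To illustrate the padding effect with a standalone toy (not the real
object_entry; the sizes assume a typical x86-64/LP64 build):

  #include <stdio.h>

  struct before { void *p; unsigned a:20; unsigned b:12; unsigned c:30; };
  struct after  { void *p; unsigned a:32; unsigned b:12; unsigned c:30; };

  int main(void)
  {
          /*
           * Typically prints "16 24": widening a:20 to a:32 pushes c
           * into a third 32-bit unit, and the struct's 8-byte
           * alignment then rounds 20 bytes up to 24.
           */
          printf("%zu %zu\n", sizeof(struct before), sizeof(struct after));
          return 0;
  }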

> I think Duy's comment that I'm responding to suggests he Acks this
> patch for maint, but there is that little 'if' he put in his statement.

Yep. Ack'd by me. It's up to Junio to decide if it's worth doing this
or not, of course.
-- 
Duy


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-23 Thread Elijah Newren
On Fri, Jul 20, 2018 at 10:43 AM, Elijah Newren  wrote:
> On Fri, Jul 20, 2018 at 8:39 AM, Nguyễn Thái Ngọc Duy  wrote:

>>  If we want a quick fix for 2.18.1, I suggest bumping up
>>  OE_DELTA_SIZE_BITS like Elijah did in his second option. I don't
>>  think it's safe to fast track this patch.
>
> Okay, I'll submit that with a proper commit message then.  Since you
> also bump OE_DELTA_SIZE_BITS in your patch (though to a different
> value), it'll require a conflict resolution when merging maint into
> master, but at least the change won't silently leak through.

And here's the re-submitted patch, with commit message...

-- 8< --
Subject: [PATCH] pack-objects: fix repacking of repositories with some large
 deltas

Commit 0aca34e826 (pack-objects: shrink delta_size field in struct
object_entry - 2018-04-14) reduced 'struct object_entry' size by
shrinking the delta size field to 20 bits (max delta size 1MB).  There
was a fallback to reference existing pack files that already had large
deltas, but when aggressively repacking, those deltas are dropped on
the floor, resulting in much worse packs.

For comparison, the following measurements were made for running 'git gc
--aggressive' for one particular repo:

Version  Pack (MB)  MaxRSS(kB)  Time (s)
-------  ---------  ----------  --------
 2.17.0       5498    43513628   2494.85
 2.18.0      10531    40449596   4168.94

This patch provides a stop-gap improvement for maint that increases the
delta size back up to 32 bits (still an improvement over the 64 bit size
of the original).  A better fix preserving the memory savings for most
repos that do not have large deltas is being prepared for 2.19.0.

Signed-off-by: Elijah Newren 
---
I think Duy's comment that I'm responding to suggests he Acks this
patch for maint, but there is that little 'if' he put in his statement.

I'll be on vacation until the end of the week, so I apologize in advance
for not responding quickly...

 builtin/pack-objects.c | 2 +-
 pack-objects.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 71056d8294..49b7af295d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2023,7 +2023,7 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
delta_buf = create_delta(src->index, trg->data, trg_size, &delta_size, max_size);
if (!delta_buf)
return 0;
-   if (delta_size >= (1U << OE_DELTA_SIZE_BITS)) {
+   if (delta_size >= (1UL << OE_DELTA_SIZE_BITS)) {
free(delta_buf);
return 0;
}
diff --git a/pack-objects.h b/pack-objects.h
index edf74dabdd..26021328aa 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -14,7 +14,7 @@
  * above this limit. Don't lower it too much.
  */
 #define OE_SIZE_BITS   31
-#define OE_DELTA_SIZE_BITS 20
+#define OE_DELTA_SIZE_BITS 32
 
 /*
  * State flags for depth-first search used for analyzing delta cycles.
-- 
2.18.0.2.gf4c50c7885



Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-22 Thread Duy Nguyen
On Sun, Jul 22, 2018 at 8:22 AM Elijah Newren  wrote:
>
> On Fri, Jul 20, 2018 at 9:47 PM, Duy Nguyen  wrote:
> > On Fri, Jul 20, 2018 at 10:43:25AM -0700, Elijah Newren wrote:
> >> Out of curiosity, would it be possible to use the delta_size_ field
> >> for deltas that are small enough, and only use an external data
> >> structure (perhaps a hash rather than an array) for the few deltas
> >> that are large?
> >
> > We could. And because repack time is still a bit higher in your
> > linux.git case, let's try this: no locking in the common case and a
> > very small locked region when we hit large deltas.
>
> This one looks like a winner.  Labelling this as fix-v7, this rounds
> out the table to:
>
> Version  Time (s)
> -------  --------
>  2.17.0   621.36
>  2.18.0   621.80
>  fix-v5   836.29
>  fix-v6   831.73
>  fix-v2   619.96
>  fix-v7   622.88
>
> So the runtime is basically within the noise of different runs of the
> timing for 2.17.0 or 2.18.0 or -v2, and is much faster than -v5 or
> -v6.

Thanks. I'm looking forward to asking you to test lock-related changes
on this 40-core monster in the future :D

Unrelated point of improvement for the future. I notice that, at least
on my machine, I have 100% CPU on one core during the writing phase,
likely because deltas are being recomputed to be written out and we
don't produce deltas fast enough. We should be able to take advantage
of multiple cores to recompute deltas in advance at this stage and
shorten pack-objects time some more.
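
Conceptually that would be a bounded prefetch ring between delta
workers and the writer, something like the toy below (nothing to do
with the real write_pack_file() code, just the overlap idea):

  #include <pthread.h>

  #define PREFETCH 64               /* deltas prepared ahead of the writer */

  static void *slots[PREFETCH];     /* ring buffer of prepared deltas */
  static unsigned int head, tail;   /* produced / consumed counts */
  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
  static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

  /* workers: recompute deltas on spare cores, ahead of the writer */
  static void produce(void *delta)
  {
          pthread_mutex_lock(&lock);
          while (head - tail == PREFETCH)
                  pthread_cond_wait(&not_full, &lock);
          slots[head++ % PREFETCH] = delta;
          pthread_cond_signal(&not_empty);
          pthread_mutex_unlock(&lock);
  }

  /* writer: picks up finished deltas instead of recomputing them */
  static void *consume(void)
  {
          void *delta;

          pthread_mutex_lock(&lock);
          while (head == tail)
                  pthread_cond_wait(&not_empty, &lock);
          delta = slots[tail++ % PREFETCH];
          pthread_cond_signal(&not_full);
          pthread_mutex_unlock(&lock);
          return delta;
  }

A real version would have to hand the writer deltas in pack order, so
it would key slots by object index rather than by completion order,
but the overlap idea is the same.
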
-- 
Duy


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-22 Thread Elijah Newren
On Fri, Jul 20, 2018 at 9:47 PM, Duy Nguyen  wrote:
> On Fri, Jul 20, 2018 at 10:43:25AM -0700, Elijah Newren wrote:
>> Out of curiosity, would it be possible to use the delta_size_ field
>> for deltas that are small enough, and only use an external data
>> structure (perhaps a hash rather than an array) for the few deltas
>> that are large?
>
> We could. And because repack time is still a bit higher in your
> linux.git case, let's try this: no locking in the common case and a
> very small locked region when we hit large deltas.

This one looks like a winner.  Labelling this as fix-v7, this rounds
out the table to:

Version  Time (s)
-------  --------
 2.17.0   621.36
 2.18.0   621.80
 fix-v5   836.29
 fix-v6   831.73
 fix-v2   619.96
 fix-v7   622.88

So the runtime is basically within the noise of different runs of the
timing for 2.17.0 or 2.18.0 or -v2, and is much faster than -v5 or
-v6.


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-21 Thread Duy Nguyen
On Sat, Jul 21, 2018 at 8:56 AM Elijah Newren  wrote:
> > -- 8< --
> ...
> > diff --git a/pack-objects.h b/pack-objects.h
> > index 9f977ae800..11890e7217 100644
> > --- a/pack-objects.h
> > +++ b/pack-objects.h
> > @@ -15,7 +15,7 @@
> ...
> > @@ -353,37 +354,26 @@ static inline void oe_set_size(struct packing_data *pack,
> >  static inline unsigned long oe_delta_size(struct packing_data *pack,
> >   const struct object_entry *e)
> >  {
> > -   unsigned long size;
> > -
> > -   packing_data_lock(pack);
> > -   if (pack->delta_size)
> > -   size = pack->delta_size[e - pack->objects];
> > +   if (e->delta_size_valid)
> > +   return e->delta_size_;
> > else
> > -   size = e->delta_size_;
> > -   packing_data_unlock(pack);
> > -   return size;
> > +   return pack->delta_size[e - pack->objects];
>
> Should this last line be wrapped with packing_data_lock/unlock calls?
> (And similarly in oe_set_delta_size when writing to this array?)

No. Data related to "e" alone can be accessed unprotected. This is
what try_delta has been doing (it's "trg_entry" in there). Because we
know that pack->delta_size[e - pack->objects] is for "e" and no other
entry, we can give it the same treatment. If we access other entries'
data, though, like in my new_delta_size_array(), that could be racy.
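
To make the ownership argument concrete, here is a sketch (assuming
the pack-objects.h definitions from the patch above; the helper names
are made up):

  #include "pack-objects.h"

  /*
   * Safe: while a thread works on entry "e", no other thread touches
   * e's slot, so reading or writing that one slot needs no lock.
   */
  static uint32_t read_own_slot(struct packing_data *pack,
                                const struct object_entry *e)
  {
          return pack->delta_size[e - pack->objects];
  }

  /*
   * Racy: walking all entries, like new_delta_size_array() did, reads
   * slots that other threads may be writing concurrently, so it would
   * need the lock held around the whole loop.
   */
  static void migrate_all(struct packing_data *pack, uint32_t *dst)
  {
          uint32_t i;

          for (i = 0; i < pack->nr_objects; i++)
                  dst[i] = pack->objects[i].delta_size_;
  }
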
-- 
Duy


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-21 Thread Duy Nguyen
On Sat, Jul 21, 2018 at 6:07 AM Duy Nguyen  wrote:
> > However, I'm just concentrating on a beefy machine; it may be that v6
> > drastically outperforms v2 on weaker hardware?  Can others measure a
> > lower memory usage for v6 than v2?
>
> I'll try it with massif on linux.git, but this is a very weak laptop
> (4GB) so it'll take a while.

OK, the massif reports are too big to upload so I only pastebin'd the
peak memory snapshots. fix-v2 uses 1.449 GB of heap [1] (decimal
units) while my patch uses 1.394 GB [2]. The actual saving, if we
focus on packlist_alloc(), is 599,147,916B - 544,691,628B = roughly
54 MB, or about 9%.

It's not _that_ big a saving, so if your test runs do not show
significant speedups, then fix-v2 is probably a better option.

[1] https://pastebin.com/8iDcm9M4
[2] https://pastebin.com/bXBjzPvN
-- 
Duy


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-21 Thread Elijah Newren
On Fri, Jul 20, 2018 at 9:47 PM, Duy Nguyen  wrote:
> On Fri, Jul 20, 2018 at 10:43:25AM -0700, Elijah Newren wrote:

>> Out of curiosity, would it be possible to use the delta_size_ field
>> for deltas that are small enough, and only use an external data
>> structure (perhaps a hash rather than an array) for the few deltas
>> that are large?
>
> We could. And because repack time is still a bit higher in your
> linux.git case, let's try this: no locking in the common case and a
> very small locked region when we hit large deltas.

Sweet, I'm starting some tests with it...

> -- 8< --
...
> diff --git a/pack-objects.h b/pack-objects.h
> index 9f977ae800..11890e7217 100644
> --- a/pack-objects.h
> +++ b/pack-objects.h
> @@ -15,7 +15,7 @@
...
> @@ -353,37 +354,26 @@ static inline void oe_set_size(struct packing_data *pack,
>  static inline unsigned long oe_delta_size(struct packing_data *pack,
>   const struct object_entry *e)
>  {
> -   unsigned long size;
> -
> -   packing_data_lock(pack);
> -   if (pack->delta_size)
> -   size = pack->delta_size[e - pack->objects];
> +   if (e->delta_size_valid)
> +   return e->delta_size_;
> else
> -   size = e->delta_size_;
> -   packing_data_unlock(pack);
> -   return size;
> +   return pack->delta_size[e - pack->objects];

Should this last line be wrapped with packing_data_lock/unlock calls?
(And similarly in oe_set_delta_size when writing to this array?)

>  }
>
> -uint32_t *new_delta_size_array(struct packing_data *pdata);
>  static inline void oe_set_delta_size(struct packing_data *pack,
>  struct object_entry *e,
>  unsigned long size)
>  {
> -   packing_data_lock(pack);
> -   if (!pack->delta_size && size < pack->oe_delta_size_limit) {
> -   packing_data_unlock(pack);
> +   if (size < pack->oe_delta_size_limit) {
> e->delta_size_ = size;
> -   return;
> +   e->delta_size_valid = 1;
> +   } else {
> +   packing_data_lock(pack);
> +   if (!pack->delta_size)
> +   ALLOC_ARRAY(pack->delta_size, pack->nr_alloc);
> +   packing_data_unlock(pack);
> +   pack->delta_size[e - pack->objects] = size;
> }
> -   /*
> -* We have had at least one delta size exceeding OE_DELTA_SIZE_BITS
> -* limit. delta_size_ will not be used anymore. All delta sizes are
> -* now from the delta_size[] array.
> -*/
> -   if (!pack->delta_size)
> -   pack->delta_size = new_delta_size_array(pack);
> -   pack->delta_size[e - pack->objects] = size;
> -   packing_data_unlock(pack);
>  }
>
>  #endif


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-20 Thread Duy Nguyen
On Fri, Jul 20, 2018 at 10:43:25AM -0700, Elijah Newren wrote:
> > This patch provides a better fallback that is
> >
> > - cheaper in terms of cpu and io because we won't have to read
> >   existing pack files as much
> >
> > - better in terms of pack size because the pack heuristics are back to
> >   2.17.0 time, we do not drop large deltas at all
> >
> > If we encounter any delta (on-disk or created during try_delta phase)
> > that is larger than the 2MB limit, we stop using delta_size_ field for
> > this because it can't contain such size anyway. A new array of delta
> > size is dynamically allocated and can hold all the deltas that 2.17.0
> > can [1]. All current delta sizes are migrated over.
> >
> > With this, we do not have to drop deltas in try_delta() anymore. Of
> > course the downside is we use slightly more memory, even compared to
> > 2.17.0. But since this is considered an uncommon case, a bit more
> > memory consumption should not be a problem.
> 
> Out of curiosity, would it be possible to use the delta_size_ field
> for deltas that are small enough, and only use an external data
> structure (perhaps a hash rather than an array) for the few deltas
> that are large?

We could. And because repack time is still a bit higher in your
linux.git case, let's try this: no locking in the common case and a
very small locked region when we hit large deltas.

-- 8< --
diff --git a/pack-objects.c b/pack-objects.c
index eef344b7c1..e3c32bbfc2 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -88,23 +88,6 @@ struct object_entry *packlist_find(struct packing_data *pdata,
return &pdata->objects[pdata->index[i] - 1];
 }
 
-uint32_t *new_delta_size_array(struct packing_data *pack)
-{
-   uint32_t *delta_size;
-   uint32_t i;
-
-   /*
-* nr_alloc, not nr_objects to align with realloc() strategy in
-* packlist_alloc()
-*/
-   ALLOC_ARRAY(delta_size, pack->nr_alloc);
-
-   for (i = 0; i < pack->nr_objects; i++)
-   delta_size[i] = pack->objects[i].delta_size_;
-
-   return delta_size;
-}
-
 static void prepare_in_pack_by_idx(struct packing_data *pdata)
 {
struct packed_git **mapping, *p;
diff --git a/pack-objects.h b/pack-objects.h
index 9f977ae800..11890e7217 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -15,7 +15,7 @@
  * above this limit. Don't lower it too much.
  */
 #define OE_SIZE_BITS   31
-#define OE_DELTA_SIZE_BITS 24
+#define OE_DELTA_SIZE_BITS 23
 
 /*
  * State flags for depth-first search used for analyzing delta cycles.
@@ -94,6 +94,7 @@ struct object_entry {
 * uses the same base as me
 */
unsigned delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */
+   unsigned delta_size_valid:1;
unsigned char in_pack_header_size;
unsigned in_pack_idx:OE_IN_PACK_BITS;   /* already in pack */
unsigned z_delta_size:OE_Z_DELTA_BITS;
@@ -353,37 +354,26 @@ static inline void oe_set_size(struct packing_data *pack,
 static inline unsigned long oe_delta_size(struct packing_data *pack,
  const struct object_entry *e)
 {
-   unsigned long size;
-
-   packing_data_lock(pack);
-   if (pack->delta_size)
-   size = pack->delta_size[e - pack->objects];
+   if (e->delta_size_valid)
+   return e->delta_size_;
else
-   size = e->delta_size_;
-   packing_data_unlock(pack);
-   return size;
+   return pack->delta_size[e - pack->objects];
 }
 
-uint32_t *new_delta_size_array(struct packing_data *pdata);
 static inline void oe_set_delta_size(struct packing_data *pack,
 struct object_entry *e,
 unsigned long size)
 {
-   packing_data_lock(pack);
-   if (!pack->delta_size && size < pack->oe_delta_size_limit) {
-   packing_data_unlock(pack);
+   if (size < pack->oe_delta_size_limit) {
e->delta_size_ = size;
-   return;
+   e->delta_size_valid = 1;
+   } else {
+   packing_data_lock(pack);
+   if (!pack->delta_size)
+   ALLOC_ARRAY(pack->delta_size, pack->nr_alloc);
+   packing_data_unlock(pack);
+   pack->delta_size[e - pack->objects] = size;
}
-   /*
-* We have had at least one delta size exceeding OE_DELTA_SIZE_BITS
-* limit. delta_size_ will not be used anymore. All delta sizes are
-* now from the delta_size[] array.
-*/
-   if (!pack->delta_size)
-   pack->delta_size = new_delta_size_array(pack);
-   pack->delta_size[e - pack->objects] = size;
-   packing_data_unlock(pack);
 }
 
 #endif
-- 8< --

> > --
> > 2.18.0.rc2.476.g39500d3211
> 
> Missing the 2.18.0 tag?  ;-)

Hehe I was a bit busy lately and have not rebased my branch on the

Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-20 Thread Duy Nguyen
On Fri, Jul 20, 2018 at 7:40 PM Jeff King  wrote:
>
> On Fri, Jul 20, 2018 at 05:39:43PM +0200, Nguyễn Thái Ngọc Duy wrote:
>
> > Let's start with some background about oe_delta_size() and
> > oe_set_delta_size(). If you already know, skip the next paragraph.
> >
> > These two are added in 0aca34e826 (pack-objects: shrink delta_size
> > field in struct object_entry - 2018-04-14) to help reduce 'struct
> > object_entry' size. The delta size field in this struct is reduced to
> > only contain max 2MB. So if any new delta is produced and larger than
> > 2MB, it's dropped because we can't really save such a large size
> > anywhere. Fallback is provided in case existingpackfiles already have
> > large deltas, then we can retrieve it from the pack.
>
> Minor nit, but isn't this 1MB (it was 2MB after one of your patches, but
> I think v2.18.0 has 20 bits)?

Argh.. I think I thought "2 ** 20" in my mind then typed "2 << 20" in
python. And I thought I made a mistake in my previous commit message
because it does mention 1MB...

> > With this, we do not have to drop deltas in try_delta() anymore. Of
> > course the downside is we use slightly more memory, even compared to
> > 2.17.0. But since this is considered an uncommon case, a bit more
> > memory consumption should not be a problem.
>
> I wondered how common this might be. The easiest way to see the largest
> delta sizes is:
>
>   git cat-file --batch-all-objects \
>    --batch-check='%(objectsize:disk) %(deltabase)' |
>   grep -v  |
>   sort -rn | head
>
> The biggest one in the kernel is ~300k. Which is about what I'd expect
> for a normal source code repo. Even some private repos I have with a lot
> of binary artifacts top out at about 3MB. So the new 32MB is probably

I'll keep these numbers in v2 commit message, easier to find later.

> > [1] With a small tweak. 2.17.0 on 64-bit linux can hold 2^64 byte
> > deltas, which is absolutely insane. But windows builds, even
> > 64-bit version, only hold 2^32. So reducing it to 2^32 should be
> > quite safe.
>
> I'm not sure I completely agree with this. While 4GB deltas should be
> pretty rare, the nice thing about 64-bit is that you never have to even
> think about whether the variable is large enough. I think the 2^32
> limitations on Windows are something we should be fixing in the long
> term (though there it is even worse because it is not just deltas, but
> entire objects).

I guess that means we stick to uint64_t then? It does use more memory
on 32-bit architectures (and probably adds processing cost as well,
because a 64-bit value takes up two registers).

> > @@ -2278,6 +2274,8 @@ static void init_threaded_search(void)
> >   pthread_mutex_init(&cache_mutex, NULL);
> >   pthread_mutex_init(&progress_mutex, NULL);
> >   pthread_cond_init(&progress_cond, NULL);
> > + pthread_mutex_init(&to_pack.lock, NULL);
> > + to_pack.lock_initialized = 1;
> >   old_try_to_free_routine = set_try_to_free_routine(try_to_free_from_threads);
> >  }
>
> This is new in this iteration. I guess this is to cover the case where
> we are built with pthread support, but --threads=1?

If you mean the "lock_initialized" variable, not really. The other
_lock() macros in builtin/ call pthread_mutex_lock() unconditionally,
which is fine. But I feel a bit uncomfortable doing the same in
pack-objects.h which technically is library code (but yes practically
is a long arm of builtin/pack-objects.c), so lock_initialized is there
to make sure we don't touch uninitialized locks if somebody forgets to
init them first.

> Given that we no longer have to touch this lock during the realloc, is
> it worth actually putting it into to_pack? Instead, we could keep it
> local to pack-objects, alongside all the other locks (and use the
> lock_mutex() helper which handles the single-thread case).

You probably noticed the lock name is not "delta_size_lock". I intended
to reuse this for locking other fields in struct packing_data as well.
But that might be a bad idea.

I have no strong opinion about this, so if we still end up locking the
whole functions, I'll just move the lock back close to the others in
builtin/pack-objects.c

> Your original patch had to copy the oe_* helpers into the file to handle
> that. But I think we're essentially just locking the whole functions:

I'll try to avoid this lock when deltas are small and see if it helps
the linux.git case on Elijah's machine. So we may end up locking just
a part of these functions.
-- 
Duy


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-20 Thread Duy Nguyen
On Sat, Jul 21, 2018 at 1:52 AM Elijah Newren  wrote:
>
> On Fri, Jul 20, 2018 at 10:43 AM, Elijah Newren  wrote:
> > On Fri, Jul 20, 2018 at 8:39 AM, Nguyễn Thái Ngọc Duy  wrote:
>
> >> This patch provides a better fallback that is
> >>
> >> - cheaper in terms of cpu and io because we won't have to read
> >>   existing pack files as much
> >>
> >> - better in terms of pack size because the pack heuristics are back to
> >>   2.17.0 time, we do not drop large deltas at all
> >>
> >> If we encounter any delta (on-disk or created during try_delta phase)
> >> that is larger than the 2MB limit, we stop using delta_size_ field for
> >> this because it can't contain such size anyway. A new array of delta
> >> size is dynamically allocated and can hold all the deltas that 2.17.0
> >> can [1]. All current delta sizes are migrated over.
> >>
> >> With this, we do not have to drop deltas in try_delta() anymore. Of
> >> course the downside is we use slightly more memory, even compared to
> >> 2.17.0. But since this is considered an uncommon case, a bit more
> >> memory consumption should not be a problem.
>
> Is the increased memory only supposed to affect the uncommon case?  I
> was able to verify that 2.18.0 used less memory than 2.17.0 (for
> either my repo or linux.git), but I don't see nearly the same memory
> reduction relative to 2.17.0 for linux.git with your new patches.
>
> >> ---
> >>  I'm optimistic that the slowness on linux repo is lock contention
> >>  (because if it's not then I have no clue what is). So let's start
> >>  doing proper patches.
> >
> > I'll be testing this shortly...
>
> Using these definitions for versions:
>
>   fix-v5: https://public-inbox.org/git/20180720052829.ga3...@sigill.intra.peff.net/
>   (plus what it depends on)
>   fix-v6: The patch I'm responding to
>   fix-v2: https://public-inbox.org/git/20180718225110.17639-3-new...@gmail.com/
>
> and inspired by
> https://public-inbox.org/git/87po42cwql@evledraar.gmail.com/, I
> ran
>
>    /usr/bin/time -f 'MaxRSS:%M Time:%e' git gc --aggressive
>
> three times on a beefy box (40 cores, 160GB ram, otherwise quiet
> system).  If I take the lowest MaxRSS, and the lowest time reported
> (regardless of whether from the same run), then the results are as
> follows for linux.git (all cases repacked down to 1279 MB, so removing
> that from the table):
>
> Version  MaxRSS(kB)  Time (s)
> -------  ----------  --------
>  2.17.0    11413236    621.36
>  2.18.0    10987152    621.80
>  fix-v5    11105564    836.29
>  fix-v6    11357304    831.73
>  fix-v2    10873348    619.96
>
> The runtime could swing up to 10 seconds between runs.  The memory use
> also had some variability, but all runs of fix-v2 had lower MaxRSS
> than any runs of 2.18.0 (which surprised me); all runs of 2.18.0 used
> less memory than any run of fix-v6, and all runs of fix-v6 used less
> memory than any run of 2.17.0.  fix-v5 had the most variability in
> MaxRSS, ranging from as low as some 2.18.0 runs up to higher than any
> 2.17.0 runs.

As Peff said, RSS is not a very good way to measure memory usage.
valgrind --tool=massif would be the best, or just grep "heap" in
/proc/$PID/smaps. fix-v2 should increase struct object_entry's size
and memory usage for all repos, while v6 has the same heap usage as
2.17.0 and only uses extra memory once it hits a big delta.

> ...but maybe that'd change with more than 3 runs of each?
>
> Anyway, this is a lot better than the 1193.67 seconds I saw with
> fix-v4 (https://public-inbox.org/git/20180719182442.ga5...@duynguyen.home/,
> which Peff fixed up into fix-v5), suggesting it is lock contention,
> but we're still about 33% slower than 2.17.0 and we've lost almost all
> of the memory savings.  Somewhat surprisingly, the highly simplistic
> fix-v2 does a lot better on both measures.

Maybe we should avoid locking when the delta is small enough then and
see how it goes. This is linux.git so there's no big delta and we
would end up not locking at all.

If you do "git repack -adf --threads=8", does the time difference
between v6 and 2.17.0 reduce (suggesting we still hit lock contention
hard)?

> However, I'm just concentrating on a beefy machine; it may be that v6
> drastically outperforms v2 on weaker hardware?  Can others measure a
> lower memory usage for v6 than v2?

I'll try it with massif on linux.git, but this is a very weak laptop
(4GB) so it'll take a while.

> Also, for the original repo with the huge deltas that started it all,
> the numbers are (only run once since it takes so long):
>
> Version  Pack (MB)  MaxRSS(kB)  Time (s)
> -------  ---------  ----------  --------
>  2.17.0       5498    43513628   2494.85
>  2.18.0      10531    40449596   4168.94
>  fix-v5       5500    42805716   2577.50
>  fix-v6       5509    42814836   2605.36
>  fix-v2       5509    41644104   2468.25
>
>
> >>  If we want a quick fix for 2.18.1, I suggest bumping up
> >>  OE_DELTA_SIZE_BITS like Elijah did in his second option. I don't
> >>  

Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-20 Thread Elijah Newren
On Fri, Jul 20, 2018 at 10:43 AM, Elijah Newren  wrote:
> On Fri, Jul 20, 2018 at 8:39 AM, Nguyễn Thái Ngọc Duy  wrote:

>> This patch provides a better fallback that is
>>
>> - cheaper in terms of cpu and io because we won't have to read
>>   existing pack files as much
>>
>> - better in terms of pack size because the pack heuristics are back to
>>   2.17.0 time, we do not drop large deltas at all
>>
>> If we encounter any delta (on-disk or created during try_delta phase)
>> that is larger than the 2MB limit, we stop using delta_size_ field for
>> this because it can't contain such size anyway. A new array of delta
>> size is dynamically allocated and can hold all the deltas that 2.17.0
>> can [1]. All current delta sizes are migrated over.
>>
>> With this, we do not have to drop deltas in try_delta() anymore. Of
>> course the downside is we use slightly more memory, even compared to
>> 2.17.0. But since this is considered an uncommon case, a bit more
>> memory consumption should not be a problem.

Is the increased memory only supposed to affect the uncommon case?  I
was able to verify that 2.18.0 used less memory than 2.17.0 (for
either my repo or linux.git), but I don't see nearly the same memory
reduction relative to 2.17.0 for linux.git with your new patches.

>> ---
>>  I'm optimistic that the slowness on linux repo is lock contention
>>  (because if it's not then I have no clue what is). So let's start
>>  doing proper patches.
>
> I'll be testing this shortly...

Using these definitions for versions:

  fix-v5: https://public-inbox.org/git/20180720052829.ga3...@sigill.intra.peff.net/
(plus what it depends on)
  fix-v6: The patch I'm responding to
  fix-v2: https://public-inbox.org/git/20180718225110.17639-3-new...@gmail.com/

and inspired by
https://public-inbox.org/git/87po42cwql@evledraar.gmail.com/, I
ran

   /usr/bin/time -f 'MaxRSS:%M Time:%e' git gc --aggressive

three times on a beefy box (40 cores, 160GB ram, otherwise quiet
system).  If I take the lowest MaxRSS, and the lowest time reported
(regardless of whether from the same run), then the results are as
follows for linux.git (all cases repacked down to 1279 MB, so removing
that from the table):

Version  MaxRSS(kB)  Time (s)
-------  ----------  --------
 2.17.0    11413236    621.36
 2.18.0    10987152    621.80
 fix-v5    11105564    836.29
 fix-v6    11357304    831.73
 fix-v2    10873348    619.96

The runtime could swing up to 10 seconds between runs.  The memory use
also had some variability, but all runs of fix-v2 had lower MaxRSS
than any runs of 2.18.0 (which surprised me); all runs of 2.18.0 used
less memory than any run of fix-v6, and all runs of fix-v6 used less
memory than any run of 2.17.0.  fix-v5 had the most variability in
MaxRSS, ranging from as low as some 2.18.0 runs up to higher than any
2.17.0 runs.

...but maybe that'd change with more than 3 runs of each?

Anyway, this is a lot better than the 1193.67 seconds I saw with
fix-v4 (https://public-inbox.org/git/20180719182442.ga5...@duynguyen.home/,
which Peff fixed up into fix-v5), suggesting it is lock contention,
but we're still about 33% slower than 2.17.0 and we've lost almost all
of the memory savings.  Somewhat surprisingly, the highly simplistic
fix-v2 does a lot better on both measures.


However, I'm just concentrating on a beefy machine; it may be that v6
drastically outperforms v2 on weaker hardware?  Can others measure a
lower memory usage for v6 than v2?


Also, for the original repo with the huge deltas that started it all,
the numbers are (only run once since it takes so long):

Version  Pack (MB)  MaxRSS(kB)  Time (s)
-------  ---------  ----------  --------
 2.17.0       5498    43513628   2494.85
 2.18.0      10531    40449596   4168.94
 fix-v5       5500    42805716   2577.50
 fix-v6       5509    42814836   2605.36
 fix-v2       5509    41644104   2468.25


>>  If we want a quick fix for 2.18.1, I suggest bumping up
>>  OE_DELTA_SIZE_BITS like Elijah did in his second option. I don't
>>  think it's safe to fast track this patch.
>
> Okay, I'll submit that with a proper commit message then.  Since you
> also bump OE_DELTA_SIZE_BITS in your patch (though to a different
> value), it'll require a conflict resolution when merging maint into
> master, but at least the change won't silently leak through.

...except now I'm wondering if the commit message should mention that
it's just a stopgap fix for 2.18.1, or whether it's actually the fix
that we just want to use going forward as well.


Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-20 Thread Elijah Newren
On Fri, Jul 20, 2018 at 8:39 AM, Nguyễn Thái Ngọc Duy  wrote:
> Let's start with some background about oe_delta_size() and
> oe_set_delta_size(). If you already know, skip the next paragraph.
>
> These two are added in 0aca34e826 (pack-objects: shrink delta_size
> field in struct object_entry - 2018-04-14) to help reduce 'struct
> object_entry' size. The delta size field in this struct is reduced to
> only contain max 2MB. So if any new delta is produced and larger than

I think you mean 1MB?

$ git grep OE_DELTA_SIZE_BITS v2.18.0
v2.18.0:builtin/pack-objects.c: if (delta_size >= (1U << OE_DELTA_SIZE_BITS)) {
v2.18.0:pack-objects.h:#define OE_DELTA_SIZE_BITS   20
v2.18.0:pack-objects.h: unsigned delta_size_:OE_DELTA_SIZE_BITS; /* delta data size (uncompressed) */

In https://public-inbox.org/git/20180719151640.ga24...@duynguyen.home/,
you bumped this to 21, which may be where you got the 2MB figure?
Your 2MB figure also appears on the next line and a few other places
in the commit message.

> 2MB, it's dropped because we can't really save such a large size
> anywhere. Fallback is provided in case existingpackfiles already have

Missing space: s/existingpackfiles/existing packfiles/

> large deltas, then we can retrieve it from the pack.
>
> While this should help small machines repacking large repos (i.e. less

Maybe change "repacking large repos" to "repacking large repos without
large deltas"?

> memory pressure), dropping large deltas during the delta selection
> process could end up with worse pack files. And if existing packfiles
> already have >2MB delta and pack-objects is instructed to not reuse
> deltas, all of them will be dropped on the floor, and the resulting
> pack would be definitely bigger.
>
> There is also a regression in terms of CPU/IO if we have large on-disk
> deltas because fallback code needs to parse the pack every time the
> delta size is needed and just access to the mmap'd pack data is enough
> for extra page faults when memory is under pressure.
>
> Both of these issues were reported on the mailing list. Here's some
> numbers for comparison.
>
> Version  Pack (MB)  MaxRSS(kB)  Time (s)
> -------  ---------  ----------  --------
>  2.17.0       5498    43513628   2494.85
>  2.18.0      10531    40449596   4168.94
>
> This patch provides a better fallback that is
>
> - cheaper in terms of cpu and io because we won't have to read
>   existing pack files as much
>
> - better in terms of pack size because the pack heuristics are back to
>   2.17.0 time, we do not drop large deltas at all
>
> If we encounter any delta (on-disk or created during try_delta phase)
> that is larger than the 2MB limit, we stop using delta_size_ field for
> this because it can't contain such size anyway. A new array of delta
> size is dynamically allocated and can hold all the deltas that 2.17.0
> can [1]. All current delta sizes are migrated over.
>
> With this, we do not have to drop deltas in try_delta() anymore. Of
> course the downside is we use slightly more memory, even compared to
> 2.17.0. But since this is considered an uncommon case, a bit more
> memory consumption should not be a problem.

Out of curiosity, would it be possible to use the delta_size_ field
for deltas that are small enough, and only use an external data
structure (perhaps a hash rather than an array) for the few deltas
that are large?

> Delta size limit is also raised from 2MB to 32 MB to better cover
> common case and avoid that extra memory consumption (99.999% deltas in
> this reported repo are under 12MB). Other fields are shuffled around
> to keep this struct packed tight. We don't use more memory in common
> case even with this limit update.
>
> A note about thread synchronization. Since this code can be run in
> parallel during delta searching phase, we need a mutex. The realloc
> part in packlist_alloc() is not protected because it only happens
> during the object counting phase, which is always single-threaded.
>
> The locking in oe_delta_size() could also be dropped if we make sure
> the pack->delta_size pointer assignment in oe_set_delta_size() is
> atomic. But let's keep the lock for now to be on the safe side. Lock
> contention should not be that high for this lock.
>
> [1] With a small tweak. 2.17.0 on 64-bit linux can hold 2^64 byte
> deltas, which is absolutely insane. But windows builds, even
> 64-bit version, only hold 2^32. So reducing it to 2^32 should be
> quite safe.
>
> Reported-by: Elijah Newren 
> Helped-by: Elijah Newren 
> Helped-by: Jeff King 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  I'm optimistic that the slowness on linux repo is lock contention
>  (because if it's not then I have no clue what is). So let's start
>  doing proper patches.

I'll be testing this shortly...

>
>  If we want a quick fix for 2.18.1, I suggest bumping up
>  OE_DELTA_SIZE_BITS like Elijah did in his second option. I don't
>  think it's safe to fast track this patch.

Okay, I'll 

Re: [PATCH] pack-objects: fix performance issues on packing large deltas

2018-07-20 Thread Jeff King
On Fri, Jul 20, 2018 at 05:39:43PM +0200, Nguyễn Thái Ngọc Duy wrote:

> Let's start with some background about oe_delta_size() and
> oe_set_delta_size(). If you already know, skip the next paragraph.
> 
> These two are added in 0aca34e826 (pack-objects: shrink delta_size
> field in struct object_entry - 2018-04-14) to help reduce 'struct
> object_entry' size. The delta size field in this struct is reduced to
> only contain max 2MB. So if any new delta is produced and larger than
> 2MB, it's dropped because we can't really save such a large size
> anywhere. Fallback is provided in case existingpackfiles already have
> large deltas, then we can retrieve it from the pack.

Minor nit, but isn't this 1MB (it was 2MB after one of your patches, but
I think v2.18.0 has 20 bits)?

> [...more commit message...]

Nicely explained overall.

> With this, we do not have to drop deltas in try_delta() anymore. Of
> course the downside is we use slightly more memory, even compared to
> 2.17.0. But since this is considered an uncommon case, a bit more
> memory consumption should not be a problem.

I wondered how common this might be. The easiest way to see the largest
delta sizes is:

  git cat-file --batch-all-objects \
   --batch-check='%(objectsize:disk) %(deltabase)' |
  grep -v  |
  sort -rn | head

The biggest one in the kernel is ~300k. Which is about what I'd expect
for a normal source code repo. Even some private repos I have with a lot
of binary artifacts top out at about 3MB. So the new 32MB is probably
pretty unlikely to trigger, unless you really do have a bunch of huge
artifacts.

If you do, then more memory pressure isn't great. But as a relative
measure, the extra couple megabytes to store one 32-bit int per object
is probably fine.

> A note about thread synchronization. Since this code can be run in
> parallel during delta searching phase, we need a mutex. The realloc
> part in packlist_alloc() is not protected because it only happens
> during the object counting phase, which is always single-threaded.

Right, thanks for clarifying this here.

> The locking in oe_delta_size() could also be dropped if we make sure
> the pack->delta_size pointer assignment in oe_set_delta_size() is
> atomic. But let's keep the lock for now to be on the safe side. Lock
> contention should not be that high for this lock.

Yeah, I agree with this, now that we're not using the read_lock().
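
For illustration, with C11 atomics that pointer publication would look
something like this (just a sketch of the idea in the commit message;
git itself does not use stdatomic.h):

  #include <stdatomic.h>
  #include <stdint.h>

  static _Atomic(uint32_t *) delta_size;

  /* writer: allocate and fill the array, then publish it */
  static void publish(uint32_t *arr)
  {
          atomic_store_explicit(&delta_size, arr, memory_order_release);
  }

  /* readers: an acquire load sees either NULL or the filled array */
  static uint32_t *snapshot(void)
  {
          return atomic_load_explicit(&delta_size, memory_order_acquire);
  }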

> [1] With a small tweak. 2.17.0 on 64-bit linux can hold 2^64 byte
> deltas, which is absolutely insane. But windows builds, even
> 64-bit version, only hold 2^32. So reducing it to 2^32 should be
> quite safe.

I'm not sure I completely agree with this. While 4GB deltas should be
pretty rare, the nice thing about 64-bit is that you never have to even
think about whether the variable is large enough. I think the 2^32
limitations on Windows are something we should be fixing in the long
term (though there it is even worse because it is not just deltas, but
entire objects).

> ---
>  I'm optimistic that the slowness on linux repo is lock contention
>  (because if it's not then I have no clue what is). So let's start
>  doing proper patches.

Me too, but I'd love to see more numbers from Elijah.

>  If we want a quick fix for 2.18.1, I suggest bumping up
>  OE_DELTA_SIZE_BITS like Elijah did in his second option. I don't
>  think it's safe to fast track this patch.

Also agreed.

> @@ -2278,6 +2274,8 @@ static void init_threaded_search(void)
>   pthread_mutex_init(&cache_mutex, NULL);
>   pthread_mutex_init(&progress_mutex, NULL);
>   pthread_cond_init(&progress_cond, NULL);
> + pthread_mutex_init(&to_pack.lock, NULL);
> + to_pack.lock_initialized = 1;
>   old_try_to_free_routine = set_try_to_free_routine(try_to_free_from_threads);
>  }

This is new in this iteration. I guess this is to cover the case where
we are built with pthread support, but --threads=1?

Given that we no longer have to touch this lock during the realloc, is
it worth actually putting it into to_pack? Instead, we could keep it
local to pack-objects, alongside all the other locks (and use the
lock_mutex() helper which handles the single-thread case).

Your original patch had to copy the oe_* helpers into the file to handle
that. But I think we're essentially just locking the whole functions:

> @@ -330,20 +353,37 @@ static inline void oe_set_size(struct packing_data *pack,
>  static inline unsigned long oe_delta_size(struct packing_data *pack,
> const struct object_entry *e)
>  {
> - if (e->delta_size_valid)
> - return e->delta_size_;
> - return oe_size(pack, e);
> + unsigned long size;
> +
> + packing_data_lock(pack);
> + if (pack->delta_size)
> + size = pack->delta_size[e - pack->objects];
> + else
> + size = e->delta_size_;
> + packing_data_unlock(pack);
> + return size;
>  }

So could we just wrap