Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-24 Thread Philippe Vaucher
 I used to repack older part of history manually with a deeper depth,
 mark the result with the .keep bit, and then repack the whole thing
 again to have the remainder in a shallower depth.  Something like:

 git rev-list --objects v1.5.3 |
 git pack-objects --depth=128 --delta-base-offset pack

 would give me the first pack (in real life, I would use a larger
 window size like 4096), and then after placing the resulting .pack
 and .idx files along with a .keep file in .git/objects/pack/,
 running git repack -a -d to pack the rest.

I'm curious: after this repacking, how do you guys publish these
packs? git push? If so, by what criteria does the remote repo know
which pack it should fetch?

Or maybe it's only a local operation and thus you cannot do it on the
remote without ssh access?

Philippe
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-22 Thread David Kastrup
Duy Nguyen pclo...@gmail.com writes:

 OK with git://git.savannah.gnu.org/emacs.git we have

  - a 209MB pack with --aggressive
  - 1.3GB with --depth=50
  - 1.3GB with --window=4000 --depth=32
  - 1.3GB with --depth=20
  - 821MB with --depth=250 for commits --before=2.years.ago, --depth=50
 for the rest

 So I don't think we should go with your following patch because the
 size explosion is just too much, no matter how much faster it could
 be. An immediate action could be to just make --depth=250 configurable
 and let people deal with it. A better option is something like the 3
 repack steps you described, where we pack at deep depth first, mark
 .keep, pack at shallower depth, and combine them all into one.

 I'm not really happy with --depth=250 producing 209MB while
 --depth=250 --before=2.year.ago produces an 800MB pack. It looks wrong
 (or maybe I did something wrong).

That does look strange: Emacs has a history of more than 30 years, but
the Git mirror is quite a bit younger.  Maybe one needs to make sure to
use the author date rather than the commit date here?

-- 
David Kastrup


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-22 Thread David Kastrup
David Kastrup d...@gnu.org writes:

 Duy Nguyen pclo...@gmail.com writes:

 OK with git://git.savannah.gnu.org/emacs.git we have

  - a 209MB pack with --aggressive
  - 1.3GB with --depth=50
  - 1.3GB with --window=4000 --depth=32
  - 1.3GB with --depth=20
  - 821MB with --depth=250 for commits --before=2.years.ago, --depth=50
 for the rest

 So I don't think we should go with your following patch because the
 size explosion is just too much, no matter how much faster it could
 be. An immediate action could be to just make --depth=250 configurable
 and let people deal with it. A better option is something like the 3
 repack steps you described, where we pack at deep depth first, mark
 .keep, pack at shallower depth, and combine them all into one.

 I'm not really happy with --depth=250 producing 209MB while
 --depth=250 --before=2.year.ago produces an 800MB pack. It looks wrong
 (or maybe I did something wrong).

 That does look strange: Emacs has a history of more than 30 years, but
 the Git mirror is quite a bit younger.  Maybe one needs to make sure to
 use the author date rather than the commit date here?

Another thing: did you really use --depth=250 here or did you use
--aggressive?  It may be that the latter also sets other options?

-- 
David Kastrup


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-22 Thread Duy Nguyen
On Sat, Feb 22, 2014 at 3:53 PM, David Kastrup d...@gnu.org wrote:
 David Kastrup d...@gnu.org writes:

 Duy Nguyen pclo...@gmail.com writes:

 OK with git://git.savannah.gnu.org/emacs.git we have

  - a 209MB pack with --aggressive
  - 1.3GB with --depth=50
  - 1.3GB with --window=4000 --depth=32
  - 1.3GB with --depth=20
  - 821MB with --depth=250 for commits --before=2.years.ago, --depth=50
 for the rest

 So I don't think we should go with your following patch because the
 size explosion is just too much, no matter how much faster it could
 be. An immediate action could be to just make --depth=250 configurable
 and let people deal with it. A better option is something like the 3
 repack steps you described, where we pack at deep depth first, mark
 .keep, pack at shallower depth, and combine them all into one.

 I'm not really happy with --depth=250 producing 209MB while
 --depth=250 --before=2.year.ago produces an 800MB pack. It looks wrong
 (or maybe I did something wrong).

 That does look strange: Emacs has a history of more than 30 years, but
 the Git mirror is quite a bit younger.  Maybe one needs to make sure to
 use the author date rather than the commit date here?

I think commit date is fine because it covers a large portion of
objects (649946 of 739990 total) and it does not (or should not)
affect object ordering in pack-objects/rev-list.

 Another thing: did you really use --depth=250 here or did you use
 --aggressive?  It may be that the latter also sets other options?

I can't use --aggressive because I need to feed revisions directly to
pack-objects. --aggressive also sets --window=250. Thanks for
checking. My machine will have another workout session.
-- 
Duy


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-22 Thread Andreas Schwab
David Kastrup d...@gnu.org writes:

 That does look strange: Emacs has a history of more than 30 years, but
 the Git mirror is quite a bit younger.  Maybe one needs to make sure to
 use the author date rather than the commit date here?

There is no difference between commit and author date in the Emacs git
mirror since bzr doesn't keep that distinction (and cvs didn't either).

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-22 Thread Duy Nguyen
On Sat, Feb 22, 2014 at 4:14 PM, Duy Nguyen pclo...@gmail.com wrote:
 On Sat, Feb 22, 2014 at 3:53 PM, David Kastrup d...@gnu.org wrote:
 David Kastrup d...@gnu.org writes:

 Duy Nguyen pclo...@gmail.com writes:

 OK with git://git.savannah.gnu.org/emacs.git we have

  - a 209MB pack with --aggressive
  - 1.3GB with --depth=50
  - 1.3GB with --window=4000 --depth=32
  - 1.3GB with --depth=20
  - 821MB with --depth=250 for commits --before=2.years.ago, --depth=50
 for the rest
...

 I'm not really happy with --depth=250 producing 209MB while
 --depth=250 --before=2.year.ago produces an 800MB pack. It looks wrong
 (or maybe I did something wrong).

 Another thing: did you really use --depth=250 here or did you use
 --aggressive?  It may be that the latter also sets other options?

 I can't use --aggressive because I need to feed revisions directly to
 pack-objects. --aggressive also sets --window=250. Thanks for
 checking. My machine will have another workout session.

And 800MB is reduced to 177MB, containing history older than 2 years.
The final pack is 199MB, within the size range of the current
--aggressive, and should be reasonably fast on most operations. Again,
blame could still hit long delta chains, but I think we should just
unpack some trees/blobs when we hit them.

I think we should update --aggressive to do it this way. So

 - gc.aggressiveDepth defaults to 50 (or 20?); this is used for recent history
 - gc.aggressiveDeepDepth defaults to 250 (or smaller??), used for
ancient history
 - gc.aggressiveDeepOption is a rev-list option to define
ancient history, defaulting to --before=2.years.ago. This option could
be specified multiple times.

Both packing phases use the same gc.aggressiveWindow. We could add
gc.aggressiveDeepWindow too.

GSoC project?
-- 
Duy


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-21 Thread Junio C Hamano
Christian Jaeger chr...@gmail.com writes:

 Also, in "man git-gc", document that --aggressive leads to slower
 *read* performance after the gc. I remember having read that option's
 docs when I ran it, and since they didn't mention that it makes reads
 slower, I didn't expect it to, and thus didn't remember this as the
 source of the problem when I noticed that things were slow.

Good point. We would at least need such a documentation update to
warn users.

 (But, I took from the discussion that increasing the gzip window size
 (?) would make things smaller anyway, so perhaps all that isn't even
 necessary?)

If you are talking about --window in git repack --window=<n>,
that is not related to gzip.  It is how many other similar objects
an object will be tried as a delta against, to find the smallest delta
that can represent it in the pack.  Such a better delta, if found,
can give you a packfile with a smaller depth that is as small as
another packfile created with a larger depth, which is an overall
win, and using a wider window is a way to achieve such a result.
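To make the window/size trade visible, here is a throwaway experiment (a minimal sketch assuming only that git is on PATH; the demo repo, file names, and counts are arbitrary stand-ins): pack the same objects once with delta search disabled (--window=0) and once with a generous window, then compare the resulting pack sizes.

```shell
# Throwaway demo of the --window trade-off: the same objects packed with
# delta search disabled (--window=0) and with a generous window.
set -e
cd "$(mktemp -d)"
git init -q
for i in 1 2 3 4 5 6 7 8 9 10; do
    seq 1 1000 > "f$i.txt"          # ten nearly identical blobs
    echo "variant $i" >> "f$i.txt"
done
git add .
git -c user.name=t -c user.email=t@example.com commit -q -m blobs
mkdir out

# No delta search: every object is stored whole (zlib only).
git rev-list --objects HEAD | git pack-objects -q --window=0 out/nodelta
# Wide window: the similar blobs get stored as small deltas instead.
git rev-list --objects HEAD | git pack-objects -q --window=250 out/delta

ls -l out/*.pack
```

On a repository with genuinely similar blobs, the wide-window pack comes out markedly smaller, which is the effect described above.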





Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-21 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 For old projects, commits older than 1-2 years are probably less often
 accessed and could use some aggressive packing.

I used to repack older part of history manually with a deeper depth,
mark the result with the .keep bit, and then repack the whole thing
again to have the remainder in a shallower depth.  Something like:

git rev-list --objects v1.5.3 |
git pack-objects --depth=128 --delta-base-offset pack

would give me the first pack (in real life, I would use a larger
window size like 4096), and then after placing the resulting .pack
and .idx files along with a .keep file in .git/objects/pack/,
running git repack -a -d to pack the rest.
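Fleshed out as a throwaway end-to-end script (a sketch assuming only that git is on PATH; the tiny demo repo and its "old" tag stand in for a real repository and the v1.5.3 boundary), the two-pass recipe looks like this:

```shell
# Sketch of the two-pass .keep recipe on a throwaway repo; "old" stands in
# for the v1.5.3 boundary.
set -e
cd "$(mktemp -d)"
git init -q
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m one
git tag old
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m two

# Pass 1: deep-depth, wide-window pack of history up to the boundary.
# pack-objects prints the new pack's name, which we mark with a .keep file.
hash=$(git rev-list --objects old |
       git pack-objects --depth=128 --window=4096 --delta-base-offset \
           .git/objects/pack/pack)
touch ".git/objects/pack/pack-$hash.keep"

# Pass 2: repack the remainder at normal depth; the kept pack is left alone.
git repack -a -d -q
ls .git/objects/pack/
```

The .keep file is what stops the second repack from rewriting the deep pack.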

 This still hits git-blame badly. We could even make sure all
 objects on the blame surface have short delta chains. But that
 may be pushing pack-objects too much.

Yes, you can do a similar trick by blaming all the paths that ever
existed in the project, parsing the --porcelain output to learn all
the commits and paths involved, to find the objects that need
quicker access.  Pack such objects in a pack with a shallow depth,
tentatively mark that pack with .keep, repack the remainder with a
deep depth, then remove .keep from the first pack and mark the new
pack with .keep to prevent it from getting repacked, or something
like that.
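A minimal sketch of the harvesting step (assuming git on PATH; for brevity it blames only currently tracked paths, whereas the full trick above would cover every path that ever existed, and would go on to map the commits to blob/tree objects worth keeping on short chains):

```shell
# Blame every tracked path with --porcelain and collect the commit ids
# named in its headers; throwaway demo repo.
set -e
cd "$(mktemp -d)"
git init -q
echo hello > f.txt
git add f.txt
git -c user.name=t -c user.email=t@example.com commit -q -m initial

git ls-files | while read -r path; do
    git blame --porcelain -- "$path"
done |
awk '$1 ~ /^[0-9a-f]+$/ && length($1) == 40 { print $1 }' |
sort -u > hot-commits.txt
cat hot-commits.txt
```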


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-21 Thread Duy Nguyen
On Wed, Feb 19, 2014 at 3:59 AM, Junio C Hamano gits...@pobox.com wrote:
 I didn't know --aggressive was so aggressive myself, as I personally
 never use it. git repack -a -d -f --depth=32 --window=4000 is what I
 often use, but I suspect most people would not be patient enough for
 that 4k window.

 Let's do something like this first and then later make --depth
 configurable just like --window, perhaps?  For aggressive, I think
 the default width (hardcoded to 250 but configurable) is a bit too
 narrow.

OK with git://git.savannah.gnu.org/emacs.git we have

 - a 209MB pack with --aggressive
 - 1.3GB with --depth=50
 - 1.3GB with --window=4000 --depth=32
 - 1.3GB with --depth=20
 - 821MB with --depth=250 for commits --before=2.years.ago, --depth=50
for the rest

So I don't think we should go with your following patch because the
size explosion is just too much, no matter how much faster it could
be. An immediate action could be to just make --depth=250 configurable
and let people deal with it. A better option is something like the 3
repack steps you described, where we pack at deep depth first, mark
.keep, pack at shallower depth, and combine them all into one.

I'm not really happy with --depth=250 producing 209MB while
--depth=250 --before=2.year.ago produces an 800MB pack. It looks wrong
(or maybe I did something wrong).


  builtin/gc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/builtin/gc.c b/builtin/gc.c
 index 6be6c8d..0d010f0 100644
 --- a/builtin/gc.c
 +++ b/builtin/gc.c
 @@ -204,7 +204,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
  	if (aggressive) {
  		argv_array_push(&repack, "-f");
 -		argv_array_push(&repack, "--depth=250");
 +		argv_array_push(&repack, "--depth=20");
  		if (aggressive_window > 0)
  			argv_array_pushf(&repack, "--window=%d",
  					 aggressive_window);
  	}



-- 
Duy


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-20 Thread David Kastrup
Duy Nguyen pclo...@gmail.com writes:

 I can think of two improvements we could make: either increase the
 cache size dynamically (within limits) or make it configurable. If we
 have N entries in the worktree (both trees and blobs) and depth M, then
 we might need to cache N*M objects for it to be effective. Christian,
 if you want to experiment with this, update MAX_DELTA_CACHE in
 sha1_file.c and rebuild.

Well, my optimized git-blame code is considerably hit by an
aggressively packed Emacs repository so I took a look at it with the
MAX_DELTA_CACHE value set to the default 256, and then 512, 1024, 2048.

Here are the results:

dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real    1m17.496s
user    0m30.552s
sys     0m46.496s
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real    1m13.888s
user    0m30.060s
sys     0m43.420s
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real    1m16.415s
user    0m31.436s
sys     0m44.564s
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real    1m24.732s
user    0m34.416s
sys     0m49.808s

So using a value of 512 helps a bit (7% or so), but further increases
already cause a hit.  My machine has 4G of memory (32bit x86), so it is
unlikely that memory is running out.  I have no idea why this would be
so: either memory locality plays a role here, or the cache for some
reason gets reinitialized or scanned/copied/accessed as a whole
repeatedly, defeating the idea of a cache.  Or the access patterns are
such that it's entirely useless as a cache even at this size.

Trying with 16384:
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real    2m8.000s
user    0m54.968s
sys     1m12.624s

And memory consumption did not exceed about 200m all the while, so it
is far lower than what would have been available.

Something's _really_ fishy about that cache behavior.  Note that the
_system_ time goes up considerably, not just user time.  Since the packs
are zlib-packed, it's reasonable that more I/O time is also associated
with more user time and it is well possible that the user time increase
is entirely explainable by the larger amount of compressed data to
access.

But this stinks.  I doubt that the additional time is spent in memory
allocation: most of that would register only as user time.  And the
total allocated memory is not large enough that one can explain this
away with fewer available disk buffers for the kernel: the aggressively
packed repo takes about 300m, so it would fit into memory together with
the git process.

-- 
David Kastrup


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-20 Thread David Kastrup
David Kastrup d...@gnu.org writes:

 Duy Nguyen pclo...@gmail.com writes:

 I can think of two improvements we could make: either increase the
 cache size dynamically (within limits) or make it configurable. If we
 have N entries in the worktree (both trees and blobs) and depth M, then
 we might need to cache N*M objects for it to be effective. Christian,
 if you want to experiment with this, update MAX_DELTA_CACHE in
 sha1_file.c and rebuild.

 Well, my optimized git-blame code is considerably hit by an
 aggressively packed Emacs repository so I took a look at it with the
 MAX_DELTA_CACHE value set to the default 256, and then 512, 1024, 2048.

[...]

 Trying with 16384:
 dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

 real  2m8.000s
 user  0m54.968s
 sys   1m12.624s

 And memory consumption did not exceed about 200m all the while, so it
 is far lower than what would have been available.

Of course, this has to do with delta_base_cache_limit defaulting to 16m.

 Something's _really_ fishy about that cache behavior.  Note that the
 _system_ time goes up considerably, not just user time.  Since the
 packs are zlib-packed, it's reasonable that more I/O time is also
 associated with more user time and it is well possible that the user
 time increase is entirely explainable by the larger amount of
 compressed data to access.

 But this stinks.

And an obvious contender for the stinking is that the LRU scheme used
here is _strictly_ freeing memory based on which cache entry has been
_created_ the longest time ago, not which cache entry has been
_accessed_ the longest time ago.  Which means a pure round-robin
strategy for freeing memory rather than LRU.

Let's see what happens when changing this.

-- 
David Kastrup


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-20 Thread David Kastrup
David Kastrup d...@gnu.org writes:

 David Kastrup d...@gnu.org writes:

 Duy Nguyen pclo...@gmail.com writes:

 Something's _really_ fishy about that cache behavior.  Note that the
 _system_ time goes up considerably, not just user time.  Since the
 packs are zlib-packed, it's reasonable that more I/O time is also
 associated with more user time and it is well possible that the user
 time increase is entirely explainable by the larger amount of
 compressed data to access.

 But this stinks.

 And an obvious contender for the stinking is that the LRU scheme used
 here is _strictly_ freeing memory based on which cache entry has been
 _created_ the longest time ago, not which cache entry has been
 _accessed_ the longest time ago.  Which means a pure round-robin
 strategy for freeing memory rather than LRU.

 Let's see what happens when changing this.

Not much.  With any cache size, using a true LRU scheme does not buy
more than 2%.  On the other hand, increasing core.deltaBaseCacheLimit
from its default of 16m to 128m in the config file results in the
following difference (with default #define MAX_DELTA_CACHE (256)):

dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real    1m17.446s
user    0m30.696s
sys     0m46.332s
dak@lola:/usr/local/tmp/emacs$ time ../git/git blame src/xdisp.c >/dev/null

real    0m27.519s
user    0m20.248s
sys     0m7.156s

So it would seem that the default available cache slots are not utilized
anyway when operating on this file (about 1MB in size) with the default
of core.deltaBaseCacheLimit.
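For reference, this knob is ordinary configuration (unlike MAX_DELTA_CACHE, which needs a rebuild), so the 16m-to-128m change tried above can be made in any repo:

```shell
# core.deltaBaseCacheLimit is plain git config; throwaway repo for the demo.
set -e
cd "$(mktemp -d)"
git init -q
git config core.deltaBaseCacheLimit 128m
git config core.deltaBaseCacheLimit   # prints 128m
```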

It is still irritating that the performance drops quite a bit with a
considerably larger number of cache slots.

-- 
David Kastrup


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-20 Thread Duy Nguyen
On Thu, Feb 20, 2014 at 1:59 AM, Junio C Hamano gits...@pobox.com wrote:
 Philippe Vaucher philippe.vauc...@gmail.com writes:

 fwiw this is the thread that added --depth=250

 http://thread.gmane.org/gmane.comp.gcc.devel/94565/focus=94626

 This post is quite interesting:
 http://article.gmane.org/gmane.comp.gcc.devel/94637

 Yes, it most clearly says that --depth=250 was *not* a
 recommendation, with technical background to explain why such a long
 delta chain is a bad idea.

On the other hand, the size reduction is really nice (320MB vs 500MB).
I don't know if we can do this, but does it make sense to apply
--depth=250 for old commits only and shallow depth for recent commits?

For old projects, commits older than 1-2 years are probably less often
accessed and could use some aggressive packing. This still hits
git-blame badly. We could even make sure all objects on the blame
surface have short delta chains. But that may be pushing pack-objects
too much.
-- 
Duy


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-20 Thread Christian Jaeger
2014-02-20 23:35 GMT+00:00 Duy Nguyen pclo...@gmail.com:
 does it make sense to apply
 --depth=250 for old commits only

Just wondering: would it be difficult to fix the problems that lead to
the worse-than-linear slowdown with --depth? (I.e. adaptive cache/hash
table size.) If the performance difference between, say, --depth=25 and
--depth=250 could be reduced from a factor of 40 to 10 (or better, if
other things again take more time than the object access), that would
seem like a nice gain in any case.

Also, in "man git-gc", document that --aggressive leads to slower
*read* performance after the gc. I remember having read that option's
docs when I ran it, and since they didn't mention that it makes reads
slower, I didn't expect it to, and thus didn't remember this as the
source of the problem when I noticed that things were slow.

(But, I took from the discussion that increasing the gzip window size
(?) would make things smaller anyway, so perhaps all that isn't even
necessary?)

I can test next week if you have particular suggestions to test.

Christian.


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-20 Thread Duy Nguyen
On Fri, Feb 21, 2014 at 06:35:06AM +0700, Duy Nguyen wrote:
 On the other hand, the size reduction is really nice (320MB vs 500MB).
 I don't know if we can do this, but does it make sense to apply
 --depth=250 for old commits only and shallow depth for recent commits?
 
 For old projects, commits older than 1-2 years are probably less often
 accessed and could use some aggressive packing. This still hits
 git-blame badly. We could even make sure all objects on the blame
 surface have short delta chains. But that may be pushing pack-objects
 too much.

We can have a moderately aggressive mode like this. With the patch
below, first you repack all and remove all loose objects. Then replay
your favourite use cases with GIT_LOOSE_THEM=1. For example, if I'm
most interested in commits from a year ago:

$ GIT_LOOSE_THEM=1 ../git log --raw --since=1.year.ago >/dev/null

all relevant trees will be unpacked. Put --stat there too if you want
to unpack blobs. Blame-heavy users may want to blame a few (or all)
files here too to unpack more. Now we can repack aggressively all
non-loose objects:

$ git repack -adf --exclude-loose --depth=250

and repack again, this time with normal depth, which would only affect
loose objects

$ git repack -ad

The end result is a pack with ancient history with potentially long
delta chains, tightly packed, and nearer history with shorter
chains. You will not notice any performance degradation (unless, in my
case, I go past 1 year of history). And the resulting pack of git.git
is 39M rather than 64M with standard depth.

The use of loose objects to mark recent objects is not efficient (but
fast for this prototype). We could store an SHA-1 map instead.

-- 8< --
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 541667f..0e9dc8c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -82,6 +82,7 @@ static int num_preferred_base;
 static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 static int pack_compression_seen;
+static int no_loose;
 
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
@@ -2204,7 +2205,12 @@ static void show_object(struct object *obj,
 			const struct name_path *path, const char *last,
 			void *data)
 {
-	char *name = path_name(path, last);
+	char *name;
+
+	if (no_loose && has_loose_object(obj->sha1))
+		return;
+
+	name = path_name(path, last);
 
 	add_preferred_base_object(name);
 	add_object_entry(obj->sha1, obj->type, name, 0);
@@ -2487,6 +2493,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 		{ OPTION_SET_INT, 0, "reflog", &rev_list_reflog, NULL,
 		  N_("include objects referred by reflog entries"),
 		  PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, 1 },
+		OPT_BOOL(0, "exclude-loose", &no_loose, ""),
 		OPT_BOOL(0, "stdout", &pack_to_stdout,
 			 N_("output pack to stdout")),
 		OPT_BOOL(0, "include-tag", &include_tag,
diff --git a/builtin/repack.c b/builtin/repack.c
index bb2314c..9b8bb35 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -137,6 +137,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	int no_update_server_info = 0;
 	int quiet = 0;
 	int local = 0;
+	int no_loose = 0;
 
 	struct option builtin_repack_options[] = {
 		OPT_BIT('a', NULL, &pack_everything,
@@ -152,6 +153,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 			 N_("pass --no-reuse-object to git-pack-objects")),
 		OPT_BOOL('n', NULL, &no_update_server_info,
 			 N_("do not run git-update-server-info")),
+		OPT_BOOL(0, "exclude-loose", &no_loose, ""),
 		OPT__QUIET(&quiet, N_("be quiet")),
 		OPT_BOOL('l', "local", &local,
 			 N_("pass --local to git-pack-objects")),
@@ -184,6 +186,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	argv_array_push(&cmd_args, "--non-empty");
 	argv_array_push(&cmd_args, "--all");
 	argv_array_push(&cmd_args, "--reflog");
+	if (no_loose)
+		argv_array_push(&cmd_args, "--exclude-loose");
 	if (window)
 		argv_array_pushf(&cmd_args, "--window=%s", window);
 	if (window_memory)
diff --git a/sha1_file.c b/sha1_file.c
index 6e8c05d..d0988f2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -454,7 +454,7 @@ int has_loose_object_nonlocal(const unsigned char *sha1)
 	return 0;
 }
 
-static int has_loose_object(const unsigned char *sha1)
+int has_loose_object(const unsigned char *sha1)
 {
 	return has_loose_object_local(sha1) ||
 	       has_loose_object_nonlocal(sha1);
@@ -2114,6 +2114,11 @@ struct unpack_entry_stack_ent {
 	unsigned long size;
 };
 
+static void

Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-19 Thread Philippe Vaucher
 fwiw this is the thread that added --depth=250

 http://thread.gmane.org/gmane.comp.gcc.devel/94565/focus=94626

This post is quite interesting:
http://article.gmane.org/gmane.comp.gcc.devel/94637

Philippe


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-19 Thread David Kastrup
Philippe Vaucher philippe.vauc...@gmail.com writes:

 fwiw this is the thread that added --depth=250

 http://thread.gmane.org/gmane.comp.gcc.devel/94565/focus=94626

 This post is quite interesting:
 http://article.gmane.org/gmane.comp.gcc.devel/94637

Yes.  Of course I am prejudiced because I volunteered fixing git-blame
on the Emacs developer list in order to make it more feasible to
transfer the Emacs repository to Git.

Calling git blame via C-x v g is a rather important part of the
workflow, and it's currently intolerable to work with on a number of
files.

While I'm fixing the basic shortcomings in builtin/blame.c itself, the
operation "fetch the objects" is necessary for all objects at least
once.  It's conceivable that some nice caching strategy would help with
avoiding the repeated traversal of long delta chain tails.  That could
also help defusing the operation of basic stuff like git-log.

But the short and long end of it is that there are valid operations
accessing a large amount of past history, and one point of having a
distributed version control system with non-shallow repository by
default is to have history and ways of working with it at one's hand.

And git's default modus of operation is _not_ to store things like
copies and moves and renames in commits, but to deduce them from
looking at the stored data.  So making it expensive to look at stored
data, including old data, means that Git does not work well in the way
it is designed to operate.

-- 
David Kastrup


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-19 Thread Duy Nguyen
On Wed, Feb 19, 2014 at 3:38 PM, Philippe Vaucher
philippe.vauc...@gmail.com wrote:
 fwiw this is the thread that added --depth=250

 http://thread.gmane.org/gmane.comp.gcc.devel/94565/focus=94626

 This post is quite interesting:
 http://article.gmane.org/gmane.comp.gcc.devel/94637

Especially this part

-- 8< --
And quite frankly, a delta depth
of 250 is likely going to cause overflows in the delta cache (which is
only 256 entries in size *and* it's a hash, so it's going to start having
hash conflicts long before hitting the 250 depth limit).
-- 8< --

So in order to get file A's content, we go through its 250 level chain
(and fill the cache), then we get to file B and do the same, which
evicts nearly everything from A. By the time we go to the next commit,
we have to go through 250 levels for A again because the cache is
pretty much useless.

I can think of two improvements we could make: either increase the
cache size dynamically (within limits) or make it configurable. If we
have N entries in the worktree (both trees and blobs) and depth M, then
we might need to cache N*M objects for it to be effective. Christian,
if you want to experiment with this, update MAX_DELTA_CACHE in
sha1_file.c and rebuild.

The other is smarter eviction: instead of throwing all of A's cached
items out (based on recency order), keep the last few items of A and
evict B's oldest cached items. Hopefully by the next commit we can
still reuse some cache for A and other files/trees. The delta cache
needs to learn about grouping to achieve this.
-- 
Duy


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-19 Thread Duy Nguyen
On Wed, Feb 19, 2014 at 4:01 PM, David Kastrup d...@gnu.org wrote:
 Calling git blame via C-x v g is a rather important part of the
 workflow, and it's currently intolerable to work with on a number of
 files.

 While I'm fixing the basic shortcomings in builtin/blame.c itself,
 fetching the objects is an operation that is necessary for all objects
 at least once.  It's conceivable that some nice caching strategy would
 help with avoiding the repeated traversal of long delta chain tails.
 That could also help speed up basic operations like git-log.

Pack v4 is supposed to tackle this delta chain thing, but its future
is a bit uncertain (you can give a hand btw). If you often do git
blame, you might consider unpack most accessed objects (make it part
of blame process), which would function exactly like a cache with no
extra code. The downside is git-gc --auto is more likely to kick in
because of too many loose objects and pack everything up again.
-- 
Duy


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-19 Thread Junio C Hamano
Philippe Vaucher philippe.vauc...@gmail.com writes:

 fwiw this is the thread that added --depth=250

 http://thread.gmane.org/gmane.comp.gcc.devel/94565/focus=94626

 This post is quite interesting:
 http://article.gmane.org/gmane.comp.gcc.devel/94637

Yes, it most clearly says that --depth=250 was *not* a
recommendation, with technical background to explain why such a long
delta chain is a bad idea.


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-19 Thread Christian Jaeger
2014-02-19 10:14 GMT+00:00 Duy Nguyen pclo...@gmail.com:
 Christian, if you
 want to experiment this, update MAX_DELTA_CACHE in sha1_file.c and
 rebuild.

I don't have the time right now. (Perhaps next week?)


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-18 Thread David Kastrup
Christian Jaeger chr...@gmail.com writes:

 I've got a repository where git log --raw > _somefile took a few
 seconds in the past, but after an attempt at merging some commits that
 were collected in a clone of the same repo that was created about a
 year ago, I noticed that this command was now taking 3 minutes 7
 seconds. git gc, git fsck, git clone file:///the/repo/.git also
 now each took between ~4-10 minutes, also git log --raw somefile got
 equally unusably slow. With the help of the people on the IRC, I
 tracked it down to my recent use of git gc --aggressive in this
 repo. Running git repack -a -d -f solved it, now it's again taking
 4-5 seconds. After running git gc --aggressive again for
 confirmation, git log --raw > _somefile was again slowed down,
 although now 'only' to 1 minute 34 seconds;

[...]

 I've now learned to avoid git gc --aggressive. Perhaps there are
 some other conclusions to be drawn, I don't know.

I've seen the same with my ongoing work on git-blame with the current
Emacs Git mirror.  Aggressive packing reduces the repository size to
about a quarter, but it blows up the system time (mainly I/O)
significantly, quite reducing the total benefits of my algorithmic
improvements there.

There is also some quite visible additional time spent in zlib, so a
wild guess would be that zlib is not really suited to the massive
number of directory entries in a Git object store.  Since the system
time still dominates, this guess would only make sense if Git via zlib
kept rereading the directory section of whatever compressed file we are
talking about.  But that's really a rather handwavy wild guess without
anything better than a hunch to back it up.  I don't even know what kind
of compression and/or packs are used: I've only ever messed with the
delta coding of the normal unpacked operation (there are a few older
commits from me on that).

-- 
David Kastrup


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-18 Thread Duy Nguyen
On Tue, Feb 18, 2014 at 3:55 PM, David Kastrup d...@gnu.org wrote:
 Christian Jaeger chr...@gmail.com writes:

 I've got a repository where git log --raw > _somefile took a few
 seconds in the past, but after an attempt at merging some commits that
 were collected in a clone of the same repo that was created about a
 year ago, I noticed that this command was now taking 3 minutes 7
 seconds. git gc, git fsck, git clone file:///the/repo/.git also
 now each took between ~4-10 minutes, also git log --raw somefile got
 equally unusably slow. With the help of the people on the IRC, I
 tracked it down to my recent use of git gc --aggressive in this
 repo. Running git repack -a -d -f solved it, now it's again taking
 4-5 seconds. After running git gc --aggressive again for
 confirmation, git log --raw > _somefile was again slowed down,
 although now 'only' to 1 minute 34 seconds;

 [...]

 I've now learned to avoid git gc --aggressive. Perhaps there are
 some other conclusions to be drawn, I don't know.

 I've seen the same with my ongoing work on git-blame with the current
 Emacs Git mirror.  Aggressive packing reduces the repository size to
 about a quarter, but it blows up the system time (mainly I/O)
 significantly, quite reducing the total benefits of my algorithmic
 improvements there.

Likely because --aggressive passes --depth=250 to pack-objects. Long
delta chains could reduce pack size and increase I/O as well as zlib
processing significantly. Christian can try git repack -adf, which is
really close to --aggressive (except it uses default --depth=50) and
see if it makes any difference.

 There is also some quite visible additional time spent in zlib, so a
 wild guess would be that zlib is not really suited to the massive amount
 of directory entries of a Git object store.  Since the system time still
 dominates, this guess would only make sense if Git over zlib kept
 rereading the directory section of whatever compressed file we are
 talking about.  But that's really a rather handwavy wild guess without
 anything better than a hunch to back it up.  I don't even know what kind
 of compression and/or packs are used: I've only ever messed myself with
 the delta coding of the normal unpacked operation (there are a few
 older commits from me on that).

 --
 David Kastrup



-- 
Duy


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-18 Thread David Kastrup
Duy Nguyen pclo...@gmail.com writes:

 On Tue, Feb 18, 2014 at 3:55 PM, David Kastrup d...@gnu.org wrote:

 I've seen the same with my ongoing work on git-blame with the current
 Emacs Git mirror.  Aggressive packing reduces the repository size to
 about a quarter, but it blows up the system time (mainly I/O)
 significantly, quite reducing the total benefits of my algorithmic
 improvements there.

 Likely because --aggressive passes --depth=250 to pack-objects. Long
 delta chains could reduce pack size and increase I/O as well as zlib
 processing significantly.

Increased zlib processing time is one thing, but if it _increases_ I/O,
then it would seem there is a serious impedance mismatch between the
compression scheme and the code relying on it, leading to repeated reads
of blocks only needed for reconstructing dynamic compression
dictionaries.

Compression should reduce rather than increase the total amount of
reads.  So it would seem that better caching, and/or smaller
independent block sizes, and/or strategies for sorting the delta chain
so that its resolution requires mostly linear reads would help, taking
care to do this in a manner that does not reinitialize the
decompression for accessing each delta that happens to be more or less
in sequence.

Of course, this is assuming that the additional time is spent
uncompressing data rather than navigating directories.

It's actually conceivable that there is quite a bit of potential to get
better performance from unchanged readers by packing stuff in a
different order while still using the same delta chain depth.

-- 
David Kastrup


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-18 Thread Jonathan Nieder
David Kastrup wrote:
 Duy Nguyen pclo...@gmail.com writes:

 Likely because --aggressive passes --depth=250 to pack-objects. Long
 delta chains could reduce pack size and increase I/O as well as zlib
 processing significantly.
[...]
 Compression should reduce rather than increase the total amount of
 reads.

--depth=250 means to allow chains of "to get this object, first
inflate this object, then apply this delta" of length 250.
That's absurdly long, and doesn't even help compression much in
practice (many short chains referring to the same objects tends to
work fine).  We probably shouldn't make --aggressive do that.
Something like --depth=10 would make more sense.
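
As a back-of-the-envelope illustration (a toy cost model, not a
measurement of git itself): if materializing an object at chain depth d
from a cold cache costs one base inflation plus d delta applications,
the average cost per object grows linearly with the chain length.

```python
def cold_cost(depth):
    # one base inflation plus `depth` delta applications
    return depth + 1

def avg_cold_cost(chain_len):
    # average over objects sitting at depths 0..chain_len of the chain
    costs = [cold_cost(d) for d in range(chain_len + 1)]
    return sum(costs) / len(costs)

print(avg_cold_cost(250))  # 126.0 operations per object
print(avg_cold_cost(10))   # 6.0
```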

Hoping that clarifies,
Jonathan


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-18 Thread Christian Jaeger
2014-02-18 9:45 GMT+00:00 Duy Nguyen pclo...@gmail.com:
 Christian can try git repack -adf

That's what I already mentioned in my first mail is what I used to fix
the problem.

Here are some 'hard' numbers, FWIW:

- both ~/scr and swap are on the same SSD;

$ free
             total       used       free     shared    buffers     cached
Mem:       3996748    3800828     195920          0     377176    1078848
-/+ buffers/cache:    2344804    1651944
Swap:      2097148     169760    1927388

git only used up to about 100 MB of VIRT or RSS when I checked, there
was an ulimit of -S -v 120.

- this is git version 1.7.10.4 (1:1.7.10.4-1+wheezy1 i386 Debian)

- after my attempted merge (which had conflicts and I had then
cancelled by way of git reset --hard), and then a git gc, the times
were:

~/scr$ time git log --raw > _THELOG

real 3m7.002s
user 2m0.252s
sys 1m6.008s

- on a copy:

/dev/shm/scr$ time git repack -a -d -f
Counting objects: 34917, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (27038/27038), done.
Writing objects: 100% (34917/34917), done.
Total 34917 (delta 13928), reused 0 (delta 0)

real 4m33.193s
user 3m42.950s
sys 1m13.821s

/dev/shm/scr$ time git log --raw > _THELOG2

real 0m8.276s
user 0m7.192s
sys 0m1.052s

(not sure why it took 8s here, perhaps I had another process running
at the same time? Compare with the 0m4.913s below.)

/dev/shm/scr$ time g-gc --aggressive
Counting objects: 36066, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (27812/27812), done.
Writing objects: 100% (36066/36066), done.
Total 36066 (delta 14367), reused 21699 (delta 0)
Checking connectivity: 36066, done.

real 5m52.013s
user 8m28.652s
sys 1m4.308s

/dev/shm/scr$ time git log --raw > _THELOG2

real 1m34.430s
user 0m47.291s
sys 0m46.615s

/dev/shm/scr$ time git repack -adf
Counting objects: 36066, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (27812/27812), done.
Writing objects: 100% (36066/36066), done.
Total 36066 (delta 14256), reused 21699 (delta 0)

real 2m32.083s
user 1m51.295s
sys 1m4.940s

/dev/shm/scr$ time git log --raw > _THELOG3

real 0m4.913s
user 0m3.944s
sys 0m0.944s

/dev/shm/scr$ du -s .git
43728 .git

- back in the original place:

~/scr$ time git repack -a -d -f
Counting objects: 36066, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (27812/27812), done.
Writing objects: 100% (36066/36066), done.
Total 36066 (delta 14257), reused 21700 (delta 0)

real 4m6.503s
user 3m16.568s
sys 1m11.640s

~/scr$ time git log --raw > _THELOG2

real 0m5.002s
user 0m4.032s
sys 0m0.952s


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-18 Thread Junio C Hamano
Jonathan Nieder jrnie...@gmail.com writes:

 David Kastrup wrote:
 Duy Nguyen pclo...@gmail.com writes:

 Likely because --aggressive passes --depth=250 to pack-objects. Long
 delta chains could reduce pack size and increase I/O as well as zlib
 processing significantly.
 [...]
 Compression should reduce rather than increase the total amount of
 reads.

 --depth=250 means to allow chains of To get this object, first
 inflate this object, then apply this delta of length 250.

 That's absurdly long, and doesn't even help compression much in
 practice (many short chains referring to the same objects tends to
 work fine).  We probably shouldn't make --aggressive do that.
 Something like --depth=10 would make more sense.

Yes, my thinking indeed.

I didn't know --aggressive was so aggressive myself, as I personally
never use it. git repack -a -d -f --depth=32 --window=4000 is what I
often use, but I suspect most people would not be patient enough for
that 4k window.

Let's do something like this first and then later make --depth
configurable just like --window, perhaps?  For aggressive, I think
the default window (250 unless configured otherwise) is a bit too
narrow.

 builtin/gc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 6be6c8d..0d010f0 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -204,7 +204,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	if (aggressive) {
 		argv_array_push(&repack, "-f");
-		argv_array_push(&repack, "--depth=250");
+		argv_array_push(&repack, "--depth=20");
 		if (aggressive_window > 0)
 			argv_array_pushf(&repack, "--window=%d",
					 aggressive_window);
 	}


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-18 Thread Duy Nguyen
On Wed, Feb 19, 2014 at 3:59 AM, Junio C Hamano gits...@pobox.com wrote:
 Let's do something like this first and then later make --depth
 configurable just like --width, perhaps?  For aggressive, I think
 the default width (hardcoded to 250 but configurable) is a bit too
 narrow.

  builtin/gc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

 diff --git a/builtin/gc.c b/builtin/gc.c
 index 6be6c8d..0d010f0 100644
 --- a/builtin/gc.c
 +++ b/builtin/gc.c
 @@ -204,7 +204,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	if (aggressive) {
 		argv_array_push(&repack, "-f");
 -		argv_array_push(&repack, "--depth=250");
 +		argv_array_push(&repack, "--depth=20");
 		if (aggressive_window > 0)
 			argv_array_pushf(&repack, "--window=%d",
 					 aggressive_window);
 	}

Lower depth than the default (50) does not sound aggressive to me, at
least in terms of disk space utilization. I agree it should be
configurable though.
-- 
Duy


Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-18 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 Lower depth than default (50) does not sound aggressive to me, at
 least from disk space utilization. I agree it should be configurable
 though.

Do you mean you want to keep --aggressive to mean too aggressive
in resulting size, to the point that it is not useful to anybody?

Shallow and wide will give us, with a large window, the most
aggressively efficient packfiles that are useful, and we would
rather want to fix it to be usable, I would think.



Re: git gc --aggressive led to about 40 times slower git log --raw

2014-02-18 Thread Duy Nguyen
On Wed, Feb 19, 2014 at 7:10 AM, Junio C Hamano gits...@pobox.com wrote:
 Duy Nguyen pclo...@gmail.com writes:

 Lower depth than default (50) does not sound aggressive to me, at
 least from disk space utilization. I agree it should be configurable
 though.

 Do you mean you want to keep --aggressive to mean too aggressive
 in resulting size, to the point that it is not useful to anybody?

git-gc.txt is pretty vague about this --aggressive. I assume we would
want both better disk utilization and performance. But if it produces
a tiny pack that takes forever to access, then it's definitely bad
aggression.

 Shallow and wide will give us, with a large window, the most
 aggressively efficient packfiles that are useful, and we would
 rather want to fix it to be usable, I would think.

fwiw this is the thread that added --depth=250

http://thread.gmane.org/gmane.comp.gcc.devel/94565/focus=94626

yes, if reducing depth leads to better performance and does not use
much disk in the general case, then of course we should do it. The
general case may be hard to define though. It'd be best if we had some
sort of heuristic to try out different combinations on a specific repo
and return the best combination of parameters. It could even take a
long time, but once we have good parameters, they should remain good
for a long time, I think.
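
A minimal sketch of such a parameter sweep, assuming a throwaway clone
to repack repeatedly (the helper names repack_cmd/try_combo and the
grid values are illustrative, not an existing tool or recommended
settings):

```python
import itertools
import os
import subprocess
import time

# illustrative parameter grid -- not recommendations
WINDOWS = [10, 250, 4000]
DEPTHS = [10, 50, 250]

def repack_cmd(window, depth):
    # command line for one (window, depth) combination
    return ["git", "repack", "-a", "-d", "-f",
            "--window=%d" % window, "--depth=%d" % depth]

def try_combo(repo, window, depth):
    """Repack a scratch clone with one combination and report
    (total .pack size in bytes, wall time of `git log --raw`)."""
    subprocess.run(repack_cmd(window, depth), cwd=repo, check=True)
    pack_dir = os.path.join(repo, ".git", "objects", "pack")
    size = sum(os.path.getsize(os.path.join(pack_dir, name))
               for name in os.listdir(pack_dir) if name.endswith(".pack"))
    t0 = time.perf_counter()
    subprocess.run(["git", "log", "--raw"], cwd=repo,
                   stdout=subprocess.DEVNULL, check=True)
    return size, time.perf_counter() - t0

# print the commands the sweep would run; try_combo("/path/to/scratch", w, d)
# would then measure each combination against a throwaway clone
for w, d in itertools.product(WINDOWS, DEPTHS):
    print(" ".join(repack_cmd(w, d)))
```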
-- 
Duy


git gc --aggressive led to about 40 times slower git log --raw

2014-02-17 Thread Christian Jaeger
Hi

I've got a repository where git log --raw > _somefile took a few
seconds in the past, but after an attempt at merging some commits that
were collected in a clone of the same repo that was created about a
year ago, I noticed that this command was now taking 3 minutes 7
seconds. git gc, git fsck, git clone file:///the/repo/.git also
now each took between ~4-10 minutes, also git log --raw somefile got
equally unusably slow. With the help of the people on the IRC, I
tracked it down to my recent use of git gc --aggressive in this
repo. Running git repack -a -d -f solved it, now it's again taking
4-5 seconds. After running git gc --aggressive again for
confirmation, git log --raw > _somefile was again slowed down,
although now 'only' to 1 minute 34 seconds; did perhaps my git remote
add -f other-repo, which I remember was also running rather slowly,
exacerbate the problem (to the >3 minutes I was seeing)?

The repo has about 6000 commits, about 12'000 files in the current
HEAD, and about 43 MB packed .git contents. The files are (almost) all
plain text, about half of them are about 42 bytes long, the rest up to
about 2 MB although most of them are just around 5-50 KB. Most files
mostly grow at the end. The biggest files (500KB-2MB) are quite
long-lived and don't stop growing, again mostly at the end. Also,
about 2*5K files sit in just two directories (roughly 5K files each),
meaning that the tree objects representing those two directories are
big but change only in a few places.

I've now learned to avoid git gc --aggressive. Perhaps there are
some other conclusions to be drawn, I don't know.

Christian.