Re: Regression in memory consumption of git fsck

2018-02-14 Thread Jeff King
On Wed, Feb 14, 2018 at 01:48:28PM +0100, SZEDER Gábor wrote:

> > If I revert that commit (on top of current master) the memory
> > consumption goes down to 2GB again. The change looks relatively harmless
> > to me, so does anyone know what's going on here?
> 
> I could reproduce the increased memory usage even for much smaller
> repositories.  The patch below seems to fix it for me.
> 
> 
>  -- >8 --
> 
> Subject: [PATCH] fsck: plug tree buffer leak

I think this is fixed already in ba3a08ca0e (fsck: fix leak when
traversing trees, 2018-01-20), which is in 'next' (and marked for "will
merge to master in yesterday's "what's cooking").

-Peff


Re: Regression in memory consumption of git fsck

2018-02-14 Thread SZEDER Gábor
> I've noticed that recent versions of git consume a lot of memory during
> "git fsck", to the point where I've regularly had git fall victim to
> Linux's OOM killer.
> 
> For example, if I clone torvalds/linux.git, and then run "git fsck
> --connectivity-only" in the newly cloned repository, git will consume
> more than 6GB of physical memory, while older versions peak at about 2GB.
> 
> I've managed to bisect this down to this commit in v2.14:
> 
> ad2db4030e42890e569de529e3cd61a8d03de497
> fsck: remove redundant parse_tree() invocation
> 
> If I revert that commit (on top of current master) the memory
> consumption goes down to 2GB again. The change looks relatively harmless
> to me, so does anyone know what's going on here?

I could reproduce the increased memory usage even for much smaller
repositories.  The patch below seems to fix it for me.


 -- >8 --

Subject: [PATCH] fsck: plug tree buffer leak

Commit ad2db4030e (fsck: remove redundant parse_tree() invocation,
2017-07-18), along with that redundant call to parse_tree() in
traverse_one_object(), also removed a call to free_tree_buffer() from
that function.  This resulted in significantly increased memory usage
of 'git fsck' because of all the non-freed tree buffers; in case of
git.git and '--connectivity-only' it went from around 270MB to over
1.2GB.

Restore that free_tree_buffer() call to bring down memory usage to the
previous level.

Reported-by: Dominic Sacré 
Signed-off-by: SZEDER Gábor 
---

 builtin/fsck.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 7a8a679d4f..8bc1b59daf 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -180,7 +180,10 @@ static void mark_object_reachable(struct object *obj)
 
 static int traverse_one_object(struct object *obj)
 {
-   return fsck_walk(obj, obj, &fsck_walk_options);
+   int result = fsck_walk(obj, obj, &fsck_walk_options);
+   if (obj->type == OBJ_TREE)
+   free_tree_buffer((struct tree *)obj);
+   return result;
 }
 
 static int traverse_reachable(void)
-- 
2.16.1.347.gd41f2872c6



Regression in memory consumption of git fsck

2018-02-14 Thread Dominic Sacré
Hi,

I've noticed that recent versions of git consume a lot of memory during
"git fsck", to the point where I've regularly had git fall victim to
Linux's OOM killer.

For example, if I clone torvalds/linux.git, and then run "git fsck
--connectivity-only" in the newly cloned repository, git will consume
more than 6GB of physical memory, while older versions peak at about 2GB.

I've managed to bisect this down to this commit in v2.14:

ad2db4030e42890e569de529e3cd61a8d03de497
fsck: remove redundant parse_tree() invocation

If I revert that commit (on top of current master) the memory
consumption goes down to 2GB again. The change looks relatively harmless
to me, so does anyone know what's going on here?


Thanks,

Dominic