I recently came across a repository with a commit containing 100 million
paths in its tree. Cleverly, the whole repo fits into a 1.5K packfile
(can you guess how it was done?). Not so cleverly, running "diff-tree
--root" on that commit uses a large amount of memory. :)
I do not think it is worth optimizing for such a pathological
repository. But I was curious how much it would want (it OOM'd on my
64-bit 16G machine). The answer is roughly:
  100,000,000 * (
      8 bytes per pointer to diff_filepair in the diff_queue
    + 32 bytes per diff_filepair struct
    + 2 * (
        96 bytes per diff_filespec struct
      + 12 bytes per filename (in this case)
      )
    )
which is about 25G. Plus malloc overhead. So obviously this example is
unreasonable. A more reasonable large case is something like WebKit at
~150K files, doing a diff against the empty tree. That's only 37M.
But while looking at it, I noticed a bunch of cleanups for
diff_filespec. With the patches below, sizeof(struct diff_filespec) on
my 64-bit machine goes from 96 bytes down to 80. Compiling with "-m32"
goes from 64 bytes down to 52.
The first few patches have cleanup value aside from the struct size
improvement. The last two are pure optimization. I doubt the
optimization is noticeable for any real-life cases, so I don't mind if
they get dropped. But they're quite trivial and obvious.
[1/5]: diff_filespec: reorder dirty_submodule macro definitions
[2/5]: diff_filespec: drop funcname_pattern_ident field
[3/5]: diff_filespec: drop xfrm_flags field
[4/5]: diff_filespec: reorder is_binary field
[5/5]: diff_filespec: use only 2 bits for is_binary flag
-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html