Re: [PATCH v5 00/11] nd/pack-objects-pack-struct updates

2018-03-17 Thread Ævar Arnfjörð Bjarmason

On Sat, Mar 17 2018, Ævar Arnfjörð Bjarmason jotted:

> [...]I.e. on git.git we end up with just over a a 8.5% reduction, and[...]

Urgh, sorry, this should say "on linux.git...". None of these numbers
came from testing git.git.


Re: [PATCH v5 00/11] nd/pack-objects-pack-struct updates

2018-03-17 Thread Ævar Arnfjörð Bjarmason

On Sat, Mar 17 2018, Nguyễn Thái Ngọc Duy jotted:

> v5 changes are small enough that the interdiff is pretty self
> explanatory (there's also a couple commit msg updates).

I've been testing this and it's definitely an improvement. I think it
would be good to get some mention in the commit messages themselves of
the incremental improvement, to that end I wrote this:

$ cat /tmp/howmuch-mem.sh
#!/bin/sh
cd /tmp &&
(
for i in {1..3}
do
/usr/bin/time -f MaxRSS:%M ~/g/git/git --git-dir=/tmp/linux.git 
--exec-path=/home/avar/g/git repack -A -d 2>&1
done | grep MaxRSS: | sort -n | head -n 1 | tr '\n' '\t' &&
git git-commit-summary &&
echo
) | tee -a /tmp/git-memory.log

I.e. we repack linux.git (I'd already repacked it once) and do three
runs, and take the lowest RSS size. This yields (I rebased your series
on top of g...@github.com:git/git.git master and pushed it to
g...@github.com:avar/git.git pack-objects-reduce-memory-footprint), via:

git rebase --exec='make -j8 CFLAGS="-O3" && /tmp/howmuch-mem.sh' -i

That gave me, kb on the first column:

MaxRSS:3746648  f23a196dd9 ("pack-objects: a bit of document about struct 
object_entry", 2018-03-01)
MaxRSS:3700696  953b6473d7 ("pack-objects: turn type and in_pack_type to 
bitfields", 2018-03-01)
MaxRSS:3700404  6cbe573539 ("pack-objects: use bitfield for 
object_entry::dfs_state", 2018-03-01)
MaxRSS:3654044  0b93ebcae9 ("pack-objects: use bitfield for 
object_entry::depth", 2018-03-01)
MaxRSS:3654040  67a4d48773 ("pack-objects: move in_pack_pos out of struct 
object_entry", 2018-03-01) [X]
MaxRSS:3654104  e77319c65a ("pack-objects: move in_pack out of struct 
object_entry", 2018-03-01) [X]
MaxRSS:3608096  a72cfcfea3 ("pack-objects: refer to delta objects by index 
instead of pointer", 2018-03-01)
MaxRSS:3562212  76eaa779eb ("pack-objects: shrink z_delta_size field in 
struct object_entry", 2018-03-05)
MaxRSS:3515164  42e28dd4b3 ("pack-objects: shrink size field in struct 
object_entry", 2018-03-05)
MaxRSS:3469440  26eba3ded4 ("pack-objects: shrink delta_size field in 
struct object_entry", 2018-03-05)
MaxRSS:3423704  c6493de964 ("pack-objects.h: reorder members to shrink 
struct object_entry", 2018-03-12)

I.e. on git.git we end up with just over a a 8.5% reduction, and
interestingly have a slight increase over a past commit in one change,
and one that just makes 4kb of difference (marked via [X] above).

Also, your v0 says it overall saves 260MB of memory. According to this
it's 320MB. You did note some reductions in subsequent patches, but it's
worth calling that out explicitly.

I have a bigger in-house repo that looks like this with this change:

MaxRSS:4753120  f23a196dd9 ("pack-objects: a bit of document about struct 
object_entry", 2018-03-01)
MaxRSS:4699084  953b6473d7 ("pack-objects: turn type and in_pack_type to 
bitfields", 2018-03-01)
MaxRSS:4699028  6cbe573539 ("pack-objects: use bitfield for 
object_entry::dfs_state", 2018-03-01)
MaxRSS:4645452  0b93ebcae9 ("pack-objects: use bitfield for 
object_entry::depth", 2018-03-01)
MaxRSS:4645288  67a4d48773 ("pack-objects: move in_pack_pos out of struct 
object_entry", 2018-03-01)
MaxRSS:4645548  e77319c65a ("pack-objects: move in_pack out of struct 
object_entry", 2018-03-01)
MaxRSS:4591484  a72cfcfea3 ("pack-objects: refer to delta objects by index 
instead of pointer", 2018-03-01)
MaxRSS:4537980  76eaa779eb ("pack-objects: shrink z_delta_size field in 
struct object_entry", 2018-03-05)
MaxRSS:4484148  42e28dd4b3 ("pack-objects: shrink size field in struct 
object_entry", 2018-03-05)
MaxRSS:4430404  26eba3ded4 ("pack-objects: shrink delta_size field in 
struct object_entry", 2018-03-05)
MaxRSS:4376148  c6493de964 ("pack-objects.h: reorder members to shrink 
struct object_entry", 2018-03-12)

I.e. a tad more than a 7.9% reduction in memory use.

This series also doesn't make a difference to the total runtime (which
is good, just wanted to make sure). On linux.git on my box best out of
three is 1:15.74 before and 1:14.93 after, which is within the margin of
random error.


[PATCH v5 00/11] nd/pack-objects-pack-struct updates

2018-03-17 Thread Nguyễn Thái Ngọc Duy
v5 changes are small enough that the interdiff is pretty self
explanatory (there's also a couple commit msg updates).

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c388d87c3e..fb2aba80bf 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1611,7 +1611,7 @@ static void drop_reused_delta(struct object_entry *entry)
/*
 * We failed to get the info from this pack for some reason;
 * fall back to sha1_object_info, which may find another copy.
-* And if that fails, the error will be recorded in entry->type
+* And if that fails, the error will be recorded in 
oe_type(entry)
 * and dealt with in prepare_pack().
 */
oe_set_type(entry, sha1_object_info(entry->idx.oid.hash,
@@ -1968,7 +1968,7 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
delta_buf = create_delta(src->index, trg->data, trg_size, _size, 
max_size);
if (!delta_buf)
return 0;
-   if (delta_size >= maximum_unsigned_value_of_type(uint32_t))
+   if (delta_size >= (1 << OE_DELTA_SIZE_BITS))
return 0;
 
if (DELTA(trg_entry)) {
@@ -2125,8 +2125,8 @@ static void find_deltas(struct object_entry **list, 
unsigned *list_size,
unsigned long size;
 
size = do_compress(>delta_data, 
DELTA_SIZE(entry));
-   entry->z_delta_size = size;
-   if (entry->z_delta_size == size) {
+   if (size < (1 << OE_Z_DELTA_BITS)) {
+   entry->z_delta_size = size;
cache_lock();
delta_cache_size -= DELTA_SIZE(entry);
delta_cache_size += entry->z_delta_size;
diff --git a/pack-objects.h b/pack-objects.h
index 0fa0c83294..8979289f5f 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -27,14 +27,15 @@ enum dfs_state {
  *
  * basic object info
  * -
- * idx.oid is filled up before delta searching starts. idx.crc32 and
- * is only valid after the object is written out and will be used for
+ * idx.oid is filled up before delta searching starts. idx.crc32 is
+ * only valid after the object is written out and will be used for
  * generating the index. idx.offset will be both gradually set and
  * used in writing phase (base objects get offset first, then deltas
  * refer to them)
  *
- * "size" is the uncompressed object size. Compressed size is not
- * cached (ie. raw data in a pack) but available via revindex.
+ * "size" is the uncompressed object size. Compressed size of the raw
+ * data for an object in a pack is not stored anywhere but is computed
+ * and made available when reverse .idx is made.
  *
  * "hash" contains a path name hash which is used for sorting the
  * delta list and also during delta searching. Once prepare_pack()
@@ -42,16 +43,16 @@ enum dfs_state {
  *
  * source pack info
  * 
- * The (in_pack, in_pack_offset, in_pack_header_size) tuple contains
- * the location of the object in the source pack, with or without
- * header.
+ * The (in_pack, in_pack_offset) tuple contains the location of the
+ * object in the source pack. in_pack_header_size allows quickly
+ * skipping the header and going straight to the zlib stream.
  *
  * "type" and "in_pack_type" both describe object type. in_pack_type
  * may contain a delta type, while type is always the canonical type.
  *
  * deltas
  * --
- * Delta links (delta, delta_child and delta_sibling) are created
+ * Delta links (delta, delta_child and delta_sibling) are created to
  * reflect that delta graph from the source pack then updated or added
  * during delta searching phase when we find better deltas.
  *
@@ -59,7 +60,7 @@ enum dfs_state {
  * compute_write_order(). "delta" and "delta_size" must remain valid
  * at object writing phase in case the delta is not cached.
  *
- * If a delta is cached in memory and is compressed delta_data points
+ * If a delta is cached in memory and is compressed, delta_data points
  * to the data and z_delta_size contains the compressed size. If it's
  * uncompressed [1], z_delta_size must be zero. delta_size is always
  * the uncompressed size and must be valid even if the delta is not
@@ -274,12 +275,19 @@ static inline unsigned long oe_size(const struct 
object_entry *e)
}
 }
 
+static inline int contains_in_32bits(unsigned long limit)
+{
+   uint32_t truncated_limit = (uint32_t)limit;
+
+   return limit == truncated_limit;
+}
+
 static inline int oe_size_less_than(const struct object_entry *e,
unsigned long limit)
 {
if (e->size_valid)
return e->size_ < limit;
-   if (limit > maximum_unsigned_value_of_type(uint32_t))
+   if (contains_in_32bits(limit))
return 1;
return