Re: [PATCH] pack-objects: no crc check when the cached version is used

2013-09-13 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy pclo...@gmail.com writes:

 Current code makes pack-objects always do check_pack_crc() in
 unpack_entry() even if right after that we find out there's a cached
 version and pack access is not needed. Swap two code blocks, search
 for cached version first, then check crc.

 Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
 ---

Interesting.

This is only triggered inside pack-objects, which would read a lot
of data from existing packs, and the overhead for looking up the
entry from the revindex, faulting in the actual packdata, and
computing and comparing the crc would not be trivial, especially as
the cost is incurred over many objects we need to untangle in the
delta chain.  If you have interesting numbers to show how much this
improves the performance, I am curious to see it.

Good spotting ;-)

  sha1_file.c | 20 ++--
  1 file changed, 10 insertions(+), 10 deletions(-)

 diff --git a/sha1_file.c b/sha1_file.c
 index 8c2d1ed..4955724 100644
 --- a/sha1_file.c
 +++ b/sha1_file.c
 @@ -2126,6 +2126,16 @@ void *unpack_entry(struct packed_git *p, off_t 
 obj_offset,
   int i;
   struct delta_base_cache_entry *ent;
  
 + ent = get_delta_base_cache_entry(p, curpos);
 + if (eq_delta_base_cache_entry(ent, p, curpos)) {
 + type = ent-type;
 + data = ent-data;
 + size = ent-size;
 + clear_delta_base_cache_entry(ent);
 + base_from_cache = 1;
 + break;
 + }
 +
   if (do_check_packed_object_crc  p-index_version  1) {
   struct revindex_entry *revidx = find_pack_revindex(p, 
 obj_offset);
   unsigned long len = revidx[1].offset - obj_offset;
 @@ -2140,16 +2150,6 @@ void *unpack_entry(struct packed_git *p, off_t 
 obj_offset,
   }
   }
  
 - ent = get_delta_base_cache_entry(p, curpos);
 - if (eq_delta_base_cache_entry(ent, p, curpos)) {
 - type = ent-type;
 - data = ent-data;
 - size = ent-size;
 - clear_delta_base_cache_entry(ent);
 - base_from_cache = 1;
 - break;
 - }
 -
   type = unpack_object_header(p, w_curs, curpos, size);
   if (type != OBJ_OFS_DELTA  type != OBJ_REF_DELTA)
   break;
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pack-objects: no crc check when the cached version is used

2013-09-13 Thread Thomas Rast
Junio C Hamano gits...@pobox.com writes:

 Nguyễn Thái Ngọc Duy pclo...@gmail.com writes:

 Current code makes pack-objects always do check_pack_crc() in
 unpack_entry() even if right after that we find out there's a cached
 version and pack access is not needed. Swap two code blocks, search
 for cached version first, then check crc.
[...]

 Interesting.

 This is only triggered inside pack-objects, which would read a lot
 of data from existing packs, and the overhead for looking up the
 entry from the revindex, faulting in the actual packdata, and
 computing and comparing the crc would not be trivial, especially as
 the cost is incurred over many objects we need to untangle in the
 delta chain.  If you have interesting numbers to show how much this
 improves the performance, I am curious to see it.

I can't see anything wrong with the patch, but then I haven't stared too
hard.  (It seems that my conversion around abe601b (sha1_file: remove
recursion in unpack_entry, 2013-03-27) was faithful on this point, the
problem has existed for longer than that.)

I tried the perf script below, but at least for the git repo the only
thing I can see is noise.

--- 8 --- t/perf/p5300-pack-object.sh --- 8 ---
#!/bin/sh

test_description=Tests object packing performance

. ./perf-lib.sh

test_perf_default_repo

test_perf 'pack-objects on commits in HEAD' '
git rev-list HEAD |
git pack-objects --stdout /dev/null
'

test_perf 'pack-objects on all of HEAD' '
git rev-list --objects HEAD |
git pack-objects --stdout /dev/null
'

test_done
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pack-objects: no crc check when the cached version is used

2013-09-13 Thread Duy Nguyen
On Sat, Sep 14, 2013 at 4:26 AM, Thomas Rast tr...@inf.ethz.ch wrote:
 Junio C Hamano gits...@pobox.com writes:

 Nguyễn Thái Ngọc Duy pclo...@gmail.com writes:

 Current code makes pack-objects always do check_pack_crc() in
 unpack_entry() even if right after that we find out there's a cached
 version and pack access is not needed. Swap two code blocks, search
 for cached version first, then check crc.
 [...]

 Interesting.

 This is only triggered inside pack-objects, which would read a lot
 of data from existing packs, and the overhead for looking up the
 entry from the revindex, faulting in the actual packdata, and
 computing and comparing the crc would not be trivial, especially as
 the cost is incurred over many objects we need to untangle in the
 delta chain.  If you have interesting numbers to show how much this
 improves the performance, I am curious to see it.

No I don't have any timing numbers. I just updated the code to see how
many times crc is checked and how many times we find a cached version
after crc is checked. The numbers with git.git are 353535 and 113257
respectively. IOW we could reduce the number of crc checks by 30%.


 I can't see anything wrong with the patch, but then I haven't stared too
 hard.  (It seems that my conversion around abe601b (sha1_file: remove
 recursion in unpack_entry, 2013-03-27) was faithful on this point, the
 problem has existed for longer than that.)

 I tried the perf script below, but at least for the git repo the only
 thing I can see is noise.

--stdout does not set do_check_packed_object_crc, you need to run
pack-objects without --stdout (i.e. the real use case is repack)


 --- 8 --- t/perf/p5300-pack-object.sh --- 8 ---
 #!/bin/sh

 test_description=Tests object packing performance

 . ./perf-lib.sh

 test_perf_default_repo

 test_perf 'pack-objects on commits in HEAD' '
 git rev-list HEAD |
 git pack-objects --stdout /dev/null
 '

 test_perf 'pack-objects on all of HEAD' '
 git rev-list --objects HEAD |
 git pack-objects --stdout /dev/null
 '

 test_done



-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pack-objects: no crc check when the cached version is used

2013-09-13 Thread Nicolas Pitre
On Sat, 14 Sep 2013, Duy Nguyen wrote:

 On Sat, Sep 14, 2013 at 4:26 AM, Thomas Rast tr...@inf.ethz.ch wrote:
  I tried the perf script below, but at least for the git repo the only
  thing I can see is noise.
 
 --stdout does not set do_check_packed_object_crc, you need to run
 pack-objects without --stdout (i.e. the real use case is repack)

And for those who might wonder why, you can have a look at the 
description for commit 0e8189e2708b.  This was probably one of my best 
commit logs ever.  ;-)

This commit also provides a hint about the cost of over-checking the 
CRC.


Nicolas
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html