Re: [PATCH] pack-objects: no crc check when the cached version is used
On Sat, 14 Sep 2013, Duy Nguyen wrote: > On Sat, Sep 14, 2013 at 4:26 AM, Thomas Rast wrote: > > I tried the perf script below, but at least for the git repo the only > > thing I can see is noise. > > --stdout does not set do_check_packed_object_crc, you need to run > pack-objects without --stdout (i.e. the real use case is repack) And for those who might wonder why, you can have a look at the description for commit 0e8189e2708b. This was probably one of my best commit logs ever. ;-) This commit also provides a hint about the cost of over-checking the CRC. Nicolas -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pack-objects: no crc check when the cached version is used
On Sat, Sep 14, 2013 at 4:26 AM, Thomas Rast wrote: > Junio C Hamano writes: > >> Nguyễn Thái Ngọc Duy writes: >> >>> Current code makes pack-objects always do check_pack_crc() in >>> unpack_entry() even if right after that we find out there's a cached >>> version and pack access is not needed. Swap two code blocks, search >>> for cached version first, then check crc. > [...] >> >> Interesting. >> >> This is only triggered inside pack-objects, which would read a lot >> of data from existing packs, and the overhead for looking up the >> entry from the revindex, faulting in the actual packdata, and >> computing and comparing the crc would not be trivial, especially as >> the cost is incurred over many objects we need to untangle in the >> delta chain. If you have interesting numbers to show how much this >> improves the performance, I am curious to see it. No I don't have any timing numbers. I just updated the code to see how many times crc is checked and how many times we find a cached version after crc is checked. The numbers with git.git are 353535 and 113257 respectively. IOW we could reduce the number of crc checks by 30%. > > I can't see anything wrong with the patch, but then I haven't stared too > hard. (It seems that my conversion around abe601b (sha1_file: remove > recursion in unpack_entry, 2013-03-27) was faithful on this point, the > problem has existed for longer than that.) > > I tried the perf script below, but at least for the git repo the only > thing I can see is noise. --stdout does not set do_check_packed_object_crc, you need to run pack-objects without --stdout (i.e. the real use case is repack) > > --- 8< --- t/perf/p5300-pack-object.sh --- 8< --- > #!/bin/sh > > test_description="Tests object packing performance" > > . ./perf-lib.sh > > test_perf_default_repo > > test_perf 'pack-objects on commits in HEAD' ' > git rev-list HEAD | > git pack-objects --stdout >/dev/null > ' > > test_perf 'pack-objects on all of HEAD' ' > git rev-list --objects HEAD | > git pack-objects --stdout >/dev/null > ' > > test_done -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pack-objects: no crc check when the cached version is used
Junio C Hamano writes: > Nguyễn Thái Ngọc Duy writes: > >> Current code makes pack-objects always do check_pack_crc() in >> unpack_entry() even if right after that we find out there's a cached >> version and pack access is not needed. Swap two code blocks, search >> for cached version first, then check crc. [...] > > Interesting. > > This is only triggered inside pack-objects, which would read a lot > of data from existing packs, and the overhead for looking up the > entry from the revindex, faulting in the actual packdata, and > computing and comparing the crc would not be trivial, especially as > the cost is incurred over many objects we need to untangle in the > delta chain. If you have interesting numbers to show how much this > improves the performance, I am curious to see it. I can't see anything wrong with the patch, but then I haven't stared too hard. (It seems that my conversion around abe601b (sha1_file: remove recursion in unpack_entry, 2013-03-27) was faithful on this point, the problem has existed for longer than that.) I tried the perf script below, but at least for the git repo the only thing I can see is noise. --- 8< --- t/perf/p5300-pack-object.sh --- 8< --- #!/bin/sh test_description="Tests object packing performance" . ./perf-lib.sh test_perf_default_repo test_perf 'pack-objects on commits in HEAD' ' git rev-list HEAD | git pack-objects --stdout >/dev/null ' test_perf 'pack-objects on all of HEAD' ' git rev-list --objects HEAD | git pack-objects --stdout >/dev/null ' test_done -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pack-objects: no crc check when the cached version is used
Nguyễn Thái Ngọc Duy writes: > Current code makes pack-objects always do check_pack_crc() in > unpack_entry() even if right after that we find out there's a cached > version and pack access is not needed. Swap two code blocks, search > for cached version first, then check crc. > > Signed-off-by: Nguyễn Thái Ngọc Duy > --- Interesting. This is only triggered inside pack-objects, which would read a lot of data from existing packs, and the overhead for looking up the entry from the revindex, faulting in the actual packdata, and computing and comparing the crc would not be trivial, especially as the cost is incurred over many objects we need to untangle in the delta chain. If you have interesting numbers to show how much this improves the performance, I am curious to see it. Good spotting ;-) > sha1_file.c | 20 ++-- > 1 file changed, 10 insertions(+), 10 deletions(-) > > diff --git a/sha1_file.c b/sha1_file.c > index 8c2d1ed..4955724 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -2126,6 +2126,16 @@ void *unpack_entry(struct packed_git *p, off_t > obj_offset, > int i; > struct delta_base_cache_entry *ent; > > + ent = get_delta_base_cache_entry(p, curpos); > + if (eq_delta_base_cache_entry(ent, p, curpos)) { > + type = ent->type; > + data = ent->data; > + size = ent->size; > + clear_delta_base_cache_entry(ent); > + base_from_cache = 1; > + break; > + } > + > if (do_check_packed_object_crc && p->index_version > 1) { > struct revindex_entry *revidx = find_pack_revindex(p, > obj_offset); > unsigned long len = revidx[1].offset - obj_offset; > @@ -2140,16 +2150,6 @@ void *unpack_entry(struct packed_git *p, off_t > obj_offset, > } > } > > - ent = get_delta_base_cache_entry(p, curpos); > - if (eq_delta_base_cache_entry(ent, p, curpos)) { > - type = ent->type; > - data = ent->data; > - size = ent->size; > - clear_delta_base_cache_entry(ent); > - base_from_cache = 1; > - break; > - } > - > type = unpack_object_header(p, &w_curs, &curpos, &size); > if (type != OBJ_OFS_DELTA && type != OBJ_REF_DELTA) > break; -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html