Re: [PATCH] pack-objects: no crc check when the cached version is used

2013-09-13 Thread Nicolas Pitre
On Sat, 14 Sep 2013, Duy Nguyen wrote:

> On Sat, Sep 14, 2013 at 4:26 AM, Thomas Rast  wrote:
> > I tried the perf script below, but at least for the git repo the only
> > thing I can see is noise.
> 
> --stdout does not set do_check_packed_object_crc, you need to run
> pack-objects without --stdout (i.e. the real use case is repack)

And for those who might wonder why, you can have a look at the 
description for commit 0e8189e2708b.  This was probably one of my best 
commit logs ever.  ;-)

This commit also provides a hint about the cost of over-checking the 
CRC.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pack-objects: no crc check when the cached version is used

2013-09-13 Thread Duy Nguyen
On Sat, Sep 14, 2013 at 4:26 AM, Thomas Rast  wrote:
> Junio C Hamano  writes:
>
>> Nguyễn Thái Ngọc Duy  writes:
>>
>>> Current code makes pack-objects always do check_pack_crc() in
>>> unpack_entry() even if right after that we find out there's a cached
>>> version and pack access is not needed. Swap two code blocks, search
>>> for cached version first, then check crc.
> [...]
>>
>> Interesting.
>>
>> This is only triggered inside pack-objects, which would read a lot
>> of data from existing packs, and the overhead for looking up the
>> entry from the revindex, faulting in the actual packdata, and
>> computing and comparing the crc would not be trivial, especially as
>> the cost is incurred over many objects we need to untangle in the
>> delta chain.  If you have interesting numbers to show how much this
>> improves the performance, I am curious to see it.

No I don't have any timing numbers. I just updated the code to see how
many times crc is checked and how many times we find a cached version
after crc is checked. The numbers with git.git are 353535 and 113257
respectively. IOW we could reduce the number of crc checks by 30%.

>
> I can't see anything wrong with the patch, but then I haven't stared too
> hard.  (It seems that my conversion around abe601b (sha1_file: remove
> recursion in unpack_entry, 2013-03-27) was faithful on this point, the
> problem has existed for longer than that.)
>
> I tried the perf script below, but at least for the git repo the only
> thing I can see is noise.

--stdout does not set do_check_packed_object_crc, you need to run
pack-objects without --stdout (i.e. the real use case is repack)

>
> --- 8< --- t/perf/p5300-pack-object.sh --- 8< ---
> #!/bin/sh
>
> test_description="Tests object packing performance"
>
> . ./perf-lib.sh
>
> test_perf_default_repo
>
> test_perf 'pack-objects on commits in HEAD' '
> git rev-list HEAD |
> git pack-objects --stdout >/dev/null
> '
>
> test_perf 'pack-objects on all of HEAD' '
> git rev-list --objects HEAD |
> git pack-objects --stdout >/dev/null
> '
>
> test_done



-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pack-objects: no crc check when the cached version is used

2013-09-13 Thread Thomas Rast
Junio C Hamano  writes:

> Nguyễn Thái Ngọc Duy  writes:
>
>> Current code makes pack-objects always do check_pack_crc() in
>> unpack_entry() even if right after that we find out there's a cached
>> version and pack access is not needed. Swap two code blocks, search
>> for cached version first, then check crc.
[...]
>
> Interesting.
>
> This is only triggered inside pack-objects, which would read a lot
> of data from existing packs, and the overhead for looking up the
> entry from the revindex, faulting in the actual packdata, and
> computing and comparing the crc would not be trivial, especially as
> the cost is incurred over many objects we need to untangle in the
> delta chain.  If you have interesting numbers to show how much this
> improves the performance, I am curious to see it.

I can't see anything wrong with the patch, but then I haven't stared too
hard.  (It seems that my conversion around abe601b (sha1_file: remove
recursion in unpack_entry, 2013-03-27) was faithful on this point, the
problem has existed for longer than that.)

I tried the perf script below, but at least for the git repo the only
thing I can see is noise.

--- 8< --- t/perf/p5300-pack-object.sh --- 8< ---
#!/bin/sh

test_description="Tests object packing performance"

. ./perf-lib.sh

test_perf_default_repo

test_perf 'pack-objects on commits in HEAD' '
git rev-list HEAD |
git pack-objects --stdout >/dev/null
'

test_perf 'pack-objects on all of HEAD' '
git rev-list --objects HEAD |
git pack-objects --stdout >/dev/null
'

test_done
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pack-objects: no crc check when the cached version is used

2013-09-13 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy  writes:

> Current code makes pack-objects always do check_pack_crc() in
> unpack_entry() even if right after that we find out there's a cached
> version and pack access is not needed. Swap two code blocks, search
> for cached version first, then check crc.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---

Interesting.

This is only triggered inside pack-objects, which would read a lot
of data from existing packs, and the overhead for looking up the
entry from the revindex, faulting in the actual packdata, and
computing and comparing the crc would not be trivial, especially as
the cost is incurred over many objects we need to untangle in the
delta chain.  If you have interesting numbers to show how much this
improves the performance, I am curious to see it.

Good spotting ;-)

>  sha1_file.c | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 8c2d1ed..4955724 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2126,6 +2126,16 @@ void *unpack_entry(struct packed_git *p, off_t 
> obj_offset,
>   int i;
>   struct delta_base_cache_entry *ent;
>  
> + ent = get_delta_base_cache_entry(p, curpos);
> + if (eq_delta_base_cache_entry(ent, p, curpos)) {
> + type = ent->type;
> + data = ent->data;
> + size = ent->size;
> + clear_delta_base_cache_entry(ent);
> + base_from_cache = 1;
> + break;
> + }
> +
>   if (do_check_packed_object_crc && p->index_version > 1) {
>   struct revindex_entry *revidx = find_pack_revindex(p, 
> obj_offset);
>   unsigned long len = revidx[1].offset - obj_offset;
> @@ -2140,16 +2150,6 @@ void *unpack_entry(struct packed_git *p, off_t 
> obj_offset,
>   }
>   }
>  
> - ent = get_delta_base_cache_entry(p, curpos);
> - if (eq_delta_base_cache_entry(ent, p, curpos)) {
> - type = ent->type;
> - data = ent->data;
> - size = ent->size;
> - clear_delta_base_cache_entry(ent);
> - base_from_cache = 1;
> - break;
> - }
> -
>   type = unpack_object_header(p, &w_curs, &curpos, &size);
>   if (type != OBJ_OFS_DELTA && type != OBJ_REF_DELTA)
>   break;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html