Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-09 Thread Jeff King
On Wed, Aug 09, 2017 at 05:20:05AM -0400, Jeff King wrote:

> > I still wonder if we want to retire that conditional invocation of
> > sha1_entry_pos(), though.
> 
> I think so. Digging in the list for it, almost every mention is either
> somebody asking "should we scrap this?" or somebody showing benchmarks
> in which it is slower than the normal lookup (and then somebody asking
> "should we scrap this" :) ).
> 
> I just re-ran a simple benchmark and it is indeed slower. I also came
> across the hashcmp open-code-versus-memcmp discussion, which shows that
> the memcmp in recent glibc is much faster. That has been around long
> enough that it's probably worth switching to.

So here are two patches (on top of René's since there is otherwise a
minor textual conflict).

  [1/2]: sha1_file: drop experimental GIT_USE_LOOKUP search
  [2/2]: hashcmp: use memcmp instead of open-coded loop

 cache.h                           |   9 +-
 sha1-lookup.c                     | 216 --
 sha1_file.c                       |  11 --
 t/t5308-pack-detect-duplicates.sh |  11 +-
 t/test-lib.sh                     |   1 -
 5 files changed, 2 insertions(+), 246 deletions(-)

-Peff
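
For readers who have not seen the hashcmp discussion mentioned above, the two
styles being compared look roughly like the following. This is only a sketch,
not the contents of patch [2/2]: HASH_LEN, hashcmp_loop() and hashcmp_memcmp()
are made-up names for illustration, and the 20-byte length is assumed from
SHA-1.

```c
#include <string.h>

#define HASH_LEN 20	/* assumed: raw SHA-1 size in bytes */

/* Open-coded variant: walk the bytes and stop at the first difference. */
static int hashcmp_loop(const unsigned char *a, const unsigned char *b)
{
	int i;

	for (i = 0; i < HASH_LEN; i++, a++, b++) {
		if (*a != *b)
			return *a - *b;
	}
	return 0;
}

/* memcmp variant: leave the choice of comparison strategy to the C library. */
static int hashcmp_memcmp(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, HASH_LEN);
}
```

Both variants agree on the sign of the result, which is all a binary search
needs. Only the sign is portable, though: memcmp() does not promise to return
the byte difference itself.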


Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-09 Thread Jeff King
On Tue, Aug 08, 2017 at 10:36:33PM -0700, Junio C Hamano wrote:

> > Actually, I take it back. The problem happens when we enter the loop
> > with no entries to look at. But both sha1_pos() and sha1_entry_pos()
> > return early before hitting their do-while loops in that case.
> 
> Ah, I was not looking at that part of the code.  Thanks.
> 
> I still wonder if we want to retire that conditional invocation of
> sha1_entry_pos(), though.

I think so. Digging in the list for it, almost every mention is either
somebody asking "should we scrap this?" or somebody showing benchmarks
in which it is slower than the normal lookup (and then somebody asking
"should we scrap this" :) ).

I just re-ran a simple benchmark and it is indeed slower. I also came
across the hashcmp open-code-versus-memcmp discussion, which shows that
the memcmp in recent glibc is much faster. That has been around long
enough that it's probably worth switching to.

-Peff


Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Junio C Hamano
Jeff King writes:

> On Tue, Aug 08, 2017 at 06:52:31PM -0400, Jeff King wrote:
>
>> > Interesting.  I see that we still have the conditional code to call
>> > out to sha1-lookup.c::sha1_entry_pos().  Do we need a similar change
>> > over there, I wonder?  Alternatively, as we have had the experimental
>> > sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
>> > perhaps we should write it off as a failed experiment and retire it?
>> 
>> There is also sha1_pos(), which seems to have the same problem (and is
>> used in several places).
>
> Actually, I take it back. The problem happens when we enter the loop
> with no entries to look at. But both sha1_pos() and sha1_entry_pos()
> return early before hitting their do-while loops in that case.

Ah, I was not looking at that part of the code.  Thanks.

I still wonder if we want to retire that conditional invocation of
sha1_entry_pos(), though.


Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Jeff King
On Tue, Aug 08, 2017 at 06:52:31PM -0400, Jeff King wrote:

> > Interesting.  I see that we still have the conditional code to call
> > out to sha1-lookup.c::sha1_entry_pos().  Do we need a similar change
> > over there, I wonder?  Alternatively, as we have had the experimental
> > sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
> > perhaps we should write it off as a failed experiment and retire it?
> 
> There is also sha1_pos(), which seems to have the same problem (and is
> used in several places).

Actually, I take it back. The problem happens when we enter the loop
with no entries to look at. But both sha1_pos() and sha1_entry_pos()
return early before hitting their do-while loops in that case.

-Peff
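
To spell out why the early return makes a do-while loop safe, here is a
minimal sketch of the pattern. It is an illustration over a plain sorted int
array, not the code of sha1_pos() or sha1_entry_pos(); pos_guarded() is a
hypothetical name.

```c
#include <stddef.h>

/*
 * Illustrative only: a do-while binary search over a sorted int array.
 * Returns the index of `want`, or -1 if it is absent.
 */
static int pos_guarded(const int *table, size_t nr, int want)
{
	size_t lo = 0, hi = nr;

	/* Without this check the loop body would read table[0] when nr == 0. */
	if (!nr)
		return -1;

	do {
		size_t mi = lo + (hi - lo) / 2;

		if (table[mi] == want)
			return (int)mi;
		if (table[mi] > want)
			hi = mi;
		else
			lo = mi + 1;
	} while (lo < hi);
	return -1;
}
```

The do-while body always runs at least once, so the guard is what keeps an
empty table from being dereferenced. A plain while (lo < hi) loop would make
the guard unnecessary.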


Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Jeff King
On Tue, Aug 08, 2017 at 03:43:13PM -0700, Junio C Hamano wrote:

> > @@ -2812,7 +2812,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
> >  			hi = mi;
> >  		else
> >  			lo = mi+1;
> > -	} while (lo < hi);
> > +	}
> >  	return 0;
> >  }
> 
> Interesting.  I see that we still have the conditional code to call
> out to sha1-lookup.c::sha1_entry_pos().  Do we need a similar change
> over there, I wonder?  Alternatively, as we have had the experimental
> sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
> perhaps we should write it off as a failed experiment and retire it?

There is also sha1_pos(), which seems to have the same problem (and is
used in several places).

-Peff


Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Junio C Hamano
René Scharfe writes:

> find_pack_entry_one() uses the fan-out table of pack indexes to find out
> which entries match the first byte of the searched hash and does a
> binary search on this subset of the main index table.
>
> If there are no matching entries then lo and hi will have the same
> value.  The binary search still starts and compares the hash of the
> following entry (which has a non-matching first byte, so won't cause any
> trouble), or whatever comes after the sorted list of entries.
>
> The probability of that stray comparison matching by mistake is low, but
> let's not take any chances and check when entering the binary search
> loop if we're actually done already.
>
> Signed-off-by: Rene Scharfe 
> ---
>  sha1_file.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index b60ae15f70..11ee69a99d 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2799,7 +2799,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
>  		return nth_packed_object_offset(p, pos);
>  	}
>  
> -	do {
> +	while (lo < hi) {
>  		unsigned mi = (lo + hi) / 2;
>  		int cmp = hashcmp(index + mi * stride, sha1);
>  
> @@ -2812,7 +2812,7 @@ off_t find_pack_entry_one(const unsigned char *sha1,
>  			hi = mi;
>  		else
>  			lo = mi+1;
> -	} while (lo < hi);
> +	}
>  	return 0;
>  }

Interesting.  I see that we still have the conditional code to call
out to sha1-lookup.c::sha1_entry_pos().  Do we need a similar change
over there, I wonder?  Alternatively, as we have had the experimental
sha1-lookup.c::sha1_entry_pos() long enough without anybody using it,
perhaps we should write it off as a failed experiment and retire it?



Re: [PATCH] sha1_file: avoid comparison if no packed hash matches the first byte

2017-08-08 Thread Jonathan Nieder
René Scharfe wrote:

> find_pack_entry_one() uses the fan-out table of pack indexes to find out
> which entries match the first byte of the searched hash and does a
> binary search on this subset of the main index table.
>
> If there are no matching entries then lo and hi will have the same
> value.  The binary search still starts and compares the hash of the
> following entry (which has a non-matching first byte, so won't cause any
> trouble), or whatever comes after the sorted list of entries.
>
> The probability of that stray comparison matching by mistake is low, but
> let's not take any chances and check when entering the binary search
> loop if we're actually done already.
>
> Signed-off-by: Rene Scharfe 
> ---
>  sha1_file.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks for a clear explanation.

Sanity checking: is this correct in the sha1[0] == 0 case?  In that
case, we have lo = 0, hi = the 0th offset from the fanout table.  The
offsets in the fanout table are defined as "the number of objects in
the corresponding pack, the first byte of whose object name is less
than or equal to N."  So hi == lo would mean there are no objects with
id starting with 0, as hoped.

Or in other words, the [lo, hi) interval we're trying to search is
indeed a half-open interval.

Reviewed-by: Jonathan Nieder
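
The half-open interval argument can be sketched in standalone C. This is an
illustration under assumptions, not the code in sha1_file.c: HASH_LEN,
build_fanout() and find_entry() are hypothetical names, and the pack index is
reduced to a flat, sorted table of 20-byte hashes.

```c
#include <string.h>

#define HASH_LEN 20	/* assumed: raw SHA-1 size in bytes */

/*
 * fanout[n] is the number of entries whose first byte is <= n, matching
 * the definition quoted above. `table` must be sorted.
 */
static void build_fanout(unsigned fanout[256],
			 const unsigned char *table, unsigned nr)
{
	unsigned i, n = 0;

	for (i = 0; i < 256; i++) {
		while (n < nr && table[n * HASH_LEN] <= i)
			n++;
		fanout[i] = n;
	}
}

/* Returns the index of `want` in `table`, or -1 if it is not present. */
static int find_entry(const unsigned fanout[256],
		      const unsigned char *table,
		      const unsigned char *want)
{
	/* Half-open interval [lo, hi) of entries sharing want[0]. */
	unsigned lo = want[0] ? fanout[want[0] - 1] : 0;
	unsigned hi = fanout[want[0]];

	/* Checking on entry means an empty interval costs no comparison. */
	while (lo < hi) {
		unsigned mi = lo + (hi - lo) / 2;
		int cmp = memcmp(table + mi * HASH_LEN, want, HASH_LEN);

		if (!cmp)
			return (int)mi;
		if (cmp > 0)
			hi = mi;
		else
			lo = mi + 1;
	}
	return -1;
}
```

With a table of two hashes starting with 0x01 and 0x03, a lookup for a hash
starting with 0x02 gets lo == hi == 1 and returns -1 without touching the
table. A lookup starting with 0x00 gets lo == hi == 0, which is the
sha1[0] == 0 case checked above.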