Re: [PATCH] object: measure time needed for resolving hash collisions

2016-09-15 Thread Jeff King
On Thu, Sep 15, 2016 at 09:26:22AM -0700, Stefan Beller wrote:

> > It may also be possible to really micro-optimize it on some platforms,
> > because we know the size in advance (I'd kind of expect the compiler to
> > do that, but if we're ending up in glibc memcmp then it sounds like it
> > is not the case).
> 
> That stackoverflow link suggests that glibc already has micro-optimisations
> for a variety of platforms.

It's definitely micro-optimized in glibc. What I was trying to say is
that if we are hitting the glibc implementation, then we know we are
handling the "20" at runtime. Whereas the compiler should know that "20"
is a constant, and could in theory skip the memcmp() call entirely in
favor of something like an unrolled loop.
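
To make that concrete, here is a sketch of the kind of fixed-size compare
I mean (my own illustration, not code from git; hasheq20 is a made-up
name, and it only tests equality, which is all the lookup loop needs):

#include <stdint.h>
#include <string.h>

static inline int hasheq20(const unsigned char *a, const unsigned char *b)
{
	uint64_t a1, a2, b1, b2;
	uint32_t a3, b3;

	/*
	 * With the 20-byte size known at compile time, compare as two
	 * 8-byte chunks and one 4-byte chunk instead of byte-at-a-time.
	 * memcpy() into locals avoids alignment issues; compilers
	 * typically lower it to plain loads.
	 */
	memcpy(&a1, a, 8);      memcpy(&b1, b, 8);
	memcpy(&a2, a + 8, 8);  memcpy(&b2, b + 8, 8);
	memcpy(&a3, a + 16, 4); memcpy(&b3, b + 16, 4);

	return a1 == b1 && a2 == b2 && a3 == b3;
}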

-Peff


Re: [PATCH] object: measure time needed for resolving hash collisions

2016-09-15 Thread Jeff King
On Thu, Sep 15, 2016 at 10:45:34AM -0700, Junio C Hamano wrote:

> Jeff King writes:
> 
> > Measuring _just_ the collisions is more like the patch below. In my
> > measurements it's more like 30ms, compared to 10s for all of the
> > hashcmps.
> >
> > So we really aren't dealing with collisions, but rather just verifying
> > that our hash landed at the right spot. And _any_ data structure is
> > going to have to do that.
> 
> The reverse side of the coin, I guess, is whether we can shrink the
> hash table and load it more heavily without sacrificing performance,
> by making the necessary "have we landed at the right spot" check
> cheap enough.

I think that's where things like cuckoo hashing come into play. They
didn't have any effect for us because we already keep the table very
unloaded. But you could _probably_ increase the load factor without
sacrificing performance using a more clever scheme.
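
For illustration, a minimal sketch of a cuckoo-style lookup (hypothetical
code, not from git): every key lives in one of two slots chosen by two
independent hash functions, so a lookup is at most two probes even at
high load, while inserts pay the cost of kicking occupants over to their
alternate slot:

#include <string.h>

#define KEYSZ 20

struct entry {
	int used;
	unsigned char key[KEYSZ];
	void *value;
};

static struct entry *table;	/* table_size slots, power of two */
static unsigned table_size;

/* two independent hashes from different parts of the (uniform) sha1 */
static unsigned hash1(const unsigned char *k)
{
	unsigned h;
	memcpy(&h, k, sizeof(h));
	return h & (table_size - 1);
}

static unsigned hash2(const unsigned char *k)
{
	unsigned h;
	memcpy(&h, k + sizeof(h), sizeof(h));
	return h & (table_size - 1);
}

static struct entry *cuckoo_lookup(const unsigned char *k)
{
	struct entry *e = &table[hash1(k)];
	if (e->used && !memcmp(e->key, k, KEYSZ))
		return e;
	e = &table[hash2(k)];
	if (e->used && !memcmp(e->key, k, KEYSZ))
		return e;
	return NULL;	/* at most two probes, regardless of load */
}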

It's not clear to me that the current table size is a big problem,
though. It might be hurting us with cache effects, but I think the only
way we'd know is to measure.

-Peff


Re: [PATCH] object: measure time needed for resolving hash collisions

2016-09-15 Thread Junio C Hamano
Jeff King writes:

> Measuring _just_ the collisions is more like the patch below. In my
> measurements it's more like 30ms, compared to 10s for all of the
> hashcmps.
>
> So we really aren't dealing with collisions, but rather just verifying
> that our hash landed at the right spot. And _any_ data structure is
> going to have to do that.

The reverse side of the coin, I guess, is whether we can shrink the
hash table and load it more heavily without sacrificing performance,
by making the necessary "have we landed at the right spot" check
cheap enough.



Re: [PATCH] object: measure time needed for resolving hash collisions

2016-09-15 Thread Stefan Beller
On Wed, Sep 14, 2016 at 11:47 PM, Jeff King wrote:
> On Wed, Sep 14, 2016 at 07:01:41PM -0700, Stefan Beller wrote:
>
>>  According to Jeff, sending patches that don't get accepted is the new hype!
>
> It is what all the cool kids are doing. Unfortunately, it does not save
> you from nitpicky reviews...;)
>
>>   first = i = hash_obj(sha1, obj_hash_size);
>> + clock_gettime(CLOCK_MONOTONIC, &begin);
>>   while ((obj = obj_hash[i]) != NULL) {
>>   if (!hashcmp(sha1, obj->oid.hash))
>>   break;
>> @@ -98,6 +131,9 @@ struct object *lookup_object(const unsigned char *sha1)
>>   if (i == obj_hash_size)
>>   i = 0;
>>   }
>> + clock_gettime(CLOCK_MONOTONIC, &end);
>> + diff(&begin, &end, &t_diff);
>> + add_time_to(&collisions, &t_diff);
>>   if (obj && i != first) {
>
> I don't think this is actually measuring the time spent on collisions.
> It's measuring the time we spend in hashcmp(), but that includes the
> non-collision case where we find it in the first hashcmp.

Right. I measured all lookup times, i.e. this function accounts for
1/3 of the run time.

>
> Measuring _just_ the collisions is more like the patch below. In my
> measurements it's more like 30ms, compared to 10s for all of the
> hashcmps.

So that is off by a few orders of magnitude, i.e. we don't have to worry
about the time spent resolving collisions.

>
> So we really aren't dealing with collisions, but rather just verifying
> that our hash landed at the right spot. And _any_ data structure is
> going to have to do that. If you want to make it faster, I'd try
> optimizing hashcmp (and you can see why the critbit tree was slower; if
> we spend so much time just on hashcmp() to make sure we're at the right
> key, then making that slower with a bunch of branching is not going to
> help).
>
> I notice we still open-code hashcmp. I get a slight speedup by switching
> it to memcmp(). About 2.5%, which is similar to what I showed in
>
>   http://public-inbox.org/git/20130318073229.ga5...@sigill.intra.peff.net/
>
> a few years ago (though it's more pronounced as a portion of the whole
> now, because we've optimized some of the other bits).
>
> The main driver there was memcmp() improvements that went into glibc
> 2.13 several years ago. It might be time to start assuming that memcmp()
> beats our open-coded loop.

http://stackoverflow.com/questions/21106801/why-is-memcmp-so-much-faster-than-a-for-loop-check

seems to agree with you, so I think I'd agree with switching over.
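
Concretely, I'd expect the switch to be nothing more than this (a sketch,
assuming the GIT_SHA1_RAWSZ constant from cache.h):

static inline int hashcmp(const unsigned char *sha1,
			  const unsigned char *sha2)
{
	/* length is a compile-time constant (20), so the compiler
	 * can inline an optimized fixed-size compare */
	return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
}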

>
> It may also be possible to really micro-optimize it on some platforms,
> because we know the size in advance (I'd kind of expect the compiler to
> do that, but if we're ending up in glibc memcmp then it sounds like it
> is not the case).

That stackoverflow link suggests that glibc already has micro-optimisations
for a variety of platforms.

>
> -Peff


Re: [PATCH] object: measure time needed for resolving hash collisions

2016-09-15 Thread Jeff King
On Wed, Sep 14, 2016 at 11:47:01PM -0700, Jeff King wrote:

> > first = i = hash_obj(sha1, obj_hash_size);
> > +   clock_gettime(CLOCK_MONOTONIC, &begin);
> > while ((obj = obj_hash[i]) != NULL) {
> > if (!hashcmp(sha1, obj->oid.hash))
> > break;
> > @@ -98,6 +131,9 @@ struct object *lookup_object(const unsigned char *sha1)
> > if (i == obj_hash_size)
> > i = 0;
> > }
> > +   clock_gettime(CLOCK_MONOTONIC, &end);
> > +   diff(&begin, &end, &t_diff);
> > +   add_time_to(&collisions, &t_diff);
> > if (obj && i != first) {
> 
> I don't think this is actually measuring the time spent on collisions.
> It's measuring the time we spend in hashcmp(), but that includes the
> non-collision case where we find it in the first hashcmp.
> 
> Measuring _just_ the collisions is more like the patch below. In my
> measurements it's more like 30ms, compared to 10s for all of the
> hashcmps.

I forgot to send the patch. Which is just as well, because I realized it
was totally buggy.

Here's a patch that I believe is correct (it counts only times when we
move past the first hash slot). It spends about 280ms. Which is still a
lot less than 10s, so I think my other comments stand.

---
diff --git a/object.c b/object.c
index e9e73e0..7a74a1d 100644
--- a/object.c
+++ b/object.c
@@ -123,17 +123,20 @@ struct object *lookup_object(const unsigned char *sha1)
 		return NULL;
 
 	first = i = hash_obj(sha1, obj_hash_size);
-	clock_gettime(CLOCK_MONOTONIC, &begin);
 	while ((obj = obj_hash[i]) != NULL) {
 		if (!hashcmp(sha1, obj->oid.hash))
 			break;
+		if (first == i)
+			clock_gettime(CLOCK_MONOTONIC, &begin);
 		i++;
 		if (i == obj_hash_size)
 			i = 0;
 	}
-	clock_gettime(CLOCK_MONOTONIC, &end);
-	diff(&begin, &end, &t_diff);
-	add_time_to(&collisions, &t_diff);
+	if (i != first) {
+		clock_gettime(CLOCK_MONOTONIC, &end);
+		diff(&begin, &end, &t_diff);
+		add_time_to(&collisions, &t_diff);
+	}
 	if (obj && i != first) {
 		/*
 		 * Move object to where we started to look for it so
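
(The diff() and add_time_to() helpers above come from the measurement
patch under discussion and aren't shown in this hunk. For reference, a
minimal sketch of what such struct timespec helpers typically look like:)

/* end - start, normalizing a negative nanosecond remainder */
static void diff(struct timespec *start, struct timespec *end,
		 struct timespec *res)
{
	res->tv_sec = end->tv_sec - start->tv_sec;
	res->tv_nsec = end->tv_nsec - start->tv_nsec;
	if (res->tv_nsec < 0) {
		res->tv_sec--;
		res->tv_nsec += 1000000000L;
	}
}

/* accumulate t into a running total, carrying nanosecond overflow */
static void add_time_to(struct timespec *sum, struct timespec *t)
{
	sum->tv_sec += t->tv_sec;
	sum->tv_nsec += t->tv_nsec;
	if (sum->tv_nsec >= 1000000000L) {
		sum->tv_sec++;
		sum->tv_nsec -= 1000000000L;
	}
}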


Re: [PATCH] object: measure time needed for resolving hash collisions

2016-09-15 Thread Jeff King
On Wed, Sep 14, 2016 at 07:01:41PM -0700, Stefan Beller wrote:

>  According to Jeff, sending patches that don't get accepted is the new hype!

It is what all the cool kids are doing. Unfortunately, it does not save
you from nitpicky reviews...;)

>   first = i = hash_obj(sha1, obj_hash_size);
> + clock_gettime(CLOCK_MONOTONIC, &begin);
>   while ((obj = obj_hash[i]) != NULL) {
>   if (!hashcmp(sha1, obj->oid.hash))
>   break;
> @@ -98,6 +131,9 @@ struct object *lookup_object(const unsigned char *sha1)
>   if (i == obj_hash_size)
>   i = 0;
>   }
> + clock_gettime(CLOCK_MONOTONIC, &end);
> + diff(&begin, &end, &t_diff);
> + add_time_to(&collisions, &t_diff);
>   if (obj && i != first) {

I don't think this is actually measuring the time spent on collisions.
It's measuring the time we spend in hashcmp(), but that includes the
non-collision case where we find it in the first hashcmp.

Measuring _just_ the collisions is more like the patch below. In my
measurements it's more like 30ms, compared to 10s for all of the
hashcmps.

So we really aren't dealing with collisions, but rather just verifying
that our hash landed at the right spot. And _any_ data structure is
going to have to do that. If you want to make it faster, I'd try
optimizing hashcmp (and you can see why the critbit tree was slower; if
we spend so much time just on hashcmp() to make sure we're at the right
key, then making that slower with a bunch of branching is not going to
help).

I notice we still open-code hashcmp. I get a slight speedup by switching
it to memcmp(). About 2.5%, which is similar to what I showed in

  http://public-inbox.org/git/20130318073229.ga5...@sigill.intra.peff.net/

a few years ago (though it's more pronounced as a portion of the whole
now, because we've optimized some of the other bits).

The main driver there was memcmp() improvements that went into glibc
2.13 several years ago. It might be time to start assuming that memcmp()
beats our open-coded loop.
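
For reference, the open-coded version in cache.h is roughly the following
byte-at-a-time loop:

static inline int hashcmp(const unsigned char *sha1,
			  const unsigned char *sha2)
{
	int i;

	/* compare one byte per iteration; the generic loop gives the
	 * compiler less to work with than a constant-size memcmp() */
	for (i = 0; i < GIT_SHA1_RAWSZ; i++, sha1++, sha2++) {
		if (*sha1 != *sha2)
			return *sha1 - *sha2;
	}
	return 0;
}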

It may also be possible to really micro-optimize it on some platforms,
because we know the size in advance (I'd kind of expect the compiler to
do that, but if we're ending up in glibc memcmp then it sounds like it
is not the case).

-Peff