Re: [PATCH] lookup_object: remove hashtable_index() and optimize hash_obj()

2013-09-12 Thread Junio C Hamano
Nicolas Pitre n...@fluxnic.net writes:

 Maybe it's worth squashing in one or both of the comments below as a
 warning to anybody who tries to tweak it.

 Agreed.

 @Junio: are you willing to squash those in, or do you prefer a resent?

I think I've queued it ready to be squashed.  No need for resend.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lookup_object: remove hashtable_index() and optimize hash_obj()

2013-09-12 Thread Nicolas Pitre
On Wed, 11 Sep 2013, Jeff King wrote:

 On Tue, Sep 10, 2013 at 06:17:12PM -0400, Nicolas Pitre wrote:
 
  Also remove the modulus as this is an expansive operation.
  The size argument is always a power of 2 anyway, so a simple
  mask operation provides the same result.
  
  On a 'git rev-list --all --objects' run this decreased the time spent
  in lookup_object from 27.5% to 24.1%.
 
 Nice. This is a tiny bit subtle, though, as the power-of-2 growth
 happens elsewhere, and we may want to tweak it later (the decorate.c
 hash, for example, grows by 3/2).
 
 Maybe it's worth squashing in one or both of the comments below as a
 warning to anybody who tries to tweak it.

Agreed.

@Junio: are you willing to squash those in, or do you prefer a resent?

 ---
 diff --git a/object.c b/object.c
 index e2dae22..5f792cb 100644
 --- a/object.c
 +++ b/object.c
 @@ -47,6 +47,7 @@ static unsigned int hash_obj(const unsigned char *sha1, 
 unsigned int n)
  {
   unsigned int hash;
   memcpy(hash, sha1, sizeof(unsigned int));
 + /* Assumes power-of-2 hash sizes in grow_object_hash */
   return hash  (n - 1);
  }
  
 @@ -94,6 +95,10 @@ static void grow_object_hash(void)
  static void grow_object_hash(void)
  {
   int i;
 + /*
 +  * Note that this size must always be power-of-2 to match hash_obj
 +  * above.
 +  */
   int new_hash_size = obj_hash_size  32 ? 32 : 2 * obj_hash_size;
   struct object **new_hash;
  
 --
 To unsubscribe from this list: send the line unsubscribe git in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lookup_object: remove hashtable_index() and optimize hash_obj()

2013-09-11 Thread Jeff King
On Tue, Sep 10, 2013 at 06:17:12PM -0400, Nicolas Pitre wrote:

 hashtable_index() appears to be a close duplicate of hash_obj().
 Keep only the later and make it usable for all cases.

Thanks. This duplication has often bugged me when looking at that
hash table, but I just never actually wrote the patch.

 Also remove the modulus as this is an expansive operation.
 The size argument is always a power of 2 anyway, so a simple
 mask operation provides the same result.
 
 On a 'git rev-list --all --objects' run this decreased the time spent
 in lookup_object from 27.5% to 24.1%.

Nice. This is a tiny bit subtle, though, as the power-of-2 growth
happens elsewhere, and we may want to tweak it later (the decorate.c
hash, for example, grows by 3/2).

Maybe it's worth squashing in one or both of the comments below as a
warning to anybody who tries to tweak it.

---
diff --git a/object.c b/object.c
index e2dae22..5f792cb 100644
--- a/object.c
+++ b/object.c
@@ -47,6 +47,7 @@ static unsigned int hash_obj(const unsigned char *sha1, 
unsigned int n)
 {
unsigned int hash;
memcpy(hash, sha1, sizeof(unsigned int));
+   /* Assumes power-of-2 hash sizes in grow_object_hash */
return hash  (n - 1);
 }
 
@@ -94,6 +95,10 @@ static void grow_object_hash(void)
 static void grow_object_hash(void)
 {
int i;
+   /*
+* Note that this size must always be power-of-2 to match hash_obj
+* above.
+*/
int new_hash_size = obj_hash_size  32 ? 32 : 2 * obj_hash_size;
struct object **new_hash;
 
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] lookup_object: remove hashtable_index() and optimize hash_obj()

2013-09-10 Thread Nicolas Pitre

hashtable_index() appears to be a close duplicate of hash_obj().
Keep only the later and make it usable for all cases.

Also remove the modulus as this is an expansive operation.
The size argument is always a power of 2 anyway, so a simple
mask operation provides the same result.

On a 'git rev-list --all --objects' run this decreased the time spent
in lookup_object from 27.5% to 24.1%.

Signed-off-by: Nicolas Pitre n...@fluxnic.net
---

I discovered this patch in my git work tree dating from 2 years ago.

diff --git a/object.c b/object.c
index d8a4b1f..e2dae22 100644
--- a/object.c
+++ b/object.c
@@ -43,16 +43,16 @@ int type_from_string(const char *str)
die(invalid object type \%s\, str);
 }
 
-static unsigned int hash_obj(struct object *obj, unsigned int n)
+static unsigned int hash_obj(const unsigned char *sha1, unsigned int n)
 {
unsigned int hash;
-   memcpy(hash, obj-sha1, sizeof(unsigned int));
-   return hash % n;
+   memcpy(hash, sha1, sizeof(unsigned int));
+   return hash  (n - 1);
 }
 
 static void insert_obj_hash(struct object *obj, struct object **hash, unsigned 
int size)
 {
-   unsigned int j = hash_obj(obj, size);
+   unsigned int j = hash_obj(obj-sha1, size);
 
while (hash[j]) {
j++;
@@ -62,13 +62,6 @@ static void insert_obj_hash(struct object *obj, struct 
object **hash, unsigned i
hash[j] = obj;
 }
 
-static unsigned int hashtable_index(const unsigned char *sha1)
-{
-   unsigned int i;
-   memcpy(i, sha1, sizeof(unsigned int));
-   return i % obj_hash_size;
-}
-
 struct object *lookup_object(const unsigned char *sha1)
 {
unsigned int i, first;
@@ -77,7 +70,7 @@ struct object *lookup_object(const unsigned char *sha1)
if (!obj_hash)
return NULL;
 
-   first = i = hashtable_index(sha1);
+   first = i = hash_obj(sha1, obj_hash_size);
while ((obj = obj_hash[i]) != NULL) {
if (!hashcmp(sha1, obj-sha1))
break;
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html