Re: [PATCH] lookup_object: remove hashtable_index() and optimize hash_obj()
Nicolas Pitre n...@fluxnic.net writes: Maybe it's worth squashing in one or both of the comments below as a warning to anybody who tries to tweak it. Agreed. @Junio: are you willing to squash those in, or do you prefer a resent? I think I've queued it ready to be squashed. No need for resend. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] lookup_object: remove hashtable_index() and optimize hash_obj()
On Wed, 11 Sep 2013, Jeff King wrote: On Tue, Sep 10, 2013 at 06:17:12PM -0400, Nicolas Pitre wrote: Also remove the modulus as this is an expansive operation. The size argument is always a power of 2 anyway, so a simple mask operation provides the same result. On a 'git rev-list --all --objects' run this decreased the time spent in lookup_object from 27.5% to 24.1%. Nice. This is a tiny bit subtle, though, as the power-of-2 growth happens elsewhere, and we may want to tweak it later (the decorate.c hash, for example, grows by 3/2). Maybe it's worth squashing in one or both of the comments below as a warning to anybody who tries to tweak it. Agreed. @Junio: are you willing to squash those in, or do you prefer a resent? --- diff --git a/object.c b/object.c index e2dae22..5f792cb 100644 --- a/object.c +++ b/object.c @@ -47,6 +47,7 @@ static unsigned int hash_obj(const unsigned char *sha1, unsigned int n) { unsigned int hash; memcpy(hash, sha1, sizeof(unsigned int)); + /* Assumes power-of-2 hash sizes in grow_object_hash */ return hash (n - 1); } @@ -94,6 +95,10 @@ static void grow_object_hash(void) static void grow_object_hash(void) { int i; + /* + * Note that this size must always be power-of-2 to match hash_obj + * above. + */ int new_hash_size = obj_hash_size 32 ? 32 : 2 * obj_hash_size; struct object **new_hash; -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] lookup_object: remove hashtable_index() and optimize hash_obj()
On Tue, Sep 10, 2013 at 06:17:12PM -0400, Nicolas Pitre wrote: hashtable_index() appears to be a close duplicate of hash_obj(). Keep only the later and make it usable for all cases. Thanks. This duplication has often bugged me when looking at that hash table, but I just never actually wrote the patch. Also remove the modulus as this is an expansive operation. The size argument is always a power of 2 anyway, so a simple mask operation provides the same result. On a 'git rev-list --all --objects' run this decreased the time spent in lookup_object from 27.5% to 24.1%. Nice. This is a tiny bit subtle, though, as the power-of-2 growth happens elsewhere, and we may want to tweak it later (the decorate.c hash, for example, grows by 3/2). Maybe it's worth squashing in one or both of the comments below as a warning to anybody who tries to tweak it. --- diff --git a/object.c b/object.c index e2dae22..5f792cb 100644 --- a/object.c +++ b/object.c @@ -47,6 +47,7 @@ static unsigned int hash_obj(const unsigned char *sha1, unsigned int n) { unsigned int hash; memcpy(hash, sha1, sizeof(unsigned int)); + /* Assumes power-of-2 hash sizes in grow_object_hash */ return hash (n - 1); } @@ -94,6 +95,10 @@ static void grow_object_hash(void) static void grow_object_hash(void) { int i; + /* +* Note that this size must always be power-of-2 to match hash_obj +* above. +*/ int new_hash_size = obj_hash_size 32 ? 32 : 2 * obj_hash_size; struct object **new_hash; -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] lookup_object: remove hashtable_index() and optimize hash_obj()
hashtable_index() appears to be a close duplicate of hash_obj(). Keep only the later and make it usable for all cases. Also remove the modulus as this is an expansive operation. The size argument is always a power of 2 anyway, so a simple mask operation provides the same result. On a 'git rev-list --all --objects' run this decreased the time spent in lookup_object from 27.5% to 24.1%. Signed-off-by: Nicolas Pitre n...@fluxnic.net --- I discovered this patch in my git work tree dating from 2 years ago. diff --git a/object.c b/object.c index d8a4b1f..e2dae22 100644 --- a/object.c +++ b/object.c @@ -43,16 +43,16 @@ int type_from_string(const char *str) die(invalid object type \%s\, str); } -static unsigned int hash_obj(struct object *obj, unsigned int n) +static unsigned int hash_obj(const unsigned char *sha1, unsigned int n) { unsigned int hash; - memcpy(hash, obj-sha1, sizeof(unsigned int)); - return hash % n; + memcpy(hash, sha1, sizeof(unsigned int)); + return hash (n - 1); } static void insert_obj_hash(struct object *obj, struct object **hash, unsigned int size) { - unsigned int j = hash_obj(obj, size); + unsigned int j = hash_obj(obj-sha1, size); while (hash[j]) { j++; @@ -62,13 +62,6 @@ static void insert_obj_hash(struct object *obj, struct object **hash, unsigned i hash[j] = obj; } -static unsigned int hashtable_index(const unsigned char *sha1) -{ - unsigned int i; - memcpy(i, sha1, sizeof(unsigned int)); - return i % obj_hash_size; -} - struct object *lookup_object(const unsigned char *sha1) { unsigned int i, first; @@ -77,7 +70,7 @@ struct object *lookup_object(const unsigned char *sha1) if (!obj_hash) return NULL; - first = i = hashtable_index(sha1); + first = i = hash_obj(sha1, obj_hash_size); while ((obj = obj_hash[i]) != NULL) { if (!hashcmp(sha1, obj-sha1)) break; -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html