Re: [PATCH] Preallocate hash tables when the number of inserts are known in advance

2013-03-17 Thread Jeff King
On Sun, Mar 17, 2013 at 10:28:06AM +0700, Nguyen Thai Ngoc Duy wrote:

> This avoids unnecessary re-allocations and reinsertions. On webkit.git
> (i.e. about 182k inserts to the name hash table), this reduces about
> 100ms out of 3s user time.

Good idea.

I had a similar thought when analyzing the hashing behavior of
pack-objects' "Counting objects..." phase, but it had even less impact
there. The insertions are just drowned out by the number of lookups in
that case.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Preallocate hash tables when the number of inserts are known in advance

2013-03-16 Thread Junio C Hamano
Duy Nguyen  writes:

> On Sun, Mar 17, 2013 at 12:34 PM, Junio C Hamano  wrote:
>> Nguyễn Thái Ngọc Duy  writes:
>>
>>> This avoids unnecessary re-allocations and reinsertions. On webkit.git
>>> (i.e. about 182k inserts to the name hash table), this reduces about
>>> 100ms out of 3s user time.
>>>
>>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>>
>> I think this is a very good idea, but I would prefer the second
>> parameter to the "preallocate" to be "expected number of entries"
>> and have the preallocate, which is a part of the hash API, decide
>> how to inflate that number to adjust to the desired load factor of
>> the hash table.  We shouldn't have to adjust the caller when the
>> internal implementation of the hash table changes.
>
> OK will do.

I've squashed it in myself, so no need to resend only for this.

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 8d3d9bb..6c7a72f 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -389,7 +389,7 @@ static int find_exact_renames(struct diff_options *options)
struct hash_table file_table;
 
init_hash(&file_table);
-   preallocate_hash(&file_table, (rename_src_nr + rename_dst_nr) * 2);
+   preallocate_hash(&file_table, rename_src_nr + rename_dst_nr);
for (i = 0; i < rename_src_nr; i++)
insert_file_table(&file_table, -1, i, rename_src[i].p->one);
 
diff --git a/hash.h b/hash.h
index 244d1fe..1d43ac0 100644
--- a/hash.h
+++ b/hash.h
@@ -40,11 +40,11 @@ static inline void init_hash(struct hash_table *table)
table->array = NULL;
 }
 
-static inline void preallocate_hash(struct hash_table *table, unsigned int 
size)
+static inline void preallocate_hash(struct hash_table *table, unsigned int 
elts)
 {
assert(table->size == 0 && table->nr == 0 && table->array == NULL);
-   table->size = size;
-   table->array = xcalloc(sizeof(struct hash_table_entry), size);
+   table->size = elts * 2;
+   table->array = xcalloc(sizeof(struct hash_table_entry), table->size);
 }
 
 #endif
diff --git a/name-hash.c b/name-hash.c
index 90c7b99..2a1f108 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -93,7 +93,7 @@ static void lazy_init_name_hash(struct index_state *istate)
if (istate->name_hash_initialized)
return;
if (istate->cache_nr)
-   preallocate_hash(&istate->name_hash, istate->cache_nr * 2);
+   preallocate_hash(&istate->name_hash, istate->cache_nr);
for (nr = 0; nr < istate->cache_nr; nr++)
hash_index_entry(istate, istate->cache[nr]);
istate->name_hash_initialized = 1;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Preallocate hash tables when the number of inserts are known in advance

2013-03-16 Thread Duy Nguyen
On Sun, Mar 17, 2013 at 12:34 PM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy  writes:
>
>> This avoids unnecessary re-allocations and reinsertions. On webkit.git
>> (i.e. about 182k inserts to the name hash table), this reduces about
>> 100ms out of 3s user time.
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>
> I think this is a very good idea, but I would prefer the second
> parameter to the "preallocate" to be "expected number of entries"
> and have the preallocate, which is a part of the hash API, decide
> how to inflate that number to adjust to the desired load factor of
> the hash table.  We shouldn't have to adjust the caller when the
> internal implementation of the hash table changes.

OK will do.

>> ---
>>  nd/read-directory-recursive-optim reduces the number of input (from
>>  182k to 11k on webkit) to exclude machinery that all patches in the
>>  exclude optimization series I posted seem insignificant. So I won't
>>  repost them for inclusion unless you think it has cleanup values.
>
> Sorry, without a pointer, it is unclear what "exclude optimization
> series" you are referring to.

http://thread.gmane.org/gmane.comp.version-control.git/217697/focus=217950
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Preallocate hash tables when the number of inserts are known in advance

2013-03-16 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy  writes:

> This avoids unnecessary re-allocations and reinsertions. On webkit.git
> (i.e. about 182k inserts to the name hash table), this reduces about
> 100ms out of 3s user time.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 

I think this is a very good idea, but I would prefer the second
parameter to the "preallocate" to be "expected number of entries"
and have the preallocate, which is a part of the hash API, decide
how to inflate that number to adjust to the desired load factor of
the hash table.  We shouldn't have to adjust the caller when the
internal implementation of the hash table changes.

> ---
>  nd/read-directory-recursive-optim reduces the number of input (from
>  182k to 11k on webkit) to exclude machinery that all patches in the
>  exclude optimization series I posted seem insignificant. So I won't
>  repost them for inclusion unless you think it has cleanup values.

Sorry, without a pointer, it is unclear what "exclude optimization
series" you are referring to.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html