Re: [PATCH] Preallocate hash tables when the number of inserts are known in advance
On Sun, Mar 17, 2013 at 10:28:06AM +0700, Nguyen Thai Ngoc Duy wrote: > This avoids unnecessary re-allocations and reinsertions. On webkit.git > (i.e. about 182k inserts to the name hash table), this reduces about > 100ms out of 3s user time. Good idea. I had a similar thought when analyzing the hashing behavior of pack-objects' "Counting objects..." phase, but it had even less impact there. The insertions are just drowned out by the number of lookups in that case. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Preallocate hash tables when the number of inserts are known in advance
Duy Nguyen writes: > On Sun, Mar 17, 2013 at 12:34 PM, Junio C Hamano wrote: >> Nguyễn Thái Ngọc Duy writes: >> >>> This avoids unnecessary re-allocations and reinsertions. On webkit.git >>> (i.e. about 182k inserts to the name hash table), this reduces about >>> 100ms out of 3s user time. >>> >>> Signed-off-by: Nguyễn Thái Ngọc Duy >> >> I think this is a very good idea, but I would prefer the second >> parameter to the "preallocate" to be "expected number of entries" >> and have the preallocate, which is a part of the hash API, decide >> how to inflate that number to adjust to the desired load factor of >> the hash table. We shouldn't have to adjust the caller when the >> internal implementation of the hash table changes. > > OK will do. I've squashed it in myself, so no need to resend only for this. diff --git a/diffcore-rename.c b/diffcore-rename.c index 8d3d9bb..6c7a72f 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -389,7 +389,7 @@ static int find_exact_renames(struct diff_options *options) struct hash_table file_table; init_hash(&file_table); - preallocate_hash(&file_table, (rename_src_nr + rename_dst_nr) * 2); + preallocate_hash(&file_table, rename_src_nr + rename_dst_nr); for (i = 0; i < rename_src_nr; i++) insert_file_table(&file_table, -1, i, rename_src[i].p->one); diff --git a/hash.h b/hash.h index 244d1fe..1d43ac0 100644 --- a/hash.h +++ b/hash.h @@ -40,11 +40,11 @@ static inline void init_hash(struct hash_table *table) table->array = NULL; } -static inline void preallocate_hash(struct hash_table *table, unsigned int size) +static inline void preallocate_hash(struct hash_table *table, unsigned int elts) { assert(table->size == 0 && table->nr == 0 && table->array == NULL); - table->size = size; - table->array = xcalloc(sizeof(struct hash_table_entry), size); + table->size = elts * 2; + table->array = xcalloc(sizeof(struct hash_table_entry), table->size); } #endif diff --git a/name-hash.c b/name-hash.c index 90c7b99..2a1f108 100644 --- a/name-hash.c +++ b/name-hash.c @@ -93,7 +93,7 @@ static void lazy_init_name_hash(struct index_state *istate) if (istate->name_hash_initialized) return; if (istate->cache_nr) - preallocate_hash(&istate->name_hash, istate->cache_nr * 2); + preallocate_hash(&istate->name_hash, istate->cache_nr); for (nr = 0; nr < istate->cache_nr; nr++) hash_index_entry(istate, istate->cache[nr]); istate->name_hash_initialized = 1; -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Preallocate hash tables when the number of inserts are known in advance
On Sun, Mar 17, 2013 at 12:34 PM, Junio C Hamano wrote: > Nguyễn Thái Ngọc Duy writes: > >> This avoids unnecessary re-allocations and reinsertions. On webkit.git >> (i.e. about 182k inserts to the name hash table), this reduces about >> 100ms out of 3s user time. >> >> Signed-off-by: Nguyễn Thái Ngọc Duy > > I think this is a very good idea, but I would prefer the second > parameter to the "preallocate" to be "expected number of entries" > and have the preallocate, which is a part of the hash API, decide > how to inflate that number to adjust to the desired load factor of > the hash table. We shouldn't have to adjust the caller when the > internal implementation of the hash table changes. OK will do. >> --- >> nd/read-directory-recursive-optim reduces the number of input (from >> 182k to 11k on webkit) to exclude machinery that all patches in the >> exclude optimization series I posted seem insignificant. So I won't >> repost them for inclusion unless you think it has cleanup values. > > Sorry, without a pointer, it is unclear what "exclude optimization > series" you are referring to. http://thread.gmane.org/gmane.comp.version-control.git/217697/focus=217950 -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Preallocate hash tables when the number of inserts are known in advance
Nguyễn Thái Ngọc Duy writes: > This avoids unnecessary re-allocations and reinsertions. On webkit.git > (i.e. about 182k inserts to the name hash table), this reduces about > 100ms out of 3s user time. > > Signed-off-by: Nguyễn Thái Ngọc Duy I think this is a very good idea, but I would prefer the second parameter to the "preallocate" to be "expected number of entries" and have the preallocate, which is a part of the hash API, decide how to inflate that number to adjust to the desired load factor of the hash table. We shouldn't have to adjust the caller when the internal implementation of the hash table changes. > --- > nd/read-directory-recursive-optim reduces the number of input (from > 182k to 11k on webkit) to exclude machinery that all patches in the > exclude optimization series I posted seem insignificant. So I won't > repost them for inclusion unless you think it has cleanup values. Sorry, without a pointer, it is unclear what "exclude optimization series" you are referring to. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html