Re: [PATCH] Preallocate hash tables when the number of inserts are known in advance
Duy Nguyen pclo...@gmail.com writes: On Sun, Mar 17, 2013 at 12:34 PM, Junio C Hamano gits...@pobox.com wrote: Nguyễn Thái Ngọc Duy pclo...@gmail.com writes: This avoids unnecessary re-allocations and reinsertions. On webkit.git (i.e. about 182k inserts to the name hash table), this reduces about 100ms out of 3s user time. Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com I think this is a very good idea, but I would prefer the second parameter to the preallocate to be expected number of entries and have the preallocate, which is a part of the hash API, decide how to inflate that number to adjust to the desired load factor of the hash table. We shouldn't have to adjust the caller when the internal implementation of the hash table changes. OK will do. I've squashed it in myself, so no need to resend only for this. diff --git a/diffcore-rename.c b/diffcore-rename.c index 8d3d9bb..6c7a72f 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -389,7 +389,7 @@ static int find_exact_renames(struct diff_options *options) struct hash_table file_table; init_hash(file_table); - preallocate_hash(file_table, (rename_src_nr + rename_dst_nr) * 2); + preallocate_hash(file_table, rename_src_nr + rename_dst_nr); for (i = 0; i rename_src_nr; i++) insert_file_table(file_table, -1, i, rename_src[i].p-one); diff --git a/hash.h b/hash.h index 244d1fe..1d43ac0 100644 --- a/hash.h +++ b/hash.h @@ -40,11 +40,11 @@ static inline void init_hash(struct hash_table *table) table-array = NULL; } -static inline void preallocate_hash(struct hash_table *table, unsigned int size) +static inline void preallocate_hash(struct hash_table *table, unsigned int elts) { assert(table-size == 0 table-nr == 0 table-array == NULL); - table-size = size; - table-array = xcalloc(sizeof(struct hash_table_entry), size); + table-size = elts * 2; + table-array = xcalloc(sizeof(struct hash_table_entry), table-size); } #endif diff --git a/name-hash.c b/name-hash.c index 90c7b99..2a1f108 100644 --- a/name-hash.c +++ b/name-hash.c @@ -93,7 +93,7 @@ static void lazy_init_name_hash(struct index_state *istate) if (istate-name_hash_initialized) return; if (istate-cache_nr) - preallocate_hash(istate-name_hash, istate-cache_nr * 2); + preallocate_hash(istate-name_hash, istate-cache_nr); for (nr = 0; nr istate-cache_nr; nr++) hash_index_entry(istate, istate-cache[nr]); istate-name_hash_initialized = 1; -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Preallocate hash tables when the number of inserts are known in advance
On Sun, Mar 17, 2013 at 10:28:06AM +0700, Nguyen Thai Ngoc Duy wrote: This avoids unnecessary re-allocations and reinsertions. On webkit.git (i.e. about 182k inserts to the name hash table), this reduces about 100ms out of 3s user time. Good idea. I had a similar thought when analyzing the hashing behavior of pack-objects' Counting objects... phase, but it had even less impact there. The insertions are just drowned out by the number of lookups in that case. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Preallocate hash tables when the number of inserts are known in advance
This avoids unnecessary re-allocations and reinsertions. On webkit.git (i.e. about 182k inserts to the name hash table), this reduces about 100ms out of 3s user time. Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com --- nd/read-directory-recursive-optim reduces the number of input (from 182k to 11k on webkit) to exclude machinery that all patches in the exclude optimization series I posted seem insignificant. So I won't repost them for inclusion unless you think it has cleanup values. This one is worth doing though. I think keeping untracked index would help avoid looking up in name-hash, where all user-space CPU cycles are spent. But I have nothing to show about that. diffcore-rename.c | 1 + hash.h| 7 +++ name-hash.c | 2 ++ 3 files changed, 10 insertions(+) diff --git a/diffcore-rename.c b/diffcore-rename.c index 512d0ac..8d3d9bb 100644 --- a/diffcore-rename.c +++ b/diffcore-rename.c @@ -389,6 +389,7 @@ static int find_exact_renames(struct diff_options *options) struct hash_table file_table; init_hash(file_table); + preallocate_hash(file_table, (rename_src_nr + rename_dst_nr) * 2); for (i = 0; i rename_src_nr; i++) insert_file_table(file_table, -1, i, rename_src[i].p-one); diff --git a/hash.h b/hash.h index b875ce6..244d1fe 100644 --- a/hash.h +++ b/hash.h @@ -40,4 +40,11 @@ static inline void init_hash(struct hash_table *table) table-array = NULL; } +static inline void preallocate_hash(struct hash_table *table, unsigned int size) +{ + assert(table-size == 0 table-nr == 0 table-array == NULL); + table-size = size; + table-array = xcalloc(sizeof(struct hash_table_entry), size); +} + #endif diff --git a/name-hash.c b/name-hash.c index 942c459..12364d1 100644 --- a/name-hash.c +++ b/name-hash.c @@ -92,6 +92,8 @@ static void lazy_init_name_hash(struct index_state *istate) if (istate-name_hash_initialized) return; + if (istate-cache_nr) + preallocate_hash(istate-name_hash, istate-cache_nr * 2); for (nr = 0; nr istate-cache_nr; nr++) hash_index_entry(istate, istate-cache[nr]); istate-name_hash_initialized = 1; -- 1.8.2.83.gc99314b -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Preallocate hash tables when the number of inserts are known in advance
Nguyễn Thái Ngọc Duy pclo...@gmail.com writes: This avoids unnecessary re-allocations and reinsertions. On webkit.git (i.e. about 182k inserts to the name hash table), this reduces about 100ms out of 3s user time. Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com I think this is a very good idea, but I would prefer the second parameter to the preallocate to be expected number of entries and have the preallocate, which is a part of the hash API, decide how to inflate that number to adjust to the desired load factor of the hash table. We shouldn't have to adjust the caller when the internal implementation of the hash table changes. --- nd/read-directory-recursive-optim reduces the number of input (from 182k to 11k on webkit) to exclude machinery that all patches in the exclude optimization series I posted seem insignificant. So I won't repost them for inclusion unless you think it has cleanup values. Sorry, without a pointer, it is unclear what exclude optimization series you are referring to. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Preallocate hash tables when the number of inserts are known in advance
On Sun, Mar 17, 2013 at 12:34 PM, Junio C Hamano gits...@pobox.com wrote: Nguyễn Thái Ngọc Duy pclo...@gmail.com writes: This avoids unnecessary re-allocations and reinsertions. On webkit.git (i.e. about 182k inserts to the name hash table), this reduces about 100ms out of 3s user time. Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com I think this is a very good idea, but I would prefer the second parameter to the preallocate to be expected number of entries and have the preallocate, which is a part of the hash API, decide how to inflate that number to adjust to the desired load factor of the hash table. We shouldn't have to adjust the caller when the internal implementation of the hash table changes. OK will do. --- nd/read-directory-recursive-optim reduces the number of input (from 182k to 11k on webkit) to exclude machinery that all patches in the exclude optimization series I posted seem insignificant. So I won't repost them for inclusion unless you think it has cleanup values. Sorry, without a pointer, it is unclear what exclude optimization series you are referring to. http://thread.gmane.org/gmane.comp.version-control.git/217697/focus=217950 -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html