Re: [PATCH] Preallocate hash tables when the number of inserts is known in advance

2013-03-17 Thread Junio C Hamano
Duy Nguyen <pclo...@gmail.com> writes:

 On Sun, Mar 17, 2013 at 12:34 PM, Junio C Hamano <gits...@pobox.com> wrote:
 Nguyễn Thái Ngọc Duy <pclo...@gmail.com> writes:

 This avoids unnecessary re-allocations and reinsertions. On webkit.git
 (i.e. about 182k inserts to the name hash table), this reduces about
 100ms out of 3s user time.

 Signed-off-by: Nguyễn Thái Ngọc Duy <pclo...@gmail.com>

 I think this is a very good idea, but I would prefer the second
 parameter to the preallocate to be "expected number of entries"
 and have the preallocate, which is a part of the hash API, decide
 how to inflate that number to adjust to the desired load factor of
 the hash table.  We shouldn't have to adjust the caller when the
 internal implementation of the hash table changes.

 OK will do.

I've squashed it in myself, so no need to resend only for this.

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 8d3d9bb..6c7a72f 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -389,7 +389,7 @@ static int find_exact_renames(struct diff_options *options)
 	struct hash_table file_table;
 
 	init_hash(&file_table);
-	preallocate_hash(&file_table, (rename_src_nr + rename_dst_nr) * 2);
+	preallocate_hash(&file_table, rename_src_nr + rename_dst_nr);
 	for (i = 0; i < rename_src_nr; i++)
 		insert_file_table(&file_table, -1, i, rename_src[i].p->one);
 
diff --git a/hash.h b/hash.h
index 244d1fe..1d43ac0 100644
--- a/hash.h
+++ b/hash.h
@@ -40,11 +40,11 @@ static inline void init_hash(struct hash_table *table)
 	table->array = NULL;
 }
 
-static inline void preallocate_hash(struct hash_table *table, unsigned int size)
+static inline void preallocate_hash(struct hash_table *table, unsigned int elts)
 {
 	assert(table->size == 0 && table->nr == 0 && table->array == NULL);
-	table->size = size;
-	table->array = xcalloc(sizeof(struct hash_table_entry), size);
+	table->size = elts * 2;
+	table->array = xcalloc(sizeof(struct hash_table_entry), table->size);
 }
 
 #endif
diff --git a/name-hash.c b/name-hash.c
index 90c7b99..2a1f108 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -93,7 +93,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 	if (istate->name_hash_initialized)
 		return;
 	if (istate->cache_nr)
-		preallocate_hash(&istate->name_hash, istate->cache_nr * 2);
+		preallocate_hash(&istate->name_hash, istate->cache_nr);
 	for (nr = 0; nr < istate->cache_nr; nr++)
 		hash_index_entry(istate, istate->cache[nr]);
 	istate->name_hash_initialized = 1;
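
The squashed-in interface can be summarized with a minimal, self-contained sketch: the caller passes the expected number of entries and preallocate_hash() alone applies the load factor. The struct here only mirrors the shape of git's hash_table; plain calloc stands in for xcalloc, and the counts in main() are made up for illustration.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct hash_table_entry {
	unsigned int hash;
	void *ptr;
};

struct hash_table {
	unsigned int size, nr;
	struct hash_table_entry *array;
};

static void init_hash(struct hash_table *table)
{
	table->size = 0;
	table->nr = 0;
	table->array = NULL;
}

/* The caller states how many entries it expects; the load factor is this function's business. */
static void preallocate_hash(struct hash_table *table, unsigned int elts)
{
	assert(table->size == 0 && table->nr == 0 && table->array == NULL);
	table->size = elts * 2;	/* aim for a load factor of at most 0.5 */
	table->array = calloc(table->size, sizeof(struct hash_table_entry));
	if (!table->array)
		exit(1);
}

int main(void)
{
	struct hash_table file_table;
	unsigned int rename_src_nr = 1000, rename_dst_nr = 250;	/* made-up counts */

	init_hash(&file_table);
	/* No "* 2" at the call site; a different load factor would not touch the callers. */
	preallocate_hash(&file_table, rename_src_nr + rename_dst_nr);
	printf("expected %u entries, allocated %u buckets\n",
	       rename_src_nr + rename_dst_nr, file_table.size);
	free(file_table.array);
	return 0;
}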


Re: [PATCH] Preallocate hash tables when the number of inserts is known in advance

2013-03-17 Thread Jeff King
On Sun, Mar 17, 2013 at 10:28:06AM +0700, Nguyen Thai Ngoc Duy wrote:

 This avoids unnecessary re-allocations and reinsertions. On webkit.git
 (i.e. about 182k inserts to the name hash table), this reduces about
 100ms out of 3s user time.

Good idea.

I had a similar thought when analyzing the hashing behavior of
pack-objects' "Counting objects..." phase, but it had even less impact
there. The insertions are just drowned out by the number of lookups in
that case.

-Peff


[PATCH] Preallocate hash tables when the number of inserts is known in advance

2013-03-16 Thread Nguyễn Thái Ngọc Duy
This avoids unnecessary re-allocations and reinsertions. On webkit.git
(i.e. about 182k inserts to the name hash table), this reduces about
100ms out of 3s user time.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclo...@gmail.com>
---
 nd/read-directory-recursive-optim reduces the number of inputs to the
 exclude machinery (from 182k to 11k on webkit) so much that all the
 patches in the exclude optimization series I posted now seem
 insignificant. So I won't repost them for inclusion unless you think
 they have cleanup value.

 This one is still worth doing, though. I think keeping an untracked
 index would help avoid the lookups in name-hash, where all the
 user-space CPU cycles are spent, but I have nothing to show on that
 front yet.

 diffcore-rename.c | 1 +
 hash.h            | 7 +++++++
 name-hash.c       | 2 ++
 3 files changed, 10 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 512d0ac..8d3d9bb 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -389,6 +389,7 @@ static int find_exact_renames(struct diff_options *options)
 	struct hash_table file_table;
 
 	init_hash(&file_table);
+	preallocate_hash(&file_table, (rename_src_nr + rename_dst_nr) * 2);
 	for (i = 0; i < rename_src_nr; i++)
 		insert_file_table(&file_table, -1, i, rename_src[i].p->one);
 
diff --git a/hash.h b/hash.h
index b875ce6..244d1fe 100644
--- a/hash.h
+++ b/hash.h
@@ -40,4 +40,11 @@ static inline void init_hash(struct hash_table *table)
 	table->array = NULL;
 }
 
+static inline void preallocate_hash(struct hash_table *table, unsigned int size)
+{
+	assert(table->size == 0 && table->nr == 0 && table->array == NULL);
+	table->size = size;
+	table->array = xcalloc(sizeof(struct hash_table_entry), size);
+}
+
 #endif
diff --git a/name-hash.c b/name-hash.c
index 942c459..12364d1 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -92,6 +92,8 @@ static void lazy_init_name_hash(struct index_state *istate)
 
 	if (istate->name_hash_initialized)
 		return;
+	if (istate->cache_nr)
+		preallocate_hash(&istate->name_hash, istate->cache_nr * 2);
 	for (nr = 0; nr < istate->cache_nr; nr++)
 		hash_index_entry(istate, istate->cache[nr]);
 	istate->name_hash_initialized = 1;
-- 
1.8.2.83.gc99314b
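
For a rough sense of the re-allocations and reinsertions being avoided, a small self-contained sketch follows; the initial table size, the doubling policy and the 0.5 load-factor threshold are assumptions for the illustration, not git's actual growth code.

#include <stdio.h>

int main(void)
{
	unsigned int inserts = 182000;	/* roughly the name-hash inserts on webkit.git */
	unsigned int size = 64;		/* assumed small initial table */
	unsigned int nr = 0, i;
	unsigned long reinsertions = 0;

	for (i = 0; i < inserts; i++) {
		if (2 * (nr + 1) > size) {	/* keep the load factor at or below 0.5 */
			reinsertions += nr;	/* growing rehashes every existing entry */
			size *= 2;
		}
		nr++;
	}
	printf("%u inserts trigger %lu extra reinsertions when growing on demand\n",
	       nr, reinsertions);
	printf("preallocating %u buckets from the known count triggers none\n",
	       2 * inserts);
	return 0;
}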



Re: [PATCH] Preallocate hash tables when the number of inserts is known in advance

2013-03-16 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy <pclo...@gmail.com> writes:

 This avoids unnecessary re-allocations and reinsertions. On webkit.git
 (i.e. about 182k inserts to the name hash table), this reduces about
 100ms out of 3s user time.

 Signed-off-by: Nguyễn Thái Ngọc Duy <pclo...@gmail.com>

I think this is a very good idea, but I would prefer the second
parameter to the preallocate to be "expected number of entries"
and have the preallocate, which is a part of the hash API, decide
how to inflate that number to adjust to the desired load factor of
the hash table.  We shouldn't have to adjust the caller when the
internal implementation of the hash table changes.

 ---
  nd/read-directory-recursive-optim reduces the number of inputs to the
  exclude machinery (from 182k to 11k on webkit) so much that all the
  patches in the exclude optimization series I posted now seem
  insignificant. So I won't repost them for inclusion unless you think
  they have cleanup value.

Sorry, without a pointer, it is unclear what exclude optimization
series you are referring to.


Re: [PATCH] Preallocate hash tables when the number of inserts is known in advance

2013-03-16 Thread Duy Nguyen
On Sun, Mar 17, 2013 at 12:34 PM, Junio C Hamano <gits...@pobox.com> wrote:
 Nguyễn Thái Ngọc Duy <pclo...@gmail.com> writes:

 This avoids unnecessary re-allocations and reinsertions. On webkit.git
 (i.e. about 182k inserts to the name hash table), this reduces about
 100ms out of 3s user time.

 Signed-off-by: Nguyễn Thái Ngọc Duy <pclo...@gmail.com>

 I think this is a very good idea, but I would prefer the second
 parameter to the preallocate to be "expected number of entries"
 and have the preallocate, which is a part of the hash API, decide
 how to inflate that number to adjust to the desired load factor of
 the hash table.  We shouldn't have to adjust the caller when the
 internal implementation of the hash table changes.

OK will do.

 ---
  nd/read-directory-recursive-optim reduces the number of inputs to the
  exclude machinery (from 182k to 11k on webkit) so much that all the
  patches in the exclude optimization series I posted now seem
  insignificant. So I won't repost them for inclusion unless you think
  they have cleanup value.

 Sorry, without a pointer, it is unclear what exclude optimization
 series you are referring to.

http://thread.gmane.org/gmane.comp.version-control.git/217697/focus=217950
-- 
Duy