Re: [PATCH] Hold an 'unsigned long' chunk of the sha1 in obj_hash

2013-05-02 Thread Thomas Rast
Jeff King  writes:

> It _might_ still be advantageous to do your patch on top, but I suspect
> it will diminish the returns from your patch (since the point of it is
> to probe less far down the chain on average).

No, mine makes it slower again.  Apparently the increased size is no
longer worth it.  You'll need a wide window to read this:

Test                              next              tr/hash-speedup            jk/hash-speedup             both-hash-speedup
---------------------------------------------------------------------------------------------------------------------------
0001.1: rev-list --all            0.66(0.63+0.02)   0.66(0.63+0.03) -0.4%      0.66(0.63+0.03) -0.6%       0.66(0.62+0.03) -0.6%
0001.2: rev-list --all --objects  4.12(4.05+0.05)   3.81(3.74+0.06) -7.6%***   3.50(3.43+0.05) -15.1%***   3.56(3.49+0.05) -13.7%***
---------------------------------------------------------------------------------------------------------------------------


Note that the scripts always generate the percentages and significance
w.r.t. the first column.  Comparing yours with both instead gives

Test                              jk/hash-speedup   both-hash-speedup
----------------------------------------------------------------------
0001.1: rev-list --all            0.66(0.63+0.03)   0.66(0.62+0.03) +0.0%
0001.2: rev-list --all --objects  3.50(3.43+0.05)   3.56(3.49+0.05) +1.6%*
----------------------------------------------------------------------

which is still significant (in the statistical p=5% sense).

For kicks I also ran some other tests, which generally show that the
speedups are limited to this specific workload:

Test                                 next              tr/hash-speedup         jk/hash-speedup         both-hash-speedup
------------------------------------------------------------------------------------------------------------------------
3201.1: branch --contains            0.76(0.74+0.02)   0.75(0.73+0.02) -1.0%*  0.77(0.74+0.02) +0.7%   0.76(0.73+0.02) -0.7%
4000.1: log -3000 (baseline)         0.12(0.09+0.02)   0.12(0.10+0.02) +3.2%   0.12(0.10+0.01) +0.0%   0.12(0.10+0.02) +3.2%
4000.2: log --raw -3000 (tree-only)  0.53(0.47+0.05)   0.52(0.46+0.05) -0.9%   0.53(0.46+0.06) +0.0%   0.52(0.45+0.06) -0.9%
4000.3: log -p -3000 (Myers)         2.39(2.23+0.14)   2.38(2.23+0.13) -0.4%   2.38(2.23+0.14) -0.4%   2.38(2.23+0.13) -0.5%
4000.4: log -p -3000 --histogram     2.43(2.28+0.13)   2.43(2.28+0.13) -0.0%   2.43(2.28+0.13) -0.1%   2.43(2.28+0.14) +0.1%
4000.5: log -p -3000 --patience      2.72(2.57+0.12)   2.74(2.59+0.12) +0.7%   2.72(2.57+0.13) +0.2%   2.74(2.59+0.13) +0.8%
------------------------------------------------------------------------------------------------------------------------

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Hold an 'unsigned long' chunk of the sha1 in obj_hash

2013-05-01 Thread Jeff King
On Thu, Apr 25, 2013 at 08:04:01PM +0200, Thomas Rast wrote:

> And probing lookups happen a lot: some simple instrumentation shows
> that 'git rev-list --all --objects' on my git.git,
> 
> * 19.4M objects are accessed in lookup_object and grow_object_hash
>   combined, while
> 
> * the linear probing loops in lookup_object and insert_obj_hash run a
>   total of 9.4M times.
> 
> So we take a slightly different approach, and trade some memory for
> better cache locality.  Namely, we change the hash table slots to
> contain
> 
>   struct object *obj;
>   unsigned long sha1prefix;

I think this is a clever idea, though I do worry about the extra memory
use (it's not all _that_ much in the grand scheme, but it works against
the cache locality benefit). I just posted (but forgot to cc you) a
patch that takes a different approach: to actually move the likely
candidate to the front of the collision chain. The patch is here:

  http://article.gmane.org/gmane.comp.version-control.git/223139

It does a bit better than the numbers you have here:

> I get a decent speedup, for example using git.git as a test
> repository:
> 
> Test                              before           after
> ------------------------------------------------------------------------
> 0001.1: rev-list --all            0.42(0.40+0.01)  0.41(0.39+0.01)  -1.5%**
> 0001.2: rev-list --all --objects  2.40(2.37+0.03)  2.28(2.25+0.03)  -5.0%***
> ------------------------------------------------------------------------
> 
> And even more in linux.git:
> 
> --------------------------------------------------------------------------
> 0001.1: rev-list --all            3.31(3.19+0.12)    3.21(3.09+0.11)    -2.9%**
> 0001.2: rev-list --all --objects  27.99(27.70+0.26)  25.99(25.71+0.25)  -7.1%***
> --------------------------------------------------------------------------

It _might_ still be advantageous to do your patch on top, but I suspect
it will diminish the returns from your patch (since the point of it is
to probe less far down the chain on average).

> I expected the big win to be in grow_object_hash(), but perf says that
> 'git rev-list --all --objects' spends a whopping 33.6% of its time in
> lookup_object(), and this change gets that down to 30.0%.

I'm not surprised. I spent some time trying to optimize grow_object_hash
and realized that it doesn't make much difference. The killer in
"rev-list --objects --all" is that we hit the same tree entry objects
over and over.

Another avenue I'd like to explore is actually doing a tree-diff from
the last processed commit, since we should need to examine only the
changed entries. I suspect it won't be a big benefit, though, because
even though the tree diff can happen in O(# of entries), we are trying
to beat doing an O(1) hash for each entry. So it's the same algorithmic
complexity, and it is hard to beat a few hashcmp() calls. We'll see.

-Peff


Re: [PATCH] Hold an 'unsigned long' chunk of the sha1 in obj_hash

2013-04-25 Thread Duy Nguyen
On Fri, Apr 26, 2013 at 1:04 AM, Thomas Rast  wrote:
> So we take a slightly different approach, and trade some memory for
> better cache locality.  Namely, we change the hash table slots to
> contain
>
>   struct object *obj;
>   unsigned long sha1prefix;
>
> We use this new 'sha1prefix' field to store the first part of the
> object's sha1, from which its hash table slot is computed.

Clever. I went a similar road before. But I put the whole 20-byte
sha-1 in obj_hash, which makes obj_hash even bigger and less
cache-friendly.
--
Duy


Re: [PATCH] Hold an 'unsigned long' chunk of the sha1 in obj_hash

2013-04-25 Thread Thomas Rast
Junio C Hamano  writes:

> Thomas Rast  writes:
>
>> So we take a slightly different approach, and trade some memory for
>> better cache locality.
>
> Interesting.  It feels somewhat bait-and-switch to reveal that the
> above "some" turns out to be "double" later, but the resulting code
> does not look too bad, and the numbers do not look insignificant.

Oh, that wasn't the intent.  I was too lazy to gather some memory
numbers, so here's an estimate on the local effect and some measurements
on the global one.

struct object is at least 24 bytes (flags etc. and sha1).  We grow the
hash by 2x whenever it reaches 50% load, so it is always at least 25%
loaded.

A 25% loaded hash-table used to consist of 75% pointers (8 bytes) and
25% pointers-to-struct-object (32 bytes), for 14 bytes per average slot.
Now it's 22 bytes (one more unsigned long) per slot, i.e., a 60%
increase for the data managed by the hash table.

But that's using the crudest estimates I could think of.  If we assume
that an average blob and tree is at least as big as the smallest
possible commit, we'd guess that objects are at least ~240 bytes (this
is still somewhat of an estimate and assumes that you don't go and
handcraft commits with single-digit timestamps).  So the numbers above
go up by 25% * 240 per average slot, and work out to an about 11%
overall increase.

Here are some real numbers from /usr/bin/time git rev-list --all --objects:

before:

  2.30user 0.02system 0:02.33elapsed 99%CPU (0avgtext+0avgdata 247760maxresident)k
  0inputs+0outputs (0major+17844minor)pagefaults 0swaps

after:

  2.18user 0.02system 0:02.21elapsed 99%CPU (0avgtext+0avgdata 261936maxresident)k
  0inputs+0outputs (0major+18202minor)pagefaults 0swaps

So that would be about 14MB or 5.7% of extra memory.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


Re: [PATCH] Hold an 'unsigned long' chunk of the sha1 in obj_hash

2013-04-25 Thread Junio C Hamano
Thomas Rast  writes:

> So we take a slightly different approach, and trade some memory for
> better cache locality.

Interesting.  It feels somewhat bait-and-switch to reveal that the
above "some" turns out to be "double" later, but the resulting code
does not look too bad, and the numbers do not look insignificant.


[PATCH] Hold an 'unsigned long' chunk of the sha1 in obj_hash

2013-04-25 Thread Thomas Rast
The existing obj_hash is really straightforward: it holds a struct
object * and spills into the subsequent slots (linear probing), which
is good enough because it doesn't need to support deletion.

However, this arrangement has pretty bad cache locality in the case of
collisions.  Because the sha1 is contained in the object itself, it
resides in a different memory region from the hash table.  So whenever
we have to process a hash collision, we need to access (and
potentially fetch from slower caches or memory) an object that we are
not going to use again.

And probing lookups happen a lot: some simple instrumentation shows
that 'git rev-list --all --objects' on my git.git,

* 19.4M objects are accessed in lookup_object and grow_object_hash
  combined, while

* the linear probing loops in lookup_object and insert_obj_hash run a
  total of 9.4M times.

So we take a slightly different approach, and trade some memory for
better cache locality.  Namely, we change the hash table slots to
contain

  struct object *obj;
  unsigned long sha1prefix;

We use this new 'sha1prefix' field to store the first part of the
object's sha1, from which its hash table slot is computed.  This
allows us to do two things with data that resides inside the hash
table:

* In lookup_object(), we can do a pre-filtering of the probed slots;
  the probability that we need to actually peek inside any colliding
  object(s) is very small.

* In grow_object_hash(), we actually do not need to look inside the
  objects at all.  This should give a substantial speedup during
  hashtable resizing.

The choice of 'long' makes it the same size as a pointer (to which any
smaller data type would be padded anyway) on x86 and x86_64 Linuxen,
and probably many others.  So the hash table will be twice as big as
before.

I get a decent speedup, for example using git.git as a test
repository:

Test                              before           after
------------------------------------------------------------------------
0001.1: rev-list --all            0.42(0.40+0.01)  0.41(0.39+0.01)  -1.5%**
0001.2: rev-list --all --objects  2.40(2.37+0.03)  2.28(2.25+0.03)  -5.0%***
------------------------------------------------------------------------

And even more in linux.git:

--------------------------------------------------------------------------
0001.1: rev-list --all            3.31(3.19+0.12)    3.21(3.09+0.11)    -2.9%**
0001.2: rev-list --all --objects  27.99(27.70+0.26)  25.99(25.71+0.25)  -7.1%***
--------------------------------------------------------------------------

Signed-off-by: Thomas Rast 
---

I expected the big win to be in grow_object_hash(), but perf says that
'git rev-list --all --objects' spends a whopping 33.6% of its time in
lookup_object(), and this change gets that down to 30.0%.

 object.c | 58 --
 1 file changed, 36 insertions(+), 22 deletions(-)

diff --git a/object.c b/object.c
index 20703f5..6b84c87 100644
--- a/object.c
+++ b/object.c
@@ -5,7 +5,12 @@
 #include "commit.h"
 #include "tag.h"
 
-static struct object **obj_hash;
+struct obj_hash_ent {
+   struct object *obj;
+   unsigned long sha1prefix;
+};
+
+static struct obj_hash_ent *obj_hash;
 static int nr_objs, obj_hash_size;
 
 unsigned int get_max_object_index(void)
@@ -15,7 +20,7 @@ unsigned int get_max_object_index(void)
 
 struct object *get_indexed_object(unsigned int idx)
 {
-   return obj_hash[idx];
+   return obj_hash[idx].obj;
 }
 
 static const char *object_type_strings[] = {
@@ -43,43 +48,52 @@ int type_from_string(const char *str)
die("invalid object type \"%s\"", str);
 }
 
-static unsigned int hash_obj(struct object *obj, unsigned int n)
+static unsigned long hash_sha1(const unsigned char *sha1)
 {
-   unsigned int hash;
-   memcpy(&hash, obj->sha1, sizeof(unsigned int));
-   return hash % n;
+   unsigned long sha1prefix;
+   memcpy(&sha1prefix, sha1, sizeof(unsigned long));
+   return sha1prefix;
 }
 
-static void insert_obj_hash(struct object *obj, struct object **hash, unsigned int size)
+static unsigned long hash_obj(struct object *obj)
 {
-   unsigned int j = hash_obj(obj, size);
+   return hash_sha1(obj->sha1);
+}
 
-   while (hash[j]) {
+static void insert_obj_hash_1(struct object *obj, struct obj_hash_ent *hash, unsigned int size,
+ unsigned long sha1prefix)
+{
+   unsigned int j = (unsigned int) sha1prefix % size;
+
+   while (hash[j].obj) {
j++;
if (j >= size)
j = 0;
}
-   hash[j] = obj;
+   hash[j].obj = obj;
+   hash[j].sha1prefix = sha1prefix;
 }
 
-static unsigned int hashtable_index(const unsigned char *sha1)
+static void insert_obj_hash(struct object *obj, struct obj_hash_ent *table, unsigned int size)
 {
-   unsigned int i;
-   memcpy(&i, sha1, sizeof(unsig