Re: [PATCH] Hold an 'unsigned long' chunk of the sha1 in obj_hash
Jeff King writes:

> It _might_ still be advantageous to do your patch on top, but I suspect
> it will diminish the returns from your patch (since the point of it is
> to probe less far down the chain on average).

No, mine makes it slower again.  Apparently the increased size is no
longer worth it.  You'll need a wide window to read this:

Test                              next             tr/hash-speedup           jk/hash-speedup            both-hash-speedup
-------------------------------------------------------------------------------------------------------------------------
0001.1: rev-list --all            0.66(0.63+0.02)  0.66(0.63+0.03) -0.4%     0.66(0.63+0.03) -0.6%      0.66(0.62+0.03) -0.6%
0001.2: rev-list --all --objects  4.12(4.05+0.05)  3.81(3.74+0.06) -7.6%***  3.50(3.43+0.05) -15.1%***  3.56(3.49+0.05) -13.7%***

Note that the scripts always generate the percentages and significance
w.r.t. the first column.  Comparing yours with both instead gives

Test                              jk/hash-speedup  both-hash-speedup
---------------------------------------------------------------------
0001.1: rev-list --all            0.66(0.63+0.03)  0.66(0.62+0.03) +0.0%
0001.2: rev-list --all --objects  3.50(3.43+0.05)  3.56(3.49+0.05) +1.6%*

which is still significant (in the statistical p=5% sense).

For kicks I also ran some other tests, which generally show that the
speedups are limited to this specific workload:

Test                                 next             tr/hash-speedup         jk/hash-speedup         both-hash-speedup
-----------------------------------------------------------------------------------------------------------------------
3201.1: branch --contains            0.76(0.74+0.02)  0.75(0.73+0.02) -1.0%*  0.77(0.74+0.02) +0.7%   0.76(0.73+0.02) -0.7%
4000.1: log -3000 (baseline)         0.12(0.09+0.02)  0.12(0.10+0.02) +3.2%   0.12(0.10+0.01) +0.0%   0.12(0.10+0.02) +3.2%
4000.2: log --raw -3000 (tree-only)  0.53(0.47+0.05)  0.52(0.46+0.05) -0.9%   0.53(0.46+0.06) +0.0%   0.52(0.45+0.06) -0.9%
4000.3: log -p -3000 (Myers)         2.39(2.23+0.14)  2.38(2.23+0.13) -0.4%   2.38(2.23+0.14) -0.4%   2.38(2.23+0.13) -0.5%
4000.4: log -p -3000 --histogram     2.43(2.28+0.13)  2.43(2.28+0.13) -0.0%   2.43(2.28+0.13) -0.1%   2.43(2.28+0.14) +0.1%
4000.5: log -p -3000 --patience      2.72(2.57+0.12)  2.74(2.59+0.12) +0.7%   2.72(2.57+0.13) +0.2%   2.74(2.59+0.13) +0.8%
-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Hold an 'unsigned long' chunk of the sha1 in obj_hash
On Thu, Apr 25, 2013 at 08:04:01PM +0200, Thomas Rast wrote:

> And probing lookups happen a lot: some simple instrumentation shows
> that for 'git rev-list --all --objects' on my git.git,
>
> * 19.4M objects are accessed in lookup_object and grow_object_hash
>   combined, while
>
> * the linear probing loops in lookup_object and insert_obj_hash run a
>   total of 9.4M times.
>
> So we take a slightly different approach, and trade some memory for
> better cache locality.  Namely, we change the hash table slots to
> contain
>
>     struct object *obj;
>     unsigned long sha1prefix;

I think this is a clever idea, though I do worry about the extra memory
use (it's not all _that_ much in the grand scheme, but it works against
the cache locality benefit).

I just posted (but forgot to cc you) a patch that takes a different
approach: it moves the likely candidate to the front of the collision
chain.  The patch is here:

  http://article.gmane.org/gmane.comp.version-control.git/223139

It does a bit better than the numbers you have here:

> I get a decent speedup, for example using git.git as a test
> repository:
>
> Test                              before           after
> -----------------------------------------------------------------------
> 0001.1: rev-list --all            0.42(0.40+0.01)  0.41(0.39+0.01) -1.5%**
> 0001.2: rev-list --all --objects  2.40(2.37+0.03)  2.28(2.25+0.03) -5.0%***
> -----------------------------------------------------------------------
>
> And even more in linux.git:
>
> -----------------------------------------------------------------------
> 0001.1: rev-list --all            3.31(3.19+0.12)   3.21(3.09+0.11) -2.9%**
> 0001.2: rev-list --all --objects  27.99(27.70+0.26) 25.99(25.71+0.25) -7.1%***
> -----------------------------------------------------------------------

It _might_ still be advantageous to do your patch on top, but I suspect
it will diminish the returns from your patch (since the point of it is
to probe less far down the chain on average).

> I expected the big win to be in grow_object_hash(), but perf says that
> 'git rev-list --all --objects' spends a whopping 33.6% of its time in
> lookup_object(), and this change gets that down to 30.0%.

I'm not surprised.  I spent some time trying to optimize
grow_object_hash and realized that it doesn't make much difference.
The killer in "rev-list --objects --all" is that we hit the same tree
entry objects over and over.

Another avenue I'd like to explore is actually doing a tree-diff from
the last processed commit, since we should need to examine only the
changed entries.  I suspect it won't be a big benefit, though, because
even though the tree diff can happen in O(# of entries), we are trying
to beat doing an O(1) hash lookup for each entry.  So it's the same
algorithmic complexity, and it is hard to beat a few hashcmp() calls.
We'll see.

-Peff
Re: [PATCH] Hold an 'unsigned long' chunk of the sha1 in obj_hash
On Fri, Apr 26, 2013 at 1:04 AM, Thomas Rast wrote:

> So we take a slightly different approach, and trade some memory for
> better cache locality.  Namely, we change the hash table slots to
> contain
>
>     struct object *obj;
>     unsigned long sha1prefix;
>
> We use this new 'sha1prefix' field to store the first part of the
> object's sha1, from which its hash table slot is computed.

Clever.  I went a similar road before, but I put the whole 20-byte
sha-1 in obj_hash, which makes obj_hash even bigger and less
cache-friendly.
--
Duy
Re: [PATCH] Hold an 'unsigned long' chunk of the sha1 in obj_hash
Junio C Hamano writes:

> Thomas Rast writes:
>
>> So we take a slightly different approach, and trade some memory for
>> better cache locality.
>
> Interesting.  It feels somewhat bait-and-switch to reveal that the
> above "some" turns out to be "double" later, but the resulting code
> does not look too bad, and the numbers do not look insignificant.

Oh, that wasn't the intent.  I was too lazy to gather memory numbers,
so here's an estimate of the local effect and some measurements of the
global one.

struct object is at least 24 bytes (flags etc. and sha1).  We grow the
hash by 2x whenever it reaches 50% load, so it is always at least 25%
loaded.

A 25% loaded hash table used to consist of 75% pointers (8 bytes) and
25% pointers-to-struct-object (32 bytes), for 14 bytes per average
slot.  Now it's 22 bytes (one more unsigned long) per slot, i.e., a
60% increase for the data managed by the hash table.

But that's using the crudest estimates I could think of.  If we assume
that an average blob or tree is at least as big as the smallest
possible commit, we'd guess that objects are at least ~240 bytes (this
is still somewhat of an estimate and assumes that you don't go and
handcraft commits with single-digit timestamps).  So the numbers above
go up by 25% * 240 per average slot, and work out to an overall
increase of about 11%.

Here are some real numbers from /usr/bin/time git rev-list --all --objects:

before:
  2.30user 0.02system 0:02.33elapsed 99%CPU (0avgtext+0avgdata 247760maxresident)k
  0inputs+0outputs (0major+17844minor)pagefaults 0swaps

after:
  2.18user 0.02system 0:02.21elapsed 99%CPU (0avgtext+0avgdata 261936maxresident)k
  0inputs+0outputs (0major+18202minor)pagefaults 0swaps

So that would be about 14MB or 5.7% of extra memory.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
Re: [PATCH] Hold an 'unsigned long' chunk of the sha1 in obj_hash
Thomas Rast writes:

> So we take a slightly different approach, and trade some memory for
> better cache locality.

Interesting.  It feels somewhat bait-and-switch to reveal that the
above "some" turns out to be "double" later, but the resulting code
does not look too bad, and the numbers do not look insignificant.
[PATCH] Hold an 'unsigned long' chunk of the sha1 in obj_hash
The existing obj_hash is really straightforward: it holds a struct
object * and spills into the subsequent slots (linear probing), which
is good enough because it doesn't need to support deletion.

However, this arrangement has pretty bad cache locality in the case of
collisions.  Because the sha1 is contained in the object itself, it
resides in a different memory region from the hash table.  So whenever
we have to process a hash collision, we need to access (and
potentially fetch from slower caches or memory) an object that we are
not going to use again.

And probing lookups happen a lot: some simple instrumentation shows
that for 'git rev-list --all --objects' on my git.git,

* 19.4M objects are accessed in lookup_object and grow_object_hash
  combined, while

* the linear probing loops in lookup_object and insert_obj_hash run a
  total of 9.4M times.

So we take a slightly different approach, and trade some memory for
better cache locality.  Namely, we change the hash table slots to
contain

    struct object *obj;
    unsigned long sha1prefix;

We use this new 'sha1prefix' field to store the first part of the
object's sha1, from which its hash table slot is computed.  This
allows us to do two things with data that resides inside the hash
table:

* In lookup_object(), we can do a pre-filtering of the probed slots;
  the probability that we need to actually peek inside any colliding
  object(s) is very small.

* In grow_object_hash(), we do not need to look inside the objects at
  all.  This should give a substantial speedup during hashtable
  resizing.

The choice of 'long' makes it the same size as a pointer (to which any
smaller data type would be padded anyway) on x86 and x86_64 Linuxen,
and probably many others.  So the hash table will be twice as big as
before.
I get a decent speedup, for example using git.git as a test
repository:

Test                              before           after
-----------------------------------------------------------------------
0001.1: rev-list --all            0.42(0.40+0.01)  0.41(0.39+0.01) -1.5%**
0001.2: rev-list --all --objects  2.40(2.37+0.03)  2.28(2.25+0.03) -5.0%***
-----------------------------------------------------------------------

And even more in linux.git:

-----------------------------------------------------------------------
0001.1: rev-list --all            3.31(3.19+0.12)   3.21(3.09+0.11) -2.9%**
0001.2: rev-list --all --objects  27.99(27.70+0.26) 25.99(25.71+0.25) -7.1%***
-----------------------------------------------------------------------

Signed-off-by: Thomas Rast
---
I expected the big win to be in grow_object_hash(), but perf says that
'git rev-list --all --objects' spends a whopping 33.6% of its time in
lookup_object(), and this change gets that down to 30.0%.

 object.c | 58 ++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 36 insertions(+), 22 deletions(-)

diff --git a/object.c b/object.c
index 20703f5..6b84c87 100644
--- a/object.c
+++ b/object.c
@@ -5,7 +5,12 @@
 #include "commit.h"
 #include "tag.h"
 
-static struct object **obj_hash;
+struct obj_hash_ent {
+	struct object *obj;
+	unsigned long sha1prefix;
+};
+
+static struct obj_hash_ent *obj_hash;
 static int nr_objs, obj_hash_size;
 
 unsigned int get_max_object_index(void)
@@ -15,7 +20,7 @@ unsigned int get_max_object_index(void)
 
 struct object *get_indexed_object(unsigned int idx)
 {
-	return obj_hash[idx];
+	return obj_hash[idx].obj;
 }
 
 static const char *object_type_strings[] = {
@@ -43,43 +48,52 @@ int type_from_string(const char *str)
 	die("invalid object type \"%s\"", str);
 }
 
-static unsigned int hash_obj(struct object *obj, unsigned int n)
+static unsigned long hash_sha1(const unsigned char *sha1)
 {
-	unsigned int hash;
-	memcpy(&hash, obj->sha1, sizeof(unsigned int));
-	return hash % n;
+	unsigned long sha1prefix;
+	memcpy(&sha1prefix, sha1, sizeof(unsigned long));
+	return sha1prefix;
 }
 
-static void insert_obj_hash(struct object *obj, struct object **hash, unsigned int size)
+static unsigned long hash_obj(struct object *obj)
 {
-	unsigned int j = hash_obj(obj, size);
+	return hash_sha1(obj->sha1);
+}
 
-	while (hash[j]) {
+static void insert_obj_hash_1(struct object *obj,
+			      struct obj_hash_ent *hash, unsigned int size,
+			      unsigned long sha1prefix)
+{
+	unsigned int j = (unsigned int) sha1prefix % size;
+
+	while (hash[j].obj) {
 		j++;
 		if (j >= size)
 			j = 0;
 	}
-	hash[j] = obj;
+	hash[j].obj = obj;
+	hash[j].sha1prefix = sha1prefix;
 }
 
-static unsigned int hashtable_index(const unsigned char *sha1)
+static void insert_obj_hash(struct object *obj, struct obj_hash_ent *table, unsigned int size)
 {
-	unsigned int i;
-	memcpy(&i, sha1, sizeof(unsig