Re: [PATCH] lookup_object: prioritize recently found objects
On Fri, May 03, 2013 at 02:02:44AM -0400, Jeff King wrote: > > Can we be sure that the function is never invoked in concurrently from > > different threads? I attempted to audit code paths, but quickly gave up > > because I know too little about this machinery. > > I didn't check explicitly, but in general such a program would already > need a mutex to synchronize object lookup. Not for lookup_object > specifically, but because lookup_object is mostly used to back > lookup_commit, lookup_tree, etc, which are already not re-entrant > (because they lazily insert into the hash behind the scenes). I just looked through the 25 existing calls to lookup_object. All of them should be OK. Most of them are coupled with immediate calls to update the hash table and/or to call read_sha1_file (which is also very not-thread-safe). So I don't think the patch introduces any bug into the current code. It does introduce a potential for future bugs in concurrent code, but I don't know that it makes a significant dent in the current codebase. We already have a lot of non-reentrant functions, including all of the other lookup_* functions. Our concurrent code is typically very careful to use a small known-safe subset of functions. I did look originally at updating the ordering when the hash is already being updated (i.e., to insert a new entry at the front of a collision chain, rather than the back). However, that didn't yield nearly as much improvement. I think partially because the locality of an insert/lookup pair is not as good as a lookup/lookup pair. And partially because we destroy that ordering for existing entries whenever we grow the hash table. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] lookup_object: prioritize recently found objects
On Fri, May 03, 2013 at 07:59:09AM +0200, Johannes Sixt wrote: > Am 5/2/2013 17:46, schrieb Jeff King: > > On Thu, May 02, 2013 at 09:05:01AM +0200, Johannes Sixt wrote: > >> BTW, do you notice that the function is now modifying an object (the hash > >> table) even though this is rather unexpected from a "lookup" function? > > > > I think this is fine. The function is conceptually constant from the > > outside; callers don't even know about the hash table. They just know > > that there is some mapping. It's similar to the way that lookup_commit > > will lazily allocate the "struct commit". The callers do not care > > whether it exists already or not; they care that at the end of the > > function, they have a pointer to the commit. Everything else is an > > implementation detail. > > Can we be sure that the function is never invoked in concurrently from > different threads? I attempted to audit code paths, but quickly gave up > because I know too little about this machinery. I didn't check explicitly, but in general such a program would already need a mutex to synchronize object lookup. Not for lookup_object specifically, but because lookup_object is mostly used to back lookup_commit, lookup_tree, etc, which are already not re-entrant (because they lazily insert into the hash behind the scenes). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] lookup_object: prioritize recently found objects
Am 5/2/2013 17:46, schrieb Jeff King: > On Thu, May 02, 2013 at 09:05:01AM +0200, Johannes Sixt wrote: >> BTW, do you notice that the function is now modifying an object (the hash >> table) even though this is rather unexpected from a "lookup" function? > > I think this is fine. The function is conceptually constant from the > outside; callers don't even know about the hash table. They just know > that there is some mapping. It's similar to the way that lookup_commit > will lazily allocate the "struct commit". The callers do not care > whether it exists already or not; they care that at the end of the > function, they have a pointer to the commit. Everything else is an > implementation detail. Can we be sure that the function is never invoked in concurrently from different threads? I attempted to audit code paths, but quickly gave up because I know too little about this machinery. -- Hannes -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] lookup_object: prioritize recently found objects
On Thu, May 02, 2013 at 08:46:08AM -0700, Junio C Hamano wrote: > Johannes Sixt writes: > > > BTW, do you notice that the function is now modifying an object (the hash > > table) even though this is rather unexpected from a "lookup" function? > > At the philosophical level, "lookup" ought to be operating on a > "const" table. But at the implementation level, the table does not > have to be "const" in the sense the implemenation language and data > structure defines. > > I think this patch is a good example of that. Very much agreed. > I am kind of surprised that Jeff's attempt to do a full LRU made > things worse, though. The only additional code compared to swap is > a single memmove(): are we colliding too often (having to move a > lot)? I actually didn't use memmove, but a hand-rolled loop. I wonder if that would have made the difference. It's a little tricky because you may have to cross the mod(obj_hash_size) wrap-around. The patch I tested was: diff --git a/object.c b/object.c index 1ba2083..8b2412c 100644 --- a/object.c +++ b/object.c @@ -85,10 +85,13 @@ struct object *lookup_object(const unsigned char *sha1) if (i == obj_hash_size) i = 0; } - if (obj && i != first) { - struct object *tmp = obj_hash[i]; - obj_hash[i] = obj_hash[first]; - obj_hash[first] = tmp; + if (obj) { + while (i != first) { + int prev = i ? i - 1 : obj_hash_size - 1; + obj_hash[i] = obj_hash[prev]; + i = prev; + } + obj_hash[i] = obj; } return obj; } -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] lookup_object: prioritize recently found objects
On Thu, May 02, 2013 at 09:05:01AM +0200, Johannes Sixt wrote: > > I figured the lengthy description in the commit message would be > > sufficient, > > It's absolutely sufficient *if* one reads the commit message. In this > case, though it goes more like "this function should be trivial, and it is > -- up to this if statement; what the heck is it good for?" and the reader > is forced to dig the history. What I really didn't want to do was to rewrite the whole rationale again. I think something like: /* * Move the found object to the front of the collision chain; * this is a pure optimization that can make future lookups cheaper, * assuming there is some time-locality to which objects are looked * up. */ would be enough, and interested parties can go to the commit history (I tend to go there pretty rapidly if there is a chunk of code that sticks out, but I guess not everybody does). > BTW, do you notice that the function is now modifying an object (the hash > table) even though this is rather unexpected from a "lookup" function? I think this is fine. The function is conceptually constant from the outside; callers don't even know about the hash table. They just know that there is some mapping. It's similar to the way that lookup_commit will lazily allocate the "struct commit". The callers do not care whether it exists already or not; they care that at the end of the function, they have a pointer to the commit. Everything else is an implementation detail. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] lookup_object: prioritize recently found objects
Johannes Sixt writes: > BTW, do you notice that the function is now modifying an object (the hash > table) even though this is rather unexpected from a "lookup" function? At the philosophical level, "lookup" ought to be operating on a "const" table. But at the implementation level, the table does not have to be "const" in the sense the implemenation language and data structure defines. I think this patch is a good example of that. I am kind of surprised that Jeff's attempt to do a full LRU made things worse, though. The only additional code compared to swap is a single memmove(): are we colliding too often (having to move a lot)? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] lookup_object: prioritize recently found objects
Jeff King writes: > I figured the lengthy description in the commit message would be > sufficient, but I don't mind adding something like your suggestion to > point readers of the code in the right direction when they see it. Yeah, I'll squash J6t's comment in and requeue. If somebody is learning the internals by reading code, having a comment there would certainly help. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] lookup_object: prioritize recently found objects
Am 5/2/2013 8:46, schrieb Jeff King: > On Thu, May 02, 2013 at 08:44:07AM +0200, Johannes Sixt wrote: >> Am 5/1/2013 22:34, schrieb Jeff King: >>> struct object *lookup_object(const unsigned char *sha1) >>> { >>> - unsigned int i; >>> + unsigned int i, first; >>> struct object *obj; >>> >>> if (!obj_hash) >>> return NULL; >>> >>> - i = hashtable_index(sha1); >>> + first = i = hashtable_index(sha1); >>> while ((obj = obj_hash[i]) != NULL) { >>> if (!hashcmp(sha1, obj->sha1)) >>> break; >>> @@ -85,6 +85,11 @@ struct object *lookup_object(const unsigned char *sha1) >>> if (i == obj_hash_size) >>> i = 0; >>> } >>> + if (obj && i != first) { >>> + struct object *tmp = obj_hash[i]; >>> + obj_hash[i] = obj_hash[first]; >>> + obj_hash[first] = tmp; >>> + } >>> return obj; >>> } >> >> This is one of the places where I think the code does not speak for itself >> and a comment is warranted: The new if statement is not about correctness, >> but about optimization: > > I figured the lengthy description in the commit message would be > sufficient, It's absolutely sufficient *if* one reads the commit message. In this case, though it goes more like "this function should be trivial, and it is -- up to this if statement; what the heck is it good for?" and the reader is forced to dig the history. BTW, do you notice that the function is now modifying an object (the hash table) even though this is rather unexpected from a "lookup" function? -- Hannes -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] lookup_object: prioritize recently found objects
On Thu, May 02, 2013 at 08:44:07AM +0200, Johannes Sixt wrote: > Am 5/1/2013 22:34, schrieb Jeff King: > > struct object *lookup_object(const unsigned char *sha1) > > { > > - unsigned int i; > > + unsigned int i, first; > > struct object *obj; > > > > if (!obj_hash) > > return NULL; > > > > - i = hashtable_index(sha1); > > + first = i = hashtable_index(sha1); > > while ((obj = obj_hash[i]) != NULL) { > > if (!hashcmp(sha1, obj->sha1)) > > break; > > @@ -85,6 +85,11 @@ struct object *lookup_object(const unsigned char *sha1) > > if (i == obj_hash_size) > > i = 0; > > } > > + if (obj && i != first) { > > + struct object *tmp = obj_hash[i]; > > + obj_hash[i] = obj_hash[first]; > > + obj_hash[first] = tmp; > > + } > > return obj; > > } > > This is one of the places where I think the code does not speak for itself > and a comment is warranted: The new if statement is not about correctness, > but about optimization: I figured the lengthy description in the commit message would be sufficient, but I don't mind adding something like your suggestion to point readers of the code in the right direction when they see it. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] lookup_object: prioritize recently found objects
Am 5/1/2013 22:34, schrieb Jeff King: > struct object *lookup_object(const unsigned char *sha1) > { > - unsigned int i; > + unsigned int i, first; > struct object *obj; > > if (!obj_hash) > return NULL; > > - i = hashtable_index(sha1); > + first = i = hashtable_index(sha1); > while ((obj = obj_hash[i]) != NULL) { > if (!hashcmp(sha1, obj->sha1)) > break; > @@ -85,6 +85,11 @@ struct object *lookup_object(const unsigned char *sha1) > if (i == obj_hash_size) > i = 0; > } > + if (obj && i != first) { > + struct object *tmp = obj_hash[i]; > + obj_hash[i] = obj_hash[first]; > + obj_hash[first] = tmp; > + } > return obj; > } This is one of the places where I think the code does not speak for itself and a comment is warranted: The new if statement is not about correctness, but about optimization: /* * Move object to where we started to look for it * so that we do not need to walk the hash table * the next time we look for it. */ or something. -- Hannes -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] lookup_object: prioritize recently found objects
Jeff King writes: > We could instead bump X into the `i` slot, and then shift > the whole contiguous chain down by one, resulting in: > > index | i-1 | i | i+1 | i+2 | >--- > entry ... | A | X | B | C | ... >--- > > That puts our chain in true most-recently-used order. > However, experiments show that it is not any faster (and in > fact, is slightly slower due to the extra manipulation). Makes quite a lot of sense to me. Very simple, clean and nice. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html