Re: [PATCH] lookup_object: prioritize recently found objects

2013-05-03 Thread Jeff King
On Fri, May 03, 2013 at 07:59:09AM +0200, Johannes Sixt wrote:

 Am 5/2/2013 17:46, schrieb Jeff King:
  On Thu, May 02, 2013 at 09:05:01AM +0200, Johannes Sixt wrote:
  BTW, do you notice that the function is now modifying an object (the hash
  table) even though this is rather unexpected from a lookup function?
  
  I think this is fine. The function is conceptually constant from the
  outside; callers don't even know about the hash table. They just know
  that there is some mapping. It's similar to the way that lookup_commit
  will lazily allocate the struct commit. The callers do not care
  whether it exists already or not; they care that at the end of the
  function, they have a pointer to the commit. Everything else is an
  implementation detail.
 
 Can we be sure that the function is never invoked in concurrently from
 different threads? I attempted to audit code paths, but quickly gave up
 because I know too little about this machinery.

I didn't check explicitly, but in general such a program would already
need a mutex to synchronize object lookup. Not for lookup_object
specifically, but because lookup_object is mostly used to back
lookup_commit, lookup_tree, etc, which are already not re-entrant
(because they lazily insert into the hash behind the scenes).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lookup_object: prioritize recently found objects

2013-05-03 Thread Jeff King
On Fri, May 03, 2013 at 02:02:44AM -0400, Jeff King wrote:

  Can we be sure that the function is never invoked in concurrently from
  different threads? I attempted to audit code paths, but quickly gave up
  because I know too little about this machinery.
 
 I didn't check explicitly, but in general such a program would already
 need a mutex to synchronize object lookup. Not for lookup_object
 specifically, but because lookup_object is mostly used to back
 lookup_commit, lookup_tree, etc, which are already not re-entrant
 (because they lazily insert into the hash behind the scenes).

I just looked through the 25 existing calls to lookup_object. All of
them should be OK. Most of them are coupled with immediate calls to
update the hash table and/or to call read_sha1_file (which is also very
not-thread-safe). So I don't think the patch introduces any bug into the
current code.

It does introduce a potential for future bugs in concurrent code, but I
don't know that it makes a significant dent in the current codebase. We
already have a lot of non-reentrant functions, including all of the
other lookup_* functions.  Our concurrent code is typically very careful
to use a small known-safe subset of functions.

I did look originally at updating the ordering when the hash is already
being updated (i.e., to insert a new entry at the front of a collision
chain, rather than the back). However, that didn't yield nearly as much
improvement. I think partially because the locality of an insert/lookup
pair is not as good as a lookup/lookup pair. And partially because we
destroy that ordering for existing entries whenever we grow the hash
table.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lookup_object: prioritize recently found objects

2013-05-02 Thread Johannes Sixt
Am 5/1/2013 22:34, schrieb Jeff King:
  struct object *lookup_object(const unsigned char *sha1)
  {
 - unsigned int i;
 + unsigned int i, first;
   struct object *obj;
  
   if (!obj_hash)
   return NULL;
  
 - i = hashtable_index(sha1);
 + first = i = hashtable_index(sha1);
   while ((obj = obj_hash[i]) != NULL) {
   if (!hashcmp(sha1, obj-sha1))
   break;
 @@ -85,6 +85,11 @@ struct object *lookup_object(const unsigned char *sha1)
   if (i == obj_hash_size)
   i = 0;
   }
 + if (obj  i != first) {
 + struct object *tmp = obj_hash[i];
 + obj_hash[i] = obj_hash[first];
 + obj_hash[first] = tmp;
 + }
   return obj;
  }

This is one of the places where I think the code does not speak for itself
and a comment is warranted: The new if statement is not about correctness,
but about optimization:

/*
 * Move object to where we started to look for it
 * so that we do not need to walk the hash table
 * the next time we look for it.
 */

or something.

-- Hannes
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lookup_object: prioritize recently found objects

2013-05-02 Thread Jeff King
On Thu, May 02, 2013 at 08:44:07AM +0200, Johannes Sixt wrote:

 Am 5/1/2013 22:34, schrieb Jeff King:
   struct object *lookup_object(const unsigned char *sha1)
   {
  -   unsigned int i;
  +   unsigned int i, first;
  struct object *obj;
   
  if (!obj_hash)
  return NULL;
   
  -   i = hashtable_index(sha1);
  +   first = i = hashtable_index(sha1);
  while ((obj = obj_hash[i]) != NULL) {
  if (!hashcmp(sha1, obj-sha1))
  break;
  @@ -85,6 +85,11 @@ struct object *lookup_object(const unsigned char *sha1)
  if (i == obj_hash_size)
  i = 0;
  }
  +   if (obj  i != first) {
  +   struct object *tmp = obj_hash[i];
  +   obj_hash[i] = obj_hash[first];
  +   obj_hash[first] = tmp;
  +   }
  return obj;
   }
 
 This is one of the places where I think the code does not speak for itself
 and a comment is warranted: The new if statement is not about correctness,
 but about optimization:

I figured the lengthy description in the commit message would be
sufficient, but I don't mind adding something like your suggestion to
point readers of the code in the right direction when they see it.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lookup_object: prioritize recently found objects

2013-05-02 Thread Johannes Sixt
Am 5/2/2013 8:46, schrieb Jeff King:
 On Thu, May 02, 2013 at 08:44:07AM +0200, Johannes Sixt wrote:
 Am 5/1/2013 22:34, schrieb Jeff King:
  struct object *lookup_object(const unsigned char *sha1)
  {
 -   unsigned int i;
 +   unsigned int i, first;
 struct object *obj;
  
 if (!obj_hash)
 return NULL;
  
 -   i = hashtable_index(sha1);
 +   first = i = hashtable_index(sha1);
 while ((obj = obj_hash[i]) != NULL) {
 if (!hashcmp(sha1, obj-sha1))
 break;
 @@ -85,6 +85,11 @@ struct object *lookup_object(const unsigned char *sha1)
 if (i == obj_hash_size)
 i = 0;
 }
 +   if (obj  i != first) {
 +   struct object *tmp = obj_hash[i];
 +   obj_hash[i] = obj_hash[first];
 +   obj_hash[first] = tmp;
 +   }
 return obj;
  }

 This is one of the places where I think the code does not speak for itself
 and a comment is warranted: The new if statement is not about correctness,
 but about optimization:
 
 I figured the lengthy description in the commit message would be
 sufficient,

It's absolutely sufficient *if* one reads the commit message. In this
case, though it goes more like this function should be trivial, and it is
-- up to this if statement; what the heck is it good for? and the reader
is forced to dig the history.

BTW, do you notice that the function is now modifying an object (the hash
table) even though this is rather unexpected from a lookup function?

-- Hannes
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lookup_object: prioritize recently found objects

2013-05-02 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 I figured the lengthy description in the commit message would be
 sufficient, but I don't mind adding something like your suggestion to
 point readers of the code in the right direction when they see it.

Yeah, I'll squash J6t's comment in and requeue.

If somebody is learning the internals by reading code, having a
comment there would certainly help.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lookup_object: prioritize recently found objects

2013-05-02 Thread Junio C Hamano
Johannes Sixt j.s...@viscovery.net writes:

 BTW, do you notice that the function is now modifying an object (the hash
 table) even though this is rather unexpected from a lookup function?

At the philosophical level, lookup ought to be operating on a
const table.  But at the implementation level, the table does not
have to be const in the sense the implemenation language and data
structure defines.

I think this patch is a good example of that.

I am kind of surprised that Jeff's attempt to do a full LRU made
things worse, though.  The only additional code compared to swap is
a single memmove(): are we colliding too often (having to move a
lot)?

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lookup_object: prioritize recently found objects

2013-05-02 Thread Jeff King
On Thu, May 02, 2013 at 09:05:01AM +0200, Johannes Sixt wrote:

  I figured the lengthy description in the commit message would be
  sufficient,
 
 It's absolutely sufficient *if* one reads the commit message. In this
 case, though it goes more like this function should be trivial, and it is
 -- up to this if statement; what the heck is it good for? and the reader
 is forced to dig the history.

What I really didn't want to do was to rewrite the whole rationale
again. I think something like:

  /*
   * Move the found object to the front of the collision chain;
   * this is a pure optimization that can make future lookups cheaper,
   * assuming there is some time-locality to which objects are looked
   * up.
   */

would be enough, and interested parties can go to the commit history (I
tend to go there pretty rapidly if there is a chunk of code that sticks
out, but I guess not everybody does).

 BTW, do you notice that the function is now modifying an object (the hash
 table) even though this is rather unexpected from a lookup function?

I think this is fine. The function is conceptually constant from the
outside; callers don't even know about the hash table. They just know
that there is some mapping. It's similar to the way that lookup_commit
will lazily allocate the struct commit. The callers do not care
whether it exists already or not; they care that at the end of the
function, they have a pointer to the commit. Everything else is an
implementation detail.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lookup_object: prioritize recently found objects

2013-05-02 Thread Jeff King
On Thu, May 02, 2013 at 08:46:08AM -0700, Junio C Hamano wrote:

 Johannes Sixt j.s...@viscovery.net writes:
 
  BTW, do you notice that the function is now modifying an object (the hash
  table) even though this is rather unexpected from a lookup function?
 
 At the philosophical level, lookup ought to be operating on a
 const table.  But at the implementation level, the table does not
 have to be const in the sense the implemenation language and data
 structure defines.
 
 I think this patch is a good example of that.

Very much agreed.

 I am kind of surprised that Jeff's attempt to do a full LRU made
 things worse, though.  The only additional code compared to swap is
 a single memmove(): are we colliding too often (having to move a
 lot)?

I actually didn't use memmove, but a hand-rolled loop. I wonder if that
would have made the difference. It's a little tricky because you may
have to cross the mod(obj_hash_size) wrap-around. The patch I tested
was:

diff --git a/object.c b/object.c
index 1ba2083..8b2412c 100644
--- a/object.c
+++ b/object.c
@@ -85,10 +85,13 @@ struct object *lookup_object(const unsigned char *sha1)
if (i == obj_hash_size)
i = 0;
}
-   if (obj  i != first) {
-   struct object *tmp = obj_hash[i];
-   obj_hash[i] = obj_hash[first];
-   obj_hash[first] = tmp;
+   if (obj) {
+   while (i != first) {
+   int prev = i ? i - 1 : obj_hash_size - 1;
+   obj_hash[i] = obj_hash[prev];
+   i = prev;
+   }
+   obj_hash[i] = obj;
}
return obj;
 }

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] lookup_object: prioritize recently found objects

2013-05-01 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 We could instead bump X into the `i` slot, and then shift
 the whole contiguous chain down by one, resulting in:

 index   | i-1 |  i  | i+1 | i+2 |
---
 entry   ... |  A  |  X  |  B  |  C  | ...
---

 That puts our chain in true most-recently-used order.
 However, experiments show that it is not any faster (and in
 fact, is slightly slower due to the extra manipulation).

Makes quite a lot of sense to me.  Very simple, clean and nice.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html