Re: [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable

2014-07-11 Thread Jeff King
On Fri, Jul 11, 2014 at 01:59:53AM +0100, Ramsay Jones wrote:

  The code you're touching here was trying to make sure that each commit
  gets a unique index, under the assumption that commits only get
  allocated via alloc_commit_node. But I think that assumption is wrong.
  We can also get commit objects by allocating an OBJ_NONE (e.g., via
  lookup_unknown_object) and then converting it into an OBJ_COMMIT when we
  find out what it is.
 
 Hmm, I don't know how the object is converted, but the object allocator
 is actually allocating an 'union any_object', so it's allocating more
 space than for a struct object anyway.

Right, we would generally want to avoid lookup_unknown_object where we
can for that reason.

 If you add an 'index' field to struct object, (and remove it from
 struct commit) it could be set in alloc_object_node(). ie _all_ node
 types get an index field.

That was something I considered when we did the original commit-slab
work, as it would let you do similar tricks for any set of objects, not
just commits. The reasons against it are:

  1. It would bloat the size of blob and tree structs by at least 4
 bytes (probably 8 for alignment). In most repos, commits make up
 only 10-20% of the total objects (so for linux.git, we're talking
 about 25MB extra in the working set).

  2. It makes single types sparse in the index space. In cases where you
 do just want to keep data on commits (and that is the main use),
 you end up allocating a slab entry per object, rather than per
 commit. That wastes memory (much worse than 25MB if your slab items
 are large), and reduces cache locality.

You could probably get around (2) by splitting the index space by type
and allocating them in pools, but that complicates things considerably,
as you have to guess ahead of time at reasonable maximums for each type.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable

2014-07-11 Thread Ramsay Jones
On 11/07/14 09:32, Jeff King wrote:
 On Fri, Jul 11, 2014 at 01:59:53AM +0100, Ramsay Jones wrote:
 
 The code you're touching here was trying to make sure that each commit
 gets a unique index, under the assumption that commits only get
 allocated via alloc_commit_node. But I think that assumption is wrong.
 We can also get commit objects by allocating an OBJ_NONE (e.g., via
 lookup_unknown_object) and then converting it into an OBJ_COMMIT when we
 find out what it is.

 Hmm, I don't know how the object is converted, but the object allocator
 is actually allocating an 'union any_object', so it's allocating more
 space than for a struct object anyway.
 
 Right, we would generally want to avoid lookup_unknown_object where we
 can for that reason.
 
 If you add an 'index' field to struct object, (and remove it from
 struct commit) it could be set in alloc_object_node(). ie _all_ node
 types get an index field.
 
 That was something I considered when we did the original commit-slab
 work, as it would let you do similar tricks for any set of objects, not
 just commits. The reasons against it are:
 
   1. It would bloat the size of blob and tree structs by at least 4
  bytes (probably 8 for alignment). In most repos, commits make up
  only 10-20% of the total objects (so for linux.git, we're talking
  about 25MB extra in the working set).
 
   2. It makes single types sparse in the index space. In cases where you
  do just want to keep data on commits (and that is the main use),
  you end up allocating a slab entry per object, rather than per
  commit. That wastes memory (much worse than 25MB if your slab items
  are large), and reduces cache locality.
 

Ahem, yeah I keep telling myself not to post at 2am ... ;-)

Although I haven't given this too much extra thought, I'm beginning
to think that your best course would be to revert commit 969eba63
and add an convert_object_to_commit() function to commit.c which
would set c-index. Then track down each place an OBJ_NONE gets
converted to an OBJ_COMMIT and insert a call to
convert_object_to_commit(). (which may do more than just set the
index field ...)

Hmm, I've just noticed a new series in my in-box. I guess I will
discover what you decided to do shortly ... ;-P

ATB,
Ramsay Jones



--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/2] alloc.c: remove the redundant commit_count variable

2014-07-10 Thread Ramsay Jones

The 'commit_count' static variable is used in alloc_commit_node()
to set the 'index' field of the commit structure to a unique value.
This variable assumes the same value as the 'count' field of the
'commit_state' allocator state structure, which may be used in its
place.

Signed-off-by: Ramsay Jones ram...@ramsay1.demon.co.uk
---
 alloc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/alloc.c b/alloc.c
index d7c3605..c6687f9 100644
--- a/alloc.c
+++ b/alloc.c
@@ -64,9 +64,8 @@ static struct alloc_state commit_state;
 
 void *alloc_commit_node(void)
 {
-   static int commit_count;
struct commit *c = alloc_node(commit_state, sizeof(struct commit));
-   c-index = commit_count++;
+   c-index = commit_state.count - 1;
return c;
 }
 
-- 
2.0.0
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable

2014-07-10 Thread Jeff King
On Fri, Jul 11, 2014 at 12:59:48AM +0100, Ramsay Jones wrote:

 The 'commit_count' static variable is used in alloc_commit_node()
 to set the 'index' field of the commit structure to a unique value.
 This variable assumes the same value as the 'count' field of the
 'commit_state' allocator state structure, which may be used in its
 place.

I don't think we want to do this, because there is a bug in the current
code that I have not reported yet. :)

The code you're touching here was trying to make sure that each commit
gets a unique index, under the assumption that commits only get
allocated via alloc_commit_node. But I think that assumption is wrong.
We can also get commit objects by allocating an OBJ_NONE (e.g., via
lookup_unknown_object) and then converting it into an OBJ_COMMIT when we
find out what it is.

We should be allocating a commit index to it at that point (but we don't
currently), at which point the commit_count and commit_state.count
indices will no longer be in sync.

This only happens in a few places. Most calls will probably be via
lookup_commit (which gets called when we parse_object such an unknown
object), but there are others (e.g., peel_object may fill in the type).
So we could fix it with something like:

diff --git a/commit.c b/commit.c
index fb7897c..f4f4278 100644
--- a/commit.c
+++ b/commit.c
@@ -65,8 +65,11 @@ struct commit *lookup_commit(const unsigned char *sha1)
struct commit *c = alloc_commit_node();
return create_object(sha1, OBJ_COMMIT, c);
}
-   if (!obj-type)
+   if (!obj-type) {
+   struct commit *c = (struct commit *)obj;
obj-type = OBJ_COMMIT;
+   c-index = alloc_commit_index();
+   }
return check_commit(obj, sha1, 0);
 }
 

where alloc_commit_index() would be a thin wrapper around return
commit_count++. And then find and update any other callsites that need
it.

The downside is that it's hard to find those callsites. A safer
alternative would be to speculatively allocate a commit index for all
unknown objects. Then if any other code switches the type to commit,
they already have an index. But it also means we'll have holes in our
commit index space, which makes commit-slabs less efficient.

We do not call lookup_unknown_object all that often, though, so it might
be an OK tradeoff (and in at least one case, in diff_tree_stdin, I think
the code can be converted not to use lookup_unknown_object in the first
place). So worrying about the performance may not be worth it.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable

2014-07-10 Thread Ramsay Jones
On 11/07/14 01:30, Jeff King wrote:
 On Fri, Jul 11, 2014 at 12:59:48AM +0100, Ramsay Jones wrote:
 
 The 'commit_count' static variable is used in alloc_commit_node()
 to set the 'index' field of the commit structure to a unique value.
 This variable assumes the same value as the 'count' field of the
 'commit_state' allocator state structure, which may be used in its
 place.
 
 I don't think we want to do this, because there is a bug in the current
 code that I have not reported yet. :)

:P OK, I will simply drop this one then.

 
 The code you're touching here was trying to make sure that each commit
 gets a unique index, under the assumption that commits only get
 allocated via alloc_commit_node. But I think that assumption is wrong.
 We can also get commit objects by allocating an OBJ_NONE (e.g., via
 lookup_unknown_object) and then converting it into an OBJ_COMMIT when we
 find out what it is.

Hmm, I don't know how the object is converted, but the object allocator
is actually allocating an 'union any_object', so it's allocating more
space than for a struct object anyway. If you add an 'index' field to
struct object, (and remove it from struct commit) it could be set in
alloc_object_node(). ie _all_ node types get an index field.

Hmm, that was just off the top of my head, so take with a pinch of salt.

ATB,
Ramsay Jones


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html