Re: [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable
On Fri, Jul 11, 2014 at 01:59:53AM +0100, Ramsay Jones wrote: The code you're touching here was trying to make sure that each commit gets a unique index, under the assumption that commits only get allocated via alloc_commit_node. But I think that assumption is wrong. We can also get commit objects by allocating an OBJ_NONE (e.g., via lookup_unknown_object) and then converting it into an OBJ_COMMIT when we find out what it is. Hmm, I don't know how the object is converted, but the object allocator is actually allocating an 'union any_object', so it's allocating more space than for a struct object anyway. Right, we would generally want to avoid lookup_unknown_object where we can for that reason. If you add an 'index' field to struct object, (and remove it from struct commit) it could be set in alloc_object_node(). ie _all_ node types get an index field. That was something I considered when we did the original commit-slab work, as it would let you do similar tricks for any set of objects, not just commits. The reasons against it are: 1. It would bloat the size of blob and tree structs by at least 4 bytes (probably 8 for alignment). In most repos, commits make up only 10-20% of the total objects (so for linux.git, we're talking about 25MB extra in the working set). 2. It makes single types sparse in the index space. In cases where you do just want to keep data on commits (and that is the main use), you end up allocating a slab entry per object, rather than per commit. That wastes memory (much worse than 25MB if your slab items are large), and reduces cache locality. You could probably get around (2) by splitting the index space by type and allocating them in pools, but that complicates things considerably, as you have to guess ahead of time at reasonable maximums for each type. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable
On 11/07/14 09:32, Jeff King wrote: On Fri, Jul 11, 2014 at 01:59:53AM +0100, Ramsay Jones wrote: The code you're touching here was trying to make sure that each commit gets a unique index, under the assumption that commits only get allocated via alloc_commit_node. But I think that assumption is wrong. We can also get commit objects by allocating an OBJ_NONE (e.g., via lookup_unknown_object) and then converting it into an OBJ_COMMIT when we find out what it is. Hmm, I don't know how the object is converted, but the object allocator is actually allocating an 'union any_object', so it's allocating more space than for a struct object anyway. Right, we would generally want to avoid lookup_unknown_object where we can for that reason. If you add an 'index' field to struct object, (and remove it from struct commit) it could be set in alloc_object_node(). ie _all_ node types get an index field. That was something I considered when we did the original commit-slab work, as it would let you do similar tricks for any set of objects, not just commits. The reasons against it are: 1. It would bloat the size of blob and tree structs by at least 4 bytes (probably 8 for alignment). In most repos, commits make up only 10-20% of the total objects (so for linux.git, we're talking about 25MB extra in the working set). 2. It makes single types sparse in the index space. In cases where you do just want to keep data on commits (and that is the main use), you end up allocating a slab entry per object, rather than per commit. That wastes memory (much worse than 25MB if your slab items are large), and reduces cache locality. Ahem, yeah I keep telling myself not to post at 2am ... ;-) Although I haven't given this too much extra thought, I'm beginning to think that your best course would be to revert commit 969eba63 and add an convert_object_to_commit() function to commit.c which would set c-index. Then track down each place an OBJ_NONE gets converted to an OBJ_COMMIT and insert a call to convert_object_to_commit(). (which may do more than just set the index field ...) Hmm, I've just noticed a new series in my in-box. I guess I will discover what you decided to do shortly ... ;-P ATB, Ramsay Jones -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/2] alloc.c: remove the redundant commit_count variable
The 'commit_count' static variable is used in alloc_commit_node() to set the 'index' field of the commit structure to a unique value. This variable assumes the same value as the 'count' field of the 'commit_state' allocator state structure, which may be used in its place. Signed-off-by: Ramsay Jones ram...@ramsay1.demon.co.uk --- alloc.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/alloc.c b/alloc.c index d7c3605..c6687f9 100644 --- a/alloc.c +++ b/alloc.c @@ -64,9 +64,8 @@ static struct alloc_state commit_state; void *alloc_commit_node(void) { - static int commit_count; struct commit *c = alloc_node(commit_state, sizeof(struct commit)); - c-index = commit_count++; + c-index = commit_state.count - 1; return c; } -- 2.0.0 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable
On Fri, Jul 11, 2014 at 12:59:48AM +0100, Ramsay Jones wrote: The 'commit_count' static variable is used in alloc_commit_node() to set the 'index' field of the commit structure to a unique value. This variable assumes the same value as the 'count' field of the 'commit_state' allocator state structure, which may be used in its place. I don't think we want to do this, because there is a bug in the current code that I have not reported yet. :) The code you're touching here was trying to make sure that each commit gets a unique index, under the assumption that commits only get allocated via alloc_commit_node. But I think that assumption is wrong. We can also get commit objects by allocating an OBJ_NONE (e.g., via lookup_unknown_object) and then converting it into an OBJ_COMMIT when we find out what it is. We should be allocating a commit index to it at that point (but we don't currently), at which point the commit_count and commit_state.count indices will no longer be in sync. This only happens in a few places. Most calls will probably be via lookup_commit (which gets called when we parse_object such an unknown object), but there are others (e.g., peel_object may fill in the type). So we could fix it with something like: diff --git a/commit.c b/commit.c index fb7897c..f4f4278 100644 --- a/commit.c +++ b/commit.c @@ -65,8 +65,11 @@ struct commit *lookup_commit(const unsigned char *sha1) struct commit *c = alloc_commit_node(); return create_object(sha1, OBJ_COMMIT, c); } - if (!obj-type) + if (!obj-type) { + struct commit *c = (struct commit *)obj; obj-type = OBJ_COMMIT; + c-index = alloc_commit_index(); + } return check_commit(obj, sha1, 0); } where alloc_commit_index() would be a thin wrapper around return commit_count++. And then find and update any other callsites that need it. The downside is that it's hard to find those callsites. A safer alternative would be to speculatively allocate a commit index for all unknown objects. Then if any other code switches the type to commit, they already have an index. But it also means we'll have holes in our commit index space, which makes commit-slabs less efficient. We do not call lookup_unknown_object all that often, though, so it might be an OK tradeoff (and in at least one case, in diff_tree_stdin, I think the code can be converted not to use lookup_unknown_object in the first place). So worrying about the performance may not be worth it. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/2] alloc.c: remove the redundant commit_count variable
On 11/07/14 01:30, Jeff King wrote: On Fri, Jul 11, 2014 at 12:59:48AM +0100, Ramsay Jones wrote: The 'commit_count' static variable is used in alloc_commit_node() to set the 'index' field of the commit structure to a unique value. This variable assumes the same value as the 'count' field of the 'commit_state' allocator state structure, which may be used in its place. I don't think we want to do this, because there is a bug in the current code that I have not reported yet. :) :P OK, I will simply drop this one then. The code you're touching here was trying to make sure that each commit gets a unique index, under the assumption that commits only get allocated via alloc_commit_node. But I think that assumption is wrong. We can also get commit objects by allocating an OBJ_NONE (e.g., via lookup_unknown_object) and then converting it into an OBJ_COMMIT when we find out what it is. Hmm, I don't know how the object is converted, but the object allocator is actually allocating an 'union any_object', so it's allocating more space than for a struct object anyway. If you add an 'index' field to struct object, (and remove it from struct commit) it could be set in alloc_object_node(). ie _all_ node types get an index field. Hmm, that was just off the top of my head, so take with a pinch of salt. ATB, Ramsay Jones -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html