[PATCH 14/17] commit-graph: load split commit-graph files
From: Derrick Stolee Starting with commit-graph, load commit-graph files in a sequence as follows: commit-graph commit-graph-1 commit-graph-2 ... commit-graph-N This creates N + 1 files in order. Signed-off-by: Derrick Stolee --- commit-graph.c | 39 ++- 1 file changed, 34 insertions(+), 5 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index f790f44a9c..5f6193277a 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -45,6 +45,12 @@ char *get_commit_graph_filename(const char *obj_dir) return xstrfmt("%s/info/commit-graph", obj_dir); } +static char *get_split_graph_filename(const char *obj_dir, + uint32_t split_count) +{ + return xstrfmt("%s/info/commit-graphs/commit-graph-%d", obj_dir, split_count); +} + static uint8_t oid_version(void) { return 1; @@ -289,15 +295,31 @@ static struct commit_graph *load_commit_graph_one(const char *graph_file) static void prepare_commit_graph_one(struct repository *r, const char *obj_dir) { char *graph_name; + uint32_t split_count = 1; + struct commit_graph *g; if (r->objects->commit_graph) return; graph_name = get_commit_graph_filename(obj_dir); - r->objects->commit_graph = - load_commit_graph_one(graph_name); - + g = load_commit_graph_one(graph_name); FREE_AND_NULL(graph_name); + + while (g) { + g->base_graph = r->objects->commit_graph; + + if (g->base_graph) + g->num_commits_in_base = g->base_graph->num_commits + + g->base_graph->num_commits_in_base;; + + r->objects->commit_graph = g; + + graph_name = get_split_graph_filename(obj_dir, split_count); + g = load_commit_graph_one(graph_name); + FREE_AND_NULL(graph_name); + + split_count++; + } } /* @@ -411,8 +433,15 @@ static struct commit_list **insert_parent_or_die(struct repository *r, static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos) { - const unsigned char *commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * pos; - item->graph_pos = pos + g->num_commits_in_base; + const unsigned char *commit_data; + + if (pos < g->num_commits_in_base) { + fill_commit_graph_info(item, g->base_graph, pos); + return; + } + + commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * (pos - g->num_commits_in_base); + item->graph_pos = pos; item->generation = get_be32(commit_data + g->hash_len + 8) >> 2; } -- gitgitgadget
[PATCH 01/17] commit-graph: fix the_repository reference
From: Derrick Stolee The parse_commit_buffer() method takes a repository pointer, so it should not refer to the_repository anymore. Signed-off-by: Derrick Stolee --- commit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/commit.c b/commit.c index a5333c7ac6..e4d1233226 100644 --- a/commit.c +++ b/commit.c @@ -443,7 +443,7 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b item->date = parse_commit_date(bufptr, tail); if (check_graph) - load_commit_graph_info(the_repository, item); + load_commit_graph_info(r, item); return 0; } -- gitgitgadget
Re: [PATCH 01/19] revision.h: avoid bit fields in struct rev_info
On 5/8/2019 10:41 AM, Duy Nguyen wrote: > On Wed, May 8, 2019 at 9:07 PM Derrick Stolee wrote: >> >> On 5/8/2019 7:12 AM, Nguyễn Thái Ngọc Duy wrote: >>> Bitfield addresses cannot be passed around in a pointer. This makes it >>> hard to use parse-options to set/unset them. Turn this struct to >>> normal integers. This of course increases the size of this struct >>> multiple times, but since we only have a handful of rev_info variables >>> around, memory consumption is not at all a concern. >> >> I think you are right that this memory trade-off shouldn't be a problem. >> >> What worries me instead is that we are using an "internal" data structure >> for option parsing. Would it make more sense to create a struct for use >> in the parse_opts method and a method that translates those options into >> the bitfield in struct rev_info? > > But we are doing that now (option parsing) using the same data > structure. Why would changing from a custom parser to parse_options() > affect what fields it should or should not touch in rev_info? Genuine > question. Maybe you could elaborate more about "internal". I probably > missed something. Or maybe this is a good opportunity to separate > intermediate option parsing variables from the rest of rev_info? You're right. I was unclear. rev_info stores a lot of data. Some of the fields are important in-memory structures that are crucial to the workings of revision.c and are never used by builtin/rev-list.c. Combining this purpose with the option parsing seems smelly to me. Thinking more on it, I would prefer a more invasive change that may pay off in the long term. These options, along with the "starting list" values, could be extracted to a 'struct rev_options' that contains these integers and the commit list. Then your option parsing changes could be limited to a rev_options struct, which is later inserted into a rev_info struct during setup_revisions(). Generally, the rev_info struct has too many members and could be split into smaller pieces according to purpose. I created the topo_walk_info struct as a way to not make the situation worse, but doesn't fix existing pain. My ramblings are mostly complaining about old code that grew organically across many many quality additions. It is definitely hard to understand the revision-walking code, and perhaps it would be easier to understand with a little more structure. The biggest issue with my suggestion is that it requires changing the consumers of the options, as they would no longer live directly on the rev_info struct. That would be a big change, even if it could be done with string replacement. Thanks, -Stolee
Re: [PATCH 1/2] midx: pass a repository pointer
On 5/7/2019 4:31 AM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" writes: > >> From: Derrick Stolee >> >> Much of the multi-pack-index code focuses on the multi_pack_index >> struct, and so we only pass a pointer to the current one. However, >> we will insert a dependency on the packed_git linked list in a >> future change, so we will need a repository reference. Inserting >> these parameters is a significant enough change to split out. >> >> Signed-off-by: Derrick Stolee >> --- > > This is a good move in the loger term, but not a happy thing to do > while your "expire" topic is still in flight, as the impact from > updating the signature of prepare_midx_pack() and friends will break > new callers in expire_midx_packs() etc. > > I am tempted to queue this and eject ds/midx-expire-repack for now, > while checking how that topic would look like when rebased on top of > these two patches. We'll see. Sorry for the noise. I consider this series a higher priority, as it fixes a problem with the feature in existing releases. Thanks, -Stolee
Re: [PATCH v3] repack: enable bitmaps by default on bare repos
On 5/8/2019 3:11 AM, Jeff King wrote: > On Tue, May 07, 2019 at 10:12:06AM +0200, Ævar Arnfjörð Bjarmason wrote: > >>> I think we'd want a way to tell the bitmap code to update our progress >>> meter as it traverses (both single objects, but also taking into account >>> when it finds a bitmap and then suddenly bumps the value by a large >>> amount). >> >> Not splitting it will fix the progress bar stalling, so it fixes the >> problem that the user is wondering if the command is entirely hanging. >> >> But I was hoping to give the user an idea of roughly where we're >> spending our time, e.g. so you can see how much the pack.useSparse >> setting is helping (or not). > > Yeah, I think that's a bigger and more complicated problem. I admit that > my main annoyance is just the stall while we fill in the bitmaps (and > it's easy because the bitmap traversal is the same unit of work as a > regular traversal). The pack.useSparse setting also speeds up a section that is not marked by progress: that portion usually is walking all UNINTERESTING trees and the"Enumerating Objects" progress is just for walking the INTERESTING objects. >> So something where we report sub-progress as we go along, and perhaps >> print some brief summary at the end if it took long enough, e.g.: >> >> Enumerating Objects (X^1%) => Marking trees (Y^1%) >> Enumerating Objects (X^2%) => Calculating bitmaps (Y^2%) I like this idea for splitting the "normal" mechanism, too: Enumerating Objects (X^1%) => Marking trees (Y^1%) Enumerating Objects (X^2%) => Enumerating objects to pack (Y^2%) >> I.e. bringing the whole "nested" trace2 regions full circle with the >> progress bar where we could elect to trace/show some of that info, and >> then you could turn on some trace2 mode/verbose progress to see more. > > I do wonder if this really needs to be part of the progress bar. The > goal of the progress bar is to give the user a sense that work is > happening, and (if possible, but not for "enumerating") an idea of when > it might finish. If the trace code can already do detailed timings, then > shouldn't we just be encouraging people to use that? The problem I've seen (without bitmaps) is that running `git push` can take a while before _any_ progress is listed. Good news is: `pack.useSparse` fixed our push problem in the Windows OS repo. The end-to-end time for `git push` sped up by 7.7x with the change, and this "blank" time is too fast for users to notice. Updating the progress could help in cases without pack.useSparse. Thanks, -Stolee
Re: [PATCH 01/19] revision.h: avoid bit fields in struct rev_info
On 5/8/2019 7:12 AM, Nguyễn Thái Ngọc Duy wrote: > Bitfield addresses cannot be passed around in a pointer. This makes it > hard to use parse-options to set/unset them. Turn this struct to > normal integers. This of course increases the size of this struct > multiple times, but since we only have a handful of rev_info variables > around, memory consumption is not at all a concern. I think you are right that this memory trade-off shouldn't be a problem. What worries me instead is that we are using an "internal" data structure for option parsing. Would it make more sense to create a struct for use in the parse_opts method and a method that translates those options into the bitfield in struct rev_info? Thanks, -Stolee
Re: [PATCH] commit-graph: fix memory leak
On 5/6/2019 5:58 PM, Emily Shaffer wrote: > Hi, > > This change looks good to me, and like good evidence for the benefits of > automated tooling :) Same here! Keep up the great work here. -Stolee
Re: [PATCH v3 0/6] Create commit-graph file format v2
On 5/6/2019 4:27 AM, Christian Couder wrote: > On Fri, May 3, 2019 at 3:44 PM Ævar Arnfjörð Bjarmason > wrote: >> 1) We can stat() the "commit-graphs" directory to see if there's any >>new/deleted ones (dir mtime changed), similar to what we do for the >>untracked cache, and can (but I don't think we do...) do for packs/* >>and objects/??/. >> >>As opposed to ".git/objects/info" itself which e.g. has the "packs", >>"alternates" etc. files (can still do it, but more false positives) > > About incremental commit-graph files and alternates, I wonder if they > could work well together. The main use case would be for servers that > use a common repo for all the forks. We use a "shared object cache" in VFS for Git, implemented as an alternate, so all enlistments share prefetch packs, multi-pack-indexes, and commit-graph files. This is something we are very focused on. Thanks, -Stolee
Re: [PATCH v3 0/6] Create commit-graph file format v2
On 5/3/2019 10:16 AM, SZEDER Gábor wrote: > On Fri, May 03, 2019 at 08:47:25AM -0400, Derrick Stolee wrote: >> It would be much simpler to restrict the model. Your idea of changing >> the file name is the inspiration here. >> >> * The "commit-graph" file is the base commit graph. It is always >> closed under reachability (if a commit exists in this file, then >> its parents are also in this file). We will also consider this to >> be "commit-graph-0". >> >> * A commit-graph- exists, then we check for the existence of >> commit-graph-. This file can contain commits whose parents >> are in any smaller file. >> >> I think this resolves the issue of back-compat without updating >> the file format: >> >> 1. Old clients will never look at commit-graph-N, so they will >>never complain about an "incomplete" file. >> >> 2. If we always open a read handle as we move up the list, then >>a "merge and collapse" write to commit-graph-N will not >>interrupt an existing process reading that file. > > What if a process reading the commit-graph files runs short on file > descriptors and has to close some of them, while a second process is > merging commit-graph files? We will want to keep the number small so we never recycle the file handles. Instead, we will keep them open for the entire process. The strategies for creating these graphs should include a "merge" strategy that keeps the number of commit-graph files very small (fewer than 5 should be sufficient). >> I'll start hacking on this model. > > Have fun! :) > > > Semi-related, but I'm curious: what are your plans for 'struct > commit's 'graph_pos' field, and how will it work with multiple > commit-graph files? Since we have a predefined "sequence" of graphs, the graph_pos will be the position in the "meta-order" given by concatenating the commit lists from each commit-graph. We then navigate to a commit in O(num graphs) instead of O(1). In the commit-graph format, we will use this "meta-order" number to refer to parent positions. > In particular: currently we use this 'graph_pos' field as an index > into the Commit Data chunk to find the metadata associated with a > given commit object. But we could add any commit-specific metadata in > a new chunk, being an array ordered by commit OID, and then use > 'graph_pos' as an index into this chunk as well. I find this quite > convenient. However, with mulitple commit-graph files there will be > multiple arrays... Yes, this will continue to be useful*. To find the position inside a specific commit-graph-N file, take the graph_pos and subtract the number of commits in the "lower" commit-graph files. * For example, this meta-data information is necessary for the Bloom filter data [1]. Thanks, -Stolee [1] https://public-inbox.org/git/61559c5b-546e-d61b-d2e1-68de692f5...@gmail.com/
Re: [PATCH v3 0/6] Create commit-graph file format v2
On 5/2/2019 2:02 PM, Ævar Arnfjörð Bjarmason wrote: > > But those are separate from any back-compat concerns, which is what I > think makes sense to focus on now. Thinking more on this topic, I think I have a way to satisfy _all_ of your concerns by simplifying the plan for incremental commit-graph files. My initial plan was to have the "commit-graph" file always be the "tip" file, and it would point to some number of "graph-{hash}.graph" files. Then, we would have some set of strategies to decide when we create a new .graph file or when we compact the files down into the "commit-graph" file. This has several issues regarding race conditions that I had not yet resolved (and maybe would always have problems). It would be much simpler to restrict the model. Your idea of changing the file name is the inspiration here. * The "commit-graph" file is the base commit graph. It is always closed under reachability (if a commit exists in this file, then its parents are also in this file). We will also consider this to be "commit-graph-0". * A commit-graph- exists, then we check for the existence of commit-graph-. This file can contain commits whose parents are in any smaller file. I think this resolves the issue of back-compat without updating the file format: 1. Old clients will never look at commit-graph-N, so they will never complain about an "incomplete" file. 2. If we always open a read handle as we move up the list, then a "merge and collapse" write to commit-graph-N will not interrupt an existing process reading that file. I'll start hacking on this model. Thanks, -Stolee
Re: [PATCH v3 0/6] Create commit-graph file format v2
On 5/1/2019 4:25 PM, Ævar Arnfjörð Bjarmason wrote: > I won't repeat my outstanding v2 feedback about v1 & v2 > incompatibilities, except to say that I'd in principle be fine with > having a v2 format the way this series is adding it. I.e. saying "here > it is, it's never written by default, we'll figure out these compat > issues later". > > My only objection/nit on that point would be that the current > docs/commit messages should make some mention of the really bad > interactions between v1 and v2 on different git versions. Good idea to add some warnings in the docs to say something like "version 2 is not supported by Git 2.2x and earlier". > However, having written this again I really don't understand why we need > a v2 of this format at all. [snip] > How about we instead just don't change the header? I.e.: > > * Let's just live with "1" as the marker for SHA-1. > >Yeah it would be cute to use 0x73686131 instead like "struct >git_hash_algo", but we can live with a 1=0x73686131 ("sha1"), >2=0x73323536 ("s256") mapping somewhere. It's not like we're going to >be running into the 255 limit of hash algorithms Git will support any >time soon. This was the intended direction of using the 1-byte value before, but we have a preferred plan to use the 4-byte value in all future file formats. > * Don't add the reachability index version *to the header* or change >the reserved byte to be an error (see [1] again). Since we can make the "corrected commit date" offset for a commit be strictly larger than the offset of a parent, we can make it so an old client will not give incorrect values when we use the new values. The only downside would be that we would fail on 'git commit-graph verify' since the offsets are not actually generation numbers in all cases. > Instead we just add these things to new "chunks" as appropriate. As this > patch of mine shows we can easily do that, and it doesn't error out on > any existing version of git: > https://github.com/avar/git/commit/3fca63e12a9d38867d4bc0a8a25d419c00a09d95 I like the idea of a "metadata" chunk. This can be useful for a lot of things. If we start the chunk with a "number of items" and only append items to the list, we can dynamically grow the chunk as we add values. > I now can't imagine a situation where we'd ever need to change the > format. We have 32 bits of chunk ids to play with, and can have 255 of > these chunks at a time, and unknown chunks are ignored by existing > versions and future version. The solutions you have discussed work for 2 of the 3 problems at hand. The incremental file format is a high-value feature, but _would_ break existing clients if they don't understand the extra data. Unless I am missing something for how to succeed here. > 1. See feedback on the v2 patch in >https://public-inbox.org/git/87muk6q98k@evledraar.gmail.com/ My response [2] to that message includes the discussion of the incremental file format. [2] https://public-inbox.org/git/87muk6q98k@evledraar.gmail.com/
Re: [PATCH v3 6/6] commit-graph: remove Future Work section
On 5/1/2019 10:58 AM, Ævar Arnfjörð Bjarmason wrote: > > On Wed, May 01 2019, Derrick Stolee via GitGitGadget wrote: > >> The commit-graph feature began with a long list of planned >> benefits, most of which are now complete. The future work >> section has only a few items left. >> >> As for making more algorithms aware of generation numbers, >> some are only waiting for generation number v2 to ensure the >> performance matches the existing behavior using commit date. >> >> It is unlikely that we will ever send a commit-graph file >> as part of the protocol, since we would need to verify the >> data, and that is as expensive as writing a commit-graph from >> scratch. If we want to start trusting remote content, then >> that item can be investigated again. > > My best of 3 times for "write" followed by "verify" on linux.git are > 8.7/7.9 real/user for "write" and 5.2/4.9 real/user for "write". > > So that's a reduction of ~40%. I have another big in-house repo where I > get similar numbers of 17/16 for "write" and 10/9 for "verify". Both for > a commit-graph file on the order of 50MB where it would be quicker for > me to download and verify it if the protocol supported it. Keep in mind that your first "write" may have warmed up the file system and your pack-files parsed faster the second time around. You are right though, 'verify' doesn't do these things: 1. Sort a list of OIDs. 2. Write a file. And perhaps some other things. I should mean that "the main task of 'git commit-graph verify' is to parse commits from the object store, and this is the most expensive operation in 'git commit-graph write'." Thanks, -Stolee
Re: [PATCH v3 5/6] commit-graph: implement file format version 2
On 5/1/2019 3:12 PM, Ævar Arnfjörð Bjarmason wrote: > > OK, let's try that then, on top of this series: > > diff --git a/commit-graph.c b/commit-graph.c > index 5eebba6a0f..36c8cdb950 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -1127,7 +1127,7 @@ int write_commit_graph(const char *obj_dir, > case 2: > hashwrite_u8(f, num_chunks); > hashwrite_u8(f, 1); /* reachability index version */ > - hashwrite_u8(f, 0); /* unused padding byte */ > + hashwrite_u8(f, 1); /* unused padding byte */ > > Then: > > $ ~/g/git/git --exec-path=$PWD commit-graph write --version=2; > ~/g/git/git --exec-path=$PWD status > Expanding reachable commits in commit graph: 100% (201645/201645), done. > Computing commit graph generation numbers: 100% (200556/200556), done. > error: unsupported value in commit-graph header > HEAD detached at pr-112/derrickstolee/graph/v2-head-v3 > > So we'll error out in the same way as if "2.0" was changed to "3.0" with > this "2.1" change, just with a more cryptic error message on this new v2 > code. > > I don't see how this is meaningfully different from just bumping the > version to "3". We abort parsing the graph just like with major version > changes. Having a non-zero value here doesn't really mean "2.1" or "3". But I understand your apprehension. I'm currently working on building the incremental file format, based on this series. This "unused" byte will be used to say "how many base commit-graph files does this graph have?" If non-zero, we do not currently understand how to stitch these files together into a "combined" graph at run time, so we should fail. If we should never have an intermediate version of Git that doesn't understand this byte, then this series can wait until that feature is ready. Thanks, -Stolee
[PATCH v3 6/6] commit-graph: remove Future Work section
From: Derrick Stolee The commit-graph feature began with a long list of planned benefits, most of which are now complete. The future work section has only a few items left. As for making more algorithms aware of generation numbers, some are only waiting for generation number v2 to ensure the performance matches the existing behavior using commit date. It is unlikely that we will ever send a commit-graph file as part of the protocol, since we would need to verify the data, and that is as expensive as writing a commit-graph from scratch. If we want to start trusting remote content, then that item can be investigated again. While there is more work to be done on the feature, having a section of the docs devoted to a TODO list is wasteful and hard to keep up-to-date. Signed-off-by: Derrick Stolee --- Documentation/technical/commit-graph.txt | 17 - 1 file changed, 17 deletions(-) diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index 7805b0968c..fb53341d5e 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt @@ -127,23 +127,6 @@ Design Details helpful for these clones, anyway. The commit-graph will not be read or written when shallow commits are present. -Future Work - -- After computing and storing generation numbers, we must make graph - walks aware of generation numbers to gain the performance benefits they - enable. This will mostly be accomplished by swapping a commit-date-ordered - priority queue with one ordered by generation number. The following - operations are important candidates: - -- 'log --topo-order' -- 'tag --merged' - -- A server could provide a commit-graph file as part of the network protocol - to avoid extra calculations by clients. This feature is only of benefit if - the user is willing to trust the file, because verifying the file is correct - is as hard as computing it from scratch. - Related Links - [0] https://bugs.chromium.org/p/git/issues/detail?id=8 -- gitgitgadget
[PATCH v3 2/6] commit-graph: collapse parameters into flags
From: Derrick Stolee The write_commit_graph() and write_commit_graph_reachable() methods currently take two boolean parameters: 'append' and 'report_progress'. We will soon expand the possible options to send to these methods, so instead of complicating the parameter list, first simplify it. Collapse these parameters into a 'flags' parameter, and adjust the callers to provide flags as necessary. Signed-off-by: Derrick Stolee --- builtin/commit-graph.c | 8 +--- builtin/commit.c | 2 +- builtin/gc.c | 4 ++-- commit-graph.c | 9 + commit-graph.h | 8 +--- 5 files changed, 18 insertions(+), 13 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 2e86251f02..828b1a713f 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -142,6 +142,7 @@ static int graph_write(int argc, const char **argv) struct string_list *commit_hex = NULL; struct string_list lines; int result; + int flags = COMMIT_GRAPH_PROGRESS; static struct option builtin_commit_graph_write_options[] = { OPT_STRING(0, "object-dir", &opts.obj_dir, @@ -166,11 +167,13 @@ static int graph_write(int argc, const char **argv) die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs")); if (!opts.obj_dir) opts.obj_dir = get_object_directory(); + if (opts.append) + flags |= COMMIT_GRAPH_APPEND; read_replace_refs = 0; if (opts.reachable) - return write_commit_graph_reachable(opts.obj_dir, opts.append, 1); + return write_commit_graph_reachable(opts.obj_dir, flags); string_list_init(&lines, 0); if (opts.stdin_packs || opts.stdin_commits) { @@ -190,8 +193,7 @@ static int graph_write(int argc, const char **argv) result = write_commit_graph(opts.obj_dir, pack_indexes, commit_hex, - opts.append, - 1); + flags); UNLEAK(lines); return result; diff --git a/builtin/commit.c b/builtin/commit.c index b9ea7222fa..b001ef565d 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1670,7 +1670,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix) "not exceeded, and then \"git reset HEAD\" to recover.")); if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) && - write_commit_graph_reachable(get_object_directory(), 0, 0)) + write_commit_graph_reachable(get_object_directory(), 0)) return 1; repo_rerere(the_repository, 0); diff --git a/builtin/gc.c b/builtin/gc.c index 3984addf73..df2573f124 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -665,8 +665,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix) } if (gc_write_commit_graph && - write_commit_graph_reachable(get_object_directory(), 0, -!quiet && !daemonized)) + write_commit_graph_reachable(get_object_directory(), +!quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0)) return 1; if (auto_gc && too_many_loose_objects()) diff --git a/commit-graph.c b/commit-graph.c index ee487a364b..8bbd50658c 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -851,15 +851,14 @@ static int add_ref_to_list(const char *refname, return 0; } -int write_commit_graph_reachable(const char *obj_dir, int append, -int report_progress) +int write_commit_graph_reachable(const char *obj_dir, unsigned int flags) { struct string_list list = STRING_LIST_INIT_DUP; int result; for_each_ref(add_ref_to_list, &list); result = write_commit_graph(obj_dir, NULL, &list, - append, report_progress); + flags); string_list_clear(&list, 0); return result; @@ -868,7 +867,7 @@ int write_commit_graph_reachable(const char *obj_dir, int append, int write_commit_graph(const char *obj_dir, struct string_list *pack_indexes, struct string_list *commit_hex, - int append, int report_progress) + unsigned int flags) { struct packed_oid_list oids; struct packed_commit_list commits; @@ -887,6 +886,8 @@ int write_commit_graph(const char *obj_dir, struct strbuf progress_title = STRBUF_INIT; unsigned long approx_nr_objects; int res = 0; + int append = flags & COMMIT_GRAPH_APPEND; + int report_progress = flags & COMMIT_GRAPH_PROG
[PATCH v3 5/6] commit-graph: implement file format version 2
From: Derrick Stolee The commit-graph file format had some shortcomings which we now correct: 1. The hash algorithm was determined by a single byte, instead of the 4-byte format identifier. 2. There was no way to update the reachability index we used. We currently only support generation numbers, but that will change in the future. 3. Git did not fail with error if the unused eighth byte was non-zero, so we could not use that to indicate an incremental file format without breaking compatibility across versions. The new format modifies the header of the commit-graph to solve these problems. We use the 4-byte hash format id, freeing up a byte in our 32-bit alignment to introduce a reachability index version. We can also fail to read the commit-graph if the eighth byte is non-zero. Update the 'git commit-graph read' subcommand to display the new data, and check this output in the test that explicitly writes a v2 commit-graph file. While we converted the existing 'verify' tests to use a version 1 file to avoid recalculating data offsets, add explicit 'verify' tests on a version 2 file that corrupt the new header values. Signed-off-by: Derrick Stolee --- .../technical/commit-graph-format.txt | 26 ++- builtin/commit-graph.c| 5 ++ commit-graph.c| 39 +- t/t5318-commit-graph.sh | 73 +-- 4 files changed, 134 insertions(+), 9 deletions(-) diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index 16452a0504..e367aa94b1 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -31,13 +31,22 @@ and hash type. All 4-byte numbers are in network order. +There are two versions available, 1 and 2. These currently differ only in +the header. + HEADER: +All commit-graph files use the first five bytes for the same purpose. + 4-byte signature: The signature is: {'C', 'G', 'P', 'H'} 1-byte version number: - Currently, the only valid version is 1. + Currently, the valid version numbers are 1 and 2. + +The remainder of the header changes depending on the version. + +Version 1: 1-byte Hash Version (1 = SHA-1) We infer the hash length (H) from this value. @@ -47,6 +56,21 @@ HEADER: 1-byte (reserved for later use) Current clients should ignore this value. +Version 2: + + 1-byte number (C) of "chunks" + + 1-byte reachability index version number: + Currently, the only valid number is 1. + + 1-byte (reserved for later use) + Current clients expect this value to be zero, and will not + try to read the commit-graph file if it is non-zero. + + 4-byte format identifier for the hash algorithm: + If this identifier does not agree with the repository's current + hash algorithm, then the client will not read the commit graph. + CHUNK LOOKUP: (C + 1) * 12 bytes listing the table of contents for the chunks: diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index e766dd076e..7df6688b08 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -117,6 +117,11 @@ static int graph_read(int argc, const char **argv) *(unsigned char*)(graph->data + 5), *(unsigned char*)(graph->data + 6), *(unsigned char*)(graph->data + 7)); + + if (*(unsigned char *)(graph->data + 4) == 2) + printf("hash algorithm: %X\n", + get_be32(graph->data + 8)); + printf("num_commits: %u\n", graph->num_commits); printf("chunks:"); diff --git a/commit-graph.c b/commit-graph.c index b6f09f1be2..5eebba6a0f 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -152,7 +152,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, uint64_t last_chunk_offset; uint32_t last_chunk_id; uint32_t graph_signature; - unsigned char graph_version, hash_version; + unsigned char graph_version, hash_version, reach_index_version; + uint32_t hash_id; if (!graph_map) return NULL; @@ -170,7 +171,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, } graph_version = *(unsigned char*)(data + 4); - if (graph_version != 1) { + if (!graph_version || graph_version > 2) { error(_("commit-graph version %X does not match version %X"), graph_version, 1); return NULL; @@ -190,6 +191,30 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, graph->num_chunks = *(unsigned char*)(data + 6); chunk_lookup = data + 8; break; +
[PATCH v3 0/6] Create commit-graph file format v2
The commit-graph file format has some shortcomings that were discussed on-list: 1. It doesn't use the 4-byte format ID from the_hash_algo. 2. There is no way to change the reachability index from generation numbers to corrected commit date [1]. 3. The unused byte in the format could be used to signal the file is incremental, but current clients ignore the value even if it is non-zero. This series adds a new version (2) to the commit-graph file. The fifth byte already specified the file format, so existing clients will gracefully respond to files with a different version number. The only real change now is that the header takes 12 bytes instead of 8, due to using the 4-byte format ID for the hash algorithm. The new bytes reserved for the reachability index version and incremental file formats are now expected to be equal to the defaults. When we update these values to be flexible in the future, if a client understands commit-graph v2 but not those new values, then it will fail gracefully. NOTE: this series was rebased onto ab/commit-graph-fixes, as the conflicts were significant and subtle. Updates in V3: Thanks for all the feedback so far! * Moved the version information into an unsigned char parameter, instead of a flag. * We no longer default to the v2 file format, as that will break users who downgrade. This required some changes to the test script. * Removed the "Future work" section from the commit-graph design document in a new patch. * I did not change the file name for v2 file formats, as Ævar suggested. I'd like the discussion to continue on this topic. Thanks, -Stolee [1] https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51ef...@gmail.com/ Derrick Stolee (6): commit-graph: return with errors during write commit-graph: collapse parameters into flags commit-graph: create new version parameter commit-graph: add --version= option commit-graph: implement file format version 2 commit-graph: remove Future Work section Documentation/git-commit-graph.txt| 3 + .../technical/commit-graph-format.txt | 26 ++- Documentation/technical/commit-graph.txt | 17 -- builtin/commit-graph.c| 33 ++-- builtin/commit.c | 5 +- builtin/gc.c | 8 +- commit-graph.c| 153 +- commit-graph.h| 16 +- t/t5318-commit-graph.sh | 75 - 9 files changed, 250 insertions(+), 86 deletions(-) base-commit: 93b4405ffe4ad9308740e7c1c71383bfc369baaa Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-112%2Fderrickstolee%2Fgraph%2Fv2-head-v3 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-112/derrickstolee/graph/v2-head-v3 Pull-Request: https://github.com/gitgitgadget/git/pull/112 Range-diff vs v2: 1: 91f300ec0a = 1: 91f300ec0a commit-graph: return with errors during write 2: 04f5df1135 ! 2: 924b22f990 commit-graph: collapse parameters into flags @@ -86,7 +86,7 @@ -int write_commit_graph_reachable(const char *obj_dir, int append, - int report_progress) -+int write_commit_graph_reachable(const char *obj_dir, int flags) ++int write_commit_graph_reachable(const char *obj_dir, unsigned int flags) { struct string_list list = STRING_LIST_INIT_DUP; int result; @@ -103,7 +103,7 @@ struct string_list *pack_indexes, struct string_list *commit_hex, -int append, int report_progress) -+int flags) ++unsigned int flags) { struct packed_oid_list oids; struct packed_commit_list commits; @@ -129,12 +129,12 @@ +#define COMMIT_GRAPH_APPEND (1 << 0) +#define COMMIT_GRAPH_PROGRESS (1 << 1) + -+int write_commit_graph_reachable(const char *obj_dir, int flags); ++int write_commit_graph_reachable(const char *obj_dir, unsigned int flags); int write_commit_graph(const char *obj_dir, struct string_list *pack_indexes, struct string_list *commit_hex, -int append, int report_progress); -+int flags); ++unsigned int flags); int verify_commit_graph(struct repository *r, struct commit_graph *g); 3: 4ddb829163 ! 3: 8446011a43 commit-graph: create new version flags @@ -1,12 +1,12 @@ Author: Derrick Stolee -commit-graph: create new version flags +commit-graph: create new version parameter In anticipation of a new commit-graph file format version, create -a flag for the write_commit_graph()
[PATCH v3 4/6] commit-graph: add --version= option
From: Derrick Stolee Allow the commit-graph builtin to specify the file format version using the '--version=' option. Specify the version exactly in the verification tests as using a different version would change the offsets used in those tests. Signed-off-by: Derrick Stolee --- Documentation/git-commit-graph.txt | 3 +++ builtin/commit-graph.c | 13 - t/t5318-commit-graph.sh| 2 +- 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 624470e198..1d1cc70de4 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -51,6 +51,9 @@ or `--stdin-packs`.) + With the `--append` option, include all commits that are present in the existing commit-graph file. ++ +With the `--version=` option, specify the file format version. Used +only for testing. 'read':: diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 7d9185dfc2..e766dd076e 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph [--object-dir ]"), N_("git commit-graph read [--object-dir ]"), N_("git commit-graph verify [--object-dir ]"), - N_("git commit-graph write [--object-dir ] [--append] [--reachable|--stdin-packs|--stdin-commits]"), + N_("git commit-graph write [--object-dir ] [--append] [--reachable|--stdin-packs|--stdin-commits] [--version=]"), NULL }; @@ -25,7 +25,7 @@ static const char * const builtin_commit_graph_read_usage[] = { }; static const char * const builtin_commit_graph_write_usage[] = { - N_("git commit-graph write [--object-dir ] [--append] [--reachable|--stdin-packs|--stdin-commits]"), + N_("git commit-graph write [--object-dir ] [--append] [--reachable|--stdin-packs|--stdin-commits] [--version=]"), NULL }; @@ -35,6 +35,7 @@ static struct opts_commit_graph { int stdin_packs; int stdin_commits; int append; + int version; } opts; @@ -156,6 +157,8 @@ static int graph_write(int argc, const char **argv) N_("start walk at commits listed by stdin")), OPT_BOOL(0, "append", &opts.append, N_("include all commits already in the commit-graph file")), + OPT_INTEGER(0, "version", &opts.version, + N_("specify the file format version")), OPT_END(), }; @@ -168,12 +171,12 @@ static int graph_write(int argc, const char **argv) if (!opts.obj_dir) opts.obj_dir = get_object_directory(); if (opts.append) - flags |= COMMIT_GRAPH_APPEND; + flags |= COMMIT_GRAPH_APPEND; read_replace_refs = 0; if (opts.reachable) - return write_commit_graph_reachable(opts.obj_dir, flags, 0); + return write_commit_graph_reachable(opts.obj_dir, flags, opts.version); string_list_init(&lines, 0); if (opts.stdin_packs || opts.stdin_commits) { @@ -193,7 +196,7 @@ static int graph_write(int argc, const char **argv) result = write_commit_graph(opts.obj_dir, pack_indexes, commit_hex, - flags, 0); + flags, opts.version); UNLEAK(lines); return result; diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index e80c1cac02..4eb5a09ef3 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -328,7 +328,7 @@ test_expect_success 'replace-objects invalidates commit-graph' ' test_expect_success 'git commit-graph verify' ' cd "$TRASH_DIRECTORY/full" && - git rev-parse commits/8 | git commit-graph write --stdin-commits && + git rev-parse commits/8 | git commit-graph write --stdin-commits --version=1 && git commit-graph verify >output ' -- gitgitgadget
[PATCH v3 3/6] commit-graph: create new version parameter
From: Derrick Stolee In anticipation of a new commit-graph file format version, create a parameter for the write_commit_graph() and write_commit_graph_reachable() methods to take a version number. When the given version is zero, the implementation selects a default value. Currently, the only valid value is 1. The file format will change the header information, so place the existing header logic inside a switch statement with only one case. Signed-off-by: Derrick Stolee --- builtin/commit-graph.c | 4 +-- builtin/commit.c | 2 +- builtin/gc.c | 3 +- commit-graph.c | 63 +++--- commit-graph.h | 6 ++-- 5 files changed, 50 insertions(+), 28 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 828b1a713f..7d9185dfc2 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -173,7 +173,7 @@ static int graph_write(int argc, const char **argv) read_replace_refs = 0; if (opts.reachable) - return write_commit_graph_reachable(opts.obj_dir, flags); + return write_commit_graph_reachable(opts.obj_dir, flags, 0); string_list_init(&lines, 0); if (opts.stdin_packs || opts.stdin_commits) { @@ -193,7 +193,7 @@ static int graph_write(int argc, const char **argv) result = write_commit_graph(opts.obj_dir, pack_indexes, commit_hex, - flags); + flags, 0); UNLEAK(lines); return result; diff --git a/builtin/commit.c b/builtin/commit.c index b001ef565d..b9ea7222fa 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1670,7 +1670,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix) "not exceeded, and then \"git reset HEAD\" to recover.")); if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) && - write_commit_graph_reachable(get_object_directory(), 0)) + write_commit_graph_reachable(get_object_directory(), 0, 0)) return 1; repo_rerere(the_repository, 0); diff --git a/builtin/gc.c b/builtin/gc.c index df2573f124..41637242b1 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -666,7 +666,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix) if (gc_write_commit_graph && write_commit_graph_reachable(get_object_directory(), -!quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0)) +!quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0, +0)) return 1; if (auto_gc && too_many_loose_objects()) diff --git a/commit-graph.c b/commit-graph.c index 8bbd50658c..b6f09f1be2 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -25,9 +25,6 @@ #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16) -#define GRAPH_VERSION_1 0x1 -#define GRAPH_VERSION GRAPH_VERSION_1 - #define GRAPH_EXTRA_EDGES_NEEDED 0x8000 #define GRAPH_EDGE_LAST_MASK 0x7fff #define GRAPH_PARENT_NONE 0x7000 @@ -173,30 +170,35 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, } graph_version = *(unsigned char*)(data + 4); - if (graph_version != GRAPH_VERSION) { + if (graph_version != 1) { error(_("commit-graph version %X does not match version %X"), - graph_version, GRAPH_VERSION); - return NULL; - } - - hash_version = *(unsigned char*)(data + 5); - if (hash_version != oid_version()) { - error(_("commit-graph hash version %X does not match version %X"), - hash_version, oid_version()); + graph_version, 1); return NULL; } graph = alloc_commit_graph(); + switch (graph_version) { + case 1: + hash_version = *(unsigned char*)(data + 5); + if (hash_version != oid_version()) { + error(_("commit-graph hash version %X does not match version %X"), + hash_version, oid_version()); + return NULL; + } + + graph->num_chunks = *(unsigned char*)(data + 6); + chunk_lookup = data + 8; + break; + } + graph->hash_len = the_hash_algo->rawsz; - graph->num_chunks = *(unsigned char*)(data + 6); graph->graph_fd = fd; graph->data = graph_map; graph->data_len = graph_size; last_chunk_id = 0; last_chunk_offset = 8; - chunk_lookup = data + 8; for (i = 0; i < graph->num_chunks; i++) { uint32_t chunk_id;
[PATCH v3 1/6] commit-graph: return with errors during write
From: Derrick Stolee The write_commit_graph() method uses die() to report failure and exit when confronted with an unexpected condition. This use of die() in a library function is incorrect and is now replaced by error() statements and an int return type. Now that we use 'goto cleanup' to jump to the terminal condition on an error, we have new paths that could lead to uninitialized values. New initializers are added to correct for this. The builtins 'commit-graph', 'gc', and 'commit' call these methods, so update them to check the return value. Signed-off-by: Derrick Stolee --- builtin/commit-graph.c | 19 +++-- builtin/commit.c | 5 ++-- builtin/gc.c | 7 ++--- commit-graph.c | 60 +- commit-graph.h | 10 +++ 5 files changed, 62 insertions(+), 39 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 537fdfd0f0..2e86251f02 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -141,6 +141,7 @@ static int graph_write(int argc, const char **argv) struct string_list *pack_indexes = NULL; struct string_list *commit_hex = NULL; struct string_list lines; + int result; static struct option builtin_commit_graph_write_options[] = { OPT_STRING(0, "object-dir", &opts.obj_dir, @@ -168,10 +169,8 @@ static int graph_write(int argc, const char **argv) read_replace_refs = 0; - if (opts.reachable) { - write_commit_graph_reachable(opts.obj_dir, opts.append, 1); - return 0; - } + if (opts.reachable) + return write_commit_graph_reachable(opts.obj_dir, opts.append, 1); string_list_init(&lines, 0); if (opts.stdin_packs || opts.stdin_commits) { @@ -188,14 +187,14 @@ static int graph_write(int argc, const char **argv) UNLEAK(buf); } - write_commit_graph(opts.obj_dir, - pack_indexes, - commit_hex, - opts.append, - 1); + result = write_commit_graph(opts.obj_dir, + pack_indexes, + commit_hex, + opts.append, + 1); UNLEAK(lines); - return 0; + return result; } int cmd_commit_graph(int argc, const char **argv, const char *prefix) diff --git a/builtin/commit.c b/builtin/commit.c index 2986553d5f..b9ea7222fa 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1669,8 +1669,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix) "new_index file. Check that disk is not full and quota is\n" "not exceeded, and then \"git reset HEAD\" to recover.")); - if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0)) - write_commit_graph_reachable(get_object_directory(), 0, 0); + if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) && + write_commit_graph_reachable(get_object_directory(), 0, 0)) + return 1; repo_rerere(the_repository, 0); run_command_v_opt(argv_gc_auto, RUN_GIT_CMD); diff --git a/builtin/gc.c b/builtin/gc.c index 020f725acc..3984addf73 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -664,9 +664,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix) clean_pack_garbage(); } - if (gc_write_commit_graph) - write_commit_graph_reachable(get_object_directory(), 0, -!quiet && !daemonized); + if (gc_write_commit_graph && + write_commit_graph_reachable(get_object_directory(), 0, +!quiet && !daemonized)) + return 1; if (auto_gc && too_many_loose_objects()) warning(_("There are too many unreachable loose objects; " diff --git a/commit-graph.c b/commit-graph.c index 66865acbd7..ee487a364b 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -851,27 +851,30 @@ static int add_ref_to_list(const char *refname, return 0; } -void write_commit_graph_reachable(const char *obj_dir, int append, - int report_progress) +int write_commit_graph_reachable(const char *obj_dir, int append, +int report_progress) { struct string_list list = STRING_LIST_INIT_DUP; + int result; for_each_ref(add_ref_to_list, &list); - write_commit_graph(obj_dir, NULL, &list, append, report_progress); + result = write_commit_graph(obj_dir, NULL, &list, + append, report_progress); string_li
Re: [PATCH 3/6] config.c: add repo_config_set_worktree_gently()
On 12/27/2018 10:56 AM, Nguyễn Thái Ngọc Duy wrote: > diff --git a/config.h b/config.h > index ee5d3fa7b4..62204dc252 100644 > --- a/config.h > +++ b/config.h > @@ -103,6 +103,9 @@ extern int git_config_color(char *, const char *, const > char *); > extern int git_config_set_in_file_gently(const char *, const char *, const > char *); > extern void git_config_set_in_file(const char *, const char *, const char *); > extern int git_config_set_gently(const char *, const char *); > +extern int repo_config_set_gently(struct repository *, const char *, const > char *); > +extern void repo_config_set(struct repository *, const char *, const char *); > +extern int repo_config_set_worktree_gently(struct repository *, const char > *, const char *); > extern void git_config_set(const char *, const char *); > extern int git_config_parse_key(const char *, char **, int *); > extern int git_config_key_is_valid(const char *key); I know this is an old thread, but the patch is still in 'pu'. These methods do not appear to have any callers. Perhaps this patch should be dropped until there is a reason to include it? Thanks, -Stolee
[PATCH 0/2] Multi-pack-index: Fix "too many file descriptors" bug
Thanks to Jeff H for finding the problem with the multi-pack-index regarding many packs. Specifically: if we open too many packs, the close_one_pack() method cannot find the packs from the multi-pack-index to close. Jeff already fixed the problem explicitly in 'git multi-pack-index verify' which would hit this issue 100% of the time we had 2000+ packs. This issue could still happen in 'git rev-list --all --objects' if there are enough packs containing commits and trees. This series fixes the issue. The basic solution is to add packs from the multi-pack-index into the packed_git struct as they are opened. To avoid performance issues, add a multi_pack_index bit to the packed_git struct. Midx-aware algorithms can then ignore those packs. There was a very subtle issue that happens during a 'git repack': we clear the multi-pack-index after possibly reading some packs from it, thus leaving some packs in the packed_git struct but having a NULL multi_pack_index in the object store. This informs the change to close_midx(). I'm based on a recent 'master' commit that contains the following three branches due to nearby changes causing conflicts if I pick only Jeff's change as a base: jh/midx-verify-too-many-packs jk/server-info-rabbit-hole bc/hash-transition-16 Thanks, -Stolee Derrick Stolee (2): midx: pass a repository pointer midx: add packs to packed_git linked list builtin/multi-pack-index.c | 2 +- builtin/pack-objects.c | 2 +- midx.c | 42 +- midx.h | 7 --- object-store.h | 9 ++-- packfile.c | 30 --- sha1-name.c| 6 ++ 7 files changed, 51 insertions(+), 47 deletions(-) base-commit: 83232e38648b51abbcbdb56c94632b6906cc85a6 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-182%2Fderrickstolee%2Fmany-packs-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-182/derrickstolee/many-packs-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/182 -- gitgitgadget
[PATCH 1/2] midx: pass a repository pointer
From: Derrick Stolee Much of the multi-pack-index code focuses on the multi_pack_index struct, and so we only pass a pointer to the current one. However, we will insert a dependency on the packed_git linked list in a future change, so we will need a repository reference. Inserting these parameters is a significant enough change to split out. Signed-off-by: Derrick Stolee --- builtin/multi-pack-index.c | 2 +- builtin/pack-objects.c | 2 +- midx.c | 22 ++ midx.h | 7 --- packfile.c | 4 ++-- 5 files changed, 22 insertions(+), 15 deletions(-) diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index ae6e476ad5..72dfd3dadc 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -46,7 +46,7 @@ int cmd_multi_pack_index(int argc, const char **argv, if (!strcmp(argv[0], "write")) return write_midx_file(opts.object_dir); if (!strcmp(argv[0], "verify")) - return verify_midx_file(opts.object_dir); + return verify_midx_file(the_repository, opts.object_dir); die(_("unrecognized verb: %s"), argv[0]); } diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 2d9a3bdc9d..e606b9ef03 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1078,7 +1078,7 @@ static int want_object_in_pack(const struct object_id *oid, for (m = get_multi_pack_index(the_repository); m; m = m->next) { struct pack_entry e; - if (fill_midx_entry(oid, &e, m)) { + if (fill_midx_entry(the_repository, oid, &e, m)) { struct packed_git *p = e.p; off_t offset; diff --git a/midx.c b/midx.c index d5d2e9522f..8b8faec35a 100644 --- a/midx.c +++ b/midx.c @@ -201,7 +201,7 @@ void close_midx(struct multi_pack_index *m) FREE_AND_NULL(m->pack_names); } -int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id) +int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id) { struct strbuf pack_name = STRBUF_INIT; @@ -261,7 +261,10 @@ static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos) return get_be32(m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH); } -static int nth_midxed_pack_entry(struct multi_pack_index *m, struct pack_entry *e, uint32_t pos) +static int nth_midxed_pack_entry(struct repository *r, +struct multi_pack_index *m, +struct pack_entry *e, +uint32_t pos) { uint32_t pack_int_id; struct packed_git *p; @@ -271,7 +274,7 @@ static int nth_midxed_pack_entry(struct multi_pack_index *m, struct pack_entry * pack_int_id = nth_midxed_pack_int_id(m, pos); - if (prepare_midx_pack(m, pack_int_id)) + if (prepare_midx_pack(r, m, pack_int_id)) die(_("error preparing packfile from multi-pack-index")); p = m->packs[pack_int_id]; @@ -301,14 +304,17 @@ static int nth_midxed_pack_entry(struct multi_pack_index *m, struct pack_entry * return 1; } -int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct multi_pack_index *m) +int fill_midx_entry(struct repository * r, + const struct object_id *oid, + struct pack_entry *e, + struct multi_pack_index *m) { uint32_t pos; if (!bsearch_midx(oid, m, &pos)) return 0; - return nth_midxed_pack_entry(m, e, pos); + return nth_midxed_pack_entry(r, m, e, pos); } /* Match "foo.idx" against either "foo.pack" _or_ "foo.idx". */ @@ -1020,7 +1026,7 @@ static int compare_pair_pos_vs_id(const void *_a, const void *_b) display_progress(progress, _n); \ } while (0) -int verify_midx_file(const char *object_dir) +int verify_midx_file(struct repository *r, const char *object_dir) { struct pair_pos_vs_id *pairs = NULL; uint32_t i; @@ -1034,7 +1040,7 @@ int verify_midx_file(const char *object_dir) progress = start_progress(_("Looking for referenced packfiles"), m->num_packs); for (i = 0; i < m->num_packs; i++) { - if (prepare_midx_pack(m, i)) + if (prepare_midx_pack(r, m, i)) midx_report("failed to load pack in position %d", i); display_progress(progress, i + 1); @@ -1099,7 +1105,7 @@ int verify_midx_file(const char *object_dir) nth_midxed_object_oid(&oid, m, pairs[i].pos); - if (!fill_midx_entry(&oid, &e, m)) { + if (!fill_midx_entry(r, &oid, &
[PATCH 2/2] midx: add packs to packed_git linked list
From: Derrick Stolee The multi-pack-index allows searching for objects across multiple packs using one object list. The original design gains many of these performance benefits by keeping the packs in the multi-pack-index out of the packed_git list. Unfortunately, this has one major drawback. If the multi-pack-index covers thousands of packs, and a command loads many of those packs, then we can hit the limit for open file descriptors. The close_one_pack() method is used to limit this resource, but it only looks at the packed_git list, and uses an LRU cache to prevent thrashing. Instead of complicating this close_one_pack() logic to include direct references to the multi-pack-index, simply add the packs opened by the multi-pack-index to the packed_git list. This immediately solves the file-descriptor limit problem, but requires some extra steps to avoid performance issues or other problems: 1. Create a multi_pack_index bit in the packed_git struct that is one if and only if the pack was loaded from a multi-pack-index. 2. Skip packs with the multi_pack_index bit when doing object lookups and abbreviations. These algorithms already check the multi-pack-index before the packed_git struct. This has a very small performance hit, as we need to walk more packed_git structs. This is acceptable, since these operations run binary search on the other packs, so this walk-and-ignore logic is very fast by comparison. 3. When closing a multi-pack-index file, do not close its packs, as those packs will be closed using close_all_packs(). In some cases, such as 'git repack', we run 'close_midx()' without also closing the packs, so we need to un-set the multi_pack_index bit in those packs. This is necessary, and caught by running t6501-freshen-objects.sh with GIT_TEST_MULTI_PACK_INDEX=1. To manually test this change, I inserted trace2 logging into close_pack_fd() and set pack_max_fds to 10, then ran 'git rev-list --all --objects' on a copy of the Git repo with 300+ pack-files and a multi-pack-index. The logs verified the packs are closed as we read them beyond the file descriptor limit. Signed-off-by: Derrick Stolee --- midx.c | 20 ++-- object-store.h | 9 ++--- packfile.c | 28 sha1-name.c| 6 ++ 4 files changed, 30 insertions(+), 33 deletions(-) diff --git a/midx.c b/midx.c index 8b8faec35a..e7e1fe4d65 100644 --- a/midx.c +++ b/midx.c @@ -192,10 +192,8 @@ void close_midx(struct multi_pack_index *m) m->fd = -1; for (i = 0; i < m->num_packs; i++) { - if (m->packs[i]) { - close_pack(m->packs[i]); - free(m->packs[i]); - } + if (m->packs[i]) + m->packs[i]->multi_pack_index = 0; } FREE_AND_NULL(m->packs); FREE_AND_NULL(m->pack_names); @@ -204,6 +202,7 @@ void close_midx(struct multi_pack_index *m) int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t pack_int_id) { struct strbuf pack_name = STRBUF_INIT; + struct packed_git *p; if (pack_int_id >= m->num_packs) die(_("bad pack-int-id: %u (%u total packs)"), @@ -215,9 +214,18 @@ int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, uint32_t strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir, m->pack_names[pack_int_id]); - m->packs[pack_int_id] = add_packed_git(pack_name.buf, pack_name.len, m->local); + p = add_packed_git(pack_name.buf, pack_name.len, m->local); strbuf_release(&pack_name); - return !m->packs[pack_int_id]; + + if (!p) + return 1; + + p->multi_pack_index = 1; + m->packs[pack_int_id] = p; + install_packed_git(r, p); + list_add_tail(&p->mru, &r->objects->packed_git_mru); + + return 0; } int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, uint32_t *result) diff --git a/object-store.h b/object-store.h index b086f5ecdb..7acbc7fffe 100644 --- a/object-store.h +++ b/object-store.h @@ -76,7 +76,8 @@ struct packed_git { pack_keep_in_core:1, freshened:1, do_not_close:1, -pack_promisor:1; +pack_promisor:1, +multi_pack_index:1; unsigned char hash[GIT_MAX_RAWSZ]; struct revindex_entry *revindex; /* something like ".git/objects/pack/x.pack" */ @@ -128,12 +129,6 @@ struct raw_object_store { /* A most-recently-used ordered version of the packed_git list. */ struct list_head packed_git_mru; - /* -* A linked list containing all packfiles, starting with those -* contained in the multi_
Re: [PATCH v3 4/8] commit-graph: don't early exit(1) on e.g. "git status"
On 4/27/2019 9:06 AM, Ævar Arnfjörð Bjarmason wrote: > > There's still cases left where we'll exit early, e.g. if you do: > > $ git diff -U1 > diff --git a/commit-graph.c b/commit-graph.c > index 66865acbd7..63773764ce 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -1074,3 +1074,3 @@ void write_commit_graph(const char *obj_dir, > chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE; > - chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr; > + chunk_offsets[2] = chunk_offsets[0] + hashsz * commits.nr; > chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr; > > Which is obviously bad, but something I encounterd while hacking up [1] > we'll still hard die as before this patch on: > > $ git status > fatal: invalid parent position 1734910766 > $ I really appreciate you digging in deep into these kinds of issues. You seem to be hitting corrupted commit-graph files more often than we are (in VFS for Git world). However, we should be _very careful_ when turning some of these errors to warnings. At the very least, we should do some high-level planning for how to handle this case. The biggest issue is that we have some logic that is run after a call to generation_numbers_enabled(), such as the `git rev-list --topo-order` logic, that relies on the commit-graph for correctness. If we output a warning and then stop using the commit-graph, then we will start having commits with finite generation pointing to commits with infinite generation. Perhaps, with some care, we can alert the algorithm to change the "minimum generation" that limits how far we dequeue the priority-queue. Changing it to zero will cause the algorithm to behave like the old algorithm. But, having an algorithm that usually takes 0.1 seconds suddenly take 10+ seconds also violates some expectations. Q: How should we handle a detectably-invalid commit-graph? I think most of your patches have done a good job so far of detecting an invalid header, and responding by ignoring the commit-graph. This case of a detectable error in the chunk data itself is not something we can check on the first load without serious performance issues. I hope we can decide on a good solution. Thanks, -Stolee
Re: [PATCH v2 0/5] Create commit-graph file format v2
On 4/26/2019 4:33 AM, Ævar Arnfjörð Bjarmason wrote: > > On Fri, Apr 26 2019, Junio C Hamano wrote: >> >> Thanks always for your careful review and thoughtful comments, by >> the way. I agree that these comments are extremely helpful. >>> Now as noted in my series we now on 'master' downgrade that to a warning >>> (along with the rest of the errors): >>> >>> $ ~/g/git/git --exec-path=$PWD status >>> error: commit-graph version 2 does not match version 1 >>> On branch master >>> [...] >>> >>> ...and this series sets the default version for all new graphs to v2. >>> The phrasing seems odd. It is unclear, even to me who is vaguely >> familiar with the word "commit-graph" and is aware of the fact that >> the file format is being updated, what >> >> "commit-graph version 2 does not match version 1" > > Yeah it should really say: > > "commit-graph is of version 2, our maximum supported version is 1" I agree this phrasing is better. Please see the patch I just submitted [1] to try and improve these messages. [1] https://public-inbox.org/git/pull.181.git.gitgitgad...@gmail.com/ > Hindsight is 20/20, but more generally I wonder if we should have these > format versions match that of the git version (unlikely to change it > twice in the same release...) which would allow us to say things like: > > "commit-graph needs v2.22.0 or later, we have the version written by > v2.18.0..v2.21.0" > > But of course dealing with those larger integers in the code/gaps is > also messy :) There are a couple issues with using the version numbers, from my perspective: 1. We don't do that anywhere else, like the index file. 2. The microsoft/git fork takes certain performance changes faster than core git, and frequently ships versions between major version updates. Our 2.17 had the commit-graph, for instance. It's also possible that we'd take commit-graph v2 earlier than the core Git major release. >> wants to say. Do I have version #2 on disk and the running binary >> only understands version #1? Or do I have version #1 on disk and >> the binary expected version #2? How would I get out of this >> situation? Is it sufficient to do "rm -f .git/info/commit-graph*" >> and is it safe? > > Yeah. An rm of .git/info/commit-graph is safe, so is "-c > core.commitGraph=false" as a workaround. That is true. I'm not sure the error message is the right place to describe the workaround. > I'd say "let's improve the error", but that ship has sailed, and we can > do better than an error here, no matter how it's phrased... > >>> I think this is *way* too aggressive of an upgrade path. If these >>> patches go into v2.22.0 then git clients on all older versions that grok >>> the commit graph (IIRC v2.18 and above) will have their git completely >>> broken if they're in a mixed-git-version environmen.> > I should note that "all older versions..." here is those that have > core.commitGraph=true set. More details in 43d3561805 ("commit-graph > write: don't die if the existing graph is corrupt", 2019-03-25). > >>> Is it really so important to move to v2 right away that we need to risk >>> those breakages? I think even with my ab/commit-graph-fixes it's still >>> too annoying (I was mostly trying to fix other stuff...). If only we >>> could detect "we should make a new graph now" >> >> True. You are right, this is too aggressive and I should have known better. I'll update in the next version to keep a default to v1. Not only do we have this downgrade risk, there is no actual benefit in this series alone. This only sets up the ability for other features. > Having slept on my earlier > https://public-inbox.org/git/87a7gdspo4@evledraar.gmail.com/ I think > I see a better way to deal with this than my earlier suggestion that we > perform some version flip-flop dance on the single "commit-graph" file: > > How about just writing .git/objects/info/commit-graph-v2, and for the > upcoming plan when where they'll be split have some dir/prefix there > where we include the version? > > That means that: > > 1. If there's an existing v1 "commit-graph" file we don't write a v2 at > that path in v2.22, although we might have some "write v1 (as well > as v2?) for old client compat" config where we opt-in to do that. > > 2. By default in v2.22 we read/write a "commit-graph-v2" file, > preferring it over the v1 "commit-graph", falling back on earlier > versions if it's not there (until gc --auto kicks in on v2.22 and > makes a v2 graph). > > 3. If you have concurrent v2.21 and v2.22 clients accessing the repo > you might end up generating one commit-graph or the other depending > on who happens to trigger "gc --auto". > > Hopefully that's a non-issue since an out-of-date graph isn't > usually a big deal, and client versions mostly march forward. But > v2.22 could also learn some "incremental gc" where it says "my v2 is > older, v1 client must h
[PATCH 0/1] commit-graph: improve error messages
Here is a small patch that revises the error messages from ab/commit-graph-fixes, as recommended by Ævar. Hopefully, it can be merged faster than the commit-graph v2 stuff, and I can update that series to include this change if we agree it is a good one. Thanks, -Stolee Cc: ava...@gmail.com In-Reply-To: 878svxrwsh@evledraar.gmail.com [878svxrwsh@evledraar.gmail.com] Derrick Stolee (1): commit-graph: improve error messages commit-graph.c | 10 +- t/t5318-commit-graph.sh | 6 +++--- 2 files changed, 8 insertions(+), 8 deletions(-) base-commit: 93b4405ffe4ad9308740e7c1c71383bfc369baaa Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-181%2Fderrickstolee%2Fgraph%2Fmessage-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-181/derrickstolee/graph/message-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/181 -- gitgitgadget
[PATCH 1/1] commit-graph: improve error messages
From: Derrick Stolee The error messages when reading a commit-graph have a few problems: 1. Some values are output in hexadecimal, but that is not made clear by the message. Prepend "0x" to these values. 2. The version number does not need to be hexadecimal, and also should mention a "maximum supported version". This has one possible confusing case: we could have a corrupt commit-graph file with version number zero, so the output would be "commit-graph has version 0, our maximum supported version is 1" This will only happen with corrupt data. This error message is designed instead for the case where a client is downgraded after writing a newer file version. Update t5318-commit-graph.sh to watch for these new error messages. Reported-by: Ævar Arnfjörð Bjarmason Signed-off-by: Derrick Stolee --- commit-graph.c | 10 +- t/t5318-commit-graph.sh | 6 +++--- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index 66865acbd7..aba591913e 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -167,21 +167,21 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, graph_signature = get_be32(data); if (graph_signature != GRAPH_SIGNATURE) { - error(_("commit-graph signature %X does not match signature %X"), + error(_("commit-graph signature 0x%X does not match signature 0x%X"), graph_signature, GRAPH_SIGNATURE); return NULL; } graph_version = *(unsigned char*)(data + 4); if (graph_version != GRAPH_VERSION) { - error(_("commit-graph version %X does not match version %X"), + error(_("commit-graph has version %d, our maximum supported version is %d"), graph_version, GRAPH_VERSION); return NULL; } hash_version = *(unsigned char*)(data + 5); if (hash_version != oid_version()) { - error(_("commit-graph hash version %X does not match version %X"), + error(_("commit-graph has hash version %d, our maximum supported version is %d"), hash_version, oid_version()); return NULL; } @@ -215,7 +215,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH; if (chunk_offset > graph_size - the_hash_algo->rawsz) { - error(_("commit-graph improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32), + error(_("commit-graph improper chunk offset 0x%08x%08x"), (uint32_t)(chunk_offset >> 32), (uint32_t)chunk_offset); free(graph); return NULL; @@ -252,7 +252,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, } if (chunk_repeated) { - error(_("commit-graph chunk id %08x appears multiple times"), chunk_id); + error(_("commit-graph chunk id 0x%08x appears multiple times"), chunk_id); free(graph); return NULL; } diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index e80c1cac02..264ebb15b1 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -420,17 +420,17 @@ test_expect_success 'detect too small' ' test_expect_success 'detect bad signature' ' corrupt_graph_and_verify 0 "\0" \ - "graph signature" + "commit-graph signature" ' test_expect_success 'detect bad version' ' corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \ - "graph version" + "commit-graph has version" ' test_expect_success 'detect bad hash version' ' corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \ - "hash version" + "commit-graph has hash version" ' test_expect_success 'detect low chunk count' ' -- gitgitgadget
Re: [PATCH] trace2: fix incorrect function pointer check
On 4/25/2019 1:08 PM, Josh Steadmon wrote: > Fix trace2_data_json_fl() to check for the presence of pfn_data_json_fl > in its targets, rather than pfn_data_fl, which is not actually called. [snip] > for_each_wanted_builtin (j, tgt_j) > - if (tgt_j->pfn_data_fl) > + if (tgt_j->pfn_data_json_fl) > tgt_j->pfn_data_json_fl(file, line, us_elapsed_absolute, Seems obviously correct. Thanks! -Stolee
Git Test Coverage Report (April 25)
4248706 61) warning(_("promisor remote name cannot begin with '/': %s"), 54248706 63) return NULL; 7bdf0926 93) previous->next = r->next; 7b6e1b04 108) return git_config_string(&core_partial_clone_filter_default, b21a55f3 139) return 0; dcc8b4e9 202) static int remove_fetched_oids(struct object_id **oids, int oid_nr, int to_free) dcc8b4e9 204) int i, missing_nr = 0; dcc8b4e9 205) int *missing = xcalloc(oid_nr, sizeof(*missing)); dcc8b4e9 206) struct object_id *old_oids = *oids; dcc8b4e9 208) int old_fetch_if_missing = fetch_if_missing; dcc8b4e9 210) fetch_if_missing = 0; dcc8b4e9 212) for (i = 0; i < oid_nr; i++) dcc8b4e9 213) if (oid_object_info_extended(the_repository, &old_oids[i], NULL, 0)) { dcc8b4e9 214) missing[i] = 1; dcc8b4e9 215) missing_nr++; dcc8b4e9 218) fetch_if_missing = old_fetch_if_missing; dcc8b4e9 220) if (missing_nr) { dcc8b4e9 221) int j = 0; dcc8b4e9 222) new_oids = xcalloc(missing_nr, sizeof(*new_oids)); dcc8b4e9 223) for (i = 0; i < oid_nr; i++) dcc8b4e9 224) if (missing[i]) dcc8b4e9 225) oidcpy(&new_oids[j++], &old_oids[i]); dcc8b4e9 226) *oids = new_oids; dcc8b4e9 227) if (to_free) dcc8b4e9 228) free(old_oids); dcc8b4e9 231) free(missing); dcc8b4e9 233) return missing_nr; dcc8b4e9 248) if (missing_nr == 1) dcc8b4e9 249) continue; dcc8b4e9 250) missing_nr = remove_fetched_oids(&missing_oids, missing_nr, to_free); dcc8b4e9 251) if (missing_nr) { dcc8b4e9 252) to_free = 1; dcc8b4e9 253) continue; dcc8b4e9 261) free(missing_oids); protocol.c 6da1f1a9 37) die(_("Unrecognized protocol version")); 6da1f1a9 39) die(_("Unrecognized protocol_version")); 6da1f1a9 74) BUG("late attempt to register an allowed protocol version"); ref-filter.c 3da20422 93) keydata_aka_refname ? keydata_aka_refname : k->wt->head_ref); remote-curl.c 6da1f1a9 345) return 0; upload-pack.c a8d662e3 130) return readsz; 820a5361 149) BUG("packfile_uris requires sideband-all"); a8d662e3 355) send_client_data(1, output_state.buffer, output_state.used); 820a5361 1386) string_list_clear(&data->uri_protocols, 0); wrapper.c 5efde212 70) die("Out of memory, malloc failed (tried to allocate %" PRIuMAX " bytes)", 5efde212 73) error("Out of memory, malloc failed (tried to allocate %" PRIuMAX " bytes)", Commits introducting uncovered code: Barret Rhoden dfb4ee12 blame: use a fingerprint heuristic to match ignored lines Barret Rhoden d0738d94 blame: add config options to handle output for ignored lines Barret Rhoden a5c91678 blame: add the ability to ignore commits and their changes Barret Rhoden 3486d8d4 Move init_skiplist() outside of fsck Christian Coudere265069a Use promisor_remote_get_direct() and has_promisor_remote() Christian Couder dcc8b4e9 promisor-remote: implement promisor_remote_get_direct() Christian Couderb21a55f3 promisor-remote: parse remote.*.partialclonefilter Christian Couder7b6e1b04 Move core_partial_clone_filter_default to promisor-remote.c Christian Couder7bdf0926 promisor-remote: use repository_format_partial_clone Christian Couder54248706 Add initial support for many promisor remotes Christian Couder0ba08c05 Remove fetch-object.{c,h} in favor of promisor-remote.{c,h} Dan McGregorba81921a http: cast result to FILE * Derrick Stolee d5747681 commit-graph: create new version flags Derrick Stolee 5ff7cad9 commit-graph: return with errors during write Jonathan Tana8d662e3 upload-pack: refactor reading of pack-objects out Jonathan Tan820a5361 upload-pack: send part of packfile response as uri Jonathan Tanbf01639c fetch-pack: support more than one pack lockfile Jonathan Tan472fbef8 http-fetch: support fetching packfiles by URL Josh Steadmon 6da1f1a9 protocol: advertise multiple supported versions Junio C Hamano 07ca07bb Merge branch 'jt/fetch-cdn-offload' into pu Martin Koegler 5efde212 zlib.c: use size_t for size Nguyễn Thái Ngọc Duy8f7c7f55 config.c: add repo_config_set_worktree_gently() Nguyễn Thái Ngọc Duy6f11fd5e config: add --move-to Nguyễn Thái Ngọc Duya12c1ff3 config: factor out set_config_source_file() Nickolai Belakovski 3da20422 ref-filter: add worktreepath atom Uncovered code in 'jch' not in 'next' attr.c b0c2d613 1167) return 0; builtin/checkout.c 4de2081d 415) die(_("'%s' cannot be used with updating paths"), 4de2081d 425) die(_("'%s' cannot be used with %s"), a0f3bbfa 433) die(_("neither '%s' or '%s' is specified"), a0f3bbfa 437) die(_("'%s' must be used when '%s' is not specified"), 7800c9e6 442) die(_("'%s' or '%s' cannot be used with %s"), 7800c9e6 447) die(_("'%s' or '%s' cannot be used with %s&q
Re: [PATCH v2 3/5] commit-graph: create new version flags
On 4/25/2019 1:29 AM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" writes: > >> +int version = 0; >> ... >> +if (flags & COMMIT_GRAPH_VERSION_1) >> +version = 1; >> +if (!version) >> +version = 1; >> +if (version != 1) { >> +error(_("unsupported commit-graph version %d"), >> + version); >> +return 1; >> +} > > The above sequence had a certain "Huh?" factor before 5/5 introduced > the support for a later version that is in use by default. > > Is it sensible to define VERSION_$N as if they are independent bits > in a single flags variable? What does it mean for the flags variable > to have both GRAPH_VERSION_1 and GRAPH_VERSION_2 bits set? > > What I am getting at is if this is better done as a n-bit bitfield > that represents a small unsigned integer (e.g. "unsigned char" that > lets you play with up to 255 versions, or "unsigned version : 3" > that limits you to up to 7 versions). > > You use an 8-bit byte in the file format anyway, so it might not be > so bad to have a separate version parameter that is not mixed with > the flag bits, perhaps? This is a reasonable idea, as this is a "pick exactly one" option. It is still important to reduce the overall parameter count by combining the other boolean options into flags. Thanks, -Stolee
Re: [PATCH v5 00/11] Create 'expire' and 'repack' verbs for git-multi-pack-index
On 4/25/2019 1:38 AM, Junio C Hamano wrote: > Derrick Stolee writes: > >> Updates in V5: >> >> * Fixed the error in PATCH 7 due to a missing line that existed in PATCH 8. >> Thanks, Josh Steadmon! >> >> * The 'repack' subcommand now computes the "expected size" of a pack instead >> of >> relying on the total size of the pack. This is actually really important to >> the way VFS for Git uses prefetch packs, and some packs are not being >> repacked because the pack size is larger than the batch size, but really >> there are only a few referenced objects. >> >> * The 'repack' subcommand now allows a batch size of zero to mean "create one >> pack containing all objects in the multi-pack-index". A new commit adds a >> test that hits the boundary cases here, but follows the 'expire' subcommand >> so we can show that cycle of repack-then-expire to safely replace the >> packs. > > I guess all of them need to tweak the authorship from the gmail > address to the work address on the Signed-off-by: trailer, which I > can do (as I noticed it before applying). Sorry. Due to the conflicts, GitGitGadget prevented me from submitting in my normal way, so I pulled out format-patch and send-email for the first time in a very long time. I manually added new "From: " lines in the bodies of the patch files, but they got suppressed, I guess. Thanks, -Stolee
[PATCH v2 4/5] commit-graph: add --version= option
From: Derrick Stolee Allow the commit-graph builtin to specify the file format version using the '--version=' option. Specify the version exactly in the verification tests as using a different version would change the offsets used in those tests. Signed-off-by: Derrick Stolee --- Documentation/git-commit-graph.txt | 3 +++ builtin/commit-graph.c | 13 +++-- t/t5318-commit-graph.sh| 2 +- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/Documentation/git-commit-graph.txt b/Documentation/git-commit-graph.txt index 624470e198..1d1cc70de4 100644 --- a/Documentation/git-commit-graph.txt +++ b/Documentation/git-commit-graph.txt @@ -51,6 +51,9 @@ or `--stdin-packs`.) + With the `--append` option, include all commits that are present in the existing commit-graph file. ++ +With the `--version=` option, specify the file format version. Used +only for testing. 'read':: diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 828b1a713f..65ceb7a141 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = { N_("git commit-graph [--object-dir ]"), N_("git commit-graph read [--object-dir ]"), N_("git commit-graph verify [--object-dir ]"), - N_("git commit-graph write [--object-dir ] [--append] [--reachable|--stdin-packs|--stdin-commits]"), + N_("git commit-graph write [--object-dir ] [--append] [--reachable|--stdin-packs|--stdin-commits] [--version=]"), NULL }; @@ -25,7 +25,7 @@ static const char * const builtin_commit_graph_read_usage[] = { }; static const char * const builtin_commit_graph_write_usage[] = { - N_("git commit-graph write [--object-dir ] [--append] [--reachable|--stdin-packs|--stdin-commits]"), + N_("git commit-graph write [--object-dir ] [--append] [--reachable|--stdin-packs|--stdin-commits] [--version=]"), NULL }; @@ -35,6 +35,7 @@ static struct opts_commit_graph { int stdin_packs; int stdin_commits; int append; + int version; } opts; @@ -156,6 +157,8 @@ static int graph_write(int argc, const char **argv) N_("start walk at commits listed by stdin")), OPT_BOOL(0, "append", &opts.append, N_("include all commits already in the commit-graph file")), + OPT_INTEGER(0, "version", &opts.version, + N_("specify the file format version")), OPT_END(), }; @@ -170,6 +173,12 @@ static int graph_write(int argc, const char **argv) if (opts.append) flags |= COMMIT_GRAPH_APPEND; + switch (opts.version) { + case 1: + flags |= COMMIT_GRAPH_VERSION_1; + break; + } + read_replace_refs = 0; if (opts.reachable) diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index e80c1cac02..4eb5a09ef3 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -328,7 +328,7 @@ test_expect_success 'replace-objects invalidates commit-graph' ' test_expect_success 'git commit-graph verify' ' cd "$TRASH_DIRECTORY/full" && - git rev-parse commits/8 | git commit-graph write --stdin-commits && + git rev-parse commits/8 | git commit-graph write --stdin-commits --version=1 && git commit-graph verify >output ' -- gitgitgadget
[PATCH v2 2/5] commit-graph: collapse parameters into flags
From: Derrick Stolee The write_commit_graph() and write_commit_graph_reachable() methods currently take two boolean parameters: 'append' and 'report_progress'. We will soon expand the possible options to send to these methods, so instead of complicating the parameter list, first simplify it. Collapse these parameters into a 'flags' parameter, and adjust the callers to provide flags as necessary. Signed-off-by: Derrick Stolee --- builtin/commit-graph.c | 8 +--- builtin/commit.c | 2 +- builtin/gc.c | 4 ++-- commit-graph.c | 9 + commit-graph.h | 8 +--- 5 files changed, 18 insertions(+), 13 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 2e86251f02..828b1a713f 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -142,6 +142,7 @@ static int graph_write(int argc, const char **argv) struct string_list *commit_hex = NULL; struct string_list lines; int result; + int flags = COMMIT_GRAPH_PROGRESS; static struct option builtin_commit_graph_write_options[] = { OPT_STRING(0, "object-dir", &opts.obj_dir, @@ -166,11 +167,13 @@ static int graph_write(int argc, const char **argv) die(_("use at most one of --reachable, --stdin-commits, or --stdin-packs")); if (!opts.obj_dir) opts.obj_dir = get_object_directory(); + if (opts.append) + flags |= COMMIT_GRAPH_APPEND; read_replace_refs = 0; if (opts.reachable) - return write_commit_graph_reachable(opts.obj_dir, opts.append, 1); + return write_commit_graph_reachable(opts.obj_dir, flags); string_list_init(&lines, 0); if (opts.stdin_packs || opts.stdin_commits) { @@ -190,8 +193,7 @@ static int graph_write(int argc, const char **argv) result = write_commit_graph(opts.obj_dir, pack_indexes, commit_hex, - opts.append, - 1); + flags); UNLEAK(lines); return result; diff --git a/builtin/commit.c b/builtin/commit.c index b9ea7222fa..b001ef565d 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1670,7 +1670,7 @@ int cmd_commit(int argc, const char **argv, const char *prefix) "not exceeded, and then \"git reset HEAD\" to recover.")); if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) && - write_commit_graph_reachable(get_object_directory(), 0, 0)) + write_commit_graph_reachable(get_object_directory(), 0)) return 1; repo_rerere(the_repository, 0); diff --git a/builtin/gc.c b/builtin/gc.c index 3984addf73..df2573f124 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -665,8 +665,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix) } if (gc_write_commit_graph && - write_commit_graph_reachable(get_object_directory(), 0, -!quiet && !daemonized)) + write_commit_graph_reachable(get_object_directory(), +!quiet && !daemonized ? COMMIT_GRAPH_PROGRESS : 0)) return 1; if (auto_gc && too_many_loose_objects()) diff --git a/commit-graph.c b/commit-graph.c index ee487a364b..b16c71fd82 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -851,15 +851,14 @@ static int add_ref_to_list(const char *refname, return 0; } -int write_commit_graph_reachable(const char *obj_dir, int append, -int report_progress) +int write_commit_graph_reachable(const char *obj_dir, int flags) { struct string_list list = STRING_LIST_INIT_DUP; int result; for_each_ref(add_ref_to_list, &list); result = write_commit_graph(obj_dir, NULL, &list, - append, report_progress); + flags); string_list_clear(&list, 0); return result; @@ -868,7 +867,7 @@ int write_commit_graph_reachable(const char *obj_dir, int append, int write_commit_graph(const char *obj_dir, struct string_list *pack_indexes, struct string_list *commit_hex, - int append, int report_progress) + int flags) { struct packed_oid_list oids; struct packed_commit_list commits; @@ -887,6 +886,8 @@ int write_commit_graph(const char *obj_dir, struct strbuf progress_title = STRBUF_INIT; unsigned long approx_nr_objects; int res = 0; + int append = flags & COMMIT_GRAPH_APPEND; + int report_progress = flags & COMMIT_GRAPH_PROGRESS;
[PATCH v2 1/5] commit-graph: return with errors during write
From: Derrick Stolee The write_commit_graph() method uses die() to report failure and exit when confronted with an unexpected condition. This use of die() in a library function is incorrect and is now replaced by error() statements and an int return type. Now that we use 'goto cleanup' to jump to the terminal condition on an error, we have new paths that could lead to uninitialized values. New initializers are added to correct for this. The builtins 'commit-graph', 'gc', and 'commit' call these methods, so update them to check the return value. Signed-off-by: Derrick Stolee --- builtin/commit-graph.c | 19 +++-- builtin/commit.c | 5 ++-- builtin/gc.c | 7 ++--- commit-graph.c | 60 +- commit-graph.h | 10 +++ 5 files changed, 62 insertions(+), 39 deletions(-) diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 537fdfd0f0..2e86251f02 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -141,6 +141,7 @@ static int graph_write(int argc, const char **argv) struct string_list *pack_indexes = NULL; struct string_list *commit_hex = NULL; struct string_list lines; + int result; static struct option builtin_commit_graph_write_options[] = { OPT_STRING(0, "object-dir", &opts.obj_dir, @@ -168,10 +169,8 @@ static int graph_write(int argc, const char **argv) read_replace_refs = 0; - if (opts.reachable) { - write_commit_graph_reachable(opts.obj_dir, opts.append, 1); - return 0; - } + if (opts.reachable) + return write_commit_graph_reachable(opts.obj_dir, opts.append, 1); string_list_init(&lines, 0); if (opts.stdin_packs || opts.stdin_commits) { @@ -188,14 +187,14 @@ static int graph_write(int argc, const char **argv) UNLEAK(buf); } - write_commit_graph(opts.obj_dir, - pack_indexes, - commit_hex, - opts.append, - 1); + result = write_commit_graph(opts.obj_dir, + pack_indexes, + commit_hex, + opts.append, + 1); UNLEAK(lines); - return 0; + return result; } int cmd_commit_graph(int argc, const char **argv, const char *prefix) diff --git a/builtin/commit.c b/builtin/commit.c index 2986553d5f..b9ea7222fa 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -1669,8 +1669,9 @@ int cmd_commit(int argc, const char **argv, const char *prefix) "new_index file. Check that disk is not full and quota is\n" "not exceeded, and then \"git reset HEAD\" to recover.")); - if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0)) - write_commit_graph_reachable(get_object_directory(), 0, 0); + if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) && + write_commit_graph_reachable(get_object_directory(), 0, 0)) + return 1; repo_rerere(the_repository, 0); run_command_v_opt(argv_gc_auto, RUN_GIT_CMD); diff --git a/builtin/gc.c b/builtin/gc.c index 020f725acc..3984addf73 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -664,9 +664,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix) clean_pack_garbage(); } - if (gc_write_commit_graph) - write_commit_graph_reachable(get_object_directory(), 0, -!quiet && !daemonized); + if (gc_write_commit_graph && + write_commit_graph_reachable(get_object_directory(), 0, +!quiet && !daemonized)) + return 1; if (auto_gc && too_many_loose_objects()) warning(_("There are too many unreachable loose objects; " diff --git a/commit-graph.c b/commit-graph.c index 66865acbd7..ee487a364b 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -851,27 +851,30 @@ static int add_ref_to_list(const char *refname, return 0; } -void write_commit_graph_reachable(const char *obj_dir, int append, - int report_progress) +int write_commit_graph_reachable(const char *obj_dir, int append, +int report_progress) { struct string_list list = STRING_LIST_INIT_DUP; + int result; for_each_ref(add_ref_to_list, &list); - write_commit_graph(obj_dir, NULL, &list, append, report_progress); + result = write_commit_graph(obj_dir, NULL, &list, + append, report_progress); string_li
[PATCH v2 5/5] commit-graph: implement file format version 2
From: Derrick Stolee The commit-graph file format had some shortcomings which we now correct: 1. The hash algorithm was determined by a single byte, instead of the 4-byte format identifier. 2. There was no way to update the reachability index we used. We currently only support generation numbers, but that will change in the future. 3. Git did not fail with error if the unused eighth byte was non-zero, so we could not use that to indicate an incremental file format without breaking compatibility across versions. The new format modifies the header of the commit-graph to solve these problems. We use the 4-byte hash format id, freeing up a byte in our 32-bit alignment to introduce a reachability index version. We can also fail to read the commit-graph if the eighth byte is non-zero. Update the 'git commit-graph read' subcommand to display the new data. Set the default file format version to 2, and adjust the tests to expect the new 'git commit-graph read' output. Add explicit tests for the upgrade path from version 1 to 2. Users with an existing commit-graph with version 1 will seamlessly upgrade to version 2 on their next write. While we converted the existing 'verify' tests to use a version 1 file to avoid recalculating data offsets, add explicit 'verify' tests on a version 2 file that corrupt the new header values. Signed-off-by: Derrick Stolee --- .../technical/commit-graph-format.txt | 26 +++- builtin/commit-graph.c| 9 +++ commit-graph.c| 43 ++-- commit-graph.h| 1 + t/t5318-commit-graph.sh | 66 +-- 5 files changed, 134 insertions(+), 11 deletions(-) diff --git a/Documentation/technical/commit-graph-format.txt b/Documentation/technical/commit-graph-format.txt index 16452a0504..e367aa94b1 100644 --- a/Documentation/technical/commit-graph-format.txt +++ b/Documentation/technical/commit-graph-format.txt @@ -31,13 +31,22 @@ and hash type. All 4-byte numbers are in network order. +There are two versions available, 1 and 2. These currently differ only in +the header. + HEADER: +All commit-graph files use the first five bytes for the same purpose. + 4-byte signature: The signature is: {'C', 'G', 'P', 'H'} 1-byte version number: - Currently, the only valid version is 1. + Currently, the valid version numbers are 1 and 2. + +The remainder of the header changes depending on the version. + +Version 1: 1-byte Hash Version (1 = SHA-1) We infer the hash length (H) from this value. @@ -47,6 +56,21 @@ HEADER: 1-byte (reserved for later use) Current clients should ignore this value. +Version 2: + + 1-byte number (C) of "chunks" + + 1-byte reachability index version number: + Currently, the only valid number is 1. + + 1-byte (reserved for later use) + Current clients expect this value to be zero, and will not + try to read the commit-graph file if it is non-zero. + + 4-byte format identifier for the hash algorithm: + If this identifier does not agree with the repository's current + hash algorithm, then the client will not read the commit graph. + CHUNK LOOKUP: (C + 1) * 12 bytes listing the table of contents for the chunks: diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c index 65ceb7a141..1485b4daaf 100644 --- a/builtin/commit-graph.c +++ b/builtin/commit-graph.c @@ -117,6 +117,11 @@ static int graph_read(int argc, const char **argv) *(unsigned char*)(graph->data + 5), *(unsigned char*)(graph->data + 6), *(unsigned char*)(graph->data + 7)); + + if (*(unsigned char *)(graph->data + 4) == 2) + printf("hash algorithm: %X\n", + get_be32(graph->data + 8)); + printf("num_commits: %u\n", graph->num_commits); printf("chunks:"); @@ -177,6 +182,10 @@ static int graph_write(int argc, const char **argv) case 1: flags |= COMMIT_GRAPH_VERSION_1; break; + + case 2: + flags |= COMMIT_GRAPH_VERSION_2; + break; } read_replace_refs = 0; diff --git a/commit-graph.c b/commit-graph.c index e75e1655fb..14d6aebd99 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -152,7 +152,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, uint64_t last_chunk_offset; uint32_t last_chunk_id; uint32_t graph_signature; - unsigned char graph_version, hash_version; + unsigned char graph_version, hash_version, reach_index_version; + uint32_t hash_id; if (!graph_map) return NULL; @@ -170,7 +171,7 @@ struct commit_graph *parse_commit_graph(void *gr
[PATCH v2 3/5] commit-graph: create new version flags
From: Derrick Stolee In anticipation of a new commit-graph file format version, create a flag for the write_commit_graph() and write_commit_graph_reachable() methods to take a version number. When there is no specified version, the implementation selects a default value. Currently, the only valid value is 1. The file format will change the header information, so place the existing header logic inside a switch statement with only one case. Signed-off-by: Derrick Stolee --- commit-graph.c | 58 +- commit-graph.h | 1 + 2 files changed, 40 insertions(+), 19 deletions(-) diff --git a/commit-graph.c b/commit-graph.c index b16c71fd82..e75e1655fb 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -25,9 +25,6 @@ #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16) -#define GRAPH_VERSION_1 0x1 -#define GRAPH_VERSION GRAPH_VERSION_1 - #define GRAPH_EXTRA_EDGES_NEEDED 0x8000 #define GRAPH_EDGE_LAST_MASK 0x7fff #define GRAPH_PARENT_NONE 0x7000 @@ -173,30 +170,35 @@ struct commit_graph *parse_commit_graph(void *graph_map, int fd, } graph_version = *(unsigned char*)(data + 4); - if (graph_version != GRAPH_VERSION) { + if (graph_version != 1) { error(_("commit-graph version %X does not match version %X"), - graph_version, GRAPH_VERSION); - return NULL; - } - - hash_version = *(unsigned char*)(data + 5); - if (hash_version != oid_version()) { - error(_("commit-graph hash version %X does not match version %X"), - hash_version, oid_version()); + graph_version, 1); return NULL; } graph = alloc_commit_graph(); + switch (graph_version) { + case 1: + hash_version = *(unsigned char*)(data + 5); + if (hash_version != oid_version()) { + error(_("commit-graph hash version %X does not match version %X"), + hash_version, oid_version()); + return NULL; + } + + graph->num_chunks = *(unsigned char*)(data + 6); + chunk_lookup = data + 8; + break; + } + graph->hash_len = the_hash_algo->rawsz; - graph->num_chunks = *(unsigned char*)(data + 6); graph->graph_fd = fd; graph->data = graph_map; graph->data_len = graph_size; last_chunk_id = 0; last_chunk_offset = 8; - chunk_lookup = data + 8; for (i = 0; i < graph->num_chunks; i++) { uint32_t chunk_id; uint64_t chunk_offset; @@ -888,10 +890,22 @@ int write_commit_graph(const char *obj_dir, int res = 0; int append = flags & COMMIT_GRAPH_APPEND; int report_progress = flags & COMMIT_GRAPH_PROGRESS; + int version = 0; + int header_size = 0; if (!commit_graph_compatible(the_repository)) return 0; + if (flags & COMMIT_GRAPH_VERSION_1) + version = 1; + if (!version) + version = 1; + if (version != 1) { + error(_("unsupported commit-graph version %d"), + version); + return 1; + } + oids.nr = 0; approx_nr_objects = approximate_object_count(); oids.alloc = approx_nr_objects / 32; @@ -1076,10 +1090,16 @@ int write_commit_graph(const char *obj_dir, hashwrite_be32(f, GRAPH_SIGNATURE); - hashwrite_u8(f, GRAPH_VERSION); - hashwrite_u8(f, oid_version()); - hashwrite_u8(f, num_chunks); - hashwrite_u8(f, 0); /* unused padding byte */ + hashwrite_u8(f, version); + + switch (version) { + case 1: + hashwrite_u8(f, oid_version()); + hashwrite_u8(f, num_chunks); + hashwrite_u8(f, 0); /* unused padding byte */ + header_size = 8; + break; + } chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT; chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP; @@ -1090,7 +1110,7 @@ int write_commit_graph(const char *obj_dir, chunk_ids[3] = 0; chunk_ids[4] = 0; - chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH; + chunk_offsets[0] = header_size + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH; chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE; chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr; chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr; diff --git a/commit-graph.h b/commit-graph.h index 390474047c..d7cd13deb3 100644 --- a/commit-graph.h +++ b/commit-graph.h @@ -67,6 +67,7 @@ int generation_numbers_enabled(struct repository *r); #define COMMIT_GRAPH_APPEND (1 << 0) #
[PATCH v2 0/5] Create commit-graph file format v2
The commit-graph file format has some shortcomings that were discussed on-list: 1. It doesn't use the 4-byte format ID from the_hash_algo. 2. There is no way to change the reachability index from generation numbers to corrected commit date [1]. 3. The unused byte in the format could be used to signal the file is incremental, but current clients ignore the value even if it is non-zero. This series adds a new version (2) to the commit-graph file. The fifth byte already specified the file format, so existing clients will gracefully respond to files with a different version number. The only real change now is that the header takes 12 bytes instead of 8, due to using the 4-byte format ID for the hash algorithm. The new bytes reserved for the reachability index version and incremental file formats are now expected to be equal to the defaults. When we update these values to be flexible in the future, if a client understands commit-graph v2 but not those new values, then it will fail gracefully. NOTE: this series was rebased onto ab/commit-graph-fixes, as the conflicts were significant and subtle. Thanks, -Stolee [1] https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51ef...@gmail.com/ Derrick Stolee (5): commit-graph: return with errors during write commit-graph: collapse parameters into flags commit-graph: create new version flags commit-graph: add --version= option commit-graph: implement file format version 2 Documentation/git-commit-graph.txt| 3 + .../technical/commit-graph-format.txt | 26 ++- builtin/commit-graph.c| 43 +++-- builtin/commit.c | 5 +- builtin/gc.c | 7 +- commit-graph.c| 156 +- commit-graph.h| 16 +- t/t5318-commit-graph.sh | 68 +++- 8 files changed, 254 insertions(+), 70 deletions(-) base-commit: 93b4405ffe4ad9308740e7c1c71383bfc369baaa Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-112%2Fderrickstolee%2Fgraph%2Fv2-head-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-112/derrickstolee/graph/v2-head-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/112 Range-diff vs v1: 1: e72498d0c5 ! 1: 91f300ec0a commit-graph: return with errors during write @@ -82,8 +82,8 @@ --- a/builtin/gc.c +++ b/builtin/gc.c @@ - if (pack_garbage.nr > 0) clean_pack_garbage(); + } - if (gc_write_commit_graph) - write_commit_graph_reachable(get_object_directory(), 0, @@ -182,9 +182,9 @@ } stop_progress(&progress); -- if (count_distinct >= GRAPH_PARENT_MISSING) +- if (count_distinct >= GRAPH_EDGE_LAST_MASK) - die(_("the commit graph format cannot write %d commits"), count_distinct); -+ if (count_distinct >= GRAPH_PARENT_MISSING) { ++ if (count_distinct >= GRAPH_EDGE_LAST_MASK) { + error(_("the commit graph format cannot write %d commits"), count_distinct); + res = 1; + goto cleanup; @@ -196,9 +196,9 @@ num_chunks = num_extra_edges ? 4 : 3; stop_progress(&progress); -- if (commits.nr >= GRAPH_PARENT_MISSING) +- if (commits.nr >= GRAPH_EDGE_LAST_MASK) - die(_("too many commits to write graph")); -+ if (commits.nr >= GRAPH_PARENT_MISSING) { ++ if (commits.nr >= GRAPH_EDGE_LAST_MASK) { + error(_("too many commits to write graph")); + res = 1; + goto cleanup; 2: 43a40d0c43 ! 2: 04f5df1135 commit-graph: collapse parameters into flags @@ -66,7 +66,7 @@ --- a/builtin/gc.c +++ b/builtin/gc.c @@ - clean_pack_garbage(); + } if (gc_write_commit_graph && - write_commit_graph_reachable(get_object_directory(), 0, 3: 39319e36bc ! 3: 4ddb829163 commit-graph: create new version flags @@ -25,25 +25,25 @@ -#define GRAPH_VERSION GRAPH_VERSION_1 - #define GRAPH_EXTRA_EDGES_NEEDED 0x8000 - #define GRAPH_PARENT_MISSING 0x7fff #define GRAPH_EDGE_LAST_MASK 0x7fff + #define GRAPH_PARENT_NONE 0x7000 @@ } graph_version = *(unsigned char*)(data + 4); - if (graph_version != GRAPH_VERSION) { + if (graph_version != 1) { - error(_("graph version %X does not match version %X"), + error(_("commit-graph version %X does not match version %X"), - graph_version, GRAPH_VERSION); -- goto cleanup_fail; +- return NULL; - } - - hash_version = *(unsigned char
[PATCH v5 08/11] midx: implement midx_repack()
To repack with a non-zero batch-size, first sort all pack-files by their modified time. Second, walk those pack-files from oldest to newest, compute their expected size, and add the packs to a list if they are smaller than the given batch-size. Stop when the total expected size is at least the batch size. If the batch size is zero, select all packs in the multi-pack-index. Finally, collect the objects from the multi-pack-index that are in the selected packs and send them to 'git pack-objects'. Write a new multi-pack-index that includes the new pack. Using a batch size of zero is very similar to a standard 'git repack' command, except that we do not delete the old packs and instead rely on the new multi-pack-index to prevent new processes from reading the old packs. This does not disrupt other Git processes that are currently reading the old packs based on the old multi-pack-index. While first designing a 'git multi-pack-index repack' operation, I started by collecting the batches based on the actual size of the objects instead of the size of the pack-files. This allows repacking a large pack-file that has very few referencd objects. However, this came at a significant cost of parsing pack-files instead of simply reading the multi-pack-index and getting the file information for the pack-files. The "expected size" version provides similar behavior, but could skip a pack-file if the average object size is much larger than the actual size of the referenced objects, or can create a large pack if the actual size of the referenced objects is larger than the expected size. Signed-off-by: Derrick Stolee --- midx.c | 150 +++- t/t5319-multi-pack-index.sh | 28 +++ 2 files changed, 177 insertions(+), 1 deletion(-) diff --git a/midx.c b/midx.c index 768a7dff73..01c6a05732 100644 --- a/midx.c +++ b/midx.c @@ -8,6 +8,7 @@ #include "sha1-lookup.h" #include "midx.h" #include "progress.h" +#include "run-command.h" #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ #define MIDX_VERSION 1 @@ -1113,7 +1114,154 @@ int expire_midx_packs(const char *object_dir) return result; } -int midx_repack(const char *object_dir, size_t batch_size) +struct repack_info { + timestamp_t mtime; + uint32_t referenced_objects; + uint32_t pack_int_id; +}; + +static int compare_by_mtime(const void *a_, const void *b_) { + const struct repack_info *a, *b; + + a = (const struct repack_info *)a_; + b = (const struct repack_info *)b_; + + if (a->mtime < b->mtime) + return -1; + if (a->mtime > b->mtime) + return 1; + return 0; +} + +static int fill_included_packs_all(struct multi_pack_index *m, + unsigned char *include_pack) +{ + uint32_t i; + + for (i = 0; i < m->num_packs; i++) + include_pack[i] = 1; + + return m->num_packs < 2; +} + +static int fill_included_packs_batch(struct multi_pack_index *m, +unsigned char *include_pack, +size_t batch_size) +{ + uint32_t i, packs_to_repack; + size_t total_size; + struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct repack_info)); + + for (i = 0; i < m->num_packs; i++) { + pack_info[i].pack_int_id = i; + + if (prepare_midx_pack(m, i)) + continue; + + pack_info[i].mtime = m->packs[i]->mtime; + } + + for (i = 0; batch_size && i < m->num_objects; i++) { + uint32_t pack_int_id = nth_midxed_pack_int_id(m, i); + pack_info[pack_int_id].referenced_objects++; + } + + QSORT(pack_info, m->num_packs, compare_by_mtime); + + total_size = 0; + packs_to_repack = 0; + for (i = 0; total_size < batch_size && i < m->num_packs; i++) { + int pack_int_id = pack_info[i].pack_int_id; + struct packed_git *p = m->packs[pack_int_id]; + size_t expected_size; + + if (!p) + continue; + if (open_pack_index(p) || !p->num_objects) + continue; + + expected_size = (size_t)(p->pack_size +* pack_info[i].referenced_objects); + expected_size /= p->num_objects; + + if (expected_size >= batch_size) + continue; + + packs_to_repack++; + total_size += expected_size; + include_pack[pack_int_id] = 1; + } + + free(pack_info); + + if (total_size < batch_size || packs_to_repack < 2) + return 1; + return
[PATCH v5 01/11] repack: refactor pack deletion for future use
The repack builtin deletes redundant pack-files and their associated .idx, .promisor, .bitmap, and .keep files. We will want to re-use this logic in the future for other types of repack, so pull the logic into 'unlink_pack_path()' in packfile.c. The 'ignore_keep' parameter is enabled for the use in repack, but will be important for a future caller. Signed-off-by: Derrick Stolee --- builtin/repack.c | 14 ++ packfile.c | 28 packfile.h | 7 +++ 3 files changed, 37 insertions(+), 12 deletions(-) diff --git a/builtin/repack.c b/builtin/repack.c index 45583683ee..3d445b34b4 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -129,19 +129,9 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list, static void remove_redundant_pack(const char *dir_name, const char *base_name) { - const char *exts[] = {".pack", ".idx", ".keep", ".bitmap", ".promisor"}; - int i; struct strbuf buf = STRBUF_INIT; - size_t plen; - - strbuf_addf(&buf, "%s/%s", dir_name, base_name); - plen = buf.len; - - for (i = 0; i < ARRAY_SIZE(exts); i++) { - strbuf_setlen(&buf, plen); - strbuf_addstr(&buf, exts[i]); - unlink(buf.buf); - } + strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name); + unlink_pack_path(buf.buf, 1); strbuf_release(&buf); } diff --git a/packfile.c b/packfile.c index d1e6683ffe..bacecb4d0d 100644 --- a/packfile.c +++ b/packfile.c @@ -352,6 +352,34 @@ void close_all_packs(struct raw_object_store *o) } } +void unlink_pack_path(const char *pack_name, int force_delete) +{ + static const char *exts[] = {".pack", ".idx", ".keep", ".bitmap", ".promisor"}; + int i; + struct strbuf buf = STRBUF_INIT; + size_t plen; + + strbuf_addstr(&buf, pack_name); + strip_suffix_mem(buf.buf, &buf.len, ".pack"); + plen = buf.len; + + if (!force_delete) { + strbuf_addstr(&buf, ".keep"); + if (!access(buf.buf, F_OK)) { + strbuf_release(&buf); + return; + } + } + + for (i = 0; i < ARRAY_SIZE(exts); i++) { + strbuf_setlen(&buf, plen); + strbuf_addstr(&buf, exts[i]); + unlink(buf.buf); + } + + strbuf_release(&buf); +} + /* * The LRU pack is the one with the oldest MRU window, preferring packs * with no used windows, or the oldest mtime if it has no windows allocated. diff --git a/packfile.h b/packfile.h index 6c4037605d..5b7bcdb1dd 100644 --- a/packfile.h +++ b/packfile.h @@ -86,6 +86,13 @@ extern void unuse_pack(struct pack_window **); extern void clear_delta_base_cache(void); extern struct packed_git *add_packed_git(const char *path, size_t path_len, int local); +/* + * Unlink the .pack and associated extension files. + * Does not unlink if 'force_delete' is false and the pack-file is + * marked as ".keep". + */ +extern void unlink_pack_path(const char *pack_name, int force_delete); + /* * Make sure that a pointer access into an mmap'd index file is within bounds, * and can provide at least 8 bytes of data. -- 2.21.0.1096.g1c91fdc207
[PATCH v5 02/11] Docs: rearrange subcommands for multi-pack-index
We will add new subcommands to the multi-pack-index, and that will make the documentation a bit messier. Clean up the 'verb' descriptions by renaming the concept to 'subcommand' and removing the reference to the object directory. Helped-by: Stefan Beller Helped-by: Szeder Gábor Signed-off-by: Derrick Stolee --- Documentation/git-multi-pack-index.txt | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index f7778a2c85..1af406aca2 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -9,7 +9,7 @@ git-multi-pack-index - Write and verify multi-pack-indexes SYNOPSIS [verse] -'git multi-pack-index' [--object-dir=] +'git multi-pack-index' [--object-dir=] DESCRIPTION --- @@ -23,13 +23,13 @@ OPTIONS `/packs/multi-pack-index` for the current MIDX file, and `/packs` for the pack-files to index. +The following subcommands are available: + write:: - When given as the verb, write a new MIDX file to - `/packs/multi-pack-index`. + Write a new MIDX file. verify:: - When given as the verb, verify the contents of the MIDX file - at `/packs/multi-pack-index`. + Verify the contents of the MIDX file. EXAMPLES -- 2.21.0.1096.g1c91fdc207
[PATCH v5 06/11] multi-pack-index: implement 'expire' subcommand
The 'git multi-pack-index expire' subcommand looks at the existing mult-pack-index, counts the number of objects referenced in each pack-file, deletes the pack-fils with no referenced objects, and rewrites the multi-pack-index to no longer reference those packs. Refactor the write_midx_file() method to call write_midx_internal() which now takes an existing 'struct multi_pack_index' and a list of pack-files to drop (as specified by the names of their pack- indexes). As we write the new multi-pack-index, we drop those file names from the list of known pack-files. The expire_midx_packs() method removes the unreferenced pack-files after carefully closing the packs to avoid open handles. Test that a new pack-file that covers the contents of two other pack-files leads to those pack-files being deleted during the expire subcommand. Be sure to read the multi-pack-index to ensure it no longer references those packs. Signed-off-by: Derrick Stolee --- midx.c | 120 +--- t/t5319-multi-pack-index.sh | 20 ++ 2 files changed, 130 insertions(+), 10 deletions(-) diff --git a/midx.c b/midx.c index 95c39106b2..299e9b2e8f 100644 --- a/midx.c +++ b/midx.c @@ -33,6 +33,8 @@ #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t)) #define MIDX_LARGE_OFFSET_NEEDED 0x8000 +#define PACK_EXPIRED UINT_MAX + static char *get_midx_filename(const char *object_dir) { return xstrfmt("%s/pack/multi-pack-index", object_dir); @@ -381,6 +383,7 @@ struct pack_info { uint32_t orig_pack_int_id; char *pack_name; struct packed_git *p; + unsigned expired : 1; }; static int pack_info_compare(const void *_a, const void *_b) @@ -428,6 +431,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, packs->info[packs->nr].pack_name = xstrdup(file_name); packs->info[packs->nr].orig_pack_int_id = packs->nr; + packs->info[packs->nr].expired = 0; packs->nr++; } } @@ -587,13 +591,17 @@ static size_t write_midx_pack_names(struct hashfile *f, size_t written = 0; for (i = 0; i < num_packs; i++) { - size_t writelen = strlen(info[i].pack_name) + 1; + size_t writelen; + + if (info[i].expired) + continue; if (i && strcmp(info[i].pack_name, info[i - 1].pack_name) <= 0) BUG("incorrect pack-file order: %s before %s", info[i - 1].pack_name, info[i].pack_name); + writelen = strlen(info[i].pack_name) + 1; hashwrite(f, info[i].pack_name, writelen); written += writelen; } @@ -675,6 +683,11 @@ static size_t write_midx_object_offsets(struct hashfile *f, int large_offset_nee for (i = 0; i < nr_objects; i++) { struct pack_midx_entry *obj = list++; + if (perm[obj->pack_int_id] == PACK_EXPIRED) + BUG("object %s is in an expired pack with int-id %d", + oid_to_hex(&obj->oid), + obj->pack_int_id); + hashwrite_be32(f, perm[obj->pack_int_id]); if (large_offset_needed && obj->offset >> 31) @@ -721,7 +734,8 @@ static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_off return written; } -int write_midx_file(const char *object_dir) +static int write_midx_internal(const char *object_dir, struct multi_pack_index *m, + struct string_list *packs_to_drop) { unsigned char cur_chunk, num_chunks = 0; char *midx_name; @@ -737,6 +751,8 @@ int write_midx_file(const char *object_dir) struct pack_midx_entry *entries = NULL; int large_offsets_needed = 0; int pack_name_concat_len = 0; + int dropped_packs = 0; + int result = 0; midx_name = get_midx_filename(object_dir); if (safe_create_leading_directories(midx_name)) { @@ -745,7 +761,10 @@ int write_midx_file(const char *object_dir) midx_name); } - packs.m = load_multi_pack_index(object_dir, 1); + if (m) + packs.m = m; + else + packs.m = load_multi_pack_index(object_dir, 1); packs.nr = 0; packs.alloc = packs.m ? packs.m->num_packs : 16; @@ -759,13 +778,14 @@ int write_midx_file(const char *object_dir) packs.info[packs.nr].orig_pack_int_id = i; packs.info[packs.nr].pack_name = xstrdup(packs.m->pack_names[i]); packs.info[packs.nr].p = NULL; + packs.info[packs.nr].expired = 0; packs.nr++;
[PATCH v5 07/11] multi-pack-index: prepare 'repack' subcommand
In an environment where the multi-pack-index is useful, it is due to many pack-files and an inability to repack the object store into a single pack-file. However, it is likely that many of these pack-files are rather small, and could be repacked into a slightly larger pack-file without too much effort. It may also be important to ensure the object store is highly available and the repack operation does not interrupt concurrent git commands. Introduce a 'repack' subcommand to 'git multi-pack-index' that takes a '--batch-size' option. The subcommand will inspect the multi-pack-index for referenced pack-files whose size is smaller than the batch size, until collecting a list of pack-files whose sizes sum to larger than the batch size. Then, a new pack-file will be created containing the objects from those pack-files that are referenced by the multi-pack-index. The resulting pack is likely to actually be smaller than the batch size due to compression and the fact that there may be objects in the pack- files that have duplicate copies in other pack-files. The current change introduces the command-line arguments, and we add a test that ensures we parse these options properly. Since we specify a small batch size, we will guarantee that future implementations do not change the list of pack-files. In addition, we hard-code the modified times of the packs in the pack directory to ensure the list of packs sorted by modified time matches the order if sorted by size (ascending). This will be important in a future test. Signed-off-by: Derrick Stolee --- Documentation/git-multi-pack-index.txt | 17 + builtin/multi-pack-index.c | 12 ++-- midx.c | 5 + midx.h | 1 + t/t5319-multi-pack-index.sh| 20 +++- 5 files changed, 52 insertions(+), 3 deletions(-) diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index 6186c4c936..233b2b7862 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -36,6 +36,23 @@ expire:: have no objects referenced by the MIDX. Rewrite the MIDX file afterward to remove all references to these pack-files. +repack:: + Create a new pack-file containing objects in small pack-files + referenced by the multi-pack-index. If the size given by the + `--batch-size=` argument is zero, then create a pack + containing all objects referenced by the multi-pack-index. For + a non-zero batch size, Select the pack-files by examining packs + from oldest-to-newest, computing the "expected size" by counting + the number of objects in the pack referenced by the + multi-pack-index, then divide by the total number of objects in + the pack and multiply by the pack size. We select packs with + expected size below the batch size until the set of packs have + total expected size at least the batch size. If the total size + does not reach the batch size, then do nothing. If a new pack- + file is created, rewrite the multi-pack-index to reference the + new pack-file. A later run of 'git multi-pack-index expire' will + delete the pack-files that were part of this batch. + EXAMPLES diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index 145de3a46c..c66239de33 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -5,12 +5,13 @@ #include "midx.h" static char const * const builtin_multi_pack_index_usage[] = { - N_("git multi-pack-index [--object-dir=] (write|verify|expire)"), + N_("git multi-pack-index [--object-dir=] (write|verify|expire|repack --batch-size=)"), NULL }; static struct opts_multi_pack_index { const char *object_dir; + unsigned long batch_size; } opts; int cmd_multi_pack_index(int argc, const char **argv, @@ -19,6 +20,8 @@ int cmd_multi_pack_index(int argc, const char **argv, static struct option builtin_multi_pack_index_options[] = { OPT_FILENAME(0, "object-dir", &opts.object_dir, N_("object directory containing set of packfile and pack-index pairs")), + OPT_MAGNITUDE(0, "batch-size", &opts.batch_size, + N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")), OPT_END(), }; @@ -40,6 +43,11 @@ int cmd_multi_pack_index(int argc, const char **argv, return 1; } + if (!strcmp(argv[0], "repack")) + return midx_repack(opts.object_dir, (size_t)opts.batch_size); + if (opts.batch_size) + die(_("--batch-size option is only for 'repack' subcomma
[PATCH v5 09/11] multi-pack-index: test expire while adding packs
During development of the multi-pack-index expire subcommand, a version went out that improperly computed the pack order if a new pack was introduced while other packs were being removed. Part of the subtlety of the bug involved the new pack being placed before other packs that already existed in the multi-pack-index. Add a test to t5319-multi-pack-index.sh that catches this issue. The test adds new packs that cause another pack to be expired, and creates new packs that are lexicographically sorted before and after the existing packs. Signed-off-by: Derrick Stolee --- t/t5319-multi-pack-index.sh | 32 1 file changed, 32 insertions(+) diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index d6c1353514..19b769eea0 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -463,4 +463,36 @@ test_expect_success 'expire removes repacked packs' ' ) ' +test_expect_success 'expire works when adding new packs' ' + ( + cd dup && + git pack-objects --revs .git/objects/pack/pack-combined <<-EOF && + refs/heads/A + ^refs/heads/B + EOF + git pack-objects --revs .git/objects/pack/pack-combined <<-EOF && + refs/heads/B + ^refs/heads/C + EOF + git pack-objects --revs .git/objects/pack/pack-combined <<-EOF && + refs/heads/C + ^refs/heads/D + EOF + git multi-pack-index write && + git pack-objects --revs .git/objects/pack/a-pack <<-EOF && + refs/heads/D + ^refs/heads/E + EOF + git multi-pack-index write && + git pack-objects --revs .git/objects/pack/z-pack <<-EOF && + refs/heads/E + EOF + git multi-pack-index expire && + ls .git/objects/pack/ | grep idx >expect && + test-tool read-midx .git/objects | grep idx >actual && + test_cmp expect actual && + git multi-pack-index verify + ) +' + test_done -- 2.21.0.1096.g1c91fdc207
[PATCH v5 10/11] midx: add test that 'expire' respects .keep files
The 'git multi-pack-index expire' subcommand may delete packs that are not needed from the perspective of the multi-pack-index. If a pack has a .keep file, then we should not delete that pack. Add a test that ensures we preserve a pack that would otherwise be expired. First, create a new pack that contains every object in the repo, then add it to the multi-pack-index. Then create a .keep file for a pack starting with "a-pack" that was added in the previous test. Finally, expire and verify that the pack remains and the other packs were expired. Signed-off-by: Derrick Stolee --- t/t5319-multi-pack-index.sh | 18 ++ 1 file changed, 18 insertions(+) diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 19b769eea0..bcfa520401 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -495,4 +495,22 @@ test_expect_success 'expire works when adding new packs' ' ) ' +test_expect_success 'expire respects .keep files' ' + ( + cd dup && + git pack-objects --revs .git/objects/pack/pack-all <<-EOF && + refs/heads/A + EOF + git multi-pack-index write && + PACKA=$(ls .git/objects/pack/a-pack*\.pack | sed s/\.pack\$//) && + touch $PACKA.keep && + git multi-pack-index expire && + ls -S .git/objects/pack/a-pack* | grep $PACKA >a-pack-files && + test_line_count = 3 a-pack-files && + test-tool read-midx .git/objects | grep idx >midx-list && + test_line_count = 2 midx-list + ) +' + + test_done -- 2.21.0.1096.g1c91fdc207
[PATCH v5 00/11] Create 'expire' and 'repack' verbs for git-multi-pack-index
The multi-pack-index provides a fast way to find an object among a large list of pack-files. It stores a single pack-reference for each object id, so duplicate objects are ignored. Among a list of pack-files storing the same object, the most-recently modified one is used. Create new subcommands for the multi-pack-index builtin. * 'git multi-pack-index expire': If we have a pack-file indexed by the multi-pack-index, but all objects in that pack are duplicated in more-recently modified packs, then delete that pack (and any others like it). Delete the reference to that pack in the multi-pack-index. * 'git multi-pack-index repack --batch-size=': Starting from the oldest pack-files covered by the multi-pack-index, find those whose "expected size" is below the batch size until we have a collection of packs whose expected sizes add up to the batch size. We compute the expected size by multiplying the number of referenced objects by the pack-size and dividing by the total number of objects in the pack. If the batch-size is zero, then select all packs. Create a new pack containing all objects that the multi-pack-index references to those packs. This allows us to create a new pattern for repacking objects: run 'repack'. After enough time has passed that all Git commands that started before the last 'repack' are finished, run 'expire' again. This approach has some advantages over the existing "repack everything" model: 1. Incremental. We can repack a small batch of objects at a time, instead of repacking all reachable objects. We can also limit ourselves to the objects that do not appear in newer pack-files. 2. Highly Available. By adding a new pack-file (and not deleting the old pack-files) we do not interrupt concurrent Git commands, and do not suffer performance degradation. By expiring only pack-files that have no referenced objects, we know that Git commands that are doing normal object lookups* will not be interrupted. * Note: if someone concurrently runs a Git command that uses get_all_packs(), * then that command could try to read the pack-files and pack-indexes that we * are deleting during an expire command. Such commands are usually related to * object maintenance (i.e. fsck, gc, pack-objects) or are related to * less-often-used features (i.e. fast-import, http-backend, server-info). We **are using** this approach in VFS for Git to do background maintenance of the "shared object cache" which is a Git alternate directory filled with packfiles containing commits and trees. We currently download pack-files on an hourly basis to keep up-to-date with the central server. The cache servers supply packs on an hourly and daily basis, so most of the hourly packs become useless after a new daily pack is downloaded. The 'expire' command would clear out most of those packs, but many will still remain with fewer than 100 objects remaining. The 'repack' command (with a batch size of 1-3gb, probably) can condense the remaining packs in commands that run for 1-3 min at a time. Since the daily packs range from 100-250mb, we will also combine and condense those packs. Updates in V5: * Fixed the error in PATCH 7 due to a missing line that existed in PATCH 8. Thanks, Josh Steadmon! * The 'repack' subcommand now computes the "expected size" of a pack instead of relying on the total size of the pack. This is actually really important to the way VFS for Git uses prefetch packs, and some packs are not being repacked because the pack size is larger than the batch size, but really there are only a few referenced objects. * The 'repack' subcommand now allows a batch size of zero to mean "create one pack containing all objects in the multi-pack-index". A new commit adds a test that hits the boundary cases here, but follows the 'expire' subcommand so we can show that cycle of repack-then-expire to safely replace the packs. Junio: It appears that there are some conflicts with the trace2 changes in master. These are not new to the updates in this version. I saw how you resolved these conflicts and replaying that resolution should work for you. Thanks, -Stolee Derrick Stolee (11): repack: refactor pack deletion for future use Docs: rearrange subcommands for multi-pack-index multi-pack-index: prepare for 'expire' subcommand midx: simplify computation of pack name lengths midx: refactor permutation logic and pack sorting multi-pack-index: implement 'expire' subcommand multi-pack-index: prepare 'repack' subcommand midx: implement midx_repack() multi-pack-index: test expire while adding packs midx: add test that 'expire' respects .keep files t5319-multi-pack-index.sh: test batch size zero Documentation/git-multi-pack-index.txt | 32 +- builtin/multi-pack-index.c
[PATCH v5 11/11] t5319-multi-pack-index.sh: test batch size zero
The 'git multi-pack-index repack' command can take a batch size of zero, which creates a new pack-file containing all objects in the multi-pack-index. The first 'repack' command will create one new pack-file, and an 'expire' command after that will delete the old pack-files, as they no longer contain any referenced objects in the multi-pack-index. We must remove the .keep file that was added in the previous test in order to expire that pack-file. Also test that a 'repack' will do nothing if there is only one pack-file. Signed-off-by: Derrick Stolee --- t/t5319-multi-pack-index.sh | 19 +++ 1 file changed, 19 insertions(+) diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index bcfa520401..0f116b4b92 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -512,5 +512,24 @@ test_expect_success 'expire respects .keep files' ' ) ' +test_expect_success 'repack --batch-size=0 repacks everything' ' + ( + cd dup && + rm .git/objects/pack/*.keep && + ls .git/objects/pack/*idx >idx-list && + test_line_count = 2 idx-list && + git multi-pack-index repack --batch-size=0 && + ls .git/objects/pack/*idx >idx-list && + test_line_count = 3 idx-list && + test-tool read-midx .git/objects | grep idx >midx-list && + test_line_count = 3 midx-list && + git multi-pack-index expire && + ls -al .git/objects/pack/*idx >idx-list && + test_line_count = 1 idx-list && + git multi-pack-index repack --batch-size=0 && + ls -al .git/objects/pack/*idx >new-idx-list && + test_cmp idx-list new-idx-list + ) +' test_done -- 2.21.0.1096.g1c91fdc207
[PATCH v5 05/11] midx: refactor permutation logic and pack sorting
In anticipation of the expire subcommand, refactor the way we sort the packfiles by name. This will greatly simplify our approach to dropping expired packs from the list. First, create 'struct pack_info' to replace 'struct pack_pair'. This struct contains the necessary information about a pack, including its name, a pointer to its packfile struct (if not already in the multi-pack-index), and the original pack-int-id. Second, track the pack information using an array of pack_info structs in the pack_list struct. This simplifies the logic around the multiple arrays we were tracking in that struct. Finally, update get_sorted_entries() to not permute the pack-int-id and instead supply the permutation to write_midx_object_offsets(). This requires sorting the packs after get_sorted_entries(). Signed-off-by: Derrick Stolee --- midx.c | 156 + 1 file changed, 69 insertions(+), 87 deletions(-) diff --git a/midx.c b/midx.c index f087bbbe82..95c39106b2 100644 --- a/midx.c +++ b/midx.c @@ -377,12 +377,23 @@ static size_t write_midx_header(struct hashfile *f, return MIDX_HEADER_SIZE; } +struct pack_info { + uint32_t orig_pack_int_id; + char *pack_name; + struct packed_git *p; +}; + +static int pack_info_compare(const void *_a, const void *_b) +{ + struct pack_info *a = (struct pack_info *)_a; + struct pack_info *b = (struct pack_info *)_b; + return strcmp(a->pack_name, b->pack_name); +} + struct pack_list { - struct packed_git **list; - char **names; + struct pack_info *info; uint32_t nr; - uint32_t alloc_list; - uint32_t alloc_names; + uint32_t alloc; struct multi_pack_index *m; }; @@ -395,66 +406,32 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, if (packs->m && midx_contains_pack(packs->m, file_name)) return; - ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list); - ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names); + ALLOC_GROW(packs->info, packs->nr + 1, packs->alloc); - packs->list[packs->nr] = add_packed_git(full_path, - full_path_len, - 0); + packs->info[packs->nr].p = add_packed_git(full_path, + full_path_len, + 0); - if (!packs->list[packs->nr]) { + if (!packs->info[packs->nr].p) { warning(_("failed to add packfile '%s'"), full_path); return; } - if (open_pack_index(packs->list[packs->nr])) { + if (open_pack_index(packs->info[packs->nr].p)) { warning(_("failed to open pack-index '%s'"), full_path); - close_pack(packs->list[packs->nr]); - FREE_AND_NULL(packs->list[packs->nr]); + close_pack(packs->info[packs->nr].p); + FREE_AND_NULL(packs->info[packs->nr].p); return; } - packs->names[packs->nr] = xstrdup(file_name); + packs->info[packs->nr].pack_name = xstrdup(file_name); + packs->info[packs->nr].orig_pack_int_id = packs->nr; packs->nr++; } } -struct pack_pair { - uint32_t pack_int_id; - char *pack_name; -}; - -static int pack_pair_compare(const void *_a, const void *_b) -{ - struct pack_pair *a = (struct pack_pair *)_a; - struct pack_pair *b = (struct pack_pair *)_b; - return strcmp(a->pack_name, b->pack_name); -} - -static void sort_packs_by_name(char **pack_names, uint32_t nr_packs, uint32_t *perm) -{ - uint32_t i; - struct pack_pair *pairs; - - ALLOC_ARRAY(pairs, nr_packs); - - for (i = 0; i < nr_packs; i++) { - pairs[i].pack_int_id = i; - pairs[i].pack_name = pack_names[i]; - } - - QSORT(pairs, nr_packs, pack_pair_compare); - - for (i = 0; i < nr_packs; i++) { - pack_names[i] = pairs[i].pack_name; - perm[pairs[i].pack_int_id] = i; - } - - free(pairs); -} - struct pack_midx_entry { struct object_id oid; uint32_t pack_int_id; @@ -480,7 +457,6 @@ static int midx_oid_compare(const void *_a, const void *_b) } static int nth_midxed_pack_midx_entry(struct multi_pack_index *m, - ui
[PATCH v5 04/11] midx: simplify computation of pack name lengths
Before writing the multi-pack-index, we compute the length of the pack-index names concatenated together. This forms the data in the pack name chunk, and we precompute it to compute chunk offsets. The value is also modified to fit alignment needs. Previously, this computation was coupled with adding packs from the existing multi-pack-index and the remaining packs in the object dir not already covered by the multi-pack-index. In anticipation of this becoming more complicated with the 'expire' subcommand, simplify the computation by centralizing it to a single loop before writing the file. Signed-off-by: Derrick Stolee --- midx.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/midx.c b/midx.c index bb825ef816..f087bbbe82 100644 --- a/midx.c +++ b/midx.c @@ -383,7 +383,6 @@ struct pack_list { uint32_t nr; uint32_t alloc_list; uint32_t alloc_names; - size_t pack_name_concat_len; struct multi_pack_index *m; }; @@ -418,7 +417,6 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, } packs->names[packs->nr] = xstrdup(file_name); - packs->pack_name_concat_len += strlen(file_name) + 1; packs->nr++; } } @@ -762,6 +760,7 @@ int write_midx_file(const char *object_dir) uint32_t nr_entries, num_large_offsets = 0; struct pack_midx_entry *entries = NULL; int large_offsets_needed = 0; + int pack_name_concat_len = 0; midx_name = get_midx_filename(object_dir); if (safe_create_leading_directories(midx_name)) { @@ -777,7 +776,6 @@ int write_midx_file(const char *object_dir) packs.alloc_names = packs.alloc_list; packs.list = NULL; packs.names = NULL; - packs.pack_name_concat_len = 0; ALLOC_ARRAY(packs.list, packs.alloc_list); ALLOC_ARRAY(packs.names, packs.alloc_names); @@ -788,7 +786,6 @@ int write_midx_file(const char *object_dir) packs.list[packs.nr] = NULL; packs.names[packs.nr] = xstrdup(packs.m->pack_names[i]); - packs.pack_name_concat_len += strlen(packs.names[packs.nr]) + 1; packs.nr++; } } @@ -798,10 +795,6 @@ int write_midx_file(const char *object_dir) if (packs.m && packs.nr == packs.m->num_packs) goto cleanup; - if (packs.pack_name_concat_len % MIDX_CHUNK_ALIGNMENT) - packs.pack_name_concat_len += MIDX_CHUNK_ALIGNMENT - - (packs.pack_name_concat_len % MIDX_CHUNK_ALIGNMENT); - ALLOC_ARRAY(pack_perm, packs.nr); sort_packs_by_name(packs.names, packs.nr, pack_perm); @@ -814,6 +807,13 @@ int write_midx_file(const char *object_dir) large_offsets_needed = 1; } + for (i = 0; i < packs.nr; i++) + pack_name_concat_len += strlen(packs.names[i]) + 1; + + if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT) + pack_name_concat_len += MIDX_CHUNK_ALIGNMENT - + (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT); + hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR); f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); FREE_AND_NULL(midx_name); @@ -831,7 +831,7 @@ int write_midx_file(const char *object_dir) cur_chunk++; chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + packs.pack_name_concat_len; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + pack_name_concat_len; cur_chunk++; chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP; -- 2.21.0.1096.g1c91fdc207
[PATCH v5 03/11] multi-pack-index: prepare for 'expire' subcommand
The multi-pack-index tracks objects in a collection of pack-files. Only one copy of each object is indexed, using the modified time of the pack-files to determine tie-breakers. It is possible to have a pack-file with no referenced objects because all objects have a duplicate in a newer pack-file. Introduce a new 'expire' subcommand to the multi-pack-index builtin. This subcommand will delete these unused pack-files and rewrite the multi-pack-index to no longer refer to those files. More details about the specifics will follow as the method is implemented. Add a test that verifies the 'expire' subcommand is correctly wired, but will still be valid when the verb is implemented. Specifically, create a set of packs that should all have referenced objects and should not be removed during an 'expire' operation. The packs are created carefully to ensure they have a specific order when sorted by size. This will be important in a later test. Signed-off-by: Derrick Stolee --- Documentation/git-multi-pack-index.txt | 5 +++ builtin/multi-pack-index.c | 4 ++- midx.c | 5 +++ midx.h | 1 + t/t5319-multi-pack-index.sh| 49 ++ 5 files changed, 63 insertions(+), 1 deletion(-) diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index 1af406aca2..6186c4c936 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -31,6 +31,11 @@ write:: verify:: Verify the contents of the MIDX file. +expire:: + Delete the pack-files that are tracked by the MIDX file, but + have no objects referenced by the MIDX. Rewrite the MIDX file + afterward to remove all references to these pack-files. + EXAMPLES diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index fca70f8e4f..145de3a46c 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -5,7 +5,7 @@ #include "midx.h" static char const * const builtin_multi_pack_index_usage[] = { - N_("git multi-pack-index [--object-dir=] (write|verify)"), + N_("git multi-pack-index [--object-dir=] (write|verify|expire)"), NULL }; @@ -44,6 +44,8 @@ int cmd_multi_pack_index(int argc, const char **argv, return write_midx_file(opts.object_dir); if (!strcmp(argv[0], "verify")) return verify_midx_file(opts.object_dir); + if (!strcmp(argv[0], "expire")) + return expire_midx_packs(opts.object_dir); die(_("unrecognized verb: %s"), argv[0]); } diff --git a/midx.c b/midx.c index 730ff84dff..bb825ef816 100644 --- a/midx.c +++ b/midx.c @@ -1025,3 +1025,8 @@ int verify_midx_file(const char *object_dir) return verify_midx_error; } + +int expire_midx_packs(const char *object_dir) +{ + return 0; +} diff --git a/midx.h b/midx.h index 774f652530..e3a2b740b5 100644 --- a/midx.h +++ b/midx.h @@ -49,6 +49,7 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i int write_midx_file(const char *object_dir); void clear_midx_file(struct repository *r); int verify_midx_file(const char *object_dir); +int expire_midx_packs(const char *object_dir); void close_midx(struct multi_pack_index *m); diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 70926b5bc0..a8528f7da0 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -348,4 +348,53 @@ test_expect_success 'verify incorrect 64-bit offset' ' "incorrect object offset" ' +test_expect_success 'setup expire tests' ' + mkdir dup && + ( + cd dup && + git init && + test-tool genrandom "data" 4096 >large_file.txt && + git update-index --add large_file.txt && + for i in $(test_seq 1 20) + do + test_commit $i + done && + git branch A HEAD && + git branch B HEAD~8 && + git branch C HEAD~13 && + git branch D HEAD~16 && + git branch E HEAD~18 && + git pack-objects --revs .git/objects/pack/pack-A <<-EOF && + refs/heads/A + ^refs/heads/B + EOF + git pack-objects --revs .git/objects/pack/pack-B <<-EOF && + refs/heads/B + ^refs/heads/C + EOF + git pack-objects --revs .git/objects/pack/pack-C <<-EOF && + refs/heads/C + ^refs/heads/D + EOF + git pack-objec
Git Test Coverage Report (April 23)
lone_filter_default, b21a55f3 139) return 0; dcc8b4e9 202) static int remove_fetched_oids(struct object_id **oids, int oid_nr, int to_free) dcc8b4e9 204) int i, missing_nr = 0; dcc8b4e9 205) int *missing = xcalloc(oid_nr, sizeof(*missing)); dcc8b4e9 206) struct object_id *old_oids = *oids; dcc8b4e9 208) int old_fetch_if_missing = fetch_if_missing; dcc8b4e9 210) fetch_if_missing = 0; dcc8b4e9 212) for (i = 0; i < oid_nr; i++) dcc8b4e9 213) if (oid_object_info_extended(the_repository, &old_oids[i], NULL, 0)) { dcc8b4e9 214) missing[i] = 1; dcc8b4e9 215) missing_nr++; dcc8b4e9 218) fetch_if_missing = old_fetch_if_missing; dcc8b4e9 220) if (missing_nr) { dcc8b4e9 221) int j = 0; dcc8b4e9 222) new_oids = xcalloc(missing_nr, sizeof(*new_oids)); dcc8b4e9 223) for (i = 0; i < oid_nr; i++) dcc8b4e9 224) if (missing[i]) dcc8b4e9 225) oidcpy(&new_oids[j++], &old_oids[i]); dcc8b4e9 226) *oids = new_oids; dcc8b4e9 227) if (to_free) dcc8b4e9 228) free(old_oids); dcc8b4e9 231) free(missing); dcc8b4e9 233) return missing_nr; dcc8b4e9 248) if (missing_nr == 1) dcc8b4e9 249) continue; dcc8b4e9 250) missing_nr = remove_fetched_oids(&missing_oids, missing_nr, to_free); dcc8b4e9 251) if (missing_nr) { dcc8b4e9 252) to_free = 1; dcc8b4e9 253) continue; dcc8b4e9 261) free(missing_oids); protocol.c 6da1f1a9 37) die(_("Unrecognized protocol version")); 6da1f1a9 39) die(_("Unrecognized protocol_version")); 6da1f1a9 74) BUG("late attempt to register an allowed protocol version"); ref-filter.c 3da20422 93) keydata_aka_refname ? keydata_aka_refname : k->wt->head_ref); remote-curl.c 6da1f1a9 345) return 0; sequencer.c 4a72486d 2184) return error_errno("unable to open '%s'", todo_file); 4a72486d 2188) eol--; /* strip Carriage Return */ 4a72486d 2190) goto fail; 4a72486d 2196) goto fail; b07d9bfd 2300) error_errno("unable to open '%s'", todo_path); b07d9bfd 2301) return 0; upload-pack.c a8d662e3 130) return readsz; 820a5361 149) BUG("packfile_uris requires sideband-all"); a8d662e3 355) send_client_data(1, output_state.buffer, output_state.used); 820a5361 1384) string_list_clear(&data->uri_protocols, 0); wrapper.c 5efde212 70) die("Out of memory, malloc failed (tried to allocate %" PRIuMAX " bytes)", 5efde212 73) error("Out of memory, malloc failed (tried to allocate %" PRIuMAX " bytes)", Commits introducting uncovered code: Barret Rhoden dfb4ee12 blame: use a fingerprint heuristic to match ignored lines Barret Rhoden d0738d94 blame: add config options to handle output for ignored lines Barret Rhoden a5c91678 blame: add the ability to ignore commits and their changes Barret Rhoden 3486d8d4 Move init_skiplist() outside of fsck Christian Couder0ba08c05 Remove fetch-object.{c,h} in favor of promisor-remote.{c,h} Christian Couder54248706 Add initial support for many promisor remotes Christian Couder7bdf0926 promisor-remote: use repository_format_partial_clone Christian Couder7b6e1b04 Move core_partial_clone_filter_default to promisor-remote.c Christian Couderb21a55f3 promisor-remote: parse remote.*.partialclonefilter Christian Couderdcc8b4e9 promisor-remote: implement promisor_remote_get_direct() Christian Coudere265069a Use promisor_remote_get_direct() and has_promisor_remote() Dan McGregorba81921a http: cast result to FILE * Derrick Stolee 8e7e6c05 commit-graph: return with errors during write Derrick Stolee b1beb050 commit-graph: create new version flags Jonathan Tanbf01639c fetch-pack: support more than one pack lockfile Jonathan Tana8d662e3 upload-pack: refactor reading of pack-objects out Jonathan Tan472fbef8 http-fetch: support fetching packfiles by URL Jonathan Tan820a5361 upload-pack: send part of packfile response as uri Josh Steadmon 6da1f1a9 protocol: advertise multiple supported versions Junio C Hamano 8b13ff27 Merge branch 'jt/fetch-cdn-offload' into pu Martin Koegler 5efde212 zlib.c: use size_t for size Nguyễn Thái Ngọc Duy6f11fd5e config: add --move-to Nguyễn Thái Ngọc Duya12c1ff3 config: factor out set_config_source_file() Nguyễn Thái Ngọc Duy8f7c7f55 config.c: add repo_config_set_worktree_gently() Nickolai Belakovski 3da20422 ref-filter: add worktreepath atom Phillip Wood33898531 rebase -i: use struct object_id for squash_onto Phillip Wood460bc3ce rebase -i: run without forking rebase--interactive Phillip Wood0ea0847e rebase -i: use struct rebase_options in do_interactive_rebase() Phillip Wood7d3488eb rebase -i: use struct commit when parsing options Phillip Woodc44c2462 rebase -i: remove duplication Phillip Wood0609b741 rebase -i: combine rebase--interactive.c with rebase.c Phillip Wood4a72486d fix cherry-pick/revert status after commit Phillip Woodb07d9bfd commit/reset: try to clean up sequencer state Phillip
Re: [PATCH] t5304: add a test for pruning with bitmaps
On 4/19/2019 11:24 PM, Jeff King wrote: > Try running t5304 with this: > > diff --git a/reachable.c b/reachable.c > index eba6f64e01..7ec73ef43f 100644 > --- a/reachable.c > +++ b/reachable.c > @@ -191,6 +191,8 @@ static int mark_object_seen(const struct object_id *oid, > if (!obj) > die("unable to create object '%s'", oid_to_hex(oid)); > > + if (!(obj->flags & SEEN)) > + die("seen flag not already there"); > obj->flags |= SEEN; > return 0; > } > > which shows that we are indeed freshly setting SEEN on some objects. Good point! Thanks for clearing that up for me. > Interestingly, I don't _think_ you can cause an object to get pruned > accidentally here, because: [snip] I appreciate the additional context that you gave (and I snipped). This makes me comfortable accepting this test as an exercise of the code (to guard against future changes that create failures like null-refs) and we will need to rely on the performance suite to catch issues like removing the SEEN markers that I had tested. Thanks, -Stolee
Re: [PATCH] t5304: add a test for pruning with bitmaps
On 4/18/2019 4:08 PM, Jeff King wrote: > On Thu, Apr 18, 2019 at 03:49:53PM -0400, Jeff King wrote: >> I dunno. I guess it does not hurt to at least to at least make sure this >> code is running in the normal suite. I don't think that will find the >> more interesting regressions, but at least saves us the from the most >> egregious ones ("oops, the whole thing segfaults because somebody >> changed the API" kinds of things). That's the level of coverage I was hoping to see. I won't be picky if the OBJ_TAG case isn't hit or something. > So here's a test. This goes on top of jk/prune-optim (which is also > already in master). [snip] > +test_expect_success 'trivial prune with bitmaps enabled' ' > + git repack -adb && > + blob=$(echo bitmap-unreachable-blob | git hash-object -w --stdin) && > + git prune --expire=now && > + git cat-file -e HEAD && > + test_must_fail git cat-file -e $blob > +' > + > test_done This test does cover the 'mark_object_seen()' method, which I tested by removing the "!" in the "if (!obj) die();" condition. However, my first inclination was to remove the line obj->flags |= SEEN; from the method, and I found that it still worked! This was confusing, so I looked for places where the SEEN flag was added during bitmap walks, and it turns out that the objects are marked as SEEN by prepare_bitmap_walk(). This means that we can remove a lot of the implementation from reachable.c and have the same effect (and drop an iteration through the objects). See the diff below. Thoughts? -Stolee -->8-- diff --git a/reachable.c b/reachable.c index 0d00a91de4..7d2762d47f 100644 --- a/reachable.c +++ b/reachable.c @@ -159,39 +159,6 @@ int add_unseen_recent_objects_to_traversal(struct rev_info *revs, FOR_EACH_OBJECT_LOCAL_ONLY); } -static void *lookup_object_by_type(struct repository *r, - const struct object_id *oid, - enum object_type type) -{ - switch (type) { - case OBJ_COMMIT: - return lookup_commit(r, oid); - case OBJ_TREE: - return lookup_tree(r, oid); - case OBJ_TAG: - return lookup_tag(r, oid); - case OBJ_BLOB: - return lookup_blob(r, oid); - default: - die("BUG: unknown object type %d", type); - } -} - -static int mark_object_seen(const struct object_id *oid, -enum object_type type, -int exclude, -uint32_t name_hash, -struct packed_git *found_pack, -off_t found_offset) -{ - struct object *obj = lookup_object_by_type(the_repository, oid, type); - if (!obj) - die("unable to create object '%s'", oid_to_hex(oid)); - - obj->flags |= SEEN; - return 0; -} - void mark_reachable_objects(struct rev_info *revs, int mark_reflog, timestamp_t mark_recent, struct progress *progress) { @@ -225,7 +192,10 @@ void mark_reachable_objects(struct rev_info *revs, int mark_reflog, bitmap_git = prepare_bitmap_walk(revs); if (bitmap_git) { - traverse_bitmap_commit_list(bitmap_git, mark_object_seen); + /* +* reachable objects are marked as SEEN +* by prepare_bitmap_walk(revs). +*/ free_bitmap_index(bitmap_git); return; }
Stalled ds/ branches (was What's cooking in git.git (Apr 2019, #03; Tue, 16))
On 4/16/2019 9:19 AM, Junio C Hamano wrote: > * ds/midx-expire-repack (2019-01-27) 10 commits > - midx: add test that 'expire' respects .keep files > - multi-pack-index: test expire while adding packs > - midx: implement midx_repack() > - multi-pack-index: prepare 'repack' subcommand > - multi-pack-index: implement 'expire' subcommand > - midx: refactor permutation logic and pack sorting > - midx: simplify computation of pack name lengths > - multi-pack-index: prepare for 'expire' subcommand > - Docs: rearrange subcommands for multi-pack-index > - repack: refactor pack deletion for future use > > "git multi-pack-index expire/repack" are new subcommands that > consult midx file and are used to drop unused pack files and > coalesce small pack files that are still in use. > > Comments? I'll be back to work full-time next week, and I have some ideas to update this command. Specifically, the 'repack' command can be improved in a few ways. I will elaborate on-thread when I send a new version. > * ds/commit-graph-format-v2 (2019-01-29) 8 commits > - SQUASH : misnamed variables and style fix > - commit-graph: test verifying a corrupt v2 header > - commit-graph: implement file format version 2 > - commit-graph: add --version= option > - commit-graph: create new version flags > - commit-graph: collapse parameters into flags > - commit-graph: return with errors during write > - Merge branch 'bc/sha-256' into ds/commit-graph-format-v2 > > Introduce version 2 of the commit-graph format to correct > deficiency in the initial version. > > Needs update before merging to 'next'. Sorry for the issues colliding with other topics. I'll send a new version next week that will resolve the conflicts. Hopefully I can do this without needing to update the base commit. Thanks, -Stolee
Re: [PATCH 07/14] parse-options: allow ll_callback with OPTION_CALLBACK
On 4/16/2019 4:52 AM, Duy Nguyen wrote: > On Mon, Apr 15, 2019 at 9:06 PM Derrick Stolee wrote: >> >> On 1/26/2019 7:35 PM, Nguyễn Thái Ngọc Duy wrote: >>> @@ -238,7 +249,10 @@ static enum parse_opt_result parse_short_opt(struct >>> parse_opt_ctx_t *p, >>> len++; >>> arg = xmemdupz(p->opt, len); >>> p->opt = p->opt[len] ? p->opt + len : NULL; >>> - rc = (*numopt->callback)(numopt, arg, 0) ? (-1) : 0; >>> + if (numopt->callback) >>> + rc = (*numopt->callback)(numopt, arg, 0) ? (-1) : 0; >>> + else >>> + rc = (*numopt->ll_callback)(p, numopt, arg, 0); >>> free(arg); >>> return rc; >>> } >> >> Hi Duy, >> >> This "else" condition is unreachable. This block is only hit when we have a >> "-" >> option, using OPT_NUMBER_CALLBACK, which is implemented by filling >> "callback", never >> "ll_callback". > > It does not mean ll_callback cannot be used in the future though. That's not a very good reason to add it now. YAGNI. > We > have three options > > 1. drop the else clause > 2. replace with "else BUG();" > 3. implement proper else clause > > Option #1 to me sounds wrong. If you don't support something, yell up. > Silently ignoring it only makes it harder to track down to this > unsupported location when it becomes reachable, however unlikely that > is. > > Which leaves options #2 and #3. If you think this one line is risky > enough, I'll send a patch to replace it with BUG(). It's not about risk, but the fact that it is pointless. The only way to get to this block is to create a 'struct option' with type OPTION_NUMBER manually (ignoring the OPT_NUMBER_CALLBACK macro), which _should_ be unsupported. If someone goes to the pain of adding a way to instantiate with a low-level callback, then they should add this 'if' statement. If you are going to add protection from an incorrect instantiation of 'callback' in a use of OPT_NUMBER_CALLBACK, then the proper place to do that is probably in parse_options_check(), where you were already doing callback-vs-ll_callback checks in the case of OPTION_CALLBACK and OPTION_LOWLEVEL_CALLBACK. However, the only places where the OPT_NUMBER_CALLBACK option appears are builtin/grep.c and t/helper/test-parse-options.c. I don't imagine a need to add low-level callbacks any time soon. >> I recommend reverting this diff segment, but please let me know if I'm >> missing something. I still think this is the easiest way to remove the dead code. I'll try to submit a patch later. I'm still not in full work mode, so I may be slow. Thanks, -Stolee
Re: [PATCH 2/3] prune: use bitmaps for reachability traversal
On 2/13/2019 11:37 PM, Jeff King wrote: > +static void *lookup_object_by_type(struct repository *r, > +const struct object_id *oid, > +enum object_type type) > +{ > + switch (type) { > + case OBJ_COMMIT: > + return lookup_commit(r, oid); > + case OBJ_TREE: > + return lookup_tree(r, oid); > + case OBJ_TAG: > + return lookup_tag(r, oid); > + case OBJ_BLOB: > + return lookup_blob(r, oid); > + default: > + die("BUG: unknown object type %d", type); > + } > +} > + > +static int mark_object_seen(const struct object_id *oid, > + enum object_type type, > + int exclude, > + uint32_t name_hash, > + struct packed_git *found_pack, > + off_t found_offset) > +{ > + struct object *obj = lookup_object_by_type(the_repository, oid, type); > + if (!obj) > + die("unable to create object '%s'", oid_to_hex(oid)); > + > + obj->flags |= SEEN; > + return 0; > +} > + > void mark_reachable_objects(struct rev_info *revs, int mark_reflog, > timestamp_t mark_recent, struct progress *progress) > { > struct connectivity_progress cp; > + struct bitmap_index *bitmap_git; > > /* >* Set up revision parsing, and mark us as being interested > @@ -188,6 +223,13 @@ void mark_reachable_objects(struct rev_info *revs, int > mark_reflog, > cp.progress = progress; > cp.count = 0; > > + bitmap_git = prepare_bitmap_walk(revs); > + if (bitmap_git) { > + traverse_bitmap_commit_list(bitmap_git, mark_object_seen); > + free_bitmap_index(bitmap_git); > + return; > + } > + Peff, This block after "if (bitmap_git)" is not exercised by the (non-performance) test suite, so the rest of the code above is not tested, either. Could a test of this "prune" capability be added to the regression tests around the bitmaps? Thanks, -Stolee
Re: [PATCH v6 2/2] config: allow giving separate author and committer idents
On 2/5/2019 2:52 PM, Ævar Arnfjörð Bjarmason wrote: > From: William Hubbs > -const char *fmt_name(const char *name, const char *email) > +const char *fmt_name(enum want_ident whose_ident) > { > - return fmt_ident(name, email, NULL, IDENT_STRICT | IDENT_NO_DATE); > + char *name = NULL; > + char *email = NULL; > + > + switch (whose_ident) { > + case WANT_BLANK_IDENT: > + break; > + case WANT_AUTHOR_IDENT: > + name = getenv("GIT_AUTHOR_NAME"); > + email = getenv("GIT_AUTHOR_EMAIL"); > + break; > + case WANT_COMMITTER_IDENT: > + name = getenv("GIT_COMMITTER_NAME"); > + email = getenv("GIT_COMMITTER_EMAIL"); > + break; > + } > + return fmt_ident(name, email, whose_ident, NULL, > + IDENT_STRICT | IDENT_NO_DATE); > } William and Ævar, The "WANT_AUTHOR_IDENT" block of this switch statement does not appear to be hit by any tests, despite the tests included in this patch. My guess is that it is ignored because we have the following code in builtin/commit.c: static void determine_author_info(struct strbuf *author_ident) { char *name, *email, *date; struct ident_split author; name = xstrdup_or_null(getenv("GIT_AUTHOR_NAME")); email = xstrdup_or_null(getenv("GIT_AUTHOR_EMAIL")); date = xstrdup_or_null(getenv("GIT_AUTHOR_DATE")); ... This is likely overriding the need to use fmt_name. Should we de-duplicate this use of the environment variable by using your new method at this spot in builtin/commit.c? Thanks, -Stolee
Re: [PATCH 07/14] parse-options: allow ll_callback with OPTION_CALLBACK
On 1/26/2019 7:35 PM, Nguyễn Thái Ngọc Duy wrote: > @@ -238,7 +249,10 @@ static enum parse_opt_result parse_short_opt(struct > parse_opt_ctx_t *p, > len++; > arg = xmemdupz(p->opt, len); > p->opt = p->opt[len] ? p->opt + len : NULL; > - rc = (*numopt->callback)(numopt, arg, 0) ? (-1) : 0; > + if (numopt->callback) > + rc = (*numopt->callback)(numopt, arg, 0) ? (-1) : 0; > + else > + rc = (*numopt->ll_callback)(p, numopt, arg, 0); > free(arg); > return rc; > } Hi Duy, This "else" condition is unreachable. This block is only hit when we have a "-" option, using OPT_NUMBER_CALLBACK, which is implemented by filling "callback", never "ll_callback". I recommend reverting this diff segment, but please let me know if I'm missing something. Thanks, -Stolee
Re: straw poll: git merge conference location
On 3/13/2019 4:55 PM, Jeff King wrote: > We're looking at doing it in North America, but there are two specific > questions: > > - is there preference between East Coast vs West Coast? I have no preference here. > - preferences between Canada and US? There should be serious consideration for Canada only because visa issues may prevent some participants from traveling into the US. I know that some academic conferences have moved from US to Canada (even after already being scheduled for a US location) due to recent changes and general volitility in visa policies. Perhaps I'm making a big deal about something that doesn't affect any contributors that would join us, so I hope that anyone concerned about this can speak up (or contact Peff privately). Thanks, -Stolee
Re: [PATCH 2/4] revision walk: optionally use sparse reachability
On 3/12/2019 9:18 AM, Nathaniel Filardo wrote: > The only caller that passes a non-zero value to prepare_revision_walk > after this patch is builtin/pack-objects. Without this, sparsity seems > to do little good therein, as prepare_revision_walk will densely > propagate UNINTERESTING flags from trees to tree contents, before > mark_edges_uninteresting has a chance to be faster by being sparse. > > Signed-off-by: Nathaniel Filardo > --- > bisect.c | 2 +- > blame.c | 2 +- > builtin/checkout.c | 2 +- > builtin/commit.c | 2 +- > builtin/describe.c | 2 +- > builtin/fast-export.c| 2 +- > builtin/fmt-merge-msg.c | 2 +- > builtin/log.c| 10 +- > builtin/merge.c | 2 +- > builtin/pack-objects.c | 4 ++-- > builtin/rev-list.c | 2 +- > builtin/shortlog.c | 2 +- > bundle.c | 2 +- > http-push.c | 2 +- > merge-recursive.c| 2 +- > pack-bitmap-write.c | 2 +- > pack-bitmap.c| 4 ++-- > reachable.c | 4 ++-- > ref-filter.c | 2 +- > remote.c | 2 +- > revision.c | 10 ++ > revision.h | 2 +- > sequencer.c | 6 +++--- > shallow.c| 2 +- > submodule.c | 4 ++-- > t/helper/test-revision-walking.c | 2 +- > 26 files changed, 41 insertions(+), 39 deletions(-) This patch is very noisy. Perhaps the pattern established in 4f6d26b1: "list-objects: consume sparse tree walk" should be reversed and instead include a 'sparse_tree_walk' setting into 'struct rev_info'. Changing so many method prototypes is rather invasive and unlikely to benefit many of these callers. If the setting is added to 'struct rev_info', then you'll want to remove the parameter from mark_edges_uninteresting(). Thanks, -Stolee
Re: [PATCH 3/4] repack: add --sparse and pass to pack-objects
On 3/12/2019 9:18 AM, Nathaniel Filardo wrote: > The sparse connectivity algorithm saves a whole lot of time when there > are UNINTERESTING trees around. Interesting! Do you have some performance numbers to include with this statement? > @@ -48,6 +49,10 @@ static int repack_config(const char *var, const char > *value, void *cb) > use_delta_islands = git_config_bool(var, value); > return 0; > } > + if (!strcmp(var, "pack.usesparse")) { > + sparse = git_config_bool(var, value); > + return 0; > + } This part is not handled inside of `pack-objects`. Since you are not sending '--no-sparse' when the variable 'sparse' is zero, the config setting will automatically be picked up by the pack-objects builtin. Now, a question of whether you _should_ allow the '--no-sparse' option in the 'repack' command, and send it along to the inner command (when it is present) is another question. > @@ -366,6 +374,8 @@ int cmd_repack(int argc, const char **argv, const char > *prefix) > argv_array_push(&cmd.args, "--all"); > argv_array_push(&cmd.args, "--reflog"); > argv_array_push(&cmd.args, "--indexed-objects"); > + if (sparse) > + argv_array_push(&cmd.args, "--sparse"); > if (repository_format_partial_clone) > argv_array_push(&cmd.args, "--exclude-promisor-objects"); > if (write_bitmaps) > How about a test with this new option? You can probably just add to t5322-pack-objects-sparse.sh. Thanks, -Stolee
[PATCH v7 10/15] trace2:data: pack-objects: add trace2 regions
From: Derrick Stolee When studying the performance of 'git push' we would like to know how much time is spent at various parts of the command. One area that could cause performance trouble is 'git pack-objects'. Add trace2 regions around the three main actions taken in this command: 1. Enumerate objects. 2. Prepare pack. 3. Write pack-file. Signed-off-by: Derrick Stolee Signed-off-by: Jeff Hostetler --- builtin/pack-objects.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 0a70d04604..8a64c2868e 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -33,6 +33,7 @@ #include "object-store.h" #include "dir.h" #include "midx.h" +#include "trace2.h" #define IN_PACK(obj) oe_in_pack(&to_pack, obj) #define SIZE(obj) oe_size(&to_pack, obj) @@ -3472,6 +3473,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) } } + trace2_region_enter("pack-objects", "enumerate-objects", + the_repository); prepare_packing_data(the_repository, &to_pack); if (progress) @@ -3486,12 +3489,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (include_tag && nr_result) for_each_ref(add_ref_tag, NULL); stop_progress(&progress_state); + trace2_region_leave("pack-objects", "enumerate-objects", + the_repository); if (non_empty && !nr_result) return 0; - if (nr_result) + if (nr_result) { + trace2_region_enter("pack-objects", "prepare-pack", + the_repository); prepare_pack(window, depth); + trace2_region_leave("pack-objects", "prepare-pack", + the_repository); + } + + trace2_region_enter("pack-objects", "write-pack-file", the_repository); write_pack_file(); + trace2_region_leave("pack-objects", "write-pack-file", the_repository); + if (progress) fprintf_ln(stderr, _("Total %"PRIu32" (delta %"PRIu32")," -- gitgitgadget
Re: How to get next commit that just after a specified commit
On 2/13/2019 4:36 PM, Ævar Arnfjörð Bjarmason wrote: > > On Wed, Feb 13 2019, Junio C Hamano wrote: > >> Ævar Arnfjörð Bjarmason writes: > > Their point in comparing it to git is that there's nothing intuitive in > the UI that exposes this information for the *current* graph, whereas in > fossil this is a built-in command: > https://fossil-scm.org/xfer/help/descendants > > Perhaps this information is cheaper to extract these days with the > commit graph and we could expose it somehow? I believe the recent changes (collected in commit-reach.c) that improve 'git branch --contains' or 'git tag --contains' are important in helping this descendants list. There is likely more work to be done to show this list in a rev-list call. Thanks, -Stolee
[PATCH v6 10/15] trace2:data: pack-objects: add trace2 regions
From: Derrick Stolee When studying the performance of 'git push' we would like to know how much time is spent at various parts of the command. One area that could cause performance trouble is 'git pack-objects'. Add trace2 regions around the three main actions taken in this command: 1. Enumerate objects. 2. Prepare pack. 3. Write pack-file. Signed-off-by: Derrick Stolee Signed-off-by: Jeff Hostetler --- builtin/pack-objects.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 0a70d04604..8a64c2868e 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -33,6 +33,7 @@ #include "object-store.h" #include "dir.h" #include "midx.h" +#include "trace2.h" #define IN_PACK(obj) oe_in_pack(&to_pack, obj) #define SIZE(obj) oe_size(&to_pack, obj) @@ -3472,6 +3473,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) } } + trace2_region_enter("pack-objects", "enumerate-objects", + the_repository); prepare_packing_data(the_repository, &to_pack); if (progress) @@ -3486,12 +3489,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (include_tag && nr_result) for_each_ref(add_ref_tag, NULL); stop_progress(&progress_state); + trace2_region_leave("pack-objects", "enumerate-objects", + the_repository); if (non_empty && !nr_result) return 0; - if (nr_result) + if (nr_result) { + trace2_region_enter("pack-objects", "prepare-pack", + the_repository); prepare_pack(window, depth); + trace2_region_leave("pack-objects", "prepare-pack", + the_repository); + } + + trace2_region_enter("pack-objects", "write-pack-file", the_repository); write_pack_file(); + trace2_region_leave("pack-objects", "write-pack-file", the_repository); + if (progress) fprintf_ln(stderr, _("Total %"PRIu32" (delta %"PRIu32")," -- gitgitgadget
Git Test Coverage Report (Wednesday, Feb. 6)
s:hook hook:%s", 3c543ab3 314) strbuf_addstr(&buf_payload, " cd:"); 3c543ab3 315) sq_quote_buf_pretty(&buf_payload, cmd->dir); 3c543ab3 320) strbuf_addstr(&buf_payload, " git"); 3c543ab3 342) static void fn_thread_start_fl(const char *file, int line, 3c543ab3 345) const char *event_name = "thread_start"; 3c543ab3 346) struct strbuf buf_payload = STRBUF_INIT; 3c543ab3 348) perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute, 3c543ab3 350) strbuf_release(&buf_payload); 3c543ab3 351) } 3c543ab3 353) static void fn_thread_exit_fl(const char *file, int line, 3c543ab3 357) const char *event_name = "thread_exit"; 3c543ab3 358) struct strbuf buf_payload = STRBUF_INIT; 3c543ab3 360) perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute, 3c543ab3 362) strbuf_release(&buf_payload); 3c543ab3 363) } 3c543ab3 365) static void fn_exec_fl(const char *file, int line, uint64_t us_elapsed_absolute, 3c543ab3 368) const char *event_name = "exec"; 3c543ab3 369) struct strbuf buf_payload = STRBUF_INIT; 3c543ab3 371) strbuf_addf(&buf_payload, "id:%d ", exec_id); 3c543ab3 372) strbuf_addstr(&buf_payload, "argv:"); 3c543ab3 373) if (exe) 3c543ab3 374) strbuf_addf(&buf_payload, " %s", exe); 3c543ab3 375) sq_quote_argv_pretty(&buf_payload, argv); 3c543ab3 377) perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute, 3c543ab3 379) strbuf_release(&buf_payload); 3c543ab3 380) } 3c543ab3 382) static void fn_exec_result_fl(const char *file, int line, 3c543ab3 386) const char *event_name = "exec_result"; 3c543ab3 387) struct strbuf buf_payload = STRBUF_INIT; 3c543ab3 389) strbuf_addf(&buf_payload, "id:%d code:%d", exec_id, code); 3c543ab3 390) if (code > 0) 3c543ab3 391) strbuf_addf(&buf_payload, " err:%s", strerror(code)); 3c543ab3 393) perf_io_write_fl(file, line, event_name, NULL, &us_elapsed_absolute, 3c543ab3 395) strbuf_release(&buf_payload); 3c543ab3 396) } 3c543ab3 398) static void fn_param_fl(const char *file, int line, const char *param, 3c543ab3 401) const char *event_name = "def_param"; 3c543ab3 402) struct strbuf buf_payload = STRBUF_INIT; 3c543ab3 404) strbuf_addf(&buf_payload, "%s:%s", param, value); 3c543ab3 406) perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL, 3c543ab3 408) strbuf_release(&buf_payload); 3c543ab3 409) } 3c543ab3 411) static void fn_repo_fl(const char *file, int line, 3c543ab3 414) const char *event_name = "def_repo"; 3c543ab3 415) struct strbuf buf_payload = STRBUF_INIT; 3c543ab3 417) strbuf_addstr(&buf_payload, "worktree:"); 3c543ab3 418) sq_quote_buf_pretty(&buf_payload, repo->worktree); 3c543ab3 420) perf_io_write_fl(file, line, event_name, repo, NULL, NULL, NULL, 3c543ab3 422) strbuf_release(&buf_payload); 3c543ab3 423) } 3c543ab3 425) static void fn_region_enter_printf_va_fl(const char *file, int line, 3c543ab3 432) const char *event_name = "region_enter"; 3c543ab3 433) struct strbuf buf_payload = STRBUF_INIT; 3c543ab3 435) if (label) 3c543ab3 436) strbuf_addf(&buf_payload, "label:%s ", label); 3c543ab3 437) maybe_append_string_va(&buf_payload, fmt, ap); 3c543ab3 439) perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute, 3c543ab3 441) strbuf_release(&buf_payload); 3c543ab3 442) } 3c543ab3 444) static void fn_region_leave_printf_va_fl( 3c543ab3 449) const char *event_name = "region_leave"; 3c543ab3 450) struct strbuf buf_payload = STRBUF_INIT; 3c543ab3 452) if (label) 3c543ab3 453) strbuf_addf(&buf_payload, "label:%s ", label); 3c543ab3 454) maybe_append_string_va(&buf_payload, fmt, ap); 3c543ab3 456) perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute, 3c543ab3 458) strbuf_release(&buf_payload); 3c543ab3 459) } 3c543ab3 461) static void fn_data_fl(const char *file, int line, uint64_t us_elapsed_absolute, 3c543ab3 466) const char *event_name = "data"; 3c543ab3 467) struct strbuf buf_payload = STRBUF_INIT; 3c543ab3 469) strbuf_addf(&buf_payload, "%s:%s", key, value); 3c543ab3 471) perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute, 3c543ab3 473) strbuf_release(&buf_payload); 3c543ab3 474) } 3c543ab3 476) static void fn_data_json_fl(const char *file, int line, 3c543ab3 482) const char *event_name = "data_json"; 3c543ab3 483) struct strbuf buf_payload = STRBUF_INIT; 3c543ab3 485) strbuf_addf(&buf_payload, "%s:%s", key, value->json.buf); 3c543ab3 487) perf_io_write_fl(file, line, event_name, repo, &us_elapsed_absolute, 3c543ab3 489) strbuf_release(&buf_payload); 3c543ab3 490) } 3c543ab3 492) static void fn_printf_va_fl(const char *file, int line, 3c543ab3 496) const char *event_name = "printf"; 3
[PATCH v5 10/15] trace2:data: pack-objects: add trace2 regions
From: Derrick Stolee When studying the performance of 'git push' we would like to know how much time is spent at various parts of the command. One area that could cause performance trouble is 'git pack-objects'. Add trace2 regions around the three main actions taken in this command: 1. Enumerate objects. 2. Prepare pack. 3. Write pack-file. Signed-off-by: Derrick Stolee Signed-off-by: Jeff Hostetler --- builtin/pack-objects.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 0a70d04604..8a64c2868e 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -33,6 +33,7 @@ #include "object-store.h" #include "dir.h" #include "midx.h" +#include "trace2.h" #define IN_PACK(obj) oe_in_pack(&to_pack, obj) #define SIZE(obj) oe_size(&to_pack, obj) @@ -3472,6 +3473,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) } } + trace2_region_enter("pack-objects", "enumerate-objects", + the_repository); prepare_packing_data(the_repository, &to_pack); if (progress) @@ -3486,12 +3489,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (include_tag && nr_result) for_each_ref(add_ref_tag, NULL); stop_progress(&progress_state); + trace2_region_leave("pack-objects", "enumerate-objects", + the_repository); if (non_empty && !nr_result) return 0; - if (nr_result) + if (nr_result) { + trace2_region_enter("pack-objects", "prepare-pack", + the_repository); prepare_pack(window, depth); + trace2_region_leave("pack-objects", "prepare-pack", + the_repository); + } + + trace2_region_enter("pack-objects", "write-pack-file", the_repository); write_pack_file(); + trace2_region_leave("pack-objects", "write-pack-file", the_repository); + if (progress) fprintf_ln(stderr, _("Total %"PRIu32" (delta %"PRIu32")," -- gitgitgadget
[PATCH v4 10/14] pack-objects: add trace2 regions
From: Derrick Stolee When studying the performance of 'git push' we would like to know how much time is spent at various parts of the command. One area that could cause performance trouble is 'git pack-objects'. Add trace2 regions around the three main actions taken in this command: 1. Enumerate objects. 2. Prepare pack. 3. Write pack-file. Signed-off-by: Derrick Stolee Signed-off-by: Jeff Hostetler --- builtin/pack-objects.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 0a70d04604..8a64c2868e 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -33,6 +33,7 @@ #include "object-store.h" #include "dir.h" #include "midx.h" +#include "trace2.h" #define IN_PACK(obj) oe_in_pack(&to_pack, obj) #define SIZE(obj) oe_size(&to_pack, obj) @@ -3472,6 +3473,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) } } + trace2_region_enter("pack-objects", "enumerate-objects", + the_repository); prepare_packing_data(the_repository, &to_pack); if (progress) @@ -3486,12 +3489,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (include_tag && nr_result) for_each_ref(add_ref_tag, NULL); stop_progress(&progress_state); + trace2_region_leave("pack-objects", "enumerate-objects", + the_repository); if (non_empty && !nr_result) return 0; - if (nr_result) + if (nr_result) { + trace2_region_enter("pack-objects", "prepare-pack", + the_repository); prepare_pack(window, depth); + trace2_region_leave("pack-objects", "prepare-pack", + the_repository); + } + + trace2_region_enter("pack-objects", "write-pack-file", the_repository); write_pack_file(); + trace2_region_leave("pack-objects", "write-pack-file", the_repository); + if (progress) fprintf_ln(stderr, _("Total %"PRIu32" (delta %"PRIu32")," -- gitgitgadget
[PATCH v3 10/14] pack-objects: add trace2 regions
From: Derrick Stolee When studying the performance of 'git push' we would like to know how much time is spent at various parts of the command. One area that could cause performance trouble is 'git pack-objects'. Add trace2 regions around the three main actions taken in this command: 1. Enumerate objects. 2. Prepare pack. 3. Write pack-file. Signed-off-by: Derrick Stolee Signed-off-by: Jeff Hostetler --- builtin/pack-objects.c | 16 +++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 0a70d04604..8a64c2868e 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -33,6 +33,7 @@ #include "object-store.h" #include "dir.h" #include "midx.h" +#include "trace2.h" #define IN_PACK(obj) oe_in_pack(&to_pack, obj) #define SIZE(obj) oe_size(&to_pack, obj) @@ -3472,6 +3473,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) } } + trace2_region_enter("pack-objects", "enumerate-objects", + the_repository); prepare_packing_data(the_repository, &to_pack); if (progress) @@ -3486,12 +3489,23 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (include_tag && nr_result) for_each_ref(add_ref_tag, NULL); stop_progress(&progress_state); + trace2_region_leave("pack-objects", "enumerate-objects", + the_repository); if (non_empty && !nr_result) return 0; - if (nr_result) + if (nr_result) { + trace2_region_enter("pack-objects", "prepare-pack", + the_repository); prepare_pack(window, depth); + trace2_region_leave("pack-objects", "prepare-pack", + the_repository); + } + + trace2_region_enter("pack-objects", "write-pack-file", the_repository); write_pack_file(); + trace2_region_leave("pack-objects", "write-pack-file", the_repository); + if (progress) fprintf_ln(stderr, _("Total %"PRIu32" (delta %"PRIu32")," -- gitgitgadget
Re: [PATCH 1/1] Makefile: add prove and coverage-prove targets
On 1/29/2019 1:10 PM, Derrick Stolee wrote: > On 1/29/2019 12:34 PM, SZEDER Gábor wrote: >> On Tue, Jan 29, 2019 at 04:58:27PM +0100, SZEDER Gábor wrote: >> And in the related email discussion [1]: >> >> But even though the docs claim it [-j] should be possible, >> I've been getting "random" test failures when compiled with coverage >> support, that went away with -j1. So the tests still run with -j1, as >> with the first version of the series. >> >> So it doesn't seem to be that bad after all, because it's not >> "completely breaks" but "random test failures". Still far from ideal, >> but the original coverage patch is just about 3 weeks short of a >> decade old, so maybe things have improved since then, and it'd be >> worth a try to leave GIT_PROVE_OPTS as is and see what happens. > > It would certainly be nice if the build time could be reduced through > parallel test runs. I've kicked off a build using GIT_PROVE_OPTS="-j12" > to see what happens. I did get a failed test with this run: t0025-crlf-renormalize.sh (Wstat: 256 Tests: 3 Failed: 1) Failed test: 2 Non-zero exit status: 1 This was on the 'jch' branch, and an equivalent build with sequential execution did not have this failure. That's flaky enough for me to stick to sequential runs. Thanks, -Stolee
Re: Contributor Summit Topics and Logistics
I was hoping to attend the contributors' summit remotely, but now my leave is starting before then. This email contains a summary of what I would have added to the discussion. Thanks, -Stolee Commit-Graph Status Report == I'm really happy with the progress in this area, especially with the number of other contributors working on the feature! Thanks Ævar, Jonathan, Josh, Stefan, and Szeder in particular. Here are some directions to take the feature in the near future: File Format v2 -- The new format version [1] specifically fixes some shortcomings in v1: * Uses the 4-byte format id for the hash algorithm. * Creates a separate version byte for the reachability index. * Enforces that the unused byte is zero until we use it for incremental writes. Hopefully, this is the last time we need to update the file header. [1] https://public-inbox.org/git/pull.112.git.gitgitgad...@gmail.com/ [PATCH 0/6] Create commit-graph file format v2 Reachability Index -- As discussed on-list [2], we want to replace generation numbers with a different (negative-cut) reachability index. I used the term "corrected commit date". The definition is: * If a commit has no parents, then its corrected commit date is its commit date. * If a commit has parents, then its corrected commit date is the maximum of: - its commit date - one more than the maximum corrected commit date of its parents The benefits of this definition were discussed already, but to summarize: * This definition will work _at least as well_ as the commit date heuristic, with the added bonus of being absolutely sure our results are right. We can update algorithms like paint_down_to_common() to use this reachability index without performance problems in some cases. * If someone creates a terrible commit with a date that is far in the future, this definition is no worse than existing generation numbers (because we enforce that the corrected commit date is strictly larger than the parents' corrected commit date). To implement this index, we can re-use the 30 bits per commit in the commit-graph file that are used for generation numbers, but use them instead for the difference between the corrected commit date and the actual commit date. File format v2 gives us a version value that can be incremented to signal the change in meaning. Some work is required to adjust the existing generation-number-aware algorithms to care about an "arbitrary" reachability index. It could be as easy as a helper function that returns a function pointer to the proper compare function. If someone wants to move forward on this topic while I'm gone, please volunteer. Otherwise, this will be among my first items to work on when I return from leave. [2] https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51ef...@gmail.com/ [RFC] Generation Number v2 Incremental Writes -- Similar to the split index, an incremental commit-graph file can be implemented to reduce the write time when adding commits to an existing (large) commit-graph. In this case, the .git/objects/info/commit-graph file would be small, and have a pointer to a base file, say "cgraph-.cgraph", that contains the majority of the commits. The important thing to keep in mind here is that we use integers to refer to a commit's parents. This integer would need to refer to the order of commits when you concatenate the orderd lists from each file. When doing this, we can point into the base file as well as the tip file. Since the base commit-graph file would be closed under reachability, it only needs to care about commits in its file. It is also possible to have multiple base files, and we can use the unused byte in the commit-graph file format v2 to store the number of base files. We can then store a list of file names in a new chunk, presenting the ordered list of base files. We still want to keep this list short, but there may be benefits to a variable number. I expect the first version would limit the construction to one base file for simplicity's sake. When this is implemented, we can use it to write the commit-graph at fetch time. A config setting, say 'fetch.writeCommitGraph', could enable this write. Since most writes would add a small number of commits compared to the large base file, this would be a more reasonable cost to add to a fetch. Since we verify the pack upon download, the commits it contained will already be in the memory cache and we won't need to re-parse those commits. Volunteers welcome. Bloom Filters - Using bloom filters to speed up file history has been discussed and prototyped on-list (see [12] and the thread before it). Thanks for lots of contributions in this area! A lot of people have shown an interest in this feature, and it is particularly helpful with server-side queries. Any implementation here should check that it is helping 'git blame' as much as it can [13]. It's entirely
Re: [PATCH 1/1] Makefile: add prove and coverage-prove targets
On 1/29/2019 12:34 PM, SZEDER Gábor wrote: > On Tue, Jan 29, 2019 at 04:58:27PM +0100, SZEDER Gábor wrote: > And in the related email discussion [1]: > > But even though the docs claim it [-j] should be possible, > I've been getting "random" test failures when compiled with coverage > support, that went away with -j1. So the tests still run with -j1, as > with the first version of the series. > > So it doesn't seem to be that bad after all, because it's not > "completely breaks" but "random test failures". Still far from ideal, > but the original coverage patch is just about 3 weeks short of a > decade old, so maybe things have improved since then, and it'd be > worth a try to leave GIT_PROVE_OPTS as is and see what happens. It would certainly be nice if the build time could be reduced through parallel test runs. I've kicked off a build using GIT_PROVE_OPTS="-j12" to see what happens. Thanks! -Stolee
Git Test Coverage Report (Tue. January 29, 2019)
Here is today's test coverage report. Since the last report, I have overhauled the machinery for generating these reports. The code is available on GitHub [1], and the reports are available online in text [2] and HTML [3] form. In addition to the two output formats, the code also has the ability to ignore lines that are unimportant for coverage. This allows me to review the report, and reduce the size of the next one. For this reason, I set the comparison branch for 'master' to 'maint' so we can see what is uncovered in that segment. If you wish to add ignored lines, then please open a pull request against the GitHub repo [1]. One side effect is that some file names may appear with no uncovered code. The ignored lines are visible by clicking a button on the HTML report [3]. The 'pu' versus 'jch' report looks rather large to me, but I did use prove to run the tests, so the entire test suite did run (through failures). A lot of the lines are related to trace2, so I'll add the GIT_TR2* variables to my build when running with optional variables. As always, I hope this is useful and I look forward to your feedback. Thanks, -Stolee [1] https://github.com/derrickstolee/git-test-coverage [2] https://derrickstolee.github.io/git-test-coverage/reports/2019-01-29.txt [3] https://derrickstolee.github.io/git-test-coverage/reports/2019-01-29.htm --- pu c7baf85843ec1dc8272483baa2f84d0ec66ad882 jch 2ea90524518a3dfcb3cda871ef2b1a739075e5d9 nextaa96b0ce6b0fa4fa4cc6870f1a3aff3878967bfa master 16a465bc018d09e9d7bbbdc5f40a7fb99c21f8ef maint 0d0ac3826a3bbb9247e39e12623bbcfdd722f24c Uncovered code in 'pu' not in 'jch' bisect.c 4f6d26b1 661) mark_edges_uninteresting(revs, NULL, 0); blame.c 07d04b91 482) ent->s_lno + ent->num_lines == next->s_lno && 07d04b91 483) ent->ignored == next->ignored) { e7973c85 940) blame_origin_decref(e->suspect); e7973c85 941) e->suspect = blame_origin_incref(parent); e7973c85 942) e->s_lno += offset; 07d04b91 943) e->ignored = 1; e7973c85 944) e->next = ignoredp; b543bb1c 1497) same = 1; b543bb1c 1503) blame_origin_decref(porigin); builtin/blame.c 07d04b91 485) length--; 31653c1a 699) if (!value) 31653c1a 700) return config_error_nonbool(var); a5481a6c 701) parse_date_format(value, &blame_date_mode); 31653c1a 702) return 0; 5817da01 777) return 0; 31653c1a 930) blame_date_width = sizeof("Thu, 19 Oct 2006 16:00:04 -0700"); 31653c1a 931) break; 5817da01 998) usage_with_options(blame_opt_usage, options); builtin/change.c 49854277 23) static int change_list(int argc, const char **argv, const char* prefix) 49854277 25) struct option options[] = { 49854277 31) struct ref_format format = REF_FORMAT_INIT; 49854277 35) argc = parse_options(argc, argv, prefix, options, builtin_list_usage, 0); 49854277 37) setup_ref_filter_porcelain_msg(); 49854277 39) memset(&filter, 0, sizeof(filter)); 49854277 40) memset(&array, 0, sizeof(array)); 49854277 42) filter.kind = FILTER_REFS_CHANGES; 49854277 43) filter.name_patterns = argv; 49854277 45) filter_refs(&array, &filter, FILTER_REFS_CHANGES); 49854277 52) if (!format.format) { 49854277 53) format.format = "%(refname:lstrip=1)"; 49854277 56) if (verify_ref_format(&format)) 49854277 57) die(_("unable to parse format string")); 49854277 59) for (i = 0; i < array.nr; i++) { 49854277 60) show_ref_array_item(array.items[i], &format); 49854277 63) ref_array_clear(&array); 49854277 65) return 0; 73f8829d 76) static void init_update_state(struct update_state *state) 73f8829d 78) memset(state, 0, sizeof(*state)); 73f8829d 79) state->content = "HEAD"; 73f8829d 80) string_list_init(&state->replace, 0); 73f8829d 81) string_list_init(&state->origin, 0); 73f8829d 82) } 73f8829d 84) static void clear_update_state(struct update_state *state) 73f8829d 86) string_list_clear(&state->replace, 0); 73f8829d 87) string_list_clear(&state->origin, 0); 73f8829d 88) } 73f8829d 90) static int update_option_parse_replace(const struct option *opt, 73f8829d 93) struct update_state *state = opt->value; 73f8829d 94) string_list_append(&state->replace, arg); 73f8829d 95) return 0; 73f8829d 98) static int update_option_parse_origin(const struct option *opt, 73f8829d 101) struct update_state *state = opt->value; 73f8829d 102) string_list_append(&state->origin, arg); 73f8829d 103) return 0; 73f8829d 106) static int resolve_commit(const char *committish, struct object_id *result) 73f8829d 109) if (get_oid_committish(committish, result)) 73f8829d 110) die(_("Failed to resolve '%s' as a valid revision."), committish); 73f8829d 111) commit = lookup_commit_reference(the_repository, result); 73f8829d 112) if (!commit) 73f8829d 113) die(_("Could not parse object '%s'."), committish); 73f8829d 114) oidcpy(result, &commit->object.oid); 73f8829d 115) return 0; 73f8829d 118) static void resolve_commit_list(const struct string_list *commitsish_list, 73f8829d 122) for (i = 0; i < commitsish_list->nr; i++) { 73f8829d 123) s
[PATCH v2 1/1] Makefile: add coverage-prove target
From: Derrick Stolee Sometimes there are test failures in the 'pu' branch. This is somewhat expected for a branch that takes the very latest topics under development, and those sometimes have semantic conflicts that only show up during test runs. This also can happen when running the test suite with different GIT_TEST_* environment variables that interact in unexpected ways This causes a problem for the test coverage reports, as the typical 'make coverage-test coverage-report' run halts at the first failed test. If that test is early in the suite, then many valuable tests are not exercising the code and the coverage report becomes noisy with false positives. Add a new 'coverage-prove' target to the Makefile, modeled after the 'coverage-test' target. This compiles the source using the coverage flags, then runs the test suite using the 'prove' tool. Since the coverage machinery is not thread-safe, enforce that the tests are run in sequence by appending '-j1' to GIT_PROVE_OPTS. Signed-off-by: Derrick Stolee --- Makefile | 5 + 1 file changed, 5 insertions(+) diff --git a/Makefile b/Makefile index 1a44c811aa..23d8730482 100644 --- a/Makefile +++ b/Makefile @@ -3077,6 +3077,11 @@ coverage-test: coverage-clean-results coverage-compile $(MAKE) CFLAGS="$(COVERAGE_CFLAGS)" LDFLAGS="$(COVERAGE_LDFLAGS)" \ DEFAULT_TEST_TARGET=test -j1 test +coverage-prove: coverage-clean-results coverage-compile + $(MAKE) CFLAGS="$(COVERAGE_CFLAGS)" LDFLAGS="$(COVERAGE_LDFLAGS)" \ + DEFAULT_TEST_TARGET=prove GIT_PROVE_OPTS="$(GIT_PROVE_OPTS) -j1" \ + -j1 test + coverage-report: $(QUIET_GCOV)for dir in $(object_dirs); do \ $(GCOV) $(GCOVFLAGS) --object-directory=$$dir $$dir*.c || exit; \ -- gitgitgadget
[PATCH v2 0/1] Makefile: add prove and coverage-prove targets
Sometimes there are test failures in the 'pu' branch. This is somewhat expected for a branch that takes the very latest topics under development, and those sometimes have semantic conflicts that only show up during test runs. This also can happen when running the test suite with different GIT_TEST_* environment variables that interact in unexpected ways. This causes a problem for the test coverage reports, as the typical 'make coverage-test coverage-report' run halts at the first failed test. If that test is early in the suite, then many valuable tests are not exercising the code and the coverage report becomes noisy with false positives. This patch adds two targets to the Makefile: 'prove' and 'coverage-prove'. The idea is to use the 'prove' tool to run the test suite, as it will track failed tests but continue running the full suite even with a failure. If/when this merges down, I will use this new target for the automation around the test coverage reports. Updates in V2: * Dropped the 'prove' target * Append '-j1' to GIT_PROVE_OPTS * Commit message tweaks. Derrick Stolee (1): Makefile: add coverage-prove target Makefile | 5 + 1 file changed, 5 insertions(+) base-commit: 0d0ac3826a3bbb9247e39e12623bbcfdd722f24c Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-114%2Fderrickstolee%2Fcoverage-prove-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-114/derrickstolee/coverage-prove-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/114 Range-diff vs v1: 1: 294187c696 < -: -- Makefile: add prove and coverage-prove targets -: -- > 1: af810fad97 Makefile: add coverage-prove target -- gitgitgadget
Re: [PATCH 1/1] Makefile: add prove and coverage-prove targets
On 1/29/2019 10:58 AM, SZEDER Gábor wrote: > On Tue, Jan 29, 2019 at 06:56:08AM -0800, Derrick Stolee via GitGitGadget > wrote: >> +prove: all >> +$(MAKE) -C t/ prove >> + > > You don't need this 'prove' target in the "main" Makefile, because > 'make test' will run the test suite using DEFAULT_TEST_TARGET anyway. Thanks! >> +coverage-prove: coverage-clean-results coverage-compile >> +$(MAKE) CFLAGS="$(COVERAGE_CFLAGS)" LDFLAGS="$(COVERAGE_LDFLAGS)" \ >> +DEFAULT_TEST_TARGET=prove -j1 prove > > First I was wondering why do you need a dedicated 'coverage-prove' > target, instead of letting DEFAULT_TEST_TARGET from the environment or > from 'config.mak' do its thing. But then I noticed in the hunk > context, that, for some reason, the 'coverage-test' target hardcoded > 'DEFAULT_TEST_TARGET=test -j1'. Then I was wondering why would it > want to do that, and stumbled upon commit c14cc77c11: > > coverage: set DEFAULT_TEST_TARGET to avoid using prove > > If the user sets DEFAULT_TEST_TARGET=prove in his config.mak, that > carries over into the coverage tests. Which is really bad if he also > sets GIT_PROVE_OPTS=-j<..> as that completely breaks the coverage > runs. > > Instead of attempting to mess with the GIT_PROVE_OPTS, just force the > test target to 'test' so that we run under make, like we intended all > along. Thanks for finding this! > I'm afraid that this issue would badly affect 'coverage-prove' as well > (I didn't try). Or if doesn't (anymore?), then that should be > mentioned in the commit message, and then perhaps it's time to remove > that '-j1' from the 'coverage-test' target as well. I'll fix this by forcing an update to GIT_PROVE_OPTS. It does limit our ability to use GIT_PROVE_OPTS as a pass-through, but at least this new target will have that assumption built in. Thanks, -Stolee
Re: [PATCH 1/1] Makefile: add prove and coverage-prove targets
On 1/29/2019 11:00 AM, Jeff King wrote: > On Tue, Jan 29, 2019 at 06:56:08AM -0800, Derrick Stolee via GitGitGadget > wrote: > >> From: Derrick Stolee >> >> When running the test suite for code coverage using >> 'make coverage-test', a single test failure stops the >> test suite from completing. This leads to significant >> undercounting of covered blocks. >> >> Add two new targets to the Makefile: >> >> * 'prove' runs the test suite using 'prove'. >> >> * 'coverage-prove' compiles the source using the >> coverage flags, then runs the test suite using >> 'prove'. >> >> These targets are modeled after the 'test' and >> 'coverage-test' targets. > > I think these are reasonable to have (and I personally much prefer > "prove" to the raw "make test" output anyway). > > For people who don't have "prove" available, I think they could just do > "make -k test" to make sure the full suite runs. Should we perhaps be > doing that automatically in the sub-make run by coverage-test? I wanted to avoid changing the existing behavior, if I could. But, if we can reasonably assume that anyone running 'make coverage-test' wants to run the full suite even with failures, then that's fine by me. I see from the make docs that '-k' will still result in an error code at the end of the command, so no automation would result in an incorrect response to a failed test. Am I correct? >> @@ -3077,6 +3080,10 @@ coverage-test: coverage-clean-results coverage-compile >> $(MAKE) CFLAGS="$(COVERAGE_CFLAGS)" LDFLAGS="$(COVERAGE_LDFLAGS)" \ >> DEFAULT_TEST_TARGET=test -j1 test >> >> +coverage-prove: coverage-clean-results coverage-compile >> +$(MAKE) CFLAGS="$(COVERAGE_CFLAGS)" LDFLAGS="$(COVERAGE_LDFLAGS)" \ >> +DEFAULT_TEST_TARGET=prove -j1 prove >> + > > You probably don't need to override DEFAULT_TEST_TARGET here, since the > "prove" target doesn't look at it. Likewise, "-j1" probably does nothing > here, since prove itself is a single target. As Szeder mentioned, I can probably just drop the 'prove' target and use DEFAULT_TEST_TARGET instead. Or do we think anyone will want to use 'make prove' from root? > I'm not sure why we want to enforce -j1 for these targets, but if it's > important to do so for the prove case, as well, you'd need to add it to > GIT_PROVE_OPTS. The '-j1' is necessary because the coverage data is collected in a way that is not thread-safe. Our compile options also force single-threaded behavior. I'll specifically override GIT_PROVE_OPTS here to force -j1, but also send -j1 to the 'make' command, too. Thanks, -Stolee
[PATCH 1/1] Makefile: add prove and coverage-prove targets
From: Derrick Stolee When running the test suite for code coverage using 'make coverage-test', a single test failure stops the test suite from completing. This leads to significant undercounting of covered blocks. Add two new targets to the Makefile: * 'prove' runs the test suite using 'prove'. * 'coverage-prove' compiles the source using the coverage flags, then runs the test suite using 'prove'. These targets are modeled after the 'test' and 'coverage-test' targets. Signed-off-by: Derrick Stolee --- Makefile | 7 +++ 1 file changed, 7 insertions(+) diff --git a/Makefile b/Makefile index 1a44c811aa..ec886635ae 100644 --- a/Makefile +++ b/Makefile @@ -2665,6 +2665,9 @@ export TEST_NO_MALLOC_CHECK test: all $(MAKE) -C t/ all +prove: all + $(MAKE) -C t/ prove + perf: all $(MAKE) -C t/perf/ all @@ -3077,6 +3080,10 @@ coverage-test: coverage-clean-results coverage-compile $(MAKE) CFLAGS="$(COVERAGE_CFLAGS)" LDFLAGS="$(COVERAGE_LDFLAGS)" \ DEFAULT_TEST_TARGET=test -j1 test +coverage-prove: coverage-clean-results coverage-compile + $(MAKE) CFLAGS="$(COVERAGE_CFLAGS)" LDFLAGS="$(COVERAGE_LDFLAGS)" \ + DEFAULT_TEST_TARGET=prove -j1 prove + coverage-report: $(QUIET_GCOV)for dir in $(object_dirs); do \ $(GCOV) $(GCOVFLAGS) --object-directory=$$dir $$dir*.c || exit; \ -- gitgitgadget
[PATCH 0/1] Makefile: add prove and coverage-prove targets
Sometimes there are test failures in the 'pu' branch. This is somewhat expected for a branch that takes the very latest topics under development, and those sometimes have semantic conflicts that only show up during test runs. This also can happen when running the test suite with different GIT_TEST_* environment variables that interact in unexpected ways. This causes a problem for the test coverage reports, as the typical 'make coverage-test coverage-report' run halts at the first failed test. If that test is early in the suite, then many valuable tests are not exercising the code and the coverage report becomes noisy with false positives. This patch adds two targets to the Makefile: 'prove' and 'coverage-prove'. The idea is to use the 'prove' tool to run the test suite, as it will track failed tests but continue running the full suite even with a failure. If/when this merges down, I will use this new target for the automation around the test coverage reports. Thanks, -Stolee Derrick Stolee (1): Makefile: add prove and coverage-prove targets Makefile | 7 +++ 1 file changed, 7 insertions(+) base-commit: 0d0ac3826a3bbb9247e39e12623bbcfdd722f24c Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-114%2Fderrickstolee%2Fcoverage-prove-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-114/derrickstolee/coverage-prove-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/114 -- gitgitgadget
[PATCH v2 10/14] pack-objects: add trace2 regions
From: Derrick Stolee When studying the performance of 'git push' we would like to know how much time is spent at various parts of the command. One area that could cause performance trouble is 'git pack-objects'. Add trace2 regions around the three main actions taken in this command: 1. Enumerate objects. 2. Prepare pack. 3. Write pack-file. Signed-off-by: Derrick Stolee --- builtin/pack-objects.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 889df2c755..6708529e3c 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -33,6 +33,7 @@ #include "object-store.h" #include "dir.h" #include "midx.h" +#include "trace2.h" #define IN_PACK(obj) oe_in_pack(&to_pack, obj) #define SIZE(obj) oe_size(&to_pack, obj) @@ -3472,6 +3473,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) } } + trace2_region_enter("pack-objects", "enumerate-objects", the_repository); prepare_packing_data(the_repository, &to_pack); if (progress) @@ -3486,12 +3488,20 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) if (include_tag && nr_result) for_each_ref(add_ref_tag, NULL); stop_progress(&progress_state); + trace2_region_leave("pack-objects", "enumerate-objects", the_repository); if (non_empty && !nr_result) return 0; - if (nr_result) + if (nr_result) { + trace2_region_enter("pack-objects", "prepare-pack", the_repository); prepare_pack(window, depth); + trace2_region_leave("pack-objects", "prepare-pack", the_repository); + } + + trace2_region_enter("pack-objects", "write-pack-file", the_repository); write_pack_file(); + trace2_region_leave("pack-objects", "write-pack-file", the_repository); + if (progress) fprintf_ln(stderr, _("Total %"PRIu32" (delta %"PRIu32")," -- gitgitgadget
Re: [PATCH] object_as_type: initialize commit-graph-related fields of 'struct commit'
On 1/27/2019 8:28 AM, SZEDER Gábor wrote: > On Sun, Jan 27, 2019 at 02:08:32PM +0100, SZEDER Gábor wrote: >> When the commit graph and generation numbers were introduced in >> commits 177722b344 (commit: integrate commit graph with commit >> parsing, 2018-04-10) and 83073cc994 (commit: add generation number to >> struct commit, 2018-04-25), they tried to make sure that the >> corresponding 'graph_pos' and 'generation' fields of 'struct commit' >> are initialized conservatively, as if the commit were not included in >> the commit-graph file. >> >> Alas, initializing those fields only in alloc_commit_node() missed the >> case when an object that happens to be a commit is first looked up via >> lookup_unknown_object(), and is then later converted to a 'struct >> commit' via the object_as_type() helper function (either calling it >> directly, or as part of a subsequent lookup_commit() call). >> Consequently, both of those fields incorrectly remain set to zero, >> which means e.g. that the commit is present in and is the first entry >> of the commit-graph file. This will result in wrong timestamp, parent >> and root tree hashes, if such a 'struct commit' instance is later >> filled from the commit-graph. >> >> Extract the initialization of 'struct commit's fields from >> alloc_commit_node() into a helper function, and call it from >> object_as_type() as well, to make sure that it properly initializes >> the two commit-graph-related fields, too. With this helper function >> it is hopefully less likely that any new fields added to 'struct >> commit' in the future would remain uninitialized. >> >> With this change alloc_commit_index() won't have any remaining callers >> outside of 'alloc.c', so mark it as static. >> >> Signed-off-by: SZEDER Gábor >> --- >> >> So, it turns out that ec0c5798ee (revision: use commit graph in >> get_reference(), 2018-12-04) is not the culprit after all, it merely >> highlighted a bug that is as old as the commit-graph feature itself. >> This patch fixes this and all other related issues I reported >> upthread. > > And how/why does this affect 'git describe --dirty'? > > - 'git describe' first iterates over all refs, and somewhere deep > inside for_each_ref() each commit (well, object) a ref points to > is looked up via lookup_unknown_object(). This leaves all fields > of the created object zero initialized. > > - Then it dereferences HEAD for '--dirty' and ec0c5798ee's changes > to get_reference() kick in: lookup_commit() doesn't instantiate a > brand new and freshly initialized 'struct commit', but returns the > object created in the previous step converted into 'struct > commit'. This conversion doesn't set the commit-graph fields in > 'struct commit', but leaves both as zero. get_reference() then > tries to load HEAD's commit information from the commit-graph, > find_commit_in_graph() sees the the still zero 'graph_pos' field > and doesn't perform a search through the commit-graph file, and > the subsequent fill_commit_in_graph() reads the commit info from > the first entry. > > In case of the failing test I posted earlier, where only the first > commit is in the commit-graph but HEAD isn't, this means that the > HEAD's 'struct commit' is filled with the info of HEAD^. > > - Ultimately, the diff machinery then doesn't compare the worktree > to HEAD's tree, but to HEAD^'s, finds that they differ, hence the > incorrect '-dirty' flag in the output. > > Before ec0c5798ee get_reference() simply called parse_object(), which > ignored the commit-graph, so the issue could remain hidden. Thanks for digging in, Szeder. This is a very subtle interaction, and I'm glad you caught the issue. There are likely other ways this could become problematic, including hitting BUG() statements regarding generation numbers. I recommend this be merged to 'maint' if possible. Thanks, -Stolee
Re: [PATCH v4 08/10] midx: implement midx_repack()
On 1/24/2019 4:52 PM, Derrick Stolee via GitGitGadget wrote: diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index acc5e65ecc..d6c1353514 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -383,7 +383,8 @@ test_expect_success 'setup expire tests' ' git pack-objects --revs .git/objects/pack/pack-E <<-EOF && refs/heads/E EOF - git multi-pack-index write + git multi-pack-index write && + cp -r .git/objects/pack .git/objects/pack-backup ) ' Josh: Thanks for catching the failure in PATCH 7. It's due to this line that should be part of that commit, not this one. Thanks, -Stolee
Re: [PATCH v2] log,diff-tree: add --combined-with-paths options for merges with renames
On 1/25/2019 11:54 AM, Elijah Newren wrote: > +test_expect_success '--combined-with-paths works with -z as well' ' > + printf "0f9645804ebb04cc3eef91f799eb7fb54d70cefb\0::100644 100644 > 100644 f00c965d8307308469e537302baa73048488f162 > 088bd5d92c2a8e0203ca8e7e4c2a5c692f6ae3f7 > 333b9c62519f285e1854830ade0fe1ef1d40ee1b > RR\0file\twith\ttabs\0i\tam\ttabbed\0fickle\tnaming\0" >expect && I'm guessing that you use printf here because the 'cat <<-\EOF' approach doesn't work with the special tabs? Kudos for putting in the extra effort here for the special formatting! The rest of the patch looks good. Thanks, -Stolee
Re: [PATCH] log,diff-tree: add --combined-with-paths options for merges with renames
On 1/24/2019 11:46 AM, Elijah Newren wrote: > As an alternative, I considered perhaps trying to sell it as a bugfix > (how often do people use -M, -c, and --raw together and have renames > in merge commits -- can I just change the format to include the old > names), but was worried that since diff-tree is plumbing and that the > format was documented to not include original filename(s), that I'd be > breaking backward compatibility in an important way for someone and > thus opted for a new flag to get the behavior I needed. This is wise. Changing the format is likely to break at least one GUI client or tool, especially because many depend on the installed version of Git instead of shipping with one "blessed" client. In addition, there may be whitespace in the filenames. It appears the diff format separates the filenames with tab characters. Is it possible to have tab character in a file name? That would make the output ambiguous, but is no worse than our current output in a non-combined diff. I'll repeat Brian's request for tests. I trust the compiler and the test suite more than I trust my ability to read code. Thanks, -Stolee
Re: [PATCH 0/6] Create commit-graph file format v2
On 1/24/2019 6:39 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" writes: > >> This series is based on ab/commit-graph-write-progress and bc/sha-256. > > Thanks. > > It seems that the base (i.e. merge between these two topics) you > used may have used a version of either topic (most likely the > latter) slightly older than what I have, as patches 1 and 2 seem to > lack the local variable "hashsz" in the context, near the beginning > of commit_graph_write(). I wiggled the patches in, but it has too > heavy conflict merging to 'pu', so it may have to wait until the > other topics stabilize a bit further. Sorry that the merge was painful. I would have waited longer for things to stabilize, but I'm expecting to go on paternity leave soon. Didn't want to get the idea out there before I disappear for a while. When things stabilize, I may have time to do a rebase and work out the details myself. Otherwise, everyone has my blessing to take work I've started and move it forward themselves. Thanks, -Stolee
[PATCH v4 08/10] midx: implement midx_repack()
From: Derrick Stolee To repack using a multi-pack-index, first sort all pack-files by their modified time. Second, walk those pack-files from oldest to newest, adding the packs to a list if they are smaller than the given pack-size. Finally, collect the objects from the multi-pack- index that are in those packs and send them to 'git pack-objects'. While first designing a 'git multi-pack-index repack' operation, I started by collecting the batches based on the size of the objects instead of the size of the pack-files. This allows repacking a large pack-file that has very few referencd objects. However, this came at a significant cost of parsing pack-files instead of simply reading the multi-pack-index and getting the file information for the pack-files. This object-size idea could be a direction for future expansion in this area. Signed-off-by: Derrick Stolee --- midx.c | 109 +++- t/t5319-multi-pack-index.sh | 31 +- 2 files changed, 138 insertions(+), 2 deletions(-) diff --git a/midx.c b/midx.c index 768a7dff73..7d81bf943e 100644 --- a/midx.c +++ b/midx.c @@ -8,6 +8,7 @@ #include "sha1-lookup.h" #include "midx.h" #include "progress.h" +#include "run-command.h" #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */ #define MIDX_VERSION 1 @@ -1113,7 +1114,113 @@ int expire_midx_packs(const char *object_dir) return result; } -int midx_repack(const char *object_dir, size_t batch_size) +struct time_and_id { + timestamp_t mtime; + uint32_t pack_int_id; +}; + +static int compare_by_mtime(const void *a_, const void *b_) { + const struct time_and_id *a, *b; + + a = (const struct time_and_id *)a_; + b = (const struct time_and_id *)b_; + + if (a->mtime < b->mtime) + return -1; + if (a->mtime > b->mtime) + return 1; return 0; } + +int midx_repack(const char *object_dir, size_t batch_size) +{ + int result = 0; + uint32_t i, packs_to_repack; + size_t total_size; + struct time_and_id *pack_ti; + unsigned char *include_pack; + struct child_process cmd = CHILD_PROCESS_INIT; + struct strbuf base_name = STRBUF_INIT; + struct multi_pack_index *m = load_multi_pack_index(object_dir, 1); + + if (!m) + return 0; + + include_pack = xcalloc(m->num_packs, sizeof(unsigned char)); + pack_ti = xcalloc(m->num_packs, sizeof(struct time_and_id)); + + for (i = 0; i < m->num_packs; i++) { + pack_ti[i].pack_int_id = i; + + if (prepare_midx_pack(m, i)) + continue; + + pack_ti[i].mtime = m->packs[i]->mtime; + } + QSORT(pack_ti, m->num_packs, compare_by_mtime); + + total_size = 0; + packs_to_repack = 0; + for (i = 0; total_size < batch_size && i < m->num_packs; i++) { + int pack_int_id = pack_ti[i].pack_int_id; + struct packed_git *p = m->packs[pack_int_id]; + + if (!p) + continue; + if (p->pack_size >= batch_size) + continue; + + packs_to_repack++; + total_size += p->pack_size; + include_pack[pack_int_id] = 1; + } + + if (total_size < batch_size || packs_to_repack < 2) + goto cleanup; + + argv_array_push(&cmd.args, "pack-objects"); + + strbuf_addstr(&base_name, object_dir); + strbuf_addstr(&base_name, "/pack/pack"); + argv_array_push(&cmd.args, base_name.buf); + strbuf_release(&base_name); + + cmd.git_cmd = 1; + cmd.in = cmd.out = -1; + + if (start_command(&cmd)) { + error(_("could not start pack-objects")); + result = 1; + goto cleanup; + } + + for (i = 0; i < m->num_objects; i++) { + struct object_id oid; + uint32_t pack_int_id = nth_midxed_pack_int_id(m, i); + + if (!include_pack[pack_int_id]) + continue; + + nth_midxed_object_oid(&oid, m, i); + xwrite(cmd.in, oid_to_hex(&oid), the_hash_algo->hexsz); + xwrite(cmd.in, "\n", 1); + } + close(cmd.in); + + if (finish_command(&cmd)) { + error(_("could not finish pack-objects")); + result = 1; + goto cleanup; + } + + result = write_midx_internal(object_dir, m, NULL); + m = NULL; + +cleanup: + if (m) + close_midx(m); + free(include_pack); + free(pack_ti); + return result; +} diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
[PATCH v4 10/10] midx: add test that 'expire' respects .keep files
From: Derrick Stolee The 'git multi-pack-index expire' subcommand may delete packs that are not needed from the perspective of the multi-pack-index. If a pack has a .keep file, then we should not delete that pack. Add a test that ensures we preserve a pack that would otherwise be expired. First, create a new pack that contains every object in the repo, then add it to the multi-pack-index. Then create a .keep file for a pack starting with "a-pack" that was added in the previous test. Finally, expire and verify that the pack remains and the other packs were expired. Signed-off-by: Derrick Stolee --- t/t5319-multi-pack-index.sh | 18 ++ 1 file changed, 18 insertions(+) diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 19b769eea0..bcfa520401 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -495,4 +495,22 @@ test_expect_success 'expire works when adding new packs' ' ) ' +test_expect_success 'expire respects .keep files' ' + ( + cd dup && + git pack-objects --revs .git/objects/pack/pack-all <<-EOF && + refs/heads/A + EOF + git multi-pack-index write && + PACKA=$(ls .git/objects/pack/a-pack*\.pack | sed s/\.pack\$//) && + touch $PACKA.keep && + git multi-pack-index expire && + ls -S .git/objects/pack/a-pack* | grep $PACKA >a-pack-files && + test_line_count = 3 a-pack-files && + test-tool read-midx .git/objects | grep idx >midx-list && + test_line_count = 2 midx-list + ) +' + + test_done -- gitgitgadget
[PATCH v4 06/10] multi-pack-index: implement 'expire' subcommand
From: Derrick Stolee The 'git multi-pack-index expire' subcommand looks at the existing mult-pack-index, counts the number of objects referenced in each pack-file, deletes the pack-fils with no referenced objects, and rewrites the multi-pack-index to no longer reference those packs. Refactor the write_midx_file() method to call write_midx_internal() which now takes an existing 'struct multi_pack_index' and a list of pack-files to drop (as specified by the names of their pack- indexes). As we write the new multi-pack-index, we drop those file names from the list of known pack-files. The expire_midx_packs() method removes the unreferenced pack-files after carefully closing the packs to avoid open handles. Test that a new pack-file that covers the contents of two other pack-files leads to those pack-files being deleted during the expire subcommand. Be sure to read the multi-pack-index to ensure it no longer references those packs. Signed-off-by: Derrick Stolee --- midx.c | 120 +--- t/t5319-multi-pack-index.sh | 20 ++ 2 files changed, 130 insertions(+), 10 deletions(-) diff --git a/midx.c b/midx.c index 95c39106b2..299e9b2e8f 100644 --- a/midx.c +++ b/midx.c @@ -33,6 +33,8 @@ #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t)) #define MIDX_LARGE_OFFSET_NEEDED 0x8000 +#define PACK_EXPIRED UINT_MAX + static char *get_midx_filename(const char *object_dir) { return xstrfmt("%s/pack/multi-pack-index", object_dir); @@ -381,6 +383,7 @@ struct pack_info { uint32_t orig_pack_int_id; char *pack_name; struct packed_git *p; + unsigned expired : 1; }; static int pack_info_compare(const void *_a, const void *_b) @@ -428,6 +431,7 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, packs->info[packs->nr].pack_name = xstrdup(file_name); packs->info[packs->nr].orig_pack_int_id = packs->nr; + packs->info[packs->nr].expired = 0; packs->nr++; } } @@ -587,13 +591,17 @@ static size_t write_midx_pack_names(struct hashfile *f, size_t written = 0; for (i = 0; i < num_packs; i++) { - size_t writelen = strlen(info[i].pack_name) + 1; + size_t writelen; + + if (info[i].expired) + continue; if (i && strcmp(info[i].pack_name, info[i - 1].pack_name) <= 0) BUG("incorrect pack-file order: %s before %s", info[i - 1].pack_name, info[i].pack_name); + writelen = strlen(info[i].pack_name) + 1; hashwrite(f, info[i].pack_name, writelen); written += writelen; } @@ -675,6 +683,11 @@ static size_t write_midx_object_offsets(struct hashfile *f, int large_offset_nee for (i = 0; i < nr_objects; i++) { struct pack_midx_entry *obj = list++; + if (perm[obj->pack_int_id] == PACK_EXPIRED) + BUG("object %s is in an expired pack with int-id %d", + oid_to_hex(&obj->oid), + obj->pack_int_id); + hashwrite_be32(f, perm[obj->pack_int_id]); if (large_offset_needed && obj->offset >> 31) @@ -721,7 +734,8 @@ static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_off return written; } -int write_midx_file(const char *object_dir) +static int write_midx_internal(const char *object_dir, struct multi_pack_index *m, + struct string_list *packs_to_drop) { unsigned char cur_chunk, num_chunks = 0; char *midx_name; @@ -737,6 +751,8 @@ int write_midx_file(const char *object_dir) struct pack_midx_entry *entries = NULL; int large_offsets_needed = 0; int pack_name_concat_len = 0; + int dropped_packs = 0; + int result = 0; midx_name = get_midx_filename(object_dir); if (safe_create_leading_directories(midx_name)) { @@ -745,7 +761,10 @@ int write_midx_file(const char *object_dir) midx_name); } - packs.m = load_multi_pack_index(object_dir, 1); + if (m) + packs.m = m; + else + packs.m = load_multi_pack_index(object_dir, 1); packs.nr = 0; packs.alloc = packs.m ? packs.m->num_packs : 16; @@ -759,13 +778,14 @@ int write_midx_file(const char *object_dir) packs.info[packs.nr].orig_pack_int_id = i; packs.info[packs.nr].pack_name = xstrdup(packs.m->pack_names[i]); packs.info[packs.nr].p = NULL; + packs.info[packs.nr].expired = 0;
[PATCH v4 07/10] multi-pack-index: prepare 'repack' subcommand
From: Derrick Stolee In an environment where the multi-pack-index is useful, it is due to many pack-files and an inability to repack the object store into a single pack-file. However, it is likely that many of these pack-files are rather small, and could be repacked into a slightly larger pack-file without too much effort. It may also be important to ensure the object store is highly available and the repack operation does not interrupt concurrent git commands. Introduce a 'repack' subcommand to 'git multi-pack-index' that takes a '--batch-size' option. The subcommand will inspect the multi-pack-index for referenced pack-files whose size is smaller than the batch size, until collecting a list of pack-files whose sizes sum to larger than the batch size. Then, a new pack-file will be created containing the objects from those pack-files that are referenced by the multi-pack-index. The resulting pack is likely to actually be smaller than the batch size due to compression and the fact that there may be objects in the pack- files that have duplicate copies in other pack-files. The current change introduces the command-line arguments, and we add a test that ensures we parse these options properly. Since we specify a small batch size, we will guarantee that future implementations do not change the list of pack-files. In addition, we hard-code the modified times of the packs in the pack directory to ensure the list of packs sorted by modified time matches the order if sorted by size (ascending). This will be important in a future test. Signed-off-by: Derrick Stolee --- Documentation/git-multi-pack-index.txt | 11 +++ builtin/multi-pack-index.c | 12 ++-- midx.c | 5 + midx.h | 1 + t/t5319-multi-pack-index.sh| 17 + 5 files changed, 44 insertions(+), 2 deletions(-) diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index 6186c4c936..de345c2400 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -36,6 +36,17 @@ expire:: have no objects referenced by the MIDX. Rewrite the MIDX file afterward to remove all references to these pack-files. +repack:: + Create a new pack-file containing objects in small pack-files + referenced by the multi-pack-index. Select the pack-files by + examining packs from oldest-to-newest, adding a pack if its + size is below the batch size. Stop adding packs when the sum + of sizes of the added packs is above the batch size. If the + total size does not reach the batch size, then do nothing. + Rewrite the multi-pack-index to reference the new pack-file. + A later run of 'git multi-pack-index expire' will delete the + pack-files that were part of this batch. + EXAMPLES diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index 145de3a46c..c66239de33 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -5,12 +5,13 @@ #include "midx.h" static char const * const builtin_multi_pack_index_usage[] = { - N_("git multi-pack-index [--object-dir=] (write|verify|expire)"), + N_("git multi-pack-index [--object-dir=] (write|verify|expire|repack --batch-size=)"), NULL }; static struct opts_multi_pack_index { const char *object_dir; + unsigned long batch_size; } opts; int cmd_multi_pack_index(int argc, const char **argv, @@ -19,6 +20,8 @@ int cmd_multi_pack_index(int argc, const char **argv, static struct option builtin_multi_pack_index_options[] = { OPT_FILENAME(0, "object-dir", &opts.object_dir, N_("object directory containing set of packfile and pack-index pairs")), + OPT_MAGNITUDE(0, "batch-size", &opts.batch_size, + N_("during repack, collect pack-files of smaller size into a batch that is larger than this size")), OPT_END(), }; @@ -40,6 +43,11 @@ int cmd_multi_pack_index(int argc, const char **argv, return 1; } + if (!strcmp(argv[0], "repack")) + return midx_repack(opts.object_dir, (size_t)opts.batch_size); + if (opts.batch_size) + die(_("--batch-size option is only for 'repack' subcommand")); + if (!strcmp(argv[0], "write")) return write_midx_file(opts.object_dir); if (!strcmp(argv[0], "verify")) @@ -47,5 +55,5 @@ int cmd_multi_pack_index(int argc, const char **argv, if (!strcmp(argv[0], "expire")) return expire_midx_packs(opts.object_dir); - die(_("unrecognized verb: %s"), argv[0]); + die(_(&q
[PATCH v4 02/10] Docs: rearrange subcommands for multi-pack-index
From: Derrick Stolee We will add new subcommands to the multi-pack-index, and that will make the documentation a bit messier. Clean up the 'verb' descriptions by renaming the concept to 'subcommand' and removing the reference to the object directory. Helped-by: Stefan Beller Helped-by: Szeder Gábor Signed-off-by: Derrick Stolee --- Documentation/git-multi-pack-index.txt | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index f7778a2c85..1af406aca2 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -9,7 +9,7 @@ git-multi-pack-index - Write and verify multi-pack-indexes SYNOPSIS [verse] -'git multi-pack-index' [--object-dir=] +'git multi-pack-index' [--object-dir=] DESCRIPTION --- @@ -23,13 +23,13 @@ OPTIONS `/packs/multi-pack-index` for the current MIDX file, and `/packs` for the pack-files to index. +The following subcommands are available: + write:: - When given as the verb, write a new MIDX file to - `/packs/multi-pack-index`. + Write a new MIDX file. verify:: - When given as the verb, verify the contents of the MIDX file - at `/packs/multi-pack-index`. + Verify the contents of the MIDX file. EXAMPLES -- gitgitgadget
[PATCH v4 03/10] multi-pack-index: prepare for 'expire' subcommand
From: Derrick Stolee The multi-pack-index tracks objects in a collection of pack-files. Only one copy of each object is indexed, using the modified time of the pack-files to determine tie-breakers. It is possible to have a pack-file with no referenced objects because all objects have a duplicate in a newer pack-file. Introduce a new 'expire' subcommand to the multi-pack-index builtin. This subcommand will delete these unused pack-files and rewrite the multi-pack-index to no longer refer to those files. More details about the specifics will follow as the method is implemented. Add a test that verifies the 'expire' subcommand is correctly wired, but will still be valid when the verb is implemented. Specifically, create a set of packs that should all have referenced objects and should not be removed during an 'expire' operation. The packs are created carefully to ensure they have a specific order when sorted by size. This will be important in a later test. Signed-off-by: Derrick Stolee --- Documentation/git-multi-pack-index.txt | 5 +++ builtin/multi-pack-index.c | 4 ++- midx.c | 5 +++ midx.h | 1 + t/t5319-multi-pack-index.sh| 49 ++ 5 files changed, 63 insertions(+), 1 deletion(-) diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index 1af406aca2..6186c4c936 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -31,6 +31,11 @@ write:: verify:: Verify the contents of the MIDX file. +expire:: + Delete the pack-files that are tracked by the MIDX file, but + have no objects referenced by the MIDX. Rewrite the MIDX file + afterward to remove all references to these pack-files. + EXAMPLES diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c index fca70f8e4f..145de3a46c 100644 --- a/builtin/multi-pack-index.c +++ b/builtin/multi-pack-index.c @@ -5,7 +5,7 @@ #include "midx.h" static char const * const builtin_multi_pack_index_usage[] = { - N_("git multi-pack-index [--object-dir=] (write|verify)"), + N_("git multi-pack-index [--object-dir=] (write|verify|expire)"), NULL }; @@ -44,6 +44,8 @@ int cmd_multi_pack_index(int argc, const char **argv, return write_midx_file(opts.object_dir); if (!strcmp(argv[0], "verify")) return verify_midx_file(opts.object_dir); + if (!strcmp(argv[0], "expire")) + return expire_midx_packs(opts.object_dir); die(_("unrecognized verb: %s"), argv[0]); } diff --git a/midx.c b/midx.c index 730ff84dff..bb825ef816 100644 --- a/midx.c +++ b/midx.c @@ -1025,3 +1025,8 @@ int verify_midx_file(const char *object_dir) return verify_midx_error; } + +int expire_midx_packs(const char *object_dir) +{ + return 0; +} diff --git a/midx.h b/midx.h index 774f652530..e3a2b740b5 100644 --- a/midx.h +++ b/midx.h @@ -49,6 +49,7 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i int write_midx_file(const char *object_dir); void clear_midx_file(struct repository *r); int verify_midx_file(const char *object_dir); +int expire_midx_packs(const char *object_dir); void close_midx(struct multi_pack_index *m); diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 70926b5bc0..a8528f7da0 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -348,4 +348,53 @@ test_expect_success 'verify incorrect 64-bit offset' ' "incorrect object offset" ' +test_expect_success 'setup expire tests' ' + mkdir dup && + ( + cd dup && + git init && + test-tool genrandom "data" 4096 >large_file.txt && + git update-index --add large_file.txt && + for i in $(test_seq 1 20) + do + test_commit $i + done && + git branch A HEAD && + git branch B HEAD~8 && + git branch C HEAD~13 && + git branch D HEAD~16 && + git branch E HEAD~18 && + git pack-objects --revs .git/objects/pack/pack-A <<-EOF && + refs/heads/A + ^refs/heads/B + EOF + git pack-objects --revs .git/objects/pack/pack-B <<-EOF && + refs/heads/B + ^refs/heads/C + EOF + git pack-objects --revs .git/objects/pack/pack-C <<-EOF && + refs/heads/C + ^refs/heads/D + EOF
[PATCH v4 05/10] midx: refactor permutation logic and pack sorting
From: Derrick Stolee In anticipation of the expire subcommand, refactor the way we sort the packfiles by name. This will greatly simplify our approach to dropping expired packs from the list. First, create 'struct pack_info' to replace 'struct pack_pair'. This struct contains the necessary information about a pack, including its name, a pointer to its packfile struct (if not already in the multi-pack-index), and the original pack-int-id. Second, track the pack information using an array of pack_info structs in the pack_list struct. This simplifies the logic around the multiple arrays we were tracking in that struct. Finally, update get_sorted_entries() to not permute the pack-int-id and instead supply the permutation to write_midx_object_offsets(). This requires sorting the packs after get_sorted_entries(). Signed-off-by: Derrick Stolee --- midx.c | 156 + 1 file changed, 69 insertions(+), 87 deletions(-) diff --git a/midx.c b/midx.c index f087bbbe82..95c39106b2 100644 --- a/midx.c +++ b/midx.c @@ -377,12 +377,23 @@ static size_t write_midx_header(struct hashfile *f, return MIDX_HEADER_SIZE; } +struct pack_info { + uint32_t orig_pack_int_id; + char *pack_name; + struct packed_git *p; +}; + +static int pack_info_compare(const void *_a, const void *_b) +{ + struct pack_info *a = (struct pack_info *)_a; + struct pack_info *b = (struct pack_info *)_b; + return strcmp(a->pack_name, b->pack_name); +} + struct pack_list { - struct packed_git **list; - char **names; + struct pack_info *info; uint32_t nr; - uint32_t alloc_list; - uint32_t alloc_names; + uint32_t alloc; struct multi_pack_index *m; }; @@ -395,66 +406,32 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, if (packs->m && midx_contains_pack(packs->m, file_name)) return; - ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list); - ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names); + ALLOC_GROW(packs->info, packs->nr + 1, packs->alloc); - packs->list[packs->nr] = add_packed_git(full_path, - full_path_len, - 0); + packs->info[packs->nr].p = add_packed_git(full_path, + full_path_len, + 0); - if (!packs->list[packs->nr]) { + if (!packs->info[packs->nr].p) { warning(_("failed to add packfile '%s'"), full_path); return; } - if (open_pack_index(packs->list[packs->nr])) { + if (open_pack_index(packs->info[packs->nr].p)) { warning(_("failed to open pack-index '%s'"), full_path); - close_pack(packs->list[packs->nr]); - FREE_AND_NULL(packs->list[packs->nr]); + close_pack(packs->info[packs->nr].p); + FREE_AND_NULL(packs->info[packs->nr].p); return; } - packs->names[packs->nr] = xstrdup(file_name); + packs->info[packs->nr].pack_name = xstrdup(file_name); + packs->info[packs->nr].orig_pack_int_id = packs->nr; packs->nr++; } } -struct pack_pair { - uint32_t pack_int_id; - char *pack_name; -}; - -static int pack_pair_compare(const void *_a, const void *_b) -{ - struct pack_pair *a = (struct pack_pair *)_a; - struct pack_pair *b = (struct pack_pair *)_b; - return strcmp(a->pack_name, b->pack_name); -} - -static void sort_packs_by_name(char **pack_names, uint32_t nr_packs, uint32_t *perm) -{ - uint32_t i; - struct pack_pair *pairs; - - ALLOC_ARRAY(pairs, nr_packs); - - for (i = 0; i < nr_packs; i++) { - pairs[i].pack_int_id = i; - pairs[i].pack_name = pack_names[i]; - } - - QSORT(pairs, nr_packs, pack_pair_compare); - - for (i = 0; i < nr_packs; i++) { - pack_names[i] = pairs[i].pack_name; - perm[pairs[i].pack_int_id] = i; - } - - free(pairs); -} - struct pack_midx_entry { struct object_id oid; uint32_t pack_int_id; @@ -480,7 +457,6 @@ static int midx_oid_compare(const void *_a, const void *_b) } static int nth_midxed_pack_midx_entry(struct multi_pack_index *m, -
[PATCH v4 04/10] midx: simplify computation of pack name lengths
From: Derrick Stolee Before writing the multi-pack-index, we compute the length of the pack-index names concatenated together. This forms the data in the pack name chunk, and we precompute it to compute chunk offsets. The value is also modified to fit alignment needs. Previously, this computation was coupled with adding packs from the existing multi-pack-index and the remaining packs in the object dir not already covered by the multi-pack-index. In anticipation of this becoming more complicated with the 'expire' subcommand, simplify the computation by centralizing it to a single loop before writing the file. Signed-off-by: Derrick Stolee --- midx.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/midx.c b/midx.c index bb825ef816..f087bbbe82 100644 --- a/midx.c +++ b/midx.c @@ -383,7 +383,6 @@ struct pack_list { uint32_t nr; uint32_t alloc_list; uint32_t alloc_names; - size_t pack_name_concat_len; struct multi_pack_index *m; }; @@ -418,7 +417,6 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, } packs->names[packs->nr] = xstrdup(file_name); - packs->pack_name_concat_len += strlen(file_name) + 1; packs->nr++; } } @@ -762,6 +760,7 @@ int write_midx_file(const char *object_dir) uint32_t nr_entries, num_large_offsets = 0; struct pack_midx_entry *entries = NULL; int large_offsets_needed = 0; + int pack_name_concat_len = 0; midx_name = get_midx_filename(object_dir); if (safe_create_leading_directories(midx_name)) { @@ -777,7 +776,6 @@ int write_midx_file(const char *object_dir) packs.alloc_names = packs.alloc_list; packs.list = NULL; packs.names = NULL; - packs.pack_name_concat_len = 0; ALLOC_ARRAY(packs.list, packs.alloc_list); ALLOC_ARRAY(packs.names, packs.alloc_names); @@ -788,7 +786,6 @@ int write_midx_file(const char *object_dir) packs.list[packs.nr] = NULL; packs.names[packs.nr] = xstrdup(packs.m->pack_names[i]); - packs.pack_name_concat_len += strlen(packs.names[packs.nr]) + 1; packs.nr++; } } @@ -798,10 +795,6 @@ int write_midx_file(const char *object_dir) if (packs.m && packs.nr == packs.m->num_packs) goto cleanup; - if (packs.pack_name_concat_len % MIDX_CHUNK_ALIGNMENT) - packs.pack_name_concat_len += MIDX_CHUNK_ALIGNMENT - - (packs.pack_name_concat_len % MIDX_CHUNK_ALIGNMENT); - ALLOC_ARRAY(pack_perm, packs.nr); sort_packs_by_name(packs.names, packs.nr, pack_perm); @@ -814,6 +807,13 @@ int write_midx_file(const char *object_dir) large_offsets_needed = 1; } + for (i = 0; i < packs.nr; i++) + pack_name_concat_len += strlen(packs.names[i]) + 1; + + if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT) + pack_name_concat_len += MIDX_CHUNK_ALIGNMENT - + (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT); + hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR); f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf); FREE_AND_NULL(midx_name); @@ -831,7 +831,7 @@ int write_midx_file(const char *object_dir) cur_chunk++; chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT; - chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + packs.pack_name_concat_len; + chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + pack_name_concat_len; cur_chunk++; chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP; -- gitgitgadget
[PATCH v4 09/10] multi-pack-index: test expire while adding packs
From: Derrick Stolee During development of the multi-pack-index expire subcommand, a version went out that improperly computed the pack order if a new pack was introduced while other packs were being removed. Part of the subtlety of the bug involved the new pack being placed before other packs that already existed in the multi-pack-index. Add a test to t5319-multi-pack-index.sh that catches this issue. The test adds new packs that cause another pack to be expired, and creates new packs that are lexicographically sorted before and after the existing packs. Signed-off-by: Derrick Stolee --- t/t5319-multi-pack-index.sh | 32 1 file changed, 32 insertions(+) diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index d6c1353514..19b769eea0 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh @@ -463,4 +463,36 @@ test_expect_success 'expire removes repacked packs' ' ) ' +test_expect_success 'expire works when adding new packs' ' + ( + cd dup && + git pack-objects --revs .git/objects/pack/pack-combined <<-EOF && + refs/heads/A + ^refs/heads/B + EOF + git pack-objects --revs .git/objects/pack/pack-combined <<-EOF && + refs/heads/B + ^refs/heads/C + EOF + git pack-objects --revs .git/objects/pack/pack-combined <<-EOF && + refs/heads/C + ^refs/heads/D + EOF + git multi-pack-index write && + git pack-objects --revs .git/objects/pack/a-pack <<-EOF && + refs/heads/D + ^refs/heads/E + EOF + git multi-pack-index write && + git pack-objects --revs .git/objects/pack/z-pack <<-EOF && + refs/heads/E + EOF + git multi-pack-index expire && + ls .git/objects/pack/ | grep idx >expect && + test-tool read-midx .git/objects | grep idx >actual && + test_cmp expect actual && + git multi-pack-index verify + ) +' + test_done -- gitgitgadget
[PATCH v4 01/10] repack: refactor pack deletion for future use
From: Derrick Stolee The repack builtin deletes redundant pack-files and their associated .idx, .promisor, .bitmap, and .keep files. We will want to re-use this logic in the future for other types of repack, so pull the logic into 'unlink_pack_path()' in packfile.c. The 'ignore_keep' parameter is enabled for the use in repack, but will be important for a future caller. Signed-off-by: Derrick Stolee --- builtin/repack.c | 14 ++ packfile.c | 28 packfile.h | 7 +++ 3 files changed, 37 insertions(+), 12 deletions(-) diff --git a/builtin/repack.c b/builtin/repack.c index 45583683ee..3d445b34b4 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -129,19 +129,9 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list, static void remove_redundant_pack(const char *dir_name, const char *base_name) { - const char *exts[] = {".pack", ".idx", ".keep", ".bitmap", ".promisor"}; - int i; struct strbuf buf = STRBUF_INIT; - size_t plen; - - strbuf_addf(&buf, "%s/%s", dir_name, base_name); - plen = buf.len; - - for (i = 0; i < ARRAY_SIZE(exts); i++) { - strbuf_setlen(&buf, plen); - strbuf_addstr(&buf, exts[i]); - unlink(buf.buf); - } + strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name); + unlink_pack_path(buf.buf, 1); strbuf_release(&buf); } diff --git a/packfile.c b/packfile.c index d1e6683ffe..bacecb4d0d 100644 --- a/packfile.c +++ b/packfile.c @@ -352,6 +352,34 @@ void close_all_packs(struct raw_object_store *o) } } +void unlink_pack_path(const char *pack_name, int force_delete) +{ + static const char *exts[] = {".pack", ".idx", ".keep", ".bitmap", ".promisor"}; + int i; + struct strbuf buf = STRBUF_INIT; + size_t plen; + + strbuf_addstr(&buf, pack_name); + strip_suffix_mem(buf.buf, &buf.len, ".pack"); + plen = buf.len; + + if (!force_delete) { + strbuf_addstr(&buf, ".keep"); + if (!access(buf.buf, F_OK)) { + strbuf_release(&buf); + return; + } + } + + for (i = 0; i < ARRAY_SIZE(exts); i++) { + strbuf_setlen(&buf, plen); + strbuf_addstr(&buf, exts[i]); + unlink(buf.buf); + } + + strbuf_release(&buf); +} + /* * The LRU pack is the one with the oldest MRU window, preferring packs * with no used windows, or the oldest mtime if it has no windows allocated. diff --git a/packfile.h b/packfile.h index 6c4037605d..5b7bcdb1dd 100644 --- a/packfile.h +++ b/packfile.h @@ -86,6 +86,13 @@ extern void unuse_pack(struct pack_window **); extern void clear_delta_base_cache(void); extern struct packed_git *add_packed_git(const char *path, size_t path_len, int local); +/* + * Unlink the .pack and associated extension files. + * Does not unlink if 'force_delete' is false and the pack-file is + * marked as ".keep". + */ +extern void unlink_pack_path(const char *pack_name, int force_delete); + /* * Make sure that a pointer access into an mmap'd index file is within bounds, * and can provide at least 8 bytes of data. -- gitgitgadget
[PATCH v4 00/10] Create 'expire' and 'repack' verbs for git-multi-pack-index
The multi-pack-index provides a fast way to find an object among a large list of pack-files. It stores a single pack-reference for each object id, so duplicate objects are ignored. Among a list of pack-files storing the same object, the most-recently modified one is used. Create new subcommands for the multi-pack-index builtin. * 'git multi-pack-index expire': If we have a pack-file indexed by the multi-pack-index, but all objects in that pack are duplicated in more-recently modified packs, then delete that pack (and any others like it). Delete the reference to that pack in the multi-pack-index. * 'git multi-pack-index repack --batch-size=': Starting from the oldest pack-files covered by the multi-pack-index, find those whose on-disk size is below the batch size until we have a collection of packs whose sizes add up to the batch size. Create a new pack containing all objects that the multi-pack-index references to those packs. This allows us to create a new pattern for repacking objects: run 'repack'. After enough time has passed that all Git commands that started before the last 'repack' are finished, run 'expire' again. This approach has some advantages over the existing "repack everything" model: 1. Incremental. We can repack a small batch of objects at a time, instead of repacking all reachable objects. We can also limit ourselves to the objects that do not appear in newer pack-files. 2. Highly Available. By adding a new pack-file (and not deleting the old pack-files) we do not interrupt concurrent Git commands, and do not suffer performance degradation. By expiring only pack-files that have no referenced objects, we know that Git commands that are doing normal object lookups* will not be interrupted. 3. Note: if someone concurrently runs a Git command that uses get_all_packs(), then that command could try to read the pack-files and pack-indexes that we are deleting during an expire command. Such commands are usually related to object maintenance (i.e. fsck, gc, pack-objects) or are related to less-often-used features (i.e. fast-import, http-backend, server-info). We plan to use this approach in VFS for Git to do background maintenance of the "shared object cache" which is a Git alternate directory filled with packfiles containing commits and trees. We currently download pack-files on an hourly basis to keep up-to-date with the central server. The cache servers supply packs on an hourly and daily basis, so most of the hourly packs become useless after a new daily pack is downloaded. The 'expire' command would clear out most of those packs, but many will still remain with fewer than 100 objects remaining. The 'repack' command (with a batch size of 1-3gb, probably) can condense the remaining packs in commands that run for 1-3 min at a time. Since the daily packs range from 100-250mb, we will also combine and condense those packs. Updates in V2: * Added a method, unlink_pack_path() to remove packfiles, but with the additional check for a .keep file. This borrows logic from builtin/repack.c. * Modified documentation and commit messages to replace 'verb' with 'subcommand'. Simplified the documentation. (I left 'verbs' in the title of the cover letter for consistency.) Updates in V3: * There was a bug in the expire logic when simultaneously removing packs and adding uncovered packs, specifically around the pack permutation. This was hard to see during review because I was using the 'pack_perm' array for multiple purposes. First, I was reducing its length, and then I was adding to it and resorting. In V3, I significantly overhauled the logic here, which required some extra commits before implementing 'expire'. The final commit includes a test that would cover this case. Updates in V4: * More 'verb' and 'command' instances replaced with 'subcommand'. I grepped the patch to check these should be fixed everywhere. * Update the tests to check .keep files (in last patch). * Modify the tests to show the terminating condition of --batch-size when there are three packs that fit under the size, but the first two are large enough to stop adding packs. This required rearranging the packs slightly to get different sizes than we had before. Also, I added 'touch -t' to set the modified times so we can fix the order in which the packs are selected. * Added a comment about the purpose of pack_perm. Thanks, -Stolee Derrick Stolee (10): repack: refactor pack deletion for future use Docs: rearrange subcommands for multi-pack-index multi-pack-index: prepare for 'expire' subcommand midx: simplify
Re: [PATCH v3 7/9] multi-pack-index: prepare 'repack' subcommand
On 1/23/2019 5:38 PM, Jonathan Tan wrote: diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt index 6186c4c936..cc63531cc0 100644 --- a/Documentation/git-multi-pack-index.txt +++ b/Documentation/git-multi-pack-index.txt @@ -36,6 +36,17 @@ expire:: have no objects referenced by the MIDX. Rewrite the MIDX file afterward to remove all references to these pack-files. +repack:: + Collect a batch of pack-files whose size are all at most the + size given by --batch-size, but whose sizes sum to larger + than --batch-size. The batch is selected by greedily adding + small pack-files starting with the oldest pack-files that fit + the size. Create a new pack-file containing the objects the + multi-pack-index indexes into those pack-files, and rewrite + the multi-pack-index to contain that pack-file. A later run + of 'git multi-pack-index expire' will delete the pack-files + that were part of this batch. I see in the subsequent patch that you stop once the batch size is matched or exceeded - I see that you mention "whose sizes sum to larger than --batch-size", but this leads me to think that if the total so happens to not exceed the batch size, don't do anything, but otherwise repack *all* the small packs together. I would write this as: Create a new packfile containing the objects in the N least-sized packfiles referenced by the multi-pack-index, where N is the smallest number such that the total size of the packfiles equals or exceeds the given batch size. Rewrite the multi-pack-index to reference the new packfile instead of the N packfiles. A later run of ... Thanks for the suggestion. It is slightly wrong, in that we don't sort by size. Instead we sort by modified time. That makes is a little complicated, but I'll give it another shot using your framing: Create a new pack-file containing objects in small pack-files referenced by the multi-pack-index. Select the pack-files by examining packs from oldest-to-newest, adding a pack if its size is below the batch size. Stop adding packs when the sum of sizes of the added packs is above the batch size. If the total size does not reach the batch size, then do nothing. Rewrite the multi-pack-index to reference the new pack-file. A later run of 'git multi-pack-index expire' will delete the pack-files that were part of this batch. -Stolee
Re: Git Test Coverage Report (Sat Jan 19)
On 1/24/2019 1:15 PM, Junio C Hamano wrote: Derrick Stolee writes: Here is today's test coverage report. Also, there has been some feedback that it can be hard to manually match up uncovered lines with names at the bottom of the summary. The suggestion was to auto-generate an HTML report that could be posted to a public page and referenced in this mail for those who prefer that. I wanted to "grep" for lines attributed to certain commits that appear in the list, by filtering lines that begin with enough number of hexdigits, except for those object names, but the attempt failed miserably because of the line wrapping (which probably comes from the assumption that it is OK because the "text/plain; format=flowed" would not care). If you can keep the long lines (due to the object names and line numbers prefixed to each line) unsplit, it would be more useful to locate and isolate lines. This is likely more a problem with my workflow (pasting the report into Thunderbird and sending) than with the content itself. If I instead created a text document and sent it with `git send-email`, then would the line endings work the way you want? Thanks, -Stolee
Re: [PATCH v3 5/9] midx: refactor permutation logic and pack sorting
On 1/24/2019 12:34 PM, Derrick Stolee wrote: On 1/23/2019 4:00 PM, Jonathan Tan wrote: Indeed, the sorting of pack_info is moved to after get_sorted_entries(). Also, pack_perm[old number] = new number, as expected. Thanks for chiming in with all the detail on the use of 'perm'. This is the most confusing part of this code path. I think a comment explaining why the perm is needed would be helpful - something explaining that the entries were generated using the old pack numbers, so we need this mapping to be able to write them using the new numbers. I can put this comment in the struct definition. Is that the right place for it? I mistakenly thought the pack_perm array was placed into the pack_list struct. I'll put the comment right before we populate the contents of the array. Thanks, -Stolee
Re: [PATCH v3 6/9] multi-pack-index: implement 'expire' verb
On 1/23/2019 5:13 PM, Jonathan Tan wrote: Maybe add a fsck at the end for sanity's sake. Also, I think that preservation of .keep packfiles is an important feature, and maybe worth a test. Good points! I forgot to test the .keep stuff directly here because I have an equivalent test in VFS for Git, so my wires got crossed. Thanks, -Stolee
Re: [PATCH v3 5/9] midx: refactor permutation logic and pack sorting
On 1/23/2019 4:00 PM, Jonathan Tan wrote: Indeed, the sorting of pack_info is moved to after get_sorted_entries(). Also, pack_perm[old number] = new number, as expected. Thanks for chiming in with all the detail on the use of 'perm'. This is the most confusing part of this code path. I think a comment explaining why the perm is needed would be helpful - something explaining that the entries were generated using the old pack numbers, so we need this mapping to be able to write them using the new numbers. I can put this comment in the struct definition. Is that the right place for it? As part of this review, I did attempt to make everything use the sorted pack_info order, but failed. I got as far as making get_sorted_entries() always use start_pack=0 and skip the pack_infos that didn't have the p pointer (to match the existing behavior of get_sorted_entries() that only operates on new pack_infos being added by add_pack_to_midx), but it still didn't work, and I didn't investigate further. In the previous version, I tried to use 'perm' and the arrays as they existed previously, but that led to the bug in that version. Hopefully the new code is easier to read and understand. The delta may actually be hard to clearly see that it does the same work, but reading the code after it is applied looks clearer (to me). Thanks, -Stolee