[PATCH 14/17] commit-graph: load split commit-graph files

2019-05-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Starting with commit-graph, load commit-graph files in a
sequence as follows:

  commit-graph
  commit-graph-1
  commit-graph-2
  ...
  commit-graph-N

This creates N + 1 files in order.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 39 ++-
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index f790f44a9c..5f6193277a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -45,6 +45,12 @@ char *get_commit_graph_filename(const char *obj_dir)
return xstrfmt("%s/info/commit-graph", obj_dir);
 }
 
+static char *get_split_graph_filename(const char *obj_dir,
+ uint32_t split_count)
+{
+   return xstrfmt("%s/info/commit-graphs/commit-graph-%d", obj_dir, 
split_count);
+}
+
 static uint8_t oid_version(void)
 {
return 1;
@@ -289,15 +295,31 @@ static struct commit_graph *load_commit_graph_one(const 
char *graph_file)
 static void prepare_commit_graph_one(struct repository *r, const char *obj_dir)
 {
char *graph_name;
+   uint32_t split_count = 1;
+   struct commit_graph *g;
 
if (r->objects->commit_graph)
return;
 
graph_name = get_commit_graph_filename(obj_dir);
-   r->objects->commit_graph =
-   load_commit_graph_one(graph_name);
-
+   g = load_commit_graph_one(graph_name);
FREE_AND_NULL(graph_name);
+
+   while (g) {
+   g->base_graph = r->objects->commit_graph;
+
+   if (g->base_graph)
+   g->num_commits_in_base = g->base_graph->num_commits +
+
g->base_graph->num_commits_in_base;;
+
+   r->objects->commit_graph = g;
+
+   graph_name = get_split_graph_filename(obj_dir, split_count);
+   g = load_commit_graph_one(graph_name);
+   FREE_AND_NULL(graph_name);
+
+   split_count++;
+   }
 }
 
 /*
@@ -411,8 +433,15 @@ static struct commit_list **insert_parent_or_die(struct 
repository *r,
 
 static void fill_commit_graph_info(struct commit *item, struct commit_graph 
*g, uint32_t pos)
 {
-   const unsigned char *commit_data = g->chunk_commit_data + 
GRAPH_DATA_WIDTH * pos;
-   item->graph_pos = pos + g->num_commits_in_base;
+   const unsigned char *commit_data;
+
+   if (pos < g->num_commits_in_base) {
+   fill_commit_graph_info(item, g->base_graph, pos);
+   return;
+   }
+
+   commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * (pos - 
g->num_commits_in_base);
+   item->graph_pos = pos;
item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
 }
 
-- 
gitgitgadget



[PATCH 01/17] commit-graph: fix the_repository reference

2019-05-08 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The parse_commit_buffer() method takes a repository pointer, so it
should not refer to the_repository anymore.

Signed-off-by: Derrick Stolee 
---
 commit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index a5333c7ac6..e4d1233226 100644
--- a/commit.c
+++ b/commit.c
@@ -443,7 +443,7 @@ int parse_commit_buffer(struct repository *r, struct commit 
*item, const void *b
item->date = parse_commit_date(bufptr, tail);
 
if (check_graph)
-   load_commit_graph_info(the_repository, item);
+   load_commit_graph_info(r, item);
 
return 0;
 }
-- 
gitgitgadget



Re: [PATCH 01/19] revision.h: avoid bit fields in struct rev_info

2019-05-08 Thread Derrick Stolee
On 5/8/2019 10:41 AM, Duy Nguyen wrote:
> On Wed, May 8, 2019 at 9:07 PM Derrick Stolee  wrote:
>>
>> On 5/8/2019 7:12 AM, Nguyễn Thái Ngọc Duy wrote:
>>> Bitfield addresses cannot be passed around in a pointer. This makes it
>>> hard to use parse-options to set/unset them. Turn this struct to
>>> normal integers. This of course increases the size of this struct
>>> multiple times, but since we only have a handful of rev_info variables
>>> around, memory consumption is not at all a concern.
>>
>> I think you are right that this memory trade-off shouldn't be a problem.
>>
>> What worries me instead is that we are using an "internal" data structure
>> for option parsing. Would it make more sense to create a struct for use
>> in the parse_opts method and a method that translates those options into
>> the bitfield in struct rev_info?
> 
> But we are doing that now (option parsing) using the same data
> structure. Why would changing from a custom parser to parse_options()
> affect what fields it should or should not touch in rev_info? Genuine
> question. Maybe you could elaborate more about "internal". I probably
> missed something. Or maybe this is a good opportunity to separate
> intermediate option parsing variables from the rest of rev_info?

You're right. I was unclear.

rev_info stores a lot of data. Some of the fields are important
in-memory structures that are crucial to the workings of revision.c
and are never used by builtin/rev-list.c. Combining this purpose
with the option parsing seems smelly to me.

Thinking more on it, I would prefer a more invasive change that may
pay off in the long term. These options, along with the "starting list"
values, could be extracted to a 'struct rev_options' that contains these
integers and the commit list. Then your option parsing changes could be
limited to a rev_options struct, which is later inserted into a rev_info
struct during setup_revisions().

Generally, the rev_info struct has too many members and could be split
into smaller pieces according to purpose. I created the topo_walk_info
struct as a way to not make the situation worse, but doesn't fix existing
pain.

My ramblings are mostly complaining about old code that grew organically
across many many quality additions. It is definitely hard to understand
the revision-walking code, and perhaps it would be easier to understand
with a little more structure.

The biggest issue with my suggestion is that it requires changing the
consumers of the options, as they would no longer live directly on the
rev_info struct. That would be a big change, even if it could be done
with string replacement.

Thanks,
-Stolee 


Re: [PATCH 1/2] midx: pass a repository pointer

2019-05-08 Thread Derrick Stolee
On 5/7/2019 4:31 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget"  writes:
> 
>> From: Derrick Stolee 
>>
>> Much of the multi-pack-index code focuses on the multi_pack_index
>> struct, and so we only pass a pointer to the current one. However,
>> we will insert a dependency on the packed_git linked list in a
>> future change, so we will need a repository reference. Inserting
>> these parameters is a significant enough change to split out.
>>
>> Signed-off-by: Derrick Stolee 
>> ---
> 
> This is a good move in the loger term, but not a happy thing to do
> while your "expire" topic is still in flight, as the impact from
> updating the signature of prepare_midx_pack() and friends will break
> new callers in expire_midx_packs() etc.
> 
> I am tempted to queue this and eject ds/midx-expire-repack for now,
> while checking how that topic would look like when rebased on top of
> these two patches.  We'll see.

Sorry for the noise. I consider this series a higher priority, as it
fixes a problem with the feature in existing releases.

Thanks,
-Stolee


Re: [PATCH v3] repack: enable bitmaps by default on bare repos

2019-05-08 Thread Derrick Stolee
On 5/8/2019 3:11 AM, Jeff King wrote:
> On Tue, May 07, 2019 at 10:12:06AM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
>>> I think we'd want a way to tell the bitmap code to update our progress
>>> meter as it traverses (both single objects, but also taking into account
>>> when it finds a bitmap and then suddenly bumps the value by a large
>>> amount).
>>
>> Not splitting it will fix the progress bar stalling, so it fixes the
>> problem that the user is wondering if the command is entirely hanging.
>>
>> But I was hoping to give the user an idea of roughly where we're
>> spending our time, e.g. so you can see how much the pack.useSparse
>> setting is helping (or not).
> 
> Yeah, I think that's a bigger and more complicated problem. I admit that
> my main annoyance is just the stall while we fill in the bitmaps (and
> it's easy because the bitmap traversal is the same unit of work as a
> regular traversal).

The pack.useSparse setting also speeds up a section that is not marked
by progress: that portion usually is walking all UNINTERESTING trees and
the"Enumerating Objects" progress is just for walking the INTERESTING objects.
 
>> So something where we report sub-progress as we go along, and perhaps
>> print some brief summary at the end if it took long enough, e.g.:
>>
>> Enumerating Objects (X^1%) => Marking trees (Y^1%)
>> Enumerating Objects (X^2%) => Calculating bitmaps (Y^2%)

I like this idea for splitting the "normal" mechanism, too:

 Enumerating Objects (X^1%) => Marking trees (Y^1%)
 Enumerating Objects (X^2%) => Enumerating objects to pack (Y^2%)

>> I.e. bringing the whole "nested" trace2 regions full circle with the
>> progress bar where we could elect to trace/show some of that info, and
>> then you could turn on some trace2 mode/verbose progress to see more.
> 
> I do wonder if this really needs to be part of the progress bar. The
> goal of the progress bar is to give the user a sense that work is
> happening, and (if possible, but not for "enumerating") an idea of when
> it might finish. If the trace code can already do detailed timings, then
> shouldn't we just be encouraging people to use that?

The problem I've seen (without bitmaps) is that running `git push` can
take a while before _any_ progress is listed.

Good news is: `pack.useSparse` fixed our push problem in the Windows OS
repo. The end-to-end time for `git push` sped up by 7.7x with the change,
and this "blank" time is too fast for users to notice.

Updating the progress could help in cases without pack.useSparse.

Thanks,
-Stolee


Re: [PATCH 01/19] revision.h: avoid bit fields in struct rev_info

2019-05-08 Thread Derrick Stolee
On 5/8/2019 7:12 AM, Nguyễn Thái Ngọc Duy wrote:
> Bitfield addresses cannot be passed around in a pointer. This makes it
> hard to use parse-options to set/unset them. Turn this struct to
> normal integers. This of course increases the size of this struct
> multiple times, but since we only have a handful of rev_info variables
> around, memory consumption is not at all a concern.

I think you are right that this memory trade-off shouldn't be a problem.

What worries me instead is that we are using an "internal" data structure
for option parsing. Would it make more sense to create a struct for use
in the parse_opts method and a method that translates those options into
the bitfield in struct rev_info?

Thanks,
-Stolee


Re: [PATCH] commit-graph: fix memory leak

2019-05-06 Thread Derrick Stolee
On 5/6/2019 5:58 PM, Emily Shaffer wrote:
> Hi,
> 
> This change looks good to me, and like good evidence for the benefits of
> automated tooling :)

Same here! Keep up the great work here.

-Stolee


Re: [PATCH v3 0/6] Create commit-graph file format v2

2019-05-06 Thread Derrick Stolee
On 5/6/2019 4:27 AM, Christian Couder wrote:
> On Fri, May 3, 2019 at 3:44 PM Ævar Arnfjörð Bjarmason  
> wrote:
>> 1) We can stat() the "commit-graphs" directory to see if there's any
>>new/deleted ones (dir mtime changed), similar to what we do for the
>>untracked cache, and can (but I don't think we do...) do for packs/*
>>and objects/??/.
>>
>>As opposed to ".git/objects/info" itself which e.g. has the "packs",
>>"alternates" etc. files (can still do it, but more false positives)
> 
> About incremental commit-graph files and alternates, I wonder if they
> could work well together. The main use case would be for servers that
> use a common repo for all the forks.

We use a "shared object cache" in VFS for Git, implemented as an alternate,
so all enlistments share prefetch packs, multi-pack-indexes, and commit-graph
files.  This is something we are very focused on.

Thanks,
-Stolee



Re: [PATCH v3 0/6] Create commit-graph file format v2

2019-05-03 Thread Derrick Stolee
On 5/3/2019 10:16 AM, SZEDER Gábor wrote:
> On Fri, May 03, 2019 at 08:47:25AM -0400, Derrick Stolee wrote:
>> It would be much simpler to restrict the model. Your idea of changing
>> the file name is the inspiration here.
>>
>> * The "commit-graph" file is the base commit graph. It is always
>>   closed under reachability (if a commit exists in this file, then
>>   its parents are also in this file). We will also consider this to
>>   be "commit-graph-0".
>>
>> * A commit-graph- exists, then we check for the existence of
>>   commit-graph-. This file can contain commits whose parents
>>   are in any smaller file.
>>
>> I think this resolves the issue of back-compat without updating
>> the file format:
>>
>> 1. Old clients will never look at commit-graph-N, so they will
>>never complain about an "incomplete" file.
>>
>> 2. If we always open a read handle as we move up the list, then
>>a "merge and collapse" write to commit-graph-N will not
>>interrupt an existing process reading that file.
> 
> What if a process reading the commit-graph files runs short on file
> descriptors and has to close some of them, while a second process is
> merging commit-graph files?

We will want to keep the number small so we never recycle the file
handles. Instead, we will keep them open for the entire process.

The strategies for creating these graphs should include a "merge"
strategy that keeps the number of commit-graph files very small
(fewer than 5 should be sufficient).
 
>> I'll start hacking on this model.
> 
> Have fun! :)
> 
> 
> Semi-related, but I'm curious:  what are your plans for 'struct
> commit's 'graph_pos' field, and how will it work with multiple
> commit-graph files?

Since we have a predefined "sequence" of graphs, the graph_pos
will be the position in the "meta-order" given by concatenating
the commit lists from each commit-graph. We then navigate to a commit
in O(num graphs) instead of O(1).

In the commit-graph format, we will use this "meta-order" number
to refer to parent positions.

> In particular: currently we use this 'graph_pos' field as an index
> into the Commit Data chunk to find the metadata associated with a
> given commit object.  But we could add any commit-specific metadata in
> a new chunk, being an array ordered by commit OID, and then use
> 'graph_pos' as an index into this chunk as well.  I find this quite
> convenient.  However, with mulitple commit-graph files there will be
> multiple arrays...

Yes, this will continue to be useful*. To find the position inside
a specific commit-graph-N file, take the graph_pos and subtract the
number of commits in the "lower" commit-graph files.

* For example, this meta-data information is necessary for the Bloom
filter data [1].

Thanks,
-Stolee

[1] https://public-inbox.org/git/61559c5b-546e-d61b-d2e1-68de692f5...@gmail.com/


Re: [PATCH v3 0/6] Create commit-graph file format v2

2019-05-03 Thread Derrick Stolee
On 5/2/2019 2:02 PM, Ævar Arnfjörð Bjarmason wrote:
>
> But those are separate from any back-compat concerns, which is what I
> think makes sense to focus on now.

Thinking more on this topic, I think I have a way to satisfy _all_ of
your concerns by simplifying the plan for incremental commit-graph files.

My initial plan was to have the "commit-graph" file always be the "tip"
file, and it would point to some number of "graph-{hash}.graph" files.
Then, we would have some set of strategies to decide when we create a new
.graph file or when we compact the files down into the "commit-graph"
file.

This has several issues regarding race conditions that I had not yet
resolved (and maybe would always have problems).

It would be much simpler to restrict the model. Your idea of changing
the file name is the inspiration here.

* The "commit-graph" file is the base commit graph. It is always
  closed under reachability (if a commit exists in this file, then
  its parents are also in this file). We will also consider this to
  be "commit-graph-0".

* A commit-graph- exists, then we check for the existence of
  commit-graph-. This file can contain commits whose parents
  are in any smaller file.

I think this resolves the issue of back-compat without updating
the file format:

1. Old clients will never look at commit-graph-N, so they will
   never complain about an "incomplete" file.

2. If we always open a read handle as we move up the list, then
   a "merge and collapse" write to commit-graph-N will not
   interrupt an existing process reading that file.

I'll start hacking on this model.

Thanks,
-Stolee


Re: [PATCH v3 0/6] Create commit-graph file format v2

2019-05-02 Thread Derrick Stolee
On 5/1/2019 4:25 PM, Ævar Arnfjörð Bjarmason wrote:
> I won't repeat my outstanding v2 feedback about v1 & v2
> incompatibilities, except to say that I'd in principle be fine with
> having a v2 format the way this series is adding it. I.e. saying "here
> it is, it's never written by default, we'll figure out these compat
> issues later".
> 
> My only objection/nit on that point would be that the current
> docs/commit messages should make some mention of the really bad
> interactions between v1 and v2 on different git versions.

Good idea to add some warnings in the docs to say something like
"version 2 is not supported by Git 2.2x and earlier".

> However, having written this again I really don't understand why we need
> a v2 of this format at all.

[snip]

> How about we instead just don't change the header? I.e.:
> 
>  * Let's just live with "1" as the marker for SHA-1.
> 
>Yeah it would be cute to use 0x73686131 instead like "struct
>git_hash_algo", but we can live with a 1=0x73686131 ("sha1"),
>2=0x73323536 ("s256") mapping somewhere. It's not like we're going to
>be running into the 255 limit of hash algorithms Git will support any
>time soon.

This was the intended direction of using the 1-byte value before, but
we have a preferred plan to use the 4-byte value in all future file formats.

>  * Don't add the reachability index version *to the header* or change
>the reserved byte to be an error (see [1] again).

Since we can make the "corrected commit date" offset for a commit be
strictly larger than the offset of a parent, we can make it so an old client
will not give incorrect values when we use the new values. The only downside
would be that we would fail on 'git commit-graph verify' since the offsets
are not actually generation numbers in all cases.

> Instead we just add these things to new "chunks" as appropriate. As this
> patch of mine shows we can easily do that, and it doesn't error out on
> any existing version of git:
> https://github.com/avar/git/commit/3fca63e12a9d38867d4bc0a8a25d419c00a09d95

I like the idea of a "metadata" chunk. This can be useful for a lot of things.
If we start the chunk with a "number of items" and only append items to the
list, we can dynamically grow the chunk as we add values.

> I now can't imagine a situation where we'd ever need to change the
> format. We have 32 bits of chunk ids to play with, and can have 255 of
> these chunks at a time, and unknown chunks are ignored by existing
> versions and future version.

The solutions you have discussed work for 2 of the 3 problems at hand.
The incremental file format is a high-value feature, but _would_ break
existing clients if they don't understand the extra data. Unless I am
missing something for how to succeed here.

> 1. See feedback on the v2 patch in
>https://public-inbox.org/git/87muk6q98k@evledraar.gmail.com/

My response [2] to that message includes the discussion of the
incremental file format.

[2] https://public-inbox.org/git/87muk6q98k@evledraar.gmail.com/




Re: [PATCH v3 6/6] commit-graph: remove Future Work section

2019-05-01 Thread Derrick Stolee
On 5/1/2019 10:58 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, May 01 2019, Derrick Stolee via GitGitGadget wrote:
> 
>> The commit-graph feature began with a long list of planned
>> benefits, most of which are now complete. The future work
>> section has only a few items left.
>>
>> As for making more algorithms aware of generation numbers,
>> some are only waiting for generation number v2 to ensure the
>> performance matches the existing behavior using commit date.
>>
>> It is unlikely that we will ever send a commit-graph file
>> as part of the protocol, since we would need to verify the
>> data, and that is as expensive as writing a commit-graph from
>> scratch. If we want to start trusting remote content, then
>> that item can be investigated again.
> 
> My best of 3 times for "write" followed by "verify" on linux.git are
> 8.7/7.9 real/user for "write" and 5.2/4.9 real/user for "write".
> 
> So that's a reduction of ~40%. I have another big in-house repo where I
> get similar numbers of 17/16 for "write" and 10/9 for "verify". Both for
> a commit-graph file on the order of 50MB where it would be quicker for
> me to download and verify it if the protocol supported it.

Keep in mind that your first "write" may have warmed up the file system
and your pack-files parsed faster the second time around.

You are right though, 'verify' doesn't do these things:

1. Sort a list of OIDs.
2. Write a file.

And perhaps some other things. I should mean that "the main task of
'git commit-graph verify' is to parse commits from the object store,
and this is the most expensive operation in 'git commit-graph write'."

Thanks,
-Stolee



Re: [PATCH v3 5/6] commit-graph: implement file format version 2

2019-05-01 Thread Derrick Stolee
On 5/1/2019 3:12 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> OK, let's try that then, on top of this series:
> 
> diff --git a/commit-graph.c b/commit-graph.c
> index 5eebba6a0f..36c8cdb950 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1127,7 +1127,7 @@ int write_commit_graph(const char *obj_dir,
> case 2:
> hashwrite_u8(f, num_chunks);
> hashwrite_u8(f, 1); /* reachability index version */
> -   hashwrite_u8(f, 0); /* unused padding byte */
> +   hashwrite_u8(f, 1); /* unused padding byte */
> 
> Then:
> 
> $ ~/g/git/git --exec-path=$PWD commit-graph write --version=2; 
> ~/g/git/git --exec-path=$PWD status
> Expanding reachable commits in commit graph: 100% (201645/201645), done.
> Computing commit graph generation numbers: 100% (200556/200556), done.
> error: unsupported value in commit-graph header
> HEAD detached at pr-112/derrickstolee/graph/v2-head-v3
> 
> So we'll error out in the same way as if "2.0" was changed to "3.0" with
> this "2.1" change, just with a more cryptic error message on this new v2
> code.
> 
> I don't see how this is meaningfully different from just bumping the
> version to "3". We abort parsing the graph just like with major version
> changes.

Having a non-zero value here doesn't really mean "2.1" or "3". But I understand
your apprehension.

I'm currently working on building the incremental file format, based on
this series. This "unused" byte will be used to say "how many base commit-graph
files does this graph have?" If non-zero, we do not currently understand
how to stitch these files together into a "combined" graph at run time,
so we should fail.

If we should never have an intermediate version of Git that doesn't
understand this byte, then this series can wait until that feature is ready.

Thanks,
-Stolee



[PATCH v3 6/6] commit-graph: remove Future Work section

2019-05-01 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The commit-graph feature began with a long list of planned
benefits, most of which are now complete. The future work
section has only a few items left.

As for making more algorithms aware of generation numbers,
some are only waiting for generation number v2 to ensure the
performance matches the existing behavior using commit date.

It is unlikely that we will ever send a commit-graph file
as part of the protocol, since we would need to verify the
data, and that is as expensive as writing a commit-graph from
scratch. If we want to start trusting remote content, then
that item can be investigated again.

While there is more work to be done on the feature, having
a section of the docs devoted to a TODO list is wasteful and
hard to keep up-to-date.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 17 -
 1 file changed, 17 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
index 7805b0968c..fb53341d5e 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -127,23 +127,6 @@ Design Details
   helpful for these clones, anyway. The commit-graph will not be read or
   written when shallow commits are present.
 
-Future Work

-
-- After computing and storing generation numbers, we must make graph
-  walks aware of generation numbers to gain the performance benefits they
-  enable. This will mostly be accomplished by swapping a commit-date-ordered
-  priority queue with one ordered by generation number. The following
-  operations are important candidates:
-
-- 'log --topo-order'
-- 'tag --merged'
-
-- A server could provide a commit-graph file as part of the network protocol
-  to avoid extra calculations by clients. This feature is only of benefit if
-  the user is willing to trust the file, because verifying the file is correct
-  is as hard as computing it from scratch.
-
 Related Links
 -
 [0] https://bugs.chromium.org/p/git/issues/detail?id=8
-- 
gitgitgadget


[PATCH v3 2/6] commit-graph: collapse parameters into flags

2019-05-01 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The write_commit_graph() and write_commit_graph_reachable() methods
currently take two boolean parameters: 'append' and 'report_progress'.
We will soon expand the possible options to send to these methods, so
instead of complicating the parameter list, first simplify it.

Collapse these parameters into a 'flags' parameter, and adjust the
callers to provide flags as necessary.

Signed-off-by: Derrick Stolee 
---
 builtin/commit-graph.c | 8 +---
 builtin/commit.c   | 2 +-
 builtin/gc.c   | 4 ++--
 commit-graph.c | 9 +
 commit-graph.h | 8 +---
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 2e86251f02..828b1a713f 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -142,6 +142,7 @@ static int graph_write(int argc, const char **argv)
struct string_list *commit_hex = NULL;
struct string_list lines;
int result;
+   int flags = COMMIT_GRAPH_PROGRESS;
 
static struct option builtin_commit_graph_write_options[] = {
OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -166,11 +167,13 @@ static int graph_write(int argc, const char **argv)
die(_("use at most one of --reachable, --stdin-commits, or 
--stdin-packs"));
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
+   if (opts.append)
+   flags |= COMMIT_GRAPH_APPEND;
 
read_replace_refs = 0;
 
if (opts.reachable)
-   return write_commit_graph_reachable(opts.obj_dir, opts.append, 
1);
+   return write_commit_graph_reachable(opts.obj_dir, flags);
 
string_list_init(&lines, 0);
if (opts.stdin_packs || opts.stdin_commits) {
@@ -190,8 +193,7 @@ static int graph_write(int argc, const char **argv)
result = write_commit_graph(opts.obj_dir,
pack_indexes,
commit_hex,
-   opts.append,
-   1);
+   flags);
 
UNLEAK(lines);
return result;
diff --git a/builtin/commit.c b/builtin/commit.c
index b9ea7222fa..b001ef565d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1670,7 +1670,7 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
  "not exceeded, and then \"git reset HEAD\" to recover."));
 
if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
-   write_commit_graph_reachable(get_object_directory(), 0, 0))
+   write_commit_graph_reachable(get_object_directory(), 0))
return 1;
 
repo_rerere(the_repository, 0);
diff --git a/builtin/gc.c b/builtin/gc.c
index 3984addf73..df2573f124 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -665,8 +665,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
}
 
if (gc_write_commit_graph &&
-   write_commit_graph_reachable(get_object_directory(), 0,
-!quiet && !daemonized))
+   write_commit_graph_reachable(get_object_directory(),
+!quiet && !daemonized ? 
COMMIT_GRAPH_PROGRESS : 0))
return 1;
 
if (auto_gc && too_many_loose_objects())
diff --git a/commit-graph.c b/commit-graph.c
index ee487a364b..8bbd50658c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -851,15 +851,14 @@ static int add_ref_to_list(const char *refname,
return 0;
 }
 
-int write_commit_graph_reachable(const char *obj_dir, int append,
-int report_progress)
+int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
 {
struct string_list list = STRING_LIST_INIT_DUP;
int result;
 
for_each_ref(add_ref_to_list, &list);
result = write_commit_graph(obj_dir, NULL, &list,
-   append, report_progress);
+   flags);
 
string_list_clear(&list, 0);
return result;
@@ -868,7 +867,7 @@ int write_commit_graph_reachable(const char *obj_dir, int 
append,
 int write_commit_graph(const char *obj_dir,
   struct string_list *pack_indexes,
   struct string_list *commit_hex,
-  int append, int report_progress)
+  unsigned int flags)
 {
struct packed_oid_list oids;
struct packed_commit_list commits;
@@ -887,6 +886,8 @@ int write_commit_graph(const char *obj_dir,
struct strbuf progress_title = STRBUF_INIT;
unsigned long approx_nr_objects;
int res = 0;
+   int append = flags & COMMIT_GRAPH_APPEND;
+   int report_progress = flags & COMMIT_GRAPH_PROG

[PATCH v3 5/6] commit-graph: implement file format version 2

2019-05-01 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The commit-graph file format had some shortcomings which we now
correct:

  1. The hash algorithm was determined by a single byte, instead
 of the 4-byte format identifier.

  2. There was no way to update the reachability index we used.
 We currently only support generation numbers, but that will
 change in the future.

  3. Git did not fail with error if the unused eighth byte was
 non-zero, so we could not use that to indicate an incremental
 file format without breaking compatibility across versions.

The new format modifies the header of the commit-graph to solve
these problems. We use the 4-byte hash format id, freeing up a byte
in our 32-bit alignment to introduce a reachability index version.
We can also fail to read the commit-graph if the eighth byte is
non-zero.

Update the 'git commit-graph read' subcommand to display the new
data, and check this output in the test that explicitly writes a
v2 commit-graph file.

While we converted the existing 'verify' tests to use a version 1
file to avoid recalculating data offsets, add explicit 'verify'
tests on a version 2 file that corrupt the new header values.

Signed-off-by: Derrick Stolee 
---
 .../technical/commit-graph-format.txt | 26 ++-
 builtin/commit-graph.c|  5 ++
 commit-graph.c| 39 +-
 t/t5318-commit-graph.sh   | 73 +--
 4 files changed, 134 insertions(+), 9 deletions(-)

diff --git a/Documentation/technical/commit-graph-format.txt 
b/Documentation/technical/commit-graph-format.txt
index 16452a0504..e367aa94b1 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -31,13 +31,22 @@ and hash type.
 
 All 4-byte numbers are in network order.
 
+There are two versions available, 1 and 2. These currently differ only in
+the header.
+
 HEADER:
 
+All commit-graph files use the first five bytes for the same purpose.
+
   4-byte signature:
   The signature is: {'C', 'G', 'P', 'H'}
 
   1-byte version number:
-  Currently, the only valid version is 1.
+  Currently, the valid version numbers are 1 and 2.
+
+The remainder of the header changes depending on the version.
+
+Version 1:
 
   1-byte Hash Version (1 = SHA-1)
   We infer the hash length (H) from this value.
@@ -47,6 +56,21 @@ HEADER:
   1-byte (reserved for later use)
  Current clients should ignore this value.
 
+Version 2:
+
+  1-byte number (C) of "chunks"
+
+  1-byte reachability index version number:
+  Currently, the only valid number is 1.
+
+  1-byte (reserved for later use)
+  Current clients expect this value to be zero, and will not
+  try to read the commit-graph file if it is non-zero.
+
+  4-byte format identifier for the hash algorithm:
+  If this identifier does not agree with the repository's current
+  hash algorithm, then the client will not read the commit graph.
+
 CHUNK LOOKUP:
 
   (C + 1) * 12 bytes listing the table of contents for the chunks:
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index e766dd076e..7df6688b08 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -117,6 +117,11 @@ static int graph_read(int argc, const char **argv)
*(unsigned char*)(graph->data + 5),
*(unsigned char*)(graph->data + 6),
*(unsigned char*)(graph->data + 7));
+
+   if (*(unsigned char *)(graph->data + 4) == 2)
+   printf("hash algorithm: %X\n",
+  get_be32(graph->data + 8));
+
printf("num_commits: %u\n", graph->num_commits);
printf("chunks:");
 
diff --git a/commit-graph.c b/commit-graph.c
index b6f09f1be2..5eebba6a0f 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -152,7 +152,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
int fd,
uint64_t last_chunk_offset;
uint32_t last_chunk_id;
uint32_t graph_signature;
-   unsigned char graph_version, hash_version;
+   unsigned char graph_version, hash_version, reach_index_version;
+   uint32_t hash_id;
 
if (!graph_map)
return NULL;
@@ -170,7 +171,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
int fd,
}
 
graph_version = *(unsigned char*)(data + 4);
-   if (graph_version != 1) {
+   if (!graph_version || graph_version > 2) {
error(_("commit-graph version %X does not match version %X"),
  graph_version, 1);
return NULL;
@@ -190,6 +191,30 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
int fd,
graph->num_chunks = *(unsigned char*)(data + 6);
chunk_lookup = data + 8;
break;
+

[PATCH v3 0/6] Create commit-graph file format v2

2019-05-01 Thread Derrick Stolee via GitGitGadget
The commit-graph file format has some shortcomings that were discussed
on-list:

 1. It doesn't use the 4-byte format ID from the_hash_algo.


 2. There is no way to change the reachability index from generation numbers
to corrected commit date [1].


 3. The unused byte in the format could be used to signal the file is
incremental, but current clients ignore the value even if it is
non-zero.



This series adds a new version (2) to the commit-graph file. The fifth byte
already specified the file format, so existing clients will gracefully
respond to files with a different version number. The only real change now
is that the header takes 12 bytes instead of 8, due to using the 4-byte
format ID for the hash algorithm.

The new bytes reserved for the reachability index version and incremental
file formats are now expected to be equal to the defaults. When we update
these values to be flexible in the future, if a client understands
commit-graph v2 but not those new values, then it will fail gracefully.

NOTE: this series was rebased onto ab/commit-graph-fixes, as the conflicts
were significant and subtle.

Updates in V3: Thanks for all the feedback so far!

 * Moved the version information into an unsigned char parameter, instead of
   a flag.
   
   
 * We no longer default to the v2 file format, as that will break users who
   downgrade. This required some changes to the test script.
   
   
 * Removed the "Future work" section from the commit-graph design document
   in a new patch.
   
   
 * I did not change the file name for v2 file formats, as Ævar suggested.
   I'd like the discussion to continue on this topic.
   
   

Thanks, -Stolee

[1] 
https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51ef...@gmail.com/

Derrick Stolee (6):
  commit-graph: return with errors during write
  commit-graph: collapse parameters into flags
  commit-graph: create new version parameter
  commit-graph: add --version= option
  commit-graph: implement file format version 2
  commit-graph: remove Future Work section

 Documentation/git-commit-graph.txt|   3 +
 .../technical/commit-graph-format.txt |  26 ++-
 Documentation/technical/commit-graph.txt  |  17 --
 builtin/commit-graph.c|  33 ++--
 builtin/commit.c  |   5 +-
 builtin/gc.c  |   8 +-
 commit-graph.c| 153 +-
 commit-graph.h|  16 +-
 t/t5318-commit-graph.sh   |  75 -
 9 files changed, 250 insertions(+), 86 deletions(-)


base-commit: 93b4405ffe4ad9308740e7c1c71383bfc369baaa
Published-As: 
https://github.com/gitgitgadget/git/releases/tag/pr-112%2Fderrickstolee%2Fgraph%2Fv2-head-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-112/derrickstolee/graph/v2-head-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/112

Range-diff vs v2:

 1:  91f300ec0a = 1:  91f300ec0a commit-graph: return with errors during write
 2:  04f5df1135 ! 2:  924b22f990 commit-graph: collapse parameters into flags
 @@ -86,7 +86,7 @@
   
  -int write_commit_graph_reachable(const char *obj_dir, int append,
  -  int report_progress)
 -+int write_commit_graph_reachable(const char *obj_dir, int flags)
 ++int write_commit_graph_reachable(const char *obj_dir, unsigned int flags)
   {
struct string_list list = STRING_LIST_INIT_DUP;
int result;
 @@ -103,7 +103,7 @@
   struct string_list *pack_indexes,
   struct string_list *commit_hex,
  -int append, int report_progress)
 -+int flags)
 ++unsigned int flags)
   {
struct packed_oid_list oids;
struct packed_commit_list commits;
 @@ -129,12 +129,12 @@
  +#define COMMIT_GRAPH_APPEND (1 << 0)
  +#define COMMIT_GRAPH_PROGRESS   (1 << 1)
  +
 -+int write_commit_graph_reachable(const char *obj_dir, int flags);
 ++int write_commit_graph_reachable(const char *obj_dir, unsigned int 
flags);
   int write_commit_graph(const char *obj_dir,
   struct string_list *pack_indexes,
   struct string_list *commit_hex,
  -int append, int report_progress);
 -+int flags);
 ++unsigned int flags);
   
   int verify_commit_graph(struct repository *r, struct commit_graph *g);
   
 3:  4ddb829163 ! 3:  8446011a43 commit-graph: create new version flags
 @@ -1,12 +1,12 @@
  Author: Derrick Stolee 
  
 -commit-graph: create new version flags
 +commit-graph: create new version parameter
  
  In anticipation of a new commit-graph file format version, create
 -a flag for the write_commit_graph() 

[PATCH v3 4/6] commit-graph: add --version= option

2019-05-01 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Allow the commit-graph builtin to specify the file format version
using the '--version=' option. Specify the version exactly in
the verification tests as using a different version would change
the offsets used in those tests.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  3 +++
 builtin/commit-graph.c | 13 -
 t/t5318-commit-graph.sh|  2 +-
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 624470e198..1d1cc70de4 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -51,6 +51,9 @@ or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
++
+With the `--version=` option, specify the file format version. Used
+only for testing.
 
 'read'::
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 7d9185dfc2..e766dd076e 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
N_("git commit-graph verify [--object-dir ]"),
-   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits] [--version=]"),
NULL
 };
 
@@ -25,7 +25,7 @@ static const char * const builtin_commit_graph_read_usage[] = 
{
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits] [--version=]"),
NULL
 };
 
@@ -35,6 +35,7 @@ static struct opts_commit_graph {
int stdin_packs;
int stdin_commits;
int append;
+   int version;
 } opts;
 
 
@@ -156,6 +157,8 @@ static int graph_write(int argc, const char **argv)
N_("start walk at commits listed by stdin")),
OPT_BOOL(0, "append", &opts.append,
N_("include all commits already in the commit-graph 
file")),
+   OPT_INTEGER(0, "version", &opts.version,
+   N_("specify the file format version")),
OPT_END(),
};
 
@@ -168,12 +171,12 @@ static int graph_write(int argc, const char **argv)
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
if (opts.append)
-   flags |= COMMIT_GRAPH_APPEND;
+   flags |= COMMIT_GRAPH_APPEND;   
 
read_replace_refs = 0;
 
if (opts.reachable)
-   return write_commit_graph_reachable(opts.obj_dir, flags, 0);
+   return write_commit_graph_reachable(opts.obj_dir, flags, 
opts.version);
 
string_list_init(&lines, 0);
if (opts.stdin_packs || opts.stdin_commits) {
@@ -193,7 +196,7 @@ static int graph_write(int argc, const char **argv)
result = write_commit_graph(opts.obj_dir,
pack_indexes,
commit_hex,
-   flags, 0);
+   flags, opts.version);
 
UNLEAK(lines);
return result;
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index e80c1cac02..4eb5a09ef3 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -328,7 +328,7 @@ test_expect_success 'replace-objects invalidates 
commit-graph' '
 
 test_expect_success 'git commit-graph verify' '
cd "$TRASH_DIRECTORY/full" &&
-   git rev-parse commits/8 | git commit-graph write --stdin-commits &&
+   git rev-parse commits/8 | git commit-graph write --stdin-commits 
--version=1 &&
git commit-graph verify >output
 '
 
-- 
gitgitgadget



[PATCH v3 3/6] commit-graph: create new version parameter

2019-05-01 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

In anticipation of a new commit-graph file format version, create
a parameter for the write_commit_graph() and write_commit_graph_reachable()
methods to take a version number.

When the given version is zero, the implementation selects a
default value. Currently, the only valid value is 1.

The file format will change the header information, so place the
existing header logic inside a switch statement with only one case.

Signed-off-by: Derrick Stolee 
---
 builtin/commit-graph.c |  4 +--
 builtin/commit.c   |  2 +-
 builtin/gc.c   |  3 +-
 commit-graph.c | 63 +++---
 commit-graph.h |  6 ++--
 5 files changed, 50 insertions(+), 28 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 828b1a713f..7d9185dfc2 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -173,7 +173,7 @@ static int graph_write(int argc, const char **argv)
read_replace_refs = 0;
 
if (opts.reachable)
-   return write_commit_graph_reachable(opts.obj_dir, flags);
+   return write_commit_graph_reachable(opts.obj_dir, flags, 0);
 
string_list_init(&lines, 0);
if (opts.stdin_packs || opts.stdin_commits) {
@@ -193,7 +193,7 @@ static int graph_write(int argc, const char **argv)
result = write_commit_graph(opts.obj_dir,
pack_indexes,
commit_hex,
-   flags);
+   flags, 0);
 
UNLEAK(lines);
return result;
diff --git a/builtin/commit.c b/builtin/commit.c
index b001ef565d..b9ea7222fa 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1670,7 +1670,7 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
  "not exceeded, and then \"git reset HEAD\" to recover."));
 
if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
-   write_commit_graph_reachable(get_object_directory(), 0))
+   write_commit_graph_reachable(get_object_directory(), 0, 0))
return 1;
 
repo_rerere(the_repository, 0);
diff --git a/builtin/gc.c b/builtin/gc.c
index df2573f124..41637242b1 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -666,7 +666,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
if (gc_write_commit_graph &&
write_commit_graph_reachable(get_object_directory(),
-!quiet && !daemonized ? 
COMMIT_GRAPH_PROGRESS : 0))
+!quiet && !daemonized ? 
COMMIT_GRAPH_PROGRESS : 0,
+0))
return 1;
 
if (auto_gc && too_many_loose_objects())
diff --git a/commit-graph.c b/commit-graph.c
index 8bbd50658c..b6f09f1be2 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -25,9 +25,6 @@
 
 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
-#define GRAPH_VERSION_1 0x1
-#define GRAPH_VERSION GRAPH_VERSION_1
-
 #define GRAPH_EXTRA_EDGES_NEEDED 0x8000
 #define GRAPH_EDGE_LAST_MASK 0x7fff
 #define GRAPH_PARENT_NONE 0x7000
@@ -173,30 +170,35 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
int fd,
}
 
graph_version = *(unsigned char*)(data + 4);
-   if (graph_version != GRAPH_VERSION) {
+   if (graph_version != 1) {
error(_("commit-graph version %X does not match version %X"),
- graph_version, GRAPH_VERSION);
-   return NULL;
-   }
-
-   hash_version = *(unsigned char*)(data + 5);
-   if (hash_version != oid_version()) {
-   error(_("commit-graph hash version %X does not match version 
%X"),
- hash_version, oid_version());
+ graph_version, 1);
return NULL;
}
 
graph = alloc_commit_graph();
 
+   switch (graph_version) {
+   case 1:
+   hash_version = *(unsigned char*)(data + 5);
+   if (hash_version != oid_version()) {
+   error(_("commit-graph hash version %X does not match 
version %X"),
+ hash_version, oid_version());
+   return NULL;
+   }
+
+   graph->num_chunks = *(unsigned char*)(data + 6);
+   chunk_lookup = data + 8;
+   break;
+   }
+
graph->hash_len = the_hash_algo->rawsz;
-   graph->num_chunks = *(unsigned char*)(data + 6);
graph->graph_fd = fd;
graph->data = graph_map;
graph->data_len = graph_size;
 
last_chunk_id = 0;
last_chunk_offset = 8;
-   chunk_lookup = data + 8;
for (i = 0; i < graph->num_chunks; i++) {
uint32_t chunk_id;

[PATCH v3 1/6] commit-graph: return with errors during write

2019-05-01 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The write_commit_graph() method uses die() to report failure and
exit when confronted with an unexpected condition. This use of
die() in a library function is incorrect and is now replaced by
error() statements and an int return type.

Now that we use 'goto cleanup' to jump to the terminal condition
on an error, we have new paths that could lead to uninitialized
values. New initializers are added to correct for this.

The builtins 'commit-graph', 'gc', and 'commit' call these methods,
so update them to check the return value.

Signed-off-by: Derrick Stolee 
---
 builtin/commit-graph.c | 19 +++--
 builtin/commit.c   |  5 ++--
 builtin/gc.c   |  7 ++---
 commit-graph.c | 60 +-
 commit-graph.h | 10 +++
 5 files changed, 62 insertions(+), 39 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 537fdfd0f0..2e86251f02 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -141,6 +141,7 @@ static int graph_write(int argc, const char **argv)
struct string_list *pack_indexes = NULL;
struct string_list *commit_hex = NULL;
struct string_list lines;
+   int result;
 
static struct option builtin_commit_graph_write_options[] = {
OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -168,10 +169,8 @@ static int graph_write(int argc, const char **argv)
 
read_replace_refs = 0;
 
-   if (opts.reachable) {
-   write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
-   return 0;
-   }
+   if (opts.reachable)
+   return write_commit_graph_reachable(opts.obj_dir, opts.append, 
1);
 
string_list_init(&lines, 0);
if (opts.stdin_packs || opts.stdin_commits) {
@@ -188,14 +187,14 @@ static int graph_write(int argc, const char **argv)
UNLEAK(buf);
}
 
-   write_commit_graph(opts.obj_dir,
-  pack_indexes,
-  commit_hex,
-  opts.append,
-  1);
+   result = write_commit_graph(opts.obj_dir,
+   pack_indexes,
+   commit_hex,
+   opts.append,
+   1);
 
UNLEAK(lines);
-   return 0;
+   return result;
 }
 
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
diff --git a/builtin/commit.c b/builtin/commit.c
index 2986553d5f..b9ea7222fa 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1669,8 +1669,9 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
  "new_index file. Check that disk is not full and quota 
is\n"
  "not exceeded, and then \"git reset HEAD\" to recover."));
 
-   if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
-   write_commit_graph_reachable(get_object_directory(), 0, 0);
+   if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
+   write_commit_graph_reachable(get_object_directory(), 0, 0))
+   return 1;
 
repo_rerere(the_repository, 0);
run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
diff --git a/builtin/gc.c b/builtin/gc.c
index 020f725acc..3984addf73 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -664,9 +664,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
clean_pack_garbage();
}
 
-   if (gc_write_commit_graph)
-   write_commit_graph_reachable(get_object_directory(), 0,
-!quiet && !daemonized);
+   if (gc_write_commit_graph &&
+   write_commit_graph_reachable(get_object_directory(), 0,
+!quiet && !daemonized))
+   return 1;
 
if (auto_gc && too_many_loose_objects())
warning(_("There are too many unreachable loose objects; "
diff --git a/commit-graph.c b/commit-graph.c
index 66865acbd7..ee487a364b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -851,27 +851,30 @@ static int add_ref_to_list(const char *refname,
return 0;
 }
 
-void write_commit_graph_reachable(const char *obj_dir, int append,
- int report_progress)
+int write_commit_graph_reachable(const char *obj_dir, int append,
+int report_progress)
 {
struct string_list list = STRING_LIST_INIT_DUP;
+   int result;
 
for_each_ref(add_ref_to_list, &list);
-   write_commit_graph(obj_dir, NULL, &list, append, report_progress);
+   result = write_commit_graph(obj_dir, NULL, &list,
+   append, report_progress);
 
string_li

Re: [PATCH 3/6] config.c: add repo_config_set_worktree_gently()

2019-04-30 Thread Derrick Stolee
On 12/27/2018 10:56 AM, Nguyễn Thái Ngọc Duy wrote:
> diff --git a/config.h b/config.h
> index ee5d3fa7b4..62204dc252 100644
> --- a/config.h
> +++ b/config.h
> @@ -103,6 +103,9 @@ extern int git_config_color(char *, const char *, const 
> char *);
>  extern int git_config_set_in_file_gently(const char *, const char *, const 
> char *);
>  extern void git_config_set_in_file(const char *, const char *, const char *);
>  extern int git_config_set_gently(const char *, const char *);
> +extern int repo_config_set_gently(struct repository *, const char *, const 
> char *);
> +extern void repo_config_set(struct repository *, const char *, const char *);
> +extern int repo_config_set_worktree_gently(struct repository *, const char 
> *, const char *);
>  extern void git_config_set(const char *, const char *);
>  extern int git_config_parse_key(const char *, char **, int *);
>  extern int git_config_key_is_valid(const char *key);

I know this is an old thread, but the patch is still in 'pu'. These methods
do not appear to have any callers. Perhaps this patch should be dropped
until there is a reason to include it?

Thanks,
-Stolee


[PATCH 0/2] Multi-pack-index: Fix "too many file descriptors" bug

2019-04-29 Thread Derrick Stolee via GitGitGadget
Thanks to Jeff H for finding the problem with the multi-pack-index regarding
many packs. Specifically: if we open too many packs, the close_one_pack()
method cannot find the packs from the multi-pack-index to close.

Jeff already fixed the problem explicitly in 'git multi-pack-index verify'
which would hit this issue 100% of the time we had 2000+ packs. This issue
could still happen in 'git rev-list --all --objects' if there are enough
packs containing commits and trees. This series fixes the issue.

The basic solution is to add packs from the multi-pack-index into the
packed_git struct as they are opened. To avoid performance issues, add a 
multi_pack_index bit to the packed_git struct. Midx-aware algorithms can
then ignore those packs.

There was a very subtle issue that happens during a 'git repack': we clear
the multi-pack-index after possibly reading some packs from it, thus leaving
some packs in the packed_git struct but having a NULL multi_pack_index in
the object store. This informs the change to close_midx().

I'm based on a recent 'master' commit that contains the following three
branches due to nearby changes causing conflicts if I pick only Jeff's
change as a base:

jh/midx-verify-too-many-packs jk/server-info-rabbit-hole
bc/hash-transition-16

Thanks, -Stolee

Derrick Stolee (2):
  midx: pass a repository pointer
  midx: add packs to packed_git linked list

 builtin/multi-pack-index.c |  2 +-
 builtin/pack-objects.c |  2 +-
 midx.c | 42 +-
 midx.h |  7 ---
 object-store.h |  9 ++--
 packfile.c | 30 ---
 sha1-name.c|  6 ++
 7 files changed, 51 insertions(+), 47 deletions(-)


base-commit: 83232e38648b51abbcbdb56c94632b6906cc85a6
Published-As: 
https://github.com/gitgitgadget/git/releases/tag/pr-182%2Fderrickstolee%2Fmany-packs-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-182/derrickstolee/many-packs-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/182
-- 
gitgitgadget


[PATCH 1/2] midx: pass a repository pointer

2019-04-29 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Much of the multi-pack-index code focuses on the multi_pack_index
struct, and so we only pass a pointer to the current one. However,
we will insert a dependency on the packed_git linked list in a
future change, so we will need a repository reference. Inserting
these parameters is a significant enough change to split out.

Signed-off-by: Derrick Stolee 
---
 builtin/multi-pack-index.c |  2 +-
 builtin/pack-objects.c |  2 +-
 midx.c | 22 ++
 midx.h |  7 ---
 packfile.c |  4 ++--
 5 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index ae6e476ad5..72dfd3dadc 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -46,7 +46,7 @@ int cmd_multi_pack_index(int argc, const char **argv,
if (!strcmp(argv[0], "write"))
return write_midx_file(opts.object_dir);
if (!strcmp(argv[0], "verify"))
-   return verify_midx_file(opts.object_dir);
+   return verify_midx_file(the_repository, opts.object_dir);
 
die(_("unrecognized verb: %s"), argv[0]);
 }
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 2d9a3bdc9d..e606b9ef03 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1078,7 +1078,7 @@ static int want_object_in_pack(const struct object_id 
*oid,
 
for (m = get_multi_pack_index(the_repository); m; m = m->next) {
struct pack_entry e;
-   if (fill_midx_entry(oid, &e, m)) {
+   if (fill_midx_entry(the_repository, oid, &e, m)) {
struct packed_git *p = e.p;
off_t offset;
 
diff --git a/midx.c b/midx.c
index d5d2e9522f..8b8faec35a 100644
--- a/midx.c
+++ b/midx.c
@@ -201,7 +201,7 @@ void close_midx(struct multi_pack_index *m)
FREE_AND_NULL(m->pack_names);
 }
 
-int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id)
+int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, 
uint32_t pack_int_id)
 {
struct strbuf pack_name = STRBUF_INIT;
 
@@ -261,7 +261,10 @@ static uint32_t nth_midxed_pack_int_id(struct 
multi_pack_index *m, uint32_t pos)
return get_be32(m->chunk_object_offsets + pos * 
MIDX_CHUNK_OFFSET_WIDTH);
 }
 
-static int nth_midxed_pack_entry(struct multi_pack_index *m, struct pack_entry 
*e, uint32_t pos)
+static int nth_midxed_pack_entry(struct repository *r,
+struct multi_pack_index *m,
+struct pack_entry *e,
+uint32_t pos)
 {
uint32_t pack_int_id;
struct packed_git *p;
@@ -271,7 +274,7 @@ static int nth_midxed_pack_entry(struct multi_pack_index 
*m, struct pack_entry *
 
pack_int_id = nth_midxed_pack_int_id(m, pos);
 
-   if (prepare_midx_pack(m, pack_int_id))
+   if (prepare_midx_pack(r, m, pack_int_id))
die(_("error preparing packfile from multi-pack-index"));
p = m->packs[pack_int_id];
 
@@ -301,14 +304,17 @@ static int nth_midxed_pack_entry(struct multi_pack_index 
*m, struct pack_entry *
return 1;
 }
 
-int fill_midx_entry(const struct object_id *oid, struct pack_entry *e, struct 
multi_pack_index *m)
+int fill_midx_entry(struct repository * r,
+   const struct object_id *oid,
+   struct pack_entry *e,
+   struct multi_pack_index *m)
 {
uint32_t pos;
 
if (!bsearch_midx(oid, m, &pos))
return 0;
 
-   return nth_midxed_pack_entry(m, e, pos);
+   return nth_midxed_pack_entry(r, m, e, pos);
 }
 
 /* Match "foo.idx" against either "foo.pack" _or_ "foo.idx". */
@@ -1020,7 +1026,7 @@ static int compare_pair_pos_vs_id(const void *_a, const 
void *_b)
display_progress(progress, _n); \
} while (0)
 
-int verify_midx_file(const char *object_dir)
+int verify_midx_file(struct repository *r, const char *object_dir)
 {
struct pair_pos_vs_id *pairs = NULL;
uint32_t i;
@@ -1034,7 +1040,7 @@ int verify_midx_file(const char *object_dir)
progress = start_progress(_("Looking for referenced packfiles"),
  m->num_packs);
for (i = 0; i < m->num_packs; i++) {
-   if (prepare_midx_pack(m, i))
+   if (prepare_midx_pack(r, m, i))
midx_report("failed to load pack in position %d", i);
 
display_progress(progress, i + 1);
@@ -1099,7 +1105,7 @@ int verify_midx_file(const char *object_dir)
 
nth_midxed_object_oid(&oid, m, pairs[i].pos);
 
-   if (!fill_midx_entry(&oid, &e, m)) {
+   if (!fill_midx_entry(r, &oid, &

[PATCH 2/2] midx: add packs to packed_git linked list

2019-04-29 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The multi-pack-index allows searching for objects across multiple
packs using one object list. The original design gains many of
these performance benefits by keeping the packs in the
multi-pack-index out of the packed_git list.

Unfortunately, this has one major drawback. If the multi-pack-index
covers thousands of packs, and a command loads many of those packs,
then we can hit the limit for open file descriptors. The
close_one_pack() method is used to limit this resource, but it
only looks at the packed_git list, and uses an LRU cache to prevent
thrashing.

Instead of complicating this close_one_pack() logic to include
direct references to the multi-pack-index, simply add the packs
opened by the multi-pack-index to the packed_git list. This
immediately solves the file-descriptor limit problem, but requires
some extra steps to avoid performance issues or other problems:

1. Create a multi_pack_index bit in the packed_git struct that is
   one if and only if the pack was loaded from a multi-pack-index.

2. Skip packs with the multi_pack_index bit when doing object
   lookups and abbreviations. These algorithms already check the
   multi-pack-index before the packed_git struct. This has a very
   small performance hit, as we need to walk more packed_git
   structs. This is acceptable, since these operations run binary
   search on the other packs, so this walk-and-ignore logic is
   very fast by comparison.

3. When closing a multi-pack-index file, do not close its packs,
   as those packs will be closed using close_all_packs(). In some
   cases, such as 'git repack', we run 'close_midx()' without also
   closing the packs, so we need to un-set the multi_pack_index bit
   in those packs. This is necessary, and caught by running
   t6501-freshen-objects.sh with GIT_TEST_MULTI_PACK_INDEX=1.

To manually test this change, I inserted trace2 logging into
close_pack_fd() and set pack_max_fds to 10, then ran 'git rev-list
--all --objects' on a copy of the Git repo with 300+ pack-files and
a multi-pack-index. The logs verified the packs are closed as
we read them beyond the file descriptor limit.

Signed-off-by: Derrick Stolee 
---
 midx.c | 20 ++--
 object-store.h |  9 ++---
 packfile.c | 28 
 sha1-name.c|  6 ++
 4 files changed, 30 insertions(+), 33 deletions(-)

diff --git a/midx.c b/midx.c
index 8b8faec35a..e7e1fe4d65 100644
--- a/midx.c
+++ b/midx.c
@@ -192,10 +192,8 @@ void close_midx(struct multi_pack_index *m)
m->fd = -1;
 
for (i = 0; i < m->num_packs; i++) {
-   if (m->packs[i]) {
-   close_pack(m->packs[i]);
-   free(m->packs[i]);
-   }
+   if (m->packs[i])
+   m->packs[i]->multi_pack_index = 0;
}
FREE_AND_NULL(m->packs);
FREE_AND_NULL(m->pack_names);
@@ -204,6 +202,7 @@ void close_midx(struct multi_pack_index *m)
 int prepare_midx_pack(struct repository *r, struct multi_pack_index *m, 
uint32_t pack_int_id)
 {
struct strbuf pack_name = STRBUF_INIT;
+   struct packed_git *p;
 
if (pack_int_id >= m->num_packs)
die(_("bad pack-int-id: %u (%u total packs)"),
@@ -215,9 +214,18 @@ int prepare_midx_pack(struct repository *r, struct 
multi_pack_index *m, uint32_t
strbuf_addf(&pack_name, "%s/pack/%s", m->object_dir,
m->pack_names[pack_int_id]);
 
-   m->packs[pack_int_id] = add_packed_git(pack_name.buf, pack_name.len, 
m->local);
+   p = add_packed_git(pack_name.buf, pack_name.len, m->local);
strbuf_release(&pack_name);
-   return !m->packs[pack_int_id];
+
+   if (!p)
+   return 1;
+
+   p->multi_pack_index = 1;
+   m->packs[pack_int_id] = p;
+   install_packed_git(r, p);
+   list_add_tail(&p->mru, &r->objects->packed_git_mru);
+
+   return 0;
 }
 
 int bsearch_midx(const struct object_id *oid, struct multi_pack_index *m, 
uint32_t *result)
diff --git a/object-store.h b/object-store.h
index b086f5ecdb..7acbc7fffe 100644
--- a/object-store.h
+++ b/object-store.h
@@ -76,7 +76,8 @@ struct packed_git {
 pack_keep_in_core:1,
 freshened:1,
 do_not_close:1,
-pack_promisor:1;
+pack_promisor:1,
+multi_pack_index:1;
unsigned char hash[GIT_MAX_RAWSZ];
struct revindex_entry *revindex;
/* something like ".git/objects/pack/x.pack" */
@@ -128,12 +129,6 @@ struct raw_object_store {
/* A most-recently-used ordered version of the packed_git list. */
struct list_head packed_git_mru;
 
-   /*
-* A linked list containing all packfiles, starting with those
-* contained in the multi_

Re: [PATCH v3 4/8] commit-graph: don't early exit(1) on e.g. "git status"

2019-04-29 Thread Derrick Stolee
On 4/27/2019 9:06 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> There's still cases left where we'll exit early, e.g. if you do:
> 
> $ git diff -U1
> diff --git a/commit-graph.c b/commit-graph.c
> index 66865acbd7..63773764ce 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -1074,3 +1074,3 @@ void write_commit_graph(const char *obj_dir,
> chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
> -   chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
> +   chunk_offsets[2] = chunk_offsets[0] + hashsz * commits.nr;
> chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
> 
> Which is obviously bad, but something I encounterd while hacking up [1]
> we'll still hard die as before this patch on:
> 
> $ git status
> fatal: invalid parent position 1734910766
> $

I really appreciate you digging in deep into these kinds of issues. You
seem to be hitting corrupted commit-graph files more often than we are
(in VFS for Git world). However, we should be _very careful_ when turning
some of these errors to warnings. At the very least, we should do some
high-level planning for how to handle this case.

The biggest issue is that we have some logic that is run after a call to
generation_numbers_enabled(), such as the `git rev-list --topo-order`
logic, that relies on the commit-graph for correctness. If we output a
warning and then stop using the commit-graph, then we will start having
commits with finite generation pointing to commits with infinite generation.

Perhaps, with some care, we can alert the algorithm to change the "minimum
generation" that limits how far we dequeue the priority-queue. Changing it
to zero will cause the algorithm to behave like the old algorithm.

But, having an algorithm that usually takes 0.1 seconds suddenly take 10+
seconds also violates some expectations.

Q: How should we handle a detectably-invalid commit-graph?

I think most of your patches have done a good job so far of detecting
an invalid header, and responding by ignoring the commit-graph. This case
of a detectable error in the chunk data itself is not something we can
check on the first load without serious performance issues.

I hope we can decide on a good solution.

Thanks,
-Stolee


Re: [PATCH v2 0/5] Create commit-graph file format v2

2019-04-26 Thread Derrick Stolee
On 4/26/2019 4:33 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Apr 26 2019, Junio C Hamano wrote:
>>
>> Thanks always for your careful review and thoughtful comments, by
>> the way.

I agree that these comments are extremely helpful.

>>> Now as noted in my series we now on 'master' downgrade that to a warning
>>> (along with the rest of the errors):
>>>
>>> $ ~/g/git/git --exec-path=$PWD status
>>> error: commit-graph version 2 does not match version 1
>>> On branch master
>>> [...]
>>>
>>> ...and this series sets the default version for all new graphs to v2. 
>>> The phrasing seems odd.  It is unclear, even to me who is vaguely
>> familiar with the word "commit-graph" and is aware of the fact that
>> the file format is being updated, what
>>
>> "commit-graph version 2 does not match version 1"
> 
> Yeah it should really say:
> 
> "commit-graph is of version 2, our maximum supported version is 1"

I agree this phrasing is better. Please see the patch I just submitted [1]
to try and improve these messages.

[1] https://public-inbox.org/git/pull.181.git.gitgitgad...@gmail.com/
 
> Hindsight is 20/20, but more generally I wonder if we should have these
> format versions match that of the git version (unlikely to change it
> twice in the same release...) which would allow us to say things like:
> 
> "commit-graph needs v2.22.0 or later, we have the version written by 
> v2.18.0..v2.21.0"
> 
> But of course dealing with those larger integers in the code/gaps is
> also messy :)

There are a couple issues with using the version numbers, from my
perspective:

1. We don't do that anywhere else, like the index file.

2. The microsoft/git fork takes certain performance changes faster
   than core git, and frequently ships versions between major version
   updates. Our 2.17 had the commit-graph, for instance. It's also
   possible that we'd take commit-graph v2 earlier than the core Git
   major release.
 
>> wants to say.  Do I have version #2 on disk and the running binary
>> only understands version #1?  Or do I have version #1 on disk and
>> the binary expected version #2?  How would I get out of this
>> situation?  Is it sufficient to do "rm -f .git/info/commit-graph*"
>> and is it safe?
> 
> Yeah. An rm of .git/info/commit-graph is safe, so is "-c
> core.commitGraph=false" as a workaround.

That is true. I'm not sure the error message is the right place to
describe the workaround.

> I'd say "let's improve the error", but that ship has sailed, and we can
> do better than an error here, no matter how it's phrased...
> 
>>> I think this is *way* too aggressive of an upgrade path. If these
>>> patches go into v2.22.0 then git clients on all older versions that grok
>>> the commit graph (IIRC v2.18 and above) will have their git completely
>>> broken if they're in a mixed-git-version environmen.>
> I should note that "all older versions..." here is those that have
> core.commitGraph=true set. More details in 43d3561805 ("commit-graph
> write: don't die if the existing graph is corrupt", 2019-03-25).
> 
>>> Is it really so important to move to v2 right away that we need to risk
>>> those breakages? I think even with my ab/commit-graph-fixes it's still
>>> too annoying (I was mostly trying to fix other stuff...). If only we
>>> could detect "we should make a new graph now" 
>>
>> True.

You are right, this is too aggressive and I should have known better. I'll
update in the next version to keep a default to v1. Not only do we have this
downgrade risk, there is no actual benefit in this series alone. This only
sets up the ability for other features.
 
> Having slept on my earlier
> https://public-inbox.org/git/87a7gdspo4@evledraar.gmail.com/ I think
> I see a better way to deal with this than my earlier suggestion that we
> perform some version flip-flop dance on the single "commit-graph" file:
> 
> How about just writing .git/objects/info/commit-graph-v2, and for the
> upcoming plan when where they'll be split have some dir/prefix there
> where we include the version?
> 
> That means that:
> 
>  1. If there's an existing v1 "commit-graph" file we don't write a v2 at
> that path in v2.22, although we might have some "write v1 (as well
> as v2?) for old client compat" config where we opt-in to do that.
> 
>  2. By default in v2.22 we read/write a "commit-graph-v2" file,
> preferring it over the v1 "commit-graph", falling back on earlier
> versions if it's not there (until gc --auto kicks in on v2.22 and
> makes a v2 graph).
> 
>  3. If you have concurrent v2.21 and v2.22 clients accessing the repo
> you might end up generating one commit-graph or the other depending
> on who happens to trigger "gc --auto".
> 
> Hopefully that's a non-issue since an out-of-date graph isn't
> usually a big deal, and client versions mostly march forward. But
> v2.22 could also learn some "incremental gc" where it says "my v2 is
> older, v1 client must h

[PATCH 0/1] commit-graph: improve error messages

2019-04-26 Thread Derrick Stolee via GitGitGadget
Here is a small patch that revises the error messages from
ab/commit-graph-fixes, as recommended by Ævar. Hopefully, it can be merged
faster than the commit-graph v2 stuff, and I can update that series to
include this change if we agree it is a good one.

Thanks, -Stolee

Cc: ava...@gmail.com

In-Reply-To: 878svxrwsh@evledraar.gmail.com
[878svxrwsh@evledraar.gmail.com]

Derrick Stolee (1):
  commit-graph: improve error messages

 commit-graph.c  | 10 +-
 t/t5318-commit-graph.sh |  6 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)


base-commit: 93b4405ffe4ad9308740e7c1c71383bfc369baaa
Published-As: 
https://github.com/gitgitgadget/git/releases/tag/pr-181%2Fderrickstolee%2Fgraph%2Fmessage-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-181/derrickstolee/graph/message-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/181
-- 
gitgitgadget


[PATCH 1/1] commit-graph: improve error messages

2019-04-26 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The error messages when reading a commit-graph have a few problems:

1. Some values are output in hexadecimal, but that is not made
   clear by the message. Prepend "0x" to these values.

2. The version number does not need to be hexadecimal, and also
   should mention a "maximum supported version". This has one
   possible confusing case: we could have a corrupt commit-graph
   file with version number zero, so the output would be

   "commit-graph has version 0, our maximum supported version is 1"

   This will only happen with corrupt data. This error message is
   designed instead for the case where a client is downgraded after
   writing a newer file version.

Update t5318-commit-graph.sh to watch for these new error messages.

Reported-by: Ævar Arnfjörð Bjarmason 
Signed-off-by: Derrick Stolee 
---
 commit-graph.c  | 10 +-
 t/t5318-commit-graph.sh |  6 +++---
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 66865acbd7..aba591913e 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -167,21 +167,21 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
int fd,
 
graph_signature = get_be32(data);
if (graph_signature != GRAPH_SIGNATURE) {
-   error(_("commit-graph signature %X does not match signature 
%X"),
+   error(_("commit-graph signature 0x%X does not match signature 
0x%X"),
  graph_signature, GRAPH_SIGNATURE);
return NULL;
}
 
graph_version = *(unsigned char*)(data + 4);
if (graph_version != GRAPH_VERSION) {
-   error(_("commit-graph version %X does not match version %X"),
+   error(_("commit-graph has version %d, our maximum supported 
version is %d"),
  graph_version, GRAPH_VERSION);
return NULL;
}
 
hash_version = *(unsigned char*)(data + 5);
if (hash_version != oid_version()) {
-   error(_("commit-graph hash version %X does not match version 
%X"),
+   error(_("commit-graph has hash version %d, our maximum 
supported version is %d"),
  hash_version, oid_version());
return NULL;
}
@@ -215,7 +215,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
int fd,
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
 
if (chunk_offset > graph_size - the_hash_algo->rawsz) {
-   error(_("commit-graph improper chunk offset %08x%08x"), 
(uint32_t)(chunk_offset >> 32),
+   error(_("commit-graph improper chunk offset 
0x%08x%08x"), (uint32_t)(chunk_offset >> 32),
  (uint32_t)chunk_offset);
free(graph);
return NULL;
@@ -252,7 +252,7 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
int fd,
}
 
if (chunk_repeated) {
-   error(_("commit-graph chunk id %08x appears multiple 
times"), chunk_id);
+   error(_("commit-graph chunk id 0x%08x appears multiple 
times"), chunk_id);
free(graph);
return NULL;
}
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index e80c1cac02..264ebb15b1 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -420,17 +420,17 @@ test_expect_success 'detect too small' '
 
 test_expect_success 'detect bad signature' '
corrupt_graph_and_verify 0 "\0" \
-   "graph signature"
+   "commit-graph signature"
 '
 
 test_expect_success 'detect bad version' '
corrupt_graph_and_verify $GRAPH_BYTE_VERSION "\02" \
-   "graph version"
+   "commit-graph has version"
 '
 
 test_expect_success 'detect bad hash version' '
corrupt_graph_and_verify $GRAPH_BYTE_HASH "\02" \
-   "hash version"
+   "commit-graph has hash version"
 '
 
 test_expect_success 'detect low chunk count' '
-- 
gitgitgadget


Re: [PATCH] trace2: fix incorrect function pointer check

2019-04-25 Thread Derrick Stolee
On 4/25/2019 1:08 PM, Josh Steadmon wrote:
> Fix trace2_data_json_fl() to check for the presence of pfn_data_json_fl
> in its targets, rather than pfn_data_fl, which is not actually called.

[snip]

>   for_each_wanted_builtin (j, tgt_j)
> - if (tgt_j->pfn_data_fl)
> + if (tgt_j->pfn_data_json_fl)
>   tgt_j->pfn_data_json_fl(file, line, us_elapsed_absolute,

Seems obviously correct. Thanks!

-Stolee



Git Test Coverage Report (April 25)

2019-04-25 Thread Derrick Stolee
4248706 61) warning(_("promisor remote name cannot begin with '/': %s"),
54248706 63) return NULL;
7bdf0926 93) previous->next = r->next;
7b6e1b04 108) return git_config_string(&core_partial_clone_filter_default,
b21a55f3 139) return 0;
dcc8b4e9 202) static int remove_fetched_oids(struct object_id **oids, int 
oid_nr, int to_free)
dcc8b4e9 204) int i, missing_nr = 0;
dcc8b4e9 205) int *missing = xcalloc(oid_nr, sizeof(*missing));
dcc8b4e9 206) struct object_id *old_oids = *oids;
dcc8b4e9 208) int old_fetch_if_missing = fetch_if_missing;
dcc8b4e9 210) fetch_if_missing = 0;
dcc8b4e9 212) for (i = 0; i < oid_nr; i++)
dcc8b4e9 213) if (oid_object_info_extended(the_repository, &old_oids[i], NULL, 
0)) {
dcc8b4e9 214) missing[i] = 1;
dcc8b4e9 215) missing_nr++;
dcc8b4e9 218) fetch_if_missing = old_fetch_if_missing;
dcc8b4e9 220) if (missing_nr) {
dcc8b4e9 221) int j = 0;
dcc8b4e9 222) new_oids = xcalloc(missing_nr, sizeof(*new_oids));
dcc8b4e9 223) for (i = 0; i < oid_nr; i++)
dcc8b4e9 224) if (missing[i])
dcc8b4e9 225) oidcpy(&new_oids[j++], &old_oids[i]);
dcc8b4e9 226) *oids = new_oids;
dcc8b4e9 227) if (to_free)
dcc8b4e9 228) free(old_oids);
dcc8b4e9 231) free(missing);
dcc8b4e9 233) return missing_nr;
dcc8b4e9 248) if (missing_nr == 1)
dcc8b4e9 249) continue;
dcc8b4e9 250) missing_nr = remove_fetched_oids(&missing_oids, missing_nr, 
to_free);
dcc8b4e9 251) if (missing_nr) {
dcc8b4e9 252) to_free = 1;
dcc8b4e9 253) continue;
dcc8b4e9 261) free(missing_oids);

protocol.c
6da1f1a9 37) die(_("Unrecognized protocol version"));
6da1f1a9 39) die(_("Unrecognized protocol_version"));
6da1f1a9 74) BUG("late attempt to register an allowed protocol version");

ref-filter.c
3da20422 93) keydata_aka_refname ? keydata_aka_refname : k->wt->head_ref);

remote-curl.c
6da1f1a9 345) return 0;

upload-pack.c
a8d662e3 130) return readsz;
820a5361 149) BUG("packfile_uris requires sideband-all");
a8d662e3 355) send_client_data(1, output_state.buffer, output_state.used);
820a5361 1386) string_list_clear(&data->uri_protocols, 0);

wrapper.c
5efde212 70) die("Out of memory, malloc failed (tried to allocate %" PRIuMAX " 
bytes)",
5efde212 73) error("Out of memory, malloc failed (tried to allocate %" PRIuMAX 
" bytes)",

Commits introducting uncovered code:
Barret Rhoden   dfb4ee12 blame: use a fingerprint heuristic to match ignored 
lines
Barret Rhoden   d0738d94 blame: add config options to handle output for ignored 
lines
Barret Rhoden   a5c91678 blame: add the ability to ignore commits and their 
changes
Barret Rhoden   3486d8d4 Move init_skiplist() outside of fsck
Christian Coudere265069a Use promisor_remote_get_direct() and 
has_promisor_remote()
Christian Couder    dcc8b4e9 promisor-remote: implement 
promisor_remote_get_direct()
Christian Couderb21a55f3 promisor-remote: parse 
remote.*.partialclonefilter
Christian Couder7b6e1b04 Move core_partial_clone_filter_default to 
promisor-remote.c
Christian Couder7bdf0926 promisor-remote: use 
repository_format_partial_clone
Christian Couder54248706 Add initial support for many promisor remotes
Christian Couder0ba08c05 Remove fetch-object.{c,h} in favor of 
promisor-remote.{c,h}
Dan McGregorba81921a http: cast result to FILE *
Derrick Stolee  d5747681 commit-graph: create new version flags
Derrick Stolee  5ff7cad9 commit-graph: return with errors during write
Jonathan Tana8d662e3 upload-pack: refactor reading of pack-objects out
Jonathan Tan820a5361 upload-pack: send part of packfile response as uri
Jonathan Tanbf01639c fetch-pack: support more than one pack lockfile
Jonathan Tan472fbef8 http-fetch: support fetching packfiles by URL
Josh Steadmon   6da1f1a9 protocol: advertise multiple supported versions
Junio C Hamano  07ca07bb Merge branch 'jt/fetch-cdn-offload' into pu
Martin Koegler  5efde212 zlib.c: use size_t for size
Nguyễn Thái Ngọc Duy8f7c7f55 config.c: add repo_config_set_worktree_gently()
Nguyễn Thái Ngọc Duy6f11fd5e config: add --move-to
Nguyễn Thái Ngọc Duya12c1ff3 config: factor out set_config_source_file()
Nickolai Belakovski 3da20422 ref-filter: add worktreepath atom


Uncovered code in 'jch' not in 'next'


attr.c
b0c2d613 1167) return 0;

builtin/checkout.c
4de2081d 415) die(_("'%s' cannot be used with updating paths"),
4de2081d 425) die(_("'%s' cannot be used with %s"),
a0f3bbfa 433) die(_("neither '%s' or '%s' is specified"),
a0f3bbfa 437) die(_("'%s' must be used when '%s' is not specified"),
7800c9e6 442) die(_("'%s' or '%s' cannot be used with %s"),
7800c9e6 447) die(_("'%s' or '%s' cannot be used with %s&q

Re: [PATCH v2 3/5] commit-graph: create new version flags

2019-04-25 Thread Derrick Stolee
On 4/25/2019 1:29 AM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget"  writes:
> 
>> +int version = 0;
>> ...
>> +if (flags & COMMIT_GRAPH_VERSION_1)
>> +version = 1;
>> +if (!version)
>> +version = 1;
>> +if (version != 1) {
>> +error(_("unsupported commit-graph version %d"),
>> +  version);
>> +return 1;
>> +}
> 
> The above sequence had a certain "Huh?" factor before 5/5 introduced
> the support for a later version that is in use by default.
> 
> Is it sensible to define VERSION_$N as if they are independent bits
> in a single flags variable?  What does it mean for the flags variable
> to have both GRAPH_VERSION_1 and GRAPH_VERSION_2 bits set?
>
> What I am getting at is if this is better done as a n-bit bitfield
> that represents a small unsigned integer (e.g. "unsigned char" that
> lets you play with up to 255 versions, or "unsigned version : 3"
> that limits you to up to 7 versions).
> 
> You use an 8-bit byte in the file format anyway, so it might not be
> so bad to have a separate version parameter that is not mixed with
> the flag bits, perhaps?

This is a reasonable idea, as this is a "pick exactly one" option.
It is still important to reduce the overall parameter count by combining
the other boolean options into flags.

Thanks,
-Stolee
 


Re: [PATCH v5 00/11] Create 'expire' and 'repack' verbs for git-multi-pack-index

2019-04-25 Thread Derrick Stolee
On 4/25/2019 1:38 AM, Junio C Hamano wrote:
> Derrick Stolee  writes:
> 
>> Updates in V5:
>>
>> * Fixed the error in PATCH 7 due to a missing line that existed in PATCH 8. 
>> Thanks, Josh Steadmon!
>>
>> * The 'repack' subcommand now computes the "expected size" of a pack instead 
>> of
>>   relying on the total size of the pack. This is actually really important to
>>   the way VFS for Git uses prefetch packs, and some packs are not being
>>   repacked because the pack size is larger than the batch size, but really
>>   there are only a few referenced objects.
>>
>> * The 'repack' subcommand now allows a batch size of zero to mean "create one
>>   pack containing all objects in the multi-pack-index". A new commit adds a
>>   test that hits the boundary cases here, but follows the 'expire' subcommand
>>   so we can show that cycle of repack-then-expire to safely replace the 
>> packs.
> 
> I guess all of them need to tweak the authorship from the gmail
> address to the work address on the Signed-off-by: trailer, which I
> can do (as I noticed it before applying).

Sorry. Due to the conflicts, GitGitGadget prevented me from submitting in
my normal way, so I pulled out format-patch and send-email for the first
time in a very long time. I manually added new "From: " lines in the bodies
of the patch files, but they got suppressed, I guess.

Thanks,
-Stolee


[PATCH v2 4/5] commit-graph: add --version= option

2019-04-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Allow the commit-graph builtin to specify the file format version
using the '--version=' option. Specify the version exactly in
the verification tests as using a different version would change
the offsets used in those tests.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  3 +++
 builtin/commit-graph.c | 13 +++--
 t/t5318-commit-graph.sh|  2 +-
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 624470e198..1d1cc70de4 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -51,6 +51,9 @@ or `--stdin-packs`.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
++
+With the `--version=` option, specify the file format version. Used
+only for testing.
 
 'read'::
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 828b1a713f..65ceb7a141 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -10,7 +10,7 @@ static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
N_("git commit-graph verify [--object-dir ]"),
-   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits] [--version=]"),
NULL
 };
 
@@ -25,7 +25,7 @@ static const char * const builtin_commit_graph_read_usage[] = 
{
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits] [--version=]"),
NULL
 };
 
@@ -35,6 +35,7 @@ static struct opts_commit_graph {
int stdin_packs;
int stdin_commits;
int append;
+   int version;
 } opts;
 
 
@@ -156,6 +157,8 @@ static int graph_write(int argc, const char **argv)
N_("start walk at commits listed by stdin")),
OPT_BOOL(0, "append", &opts.append,
N_("include all commits already in the commit-graph 
file")),
+   OPT_INTEGER(0, "version", &opts.version,
+   N_("specify the file format version")),
OPT_END(),
};
 
@@ -170,6 +173,12 @@ static int graph_write(int argc, const char **argv)
if (opts.append)
flags |= COMMIT_GRAPH_APPEND;
 
+   switch (opts.version) {
+   case 1:
+   flags |= COMMIT_GRAPH_VERSION_1;
+   break;
+   }
+
read_replace_refs = 0;
 
if (opts.reachable)
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index e80c1cac02..4eb5a09ef3 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -328,7 +328,7 @@ test_expect_success 'replace-objects invalidates 
commit-graph' '
 
 test_expect_success 'git commit-graph verify' '
cd "$TRASH_DIRECTORY/full" &&
-   git rev-parse commits/8 | git commit-graph write --stdin-commits &&
+   git rev-parse commits/8 | git commit-graph write --stdin-commits 
--version=1 &&
git commit-graph verify >output
 '
 
-- 
gitgitgadget



[PATCH v2 2/5] commit-graph: collapse parameters into flags

2019-04-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The write_commit_graph() and write_commit_graph_reachable() methods
currently take two boolean parameters: 'append' and 'report_progress'.
We will soon expand the possible options to send to these methods, so
instead of complicating the parameter list, first simplify it.

Collapse these parameters into a 'flags' parameter, and adjust the
callers to provide flags as necessary.

Signed-off-by: Derrick Stolee 
---
 builtin/commit-graph.c | 8 +---
 builtin/commit.c   | 2 +-
 builtin/gc.c   | 4 ++--
 commit-graph.c | 9 +
 commit-graph.h | 8 +---
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 2e86251f02..828b1a713f 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -142,6 +142,7 @@ static int graph_write(int argc, const char **argv)
struct string_list *commit_hex = NULL;
struct string_list lines;
int result;
+   int flags = COMMIT_GRAPH_PROGRESS;
 
static struct option builtin_commit_graph_write_options[] = {
OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -166,11 +167,13 @@ static int graph_write(int argc, const char **argv)
die(_("use at most one of --reachable, --stdin-commits, or 
--stdin-packs"));
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
+   if (opts.append)
+   flags |= COMMIT_GRAPH_APPEND;
 
read_replace_refs = 0;
 
if (opts.reachable)
-   return write_commit_graph_reachable(opts.obj_dir, opts.append, 
1);
+   return write_commit_graph_reachable(opts.obj_dir, flags);
 
string_list_init(&lines, 0);
if (opts.stdin_packs || opts.stdin_commits) {
@@ -190,8 +193,7 @@ static int graph_write(int argc, const char **argv)
result = write_commit_graph(opts.obj_dir,
pack_indexes,
commit_hex,
-   opts.append,
-   1);
+   flags);
 
UNLEAK(lines);
return result;
diff --git a/builtin/commit.c b/builtin/commit.c
index b9ea7222fa..b001ef565d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1670,7 +1670,7 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
  "not exceeded, and then \"git reset HEAD\" to recover."));
 
if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
-   write_commit_graph_reachable(get_object_directory(), 0, 0))
+   write_commit_graph_reachable(get_object_directory(), 0))
return 1;
 
repo_rerere(the_repository, 0);
diff --git a/builtin/gc.c b/builtin/gc.c
index 3984addf73..df2573f124 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -665,8 +665,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
}
 
if (gc_write_commit_graph &&
-   write_commit_graph_reachable(get_object_directory(), 0,
-!quiet && !daemonized))
+   write_commit_graph_reachable(get_object_directory(),
+!quiet && !daemonized ? 
COMMIT_GRAPH_PROGRESS : 0))
return 1;
 
if (auto_gc && too_many_loose_objects())
diff --git a/commit-graph.c b/commit-graph.c
index ee487a364b..b16c71fd82 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -851,15 +851,14 @@ static int add_ref_to_list(const char *refname,
return 0;
 }
 
-int write_commit_graph_reachable(const char *obj_dir, int append,
-int report_progress)
+int write_commit_graph_reachable(const char *obj_dir, int flags)
 {
struct string_list list = STRING_LIST_INIT_DUP;
int result;
 
for_each_ref(add_ref_to_list, &list);
result = write_commit_graph(obj_dir, NULL, &list,
-   append, report_progress);
+   flags);
 
string_list_clear(&list, 0);
return result;
@@ -868,7 +867,7 @@ int write_commit_graph_reachable(const char *obj_dir, int 
append,
 int write_commit_graph(const char *obj_dir,
   struct string_list *pack_indexes,
   struct string_list *commit_hex,
-  int append, int report_progress)
+  int flags)
 {
struct packed_oid_list oids;
struct packed_commit_list commits;
@@ -887,6 +886,8 @@ int write_commit_graph(const char *obj_dir,
struct strbuf progress_title = STRBUF_INIT;
unsigned long approx_nr_objects;
int res = 0;
+   int append = flags & COMMIT_GRAPH_APPEND;
+   int report_progress = flags & COMMIT_GRAPH_PROGRESS;
 
   

[PATCH v2 1/5] commit-graph: return with errors during write

2019-04-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The write_commit_graph() method uses die() to report failure and
exit when confronted with an unexpected condition. This use of
die() in a library function is incorrect and is now replaced by
error() statements and an int return type.

Now that we use 'goto cleanup' to jump to the terminal condition
on an error, we have new paths that could lead to uninitialized
values. New initializers are added to correct for this.

The builtins 'commit-graph', 'gc', and 'commit' call these methods,
so update them to check the return value.

Signed-off-by: Derrick Stolee 
---
 builtin/commit-graph.c | 19 +++--
 builtin/commit.c   |  5 ++--
 builtin/gc.c   |  7 ++---
 commit-graph.c | 60 +-
 commit-graph.h | 10 +++
 5 files changed, 62 insertions(+), 39 deletions(-)

diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 537fdfd0f0..2e86251f02 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -141,6 +141,7 @@ static int graph_write(int argc, const char **argv)
struct string_list *pack_indexes = NULL;
struct string_list *commit_hex = NULL;
struct string_list lines;
+   int result;
 
static struct option builtin_commit_graph_write_options[] = {
OPT_STRING(0, "object-dir", &opts.obj_dir,
@@ -168,10 +169,8 @@ static int graph_write(int argc, const char **argv)
 
read_replace_refs = 0;
 
-   if (opts.reachable) {
-   write_commit_graph_reachable(opts.obj_dir, opts.append, 1);
-   return 0;
-   }
+   if (opts.reachable)
+   return write_commit_graph_reachable(opts.obj_dir, opts.append, 
1);
 
string_list_init(&lines, 0);
if (opts.stdin_packs || opts.stdin_commits) {
@@ -188,14 +187,14 @@ static int graph_write(int argc, const char **argv)
UNLEAK(buf);
}
 
-   write_commit_graph(opts.obj_dir,
-  pack_indexes,
-  commit_hex,
-  opts.append,
-  1);
+   result = write_commit_graph(opts.obj_dir,
+   pack_indexes,
+   commit_hex,
+   opts.append,
+   1);
 
UNLEAK(lines);
-   return 0;
+   return result;
 }
 
 int cmd_commit_graph(int argc, const char **argv, const char *prefix)
diff --git a/builtin/commit.c b/builtin/commit.c
index 2986553d5f..b9ea7222fa 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1669,8 +1669,9 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
  "new_index file. Check that disk is not full and quota 
is\n"
  "not exceeded, and then \"git reset HEAD\" to recover."));
 
-   if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0))
-   write_commit_graph_reachable(get_object_directory(), 0, 0);
+   if (git_env_bool(GIT_TEST_COMMIT_GRAPH, 0) &&
+   write_commit_graph_reachable(get_object_directory(), 0, 0))
+   return 1;
 
repo_rerere(the_repository, 0);
run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
diff --git a/builtin/gc.c b/builtin/gc.c
index 020f725acc..3984addf73 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -664,9 +664,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
clean_pack_garbage();
}
 
-   if (gc_write_commit_graph)
-   write_commit_graph_reachable(get_object_directory(), 0,
-!quiet && !daemonized);
+   if (gc_write_commit_graph &&
+   write_commit_graph_reachable(get_object_directory(), 0,
+!quiet && !daemonized))
+   return 1;
 
if (auto_gc && too_many_loose_objects())
warning(_("There are too many unreachable loose objects; "
diff --git a/commit-graph.c b/commit-graph.c
index 66865acbd7..ee487a364b 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -851,27 +851,30 @@ static int add_ref_to_list(const char *refname,
return 0;
 }
 
-void write_commit_graph_reachable(const char *obj_dir, int append,
- int report_progress)
+int write_commit_graph_reachable(const char *obj_dir, int append,
+int report_progress)
 {
struct string_list list = STRING_LIST_INIT_DUP;
+   int result;
 
for_each_ref(add_ref_to_list, &list);
-   write_commit_graph(obj_dir, NULL, &list, append, report_progress);
+   result = write_commit_graph(obj_dir, NULL, &list,
+   append, report_progress);
 
string_li

[PATCH v2 5/5] commit-graph: implement file format version 2

2019-04-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The commit-graph file format had some shortcomings which we now
correct:

  1. The hash algorithm was determined by a single byte, instead
 of the 4-byte format identifier.

  2. There was no way to update the reachability index we used.
 We currently only support generation numbers, but that will
 change in the future.

  3. Git did not fail with error if the unused eighth byte was
 non-zero, so we could not use that to indicate an incremental
 file format without breaking compatibility across versions.

The new format modifies the header of the commit-graph to solve
these problems. We use the 4-byte hash format id, freeing up a byte
in our 32-bit alignment to introduce a reachability index version.
We can also fail to read the commit-graph if the eighth byte is
non-zero.

Update the 'git commit-graph read' subcommand to display the new
data.

Set the default file format version to 2, and adjust the tests to
expect the new 'git commit-graph read' output.

Add explicit tests for the upgrade path from version 1 to 2. Users
with an existing commit-graph with version 1 will seamlessly
upgrade to version 2 on their next write.

While we converted the existing 'verify' tests to use a version 1
file to avoid recalculating data offsets, add explicit 'verify'
tests on a version 2 file that corrupt the new header values.

Signed-off-by: Derrick Stolee 
---
 .../technical/commit-graph-format.txt | 26 +++-
 builtin/commit-graph.c|  9 +++
 commit-graph.c| 43 ++--
 commit-graph.h|  1 +
 t/t5318-commit-graph.sh   | 66 +--
 5 files changed, 134 insertions(+), 11 deletions(-)

diff --git a/Documentation/technical/commit-graph-format.txt 
b/Documentation/technical/commit-graph-format.txt
index 16452a0504..e367aa94b1 100644
--- a/Documentation/technical/commit-graph-format.txt
+++ b/Documentation/technical/commit-graph-format.txt
@@ -31,13 +31,22 @@ and hash type.
 
 All 4-byte numbers are in network order.
 
+There are two versions available, 1 and 2. These currently differ only in
+the header.
+
 HEADER:
 
+All commit-graph files use the first five bytes for the same purpose.
+
   4-byte signature:
   The signature is: {'C', 'G', 'P', 'H'}
 
   1-byte version number:
-  Currently, the only valid version is 1.
+  Currently, the valid version numbers are 1 and 2.
+
+The remainder of the header changes depending on the version.
+
+Version 1:
 
   1-byte Hash Version (1 = SHA-1)
   We infer the hash length (H) from this value.
@@ -47,6 +56,21 @@ HEADER:
   1-byte (reserved for later use)
  Current clients should ignore this value.
 
+Version 2:
+
+  1-byte number (C) of "chunks"
+
+  1-byte reachability index version number:
+  Currently, the only valid number is 1.
+
+  1-byte (reserved for later use)
+  Current clients expect this value to be zero, and will not
+  try to read the commit-graph file if it is non-zero.
+
+  4-byte format identifier for the hash algorithm:
+  If this identifier does not agree with the repository's current
+  hash algorithm, then the client will not read the commit graph.
+
 CHUNK LOOKUP:
 
   (C + 1) * 12 bytes listing the table of contents for the chunks:
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 65ceb7a141..1485b4daaf 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -117,6 +117,11 @@ static int graph_read(int argc, const char **argv)
*(unsigned char*)(graph->data + 5),
*(unsigned char*)(graph->data + 6),
*(unsigned char*)(graph->data + 7));
+
+   if (*(unsigned char *)(graph->data + 4) == 2)
+   printf("hash algorithm: %X\n",
+  get_be32(graph->data + 8));
+
printf("num_commits: %u\n", graph->num_commits);
printf("chunks:");
 
@@ -177,6 +182,10 @@ static int graph_write(int argc, const char **argv)
case 1:
flags |= COMMIT_GRAPH_VERSION_1;
break;
+
+   case 2:
+   flags |= COMMIT_GRAPH_VERSION_2;
+   break;
}
 
read_replace_refs = 0;
diff --git a/commit-graph.c b/commit-graph.c
index e75e1655fb..14d6aebd99 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -152,7 +152,8 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
int fd,
uint64_t last_chunk_offset;
uint32_t last_chunk_id;
uint32_t graph_signature;
-   unsigned char graph_version, hash_version;
+   unsigned char graph_version, hash_version, reach_index_version;
+   uint32_t hash_id;
 
if (!graph_map)
return NULL;
@@ -170,7 +171,7 @@ struct commit_graph *parse_commit_graph(void *gr

[PATCH v2 3/5] commit-graph: create new version flags

2019-04-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

In anticipation of a new commit-graph file format version, create
a flag for the write_commit_graph() and write_commit_graph_reachable()
methods to take a version number.

When there is no specified version, the implementation selects a
default value. Currently, the only valid value is 1.

The file format will change the header information, so place the
existing header logic inside a switch statement with only one case.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 58 +-
 commit-graph.h |  1 +
 2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index b16c71fd82..e75e1655fb 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -25,9 +25,6 @@
 
 #define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
 
-#define GRAPH_VERSION_1 0x1
-#define GRAPH_VERSION GRAPH_VERSION_1
-
 #define GRAPH_EXTRA_EDGES_NEEDED 0x8000
 #define GRAPH_EDGE_LAST_MASK 0x7fff
 #define GRAPH_PARENT_NONE 0x7000
@@ -173,30 +170,35 @@ struct commit_graph *parse_commit_graph(void *graph_map, 
int fd,
}
 
graph_version = *(unsigned char*)(data + 4);
-   if (graph_version != GRAPH_VERSION) {
+   if (graph_version != 1) {
error(_("commit-graph version %X does not match version %X"),
- graph_version, GRAPH_VERSION);
-   return NULL;
-   }
-
-   hash_version = *(unsigned char*)(data + 5);
-   if (hash_version != oid_version()) {
-   error(_("commit-graph hash version %X does not match version 
%X"),
- hash_version, oid_version());
+ graph_version, 1);
return NULL;
}
 
graph = alloc_commit_graph();
 
+   switch (graph_version) {
+   case 1:
+   hash_version = *(unsigned char*)(data + 5);
+   if (hash_version != oid_version()) {
+   error(_("commit-graph hash version %X does not match 
version %X"),
+ hash_version, oid_version());
+   return NULL;
+   }
+
+   graph->num_chunks = *(unsigned char*)(data + 6);
+   chunk_lookup = data + 8;
+   break;
+   }
+
graph->hash_len = the_hash_algo->rawsz;
-   graph->num_chunks = *(unsigned char*)(data + 6);
graph->graph_fd = fd;
graph->data = graph_map;
graph->data_len = graph_size;
 
last_chunk_id = 0;
last_chunk_offset = 8;
-   chunk_lookup = data + 8;
for (i = 0; i < graph->num_chunks; i++) {
uint32_t chunk_id;
uint64_t chunk_offset;
@@ -888,10 +890,22 @@ int write_commit_graph(const char *obj_dir,
int res = 0;
int append = flags & COMMIT_GRAPH_APPEND;
int report_progress = flags & COMMIT_GRAPH_PROGRESS;
+   int version = 0;
+   int header_size = 0;
 
if (!commit_graph_compatible(the_repository))
return 0;
 
+   if (flags & COMMIT_GRAPH_VERSION_1)
+   version = 1;
+   if (!version)
+   version = 1;
+   if (version != 1) {
+   error(_("unsupported commit-graph version %d"),
+ version);
+   return 1;
+   }
+
oids.nr = 0;
approx_nr_objects = approximate_object_count();
oids.alloc = approx_nr_objects / 32;
@@ -1076,10 +1090,16 @@ int write_commit_graph(const char *obj_dir,
 
hashwrite_be32(f, GRAPH_SIGNATURE);
 
-   hashwrite_u8(f, GRAPH_VERSION);
-   hashwrite_u8(f, oid_version());
-   hashwrite_u8(f, num_chunks);
-   hashwrite_u8(f, 0); /* unused padding byte */
+   hashwrite_u8(f, version);
+
+   switch (version) {
+   case 1:
+   hashwrite_u8(f, oid_version());
+   hashwrite_u8(f, num_chunks);
+   hashwrite_u8(f, 0); /* unused padding byte */
+   header_size = 8;
+   break;
+   }
 
chunk_ids[0] = GRAPH_CHUNKID_OIDFANOUT;
chunk_ids[1] = GRAPH_CHUNKID_OIDLOOKUP;
@@ -1090,7 +1110,7 @@ int write_commit_graph(const char *obj_dir,
chunk_ids[3] = 0;
chunk_ids[4] = 0;
 
-   chunk_offsets[0] = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
+   chunk_offsets[0] = header_size + (num_chunks + 1) * 
GRAPH_CHUNKLOOKUP_WIDTH;
chunk_offsets[1] = chunk_offsets[0] + GRAPH_FANOUT_SIZE;
chunk_offsets[2] = chunk_offsets[1] + hashsz * commits.nr;
chunk_offsets[3] = chunk_offsets[2] + (hashsz + 16) * commits.nr;
diff --git a/commit-graph.h b/commit-graph.h
index 390474047c..d7cd13deb3 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -67,6 +67,7 @@ int generation_numbers_enabled(struct repository *r);
 
 #define COMMIT_GRAPH_APPEND (1 << 0)
 #

[PATCH v2 0/5] Create commit-graph file format v2

2019-04-24 Thread Derrick Stolee via GitGitGadget
The commit-graph file format has some shortcomings that were discussed
on-list:

 1. It doesn't use the 4-byte format ID from the_hash_algo.


 2. There is no way to change the reachability index from generation numbers
to corrected commit date [1].


 3. The unused byte in the format could be used to signal the file is
incremental, but current clients ignore the value even if it is
non-zero.



This series adds a new version (2) to the commit-graph file. The fifth byte
already specified the file format, so existing clients will gracefully
respond to files with a different version number. The only real change now
is that the header takes 12 bytes instead of 8, due to using the 4-byte
format ID for the hash algorithm.

The new bytes reserved for the reachability index version and incremental
file formats are now expected to be equal to the defaults. When we update
these values to be flexible in the future, if a client understands
commit-graph v2 but not those new values, then it will fail gracefully.

NOTE: this series was rebased onto ab/commit-graph-fixes, as the conflicts
were significant and subtle.

Thanks, -Stolee

[1] 
https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51ef...@gmail.com/

Derrick Stolee (5):
  commit-graph: return with errors during write
  commit-graph: collapse parameters into flags
  commit-graph: create new version flags
  commit-graph: add --version= option
  commit-graph: implement file format version 2

 Documentation/git-commit-graph.txt|   3 +
 .../technical/commit-graph-format.txt |  26 ++-
 builtin/commit-graph.c|  43 +++--
 builtin/commit.c  |   5 +-
 builtin/gc.c  |   7 +-
 commit-graph.c| 156 +-
 commit-graph.h|  16 +-
 t/t5318-commit-graph.sh   |  68 +++-
 8 files changed, 254 insertions(+), 70 deletions(-)


base-commit: 93b4405ffe4ad9308740e7c1c71383bfc369baaa
Published-As: 
https://github.com/gitgitgadget/git/releases/tag/pr-112%2Fderrickstolee%2Fgraph%2Fv2-head-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-112/derrickstolee/graph/v2-head-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/112

Range-diff vs v1:

 1:  e72498d0c5 ! 1:  91f300ec0a commit-graph: return with errors during write
 @@ -82,8 +82,8 @@
   --- a/builtin/gc.c
   +++ b/builtin/gc.c
  @@
 -  if (pack_garbage.nr > 0)
clean_pack_garbage();
 +  }
   
  - if (gc_write_commit_graph)
  - write_commit_graph_reachable(get_object_directory(), 0,
 @@ -182,9 +182,9 @@
}
stop_progress(&progress);
   
 -- if (count_distinct >= GRAPH_PARENT_MISSING)
 +- if (count_distinct >= GRAPH_EDGE_LAST_MASK)
  - die(_("the commit graph format cannot write %d commits"), 
count_distinct);
 -+ if (count_distinct >= GRAPH_PARENT_MISSING) {
 ++ if (count_distinct >= GRAPH_EDGE_LAST_MASK) {
  + error(_("the commit graph format cannot write %d commits"), 
count_distinct);
  + res = 1;
  + goto cleanup;
 @@ -196,9 +196,9 @@
num_chunks = num_extra_edges ? 4 : 3;
stop_progress(&progress);
   
 -- if (commits.nr >= GRAPH_PARENT_MISSING)
 +- if (commits.nr >= GRAPH_EDGE_LAST_MASK)
  - die(_("too many commits to write graph"));
 -+ if (commits.nr >= GRAPH_PARENT_MISSING) {
 ++ if (commits.nr >= GRAPH_EDGE_LAST_MASK) {
  + error(_("too many commits to write graph"));
  + res = 1;
  + goto cleanup;
 2:  43a40d0c43 ! 2:  04f5df1135 commit-graph: collapse parameters into flags
 @@ -66,7 +66,7 @@
   --- a/builtin/gc.c
   +++ b/builtin/gc.c
  @@
 -  clean_pack_garbage();
 +  }
   
if (gc_write_commit_graph &&
  - write_commit_graph_reachable(get_object_directory(), 0,
 3:  39319e36bc ! 3:  4ddb829163 commit-graph: create new version flags
 @@ -25,25 +25,25 @@
  -#define GRAPH_VERSION GRAPH_VERSION_1
  -
   #define GRAPH_EXTRA_EDGES_NEEDED 0x8000
 - #define GRAPH_PARENT_MISSING 0x7fff
   #define GRAPH_EDGE_LAST_MASK 0x7fff
 + #define GRAPH_PARENT_NONE 0x7000
  @@
}
   
graph_version = *(unsigned char*)(data + 4);
  - if (graph_version != GRAPH_VERSION) {
  + if (graph_version != 1) {
 -  error(_("graph version %X does not match version %X"),
 +  error(_("commit-graph version %X does not match version %X"),
  -   graph_version, GRAPH_VERSION);
 -- goto cleanup_fail;
 +- return NULL;
  - }
  -
  - hash_version = *(unsigned char

[PATCH v5 08/11] midx: implement midx_repack()

2019-04-24 Thread Derrick Stolee
To repack with a non-zero batch-size, first sort all pack-files by
their modified time. Second, walk those pack-files from oldest
to newest, compute their expected size, and add the packs to a list
if they are smaller than the given batch-size. Stop when the total
expected size is at least the batch size.

If the batch size is zero, select all packs in the multi-pack-index.

Finally, collect the objects from the multi-pack-index that are in
the selected packs and send them to 'git pack-objects'. Write a new
multi-pack-index that includes the new pack.

Using a batch size of zero is very similar to a standard 'git repack'
command, except that we do not delete the old packs and instead rely
on the new multi-pack-index to prevent new processes from reading the
old packs. This does not disrupt other Git processes that are currently
reading the old packs based on the old multi-pack-index.

While first designing a 'git multi-pack-index repack' operation, I
started by collecting the batches based on the actual size of the
objects instead of the size of the pack-files. This allows repacking
a large pack-file that has very few referencd objects. However, this
came at a significant cost of parsing pack-files instead of simply
reading the multi-pack-index and getting the file information for
the pack-files. The "expected size" version provides similar
behavior, but could skip a pack-file if the average object size is
much larger than the actual size of the referenced objects, or
can create a large pack if the actual size of the referenced objects
is larger than the expected size.

Signed-off-by: Derrick Stolee 
---
 midx.c  | 150 +++-
 t/t5319-multi-pack-index.sh |  28 +++
 2 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 768a7dff73..01c6a05732 100644
--- a/midx.c
+++ b/midx.c
@@ -8,6 +8,7 @@
 #include "sha1-lookup.h"
 #include "midx.h"
 #include "progress.h"
+#include "run-command.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -1113,7 +1114,154 @@ int expire_midx_packs(const char *object_dir)
return result;
 }
 
-int midx_repack(const char *object_dir, size_t batch_size)
+struct repack_info {
+   timestamp_t mtime;
+   uint32_t referenced_objects;
+   uint32_t pack_int_id;
+};
+
+static int compare_by_mtime(const void *a_, const void *b_)
 {
+   const struct repack_info *a, *b;
+
+   a = (const struct repack_info *)a_;
+   b = (const struct repack_info *)b_;
+
+   if (a->mtime < b->mtime)
+   return -1;
+   if (a->mtime > b->mtime)
+   return 1;
+   return 0;
+}
+
+static int fill_included_packs_all(struct multi_pack_index *m,
+  unsigned char *include_pack)
+{
+   uint32_t i;
+
+   for (i = 0; i < m->num_packs; i++)
+   include_pack[i] = 1;
+
+   return m->num_packs < 2;
+}
+
+static int fill_included_packs_batch(struct multi_pack_index *m,
+unsigned char *include_pack,
+size_t batch_size)
+{
+   uint32_t i, packs_to_repack;
+   size_t total_size;
+   struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct 
repack_info));
+
+   for (i = 0; i < m->num_packs; i++) {
+   pack_info[i].pack_int_id = i;
+
+   if (prepare_midx_pack(m, i))
+   continue;
+
+   pack_info[i].mtime = m->packs[i]->mtime;
+   }
+
+   for (i = 0; batch_size && i < m->num_objects; i++) {
+   uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
+   pack_info[pack_int_id].referenced_objects++;
+   }
+
+   QSORT(pack_info, m->num_packs, compare_by_mtime);
+
+   total_size = 0;
+   packs_to_repack = 0;
+   for (i = 0; total_size < batch_size && i < m->num_packs; i++) {
+   int pack_int_id = pack_info[i].pack_int_id;
+   struct packed_git *p = m->packs[pack_int_id];
+   size_t expected_size;
+
+   if (!p)
+   continue;
+   if (open_pack_index(p) || !p->num_objects)
+   continue;
+
+   expected_size = (size_t)(p->pack_size
+* pack_info[i].referenced_objects);
+   expected_size /= p->num_objects;
+
+   if (expected_size >= batch_size)
+   continue;
+
+   packs_to_repack++;
+   total_size += expected_size;
+   include_pack[pack_int_id] = 1;
+   }
+
+   free(pack_info);
+
+   if (total_size < batch_size || packs_to_repack < 2)
+   return 1;
+
return 

[PATCH v5 01/11] repack: refactor pack deletion for future use

2019-04-24 Thread Derrick Stolee
The repack builtin deletes redundant pack-files and their
associated .idx, .promisor, .bitmap, and .keep files. We will want
to re-use this logic in the future for other types of repack, so
pull the logic into 'unlink_pack_path()' in packfile.c.

The 'ignore_keep' parameter is enabled for the use in repack, but
will be important for a future caller.

Signed-off-by: Derrick Stolee 
---
 builtin/repack.c | 14 ++
 packfile.c   | 28 
 packfile.h   |  7 +++
 3 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 45583683ee..3d445b34b4 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -129,19 +129,9 @@ static void get_non_kept_pack_filenames(struct string_list 
*fname_list,
 
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
-   const char *exts[] = {".pack", ".idx", ".keep", ".bitmap", ".promisor"};
-   int i;
struct strbuf buf = STRBUF_INIT;
-   size_t plen;
-
-   strbuf_addf(&buf, "%s/%s", dir_name, base_name);
-   plen = buf.len;
-
-   for (i = 0; i < ARRAY_SIZE(exts); i++) {
-   strbuf_setlen(&buf, plen);
-   strbuf_addstr(&buf, exts[i]);
-   unlink(buf.buf);
-   }
+   strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
+   unlink_pack_path(buf.buf, 1);
strbuf_release(&buf);
 }
 
diff --git a/packfile.c b/packfile.c
index d1e6683ffe..bacecb4d0d 100644
--- a/packfile.c
+++ b/packfile.c
@@ -352,6 +352,34 @@ void close_all_packs(struct raw_object_store *o)
}
 }
 
+void unlink_pack_path(const char *pack_name, int force_delete)
+{
+   static const char *exts[] = {".pack", ".idx", ".keep", ".bitmap", 
".promisor"};
+   int i;
+   struct strbuf buf = STRBUF_INIT;
+   size_t plen;
+
+   strbuf_addstr(&buf, pack_name);
+   strip_suffix_mem(buf.buf, &buf.len, ".pack");
+   plen = buf.len;
+
+   if (!force_delete) {
+   strbuf_addstr(&buf, ".keep");
+   if (!access(buf.buf, F_OK)) {
+   strbuf_release(&buf);
+   return;
+   }
+   }
+
+   for (i = 0; i < ARRAY_SIZE(exts); i++) {
+   strbuf_setlen(&buf, plen);
+   strbuf_addstr(&buf, exts[i]);
+   unlink(buf.buf);
+   }
+
+   strbuf_release(&buf);
+}
+
 /*
  * The LRU pack is the one with the oldest MRU window, preferring packs
  * with no used windows, or the oldest mtime if it has no windows allocated.
diff --git a/packfile.h b/packfile.h
index 6c4037605d..5b7bcdb1dd 100644
--- a/packfile.h
+++ b/packfile.h
@@ -86,6 +86,13 @@ extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
+/*
+ * Unlink the .pack and associated extension files.
+ * Does not unlink if 'force_delete' is false and the pack-file is
+ * marked as ".keep".
+ */
+extern void unlink_pack_path(const char *pack_name, int force_delete);
+
 /*
  * Make sure that a pointer access into an mmap'd index file is within bounds,
  * and can provide at least 8 bytes of data.
-- 
2.21.0.1096.g1c91fdc207



[PATCH v5 02/11] Docs: rearrange subcommands for multi-pack-index

2019-04-24 Thread Derrick Stolee
We will add new subcommands to the multi-pack-index, and that will
make the documentation a bit messier. Clean up the 'verb'
descriptions by renaming the concept to 'subcommand' and removing
the reference to the object directory.

Helped-by: Stefan Beller 
Helped-by: Szeder Gábor 
Signed-off-by: Derrick Stolee 
---
 Documentation/git-multi-pack-index.txt | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt 
b/Documentation/git-multi-pack-index.txt
index f7778a2c85..1af406aca2 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -9,7 +9,7 @@ git-multi-pack-index - Write and verify multi-pack-indexes
 SYNOPSIS
 
 [verse]
-'git multi-pack-index' [--object-dir=] 
+'git multi-pack-index' [--object-dir=] 
 
 DESCRIPTION
 ---
@@ -23,13 +23,13 @@ OPTIONS
`/packs/multi-pack-index` for the current MIDX file, and
`/packs` for the pack-files to index.
 
+The following subcommands are available:
+
 write::
-   When given as the verb, write a new MIDX file to
-   `/packs/multi-pack-index`.
+   Write a new MIDX file.
 
 verify::
-   When given as the verb, verify the contents of the MIDX file
-   at `/packs/multi-pack-index`.
+   Verify the contents of the MIDX file.
 
 
 EXAMPLES
-- 
2.21.0.1096.g1c91fdc207



[PATCH v5 06/11] multi-pack-index: implement 'expire' subcommand

2019-04-24 Thread Derrick Stolee
The 'git multi-pack-index expire' subcommand looks at the existing
mult-pack-index, counts the number of objects referenced in each
pack-file, deletes the pack-fils with no referenced objects, and
rewrites the multi-pack-index to no longer reference those packs.

Refactor the write_midx_file() method to call write_midx_internal()
which now takes an existing 'struct multi_pack_index' and a list
of pack-files to drop (as specified by the names of their pack-
indexes). As we write the new multi-pack-index, we drop those
file names from the list of known pack-files.

The expire_midx_packs() method removes the unreferenced pack-files
after carefully closing the packs to avoid open handles.

Test that a new pack-file that covers the contents of two other
pack-files leads to those pack-files being deleted during the
expire subcommand. Be sure to read the multi-pack-index to ensure
it no longer references those packs.

Signed-off-by: Derrick Stolee 
---
 midx.c  | 120 +---
 t/t5319-multi-pack-index.sh |  20 ++
 2 files changed, 130 insertions(+), 10 deletions(-)

diff --git a/midx.c b/midx.c
index 95c39106b2..299e9b2e8f 100644
--- a/midx.c
+++ b/midx.c
@@ -33,6 +33,8 @@
 #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
 #define MIDX_LARGE_OFFSET_NEEDED 0x8000
 
+#define PACK_EXPIRED UINT_MAX
+
 static char *get_midx_filename(const char *object_dir)
 {
return xstrfmt("%s/pack/multi-pack-index", object_dir);
@@ -381,6 +383,7 @@ struct pack_info {
uint32_t orig_pack_int_id;
char *pack_name;
struct packed_git *p;
+   unsigned expired : 1;
 };
 
 static int pack_info_compare(const void *_a, const void *_b)
@@ -428,6 +431,7 @@ static void add_pack_to_midx(const char *full_path, size_t 
full_path_len,
 
packs->info[packs->nr].pack_name = xstrdup(file_name);
packs->info[packs->nr].orig_pack_int_id = packs->nr;
+   packs->info[packs->nr].expired = 0;
packs->nr++;
}
 }
@@ -587,13 +591,17 @@ static size_t write_midx_pack_names(struct hashfile *f,
size_t written = 0;
 
for (i = 0; i < num_packs; i++) {
-   size_t writelen = strlen(info[i].pack_name) + 1;
+   size_t writelen;
+
+   if (info[i].expired)
+   continue;
 
if (i && strcmp(info[i].pack_name, info[i - 1].pack_name) <= 0)
BUG("incorrect pack-file order: %s before %s",
info[i - 1].pack_name,
info[i].pack_name);
 
+   writelen = strlen(info[i].pack_name) + 1;
hashwrite(f, info[i].pack_name, writelen);
written += writelen;
}
@@ -675,6 +683,11 @@ static size_t write_midx_object_offsets(struct hashfile 
*f, int large_offset_nee
for (i = 0; i < nr_objects; i++) {
struct pack_midx_entry *obj = list++;
 
+   if (perm[obj->pack_int_id] == PACK_EXPIRED)
+   BUG("object %s is in an expired pack with int-id %d",
+   oid_to_hex(&obj->oid),
+   obj->pack_int_id);
+
hashwrite_be32(f, perm[obj->pack_int_id]);
 
if (large_offset_needed && obj->offset >> 31)
@@ -721,7 +734,8 @@ static size_t write_midx_large_offsets(struct hashfile *f, 
uint32_t nr_large_off
return written;
 }
 
-int write_midx_file(const char *object_dir)
+static int write_midx_internal(const char *object_dir, struct multi_pack_index 
*m,
+  struct string_list *packs_to_drop)
 {
unsigned char cur_chunk, num_chunks = 0;
char *midx_name;
@@ -737,6 +751,8 @@ int write_midx_file(const char *object_dir)
struct pack_midx_entry *entries = NULL;
int large_offsets_needed = 0;
int pack_name_concat_len = 0;
+   int dropped_packs = 0;
+   int result = 0;
 
midx_name = get_midx_filename(object_dir);
if (safe_create_leading_directories(midx_name)) {
@@ -745,7 +761,10 @@ int write_midx_file(const char *object_dir)
  midx_name);
}
 
-   packs.m = load_multi_pack_index(object_dir, 1);
+   if (m)
+   packs.m = m;
+   else
+   packs.m = load_multi_pack_index(object_dir, 1);
 
packs.nr = 0;
packs.alloc = packs.m ? packs.m->num_packs : 16;
@@ -759,13 +778,14 @@ int write_midx_file(const char *object_dir)
packs.info[packs.nr].orig_pack_int_id = i;
packs.info[packs.nr].pack_name = 
xstrdup(packs.m->pack_names[i]);
packs.info[packs.nr].p = NULL;
+   packs.info[packs.nr].expired = 0;
packs.nr++;
  

[PATCH v5 07/11] multi-pack-index: prepare 'repack' subcommand

2019-04-24 Thread Derrick Stolee
In an environment where the multi-pack-index is useful, it is due
to many pack-files and an inability to repack the object store
into a single pack-file. However, it is likely that many of these
pack-files are rather small, and could be repacked into a slightly
larger pack-file without too much effort. It may also be important
to ensure the object store is highly available and the repack
operation does not interrupt concurrent git commands.

Introduce a 'repack' subcommand to 'git multi-pack-index' that
takes a '--batch-size' option. The subcommand will inspect the
multi-pack-index for referenced pack-files whose size is smaller
than the batch size, until collecting a list of pack-files whose
sizes sum to larger than the batch size. Then, a new pack-file
will be created containing the objects from those pack-files that
are referenced by the multi-pack-index. The resulting pack is
likely to actually be smaller than the batch size due to
compression and the fact that there may be objects in the pack-
files that have duplicate copies in other pack-files.

The current change introduces the command-line arguments, and we
add a test that ensures we parse these options properly. Since
we specify a small batch size, we will guarantee that future
implementations do not change the list of pack-files.

In addition, we hard-code the modified times of the packs in
the pack directory to ensure the list of packs sorted by modified
time matches the order if sorted by size (ascending). This will
be important in a future test.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-multi-pack-index.txt | 17 +
 builtin/multi-pack-index.c | 12 ++--
 midx.c |  5 +
 midx.h |  1 +
 t/t5319-multi-pack-index.sh| 20 +++-
 5 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt 
b/Documentation/git-multi-pack-index.txt
index 6186c4c936..233b2b7862 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -36,6 +36,23 @@ expire::
have no objects referenced by the MIDX. Rewrite the MIDX file
afterward to remove all references to these pack-files.
 
+repack::
+   Create a new pack-file containing objects in small pack-files
+   referenced by the multi-pack-index. If the size given by the
+   `--batch-size=` argument is zero, then create a pack
+   containing all objects referenced by the multi-pack-index. For
+   a non-zero batch size, Select the pack-files by examining packs
+   from oldest-to-newest, computing the "expected size" by counting
+   the number of objects in the pack referenced by the
+   multi-pack-index, then divide by the total number of objects in
+   the pack and multiply by the pack size. We select packs with
+   expected size below the batch size until the set of packs have
+   total expected size at least the batch size. If the total size
+   does not reach the batch size, then do nothing. If a new pack-
+   file is created, rewrite the multi-pack-index to reference the
+   new pack-file. A later run of 'git multi-pack-index expire' will
+   delete the pack-files that were part of this batch.
+
 
 EXAMPLES
 
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 145de3a46c..c66239de33 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,12 +5,13 @@
 #include "midx.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
-   N_("git multi-pack-index [--object-dir=] (write|verify|expire)"),
+   N_("git multi-pack-index [--object-dir=] 
(write|verify|expire|repack --batch-size=)"),
NULL
 };
 
 static struct opts_multi_pack_index {
const char *object_dir;
+   unsigned long batch_size;
 } opts;
 
 int cmd_multi_pack_index(int argc, const char **argv,
@@ -19,6 +20,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
static struct option builtin_multi_pack_index_options[] = {
OPT_FILENAME(0, "object-dir", &opts.object_dir,
  N_("object directory containing set of packfile and 
pack-index pairs")),
+   OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
+ N_("during repack, collect pack-files of smaller size into a 
batch that is larger than this size")),
OPT_END(),
};
 
@@ -40,6 +43,11 @@ int cmd_multi_pack_index(int argc, const char **argv,
return 1;
}
 
+   if (!strcmp(argv[0], "repack"))
+   return midx_repack(opts.object_dir, (size_t)opts.batch_size);
+   if (opts.batch_size)
+   die(_("--batch-size option is only for 'repack' subcomma

[PATCH v5 09/11] multi-pack-index: test expire while adding packs

2019-04-24 Thread Derrick Stolee
During development of the multi-pack-index expire subcommand, a
version went out that improperly computed the pack order if a new
pack was introduced while other packs were being removed. Part of
the subtlety of the bug involved the new pack being placed before
other packs that already existed in the multi-pack-index.

Add a test to t5319-multi-pack-index.sh that catches this issue.
The test adds new packs that cause another pack to be expired, and
creates new packs that are lexicographically sorted before and
after the existing packs.

Signed-off-by: Derrick Stolee 
---
 t/t5319-multi-pack-index.sh | 32 
 1 file changed, 32 insertions(+)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index d6c1353514..19b769eea0 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -463,4 +463,36 @@ test_expect_success 'expire removes repacked packs' '
)
 '
 
+test_expect_success 'expire works when adding new packs' '
+   (
+   cd dup &&
+   git pack-objects --revs .git/objects/pack/pack-combined <<-EOF 
&&
+   refs/heads/A
+   ^refs/heads/B
+   EOF
+   git pack-objects --revs .git/objects/pack/pack-combined <<-EOF 
&&
+   refs/heads/B
+   ^refs/heads/C
+   EOF
+   git pack-objects --revs .git/objects/pack/pack-combined <<-EOF 
&&
+   refs/heads/C
+   ^refs/heads/D
+   EOF
+   git multi-pack-index write &&
+   git pack-objects --revs .git/objects/pack/a-pack <<-EOF &&
+   refs/heads/D
+   ^refs/heads/E
+   EOF
+   git multi-pack-index write &&
+   git pack-objects --revs .git/objects/pack/z-pack <<-EOF &&
+   refs/heads/E
+   EOF
+   git multi-pack-index expire &&
+   ls .git/objects/pack/ | grep idx >expect &&
+   test-tool read-midx .git/objects | grep idx >actual &&
+   test_cmp expect actual &&
+   git multi-pack-index verify
+   )
+'
+
 test_done
-- 
2.21.0.1096.g1c91fdc207



[PATCH v5 10/11] midx: add test that 'expire' respects .keep files

2019-04-24 Thread Derrick Stolee
The 'git multi-pack-index expire' subcommand may delete packs that
are not needed from the perspective of the multi-pack-index. If
a pack has a .keep file, then we should not delete that pack. Add
a test that ensures we preserve a pack that would otherwise be
expired. First, create a new pack that contains every object in
the repo, then add it to the multi-pack-index. Then create a .keep
file for a pack starting with "a-pack" that was added in the
previous test. Finally, expire and verify that the pack remains
and the other packs were expired.

Signed-off-by: Derrick Stolee 
---
 t/t5319-multi-pack-index.sh | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 19b769eea0..bcfa520401 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -495,4 +495,22 @@ test_expect_success 'expire works when adding new packs' '
)
 '
 
+test_expect_success 'expire respects .keep files' '
+   (
+   cd dup &&
+   git pack-objects --revs .git/objects/pack/pack-all <<-EOF &&
+   refs/heads/A
+   EOF
+   git multi-pack-index write &&
+   PACKA=$(ls .git/objects/pack/a-pack*\.pack | sed s/\.pack\$//) 
&&
+   touch $PACKA.keep &&
+   git multi-pack-index expire &&
+   ls -S .git/objects/pack/a-pack* | grep $PACKA >a-pack-files &&
+   test_line_count = 3 a-pack-files &&
+   test-tool read-midx .git/objects | grep idx >midx-list &&
+   test_line_count = 2 midx-list
+   )
+'
+
+
 test_done
-- 
2.21.0.1096.g1c91fdc207



[PATCH v5 00/11] Create 'expire' and 'repack' verbs for git-multi-pack-index

2019-04-24 Thread Derrick Stolee
The multi-pack-index provides a fast way to find an object among a large list
of pack-files. It stores a single pack-reference for each object id, so
duplicate objects are ignored. Among a list of pack-files storing the same
object, the most-recently modified one is used.

Create new subcommands for the multi-pack-index builtin.

* 'git multi-pack-index expire': If we have a pack-file indexed by the
  multi-pack-index, but all objects in that pack are duplicated in
  more-recently modified packs, then delete that pack (and any others like it).
  Delete the reference to that pack in the multi-pack-index.

* 'git multi-pack-index repack --batch-size=': Starting from the oldest
  pack-files covered by the multi-pack-index, find those whose "expected size"
  is below the batch size until we have a collection of packs whose expected
  sizes add up to the batch size. We compute the expected size by multiplying
  the number of referenced objects by the pack-size and dividing by the total
  number of objects in the pack. If the batch-size is zero, then select all
  packs. Create a new pack containing all objects that the multi-pack-index
  references to those packs.

This allows us to create a new pattern for repacking objects: run 'repack'.
After enough time has passed that all Git commands that started before the last
'repack' are finished, run 'expire' again. This approach has some advantages
over the existing "repack everything" model:

1. Incremental. We can repack a small batch of objects at a time, instead of
repacking all reachable objects. We can also limit ourselves to the objects
that do not appear in newer pack-files.

2. Highly Available. By adding a new pack-file (and not deleting the old
pack-files) we do not interrupt concurrent Git commands, and do not suffer
performance degradation. By expiring only pack-files that have no referenced
objects, we know that Git commands that are doing normal object lookups* will
not be interrupted.

* Note: if someone concurrently runs a Git command that uses get_all_packs(),
* then that command could try to read the pack-files and pack-indexes that we
* are deleting during an expire command. Such commands are usually related to
* object maintenance (i.e. fsck, gc, pack-objects) or are related to
* less-often-used features (i.e. fast-import, http-backend, server-info).

We **are using** this approach in VFS for Git to do background maintenance of
the "shared object cache" which is a Git alternate directory filled with
packfiles containing commits and trees. We currently download pack-files on an
hourly basis to keep up-to-date with the central server. The cache servers
supply packs on an hourly and daily basis, so most of the hourly packs become
useless after a new daily pack is downloaded. The 'expire' command would clear
out most of those packs, but many will still remain with fewer than 100 objects
remaining. The 'repack' command (with a batch size of 1-3gb, probably) can
condense the remaining packs in commands that run for 1-3 min at a time. Since
the daily packs range from 100-250mb, we will also combine and condense those
packs.

Updates in V5:

* Fixed the error in PATCH 7 due to a missing line that existed in PATCH 8. 
Thanks, Josh Steadmon!

* The 'repack' subcommand now computes the "expected size" of a pack instead of
  relying on the total size of the pack. This is actually really important to
  the way VFS for Git uses prefetch packs, and some packs are not being
  repacked because the pack size is larger than the batch size, but really
  there are only a few referenced objects.

* The 'repack' subcommand now allows a batch size of zero to mean "create one
  pack containing all objects in the multi-pack-index". A new commit adds a
  test that hits the boundary cases here, but follows the 'expire' subcommand
  so we can show that cycle of repack-then-expire to safely replace the packs.

Junio: It appears that there are some conflicts with the trace2 changes in
master. These are not new to the updates in this version. I saw how you
resolved these conflicts and replaying that resolution should work for you.

Thanks,
-Stolee

Derrick Stolee (11):
  repack: refactor pack deletion for future use
  Docs: rearrange subcommands for multi-pack-index
  multi-pack-index: prepare for 'expire' subcommand
  midx: simplify computation of pack name lengths
  midx: refactor permutation logic and pack sorting
  multi-pack-index: implement 'expire' subcommand
  multi-pack-index: prepare 'repack' subcommand
  midx: implement midx_repack()
  multi-pack-index: test expire while adding packs
  midx: add test that 'expire' respects .keep files
  t5319-multi-pack-index.sh: test batch size zero

 Documentation/git-multi-pack-index.txt |  32 +-
 builtin/multi-pack-index.c

[PATCH v5 11/11] t5319-multi-pack-index.sh: test batch size zero

2019-04-24 Thread Derrick Stolee
The 'git multi-pack-index repack' command can take a batch size of
zero, which creates a new pack-file containing all objects in the
multi-pack-index. The first 'repack' command will create one new
pack-file, and an 'expire' command after that will delete the old
pack-files, as they no longer contain any referenced objects in the
multi-pack-index.

We must remove the .keep file that was added in the previous test
in order to expire that pack-file.

Also test that a 'repack' will do nothing if there is only one
pack-file.

Signed-off-by: Derrick Stolee 
---
 t/t5319-multi-pack-index.sh | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index bcfa520401..0f116b4b92 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -512,5 +512,24 @@ test_expect_success 'expire respects .keep files' '
)
 '
 
+test_expect_success 'repack --batch-size=0 repacks everything' '
+   (
+   cd dup &&
+   rm .git/objects/pack/*.keep &&
+   ls .git/objects/pack/*idx >idx-list &&
+   test_line_count = 2 idx-list &&
+   git multi-pack-index repack --batch-size=0 &&
+   ls .git/objects/pack/*idx >idx-list &&
+   test_line_count = 3 idx-list &&
+   test-tool read-midx .git/objects | grep idx >midx-list &&
+   test_line_count = 3 midx-list &&
+   git multi-pack-index expire &&
+   ls -al .git/objects/pack/*idx >idx-list &&
+   test_line_count = 1 idx-list &&
+   git multi-pack-index repack --batch-size=0 &&
+   ls -al .git/objects/pack/*idx >new-idx-list &&
+   test_cmp idx-list new-idx-list
+   )
+'
 
 test_done
-- 
2.21.0.1096.g1c91fdc207



[PATCH v5 05/11] midx: refactor permutation logic and pack sorting

2019-04-24 Thread Derrick Stolee
In anticipation of the expire subcommand, refactor the way we sort
the packfiles by name. This will greatly simplify our approach to
dropping expired packs from the list.

First, create 'struct pack_info' to replace 'struct pack_pair'.
This struct contains the necessary information about a pack,
including its name, a pointer to its packfile struct (if not
already in the multi-pack-index), and the original pack-int-id.

Second, track the pack information using an array of pack_info
structs in the pack_list struct. This simplifies the logic around
the multiple arrays we were tracking in that struct.

Finally, update get_sorted_entries() to not permute the pack-int-id
and instead supply the permutation to write_midx_object_offsets().
This requires sorting the packs after get_sorted_entries().

Signed-off-by: Derrick Stolee 
---
 midx.c | 156 +
 1 file changed, 69 insertions(+), 87 deletions(-)

diff --git a/midx.c b/midx.c
index f087bbbe82..95c39106b2 100644
--- a/midx.c
+++ b/midx.c
@@ -377,12 +377,23 @@ static size_t write_midx_header(struct hashfile *f,
return MIDX_HEADER_SIZE;
 }
 
+struct pack_info {
+   uint32_t orig_pack_int_id;
+   char *pack_name;
+   struct packed_git *p;
+};
+
+static int pack_info_compare(const void *_a, const void *_b)
+{
+   struct pack_info *a = (struct pack_info *)_a;
+   struct pack_info *b = (struct pack_info *)_b;
+   return strcmp(a->pack_name, b->pack_name);
+}
+
 struct pack_list {
-   struct packed_git **list;
-   char **names;
+   struct pack_info *info;
uint32_t nr;
-   uint32_t alloc_list;
-   uint32_t alloc_names;
+   uint32_t alloc;
struct multi_pack_index *m;
 };
 
@@ -395,66 +406,32 @@ static void add_pack_to_midx(const char *full_path, 
size_t full_path_len,
if (packs->m && midx_contains_pack(packs->m, file_name))
return;
 
-   ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list);
-   ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names);
+   ALLOC_GROW(packs->info, packs->nr + 1, packs->alloc);
 
-   packs->list[packs->nr] = add_packed_git(full_path,
-   full_path_len,
-   0);
+   packs->info[packs->nr].p = add_packed_git(full_path,
+ full_path_len,
+ 0);
 
-   if (!packs->list[packs->nr]) {
+   if (!packs->info[packs->nr].p) {
warning(_("failed to add packfile '%s'"),
full_path);
return;
}
 
-   if (open_pack_index(packs->list[packs->nr])) {
+   if (open_pack_index(packs->info[packs->nr].p)) {
warning(_("failed to open pack-index '%s'"),
full_path);
-   close_pack(packs->list[packs->nr]);
-   FREE_AND_NULL(packs->list[packs->nr]);
+   close_pack(packs->info[packs->nr].p);
+   FREE_AND_NULL(packs->info[packs->nr].p);
return;
}
 
-   packs->names[packs->nr] = xstrdup(file_name);
+   packs->info[packs->nr].pack_name = xstrdup(file_name);
+   packs->info[packs->nr].orig_pack_int_id = packs->nr;
packs->nr++;
}
 }
 
-struct pack_pair {
-   uint32_t pack_int_id;
-   char *pack_name;
-};
-
-static int pack_pair_compare(const void *_a, const void *_b)
-{
-   struct pack_pair *a = (struct pack_pair *)_a;
-   struct pack_pair *b = (struct pack_pair *)_b;
-   return strcmp(a->pack_name, b->pack_name);
-}
-
-static void sort_packs_by_name(char **pack_names, uint32_t nr_packs, uint32_t 
*perm)
-{
-   uint32_t i;
-   struct pack_pair *pairs;
-
-   ALLOC_ARRAY(pairs, nr_packs);
-
-   for (i = 0; i < nr_packs; i++) {
-   pairs[i].pack_int_id = i;
-   pairs[i].pack_name = pack_names[i];
-   }
-
-   QSORT(pairs, nr_packs, pack_pair_compare);
-
-   for (i = 0; i < nr_packs; i++) {
-   pack_names[i] = pairs[i].pack_name;
-   perm[pairs[i].pack_int_id] = i;
-   }
-
-   free(pairs);
-}
-
 struct pack_midx_entry {
struct object_id oid;
uint32_t pack_int_id;
@@ -480,7 +457,6 @@ static int midx_oid_compare(const void *_a, const void *_b)
 }
 
 static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
- ui

[PATCH v5 04/11] midx: simplify computation of pack name lengths

2019-04-24 Thread Derrick Stolee
Before writing the multi-pack-index, we compute the length of the
pack-index names concatenated together. This forms the data in the
pack name chunk, and we precompute it to compute chunk offsets.
The value is also modified to fit alignment needs.

Previously, this computation was coupled with adding packs from
the existing multi-pack-index and the remaining packs in the object
dir not already covered by the multi-pack-index.

In anticipation of this becoming more complicated with the 'expire'
subcommand, simplify the computation by centralizing it to a single
loop before writing the file.

Signed-off-by: Derrick Stolee 
---
 midx.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/midx.c b/midx.c
index bb825ef816..f087bbbe82 100644
--- a/midx.c
+++ b/midx.c
@@ -383,7 +383,6 @@ struct pack_list {
uint32_t nr;
uint32_t alloc_list;
uint32_t alloc_names;
-   size_t pack_name_concat_len;
struct multi_pack_index *m;
 };
 
@@ -418,7 +417,6 @@ static void add_pack_to_midx(const char *full_path, size_t 
full_path_len,
}
 
packs->names[packs->nr] = xstrdup(file_name);
-   packs->pack_name_concat_len += strlen(file_name) + 1;
packs->nr++;
}
 }
@@ -762,6 +760,7 @@ int write_midx_file(const char *object_dir)
uint32_t nr_entries, num_large_offsets = 0;
struct pack_midx_entry *entries = NULL;
int large_offsets_needed = 0;
+   int pack_name_concat_len = 0;
 
midx_name = get_midx_filename(object_dir);
if (safe_create_leading_directories(midx_name)) {
@@ -777,7 +776,6 @@ int write_midx_file(const char *object_dir)
packs.alloc_names = packs.alloc_list;
packs.list = NULL;
packs.names = NULL;
-   packs.pack_name_concat_len = 0;
ALLOC_ARRAY(packs.list, packs.alloc_list);
ALLOC_ARRAY(packs.names, packs.alloc_names);
 
@@ -788,7 +786,6 @@ int write_midx_file(const char *object_dir)
 
packs.list[packs.nr] = NULL;
packs.names[packs.nr] = xstrdup(packs.m->pack_names[i]);
-   packs.pack_name_concat_len += 
strlen(packs.names[packs.nr]) + 1;
packs.nr++;
}
}
@@ -798,10 +795,6 @@ int write_midx_file(const char *object_dir)
if (packs.m && packs.nr == packs.m->num_packs)
goto cleanup;
 
-   if (packs.pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
-   packs.pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
- (packs.pack_name_concat_len % 
MIDX_CHUNK_ALIGNMENT);
-
ALLOC_ARRAY(pack_perm, packs.nr);
sort_packs_by_name(packs.names, packs.nr, pack_perm);
 
@@ -814,6 +807,13 @@ int write_midx_file(const char *object_dir)
large_offsets_needed = 1;
}
 
+   for (i = 0; i < packs.nr; i++)
+   pack_name_concat_len += strlen(packs.names[i]) + 1;
+
+   if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
+   pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
+   (pack_name_concat_len % 
MIDX_CHUNK_ALIGNMENT);
+
hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
FREE_AND_NULL(midx_name);
@@ -831,7 +831,7 @@ int write_midx_file(const char *object_dir)
 
cur_chunk++;
chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT;
-   chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + 
packs.pack_name_concat_len;
+   chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + 
pack_name_concat_len;
 
cur_chunk++;
chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP;
-- 
2.21.0.1096.g1c91fdc207



[PATCH v5 03/11] multi-pack-index: prepare for 'expire' subcommand

2019-04-24 Thread Derrick Stolee
The multi-pack-index tracks objects in a collection of pack-files.
Only one copy of each object is indexed, using the modified time
of the pack-files to determine tie-breakers. It is possible to
have a pack-file with no referenced objects because all objects
have a duplicate in a newer pack-file.

Introduce a new 'expire' subcommand to the multi-pack-index builtin.
This subcommand will delete these unused pack-files and rewrite the
multi-pack-index to no longer refer to those files. More details
about the specifics will follow as the method is implemented.

Add a test that verifies the 'expire' subcommand is correctly wired,
but will still be valid when the verb is implemented. Specifically,
create a set of packs that should all have referenced objects and
should not be removed during an 'expire' operation. The packs are
created carefully to ensure they have a specific order when sorted
by size. This will be important in a later test.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-multi-pack-index.txt |  5 +++
 builtin/multi-pack-index.c |  4 ++-
 midx.c |  5 +++
 midx.h |  1 +
 t/t5319-multi-pack-index.sh| 49 ++
 5 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-multi-pack-index.txt 
b/Documentation/git-multi-pack-index.txt
index 1af406aca2..6186c4c936 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -31,6 +31,11 @@ write::
 verify::
Verify the contents of the MIDX file.
 
+expire::
+   Delete the pack-files that are tracked  by the MIDX file, but
+   have no objects referenced by the MIDX. Rewrite the MIDX file
+   afterward to remove all references to these pack-files.
+
 
 EXAMPLES
 
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index fca70f8e4f..145de3a46c 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,7 +5,7 @@
 #include "midx.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
-   N_("git multi-pack-index [--object-dir=] (write|verify)"),
+   N_("git multi-pack-index [--object-dir=] (write|verify|expire)"),
NULL
 };
 
@@ -44,6 +44,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
return write_midx_file(opts.object_dir);
if (!strcmp(argv[0], "verify"))
return verify_midx_file(opts.object_dir);
+   if (!strcmp(argv[0], "expire"))
+   return expire_midx_packs(opts.object_dir);
 
die(_("unrecognized verb: %s"), argv[0]);
 }
diff --git a/midx.c b/midx.c
index 730ff84dff..bb825ef816 100644
--- a/midx.c
+++ b/midx.c
@@ -1025,3 +1025,8 @@ int verify_midx_file(const char *object_dir)
 
return verify_midx_error;
 }
+
+int expire_midx_packs(const char *object_dir)
+{
+   return 0;
+}
diff --git a/midx.h b/midx.h
index 774f652530..e3a2b740b5 100644
--- a/midx.h
+++ b/midx.h
@@ -49,6 +49,7 @@ int prepare_multi_pack_index_one(struct repository *r, const 
char *object_dir, i
 int write_midx_file(const char *object_dir);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(const char *object_dir);
+int expire_midx_packs(const char *object_dir);
 
 void close_midx(struct multi_pack_index *m);
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 70926b5bc0..a8528f7da0 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -348,4 +348,53 @@ test_expect_success 'verify incorrect 64-bit offset' '
"incorrect object offset"
 '
 
+test_expect_success 'setup expire tests' '
+   mkdir dup &&
+   (
+   cd dup &&
+   git init &&
+   test-tool genrandom "data" 4096 >large_file.txt &&
+   git update-index --add large_file.txt &&
+   for i in $(test_seq 1 20)
+   do
+   test_commit $i
+   done &&
+   git branch A HEAD &&
+   git branch B HEAD~8 &&
+   git branch C HEAD~13 &&
+   git branch D HEAD~16 &&
+   git branch E HEAD~18 &&
+   git pack-objects --revs .git/objects/pack/pack-A <<-EOF &&
+   refs/heads/A
+   ^refs/heads/B
+   EOF
+   git pack-objects --revs .git/objects/pack/pack-B <<-EOF &&
+   refs/heads/B
+   ^refs/heads/C
+   EOF
+   git pack-objects --revs .git/objects/pack/pack-C <<-EOF &&
+   refs/heads/C
+   ^refs/heads/D
+   EOF
+   git pack-objec

Git Test Coverage Report (April 23)

2019-04-23 Thread Derrick Stolee
lone_filter_default,
b21a55f3 139) return 0;
dcc8b4e9 202) static int remove_fetched_oids(struct object_id **oids, int 
oid_nr, int to_free)
dcc8b4e9 204) int i, missing_nr = 0;
dcc8b4e9 205) int *missing = xcalloc(oid_nr, sizeof(*missing));
dcc8b4e9 206) struct object_id *old_oids = *oids;
dcc8b4e9 208) int old_fetch_if_missing = fetch_if_missing;
dcc8b4e9 210) fetch_if_missing = 0;
dcc8b4e9 212) for (i = 0; i < oid_nr; i++)
dcc8b4e9 213) if (oid_object_info_extended(the_repository, &old_oids[i], NULL, 
0)) {
dcc8b4e9 214) missing[i] = 1;
dcc8b4e9 215) missing_nr++;
dcc8b4e9 218) fetch_if_missing = old_fetch_if_missing;
dcc8b4e9 220) if (missing_nr) {
dcc8b4e9 221) int j = 0;
dcc8b4e9 222) new_oids = xcalloc(missing_nr, sizeof(*new_oids));
dcc8b4e9 223) for (i = 0; i < oid_nr; i++)
dcc8b4e9 224) if (missing[i])
dcc8b4e9 225) oidcpy(&new_oids[j++], &old_oids[i]);
dcc8b4e9 226) *oids = new_oids;
dcc8b4e9 227) if (to_free)
dcc8b4e9 228) free(old_oids);
dcc8b4e9 231) free(missing);
dcc8b4e9 233) return missing_nr;
dcc8b4e9 248) if (missing_nr == 1)
dcc8b4e9 249) continue;
dcc8b4e9 250) missing_nr = remove_fetched_oids(&missing_oids, missing_nr, 
to_free);
dcc8b4e9 251) if (missing_nr) {
dcc8b4e9 252) to_free = 1;
dcc8b4e9 253) continue;
dcc8b4e9 261) free(missing_oids);

protocol.c
6da1f1a9 37) die(_("Unrecognized protocol version"));
6da1f1a9 39) die(_("Unrecognized protocol_version"));
6da1f1a9 74) BUG("late attempt to register an allowed protocol version");

ref-filter.c
3da20422 93) keydata_aka_refname ? keydata_aka_refname : k->wt->head_ref);

remote-curl.c
6da1f1a9 345) return 0;

sequencer.c
4a72486d 2184) return error_errno("unable to open '%s'", todo_file);
4a72486d 2188) eol--; /* strip Carriage Return */
4a72486d 2190) goto fail;
4a72486d 2196) goto fail;
b07d9bfd 2300) error_errno("unable to open '%s'", todo_path);
b07d9bfd 2301) return 0;

upload-pack.c
a8d662e3 130) return readsz;
820a5361 149) BUG("packfile_uris requires sideband-all");
a8d662e3 355) send_client_data(1, output_state.buffer, output_state.used);
820a5361 1384) string_list_clear(&data->uri_protocols, 0);

wrapper.c
5efde212 70) die("Out of memory, malloc failed (tried to allocate %" PRIuMAX " 
bytes)",
5efde212 73) error("Out of memory, malloc failed (tried to allocate %" PRIuMAX 
" bytes)",

Commits introducting uncovered code:
Barret Rhoden   dfb4ee12 blame: use a fingerprint heuristic to match ignored 
lines
Barret Rhoden   d0738d94 blame: add config options to handle output for ignored 
lines
Barret Rhoden   a5c91678 blame: add the ability to ignore commits and their 
changes
Barret Rhoden   3486d8d4 Move init_skiplist() outside of fsck
Christian Couder0ba08c05 Remove fetch-object.{c,h} in favor of 
promisor-remote.{c,h}
Christian Couder54248706 Add initial support for many promisor remotes
Christian Couder7bdf0926 promisor-remote: use 
repository_format_partial_clone
Christian Couder7b6e1b04 Move core_partial_clone_filter_default to 
promisor-remote.c
Christian Couderb21a55f3 promisor-remote: parse 
remote.*.partialclonefilter
Christian Couderdcc8b4e9 promisor-remote: implement 
promisor_remote_get_direct()
Christian Coudere265069a Use promisor_remote_get_direct() and 
has_promisor_remote()
Dan McGregorba81921a http: cast result to FILE *
Derrick Stolee  8e7e6c05 commit-graph: return with errors during write
Derrick Stolee  b1beb050 commit-graph: create new version flags
Jonathan Tanbf01639c fetch-pack: support more than one pack lockfile
Jonathan Tana8d662e3 upload-pack: refactor reading of pack-objects out
Jonathan Tan472fbef8 http-fetch: support fetching packfiles by URL
Jonathan Tan820a5361 upload-pack: send part of packfile response as uri
Josh Steadmon   6da1f1a9 protocol: advertise multiple supported versions
Junio C Hamano  8b13ff27 Merge branch 'jt/fetch-cdn-offload' into pu
Martin Koegler  5efde212 zlib.c: use size_t for size
Nguyễn Thái Ngọc Duy6f11fd5e config: add --move-to
Nguyễn Thái Ngọc Duya12c1ff3 config: factor out set_config_source_file()
Nguyễn Thái Ngọc Duy8f7c7f55 config.c: add repo_config_set_worktree_gently()
Nickolai Belakovski 3da20422 ref-filter: add worktreepath atom
Phillip Wood33898531 rebase -i: use struct object_id for squash_onto
Phillip Wood460bc3ce rebase -i: run without forking rebase--interactive
Phillip Wood0ea0847e rebase -i: use struct rebase_options in 
do_interactive_rebase()
Phillip Wood7d3488eb rebase -i: use struct commit when parsing options
Phillip Woodc44c2462 rebase -i: remove duplication
Phillip Wood0609b741 rebase -i: combine rebase--interactive.c with rebase.c
Phillip Wood4a72486d fix cherry-pick/revert status after commit
Phillip Woodb07d9bfd commit/reset: try to clean up sequencer state
Phillip

Re: [PATCH] t5304: add a test for pruning with bitmaps

2019-04-20 Thread Derrick Stolee
On 4/19/2019 11:24 PM, Jeff King wrote:
> Try running t5304 with this:
> 
> diff --git a/reachable.c b/reachable.c
> index eba6f64e01..7ec73ef43f 100644
> --- a/reachable.c
> +++ b/reachable.c
> @@ -191,6 +191,8 @@ static int mark_object_seen(const struct object_id *oid,
>   if (!obj)
>   die("unable to create object '%s'", oid_to_hex(oid));
>  
> + if (!(obj->flags & SEEN))
> + die("seen flag not already there");
>   obj->flags |= SEEN;
>   return 0;
>  }
> 
> which shows that we are indeed freshly setting SEEN on some objects.

Good point! Thanks for clearing that up for me.
 
> Interestingly, I don't _think_ you can cause an object to get pruned
> accidentally here, because:

[snip]

I appreciate the additional context that you gave (and I snipped). This
makes me comfortable accepting this test as an exercise of the code (to
guard against future changes that create failures like null-refs) and we
will need to rely on the performance suite to catch issues like removing
the SEEN markers that I had tested.

Thanks,
-Stolee



Re: [PATCH] t5304: add a test for pruning with bitmaps

2019-04-19 Thread Derrick Stolee
On 4/18/2019 4:08 PM, Jeff King wrote:
> On Thu, Apr 18, 2019 at 03:49:53PM -0400, Jeff King wrote:
>> I dunno. I guess it does not hurt to at least to at least make sure this
>> code is running in the normal suite. I don't think that will find the
>> more interesting regressions, but at least saves us the from the most
>> egregious ones ("oops, the whole thing segfaults because somebody
>> changed the API" kinds of things).

That's the level of coverage I was hoping to see. I won't be picky if
the OBJ_TAG case isn't hit or something.

> So here's a test. This goes on top of jk/prune-optim (which is also
> already in master).

[snip]

> +test_expect_success 'trivial prune with bitmaps enabled' '
> + git repack -adb &&
> + blob=$(echo bitmap-unreachable-blob | git hash-object -w --stdin) &&
> + git prune --expire=now &&
> + git cat-file -e HEAD &&
> + test_must_fail git cat-file -e $blob
> +'
> +
>  test_done

This test does cover the 'mark_object_seen()' method, which I tested by
removing the "!" in the "if (!obj) die();" condition.

However, my first inclination was to remove the line

obj->flags |= SEEN;

from the method, and I found that it still worked! This was confusing,
so I looked for places where the SEEN flag was added during bitmap walks,
and it turns out that the objects are marked as SEEN by prepare_bitmap_walk().

This means that we can remove a lot of the implementation from reachable.c
and have the same effect (and drop an iteration through the objects). See the
diff below.

Thoughts?
-Stolee

-->8--

diff --git a/reachable.c b/reachable.c
index 0d00a91de4..7d2762d47f 100644
--- a/reachable.c
+++ b/reachable.c
@@ -159,39 +159,6 @@ int add_unseen_recent_objects_to_traversal(struct rev_info 
*revs,
  FOR_EACH_OBJECT_LOCAL_ONLY);
 }

-static void *lookup_object_by_type(struct repository *r,
-  const struct object_id *oid,
-  enum object_type type)
-{
-   switch (type) {
-   case OBJ_COMMIT:
-   return lookup_commit(r, oid);
-   case OBJ_TREE:
-   return lookup_tree(r, oid);
-   case OBJ_TAG:
-   return lookup_tag(r, oid);
-   case OBJ_BLOB:
-   return lookup_blob(r, oid);
-   default:
-   die("BUG: unknown object type %d", type);
-   }
-}
-
-static int mark_object_seen(const struct object_id *oid,
-enum object_type type,
-int exclude,
-uint32_t name_hash,
-struct packed_git *found_pack,
-off_t found_offset)
-{
-   struct object *obj = lookup_object_by_type(the_repository, oid, type);
-   if (!obj)
-   die("unable to create object '%s'", oid_to_hex(oid));
-
-   obj->flags |= SEEN;
-   return 0;
-}
-
 void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
timestamp_t mark_recent, struct progress *progress)
 {
@@ -225,7 +192,10 @@ void mark_reachable_objects(struct rev_info *revs, int 
mark_reflog,

bitmap_git = prepare_bitmap_walk(revs);
if (bitmap_git) {
-   traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
+   /*
+* reachable objects are marked as SEEN
+* by prepare_bitmap_walk(revs).
+*/
free_bitmap_index(bitmap_git);
return;
}


Stalled ds/ branches (was What's cooking in git.git (Apr 2019, #03; Tue, 16))

2019-04-16 Thread Derrick Stolee
On 4/16/2019 9:19 AM, Junio C Hamano wrote:

> * ds/midx-expire-repack (2019-01-27) 10 commits
>  - midx: add test that 'expire' respects .keep files
>  - multi-pack-index: test expire while adding packs
>  - midx: implement midx_repack()
>  - multi-pack-index: prepare 'repack' subcommand
>  - multi-pack-index: implement 'expire' subcommand
>  - midx: refactor permutation logic and pack sorting
>  - midx: simplify computation of pack name lengths
>  - multi-pack-index: prepare for 'expire' subcommand
>  - Docs: rearrange subcommands for multi-pack-index
>  - repack: refactor pack deletion for future use
> 
>  "git multi-pack-index expire/repack" are new subcommands that
>  consult midx file and are used to drop unused pack files and
>  coalesce small pack files that are still in use.
> 
>  Comments?

I'll be back to work full-time next week, and I have some ideas
to update this command. Specifically, the 'repack' command can
be improved in a few ways. I will elaborate on-thread when I
send a new version.

> * ds/commit-graph-format-v2 (2019-01-29) 8 commits
>  - SQUASH : misnamed variables and style fix
>  - commit-graph: test verifying a corrupt v2 header
>  - commit-graph: implement file format version 2
>  - commit-graph: add --version= option
>  - commit-graph: create new version flags
>  - commit-graph: collapse parameters into flags
>  - commit-graph: return with errors during write
>  - Merge branch 'bc/sha-256' into ds/commit-graph-format-v2
> 
>  Introduce version 2 of the commit-graph format to correct
>  deficiency in the initial version.
> 
>  Needs update before merging to 'next'.

Sorry for the issues colliding with other topics. I'll send a
new version next week that will resolve the conflicts. Hopefully
I can do this without needing to update the base commit.

Thanks,
-Stolee




Re: [PATCH 07/14] parse-options: allow ll_callback with OPTION_CALLBACK

2019-04-16 Thread Derrick Stolee
On 4/16/2019 4:52 AM, Duy Nguyen wrote:
> On Mon, Apr 15, 2019 at 9:06 PM Derrick Stolee  wrote:
>>
>> On 1/26/2019 7:35 PM, Nguyễn Thái Ngọc Duy wrote:
>>> @@ -238,7 +249,10 @@ static enum parse_opt_result parse_short_opt(struct 
>>> parse_opt_ctx_t *p,
>>>   len++;
>>>   arg = xmemdupz(p->opt, len);
>>>   p->opt = p->opt[len] ? p->opt + len : NULL;
>>> - rc = (*numopt->callback)(numopt, arg, 0) ? (-1) : 0;
>>> + if (numopt->callback)
>>> + rc = (*numopt->callback)(numopt, arg, 0) ? (-1) : 0;
>>> + else
>>> + rc = (*numopt->ll_callback)(p, numopt, arg, 0);
>>>   free(arg);
>>>   return rc;
>>>   }
>>
>> Hi Duy,
>>
>> This "else" condition is unreachable. This block is only hit when we have a 
>> "-"
>> option, using OPT_NUMBER_CALLBACK, which is implemented by filling 
>> "callback", never
>> "ll_callback".
> 
> It does not mean ll_callback cannot be used in the future though.

That's not a very good reason to add it now. YAGNI.

> We
> have three options
> 
> 1. drop the else clause
> 2. replace with "else BUG();"
> 3. implement proper else clause
> 
> Option #1 to me sounds wrong. If you don't support something, yell up.
> Silently ignoring it only makes it harder to track down to this
> unsupported location when it becomes reachable, however unlikely that
> is.
> 
> Which leaves options #2 and #3. If you think this one line is risky
> enough, I'll send a patch to replace it with BUG().

It's not about risk, but the fact that it is pointless. The only way to get to
this block is to create a 'struct option' with type OPTION_NUMBER manually
(ignoring the OPT_NUMBER_CALLBACK macro), which _should_ be unsupported. If
someone goes to the pain of adding a way to instantiate with a low-level 
callback,
then they should add this 'if' statement.

If you are going to add protection from an incorrect instantiation of 'callback'
in a use of OPT_NUMBER_CALLBACK, then the proper place to do that is probably in
parse_options_check(), where you were already doing callback-vs-ll_callback 
checks
in the case of OPTION_CALLBACK and OPTION_LOWLEVEL_CALLBACK.

However, the only places where the OPT_NUMBER_CALLBACK option appears are
builtin/grep.c and t/helper/test-parse-options.c. I don't imagine a need to
add low-level callbacks any time soon.

>> I recommend reverting this diff segment, but please let me know if I'm 
>> missing something.

I still think this is the easiest way to remove the dead code.

I'll try to submit a patch later. I'm still not in full work mode, so I may be 
slow.

Thanks,
-Stolee



Re: [PATCH 2/3] prune: use bitmaps for reachability traversal

2019-04-15 Thread Derrick Stolee
On 2/13/2019 11:37 PM, Jeff King wrote:
> +static void *lookup_object_by_type(struct repository *r,
> +const struct object_id *oid,
> +enum object_type type)
> +{
> + switch (type) {
> + case OBJ_COMMIT:
> + return lookup_commit(r, oid);
> + case OBJ_TREE:
> + return lookup_tree(r, oid);
> + case OBJ_TAG:
> + return lookup_tag(r, oid);
> + case OBJ_BLOB:
> + return lookup_blob(r, oid);
> + default:
> + die("BUG: unknown object type %d", type);
> + }
> +}
> +
> +static int mark_object_seen(const struct object_id *oid,
> +  enum object_type type,
> +  int exclude,
> +  uint32_t name_hash,
> +  struct packed_git *found_pack,
> +  off_t found_offset)
> +{
> + struct object *obj = lookup_object_by_type(the_repository, oid, type);
> + if (!obj)
> + die("unable to create object '%s'", oid_to_hex(oid));
> +
> + obj->flags |= SEEN;
> + return 0;
> +}
> +
>  void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
>   timestamp_t mark_recent, struct progress *progress)
>  {
>   struct connectivity_progress cp;
> + struct bitmap_index *bitmap_git;
>  
>   /*
>* Set up revision parsing, and mark us as being interested
> @@ -188,6 +223,13 @@ void mark_reachable_objects(struct rev_info *revs, int 
> mark_reflog,
>   cp.progress = progress;
>   cp.count = 0;
>  
> + bitmap_git = prepare_bitmap_walk(revs);
> + if (bitmap_git) {
> + traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
> + free_bitmap_index(bitmap_git);
> + return;
> + }
> +

Peff,

This block after "if (bitmap_git)" is not exercised by the (non-performance)
test suite, so the rest of the code above is not tested, either. Could a test
of this "prune" capability be added to the regression tests around the bitmaps?

Thanks,
-Stolee


Re: [PATCH v6 2/2] config: allow giving separate author and committer idents

2019-04-15 Thread Derrick Stolee
On 2/5/2019 2:52 PM, Ævar Arnfjörð Bjarmason wrote:
> From: William Hubbs 
> -const char *fmt_name(const char *name, const char *email)
> +const char *fmt_name(enum want_ident whose_ident)
>  {
> - return fmt_ident(name, email, NULL, IDENT_STRICT | IDENT_NO_DATE);
> + char *name = NULL;
> + char *email = NULL;
> +
> + switch (whose_ident) {
> + case WANT_BLANK_IDENT:
> + break;
> + case WANT_AUTHOR_IDENT:
> + name = getenv("GIT_AUTHOR_NAME");
> + email = getenv("GIT_AUTHOR_EMAIL");
> + break;
> + case WANT_COMMITTER_IDENT:
> + name = getenv("GIT_COMMITTER_NAME");
> + email = getenv("GIT_COMMITTER_EMAIL");
> + break;
> + }
> + return fmt_ident(name, email, whose_ident, NULL,
> + IDENT_STRICT | IDENT_NO_DATE);
>  }

William and Ævar,

The "WANT_AUTHOR_IDENT" block of this switch statement does not
appear to be hit by any tests, despite the tests included in this
patch. My guess is that it is ignored because we have the following
code in builtin/commit.c:

static void determine_author_info(struct strbuf *author_ident)
{
char *name, *email, *date;
struct ident_split author;

name = xstrdup_or_null(getenv("GIT_AUTHOR_NAME"));
email = xstrdup_or_null(getenv("GIT_AUTHOR_EMAIL"));
date = xstrdup_or_null(getenv("GIT_AUTHOR_DATE"));
...


This is likely overriding the need to use fmt_name. Should we
de-duplicate this use of the environment variable by using your
new method at this spot in builtin/commit.c?

Thanks,
-Stolee


Re: [PATCH 07/14] parse-options: allow ll_callback with OPTION_CALLBACK

2019-04-15 Thread Derrick Stolee
On 1/26/2019 7:35 PM, Nguyễn Thái Ngọc Duy wrote:
> @@ -238,7 +249,10 @@ static enum parse_opt_result parse_short_opt(struct 
> parse_opt_ctx_t *p,
>   len++;
>   arg = xmemdupz(p->opt, len);
>   p->opt = p->opt[len] ? p->opt + len : NULL;
> - rc = (*numopt->callback)(numopt, arg, 0) ? (-1) : 0;
> + if (numopt->callback)
> + rc = (*numopt->callback)(numopt, arg, 0) ? (-1) : 0;
> + else
> + rc = (*numopt->ll_callback)(p, numopt, arg, 0);
>   free(arg);
>   return rc;
>   }

Hi Duy,

This "else" condition is unreachable. This block is only hit when we have a 
"-"
option, using OPT_NUMBER_CALLBACK, which is implemented by filling "callback", 
never
"ll_callback".

I recommend reverting this diff segment, but please let me know if I'm missing 
something.

Thanks,
-Stolee


Re: straw poll: git merge conference location

2019-03-14 Thread Derrick Stolee
On 3/13/2019 4:55 PM, Jeff King wrote:
> We're looking at doing it in North America, but there are two specific
> questions:
> 
>   - is there preference between East Coast vs West Coast?

I have no preference here.

>   - preferences between Canada and US?

There should be serious consideration for Canada only because visa issues
may prevent some participants from traveling into the US. I know that
some academic conferences have moved from US to Canada (even after already
being scheduled for a US location) due to recent changes and general
volitility in visa policies. Perhaps I'm making a big deal about something
that doesn't affect any contributors that would join us, so I hope that
anyone concerned about this can speak up (or contact Peff privately).

Thanks,
-Stolee


Re: [PATCH 2/4] revision walk: optionally use sparse reachability

2019-03-12 Thread Derrick Stolee
On 3/12/2019 9:18 AM, Nathaniel Filardo wrote:
> The only caller that passes a non-zero value to prepare_revision_walk
> after this patch is builtin/pack-objects.  Without this, sparsity seems
> to do little good therein, as prepare_revision_walk will densely
> propagate UNINTERESTING flags from trees to tree contents, before
> mark_edges_uninteresting has a chance to be faster by being sparse.
> 
> Signed-off-by: Nathaniel Filardo 
> ---
>  bisect.c |  2 +-
>  blame.c  |  2 +-
>  builtin/checkout.c   |  2 +-
>  builtin/commit.c |  2 +-
>  builtin/describe.c   |  2 +-
>  builtin/fast-export.c|  2 +-
>  builtin/fmt-merge-msg.c  |  2 +-
>  builtin/log.c| 10 +-
>  builtin/merge.c  |  2 +-
>  builtin/pack-objects.c   |  4 ++--
>  builtin/rev-list.c   |  2 +-
>  builtin/shortlog.c   |  2 +-
>  bundle.c |  2 +-
>  http-push.c  |  2 +-
>  merge-recursive.c|  2 +-
>  pack-bitmap-write.c  |  2 +-
>  pack-bitmap.c|  4 ++--
>  reachable.c  |  4 ++--
>  ref-filter.c |  2 +-
>  remote.c |  2 +-
>  revision.c   | 10 ++
>  revision.h   |  2 +-
>  sequencer.c  |  6 +++---
>  shallow.c|  2 +-
>  submodule.c  |  4 ++--
>  t/helper/test-revision-walking.c |  2 +-
>  26 files changed, 41 insertions(+), 39 deletions(-)

This patch is very noisy. Perhaps the pattern established in 4f6d26b1:
"list-objects: consume sparse tree walk" should be reversed and instead
include a 'sparse_tree_walk' setting into 'struct rev_info'.

Changing so many method prototypes is rather invasive and unlikely to
benefit many of these callers.

If the setting is added to 'struct rev_info', then you'll want to remove
the parameter from mark_edges_uninteresting().

Thanks,
-Stolee


Re: [PATCH 3/4] repack: add --sparse and pass to pack-objects

2019-03-12 Thread Derrick Stolee
On 3/12/2019 9:18 AM, Nathaniel Filardo wrote:
> The sparse connectivity algorithm saves a whole lot of time when there
> are UNINTERESTING trees around.

Interesting! Do you have some performance numbers to include with
this statement?
> @@ -48,6 +49,10 @@ static int repack_config(const char *var, const char 
> *value, void *cb)
>   use_delta_islands = git_config_bool(var, value);
>   return 0;
>   }
> + if (!strcmp(var, "pack.usesparse")) {
> + sparse = git_config_bool(var, value);
> + return 0;
> + }

This part is not handled inside of `pack-objects`. Since you are not
sending '--no-sparse' when the variable 'sparse' is zero, the config
setting will automatically be picked up by the pack-objects builtin.

Now, a question of whether you _should_ allow the '--no-sparse' option
in the 'repack' command, and send it along to the inner command (when
it is present) is another question.

> @@ -366,6 +374,8 @@ int cmd_repack(int argc, const char **argv, const char 
> *prefix)
>   argv_array_push(&cmd.args, "--all");
>   argv_array_push(&cmd.args, "--reflog");
>   argv_array_push(&cmd.args, "--indexed-objects");
> + if (sparse)
> + argv_array_push(&cmd.args, "--sparse");
>   if (repository_format_partial_clone)
>   argv_array_push(&cmd.args, "--exclude-promisor-objects");
>   if (write_bitmaps)
> 

How about a test with this new option? You can probably just add to
t5322-pack-objects-sparse.sh.

Thanks,
-Stolee


[PATCH v7 10/15] trace2:data: pack-objects: add trace2 regions

2019-02-22 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When studying the performance of 'git push' we would like to know
how much time is spent at various parts of the command. One area
that could cause performance trouble is 'git pack-objects'.

Add trace2 regions around the three main actions taken in this
command:

1. Enumerate objects.
2. Prepare pack.
3. Write pack-file.

Signed-off-by: Derrick Stolee 
Signed-off-by: Jeff Hostetler 
---
 builtin/pack-objects.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0a70d04604..8a64c2868e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -33,6 +33,7 @@
 #include "object-store.h"
 #include "dir.h"
 #include "midx.h"
+#include "trace2.h"
 
 #define IN_PACK(obj) oe_in_pack(&to_pack, obj)
 #define SIZE(obj) oe_size(&to_pack, obj)
@@ -3472,6 +3473,8 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
}
}
 
+   trace2_region_enter("pack-objects", "enumerate-objects",
+   the_repository);
prepare_packing_data(the_repository, &to_pack);
 
if (progress)
@@ -3486,12 +3489,23 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
if (include_tag && nr_result)
for_each_ref(add_ref_tag, NULL);
stop_progress(&progress_state);
+   trace2_region_leave("pack-objects", "enumerate-objects",
+   the_repository);
 
if (non_empty && !nr_result)
return 0;
-   if (nr_result)
+   if (nr_result) {
+   trace2_region_enter("pack-objects", "prepare-pack",
+   the_repository);
prepare_pack(window, depth);
+   trace2_region_leave("pack-objects", "prepare-pack",
+   the_repository);
+   }
+
+   trace2_region_enter("pack-objects", "write-pack-file", the_repository);
write_pack_file();
+   trace2_region_leave("pack-objects", "write-pack-file", the_repository);
+
if (progress)
fprintf_ln(stderr,
   _("Total %"PRIu32" (delta %"PRIu32"),"
-- 
gitgitgadget



Re: How to get next commit that just after a specified commit

2019-02-18 Thread Derrick Stolee
On 2/13/2019 4:36 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Feb 13 2019, Junio C Hamano wrote:
> 
>> Ævar Arnfjörð Bjarmason  writes:
> 
> Their point in comparing it to git is that there's nothing intuitive in
> the UI that exposes this information for the *current* graph, whereas in
> fossil this is a built-in command:
> https://fossil-scm.org/xfer/help/descendants
> 
> Perhaps this information is cheaper to extract these days with the
> commit graph and we could expose it somehow?

I believe the recent changes (collected in commit-reach.c) that improve
'git branch --contains' or 'git tag --contains' are important in helping
this descendants list. There is likely more work to be done to show this
list in a rev-list call.

Thanks,
-Stolee 


[PATCH v6 10/15] trace2:data: pack-objects: add trace2 regions

2019-02-06 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When studying the performance of 'git push' we would like to know
how much time is spent at various parts of the command. One area
that could cause performance trouble is 'git pack-objects'.

Add trace2 regions around the three main actions taken in this
command:

1. Enumerate objects.
2. Prepare pack.
3. Write pack-file.

Signed-off-by: Derrick Stolee 
Signed-off-by: Jeff Hostetler 
---
 builtin/pack-objects.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0a70d04604..8a64c2868e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -33,6 +33,7 @@
 #include "object-store.h"
 #include "dir.h"
 #include "midx.h"
+#include "trace2.h"
 
 #define IN_PACK(obj) oe_in_pack(&to_pack, obj)
 #define SIZE(obj) oe_size(&to_pack, obj)
@@ -3472,6 +3473,8 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
}
}
 
+   trace2_region_enter("pack-objects", "enumerate-objects",
+   the_repository);
prepare_packing_data(the_repository, &to_pack);
 
if (progress)
@@ -3486,12 +3489,23 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
if (include_tag && nr_result)
for_each_ref(add_ref_tag, NULL);
stop_progress(&progress_state);
+   trace2_region_leave("pack-objects", "enumerate-objects",
+   the_repository);
 
if (non_empty && !nr_result)
return 0;
-   if (nr_result)
+   if (nr_result) {
+   trace2_region_enter("pack-objects", "prepare-pack",
+   the_repository);
prepare_pack(window, depth);
+   trace2_region_leave("pack-objects", "prepare-pack",
+   the_repository);
+   }
+
+   trace2_region_enter("pack-objects", "write-pack-file", the_repository);
write_pack_file();
+   trace2_region_leave("pack-objects", "write-pack-file", the_repository);
+
if (progress)
fprintf_ln(stderr,
   _("Total %"PRIu32" (delta %"PRIu32"),"
-- 
gitgitgadget



Git Test Coverage Report (Wednesday, Feb. 6)

2019-02-06 Thread Derrick Stolee
s:hook hook:%s",
3c543ab3 314) strbuf_addstr(&buf_payload, " cd:");
3c543ab3 315) sq_quote_buf_pretty(&buf_payload, cmd->dir);
3c543ab3 320) strbuf_addstr(&buf_payload, " git");
3c543ab3 342) static void fn_thread_start_fl(const char *file, int line,
3c543ab3 345) const char *event_name = "thread_start";
3c543ab3 346) struct strbuf buf_payload = STRBUF_INIT;
3c543ab3 348) perf_io_write_fl(file, line, event_name, NULL, 
&us_elapsed_absolute,
3c543ab3 350) strbuf_release(&buf_payload);
3c543ab3 351) }
3c543ab3 353) static void fn_thread_exit_fl(const char *file, int line,
3c543ab3 357) const char *event_name = "thread_exit";
3c543ab3 358) struct strbuf buf_payload = STRBUF_INIT;
3c543ab3 360) perf_io_write_fl(file, line, event_name, NULL, 
&us_elapsed_absolute,
3c543ab3 362) strbuf_release(&buf_payload);
3c543ab3 363) }
3c543ab3 365) static void fn_exec_fl(const char *file, int line, uint64_t 
us_elapsed_absolute,
3c543ab3 368) const char *event_name = "exec";
3c543ab3 369) struct strbuf buf_payload = STRBUF_INIT;
3c543ab3 371) strbuf_addf(&buf_payload, "id:%d ", exec_id);
3c543ab3 372) strbuf_addstr(&buf_payload, "argv:");
3c543ab3 373) if (exe)
3c543ab3 374) strbuf_addf(&buf_payload, " %s", exe);
3c543ab3 375) sq_quote_argv_pretty(&buf_payload, argv);
3c543ab3 377) perf_io_write_fl(file, line, event_name, NULL, 
&us_elapsed_absolute,
3c543ab3 379) strbuf_release(&buf_payload);
3c543ab3 380) }
3c543ab3 382) static void fn_exec_result_fl(const char *file, int line,
3c543ab3 386) const char *event_name = "exec_result";
3c543ab3 387) struct strbuf buf_payload = STRBUF_INIT;
3c543ab3 389) strbuf_addf(&buf_payload, "id:%d code:%d", exec_id, code);
3c543ab3 390) if (code > 0)
3c543ab3 391) strbuf_addf(&buf_payload, " err:%s", strerror(code));
3c543ab3 393) perf_io_write_fl(file, line, event_name, NULL, 
&us_elapsed_absolute,
3c543ab3 395) strbuf_release(&buf_payload);
3c543ab3 396) }
3c543ab3 398) static void fn_param_fl(const char *file, int line, const char 
*param,
3c543ab3 401) const char *event_name = "def_param";
3c543ab3 402) struct strbuf buf_payload = STRBUF_INIT;
3c543ab3 404) strbuf_addf(&buf_payload, "%s:%s", param, value);
3c543ab3 406) perf_io_write_fl(file, line, event_name, NULL, NULL, NULL, NULL,
3c543ab3 408) strbuf_release(&buf_payload);
3c543ab3 409) }
3c543ab3 411) static void fn_repo_fl(const char *file, int line,
3c543ab3 414) const char *event_name = "def_repo";
3c543ab3 415) struct strbuf buf_payload = STRBUF_INIT;
3c543ab3 417) strbuf_addstr(&buf_payload, "worktree:");
3c543ab3 418) sq_quote_buf_pretty(&buf_payload, repo->worktree);
3c543ab3 420) perf_io_write_fl(file, line, event_name, repo, NULL, NULL, NULL,
3c543ab3 422) strbuf_release(&buf_payload);
3c543ab3 423) }
3c543ab3 425) static void fn_region_enter_printf_va_fl(const char *file, int 
line,
3c543ab3 432) const char *event_name = "region_enter";
3c543ab3 433) struct strbuf buf_payload = STRBUF_INIT;
3c543ab3 435) if (label)
3c543ab3 436) strbuf_addf(&buf_payload, "label:%s ", label);
3c543ab3 437) maybe_append_string_va(&buf_payload, fmt, ap);
3c543ab3 439) perf_io_write_fl(file, line, event_name, repo, 
&us_elapsed_absolute,
3c543ab3 441) strbuf_release(&buf_payload);
3c543ab3 442) }
3c543ab3 444) static void fn_region_leave_printf_va_fl(
3c543ab3 449) const char *event_name = "region_leave";
3c543ab3 450) struct strbuf buf_payload = STRBUF_INIT;
3c543ab3 452) if (label)
3c543ab3 453) strbuf_addf(&buf_payload, "label:%s ", label);
3c543ab3 454) maybe_append_string_va(&buf_payload, fmt, ap);
3c543ab3 456) perf_io_write_fl(file, line, event_name, repo, 
&us_elapsed_absolute,
3c543ab3 458) strbuf_release(&buf_payload);
3c543ab3 459) }
3c543ab3 461) static void fn_data_fl(const char *file, int line, uint64_t 
us_elapsed_absolute,
3c543ab3 466) const char *event_name = "data";
3c543ab3 467) struct strbuf buf_payload = STRBUF_INIT;
3c543ab3 469) strbuf_addf(&buf_payload, "%s:%s", key, value);
3c543ab3 471) perf_io_write_fl(file, line, event_name, repo, 
&us_elapsed_absolute,
3c543ab3 473) strbuf_release(&buf_payload);
3c543ab3 474) }
3c543ab3 476) static void fn_data_json_fl(const char *file, int line,
3c543ab3 482) const char *event_name = "data_json";
3c543ab3 483) struct strbuf buf_payload = STRBUF_INIT;
3c543ab3 485) strbuf_addf(&buf_payload, "%s:%s", key, value->json.buf);
3c543ab3 487) perf_io_write_fl(file, line, event_name, repo, 
&us_elapsed_absolute,
3c543ab3 489) strbuf_release(&buf_payload);
3c543ab3 490) }
3c543ab3 492) static void fn_printf_va_fl(const char *file, int line,
3c543ab3 496) const char *event_name = "printf";
3

[PATCH v5 10/15] trace2:data: pack-objects: add trace2 regions

2019-02-01 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When studying the performance of 'git push' we would like to know
how much time is spent at various parts of the command. One area
that could cause performance trouble is 'git pack-objects'.

Add trace2 regions around the three main actions taken in this
command:

1. Enumerate objects.
2. Prepare pack.
3. Write pack-file.

Signed-off-by: Derrick Stolee 
Signed-off-by: Jeff Hostetler 
---
 builtin/pack-objects.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0a70d04604..8a64c2868e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -33,6 +33,7 @@
 #include "object-store.h"
 #include "dir.h"
 #include "midx.h"
+#include "trace2.h"
 
 #define IN_PACK(obj) oe_in_pack(&to_pack, obj)
 #define SIZE(obj) oe_size(&to_pack, obj)
@@ -3472,6 +3473,8 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
}
}
 
+   trace2_region_enter("pack-objects", "enumerate-objects",
+   the_repository);
prepare_packing_data(the_repository, &to_pack);
 
if (progress)
@@ -3486,12 +3489,23 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
if (include_tag && nr_result)
for_each_ref(add_ref_tag, NULL);
stop_progress(&progress_state);
+   trace2_region_leave("pack-objects", "enumerate-objects",
+   the_repository);
 
if (non_empty && !nr_result)
return 0;
-   if (nr_result)
+   if (nr_result) {
+   trace2_region_enter("pack-objects", "prepare-pack",
+   the_repository);
prepare_pack(window, depth);
+   trace2_region_leave("pack-objects", "prepare-pack",
+   the_repository);
+   }
+
+   trace2_region_enter("pack-objects", "write-pack-file", the_repository);
write_pack_file();
+   trace2_region_leave("pack-objects", "write-pack-file", the_repository);
+
if (progress)
fprintf_ln(stderr,
   _("Total %"PRIu32" (delta %"PRIu32"),"
-- 
gitgitgadget



[PATCH v4 10/14] pack-objects: add trace2 regions

2019-01-30 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When studying the performance of 'git push' we would like to know
how much time is spent at various parts of the command. One area
that could cause performance trouble is 'git pack-objects'.

Add trace2 regions around the three main actions taken in this
command:

1. Enumerate objects.
2. Prepare pack.
3. Write pack-file.

Signed-off-by: Derrick Stolee 
Signed-off-by: Jeff Hostetler 
---
 builtin/pack-objects.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0a70d04604..8a64c2868e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -33,6 +33,7 @@
 #include "object-store.h"
 #include "dir.h"
 #include "midx.h"
+#include "trace2.h"
 
 #define IN_PACK(obj) oe_in_pack(&to_pack, obj)
 #define SIZE(obj) oe_size(&to_pack, obj)
@@ -3472,6 +3473,8 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
}
}
 
+   trace2_region_enter("pack-objects", "enumerate-objects",
+   the_repository);
prepare_packing_data(the_repository, &to_pack);
 
if (progress)
@@ -3486,12 +3489,23 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
if (include_tag && nr_result)
for_each_ref(add_ref_tag, NULL);
stop_progress(&progress_state);
+   trace2_region_leave("pack-objects", "enumerate-objects",
+   the_repository);
 
if (non_empty && !nr_result)
return 0;
-   if (nr_result)
+   if (nr_result) {
+   trace2_region_enter("pack-objects", "prepare-pack",
+   the_repository);
prepare_pack(window, depth);
+   trace2_region_leave("pack-objects", "prepare-pack",
+   the_repository);
+   }
+
+   trace2_region_enter("pack-objects", "write-pack-file", the_repository);
write_pack_file();
+   trace2_region_leave("pack-objects", "write-pack-file", the_repository);
+
if (progress)
fprintf_ln(stderr,
   _("Total %"PRIu32" (delta %"PRIu32"),"
-- 
gitgitgadget



[PATCH v3 10/14] pack-objects: add trace2 regions

2019-01-30 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When studying the performance of 'git push' we would like to know
how much time is spent at various parts of the command. One area
that could cause performance trouble is 'git pack-objects'.

Add trace2 regions around the three main actions taken in this
command:

1. Enumerate objects.
2. Prepare pack.
3. Write pack-file.

Signed-off-by: Derrick Stolee 
Signed-off-by: Jeff Hostetler 
---
 builtin/pack-objects.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 0a70d04604..8a64c2868e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -33,6 +33,7 @@
 #include "object-store.h"
 #include "dir.h"
 #include "midx.h"
+#include "trace2.h"
 
 #define IN_PACK(obj) oe_in_pack(&to_pack, obj)
 #define SIZE(obj) oe_size(&to_pack, obj)
@@ -3472,6 +3473,8 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
}
}
 
+   trace2_region_enter("pack-objects", "enumerate-objects",
+   the_repository);
prepare_packing_data(the_repository, &to_pack);
 
if (progress)
@@ -3486,12 +3489,23 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
if (include_tag && nr_result)
for_each_ref(add_ref_tag, NULL);
stop_progress(&progress_state);
+   trace2_region_leave("pack-objects", "enumerate-objects",
+   the_repository);
 
if (non_empty && !nr_result)
return 0;
-   if (nr_result)
+   if (nr_result) {
+   trace2_region_enter("pack-objects", "prepare-pack",
+   the_repository);
prepare_pack(window, depth);
+   trace2_region_leave("pack-objects", "prepare-pack",
+   the_repository);
+   }
+
+   trace2_region_enter("pack-objects", "write-pack-file", the_repository);
write_pack_file();
+   trace2_region_leave("pack-objects", "write-pack-file", the_repository);
+
if (progress)
fprintf_ln(stderr,
   _("Total %"PRIu32" (delta %"PRIu32"),"
-- 
gitgitgadget



Re: [PATCH 1/1] Makefile: add prove and coverage-prove targets

2019-01-29 Thread Derrick Stolee
On 1/29/2019 1:10 PM, Derrick Stolee wrote:
> On 1/29/2019 12:34 PM, SZEDER Gábor wrote:
>> On Tue, Jan 29, 2019 at 04:58:27PM +0100, SZEDER Gábor wrote:
>> And in the related email discussion [1]:
>>
>>   But even though the docs claim it [-j] should be possible,
>>   I've been getting "random" test failures when compiled with coverage
>>   support, that went away with -j1.  So the tests still run with -j1, as
>>   with the first version of the series.
>>
>> So it doesn't seem to be that bad after all, because it's not
>> "completely breaks" but "random test failures".  Still far from ideal,
>> but the original coverage patch is just about 3 weeks short of a
>> decade old, so maybe things have improved since then, and it'd be
>> worth a try to leave GIT_PROVE_OPTS as is and see what happens.
> 
> It would certainly be nice if the build time could be reduced through
> parallel test runs. I've kicked off a build using GIT_PROVE_OPTS="-j12"
> to see what happens.

I did get a failed test with this run:

t0025-crlf-renormalize.sh (Wstat: 256 Tests: 3 Failed: 1)
  Failed test:  2
  Non-zero exit status: 1

This was on the 'jch' branch, and an equivalent build with sequential
execution did not have this failure. That's flaky enough for me to stick
to sequential runs.

Thanks,
-Stolee
 


Re: Contributor Summit Topics and Logistics

2019-01-29 Thread Derrick Stolee
I was hoping to attend the contributors' summit remotely, but now my leave is
starting before then. This email contains a summary of what I would have
added to the discussion.

Thanks,
-Stolee


Commit-Graph Status Report
==

I'm really happy with the progress in this area, especially with the number of
other contributors working on the feature! Thanks Ævar, Jonathan, Josh, Stefan,
and Szeder in particular.

Here are some directions to take the feature in the near future:

File Format v2
--

The new format version [1] specifically fixes some shortcomings in v1:

* Uses the 4-byte format id for the hash algorithm.
* Creates a separate version byte for the reachability index.
* Enforces that the unused byte is zero until we use it for incremental writes.

Hopefully, this is the last time we need to update the file header.

[1] https://public-inbox.org/git/pull.112.git.gitgitgad...@gmail.com/
[PATCH 0/6] Create commit-graph file format v2

Reachability Index
--

As discussed on-list [2], we want to replace generation numbers with a different
(negative-cut) reachability index. I used the term "corrected commit date". The
definition is:

* If a commit has no parents, then its corrected commit date is its commit date.

* If a commit has parents, then its corrected commit date is the maximum of:
- its commit date
- one more than the maximum corrected commit date of its parents

The benefits of this definition were discussed already, but to summarize:

* This definition will work _at least as well_ as the commit date heuristic,
  with the added bonus of being absolutely sure our results are right. We can
  update algorithms like paint_down_to_common() to use this reachability index
  without performance problems in some cases.

* If someone creates a terrible commit with a date that is far in the future,
  this definition is no worse than existing generation numbers (because we
  enforce that the corrected commit date is strictly larger than the parents'
  corrected commit date).

To implement this index, we can re-use the 30 bits per commit in the
commit-graph file that are used for generation numbers, but use them instead
for the difference between the corrected commit date and the actual commit
date. File format v2 gives us a version value that can be incremented to signal
the change in meaning.

Some work is required to adjust the existing generation-number-aware algorithms
to care about an "arbitrary" reachability index. It could be as easy as a
helper function that returns a function pointer to the proper compare function.

If someone wants to move forward on this topic while I'm gone, please
volunteer. Otherwise, this will be among my first items to work on when I
return from leave.

[2] https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51ef...@gmail.com/
[RFC] Generation Number v2

Incremental Writes
--

Similar to the split index, an incremental commit-graph file can be implemented
to reduce the write time when adding commits to an existing (large)
commit-graph. In this case, the .git/objects/info/commit-graph file would
be small, and have a pointer to a base file, say "cgraph-.cgraph", that
contains the majority of the commits.

The important thing to keep in mind here is that we use integers to refer to
a commit's parents. This integer would need to refer to the order of commits
when you concatenate the orderd lists from each file. When doing this, we
can point into the base file as well as the tip file. Since the base
commit-graph file would be closed under reachability, it only needs to care
about commits in its file.

It is also possible to have multiple base files, and we can use the unused
byte in the commit-graph file format v2 to store the number of base files.
We can then store a list of file names in a new chunk, presenting the ordered
list of base files. We still want to keep this list short, but there may be
benefits to a variable number. I expect the first version would limit the
construction to one base file for simplicity's sake.

When this is implemented, we can use it to write the commit-graph at fetch
time. A config setting, say 'fetch.writeCommitGraph', could enable this write.
Since most writes would add a small number of commits compared to the large
base file, this would be a more reasonable cost to add to a fetch. Since we
verify the pack upon download, the commits it contained will already be in
the memory cache and we won't need to re-parse those commits.

Volunteers welcome.

Bloom Filters
-

Using bloom filters to speed up file history has been discussed and prototyped
on-list (see [12] and the thread before it). Thanks for lots of contributions
in this area! A lot of people have shown an interest in this feature, and it
is particularly helpful with server-side queries.

Any implementation here should check that it is helping 'git blame' as much as
it can [13]. It's entirely 

Re: [PATCH 1/1] Makefile: add prove and coverage-prove targets

2019-01-29 Thread Derrick Stolee
On 1/29/2019 12:34 PM, SZEDER Gábor wrote:
> On Tue, Jan 29, 2019 at 04:58:27PM +0100, SZEDER Gábor wrote:
> And in the related email discussion [1]:
> 
>   But even though the docs claim it [-j] should be possible,
>   I've been getting "random" test failures when compiled with coverage
>   support, that went away with -j1.  So the tests still run with -j1, as
>   with the first version of the series.
> 
> So it doesn't seem to be that bad after all, because it's not
> "completely breaks" but "random test failures".  Still far from ideal,
> but the original coverage patch is just about 3 weeks short of a
> decade old, so maybe things have improved since then, and it'd be
> worth a try to leave GIT_PROVE_OPTS as is and see what happens.

It would certainly be nice if the build time could be reduced through
parallel test runs. I've kicked off a build using GIT_PROVE_OPTS="-j12"
to see what happens.

Thanks!
-Stolee


Git Test Coverage Report (Tue. January 29, 2019)

2019-01-29 Thread Derrick Stolee
Here is today's test coverage report.

Since the last report, I have overhauled the machinery for generating these
reports. The code is available on GitHub [1], and the reports are available
online in text [2] and HTML [3] form.

In addition to the two output formats, the code also has the ability to
ignore lines that are unimportant for coverage. This allows me to review
the report, and reduce the size of the next one. For this reason, I set
the comparison branch for 'master' to 'maint' so we can see what is
uncovered in that segment. If you wish to add ignored lines, then please
open a pull request against the GitHub repo [1].

One side effect is that some file names may appear with no uncovered code.
The ignored lines are visible by clicking a button on the HTML report [3].

The 'pu' versus 'jch' report looks rather large to me, but I did use prove
to run the tests, so the entire test suite did run (through failures). A
lot of the lines are related to trace2, so I'll add the GIT_TR2* variables
to my build when running with optional variables.

As always, I hope this is useful and I look forward to your feedback.

Thanks,
-Stolee

[1] https://github.com/derrickstolee/git-test-coverage
[2] https://derrickstolee.github.io/git-test-coverage/reports/2019-01-29.txt
[3] https://derrickstolee.github.io/git-test-coverage/reports/2019-01-29.htm

---

pu  c7baf85843ec1dc8272483baa2f84d0ec66ad882
jch 2ea90524518a3dfcb3cda871ef2b1a739075e5d9
nextaa96b0ce6b0fa4fa4cc6870f1a3aff3878967bfa
master  16a465bc018d09e9d7bbbdc5f40a7fb99c21f8ef
maint   0d0ac3826a3bbb9247e39e12623bbcfdd722f24c


Uncovered code in 'pu' not in 'jch'


bisect.c
4f6d26b1 661) mark_edges_uninteresting(revs, NULL, 0);

blame.c
07d04b91 482) ent->s_lno + ent->num_lines == next->s_lno &&
07d04b91 483) ent->ignored == next->ignored) {
e7973c85 940) blame_origin_decref(e->suspect);
e7973c85 941) e->suspect = blame_origin_incref(parent);
e7973c85 942) e->s_lno += offset;
07d04b91 943) e->ignored = 1;
e7973c85 944) e->next = ignoredp;
b543bb1c 1497) same = 1;
b543bb1c 1503) blame_origin_decref(porigin);

builtin/blame.c
07d04b91 485) length--;
31653c1a 699) if (!value)
31653c1a 700) return config_error_nonbool(var);
a5481a6c 701) parse_date_format(value, &blame_date_mode);
31653c1a 702) return 0;
5817da01 777) return 0;
31653c1a 930) blame_date_width = sizeof("Thu, 19 Oct 2006 16:00:04 -0700");
31653c1a 931) break;
5817da01 998) usage_with_options(blame_opt_usage, options);

builtin/change.c
49854277 23) static int change_list(int argc, const char **argv, const char* 
prefix)
49854277 25) struct option options[] = {
49854277 31) struct ref_format format = REF_FORMAT_INIT;
49854277 35) argc = parse_options(argc, argv, prefix, options, 
builtin_list_usage, 0);
49854277 37) setup_ref_filter_porcelain_msg();
49854277 39) memset(&filter, 0, sizeof(filter));
49854277 40) memset(&array, 0, sizeof(array));
49854277 42) filter.kind = FILTER_REFS_CHANGES;
49854277 43) filter.name_patterns = argv;
49854277 45) filter_refs(&array, &filter, FILTER_REFS_CHANGES);
49854277 52) if (!format.format) {
49854277 53) format.format = "%(refname:lstrip=1)";
49854277 56) if (verify_ref_format(&format))
49854277 57) die(_("unable to parse format string"));
49854277 59) for (i = 0; i < array.nr; i++) {
49854277 60) show_ref_array_item(array.items[i], &format);
49854277 63) ref_array_clear(&array);
49854277 65) return 0;
73f8829d 76) static void init_update_state(struct update_state *state)
73f8829d 78) memset(state, 0, sizeof(*state));
73f8829d 79) state->content = "HEAD";
73f8829d 80) string_list_init(&state->replace, 0);
73f8829d 81) string_list_init(&state->origin, 0);
73f8829d 82) }
73f8829d 84) static void clear_update_state(struct update_state *state)
73f8829d 86) string_list_clear(&state->replace, 0);
73f8829d 87) string_list_clear(&state->origin, 0);
73f8829d 88) }
73f8829d 90) static int update_option_parse_replace(const struct option *opt,
73f8829d 93) struct update_state *state = opt->value;
73f8829d 94) string_list_append(&state->replace, arg);
73f8829d 95) return 0;
73f8829d 98) static int update_option_parse_origin(const struct option *opt,
73f8829d 101) struct update_state *state = opt->value;
73f8829d 102) string_list_append(&state->origin, arg);
73f8829d 103) return 0;
73f8829d 106) static int resolve_commit(const char *committish, struct 
object_id *result)
73f8829d 109) if (get_oid_committish(committish, result))
73f8829d 110) die(_("Failed to resolve '%s' as a valid revision."), committish);
73f8829d 111) commit = lookup_commit_reference(the_repository, result);
73f8829d 112) if (!commit)
73f8829d 113) die(_("Could not parse object '%s'."), committish);
73f8829d 114) oidcpy(result, &commit->object.oid);
73f8829d 115) return 0;
73f8829d 118) static void resolve_commit_list(const struct string_list 
*commitsish_list,
73f8829d 122) for (i = 0; i < commitsish_list->nr; i++) {
73f8829d 123) s

[PATCH v2 1/1] Makefile: add coverage-prove target

2019-01-29 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Sometimes there are test failures in the 'pu' branch. This
is somewhat expected for a branch that takes the very latest
topics under development, and those sometimes have semantic
conflicts that only show up during test runs. This also can
happen when running the test suite with different GIT_TEST_*
environment variables that interact in unexpected ways

This causes a problem for the test coverage reports, as
the typical 'make coverage-test coverage-report' run halts
at the first failed test. If that test is early in the
suite, then many valuable tests are not exercising the code
and the coverage report becomes noisy with false positives.

Add a new 'coverage-prove' target to the Makefile,
modeled after the 'coverage-test' target. This compiles
the source using the coverage flags, then runs the test
suite using the 'prove' tool. Since the coverage
machinery is not thread-safe, enforce that the tests
are run in sequence by appending '-j1' to GIT_PROVE_OPTS.

Signed-off-by: Derrick Stolee 
---
 Makefile | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Makefile b/Makefile
index 1a44c811aa..23d8730482 100644
--- a/Makefile
+++ b/Makefile
@@ -3077,6 +3077,11 @@ coverage-test: coverage-clean-results coverage-compile
$(MAKE) CFLAGS="$(COVERAGE_CFLAGS)" LDFLAGS="$(COVERAGE_LDFLAGS)" \
DEFAULT_TEST_TARGET=test -j1 test
 
+coverage-prove: coverage-clean-results coverage-compile
+   $(MAKE) CFLAGS="$(COVERAGE_CFLAGS)" LDFLAGS="$(COVERAGE_LDFLAGS)" \
+   DEFAULT_TEST_TARGET=prove GIT_PROVE_OPTS="$(GIT_PROVE_OPTS) 
-j1" \
+   -j1 test
+
 coverage-report:
$(QUIET_GCOV)for dir in $(object_dirs); do \
$(GCOV) $(GCOVFLAGS) --object-directory=$$dir $$dir*.c || exit; 
\
-- 
gitgitgadget


[PATCH v2 0/1] Makefile: add prove and coverage-prove targets

2019-01-29 Thread Derrick Stolee via GitGitGadget
Sometimes there are test failures in the 'pu' branch. This is somewhat
expected for a branch that takes the very latest topics under development,
and those sometimes have semantic conflicts that only show up during test
runs. This also can happen when running the test suite with different
GIT_TEST_* environment variables that interact in unexpected ways.

This causes a problem for the test coverage reports, as the typical 'make
coverage-test coverage-report' run halts at the first failed test. If that
test is early in the suite, then many valuable tests are not exercising the
code and the coverage report becomes noisy with false positives.

This patch adds two targets to the Makefile: 'prove' and 'coverage-prove'.
The idea is to use the 'prove' tool to run the test suite, as it will track
failed tests but continue running the full suite even with a failure.

If/when this merges down, I will use this new target for the automation
around the test coverage reports.

Updates in V2:

 * Dropped the 'prove' target
   
   
 * Append '-j1' to GIT_PROVE_OPTS
   
   
 * Commit message tweaks.

Derrick Stolee (1):
  Makefile: add coverage-prove target

 Makefile | 5 +
 1 file changed, 5 insertions(+)


base-commit: 0d0ac3826a3bbb9247e39e12623bbcfdd722f24c
Published-As: 
https://github.com/gitgitgadget/git/releases/tag/pr-114%2Fderrickstolee%2Fcoverage-prove-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-114/derrickstolee/coverage-prove-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/114

Range-diff vs v1:

 1:  294187c696 < -:  -- Makefile: add prove and coverage-prove targets
 -:  -- > 1:  af810fad97 Makefile: add coverage-prove target

-- 
gitgitgadget


Re: [PATCH 1/1] Makefile: add prove and coverage-prove targets

2019-01-29 Thread Derrick Stolee
On 1/29/2019 10:58 AM, SZEDER Gábor wrote:
> On Tue, Jan 29, 2019 at 06:56:08AM -0800, Derrick Stolee via GitGitGadget 
> wrote:
>> +prove: all
>> +$(MAKE) -C t/ prove
>> +
> 
> You don't need this 'prove' target in the "main" Makefile, because
> 'make test' will run the test suite using DEFAULT_TEST_TARGET anyway.

Thanks!

>> +coverage-prove: coverage-clean-results coverage-compile
>> +$(MAKE) CFLAGS="$(COVERAGE_CFLAGS)" LDFLAGS="$(COVERAGE_LDFLAGS)" \
>> +DEFAULT_TEST_TARGET=prove -j1 prove
> 
> First I was wondering why do you need a dedicated 'coverage-prove'
> target, instead of letting DEFAULT_TEST_TARGET from the environment or
> from 'config.mak' do its thing.  But then I noticed in the hunk
> context, that, for some reason, the 'coverage-test' target hardcoded
> 'DEFAULT_TEST_TARGET=test -j1'.  Then I was wondering why would it
> want to do that, and stumbled upon commit c14cc77c11:
> 
> coverage: set DEFAULT_TEST_TARGET to avoid using prove
> 
> If the user sets DEFAULT_TEST_TARGET=prove in his config.mak, that
> carries over into the coverage tests.  Which is really bad if he also
> sets GIT_PROVE_OPTS=-j<..> as that completely breaks the coverage
> runs.
> 
> Instead of attempting to mess with the GIT_PROVE_OPTS, just force the
> test target to 'test' so that we run under make, like we intended all
> along.

Thanks for finding this!
 
> I'm afraid that this issue would badly affect 'coverage-prove' as well
> (I didn't try).  Or if doesn't (anymore?), then that should be
> mentioned in the commit message, and then perhaps it's time to remove
> that '-j1' from the 'coverage-test' target as well.

I'll fix this by forcing an update to GIT_PROVE_OPTS. It does limit our
ability to use GIT_PROVE_OPTS as a pass-through, but at least this new
target will have that assumption built in.

Thanks,
-Stolee


Re: [PATCH 1/1] Makefile: add prove and coverage-prove targets

2019-01-29 Thread Derrick Stolee
On 1/29/2019 11:00 AM, Jeff King wrote:
> On Tue, Jan 29, 2019 at 06:56:08AM -0800, Derrick Stolee via GitGitGadget 
> wrote:
> 
>> From: Derrick Stolee 
>>
>> When running the test suite for code coverage using
>> 'make coverage-test', a single test failure stops the
>> test suite from completing. This leads to significant
>> undercounting of covered blocks.
>>
>> Add two new targets to the Makefile:
>>
>> * 'prove' runs the test suite using 'prove'.
>>
>> * 'coverage-prove' compiles the source using the
>>   coverage flags, then runs the test suite using
>>   'prove'.
>>
>> These targets are modeled after the 'test' and
>> 'coverage-test' targets.
> 
> I think these are reasonable to have (and I personally much prefer
> "prove" to the raw "make test" output anyway).
> 
> For people who don't have "prove" available, I think they could just do
> "make -k test" to make sure the full suite runs. Should we perhaps be
> doing that automatically in the sub-make run by coverage-test?

I wanted to avoid changing the existing behavior, if I could. But, if
we can reasonably assume that anyone running 'make coverage-test' wants
to run the full suite even with failures, then that's fine by me.

I see from the make docs that '-k' will still result in an error code
at the end of the command, so no automation would result in an incorrect
response to a failed test. Am I correct?

>> @@ -3077,6 +3080,10 @@ coverage-test: coverage-clean-results coverage-compile
>>  $(MAKE) CFLAGS="$(COVERAGE_CFLAGS)" LDFLAGS="$(COVERAGE_LDFLAGS)" \
>>  DEFAULT_TEST_TARGET=test -j1 test
>>  
>> +coverage-prove: coverage-clean-results coverage-compile
>> +$(MAKE) CFLAGS="$(COVERAGE_CFLAGS)" LDFLAGS="$(COVERAGE_LDFLAGS)" \
>> +DEFAULT_TEST_TARGET=prove -j1 prove
>> +
> 
> You probably don't need to override DEFAULT_TEST_TARGET here, since the
> "prove" target doesn't look at it. Likewise, "-j1" probably does nothing
> here, since prove itself is a single target.

As Szeder mentioned, I can probably just drop the 'prove' target and use
DEFAULT_TEST_TARGET instead. Or do we think anyone will want to use
'make prove' from root?

> I'm not sure why we want to enforce -j1 for these targets, but if it's
> important to do so for the prove case, as well, you'd need to add it to
> GIT_PROVE_OPTS.

The '-j1' is necessary because the coverage data is collected in a way that
is not thread-safe. Our compile options also force single-threaded behavior.

I'll specifically override GIT_PROVE_OPTS here to force -j1, but also send
-j1 to the 'make' command, too.

Thanks,
-Stolee


[PATCH 1/1] Makefile: add prove and coverage-prove targets

2019-01-29 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When running the test suite for code coverage using
'make coverage-test', a single test failure stops the
test suite from completing. This leads to significant
undercounting of covered blocks.

Add two new targets to the Makefile:

* 'prove' runs the test suite using 'prove'.

* 'coverage-prove' compiles the source using the
  coverage flags, then runs the test suite using
  'prove'.

These targets are modeled after the 'test' and
'coverage-test' targets.

Signed-off-by: Derrick Stolee 
---
 Makefile | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/Makefile b/Makefile
index 1a44c811aa..ec886635ae 100644
--- a/Makefile
+++ b/Makefile
@@ -2665,6 +2665,9 @@ export TEST_NO_MALLOC_CHECK
 test: all
$(MAKE) -C t/ all
 
+prove: all
+   $(MAKE) -C t/ prove
+
 perf: all
$(MAKE) -C t/perf/ all
 
@@ -3077,6 +3080,10 @@ coverage-test: coverage-clean-results coverage-compile
$(MAKE) CFLAGS="$(COVERAGE_CFLAGS)" LDFLAGS="$(COVERAGE_LDFLAGS)" \
DEFAULT_TEST_TARGET=test -j1 test
 
+coverage-prove: coverage-clean-results coverage-compile
+   $(MAKE) CFLAGS="$(COVERAGE_CFLAGS)" LDFLAGS="$(COVERAGE_LDFLAGS)" \
+   DEFAULT_TEST_TARGET=prove -j1 prove
+
 coverage-report:
$(QUIET_GCOV)for dir in $(object_dirs); do \
$(GCOV) $(GCOVFLAGS) --object-directory=$$dir $$dir*.c || exit; 
\
-- 
gitgitgadget


[PATCH 0/1] Makefile: add prove and coverage-prove targets

2019-01-29 Thread Derrick Stolee via GitGitGadget
Sometimes there are test failures in the 'pu' branch. This is somewhat
expected for a branch that takes the very latest topics under development,
and those sometimes have semantic conflicts that only show up during test
runs. This also can happen when running the test suite with different
GIT_TEST_* environment variables that interact in unexpected ways.

This causes a problem for the test coverage reports, as the typical 'make
coverage-test coverage-report' run halts at the first failed test. If that
test is early in the suite, then many valuable tests are not exercising the
code and the coverage report becomes noisy with false positives.

This patch adds two targets to the Makefile: 'prove' and 'coverage-prove'.
The idea is to use the 'prove' tool to run the test suite, as it will track
failed tests but continue running the full suite even with a failure.

If/when this merges down, I will use this new target for the automation
around the test coverage reports.

Thanks, -Stolee

Derrick Stolee (1):
  Makefile: add prove and coverage-prove targets

 Makefile | 7 +++
 1 file changed, 7 insertions(+)


base-commit: 0d0ac3826a3bbb9247e39e12623bbcfdd722f24c
Published-As: 
https://github.com/gitgitgadget/git/releases/tag/pr-114%2Fderrickstolee%2Fcoverage-prove-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-114/derrickstolee/coverage-prove-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/114
-- 
gitgitgadget


[PATCH v2 10/14] pack-objects: add trace2 regions

2019-01-28 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When studying the performance of 'git push' we would like to know
how much time is spent at various parts of the command. One area
that could cause performance trouble is 'git pack-objects'.

Add trace2 regions around the three main actions taken in this
command:

1. Enumerate objects.
2. Prepare pack.
3. Write pack-file.

Signed-off-by: Derrick Stolee 
---
 builtin/pack-objects.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 889df2c755..6708529e3c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -33,6 +33,7 @@
 #include "object-store.h"
 #include "dir.h"
 #include "midx.h"
+#include "trace2.h"
 
 #define IN_PACK(obj) oe_in_pack(&to_pack, obj)
 #define SIZE(obj) oe_size(&to_pack, obj)
@@ -3472,6 +3473,7 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
}
}
 
+   trace2_region_enter("pack-objects", "enumerate-objects", 
the_repository);
prepare_packing_data(the_repository, &to_pack);
 
if (progress)
@@ -3486,12 +3488,20 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
if (include_tag && nr_result)
for_each_ref(add_ref_tag, NULL);
stop_progress(&progress_state);
+   trace2_region_leave("pack-objects", "enumerate-objects", 
the_repository);
 
if (non_empty && !nr_result)
return 0;
-   if (nr_result)
+   if (nr_result) {
+   trace2_region_enter("pack-objects", "prepare-pack", 
the_repository);
prepare_pack(window, depth);
+   trace2_region_leave("pack-objects", "prepare-pack", 
the_repository);
+   }
+
+   trace2_region_enter("pack-objects", "write-pack-file", the_repository);
write_pack_file();
+   trace2_region_leave("pack-objects", "write-pack-file", the_repository);
+
if (progress)
fprintf_ln(stderr,
   _("Total %"PRIu32" (delta %"PRIu32"),"
-- 
gitgitgadget



Re: [PATCH] object_as_type: initialize commit-graph-related fields of 'struct commit'

2019-01-27 Thread Derrick Stolee
On 1/27/2019 8:28 AM, SZEDER Gábor wrote:
> On Sun, Jan 27, 2019 at 02:08:32PM +0100, SZEDER Gábor wrote:
>> When the commit graph and generation numbers were introduced in
>> commits 177722b344 (commit: integrate commit graph with commit
>> parsing, 2018-04-10) and 83073cc994 (commit: add generation number to
>> struct commit, 2018-04-25), they tried to make sure that the
>> corresponding 'graph_pos' and 'generation' fields of 'struct commit'
>> are initialized conservatively, as if the commit were not included in
>> the commit-graph file.
>>
>> Alas, initializing those fields only in alloc_commit_node() missed the
>> case when an object that happens to be a commit is first looked up via
>> lookup_unknown_object(), and is then later converted to a 'struct
>> commit' via the object_as_type() helper function (either calling it
>> directly, or as part of a subsequent lookup_commit() call).
>> Consequently, both of those fields incorrectly remain set to zero,
>> which means e.g. that the commit is present in and is the first entry
>> of the commit-graph file.  This will result in wrong timestamp, parent
>> and root tree hashes, if such a 'struct commit' instance is later
>> filled from the commit-graph.
>>
>> Extract the initialization of 'struct commit's fields from
>> alloc_commit_node() into a helper function, and call it from
>> object_as_type() as well, to make sure that it properly initializes
>> the two commit-graph-related fields, too.  With this helper function
>> it is hopefully less likely that any new fields added to 'struct
>> commit' in the future would remain uninitialized.
>>
>> With this change alloc_commit_index() won't have any remaining callers
>> outside of 'alloc.c', so mark it as static.
>>
>> Signed-off-by: SZEDER Gábor 
>> ---
>>
>> So, it turns out that ec0c5798ee (revision: use commit graph in
>> get_reference(), 2018-12-04) is not the culprit after all, it merely
>> highlighted a bug that is as old as the commit-graph feature itself.
>> This patch fixes this and all other related issues I reported
>> upthread.
> 
> And how/why does this affect 'git describe --dirty'?
> 
>   - 'git describe' first iterates over all refs, and somewhere deep
> inside for_each_ref() each commit (well, object) a ref points to
> is looked up via lookup_unknown_object().  This leaves all fields
> of the created object zero initialized.
> 
>   - Then it dereferences HEAD for '--dirty' and ec0c5798ee's changes
> to get_reference() kick in: lookup_commit() doesn't instantiate a
> brand new and freshly initialized 'struct commit', but returns the
> object created in the previous step converted into 'struct
> commit'.  This conversion doesn't set the commit-graph fields in
> 'struct commit', but leaves both as zero.  get_reference() then
> tries to load HEAD's commit information from the commit-graph,
> find_commit_in_graph() sees the the still zero 'graph_pos' field
> and doesn't perform a search through the commit-graph file, and
> the subsequent fill_commit_in_graph() reads the commit info from
> the first entry.
> 
> In case of the failing test I posted earlier, where only the first
> commit is in the commit-graph but HEAD isn't, this means that the
> HEAD's 'struct commit' is filled with the info of HEAD^.
> 
>   - Ultimately, the diff machinery then doesn't compare the worktree
> to HEAD's tree, but to HEAD^'s, finds that they differ, hence the
> incorrect '-dirty' flag in the output.
> 
> Before ec0c5798ee get_reference() simply called parse_object(), which
> ignored the commit-graph, so the issue could remain hidden.

Thanks for digging in, Szeder. This is a very subtle interaction, and
I'm glad you caught the issue. There are likely other ways this could
become problematic, including hitting BUG() statements regarding
generation numbers.

I recommend this be merged to 'maint' if possible.

Thanks,
-Stolee



Re: [PATCH v4 08/10] midx: implement midx_repack()

2019-01-26 Thread Derrick Stolee

On 1/24/2019 4:52 PM, Derrick Stolee via GitGitGadget wrote:

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index acc5e65ecc..d6c1353514 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -383,7 +383,8 @@ test_expect_success 'setup expire tests' '
git pack-objects --revs .git/objects/pack/pack-E <<-EOF &&
refs/heads/E
EOF
-   git multi-pack-index write
+   git multi-pack-index write &&
+   cp -r .git/objects/pack .git/objects/pack-backup
)
  '


Josh: Thanks for catching the failure in PATCH 7. It's due to this line 
that should be part of that commit, not this one.


Thanks,

-Stolee



Re: [PATCH v2] log,diff-tree: add --combined-with-paths options for merges with renames

2019-01-25 Thread Derrick Stolee
On 1/25/2019 11:54 AM, Elijah Newren wrote:
> +test_expect_success '--combined-with-paths works with -z as well' '
> + printf "0f9645804ebb04cc3eef91f799eb7fb54d70cefb\0::100644 100644 
> 100644 f00c965d8307308469e537302baa73048488f162 
> 088bd5d92c2a8e0203ca8e7e4c2a5c692f6ae3f7 
> 333b9c62519f285e1854830ade0fe1ef1d40ee1b 
> RR\0file\twith\ttabs\0i\tam\ttabbed\0fickle\tnaming\0" >expect &&

I'm guessing that you use printf here because the
'cat <<-\EOF' approach doesn't work with the special
tabs? Kudos for putting in the extra effort here for
the special formatting!

The rest of the patch looks good.

Thanks,
-Stolee


Re: [PATCH] log,diff-tree: add --combined-with-paths options for merges with renames

2019-01-25 Thread Derrick Stolee
On 1/24/2019 11:46 AM, Elijah Newren wrote:
> As an alternative, I considered perhaps trying to sell it as a bugfix
> (how often do people use -M, -c, and --raw together and have renames
> in merge commits -- can I just change the format to include the old
> names), but was worried that since diff-tree is plumbing and that the
> format was documented to not include original filename(s), that I'd be
> breaking backward compatibility in an important way for someone and
> thus opted for a new flag to get the behavior I needed.

This is wise. Changing the format is likely to break at least one GUI
client or tool, especially because many depend on the installed version
of Git instead of shipping with one "blessed" client.

In addition, there may be whitespace in the filenames. It appears the
diff format separates the filenames with tab characters. Is it
possible to have tab character in a file name? That would make the output
ambiguous, but is no worse than our current output in a non-combined
diff.

I'll repeat Brian's request for tests. I trust the compiler and the
test suite more than I trust my ability to read code.

Thanks,
-Stolee



Re: [PATCH 0/6] Create commit-graph file format v2

2019-01-25 Thread Derrick Stolee
On 1/24/2019 6:39 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget"  writes:
> 
>> This series is based on ab/commit-graph-write-progress and bc/sha-256.
> 
> Thanks.
> 
> It seems that the base (i.e. merge between these two topics) you
> used may have used a version of either topic (most likely the
> latter) slightly older than what I have, as patches 1 and 2 seem to
> lack the local variable "hashsz" in the context, near the beginning
> of commit_graph_write().  I wiggled the patches in, but it has too
> heavy conflict merging to 'pu', so it may have to wait until the
> other topics stabilize a bit further.

Sorry that the merge was painful. I would have waited longer for
things to stabilize, but I'm expecting to go on paternity leave
soon. Didn't want to get the idea out there before I disappear
for a while.

When things stabilize, I may have time to do a rebase and work
out the details myself. Otherwise, everyone has my blessing to
take work I've started and move it forward themselves.

Thanks,
-Stolee


[PATCH v4 08/10] midx: implement midx_repack()

2019-01-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

To repack using a multi-pack-index, first sort all pack-files by
their modified time. Second, walk those pack-files from oldest
to newest, adding the packs to a list if they are smaller than the
given pack-size. Finally, collect the objects from the multi-pack-
index that are in those packs and send them to 'git pack-objects'.

While first designing a 'git multi-pack-index repack' operation, I
started by collecting the batches based on the size of the objects
instead of the size of the pack-files. This allows repacking a
large pack-file that has very few referencd objects. However, this
came at a significant cost of parsing pack-files instead of simply
reading the multi-pack-index and getting the file information for
the pack-files. This object-size idea could be a direction for
future expansion in this area.

Signed-off-by: Derrick Stolee 
---
 midx.c  | 109 +++-
 t/t5319-multi-pack-index.sh |  31 +-
 2 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 768a7dff73..7d81bf943e 100644
--- a/midx.c
+++ b/midx.c
@@ -8,6 +8,7 @@
 #include "sha1-lookup.h"
 #include "midx.h"
 #include "progress.h"
+#include "run-command.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -1113,7 +1114,113 @@ int expire_midx_packs(const char *object_dir)
return result;
 }
 
-int midx_repack(const char *object_dir, size_t batch_size)
+struct time_and_id {
+   timestamp_t mtime;
+   uint32_t pack_int_id;
+};
+
+static int compare_by_mtime(const void *a_, const void *b_)
 {
+   const struct time_and_id *a, *b;
+
+   a = (const struct time_and_id *)a_;
+   b = (const struct time_and_id *)b_;
+
+   if (a->mtime < b->mtime)
+   return -1;
+   if (a->mtime > b->mtime)
+   return 1;
return 0;
 }
+
+int midx_repack(const char *object_dir, size_t batch_size)
+{
+   int result = 0;
+   uint32_t i, packs_to_repack;
+   size_t total_size;
+   struct time_and_id *pack_ti;
+   unsigned char *include_pack;
+   struct child_process cmd = CHILD_PROCESS_INIT;
+   struct strbuf base_name = STRBUF_INIT;
+   struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+
+   if (!m)
+   return 0;
+
+   include_pack = xcalloc(m->num_packs, sizeof(unsigned char));
+   pack_ti = xcalloc(m->num_packs, sizeof(struct time_and_id));
+
+   for (i = 0; i < m->num_packs; i++) {
+   pack_ti[i].pack_int_id = i;
+
+   if (prepare_midx_pack(m, i))
+   continue;
+
+   pack_ti[i].mtime = m->packs[i]->mtime;
+   }
+   QSORT(pack_ti, m->num_packs, compare_by_mtime);
+
+   total_size = 0;
+   packs_to_repack = 0;
+   for (i = 0; total_size < batch_size && i < m->num_packs; i++) {
+   int pack_int_id = pack_ti[i].pack_int_id;
+   struct packed_git *p = m->packs[pack_int_id];
+
+   if (!p)
+   continue;
+   if (p->pack_size >= batch_size)
+   continue;
+
+   packs_to_repack++;
+   total_size += p->pack_size;
+   include_pack[pack_int_id] = 1;
+   }
+
+   if (total_size < batch_size || packs_to_repack < 2)
+   goto cleanup;
+
+   argv_array_push(&cmd.args, "pack-objects");
+
+   strbuf_addstr(&base_name, object_dir);
+   strbuf_addstr(&base_name, "/pack/pack");
+   argv_array_push(&cmd.args, base_name.buf);
+   strbuf_release(&base_name);
+
+   cmd.git_cmd = 1;
+   cmd.in = cmd.out = -1;
+
+   if (start_command(&cmd)) {
+   error(_("could not start pack-objects"));
+   result = 1;
+   goto cleanup;
+   }
+
+   for (i = 0; i < m->num_objects; i++) {
+   struct object_id oid;
+   uint32_t pack_int_id = nth_midxed_pack_int_id(m, i);
+
+   if (!include_pack[pack_int_id])
+   continue;
+
+   nth_midxed_object_oid(&oid, m, i);
+   xwrite(cmd.in, oid_to_hex(&oid), the_hash_algo->hexsz);
+   xwrite(cmd.in, "\n", 1);
+   }
+   close(cmd.in);
+
+   if (finish_command(&cmd)) {
+   error(_("could not finish pack-objects"));
+   result = 1;
+   goto cleanup;
+   }
+
+   result = write_midx_internal(object_dir, m, NULL);
+   m = NULL;
+
+cleanup:
+   if (m)
+   close_midx(m);
+   free(include_pack);
+   free(pack_ti);
+   return result;
+}
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh

[PATCH v4 10/10] midx: add test that 'expire' respects .keep files

2019-01-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The 'git multi-pack-index expire' subcommand may delete packs that
are not needed from the perspective of the multi-pack-index. If
a pack has a .keep file, then we should not delete that pack. Add
a test that ensures we preserve a pack that would otherwise be
expired. First, create a new pack that contains every object in
the repo, then add it to the multi-pack-index. Then create a .keep
file for a pack starting with "a-pack" that was added in the
previous test. Finally, expire and verify that the pack remains
and the other packs were expired.

Signed-off-by: Derrick Stolee 
---
 t/t5319-multi-pack-index.sh | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 19b769eea0..bcfa520401 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -495,4 +495,22 @@ test_expect_success 'expire works when adding new packs' '
)
 '
 
+test_expect_success 'expire respects .keep files' '
+   (
+   cd dup &&
+   git pack-objects --revs .git/objects/pack/pack-all <<-EOF &&
+   refs/heads/A
+   EOF
+   git multi-pack-index write &&
+   PACKA=$(ls .git/objects/pack/a-pack*\.pack | sed s/\.pack\$//) 
&&
+   touch $PACKA.keep &&
+   git multi-pack-index expire &&
+   ls -S .git/objects/pack/a-pack* | grep $PACKA >a-pack-files &&
+   test_line_count = 3 a-pack-files &&
+   test-tool read-midx .git/objects | grep idx >midx-list &&
+   test_line_count = 2 midx-list
+   )
+'
+
+
 test_done
-- 
gitgitgadget


[PATCH v4 06/10] multi-pack-index: implement 'expire' subcommand

2019-01-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The 'git multi-pack-index expire' subcommand looks at the existing
mult-pack-index, counts the number of objects referenced in each
pack-file, deletes the pack-fils with no referenced objects, and
rewrites the multi-pack-index to no longer reference those packs.

Refactor the write_midx_file() method to call write_midx_internal()
which now takes an existing 'struct multi_pack_index' and a list
of pack-files to drop (as specified by the names of their pack-
indexes). As we write the new multi-pack-index, we drop those
file names from the list of known pack-files.

The expire_midx_packs() method removes the unreferenced pack-files
after carefully closing the packs to avoid open handles.

Test that a new pack-file that covers the contents of two other
pack-files leads to those pack-files being deleted during the
expire subcommand. Be sure to read the multi-pack-index to ensure
it no longer references those packs.

Signed-off-by: Derrick Stolee 
---
 midx.c  | 120 +---
 t/t5319-multi-pack-index.sh |  20 ++
 2 files changed, 130 insertions(+), 10 deletions(-)

diff --git a/midx.c b/midx.c
index 95c39106b2..299e9b2e8f 100644
--- a/midx.c
+++ b/midx.c
@@ -33,6 +33,8 @@
 #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
 #define MIDX_LARGE_OFFSET_NEEDED 0x8000
 
+#define PACK_EXPIRED UINT_MAX
+
 static char *get_midx_filename(const char *object_dir)
 {
return xstrfmt("%s/pack/multi-pack-index", object_dir);
@@ -381,6 +383,7 @@ struct pack_info {
uint32_t orig_pack_int_id;
char *pack_name;
struct packed_git *p;
+   unsigned expired : 1;
 };
 
 static int pack_info_compare(const void *_a, const void *_b)
@@ -428,6 +431,7 @@ static void add_pack_to_midx(const char *full_path, size_t 
full_path_len,
 
packs->info[packs->nr].pack_name = xstrdup(file_name);
packs->info[packs->nr].orig_pack_int_id = packs->nr;
+   packs->info[packs->nr].expired = 0;
packs->nr++;
}
 }
@@ -587,13 +591,17 @@ static size_t write_midx_pack_names(struct hashfile *f,
size_t written = 0;
 
for (i = 0; i < num_packs; i++) {
-   size_t writelen = strlen(info[i].pack_name) + 1;
+   size_t writelen;
+
+   if (info[i].expired)
+   continue;
 
if (i && strcmp(info[i].pack_name, info[i - 1].pack_name) <= 0)
BUG("incorrect pack-file order: %s before %s",
info[i - 1].pack_name,
info[i].pack_name);
 
+   writelen = strlen(info[i].pack_name) + 1;
hashwrite(f, info[i].pack_name, writelen);
written += writelen;
}
@@ -675,6 +683,11 @@ static size_t write_midx_object_offsets(struct hashfile 
*f, int large_offset_nee
for (i = 0; i < nr_objects; i++) {
struct pack_midx_entry *obj = list++;
 
+   if (perm[obj->pack_int_id] == PACK_EXPIRED)
+   BUG("object %s is in an expired pack with int-id %d",
+   oid_to_hex(&obj->oid),
+   obj->pack_int_id);
+
hashwrite_be32(f, perm[obj->pack_int_id]);
 
if (large_offset_needed && obj->offset >> 31)
@@ -721,7 +734,8 @@ static size_t write_midx_large_offsets(struct hashfile *f, 
uint32_t nr_large_off
return written;
 }
 
-int write_midx_file(const char *object_dir)
+static int write_midx_internal(const char *object_dir, struct multi_pack_index 
*m,
+  struct string_list *packs_to_drop)
 {
unsigned char cur_chunk, num_chunks = 0;
char *midx_name;
@@ -737,6 +751,8 @@ int write_midx_file(const char *object_dir)
struct pack_midx_entry *entries = NULL;
int large_offsets_needed = 0;
int pack_name_concat_len = 0;
+   int dropped_packs = 0;
+   int result = 0;
 
midx_name = get_midx_filename(object_dir);
if (safe_create_leading_directories(midx_name)) {
@@ -745,7 +761,10 @@ int write_midx_file(const char *object_dir)
  midx_name);
}
 
-   packs.m = load_multi_pack_index(object_dir, 1);
+   if (m)
+   packs.m = m;
+   else
+   packs.m = load_multi_pack_index(object_dir, 1);
 
packs.nr = 0;
packs.alloc = packs.m ? packs.m->num_packs : 16;
@@ -759,13 +778,14 @@ int write_midx_file(const char *object_dir)
packs.info[packs.nr].orig_pack_int_id = i;
packs.info[packs.nr].pack_name = 
xstrdup(packs.m->pack_names[i]);
packs.info[packs.nr].p = NULL;
+   packs.info[packs.nr].expired = 0;
  

[PATCH v4 07/10] multi-pack-index: prepare 'repack' subcommand

2019-01-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

In an environment where the multi-pack-index is useful, it is due
to many pack-files and an inability to repack the object store
into a single pack-file. However, it is likely that many of these
pack-files are rather small, and could be repacked into a slightly
larger pack-file without too much effort. It may also be important
to ensure the object store is highly available and the repack
operation does not interrupt concurrent git commands.

Introduce a 'repack' subcommand to 'git multi-pack-index' that
takes a '--batch-size' option. The subcommand will inspect the
multi-pack-index for referenced pack-files whose size is smaller
than the batch size, until collecting a list of pack-files whose
sizes sum to larger than the batch size. Then, a new pack-file
will be created containing the objects from those pack-files that
are referenced by the multi-pack-index. The resulting pack is
likely to actually be smaller than the batch size due to
compression and the fact that there may be objects in the pack-
files that have duplicate copies in other pack-files.

The current change introduces the command-line arguments, and we
add a test that ensures we parse these options properly. Since
we specify a small batch size, we will guarantee that future
implementations do not change the list of pack-files.

In addition, we hard-code the modified times of the packs in
the pack directory to ensure the list of packs sorted by modified
time matches the order if sorted by size (ascending). This will
be important in a future test.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-multi-pack-index.txt | 11 +++
 builtin/multi-pack-index.c | 12 ++--
 midx.c |  5 +
 midx.h |  1 +
 t/t5319-multi-pack-index.sh| 17 +
 5 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt 
b/Documentation/git-multi-pack-index.txt
index 6186c4c936..de345c2400 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -36,6 +36,17 @@ expire::
have no objects referenced by the MIDX. Rewrite the MIDX file
afterward to remove all references to these pack-files.
 
+repack::
+   Create a new pack-file containing objects in small pack-files
+   referenced by the multi-pack-index. Select the pack-files by
+   examining packs from oldest-to-newest, adding a pack if its
+   size is below the batch size. Stop adding packs when the sum
+   of sizes of the added packs is above the batch size. If the
+   total size does not reach the batch size, then do nothing.
+   Rewrite the multi-pack-index to reference the new pack-file.
+   A later run of 'git multi-pack-index expire' will delete the
+   pack-files that were part of this batch.
+
 
 EXAMPLES
 
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 145de3a46c..c66239de33 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,12 +5,13 @@
 #include "midx.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
-   N_("git multi-pack-index [--object-dir=] (write|verify|expire)"),
+   N_("git multi-pack-index [--object-dir=] 
(write|verify|expire|repack --batch-size=)"),
NULL
 };
 
 static struct opts_multi_pack_index {
const char *object_dir;
+   unsigned long batch_size;
 } opts;
 
 int cmd_multi_pack_index(int argc, const char **argv,
@@ -19,6 +20,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
static struct option builtin_multi_pack_index_options[] = {
OPT_FILENAME(0, "object-dir", &opts.object_dir,
  N_("object directory containing set of packfile and 
pack-index pairs")),
+   OPT_MAGNITUDE(0, "batch-size", &opts.batch_size,
+ N_("during repack, collect pack-files of smaller size into a 
batch that is larger than this size")),
OPT_END(),
};
 
@@ -40,6 +43,11 @@ int cmd_multi_pack_index(int argc, const char **argv,
return 1;
}
 
+   if (!strcmp(argv[0], "repack"))
+   return midx_repack(opts.object_dir, (size_t)opts.batch_size);
+   if (opts.batch_size)
+   die(_("--batch-size option is only for 'repack' subcommand"));
+
if (!strcmp(argv[0], "write"))
return write_midx_file(opts.object_dir);
if (!strcmp(argv[0], "verify"))
@@ -47,5 +55,5 @@ int cmd_multi_pack_index(int argc, const char **argv,
if (!strcmp(argv[0], "expire"))
return expire_midx_packs(opts.object_dir);
 
-   die(_("unrecognized verb: %s"), argv[0]);
+   die(_(&q

[PATCH v4 02/10] Docs: rearrange subcommands for multi-pack-index

2019-01-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

We will add new subcommands to the multi-pack-index, and that will
make the documentation a bit messier. Clean up the 'verb'
descriptions by renaming the concept to 'subcommand' and removing
the reference to the object directory.

Helped-by: Stefan Beller 
Helped-by: Szeder Gábor 
Signed-off-by: Derrick Stolee 
---
 Documentation/git-multi-pack-index.txt | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt 
b/Documentation/git-multi-pack-index.txt
index f7778a2c85..1af406aca2 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -9,7 +9,7 @@ git-multi-pack-index - Write and verify multi-pack-indexes
 SYNOPSIS
 
 [verse]
-'git multi-pack-index' [--object-dir=] 
+'git multi-pack-index' [--object-dir=] 
 
 DESCRIPTION
 ---
@@ -23,13 +23,13 @@ OPTIONS
`/packs/multi-pack-index` for the current MIDX file, and
`/packs` for the pack-files to index.
 
+The following subcommands are available:
+
 write::
-   When given as the verb, write a new MIDX file to
-   `/packs/multi-pack-index`.
+   Write a new MIDX file.
 
 verify::
-   When given as the verb, verify the contents of the MIDX file
-   at `/packs/multi-pack-index`.
+   Verify the contents of the MIDX file.
 
 
 EXAMPLES
-- 
gitgitgadget



[PATCH v4 03/10] multi-pack-index: prepare for 'expire' subcommand

2019-01-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The multi-pack-index tracks objects in a collection of pack-files.
Only one copy of each object is indexed, using the modified time
of the pack-files to determine tie-breakers. It is possible to
have a pack-file with no referenced objects because all objects
have a duplicate in a newer pack-file.

Introduce a new 'expire' subcommand to the multi-pack-index builtin.
This subcommand will delete these unused pack-files and rewrite the
multi-pack-index to no longer refer to those files. More details
about the specifics will follow as the method is implemented.

Add a test that verifies the 'expire' subcommand is correctly wired,
but will still be valid when the verb is implemented. Specifically,
create a set of packs that should all have referenced objects and
should not be removed during an 'expire' operation. The packs are
created carefully to ensure they have a specific order when sorted
by size. This will be important in a later test.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-multi-pack-index.txt |  5 +++
 builtin/multi-pack-index.c |  4 ++-
 midx.c |  5 +++
 midx.h |  1 +
 t/t5319-multi-pack-index.sh| 49 ++
 5 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-multi-pack-index.txt 
b/Documentation/git-multi-pack-index.txt
index 1af406aca2..6186c4c936 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -31,6 +31,11 @@ write::
 verify::
Verify the contents of the MIDX file.
 
+expire::
+   Delete the pack-files that are tracked  by the MIDX file, but
+   have no objects referenced by the MIDX. Rewrite the MIDX file
+   afterward to remove all references to these pack-files.
+
 
 EXAMPLES
 
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index fca70f8e4f..145de3a46c 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,7 +5,7 @@
 #include "midx.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
-   N_("git multi-pack-index [--object-dir=] (write|verify)"),
+   N_("git multi-pack-index [--object-dir=] (write|verify|expire)"),
NULL
 };
 
@@ -44,6 +44,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
return write_midx_file(opts.object_dir);
if (!strcmp(argv[0], "verify"))
return verify_midx_file(opts.object_dir);
+   if (!strcmp(argv[0], "expire"))
+   return expire_midx_packs(opts.object_dir);
 
die(_("unrecognized verb: %s"), argv[0]);
 }
diff --git a/midx.c b/midx.c
index 730ff84dff..bb825ef816 100644
--- a/midx.c
+++ b/midx.c
@@ -1025,3 +1025,8 @@ int verify_midx_file(const char *object_dir)
 
return verify_midx_error;
 }
+
+int expire_midx_packs(const char *object_dir)
+{
+   return 0;
+}
diff --git a/midx.h b/midx.h
index 774f652530..e3a2b740b5 100644
--- a/midx.h
+++ b/midx.h
@@ -49,6 +49,7 @@ int prepare_multi_pack_index_one(struct repository *r, const 
char *object_dir, i
 int write_midx_file(const char *object_dir);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(const char *object_dir);
+int expire_midx_packs(const char *object_dir);
 
 void close_midx(struct multi_pack_index *m);
 
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 70926b5bc0..a8528f7da0 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -348,4 +348,53 @@ test_expect_success 'verify incorrect 64-bit offset' '
"incorrect object offset"
 '
 
+test_expect_success 'setup expire tests' '
+   mkdir dup &&
+   (
+   cd dup &&
+   git init &&
+   test-tool genrandom "data" 4096 >large_file.txt &&
+   git update-index --add large_file.txt &&
+   for i in $(test_seq 1 20)
+   do
+   test_commit $i
+   done &&
+   git branch A HEAD &&
+   git branch B HEAD~8 &&
+   git branch C HEAD~13 &&
+   git branch D HEAD~16 &&
+   git branch E HEAD~18 &&
+   git pack-objects --revs .git/objects/pack/pack-A <<-EOF &&
+   refs/heads/A
+   ^refs/heads/B
+   EOF
+   git pack-objects --revs .git/objects/pack/pack-B <<-EOF &&
+   refs/heads/B
+   ^refs/heads/C
+   EOF
+   git pack-objects --revs .git/objects/pack/pack-C <<-EOF &&
+   refs/heads/C
+   ^refs/heads/D
+   EOF

[PATCH v4 05/10] midx: refactor permutation logic and pack sorting

2019-01-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

In anticipation of the expire subcommand, refactor the way we sort
the packfiles by name. This will greatly simplify our approach to
dropping expired packs from the list.

First, create 'struct pack_info' to replace 'struct pack_pair'.
This struct contains the necessary information about a pack,
including its name, a pointer to its packfile struct (if not
already in the multi-pack-index), and the original pack-int-id.

Second, track the pack information using an array of pack_info
structs in the pack_list struct. This simplifies the logic around
the multiple arrays we were tracking in that struct.

Finally, update get_sorted_entries() to not permute the pack-int-id
and instead supply the permutation to write_midx_object_offsets().
This requires sorting the packs after get_sorted_entries().

Signed-off-by: Derrick Stolee 
---
 midx.c | 156 +
 1 file changed, 69 insertions(+), 87 deletions(-)

diff --git a/midx.c b/midx.c
index f087bbbe82..95c39106b2 100644
--- a/midx.c
+++ b/midx.c
@@ -377,12 +377,23 @@ static size_t write_midx_header(struct hashfile *f,
return MIDX_HEADER_SIZE;
 }
 
+struct pack_info {
+   uint32_t orig_pack_int_id;
+   char *pack_name;
+   struct packed_git *p;
+};
+
+static int pack_info_compare(const void *_a, const void *_b)
+{
+   struct pack_info *a = (struct pack_info *)_a;
+   struct pack_info *b = (struct pack_info *)_b;
+   return strcmp(a->pack_name, b->pack_name);
+}
+
 struct pack_list {
-   struct packed_git **list;
-   char **names;
+   struct pack_info *info;
uint32_t nr;
-   uint32_t alloc_list;
-   uint32_t alloc_names;
+   uint32_t alloc;
struct multi_pack_index *m;
 };
 
@@ -395,66 +406,32 @@ static void add_pack_to_midx(const char *full_path, 
size_t full_path_len,
if (packs->m && midx_contains_pack(packs->m, file_name))
return;
 
-   ALLOC_GROW(packs->list, packs->nr + 1, packs->alloc_list);
-   ALLOC_GROW(packs->names, packs->nr + 1, packs->alloc_names);
+   ALLOC_GROW(packs->info, packs->nr + 1, packs->alloc);
 
-   packs->list[packs->nr] = add_packed_git(full_path,
-   full_path_len,
-   0);
+   packs->info[packs->nr].p = add_packed_git(full_path,
+ full_path_len,
+ 0);
 
-   if (!packs->list[packs->nr]) {
+   if (!packs->info[packs->nr].p) {
warning(_("failed to add packfile '%s'"),
full_path);
return;
}
 
-   if (open_pack_index(packs->list[packs->nr])) {
+   if (open_pack_index(packs->info[packs->nr].p)) {
warning(_("failed to open pack-index '%s'"),
full_path);
-   close_pack(packs->list[packs->nr]);
-   FREE_AND_NULL(packs->list[packs->nr]);
+   close_pack(packs->info[packs->nr].p);
+   FREE_AND_NULL(packs->info[packs->nr].p);
return;
}
 
-   packs->names[packs->nr] = xstrdup(file_name);
+   packs->info[packs->nr].pack_name = xstrdup(file_name);
+   packs->info[packs->nr].orig_pack_int_id = packs->nr;
packs->nr++;
}
 }
 
-struct pack_pair {
-   uint32_t pack_int_id;
-   char *pack_name;
-};
-
-static int pack_pair_compare(const void *_a, const void *_b)
-{
-   struct pack_pair *a = (struct pack_pair *)_a;
-   struct pack_pair *b = (struct pack_pair *)_b;
-   return strcmp(a->pack_name, b->pack_name);
-}
-
-static void sort_packs_by_name(char **pack_names, uint32_t nr_packs, uint32_t 
*perm)
-{
-   uint32_t i;
-   struct pack_pair *pairs;
-
-   ALLOC_ARRAY(pairs, nr_packs);
-
-   for (i = 0; i < nr_packs; i++) {
-   pairs[i].pack_int_id = i;
-   pairs[i].pack_name = pack_names[i];
-   }
-
-   QSORT(pairs, nr_packs, pack_pair_compare);
-
-   for (i = 0; i < nr_packs; i++) {
-   pack_names[i] = pairs[i].pack_name;
-   perm[pairs[i].pack_int_id] = i;
-   }
-
-   free(pairs);
-}
-
 struct pack_midx_entry {
struct object_id oid;
uint32_t pack_int_id;
@@ -480,7 +457,6 @@ static int midx_oid_compare(const void *_a, const void *_b)
 }
 
 static int nth_midxed_pack_midx_entry(struct multi_pack_index *m,
- 

[PATCH v4 04/10] midx: simplify computation of pack name lengths

2019-01-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

Before writing the multi-pack-index, we compute the length of the
pack-index names concatenated together. This forms the data in the
pack name chunk, and we precompute it to compute chunk offsets.
The value is also modified to fit alignment needs.

Previously, this computation was coupled with adding packs from
the existing multi-pack-index and the remaining packs in the object
dir not already covered by the multi-pack-index.

In anticipation of this becoming more complicated with the 'expire'
subcommand, simplify the computation by centralizing it to a single
loop before writing the file.

Signed-off-by: Derrick Stolee 
---
 midx.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/midx.c b/midx.c
index bb825ef816..f087bbbe82 100644
--- a/midx.c
+++ b/midx.c
@@ -383,7 +383,6 @@ struct pack_list {
uint32_t nr;
uint32_t alloc_list;
uint32_t alloc_names;
-   size_t pack_name_concat_len;
struct multi_pack_index *m;
 };
 
@@ -418,7 +417,6 @@ static void add_pack_to_midx(const char *full_path, size_t 
full_path_len,
}
 
packs->names[packs->nr] = xstrdup(file_name);
-   packs->pack_name_concat_len += strlen(file_name) + 1;
packs->nr++;
}
 }
@@ -762,6 +760,7 @@ int write_midx_file(const char *object_dir)
uint32_t nr_entries, num_large_offsets = 0;
struct pack_midx_entry *entries = NULL;
int large_offsets_needed = 0;
+   int pack_name_concat_len = 0;
 
midx_name = get_midx_filename(object_dir);
if (safe_create_leading_directories(midx_name)) {
@@ -777,7 +776,6 @@ int write_midx_file(const char *object_dir)
packs.alloc_names = packs.alloc_list;
packs.list = NULL;
packs.names = NULL;
-   packs.pack_name_concat_len = 0;
ALLOC_ARRAY(packs.list, packs.alloc_list);
ALLOC_ARRAY(packs.names, packs.alloc_names);
 
@@ -788,7 +786,6 @@ int write_midx_file(const char *object_dir)
 
packs.list[packs.nr] = NULL;
packs.names[packs.nr] = xstrdup(packs.m->pack_names[i]);
-   packs.pack_name_concat_len += 
strlen(packs.names[packs.nr]) + 1;
packs.nr++;
}
}
@@ -798,10 +795,6 @@ int write_midx_file(const char *object_dir)
if (packs.m && packs.nr == packs.m->num_packs)
goto cleanup;
 
-   if (packs.pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
-   packs.pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
- (packs.pack_name_concat_len % 
MIDX_CHUNK_ALIGNMENT);
-
ALLOC_ARRAY(pack_perm, packs.nr);
sort_packs_by_name(packs.names, packs.nr, pack_perm);
 
@@ -814,6 +807,13 @@ int write_midx_file(const char *object_dir)
large_offsets_needed = 1;
}
 
+   for (i = 0; i < packs.nr; i++)
+   pack_name_concat_len += strlen(packs.names[i]) + 1;
+
+   if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
+   pack_name_concat_len += MIDX_CHUNK_ALIGNMENT -
+   (pack_name_concat_len % 
MIDX_CHUNK_ALIGNMENT);
+
hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
f = hashfd(lk.tempfile->fd, lk.tempfile->filename.buf);
FREE_AND_NULL(midx_name);
@@ -831,7 +831,7 @@ int write_midx_file(const char *object_dir)
 
cur_chunk++;
chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT;
-   chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + 
packs.pack_name_concat_len;
+   chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + 
pack_name_concat_len;
 
cur_chunk++;
chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP;
-- 
gitgitgadget



[PATCH v4 09/10] multi-pack-index: test expire while adding packs

2019-01-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

During development of the multi-pack-index expire subcommand, a
version went out that improperly computed the pack order if a new
pack was introduced while other packs were being removed. Part of
the subtlety of the bug involved the new pack being placed before
other packs that already existed in the multi-pack-index.

Add a test to t5319-multi-pack-index.sh that catches this issue.
The test adds new packs that cause another pack to be expired, and
creates new packs that are lexicographically sorted before and
after the existing packs.

Signed-off-by: Derrick Stolee 
---
 t/t5319-multi-pack-index.sh | 32 
 1 file changed, 32 insertions(+)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index d6c1353514..19b769eea0 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -463,4 +463,36 @@ test_expect_success 'expire removes repacked packs' '
)
 '
 
+test_expect_success 'expire works when adding new packs' '
+   (
+   cd dup &&
+   git pack-objects --revs .git/objects/pack/pack-combined <<-EOF 
&&
+   refs/heads/A
+   ^refs/heads/B
+   EOF
+   git pack-objects --revs .git/objects/pack/pack-combined <<-EOF 
&&
+   refs/heads/B
+   ^refs/heads/C
+   EOF
+   git pack-objects --revs .git/objects/pack/pack-combined <<-EOF 
&&
+   refs/heads/C
+   ^refs/heads/D
+   EOF
+   git multi-pack-index write &&
+   git pack-objects --revs .git/objects/pack/a-pack <<-EOF &&
+   refs/heads/D
+   ^refs/heads/E
+   EOF
+   git multi-pack-index write &&
+   git pack-objects --revs .git/objects/pack/z-pack <<-EOF &&
+   refs/heads/E
+   EOF
+   git multi-pack-index expire &&
+   ls .git/objects/pack/ | grep idx >expect &&
+   test-tool read-midx .git/objects | grep idx >actual &&
+   test_cmp expect actual &&
+   git multi-pack-index verify
+   )
+'
+
 test_done
-- 
gitgitgadget



[PATCH v4 01/10] repack: refactor pack deletion for future use

2019-01-24 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The repack builtin deletes redundant pack-files and their
associated .idx, .promisor, .bitmap, and .keep files. We will want
to re-use this logic in the future for other types of repack, so
pull the logic into 'unlink_pack_path()' in packfile.c.

The 'ignore_keep' parameter is enabled for the use in repack, but
will be important for a future caller.

Signed-off-by: Derrick Stolee 
---
 builtin/repack.c | 14 ++
 packfile.c   | 28 
 packfile.h   |  7 +++
 3 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 45583683ee..3d445b34b4 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -129,19 +129,9 @@ static void get_non_kept_pack_filenames(struct string_list 
*fname_list,
 
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
-   const char *exts[] = {".pack", ".idx", ".keep", ".bitmap", ".promisor"};
-   int i;
struct strbuf buf = STRBUF_INIT;
-   size_t plen;
-
-   strbuf_addf(&buf, "%s/%s", dir_name, base_name);
-   plen = buf.len;
-
-   for (i = 0; i < ARRAY_SIZE(exts); i++) {
-   strbuf_setlen(&buf, plen);
-   strbuf_addstr(&buf, exts[i]);
-   unlink(buf.buf);
-   }
+   strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
+   unlink_pack_path(buf.buf, 1);
strbuf_release(&buf);
 }
 
diff --git a/packfile.c b/packfile.c
index d1e6683ffe..bacecb4d0d 100644
--- a/packfile.c
+++ b/packfile.c
@@ -352,6 +352,34 @@ void close_all_packs(struct raw_object_store *o)
}
 }
 
+void unlink_pack_path(const char *pack_name, int force_delete)
+{
+   static const char *exts[] = {".pack", ".idx", ".keep", ".bitmap", 
".promisor"};
+   int i;
+   struct strbuf buf = STRBUF_INIT;
+   size_t plen;
+
+   strbuf_addstr(&buf, pack_name);
+   strip_suffix_mem(buf.buf, &buf.len, ".pack");
+   plen = buf.len;
+
+   if (!force_delete) {
+   strbuf_addstr(&buf, ".keep");
+   if (!access(buf.buf, F_OK)) {
+   strbuf_release(&buf);
+   return;
+   }
+   }
+
+   for (i = 0; i < ARRAY_SIZE(exts); i++) {
+   strbuf_setlen(&buf, plen);
+   strbuf_addstr(&buf, exts[i]);
+   unlink(buf.buf);
+   }
+
+   strbuf_release(&buf);
+}
+
 /*
  * The LRU pack is the one with the oldest MRU window, preferring packs
  * with no used windows, or the oldest mtime if it has no windows allocated.
diff --git a/packfile.h b/packfile.h
index 6c4037605d..5b7bcdb1dd 100644
--- a/packfile.h
+++ b/packfile.h
@@ -86,6 +86,13 @@ extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
+/*
+ * Unlink the .pack and associated extension files.
+ * Does not unlink if 'force_delete' is false and the pack-file is
+ * marked as ".keep".
+ */
+extern void unlink_pack_path(const char *pack_name, int force_delete);
+
 /*
  * Make sure that a pointer access into an mmap'd index file is within bounds,
  * and can provide at least 8 bytes of data.
-- 
gitgitgadget



[PATCH v4 00/10] Create 'expire' and 'repack' verbs for git-multi-pack-index

2019-01-24 Thread Derrick Stolee via GitGitGadget
The multi-pack-index provides a fast way to find an object among a large
list of pack-files. It stores a single pack-reference for each object id, so
duplicate objects are ignored. Among a list of pack-files storing the same
object, the most-recently modified one is used.

Create new subcommands for the multi-pack-index builtin.

 * 'git multi-pack-index expire': If we have a pack-file indexed by the
   multi-pack-index, but all objects in that pack are duplicated in
   more-recently modified packs, then delete that pack (and any others like
   it). Delete the reference to that pack in the multi-pack-index.
   
   
 * 'git multi-pack-index repack --batch-size=': Starting from the oldest
   pack-files covered by the multi-pack-index, find those whose on-disk size
   is below the batch size until we have a collection of packs whose sizes
   add up to the batch size. Create a new pack containing all objects that
   the multi-pack-index references to those packs.
   
   

This allows us to create a new pattern for repacking objects: run 'repack'.
After enough time has passed that all Git commands that started before the
last 'repack' are finished, run 'expire' again. This approach has some
advantages over the existing "repack everything" model:

 1. Incremental. We can repack a small batch of objects at a time, instead
of repacking all reachable objects. We can also limit ourselves to the
objects that do not appear in newer pack-files.


 2. Highly Available. By adding a new pack-file (and not deleting the old
pack-files) we do not interrupt concurrent Git commands, and do not
suffer performance degradation. By expiring only pack-files that have no
referenced objects, we know that Git commands that are doing normal
object lookups* will not be interrupted.


 3. Note: if someone concurrently runs a Git command that uses
get_all_packs(), then that command could try to read the pack-files and
pack-indexes that we are deleting during an expire command. Such
commands are usually related to object maintenance (i.e. fsck, gc,
pack-objects) or are related to less-often-used features (i.e.
fast-import, http-backend, server-info).



We plan to use this approach in VFS for Git to do background maintenance of
the "shared object cache" which is a Git alternate directory filled with
packfiles containing commits and trees. We currently download pack-files on
an hourly basis to keep up-to-date with the central server. The cache
servers supply packs on an hourly and daily basis, so most of the hourly
packs become useless after a new daily pack is downloaded. The 'expire'
command would clear out most of those packs, but many will still remain with
fewer than 100 objects remaining. The 'repack' command (with a batch size of
1-3gb, probably) can condense the remaining packs in commands that run for
1-3 min at a time. Since the daily packs range from 100-250mb, we will also
combine and condense those packs.

Updates in V2:

 * Added a method, unlink_pack_path() to remove packfiles, but with the
   additional check for a .keep file. This borrows logic from 
   builtin/repack.c.
   
   
 * Modified documentation and commit messages to replace 'verb' with
   'subcommand'. Simplified the documentation. (I left 'verbs' in the title
   of the cover letter for consistency.)
   
   

Updates in V3:

 * There was a bug in the expire logic when simultaneously removing packs
   and adding uncovered packs, specifically around the pack permutation.
   This was hard to see during review because I was using the 'pack_perm'
   array for multiple purposes. First, I was reducing its length, and then I
   was adding to it and resorting. In V3, I significantly overhauled the
   logic here, which required some extra commits before implementing
   'expire'. The final commit includes a test that would cover this case.

Updates in V4:

 * More 'verb' and 'command' instances replaced with 'subcommand'. I grepped
   the patch to check these should be fixed everywhere.
   
   
 * Update the tests to check .keep files (in last patch).
   
   
 * Modify the tests to show the terminating condition of --batch-size when
   there are three packs that fit under the size, but the first two are
   large enough to stop adding packs. This required rearranging the packs
   slightly to get different sizes than we had before. Also, I added 'touch
   -t' to set the modified times so we can fix the order in which the packs
   are selected.
   
   
 * Added a comment about the purpose of pack_perm.
   
   

Thanks, -Stolee

Derrick Stolee (10):
  repack: refactor pack deletion for future use
  Docs: rearrange subcommands for multi-pack-index
  multi-pack-index: prepare for 'expire' subcommand
  midx: simplify

Re: [PATCH v3 7/9] multi-pack-index: prepare 'repack' subcommand

2019-01-24 Thread Derrick Stolee

On 1/23/2019 5:38 PM, Jonathan Tan wrote:

diff --git a/Documentation/git-multi-pack-index.txt 
b/Documentation/git-multi-pack-index.txt
index 6186c4c936..cc63531cc0 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -36,6 +36,17 @@ expire::
have no objects referenced by the MIDX. Rewrite the MIDX file
afterward to remove all references to these pack-files.
  
+repack::

+   Collect a batch of pack-files whose size are all at most the
+   size given by --batch-size, but whose sizes sum to larger
+   than --batch-size. The batch is selected by greedily adding
+   small pack-files starting with the oldest pack-files that fit
+   the size. Create a new pack-file containing the objects the
+   multi-pack-index indexes into those pack-files, and rewrite
+   the multi-pack-index to contain that pack-file. A later run
+   of 'git multi-pack-index expire' will delete the pack-files
+   that were part of this batch.

I see in the subsequent patch that you stop once the batch size is
matched or exceeded - I see that you mention "whose sizes sum to larger
than --batch-size", but this leads me to think that if the total so
happens to not exceed the batch size, don't do anything, but otherwise
repack *all* the small packs together.

I would write this as:

   Create a new packfile containing the objects in the N least-sized
   packfiles referenced by the multi-pack-index, where N is the smallest
   number such that the total size of the packfiles equals or exceeds the
   given batch size. Rewrite the multi-pack-index to reference the new
   packfile instead of the N packfiles. A later run of ...


Thanks for the suggestion.

It is slightly wrong, in that we don't sort by size. Instead we sort by 
modified time. That makes is a little complicated, but I'll give it 
another shot using your framing:


    Create a new pack-file containing objects in small pack-files
    referenced by the multi-pack-index. Select the pack-files by
    examining packs from oldest-to-newest, adding a pack if its
    size is below the batch size. Stop adding packs when the sum
    of sizes of the added packs is above the batch size. If the
    total size does not reach the batch size, then do nothing.
    Rewrite the multi-pack-index to reference the new pack-file.
    A later run of 'git multi-pack-index expire' will delete the
    pack-files that were part of this batch.

-Stolee


Re: Git Test Coverage Report (Sat Jan 19)

2019-01-24 Thread Derrick Stolee

On 1/24/2019 1:15 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


Here is today's test coverage report.

Also, there has been some feedback that it can be hard to manually
match up uncovered lines with names at the bottom of the summary. The
suggestion was to auto-generate an HTML report that could be posted to
a public page and referenced in this mail for those who prefer
that.

I wanted to "grep" for lines attributed to certain commits that
appear in the list, by filtering lines that begin with enough number
of hexdigits, except for those object names, but the attempt failed
miserably because of the line wrapping (which probably comes from
the assumption that it is OK because the "text/plain; format=flowed"
would not care).  If you can keep the long lines (due to the object
names and line numbers prefixed to each line) unsplit, it would be
more useful to locate and isolate lines.
This is likely more a problem with my workflow (pasting the report into 
Thunderbird and sending) than with the content itself.


If I instead created a text document and sent it with `git send-email`, 
then would the line endings work the way you want?


Thanks,
-Stolee


Re: [PATCH v3 5/9] midx: refactor permutation logic and pack sorting

2019-01-24 Thread Derrick Stolee

On 1/24/2019 12:34 PM, Derrick Stolee wrote:

On 1/23/2019 4:00 PM, Jonathan Tan wrote:

Indeed, the sorting of pack_info is moved to after get_sorted_entries().
Also, pack_perm[old number] = new number, as expected.


Thanks for chiming in with all the detail on the use of 'perm'. This 
is the most confusing part of this code path.



I think a comment explaining why the perm is needed would be helpful -
something explaining that the entries were generated using the old pack
numbers, so we need this mapping to be able to write them using the new
numbers.


I can put this comment in the struct definition. Is that the right 
place for it?


I mistakenly thought the pack_perm array was placed into the pack_list 
struct. I'll put the comment right before we populate the contents of 
the array.


Thanks,
-Stolee


Re: [PATCH v3 6/9] multi-pack-index: implement 'expire' verb

2019-01-24 Thread Derrick Stolee

On 1/23/2019 5:13 PM, Jonathan Tan wrote:

Maybe add a fsck at the end for sanity's sake. Also, I think that
preservation of .keep packfiles is an important feature, and maybe worth
a test.
Good points! I forgot to test the .keep stuff directly here because I 
have an equivalent test in VFS for Git, so my wires got crossed.


Thanks,
-Stolee


Re: [PATCH v3 5/9] midx: refactor permutation logic and pack sorting

2019-01-24 Thread Derrick Stolee

On 1/23/2019 4:00 PM, Jonathan Tan wrote:

Indeed, the sorting of pack_info is moved to after get_sorted_entries().
Also, pack_perm[old number] = new number, as expected.


Thanks for chiming in with all the detail on the use of 'perm'. This is 
the most confusing part of this code path.



I think a comment explaining why the perm is needed would be helpful -
something explaining that the entries were generated using the old pack
numbers, so we need this mapping to be able to write them using the new
numbers.


I can put this comment in the struct definition. Is that the right place 
for it?




As part of this review, I did attempt to make everything use the sorted
pack_info order, but failed. I got as far as making get_sorted_entries()
always use start_pack=0 and skip the pack_infos that didn't have the p
pointer (to match the existing behavior of get_sorted_entries() that
only operates on new pack_infos being added by add_pack_to_midx), but it
still didn't work, and I didn't investigate further.


In the previous version, I tried to use 'perm' and the arrays as they 
existed previously, but that led to the bug in that version. Hopefully 
the new code is easier to read and understand. The delta may actually be 
hard to clearly see that it does the same work, but reading the code 
after it is applied looks clearer (to me).


Thanks,
-Stolee


<    2   3   4   5   6   7   8   9   10   11   >