date:20180125

Re: Bug, git rebase -i does not abort correctly

2018-01-25 Thread Kaartic Sivaraam

Hi,

On Thursday 25 January 2018 05:43 PM, Duy Nguyen wrote:
> rebase scripts are too much for me, so I'll play the user reporting
> bugs this time :)
> 
> Instead of doing
> 
> $ git rebase -i --onto origin/master @~3
> 
> I sometimes accidentally type
> 
> $ git rebase -i origin/master @~3
> 
> ("rebase -i" is actually an alias, which is why I never forget to
type -i)
> 
> Usually the todo list in $EDITOR shows noop, I realize my mistake and
> try to abort it by clearing the todo list before saving and closing
> $EDITOR. The problem is, HEAD is moved away anyway (to origin/master
I
> think) 

For me it left HEAD at @~3. Reading the synopsis of `man git rebase` I
could guess that the corresponding abstract form would be,

$ git rebase -i  

> even if rebase is supposed to abort the operation and leave
> HEAD untouched.
> 

This might seem to be a bug as the comment in "git-rebase-todo" says,

However, if you remove everything, the rebase will be aborted.

But "man git rebase" clearly says,

If  is specified, git rebase will perform an automatic "git
checkout " before doing anything else. Otherwise it remains
on
the current branch.

Junio has previously confirmed that "git rebase [-i] 
" is just a short hand for "git checkout  && git rebase
[-i] ".[ref 1] So, it's not surprising that it left HEAD at
@~3 when you completely removed the contents of git-rebase-todo and
exited the editor.

Does that help solve your issue?

[ref 1]: 

https://public-inbox.org/git/%3cxmqqpo8387hz@gitster.mtv.corp.google.com%3E

-- 
Kaartic

QUOTE

“It is impossible to live without failing at something, unless you live
so cautiously that you might as well not have lived at all – in which
case, you fail by default.”

  - J. K. Rowling

WIKIPEDIA: DID YOU KNOW?

Only 33% of internet users in India have heard of Wikipedia !!

* What do you think could be the reason behind this?

* What are ways in which the awareness about Wikipedia in India and
other countries be increased ?

Reference:

* Give your ideas for increasing the awareness of Wikipedia in India
and
in other countries and get a Grant from the Wikimedia Foundation to
bring your idea to life.

  https://meta.wikimedia.org/wiki/Grants:IdeaLab/Inspire

* Know more about the awareness problem of Wikipedia

  https://meta.wikimedia.org/wiki/New_Readers/Awareness

  https://meta.wikimedia.org/wiki/New_Readers/Next_steps/Raising_awareness
-- 
KaarticHi,

I seem to able to reproduce your issue. More comments inline.

On Thursday 25 January 2018 05:43 PM, Duy Nguyen wrote:
> rebase scripts are too much for me, so I'll play the user reporting
> bugs this time :)
> 
> Instead of doing
> 
> $ git rebase -i --onto origin/master @~3
> 
> I sometimes accidentally type
> 
> $ git rebase -i origin/master @~3
> 
> ("rebase -i" is actually an alias, which is why I never forget to type -i)
> 
> Usually the todo list in $EDITOR shows noop, I realize my mistake and
> try to abort it by clearing the todo list before saving and closing
> $EDITOR. The problem is, HEAD is moved away anyway (to origin/master I
> think) 

For me it left HEAD at @~3. Reading the synopsis of `man git rebase` I
could guess that the corresponding abstract form would be,

$ git rebase -i  

> even if rebase is supposed to abort the operation and leave
> HEAD untouched.
> 

This might seem to be a bug as the comment in "git-rebase-todo" says,

However, if you remove everything, the rebase will be aborted.

But "man git rebase" clearly says,

If  is specified, git rebase will perform an automatic git
checkout  before doing anything else. Otherwise it remains on
   the current branch.
Surprisingly when git-rebase-todo's content is only a "noop" just
closing the editor without removing the contents also seems to be lying
as it checks
-- 
Kaartic

QUOTE

“It is impossible to live without failing at something, unless you live
so cautiously that you might as well not have lived at all – in which
case, you fail by default.”

  - J. K. Rowling

WIKIPEDIA: DID YOU KNOW?

Only 33% of internet users in India have heard of Wikipedia !!

* What do you think could be the reason behind this?

* What are ways in which the awareness about Wikipedia in India and
other countries be increased ?

Reference:

* Give your ideas for increasing the awareness of Wikipedia in India and
in other countries and get a Grant from the Wikimedia Foundation to
bring your idea to life.

  https://meta.wikimedia.org/wiki/Grants:IdeaLab/Inspire

* Know more about the awareness problem of Wikipedia

  https://meta.wikimedia.org/wiki/New_Readers/Awareness

  https://meta.wikimedia.org/wiki/New_Readers/Next_steps/Raising_awareness

RE: pushing a delete-only commit consumes too much traffic

2018-01-25 Thread Randall S. Becker

So the point here is that the jars are still in the repository. They are 
deleted in your branch, but any objects depending on others (without a lot more 
information on your exact repository structure) may cause packed objects to be 
sent upstream. The delete is local to your branch, but the delete does not mean 
the objects are actually removed from your repository. To actually effect a 
removal, the objects would need to be no longer accessible, thus git gc would 
remove them permanently. That may be difficult depending on security on your 
upstream repository and what you are able to do there. So:

A--- B--- C

If A has no jars, B has the jars and has A as its parent, and C has no jars, 
but has B as its parent, then B is still accessible and the jars still exist in 
the repository but only not at the HEAD of your branch. Your tree may vary. How 
a push gets objects from your repository to your upstream depends on more 
information that I have but the point is that the jars still exist in a 
distributed sense. Your repository state and the upstream repository state do 
not need to be (and are likely not) identical.

> -Original Message-
> From: Basin Ilya [mailto:basini...@gmail.com]
> Sent: January 25, 2018 10:08 AM
> To: Randall S. Becker ; git@vger.kernel.org
> Subject: Re: pushing a delete-only commit consumes too much traffic
> 
> > Were the 60Mb of jars previously pushed in a commit that already existed
> on the upstream?
> yes
> 
> > Was the delete an actual removal of history or did you commit with the jars
> deleted, then pushed?
> I committed with the jars deleted
> 
> > Did you do a merge squash or delete branch to effect the removal.
> No
> 
> 
> 
> 
> On 25.01.2018 17:24, Randall S. Becker wrote:
> > On January 25, 2018 9:15 AM, Basin Ilya wrote:
> >
> >> I had a 60Mb worth of unneeded jar files in the project. I created a
> >> new branch and performed `git rm` on them. Now while I was pushing
> >> the change the counter of sent data reached 80Mb. Why is that?
> >
> > Can you provide more info? Were the 60Mb of jars previously pushed in a
> commit that already existed on the upstream? Was the delete an actual
> removal of history or did you commit with the jars deleted, then pushed? Did
> you do a merge squash or delete branch to effect the removal. More info
> please.
> >
> > Cheers,
> > Randall
> >
> > -- Brief whoami:
> >   NonStop developer since approximately NonStop(2112884442)
> >   UNIX developer since approximately 421664400
> > -- In my real life, I talk too much.
> >
> >
> >

Re: [PATCH] merge: support --strategy '?' for git-completion.bash

2018-01-25 Thread Duy Nguyen

On Thu, Jan 25, 2018 at 10:49:45AM -0800, Junio C Hamano wrote:
> Nguyễn Thái Ngọc Duy   writes:
> 
> > Bash completion support gets the list of available strategies with a
> > grep and sed trick which does not work on non-C locale since the anchor
> > string is translated and it does not cover custom strategies either.
> >
> > Let's do it a better way and make git-merge provide all available
> > strategies in machine-readable form.
> >
> > Signed-off-by: Nguyễn Thái Ngọc Duy 
> > ---
> >  Another, perhaps better, option is add "git merge --list-strategies".
> >  It requires some code movement, so I'll try with a simpler approach
> >  first.
> 
> If you run the probing "merge -s help" under C locale, that would
> just be a one-liner, no ;-)  I.e.

That was my first choice but I was worried that it failed to catch
custom strategies which are preceded with a different anchor string
"Available custom strategies are:".

I didn't look carefully at those sed magic. But it looks like it
correctly handles this case too. So v2 follows below. It still feels
dirty to do this kind of text manipulation though. But that can wait.

-- 8< --
Subject: [PATCH] completion: fix completing merge strategies on non-C locales

The anchor string "Available strategies are:" is translatable so
__git_list_merge_strategies may fail to collect available strategies
from 'git merge' on non-C locales. Force C locale on this command.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 contrib/completion/git-completion.bash | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 3683c772c5..88813e9124 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -594,7 +594,7 @@ __git_is_configured_remote ()
 
 __git_list_merge_strategies ()
 {
-   git merge -s help 2>&1 |
+   LANG=C LC_ALL=C git merge -s help 2>&1 |
sed -n -e '/[Aa]vailable strategies are: /,/^$/{
s/\.$//
s/.*://
-- 
2.16.1.196.g4f5118c359



-- 8< --

Re: git credential-osxkeychain bug/feature?

2018-01-25 Thread Jeff King

On Thu, Jan 25, 2018 at 12:16:33PM -0500, Robert Leach wrote:

> But perhaps access to KA from a remote ssh session is restricted for
> security reasons?  I'm just curious I suppose.  Should/can this work?

It's definitely not a restriction from the osxkeychain credential
helper. I believe that MacOS only unlocks the keychain for console
logins by default.

Googling around I turned up this[1]:

  security unlock-keychain -p  ~/Library/Keychains/login.keychain

but I don't even have a mac these days to see if that still works. And
according to [2] you may have to fiddle with auto-confirmation. Good
luck. :)

-Peff

[1] 
https://superuser.com/questions/270095/when-i-ssh-into-os-x-i-dont-have-my-keychain-when-i-use-terminal-i-do

[2] 
https://apple.stackexchange.com/questions/178139/how-can-i-access-the-keychain-remotely-from-the-command-line

Re: [RFC PATCH 0/1] Implement CMake build

2018-01-25 Thread Isaac Hier

Hi Jeff,

I have been looking at the build generator, which looks promising, but
I have one concern. Assuming I can generate a CMakeLists.txt that
appropriately updates the library sources, etc. how do you suggest I
handle new portability macros? For example, assume someone adds a
macro HAVE_X to indicate the availability of some platform-specific
function x. In the current Makefile, a comment would be added to the
top indicating when HAVE_X or NO_X should be set, and that option
would toggle the HAVE_X C macro. But CMake can test for the
availability of x, which is one of the main motives for adding a CMake
build. The current build generator uses the output of make, so all it
would know is whether or not HAVE_X is defined on the platform that
ran the Makefile, but not the entire list of platform that git
supports.

Bottom line: should I add the portability tests as they are now,
without accounting for future portability macros? One good alternative
might be to suggest the authors of new portability macros include a
small sample C program to test it. That would allow me to easily patch
the CMake tests whenever that came up. In a best case scenario, a
practice could be established to write the test in a specific
directory with a certain name so that I could automatically update the
CMake tests from the build generator.

Thanks for the help,

Isaac

On Wed, Jan 24, 2018 at 4:00 PM, Jeff Hostetler  wrote:
>
>
> On 1/24/2018 2:59 PM, Isaac Hier wrote:
>>
>> Jeff, no worries, fair enough. I know https://github.com/grpc/grpc
>> uses a shared file to generate code for several build systems instead
>> of maintaining them individually. I plan on doing the work anyway just
>> because I have my own reasons to use CMake in Git (for packaging in
>> https://github.com/ruslo/hunter is my main motive here). Whether or
>> not it is maintained upstream is not a real concern for me at the
>> moment.
>
> [...]
>>
>> I'll see how the Windows build currently works and if that makes
>> sense, maybe I'll try using that build generator here too.
>>
>> Thanks for the feedback,
>>
>> Isaac
>
>
> Look at the "vcxproj:" target in config.mak.uname (in the GfW repo).
>
> Jeff

[PATCH v2 21/27] fetch-pack: perform a fetch using v2

2018-01-25 Thread Brandon Williams

When communicating with a v2 server, perform a fetch by requesting the
'fetch' command.

Signed-off-by: Brandon Williams 
---
 builtin/fetch-pack.c   |   2 +-
 fetch-pack.c   | 277 -
 fetch-pack.h   |   4 +-
 t/t5702-protocol-v2.sh |  40 +++
 transport.c|   8 +-
 5 files changed, 324 insertions(+), 7 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index f492e8abd..867dd3cc7 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -213,7 +213,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
}
 
ref = fetch_pack(, fd, conn, ref, dest, sought, nr_sought,
-, pack_lockfile_ptr);
+, pack_lockfile_ptr, protocol_v0);
if (pack_lockfile) {
printf("lock %s\n", pack_lockfile);
fflush(stdout);
diff --git a/fetch-pack.c b/fetch-pack.c
index 9f6b07ad9..17927ae99 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -1008,6 +1008,272 @@ static struct ref *do_fetch_pack(struct fetch_pack_args 
*args,
return ref;
 }
 
+static void add_wants(const struct ref *wants, struct strbuf *req_buf)
+{
+   for ( ; wants ; wants = wants->next) {
+   const struct object_id *remote = >old_oid;
+   const char *remote_hex;
+   struct object *o;
+
+   /*
+* If that object is complete (i.e. it is an ancestor of a
+* local ref), we tell them we have it but do not have to
+* tell them about its ancestors, which they already know
+* about.
+*
+* We use lookup_object here because we are only
+* interested in the case we *know* the object is
+* reachable and we have already scanned it.
+*/
+   if (((o = lookup_object(remote->hash)) != NULL) &&
+   (o->flags & COMPLETE)) {
+   continue;
+   }
+
+   remote_hex = oid_to_hex(remote);
+   packet_buf_write(req_buf, "want %s\n", remote_hex);
+   }
+}
+
+static int add_haves(struct strbuf *req_buf, int *in_vain)
+{
+   int ret = 0;
+   int haves_added = 0;
+   const struct object_id *oid;
+
+   while ((oid = get_rev())) {
+   packet_buf_write(req_buf, "have %s\n", oid_to_hex(oid));
+   if (++haves_added >= INITIAL_FLUSH)
+   break;
+   };
+
+   *in_vain += haves_added;
+   if (!haves_added || *in_vain >= MAX_IN_VAIN) {
+   /* Send Done */
+   packet_buf_write(req_buf, "done\n");
+   ret = 1;
+   }
+
+   return ret;
+}
+
+static int send_haves(int fd_out, int *in_vain)
+{
+   int ret = 0;
+   struct strbuf req_buf = STRBUF_INIT;
+
+   ret = add_haves(_buf, in_vain);
+
+   /* Send request */
+   packet_buf_flush(_buf);
+   write_or_die(fd_out, req_buf.buf, req_buf.len);
+
+   strbuf_release(_buf);
+   return ret;
+}
+
+static int send_fetch_request(int fd_out, const struct fetch_pack_args *args,
+ const struct ref *wants, struct oidset *common,
+ int *in_vain)
+{
+   int ret = 0;
+   struct strbuf req_buf = STRBUF_INIT;
+
+   packet_buf_write(_buf, "command=fetch");
+   packet_buf_write(_buf, "agent=%s", git_user_agent_sanitized());
+   if (args->stateless_rpc)
+   packet_buf_write(_buf, "stateless-rpc=true");
+
+   packet_buf_delim(_buf);
+   if (args->use_thin_pack)
+   packet_buf_write(_buf, "thin-pack");
+   if (args->no_progress)
+   packet_buf_write(_buf, "no-progress");
+   if (args->include_tag)
+   packet_buf_write(_buf, "include-tag");
+   if (prefer_ofs_delta)
+   packet_buf_write(_buf, "ofs-delta");
+
+   /* add wants */
+   add_wants(wants, _buf);
+
+   /*
+* If we are running stateless-rpc we need to add all the common
+* commits we've found in previous rounds
+*/
+   if (args->stateless_rpc) {
+   struct oidset_iter iter;
+   const struct object_id *oid;
+   oidset_iter_init(common, );
+
+   while ((oid = oidset_iter_next())) {
+   packet_buf_write(_buf, "have %s\n", 
oid_to_hex(oid));
+   }
+   }
+
+   /* Add initial haves */
+   ret = add_haves(_buf, in_vain);
+
+   /* Send request */
+   packet_buf_flush(_buf);
+   write_or_die(fd_out, req_buf.buf, req_buf.len);
+
+   strbuf_release(_buf);
+   return ret;
+}
+
+/*
+ * Processes a section header in a server's response and checks if it matches
+ * `section`.  If the value of `peek` is 1, the header line will be peeked (and
+ * not consumed); if 0, the line

[PATCH v2 11/27] test-pkt-line: introduce a packet-line test helper

2018-01-25 Thread Brandon Williams

Introduce a packet-line test helper which can either pack or unpack an
input stream into packet-lines and writes out the result to stdout.

Signed-off-by: Brandon Williams 
---
 Makefile |  1 +
 t/helper/test-pkt-line.c | 62 
 2 files changed, 63 insertions(+)
 create mode 100644 t/helper/test-pkt-line.c

diff --git a/Makefile b/Makefile
index b7ccc05fa..3b849c060 100644
--- a/Makefile
+++ b/Makefile
@@ -669,6 +669,7 @@ TEST_PROGRAMS_NEED_X += test-mktemp
 TEST_PROGRAMS_NEED_X += test-online-cpus
 TEST_PROGRAMS_NEED_X += test-parse-options
 TEST_PROGRAMS_NEED_X += test-path-utils
+TEST_PROGRAMS_NEED_X += test-pkt-line
 TEST_PROGRAMS_NEED_X += test-prio-queue
 TEST_PROGRAMS_NEED_X += test-read-cache
 TEST_PROGRAMS_NEED_X += test-write-cache
diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
new file mode 100644
index 0..5df32b4cb
--- /dev/null
+++ b/t/helper/test-pkt-line.c
@@ -0,0 +1,62 @@
+#include "pkt-line.h"
+
+static void pack_line(const char *line)
+{
+   if (!strcmp(line, "") || !strcmp(line, "\n"))
+   packet_flush(1);
+   else if (!strcmp(line, "0001") || !strcmp(line, "0001\n"))
+   packet_delim(1);
+   else
+   packet_write_fmt(1, "%s", line);
+}
+
+static void pack(int argc, const char **argv)
+{
+   if (argc) { /* read from argv */
+   int i;
+   for (i = 0; i < argc; i++)
+   pack_line(argv[i]);
+   } else { /* read from stdin */
+   char line[LARGE_PACKET_MAX];
+   while (fgets(line, sizeof(line), stdin)) {
+   pack_line(line);
+   }
+   }
+}
+
+static void unpack(void)
+{
+   struct packet_reader reader;
+   packet_reader_init(, 0, NULL, 0,
+  PACKET_READ_GENTLE_ON_EOF |
+  PACKET_READ_CHOMP_NEWLINE);
+
+   while (packet_reader_read() != PACKET_READ_EOF) {
+   switch (reader.status) {
+   case PACKET_READ_EOF:
+   break;
+   case PACKET_READ_NORMAL:
+   printf("%s\n", reader.line);
+   break;
+   case PACKET_READ_FLUSH:
+   printf("\n");
+   break;
+   case PACKET_READ_DELIM:
+   printf("0001\n");
+   break;
+   }
+   }
+}
+
+int cmd_main(int argc, const char **argv)
+{
+   if (argc < 2)
+   die("too few arguments");
+
+   if (!strcmp(argv[1], "pack"))
+   pack(argc - 2, argv + 2);
+   else if (!strcmp(argv[1], "unpack"))
+   unpack();
+
+   return 0;
+}
-- 
2.16.0.rc1.238.g530d649a79-goog

[PATCH v2 20/27] upload-pack: introduce fetch server command

2018-01-25 Thread Brandon Williams

Introduce the 'fetch' server command.

Signed-off-by: Brandon Williams 
---
 Documentation/technical/protocol-v2.txt | 121 +
 serve.c |   2 +
 t/t5701-git-serve.sh|   1 +
 upload-pack.c   | 293 
 upload-pack.h   |   5 +
 5 files changed, 422 insertions(+)

diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index 4683d41ac..ca09a2cfe 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -147,3 +147,124 @@ The output of ls-refs is as follows:
 ref-attribute = (symref | peeled)
 symref = "symref-target:" symref-target
 peeled = "peeled:" obj-id
+
+ fetch
+---
+
+`fetch` is the command used to fetch a packfile in v2.  It can be looked
+at as a modified version of the v1 fetch where the ref-advertisement is
+stripped out (since the `ls-refs` command fills that role) and the
+message format is tweaked to eliminate redundancies and permit easy
+addition of future extensions.
+
+Additional features not supported in the base command will be advertised
+as the value of the command in the capability advertisement in the form
+of a space separated list of features, e.g.  "=
+".
+
+A `fetch` request can take the following parameters wrapped in
+packet-lines:
+
+want 
+   Indicates to the server an object which the client wants to
+   retrieve.
+
+have 
+   Indicates to the server an object which the client has locally.
+   This allows the server to make a packfile which only contains
+   the objects that the client needs. Multiple 'have' lines can be
+   supplied.
+
+done
+   Indicates to the server that negotiation should terminate (or
+   not even begin if performing a clone) and that the server should
+   use the information supplied in the request to construct the
+   packfile.
+
+thin-pack
+   Request that a thin pack be sent, which is a pack with deltas
+   which reference base objects not contained within the pack (but
+   are known to exist at the receiving end). This can reduce the
+   network traffic significantly, but it requires the receiving end
+   to know how to "thicken" these packs by adding the missing bases
+   to the pack.
+
+no-progress
+   Request that progress information that would normally be sent on
+   side-band channel 2, during the packfile transfer, should not be
+   sent.  However, the side-band channel 3 is still used for error
+   responses.
+
+include-tag
+   Request that annotated tags should be sent if the objects they
+   point to are being sent.
+
+ofs-delta
+   Indicate that the client understands PACKv2 with delta referring
+   to its base by position in pack rather than by an oid.  That is,
+   they can read OBJ_OFS_DELTA (ake type 6) in a packfile.
+
+The response of `fetch` is broken into a number of sections separated by
+delimiter packets (0001), with each section beginning with its section
+header.
+
+output = *section
+section = (acknowledgments | packfile)
+ (flush-pkt | delim-pkt)
+
+acknowledgments = PKT-LINE("acknowledgments" LF)
+ *(ready | nak | ack)
+ready = PKT-LINE("ready" LF)
+nak = PKT-LINE("NAK" LF)
+ack = PKT-LINE("ACK" SP obj-id LF)
+
+packfile = PKT-LINE("packfile" LF)
+  [PACKFILE]
+
+
+acknowledgments section
+   * Always begins with the section header "acknowledgments"
+
+   * The server will respond with "NAK" if none of the object ids sent
+ as have lines were common.
+
+   * The server will respond with "ACK obj-id" for all of the
+ object ids sent as have lines which are common.
+
+   * A response cannot have both "ACK" lines as well as a "NAK"
+ line.
+
+   * The server will respond with a "ready" line indicating that
+ the server has found an acceptable common base and is ready to
+ make and send a packfile (which will be found in the packfile
+ section of the same response)
+
+   * If the client determines that it is finished with negotiations
+ by sending a "done" line, the acknowledgments sections can be
+ omitted from the server's response as an optimization.
+
+   * If the server has found a suitable cut point and has decided
+ to send a "ready" line, then the server can decide to (as an
+ optimization) omit any "ACK" lines it would have sent during
+ its response.  This is because the server will have already
+ determined the objects it plans to send to the client and no
+ further negotiation is needed.
+
+
+packfile section
+   * Always begins with the section header "packfile"
+
+   * The transmission of the packfile begins

[PATCH v2 05/27] upload-pack: factor out processing lines

2018-01-25 Thread Brandon Williams

Factor out the logic for processing shallow, deepen, deepen_since, and
deepen_not lines into their own functions to simplify the
'receive_needs()' function in addition to making it easier to reuse some
of this logic when implementing protocol_v2.

Signed-off-by: Brandon Williams 
---
 upload-pack.c | 113 ++
 1 file changed, 74 insertions(+), 39 deletions(-)

diff --git a/upload-pack.c b/upload-pack.c
index 2ad73a98b..42d83d5b1 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -724,6 +724,75 @@ static void deepen_by_rev_list(int ac, const char **av,
packet_flush(1);
 }
 
+static int process_shallow(const char *line, struct object_array *shallows)
+{
+   const char *arg;
+   if (skip_prefix(line, "shallow ", )) {
+   struct object_id oid;
+   struct object *object;
+   if (get_oid_hex(arg, ))
+   die("invalid shallow line: %s", line);
+   object = parse_object();
+   if (!object)
+   return 1;
+   if (object->type != OBJ_COMMIT)
+   die("invalid shallow object %s", oid_to_hex());
+   if (!(object->flags & CLIENT_SHALLOW)) {
+   object->flags |= CLIENT_SHALLOW;
+   add_object_array(object, NULL, shallows);
+   }
+   return 1;
+   }
+
+   return 0;
+}
+
+static int process_deepen(const char *line, int *depth)
+{
+   const char *arg;
+   if (skip_prefix(line, "deepen ", )) {
+   char *end = NULL;
+   *depth = (int) strtol(arg, , 0);
+   if (!end || *end || *depth <= 0)
+   die("Invalid deepen: %s", line);
+   return 1;
+   }
+
+   return 0;
+}
+
+static int process_deepen_since(const char *line, timestamp_t *deepen_since, 
int *deepen_rev_list)
+{
+   const char *arg;
+   if (skip_prefix(line, "deepen-since ", )) {
+   char *end = NULL;
+   *deepen_since = parse_timestamp(arg, , 0);
+   if (!end || *end || !deepen_since ||
+   /* revisions.c's max_age -1 is special */
+   *deepen_since == -1)
+   die("Invalid deepen-since: %s", line);
+   *deepen_rev_list = 1;
+   return 1;
+   }
+   return 0;
+}
+
+static int process_deepen_not(const char *line, struct string_list 
*deepen_not, int *deepen_rev_list)
+{
+   const char *arg;
+   if (skip_prefix(line, "deepen-not ", )) {
+   char *ref = NULL;
+   struct object_id oid;
+   if (expand_ref(arg, strlen(arg), , ) != 1)
+   die("git upload-pack: ambiguous deepen-not: %s", line);
+   string_list_append(deepen_not, ref);
+   free(ref);
+   *deepen_rev_list = 1;
+   return 1;
+   }
+   return 0;
+}
+
 static void receive_needs(void)
 {
struct object_array shallows = OBJECT_ARRAY_INIT;
@@ -745,49 +814,15 @@ static void receive_needs(void)
if (!line)
break;
 
-   if (skip_prefix(line, "shallow ", )) {
-   struct object_id oid;
-   struct object *object;
-   if (get_oid_hex(arg, ))
-   die("invalid shallow line: %s", line);
-   object = parse_object();
-   if (!object)
-   continue;
-   if (object->type != OBJ_COMMIT)
-   die("invalid shallow object %s", 
oid_to_hex());
-   if (!(object->flags & CLIENT_SHALLOW)) {
-   object->flags |= CLIENT_SHALLOW;
-   add_object_array(object, NULL, );
-   }
+   if (process_shallow(line, ))
continue;
-   }
-   if (skip_prefix(line, "deepen ", )) {
-   char *end = NULL;
-   depth = strtol(arg, , 0);
-   if (!end || *end || depth <= 0)
-   die("Invalid deepen: %s", line);
+   if (process_deepen(line, ))
continue;
-   }
-   if (skip_prefix(line, "deepen-since ", )) {
-   char *end = NULL;
-   deepen_since = parse_timestamp(arg, , 0);
-   if (!end || *end || !deepen_since ||
-   /* revisions.c's max_age -1 is special */
-   deepen_since == -1)
-   die("Invalid deepen-since: %s", line);
-   deepen_rev_list = 1;
+   if (process_deepen_since(line, _since, _rev_list))

[PATCH v2 27/27] remote-curl: implement stateless-connect command

2018-01-25 Thread Brandon Williams

Teach remote-curl the 'stateless-connect' command which is used to
establish a stateless connection with servers which support protocol
version 2.  This allows remote-curl to act as a proxy, allowing the git
client to communicate natively with a remote end, simply using
remote-curl as a pass through to convert requests to http.

Signed-off-by: Brandon Williams 
---
 remote-curl.c  | 185 -
 t/t5702-protocol-v2.sh |  41 +++
 2 files changed, 224 insertions(+), 2 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 4086aa733..a17c7e228 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -171,6 +171,7 @@ struct discovery {
size_t len;
struct ref *refs;
struct oid_array shallow;
+   enum protocol_version version;
unsigned proto_git : 1;
 };
 static struct discovery *last_discovery;
@@ -184,9 +185,13 @@ static struct ref *parse_git_refs(struct discovery *heads, 
int for_push)
   PACKET_READ_CHOMP_NEWLINE |
   PACKET_READ_GENTLE_ON_EOF);
 
-   switch (discover_version()) {
+   heads->version = discover_version();
+   switch (heads->version) {
case protocol_v2:
-   die("support for protocol v2 not implemented yet");
+   /*
+* Do nothing.  Client should run 'stateless-connect' and
+* request the refs themselves.
+*/
break;
case protocol_v1:
case protocol_v0:
@@ -1047,6 +1052,178 @@ static void parse_push(struct strbuf *buf)
free(specs);
 }
 
+struct proxy_state {
+   char *service_name;
+   char *service_url;
+   struct curl_slist *headers;
+   struct strbuf request_buffer;
+   int in;
+   int out;
+   struct packet_reader reader;
+   size_t pos;
+   int seen_flush;
+};
+
+static void proxy_state_init(struct proxy_state *p, const char *service_name)
+{
+   struct strbuf buf = STRBUF_INIT;
+
+   memset(p, 0, sizeof(*p));
+   p->service_name = xstrdup(service_name);
+
+   p->in = 0;
+   p->out = 1;
+   strbuf_init(>request_buffer, 0);
+
+   strbuf_addf(, "%s%s", url.buf, p->service_name);
+   p->service_url = strbuf_detach(, NULL);
+
+   p->headers = http_copy_default_headers();
+
+   strbuf_addf(, "Content-Type: application/x-%s-request", 
p->service_name);
+   p->headers = curl_slist_append(p->headers, buf.buf);
+   strbuf_reset();
+
+   strbuf_addf(, "Accept: application/x-%s-result", p->service_name);
+   p->headers = curl_slist_append(p->headers, buf.buf);
+
+   p->headers = curl_slist_append(p->headers, "Transfer-Encoding: 
chunked");
+
+   packet_reader_init(>reader, p->in, NULL, 0,
+  PACKET_READ_GENTLE_ON_EOF);
+
+   strbuf_release();
+}
+
+static void proxy_state_clear(struct proxy_state *p)
+{
+   free(p->service_name);
+   free(p->service_url);
+   curl_slist_free_all(p->headers);
+   strbuf_release(>request_buffer);
+}
+
+static size_t proxy_in(char *buffer, size_t eltsize,
+  size_t nmemb, void *userdata)
+{
+   size_t max = eltsize * nmemb;
+   struct proxy_state *p = userdata;
+   size_t avail = p->request_buffer.len - p->pos;
+
+   if (!avail) {
+   if (p->seen_flush) {
+   p->seen_flush = 0;
+   return 0;
+   }
+
+   strbuf_reset(>request_buffer);
+   switch (packet_reader_read(>reader)) {
+   case PACKET_READ_EOF:
+   die("unexpected EOF when reading from parent process");
+   case PACKET_READ_NORMAL:
+   packet_buf_write_len(>request_buffer, p->reader.line,
+p->reader.pktlen);
+   break;
+   case PACKET_READ_DELIM:
+   packet_buf_delim(>request_buffer);
+   break;
+   case PACKET_READ_FLUSH:
+   packet_buf_flush(>request_buffer);
+   p->seen_flush = 1;
+   break;
+   }
+   p->pos = 0;
+   avail = p->request_buffer.len;
+   }
+
+   if (max < avail)
+   avail = max;
+   memcpy(buffer, p->request_buffer.buf + p->pos, avail);
+   p->pos += avail;
+   return avail;
+}
+
+static size_t proxy_out(char *buffer, size_t eltsize,
+   size_t nmemb, void *userdata)
+{
+   size_t size = eltsize * nmemb;
+   struct proxy_state *p = userdata;
+
+   write_or_die(p->out, buffer, size);
+   return size;
+}
+
+static int proxy_post(struct proxy_state *p)
+{
+   struct active_request_slot *slot;
+   int err;
+
+   slot = get_active_slot();
+
+   curl_easy_setopt(slot->curl,

[PATCH v2 12/27] serve: introduce git-serve

2018-01-25 Thread Brandon Williams

Introduce git-serve, the base server for protocol version 2.

Protocol version 2 is intended to be a replacement for Git's current
wire protocol.  The intention is that it will be a simpler, less
wasteful protocol which can evolve over time.

Protocol version 2 improves upon version 1 by eliminating the initial
ref advertisement.  In its place a server will export a list of
capabilities and commands which it supports in a capability
advertisement.  A client can then request that a particular command be
executed by providing a number of capabilities and command specific
parameters.  At the completion of a command, a client can request that
another command be executed or can terminate the connection by sending a
flush packet.

Signed-off-by: Brandon Williams 
---
 .gitignore  |   1 +
 Documentation/technical/protocol-v2.txt | 117 +++
 Makefile|   2 +
 builtin.h   |   1 +
 builtin/serve.c |  30 
 git.c   |   1 +
 serve.c | 249 
 serve.h |  15 ++
 t/t5701-git-serve.sh|  56 +++
 9 files changed, 472 insertions(+)
 create mode 100644 Documentation/technical/protocol-v2.txt
 create mode 100644 builtin/serve.c
 create mode 100644 serve.c
 create mode 100644 serve.h
 create mode 100755 t/t5701-git-serve.sh

diff --git a/.gitignore b/.gitignore
index 833ef3b0b..2d0450c26 100644
--- a/.gitignore
+++ b/.gitignore
@@ -140,6 +140,7 @@
 /git-rm
 /git-send-email
 /git-send-pack
+/git-serve
 /git-sh-i18n
 /git-sh-i18n--envsubst
 /git-sh-setup
diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
new file mode 100644
index 0..7f619a76c
--- /dev/null
+++ b/Documentation/technical/protocol-v2.txt
@@ -0,0 +1,117 @@
+ Git Wire Protocol, Version 2
+==
+
+This document presents a specification for a version 2 of Git's wire
+protocol.  Protocol v2 will improve upon v1 in the following ways:
+
+  * Instead of multiple service names, multiple commands will be
+supported by a single service.
+  * Easily extendable as capabilities are moved into their own section
+of the protocol, no longer being hidden behind a NUL byte and
+limited by the size of a pkt-line (as there will be a single
+capability per pkt-line).
+  * Separate out other information hidden behind NUL bytes (e.g. agent
+string as a capability and symrefs can be requested using 'ls-refs')
+  * Reference advertisement will be omitted unless explicitly requested
+  * ls-refs command to explicitly request some refs
+
+ Detailed Design
+=
+
+A client can request to speak protocol v2 by sending `version=2` in the
+side-channel `GIT_PROTOCOL` in the initial request to the server.
+
+In protocol v2 communication is command oriented.  When first contacting a
+server a list of capabilities will advertised.  Some of these capabilities
+will be commands which a client can request be executed.  Once a command
+has completed, a client can reuse the connection and request that other
+commands be executed.
+
+ Special Packets
+-
+
+In protocol v2 these special packets will have the following semantics:
+
+  * '' Flush Packet (flush-pkt) - indicates the end of a message
+  * '0001' Delimiter Packet (delim-pkt) - separates sections of a message
+
+ Capability Advertisement
+--
+
+A server which decides to communicate (based on a request from a client)
+using protocol version 2, notifies the client by sending a version string
+in its initial response followed by an advertisement of its capabilities.
+Each capability is a key with an optional value.  Clients must ignore all
+unknown keys.  Semantics of unknown values are left to the definition of
+each key.  Some capabilities will describe commands which can be requested
+to be executed by the client.
+
+capability-advertisement = protocol-version
+  capability-list
+  flush-pkt
+
+protocol-version = PKT-LINE("version 2" LF)
+capability-list = *capability
+capability = PKT-LINE(key[=value] LF)
+
+key = 1*CHAR
+value = 1*CHAR
+CHAR = 1*(ALPHA / DIGIT / "-" / "_")
+
+A client then responds to select the command it wants with any particular
+capabilities or arguments.  There is then an optional section where the
+client can provide any command specific parameters or queries.
+
+command-request = command
+ capability-list
+ (command-args)
+ flush-pkt
+command = PKT-LINE("command=" key LF)
+command-args = delim-pkt
+  *arg
+arg = 1*CHAR
+
+The server will then check to ensure that the client's request is
+comprised of a

[PATCH v2 10/27] protocol: introduce enum protocol_version value protocol_v2

2018-01-25 Thread Brandon Williams

Introduce protocol_v2, a new value for 'enum protocol_version'.
Subsequent patches will fill in the implementation of protocol_v2.

Signed-off-by: Brandon Williams 
---
 builtin/fetch-pack.c   | 3 +++
 builtin/receive-pack.c | 6 ++
 builtin/send-pack.c| 3 +++
 builtin/upload-pack.c  | 7 +++
 connect.c  | 3 +++
 protocol.c | 2 ++
 protocol.h | 1 +
 remote-curl.c  | 3 +++
 transport.c| 9 +
 9 files changed, 37 insertions(+)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 85d4faf76..f492e8abd 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -201,6 +201,9 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
   PACKET_READ_GENTLE_ON_EOF);
 
switch (discover_version()) {
+   case protocol_v2:
+   die("support for protocol v2 not implemented yet");
+   break;
case protocol_v1:
case protocol_v0:
get_remote_heads(, , 0, NULL, );
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index b7ce7c7f5..3656e94fd 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1963,6 +1963,12 @@ int cmd_receive_pack(int argc, const char **argv, const 
char *prefix)
unpack_limit = receive_unpack_limit;
 
switch (determine_protocol_version_server()) {
+   case protocol_v2:
+   /*
+* push support for protocol v2 has not been implemented yet,
+* so ignore the request to use v2 and fallback to using v0.
+*/
+   break;
case protocol_v1:
/*
 * v1 is just the original protocol with a version string,
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 83cb125a6..b5427f75e 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -263,6 +263,9 @@ int cmd_send_pack(int argc, const char **argv, const char 
*prefix)
   PACKET_READ_GENTLE_ON_EOF);
 
switch (discover_version()) {
+   case protocol_v2:
+   die("support for protocol v2 not implemented yet");
+   break;
case protocol_v1:
case protocol_v0:
get_remote_heads(, _refs, REF_NORMAL,
diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
index 2cb5cb35b..8d53e9794 100644
--- a/builtin/upload-pack.c
+++ b/builtin/upload-pack.c
@@ -47,6 +47,13 @@ int cmd_upload_pack(int argc, const char **argv, const char 
*prefix)
die("'%s' does not appear to be a git repository", dir);
 
switch (determine_protocol_version_server()) {
+   case protocol_v2:
+   /*
+* fetch support for protocol v2 has not been implemented yet,
+* so ignore the request to use v2 and fallback to using v0.
+*/
+   upload_pack();
+   break;
case protocol_v1:
/*
 * v1 is just the original protocol with a version string,
diff --git a/connect.c b/connect.c
index db3c9d24c..f2157a821 100644
--- a/connect.c
+++ b/connect.c
@@ -84,6 +84,9 @@ enum protocol_version discover_version(struct packet_reader 
*reader)
 
/* Maybe process capabilities here, at least for v2 */
switch (version) {
+   case protocol_v2:
+   die("support for protocol v2 not implemented yet");
+   break;
case protocol_v1:
/* Read the peeked version line */
packet_reader_read(reader);
diff --git a/protocol.c b/protocol.c
index 43012b7eb..5e636785d 100644
--- a/protocol.c
+++ b/protocol.c
@@ -8,6 +8,8 @@ static enum protocol_version parse_protocol_version(const char 
*value)
return protocol_v0;
else if (!strcmp(value, "1"))
return protocol_v1;
+   else if (!strcmp(value, "2"))
+   return protocol_v2;
else
return protocol_unknown_version;
 }
diff --git a/protocol.h b/protocol.h
index 1b2bc94a8..2ad35e433 100644
--- a/protocol.h
+++ b/protocol.h
@@ -5,6 +5,7 @@ enum protocol_version {
protocol_unknown_version = -1,
protocol_v0 = 0,
protocol_v1 = 1,
+   protocol_v2 = 2,
 };
 
 /*
diff --git a/remote-curl.c b/remote-curl.c
index 9f6d07683..dae8a4a48 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -185,6 +185,9 @@ static struct ref *parse_git_refs(struct discovery *heads, 
int for_push)
   PACKET_READ_GENTLE_ON_EOF);
 
switch (discover_version()) {
+   case protocol_v2:
+   die("support for protocol v2 not implemented yet");
+   break;
case protocol_v1:
case protocol_v0:
get_remote_heads(, , for_push ? REF_NORMAL : 0,
diff --git a/transport.c b/transport.c
index 2378dcb38..83d9dd1df 100644
--- a/transport.c
+++ b/transport.c
@@ -203,6 +203,9 @@

[PATCH v2 25/27] pkt-line: add packet_buf_write_len function

2018-01-25 Thread Brandon Williams

Add the 'packet_buf_write_len()' function which allows for writing an
arbitrary length buffer into a 'struct strbuf' and formatting it in
packet-line format.

Signed-off-by: Brandon Williams 
---
 pkt-line.c | 16 
 pkt-line.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 726e109ca..5a8a17ecc 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -215,6 +215,22 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, 
...)
va_end(args);
 }
 
+void packet_buf_write_len(struct strbuf *buf, const char *data, size_t len)
+{
+   size_t orig_len, n;
+
+   orig_len = buf->len;
+   strbuf_addstr(buf, "");
+   strbuf_add(buf, data, len);
+   n = buf->len - orig_len;
+
+   if (n > LARGE_PACKET_MAX)
+   die("protocol error: impossibly long line");
+
+   set_packet_header(>buf[orig_len], n);
+   packet_trace(buf->buf + orig_len + 4, n - 4, 1);
+}
+
 int write_packetized_from_fd(int fd_in, int fd_out)
 {
static char buf[LARGE_PACKET_DATA_MAX];
diff --git a/pkt-line.h b/pkt-line.h
index 16fe8bdbf..63724d4bf 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -26,6 +26,7 @@ void packet_buf_flush(struct strbuf *buf);
 void packet_buf_delim(struct strbuf *buf);
 void packet_write(int fd_out, const char *buf, size_t size);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
+void packet_buf_write_len(struct strbuf *buf, const char *data, size_t len);
 int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
 int write_packetized_from_fd(int fd_in, int fd_out);
-- 
2.16.0.rc1.238.g530d649a79-goog

[PATCH v2 18/27] fetch: pass ref patterns when fetching

2018-01-25 Thread Brandon Williams

Construct a list of ref patterns to be passed to
'transport_get_remote_refs()' from the refspec to be used during the
fetch.  This list of ref patterns will be used to allow the server to
filter the ref advertisement when communicating using protocol v2.

Signed-off-by: Brandon Williams 
---
 builtin/fetch.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 850382f55..8128450bf 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -332,11 +332,21 @@ static struct ref *get_ref_map(struct transport 
*transport,
struct ref *rm;
struct ref *ref_map = NULL;
struct ref **tail = _map;
+   struct argv_array ref_patterns = ARGV_ARRAY_INIT;
 
/* opportunistically-updated references: */
struct ref *orefs = NULL, **oref_tail = 
 
-   const struct ref *remote_refs = transport_get_remote_refs(transport, 
NULL);
+   const struct ref *remote_refs;
+
+   for (i = 0; i < refspec_count; i++) {
+   if (!refspecs[i].exact_sha1)
+   argv_array_push(_patterns, refspecs[i].src);
+   }
+
+   remote_refs = transport_get_remote_refs(transport, _patterns);
+
+   argv_array_clear(_patterns);
 
if (refspec_count) {
struct refspec *fetch_refspec;
-- 
2.16.0.rc1.238.g530d649a79-goog

[PATCH v2 17/27] ls-remote: pass ref patterns when requesting a remote's refs

2018-01-25 Thread Brandon Williams

Construct an argv_array of the ref patterns supplied via the command
line and pass them to 'transport_get_remote_refs()' to be used when
communicating protocol v2 so that the server can limit the ref
advertisement based on the supplied patterns.

Signed-off-by: Brandon Williams 
---
 builtin/ls-remote.c| 7 +--
 t/t5702-protocol-v2.sh | 8 
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index c6e9847c5..caf1051f3 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -43,6 +43,7 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
int show_symref_target = 0;
const char *uploadpack = NULL;
const char **pattern = NULL;
+   struct argv_array ref_patterns = ARGV_ARRAY_INIT;
 
struct remote *remote;
struct transport *transport;
@@ -74,8 +75,10 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
if (argc > 1) {
int i;
pattern = xcalloc(argc, sizeof(const char *));
-   for (i = 1; i < argc; i++)
+   for (i = 1; i < argc; i++) {
pattern[i - 1] = xstrfmt("*/%s", argv[i]);
+   argv_array_push(_patterns, argv[i]);
+   }
}
 
remote = remote_get(dest);
@@ -96,7 +99,7 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
if (uploadpack != NULL)
transport_set_option(transport, TRANS_OPT_UPLOADPACK, 
uploadpack);
 
-   ref = transport_get_remote_refs(transport, NULL);
+   ref = transport_get_remote_refs(transport, _patterns);
if (transport_disconnect(transport))
return 1;
 
diff --git a/t/t5702-protocol-v2.sh b/t/t5702-protocol-v2.sh
index 4bf4d61ac..7d8aeb766 100755
--- a/t/t5702-protocol-v2.sh
+++ b/t/t5702-protocol-v2.sh
@@ -25,4 +25,12 @@ test_expect_success 'list refs with file:// using protocol 
v2' '
test_cmp actual expect
 '
 
+test_expect_success 'ref advertisment is filtered with ls-remote using 
protocol v2' '
+   GIT_TRACE_PACKET=1 git -c protocol.version=2 \
+   ls-remote "file://$(pwd)/file_parent" master 2>log &&
+
+   grep "ref-pattern master" log &&
+   ! grep "refs/tags/" log
+'
+
 test_done
-- 
2.16.0.rc1.238.g530d649a79-goog

[PATCH v2 19/27] push: pass ref patterns when pushing

2018-01-25 Thread Brandon Williams

Construct a list of ref patterns to be passed to 'get_refs_list()' from
the refspec to be used during the push.  This list of ref patterns will
be used to allow the server to filter the ref advertisement when
communicating using protocol v2.

Signed-off-by: Brandon Williams 
---
 transport.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/transport.c b/transport.c
index dfc603b36..6ea3905e3 100644
--- a/transport.c
+++ b/transport.c
@@ -1026,11 +1026,26 @@ int transport_push(struct transport *transport,
int porcelain = flags & TRANSPORT_PUSH_PORCELAIN;
int pretend = flags & TRANSPORT_PUSH_DRY_RUN;
int push_ret, ret, err;
+   struct refspec *tmp_rs;
+   struct argv_array ref_patterns = ARGV_ARRAY_INIT;
+   int i;
 
if (check_push_refs(local_refs, refspec_nr, refspec) < 0)
return -1;
 
-   remote_refs = transport->vtable->get_refs_list(transport, 1, 
NULL);
+   tmp_rs = parse_push_refspec(refspec_nr, refspec);
+   for (i = 0; i < refspec_nr; i++) {
+   if (tmp_rs[i].dst)
+   argv_array_push(_patterns, tmp_rs[i].dst);
+   else if (tmp_rs[i].src && !tmp_rs[i].exact_sha1)
+   argv_array_push(_patterns, tmp_rs[i].src);
+   }
+
+   remote_refs = transport->vtable->get_refs_list(transport, 1,
+  _patterns);
+
+   argv_array_clear(_patterns);
+   free_refspec(refspec_nr, tmp_rs);
 
if (flags & TRANSPORT_PUSH_ALL)
match_flags |= MATCH_REFS_ALL;
-- 
2.16.0.rc1.238.g530d649a79-goog

[PATCH v2 16/27] transport: convert transport_get_remote_refs to take a list of ref patterns

2018-01-25 Thread Brandon Williams

Convert 'transport_get_remote_refs()' to optionally take a list of ref
patterns.

Signed-off-by: Brandon Williams 
---
 builtin/clone.c | 2 +-
 builtin/fetch.c | 4 ++--
 builtin/ls-remote.c | 2 +-
 builtin/remote.c| 2 +-
 transport.c | 7 +--
 transport.h | 3 ++-
 6 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 284651797..6e77d993f 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -1121,7 +1121,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
if (transport->smart_options && !deepen)
transport->smart_options->check_self_contained_and_connected = 
1;
 
-   refs = transport_get_remote_refs(transport);
+   refs = transport_get_remote_refs(transport, NULL);
 
if (refs) {
mapped_refs = wanted_peer_refs(refs, refspec);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 7bbcd26fa..850382f55 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -250,7 +250,7 @@ static void find_non_local_tags(struct transport *transport,
struct string_list_item *item = NULL;
 
for_each_ref(add_existing, _refs);
-   for (ref = transport_get_remote_refs(transport); ref; ref = ref->next) {
+   for (ref = transport_get_remote_refs(transport, NULL); ref; ref = 
ref->next) {
if (!starts_with(ref->name, "refs/tags/"))
continue;
 
@@ -336,7 +336,7 @@ static struct ref *get_ref_map(struct transport *transport,
/* opportunistically-updated references: */
struct ref *orefs = NULL, **oref_tail = 
 
-   const struct ref *remote_refs = transport_get_remote_refs(transport);
+   const struct ref *remote_refs = transport_get_remote_refs(transport, 
NULL);
 
if (refspec_count) {
struct refspec *fetch_refspec;
diff --git a/builtin/ls-remote.c b/builtin/ls-remote.c
index c4be98ab9..c6e9847c5 100644
--- a/builtin/ls-remote.c
+++ b/builtin/ls-remote.c
@@ -96,7 +96,7 @@ int cmd_ls_remote(int argc, const char **argv, const char 
*prefix)
if (uploadpack != NULL)
transport_set_option(transport, TRANS_OPT_UPLOADPACK, 
uploadpack);
 
-   ref = transport_get_remote_refs(transport);
+   ref = transport_get_remote_refs(transport, NULL);
if (transport_disconnect(transport))
return 1;
 
diff --git a/builtin/remote.c b/builtin/remote.c
index d95bf904c..d0b6ff6e2 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -862,7 +862,7 @@ static int get_remote_ref_states(const char *name,
if (query) {
transport = transport_get(states->remote, 
states->remote->url_nr > 0 ?
states->remote->url[0] : NULL);
-   remote_refs = transport_get_remote_refs(transport);
+   remote_refs = transport_get_remote_refs(transport, NULL);
transport_disconnect(transport);
 
states->queried = 1;
diff --git a/transport.c b/transport.c
index c54a44630..dfc603b36 100644
--- a/transport.c
+++ b/transport.c
@@ -1136,10 +1136,13 @@ int transport_push(struct transport *transport,
return 1;
 }
 
-const struct ref *transport_get_remote_refs(struct transport *transport)
+const struct ref *transport_get_remote_refs(struct transport *transport,
+   const struct argv_array 
*ref_patterns)
 {
if (!transport->got_remote_refs) {
-   transport->remote_refs = 
transport->vtable->get_refs_list(transport, 0, NULL);
+   transport->remote_refs =
+   transport->vtable->get_refs_list(transport, 0,
+ref_patterns);
transport->got_remote_refs = 1;
}
 
diff --git a/transport.h b/transport.h
index 731c78b67..4b656f315 100644
--- a/transport.h
+++ b/transport.h
@@ -178,7 +178,8 @@ int transport_push(struct transport *connection,
   int refspec_nr, const char **refspec, int flags,
   unsigned int * reject_reasons);
 
-const struct ref *transport_get_remote_refs(struct transport *transport);
+const struct ref *transport_get_remote_refs(struct transport *transport,
+   const struct argv_array 
*ref_patterns);
 
 int transport_fetch_refs(struct transport *transport, struct ref *refs);
 void transport_unlock_pack(struct transport *transport);
-- 
2.16.0.rc1.238.g530d649a79-goog

[PATCH v2 24/27] transport-helper: introduce stateless-connect

2018-01-25 Thread Brandon Williams

Introduce the transport-helper capability 'stateless-connect'.  This
capability indicates that the transport-helper can be requested to run
the 'stateless-connect' command which should attempt to make a
stateless connection with a remote end.  Once established, the
connection can be used by the git client to communicate with
the remote end natively in a stateless-rpc manner as supported by
protocol v2.  This means that the client must send everything the server
needs in a single request as the client must not assume any
state-storing on the part of the server or transport.

If a stateless connection cannot be established then the remote-helper
will respond in the same manner as the 'connect' command indicating that
the client should fallback to using the dumb remote-helper commands.

Signed-off-by: Brandon Williams 
---
 transport-helper.c | 8 
 transport.c| 1 +
 transport.h| 6 ++
 3 files changed, 15 insertions(+)

diff --git a/transport-helper.c b/transport-helper.c
index c032a2a87..82eb57c4a 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -26,6 +26,7 @@ struct helper_data {
option : 1,
push : 1,
connect : 1,
+   stateless_connect : 1,
signed_tags : 1,
check_connectivity : 1,
no_disconnect_req : 1,
@@ -188,6 +189,8 @@ static struct child_process *get_helper(struct transport 
*transport)
refspecs[refspec_nr++] = xstrdup(arg);
} else if (!strcmp(capname, "connect")) {
data->connect = 1;
+   } else if (!strcmp(capname, "stateless-connect")) {
+   data->stateless_connect = 1;
} else if (!strcmp(capname, "signed-tags")) {
data->signed_tags = 1;
} else if (skip_prefix(capname, "export-marks ", )) {
@@ -612,6 +615,11 @@ static int process_connect_service(struct transport 
*transport,
if (data->connect) {
strbuf_addf(, "connect %s\n", name);
ret = run_connect(transport, );
+   } else if (data->stateless_connect) {
+   strbuf_addf(, "stateless-connect %s\n", name);
+   ret = run_connect(transport, );
+   if (ret)
+   transport->stateless_rpc = 1;
}
 
strbuf_release();
diff --git a/transport.c b/transport.c
index 4fdbd9adc..aafb8fbb4 100644
--- a/transport.c
+++ b/transport.c
@@ -250,6 +250,7 @@ static int fetch_refs_via_pack(struct transport *transport,
data->options.check_self_contained_and_connected;
args.cloning = transport->cloning;
args.update_shallow = data->options.update_shallow;
+   args.stateless_rpc = transport->stateless_rpc;
 
if (!data->got_remote_heads)
refs_tmp = get_refs_via_connect(transport, 0, NULL);
diff --git a/transport.h b/transport.h
index 4b656f315..9eac809ee 100644
--- a/transport.h
+++ b/transport.h
@@ -55,6 +55,12 @@ struct transport {
 */
unsigned cloning : 1;
 
+   /*
+* Indicates that the transport is connected via a half-duplex
+* connection and should operate in stateless-rpc mode.
+*/
+   unsigned stateless_rpc : 1;
+
/*
 * These strings will be passed to the {pre, post}-receive hook,
 * on the remote side, if both sides support the push options 
capability.
-- 
2.16.0.rc1.238.g530d649a79-goog

[PATCH v2 26/27] remote-curl: create copy of the service name

2018-01-25 Thread Brandon Williams

Make a copy of the service name being requested instead of relying on
the buffer pointed to by the passed in 'const char *' to remain
unchanged.

Signed-off-by: Brandon Williams 
---
 remote-curl.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index dae8a4a48..4086aa733 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -165,7 +165,7 @@ static int set_option(const char *name, const char *value)
 }
 
 struct discovery {
-   const char *service;
+   char *service;
char *buf_alloc;
char *buf;
size_t len;
@@ -257,6 +257,7 @@ static void free_discovery(struct discovery *d)
free(d->shallow.oid);
free(d->buf_alloc);
free_refs(d->refs);
+   free(d->service);
free(d);
}
 }
@@ -343,7 +344,7 @@ static struct discovery *discover_refs(const char *service, 
int for_push)
warning(_("redirecting to %s"), url.buf);
 
last= xcalloc(1, sizeof(*last_discovery));
-   last->service = service;
+   last->service = xstrdup(service);
last->buf_alloc = strbuf_detach(, >len);
last->buf = last->buf_alloc;
 
-- 
2.16.0.rc1.238.g530d649a79-goog

[PATCH v2 14/27] connect: request remote refs using v2

2018-01-25 Thread Brandon Williams

Teach the client to be able to request a remote's refs using protocol
v2.  This is done by having a client issue a 'ls-refs' request to a v2
server.

Signed-off-by: Brandon Williams 
---
 builtin/upload-pack.c  |  10 ++--
 connect.c  | 123 -
 remote.h   |   4 ++
 t/t5702-protocol-v2.sh |  28 +++
 transport.c|   2 +-
 5 files changed, 160 insertions(+), 7 deletions(-)
 create mode 100755 t/t5702-protocol-v2.sh

diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
index 8d53e9794..a757df8da 100644
--- a/builtin/upload-pack.c
+++ b/builtin/upload-pack.c
@@ -5,6 +5,7 @@
 #include "parse-options.h"
 #include "protocol.h"
 #include "upload-pack.h"
+#include "serve.h"
 
 static const char * const upload_pack_usage[] = {
N_("git upload-pack [] "),
@@ -16,6 +17,7 @@ int cmd_upload_pack(int argc, const char **argv, const char 
*prefix)
const char *dir;
int strict = 0;
struct upload_pack_options opts = { 0 };
+   struct serve_options serve_opts = SERVE_OPTIONS_INIT;
struct option options[] = {
OPT_BOOL(0, "stateless-rpc", _rpc,
 N_("quit after a single request/response exchange")),
@@ -48,11 +50,9 @@ int cmd_upload_pack(int argc, const char **argv, const char 
*prefix)
 
switch (determine_protocol_version_server()) {
case protocol_v2:
-   /*
-* fetch support for protocol v2 has not been implemented yet,
-* so ignore the request to use v2 and fallback to using v0.
-*/
-   upload_pack();
+   serve_opts.advertise_capabilities = opts.advertise_refs;
+   serve_opts.stateless_rpc = opts.stateless_rpc;
+   serve(_opts);
break;
case protocol_v1:
/*
diff --git a/connect.c b/connect.c
index f2157a821..3c653b65b 100644
--- a/connect.c
+++ b/connect.c
@@ -12,9 +12,11 @@
 #include "sha1-array.h"
 #include "transport.h"
 #include "strbuf.h"
+#include "version.h"
 #include "protocol.h"
 
 static char *server_capabilities;
+static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;
 static const char *parse_feature_value(const char *, const char *, int *);
 
 static int check_ref(const char *name, unsigned int flags)
@@ -62,6 +64,33 @@ static void die_initial_contact(int unexpected)
  "and the repository exists."));
 }
 
+/* Checks if the server supports the capability 'c' */
+static int server_supports_v2(const char *c, int die_on_error)
+{
+   int i;
+
+   for (i = 0; i < server_capabilities_v2.argc; i++) {
+   const char *out;
+   if (skip_prefix(server_capabilities_v2.argv[i], c, ) &&
+   (!*out || *out == '='))
+   return 1;
+   }
+
+   if (die_on_error)
+   die("server doesn't support '%s'", c);
+
+   return 0;
+}
+
+static void process_capabilities_v2(struct packet_reader *reader)
+{
+   while (packet_reader_read(reader) == PACKET_READ_NORMAL)
+   argv_array_push(_capabilities_v2, reader->line);
+
+   if (reader->status != PACKET_READ_FLUSH)
+   die("protocol error");
+}
+
 enum protocol_version discover_version(struct packet_reader *reader)
 {
enum protocol_version version = protocol_unknown_version;
@@ -85,7 +114,7 @@ enum protocol_version discover_version(struct packet_reader 
*reader)
/* Maybe process capabilities here, at least for v2 */
switch (version) {
case protocol_v2:
-   die("support for protocol v2 not implemented yet");
+   process_capabilities_v2(reader);
break;
case protocol_v1:
/* Read the peeked version line */
@@ -293,6 +322,98 @@ struct ref **get_remote_heads(struct packet_reader *reader,
return list;
 }
 
+static int process_ref_v2(const char *line, struct ref ***list)
+{
+   int ret = 1;
+   int i = 0;
+   struct object_id old_oid;
+   struct ref *ref;
+   struct string_list line_sections = STRING_LIST_INIT_DUP;
+
+   if (string_list_split(_sections, line, ' ', -1) < 2) {
+   ret = 0;
+   goto out;
+   }
+
+   if (get_oid_hex(line_sections.items[i++].string, _oid)) {
+   ret = 0;
+   goto out;
+   }
+
+   ref = alloc_ref(line_sections.items[i++].string);
+
+   oidcpy(>old_oid, _oid);
+   **list = ref;
+   *list = >next;
+
+   for (; i < line_sections.nr; i++) {
+   const char *arg = line_sections.items[i].string;
+   if (skip_prefix(arg, "symref-target:", ))
+   ref->symref = xstrdup(arg);
+
+   if (skip_prefix(arg, "peeled:", )) {
+   struct object_id peeled_oid;
+   char *peeled_name;
+

[PATCH v2 22/27] transport-helper: remove name parameter

2018-01-25 Thread Brandon Williams

Commit 266f1fdfa (transport-helper: be quiet on read errors from
helpers, 2013-06-21) removed a call to 'die()' which printed the name of
the remote helper passed in to the 'recvline_fh()' function using the
'name' parameter.  Once the call to 'die()' was removed the parameter
was no longer necessary but wasn't removed.  Clean up 'recvline_fh()'
parameter list by removing the 'name' parameter.

Signed-off-by: Brandon Williams 
---
 transport-helper.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index 4c334b5ee..d72155768 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -49,7 +49,7 @@ static void sendline(struct helper_data *helper, struct 
strbuf *buffer)
die_errno("Full write to remote helper failed");
 }
 
-static int recvline_fh(FILE *helper, struct strbuf *buffer, const char *name)
+static int recvline_fh(FILE *helper, struct strbuf *buffer)
 {
strbuf_reset(buffer);
if (debug)
@@ -67,7 +67,7 @@ static int recvline_fh(FILE *helper, struct strbuf *buffer, 
const char *name)
 
 static int recvline(struct helper_data *helper, struct strbuf *buffer)
 {
-   return recvline_fh(helper->out, buffer, helper->name);
+   return recvline_fh(helper->out, buffer);
 }
 
 static void write_constant(int fd, const char *str)
@@ -586,7 +586,7 @@ static int process_connect_service(struct transport 
*transport,
goto exit;
 
sendline(data, );
-   if (recvline_fh(input, , name))
+   if (recvline_fh(input, ))
exit(128);
 
if (!strcmp(cmdbuf.buf, "")) {
-- 
2.16.0.rc1.238.g530d649a79-goog

[PATCH v2 13/27] ls-refs: introduce ls-refs server command

2018-01-25 Thread Brandon Williams

Introduce the ls-refs server command.  In protocol v2, the ls-refs
command is used to request the ref advertisement from the server.  Since
it is a command which can be requested (as opposed to mandatory in v1),
a client can sent a number of parameters in its request to limit the ref
advertisement based on provided ref-patterns.

Signed-off-by: Brandon Williams 
---
 Documentation/technical/protocol-v2.txt |  32 +
 Makefile|   1 +
 ls-refs.c   |  96 ++
 ls-refs.h   |   9 +++
 serve.c |   2 +
 t/t5701-git-serve.sh| 115 
 6 files changed, 255 insertions(+)
 create mode 100644 ls-refs.c
 create mode 100644 ls-refs.h

diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index 7f619a76c..4683d41ac 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -115,3 +115,35 @@ complete it (multiple for negotiation during fetch or no 
additional
 trips in the case of ls-refs).  If the client sends the `stateless-rpc`
 capability with a value of `true` (in the form `stateless-rpc=true`)
 then the invoked command must only last a single round.
+
+ ls-refs
+-
+
+`ls-refs` is the command used to request a reference advertisement in v2.
+Unlike the current reference advertisement, ls-refs takes in parameters
+which can be used to limit the refs sent from the server.
+
+Additional features not supported in the base command will be advertised
+as the value of the command in the capability advertisement in the form
+of a space separated list of features, e.g.  "=
+".
+
+ls-refs takes in the following parameters wrapped in packet-lines:
+
+symrefs
+   In addition to the object pointed by it, show the underlying ref
+   pointed by it when showing a symbolic ref.
+peel
+   Show peeled tags.
+ref-pattern 
+   When specified, only references matching the one of the provided
+   patterns are displayed.
+
+The output of ls-refs is as follows:
+
+output = *ref
+flush-pkt
+ref = PKT-LINE(obj-id SP refname *(SP ref-attribute) LF)
+ref-attribute = (symref | peeled)
+symref = "symref-target:" symref-target
+peeled = "peeled:" obj-id
diff --git a/Makefile b/Makefile
index 18c255428..e50927cfb 100644
--- a/Makefile
+++ b/Makefile
@@ -825,6 +825,7 @@ LIB_OBJS += list-objects-filter-options.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
+LIB_OBJS += ls-refs.o
 LIB_OBJS += mailinfo.o
 LIB_OBJS += mailmap.o
 LIB_OBJS += match-trees.o
diff --git a/ls-refs.c b/ls-refs.c
new file mode 100644
index 0..70682b4f7
--- /dev/null
+++ b/ls-refs.c
@@ -0,0 +1,96 @@
+#include "cache.h"
+#include "repository.h"
+#include "refs.h"
+#include "remote.h"
+#include "argv-array.h"
+#include "ls-refs.h"
+#include "pkt-line.h"
+
+struct ls_refs_data {
+   unsigned peel;
+   unsigned symrefs;
+   struct argv_array patterns;
+};
+
+/*
+ * Check if one of the patterns matches the tail part of the ref.
+ * If no patterns were provided, all refs match.
+ */
+static int ref_match(const struct argv_array *patterns, const char *refname)
+{
+   char *pathbuf;
+   int i;
+
+   if (!patterns->argc)
+   return 1; /* no restriction */
+
+   pathbuf = xstrfmt("/%s", refname);
+   for (i = 0; i < patterns->argc; i++) {
+   if (!wildmatch(patterns->argv[i], pathbuf, 0)) {
+   free(pathbuf);
+   return 1;
+   }
+   }
+   free(pathbuf);
+   return 0;
+}
+
+static int send_ref(const char *refname, const struct object_id *oid,
+   int flag, void *cb_data)
+{
+   struct ls_refs_data *data = cb_data;
+   const char *refname_nons = strip_namespace(refname);
+   struct strbuf refline = STRBUF_INIT;
+
+   if (!ref_match(>patterns, refname))
+   return 0;
+
+   strbuf_addf(, "%s %s", oid_to_hex(oid), refname_nons);
+   if (data->symrefs && flag & REF_ISSYMREF) {
+   struct object_id unused;
+   const char *symref_target = resolve_ref_unsafe(refname, 0,
+  ,
+  );
+
+   if (!symref_target)
+   die("'%s' is a symref but it is not?", refname);
+
+   strbuf_addf(, " symref-target:%s", symref_target);
+   }
+
+   if (data->peel) {
+   struct object_id peeled;
+   if (!peel_ref(refname, ))
+   strbuf_addf(, " peeled:%s", 
oid_to_hex());
+   }
+
+   strbuf_addch(, '\n');
+   packet_write(1, refline.buf, refline.len);
+
+   strbuf_release();
+   return 0;
+}
+

[PATCH v2 23/27] transport-helper: refactor process_connect_service

2018-01-25 Thread Brandon Williams

A future patch will need to take advantage of the logic which runs and
processes the response of the connect command on a remote helper so
factor out this logic from 'process_connect_service()' and place it into
a helper function 'run_connect()'.

Signed-off-by: Brandon Williams 
---
 transport-helper.c | 67 +++---
 1 file changed, 38 insertions(+), 29 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index d72155768..c032a2a87 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -545,14 +545,13 @@ static int fetch_with_import(struct transport *transport,
return 0;
 }
 
-static int process_connect_service(struct transport *transport,
-  const char *name, const char *exec)
+static int run_connect(struct transport *transport, struct strbuf *cmdbuf)
 {
struct helper_data *data = transport->data;
-   struct strbuf cmdbuf = STRBUF_INIT;
-   struct child_process *helper;
-   int r, duped, ret = 0;
+   int ret = 0;
+   int duped;
FILE *input;
+   struct child_process *helper;
 
helper = get_helper(transport);
 
@@ -568,44 +567,54 @@ static int process_connect_service(struct transport 
*transport,
input = xfdopen(duped, "r");
setvbuf(input, NULL, _IONBF, 0);
 
+   sendline(data, cmdbuf);
+   if (recvline_fh(input, cmdbuf))
+   exit(128);
+
+   if (!strcmp(cmdbuf->buf, "")) {
+   data->no_disconnect_req = 1;
+   if (debug)
+   fprintf(stderr, "Debug: Smart transport connection "
+   "ready.\n");
+   ret = 1;
+   } else if (!strcmp(cmdbuf->buf, "fallback")) {
+   if (debug)
+   fprintf(stderr, "Debug: Falling back to dumb "
+   "transport.\n");
+   } else {
+   die("Unknown response to connect: %s",
+   cmdbuf->buf);
+   }
+
+   fclose(input);
+   return ret;
+}
+
+static int process_connect_service(struct transport *transport,
+  const char *name, const char *exec)
+{
+   struct helper_data *data = transport->data;
+   struct strbuf cmdbuf = STRBUF_INIT;
+   int ret = 0;
+
/*
 * Handle --upload-pack and friends. This is fire and forget...
 * just warn if it fails.
 */
if (strcmp(name, exec)) {
-   r = set_helper_option(transport, "servpath", exec);
+   int r = set_helper_option(transport, "servpath", exec);
if (r > 0)
warning("Setting remote service path not supported by 
protocol.");
else if (r < 0)
warning("Invalid remote service path.");
}
 
-   if (data->connect)
+   if (data->connect) {
strbuf_addf(, "connect %s\n", name);
-   else
-   goto exit;
-
-   sendline(data, );
-   if (recvline_fh(input, ))
-   exit(128);
-
-   if (!strcmp(cmdbuf.buf, "")) {
-   data->no_disconnect_req = 1;
-   if (debug)
-   fprintf(stderr, "Debug: Smart transport connection "
-   "ready.\n");
-   ret = 1;
-   } else if (!strcmp(cmdbuf.buf, "fallback")) {
-   if (debug)
-   fprintf(stderr, "Debug: Falling back to dumb "
-   "transport.\n");
-   } else
-   die("Unknown response to connect: %s",
-   cmdbuf.buf);
+   ret = run_connect(transport, );
+   }
 
-exit:
strbuf_release();
-   fclose(input);
return ret;
 }
 
-- 
2.16.0.rc1.238.g530d649a79-goog

[PATCH v2 15/27] transport: convert get_refs_list to take a list of ref patterns

2018-01-25 Thread Brandon Williams

Convert the 'struct transport' virtual function 'get_refs_list()' to
optionally take an argv_array of ref patterns.  When communicating with
a server using protocol v2 these ref patterns can be sent when
requesting a listing of their refs allowing the server to filter the
refs it sends based on the sent patterns.

Signed-off-by: Brandon Williams 
---
 transport-helper.c   |  5 +++--
 transport-internal.h |  4 +++-
 transport.c  | 16 +---
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index 508015023..4c334b5ee 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -1026,7 +1026,8 @@ static int has_attribute(const char *attrs, const char 
*attr) {
}
 }
 
-static struct ref *get_refs_list(struct transport *transport, int for_push)
+static struct ref *get_refs_list(struct transport *transport, int for_push,
+const struct argv_array *ref_patterns)
 {
struct helper_data *data = transport->data;
struct child_process *helper;
@@ -1039,7 +1040,7 @@ static struct ref *get_refs_list(struct transport 
*transport, int for_push)
 
if (process_connect(transport, for_push)) {
do_take_over(transport);
-   return transport->vtable->get_refs_list(transport, for_push);
+   return transport->vtable->get_refs_list(transport, for_push, 
ref_patterns);
}
 
if (data->push && for_push)
diff --git a/transport-internal.h b/transport-internal.h
index 3c1a29d72..a67657ce3 100644
--- a/transport-internal.h
+++ b/transport-internal.h
@@ -3,6 +3,7 @@
 
 struct ref;
 struct transport;
+struct argv_array;
 
 struct transport_vtable {
/**
@@ -21,7 +22,8 @@ struct transport_vtable {
 * the ref without a huge amount of effort, it should store it
 * in the ref's old_sha1 field; otherwise it should be all 0.
 **/
-   struct ref *(*get_refs_list)(struct transport *transport, int for_push);
+   struct ref *(*get_refs_list)(struct transport *transport, int for_push,
+const struct argv_array *ref_patterns);
 
/**
 * Fetch the objects for the given refs. Note that this gets
diff --git a/transport.c b/transport.c
index ffc6b2614..c54a44630 100644
--- a/transport.c
+++ b/transport.c
@@ -72,7 +72,7 @@ struct bundle_transport_data {
struct bundle_header header;
 };
 
-static struct ref *get_refs_from_bundle(struct transport *transport, int 
for_push)
+static struct ref *get_refs_from_bundle(struct transport *transport, int 
for_push, const struct argv_array *ref_patterns)
 {
struct bundle_transport_data *data = transport->data;
struct ref *result = NULL;
@@ -189,7 +189,8 @@ static int connect_setup(struct transport *transport, int 
for_push)
return 0;
 }
 
-static struct ref *get_refs_via_connect(struct transport *transport, int 
for_push)
+static struct ref *get_refs_via_connect(struct transport *transport, int 
for_push,
+   const struct argv_array *ref_patterns)
 {
struct git_transport_data *data = transport->data;
struct ref *refs = NULL;
@@ -204,7 +205,8 @@ static struct ref *get_refs_via_connect(struct transport 
*transport, int for_pus
data->version = discover_version();
switch (data->version) {
case protocol_v2:
-   get_remote_refs(data->fd[1], , , for_push, NULL);
+   get_remote_refs(data->fd[1], , , for_push,
+   ref_patterns);
break;
case protocol_v1:
case protocol_v0:
@@ -250,7 +252,7 @@ static int fetch_refs_via_pack(struct transport *transport,
args.update_shallow = data->options.update_shallow;
 
if (!data->got_remote_heads)
-   refs_tmp = get_refs_via_connect(transport, 0);
+   refs_tmp = get_refs_via_connect(transport, 0, NULL);
 
switch (data->version) {
case protocol_v2:
@@ -568,7 +570,7 @@ static int git_transport_push(struct transport *transport, 
struct ref *remote_re
int ret = 0;
 
if (!data->got_remote_heads)
-   get_refs_via_connect(transport, 1);
+   get_refs_via_connect(transport, 1, NULL);
 
memset(, 0, sizeof(args));
args.send_mirror = !!(flags & TRANSPORT_PUSH_MIRROR);
@@ -1028,7 +1030,7 @@ int transport_push(struct transport *transport,
if (check_push_refs(local_refs, refspec_nr, refspec) < 0)
return -1;
 
-   remote_refs = transport->vtable->get_refs_list(transport, 1);
+   remote_refs = transport->vtable->get_refs_list(transport, 1, 
NULL);
 
if (flags & TRANSPORT_PUSH_ALL)
match_flags |= MATCH_REFS_ALL;
@@ -1137,7 +1139,7 @@ int transport_push(struct transport *transport,
 const struct ref

[PATCH v2 00/27] protocol version 2

2018-01-25 Thread Brandon Williams

Changes in v2:
 * Added documentation for fetch
 * changes #defines for state variables to be enums
 * couple code changes to pkt-line functions and documentation
 * Added unit tests for the git-serve binary as well as for ls-refs

Areas for improvement
 * Push isn't implemented, right now this is ok because if v2 is requested the
   server can just default to v0.  Before this can be merged we may want to
   change how the client request a new protocol, and not allow for sending
   "version=2" when pushing even though the user has it configured.  Or maybe
   its fine to just have an older client who doesn't understand how to push
   (and request v2) to die if the server tries to speak v2 at it.

   Fixing this essentially would just require piping through a bit more
   information to the function which ultimately runs connect (for both builtins
   and remote-curl)

 * I want to make sure that the docs are well written before this gets merged
   so I'm hoping that someone can do a through review on the docs themselves to
   make sure they are clear.

 * Right now there is a capability 'stateless-rpc' which essentially makes sure
   that a server command completes after a single round (this is to make sure
   http works cleanly).  After talking with some folks it may make more sense
   to just have v2 be stateless in nature so that all commands terminate after
   a single round trip.  This makes things a bit easier if a server wants to
   have ssh just be a proxy for http.

   One potential thing would be to flip this so that by default the protocol is
   stateless and if a server/command has a state-full mode that can be
   implemented as a capability at a later point.  Thoughts?

 * Shallow repositories and shallow clones aren't supported yet.  I'm working
   on it and it can be either added to v2 by default if people think it needs
   to be in there from the start, or we can add it as a capability at a later
   point.

Series can also be found on on github: 
https://github.com/bmwill/git/tree/protocol-v2

Brandon Williams (27):
  pkt-line: introduce packet_read_with_status
  pkt-line: introduce struct packet_reader
  pkt-line: add delim packet support
  upload-pack: convert to a builtin
  upload-pack: factor out processing lines
  transport: use get_refs_via_connect to get refs
  connect: convert get_remote_heads to use struct packet_reader
  connect: discover protocol version outside of get_remote_heads
  transport: store protocol version
  protocol: introduce enum protocol_version value protocol_v2
  test-pkt-line: introduce a packet-line test helper
  serve: introduce git-serve
  ls-refs: introduce ls-refs server command
  connect: request remote refs using v2
  transport: convert get_refs_list to take a list of ref patterns
  transport: convert transport_get_remote_refs to take a list of ref
patterns
  ls-remote: pass ref patterns when requesting a remote's refs
  fetch: pass ref patterns when fetching
  push: pass ref patterns when pushing
  upload-pack: introduce fetch server command
  fetch-pack: perform a fetch using v2
  transport-helper: remove name parameter
  transport-helper: refactor process_connect_service
  transport-helper: introduce stateless-connect
  pkt-line: add packet_buf_write_len function
  remote-curl: create copy of the service name
  remote-curl: implement stateless-connect command

 .gitignore  |   1 +
 Documentation/technical/protocol-v2.txt | 270 +
 Makefile|   7 +-
 builtin.h   |   2 +
 builtin/clone.c |   2 +-
 builtin/fetch-pack.c|  21 +-
 builtin/fetch.c |  14 +-
 builtin/ls-remote.c |   7 +-
 builtin/receive-pack.c  |   6 +
 builtin/remote.c|   2 +-
 builtin/send-pack.c |  20 +-
 builtin/serve.c |  30 ++
 builtin/upload-pack.c   |  74 +
 connect.c   | 295 ++-
 connect.h   |   3 +
 fetch-pack.c| 277 +-
 fetch-pack.h|   4 +-
 git.c   |   2 +
 ls-refs.c   |  96 ++
 ls-refs.h   |   9 +
 pkt-line.c  | 149 +-
 pkt-line.h  |  77 +
 protocol.c  |   2 +
 protocol.h  |   1 +
 remote-curl.c   | 209 -
 remote.h|   9 +-
 serve.c | 253 
 serve.h |  15 +
 t/helper/test-pkt-line.c|  62 
 t/t5701-git-serve.sh| 172 +++
 t/t5702-protocol-v2.sh

[PATCH v2 02/27] pkt-line: introduce struct packet_reader

2018-01-25 Thread Brandon Williams

Sometimes it is advantageous to be able to peek the next packet line
without consuming it (e.g. to be able to determine the protocol version
a server is speaking).  In order to do that introduce 'struct
packet_reader' which is an abstraction around the normal packet reading
logic.  This enables a caller to be able to peek a single line at a time
using 'packet_reader_peek()' and having a caller consume a line by
calling 'packet_reader_read()'.

Signed-off-by: Brandon Williams 
---
 pkt-line.c | 59 +++
 pkt-line.h | 58 ++
 2 files changed, 117 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index af0d2430f..4fc9ad4b0 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -406,3 +406,62 @@ ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf 
*sb_out)
}
return sb_out->len - orig_len;
 }
+
+/* Packet Reader Functions */
+void packet_reader_init(struct packet_reader *reader, int fd,
+   char *src_buffer, size_t src_len,
+   int options)
+{
+   memset(reader, 0, sizeof(*reader));
+
+   reader->fd = fd;
+   reader->src_buffer = src_buffer;
+   reader->src_len = src_len;
+   reader->buffer = packet_buffer;
+   reader->buffer_size = sizeof(packet_buffer);
+   reader->options = options;
+}
+
+enum packet_read_status packet_reader_read(struct packet_reader *reader)
+{
+   if (reader->line_peeked) {
+   reader->line_peeked = 0;
+   return reader->status;
+   }
+
+   reader->status = packet_read_with_status(reader->fd,
+>src_buffer,
+>src_len,
+reader->buffer,
+reader->buffer_size,
+>pktlen,
+reader->options);
+
+   switch (reader->status) {
+   case PACKET_READ_EOF:
+   reader->pktlen = -1;
+   reader->line = NULL;
+   break;
+   case PACKET_READ_NORMAL:
+   reader->line = reader->buffer;
+   break;
+   case PACKET_READ_FLUSH:
+   reader->pktlen = 0;
+   reader->line = NULL;
+   break;
+   }
+
+   return reader->status;
+}
+
+enum packet_read_status packet_reader_peek(struct packet_reader *reader)
+{
+   /* Only allow peeking a single line */
+   if (reader->line_peeked)
+   return reader->status;
+
+   /* Peek a line by reading it and setting peeked flag */
+   packet_reader_read(reader);
+   reader->line_peeked = 1;
+   return reader->status;
+}
diff --git a/pkt-line.h b/pkt-line.h
index 06c468927..7d9f0e537 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -111,6 +111,64 @@ char *packet_read_line_buf(char **src_buf, size_t 
*src_len, int *size);
  */
 ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf *sb_out);
 
+struct packet_reader {
+   /* source file descriptor */
+   int fd;
+
+   /* source buffer and its size */
+   char *src_buffer;
+   size_t src_len;
+
+   /* buffer that pkt-lines are read into and its size */
+   char *buffer;
+   unsigned buffer_size;
+
+   /* options to be used during reads */
+   int options;
+
+   /* status of the last read */
+   enum packet_read_status status;
+
+   /* length of data read during the last read */
+   int pktlen;
+
+   /* the last line read */
+   const char *line;
+
+   /* indicates if a line has been peeked */
+   int line_peeked;
+};
+
+/*
+ * Initialize a 'struct packet_reader' object which is an
+ * abstraction around the 'packet_read_with_status()' function.
+ */
+extern void packet_reader_init(struct packet_reader *reader, int fd,
+  char *src_buffer, size_t src_len,
+  int options);
+
+/*
+ * Perform a packet read and return the status of the read.
+ * The values of 'pktlen' and 'line' are updated based on the status of the
+ * read as follows:
+ *
+ * PACKET_READ_ERROR: 'pktlen' is set to '-1' and 'line' is set to NULL
+ * PACKET_READ_NORMAL: 'pktlen' is set to the number of bytes read
+ *'line' is set to point at the read line
+ * PACKET_READ_FLUSH: 'pktlen' is set to '0' and 'line' is set to NULL
+ */
+extern enum packet_read_status packet_reader_read(struct packet_reader 
*reader);
+
+/*
+ * Peek the next packet line without consuming it and return the status.
+ * The next call to 'packet_reader_read()' will perform a read of the same line
+ * that was peeked, consuming the line.
+ *
+ * Peeking multiple times without calling 'packet_reader_read()' will return
+ * the same result.
+ */
+extern enum packet_read_status

[PATCH v2 06/27] transport: use get_refs_via_connect to get refs

2018-01-25 Thread Brandon Williams

Remove code duplication and use the existing 'get_refs_via_connect()'
function to retrieve a remote's heads in 'fetch_refs_via_pack()' and
'git_transport_push()'.

Signed-off-by: Brandon Williams 
---
 transport.c | 18 --
 1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/transport.c b/transport.c
index fc802260f..8e8779096 100644
--- a/transport.c
+++ b/transport.c
@@ -230,12 +230,8 @@ static int fetch_refs_via_pack(struct transport *transport,
args.cloning = transport->cloning;
args.update_shallow = data->options.update_shallow;
 
-   if (!data->got_remote_heads) {
-   connect_setup(transport, 0);
-   get_remote_heads(data->fd[0], NULL, 0, _tmp, 0,
-NULL, >shallow);
-   data->got_remote_heads = 1;
-   }
+   if (!data->got_remote_heads)
+   refs_tmp = get_refs_via_connect(transport, 0);
 
refs = fetch_pack(, data->fd, data->conn,
  refs_tmp ? refs_tmp : transport->remote_refs,
@@ -541,14 +537,8 @@ static int git_transport_push(struct transport *transport, 
struct ref *remote_re
struct send_pack_args args;
int ret;
 
-   if (!data->got_remote_heads) {
-   struct ref *tmp_refs;
-   connect_setup(transport, 1);
-
-   get_remote_heads(data->fd[0], NULL, 0, _refs, REF_NORMAL,
-NULL, >shallow);
-   data->got_remote_heads = 1;
-   }
+   if (!data->got_remote_heads)
+   get_refs_via_connect(transport, 1);
 
memset(, 0, sizeof(args));
args.send_mirror = !!(flags & TRANSPORT_PUSH_MIRROR);
-- 
2.16.0.rc1.238.g530d649a79-goog

[PATCH v2 09/27] transport: store protocol version

2018-01-25 Thread Brandon Williams

Once protocol_v2 is introduced requesting a fetch or a push will need to
be handled differently depending on the protocol version.  Store the
protocol version the server is speaking in 'struct git_transport_data'
and use it to determine what to do in the case of a fetch or a push.

Signed-off-by: Brandon Williams 
---
 transport.c | 35 ++-
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/transport.c b/transport.c
index 63c3dbab9..2378dcb38 100644
--- a/transport.c
+++ b/transport.c
@@ -118,6 +118,7 @@ struct git_transport_data {
struct child_process *conn;
int fd[2];
unsigned got_remote_heads : 1;
+   enum protocol_version version;
struct oid_array extra_have;
struct oid_array shallow;
 };
@@ -200,7 +201,8 @@ static struct ref *get_refs_via_connect(struct transport 
*transport, int for_pus
   PACKET_READ_CHOMP_NEWLINE |
   PACKET_READ_GENTLE_ON_EOF);
 
-   switch (discover_version()) {
+   data->version = discover_version();
+   switch (data->version) {
case protocol_v1:
case protocol_v0:
get_remote_heads(, ,
@@ -221,7 +223,7 @@ static int fetch_refs_via_pack(struct transport *transport,
 {
int ret = 0;
struct git_transport_data *data = transport->data;
-   struct ref *refs;
+   struct ref *refs = NULL;
char *dest = xstrdup(transport->url);
struct fetch_pack_args args;
struct ref *refs_tmp = NULL;
@@ -247,10 +249,18 @@ static int fetch_refs_via_pack(struct transport 
*transport,
if (!data->got_remote_heads)
refs_tmp = get_refs_via_connect(transport, 0);
 
-   refs = fetch_pack(, data->fd, data->conn,
- refs_tmp ? refs_tmp : transport->remote_refs,
- dest, to_fetch, nr_heads, >shallow,
- >pack_lockfile);
+   switch (data->version) {
+   case protocol_v1:
+   case protocol_v0:
+   refs = fetch_pack(, data->fd, data->conn,
+ refs_tmp ? refs_tmp : transport->remote_refs,
+ dest, to_fetch, nr_heads, >shallow,
+ >pack_lockfile);
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
+
close(data->fd[0]);
close(data->fd[1]);
if (finish_connect(data->conn))
@@ -549,7 +559,7 @@ static int git_transport_push(struct transport *transport, 
struct ref *remote_re
 {
struct git_transport_data *data = transport->data;
struct send_pack_args args;
-   int ret;
+   int ret = 0;
 
if (!data->got_remote_heads)
get_refs_via_connect(transport, 1);
@@ -574,8 +584,15 @@ static int git_transport_push(struct transport *transport, 
struct ref *remote_re
else
args.push_cert = SEND_PACK_PUSH_CERT_NEVER;
 
-   ret = send_pack(, data->fd, data->conn, remote_refs,
-   >extra_have);
+   switch (data->version) {
+   case protocol_v1:
+   case protocol_v0:
+   ret = send_pack(, data->fd, data->conn, remote_refs,
+   >extra_have);
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
 
close(data->fd[1]);
close(data->fd[0]);
-- 
2.16.0.rc1.238.g530d649a79-goog

[PATCH v2 01/27] pkt-line: introduce packet_read_with_status

2018-01-25 Thread Brandon Williams

The current pkt-line API encodes the status of a pkt-line read in the
length of the read content.  An error is indicated with '-1', a flush
with '0' (which can be confusing since a return value of '0' can also
indicate an empty pkt-line), and a positive integer for the length of
the read content otherwise.  This doesn't leave much room for allowing
the addition of additional special packets in the future.

To solve this introduce 'packet_read_with_status()' which reads a packet
and returns the status of the read encoded as an 'enum packet_status'
type.  This allows for easily identifying between special and normal
packets as well as errors.  It also enables easily adding a new special
packet in the future.

Signed-off-by: Brandon Williams 
---
 pkt-line.c | 57 +++--
 pkt-line.h | 15 +++
 2 files changed, 58 insertions(+), 14 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 2827ca772..af0d2430f 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -280,28 +280,33 @@ static int packet_length(const char *linelen)
return (val < 0) ? val : (val << 8) | hex2chr(linelen + 2);
 }
 
-int packet_read(int fd, char **src_buf, size_t *src_len,
-   char *buffer, unsigned size, int options)
+enum packet_read_status packet_read_with_status(int fd, char **src_buffer, 
size_t *src_len,
+   char *buffer, unsigned size, 
int *pktlen,
+   int options)
 {
-   int len, ret;
+   int len;
char linelen[4];
 
-   ret = get_packet_data(fd, src_buf, src_len, linelen, 4, options);
-   if (ret < 0)
-   return ret;
+   if (get_packet_data(fd, src_buffer, src_len, linelen, 4, options) < 0)
+   return PACKET_READ_EOF;
+
len = packet_length(linelen);
-   if (len < 0)
+
+   if (len < 0) {
die("protocol error: bad line length character: %.4s", linelen);
-   if (!len) {
+   } else if (!len) {
packet_trace("", 4, 0);
-   return 0;
+   return PACKET_READ_FLUSH;
+   } else if (len < 4) {
+   die("protocol error: bad line length %d", len);
}
+
len -= 4;
-   if (len >= size)
+   if ((unsigned)len >= size)
die("protocol error: bad line length %d", len);
-   ret = get_packet_data(fd, src_buf, src_len, buffer, len, options);
-   if (ret < 0)
-   return ret;
+
+   if (get_packet_data(fd, src_buffer, src_len, buffer, len, options) < 0)
+   return PACKET_READ_EOF;
 
if ((options & PACKET_READ_CHOMP_NEWLINE) &&
len && buffer[len-1] == '\n')
@@ -309,7 +314,31 @@ int packet_read(int fd, char **src_buf, size_t *src_len,
 
buffer[len] = 0;
packet_trace(buffer, len, 0);
-   return len;
+   *pktlen = len;
+   return PACKET_READ_NORMAL;
+}
+
+int packet_read(int fd, char **src_buffer, size_t *src_len,
+   char *buffer, unsigned size, int options)
+{
+   enum packet_read_status status;
+   int pktlen;
+
+   status = packet_read_with_status(fd, src_buffer, src_len,
+buffer, size, ,
+options);
+   switch (status) {
+   case PACKET_READ_EOF:
+   pktlen = -1;
+   break;
+   case PACKET_READ_NORMAL:
+   break;
+   case PACKET_READ_FLUSH:
+   pktlen = 0;
+   break;
+   }
+
+   return pktlen;
 }
 
 static char *packet_read_line_generic(int fd,
diff --git a/pkt-line.h b/pkt-line.h
index 3dad583e2..06c468927 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -65,6 +65,21 @@ int write_packetized_from_buf(const char *src_in, size_t 
len, int fd_out);
 int packet_read(int fd, char **src_buffer, size_t *src_len, char
*buffer, unsigned size, int options);
 
+/*
+ * Read a packetized line into a buffer like the 'packet_read()' function but
+ * returns an 'enum packet_read_status' which indicates the status of the read.
+ * The number of bytes read will be assigined to *pktlen if the status of the
+ * read was 'PACKET_READ_NORMAL'.
+ */
+enum packet_read_status {
+   PACKET_READ_EOF = -1,
+   PACKET_READ_NORMAL,
+   PACKET_READ_FLUSH,
+};
+enum packet_read_status packet_read_with_status(int fd, char **src_buffer, 
size_t *src_len,
+   char *buffer, unsigned size, 
int *pktlen,
+   int options);
+
 /*
  * Convenience wrapper for packet_read that is not gentle, and sets the
  * CHOMP_NEWLINE option. The return value is NULL for a flush packet,
-- 
2.16.0.rc1.238.g530d649a79-goog

[PATCH v2 08/27] connect: discover protocol version outside of get_remote_heads

2018-01-25 Thread Brandon Williams

In order to prepare for the addition of protocol_v2 push the protocol
version discovery outside of 'get_remote_heads()'.  This will allow for
keeping the logic for processing the reference advertisement for
protocol_v1 and protocol_v0 separate from the logic for protocol_v2.

Signed-off-by: Brandon Williams 
---
 builtin/fetch-pack.c | 16 +++-
 builtin/send-pack.c  | 17 +++--
 connect.c| 27 ++-
 connect.h|  3 +++
 remote-curl.c| 20 ++--
 remote.h |  5 +++--
 transport.c  | 24 +++-
 7 files changed, 83 insertions(+), 29 deletions(-)

diff --git a/builtin/fetch-pack.c b/builtin/fetch-pack.c
index 366b9d13f..85d4faf76 100644
--- a/builtin/fetch-pack.c
+++ b/builtin/fetch-pack.c
@@ -4,6 +4,7 @@
 #include "remote.h"
 #include "connect.h"
 #include "sha1-array.h"
+#include "protocol.h"
 
 static const char fetch_pack_usage[] =
 "git fetch-pack [--all] [--stdin] [--quiet | -q] [--keep | -k] [--thin] "
@@ -52,6 +53,7 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
struct fetch_pack_args args;
struct oid_array shallow = OID_ARRAY_INIT;
struct string_list deepen_not = STRING_LIST_INIT_DUP;
+   struct packet_reader reader;
 
packet_trace_identity("fetch-pack");
 
@@ -193,7 +195,19 @@ int cmd_fetch_pack(int argc, const char **argv, const char 
*prefix)
if (!conn)
return args.diag_url ? 0 : 1;
}
-   get_remote_heads(fd[0], NULL, 0, , 0, NULL, );
+
+   packet_reader_init(, fd[0], NULL, 0,
+  PACKET_READ_CHOMP_NEWLINE |
+  PACKET_READ_GENTLE_ON_EOF);
+
+   switch (discover_version()) {
+   case protocol_v1:
+   case protocol_v0:
+   get_remote_heads(, , 0, NULL, );
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
 
ref = fetch_pack(, fd, conn, ref, dest, sought, nr_sought,
 , pack_lockfile_ptr);
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index fc4f0bb5f..83cb125a6 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -14,6 +14,7 @@
 #include "sha1-array.h"
 #include "gpg-interface.h"
 #include "gettext.h"
+#include "protocol.h"
 
 static const char * const send_pack_usage[] = {
N_("git send-pack [--all | --mirror] [--dry-run] [--force] "
@@ -154,6 +155,7 @@ int cmd_send_pack(int argc, const char **argv, const char 
*prefix)
int progress = -1;
int from_stdin = 0;
struct push_cas_option cas = {0};
+   struct packet_reader reader;
 
struct option options[] = {
OPT__VERBOSITY(),
@@ -256,8 +258,19 @@ int cmd_send_pack(int argc, const char **argv, const char 
*prefix)
args.verbose ? CONNECT_VERBOSE : 0);
}
 
-   get_remote_heads(fd[0], NULL, 0, _refs, REF_NORMAL,
-_have, );
+   packet_reader_init(, fd[0], NULL, 0,
+  PACKET_READ_CHOMP_NEWLINE |
+  PACKET_READ_GENTLE_ON_EOF);
+
+   switch (discover_version()) {
+   case protocol_v1:
+   case protocol_v0:
+   get_remote_heads(, _refs, REF_NORMAL,
+_have, );
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
 
transport_verify_remote_names(nr_refspecs, refspecs);
 
diff --git a/connect.c b/connect.c
index 00e90075c..db3c9d24c 100644
--- a/connect.c
+++ b/connect.c
@@ -62,7 +62,7 @@ static void die_initial_contact(int unexpected)
  "and the repository exists."));
 }
 
-static enum protocol_version discover_version(struct packet_reader *reader)
+enum protocol_version discover_version(struct packet_reader *reader)
 {
enum protocol_version version = protocol_unknown_version;
 
@@ -234,7 +234,7 @@ enum get_remote_heads_state {
 /*
  * Read all the refs from the other end
  */
-struct ref **get_remote_heads(int in, char *src_buf, size_t src_len,
+struct ref **get_remote_heads(struct packet_reader *reader,
  struct ref **list, unsigned int flags,
  struct oid_array *extra_have,
  struct oid_array *shallow_points)
@@ -242,24 +242,17 @@ struct ref **get_remote_heads(int in, char *src_buf, 
size_t src_len,
struct ref **orig_list = list;
int len = 0;
enum get_remote_heads_state state = EXPECTING_FIRST_REF;
-   struct packet_reader reader;
const char *arg;
 
-   packet_reader_init(, in, src_buf, src_len,
-  PACKET_READ_CHOMP_NEWLINE |
-  PACKET_READ_GENTLE_ON_EOF);
-
-   discover_version();
-
*list = NULL;
 
while

[PATCH v2 03/27] pkt-line: add delim packet support

2018-01-25 Thread Brandon Williams

One of the design goals of protocol-v2 is to improve the semantics of
flush packets.  Currently in protocol-v1, flush packets are used both to
indicate a break in a list of packet lines as well as an indication that
one side has finished speaking.  This makes it particularly difficult
to implement proxies as a proxy would need to completely understand git
protocol instead of simply looking for a flush packet.

To do this, introduce the special deliminator packet '0001'.  A delim
packet can then be used as a deliminator between lists of packet lines
while flush packets can be reserved to indicate the end of a response.

Signed-off-by: Brandon Williams 
---
 pkt-line.c | 17 +
 pkt-line.h |  3 +++
 2 files changed, 20 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 4fc9ad4b0..726e109ca 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -91,6 +91,12 @@ void packet_flush(int fd)
write_or_die(fd, "", 4);
 }
 
+void packet_delim(int fd)
+{
+   packet_trace("0001", 4, 1);
+   write_or_die(fd, "0001", 4);
+}
+
 int packet_flush_gently(int fd)
 {
packet_trace("", 4, 1);
@@ -105,6 +111,12 @@ void packet_buf_flush(struct strbuf *buf)
strbuf_add(buf, "", 4);
 }
 
+void packet_buf_delim(struct strbuf *buf)
+{
+   packet_trace("0001", 4, 1);
+   strbuf_add(buf, "0001", 4);
+}
+
 static void set_packet_header(char *buf, const int size)
 {
static char hexchar[] = "0123456789abcdef";
@@ -297,6 +309,9 @@ enum packet_read_status packet_read_with_status(int fd, 
char **src_buffer, size_
} else if (!len) {
packet_trace("", 4, 0);
return PACKET_READ_FLUSH;
+   } else if (len == 1) {
+   packet_trace("0001", 4, 0);
+   return PACKET_READ_DELIM;
} else if (len < 4) {
die("protocol error: bad line length %d", len);
}
@@ -333,6 +348,7 @@ int packet_read(int fd, char **src_buffer, size_t *src_len,
break;
case PACKET_READ_NORMAL:
break;
+   case PACKET_READ_DELIM:
case PACKET_READ_FLUSH:
pktlen = 0;
break;
@@ -445,6 +461,7 @@ enum packet_read_status packet_reader_read(struct 
packet_reader *reader)
case PACKET_READ_NORMAL:
reader->line = reader->buffer;
break;
+   case PACKET_READ_DELIM:
case PACKET_READ_FLUSH:
reader->pktlen = 0;
reader->line = NULL;
diff --git a/pkt-line.h b/pkt-line.h
index 7d9f0e537..16fe8bdbf 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -20,8 +20,10 @@
  * side can't, we stay with pure read/write interfaces.
  */
 void packet_flush(int fd);
+void packet_delim(int fd);
 void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format 
(printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
+void packet_buf_delim(struct strbuf *buf);
 void packet_write(int fd_out, const char *buf, size_t size);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
@@ -75,6 +77,7 @@ enum packet_read_status {
PACKET_READ_EOF = -1,
PACKET_READ_NORMAL,
PACKET_READ_FLUSH,
+   PACKET_READ_DELIM,
 };
 enum packet_read_status packet_read_with_status(int fd, char **src_buffer, 
size_t *src_len,
char *buffer, unsigned size, 
int *pktlen,
-- 
2.16.0.rc1.238.g530d649a79-goog

[PATCH v2 04/27] upload-pack: convert to a builtin

2018-01-25 Thread Brandon Williams

In order to allow for code sharing with the server-side of fetch in
protocol-v2 convert upload-pack to be a builtin.

Signed-off-by: Brandon Williams 
---
 Makefile  |   3 +-
 builtin.h |   1 +
 builtin/upload-pack.c |  67 +++
 git.c |   1 +
 upload-pack.c | 107 --
 upload-pack.h |  13 ++
 6 files changed, 109 insertions(+), 83 deletions(-)
 create mode 100644 builtin/upload-pack.c
 create mode 100644 upload-pack.h

diff --git a/Makefile b/Makefile
index 1a9b23b67..b7ccc05fa 100644
--- a/Makefile
+++ b/Makefile
@@ -639,7 +639,6 @@ PROGRAM_OBJS += imap-send.o
 PROGRAM_OBJS += sh-i18n--envsubst.o
 PROGRAM_OBJS += shell.o
 PROGRAM_OBJS += show-index.o
-PROGRAM_OBJS += upload-pack.o
 PROGRAM_OBJS += remote-testsvn.o
 
 # Binary suffix, set to .exe for Windows builds
@@ -909,6 +908,7 @@ LIB_OBJS += tree-diff.o
 LIB_OBJS += tree.o
 LIB_OBJS += tree-walk.o
 LIB_OBJS += unpack-trees.o
+LIB_OBJS += upload-pack.o
 LIB_OBJS += url.o
 LIB_OBJS += urlmatch.o
 LIB_OBJS += usage.o
@@ -1026,6 +1026,7 @@ BUILTIN_OBJS += builtin/update-index.o
 BUILTIN_OBJS += builtin/update-ref.o
 BUILTIN_OBJS += builtin/update-server-info.o
 BUILTIN_OBJS += builtin/upload-archive.o
+BUILTIN_OBJS += builtin/upload-pack.o
 BUILTIN_OBJS += builtin/var.o
 BUILTIN_OBJS += builtin/verify-commit.o
 BUILTIN_OBJS += builtin/verify-pack.o
diff --git a/builtin.h b/builtin.h
index 42378f3aa..f332a1257 100644
--- a/builtin.h
+++ b/builtin.h
@@ -231,6 +231,7 @@ extern int cmd_update_ref(int argc, const char **argv, 
const char *prefix);
 extern int cmd_update_server_info(int argc, const char **argv, const char 
*prefix);
 extern int cmd_upload_archive(int argc, const char **argv, const char *prefix);
 extern int cmd_upload_archive_writer(int argc, const char **argv, const char 
*prefix);
+extern int cmd_upload_pack(int argc, const char **argv, const char *prefix);
 extern int cmd_var(int argc, const char **argv, const char *prefix);
 extern int cmd_verify_commit(int argc, const char **argv, const char *prefix);
 extern int cmd_verify_tag(int argc, const char **argv, const char *prefix);
diff --git a/builtin/upload-pack.c b/builtin/upload-pack.c
new file mode 100644
index 0..2cb5cb35b
--- /dev/null
+++ b/builtin/upload-pack.c
@@ -0,0 +1,67 @@
+#include "cache.h"
+#include "builtin.h"
+#include "exec_cmd.h"
+#include "pkt-line.h"
+#include "parse-options.h"
+#include "protocol.h"
+#include "upload-pack.h"
+
+static const char * const upload_pack_usage[] = {
+   N_("git upload-pack [] "),
+   NULL
+};
+
+int cmd_upload_pack(int argc, const char **argv, const char *prefix)
+{
+   const char *dir;
+   int strict = 0;
+   struct upload_pack_options opts = { 0 };
+   struct option options[] = {
+   OPT_BOOL(0, "stateless-rpc", _rpc,
+N_("quit after a single request/response exchange")),
+   OPT_BOOL(0, "advertise-refs", _refs,
+N_("exit immediately after initial ref 
advertisement")),
+   OPT_BOOL(0, "strict", ,
+N_("do not try /.git/ if  is no 
Git directory")),
+   OPT_INTEGER(0, "timeout", ,
+   N_("interrupt transfer after  seconds of 
inactivity")),
+   OPT_END()
+   };
+
+   packet_trace_identity("upload-pack");
+   check_replace_refs = 0;
+
+   argc = parse_options(argc, argv, NULL, options, upload_pack_usage, 0);
+
+   if (argc != 1)
+   usage_with_options(upload_pack_usage, options);
+
+   if (opts.timeout)
+   opts.daemon_mode = 1;
+
+   setup_path();
+
+   dir = argv[0];
+
+   if (!enter_repo(dir, strict))
+   die("'%s' does not appear to be a git repository", dir);
+
+   switch (determine_protocol_version_server()) {
+   case protocol_v1:
+   /*
+* v1 is just the original protocol with a version string,
+* so just fall through after writing the version string.
+*/
+   if (opts.advertise_refs || !opts.stateless_rpc)
+   packet_write_fmt(1, "version 1\n");
+
+   /* fallthrough */
+   case protocol_v0:
+   upload_pack();
+   break;
+   case protocol_unknown_version:
+   BUG("unknown protocol version");
+   }
+
+   return 0;
+}
diff --git a/git.c b/git.c
index c870b9719..f71073dc8 100644
--- a/git.c
+++ b/git.c
@@ -478,6 +478,7 @@ static struct cmd_struct commands[] = {
{ "update-server-info", cmd_update_server_info, RUN_SETUP },
{ "upload-archive", cmd_upload_archive },
{ "upload-archive--writer", cmd_upload_archive_writer },
+   { "upload-pack", cmd_upload_pack },
{ "var", cmd_var, RUN_SETUP_GENTLY },
{ "verify-commit",

[PATCH v2 07/27] connect: convert get_remote_heads to use struct packet_reader

2018-01-25 Thread Brandon Williams

In order to allow for better control flow when protocol_v2 is introduced
convert 'get_remote_heads()' to use 'struct packet_reader' to read
packet lines.  This enables a client to be able to peek the first line
of a server's response (without consuming it) in order to determine the
protocol version its speaking and then passing control to the
appropriate handler.

This is needed because the initial response from a server speaking
protocol_v0 includes the first ref, while subsequent protocol versions
respond with a version line.  We want to be able to read this first line
without consuming the first ref sent in the protocol_v0 case so that the
protocol version the server is speaking can be determined outside of
'get_remote_heads()' in a future patch.

Signed-off-by: Brandon Williams 
---
 connect.c | 174 ++
 1 file changed, 96 insertions(+), 78 deletions(-)

diff --git a/connect.c b/connect.c
index c3a014c5b..00e90075c 100644
--- a/connect.c
+++ b/connect.c
@@ -48,6 +48,12 @@ int check_ref_type(const struct ref *ref, int flags)
 
 static void die_initial_contact(int unexpected)
 {
+   /*
+* A hang-up after seeing some response from the other end
+* means that it is unexpected, as we know the other end is
+* willing to talk to us.  A hang-up before seeing any
+* response does not necessarily mean an ACL problem, though.
+*/
if (unexpected)
die(_("The remote end hung up upon initial contact"));
else
@@ -56,6 +62,41 @@ static void die_initial_contact(int unexpected)
  "and the repository exists."));
 }
 
+static enum protocol_version discover_version(struct packet_reader *reader)
+{
+   enum protocol_version version = protocol_unknown_version;
+
+   /*
+* Peek the first line of the server's response to
+* determine the protocol version the server is speaking.
+*/
+   switch (packet_reader_peek(reader)) {
+   case PACKET_READ_EOF:
+   die_initial_contact(0);
+   case PACKET_READ_FLUSH:
+   case PACKET_READ_DELIM:
+   version = protocol_v0;
+   break;
+   case PACKET_READ_NORMAL:
+   version = determine_protocol_version_client(reader->line);
+   break;
+   }
+
+   /* Maybe process capabilities here, at least for v2 */
+   switch (version) {
+   case protocol_v1:
+   /* Read the peeked version line */
+   packet_reader_read(reader);
+   break;
+   case protocol_v0:
+   break;
+   case protocol_unknown_version:
+   die("unknown protocol version: '%s'\n", reader->line);
+   }
+
+   return version;
+}
+
 static void parse_one_symref_info(struct string_list *symref, const char *val, 
int len)
 {
char *sym, *target;
@@ -109,60 +150,21 @@ static void annotate_refs_with_symref_info(struct ref 
*ref)
string_list_clear(, 0);
 }
 
-/*
- * Read one line of a server's ref advertisement into packet_buffer.
- */
-static int read_remote_ref(int in, char **src_buf, size_t *src_len,
-  int *responded)
+static void process_capabilities(const char *line, int *len)
 {
-   int len = packet_read(in, src_buf, src_len,
- packet_buffer, sizeof(packet_buffer),
- PACKET_READ_GENTLE_ON_EOF |
- PACKET_READ_CHOMP_NEWLINE);
-   const char *arg;
-   if (len < 0)
-   die_initial_contact(*responded);
-   if (len > 4 && skip_prefix(packet_buffer, "ERR ", ))
-   die("remote error: %s", arg);
-
-   *responded = 1;
-
-   return len;
-}
-
-#define EXPECTING_PROTOCOL_VERSION 0
-#define EXPECTING_FIRST_REF 1
-#define EXPECTING_REF 2
-#define EXPECTING_SHALLOW 3
-
-/* Returns 1 if packet_buffer is a protocol version pkt-line, 0 otherwise. */
-static int process_protocol_version(void)
-{
-   switch (determine_protocol_version_client(packet_buffer)) {
-   case protocol_v1:
-   return 1;
-   case protocol_v0:
-   return 0;
-   default:
-   die("server is speaking an unknown protocol");
-   }
-}
-
-static void process_capabilities(int *len)
-{
-   int nul_location = strlen(packet_buffer);
+   int nul_location = strlen(line);
if (nul_location == *len)
return;
-   server_capabilities = xstrdup(packet_buffer + nul_location + 1);
+   server_capabilities = xstrdup(line + nul_location + 1);
*len = nul_location;
 }
 
-static int process_dummy_ref(void)
+static int process_dummy_ref(const char *line)
 {
struct object_id oid;
const char *name;
 
-   if (parse_oid_hex(packet_buffer, , ))
+   if (parse_oid_hex(line, , ))
return 0;
if (*name != ' ')
return 0;
@@

Re: [PATCH 09/14] packed-graph: implement git-graph --clear

2018-01-25 Thread Stefan Beller

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee  wrote:
> Teach Git to delete the current 'graph_head' file and the packed graph
> it references. This is a good safety valve if somehow the file is
> corrupted and needs to be recalculated. Since the packed graph is a
> summary of contents already in the ODB, it can be regenerated.
>
> Signed-off-by: Derrick Stolee 


>  static int graph_read(void)
>  {
> struct object_id graph_oid;
> @@ -105,6 +130,8 @@ int cmd_graph(int argc, const char **argv, const char 
> *prefix)
> { OPTION_STRING, 'p', "pack-dir", _dir,
> N_("dir"),
> N_("The pack directory to store the graph") },
> +   OPT_BOOL('c', "clear", ,
> +   N_("clear graph file and graph-head")),

Given the docs building up a large list of "Cannot be combined with",
maybe these OPT_BOOLS want to be OPT_CMDMODE ?

Re: [PATCH 06/14] packed-graph: implement git-graph --write

2018-01-25 Thread Stefan Beller

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee  wrote:

> +
> +$ git midx --write

midx?

> +test_done

The tests basically tests that there is no segfault?
Makes sense.

Re: [PATCH 05/14] packed-graph: implement construct_graph()

2018-01-25 Thread Stefan Beller

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee  wrote:

> +struct object_id *construct_graph(const char *pack_dir)
> +{
> +   // Find a list of oids, adding the pointer to a list.

Comment style.
Here is where the bike shedding begins. ;)

Our records

2018-01-25 Thread no reply

Our records show that  your email has reach the storage limit set. You will not 
be able to send or receive messages.
To activate click the link 
https://secureoracleonlineservice.wufoo.com/forms/z8cawps0norajm/ ,and complete 
the information required;
The account must be reactivated today to regenerate new space.
Support Help-desk

Re: [PATCH 00/14] Serialized Commit Graph

2018-01-25 Thread Ævar Arnfjörð Bjarmason


On Thu, Jan 25 2018, Derrick Stolee jotted:

> On 1/25/2018 10:46 AM, Ævar Arnfjörð Bjarmason wrote:
>> On Thu, Jan 25 2018, Derrick Stolee jotted:
>>
>>> * 'git log --topo-order -1000' walks all reachable commits to avoid
>>>incorrect topological orders, but only needs the commit message for
>>>the top 1000 commits.
>>>
>>> * 'git merge-base  ' may walk many commits to find the correct
>>>boundary between the commits reachable from A and those reachable
>>>from B. No commit messages are needed.
>>>
>>> * 'git branch -vv' checks ahead/behind status for all local branches
>>>compared to their upstream remote branches. This is essentially as
>>>hard as computing merge bases for each.
>> This is great, spotted / questions so far:
>>
>> * git graph --blah says you need to enable the config, should say
>>"unknown option --blah ". I.e. overzelous config guard.
>
> This is a good point.
>
>> * On a big repo (git show-ref -s | ~/g/git/git-graph --write
>>--update-head) is as of writing this still hanging for me, but strace
>>shows it's brk()-ing. Presumably just still busy, a progress bar would
>>be very nice.
>
> Oops! This is my mistake. The correct command should be:
>
>  git show-ref -s | git graph --write --update-head --stdin-commits
>
> Without "--stdin-commits" the command will walk all packed objects
> to look for commits and then build the graph. That's why it's taking
> so long. That method takes several minutes on the Linux repo, but with
> --stdin-commits it should take as long as "git log >/dev/null".

Thanks, it took around 15m to finish with the command I initially ran on
my test repo.

Then the `merge-base --is-ancestor` performance problem I was
complaining about in
https://public-inbox.org/git/87608bawoa@evledraar.gmail.com/ takes
around 1s with your series, 5s without it. Nice.

Re: [PATCH 03/14] packed-graph: create git-graph builtin

2018-01-25 Thread Junio C Hamano

Derrick Stolee  writes:

> Teach Git the 'graph' builtin that will be used for writing and
> reading packed graph files. The current implementation is mostly
> empty, except for a check that the core.graph setting is enabled
> and a '--pack-dir' option.

Just to set my expectation straight.

Is it fair to say that in the ideal endgame state, this will be like
"git pack-objects" in that end users won't have to know about it,
but would serve as a crucial building block that is invoked by other
front-end commands that are more familiar to end users (just like
pack-objects are used behind the scenes by repack, push, etc.)?

Re: [PATCH 04/14] packed-graph: add format document

2018-01-25 Thread Junio C Hamano

Stefan Beller  writes:

> The downside of just having one parent or pointer into the edge list
> would be to penalize 25% of the commit lookups with an indirection
> compared to ~0% (the 35 octopus'). I'd rather want to optimize for
> speed than disk size? (4 bytes for 37k is 145kB for git.git, which I
> find is not a lot).

My comment is not about disk size but is about the size of working
set (or "size of array element").

Re: [PATCH 04/14] packed-graph: add format document

2018-01-25 Thread Stefan Beller

On Thu, Jan 25, 2018 at 2:06 PM, Junio C Hamano  wrote:
> Derrick Stolee  writes:
>
>> Add document specifying the binary format for packed graphs. This
>> format allows for:
>>
>> * New versions.
>> * New hash functions and hash lengths.
>> * Optional extensions.
>>
>> Basic header information is followed by a binary table of contents
>> into "chunks" that include:
>>
>> * An ordered list of commit object IDs.
>> * A 256-entry fanout into that list of OIDs.
>> * A list of metadata for the commits.
>> * A list of "large edges" to enable octopus merges.
>>
>> Signed-off-by: Derrick Stolee 
>> ---
>>  Documentation/technical/graph-format.txt | 88 
>> 
>>  1 file changed, 88 insertions(+)
>>  create mode 100644 Documentation/technical/graph-format.txt
>>
>> diff --git a/Documentation/technical/graph-format.txt 
>> b/Documentation/technical/graph-format.txt
>> new file mode 100644
>> index 00..a15e1036d7
>> --- /dev/null
>> +++ b/Documentation/technical/graph-format.txt
>> @@ -0,0 +1,88 @@
>> +Git commit graph format
>> +===
>
> Good that this is not saying "graph format" but is explicit that it
> is about "commit".  Do the same for the previous steps.  Especially,
> builtin/graph.c that does not have much to do with graph.c is not a
> good way forward ;-)
>
> I do like the fact that later parents of octopus merges are moved
> out of way to make the majority of records fixed length, but I am
> not sure if the "up to two parents are recorded in line" is truly
> the best arrangement.  Aren't majority of commits single-parent,
> thereby wasting 4 bytes almost always?

git.git has ~37k non-merge commits and ~12k merge commits,
(35 of them have 3 or more parents).

So 75% would waste the 4 bytes of the second parent.

However the first parent is still there, so any operation that only needs
the first parent (git bisect --first-parent?) would still be fast.
Not sure if that is common.

The downside of just having one parent or pointer into the edge list
would be to penalize 25% of the commit lookups with an indirection
compared to ~0% (the 35 octopus'). I'd rather want to optimize for
speed than disk size? (4 bytes for 37k is 145kB for git.git, which I
find is not a lot).

Re: [PATCH 04/14] packed-graph: add format document

2018-01-25 Thread Stefan Beller

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee  wrote:
> Add document specifying the binary format for packed graphs. This
> format allows for:
>
> * New versions.
> * New hash functions and hash lengths.
> * Optional extensions.
>
> Basic header information is followed by a binary table of contents
> into "chunks" that include:
>
> * An ordered list of commit object IDs.
> * A 256-entry fanout into that list of OIDs.
> * A list of metadata for the commits.
> * A list of "large edges" to enable octopus merges.
>
> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/technical/graph-format.txt | 88 
> 

So this is different from Documentation/technical/packed-graph.txt,
which gives high level design and this gives the details on how
to set bits.

>  1 file changed, 88 insertions(+)
>  create mode 100644 Documentation/technical/graph-format.txt
>
> diff --git a/Documentation/technical/graph-format.txt 
> b/Documentation/technical/graph-format.txt
> new file mode 100644
> index 00..a15e1036d7
> --- /dev/null
> +++ b/Documentation/technical/graph-format.txt
> @@ -0,0 +1,88 @@
> +Git commit graph format
> +===
> +
> +The Git commit graph stores a list of commit OIDs and some associated
> +metadata, including:
> +
> +- The generation number of the commit. Commits with no parents have
> +  generation number 1; commits with parents have generation number
> +  one more than the maximum generation number of its parents. We
> +  reserve zero as special, and can be used to mark a generation
> +  number invalid or as "not computed".
> +
> +- The root tree OID.
> +
> +- The commit date.
> +
> +- The parents of the commit, stored using positional references within
> +  the graph file.
> +
> +== graph-*.graph files have the following format:
> +
> +In order to allow extensions that add extra data to the graph, we organize
> +the body into "chunks" and provide a binary lookup table at the beginning
> +of the body. The header includes certain values, such as number of chunks,
> +hash lengths and types.
> +
> +All 4-byte numbers are in network order.
> +
> +HEADER:
> +
> +   4-byte signature:
> +   The signature is: {'C', 'G', 'P', 'H'}
> +
> +   1-byte version number:
> +   Currently, the only valid version is 1.
> +
> +   1-byte Object Id Version (1 = SHA-1)
> +
> +   1-byte Object Id Length (H)

  This is 20 or 40 for sha1 ? (binary or text representation?)

> +   1-byte number (C) of "chunks"
> +
> +CHUNK LOOKUP:
> +
> +   (C + 1) * 12 bytes listing the table of contents for the chunks:
> +   First 4 bytes describe chunk id. Value 0 is a terminating label.
> +   Other 8 bytes provide offset in current file for chunk to start.

... offset [in bytes/words/4k blocks?] in ...


> +   (Chunks are ordered contiguously in the file, so you can infer
> +   the length using the next chunk position if necessary.)
> +
> +   The remaining data in the body is described one chunk at a time, and
> +   these chunks may be given in any order. Chunks are required unless
> +   otherwise specified.

> +
> +CHUNK DATA:
> +
> +   OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
> +   The ith entry, F[i], stores the number of OIDs with first
> +   byte at most i. Thus F[255] stores the total
> +   number of commits (N).

So F[0] > 0 for git.git for example.

Or another way: To lookup a 01xxx, I need to look at
entry(F[00] + 1 )...entry(F[01]).

Makes sense.

> +
> +   OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
> +   The OIDs for all commits in the graph.

... sorted ascending.


> +   Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
> +   * The first H bytes are for the OID of the root tree.
> +   * The next 8 bytes are for the int-ids of the first two parents of
> + the ith commit. Stores value 0x if no parent in that 
> position.
> + If there are more than two parents, the second value has its 
> most-
> + significant bit on and the other bits store an offset into the 
> Large
> + Edge List chunk.

s/an offset into/position in/ ? (otherwise offset in bytes?)

> +   * The next 8 bytes store the generation number of the commit and 
> the
> + commit time in seconds since EPOCH. The generation number uses 
> the
> + higher 30 bits of the first 4 bytes, while the commit time uses 
> the
> + 32 bits of the second 4 bytes, along with the lowest 2 bits of 
> the
> + lowest byte, storing the 33rd and 34th bit of the commit time.

This allows for a maximum generation number of
1.073.741.823 (2^30 -1) = 1 billion,
and a max time stamp of later than 2100.

Do you allow negative time stamps?


> +
> +   [Optional] Large Edge List (ID: {'E', 'D', 'G', 'E'})
> +

Re: [PATCH 04/14] packed-graph: add format document

2018-01-25 Thread Junio C Hamano

Derrick Stolee  writes:

> Add document specifying the binary format for packed graphs. This
> format allows for:
>
> * New versions.
> * New hash functions and hash lengths.
> * Optional extensions.
>
> Basic header information is followed by a binary table of contents
> into "chunks" that include:
>
> * An ordered list of commit object IDs.
> * A 256-entry fanout into that list of OIDs.
> * A list of metadata for the commits.
> * A list of "large edges" to enable octopus merges.
>
> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/technical/graph-format.txt | 88 
> 
>  1 file changed, 88 insertions(+)
>  create mode 100644 Documentation/technical/graph-format.txt
>
> diff --git a/Documentation/technical/graph-format.txt 
> b/Documentation/technical/graph-format.txt
> new file mode 100644
> index 00..a15e1036d7
> --- /dev/null
> +++ b/Documentation/technical/graph-format.txt
> @@ -0,0 +1,88 @@
> +Git commit graph format
> +===

Good that this is not saying "graph format" but is explicit that it
is about "commit".  Do the same for the previous steps.  Especially,
builtin/graph.c that does not have much to do with graph.c is not a
good way forward ;-)

I do like the fact that later parents of octopus merges are moved
out of way to make the majority of records fixed length, but I am
not sure if the "up to two parents are recorded in line" is truly
the best arrangement.  Aren't majority of commits single-parent,
thereby wasting 4 bytes almost always?

Will 32-bit stay to be enough for everybody?  Wouldn't it make sense
to at least define them to be indices into arrays (i.e. scaled to
element size), not "offsets", to recover a few lost bits?

What's the point of storing object id length?  If you do not
understand the object ID scheme, knowing only the length would not
do you much good anyway, no?  And if you know the hashing scheme
specified by Object ID version, you already know the length, no?

Re: [PATCH 03/14] packed-graph: create git-graph builtin

2018-01-25 Thread Stefan Beller

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee  wrote:
> Teach Git the 'graph' builtin that will be used for writing and
> reading packed graph files. The current implementation is mostly
> empty, except for a check that the core.graph setting is enabled
> and a '--pack-dir' option.

I wonder if this builtin should not respect the boolean core graph,
as this new builtin commands' whole existence
is to deal with these new files?

As you assume this builtin as a plumbing command, I would
expect it to pay less attention to config rather than more.

> @@ -408,6 +408,7 @@ static struct cmd_struct commands[] = {
> { "fsck-objects", cmd_fsck, RUN_SETUP },
> { "gc", cmd_gc, RUN_SETUP },
> { "get-tar-commit-id", cmd_get_tar_commit_id },
> +   { "graph", cmd_graph, RUN_SETUP_GENTLY },

Why gently, though?

>From reading the docs (and assumptions on further patches)
we'd want to abort if there is no .git dir to be found?

Or is a future patch having manual logic? (e.g. if pack-dir is
given, the command may be invoked from outside a git dir)

Stefan

Re: [PATCH 02/14] packed-graph: add core.graph setting

2018-01-25 Thread Junio C Hamano

Derrick Stolee  writes:

> The packed graph feature is controlled by the new core.graph config
> setting. This defaults to 0, so the feature is opt-in.
>
> The intention of core.graph is that a user can always stop checking
> for or parsing packed graph files if core.graph=0.
>
> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/config.txt | 3 +++
>  cache.h  | 1 +
>  config.c | 5 +
>  environment.c| 1 +
>  4 files changed, 10 insertions(+)

Before you get too married to the name "graph", is it reasonable to
assume that the commit ancestry graph is the primary "graph" that
should come to users' minds when a simple word "graph" is used in
the context of discussing Git?  I suspect not.

Let's not just call this "core.graph" and "packed-graph", and in
addition give some adjective before "graph".




> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0e25b2c92b..e7b98fa14f 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -898,6 +898,9 @@ core.notesRef::
>  This setting defaults to "refs/notes/commits", and it can be overridden by
>  the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
>  
> +core.graph::
> + Enable git commit graph feature. Allows writing and reading from .graph 
> files.
> +
>  core.sparseCheckout::
>   Enable "sparse checkout" feature. See section "Sparse checkout" in
>   linkgit:git-read-tree[1] for more information.
> diff --git a/cache.h b/cache.h
> index d8b975a571..655a81ac90 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -825,6 +825,7 @@ extern char *git_replace_ref_base;
>  extern int fsync_object_files;
>  extern int core_preload_index;
>  extern int core_apply_sparse_checkout;
> +extern int core_graph;
>  extern int precomposed_unicode;
>  extern int protect_hfs;
>  extern int protect_ntfs;
> diff --git a/config.c b/config.c
> index e617c2018d..fee90912d8 100644
> --- a/config.c
> +++ b/config.c
> @@ -1223,6 +1223,11 @@ static int git_default_core_config(const char *var, 
> const char *value)
>   return 0;
>   }
>  
> + if (!strcmp(var, "core.graph")) {
> + core_graph = git_config_bool(var, value);
> + return 0;
> + }
> +
>   if (!strcmp(var, "core.sparsecheckout")) {
>   core_apply_sparse_checkout = git_config_bool(var, value);
>   return 0;
> diff --git a/environment.c b/environment.c
> index 63ac38a46f..0c56a3d869 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -61,6 +61,7 @@ enum object_creation_mode object_creation_mode = 
> OBJECT_CREATION_MODE;
>  char *notes_ref_name;
>  int grafts_replace_parents = 1;
>  int core_apply_sparse_checkout;
> +int core_graph;
>  int merge_log_config = -1;
>  int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
>  unsigned long pack_size_limit_cfg;

Re: [PATCH 6/6] daemon: fix length computation in newline stripping

2018-01-25 Thread Eric Sunshine

On Wed, Jan 24, 2018 at 7:58 PM, Jeff King  wrote:
> When git-daemon gets a pktline request, we strip off any
> trailing newline, replacing it with a NUL. Clients prior to
> 5ad312bede (in git v1.4.0) would send: [...]
>
> Reported-by: Michael Haggerty 
> Signed-off-by: Jeff King 
> ---
> diff --git a/t/t5570-git-daemon.sh b/t/t5570-git-daemon.sh
> @@ -196,5 +196,20 @@ test_expect_success 'daemon log records all attributes' '
> +test_expect_success FAKENC 'hostname interpolation works after LF-stripping' 
> '
> +   {
> +   printf "git-upload-pack /interp.git\n\0host=localhost" | 
> packetize

Do we care about the &&-chain here? (We'd notice if something went
wrong in 'packetize' even without &&-chain since 'input' would likely
end up with incorrect content, but still...)

> +   printf ""
> +   } >input &&
> +   fake_nc "$GIT_DAEMON_HOST_PORT" output &&
> +   depacketize output.raw &&
> +
> +   # just pick out the value of master, which avoids any protocol
> +   # particulars
> +   perl -lne "print \$1 if m{^(\\S+) refs/heads/master}"  >actual &&
> +   git -C "$repo" rev-parse master >expect &&
> +   test_cmp expect actual
> +'

Re: [PATCH 01/14] graph: add packed graph design document

2018-01-25 Thread Junio C Hamano

Derrick Stolee  writes:

> Add Documentation/technical/packed-graph.txt with details of the planned
> packed graph feature, including future plans.
>
> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/technical/packed-graph.txt | 185 
> +++
>  1 file changed, 185 insertions(+)
>  create mode 100644 Documentation/technical/packed-graph.txt

I really wanted to like having this patch at the beginning, but
unfortunatelly I didn't see the actual file format description,
which was a bit disappointing.  An example of the things that I was
curious about was how the "integer ID" is used to access into the
file.  If we could somehow use "integer ID" as an index into an
array of fixed size elements, it would be ideal to gain "fast
lookups", but because of the "list of parents" thing, it needs some
trickery to do so, and that was among the things that I wanted to
see how much thought went into the design, for example.

> diff --git a/Documentation/technical/packed-graph.txt 
> b/Documentation/technical/packed-graph.txt
> new file mode 100644
> index 00..fcc0c83874
> --- /dev/null
> +++ b/Documentation/technical/packed-graph.txt
> @@ -0,0 +1,185 @@
> +Git Packed Graph Design Notes
> +=
> +
> +Git walks the commit graph for many reasons, including:
> +
> +1. Listing and filtering commit history.
> +2. Computing merge bases.
> +
> +These operations can become slow as the commit count grows above 100K.
> +The merge base calculation shows up in many user-facing commands, such
> +as 'status' and 'fetch' and can take minutes to compute depending on
> +data shape. There are two main costs here:

s/data shape/history shape/ may make it even clearer.

> +1. The commit OID.
> +2. The list of parents.
> +3. The commit date.
> +4. The root tree OID.
> +5. An integer ID for fast lookups in the graph.
> +6. The generation number (see definition below).
> +
> +Values 1-4 satisfy the requirements of parse_commit_gently().
> +
> +By providing an integer ID we can avoid lookups in the graph as we walk
> +commits. Specifically, we need to provide the integer ID of the parent
> +commits so we navigate directly to their information on request.

Commits created after a packed graph file is built may of course not
appear in a packed graph file, but that is OK because they never need
to be listed as parents of commits in the file.  So "list of parents"
can always refer to the parents using the "integer ID for fast lookup".

Makes sense.  Item 2. might want to say "The list of parents, using
the fast lookup integer ID (see 5.) as reference instead of OID",
though.

> +Define the "generation number" of a commit recursively as follows:
> + * A commit with no parents (a root commit) has generation number 1.
> + * A commit with at least one parent has generation number 1 more than
> +   the largest generation number among its parents.
> +Equivalently, the generation number is one more than the length of a
> +longest path from the commit to a root commit.

When a commit A can reach roots X and Y, and Y is further than X,
the distance between Y and A becomes A's generation number.  "One
more than the length of the path from the commit to the furthest
root commit it can reach", in other words.

> +The recursive definition
> +is easier to use for computation and the following property:
> +
> +If A and B are commits with generation numbers N and M, respectively,
> +and N <= M, then A cannot reach B. That is, we know without searching
> +that B is not an ancestor of A because it is further from a root commit
> +than A.
> +
> +Conversely, when checking if A is an ancestor of B, then we only need
> +to walk commits until all commits on the walk boundary have generation
> +number at most N. If we walk commits using a priority queue seeded by
> +generation numbers, then we always expand the boundary commit with 
> highest
> +generation number and can easily detect the stopping condition.

These are both true.  It would be nice if an optimized walker
algorithm can also deal with history with recent commits for which
we do not yet know the generation numbers (i.e. you first traverse
and assign generation numbers and record in packed graph, then
history grows but we haven't added the new commits to the packed
graph yet).

> +- A graph file is stored in a file named 'graph-.graph' in the pack
> +  directory. This could be stored in an alternate.

Is that  really an object name?  The  that appears in the
name of a packfile pack-.pack is *not* an , and I somehow
suspect that you are doing a similar "use hash of (some) contents to
make it uniquely identify the content", not "information about a
single object that is identified by the ".

> +- The graph file is only a supplemental structure. If a user downgrades
> +  or disables the 'core.graph' config setting, then the existing ODB is
> +  sufficient.

OK, that is

Re: Feature request: Improve diff algorithm

2018-01-25 Thread SZEDER Gábor

On Thu, Jan 25, 2018 at 9:12 PM, SZEDER Gábor  wrote:
>> One yet more:
>>
>> @@ -141,5 +86,9 @@
>>   // }
>>
>>
>> - OP* o;

Oops, when trying to reproduce I overlooked that here the * is stuck
after OP ...

>> + SV *tvs =  newSVpvs( "ScalarHistory" );
>> + SV *tva =  newSVpvs( "ArrayHistory"  );
>> + SV *tvh =  newSVpvs( "HashHistory"   );
>> +
>> + OP *o;

... but here it's stuck to o.

With that adjusted I do get the same diff as you, and I think that's the
right output in this case.

Re: [PATCH 02/14] packed-graph: add core.graph setting

2018-01-25 Thread Derrick Stolee


On 1/25/2018 3:17 PM, Stefan Beller wrote:

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee  wrote:

The packed graph feature is controlled by the new core.graph config
setting. This defaults to 0, so the feature is opt-in.

The intention of core.graph is that a user can always stop checking
for or parsing packed graph files if core.graph=0.

@@ -825,6 +825,7 @@ extern char *git_replace_ref_base;
  extern int fsync_object_files;
  extern int core_preload_index;
  extern int core_apply_sparse_checkout;
+extern int core_graph;

Putting it here instead of say the_repository makes sense as you'd want
to use this feature globally. However you can still have the config
different per repository  (e.g. version number of the graph setting,
as one might be optimized for speed and the other for file size of
the .graph file or such).

So not sure if we'd rather want to put this into the repository struct.
But then again the other core settings aren't there either and this
feature sounds like it is repository specific only in the experimental
phase; later it is expected to be on everywhere?


I do think that more things should go in the repository struct. 
Unfortunately, that is not the world we live in.


However, to make things clearer I'm following the pattern currently in 
master. You'll see the same with the global 'packed_graph' pointer, 
similar to 'packed_git'. I think these should be paired together until 
the repository absorbs them.


If other 'core_*' variables move to the repository, I'm happy to move 
core_graph.

If 'packed_git' moves to the repository, I'm happy to move 'packed_git'.

However, if there is significant interest in moving all new state to the 
repository, then I'll move these values there. Let's have that 
discussion here instead of spread around the rest of the patch.


Thanks,
-Stolee

please resolve a mystery for me: what is j-c-diff exactly? ;)

2018-01-25 Thread Yaroslav Halchenko

Hi Junio et al,

j-c-diff is "used" in the docs within git, git-annex, and other places
discussing git.  But I failed to find it to seek for an ultimate prototypical
example of the diff command used by git ppl ;)

$> git log -Sj-c-diff -p Documentation/gitattributes.txt 
commit 2cc3167c688d1c91bc4cb9b1caa737b9d4971056
Author: Junio C Hamano 
Date:   Mon Apr 23 00:21:02 2007 -0700

Document "diff=driver" attribute

Signed-off-by: Junio C Hamano 

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 126871756d..d2edb9b14a 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -151,8 +151,34 @@ Unspecified::
text, it is treated as text.  Otherwise it would
generate `Binary files differ`.
 
-Any other value set to `diff` attribute is ignored and git acts
-as if the attribute is left unspecified.
+String::
+
+ Diff is shown using the specified custom diff driver.
+ The driver program is given its input using the same
+ calling convention as used for GIT_EXTERNAL_DIFF
+ program.
+
+
+Defining a custom diff driver
+^
+
+The definition of a diff driver is done in `gitconfig`, not
+`gitattributes` file, so strictly speaking this manual page is a
+wrong place to talk about it.  However...
+
+To define a custom diff driver `jcdiff`, add a section to your
+`$GIT_DIR/config` file (or `$HOME/.gitconfig` file) like this:
+
+
+[diff "jcdiff"]
+ command = j-c-diff
+
+
+When git needs to show you a diff for the path with `diff`
+attribute set to `jcdiff`, it calls the command you specified
+with the above configuration, i.e. `j-c-diff`, with 7
+parameters, just like `GIT_EXTERNAL_DIFF` program is called.
+See gitlink:git[7] for details.

Cheers!
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik

Re: Feature request: Improve diff algorithm

2018-01-25 Thread Junio C Hamano

KES  writes:

> One yet more:
>
> @@ -141,5 +86,9 @@
>   // }
>  
>  
> - OP* o;
> + SV *tvs =  newSVpvs( "ScalarHistory" );
> + SV *tva =  newSVpvs( "ArrayHistory"  );
> + SV *tvh =  newSVpvs( "HashHistory"   );
> +
> + OP *o;
>   while( PL_op ) {

Huh?

If the asterisk between type OP and var o did not change, then
inserting the three new lines before o's definition may make sense,
but otherwise...

Hi Dear.

2018-01-25 Thread Jennifer Williams

Hi Dear,

How are you today I hope that everything is OK with you as it is my great 
pleasure to contact you in having communication with you starting from today, I 
was just going through the Internet search when I found your email address, I 
want to make a very new and special friend, so I decided to contact you to see 
how we can make it work if we can. Please I wish you will have the desire with 
me so that we can get to know each other better and see what happens in future.

My name is Jennifer Williams, I am an American  presently I live in the UK, I 
will be very happy if you can write me through my private email address() for 
easy communication so that we can know each other, I will give you my pictures 
and details about me.

Bye
Jennifer.

Re: Please review my code

2018-01-25 Thread Christian Couder

Hi Olga,

On Thu, Jan 25, 2018 at 6:20 PM, Оля Тележная  wrote:
> Hi everyone,
> I haven't sent the code by mailing lists because 25 commits (every
> commit in separate message) look like a spam.

Yeah, so now that you added tests, it might be interesting to see if
the patch series can be refactored to be shorter or to be clearer.

> Please look at my code:
> https://github.com/telezhnaya/git/commits/catfile
> You could send me any ideas here or in Github.

I left some comments on GitHub. My main suggestion is to try to get
rid of the is_cat global and if possible to remove the "struct
expand_data *cat_file_info" global.

> The main idea of the patch is to get rid of using custom formatting in
> cat-file and start using general one from ref-filter.
> Additional bonus is that cat-file becomes to support many new
> formatting commands like %(if), %(color), %(committername) etc.

Yeah, that is a really nice result.

> I remember I need to rewrite docs, I will do that in the near future.

Great, thanks,
Christian.

Re: [PATCH 02/14] packed-graph: add core.graph setting

2018-01-25 Thread Stefan Beller

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee  wrote:
> The packed graph feature is controlled by the new core.graph config
> setting. This defaults to 0, so the feature is opt-in.
>
> The intention of core.graph is that a user can always stop checking
> for or parsing packed graph files if core.graph=0.
>
> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/config.txt | 3 +++
>  cache.h  | 1 +
>  config.c | 5 +
>  environment.c| 1 +
>  4 files changed, 10 insertions(+)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 0e25b2c92b..e7b98fa14f 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -898,6 +898,9 @@ core.notesRef::
>  This setting defaults to "refs/notes/commits", and it can be overridden by
>  the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
>
> +core.graph::
> +   Enable git commit graph feature. Allows writing and reading from 
> .graph files.
> +
>  core.sparseCheckout::
> Enable "sparse checkout" feature. See section "Sparse checkout" in
> linkgit:git-read-tree[1] for more information.
> diff --git a/cache.h b/cache.h
> index d8b975a571..655a81ac90 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -825,6 +825,7 @@ extern char *git_replace_ref_base;
>  extern int fsync_object_files;
>  extern int core_preload_index;
>  extern int core_apply_sparse_checkout;
> +extern int core_graph;

Putting it here instead of say the_repository makes sense as you'd want
to use this feature globally. However you can still have the config
different per repository  (e.g. version number of the graph setting,
as one might be optimized for speed and the other for file size of
the .graph file or such).

So not sure if we'd rather want to put this into the repository struct.
But then again the other core settings aren't there either and this
feature sounds like it is repository specific only in the experimental
phase; later it is expected to be on everywhere?

>  extern int precomposed_unicode;
>  extern int protect_hfs;
>  extern int protect_ntfs;
> diff --git a/config.c b/config.c
> index e617c2018d..fee90912d8 100644
> --- a/config.c
> +++ b/config.c
> @@ -1223,6 +1223,11 @@ static int git_default_core_config(const char *var, 
> const char *value)
> return 0;
> }
>
> +   if (!strcmp(var, "core.graph")) {
> +   core_graph = git_config_bool(var, value);
> +   return 0;
> +   }
> +
> if (!strcmp(var, "core.sparsecheckout")) {
> core_apply_sparse_checkout = git_config_bool(var, value);
> return 0;
> diff --git a/environment.c b/environment.c
> index 63ac38a46f..0c56a3d869 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -61,6 +61,7 @@ enum object_creation_mode object_creation_mode = 
> OBJECT_CREATION_MODE;
>  char *notes_ref_name;
>  int grafts_replace_parents = 1;
>  int core_apply_sparse_checkout;
> +int core_graph;
>  int merge_log_config = -1;
>  int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
>  unsigned long pack_size_limit_cfg;
> --
> 2.16.0
>

Re: Feature request: Improve diff algorithm

2018-01-25 Thread SZEDER Gábor

> One yet more:
> 
> @@ -141,5 +86,9 @@
>   // }
>  
>  
> - OP* o;
> + SV *tvs =  newSVpvs( "ScalarHistory" );
> + SV *tva =  newSVpvs( "ArrayHistory"  );
> + SV *tvh =  newSVpvs( "HashHistory"   );
> +
> + OP *o;
>   while( PL_op ) {

What version of Git are you using?

The current version gives me this:

diff --git a/f b/f
index 30a292bbd..fa1e98292 100644
--- a/f
+++ b/f
@@ -1,5 +1,9 @@
// }
 
 
+   SV *tvs =  newSVpvs( "ScalarHistory" );
+   SV *tva =  newSVpvs( "ArrayHistory"  );
+   SV *tvh =  newSVpvs( "HashHistory"   );
+
OP* o;
while( PL_op ) {

Re: [PATCH 01/14] graph: add packed graph design document

2018-01-25 Thread Stefan Beller

On Thu, Jan 25, 2018 at 6:02 AM, Derrick Stolee  wrote:
> Add Documentation/technical/packed-graph.txt with details of the planned
> packed graph feature, including future plans.
>
> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/technical/packed-graph.txt | 185 
> +++
>  1 file changed, 185 insertions(+)
>  create mode 100644 Documentation/technical/packed-graph.txt
>
> diff --git a/Documentation/technical/packed-graph.txt 
> b/Documentation/technical/packed-graph.txt
> new file mode 100644
> index 00..fcc0c83874
> --- /dev/null
> +++ b/Documentation/technical/packed-graph.txt
> @@ -0,0 +1,185 @@
> +Git Packed Graph Design Notes
> +=
> +
> +Git walks the commit graph for many reasons, including:
> +
> +1. Listing and filtering commit history.
> +2. Computing merge bases.
> +
> +These operations can become slow as the commit count grows above 100K.

How did you come up with that specific number? (Is it platform dependent?)
I'd avoid a specific number to not derail the reader here into wondering
how this got measured.

> +The merge base calculation shows up in many user-facing commands, such
> +as 'status' and 'fetch' and can take minutes to compute depending on
> +data shape. There are two main costs here:

status needs the walk for the ahead/behind computation which is (1)?
(I forget how status would need to compute a merge-base)

fetch is a networked command, which traditionally in Git is understood as
"can be slow" because you might be in Australia, or the connection is slow
otherwise. So giving this as an example it is not obvious that the DAG walking
is the bottleneck. Maybe git-merge or "git show --remerge-diff" [1] are better
examples for walk-intensive commands?

[1] https://public-inbox.org/git/cover.1409860234.git...@thomasrast.ch/
never landed, so maybe that is a bad example. But for me that command
is more obviously dependent on cheap walking the DAG compared to fetch.
So, take my comments with a grain of salt!

> +1. Decompressing and parsing commits.
> +2. Walking the entire graph to avoid topological order mistakes.
> +
> +The packed graph is a file that stores the commit graph structure along
> +with some extra metadata to speed up graph walks. This format allows a
> +consumer to load the following info for a commit:
> +
> +1. The commit OID.
> +2. The list of parents.
> +3. The commit date.
> +4. The root tree OID.
> +5. An integer ID for fast lookups in the graph.
> +6. The generation number (see definition below).
> +
> +Values 1-4 satisfy the requirements of parse_commit_gently().


This new format is specifically removing the cost of decompression and parsing
(1) completely, whereas (2) we still have to walk the entire graph for now as
the generation numbers are not fully used as of yet, but provided.

> +By providing an integer ID we can avoid lookups in the graph as we walk
> +commits. Specifically, we need to provide the integer ID of the parent
> +commits so we navigate directly to their information on request.

Does this mean we decrease the pressure on fast lookups in
packfiles/loose objects?

> +Define the "generation number" of a commit recursively as follows:
> + * A commit with no parents (a root commit) has generation number 1.
> + * A commit with at least one parent has generation number 1 more than
> +   the largest generation number among its parents.
> +Equivalently, the generation number is one more than the length of a
> +longest path from the commit to a root commit. The recursive definition
> +is easier to use for computation and the following property:
> +
> +If A and B are commits with generation numbers N and M, respectively,
> +and N <= M, then A cannot reach B. That is, we know without searching
> +that B is not an ancestor of A because it is further from a root commit
> +than A.
> +
> +Conversely, when checking if A is an ancestor of B, then we only need
> +to walk commits until all commits on the walk boundary have generation
> +number at most N. If we walk commits using a priority queue seeded by
> +generation numbers, then we always expand the boundary commit with 
> highest
> +generation number and can easily detect the stopping condition.

Thanks for including the definition and potential benefits!

> +
> +This property can be used to significantly reduce the time it takes to
> +walk commits and determine topological relationships. Without generation
> +numbers, the general heuristic is the following:
> +
> +If A and B are commits with commit time X and Y, respectively, and
> +X < Y, then A _probably_ cannot reach B.
> +
> +This heuristic is currently used whenever the computation can make
> +mistakes with topological orders (such as "git log" with default order),
> +but is not used when the topological order is required (such as merge
> +base calculations, "git log --graph").
> +
> +Design

Re: Feature request: Improve diff algorithm

2018-01-25 Thread KES

One yet more:

@@ -141,5 +86,9 @@
// }
 
 
-   OP* o;
+   SV *tvs =  newSVpvs( "ScalarHistory" );
+   SV *tva =  newSVpvs( "ArrayHistory"  );
+   SV *tvh =  newSVpvs( "HashHistory"   );
+
+   OP *o;
while( PL_op ) {

Re: [PATCH] builtin/pull: respect verbosity settings in submodules

2018-01-25 Thread Junio C Hamano

Stefan Beller  writes:

> In a6d7eb2c7a (pull: optionally rebase submodules (remote submodule
> changes only), 2017-06-23), we taught Git how to rebase submodules in
> a pull. However we missed to pass on the verbosity settings.

Makes sense.  Thanks.

>
> Reported-by: Robin H. Johnson 
> Signed-off-by: Stefan Beller 
> ---
>  builtin/pull.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/builtin/pull.c b/builtin/pull.c
> index 511dbbe0f6..1876271af9 100644
> --- a/builtin/pull.c
> +++ b/builtin/pull.c
> @@ -574,6 +574,7 @@ static int rebase_submodules(void)
>   cp.no_stdin = 1;
>   argv_array_pushl(, "submodule", "update",
>  "--recursive", "--rebase", NULL);
> + argv_push_verbosity();
>  
>   return run_command();
>  }
> @@ -586,6 +587,7 @@ static int update_submodules(void)
>   cp.no_stdin = 1;
>   argv_array_pushl(, "submodule", "update",
>  "--recursive", "--checkout", NULL);
> + argv_push_verbosity();
>  
>   return run_command();
>  }

Re: [PATCH 0/6] off-by-one errors in git-daemon

2018-01-25 Thread Jeff King

On Thu, Jan 25, 2018 at 10:46:51AM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > This series fixes two off-by-one errors in git-daemon noticed by Michael
> > (who then nerd-sniped me into fixing them). It also improves
> > git-daemon's verbose logging of extended attributes, and beefs up the
> > tests a bit.
> >
> > Before anyone gets excited, no, these aren't security-interesting
> > errors. The only effect you could have is for git-daemon to reject your
> > request as nonsense. ;)
> 
> Thanks.  All looked sensible.

Thanks. Do you mind replacing patch 2 with the update below while
queuing? It uses the more robust loop mentioned by Lucas, and clarifies
the commit message a bit.

There should be no changes necessary for the other patches on top.

-- >8 --
Subject: [PATCH v2] t/lib-git-daemon: record daemon log

When we start git-daemon for our tests, we send its stderr
log stream to a named pipe. We synchronously read the first
line to make sure that the daemon started, and then dump the
rest to descriptor 4. This is handy for debugging test
output with "--verbose", but the tests themselves can't
access the log data.

Let's dump the log into a file, as well, so that future
tests can check the log. There are a few subtleties worth
calling out here:

  - we'll continue to send output to descriptor 4 for
viewing/debugging, which would imply swapping out "cat"
for "tee". But we want to ensure that there's no
buffering, and "tee" doesn't have a standard way to
ask for that. So we'll use a shell loop around "read"
and "printf" instead. That ensures that after a request
has been served, the matching log entries will have made
it to the file.

  - the existing first-line shell loop used read/echo. We'll
switch to consistently using "read -r" and "printf" to
relay data as faithfully as possible.

  - we open the logfile for append, rather than just output.
That makes it OK for tests to truncate the logfile
without restarting the daemon (the OS will atomically
seek to the end of the file when outputting each line).
That allows tests to look at the log without worrying
about pollution from earlier tests.

Helped-by: Lucas Werkmeister 
Signed-off-by: Jeff King 
---
 t/lib-git-daemon.sh | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/t/lib-git-daemon.sh b/t/lib-git-daemon.sh
index 987d40680b..9612cccefb 100644
--- a/t/lib-git-daemon.sh
+++ b/t/lib-git-daemon.sh
@@ -53,11 +53,19 @@ start_git_daemon() {
"$@" "$GIT_DAEMON_DOCUMENT_ROOT_PATH" \
>&3 2>git_daemon_output &
GIT_DAEMON_PID=$!
+   >daemon.log
{
-   read line <&7
-   echo >&4 "$line"
-   cat <&7 >&4 &
-   } 7&4 "%s\n" "$line"
+   (
+   while read -r line <&7
+   do
+   printf "%s\n" "$line"
+   printf >&4 "%s\n" "$line"
+   done
+   ) &
+   } 7>"$TRASH_DIRECTORY/daemon.log" &&
 
# Check expected output
if test x"$(expr "$line" : "\[[0-9]*\] \(.*\)")" != x"Ready to rumble"
-- 
2.16.1.273.g775ca5227b

Re: [PATCH 2/6] t/lib-git-daemon: record daemon log

2018-01-25 Thread Jeff King

On Thu, Jan 25, 2018 at 12:56:47PM +0100, Lucas Werkmeister wrote:

> > Let's dump the log into a file, as well, so that future
> > tests can check the log. There are two subtleties worth
> > calling out here:
> > 
> >   - we replace "cat" with a subshell loop around "read" to
> > ensure that there's no buffering (so that tests can be
> > sure that after a request has been served, the matching
> > log entries will have made it to the file)
> 
> POSIX specifies the -u option for that behavior, can’t you use that?
> (GNU coreutils’ cat ignores it, since writing without delay is
> apparently its default behavior already.)

Actually, this glosses over one other detail, which is that we'd also
need to replace "cat" with "tee" to keep output going to descriptor 4.
That's not strictly necessary (it's just for debugging output), so we
could drop that. But the shell loop seemed easy enough.

> > {
> > read line <&7
> > +   echo "$line"
> > echo >&4 "$line"
> > -   cat <&7 >&4 &
> > -   } 7 > +   (
> > +   while read line <&7
> > +   do
> > +   echo "$line"
> > +   echo >&4 "$line"
> > +   done
> > +   ) &
> > +   } 7>"$TRASH_DIRECTORY/daemon.log" &&
> >  
> > # Check expected output
> > if test x"$(expr "$line" : "\[[0-9]*\] \(.*\)")" != x"Ready to rumble"
> > 
> 
> read without -r clobbers backslashes, and echo may interpret escape
> sequences. To faithfully reproduce the output, it would be better to use
> read -r and printf '%s\n' "$line", I think. (However, it looks like the
> existing code already uses read+echo, so I guess you could also keep
> that pattern in this change and then fix it in a later one.)

Yeah. I doubt it matters much, since this is just inside our tests, and
we control the input. But it doesn't hurt to do it in the more robust
way. I'll re-roll this patch.

-Peff

[PATCH] builtin/pull: respect verbosity settings in submodules

2018-01-25 Thread Stefan Beller

In a6d7eb2c7a (pull: optionally rebase submodules (remote submodule
changes only), 2017-06-23), we taught Git how to rebase submodules in
a pull. However we missed to pass on the verbosity settings.

Reported-by: Robin H. Johnson 
Signed-off-by: Stefan Beller 
---
 builtin/pull.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/pull.c b/builtin/pull.c
index 511dbbe0f6..1876271af9 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -574,6 +574,7 @@ static int rebase_submodules(void)
cp.no_stdin = 1;
argv_array_pushl(, "submodule", "update",
   "--recursive", "--rebase", NULL);
+   argv_push_verbosity();
 
return run_command();
 }
@@ -586,6 +587,7 @@ static int update_submodules(void)
cp.no_stdin = 1;
argv_array_pushl(, "submodule", "update",
   "--recursive", "--checkout", NULL);
+   argv_push_verbosity();
 
return run_command();
 }
-- 
2.16.0.rc1.238.g530d649a79-goog

RE: GIT 2.3.1 - Code Execution Vulnerability

2018-01-25 Thread Dyer, Edwin

Current Solaris 10/11 version of Git is 2.4.0:

https://www.opencsw.org/package/git/

Ed Dyer
Associate DevOps Engineer

Alliance Data Retail Services
3075 Loyalty Circle, Columbus OH 43219
Office: 614-944-3923| Mobile: 614-432-3862

-Original Message-
From: git-ow...@vger.kernel.org [mailto:git-ow...@vger.kernel.org] On Behalf Of 
Santiago Torres
Sent: Thursday, January 25, 2018 12:21 PM
To: christian.del.vecc...@zurich.com
Cc: git@vger.kernel.org
Subject: Re: GIT 2.3.1 - Code Execution Vulnerability

Hi, Christian.

They are probably talking about one of these[1][2]. I don't have access to a 
solaris machine right now, so I don't know which is the latest version they 
ship, but they probably backported patches. 

Here we can't do much more about it, given that the packagers for your solaris 
version are the ones (possibly) packaging 2.3.1. I'd email or open a ticket 
with Oracle after making sure they 1) haven't backported patches to fix these, 
or 2) don't have a newer version in their repositories.

Cheers!
-Santiago.

[1] https://security.archlinux.org/CVE-2017-1000117
[2] https://nvd.nist.gov/vuln/detail/CVE-2016-2324

On Thu, Jan 25, 2018 at 06:02:34PM +0100, christian.del.vecc...@zurich.com 
wrote:
> dear Team
> 
> I am Christian Del Vecchio,and i work in the infrastructure of Middleware on 
> Zurich.
> we have installed in our system Sun your product in order to connect to our 
> bitbucket repository.
> 
> we have followed the instruction provided on your Web Page:
> 
> https://git-scm.com/download/linux
> pkgutil -i git
> 
> the version installed is the 2.3.1, and actually it works.
> 
> but last week our security team informed that this software didn't 
> pass the check control due: Git Server and Client Remote Code 
> Execution Vulnerability
> 
> 
> please, is it available a newer version that fix this problem?
> 
> our system is: Sun Solaris v10 sparc
> 
> best regards
> __
> 
> Christian Del Vecchio
> Middleware SME
> 
> Zurich Insurance Group Ltd. 
> bac de Roda 58,
> Building C, 4th floor
> 08019 Barcelona, Spain
> 
> 64402 (internal)
> +34 93 4465402 (direct)
> christian.del.vecc...@zurich.com
> http://www.zurich.com

__
The information contained in this e-mail message and any attachments may be 
privileged and confidential. If the reader of this message is not the intended 
recipient or an agent responsible for delivering it to the intended recipient, 
you are hereby notified that any review, dissemination, distribution or copying 
of this communication is strictly prohibited. If you have received this 
communication in error, please notify the sender immediately by replying to 
this e-mail and delete the message and any attachments from your computer.
__

[PATCH] Docs: split out long-running subprocess handshake

2018-01-25 Thread Jonathan Tan

Separating out the implementation of the handshake when starting a
long-running subprocess (for example, as is done for a clean/smudge
filter) was done in commit fa64a2fdbeed ("sub-process: refactor
handshake to common function", 2017-07-26), but its documentation still
resides in gitattributes. Split out the documentation as well.

Signed-off-by: Jonathan Tan 
---
This is extracted from my patch from July 2017 [1], which later got
superseded by another patch set that no longer used a long-running
subprocess.

I think this patch (or something similar) is something we'll need sooner
or later, but I'm also OK if we wait until we have a second usage of the
long-running subprocess to merge it in.

[1]
https://public-inbox.org/git/eadce97b6a1e80345a2621e71ce187e9e6bc05bf.1501532294.git.jonathanta...@google.com/#t
---
 Documentation/Makefile |  1 +
 Documentation/gitattributes.txt| 54 --
 .../technical/long-running-process-protocol.txt| 50 
 sub-process.h  |  4 +-
 4 files changed, 61 insertions(+), 48 deletions(-)
 create mode 100644 Documentation/technical/long-running-process-protocol.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 4ae9ba5c8..6232143cb 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -72,6 +72,7 @@ TECH_DOCS += SubmittingPatches
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
+TECH_DOCS += technical/long-running-process-protocol
 TECH_DOCS += technical/pack-format
 TECH_DOCS += technical/pack-heuristics
 TECH_DOCS += technical/pack-protocol
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 30687de81..c21f5ca10 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -392,46 +392,14 @@ Long Running Filter Process
 If the filter command (a string value) is defined via
 `filter..process` then Git can process all blobs with a
 single filter invocation for the entire life of a single Git
-command. This is achieved by using a packet format (pkt-line,
-see technical/protocol-common.txt) based protocol over standard
-input and standard output as follows. All packets, except for the
-"*CONTENT" packets and the "" flush packet, are considered
-text and therefore are terminated by a LF.
-
-Git starts the filter when it encounters the first file
-that needs to be cleaned or smudged. After the filter started
-Git sends a welcome message ("git-filter-client"), a list of supported
-protocol version numbers, and a flush packet. Git expects to read a welcome
-response message ("git-filter-server"), exactly one protocol version number
-from the previously sent list, and a flush packet. All further
-communication will be based on the selected version. The remaining
-protocol description below documents "version=2". Please note that
-"version=42" in the example below does not exist and is only there
-to illustrate how the protocol would look like with more than one
-version.
-
-After the version negotiation Git sends a list of all capabilities that
-it supports and a flush packet. Git expects to read a list of desired
-capabilities, which must be a subset of the supported capabilities list,
-and a flush packet as response:
-
-packet:  git> git-filter-client
-packet:  git> version=2
-packet:  git> version=42
-packet:  git> 
-packet:  git< git-filter-server
-packet:  git< version=2
-packet:  git< 
-packet:  git> capability=clean
-packet:  git> capability=smudge
-packet:  git> capability=not-yet-invented
-packet:  git> 
-packet:  git< capability=clean
-packet:  git< capability=smudge
-packet:  git< 
-
-Supported filter capabilities in version 2 are "clean", "smudge",
-and "delay".
+command. This is achieved by using the long-running process protocol
+(described in technical/long-running-process-protocol.txt).
+
+When Git encounters the first file that needs to be cleaned or smudged,
+it starts the filter and performs the handshake. In the handshake, the
+welcome message sent by Git is "git-filter-client", only version 2 is
+suppported, and the supported capabilities are "clean", "smudge", and
+"delay".
 
 Afterwards Git sends a list of "key=value" pairs terminated with
 a flush packet. The list will contain at least the filter command
@@ -517,12 +485,6 @@ the protocol then Git will stop the filter process and 
restart it
 with the next file that needs to be processed. Depending on the
 `filter..required` flag Git will interpret that as error.
 
-After the filter has processed a command it is expected to wait for
-a "key=value" list containing the next command. Git will close
-the command pipe on exit. The

Re: [PATCH] merge: support --strategy '?' for git-completion.bash

2018-01-25 Thread Junio C Hamano

Nguyễn Thái Ngọc Duy   writes:

> Bash completion support gets the list of available strategies with a
> grep and sed trick which does not work on non-C locale since the anchor
> string is translated and it does not cover custom strategies either.
>
> Let's do it a better way and make git-merge provide all available
> strategies in machine-readable form.
>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  Another, perhaps better, option is add "git merge --list-strategies".
>  It requires some code movement, so I'll try with a simpler approach
>  first.

If you run the probing "merge -s help" under C locale, that would
just be a one-liner, no ;-)  I.e.

> - git merge -s help 2>&1 |
> + LANG=C LC_ALL=C git merge -s help 2>&1 |

Not rejecting the patch, but just wondering.

Re: [PATCH 0/6] off-by-one errors in git-daemon

2018-01-25 Thread Junio C Hamano

Jeff King  writes:

> This series fixes two off-by-one errors in git-daemon noticed by Michael
> (who then nerd-sniped me into fixing them). It also improves
> git-daemon's verbose logging of extended attributes, and beefs up the
> tests a bit.
>
> Before anyone gets excited, no, these aren't security-interesting
> errors. The only effect you could have is for git-daemon to reject your
> request as nonsense. ;)

Thanks.  All looked sensible.

git credential-osxkeychain bug/feature?

2018-01-25 Thread Robert Leach

Hi,

I wanted to submit an issue regarding the credential-osxkeychain interface 
functionality.  I don't know whether it's a bug or a feature request.  Maybe 
it's even impossible to address? - I don't know, but here's my issue.  Let me 
know if fixing/implementing this is a bad idea or if it's an issue relating to 
something wrong on my system and not an issue with git itself.

I have a git repo in a dropbox directory (not sure if that's relevant - just 
giving full disclosure).  I have set up git to not ask me for my password by 
using the credential-osxkeychain feature, and it works great... unless I'm 
connected via ssh to the machine that has a copy of my Dropbox on it.  Various 
git commands interrogate me for my login instead of using the credentials 
stored in the KA database on that remote machine, e.g.:

git remote show origin
Username for 'https://github.com': ^C

When I'm sitting at that machine, I do not get asked for my login credentials.

It's a minor issue, and kind of silly TBH.  I can always just work on my repo 
locally and synch, but if I'm working on a project that processes a large 
dataset, it's faster to edit the project on the machine with the data I'm 
testing it with.  I could VNC in or just synch after each change, which works 
around the issue, but it's nicer to just do everything on one ssh terminal 
session.

But perhaps access to KA from a remote ssh session is restricted for security 
reasons?  I'm just curious I suppose.  Should/can this work?

Thanks,
Rob

Please review my code

2018-01-25 Thread Оля Тележная

Hi everyone,
I haven't sent the code by mailing lists because 25 commits (every
commit in separate message) look like a spam.

Please look at my code:
https://github.com/telezhnaya/git/commits/catfile
You could send me any ideas here or in Github.

The main idea of the patch is to get rid of using custom formatting in
cat-file and start using general one from ref-filter.
Additional bonus is that cat-file becomes to support many new
formatting commands like %(if), %(color), %(committername) etc.

I remember I need to rewrite docs, I will do that in the near future.

I would be happy to get any ideas from you.
Thanks!
Olga

Re: GIT 2.3.1 - Code Execution Vulnerability

2018-01-25 Thread Santiago Torres

Hi, Christian.

They are probably talking about one of these[1][2]. I don't have access
to a solaris machine right now, so I don't know which is the latest
version they ship, but they probably backported patches. 

Here we can't do much more about it, given that the packagers for your
solaris version are the ones (possibly) packaging 2.3.1. I'd email or
open a ticket with Oracle after making sure they 1) haven't backported
patches to fix these, or 2) don't have a newer version in their
repositories.

Cheers!
-Santiago.


[1] https://security.archlinux.org/CVE-2017-1000117
[2] https://nvd.nist.gov/vuln/detail/CVE-2016-2324

On Thu, Jan 25, 2018 at 06:02:34PM +0100, christian.del.vecc...@zurich.com 
wrote:
> dear Team
> 
> I am Christian Del Vecchio,and i work in the infrastructure of Middleware on 
> Zurich.
> we have installed in our system Sun your product in order to connect to our 
> bitbucket repository.
> 
> we have followed the instruction provided on your Web Page:
> 
> https://git-scm.com/download/linux
> pkgutil -i git
> 
> the version installed is the 2.3.1, and actually it works.
> 
> but last week our security team informed that this software didn't pass the 
> check control due: Git Server and Client Remote Code Execution Vulnerability
> 
> 
> please, is it available a newer version that fix this problem?
> 
> our system is: Sun Solaris v10 sparc
> 
> best regards
> __ 
> 
> Christian Del Vecchio 
> Middleware SME 
> 
> Zurich Insurance Group Ltd. 
> bac de Roda 58, 
> Building C, 4th floor 
> 08019 Barcelona, Spain 
> 
> 64402 (internal) 
> +34 93 4465402 (direct) 
> christian.del.vecc...@zurich.com 
> http://www.zurich.com 


signature.asc
Description: PGP signature

GIT 2.3.1 - Code Execution Vulnerability

2018-01-25 Thread christian . del . vecchio

dear Team

I am Christian Del Vecchio,and i work in the infrastructure of Middleware on 
Zurich.
we have installed in our system Sun your product in order to connect to our 
bitbucket repository.

we have followed the instruction provided on your Web Page:

https://git-scm.com/download/linux
pkgutil -i git

the version installed is the 2.3.1, and actually it works.

but last week our security team informed that this software didn't pass the 
check control due: Git Server and Client Remote Code Execution Vulnerability


please, is it available a newer version that fix this problem?

our system is: Sun Solaris v10 sparc

best regards
__ 

Christian Del Vecchio 
Middleware SME 

Zurich Insurance Group Ltd. 
bac de Roda 58, 
Building C, 4th floor 
08019 Barcelona, Spain 

64402 (internal) 
+34 93 4465402 (direct) 
christian.del.vecc...@zurich.com 
http://www.zurich.com

Re: [PATCH v3 3/3] sequencer: run 'prepare-commit-msg' hook

2018-01-25 Thread Phillip Wood


On 24/01/18 18:59, Junio C Hamano wrote:

Ramsay Jones  writes:


On 24/01/18 12:34, Phillip Wood wrote:

From: Phillip Wood 

Commit 356ee4659b ("sequencer: try to commit without forking 'git
commit'", 2017-11-24) forgot to run the 'prepare-commit-msg' hook when
creating the commit. Fix this by writing the commit message to a
different file and running the hook. Using a different file means that
if the commit is cancelled the original message file is
unchanged. Also move the checks for an empty commit so the order
matches 'git commit'.

Reported-by: Dmitry Torokhov 
Signed-off-by: Phillip Wood 
Reviewed-by: Ramsay Jones 


Echoing Eric's earlier email, I don't think this Reviewed-by is
warranted - I only requested the addition of a static keyword,
I didn't actually review the patch.


Thanks for clarification, and I tend to agree.  You, Eric and I
certainly did not review what is posted here, so if I "git am" these
patches as-is, we'd be lying.

Having said that, I think this round takes all the review comments
raised against the previous round(s) into account.  So I'm tempted
to tweak them with s/Reviewed-/Helped-/ and queue.

Thanks Junio, I wasn't sure whether to go with Reviewed-by or Helped-by, 
I'll know for next time


Best Wishes

Phillip

Re: [PATCH 00/14] Serialized Commit Graph

2018-01-25 Thread Derrick Stolee


On 1/25/2018 10:46 AM, Ævar Arnfjörð Bjarmason wrote:

On Thu, Jan 25 2018, Derrick Stolee jotted:


* 'git log --topo-order -1000' walks all reachable commits to avoid
   incorrect topological orders, but only needs the commit message for
   the top 1000 commits.

* 'git merge-base  ' may walk many commits to find the correct
   boundary between the commits reachable from A and those reachable
   from B. No commit messages are needed.

* 'git branch -vv' checks ahead/behind status for all local branches
   compared to their upstream remote branches. This is essentially as
   hard as computing merge bases for each.

This is great, spotted / questions so far:

* git graph --blah says you need to enable the config, should say
   "unknown option --blah ". I.e. overzelous config guard.


This is a good point.


* On a big repo (git show-ref -s | ~/g/git/git-graph --write
   --update-head) is as of writing this still hanging for me, but strace
   shows it's brk()-ing. Presumably just still busy, a progress bar would
   be very nice.


Oops! This is my mistake. The correct command should be:

    git show-ref -s | git graph --write --update-head --stdin-commits

Without "--stdin-commits" the command will walk all packed objects
to look for commits and then build the graph. That's why it's taking
so long. That method takes several minutes on the Linux repo, but with
--stdin-commits it should take as long as "git log >/dev/null".


* Shouldn't there be a pack.useGraph option so this gets auto-updated on
   repack? I understand this series is a WIP, so that's more a "is that
   the UI" than "it needs now".


This will definitely be part of a follow-up patch.

Thanks,
-Stolee

Re: [PATCH 00/14] Serialized Commit Graph

2018-01-25 Thread Ævar Arnfjörð Bjarmason


On Thu, Jan 25 2018, Derrick Stolee jotted:

> * 'git log --topo-order -1000' walks all reachable commits to avoid
>   incorrect topological orders, but only needs the commit message for
>   the top 1000 commits.
>
> * 'git merge-base  ' may walk many commits to find the correct
>   boundary between the commits reachable from A and those reachable
>   from B. No commit messages are needed.
>
> * 'git branch -vv' checks ahead/behind status for all local branches
>   compared to their upstream remote branches. This is essentially as
>   hard as computing merge bases for each.

This is great, spotted / questions so far:

* git graph --blah says you need to enable the config, should say
  "unknown option --blah ". I.e. overzelous config guard.

* On a big repo (git show-ref -s | ~/g/git/git-graph --write
  --update-head) is as of writing this still hanging for me, but strace
  shows it's brk()-ing. Presumably just still busy, a progress bar would
  be very nice.

* Shouldn't there be a pack.useGraph option so this gets auto-updated on
  repack? I understand this series is a WIP, so that's more a "is that
  the UI" than "it needs now".

Re: pushing a delete-only commit consumes too much traffic

2018-01-25 Thread Basin Ilya

> Were the 60Mb of jars previously pushed in a commit that already existed on 
> the upstream?
yes

> Was the delete an actual removal of history or did you commit with the jars 
> deleted, then pushed?
I committed with the jars deleted

> Did you do a merge squash or delete branch to effect the removal.
No




On 25.01.2018 17:24, Randall S. Becker wrote:
> On January 25, 2018 9:15 AM, Basin Ilya wrote:
> 
>> I had a 60Mb worth of unneeded jar files in the project. I created a new
>> branch and performed `git rm` on them. Now while I was pushing the change
>> the counter of sent data reached 80Mb. Why is that?
> 
> Can you provide more info? Were the 60Mb of jars previously pushed in a 
> commit that already existed on the upstream? Was the delete an actual removal 
> of history or did you commit with the jars deleted, then pushed? Did you do a 
> merge squash or delete branch to effect the removal. More info please.
> 
> Cheers,
> Randall
> 
> -- Brief whoami:
>   NonStop developer since approximately NonStop(2112884442)
>   UNIX developer since approximately 421664400
> -- In my real life, I talk too much.
> 
> 
>

RE: pushing a delete-only commit consumes too much traffic

2018-01-25 Thread Randall S. Becker

On January 25, 2018 9:15 AM, Basin Ilya wrote:

> I had a 60Mb worth of unneeded jar files in the project. I created a new
> branch and performed `git rm` on them. Now while I was pushing the change
> the counter of sent data reached 80Mb. Why is that?

Can you provide more info? Were the 60Mb of jars previously pushed in a commit 
that already existed on the upstream? Was the delete an actual removal of 
history or did you commit with the jars deleted, then pushed? Did you do a 
merge squash or delete branch to effect the removal. More info please.

Cheers,
Randall

-- Brief whoami:
  NonStop developer since approximately NonStop(2112884442)
  UNIX developer since approximately 421664400
-- In my real life, I talk too much.

pushing a delete-only commit consumes too much traffic

2018-01-25 Thread Basin Ilya

Hi list.
I had a 60Mb worth of unneeded jar files in the project. I created a new branch 
and performed `git rm` on them. Now while I was pushing the change the counter 
of sent data
reached 80Mb. Why is that?

[PATCH 12/14] packed-graph: read only from specific pack-indexes

2018-01-25 Thread Derrick Stolee

Teach git-graph to inspect the objects only in a certain list of
pack-indexes within the given pack directory. This allows updating
the graph iteratively, since we add all commits stored in a previous
packed graph.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-graph.txt | 12 
 builtin/graph.c | 26 +++---
 packed-graph.c  | 27 +++
 packed-graph.h  |  2 +-
 packfile.c  |  4 ++--
 packfile.h  |  2 ++
 t/t5319-graph.sh| 10 ++
 7 files changed, 69 insertions(+), 14 deletions(-)

diff --git a/Documentation/git-graph.txt b/Documentation/git-graph.txt
index f4f1048d28..b68a61ddea 100644
--- a/Documentation/git-graph.txt
+++ b/Documentation/git-graph.txt
@@ -43,6 +43,11 @@ OPTIONS
When used with --write and --update-head, delete the graph file
previously referenced by graph-head.
 
+--stdin-packs::
+   When used with --write, generate the new graph by walking objects
+   only in the specified packfiles and any commits in the
+   existing graph-head.
+
 EXAMPLES
 
 
@@ -65,6 +70,13 @@ $ git graph --write
 $ git graph --write --update-head --delete-expired
 
 
+* Write a graph file, extending the current graph file using commits
+* in , update graph-head, and delete the old graph-.graph file.
++
+
+$ echo  | git graph --write --update-head --delete-expired 
--stdin-packs
+
+
 * Read basic information from a graph file.
 +
 
diff --git a/builtin/graph.c b/builtin/graph.c
index adf779b601..3cace3a18c 100644
--- a/builtin/graph.c
+++ b/builtin/graph.c
@@ -12,7 +12,7 @@ static char const * const builtin_graph_usage[] ={
N_("git graph [--pack-dir ]"),
N_("git graph --clear [--pack-dir ]"),
N_("git graph --read [--graph-id=]"),
-   N_("git graph --write [--pack-dir ] [--update-head] 
[--delete-expired]"),
+   N_("git graph --write [--pack-dir ] [--update-head] 
[--delete-expired] [--stdin-packs]"),
NULL
 };
 
@@ -24,6 +24,7 @@ static struct opts_graph {
int write;
int update_head;
int delete_expired;
+   int stdin_packs;
int has_existing;
struct object_id old_graph_oid;
 } opts;
@@ -113,7 +114,24 @@ static void update_head_file(const char *pack_dir, const 
struct object_id *graph
 
 static int graph_write(void)
 {
-   struct object_id *graph_id = construct_graph(opts.pack_dir);
+   struct object_id *graph_id;
+   char **pack_indexes = NULL;
+   int num_packs = 0;
+   int size_packs = 0;
+
+   if (opts.stdin_packs) {
+   struct strbuf buf = STRBUF_INIT;
+   size_packs = 128;
+   ALLOC_ARRAY(pack_indexes, size_packs);
+
+   while (strbuf_getline(, stdin) != EOF) {
+   ALLOC_GROW(pack_indexes, num_packs + 1, size_packs);
+   pack_indexes[num_packs++] = buf.buf;
+   strbuf_detach(, NULL);
+   }
+   }
+
+   graph_id = construct_graph(opts.pack_dir, pack_indexes, num_packs);
 
if (opts.update_head)
update_head_file(opts.pack_dir, graph_id);
@@ -150,7 +168,9 @@ int cmd_graph(int argc, const char **argv, const char 
*prefix)
N_("update graph-head to written graph file")),
OPT_BOOL('d', "delete-expired", _expired,
N_("delete expired head graph file")),
-   { OPTION_STRING, 'M', "graph-id", _id,
+   OPT_BOOL('s', "stdin-packs", _packs,
+   N_("only scan packfiles listed by stdin")),
+   { OPTION_STRING, 'G', "graph-id", _id,
N_("oid"),
N_("An OID for a specific graph file in the pack-dir."),
PARSE_OPT_OPTARG, NULL, (intptr_t) "" },
diff --git a/packed-graph.c b/packed-graph.c
index 343b231973..0dc68a077e 100644
--- a/packed-graph.c
+++ b/packed-graph.c
@@ -401,7 +401,7 @@ static int fill_packed_commit(struct commit *item, struct 
packed_graph *g, uint3
  *  2. date
  *  3. parents.
  *
- * Returns 1 iff the commit was found in the packed graph.
+ * Returns 1 if and only if the commit was found in the packed graph.
  *
  * See parse_commit_buffer() for the fallback after this call.
  */
@@ -427,7 +427,7 @@ int parse_packed_commit(struct commit *item)
return fill_packed_commit(item, packed_graph, pos);
}
 
-   return parse_commit_internal(item, 0, 0);
+   return 0;
 }
 
 static void write_graph_chunk_fanout(
@@ -638,7 +638,7 @@ static int if_packed_commit_add_to_list(const struct 
object_id *oid,
return 0;
 }
 
-struct object_id

[PATCH 13/14] packed-graph: close under reachability

2018-01-25 Thread Derrick Stolee

Teach construct_graph() to walk all parents from the commits discovered in
packfiles. This prevents gaps given by loose objects or previously-missed
packfiles.

Signed-off-by: Derrick Stolee 
---
 packed-graph.c   | 26 ++
 t/t5319-graph.sh | 14 ++
 2 files changed, 40 insertions(+)

diff --git a/packed-graph.c b/packed-graph.c
index 0dc68a077e..c93515f18e 100644
--- a/packed-graph.c
+++ b/packed-graph.c
@@ -5,6 +5,7 @@
 #include "packfile.h"
 #include "commit.h"
 #include "object.h"
+#include "revision.h"
 #include "packed-graph.h"
 
 #define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
@@ -638,6 +639,29 @@ static int if_packed_commit_add_to_list(const struct 
object_id *oid,
return 0;
 }
 
+static void close_reachable(struct packed_oid_list *oids)
+{
+   int i;
+   struct rev_info revs;
+   struct commit *commit;
+   init_revisions(, NULL);
+
+   for (i = 0; i < oids->num; i++) {
+   commit = lookup_commit(oids->list[i]);
+   if (commit && !parse_commit(commit))
+   revs.commits = commit_list_insert(commit, 
);
+   }
+
+   if (prepare_revision_walk())
+   die(_("revision walk setup failed"));
+
+   while ((commit = get_revision()) != NULL) {
+   ALLOC_GROW(oids->list, oids->num + 1, oids->size);
+   oids->list[oids->num] = &(commit->object.oid);
+   (oids->num)++;
+   }
+}
+
 struct object_id *construct_graph(const char *pack_dir, char **pack_indexes, 
int nr_packs)
 {
// Find a list of oids, adding the pointer to a list.
@@ -698,6 +722,8 @@ struct object_id *construct_graph(const char *pack_dir, 
char **pack_indexes, int
} else {
for_each_packed_object(if_packed_commit_add_to_list, , 0);
}
+
+   close_reachable();
QSORT(oids.list, oids.num, commit_compare);
 
count_distinct = 1;
diff --git a/t/t5319-graph.sh b/t/t5319-graph.sh
index 969150cd21..8bf5a0c993 100755
--- a/t/t5319-graph.sh
+++ b/t/t5319-graph.sh
@@ -212,6 +212,20 @@ test_expect_success 'clear graph' \
 _graph_git_behavior commits/20 merge/1
 _graph_git_behavior commits/20 merge/2
 
+test_expect_success 'build graph from latest pack with closure' \
+'graph5=$(cat new-idx | git graph --write --update-head --stdin-packs) &&
+ test_path_is_file ${packdir}/graph-${graph5}.graph &&
+ test_path_is_file ${packdir}/graph-${graph1}.graph &&
+ test_path_is_file ${packdir}/graph-head &&
+ echo ${graph5} >expect &&
+ cmp -n 40 expect ${packdir}/graph-head &&
+ git graph --read --graph-id=${graph5} >output &&
+ _graph_read_expect "21" "${packdir}" &&
+ cmp expect output'
+
+_graph_git_behavior commits/20 merge/1
+_graph_git_behavior commits/20 merge/2
+
 test_expect_success 'setup bare repo' \
 'cd .. &&
  git clone --bare full bare &&
-- 
2.16.0

[PATCH 11/14] commit: integrate packed graph with commit parsing

2018-01-25 Thread Derrick Stolee

Teach Git to inspect a packed graph to supply the contents of a
struct commit when calling parse_commit_gently(). This implementation
satisfies all post-conditions on the struct commit, including loading
parents, the root tree, and the commit date. The only loosely-expected
condition is that the commit buffer is loaded into the cache. This
was checked in log-tree.c:show_log(), but the "return;" on failure
produced unexpected results (i.e. the message line was never terminated).
The new behavior of loading the buffer when needed prevents the
unexpected behavior.

If core.graph is false, then do not load the graph and behave as usual.

In test script t5319-graph.sh, add output-matching conditions on read-
only graph operations.

By loading commits from the graph instead of parsing commit buffers, we
save a lot of time on long commits walks. Here are some performance
results for a copy of the Linux repository where 'master' has 704,766
reachable commits and is behind 'origin/master' by 19,610 commits.

| Command  | Before | After  | Rel % |
|--|||---|
| log --oneline --topo-order -1000 |  5.9s  |  0.7s  | -88%  |
| branch -vv   |  0.42s |  0.27s | -35%  |
| rev-list --all   |  6.4s  |  1.0s  | -84%  |
| rev-list --all --objects | 32.6s  | 27.6s  | -15%  |

Signed-off-by: Derrick Stolee 
---
 alloc.c  |   1 +
 commit.c |  20 -
 commit.h |   2 +
 log-tree.c   |   3 +-
 packed-graph.c   | 242 +++
 packed-graph.h   |  18 +
 t/t5319-graph.sh | 114 --
 7 files changed, 387 insertions(+), 13 deletions(-)

diff --git a/alloc.c b/alloc.c
index 12afadfacd..4a4dcfa2b7 100644
--- a/alloc.c
+++ b/alloc.c
@@ -93,6 +93,7 @@ void *alloc_commit_node(void)
struct commit *c = alloc_node(_state, sizeof(struct commit));
c->object.type = OBJ_COMMIT;
c->index = alloc_commit_index();
+   c->graphId = 0x;
return c;
 }
 
diff --git a/commit.c b/commit.c
index cab8d4455b..253c102808 100644
--- a/commit.c
+++ b/commit.c
@@ -12,6 +12,7 @@
 #include "prio-queue.h"
 #include "sha1-lookup.h"
 #include "wt-status.h"
+#include "packed-graph.h"
 
 static struct commit_extra_header *read_commit_extra_header_lines(const char 
*buf, size_t len, const char **);
 
@@ -374,7 +375,7 @@ int parse_commit_buffer(struct commit *item, const void 
*buffer, unsigned long s
return 0;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
check_packed)
 {
enum object_type type;
void *buffer;
@@ -383,19 +384,27 @@ int parse_commit_gently(struct commit *item, int 
quiet_on_missing)
 
if (!item)
return -1;
+
+   // If we already parsed, but got it from the graph, then keep going!
if (item->object.parsed)
return 0;
+
+   if (check_packed && parse_packed_commit(item))
+   return 0;
+
buffer = read_sha1_file(item->object.oid.hash, , );
if (!buffer)
return quiet_on_missing ? -1 :
error("Could not read %s",
-oid_to_hex(>object.oid));
+   oid_to_hex(>object.oid));
if (type != OBJ_COMMIT) {
free(buffer);
return error("Object %s not a commit",
-oid_to_hex(>object.oid));
+   oid_to_hex(>object.oid));
}
+
ret = parse_commit_buffer(item, buffer, size);
+
if (save_commit_buffer && !ret) {
set_commit_buffer(item, buffer, size);
return 0;
@@ -404,6 +413,11 @@ int parse_commit_gently(struct commit *item, int 
quiet_on_missing)
return ret;
 }
 
+int parse_commit_gently(struct commit *item, int quiet_on_missing)
+{
+   return parse_commit_internal(item, quiet_on_missing, 1);
+}
+
 void parse_commit_or_die(struct commit *item)
 {
if (parse_commit(item))
diff --git a/commit.h b/commit.h
index 8c68ca1a5a..02f5f2a182 100644
--- a/commit.h
+++ b/commit.h
@@ -21,6 +21,7 @@ struct commit {
timestamp_t date;
struct commit_list *parents;
struct tree *tree;
+   uint32_t graphId;
 };
 
 extern int save_commit_buffer;
@@ -60,6 +61,7 @@ struct commit *lookup_commit_reference_by_name(const char 
*name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char 
*ref_name);
 
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long 
size);
+extern int parse_commit_internal(struct commit *item, int quiet_on_missing, 
int check_packed);
 int parse_commit_gently(struct commit *item, int quiet_on_missing);
 static inline int parse_commit(struct commit *item)
 {
diff --git a/log-tree.c

[PATCH 05/14] packed-graph: implement construct_graph()

2018-01-25 Thread Derrick Stolee

Teach Git to write a packed graph file by checking all packed objects
to see if they are commits, then store the file in the given pack
directory.

Signed-off-by: Derrick Stolee 
---
 Makefile   |   1 +
 packed-graph.c | 375 +
 packed-graph.h |  20 +++
 3 files changed, 396 insertions(+)
 create mode 100644 packed-graph.c
 create mode 100644 packed-graph.h

diff --git a/Makefile b/Makefile
index d8b0d0457a..59439e13a1 100644
--- a/Makefile
+++ b/Makefile
@@ -841,6 +841,7 @@ LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
 LIB_OBJS += oidmap.o
 LIB_OBJS += oidset.o
+LIB_OBJS += packed-graph.o
 LIB_OBJS += packfile.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
diff --git a/packed-graph.c b/packed-graph.c
new file mode 100644
index 00..9be9811667
--- /dev/null
+++ b/packed-graph.c
@@ -0,0 +1,375 @@
+#include "cache.h"
+#include "config.h"
+#include "git-compat-util.h"
+#include "pack.h"
+#include "packfile.h"
+#include "commit.h"
+#include "object.h"
+#include "packed-graph.h"
+
+#define GRAPH_SIGNATURE 0x43475048 /* "CGPH" */
+#define GRAPH_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
+#define GRAPH_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
+#define GRAPH_CHUNKID_DATA 0x43444154 /* "CDAT" */
+#define GRAPH_CHUNKID_LARGEEDGES 0x45444745 /* "EDGE" */
+
+#define GRAPH_DATA_WIDTH 36
+
+#define GRAPH_VERSION_1 0x1
+#define GRAPH_VERSION GRAPH_VERSION_1
+
+#define GRAPH_OID_VERSION_SHA1 1
+#define GRAPH_OID_LEN_SHA1 20
+#define GRAPH_OID_VERSION GRAPH_OID_VERSION_SHA1
+#define GRAPH_OID_LEN GRAPH_OID_LEN_SHA1
+
+#define GRAPH_LARGE_EDGES_NEEDED 0x8000
+#define GRAPH_PARENT_MISSING 0x7fff
+#define GRAPH_EDGE_LAST_MASK 0x7fff
+#define GRAPH_PARENT_NONE 0x7000
+
+#define GRAPH_LAST_EDGE 0x8000
+
+char* get_graph_filename_oid(const char *pack_dir,
+ struct object_id *oid)
+{
+   size_t len;
+   struct strbuf head_path = STRBUF_INIT;
+   strbuf_addstr(_path, pack_dir);
+   strbuf_addstr(_path, "/graph-");
+   strbuf_addstr(_path, oid_to_hex(oid));
+   strbuf_addstr(_path, ".graph");
+
+   return strbuf_detach(_path, );
+}
+
+static void write_graph_chunk_fanout(
+   struct sha1file *f,
+   struct commit **commits, int nr_commits)
+{
+   uint32_t i, count = 0;
+   struct commit **list = commits;
+   struct commit **last = commits + nr_commits;
+
+   /*
+   * Write the first-level table (the list is sorted,
+   * but we use a 256-entry lookup to be able to avoid
+   * having to do eight extra binary search iterations).
+   */
+   for (i = 0; i < 256; i++) {
+   uint32_t swap_count;
+
+   while (list < last) {
+   if ((*list)->object.oid.hash[0] != i)
+   break;
+   count++;
+   list++;
+   }
+
+   swap_count = htonl(count);
+   sha1write(f, _count, 4);
+   }
+}
+
+static void write_graph_chunk_oids(
+   struct sha1file *f, int hash_len,
+   struct commit **commits, int nr_commits)
+{
+   struct commit **list = commits;
+   uint32_t i;
+   for (i = 0; i < nr_commits; i++) {
+   sha1write(f, (*list)->object.oid.hash, (int)hash_len);
+   list++;
+   }
+}
+
+static int commit_pos(struct commit **commits, int nr_commits, const struct 
object_id *oid, uint32_t *pos)
+{
+   uint32_t first = 0, last = nr_commits;
+
+   while (first < last) {
+   uint32_t mid = first + (last - first) / 2;
+   struct object_id *current;
+   int cmp;
+
+   current = &(commits[mid]->object.oid);
+   cmp = oidcmp(oid, current);
+   if (!cmp) {
+   *pos = mid;
+   return 1;
+   }
+   if (cmp > 0) {
+   first = mid + 1;
+   continue;
+   }
+   last = mid;
+   }
+
+   *pos = first;
+   return 0;
+}
+
+static void write_graph_chunk_data(
+   struct sha1file *f, int hash_len,
+   struct commit **commits, int nr_commits)
+{
+   struct commit **list = commits;
+   struct commit **last = commits + nr_commits;
+   uint32_t num_large_edges = 0;
+
+   while (list < last) {
+   struct commit_list *parent;
+   uint32_t intId, swapIntId;
+   uint32_t packedDate[2];
+
+   parse_commit(*list);
+   sha1write(f, (*list)->tree->object.oid.hash, hash_len);
+
+   parent = (*list)->parents;
+
+   if (!parent)
+   swapIntId = htonl(GRAPH_PARENT_NONE);
+   else if (commit_pos(commits, nr_commits, 
&(parent->item->object.oid), ))
+   swapIntId = htonl(intId);
+   else
+

[PATCH 07/14] packed-graph: implement git-graph --read

2018-01-25 Thread Derrick Stolee

Teach git-graph to read packed graph files and summarize their contents.

Use the --read option to verify the contents of a graph file in the
graph tests.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-graph.txt |   7 +++
 builtin/graph.c |  54 
 packed-graph.c  | 147 +++-
 packed-graph.h  |  25 
 t/t5319-graph.sh|  50 +--
 5 files changed, 260 insertions(+), 23 deletions(-)

diff --git a/Documentation/git-graph.txt b/Documentation/git-graph.txt
index be6bc38814..0939c3f1be 100644
--- a/Documentation/git-graph.txt
+++ b/Documentation/git-graph.txt
@@ -10,6 +10,7 @@ SYNOPSIS
 
 [verse]
 'git graph' --write  [--pack-dir ]
+'git graph' --read  [--pack-dir ]
 
 EXAMPLES
 
@@ -20,6 +21,12 @@ EXAMPLES
 $ git midx --write
 
 
+* Read basic information from a graph file.
++
+
+$ git midx --read --graph-id=
+
+
 CONFIGURATION
 -
 
diff --git a/builtin/graph.c b/builtin/graph.c
index 09f5552338..bc66722924 100644
--- a/builtin/graph.c
+++ b/builtin/graph.c
@@ -10,15 +10,58 @@
 
 static char const * const builtin_graph_usage[] ={
N_("git graph [--pack-dir ]"),
+   N_("git graph --read [--graph-id=]"),
N_("git graph --write [--pack-dir ]"),
NULL
 };
 
 static struct opts_graph {
const char *pack_dir;
+   int read;
+   const char *graph_id;
int write;
 } opts;
 
+static int graph_read(void)
+{
+   struct object_id graph_oid;
+   struct packed_graph *graph = 0;
+   const char *graph_file;
+
+   if (opts.graph_id && strlen(opts.graph_id) == GIT_MAX_HEXSZ)
+   get_oid_hex(opts.graph_id, _oid);
+   else
+   die("no graph id specified");
+
+   graph_file = get_graph_filename_oid(opts.pack_dir, _oid);
+   graph = load_packed_graph_one(graph_file, opts.pack_dir);
+
+   if (!graph)
+   die("graph file %s does not exist.\n", graph_file);
+
+   printf("header: %08x %02x %02x %02x %02x\n",
+   ntohl(graph->hdr->graph_signature),
+   graph->hdr->graph_version,
+   graph->hdr->hash_version,
+   graph->hdr->hash_len,
+   graph->hdr->num_chunks);
+   printf("num_commits: %u\n", graph->num_commits);
+   printf("chunks:");
+
+   if (graph->chunk_oid_fanout)
+   printf(" oid_fanout");
+   if (graph->chunk_oid_lookup)
+   printf(" oid_lookup");
+   if (graph->chunk_commit_data)
+   printf(" commit_metadata");
+   if (graph->chunk_large_edges)
+   printf(" large_edges");
+   printf("\n");
+
+   printf("pack_dir: %s\n", graph->pack_dir);
+   return 0;
+}
+
 static int graph_write(void)
 {
struct object_id *graph_id = construct_graph(opts.pack_dir);
@@ -36,8 +79,14 @@ int cmd_graph(int argc, const char **argv, const char 
*prefix)
{ OPTION_STRING, 'p', "pack-dir", _dir,
N_("dir"),
N_("The pack directory to store the graph") },
+   OPT_BOOL('r', "read", ,
+   N_("read graph file")),
OPT_BOOL('w', "write", ,
N_("write graph file")),
+   { OPTION_STRING, 'M', "graph-id", _id,
+   N_("oid"),
+   N_("An OID for a specific graph file in the pack-dir."),
+   PARSE_OPT_OPTARG, NULL, (intptr_t) "" },
OPT_END(),
};
 
@@ -52,6 +101,9 @@ int cmd_graph(int argc, const char **argv, const char 
*prefix)
 builtin_graph_options,
 builtin_graph_usage, 0);
 
+   if (opts.write + opts.read > 1)
+   usage_with_options(builtin_graph_usage, builtin_graph_options);
+
if (!opts.pack_dir) {
struct strbuf path = STRBUF_INIT;
strbuf_addstr(, get_object_directory());
@@ -59,6 +111,8 @@ int cmd_graph(int argc, const char **argv, const char 
*prefix)
opts.pack_dir = strbuf_detach(, NULL);
}
 
+   if (opts.read)
+   return graph_read();
if (opts.write)
return graph_write();
 
diff --git a/packed-graph.c b/packed-graph.c
index 9be9811667..eaa656becb 100644
--- a/packed-graph.c
+++ b/packed-graph.c
@@ -30,6 +30,11 @@
 
 #define GRAPH_LAST_EDGE 0x8000
 
+#define GRAPH_FANOUT_SIZE (4*256)
+#define GRAPH_CHUNKLOOKUP_SIZE (5 * 12)
+#define GRAPH_MIN_SIZE (GRAPH_CHUNKLOOKUP_SIZE + GRAPH_FANOUT_SIZE + \
+   GRAPH_OID_LEN + sizeof(struct packed_graph_header))
+
 char* get_graph_filename_oid(const char *pack_dir,
  struct object_id

[PATCH 10/14] packed-graph: teach git-graph --delete-expired

2018-01-25 Thread Derrick Stolee

Teach git-graph to delete the graph previously referenced by 'graph_head'
when writing a new graph file and updating 'graph_head'. This prevents
data creep by storing a list of useless graphs. Be careful to not delete
the graph if the file did not change.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-graph.txt |  8 ++--
 builtin/graph.c | 14 +-
 t/t5319-graph.sh| 37 +++--
 3 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-graph.txt b/Documentation/git-graph.txt
index f690699570..f4f1048d28 100644
--- a/Documentation/git-graph.txt
+++ b/Documentation/git-graph.txt
@@ -39,6 +39,10 @@ OPTIONS
When used with --write, update the graph-head file to point to
the written graph file.
 
+--delete-expired::
+   When used with --write and --update-head, delete the graph file
+   previously referenced by graph-head.
+
 EXAMPLES
 
 
@@ -55,10 +59,10 @@ $ git graph --write
 
 
 * Write a graph file for the packed commits in your local .git folder,
-* and update graph-head.
+* update graph-head, and delete the old graph-.graph file.
 +
 
-$ git graph --write --update-head
+$ git graph --write --update-head --delete-expired
 
 
 * Read basic information from a graph file.
diff --git a/builtin/graph.c b/builtin/graph.c
index ac15febc46..adf779b601 100644
--- a/builtin/graph.c
+++ b/builtin/graph.c
@@ -12,7 +12,7 @@ static char const * const builtin_graph_usage[] ={
N_("git graph [--pack-dir ]"),
N_("git graph --clear [--pack-dir ]"),
N_("git graph --read [--graph-id=]"),
-   N_("git graph --write [--pack-dir ] [--update-head]"),
+   N_("git graph --write [--pack-dir ] [--update-head] 
[--delete-expired]"),
NULL
 };
 
@@ -23,6 +23,7 @@ static struct opts_graph {
const char *graph_id;
int write;
int update_head;
+   int delete_expired;
int has_existing;
struct object_id old_graph_oid;
 } opts;
@@ -120,6 +121,15 @@ static int graph_write(void)
if (graph_id)
printf("%s\n", oid_to_hex(graph_id));
 
+   if (opts.delete_expired && opts.update_head && opts.has_existing &&
+   oidcmp(graph_id, _graph_oid)) {
+   char *old_path = get_graph_filename_oid(opts.pack_dir, 
_graph_oid);
+   if (remove_path(old_path))
+   die("failed to remove path %s", old_path);
+
+   free(old_path);
+   }
+
free(graph_id);
return 0;
 }
@@ -138,6 +148,8 @@ int cmd_graph(int argc, const char **argv, const char 
*prefix)
N_("write graph file")),
OPT_BOOL('u', "update-head", _head,
N_("update graph-head to written graph file")),
+   OPT_BOOL('d', "delete-expired", _expired,
+   N_("delete expired head graph file")),
{ OPTION_STRING, 'M', "graph-id", _id,
N_("oid"),
N_("An OID for a specific graph file in the pack-dir."),
diff --git a/t/t5319-graph.sh b/t/t5319-graph.sh
index 311fb9dd67..a70c7bbb02 100755
--- a/t/t5319-graph.sh
+++ b/t/t5319-graph.sh
@@ -80,9 +80,42 @@ test_expect_success 'write graph with merges' \
  _graph_read_expect "18" "${packdir}" &&
  cmp expect output'
 
+test_expect_success 'Add more commits' \
+'git reset --hard commits/3 &&
+ for i in $(test_seq 16 20)
+ do
+git commit --allow-empty -m "commit $i" &&
+git branch commits/$i
+ done &&
+ git repack'
+
+test_expect_success 'write graph with merges' \
+'graph3=$(git graph --write --update-head --delete-expired) &&
+ test_path_is_file ${packdir}/graph-${graph3}.graph &&
+ test_path_is_missing ${packdir}/graph-${graph2}.graph &&
+ test_path_is_file ${packdir}/graph-${graph1}.graph &&
+ test_path_is_file ${packdir}/graph-head &&
+ echo ${graph3} >expect &&
+ cmp -n 40 expect ${packdir}/graph-head &&
+ git graph --read --graph-id=${graph3} >output &&
+ _graph_read_expect "23" "${packdir}" &&
+ cmp expect output'
+
+test_expect_success 'write graph with nothing new' \
+'graph4=$(git graph --write --update-head --delete-expired) &&
+ test_path_is_file ${packdir}/graph-${graph4}.graph &&
+ test_path_is_file ${packdir}/graph-${graph1}.graph &&
+ test_path_is_file ${packdir}/graph-head &&
+ echo ${graph4} >expect &&
+ cmp -n 40 expect ${packdir}/graph-head &&
+ git graph --read --graph-id=${graph4} >output &&
+ _graph_read_expect "23" "${packdir}" &&
+ cmp expect output'
+
 test_expect_success 'clear graph' \
 'git graph --clear &&
- test_path_is_missing ${packdir}/graph-${graph2}.graph &&
+

[PATCH 09/14] packed-graph: implement git-graph --clear

2018-01-25 Thread Derrick Stolee

Teach Git to delete the current 'graph_head' file and the packed graph
it references. This is a good safety valve if somehow the file is
corrupted and needs to be recalculated. Since the packed graph is a
summary of contents already in the ODB, it can be regenerated.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-graph.txt | 16 ++--
 builtin/graph.c | 31 ++-
 t/t5319-graph.sh|  7 ++-
 3 files changed, 50 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-graph.txt b/Documentation/git-graph.txt
index ac20aa67a9..f690699570 100644
--- a/Documentation/git-graph.txt
+++ b/Documentation/git-graph.txt
@@ -11,6 +11,7 @@ SYNOPSIS
 [verse]
 'git graph' --write  [--pack-dir ]
 'git graph' --read  [--pack-dir ]
+'git graph' --clear [--pack-dir ]
 
 OPTIONS
 ---
@@ -18,16 +19,21 @@ OPTIONS
Use given directory for the location of packfiles, graph-head,
and graph files.
 
+--clear::
+   Delete the graph-head file and the graph file it references.
+   (Cannot be combined with --read or --write.)
+
 --read::
Read a graph file given by the graph-head file and output basic
-   details about the graph file. (Cannot be combined with --write.)
+   details about the graph file. (Cannot be combined with --clear
+   or --write.)
 
 --graph-id::
When used with --read, consider the graph file graph-.graph.
 
 --write::
Write a new graph file to the pack directory. (Cannot be combined
-   with --read.)
+   with --clear or --read.)
 
 --update-head::
When used with --write, update the graph-head file to point to
@@ -61,6 +67,12 @@ $ git graph --write --update-head
 $ git graph --read --graph-id=
 
 
+* Delete /graph-head and the file it references.
++
+
+$ git graph --clear --pack-dir=
+
+
 CONFIGURATION
 -
 
diff --git a/builtin/graph.c b/builtin/graph.c
index 0760d99f43..ac15febc46 100644
--- a/builtin/graph.c
+++ b/builtin/graph.c
@@ -10,6 +10,7 @@
 
 static char const * const builtin_graph_usage[] ={
N_("git graph [--pack-dir ]"),
+   N_("git graph --clear [--pack-dir ]"),
N_("git graph --read [--graph-id=]"),
N_("git graph --write [--pack-dir ] [--update-head]"),
NULL
@@ -17,6 +18,7 @@ static char const * const builtin_graph_usage[] ={
 
 static struct opts_graph {
const char *pack_dir;
+   int clear;
int read;
const char *graph_id;
int write;
@@ -25,6 +27,29 @@ static struct opts_graph {
struct object_id old_graph_oid;
 } opts;
 
+static int graph_clear(void)
+{
+   struct strbuf head_path = STRBUF_INIT;
+   char *old_path;
+
+   if (!opts.has_existing)
+   return 0;
+
+   strbuf_addstr(_path, opts.pack_dir);
+   strbuf_addstr(_path, "/");
+   strbuf_addstr(_path, "graph-head");
+   if (remove_path(head_path.buf))
+   die("failed to remove path %s", head_path.buf);
+   strbuf_release(_path);
+
+   old_path = get_graph_filename_oid(opts.pack_dir, _graph_oid);
+   if (remove_path(old_path))
+   die("failed to remove path %s", old_path);
+   free(old_path);
+
+   return 0;
+}
+
 static int graph_read(void)
 {
struct object_id graph_oid;
@@ -105,6 +130,8 @@ int cmd_graph(int argc, const char **argv, const char 
*prefix)
{ OPTION_STRING, 'p', "pack-dir", _dir,
N_("dir"),
N_("The pack directory to store the graph") },
+   OPT_BOOL('c', "clear", ,
+   N_("clear graph file and graph-head")),
OPT_BOOL('r', "read", ,
N_("read graph file")),
OPT_BOOL('w', "write", ,
@@ -129,7 +156,7 @@ int cmd_graph(int argc, const char **argv, const char 
*prefix)
 builtin_graph_options,
 builtin_graph_usage, 0);
 
-   if (opts.write + opts.read > 1)
+   if (opts.write + opts.read + opts.clear > 1)
usage_with_options(builtin_graph_usage, builtin_graph_options);
 
if (!opts.pack_dir) {
@@ -141,6 +168,8 @@ int cmd_graph(int argc, const char **argv, const char 
*prefix)
 
opts.has_existing = !!get_graph_head_oid(opts.pack_dir, 
_graph_oid);
 
+   if (opts.clear)
+   return graph_clear();
if (opts.read)
return graph_read();
if (opts.write)
diff --git a/t/t5319-graph.sh b/t/t5319-graph.sh
index 3919a3ad73..311fb9dd67 100755
--- a/t/t5319-graph.sh
+++ b/t/t5319-graph.sh
@@ -80,6 +80,11 @@ test_expect_success 'write graph with merges' \
  _graph_read_expect "18" "${packdir}" &&
  cmp expect output'
 
+test_expect_success 'clear graph' \
+'git graph

[PATCH 08/14] graph: implement git-graph --update-head

2018-01-25 Thread Derrick Stolee

It is possible to have multiple packed graph files in a pack directory,
but only one is important at a time. Use a 'graph_head' file to point
to the important file. Teach git-graph to write 'graph_head' upon
writing a new packed graph file.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-graph.txt | 38 --
 builtin/graph.c | 38 +++---
 packed-graph.c  | 25 +
 packed-graph.h  |  1 +
 t/t5319-graph.sh| 12 ++--
 5 files changed, 107 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-graph.txt b/Documentation/git-graph.txt
index 0939c3f1be..ac20aa67a9 100644
--- a/Documentation/git-graph.txt
+++ b/Documentation/git-graph.txt
@@ -12,19 +12,53 @@ SYNOPSIS
 'git graph' --write  [--pack-dir ]
 'git graph' --read  [--pack-dir ]
 
+OPTIONS
+---
+--pack-dir::
+   Use given directory for the location of packfiles, graph-head,
+   and graph files.
+
+--read::
+   Read a graph file given by the graph-head file and output basic
+   details about the graph file. (Cannot be combined with --write.)
+
+--graph-id::
+   When used with --read, consider the graph file graph-.graph.
+
+--write::
+   Write a new graph file to the pack directory. (Cannot be combined
+   with --read.)
+
+--update-head::
+   When used with --write, update the graph-head file to point to
+   the written graph file.
+
 EXAMPLES
 
 
+* Output the OID of the graph file pointed to by /graph-head.
++
+
+$ git graph --pack-dir=
+
+
 * Write a graph file for the packed commits in your local .git folder.
 +
 
-$ git midx --write
+$ git graph --write
+
+
+* Write a graph file for the packed commits in your local .git folder,
+* and update graph-head.
++
+
+$ git graph --write --update-head
 
 
 * Read basic information from a graph file.
 +
 
-$ git midx --read --graph-id=
+$ git graph --read --graph-id=
 
 
 CONFIGURATION
diff --git a/builtin/graph.c b/builtin/graph.c
index bc66722924..0760d99f43 100644
--- a/builtin/graph.c
+++ b/builtin/graph.c
@@ -11,7 +11,7 @@
 static char const * const builtin_graph_usage[] ={
N_("git graph [--pack-dir ]"),
N_("git graph --read [--graph-id=]"),
-   N_("git graph --write [--pack-dir ]"),
+   N_("git graph --write [--pack-dir ] [--update-head]"),
NULL
 };
 
@@ -20,6 +20,9 @@ static struct opts_graph {
int read;
const char *graph_id;
int write;
+   int update_head;
+   int has_existing;
+   struct object_id old_graph_oid;
 } opts;
 
 static int graph_read(void)
@@ -30,8 +33,8 @@ static int graph_read(void)
 
if (opts.graph_id && strlen(opts.graph_id) == GIT_MAX_HEXSZ)
get_oid_hex(opts.graph_id, _oid);
-   else
-   die("no graph id specified");
+   else if (!get_graph_head_oid(opts.pack_dir, _oid))
+   die("no graph-head exists.");
 
graph_file = get_graph_filename_oid(opts.pack_dir, _oid);
graph = load_packed_graph_one(graph_file, opts.pack_dir);
@@ -62,10 +65,33 @@ static int graph_read(void)
return 0;
 }
 
+static void update_head_file(const char *pack_dir, const struct object_id 
*graph_id)
+{
+   struct strbuf head_path = STRBUF_INIT;
+   int fd;
+   struct lock_file lk = LOCK_INIT;
+
+   strbuf_addstr(_path, pack_dir);
+   strbuf_addstr(_path, "/");
+   strbuf_addstr(_path, "graph-head");
+
+   fd = hold_lock_file_for_update(, head_path.buf, LOCK_DIE_ON_ERROR);
+   strbuf_release(_path);
+
+   if (fd < 0)
+   die_errno("unable to open graph-head");
+
+   write_in_full(fd, oid_to_hex(graph_id), GIT_MAX_HEXSZ);
+   commit_lock_file();
+}
+
 static int graph_write(void)
 {
struct object_id *graph_id = construct_graph(opts.pack_dir);
 
+   if (opts.update_head)
+   update_head_file(opts.pack_dir, graph_id);
+
if (graph_id)
printf("%s\n", oid_to_hex(graph_id));
 
@@ -83,6 +109,8 @@ int cmd_graph(int argc, const char **argv, const char 
*prefix)
N_("read graph file")),
OPT_BOOL('w', "write", ,
N_("write graph file")),
+   OPT_BOOL('u', "update-head", _head,
+   N_("update graph-head to written graph file")),
{ OPTION_STRING, 'M', "graph-id", _id,
N_("oid"),
N_("An OID for a specific graph file in the

[PATCH 06/14] packed-graph: implement git-graph --write

2018-01-25 Thread Derrick Stolee

Teach git-graph to write graph files. Create new test script to verify
this command succeeds without failure.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-graph.txt | 26 ++
 builtin/graph.c | 37 ++--
 t/t5319-graph.sh| 83 +
 3 files changed, 143 insertions(+), 3 deletions(-)
 create mode 100755 t/t5319-graph.sh

diff --git a/Documentation/git-graph.txt b/Documentation/git-graph.txt
index de5a3c07e6..be6bc38814 100644
--- a/Documentation/git-graph.txt
+++ b/Documentation/git-graph.txt
@@ -5,3 +5,29 @@ NAME
 
 git-graph - Write and verify Git commit graphs (.graph files)
 
+
+SYNOPSIS
+
+[verse]
+'git graph' --write  [--pack-dir ]
+
+EXAMPLES
+
+
+* Write a graph file for the packed commits in your local .git folder.
++
+
+$ git midx --write
+
+
+CONFIGURATION
+-
+
+core.graph::
+   The graph command will fail if core.graph is false.
+   Also, the written graph files will be ignored by other commands
+   unless core.graph is true.
+
+GIT
+---
+Part of the linkgit:git[1] suite
\ No newline at end of file
diff --git a/builtin/graph.c b/builtin/graph.c
index a902dc8646..09f5552338 100644
--- a/builtin/graph.c
+++ b/builtin/graph.c
@@ -6,31 +6,62 @@
 #include "lockfile.h"
 #include "packfile.h"
 #include "parse-options.h"
+#include "packed-graph.h"
 
 static char const * const builtin_graph_usage[] ={
N_("git graph [--pack-dir ]"),
+   N_("git graph --write [--pack-dir ]"),
NULL
 };
 
 static struct opts_graph {
const char *pack_dir;
+   int write;
 } opts;
 
+static int graph_write(void)
+{
+   struct object_id *graph_id = construct_graph(opts.pack_dir);
+
+   if (graph_id)
+   printf("%s\n", oid_to_hex(graph_id));
+
+   free(graph_id);
+   return 0;
+}
+
 int cmd_graph(int argc, const char **argv, const char *prefix)
 {
static struct option builtin_graph_options[] = {
{ OPTION_STRING, 'p', "pack-dir", _dir,
N_("dir"),
N_("The pack directory to store the graph") },
+   OPT_BOOL('w', "write", ,
+   N_("write graph file")),
OPT_END(),
};
 
-   if (!core_graph)
-   die("core.graph is false");
-
if (argc == 2 && !strcmp(argv[1], "-h"))
usage_with_options(builtin_graph_usage, builtin_graph_options);
 
+   git_config(git_default_config, NULL);
+   if (!core_graph)
+   die("git-graph requires core.graph=true.");
+
+   argc = parse_options(argc, argv, prefix,
+builtin_graph_options,
+builtin_graph_usage, 0);
+
+   if (!opts.pack_dir) {
+   struct strbuf path = STRBUF_INIT;
+   strbuf_addstr(, get_object_directory());
+   strbuf_addstr(, "/pack");
+   opts.pack_dir = strbuf_detach(, NULL);
+   }
+
+   if (opts.write)
+   return graph_write();
+
return 0;
 }
 
diff --git a/t/t5319-graph.sh b/t/t5319-graph.sh
new file mode 100755
index 00..52e979dfd3
--- /dev/null
+++ b/t/t5319-graph.sh
@@ -0,0 +1,83 @@
+#!/bin/sh
+
+test_description='packed graph'
+. ./test-lib.sh
+
+test_expect_success \
+'setup full repo' \
+'rm -rf .git &&
+ mkdir full &&
+ cd full &&
+ git init &&
+ git config core.graph true &&
+ git config pack.threads 1 &&
+ packdir=".git/objects/pack"'
+
+test_expect_success \
+'write graph with no packs' \
+'git graph --write --pack-dir .'
+
+test_expect_success \
+'create commits and repack' \
+'for i in $(test_seq 5)
+ do
+echo $i >$i.txt &&
+git add $i.txt &&
+git commit -m "commit $i" &&
+git branch commits/$i
+ done &&
+ git repack'
+
+test_expect_success \
+'write graph' \
+'graph1=$(git graph --write) &&
+ test_path_is_file ${packdir}/graph-${graph1}.graph'
+
+test_expect_success \
+'Add more commits' \
+'git reset --hard commits/3 &&
+ for i in $(test_seq 6 10)
+ do
+echo $i >$i.txt &&
+git add $i.txt &&
+git commit -m "commit $i" &&
+git branch commits/$i
+ done &&
+ git reset --hard commits/7 &&
+ for i in $(test_seq 11 15)
+ do
+echo $i >$i.txt &&
+git add $i.txt &&
+git commit -m "commit $i" &&
+git branch commits/$i
+ done &&
+ git reset --hard commits/7 &&
+ git merge commits/4 &&
+ git branch merge/1 &&
+ git reset --hard commits/8 &&
+ git merge commits/11 &&
+ git branch merge/2 &&
+ git reset --hard commits/9 &&
+ git merge commits/5 commits/13 &&
+ git repack'
+
+test_expect_success \
+'write graph

[PATCH 14/14] packed-graph: teach git-graph to read commits

2018-01-25 Thread Derrick Stolee

Teach git-graph to read commits from stdin when the --stdin-commits
flag is specified. Commits reachable from these commits are added to
the graph. This is a much faster way to construct the graph than
inspecting all packed objects, but is restricted to known tips.

For the Linux repository, 700,000+ commits were added to the graph
file starting from 'master' in 7-9 seconds, depending on the number
of packfiles in the repo (1, 24, or 120).

Signed-off-by: Derrick Stolee 
---
 builtin/graph.c  | 33 +
 packed-graph.c   | 18 +++---
 packed-graph.h   |  3 ++-
 t/t5319-graph.sh | 18 ++
 4 files changed, 60 insertions(+), 12 deletions(-)

diff --git a/builtin/graph.c b/builtin/graph.c
index 3cace3a18c..708889677b 100644
--- a/builtin/graph.c
+++ b/builtin/graph.c
@@ -12,7 +12,7 @@ static char const * const builtin_graph_usage[] ={
N_("git graph [--pack-dir ]"),
N_("git graph --clear [--pack-dir ]"),
N_("git graph --read [--graph-id=]"),
-   N_("git graph --write [--pack-dir ] [--update-head] 
[--delete-expired] [--stdin-packs]"),
+   N_("git graph --write [--pack-dir ] [--update-head] 
[--delete-expired] [--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -25,6 +25,7 @@ static struct opts_graph {
int update_head;
int delete_expired;
int stdin_packs;
+   int stdin_commits;
int has_existing;
struct object_id old_graph_oid;
 } opts;
@@ -116,22 +117,36 @@ static int graph_write(void)
 {
struct object_id *graph_id;
char **pack_indexes = NULL;
+   char **commits = NULL;
int num_packs = 0;
-   int size_packs = 0;
+   int num_commits = 0;
+   char **lines = NULL;
+   int num_lines = 0;
+   int size_lines = 0;
 
-   if (opts.stdin_packs) {
+   if (opts.stdin_packs || opts.stdin_commits) {
struct strbuf buf = STRBUF_INIT;
-   size_packs = 128;
-   ALLOC_ARRAY(pack_indexes, size_packs);
+   size_lines = 128;
+   ALLOC_ARRAY(lines, size_lines);
 
while (strbuf_getline(, stdin) != EOF) {
-   ALLOC_GROW(pack_indexes, num_packs + 1, size_packs);
-   pack_indexes[num_packs++] = buf.buf;
+   ALLOC_GROW(lines, num_lines + 1, size_lines);
+   lines[num_lines++] = buf.buf;
strbuf_detach(, NULL);
}
+
+   if (opts.stdin_packs) {
+   pack_indexes = lines;
+   num_packs = num_lines;
+   }
+   if (opts.stdin_commits) {
+   commits = lines;
+   num_commits = num_lines;
+   }
}
 
-   graph_id = construct_graph(opts.pack_dir, pack_indexes, num_packs);
+   graph_id = construct_graph(opts.pack_dir, pack_indexes, num_packs,
+  commits, num_commits);
 
if (opts.update_head)
update_head_file(opts.pack_dir, graph_id);
@@ -170,6 +185,8 @@ int cmd_graph(int argc, const char **argv, const char 
*prefix)
N_("delete expired head graph file")),
OPT_BOOL('s', "stdin-packs", _packs,
N_("only scan packfiles listed by stdin")),
+   OPT_BOOL('C', "stdin-commits", _commits,
+   N_("start walk at commits listed by stdin")),
{ OPTION_STRING, 'G', "graph-id", _id,
N_("oid"),
N_("An OID for a specific graph file in the pack-dir."),
diff --git a/packed-graph.c b/packed-graph.c
index c93515f18e..94e1a97000 100644
--- a/packed-graph.c
+++ b/packed-graph.c
@@ -662,7 +662,8 @@ static void close_reachable(struct packed_oid_list *oids)
}
 }
 
-struct object_id *construct_graph(const char *pack_dir, char **pack_indexes, 
int nr_packs)
+struct object_id *construct_graph(const char *pack_dir, char **pack_indexes, 
int nr_packs,
+ char **commit_hex, int nr_commits)
 {
// Find a list of oids, adding the pointer to a list.
struct packed_oid_list oids;
@@ -719,10 +720,21 @@ struct object_id *construct_graph(const char *pack_dir, 
char **pack_indexes, int
for_each_object_in_pack(p, 
if_packed_commit_add_to_list, );
close_pack(p);
}
-   } else {
-   for_each_packed_object(if_packed_commit_add_to_list, , 0);
}
 
+   if (commit_hex) {
+   for (i = 0; i < nr_commits; i++) {
+   const char *end;
+   ALLOC_GROW(oids.list, oids.num + 1, oids.size);
+   oids.list[oids.num] = malloc(sizeof(struct object_id));
+   parse_oid_hex(commit_hex[i], oids.list[oids.num], );
+

[PATCH 04/14] packed-graph: add format document

2018-01-25 Thread Derrick Stolee

Add document specifying the binary format for packed graphs. This
format allows for:

* New versions.
* New hash functions and hash lengths.
* Optional extensions.

Basic header information is followed by a binary table of contents
into "chunks" that include:

* An ordered list of commit object IDs.
* A 256-entry fanout into that list of OIDs.
* A list of metadata for the commits.
* A list of "large edges" to enable octopus merges.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/graph-format.txt | 88 
 1 file changed, 88 insertions(+)
 create mode 100644 Documentation/technical/graph-format.txt

diff --git a/Documentation/technical/graph-format.txt 
b/Documentation/technical/graph-format.txt
new file mode 100644
index 00..a15e1036d7
--- /dev/null
+++ b/Documentation/technical/graph-format.txt
@@ -0,0 +1,88 @@
+Git commit graph format
+===
+
+The Git commit graph stores a list of commit OIDs and some associated
+metadata, including:
+
+- The generation number of the commit. Commits with no parents have
+  generation number 1; commits with parents have generation number
+  one more than the maximum generation number of its parents. We
+  reserve zero as special, and can be used to mark a generation
+  number invalid or as "not computed".
+
+- The root tree OID.
+
+- The commit date.
+
+- The parents of the commit, stored using positional references within
+  the graph file.
+
+== graph-*.graph files have the following format:
+
+In order to allow extensions that add extra data to the graph, we organize
+the body into "chunks" and provide a binary lookup table at the beginning
+of the body. The header includes certain values, such as number of chunks,
+hash lengths and types.
+
+All 4-byte numbers are in network order.
+
+HEADER:
+
+   4-byte signature:
+   The signature is: {'C', 'G', 'P', 'H'}
+
+   1-byte version number:
+   Currently, the only valid version is 1.
+
+   1-byte Object Id Version (1 = SHA-1)
+
+   1-byte Object Id Length (H)
+
+   1-byte number (C) of "chunks"
+
+CHUNK LOOKUP:
+
+   (C + 1) * 12 bytes listing the table of contents for the chunks:
+   First 4 bytes describe chunk id. Value 0 is a terminating label.
+   Other 8 bytes provide offset in current file for chunk to start.
+   (Chunks are ordered contiguously in the file, so you can infer
+   the length using the next chunk position if necessary.)
+
+   The remaining data in the body is described one chunk at a time, and
+   these chunks may be given in any order. Chunks are required unless
+   otherwise specified.
+
+CHUNK DATA:
+
+   OID Fanout (ID: {'O', 'I', 'D', 'F'}) (256 * 4 bytes)
+   The ith entry, F[i], stores the number of OIDs with first
+   byte at most i. Thus F[255] stores the total
+   number of commits (N).
+
+   OID Lookup (ID: {'O', 'I', 'D', 'L'}) (N * H bytes)
+   The OIDs for all commits in the graph.
+
+   Commit Data (ID: {'C', 'G', 'E', 'T' }) (N * (H + 16) bytes)
+   * The first H bytes are for the OID of the root tree.
+   * The next 8 bytes are for the int-ids of the first two parents of
+ the ith commit. Stores value 0x if no parent in that 
position.
+ If there are more than two parents, the second value has its most-
+ significant bit on and the other bits store an offset into the 
Large
+ Edge List chunk.
+   * The next 8 bytes store the generation number of the commit and the
+ commit time in seconds since EPOCH. The generation number uses the
+ higher 30 bits of the first 4 bytes, while the commit time uses 
the
+ 32 bits of the second 4 bytes, along with the lowest 2 bits of the
+ lowest byte, storing the 33rd and 34th bit of the commit time.
+
+   [Optional] Large Edge List (ID: {'E', 'D', 'G', 'E'})
+   This list of 4-byte values store the second through nth parents for
+   all octoput merges. The second parent value in the commit data is a
+   negative number pointing into this list. Then iterate through this
+   list starting at that position until reaching a value with the most-
+   significant bit on. The other bits correspond to the int-id of the
+   last parent.
+
+TRAILER:
+
+   H-byte HASH-checksum of all of the above.
-- 
2.16.0

[PATCH 02/14] packed-graph: add core.graph setting

2018-01-25 Thread Derrick Stolee

The packed graph feature is controlled by the new core.graph config
setting. This defaults to 0, so the feature is opt-in.

The intention of core.graph is that a user can always stop checking
for or parsing packed graph files if core.graph=0.

Signed-off-by: Derrick Stolee 
---
 Documentation/config.txt | 3 +++
 cache.h  | 1 +
 config.c | 5 +
 environment.c| 1 +
 4 files changed, 10 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0e25b2c92b..e7b98fa14f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -898,6 +898,9 @@ core.notesRef::
 This setting defaults to "refs/notes/commits", and it can be overridden by
 the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
 
+core.graph::
+   Enable git commit graph feature. Allows writing and reading from .graph 
files.
+
 core.sparseCheckout::
Enable "sparse checkout" feature. See section "Sparse checkout" in
linkgit:git-read-tree[1] for more information.
diff --git a/cache.h b/cache.h
index d8b975a571..655a81ac90 100644
--- a/cache.h
+++ b/cache.h
@@ -825,6 +825,7 @@ extern char *git_replace_ref_base;
 extern int fsync_object_files;
 extern int core_preload_index;
 extern int core_apply_sparse_checkout;
+extern int core_graph;
 extern int precomposed_unicode;
 extern int protect_hfs;
 extern int protect_ntfs;
diff --git a/config.c b/config.c
index e617c2018d..fee90912d8 100644
--- a/config.c
+++ b/config.c
@@ -1223,6 +1223,11 @@ static int git_default_core_config(const char *var, 
const char *value)
return 0;
}
 
+   if (!strcmp(var, "core.graph")) {
+   core_graph = git_config_bool(var, value);
+   return 0;
+   }
+
if (!strcmp(var, "core.sparsecheckout")) {
core_apply_sparse_checkout = git_config_bool(var, value);
return 0;
diff --git a/environment.c b/environment.c
index 63ac38a46f..0c56a3d869 100644
--- a/environment.c
+++ b/environment.c
@@ -61,6 +61,7 @@ enum object_creation_mode object_creation_mode = 
OBJECT_CREATION_MODE;
 char *notes_ref_name;
 int grafts_replace_parents = 1;
 int core_apply_sparse_checkout;
+int core_graph;
 int merge_log_config = -1;
 int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */
 unsigned long pack_size_limit_cfg;
-- 
2.16.0

[PATCH 00/14] Serialized Commit Graph

2018-01-25 Thread Derrick Stolee

As promised [1], this patch contains a way to serialize the commit graph.
The current implementation defines a new file format to store the graph
structure (parent relationships) and basic commit metadata (commit date,
root tree OID) in order to prevent parsing raw commits while performing
basic graph walks. For example, we do not need to parse the full commit
when performing these walks:

* 'git log --topo-order -1000' walks all reachable commits to avoid
  incorrect topological orders, but only needs the commit message for
  the top 1000 commits.

* 'git merge-base  ' may walk many commits to find the correct
  boundary between the commits reachable from A and those reachable
  from B. No commit messages are needed.

* 'git branch -vv' checks ahead/behind status for all local branches
  compared to their upstream remote branches. This is essentially as
  hard as computing merge bases for each.

The current patch speeds up these calculations by injecting a check in
parse_commit_gently() to check if there is a graph file and using that
to provide the required metadata to the struct commit.

The file format has room to store generation numbers, which will be
provided as a patch after this framework is merged. Generation numbers
are referenced by the design document but not implemented in order to
make the current patch focus on the graph construction process. Once
that is stable, it will be easier to add generation numbers and make
graph walks aware of generation numbers one-by-one.

Here are some performance results for a copy of the Linux repository
where 'master' has 704,766 reachable commits and is behind 'origin/master'
by 19,610 commits.

| Command  | Before | After  | Rel % |
|--|||---|
| log --oneline --topo-order -1000 |  5.9s  |  0.7s  | -88%  |
| branch -vv   |  0.42s |  0.27s | -35%  |
| rev-list --all   |  6.4s  |  1.0s  | -84%  |
| rev-list --all --objects | 32.6s  | 27.6s  | -15%  |

To test this yourself, run the following on your repo:

git config core.graph true
git show-ref -s | git graph --write --update-head

The second command writes a commit graph file containing every commit
reachable from your refs. Now, all git commands that walk commits will
check your graph first before consulting the ODB. You can run your own
performance comparisions by toggling the 'core.graph' setting.

[1] 
https://public-inbox.org/git/d154319e-bb9e-b300-7c37-27b1dcd2a...@jeffhostetler.com/
Re: What's cooking in git.git (Jan 2018, #03; Tue, 23)

[2] https://github.com/derrickstolee/git/pull/2
A GitHub pull request containing the latest version of this patch.

P.S. I'm sending this patch from my gmail address to avoid Outlook
munging the URLs included in the design document.

Derrick Stolee (14):
  graph: add packed graph design document
  packed-graph: add core.graph setting
  packed-graph: create git-graph builtin
  packed-graph: add format document
  packed-graph: implement construct_graph()
  packed-graph: implement git-graph --write
  packed-graph: implement git-graph --read
  graph: implement git-graph --update-head
  packed-graph: implement git-graph --clear
  packed-graph: teach git-graph --delete-expired
  commit: integrate packed graph with commit parsing
  packed-graph: read only from specific pack-indexes
  packed-graph: close under reachability
  packed-graph: teach git-graph to read commits

 Documentation/config.txt |   3 +
 Documentation/git-graph.txt  | 102 
 Documentation/technical/graph-format.txt |  88 
 Documentation/technical/packed-graph.txt | 185 +++
 Makefile |   2 +
 alloc.c  |   1 +
 builtin.h|   1 +
 builtin/graph.c  | 231 +
 cache.h  |   1 +
 command-list.txt |   1 +
 commit.c |  20 +-
 commit.h |   2 +
 config.c |   5 +
 environment.c|   1 +
 git.c|   1 +
 log-tree.c   |   3 +-
 packed-graph.c   | 840 +++
 packed-graph.h   |  65 +++
 packfile.c   |   4 +-
 packfile.h   |   2 +
 t/t5319-graph.sh | 271 ++
 21 files changed, 1822 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/git-graph.txt
 create mode 100644 Documentation/technical/graph-format.txt
 create mode 100644 Documentation/technical/packed-graph.txt
 create mode 100644 builtin/graph.c
 create mode 100644 packed-graph.c
 create mode 100644 packed-graph.h
 create mode 100755 t/t5319-graph.sh

--

[PATCH 01/14] graph: add packed graph design document

2018-01-25 Thread Derrick Stolee

Add Documentation/technical/packed-graph.txt with details of the planned
packed graph feature, including future plans.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/packed-graph.txt | 185 +++
 1 file changed, 185 insertions(+)
 create mode 100644 Documentation/technical/packed-graph.txt

diff --git a/Documentation/technical/packed-graph.txt 
b/Documentation/technical/packed-graph.txt
new file mode 100644
index 00..fcc0c83874
--- /dev/null
+++ b/Documentation/technical/packed-graph.txt
@@ -0,0 +1,185 @@
+Git Packed Graph Design Notes
+=
+
+Git walks the commit graph for many reasons, including:
+
+1. Listing and filtering commit history.
+2. Computing merge bases.
+
+These operations can become slow as the commit count grows above 100K.
+The merge base calculation shows up in many user-facing commands, such
+as 'status' and 'fetch' and can take minutes to compute depending on
+data shape. There are two main costs here:
+
+1. Decompressing and parsing commits.
+2. Walking the entire graph to avoid topological order mistakes.
+
+The packed graph is a file that stores the commit graph structure along
+with some extra metadata to speed up graph walks. This format allows a
+consumer to load the following info for a commit:
+
+1. The commit OID.
+2. The list of parents.
+3. The commit date.
+4. The root tree OID.
+5. An integer ID for fast lookups in the graph.
+6. The generation number (see definition below).
+
+Values 1-4 satisfy the requirements of parse_commit_gently().
+
+By providing an integer ID we can avoid lookups in the graph as we walk
+commits. Specifically, we need to provide the integer ID of the parent
+commits so we navigate directly to their information on request.
+
+Define the "generation number" of a commit recursively as follows:
+ * A commit with no parents (a root commit) has generation number 1.
+ * A commit with at least one parent has generation number 1 more than
+   the largest generation number among its parents.
+Equivalently, the generation number is one more than the length of a
+longest path from the commit to a root commit. The recursive definition
+is easier to use for computation and the following property:
+
+If A and B are commits with generation numbers N and M, respectively,
+and N <= M, then A cannot reach B. That is, we know without searching
+that B is not an ancestor of A because it is further from a root commit
+than A.
+
+Conversely, when checking if A is an ancestor of B, then we only need
+to walk commits until all commits on the walk boundary have generation
+number at most N. If we walk commits using a priority queue seeded by
+generation numbers, then we always expand the boundary commit with highest
+generation number and can easily detect the stopping condition.
+
+This property can be used to significantly reduce the time it takes to
+walk commits and determine topological relationships. Without generation
+numbers, the general heuristic is the following:
+
+If A and B are commits with commit time X and Y, respectively, and
+X < Y, then A _probably_ cannot reach B.
+
+This heuristic is currently used whenever the computation can make
+mistakes with topological orders (such as "git log" with default order),
+but is not used when the topological order is required (such as merge
+base calculations, "git log --graph").
+
+Design Details
+--
+
+- A graph file is stored in a file named 'graph-.graph' in the pack
+  directory. This could be stored in an alternate.
+
+- The most-recent graph file OID is stored in a 'graph-head' file for
+  immediate access and storing backup graphs. This could be stored in an
+  alternate, and refers to a 'graph-.graph' file in the same pack
+  directory.
+
+- The core.graph config setting must be on to create or consume graph files.
+
+- The graph file is only a supplemental structure. If a user downgrades
+  or disables the 'core.graph' config setting, then the existing ODB is
+  sufficient.
+
+- The file format includes parameters for the object id length
+  and hash algorithm, so a future change of hash algorithm does
+  not require a change in format.
+
+Current Limitations
+---
+
+- Only one graph file is used at one time. This allows the integer ID to
+  seek into the single graph file. It is possible to extend the model
+  for multiple graph files, but that is currently not part of the design.
+
+- .graph files are managed only by the 'graph' builtin. These are not
+  updated automatically during clone or fetch.
+
+- There is no '--verify' option for the 'graph' builtin to verify the
+  contents of the graph file.
+
+- The graph only considers commits existing in packfiles and does not
+  walk to fill in reachable commits. [Small]
+
+- When rewriting the graph, we do not check for a commit still existing
+  in the ODB, so garbage collection may remove

[PATCH 03/14] packed-graph: create git-graph builtin

2018-01-25 Thread Derrick Stolee

Teach Git the 'graph' builtin that will be used for writing and
reading packed graph files. The current implementation is mostly
empty, except for a check that the core.graph setting is enabled
and a '--pack-dir' option.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-graph.txt |  7 +++
 Makefile|  1 +
 builtin.h   |  1 +
 builtin/graph.c | 36 
 command-list.txt|  1 +
 git.c   |  1 +
 6 files changed, 47 insertions(+)
 create mode 100644 Documentation/git-graph.txt
 create mode 100644 builtin/graph.c

diff --git a/Documentation/git-graph.txt b/Documentation/git-graph.txt
new file mode 100644
index 00..de5a3c07e6
--- /dev/null
+++ b/Documentation/git-graph.txt
@@ -0,0 +1,7 @@
+git-graph(1)
+
+
+NAME
+
+git-graph - Write and verify Git commit graphs (.graph files)
+
diff --git a/Makefile b/Makefile
index 1a9b23b679..d8b0d0457a 100644
--- a/Makefile
+++ b/Makefile
@@ -965,6 +965,7 @@ BUILTIN_OBJS += builtin/for-each-ref.o
 BUILTIN_OBJS += builtin/fsck.o
 BUILTIN_OBJS += builtin/gc.o
 BUILTIN_OBJS += builtin/get-tar-commit-id.o
+BUILTIN_OBJS += builtin/graph.o
 BUILTIN_OBJS += builtin/grep.o
 BUILTIN_OBJS += builtin/hash-object.o
 BUILTIN_OBJS += builtin/help.o
diff --git a/builtin.h b/builtin.h
index 42378f3aa4..ae7e816908 100644
--- a/builtin.h
+++ b/builtin.h
@@ -168,6 +168,7 @@ extern int cmd_format_patch(int argc, const char **argv, 
const char *prefix);
 extern int cmd_fsck(int argc, const char **argv, const char *prefix);
 extern int cmd_gc(int argc, const char **argv, const char *prefix);
 extern int cmd_get_tar_commit_id(int argc, const char **argv, const char 
*prefix);
+extern int cmd_graph(int argc, const char **argv, const char *prefix);
 extern int cmd_grep(int argc, const char **argv, const char *prefix);
 extern int cmd_hash_object(int argc, const char **argv, const char *prefix);
 extern int cmd_help(int argc, const char **argv, const char *prefix);
diff --git a/builtin/graph.c b/builtin/graph.c
new file mode 100644
index 00..a902dc8646
--- /dev/null
+++ b/builtin/graph.c
@@ -0,0 +1,36 @@
+#include "builtin.h"
+#include "cache.h"
+#include "config.h"
+#include "dir.h"
+#include "git-compat-util.h"
+#include "lockfile.h"
+#include "packfile.h"
+#include "parse-options.h"
+
+static char const * const builtin_graph_usage[] ={
+   N_("git graph [--pack-dir ]"),
+   NULL
+};
+
+static struct opts_graph {
+   const char *pack_dir;
+} opts;
+
+int cmd_graph(int argc, const char **argv, const char *prefix)
+{
+   static struct option builtin_graph_options[] = {
+   { OPTION_STRING, 'p', "pack-dir", _dir,
+   N_("dir"),
+   N_("The pack directory to store the graph") },
+   OPT_END(),
+   };
+
+   if (!core_graph)
+   die("core.graph is false");
+
+   if (argc == 2 && !strcmp(argv[1], "-h"))
+   usage_with_options(builtin_graph_usage, builtin_graph_options);
+
+   return 0;
+}
+
diff --git a/command-list.txt b/command-list.txt
index a1fad28fd8..d9c17cb9f8 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -61,6 +61,7 @@ git-format-patchmainporcelain
 git-fsckancillaryinterrogators
 git-gc  mainporcelain
 git-get-tar-commit-id   ancillaryinterrogators
+git-graph   plumbingmanipulators
 git-grepmainporcelain   info
 git-gui mainporcelain
 git-hash-object plumbingmanipulators
diff --git a/git.c b/git.c
index c870b9719c..29f8b6e7dd 100644
--- a/git.c
+++ b/git.c
@@ -408,6 +408,7 @@ static struct cmd_struct commands[] = {
{ "fsck-objects", cmd_fsck, RUN_SETUP },
{ "gc", cmd_gc, RUN_SETUP },
{ "get-tar-commit-id", cmd_get_tar_commit_id },
+   { "graph", cmd_graph, RUN_SETUP_GENTLY },
{ "grep", cmd_grep, RUN_SETUP_GENTLY },
{ "hash-object", cmd_hash_object },
{ "help", cmd_help },
-- 
2.16.0

Bug, git rebase -i does not abort correctly

2018-01-25 Thread Duy Nguyen

rebase scripts are too much for me, so I'll play the user reporting
bugs this time :)

Instead of doing

$ git rebase -i --onto origin/master @~3

I sometimes accidentally type

$ git rebase -i origin/master @~3

("rebase -i" is actually an alias, which is why I never forget to type -i)

Usually the todo list in $EDITOR shows noop, I realize my mistake and
try to abort it by clearing the todo list before saving and closing
$EDITOR. The problem is, HEAD is moved away anyway (to origin/master I
think) even if rebase is supposed to abort the operation and leave
HEAD untouched.
-- 
Duy

[PATCH v2 2/2] format-patch: reduce patch diffstat width to 72

2018-01-25 Thread Nguyễn Thái Ngọc Duy

Patches generated by format-patch are meant to be exchanged as emails,
most of the time. And since it's generally agreed that text in mails
should be wrapped around 70 columns or so, make sure these diffstat
follow the convention (especially when used with --cover-letter since we
already defaults to wrapping 72 columns). The default can still be
overriden with command line options.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/log.c  |  2 ++
 t/t4052-stat-output.sh | 28 ++--
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 96af897403..94ee177d56 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1617,6 +1617,8 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
(!rev.diffopt.output_format ||
 rev.diffopt.output_format == DIFF_FORMAT_PATCH))
rev.diffopt.output_format = DIFF_FORMAT_DIFFSTAT | 
DIFF_FORMAT_SUMMARY;
+   if (!rev.diffopt.stat_width)
+   rev.diffopt.stat_width = MAIL_DEFAULT_WRAP;
 
/* Always generate a patch */
rev.diffopt.output_format |= DIFF_FORMAT_PATCH;
diff --git a/t/t4052-stat-output.sh b/t/t4052-stat-output.sh
index 9f563db20a..1e62333b46 100755
--- a/t/t4052-stat-output.sh
+++ b/t/t4052-stat-output.sh
@@ -60,7 +60,7 @@ do
test_cmp expect actual
'
 done <<\EOF
-format-patch -1 --stdout
+format-patch --stat=80 -1 --stdout
 diff HEAD^ HEAD --stat
 show --stat
 log -1 --stat
@@ -79,11 +79,11 @@ test_expect_success 'preparation for big change tests' '
git commit -m message abcd
 '
 
-cat >expect80 <<'EOF'
- abcd | 1000 ++
+cat >expect72 <<'EOF'
+ abcd | 1000 ++
 EOF
-cat >expect80-graph <<'EOF'
-|  abcd | 1000 
++
+cat >expect72-graph <<'EOF'
+|  abcd | 1000 ++
 EOF
 cat >expect200 <<'EOF'
  abcd | 1000 
++
@@ -107,7 +107,7 @@ do
test_cmp "$expect-graph" actual
'
 done <<\EOF
-ignores expect80 format-patch -1 --stdout
+ignores expect72 format-patch -1 --stdout
 respects expect200 diff HEAD^ HEAD --stat
 respects expect200 show --stat
 respects expect200 log -1 --stat
@@ -135,7 +135,7 @@ do
test_cmp "$expect-graph" actual
'
 done <<\EOF
-ignores expect80 format-patch -1 --stdout
+ignores expect72 format-patch -1 --stdout
 respects expect40 diff HEAD^ HEAD --stat
 respects expect40 show --stat
 respects expect40 log -1 --stat
@@ -163,7 +163,7 @@ do
test_cmp "$expect-graph" actual
'
 done <<\EOF
-ignores expect80 format-patch -1 --stdout
+ignores expect72 format-patch -1 --stdout
 respects expect40 diff HEAD^ HEAD --stat
 respects expect40 show --stat
 respects expect40 log -1 --stat
@@ -250,11 +250,11 @@ show --stat
 log -1 --stat
 EOF
 
-cat >expect80 <<'EOF'
- ...aaa | 1000 
+cat >expect72 <<'EOF'
+ ...aa | 1000 +
 EOF
-cat >expect80-graph <<'EOF'
-|  ...aaa | 1000 

+cat >expect72-graph <<'EOF'
+|  ...aa | 1000 +
 EOF
 cat >expect200 <<'EOF'
  aaa | 1000 
+++
@@ -278,7 +278,7 @@ do
test_cmp "$expect-graph" actual
'
 done <<\EOF
-ignores expect80 format-patch -1 --stdout
+ignores expect72 format-patch -1 --stdout
 respects expect200 diff HEAD^ HEAD --stat
 respects expect200 show --stat
 respects expect200 log -1 --stat
@@ -308,7 +308,7 @@ do
test_cmp "$expect-graph" actual
'
 done <<\EOF
-ignores expect80 format-patch -1 --stdout
+ignores expect72 format-patch -1 --stdout
 respects expect1 diff HEAD^ HEAD --stat
 respects expect1 show --stat
 respects expect1 log -1 --stat
-- 
2.16.1.200.g71ee9f6994

[PATCH v2 0/2] wrap format-patch diffstats around 72 columns

2018-01-25 Thread Nguyễn Thái Ngọc Duy

Like v1, these changes keep diffstat generated by format-patch in 72
columns. This constant is already used in the code, so it's a bit
better than my random "70 or 75" value.

Granted these hard coded values (both 80 and 72) are not really nice.
But I would wait for somebody to say "I need or want this" before I
add code to make the default configurable.

Nguyễn Thái Ngọc Duy (2):
  format-patch: keep cover-letter diffstat wrapped in 72 columns
  format-patch: reduce patch diffstat width to 72

 builtin/log.c  |  7 ++-
 t/t4052-stat-output.sh | 28 ++--
 2 files changed, 20 insertions(+), 15 deletions(-)

-- 
2.16.1.200.g71ee9f6994

[PATCH v2 1/2] format-patch: keep cover-letter diffstat wrapped in 72 columns

2018-01-25 Thread Nguyễn Thái Ngọc Duy

We already wrap shortlog around 72 columns in cover letters. Do the same
for diffstat (also in cover letters).

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/log.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/log.c b/builtin/log.c
index 46b4ca13e5..96af897403 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -29,6 +29,8 @@
 #include "gpg-interface.h"
 #include "progress.h"
 
+#define MAIL_DEFAULT_WRAP 72
+
 /* Set a default date-time format for git log ("log.date" config variable) */
 static const char *default_date_mode = NULL;
 
@@ -1044,7 +1046,7 @@ static void make_cover_letter(struct rev_info *rev, int 
use_stdout,
 
shortlog_init();
log.wrap_lines = 1;
-   log.wrap = 72;
+   log.wrap = MAIL_DEFAULT_WRAP;
log.in1 = 2;
log.in2 = 4;
log.file = rev->diffopt.file;
@@ -1061,6 +1063,7 @@ static void make_cover_letter(struct rev_info *rev, int 
use_stdout,
 
memcpy(, >diffopt, sizeof(opts));
opts.output_format = DIFF_FORMAT_SUMMARY | DIFF_FORMAT_DIFFSTAT;
+   opts.stat_width = MAIL_DEFAULT_WRAP;
 
diff_setup_done();
 
-- 
2.16.1.200.g71ee9f6994

Re: [PATCH 2/6] t/lib-git-daemon: record daemon log

2018-01-25 Thread Lucas Werkmeister

On 25.01.2018 01:55, Jeff King wrote:
> When we start git-daemon for our tests, we send its stderr
> log stream to a named pipe. We synchronously read the first
> line to make sure that the daemon started, and then dump the
> rest to descriptor 4. This is handy for debugging test
> output with "--verbose", but the tests themselves can't
> access the log data.
> 
> Let's dump the log into a file, as well, so that future
> tests can check the log. There are two subtleties worth
> calling out here:
> 
>   - we replace "cat" with a subshell loop around "read" to
> ensure that there's no buffering (so that tests can be
> sure that after a request has been served, the matching
> log entries will have made it to the file)

POSIX specifies the -u option for that behavior, can’t you use that?
(GNU coreutils’ cat ignores it, since writing without delay is
apparently its default behavior already.)

> 
>   - we open the logfile for append, rather than just output.
> That makes it OK for tests to truncate the logfile
> without restarting the daemon (the OS will atomically
> seek to the end of the file when outputting each line).
> That allows tests to look at the log without worrying
> about pollution from earlier tests.
> 
> Signed-off-by: Jeff King 
> ---
>  t/lib-git-daemon.sh | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/t/lib-git-daemon.sh b/t/lib-git-daemon.sh
> index 987d40680b..19f3ffdbb1 100644
> --- a/t/lib-git-daemon.sh
> +++ b/t/lib-git-daemon.sh
> @@ -53,11 +53,19 @@ start_git_daemon() {
>   "$@" "$GIT_DAEMON_DOCUMENT_ROOT_PATH" \
>   >&3 2>git_daemon_output &
>   GIT_DAEMON_PID=$!
> + >daemon.log
>   {
>   read line <&7
> + echo "$line"
>   echo >&4 "$line"
> - cat <&7 >&4 &
> - } 7 + (
> + while read line <&7
> + do
> + echo "$line"
> + echo >&4 "$line"
> + done
> + ) &
> + } 7>"$TRASH_DIRECTORY/daemon.log" &&
>  
>   # Check expected output
>   if test x"$(expr "$line" : "\[[0-9]*\] \(.*\)")" != x"Ready to rumble"
> 

read without -r clobbers backslashes, and echo may interpret escape
sequences. To faithfully reproduce the output, it would be better to use
read -r and printf '%s\n' "$line", I think. (However, it looks like the
existing code already uses read+echo, so I guess you could also keep
that pattern in this change and then fix it in a later one.)

Re: [BUG] git pull with pull.rebase and rebase.autoStash is not working anymore in 2.16

2018-01-25 Thread Dimitriy

Yeah it seems like this bug. Thank you for sharing this with me.
-- 
Sincerely,
Dimitriy Ryazantcev

> Could this be the same one as reported as Git for Windows issue
> #1437[1] ("`git status` reports (non-existent) modifications after
> `git stash push`", 2018-01-20), fixed in Git for Windows v2.16.1...?
>
> Care to try it out? :)
>
> Regards, Buga
>
> [1] https://github.com/git-for-windows/git/issues/1437

Re: [PATCH] setup: recognise extensions.objectFormat

2018-01-25 Thread Duy Nguyen

On Wed, Jan 24, 2018 at 8:09 PM, Patryk Obara  wrote:
> This extension selects which hashing algorithm from vtable should be
> used for reading and writing objects in the object store.  At the moment
> supports only single value (sha-1).
>
> In case value of objectFormat is an unknown hashing algorithm, Git
> command will fail with following message:
>
>   fatal: unknown repository extensions found:
>   objectformat = 
>
> To indicate, that this specific objectFormat value is not recognised.
>
> The objectFormat extension is not allowed in repository marked as
> version 0 to prevent any possibility of accidentally writing a NewHash
> object in the sha-1 object store. This extension behaviour is different
> than preciousObjects extension (which is allowed in repo version 0).

This config is so sensitive I wonder if we should forbid changing it
via git-config. You can't simply change this and expect anything to
work anyway.

"git init" can have an option to specify object format. "git clone"
naturally inherits the format from the remote repository. Maybe a
future command allows to convert hash algorithm on an existing repo
(*). But other than that nobody is allowed to change this.

(*) it's probably git-clone that does this job, cloning and converting
at the same time.

> +`objectFormat`
> +~~
> +
> +This extension instructs Git to use a specific algorithm for addressing
> +and interpreting objects in the object store. Currently, the only
> +supported object format is `sha-1`.  See `hash-function-transition.txt`
> +document for more detailed explanation.

Maybe the word "experimental" should be mentioned somewhere.

> +static int find_object_format(const char *value)
> +{
> +   int i;
> +   for (i = GIT_HASH_SHA1; i < GIT_HASH_NALGOS; ++i) {
> +   if (strcmp(value, hash_algos[i].name) == 0)
> +   return i;
> +   }
> +   return GIT_HASH_UNKNOWN;
> +}
> +
> +static void detect_object_format(struct repository_format *data,
> +const char *value)
> +{
> +   if (data->version == 0)
> +   die("invalid repository format version");

die(_("invalid repository format version '%d'"), data->version);

> +
> +   data->hash_algo = find_object_format(value);
> +   if (data->hash_algo == GIT_HASH_UNKNOWN) {
> +   char object_format[25];
> +   xsnprintf(object_format, sizeof(object_format),
> + "objectformat = %s", value);

We have strbuf so that we don't have to deal with fixed size buffers like this.
-- 
Duy

Re: [PATCH v3 00/14] Some fixes and bunch of object_id conversions

2018-01-25 Thread Duy Nguyen

On Thu, Jan 25, 2018 at 4:41 AM, Junio C Hamano  wrote:
> Patryk Obara  writes:
>
>> Patryk Obara (14):
>>   http-push: improve error log
>>   clang-format: adjust penalty for return type line break
>>   sha1_file: convert pretend_sha1_file to object_id
>>   dir: convert struct sha1_stat to use object_id
>>   sha1_file: convert hash_sha1_file to object_id
>>   cache: clear whole hash buffer with oidclr
>>   match-trees: convert splice_tree to object_id
>>   commit: convert commit_tree* to object_id
>>   notes: convert combine_notes_* to object_id
>>   notes: convert write_notes_tree to object_id
>>   sha1_file: convert write_sha1_file to object_id
>>   sha1_file: convert force_object_loose to object_id
>>   sha1_file: convert write_loose_object to object_id
>>   sha1_file: rename hash_sha1_file_literally
>
> These were mostly pleasant read.  I'll queue these on two topic
> branches and wait to see what others say.

Looks good to me too.
-- 
Duy

LOAN OFFER AT 3% INTEREST RATE

2018-01-25 Thread Davivas

Hello

Are you desperately in need of a loan?
Have you been denied a loan from your bank or any institution?
Do you need financial assistance?
Do you need a loan to buy off your bills or buy a home?
Do you want to have a business of your own and you need a Loan for your
financial demands?


Regard
Solomon Brandy
SKYPE:  Davivas
E-mail: davi...@shqiptar.eu

[PATCH] merge: support --strategy '?' for git-completion.bash

2018-01-25 Thread Nguyễn Thái Ngọc Duy

Bash completion support gets the list of available strategies with a
grep and sed trick which does not work on non-C locale since the anchor
string is translated and it does not cover custom strategies either.

Let's do it a better way and make git-merge provide all available
strategies in machine-readable form.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Another, perhaps better, option is add "git merge --list-strategies".
 It requires some code movement, so I'll try with a simpler approach
 first.

 Documentation/merge-options.txt| 3 +++
 builtin/merge.c| 7 +++
 contrib/completion/git-completion.bash | 9 +
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/Documentation/merge-options.txt b/Documentation/merge-options.txt
index 3888c3ff85..cd4342844f 100644
--- a/Documentation/merge-options.txt
+++ b/Documentation/merge-options.txt
@@ -97,6 +97,9 @@ option can be used to override --squash.
If there is no `-s` option, a built-in list of strategies
is used instead ('git merge-recursive' when merging a single
head, 'git merge-octopus' otherwise).
++
+The special strategy '?' lists all available strategies and exits
+immediately. No merge operation is done.
 
 -X ::
 --strategy-option=::
diff --git a/builtin/merge.c b/builtin/merge.c
index 30264cfd7c..a09d04302c 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -140,6 +140,13 @@ static struct strategy *get_strategy(const char *name)
}
exclude_cmds(_cmds, _strategies);
}
+   if (!strcmp(name, "?")) {
+   for (i = 0; i < main_cmds.cnt; i++)
+   puts(main_cmds.names[i]->name);
+   for (i = 0; i < other_cmds.cnt; i++)
+   puts(other_cmds.names[i]->name);
+   exit(0);
+   }
if (!is_in_cmdlist(_cmds, name) && !is_in_cmdlist(_cmds, 
name)) {
fprintf(stderr, _("Could not find merge strategy '%s'.\n"), 
name);
fprintf(stderr, _("Available strategies are:"));
diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 3683c772c5..6d947be858 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -594,14 +594,7 @@ __git_is_configured_remote ()
 
 __git_list_merge_strategies ()
 {
-   git merge -s help 2>&1 |
-   sed -n -e '/[Aa]vailable strategies are: /,/^$/{
-   s/\.$//
-   s/.*://
-   s/^[]*//
-   s/[ ]*$//
-   p
-   }'
+   git merge --strategy '?'
 }
 
 __git_merge_strategies=
-- 
2.16.0.47.g3d9b0fac3a

1 2 >

100 matches

Mail list logo