Re: [PATCH] fetch: replace string-list used as a look-up table with a hashmap

2018-09-26 Thread Jeff King
On Wed, Sep 26, 2018 at 03:59:08PM -0700, Stefan Beller wrote:

> > +struct refname_hash_entry {
> > +   struct hashmap_entry ent; /* must be the first member */
> 
> $ git grep "struct hashmap_entry" reveals that we have another
> convention that we follow but not document, which is to stress
> the importance of putting the hashmap_entry first. ;-)

One thing I've liked about the list.h implementation is that you can
store the list pointer anywhere in the struct, or even have the same
struct in multiple lists.

The only funny thing is that you have to "dereference" the iterator like
this:

  struct list_head *pos;
  struct actual_thing *item;
  ...
  item = list_entry(pos, struct actual_thing, list_member);

which is a minor pain, but it's reasonably hard to get it wrong.

I wonder if we could do the same here. The hashmap would only ever see
the "struct hashmap_entry", and then the caller would convert that back
to the actual type. I think we could even get away with not converting
existing callers; if the hashmap _is_ at the front, then that
list_entry() really just devolves to a cast. So as long as the struct
definition and the users of the struct agree, it would just work.

> > +static int refname_hash_entry_cmp(const void *hashmap_cmp_fn_data,
> > + const struct refname_hash_entry *e1,
> > + const struct refname_hash_entry *e2,
> > + const void *keydata)
> > +{
> > +   return strcmp(e1->refname, keydata ? keydata : e2->refname);
> > +}
> 
> and later
> 
> hashmap_init(...  (hashmap_cmp_fn)refname_hash_entry_cmp, ...);
> 
> I wonder if we'd want to stick to this style, as that seems to be the easiest
> to do, and drop the efforts started in 55c965f3a2 (Merge branch
> 'sb/hashmap-cleanup', 2017-08-11), that avoids the cast in the init,
> but puts the (implicit) casts in the _cmp function as we'd take
> void pointers as the function signature and recast to a local variable.

I think that casting the function pointer technically breaks the C
standard, though probably not in a way that is a problem on any real
systems.

The other thing about the "cast inside the cmp function" approach from
sb/hashmap-cleanup is that it is less likely to go stale. If we change
the definition of hashmap_cmp_fn, then any functions which are manually
cast will suppress the compiler errors. When the function takes void
pointers, the same can happen if "struct refname_hash_entry" is swapped
out for another struct, but IMHO that's a less likely error to make.

I admit that's just a gut feeling, though. It would be nice if we could
avoid casting altogether, but I think that puts us into macro territory
(which I'm not altogether opposed to, but it has its own drawbacks).

-Peff


[PATCH v2] git.txt: mention mailing list archive

2018-09-26 Thread Martin Ågren
In the "Reporting Bugs" section of git(1), we refer to the mailing list,
but we do not give any hint about where the archives might be found. Of
course, any web search engine can be used to try to hunt down whether an
issue is already known. But we can do better by mentioning the archive
at public-inbox. Make sure to phrase this in a way that avoids raising
the bar for reporting.

public-inbox.org/git/ is usually our preferred archive, since it uses
message ids in its permalinks. But it also has a search function right
at the top of each page, and searching gives the most recent hits first.
Searching for some keyword about a bug or regression should pretty
easily reveal whether it has been recently reported.

Helped-by: Junio C Hamano 
Signed-off-by: Martin Ågren 
---
 Thanks Junio and Taylor for thoughts on this. I agree we do not want
 to scare anyone away. I hope this does the trick.

 Documentation/git.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 74a9d7edb4..68393f3235 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -859,6 +859,9 @@ Reporting Bugs
 Report bugs to the Git mailing list  where the
 development and maintenance is primarily done.  You do not have to be
 subscribed to the list to send a message there.
+If you want to check to see if the issue has
+been reported already, the list archive can be found at
+ and other places.
 
 Issues which are security relevant should be disclosed privately to
 the Git Security mailing list .
-- 
2.19.0.216.g2d3b1c576c



Re: [PATCH] fetch: replace string-list used as a look-up table with a hashmap

2018-09-26 Thread Jeff King
On Wed, Sep 26, 2018 at 02:28:28PM -0700, Junio C Hamano wrote:

> In find_non_local_tags() helper function (used to implement the
> "follow tags"), we use string_list_has_string() on two string lists
> as a way to see if a refname has already been processed, etc.
> 
> All this code predates more modern in-core lookup API like hashmap;
> replace them with two hashmaps and one string list---the hashmaps
> are used for look-ups and the string list is to keep the order of
> items in the returned result stable (which is the only single thing
> hashmap does worse than lookups on string-list).
> 
> Signed-off-by: Junio C Hamano 
> ---
> 
>  * Just to remind ourselves that we talked about getting rid of
>string-list used as look-up tables by replacing them with hashmap
>but haven't seen much effort expended on it.  I think this is my
>first semi-serious use of hashmap, and the code is expected to be
>full of newbie mistakes, inefficiencies and ignorance of idioms;
>pointing out any of which is much appreciated.

As you probably noticed, there's a bit of boilerplate required to use a
hashmap. I had figured we could replace most of these with a single
"strmap" API to map a string to a void pointer (which is basically
what string-list gives you).

That would save on the boilerplate. But your solution here replaces a
"void *" pointer with an actual "struct object_id" member, which
improves type-safety. It also removes questions about memory lifetimes
(at the minor cost of copying the oids). So this path is probably better
if we don't mind a little extra code.

I do note that your struct just has the same information as "struct
ref":

> +struct refname_hash_entry {
> + struct hashmap_entry ent; /* must be the first member */
> + struct object_id oid;
> + char refname[FLEX_ARRAY];
> +};

So yet another alternative here is to just define a single hashmap that
stores void pointers. That also throws away some type safety, but is
maybe conceptually simpler. I dunno. It's actually a pain to do that
with "struct hashmap" because it requires the caller to handle
allocation. An open-addressed hash table, as we use elsewhere (and in
khash.h) is a bit simpler, since it doesn't need to do any per-entry
malloc.

To be clear, I'm perfectly happy with the approach in your patch here.
I'm just musing on what might might be the least painful thing for doing
more of them.

-Peff


Re: [PATCH] doc: clarify gitcredentials path component matching

2018-09-26 Thread Jeff King
On Wed, Sep 26, 2018 at 10:23:11PM +, Zych, David M wrote:

> The gitcredentials documentation implied that the config file's
> "pattern" URL might include a path component, but did not explain that
> it must match exactly (potentially leaving readers with the false hope
> that it would support a more flexible prefix match).
> [...]

Thanks, the proposed text looks perfect to me.

-Peff


Re: Triggering "BUG: wt-status.c:476: multiple renames on the same target? how?"

2018-09-26 Thread Elijah Newren
On Wed, Sep 26, 2018, 2:27 PM Ævar Arnfjörð Bjarmason  wrote:
>
> On Wed, Sep 26 2018, Andrea Stacchiotti wrote:
>
> > I'm very sorry, I indeed forgot the `diff.renames=copies`.
> >
> > The following script can reproduce the bug even with a blank config:

Thanks for the bug report and the simple testcase.

> > -
> >
> > # Make a test repo
> > git init testrepo
> > cd testrepo
> > git config user.name A
> > git config user.email B
> > git config diff.renames copies
> >
> > # Add a file called orig
> > echo 'a' > orig
> > git add orig
> > git commit -m'orig'
> >
> > # Copy orig in new and modify orig
> > cp orig new
> > echo 'b' > orig
> >
> > # add -N and then commit trigger the bug
> > git add -N new
> > git commit
> >
> > # Cleanup
> > cd ..
> > rm -rf testrepo
>
> Thanks. Bisecting shows that the bug is in dc6b1d92ca ("wt-status: use
> settings from git_diff_ui_config", 2018-05-04) first released with
> 2.18.0.

The bisect is slightly misleading; the bug was introduced in 2.17.0
for renames, and when copy detection became a thing in 2.18.0 it also
incidentally would trigger with copies.  I'll post a patch soon.


URGENT RESPONSE PLEASE.

2018-09-26 Thread Sean Kim.
Hello my dear.

Did you receive my email message to you? Please, get back to me ASAP as the 
matter is becoming late.  Expecting your urgent response.

Sean.



[PATCH v2 4/4] archive: allow archive over HTTP(S) with proto v2

2018-09-26 Thread Josh Steadmon
Signed-off-by: Josh Steadmon 
---
 builtin/archive.c  | 12 +++-
 http-backend.c | 13 -
 transport-helper.c |  7 ---
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index f91d222677..78a259518d 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -87,7 +87,17 @@ static int run_remote_archiver(int argc, const char **argv,
status = packet_reader_read();
if (status == PACKET_READ_NORMAL && reader.pktlen > 0)
die(_("git archive: expected a flush"));
-   }
+   } else if (version == protocol_v2 &&
+  (starts_with(transport->url, "http://;) ||
+   starts_with(transport->url, "https://;)))
+   /*
+* Commands over HTTP require two requests, so there's an
+* additional server response to parse. We do only basic sanity
+* checking here that the versions presented match across
+* requests.
+*/
+   if (version != discover_version())
+   die(_("git archive: received different protocol 
versions in subsequent requests"));
 
/* Now, start reading from fd[0] and spit it out to stdout */
rv = recv_sideband("archive", fd[0], 1);
diff --git a/http-backend.c b/http-backend.c
index 9e894f197f..8e262d50e0 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -32,6 +32,7 @@ struct rpc_service {
 static struct rpc_service rpc_service[] = {
{ "upload-pack", "uploadpack", 1, 1 },
{ "receive-pack", "receivepack", 0, -1 },
+   { "upload-archive", "uploadarchive", 1, 1 },
 };
 
 static struct string_list *get_parameters(void)
@@ -637,6 +638,15 @@ static void service_rpc(struct strbuf *hdr, char 
*service_name)
struct rpc_service *svc = select_service(hdr, service_name);
struct strbuf buf = STRBUF_INIT;
 
+   if (!strcmp(service_name, "git-upload-archive")) {
+   /*
+* git-upload-archive doesn't need --stateless-rpc, because it
+* always handles only a single request.
+*/
+   argv[1] = ".";
+   argv[2] = NULL;
+   }
+
strbuf_reset();
strbuf_addf(, "application/x-git-%s-request", svc->name);
check_content_type(hdr, buf.buf);
@@ -713,7 +723,8 @@ static struct service_cmd {
{"GET", "/objects/pack/pack-[0-9a-f]{40}\\.idx$", get_idx_file},
 
{"POST", "/git-upload-pack$", service_rpc},
-   {"POST", "/git-receive-pack$", service_rpc}
+   {"POST", "/git-receive-pack$", service_rpc},
+   {"POST", "/git-upload-archive$", service_rpc},
 };
 
 static int bad_request(struct strbuf *hdr, const struct service_cmd *c)
diff --git a/transport-helper.c b/transport-helper.c
index 143ca008c8..c41c3dfcff 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -604,8 +604,9 @@ static int process_connect_service(struct transport 
*transport,
strbuf_addf(, "connect %s\n", name);
ret = run_connect(transport, );
} else if (data->stateless_connect &&
-  (get_protocol_version_config() == protocol_v2) &&
-  !strcmp("git-upload-pack", name)) {
+  get_protocol_version_config() == protocol_v2 &&
+  (!strcmp("git-upload-pack", name) ||
+   !strcmp("git-upload-archive", name))) {
strbuf_addf(, "stateless-connect %s\n", name);
ret = run_connect(transport, );
if (ret)
@@ -639,7 +640,7 @@ static int connect_helper(struct transport *transport, 
const char *name,
 
/* Get_helper so connect is inited. */
get_helper(transport);
-   if (!data->connect)
+   if (!data->connect && !data->stateless_connect)
die(_("operation not supported by protocol"));
 
if (!process_connect_service(transport, name, exec))
-- 
2.19.0.605.g01d371f741-goog



[PATCH v2 3/4] archive: implement protocol v2 archive command

2018-09-26 Thread Josh Steadmon
This adds a new archive command for protocol v2. The command expects
arguments in the form "argument X" which are passed unmodified to
git-upload-archive--writer.

This command works over the file://, Git, and SSH transports. HTTP
support will be added in a separate patch.

NEEDSWORK: this is not backwards-compatible with older clients that set
GIT_PROTOCOL=version=2 or configure protocol.version=2.

Signed-off-by: Josh Steadmon 
---
 Documentation/technical/protocol-v2.txt | 21 +++-
 builtin/archive.c   | 45 +
 builtin/upload-archive.c| 27 +--
 serve.c |  7 
 t/t5000-tar-tree.sh |  7 
 t/t5701-git-serve.sh|  1 +
 6 files changed, 90 insertions(+), 18 deletions(-)

diff --git a/Documentation/technical/protocol-v2.txt 
b/Documentation/technical/protocol-v2.txt
index 09e4e0273f..6126fc5738 100644
--- a/Documentation/technical/protocol-v2.txt
+++ b/Documentation/technical/protocol-v2.txt
@@ -77,7 +77,8 @@ A v2 server would reply:
S: 
 
 Subsequent requests are then made directly to the service
-`$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack).
+`$GIT_URL/git-upload-pack`. (This works the same for git-receive-pack or
+git-upload-archive).
 
  Capability Advertisement
 --
@@ -168,6 +169,24 @@ printable ASCII characters except space (i.e., the byte 
range 32 < x <
 and debugging purposes, and MUST NOT be used to programmatically assume
 the presence or absence of particular features.
 
+ archive
+~
+
+`archive` is the command to request an archive (such as a tarball) from a 
server
+over v2.
+
+archive takes the following optional arguments:
+
+argument 
+   This specifies that  should be passed as an argument to the
+   underlying archive writer.  This option can be repeated to pass
+   additional arguments to the archive writer.
+
+See linkgit:git-archive[1] for allowed values of ``.
+
+The output of archive is a packet-line stream of the resulting archive, with
+error messages transferred over the sideband channel.
+
  ls-refs
 ~
 
diff --git a/builtin/archive.c b/builtin/archive.c
index 4eb547c5b7..f91d222677 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -5,9 +5,11 @@
 #include "cache.h"
 #include "builtin.h"
 #include "archive.h"
+#include "connect.h"
 #include "transport.h"
 #include "parse-options.h"
 #include "pkt-line.h"
+#include "protocol.h"
 #include "sideband.h"
 
 static void create_output_file(const char *output_file)
@@ -23,6 +25,13 @@ static void create_output_file(const char *output_file)
}
 }
 
+static void do_v2_command_and_cap(int out)
+{
+   packet_write_fmt(out, "command=archive\n");
+   /* Capability list would go here, if we wanted to request any. */
+   packet_delim(out);
+}
+
 static int run_remote_archiver(int argc, const char **argv,
   const char *remote, const char *exec,
   const char *name_hint)
@@ -32,6 +41,7 @@ static int run_remote_archiver(int argc, const char **argv,
struct remote *_remote;
struct packet_reader reader;
enum packet_read_status status;
+   enum protocol_version version;
 
_remote = remote_get(remote);
if (!_remote->url[0])
@@ -41,6 +51,11 @@ static int run_remote_archiver(int argc, const char **argv,
 
packet_reader_init(, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE);
 
+   version = discover_version();
+
+   if (version == protocol_v2 && server_supports_v2("archive", 0))
+   do_v2_command_and_cap(fd[1]);
+
/*
 * Inject a fake --format field at the beginning of the
 * arguments, with the format inferred from our output
@@ -56,22 +71,24 @@ static int run_remote_archiver(int argc, const char **argv,
packet_write_fmt(fd[1], "argument %s\n", argv[i]);
packet_flush(fd[1]);
 
-   status = packet_reader_read();
-
-   if (status != PACKET_READ_NORMAL || reader.pktlen <= 0)
-   die(_("git archive: expected ACK/NAK, got a flush packet"));
-   if (strcmp(reader.line, "ACK")) {
-   if (starts_with(reader.line, "NACK "))
-   die(_("git archive: NACK %s"), reader.line + 5);
-   if (starts_with(reader.line, "ERR "))
-   die(_("remote error: %s"), reader.line + 4);
-   die(_("git archive: protocol error"));
+   if (version == protocol_v0) {
+   status = packet_reader_read();
+
+   if (status != PACKET_READ_NORMAL || reader.pktlen <= 0)
+   die(_("git archive: expected ACK/NAK, got a flush 
packet"));
+   if (strcmp(reader.line, "ACK")) {
+   if (starts_with(reader.line, "NACK "))
+   die(_("git archive: NACK %s"), reader.line + 5);

[PATCH v2 1/4] archive: follow test standards around assertions

2018-09-26 Thread Josh Steadmon
Move assertions outside of the check_tar function so that all top-level
code is wrapped in a test_expect_* assertion.

Signed-off-by: Josh Steadmon 
---
 t/t5000-tar-tree.sh | 26 ++
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 2a97b27b0a..c408e3a23d 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -62,11 +62,9 @@ check_tar() {
dir=$1
dir_with_prefix=$dir/$2
 
-   test_expect_success ' extract tar archive' '
-   (mkdir $dir && cd $dir && "$TAR" xf -) <$tarfile
-   '
+   (mkdir $dir && cd $dir && "$TAR" xf -) <$tarfile &&
 
-   test_expect_success TAR_NEEDS_PAX_FALLBACK ' interpret pax headers' '
+   if test_have_prereq TAR_NEEDS_PAX_FALLBACK ; then
(
cd $dir &&
for header in *.paxheader
@@ -82,16 +80,11 @@ check_tar() {
fi
done
)
-   '
+   fi &&
 
-   test_expect_success ' validate filenames' '
-   (cd ${dir_with_prefix}a && find .) | sort >$listfile &&
-   test_cmp a.lst $listfile
-   '
-
-   test_expect_success ' validate file contents' '
-   diff -r a ${dir_with_prefix}a
-   '
+   (cd ${dir_with_prefix}a && find .) | sort >$listfile &&
+   test_cmp a.lst $listfile &&
+   diff -r a ${dir_with_prefix}a
 }
 
 test_expect_success \
@@ -143,19 +136,20 @@ test_expect_success \
 'git archive' \
 'git archive HEAD >b.tar'
 
-check_tar b
+test_expect_success 'extract archive' 'check_tar b'
 
 test_expect_success 'git archive --prefix=prefix/' '
git archive --prefix=prefix/ HEAD >with_prefix.tar
 '
 
-check_tar with_prefix prefix/
+test_expect_success 'extract with prefix' 'check_tar with_prefix prefix/'
 
 test_expect_success 'git-archive --prefix=olde-' '
git archive --prefix=olde- HEAD >with_olde-prefix.tar
 '
 
-check_tar with_olde-prefix olde-
+test_expect_success 'extract with olde- prefix' \
+   'check_tar with_olde-prefix olde-'
 
 test_expect_success 'git archive on large files' '
 test_config core.bigfilethreshold 1 &&
-- 
2.19.0.605.g01d371f741-goog



[PATCH v2 2/4] archive: use packet_reader for communications

2018-09-26 Thread Josh Steadmon
Using packet_reader will simplify version detection and capability
handling, which will make implementation of protocol v2 support in
git-archive easier.

This refactoring does not change the behavior of "git archive".

Signed-off-by: Josh Steadmon 
---
 builtin/archive.c | 23 ++-
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index e74f675390..4eb547c5b7 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -27,10 +27,11 @@ static int run_remote_archiver(int argc, const char **argv,
   const char *remote, const char *exec,
   const char *name_hint)
 {
-   char *buf;
int fd[2], i, rv;
struct transport *transport;
struct remote *_remote;
+   struct packet_reader reader;
+   enum packet_read_status status;
 
_remote = remote_get(remote);
if (!_remote->url[0])
@@ -38,6 +39,8 @@ static int run_remote_archiver(int argc, const char **argv,
transport = transport_get(_remote, _remote->url[0]);
transport_connect(transport, "git-upload-archive", exec, fd);
 
+   packet_reader_init(, fd[0], NULL, 0, PACKET_READ_CHOMP_NEWLINE);
+
/*
 * Inject a fake --format field at the beginning of the
 * arguments, with the format inferred from our output
@@ -53,18 +56,20 @@ static int run_remote_archiver(int argc, const char **argv,
packet_write_fmt(fd[1], "argument %s\n", argv[i]);
packet_flush(fd[1]);
 
-   buf = packet_read_line(fd[0], NULL);
-   if (!buf)
+   status = packet_reader_read();
+
+   if (status != PACKET_READ_NORMAL || reader.pktlen <= 0)
die(_("git archive: expected ACK/NAK, got a flush packet"));
-   if (strcmp(buf, "ACK")) {
-   if (starts_with(buf, "NACK "))
-   die(_("git archive: NACK %s"), buf + 5);
-   if (starts_with(buf, "ERR "))
-   die(_("remote error: %s"), buf + 4);
+   if (strcmp(reader.line, "ACK")) {
+   if (starts_with(reader.line, "NACK "))
+   die(_("git archive: NACK %s"), reader.line + 5);
+   if (starts_with(reader.line, "ERR "))
+   die(_("remote error: %s"), reader.line + 4);
die(_("git archive: protocol error"));
}
 
-   if (packet_read_line(fd[0], NULL))
+   status = packet_reader_read();
+   if (status == PACKET_READ_NORMAL && reader.pktlen > 0)
die(_("git archive: expected a flush"));
 
/* Now, start reading from fd[0] and spit it out to stdout */
-- 
2.19.0.605.g01d371f741-goog



[PATCH v2 0/4] Add proto v2 archive command with HTTP support

2018-09-26 Thread Josh Steadmon
This is the second version of my series to add a new protocol v2 command
for archiving, with support for HTTP(S).

NEEDSWORK: a server built with this series is not backwards-compatible
with clients that set GIT_PROTOCOL=version=2 or configure
protocol.version=2. The old client will unconditionally send "argument
..." packet lines, which breaks the server's expectations of a
"command=archive" request, while the server's capability advertisement
in turn breaks the clients expectation of either an ACK or NACK.

I've been discussing workarounds for this with Jonathan Nieder, but
please let me know if you have any suggestions for v3 of this series.


Josh Steadmon (4):
  archive: follow test standards around assertions
  archive: use packet_reader for communications
  archive: implement protocol v2 archive command
  archive: allow archive over HTTP(S) with proto v2

 Documentation/technical/protocol-v2.txt | 21 -
 builtin/archive.c   | 58 +++--
 builtin/upload-archive.c| 27 ++--
 http-backend.c  | 13 +-
 serve.c |  7 +++
 t/t5000-tar-tree.sh | 33 +++---
 t/t5701-git-serve.sh|  1 +
 transport-helper.c  |  7 +--
 8 files changed, 130 insertions(+), 37 deletions(-)

Range-diff against v1:
-:  -- > 1:  c2e371ad24 archive: follow test standards around assertions
1:  b514184273 ! 2:  a65f73f627 archive: use packet_reader for communications
@@ -6,7 +6,10 @@
 handling, which will make implementation of protocol v2 support in
 git-archive easier.
 
+This refactoring does not change the behavior of "git archive".
+
 Signed-off-by: Josh Steadmon 
+Reviewed-by: Stefan Beller 
 
  diff --git a/builtin/archive.c b/builtin/archive.c
@@ -42,24 +45,24 @@
 -  if (!buf)
 +  status = packet_reader_read();
 +
-+  if (status == PACKET_READ_FLUSH)
++  if (status != PACKET_READ_NORMAL || reader.pktlen <= 0)
die(_("git archive: expected ACK/NAK, got a flush packet"));
 -  if (strcmp(buf, "ACK")) {
 -  if (starts_with(buf, "NACK "))
 -  die(_("git archive: NACK %s"), buf + 5);
 -  if (starts_with(buf, "ERR "))
 -  die(_("remote error: %s"), buf + 4);
-+  if (strcmp(reader.buffer, "ACK")) {
-+  if (starts_with(reader.buffer, "NACK "))
-+  die(_("git archive: NACK %s"), reader.buffer + 5);
-+  if (starts_with(reader.buffer, "ERR "))
-+  die(_("remote error: %s"), reader.buffer + 4);
++  if (strcmp(reader.line, "ACK")) {
++  if (starts_with(reader.line, "NACK "))
++  die(_("git archive: NACK %s"), reader.line + 5);
++  if (starts_with(reader.line, "ERR "))
++  die(_("remote error: %s"), reader.line + 4);
die(_("git archive: protocol error"));
}
  
 -  if (packet_read_line(fd[0], NULL))
 +  status = packet_reader_read();
-+  if (status != PACKET_READ_FLUSH)
++  if (status == PACKET_READ_NORMAL && reader.pktlen > 0)
die(_("git archive: expected a flush"));
  
/* Now, start reading from fd[0] and spit it out to stdout */
2:  1518c15dc1 < -:  -- archive: implement protocol v2 archive command
-:  -- > 3:  0a8cc5e331 archive: implement protocol v2 archive command
3:  1b7ad8d8f6 ! 4:  97a1424f32 archive: allow archive over HTTP(S) with proto 
v2
@@ -10,16 +10,20 @@
  +++ b/builtin/archive.c
 @@
status = packet_reader_read();
-   if (status != PACKET_READ_FLUSH)
+   if (status == PACKET_READ_NORMAL && reader.pktlen > 0)
die(_("git archive: expected a flush"));
 -  }
 +  } else if (version == protocol_v2 &&
-+ starts_with(transport->url, "http"))
++ (starts_with(transport->url, "http://;) ||
++  starts_with(transport->url, "https://;)))
 +  /*
 +   * Commands over HTTP require two requests, so there's an
-+   * additional server response to parse.
++   * additional server response to parse. We do only basic sanity
++   * checking here that the versions presented match across
++   * requests.
 +   */
-+  discover_version();
++  if (version != discover_version())
++  die(_("git archive: received different protocol 
versions in subsequent requests"));
  
/* Now, start reading from fd[0] and spit it out to stdout */
rv = recv_sideband("archive", fd[0], 1);
@@ -40,7 +44,10 @@
struct strbuf buf = STRBUF_INIT;
  
 +  if (!strcmp(service_name, "git-upload-archive")) {

Reminder,

2018-09-26 Thread Juliet Muhammad
Hello 

   Please i still await your response regarding my previous email.


Reminder,

2018-09-26 Thread Juliet Muhammad
Hello 

   Please i still await your response regarding my previous email.


Reminder,

2018-09-26 Thread Juliet Muhammad
Hello 

   Please i still await your response regarding my previous email.


Reminder,

2018-09-26 Thread Juliet Muhammad
Hello 

   Please i still await your response regarding my previous email.


Reminder,

2018-09-26 Thread Juliet Muhammad
Hello 

   Please i still await your response regarding my previous email.


Reminder,

2018-09-26 Thread Juliet Muhammad
Hello 

   Please i still await your response regarding my previous email.


Reminder,

2018-09-26 Thread Juliet Muhammad
Hello 

   Please i still await your response regarding my previous email.


Reminder,

2018-09-26 Thread Juliet Muhammad
Hello 

   Please i still await your response regarding my previous email.


Reminder,

2018-09-26 Thread Juliet Muhammad
Hello 

   Please i still await your response regarding my previous email.


Re: Fixing constant preference prompts during tests

2018-09-26 Thread Tacitus Aedifex

On Wed, Sep 26, 2018 at 02:48:49PM -0700, Junio C Hamano wrote:

We do not want your choice of gpg.program or what kind of
trust you have personally recorded in your ~/.gnupg/ affect how gpg
invoked inside our tests work.


This makes sense to me now. I get what you are saying. The gpg binary installed 
on my system is fairly new:


gpg --version
gpg (GnuPG) 2.2.8
libgcrypt 1.8.3
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Home: /home/user/.gnupg
Supported algorithms:
Pubkey: RSA, ELG, DSA, ECDH, ECDSA, EDDSA
Cipher: IDEA, 3DES, CAST5, BLOWFISH, AES, AES192, AES256, TWOFISH,
   CAMELLIA128, CAMELLIA192, CAMELLIA256
Hash: SHA1, RIPEMD160, SHA256, SHA384, SHA512, SHA224
Compression: Uncompressed

I'm not sure if that has anything to do with it. I'm going to have to 
investigate further to figure out what is being executed and with what 
parameters that leads to the prerefences prompt. While working with GPG on 
another project I noticed that GPG doesn't like to work with keyrings other 
than the default ones. I tried a bunch of different combinations of 
--no-default-keyrings, --homedir, --default-key, etc to try to get GPG to never 
touch ~/.gnupg and I couldn't figure it out. It would always re-create ~/.gnupg 
and default keyrings and even read gpg.conf when explicitly told not to. I 
suspect that is what is going on here.


//tæ


Re: On shipping more of our technical docs as manpages

2018-09-26 Thread Stefan Beller
On Wed, Sep 26, 2018 at 1:44 PM Ævar Arnfjörð Bjarmason
 wrote:

> > And if we were going to generate something external, would it make more
> > sense to write in a structured format like doxygen? I am not a big fan
> > of it myself, but at least from there you can generate a more richly
> > interconnected set of documentation.
>
> It's useful to have a single authoritative source for all documentation
> that's easy to search through.

If that is the case I would propose to keep it all in header files and
organize the headers.

> That includes stuff like perl585delta(1) which we'd stick in
> Documentation/RelNotes, and "Internals and C Language Interface". Most
> of what we'd put in Documentation/technical/api-* & headers is in
> perlapi(1).

This seems cool, but was also a recent introduction?
perl400delta seems to yield nothing for me (which may be because
I do not have an old version of perl installed?)

>
> Sometimes it's obvious where to look, but as someone who uses most of
> these APIs very occasionally (because I contribute less) I keep
> (re-)discovering various internal APIs.

Sometimes I have the same feeling. Maybe more structure in the
source files would help (e.g. datastructures/{strbuf, string-list}.h
and objects/{packfile.h, sha1*} ?

> Every perl installation also ships perlapi and a MB or so of obscure
> internal docs to everyone, which makes it easier to find via Google et
> al, which I find useful. So I guess I'm more on the side fence of
> dropping a hypothetical gitapi-oid-array into /usr/share/man than you
> are.
>

Personally I would not want to ship our internal docs everywhere
as it seems like overly wasteful. But then, only my early days
of childhood were guided by Internet that is not available anywhere
and everywhere. Today I would just take for granted that I can lookup
things in github/git/git when I am in not at my regular desk.

> ANYWAY
>
> This E-mail is much longer than I intended, sorry about that. I didn't
> just mean to bitch about how we ship docs, but I was wondering if there
> was a desire to move to something like what I've outlined above, or
> whether the status quo was mostly by design and intended.
>
> I just thought I'd write this rather lengthy E-Mails because this is one
> of the rare areas where you can fully describe an idea in E-Mail without
> writing any patches.
>
> If the consensus is that something like the exhaustive index "perldoc
> perl" provides wouldn't be useful for git I can just drop this, but if
> people are enthusiastic about having that it would be useful to work on
> it...

I agree with Junio, as that the discoverability is unrelated to where to store
the actual docs, Documentation/technical/* has the advantage that it
only has "good" stuff, i.e. some topic that someone cared enough to
write about and it is easy to guess if it is relevant to your need.
In *.h, we have a lot of false positives (xdiff-interface.h or cache.h
just pollute the searching space when looking for suitable storage
classes.)

So I wonder if we'd want to have a list (similar as in
command-list.txt) that describes the header files and gives
some classification of them, for example one class could be the
data structures (strbuf, stringlist, hashmap), algorithms
(diff, range-diff), OS specific stuff (run-command, trace, alloc)
or Git specific things (apply, fetch, submodule)


Re: [PATCH] fetch: replace string-list used as a look-up table with a hashmap

2018-09-26 Thread Stefan Beller
On Wed, Sep 26, 2018 at 2:28 PM Junio C Hamano  wrote:

>
> +struct refname_hash_entry {
> +   struct hashmap_entry ent; /* must be the first member */

$ git grep "struct hashmap_entry" reveals that we have another
convention that we follow but not document, which is to stress
the importance of putting the hashmap_entry first. ;-)

> +   struct object_id oid;
> +   char refname[FLEX_ARRAY];
> +};
> +
> +static int refname_hash_entry_cmp(const void *hashmap_cmp_fn_data,
> + const struct refname_hash_entry *e1,
> + const struct refname_hash_entry *e2,
> + const void *keydata)
> +{
> +   return strcmp(e1->refname, keydata ? keydata : e2->refname);
> +}

and later

hashmap_init(...  (hashmap_cmp_fn)refname_hash_entry_cmp, ...);

I wonder if we'd want to stick to this style, as that seems to be the easiest
to do, and drop the efforts started in 55c965f3a2 (Merge branch
'sb/hashmap-cleanup', 2017-08-11), that avoids the cast in the init,
but puts the (implicit) casts in the _cmp function as we'd take
void pointers as the function signature and recast to a local variable.

> +
> +static struct refname_hash_entry *refname_hash_add(struct hashmap *map,
> +  const char *refname,
> +  const struct object_id 
> *oid)
> +{
> +   struct refname_hash_entry *ent;
> +   size_t len = strlen(refname);
> +
> +   FLEX_ALLOC_MEM(ent, refname, refname, len);
> +   hashmap_entry_init(ent, memhash(refname, len));

memhash is preferred over strhash as we already know the length.
For a second I wondered why we use it instead of strhash

> +static int refname_hash_exists(struct hashmap *map, const char *refname)
> +{
> +   struct hashmap_entry key;
> +   size_t len = strlen(refname);
> +   hashmap_entry_init(, memhash(refname, len));

Here we could use strhash, but as they could change to
differing implementation, we keep using memhash.

> /*
> -* For all the tags in the remote_refs string list,
> +* For all the tags in the remote_refs_list,
>  * add them to the list of refs to be fetched
>  */
> -   for_each_string_list_item(item, _refs) {
> +   for_each_string_list_item(remote_ref_item, _refs_list) {
> +   const char *refname = remote_ref_item->string;
> +   struct hashmap_entry key;
> +
> +   hashmap_entry_init(, memhash(refname, strlen(refname)));
> +   item = hashmap_get(_refs, , refname);

> +   if (!item)
> +   continue; /* can this happen??? */

I thought this could happen when we have hash collisions,
I convinced myself otherwise, as we pass the refname to hashmap_get
as the 'keydata', so when there is a hash collision we keep comparing
to the real value.

And as whenever we have an insert to remote_refs_list, we also
have a refname_hash_add to the remote_refs hashmap,
I think the return of NULL is not possible, and we could switch
it to BUG(...);


> All this code predates more modern in-core lookup API like hashmap;
> replace them with two hashmaps and one string list---the hashmaps
> are used for look-ups and the string list is to keep the order of
> items in the returned result stable (which is the only single thing
> hashmap does worse than lookups on string-list).
>
> Signed-off-by: Junio C Hamano 
> ---

I wonder if we want to add an API to the hashmap that allows for ordered
iteration (as hashmap_iter_* relies on the hashing function for its output)

I think it may be too early to do so, but there are 2 thoughts on how to do it:
* Each element keeps (prev/next) pointers to the previously inserted
  element and updates the next pointer when the next element is inserted.
  When removing elements we'd have to update the next and prev pointer
  of the adjacent elements.
  Then iteration can be done in insertion-order.
  Probably this would go into its own file ordered-hashmap.{c, h}
* provide an order function to hashmap_iter_init, which then
  orders according to that function before the first call of
  hashmap_iter_next returns.

>  * Just to remind ourselves that we talked about getting rid of
>string-list used as look-up tables by replacing them with hashmap
>but haven't seen much effort expended on it.  I think this is my
>first semi-serious use of hashmap, and the code is expected to be
>full of newbie mistakes, inefficiencies and ignorance of idioms;
>pointing out any of which is much appreciated.

I could not find newbie mistakes, but gave my thoughts anyway.

Stefan


[PATCH] doc: clarify gitcredentials path component matching

2018-09-26 Thread Zych, David M
The gitcredentials documentation implied that the config file's
"pattern" URL might include a path component, but did not explain that
it must match exactly (potentially leaving readers with the false hope
that it would support a more flexible prefix match).

Signed-off-by: David Zych 
---
 Documentation/gitcredentials.txt | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/gitcredentials.txt b/Documentation/gitcredentials.txt
index f970196..adc7596 100644
--- a/Documentation/gitcredentials.txt
+++ b/Documentation/gitcredentials.txt
@@ -133,6 +133,12 @@ compares hostnames exactly, without considering whether 
two hosts are part of
 the same domain. Likewise, a config entry for `http://example.com` would not
 match: Git compares the protocols exactly.
 
+If the "pattern" URL does include a path component, then this too must match
+exactly: the context `https://example.com/bar/baz.git` will match a config
+entry for `https://example.com/bar/baz.git` (in addition to matching the config
+entry for `https://example.com`) but will not match a config entry for
+`https://example.com/bar`.
+
 
 CONFIGURATION OPTIONS
 -
-- 
2.7.4



Re: [PATCH v4 4/9] submodule: move global changed_submodule_names into fetch submodule struct

2018-09-26 Thread Junio C Hamano
Stefan Beller  writes:

> The `changed_submodule_names` are only used for fetching, so let's make it
> part of the struct that is passed around for fetching submodules.
>
> Signed-off-by: Stefan Beller 
> ---
>  submodule.c | 42 +++---
>  1 file changed, 23 insertions(+), 19 deletions(-)

Yup, the less file-scope static we have and the more of them moved
to a struct, the closer we get to be able to use multiple of them at
the same time, which is a very nice step in the right direction.

>
> diff --git a/submodule.c b/submodule.c
> index 22c64bd8559..17103379ba4 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -25,7 +25,7 @@
>  #include "commit-reach.h"
>  
>  static int config_update_recurse_submodules = RECURSE_SUBMODULES_OFF;
> -static struct string_list changed_submodule_names = STRING_LIST_INIT_DUP;
> +
>  static int initialized_fetch_ref_tips;
>  static struct oid_array ref_tips_before_fetch;
>  static struct oid_array ref_tips_after_fetch;
> @@ -1110,7 +1110,22 @@ void check_for_new_submodule_commits(struct object_id 
> *oid)
>   oid_array_append(_tips_after_fetch, oid);
>  }
>  
> -static void calculate_changed_submodule_paths(void)
> +struct submodule_parallel_fetch {
> + int count;
> + struct argv_array args;
> + struct repository *r;
> + const char *prefix;
> + int command_line_option;
> + int default_option;
> + int quiet;
> + int result;
> +
> + struct string_list changed_submodule_names;
> +};
> +#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0, 
> STRING_LIST_INIT_DUP }
> +
> +static void calculate_changed_submodule_paths(
> + struct submodule_parallel_fetch *spf)
>  {
>   struct argv_array argv = ARGV_ARRAY_INIT;
>   struct string_list changed_submodules = STRING_LIST_INIT_DUP;
> @@ -1148,7 +1163,8 @@ static void calculate_changed_submodule_paths(void)
>   continue;
>  
>   if (!submodule_has_commits(path, commits))
> - string_list_append(_submodule_names, 
> name->string);
> + string_list_append(>changed_submodule_names,
> +name->string);
>   }
>  
>   free_submodules_oids(_submodules);
> @@ -1185,18 +1201,6 @@ int submodule_touches_in_range(struct object_id 
> *excl_oid,
>   return ret;
>  }
>  
> -struct submodule_parallel_fetch {
> - int count;
> - struct argv_array args;
> - struct repository *r;
> - const char *prefix;
> - int command_line_option;
> - int default_option;
> - int quiet;
> - int result;
> -};
> -#define SPF_INIT {0, ARGV_ARRAY_INIT, NULL, NULL, 0, 0, 0, 0}
> -
>  static int get_fetch_recurse_config(const struct submodule *submodule,
>   struct submodule_parallel_fetch *spf)
>  {
> @@ -1257,7 +1261,7 @@ static int get_next_submodule(struct child_process *cp,
>   case RECURSE_SUBMODULES_ON_DEMAND:
>   if (!submodule ||
>   !string_list_lookup(
> - _submodule_names,
> + >changed_submodule_names,
>   submodule->name))
>   continue;
>   default_argv = "on-demand";
> @@ -1349,8 +1353,8 @@ int fetch_populated_submodules(struct repository *r,
>   argv_array_push(, "--recurse-submodules-default");
>   /* default value, "--submodule-prefix" and its value are added later */
>  
> - calculate_changed_submodule_paths();
> - string_list_sort(_submodule_names);
> + calculate_changed_submodule_paths();
> + string_list_sort(_submodule_names);
>   run_processes_parallel(max_parallel_jobs,
>  get_next_submodule,
>  fetch_start_failure,
> @@ -1359,7 +1363,7 @@ int fetch_populated_submodules(struct repository *r,
>  
>   argv_array_clear();
>  out:
> - string_list_clear(_submodule_names, 1);
> + string_list_clear(_submodule_names, 1);
>   return spf.result;
>  }


Re: [PATCH v4 2/9] submodule.c: fix indentation

2018-09-26 Thread Junio C Hamano
Stefan Beller  writes:

> The submodule subsystem is really bad at staying within 80 characters.
> Fix it while we are here.
>
> Signed-off-by: Stefan Beller 
> ---

Makes sense.

>  submodule.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index b53cb6e9c47..0de9e2800ad 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1244,7 +1244,8 @@ static int get_next_submodule(struct child_process *cp,
>   if (!submodule) {
>   const char *name = default_name_or_path(ce->name);
>   if (name) {
> - default_submodule.path = default_submodule.name 
> = name;
> + default_submodule.path = name;
> + default_submodule.name = name;
>   submodule = _submodule;
>   }
>   }
> @@ -1254,8 +1255,10 @@ static int get_next_submodule(struct child_process *cp,
>   default:
>   case RECURSE_SUBMODULES_DEFAULT:
>   case RECURSE_SUBMODULES_ON_DEMAND:
> - if (!submodule || 
> !unsorted_string_list_lookup(_submodule_names,
> -  submodule->name))
> + if (!submodule ||
> + !unsorted_string_list_lookup(
> + _submodule_names,
> + submodule->name))
>   continue;
>   default_argv = "on-demand";
>   break;


Re: [PATCH v4 1/9] sha1-array: provide oid_array_filter

2018-09-26 Thread Junio C Hamano
Stefan Beller  writes:

> Helped-by: Junio C Hamano 
> Signed-off-by: Stefan Beller 
> ---
>  Documentation/technical/api-oid-array.txt |  5 +
>  sha1-array.c  | 17 +
>  sha1-array.h  |  3 +++
>  3 files changed, 25 insertions(+)

Perfect ;-)

>
> diff --git a/Documentation/technical/api-oid-array.txt 
> b/Documentation/technical/api-oid-array.txt
> index 9febfb1d528..c97428c2c34 100644
> --- a/Documentation/technical/api-oid-array.txt
> +++ b/Documentation/technical/api-oid-array.txt
> @@ -48,6 +48,11 @@ Functions
>   is not sorted, this function has the side effect of sorting
>   it.
>  
> +`oid_array_filter`::
> + Apply the callback function `want` to each entry in the array,
> + retaining only the entries for which the function returns true.
> + Preserve the order of the entries that are retained.
> +
>  Examples
>  
>  
> diff --git a/sha1-array.c b/sha1-array.c
> index b94e0ec0f5e..d922e94e3fc 100644
> --- a/sha1-array.c
> +++ b/sha1-array.c
> @@ -77,3 +77,20 @@ int oid_array_for_each_unique(struct oid_array *array,
>   }
>   return 0;
>  }
> +
> +void oid_array_filter(struct oid_array *array,
> +   for_each_oid_fn want,
> +   void *cb_data)
> +{
> + unsigned nr = array->nr, src, dst;
> + struct object_id *oids = array->oid;
> +
> + for (src = dst = 0; src < nr; src++) {
> + if (want([src], cb_data)) {
> + if (src != dst)
> + oidcpy([dst], [src]);
> + dst++;
> + }
> + }
> + array->nr = dst;
> +}
> diff --git a/sha1-array.h b/sha1-array.h
> index 232bf950172..55d016c4bf7 100644
> --- a/sha1-array.h
> +++ b/sha1-array.h
> @@ -22,5 +22,8 @@ int oid_array_for_each(struct oid_array *array,
>  int oid_array_for_each_unique(struct oid_array *array,
> for_each_oid_fn fn,
> void *data);
> +void oid_array_filter(struct oid_array *array,
> +   for_each_oid_fn want,
> +   void *cbdata);
>  
>  #endif /* SHA1_ARRAY_H */


Re: [PATCH v4 3/9] submodule.c: sort changed_submodule_names before searching it

2018-09-26 Thread Junio C Hamano
Stefan Beller  writes:

> We can string_list_insert() to maintain sorted-ness of the
> list as we find new items, or we can string_list_append() to
> build an unsorted list and sort it at the end just once.
>
> To pick which one is more appropriate, we notice the fact
> that we discover new items more or less in the already
> sorted order.  That makes "append then sort" more
> appropriate.

I somehow thought that we agreed that the second paragraph above did
not make much sense in the previous review round.

... goes and looks ...

https://public-inbox.org/git/CAGZ79kbavjVbTqXsmtjW6=jhkq47_p3mc6=92xop4_mfhqd...@mail.gmail.com/

That was two review cycles ago, I guess.


Re: [PATCH v6 0/7] speed up index load through parallelization

2018-09-26 Thread Junio C Hamano
Ben Peart  writes:

> Base Ref: master
> Web-Diff: https://github.com/benpeart/git/commit/a0300882d4
> Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v6 
> && git checkout a0300882d4
>
>
> This iteration brings back the Index Entry Offset Table (IEOT) extension
> which enables us to multi-thread the cache entry parsing without having
> the primary thread have to scan all the entries first.  In cases where the
> cache entry parsing is the most expensive part, this yields some additional
> savings.

Nice.

> Test w/100,000 filesBaseline  Optimize V4Extensions Entries
> 
> 0002.1: read_cache  22.36 18.74 -16.2%   18.64 -16.6%   12.63 -43.5%
>
> Test w/1,000,000 files  Baseline  Optimize V4Extensions Entries
> -
> 0002.1: read_cache  304.40270.70 -11.1%  195.50 -35.8%  204.82 -32.7%
>
> Note that on the 1,000,000 files case, multi-threading the cache entry parsing
> does not yield a performance win.  This is because the cost to parse the
> index extensions in this repo, far outweigh the cost of loading the cache
> entries.
> ...
> The high cost of parsing the index extensions is driven by the cache tree
> and the untracked cache extensions. As this is currently the longest pole,
> any reduction in this time will reduce the overall index load times so is
> worth further investigation in another patch series.

Interesting.

> One option would be to load each extension on a separate thread but I
> believe that is overkill for the vast majority of repos.  Instead, some
> optimization of the loading code for these two extensions is probably worth
> looking into as a quick examination shows that the bulk of the time for both
> of them is spent in xcalloc().

Thanks.  Looking forward to block some quality time off to read this
through, but from the cursory look (read: diff between the previous
round), this looks quite promising.


Re: Fixing constant preference prompts during tests

2018-09-26 Thread Junio C Hamano
Tacitus Aedifex  writes:

> It may be a little more complicated than this because looking at the
> tests it seems like they set up their own dummy user with dummy keys
> and use gpg directly.

Yes, that is """the test framework (t/test-lib.sh, t/lib-gpg.sh,
etc.)  tries to run our tests in a stable environment that is not
affected by real $HOME etc. owned by the user who happens to be
running the tests, but it could be that your copy of GnuPG may
require a bit more seeding than we do in our test framework to
squelch that preference prompt.""" part in my earlier response tried
to say.  We do not want your choice of gpg.program or what kind of
trust you have personally recorded in your ~/.gnupg/ affect how gpg
invoked inside our tests work.





Re: GPG signing is bent on WIndows

2018-09-26 Thread Tacitus Aedifex

On Wed, Sep 26, 2018 at 04:05:02PM -0400, Jeffrey Walton wrote:

I got to look at it today. On Windows:

$ cat ~/.gnupg/gpg-agent.conf
pinentry-program
/usr/local/MacGPG2/libexec/pinentry-mac.app/Contents/MacOS/pinentry-mac


Do you have this app on your system? That path looks like one for a mac 
pinentry program and won't work on Windows. There are several tutorials on the 
web that explain how to get GPG working on windows. I suggest starting there.  
Get so that you can sign normal files from the command line on Windows and then 
try again.


//tæ


Re: On shipping more of our technical docs as manpages

2018-09-26 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> So in terms of implementation I'm a big fan of the perl.git approach of
> having these docs as comments before the function implementation in the
> *.c file.
>
> It means you can't avoid seeing it or updating it when source
> spelunking, and it leaves header files short, which is useful when you'd
> like to get a general API overview without all the docs. Our documented
> headers are quite fat, e.g. strbuf.h is 60% of the size of strbuf.c, but
> less than 20% when you strip the docs.

Just like you I do not care too much where the authoritative source
lives, and I do agree with Peff that separate text file is much more
likely to go stale wrt the code, and that it is an acceptable trade
off to write docs in comment and live with slightly less freedom
than writing in plain text in order to keep the docs and source
closer together.  I do not think between Peff and me, neither of us
were contrasting in-header vs in-code comments.

And I tend to agree that it makes it even harder to let doc and code
drift apart to have the doc near the implementation (as opposed to
in the header).  It probably makes the result less useful, unless
the project invests in making the result accessible like "man
perlapi" does, than having them in the header for people who view
headers as guide to users of API, though.

> We just don't have some central index of those, and they're not a
> default target which means the likes of our own https://git-scm.com
> don't build them, so they're harder to find.
>
> Every perl installation also ships perlapi and a MB or so of obscure
> internal docs to everyone, which makes it easier to find via Google et
> al, which I find useful. So I guess I'm more on the side fence of
> dropping a hypothetical gitapi-oid-array into /usr/share/man than you
> are.

Regardless of where we place docs (between .c/.h or .txt), making
them indexable and investing in make target to actually produce
these docs with ToC is different matter.  I do not think people who
actually spent time on our docs had enough inclination to do so
themselves, but that should not prevent you or other like-minded
people from leading by example.



Re: Fixing constant preference prompts during tests

2018-09-26 Thread Tacitus Aedifex

On Wed, Sep 26, 2018 at 10:15:06AM -0700, Junio C Hamano wrote:

Nobody raised this so far as far as I recall; thanks for the first
one to do so, as it is a sign that you are doing something unusual
(e.g. have newer or different version of GPG than most other people)
and others will hit by the same symptom later when whatever thing
you are using as a minority right now becomes more prevalent.


Indeed I am. I use Qubes OS with a split GPG setup. I have a virtual machine 
dedicated to handling my crypto keys and I use the qubes GPG wrapper to 
interact with the signing VM through the Xen hypervisor. I think the only thing 
that needs to change is that the tests respect gpg.program setting in 
.gitconfig. Signing and verifying commits and tags works perfectly but the 
tests don't allow me to override which program is used.


It may be a little more complicated than this because looking at the tests it 
seems like they set up their own dummy user with dummy keys and use gpg 
directly. I have gpg installed on my client VM so in theory it should work. I 
can't tell exctly where the prompt is coming from.


//tæ


[PATCH] fetch: replace string-list used as a look-up table with a hashmap

2018-09-26 Thread Junio C Hamano
In find_non_local_tags() helper function (used to implement the
"follow tags"), we use string_list_has_string() on two string lists
as a way to see if a refname has already been processed, etc.

All this code predates more modern in-core lookup API like hashmap;
replace them with two hashmaps and one string list---the hashmaps
are used for look-ups and the string list is to keep the order of
items in the returned result stable (which is the only single thing
hashmap does worse than lookups on string-list).

Signed-off-by: Junio C Hamano 
---

 * Just to remind ourselves that we talked about getting rid of
   string-list used as look-up tables by replacing them with hashmap
   but haven't seen much effort expended on it.  I think this is my
   first semi-serious use of hashmap, and the code is expected to be
   full of newbie mistakes, inefficiencies and ignorance of idioms;
   pointing out any of which is much appreciated.

 builtin/fetch.c | 116 +---
 1 file changed, 90 insertions(+), 26 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 0696abfc2a..5d1fd8dd49 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -246,16 +246,73 @@ static int will_fetch(struct ref **head, const unsigned 
char *sha1)
return 0;
 }
 
+struct refname_hash_entry {
+   struct hashmap_entry ent; /* must be the first member */
+   struct object_id oid;
+   char refname[FLEX_ARRAY];
+};
+
+static int refname_hash_entry_cmp(const void *hashmap_cmp_fn_data,
+ const struct refname_hash_entry *e1,
+ const struct refname_hash_entry *e2,
+ const void *keydata)
+{
+   return strcmp(e1->refname, keydata ? keydata : e2->refname);
+}
+
+static struct refname_hash_entry *refname_hash_add(struct hashmap *map,
+  const char *refname,
+  const struct object_id *oid)
+{
+   struct refname_hash_entry *ent;
+   size_t len = strlen(refname);
+
+   FLEX_ALLOC_MEM(ent, refname, refname, len);
+   hashmap_entry_init(ent, memhash(refname, len));
+   oidcpy(>oid, oid);
+   hashmap_add(map, ent);
+   return ent;
+}
+
+static int add_one_refname(const char *refname,
+  const struct object_id *oid,
+  int flag, void *cbdata)
+{
+   struct hashmap *refname_map = cbdata;
+
+   (void) refname_hash_add(refname_map, refname, oid);
+   return 0;
+}
+
+static void refname_hash_init(struct hashmap *map)
+{
+   hashmap_init(map, (hashmap_cmp_fn)refname_hash_entry_cmp, NULL, 0);
+}
+
+static int refname_hash_exists(struct hashmap *map, const char *refname)
+{
+   struct hashmap_entry key;
+   size_t len = strlen(refname);
+   hashmap_entry_init(, memhash(refname, len));
+
+   return !!hashmap_get(map, , refname);
+}
+
 static void find_non_local_tags(const struct ref *refs,
struct ref **head,
struct ref ***tail)
 {
-   struct string_list existing_refs = STRING_LIST_INIT_DUP;
-   struct string_list remote_refs = STRING_LIST_INIT_NODUP;
+   struct hashmap existing_refs;
+   struct hashmap remote_refs;
+   struct string_list remote_refs_list = STRING_LIST_INIT_NODUP;
+   struct string_list_item *remote_ref_item;
const struct ref *ref;
-   struct string_list_item *item = NULL;
+   struct refname_hash_entry *item = NULL;
 
-   for_each_ref(add_existing, _refs);
+   refname_hash_init(_refs);
+   refname_hash_init(_refs);
+
+   for_each_ref(add_one_refname, _refs);
for (ref = refs; ref; ref = ref->next) {
if (!starts_with(ref->name, "refs/tags/"))
continue;
@@ -271,10 +328,10 @@ static void find_non_local_tags(const struct ref *refs,
!has_object_file_with_flags(>old_oid,
OBJECT_INFO_QUICK) &&
!will_fetch(head, ref->old_oid.hash) &&
-   !has_sha1_file_with_flags(item->util,
+   !has_sha1_file_with_flags(item->oid.hash,
  OBJECT_INFO_QUICK) &&
-   !will_fetch(head, item->util))
-   item->util = NULL;
+   !will_fetch(head, item->oid.hash))
+   oidclr(>oid);
item = NULL;
continue;
}
@@ -286,47 +343,54 @@ static void find_non_local_tags(const struct ref *refs,
 * fetch.
 */
if (item &&
-   !has_sha1_file_with_flags(item->util, OBJECT_INFO_QUICK) &&
-   !will_fetch(head, item->util))
- 

Re: [PATCH 1/8] sha1-array: provide oid_array_filter

2018-09-26 Thread Stefan Beller
> > (I am not seriously proposing unbalanced comments, but for
> > longer documenting comments we may want to consider,
> > omitting the asterisk?)
>
> If this is not a serious proposal, then I'll stop wasting
> everybody's time.

I proposed two things, one of which I was not serious about.

The other has downsides as both you and Peff point out.
So living with comments may not be the worst after all.

I'll read to Aevars proposal now.


Re: [PATCH 1/1] read-cache: update index format default to v4

2018-09-26 Thread Matthias Sohn
On Tue, Sep 25, 2018 at 8:01 PM Stefan Beller  wrote:
>
> On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee  wrote:
> >
> > On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
> > > On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
> > >> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via 
> > >> GitGitGadget wrote:
> > >>> From: Derrick Stolee 
> > >>>
> > >>> The index v4 format has been available since 2012 with 9d22778
> > >>> "reach-cache.c: write prefix-compressed names in the index". Since
> > >>> the format has been stable for so long, almost all versions of Git
> > >>> in use today understand version 4, removing one barrier to upgrade
> > >>> -- that someone may want to downgrade and needs a working repo.
> > >> What about alternative implementations, like JGit, libgit2, etc.?
> > > Speaking of libgit2, we are able to read and write index v4 since
> > > commit c1b370e93
> >
> > This is a good point, Szeder.
> >
> > Patrick: I'm glad LibGit2 is up-to-date with index formats.
> >
> > Unfortunately, taking a look (for the first time) at the JGit code
> > reveals that they don't appear to have v4 support. In
> > org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the
> > DirCache.readFrom() method: lines 488-494, I see the following snippet:
> >
> >  final int ver = NB.decodeInt32(hdr, 4);
> >  boolean extended = false;
> >  if (ver == 3)
> >  extended = true;
> >  else if (ver != 2)
> >  throw new
> > CorruptObjectException(MessageFormat.format(
> > JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));
> >
> > It looks like this will immediately throw with versions other than 2 or 3.
> >
> > I'm adding Jonathan Nieder to CC so he can check with JGit people about
> > the impact of this change.
>
> JGit is used both on the server (which doesn't use index/staging area)
> as well as client side as e.g. an Eclipse integration, which would
> very much like to use the index.
>
> Adding Matthias Sohn as well, who is active in JGit and cares
> more about the client side than Googlers who only make use
> of the server side part of JGit.

thanks for the heads up, in fact JGit does not yet support index version 4.
I will look into this.

-Matthias


Re: Triggering "BUG: wt-status.c:476: multiple renames on the same target? how?"

2018-09-26 Thread Andrea Stacchiotti
I'm very sorry, I indeed forgot the `diff.renames=copies`.

The following script can reproduce the bug even with a blank config:

-

# Make a test repo
git init testrepo
cd testrepo
git config user.name A
git config user.email B
git config diff.renames copies

# Add a file called orig
echo 'a' > orig
git add orig
git commit -m'orig'

# Copy orig in new and modify orig
cp orig new
echo 'b' > orig

# add -N and then commit trigger the bug
git add -N new
git commit

# Cleanup
cd ..
rm -rf testrepo

Il 26/09/18 22:56, Ævar Arnfjörð Bjarmason ha scritto:
> 
> On Wed, Sep 26 2018, Andrea Stacchiotti wrote:
> 
>> Dear maintainer(s),
>>
>> the following script, when executed with git 2.19 triggers the bug in
>> the subject line.
>> The problem seems to be the interaction between add -N and rename detection.
>>
>> The git binary used is the one currently packaged in Debian unstable.
>>
>> I have searched the list for the bug text and have found nothing,
>> apologies if the bug is already known.
>>
>> System information, script content and script output follow.
>>
>> Andrea Stacchiotti
>>
>> --
>>
>> andreas@trelitri:/tmp$ uname -a
>> Linux trelitri 4.17.0-3-amd64 #1 SMP Debian 4.17.17-1 (2018-08-18)
>> x86_64 GNU/Linux
>> andreas@trelitri:/tmp$ git --version
>> git version 2.19.0
>>
>> andreas@trelitri:/tmp$ cat bugscript.sh
>> # Make a test repo
>> git init testrepo
>> cd testrepo
>> git config user.name A
>> git config user.email B
>>
>> # Add a file called orig
>> echo 'a' > orig
>> git add orig
>> git commit -m'orig'
>>
>> # Copy orig in new and modify orig
>> cp orig new
>> echo 'b' > orig
>>
>> # add -N and then commit trigger the bug
>> git add -N new
>> git commit
>>
>> # Cleanup
>> cd ..
>> rm -rf testrepo
>>
>> andreas@trelitri:/tmp$ LANG=C ./bugscript.sh
>> Initialized empty Git repository in /tmp/testrepo/.git/
>> [master (root-commit) 5dedf30] orig
>>  1 file changed, 0 insertions(+), 0 deletions(-)
>>  create mode 100644 orig
>> BUG: wt-status.c:476: multiple renames on the same target? how?
>> ./bugscript.sh: line 18: 22762 Aborted git commit
> 
> I can't reproduce this on Debian AMD64 either 2.19.0 in unstable, or
> 2.19.0.605.g01d371f741 in experimental. I tried moving my ~/.gitconfig
> out of the way, do you have some config options there that might be
> contributing to this?
> 


pEpkey.asc
Description: application/pgp-keys


Re: Triggering "BUG: wt-status.c:476: multiple renames on the same target? how?"

2018-09-26 Thread Ævar Arnfjörð Bjarmason


On Wed, Sep 26 2018, Andrea Stacchiotti wrote:

> Dear maintainer(s),
>
> the following script, when executed with git 2.19 triggers the bug in
> the subject line.
> The problem seems to be the interaction between add -N and rename detection.
>
> The git binary used is the one currently packaged in Debian unstable.
>
> I have searched the list for the bug text and have found nothing,
> apologies if the bug is already known.
>
> System information, script content and script output follow.
>
> Andrea Stacchiotti
>
> --
>
> andreas@trelitri:/tmp$ uname -a
> Linux trelitri 4.17.0-3-amd64 #1 SMP Debian 4.17.17-1 (2018-08-18)
> x86_64 GNU/Linux
> andreas@trelitri:/tmp$ git --version
> git version 2.19.0
>
> andreas@trelitri:/tmp$ cat bugscript.sh
> # Make a test repo
> git init testrepo
> cd testrepo
> git config user.name A
> git config user.email B
>
> # Add a file called orig
> echo 'a' > orig
> git add orig
> git commit -m'orig'
>
> # Copy orig in new and modify orig
> cp orig new
> echo 'b' > orig
>
> # add -N and then commit trigger the bug
> git add -N new
> git commit
>
> # Cleanup
> cd ..
> rm -rf testrepo
>
> andreas@trelitri:/tmp$ LANG=C ./bugscript.sh
> Initialized empty Git repository in /tmp/testrepo/.git/
> [master (root-commit) 5dedf30] orig
>  1 file changed, 0 insertions(+), 0 deletions(-)
>  create mode 100644 orig
> BUG: wt-status.c:476: multiple renames on the same target? how?
> ./bugscript.sh: line 18: 22762 Aborted git commit

I can't reproduce this on Debian AMD64 either 2.19.0 in unstable, or
2.19.0.605.g01d371f741 in experimental. I tried moving my ~/.gitconfig
out of the way, do you have some config options there that might be
contributing to this?


Subtree bugs

2018-09-26 Thread Strain, Roger L.
I emailed the group yesterday regarding a possible error in the subtree script. 
I've continued debugging and since have been able to reproduce the problem. In 
fact, there are two failure cases which may expose three different problems. To 
demonstrate, the following commands create two repos (main and sub), add a 
variety of commits, perform some merges, splits, and rejoins, and finally 
perform two splits. These two splits generate a commit history which cannot be 
pushed back to the subtree repo in two different manners.

Commands to demonstrate both failures are here:
https://gist.github.com/FoxFireX/1b794384612b7fd5e7cd157cff96269e

Short summary of three problems involved:
1. Split using rejoins fails in some cases where a commit has a parent which 
was a parent commit further upstream from a rejoin, causing a new initial 
commit to be created, which is not related to the original subtree commits.
2. Split using rejoins fails to generate a merge commit which may have triaged 
the previous problem, but instead elected to use only the parent which is not 
connected to the original subtree commits. (This may occur when the commit and 
both parents all share the same subtree hash.)
3. Split ignoring joins also ignores the original add commit, which causes 
content prior to the add to be considered part of the subtree graph, changing 
the commit hashes so it is not connected to the original subtree commits.

I'm going to start looking at how to fix/work around either or both of these 
issues because it's holding up some work for us, but if anyone else has seen 
something like this, or has suggestions, I'd be happy to hear them.  If I do 
come up with a fix, I'll ask for approval to contribute it back.

-- 
Roger Strain


Re: [PATCH 1/8] sha1-array: provide oid_array_filter

2018-09-26 Thread Jeff King
On Wed, Sep 26, 2018 at 01:19:20PM -0700, Junio C Hamano wrote:

> > /**
> > Would we be open to consider another commenting style?
> 
> No.  You still cannot write */ in that comment section, for example,
> so you cannot do "plain text" still---that is not a great trade off
> to deviate from the common comment style used for other stuff.

We can do:

#if HEADER_DOC

Write anything here except #endif!

#endif /* HEADER_DOC */

But I actually think that is more jarring than a big comment (when I am
reading C code, I expect to see C-like things mostly.

I do agree with you that a true plain text file is nicer in terms of
freedom to format, etc. But IMHO the tradeoff is easily worth it to keep
the related content (function declarations and their documentation)
close together.

-Peff


On shipping more of our technical docs as manpages

2018-09-26 Thread Ævar Arnfjörð Bjarmason


[Changing subject because this stopped being about Stefan's patches a
while ago]

On Wed, Sep 26 2018, Jeff King wrote:

> On Wed, Sep 26, 2018 at 08:43:14PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> While we're on that subject, I'd much prefer if these technical docs and
>> other asciidoc-ish stuff we would be manpages build as part of our
>> normal "make doc" target. So e.g. this one would be readable via
>> something like "man 3 gitapi-oid-array".

[responding out of order]

> Now I'll admit it seems like make-work to me because I would not plan to
> ever look at the formatted output myself. But I guess I don't understand
> the audience for this formatted output. These are APIs internal to Git
> itself. We would not generally want to install gitapi-oid-array into
> /usr/share/man, because only people actually working on Git would be
> able to use it. So it sounds like a convenience for a handful of
> developers (who like to look at this manpage versus the source). It
> doesn't seem like the cost/benefit is there.
>
> And if we were going to generate something external, would it make more
> sense to write in a structured format like doxygen? I am not a big fan
> of it myself, but at least from there you can generate a more richly
> interconnected set of documentation.

It's useful to have a single authoritative source for all documentation
that's easy to search through.

I don't really care where that text actually lives, i.e. whether it's
all in *.txt to begin with, or in *.c or *.h and pulled together by some
"make" step, or much how it's formatted, beyond it being easy to work
with. I'm not a big fan of asciidoc, but it's what we've got if we want
formatting, inter-linking etc.

My bias here is that I've also contributed actively to the perl project
in the past, and with that project you can get an overview of *all* of
the docs by typing:

man perl

That includes stuff like perl585delta(1) which we'd stick in
Documentation/RelNotes, and "Internals and C Language Interface". Most
of what we'd put in Documentation/technical/api-* & headers is in
perlapi(1).

I spent an embarrassingly long time submitting patches to git before
discovering that Documentation/technical/api-*.txt even existed, I think
something like 1-2 years ago.

We have very well documented stuff like strbuf.h (mostly thanks to you),
but how is such documentation supposed to be discovered? Something like:

git grep -A20 '^/\*$' -- *.h


?

In terms of getting an overview it's indistinguishable from
comments. I.e. there's nothing like an index of:

man git-api-strbuf ==> working with strings
man git-api-sha1-array ==> list, iterate and binary lookup SHA1s

etc.

Sometimes it's obvious where to look, but as someone who uses most of
these APIs very occasionally (because I contribute less) I keep
(re-)discovering various internal APIs.

I'm also not in the habit of opening the *.h file for something, I
usually just start reading the *.c and only later discover there's some
API docs in the relevant *.h (or not).

So in terms of implementation I'm a big fan of the perl.git approach of
having these docs as comments before the function implementation in the
*.c file.

It means you can't avoid seeing it or updating it when source
spelunking, and it leaves header files short, which is useful when you'd
like to get a general API overview without all the docs. Our documented
headers are quite fat, e.g. strbuf.h is 60% of the size of strbuf.c, but
less than 20% when you strip the docs.

> I'm mildly negative on this, just because it introduces extra work on
> people writing the documentation. Now it has to follow special
> formatting rules, and sometimes the source is uglier (e.g., the horrible
> +-continuation in lists). Are authors now responsible for formatting any
> changes they make to make sure they look good in asciidoc? Or are people
> who care about the formatted output going to come along afterwards and
> submit fixup patches? Either way it seems like make-work.

This part I'm slightly confused by. If you're just saying "let's
document stuff in *.h files in free-flowing text", fine. But if we're
talking about the difference between Documentation/technical/api-*.txt
and having the same stuff in manpages, then the stuff in api-*.txt *is*
already asciidoc, and we have a target to format it all up and ship it
with git, and distros ship that.

E.g. on Debian you can "apt install git-doc" and browse stuff like
file:///usr/share/doc/git-doc/technical/api-argv-array.html which is the
HTML formatted version, same for all the other api-*.txt docs.

We just don't have some central index of those, and they're not a
default target which means the likes of our own https://git-scm.com
don't build them, so they're harder to find.

Every perl installation also ships perlapi and a MB or so of obscure
internal docs to everyone, which makes it easier to find via Google et
al, which I find useful. So I guess I'm more on the side 

Triggering "BUG: wt-status.c:476: multiple renames on the same target? how?"

2018-09-26 Thread Andrea Stacchiotti
Dear maintainer(s),

the following script, when executed with git 2.19 triggers the bug in
the subject line.
The problem seems to be the interaction between add -N and rename detection.

The git binary used is the one currently packaged in Debian unstable.

I have searched the list for the bug text and have found nothing,
apologies if the bug is already known.

System information, script content and script output follow.

Andrea Stacchiotti

--

andreas@trelitri:/tmp$ uname -a
Linux trelitri 4.17.0-3-amd64 #1 SMP Debian 4.17.17-1 (2018-08-18)
x86_64 GNU/Linux
andreas@trelitri:/tmp$ git --version
git version 2.19.0

andreas@trelitri:/tmp$ cat bugscript.sh
# Make a test repo
git init testrepo
cd testrepo
git config user.name A
git config user.email B

# Add a file called orig
echo 'a' > orig
git add orig
git commit -m'orig'

# Copy orig in new and modify orig
cp orig new
echo 'b' > orig

# add -N and then commit trigger the bug
git add -N new
git commit

# Cleanup
cd ..
rm -rf testrepo

andreas@trelitri:/tmp$ LANG=C ./bugscript.sh
Initialized empty Git repository in /tmp/testrepo/.git/
[master (root-commit) 5dedf30] orig
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 orig
BUG: wt-status.c:476: multiple renames on the same target? how?
./bugscript.sh: line 18: 22762 Aborted git commit


pEpkey.asc
Description: application/pgp-keys


Re: [PATCH 1/8] sha1-array: provide oid_array_filter

2018-09-26 Thread Junio C Hamano
Stefan Beller  writes:

>> I agree on both counts.  I just like to read these in plain text
>> while I am coding for Git (or reviewing patches coded for Git).
>>
>> The reason why I have mild preference to D/technical/ over in-header
>> doc is only because I find even these asterisks at the left-side-end
>> distracting; it is not that materials in D/technical could be passed
>> through AsciiDoc.
>
> /**
> Would we be open to consider another commenting style?

No.  You still cannot write */ in that comment section, for example,
so you cannot do "plain text" still---that is not a great trade off
to deviate from the common comment style used for other stuff.

When I say I want plain text, I mean it.  Anything you write in .h
cannot be plain text, unless you make all readers agree with some
convention, and "there is always '*' at the left edge" is such an
acceptable convention can learn to live with, just like everybody
else does ; at least, that is consistent with other comments.

> (I am not seriously proposing unbalanced comments, but for
> longer documenting comments we may want to consider,
> omitting the asterisk?)

If this is not a serious proposal, then I'll stop wasting
everybody's time.

Now it is clear that the only pieces of consensus we got out of this
discussion so far is that we want to add to CodingGuidelines, that
my earlier "something like this" is not the text we want and that we
want everything in the header as comment.

So let's see if we can have usable alternative, apply that and move
on.


GPG signing is bent on WIndows

2018-09-26 Thread Jeffrey Walton
Several weeks ago I updated to the latest Git for Windows (when
prompted by the version check). At the time I noticed:

$ git commit -S -am "Fix unset MAKE variable in test scripts"
gpg: signing failed: No pinentry
gpg: signing failed: No pinentry
error: gpg failed to sign the data
fatal: failed to write commit object

I got to look at it today. On Windows:

$ cat ~/.gnupg/gpg-agent.conf
pinentry-program
/usr/local/MacGPG2/libexec/pinentry-mac.app/Contents/MacOS/pinentry-mac

Jeff


Re: [PATCH 1/8] sha1-array: provide oid_array_filter

2018-09-26 Thread Stefan Beller
> I agree on both counts.  I just like to read these in plain text
> while I am coding for Git (or reviewing patches coded for Git).
>
> The reason why I have mild preference to D/technical/ over in-header
> doc is only because I find even these asterisks at the left-side-end
> distracting; it is not that materials in D/technical could be passed
> through AsciiDoc.

/**
Would we be open to consider another commenting style?
AFAICT the asterisks are a cultural thing and not enforced by
and old compiler not understanding these comments.

Just like ending the comment in an asymetric fashion ;-) */

(I am not seriously proposing unbalanced comments, but for
longer documenting comments we may want to consider,
omitting the asterisk?)


[PATCH v6 7/7] read-cache: load cache entries on worker threads

2018-09-26 Thread Ben Peart
This patch helps address the CPU cost of loading the index by utilizing
the Index Entry Offset Table (IEOT) to divide loading and conversion of
the cache entries across multiple threads in parallel.

I used p0002-read-cache.sh to generate some performance data:

Test w/100,000 files reduced the time by 32.24%
Test w/1,000,000 files reduced the time by -4.77%

Note that on the 1,000,000 files case, multi-threading the cache entry parsing
does not yield a performance win.  This is because the cost to parse the
index extensions in this repo, far outweigh the cost of loading the cache
entries.

The high cost of parsing the index extensions is driven by the cache tree
and the untracked cache extensions. As this is currently the longest pole,
any reduction in this time will reduce the overall index load times so is
worth further investigation in another patch series.

Signed-off-by: Ben Peart 
---
 read-cache.c | 224 +++
 1 file changed, 189 insertions(+), 35 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 9b0554d4e6..f5d766088d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1720,7 +1720,8 @@ int read_index(struct index_state *istate)
return read_index_from(istate, get_index_file(), get_git_dir());
 }
 
-static struct cache_entry *create_from_disk(struct index_state *istate,
+static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
+   unsigned int version,
struct ondisk_cache_entry *ondisk,
unsigned long *ent_size,
const struct cache_entry 
*previous_ce)
@@ -1737,7 +1738,7 @@ static struct cache_entry *create_from_disk(struct 
index_state *istate,
 * number of bytes to be stripped from the end of the previous name,
 * and the bytes to append to the result, to come up with its name.
 */
-   int expand_name_field = istate->version == 4;
+   int expand_name_field = version == 4;
 
/* On-disk flags are just 16 bits */
flags = get_be16(>flags);
@@ -1761,16 +1762,17 @@ static struct cache_entry *create_from_disk(struct 
index_state *istate,
const unsigned char *cp = (const unsigned char *)name;
size_t strip_len, previous_len;
 
-   previous_len = previous_ce ? previous_ce->ce_namelen : 0;
+   /* If we're at the begining of a block, ignore the previous 
name */
strip_len = decode_varint();
-   if (previous_len < strip_len) {
-   if (previous_ce)
+   if (previous_ce) {
+   previous_len = previous_ce->ce_namelen;
+   if (previous_len < strip_len)
die(_("malformed name field in the index, near 
path '%s'"),
-   previous_ce->name);
-   else
-   die(_("malformed name field in the index in the 
first path"));
+   previous_ce->name);
+   copy_len = previous_len - strip_len;
+   } else {
+   copy_len = 0;
}
-   copy_len = previous_len - strip_len;
name = (const char *)cp;
}
 
@@ -1780,7 +1782,7 @@ static struct cache_entry *create_from_disk(struct 
index_state *istate,
len += copy_len;
}
 
-   ce = mem_pool__ce_alloc(istate->ce_mem_pool, len);
+   ce = mem_pool__ce_alloc(ce_mem_pool, len);
 
ce->ce_stat_data.sd_ctime.sec = get_be32(>ctime.sec);
ce->ce_stat_data.sd_mtime.sec = get_be32(>mtime.sec);
@@ -1950,6 +1952,52 @@ static void *load_index_extensions(void *_data)
return NULL;
 }
 
+/*
+ * A helper function that will load the specified range of cache entries
+ * from the memory mapped file and add them to the given index.
+ */
+static unsigned long load_cache_entry_block(struct index_state *istate,
+   struct mem_pool *ce_mem_pool, int offset, int nr, const 
char *mmap,
+   unsigned long start_offset, const struct cache_entry 
*previous_ce)
+{
+   int i;
+   unsigned long src_offset = start_offset;
+
+   for (i = offset; i < offset + nr; i++) {
+   struct ondisk_cache_entry *disk_ce;
+   struct cache_entry *ce;
+   unsigned long consumed;
+
+   disk_ce = (struct ondisk_cache_entry *)(mmap + src_offset);
+   ce = create_from_disk(ce_mem_pool, istate->version, disk_ce, 
, previous_ce);
+   set_index_entry(istate, i, ce);
+
+   src_offset += consumed;
+   previous_ce = ce;
+   }
+   return src_offset - start_offset;
+}
+
+static unsigned long load_all_cache_entries(struct index_state *istate,
+  

[PATCH v6 6/7] ieot: add Index Entry Offset Table (IEOT) extension

2018-09-26 Thread Ben Peart
This patch enables addressing the CPU cost of loading the index by adding
additional data to the index that will allow us to efficiently multi-
thread the loading and conversion of cache entries.

It accomplishes this by adding an (optional) index extension that is a
table of offsets to blocks of cache entries in the index file.  To make
this work for V4 indexes, when writing the cache entries, it periodically
"resets" the prefix-compression by encoding the current entry as if the
path name for the previous entry is completely different and saves the
offset of that entry in the IEOT.  Basically, with V4 indexes, it
generates offsets into blocks of prefix-compressed entries.

Signed-off-by: Ben Peart 
---
 Documentation/technical/index-format.txt |  18 +++
 read-cache.c | 166 +++
 2 files changed, 184 insertions(+)

diff --git a/Documentation/technical/index-format.txt 
b/Documentation/technical/index-format.txt
index 6bc2d90f7f..7c4d67aa6a 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -337,3 +337,21 @@ The remaining data of each directory block is grouped by 
type:
 
SHA-1("TREE" +  +
"REUC" + )
+
+== Index Entry Offset Table
+
+  The Index Entry Offset Table (IEOT) is used to help address the CPU
+  cost of loading the index by enabling multi-threading the process of
+  converting cache entries from the on-disk format to the in-memory format.
+  The signature for this extension is { 'I', 'E', 'O', 'T' }.
+
+  The extension consists of:
+
+  - 32-bit version (currently 1)
+
+  - A number of index offset entries each consisting of:
+
+- 32-bit offset from the begining of the file to the first cache entry
+   in this block of entries.
+
+- 32-bit count of cache entries in this block
diff --git a/read-cache.c b/read-cache.c
index 8da21c9273..9b0554d4e6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -45,6 +45,7 @@
 #define CACHE_EXT_UNTRACKED 0x554E5452   /* "UNTR" */
 #define CACHE_EXT_FSMONITOR 0x46534D4E   /* "FSMN" */
 #define CACHE_EXT_ENDOFINDEXENTRIES 0x454F4945 /* "EOIE" */
+#define CACHE_EXT_INDEXENTRYOFFSETTABLE 0x49454F54 /* "IEOT" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
@@ -1696,6 +1697,7 @@ static int read_index_extension(struct index_state 
*istate,
read_fsmonitor_extension(istate, data, sz);
break;
case CACHE_EXT_ENDOFINDEXENTRIES:
+   case CACHE_EXT_INDEXENTRYOFFSETTABLE:
/* already handled in do_read_index() */
break;
default:
@@ -1888,6 +1890,23 @@ static size_t estimate_cache_size(size_t ondisk_size, 
unsigned int entries)
return ondisk_size + entries * per_entry;
 }
 
+struct index_entry_offset
+{
+   /* starting byte offset into index file, count of index entries in this 
block */
+   int offset, nr;
+};
+
+struct index_entry_offset_table
+{
+   int nr;
+   struct index_entry_offset entries[0];
+};
+
+#ifndef NO_PTHREADS
+static struct index_entry_offset_table *read_ieot_extension(const char *mmap, 
size_t mmap_size, size_t offset);
+static void write_ieot_extension(struct strbuf *sb, struct 
index_entry_offset_table *ieot);
+#endif
+
 static size_t read_eoie_extension(const char *mmap, size_t mmap_size);
 static void write_eoie_extension(struct strbuf *sb, git_hash_ctx 
*eoie_context, size_t offset);
 
@@ -1931,6 +1950,15 @@ static void *load_index_extensions(void *_data)
return NULL;
 }
 
+/*
+ * Mostly randomly chosen maximum thread counts: we
+ * cap the parallelism to online_cpus() threads, and we want
+ * to have at least 1 cache entries per thread for it to
+ * be worth starting a thread.
+ */
+
+#define THREAD_COST(1)
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -2523,6 +2551,9 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
int drop_cache_tree = istate->drop_cache_tree;
off_t offset;
+   int ieot_work = 1;
+   struct index_entry_offset_table *ieot = NULL;
+   int nr;
 
for (i = removed = extended = 0; i < entries; i++) {
if (cache[i]->ce_flags & CE_REMOVE)
@@ -2556,7 +2587,33 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
if (ce_write(, newfd, , sizeof(hdr)) < 0)
return -1;
 
+#ifndef NO_PTHREADS
+   if (!strip_extensions && (nr = git_config_get_index_threads()) != 1) {
+   int ieot_blocks, cpus;
+
+   /*
+* ensure default number of ieot blocks maps evenly to the
+* default number of threads that will process them

[PATCH v6 5/7] read-cache: load cache extensions on a worker thread

2018-09-26 Thread Ben Peart
This patch helps address the CPU cost of loading the index by loading
the cache extensions on a worker thread in parallel with loading the cache
entries.

In some cases, loading the extensions takes longer than loading the
cache entries so this patch utilizes the new EOIE to start the thread to
load the extensions before loading all the cache entries in parallel.

This is possible because the current extensions don't access the cache
entries in the index_state structure so are OK that they don't all exist
yet.

The CACHE_EXT_TREE, CACHE_EXT_RESOLVE_UNDO, and CACHE_EXT_UNTRACKED
extensions don't even get a pointer to the index so don't have access to the
cache entries.

CACHE_EXT_LINK only uses the index_state to initialize the split index.
CACHE_EXT_FSMONITOR only uses the index_state to save the fsmonitor last
update and dirty flags.

I used p0002-read-cache.sh to generate some performance data:

Test w/100,000 files reduced the time by 0.53%
Test w/1,000,000 files reduced the time by 27.78%

Signed-off-by: Ben Peart 
---
 read-cache.c | 97 +++-
 1 file changed, 81 insertions(+), 16 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 80255d3088..8da21c9273 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -23,6 +23,7 @@
 #include "split-index.h"
 #include "utf8.h"
 #include "fsmonitor.h"
+#include "thread-utils.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -1890,6 +1891,46 @@ static size_t estimate_cache_size(size_t ondisk_size, 
unsigned int entries)
 static size_t read_eoie_extension(const char *mmap, size_t mmap_size);
 static void write_eoie_extension(struct strbuf *sb, git_hash_ctx 
*eoie_context, size_t offset);
 
+struct load_index_extensions
+{
+#ifndef NO_PTHREADS
+   pthread_t pthread;
+#endif
+   struct index_state *istate;
+   const char *mmap;
+   size_t mmap_size;
+   unsigned long src_offset;
+};
+
+static void *load_index_extensions(void *_data)
+{
+   struct load_index_extensions *p = _data;
+   unsigned long src_offset = p->src_offset;
+
+   while (src_offset <= p->mmap_size - the_hash_algo->rawsz - 8) {
+   /* After an array of active_nr index entries,
+* there can be arbitrary number of extended
+* sections, each of which is prefixed with
+* extension name (4-byte) and section length
+* in 4-byte network byte order.
+*/
+   uint32_t extsize;
+   memcpy(, p->mmap + src_offset + 4, 4);
+   extsize = ntohl(extsize);
+   if (read_index_extension(p->istate,
+   p->mmap + src_offset,
+   p->mmap + src_offset + 8,
+   extsize) < 0) {
+   munmap((void *)p->mmap, p->mmap_size);
+   die(_("index file corrupt"));
+   }
+   src_offset += 8;
+   src_offset += extsize;
+   }
+
+   return NULL;
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -1900,6 +1941,11 @@ int do_read_index(struct index_state *istate, const char 
*path, int must_exist)
const char *mmap;
size_t mmap_size;
const struct cache_entry *previous_ce = NULL;
+   struct load_index_extensions p;
+   size_t extension_offset = 0;
+#ifndef NO_PTHREADS
+   int nr_threads;
+#endif
 
if (istate->initialized)
return istate->cache_nr;
@@ -1936,6 +1982,30 @@ int do_read_index(struct index_state *istate, const char 
*path, int must_exist)
istate->cache = xcalloc(istate->cache_alloc, sizeof(*istate->cache));
istate->initialized = 1;
 
+   p.istate = istate;
+   p.mmap = mmap;
+   p.mmap_size = mmap_size;
+
+#ifndef NO_PTHREADS
+   nr_threads = git_config_get_index_threads();
+   if (!nr_threads)
+   nr_threads = online_cpus();
+
+   if (nr_threads > 1) {
+   extension_offset = read_eoie_extension(mmap, mmap_size);
+   if (extension_offset) {
+   int err;
+
+   p.src_offset = extension_offset;
+   err = pthread_create(, NULL, 
load_index_extensions, );
+   if (err)
+   die(_("unable to create load_index_extensions 
thread: %s"), strerror(err));
+
+   nr_threads--;
+   }
+   }
+#endif
+
if (istate->version == 4) {
mem_pool_init(>ce_mem_pool,
  
estimate_cache_size_from_compressed(istate->cache_nr));
@@ -1960,22 +2030,17 @@ int do_read_index(struct index_state *istate, const 
char *path, int must_exist)
istate->timestamp.sec = st.st_mtime;
istate->timestamp.nsec = ST_MTIME_NSEC(st);
 
-   

[PATCH v6 4/7] config: add new index.threads config setting

2018-09-26 Thread Ben Peart
Add support for a new index.threads config setting which will be used to
control the threading code in do_read_index().  A value of 0 will tell the
index code to automatically determine the correct number of threads to use.
A value of 1 will make the code single threaded.  A value greater than 1
will set the maximum number of threads to use.

For testing purposes, this setting can be overwritten by setting the
GIT_TEST_INDEX_THREADS= environment variable to a value greater than 0.

Signed-off-by: Ben Peart 
---
 Documentation/config.txt |  7 +++
 config.c | 18 ++
 config.h |  1 +
 t/README |  5 +
 t/t1700-split-index.sh   |  1 +
 5 files changed, 32 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ad0f4510c3..8fd973b76b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2413,6 +2413,13 @@ imap::
The configuration variables in the 'imap' section are described
in linkgit:git-imap-send[1].
 
+index.threads::
+   Specifies the number of threads to spawn when loading the index.
+   This is meant to reduce index load time on multiprocessor machines.
+   Specifying 0 or 'true' will cause Git to auto-detect the number of
+   CPU's and set the number of threads accordingly. Specifying 1 or
+   'false' will disable multithreading. Defaults to 'true'.
+
 index.version::
Specify the version with which new index files should be
initialized.  This does not affect existing repositories.
diff --git a/config.c b/config.c
index 3461993f0a..2ee29f6f86 100644
--- a/config.c
+++ b/config.c
@@ -2289,6 +2289,24 @@ int git_config_get_fsmonitor(void)
return 0;
 }
 
+int git_config_get_index_threads(void)
+{
+   int is_bool, val = 0;
+
+   val = git_env_ulong("GIT_TEST_INDEX_THREADS", 0);
+   if (val)
+   return val;
+
+   if (!git_config_get_bool_or_int("index.threads", _bool, )) {
+   if (is_bool)
+   return val ? 0 : 1;
+   else
+   return val;
+   }
+
+   return 0; /* auto */
+}
+
 NORETURN
 void git_die_config_linenr(const char *key, const char *filename, int linenr)
 {
diff --git a/config.h b/config.h
index ab46e0165d..a06027e69b 100644
--- a/config.h
+++ b/config.h
@@ -250,6 +250,7 @@ extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
 extern int git_config_get_max_percent_split_change(void);
 extern int git_config_get_fsmonitor(void);
+extern int git_config_get_index_threads(void);
 
 /* This dies if the configured or default date is in the future */
 extern int git_config_get_expiry(const char *key, const char **output);
diff --git a/t/README b/t/README
index aa33ac4f26..0fcecf4500 100644
--- a/t/README
+++ b/t/README
@@ -332,6 +332,11 @@ This is used to allow tests 1, 4-9 in t1700-split-index.sh 
to succeed
 as they currently hard code SHA values for the index which are no longer
 valid due to the addition of the EOIE extension.
 
+GIT_TEST_INDEX_THREADS= enables exercising the multi-threaded loading
+of the index for the whole test suite by bypassing the default number of
+cache entries and thread minimums. Settting this to 1 will make the
+index loading single threaded.
+
 Naming Tests
 
 
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 1f168378c8..ab205954cf 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -8,6 +8,7 @@ test_description='split index mode tests'
 sane_unset GIT_TEST_SPLIT_INDEX
 sane_unset GIT_FSMONITOR_TEST
 GIT_TEST_DISABLE_EOIE=true; export GIT_TEST_DISABLE_EOIE
+GIT_TEST_INDEX_THREADS=1; export GIT_TEST_INDEX_THREADS
 
 test_expect_success 'enable split index' '
git config splitIndex.maxPercentChange 100 &&
-- 
2.18.0.windows.1



[PATCH v6 2/7] read-cache: clean up casting and byte decoding

2018-09-26 Thread Ben Peart
This patch does a clean up pass to minimize the casting required to work
with the memory mapped index (mmap).

It also makes the decoding of network byte order more consistent by using
get_be32() where possible.

Signed-off-by: Ben Peart 
---
 read-cache.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 583a4fb1f8..6ba99e2c96 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1650,7 +1650,7 @@ int verify_index_checksum;
 /* Allow fsck to force verification of the cache entry order. */
 int verify_ce_order;
 
-static int verify_hdr(struct cache_header *hdr, unsigned long size)
+static int verify_hdr(const struct cache_header *hdr, unsigned long size)
 {
git_hash_ctx c;
unsigned char hash[GIT_MAX_RAWSZ];
@@ -1674,7 +1674,7 @@ static int verify_hdr(struct cache_header *hdr, unsigned 
long size)
 }
 
 static int read_index_extension(struct index_state *istate,
-   const char *ext, void *data, unsigned long sz)
+   const char *ext, const char *data, unsigned 
long sz)
 {
switch (CACHE_EXT(ext)) {
case CACHE_EXT_TREE:
@@ -1889,8 +1889,8 @@ int do_read_index(struct index_state *istate, const char 
*path, int must_exist)
int fd, i;
struct stat st;
unsigned long src_offset;
-   struct cache_header *hdr;
-   void *mmap;
+   const struct cache_header *hdr;
+   const char *mmap;
size_t mmap_size;
const struct cache_entry *previous_ce = NULL;
 
@@ -1918,7 +1918,7 @@ int do_read_index(struct index_state *istate, const char 
*path, int must_exist)
die_errno("unable to map index file");
close(fd);
 
-   hdr = mmap;
+   hdr = (const struct cache_header *)mmap;
if (verify_hdr(hdr, mmap_size) < 0)
goto unmap;
 
@@ -1943,7 +1943,7 @@ int do_read_index(struct index_state *istate, const char 
*path, int must_exist)
struct cache_entry *ce;
unsigned long consumed;
 
-   disk_ce = (struct ondisk_cache_entry *)((char *)mmap + 
src_offset);
+   disk_ce = (struct ondisk_cache_entry *)(mmap + src_offset);
ce = create_from_disk(istate, disk_ce, , previous_ce);
set_index_entry(istate, i, ce);
 
@@ -1961,21 +1961,20 @@ int do_read_index(struct index_state *istate, const 
char *path, int must_exist)
 * in 4-byte network byte order.
 */
uint32_t extsize;
-   memcpy(, (char *)mmap + src_offset + 4, 4);
-   extsize = ntohl(extsize);
+   extsize = get_be32(mmap + src_offset + 4);
if (read_index_extension(istate,
-(const char *) mmap + src_offset,
-(char *) mmap + src_offset + 8,
+mmap + src_offset,
+mmap + src_offset + 8,
 extsize) < 0)
goto unmap;
src_offset += 8;
src_offset += extsize;
}
-   munmap(mmap, mmap_size);
+   munmap((void *)mmap, mmap_size);
return istate->cache_nr;
 
 unmap:
-   munmap(mmap, mmap_size);
+   munmap((void *)mmap, mmap_size);
die("index file corrupt");
 }
 
-- 
2.18.0.windows.1



[PATCH v6 1/7] read-cache.c: optimize reading index format v4

2018-09-26 Thread Ben Peart
From: Nguyễn Thái Ngọc Duy 

Index format v4 requires some more computation to assemble a path
based on a previous one. The current code is not very efficient
because

 - it doubles memory copy, we assemble the final path in a temporary
   first before putting it back to a cache_entry

 - strbuf_remove() in expand_name_field() is not exactly a good fit
   for stripping a part at the end, _setlen() would do the same job
   and is much cheaper.

 - the open-coded loop to find the end of the string in
   expand_name_field() can't beat an optimized strlen()

This patch avoids the temporary buffer and writes directly to the new
cache_entry, which addresses the first two points. The last point
could also be avoided if the total string length fits in the first 12
bits of ce_flags, if not we fall back to strlen().

Running "test-tool read-cache 100" on webkit.git (275k files), reading
v2 only takes 4.226 seconds, while v4 takes 5.711 seconds, 35% more
time. The patch reduces read time on v4 to 4.319 seconds.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 read-cache.c | 128 ---
 1 file changed, 60 insertions(+), 68 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 8d04d78a58..583a4fb1f8 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1713,63 +1713,24 @@ int read_index(struct index_state *istate)
return read_index_from(istate, get_index_file(), get_git_dir());
 }
 
-static struct cache_entry *cache_entry_from_ondisk(struct mem_pool *mem_pool,
-  struct ondisk_cache_entry 
*ondisk,
-  unsigned int flags,
-  const char *name,
-  size_t len)
-{
-   struct cache_entry *ce = mem_pool__ce_alloc(mem_pool, len);
-
-   ce->ce_stat_data.sd_ctime.sec = get_be32(>ctime.sec);
-   ce->ce_stat_data.sd_mtime.sec = get_be32(>mtime.sec);
-   ce->ce_stat_data.sd_ctime.nsec = get_be32(>ctime.nsec);
-   ce->ce_stat_data.sd_mtime.nsec = get_be32(>mtime.nsec);
-   ce->ce_stat_data.sd_dev   = get_be32(>dev);
-   ce->ce_stat_data.sd_ino   = get_be32(>ino);
-   ce->ce_mode  = get_be32(>mode);
-   ce->ce_stat_data.sd_uid   = get_be32(>uid);
-   ce->ce_stat_data.sd_gid   = get_be32(>gid);
-   ce->ce_stat_data.sd_size  = get_be32(>size);
-   ce->ce_flags = flags & ~CE_NAMEMASK;
-   ce->ce_namelen = len;
-   ce->index = 0;
-   hashcpy(ce->oid.hash, ondisk->sha1);
-   memcpy(ce->name, name, len);
-   ce->name[len] = '\0';
-   return ce;
-}
-
-/*
- * Adjacent cache entries tend to share the leading paths, so it makes
- * sense to only store the differences in later entries.  In the v4
- * on-disk format of the index, each on-disk cache entry stores the
- * number of bytes to be stripped from the end of the previous name,
- * and the bytes to append to the result, to come up with its name.
- */
-static unsigned long expand_name_field(struct strbuf *name, const char *cp_)
-{
-   const unsigned char *ep, *cp = (const unsigned char *)cp_;
-   size_t len = decode_varint();
-
-   if (name->len < len)
-   die("malformed name field in the index");
-   strbuf_remove(name, name->len - len, len);
-   for (ep = cp; *ep; ep++)
-   ; /* find the end */
-   strbuf_add(name, cp, ep - cp);
-   return (const char *)ep + 1 - cp_;
-}
-
-static struct cache_entry *create_from_disk(struct mem_pool *mem_pool,
+static struct cache_entry *create_from_disk(struct index_state *istate,
struct ondisk_cache_entry *ondisk,
unsigned long *ent_size,
-   struct strbuf *previous_name)
+   const struct cache_entry 
*previous_ce)
 {
struct cache_entry *ce;
size_t len;
const char *name;
unsigned int flags;
+   size_t copy_len;
+   /*
+* Adjacent cache entries tend to share the leading paths, so it makes
+* sense to only store the differences in later entries.  In the v4
+* on-disk format of the index, each on-disk cache entry stores the
+* number of bytes to be stripped from the end of the previous name,
+* and the bytes to append to the result, to come up with its name.
+*/
+   int expand_name_field = istate->version == 4;
 
/* On-disk flags are just 16 bits */
flags = get_be16(>flags);
@@ -1789,21 +1750,54 @@ static struct cache_entry *create_from_disk(struct 
mem_pool *mem_pool,
else
name = ondisk->name;
 
-   if (!previous_name) {
-   /* v3 and earlier */
-   if (len == CE_NAMEMASK)
-   len = strlen(name);
-   ce = cache_entry_from_ondisk(mem_pool, 

[PATCH v6 3/7] eoie: add End of Index Entry (EOIE) extension

2018-09-26 Thread Ben Peart
The End of Index Entry (EOIE) is used to locate the end of the variable
length index entries and the beginning of the extensions. Code can take
advantage of this to quickly locate the index extensions without having
to parse through all of the index entries.

Because it must be able to be loaded before the variable length cache
entries and other index extensions, this extension must be written last.
The signature for this extension is { 'E', 'O', 'I', 'E' }.

The extension consists of:

- 32-bit offset to the end of the index entries

- 160-bit SHA-1 over the extension types and their sizes (but not
their contents).  E.g. if we have "TREE" extension that is N-bytes
long, "REUC" extension that is M-bytes long, followed by "EOIE",
then the hash would be:

SHA-1("TREE" +  +
"REUC" + )

Signed-off-by: Ben Peart 
---
 Documentation/technical/index-format.txt |  23 
 read-cache.c | 151 +--
 t/README |   5 +
 t/t1700-split-index.sh   |   1 +
 4 files changed, 172 insertions(+), 8 deletions(-)

diff --git a/Documentation/technical/index-format.txt 
b/Documentation/technical/index-format.txt
index db3572626b..6bc2d90f7f 100644
--- a/Documentation/technical/index-format.txt
+++ b/Documentation/technical/index-format.txt
@@ -314,3 +314,26 @@ The remaining data of each directory block is grouped by 
type:
 
   - An ewah bitmap, the n-th bit indicates whether the n-th index entry
 is not CE_FSMONITOR_VALID.
+
+== End of Index Entry
+
+  The End of Index Entry (EOIE) is used to locate the end of the variable
+  length index entries and the begining of the extensions. Code can take
+  advantage of this to quickly locate the index extensions without having
+  to parse through all of the index entries.
+
+  Because it must be able to be loaded before the variable length cache
+  entries and other index extensions, this extension must be written last.
+  The signature for this extension is { 'E', 'O', 'I', 'E' }.
+
+  The extension consists of:
+
+  - 32-bit offset to the end of the index entries
+
+  - 160-bit SHA-1 over the extension types and their sizes (but not
+   their contents).  E.g. if we have "TREE" extension that is N-bytes
+   long, "REUC" extension that is M-bytes long, followed by "EOIE",
+   then the hash would be:
+
+   SHA-1("TREE" +  +
+   "REUC" + )
diff --git a/read-cache.c b/read-cache.c
index 6ba99e2c96..80255d3088 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -43,6 +43,7 @@
 #define CACHE_EXT_LINK 0x6c696e6b/* "link" */
 #define CACHE_EXT_UNTRACKED 0x554E5452   /* "UNTR" */
 #define CACHE_EXT_FSMONITOR 0x46534D4E   /* "FSMN" */
+#define CACHE_EXT_ENDOFINDEXENTRIES 0x454F4945 /* "EOIE" */
 
 /* changes that can be kept in $GIT_DIR/index (basically all extensions) */
 #define EXTMASK (RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED | \
@@ -1693,6 +1694,9 @@ static int read_index_extension(struct index_state 
*istate,
case CACHE_EXT_FSMONITOR:
read_fsmonitor_extension(istate, data, sz);
break;
+   case CACHE_EXT_ENDOFINDEXENTRIES:
+   /* already handled in do_read_index() */
+   break;
default:
if (*ext < 'A' || 'Z' < *ext)
return error("index uses %.4s extension, which we do 
not understand",
@@ -1883,6 +1887,9 @@ static size_t estimate_cache_size(size_t ondisk_size, 
unsigned int entries)
return ondisk_size + entries * per_entry;
 }
 
+static size_t read_eoie_extension(const char *mmap, size_t mmap_size);
+static void write_eoie_extension(struct strbuf *sb, git_hash_ctx 
*eoie_context, size_t offset);
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
@@ -2190,11 +2197,15 @@ static int ce_write(git_hash_ctx *context, int fd, void 
*data, unsigned int len)
return 0;
 }
 
-static int write_index_ext_header(git_hash_ctx *context, int fd,
- unsigned int ext, unsigned int sz)
+static int write_index_ext_header(git_hash_ctx *context, git_hash_ctx 
*eoie_context,
+ int fd, unsigned int ext, unsigned int sz)
 {
ext = htonl(ext);
sz = htonl(sz);
+   if (eoie_context) {
+   the_hash_algo->update_fn(eoie_context, , 4);
+   the_hash_algo->update_fn(eoie_context, , 4);
+   }
return ((ce_write(context, fd, , 4) < 0) ||
(ce_write(context, fd, , 4) < 0)) ? -1 : 0;
 }
@@ -2437,7 +2448,7 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
 {
uint64_t start = getnanotime();
int newfd = tempfile->fd;
-   git_hash_ctx c;
+   git_hash_ctx c, eoie_c;
struct cache_header hdr;
int i, err = 0, removed, extended, hdr_version;
struct 

[PATCH v6 0/7] speed up index load through parallelization

2018-09-26 Thread Ben Peart


Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/a0300882d4
Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v6 
&& git checkout a0300882d4


This iteration brings back the Index Entry Offset Table (IEOT) extension
which enables us to multi-thread the cache entry parsing without having
the primary thread have to scan all the entries first.  In cases where the
cache entry parsing is the most expensive part, this yields some additional
savings.

Using p0002-read-cache.sh to generate some performance numbers shows how
each of the various patches contribute to the overall performance win.


Test w/100,000 filesBaseline  Optimize V4Extensions Entries

0002.1: read_cache  22.36 18.74 -16.2%   18.64 -16.6%   12.63 -43.5%

Test w/1,000,000 files  Baseline  Optimize V4Extensions Entries
-
0002.1: read_cache  304.40270.70 -11.1%  195.50 -35.8%  204.82 -32.7%

Note that on the 1,000,000 files case, multi-threading the cache entry parsing
does not yield a performance win.  This is because the cost to parse the
index extensions in this repo, far outweigh the cost of loading the cache
entries.

NameFirstLast Elapsed   
load_index_extensions() 629.001  870.244  241.243   
load_cache_entries_thread() 683.911  723.199  39.288
load_cache_entries_thread() 686.206  723.512  37.306
load_cache_entries_thread() 686.43   722.596  36.166
load_cache_entries_thread() 684.998  718.74   33.742
load_cache_entries_thread() 685.035  718.698  33.663
load_cache_entries_thread() 686.557  709.545  22.988
load_cache_entries_thread() 684.533  703.536  19.003
load_cache_entries_thread() 684.537  703.521  18.984
load_cache_entries_thread() 685.062  703.774  18.712
load_cache_entries_thread() 685.42   703.416  17.996
load_cache_entries_thread() 648.604  664.496  15.892

293.74 Total load_cache_entries_thread()

The high cost of parsing the index extensions is driven by the cache tree
and the untracked cache extensions. As this is currently the longest pole,
any reduction in this time will reduce the overall index load times so is
worth further investigation in another patch series.

Name        FirstLast Elapsed
|   + git!read_index_extension  684.052  870.244  186.192
|    + git!cache_tree_read  684.052  797.801  113.749
|    + git!read_untracked_extension 797.801  870.244  72.443

One option would be to load each extension on a separate thread but I
believe that is overkill for the vast majority of repos.  Instead, some
optimization of the loading code for these two extensions is probably worth
looking into as a quick examination shows that the bulk of the time for both
of them is spent in xcalloc().


### Patches

Ben Peart (6):
  read-cache: clean up casting and byte decoding
  eoie: add End of Index Entry (EOIE) extension
  config: add new index.threads config setting
  read-cache: load cache extensions on a worker thread
  ieot: add Index Entry Offset Table (IEOT) extension
  read-cache: load cache entries on worker threads

Nguyễn Thái Ngọc Duy (1):
  read-cache.c: optimize reading index format v4

 Documentation/config.txt |   7 +
 Documentation/technical/index-format.txt |  41 ++
 config.c |  18 +
 config.h |   1 +
 read-cache.c | 741 +++
 t/README |  10 +
 t/t1700-split-index.sh   |   2 +
 7 files changed, 705 insertions(+), 115 deletions(-)


base-commit: fe8321ec057f9231c26c29b364721568e58040f7
-- 
2.18.0.windows.1




Re: [PATCH 1/8] sha1-array: provide oid_array_filter

2018-09-26 Thread Junio C Hamano
Jeff King  writes:

> Now I'll admit it seems like make-work to me because I would not plan to
> ever look at the formatted output myself. But I guess I don't understand
> the audience for this formatted output. These are APIs internal to Git
> itself. We would not generally want to install gitapi-oid-array into
> /usr/share/man, because only people actually working on Git would be
> able to use it. So it sounds like a convenience for a handful of
> developers (who like to look at this manpage versus the source). It
> doesn't seem like the cost/benefit is there.
>
> And if we were going to generate something external, would it make more
> sense to write in a structured format like doxygen? I am not a big fan
> of it myself, but at least from there you can generate a more richly
> interconnected set of documentation.

I agree on both counts.  I just like to read these in plain text
while I am coding for Git (or reviewing patches coded for Git).  

The reason why I have mild preference to D/technical/ over in-header
doc is only because I find even these asterisks at the left-side-end
distracting; it is not that materials in D/technical could be passed
through AsciiDoc.



Re: [PATCH 1/8] sha1-array: provide oid_array_filter

2018-09-26 Thread Stefan Beller
On Wed, Sep 26, 2018 at 11:58 AM Jeff King  wrote:

> Now I'll admit it seems like make-work to me because I would not plan to
> ever look at the formatted output myself. But I guess I don't understand
> the audience for this formatted output.

I have been watching from the side lines so far, as I have a view that is very
much like Peffs, if I were to dictate my opinion onto the project, we'd:
* put all internal API docs into headerfiles.
* get rid of all Documentation/technical/ that describes things
   at a function level. So the remaining things are high level docs such
   as protocol, hash transition plan, partial clones...

But I'd want to understand why we are not doing this (obvious to me)
best thing, I have the impression other people use the docs differently.

How are these docs (for example api-oid-array or
api-revision-walking) used?

I usually tend to use git-grep a lot when looking for a function, just to
jump to the header file and read the comment there and go around the
header file looking if we have a better suited thing.
(Also I tend to like to have fewer files open, such that I prefer
a header file of reasonable size over 2 files, one docs and one header)

So when I find a function documented in Docs/technical, I would first
read there, then go to the header to see the signature and then make
use of it.

If I recall an earlier discussion on this topic, I recall Junio saying that
this *may* pollute the header files, as for example sha1-array.h
is easy to grok in its entirety, and makes it easy to see what
functions regarding oid_arrays are available. A counter example would
be hashmap.h, as that requires some scrolling and skimming.

> These are APIs internal to Git
> itself. We would not generally want to install gitapi-oid-array into
> /usr/share/man, because only people actually working on Git would be
> able to use it. So it sounds like a convenience for a handful of
> developers (who like to look at this manpage versus the source). It
> doesn't seem like the cost/benefit is there.

If this type of docs makes Ævar more productive, I am all for keeping it.

> And if we were going to generate something external, would it make more
> sense to write in a structured format like doxygen? I am not a big fan
> of it myself, but at least from there you can generate a more richly
> interconnected set of documentation.

That sounds sensible to me as the additional burden of yet-another-tool
is only imposed to the developers, but doxygen would provide
a better output in my experience.

Stefan


Re: [PATCH] help: allow redirecting to help for aliased command

2018-09-26 Thread Junio C Hamano
Jeff King  writes:

>> When you have "[alias] cp = cherry-pick -n", "git cp --help" should
>> not do "git help cherry-pick".  Only a single word that exactly
>> matches a git command should get this treatment.
>
> I'm not sure I agree. A plausible scenario (under the rules I gave
> above) is:
>
>   $ git cp -h
>   'cp' is aliased to 'cherry-pick -n'
>   usage: git cherry-pick ...

With that additional rule, I can buy "it is fine for 'git cp --help'
to completely ignore -n and behave as if 'git help cherry-pick' was
given", I think.  People already expect "git cp --help" to give the
alias expansion, so to them any change will be a regression any way
we cut it---but I think this is the least bad approach.

>   $ git cp --help
>
> I.e., you already know the "-n" part, and now you want to dig further.

One very good thing about the "make '--help' go directly to the
manpage, while teaching '-h' to report also alias expansion" is that
people already expect "-h" is more concise than "--help".  The
current output from "git cp --help" violates that expectation, and
the change you suggest rectifies it.

> Of course one could just type "git cherry-pick --help" since you also
> know that, too.

Yeah, but that is not an argument.  The user aliased cp because
cherry-pick was quite a mouthful and do not want to type "git
cherry-pick --help" in the first place.


Re: t7005-editor.sh failure

2018-09-26 Thread Andrei Rybak
On 2018-09-26 21:16, Junio C Hamano wrote:
> While at it, update the way to creatd these scripts to use the

s/creatd/create/

Or rewording as "... update the way these scripts are created ..."

> write_script wrapper, so that we do not have to worry about writing
> the she-bang line and making the result executable.


Re: [PATCH] t7005-editor: quote filename to fix whitespace-issue

2018-09-26 Thread Jeff King
On Wed, Sep 26, 2018 at 06:14:11PM +0200, Martin Ågren wrote:

> diff --git a/t/t7005-editor.sh b/t/t7005-editor.sh
> index b2ca77b338..5fcf281dfb 100755
> --- a/t/t7005-editor.sh
> +++ b/t/t7005-editor.sh
> @@ -112,7 +112,7 @@ do
>  done
>  
>  test_expect_success 'editor with a space' '
> - echo "echo space >\$1" >"e space.sh" &&
> + echo "echo space >\"\$1\"" >"e space.sh" &&
>   chmod a+x "e space.sh" &&
>   GIT_EDITOR="./e\ space.sh" git commit --amend &&

I was actually puzzled how SHELL_PATH matters here at all, since the
resulting script does not mention it.

What happens is that we first try to execve("./e space.sh"), can get
ENOEXEC. And then we resort to passing it to the shell, which then uses
historical shell magic (which apparently predates the invention of #!
entirely!) to decide to run it as a script using the current shell. And
that shell is selected via the SHELL_PATH at build time (and not the
$SHELL_PATH we have in our environment here).

So I think this fix and the explanation are correct. I do think it would
be a lot less subtle (and a lot more readable) as:

  write_script "e space.sh" <<-\EOF &&
  echo space >"$1"
  EOF

but that is orthogonal to what you're fixing.

-Peff


Re: t7005-editor.sh failure

2018-09-26 Thread Junio C Hamano
Junio C Hamano  writes:

> SZEDER Gábor  writes:
>
>> Having said all that, I didn't omit the quotes in 4362da078e with the
>> above in mind; in fact I tend to use quotes even when they are
>> unnecessary (e.g. in variable assignments: var="$1"), because unquoted
>> variables and command substitutions freak me out before I can think
>> through whether its safe to omit the quotes or not :)
>
> I quote >"$file" (but not var=$var) because the CodingGuidelines
> tells me to:
>
>  - Redirection operators should be written with space before, but no
>space after them.  In other words, write 'echo test >"$file"'
>instead of 'echo test> $file' or 'echo test > $file'.  Note that
>even though it is not required by POSIX to double-quote the
>redirection target in a variable (as shown above), our code does so
>because some versions of bash issue a warning without the quotes.
>
> ;-)
>
>> Sidenote: this test should use the write_script helper to create this
>> editor script.
>
> Good suggestion.

Perhaps like this.

-- >8 --
Subject: t7005: make sure it passes under /bin/bash

In POSIX.1 compliant shells, you should be able to use a variable
reference without quoting for the target of the redirection, e.g.

echo foo >$file
echo bar >$1

without fear of substitution of $file getting split at $IFS.
However, some versions of bash throws a warning, especially when
they are invoked as /bin/bash (not as /bin/sh).  Those who build
with SHELL_PATH=/bin/bash and run t/t7005-editor.sh triggers an
unnecessary failure due to this issue.

Fix it by making sure that the generated "editor" script quotes the
target of redirection.  

While at it, update the way to creatd these scripts to use the
write_script wrapper, so that we do not have to worry about writing
the she-bang line and making the result executable.

Reported-by: Alexander Pyhalov 
Suggested-by: SZEDER Gábor 
Signed-off-by: Junio C Hamano 
---
 t/t7005-editor.sh | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/t/t7005-editor.sh b/t/t7005-editor.sh
index b2ca77b338..b0c4cc4ca0 100755
--- a/t/t7005-editor.sh
+++ b/t/t7005-editor.sh
@@ -20,11 +20,9 @@ fi
 
 for i in GIT_EDITOR core_editor EDITOR VISUAL $vi
 do
-   cat >e-$i.sh <<-EOF
-   #!$SHELL_PATH
+   write_script "e-$i.sh" <<-EOF
echo "Edited by $i" >"\$1"
EOF
-   chmod +x e-$i.sh
 done
 
 if ! test -z "$vi"
@@ -112,8 +110,9 @@ do
 done
 
 test_expect_success 'editor with a space' '
-   echo "echo space >\$1" >"e space.sh" &&
-   chmod a+x "e space.sh" &&
+   write_script "e space.sh" <<-\EOF &&
+   echo space >"$1"
+   EOF
GIT_EDITOR="./e\ space.sh" git commit --amend &&
test space = "$(git show -s --pretty=format:%s)"
 


Re: [PATCH v3 1/7] prio-queue: add 'peek' operation

2018-09-26 Thread Derrick Stolee

On 9/21/2018 1:39 PM, Derrick Stolee via GitGitGadget wrote:

From: Derrick Stolee 

When consuming a priority queue, it can be convenient to inspect
the next object that will be dequeued without actually dequeueing
it. Our existing library did not have such a 'peek' operation, so
add it as prio_queue_peek().

Add a reference-level comparison in t/helper/test-prio-queue.c
so this method is exercised by t0009-prio-queue.sh.

Signed-off-by: Derrick Stolee 
---
  prio-queue.c   |  9 +
  prio-queue.h   |  6 ++
  t/helper/test-prio-queue.c | 10 +++---
  3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/prio-queue.c b/prio-queue.c
index a078451872..d3f488cb05 100644
--- a/prio-queue.c
+++ b/prio-queue.c
@@ -85,3 +85,12 @@ void *prio_queue_get(struct prio_queue *queue)
}
return result;
  }
+
+void *prio_queue_peek(struct prio_queue *queue)
+{
+   if (!queue->nr)
+   return NULL;
+   if (!queue->compare)
+   return queue->array[queue->nr - 1].data;
+   return queue->array[0].data;
+}


The second branch here is never run by the test suite, as the only 
consumers never have compare== NULL. I'll add an ability to test this 
"stack" behavior into t0009-prio-queue.sh.


-Stolee



Re: Git Test Coverage Report (Tuesday, Sept 25)

2018-09-26 Thread Elijah Newren
On Tue, Sep 25, 2018 at 11:44 AM Derrick Stolee  wrote:
> c3b9bc94b9b (Elijah Newren  2018-09-05 10:03:07 -0700
> 181)options.filter = xstrdup(value);

I blame Duy; he was the one who wanted me to look for and remove stray
semicolons in additional places[1].  ;-)

All joking aside, and even though I know nothing of remote-curl.c
beyond the fact that it used to have an excessive number of
semicolons, I think it's awesome that you're running a report like
this.


Elijah

[1] 
https://public-inbox.org/git/cacsjy8cf5+3+6ydwe4y4wylze4y6naw-pj134ktpxm+wywb...@mail.gmail.com/


Re: [PATCH 1/8] sha1-array: provide oid_array_filter

2018-09-26 Thread Jeff King
On Wed, Sep 26, 2018 at 08:43:14PM +0200, Ævar Arnfjörð Bjarmason wrote:

> While we're on that subject, I'd much prefer if these technical docs and
> other asciidoc-ish stuff we would be manpages build as part of our
> normal "make doc" target. So e.g. this one would be readable via
> something like "man 3 gitapi-oid-array".

I'm mildly negative on this, just because it introduces extra work on
people writing the documentation. Now it has to follow special
formatting rules, and sometimes the source is uglier (e.g., the horrible
+-continuation in lists). Are authors now responsible for formatting any
changes they make to make sure they look good in asciidoc? Or are people
who care about the formatted output going to come along afterwards and
submit fixup patches? Either way it seems like make-work.

Now I'll admit it seems like make-work to me because I would not plan to
ever look at the formatted output myself. But I guess I don't understand
the audience for this formatted output. These are APIs internal to Git
itself. We would not generally want to install gitapi-oid-array into
/usr/share/man, because only people actually working on Git would be
able to use it. So it sounds like a convenience for a handful of
developers (who like to look at this manpage versus the source). It
doesn't seem like the cost/benefit is there.

And if we were going to generate something external, would it make more
sense to write in a structured format like doxygen? I am not a big fan
of it myself, but at least from there you can generate a more richly
interconnected set of documentation.

-Peff


Re: Git Test Coverage Report (Tuesday, Sept 25)

2018-09-26 Thread Derrick Stolee

On 9/26/2018 2:43 PM, Thomas Gummerer wrote:

On 09/26, Derrick Stolee wrote:

This is a bit tricky to do, but I will investigate. For some things, the
values can conflict with each other (GIT_TEST_SPLIT_INDEX doesn't play
nicely with other index options, I think).

Just commenting on this point.  I think all the index options should
be playing nicely with eachother.  I occasionally run the test suite
with some of them turned on, and if something failed that was always
an actual bug.  The different modes can be used in different
combinations in the wild as well, so we should get them to interact
nicely in the test suite.


Thanks! I'm still working out details on this, since the test suite is 
broken with GIT_TEST_COMMIT_GRAPH due to a verbosity issue [1], which I 
forgot until I ran the tests with the variable on.


I'll re-run with these variables:

GIT_TEST_SPLIT_INDEX=1

GIT_TEST_FULL_IN_PACK_ARRAY=1

GIT_TEST_OE_SIZE=128

GIT_TEST_OE_DELTA_SIZE=128

GIT_TEST_VALIDATE_INDEX_CACHE_ENTRIES=1

GIT_TEST_FSMONITOR=$PWD/t/t7519/fsmonitor-all

GIT_TEST_INDEX_VERSION=4

GIT_TEST_PRELOAD_INDEX=1

GIT_TEST_DISABLE_EOIE=1

GIT_TEST_INDEX_THREADS=1

Thanks,

-Stolee

[1] 
https://public-inbox.org/git/60aae3d6-35b2-94fb-afd7-6978e935a...@gmail.com/




Re: [PATCH] help: allow redirecting to help for aliased command

2018-09-26 Thread Jeff King
On Wed, Sep 26, 2018 at 08:16:36AM -0700, Junio C Hamano wrote:

> > This introduces a help.followAlias config option that transparently
> > redirects to (the first word of) the alias text (provided of course it
> > is not a shell command), similar to the option for autocorrect of
> > misspelled commands.
> 
> While I do agree with you that it would sometimes be very handy if
> "git cp --help" behaved identically to "git cherry-pick --help" just
> like "git cp -h" behaves identically to "git cherry-pick -h" when
> you have "[alias] cp = cherry-pick", I do not think help.followAlias
> configuration is a good idea.  I may know, perhaps because I use it
> all the time, by heart that "cp" is aliased to "cherry-pick" and
> want "git cp --help" to directly give me the manpage, but I may not
> remember if "co" was commit or checkout and want to be concisely
> told that it is aliased to checkout without seeing the full manpage.
> Which means you'd want some way to command line override anyway, and
> having to say "git -c help.followAlias=false cp --help" is not a
> great solution.
> 
> If we expect users to use "git cp --help" a lot more often than "git
> help cp" (or the other way around), one way to give a nicer experience
> may be to unconditionally make "git cp --help" to directly show the
> manpage of cherry-pick, while keeping "git help cp" to never do
> that.  Then those who want to remember what "co" is aliased to can
> ask "git help co".

I like that direction much better. I also wondered if we could leverage
the "-h" versus "--help" distinction. The problem with printing the
alias definition along with "--help" is that the latter will start a
pager that obliterates what we wrote before (and hence all of this delay
trickery).

But for "-h" we generally expect the command to output a usage message.

So what if the rules were:

  - "git help cp" shows "cp is an alias for cherry-pick" (as it does
now)

  - "git cp -h" shows "cp is an alias for cherry-pick", followed by
actually running "cherry-pick -h", which will show the usage
message. For a single-word command that does very little, since the
usage message starts with "cherry-pick". But if your alias is
actually "cp = cherry-pick -n", then it _is_ telling you extra
information. And this could even work with "!" aliases: we define
it, and then it is up to the alias to handle "-h" sensibly.

  - "git cp --help" opens the manpage for cherry-pick. We don't bother
with the alias definition, as it's available through other means
(and thus we skip the obliteration/timing thing totally).

This really only works for non-! aliases. Those would continue to
show the alias definition.


> If you have "[alias] cp = cherry-pick -n", split_cmdline discards
> "-n" and the follow-alias prompt does not even tell you that it did
> so, and you get "git help cherry-pick".  This code somehow expects
> you to know to jump to the section that describes the "--no-commit"
> option.  I do not think that is a reasonable expectation.
> 
> When you have "[alias] cp = cherry-pick -n", "git cp --help" should
> not do "git help cherry-pick".  Only a single word that exactly
> matches a git command should get this treatment.

I'm not sure I agree. A plausible scenario (under the rules I gave
above) is:

  $ git cp -h
  'cp' is aliased to 'cherry-pick -n'
  usage: git cherry-pick ...

  $ git cp --help

I.e., you already know the "-n" part, and now you want to dig further.
Of course one could just type "git cherry-pick --help" since you also
know that, too. But by that rationale, one could already do:

  $ git help cp
  $ git help cherry-pick

without this patch at all.

-Peff


Re: Git Test Coverage Report (Tuesday, Sept 25)

2018-09-26 Thread Derrick Stolee

On 9/26/2018 1:59 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


In an effort to ensure new code is reasonably covered by the test
suite, we now have contrib/coverage-diff.sh to combine the gcov output
from 'make coverage-test ; make coverage-report' with the output from
'git diff A B' to discover _new_ lines of code that are not covered.

This report takes the output of these results after running on four
branches:

         pu: 80e728fa913dc3a1165b6dec9a7afa6052a86325

        jch: 0c10634844314ab89666ed0a1c7d36dde7ac9689

       next: 76f2f5c1e34c4dbef1029e2984c2892894c444ce

     master: fe8321ec057f9231c26c29b364721568e58040f7

master@{1}: 2d3b1c576c85b7f5db1f418907af00ab88e0c303

I ran the test suite on each of these branches on an Ubuntu Linux VM,
and I'm missing some dependencies (like apache, svn, and perforce) so
not all tests are run.

Thanks.


master@{1}..master:

builtin/remote.c
5025425dfff (   Shulhan 2018-09-13 20:18:33 +0700
864)    return error(_("No such remote: '%s'"), name);
commit-reach.c
b67f6b26e35 (Derrick Stolee 2018-09-21 08:05:26 -0700
559)    continue;
b67f6b26e35 (Derrick Stolee 2018-09-21 08:05:26 -0700
569)    from->objects[i].item->flags |= assign_flag;
b67f6b26e35 (Derrick Stolee 2018-09-21 08:05:26 -0700
570)    continue;
b67f6b26e35 (Derrick Stolee 2018-09-21 08:05:26 -0700
576)    result = 0;
b67f6b26e35 (Derrick Stolee 2018-09-21 08:05:26 -0700
577)    goto cleanup;
cat: compat#mingw.c.gcov: No such file or directory
ll-merge.c
d64324cb60e (Torsten Bögershausen   2018-09-12 21:32:02
+0200   379)    marker_size =
DEFAULT_CONFLICT_MARKER_SIZE;
remote-curl.c
c3b9bc94b9b (Elijah Newren  2018-09-05 10:03:07 -0700
181)    options.filter = xstrdup(value);

As I think the data presented here is very valuable, let me ignore
the findings of this specific run (which will be fixed by individual
authors as/if necessary), and focus on the way the data is presented
(which will affect the ease of consumption by authors of future
commits).

These wrapped blame output lines are harder to view.  Can we have
this in plain/text without format=flowed at least?


Perhaps removing the middle columns of data and just " ) 
" would be easier? We could also remove tabs to save space. For 
example:


builtin/remote.c
5025425dfff  864) return error(_("No such remote: '%s'"), name);

commit-reach.c
b67f6b26e35 559) continue;
b67f6b26e35 569) from->objects[i].item->flags |= assign_flag;
b67f6b26e35 570) continue;
b67f6b26e35 576) result = 0;
b67f6b26e35 577) goto cleanup;

ll-merge.c
d64324cb60e 379) marker_size = DEFAULT_CONFLICT_MARKER_SIZE;

remote-curl.c
c3b9bc94b9b  181) options.filter = xstrdup(value);

This does still pad the data by a bit, but should be more readable. Most 
"uncovered" code will be indented at least one level.


We do lose the author information, but keen readers could identify code 
they are interested in by filename and then look up the commit by OID later.




I personally do not mind a monospaced and non-wrapping website, just
I do not mind visiting travis-ci.org for recent results from time to
time.  Others may differ.

There is an error message from "cat" in it, by the way.
Thanks! I'll add an 'if' statement when there is no gcov file. This 
happens for the compat layers that are not compiled in and for the 
t/helper directory, it seems.



preload-index.c
ae9af12287b (Nguyễn Thái Ngọc Duy   2018-09-15 19:56:04
+0200   73) struct progress_data *pd =
p->progress;
ae9af12287b (Nguyễn Thái Ngọc Duy   2018-09-15 19:56:04
+0200   75) pthread_mutex_lock(>mutex);
ae9af12287b (Nguyễn Thái Ngọc Duy   2018-09-15 19:56:04
+0200   76) pd->n += last_nr - nr;
ae9af12287b (Nguyễn Thái Ngọc Duy   2018-09-15 19:56:04
+0200   77) display_progress(pd->progress, pd->n);
ae9af12287b (Nguyễn Thái Ngọc Duy   2018-09-15 19:56:04
+0200   78) pthread_mutex_unlock(>mutex);
ae9af12287b (Nguyễn Thái Ngọc Duy   2018-09-15 19:56:04
+0200   79) last_nr = nr;

I wonder how much we can save the effort that is needed to scan the
output if we somehow coalesce these consecutive lines that are
attributed to the same commit.


It could be possible to group consecutive lines together, but hopefully 
reducing the total data is enough, and we can keep the actual lines visible.


Thanks,

-Stolee



Re: [PATCH 1/8] sha1-array: provide oid_array_filter

2018-09-26 Thread Ævar Arnfjörð Bjarmason


On Wed, Sep 26 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  writes:
>
>> On Wed, Sep 26 2018, Junio C Hamano wrote:
>>
>>> Jeff King  writes:

 Yes, please. I think it prevents exactly this sort of confusion. :)
>>>
>>> CodingGuidelines or SubmittingPatches update, perhaps?
>>>
>>>  Documentation/CodingGuidelines | 6 +-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
>>> index 48aa4edfbd..b54684e807 100644
>>> --- a/Documentation/CodingGuidelines
>>> +++ b/Documentation/CodingGuidelines
>>> @@ -358,7 +358,11 @@ For C programs:
>>> string_list for sorted string lists, a hash map (mapping struct
>>> objects) named "struct decorate", amongst other things.
>>>
>>> - - When you come up with an API, document it.
>>> + - When you come up with an API, document it.  It used to be
>>> +   encouraged to do so in Documentation/technical/, and the birds-eye
>>> +   level overview may still be more suitable there, but detailed
>>> +   function-by-function level of documentation is done by comments in
>>> +   corresponding .h files these days.
>>>
>>>   - The first #include in C files, except in platform specific compat/
>>> implementations, must be either "git-compat-util.h", "cache.h" or
>>
>> Thanks. I had not looked at this closely and was under the false
>> impression that it was going in the other direction. Good to have it
>> clarified.
>
> Heh, I knew people were in favor of one over the other but until
> Peff chimed in to this thread, I didn't recall which one was
> preferred, partly because I personally do not see a huge advantage
> in using in-code comments as docs for programmers, and do not like
> having to read them as in-code comments.
>
> If somebody wants to wordsmith the text and send in a patch with
> good log message, please do so, as I myself am not sure if what I
> wrote is the consensus position.  It could be that they want to have
> even birds-eye overview in the header files.

While we're on that subject, I'd much prefer if these technical docs and
other asciidoc-ish stuff we would be manpages build as part of our
normal "make doc" target. So e.g. this one would be readable via
something like "man 3 gitapi-oid-array".

They could still be maintained along with the code given a sufficiently
smart doc build system, e.g. perl does that:
https://github.com/Perl/perl5/blob/v5.26.0/av.c#L294-L387

Although doing that in some readable and sane way would mean getting rid
of your beloved multi-line /* ... */ comment formatting, at least for
that case. It would be a pain to have everyone configure their editor to
word-wrap lines with leading "*" without screwing up the asciidoc
format.


Re: Git Test Coverage Report (Tuesday, Sept 25)

2018-09-26 Thread Thomas Gummerer
On 09/26, Derrick Stolee wrote:
> This is a bit tricky to do, but I will investigate. For some things, the
> values can conflict with each other (GIT_TEST_SPLIT_INDEX doesn't play
> nicely with other index options, I think).

Just commenting on this point.  I think all the index options should
be playing nicely with eachother.  I occasionally run the test suite
with some of them turned on, and if something failed that was always
an actual bug.  The different modes can be used in different
combinations in the wild as well, so we should get them to interact
nicely in the test suite.


Re: [PATCH v2 2/3] transport.c: introduce core.alternateRefsCommand

2018-09-26 Thread Jeff King
On Wed, Sep 26, 2018 at 06:39:56AM -0700, Taylor Blau wrote:

> > A perl tangent if you're interested:
> [...]
> 
> To be clear, we ought to leave this function as:
> 
>   extract_haves () {
> depacketize | perl -lne '/^(\S+) \.have/ and print $1'
>   }

Yes, I agree. You cannot do the "$@" there because it relies on
depacketize, which only handles stdin.

> Or are you suggesting that we change it to:
> 
>   extract_haves () {
> perl -lne '/^(\S+) \.have/ and print $1'
>   }

No, sorry. I just used the ".have" snippet as filler text, but I see
that muddied my meaning considerably. This really was just a tangent for
the future. What you've written above is the best thing for this case.

> And call it as:
> 
>   printf "" | git receive-pack fork >actual &&
>   depacketize actual.packets
>   extract_haves actual.haves &&
> 
> Frankly, (and I think that this is what you're getting at in your reply
> above), I think that the former (e.g., calling 'depacketize()' in
> 'extract_haves()') is cleaner. This approach leaves us with "actual" and
> "actual.haves", and obviates the need for another intermediary,
> "actual.packets".

Yeah. I have no problem with the three-liner you wrote above, but I do
not see any particular reason for it.

-Peff


Re: [PATCH v9 00/21] Convert "git stash" to C builtin

2018-09-26 Thread Junio C Hamano
Paul-Sebastian Ungureanu  writes:

> Hello,
>
> This is a new iteration of `git stash`, based on the last review I got.
> This new iteration brings mostly code styling fix issues in order to make
> the code more readable. There is also a new patch "strbuf.c: add
> `strbuf_join_argv()`". By making some small changes, the code is now a
> little bit closer to be used as API.
>
> Joel Teichroeb (5):
>   stash: improve option parsing test coverage
>   stash: convert apply to builtin
>   stash: convert drop and clear to builtin
>   stash: convert branch to builtin
>   stash: convert pop to builtin
>
> Paul-Sebastian Ungureanu (16):
>   sha1-name.c: add `get_oidf()` which acts like `get_oid()`
>   strbuf.c: add `strbuf_join_argv()`
>   stash: update test cases conform to coding guidelines
>   stash: rename test cases to be more descriptive
>   stash: add tests for `git stash show` config
>   stash: convert list to builtin
>   stash: convert show to builtin
>   stash: mention options in `show` synopsis.
>   stash: convert store to builtin
>   stash: convert create to builtin
>   stash: convert push to builtin
>   stash: make push -q quiet
>   stash: convert save to builtin
>   stash: convert `stash--helper.c` into `stash.c`
>   stash: optimize `get_untracked_files()` and `check_changes()`
>   stash: replace all `write-tree` child processes with API calls

Here is how these appear in my MUA.  I am guessing that 03/21 is the
right copy and 02/21 that appear after 21/21 is a duplicat that I
should not even look at (the same for 4 other ones with [GSoC]
label)?

   [  53: PSU] [PATCH v9 00/21] Convert "git stash" to C builtin
[  57: PSU] [PATCH v9 01/21] sha1-name.c: add `get_oidf()` which acts like 
`get_oid()`
[  56: PSU] [PATCH v9 02/21] strbuf.c: add `strbuf_join_argv()`
[  68: PSU] [PATCH v9 03/21] stash: improve option parsing test coverage
[ 378: PSU] [PATCH v9 04/21] stash: update test cases conform to coding 
guidelines
[  78: PSU] [PATCH v9 05/21] stash: rename test cases to be more descriptive
[ 102: PSU] [PATCH v9 06/21] stash: add tests for `git stash show` config
[ 632: PSU] [PATCH v9 07/21] stash: convert apply to builtin
[ 199: PSU] [PATCH v9 08/21] stash: convert drop and clear to builtin
[ 132: PSU] [PATCH v9 09/21] stash: convert branch to builtin
[ 151: PSU] [PATCH v9 10/21] stash: convert pop to builtin
[ 100: PSU] [PATCH v9 11/21] stash: convert list to builtin
[ 296: PSU] [PATCH v9 12/21] stash: convert show to builtin
[  56: PSU] [PATCH v9 13/21] stash: mention options in `show` synopsis.
[ 165: PSU] [PATCH v9 14/21] stash: convert store to builtin
[ 513: PSU] [PATCH v9 15/21] stash: convert create to builtin
[ 339: PSU] [PATCH v9 16/21] stash: convert push to builtin
[ 195: PSU] [PATCH v9 17/21] stash: make push -q quiet
[ 430: PSU] [PATCH v9 18/21] stash: convert save to builtin
[ 575: PSU] [PATCH v9 19/21] stash: convert `stash--helper.c` into `stash.c`
[ 156: PSU] [PATCH v9 20/21] stash: optimize `get_untracked_files()` and 
`check_changes()`
[ 130: PSU] [PATCH v9 21/21] stash: replace all `write-tree` child 
processes with API calls
   [  68: PSU] [GSoC][PATCH v9 02/21] stash: improve option parsing test 
coverage
   [ 378: PSU] [GSoC][PATCH v9 03/21] stash: update test cases conform to 
coding guidelines
   [  78: PSU] [GSoC][PATCH v9 04/21] stash: rename test cases to be more 
descriptive
   [ 102: PSU] [GSoC][PATCH v9 05/21] stash: add tests for `git stash show` 
config
   [  55: PSU] [GSoC][PATCH v9 06/21] strbuf.c: add `strbuf_join_argv()`

Thanks.

>
>  Documentation/git-stash.txt  |4 +-
>  Makefile |2 +-
>  builtin.h|1 +
>  builtin/stash.c  | 1595 ++
>  cache.h  |1 +
>  git-stash.sh |  752 
>  git.c|1 +
>  sha1-name.c  |   19 +
>  strbuf.c |   15 +
>  strbuf.h |7 +
>  t/t3903-stash.sh |  192 ++--
>  t/t3907-stash-show-config.sh |   83 ++
>  12 files changed, 1851 insertions(+), 821 deletions(-)
>  create mode 100644 builtin/stash.c
>  delete mode 100755 git-stash.sh
>  create mode 100755 t/t3907-stash-show-config.sh


Re: [PATCH 1/8] sha1-array: provide oid_array_filter

2018-09-26 Thread Jeff King
On Wed, Sep 26, 2018 at 11:27:53AM -0700, Junio C Hamano wrote:

> >> diff --git a/Documentation/CodingGuidelines 
> >> b/Documentation/CodingGuidelines
> >> index 48aa4edfbd..b54684e807 100644
> >> --- a/Documentation/CodingGuidelines
> >> +++ b/Documentation/CodingGuidelines
> >> @@ -358,7 +358,11 @@ For C programs:
> >> string_list for sorted string lists, a hash map (mapping struct
> >> objects) named "struct decorate", amongst other things.
> >>
> >> - - When you come up with an API, document it.
> >> + - When you come up with an API, document it.  It used to be
> >> +   encouraged to do so in Documentation/technical/, and the birds-eye
> >> +   level overview may still be more suitable there, but detailed
> >> +   function-by-function level of documentation is done by comments in
> >> +   corresponding .h files these days.
> >>
> >>   - The first #include in C files, except in platform specific compat/
> >> implementations, must be either "git-compat-util.h", "cache.h" or
> >
> > Thanks. I had not looked at this closely and was under the false
> > impression that it was going in the other direction. Good to have it
> > clarified.
> 
> Heh, I knew people were in favor of one over the other but until
> Peff chimed in to this thread, I didn't recall which one was
> preferred, partly because I personally do not see a huge advantage
> in using in-code comments as docs for programmers, and do not like
> having to read them as in-code comments.
> 
> If somebody wants to wordsmith the text and send in a patch with
> good log message, please do so, as I myself am not sure if what I
> wrote is the consensus position.  It could be that they want to have
> even birds-eye overview in the header files.

Yes, I would say that everything should go into the header files. For
the same reason that the function descriptions should go there: by being
close to the thing being changed, it is more likely that authors will
remember to adjust the documentation. It's not exactly literate
programming, but it's a step in that direction.

That's just my opinion, of course. I've been very happy so far with the
documentation that we have transitioned. We talked a while ago about a
script to extract the comments into a "just documentation" and build it
as asciidoc, but I doubt I would use such a thing myself.

I do agree in general that mentioning this in CodingGuidelines is a good
idea.

-Peff


Re: [PATCH 1/8] sha1-array: provide oid_array_filter

2018-09-26 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> On Wed, Sep 26 2018, Junio C Hamano wrote:
>
>> Jeff King  writes:
>>>
>>> Yes, please. I think it prevents exactly this sort of confusion. :)
>>
>> CodingGuidelines or SubmittingPatches update, perhaps?
>>
>>  Documentation/CodingGuidelines | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
>> index 48aa4edfbd..b54684e807 100644
>> --- a/Documentation/CodingGuidelines
>> +++ b/Documentation/CodingGuidelines
>> @@ -358,7 +358,11 @@ For C programs:
>> string_list for sorted string lists, a hash map (mapping struct
>> objects) named "struct decorate", amongst other things.
>>
>> - - When you come up with an API, document it.
>> + - When you come up with an API, document it.  It used to be
>> +   encouraged to do so in Documentation/technical/, and the birds-eye
>> +   level overview may still be more suitable there, but detailed
>> +   function-by-function level of documentation is done by comments in
>> +   corresponding .h files these days.
>>
>>   - The first #include in C files, except in platform specific compat/
>> implementations, must be either "git-compat-util.h", "cache.h" or
>
> Thanks. I had not looked at this closely and was under the false
> impression that it was going in the other direction. Good to have it
> clarified.

Heh, I knew people were in favor of one over the other but until
Peff chimed in to this thread, I didn't recall which one was
preferred, partly because I personally do not see a huge advantage
in using in-code comments as docs for programmers, and do not like
having to read them as in-code comments.

If somebody wants to wordsmith the text and send in a patch with
good log message, please do so, as I myself am not sure if what I
wrote is the consensus position.  It could be that they want to have
even birds-eye overview in the header files.



Re: [PATCH] worktree: add per-worktree config files

2018-09-26 Thread Ævar Arnfjörð Bjarmason


On Sun, Sep 23 2018, Nguyễn Thái Ngọc Duy wrote:

> +extensions.worktreeConfig::
> + If set, by default "git config" reads from both "config" and
> + "config.worktree" file in that order.

How does this interact with config that's now only used if it's in
.git/config? E.g. you can't set remote.. in ~/.gitconfig,
will that be inherited across the two of these?

> In multiple working
> + directory mode, "config" file is shared while
> + "config.worktree" is per-working directory.

"But then how will it work with more than one?" I found myself thinking
before reading some more and remembering .git/worktree. Shouldn't we
consistently say:

[...]"config" and "worktrees//config"[...]

Or something like that?


Re: [PATCH] help: allow redirecting to help for aliased command

2018-09-26 Thread Junio C Hamano
Taylor Blau  writes:

> This pause (though I'm a little surprised by it when reviewing the
> code), I think strikes a good balance between the two, i.e., that you
> can get help for whatever it is aliased to, and see what that alias is.

And I need to react to it within subsecond with ^C when I want a
compact answer to "what is this aliased to"?  No, thanks.


Re: t7005-editor.sh failure

2018-09-26 Thread Junio C Hamano
SZEDER Gábor  writes:

> Having said all that, I didn't omit the quotes in 4362da078e with the
> above in mind; in fact I tend to use quotes even when they are
> unnecessary (e.g. in variable assignments: var="$1"), because unquoted
> variables and command substitutions freak me out before I can think
> through whether its safe to omit the quotes or not :)

I quote >"$file" (but not var=$var) because the CodingGuidelines
tells me to:

 - Redirection operators should be written with space before, but no
   space after them.  In other words, write 'echo test >"$file"'
   instead of 'echo test> $file' or 'echo test > $file'.  Note that
   even though it is not required by POSIX to double-quote the
   redirection target in a variable (as shown above), our code does so
   because some versions of bash issue a warning without the quotes.

;-)

> Sidenote: this test should use the write_script helper to create this
> editor script.

Good suggestion.


Re: [PATCH] t7005-editor: quote filename to fix whitespace-issue

2018-09-26 Thread Taylor Blau
On Wed, Sep 26, 2018 at 06:14:11PM +0200, Martin Ågren wrote:
> From: Alexander Pyhalov 
>
> Commit 4362da078e (t7005-editor: get rid of the SPACES_IN_FILENAMES
> prereq, 2018-05-14) removed code for detecting whether spaces in
> filenames work. Since we rely on spaces throughout the test suite
> ("trash directory.t1234-foo"), testing whether we can use the filename
> "e space.sh" was redundant and unnecessary.
>
> In simplifying the code, though, this introduced a regression around how
> spaces are handled, not in the /name/ of the editor script, but /in/ the
> script itself. The script just does `echo space >$1`, where $1 is for
> example "/foo/t/trash directory.t7005-editor/.git/COMMIT_EDITMSG".
>
> With most shells, or with Bash in posix mode, $1 will not be subjected
> to field splitting. But if we invoke Bash directly, which will happen if
> we build Git with SHELL_PATH=/bin/bash, it will detect and complain
> about an "ambiguous redirect". More details can be found in [1], thanks
> to SZEDER Gábor.
>
> Make sure that the editor script quotes "$1" to remove the ambiguity.
>
> [1] https://public-inbox.org/git/20180926121107.GH27036@localhost/
>
> Signed-off-by: Alexander Pyhalov 
> Commit-message-by: Martin Ågren 
> Signed-off-by: Martin Ågren 

Thanks. I find that the explanation of the regression is a helpful one,
and the changes below look sane to me.

Since I couldn't find any other style in the surrounding script that
needed matching against, this has my:

  Reviewed-by: Taylor Blau 

Thanks,
Taylor


Re: [PATCH] help: allow redirecting to help for aliased command

2018-09-26 Thread Taylor Blau
On Wed, Sep 26, 2018 at 08:16:36AM -0700, Junio C Hamano wrote:
> Rasmus Villemoes  writes:
>
> > I often use 'git  --help' as a quick way to get the documentation
> > for a command. However, I've also trained my muscle memory to use my
> > aliases (cp=cherry-pick, co=checkout etc.), which means that I often end
> > up doing
> >
> >   git cp --help
> >
> > to which git correctly informs me that cp is an alias for
> > cherry-pick. However, I already knew that, and what I really wanted was
> > the man page for the cherry-pick command.
> >
> > This introduces a help.followAlias config option that transparently
> > redirects to (the first word of) the alias text (provided of course it
> > is not a shell command), similar to the option for autocorrect of
> > misspelled commands.
>
> While I do agree with you that it would sometimes be very handy if
> "git cp --help" behaved identically to "git cherry-pick --help" just
> like "git cp -h" behaves identically to "git cherry-pick -h" when
> you have "[alias] cp = cherry-pick", I do not think help.followAlias
> configuration is a good idea.  I may know, perhaps because I use it
> all the time, by heart that "cp" is aliased to "cherry-pick" and
> want "git cp --help" to directly give me the manpage, but I may not
> remember if "co" was commit or checkout and want to be concisely
> told that it is aliased to checkout without seeing the full manpage.
> Which means you'd want some way to command line override anyway, and
> having to say "git -c help.followAlias=false cp --help" is not a
> great solution.

I think I responded partially to this hunk in another thread, but I
think I can add some additional information here where it is more
relevant.

One approach to take when digesting this is that 'git co --help' shows
you the manual page for 'git-checkout(1)' (or whatever you have it
aliased to), so that answers the question for the caller who has a
terminal open.

In the case where you are scripting (and want to know what 'git co'
means for programmatic usage), I think that there are two options. One,
which you note above, is the 'git -c help.followAlias=false ...'
approach, which I don't think is so bad for callers, given the tradeoff.

Another way to go is 'git config alias.co', which should provide the
same answer. I think that either would be fine.

Perhaps we could assume that 'help.followAlias' is false when it is
unset, _and_ isatty(2) says that we aren't a TTY. Otherwise, since I
feel that this is a good feature that should be the new default, we can
assume it's set to true.

Thanks,
Taylor


Re: [PATCH v4 9/9] Documentation/config: add odb..promisorRemote

2018-09-26 Thread Junio C Hamano
Jeff King  writes:

>> I do not think "sources that are not git repositories" is all that
>> interesting, unless they can also serve as the source for ext::
>> remote helper.  And if they can serve "git fetch ext::...", I think
>> they can be treated just like a normal Git repository by the
>> backfill code that needs to lazily populate the partial clone.
>
> I don't know about that. Imagine I had a regular Git repo with a bunch
> of large blobs, and then I also stored those large blobs in something
> like S3 that provides caching, geographic locality, and resumable
> transfers.
>
> It would be nice to be able to say:
>
>   1. Clone from the real repo, but do not transfer any blobs larger than
>  10MB.
>
>   2. When you need a blob, check the external odb that points to S3. Git
>  cannot know about this automatically, but presumably you would set
>  a few config variables to point to an external-odb helper script.
>
>   3. If for some reason S3 doesn't work, you can always request it from
>  the original repo. That part _doesn't_ need extra config, since we
>  can assume that the source of the promisor pack can feed us the
>  extra objects[1].
>
> But you don't need to ever be able to "git fetch" from the S3 repo.
>
> Now if you are arguing that the interface to the external-odb helper
> script should be that it _looks_ like upload-pack, but simply advertises
> no refs and will let you fetch any object, that makes more sense to me.
> It's not something you could "git clone", but you can "git fetch" from
> it.

Yup.  The lazy backfill JTan has, if I understand correctly, only
wants "Please give me this and that object" and use of "upload-pack"
is an implementation detail.  Over the existing Git protocols, you
may implement it as sending these object names as "want" and perhaps
restrict the traversal (if there is a "want" object that is commit)
by giving some commits as "have", i.e. "upload-pack" may not be the
best model for the other side, but that is what we have readily
available.  I was hoping that the way we take to move forward is to
enhance that interface so that we can use different "object store"
backends as needed, to satisfy needs from both parties.


Re: [PATCH] help: allow redirecting to help for aliased command

2018-09-26 Thread Taylor Blau
On Wed, Sep 26, 2018 at 08:30:32AM -0700, Junio C Hamano wrote:
> Taylor Blau  writes:
>
> >> +help.followAlias::
> >> +  When requesting help for an alias, git prints a line of the
> >> +  form "'' is aliased to ''". If this option is
> >> +  set to a positive integer, git proceeds to show the help for
> >
> > With regard to "set to a positive integer", I'm not sure why this is the
> > way that it is. I see below you used 'git_config_int()', but I think
> > that 'git_config_bool()' would be more appropriate.
> >
> > The later understands strings like "yes", "on" or "true", which I think
> > is more of what I would expect from a configuration setting such as
> > this.
>
> That is, as you read in the next paragraph, because it gives the
> number of deciseconds to show a prompt before showing the manpage.
>
> Not that I think this configuration is a good idea (see my review).
>
> >> +  the first word of  after the given number of
> >> +  deciseconds. If the value of this option is negative, the
> >> +  redirect happens immediately. If the value is 0 (which is the
> >> +  default), or  begins with an exclamation point, no
> >> +  redirect takes place.
> >
> > It was unclear to my originlly why this was given as a configuration
> > knob, but my understanding after reading the patch is that this is to do
> > _additional_ things besides printing what is aliased to what.
> >
> > Could you perhaps note this in the documentation?
>
> It may be that the description for the "execute the likely typoed
> command" configuration is poorly written and this merely copied the
> badness from it.  Over there the prompt gives a chance to ^C out,
> which serves useful purpose, and if that is not documented, we should.
>
> On the other hand, I'd rather see this prompt in the new code
> removed, because I do not think the prompt given in the new code
> here is all that useful.
>
> >> @@ -415,9 +420,34 @@ static const char *check_git_cmd(const char* cmd)
> >>
> >>alias = alias_lookup(cmd);
> >>if (alias) {
> >> -  printf_ln(_("'%s' is aliased to '%s'"), cmd, alias);
> >> -  free(alias);
> >> -  exit(0);
> >> +  const char **argv;
> >> +  int count;
> >> +
> >> +  if (!follow_alias || alias[0] == '!') {
> >> +  printf_ln(_("'%s' is aliased to '%s'"), cmd, alias);
> >> +  free(alias);
> >> +  exit(0);
> >> +  }
> >> +  fprintf_ln(stderr, _("'%s' is aliased to '%s'"), cmd, alias);
> >
> > OK, I think that this is a sensible decision: print to STDERR when
> > that's not the main purpose of what're doing (e.g., we're going to
> > follow the alias momentarily), and STDOUT when it's the only thing we're
> > doing.
>
> > Potentially we could call 'fprintf_ln()' only once, and track an `int
> > fd` at the top of this block.
>
> I actually think this should always give the output to standard output.
>
> >> +
> >> +  /*
> >> +   * We use split_cmdline() to get the first word of the
> >> +   * alias, to ensure that we use the same rules as when
> >> +   * the alias is actually used. split_cmdline()
> >> +   * modifies alias in-place.
> >> +   */
> >> +  count = split_cmdline(alias, );
> >> +  if (count < 0)
> >> +  die("Bad alias.%s string: %s", cmd,
> >> +  split_cmdline_strerror(count));
> >
> > Please wrap this in _() so that translators can translate it.
> >
> >> +  if (follow_alias > 0) {
> >> +  fprintf_ln(stderr,
> >> + _("Continuing to help for %s in %0.1f 
> >> seconds."),
> >> + alias, follow_alias/10.0);
> >> +  sleep_millisec(follow_alias * 100);
> >> +  }
> >> +  return alias;
> >
> > I'm not sure that this notification is necessary, but I'll defer to the
> > judgement of others on this one.
>
> I didn't bother to check the original but this is mimicking an
> existing code that lets configuration to be set to num-deciseconds
> to pause and give chance to ^C out, and also allows it to be set to
> negative to immediately go ahead.  follow-alias at this point cannot
> be zero in the codeflow, but it still can be negative.

I think that this is the most compelling argument _for_ the configuration
that you are not in favor of. I understood your previous review as "I
know that 'git cp' is a synonym of 'git cherry-pick', but I want to use
'git co --help' for when I don't remember what 'git co' is a synonym
of."

This pause (though I'm a little surprised by it when reviewing the
code), I think strikes a good balance between the two, i.e., that you
can get help for whatever it is aliased to, and see what that alias is.

Thanks,
Taylor


Re: Git Test Coverage Report (Tuesday, Sept 25)

2018-09-26 Thread Junio C Hamano
Derrick Stolee  writes:

> In an effort to ensure new code is reasonably covered by the test
> suite, we now have contrib/coverage-diff.sh to combine the gcov output
> from 'make coverage-test ; make coverage-report' with the output from
> 'git diff A B' to discover _new_ lines of code that are not covered.
>
> This report takes the output of these results after running on four
> branches:
>
>         pu: 80e728fa913dc3a1165b6dec9a7afa6052a86325
>
>        jch: 0c10634844314ab89666ed0a1c7d36dde7ac9689
>
>       next: 76f2f5c1e34c4dbef1029e2984c2892894c444ce
>
>     master: fe8321ec057f9231c26c29b364721568e58040f7
>
> master@{1}: 2d3b1c576c85b7f5db1f418907af00ab88e0c303
>
> I ran the test suite on each of these branches on an Ubuntu Linux VM,
> and I'm missing some dependencies (like apache, svn, and perforce) so
> not all tests are run.

Thanks.

> master@{1}..master:
>
> builtin/remote.c
> 5025425dfff (   Shulhan 2018-09-13 20:18:33 +0700
> 864)    return error(_("No such remote: '%s'"), name);
> commit-reach.c
> b67f6b26e35 (Derrick Stolee 2018-09-21 08:05:26 -0700
> 559)    continue;
> b67f6b26e35 (Derrick Stolee 2018-09-21 08:05:26 -0700
> 569)    from->objects[i].item->flags |= assign_flag;
> b67f6b26e35 (Derrick Stolee 2018-09-21 08:05:26 -0700
> 570)    continue;
> b67f6b26e35 (Derrick Stolee 2018-09-21 08:05:26 -0700
> 576)    result = 0;
> b67f6b26e35 (Derrick Stolee 2018-09-21 08:05:26 -0700
> 577)    goto cleanup;
> cat: compat#mingw.c.gcov: No such file or directory
> ll-merge.c
> d64324cb60e (Torsten Bögershausen   2018-09-12 21:32:02
> +0200   379)    marker_size =
> DEFAULT_CONFLICT_MARKER_SIZE;
> remote-curl.c
> c3b9bc94b9b (Elijah Newren  2018-09-05 10:03:07 -0700
> 181)    options.filter = xstrdup(value);

As I think the data presented here is very valuable, let me ignore
the findings of this specific run (which will be fixed by individual
authors as/if necessary), and focus on the way the data is presented
(which will affect the ease of consumption by authors of future
commits).

These wrapped blame output lines are harder to view.  Can we have
this in plain/text without format=flowed at least?  

I personally do not mind a monospaced and non-wrapping website, just
I do not mind visiting travis-ci.org for recent results from time to
time.  Others may differ.

There is an error message from "cat" in it, by the way.

> preload-index.c
> ae9af12287b (Nguyễn Thái Ngọc Duy   2018-09-15 19:56:04
> +0200   73) struct progress_data *pd =
> p->progress;
> ae9af12287b (Nguyễn Thái Ngọc Duy   2018-09-15 19:56:04
> +0200   75) pthread_mutex_lock(>mutex);
> ae9af12287b (Nguyễn Thái Ngọc Duy   2018-09-15 19:56:04
> +0200   76) pd->n += last_nr - nr;
> ae9af12287b (Nguyễn Thái Ngọc Duy   2018-09-15 19:56:04
> +0200   77) display_progress(pd->progress, pd->n);
> ae9af12287b (Nguyễn Thái Ngọc Duy   2018-09-15 19:56:04
> +0200   78) pthread_mutex_unlock(>mutex);
> ae9af12287b (Nguyễn Thái Ngọc Duy   2018-09-15 19:56:04
> +0200   79) last_nr = nr;

I wonder how much we can save the effort that is needed to scan the
output if we somehow coalesce these consecutive lines that are
attributed to the same commit.

In any case, thanks for doing this.


Re: [PATCH 1/8] sha1-array: provide oid_array_filter

2018-09-26 Thread Ævar Arnfjörð Bjarmason


On Wed, Sep 26 2018, Junio C Hamano wrote:

> Jeff King  writes:
>
>> On Tue, Sep 25, 2018 at 12:26:44PM -0700, Stefan Beller wrote:
>>
>>> On Sat, Sep 22, 2018 at 5:58 AM Ævar Arnfjörð Bjarmason
>>>  wrote:
>>> >
>>> >
>>> > On Fri, Sep 21 2018, Stefan Beller wrote:
>>> >
>>> > > +/*
>>> > > + * Apply want to each entry in array, retaining only the entries for
>>> > > + * which the function returns true.  Preserve the order of the entries
>>> > > + * that are retained.
>>> > > + */
>>> > > +void oid_array_filter(struct oid_array *array,
>>> > > +   for_each_oid_fn want,
>>> > > +   void *cbdata);
>>> > > +
>>> > >  #endif /* SHA1_ARRAY_H */
>>> >
>>> > The code LGTM, but this comment should instead be an update to the API
>>> > docs, see my recent 5cc044e025 ("get_short_oid: sort ambiguous objects
>>> > by type, then SHA-1", 2018-05-10) for an addition of a new function to
>>> > this API where I added some new docs.
>>>
>>> ok will fix for consistency (this whole API is there).
>>>
>>> Longer term (I thought) we were trying to migrate API docs
>>> to headers instead?
>>
>> Yes, please. I think it prevents exactly this sort of confusion. :)
>
> CodingGuidelines or SubmittingPatches update, perhaps?
>
>  Documentation/CodingGuidelines | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
> index 48aa4edfbd..b54684e807 100644
> --- a/Documentation/CodingGuidelines
> +++ b/Documentation/CodingGuidelines
> @@ -358,7 +358,11 @@ For C programs:
> string_list for sorted string lists, a hash map (mapping struct
> objects) named "struct decorate", amongst other things.
>
> - - When you come up with an API, document it.
> + - When you come up with an API, document it.  It used to be
> +   encouraged to do so in Documentation/technical/, and the birds-eye
> +   level overview may still be more suitable there, but detailed
> +   function-by-function level of documentation is done by comments in
> +   corresponding .h files these days.
>
>   - The first #include in C files, except in platform specific compat/
> implementations, must be either "git-compat-util.h", "cache.h" or

Thanks. I had not looked at this closely and was under the false
impression that it was going in the other direction. Good to have it
clarified.


Re: [PATCH] worktree: add per-worktree config files

2018-09-26 Thread Junio C Hamano
Duy Nguyen  writes:

> I believe the main selling point of multiple worktrees is sharing
> refs. You could easily avoid expensive clones with --local, but
> synchronizing between different clones is not very convenient. Other
> than that, different worktrees tend to behave like separate clones.

OK.  Even with the enforced limitation that no single branch can be
checked out in multiple worktrees at the same time, it is more
convenient as you can "merge" other branch and trust that the result
on the checked-out branch in your worktree is immediately visible to
other worktrees.

> user has multiple worktrees, but the extension is not enabled. I'm

Exactly.  "config --worktree" in your script will silently break
other worktrees; it wanted to affect only the current worktree, but
it changed settings to all the others.

> probably not clear in the commit message, but "git config" can detect
> that the extension has not been enabled, automatically enable it (and
> move core.bare and core.worktree (if present) to the main worktree's
> private config), so "git config --worktree" will never share the
> change.

I wonder if that is good enough.  The user in one worktree did
"config --worktree" and all the worktrees now start diverging in
their config---but that is true _only_ when they use "--worktree"
option to say "I want this to be set differently from others".  All
the other calls to "git config" will stil be shared.

So from a cursory thinking, it sounds OK to me, but somebody else
may think of bad ramifications after a good night sleep.





Re: [PATCH] git help: promote 'git help -av'

2018-09-26 Thread Junio C Hamano
Duy Nguyen  writes:

> Here's the patch that adds that external commands and aliases
> sections. I feel that external commands section is definitely good to
> have even if we don't replace "help -a". Aliases are more
> subjective...

I didn't apply this (so I didn't try running it), but a quick
eyeballing tells me that the listing of external commands in -av
output done in this patch seems reasonable.

I do not think removing "-v" and making the current "help -a" output
unavailable is a wise idea, though.


Re: [PATCH] git help: promote 'git help -av'

2018-09-26 Thread Junio C Hamano
Duy Nguyen  writes:

> -v was recently added just for the new "help -a" in May 2018. I think
> it's ok to get rid of it. Memory muscles probably take a couple more
> months to kick in.

If it is not hurting, keeping it lets people say "--no-verbose" to
get a less verbose output to help those who do not like the new
default.

Unless you are removing code to show the current "help -a" output,
that is.


Re: [PATCH 1/8] sha1-array: provide oid_array_filter

2018-09-26 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Sep 25, 2018 at 12:26:44PM -0700, Stefan Beller wrote:
>
>> On Sat, Sep 22, 2018 at 5:58 AM Ævar Arnfjörð Bjarmason
>>  wrote:
>> >
>> >
>> > On Fri, Sep 21 2018, Stefan Beller wrote:
>> >
>> > > +/*
>> > > + * Apply want to each entry in array, retaining only the entries for
>> > > + * which the function returns true.  Preserve the order of the entries
>> > > + * that are retained.
>> > > + */
>> > > +void oid_array_filter(struct oid_array *array,
>> > > +   for_each_oid_fn want,
>> > > +   void *cbdata);
>> > > +
>> > >  #endif /* SHA1_ARRAY_H */
>> >
>> > The code LGTM, but this comment should instead be an update to the API
>> > docs, see my recent 5cc044e025 ("get_short_oid: sort ambiguous objects
>> > by type, then SHA-1", 2018-05-10) for an addition of a new function to
>> > this API where I added some new docs.
>> 
>> ok will fix for consistency (this whole API is there).
>> 
>> Longer term (I thought) we were trying to migrate API docs
>> to headers instead?
>
> Yes, please. I think it prevents exactly this sort of confusion. :)

CodingGuidelines or SubmittingPatches update, perhaps?

 Documentation/CodingGuidelines | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index 48aa4edfbd..b54684e807 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -358,7 +358,11 @@ For C programs:
string_list for sorted string lists, a hash map (mapping struct
objects) named "struct decorate", amongst other things.
 
- - When you come up with an API, document it.
+ - When you come up with an API, document it.  It used to be
+   encouraged to do so in Documentation/technical/, and the birds-eye
+   level overview may still be more suitable there, but detailed
+   function-by-function level of documentation is done by comments in
+   corresponding .h files these days.
 
  - The first #include in C files, except in platform specific compat/
implementations, must be either "git-compat-util.h", "cache.h" or


Re: Fixing constant preference prompts during tests

2018-09-26 Thread Junio C Hamano
Tacitus Aedifex  writes:

> I keep getting prompted for my algorithm preferences while running all
> of the git test suite:
>
> Set preference list to:
> Cipher: AES256, AES192, AES, 3DES
> Digest: SHA512, SHA384, SHA256, SHA224, SHA1
> Compression: Uncompressed
> Features: MDC, Keyserver no-modify
>
> What is the best way to prevent this from happening? I want to run the
> entire test suite unattended and have it complete on its own.

Nobody raised this so far as far as I recall; thanks for the first
one to do so, as it is a sign that you are doing something unusual
(e.g. have newer or different version of GPG than most other people)
and others will hit by the same symptom later when whatever thing
you are using as a minority right now becomes more prevalent.

In other words, we need to have a bit more detail of your
environment.  I made a blind guess that the above may be coming from
GnuPG, and the test framework (t/test-lib.sh, t/lib-gpg.sh, etc.)
tries to run our tests in a stable environment that is not affected
by real $HOME etc. owned by the user who happens to be running the
tests, but it could be that your copy of GnuPG may require a bit
more seeding than we do in our test framework to squelch that
preference prompt.

It may not be GnuPG and something else, but I think you got the
general idea.




Re: git fetch behaves weirdely when run in a worktree

2018-09-26 Thread Junio C Hamano
Duy Nguyen  writes:

> On Wed, Sep 26, 2018 at 05:24:14PM +0200, Duy Nguyen wrote:
>> On Wed, Sep 26, 2018 at 6:46 AM Kaartic Sivaraam
>>  wrote:
>> > This is the most interesting part of the issue. I **did not** run
>> >'git fetch ...' in between those cat commands. I was surprised by
>> >how the contents of FETCH_HEAD are changing without me spawning any
>> >git processes that might change it. Am I missing something here? As
>> >far as i could remember, there wasn't any IDE running that might
>> >automatically spawn git commands especially in that work
>> >tree. Weird.
>
> Maybe something like this could help track down that rogue "git fetch"
> process (it's definitely _some_ process writing to the wrong file; or
> some file synchronization thingy is going on). You can log more of
> course, but this is where FETCH_HEAD is updated.

Well, a background-ish thing could be some vendor-provided copy of
Git that has nothing to do with what Kaartic would be compiling with
this patch X-<.

> -- 8< --
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 0696abfc2a..0dfb580e92 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -786,6 +786,13 @@ static int store_updated_refs(const char *raw_url, const 
> char *remote_name,
>   if (!fp)
>   return error_errno(_("cannot open %s"), filename);
>  
> + {
> + struct strbuf sb = STRBUF_INIT;
> + strbuf_addf(, "( date; ls -l /proc/%d/cwd ) >>%s.log", 
> getpid(), filename);
> + system(sb.buf);
> + strbuf_release();
> + }
> +
>   if (raw_url)
>   url = transport_anonymize_url(raw_url);
>   else
> -- 8< --
>
> --
> Duy


Fixing constant preference prompts during tests

2018-09-26 Thread Tacitus Aedifex
I keep getting prompted for my algorithm preferences while running all of the 
git test suite:


Set preference list to:
Cipher: AES256, AES192, AES, 3DES
Digest: SHA512, SHA384, SHA256, SHA224, SHA1
Compression: Uncompressed
Features: MDC, Keyserver no-modify

What is the best way to prevent this from happening? I want to run the entire 
test suite unattended and have it complete on its own.


//tæ


Re: [PATCH] git help: promote 'git help -av'

2018-09-26 Thread Duy Nguyen
On Tue, Sep 25, 2018 at 10:54 PM Jeff King  wrote:
> Mentioning aliases seems reasonable to me. The definitions of some of
> mine are pretty nasty bits of shell, but I guess that people either
> don't have any ugly aliases, or are comfortable enough with them that
> they won't be scared away. :)

I have one with a very long pretty format for git-log. Well, you wrote
your aliases you learn to live with them :) We could certainly cut too
long aliased commands to keep the listing pretty, but I don't think we
should do that until somebody asks for it.

> > -- 8< --
> > @@ -53,7 +52,6 @@ static struct option builtin_help_options[] = {
> >   HELP_FORMAT_WEB),
> >   OPT_SET_INT('i', "info", _format, N_("show info page"),
> >   HELP_FORMAT_INFO),
> > - OPT__VERBOSE(, N_("print command description")),
> >   OPT_END(),
> >  };
>
> Would we want to continue respecting "-v" as a noop? I admit I did not
> even know it existed until this thread, but if people have trained
> themselves to run "git help -av", we should probably continue to give
> them this output.

-v was recently added just for the new "help -a" in May 2018. I think
it's ok to get rid of it. Memory muscles probably take a couple more
months to kick in.
-- 
Duy


[PATCH] t7005-editor: quote filename to fix whitespace-issue

2018-09-26 Thread Martin Ågren
From: Alexander Pyhalov 

Commit 4362da078e (t7005-editor: get rid of the SPACES_IN_FILENAMES
prereq, 2018-05-14) removed code for detecting whether spaces in
filenames work. Since we rely on spaces throughout the test suite
("trash directory.t1234-foo"), testing whether we can use the filename
"e space.sh" was redundant and unnecessary.

In simplifying the code, though, this introduced a regression around how
spaces are handled, not in the /name/ of the editor script, but /in/ the
script itself. The script just does `echo space >$1`, where $1 is for
example "/foo/t/trash directory.t7005-editor/.git/COMMIT_EDITMSG".

With most shells, or with Bash in posix mode, $1 will not be subjected
to field splitting. But if we invoke Bash directly, which will happen if
we build Git with SHELL_PATH=/bin/bash, it will detect and complain
about an "ambiguous redirect". More details can be found in [1], thanks
to SZEDER Gábor.

Make sure that the editor script quotes "$1" to remove the ambiguity.

[1] https://public-inbox.org/git/20180926121107.GH27036@localhost/

Signed-off-by: Alexander Pyhalov 
Commit-message-by: Martin Ågren 
Signed-off-by: Martin Ågren 
---
 SZEDER wrote:
 > Let me put on my POSIX-lawyer hat for a moment to explain this :)

 Thanks for an excellent explanation.

 > Sidenote: this test should use the write_script helper to create this
 > editor script.

 Yes. I've punted on that for now.

 Here's my updated commit message as part of a proper patch. Thanks
 Alexander for the analysis and the diff, and thanks Eric and SZEDER for
 getting me on the right track with the commit message.

 t/t7005-editor.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t7005-editor.sh b/t/t7005-editor.sh
index b2ca77b338..5fcf281dfb 100755
--- a/t/t7005-editor.sh
+++ b/t/t7005-editor.sh
@@ -112,7 +112,7 @@ do
 done
 
 test_expect_success 'editor with a space' '
-   echo "echo space >\$1" >"e space.sh" &&
+   echo "echo space >\"\$1\"" >"e space.sh" &&
chmod a+x "e space.sh" &&
GIT_EDITOR="./e\ space.sh" git commit --amend &&
test space = "$(git show -s --pretty=format:%s)"
-- 
2.19.0.216.g2d3b1c576c



Re: git fetch behaves weirdely when run in a worktree

2018-09-26 Thread Duy Nguyen
On Wed, Sep 26, 2018 at 05:24:14PM +0200, Duy Nguyen wrote:
> On Wed, Sep 26, 2018 at 6:46 AM Kaartic Sivaraam
>  wrote:
> > This is the most interesting part of the issue. I **did not** run
> >'git fetch ...' in between those cat commands. I was surprised by
> >how the contents of FETCH_HEAD are changing without me spawning any
> >git processes that might change it. Am I missing something here? As
> >far as i could remember, there wasn't any IDE running that might
> >automatically spawn git commands especially in that work
> >tree. Weird.

Maybe something like this could help track down that rogue "git fetch"
process (it's definitely _some_ process writing to the wrong file; or
some file synchronization thingy is going on). You can log more of
course, but this is where FETCH_HEAD is updated.

-- 8< --
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 0696abfc2a..0dfb580e92 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -786,6 +786,13 @@ static int store_updated_refs(const char *raw_url, const 
char *remote_name,
if (!fp)
return error_errno(_("cannot open %s"), filename);
 
+   {
+   struct strbuf sb = STRBUF_INIT;
+   strbuf_addf(, "( date; ls -l /proc/%d/cwd ) >>%s.log", 
getpid(), filename);
+   system(sb.buf);
+   strbuf_release();
+   }
+
if (raw_url)
url = transport_anonymize_url(raw_url);
else
-- 8< --

--
Duy


Re: [PATCH] worktree: add per-worktree config files

2018-09-26 Thread Duy Nguyen
On Tue, Sep 25, 2018 at 11:26 PM Junio C Hamano  wrote:
> Warning: devil's advocate mode on.

Oh good. I need to a good dose of sanity anyway.

> Done in either way, this will confuse the users.  What is the reason
> why people are led to think it is a good idea to use multiple
> worktrees, even when they need different settings?  What do they
> want out of "multiple worktrees linked to a single repository" as
> opposed to just a simple "as many clones as necessary"?  Reduced
> disk footprint?

I believe the main selling point of multiple worktrees is sharing
refs. You could easily avoid expensive clones with --local, but
synchronizing between different clones is not very convenient. Other
than that, different worktrees tend to behave like separate clones.

This leaves a gray area where other things should be shared or not. I
think the preference (or default mode) is still _not_ shared (*).
Sharing more things (besides refs and object database) is just a new
opportunity popping up when we implement multiple worktrees. Since
multiple worktrees (or clones before its time) are grouped together,
sometimes you would like to share common configuration. We could sort
of achieve this already with includeIf but again not as convenient.

(*) real life is not that simple. Since refs are shared, including
_remotes_ refs, so configuration related to remotes should also be
shared, or it will be a maintenance nightmare. Which is why
$GIT_DIR/config so far has been shared besides the two exceptions that
are core.bare and core.worktree. And I really like to get rid of these
exceptions.

> Is there a better way to achieve that without the
> downside of multiple worktrees (e.g. configuration need to be
> uniform)?

Is there a better way to achieve sharing refs between clones? I gave
it a minute but couldn't come up with a good answer :(

> > (*) "git config --worktree" points back to "config" file when this
> > extension is not present so that it works in any setup.
>
> Shouldn't it barf and error out instead?

The intention is a uniform interface/api that works with both single
and multiple worktrees configurations. Otherwise in your scripts you
would need to write "if single worktree, do this, else do that". If
"git config --worktree" works with both, the existing scripts can be
audited and updated just a bit, adding "--worktree" where the config
should not be shared, and we're done.

> A user who hasn't enabled
> the extension uses --worktree option and misled to believe that the
> setting affects only a single worktree, even though the change is
> made globally---that does not sound like a great end-user experience.

I was talking about a single worktree. But I think here you meant the
user has multiple worktrees, but the extension is not enabled. I'm
probably not clear in the commit message, but "git config" can detect
that the extension has not been enabled, automatically enable it (and
move core.bare and core.worktree (if present) to the main worktree's
private config), so "git config --worktree" will never share the
change.

But perhaps the user should be made aware of this situation and asked
to explicitly enable the extension instead? It's certainly a more
conservative approach.
-- 
Duy


Re: [PATCH] help: allow redirecting to help for aliased command

2018-09-26 Thread Junio C Hamano
Taylor Blau  writes:

>> +help.followAlias::
>> +When requesting help for an alias, git prints a line of the
>> +form "'' is aliased to ''". If this option is
>> +set to a positive integer, git proceeds to show the help for
>
> With regard to "set to a positive integer", I'm not sure why this is the
> way that it is. I see below you used 'git_config_int()', but I think
> that 'git_config_bool()' would be more appropriate.
>
> The later understands strings like "yes", "on" or "true", which I think
> is more of what I would expect from a configuration setting such as
> this.

That is, as you read in the next paragraph, because it gives the
number of deciseconds to show a prompt before showing the manpage.

Not that I think this configuration is a good idea (see my review).

>> +the first word of  after the given number of
>> +deciseconds. If the value of this option is negative, the
>> +redirect happens immediately. If the value is 0 (which is the
>> +default), or  begins with an exclamation point, no
>> +redirect takes place.
>
> It was unclear to my originlly why this was given as a configuration
> knob, but my understanding after reading the patch is that this is to do
> _additional_ things besides printing what is aliased to what.
>
> Could you perhaps note this in the documentation?

It may be that the description for the "execute the likely typoed
command" configuration is poorly written and this merely copied the
badness from it.  Over there the prompt gives a chance to ^C out,
which serves useful purpose, and if that is not documented, we should.

On the other hand, I'd rather see this prompt in the new code
removed, because I do not think the prompt given in the new code
here is all that useful.

>> @@ -415,9 +420,34 @@ static const char *check_git_cmd(const char* cmd)
>>
>>  alias = alias_lookup(cmd);
>>  if (alias) {
>> -printf_ln(_("'%s' is aliased to '%s'"), cmd, alias);
>> -free(alias);
>> -exit(0);
>> +const char **argv;
>> +int count;
>> +
>> +if (!follow_alias || alias[0] == '!') {
>> +printf_ln(_("'%s' is aliased to '%s'"), cmd, alias);
>> +free(alias);
>> +exit(0);
>> +}
>> +fprintf_ln(stderr, _("'%s' is aliased to '%s'"), cmd, alias);
>
> OK, I think that this is a sensible decision: print to STDERR when
> that's not the main purpose of what're doing (e.g., we're going to
> follow the alias momentarily), and STDOUT when it's the only thing we're
> doing.

> Potentially we could call 'fprintf_ln()' only once, and track an `int
> fd` at the top of this block.

I actually think this should always give the output to standard output.

>> +
>> +/*
>> + * We use split_cmdline() to get the first word of the
>> + * alias, to ensure that we use the same rules as when
>> + * the alias is actually used. split_cmdline()
>> + * modifies alias in-place.
>> + */
>> +count = split_cmdline(alias, );
>> +if (count < 0)
>> +die("Bad alias.%s string: %s", cmd,
>> +split_cmdline_strerror(count));
>
> Please wrap this in _() so that translators can translate it.
>
>> +if (follow_alias > 0) {
>> +fprintf_ln(stderr,
>> +   _("Continuing to help for %s in %0.1f 
>> seconds."),
>> +   alias, follow_alias/10.0);
>> +sleep_millisec(follow_alias * 100);
>> +}
>> +return alias;
>
> I'm not sure that this notification is necessary, but I'll defer to the
> judgement of others on this one.

I didn't bother to check the original but this is mimicking an
existing code that lets configuration to be set to num-deciseconds
to pause and give chance to ^C out, and also allows it to be set to
negative to immediately go ahead.  follow-alias at this point cannot
be zero in the codeflow, but it still can be negative.


Re: git fetch behaves weirdely when run in a worktree

2018-09-26 Thread Duy Nguyen
On Wed, Sep 26, 2018 at 6:46 AM Kaartic Sivaraam
 wrote:
> This is the most interesting part of the issue. I **did not** run 'git fetch 
> ...' in between those cat commands. I was surprised by how the contents of 
> FETCH_HEAD are changing without me spawning any git processes that might 
> change it. Am I missing something here? As far as i could remember, there 
> wasn't any IDE running that might automatically spawn git commands especially 
> in that work tree. Weird.

Another possibility is FETCH_HEAD is somehow a symlink (or the whole
worktrees/$BUILD_WORKTREE is a symlink) and is accidentally shared
with some other repos/worktrees.
-- 
Duy


Re: [PATCH] help: allow redirecting to help for aliased command

2018-09-26 Thread Duy Nguyen
On Wed, Sep 26, 2018 at 4:42 PM Taylor Blau  wrote:
> > +
> > + /*
> > +  * We use split_cmdline() to get the first word of the
> > +  * alias, to ensure that we use the same rules as when
> > +  * the alias is actually used. split_cmdline()
> > +  * modifies alias in-place.
> > +  */
> > + count = split_cmdline(alias, );
> > + if (count < 0)
> > + die("Bad alias.%s string: %s", cmd,
> > + split_cmdline_strerror(count));
>
> Please wrap this in _() so that translators can translate it.

Yes! And another nit. die(), error(), warning()... usually start the
message with a lowercase letter because we already start the sentence
with a prefix, like

fatal: bad alias.blah blah
-- 
Duy


Re: [PATCH] help: allow redirecting to help for aliased command

2018-09-26 Thread Duy Nguyen
On Wed, Sep 26, 2018 at 12:29 PM Rasmus Villemoes  
wrote:
>
> I often use 'git  --help' as a quick way to get the documentation
> for a command. However, I've also trained my muscle memory to use my
> aliases (cp=cherry-pick, co=checkout etc.), which means that I often end
> up doing
>
>   git cp --help
>
> to which git correctly informs me that cp is an alias for
> cherry-pick. However, I already knew that, and what I really wanted was
> the man page for the cherry-pick command.
>
> This introduces a help.followAlias config option that transparently
> redirects to (the first word of) the alias text (provided of course it
> is not a shell command), similar to the option for autocorrect of
> misspelled commands.
>
> The documentation in config.txt could probably be improved.

While at there, maybe you could also mention the behavior of "git
help" when given an alias, in git-help.txt. And you could also add a
hint to suggest this new config help.followAlias there.
-- 
Duy


Re: [PATCH] help: allow redirecting to help for aliased command

2018-09-26 Thread Junio C Hamano
Rasmus Villemoes  writes:

> I often use 'git  --help' as a quick way to get the documentation
> for a command. However, I've also trained my muscle memory to use my
> aliases (cp=cherry-pick, co=checkout etc.), which means that I often end
> up doing
>
>   git cp --help
>
> to which git correctly informs me that cp is an alias for
> cherry-pick. However, I already knew that, and what I really wanted was
> the man page for the cherry-pick command.
>
> This introduces a help.followAlias config option that transparently
> redirects to (the first word of) the alias text (provided of course it
> is not a shell command), similar to the option for autocorrect of
> misspelled commands.

While I do agree with you that it would sometimes be very handy if
"git cp --help" behaved identically to "git cherry-pick --help" just
like "git cp -h" behaves identically to "git cherry-pick -h" when
you have "[alias] cp = cherry-pick", I do not think help.followAlias
configuration is a good idea.  I may know, perhaps because I use it
all the time, by heart that "cp" is aliased to "cherry-pick" and
want "git cp --help" to directly give me the manpage, but I may not
remember if "co" was commit or checkout and want to be concisely
told that it is aliased to checkout without seeing the full manpage.
Which means you'd want some way to command line override anyway, and
having to say "git -c help.followAlias=false cp --help" is not a
great solution.

If we expect users to use "git cp --help" a lot more often than "git
help cp" (or the other way around), one way to give a nicer experience
may be to unconditionally make "git cp --help" to directly show the
manpage of cherry-pick, while keeping "git help cp" to never do
that.  Then those who want to remember what "co" is aliased to can
ask "git help co".

> + /*
> +  * We use split_cmdline() to get the first word of the
> +  * alias, to ensure that we use the same rules as when
> +  * the alias is actually used. split_cmdline()
> +  * modifies alias in-place.
> +  */
> + count = split_cmdline(alias, );
> + if (count < 0)
> + die("Bad alias.%s string: %s", cmd,
> + split_cmdline_strerror(count));
> +
> + if (follow_alias > 0) {
> + fprintf_ln(stderr,
> +_("Continuing to help for %s in %0.1f 
> seconds."),
> +alias, follow_alias/10.0);
> + sleep_millisec(follow_alias * 100);
> + }
> + return alias;

If you have "[alias] cp = cherry-pick -n", split_cmdline discards
"-n" and the follow-alias prompt does not even tell you that it did
so, and you get "git help cherry-pick".  This code somehow expects
you to know to jump to the section that describes the "--no-commit"
option.  I do not think that is a reasonable expectation.

When you have "[alias] cp = cherry-pick -n", "git cp --help" should
not do "git help cherry-pick".  Only a single word that exactly
matches a git command should get this treatment.

>   }
>  
>   if (exclude_guides)


Re: [PATCH] help: allow redirecting to help for aliased command

2018-09-26 Thread Taylor Blau
On Wed, Sep 26, 2018 at 12:26:36PM +0200, Rasmus Villemoes wrote:
> I often use 'git  --help' as a quick way to get the documentation
> for a command. However, I've also trained my muscle memory to use my
> aliases (cp=cherry-pick, co=checkout etc.), which means that I often end
> up doing
>
>   git cp --help
>
> to which git correctly informs me that cp is an alias for
> cherry-pick. However, I already knew that, and what I really wanted was
> the man page for the cherry-pick command.

Neat. I have many of those such aliases myself, and have always wanted
something like this. Thanks for taking the time to put such a patch
together :-).

> This introduces a help.followAlias config option that transparently
> redirects to (the first word of) the alias text (provided of course it
> is not a shell command), similar to the option for autocorrect of
> misspelled commands.

Good. I was curious if you were going to introduce a convention along
the lines of, "If the alias begins with a '!', then pass "--help" to it
and it must respond appropriately." I'm glad that you didn't take that
approach.

> The documentation in config.txt could probably be improved. Also, I
> mimicked the autocorrect case in that the "Continuing to ..." text goes
> to stderr, but because of that, I also print the "is aliased to" text to
> stderr, which is different from the current behaviour of using
> stdout. I'm not sure what the most correct thing is, but I assume --help
> is mostly used interactively with stdout and stderr pointing at the same
> place.
>
> Signed-off-by: Rasmus Villemoes 
> ---
>  Documentation/config.txt | 10 ++
>  builtin/help.c   | 36 +---
>  2 files changed, 43 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index ad0f4510c3..8a1fc8064e 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2105,6 +2105,16 @@ help.autoCorrect::
>   value is 0 - the command will be just shown but not executed.
>   This is the default.
>
> +help.followAlias::
> + When requesting help for an alias, git prints a line of the
> + form "'' is aliased to ''". If this option is
> + set to a positive integer, git proceeds to show the help for

With regard to "set to a positive integer", I'm not sure why this is the
way that it is. I see below you used 'git_config_int()', but I think
that 'git_config_bool()' would be more appropriate.

The later understands strings like "yes", "on" or "true", which I think
is more of what I would expect from a configuration setting such as
this.

> + the first word of  after the given number of
> + deciseconds. If the value of this option is negative, the
> + redirect happens immediately. If the value is 0 (which is the
> + default), or  begins with an exclamation point, no
> + redirect takes place.

It was unclear to my originlly why this was given as a configuration
knob, but my understanding after reading the patch is that this is to do
_additional_ things besides printing what is aliased to what.

Could you perhaps note this in the documentation?

>  help.htmlPath::
>   Specify the path where the HTML documentation resides. File system paths
>   and URLs are supported. HTML pages will be prefixed with this path when
> diff --git a/builtin/help.c b/builtin/help.c
> index 8d4f6dd301..ef1c3f0916 100644
> --- a/builtin/help.c
> +++ b/builtin/help.c
> @@ -34,6 +34,7 @@ enum help_format {
>  };
>
>  static const char *html_path;
> +static int follow_alias;
>
>  static int show_all = 0;
>  static int show_guides = 0;
> @@ -273,6 +274,10 @@ static int git_help_config(const char *var, const char 
> *value, void *cb)
>   html_path = xstrdup(value);
>   return 0;
>   }
> + if (!strcmp(var, "help.followalias")) {
> + follow_alias = git_config_int(var, value);
> + return 0;
> + }

Good. I think in modern Git, we'd prefer to write this as a series of
`else if`'s, but this matches the style of the surrounding code. I think
that you could optionally clean up this style as a preparatory commit,
but ultimately I don't think it's worth a reroll on its own.

>   if (!strcmp(var, "man.viewer")) {
>   if (!value)
>   return config_error_nonbool(var);
> @@ -415,9 +420,34 @@ static const char *check_git_cmd(const char* cmd)
>
>   alias = alias_lookup(cmd);
>   if (alias) {
> - printf_ln(_("'%s' is aliased to '%s'"), cmd, alias);
> - free(alias);
> - exit(0);
> + const char **argv;
> + int count;
> +
> + if (!follow_alias || alias[0] == '!') {
> + printf_ln(_("'%s' is aliased to '%s'"), cmd, alias);
> + free(alias);
> + exit(0);
> + }
> + fprintf_ln(stderr, _("'%s' is aliased to '%s'"), cmd, alias);

OK, 

Re: [PATCH] git.txt: mention mailing list archive

2018-09-26 Thread Taylor Blau
On Wed, Sep 26, 2018 at 10:05:29AM +0200, Martin Ågren wrote:
> On Thu, 20 Sep 2018 at 21:07, Junio C Hamano  wrote:
> >
> > Martin Ågren  writes:
> >
> > > In the "Reporting Bugs" section of git(1), we refer to the mailing list,
> > > but we do not give any hint about where the archives might be found.
> >
> > And why is it a good idea to give that information in Reporting Bugs
> > section?  Are we asking the bug reporters to look for similar issues
> > in the archive before they send their message?  If so, I think that
>
> Your guess is correct, sorry for forcing you to make one.
>
> > we should be explicit about it, too.  Otherwise, the list archive
> > location would look like an irrelevant noise to those who wanted to
> > find the address to report bugs to.
> >
> > For example, we can say something like this:
> >
> > >  Report bugs to the Git mailing list  where the
> > >  development and maintenance is primarily done.  You do not have to be
> > >  subscribed to the list to send a message there.
> >   +If you want to check to see if the issue has
> >   +been reported already, the list archive can be found at
> >   + and other places.
>
> I think that one reason I avoided spelling out why giving the archive
> location was a good thing to do, was that I didn't want to begin a huge
> list of "please do this and that", scaring away potential bug reporters.
> I think your "If you want to" solves that problem very nicely. I'll wrap
> this up later today.

Yeah. This is a tricky issue in my mind. On the one hand, getting a
deluge of duplicate bug reports is a burden for people who actively
read and respond to the list. On the other hand, imposing such a burden
on bug reporters is a detriment to Git users, who wouldn't benefit from
the fixes that would come with good bug reports.

But I think that the above is only a problem if bug reporters are
consistently ignoring this advice. I don't think they will, since the
barrier to entry is already quite high (e.g., sending email is more
foreign to some than opening a GitHub issue, say).

So, I think that the suggestion above is a good one, since I believe
we'd rather get some bad bug reports than no bug reports at all.

Thanks,
Taylor


Re: [PATCH v4 9/9] Documentation/config: add odb..promisorRemote

2018-09-26 Thread Taylor Blau
On Wed, Sep 26, 2018 at 12:12:22AM -0400, Jeff King wrote:
> On Tue, Sep 25, 2018 at 03:31:36PM -0700, Junio C Hamano wrote:
>
> > Christian Couder  writes:
> >
> > > The main issue that this patch series tries to solve is that
> > > extensions.partialclone config option limits the partial clone and
> > > promisor features to only one remote. One related issue is that it
> > > also prevents to have other kind of promisor/partial clone/odb
> > > remotes. By other kind I mean remotes that would not necessarily be
> > > git repos, but that could store objects (that's where ODB, for Object
> > > DataBase, comes from) and could provide those objects to Git through a
> > > helper (or driver) script or program.
> >
> > I do not think "sources that are not git repositories" is all that
> > interesting, unless they can also serve as the source for ext::
> > remote helper.  And if they can serve "git fetch ext::...", I think
> > they can be treated just like a normal Git repository by the
> > backfill code that needs to lazily populate the partial clone.
>
> I don't know about that. Imagine I had a regular Git repo with a bunch
> of large blobs, and then I also stored those large blobs in something
> like S3 that provides caching, geographic locality, and resumable
> transfers.
>
> [ ... ]
>
> Now if you are arguing that the interface to the external-odb helper
> script should be that it _looks_ like upload-pack, but simply advertises
> no refs and will let you fetch any object, that makes more sense to me.
> It's not something you could "git clone", but you can "git fetch" from
> it.
>
> However, that may be an overly constricting interface for the helper.
> E.g., we might want to be able to issue several requests and have them
> transfer in parallel. But I suppose we could teach that trick to
> upload-pack in the long run, as it may be applicable even to fetching
> from "real" git repos.
>
> Hmm. Actually, I kind of like that direction the more I think about it.

Yes, this is an important design decision for Git LFS, which I believe
is important to this series. Git LFS allows the caller to issue `n`
parallel object transfers (uploads or downloads) at a time, which is
useful when, say, checking out a repository that has many large objects.

We do this trick with 'filter.lfs.process', where we accumulate many Git
LFS objects that we wish to tell Git about so that it can check them out
into the working copy, and then promise that we will provide the
contents later (e.g., by sending status=delayed).

We then "batch" up all of those requests, issue them all at once (after
which the LFS API will tell us the URLs of where to upload/download each
item), and then we open "N" threads to do that work.

After all of that, we respond back with all of the objects that we had
to download, and close the process filter.

Thanks,
Taylor


Re: [PATCH v2 2/3] transport.c: introduce core.alternateRefsCommand

2018-09-26 Thread Taylor Blau
On Tue, Sep 25, 2018 at 11:33:37PM -0400, Jeff King wrote:
> On Tue, Sep 25, 2018 at 06:09:35PM -0700, Taylor Blau wrote:
>
> > > So I think this is fine (modulo that the grep and sed can be combined).
> > > Yet another option would be to simply strip away everything except the
> > > object id (which is all we care about), like:
> > >
> > >   depacketize | perl -lne '/^(\S+) \.have/ and print $1'
> >
> > Thanks for this. This is the suggestion I ended up taking (modulo taking
> > '-' as the first argument to 'depacketize').
>
> I don't think depacketize takes any arguments. It always reads from
> stdin directly, doesn't it? Your "-" is not hurting anything, but it is
> totally ignored.

Yep, certainly. I think that I was drawn to this claim because I watched
t5410 fail after applying the above recommendation, so thusly assumed
that it was my fault for not passing `-` to 'depacketize()`.

In the end, I'm not sure why the test failed originally (it's likely
that I hadn't removed the ".have" part of 'expect_haves()', yet). But, I
removed the `-` in my local copy of v3, and the tests passes on all
revisions of this series that have it.

> A perl tangent if you're interested:
>
>   Normally for shell functions like this that are just wrappers around
>   perl snippets, I would suggest to pass "$@" from the function's
>   arguments to perl. So for example if we had:
>
> haves_from_packets () {
>   perl -lne '/^(\S+) \.have/ and print $1' "$@"
> }
>
>   then you could call it with a filename:
>
> haves_from_packets packets
>
>   or input on stdin:
>
> haves_from_packets 
>   and either works (this is magic from perl's "-p" loop, but you get the
>   same if you write "while (<>)" explicitly in your program).
>
>   But because depacketize() has to use byte-wise read() calls, it
>   doesn't get that magic for free. And it did not seem worth the effort
>   to implement, when shell redirections are so easy. ;)

To be clear, we ought to leave this function as:

  extract_haves () {
depacketize | perl -lne '/^(\S+) \.have/ and print $1'
  }

Or are you suggesting that we change it to:

  extract_haves () {
perl -lne '/^(\S+) \.have/ and print $1'
  }

And call it as:

  printf "" | git receive-pack fork >actual &&
  depacketize actual.packets
  extract_haves actual.haves &&

Frankly, (and I think that this is what you're getting at in your reply
above), I think that the former (e.g., calling 'depacketize()' in
'extract_haves()') is cleaner. This approach leaves us with "actual" and
"actual.haves", and obviates the need for another intermediary,
"actual.packets".

> > The 'print $1' part of this makes things a lot nicer, actually, having
> > removed the " .have" suffix. We can get rid of the expect_haves()
> > function above, and instead call 'git rev-parse' inline and get the
> > right results.
>
> Yes. You can even do it all in a single rev-parse call.

Indeed.

Thanks,
Taylor


  1   2   >