Re: [PATCH] git-stash.txt: document show options
Hi Mike,

there is a GSoC project going on to make `git stash` a builtin, and I
think that this change makes sense and at the same time needs to be
coordinated with Paul, who is performing that builtin work, so I Cc:ed
him for visibility.

Ciao,
Johannes

On Wed, 20 Jun 2018, Mike Frysinger wrote:

> Signed-off-by: Mike Frysinger
> ---
>  Documentation/git-stash.txt | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
> index 7ef8c4791177..76e4ca788102 100644
> --- a/Documentation/git-stash.txt
> +++ b/Documentation/git-stash.txt
> @@ -106,7 +106,7 @@ stash@{1}: On master: 9cc0589... Add git-stash
>  The command takes options applicable to the 'git log'
>  command to control what is shown and how. See linkgit:git-log[1].
>
> -show [<stash>]::
> +show [<options>] [<stash>]::
>
> 	Show the changes recorded in the stash entry as a diff between the
> 	stashed contents and the commit back when the stash entry was first
> @@ -116,6 +116,9 @@ show [<stash>]::
> 	to view the second most recent entry in patch form).
> 	You can use stash.showStat and/or stash.showPatch config variables
> 	to change the default behavior.
> ++
> +The command takes options applicable to the 'git diff'
> +command to control what is shown and how. See linkgit:git-diff[1].
>
> pop [--index] [-q|--quiet] [<stash>]::
>
> --
> 2.17.1
[PATCH] git-stash.txt: document show options
Signed-off-by: Mike Frysinger
---
 Documentation/git-stash.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
index 7ef8c4791177..76e4ca788102 100644
--- a/Documentation/git-stash.txt
+++ b/Documentation/git-stash.txt
@@ -106,7 +106,7 @@ stash@{1}: On master: 9cc0589... Add git-stash
 The command takes options applicable to the 'git log'
 command to control what is shown and how. See linkgit:git-log[1].

-show [<stash>]::
+show [<options>] [<stash>]::

	Show the changes recorded in the stash entry as a diff between the
	stashed contents and the commit back when the stash entry was first
@@ -116,6 +116,9 @@ show [<stash>]::
	to view the second most recent entry in patch form).
	You can use stash.showStat and/or stash.showPatch config variables
	to change the default behavior.
++
+The command takes options applicable to the 'git diff'
+command to control what is shown and how. See linkgit:git-diff[1].

 pop [--index] [-q|--quiet] [<stash>]::
--
2.17.1
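To see the behavior this patch documents, here is a minimal sketch of `git stash show` forwarding `git diff` options. The throwaway repository, file name, and identity settings are illustrative only; it assumes a reasonably recent `git` on `PATH`.

```shell
# Sketch: 'git stash show' accepts 'git diff' options, as documented above.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q demo && cd demo
git config user.email you@example.com
git config user.name you
echo one >file.txt
git add file.txt
git commit -qm init
echo two >>file.txt          # modify a tracked file...
git stash -q                 # ...and stash the modification
git stash show               # default: diffstat
git stash show -p            # a 'git diff' option: full patch
git stash show --name-only >names.txt   # another 'git diff' option
cat names.txt
```

Any other `git diff` option (`--stat`, `--numstat`, and so on) can be substituted on the last lines in the same way.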
Re: [msysGit] Possible git status problem at case insensitive file system
Hi Frank,

Your system clock looks to be providing the wrong date for your emails.

The last XP version was
https://github.com/git-for-windows/git/releases/tag/v2.10.0.windows.1
so you may want to upgrade to that (see FAQs
https://github.com/git-for-windows/git/wiki/FAQ).

It won't solve the capitalisation problem - that is a Windows FS issue.
Git assumes case matters, but the FS will fetch directories and branches
case insensitively.

Philip

----- Original Message -----
From: "Frank Li"
To: "Git List"; "msysGit"
Sent: Monday, August 09, 2010 5:22 AM
Subject: [msysGit] Possible git status problem at case insensitive file system

All:

I use msysgit 1.7.0.2 at windows xp.

Problem:
git status will list tracked directory as untracked dir.

Duplicate:
1. mkdir test, cd test
2. git init-db
3. mkdir d, cd d
4. touch a.c
5. git add a.c
6. git commit -a -m "test"
7. cd ..
8. mv d d1
9. mv d1 D
10. git status

# On branch master
# Untracked files:
#   (use "git add ..." to include in what will be committed)
#
#	D/
nothing added to commit but untracked files present (use "git add" to track)

D/ should be the same as d/ on a case insensitive file system. D/ should
not be listed by git status.

best regards
Frank Li
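Following up on Philip's note: a common workaround for case-only renames on case-insensitive filesystems is to rename through a temporary name, so that Git records the change in its index at every step. A hedged sketch, reusing the directory names from Frank's reproduction (the temporary name `d1` and identity settings are arbitrary):

```shell
# Case-only directory rename via an intermediate name.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q test && cd test
git config user.email you@example.com
git config user.name you
mkdir d
echo code >d/a.c
git add d/a.c
git commit -qm "test"
# A bare 'mv d D' confuses Git's index on a case-insensitive FS;
# 'git mv' through an intermediate name keeps the index consistent:
git mv d d1
git mv d1 D
git commit -qm "rename d to D"
git ls-files >tracked.txt
cat tracked.txt
```

After this, `git status` sees `D/a.c` as tracked rather than reporting `D/` as untracked.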
[PATCH] submodule.c: report the submodule that an error occurs in
When an error occurs in updating the working tree of a submodule in
submodule_move_head, tell the user which submodule the error occurred in.

The call to read-tree contains a super-prefix, such that the read-tree
will correctly report any path related issues, but some error messages
do not contain a path, for example:

  ~/gerrit$ git checkout --recurse-submodules origin/master
  ~/gerrit$ fatal: failed to unpack tree object 07672f31880ba80300b38492df9d0acfcd6ee00a

Give the hint which submodule has a problem.

Signed-off-by: Stefan Beller
---
 submodule.c               | 2 +-
 t/lib-submodule-update.sh | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/submodule.c b/submodule.c
index 939d6870ecd..ebd092a14fd 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1668,7 +1668,7 @@ int submodule_move_head(const char *path,
	argv_array_push(&cp.args, new_head ? new_head : empty_tree_oid_hex());

	if (run_command(&cp)) {
-		ret = -1;
+		ret = error(_("Submodule '%s' could not be updated."), path);
		goto out;
	}

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 1f38a85371a..e27f5d8541d 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -781,7 +781,8 @@ test_submodule_recursing_with_args_common() {
		(
			cd submodule_update &&
			git branch -t invalid_sub1 origin/invalid_sub1 &&
-			test_must_fail $command invalid_sub1 &&
+			test_must_fail $command invalid_sub1 2>err &&
+			grep sub1 err &&
			test_superproject_content origin/add_sub1 &&
			test_submodule_content sub1 origin/add_sub1
		)
--
2.18.0.rc2.346.g013aa6912e-goog
[PATCH v2] docs: link to gitsubmodules
Add a link to gitsubmodules(7) under the `submodule.active` entry in
git-config(1).

Signed-off-by: Brandon Williams
---
 Documentation/config.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a..340eb1f3c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3327,12 +3327,13 @@ submodule.<name>.ignore::
 submodule.<name>.active::
	Boolean value indicating if the submodule is of interest to git
	commands.  This config option takes precedence over the
-	submodule.active config option.
+	submodule.active config option. See linkgit:gitsubmodules[7] for
+	details.

 submodule.active::
	A repeated field which contains a pathspec used to match against a
	submodule's path to determine if the submodule is of interest to git
-	commands.
+	commands. See linkgit:gitsubmodules[7] for details.

 submodule.recurse::
	Specifies if commands recurse into submodules by default. This
--
2.18.0.rc1.244.gcf134e6275-goog
[PATCH v3 8/8] fetch-pack: implement ref-in-want
Implement ref-in-want on the client side so that when a server supports the "ref-in-want" feature, a client will send "want-ref" lines for each reference the client wants to fetch. Signed-off-by: Brandon Williams --- fetch-pack.c | 35 +++--- remote.c | 1 + remote.h | 1 + t/t5703-upload-pack-ref-in-want.sh | 4 ++-- 4 files changed, 36 insertions(+), 5 deletions(-) diff --git a/fetch-pack.c b/fetch-pack.c index f1709e816..681b44061 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1102,9 +1102,10 @@ static void add_shallow_requests(struct strbuf *req_buf, static void add_wants(const struct ref *wants, struct strbuf *req_buf) { + int use_ref_in_want = server_supports_feature("fetch", "ref-in-want", 0); + for ( ; wants ; wants = wants->next) { const struct object_id *remote = &wants->old_oid; - const char *remote_hex; struct object *o; /* @@ -1122,8 +1123,10 @@ static void add_wants(const struct ref *wants, struct strbuf *req_buf) continue; } - remote_hex = oid_to_hex(remote); - packet_buf_write(req_buf, "want %s\n", remote_hex); + if (!use_ref_in_want || wants->exact_oid) + packet_buf_write(req_buf, "want %s\n", oid_to_hex(remote)); + else + packet_buf_write(req_buf, "want-ref %s\n", wants->name); } } @@ -1334,6 +1337,29 @@ static void receive_shallow_info(struct fetch_pack_args *args, args->deepen = 1; } +static void receive_wanted_refs(struct packet_reader *reader, struct ref *refs) +{ + process_section_header(reader, "wanted-refs", 0); + while (packet_reader_read(reader) == PACKET_READ_NORMAL) { + struct object_id oid; + const char *end; + struct ref *r = NULL; + + if (parse_oid_hex(reader->line, &oid, &end) || *end++ != ' ') + die("expected wanted-ref, got '%s'", reader->line); + + for (r = refs; r; r = r->next) { + if (!strcmp(end, r->name)) { + oidcpy(&r->old_oid, &oid); + break; + } + } + } + + if (reader->status != PACKET_READ_DELIM) + die("error processing wanted refs: %d", reader->status); +} + enum fetch_state { FETCH_CHECK_LOCAL = 0, FETCH_SEND_REQUEST, @@ 
-1408,6 +1434,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args, if (process_section_header(&reader, "shallow-info", 1)) receive_shallow_info(args, &reader); + if (process_section_header(&reader, "wanted-refs", 1)) + receive_wanted_refs(&reader, ref); + /* get the pack */ process_section_header(&reader, "packfile", 0); if (get_pack(args, fd, pack_lockfile)) diff --git a/remote.c b/remote.c index abe80c139..2c2376fff 100644 --- a/remote.c +++ b/remote.c @@ -1735,6 +1735,7 @@ int get_fetch_map(const struct ref *remote_refs, if (refspec->exact_sha1) { ref_map = alloc_ref(name); get_oid_hex(name, &ref_map->old_oid); + ref_map->exact_oid = 1; } else { ref_map = get_remote_ref(remote_refs, name); } diff --git a/remote.h b/remote.h index 45ecc6cef..976292152 100644 --- a/remote.h +++ b/remote.h @@ -73,6 +73,7 @@ struct ref { force:1, forced_update:1, expect_old_sha1:1, + exact_oid:1, deletion:1; enum { diff --git a/t/t5703-upload-pack-ref-in-want.sh b/t/t5703-upload-pack-ref-in-want.sh index 979ab6d03..b94a51380 100755 --- a/t/t5703-upload-pack-ref-in-want.sh +++ b/t/t5703-upload-pack-ref-in-want.sh @@ -204,7 +204,7 @@ test_expect_success 'server is initially ahead - no ref in want' ' grep "ERR upload-pack: not our ref" err ' -test_expect_failure 'server is initially ahead - ref in want' ' +test_expect_success 'server is initially ahead - ref in want' ' git -C "$REPO" config uploadpack.allowRefInWant true && rm -rf local && cp -r "$LOCAL_PRISTINE" local && @@ -228,7 +228,7 @@ test_expect_success 'server is initially behind - no ref in want' ' test_cmp expected actual ' -test_expect_failure 'server is initially behind - ref in want' ' +test_expect_success 'server is initially behind - ref in want' ' git -C "$REPO" config uploadpack.allowRefInWant true && rm -rf local && cp -r "$LOCAL_PRISTINE" local && -- 2.18.0.rc1.244.gcf134e6275-goog
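To make the wire change in this patch concrete: the `want` and `want-ref` lines that `packet_buf_write()` emits are framed as pkt-lines, i.e. a 4-hex-digit length (which counts the 4 length bytes themselves) followed by the payload. A hedged sketch of that framing in shell, with an example ref name:

```shell
# Build the "want-ref <ref>" request line as a pkt-line:
# 4 hex digits of total length, then the payload plus a trailing LF.
pkt_line () {
	payload="$1"
	# length = 4 (the length prefix itself) + payload + 1 (trailing LF)
	printf '%04x%s\n' $(( ${#payload} + 5 )) "$payload"
}
pkt_line "want-ref refs/heads/master" >request.txt
cat request.txt   # 001fwant-ref refs/heads/master
```

Compare with the old-style `want <oid>` line, which is framed identically but carries a 40-hex object id instead of a ref name.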
[PATCH v3 7/8] fetch-pack: put shallow info in output parameter
Expand the transport fetch method signature, by adding an output parameter, to allow transports to return information about the refs they have fetched. Then communicate shallow status information through this mechanism instead of by modifying the input list of refs. This does require clients to sometimes generate the ref map twice: once from the list of refs provided by the remote (as is currently done) and potentially once from the new list of refs that the fetch mechanism provides. Signed-off-by: Brandon Williams --- builtin/clone.c | 4 ++-- builtin/fetch.c | 28 fetch-object.c | 2 +- fetch-pack.c | 17 + transport-helper.c | 6 -- transport-internal.h | 9 - transport.c | 34 -- transport.h | 3 ++- 8 files changed, 78 insertions(+), 25 deletions(-) diff --git a/builtin/clone.c b/builtin/clone.c index 99e73dae8..8f86d99c5 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -1155,7 +1155,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix) } if (!is_local && !complete_refs_before_fetch) - transport_fetch_refs(transport, mapped_refs); + transport_fetch_refs(transport, mapped_refs, NULL); remote_head = find_ref_by_name(refs, "HEAD"); remote_head_points_at = @@ -1197,7 +1197,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix) if (is_local) clone_local(path, git_dir); else if (refs && complete_refs_before_fetch) - transport_fetch_refs(transport, mapped_refs); + transport_fetch_refs(transport, mapped_refs, NULL); update_remote_refs(refs, mapped_refs, remote_head_points_at, branch_top.buf, reflog_msg.buf, transport, diff --git a/builtin/fetch.c b/builtin/fetch.c index b600e1f10..8362a97a2 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -946,11 +946,13 @@ static int quickfetch(struct ref *ref_map) return check_connected(iterate_ref_map, &rm, &opt); } -static int fetch_refs(struct transport *transport, struct ref *ref_map) +static int fetch_refs(struct transport *transport, struct ref *ref_map, + struct ref **updated_remote_refs) { int 
ret = quickfetch(ref_map); if (ret) - ret = transport_fetch_refs(transport, ref_map); + ret = transport_fetch_refs(transport, ref_map, + updated_remote_refs); if (ret) transport_unlock_pack(transport); return ret; @@ -1106,7 +1108,7 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map) transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL); transport_set_option(transport, TRANS_OPT_DEPTH, "0"); transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL); - if (!fetch_refs(transport, ref_map)) + if (!fetch_refs(transport, ref_map, NULL)) consume_refs(transport, ref_map); if (gsecondary) { @@ -1122,6 +1124,7 @@ static int do_fetch(struct transport *transport, int autotags = (transport->remote->fetch_tags == 1); int retcode = 0; const struct ref *remote_refs; + struct ref *updated_remote_refs = NULL; struct argv_array ref_prefixes = ARGV_ARRAY_INIT; if (tags == TAGS_DEFAULT) { @@ -1172,7 +1175,24 @@ static int do_fetch(struct transport *transport, transport->url); } } - if (fetch_refs(transport, ref_map) || consume_refs(transport, ref_map)) { + + if (fetch_refs(transport, ref_map, &updated_remote_refs)) { + free_refs(ref_map); + retcode = 1; + goto cleanup; + } + if (updated_remote_refs) { + /* +* Regenerate ref_map using the updated remote refs. This is +* to account for additional information which may be provided +* by the transport (e.g. shallow info). 
+*/ + free_refs(ref_map); + ref_map = get_ref_map(transport->remote, updated_remote_refs, rs, + tags, &autotags); + free_refs(updated_remote_refs); + } + if (consume_refs(transport, ref_map)) { free_refs(ref_map); retcode = 1; goto cleanup; diff --git a/fetch-object.c b/fetch-object.c index 853624f81..48fe63dd6 100644 --- a/fetch-object.c +++ b/fetch-object.c @@ -19,7 +19,7 @@ static void fetch_refs(const char *remote_name, struct ref *ref) transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1"); transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1"); - transport_fetch_re
[PATCH v3 6/8] fetch: refactor to make function args narrower
Refactor find_non_local_tags and get_ref_map to only take the information they need instead of the entire transport struct. Besides improving code clarity, this also improves their flexibility, allowing for a different set of refs to be used instead of relying on the ones stored in the transport struct. Signed-off-by: Brandon Williams --- builtin/fetch.c | 52 - 1 file changed, 26 insertions(+), 26 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index ee8b87c78..b600e1f10 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -254,9 +254,9 @@ static int will_fetch(struct ref **head, const unsigned char *sha1) return 0; } -static void find_non_local_tags(struct transport *transport, - struct ref **head, - struct ref ***tail) +static void find_non_local_tags(const struct ref *refs, + struct ref **head, + struct ref ***tail) { struct string_list existing_refs = STRING_LIST_INIT_DUP; struct string_list remote_refs = STRING_LIST_INIT_NODUP; @@ -264,7 +264,7 @@ static void find_non_local_tags(struct transport *transport, struct string_list_item *item = NULL; for_each_ref(add_existing, &existing_refs); - for (ref = transport_get_remote_refs(transport, NULL); ref; ref = ref->next) { + for (ref = refs; ref; ref = ref->next) { if (!starts_with(ref->name, "refs/tags/")) continue; @@ -338,7 +338,8 @@ static void find_non_local_tags(struct transport *transport, string_list_clear(&remote_refs, 0); } -static struct ref *get_ref_map(struct transport *transport, +static struct ref *get_ref_map(struct remote *remote, + const struct ref *remote_refs, struct refspec *rs, int tags, int *autotags) { @@ -346,27 +347,11 @@ static struct ref *get_ref_map(struct transport *transport, struct ref *rm; struct ref *ref_map = NULL; struct ref **tail = &ref_map; - struct argv_array ref_prefixes = ARGV_ARRAY_INIT; /* opportunistically-updated references: */ struct ref *orefs = NULL, **oref_tail = &orefs; struct string_list existing_refs = STRING_LIST_INIT_DUP; - const struct ref 
*remote_refs; - - if (rs->nr) - refspec_ref_prefixes(rs, &ref_prefixes); - else if (transport->remote && transport->remote->fetch.nr) - refspec_ref_prefixes(&transport->remote->fetch, &ref_prefixes); - - if (ref_prefixes.argc && - (tags == TAGS_SET || (tags == TAGS_DEFAULT && !rs->nr))) { - argv_array_push(&ref_prefixes, "refs/tags/"); - } - - remote_refs = transport_get_remote_refs(transport, &ref_prefixes); - - argv_array_clear(&ref_prefixes); if (rs->nr) { struct refspec *fetch_refspec; @@ -403,7 +388,7 @@ static struct ref *get_ref_map(struct transport *transport, if (refmap.nr) fetch_refspec = &refmap; else - fetch_refspec = &transport->remote->fetch; + fetch_refspec = &remote->fetch; for (i = 0; i < fetch_refspec->nr; i++) get_fetch_map(ref_map, &fetch_refspec->items[i], &oref_tail, 1); @@ -411,7 +396,6 @@ static struct ref *get_ref_map(struct transport *transport, die("--refmap option is only meaningful with command-line refspec(s)."); } else { /* Use the defaults */ - struct remote *remote = transport->remote; struct branch *branch = branch_get(NULL); int has_merge = branch_has_merge_config(branch); if (remote && @@ -450,7 +434,7 @@ static struct ref *get_ref_map(struct transport *transport, /* also fetch all tags */ get_fetch_map(remote_refs, tag_refspec, &tail, 0); else if (tags == TAGS_DEFAULT && *autotags) - find_non_local_tags(transport, &ref_map, &tail); + find_non_local_tags(remote_refs, &ref_map, &tail); /* Now append any refs to be updated opportunistically: */ *tail = orefs; @@ -1137,6 +1121,8 @@ static int do_fetch(struct transport *transport, struct ref *ref_map; int autotags = (transport->remote->fetch_tags == 1); int retcode = 0; + const struct ref *remote_refs; + struct argv_array ref_prefixes = ARGV_ARRAY_INIT; if (tags == TAGS_DEFAULT) { if (transport->remote->fetch_tags == 2) @@ -1152,7 +1138,21 @@ static int do_fetch(struct transport *transport, goto cleanup; } - ref_map = get_ref_map(transport, rs, tags, &autotags); + if (rs->nr) + 
refspec_ref_prefixe
[PATCH v3 0/8] ref-in-want
Changes in v3:
 * Discussion seemed to settle on keeping the simplified version of
   ref-in-want where the "want-ref" line only accepts full ref names.
   If we want to we can add patterns at a later time.
 * Reverted back to v1's behavior where requesting a ref that doesn't
   exist is a hard error on the server.  I went back and forth many
   times on what the right thing to do here is and decided that a hard
   error works much cleaner for the time being.
 * Some typos.

Brandon Williams (8):
  test-pkt-line: add unpack-sideband subcommand
  upload-pack: implement ref-in-want
  upload-pack: test negotiation with changing repository
  fetch: refactor the population of peer ref OIDs
  fetch: refactor fetch_refs into two functions
  fetch: refactor to make function args narrower
  fetch-pack: put shallow info in output parameter
  fetch-pack: implement ref-in-want

 Documentation/config.txt                |   7 +
 Documentation/technical/protocol-v2.txt |  28 ++-
 builtin/clone.c                         |   4 +-
 builtin/fetch.c                         | 131 +++++++----
 fetch-object.c                          |   2 +-
 fetch-pack.c                            |  52 +++--
 remote.c                                |   1 +
 remote.h                                |   1 +
 t/helper/test-pkt-line.c                |  33 +++
 t/lib-httpd.sh                          |   1 +
 t/lib-httpd/apache.conf                 |   8 +
 t/lib-httpd/one-time-sed.sh             |  16 ++
 t/t5703-upload-pack-ref-in-want.sh      | 245 ++++++++++++++++++++
 transport-helper.c                      |   6 +-
 transport-internal.h                    |   9 +-
 transport.c                             |  34 +++-
 transport.h                             |   3 +-
 upload-pack.c                           |  64 ++++++
 18 files changed, 568 insertions(+), 77 deletions(-)
 create mode 100644 t/lib-httpd/one-time-sed.sh
 create mode 100755 t/t5703-upload-pack-ref-in-want.sh

--
2.18.0.rc1.244.gcf134e6275-goog
[PATCH v3 2/8] upload-pack: implement ref-in-want
Currently, while performing packfile negotiation, clients are only allowed to specify their desired objects using object ids. This causes a vulnerability to failure when an object turns non-existent during negotiation, which may happen if, for example, the desired repository is provided by multiple Git servers in a load-balancing arrangement and there exists replication delay. In order to eliminate this vulnerability, implement the ref-in-want feature for the 'fetch' command in protocol version 2. This feature enables the 'fetch' command to support requests in the form of ref names through a new "want-ref " parameter. At the conclusion of negotiation, the server will send a list of all of the wanted references (as provided by "want-ref" lines) in addition to the generated packfile. Signed-off-by: Brandon Williams --- Documentation/config.txt| 7 ++ Documentation/technical/protocol-v2.txt | 28 - t/t5703-upload-pack-ref-in-want.sh | 153 upload-pack.c | 64 ++ 4 files changed, 251 insertions(+), 1 deletion(-) create mode 100755 t/t5703-upload-pack-ref-in-want.sh diff --git a/Documentation/config.txt b/Documentation/config.txt index ab641bf5a..fb1dd7428 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -3479,6 +3479,13 @@ Note that this configuration variable is ignored if it is seen in the repository-level config (this is a safety measure against fetching from untrusted repositories). +uploadpack.allowRefInWant:: + If this option is set, `upload-pack` will support the `ref-in-want` + feature of the protocol version 2 `fetch` command. This feature + is intended for the benefit of load-balanced servers which may + not have the same view of what OIDs their refs point to due to + replication delay. + url..insteadOf:: Any URL that starts with this value will be rewritten to start, instead, with . 
In cases where some site serves a diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt index 49bda76d2..9dee75d45 100644 --- a/Documentation/technical/protocol-v2.txt +++ b/Documentation/technical/protocol-v2.txt @@ -299,12 +299,21 @@ included in the client's request: for use with partial clone and partial fetch operations. See `rev-list` for possible "filter-spec" values. +If the 'ref-in-want' feature is advertised, the following argument can +be included in the client's request as well as the potential addition of +the 'wanted-refs' section in the server's response as explained below. + +want-ref + Indicates to the server that the client wants to retrieve a + particular ref, where is the full name of a ref on the + server. + The response of `fetch` is broken into a number of sections separated by delimiter packets (0001), with each section beginning with its section header. output = *section -section = (acknowledgments | shallow-info | packfile) +section = (acknowledgments | shallow-info | wanted-refs | packfile) (flush-pkt | delim-pkt) acknowledgments = PKT-LINE("acknowledgments" LF) @@ -319,6 +328,10 @@ header. shallow = "shallow" SP obj-id unshallow = "unshallow" SP obj-id +wanted-refs = PKT-LINE("wanted-refs" LF) + *PKT-LINE(wanted-ref LF) +wanted-ref = obj-id SP refname + packfile = PKT-LINE("packfile" LF) *PKT-LINE(%x01-03 *%x00-ff) @@ -379,6 +392,19 @@ header. * This section is only included if a packfile section is also included in the response. +wanted-refs section + * This section is only included if the client has requested a + ref using a 'want-ref' line and if a packfile section is also + included in the response. + + * Always begins with the section header "wanted-refs" + + * The server will send a ref listing (" ") for + each reference requested using 'want-ref' lines. + + * The server MUST NOT send any refs which were not requested + using 'want-ref' lines. 
+ packfile section * This section is only included if the client has sent 'want' lines in its request and either requested that no more diff --git a/t/t5703-upload-pack-ref-in-want.sh b/t/t5703-upload-pack-ref-in-want.sh new file mode 100755 index 0..0ef182970 --- /dev/null +++ b/t/t5703-upload-pack-ref-in-want.sh @@ -0,0 +1,153 @@ +#!/bin/sh + +test_description='upload-pack ref-in-want' + +. ./test-lib.sh + +get_actual_refs() { + sed -n '/wanted-refs/,/0001/p' actual_refs +} + +get_actual_commits() { + sed -n '/packfile/,//p' o.pack && + git index-pack o.pack && + git verify-pack -v o.idx | grep commit | cut -c-40 | sort >actual_commits +} + +check_output() { + get_actual_refs && + test_cmp expected_refs actual_refs && + get_actual_commits && +
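The `wanted-refs` section that the protocol-v2 documentation in this patch defines can be sketched as pkt-lines in the same framing as the rest of the protocol (4 hex digits of length including the prefix, then the payload). The 40-hex object id below is a made-up placeholder, not a real object:

```shell
# Sketch of a server's wanted-refs response section.
pkt_line () {
	# length = 4 (the prefix) + payload + 1 (trailing LF)
	printf '%04x%s\n' $(( ${#1} + 5 )) "$1"
}
oid=0123456789012345678901234567890123456789
{
	pkt_line "wanted-refs"                 # section header
	pkt_line "$oid refs/heads/master"      # wanted-ref: <oid> SP <refname>
	printf '0001'                          # delim-pkt ends the section
} >response.txt
cat response.txt
```

Per the spec above, only refs that the client asked for with `want-ref` lines may appear here, and the section is only sent when a packfile section follows.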
[PATCH v3 4/8] fetch: refactor the population of peer ref OIDs
Populate peer ref OIDs in get_ref_map instead of do_fetch. Besides tightening scopes of variables in the code, this also prepares for get_ref_map being able to be called multiple times within do_fetch. Signed-off-by: Brandon Williams --- builtin/fetch.c | 36 ++-- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index ea5b9669a..545635448 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -351,6 +351,7 @@ static struct ref *get_ref_map(struct transport *transport, /* opportunistically-updated references: */ struct ref *orefs = NULL, **oref_tail = &orefs; + struct string_list existing_refs = STRING_LIST_INIT_DUP; const struct ref *remote_refs; if (rs->nr) @@ -458,7 +459,23 @@ static struct ref *get_ref_map(struct transport *transport, tail = &rm->next; } - return ref_remove_duplicates(ref_map); + ref_map = ref_remove_duplicates(ref_map); + + for_each_ref(add_existing, &existing_refs); + for (rm = ref_map; rm; rm = rm->next) { + if (rm->peer_ref) { + struct string_list_item *peer_item = + string_list_lookup(&existing_refs, + rm->peer_ref->name); + if (peer_item) { + struct object_id *old_oid = peer_item->util; + oidcpy(&rm->peer_ref->old_oid, old_oid); + } + } + } + string_list_clear(&existing_refs, 1); + + return ref_map; } #define STORE_REF_ERROR_OTHER 1 @@ -1110,14 +1127,10 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map) static int do_fetch(struct transport *transport, struct refspec *rs) { - struct string_list existing_refs = STRING_LIST_INIT_DUP; struct ref *ref_map; - struct ref *rm; int autotags = (transport->remote->fetch_tags == 1); int retcode = 0; - for_each_ref(add_existing, &existing_refs); - if (tags == TAGS_DEFAULT) { if (transport->remote->fetch_tags == 2) tags = TAGS_SET; @@ -1136,18 +1149,6 @@ static int do_fetch(struct transport *transport, if (!update_head_ok) check_not_current_branch(ref_map); - for (rm = ref_map; rm; rm = rm->next) { - if (rm->peer_ref) { - struct 
string_list_item *peer_item = - string_list_lookup(&existing_refs, - rm->peer_ref->name); - if (peer_item) { - struct object_id *old_oid = peer_item->util; - oidcpy(&rm->peer_ref->old_oid, old_oid); - } - } - } - if (tags == TAGS_DEFAULT && autotags) transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1"); if (prune) { @@ -1183,7 +1184,6 @@ static int do_fetch(struct transport *transport, } cleanup: - string_list_clear(&existing_refs, 1); return retcode; } -- 2.18.0.rc1.244.gcf134e6275-goog
[PATCH v3 1/8] test-pkt-line: add unpack-sideband subcommand
Add an 'unpack-sideband' subcommand to the test-pkt-line helper to
enable unpacking packet line data sent multiplexed using a sideband.

Signed-off-by: Brandon Williams
---
 t/helper/test-pkt-line.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
index 0f19e53c7..30775f986 100644
--- a/t/helper/test-pkt-line.c
+++ b/t/helper/test-pkt-line.c
@@ -1,3 +1,4 @@
+#include "cache.h"
 #include "pkt-line.h"

 static void pack_line(const char *line)
@@ -48,6 +49,36 @@ static void unpack(void)
	}
 }

+static void unpack_sideband(void)
+{
+	struct packet_reader reader;
+	packet_reader_init(&reader, 0, NULL, 0,
+			   PACKET_READ_GENTLE_ON_EOF |
+			   PACKET_READ_CHOMP_NEWLINE);
+
+	while (packet_reader_read(&reader) != PACKET_READ_EOF) {
+		int band;
+		int fd;
+
+		switch (reader.status) {
+		case PACKET_READ_EOF:
+			break;
+		case PACKET_READ_NORMAL:
+			band = reader.line[0] & 0xff;
+			if (band < 1 || band > 2)
+				die("unexpected side band %d", band);
+			fd = band;
+
+			write_or_die(fd, reader.line + 1, reader.pktlen - 1);
+			break;
+		case PACKET_READ_FLUSH:
+			return;
+		case PACKET_READ_DELIM:
+			break;
+		}
+	}
+}
+
 int cmd_main(int argc, const char **argv)
 {
	if (argc < 2)
@@ -57,6 +88,8 @@ int cmd_main(int argc, const char **argv)
		pack(argc - 2, argv + 2);
	else if (!strcmp(argv[1], "unpack"))
		unpack();
+	else if (!strcmp(argv[1], "unpack-sideband"))
+		unpack_sideband();
	else
		die("invalid argument '%s'", argv[1]);
--
2.18.0.rc1.244.gcf134e6275-goog
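For readers unfamiliar with sideband multiplexing: a sideband frame is an ordinary pkt-line whose first payload byte is the band number (1 = pack data, 2 = progress; the helper above rejects anything else). A rough shell analogue of what `unpack_sideband()` does, building one frame and stripping the length prefix and band byte; the payload text is arbitrary:

```shell
# Build one sideband frame and demultiplex it by hand.
payload="hello"
band=$(printf '\001')   # band 1: pack data
# pkt-line length = 4 (prefix) + 1 (band byte) + payload (no LF here)
frame=$(printf '%04x%s%s' $(( ${#payload} + 5 )) "$band" "$payload")
# Drop the 4 length characters and the band byte to recover the payload,
# as unpack_sideband() does with reader.line + 1:
printf '%s\n' "$frame" | cut -c6- >payload.txt
cat payload.txt
```

In the real helper, the band number also selects the output file descriptor (stdout for data, stderr for progress), which is what makes the subcommand useful for inspecting multiplexed server responses in tests.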
[PATCH v3 5/8] fetch: refactor fetch_refs into two functions
Refactor the fetch_refs function into a function that does the fetching
of refs and another function that stores them.

Signed-off-by: Brandon Williams
---
 builtin/fetch.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 545635448..ee8b87c78 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -967,10 +967,16 @@ static int fetch_refs(struct transport *transport, struct ref *ref_map)
	int ret = quickfetch(ref_map);
	if (ret)
		ret = transport_fetch_refs(transport, ref_map);
-	if (!ret)
-		ret |= store_updated_refs(transport->url,
-					  transport->remote->name,
-					  ref_map);
+	if (ret)
+		transport_unlock_pack(transport);
+	return ret;
+}
+
+static int consume_refs(struct transport *transport, struct ref *ref_map)
+{
+	int ret = store_updated_refs(transport->url,
+				     transport->remote->name,
+				     ref_map);
	transport_unlock_pack(transport);
	return ret;
 }
@@ -1116,7 +1122,8 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map)
	transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
	transport_set_option(transport, TRANS_OPT_DEPTH, "0");
	transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
-	fetch_refs(transport, ref_map);
+	if (!fetch_refs(transport, ref_map))
+		consume_refs(transport, ref_map);

	if (gsecondary) {
		transport_disconnect(gsecondary);
@@ -1165,7 +1172,7 @@ static int do_fetch(struct transport *transport,
			    transport->url);
		}
	}
-	if (fetch_refs(transport, ref_map)) {
+	if (fetch_refs(transport, ref_map) || consume_refs(transport, ref_map)) {
		free_refs(ref_map);
		retcode = 1;
		goto cleanup;
--
2.18.0.rc1.244.gcf134e6275-goog
Re: [RFC PATCH 04/10] submodule--helper: add a new 'config' subcommand
On Mon, 14 May 2018 18:33:44 -0700 Stefan Beller wrote: > On Mon, May 14, 2018 at 3:58 AM, Antonio Ospite wrote: > > Add a new 'config' subcommand to 'submodule--helper', this extra level > > of indirection makes it possible to add some flexibility to how the > > submodules configuration is handled. > > > > Signed-off-by: Antonio Ospite > > --- > > builtin/submodule--helper.c | 39 + > > t/t7411-submodule-config.sh | 26 + > > 2 files changed, 65 insertions(+) > > > > diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c > > index 9e8f2acd5..b32110e3b 100644 > > --- a/builtin/submodule--helper.c > > +++ b/builtin/submodule--helper.c > > @@ -1825,6 +1825,44 @@ static int is_active(int argc, const char **argv, > > const char *prefix) > > return !is_submodule_active(the_repository, argv[1]); > > } > > > > +static int config_print_callback(const char *key_, const char *value_, > > void *cb_data) > > +{ > > + char *key = cb_data; > > + > > + if (!strcmp(key, key_)) > > + printf("%s\n", value_); > > + > > + return 0; > > +} > > + > > +static int module_config(int argc, const char **argv, const char *prefix) > > +{ > > + int ret; > > + > > + if (argc < 2 || argc > 3) > > + die("submodule--helper config takes 1 or 2 arguments: name > > [value]"); > > + > > + /* Equivalent to ACTION_GET in builtin/config.c */ > > + if (argc == 2) { > > + char *key; > > + > > + ret = git_config_parse_key(argv[1], &key, NULL); > > + if (ret < 0) > > + return CONFIG_INVALID_KEY; > > + > > + config_from_gitmodules(config_print_callback, > > the_repository, key); > > + > > + free(key); > > + return 0; > > + } > > + > > + /* Equivalent to ACTION_SET in builtin/config.c */ > > + if (argc == 3) > > + return config_gitmodules_set(argv[1], argv[2]); > > Ah, here we definitely want to set it in the .gitmodules file? > (Or does that change later in this series?) 
> > > + > > + return 0; > > +} > > + > > #define SUPPORT_SUPER_PREFIX (1<<0) > > > > struct cmd_struct { > > @@ -1850,6 +1888,7 @@ static struct cmd_struct commands[] = { > > {"push-check", push_check, 0}, > > {"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX}, > > {"is-active", is_active, 0}, > > + {"config", module_config, 0}, > > }; > > > > int cmd_submodule__helper(int argc, const char **argv, const char *prefix) > > diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh > > index a648de6a9..dfe019f05 100755 > > --- a/t/t7411-submodule-config.sh > > +++ b/t/t7411-submodule-config.sh > > @@ -139,4 +139,30 @@ test_expect_success 'error in history in > > fetchrecursesubmodule lets continue' ' > > ) > > ' > > > > +test_expect_success 'reading submodules config with "submodule--helper > > config"' ' > > + (cd super && > > I think the project prefers a style > of the cd at the same level of the echo and the following commands. > There is mixed style about that, so for new tests in existing files I'd stick to the predominant style in the file. For new test files I'll use the recommended style of cd on the same level of the following commands. > However we might not need the (cd super && ...) via > > echo "../submodule" >expected > git -C super ubmodule--helper config submodule.submodule.url >../actual > test_cmp expected actual > > Our friends developing git on Windows will thank us for saving > to spawn a shell as spawning processes is expensive on Windows. :) > Also we have fewer lines of code. > I'll see how that looks, thanks for the suggestion. Again, I'd use a subshell if that's what the other tests in a file do, and use "git -C" in new files. Thanks, Antonio -- Antonio Ospite https://ao2.it https://twitter.com/ao2it A: Because it messes up the order in which people normally read text. See http://en.wikipedia.org/wiki/Posting_style Q: Why is top-posting such a bad thing?
[PATCH v3 3/8] upload-pack: test negotiation with changing repository
Add tests to check the behavior of fetching from a repository which changes between rounds of negotiation (for example, when different servers in a load-balancing agreement participate in the same stateless RPC negotiation). This forms a baseline of comparison to the ref-in-want functionality (which will be introduced to the client in subsequent commits), and ensures that subsequent commits do not change existing behavior. As part of this effort, a mechanism to substitute strings in a single HTTP response is added. Signed-off-by: Brandon Williams --- t/lib-httpd.sh | 1 + t/lib-httpd/apache.conf| 8 +++ t/lib-httpd/one-time-sed.sh| 16 ++ t/t5703-upload-pack-ref-in-want.sh | 92 ++ 4 files changed, 117 insertions(+) create mode 100644 t/lib-httpd/one-time-sed.sh diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh index 435a37465..84f8efdd4 100644 --- a/t/lib-httpd.sh +++ b/t/lib-httpd.sh @@ -132,6 +132,7 @@ prepare_httpd() { cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH" install_script broken-smart-http.sh install_script error.sh + install_script one-time-sed.sh ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules" diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf index 724d9ae46..fe68d37bb 100644 --- a/t/lib-httpd/apache.conf +++ b/t/lib-httpd/apache.conf @@ -111,9 +111,14 @@ Alias /auth/dumb/ www/auth/dumb/ SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH} SetEnv GIT_HTTP_EXPORT_ALL + + SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH} + SetEnv GIT_HTTP_EXPORT_ALL + ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1 ScriptAlias /broken_smart/ broken-smart-http.sh/ ScriptAlias /error/ error.sh/ +ScriptAliasMatch /one_time_sed/(.*) one-time-sed.sh/$1 Options FollowSymlinks @@ -123,6 +128,9 @@ ScriptAlias /error/ error.sh/ Options ExecCGI + + Options ExecCGI + Options ExecCGI diff --git a/t/lib-httpd/one-time-sed.sh b/t/lib-httpd/one-time-sed.sh new file mode 100644 index 0..a9c4aa5f4 --- /dev/null +++ b/t/lib-httpd/one-time-sed.sh @@ -0,0 +1,16 @@ +#!/bin/sh + +if [ 
-e one-time-sed ]; then + "$GIT_EXEC_PATH/git-http-backend" >out + + sed "$(cat one-time-sed)" out_modified + + if diff out out_modified >/dev/null; then + cat out + else + cat out_modified + rm one-time-sed + fi +else + "$GIT_EXEC_PATH/git-http-backend" +fi diff --git a/t/t5703-upload-pack-ref-in-want.sh b/t/t5703-upload-pack-ref-in-want.sh index 0ef182970..979ab6d03 100755 --- a/t/t5703-upload-pack-ref-in-want.sh +++ b/t/t5703-upload-pack-ref-in-want.sh @@ -150,4 +150,96 @@ test_expect_success 'want-ref with ref we already have commit for' ' check_output ' +. "$TEST_DIRECTORY"/lib-httpd.sh +start_httpd + +REPO="$HTTPD_DOCUMENT_ROOT_PATH/repo" +LOCAL_PRISTINE="$(pwd)/local_pristine" + +test_expect_success 'setup repos for change-while-negotiating test' ' + ( + git init "$REPO" && + cd "$REPO" && + >.git/git-daemon-export-ok && + test_commit m1 && + git tag -d m1 && + + # Local repo with many commits (so that negotiation will take + # more than 1 request/response pair) + git clone "http://127.0.0.1:$LIB_HTTPD_PORT/smart/repo"; "$LOCAL_PRISTINE" && + cd "$LOCAL_PRISTINE" && + git checkout -b side && + for i in $(seq 1 33); do test_commit s$i; done && + + # Add novel commits to upstream + git checkout master && + cd "$REPO" && + test_commit m2 && + test_commit m3 && + git tag -d m2 m3 + ) && + git -C "$LOCAL_PRISTINE" remote set-url origin "http://127.0.0.1:$LIB_HTTPD_PORT/one_time_sed/repo"; && + git -C "$LOCAL_PRISTINE" config protocol.version 2 +' + +inconsistency() { + # Simulate that the server initially reports $2 as the ref + # corresponding to $1, and after that, $1 as the ref corresponding to + # $1. This corresponds to the real-life situation where the server's + # repository appears to change during negotiation, for example, when + # different servers in a load-balancing arrangement serve (stateless) + # RPCs during a single negotiation. 
+ printf "s/%s/%s/" \ + $(git -C "$REPO" rev-parse $1 | tr -d "\n") \ + $(git -C "$REPO" rev-parse $2 | tr -d "\n") \ + >"$HTTPD_ROOT_PATH/one-time-sed" +} + +test_expect_success 'server is initially ahead - no ref in want' ' + git -C "$REPO" config uploadpack.allowRefInWant false && + rm -rf local && + cp -r "$LOCAL_PRISTINE" local && + inconsistency master 1234567890123456789012345678901234567890 && + test_must_fail git -C local fetch 2>err &&
Re: [RFC PATCH 03/10] t7411: be nicer to other tests and really clean things up
On Mon, 14 May 2018 18:23:22 -0700 Stefan Beller wrote: > On Mon, May 14, 2018 at 3:58 AM, Antonio Ospite wrote: > > Tests 5 and 8 in t/t7411-submodule-config.sh add two commits with > > invalid lines in .gitmodules but then only the second commit is removed. > > > > This may affect subsequent tests if they assume that the .gitmodules > > file has no errors. > > > > Since those commits are not needed anymore remove both of them. > > > > Signed-off-by: Antonio Ospite > > --- > > > > I am putting these fixups to the test-suite before the patch that actually > > needs them so that the test-suite passes after each commit. > > > > t/t7411-submodule-config.sh | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh > > index 0bde5850a..a648de6a9 100755 > > --- a/t/t7411-submodule-config.sh > > +++ b/t/t7411-submodule-config.sh > > @@ -135,7 +135,7 @@ test_expect_success 'error in history in > > fetchrecursesubmodule lets continue' ' > > HEAD submodule \ > > >actual && > > test_cmp expect_error actual && > > - git reset --hard HEAD^ > > + git reset --hard HEAD~2 > > ) > > ' > > As this is the last test in this file, we do not change any subsequent > tests in a subtle way. > Good! > > This is > Reviewed-by: Stefan Beller > > FYI: > This test -- of course -- doesn't quite follow the latest coding guidelines, > as usually we'd prefer a test_when_finished "" > at the beginning of a test. I'll keep that in mind for new tests, trying to remember that 'test_when_finished' does not work in a subshell. BTW I can use 'test_when_finished' here as well, maybe adding a comment to clarify that the command also cleans up something from a previous test. Thanks, Antonio -- Antonio Ospite https://ao2.it https://twitter.com/ao2it A: Because it messes up the order in which people normally read text. See http://en.wikipedia.org/wiki/Posting_style Q: Why is top-posting such a bad thing?
RE: [PATCH v3 0/7] allocate cache entries from memory pool
> > > We debated several approaches for what to do here
> >
> > it would be awesome if the list could participate in the discussion even
> > if only read-only.

A bit of delay in my response here, but I like the suggestion. Here is a summary of some approaches I considered:

1) Do not include any information about where the cache_entry was allocated.

Pros: No extra memory overhead.
Cons: The caller is responsible for freeing the cache entry correctly.

This was our initial approach. We were hoping that all cache entries could be allocated from a memory pool, and we would not have to worry about non-pool allocated entries. There are still a couple of places that need "temporary" cache entries at the moment, so we couldn't move completely to only memory pool allocated cache_entries. This would have resulted in the code allocating many "temporary" cache_entries from a pool, and the memory would not be eligible to be reclaimed until the entire memory pool was freed - a tradeoff we didn't want to make.

2) Include an extra field encoding whether the cache_entry was allocated from a memory pool.

Pro: The discard function can now make a decision regarding how to free the cache_entry.
Con: Each cache entry grows by an extra int field. The bits in the existing ce_flags field are all claimed, so we need an extra field to track this bit of information. We could claim just a bit in the new field now, which would result in the cache_entry still growing by the same amount, but any future bits would not require extra space. This pushes the work for an actual bit field off into the future.

3) Encode whether the cache_entry was allocated from a memory pool as a bit in a field.

Pro: Only a single bit is required to encode this information.
Con: All the existing bits in the ce_flags field are claimed. We need to add an extra bit field and take the same memory growth.
I considered this approach (and am still open to it), but I went for a simpler initial approach to make sure the overall change is acceptable. There is no difference in the memory footprint with this change, so it is only to enable future changes more easily. 4) Include pointer to the memory pool from which this cache_entry was created from Pro: Could (potentially) do some extra bookkeeping, such as automatically cleaning up the memory_pool when all allocated cache_entries are freed. Con: extra complexity, larger growth to cache_entry struct to accommodate mem_pool pointer In the end, we didn't see a tangible benefit to this option at this point. Given the tradeoffs, I went with option #2 for now.
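Option #2 — the approach chosen above — can be sketched as follows. `struct entry` and the function names here are illustrative stand-ins, not the actual git definitions; the point is only that a single extra field lets one discard function decide between free() and "leave it to the pool":

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of option #2: an extra field records whether the
 * entry came from a memory pool, so the discard function can decide
 * how (or whether) to free it. */
struct entry {
	unsigned int mem_pool_allocated; /* the extra int field of option #2 */
	char name[32];
};

static struct entry *make_transient_entry(const char *name)
{
	struct entry *e = calloc(1, sizeof(*e));
	e->mem_pool_allocated = 0; /* heap-allocated: discard must free() */
	strncpy(e->name, name, sizeof(e->name) - 1);
	return e;
}

/* Returns 1 if the entry was individually freed, 0 if it is left for
 * its pool to reclaim in bulk. */
static int discard_entry(struct entry *e)
{
	if (e->mem_pool_allocated)
		return 0; /* reclaimed when the whole pool is discarded */
	free(e);
	return 1;
}
```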
[PATCH v4 2/8] block alloc: add lifecycle APIs for cache_entry structs
Add an API around managing the lifetime of cache_entry structs. Abstracting memory management details behind this API will allow for alternative memory management strategies without affecting all the call sites. This commit does not change how memory is allocated or freed. A later commit in this series will allocate cache entries from memory pools as appropriate. Motivation: It has been observed that the time spent loading an index with a large number of entries is partly dominated by malloc() calls. This change is in preparation for using memory pools to reduce the number of malloc() calls made when loading an index. This API makes a distinction between cache entries that are intended for use with a particular index and cache entries that are not. This enables us to use the knowledge about how a cache entry will be used to make informed decisions about how to handle the corresponding memory. Signed-off-by: Jameson Miller --- apply.c| 24 ++--- blame.c| 5 ++- builtin/checkout.c | 8 ++--- builtin/difftool.c | 6 ++-- builtin/reset.c| 2 +- builtin/update-index.c | 26 ++ cache.h| 24 - merge-recursive.c | 2 +- read-cache.c | 96 ++ resolve-undo.c | 4 ++- split-index.c | 8 ++--- tree.c | 4 +-- unpack-trees.c | 35 -- 13 files changed, 155 insertions(+), 89 deletions(-) diff --git a/apply.c b/apply.c index d79e61591b..1b5d923f4e 100644 --- a/apply.c +++ b/apply.c @@ -4090,12 +4090,12 @@ static int build_fake_ancestor(struct apply_state *state, struct patch *list) return error(_("sha1 information is lacking or useless " "(%s)."), name); - ce = make_cache_entry(patch->old_mode, oid.hash, name, 0, 0); + ce = make_cache_entry(&result, patch->old_mode, oid.hash, name, 0, 0); if (!ce) return error(_("make_cache_entry failed for path '%s'"), name); if (add_index_entry(&result, ce, ADD_CACHE_OK_TO_ADD)) { - free(ce); + discard_cache_entry(ce); return error(_("could not add %s to temporary index"), name); } @@ -4263,12 +4263,11 @@ static int add_index_file(struct apply_state *state, 
struct stat st; struct cache_entry *ce; int namelen = strlen(path); - unsigned ce_size = cache_entry_size(namelen); if (!state->update_index) return 0; - ce = xcalloc(1, ce_size); + ce = make_empty_cache_entry(&the_index, namelen); memcpy(ce->name, path, namelen); ce->ce_mode = create_ce_mode(mode); ce->ce_flags = create_ce_flags(0); @@ -4278,13 +4277,13 @@ static int add_index_file(struct apply_state *state, if (!skip_prefix(buf, "Subproject commit ", &s) || get_oid_hex(s, &ce->oid)) { - free(ce); - return error(_("corrupt patch for submodule %s"), path); + discard_cache_entry(ce); + return error(_("corrupt patch for submodule %s"), path); } } else { if (!state->cached) { if (lstat(path, &st) < 0) { - free(ce); + discard_cache_entry(ce); return error_errno(_("unable to stat newly " "created file '%s'"), path); @@ -4292,13 +4291,13 @@ static int add_index_file(struct apply_state *state, fill_stat_cache_info(ce, &st); } if (write_object_file(buf, size, blob_type, &ce->oid) < 0) { - free(ce); + discard_cache_entry(ce); return error(_("unable to create backing store " "for newly created file %s"), path); } } if (add_cache_entry(ce, ADD_CACHE_OK_TO_ADD) < 0) { - free(ce); + discard_cache_entry(ce); return error(_("unable to add cache entry for %s"), path); } @@ -4422,27 +4421,26 @@ static int add_conflicted_stages_file(struct apply_state *state, struct patch *patch) { int stage, namelen; - unsigned ce_size, mode; + unsigned mode; struct cache_entry *ce; if (!state->update_index) return 0; namelen = strlen(patch->new_name); - ce_size = cache_entry_size(namelen); mode = patch->new_
[PATCH v4 4/8] mem-pool: tweak math on mp_block allocation size
The amount of memory to allocate for mp_blocks was off by a sizeof(uintmax_t) on platforms that do not support flexible arrays. To account for this, use the offset of the space field when calculating the total amount of memory to allocate for a mp_block. Signed-off-by: Jameson Miller --- mem-pool.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mem-pool.c b/mem-pool.c index c80124f1fe..a5d5eed923 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -8,9 +8,11 @@ static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc) { struct mp_block *p; + size_t total_alloc = st_add(offsetof(struct mp_block, space), block_alloc); + + mem_pool->pool_alloc = st_add(mem_pool->pool_alloc, total_alloc); + p = xmalloc(total_alloc); - mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc; - p = xmalloc(st_add(sizeof(struct mp_block), block_alloc)); p->next_block = mem_pool->mp_block; p->next_free = (char *)p->space; p->end = p->next_free + block_alloc; -- 2.14.3
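The off-by-sizeof(uintmax_t) issue is easiest to see by comparing `sizeof` with `offsetof` for the two struct layouts. The structs below are simplified stand-ins for mp_block (git's `st_add` is its overflow-checked addition helper; plain addition is used here for brevity):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With a true C99 flexible array member, the trailing array
 * contributes nothing to sizeof. */
struct blk_flex {
	struct blk_flex *next;
	char *end;
	uintmax_t space[];   /* flexible array member */
};

/* On platforms without flexible arrays, git emulates them with a
 * one-element array - which *does* contribute to sizeof. Computing
 * sizeof(struct) + len then over-allocates by sizeof(uintmax_t),
 * while offsetof(struct, space) + len is exact in both layouts. */
struct blk_compat {
	struct blk_compat *next;
	char *end;
	uintmax_t space[1];  /* pre-C99 emulation */
};
```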
[PATCH v4 6/8] mem-pool: fill out functionality
Add functions for: - combining two memory pools - determining if a memory address is within the range managed by a memory pool These functions will be used by future commits. Signed-off-by: Jameson Miller --- mem-pool.c | 42 ++ mem-pool.h | 13 + 2 files changed, 55 insertions(+) diff --git a/mem-pool.c b/mem-pool.c index 4e544459e9..81fda969d3 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -94,3 +94,45 @@ void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) memset(r, 0, len); return r; } + +int mem_pool_contains(struct mem_pool *mem_pool, void *mem) +{ + struct mp_block *p; + + /* Check if memory is allocated in a block */ + for (p = mem_pool->mp_block; p; p = p->next_block) + if ((mem >= ((void *)p->space)) && + (mem < ((void *)p->end))) + return 1; + + return 0; +} + +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src) +{ + struct mp_block *p; + + /* Append the blocks from src to dst */ + if (dst->mp_block && src->mp_block) { + /* +* src and dst have blocks, append +* blocks from src to dst. +*/ + p = dst->mp_block; + while (p->next_block) + p = p->next_block; + + p->next_block = src->mp_block; + } else if (src->mp_block) { + /* +* src has blocks, dst is empty. +*/ + dst->mp_block = src->mp_block; + } else { + /* src is empty, nothing to do. */ + } + + dst->pool_alloc += src->pool_alloc; + src->pool_alloc = 0; + src->mp_block = NULL; +} diff --git a/mem-pool.h b/mem-pool.h index f75b3365d5..adeefdcb28 100644 --- a/mem-pool.h +++ b/mem-pool.h @@ -41,4 +41,17 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len); */ void *mem_pool_calloc(struct mem_pool *pool, size_t count, size_t size); +/* + * Move the memory associated with the 'src' pool to the 'dst' pool. The 'src' + * pool will be empty and not contain any memory. It still needs to be free'd + * with a call to `mem_pool_discard`. 
+ */ +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src); + +/* + * Check if a memory pointed at by 'mem' is part of the range of + * memory managed by the specified mem_pool. + */ +int mem_pool_contains(struct mem_pool *mem_pool, void *mem); + #endif -- 2.14.3
[PATCH v4 7/8] block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is dominated by malloc() calls. This can be mitigated by allocating a large block of memory up front and managing it ourselves via memory pools. This change moves the cache entry allocation to be on top of memory pools.

Design: The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for.

In the case of a split index, the following rules are followed. First, some terminology:

- 'the_index': represents the logical view of the index
- 'split_index': represents the "base" cache entries, read from the split index file

'the_index' can reference a single split_index, as well as cache_entries from the split_index. `the_index` will be discarded before the `split_index` is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the `split_index`'s memory pool. This allows us to follow the pattern that `the_index` can reference cache_entries from the `split_index`, and that the cache_entries will not be freed while they are still being referenced.
Signed-off-by: Jameson Miller --- cache.h| 21 ++ mem-pool.c | 3 +- read-cache.c | 119 - split-index.c | 50 unpack-trees.c | 13 +-- 5 files changed, 167 insertions(+), 39 deletions(-) diff --git a/cache.h b/cache.h index abcc27ff87..74d3ebebc2 100644 --- a/cache.h +++ b/cache.h @@ -15,6 +15,7 @@ #include "path.h" #include "sha1-array.h" #include "repository.h" +#include "mem-pool.h" #include typedef struct git_zstream { @@ -156,6 +157,7 @@ struct cache_entry { struct stat_data ce_stat_data; unsigned int ce_mode; unsigned int ce_flags; + unsigned int mem_pool_allocated; unsigned int ce_namelen; unsigned int index; /* for link extension */ struct object_id oid; @@ -227,6 +229,7 @@ static inline void copy_cache_entry(struct cache_entry *dst, const struct cache_entry *src) { unsigned int state = dst->ce_flags & CE_HASHED; + int mem_pool_allocated = dst->mem_pool_allocated; /* Don't copy hash chain and name */ memcpy(&dst->ce_stat_data, &src->ce_stat_data, @@ -235,6 +238,9 @@ static inline void copy_cache_entry(struct cache_entry *dst, /* Restore the hash state */ dst->ce_flags = (dst->ce_flags & ~CE_HASHED) | state; + + /* Restore the mem_pool_allocated flag */ + dst->mem_pool_allocated = mem_pool_allocated; } static inline unsigned create_ce_flags(unsigned stage) @@ -328,6 +334,7 @@ struct index_state { struct untracked_cache *untracked; uint64_t fsmonitor_last_update; struct ewah_bitmap *fsmonitor_dirty; + struct mem_pool *ce_mem_pool; }; extern struct index_state the_index; @@ -362,6 +369,20 @@ extern struct cache_entry *make_empty_transient_cache_entry(size_t name_len); */ void discard_cache_entry(struct cache_entry *ce); +/* + * Duplicate a cache_entry. Allocate memory for the new entry from a + * memory_pool. Takes into account cache_entry fields that are meant + * for managing the underlying memory allocation of the cache_entry. 
+ */ +struct cache_entry *dup_cache_entry(const struct cache_entry *ce, struct index_state *istate); + +/* + * Validate the cache entries in the index. This is an internal + * consistency check that the cache_entry structs are allocated from + * the expected memory pool. + */ +void validate_cache_entries(const struct index_state *istate); + #ifndef NO_THE_INDEX_COMPATIBILITY_MACROS #define active_cache (the_index.cache) #define active_nr (the_index.cache_nr) diff --git a/mem-pool.c b/mem-pool.c index 81fda969d3..0f19cc01a9 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -55,7 +55,8 @@ void mem_pool_discard(struct mem_pool *mem_pool) { struct mp_block *block, *block_to_free; - while ((block = mem_pool->mp_block)) + block = mem_pool->mp_block; + while (block) { block_to_free = block; block = block->next_block; diff --git a/read-cache.c b/read-cache.c index 6396e04e45..a8932ce2a6 100644 --- a/read-cache.c +++ b/read-cache.c @@ -46,6 +46,48 @@ CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \ SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED | FSMONITOR_CHANGED) + +/* + * This is an estimate of the pathname length in the index. We use + * this for V4 in
[PATCH v4 8/8] block alloc: add validations around cache_entry lifecycle
Add an option (controlled by an environment variable) to perform extra validations on mem_pool allocated cache entries. When set:

1) Invalidate cache_entry memory when discarding cache_entry.
2) When discarding index_state struct, verify that all cache_entries were allocated from the expected mem_pool.
3) When discarding mem_pools, invalidate mem_pool memory.

This should provide extra checks that mem_pools and their allocated cache_entries are being used as expected. Signed-off-by: Jameson Miller --- cache.h | 6 ++ git.c| 3 +++ mem-pool.c | 6 +- mem-pool.h | 2 +- read-cache.c | 55 +-- 5 files changed, 68 insertions(+), 4 deletions(-) diff --git a/cache.h b/cache.h index 74d3ebebc2..c0d6b976f5 100644 --- a/cache.h +++ b/cache.h @@ -369,6 +369,12 @@ extern struct cache_entry *make_empty_transient_cache_entry(size_t name_len); */ void discard_cache_entry(struct cache_entry *ce); +/* + * Check configuration if we should perform extra validation on cache + * entries. + */ +int should_validate_cache_entries(void); + /* * Duplicate a cache_entry. Allocate memory for the new entry from a + * memory_pool.
Takes into account cache_entry fields that are meant diff --git a/git.c b/git.c index c2f48d53dd..010898ba6d 100644 --- a/git.c +++ b/git.c @@ -414,7 +414,10 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv) trace_argv_printf(argv, "trace: built-in: git"); + validate_cache_entries(&the_index); status = p->fn(argc, argv, prefix); + validate_cache_entries(&the_index); + if (status) return status; diff --git a/mem-pool.c b/mem-pool.c index 0f19cc01a9..92d106a637 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -51,7 +51,7 @@ void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size) *mem_pool = pool; } -void mem_pool_discard(struct mem_pool *mem_pool) +void mem_pool_discard(struct mem_pool *mem_pool, int invalidate_memory) { struct mp_block *block, *block_to_free; @@ -60,6 +60,10 @@ void mem_pool_discard(struct mem_pool *mem_pool) { block_to_free = block; block = block->next_block; + + if (invalidate_memory) + memset(block_to_free->space, 0xDD, ((char *)block_to_free->end) - ((char *)block_to_free->space)); + free(block_to_free); } diff --git a/mem-pool.h b/mem-pool.h index adeefdcb28..999d3c3a52 100644 --- a/mem-pool.h +++ b/mem-pool.h @@ -29,7 +29,7 @@ void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size); /* * Discard a memory pool and free all the memory it is responsible for. */ -void mem_pool_discard(struct mem_pool *mem_pool); +void mem_pool_discard(struct mem_pool *mem_pool, int invalidate_memory); /* * Alloc memory from the mem_pool. diff --git a/read-cache.c b/read-cache.c index a8932ce2a6..2652f2aeb0 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2047,8 +2047,10 @@ int discard_index(struct index_state *istate) * Cache entries in istate->cache[] should have been allocated * from the memory pool associated with this index, or from an * associated split_index. There is no need to free individual -* cache entries. +* cache entries. validate_cache_entries can detect when this +* assertion does not hold. 
*/ + validate_cache_entries(istate); resolve_undo_clear_index(istate); istate->cache_nr = 0; @@ -2065,13 +2067,45 @@ int discard_index(struct index_state *istate) istate->untracked = NULL; if (istate->ce_mem_pool) { - mem_pool_discard(istate->ce_mem_pool); + mem_pool_discard(istate->ce_mem_pool, should_validate_cache_entries()); istate->ce_mem_pool = NULL; } return 0; } +/* + * Validate the cache entries of this index. + * All cache entries associated with this index + * should have been allocated by the memory pool + * associated with this index, or by a referenced + * split index. + */ +void validate_cache_entries(const struct index_state *istate) +{ + int i; + + if (!should_validate_cache_entries() ||!istate || !istate->initialized) + return; + + for (i = 0; i < istate->cache_nr; i++) { + if (!istate) { + die("internal error: cache entry is not allocated from expected memory pool"); + } else if (!istate->ce_mem_pool || + !mem_pool_contains(istate->ce_mem_pool, istate->cache[i])) { + if (!istate->split_index || + !istate->split_index->base || + !istate->split_index->base->ce_mem_pool || + !mem_pool_contains(istate->split_index->base->ce_mem_pool, istate->cache[i])) { +
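The invalidation done in mem_pool_discard() above is a classic memory-poisoning technique: overwrite freed pool memory with a recognizable byte pattern (0xDD in the patch) so that any use-after-free reads obvious garbage instead of plausible stale data. A minimal sketch of the idea, with a stand-in function name:

```c
#include <assert.h>
#include <string.h>

/* Hedged sketch: when extra validation is enabled, poison the region
 * before freeing it; otherwise leave it untouched (the fast path). */
static void invalidate_region(void *mem, size_t len, int validate)
{
	if (validate)
		memset(mem, 0xDD, len); /* same poison byte as the patch */
}
```

Because the check is gated on an environment variable, the memset cost is paid only in debugging/CI runs, not in normal operation.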
[PATCH v4 1/8] read-cache: teach refresh_cache_entry() to take istate
Refactor refresh_cache_entry() to work on a specific index, instead of implicitly using the_index. This is in preparation for making the make_cache_entry function apply to a specific index. Signed-off-by: Jameson Miller --- cache.h | 2 +- merge-recursive.c | 2 +- read-cache.c | 9 + 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/cache.h b/cache.h index 89a107a7f7..9538511d9f 100644 --- a/cache.h +++ b/cache.h @@ -751,7 +751,7 @@ extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st); #define REFRESH_IGNORE_SUBMODULES 0x0010 /* ignore submodules */ #define REFRESH_IN_PORCELAIN 0x0020 /* user friendly output, not "needs update" */ extern int refresh_index(struct index_state *, unsigned int flags, const struct pathspec *pathspec, char *seen, const char *header_msg); -extern struct cache_entry *refresh_cache_entry(struct cache_entry *, unsigned int); +extern struct cache_entry *refresh_cache_entry(struct index_state *, struct cache_entry *, unsigned int); /* * Opportunistically update the index but do not complain if we can't. 
diff --git a/merge-recursive.c b/merge-recursive.c index f110e1c5ec..11a767cc72 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -323,7 +323,7 @@ static int add_cacheinfo(struct merge_options *o, if (refresh) { struct cache_entry *nce; - nce = refresh_cache_entry(ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); + nce = refresh_cache_entry(&the_index, ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); if (!nce) return err(o, _("add_cacheinfo failed to refresh for path '%s'; merge aborting."), path); if (nce != ce) diff --git a/read-cache.c b/read-cache.c index 372588260e..fa8366ecab 100644 --- a/read-cache.c +++ b/read-cache.c @@ -767,7 +767,7 @@ struct cache_entry *make_cache_entry(unsigned int mode, ce->ce_namelen = len; ce->ce_mode = create_ce_mode(mode); - ret = refresh_cache_entry(ce, refresh_options); + ret = refresh_cache_entry(&the_index, ce, refresh_options); if (ret != ce) free(ce); return ret; @@ -1473,10 +1473,11 @@ int refresh_index(struct index_state *istate, unsigned int flags, return has_errors; } -struct cache_entry *refresh_cache_entry(struct cache_entry *ce, - unsigned int options) +struct cache_entry *refresh_cache_entry(struct index_state *istate, + struct cache_entry *ce, + unsigned int options) { - return refresh_cache_ent(&the_index, ce, options, NULL, NULL); + return refresh_cache_ent(istate, ce, options, NULL, NULL); } -- 2.14.3
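The refactoring pattern in this patch — replacing an implicit dependency on the_index with an explicit index_state parameter, while existing call sites simply pass the_index — can be sketched like this (the types and names are simplified stand-ins, not git's actual definitions):

```c
#include <assert.h>

/* Illustrative stand-in for struct index_state. */
struct index_state_sketch { int id; };

static struct index_state_sketch the_index_sketch = { 1 };

/* After the refactoring: the function operates on whichever index it
 * is given, so callers can target an arbitrary index. */
static int refresh_in(struct index_state_sketch *istate)
{
	return istate->id;
}

/* Existing call sites preserve the old behavior by passing the
 * (formerly implicit) global index explicitly. */
static int refresh_default(void)
{
	return refresh_in(&the_index_sketch);
}
```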
[PATCH v4 0/8] Allocate cache entries from mem_pool
Changes from V3: Mainly changes from the last round of feedback:

- Rename make_index_cache_entry -> make_cache_entry
- Rename make_empty_index_cache_entry -> make_empty_cache_entry
- Remove tail pointer in mem_pool
- Small code tweaks
- More accurately calculate mp_block size for platforms that do not support flexible arrays

One thing that came up in my testing is that the current automated tests do not fully cover the code path of "large" allocations from a memory pool. I was able to force this condition by manually tweaking some variables and then running the automated tests, but this is not ideal for preventing regressions in the future. One way I can think of to test this is to add a test-helper and directly test the memory pool struct. This would allow me to control the parameters and exercise the different conditions. I was hoping for some guidance before I actually implemented these tests. Either way, I would like to do the additional tests in a separate patch series to have a more focused discussion. I am not sure whether these tests would prevent inclusion of this patch series - I am open to guidance here.
Base Ref: master
Web-Diff: https://github.com/jamill/git/compare/242ba98e44...667b8de06c

Jameson Miller (8):
  read-cache: teach refresh_cache_entry() to take istate
  block alloc: add lifecycle APIs for cache_entry structs
  mem-pool: only search head block for available space
  mem-pool: tweak math on mp_block allocation size
  mem-pool: add lifecycle management functions
  mem-pool: fill out functionality
  block alloc: allocate cache entries from mem_pool
  block alloc: add validations around cache_entry lifecycle

 apply.c | 24 +++--
 blame.c | 5 +-
 builtin/checkout.c | 8 +-
 builtin/difftool.c | 6 +-
 builtin/reset.c | 2 +-
 builtin/update-index.c | 26 +++--
 cache.h | 53 +-
 git.c | 3 +
 mem-pool.c | 124 +++
 mem-pool.h | 23 +
 merge-recursive.c | 4 +-
 read-cache.c | 259 -
 resolve-undo.c | 4 +-
 split-index.c | 58 ---
 tree.c | 4 +--
 unpack-trees.c | 40
 16 files changed, 504 insertions(+), 139 deletions(-)

base-commit: 242ba98e44d8314fb184d240939614a3c9b424db -- 2.14.3
[PATCH v4 5/8] mem-pool: add lifecycle management functions
Add initialization and discard functions to the mem_pool type. As the memory allocated by a mem_pool can now be freed, we also track the large memory allocations so they can be freed as well. If there are existing mp_blocks in the mem_pool's linked list of mp_blocks, then the mp_block for a large allocation is inserted after the head block. This is because only the head mp_block is considered when searching for available space. This results in the following desirable properties:

1) The mp_block allocated for the large request will not be included in the search for available space in future requests, as the large mp_block is sized for the specific request and does not contain any spare space.

2) The head mp_block will not be bumped from consideration for future memory requests just because a request for a large chunk of memory came in.

These changes are in preparation for a future commit that will utilize creating and discarding memory pools. Signed-off-by: Jameson Miller --- mem-pool.c | 63 ++ mem-pool.h | 10 ++ 2 files changed, 61 insertions(+), 12 deletions(-) diff --git a/mem-pool.c b/mem-pool.c index a5d5eed923..4e544459e9 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -5,7 +5,14 @@ #include "cache.h" #include "mem-pool.h" -static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc) +#define BLOCK_GROWTH_SIZE 1024*1024 - sizeof(struct mp_block); +/* + * Allocate a new mp_block and insert it after the block specified in + * `insert_after`. If `insert_after` is NULL, then insert block at the + * head of the linked list.
+ */ +static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc, struct mp_block *insert_after) { struct mp_block *p; size_t total_alloc = st_add(offsetof(struct mp_block, space), block_alloc); @@ -13,14 +20,51 @@ static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t b mem_pool->pool_alloc = st_add(mem_pool->pool_alloc, total_alloc); p = xmalloc(total_alloc); - p->next_block = mem_pool->mp_block; p->next_free = (char *)p->space; p->end = p->next_free + block_alloc; - mem_pool->mp_block = p; + + if (insert_after) { + p->next_block = insert_after->next_block; + insert_after->next_block = p; + } else { + p->next_block = mem_pool->mp_block; + mem_pool->mp_block = p; + } return p; } +void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size) +{ + struct mem_pool *pool; + + if (*mem_pool) + return; + + pool = xcalloc(1, sizeof(*pool)); + + pool->block_alloc = BLOCK_GROWTH_SIZE; + + if (initial_size > 0) + mem_pool_alloc_block(pool, initial_size, NULL); + + *mem_pool = pool; +} + +void mem_pool_discard(struct mem_pool *mem_pool) +{ + struct mp_block *block, *block_to_free; + + while ((block = mem_pool->mp_block)) + { + block_to_free = block; + mem_pool->mp_block = block->next_block; + free(block_to_free); + } + + free(mem_pool); +} + void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) { struct mp_block *p = NULL; @@ -33,15 +77,10 @@ void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) if (mem_pool->mp_block && mem_pool->mp_block->end - mem_pool->mp_block->next_free >= len) p = mem_pool->mp_block; - - if (!p) { - if (len >= (mem_pool->block_alloc / 2)) { - mem_pool->pool_alloc += len; - return xmalloc(len); - } - - p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc); - } + else if (len >= (mem_pool->block_alloc / 2)) + p = mem_pool_alloc_block(mem_pool, len, mem_pool->mp_block); + else + p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc, NULL); r = p->next_free; p->next_free += len; diff 
--git a/mem-pool.h b/mem-pool.h index 829ad58ecf..f75b3365d5 100644 --- a/mem-pool.h +++ b/mem-pool.h @@ -21,6 +21,16 @@ struct mem_pool { size_t pool_alloc; }; +/* + * Initialize mem_pool with specified initial size. + */ +void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size); + +/* + * Discard a memory pool and free all the memory it is responsible for. + */ +void mem_pool_discard(struct mem_pool *mem_pool); + /* * Alloc memory from the mem_pool. */ -- 2.14.3
[PATCH v4 3/8] mem-pool: only search head block for available space
Instead of searching all memory blocks for available space to fulfill a memory request, only search the head block. If the head block does not have space, assume that the previous blocks would most likely not be able to fulfill the request either. This could potentially lead to more memory fragmentation, but also avoids searching memory blocks that probably will not be able to fulfill the request. This pattern will benefit consumers that are able to generate a good estimate for how much memory will be needed, or if they are performing fixed-size allocations, so that once a block is exhausted it will never be able to fulfill a future request. Signed-off-by: Jameson Miller --- mem-pool.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/mem-pool.c b/mem-pool.c index 389d7af447..c80124f1fe 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -21,16 +21,16 @@ static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t b void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) { - struct mp_block *p; + struct mp_block *p = NULL; void *r; /* round up to a 'uintmax_t' alignment */ if (len & (sizeof(uintmax_t) - 1)) len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1)); - for (p = mem_pool->mp_block; p; p = p->next_block) - if (p->end - p->next_free >= len) - break; + if (mem_pool->mp_block && + mem_pool->mp_block->end - mem_pool->mp_block->next_free >= len) + p = mem_pool->mp_block; if (!p) { if (len >= (mem_pool->block_alloc / 2)) { -- 2.14.3
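[Editor's note] The `uintmax_t` round-up visible in the context lines above is a standard bit trick that relies on `sizeof(uintmax_t)` being a power of two (true on all platforms git supports). As a standalone sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the round-up in mem_pool_alloc():
 * pad 'len' to the next multiple of sizeof(uintmax_t) so that every
 * pointer handed out by the pool stays suitably aligned.  The mask
 * trick requires the alignment to be a power of two.
 */
static size_t round_up_alignment(size_t len)
{
	if (len & (sizeof(uintmax_t) - 1))
		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
	return len;
}
```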
[PATCH v2 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
Teach 'git-grep(1)' a new option, '--column', to show the column number of the first match on a non-context line. This makes it possible to teach 'contrib/git-jump/git-jump' how to seek to the first matching position of a grep match in your editor, and allows similar additional scripting capabilities. For example: $ git grep -n --column foo | head -n3 .clang-format:51:14:# myFunction(foo, bar, baz); .clang-format:64:7:# int foo(); .clang-format:75:8:# void foo() Signed-off-by: Taylor Blau --- Documentation/git-grep.txt | 6 ++- builtin/grep.c | 1 + t/t7810-grep.sh| 84 ++ 3 files changed, 90 insertions(+), 1 deletion(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 312409a607..31dc0392a6 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -13,7 +13,7 @@ SYNOPSIS [-v | --invert-match] [-h|-H] [--full-name] [-E | --extended-regexp] [-G | --basic-regexp] [-P | --perl-regexp] - [-F | --fixed-strings] [-n | --line-number] + [-F | --fixed-strings] [-n | --line-number] [--column] [-l | --files-with-matches] [-L | --files-without-match] [(-O | --open-files-in-pager) []] [-z | --null] @@ -169,6 +169,10 @@ providing this option will cause it to die. --line-number:: Prefix the line number to matching lines. +--column:: + Prefix the 1-indexed byte-offset of the first match from the start of the + matching line. 
+ -l:: --files-with-matches:: --name-only:: diff --git a/builtin/grep.c b/builtin/grep.c index ee753a403e..61bcaf6e58 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -828,6 +828,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) GREP_PATTERN_TYPE_PCRE), OPT_GROUP(""), OPT_BOOL('n', "line-number", &opt.linenum, N_("show line numbers")), + OPT_BOOL(0, "column", &opt.columnnum, N_("show column number of first match")), OPT_NEGBIT('h', NULL, &opt.pathname, N_("don't show filenames"), 1), OPT_BIT('H', NULL, &opt.pathname, N_("show filenames"), 1), OPT_NEGBIT(0, "full-name", &opt.relative, diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 1797f632a3..bf0b572dab 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -99,6 +99,90 @@ do test_cmp expected actual ' + test_expect_success "grep -w $L (with --column)" ' + { + echo ${HC}file:5:foo mmap bar + echo ${HC}file:14:foo_mmap bar mmap + echo ${HC}file:5:foo mmap bar_mmap + echo ${HC}file:14:foo_mmap bar mmap baz + } >expected && + git grep --column -w -e mmap $H >actual && + test_cmp expected actual + ' + + test_expect_success "grep -w $L (with --column, extended)" ' + { + echo ${HC}file:14:foo_mmap bar mmap + echo ${HC}file:19:foo_mmap bar mmap baz + } >expected && + git grep --column -w -e mmap$ --or -e baz $H >actual && + test_cmp expected actual + ' + + test_expect_success "grep -w $L (with --column, --invert)" ' + { + echo ${HC}file:1:foo mmap bar + echo ${HC}file:1:foo_mmap bar + echo ${HC}file:1:foo_mmap bar mmap + echo ${HC}file:1:foo mmap bar_mmap + } >expected && + git grep --column --invert -w -e baz $H -- file >actual && + test_cmp expected actual + ' + + test_expect_success "grep $L (with --column, --invert, extended)" ' + { + echo ${HC}hello_world:6:HeLLo_world + } >expected && + git grep --column --invert -e ll --or --not -e _ $H -- hello_world \ + >actual && + test_cmp expected actual + ' + + test_expect_success "grep $L (with --column, double-negation)" ' + { + echo 
${HC}file:1:foo_mmap bar mmap baz + } >expected && + git grep --column --not \( --not -e foo --or --not -e baz \) $H -- file \ + >actual && + test_cmp expected actual + ' + + test_expect_success "grep -w $L (with --column, -C)" ' + { + echo ${HC}file:5:foo mmap bar + echo ${HC}file-foo_mmap bar + echo ${HC}file:14:foo_mmap bar mmap + echo ${HC}file:5:foo mmap bar_mmap + echo ${HC}file:14:foo_mmap bar mmap baz + } >expected && + git grep --column -w -C1 -e mmap $H >actual && +
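[Editor's note] The column that `--column` prints is simply the regex engine's match start offset plus one. A minimal sketch of that computation using POSIX regex (a hypothetical helper for illustration; git-grep's real code path also handles fixed strings, PCRE, and multiple patterns):

```c
#include <assert.h>
#include <regex.h>

/*
 * Hypothetical helper mirroring what --column prints: the 1-indexed
 * byte offset of the first match of 'pattern' (POSIX ERE) on 'line',
 * or 0 when the line does not match.
 */
static int first_match_column(const char *pattern, const char *line)
{
	regex_t re;
	regmatch_t m;
	int col = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return 0;
	if (!regexec(&re, line, 1, &m, 0))
		col = (int)m.rm_so + 1;	/* rm_so is 0-indexed */
	regfree(&re);
	return col;
}
```

This lines up with the t7810 expectations above: "mmap" starts at byte 5 of "foo mmap bar", hence `file:5:foo mmap bar`.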
[PATCH v2 2/7] grep.c: expose {,inverted} match column in match_line()
When calling match_line(), callers presently cannot determine the relative offset of the match because match_line() discards the 'regmatch_t' that contains this information. Instead, teach match_line() to take in two 'ssize_t's. Fill the first with the offset of the match produced by the given expression. If extended, fill the latter with the offset of the match produced as if --invert were given. For instance, matching "--not -e x" on this line produces a columnar offset of 0, (i.e., the whole line does not contain an x), but "--invert --not -e x" will fill the latter ssize_t with the column containing an "x", because this expression is semantically equivalent to "-e x". To determine the column for the inverted and non-inverted case, do the following: - If matching an atom, the non-inverted column is as given from match_one_pattern(), and the inverted column is unset. - If matching a --not, the inverted column and non-inverted column swap. - If matching an --and, or --or, the non-inverted column is the minimum of the two children, with the exception that --or is short-circuiting. For instance, if we match "-e a --or -e b" on a line that contains both "a" and "b" (and "b" comes first), the match column will hold the offset of "a", since we inspected the left child first, and short-circuited over the right child. This patch will become useful when we later pick between the two new results in order to display the column number of the first match on a line with --column. 
Co-authored-by: Jeff King Signed-off-by: Taylor Blau --- grep.c | 58 +++--- 1 file changed, 39 insertions(+), 19 deletions(-) diff --git a/grep.c b/grep.c index 45ec7e636c..dedfe17f93 100644 --- a/grep.c +++ b/grep.c @@ -1248,11 +1248,11 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol, return hit; } -static int match_expr_eval(struct grep_expr *x, char *bol, char *eol, - enum grep_context ctx, int collect_hits) +static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x, char *bol, + char *eol, enum grep_context ctx, ssize_t *col, + ssize_t *icol, int collect_hits) { int h = 0; - regmatch_t match; if (!x) die("Not a valid grep expression"); @@ -1261,25 +1261,39 @@ static int match_expr_eval(struct grep_expr *x, char *bol, char *eol, h = 1; break; case GREP_NODE_ATOM: - h = match_one_pattern(x->u.atom, bol, eol, ctx, &match, 0); + { + regmatch_t tmp; + h = match_one_pattern(x->u.atom, bol, eol, ctx, + &tmp, 0); + if (h && (*col < 0 || tmp.rm_so < *col)) + *col = tmp.rm_so; + } break; case GREP_NODE_NOT: - h = !match_expr_eval(x->u.unary, bol, eol, ctx, 0); + /* +* Upon visiting a GREP_NODE_NOT, col and icol become swapped. 
+*/ + h = !match_expr_eval(opt, x->u.unary, bol, eol, ctx, icol, col, +0); break; case GREP_NODE_AND: - if (!match_expr_eval(x->u.binary.left, bol, eol, ctx, 0)) + if (!match_expr_eval(opt, x->u.binary.left, bol, eol, ctx, col, +icol, 0)) return 0; - h = match_expr_eval(x->u.binary.right, bol, eol, ctx, 0); + h = match_expr_eval(opt, x->u.binary.right, bol, eol, ctx, col, + icol, 0); break; case GREP_NODE_OR: if (!collect_hits) - return (match_expr_eval(x->u.binary.left, - bol, eol, ctx, 0) || - match_expr_eval(x->u.binary.right, - bol, eol, ctx, 0)); - h = match_expr_eval(x->u.binary.left, bol, eol, ctx, 0); + return (match_expr_eval(opt, x->u.binary.left, bol, eol, + ctx, col, icol, 0) || + match_expr_eval(opt, x->u.binary.right, bol, + eol, ctx, col, icol, 0)); + h = match_expr_eval(opt, x->u.binary.left, bol, eol, ctx, col, + icol, 0); x->u.binary.left->hit |= h; - h |= match_expr_eval(x->u.binary.right, bol, eol, ctx, 1); + h |= match_expr_eval(opt, x->u.binary.right, bol, eol, ctx, col, +icol, 1); break; default: die("Unexpected node type (internal error) %
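[Editor's note] The col/icol bookkeeping described in the commit message can be modeled with a toy expression evaluator. This is a hypothetical, greatly simplified sketch (no AND/TRUE nodes, no hit collection); note that it always visits both OR children, which corresponds to the no-short-circuit behavior the next patch in this series adds for --column:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical) of the col/icol bookkeeping: ATOM fills the
 * non-inverted column, NOT swaps the two slots for its subtree, and OR
 * visits both children so the earliest offset wins.
 */
enum node_type { NODE_ATOM, NODE_NOT, NODE_OR };

struct expr {
	enum node_type type;
	long atom_col;			/* ATOM: match offset, or -1 for no match */
	struct expr *left, *right;	/* NOT uses only 'left' */
};

static int eval(const struct expr *x, long *col, long *icol)
{
	int h = 0;

	switch (x->type) {
	case NODE_ATOM:
		h = x->atom_col >= 0;
		if (h && (*col < 0 || x->atom_col < *col))
			*col = x->atom_col;
		break;
	case NODE_NOT:
		/* inverting the subtree swaps which slot gets filled */
		h = !eval(x->left, icol, col);
		break;
	case NODE_OR:
		/* visit both children: the right one may match earlier */
		h = eval(x->left, col, icol);
		h |= eval(x->right, col, icol);
		break;
	}
	return h;
}
```

Negating a matching atom reports no hit, but the atom's offset lands in the inverted slot, which is exactly the value --invert later wants to display.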
[PATCH v2 1/7] Documentation/config.txt: camel-case lineNumber for consistency
lineNumber has casing that is inconsistent with surrounding options, like color.grep.matchContext, and color.grep.matchSelected. Re-case this documentation in order to be consistent with the text around it, and to ensure that new entries are consistent, too. Signed-off-by: Taylor Blau --- Documentation/config.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index ab641bf5a9..58fde4daea 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1181,7 +1181,7 @@ color.grep.:: filename prefix (when not using `-h`) `function`;; function name lines (when using `-p`) -`linenumber`;; +`lineNumber`;; line number prefix (when using `-n`) `match`;; matching text (same as setting `matchContext` and `matchSelected`) -- 2.17.0.582.gccdcbd54c
[PATCH v2 6/7] grep.c: add configuration variables to show matched option
To support git-grep(1)'s new option, '--column', document and teach grep.c how to interpret relevant configuration options, similar to those associated with '--line-number'. Signed-off-by: Taylor Blau --- Documentation/config.txt | 5 + Documentation/git-grep.txt | 3 +++ grep.c | 6 ++ 3 files changed, 14 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index 58fde4daea..e4cbed3078 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1183,6 +1183,8 @@ color.grep.:: function name lines (when using `-p`) `lineNumber`;; line number prefix (when using `-n`) +`column`;; + column number prefix (when using `--column`) `match`;; matching text (same as setting `matchContext` and `matchSelected`) `matchContext`;; @@ -1797,6 +1799,9 @@ gitweb.snapshot:: grep.lineNumber:: If set to true, enable `-n` option by default. +grep.column:: + If set to true, enable the `--column` option by default. + grep.patternType:: Set the default matching behavior. Using a value of 'basic', 'extended', 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 31dc0392a6..0de3493b80 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -44,6 +44,9 @@ CONFIGURATION grep.lineNumber:: If set to true, enable `-n` option by default. +grep.column:: + If set to true, enable the `--column` option by default. + grep.patternType:: Set the default matching behavior. 
Using a value of 'basic', 'extended', 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, diff --git a/grep.c b/grep.c index d353d5d976..08d3df2855 100644 --- a/grep.c +++ b/grep.c @@ -96,6 +96,10 @@ int grep_config(const char *var, const char *value, void *cb) opt->linenum = git_config_bool(var, value); return 0; } + if (!strcmp(var, "grep.column")) { + opt->columnnum = git_config_bool(var, value); + return 0; + } if (!strcmp(var, "grep.fullname")) { opt->relative = !git_config_bool(var, value); @@ -112,6 +116,8 @@ int grep_config(const char *var, const char *value, void *cb) color = opt->color_function; else if (!strcmp(var, "color.grep.linenumber")) color = opt->color_lineno; + else if (!strcmp(var, "color.grep.column")) + color = opt->color_columnno; else if (!strcmp(var, "color.grep.matchcontext")) color = opt->color_match_context; else if (!strcmp(var, "color.grep.matchselected")) -- 2.17.0.582.gccdcbd54c
[PATCH v2 3/7] grep.[ch]: extend grep_opt to allow showing matched column
To support showing the matched column when calling 'git-grep(1)', teach 'grep_opt' the normal set of options to configure the default behavior and colorization of this feature. Now that we have opt->columnnum, use it to disable short-circuiting over ORs so that col and icol are always filled with the earliest matches on each line. In addition, don't return the first match from match_line(), for the same reason. Signed-off-by: Taylor Blau --- grep.c | 33 +++-- grep.h | 2 ++ 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/grep.c b/grep.c index dedfe17f93..d56d4a3a37 100644 --- a/grep.c +++ b/grep.c @@ -46,6 +46,7 @@ void init_grep_defaults(void) color_set(opt->color_filename, ""); color_set(opt->color_function, ""); color_set(opt->color_lineno, ""); + color_set(opt->color_columnno, ""); color_set(opt->color_match_context, GIT_COLOR_BOLD_RED); color_set(opt->color_match_selected, GIT_COLOR_BOLD_RED); color_set(opt->color_selected, ""); @@ -155,6 +156,7 @@ void grep_init(struct grep_opt *opt, const char *prefix) opt->extended_regexp_option = def->extended_regexp_option; opt->pattern_type_option = def->pattern_type_option; opt->linenum = def->linenum; + opt->columnnum = def->columnnum; opt->max_depth = def->max_depth; opt->pathname = def->pathname; opt->relative = def->relative; @@ -164,6 +166,7 @@ void grep_init(struct grep_opt *opt, const char *prefix) color_set(opt->color_filename, def->color_filename); color_set(opt->color_function, def->color_function); color_set(opt->color_lineno, def->color_lineno); + color_set(opt->color_columnno, def->color_columnno); color_set(opt->color_match_context, def->color_match_context); color_set(opt->color_match_selected, def->color_match_selected); color_set(opt->color_selected, def->color_selected); @@ -1284,16 +1287,23 @@ static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x, char *bol, icol, 0); break; case GREP_NODE_OR: - if (!collect_hits) + if (!(collect_hits || opt->columnnum)) { + /* +* Don't 
short-circuit OR when given --column (or +* collecting hits) to ensure we don't skip a later +* child that would produce an earlier match. +*/ return (match_expr_eval(opt, x->u.binary.left, bol, eol, ctx, col, icol, 0) || match_expr_eval(opt, x->u.binary.right, bol, eol, ctx, col, icol, 0)); + } h = match_expr_eval(opt, x->u.binary.left, bol, eol, ctx, col, icol, 0); - x->u.binary.left->hit |= h; + if (collect_hits) + x->u.binary.left->hit |= h; h |= match_expr_eval(opt, x->u.binary.right, bol, eol, ctx, col, -icol, 1); +icol, collect_hits); break; default: die("Unexpected node type (internal error) %d", x->node); @@ -1316,6 +1326,7 @@ static int match_line(struct grep_opt *opt, char *bol, char *eol, enum grep_context ctx, int collect_hits) { struct grep_pat *p; + int hit = 0; if (opt->extended) return match_expr(opt, bol, eol, ctx, col, icol, @@ -1325,11 +1336,21 @@ static int match_line(struct grep_opt *opt, char *bol, char *eol, for (p = opt->pattern_list; p; p = p->next) { regmatch_t tmp; if (match_one_pattern(p, bol, eol, ctx, &tmp, 0)) { - *col = tmp.rm_so; - return 1; + hit |= 1; + if (!opt->columnnum) { + /* +* Without --column, any single match on a line +* is enough to know that it needs to be +* printed. With --column, scan _all_ patterns +* to find the earliest. +*/ + break; + } + if (*col < 0 || tmp.rm_so < *col) + *col = tmp.rm_so; } } - return 0; + return hit; } static int match_next_pattern(struct grep_pat *p, char *bol, char *eol, diff --git a/grep.h b/grep.h index 399381c908..08a0b391c5 100644 --- a/grep.h +++ b/grep.h @@ -127,6 +127,7 @@ struct grep_opt { int prefix_length; regex_t regexp; int linenum; +
[PATCH v2 4/7] grep.c: display column number of first match
To prepare for 'git grep' learning '--column', teach grep.c's show_line() how to show the column of the first match on non-context lines. Signed-off-by: Taylor Blau --- grep.c | 33 - 1 file changed, 28 insertions(+), 5 deletions(-) diff --git a/grep.c b/grep.c index d56d4a3a37..d353d5d976 100644 --- a/grep.c +++ b/grep.c @@ -1399,7 +1399,7 @@ static int next_match(struct grep_opt *opt, char *bol, char *eol, } static void show_line(struct grep_opt *opt, char *bol, char *eol, - const char *name, unsigned lno, char sign) + const char *name, unsigned lno, ssize_t cno, char sign) { int rest = eol - bol; const char *match_color, *line_color = NULL; @@ -1434,6 +1434,17 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol, output_color(opt, buf, strlen(buf), opt->color_lineno); output_sep(opt, sign); } + /* +* Treat 'cno' as the 1-indexed offset from the start of a non-context +* line to its first match. Otherwise, 'cno' is 0 indicating that we are +* being called with a context line. 
+*/ + if (opt->columnnum && cno) { + char buf[32]; + xsnprintf(buf, sizeof(buf), "%zu", cno); + output_color(opt, buf, strlen(buf), opt->color_columnno); + output_sep(opt, sign); + } if (opt->color) { regmatch_t match; enum grep_context ctx = GREP_CONTEXT_BODY; @@ -1539,7 +1550,7 @@ static void show_funcname_line(struct grep_opt *opt, struct grep_source *gs, break; if (match_funcname(opt, gs, bol, eol)) { - show_line(opt, bol, eol, gs->name, lno, '='); + show_line(opt, bol, eol, gs->name, lno, 0, '='); break; } } @@ -1604,7 +1615,7 @@ static void show_pre_context(struct grep_opt *opt, struct grep_source *gs, while (*eol != '\n') eol++; - show_line(opt, bol, eol, gs->name, cur, sign); + show_line(opt, bol, eol, gs->name, cur, 0, sign); bol = eol + 1; cur++; } @@ -1803,6 +1814,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle while (left) { char *eol, ch; int hit; + ssize_t cno; ssize_t col = -1, icol = -1; /* @@ -1868,7 +1880,18 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle show_pre_context(opt, gs, bol, eol, lno); else if (opt->funcname) show_funcname_line(opt, gs, bol, lno); - show_line(opt, bol, eol, gs->name, lno, ':'); + cno = opt->invert ? icol : col; + if (cno < 0) { + /* +* A negative cno indicates that there was no +* match on the line. We are thus inverted and +* being asked to show all lines that _don't_ +* match a given expression. Therefore, set cno +* to 0 to suggest the whole line matches. +*/ + cno = 0; + } + show_line(opt, bol, eol, gs->name, lno, cno + 1, ':'); last_hit = lno; if (opt->funcbody) show_function = 1; @@ -1897,7 +1920,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle /* If the last hit is within the post context, * we need to show this line. */ - show_line(opt, bol, eol, gs->name, lno, '-'); + show_line(opt, bol, eol, gs->name, lno, col + 1, '-'); } next_line: -- 2.17.0.582.gccdcbd54c
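[Editor's note] The output convention in show_line() above can be condensed into a hypothetical helper: `cno == 0` marks a context line (no column field), otherwise `cno` is the 1-indexed column, and `sign` is ':' for matching lines versus '-' for context lines. This sketch only models the prefix formatting, not the colorization:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical condensation of the prefix logic in show_line(): print
 * "name<sign>lno<sign>" and, for a matching line with --column, append
 * "cno<sign>".  cno == 0 means a context line, so no column is shown.
 */
static void format_prefix(char *buf, size_t n, const char *name,
			  unsigned lno, size_t cno, char sign)
{
	int len = snprintf(buf, n, "%s%c%u%c", name, sign, lno, sign);

	if (cno && len >= 0 && (size_t)len < n)
		snprintf(buf + len, n - len, "%zu%c", cno, sign);
}
```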
[PATCH v2 7/7] contrib/git-jump/git-jump: jump to exact location
Take advantage of 'git-grep(1)''s new option, '--column' in order to teach Peff's 'git-jump' script how to jump to the correct column for any given match. 'git-grep(1)''s output is in the correct format for Vim's jump list, so no additional cleanup is necessary. Signed-off-by: Taylor Blau --- contrib/git-jump/README | 12 ++-- contrib/git-jump/git-jump | 2 +- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/contrib/git-jump/README b/contrib/git-jump/README index 4484bda410..2f618a7f97 100644 --- a/contrib/git-jump/README +++ b/contrib/git-jump/README @@ -25,6 +25,13 @@ git-jump will feed this to the editor: foo.c:2: printf("hello word!\n"); --- +Or, when running 'git jump grep', column numbers will also be emitted, +e.g. `git jump grep "hello"` would return: + +--- +foo.c:2:9: printf("hello word!\n"); +--- + Obviously this trivial case isn't that interesting; you could just open `foo.c` yourself. But when you have many changes scattered across a project, you can use the editor's support to "jump" from point to point. @@ -35,7 +42,8 @@ Git-jump can generate four types of interesting lists: 2. The beginning of any merge conflict markers. - 3. Any grep matches. + 3. Any grep matches, including the column of the first match on a + line. 4. Any whitespace errors detected by `git diff --check`. @@ -82,7 +90,7 @@ which does something similar to `git jump grep`. However, it is limited to positioning the cursor to the correct line in only the first file, leaving you to locate subsequent hits in that file or other files using the editor or pager. By contrast, git-jump provides the editor with a -complete list of files and line numbers for each match. +complete list of files, lines, and a column number for each match. 
Limitations diff --git a/contrib/git-jump/git-jump b/contrib/git-jump/git-jump index 80ab0590bc..931b0fe3a9 100755 --- a/contrib/git-jump/git-jump +++ b/contrib/git-jump/git-jump @@ -52,7 +52,7 @@ mode_merge() { # editor shows them to us in the status bar. mode_grep() { cmd=$(git config jump.grepCmd) - test -n "$cmd" || cmd="git grep -n" + test -n "$cmd" || cmd="git grep -n --column" $cmd "$@" | perl -pe ' s/[ \t]+/ /g; -- 2.17.0.582.gccdcbd54c
[PATCH v2 0/7] grep.c: teach --column to 'git-grep(1)'
Hi, Here is a re-roll of my series to add --column to 'git-grep(1)'. Since last time, not much has changed other than the following: - Fix a typo where 'col', 'icol' were spelled as 'match', 'imatch' [1]. - Disable short-circuiting OR when --column is given [2]. - Disable early-return in match_line() when multiple patterns are given and --column is, too [3]. - Add some more tests in t7810. Thanks again for your kind and thorough review; hopefully this re-roll should be OK to queue as-is. Thanks, Taylor [1]: https://public-inbox.org/git/xmqqwouuvi0e@gitster-ct.c.googlers.com/ [2]: https://public-inbox.org/git/20180619174452.ga47...@syl.attlocal.net/ [3]: https://public-inbox.org/git/80b9a0b1-3849-7097-fe1a-dd80835d6...@web.de/ Taylor Blau (7): Documentation/config.txt: camel-case lineNumber for consistency grep.c: expose {,inverted} match column in match_line() grep.[ch]: extend grep_opt to allow showing matched column grep.c: display column number of first match builtin/grep.c: add '--column' option to 'git-grep(1)' grep.c: add configuration variables to show matched option contrib/git-jump/git-jump: jump to exact location Documentation/config.txt | 7 ++- Documentation/git-grep.txt | 9 ++- builtin/grep.c | 1 + contrib/git-jump/README| 12 +++- contrib/git-jump/git-jump | 2 +- grep.c | 126 - grep.h | 2 + t/t7810-grep.sh| 84 + 8 files changed, 210 insertions(+), 33 deletions(-) Inter-diff (since v1): diff --git a/grep.c b/grep.c index 8ffa94c791..08d3df2855 100644 --- a/grep.c +++ b/grep.c @@ -1257,8 +1257,8 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol, return hit; } -static int match_expr_eval(struct grep_expr *x, char *bol, char *eol, - enum grep_context ctx, ssize_t *col, +static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x, char *bol, + char *eol, enum grep_context ctx, ssize_t *col, ssize_t *icol, int collect_hits) { int h = 0; @@ -1280,29 +1280,36 @@ static int match_expr_eval(struct grep_expr *x, char *bol, char *eol, 
break; case GREP_NODE_NOT: /* -* Upon visiting a GREP_NODE_NOT, imatch and match become -* swapped. +* Upon visiting a GREP_NODE_NOT, col and icol become swapped. */ - h = !match_expr_eval(x->u.unary, bol, eol, ctx, icol, col, 0); + h = !match_expr_eval(opt, x->u.unary, bol, eol, ctx, icol, col, +0); break; case GREP_NODE_AND: - if (!match_expr_eval(x->u.binary.left, bol, eol, ctx, col, + if (!match_expr_eval(opt, x->u.binary.left, bol, eol, ctx, col, icol, 0)) return 0; - h = match_expr_eval(x->u.binary.right, bol, eol, ctx, col, + h = match_expr_eval(opt, x->u.binary.right, bol, eol, ctx, col, icol, 0); break; case GREP_NODE_OR: - if (!collect_hits) - return (match_expr_eval(x->u.binary.left, bol, eol, ctx, - col, icol, 0) || - match_expr_eval(x->u.binary.right, bol, eol, - ctx, col, icol, 0)); - h = match_expr_eval(x->u.binary.left, bol, eol, ctx, col, + if (!(collect_hits || opt->columnnum)) { + /* +* Don't short-circuit OR when given --column (or +* collecting hits) to ensure we don't skip a later +* child that would produce an earlier match. +*/ + return (match_expr_eval(opt, x->u.binary.left, bol, eol, + ctx, col, icol, 0) || + match_expr_eval(opt, x->u.binary.right, bol, + eol, ctx, col, icol, 0)); + } + h = match_expr_eval(opt, x->u.binary.left, bol, eol, ctx, col, icol, 0); - x->u.binary.left->hit |= h; - h |= match_expr_eval(x->u.binary.right, bol, eol, ctx, col, -icol, 1); + if (collect_hits) + x->u.binary.left->hit |= h; + h |= match_expr_eval(opt, x->u.binary.right, bol, eol, ctx, col, +icol, collect_hits); break; default: die("Unexpected node type (internal error) %d", x->node); @@ -1317
Re: [RFC PATCH 01/10] config: make config_from_gitmodules generally useful
Hi Antonio! On Wed, Jun 20, 2018 at 11:06 AM Antonio Ospite wrote: > I get that the _content_ of .gitmodules is not meant to be generic > config, but I still see some value in having a single point when its > _location_ is decided. I agree that a single point for deciding the _location_ as well as the parsing _order_ is useful (in case there will be multiple files; as of now we have the layering of .gitmodules overlayed by .git/config; IIRC I explained in our conversation that having an intermediate layer would be useful; see one of the latest bug reports[1], where an intermediate layer outside a single branch would prove to be useful). [1] https://public-inbox.org/git/db6pr0101mb2344d682511891e4e9528598d9...@db6pr0101mb2344.eurprd01.prod.exchangelabs.com/ Sorry for not explaining my point of view well enough, let me try again: Historically we did not store any config in the repository itself. There are some exceptions, but these configurations are content related, i.e. .gitmodules or .gitattributes can tell you metadata about the content itself. However other config was kept out: One could have a .gitconfig that pre-populates the .git/config file, right? That was intentionally avoided as there are many configurations that are easy to abuse security-wise, e.g. setting core.pager = "rm -rf /" And then submodules entered the big picture. Submodules need quite a lot of configuration, and hence the .gitmodules file was born. Initially IIRC there was only a very simple config like url store: [submodule.] url = and that was it as it was just like the .gitignore and .gitattributes related to the content and transporting this configuration with the content itself seemed so natural. However then a lot of settings were invented for submodules and some made it into the .gitmodules file; only recently there was a discussion on list how these settings may or may not pose a security issue. 
It turns out we had a CVE (with submodule names) and that got fixed but we really want to keep the .gitmodules file simple and ignore any additional things in there as they may pose security issues later. With that said, how would you write the code while keeping this in mind? If you look at builtin/submodule--helper.c and builtin/fetch.c, you'll see that they use config_from_gitmodules(their_parse_function); git_config(their_parse_function); and config_from_gitmodules is documented to not be expanded; the config_from_gitmodules is singled out to treat those settings that snuck in and are kept around as we don't want to break existing users. But we'd really not want to expand the use of that function again for its implications on security. Right now it is nailed down beautifully and it is easy to tell which commands actually look at config from the .gitmodules file (not to be confused with the submodule parsing, that is in submodule-config.{h, c}). That is actually about finding submodule specific name/path/url etc instead of the generic "submodule.fetchjobs" or "fetch.recursesubmodules". So far about the background story. I would rather replicate the code in repo_read_gitmodules in the submodule-config.c as to mix those two lines (reading generic config vs. reading submodule specific config, like name/url/path). And to not mix them, I would not reuse that function but rather replicate (or have a static helper) in submodule helper, as then we cannot pass in an arbitrary config parsing callback to that, but are bound to the submodule helper code. > > I think extending 'repo_read_gitmodules' makes sense, as that is > > used everywhere for the loading of submodule configuration. > > I would follow Brandon's suggestion here and still use > 'config_from_gitmodules' from 'repo_read_gitmodules' to avoid code > duplication. > > I will try to be clear in the comments and in commit message when > motivating the decision. 
Rereading what Brandon said, I agree with this approach, sorry for writing out the story above in such lengthy detail. Thanks for picking up this series again! Stefan
Re: [RFC PATCH 02/10] submodule: factor out a config_gitmodules_set function
On Mon, 14 May 2018 18:20:21 -0700 Stefan Beller wrote:

> On Mon, May 14, 2018 at 3:58 AM, Antonio Ospite wrote:
> > Introduce a new config_gitmodules_set function to write config values to the
> > .gitmodules file.
> >
> > This is in preparation for a future change which will use the function
> > to write to the .gitmodules file in a more controlled way instead of
> > using "git config -f .gitmodules".
> >
> > Signed-off-by: Antonio Ospite
> > ---
> >
> > Not sure about the name, and maybe it can go in config.c for symmetry with
> > config_from_gitmodules?
>
> What is the function about (in the end state and now)?
> Is it more of a
> * configure_submodules_config()
>   which would convey it is a generic function to configure submodules
>   (i.e. it may not even write to *the* .gitmodules file but somewhere else,
>   such as a helper ref)
>   This doesn't sound like it as we make use of the function in
>   update_path_in_gitmodules() that is used from git-mv, which would want
>   to ensure that the specific .gitmodules file is changed.
> * gitmodules_file_set()
>   that focuses on the specific file that we want to modify?
> * ...
>
> Let's continue reading the series to see the end state for
> a good name.

OK, that helped to clarify one point: eventually there will be some asymmetry between reading the submodule config and writing it:

1. reading will be either from .gitmodules or HEAD:.gitmodules;
2. writing will be only to .gitmodules.

I'll try to consider that when naming the functions. If a more generic mechanism to write the submodules configuration is added in the future (e.g. the special ref you were mentioning) the functions can always be renamed accordingly.

Thanks,
   Antonio

--
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?
Re: [RFC PATCH 01/10] config: make config_from_gitmodules generally useful
On Mon, 14 May 2018 11:19:28 -0700 Brandon Williams wrote: Hi Brandon, sorry for the delay, some comments below. > On 05/14, Antonio Ospite wrote: > > The config_from_gitmodules() function is a good candidate for > > a centralized point where to read the gitmodules configuration file. > > > > Add a repo argument to it to make the function more general, and adjust > > the current callers in cmd_fetch and update-clone. > > > > As a proof of the utility of the change, start using the function also > > in repo_read_gitmodules which was basically doing the same operations. > > > > Signed-off-by: Antonio Ospite > > --- > > builtin/fetch.c | 2 +- > > builtin/submodule--helper.c | 2 +- > > config.c| 13 +++-- > > config.h| 10 +- > > submodule-config.c | 16 > > 5 files changed, 14 insertions(+), 29 deletions(-) > > > > diff --git a/builtin/fetch.c b/builtin/fetch.c > > index 7ee83ac0f..a67ee7c39 100644 > > --- a/builtin/fetch.c > > +++ b/builtin/fetch.c > > @@ -1445,7 +1445,7 @@ int cmd_fetch(int argc, const char **argv, const char > > *prefix) > > for (i = 1; i < argc; i++) > > strbuf_addf(&default_rla, " %s", argv[i]); > > > > - config_from_gitmodules(gitmodules_fetch_config, NULL); > > + config_from_gitmodules(gitmodules_fetch_config, the_repository, NULL); > > git_config(git_fetch_config, NULL); > > > > argc = parse_options(argc, argv, prefix, > > diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c > > index c2403a915..9e8f2acd5 100644 > > --- a/builtin/submodule--helper.c > > +++ b/builtin/submodule--helper.c > > @@ -1602,7 +1602,7 @@ static int update_clone(int argc, const char **argv, > > const char *prefix) > > }; > > suc.prefix = prefix; > > > > - config_from_gitmodules(gitmodules_update_clone_config, &max_jobs); > > + config_from_gitmodules(gitmodules_update_clone_config, the_repository, > > &max_jobs); > > git_config(gitmodules_update_clone_config, &max_jobs); > > > > argc = parse_options(argc, argv, prefix, module_update_clone_options, > > diff 
--git a/config.c b/config.c > > index 6f8f1d8c1..8ffe29330 100644 > > --- a/config.c > > +++ b/config.c > > @@ -2173,17 +2173,18 @@ int git_config_get_pathname(const char *key, const > > char **dest) > > } > > > > /* > > - * Note: This function exists solely to maintain backward compatibility > > with > > - * 'fetch' and 'update_clone' storing configuration in '.gitmodules' and > > should > > - * NOT be used anywhere else. > > + * Note: Initially this function existed solely to maintain backward > > + * compatibility with 'fetch' and 'update_clone' storing configuration in > > + * '.gitmodules' but it turns out it can be useful as a centralized point > > to > > + * read the gitmodules config file. > > I'm all for what you're trying to accomplish in this patch series but I > think a little more care needs to be taken here. Maybe about a year ago > I did some refactoring with how the gitmodules file was loaded and it > was decided that allowing arbitrary configuration in the .gitmodules > file was a bad thing, so I tried to make sure that it was very difficult > to sneak in more of that and limiting it to the places where it was > already done (fetch and update_clone). Now this patch is explicitly > changing the comment on this function to loosen the requirements I made > when it was introduced, which could be problematic in the future. > Yes, it was a little inconsiderate of me, sorry. > So here's what I suggest doing: > 1. Move this function from config.{c,h} to submodule-config.{c,h} to > make it clear who owns this. > 2. Move the gitmodules_set function you created to live in submodule-config. > 3. 
> You can still use this function as the main driver for reading the
> submodule config BUT the comment above the function in both the .c and
> .h files should be adapted so that it clearly spells out that the
> function is to be used only by the submodule config code (reading it
> in repo_read_gitmodules and reading/writing it in the
> submodule-helper config function you've added) and that the only
> exceptions to this are to maintain backwards compatibility with the
> existing callers and that new callers shouldn't be added.

I fully agree, I am going to send a new RFC soon.

> This is just a long way to say that if new callers to this function are
> added in the future that it is made very clear in the code that the
> .gitmodules file exists for a specific purpose and that it shouldn't be
> exploited to ship config with a repository. (If we end up allowing
> config to be shipped with a repository at a later date that will need to
> be handled with great care due to security concerns).

Got it.

> Thanks for working on this, the end result is definitely a step in the
> direction I've wanted the submodule config to head.
Re: [RFC PATCH 01/10] config: make config_from_gitmodules generally useful
On Mon, 14 May 2018 18:05:19 -0700 Stefan Beller wrote:

Hi again Stefan,

> On Mon, May 14, 2018 at 3:58 AM, Antonio Ospite wrote:
> > The config_from_gitmodules() function is a good candidate for
> > a centralized point where to read the gitmodules configuration file.
>
> It is very tempting to use that function. However it was introduced
> specifically to not do that. ;)
>
> See the series that was merged at 5aa0b6c506c (Merge branch
> 'bw/grep-recurse-submodules', 2017-08-22), specifically
> f20e7c1ea24 (submodule: remove submodule.fetchjobs from
> submodule-config parsing, 2017-08-02), where both
> builtin/fetch as well as the submodule helper use the pattern to
> read from the .gitmodules file via this function and then overlay it
> with the read from config.

I get that the _content_ of .gitmodules is not meant to be generic config, but I still see some value in having a single point where its _location_ is decided.

> > Add a repo argument to it to make the function more general, and adjust
> > the current callers in cmd_fetch and update-clone.
>
> This could be a preparatory patch, but including it here is fine, too.

I agree, having this as a preparatory change will help focus the review, thanks.

> > As a proof of the utility of the change, start using the function also
> > in repo_read_gitmodules which was basically doing the same operations.
>
> I think they were separated for the reason outlined above, or what Brandon
> said in his reply.
>
> I think extending 'repo_read_gitmodules' makes sense, as that is
> used everywhere for the loading of submodule configuration.

I would follow Brandon's suggestion here and still use 'config_from_gitmodules' from 'repo_read_gitmodules' to avoid code duplication.

I will try to be clear in the comments and in commit message when motivating the decision.

Thanks for the review Stefan.
Ciao, Antonio -- Antonio Ospite https://ao2.it https://twitter.com/ao2it A: Because it messes up the order in which people normally read text. See http://en.wikipedia.org/wiki/Posting_style Q: Why is top-posting such a bad thing?
Re: [RFC PATCH v2 1/7] git-rebase.txt: document incompatible options
Hi Eric, On Sun, Jun 17, 2018 at 10:15 AM, Eric Sunshine wrote: > On Sun, Jun 17, 2018 at 1:59 AM Elijah Newren wrote: >> git rebase has many options that only work with one of its three backends. >> It also has a few other pairs of incompatible options. Document these. >> >> Signed-off-by: Elijah Newren >> --- >> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt >> @@ -487,6 +510,51 @@ recreates the topic branch with fresh commits so it can >> be remerged >> +Flags only understood by the interactive backend: >> + [...] >> + * --edit-todo >> + * --root + --onto > > What does "+" mean? Is it "and"? Yes, it means "and", though perhaps it'd be even clearer to write "in combination with" or "when used in combination with" >> +Other incompatible flag pairs: >> + >> + * --preserve-merges && --interactive >> + * --preserve-merges && --signoff >> + * --preserve-merges && --rebase-merges >> + * --rebase-merges && --strategy > > It's somewhat odd seeing "programmer" notation in end-user > documentation. Perhaps replace "&&" with "and"? Yes, good point. I'll make that fix.
Re: [RFC PATCH v2 1/7] git-rebase.txt: document incompatible options
On Sun, Jun 17, 2018 at 8:38 AM, Phillip Wood wrote: >> +Other incompatible flag pairs: >> + >> + * --preserve-merges && --interactive >> + * --preserve-merges && --signoff >> + * --preserve-merges && --rebase-merges >> + * --rebase-merges && --strategy > > Does --rebase-merges support --strategy-options? Good catch. I hadn't yet tested --rebase-merges or --preserve-merges myself, so I was just going off notes in the documentation or code for those ones. I created a simple case needing -Xignore-space-change to merge cleanly, and found that any --strategy-option argument will be silently ignored by --rebase-merges currently. So, we should also document and warn the user that those options are currently incompatible. >> + >> include::merge-strategies.txt[] >> NOTES >> > This is a great step forward. In the long run it would be nice not to have > to mention the backends. Once git-rebase--merge.sh is gone, the situation is > simpler and it would be nice to reword this to say something like > The --committer-date-is-author-date, --ignore-date, --whitespace, > --ignore-whitespace and -C options are incompatible with the following > --exec ... Also note that --rebase-merges is incompatible with > --strategy; and --preserve-merges is incompatible with --interactive, > --signoff, --rebase-merges. > rather than talking about am options etc. That sounds like it would be simpler for the incompatible options, but what about discussing inconsistent behavior between different backends (empty commit handling, empty commit message handling, and directory rename detection)? Do we first need to remove all those inconsistencies before rewording the documentation here, or is there a clever way to discuss those?
Re: [RFC PATCH v2 7/7] git-rebase: make --allow-empty-message the default
Hi Phillip,

On Sun, Jun 17, 2018 at 8:30 AM, Phillip Wood wrote:
> Looking at the option parsing in git-rebase.sh it appears that it does not
> check for --no-allow-empty-message so there's no way to turn off the default
> for those modes that support it. I'm not sure what to think of this change;
> I'm slightly uneasy with changing the default to be different from
> cherry-pick, though one could argue rebasing is a different operation and
> just keeping the existing messages even if they are empty is more
> appropriate, and having all the rebase modes default to the same
> behaviour is definitely an improvement.

Here's my extended rationale for why to change the default for rebase -i/-m (and _maybe_ also for cherry-pick):

First, commits with an empty commit message are fairly rare, so it's hard to gather that much data on; we're dealing with outliers. That also means that folks who have touched the area in the past probably weren't dealing with full information either; they just tweaked what was already there.

It makes sense that git-commit would default to not allowing empty commit messages without a flag. We'd like to warn against that kind of poor practice. However, that implicitly means that commands which build on top of git-commit might copy that default even when it doesn't make sense. So it's not at all clear to me that cherry-pick or rebase -i/-m should have the same default. They do, but that may have been entirely accidental. And when someone comes along and notices the problematic default of stopping with empty commit messages, they assume it was intended and add a flag to change it, as happened for both commands.

On the other hand git-am was designed from the beginning around the workflow of applying many patches, and has a different default. am-based rebases copy that default of silently applying commits with empty commit messages; it may not be clear that rebase should copy the default from am, but it does.
Yet, despite the fact that the backend which ignores empty commit messages is the default backend for rebases, and thus is likely used more than the other rebase backends, no one has ever submitted a patch to add a flag to rebase to get the opposite behavior. To me, this argues pretty strongly that not only is ignoring empty messages the right default, but that it's a waste of time to implement a flag (e.g. --no-allow-empty-message) providing the opposite behavior for rebases.

It is obvious that one of the two rebase backends has the wrong default, because they should have the same default as each other. I am inclined to believe, based on the above logic, that the am-based rebase has the correct default. I also suspect that cherry-pick has the wrong default, though it's not as cut-and-dry since it doesn't have multiple backends with conflicting defaults.

Does that make sense? Am I overlooking anything important?
Re: [PATCH 06/23] midx: struct midxed_git and 'read' subcommand
On 6/20/2018 11:07 AM, Duy Nguyen wrote: On Wed, Jun 20, 2018 at 3:33 PM Derrick Stolee wrote: On 6/7/2018 2:31 PM, Duy Nguyen wrote: On Thu, Jun 7, 2018 at 4:03 PM, Derrick Stolee wrote: diff --git a/Documentation/git-midx.txt b/Documentation/git-midx.txt index dcaeb1a91b..919283fdd8 100644 --- a/Documentation/git-midx.txt +++ b/Documentation/git-midx.txt @@ -23,6 +23,11 @@ OPTIONS /packs/multi-pack-index for the current MIDX file, and /packs for the pack-files to index. +read:: + When given as the verb, read the current MIDX file and output + basic information about its contents. Used for debugging + purposes only. On second thought. If you just need a temporary debugging interface, adding a program in t/helper may be a better option. In the end we might still need 'read' to dump a file out, but we should have some stable output format (and json might be a good choice). My intention with this 'read' pattern in the MIDX (and commit-graph) is two-fold: 1. We can test that we are writing the correct data in our test suite. A test-tool builtin would suffice for this purpose. 2. We can help trouble-shoot users who may be having trouble with their MIDX files. Having the subcommand in a plumbing command allows us to do this in the shipped versions of Git. Maybe this second purpose isn't enough to justify the feature in Git and we should move this to the test-tool, especially with the 'verify' mode coming in a second series. Note that a 'verify' mode doesn't satisfy item (1). Yeah I think normally we just have some "fsck" thing to verify when things go bad. If you need more than that I think you just ask the user to send the .midx to you (with full understanding of potentially revealing confidential info and stuff). 
It'll be faster than instructing them to "run this command", "ok, run another command".

I thought of suggesting a command to dump the midx file in readable form (like json), but I think if fsck fails then the chances of that command successfully dumping may be very low.

Either way, if the command is meant for troubleshooting, I think it should be added at the end when the whole midx file is implemented and understood and we see what we need to troubleshoot. Adding small pieces of changes from patch to patch makes it really hard to see if it helps troubleshooting at all; it just helps the first purpose.

I'll abandon point (2) for the later 'verify' patch series. Adding the test helper early allows tests to demonstrate that each patch does the right thing, and that we don't miss testing something.

Thanks,
-Stolee
[PATCH] packfile: generalize pack directory list
In anticipation of sharing the pack directory listing with the multi-pack-index, generalize prepare_packed_git_one() into for_each_file_in_pack_dir(). Signed-off-by: Derrick Stolee --- Duy, I think this is what you mean by sharing code between packfile.c and midx.c for reading the packfiles from a pack directory. This does make the code in midx.c much simpler. Is this change worth it? This patch could stand on its own, or can be incorporated into the next version of the MIDX series. Thanks, -Stolee packfile.c | 103 + packfile.h | 6 2 files changed, 71 insertions(+), 38 deletions(-) diff --git a/packfile.c b/packfile.c index 7cd45aa4b2..db61c8813b 100644 --- a/packfile.c +++ b/packfile.c @@ -738,13 +738,14 @@ static void report_pack_garbage(struct string_list *list) report_helper(list, seen_bits, first, list->nr); } -static void prepare_packed_git_one(struct repository *r, char *objdir, int local) +void for_each_file_in_pack_dir(const char *objdir, + each_file_in_pack_dir_fn fn, + void *data) { struct strbuf path = STRBUF_INIT; size_t dirnamelen; DIR *dir; struct dirent *de; - struct string_list garbage = STRING_LIST_INIT_DUP; strbuf_addstr(&path, objdir); strbuf_addstr(&path, "/pack"); @@ -759,53 +760,79 @@ static void prepare_packed_git_one(struct repository *r, char *objdir, int local strbuf_addch(&path, '/'); dirnamelen = path.len; while ((de = readdir(dir)) != NULL) { - struct packed_git *p; - size_t base_len; - if (is_dot_or_dotdot(de->d_name)) continue; strbuf_setlen(&path, dirnamelen); strbuf_addstr(&path, de->d_name); - base_len = path.len; - if (strip_suffix_mem(path.buf, &base_len, ".idx")) { - /* Don't reopen a pack we already have. */ - for (p = r->objects->packed_git; p; -p = p->next) { - size_t len; - if (strip_suffix(p->pack_name, ".pack", &len) && - len == base_len && - !memcmp(p->pack_name, path.buf, len)) - break; - } - if (p == NULL && - /* -* See if it really is a valid .idx file with -* corresponding .pack file that we can map. 
-*/ - (p = add_packed_git(path.buf, path.len, local)) != NULL) - install_packed_git(r, p); - } - - if (!report_garbage) - continue; - - if (ends_with(de->d_name, ".idx") || - ends_with(de->d_name, ".pack") || - ends_with(de->d_name, ".bitmap") || - ends_with(de->d_name, ".keep") || - ends_with(de->d_name, ".promisor")) - string_list_append(&garbage, path.buf); - else - report_garbage(PACKDIR_FILE_GARBAGE, path.buf); + fn(path.buf, path.len, de->d_name, data); } + closedir(dir); - report_pack_garbage(&garbage); - string_list_clear(&garbage, 0); strbuf_release(&path); } +struct prepare_pack_data +{ + struct repository *r; + struct string_list *garbage; + int local; +}; + +static void prepare_pack(const char *full_name, size_t full_name_len, const char *file_name, void *_data) +{ + struct prepare_pack_data *data = (struct prepare_pack_data *)_data; + struct packed_git *p; + size_t base_len = full_name_len; + + if (strip_suffix_mem(full_name, &base_len, ".idx")) { + /* Don't reopen a pack we already have. */ + for (p = data->r->objects->packed_git; p; p = p->next) { + size_t len; + if (strip_suffix(p->pack_name, ".pack", &len) && + len == base_len && + !memcmp(p->pack_name, full_name, len)) + break; + } + + if (p == NULL && + /* +* See if it really is a valid .idx file with +* corresponding .pack file that we can map. +*/ + (p = add_packed_git(full_name, full_name_len, data->local)) != NULL) + install_packed_git(data->r, p); + } + + if (!report_garbage) + return; + + if (ends_with(file_name, ".idx") || + ends_with(file_name, ".pack") || + ends_with(
Re: [RFC PATCH 3/3] git-rebase.sh: make git-rebase--interactive the default
Hi Dscho, On Sun, Jun 17, 2018 at 2:44 PM, Johannes Schindelin wrote: > I was really referring to speed. But I have to admit that I do not have > any current numbers. > > Another issue just hit me, though: rebase --am does not need to look at as > many Git objects as rebase --merge or rebase -i. Therefore, GVFS users > will still want to use --am wherever possible, to avoid "hydrating" > many objects during their rebase. What is it that makes rebase --am need fewer Git objects than rebase --merge or rebase -i? I have one idea which isn't intrinsic to the algorithm, so I'm curious if there's something else I'm unaware of. My guess at what objects are needed by each type: At a high level, rebase --am for each commit will need to compare the commit to its parent to generate a diff (which thus involves walking over the objects in both the commit and its parent, though it should be able to skip over subtrees that are equal), and then will need to look at all the objects in the target commit on which it needs to apply the patch (in order to properly fill the index for a starting point, and used later when creating a new commit). If the application of the diff fails, it falls back to a three-way merge, though the three-way merge shouldn't need any additional objects. So, to summarize, rebase--am needs objects from the commit being rebased, its parent, and the target commit onto which it is applying, though it can short circuit some objects when the commit and its parent have matching subtree(s). rebase -i, if I understand correctly, does a three-way merge between the commit, its parent, and the target commit. Thus, we again walk over objects in those three commits; I think unpack_trees() does not take advantage of matching trees to avoid descending into subtrees, but if so that's an optimization that we may be able to implement (though it would require diving into unpack_trees() code, which is never easy...). 
(Side notes: (1) rebase --merge is basically the same as rebase -i here; its path to reaching the recursive merge machinery is a bit different but the resulting arguments are the same; (2) a real merge between branches would require more objects because it would have to do some revision walking to find a merge base, and a real merge base is likely to differ more than just the parent commit. But finding merge bases isn't relevant to rebase -m or rebase -i.)

Is there something else I'm missing that fundamentally makes rebase -i need more objects?

> As to speed: that might be harder. But then, the performance might already
> be good enough. I do not have numbers (nor the time to generate them) to
> back up my hunch that --am is substantially faster than --merge.

I too have a hunch that --am is faster than --merge, on big enough repos or repos with enough renames. I can partially back it up with an indirect number: at [1], it was reported that cherry-picks could be sped up by a factor of 20-30 on some repos with lots of renames. I believe there are other performance improvements possible too, for the --merge or -i cases.

I'm also curious now whether your comment on hydrating objects might uncover additional areas where performance improvements could be made for non-am-based rebases of large-enough repos.

Elijah

[1] https://public-inbox.org/git/cabpp-bh4llzejje5cvwwqj8xtj3m9oc-41tu8bm8c7r0gqt...@mail.gmail.com/
(see also Peter's last reply in that thread, and compare to his first post)
Re: Adding nested repository with slash adds files instead of gitlink
On Tue, Jun 19, 2018 at 11:28 AM Heiko Voigt wrote:
> Interesting and nobody complained to the mailing list?

On Wed, Jun 20, 2018 at 3:57 PM Duy Nguyen wrote:
> Abusing a long standing bug does not make it a feature.

To make things clear, I wasn't defending whether this should be considered a bug or a feature. I was just clarifying that this isn't a newly discovered thing and was reported/discussed in the past.

--
Cheers,
Rafael
Re: t5562: gzip -k is not portable
On Wed, Jun 20, 2018 at 08:40:06AM -0400, Jeff King wrote:
> On Wed, Jun 20, 2018 at 08:13:06AM +0200, Torsten Bögershausen wrote:
>
> > Good eyes, thanks.
> > The "-f -c" combo works:
> >
> > - gzip -k fetch_body &&
> > + gzip -f -c fetch_body >fetch_body.gz &&
> > test_copy_bytes 10 <fetch_body.gz >fetch_body.gz.trunc &&
> > - gzip -k push_body &&
> > + gzip -f -c push_body >push_body.gz &&
>
> Do we still need "-f"? With "-c" gzip is only writing to stdout, so we
> should not need to force anything.
>
> -Peff

You are right, -f is not needed. I was confusing stdout with the terminal :-(
Most often stdout is the terminal, but not here of course.
Re: What's cooking in git.git (Jun 2018, #05; Mon, 18)
Jonathan Tan writes:

>> Here are the topics that have been cooking. Commits prefixed with
>> '-' are only in 'pu' (proposed updates) while commits prefixed with
>> '+' are in 'next'. The ones marked with '.' do not appear in any of
>> the integration branches, but I am still holding onto them.
>
> Would it be possible to have my patches that make bitmap_git not global
> [1] in this list? Peff seems OK with it. Let me know if you'd like to
> see anything else.

That one did fall through the cracks. I'm (unfortunately) offline this morning but hopefully can tag the 2.18 final by the end of the day, and then I'll go into the usual "pick up the patches and discussions missed during the pre-release freeze" mode for a few days, during which time a reminder like this is greatly appreciated.

Thanks.

> The original patch should contain an extra paragraph that I've provided
> here [2] in the commit message - let me know if you want a reroll with
> that extra paragraph included.
>
> [1] https://public-inbox.org/git/cover.1528397984.git.jonathanta...@google.com/
>
> [2] https://public-inbox.org/git/2018065046.d03f8093347dc6c0e9b11...@google.com/
Re: The state of the object store series
On Wed, Jun 20, 2018 at 12:09 AM Stefan Beller wrote: > "xx/convert-revision-walking" > This series aims to convert get_merge_bases(), in_merge_bases() and all its > revision walking code to take a repository argument. We'll see who gets there first [1] ;-). It does not really matter (and I suspect you may beat me to it) but I hope we all agree to pass 'struct repository *' in init_revisions(), or I should start reworking that patch now. [1] https://gitlab.com/pclouds/git/commit/a4a91e3815cf0e594d5ce0c862f5060e2c26 -- Duy
Re: [PATCH 06/23] midx: struct midxed_git and 'read' subcommand
On Wed, Jun 20, 2018 at 3:33 PM Derrick Stolee wrote: > > On 6/7/2018 2:31 PM, Duy Nguyen wrote: > > On Thu, Jun 7, 2018 at 4:03 PM, Derrick Stolee wrote: > >> diff --git a/Documentation/git-midx.txt b/Documentation/git-midx.txt > >> index dcaeb1a91b..919283fdd8 100644 > >> --- a/Documentation/git-midx.txt > >> +++ b/Documentation/git-midx.txt > >> @@ -23,6 +23,11 @@ OPTIONS > >> /packs/multi-pack-index for the current MIDX file, and > >> /packs for the pack-files to index. > >> > >> +read:: > >> + When given as the verb, read the current MIDX file and output > >> + basic information about its contents. Used for debugging > >> + purposes only. > > On second thought. If you just need a temporary debugging interface, > > adding a program in t/helper may be a better option. In the end we > > might still need 'read' to dump a file out, but we should have some > > stable output format (and json might be a good choice). > > My intention with this 'read' pattern in the MIDX (and commit-graph) is > two-fold: > > 1. We can test that we are writing the correct data in our test suite. A > test-tool builtin would suffice for this purpose. > > 2. We can help trouble-shoot users who may be having trouble with their > MIDX files. Having the subcommand in a plumbing command allows us to do > this in the shipped versions of Git. > > Maybe this second purpose isn't enough to justify the feature in Git and > we should move this to the test-tool, especially with the 'verify' mode > coming in a second series. Note that a 'verify' mode doesn't satisfy > item (1). Yeah I think normally we just have some "fsck" thing to verify when things go bad. If you need more than that I think you just ask the user to send the .midx to you (with full understanding of potentially revealing confidential info and stuff). 
It'll be faster than instructing them to "run this command", "ok, run another command".

I thought of suggesting a command to dump the midx file in readable form (like json), but I think if fsck fails then the chances of that command successfully dumping may be very low.

Either way, if the command is meant for troubleshooting, I think it should be added at the end when the whole midx file is implemented and understood and we see what we need to troubleshoot. Adding small pieces of changes from patch to patch makes it really hard to see if it helps troubleshooting at all; it just helps the first purpose.
--
Duy
Re: Adding nested repository with slash adds files instead of gitlink
On Wed, Jun 20, 2018 at 1:55 PM Rafael Ascensão wrote: > > On Wed, Jun 20, 2018 at 5:39 AM Kevin Daudt wrote: > > > > What this is about that when doing `git add path/` (with trailing /), > > > > This is what I was referring to. If you search for 'Fake Submodules', > you'll see that some people were/are intentionally using this instead of > subtrees or submodules. Unfortunately the original article [1] seems to > be dead, but searching url in the mailing list archives leads to some > additional discussion on the subject [2,3]. Abusing a long standing bug does not make it a feature. I'm not opposed to having a new option to keep that behavior, but it should not be the default. If you use it that way, you're on your own. > [1]:http://debuggable.com/posts/git-fake-submodules:4b563ee4-f3cc-4061-967e-0e48cbdd56cb > [2]:https://public-inbox.org/git/xmqqy47o6q71@gitster.mtv.corp.google.com/ > [3]:https://public-inbox.org/git/CAGZ79kZofg3jS+g0weTdco+PGo_p-_Hd-NScZ=q2ufb7tf2...@mail.gmail.com/ -- Duy
Re: Excessive mmap [was Git server eats all memory]
On Tue, Jun 19, 2018 at 10:27 PM Ivan Kanis wrote:
>
> Dmitry Potapov wrote:
>
> > On Fri, Aug 06, 2010 at 07:23:17PM +0200, Ivan Kanis wrote:
> >>
> >> I expected the malloc to take 4G but was surprised it didn't. It seems
> >> to be mmap taking all the memory. I am not familiar with that function,
> >> it looks like it's mapping memory to a file... Is it reasonable to mmap
> >> so much memory?
> >
> > AFAIK, Git does not need to mmap the whole pack to memory, but it
> > is more efficient to mmap the whole pack wherever possible, because
> > it has a completely random access, so if you store only one sliding
> > window, you will have to re-read it many times. Besides, mmap size
> > does not mean that so much physical memory is used. Pages should
> > be loaded when they are necessary, and if you have more than one
> > client cloning the same repo, this memory should be shared by them.
>
> I have cloned identical repositories and the system starts to swap. I
> think it shows that cloning two repositories doesn't share mmap.

I doubt it (assuming you're on linux). If you suspect this, configure core.packedGitWindowSize to reduce the mmap size. There are lots of other things in a cloning process that do not share (is this client or server btw?) and things could add up.
--
Duy
Re: [PATCH 06/23] midx: struct midxed_git and 'read' subcommand
On 6/7/2018 2:31 PM, Duy Nguyen wrote: On Thu, Jun 7, 2018 at 4:03 PM, Derrick Stolee wrote: diff --git a/Documentation/git-midx.txt b/Documentation/git-midx.txt index dcaeb1a91b..919283fdd8 100644 --- a/Documentation/git-midx.txt +++ b/Documentation/git-midx.txt @@ -23,6 +23,11 @@ OPTIONS /packs/multi-pack-index for the current MIDX file, and /packs for the pack-files to index. +read:: + When given as the verb, read the current MIDX file and output + basic information about its contents. Used for debugging + purposes only. On second thought. If you just need a temporary debugging interface, adding a program in t/helper may be a better option. In the end we might still need 'read' to dump a file out, but we should have some stable output format (and json might be a good choice). My intention with this 'read' pattern in the MIDX (and commit-graph) is two-fold: 1. We can test that we are writing the correct data in our test suite. A test-tool builtin would suffice for this purpose. 2. We can help trouble-shoot users who may be having trouble with their MIDX files. Having the subcommand in a plumbing command allows us to do this in the shipped versions of Git. Maybe this second purpose isn't enough to justify the feature in Git and we should move this to the test-tool, especially with the 'verify' mode coming in a second series. Note that a 'verify' mode doesn't satisfy item (1). Thanks, -Stolee
Re: [PATCH 06/23] midx: struct midxed_git and 'read' subcommand
On 6/7/2018 1:54 PM, Duy Nguyen wrote: On Thu, Jun 7, 2018 at 4:03 PM, Derrick Stolee wrote: As we build the multi-pack-index feature by adding chunks at a time, we want to test that the data is being written correctly. Create struct midxed_git to store an in-memory representation of a A word play on 'packed_git'? Amusing. Some more descriptive name would be better though. midxed looks almost like random letters thrown together. I'll use 'struct multi_pack_index'. multi-pack-index and a memory-map of the binary file. Initialize this struct in load_midxed_git(object_dir). +static int read_midx_file(const char *object_dir) +{ + struct midxed_git *m = load_midxed_git(object_dir); + + if (!m) + return 0; This looks like an error case, please don't just return zero, typically used to say "success". I don't know if this command stays "for debugging purposes" until the end. Of course in that case it does not really matter. It is intended for debugging and testing. Generally, it is not an error to not have a MIDX in an object directory. +struct midxed_git *load_midxed_git(const char *object_dir) +{ + struct midxed_git *m; + int fd; + struct stat st; + size_t midx_size; + void *midx_map; + const char *midx_name = get_midx_filename(object_dir); mem leak? This function returns allocated memory if I remember correctly. + + fd = git_open(midx_name); + if (fd < 0) + return NULL; do an error_errno() so we know what went wrong at least. + if (fstat(fd, &st)) { + close(fd); + return NULL; same here, we should know why fstat() fails. + } + midx_size = xsize_t(st.st_size); + + if (midx_size < MIDX_MIN_SIZE) { + close(fd); + die("multi-pack-index file %s is too small", midx_name); _() The use of die() should be discouraged though. Many people still try (or wish) to libify code and new die() does not help. I think error() here would be enough then you can return NULL. Or you can go fancier and store the error string in a strbuf like refs code. 
+ } + + midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0); + + m = xcalloc(1, sizeof(*m) + strlen(object_dir) + 1); + strcpy(m->object_dir, object_dir); + m->data = midx_map; + + m->signature = get_be32(m->data); + if (m->signature != MIDX_SIGNATURE) { + error("multi-pack-index signature %X does not match signature %X", + m->signature, MIDX_SIGNATURE); _(). Maybe 0x%08x instead of %x + goto cleanup_fail; + } + + m->version = *(m->data + 4); m->data[4] instead? shorter and easier to understand. Same comment on "*(m->data + x)" and error() without _() for the rest. + if (m->version != MIDX_VERSION) { + error("multi-pack-index version %d not recognized", + m->version); _() + goto cleanup_fail; + } + + m->hash_version = *(m->data + 5); m->data[5] +cleanup_fail: + FREE_AND_NULL(m); + munmap(midx_map, midx_size); + close(fd); + exit(1); It's bad enough that you die() but exit() in this code seems too much. Please just return NULL and let the caller handle the error. Will do. diff --git a/midx.h b/midx.h index 3a63673952..a1d18ed991 100644 --- a/midx.h +++ b/midx.h @@ -1,4 +1,13 @@ +#ifndef MIDX_H +#define MIDX_H + +#include "git-compat-util.h" #include "cache.h" +#include "object-store.h" I don't really think you need object-store here (git-compat-util.h too). "struct mixed_git;" would be enough for load_midxed_git declaration below. #include "packfile.h" +struct midxed_git *load_midxed_git(const char *object_dir); + int write_midx_file(const char *object_dir); + +#endif diff --git a/object-store.h b/object-store.h index d683112fd7..77cb82621a 100644 --- a/object-store.h +++ b/object-store.h @@ -84,6 +84,25 @@ struct packed_git { char pack_name[FLEX_ARRAY]; /* more */ }; +struct midxed_git { + struct midxed_git *next; Do we really have multiple midx files? There is one per object directory currently, but you may have one locally and one in each of your alternates. I do need to double-check that we populate this list later in the series. 
(And I'll remove it from this commit and save it for when it is needed.) + + int fd; + + const unsigned char *data; + size_t data_len; + + uint32_t signature; + unsigned char version; + unsigned char hash_version; + unsigned char hash_len; + unsigned char num_chunks; + uint32_t num_packs; + uint32_t num_objects; + + char object_dir[FLEX_ARRAY]; Why do you need to keep object_dir when it could be easily retrieved when the repo is available? +}; + struct raw_object_store { /* * Path to the repository's object store.
Re: t5562: gzip -k is not portable
On Wed, Jun 20, 2018 at 08:13:06AM +0200, Torsten Bögershausen wrote: > Good eyes, thanks. > The "-f -c" combo works: > > - gzip -k fetch_body && > + gzip -f -c fetch_body >fetch_body.gz && > test_copy_bytes 10 fetch_body.gz.trunc && > - gzip -k push_body && > + gzip -f -c push_body >push_body.gz && Do we still need "-f"? With "-c" gzip is only writing to stdout, so we should not need to force anything. -Peff
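For the record, the portable idiom being settled on can be sketched like this (the file name is a placeholder, not the one from t5562):

```shell
# `gzip -k` (keep the input file) is not portable; `gzip -c` writes
# the compressed stream to stdout instead, which keeps the original
# as a side effect. No -f is needed: with -c, gzip never replaces or
# overwrites the input file, so there is nothing to force.
printf 'some request body\n' >body
gzip -c body >body.gz
test -f body && echo "original kept"
gunzip -c body.gz
```

Running this prints "original kept" followed by the original contents, and `body` is still present afterwards.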
Re: Adding nested repository with slash adds files instead of gitlink
On Wed, Jun 20, 2018 at 5:39 AM Kevin Daudt wrote: > > What this is about is that when doing `git add path/` (with trailing /), > This is what I was referring to. If you search for 'Fake Submodules', you'll see that some people were/are intentionally using this instead of subtrees or submodules. Unfortunately the original article [1] seems to be dead, but searching for the URL in the mailing list archives leads to some additional discussion on the subject [2,3]. [1]:http://debuggable.com/posts/git-fake-submodules:4b563ee4-f3cc-4061-967e-0e48cbdd56cb [2]:https://public-inbox.org/git/xmqqy47o6q71@gitster.mtv.corp.google.com/ [3]:https://public-inbox.org/git/CAGZ79kZofg3jS+g0weTdco+PGo_p-_Hd-NScZ=q2ufb7tf2...@mail.gmail.com/
Re: [msysGit] Possible git status problem at case insensitive file system
Hi Frank, On Mon, 9 Aug 2010, Frank Li wrote: > All: > I use msysgit 1.7.0.2 at windows xp. > Problem: git status will list tracked directory as untracked dir. That is the *least* problem you have with that version: you are vulnerable to several security issues. Sadly, there were literally zero contributions from community members using Windows XP to uphold XP support, so the latest Git for Windows you can use (unless you build it yourself, fixing several compile issues by now) is v2.10. See https://gitforwindows.org/requirements.html for details. Ciao, Johannes
Re: [PATCH] t3404: check root commit in 'rebase -i --root reword root commit'
Hi Junio, On Tue, 19 Jun 2018, Junio C Hamano wrote: > Todd Zullinger writes: > > > index e500d7c320..352a52e59d 100755 > > --- a/t/t3404-rebase-interactive.sh > > +++ b/t/t3404-rebase-interactive.sh > > @@ -977,7 +977,8 @@ test_expect_success 'rebase -i --root reword root > > commit' ' > > set_fake_editor && > > FAKE_LINES="reword 1 2" FAKE_COMMIT_MESSAGE="A changed" \ > > git rebase -i --root && > > - git show HEAD^ | grep "A changed" > > + git show HEAD^ | grep "A changed" && > > + test -z "$(git show -s --format=%p HEAD^)" > > ' > > The additional test probably will pass when HEAD is a root commit by > failing to refer to HEAD^, resulting an empty output from show. The > previous step would also give an error and won't emit anything that > would match "A changed", so it probably is OK, though. That matches my thinking. Otherwise, we would have had to do the git show -s --format=%p HEAD^ >out && test_must_be_empty out dance. Ciao, Dscho
[ANNOUNCE] Git Rev News edition 40
Hi everyone, The 40th edition of Git Rev News is now published: https://git.github.io/rev_news/2018/06/20/edition-40/ Thanks a lot to all the contributors: Adam Spiers, Bryan Turner and Nicolas Pitre! Enjoy, Christian, Jakub, Markus and Gabriel.
Re: [PATCH] t3404: check root commit in 'rebase -i --root reword root commit'
Hi Junio, On Tue, 19 Jun 2018, Junio C Hamano wrote: > Todd Zullinger writes: > > > With luck, this will save you a few minutes, assuming the > > commit message is reasonable (or can be improved with help > > from Phillip and others). :) > > OK. > > > Or Junio may just squash this onto js/rebase-i-root-fix. > > Nah, not for a hotfix on the last couple of days before the final. > We'd need to build on top, not "squash". Right. Can we take this on top, at a leisurely pace? I mean: we verified that this works in the upcoming v2.18.0, and it would be nice to have that extra regression test safety in the future, but it is not crucial to include it in v2.18.0 itself. Ciao, Dscho
Re: want option to git-rebase
Jonathan Nieder writes ("Re: want option to git-rebase"): > Oh, good catch. "git rebase" already generally does the right thing > when GIT_REFLOG_ACTION is set (by only appending to it and never > replacing it). Great. I indeed did not know about this. > Ian, does that work well for you? If so, any ideas where it should go > in the documentation to be more discoverable for next time? Thanks for asking exactly the right question :-). I didn't make a record of exactly where I looked but I'm pretty sure I looked at the manpages for git-reflog and git-rebase. I think I probably also looked at git-update-ref; I have read git-update-ref a number of times. Right now in Debian unstable I see that none of these places document this convention. I think git-reflog ought to mention it, so that it says where the information it provides comes from. It also ought to be mentioned in git-update-ref, because all callers of git-update-ref need to implement it! Indeed, because I didn't know about this convention, dgit and git-debrebase do not honour it. At least in my case, if it had been in git-update-ref I would have implemented it myself and then I would obviously have thought of making use of it myself. Also, I have to say, the documentation for GIT_REFLOG_ACTION in git(1) is very obscure. It sort of generally waffles around what it is for, but it does not say: * what does this variable contain * who can and should set it * who should consume it * what the rules are for modifying it I don't think simply adding a cross-reference to GIT_REFLOG_ACTION in git(1) would be sufficient, without also improving this part. The explanations provided by you and Johannes, here in these emails, are much much better: > >> "git rebase" sets this itself, so it doesn't solve your problem. > > > > If it does so unconditionally, then that is a bug. If a script > > wants to set GIT_REFLOG_ACTION, but finds that it is already set, > > then it must not change the value. 
> > set_reflog_action in git-sh-setup does the right thing. > > > > So, if there is another script or application around git-rebase, then it > should just set GIT_REFLOG_ACTION (if it is not already set) and export the > environment variable to git-rebase. > > Oh, good catch. "git rebase" already generally does the right thing > when GIT_REFLOG_ACTION is set (by only appending to it and never > replacing it). Maybe some of this prose, which explains things quite well, could be reworked into a form suitable for the git docs. (Even though there seems to be disagreement about whether a subcommand may *append* to GIT_REFLOG_ACTION; which, ISTM, is a practice which ought to be encouraged rather than discouraged.) Regards, Ian. -- Ian Jackson. These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
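To make the convention under discussion concrete, here is the pattern from set_reflog_action in git-sh-setup: the helper assigns GIT_REFLOG_ACTION only when it is not already set, so a wrapping tool's value survives into the inner command's reflog entries. The "my-wrapper" name below is just a stand-in for an outer tool such as dgit:

```shell
# The git-sh-setup convention: set GIT_REFLOG_ACTION only if the
# caller (e.g. a wrapper script) has not already provided one.
set_reflog_action () {
	if test -z "${GIT_REFLOG_ACTION:+set}"
	then
		GIT_REFLOG_ACTION="$*"
		export GIT_REFLOG_ACTION
	fi
}

GIT_REFLOG_ACTION="my-wrapper"   # outer tool claims the action first
set_reflog_action rebase         # inner command must leave it alone
echo "$GIT_REFLOG_ACTION"
```

This prints "my-wrapper": because the variable was already set, the inner set_reflog_action call is a no-op, which is exactly the behaviour a wrapper around git-rebase relies on.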
Re: [GSoC] GSoC with git, week 7
On Tue, Jun 19, 2018 at 12:33 AM, Paul-Sebastian Ungureanu wrote: > Hello! > > > On 18.06.2018 20:34, Alban Gruin wrote: >> >> Hi, >> >> I published a new blog post here: >> >>https://blog.pa1ch.fr/posts/2018/06/18/en/gsoc2018-week-7.html >> >> It’s shorter than the last one, but feel free to tell me what you think >> about it! :) >> >> Cheers, >> Alban >> > > Good job! My blog is also up: > > https://ungps.github.io/ > Awesome! My blog post is also up: https://prertik.github.io/post/week-07 Cheers, Pratik