date:20170223

Re: [PATCH v2] config: reject invalid VAR in 'git -c VAR=VAL command'

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 10:08:57PM -0800, Junio C Hamano wrote:

> Anyway, here is an updated one (the part of the patch to t/ is not
> shown as it is unchanged).
> 
> -- >8 --
> Subject: [PATCH] config: use git_config_parse_key() in 
> git_config_parse_parameter()

Looks good. Nice and simple.

-Peff

Re: [PATCH v2] config: reject invalid VAR in 'git -c VAR=VAL command'

2017-02-23 Thread Junio C Hamano

Jeff King  writes:

>> Backtracking will not fundamentally "fix" parsing of
>> 
>>  a.b=c=.d
>> 
>> between twhse two
>> 
>>  [a "b="] c = ".d"
>>  [a]  b = "c=.d"
>> 
>> unfortunately, I think.  I do not think it is worth doing the "best
>> effort" with erroring out when ambiguous, because there is no way
>> for the end user to disambiguate, unless we introduce a different
>> syntax, at which point we cannot use config_parse_key() anymore.
>
> Ah, yeah, you're right. I thought the problem was just that the "split"
> was too naive, but it really is that the whole syntax is badly
> specified.
>
> I guess "git config --list" suffers from the same problem. You can get
> around it there with "-z", but that probably would not be very pleasant
> here. :)
>
> Probably not worth worrying too much about if nobody is complaining.

Yup.

Anyway, here is an updated one (the part of the patch to t/ is not
shown as it is unchanged).

-- >8 --
Subject: [PATCH] config: use git_config_parse_key() in 
git_config_parse_parameter()

The parsing of one-shot assignments of configuration variables that
come from the command line historically was quite loose and allowed
anything to pass.  It also downcased everything in the variable name,
even a three-level .. name in which
the  part must be treated in a case sensitive manner.

Existing git_config_parse_key() helper is used to parse the variable
name that comes from the command line, i.e. "git config VAR VAL",
and handles these details correctly.  Replace the strbuf_tolower()
call in git_config_parse_parameter() with a call to it to correct
both issues.  git_config_parse_key() does a bit more things that are
not necessary for the purpose of this codepath (e.g. it allocates a
separate buffer to return the canonicalized variable name because it
takes a "const char *" input), but we are not in a performance-critical
codepath here.

Signed-off-by: Junio C Hamano 
---
 config.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/config.c b/config.c
index b8cce1dffa..1c1a1520ff 100644
--- a/config.c
+++ b/config.c
@@ -295,7 +295,9 @@ int git_config_parse_parameter(const char *text,
   config_fn_t fn, void *data)
 {
const char *value;
+   char *canonical_name;
struct strbuf **pair;
+   int ret;

pair = strbuf_split_str(text, '=', 2);
if (!pair[0])
@@ -313,13 +315,15 @@ int git_config_parse_parameter(const char *text,
strbuf_list_free(pair);
return error("bogus config parameter: %s", text);
}
-   strbuf_tolower(pair[0]);
-   if (fn(pair[0]->buf, value, data) < 0) {
-   strbuf_list_free(pair);
-   return -1;
+
+   if (git_config_parse_key(pair[0]->buf, _name, NULL)) {
+   ret = -1;
+   } else {
+   ret = (fn(canonical_name, value, data) < 0) ? -1 : 0;
+   free(canonical_name);
}
strbuf_list_free(pair);
-   return 0;
+   return ret;
 }

 int git_config_from_parameters(config_fn_t fn, void *data)
-- 
2.12.0-rc2-308-gbf7e63c428

Re: [PATCH v2] config: reject invalid VAR in 'git -c VAR=VAL command'

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 08:17:44PM -0800, Junio C Hamano wrote:

> > Hmm. I suspect one cannot do:
> >
> >   git -c 'section.subsection with an = in it.key=foo' ...
> >
> > Definitely not a new problem, nor something that should block your
> > patch. But if we want to fix it, I suspect the problem will ultimately
> > involve parsing left-to-right to get the key first, then confirming it
> > has an =, and then the value.
> 
> Backtracking will not fundamentally "fix" parsing of
> 
>   a.b=c=.d
> 
> between twhse two
> 
>   [a "b="] c = ".d"
>   [a]  b = "c=.d"
> 
> unfortunately, I think.  I do not think it is worth doing the "best
> effort" with erroring out when ambiguous, because there is no way
> for the end user to disambiguate, unless we introduce a different
> syntax, at which point we cannot use config_parse_key() anymore.

Ah, yeah, you're right. I thought the problem was just that the "split"
was too naive, but it really is that the whole syntax is badly
specified.

I guess "git config --list" suffers from the same problem. You can get
around it there with "-z", but that probably would not be very pleasant
here. :)

Probably not worth worrying too much about if nobody is complaining.

-Peff

Re: [PATCH v2] config: reject invalid VAR in 'git -c VAR=VAL command'

2017-02-23 Thread Junio C Hamano

Jeff King  writes:

>>  pair = strbuf_split_str(text, '=', 2);
>>  if (!pair[0])
>
> Hmm. I suspect one cannot do:
>
>   git -c 'section.subsection with an = in it.key=foo' ...
>
> Definitely not a new problem, nor something that should block your
> patch. But if we want to fix it, I suspect the problem will ultimately
> involve parsing left-to-right to get the key first, then confirming it
> has an =, and then the value.

Backtracking will not fundamentally "fix" parsing of

a.b=c=.d

between twhse two

[a "b="] c = ".d"
[a]  b = "c=.d"

unfortunately, I think.  I do not think it is worth doing the "best
effort" with erroring out when ambiguous, because there is no way
for the end user to disambiguate, unless we introduce a different
syntax, at which point we cannot use config_parse_key() anymore.

>> +if (git_config_parse_key(pair[0]->buf, _name, NULL))
>>  return -1;
>> -}
>
> I think git_config_parse_key() will free canonical_name itself if it
> returns failure. But do you need to strbuf_list_free(pair) here?

Yeah, I missed that one.  Thanks.

Re: [PATCH 4/4] ident: do not ignore empty config name/email

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 08:11:11PM -0800, Junio C Hamano wrote:

> > So I dunno. I could really go either way on it. Feel free to drop it, or
> > even move it into a separate topic to be cooked longer.
> 
> If it were 5 years ago, it would have been different, but I do not
> think cooking it longer in 'next' would smoke out breakages in
> obscure scripts any longer.  Git is used by too many people who have
> never seen its source these days.

Yeah, I have noticed that, too. I wonder if it would be interesting to
cut "weeklies" or something of "master" or even "next" that people could
install with a single click.

Of course it's not like we have a binary installer in the first place,
so I guess that's a prerequisite.

-Peff

Re: [PATCH 4/4] ident: do not ignore empty config name/email

2017-02-23 Thread Junio C Hamano

Jeff King  writes:

> Keep in mind this _only_ affects Git's config variables. So a script
> feeding git via GIT_AUTHOR_NAME, etc, shouldn't change at all with this
> code.

Ah, that changes the equation somewhat ;-)

> So I dunno. I could really go either way on it. Feel free to drop it, or
> even move it into a separate topic to be cooked longer.

If it were 5 years ago, it would have been different, but I do not
think cooking it longer in 'next' would smoke out breakages in
obscure scripts any longer.  Git is used by too many people who have
never seen its source these days.

Re: [PATCH 15/15] builtin/checkout: add --recurse-submodules switch

2017-02-23 Thread Ramsay Jones



On 23/02/17 22:57, Stefan Beller wrote:
> Signed-off-by: Stefan Beller 
> ---
>  Documentation/git-checkout.txt |  7 +++
>  builtin/checkout.c | 28 
>  t/lib-submodule-update.sh  | 33 -
>  t/t2013-checkout-submodule.sh  |  5 +
>  4 files changed, 64 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
> index 8e2c0662dd..d6399c0af8 100644
> --- a/Documentation/git-checkout.txt
> +++ b/Documentation/git-checkout.txt
> @@ -256,6 +256,13 @@ section of linkgit:git-add[1] to learn how to operate 
> the `--patch` mode.
>   out anyway. In other words, the ref can be held by more than one
>   worktree.
>  
> +--[no-]recurse-submodules::
> + Using --recurse-submodules will update the content of all initialized
> + submodules according to the commit recorded in the superproject. If
> + local modifications in a submodule would be overwritten the checkout
> + will fail unless `-f` is used. If nothing (or --no-recurse-submodules)
> + is used, the work trees of submodules will not be updated.
> +
>  ::
>   Branch to checkout; if it refers to a branch (i.e., a name that,
>   when prepended with "refs/heads/", is a valid ref), then that
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index f174f50303..207ce09771 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -21,12 +21,31 @@
>  #include "submodule-config.h"
>  #include "submodule.h"
>  
> +static int recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
> +
>  static const char * const checkout_usage[] = {
>   N_("git checkout [] "),
>   N_("git checkout [] [] -- ..."),
>   NULL,
>  };
>  
> +int option_parse_recurse_submodules(const struct option *opt,
> + const char *arg, int unset)

Again, this function should be marked static.

[I also noted _two_ other local functions with the same name
in builtin/fetch.c and builtin/push.c]

ATB,
Ramsay Jones

Re: [PATCH 10/15] update submodules: add submodule_move_head

2017-02-23 Thread Ramsay Jones



On 23/02/17 22:57, Stefan Beller wrote:
> In later patches we introduce the options and flag for commands
> that modify the working directory, e.g. git-checkout.
> 
> This piece of code will be used universally for
> all these working tree modifications as it
> * supports dry run to answer the question:
>   "Is it safe to change the submodule to this new state?"
>   e.g. is it overwriting untracked files or are there local
>   changes that would be overwritten?
> * supports a force flag that can be used for resetting
>   the tree.
> 
> Signed-off-by: Stefan Beller 
> ---
>  submodule.c | 135 
> 
>  submodule.h |   7 
>  2 files changed, 142 insertions(+)
> 
> diff --git a/submodule.c b/submodule.c
> index 0b2596e88a..a2cf8c9376 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1239,6 +1239,141 @@ int bad_to_remove_submodule(const char *path, 
> unsigned flags)
>   return ret;
>  }
>  
> +static int submodule_has_dirty_index(const struct submodule *sub)
> +{
> + struct child_process cp = CHILD_PROCESS_INIT;
> +
> + prepare_submodule_repo_env_no_git_dir(_array);
> +
> + cp.git_cmd = 1;
> + argv_array_pushl(, "diff-index", "--quiet", \
> + "--cached", "HEAD", NULL);
> + cp.no_stdin = 1;
> + cp.no_stdout = 1;
> + cp.dir = sub->path;
> + if (start_command())
> + die("could not recurse into submodule '%s'", sub->path);
> +
> + return finish_command();
> +}
> +
> +void submodule_reset_index(const char *path)

I was just about to send a patch against the previous series
(in pu branch last night), but since you have sent another
version ...

In the last series this was called 'submodule_clean_index()'
and, since it is a file-local symbol, should be marked with
static. I haven't applied these patches to check, but the
interdiff in the cover letter leads me to believe that this
will also apply to the renamed function.

[The patch subject was also slightly different.]

ATB,
Ramsay Jones

Re: [PATCH 05/10] submodule--helper: add is_active command

2017-02-23 Thread Stefan Beller

On Thu, Feb 23, 2017 at 3:47 PM, Brandon Williams  wrote:
> There are a lot of places where an explicit check for
> submodule."".url is done to see if a submodule exists.  In order
> to more easily facilitate the use of the submodule.active config option
> to indicate active submodules, add a helper which can be used to query
> if a submodule is active or not.
>
> Signed-off-by: Brandon Williams 
> ---
>  builtin/submodule--helper.c| 11 
>  t/t7413-submodule-is-active.sh | 63 
> ++
>  2 files changed, 74 insertions(+)
>  create mode 100755 t/t7413-submodule-is-active.sh
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index df0d9c166..dac02604d 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -1128,6 +1128,16 @@ static int absorb_git_dirs(int argc, const char 
> **argv, const char *prefix)
> return 0;
>  }
>
> +static int is_active(int argc, const char **argv, const char *prefix)
> +{
> +   if (argc != 2)
> +   die("submodule--helper is-active takes exactly 1 arguments");
> +
> +   gitmodules_config();
> +
> +   return !is_submodule_initialized(argv[1]);
> +}
> +
>  #define SUPPORT_SUPER_PREFIX (1<<0)
>
>  struct cmd_struct {
> @@ -1147,6 +1157,7 @@ static struct cmd_struct commands[] = {
> {"init", module_init, 0},
> {"remote-branch", resolve_remote_submodule_branch, 0},
> {"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX},
> +   {"is-active", is_active, 0},
>  };
>
>  int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
> diff --git a/t/t7413-submodule-is-active.sh b/t/t7413-submodule-is-active.sh
> new file mode 100755
> index 0..683487020
> --- /dev/null
> +++ b/t/t7413-submodule-is-active.sh
> @@ -0,0 +1,63 @@
> +#!/bin/sh
> +
> +test_description='Test submodule--helper is-active
> +
> +This test verifies that `git submodue--helper is-active` correclty identifies
> +submodules which are "active" and interesting to the user.
> +'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup' '
> +   git init sub &&
> +   test_commit -C sub initial &&
> +   git init super &&
> +   test_commit -C super initial &&
> +   git -C super submodule add ../sub sub1 &&
> +   git -C super submodule add ../sub sub2 &&
> +   git -C super commit -a -m "add 2 submodules at sub{1,2}"
> +'
> +
> +test_expect_success 'is-active works with urls' '
> +   git -C super submodule--helper is-active sub1 &&
> +   git -C super submodule--helper is-active sub2 &&
> +
> +   git -C super config --unset submodule.sub1.URL &&
> +   test_must_fail git -C super submodule--helper is-active sub1 &&
> +   git -C super config submodule.sub1.URL ../sub &&
> +   git -C super submodule--helper is-active sub1
> +'
> +
> +test_expect_success 'is-active works with basic submodule.active config' '
> +   git -C super config --add submodule.active "." &&
> +   git -C super config --unset submodule.sub1.URL &&
> +   git -C super config --unset submodule.sub2.URL &&

I think we'd want to unset only one of them here

> +
> +   git -C super submodule--helper is-active sub1 &&
> +   git -C super submodule--helper is-active sub2 &&

to test 2 different cases of one being active by config setting only and
the other having both.

I could not spot test for having the URL set but the config setting set, not
including the submodule, e.g.

git -C super config  submodule.sub1.URL ../sub &&
git -C super submodule.active  ":(exclude)sub1" &&

which would be expected to not be active, as once the configuration
is there it takes precedence over any (no)URL setting?

Re: [PATCH 4/4] ident: do not ignore empty config name/email

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 12:58:39PM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > This one is perhaps questionable. Maybe somebody is relying on setting a
> > per-repo user.name to override a ~/.gitconfig value and enforce
> > auto-detection?
> 
> Thanks for splitting this step out.  1/4 and 2/4 are obvious
> improvements, and 3/4 is a very sensible fix.  Compared to those
> three, this one does smell questionable, because I do not quite see
> any other reasonable fallback other than the auto-detection if the
> user gives an empty ident on purpose.

The outcomes are basically:

  1. In strict mode (making a commit, etc), we'll die with "empty name
 not allowed". My thinking was that this is less confusing for the
 user.

  2. In non-strict mode, we'd use a blank name instead of trying your
 username (or dying if you don't have an /etc/passwd entry).

> Erroring out to say "don't do that" is probably not too bad, but
> perhaps we are being run by a script that is doing a best-effort
> conversion from $ANOTHER_SCM using a list of known authors that is
> incomplete, ending up feeding empty ident and allowing us to fall
> back to attribute them to the user who runs the script.  I do not
> see a point in breaking that user and having her or him update the
> script to stuff in a truly bogus "Unknown " name.

Keep in mind this _only_ affects Git's config variables. So a script
feeding git via GIT_AUTHOR_NAME, etc, shouldn't change at all with this
code. If your script is doing "git -c user.name=whatever commit", I
think you should reconsider your script. :)

So I dunno. I could really go either way on it. Feel free to drop it, or
even move it into a separate topic to be cooked longer.

-Peff

Re: [PATCH 10/10] submodule--helper clone: check for configured submodules using helper

2017-02-23 Thread Stefan Beller

On Thu, Feb 23, 2017 at 3:47 PM, Brandon Williams  wrote:

> @@ -795,14 +794,11 @@ static int prepare_to_clone_next_submodule(const struct 
> cache_entry *ce,
> }
>
> /*
> -* Looking up the url in .git/config.
> +* Check if the submodule has been initialized.
>  * We must not fall back to .gitmodules as we only want
>  * to process configured submodules.

This sentence "we must not ..." is also no longer accurate,
as we do exactly that when using sub->url instead of just url
below.

Re: [PATCH v2] config: reject invalid VAR in 'git -c VAR=VAL command'

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 03:19:58PM -0800, Junio C Hamano wrote:

> > But you are right.  config-parse-key does have the simpler string
> > that can just be given to the canonicalize thing and we should be
> > able to reuse it.
> 
> Actually, I think we can just use the existing config_parse_key()
> instead of adding the new function.  It adds one allocation and
> deallocation, but it's not like this is a performance-critical
> codepath that we absolutely avoid extra allocations.  After all, we
> are still using the strbuf-split thing :-/.

Yeah, you're right. This is much nicer, and everything else was
premature optimization.

> -- >8 --
> From: Junio C Hamano 
> Date: Thu, 23 Feb 2017 15:04:40 -0800
> Subject: [PATCH] config: reject invalid VAR in 'git -c VAR=VAL command' and
>  keep subsection intact

Long subject. :)

I'd have just said:

  config: pass variables through git_config_parse_parameter()

That is "what", but the "why" can come in the next paragraph.

> The parsing of one-shot assignments of configuration variables that
> come from the command line historically was quite loose and allowed
> anything to pass.  It also downcased everything in the variable name,
> even a three-level .. name in which
> the  part must be treated in a case sensible manner.
> 
> Existing git_config_parse_key() helper is used to parse the variable
> name that comes from the command line, i.e. "git config VAR VAL",
> and handles these details correctly.  Replace the strbuf_tolower()
> call in git-config_parse_parameter() with a call to it to correct
> both issues.  git_config_parse_key() does a bit more things that are
> not necessary for the purpose of this codepath (e.g. it allocates a
> separate buffer to return the canonicalized variable name because it
> takes a "const char *" input), but we are not in a performance-critical
> codepath here.

Nicely explained.

> diff --git a/config.c b/config.c
> index b8cce1dffa..39f20dcd2c 100644
> --- a/config.c
> +++ b/config.c
> @@ -295,7 +295,9 @@ int git_config_parse_parameter(const char *text,
>  config_fn_t fn, void *data)
>  {
>   const char *value;
> + char *canonical_name;
>   struct strbuf **pair;
> + int ret = 0;
>  
>   pair = strbuf_split_str(text, '=', 2);
>   if (!pair[0])

Hmm. I suspect one cannot do:

  git -c 'section.subsection with an = in it.key=foo' ...

Definitely not a new problem, nor something that should block your
patch. But if we want to fix it, I suspect the problem will ultimately
involve parsing left-to-right to get the key first, then confirming it
has an =, and then the value.

> @@ -313,13 +315,15 @@ int git_config_parse_parameter(const char *text,
>   strbuf_list_free(pair);
>   return error("bogus config parameter: %s", text);
>   }
> - strbuf_tolower(pair[0]);
> - if (fn(pair[0]->buf, value, data) < 0) {
> - strbuf_list_free(pair);
> +
> + if (git_config_parse_key(pair[0]->buf, _name, NULL))
>   return -1;
> - }

I think git_config_parse_key() will free canonical_name itself if it
returns failure. But do you need to strbuf_list_free(pair) here?

Or alternatively:

  int ret = -1;
  if (!parse(...))
  ret = fn(...);

or use a "got out". Whatever. You don't need me to teach you about error
exits. :)

> + ret = (fn(canonical_name, value, data) < 0) ? -1 : 0;
> +
> + free(canonical_name);
>   strbuf_list_free(pair);
> - return 0;
> + return ret;

Looks good.

> diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
> index 923bfc5a26..ea371020fa 100755
> --- a/t/t1300-repo-config.sh
> +++ b/t/t1300-repo-config.sh

I just skimmed these, as they look like the previous ones.

So overall I like it, modulo the minor error-leak.

-Peff

Re: [BUG] allowtipsha1inwant serves unreachable blobs if you know its hash

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 03:03:58PM -0800, Jonathan Tan wrote:

> If a server sets allowtipsha1inwant (or allowreachablesha1inwant), a
> client can call "git fetch  " where SHA-1 is the hash of
> a blob (reachable or unreachable) to obtain it. The test below (which
> passes) demonstrates that.

Thanks for a nice demonstration.

TBH, I think the whole "we will not give you unreachable things" is
overblown as a security measure. There are a lot of ways to leak
information out of the repo. E.g., claiming you _do_ have the
unreachable thing, at which point pack-objects may create a delta
against it. Done repeatedly, it works as an oracle by which you can
guess the contents of the unreachable thing.

> This happens most likely because the "rev-list" call in
> "do_reachable_revlist" (called through "has_unreachable") is invoked
> without the "--objects" argument. "has_unreachable" determines that an
> object is unreachable if nothing is printed to stdout, which normally
> works, except that "rev-list" prints nothing when asked which commits
> are reachable from a blob (which makes sense).
> 
> Adding "--objects" works, and all existing tests pass, except for the
> potential performance issue and the side effect that even fetching a
> reachable blob no longer works. This is due to a possible bug where a
> call like "git rev-list --objects $tree ^master" (where $tree is the
> tree object corresponding to master) prints out objects even though all
> objects reachable from the tree are also reachable from master. (There
> should be no issue with "get_reachable_list", the other invoker of
> "do_reachable_revlist", because non-commits in the command's stdout are
> skipped.)

I think "--objects --use-bitmap-index" would be faster. If you have
bitmaps.

-Peff

Re: [BUG] allowtipsha1inwant serves unreachable blobs if you know its hash

2017-02-23 Thread Junio C Hamano

Junio C Hamano  writes:

> Jonathan Tan  writes:
>
>> Adding "--objects" works, and all existing tests pass, except for the
>> potential performance issue and the side effect that even fetching a
>> reachable blob no longer works. This is due to a possible bug where a
>> call like "git rev-list --objects $tree ^master" (where $tree is the
>> tree object corresponding to master) prints out objects ...
>
> The "reachable from this, excluding what is reachable from that"
> notation was originally designed to work only on commits, and I
> wouldn't be surprised if "$tree ^master" did not work as you expect
> in the current implementation.
>
> I agree that ideally it shouldn't show anything, but I suspect that
> it would make it very expensive if done naively---we'd end up having
> to call mark_tree_uninteresting() for all uninteresting commits, not
> just for the commits at the edge of the DAG as we do right now.

BTW, by "it would be expensive" I didn't mean "hence it shouldn't be
done."  Even though I do not know by how much expensive it would
become, I think this approach is a lot better way going forward than
punting and say "no, you cannot specify anything lower than commit."

Thanks for looking into this.

Re: [PATCH 1/3] add collision-detecting sha1 implementation

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 04:12:01PM -0800, Linus Torvalds wrote:

> On Thu, Feb 23, 2017 at 4:01 PM, Jeff King  wrote:
> >
> > You know, I didn't even look at the LICENSE file, since it said MIT and
> > had a link here. It would be trivial to copy it over, too, of course.
> 
> You should do it. It's just good to be careful and clear with
> licenses, and the license text does require that the copyright notice
> and permission file should be included in copies.
> 
> My patch did it. "Pats self on head".

And that's why yours crossed the 100K barrier. :)

But yeah, I agree it is better to be safe (and that's we should contact
the authors). I'll point them out-of-band to this thread, and cc them if
it ends up being re-rolled.

-Peff

[PATCH] submodule init: warn about falling back to a local path

2017-02-23 Thread Stefan Beller

When a submodule is initialized, the config variable 'submodule..url'
is set depending on the value of the same variable in the .gitmodules
file. When the URL indicates to be relative, then the url is computed
relative to its default remote. The default remote cannot be determined
accurately in all cases, such that it falls back to 'origin'.

The 'origin' remote may not exist, though. In that case we give up looking
for a suitable remote and we'll just assume it to be a local relative path.

This can be confusing to users as there is a lot of guessing involved,
which is not obvious to the user.

So in the corner case of assuming a local autoritative truth, warn the
user to lessen the confusion.

This behavior was introduced in 4d6893200 (submodule add: allow relative
repository path even when no url is set, 2011-06-06), which shared the
code with submodule-init and then ported to C in 3604242f080a (submodule:
port init from shell to C, 2016-04-15).

In case of submodule-add, this behavior makes sense in some use cases[1],
however for submodule-init there does not seem to be an immediate obvious
use case to fall back to a local submodule. However there might be, so
warn instead of die here.

While adding the warning, also clarify the behavior of relative URLs in
the documentation.

[1] e.g. 
http://stackoverflow.com/questions/8721984/git-ignore-files-for-public-repository-but-not-for-private
"store a secret locally in a submodule, with no intention to publish it"

Reported-by: Shawn Pearce 
Signed-off-by: Stefan Beller 
---
 Documentation/git-submodule.txt | 22 +++---
 builtin/submodule--helper.c |  8 +++-
 2 files changed, 18 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 8acc72ebb8..d23dee2a5b 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -73,13 +73,17 @@ configuration entries unless `--name` is used to specify a 
logical name.
 +
  is the URL of the new submodule's origin repository.
 This may be either an absolute URL, or (if it begins with ./
-or ../), the location relative to the superproject's origin
+or ../), the location relative to the superproject's default remote
 repository (Please note that to specify a repository 'foo.git'
 which is located right next to a superproject 'bar.git', you'll
 have to use '../foo.git' instead of './foo.git' - as one might expect
 when following the rules for relative URLs - because the evaluation
 of relative URLs in Git is identical to that of relative directories).
-If the superproject doesn't have an origin configured
++
+The default remote is the remote of the remote tracking branch
+of the current branch. If no such remote tracking branch exists or
+the in detached HEAD mode, "origin" is assumed to be the default remote.
+If the superproject doesn't have a default remote configured
 the superproject is its own authoritative upstream and the current
 working directory is used instead.
 +
@@ -118,18 +122,22 @@ too (and can also report changes to a submodule's work 
tree).
 
 init [--] [...]::
Initialize the submodules recorded in the index (which were
-   added and committed elsewhere) by copying submodule
-   names and urls from .gitmodules to .git/config.
+   added and committed elsewhere) by copying `submodule.$name.url`
+   from .gitmodules to .git/config, resolving relative urls to be
+   relative to the default remote.
++
Optional  arguments limit which submodules will be initialized.
-   It will also copy the value of `submodule.$name.update` into
-   .git/config.
-   The key used in .git/config is `submodule.$name.url`.
+   If no path is specified all submodules are initialized.
++
+   When present, it will also copy the value of `submodule.$name.update`.
This command does not alter existing information in .git/config.
You can then customize the submodule clone URLs in .git/config
for your local setup and proceed to `git submodule update`;
you can also just use `git submodule update --init` without
the explicit 'init' step if you do not intend to customize
any submodule locations.
++
+   See the add subcommand for the defintion of default remote.
 
 deinit [-f|--force] (--all|[--] ...)::
Unregister the given submodules, i.e. remove the whole
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 899dc334e3..44c11dd91e 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -356,12 +356,10 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
strbuf_addf(, "remote.%s.url", remote);
free(remote);
 
-   if (git_config_get_string(remotesb.buf, ))
-   /*
-* The repository is its

Re: [PATCH 1/3] add collision-detecting sha1 implementation

2017-02-23 Thread Linus Torvalds

On Thu, Feb 23, 2017 at 4:01 PM, Jeff King  wrote:
>
> You know, I didn't even look at the LICENSE file, since it said MIT and
> had a link here. It would be trivial to copy it over, too, of course.

You should do it. It's just good to be careful and clear with
licenses, and the license text does require that the copyright notice
and permission file should be included in copies.

My patch did it. "Pats self on head".

 Linus

PS. And just to be polite, we should probably also just cc at least
Marc Stevens and Dan Shumow if we take that patch further. Their email
addresses are in the that LICENSE.txt file.

Re: [BUG] allowtipsha1inwant serves unreachable blobs if you know its hash

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 03:50:00PM -0800, Junio C Hamano wrote:

> Jonathan Tan  writes:
> 
> > Adding "--objects" works, and all existing tests pass, except for the
> > potential performance issue and the side effect that even fetching a
> > reachable blob no longer works. This is due to a possible bug where a
> > call like "git rev-list --objects $tree ^master" (where $tree is the
> > tree object corresponding to master) prints out objects ...
> 
> The "reachable from this, excluding what is reachable from that"
> notation was originally designed to work only on commits, and I
> wouldn't be surprised if "$tree ^master" did not work as you expect
> in the current implementation.
> 
> I agree that ideally it shouldn't show anything, but I suspect that
> it would make it very expensive if done naively---we'd end up having
> to call mark_tree_uninteresting() for all uninteresting commits, not
> just for the commits at the edge of the DAG as we do right now.

Yes, it's super-expensive to do naively (like 40+ seconds of CPU on
torvalds/linux). Bitmaps should generally make it tolerable, though
there are corner cases (e.g., if a ref tip has to walk a bit to get to a
bitmap; we try to put bitmaps near the ref tips, but when you have
50,000 tags it's hard to do).

We've explored similar things at GitHub for doing reachability checks on
all 40-hex lookups (because right now I can point you to
.../git/git/commit/1234abcd and you see that object, even if it's only
in _my_ fork and not reachable from git/git).

-Peff

Re: [PATCH 1/3] add collision-detecting sha1 implementation

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 03:15:11PM -0800, Stefan Beller wrote:

> On Thu, Feb 23, 2017 at 3:05 PM, Jeff King  wrote:
> 
> > +* Copyright 2017 Marc Stevens , Dan Shumow 
> > (dan...@microsoft.com)
> > +* Distributed under the MIT Software License.
> > +* See accompanying file LICENSE.txt or copy at
> 
> The accompanying LICENSE file did not make it into this patch,
> that is more specialized/verbose than the one at
> https://opensource.org/licenses/MIT
> w.r.t. copyright notice requirement.

You know, I didn't even look at the LICENSE file, since it said MIT and
had a link here. It would be trivial to copy it over, too, of course.

> Apart from that MIT seems to be compatible with GPL
> according to the FSF, though IANAL.

Yeah, that's always been my understanding.

-Peff

Re: [PATCH 00/10] decoupling a submodule's existence and its url

2017-02-23 Thread Stefan Beller

On Thu, Feb 23, 2017 at 3:47 PM, Brandon Williams  wrote:
> There are two motivations for decoupling a submodule's existence from the url
> that is stored in .git/config.
>
> 1. Worktrees can't really be used with submodules since the existence is
>checked based on the shared .git/config.  This means that two different
>worktress have to have the same initialized submodules.  By decoupling a
>submodule's existence from the url, two different work trees can be
>configured to have different submodules checked out.

cc Duy for this one:

Well once we have the per-worktree configuration, e.g. [1] we can *technically*
have different submodules in different worktrees by saying

  workingtree0: submodule..url = git://example.org/real-submodule
  workingtree1: submodule..url = bogus
  workingtree2: submodule..url = more bogus

and once we used the URL in the first workingtree all urls are
degraded to a boolean flag,
so the different (possible bogus) content does not do harm, only user-confusion,
because the model of the URL being the flag indicating existence doesn't quite
fit multiple working trees.

[1] https://public-inbox.org/git/20170110112524.12870-3-pclo...@gmail.com/

> 2. Easily configure gorups of submodules that a user is interested in.  In a
>repository with hundreds of submodules it would be difficult to easily 
> tell git
>which modules to worry about without having to individually init all of
>them.  Instead, a pathspec can be used to more easily select or deselect
>groups of submodules.
>
> This patch series works to do this decoupling and instead allows a user to
> configure submodule.active with a pathspec to use to check if a submodule is
> initialized.

Thanks for stating both intentions!
Stefan

[PATCH 05/10] submodule--helper: add is_active command

2017-02-23 Thread Brandon Williams

There are a lot of places where an explicit check for
submodule."".url is done to see if a submodule exists.  In order
to more easily facilitate the use of the submodule.active config option
to indicate active submodules, add a helper which can be used to query
if a submodule is active or not.

Signed-off-by: Brandon Williams 
---
 builtin/submodule--helper.c| 11 
 t/t7413-submodule-is-active.sh | 63 ++
 2 files changed, 74 insertions(+)
 create mode 100755 t/t7413-submodule-is-active.sh

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index df0d9c166..dac02604d 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1128,6 +1128,16 @@ static int absorb_git_dirs(int argc, const char **argv, 
const char *prefix)
return 0;
 }
 
+static int is_active(int argc, const char **argv, const char *prefix)
+{
+   if (argc != 2)
+   die("submodule--helper is-active takes exactly 1 arguments");
+
+   gitmodules_config();
+
+   return !is_submodule_initialized(argv[1]);
+}
+
 #define SUPPORT_SUPER_PREFIX (1<<0)
 
 struct cmd_struct {
@@ -1147,6 +1157,7 @@ static struct cmd_struct commands[] = {
{"init", module_init, 0},
{"remote-branch", resolve_remote_submodule_branch, 0},
{"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX},
+   {"is-active", is_active, 0},
 };
 
 int cmd_submodule__helper(int argc, const char **argv, const char *prefix)
diff --git a/t/t7413-submodule-is-active.sh b/t/t7413-submodule-is-active.sh
new file mode 100755
index 0..683487020
--- /dev/null
+++ b/t/t7413-submodule-is-active.sh
@@ -0,0 +1,63 @@
+#!/bin/sh
+
+test_description='Test submodule--helper is-active
+
+This test verifies that `git submodue--helper is-active` correclty identifies
+submodules which are "active" and interesting to the user.
+'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+   git init sub &&
+   test_commit -C sub initial &&
+   git init super &&
+   test_commit -C super initial &&
+   git -C super submodule add ../sub sub1 &&
+   git -C super submodule add ../sub sub2 &&
+   git -C super commit -a -m "add 2 submodules at sub{1,2}"
+'
+
+test_expect_success 'is-active works with urls' '
+   git -C super submodule--helper is-active sub1 &&
+   git -C super submodule--helper is-active sub2 &&
+
+   git -C super config --unset submodule.sub1.URL &&
+   test_must_fail git -C super submodule--helper is-active sub1 &&
+   git -C super config submodule.sub1.URL ../sub &&
+   git -C super submodule--helper is-active sub1
+'
+
+test_expect_success 'is-active works with basic submodule.active config' '
+   git -C super config --add submodule.active "." &&
+   git -C super config --unset submodule.sub1.URL &&
+   git -C super config --unset submodule.sub2.URL &&
+
+   git -C super submodule--helper is-active sub1 &&
+   git -C super submodule--helper is-active sub2 &&
+
+   git -C super config submodule.sub1.URL ../sub &&
+   git -C super config submodule.sub2.URL ../sub &&
+   git -C super config --unset-all submodule.active
+'
+
+test_expect_success 'is-active correctly works with paths that are not 
submodules' '
+   test_must_fail git -C super submodule--helper is-active not-a-submodule 
&&
+
+   git -C super config --add submodule.active "." &&
+   test_must_fail git -C super submodule--helper is-active not-a-submodule 
&&
+
+   git -C super config --unset-all submodule.active
+'
+
+test_expect_success 'is-active works with exclusions in submodule.active 
config' '
+   git -C super config --add submodule.active "." &&
+   git -C super config --add submodule.active ":(exclude)sub1" &&
+
+   test_must_fail git -C super submodule--helper is-active sub1 &&
+   git -C super submodule--helper is-active sub2 &&
+
+   git -C super config --unset-all submodule.active
+'
+
+test_done
-- 
2.11.0.483.g087da7b7c-goog

Re: [BUG] allowtipsha1inwant serves unreachable blobs if you know its hash

2017-02-23 Thread Junio C Hamano

Jonathan Tan  writes:

> Adding "--objects" works, and all existing tests pass, except for the
> potential performance issue and the side effect that even fetching a
> reachable blob no longer works. This is due to a possible bug where a
> call like "git rev-list --objects $tree ^master" (where $tree is the
> tree object corresponding to master) prints out objects ...

The "reachable from this, excluding what is reachable from that"
notation was originally designed to work only on commits, and I
wouldn't be surprised if "$tree ^master" did not work as you expect
in the current implementation.

I agree that ideally it shouldn't show anything, but I suspect that
it would make it very expensive if done naively---we'd end up having
to call mark_tree_uninteresting() for all uninteresting commits, not
just for the commits at the edge of the DAG as we do right now.

[PATCH 02/10] submodule update: add `--init-active` switch

2017-02-23 Thread Brandon Williams

The new switch `--init-active` initializes the submodules which are
configured in `submodule.active` instead of those given as
command line arguments before updating. In the first implementation this
is made incompatible with further command line arguments as it is
unclear what the user means by

git submodule update --init --init-active 

This new switch allows users to record more complex patterns as it saves
retyping them whenever you invoke update.

Based on a patch by Stefan Beller 

Signed-off-by: Brandon Williams 
---
 Documentation/git-submodule.txt | 11 ++-
 git-submodule.sh| 21 ++---
 t/t7400-submodule-basic.sh  | 39 +++
 3 files changed, 67 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 918bd1d1b..626b9760a 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 'git submodule' [--quiet] status [--cached] [--recursive] [--] [...]
 'git submodule' [--quiet] init [--] [...]
 'git submodule' [--quiet] deinit [-f|--force] (--all|[--] ...)
-'git submodule' [--quiet] update [--init] [--remote] [-N|--no-fetch]
+'git submodule' [--quiet] update [--init[-active]] [--remote] [-N|--no-fetch]
  [--[no-]recommend-shallow] [-f|--force] [--rebase|--merge]
  [--reference ] [--depth ] [--recursive]
  [--jobs ] [--] [...]
@@ -195,6 +195,10 @@ If the submodule is not yet initialized, and you just want 
to use the
 setting as stored in .gitmodules, you can automatically initialize the
 submodule with the `--init` option.
 
+You can configure a set of submodules using pathspec syntax in
+submodule.active you can use `--init-active` to initialize
+those before updating.
+
 If `--recursive` is specified, this command will recurse into the
 registered submodules, and update any nested submodules within.
 --
@@ -378,6 +382,11 @@ the submodule itself.
Initialize all submodules for which "git submodule init" has not been
called so far before updating.
 
+--init-active::
+   This option is only valid for the update command.
+   Initialize all submodules configured in "`submodule.active`"
+   that have not been updated before.
+
 --name::
This option is only valid for the add command. It sets the submodule's
name to the given string instead of defaulting to its path. The name
diff --git a/git-submodule.sh b/git-submodule.sh
index 554bd1c49..e5b9a8920 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -9,7 +9,7 @@ USAGE="[--quiet] add [-b ] [-f|--force] [--name ] 
[--reference ...]
or: $dashless [--quiet] init [--] [...]
or: $dashless [--quiet] deinit [-f|--force] (--all| [--] ...)
-   or: $dashless [--quiet] update [--init] [--remote] [-N|--no-fetch] 
[-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] 
[--reference ] [--recursive] [--] [...]
+   or: $dashless [--quiet] update [--init[-active]] [--remote] [-N|--no-fetch] 
[-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] 
[--reference ] [--recursive] [--] [...]
or: $dashless [--quiet] summary [--cached|--files] [--summary-limit ] 
[commit] [--] [...]
or: $dashless [--quiet] foreach [--recursive] 
or: $dashless [--quiet] sync [--recursive] [--] [...]"
@@ -499,7 +499,12 @@ cmd_update()
progress="--progress"
;;
-i|--init)
-   init=1
+   test -z $init || test $init = by_args || die "$(gettext 
"Only one of --init or --init-active may be used.")"
+   init=by_args
+   ;;
+   --init-active)
+   test -z $init || test $init = by_config || die 
"$(gettext "Only one of --init or --init-active may be used.")"
+   init=by_config
;;
--remote)
remote=1
@@ -568,7 +573,17 @@ cmd_update()
 
if test -n "$init"
then
-   cmd_init "--" "$@" || return
+   if test "$init" = "by_config"
+   then
+   if test $# -gt 0
+   then
+   die "$(gettext "path arguments are incompatible 
with --init-active")"
+   fi
+   cmd_init "--" $(git config --get-all submodule.active) 
|| return
+   else
+   cmd_init "--" "$@" || return
+   fi
+
fi
 
{
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index b77cce8e4..f5df0a5b4 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -1116,5 +1116,44 @@ test_expect_success 'submodule helper list is not 
confused by common prefixes' '
test_cmp expect actual
 '

[PATCH 10/10] submodule--helper clone: check for configured submodules using helper

2017-02-23 Thread Brandon Williams

Use the 'is_submodule_initialized()' helper to check for configured
submodules instead of manually checking for the submodule's URL in the
config.

Signed-off-by: Brandon Williams 
---
 builtin/submodule--helper.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index dac02604d..68d21b600 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -760,7 +760,6 @@ static int prepare_to_clone_next_submodule(const struct 
cache_entry *ce,
struct strbuf displaypath_sb = STRBUF_INIT;
struct strbuf sb = STRBUF_INIT;
const char *displaypath = NULL;
-   char *url = NULL;
int needs_cloning = 0;
 
if (ce_stage(ce)) {
@@ -795,14 +794,11 @@ static int prepare_to_clone_next_submodule(const struct 
cache_entry *ce,
}
 
/*
-* Looking up the url in .git/config.
+* Check if the submodule has been initialized.
 * We must not fall back to .gitmodules as we only want
 * to process configured submodules.
 */
-   strbuf_reset();
-   strbuf_addf(, "submodule.%s.url", sub->name);
-   git_config_get_string(sb.buf, );
-   if (!url) {
+   if (!is_submodule_initialized(ce->name)) {
next_submodule_warn_missing(suc, out, displaypath);
goto cleanup;
}
@@ -836,7 +832,7 @@ static int prepare_to_clone_next_submodule(const struct 
cache_entry *ce,
argv_array_push(>args, "--depth=1");
argv_array_pushl(>args, "--path", sub->path, NULL);
argv_array_pushl(>args, "--name", sub->name, NULL);
-   argv_array_pushl(>args, "--url", url, NULL);
+   argv_array_pushl(>args, "--url", sub->url, NULL);
if (suc->references.nr) {
struct string_list_item *item;
for_each_string_list_item(item, >references)
@@ -846,7 +842,6 @@ static int prepare_to_clone_next_submodule(const struct 
cache_entry *ce,
argv_array_push(>args, suc->depth);
 
 cleanup:
-   free(url);
strbuf_reset(_sb);
strbuf_reset();
 
-- 
2.11.0.483.g087da7b7c-goog

[PATCH 01/10] submodule: decouple url and submodule existence

2017-02-23 Thread Brandon Williams

Currently the submodule..url config option is used to determine
if a given submodule exists and is interesting to the user.  This
however doesn't work very well because the URL is a config option for
the scope of a repository, whereas the existence of a submodule is an
option scoped to the working tree.

In a future with worktree support for submodules, there will be multiple
working trees, each of which may only need a subset of the submodules
checked out.  The URL (which is where the submodule repository can be
obtained) should not differ between different working trees.

It may also be convenient for users to more easily specify groups of
submodules they are interested in as apposed to running "git submodule
init " on each submodule they want checked out in their working
tree.

To this end, the config option submodule.active is introduced which
holds a pathspec that specifies which submodules should exist in the
working tree.

Signed-off-by: Brandon Williams 
---
 submodule.c | 25 ++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/submodule.c b/submodule.c
index 4c4f033e8..991066ddf 100644
--- a/submodule.c
+++ b/submodule.c
@@ -217,11 +217,16 @@ void gitmodules_config_sha1(const unsigned char 
*commit_sha1)
 int is_submodule_initialized(const char *path)
 {
int ret = 0;
-   const struct submodule *module = NULL;
+   const struct string_list *sl;
+   const struct submodule *module = submodule_from_path(null_sha1, path);
 
-   module = submodule_from_path(null_sha1, path);
+   /* early return if there isn't a path->module mapping */
+   if (!module)
+   return 0;
+
+   sl = git_config_get_value_multi("submodule.active");
 
-   if (module) {
+   if (!sl) {
char *key = xstrfmt("submodule.%s.url", module->name);
char *value = NULL;
 
@@ -229,6 +234,20 @@ int is_submodule_initialized(const char *path)
 
free(value);
free(key);
+   } else {
+   struct pathspec ps;
+   struct argv_array args = ARGV_ARRAY_INIT;
+   const struct string_list_item *item;
+
+   for_each_string_list_item(item, sl) {
+   argv_array_push(, item->string);
+   }
+
+   parse_pathspec(, 0, 0, 0, args.argv);
+   ret = match_pathspec(, path, strlen(path), 0, NULL, 1);
+
+   argv_array_clear();
+   clear_pathspec();
}
 
return ret;
-- 
2.11.0.483.g087da7b7c-goog

[PATCH 07/10] submodule status: use submodule--helper is-active

2017-02-23 Thread Brandon Williams

Signed-off-by: Brandon Williams 
---
 git-submodule.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 4633a4336..f8adfb179 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -1026,14 +1026,13 @@ cmd_status()
do
die_if_unmatched "$mode" "$sha1"
name=$(git submodule--helper name "$sm_path") || exit
-   url=$(git config submodule."$name".url)
displaypath=$(git submodule--helper relative-path 
"$prefix$sm_path" "$wt_prefix")
if test "$stage" = U
then
say "U$sha1 $displaypath"
continue
fi
-   if test -z "$url" ||
+   if ! git submodule--helper is-active "$sm_path" ||
{
! test -d "$sm_path"/.git &&
! test -f "$sm_path"/.git
-- 
2.11.0.483.g087da7b7c-goog

[PATCH 06/10] submodule add: respect submodule.active

2017-02-23 Thread Brandon Williams

When submodule.active is configured, in addition to adding
submodule."".url to the config, add the path of the added
submodule if it isn't already covered by the current config values.

Signed-off-by: Brandon Williams 
---
 git-submodule.sh   | 11 +++
 t/t7413-submodule-is-active.sh | 22 ++
 2 files changed, 33 insertions(+)

diff --git a/git-submodule.sh b/git-submodule.sh
index e5b9a8920..4633a4336 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -271,6 +271,17 @@ or you are unsure what this means choose another name with 
the '--name' option."
fi &&
git add --force .gitmodules ||
die "$(eval_gettext "Failed to register submodule '\$sm_path'")"
+
+   if git config --get submodule.active >/dev/null
+   then
+   # If the submodule being adding isn't already covered by the
+   # current configured pathspec, add the submodule's path to
+   # 'submodule.active'
+   if ! git submodule--helper is-active "$sm_path"
+   then
+   git config --add submodule.active "$sm_path"
+   fi
+   fi
 }
 
 #
diff --git a/t/t7413-submodule-is-active.sh b/t/t7413-submodule-is-active.sh
index 683487020..8a519163b 100755
--- a/t/t7413-submodule-is-active.sh
+++ b/t/t7413-submodule-is-active.sh
@@ -60,4 +60,26 @@ test_expect_success 'is-active works with exclusions in 
submodule.active config'
git -C super config --unset-all submodule.active
 '
 
+test_expect_success 'is-active and submodule add' '
+   test_when_finished "rm -rf super2" &&
+   git init super2 &&
+   test_commit -C super2 initial &&
+   git -C super2 config --add submodule.active "sub*" &&
+
+   cat >expect <<-\EOF &&
+   sub*
+   EOF
+   git -C super2 submodule add ../sub sub1 &&
+   git -C super2 config --get-all submodule.active >actual &&
+   test_cmp actual expect &&
+
+   cat >expect <<-\EOF &&
+   sub*
+   mod
+   EOF
+   git -C super2 submodule add ../sub mod &&
+   git -C super2 config --get-all submodule.active >actual &&
+   test_cmp actual expect
+'
+
 test_done
-- 
2.11.0.483.g087da7b7c-goog

[PATCH 03/10] clone: add --submodule-spec= switch

2017-02-23 Thread Brandon Williams

The new switch passes the pathspec to `git submodule update
--init-active` which is called after the actual clone is done.

Additionally this configures the submodule.active option to
be the given pathspec, such that any future invocation of
`git submodule update --init-active` will keep up
with the pathspec.

Based on a patch by Stefan Beller 

Signed-off-by: Brandon Williams 
---
 Documentation/git-clone.txt | 23 ++-
 builtin/clone.c | 36 +--
 t/t7400-submodule-basic.sh  | 70 +
 3 files changed, 120 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 35cc34b2f..9692eab30 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -15,7 +15,8 @@ SYNOPSIS
  [--dissociate] [--separate-git-dir ]
  [--depth ] [--[no-]single-branch]
  [--recursive | --recurse-submodules] [--[no-]shallow-submodules]
- [--jobs ] [--]  []
+ [--submodule-spec ] [--jobs ] [--]
+  []
 
 DESCRIPTION
 ---
@@ -217,12 +218,20 @@ objects from the source repository into a pack in the 
cloned repository.
 
 --recursive::
 --recurse-submodules::
-   After the clone is created, initialize all submodules within,
-   using their default settings. This is equivalent to running
-   `git submodule update --init --recursive` immediately after
-   the clone is finished. This option is ignored if the cloned
-   repository does not have a worktree/checkout (i.e. if any of
-   `--no-checkout`/`-n`, `--bare`, or `--mirror` is given)
+   After the clone is created, initialize and clone all submodules
+   within, using their default settings. This is equivalent to
+   running `git submodule update --recursive --init` immediately
+   after the clone is finished. This option is ignored if the
+   cloned repository does not have a worktree/checkout (i.e.  if
+   any of `--no-checkout`/`-n`, `--bare`, or `--mirror` is given)
+
+--submodule-spec::
+   After the clone is created, initialize and clone specified
+   submodules within, using their default settings. It is possible
+   to give multiple specifications by giving this argument multiple
+   times. This is equivalent to configuring `submodule.active`
+   and then running `git submodule update --init-active`
+   immediately after the clone is finished.
 
 --[no-]shallow-submodules::
All submodules which are cloned will be shallow with a depth of 1.
diff --git a/builtin/clone.c b/builtin/clone.c
index 5ef81927a..15a8d3ef0 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -56,6 +56,16 @@ static struct string_list option_required_reference = 
STRING_LIST_INIT_NODUP;
 static struct string_list option_optional_reference = STRING_LIST_INIT_NODUP;
 static int option_dissociate;
 static int max_jobs = -1;
+static struct string_list submodule_spec;
+
+static int submodule_spec_cb(const struct option *opt, const char *arg, int 
unset)
+{
+   if (unset)
+   return -1;
+
+   string_list_append((struct string_list *)opt->value, arg);
+   return 0;
+}
 
 static struct option builtin_clone_options[] = {
OPT__VERBOSITY(_verbosity),
@@ -112,6 +122,9 @@ static struct option builtin_clone_options[] = {
TRANSPORT_FAMILY_IPV4),
OPT_SET_INT('6', "ipv6", , N_("use IPv6 addresses only"),
TRANSPORT_FAMILY_IPV6),
+   OPT_CALLBACK(0, "submodule-spec", _spec, N_(""),
+   N_("clone specific submodules. Pass multiple times for 
complex pathspecs"),
+   submodule_spec_cb),
OPT_END()
 };
 
@@ -733,13 +746,21 @@ static int checkout(int submodule_progress)
err |= run_hook_le(NULL, "post-checkout", sha1_to_hex(null_sha1),
   sha1_to_hex(sha1), "1", NULL);
 
-   if (!err && option_recursive) {
+   if (!err && (option_recursive || submodule_spec.nr > 0)) {
struct argv_array args = ARGV_ARRAY_INIT;
-   argv_array_pushl(, "submodule", "update", "--init", 
"--recursive", NULL);
+   argv_array_pushl(, "submodule", "update", NULL);
+
+   if (submodule_spec.nr > 0)
+   argv_array_pushf(, "--init-active");
+   else
+   argv_array_pushf(, "--init");
 
if (option_shallow_submodules == 1)
argv_array_push(, "--depth=1");
 
+   if (option_recursive)
+   argv_array_pushf(, "--recursive");
+
if (max_jobs != -1)
argv_array_pushf(, "--jobs=%d", max_jobs);
 
@@ -887,6 +908,17 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
option_no_checkout = 1;
}
 
+   if (submodule_spec.nr > 0) {
+

[PATCH 09/10] submodule sync: use submodule--helper is-active

2017-02-23 Thread Brandon Williams

Signed-off-by: Brandon Williams 
---
 git-submodule.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 02b85dceb..f35345775 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -1127,7 +1127,7 @@ cmd_sync()
;;
esac
 
-   if git config "submodule.$name.url" >/dev/null 2>/dev/null
+   if git submodule--helper is-active "$sm_path"
then
displaypath=$(git submodule--helper relative-path 
"$prefix$sm_path" "$wt_prefix")
say "$(eval_gettext "Synchronizing submodule url for 
'\$displaypath'")"
-- 
2.11.0.483.g087da7b7c-goog

[PATCH 00/10] decoupling a submodule's existence and its url

2017-02-23 Thread Brandon Williams

There are two motivations for decoupling a submodule's existence from the url
that is stored in .git/config.

1. Worktrees can't really be used with submodules since the existence is
   checked based on the shared .git/config.  This means that two different
   worktress have to have the same initialized submodules.  By decoupling a
   submodule's existence from the url, two different work trees can be
   configured to have different submodules checked out.
2. Easily configure gorups of submodules that a user is interested in.  In a
   repository with hundreds of submodules it would be difficult to easily tell 
git
   which modules to worry about without having to individually init all of
   them.  Instead, a pathspec can be used to more easily select or deselect
   groups of submodules.

This patch series works to do this decoupling and instead allows a user to
configure submodule.active with a pathspec to use to check if a submodule is
initialized.

Brandon Williams (10):
  submodule: decouple url and submodule existence
  submodule update: add `--init-active` switch
  clone: add --submodule-spec= switch
  completion: clone can initialize specific submodules
  submodule--helper: add is_active command
  submodule add: respect submodule.active
  submodule status: use submodule--helper is-active
  submodule deinit: use most reliable url
  submodule sync: use submodule--helper is-active
  submodule--helper clone: check for configured submodules using helper

 Documentation/git-clone.txt|  23 ---
 Documentation/git-submodule.txt|  11 +++-
 builtin/clone.c|  36 ++-
 builtin/submodule--helper.c|  22 ---
 contrib/completion/git-completion.bash |   1 +
 git-submodule.sh   |  39 +---
 submodule.c|  25 +++-
 t/t7400-submodule-basic.sh | 109 +
 t/t7413-submodule-is-active.sh |  85 +
 9 files changed, 323 insertions(+), 28 deletions(-)
 create mode 100755 t/t7413-submodule-is-active.sh

-- 
2.11.0.483.g087da7b7c-goog

[PATCH 04/10] completion: clone can initialize specific submodules

2017-02-23 Thread Brandon Williams

Signed-off-by: Brandon Williams 
---
 contrib/completion/git-completion.bash | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 6721ff80f..4e473aa90 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1138,6 +1138,7 @@ _git_clone ()
--single-branch
--branch
--recurse-submodules
+   --submodule-spec
"
return
;;
-- 
2.11.0.483.g087da7b7c-goog

[PATCH 08/10] submodule deinit: use most reliable url

2017-02-23 Thread Brandon Williams

The user could have configured the submodule to have a different URL
from the one in the superproject's config.  To account for this read
what the submodule has configured for remote.origin.url and use that
instead.

Signed-off-by: Brandon Williams 
---
 git-submodule.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index f8adfb179..02b85dceb 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -466,7 +466,7 @@ Submodule work tree '\$displaypath' contains a .git 
directory
then
# Remove the whole section so we have a clean state when
# the user later decides to init this submodule again
-   url=$(git config submodule."$name".url)
+   url=$(GIT_DIR="$(git rev-parse --git-path 
modules/$name)" git config remote.origin.url)
git config --remove-section submodule."$name" 
2>/dev/null &&
say "$(eval_gettext "Submodule '\$name' (\$url) 
unregistered for path '\$displaypath'")"
fi
-- 
2.11.0.483.g087da7b7c-goog

Re: SHA1 collisions found

2017-02-23 Thread Linus Torvalds

On Thu, Feb 23, 2017 at 3:05 PM, Jeff King  wrote:
>
> (By the way, I don't see your version on the list, Linus, which probably
> means it was eaten by the 100K filter).

Ahh. I didn't even think about a size filter.

Doesn't matter, your version looks fine.

   Linus

Re: [PATCH v2] config: reject invalid VAR in 'git -c VAR=VAL command'

2017-02-23 Thread Junio C Hamano

Junio C Hamano  writes:

> Jeff King  writes:
>
>> FWIW, the code looks OK here. It is a shame to duplicate the policy
>> found in git_config_parse_key(), though.
>>
>> I wonder if we could make a master version of that which canonicalizes
>> in-place, and then just wrap it for the git_config_parse_key()
>> interface. Actually, I guess the function you just wrote _is_ that inner
>> function, as long as it learned about the "quiet" flag.
>
> Hmm, I obviously missed an opportunity
> ...
> But you are right.  config-parse-key does have the simpler string
> that can just be given to the canonicalize thing and we should be
> able to reuse it.

Actually, I think we can just use the existing config_parse_key()
instead of adding the new function.  It adds one allocation and
deallocation, but it's not like this is a performance-critical
codepath that we absolutely avoid extra allocations.  After all, we
are still using the strbuf-split thing :-/.

The attached patch shows the updated fix.  It needs a preparatory
code move (not shown here) to make git_config_parse_key() available
to git_config_parse_parameter(), though.

-- >8 --
From: Junio C Hamano 
Date: Thu, 23 Feb 2017 15:04:40 -0800
Subject: [PATCH] config: reject invalid VAR in 'git -c VAR=VAL command' and
 keep subsection intact

The parsing of one-shot assignments of configuration variables that
come from the command line historically was quite loose and allowed
anything to pass.  It also downcased everything in the variable name,
even a three-level .. name in which
the  part must be treated in a case sensible manner.

Existing git_config_parse_key() helper is used to parse the variable
name that comes from the command line, i.e. "git config VAR VAL",
and handles these details correctly.  Replace the strbuf_tolower()
call in git-config_parse_parameter() with a call to it to correct
both issues.  git_config_parse_key() does a bit more things that are
not necessary for the purpose of this codepath (e.g. it allocates a
separate buffer to return the canonicalized variable name because it
takes a "const char *" input), but we are not in a performance-critical
codepath here.

Signed-off-by: Junio C Hamano 
---
 config.c   | 14 
 t/t1300-repo-config.sh | 62 ++
 2 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/config.c b/config.c
index b8cce1dffa..39f20dcd2c 100644
--- a/config.c
+++ b/config.c
@@ -295,7 +295,9 @@ int git_config_parse_parameter(const char *text,
   config_fn_t fn, void *data)
 {
const char *value;
+   char *canonical_name;
struct strbuf **pair;
+   int ret = 0;
 
pair = strbuf_split_str(text, '=', 2);
if (!pair[0])
@@ -313,13 +315,15 @@ int git_config_parse_parameter(const char *text,
strbuf_list_free(pair);
return error("bogus config parameter: %s", text);
}
-   strbuf_tolower(pair[0]);
-   if (fn(pair[0]->buf, value, data) < 0) {
-   strbuf_list_free(pair);
+
+   if (git_config_parse_key(pair[0]->buf, _name, NULL))
return -1;
-   }
+
+   ret = (fn(canonical_name, value, data) < 0) ? -1 : 0;
+
+   free(canonical_name);
strbuf_list_free(pair);
-   return 0;
+   return ret;
 }
 
 int git_config_from_parameters(config_fn_t fn, void *data)
diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 923bfc5a26..ea371020fa 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1097,6 +1097,68 @@ test_expect_success 'multiple git -c appends config' '
test_cmp expect actual
 '
 
+test_expect_success 'last one wins: two level vars' '
+
+   # sec.var and sec.VAR are the same variable, as the first
+   # and the last level of a configuration variable name is
+   # case insensitive.
+
+   echo VAL >expect &&
+
+   git -c sec.var=val -c sec.VAR=VAL config --get sec.var >actual &&
+   test_cmp expect actual &&
+   git -c SEC.var=val -c sec.var=VAL config --get sec.var >actual &&
+   test_cmp expect actual &&
+
+   git -c sec.var=val -c sec.VAR=VAL config --get SEC.var >actual &&
+   test_cmp expect actual &&
+   git -c SEC.var=val -c sec.var=VAL config --get sec.VAR >actual &&
+   test_cmp expect actual
+'
+
+test_expect_success 'last one wins: three level vars' '
+
+   # v.a.r and v.A.r are not the same variable, as the middle
+   # level of a three-level configuration variable name is
+   # case sensitive.
+
+   echo val >expect &&
+   git -c v.a.r=val -c v.A.r=VAL config --get v.a.r >actual &&
+   test_cmp expect actual &&
+   git -c v.a.r=val -c v.A.r=VAL config --get V.a.R >actual &&
+   test_cmp expect actual &&
+
+   # v.a.r and V.a.R are the same variable, as the first
+   # and the last level of a

Re: [PATCH 1/3] add collision-detecting sha1 implementation

2017-02-23 Thread Stefan Beller

On Thu, Feb 23, 2017 at 3:05 PM, Jeff King  wrote:

> +* Copyright 2017 Marc Stevens , Dan Shumow 
> (dan...@microsoft.com)
> +* Distributed under the MIT Software License.
> +* See accompanying file LICENSE.txt or copy at

The accompanying LICENSE file did not make it into this patch,
that is more specialized/verbose than the one at
https://opensource.org/licenses/MIT
w.r.t. copyright notice requirement.

Apart from that MIT seems to be compatible with GPL
according to the FSF, though IANAL.

[PATCH 15/15] builtin/checkout: add --recurse-submodules switch

2017-02-23 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 Documentation/git-checkout.txt |  7 +++
 builtin/checkout.c | 28 
 t/lib-submodule-update.sh  | 33 -
 t/t2013-checkout-submodule.sh  |  5 +
 4 files changed, 64 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index 8e2c0662dd..d6399c0af8 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -256,6 +256,13 @@ section of linkgit:git-add[1] to learn how to operate the 
`--patch` mode.
out anyway. In other words, the ref can be held by more than one
worktree.
 
+--[no-]recurse-submodules::
+   Using --recurse-submodules will update the content of all initialized
+   submodules according to the commit recorded in the superproject. If
+   local modifications in a submodule would be overwritten the checkout
+   will fail unless `-f` is used. If nothing (or --no-recurse-submodules)
+   is used, the work trees of submodules will not be updated.
+
 ::
Branch to checkout; if it refers to a branch (i.e., a name that,
when prepended with "refs/heads/", is a valid ref), then that
diff --git a/builtin/checkout.c b/builtin/checkout.c
index f174f50303..207ce09771 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -21,12 +21,31 @@
 #include "submodule-config.h"
 #include "submodule.h"
 
+static int recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
+
 static const char * const checkout_usage[] = {
N_("git checkout [] "),
N_("git checkout [] [] -- ..."),
NULL,
 };
 
+int option_parse_recurse_submodules(const struct option *opt,
+   const char *arg, int unset)
+{
+   if (unset) {
+   recurse_submodules = RECURSE_SUBMODULES_OFF;
+   return 0;
+   }
+   if (arg)
+   recurse_submodules =
+   parse_update_recurse_submodules_arg(opt->long_name,
+   arg);
+   else
+   recurse_submodules = RECURSE_SUBMODULES_ON;
+
+   return 0;
+}
+
 struct checkout_opts {
int patch_mode;
int quiet;
@@ -1163,6 +1182,9 @@ int cmd_checkout(int argc, const char **argv, const char 
*prefix)
N_("second guess 'git checkout 
'")),
OPT_BOOL(0, "ignore-other-worktrees", 
_other_worktrees,
 N_("do not check if another worktree is holding the 
given ref")),
+   { OPTION_CALLBACK, 0, "recurse-submodules", _submodules,
+   "checkout", "control recursive updating of 
submodules",
+   PARSE_OPT_OPTARG, option_parse_recurse_submodules },
OPT_BOOL(0, "progress", _progress, N_("force progress 
reporting")),
OPT_END(),
};
@@ -1193,6 +1215,12 @@ int cmd_checkout(int argc, const char **argv, const char 
*prefix)
git_xmerge_config("merge.conflictstyle", conflict_style, NULL);
}
 
+   if (recurse_submodules != RECURSE_SUBMODULES_OFF) {
+   git_config(submodule_config, NULL);
+   if (recurse_submodules != RECURSE_SUBMODULES_DEFAULT)
+   
set_config_update_recurse_submodules(recurse_submodules);
+   }
+
if ((!!opts.new_branch + !!opts.new_branch_force + 
!!opts.new_orphan_branch) > 1)
die(_("-b, -B and --orphan are mutually exclusive"));
 
diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 0b26f0e20f..54cd8a6366 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -756,6 +756,11 @@ test_submodule_forced_switch () {
 
 test_submodule_switch_recursing () {
command="$1"
+   RESULT=success
+   if test "$KNOWN_FAILURE_DIRECTORY_SUBMODULE_CONFLICTS" = 1
+   then
+   RESULT=failure
+   fi
# Appearing submodule #
# Switching to a commit letting a submodule appear checks it out ...
test_expect_success "$command: added submodule is checked out" '
@@ -865,7 +870,7 @@ test_submodule_switch_recursing () {
'
# Replacing a submodule with files in a directory must succeeds
# when the submodule is clean
-   test_expect_success "$command: replace submodule with a directory" '
+   test_expect_$RESULT "$command: replace submodule with a directory" '
prolog &&
reset_work_tree_to_interested add_sub1 &&
(
@@ -877,7 +882,7 @@ test_submodule_switch_recursing () {
)
'
# ... absorbing a .git directory.
-   test_expect_success "$command: replace submodule containing a .git 
directory with a directory must absorb the git dir" '
+   test_expect_$RESULT "$command: replace

[PATCH 13/15] read-cache, remove_marked_cache_entries: wipe selected submodules.

2017-02-23 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 read-cache.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 9054369dd0..9a2abacf7a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -18,6 +18,8 @@
 #include "varint.h"
 #include "split-index.h"
 #include "utf8.h"
+#include "submodule.h"
+#include "submodule-config.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -520,6 +522,22 @@ int remove_index_entry_at(struct index_state *istate, int 
pos)
return 1;
 }
 
+static void remove_submodule_according_to_strategy(const struct submodule *sub)
+{
+   switch (sub->update_strategy.type) {
+   case SM_UPDATE_UNSPECIFIED:
+   case SM_UPDATE_CHECKOUT:
+   case SM_UPDATE_REBASE:
+   case SM_UPDATE_MERGE:
+   submodule_move_head(sub->path, "HEAD", NULL, \
+   SUBMODULE_MOVE_HEAD_FORCE);
+   break;
+   case SM_UPDATE_NONE:
+   case SM_UPDATE_COMMAND:
+   ; /* Do not touch the submodule. */
+   }
+}
+
 /*
  * Remove all cache entries marked for removal, that is where
  * CE_REMOVE is set in ce_flags.  This is much more effective than
@@ -532,8 +550,13 @@ void remove_marked_cache_entries(struct index_state 
*istate)
 
for (i = j = 0; i < istate->cache_nr; i++) {
if (ce_array[i]->ce_flags & CE_REMOVE) {
-   remove_name_hash(istate, ce_array[i]);
-   save_or_free_index_entry(istate, ce_array[i]);
+   const struct submodule *sub = 
submodule_from_ce(ce_array[i]);
+   if (sub) {
+   remove_submodule_according_to_strategy(sub);
+   } else {
+   remove_name_hash(istate, ce_array[i]);
+   save_or_free_index_entry(istate, ce_array[i]);
+   }
}
else
ce_array[j++] = ce_array[i];
-- 
2.12.0.rc1.16.ge4278d41a0.dirty

[PATCH 10/15] update submodules: add submodule_move_head

2017-02-23 Thread Stefan Beller

In later patches we introduce the options and flag for commands
that modify the working directory, e.g. git-checkout.

This piece of code will be used universally for
all these working tree modifications as it
* supports dry run to answer the question:
  "Is it safe to change the submodule to this new state?"
  e.g. is it overwriting untracked files or are there local
  changes that would be overwritten?
* supports a force flag that can be used for resetting
  the tree.

Signed-off-by: Stefan Beller 
---
 submodule.c | 135 
 submodule.h |   7 
 2 files changed, 142 insertions(+)

diff --git a/submodule.c b/submodule.c
index 0b2596e88a..a2cf8c9376 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1239,6 +1239,141 @@ int bad_to_remove_submodule(const char *path, unsigned 
flags)
return ret;
 }
 
+static int submodule_has_dirty_index(const struct submodule *sub)
+{
+   struct child_process cp = CHILD_PROCESS_INIT;
+
+   prepare_submodule_repo_env_no_git_dir(_array);
+
+   cp.git_cmd = 1;
+   argv_array_pushl(, "diff-index", "--quiet", \
+   "--cached", "HEAD", NULL);
+   cp.no_stdin = 1;
+   cp.no_stdout = 1;
+   cp.dir = sub->path;
+   if (start_command())
+   die("could not recurse into submodule '%s'", sub->path);
+
+   return finish_command();
+}
+
+void submodule_reset_index(const char *path)
+{
+   struct child_process cp = CHILD_PROCESS_INIT;
+   prepare_submodule_repo_env_no_git_dir(_array);
+
+   cp.git_cmd = 1;
+   cp.no_stdin = 1;
+   cp.dir = path;
+
+   argv_array_pushf(, "--super-prefix=%s/", path);
+   argv_array_pushl(, "read-tree", "-u", "--reset", NULL);
+
+   argv_array_push(, EMPTY_TREE_SHA1_HEX);
+
+   if (run_command())
+   die("could not reset submodule index");
+}
+
+/**
+ * Moves a submodule at a given path from a given head to another new head.
+ * For edge cases (a submodule coming into existence or removing a submodule)
+ * pass NULL for old or new respectively.
+ */
+int submodule_move_head(const char *path,
+const char *old,
+const char *new,
+unsigned flags)
+{
+   int ret = 0;
+   struct child_process cp = CHILD_PROCESS_INIT;
+   const struct submodule *sub;
+
+   sub = submodule_from_path(null_sha1, path);
+
+   if (!sub)
+   die("BUG: could not get submodule information for '%s'", path);
+
+   if (old && !(flags & SUBMODULE_MOVE_HEAD_FORCE)) {
+   /* Check if the submodule has a dirty index. */
+   if (submodule_has_dirty_index(sub))
+   return error(_("submodule '%s' has dirty index"), path);
+   }
+
+   if (!(flags & SUBMODULE_MOVE_HEAD_DRY_RUN)) {
+   if (old) {
+   if (!submodule_uses_gitfile(path))
+   absorb_git_dir_into_superproject("", path,
+   ABSORB_GITDIR_RECURSE_SUBMODULES);
+   } else {
+   struct strbuf sb = STRBUF_INIT;
+   strbuf_addf(, "%s/modules/%s",
+   get_git_common_dir(), sub->name);
+   connect_work_tree_and_git_dir(path, sb.buf);
+   strbuf_release();
+
+   /* make sure the index is clean as well */
+   submodule_reset_index(path);
+   }
+   }
+
+   prepare_submodule_repo_env_no_git_dir(_array);
+
+   cp.git_cmd = 1;
+   cp.no_stdin = 1;
+   cp.dir = path;
+
+   argv_array_pushf(, "--super-prefix=%s/", path);
+   argv_array_pushl(, "read-tree", NULL);
+
+   if (flags & SUBMODULE_MOVE_HEAD_DRY_RUN)
+   argv_array_push(, "-n");
+   else
+   argv_array_push(, "-u");
+
+   if (flags & SUBMODULE_MOVE_HEAD_FORCE)
+   argv_array_push(, "--reset");
+   else
+   argv_array_push(, "-m");
+
+   argv_array_push(, old ? old : EMPTY_TREE_SHA1_HEX);
+   argv_array_push(, new ? new : EMPTY_TREE_SHA1_HEX);
+
+   if (run_command()) {
+   ret = -1;
+   goto out;
+   }
+
+   if (!(flags & SUBMODULE_MOVE_HEAD_DRY_RUN)) {
+   if (new) {
+   struct child_process cp1 = CHILD_PROCESS_INIT;
+   /* also set the HEAD accordingly */
+   cp1.git_cmd = 1;
+   cp1.no_stdin = 1;
+   cp1.dir = path;
+
+   argv_array_pushl(, "update-ref", "HEAD",
+new ? new : EMPTY_TREE_SHA1_HEX, NULL);
+
+   if (run_command()) {
+   ret = -1;
+   goto out;
+   }

[PATCH 05/15] connect_work_tree_and_git_dir: safely create leading directories

2017-02-23 Thread Stefan Beller

In a later patch we'll use connect_work_tree_and_git_dir when the
directory for the gitlink file doesn't exist yet. This patch makes
connect_work_tree_and_git_dir safe to use for both cases of
either the git dir or the working dir missing.

To do so, we need to call safe_create_leading_directories[_const]
on both directories. However this has to happen before we construct
the absolute paths as real_pathdup assumes the directories to
be there already.

So for both the config file in the git dir as well as the .git link
file we need to
a) construct the name
b) call SCLD
c) get the absolute path
d) once a-c is done for both we can consume the absolute path
   to compute the relative path to each other and store those
   relative paths.

The implementation provided here puts a) and b) for both cases first,
and then performs c and d after.

One of the two users of 'connect_work_tree_and_git_dir' already checked
for the directory being there, so we can loose that check as
connect_work_tree_and_git_dir handles this functionality now.

Signed-off-by: Stefan Beller 
---
 dir.c   | 32 +---
 submodule.c | 11 ++-
 2 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/dir.c b/dir.c
index 4541f9e146..6f52af7abb 100644
--- a/dir.c
+++ b/dir.c
@@ -2728,23 +2728,33 @@ void untracked_cache_add_to_index(struct index_state 
*istate,
 /* Update gitfile and core.worktree setting to connect work tree and git dir */
 void connect_work_tree_and_git_dir(const char *work_tree_, const char 
*git_dir_)
 {
-   struct strbuf file_name = STRBUF_INIT;
+   struct strbuf gitfile_sb = STRBUF_INIT;
+   struct strbuf cfg_sb = STRBUF_INIT;
struct strbuf rel_path = STRBUF_INIT;
-   char *git_dir = real_pathdup(git_dir_);
-   char *work_tree = real_pathdup(work_tree_);
+   char *git_dir, *work_tree;
 
-   /* Update gitfile */
-   strbuf_addf(_name, "%s/.git", work_tree);
-   write_file(file_name.buf, "gitdir: %s",
-  relative_path(git_dir, work_tree, _path));
+   /* Prepare .git file */
+   strbuf_addf(_sb, "%s/.git", work_tree_);
+   if (safe_create_leading_directories_const(gitfile_sb.buf))
+   die(_("could not create directories for %s"), gitfile_sb.buf);
+
+   /* Prepare config file */
+   strbuf_addf(_sb, "%s/config", git_dir_);
+   if (safe_create_leading_directories_const(cfg_sb.buf))
+   die(_("could not create directories for %s"), cfg_sb.buf);
 
+   git_dir = real_pathdup(git_dir_);
+   work_tree = real_pathdup(work_tree_);
+
+   /* Write .git file */
+   write_file(gitfile_sb.buf, "gitdir: %s",
+  relative_path(git_dir, work_tree, _path));
/* Update core.worktree setting */
-   strbuf_reset(_name);
-   strbuf_addf(_name, "%s/config", git_dir);
-   git_config_set_in_file(file_name.buf, "core.worktree",
+   git_config_set_in_file(cfg_sb.buf, "core.worktree",
   relative_path(work_tree, git_dir, _path));
 
-   strbuf_release(_name);
+   strbuf_release(_sb);
+   strbuf_release(_sb);
strbuf_release(_path);
free(work_tree);
free(git_dir);
diff --git a/submodule.c b/submodule.c
index 0e55372f37..04d185738f 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1442,8 +1442,6 @@ void absorb_git_dir_into_superproject(const char *prefix,
 
/* Not populated? */
if (!sub_git_dir) {
-   char *real_new_git_dir;
-   const char *new_git_dir;
const struct submodule *sub;
 
if (err_code == READ_GITFILE_ERR_STAT_FAILED) {
@@ -1466,13 +1464,8 @@ void absorb_git_dir_into_superproject(const char *prefix,
sub = submodule_from_path(null_sha1, path);
if (!sub)
die(_("could not lookup name for submodule '%s'"), 
path);
-   new_git_dir = git_path("modules/%s", sub->name);
-   if (safe_create_leading_directories_const(new_git_dir) < 0)
-   die(_("could not create directory '%s'"), new_git_dir);
-   real_new_git_dir = real_pathdup(new_git_dir);
-   connect_work_tree_and_git_dir(path, real_new_git_dir);
-
-   free(real_new_git_dir);
+   connect_work_tree_and_git_dir(path,
+   git_path("modules/%s", sub->name));
} else {
/* Is it already absorbed into the superprojects git dir? */
char *real_sub_git_dir = real_pathdup(sub_git_dir);
-- 
2.12.0.rc1.16.ge4278d41a0.dirty

[PATCH 07/15] update submodules: add a config option to determine if submodules are updated

2017-02-23 Thread Stefan Beller

In later patches we introduce the options and flag for commands
that modify the working directory, e.g. git-checkout.

Have a central place to store such settings whether we want to update
a submodule.

Signed-off-by: Stefan Beller 
---
 submodule.c | 6 ++
 submodule.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/submodule.c b/submodule.c
index 04d185738f..591f4a694e 100644
--- a/submodule.c
+++ b/submodule.c
@@ -17,6 +17,7 @@
 #include "worktree.h"
 
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
+static int config_update_recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int parallel_jobs = 1;
 static struct string_list changed_submodule_paths = STRING_LIST_INIT_NODUP;
 static int initialized_fetch_ref_tips;
@@ -542,6 +543,11 @@ void set_config_fetch_recurse_submodules(int value)
config_fetch_recurse_submodules = value;
 }
 
+void set_config_update_recurse_submodules(int value)
+{
+   config_update_recurse_submodules = value;
+}
+
 static int has_remote(const char *refname, const struct object_id *oid,
  int flags, void *cb_data)
 {
diff --git a/submodule.h b/submodule.h
index 0b915bd3ac..b4e60c08d2 100644
--- a/submodule.h
+++ b/submodule.h
@@ -64,6 +64,7 @@ extern void show_submodule_inline_diff(FILE *f, const char 
*path,
const char *del, const char *add, const char *reset,
const struct diff_options *opt);
 extern void set_config_fetch_recurse_submodules(int value);
+extern void set_config_update_recurse_submodules(int value);
 extern void check_for_new_submodule_commits(unsigned char new_sha1[20]);
 extern int fetch_populated_submodules(const struct argv_array *options,
   const char *prefix, int command_line_option,
-- 
2.12.0.rc1.16.ge4278d41a0.dirty

[PATCH 3/3] Makefile: add USE_SHA1DC knob

2017-02-23 Thread Jeff King

This knob lets you use the sha1dc implementation from:

  https://github.com/cr-marcstevens/sha1collisiondetection

which can detect certain types of collision attacks (even
when we only see half of the colliding pair).

The big downside is that it's slower than either the openssl
or block-sha1 implementations.

Here are some timings based off of linux.git:

  - compute sha1 over whole packfile
before: 1.349s
 after: 5.067s
change: +275%

  - rev-list --all
before: 5.742s
 after: 5.730s
change: -0.2%

  - rev-list --all --objects
before: 33.257s
 after: 33.392s
change: +0.4%

  - index-pack --verify
before: 2m20s
 after: 5m43s
change: +145%

  - git log --no-merges -1 -p
before: 9.532s
 after: 9.683s
change: +1.5%

So overall the sha1 computation is about 3-4x slower. But of
course most operations do more than just sha1. Accessing
commits and trees isn't slowed at all (both the +/- changes
there are well within the run-to-run noise). Accessing the
blobs is a little slower, but mostly drowned out by the cost
of things like actually generating patches.

The most-affected operation is `index-pack --verify`, which
is essentially just computing the sha1 on every object. It's
a bit worse than twice as slow, which means every push and
every fetch is going to experience that.

Signed-off-by: Jeff King 
---
 Makefile  | 10 ++
 sha1dc/sha1.c | 22 ++
 sha1dc/sha1.h | 16 
 3 files changed, 48 insertions(+)

diff --git a/Makefile b/Makefile
index 8e4081e06..7c4906250 100644
--- a/Makefile
+++ b/Makefile
@@ -142,6 +142,10 @@ all::
 # Define PPC_SHA1 environment variable when running make to make use of
 # a bundled SHA1 routine optimized for PowerPC.
 #
+# Define USE_SHA1DC to unconditionally enable the collision-detecting sha1
+# algorithm. This is slower, but may detect attempted collision attacks.
+# Takes priority over other *_SHA1 knobs.
+#
 # Define SHA1_MAX_BLOCK_SIZE to limit the amount of data that will be hashed
 # in one call to the platform's SHA1_Update(). e.g. APPLE_COMMON_CRYPTO
 # wants 'SHA1_MAX_BLOCK_SIZE=1024L*1024L*1024L' defined.
@@ -1386,6 +1390,11 @@ ifdef APPLE_COMMON_CRYPTO
SHA1_MAX_BLOCK_SIZE = 1024L*1024L*1024L
 endif
 
+ifdef USE_SHA1DC
+   SHA1_HEADER = "sha1dc/sha1.h"
+   LIB_OBJS += sha1dc/sha1.o
+   LIB_OBJS += sha1dc/ubc_check.o
+else
 ifdef BLK_SHA1
SHA1_HEADER = "block-sha1/sha1.h"
LIB_OBJS += block-sha1/sha1.o
@@ -1403,6 +1412,7 @@ else
 endif
 endif
 endif
+endif
 
 ifdef SHA1_MAX_BLOCK_SIZE
LIB_OBJS += compat/sha1-chunked.o
diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
index 762c6fff8..1566ec4c7 100644
--- a/sha1dc/sha1.c
+++ b/sha1dc/sha1.c
@@ -1141,3 +1141,25 @@ int SHA1DCFinal(unsigned char output[20], SHA1_CTX *ctx)
output[19] = (unsigned char)(ctx->ihv[4]);
return ctx->found_collision;
 }
+
+static const char collision_message[] =
+"The SHA1 computation detected evidence of a collision attack;\n"
+"refusing to process the contents.";
+
+void git_SHA1DCFinal(unsigned char hash[20], SHA1_CTX *ctx)
+{
+   if (SHA1DCFinal(hash, ctx))
+   die(collision_message);
+}
+
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, unsigned long len)
+{
+   const char *data = vdata;
+   /* We expect an unsigned long, but sha1dc only takes an int */
+   while (len > INT_MAX) {
+   SHA1DCUpdate(ctx, data, INT_MAX);
+   data += INT_MAX;
+   len -= INT_MAX;
+   }
+   SHA1DCUpdate(ctx, data, len);
+}
diff --git a/sha1dc/sha1.h b/sha1dc/sha1.h
index ce5390397..1bb0ace99 100644
--- a/sha1dc/sha1.h
+++ b/sha1dc/sha1.h
@@ -90,3 +90,19 @@ void SHA1DCUpdate(SHA1_CTX*, const char*, unsigned);
 // obtain SHA-1 hash from SHA-1 context
 // returns: 0 = no collision detected, otherwise = collision found => warn 
user for active attack
 int  SHA1DCFinal(unsigned char[20], SHA1_CTX*); 
+
+
+/*
+ * Same as SHA1DCFinal, but convert collision attack case into a verbose die().
+ */
+void git_SHA1DCFinal(unsigned char [20], SHA1_CTX *);
+
+/*
+ * Same as SHA1DCUpdate, but adjust types to match git's usual interface.
+ */
+void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, unsigned long len);
+
+#define platform_SHA_CTX SHA1_CTX
+#define platform_SHA1_Init SHA1DCInit
+#define platform_SHA1_Update git_SHA1DCUpdate
+#define platform_SHA1_Final git_SHA1DCFinal
-- 
2.12.0.rc2.629.ga7951ed82

[PATCH 2/3] sha1dc: adjust header includes for git

2017-02-23 Thread Jeff King

We can replace system includes with git-compat-util.h (and
should make sure it is included in all .c files). We can
drop includes from headers entirely, as every .c file is
supposed to include git-compat-util itself first.

We also use the full "sha1dc/" path for including related
files. This isn't strictly necessary, but makes the expected
resolution more obvious.

Signed-off-by: Jeff King 
---
 sha1dc/sha1.c  | 9 +++--
 sha1dc/sha1.h  | 2 --
 sha1dc/ubc_check.c | 4 ++--
 sha1dc/ubc_check.h | 2 --
 4 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
index ed2010911..762c6fff8 100644
--- a/sha1dc/sha1.c
+++ b/sha1dc/sha1.c
@@ -5,12 +5,9 @@
 * https://opensource.org/licenses/MIT
 ***/
 
-#include 
-#include 
-#include 
-
-#include "sha1.h"
-#include "ubc_check.h"
+#include "git-compat-util.h"
+#include "sha1dc/sha1.h"
+#include "sha1dc/ubc_check.h"
 
 #define rotate_right(x,n) (((x)>>(n))|((x)<<(32-(n
 #define rotate_left(x,n)  (((x)<<(n))|((x)>>(32-(n
diff --git a/sha1dc/sha1.h b/sha1dc/sha1.h
index 8b522f9d2..ce5390397 100644
--- a/sha1dc/sha1.h
+++ b/sha1dc/sha1.h
@@ -5,8 +5,6 @@
 * https://opensource.org/licenses/MIT
 ***/
 
-#include 
-
 // uses SHA-1 message expansion to expand the first 16 words of W[] to 80 words
 void sha1_message_expansion(uint32_t W[80]);
 
diff --git a/sha1dc/ubc_check.c b/sha1dc/ubc_check.c
index 556aaf3c5..6bccd4f2b 100644
--- a/sha1dc/ubc_check.c
+++ b/sha1dc/ubc_check.c
@@ -22,8 +22,8 @@
 // a directly verifiable version named ubc_check_verify can be found in 
ubc_check_verify.c
 // ubc_check has been verified against ubc_check_verify using the 
'ubc_check_test' program in the tools section
 
-#include 
-#include "ubc_check.h"
+#include "git-compat-util.h"
+#include "sha1dc/ubc_check.h"
 
 static const uint32_t DV_I_43_0_bit= (uint32_t)(1) << 0;
 static const uint32_t DV_I_44_0_bit= (uint32_t)(1) << 1;
diff --git a/sha1dc/ubc_check.h b/sha1dc/ubc_check.h
index 27285bdf5..05ff944eb 100644
--- a/sha1dc/ubc_check.h
+++ b/sha1dc/ubc_check.h
@@ -21,8 +21,6 @@
 #ifndef UBC_CHECK_H
 #define UBC_CHECK_H
 
-#include 
-
 #define DVMASKSIZE 1
 typedef struct { int dvType; int dvK; int dvB; int testt; int maski; int 
maskb; uint32_t dm[80]; } dv_info_t;
 extern dv_info_t sha1_dvs[];
-- 
2.12.0.rc2.629.ga7951ed82

[PATCH 12/15] unpack-trees: check if we can perform the operation for submodules

2017-02-23 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 unpack-trees.c | 131 +
 unpack-trees.h |   1 +
 2 files changed, 123 insertions(+), 9 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 616a0ae4b2..8333da2cc9 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -10,6 +10,8 @@
 #include "attr.h"
 #include "split-index.h"
 #include "dir.h"
+#include "submodule.h"
+#include "submodule-config.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -45,6 +47,9 @@ static const char 
*unpack_plumbing_errors[NB_UNPACK_TREES_ERROR_TYPES] = {
 
/* ERROR_WOULD_LOSE_ORPHANED_REMOVED */
"Working tree file '%s' would be removed by sparse checkout update.",
+
+   /* ERROR_WOULD_LOSE_SUBMODULE */
+   "Submodule '%s' cannot checkout new HEAD.",
 };
 
 #define ERRORMSG(o,type) \
@@ -161,6 +166,8 @@ void setup_unpack_trees_porcelain(struct 
unpack_trees_options *opts,
_("The following working tree files would be overwritten by 
sparse checkout update:\n%s");
msgs[ERROR_WOULD_LOSE_ORPHANED_REMOVED] =
_("The following working tree files would be removed by sparse 
checkout update:\n%s");
+   msgs[ERROR_WOULD_LOSE_SUBMODULE] =
+   _("Submodule '%s' cannot checkout new HEAD");
 
opts->show_all_errors = 1;
/* rejected paths may not have a static buffer */
@@ -240,12 +247,75 @@ static void display_error_msgs(struct 
unpack_trees_options *o)
fprintf(stderr, _("Aborting\n"));
 }
 
+static int check_submodule_move_head(const struct cache_entry *ce,
+const char *old_id,
+const char *new_id,
+struct unpack_trees_options *o)
+{
+   const struct submodule *sub = submodule_from_ce(ce);
+   if (!sub)
+   return 0;
+
+   switch (sub->update_strategy.type) {
+   case SM_UPDATE_UNSPECIFIED:
+   case SM_UPDATE_CHECKOUT:
+   if (submodule_move_head(ce->name, old_id, new_id, 
SUBMODULE_MOVE_HEAD_DRY_RUN))
+   return o->gently ? -1 :
+   add_rejected_path(o, 
ERROR_WOULD_LOSE_SUBMODULE, ce->name);
+   return 0;
+   case SM_UPDATE_NONE:
+   return 0;
+   case SM_UPDATE_REBASE:
+   case SM_UPDATE_MERGE:
+   case SM_UPDATE_COMMAND:
+   default:
+   warning(_("submodule update strategy not supported for 
submodule '%s'"), ce->name);
+   return -1;
+   }
+}
+
+static void reload_gitmodules_file(struct index_state *index,
+  struct checkout *state)
+{
+   int i;
+   for (i = 0; i < index->cache_nr; i++) {
+   struct cache_entry *ce = index->cache[i];
+   if (ce->ce_flags & CE_UPDATE) {
+   int r = strcmp(ce->name, ".gitmodules");
+   if (r < 0)
+   continue;
+   else if (r == 0) {
+   submodule_free();
+   checkout_entry(ce, state, NULL);
+   gitmodules_config();
+   git_config(submodule_config, NULL);
+   } else
+   break;
+   }
+   }
+}
+
 /*
  * Unlink the last component and schedule the leading directories for
  * removal, such that empty directories get removed.
  */
 static void unlink_entry(const struct cache_entry *ce)
 {
+   const struct submodule *sub = submodule_from_ce(ce);
+   if (sub) {
+   switch (sub->update_strategy.type) {
+   case SM_UPDATE_UNSPECIFIED:
+   case SM_UPDATE_CHECKOUT:
+   case SM_UPDATE_REBASE:
+   case SM_UPDATE_MERGE:
+   submodule_move_head(ce->name, "HEAD", NULL,
+   SUBMODULE_MOVE_HEAD_FORCE);
+   break;
+   case SM_UPDATE_NONE:
+   case SM_UPDATE_COMMAND:
+   return; /* Do not touch the submodule. */
+   }
+   }
if (!check_leading_path(ce->name, ce_namelen(ce)))
return;
if (remove_or_warn(ce->ce_mode, ce->name))
@@ -301,6 +371,9 @@ static int check_updates(struct unpack_trees_options *o)
remove_marked_cache_entries(index);
remove_scheduled_dirs();
 
+   if (should_update_submodules() && o->update && !o->dry_run)
+   reload_gitmodules_file(index, );
+
for (i = 0; i < index->cache_nr; i++) {
struct cache_entry *ce = index->cache[i];
 
@@ -1358,17 +1431,26 @@ static int verify_uptodate_1(const struct cache_entry 
*ce,
if (!lstat(ce->name, )) {
int flags =

[PATCH 1/3] add collision-detecting sha1 implementation

2017-02-23 Thread Jeff King

This is pulled straight from:

  https://github.com/cr-marcstevens/sha1collisiondetection

with no modifications yet (though I've pulled in only the
subset of files necessary for Git to use).

Signed-off-by: Jeff King 
---
 sha1dc/sha1.c  | 1146 
 sha1dc/sha1.h  |   94 +
 sha1dc/ubc_check.c |  361 +
 sha1dc/ubc_check.h |   35 ++
 4 files changed, 1636 insertions(+)
 create mode 100644 sha1dc/sha1.c
 create mode 100644 sha1dc/sha1.h
 create mode 100644 sha1dc/ubc_check.c
 create mode 100644 sha1dc/ubc_check.h

diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
new file mode 100644
index 0..ed2010911
--- /dev/null
+++ b/sha1dc/sha1.c
@@ -0,0 +1,1146 @@
+/***
+* Copyright 2017 Marc Stevens , Dan Shumow 
(dan...@microsoft.com) 
+* Distributed under the MIT Software License.
+* See accompanying file LICENSE.txt or copy at
+* https://opensource.org/licenses/MIT
+***/
+
+#include 
+#include 
+#include 
+
+#include "sha1.h"
+#include "ubc_check.h"
+
+#define rotate_right(x,n) (((x)>>(n))|((x)<<(32-(n
+#define rotate_left(x,n)  (((x)<<(n))|((x)>>(32-(n
+
+#define sha1_f1(b,c,d) ((d)^((b)&((c)^(d
+#define sha1_f2(b,c,d) ((b)^(c)^(d))
+#define sha1_f3(b,c,d) (((b) & ((c)|(d))) | ((c)&(d)))
+#define sha1_f4(b,c,d) ((b)^(c)^(d))
+
+#define HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, m, t) \
+   { e += rotate_left(a, 5) + sha1_f1(b,c,d) + 0x5A827999 + m[t]; b = 
rotate_left(b, 30); }
+#define HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, m, t) \
+   { e += rotate_left(a, 5) + sha1_f2(b,c,d) + 0x6ED9EBA1 + m[t]; b = 
rotate_left(b, 30); }
+#define HASHCLASH_SHA1COMPRESS_ROUND3_STEP(a, b, c, d, e, m, t) \
+   { e += rotate_left(a, 5) + sha1_f3(b,c,d) + 0x8F1BBCDC + m[t]; b = 
rotate_left(b, 30); }
+#define HASHCLASH_SHA1COMPRESS_ROUND4_STEP(a, b, c, d, e, m, t) \
+   { e += rotate_left(a, 5) + sha1_f4(b,c,d) + 0xCA62C1D6 + m[t]; b = 
rotate_left(b, 30); }
+
+#define HASHCLASH_SHA1COMPRESS_ROUND1_STEP_BW(a, b, c, d, e, m, t) \
+   { b = rotate_right(b, 30); e -= rotate_left(a, 5) + sha1_f1(b,c,d) + 
0x5A827999 + m[t]; }
+#define HASHCLASH_SHA1COMPRESS_ROUND2_STEP_BW(a, b, c, d, e, m, t) \
+   { b = rotate_right(b, 30); e -= rotate_left(a, 5) + sha1_f2(b,c,d) + 
0x6ED9EBA1 + m[t]; }
+#define HASHCLASH_SHA1COMPRESS_ROUND3_STEP_BW(a, b, c, d, e, m, t) \
+   { b = rotate_right(b, 30); e -= rotate_left(a, 5) + sha1_f3(b,c,d) + 
0x8F1BBCDC + m[t]; }
+#define HASHCLASH_SHA1COMPRESS_ROUND4_STEP_BW(a, b, c, d, e, m, t) \
+   { b = rotate_right(b, 30); e -= rotate_left(a, 5) + sha1_f4(b,c,d) + 
0xCA62C1D6 + m[t]; }
+
+#define SHA1_STORE_STATE(i) states[i][0] = a; states[i][1] = b; states[i][2] = 
c; states[i][3] = d; states[i][4] = e;
+
+
+
+void sha1_message_expansion(uint32_t W[80])
+{
+   for (unsigned i = 16; i < 80; ++i)
+   W[i] = rotate_left(W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16], 
1);
+}
+
+void sha1_compression(uint32_t ihv[5], const uint32_t m[16])
+{
+   uint32_t W[80];
+
+   memcpy(W, m, 16 * 4);
+   for (unsigned i = 16; i < 80; ++i)
+   W[i] = rotate_left(W[i - 3] ^ W[i - 8] ^ W[i - 14] ^ W[i - 16], 
1);
+
+   uint32_t a = ihv[0], b = ihv[1], c = ihv[2], d = ihv[3], e = ihv[4];
+
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 0);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 1);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 2);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 3);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 4);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 5);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 6);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 7);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 8);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 9);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 10);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 11);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 12);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 13);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 14);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(a, b, c, d, e, W, 15);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(e, a, b, c, d, W, 16);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(d, e, a, b, c, W, 17);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(c, d, e, a, b, W, 18);
+   HASHCLASH_SHA1COMPRESS_ROUND1_STEP(b, c, d, e, a, W, 19);
+
+   HASHCLASH_SHA1COMPRESS_ROUND2_STEP(a, b, c, d, e, W, 20);
+   HASHCLASH_SHA1COMPRESS_ROUND2_STEP(e, a, b, c, d, W, 21);
+   HASHCLASH_SHA1COMPRESS_ROUND2_STEP(d, e, a, b, c, W, 22);
+   HASHCLASH_SHA1COMPRESS_ROUND2_STEP(c, d, e, a, b, W, 23);
+   HASHCLASH_SHA1COMPRESS_ROUND2_STEP(b, c, d, e, a, W, 24);
+

Re: SHA1 collisions found

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 05:43:02PM -0500, Jeff King wrote:

> On Thu, Feb 23, 2017 at 02:38:29PM -0800, Linus Torvalds wrote:
> 
> > > Thanks, I hadn't seen that yet. That doesn't look like it should be hard
> > > to integrate into Git.
> > 
> > Here's a *very* ugly patch that is absolutely disgusting and should not be 
> > used. But it does kind of work (I tested it with a faked-up extra patch 
> > that made git accept the broken pdf as a loose object).
> > 
> > What do I mean by "kind of work"? It uses that ugly and slow checking 
> > SHA1 routine from the collision detection project for the SHA1 object 
> > verification, and it means that "git fsck" ends up being about twice as 
> > slow as it used to be.
> 
> Heh. I was just putting the finishing touches on a similar patch. Mine
> is much less gross, in that it actually just adds a new USE_SHA1DC knob
> (instead of, say, BLK_SHA1).

Here's my patches. They _might_ be worth including if only because they
shouldn't bother anybody unless they enable USE_SHA1DC. So it makes it a
bit more accessible for people to experiment with (or be paranoid with
if they like).

The first one is 98K. Mail headers may bump it over vger's 100K barrier.
It's actually the _least_ interesting patch of the 3, because it just
imports the code wholesale from the other project. But if it doesn't
make it, you can fetch the whole series from:

  https://github.com/peff/git jk/sha1dc

(By the way, I don't see your version on the list, Linus, which probably
means it was eaten by the 100K filter).

  [1/3]: add collision-detecting sha1 implementation
  [2/3]: sha1dc: adjust header includes for git
  [3/3]: Makefile: add USE_SHA1DC knob

 Makefile   |   10 +
 sha1dc/sha1.c  | 1165 
 sha1dc/sha1.h  |  108 +
 sha1dc/ubc_check.c |  361 
 sha1dc/ubc_check.h |   33 ++
 5 files changed, 1677 insertions(+)
 create mode 100644 sha1dc/sha1.c
 create mode 100644 sha1dc/sha1.h
 create mode 100644 sha1dc/ubc_check.c
 create mode 100644 sha1dc/ubc_check.h

-Peff

[PATCH 14/15] entry.c: update submodules when interesting

2017-02-23 Thread Stefan Beller

Signed-off-by: Stefan Beller 
---
 entry.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/entry.c b/entry.c
index c6eea240b6..d2b512da90 100644
--- a/entry.c
+++ b/entry.c
@@ -2,6 +2,7 @@
 #include "blob.h"
 #include "dir.h"
 #include "streaming.h"
+#include "submodule.h"
 
 static void create_directories(const char *path, int path_len,
   const struct checkout *state)
@@ -146,6 +147,7 @@ static int write_entry(struct cache_entry *ce,
unsigned long size;
size_t wrote, newsize = 0;
struct stat st;
+   const struct submodule *sub;
 
if (ce_mode_s_ifmt == S_IFREG) {
struct stream_filter *filter = get_stream_filter(ce->name,
@@ -203,6 +205,10 @@ static int write_entry(struct cache_entry *ce,
return error("cannot create temporary submodule %s", 
path);
if (mkdir(path, 0777) < 0)
return error("cannot create submodule directory %s", 
path);
+   sub = submodule_from_ce(ce);
+   if (sub)
+   return submodule_move_head(ce->name,
+   NULL, oid_to_hex(>oid), 
SUBMODULE_MOVE_HEAD_FORCE);
break;
default:
return error("unknown file mode for %s in index", path);
@@ -259,7 +265,31 @@ int checkout_entry(struct cache_entry *ce,
strbuf_add(, ce->name, ce_namelen(ce));
 
if (!check_path(path.buf, path.len, , state->base_dir_len)) {
+   const struct submodule *sub;
unsigned changed = ce_match_stat(ce, , 
CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);
+   /*
+* Needs to be checked before !changed returns early,
+* as the possibly empty directory was not changed
+*/
+   sub = submodule_from_ce(ce);
+   if (sub) {
+   int err;
+   if (!is_submodule_populated_gently(ce->name, )) {
+   struct stat sb;
+   if (lstat(ce->name, ))
+   die(_("could not stat file '%s'"), 
ce->name);
+   if (!(st.st_mode & S_IFDIR))
+   unlink_or_warn(ce->name);
+
+   return submodule_move_head(ce->name,
+   NULL, oid_to_hex(>oid),
+   SUBMODULE_MOVE_HEAD_FORCE);
+   } else
+   return submodule_move_head(ce->name,
+   "HEAD", oid_to_hex(>oid),
+   SUBMODULE_MOVE_HEAD_FORCE);
+   }
+
if (!changed)
return 0;
if (!state->force) {
-- 
2.12.0.rc1.16.ge4278d41a0.dirty

[PATCH 06/15] update submodules: add submodule config parsing

2017-02-23 Thread Stefan Beller

Similar to b33a15b08 (push: add recurseSubmodules config option,
2015-11-17) and 027771fcb1 (submodule: allow erroneous values for the
fetchRecurseSubmodules option, 2015-08-17), we add submodule-config code
that is later used to parse whether we are interested in updating
submodules.

We need the `die_on_error` parameter to be able to call this parsing
function for the config file as well, which if incorrect lets Git die.

As we're just touching the header file, also mark all functions extern.

Signed-off-by: Stefan Beller 
---
 submodule-config.c | 20 
 submodule-config.h | 17 +
 2 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 93453909cf..3e8e380d98 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -234,6 +234,26 @@ int parse_fetch_recurse_submodules_arg(const char *opt, 
const char *arg)
return parse_fetch_recurse(opt, arg, 1);
 }
 
+static int parse_update_recurse(const char *opt, const char *arg,
+   int die_on_error)
+{
+   switch (git_config_maybe_bool(opt, arg)) {
+   case 1:
+   return RECURSE_SUBMODULES_ON;
+   case 0:
+   return RECURSE_SUBMODULES_OFF;
+   default:
+   if (die_on_error)
+   die("bad %s argument: %s", opt, arg);
+   return RECURSE_SUBMODULES_ERROR;
+   }
+}
+
+int parse_update_recurse_submodules_arg(const char *opt, const char *arg)
+{
+   return parse_update_recurse(opt, arg, 1);
+}
+
 static int parse_push_recurse(const char *opt, const char *arg,
   int die_on_error)
 {
diff --git a/submodule-config.h b/submodule-config.h
index 70f19363fd..d434ecdb45 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -22,16 +22,17 @@ struct submodule {
int recommend_shallow;
 };
 
-int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg);
-int parse_push_recurse_submodules_arg(const char *opt, const char *arg);
-int parse_submodule_config_option(const char *var, const char *value);
-const struct submodule *submodule_from_name(const unsigned char 
*commit_or_tree,
-   const char *name);
-const struct submodule *submodule_from_path(const unsigned char 
*commit_or_tree,
-   const char *path);
+extern int parse_fetch_recurse_submodules_arg(const char *opt, const char 
*arg);
+extern int parse_update_recurse_submodules_arg(const char *opt, const char 
*arg);
+extern int parse_push_recurse_submodules_arg(const char *opt, const char *arg);
+extern int parse_submodule_config_option(const char *var, const char *value);
+extern const struct submodule *submodule_from_name(
+   const unsigned char *commit_or_tree, const char *name);
+extern const struct submodule *submodule_from_path(
+   const unsigned char *commit_or_tree, const char *path);
 extern int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
  unsigned char *gitmodules_sha1,
  struct strbuf *rev);
-void submodule_free(void);
+extern void submodule_free(void);
 
 #endif /* SUBMODULE_CONFIG_H */
-- 
2.12.0.rc1.16.ge4278d41a0.dirty

[PATCH 11/15] unpack-trees: pass old oid to verify_clean_submodule

2017-02-23 Thread Stefan Beller

The check (which uses the old oid) is yet to be implemented, but this part
is just a refactor, so it can go separately first.

Signed-off-by: Stefan Beller 
---
 unpack-trees.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3a8ee19fe8..616a0ae4b2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1407,7 +1407,8 @@ static void invalidate_ce_path(const struct cache_entry 
*ce,
  * Currently, git does not checkout subprojects during a superproject
  * checkout, so it is not going to overwrite anything.
  */
-static int verify_clean_submodule(const struct cache_entry *ce,
+static int verify_clean_submodule(const char *old_sha1,
+ const struct cache_entry *ce,
  enum unpack_trees_error_types error_type,
  struct unpack_trees_options *o)
 {
@@ -1427,16 +1428,18 @@ static int verify_clean_subdirectory(const struct 
cache_entry *ce,
struct dir_struct d;
char *pathbuf;
int cnt = 0;
-   unsigned char sha1[20];
 
-   if (S_ISGITLINK(ce->ce_mode) &&
-   resolve_gitlink_ref(ce->name, "HEAD", sha1) == 0) {
-   /* If we are not going to update the submodule, then
+   if (S_ISGITLINK(ce->ce_mode)) {
+   unsigned char sha1[20];
+   int sub_head = resolve_gitlink_ref(ce->name, "HEAD", sha1);
+   /*
+* If we are not going to update the submodule, then
 * we don't care.
 */
-   if (!hashcmp(sha1, ce->oid.hash))
+   if (!sub_head && !hashcmp(sha1, ce->oid.hash))
return 0;
-   return verify_clean_submodule(ce, error_type, o);
+   return verify_clean_submodule(sub_head ? NULL : 
sha1_to_hex(sha1),
+ ce, error_type, o);
}
 
/*
-- 
2.12.0.rc1.16.ge4278d41a0.dirty

[PATCH 02/15] lib-submodule-update.sh: do not use ./. as submodule remote

2017-02-23 Thread Stefan Beller

Adding the repository itself as a submodule does not make sense in the
real world. In our test suite we used to do that out of convenience in
some tests as the current repository has easiest access for setting up
'just a submodule'.

However this doesn't quite test the real world, so let's do not follow
this pattern any further and actually create an independent repository
that we can use as a submodule.

When using './.' as the remote the superproject and submodule share the
same objects, such that testing if a given sha1 is a valid commit works
in either repository.  As running commands in an unpopulated submodule
fall back to the superproject, this happens in `reset_work_tree_to`
to determine if we need to populate the submodule. Fix this bug by
checking in the actual remote now.

Signed-off-by: Stefan Beller 
---
 t/lib-submodule-update.sh | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 5df528ea81..c0d6325133 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -37,6 +37,17 @@
 #
 
 create_lib_submodule_repo () {
+   git init submodule_update_sub1 &&
+   (
+   cd submodule_update_sub1 &&
+   echo "expect" >>.gitignore &&
+   echo "actual" >>.gitignore &&
+   echo "x" >file1 &&
+   echo "y" >file2 &&
+   git add .gitignore file1 file2 &&
+   git commit -m "Base inside first submodule" &&
+   git branch "no_submodule"
+   ) &&
git init submodule_update_repo &&
(
cd submodule_update_repo &&
@@ -49,7 +60,7 @@ create_lib_submodule_repo () {
git branch "no_submodule" &&
 
git checkout -b "add_sub1" &&
-   git submodule add ./. sub1 &&
+   git submodule add ../submodule_update_sub1 sub1 &&
git config -f .gitmodules submodule.sub1.ignore all &&
git config submodule.sub1.ignore all &&
git add .gitmodules &&
@@ -162,7 +173,7 @@ reset_work_tree_to () {
test_must_be_empty actual &&
sha1=$(git rev-parse --revs-only HEAD:sub1) &&
if test -n "$sha1" &&
-  test $(cd "sub1" && git rev-parse --verify "$sha1^{commit}")
+  test $(cd "../submodule_update_sub1" && git rev-parse 
--verify "$sha1^{commit}")
then
git submodule update --init --recursive "sub1"
fi
-- 
2.12.0.rc1.16.ge4278d41a0.dirty

[RFCv5 PATCH 00/14] Checkout aware of Submodules!

2017-02-23 Thread Stefan Beller

previous work:
https://public-inbox.org/git/20161203003022.29797-1-sbel...@google.com/

v5:
 * as v4 was the first version queued by Junio, we do have an interdiff below!
 * renamed functions
 * changed the API, now the caller has to take care of the submodule strategy
   themselves. (Note this can be different for different situations, e.g.
   when a submodule is deleted, we can do that for any strategy except none and
   !command. But for moving to a new state of the submodule we currently
   only implement "checkout" [unspecified defaults to checkout]. warning about
   the others, doing nothing there.)

v4:
 * addressed all comments of Brian, Junio and Brandon.
 Thanks!
 * one major point of change is the introduction of another patch
   "lib-submodule-update.sh: do not use ./. as submodule remote",
   as that took some time to track down the existing bug.
 
v3:
 * moved tests from t2013 to the generic submodule library.
 * factored out the refactoring patches to be up front
 * As I redid the complete implementation, I have the impression this time
   it is cleaner than previous versions.
 
 I think we still have to fix the corner cases of directory/file/submodule 
 conflicts before merging, but this serves as a status update on my current
 way of thinking how to implement the worktree commands being aware of
 submodules.
 
Thanks,
Stefan

v2:
* based on top of the series sent out an hour ago
  "[PATCHv4 0/5] submodule embedgitdirs"
* Try to embed a submodule if we need to remove it.
* Strictly do not change behavior if not giving the new flag.
* I think I missed some review comments from v1, but I'd like to get
  the current state out over the weekend, as a lot has changed so far.
  On Monday I'll go through the previous discussion with a comb to see
  if I missed something.
  
v1:
When working with submodules, nearly anytime after checking out
a different state of the projects, that has submodules changed
you'd run "git submodule update" with a current version of Git.

There are two problems with this approach:

* The "submodule update" command is dangerous as it
  doesn't check for work that may be lost in the submodule
  (e.g. a dangling commit).
* you may forget to run the command as checkout is supposed
  to do all the work for you.

Integrate updating the submodules into git checkout, with the same
safety promises that git-checkout has, i.e. not throw away data unless
asked to. This is done by first checking if the submodule is at the same
sha1 as it is recorded in the superproject. If there are changes we stop
proceeding the checkout just like it is when checking out a file that
has local changes.

The integration happens in the code that is also used in other commands
such that it will be easier in the future to make other commands aware
of submodule.

This also solves d/f conflicts in case you replace a file/directory
with a submodule or vice versa.

The patches are still a bit rough, but the overall series seems
promising enough to me that I want to put it out here.

Any review, specifically on the design level welcome!

Thanks,
Stefan


Stefan Beller (14):
  lib-submodule-update.sh: reorder create_lib_submodule_repo
  lib-submodule-update.sh: define tests for recursing into submodules
  make is_submodule_populated gently
  connect_work_tree_and_git_dir: safely create leading directories
  update submodules: add submodule config parsing
  update submodules: add a config option to determine if submodules are
updated
  update submodules: introduce is_interesting_submodule
  update submodules: move up prepare_submodule_repo_env
  update submodules: add submodule_go_from_to
  unpack-trees: pass old oid to verify_clean_submodule
  unpack-trees: check if we can perform the operation for submodules
  read-cache: remove_marked_cache_entries to wipe selected submodules.
  entry.c: update submodules when interesting
  builtin/checkout: add --recurse-submodules switch

diff --git a/entry.c b/entry.c
index ae40611c97..d2b512da90 100644
--- a/entry.c
+++ b/entry.c
@@ -147,6 +147,7 @@ static int write_entry(struct cache_entry *ce,
unsigned long size;
size_t wrote, newsize = 0;
struct stat st;
+   const struct submodule *sub;
 
if (ce_mode_s_ifmt == S_IFREG) {
struct stream_filter *filter = get_stream_filter(ce->name,
@@ -204,13 +205,10 @@ static int write_entry(struct cache_entry *ce,
return error("cannot create temporary submodule %s", 
path);
if (mkdir(path, 0777) < 0)
return error("cannot create submodule directory %s", 
path);
-   if (is_active_submodule_with_strategy(ce, 
SM_UPDATE_UNSPECIFIED))
-   /*
-* force=1 is ok for any case as we did a dry
-* run before with appropriate force setting
-*/
-   return

[PATCH 04/15] make is_submodule_populated gently

2017-02-23 Thread Stefan Beller

We need the gentle version in a later patch. As we have just one caller,
migrate the caller.

Signed-off-by: Stefan Beller 
---
 builtin/grep.c | 2 +-
 submodule.c| 7 ++-
 submodule.h| 8 +++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 2c727ef499..b17835aed6 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -616,7 +616,7 @@ static int grep_submodule(struct grep_opt *opt, const 
unsigned char *sha1,
 {
if (!is_submodule_initialized(path))
return 0;
-   if (!is_submodule_populated(path)) {
+   if (!is_submodule_populated_gently(path, NULL)) {
/*
 * If searching history, check for the presense of the
 * submodule's gitdir before skipping the submodule.
diff --git a/submodule.c b/submodule.c
index 3b98766a6b..0e55372f37 100644
--- a/submodule.c
+++ b/submodule.c
@@ -234,15 +234,12 @@ int is_submodule_initialized(const char *path)
return ret;
 }
 
-/*
- * Determine if a submodule has been populated at a given 'path'
- */
-int is_submodule_populated(const char *path)
+int is_submodule_populated_gently(const char *path, int *return_error_code)
 {
int ret = 0;
char *gitdir = xstrfmt("%s/.git", path);
 
-   if (resolve_gitdir(gitdir))
+   if (resolve_gitdir_gently(gitdir, return_error_code))
ret = 1;
 
free(gitdir);
diff --git a/submodule.h b/submodule.h
index 05ab674f06..0b915bd3ac 100644
--- a/submodule.h
+++ b/submodule.h
@@ -41,7 +41,13 @@ extern int submodule_config(const char *var, const char 
*value, void *cb);
 extern void gitmodules_config(void);
 extern void gitmodules_config_sha1(const unsigned char *commit_sha1);
 extern int is_submodule_initialized(const char *path);
-extern int is_submodule_populated(const char *path);
+/*
+ * Determine if a submodule has been populated at a given 'path' by checking if
+ * the /.git resolves to a valid git repository.
+ * If return_error_code is NULL, die on error.
+ * Otherwise the return error code is the same as of resolve_gitdir_gently.
+ */
+extern int is_submodule_populated_gently(const char *path, int 
*return_error_code);
 extern int parse_submodule_update_strategy(const char *value,
struct submodule_update_strategy *dst);
 extern const char *submodule_strategy_to_string(const struct 
submodule_update_strategy *s);
-- 
2.12.0.rc1.16.ge4278d41a0.dirty

[PATCH 03/15] lib-submodule-update.sh: define tests for recursing into submodules

2017-02-23 Thread Stefan Beller

Currently lib-submodule-update.sh provides 2 functions
test_submodule_switch and test_submodule_forced_switch that are used by a
variety of tests to ensure that submodules behave as expected. The current
expected behavior is that submodules are not touched at all (see
42639d2317a for the exact setup).

In the future we want to teach all these commands to recurse
into submodules. To do that, we'll add two testing functions to
submodule-update-lib.sh: test_submodule_switch_recursing and
test_submodule_forced_switch_recursing.

These two functions behave in analogy to the already existing functions
just with a different expectation on submodule behavior. The submodule
in the working tree is expected to be updated to the recorded submodule
version. The behavior is analogous to e.g. the behavior of files in a
nested directory in the working tree, where a change to the working tree
handles any arising directory/file conflicts just fine.

Signed-off-by: Stefan Beller 
---
 t/lib-submodule-update.sh | 485 +-
 1 file changed, 483 insertions(+), 2 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index c0d6325133..0b26f0e20f 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -4,6 +4,7 @@
 # - New submodule (no_submodule => add_sub1)
 # - Removed submodule (add_sub1 => remove_sub1)
 # - Updated submodule (add_sub1 => modify_sub1)
+# - Updated submodule recursively (modify_sub1 => modify_sub1_recursively)
 # - Submodule updated to invalid commit (add_sub1 => invalid_sub1)
 # - Submodule updated from invalid commit (invalid_sub1 => valid_sub1)
 # - Submodule replaced by tracked files in directory (add_sub1 =>
@@ -19,8 +20,8 @@
 #/^
 #   / remove_sub1
 #  /
-#   add_sub1  /---O
-# |  /^
+#   add_sub1  /---O-O
+# |  /^ modify_sub1_recursive
 # | / modify_sub1
 # v/
 #  O--O---O-O
@@ -48,6 +49,17 @@ create_lib_submodule_repo () {
git commit -m "Base inside first submodule" &&
git branch "no_submodule"
) &&
+   git init submodule_update_sub2 &&
+   (
+   cd submodule_update_sub2
+   echo "expect" >>.gitignore &&
+   echo "actual" >>.gitignore &&
+   echo "x" >file1 &&
+   echo "y" >file2 &&
+   git add .gitignore file1 file2 &&
+   git commit -m "nested submodule base" &&
+   git branch "no_submodule"
+   ) &&
git init submodule_update_repo &&
(
cd submodule_update_repo &&
@@ -84,6 +96,14 @@ create_lib_submodule_repo () {
git add sub1 &&
git commit -m "Modify sub1" &&
 
+   git checkout -b modify_sub1_recursively modify_sub1 &&
+   git -C sub1 checkout -b "add_nested_sub" &&
+   git -C sub1 submodule add --branch no_submodule 
../submodule_update_sub2 sub2 &&
+   git -C sub1 commit -a -m "add a nested submodule" &&
+   git add sub1 &&
+   git commit -a -m "update submodule, that updates a nested 
submodule" &&
+   git -C sub1 submodule deinit -f --all &&
+
git checkout -b replace_sub1_with_directory add_sub1 &&
git submodule update &&
git -C sub1 checkout modifications &&
@@ -150,6 +170,15 @@ test_git_directory_is_unchanged () {
)
 }
 
+test_git_directory_exists() {
+   test -e ".git/modules/$1" &&
+   if test -f sub1/.git
+   then
+   # does core.worktree point at the right place?
+   test "$(git -C .git/modules/$1 config core.worktree)" = 
"../../../$1"
+   fi
+}
+
 # Helper function to be executed at the start of every test below, it sets up
 # the submodule repo if it doesn't exist and configures the most problematic
 # settings for diff.ignoreSubmodules.
@@ -180,6 +209,18 @@ reset_work_tree_to () {
)
 }
 
+reset_work_tree_to_interested () {
+   reset_work_tree_to $1 &&
+   # indicate we are interested in the submodule:
+   git -C submodule_update config submodule.sub1.url "bogus" &&
+   # also have it available:
+   if ! test -d submodule_update/.git/modules/sub1
+   then
+   mkdir -p submodule_update/.git/modules &&
+   cp -r submodule_update_repo/.git/modules/sub1 
submodule_update/.git/modules/sub1
+   fi
+}
+
 # Test that the superproject contains the content according to commit "$1"
 # (the work tree must match the index for everything but submodules but the
 # index must exactly match the given commit including any submodule SHA-1s).
@@ -695,3 +736,443 @@ test_submodule_forced_switch () {
)
'
 }
+
+# Test that submodule contents are correctly

[BUG] allowtipsha1inwant serves unreachable blobs if you know its hash

2017-02-23 Thread Jonathan Tan

If a server sets allowtipsha1inwant (or allowreachablesha1inwant), a
client can call "git fetch  " where SHA-1 is the hash of
a blob (reachable or unreachable) to obtain it. The test below (which
passes) demonstrates that.

I have bisected this, and this bug occurs at least as early as the
introduction of allowreachablesha1inwant in commit 68ee628
("upload-pack: optionally allow fetching reachable sha1", 2015-05-21).
It may have occurred earlier, though - initially, I thought that this
was due to allowreachablesha1inwant, so I did not investigate
allowtipsha1inwant until later.

I found this out while investigating the feasibility of using the
existing fetch-pack/upload-pack protocol to support fetching of
arbitrary blobs.

This happens most likely because the "rev-list" call in
"do_reachable_revlist" (called through "has_unreachable") is invoked
without the "--objects" argument. "has_unreachable" determines that an
object is unreachable if nothing is printed to stdout, which normally
works, except that "rev-list" prints nothing when asked which commits
are reachable from a blob (which makes sense).

Adding "--objects" works, and all existing tests pass, except for the
potential performance issue and the side effect that even fetching a
reachable blob no longer works. This is due to a possible bug where a
call like "git rev-list --objects $tree ^master" (where $tree is the
tree object corresponding to master) prints out objects even though all
objects reachable from the tree are also reachable from master. (There
should be no issue with "get_reachable_list", the other invoker of
"do_reachable_revlist", because non-commits in the command's stdout are
skipped.)

The other option, of course, is to whitelist object types (that is, only
allow commits and tags - I haven't checked if this is sufficient,
though). This might be a sufficient temporary fix, although I expect
that we would still have to revisit this later (especially if we decide
that we want to allow downloading blobs through this interface).

I don't mind taking a look at either solution, but thought of putting
this out first in case people have any opinion or insight into this
problem and its possible solutions.

(CC-ing some people who might have an interest in downloading arbitrary
blobs.)

Signed-off-by: Jonathan Tan 
---
 t/t-mytests.sh | 61 ++
 1 file changed, 61 insertions(+)
 create mode 100644 t/t-mytests.sh

diff --git a/t/t-mytests.sh b/t/t-mytests.sh
new file mode 100644
index 0..e71cfbf48
--- /dev/null
+++ b/t/t-mytests.sh
@@ -0,0 +1,61 @@
+#!/bin/sh
+
+test_description='my tests'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'setup' '
+   server="$HTTPD_DOCUMENT_ROOT_PATH/blobs" &&
+   git init "$server" &&
+   (
+   cd "$server" &&
+   test_commit myfile
+   ) &&
+
+   # A reachable blob.
+   reachable=$(
+   git -C "$server" cat-file -p \
+   $(git -C "$server" cat-file -p HEAD | grep "^tree " | cut -c6-) 
| \
+   grep myfile | cut -c13-52) &&
+   test -n "$reachable" &&
+
+   # 2 unreachable blobs. Only one will be fetched. (2 are included here
+   # to demonstrate that there is no whole-repo copying or anything like
+   # that.)
+   unreachable=$(echo abc123 | git -C "$server" hash-object -w --stdin) &&
+   test -n "$unreachable" &&
+   another_unreachable=$(echo def456 | git -C "$server" hash-object -w 
--stdin) &&
+   test -n "$another_unreachable" &&
+
+   git init --bare client.git &&
+   git -C client.git remote add origin "$HTTPD_URL/smart/blobs" &&
+
+   # These fetches are supposed to fail (and do)
+   test_must_fail git -C client.git fetch origin $reachable &&
+   test_must_fail git -C client.git fetch origin $unreachable
+'
+
+test_expect_success 'allowtipsha1inwant suddenly allows blobs' '
+   test -n "$reachable" &&
+
+   git -C "$server" config uploadpack.allowtipsha1inwant 1 &&
+
+   # This fetch passes
+   git -C client.git fetch origin $reachable &&
+   test "myfile" = $(git -C client.git cat-file -p $reachable)
+'
+
+test_expect_success 'even unreachable ones' '
+   test -n "$unreachable" &&
+
+   # This fetch is supposed to fail (for multiple reasons), but passes.
+   # Only the wanted blob is fetched.
+   git -C client.git fetch origin $unreachable &&
+   git -C client.git cat-file -e $unreachable &&
+   test_must_fail git -C client.git cat-file -e $another_unreachable &&
+   test "abc123" = $(git -C client.git cat-file -p $unreachable)
+'
+
+stop_httpd
+test_done
-- 
2.11.0.483.g087da7b7c-goog

[PATCH 01/15] lib-submodule-update.sh: reorder create_lib_submodule_repo

2017-02-23 Thread Stefan Beller

Redraw the ASCII art describing the setup using more space, such that
it is easier to understand.  The leaf commits are now ordered the same
way the actual code is ordered.

Add empty lines to the setup code separating each of the leaf commits,
each starting with a "checkout -b".

Signed-off-by: Stefan Beller 
---
 t/lib-submodule-update.sh | 49 ---
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 915eb4a7c6..5df528ea81 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -15,22 +15,27 @@
 # - Tracked file replaced by submodule (replace_sub1_with_file =>
 #   replace_file_with_sub1)
 #
-#   --O-O
-#  /  ^ replace_directory_with_sub1
-# /   replace_sub1_with_directory
-#/O
-#   / ^
-#  /  modify_sub1
-#  O--O---O
-#  ^  ^\  ^
-#  |  | \ remove_sub1
-#  |  |  -O-O
-#  |  |   \   ^ replace_file_with_sub1
-#  |  |\  replace_sub1_with_file
-#  |   add_sub1 --O-O
-# no_submodule^ valid_sub1
-# invalid_sub1
+# O
+#/^
+#   / remove_sub1
+#  /
+#   add_sub1  /---O
+# |  /^
+# | / modify_sub1
+# v/
+#  O--O---O-O
+#  ^   \  ^ replace_directory_with_sub1
+#  |\ replace_sub1_with_directory
+# no_submodule   \
+# O-O
+#  \  ^ replace_file_with_sub1
+#   \ replace_sub1_with_file
+#\
+# O-O
+# ^ valid_sub1
+# invalid_sub1
 #
+
 create_lib_submodule_repo () {
git init submodule_update_repo &&
(
@@ -49,10 +54,11 @@ create_lib_submodule_repo () {
git config submodule.sub1.ignore all &&
git add .gitmodules &&
git commit -m "Add sub1" &&
-   git checkout -b remove_sub1 &&
+
+   git checkout -b remove_sub1 add_sub1 &&
git revert HEAD &&
 
-   git checkout -b "modify_sub1" "add_sub1" &&
+   git checkout -b modify_sub1 add_sub1 &&
git submodule update &&
(
cd sub1 &&
@@ -67,7 +73,7 @@ create_lib_submodule_repo () {
git add sub1 &&
git commit -m "Modify sub1" &&
 
-   git checkout -b "replace_sub1_with_directory" "add_sub1" &&
+   git checkout -b replace_sub1_with_directory add_sub1 &&
git submodule update &&
git -C sub1 checkout modifications &&
git rm --cached sub1 &&
@@ -75,22 +81,25 @@ create_lib_submodule_repo () {
git config -f .gitmodules --remove-section "submodule.sub1" &&
git add .gitmodules sub1/* &&
git commit -m "Replace sub1 with directory" &&
+
git checkout -b replace_directory_with_sub1 &&
git revert HEAD &&
 
-   git checkout -b "replace_sub1_with_file" "add_sub1" &&
+   git checkout -b replace_sub1_with_file add_sub1 &&
git rm sub1 &&
echo "content" >sub1 &&
git add sub1 &&
git commit -m "Replace sub1 with file" &&
+
git checkout -b replace_file_with_sub1 &&
git revert HEAD &&
 
-   git checkout -b "invalid_sub1" "add_sub1" &&
+   git checkout -b invalid_sub1 add_sub1 &&
git update-index --cacheinfo 16 
0123456789012345678901234567890123456789 sub1 &&
git commit -m "Invalid sub1 commit" &&
git checkout -b valid_sub1 &&
git revert HEAD &&
+
git checkout master
)
 }
-- 
2.12.0.rc1.16.ge4278d41a0.dirty

[PATCH 08/15] submodules: introduce check to see whether to touch a submodule

2017-02-23 Thread Stefan Beller

In later patches we introduce the --recurse-submodule flag for commands
that modify the working directory, e.g. git-checkout.

It is potentially expensive to check if a submodule needs an update,
because a common theme to interact with submodules is to spawn a child
process for each interaction.

So let's introduce a function that checks if a submodule needs
to be checked for an update before attempting the update.

Signed-off-by: Stefan Beller 
---
 submodule.c | 16 
 submodule.h |  7 +++
 2 files changed, 23 insertions(+)

diff --git a/submodule.c b/submodule.c
index 591f4a694e..8b2c0212be 100644
--- a/submodule.c
+++ b/submodule.c
@@ -548,6 +548,22 @@ void set_config_update_recurse_submodules(int value)
config_update_recurse_submodules = value;
 }
 
+int should_update_submodules(void)
+{
+   return config_update_recurse_submodules == RECURSE_SUBMODULES_ON;
+}
+
+const struct submodule *submodule_from_ce(const struct cache_entry *ce)
+{
+   if (!S_ISGITLINK(ce->ce_mode))
+   return NULL;
+
+   if (!should_update_submodules())
+   return NULL;
+
+   return submodule_from_path(null_sha1, ce->name);
+}
+
 static int has_remote(const char *refname, const struct object_id *oid,
  int flags, void *cb_data)
 {
diff --git a/submodule.h b/submodule.h
index b4e60c08d2..6f3fe85c7c 100644
--- a/submodule.h
+++ b/submodule.h
@@ -65,6 +65,13 @@ extern void show_submodule_inline_diff(FILE *f, const char 
*path,
const struct diff_options *opt);
 extern void set_config_fetch_recurse_submodules(int value);
 extern void set_config_update_recurse_submodules(int value);
+/* Check if we want to update any submodule.*/
+extern int should_update_submodules(void);
+/*
+ * Returns the submodule struct if the given ce entry is a submodule
+ * and it should be updated. Returns NULL otherwise.
+ */
+extern const struct submodule *submodule_from_ce(const struct cache_entry *ce);
 extern void check_for_new_submodule_commits(unsigned char new_sha1[20]);
 extern int fetch_populated_submodules(const struct argv_array *options,
   const char *prefix, int command_line_option,
-- 
2.12.0.rc1.16.ge4278d41a0.dirty

[PATCH 09/15] update submodules: move up prepare_submodule_repo_env

2017-02-23 Thread Stefan Beller

In a later patch we need to prepare the submodule environment with
another git directory, so split up the function.

Also move it up in the file such that we do not need to declare the
function later before using it.

Signed-off-by: Stefan Beller 
---
 submodule.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/submodule.c b/submodule.c
index 8b2c0212be..0b2596e88a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -356,6 +356,23 @@ static void print_submodule_summary(struct rev_info *rev, 
FILE *f,
strbuf_release();
 }
 
+static void prepare_submodule_repo_env_no_git_dir(struct argv_array *out)
+{
+   const char * const *var;
+
+   for (var = local_repo_env; *var; var++) {
+   if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
+   argv_array_push(out, *var);
+   }
+}
+
+void prepare_submodule_repo_env(struct argv_array *out)
+{
+   prepare_submodule_repo_env_no_git_dir(out);
+   argv_array_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
+DEFAULT_GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1390,18 +1407,6 @@ int parallel_submodules(void)
return parallel_jobs;
 }
 
-void prepare_submodule_repo_env(struct argv_array *out)
-{
-   const char * const *var;
-
-   for (var = local_repo_env; *var; var++) {
-   if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
-   argv_array_push(out, *var);
-   }
-   argv_array_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
-DEFAULT_GIT_DIR_ENVIRONMENT);
-}
-
 /*
  * Embeds a single submodules git directory into the superprojects git dir,
  * non recursively.
-- 
2.12.0.rc1.16.ge4278d41a0.dirty

Re: SHA1 collisions found

2017-02-23 Thread Linus Torvalds

On Thu, Feb 23, 2017 at 2:43 PM, Jeff King  wrote:
>
> Yeah. I started looking at that, but the ubc check happens after the
> initial expansion.

Yes. That's the point where I gave up and just included their ugly sha1.c file.

I suspect it can be done, but it would need somebody to really know
what they are doing.

Linus

Re: SHA1 collisions found

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 02:38:29PM -0800, Linus Torvalds wrote:

> > Thanks, I hadn't seen that yet. That doesn't look like it should be hard
> > to integrate into Git.
> 
> Here's a *very* ugly patch that is absolutely disgusting and should not be 
> used. But it does kind of work (I tested it with a faked-up extra patch 
> that made git accept the broken pdf as a loose object).
> 
> What do I mean by "kind of work"? It uses that ugly and slow checking 
> SHA1 routine from the collision detection project for the SHA1 object 
> verification, and it means that "git fsck" ends up being about twice as 
> slow as it used to be.

Heh. I was just putting the finishing touches on a similar patch. Mine
is much less gross, in that it actually just adds a new USE_SHA1DC knob
(instead of, say, BLK_SHA1).

Here are the timings I came up with:

  - compute sha1 over whole packfile
before: 1.349s
 after: 5.067s
change: +275%

  - rev-list --all
before: 5.742s
 after: 5.730s
change: -0.2%

  - rev-list --all --objects
before: 33.257s
 after: 33.392s
change: +0.4%

  - index-pack --verify
before: 2m20s
 after: 5m43s
change: +145%

  - git log --no-merges -1 -p
before: 9.532s
 after: 9.683s
change: +1.5%

So overall the sha1 computation is about 3-4x slower. But of
course most operations do more than just sha1. Accessing
commits and trees isn't slowed at all (both the +/- changes
there are well within the run-to-run noise). Accessing the
blobs is a little slower, but mostly drowned out by the cost
of things like actually generating patches.

The most-affected operation is `index-pack --verify`, which
is essentially just computing the sha1 on every object. It's
a bit worse than twice as slow, which means every push and
every fetch is going to experience that.

> For example, I suspect we could use our (much cleaner) block-sha1 
> implementation and include just the ubc_check.c code with that, instead of 
> the truly ugly C sha1 implementation that the sha1collisiondetection 
> project uses. 
> 
> But to do that, somebody would have to really know how the unavoidable 
> bit conditions check works with the intermediate hashes. I have only a 
> "big picture" mental model of it (read: I'm not competent to do that).

Yeah. I started looking at that, but the ubc check happens after the
initial expansion. But AFAICT, block-sha1 mixes that expansion in with
the rest of the steps for efficiency. So perhaps somebody who really
understands sha1 and the new checks could figure it out, but I'm not at
all certain that adding it in wouldn't lose some of block-sha1's
efficiency (on top of the time to actually do the ubc check).

-Peff

Re: [PATCH v2 1/4] delete_ref: accept a reflog message argument

2017-02-23 Thread Junio C Hamano

Duy Nguyen  writes:

> On Tue, Feb 21, 2017 at 8:10 AM, Kyle Meyer  wrote:
>> diff --git a/refs.h b/refs.h
>> index 9fbff90e7..5880886a7 100644
>> --- a/refs.h
>> +++ b/refs.h
>> @@ -276,8 +276,8 @@ int reflog_exists(const char *refname);
>>   * exists, regardless of its old value. It is an error for old_sha1 to
>>   * be NULL_SHA1. flags is passed through to ref_transaction_delete().
>>   */
>> -int delete_ref(const char *refname, const unsigned char *old_sha1,
>> -  unsigned int flags);
>> +int delete_ref(const char *msg, const char *refname,
>> +  const unsigned char *old_sha1, unsigned int flags);
>
> Is it just me who thinks it's weird that msg comes in front here?

You and anybody who didn't read what was discussed earlier, methinks
;-)  cf. <20170217081205.zn7j6d5cffgdv...@sigill.intra.peff.net>

> ... You'll
> probably want to update the comment block above if msg can be
> NULL.

Good suggestion.

Thanks for taking a look at this topic.  IIRC a recent update to one
of your topics introduced a new conflicts with this one.

What's cooking in git.git (Feb 2017, #07; Thu, 23)

2017-02-23 Thread Junio C Hamano

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[New Topics]

* js/curl-empty-auth-set-by-default (2017-02-22) 1 commit
 - http(s): automatically try NTLM authentication first

 Flip "http.emptyAuth" on by default to help OOB experience for
 users of HTTP Negotiate authentication scheme.

 Under discussion.
 cf. <2017033419.q3fxqmrscosum...@genre.crustytoothpaste.net>


* bc/blame-doc-fix (2017-02-22) 1 commit
  (merged to 'next' on 2017-02-22 at 81c0ff2283)
 + Documentation: use brackets for optional arguments

 Doc update.

 Will merge to 'master'.


* bc/worktree-doc-fix-detached (2017-02-22) 1 commit
  (merged to 'next' on 2017-02-22 at 8257025363)
 + Documentation: correctly spell git worktree --detach

 Doc update.

 Will merge to 'master'.


* rt/align-add-i-help-text (2017-02-22) 1 commit
  (merged to 'next' on 2017-02-22 at a8573afb9a)
 + git add -i: replace \t with blanks in the help message

 Doc update.

 Will merge to 'master'.


* jh/send-email-one-cc (2017-02-23) 1 commit
 - send-email: only allow one address per body tag

 "Cc:" on the trailer part does not have to conform to RFC strictly,
 unlike in the e-mail header.  "git send-email" has been updated to
 ignore anything after '>' when picking addresses, to allow non-address
 cruft like " # stable 4.4" after the address.

 Will merge to and then cook in 'next'.


* jk/http-auth (2017-02-23) 1 commit
 - http: restrict auth methods to what the server advertises

 Reduce authentication round-trip over HTTP when the server supports
 just a single authentication method.

 Will merge to and then cook in 'next'.
 There is a follow-up that would supersede js/curl-empty-auth-set-by-default
 topic; as it expected to be refined further, it is not queued here yet.


* jk/ident-empty (2017-02-23) 4 commits
 - ident: do not ignore empty config name/email
 - ident: reject all-crud ident name
 - ident: handle NULL email when complaining of empty name
 - ident: mark error messages for translation

 user.email that consists of only cruft chars should have
 consistently errored out, but didn't.

 Will merge to and then cook in 'next'.


* jt/upload-pack-error-report (2017-02-23) 1 commit
 - upload-pack: report "not our ref" to client

 "git upload-pack", which is a counter-part of "git fetch", did not
 report a request for a ref that was not advertised as invalid.
 This is generally not a problem (because "git fetch" will stop
 before making such a request), but is the right thing to do.

 Will merge to and then cook in 'next'.

--
[Stalled]

* nd/worktree-move (2017-01-27) 7 commits
 . fixup! worktree move: new command
 . worktree remove: new command
 . worktree move: refuse to move worktrees with submodules
 . worktree move: accept destination as directory
 . worktree move: new command
 . worktree.c: add update_worktree_location()
 . worktree.c: add validate_worktree()

 "git worktree" learned move and remove subcommands.

 Tentatively ejected as it seems to break 'pu' when merged.


* cc/split-index-config (2016-12-26) 21 commits
 - Documentation/git-update-index: explain splitIndex.*
 - Documentation/config: add splitIndex.sharedIndexExpire
 - read-cache: use freshen_shared_index() in read_index_from()
 - read-cache: refactor read_index_from()
 - t1700: test shared index file expiration
 - read-cache: unlink old sharedindex files
 - config: add git_config_get_expiry() from gc.c
 - read-cache: touch shared index files when used
 - sha1_file: make check_and_freshen_file() non static
 - Documentation/config: add splitIndex.maxPercentChange
 - t1700: add tests for splitIndex.maxPercentChange
 - read-cache: regenerate shared index if necessary
 - config: add git_config_get_max_percent_split_change()
 - Documentation/git-update-index: talk about core.splitIndex config var
 - Documentation/config: add information for core.splitIndex
 - t1700: add tests for core.splitIndex
 - update-index: warn in case of split-index incoherency
 - read-cache: add and then use tweak_split_index()
 - split-index: add {add,remove}_split_index() functions
 - config: add git_config_get_split_index()
 - config: mark an error message up for translation

 The experimental "split index" feature has gained a few
 configuration variables to make it easier to use.

 Expecting a reroll.
 cf. <2016122610.17150-1-chrisc...@tuxfamily.org>
 cf. 
 cf.

Re: [PATCH 3/4] filter-branch: fix --prune-empty on parentless commits

2017-02-23 Thread Devin J. Pohly

On Thu, Feb 23, 2017 at 01:17:49PM -0800, Junio C Hamano wrote:
> "Devin J. Pohly"  writes:
> 
> > Previously, the git_commit_non_empty_tree function would always pass any
> > commit with no parents to git-commit-tree, regardless of whether the
> > tree was nonempty.  The new commit would then be recorded in the
> > filter-branch revision map, and subsequent commits which leave the tree
> > untouched would be correctly filtered.
> >
> > With this change, parentless commits with an empty tree are correctly
> > pruned, and an empty file is recorded in the revision map, signifying
> > that it was rewritten to "no commits."  This works naturally with the
> > parent mapping for subsequent commits.
> >
> > Signed-off-by: Devin J. Pohly 
> > ---
> 
> I am not sure if a root that records an empty tree should be pruned
> with --prune-empty to begin with.
> 
> When we are pruning consecutive commits in the other parts of the
> history because they have identical (presumably non-empty) trees,
> should an empty root that the original history wanted to create be
> pruned because before the commit it was void, after the commit it is
> empty?  Should "void" (lack of any tree) and "empty" (the tree is
> there, but it does not have anything in it) be treated the same?
> Shouldn't root be treated as a bit more special thing?
>

The case I had in mind was a filter which happened to remove all changes
from any parentless commit (see the testcase added to t7003).  It would
not necessarily have been an empty commit in the original history.

Use case/motivation: I am splitting my dotfiles repo to migrate to vcsh,
and the original first commit (which only touches a few files) appears
in every branch.  In the branches which do not include those files, the
commit is empty but still present.

I think your point is interesting too, though.  If a commit is also
TREESAME to its parent(s?) in the _pre-filtered_ branch, it seems
reasonable that someone might want to leave it in the filtered branch as
an empty commit while pruning empt*ied* commits.  I would imagine that
as another option (--prune-newly-empty?).

> 
> I myself do not have a good answer to the above questions.
> 
> I think the updated code makes sense, provided if we decide that
> void to empty is just like transitioning between two identical
> (presumably non-empty) trees.  The updated documentation is a lot
> more readable as well.
> 
> Comments from those who have been involved in filter-branch?
> 
> >  Documentation/git-filter-branch.txt | 14 ++
> >  git-filter-branch.sh|  2 ++
> >  2 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/git-filter-branch.txt 
> > b/Documentation/git-filter-branch.txt
> > index 0a09698c0..6e4bb0220 100644
> > --- a/Documentation/git-filter-branch.txt
> > +++ b/Documentation/git-filter-branch.txt
> > @@ -167,14 +167,12 @@ to other tags will be rewritten to point to the 
> > underlying commit.
> > project root. Implies <>.
> >  
> >  --prune-empty::
> > -   Some kind of filters will generate empty commits, that left the tree
> > -   untouched.  This switch allow git-filter-branch to ignore such
> > -   commits.  Though, this switch only applies for commits that have one
> > -   and only one parent, it will hence keep merges points. Also, this
> > -   option is not compatible with the use of `--commit-filter`. Though you
> > -   just need to use the function 'git_commit_non_empty_tree "$@"' instead
> > -   of the `git commit-tree "$@"` idiom in your commit filter to make that
> > -   happen.
> > +   Some filters will generate empty commits that leave the tree untouched.
> > +   This option instructs git-filter-branch to remove such commits if they
> > +   have exactly one or zero non-pruned parents; merge commits will
> > +   therefore remain intact.  This option cannot be used together with
> > +   `--commit-filter`, though the same effect can be achieved by using the
> > +   provided `git_commit_non_empty_tree` function in a commit filter.
> >  
> >  --original ::
> > Use this option to set the namespace where the original commits
> > diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> > index 86b2ff1e0..2b8cdba15 100755
> > --- a/git-filter-branch.sh
> > +++ b/git-filter-branch.sh
> > @@ -46,6 +46,8 @@ git_commit_non_empty_tree()
> >  {
> > if test $# = 3 && test "$1" = $(git rev-parse "$3^{tree}"); then
> > map "$3"
> > +   elif test $# = 1 && test "$1" = 
> > 4b825dc642cb6eb9a060e54bf8d69288fbee4904; then
> > +   :
> > else
> > git commit-tree "$@"
> > fi

-- 
<><

Re: SHAttered (the first practical SHA1 attack)

2017-02-23 Thread Jakub Narębski

W dniu 23.02.2017 o 16:50, Santiago Torres pisze:
> Hello all,
> 
> I ran into this website presenting the "first practical attack on
> sha1"[1]. I don't recall seeing this on the ML, so I'm sharing this just
> in case. I know there are proposals to move out of sha1 already. I
> wonder if this affects the timeline for their adoption?
> 
> Thanks,
> -Santiago.
> 
> [1] https://shattered.io/

This is already being discussed in "SHA1 collisions found"[1]
thread.  Let's do the discussion there.

[1] https://public-inbox.org/git/xmqqk28g92h7@gitster.mtv.corp.google.com
-- 
Jakub Narębski

Re: [PATCH 3/4] filter-branch: fix --prune-empty on parentless commits

2017-02-23 Thread Junio C Hamano

"Devin J. Pohly"  writes:

> Previously, the git_commit_non_empty_tree function would always pass any
> commit with no parents to git-commit-tree, regardless of whether the
> tree was nonempty.  The new commit would then be recorded in the
> filter-branch revision map, and subsequent commits which leave the tree
> untouched would be correctly filtered.
>
> With this change, parentless commits with an empty tree are correctly
> pruned, and an empty file is recorded in the revision map, signifying
> that it was rewritten to "no commits."  This works naturally with the
> parent mapping for subsequent commits.
>
> Signed-off-by: Devin J. Pohly 
> ---

I am not sure if a root that records an empty tree should be pruned
with --prune-empty to begin with.

When we are pruning consecutive commits in the other parts of the
history because they have identical (presumably non-empty) trees,
should an empty root that the original history wanted to create be
pruned because before the commit it was void, after the commit it is
empty?  Should "void" (lack of any tree) and "empty" (the tree is
there, but it does not have anything in it) be treated the same?
Shouldn't root be treated as a bit more special thing?

I myself do not have a good answer to the above questions.

I think the updated code makes sense, provided if we decide that
void to empty is just like transitioning between two identical
(presumably non-empty) trees.  The updated documentation is a lot
more readable as well.

Comments from those who have been involved in filter-branch?

>  Documentation/git-filter-branch.txt | 14 ++
>  git-filter-branch.sh|  2 ++
>  2 files changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/git-filter-branch.txt 
> b/Documentation/git-filter-branch.txt
> index 0a09698c0..6e4bb0220 100644
> --- a/Documentation/git-filter-branch.txt
> +++ b/Documentation/git-filter-branch.txt
> @@ -167,14 +167,12 @@ to other tags will be rewritten to point to the 
> underlying commit.
>   project root. Implies <>.
>  
>  --prune-empty::
> - Some kind of filters will generate empty commits, that left the tree
> - untouched.  This switch allow git-filter-branch to ignore such
> - commits.  Though, this switch only applies for commits that have one
> - and only one parent, it will hence keep merges points. Also, this
> - option is not compatible with the use of `--commit-filter`. Though you
> - just need to use the function 'git_commit_non_empty_tree "$@"' instead
> - of the `git commit-tree "$@"` idiom in your commit filter to make that
> - happen.
> + Some filters will generate empty commits that leave the tree untouched.
> + This option instructs git-filter-branch to remove such commits if they
> + have exactly one or zero non-pruned parents; merge commits will
> + therefore remain intact.  This option cannot be used together with
> + `--commit-filter`, though the same effect can be achieved by using the
> + provided `git_commit_non_empty_tree` function in a commit filter.
>  
>  --original ::
>   Use this option to set the namespace where the original commits
> diff --git a/git-filter-branch.sh b/git-filter-branch.sh
> index 86b2ff1e0..2b8cdba15 100755
> --- a/git-filter-branch.sh
> +++ b/git-filter-branch.sh
> @@ -46,6 +46,8 @@ git_commit_non_empty_tree()
>  {
>   if test $# = 3 && test "$1" = $(git rev-parse "$3^{tree}"); then
>   map "$3"
> + elif test $# = 1 && test "$1" = 
> 4b825dc642cb6eb9a060e54bf8d69288fbee4904; then
> + :
>   else
>   git commit-tree "$@"
>   fi

Re: [PATCH 4/4] ident: do not ignore empty config name/email

2017-02-23 Thread Junio C Hamano

Jeff King  writes:

> This one is perhaps questionable. Maybe somebody is relying on setting a
> per-repo user.name to override a ~/.gitconfig value and enforce
> auto-detection?

Thanks for splitting this step out.  1/4 and 2/4 are obvious
improvements, and 3/4 is a very sensible fix.  Compared to those
three, this one does smell questionable, because I do not quite see
any other reasonable fallback other than the auto-detection if the
user gives an empty ident on purpose.  

Erroring out to say "don't do that" is probably not too bad, but
perhaps we are being run by a script that is doing a best-effort
conversion from $ANOTHER_SCM using a list of known authors that is
incomplete, ending up feeding empty ident and allowing us to fall
back to attribute them to the user who runs the script.  I do not
see a point in breaking that user and having her or him update the
script to stuff in a truly bogus "Unknown " name.

>
>  ident.c   |  4 ++--
>  t/t7518-ident-corner-cases.sh | 11 +++
>  2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/ident.c b/ident.c
> index ead09ff7f..c0364fe3a 100644
> --- a/ident.c
> +++ b/ident.c
> @@ -153,7 +153,7 @@ static void copy_email(const struct passwd *pw, struct 
> strbuf *email,
>  
>  const char *ident_default_name(void)
>  {
> - if (!git_default_name.len) {
> + if (!(ident_config_given & IDENT_NAME_GIVEN) && !git_default_name.len) {
>   copy_gecos(xgetpwuid_self(_name_is_bogus), 
> _default_name);
>   strbuf_trim(_default_name);
>   }
> @@ -162,7 +162,7 @@ const char *ident_default_name(void)
>  
>  const char *ident_default_email(void)
>  {
> - if (!git_default_email.len) {
> + if (!(ident_config_given & IDENT_MAIL_GIVEN) && !git_default_email.len) 
> {
>   const char *email = getenv("EMAIL");
>  
>   if (email && email[0]) {
> diff --git a/t/t7518-ident-corner-cases.sh b/t/t7518-ident-corner-cases.sh
> index 3d2560c3c..ef570ac62 100755
> --- a/t/t7518-ident-corner-cases.sh
> +++ b/t/t7518-ident-corner-cases.sh
> @@ -22,4 +22,15 @@ test_expect_success 'commit rejects all-crud name' '
>   git commit --allow-empty -m foo
>  '
>  
> +# We must test the actual error message here, as an unwanted
> +# auto-detection could fail for other reasons.
> +test_expect_success 'empty configured name does not auto-detect' '
> + (
> + sane_unset GIT_AUTHOR_NAME &&
> + test_must_fail \
> + git -c user.name= commit --allow-empty -m foo 2>err &&
> + test_i18ngrep "empty ident name" err
> + )
> +'
> +
>  test_done

Re: SHA1 collisions found

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 09:49:09PM +0100, Jakub Narębski wrote:

> > How is GIT affected? GIT strongly relies on SHA-1 for the
> > identification and integrity checking of all file objects and
> > commits. It is essentially possible to create two GIT repositories
> > with the same head commit hash and different contents, say a benign
> > source code and a backdoored one. An attacker could potentially
> > selectively serve either repository to targeted users. This will
> > require attackers to compute their own collision.
> 
> The attack on SHA-1 presented there is "identical-prefix" collision,
> which is less powerful than "chosen-prefix" collision.  It is the
> latter that is required to defeat SHA-1 used in object identity.
> Objects in Git _must_ begin with given prefix;

I don't think this helps. The chosen-prefix lets you append hash data to
an existing file. Here we just have identical prefixes in the two
colliding halves. In the real-world example, they used a PDF header. But
it could have been a PDF header with "blob 1234" prepended to it (note
also that Git's use of the size doesn't help; the attack files are the
same length).

> the use of zlib
> compression adds to the difficulty.  'Forged' Git object would
> simply not validate...

No, zlib doesn't help. The sha1 is computed on the uncompressed data.

-Peff

Re: SHA1 collisions found

2017-02-23 Thread Joey Hess

Jeff King wrote:
> It's not an identical prefix, but I think collision attacks generally
> are along the lines of selecting two prefixes followed by garbage, and
> then mutating the garbage on both sides. That would "work" in this case
> (modulo the fact that git would complain about the NUL).
> 
> I haven't read the paper yet to see if that is the case here, though.

The current attack is an identical-prefix attack, not chosen-prefix, so
not quite to that point yet.

The MD5 chosen-prefix attack was 2^15 harder than the known-prefix attack,
but who knows if the numbers will be comprable for SHA1.

> A related case is if you could stick a "cruft " header at the end of
> the commit headers, and mutate its value (avoiding newlines). fsck
> doesn't complain about that.

git log and git show don't show such cruft headers either.

BTW, the SHA attack only added ~128 bytes to the pdfs, not really a
huge amount of garbage.

-- 
see shy jo

signature.asc
Description: PGP signature

Re: [PATCH] git svn branch fails with authenticaton failures

2017-02-23 Thread Eric Wong

Hiroshi Shirosaki  wrote:
> I have the following authentication failure while svn rebase and
> svn dcommit works fine without authentication failures.
> 
> $ git svn branch v7_3
> Copying https://xxx at r27519
> to https:///v7_3...
> Can't create session: Unable to connect to a repository at URL
> 'https://': No more
> credentials or we tried too many times.
> Authentication failed at
> C:\Program Files\Git\mingw64/libexec/git-core\git-svn line 1200.
> 
> I can workaround the issue to add auth configuration to
> SVN::Client->new().

Missing sign-off (see Documentation/SubmittingPatches).
Not my rule, but it's unfortunately required for this project.

Also, the Subject: should be in the imperative mood,
Perhaps something like:

Subject: [PATCH] git svn: fix authentication with 'branch'

I am less picky about the message body.

> --- a/git-svn.perl
> +++ b/git-svn.perl
> @@ -1175,10 +1175,10 @@ sub cmd_branch {
>   ::_req_svn();
>   require SVN::Client;
>  
> + my ($config, $baton, $callbacks) = Git::SVN::Ra::prepare_config_once();

Since we're not using it, here, you can avoid setting a variable
for $callbacks more explicitly:

my ($config, $baton, undef) = Git::SVN::Ra::prepare_config_once();

>   my $ctx = SVN::Client->new(
> - config => SVN::Core::config_get_config(
> - $Git::SVN::Ra::config_dir
> - ),
> + auth => $baton,
> + config => $config,
>   log_msg => sub {
>   ${ $_[0] } = defined $_message
>   ? $_message
> -- 

Anyways, this looks like a good change.  I will accept a v2
with your sign-off and changes noted above.  Thank you.

Re: SHA1 collisions found

2017-02-23 Thread Jakub Narębski

W dniu 23.02.2017 o 18:12, David Lang pisze:
> On Thu, 23 Feb 2017, Junio C Hamano wrote:
> 
>> On Thu, Feb 23, 2017 at 8:43 AM, Joey Hess  wrote:
>>> 
>>> Since we now have collisions in valid PDF files, collisions in
>>> valid git commit and tree objects are probably able to be
>>> constructed.
>> 
>> That may be true, but 
>> https://public-inbox.org/git/pine.lnx.4.58.0504291221250.18...@ppc970.osdl.org/
>>
>
> it doesn't help that the Google page on this explicitly says that
> this shows that it's possible to create two different git repos that
> have the same hash but different contents.
> 
> https://shattered.it/
> 
> How is GIT affected? GIT strongly relies on SHA-1 for the
> identification and integrity checking of all file objects and
> commits. It is essentially possible to create two GIT repositories
> with the same head commit hash and different contents, say a benign
> source code and a backdoored one. An attacker could potentially
> selectively serve either repository to targeted users. This will
> require attackers to compute their own collision.

The attack on SHA-1 presented there is "identical-prefix" collision,
which is less powerful than "chosen-prefix" collision.  It is the
latter that is required to defeat SHA-1 used in object identity.
Objects in Git _must_ begin with given prefix; the use of zlib
compression adds to the difficulty.  'Forged' Git object would
simply not validate...

https://arstechnica.com/security/2017/02/at-deaths-door-for-years-widely-used-sha1-function-is-now-dead/

Re: [PATCH] http(s): automatically try NTLM authentication first

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 12:37:25PM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > I suspect it isn't enough to help without 2/2. This will tell curl that
> > the server does not do Negotiate, so it will skip the probe request. But
> > Git will still feed curl the bogus empty credential.
> >
> > That's what 2/2 tries to fix: only kick in the emptyAuth hack when there
> > is something besides Basic[1] to try. The way it is written adds an
> 
> In your [1] you wanted to mention that Digest would have the same
> property as Basic, or something like that?

Oops, yeah. What I was going to say is that we may want a list of auth
types where we _do_ want the hack on, rather than ones where we know it
does not work. People are more likely to notice when the list is wrong,
then.

> > But if we are worried about turning on emptyAuth everywhere, the auto
> > behavior could be tied to emptyauth=true (and have something like
> > "emptyauth=always" to _really_ force it). I don't have an opinion there.
> 
> I do not have a strong opinion, either, but it sounds like that even
> the "disable emptyAuth hack if the server is Basic only" variant
> would be much better than setting emptyAuth on by default.  At least
> the user whose issue was reported in Dscho's message would be fixed
> by such a variant, I would think (i.e. talking to a server with no
> Negotiate and emptyAuth set to true results in no attempt to give
> the user a chance to tell who s/he is --- your 2/2 will turn
> emptyAuth off in that case).

Yes, I agree that the "auto" behavior is better than defaulting to
"true". I am speaking from the perspective of git.git, which is
currently defaulting to "false". It is not clear to me if "auto" is
better than "false" because of the security implications.

For Git for Windows, it seems like the auto behavior would be a strict
improvement over the "true" default they've been shipping.

-Peff

Re: SHA1 collisions found

2017-02-23 Thread Øyvind A . Holm

On 2017-02-23 11:09:32, Linus Torvalds wrote:
> I'm aware of the fsck checks, but I have to admit I wasn't aware of 
> 'transfer.fsckobjects'. I should turn that on myself.
>
> Or maybe git should just turn it on by default?

The problem with this is that there are many repos with errors out 
there, for example coreutils.git and nasm.git, which complains about 
"missingSpaceBeforeDate: invalid author/committer line - missing space 
before date".

There are also lots of repositories bitten by the Github bug from back 
in 2011 where they zero-padded the file modes, git clone aborts with 
"zeroPaddedFilemode: contains zero-padded file modes".

Paranoid as I am, I'm using fetch.fsckObjects and receive.fsckObjects 
set to "true", but that means I'm not able to clone repositories with 
these kind of errors, have to use the alias

  fclone = clone -c "fetch.fsckObjects=false"

So enabling them by default will create problems among users. Of course, 
one solution would be to turn these kind of errors into warnings so the 
clone isn't aborted.

Reagards,
Øyvind

+-| Øyvind A. Holm  - N 60.37604° E 5.9° |-+
| OpenPGP: 0xFB0CBEE894A506E5 - http://www.sunbase.org/pubkey.asc |
| Fingerprint: A006 05D6 E676 B319 55E2  E77E FB0C BEE8 94A5 06E5 |
+| c7e47a18-fa06-11e6-ad93-db5caa6d21d3 |-+

signature.asc
Description: PGP signature

Re: [PATCH] http(s): automatically try NTLM authentication first

2017-02-23 Thread Junio C Hamano

Jeff King  writes:

> I suspect it isn't enough to help without 2/2. This will tell curl that
> the server does not do Negotiate, so it will skip the probe request. But
> Git will still feed curl the bogus empty credential.
>
> That's what 2/2 tries to fix: only kick in the emptyAuth hack when there
> is something besides Basic[1] to try. The way it is written adds an

In your [1] you wanted to mention that Digest would have the same
property as Basic, or something like that?

> extra "auto" mode to emptyAuth, as I wanted to leave "emptyauth=true" as
> a workaround in case the "auto" behavior does not work. And then I
> turned on "auto" by default, since that was what the discussion was
> shooting for.
>
> But if we are worried about turning on emptyAuth everywhere, the auto
> behavior could be tied to emptyauth=true (and have something like
> "emptyauth=always" to _really_ force it). I don't have an opinion there.

I do not have a strong opinion, either, but it sounds like that even
the "disable emptyAuth hack if the server is Basic only" variant
would be much better than setting emptyAuth on by default.  At least
the user whose issue was reported in Dscho's message would be fixed
by such a variant, I would think (i.e. talking to a server with no
Negotiate and emptyAuth set to true results in no attempt to give
the user a chance to tell who s/he is --- your 2/2 will turn
emptyAuth off in that case).

git bugs

2017-02-23 Thread Sean Hunt

There are a few bugs I git I noticed when using mingw, mingw64,
cygwin, and cygwin64. These bugs are the following:

if I do git ``rebase -i --root`` and tell it to edit every commit to
gpg sign all my commits it bugs out and merges all of the commits into
1 commit instead of only appending the ``-S`` to each and every commit
and keeping all of the commits. It is as if I told it to squash the
commits but yet I did not. There is also another bug where if I clone
a repo on Windows and not on github desktop and that I placed commits
to the repo on github web and then when I rebase to squash the commits
to 1 commit (some repos are doing it as a requirement for 1 commit
PR's) that all of my commits on the remote (fork in this case) that is
linked to an open pull request are discarded and then the pull request
is somehow and oddly closed. It is super annoying.

Re: [PATCH] upload-pack: report "not our ref" to client

2017-02-23 Thread Junio C Hamano

Thanks.

RE: [PATCH 2/2] http: add an "auto" mode for http.emptyauth

2017-02-23 Thread David Turner

> -Original Message-
> From: Jeff King [mailto:p...@peff.net]
> Sent: Thursday, February 23, 2017 2:44 PM
> To: David Turner 
> Cc: Junio C Hamano ; git@vger.kernel.org;
> sand...@crustytoothpaste.net; Johannes Schindelin
> ; Eric Sunshine 
> Subject: Re: [PATCH 2/2] http: add an "auto" mode for http.emptyauth
> 
> On Thu, Feb 23, 2017 at 04:31:13PM +, David Turner wrote:
> 
> > > As somebody who is using non-Basic auth, can you apply these patches
> > > and show us the output of:
> > >
> > >GIT_TRACE_CURL=1 \
> > >git ls-remote https://your-server 2>&1 >/dev/null |
> > >egrep '(Send|Recv) header: (GET|HTTP|Auth)'
> > >
> > > (without http.emptyauth turned on, obviously).
> >
> > The results appear to be identical with and without the patch.  With
> > http.emptyauth turned off,
> > 16:27:28.208924 http.c:524  => Send header: GET
> /info/refs?service=git-upload-pack HTTP/1.1
> > 16:27:28.212872 http.c:524  <= Recv header: HTTP/1.1 401
> Authorization Required
> > Username for 'http://git': [I just pressed enter] Password for
> > 'http://git': [ditto]
> > 16:27:29.928872 http.c:524  => Send header: GET
> /info/refs?service=git-upload-pack HTTP/1.1
> > 16:27:29.929787 http.c:524  <= Recv header: HTTP/1.1 401
> Authorization Required
> 
> Just to be sure: did you remove http.emptyauth config completely from your
> config files, or did you turn it to "false"? Because the new behavior only 
> kicks
> in when it isn't configured at all (probably we should respect "auto" as a 
> user-
> provided name).

I turned it to false. With it completely removed, I get this, both times:

20:03:49.896797 http.c:524  => Send header: GET 
/info/refs?service=git-upload-pack HTTP/1.1
20:03:49.900776 http.c:524  <= Recv header: HTTP/1.1 401 
Authorization Required
20:03:49.900929 http.c:524  => Send header: GET 
/info/refs?service=git-upload-pack HTTP/1.1
20:03:49.904754 http.c:524  <= Recv header: HTTP/1.1 401 
Authorization Required
20:03:49.906649 http.c:524  => Send header: GET 
/info/refs?service=git-upload-pack HTTP/1.1
20:03:49.906654 http.c:524  => Send header: Authorization: 
Negotiate 
20:03:49.956753 http.c:524  <= Recv header: HTTP/1.1 200 OK - 
$gitservername

> > (if someone else wants to replicate this, delete >/dev/null bit from
> > Jeff's shell snippet)
> 
> Hrm, you shouldn't need to. The stderr redirection comes first, so it should
> become the new stdout.

Weird.  It didn't appear work earlier, but I must have screwed something up.
And I learned something about shell redirection.

Re: SHA1 collisions found

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 11:09:32AM -0800, Linus Torvalds wrote:

> On Thu, Feb 23, 2017 at 10:46 AM, Jeff King  wrote:
> >>
> >> So I agree with you that we need to make git check for the opaque
> >> data. I think I was the one who brought that whole argument up.
> >
> > We do already.
> 
> I'm aware of the fsck checks, but I have to admit I wasn't aware of
> 'transfer.fsckobjects'. I should turn that on myself.
> 
> Or maybe git should just turn it on by default? At least the
> per-object fsck costs should be essentially free compared to the
> network costs when you just apply them to the incoming objects.

Yeah, they're not expensive. We've discussed enabling them by default.
The sticking point is that there is old history with minor bugs which
triggers some warnings (e.g., malformed committer names), and it would
be annoying to start rejecting that unconditionally.

So I think we would need a good review of what is a "warning" versus an
"error", and to only reject on errors (right now the NUL thing is a
warning, and it should probably upgraded).

> And in particular, while the *kernel* doesn't generally have critical
> opaque blobs, other projects do. Things like firmware images etc are
> open to attack, and crazy people put ISO images in repositories etc.
> 
> So I don't think this discussion should focus exclusively on the git metadata.
> 
> It is likely much easier to replace a binary blob than it is to
> replace a commit or tree (or a source file that has to go through a
> compiler). And for many projects, that would be a bad thing.

Yes, I'd agree we need to consider both. And no matter what Git does in
its own data formats, blobs will always be a sequence of bytes. Hiding
collision-cruft in them isn't up to us, but rather the data format.

The nice thing about a blob collision, though, is that you can only
replace the opaque files, not, say, C source code. That doesn't make it
a non-issue, but it reduces the scope of an attack.

Replacing a commit or tree wholesale means the attacker has a lot more
flexibility. So to whatever degree we can make that harder (like
complaining of commits with NULs), the better.

> > It's not an identical prefix, but I think collision attacks generally
> > are along the lines of selecting two prefixes followed by garbage, and
> > then mutating the garbage on both sides. That would "work" in this case
> > (modulo the fact that git would complain about the NUL).
> 
> I think this particular attack depended on an actual identical prefix,
> but I didn't go back to the paper and check.

The paper describes the content as:

  SHA-1(P | M1 | M2 | S)

and they replace both "M1" and "M2", with a near-collision for the
first, and then the final collision for the second. What's not clear to
me is if part of M1 can be chosen, or if it's perturbed fully into
random garbage.

> But the attacks tend to very much depend on particular input bit
> patterns that have very particular effects on the resulting
> intermediate hash, and those bit patterns are specific to the hash and
> known.
> 
> So a very powerful defense is to just look for those bit patterns in
> the objects, and just warn about them. Those patterns don't tend to
> exist in normal inputs anyway, but particularly if you just warn, it's
> a heads-ups that "ok, something iffy is going on"

Yes, that would be a wonderful hardening to put into Git if we know what
those patterns look like. That part isn't clear to me.

> The whole _point_ of an SCM is that it isn't about a one-time event,
> but about continuous history. That also fundamentally means that a
> successful attack needs to work over time, and not be detectable.

Yeah, I'd certainly agree with that. You spend loads of money to
generate a collision, there's a reasonably high chance of detection, and
then as soon as one person detects it, your investment is lost.

According to the paper, the current cost of the computation for a single
collision is ~$670K.

At least for now, an attacker is much better off using that money to
break into your house and install a keylogger.

-Peff

Re: [PATCH v5 1/1] config: add conditional include

2017-02-23 Thread Junio C Hamano

Nguyễn Thái Ngọc Duy   writes:

>> There was some discussion after v4. I think the open issues are:
>>
>>   - the commit message is rather terse (it should describe motivation,
>> and can refer to the docs for the "how")

> This allows some more flexibility in managing configuration across
> repositories. 

That is not an ideal opening to describe motivation without people
knowing what "this" is ;-) Of course, the person who wrote the
sentence know it already after writing the patch and the subject,
but others don't.

Sometimes a set of repositories want to share configuration
settings among themselves that are distinct from other such
sets of repositories.  A user may work on two projects, each
of which have multiple repositories, and use one user.email
for one project while using another for the other.  Having
the settings in ~/.gitconfig, which would work for just one
set of repositories, would not well in such a situation.

Extend the include.path mechanism that lets a config file
include another config file, so that the inclusion can be
done only when some conditions hold.  Then ~/.gitconfig can
say "include config-project-A only when working on project-A"
for each project A the user works on.

In this patch, the only supported grouping is based on
$GIT_DIR (in absolute path), so you would need to group
repositories by directory, or something like that to take
advantage of it.

We already have include.path for unconditional
includes. This patch goes with include-if.xxx.path to make
it clearer that a condition is required.

Similar to include.path, older git versions that don't
understand include-if will simply ignore them.

or something along that line?

> +Conditional includes
> +
> +
> +You can include one config file from another conditionally by setting
> +a special `include-if..path` variable to the name of the
> +file to be included. The variable is treated the same way as
> +`include.path`.

Drop "special", as all configuration variables are "special" in their
own sense, it does not add any useful information.

I think we avoid '-' in Git-native variable and section names, so
"include-if" would become an odd-man-out.

The variable is obviously not treated the same way as include.path ;-)

When includeIf..path variable is set in a
configuration file, the configuration file named by that
variable is included (in a way similar to how include.path
works) only if the  holds true.

> +The condition starts with a keyword, followed by a colon and a
> +pattern. The interpretation of the pattern depends on the keyword.

"a pattern"?  I think it is "some data whose format and meaning
depends on the keyword".

Hence...

> +Supported keywords are:
> +
> +`gitdir`::
> + The environment variable `GIT_DIR` must match the following
> + pattern for files to be included. The pattern can contain

The data that follows the keyword `gitdir:` is used as a
glob pattern.  If the location of the .git directory (which
may be auto-discovered, or come from `GIT_DIR` environment
variable) match the pattern, the `` becomes true.

> + ...
> + * If the pattern ends with `/`, `**` will be automatically added. For
> +   example, the pattern `foo/` becomes `foo/**`. In other words, it
> +   matches "foo" and everything inside, recursively.
> +
> +`gitdir/i`::
> + This is the same as `gitdir` except that matching is done
> + case-insensitively (e.g. on case-insensitive file sytems)
> +
> +A few more notes on matching:

As future  may not be about path or matching at all, this
belongs to `gitdir`:: section (and it would be obvious that that
applies to `gitdir/i`:: because we say "this is the same as...").

> + * Symlinks in `$GIT_DIR` are not resolved before matching.
> +
> + * Note that "../" is not special and will match literally, which is
> +   unlikely what you want.

> +static int prepare_include_condition_pattern(struct strbuf *pat)
> +{
> + struct strbuf path = STRBUF_INIT;
> + int prefix = 0;
> +
> + /* TODO: maybe support ~user/ too */
> + if (pat->buf[0] == '~' && is_dir_sep(pat->buf[1])) {
> + const char *home = getenv("HOME");
> +
> + if (!home)
> + return error(_("$HOME is not defined"));

Instead of half-duplicating it here yourself, can't we let
expand_user_path() do its thing?

> +static int include_condition_is_true(const char *cond, size_t cond_len)
> +{
> + /* no condition (i.e., "include.path") is always true */
> + if (!cond)
> + return 1;
> +
> + if (skip_prefix_mem(cond, cond_len, "gitdir:", , _len))
> + return include_by_gitdir(cond, cond_len, 0);
> + else if (skip_prefix_mem(cond, cond_len, "gitdir/i:", , _len))
> + return

Re: SHA1 collisions found

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 11:47:16AM -0800, Linus Torvalds wrote:

> On Thu, Feb 23, 2017 at 11:32 AM, Jeff King  wrote:
> >
> > Yeah, they're not expensive. We've discussed enabling them by default.
> > The sticking point is that there is old history with minor bugs which
> > triggers some warnings (e.g., malformed committer names), and it would
> > be annoying to start rejecting that unconditionally.
> >
> > So I think we would need a good review of what is a "warning" versus an
> > "error", and to only reject on errors (right now the NUL thing is a
> > warning, and it should probably upgraded).
> 
> I think even a warning (as opposed to failing the operation) is
> already a big deal.
> 
> If people start saying "why do I get this odd warning", and start
> looking into it, that's going to be a pretty strong defense against
> bad behavior. SCM attacks depend on flying under the radar.

Sorry, I conflated two things there. I agree a warning is better than
nothing. But right now transfer.fsck croaks even for warnings, and there
are some warnings that it is not worth croaking for. So before we turn
it on, we need to stop croaking on warnings (and possibly bump up some
warnings to errors).

I think it _is_ important to have dangerous things as errors, though.
Because it helps an unattended server (where nobody would see the
warning) avoid being a vector for spreading malicious objects to older
clients which do not do the fsck.

> There's actually already code for that, pointed to by the shattered project:
> 
>   https://github.com/cr-marcstevens/sha1collisiondetection
> 
> the "meat" of that check is in lib/ubc_check.c.

Thanks, I hadn't seen that yet. That doesn't look like it should be hard
to integrate into Git.

-Peff

Re: SHA1 collisions found

2017-02-23 Thread Linus Torvalds

On Thu, Feb 23, 2017 at 11:32 AM, Jeff King  wrote:
>
> Yeah, they're not expensive. We've discussed enabling them by default.
> The sticking point is that there is old history with minor bugs which
> triggers some warnings (e.g., malformed committer names), and it would
> be annoying to start rejecting that unconditionally.
>
> So I think we would need a good review of what is a "warning" versus an
> "error", and to only reject on errors (right now the NUL thing is a
> warning, and it should probably upgraded).

I think even a warning (as opposed to failing the operation) is
already a big deal.

If people start saying "why do I get this odd warning", and start
looking into it, that's going to be a pretty strong defense against
bad behavior. SCM attacks depend on flying under the radar.

>> So a very powerful defense is to just look for those bit patterns in
>> the objects, and just warn about them. Those patterns don't tend to
>> exist in normal inputs anyway, but particularly if you just warn, it's
>> a heads-ups that "ok, something iffy is going on"
>
> Yes, that would be a wonderful hardening to put into Git if we know what
> those patterns look like. That part isn't clear to me.

There's actually already code for that, pointed to by the shattered project:

  https://github.com/cr-marcstevens/sha1collisiondetection

the "meat" of that check is in lib/ubc_check.c.

  Linus

Re: [PATCH 2/2] http: add an "auto" mode for http.emptyauth

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 04:31:13PM +, David Turner wrote:

> > As somebody who is using non-Basic auth, can you apply these patches and
> > show us the output of:
> > 
> >GIT_TRACE_CURL=1 \
> >git ls-remote https://your-server 2>&1 >/dev/null |
> >egrep '(Send|Recv) header: (GET|HTTP|Auth)'
> > 
> > (without http.emptyauth turned on, obviously).
> 
> The results appear to be identical with and without
> the patch.  With http.emptyauth turned off,
> 16:27:28.208924 http.c:524  => Send header: GET 
> /info/refs?service=git-upload-pack HTTP/1.1
> 16:27:28.212872 http.c:524  <= Recv header: HTTP/1.1 401 
> Authorization Required
> Username for 'http://git': [I just pressed enter]
> Password for 'http://git': [ditto]
> 16:27:29.928872 http.c:524  => Send header: GET 
> /info/refs?service=git-upload-pack HTTP/1.1
> 16:27:29.929787 http.c:524  <= Recv header: HTTP/1.1 401 
> Authorization Required

Just to be sure: did you remove http.emptyauth config completely from
your config files, or did you turn it to "false"? Because the new
behavior only kicks in when it isn't configured at all (probably we
should respect "auto" as a user-provided name).

> (if someone else wants to replicate this, delete >/dev/null bit 
> from Jeff's shell snippet)

Hrm, you shouldn't need to. The stderr redirection comes first, so it
should become the new stdout.

-Peff

Re: [PATCH] http(s): automatically try NTLM authentication first

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 06:08:49PM +0100, Johannes Schindelin wrote:

> > I suspect the patch above could probably be generalized as:
> > 
> >   /* cut out methods we know the server doesn't support */
> >   http_auth_methods &= results.auth_avail;
> > 
> > and let curl figure it out from there.
> 
> Maybe this patch (or a variation thereof) would also be able to fix this
> problem with the patch:
> 
>   https://github.com/git-for-windows/git/issues/1034
> 
> Short version: for certain servers (that do *not* advertise Negotiate),
> setting emptyauth to true will result in a failed fetch, without letting
> the user type in their credentials.

I suspect it isn't enough to help without 2/2. This will tell curl that
the server does not do Negotiate, so it will skip the probe request. But
Git will still feed curl the bogus empty credential.

That's what 2/2 tries to fix: only kick in the emptyAuth hack when there
is something besides Basic[1] to try. The way it is written adds an
extra "auto" mode to emptyAuth, as I wanted to leave "emptyauth=true" as
a workaround in case the "auto" behavior does not work. And then I
turned on "auto" by default, since that was what the discussion was
shooting for.

But if we are worried about turning on emptyAuth everywhere, the auto
behavior could be tied to emptyauth=true (and have something like
"emptyauth=always" to _really_ force it). I don't have an opinion there.
It sounds like emptyauth has been enabled by default on Windows for a
while. It's not clear to me if that's a security problem or not.

-Peff

Re: [PATCH] http(s): automatically try NTLM authentication first

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 11:11:05AM -0800, Junio C Hamano wrote:

> >> As far as Kerberos, this is a desirable feature to have enabled, with
> >> little downside.  I just don't know about the security of the NTLM part,
> >> and I don't think we should take this patch unless we're sure we know
> >> the consequences of it.
> >
> > Hmm. That would be a problem with my proposed patch 2 then, too, if only
> > because it turns the feature on by default in more places.
> >
> > If it _is_ dangerous to turn on all the time, I'd think we should
> > consider warning people in the http.emptyauth documentation.
> 
> I presume that we have finished discussing the security
> ramification, and if I am not mistaken the conclusion was that it
> could leak information if we turned on emptyAuth unconditionally
> when talking to a wrong server, and that the documentation needs an
> update to recommend those who use emptyAuth because they want to
> talk to Negotiate servers to use the http..emptyAuth form,
> limited to such servers, not a more generic http.emptyAuth, to avoid
> information leakage?

I don't know enough to evaluate the claims of emptyAuth being dangerous
or not (nor do I use it myself or admin a server whose users need it).
So I will let interested parties hash out whether it is a good idea or
not, and I'm happy to drop my 2/2 for now.

If we are to make it more widely available, I would prefer something
more like my 2/2 than always turning on http.emptyAuth, if only because
it reduces the cost to people not using the feature. I'm happy to work
more on the patch if we decide to go that route.

> If that is the case, let's take your 1/2 in the near-by thread
> without 2/2 (auto-enable emptyAuth) for now, as Dscho seems to have
> a case that may be helped by it.

Yes, I think 1/2 stands on its own. Whether it helps Dscho's case or
not, it eliminates an HTTP round-trip for Basic-only servers, which I
think is worth it.

-Peff

Fwd: Re: feature request: user email config per domain

2017-02-23 Thread Igor Djordjevic

Forwarding a message that ended on my end only, probably by accident.

 Forwarded Message 
Subject: Re: feature request: user email config per domain
Date: Thu, 23 Feb 2017 13:32:56 +0530
From: Tushar Kapila 
To: Igor Djordjevic BugA 

Hello All,
> I'd much rather see something based on the working tree path, like
Duy's conditional config includes. Then you write something in your
~/.gitconfig

> This would allow you to have two root directories, one for your work
projects and one for open source projects (for example).

I guess this can be extended for any number of root directories. Like,
when a consultant has multiple employer email ids.

This sounds great and it would enforce at commit time, which as
pointed out, is the correct time to do it. If for some reason it is
not adopted I at least hope that we have a simple global config which
specifies that user must set email for each repo and to ignore any
global config.

Am sure this can be done via hooks, but I would like something that is
really simple for newbies and companies to enforce with minimal
instruction.

* Will try what Igor and Grant Humphries suggests.

* Thank you all for your replies, and a big thank you for git - its a
great tool. I used to dream of something like it when I was stuck with
svn (pre 2010 I was introduced to git late.)

[1] http://stackoverflow.com/a/42354282
[2] http://stackoverflow.com/users/2167004

usability bug: git-gui: keyboard shortcuts don't operate correctly on multi-file selections

2017-02-23 Thread peter fargas

Ctrl+T/Ctrl+U add/remove only one file, not the whole selection - used
to work. Neither are access keys for menu underlined (Ease of access
center > underline keyboard shortcuts is on), so there is no way to
effectively work with keyb only. 

git-gui verison 0.21 GITGUI 
git version 2.11.1.windows.1
Tcl/Tk 8.6.6
64-bit installer
Windows 7 Professional

Originally posted at
https://github.com/git/git-scm.com/issues/939#issuecomment-276024341


peter.fargas @
informatik-handwerk.de
0176 / 458 67 358

Re: SHA1 collisions found

2017-02-23 Thread David Lang



pointers to a little more info


https://shattered.it/static/
the two files are:

https://shattered.it/static/shattered-1.pdf
https://shattered.it/static/shattered-2.pdf

422435 shattered-2.pdf
422435 shattered-1.pdf

identical length and a lot smaller than I expected (~162K of the 413K file is 
binary junk)



$ sha1sum shattered-*pdf
38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-1.pdf
38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-2.pdf

$ sum shattered-*pdf
62721   413 shattered-1.pdf
41606   413 shattered-2.pdf

$ md5sum shattered-*pdf
ee4aa52b139d925f8d8884402b0a750c  shattered-1.pdf
5bd9d8cabc46041579a311230539b8d1  shattered-2.pdf

David Lang

Re: SHA1 collisions found

2017-02-23 Thread Morten Welinder

The attack seems to generate two 64-bytes blocks, one quarter of which
is repeated data.  (Table-1 in the paper.)

Assuming the result of that is evenly distributed and that bytes are
independent, we can estimate the chances that the result is NUL-free
as (255/256)^192 = 47% and the probability that the result is NUL and
newline free as (254/256)^192 = 22%.  Clearly one should not rely of
NULs or newlines to save the day.  On  the other hand, the chances of
an ascii result is something like (95/256)^192 = 10^-83.

The actual collision in the paper has no newline, but it does have a NUL.

M.

On Thu, Feb 23, 2017 at 1:31 PM, Joey Hess  wrote:
> Joey Hess wrote:
>> Linus Torvalds wrote:
>> > What you describe pretty much already requires a pre-image attack,
>> > which the new attack is _not_.
>> >
>> > It's not clear that the "good" object can be anything sane.
>>
>> Generate a regular commit object; use the entire commit object + NUL as the
>> chosen prefix, and use the identical-prefix collision attack to generate
>> the colliding good/bad objects.
>>
>> (The size in git's object header is a minor complication. Set the size
>> field to something sufficiently large, and then pad out the colliding
>> objects to that size once they're generated.)
>
> Sorry! While that would work, it's a useless attack because the good and bad
> commit objects still point to the same tree.
>
> It would be interesting to have such colliding objects, to see what beaks,
> but probably not worth $75k to generate them.
>
> --
> see shy jo

Re: [PATCH] http(s): automatically try NTLM authentication first

2017-02-23 Thread Junio C Hamano

Jeff King  writes:

> On Wed, Feb 22, 2017 at 11:34:19PM +, brian m. carlson wrote:
>
>> Browsers usually disable this feature by default, as it basically will
>> attempt to authenticate to any site that sends a 401.  For Kerberos
>> against a malicious site, the user will either not have a valid ticket
>> for that domain, or the user's Kerberos server will refuse to provide a
>> ticket to pass to the server, so there's no security risk involved.
>> 
>> I'm unclear how SPNEGO works with NTLM, so I can't speak for the
>> security of it.  From what I understand of NTLM and from RFC 4559, it
>> consists of a shared secret.  I'm unsure what security measures are in
>> place to not send that to an untrusted server.
>> 
>> As far as Kerberos, this is a desirable feature to have enabled, with
>> little downside.  I just don't know about the security of the NTLM part,
>> and I don't think we should take this patch unless we're sure we know
>> the consequences of it.
>
> Hmm. That would be a problem with my proposed patch 2 then, too, if only
> because it turns the feature on by default in more places.
>
> If it _is_ dangerous to turn on all the time, I'd think we should
> consider warning people in the http.emptyauth documentation.

I presume that we have finished discussing the security
ramification, and if I am not mistaken the conclusion was that it
could leak information if we turned on emptyAuth unconditionally
when talking to a wrong server, and that the documentation needs an
update to recommend those who use emptyAuth because they want to
talk to Negotiate servers to use the http..emptyAuth form,
limited to such servers, not a more generic http.emptyAuth, to avoid
information leakage?

If that is the case, let's take your 1/2 in the near-by thread
without 2/2 (auto-enable emptyAuth) for now, as Dscho seems to have
a case that may be helped by it.

Re: SHA1 collisions found

2017-02-23 Thread Linus Torvalds

On Thu, Feb 23, 2017 at 10:46 AM, Jeff King  wrote:
>>
>> So I agree with you that we need to make git check for the opaque
>> data. I think I was the one who brought that whole argument up.
>
> We do already.

I'm aware of the fsck checks, but I have to admit I wasn't aware of
'transfer.fsckobjects'. I should turn that on myself.

Or maybe git should just turn it on by default? At least the
per-object fsck costs should be essentially free compared to the
network costs when you just apply them to the incoming objects.

I also do think that it would be good to check for the disturbance
vectors at receive time (and fsck). Not necessarily interesting during
normal operations.

And in particular, while the *kernel* doesn't generally have critical
opaque blobs, other projects do. Things like firmware images etc are
open to attack, and crazy people put ISO images in repositories etc.

So I don't think this discussion should focus exclusively on the git metadata.

It is likely much easier to replace a binary blob than it is to
replace a commit or tree (or a source file that has to go through a
compiler). And for many projects, that would be a bad thing.

> It's not an identical prefix, but I think collision attacks generally
> are along the lines of selecting two prefixes followed by garbage, and
> then mutating the garbage on both sides. That would "work" in this case
> (modulo the fact that git would complain about the NUL).

I think this particular attack depended on an actual identical prefix,
but I didn't go back to the paper and check.

But the attacks tend to very much depend on particular input bit
patterns that have very particular effects on the resulting
intermediate hash, and those bit patterns are specific to the hash and
known.

So a very powerful defense is to just look for those bit patterns in
the objects, and just warn about them. Those patterns don't tend to
exist in normal inputs anyway, but particularly if you just warn, it's
a heads-ups that "ok, something iffy is going on"

And as mentioned, a cheap "something iffy is going on" thing is
basically a death sentence to SCM attacks.

The whole _point_ of an SCM is that it isn't about a one-time event,
but about continuous history. That also fundamentally means that a
successful attack needs to work over time, and not be detectable.

In contrast, many other uses of hashes are "one-time" events.  If you
use a hash to validate a single piece of data from a source that you
wouldn't otherwise trust, it's a one-time "all or nothing" trust
situation.

And the attack surface is very different for those "one-time" vs
"trust over time" cases. If you can get a bank to trust a session one
time, you can empty a bank account and live on a paradise island for
the rest of your life. It doesn't matter if it gets detected or not
after-the-fact.

But if you can fool a SCM one time, insert your code, and it gets
detected next week, you didn't actually do anything useful. You only
burned yourself.

See the difference? One-time vs having a continual interaction makes a
*fundamntal* difference in game theory.

Linus

Re: SHA1 collisions found

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 02:21:47PM -0400, Joey Hess wrote:

> Linus Torvalds wrote:
> > What you describe pretty much already requires a pre-image attack,
> > which the new attack is _not_.
> > 
> > It's not clear that the "good" object can be anything sane.
> 
> Generate a regular commit object; use the entire commit object + NUL as the
> chosen prefix, and use the identical-prefix collision attack to generate
> the colliding good/bad objects.

FWIW, git-fsck complains about those (and transfer.fsck rejects them):

  $ (git cat-file commit HEAD; printf '\0more stuff') |
git hash-object -w --stdin -t commit
  ecb2e5165c184f9025cb4c49d8f75901f4830354

  $ git fsck
  warning in commit ecb2e5165c184f9025cb4c49d8f75901f4830354: nulInCommit: NUL 
byte in the commit object body

So as long as either your "good" or "evil" commit has binary junk in it,
you are likely to be noticed (not everybody turns on transfer.fsck, but
GitHub does).

-Peff

Re: [PATCH] http(s): automatically try NTLM authentication first

2017-02-23 Thread Junio C Hamano

Johannes Schindelin  writes:

> On Wed, 22 Feb 2017, Jeff King wrote:
>> This patch drops the useless probe request:
> ...
>> but setting http.emptyauth adds back in the useless request. I think
>> that could be fixed by skipping the empty-auth thing when
>> http_auth_methods does not have CURLAUTH_NEGOTIATE in it (or perhaps
>> other methods need it to, so maybe skip it if _just_ BASIC is set).
>> 
>> I suspect the patch above could probably be generalized as:
>> 
>>   /* cut out methods we know the server doesn't support */
>>   http_auth_methods &= results.auth_avail;
>> 
>> and let curl figure it out from there.
>
> Maybe this patch (or a variation thereof) would also be able to fix this
> problem with the patch:
>
>   https://github.com/git-for-windows/git/issues/1034
>
> Short version: for certain servers (that do *not* advertise Negotiate),
> setting emptyauth to true will result in a failed fetch, without letting
> the user type in their credentials.

The issue described in that page looks rather serious.

I believe that a "variation" has become the first part of a
two-patch series that appear in the downthread from here.  Perhaps
you can ask them to test it out (or even better if you have a setup
you can easily test against yourself)?

Re: SHA1 collisions found

2017-02-23 Thread Joey Hess

Joey Hess wrote:
> Linus Torvalds wrote:
> > What you describe pretty much already requires a pre-image attack,
> > which the new attack is _not_.
> > 
> > It's not clear that the "good" object can be anything sane.
> 
> Generate a regular commit object; use the entire commit object + NUL as the
> chosen prefix, and use the identical-prefix collision attack to generate
> the colliding good/bad objects.
> 
> (The size in git's object header is a minor complication. Set the size
> field to something sufficiently large, and then pad out the colliding
> objects to that size once they're generated.)

Sorry! While that would work, it's a useless attack because the good and bad
commit objects still point to the same tree.

It would be interesting to have such colliding objects, to see what beaks,
but probably not worth $75k to generate them.

-- 
see shy jo


signature.asc
Description: PGP signature

Re: [PATCH v2] send-email: only allow one address per body tag

2017-02-23 Thread Junio C Hamano

Matthieu Moy  writes:

> Johan Hovold  writes:
>
>> --- a/git-send-email.perl
>> +++ b/git-send-email.perl
>> @@ -1563,7 +1563,7 @@ foreach my $t (@files) {
>>  # Now parse the message body
>>  while(<$fh>) {
>>  $message .=  $_;
>> -if (/^(Signed-off-by|Cc): (.*)$/i) {
>> +if (/^(Signed-off-by|Cc): ([^>]*>?)/i) {
>
> I think this is acceptable, but this doesn't work with trailers like
>
> Cc: "Some > Body" 
>
> A proper management of this kind of weird address should be doable by
> reusing the regexp parsing "..." in parse_mailbox:
>
>   my $re_quote = qr/"(?:[^\"\\]|\\.)*"/;
>
> So the final regex would look like
>
> if (/^(Signed-off-by|Cc): (([^>]*|"(?:[^\"\\]|\\.)*")>?)/i) {
>
> I don't think that should block the patch inclusion, but it may be worth
> considering.
>
> Anyway, thanks for the patch!

Somehow this fell off the radar.  So your reviewed-by: and then
we'll cook this in 'next' for a while?

Thanks.

[PATCH] upload-pack: report "not our ref" to client

2017-02-23 Thread Jonathan Tan

Make upload-pack report "not our ref" errors to the client as an "ERR" line.
(If not, the client would be left waiting for a response when the server is
already dead.)

Signed-off-by: Jonathan Tan 
---

Thanks, here is the independent patch.

 upload-pack.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/upload-pack.c b/upload-pack.c
index 7597ba340..ffb028d62 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -822,9 +822,13 @@ static void receive_needs(void)
use_include_tag = 1;
 
o = parse_object(sha1_buf);
-   if (!o)
+   if (!o) {
+   packet_write_fmt(1,
+"ERR upload-pack: not our ref %s",
+sha1_to_hex(sha1_buf));
die("git upload-pack: not our ref %s",
sha1_to_hex(sha1_buf));
+   }
if (!(o->flags & WANTED)) {
o->flags |= WANTED;
if (!((allow_unadvertised_object_request & 
ALLOW_ANY_SHA1) == ALLOW_ANY_SHA1
-- 
2.11.0.483.g087da7b7c-goog

Re: SHA1 collisions found

2017-02-23 Thread Jeff King

On Thu, Feb 23, 2017 at 10:40:48AM -0800, Linus Torvalds wrote:

> > Generate a regular commit object; use the entire commit object + NUL as the
> > chosen prefix, and use the identical-prefix collision attack to generate
> > the colliding good/bad objects.
> 
> So I agree with you that we need to make git check for the opaque
> data. I think I was the one who brought that whole argument up.

We do already.

> But even then, what you describe doesn't work. What you describe just
> replaces the opaque data - that git doesn't actually *use*, and that
> nobody sees - with another piece of opaque data.
> 
> You also need to make the non-opaque data of the bad object be
> something that actually encodes valid git data with interesting hashes
> in it (for the parent/tree/whatever pointers).
> 
> So you don't have just that "chosen prefix". You actually need to also
> fill in some very specific piece of data *in* the attack parts itself.
> And you need to do this in the exact same size (because that's part of
> the prefix), etc etc.

It's not an identical prefix, but I think collision attacks generally
are along the lines of selecting two prefixes followed by garbage, and
then mutating the garbage on both sides. That would "work" in this case
(modulo the fact that git would complain about the NUL).

I haven't read the paper yet to see if that is the case here, though.

A related case is if you could stick a "cruft " header at the end of
the commit headers, and mutate its value (avoiding newlines). fsck
doesn't complain about that.

-Peff

Re: SHA1 collisions found

2017-02-23 Thread Linus Torvalds

On Thu, Feb 23, 2017 at 10:21 AM, Joey Hess  wrote:
> Linus Torvalds wrote:
>> What you describe pretty much already requires a pre-image attack,
>> which the new attack is _not_.
>>
>> It's not clear that the "good" object can be anything sane.
>
> Generate a regular commit object; use the entire commit object + NUL as the
> chosen prefix, and use the identical-prefix collision attack to generate
> the colliding good/bad objects.

So I agree with you that we need to make git check for the opaque
data. I think I was the one who brought that whole argument up.

But even then, what you describe doesn't work. What you describe just
replaces the opaque data - that git doesn't actually *use*, and that
nobody sees - with another piece of opaque data.

You also need to make the non-opaque data of the bad object be
something that actually encodes valid git data with interesting hashes
in it (for the parent/tree/whatever pointers).

So you don't have just that "chosen prefix". You actually need to also
fill in some very specific piece of data *in* the attack parts itself.
And you need to do this in the exact same size (because that's part of
the prefix), etc etc.

So I think it's challenging.

... and then we can discover it trivially.

Ok, so "git fsck" right now takes a couple of minutes for me and I
don't actually run it very often (I used to run it religiously back in
the days), but afaik kernel.org actually runs it nightly. So it's
pretty much "trivially discoverable" - imagine spending thousands of
CPU-hours and lots of social capital to get an attack in, and then the
next night the kernel.org fsck complains about the strange commit you
added?

  Linus

Re: SHA1 collisions found

2017-02-23 Thread Junio C Hamano

Joey Hess  writes:

> For example, git fsck does warn about a commit message with opaque
> data hidden after a NUL. But, git show/merge/pull give no indication
> that something funky is going on when working with such commits.

Would

$ git config transfer.fsckobjects true

help?

Re: SHA1 collisions found

2017-02-23 Thread Linus Torvalds

On Thu, Feb 23, 2017 at 10:10 AM, Joey Hess  wrote:
>
> It would cost 6500 CPU years + 100 GPU years to generate valid colliding
> git objects using the methods of the paper's authors. That might be cost
> effective if it helped get a backdoor into eg, the kernel.

I still think it also needs to be interesting enough data, not just
random noise that is then trivial to find with automated tools.

Because for the kernel, it's not just that an attacker needs to do the
CPU time. Yes, first he needs the technical resources to just do just
the attack and create the situation you described.

But then he *also* needs to build up the social capital to get the end
result pulled into the tree (ie if he depends on the hidden spaces, he
needs somebody to actually do a git pull, not just apply a patch).

.. and if we then have a tool that then finds the problem trivially
(ie "git fsck"), he's not only wasted all those technical resources,
he's also burned his identity.

>>  (b) we can probably easily add some extra sanity checks to the opaque
>> data we do have, to make it much harder to do the hiding of random
>> data that these attacks pretty much always depend on.
>
> For example, git fsck does warn about a commit message with opaque
> data hidden after a NUL. But, git show/merge/pull give no indication
> that something funky is going on when working with such commits.

I do agree that we might want to do some of the fsck checks
particularly at fetch time. That's when doing checks is both relevant
and cheap.

So we could do the opaque data checks, but we could/should probably
also add the attack pattern ("disturbance vectors") checks.

And the thing is, adding those checks is really cheap, and basically
makes the whole attack vector pointless against git.

Because unlike some "signing a pdf" attack, git doesn't fundamentally
depend on the SHA1 as some kind of absolute security.  If we have the
minimal machinery in git to just notice the attack, the attack
essentially goes away. Attackers can waste infinite amounts of CPU
time, and if it's cheap for us to notice, it completely disarms all
that attack work.

Again, I'm not arguing that people shouldn't work on extending git to
a new (and bigger) hash. I think that's a no-brainer, and we do want
to have a path to eventually move towards SHA3-256 or whatever.

But I'm very definitely arguing that the current attack doesn't
actually sound like it really even _matters_, because it should be so
easy to mitigate against.

   Linus

Re: SHA1 collisions found

2017-02-23 Thread Joey Hess

Linus Torvalds wrote:
> What you describe pretty much already requires a pre-image attack,
> which the new attack is _not_.
> 
> It's not clear that the "good" object can be anything sane.

Generate a regular commit object; use the entire commit object + NUL as the
chosen prefix, and use the identical-prefix collision attack to generate
the colliding good/bad objects.

(The size in git's object header is a minor complication. Set the size
field to something sufficiently large, and then pad out the colliding
objects to that size once they're generated.)

-- 
see shy jo

signature.asc
Description: PGP signature

Re: SHA1 collisions found

2017-02-23 Thread David Lang


On Thu, 23 Feb 2017, Joey Hess wrote:


Junio C Hamano wrote:

On Thu, Feb 23, 2017 at 8:43 AM, Joey Hess  wrote:


Since we now have collisions in valid PDF files, collisions in valid git
commit and tree objects are probably able to be constructed.


That may be true, but
https://public-inbox.org/git/pine.lnx.4.58.0504291221250.18...@ppc970.osdl.org/


That's about someone replacing an valid object in Linus's repository
with an invalid random blob they found that collides. This SHA1
break doesn't allow generating such a blob anyway. Linus is right,
that's an impractical attack.

Attacks using this SHA1 break will look something more like:

* I push a "bad" object to a repo on github I set up under a
 pseudonym.
* I publish a "good" object in a commit and convince the maintainer to
 merge it.
* I wait for the maintainer to push to github.
* I wait for github to deduplicate and hope they'll replace the good
 object with the bad one I pre-uploaded, thus silently changing the
 content of the good commit the maintainer reviewed and pushed.
* The bad object is pulled from github and deployed.
* The maintainer still has the good object. They may not notice the bad
 object is out there for a long time.

Of course, it doesn't need to involve Github, and doesn't need to
rely on internal details of their deduplication[1];
that only let me publish the bad object under a psydonym.


read that e-mail again, it covers the case where a central server gets a blob 
replaced in it.


tricking a maintainerinto accepting a file that contains huge amounts of binary 
data in it is going to be a non-trivial task, and even after you trick them into 
accepting one bad file, you then need to replace the file they accepted with a 
new one (breaking into github or assuming that github is putting both files into 
the same repo, both of which are fairly unlikely)


David Lang

Re: SHA1 collisions found

2017-02-23 Thread Joey Hess

Linus Torvalds wrote:
> I haven't seen the attack yet, but git doesn't actually just hash the
> data, it does prepend a type/length field to it. That usually tends to
> make collision attacks much harder, because you either have to make
> the resulting size the same too, or you have to be able to also edit
> the size field in the header.

I have some sha1 collisions (and other fun along these lines) in 
https://github.com/joeyh/supercollider

That includes two files with the same SHA and size, which do get
different blobs thanks to the way git prepends the header to the
content.

joey@darkstar:~/tmp/supercollider>sha1sum  bad.pdf good.pdf 
d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a  bad.pdf
d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a  good.pdf
joey@darkstar:~/tmp/supercollider>git ls-tree HEAD
100644 blob ca44e9913faf08d625346205e228e2265dd12b65bad.pdf
100644 blob 5f90b67523865ad5b1391cb4a1c010d541c816c1good.pdf

While appending identical data to these colliding files does generate
other collisions, prepending data does not.

It would cost 6500 CPU years + 100 GPU years to generate valid colliding
git objects using the methods of the paper's authors. That might be cost
effective if it helped get a backdoor into eg, the kernel.

>  (b) we can probably easily add some extra sanity checks to the opaque
> data we do have, to make it much harder to do the hiding of random
> data that these attacks pretty much always depend on.

For example, git fsck does warn about a commit message with opaque
data hidden after a NUL. But, git show/merge/pull give no indication
that something funky is going on when working with such commits.

-- 
see shy jo

signature.asc
Description: PGP signature

Re: SHA1 collisions found

2017-02-23 Thread Linus Torvalds

On Thu, Feb 23, 2017 at 9:35 AM, Joey Hess  wrote:
>
> Attacks using this SHA1 break will look something more like:

We don't actually know what the break is, but it's likely that you
can't actually do what you think you can do:

> * I push a "bad" object to a repo on github I set up under a
>   pseudonym.
> * I publish a "good" object in a commit and convince the maintainer to
>   merge it.

It's not clear that the "good" object can be anything sane.

What you describe pretty much already requires a pre-image attack,
which the new attack is _not_.

The new attack doesn't have a controlled "good" case, you need two
different objects that both have "near-collision" blocks in the
middle. I don't know what the format of those near-collision blocks
are, but it's a big problem.

You blithely just say "I create a good object". It's not that simple.
If it was, this would be a pre-image attack.

So basically, the attack needs some kind of random binary garbage in
*both* objects in the middle.

That's why pdf's are the classic model for showing these attacks: it's
easy to insert garbage in the middle of a pdf that is invisible.

In a psf, you can just define a bitmap that you don't use for printing
- but you can use them to then make a decision about what to print -
making the printed version of the pdf look radically different in ways
that are not so much _directly_ about the invisible block itself.

  Linus

1 2 >

1 - 100 of 129 matches

Mail list logo