date:20170725

[PATCH] fsck: cleanup unused variable

2017-07-25 Thread Jonathan Tan

Remove the unused variable "heads" from cmd_fsck().

This variable was made unused in commit c3271a0 ("fsck: do not fallback
"git fsck " to "git fsck"", 2017-01-17).

Signed-off-by: Jonathan Tan 
---
While working on fsck, I noticed one more possible cleanup.
---
 builtin/fsck.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 99dea7adf..cc5a5cfab 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -667,7 +667,7 @@ static struct option fsck_opts[] = {
 
 int cmd_fsck(int argc, const char **argv, const char *prefix)
 {
-   int i, heads;
+   int i;
struct alternate_object_database *alt;
 
errors_found = 0;
@@ -735,7 +735,6 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
}
}
 
-   heads = 0;
for (i = 0; i < argc; i++) {
const char *arg = argv[i];
unsigned char sha1[20];
@@ -753,7 +752,6 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
add_decoration(fsck_walk_options.object_names,
obj, xstrdup(arg));
mark_object_reachable(obj);
-   heads++;
continue;
}
error("invalid parameter: expected sha1, got '%s'", arg);
-- 
2.14.0.rc0.400.g1c36432dff-goog

Re: git gc seems to break --symbolic-full-name

2017-07-25 Thread Jacob Keller

On Sun, Jul 23, 2017 at 12:23 PM, Stas Sergeev  wrote:
> 23.07.2017 11:40, Jacob Keller пишет:
>>
>> On Fri, Jul 21, 2017 at 12:03 PM, Stas Sergeev  wrote:
>>>
>>> I wanted some kind of file to use it as a
>>> build dependency for the files that needs
>>> to be re-built when the head changes.
>>> This works very well besides git gc.
>>> What other method can be used as simply
>>> as that? git show-ref does not seem to be
>>> giving this.
>>
>> There's no real way to do this, and even prior to 2007 when the file
>> always existed, there's no guarantee it's modification time is valid.
>>
>> I'd suggest you have a phony rule which you always run, that checks
>> the ref, and sees if it's different from "last time" and then updates
>> a different file if that's the case. Then the build can depend on the
>> generated file, and you'd be able to figure it out.
>
> OK, thanks, that looks quite simple too.
> I will have to create the file by hands that
> I expected git to already have, but it appears
> not.
>
>> What's the real goal for depending on when the ref changes?
>
> So that when users fill in the bug report, I can
> see at what revision have the bug happened. :)
> While seemingly "just a debugging sugar", the
> hard experience shows this to be exceptionally
> useful.
> I think even linux kernel does something like
> this, and solves that task the hard way. For
> example I can see a script at scripts/setlocalversion
> whose output seems to go to
> include/config/kernel.release and a lot of
> logic in the toplevel makefile about this.
> So not liking the fact that every project solves
> this differently, I was trying to get the solution
> directly from git. But I'll try otherwise.

If your goal is to make it so users filling out bug reports have a
version, then using git descrsibe and making that be part of your
version (based off your tags, and commits) is how pretty much every
other project I've worked on does this.

I really don't think that's your goal here, given you're doing things
in make with timestamps and builds, so I guess I misunderstood your
answer?

Thanks,
Jake

Re: git gc seems to break --symbolic-full-name

2017-07-25 Thread Stas Sergeev


24.07.2017 07:02, Jacob Keller пишет:

generally, I'd suggest using "git describe" to output a version based
on tag, and as part of your build system set that in some sort of
--version output of some kind.

I came to the following solution which
looks quite simple (avoids comparing the
output of git describe):
git log -1 --format=%cd --date=rfc | xargs -I {} touch --date={} $TSTAMP

The care must be taken to put the timestamp
file as a pre-requisite of the .LOW_RESOLUTION_TIME
target, and the build seems to work properly.
This still smells hackish, but this time I blame
make for an inability to specify the timestamps
explicitly. :)

Re: [PATCH 06/15] fetch: don't overlay config with submodule-config

2017-07-25 Thread Brandon Williams

On 07/25, Stefan Beller wrote:
> On Tue, Jul 25, 2017 at 2:39 PM, Brandon Williams  wrote:
> > Don't rely on overlaying the repository's config on top of the
> > submodule-config, instead query the repository's config directly for the
> > fetch_recurse field.
> >
> > Signed-off-by: Brandon Williams 
> 
> Reviewed-by: Stefan Beller 
> 
> > ---
> >  builtin/fetch.c |  1 -
> >  submodule.c | 24 +---
> >  2 files changed, 17 insertions(+), 8 deletions(-)
> >
> > diff --git a/builtin/fetch.c b/builtin/fetch.c
> > index d84c26391..3fe99073d 100644
> > --- a/builtin/fetch.c
> > +++ b/builtin/fetch.c
> > @@ -1362,7 +1362,6 @@ int cmd_fetch(int argc, const char **argv, const char 
> > *prefix)
> >
> > if (recurse_submodules != RECURSE_SUBMODULES_OFF) {
> > gitmodules_config();
> > -   git_config(submodule_config, NULL);
> > }
> >
> > if (all) {
> > diff --git a/submodule.c b/submodule.c
> > index 8b9e48a61..c5058a4b8 100644
> > --- a/submodule.c
> > +++ b/submodule.c
> > @@ -1210,14 +1210,24 @@ static int get_next_submodule(struct child_process 
> > *cp,
> >
> > default_argv = "yes";
> > if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) 
> > {
> > -   if (submodule &&
> > -   submodule->fetch_recurse !=
> > -   RECURSE_SUBMODULES_NONE) {
> > -   if (submodule->fetch_recurse ==
> > -   RECURSE_SUBMODULES_OFF)
> > +   int fetch_recurse = RECURSE_SUBMODULES_NONE;
> > +
> > +   if (submodule) {
> > +   char *key;
> > +   const char *value;
> > +
> > +   fetch_recurse = submodule->fetch_recurse;
> > +   key = 
> > xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
> > +   if 
> > (!repo_config_get_string_const(the_repository, key, )) {
> > +   fetch_recurse = 
> > parse_fetch_recurse_submodules_arg(key, value);
> > +   }
> > +   free(key);
> > +   }
> 
> I wonder if it would be better to parse this in 
> builtin/fetch.c#git_fetch_config
> and then pass it in here as a parameter, instead of looking it up directly 
> here?
> That way it is easier to keep track of what a builtin pays attention to.

Really the fact that you can configure individual submodules in
.gitmodules to be fetched recursively or not is a terrible design IMO.
Also this is a per-submodule configuration so having it in
builtin/fetch.c would be incredibly annoying to handle.

> 
> 
> > +
> > +   if (fetch_recurse != RECURSE_SUBMODULES_NONE) {
> > +   if (fetch_recurse == RECURSE_SUBMODULES_OFF)
> > continue;
> > -   if (submodule->fetch_recurse ==
> > -   
> > RECURSE_SUBMODULES_ON_DEMAND) {
> > +   if (fetch_recurse == 
> > RECURSE_SUBMODULES_ON_DEMAND) {
> > if 
> > (!unsorted_string_list_lookup(_submodule_paths, ce->name))
> > continue;
> > default_argv = "on-demand";
> > --
> > 2.14.0.rc0.400.g1c36432dff-goog
> >

-- 
Brandon Williams

Re: [PATCH 07/15] submodule: don't rely on overlayed config when setting diffopts

2017-07-25 Thread Stefan Beller

On Tue, Jul 25, 2017 at 2:39 PM, Brandon Williams  wrote:
> Don't rely on overlaying the repository's config on top of the
> submodule-config, instead query the repository's config directory for
> the ignore field.
>
> Signed-off-by: Brandon Williams 
> ---
>  submodule.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index c5058a4b8..f86b82fbb 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -165,8 +165,16 @@ void set_diffopt_flags_from_submodule_config(struct 
> diff_options *diffopt,
>  {
> const struct submodule *submodule = submodule_from_path(_oid, 
> path);
> if (submodule) {
> -   if (submodule->ignore)
> -   handle_ignore_submodules_arg(diffopt, 
> submodule->ignore);
> +   const char *ignore;
> +   char *key;
> +
> +   key = xstrfmt("submodule.%s.ignore", submodule->name);
> +   if (repo_config_get_string_const(the_repository, key, 
> ))
> +   ignore = submodule->ignore;

Unlike the last patch, we have to use a direct lookup here as
the alternative is hugely painful.


> +   free(key);
> +
> +   if (ignore)
> +   handle_ignore_submodules_arg(diffopt, ignore);
> else if (is_gitmodules_unmerged(_index))
> DIFF_OPT_SET(diffopt, IGNORE_SUBMODULES);
> }
> --
> 2.14.0.rc0.400.g1c36432dff-goog
>

Re: [PATCH 06/15] fetch: don't overlay config with submodule-config

2017-07-25 Thread Stefan Beller

On Tue, Jul 25, 2017 at 2:39 PM, Brandon Williams  wrote:
> Don't rely on overlaying the repository's config on top of the
> submodule-config, instead query the repository's config directly for the
> fetch_recurse field.
>
> Signed-off-by: Brandon Williams 

Reviewed-by: Stefan Beller 

> ---
>  builtin/fetch.c |  1 -
>  submodule.c | 24 +---
>  2 files changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index d84c26391..3fe99073d 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -1362,7 +1362,6 @@ int cmd_fetch(int argc, const char **argv, const char 
> *prefix)
>
> if (recurse_submodules != RECURSE_SUBMODULES_OFF) {
> gitmodules_config();
> -   git_config(submodule_config, NULL);
> }
>
> if (all) {
> diff --git a/submodule.c b/submodule.c
> index 8b9e48a61..c5058a4b8 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1210,14 +1210,24 @@ static int get_next_submodule(struct child_process 
> *cp,
>
> default_argv = "yes";
> if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
> -   if (submodule &&
> -   submodule->fetch_recurse !=
> -   RECURSE_SUBMODULES_NONE) {
> -   if (submodule->fetch_recurse ==
> -   RECURSE_SUBMODULES_OFF)
> +   int fetch_recurse = RECURSE_SUBMODULES_NONE;
> +
> +   if (submodule) {
> +   char *key;
> +   const char *value;
> +
> +   fetch_recurse = submodule->fetch_recurse;
> +   key = 
> xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
> +   if 
> (!repo_config_get_string_const(the_repository, key, )) {
> +   fetch_recurse = 
> parse_fetch_recurse_submodules_arg(key, value);
> +   }
> +   free(key);
> +   }

I wonder if it would be better to parse this in builtin/fetch.c#git_fetch_config
and then pass it in here as a parameter, instead of looking it up directly here?
That way it is easier to keep track of what a builtin pays attention to.


> +
> +   if (fetch_recurse != RECURSE_SUBMODULES_NONE) {
> +   if (fetch_recurse == RECURSE_SUBMODULES_OFF)
> continue;
> -   if (submodule->fetch_recurse ==
> -   RECURSE_SUBMODULES_ON_DEMAND) 
> {
> +   if (fetch_recurse == 
> RECURSE_SUBMODULES_ON_DEMAND) {
> if 
> (!unsorted_string_list_lookup(_submodule_paths, ce->name))
> continue;
> default_argv = "on-demand";
> --
> 2.14.0.rc0.400.g1c36432dff-goog
>

Re: [PATCH 05/15] submodule--helper: don't overlay config in update-clone

2017-07-25 Thread Brandon Williams

On 07/25, Stefan Beller wrote:
> On Tue, Jul 25, 2017 at 2:39 PM, Brandon Williams  wrote:
> > Don't rely on overlaying the repository's config on top of the
> > submodule-config, instead query the repository's config directly for the
> > url and the update strategy configuration.
> >
> > Signed-off-by: Brandon Williams 
> > ---
> ...
> 
> > +struct submodule_update_strategy 
> > submodule_strategy_with_config_overlayed(struct repository *repo,
> > + 
> > const struct submodule *sub)
> > +{
> > +   struct submodule_update_strategy strat = sub->update_strategy;
> > +   const char *update;
> > +   char *key;
> > +
> > +   key = xstrfmt("submodule.%s.update", sub->name);
> > +   if (!repo_config_get_string_const(repo, key, )) {
> > +   strat.command = NULL;
> > +   if (!strcmp(update, "none")) {
> > +   strat.type = SM_UPDATE_NONE;
> > +   } else if (!strcmp(update, "checkout")) {
> > +   strat.type = SM_UPDATE_CHECKOUT;
> > +   } else if (!strcmp(update, "rebase")) {
> > +   strat.type = SM_UPDATE_REBASE;
> > +   } else if (!strcmp(update, "merge")) {
> > +   strat.type = SM_UPDATE_MERGE;
> > +   } else if (skip_prefix(update, "!", )) {
> > +   strat.type = SM_UPDATE_COMMAND;
> > +   strat.command = update;
> > +   } else {
> > +   die("invalid submodule update strategy '%s'", 
> > update);
> > +   }
> > +   }
> 
> Can this be simplified by reusing
> parse_submodule_update_strategy(value, dest)
> ?

It would result in a memory leak if we did.  Really I'd like to just
remove this entirely. The only reason this needs to be done is for
checkout, which if we don't have respect the update config it can be
removed.

-- 
Brandon Williams

Re: [PATCH 03/15] add, reset: ensure submodules can be added or reset

2017-07-25 Thread Brandon Williams

On 07/25, Stefan Beller wrote:
> On Tue, Jul 25, 2017 at 2:39 PM, Brandon Williams  wrote:
> > Commit aee9c7d65 (Submodules: Add the new "ignore" config option for
> > diff and status) ...
> 
> introduced in 2010, so quite widely spread.
> 
> > ...  introduced the ignore configuration option for
> > submodules so that configured submodules could be omitted from the
> > status and diff commands.  Because this flag is respected in the diff
> > machinery it has the unintended consequence of potentially prohibiting
> > users from adding or resetting a submodule, even when a path to the
> > submodule is explicitly given.
> >
> > Ensure that submodules can be added or set, even if they are configured
> > to be ignored, by setting the `DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG` diff
> > flag.
> >
> > Signed-off-by: Brandon Williams 
> > ---
> >  builtin/add.c   | 1 +
> >  builtin/reset.c | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/builtin/add.c b/builtin/add.c
> > index e888fb8c5..6f271512f 100644
> > --- a/builtin/add.c
> > +++ b/builtin/add.c
> > @@ -116,6 +116,7 @@ int add_files_to_cache(const char *prefix,
> > rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> > rev.diffopt.format_callback = update_callback;
> > rev.diffopt.format_callback_data = 
> > +   rev.diffopt.flags |= DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG;
> 
> 
> This flag occurs once in the code base, with the comment:
> /*
>  * Unless the user did explicitly request a submodule
>  * ignore mode by passing a command line option we do
>  * not ignore any changed submodule SHA-1s when
>  * comparing index and parent, no matter what is
>  * configured. Otherwise we won't commit any
>  * submodules which were manually staged, which would
>  * be really confusing.
>  */
> int diff_flags = DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG;
> 
> in prepare_commit, so commit ignores the .gitmodules file.
> 
> This allows git-add to add ignored submodules, currently ignored submodules
> would have to be added using the plumbing
> git update-index --add --cacheinfo 16,$SHA1,
> 
> This makes sense, though a test demonstrating the change in behavior
> would be nice, but git-add doesn't seem to change as it doesn't even load
> the git modules config?

I can add a comment to the code but its already being tested in the
submodule test suite.  The only reason this doesn't cause any changes
now is that the gitmodules config is never loaded, but that may
change if we decide to allow lazy-loading of the gitmodules file (like
the last couple patches in this series do).

-- 
Brandon Williams

Re: [PATCH 05/15] submodule--helper: don't overlay config in update-clone

2017-07-25 Thread Stefan Beller

On Tue, Jul 25, 2017 at 2:39 PM, Brandon Williams  wrote:
> Don't rely on overlaying the repository's config on top of the
> submodule-config, instead query the repository's config directly for the
> url and the update strategy configuration.
>
> Signed-off-by: Brandon Williams 
> ---
...

> +struct submodule_update_strategy 
> submodule_strategy_with_config_overlayed(struct repository *repo,
> + 
> const struct submodule *sub)
> +{
> +   struct submodule_update_strategy strat = sub->update_strategy;
> +   const char *update;
> +   char *key;
> +
> +   key = xstrfmt("submodule.%s.update", sub->name);
> +   if (!repo_config_get_string_const(repo, key, )) {
> +   strat.command = NULL;
> +   if (!strcmp(update, "none")) {
> +   strat.type = SM_UPDATE_NONE;
> +   } else if (!strcmp(update, "checkout")) {
> +   strat.type = SM_UPDATE_CHECKOUT;
> +   } else if (!strcmp(update, "rebase")) {
> +   strat.type = SM_UPDATE_REBASE;
> +   } else if (!strcmp(update, "merge")) {
> +   strat.type = SM_UPDATE_MERGE;
> +   } else if (skip_prefix(update, "!", )) {
> +   strat.type = SM_UPDATE_COMMAND;
> +   strat.command = update;
> +   } else {
> +   die("invalid submodule update strategy '%s'", update);
> +   }
> +   }

Can this be simplified by reusing
parse_submodule_update_strategy(value, dest)
?

Re: [PATCH 04/15] submodule--helper: don't overlay config in remote_submodule_branch

2017-07-25 Thread Stefan Beller

On Tue, Jul 25, 2017 at 2:39 PM, Brandon Williams  wrote:
> Don't rely on overlaying the repository's config on top of the
> submodule-config, instead query the repository's config directly for the
> branch field.
>
> Signed-off-by: Brandon Williams 

Reviewed-by: Stefan Beller

Re: [PATCH 03/15] add, reset: ensure submodules can be added or reset

2017-07-25 Thread Stefan Beller

On Tue, Jul 25, 2017 at 2:39 PM, Brandon Williams  wrote:
> Commit aee9c7d65 (Submodules: Add the new "ignore" config option for
> diff and status) ...

introduced in 2010, so quite widely spread.

> ...  introduced the ignore configuration option for
> submodules so that configured submodules could be omitted from the
> status and diff commands.  Because this flag is respected in the diff
> machinery it has the unintended consequence of potentially prohibiting
> users from adding or resetting a submodule, even when a path to the
> submodule is explicitly given.
>
> Ensure that submodules can be added or set, even if they are configured
> to be ignored, by setting the `DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG` diff
> flag.
>
> Signed-off-by: Brandon Williams 
> ---
>  builtin/add.c   | 1 +
>  builtin/reset.c | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index e888fb8c5..6f271512f 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -116,6 +116,7 @@ int add_files_to_cache(const char *prefix,
> rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
> rev.diffopt.format_callback = update_callback;
> rev.diffopt.format_callback_data = 
> +   rev.diffopt.flags |= DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG;


This flag occurs once in the code base, with the comment:
/*
 * Unless the user did explicitly request a submodule
 * ignore mode by passing a command line option we do
 * not ignore any changed submodule SHA-1s when
 * comparing index and parent, no matter what is
 * configured. Otherwise we won't commit any
 * submodules which were manually staged, which would
 * be really confusing.
 */
int diff_flags = DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG;

in prepare_commit, so commit ignores the .gitmodules file.

This allows git-add to add ignored submodules, currently ignored submodules
would have to be added using the plumbing
git update-index --add --cacheinfo 16,$SHA1,

This makes sense, though a test demonstrating the change in behavior
would be nice, but git-add doesn't seem to change as it doesn't even load
the git modules config?

> rev.max_count = 0; /* do not compare unmerged paths with stage #2 */
> run_diff_files(, DIFF_RACY_IS_MODIFIED);
> return !!data.add_errors;
> diff --git a/builtin/reset.c b/builtin/reset.c
> index 046403ed6..772d078b8 100644
> --- a/builtin/reset.c
> +++ b/builtin/reset.c
> @@ -156,6 +156,7 @@ static int read_from_tree(const struct pathspec *pathspec,
> opt.output_format = DIFF_FORMAT_CALLBACK;
> opt.format_callback = update_index_from_diff;
> opt.format_callback_data = _to_add;
> +   opt.flags |= DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG;

same here? Also as this is not failing any test, it may be worth adding one
to document the behavior of the "submodule..ignore" flag in tests?

>
> if (do_diff_cache(tree_oid, ))
> return 1;
> --
> 2.14.0.rc0.400.g1c36432dff-goog
>

Re: [PATCH 02/15] submodule: don't use submodule_from_name

2017-07-25 Thread Stefan Beller

On Tue, Jul 25, 2017 at 2:39 PM, Brandon Williams  wrote:
> The function 'submodule_from_name()' is being used incorrectly here as a
> submodule path is being used instead of a submodule name.  Since the
> correct function to use with a path to a submodule is already being used
> ('submodule_from_path()') let's remove the call to
> 'submodule_from_name()'.

This blames to 851e18c385 (submodule: use new config API for worktree
configurations, 2015-08-17), but that is a refactoring. The issue of using
the path instead of a name was there before that. The actual issue
was introduced in 7dce19d374 (fetch/pull: Add the
--recurse-submodules option, 2010-11-12).

+ name = ce->name;
+ name_for_path =
unsorted_string_list_lookup(_name_for_path, ce->name);
+ if (name_for_path)
+ name = name_for_path->util;

Rereading the archives, there was quite some discussion on the design
of these patches, but these lines of code did not get any attention

https://public-inbox.org/git/4cdb3063.5010...@web.de/

I cc'd Jens in the hope of him having a good memory why he
wrote the code that way. :)

Note that this is the last caller of submodule_from_name being
removed, so I would expect removal of submodule_from_name from
the t/helper/test-submodule-config.c as well as
Documentation/technical/api-submodule-config.txt
in a later part of this series. (Well technically it could go outside
of the series, but in the mean time we'd document and test
dead code)

> Signed-off-by: Brandon Williams 
> ---
>  submodule.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 7e87e4698..fd391aea6 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1177,8 +1177,6 @@ static int get_next_submodule(struct child_process *cp,
> continue;
>
> submodule = submodule_from_path(_oid, ce->name);
> -   if (!submodule)
> -   submodule = submodule_from_name(_oid, ce->name);
>
> default_argv = "yes";
> if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
> --
> 2.14.0.rc0.400.g1c36432dff-goog
>

[PATCH v2] contrib/rerere-train: optionally overwrite existing resolutions

2017-07-25 Thread Raman Gupta

Provide the user an option to overwrite existing resolutions using an
`--overwrite` flag. This might be used, for example, if the user knows
that they already have an entry in their rerere cache for a conflict,
but wish to drop it and retrain based on the merge commit(s) passed to
the rerere-train script.

Signed-off-by: Raman Gupta 
---
 contrib/rerere-train.sh | 36 ++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/contrib/rerere-train.sh b/contrib/rerere-train.sh
index 52ad9e4..e25bf8a 100755
--- a/contrib/rerere-train.sh
+++ b/contrib/rerere-train.sh
@@ -3,10 +3,38 @@
 # Prime rerere database from existing merge commits
 
 me=rerere-train
-USAGE="$me rev-list-args"
 
 SUBDIRECTORY_OK=Yes
-OPTIONS_SPEC=
+OPTS_SPEC="\
+$me [--overwrite] 
+--
+h,helpshow the help
+o,overwrite   overwrite any existing rerere cache
+"
+eval "$(echo "$OPTS_SPEC" | git rev-parse --parseopt -- "$@" || echo exit $?)"
+
+overwrite=0
+
+while test $# -gt 0
+do
+   opt="$1"
+   case "$opt" in
+   -o)
+   overwrite=1
+   shift
+   shift
+   ;;
+   --)
+   shift
+   break
+   ;;
+   *)
+   break
+   exit 1
+   ;;
+   esac
+done
+
 . "$(git --exec-path)/git-sh-setup"
 require_work_tree
 cd_to_toplevel
@@ -34,6 +62,10 @@ do
# Cleanly merges
continue
fi
+   if [ $overwrite == 1 ]
+   then
+   git rerere forget .
+   fi
if test -s "$GIT_DIR/MERGE_RR"
then
git show -s --pretty=format:"Learning from %h %s" "$commit"
-- 
2.9.4

Re: [PATCH] recursive submodules: detach HEAD from new state

2017-07-25 Thread Stefan Beller

On Mon, Jul 24, 2017 at 3:23 PM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> Also, while I do agree with you that the problem exists, it is
>> unclear why this patch is a solution and not a hack that sweeps a
>> problem under the rug.
>>
>> It is unclear why this "silently detach HEAD without telling the
>> user" is a better solution than erroring out, for example [*1*].
>
> Just to avoid possible confusion; I am not claiming that it would be
> more (or less for that matter) sensible to error out than silently
> detaching HEAD, because I am not giving the reason to substantiate
> the claim and I do not have a strong opinion to favour which one (or
> another potential solution, if any).
>
> I am just saying that the patch that proposes a solution should be
> backed with an explanation why it is a good idea, especially when
> there are obvious alternatives that are not so clearly inferior.
>
> Thanks.

So I took a step back and wrote about different proposals where
we want to go long term. See below. This will help us
figuring out how to approach this bug correctly.
--

RFC: A new type of symbolic refs

A symbolic ref can currently only point at a ref or another symbolic ref.
This proposal show cases different scenarios on how this could change in
the future.

A: HEAD pointing at the superprojects index
===

Introduce a new symbolic ref that points at the superprojects index of
the gitlink. The format is

  "repo:"  '\0'  '\0'

Ref read operations
---
  e.g. git log HEAD

Just like existing symrefs, the content of the ref will be read and followed.
On reading "repo:", the sha1 will be obtained equivalent to:

git -C  ls-files -s  | awk '{ print $2}'

In case of error
(superproject not found, gitlink path does not exist), the ref is broken and

Ref write operations driven by the submodule, affecting symrefs
---
  e.g. git checkout  (in the submodule)

In this scenario only the HEAD is optionally attached to the superproject,
so we can rewrite the HEAD to be anything else, such as a branch just fine.
Once the HEAD is not pointing at the superproject any more, we'll leave the
submodule alone in operations driven by the superproject.

Ref write operations driven by the submodule, affecting target ref
--
  e.g. git commit, reset --hard, update-ref (in the submodule)

The HEAD stays the same, pointing at the superproject.
The gitlink is changed to the target sha1, using

  git -C  update-index --add \
  --cacheinfo 16,$SHA1,

This will affect the superprojects index, such that then a commit in
the superproject is needed.

Ref write operations driven by the superproject, changing the gitlink
-
  e.g. git checkout , git reset --hard (in the superproject)

This will change the gitlink in the superprojects index, such that the HEAD
in the submodule changes, which would trigger an update of the
submodules working tree.

Consistency considerations (gc)
---
  e.g. git gc --aggressive --prune=now

The repacking logic is already aware of a detached HEAD, such that
using this new symref mechanism would not generate problems as long as
we keep the HEAD attached to the superproject. However when commits/objects
are created while the HEAD is attached to the superproject and then HEAD
switches to a local branch, there are problems with the created objects
as they seem unreachable now.

This problem is not new as a superproject may record submodule objects
that are not reachable from any of the submodule branches. Such objects
fall prey to overzealous packing in the submodule.

This proposal however exposes this problem a lot more, as the submodule
has fewer needs for branches.

B: HEAD pointing at a superprojects branch
==

Instead of pointing at the index of the superproject, we also
encode a branch name:

repo:"  '\0'  '\0' branch '\0'

Ref read operations
---
  e.g. git log HEAD

This is similar to the case of pointing at the index, except that the reading
operation reads from the tip of the branch:

git -C  ls-tree  -- \
 | awk '{ print $3}'

Ref write operations driven by the submodule, affecting symrefs
---
  e.g. git checkout  (in the submodule)

HEAD will be pointed at the local target branch, dropping the affliation to
the superproject.

Ref write operations driven by the submodule, affecting target ref
--
  e.g. git commit, reset --hard, update-ref (in the submodule)

As we're pointing at the superprojects branch, this would have to create
a dummy(?) commit in the superproject, that

Re: [PATCH v2 2/2] sub-process: refactor handshake to common function

2017-07-25 Thread Junio C Hamano

Jonathan Tan  writes:

> Refactor, into a common function, the version and capability negotiation
> done when invoking a long-running process as a clean or smudge filter.
> This will be useful for other Git code that needs to interact similarly
> with a long-running process.
>
> As you can see in the change to t0021, this commit changes the error
> message reported when the long-running process does not introduce itself
> with the expected "server"-terminated line. Originally, the error
> message reports that the filter "does not support filter protocol
> version 2", differentiating between the old single-file filter protocol
> and the new multi-file filter protocol - I have updated it to something
> more generic and useful.
>
> Signed-off-by: Jonathan Tan 

Overall I like the direction, even though the abstraction the
resulting code results in seems to me a bit too tightly defined; in
other words, I cannot be sure that this will be useful enough in a
more general context, or make some potential applications feel a bit
too constrained.

> + static int versions[] = {2, 0};
> + static struct subprocess_capability capabilities[] = {
> + {"clean", CAP_CLEAN}, {"smudge", CAP_SMUDGE}, {NULL, 0}
> + };
>   struct cmd2process *entry = (struct cmd2process *)subprocess;
> ...
> + return subprocess_handshake(subprocess, "git-filter-", versions, NULL,
> + capabilities,
> + >supported_capabilities);
>  }

I would have defined the welcome prefix to lack the final dash,
i.e. forcing the hardcoded suffixes for clients and servers in any
protocol that uses this API to end with "-client" and "-server",
i.e. with dash.

> diff --git a/sub-process.c b/sub-process.c
> index a3cfab1a9..1a3f39bdf 100644
> --- a/sub-process.c
> +++ b/sub-process.c
> @@ -105,3 +105,97 @@ int subprocess_start(struct hashmap *hashmap, struct 
> subprocess_entry *entry, co
>   hashmap_add(hashmap, entry);
>   return 0;
>  }
> +
> +int subprocess_handshake(struct subprocess_entry *entry,
> +  const char *welcome_prefix,
> +  int *versions,
> +  int *chosen_version,
> +  struct subprocess_capability *capabilities,
> +  unsigned int *supported_capabilities) {
> + int version_scratch;
> + unsigned int capabilities_scratch;
> + struct child_process *process = >process;
> + int i;
> + char *line;
> + const char *p;
> +
> + if (!chosen_version)
> + chosen_version = _scratch;
> + if (!supported_capabilities)
> + supported_capabilities = _scratch;
> +
> + sigchain_push(SIGPIPE, SIG_IGN);
> +
> + if (packet_write_fmt_gently(process->in, "%sclient\n",
> + welcome_prefix)) {
> + error("Could not write client identification");
> + goto error;
> + }
> + for (i = 0; versions[i]; i++) {
> + if (packet_write_fmt_gently(process->in, "version=%d\n",
> + versions[i])) {
> + error("Could not write requested version");
> + goto error;
> + }
> + }

This forces version numbers to be positive integers, which is OK, as
I do not see it a downside that any potential application cannot use
"version=0".

> + if (packet_flush_gently(process->in))
> + goto error;
> +
> + if (!(line = packet_read_line(process->out, NULL)) ||
> + !skip_prefix(line, welcome_prefix, ) ||
> + strcmp(p, "server")) {
> + error("Unexpected line '%s', expected %sserver",
> +   line ? line : "", welcome_prefix);
> + goto error;
> + }
> + if (!(line = packet_read_line(process->out, NULL)) ||
> + !skip_prefix(line, "version=", ) ||
> + strtol_i(p, 10, chosen_version)) {
> + error("Unexpected line '%s', expected version",
> +   line ? line : "");
> + goto error;
> + }
> + for (i = 0; versions[i]; i++) {
> + if (versions[i] == *chosen_version)
> + goto version_found;
> + }
> + error("Version %d not supported", *chosen_version);
> + goto error;
> +version_found:

It would have been more natural to do

for (i = 0; versions[i]; i++)
if (versions[i] == *chosen_version)
break;
if (versions[i]) {
error("...");
goto error;
}

without "version_found:" label.  In general, I'd prefer to avoid
jumping to a label in the normal/expected case and reserve "goto"
for error handling.

> + if ((line = packet_read_line(process->out, NULL))) {
> + error("Unexpected line '%s', expected flush", line);
> + goto error;
> + }
> +
> +

[PATCH 06/15] fetch: don't overlay config with submodule-config

2017-07-25 Thread Brandon Williams

Don't rely on overlaying the repository's config on top of the
submodule-config, instead query the repository's config directly for the
fetch_recurse field.

Signed-off-by: Brandon Williams 
---
 builtin/fetch.c |  1 -
 submodule.c | 24 +---
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index d84c26391..3fe99073d 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1362,7 +1362,6 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
 
if (recurse_submodules != RECURSE_SUBMODULES_OFF) {
gitmodules_config();
-   git_config(submodule_config, NULL);
}
 
if (all) {
diff --git a/submodule.c b/submodule.c
index 8b9e48a61..c5058a4b8 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1210,14 +1210,24 @@ static int get_next_submodule(struct child_process *cp,
 
default_argv = "yes";
if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
-   if (submodule &&
-   submodule->fetch_recurse !=
-   RECURSE_SUBMODULES_NONE) {
-   if (submodule->fetch_recurse ==
-   RECURSE_SUBMODULES_OFF)
+   int fetch_recurse = RECURSE_SUBMODULES_NONE;
+
+   if (submodule) {
+   char *key;
+   const char *value;
+
+   fetch_recurse = submodule->fetch_recurse;
+   key = 
xstrfmt("submodule.%s.fetchRecurseSubmodules", submodule->name);
+   if 
(!repo_config_get_string_const(the_repository, key, )) {
+   fetch_recurse = 
parse_fetch_recurse_submodules_arg(key, value);
+   }
+   free(key);
+   }
+
+   if (fetch_recurse != RECURSE_SUBMODULES_NONE) {
+   if (fetch_recurse == RECURSE_SUBMODULES_OFF)
continue;
-   if (submodule->fetch_recurse ==
-   RECURSE_SUBMODULES_ON_DEMAND) {
+   if (fetch_recurse == 
RECURSE_SUBMODULES_ON_DEMAND) {
if 
(!unsorted_string_list_lookup(_submodule_paths, ce->name))
continue;
default_argv = "on-demand";
-- 
2.14.0.rc0.400.g1c36432dff-goog

[PATCH 03/15] add, reset: ensure submodules can be added or reset

2017-07-25 Thread Brandon Williams

Commit aee9c7d65 (Submodules: Add the new "ignore" config option for
diff and status) introduced the ignore configuration option for
submodules so that configured submodules could be omitted from the
status and diff commands.  Because this flag is respected in the diff
machinery it has the unintended consequence of potentially prohibiting
users from adding or resetting a submodule, even when a path to the
submodule is explicitly given.

Ensure that submodules can be added or set, even if they are configured
to be ignored, by setting the `DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG` diff
flag.

Signed-off-by: Brandon Williams 
---
 builtin/add.c   | 1 +
 builtin/reset.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/builtin/add.c b/builtin/add.c
index e888fb8c5..6f271512f 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -116,6 +116,7 @@ int add_files_to_cache(const char *prefix,
rev.diffopt.output_format = DIFF_FORMAT_CALLBACK;
rev.diffopt.format_callback = update_callback;
rev.diffopt.format_callback_data = 
+   rev.diffopt.flags |= DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG;
rev.max_count = 0; /* do not compare unmerged paths with stage #2 */
run_diff_files(, DIFF_RACY_IS_MODIFIED);
return !!data.add_errors;
diff --git a/builtin/reset.c b/builtin/reset.c
index 046403ed6..772d078b8 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -156,6 +156,7 @@ static int read_from_tree(const struct pathspec *pathspec,
opt.output_format = DIFF_FORMAT_CALLBACK;
opt.format_callback = update_index_from_diff;
opt.format_callback_data = _to_add;
+   opt.flags |= DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG;
 
if (do_diff_cache(tree_oid, ))
return 1;
-- 
2.14.0.rc0.400.g1c36432dff-goog

[PATCH 01/15] t7411: check configuration parsing errors

2017-07-25 Thread Brandon Williams

Check for configuration parsing errors in '.gitmodules' in t7411, which
is explicitly testing the submodule-config subsystem, instead of in
t7400.  Also explicitly use the test helper instead of relying on the
gitmodules file from being read in status.

Signed-off-by: Brandon Williams 
---
 t/t7400-submodule-basic.sh  | 10 --
 t/t7411-submodule-config.sh | 15 +++
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index dcac364c5..717447526 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -46,16 +46,6 @@ test_expect_success 'submodule update aborts on missing 
gitmodules url' '
test_must_fail git submodule init
 '
 
-test_expect_success 'configuration parsing' '
-   test_when_finished "rm -f .gitmodules" &&
-   cat >.gitmodules <<-\EOF &&
-   [submodule "s"]
-   path
-   ignore
-   EOF
-   test_must_fail git status
-'
-
 test_expect_success 'setup - repository in init subdirectory' '
mkdir init &&
(
diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
index eea36f1db..7d6b25ba2 100755
--- a/t/t7411-submodule-config.sh
+++ b/t/t7411-submodule-config.sh
@@ -31,6 +31,21 @@ test_expect_success 'submodule config cache setup' '
)
 '
 
+test_expect_success 'configuration parsing with error' '
+   test_when_finished "rm -rf repo" &&
+   test_create_repo repo &&
+   cat >repo/.gitmodules <<-\EOF &&
+   [submodule "s"]
+   path
+   ignore
+   EOF
+   (
+   cd repo &&
+   test_must_fail test-submodule-config "" s 2>actual &&
+   test_i18ngrep "bad config" actual
+   )
+'
+
 cat >super/expect <

[PATCH 07/15] submodule: don't rely on overlayed config when setting diffopts

2017-07-25 Thread Brandon Williams

Don't rely on overlaying the repository's config on top of the
submodule-config, instead query the repository's config directory for
the ignore field.

Signed-off-by: Brandon Williams 
---
 submodule.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/submodule.c b/submodule.c
index c5058a4b8..f86b82fbb 100644
--- a/submodule.c
+++ b/submodule.c
@@ -165,8 +165,16 @@ void set_diffopt_flags_from_submodule_config(struct 
diff_options *diffopt,
 {
const struct submodule *submodule = submodule_from_path(_oid, 
path);
if (submodule) {
-   if (submodule->ignore)
-   handle_ignore_submodules_arg(diffopt, 
submodule->ignore);
+   const char *ignore;
+   char *key;
+
+   key = xstrfmt("submodule.%s.ignore", submodule->name);
+   if (repo_config_get_string_const(the_repository, key, ))
+   ignore = submodule->ignore;
+   free(key);
+
+   if (ignore)
+   handle_ignore_submodules_arg(diffopt, ignore);
else if (is_gitmodules_unmerged(_index))
DIFF_OPT_SET(diffopt, IGNORE_SUBMODULES);
}
-- 
2.14.0.rc0.400.g1c36432dff-goog

[PATCH 10/15] diff: stop allowing diff to have submodules configured in .git/config

2017-07-25 Thread Brandon Williams

Traditionally a submodule is comprised of a gitlink as well as a
corresponding entry in the .gitmodules file.  Diff doesn't follow this
paradigm as its config callback routine falls back to populating the
submodule-config if a config entry starts with 'submodule.'.

Remove this behavior in order to be consistent with how the
submodule-config is populated, via calling 'gitmodules_config()' or
'repo_read_gitmodules()'.

Signed-off-by: Brandon Williams 
---
 diff.c|  3 ---
 t/t4027-diff-submodule.sh | 67 ---
 2 files changed, 70 deletions(-)

diff --git a/diff.c b/diff.c
index 85e714f6c..e43519b88 100644
--- a/diff.c
+++ b/diff.c
@@ -346,9 +346,6 @@ int git_diff_basic_config(const char *var, const char 
*value, void *cb)
return 0;
}
 
-   if (starts_with(var, "submodule."))
-   return parse_submodule_config_option(var, value);
-
if (git_diff_heuristic_config(var, value, cb) < 0)
return -1;
 
diff --git a/t/t4027-diff-submodule.sh b/t/t4027-diff-submodule.sh
index 518bf9524..2ffd11a14 100755
--- a/t/t4027-diff-submodule.sh
+++ b/t/t4027-diff-submodule.sh
@@ -113,35 +113,6 @@ test_expect_success 'git diff HEAD with dirty submodule 
(work tree, refs match)'
! test -s actual4
 '
 
-test_expect_success 'git diff HEAD with dirty submodule (work tree, refs 
match) [.git/config]' '
-   git config diff.ignoreSubmodules all &&
-   git diff HEAD >actual &&
-   ! test -s actual &&
-   git config submodule.subname.ignore none &&
-   git config submodule.subname.path sub &&
-   git diff HEAD >actual &&
-   sed -e "1,/^@@/d" actual >actual.body &&
-   expect_from_to >expect.body $subprev $subprev-dirty &&
-   test_cmp expect.body actual.body &&
-   git config submodule.subname.ignore all &&
-   git diff HEAD >actual2 &&
-   ! test -s actual2 &&
-   git config submodule.subname.ignore untracked &&
-   git diff HEAD >actual3 &&
-   sed -e "1,/^@@/d" actual3 >actual3.body &&
-   expect_from_to >expect.body $subprev $subprev-dirty &&
-   test_cmp expect.body actual3.body &&
-   git config submodule.subname.ignore dirty &&
-   git diff HEAD >actual4 &&
-   ! test -s actual4 &&
-   git diff HEAD --ignore-submodules=none >actual &&
-   sed -e "1,/^@@/d" actual >actual.body &&
-   expect_from_to >expect.body $subprev $subprev-dirty &&
-   test_cmp expect.body actual.body &&
-   git config --remove-section submodule.subname &&
-   git config --unset diff.ignoreSubmodules
-'
-
 test_expect_success 'git diff HEAD with dirty submodule (work tree, refs 
match) [.gitmodules]' '
git config diff.ignoreSubmodules dirty &&
git diff HEAD >actual &&
@@ -208,24 +179,6 @@ test_expect_success 'git diff HEAD with dirty submodule 
(untracked, refs match)'
! test -s actual4
 '
 
-test_expect_success 'git diff HEAD with dirty submodule (untracked, refs 
match) [.git/config]' '
-   git config submodule.subname.ignore all &&
-   git config submodule.subname.path sub &&
-   git diff HEAD >actual2 &&
-   ! test -s actual2 &&
-   git config submodule.subname.ignore untracked &&
-   git diff HEAD >actual3 &&
-   ! test -s actual3 &&
-   git config submodule.subname.ignore dirty &&
-   git diff HEAD >actual4 &&
-   ! test -s actual4 &&
-   git diff --ignore-submodules=none HEAD >actual &&
-   sed -e "1,/^@@/d" actual >actual.body &&
-   expect_from_to >expect.body $subprev $subprev-dirty &&
-   test_cmp expect.body actual.body &&
-   git config --remove-section submodule.subname
-'
-
 test_expect_success 'git diff HEAD with dirty submodule (untracked, refs 
match) [.gitmodules]' '
git config --add -f .gitmodules submodule.subname.ignore all &&
git config --add -f .gitmodules submodule.subname.path sub &&
@@ -261,26 +214,6 @@ test_expect_success 'git diff between submodule commits' '
! test -s actual
 '
 
-test_expect_success 'git diff between submodule commits [.git/config]' '
-   git diff HEAD^..HEAD >actual &&
-   sed -e "1,/^@@/d" actual >actual.body &&
-   expect_from_to >expect.body $subtip $subprev &&
-   test_cmp expect.body actual.body &&
-   git config submodule.subname.ignore dirty &&
-   git config submodule.subname.path sub &&
-   git diff HEAD^..HEAD >actual &&
-   sed -e "1,/^@@/d" actual >actual.body &&
-   expect_from_to >expect.body $subtip $subprev &&
-   test_cmp expect.body actual.body &&
-   git config submodule.subname.ignore all &&
-   git diff HEAD^..HEAD >actual &&
-   ! test -s actual &&
-   git diff --ignore-submodules=dirty HEAD^..HEAD >actual &&
-   sed -e "1,/^@@/d" actual >actual.body &&
-   expect_from_to >expect.body $subtip $subprev &&
-   git config --remove-section submodule.subname
-'
-

[PATCH 13/15] submodule-config: lazy-load a repository's .gitmodules file

2017-07-25 Thread Brandon Williams

In order to use the submodule-config subsystem, callers first need to
initialize it by calling 'repo_read_gitmodules()' or
'gitmodules_config()' (which just redirects to
'repo_read_gitmodules()').  There are a couple of callers who need to
load an explicit revision of the repository's .gitmodules file (grep) or
need to modify the .gitmodules file so they would need to load it before
modify the file (checkout), but the majority of callers are simply
reading the .gitmodules file present in the working tree.  For the
common case it would be nice to avoid the boilerplate of initializing
the submodule-config system before using it, so instead let's perform
lazy-loading of the submodule-config system.

Remove the calls to reading the gitmodules file from ls-files to show
that lazy-loading the .gitmodules file works.

Signed-off-by: Brandon Williams 
---
 builtin/ls-files.c |  5 -
 submodule-config.c | 27 ++-
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index d14612057..bd74ee07d 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -211,8 +211,6 @@ static void show_submodule(struct repository *superproject,
if (repo_read_index() < 0)
die("index file corrupt");
 
-   repo_read_gitmodules();
-
show_files(, dir);
 
repo_clear();
@@ -611,9 +609,6 @@ int cmd_ls_files(int argc, const char **argv, const char 
*cmd_prefix)
if (require_work_tree && !is_inside_work_tree())
setup_work_tree();
 
-   if (recurse_submodules)
-   repo_read_gitmodules(the_repository);
-
if (recurse_submodules &&
(show_stage || show_deleted || show_others || show_unmerged ||
 show_killed || show_modified || show_resolve_undo || with_tree))
diff --git a/submodule-config.c b/submodule-config.c
index 86636654b..56d9d76d4 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -18,6 +18,7 @@ struct submodule_cache {
struct hashmap for_path;
struct hashmap for_name;
unsigned initialized:1;
+   unsigned gitmodules_read:1;
 };
 
 /*
@@ -93,6 +94,7 @@ static void submodule_cache_clear(struct submodule_cache 
*cache)
hashmap_free(>for_path, 1);
hashmap_free(>for_name, 1);
cache->initialized = 0;
+   cache->gitmodules_read = 0;
 }
 
 void submodule_cache_free(struct submodule_cache *cache)
@@ -557,8 +559,6 @@ static int gitmodules_cb(const char *var, const char 
*value, void *data)
struct repository *repo = data;
struct parse_config_parameter parameter;
 
-   submodule_cache_check_init(repo);
-
parameter.cache = repo->submodule_cache;
parameter.treeish_name = NULL;
parameter.gitmodules_sha1 = null_sha1;
@@ -569,6 +569,8 @@ static int gitmodules_cb(const char *var, const char 
*value, void *data)
 
 void repo_read_gitmodules(struct repository *repo)
 {
+   submodule_cache_check_init(repo);
+
if (repo->worktree) {
char *gitmodules;
 
@@ -582,6 +584,8 @@ void repo_read_gitmodules(struct repository *repo)
 
free(gitmodules);
}
+
+   repo->submodule_cache->gitmodules_read = 1;
 }
 
 void gitmodules_config_oid(const struct object_id *commit_oid)
@@ -589,24 +593,37 @@ void gitmodules_config_oid(const struct object_id 
*commit_oid)
struct strbuf rev = STRBUF_INIT;
struct object_id oid;
 
+   submodule_cache_check_init(the_repository);
+
if (gitmodule_oid_from_commit(commit_oid, , )) {
git_config_from_blob_oid(gitmodules_cb, rev.buf,
 , the_repository);
}
strbuf_release();
+
+   the_repository->submodule_cache->gitmodules_read = 1;
+}
+
+static void gitmodules_read_check(struct repository *repo)
+{
+   submodule_cache_check_init(repo);
+
+   /* read the repo's .gitmodules file if it hasn't been already */
+   if (!repo->submodule_cache->gitmodules_read)
+   repo_read_gitmodules(repo);
 }
 
 const struct submodule *submodule_from_name(const struct object_id 
*treeish_name,
const char *name)
 {
-   submodule_cache_check_init(the_repository);
+   gitmodules_read_check(the_repository);
return config_from(the_repository->submodule_cache, treeish_name, name, 
lookup_name);
 }
 
 const struct submodule *submodule_from_path(const struct object_id 
*treeish_name,
const char *path)
 {
-   submodule_cache_check_init(the_repository);
+   gitmodules_read_check(the_repository);
return config_from(the_repository->submodule_cache, treeish_name, path, 
lookup_path);
 }
 
@@ -614,7 +631,7 @@ const struct submodule *submodule_from_cache(struct 
repository *repo,
 const struct object_id 
*treeish_name,
 const char *key)
 {
-

[PATCH 11/15] submodule-config: remove support for overlaying repository config

2017-07-25 Thread Brandon Williams

All callers have been migrated to explicitly read any configuration they
need.  The support for handling it automatically in submodule-config is
no longer needed.

Signed-off-by: Brandon Williams 
---
 submodule-config.h   |  1 -
 t/helper/test-submodule-config.c |  6 
 t/t7411-submodule-config.sh  | 72 
 3 files changed, 79 deletions(-)

diff --git a/submodule-config.h b/submodule-config.h
index cccd34b92..84c2cf515 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -34,7 +34,6 @@ extern int option_fetch_parse_recurse_submodules(const struct 
option *opt,
 const char *arg, int unset);
 extern int parse_update_recurse_submodules_arg(const char *opt, const char 
*arg);
 extern int parse_push_recurse_submodules_arg(const char *opt, const char *arg);
-extern int parse_submodule_config_option(const char *var, const char *value);
 extern int submodule_config_option(struct repository *repo,
   const char *var, const char *value);
 extern const struct submodule *submodule_from_name(
diff --git a/t/helper/test-submodule-config.c b/t/helper/test-submodule-config.c
index e13fbcc1b..f4a7c431c 100644
--- a/t/helper/test-submodule-config.c
+++ b/t/helper/test-submodule-config.c
@@ -10,11 +10,6 @@ static void die_usage(int argc, const char **argv, const 
char *msg)
exit(1);
 }
 
-static int git_test_config(const char *var, const char *value, void *cb)
-{
-   return parse_submodule_config_option(var, value);
-}
-
 int cmd_main(int argc, const char **argv)
 {
const char **arg = argv;
@@ -38,7 +33,6 @@ int cmd_main(int argc, const char **argv)
 
setup_git_directory();
gitmodules_config();
-   git_config(git_test_config, NULL);
 
while (*arg) {
struct object_id commit_oid;
diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh
index 7d6b25ba2..46c09c776 100755
--- a/t/t7411-submodule-config.sh
+++ b/t/t7411-submodule-config.sh
@@ -122,78 +122,6 @@ test_expect_success 'using different treeishs works' '
)
 '
 
-cat >super/expect_url actual &&
-   test_cmp expect_local_path actual &&
-   git config submodule.a.url "$old_a" &&
-   git config submodule.submodule.url "$old_submodule" &&
-   git config --unset submodule.a.path c
-   )
-'
-
-cat >super/expect_url super/expect_fetchrecurse_die.err actual.err &&
-   touch expect_fetchrecurse_die.out &&
-   test_cmp expect_fetchrecurse_die.out actual.out  &&
-   test_cmp expect_fetchrecurse_die.err actual.err  &&
-   git config --unset submodule.submodule.fetchrecursesubmodules
-   )
-'
-
 test_expect_success 'error in history in fetchrecursesubmodule lets continue' '
(cd super &&
git config -f .gitmodules \
-- 
2.14.0.rc0.400.g1c36432dff-goog

[PATCH 15/15] submodule: remove gitmodules_config

2017-07-25 Thread Brandon Williams

Now that the submodule-config subsystem can lazily read the gitmodules
file we no longer need to explicitly pre-read the gitmodules by calling
'gitmodules_config()' so let's remove it.

Signed-off-by: Brandon Williams 
---
 builtin/checkout.c   |  1 -
 builtin/commit.c |  1 -
 builtin/diff-files.c |  1 -
 builtin/diff-index.c |  1 -
 builtin/diff-tree.c  |  1 -
 builtin/diff.c   |  2 --
 builtin/fetch.c  |  4 
 builtin/grep.c   |  4 
 builtin/mv.c |  1 -
 builtin/read-tree.c  |  2 --
 builtin/reset.c  |  2 --
 builtin/rm.c |  1 -
 builtin/submodule--helper.c  | 14 --
 submodule.c  | 15 ---
 submodule.h  |  2 --
 t/helper/test-submodule-config.c |  1 -
 16 files changed, 53 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 246e0cd16..63ae16afc 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -1179,7 +1179,6 @@ int cmd_checkout(int argc, const char **argv, const char 
*prefix)
opts.prefix = prefix;
opts.show_progress = -1;
 
-   gitmodules_config();
git_config(git_checkout_config, );
 
opts.track = BRANCH_TRACK_UNSPECIFIED;
diff --git a/builtin/commit.c b/builtin/commit.c
index 4bbac014a..18ad714d9 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -195,7 +195,6 @@ static void determine_whence(struct wt_status *s)
 static void status_init_config(struct wt_status *s, config_fn_t fn)
 {
wt_status_prepare(s);
-   gitmodules_config();
git_config(fn, s);
determine_whence(s);
init_diff_ui_defaults();
diff --git a/builtin/diff-files.c b/builtin/diff-files.c
index 17bf84d18..e88493ffe 100644
--- a/builtin/diff-files.c
+++ b/builtin/diff-files.c
@@ -26,7 +26,6 @@ int cmd_diff_files(int argc, const char **argv, const char 
*prefix)
 
git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
init_revisions(, prefix);
-   gitmodules_config();
rev.abbrev = 0;
precompose_argv(argc, argv);
 
diff --git a/builtin/diff-index.c b/builtin/diff-index.c
index 185e6f9b5..9d772f8f2 100644
--- a/builtin/diff-index.c
+++ b/builtin/diff-index.c
@@ -23,7 +23,6 @@ int cmd_diff_index(int argc, const char **argv, const char 
*prefix)
 
git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
init_revisions(, prefix);
-   gitmodules_config();
rev.abbrev = 0;
precompose_argv(argc, argv);
 
diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index 31d2cb410..d66499909 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -110,7 +110,6 @@ int cmd_diff_tree(int argc, const char **argv, const char 
*prefix)
 
git_config(git_diff_basic_config, NULL); /* no "diff" UI options */
init_revisions(opt, prefix);
-   gitmodules_config();
opt->abbrev = 0;
opt->diff = 1;
opt->disable_stdin = 1;
diff --git a/builtin/diff.c b/builtin/diff.c
index 7cde6abbc..7e3ebcea3 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -315,8 +315,6 @@ int cmd_diff(int argc, const char **argv, const char 
*prefix)
no_index = DIFF_NO_INDEX_IMPLICIT;
}
 
-   if (!no_index)
-   gitmodules_config();
init_diff_ui_defaults();
git_config(git_diff_ui_config, NULL);
precompose_argv(argc, argv);
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 3fe99073d..132e3224e 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1360,10 +1360,6 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
if (depth || deepen_since || deepen_not.nr)
deepen = 1;
 
-   if (recurse_submodules != RECURSE_SUBMODULES_OFF) {
-   gitmodules_config();
-   }
-
if (all) {
if (argc == 1)
die(_("fetch --all does not take a repository 
argument"));
diff --git a/builtin/grep.c b/builtin/grep.c
index ac06d2d33..2d65f27d0 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1048,10 +1048,6 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
}
 #endif
 
-   if (recurse_submodules) {
-   gitmodules_config();
-   }
-
if (show_in_pager && (cached || list.nr))
die(_("--open-files-in-pager only works on the worktree"));
 
diff --git a/builtin/mv.c b/builtin/mv.c
index 94fbaaa5d..ffdd5f01a 100644
--- a/builtin/mv.c
+++ b/builtin/mv.c
@@ -131,7 +131,6 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
struct stat st;
struct string_list src_for_dst = STRING_LIST_INIT_NODUP;
 
-   gitmodules_config();
git_config(git_default_config, NULL);
 
argc = parse_options(argc, argv, prefix, builtin_mv_options,
diff --git

[PATCH 05/15] submodule--helper: don't overlay config in update-clone

2017-07-25 Thread Brandon Williams

Don't rely on overlaying the repository's config on top of the
submodule-config, instead query the repository's config directly for the
url and the update strategy configuration.

Signed-off-by: Brandon Williams 
---
 builtin/submodule--helper.c | 14 ++
 submodule.c | 30 ++
 submodule.h |  3 +++
 3 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index f71f4270d..25f471ba1 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -780,6 +780,8 @@ static int prepare_to_clone_next_submodule(const struct 
cache_entry *ce,
   struct strbuf *out)
 {
const struct submodule *sub = NULL;
+   const char *url = NULL;
+   struct submodule_update_strategy update;
struct strbuf displaypath_sb = STRBUF_INIT;
struct strbuf sb = STRBUF_INIT;
const char *displaypath = NULL;
@@ -808,9 +810,10 @@ static int prepare_to_clone_next_submodule(const struct 
cache_entry *ce,
goto cleanup;
}
 
+   update = submodule_strategy_with_config_overlayed(the_repository, sub);
if (suc->update.type == SM_UPDATE_NONE
|| (suc->update.type == SM_UPDATE_UNSPECIFIED
-   && sub->update_strategy.type == SM_UPDATE_NONE)) {
+   && update.type == SM_UPDATE_NONE)) {
strbuf_addf(out, _("Skipping submodule '%s'"), displaypath);
strbuf_addch(out, '\n');
goto cleanup;
@@ -822,6 +825,11 @@ static int prepare_to_clone_next_submodule(const struct 
cache_entry *ce,
goto cleanup;
}
 
+   strbuf_reset();
+   strbuf_addf(, "submodule.%s.url", sub->name);
+   if (repo_config_get_string_const(the_repository, sb.buf, ))
+   url = sub->url;
+
strbuf_reset();
strbuf_addf(, "%s/.git", ce->name);
needs_cloning = !file_exists(sb.buf);
@@ -851,7 +859,7 @@ static int prepare_to_clone_next_submodule(const struct 
cache_entry *ce,
argv_array_push(>args, "--depth=1");
argv_array_pushl(>args, "--path", sub->path, NULL);
argv_array_pushl(>args, "--name", sub->name, NULL);
-   argv_array_pushl(>args, "--url", sub->url, NULL);
+   argv_array_pushl(>args, "--url", url, NULL);
if (suc->references.nr) {
struct string_list_item *item;
for_each_string_list_item(item, >references)
@@ -1025,9 +1033,7 @@ static int update_clone(int argc, const char **argv, 
const char *prefix)
if (pathspec.nr)
suc.warn_if_uninitialized = 1;
 
-   /* Overlay the parsed .gitmodules file with .git/config */
gitmodules_config();
-   git_config(submodule_config, NULL);
 
run_processes_parallel(max_jobs,
   update_clone_get_next_task,
diff --git a/submodule.c b/submodule.c
index fd391aea6..8b9e48a61 100644
--- a/submodule.c
+++ b/submodule.c
@@ -440,6 +440,36 @@ const char *submodule_strategy_to_string(const struct 
submodule_update_strategy
return NULL;
 }
 
+struct submodule_update_strategy 
submodule_strategy_with_config_overlayed(struct repository *repo,
+ const 
struct submodule *sub)
+{
+   struct submodule_update_strategy strat = sub->update_strategy;
+   const char *update;
+   char *key;
+
+   key = xstrfmt("submodule.%s.update", sub->name);
+   if (!repo_config_get_string_const(repo, key, )) {
+   strat.command = NULL;
+   if (!strcmp(update, "none")) {
+   strat.type = SM_UPDATE_NONE;
+   } else if (!strcmp(update, "checkout")) {
+   strat.type = SM_UPDATE_CHECKOUT;
+   } else if (!strcmp(update, "rebase")) {
+   strat.type = SM_UPDATE_REBASE;
+   } else if (!strcmp(update, "merge")) {
+   strat.type = SM_UPDATE_MERGE;
+   } else if (skip_prefix(update, "!", )) {
+   strat.type = SM_UPDATE_COMMAND;
+   strat.command = update;
+   } else {
+   die("invalid submodule update strategy '%s'", update);
+   }
+   }
+   free(key);
+
+   return strat;
+}
+
 void handle_ignore_submodules_arg(struct diff_options *diffopt,
  const char *arg)
 {
diff --git a/submodule.h b/submodule.h
index e402b004f..f17ca1e34 100644
--- a/submodule.h
+++ b/submodule.h
@@ -6,6 +6,7 @@ struct diff_options;
 struct argv_array;
 struct oid_array;
 struct remote;
+struct submodule;
 
 enum {
RECURSE_SUBMODULES_ONLY = -5,
@@ -65,6 +66,8 @@ extern void die_path_inside_submodule(const struct 
index_state *istate,
 extern int

[PATCH 12/15] submodule-config: move submodule-config functions to submodule-config.c

2017-07-25 Thread Brandon Williams

Migrate the functions used to initialize the submodule-config to
submodule-config.c so that the callback routine used in the
initialization process can be static and prevent it from being used
outside of initializing the submodule-config through the main API.

Signed-off-by: Brandon Williams 
---
 builtin/ls-files.c |  1 +
 submodule-config.c | 38 +++---
 submodule-config.h |  7 ++-
 submodule.c| 35 ---
 submodule.h|  2 --
 5 files changed, 34 insertions(+), 49 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b8514a002..d14612057 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -19,6 +19,7 @@
 #include "pathspec.h"
 #include "run-command.h"
 #include "submodule.h"
+#include "submodule-config.h"
 
 static int abbrev;
 static int show_deleted;
diff --git a/submodule-config.c b/submodule-config.c
index 0b429e942..86636654b 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -449,9 +449,9 @@ static int parse_config(const char *var, const char *value, 
void *data)
return ret;
 }
 
-int gitmodule_oid_from_commit(const struct object_id *treeish_name,
- struct object_id *gitmodules_oid,
- struct strbuf *rev)
+static int gitmodule_oid_from_commit(const struct object_id *treeish_name,
+struct object_id *gitmodules_oid,
+struct strbuf *rev)
 {
int ret = 0;
 
@@ -552,9 +552,9 @@ static void submodule_cache_check_init(struct repository 
*repo)
submodule_cache_init(repo->submodule_cache);
 }
 
-int submodule_config_option(struct repository *repo,
-   const char *var, const char *value)
+static int gitmodules_cb(const char *var, const char *value, void *data)
 {
+   struct repository *repo = data;
struct parse_config_parameter parameter;
 
submodule_cache_check_init(repo);
@@ -567,9 +567,33 @@ int submodule_config_option(struct repository *repo,
return parse_config(var, value, );
 }
 
-int parse_submodule_config_option(const char *var, const char *value)
+void repo_read_gitmodules(struct repository *repo)
 {
-   return submodule_config_option(the_repository, var, value);
+   if (repo->worktree) {
+   char *gitmodules;
+
+   if (repo_read_index(repo) < 0)
+   return;
+
+   gitmodules = repo_worktree_path(repo, GITMODULES_FILE);
+
+   if (!is_gitmodules_unmerged(repo->index))
+   git_config_from_file(gitmodules_cb, gitmodules, repo);
+
+   free(gitmodules);
+   }
+}
+
+void gitmodules_config_oid(const struct object_id *commit_oid)
+{
+   struct strbuf rev = STRBUF_INIT;
+   struct object_id oid;
+
+   if (gitmodule_oid_from_commit(commit_oid, , )) {
+   git_config_from_blob_oid(gitmodules_cb, rev.buf,
+, the_repository);
+   }
+   strbuf_release();
 }
 
 const struct submodule *submodule_from_name(const struct object_id 
*treeish_name,
diff --git a/submodule-config.h b/submodule-config.h
index 84c2cf515..e3845831f 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -34,8 +34,8 @@ extern int option_fetch_parse_recurse_submodules(const struct 
option *opt,
 const char *arg, int unset);
 extern int parse_update_recurse_submodules_arg(const char *opt, const char 
*arg);
 extern int parse_push_recurse_submodules_arg(const char *opt, const char *arg);
-extern int submodule_config_option(struct repository *repo,
-  const char *var, const char *value);
+extern void repo_read_gitmodules(struct repository *repo);
+extern void gitmodules_config_oid(const struct object_id *commit_oid);
 extern const struct submodule *submodule_from_name(
const struct object_id *commit_or_tree, const char *name);
 extern const struct submodule *submodule_from_path(
@@ -43,9 +43,6 @@ extern const struct submodule *submodule_from_path(
 extern const struct submodule *submodule_from_cache(struct repository *repo,
const struct object_id 
*treeish_name,
const char *key);
-extern int gitmodule_oid_from_commit(const struct object_id *commit_oid,
-struct object_id *gitmodules_oid,
-struct strbuf *rev);
 extern void submodule_free(void);
 
 #endif /* SUBMODULE_CONFIG_H */
diff --git a/submodule.c b/submodule.c
index f63940347..7ebd639f4 100644
--- a/submodule.c
+++ b/submodule.c
@@ -216,46 +216,11 @@ void load_submodule_cache(void)
gitmodules_config();
 }
 
-static int gitmodules_cb(const char *var, const char *value, void *data)
-{
-

[PATCH 14/15] unpack-trees: improve loading of .gitmodules

2017-07-25 Thread Brandon Williams

When recursing submodules 'check_updates()' needs to have strict control
over the submodule-config subsystem to ensure that the gitmodules file
has been read before checking cache entries which are marked for
removal as well ensuring the proper gitmodules file is read before
updating cache entries.

Because of this let's not rely on callers of 'check_updates()' to read
the gitmodules file before calling 'check_updates()' and handle the
reading explicitly.

Signed-off-by: Brandon Williams 
---
 unpack-trees.c | 42 ++
 1 file changed, 26 insertions(+), 16 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index dc66b880d..144c556c8 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -283,22 +283,28 @@ static int check_submodule_move_head(const struct 
cache_entry *ce,
}
 }
 
-static void reload_gitmodules_file(struct index_state *index,
-  struct checkout *state)
+/*
+ * Preform the loading of the repository's gitmodules file.  This function is
+ * used by 'check_update()' to perform loading of the gitmodules file in two
+ * differnt situations:
+ * (1) before removing entries from the working tree if the gitmodules file has
+ * been marked for removal.  This situation is specified by 'state' == 
NULL.
+ * (2) before checking out entries to the working tree if the gitmodules file
+ * has been marked for update.  This situation is specified by 'state' != 
NULL.
+ */
+static void load_gitmodules_file(struct index_state *index,
+struct checkout *state)
 {
-   int i;
-   for (i = 0; i < index->cache_nr; i++) {
-   struct cache_entry *ce = index->cache[i];
-   if (ce->ce_flags & CE_UPDATE) {
-   int r = strcmp(ce->name, ".gitmodules");
-   if (r < 0)
-   continue;
-   else if (r == 0) {
-   submodule_free();
-   checkout_entry(ce, state, NULL);
-   gitmodules_config();
-   } else
-   break;
+   int pos = index_name_pos(index, GITMODULES_FILE, 
strlen(GITMODULES_FILE));
+
+   if (pos >= 0) {
+   struct cache_entry *ce = index->cache[pos];
+   if (!state && ce->ce_flags & CE_WT_REMOVE) {
+   repo_read_gitmodules(the_repository);
+   } else if (state && (ce->ce_flags & CE_UPDATE)) {
+   submodule_free();
+   checkout_entry(ce, state, NULL);
+   repo_read_gitmodules(the_repository);
}
}
 }
@@ -371,6 +377,10 @@ static int check_updates(struct unpack_trees_options *o)
 
if (o->update)
git_attr_set_direction(GIT_ATTR_CHECKOUT, index);
+
+   if (should_update_submodules() && o->update && !o->dry_run)
+   load_gitmodules_file(index, NULL);
+
for (i = 0; i < index->cache_nr; i++) {
const struct cache_entry *ce = index->cache[i];
 
@@ -384,7 +394,7 @@ static int check_updates(struct unpack_trees_options *o)
remove_scheduled_dirs();
 
if (should_update_submodules() && o->update && !o->dry_run)
-   reload_gitmodules_file(index, );
+   load_gitmodules_file(index, );
 
for (i = 0; i < index->cache_nr; i++) {
struct cache_entry *ce = index->cache[i];
-- 
2.14.0.rc0.400.g1c36432dff-goog

[PATCH 09/15] submodule: remove submodule_config callback routine

2017-07-25 Thread Brandon Williams

Remove the last remaining caller of 'submodule_config()' as well as the
function itself.

With 'submodule_config()' being removed the submodule-config API can be
a little simpler as callers don't need to worry about whether or not
they need to overlay the repository's config on top of the
submodule-config.  This also makes it more difficult to accidentally
add non-submodule specific configuration to the .gitmodules file.

Signed-off-by: Brandon Williams 
---
 builtin/submodule--helper.c |  1 -
 submodule.c | 25 ++---
 submodule.h |  1 -
 3 files changed, 2 insertions(+), 25 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 25f471ba1..c16249e30 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1196,7 +1196,6 @@ static int absorb_git_dirs(int argc, const char **argv, 
const char *prefix)
 git_submodule_helper_usage, 0);
 
gitmodules_config();
-   git_config(submodule_config, NULL);
 
if (module_list_compute(argc, argv, prefix, , ) < 0)
return 1;
diff --git a/submodule.c b/submodule.c
index 13380fed1..f63940347 100644
--- a/submodule.c
+++ b/submodule.c
@@ -180,27 +180,6 @@ void set_diffopt_flags_from_submodule_config(struct 
diff_options *diffopt,
}
 }
 
-/* For loading from the .gitmodules file. */
-static int git_modules_config(const char *var, const char *value, void *cb)
-{
-   if (starts_with(var, "submodule."))
-   return parse_submodule_config_option(var, value);
-   return 0;
-}
-
-/* Loads all submodule settings from the config. */
-int submodule_config(const char *var, const char *value, void *cb)
-{
-   if (!strcmp(var, "submodule.recurse")) {
-   int v = git_config_bool(var, value) ?
-   RECURSE_SUBMODULES_ON : RECURSE_SUBMODULES_OFF;
-   config_update_recurse_submodules = v;
-   return 0;
-   } else {
-   return git_modules_config(var, value, cb);
-   }
-}
-
 /* Cheap function that only determines if we're interested in submodules at 
all */
 int git_default_submodule_config(const char *var, const char *value, void *cb)
 {
@@ -271,8 +250,8 @@ void gitmodules_config_oid(const struct object_id 
*commit_oid)
struct object_id oid;
 
if (gitmodule_oid_from_commit(commit_oid, , )) {
-   git_config_from_blob_oid(submodule_config, rev.buf,
-, NULL);
+   git_config_from_blob_oid(gitmodules_cb, rev.buf,
+, the_repository);
}
strbuf_release();
 }
diff --git a/submodule.h b/submodule.h
index f17ca1e34..1c6b2ab4e 100644
--- a/submodule.h
+++ b/submodule.h
@@ -41,7 +41,6 @@ extern int remove_path_from_gitmodules(const char *path);
 extern void stage_updated_gitmodules(void);
 extern void set_diffopt_flags_from_submodule_config(struct diff_options *,
const char *path);
-extern int submodule_config(const char *var, const char *value, void *cb);
 extern int git_default_submodule_config(const char *var, const char *value, 
void *cb);
 
 struct option;
-- 
2.14.0.rc0.400.g1c36432dff-goog

[PATCH 08/15] unpack-trees: don't rely on overlayed config

2017-07-25 Thread Brandon Williams

Don't rely on overlaying the repository's config on top of the
submodule-config, instead query the repository's config directory for
the submodule's update strategy.

Also remove the overlaying of the repository's config (via using
'submodule_config()') from the commands which use the unpack-trees
logic (checkout, read-tree, reset).

Signed-off-by: Brandon Williams 
---
 builtin/checkout.c |  2 +-
 submodule.c|  1 -
 unpack-trees.c | 12 +---
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 9661e1bcb..246e0cd16 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -858,7 +858,7 @@ static int git_checkout_config(const char *var, const char 
*value, void *cb)
}
 
if (starts_with(var, "submodule."))
-   return submodule_config(var, value, NULL);
+   return git_default_submodule_config(var, value, NULL);
 
return git_xmerge_config(var, value, NULL);
 }
diff --git a/submodule.c b/submodule.c
index f86b82fbb..13380fed1 100644
--- a/submodule.c
+++ b/submodule.c
@@ -235,7 +235,6 @@ void load_submodule_cache(void)
return;
 
gitmodules_config();
-   git_config(submodule_config, NULL);
 }
 
 static int gitmodules_cb(const char *var, const char *value, void *data)
diff --git a/unpack-trees.c b/unpack-trees.c
index dd535bc84..dc66b880d 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1,5 +1,6 @@
 #define NO_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "repository.h"
 #include "config.h"
 #include "dir.h"
 #include "tree.h"
@@ -255,13 +256,16 @@ static int check_submodule_move_head(const struct 
cache_entry *ce,
 {
unsigned flags = SUBMODULE_MOVE_HEAD_DRY_RUN;
const struct submodule *sub = submodule_from_ce(ce);
+   struct submodule_update_strategy update;
+
if (!sub)
return 0;
 
if (o->reset)
flags |= SUBMODULE_MOVE_HEAD_FORCE;
 
-   switch (sub->update_strategy.type) {
+   update = submodule_strategy_with_config_overlayed(the_repository, sub);
+   switch (update.type) {
case SM_UPDATE_UNSPECIFIED:
case SM_UPDATE_CHECKOUT:
if (submodule_move_head(ce->name, old_id, new_id, flags))
@@ -293,7 +297,6 @@ static void reload_gitmodules_file(struct index_state 
*index,
submodule_free();
checkout_entry(ce, state, NULL);
gitmodules_config();
-   git_config(submodule_config, NULL);
} else
break;
}
@@ -308,7 +311,10 @@ static void unlink_entry(const struct cache_entry *ce)
 {
const struct submodule *sub = submodule_from_ce(ce);
if (sub) {
-   switch (sub->update_strategy.type) {
+   struct submodule_update_strategy update =
+   submodule_strategy_with_config_overlayed(the_repository,
+sub);
+   switch (update.type) {
case SM_UPDATE_UNSPECIFIED:
case SM_UPDATE_CHECKOUT:
case SM_UPDATE_REBASE:
-- 
2.14.0.rc0.400.g1c36432dff-goog

[PATCH 02/15] submodule: don't use submodule_from_name

2017-07-25 Thread Brandon Williams

The function 'submodule_from_name()' is being used incorrectly here as a
submodule path is being used instead of a submodule name.  Since the
correct function to use with a path to a submodule is already being used
('submodule_from_path()') let's remove the call to
'submodule_from_name()'.

Signed-off-by: Brandon Williams 
---
 submodule.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/submodule.c b/submodule.c
index 7e87e4698..fd391aea6 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1177,8 +1177,6 @@ static int get_next_submodule(struct child_process *cp,
continue;
 
submodule = submodule_from_path(_oid, ce->name);
-   if (!submodule)
-   submodule = submodule_from_name(_oid, ce->name);
 
default_argv = "yes";
if (spf->command_line_option == RECURSE_SUBMODULES_DEFAULT) {
-- 
2.14.0.rc0.400.g1c36432dff-goog

[PATCH 00/15] submodule-config cleanup

2017-07-25 Thread Brandon Williams

The aim of this series is to cleanup the submodule-config and make it simpler
to use.  The two main parts to this series are:
(1) removing the ability to overlay the repository's config over the
submodule-config.  This makes the API clunky as you don't really know when
you want to overlay and when you don't.  So instead all the relevant
sections (where you are interested in the repository's config) are patched
to read the configuration directly from the repository's config.
(2) Add the ability to lazy-load the gitmodules file from the working
directory.  Most callers are required to first populate the
submodule-config by calling gitmodules_config.  Instead let's just
lazy-load it if needed.  Only a couple callers will still require loading
the gitmodules files by hand while the rest can have it lazy-loaded and no
longer need to explicitly load it themselves.  This falls more in line with
how specific revisions are already lazy-loaded.

As a side note, instead of having unpack-trees read configuration for the
'update' config (which is used by submodule update) we may just want to drop
respecting this all together as it doesn't make much sense in the context of a
checkout or reset.  If that's the case then we can make the parts of the code
which use 'update' even simpler.

This series is built on and requires the 'bw/grep-recurse-submodules' and
'bc/object-id' branches.

Brandon Williams (15):
  t7411: check configuration parsing errors
  submodule: don't use submodule_from_name
  add, reset: ensure submodules can be added or reset
  submodule--helper: don't overlay config in remote_submodule_branch
  submodule--helper: don't overlay config in update-clone
  fetch: don't overlay config with submodule-config
  submodule: don't rely on overlayed config when setting diffopts
  unpack-trees: don't rely on overlayed config
  submodule: remove submodule_config callback routine
  diff: stop allowing diff to have submodules configured in .git/config
  submodule-config: remove support for overlaying repository config
  submodule-config: move submodule-config functions to
submodule-config.c
  submodule-config: lazy-load a repository's .gitmodules file
  unpack-trees: improve loading of .gitmodules
  submodule: remove gitmodules_config

 builtin/add.c|   1 +
 builtin/checkout.c   |   3 +-
 builtin/commit.c |   1 -
 builtin/diff-files.c |   1 -
 builtin/diff-index.c |   1 -
 builtin/diff-tree.c  |   1 -
 builtin/diff.c   |   2 -
 builtin/fetch.c  |   5 --
 builtin/grep.c   |   4 --
 builtin/ls-files.c   |   6 +-
 builtin/mv.c |   1 -
 builtin/read-tree.c  |   2 -
 builtin/reset.c  |   3 +-
 builtin/rm.c |   1 -
 builtin/submodule--helper.c  |  42 ++--
 diff.c   |   3 -
 submodule-config.c   |  65 ++
 submodule-config.h   |   8 +--
 submodule.c  | 140 ---
 submodule.h  |   8 +--
 t/helper/test-submodule-config.c |   7 --
 t/t4027-diff-submodule.sh|  67 ---
 t/t7400-submodule-basic.sh   |  10 ---
 t/t7411-submodule-config.sh  |  87 +---
 unpack-trees.c   |  54 +--
 25 files changed, 189 insertions(+), 334 deletions(-)

-- 
2.14.0.rc0.400.g1c36432dff-goog

[PATCH 04/15] submodule--helper: don't overlay config in remote_submodule_branch

2017-07-25 Thread Brandon Williams

Don't rely on overlaying the repository's config on top of the
submodule-config, instead query the repository's config directly for the
branch field.

Signed-off-by: Brandon Williams 
---
 builtin/submodule--helper.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 1e49ce580..f71f4270d 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1066,17 +1066,24 @@ static int resolve_relative_path(int argc, const char 
**argv, const char *prefix
 static const char *remote_submodule_branch(const char *path)
 {
const struct submodule *sub;
+   const char *branch = NULL;
+   char *key;
+
gitmodules_config();
-   git_config(submodule_config, NULL);
 
sub = submodule_from_path(_oid, path);
if (!sub)
return NULL;
 
-   if (!sub->branch)
+   key = xstrfmt("submodule.%s.branch", sub->name);
+   if (repo_config_get_string_const(the_repository, key, ))
+   branch = sub->branch;
+   free(key);
+
+   if (!branch)
return "master";
 
-   if (!strcmp(sub->branch, ".")) {
+   if (!strcmp(branch, ".")) {
unsigned char sha1[20];
const char *refname = resolve_ref_unsafe("HEAD", 0, sha1, NULL);
 
@@ -1094,7 +1101,7 @@ static const char *remote_submodule_branch(const char 
*path)
return refname;
}
 
-   return sub->branch;
+   return branch;
 }
 
 static int resolve_remote_submodule_branch(int argc, const char **argv,
-- 
2.14.0.rc0.400.g1c36432dff-goog

Re: Should I store large text files on Git LFS?

2017-07-25 Thread Stefan Beller

On Tue, Jul 25, 2017 at 2:13 PM, Jeff King  wrote:
> On Tue, Jul 25, 2017 at 01:52:46PM -0700, Junio C Hamano wrote:
>
>> Jeff King  writes:
>>
>> > As you can see, core.bigfilethreshold is a pretty blunt instrument. It
>> > might be nice if .gitattributes understood other types of patterns
>> > besides filenames, so you could do something like:
>> >
>> >   echo '[size > 500MB] delta -diff' >.gitattributes
>> >
>> > or something like that. I don't think it's come up enough for anybody to
>> > care too much about it or work on it.
>>
>> But attributes is about paths, at which a blob may or may not exist,
>> so it is a bad fit to add conditionals that are based on sizes and
>> types.
>
> Do attributes _have_ to be about paths? In practice we often use them to
> describe objects, and paths are just the only mechanism we give to refer
> to objects.  But it is not actually a correct or rigorous mechanism in
> some cases.  For example, imagine I have a .gitattributes with:
>
>   foo -delta
>   bar delta
>
> and then imagine I have a tree with both "foo" and "bar" pointing to the
> same blob. When I run pack-objects, it wants to know whether to delta
> the object. What should it do?
>
> The delta decision is really a property of the object. But the only
> mechanism we give for selecting an object is by path, which we know is
> not a one-to-one mapping with objects. So the results you get will
> depend on which name we happened to see the object under first while
> traversing.
>
> I think the case you are getting at is something like clean filters,
> where we might not have an object at all. In that case I would argue
> that a property of an object could never be satisfied (so neither
> "size > 500" nor "size <= 500" could match). Whether object properties
> are meaningful is in the eye of the code that is looking up the value.
> Or more generally, the set of properties to be matched is in the eye of
> the caller. So looking up a clean filter might want to define the size
> property based no the working tree size.
>
> -Peff

I recall a similar discussion on the different "big repo" approaches.
Looking at the interface of LFS, there are things such as:

  git lfs fetch --recent
  git lfs fetch --all
  git lfs fetch [--exclude] 

so LFS provides both the way to address objects via time or by path,
maybe even combined "I want everything from  but only
'recent' things from ".

attributes can already be queried from pathspecs, and I think when
designing from scratch we might put it the other way round:

delta:
bar
everything <500m
-delta
foo
binaries

So in the far future, attributes may learn about more than just
pathspecs that we currently use to assign labels, but could
* include size
* properties derived from the 'file' utility
* be specific about certain objects (historic paths)

Re: [PATCH v2] doc: add missing values "none" and "default" for diff.wsErrorHighlight

2017-07-25 Thread Junio C Hamano

Andreas Heiduk  writes:

> The values have eluded documentation so far. While at it streamline
> the wording by grouping relevant parts together.
>
> Signed-off-by: Andreas Heiduk 
> ---
>  Documentation/diff-config.txt  | 11 +++
>  Documentation/diff-options.txt | 17 -
>  2 files changed, 15 insertions(+), 13 deletions(-)

Looks sensible; thanks.  Will queue.

Re: [PATCH v3 0/3] interpret-trailers: add --where, --if-exists, --if-missing

2017-07-25 Thread Junio C Hamano

Paolo Bonzini  writes:

> From: Paolo Bonzini 
>
> These options are useful to experiment with "git interpret-trailers"
> without having to tinker with .gitconfig (Junio said git should ahve
> done this first and only added configuration afterwards).  It can
> be useful in the case where you want a different placement for the trailer,
> or for scripts/aliases that don't want to rely on specific .gitconfig
> settings.
>
> Compared to v2, the main change is that option order on the command-line
> is respected.  That is,
>
>   --trailer 'acked-by: foo' --where end --trailer 'signed-off-by: me'
>
> will only apply where=end to the second trailer.  Likewise,
>
>   --where end --trailer 'signed-off-by: me' --no-where \
>   --trailer 'acked-by: foo'
>
> will only apply it to the first, reverting to trailer.*.where for the
> "acked-by" trailer.

I am getting the following in my build after merging these to 'pu'.

trailer.c: In function 'apply_arg_if_exists':
trailer.c:270:2: error: enumeration value 'EXISTS_DEFAULT' not handled in 
switch [-Werror=switch]
  switch (arg_tok->conf.if_exists) {
  ^
trailer.c: In function 'apply_arg_if_missing':
trailer.c:307:2: error: enumeration value 'MISSING_DEFAULT' not handled in 
switch [-Werror=switch]
  switch (arg_tok->conf.if_missing) {
  ^
trailer.c: In function 'process_command_line_args':
trailer.c:717:3: error: ISO C90 forbids mixed declarations and code 
[-Werror=declaration-after-statement]
   int separator_pos = find_separator(string, cl_separators);
   ^
cc1: all warnings being treated as errors
m

Re: [PATCH/RFC] setup: update error message to be more meaningful

2017-07-25 Thread Jonathan Nieder

Hi,

Kaartic Sivaraam wrote:

> The error message shown when a flag is found when expecting a
> filename wasn't clear as it didn't communicate what was wrong
> using the 'suitable' words in *all* cases.
>
> Correct case,
>
> $ git rev-parse commit.c --flags
> commit.c
> --flags
> fatal: bad flag '--flags' used after filename
>
> Incorrect case,
>
> $ git grep "test" -n
> fatal: bad flag '-n' used after filename
>
> Change the error message to be general and communicative.

Thanks for writing this.  These examples describe *what* the behavior
is but don't describe *why* it is wrong or what is expected in its
place.

For an initial guess: in the example

git grep test -n

it is confusing that it says "bad flag used after filename" because
test isn't even a filename!  At first glance, I would imagine that any
of the following behaviors would be nicer:

 1. Treat the command as "git grep -e test -n", or in other words
"do what I mean" since it is clear enough, at least to humans.

 2. Focus on "argument" instead of "filename" so that the message
could still apply: something like

fatal: option '-n' must come before non-option arguments

[...]
> --- a/setup.c
> +++ b/setup.c
> @@ -230,7 +230,7 @@ void verify_filename(const char *prefix,
>int diagnose_misspelt_rev)
>  {
>   if (*arg == '-')
> - die("bad flag '%s' used after filename", arg);
> + die("found flag '%s' in place of a filename", arg);

Probably because of the background I am missing described above, it's
not clear to me that the new message is any better (or worse) than the
existing one.  The old message with "after filename" has the virtue of
explaining why an option is not expected there.

Thanks and hope that helps,
Jonathan

Re: [PATCH/RFC] contrib: rerere-train overwrites existing resolutions

2017-07-25 Thread Junio C Hamano

Raman Gupta  writes:

> From 41116889f7eb2ddd590d165d9ca89646f7b15aaf Mon Sep 17 00:00:00 2001
> From: Raman Gupta 
> Date: Tue, 25 Jul 2017 10:28:35 -0400
> Subject: [PATCH] contrib: rerere-train overwrites existing resolutions

These do not belong to the body of the message.

Imagine that your title line appears in "git shortlog --no-merges"
output together with 600 other commits, and you see that list 3
months later.  Can you tell what this change was about?

To me, it sounds like a user is complaining that it overwrites and
loses a precious one, implying that the change is to fix it by being
more careful, i.e. checking if there is an existing one and avoid
overwriting it.

As our convention is to write the log message as if you are giving
an order to the codebase "to behave like so",

contrib/rerere-train: overwrite existing resolutions

would convey what you are doing better, I would think.

As to the change itself, I do anticipate that a user who is used to
rerere-train will complain that the updated one overwrites existing
resolutions, losing data, as the existing one didn't, and they are
used to the way to correct an incorrect resolution that is recorded
earlier, which is:

 * perform the botched merge again, which will materialize the
   botched merge result in the working tree;

 * do "git rerere forget" for the path;

 * correct the path with the right merge result; and then finally

 * do "git rerere".

If this new behaviour is triggered only when a command line option
is given (e.g. "--overwrite" which is better than "--force"), I
would imagine that it might give us an easier way to achieve the
same thing without learning about "rerere forget".  So I do not
fundamentally oppose to having such an option, but changing the
default behaviour is not going to fly.

> When running rerere-train, the user is explicitly asking the training to
> occur based on the current merge commit. However, if the current cache
> has a resolution for the same conflict (even if out of date or wrong),
> rerere-train does not currently update the rr-cache. Now, forget
> existing resolutions before training so that training is always
> reflective of the trained data.
> ---

Actually, the user is asking to record what is not recorded yet,
which is the originally intended use case.  You may fetch my 'pu'
branch into a repository you already have a copy of git.git, and
want to grab conflict resolutions I did between master..pu that you
still do not know about.  The tool is conservative and avoids
clobbering your resolution if you already have resolved some of the
merges yourself.

In any case, please make it a habit to sign-off your work when
posting here.

>  contrib/rerere-train.sh | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/contrib/rerere-train.sh b/contrib/rerere-train.sh
> index 52ad9e4..a222d38 100755
> --- a/contrib/rerere-train.sh
> +++ b/contrib/rerere-train.sh
> @@ -34,6 +34,7 @@ do
>   # Cleanly merges
>   continue
>   fi
> + git rerere forget .
>   if test -s "$GIT_DIR/MERGE_RR"
>   then
>   git show -s --pretty=format:"Learning from %h %s" "$commit"

Re: Should I store large text files on Git LFS?

2017-07-25 Thread Jeff King

On Tue, Jul 25, 2017 at 01:52:46PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > As you can see, core.bigfilethreshold is a pretty blunt instrument. It
> > might be nice if .gitattributes understood other types of patterns
> > besides filenames, so you could do something like:
> >
> >   echo '[size > 500MB] delta -diff' >.gitattributes
> >
> > or something like that. I don't think it's come up enough for anybody to
> > care too much about it or work on it.
> 
> But attributes is about paths, at which a blob may or may not exist,
> so it is a bad fit to add conditionals that are based on sizes and
> types.

Do attributes _have_ to be about paths? In practice we often use them to
describe objects, and paths are just the only mechanism we give to refer
to objects.  But it is not actually a correct or rigorous mechanism in
some cases.  For example, imagine I have a .gitattributes with:

  foo -delta
  bar delta

and then imagine I have a tree with both "foo" and "bar" pointing to the
same blob. When I run pack-objects, it wants to know whether to delta
the object. What should it do?

The delta decision is really a property of the object. But the only
mechanism we give for selecting an object is by path, which we know is
not a one-to-one mapping with objects. So the results you get will
depend on which name we happened to see the object under first while
traversing.

I think the case you are getting at is something like clean filters,
where we might not have an object at all. In that case I would argue
that a property of an object could never be satisfied (so neither
"size > 500" nor "size <= 500" could match). Whether object properties
are meaningful is in the eye of the code that is looking up the value.
Or more generally, the set of properties to be matched is in the eye of
the caller. So looking up a clean filter might want to define the size
property based no the working tree size.

-Peff

Re: [RFC] Git rerere and non-conflicting changes during conflict resolution

2017-07-25 Thread Jeff King

On Tue, Jul 25, 2017 at 01:26:34PM -0700, Junio C Hamano wrote:

> This is not even a limitation but is outside the scope of rerere.
> Let's understand that first.
> [...]
> If we wanted to port the "merge-fix" logic, and I do wish it would
> happen some day, the update belongs to "git merge".

Looks like this crossed with my latest email. Overall I agree with you.

I almost said the same thing about scope initially, but I decided it
doesn't really matter. From the user's perspective there may be a tool X
that replays bits of a previous merge result. And that task can be
subdivided into replaying conflict resolution and replaying merge-fixes.

>From the user's perspective, calling X "rerere" would probably be OK[1].
But from an implementation perspective (and to keep the existing
plumbing available and unchanged), it probably makes sense to call it
something else, and have it run both rerere and a new plumbing command
to do the merge-fix work (or call it nothing, and assume that users will
either touch the plumbing directly or will use "git merge" to trigger
both).

So if you want to shut down immediately the idea that this would be
bolted onto rerere, I can support that. There are ways of doing it that
would make sense to combine with rerere (like the "tying fixups to
conflict resolution" sketch I gave in the other email), but I agree they
will end up fundamentally hacky (because of the exact "you may not even
have textual conflicts" I mentioned).

The only part I'd disagree with above is that this belongs to git-merge.
I think it should be its own plumbing tool that merge calls alongside
rerere. ;)

>- Then, it looks up the database to find the keys  where
>  A is in X but not in Y, and B is not in X but in Y.
>  These commits are cherry-picked and squashed into the result of
>  the above.

I think this is the crux of it. I mentioned in my other email that what
we really want is some way to say "this is roughly the same merge".
The Reintegrate script does it with the topic branch name and an
implicit "merging up to an integration branch".

Not having thought too hard about it yet, this containing relationship
seems like the right direction. I guess you'd do the lookup by computing
the merge-base M of  (which we already know anyway), walking M..X
and looking for any entries which mention those commits (in either A or
B slots of the entry), and then similarly narrowing it according to
M..Y.

Hrm. That doesn't quite work, though. Because if your  are the
merge, then merging a topic to next will get an "A" that is a merge
commit from next. But that commit will never end up in master. What's
causing the conflict is really some "A" that is in the history between
the merge base and "A" (but we don't know which).

So you'd almost have to do an intersection of the left side of "$(git
merge-base A B)..A" with what's in X and Y (with respect to their merge
base). Err, maybe vice versa. But the point is that we're looking for
overlapping set unions, I think, not the presence of particular tips.

> I said A and B in the above are branch names, but in the ideal
> world, they can be commit object names (possibly in the middle of a
> branch), as long as we can reliable update the database's keys every
> time "git rebase" etc. rewrites commits.

What if instead of commit hashes we used patch ids?

There's one trick there, which is that merges don't have a well-defined
commit id. We could use its actual commit id in that case. That would
work OK in practice for a workflow like git.git's, because the merge
commits are never rewritten. But it would fall down if people do mixed
rebases and merges on their topic branches.

> To populate the database, we'd need a reverse.
> 
>  * When merging branch B into branch A (or the other way around) for
>the first time, "git merge" would do what it currently does.
> 
>  * The user then concludes the merge to resolve *ONLY* the textual
>conflict, and commit the result.  It is important that no
>additional evil merge to correct for semantic conflicts is done
>in this step.  Note that if the auto-merge cleanly succeeds, this
>step may not even exist.
> 
>  * Then the user makes another commit to correct for semantic
>conflicts (aka "merge-fix").

I think it's asking a lot for users to handle the textual conflicts and
semantic ones separately. It would be nice if we could tell them apart
automatically (and I think we can based on what isn't part of the
conflict resolution).

That still ends up with one giant "fixup" commit. But I don't know how
else you'd do it. I could make several commits, but we still don't know
how to attribute them to anything but the mass  merge. We don't
know which commits were responsible for which fixups (and I wouldn't
want to ask the user to figure it out), so the best we can do is apply
them all.

-Peff

[PATCH v2] doc: add missing values "none" and "default" for diff.wsErrorHighlight

2017-07-25 Thread Andreas Heiduk

The values have eluded documentation so far. While at it streamline
the wording by grouping relevant parts together.

Signed-off-by: Andreas Heiduk 
---
 Documentation/diff-config.txt  | 11 +++
 Documentation/diff-options.txt | 17 -
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index cbce8ec63..5ca942ab5 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -200,7 +200,10 @@ diff.algorithm::
 +
 
 diff.wsErrorHighlight::
-   A comma separated list of `old`, `new`, `context`, that
-   specifies how whitespace errors on lines are highlighted
-   with `color.diff.whitespace`.  Can be overridden by the
-   command line option `--ws-error-highlight=`
+   Highlight whitespace errors in the `context`, `old` or `new`
+   lines of the diff.  Multiple values are separated by comma,
+   `none` resets previous values, `default` reset the list to
+   `new` and `all` is a shorthand for `old,new,context`.  The
+   whitespace errors are colored with `color.diff.whitespace`.
+   The command line option `--ws-error-highlight=`
+   overrides this setting.
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 89cc0f48d..d60f61ad4 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -300,15 +300,14 @@ ifndef::git-format-patch[]
with --exit-code.
 
 --ws-error-highlight=::
-   Highlight whitespace errors on lines specified by 
-   in the color specified by `color.diff.whitespace`.  
-   is a comma separated list of `old`, `new`, `context`.  When
-   this option is not given, only whitespace errors in `new`
-   lines are highlighted.  E.g. `--ws-error-highlight=new,old`
-   highlights whitespace errors on both deleted and added lines.
-   `all` can be used as a short-hand for `old,new,context`.
-   The `diff.wsErrorHighlight` configuration variable can be
-   used to specify the default behaviour.
+   Highlight whitespace errors in the `context`, `old` or `new`
+   lines of the diff.  Multiple values are separated by comma,
+   `none` resets previous values, `default` reset the list to
+   `new` and `all` is a shorthand for `old,new,context`.  When
+   this option is not given, and the configuration variable
+   `diff.wsErrorHighlight` is not set, only whitespace errors in
+   `new` lines are highlighted. The whitespace errors are colored
+   whith `color.diff.whitespace`.
 
 endif::git-format-patch[]
 
-- 
2.13.3

Re: Should I store large text files on Git LFS?

2017-07-25 Thread Junio C Hamano

Jeff King  writes:

> As you can see, core.bigfilethreshold is a pretty blunt instrument. It
> might be nice if .gitattributes understood other types of patterns
> besides filenames, so you could do something like:
>
>   echo '[size > 500MB] delta -diff' >.gitattributes
>
> or something like that. I don't think it's come up enough for anybody to
> care too much about it or work on it.

But attributes is about paths, at which a blob may or may not exist,
so it is a bad fit to add conditionals that are based on sizes and
types.

Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility

2017-07-25 Thread Junio C Hamano

Johannes Schindelin  writes:

> On Sat, 22 Jul 2017, Junio C Hamano wrote:
>
>> Johannes Schindelin  writes:
>> 
>> >> >> A very small hack on gettext.
>> >
>> > I am 100% opposed to this hack. It is already cumbersome enough to find
>> > out what is involved in i18n (it took *me* five minutes to find out that
>> > much of the information is in po/README, with a lot of information stored
>> > *on an external site*, and I still managed to miss the `make pot` target).
>> >
>> > If at all, we need to make things easier instead of harder.
>> >
>> > Requiring potential volunteers to waste their time to compile an
>> > unnecessary fork of gettext? Not so great an idea.
>> >
>> > Plus, each and every Git build would now have to compile their own
>> > gettext, too, as the vanilla one would not handle the .po files containing
>> > %!!!
>> >
>> > And that requirement would impact instantaneously people like me, and even
>> > worse: some other packagers might be unaware of the new requirement which
>> > would not be caught during the build, and neither by the test suite.
>> > Double bad idea.
>> 
>> If I understand correctly, the patch hacks the input processing of
>> xgettext (which reads our source code and generates po/git.pot) so
>> that when it sees PRItime, pretend that it saw PRIuMAX, causing it
>> to output % in its output.
>
> Oh, I missed that. That's even worse, as it precludes what you were
> wishing for: to replace timestamp_t by a signed data type eventually.

Yup, Jiang's plan was to update the custom edition of xgettext when
it happens and the Makefile patch has a provision to avoid mistakes.

If Jiang's patch were extended so that xgettext would take a command
line option "--custom-priformat=PRItime=PRIuMAX" and upstreamed
and wildy deployed already, that would have been a good solution.
That might be a preferred outcome that may benefit other projects,
but it won't happen for at least 3 years if not more X-<.

Re: [PATCH] git-gui (MinGW): make use of MSys2's msgfmt

2017-07-25 Thread Junio C Hamano

Johannes Schindelin  writes:

>   Pat, I still have a couple of Pull Request open in your repository
>   at https://github.com/patthoyts/git-gui/pulls/dscho that await any
>   reaction since October 14th last year. Please let me know when you
>   are ready to accept code contributions again. In the meantime, I
>   will send git-gui patches to this here mailing list, Cc:ing you,
>   and hoping that Junio will take the patches.

I'll try to fetch the latest from Pat, apply this on top and queue
it in my tree in the meantime, but that is a lot more involved than
simply applying a single patch to a new branch and merging it to
'pu', so it will be done later in today's integration cycle after I
handle other topics.

Thanks for working on this.

Re: [RFC PATCH 0/3] Partial clone: promised blobs (formerly "missing blobs")

2017-07-25 Thread Philip Oakley

Sorry for the delay - been away...

From: "Ben Peart" 
Sent: Monday, July 17, 2017 6:43 PM

On 7/16/2017 11:23 AM, Philip Oakley wrote:

From: "Jonathan Tan" 
Sent: Tuesday, July 11, 2017 8:48 PM

These patches are part of a set of patches implementing partial clone,
as you can see here:

https://github.com/jonathantanmy/git/tree/partialclone

In that branch, clone with batch checkout works, as you can see in the
README. The code and tests are generally done, but some patches are
still missing documentation and commit messages.

These 3 patches implement the foundational concept - formerly known as
"missing blobs" in the "missing blob manifest", I decided to call them
"promised blobs". The repo knows their object names and sizes. It also
does not have the blobs themselves, but can be configured to know how to
fetch them.

If I understand correctly, this method doesn't give any direct user 
visibility of missing blobs in the file system. Is that correct?

That is correct

I was hoping that eventually the various 'on demand' approaches would 
still allow users to continue to work as they go off-line such that they 
can see directly (in the FS) where the missing blobs (and trees) are 
located, so that they can continue to commit new work on existing files.

This is a challenge as git assumes all objects are always available (that 
is a key design principal of a DVCS) so any missing object is considered a 
corruption that typically results in a call to "die."

My view/concept was more based on the fact that Git is happy to have missing 
'trees', as long as they are submodules ;-), so I was hoping to massage that 
so git could carry on working as if the whole 'tree' (or blob when they were 
omitted) was still present in as 'unchanged', so the oid's would stay as 
they were.

I see that you don't omit the trees, which would be more common in my type 
of environment (defence/security). I expect in an idealised BigWin repo the 
same would also be true - user only gets /Office/Excel if that's what they 
are working on ;-)

The GVFS solution gets around this by ensuring any missing object is 
retrieved on behalf of git so that it never sees it as missing.  The 
obvious tradeoff is that this requires a network connection so the object 
can be retrieved.

In my concept, the user would not have the opportunity to fetch the 
tree/blob, but could replace it in its entirety (we'd still have the meta 
data of the tree/blob name and it's old oid, but couldn't do a diff)

I had felt that some sort of 'gitlink' should be present (huma readable) 
as a place holder for the missing blob/tree. e.g. 'gitblob: 1234abcd' 
(showing the missing oid, jsut like sub-modules can do - it's no 
different really.

We explored that option briefly but when you have a large number of files, 
even writing out some sort of place holder can take a very long time.  In 
fact, since the typical source file is relatively small (a few kilobytes), 
writing out a placeholder doesn't save much time vs just writing out the 
actual file contents.

Another challenge is that even if there is a placeholder written to disk, 
you still need a network connection to retrieve the actual contents 
if/when it is needed.

I was viewing the 'missing' tree/blobs to be part of a narrow clone concept, 
so the user would need to explicitly widen the narrow clone to get missing 
trees/blobs (which could have been omitted by age, size, name, style of a 
.gitNarrowIgnore spec etc)

I'm concerned that the various GVFS extensions haven't fully achieved a 
separation of concerns surrounding the DVCS capability for 
on-line/off-line conversion as comms drop in and out. The GVFS looks 
great for a fully networked, always on, environment, but it would be good 
to also have the sepration for those who (will) have shallow/narrow 
clones that may also need to work with a local upstream that is also 
shallow/narrow.

You are correct that this hasn't been tackled yet. It is a challenging 
problem. I can envision something along the lines of what was done for the 
shallow clone feature where there are distinct ways to change the set of 
objects that are available but that would hopefully come in some future 
patch series.

OK. That's good to know. If the GFVS could be expanded to create a type of 
Narrow Clone capability  so that the 'going off-line' problem easily 
transitions between being just a neat VFS and then to being a neat narrow 
clone, and  that it may solve two problems in one.

I had it in my mind that the missing blobs/trees could be simply stubbed out 
within the repo itself, as just the oid ref, or maybe the oid ref plus the 
length (given that size is one of the common causes on not wanting the 
content just yet). The repo could still be packed etc, as long as the format 
is understood.

--
Philip

--
Philip
I wanted to at least get my thoughts into the discussion before it all 
passes by.

Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility

2017-07-25 Thread Junio C Hamano

Johannes Schindelin  writes:

> Hi Junio,
>
> On Sat, 22 Jul 2017, Junio C Hamano wrote:
>
>> Johannes Schindelin  writes:
>> 
>> > On Fri, 21 Jul 2017, Junio C Hamano wrote:
>> >
>> >> Jean-Noël Avila  writes:
>> >> 
>> >> > Le 20/07/2017 à 20:57, Junio C Hamano a écrit :
>> >> >>
>> >> >> +  git diff --quiet HEAD && git diff --quiet --cached
>> >> >> +
>> >> >> +  @for s in $(LOCALIZED_C) $(LOCALIZED_SH) $(LOCALIZED_PERL); \
>> >> >
>> >> > Does PRIuMAX make sense for perl and sh files?
>> >> 
>> >> Not really; I did this primarily because I would prefer to keep
>> >> things consistent, anticipating there may be some other things we
>> >> need to replace before running gettext(1) for other reasons later.
>> >
>> > It would add unnecessary churn, too, to add those specific exclusions and
>> > make things inconsistent: the use of PRItime in Perl or shell scripts
>> > would already make those scripts barf. And if it is unnecessary churn...
>> > let's not do it?
>> 
>> Sorry, but I cannot quite tell if you are in favor of limiting the
>> set of source files that go through the sed substitution (because we
>> know PRIuMAX is just as nonsensical as PRItime in perl and shell
>> source), or if you are in favor of keeping the patch as-is (because
>> changing the set of source files is a churn and substitutions would
>> not hurt)?
>
> I was in favor of keeping the simplest strategy: simply cover all files,
> including Perl and Unix shell scripts. It would not bring any benefit to
> exclude them.

OK.  I actually was OK to limit the potential damage to C sources,
but it does not matter that much in the bigger picture.

Re: [RFC] Git rerere and non-conflicting changes during conflict resolution

2017-07-25 Thread Junio C Hamano

Junio C Hamano  writes:

> To populate the database, we'd need a reverse.
> ...
>  * Then the user tells Git that semantic conflicts were resolved and
>need to be recorded (just like running "git rerere" manually,
>before "git commit" automatically does it for them these days).
>This will result in the following:
>
>- The database is updated so that key  yields the
>  "merge-fix" commit;
> ...

I probably should have been aiming for stars, as I were outlining
the ideal merge-fix logic.  The key  is merely a default, and
the worst one at that.  There should be a way for the user to tell
which exact pair of commits (i.e. another side branch that was
merged earlier to the mainline A that renamed 'xyzzy' to 'frotz'
wholesale, and the exact commit on the side branch B that added an
extra mention of 'xyzzy').  

If the logic can figure out what these two commits are without
user's help, mechanically by only looking at the merge-fix commit,
that would be even better.  But I do not believe in miracles, so...

Re: [PATCH v2 2/2] sub-process: refactor handshake to common function

2017-07-25 Thread Brandon Williams

On 07/25, Jonathan Tan wrote:
> Refactor, into a common function, the version and capability negotiation
> done when invoking a long-running process as a clean or smudge filter.
> This will be useful for other Git code that needs to interact similarly
> with a long-running process.
> 
> As you can see in the change to t0021, this commit changes the error
> message reported when the long-running process does not introduce itself
> with the expected "server"-terminated line. Originally, the error
> message reports that the filter "does not support filter protocol
> version 2", differentiating between the old single-file filter protocol
> and the new multi-file filter protocol - I have updated it to something
> more generic and useful.
> 
> Signed-off-by: Jonathan Tan 
> ---
>  convert.c | 61 -
>  pkt-line.c| 19 ---
>  pkt-line.h|  2 --
>  sub-process.c | 94 
> +++
>  sub-process.h | 26 ++
>  t/t0021-conversion.sh |  2 +-
>  6 files changed, 128 insertions(+), 76 deletions(-)
> 
> diff --git a/convert.c b/convert.c
> index deaf0ba7b..ec8ecc2ea 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -512,62 +512,15 @@ static struct hashmap subprocess_map;
>  
>  static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
>  {
> - int err;
> + static int versions[] = {2, 0};
> + static struct subprocess_capability capabilities[] = {
> + {"clean", CAP_CLEAN}, {"smudge", CAP_SMUDGE}, {NULL, 0}
> + };
>   struct cmd2process *entry = (struct cmd2process *)subprocess;
> - struct string_list cap_list = STRING_LIST_INIT_NODUP;
> - char *cap_buf;
> - const char *cap_name;
> - struct child_process *process = >process;
> - const char *cmd = subprocess->cmd;
> -
> - sigchain_push(SIGPIPE, SIG_IGN);
> -
> - err = packet_writel(process->in, "git-filter-client", "version=2", 
> NULL);
> - if (err)
> - goto done;
> -
> - err = strcmp(packet_read_line(process->out, NULL), "git-filter-server");
> - if (err) {
> - error("external filter '%s' does not support filter protocol 
> version 2", cmd);
> - goto done;
> - }
> - err = strcmp(packet_read_line(process->out, NULL), "version=2");
> - if (err)
> - goto done;
> - err = packet_read_line(process->out, NULL) != NULL;
> - if (err)
> - goto done;
> -
> - err = packet_writel(process->in, "capability=clean", 
> "capability=smudge", NULL);
> -
> - for (;;) {
> - cap_buf = packet_read_line(process->out, NULL);
> - if (!cap_buf)
> - break;
> - string_list_split_in_place(_list, cap_buf, '=', 1);
> -
> - if (cap_list.nr != 2 || strcmp(cap_list.items[0].string, 
> "capability"))
> - continue;
> -
> - cap_name = cap_list.items[1].string;
> - if (!strcmp(cap_name, "clean")) {
> - entry->supported_capabilities |= CAP_CLEAN;
> - } else if (!strcmp(cap_name, "smudge")) {
> - entry->supported_capabilities |= CAP_SMUDGE;
> - } else {
> - warning(
> - "external filter '%s' requested unsupported 
> filter capability '%s'",
> - cmd, cap_name
> - );
> - }
> -
> - string_list_clear(_list, 0);
> - }
> -
> -done:
> - sigchain_pop(SIGPIPE);
>  
> - return err;
> + return subprocess_handshake(subprocess, "git-filter-", versions, NULL,
> + capabilities,
> + >supported_capabilities);
>  }
>  
>  static int apply_multi_file_filter(const char *path, const char *src, size_t 
> len,
> diff --git a/pkt-line.c b/pkt-line.c
> index 9d845ecc3..7db911957 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -171,25 +171,6 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
>   return status;
>  }
>  
> -int packet_writel(int fd, const char *line, ...)
> -{
> - va_list args;
> - int err;
> - va_start(args, line);
> - for (;;) {
> - if (!line)
> - break;
> - if (strlen(line) > LARGE_PACKET_DATA_MAX)
> - return -1;
> - err = packet_write_fmt_gently(fd, "%s\n", line);
> - if (err)
> - return err;
> - line = va_arg(args, const char*);
> - }
> - va_end(args);
> - return packet_flush_gently(fd);
> -}
> -
>  static int packet_write_gently(const int fd_out, const char *buf, size_t 
> size)
>  {
>   static char packet_write_buffer[LARGE_PACKET_MAX];
> diff --git a/pkt-line.h b/pkt-line.h
> index 450183b64..66ef610fc 100644
> --- a/pkt-line.h
> +++ b/pkt-line.h

Re: [RFC] Git rerere and non-conflicting changes during conflict resolution

2017-07-25 Thread Junio C Hamano

Jeff King  writes:

>> 1) Is this a known limitation or is there a reason rerere works in
>> this manner?
>
> Yes, it's known. Rerere works by storing a mapping of conflicted hunks
> to their resolutions. If there's no conflicted hunk, I'm not sure how
> we'd decide what to feed into the mapping to see if there is some
> content to be replaced.

Correct.  

This is not even a limitation but is outside the scope of rerere.
Let's understand that first.

A semantic conflict that requires an evil merge that touches a file
that is not involved in any textual conflict during a merge will
happen even if there is *NO* textual merge conflict.  

Imagine that there is a global variable 'xyzzy' used in many places
in the code, and then a side branch forks from the mainline.  The
mainline renames the variable to 'frotz' in the entire codebase,
while the side branch adds one more place that the variable is used
under its original name.  Then you merge these two branches.  This
will textually merge cleanly if the place the side branch adds a new
mention of 'xyzzy' is textually far from any block of text in the
common ancestor that has been updated on the mainline while these
two branches diverged.

"git checkout mainline && git merge side" will cleanly automerge,
yet the result is not correct.  The new mention of 'xyzzy' added by
the merge needs to be corrected to 'frotz'.

Now we take that as the baseline and further imagine that during the
time these two branches diverged, the mainline also updated some
documentation of something totally unrelated to 'xyzzy' vs 'frotz'
variable.  Perhaps README was updated, or something.  The side
branch also updated the same file in a different way.  This time,
the changes to this same file may result in textual conflict.

"git checkout mainline && git merge side" will result in a conflict,
whose resolution may be recorded by rerere for that file.  It should
be crystal clear that this conflict does *not* have anything to do
with the semantic conflict between 'xyzzy' vs 'frotz'.  

The realization we must draw from the above observation is that what
the "merge-fix" machinery in the Reintegrate script you cited in
your message tries to help, which is the semantic conflict,
fundamentally cannot be tied to any textual merge conflict that may
(or may not) happen.  That is what makes the issue outside the scope
of rerere.

The above is not to say that the need to record and replay such evil
merges to solve semantic conflict does not exist.  Far from it.  It
is just clarifying that it is a wrong approach to try to "teach"
rerere to somehow handle that case as well.

If we wanted to port the "merge-fix" logic, and I do wish it would
happen some day, the update belongs to "git merge".

You were too kind to call the "merge-fix" logic in Reintegrate "the
state of the art", but I am not happy about its limitation.  Here
are the things I wish to have in an ideal version of the "merge-fix"
logic, which does not exist yet:

 * There is a database of "to be cherry-picked" commits, keyed by a
   pair of branch names.  That is, given branches A and B, the
   database will return 0 or more commits that can be cherry-picked.
   The order of branch names in the pair is immaterial, i.e. asking
   the database for cherry-pickable commits keyed by  and
   keyed by  will yield the identical set of commits.

 * When merging commit X to commit Y, "git merge" in the ideal world
   does the following:

   - It first does what it currently does, i.e. three-way merge with
 the merge strategy and applying rerere for re-application of an
 earlier resolution of textual conflicts.

   - Then, it looks up the database to find the keys  where
 A is in X but not in Y, and B is not in X but in Y.
 These commits are cherry-picked and squashed into the result of
 the above.

The intent is that a pair  represents the mainline and side
branch in the above example, where A renamed 'xyzzy' to 'frotz' and
B added new reference to 'xyzzy'.  And the cherry-pickable commit
found in the database is to tweak the 'xyzzy' B adds into 'frotz'.

I said A and B in the above are branch names, but in the ideal
world, they can be commit object names (possibly in the middle of a
branch), as long as we can reliable update the database's keys every
time "git rebase" etc. rewrites commits.

To populate the database, we'd need a reverse.

 * When merging branch B into branch A (or the other way around) for
   the first time, "git merge" would do what it currently does.

 * The user then concludes the merge to resolve *ONLY* the textual
   conflict, and commit the result.  It is important that no
   additional evil merge to correct for semantic conflicts is done
   in this step.  Note that if the auto-merge cleanly succeeds, this
   step may not even exist.

 * Then the user makes another commit to correct for semantic
   conflicts (aka "merge-fix").

 * Then the user tells Git that

Re: [RFC] Git rerere and non-conflicting changes during conflict resolution

2017-07-25 Thread Jeff King

On Tue, Jul 25, 2017 at 03:54:32PM -0400, Raman Gupta wrote:

> > That said, I'm far from an expert on how rerere works. Junio might have
> > ideas on how we could handle this better. But I do note that for
> > repeated integration runs (like we do for topics in git.git, as they get
> > merged to "pu", then "next", then "master"), he keeps non-conflict
> > fixups in a separate commit which gets squashed into the merge
> > automatically. See
> > 
> >   https://github.com/git/git/blob/todo/Reintegrate#L185-L191
> 
> Seems relatively simple to me, at least conceptually.
> 
> 1) Store the state of the index after the merge.
> 
> 2) After conflict resolution is complete (i.e. user executes "git
> commit"), diff index @ step 1 with commit.
> 
> 3) Assume that all changes in that diff are related to conflict
> resolution (as they should be), and save that diff to the rerere cache.
> 
> I could be missing something fundamental here though...

That describes the "save" step. How do we do the apply step? We later do
a merge and see that there are some conflicts. We compute a hash for
each conflict, and see if it has a resolution in the database.

But how do we decide which hashes to look up for the parts that are
outside of a conflict?

In theory you want those fixups to "ride along" with the other
conflicts. But which fixups ride along with which conflicts? You really
want some way to uniquely identify a particular merge. But of course you
can't use the content, because the whole point of rerere is that you're
doing a _similar_ merge in another context. And that's why Reintegrate
ties it to the branch being merged. That gives us the unique identifier.
But rerere normally does not concern itself with that.

I think you could mark the fixups as potential ride-alongs for each
conflict that's resolved. And then when we apply a resolution, tell the
user "by the way, you might want these fixups, too" (or maybe even try
to apply them automatically). That will suggest fixups that may not be
relevant anymore, but if the next "save" includes a particular
resolution but _not_ a potential fixup, then that fixup can be culled
from its list.

There's still a corner case that doesn't handle, though: you might need
fixups even if you have no conflicts at all.

So we may be stumbling towards an answer, but it still sounds kind of
hacky and fragile.

> > As far as I know, something like the Reintegrate script above is the
> > state of the art. IMHO it would be useful if something similar were
> > integrated into rerere, but I'm not sure exactly how it would know when
> > to trigger.
> 
> I've seen the Reintegrate script before. It is very specific to the
> git.git workflow. I think it makes sense to expose this particular
> capability in git proper, given that rerere itself is exposed. Plus
> that could actually simplify the Reintegrate script a bit.

Right, sorry if I wasn't clear; my "something similar" meant only the
fixup-applying lines I linked to.

> > But I don't think even then you can ever trust rerere fully.
> > Fundamentally you're applying some changes from one merge into another
> > context. There may be new sites that also need fixing up, and the tool
> > has no way to know. So you should treat a rerere-helped merge as any
> > other merge: assume it's a good starting point but use other tools (like
> > the compiler or automated tests) to confirm that the result is sensible.
> 
> While I agree that every case cannot be handled, and this type of
> validation is still necessary, I believe git should cover the cases
> which it is proper and possible to cover.

Yes, I agree that some help is usually better than none (although we
need to be careful that any improvement is honest about its limitations
for any given case, and doesn't end up tricking the user into a false
sense of security).

-Peff

Re: [PATCH v2 1/2] Documentation: migrate sub-process docs to header

2017-07-25 Thread Brandon Williams

On 07/25, Jonathan Tan wrote:
> Move the documentation for the sub-process API from a separate txt file
> to its header file.
> 
> Signed-off-by: Jonathan Tan 

I really like this change!

> ---
>  Documentation/technical/api-sub-process.txt | 59 
> -
>  sub-process.h   | 25 +++-
>  2 files changed, 23 insertions(+), 61 deletions(-)
>  delete mode 100644 Documentation/technical/api-sub-process.txt
> 
> diff --git a/Documentation/technical/api-sub-process.txt 
> b/Documentation/technical/api-sub-process.txt
> deleted file mode 100644
> index 793508cf3..0
> --- a/Documentation/technical/api-sub-process.txt
> +++ /dev/null
> @@ -1,59 +0,0 @@
> -sub-process API
> -===
> -
> -The sub-process API makes it possible to run background sub-processes
> -for the entire lifetime of a Git invocation. If Git needs to communicate
> -with an external process multiple times, then this can reduces the process
> -invocation overhead. Git and the sub-process communicate through stdin and
> -stdout.
> -
> -The sub-processes are kept in a hashmap by command name and looked up
> -via the subprocess_find_entry function.  If an existing instance can not
> -be found then a new process should be created and started.  When the
> -parent git command terminates, all sub-processes are also terminated.
> -
> -This API is based on the run-command API.
> -
> -Data structures
> 
> -
> -* `struct subprocess_entry`
> -
> -The sub-process structure.  Members should not be accessed directly.
> -
> -Types
> --
> -
> -'int(*subprocess_start_fn)(struct subprocess_entry *entry)'::
> -
> - User-supplied function to initialize the sub-process.  This is
> - typically used to negotiate the interface version and capabilities.
> -
> -
> -Functions
> --
> -
> -`cmd2process_cmp`::
> -
> - Function to test two subprocess hashmap entries for equality.
> -
> -`subprocess_start`::
> -
> - Start a subprocess and add it to the subprocess hashmap.
> -
> -`subprocess_stop`::
> -
> - Kill a subprocess and remove it from the subprocess hashmap.
> -
> -`subprocess_find_entry`::
> -
> - Find a subprocess in the subprocess hashmap.
> -
> -`subprocess_get_child_process`::
> -
> - Get the underlying `struct child_process` from a subprocess.
> -
> -`subprocess_read_status`::
> -
> - Helper function to read packets looking for the last "status="
> - key/value pair.
> diff --git a/sub-process.h b/sub-process.h
> index 96a2cca36..9e6975b5e 100644
> --- a/sub-process.h
> +++ b/sub-process.h
> @@ -6,12 +6,23 @@
>  #include "run-command.h"
>  
>  /*
> - * Generic implementation of background process infrastructure.
> - * See: Documentation/technical/api-sub-process.txt
> + * The sub-process API makes it possible to run background sub-processes
> + * for the entire lifetime of a Git invocation. If Git needs to communicate
> + * with an external process multiple times, then this can reduces the process
> + * invocation overhead. Git and the sub-process communicate through stdin and
> + * stdout.
> + *
> + * The sub-processes are kept in a hashmap by command name and looked up
> + * via the subprocess_find_entry function.  If an existing instance can not
> + * be found then a new process should be created and started.  When the
> + * parent git command terminates, all sub-processes are also terminated.
> + * 
> + * This API is based on the run-command API.
>   */
>  
>   /* data structures */
>  
> +/* Members should not be accessed directly. */
>  struct subprocess_entry {
>   struct hashmap_entry ent; /* must be the first member! */
>   const char *cmd;
> @@ -20,21 +31,31 @@ struct subprocess_entry {
>  
>  /* subprocess functions */
>  
> +/* Function to test two subprocess hashmap entries for equality. */
>  extern int cmd2process_cmp(const void *unused_cmp_data,
>  const struct subprocess_entry *e1,
>  const struct subprocess_entry *e2,
>  const void *unused_keydata);
>  
> +/*
> + * User-supplied function to initialize the sub-process.  This is
> + * typically used to negotiate the interface version and capabilities.
> + */
>  typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
> +
> +/* Start a subprocess and add it to the subprocess hashmap. */
>  int subprocess_start(struct hashmap *hashmap, struct subprocess_entry 
> *entry, const char *cmd,
>   subprocess_start_fn startfn);
>  
> +/* Kill a subprocess and remove it from the subprocess hashmap. */
>  void subprocess_stop(struct hashmap *hashmap, struct subprocess_entry 
> *entry);
>  
> +/* Find a subprocess in the subprocess hashmap. */
>  struct subprocess_entry *subprocess_find_entry(struct hashmap *hashmap, 
> const char *cmd);
>  
>  /* subprocess helper functions */
>  
> +/* Get the underlying `struct child_process` from a subprocess. */

Re: [RFC] Git rerere and non-conflicting changes during conflict resolution

2017-07-25 Thread Raman Gupta

On 25/07/17 01:52 PM, Jeff King wrote:
> On Tue, Jul 25, 2017 at 11:09:55AM -0400, Raman Gupta wrote:
> 
>> I had an interesting situation today: resolving a merge conflict
>> required modification in other files that were not themselves conflicting.
>>
>> I just realized that rerere does not remember any changes to these
>> additional files -- only changes to the conflicting files. This makes
>> the end result of rerere obviously incorrect in this situation.
>>
>> So my questions are:
>>
>> 1) Is this a known limitation or is there a reason rerere works in
>> this manner?
> 
> Yes, it's known. Rerere works by storing a mapping of conflicted hunks
> to their resolutions. If there's no conflicted hunk, I'm not sure how
> we'd decide what to feed into the mapping to see if there is some
> content to be replaced.
> 
> That said, I'm far from an expert on how rerere works. Junio might have
> ideas on how we could handle this better. But I do note that for
> repeated integration runs (like we do for topics in git.git, as they get
> merged to "pu", then "next", then "master"), he keeps non-conflict
> fixups in a separate commit which gets squashed into the merge
> automatically. See
> 
>   https://github.com/git/git/blob/todo/Reintegrate#L185-L191

Seems relatively simple to me, at least conceptually.

1) Store the state of the index after the merge.

2) After conflict resolution is complete (i.e. user executes "git
commit"), diff index @ step 1 with commit.

3) Assume that all changes in that diff are related to conflict
resolution (as they should be), and save that diff to the rerere cache.

I could be missing something fundamental here though...

>> 1b) If it is a limitation/bug, what would be needed to fix it? With
>> some guidance, I might be able to submit a patch...
> 
> As far as I know, something like the Reintegrate script above is the
> state of the art. IMHO it would be useful if something similar were
> integrated into rerere, but I'm not sure exactly how it would know when
> to trigger.

I've seen the Reintegrate script before. It is very specific to the
git.git workflow. I think it makes sense to expose this particular
capability in git proper, given that rerere itself is exposed. Plus
that could actually simplify the Reintegrate script a bit.

>> 2) In the meantime, is there a way I can identify these cases, without
>> which I cannot really trust rerere is doing the right thing?
> 
> I do think it would be useful if rerere could look at a merge result and
> say "OK, I've recorded these bits, but there are other lines that are
> not part of either parent and which are not part of a conflict". That
> gives you a warning that such lines need to be part of a fixup (rather
> than you being surprised when you redo the merge later and have to
> rework the fixup).

Agreed.

> But I don't think even then you can ever trust rerere fully.
> Fundamentally you're applying some changes from one merge into another
> context. There may be new sites that also need fixing up, and the tool
> has no way to know. So you should treat a rerere-helped merge as any
> other merge: assume it's a good starting point but use other tools (like
> the compiler or automated tests) to confirm that the result is sensible.

While I agree that every case cannot be handled, and this type of
validation is still necessary, I believe git should cover the cases
which it is proper and possible to cover.

> -Peff
> 

Regards,
Raman

Re: What's cooking in git.git (Jul 2017, #07; Mon, 24)

2017-07-25 Thread Junio C Hamano

Johannes Schindelin  writes:

> Hi Junio,
>
> On Mon, 24 Jul 2017, Junio C Hamano wrote:
>
>> * js/blame-lib (2017-07-24) 1 commit
>>  - blame: fix memory corruption scrambling revision name in error message
>> 
>>  A hotfix to a topic already in 'master'.
>> 
>>  Will merge to 'next'.
>
> This sounds more critical than `next`, in particular since you said that
> you do not want to move anything from `next` to `master` before 2.14
> final.

That is why it is not "Will merge to and cook in 'next'" which is a
norm for new things during -rc.

>>  The final batch to "git rebase -i" updates to move more code from
>>  the shell script to C.
>> 
>>  Expecting a reroll.
>
> Since you said yourself that you won't pick up v6...

Read the message again and realize that I said won't during -rc
but may hold onto it.  I am intending to queue it on 2.14 final
to merge to 'pu' until the topic is rerolled again.

Re: What's cooking in git.git (Jul 2017, #07; Mon, 24)

2017-07-25 Thread Johannes Schindelin

Hi Junio,

On Mon, 24 Jul 2017, Junio C Hamano wrote:

> * js/blame-lib (2017-07-24) 1 commit
>  - blame: fix memory corruption scrambling revision name in error message
> 
>  A hotfix to a topic already in 'master'.
> 
>  Will merge to 'next'.

This sounds more critical than `next`, in particular since you said that
you do not want to move anything from `next` to `master` before 2.14
final.

> * js/rebase-i-final (2017-06-15) 10 commits
>  - rebase -i: rearrange fixup/squash lines using the rebase--helper
>  - t3415: test fixup with wrapped oneline
>  - rebase -i: skip unnecessary picks using the rebase--helper
>  - rebase -i: check for missing commits in the rebase--helper
>  - t3404: relax rebase.missingCommitsCheck tests
>  - rebase -i: also expand/collapse the SHA-1s via the rebase--helper
>  - rebase -i: do not invent onelines when expanding/collapsing SHA-1s
>  - rebase -i: remove useless indentation
>  - rebase -i: generate the script via rebase--helper
>  - t3415: verify that an empty instructionFormat is handled as before
> 
>  The final batch to "git rebase -i" updates to move more code from
>  the shell script to C.
> 
>  Expecting a reroll.

Since you said yourself that you won't pick up v6 (even if I made clear
that I have a strong preference for resolving merge conflicts myself
rather than have you guess), it may be a good idea to change this from
"Expecting a reroll" to something else?

> * jc/http-sslkey-and-ssl-cert-are-paths (2017-07-20) 1 commit
>   (merged to 'next' on 2017-07-20 at 5489304b99)
>  + http.c: http.sslcert and http.sslkey are both pathnames
> 
>  The http.{sslkey,sslCert} configuration variables are to be
>  interpreted as a pathname that honors "~[username]/" prefix, but
>  weren't, which has been fixed.
> 
>  Will cook in 'next'.

Just so you know: an identical patch has been cooking in Git for Windows
as 26b08ecec8d37b976be9e85055a0a9e3d16a56da since Dec 11 2015.

> * wd/rebase-conflict-guide (2017-07-17) 1 commit
>   (merged to 'next' on 2017-07-20 at c78e758b23)
>  + rebase: make resolve message clearer for inexperienced users
> 
>  Code clean-up.

This is not a code clean-up. It is an improvement of the user experience.

Ciao,
Dscho

P.S.: Sorry for not branching off of the thread with proper ",
was: What's cooking" subjects, but I am seriously short on time. Please
accept my apologies.

Re: Should I store large text files on Git LFS?

2017-07-25 Thread Jeff King

On Tue, Jul 25, 2017 at 06:06:49PM +1000, Andrew Ardill wrote:

> Let's have a look:
> 
> $ git rev-list --objects --all |
>   git cat-file --batch-check='%(objectsize:disk) %(objectsize)
> %(deltabase) %(rest)'
> 174 262 
> 171 260 
> 139 212 
> 47 36 
> 377503831 2310238304  data.txt
> 47 36 
> 500182546 3740427683  data.txt
> 47 36 
> 447340264 3357717475  data.txt
> 
> Yep, all zlib.

OK, that makes sense.

> What do you think is a reasonable config for storing text files this
> large, to get good delta compression, or is it more of a trial and
> error to find out what works best?

I think it would really depend on what's in your repo. If you just have
gigantic text files and no big binaries, and you have enough RAM to do
diffs on the text files, it's not unreasonable to just send
core.bigfilethreshold to something really big and not worry about it.

In general, a diff is going to want memory at least 2x the size of the
file (for the old and new images). And we tend to keep in memory all of
the images for a single tree-diff at one time (so if you touched two
gigantic files in one commit, then "git log -p" is probably going to
peak at having all four before/after images in memory at once).

If you just want deltas but not diffs, you can probably do:

  echo '*.gigantic -diff' >.gitattributes
  git config core.bigfilethreshold 10G

I think that will turn off streaming of the blobs in some code paths,
too. But hopefully a _single_ copy of each file would be OK to hold in
RAM. If it's not, you might also be able to get away with packing once
with:

  git -c core.bigfilethreshold=10G repack -adf

and then further repacks will carry those deltas forward. I think we
only apply the limit when actively searching for new deltas, not when
reusing existing ones.

As you can see, core.bigfilethreshold is a pretty blunt instrument. It
might be nice if .gitattributes understood other types of patterns
besides filenames, so you could do something like:

  echo '[size > 500MB] delta -diff' >.gitattributes

or something like that. I don't think it's come up enough for anybody to
care too much about it or work on it.

-Peff

Re: [PATCH] hash: Allow building with the external sha1dc library

2017-07-25 Thread Junio C Hamano

Takashi Iwai  writes:

> Some distros provide SHA1 collision detect code as a shared library.
> It's the very same code as we have in git tree, and git can link with
> it as well; at least, it may make maintenance easier, according to our
> security guys.
>
> This patch allows user to build git linking with the external sha1dc
> library instead of the built-in sha1dc code.  User needs to define
> DC_SHA1_EXTERNAL explicitly.  As default, the built-in sha1dc code is
> used like before.
>
> Signed-off-by: Takashi Iwai 
> ---

I do not have such an environment to test this patch, but it looks
like a very sensible thing to do.  Will queue; thanks.

>  Makefile | 12 
>  hash.h   |  4 +++-
>  sha1dc_git_ext.c | 11 +++
>  sha1dc_git_ext.h | 25 +
>  4 files changed, 51 insertions(+), 1 deletion(-)
>  create mode 100644 sha1dc_git_ext.c
>  create mode 100644 sha1dc_git_ext.h
>
> diff --git a/Makefile b/Makefile
> index 461c845d33cb..f1a262d56254 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -162,6 +162,12 @@ all::
>  # algorithm. This is slower, but may detect attempted collision attacks.
>  # Takes priority over other *_SHA1 knobs.
>  #
> +# Define DC_SHA1_EXTERNAL in addition to DC_SHA1 if you want to build / link
> +# git with the external sha1collisiondetection library.
> +# Without this option, i.e. the default behavior is to build git with its
> +# own sha1dc code.  If any extra linker option is required, define them in
> +# DC_SHA1_LINK variable in addition.
> +#
>  # Define DC_SHA1_SUBMODULE in addition to DC_SHA1 to use the
>  # sha1collisiondetection shipped as a submodule instead of the
>  # non-submodule copy in sha1dc/. This is an experimental option used
> @@ -1472,6 +1478,11 @@ ifdef APPLE_COMMON_CRYPTO
>   BASIC_CFLAGS += -DSHA1_APPLE
>  else
>   DC_SHA1 := YesPlease
> +ifdef DC_SHA1_EXTERNAL
> + LIB_OBJS += sha1dc_git_ext.o
> + BASIC_CFLAGS += -DSHA1_DC -DDC_SHA1_EXTERNAL
> + EXTLIBS += $(DC_SHA1_LINK) -lsha1detectcoll
> +else
>  ifdef DC_SHA1_SUBMODULE
>   LIB_OBJS += sha1collisiondetection/lib/sha1.o
>   LIB_OBJS += sha1collisiondetection/lib/ubc_check.o
> @@ -1492,6 +1503,7 @@ endif
>  endif
>  endif
>  endif
> +endif
>  
>  ifdef SHA1_MAX_BLOCK_SIZE
>   LIB_OBJS += compat/sha1-chunked.o
> diff --git a/hash.h b/hash.h
> index bef3e630a093..dce327d58d07 100644
> --- a/hash.h
> +++ b/hash.h
> @@ -8,7 +8,9 @@
>  #elif defined(SHA1_OPENSSL)
>  #include 
>  #elif defined(SHA1_DC)
> -#ifdef DC_SHA1_SUBMODULE
> +#if defined(DC_SHA1_EXTERNAL)
> +#include "sha1dc_git_ext.h"
> +#elif defined(DC_SHA1_SUBMODULE)
>  #include "sha1collisiondetection/lib/sha1.h"
>  #else
>  #include "sha1dc/sha1.h"
> diff --git a/sha1dc_git_ext.c b/sha1dc_git_ext.c
> new file mode 100644
> index ..359439fc3d93
> --- /dev/null
> +++ b/sha1dc_git_ext.c
> @@ -0,0 +1,11 @@
> +/* Only for DC_SHA1_EXTERNAL; sharing the same hooks as built-in sha1dc */
> +
> +#include "cache.h"
> +#include 
> +#include "sha1dc_git.c"
> +
> +void git_SHA1DCInit(SHA1_CTX *ctx)
> +{
> + SHA1DCInit(ctx);
> + SHA1DCSetSafeHash(ctx, 0);
> +}
> diff --git a/sha1dc_git_ext.h b/sha1dc_git_ext.h
> new file mode 100644
> index ..d0ea8ce518db
> --- /dev/null
> +++ b/sha1dc_git_ext.h
> @@ -0,0 +1,25 @@
> +/*
> + * This file is included by hash.h for DC_SHA1_EXTERNAL
> + */
> +
> +#include 
> +
> +/*
> + * Same as SHA1DCInit, but with default save_hash=0
> + */
> +void git_SHA1DCInit(SHA1_CTX *);
> +
> +/*
> + * Same as SHA1DCFinal, but convert collision attack case into a verbose 
> die().
> + */
> +void git_SHA1DCFinal(unsigned char [20], SHA1_CTX *);
> +
> +/*
> + * Same as SHA1DCUpdate, but adjust types to match git's usual interface.
> + */
> +void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, unsigned long len);
> +
> +#define platform_SHA_CTX SHA1_CTX
> +#define platform_SHA1_Init git_SHA1DCInit
> +#define platform_SHA1_Update git_SHA1DCUpdate
> +#define platform_SHA1_Final git_SHA1DCFinal

Re: [PATCH] doc: add missing "none" value for diff.wsErrorHighlight

2017-07-25 Thread Junio C Hamano

Andreas Heiduk  writes:

> The value has not eluded documentation so far.

I am not sure what "has not eluded" means in this context (did you
mean "has eluded"?).  

The patch text itself is not wrong per-se, but if we are to add
documentation for 'none', diff-options.txt must also document that
it clears the default and previously given values, unlike new, old
and context that are cumulative.  For that matter, we do not list
'default' and 'all' (which also clears the previous ones before
setting their own) in that three-item list, either.

I think we need to either 

 - make it to a six-item list and then describe that 'none', 'all'
   and 'default' clear the slate before taking any effect, or 

 - keep it three-item list of cumulative things, and then in the
   sentence that talks about `all` in Documentation/diff-options.txt
   to also explain what 'default' and 'none' do.

Thanks.

> Signed-off-by: Andreas Heiduk 
> ---
>  Documentation/diff-config.txt  | 2 +-
>  Documentation/diff-options.txt | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
> index cbce8ec63..c84ced8f6 100644
> --- a/Documentation/diff-config.txt
> +++ b/Documentation/diff-config.txt
> @@ -200,7 +200,7 @@ diff.algorithm::
>  +
>  
>  diff.wsErrorHighlight::
> - A comma separated list of `old`, `new`, `context`, that
> + A comma separated list of `old`, `new`, `context` and `none`, that
>   specifies how whitespace errors on lines are highlighted
>   with `color.diff.whitespace`.  Can be overridden by the
>   command line option `--ws-error-highlight=`
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 89cc0f48d..903d68eb7 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -302,7 +302,7 @@ ifndef::git-format-patch[]
>  --ws-error-highlight=::
>   Highlight whitespace errors on lines specified by 
>   in the color specified by `color.diff.whitespace`.  
> - is a comma separated list of `old`, `new`, `context`.  When
> + is a comma separated list of `old`, `new`, `context` and `none`.  When
>   this option is not given, only whitespace errors in `new`
>   lines are highlighted.  E.g. `--ws-error-highlight=new,old`
>   highlights whitespace errors on both deleted and added lines.

Re: Change in output as a result of patch

2017-07-25 Thread Kaartic Sivaraam

Let me see if I got everything correctly. Correct me if any of the
below observations are wrong.

On Mon, 2017-07-24 at 14:25 -0700, Junio C Hamano wrote:
> Imagine this scenario instead, which I think is more realistic
> example of making a typo.  The set of existing branches are like
> this:
> 
>  $ git branch
>devel
>  * test
> 
> And then you get these with your patch:
> 
>  $ git branch -m tets devel
>  fatal: Branch 'tets' does not exist.
> 
>  $ git branch -m test devel
>  fatal: A branch named 'devel' already exists.
> 
> My reaction to the above exchange would be a moderately strong
> annoyance.  If I were told that I am using 'devel' for something
> else already, my "corrected" attempt to turn the 'test' branch to a
> real development branch after getting the first error would have
> been:
> 
>  $ git branch -m test devel2
> 
> and I didn't have to get the second error.
> 
> I think your patch points in the right direction---if an operation
> can fail due to two reasons, reordering the two checks and still
> fail with a report for just the first one you happened to check
> first does not give us a real improvement.  If it is easy to check
> the other one after you noticed one of the condition is violated,
> then you could give a more useful diagnosis, namely, "There is
> branch 'tets' to rename, and there already is 'devel' branch".
> 
So what's expected is an error message that details all possible errors
found in a command in a single message. Now that would be better than
what 'mv' does.

> I suspect that with a moderately-sized refactoring around
> validate_new_branchname() function, this should be doable.  Instead
> of passing two "int" parameters force and attr_only, make them into
> a single "unsigned flag" and allow the caller to pass another option
> to tell the function "do not die, do not issue a message, but tell
> the caller what is wrong with a return value".  And by using that
> updated API, rename_branch() could tell four cases apart and fail
> the three bad cases in a more sensible way.
> 
Ok, now that seems to require little work. I'll see what I could come
up with.

Before I get into this I noticed that "--set-upstream" has been
deprecated since,

b347d06bf09 (branch: deprecate --set-upstream and show help if we detect 
possible mistaken use,
 Thu Aug 30 19:23:13 2012)

Is there any possibility for it to be removed some time in the near
future?

I'm asking this because IIRC, the 'attr_only' parameter of
"validate_new_branchname" was introduced to fix a regression
(fa7993767560) caused by the "--set-upstream" option. In case it has
been planned to be removed some time soon, it could make the word
easier (?), not sure though.

> In any case, the illustrations of interaction before and after the
> change is a very good thing to have when discussing a patch,
I would like to credit that to "Ævar Arnfjörð Bjarmason", who suggested
that to me in one of the other threads.

-- 
Kaartic

Re: [PATCH v3] merge: add a --signoff flag.

2017-07-25 Thread Junio C Hamano

Łukasz Gryglicki  writes:

> Some projects require every commit to be signed off.
> Our workflow is to create feature branches and require every commit to
> be signed off. When feature is finally approved we need to merge it into
> master. Merge itself is usually trivial and is done by
> `git merge origin/master`.
>
> Unfortunatelly `merge` command have no --signoff
> flag, so we need to either add signoff line manually or use
> `git commit --amend -s` after the merge.

Who are "we" in the above?  Certainly not the Git development
project who stands behind the log message.  I also find the first
paragraph overly verbose.  

Perhaps something like this instead?

Some projects require every commit, even merges, to be signed
off [*1*].  Because "git merge" does not have a "--signoff"
option like "git commit" does, the user needs to add one
manually when the command presents an editor to describe the
merge, or later use "git commit --amend --signoff".

Help developers of these projects by teaching "--signoff" option
to "git merge".

*1* 
https://public-inbox.org/git/CAHv71zK5SqbwrBFX=a8-dy9h3kt4feymgv__p2gzznr0wua...@mail.gmail.com/T/#u

Requested-by: Dan Kohn 
Signed-off-by: Łukasz Gryglicki 

Notice that I updated your s-o-b line in the above illustration
because we prefer to see the same name as patch author (which is
usually taken from your e-mail From: header) there.

> +--signoff::
> + Add Signed-off-by line by the committer at the end of the commit
> + log message.  The meaning of a signoff depends on the project,
> + but it typically certifies that committer has
> + the rights to submit this work under the same license and
> + agrees to a Developer Certificate of Origin
> + (see http://developercertificate.org/ for more information).

This is taken verbatim from Documentation/git-commit.txt and "this
work" in that context is entirely sensible, but it is not quite
clear what it means in the context of "git merge".

> diff --git a/builtin/merge.c b/builtin/merge.c
> index 900bafdb45d0b..78c36e9bf353b 100644
> --- a/builtin/merge.c
> +++ b/builtin/merge.c
> @@ -70,6 +70,7 @@ static int continue_current_merge;
>  static int allow_unrelated_histories;
>  static int show_progress = -1;
>  static int default_to_upstream = 1;
> +static int signoff;
>  static const char *sign_commit;
>  
>  static struct strategy all_strategy[] = {
> @@ -233,6 +234,7 @@ static struct option builtin_merge_options[] = {
>   { OPTION_STRING, 'S', "gpg-sign", _commit, N_("key-id"),
> N_("GPG sign commit"), PARSE_OPT_OPTARG, NULL, (intptr_t) "" },
>   OPT_BOOL(0, "overwrite-ignore", _ignore, N_("update ignored 
> files (default)")),
> + OPT_BOOL(0, "signoff", , N_("add Signed-off-by:")),
>   OPT_END()
>  };
>  
> @@ -763,6 +765,8 @@ static void prepare_to_commit(struct commit_list 
> *remoteheads)
>   strbuf_addch(, '\n');
>   if (0 < option_edit)
>   strbuf_commented_addf(, _(merge_editor_comment), 
> comment_line_char);
> + if (signoff)
> + append_signoff(, ignore_non_trailer(msg.buf, msg.len), 0);
>   write_file_buf(git_path_merge_msg(), msg.buf, msg.len);
>   if (run_commit_hook(0 < option_edit, get_index_file(), 
> "prepare-commit-msg",
>   git_path_merge_msg(), "merge", NULL))

The invocation of the editor comes after this post-context, and the
new code seems to sit at the right place.  Good.

> diff --git a/t/t7614-merge-signoff.sh b/t/t7614-merge-signoff.sh
> new file mode 100755
> index 0..c1b8446f491dc
> --- /dev/null
> +++ b/t/t7614-merge-signoff.sh
> @@ -0,0 +1,69 @@
> +#!/bin/sh
> +
> +test_description='git merge --signoff
> +
> +This test runs git merge --signoff and makes sure that it works.
> +'
> +
> +. ./test-lib.sh
> +
> +# Setup test files
> +test_setup() {

Style: "test_setup () {" but see below.

> + # Expected commit message after merge --signoff
> + cat >expected-signed < +Merge branch 'master' into other-branch
> +
> +Signed-off-by: $(git var GIT_COMMITTER_IDENT | sed -e "s/>.*/>/")
> +EOF

Indenting the here-text with "<<-EOF" makes it easier to read, e.g.

cat >expected-signed <<-EOF &&
Merge branch 'master' into other-branch

Signed-off-by: $(git var GIT_COMMITTER_IDENT | sed -e "s/>.*/>/")
EOF

Likewise for the other one.

> +...
> +}

I do not see much point in making this a shell function.  I'd just
do all of the above in the first "test_expect_success 'setup'"
thing, if I were doing this patch.

> +
> +# Setup repository, files & feature branch
> +# This step must be run if You want to test 2,3 or 4
> +# Order of 2,3,4 is not important, but 1 must be run before
> +# For example `-r 1,4` or `-r 1,4,2 -v` etc
> +# But not `-r 2` or `-r 4,3,2,1`

That is pretty much the standard practice to require 'setup' to
always run; no need

Re: [PATCH] branch: change the error messages to be more meaningful

2017-07-25 Thread Kaartic Sivaraam

The second patch differs from the first one only in the commit message.

-- 
Kaartic

[PATCH v2 0/2] sub-process: refactor handshake to common function

2017-07-25 Thread Jonathan Tan

Changes from v1:
 - Moved API docs to header file
 - Updated documentation
 - Updated commit message of patch 2 to explain why the error message
   changed

Jonathan Tan (2):
  Documentation: migrate sub-process docs to header
  sub-process: refactor handshake to common function

 Documentation/technical/api-sub-process.txt | 59 --
 convert.c   | 61 +++
 pkt-line.c  | 19 --
 pkt-line.h  |  2 -
 sub-process.c   | 94 +
 sub-process.h   | 51 +++-
 t/t0021-conversion.sh   |  2 +-
 7 files changed, 151 insertions(+), 137 deletions(-)
 delete mode 100644 Documentation/technical/api-sub-process.txt

-- 
2.14.0.rc0.400.g1c36432dff-goog

[PATCH v2 1/2] Documentation: migrate sub-process docs to header

2017-07-25 Thread Jonathan Tan

Move the documentation for the sub-process API from a separate txt file
to its header file.

Signed-off-by: Jonathan Tan 
---
 Documentation/technical/api-sub-process.txt | 59 -
 sub-process.h   | 25 +++-
 2 files changed, 23 insertions(+), 61 deletions(-)
 delete mode 100644 Documentation/technical/api-sub-process.txt

diff --git a/Documentation/technical/api-sub-process.txt 
b/Documentation/technical/api-sub-process.txt
deleted file mode 100644
index 793508cf3..0
--- a/Documentation/technical/api-sub-process.txt
+++ /dev/null
@@ -1,59 +0,0 @@
-sub-process API
-===
-
-The sub-process API makes it possible to run background sub-processes
-for the entire lifetime of a Git invocation. If Git needs to communicate
-with an external process multiple times, then this can reduces the process
-invocation overhead. Git and the sub-process communicate through stdin and
-stdout.
-
-The sub-processes are kept in a hashmap by command name and looked up
-via the subprocess_find_entry function.  If an existing instance can not
-be found then a new process should be created and started.  When the
-parent git command terminates, all sub-processes are also terminated.
-
-This API is based on the run-command API.
-
-Data structures

-
-* `struct subprocess_entry`
-
-The sub-process structure.  Members should not be accessed directly.
-
-Types
--
-
-'int(*subprocess_start_fn)(struct subprocess_entry *entry)'::
-
-   User-supplied function to initialize the sub-process.  This is
-   typically used to negotiate the interface version and capabilities.
-
-
-Functions
--
-
-`cmd2process_cmp`::
-
-   Function to test two subprocess hashmap entries for equality.
-
-`subprocess_start`::
-
-   Start a subprocess and add it to the subprocess hashmap.
-
-`subprocess_stop`::
-
-   Kill a subprocess and remove it from the subprocess hashmap.
-
-`subprocess_find_entry`::
-
-   Find a subprocess in the subprocess hashmap.
-
-`subprocess_get_child_process`::
-
-   Get the underlying `struct child_process` from a subprocess.
-
-`subprocess_read_status`::
-
-   Helper function to read packets looking for the last "status="
-   key/value pair.
diff --git a/sub-process.h b/sub-process.h
index 96a2cca36..9e6975b5e 100644
--- a/sub-process.h
+++ b/sub-process.h
@@ -6,12 +6,23 @@
 #include "run-command.h"
 
 /*
- * Generic implementation of background process infrastructure.
- * See: Documentation/technical/api-sub-process.txt
+ * The sub-process API makes it possible to run background sub-processes
+ * for the entire lifetime of a Git invocation. If Git needs to communicate
+ * with an external process multiple times, then this can reduces the process
+ * invocation overhead. Git and the sub-process communicate through stdin and
+ * stdout.
+ *
+ * The sub-processes are kept in a hashmap by command name and looked up
+ * via the subprocess_find_entry function.  If an existing instance can not
+ * be found then a new process should be created and started.  When the
+ * parent git command terminates, all sub-processes are also terminated.
+ * 
+ * This API is based on the run-command API.
  */
 
  /* data structures */
 
+/* Members should not be accessed directly. */
 struct subprocess_entry {
struct hashmap_entry ent; /* must be the first member! */
const char *cmd;
@@ -20,21 +31,31 @@ struct subprocess_entry {
 
 /* subprocess functions */
 
+/* Function to test two subprocess hashmap entries for equality. */
 extern int cmd2process_cmp(const void *unused_cmp_data,
   const struct subprocess_entry *e1,
   const struct subprocess_entry *e2,
   const void *unused_keydata);
 
+/*
+ * User-supplied function to initialize the sub-process.  This is
+ * typically used to negotiate the interface version and capabilities.
+ */
 typedef int(*subprocess_start_fn)(struct subprocess_entry *entry);
+
+/* Start a subprocess and add it to the subprocess hashmap. */
 int subprocess_start(struct hashmap *hashmap, struct subprocess_entry *entry, 
const char *cmd,
subprocess_start_fn startfn);
 
+/* Kill a subprocess and remove it from the subprocess hashmap. */
 void subprocess_stop(struct hashmap *hashmap, struct subprocess_entry *entry);
 
+/* Find a subprocess in the subprocess hashmap. */
 struct subprocess_entry *subprocess_find_entry(struct hashmap *hashmap, const 
char *cmd);
 
 /* subprocess helper functions */
 
+/* Get the underlying `struct child_process` from a subprocess. */
 static inline struct child_process *subprocess_get_child_process(
struct subprocess_entry *entry)
 {
-- 
2.14.0.rc0.400.g1c36432dff-goog

[PATCH v2 2/2] sub-process: refactor handshake to common function

2017-07-25 Thread Jonathan Tan

Refactor, into a common function, the version and capability negotiation
done when invoking a long-running process as a clean or smudge filter.
This will be useful for other Git code that needs to interact similarly
with a long-running process.

As you can see in the change to t0021, this commit changes the error
message reported when the long-running process does not introduce itself
with the expected "server"-terminated line. Originally, the error
message reports that the filter "does not support filter protocol
version 2", differentiating between the old single-file filter protocol
and the new multi-file filter protocol - I have updated it to something
more generic and useful.

Signed-off-by: Jonathan Tan 
---
 convert.c | 61 -
 pkt-line.c| 19 ---
 pkt-line.h|  2 --
 sub-process.c | 94 +++
 sub-process.h | 26 ++
 t/t0021-conversion.sh |  2 +-
 6 files changed, 128 insertions(+), 76 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b..ec8ecc2ea 100644
--- a/convert.c
+++ b/convert.c
@@ -512,62 +512,15 @@ static struct hashmap subprocess_map;
 
 static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)
 {
-   int err;
+   static int versions[] = {2, 0};
+   static struct subprocess_capability capabilities[] = {
+   {"clean", CAP_CLEAN}, {"smudge", CAP_SMUDGE}, {NULL, 0}
+   };
struct cmd2process *entry = (struct cmd2process *)subprocess;
-   struct string_list cap_list = STRING_LIST_INIT_NODUP;
-   char *cap_buf;
-   const char *cap_name;
-   struct child_process *process = >process;
-   const char *cmd = subprocess->cmd;
-
-   sigchain_push(SIGPIPE, SIG_IGN);
-
-   err = packet_writel(process->in, "git-filter-client", "version=2", 
NULL);
-   if (err)
-   goto done;
-
-   err = strcmp(packet_read_line(process->out, NULL), "git-filter-server");
-   if (err) {
-   error("external filter '%s' does not support filter protocol 
version 2", cmd);
-   goto done;
-   }
-   err = strcmp(packet_read_line(process->out, NULL), "version=2");
-   if (err)
-   goto done;
-   err = packet_read_line(process->out, NULL) != NULL;
-   if (err)
-   goto done;
-
-   err = packet_writel(process->in, "capability=clean", 
"capability=smudge", NULL);
-
-   for (;;) {
-   cap_buf = packet_read_line(process->out, NULL);
-   if (!cap_buf)
-   break;
-   string_list_split_in_place(_list, cap_buf, '=', 1);
-
-   if (cap_list.nr != 2 || strcmp(cap_list.items[0].string, 
"capability"))
-   continue;
-
-   cap_name = cap_list.items[1].string;
-   if (!strcmp(cap_name, "clean")) {
-   entry->supported_capabilities |= CAP_CLEAN;
-   } else if (!strcmp(cap_name, "smudge")) {
-   entry->supported_capabilities |= CAP_SMUDGE;
-   } else {
-   warning(
-   "external filter '%s' requested unsupported 
filter capability '%s'",
-   cmd, cap_name
-   );
-   }
-
-   string_list_clear(_list, 0);
-   }
-
-done:
-   sigchain_pop(SIGPIPE);
 
-   return err;
+   return subprocess_handshake(subprocess, "git-filter-", versions, NULL,
+   capabilities,
+   >supported_capabilities);
 }
 
 static int apply_multi_file_filter(const char *path, const char *src, size_t 
len,
diff --git a/pkt-line.c b/pkt-line.c
index 9d845ecc3..7db911957 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -171,25 +171,6 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
return status;
 }
 
-int packet_writel(int fd, const char *line, ...)
-{
-   va_list args;
-   int err;
-   va_start(args, line);
-   for (;;) {
-   if (!line)
-   break;
-   if (strlen(line) > LARGE_PACKET_DATA_MAX)
-   return -1;
-   err = packet_write_fmt_gently(fd, "%s\n", line);
-   if (err)
-   return err;
-   line = va_arg(args, const char*);
-   }
-   va_end(args);
-   return packet_flush_gently(fd);
-}
-
 static int packet_write_gently(const int fd_out, const char *buf, size_t size)
 {
static char packet_write_buffer[LARGE_PACKET_MAX];
diff --git a/pkt-line.h b/pkt-line.h
index 450183b64..66ef610fc 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -25,8 +25,6 @@ void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
 int

[PATCH] branch: change the error messages to be more meaningful

2017-07-25 Thread Kaartic Sivaraam

The error messages shown when the branch command is misused
by supplying it wrong number of parameters wasn't meaningful.
That's because it used the the phrase "too many branches"
assuming all parameters to be "valid" branch names. It's not
always the case as exemplified below,

$ git branch
  foo
* master

$ git branch -m foo foo old
fatal: too many branches for a rename operation

Change the messages to be more general thus making no assumptions
about the "parameters".

Signed-off-by: Kaartic Sivaraam 
---
 builtin/branch.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index a3bd2262b3367..59fedf085d3db 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -707,12 +707,12 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
else if (argc == 2)
rename_branch(argv[0], argv[1], rename > 1);
else
-   die(_("too many branches for a rename operation"));
+   die(_("too many parameters for a rename operation"));
} else if (new_upstream) {
struct branch *branch = branch_get(argv[0]);
 
if (argc > 1)
-   die(_("too many branches to set new upstream"));
+   die(_("too many parameters to set new upstream"));
 
if (!branch) {
if (!argc || !strcmp(argv[0], "HEAD"))
@@ -735,7 +735,7 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
struct strbuf buf = STRBUF_INIT;
 
if (argc > 1)
-   die(_("too many branches to unset upstream"));
+   die(_("too many parameters to unset upstream"));
 
if (!branch) {
if (!argc || !strcmp(argv[0], "HEAD"))

--
https://github.com/git/git/pull/389

Re: [PATCH/RFC] setup: update error message to be more meaningful

2017-07-25 Thread Kaartic Sivaraam

I've added RFC to this patch's subject as I'm not sure if the new
message is suitable. Suggestions for messages that are more suitable
are welcome.

It seems that the function "verify_filename" is invoked by plumbing
commands like 'rev-parse', let me know if I've missed something about
them.

I further noted that it's used by 'reset' but I wasn't able to 
make 'reset' to invoke "verify_filename". In what case does 'reset'
invoke the concerned function ?

-- 
Kaartic

[PATCH/RFC] setup: update error message to be more meaningful

2017-07-25 Thread Kaartic Sivaraam

The error message shown when a flag is found when expecting a
filename wasn't clear as it didn't communicate what was wrong
using the 'suitable' words in *all* cases.

Correct case,

$ git rev-parse commit.c --flags
commit.c
--flags
fatal: bad flag '--flags' used after filename

Incorrect case,

$ git grep "test" -n
fatal: bad flag '-n' used after filename

Change the error message to be general and communicative. This results
in the following output,

$ git rev-parse commit.c --flags
commit.c
--flags
fatal: found flag '--flags' in place of a filename

$ git grep "test" -n
fatal: found flag '-n' in place of a filename

Signed-off-by: Kaartic Sivaraam 
---
 setup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/setup.c b/setup.c
index 860507e1fdb2d..0433d7889e6b3 100644
--- a/setup.c
+++ b/setup.c
@@ -230,7 +230,7 @@ void verify_filename(const char *prefix,
 int diagnose_misspelt_rev)
 {
if (*arg == '-')
-   die("bad flag '%s' used after filename", arg);
+   die("found flag '%s' in place of a filename", arg);
if (looks_like_pathspec(arg) || check_filename(prefix, arg))
return;
die_verify_filename(prefix, arg, diagnose_misspelt_rev);

--
https://github.com/git/git/pull/388

Re: [PATCH] sub-process: refactor handshake to common function

2017-07-25 Thread Jonathan Tan

On Tue, 25 Jul 2017 10:38:51 -0400
Ben Peart  wrote:

> Christian Couder has been working on a similar refactoring for the perl 
> versions of very similar helper functions.  They can be found in the 
> following patch series:
> 
> https://public-inbox.org/git/20170620075523.26961-1-chrisc...@tuxfamily.org/
> 
> In particular:
> 
>  - Patches 02/49 to 08/49 create a Git/Packet.pm module by
>refactoring "t0021/rot13-filter.pl". Functions from this new
>module will be used later in test scripts.
> 
> Some differences I noticed: this patch does both the version and 
> capability negotiation and it lives in "sub-process.h/c" files.  In the 
> perl script patch series, the version negotiation is in the new 
> "packet.pm" module but it does not include the capability negotiation.
> 
> It would make sense to me for these to have the same API and usage 
> behaviors.  Perhaps pull them together into a single patch series so 
> that we can review and keep them in sync?

Thanks for the pointer. These are different languages and different use
cases (mine for production, his for tests), though, so I don't think
consistency in API and behavior is practical.

> > It is unfortunate that the code grew larger because it had to be more
> > generic, but I think this is better than having (in the future) 2 or
> > more separate handshake functions.
> 
> I'm always happy to see an effort to refactor common code into reusable 
> APIs.  Its a good engineering practice that makes it easier to 
> stabilize, extend and maintain the code. The challenge is making sure 
> the common function supports all the requirements of the various callers 
> and that the increase in complexity doesn't outweigh the benefit of 
> centralizing the logic.

Agreed - thanks.

> > +int subprocess_handshake(struct subprocess_entry *entry,
> > +const char *welcome_prefix,
> > +int *versions,
> > +int *chosen_version,
> > +struct subprocess_capability *capabilities,
> > +unsigned int *supported_capabilities) {
> > +   int version_scratch;
> > +   unsigned int capabilities_scratch;
> > +   struct child_process *process = >process;
> > +   int i;
> > +   char *line;
> > +   const char *p;
> > +
> > +   if (!chosen_version)
> > +   chosen_version = _scratch;
> > +   if (!supported_capabilities)
> > +   supported_capabilities = _scratch;
> > +
> > +   sigchain_push(SIGPIPE, SIG_IGN);
> > +
> > +   if (packet_write_fmt_gently(process->in, "%sclient\n",
> > +   welcome_prefix)) {
> > +   error("Could not write client identification");
> > +   goto error;
> > +   }
> > +   for (i = 0; versions[i]; i++) {
> > +   if (packet_write_fmt_gently(process->in, "version=%d\n",
> > +   versions[i])) {
> > +   error("Could not write requested version");
> > +   goto error;
> > +   }
> > +   }
> > +   if (packet_flush_gently(process->in))
> > +   goto error;
> > +
> > +   if (!(line = packet_read_line(process->out, NULL)) ||
> > +   !skip_prefix(line, welcome_prefix, ) ||
> > +   strcmp(p, "server")) {
> > +   error("Unexpected line '%s', expected %sserver",
> > + line ? line : "", welcome_prefix);
> > +   goto error;
> > +   }
> > +   if (!(line = packet_read_line(process->out, NULL)) ||
> > +   !skip_prefix(line, "version=", ) ||
> > +   strtol_i(p, 10, chosen_version)) {
> > +   error("Unexpected line '%s', expected version",
> > + line ? line : "");
> > +   goto error;
> > +   }
> > +   for (i = 0; versions[i]; i++) {
> > +   if (versions[i] == *chosen_version)
> > +   goto version_found;
> > +   }
> 
> This doesn't look like it supports negotiating a common version between 
> the client and server.  If a client supports version 1, 2, and 3 and the 
> server only supports version 1 and 2, which version and capabilities 
> will be used?

In the protocol, the client sends a list of versions it supports (see
the "for" loop above), and expects the server to write the single
version that the server chooses (see the part where we read one single
version line). In your example, the client would write "version=1\n",
"version=2\n", and "version=3\n", and the server would write
"version=2\n" (well, it could write "version=1\n" too). So version 2 (or
1) will be used.

Re: [RFC] Git rerere and non-conflicting changes during conflict resolution

2017-07-25 Thread Jeff King

On Tue, Jul 25, 2017 at 11:09:55AM -0400, Raman Gupta wrote:

> I had an interesting situation today: resolving a merge conflict
> required modification in other files that were not themselves conflicting.
> 
> I just realized that rerere does not remember any changes to these
> additional files -- only changes to the conflicting files. This makes
> the end result of rerere obviously incorrect in this situation.
> 
> So my questions are:
> 
> 1) Is this a known limitation or is there a reason rerere works in
> this manner?

Yes, it's known. Rerere works by storing a mapping of conflicted hunks
to their resolutions. If there's no conflicted hunk, I'm not sure how
we'd decide what to feed into the mapping to see if there is some
content to be replaced.

That said, I'm far from an expert on how rerere works. Junio might have
ideas on how we could handle this better. But I do note that for
repeated integration runs (like we do for topics in git.git, as they get
merged to "pu", then "next", then "master"), he keeps non-conflict
fixups in a separate commit which gets squashed into the merge
automatically. See

  https://github.com/git/git/blob/todo/Reintegrate#L185-L191

> 1b) If it is a limitation/bug, what would be needed to fix it? With
> some guidance, I might be able to submit a patch...

As far as I know, something like the Reintegrate script above is the
state of the art. IMHO it would be useful if something similar were
integrated into rerere, but I'm not sure exactly how it would know when
to trigger.

> 2) In the meantime, is there a way I can identify these cases, without
> which I cannot really trust rerere is doing the right thing?

I do think it would be useful if rerere could look at a merge result and
say "OK, I've recorded these bits, but there are other lines that are
not part of either parent and which are not part of a conflict". That
gives you a warning that such lines need to be part of a fixup (rather
than you being surprised when you redo the merge later and have to
rework the fixup).

But I don't think even then you can ever trust rerere fully.
Fundamentally you're applying some changes from one merge into another
context. There may be new sites that also need fixing up, and the tool
has no way to know. So you should treat a rerere-helped merge as any
other merge: assume it's a good starting point but use other tools (like
the compiler or automated tests) to confirm that the result is sensible.

-Peff

Re: Deceptive site ahead Waring

2017-07-25 Thread Lars Schneider


> On 25 Jul 2017, at 16:01, Johannes Schindelin  
> wrote:
> 
> Hi Christos,
> 
> On Tue, 25 Jul 2017, Christos Angelidis wrote:
> 
>> Just tried to download your windows version app and a warning  from chrome
>> popped up saying
>> "Deceptive site ahead", maybe you wonna check it if you haven't done it
>> already.
> 
> Kindly provide details. I am really certain that Chrome did not leave it
> at three words.

Google recognized GitHub asset downloads as malicious.
The problem should be fixed now: 
https://twitter.com/kit3bus/status/889819401947631616

- Lars

Re: [RFC PATCH v2 2/4] promised-object, fsck: introduce promised objects

2017-07-25 Thread Ben Peart




On 7/21/2017 4:33 PM, Jonathan Tan wrote:

On Fri, 21 Jul 2017 12:24:52 -0400
Ben Peart  wrote:


Today we have 3.5 million objects * 30 bytes per entry = 105 MB of
promises. Given the average developer only hydrates 56K files (2 MB
promises) that is 103 MB to download that no one will ever need. We
would like to avoid that if possible as this would be a significant
regression in clone times from where we are today.

I'm also concerned about the performance of merging in promises given we
have 100M objects today and growing so the number of promises over time
could get pretty large.


After some thought, maybe a hybrid solution is best, in which it is
permissible but optional for some missing objects to have promises. In
that case, it is more of a "size cache" (which stores the type as well)
rather than a true promise. When fetching, the client can optionally
request for the sizes and types of missing objects.



In our GVFS solution today we do not download any size or object type 
information at clone as the number of objects and the resulting file 
would be too large.  Instead, we have a new sizes endpoint 
(https://github.com/Microsoft/GVFS/blob/master/Protocol.md) that enables 
us to retrieve object sizes "on demand" much like we are enabling for 
the actual object content.


This protocol could easily be extended to return both size and type so 
that it could be used to retrieve "promise" data for objects as they are 
needed. Having a way to "cache" that data locally so that both git and 
other code could share it would be great.


At a minimum, we should ensure the data stream passed back is the same 
whether at clone time or when hitting a "promises" end point. I think it 
would also be helpful to enable promises to be downloaded on demand much 
like we are doing for the object data itself.



This is good for the large-blob case, in which we can always have size
information of missing blobs, and we can subsequently add blob-size
filtering (as a parameter) to "git log -S" and friends to avoid needing
to resolve a missing object. And this is, as far as I can tell, also
good for the many-blob case - just have an empty size cache all the
time. (And in the future, use cases could come up that desire non-empty
but non-comprehensive caches - for example, a directory lister working
on a partial clone that only needs to cache the sizes of frequently
accessed directories.)

Another option is to have a repo-wide option that toggles between
mandatory entries in the "size cache" and prohibited entries. Switching
to mandatory provides stricter fsck and negative lookups, but I think
it's not worth it for both the developers and users of Git to have to
know about these two modes.


I think we should have a flag (off by default) that enables someone to
say that promised objects are optional. If the flag is set,
"is_promised_object" will return success and pass the OBJ_ANY type and a
size of -1.

Nothing today is using the size and in the two places where the object
type is being checked for consistency (fsck_cache_tree and
fsck_handle_ref) the test can add a test for OBJ_ANY as well.

This will enable very large numbers of objects to be omitted from the
clone without triggering a download of the corresponding number of
promised objects.


Eventually I plan to use the size when implementing parameters for
history-searching commands (e.g. "git log -S"), but it's true that
that's in the future.

Allowing promised objects to be optional would indeed solve the issue of
downloading too many promises. It would make the code more complicated,
but I'm not sure by how much.

For example, in this fsck patch, the easiest way I could think of to
have promised objects was to introduce a 3rd state, called "promised",
of "struct object" - one in which the type is known, but we don't have
access to the full "struct commit" or equivalent. And thus fsck could
assume that if the "struct object" is "parsed" or "promised", the type
is known. Having optional promised objects would require that we let
this "promised" state have a type of OBJ_UNKNOWN (or something like
that) - maybe that would be fine, but I haven't looked into this in
detail.



Caveats apply as I only did a quick look but I only found the two
locations that were checking the object type for consistency.


I haven't looked into detail, but you are probably right.

[RFC] Git rerere and non-conflicting changes during conflict resolution

2017-07-25 Thread Raman Gupta

I had an interesting situation today: resolving a merge conflict
required modification in other files that were not themselves conflicting.

I just realized that rerere does not remember any changes to these
additional files -- only changes to the conflicting files. This makes
the end result of rerere obviously incorrect in this situation.

So my questions are:

1) Is this a known limitation or is there a reason rerere works in
this manner?

1b) If it is a limitation/bug, what would be needed to fix it? With
some guidance, I might be able to submit a patch...

2) In the meantime, is there a way I can identify these cases, without
which I cannot really trust rerere is doing the right thing?

Regards,
Raman

[PATCH] branch: change the error messages to be more meaningful

2017-07-25 Thread Kaartic Sivaraam

The error messages shown when the branch command is misused
by supplying it wrong number of parameters wasn't meaningful
as it used the the phrase, "too many branches" which is not
meaningful in the following case,

$ git branch
  foo
* master

$ git branch -m foo foo test
fatal: too many branches for a rename operation

It's not meaningful as the implementation assumed all parameters
to be branch names. It's not always the case as exemplified above.

Change the messages to be more general thus making no asssumptions
about the "parameters".

Signed-off-by: Kaartic Sivaraam 
---
 builtin/branch.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index a3bd2262b3367..59fedf085d3db 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -707,12 +707,12 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
else if (argc == 2)
rename_branch(argv[0], argv[1], rename > 1);
else
-   die(_("too many branches for a rename operation"));
+   die(_("too many parameters for a rename operation"));
} else if (new_upstream) {
struct branch *branch = branch_get(argv[0]);
 
if (argc > 1)
-   die(_("too many branches to set new upstream"));
+   die(_("too many parameters to set new upstream"));
 
if (!branch) {
if (!argc || !strcmp(argv[0], "HEAD"))
@@ -735,7 +735,7 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
struct strbuf buf = STRBUF_INIT;
 
if (argc > 1)
-   die(_("too many branches to unset upstream"));
+   die(_("too many parameters to unset upstream"));
 
if (!branch) {
if (!argc || !strcmp(argv[0], "HEAD"))

--
https://github.com/git/git/pull/387

Re: [PATCH] sub-process: refactor handshake to common function

2017-07-25 Thread Ben Peart




On 7/24/2017 5:38 PM, Jonathan Tan wrote:

Refactor, into a common function, the version and capability negotiation
done when invoking a long-running process as a clean or smudge filter.
This will be useful for other Git code that needs to interact similarly
with a long-running process.



Christian Couder has been working on a similar refactoring for the perl 
versions of very similar helper functions.  They can be found in the 
following patch series:


https://public-inbox.org/git/20170620075523.26961-1-chrisc...@tuxfamily.org/

In particular:

- Patches 02/49 to 08/49 create a Git/Packet.pm module by
  refactoring "t0021/rot13-filter.pl". Functions from this new
  module will be used later in test scripts.

Some differences I noticed: this patch does both the version and 
capability negotiation and it lives in "sub-process.h/c" files.  In the 
perl script patch series, the version negotiation is in the new 
"packet.pm" module but it does not include the capability negotiation.


It would make sense to me for these to have the same API and usage 
behaviors.  Perhaps pull them together into a single patch series so 
that we can review and keep them in sync?



Signed-off-by: Jonathan Tan 
---
This will be useful for anyone implementing functionality like that in
[1].

It is unfortunate that the code grew larger because it had to be more
generic, but I think this is better than having (in the future) 2 or
more separate handshake functions.



I'm always happy to see an effort to refactor common code into reusable 
APIs.  Its a good engineering practice that makes it easier to 
stabilize, extend and maintain the code. The challenge is making sure 
the common function supports all the requirements of the various callers 
and that the increase in complexity doesn't outweigh the benefit of 
centralizing the logic.



I also don't think that the protocol should be permissive - I think
things would be simpler if all protocol errors were fatal errors. As it
is, most parts are permissive, but packet_read_line() already dies
anyway upon a malformed packet, so it may not be too drastic a change to
change this. For reference, the original protocol comes from [2].

[1] https://public-inbox.org/git/20170714132651.170708-2-benpe...@microsoft.com/
[2] edcc858 ("convert: add filter..process option", 2016-10-17)
---
  convert.c | 61 -
  pkt-line.c| 19 ---
  pkt-line.h|  2 --
  sub-process.c | 94 +++
  sub-process.h | 18 ++
  t/t0021-conversion.sh |  2 +-
  6 files changed, 120 insertions(+), 76 deletions(-)

diff --git a/convert.c b/convert.c
index deaf0ba7b..ec8ecc2ea 100644
--- a/convert.c
+++ b/convert.c
@@ -512,62 +512,15 @@ static struct hashmap subprocess_map;
  
  static int start_multi_file_filter_fn(struct subprocess_entry *subprocess)

  {
-   int err;
+   static int versions[] = {2, 0};
+   static struct subprocess_capability capabilities[] = {
+   {"clean", CAP_CLEAN}, {"smudge", CAP_SMUDGE}, {NULL, 0}
+   };
struct cmd2process *entry = (struct cmd2process *)subprocess;
-   struct string_list cap_list = STRING_LIST_INIT_NODUP;
-   char *cap_buf;
-   const char *cap_name;
-   struct child_process *process = >process;
-   const char *cmd = subprocess->cmd;
-
-   sigchain_push(SIGPIPE, SIG_IGN);
-
-   err = packet_writel(process->in, "git-filter-client", "version=2", 
NULL);
-   if (err)
-   goto done;
-
-   err = strcmp(packet_read_line(process->out, NULL), "git-filter-server");
-   if (err) {
-   error("external filter '%s' does not support filter protocol version 
2", cmd);
-   goto done;
-   }
-   err = strcmp(packet_read_line(process->out, NULL), "version=2");
-   if (err)
-   goto done;
-   err = packet_read_line(process->out, NULL) != NULL;
-   if (err)
-   goto done;
-
-   err = packet_writel(process->in, "capability=clean", 
"capability=smudge", NULL);
-
-   for (;;) {
-   cap_buf = packet_read_line(process->out, NULL);
-   if (!cap_buf)
-   break;
-   string_list_split_in_place(_list, cap_buf, '=', 1);
-
-   if (cap_list.nr != 2 || strcmp(cap_list.items[0].string, 
"capability"))
-   continue;
-
-   cap_name = cap_list.items[1].string;
-   if (!strcmp(cap_name, "clean")) {
-   entry->supported_capabilities |= CAP_CLEAN;
-   } else if (!strcmp(cap_name, "smudge")) {
-   entry->supported_capabilities |= CAP_SMUDGE;
-   } else {
-   warning(
-   "external filter '%s' requested unsupported filter 
capability '%s'",
-

[PATCH/RFC] contrib: rerere-train overwrites existing resolutions

2017-07-25 Thread Raman Gupta

>From 41116889f7eb2ddd590d165d9ca89646f7b15aaf Mon Sep 17 00:00:00 2001
From: Raman Gupta 
Date: Tue, 25 Jul 2017 10:28:35 -0400
Subject: [PATCH] contrib: rerere-train overwrites existing resolutions

When running rerere-train, the user is explicitly asking the training to
occur based on the current merge commit. However, if the current cache
has a resolution for the same conflict (even if out of date or wrong),
rerere-train does not currently update the rr-cache. Now, forget
existing resolutions before training so that training is always
reflective of the trained data.
---
 contrib/rerere-train.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/contrib/rerere-train.sh b/contrib/rerere-train.sh
index 52ad9e4..a222d38 100755
--- a/contrib/rerere-train.sh
+++ b/contrib/rerere-train.sh
@@ -34,6 +34,7 @@ do
# Cleanly merges
continue
fi
+   git rerere forget .
if test -s "$GIT_DIR/MERGE_RR"
then
git show -s --pretty=format:"Learning from %h %s" "$commit"
-- 
2.9.4

Re: Deceptive site ahead Waring

2017-07-25 Thread Johannes Schindelin

Hi Christos,

On Tue, 25 Jul 2017, Christos Angelidis wrote:

> Just tried to download your windows version app and a warning  from chrome
> popped up saying
> "Deceptive site ahead", maybe you wonna check it if you haven't done it
> already.

Kindly provide details. I am really certain that Chrome did not leave it
at three words.

Thank you,
Johannes

Re: requesting permission to use some Git for Windows code

2017-07-25 Thread Johannes Schindelin

Hi Philippe,

I am not quite certain whether I have replied to this earlier or not.
Under the assumption that I did not, I'll send this mail; Cc:ed to the
mailing lists as discussed privately.

On Fri, 23 Jun 2017, Philippe Joyez wrote:

> This message is to request the permission to use code chunks from Git
> for Windows in GNU TeXmacs , to which I contribute.
> The main developer of TeXmacs is Joris van der Hoeven (in cc).
> 
> Context:
> 
> Just like Git, TeXmacs originated on *nix platforms and was subsequently
> ported to windows using MinGW. Naturally, some issues we have in that
> port are the very same Git for Windows has faced.
> 
> One specific problem you have solved and that TeXmacs still hasn't, is
> dealing with unicode filenames. By taking relevant pieces of code in Git
> for windows, I could easily come up with a patch that enables TeXmacs to
> handle unicode filenames in windows.
> 
> Now, the problem is that Git code is GPL V2, while TeXmacs is GPL V3:
> Incorporating my patch in TeXmacs' trunk would be a violation of GPL
> V2... /unless/ we are granted the permission to do so by the authors of
> the code. This is precisely the reason for this message.

It is great that you can make use of the code!

As to the licensing problem, I agree it is a hassle. The biggest obstacle
is that you have to have the consent of all the authors.

You hereby have mine.

> The chunks of code we would like to reuse are from these Git for Windows
> files:
> git-compat-util.h

This file is quite large, maybe you can cut down on the authors to contact
by restricting the `git annotate`/`git log`/`git shortlog` calls to
specific parts, using the `-L ,` option?

> ctype.c

$ git shortlog -nse ctype.c
 5  Junio C Hamano 
 4  René Scharfe 
 2  Nguyễn Thái Ngọc Duy 
 1  Ben Walton 
 1  Brandon Casey 
 1  Gary V. Vaughan 
 1  Linus Torvalds 
 1  Namhyung Kim 

I *think* Ben Walton's change (189c860c9ec (kwset: use unsigned char to
store values with high-bit set, 2015-03-02)) is not copyright-able, as it
only changes the type from signed to unsigned. But I am not a lawyer ;-)

Likewise, Namhyung Kim's change (1a191a22959 (ctype.c only wants
git-compat-util.h, 2012-02-10)) only changes which header is included.
That seems to be a too-obvious/too-trivial change to me.

Also, it looks as if removing a comma as was done in 4b05548fc05 (enums:
omit trailing comma for portability, 2010-05-14) by Gary V. Vaughan would
not merit any copyright.

If in doubt, you could simply take the version of ctype.c with those
changes reverted as basis of your work.

You still have to get the consent of Junio, René, Duy, Brandon and Linus
to relicense the file's contents.

> compat ¬
>mingw.c

I count 35 authors other than myself for that file... Maybe you can narrow
down what you need?

>mingw.h

Still 29 authors other than me...

>win32.h

This is more manageable, as it only saw three authors. But then, you could
simply reimplement the functionality, it's just two functions, and I do
not think that get_file_attr() is implemented in the best way: we have a
function called err_win_to_posix() in compat/mingw.c which is much more
complete.

Having said that, err_win_to_posix() is still not implemented in the best
way. The best way is to abuse Windows' own (undocumented) _doserrmap()
function along with the information in the header files winerror.h and
errno.h to generate the mapping. Those two files, as per mingw-w64's
headers, have the very nice preamble:

/**
 * This file has no copyright assigned and is placed in the Public 
Domain.
 * This file is part of the mingw-w64 runtime package.
 * No warranty is given; refer to the file DISCLAIMER.PD within this
 * package.
 */

Therefore, the result has no copyright assigned and is placed in the
Public Domain and we can do the very same, too.

As I wanted to have a Windows error -> errno mapping that I could
relicense as I see fit, anyway, I took this as an excellent opportunity to
generate exactly that.

Please find the header attached. Here is how I generated that header file:

-- snip --
cat >/tmp/generrmap.c <
#include 

static void map_code(unsigned long code, const char *id);

int _main(int argc, char **argv)
{
printf("/* This file has no copyright assigned and is placed in the "
"Public Domain. */\\n"
"\\n"
"#ifndef WINERR2ERRNO_H\\n"
"#define WINERR2ERRNO_H\\n"
"\\n"
"static int winerror2errno(long code)\\n"
"{\\n");
$(sed -n 's/^#define \([^ ]*\) __MSABI_LONG([1-9].*/\tmap_code(\1, "\1");/p' \
win32 ¬
> dirent.c
> dirent.h

I encourage you to have a look

[PATCH] fmt-merge-msg: fix coding style

2017-07-25 Thread Dimitrios Christidis

Align argument list and place opening brace on its own line.

Signed-off-by: Dimitrios Christidis 
---
 builtin/fmt-merge-msg.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
index 10cbb4341..e99b5ddbf 100644
--- a/builtin/fmt-merge-msg.c
+++ b/builtin/fmt-merge-msg.c
@@ -408,7 +408,8 @@ static void shortlog(const char *name,
 }
 
 static void fmt_merge_msg_title(struct strbuf *out,
-   const char *current_branch) {
+   const char *current_branch)
+{
int i = 0;
char *sep = "";
 
-- 
2.13.3

Re: [PATCH v3] merge: add a --signoff flag.

2017-07-25 Thread lukaszgryglicki

Hi, can You take a look at my newest patch version (v3)?
> On 24 Jul 2017, at 22:42, Junio C Hamano  wrote:
> 
> Junio C Hamano  writes:
> 
>> lukaszgryglicki  writes:
>> 
>>> Hi, what is the state of this patch?
>> 
>> I was expecting you to respond to Ævar's <87tw2sbnyl@gmail.com>
>> to explain how your new version addresses his concerns, or him to
>> respond to your new patch to say that it addresses his concerns
>> adequately.  I think neither has happened, so the topic is still in
>> limbo, I guess...
> 
> Sorry, Ævar's message I meant was <87fueferd4@gmail.com>.

Deceptive site ahead Waring

2017-07-25 Thread Christos Angelidis


Greetings folks,

Just tried to download your windows version app and a warning  from 
chrome popped up saying
"Deceptive site ahead", maybe you wonna check it if you haven't done it 
already.

Cheers.

Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility

2017-07-25 Thread Johannes Schindelin

Hi,

On Mon, 24 Jul 2017, Jiang Xin wrote:

> 2017-07-22 19:28 GMT+08:00 Johannes Schindelin :
> >
> > On Sat, 22 Jul 2017, Jiang Xin wrote:
> >
> >> 2017-07-22 7:34 GMT+08:00 Junio C Hamano :
> >> > Jiang Xin  writes:
> >> >
> >> >> A very small hack on gettext.
> >
> > I am 100% opposed to this hack.
> 
> It's really very small, see:
> 
> *  https://github.com/jiangxin/gettext/commit/b0a72643

I don't care about size. Insecure, pampered white men may be concerned
about size; not I.

I am concerned about an *unnecessary* deviation from a well-established
and well-maintained piece of software that would all of a sudden be
version-coupled with Git.

> > If at all, we need to make things easier instead of harder.
> 
> If it is only the l10n coordinate's duty to generate po/git.pot, the
> tweak is OK.

No. You want to keep the bus number high.

> I agree.  We just go with the sed-then-cleanup version until we meet
> ambiguities (I mean some words other than PRItime need to be
> replaced).

Even then. Even then you should want to avoid version-coupling Git
versions with gettext versions. Because that's precisely what you would
do, as gettext would now have to know about *this* Git version's
interpretation of certain data type names.

Ciao,
Dscho

Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility

2017-07-25 Thread Johannes Schindelin

Hi,

On Sat, 22 Jul 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >> >> A very small hack on gettext.
> >
> > I am 100% opposed to this hack. It is already cumbersome enough to find
> > out what is involved in i18n (it took *me* five minutes to find out that
> > much of the information is in po/README, with a lot of information stored
> > *on an external site*, and I still managed to miss the `make pot` target).
> >
> > If at all, we need to make things easier instead of harder.
> >
> > Requiring potential volunteers to waste their time to compile an
> > unnecessary fork of gettext? Not so great an idea.
> >
> > Plus, each and every Git build would now have to compile their own
> > gettext, too, as the vanilla one would not handle the .po files containing
> > %!!!
> >
> > And that requirement would impact instantaneously people like me, and even
> > worse: some other packagers might be unaware of the new requirement which
> > would not be caught during the build, and neither by the test suite.
> > Double bad idea.
> 
> If I understand correctly, the patch hacks the input processing of
> xgettext (which reads our source code and generates po/git.pot) so
> that when it sees PRItime, pretend that it saw PRIuMAX, causing it
> to output % in its output.

Oh, I missed that. That's even worse, as it precludes what you were
wishing for: to replace timestamp_t by a signed data type eventually.

Ciao,
Dscho

Re: [PATCH] PRItime: wrap PRItime for better l10n compatibility

2017-07-25 Thread Johannes Schindelin

Hi Junio,

On Sat, 22 Jul 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > On Fri, 21 Jul 2017, Junio C Hamano wrote:
> >
> >> Jean-Noël Avila  writes:
> >> 
> >> > Le 20/07/2017 à 20:57, Junio C Hamano a écrit :
> >> >>
> >> >> +   git diff --quiet HEAD && git diff --quiet --cached
> >> >> +
> >> >> +   @for s in $(LOCALIZED_C) $(LOCALIZED_SH) $(LOCALIZED_PERL); \
> >> >
> >> > Does PRIuMAX make sense for perl and sh files?
> >> 
> >> Not really; I did this primarily because I would prefer to keep
> >> things consistent, anticipating there may be some other things we
> >> need to replace before running gettext(1) for other reasons later.
> >
> > It would add unnecessary churn, too, to add those specific exclusions and
> > make things inconsistent: the use of PRItime in Perl or shell scripts
> > would already make those scripts barf. And if it is unnecessary churn...
> > let's not do it?
> 
> Sorry, but I cannot quite tell if you are in favor of limiting the
> set of source files that go through the sed substitution (because we
> know PRIuMAX is just as nonsensical as PRItime in perl and shell
> source), or if you are in favor of keeping the patch as-is (because
> changing the set of source files is a churn and substitutions would
> not hurt)?

I was in favor of keeping the simplest strategy: simply cover all files,
including Perl and Unix shell scripts. It would not bring any benefit to
exclude them.

Ciao,
Dscho

Re: [PATCH] sha1_file: use access(), not lstat(), if possible

2017-07-25 Thread Johannes Schindelin

Hi,

On Sat, 22 Jul 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > But this whole thread taps into a gripe I have with parts of Git's code
> > base: part of the code is not clear at all in its intent by virtue of
> > calling whatever POSIX function may seem to give the answer for the
> > intended question, instead of implementing a function whose name says
> > precisely what question is asked.
> >
> > In this instance, we do not call a helper get_file_size(). Oh no. That
> > would make it too obvious. We call lstat() instead.
> 
> I agree with you for this case and a case like this in general.  
> 
> In codepaths at a lot lower level (they tend to be the ancient and
> quite fundamental ones) in our codebase, lstat() is often directly
> used by the caller because they are interested not only in a single
> aspect of a path but many fields in struct stat are of interest.
> 
> When the code is interested in existence or size or whatever single
> aspect of a path and nothing else, however, the code would become
> easier to read if a helper function with a more specific name is
> used.  And it may even help individual platforms that do not want to
> use the full lstat() emulation, by telling them that other fields in
> struct stat are not needed.
> 
> Of course, then the issue becomes what to do when we are interested
> in not just one but a selected few attributes.  Perhaps we create a
> helper "get_A_B_and_C_attributes_for_path()", which may use lstat()
> on POSIX and the most efficient way to get only A, B and C attributes
> on non-POSIX platforms.  The implementation would be OK, but the naming
> becomes a bit hard; we need to give it a good name.
> 
> Things gets even more interesting when the set of attributes we are
> interested in grows by one and we need to rename the function to
> "get_A_B_C_and_D_attributes_for_path()".  When it is a lot easier to
> fall back to the full lstat() emulation on non-POSIX platforms, the
> temptation to just use it even though it would grab attributes that
> are not needed in that function grows, which needs to be resisted by
> those who are doing the actual implementation for a particular platform.

It becomes a lot easier to fall back to lstat(), if a lot less readable,
yes.

Until, that is, one realises that the function name does not have to
encode what information is sought. It can be a bit field in a parameter
instead. There are even precendents in Git's own source code for that
rather smart paradigm.

Ciao,
Dscho

[PATCH] git-gui (MinGW): make use of MSys2's msgfmt

2017-07-25 Thread Johannes Schindelin

When Git for Windows was still based on MSys1, we had no gettext, ergo
no msgfmt, either. Therefore, we introduced a small and simple Tcl
script to perform the same task.

However, with MSys2, we no longer need that because we have a proper
msgfmt executable. Plus, the po2msg.sh script somehow manages to hang
when run in parallel in Git for Windows' SDK (symptom: the Continuous
Testing tasks timing out).

Two reasons to use real msgfmt.exe instead.

Signed-off-by: Johannes Schindelin 
---

This hopefully fixes the hangs with the Windows builds triggered
by Travis. It was a tough one to figure out originally, and it is
my fault for prioritizing other patch contributions of this one;
Git for Windows has been carrying this patch since April 6th, 2015.

Pat, I still have a couple of Pull Request open in your repository
at https://github.com/patthoyts/git-gui/pulls/dscho that await any
reaction since October 14th last year. Please let me know when you
are ready to accept code contributions again. In the meantime, I
will send git-gui patches to this here mailing list, Cc:ing you,
and hoping that Junio will take the patches.

Published-As: 
https://github.com/dscho/git/releases/tag/git-gui-msgfmt-on-windows-v1
Fetch-It-Via: git fetch https://github.com/dscho/git 
git-gui-msgfmt-on-windows-v1

 git-gui/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/git-gui/Makefile b/git-gui/Makefile
index fe30be38dc8..918a8de3691 100644
--- a/git-gui/Makefile
+++ b/git-gui/Makefile
@@ -161,7 +161,9 @@ ifeq ($(uname_S),Darwin)
endif
 endif
 ifneq (,$(findstring MINGW,$(uname_S)))
+ifeq ($(shell expr "$(uname_R)" : '1\.'),2)
NO_MSGFMT=1
+endif
GITGUI_WINDOWS_WRAPPER := YesPlease
GITGUI_RELATIVE := 1
 endif

base-commit: 5eada8987e79f216f2002a3cd991360a50cd577c
-- 
2.13.3.windows.1.13.gaf0c2223da0

Re: Should I store large text files on Git LFS?

2017-07-25 Thread Andrew Ardill

On 25 July 2017 at 04:11, Jeff King  wrote:
> On Mon, Jul 24, 2017 at 02:58:38PM +1000, Andrew Ardill wrote:
>
>> On 24 July 2017 at 13:45, Farshid Zavareh  wrote:
>> > I'll probably test this myself, but would modifying and committing a 4GB
>> > text file actually add 4GB to the repository's size? I anticipate that it
>> > won't, since Git keeps track of the changes only, instead of storing a copy
>> > of the whole file (whereas this is not the case with binary files, hence 
>> > the
>> > need for LFS).
>>
>> I decided to do a little test myself. I add three versions of the same
>> data set (sometimes slightly different cuts of the parent data set,
>> which I don't have) each between 2 and 4GB in size.
>> Each time I added a new version it added ~500MB to the repository, and
>> operations on the repository took 35-45 seconds to complete.
>> Running `git gc` compressed the objects fairly well, saving ~400MB of
>> space. I would imagine that even more space would be saved
>> (proportionally) if there were a lot more similar files in the repo.
>
> Did you tweak core.bigfilethreshold? Git won't actually try to find
> deltas on files larger than that (500MB by default). So you might be
> seeing just the effects of zlib compression, and not deltas.

I tweaked nothing!

The space saving I assumed was pretty much just zlib compression, and
I wasn't sure how much delta we actually could get, and how long that
might take to run.

> You can always check the delta status after a gc by running:
>
>   git rev-list --objects --all |
>   git cat-file --batch-check='%(objectsize:disk) %(objectsize) %(deltabase) 
> %(rest)'
>
> That should give you a sense of how much you're saving due to zlib (by
> comparing the first two numbers for a copy that isn't a delta; i.e.,
> with an all-zeros delta base) and how much due to deltas (how much
> smaller the first number is for an entry that _is_ a delta).

Let's have a look:

$ git rev-list --objects --all |
  git cat-file --batch-check='%(objectsize:disk) %(objectsize)
%(deltabase) %(rest)'
174 262 
171 260 
139 212 
47 36 
377503831 2310238304  data.txt
47 36 
500182546 3740427683  data.txt
47 36 
447340264 3357717475  data.txt

Yep, all zlib.

What do you think is a reasonable config for storing text files this
large, to get good delta compression, or is it more of a trial and
error to find out what works best?

Regards,

Andrew Ardill

83 matches

Mail list logo